SMT Sibling Contention Dispatch-Jitter Slippage Playbook
Why this exists
Execution hosts can look idle enough on average while still leaking p95/p99 implementation shortfall.
A frequent hidden source is SMT (Hyper-Threading) sibling contention: a latency-critical thread shares one physical core with another runnable thread (or noisy IRQ/ksoftirq work), causing bursty dispatch delays.
These delays are often too small for coarse dashboards but large enough to degrade queue quality in fast books.
Core failure mode
When execution-critical threads share physical cores with active siblings:
- run-queue competition adds variable scheduling delay,
- shared core resources (frontend/backend/cache/TLB bandwidth) create execution-time jitter,
- decision and cancel/replace loops become phase-noisy,
- child orders arrive in uneven bursts,
- queue-age quality decays,
- late-cycle urgency rises and crossing cost convexifies.
Result: tail slippage rises even when mean CPU utilization looks acceptable.
Slippage decomposition with SMT term
For parent order (i):
[ IS_i = C_{delay} + C_{impact} + C_{miss} + C_{smt} ]
Where:
[ C_{smt} = C_{runqueue} + C_{shared-core-jitter} + C_{queue-decay} ]
- Runqueue cost: scheduling wait added by sibling activity
- Shared-core jitter cost: variable compute/dispatch latency from resource contention
- Queue decay cost: stale quotes and reset-heavy retries caused by timing instability
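The decomposition above can be sketched as a small accounting structure. This is a minimal illustration, not a desk standard: the field names mirror the cost terms, all values are assumed to be in basis points, and the example numbers are made up.

```python
from dataclasses import dataclass

@dataclass
class SlippageBreakdown:
    """Per-parent-order cost terms (bps); names mirror the decomposition above."""
    c_delay: float
    c_impact: float
    c_miss: float
    c_runqueue: float              # scheduling wait added by sibling activity
    c_shared_core_jitter: float    # variable latency from shared-core contention
    c_queue_decay: float           # stale quotes / reset-heavy retries

    @property
    def c_smt(self) -> float:
        # SMT term: runqueue + shared-core jitter + queue-decay components
        return self.c_runqueue + self.c_shared_core_jitter + self.c_queue_decay

    @property
    def implementation_shortfall(self) -> float:
        return self.c_delay + self.c_impact + self.c_miss + self.c_smt
```

With illustrative inputs `SlippageBreakdown(1.2, 2.0, 0.5, 0.3, 0.2, 0.4)`, the SMT term contributes 0.9 bps of a 4.6 bps total shortfall.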
Feature set (production-ready)
1) Host scheduling / CPU-topology features
- physical-core vs logical-core pin map for execution threads
- sibling runnable ratio (critical thread sibling busy-time %)
- per-core runqueue depth quantiles
- context-switch and migration rate on critical cores
- IRQ/softirq load overlapping critical sibling
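The pin map and sibling features above need the host's SMT topology. On Linux this is exposed in sysfs via `thread_siblings_list` files; a minimal reader, assuming the standard sysfs layout:

```python
import glob

def parse_cpu_list(text: str) -> list[int]:
    """Parse a sysfs CPU list such as "0,32" or "0-3,8-11" into sorted ints."""
    ids: set[int] = set()
    for part in text.strip().split(","):
        if "-" in part:
            lo, hi = map(int, part.split("-"))
            ids.update(range(lo, hi + 1))
        elif part:
            ids.add(int(part))
    return sorted(ids)

def smt_sibling_map() -> dict[int, list[int]]:
    """Map each logical CPU to its SMT siblings (Linux sysfs topology files)."""
    siblings: dict[int, list[int]] = {}
    pattern = "/sys/devices/system/cpu/cpu[0-9]*/topology/thread_siblings_list"
    for path in glob.glob(pattern):
        cpu = int(path.split("/")[-3][3:])  # ".../cpu17/..." -> 17
        with open(path) as f:
            ids = parse_cpu_list(f.read())
        siblings[cpu] = [c for c in ids if c != cpu]
    return siblings
```

Cross-checking this map against the execution-thread pin map is what makes the "critical thread sibling busy-time" feature computable.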
2) Execution-path timing features
- decision-to-send latency quantiles (p50/p95/p99)
- inter-dispatch gap variance and burst index
- cancel-to-replace turnaround drift
- timing phase error vs intended schedule grid
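The inter-dispatch gap features can be computed directly from child-order send timestamps. One common choice for a burst index, assumed here for illustration, is the squared coefficient of variation of gaps (near 1 for a Poisson-like stream, above 1 when bursty, near 0 when evenly paced):

```python
import statistics

def dispatch_timing_features(send_ts: list[float]) -> dict[str, float]:
    """Gap mean/variance and burst index from send timestamps (seconds)."""
    if len(send_ts) < 2:
        raise ValueError("need at least two dispatch timestamps")
    gaps = [b - a for a, b in zip(send_ts, send_ts[1:])]
    mean = statistics.fmean(gaps)
    var = statistics.pvariance(gaps)
    return {
        "gap_mean": mean,
        "gap_var": var,
        # squared coefficient of variation of inter-dispatch gaps
        "burst_index": var / (mean * mean) if mean > 0 else float("nan"),
    }
```

Evenly spaced dispatches give a burst index of 0; a long silence followed by a flush pushes it well above 1.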
3) Outcome features
- passive fill ratio by sibling-load bucket
- markout ladder (10ms / 100ms / 1s / 5s)
- completion deficit vs schedule under same liquidity regime
- branch labels: isolated-core, mild-contention, contention-burst, deadline-chase
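A labeling rule for the four branches can be as simple as thresholding the host and timing features. The thresholds below are illustrative placeholders, not calibrated values; fit them per fleet:

```python
def branch_label(sibling_runnable_ratio: float,
                 dispatch_burst_index: float,
                 deadline_pressure: bool) -> str:
    """Assign one of the branch labels above from host/timing features.

    Thresholds are illustrative placeholders; calibrate per host pool.
    """
    if deadline_pressure and sibling_runnable_ratio > 0.5:
        return "deadline-chase"
    if dispatch_burst_index > 2.0:
        return "contention-burst"
    if sibling_runnable_ratio > 0.2:
        return "mild-contention"
    return "isolated-core"
```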
Model architecture
Use a baseline + SMT-overlay design:
- Baseline slippage model
  - spread/impact/fill/deadline stack
- SMT contention overlay
  - predicts incremental uplift: delta_is_mean, delta_is_q95
Final estimate:
[ \hat{IS}_{final} = \hat{IS}_{baseline} + \Delta\hat{IS}_{smt} ]
Train on matched market windows (symbol/session/volatility/liquidity bucket) with different sibling-load states to isolate host-topology effects from market confounders.
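The composition step is mechanical once both models exist. A minimal sketch, where `overlay` stands in for any fitted regressor mapping SMT features to the predicted uplift (the linear stub and its coefficient are assumptions, not a recommended form):

```python
from typing import Callable

def final_is_estimate(baseline_is: float,
                      smt_features: dict[str, float],
                      overlay: Callable[[dict[str, float]], float]) -> float:
    """Final IS estimate = baseline prediction + SMT overlay uplift (bps)."""
    return baseline_is + overlay(smt_features)

# Stub overlay: uplift linear in sibling runnable ratio (illustrative only).
overlay = lambda f: 0.8 * f["sibling_runnable_ratio"]
estimate = final_is_estimate(3.0, {"sibling_runnable_ratio": 0.5}, overlay)
```

Keeping the overlay separate means the baseline stack stays valid on isolated hosts, and the uplift term can be re-fit alone after topology or kernel changes.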
Regime controller
State A: CORE_ISOLATED
- critical thread has low sibling pressure
- normal execution policy
State B: SIBLING_WATCH
- sibling runnable ratio rising, timing tails widening
- reduce replace churn, smooth child pacing
State C: CONTENTION_STRESS
- sustained sibling pressure + bursty dispatch
- cap burst size, enforce minimum spacing, avoid fragile queue-chasing
State D: SAFE_ISOLATION_MODE
- repeated stress + deadline pressure
- route urgent flow only to isolated cores/hosts, conservative completion policy
Use hysteresis and minimum dwell times to prevent policy flapping.
Desk metrics
- SCI (Sibling Contention Index): pressure from SMT sibling activity
- RQS (RunQueue Stress): scheduler backlog severity on critical cores
- DBI (Dispatch Burst Index): uneven child-order release intensity
- QRL (Queue Reliability Loss): passive-fill quality degradation under contention
- SUL (SMT Uplift Loss): realized IS - baseline IS in contention regimes
Track by host pool, core topology profile, symbol-liquidity bucket, and session segment.
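Of these, SUL is the simplest to compute from fill records. A sketch assuming a record schema with `regime`, `realized_is`, and `baseline_is` fields (the schema is an assumption for illustration):

```python
from collections import defaultdict

def smt_uplift_loss(records: list[dict]) -> dict[str, float]:
    """SUL per regime: mean(realized_is - baseline_is), in bps.

    `records` are per-parent-order dicts with keys 'regime',
    'realized_is', 'baseline_is' -- an assumed schema.
    """
    sums: dict[str, float] = defaultdict(float)
    counts: dict[str, int] = defaultdict(int)
    for r in records:
        sums[r["regime"]] += r["realized_is"] - r["baseline_is"]
        counts[r["regime"]] += 1
    return {regime: sums[regime] / counts[regime] for regime in sums}
```

A persistently positive SUL in contention regimes, with near-zero SUL on isolated cores, is the signature that host topology (not the market) is taxing execution.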
Mitigation ladder
- Critical-thread core isolation
  - pin latency-critical paths away from busy siblings where possible
- IRQ/softirq hygiene
  - keep noisy interrupt paths off critical sibling pairs
- Execution containment under watch/stress
  - bounded catch-up pacing, no blind backlog flush
- Topology-aware routing
  - send urgency-sensitive parents only through validated low-contention hosts
- Continuous recalibration
  - re-fit SMT uplift after kernel, affinity, or fleet-profile changes
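The first rung of the ladder is a one-liner on Linux via `os.sched_setaffinity` (Linux-only; pass PID 0 for the calling process). Which CPUs count as "isolated" is site-specific; the function just applies and confirms the mask:

```python
import os

def pin_to_isolated_cores(cores: set[int]) -> set[int]:
    """Pin the calling process to the given logical CPUs (Linux-only).

    Choose CPUs whose SMT siblings are kept free of runnable work;
    returns the effective affinity mask for verification.
    """
    os.sched_setaffinity(0, cores)       # PID 0 = current process
    return os.sched_getaffinity(0)
```

Pinning only helps if the sibling side of the pair is actually kept quiet; pair this with the IRQ/softirq hygiene step above.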
Failure drills (must run)
- Sibling-load injection drill
  - replay known contention patterns and verify early SIBLING_WATCH detection
- Burst-containment drill
  - confirm bounded recovery outperforms panic flush on q95 IS
- Confounder drill
  - separate SMT effects from NIC/network/venue latency spikes
- Isolation fallback drill
  - verify rapid migration to safe host/core pools under stress
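For the sibling-load injection drill, a throwaway busy-loop process pinned to the SMT sibling of a critical core is often enough to reproduce the contention signature. A minimal injector sketch (affinity pinning applies on Linux only; elsewhere the loop still runs but unpinned):

```python
import subprocess
import sys

def inject_sibling_load(cpu: int, seconds: float) -> subprocess.Popen:
    """Spawn a busy-loop child process pinned (on Linux) to one logical CPU.

    Point `cpu` at the SMT sibling of a critical core to replay a
    contention burst, then watch for early SIBLING_WATCH detection.
    """
    code = (
        "import os, sys, time\n"
        "cpu, secs = int(sys.argv[1]), float(sys.argv[2])\n"
        "if hasattr(os, 'sched_setaffinity'):\n"
        "    os.sched_setaffinity(0, {cpu})\n"
        "end = time.monotonic() + secs\n"
        "while time.monotonic() < end:\n"
        "    pass\n"
    )
    return subprocess.Popen([sys.executable, "-c", code, str(cpu), str(seconds)])
```

Run it against a non-production host pool first, and always bound `seconds` so a forgotten drill cannot become the contention it was meant to simulate.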
Anti-patterns
- Trusting average CPU% as latency-health truth
- Co-locating critical execution threads with unbounded sibling workload
- Ignoring topology/affinity drift after host maintenance
- Aggressive retry loops that amplify contention-caused timing bursts
Bottom line
SMT is not inherently harmful, but unmanaged sibling contention can become a hidden slippage tax.
If you do not model host-topology-induced timing distortion, queue-quality erosion will keep leaking basis points in tail windows.