Runqueue Migration Cache-Cold Slippage Playbook

2026-03-18 · finance


Why this exists

Execution engines often optimize network and venue latency but under-model a quieter infra tax: scheduler-driven cross-core task migration.

When critical threads bounce between CPU cores, they lose cache warmth (L1/L2/LLC locality), pay branch-predictor relearn cost, and stretch decision→dispatch latency right where queue priority matters most.

The host can still show healthy average CPU utilization while p95/p99 execution cost worsens.


Core failure mode

A latency-sensitive strategy thread starts on core A, then is migrated to core B due to load balancing, IRQ pressure, cgroup placement, or competing tasks.

That migration can trigger:

  1. Cold-start compute window (cache/predictor warm-up)
  2. Decision loop stall inflation (longer compute-to-send gap)
  3. Dispatch bunching (late micro-bursts after catch-up)
  4. Queue-age loss (slower amend/cancel keeps stale intent live)
  5. Tail slippage expansion

In fast books, the hidden damage is not mean delay but timing-shape distortion of child flow.


Slippage decomposition with migration term

For parent order (i):

[ IS_i = C_{spread} + C_{impact} + C_{opportunity} + C_{migration} ]

Where:

[ C_{migration} = C_{cold} + C_{burst} + C_{queue\_decay} + C_{phase\_error} ]
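The decomposition is additive bookkeeping, so it can be sketched directly. A minimal Python illustration (the bps values and function names are hypothetical, not a production schema):

```python
# Hypothetical sketch of the slippage decomposition with the migration term.
# All numbers are illustrative, in bps of parent notional.

def migration_cost(cold, burst, queue_decay, phase_error):
    """C_migration = C_cold + C_burst + C_queue_decay + C_phase_error."""
    return cold + burst + queue_decay + phase_error

def implementation_shortfall(spread, impact, opportunity, migration):
    """IS_i = C_spread + C_impact + C_opportunity + C_migration."""
    return spread + impact + opportunity + migration

c_mig = migration_cost(cold=0.4, burst=0.3, queue_decay=0.5, phase_error=0.2)
total = implementation_shortfall(spread=2.0, impact=3.1, opportunity=1.2,
                                 migration=c_mig)
```

The point of keeping the migration term separate is that it can be attributed and attacked independently of spread and impact.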


Production observability (minimum)

1) Scheduler / placement telemetry: per-thread migration counts, runqueue delay, core placement over time

2) Hardware locality hints: LLC-miss rate, branch-miss rate, NUMA residency

3) Execution-path telemetry: decision→dispatch timestamps, inter-send gaps, amend/cancel latency

4) Outcome telemetry: fill ratios, queue-position proxies, realized slippage per child
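On Linux, per-thread migration counters can be scraped from /proc/&lt;pid&gt;/sched-style output (the se.nr_migrations field assumes a kernel built with scheduler debug stats). A hedged parsing sketch, run here against an illustrative sample rather than a live host:

```python
# Hedged sketch: extract integer scheduler counters from a Linux
# /proc/<pid>/sched-style dump. The SAMPLE text below is illustrative,
# not captured from a real host.

SAMPLE = """\
se.exec_start                      :     123456.789
se.nr_migrations                   :                 42
nr_switches                        :               1337
nr_voluntary_switches              :               1200
nr_involuntary_switches            :                137
"""

def parse_sched_counters(text):
    counters = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, _, value = line.partition(":")
        try:
            counters[key.strip()] = int(value.strip())
        except ValueError:
            pass  # skip float-valued timing fields like se.exec_start
    return counters

counters = parse_sched_counters(SAMPLE)
# Sample this on each scrape and diff se.nr_migrations over time to get TMR.
```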


Desk metrics to track

Use rolling windows (e.g., 1m / 5m):

  1. TMR (Thread Migration Rate)

[ TMR = \frac{\Delta\,\text{migrations}}{\Delta\,\text{time}} ]

  2. RQS (Runqueue Stretch)

[ RQS = p95(\text{runqueue delay}) - p50(\text{runqueue delay}) ]

  3. CLI (Cache Locality Impairment)

Normalized LLC-miss uplift during high-migration windows vs a matched baseline.

  4. DGI (Dispatch Gap Inflation)

[ DGI = \frac{p95(\text{dispatch gap})}{\text{median}(\text{dispatch gap})} ]

  5. QDI (Queue Decay Impact)

Passive fill-ratio drop conditioned on migration spikes vs matched calm windows.
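The formula-based metrics above reduce to a few lines. A toy implementation using a nearest-rank percentile (the percentile convention is an assumption; a desk would use its standard estimator):

```python
# Toy implementations of TMR, RQS, and DGI from raw telemetry samples.
# Nearest-rank percentile is an illustrative choice.

def pct(xs, q):
    """Nearest-rank percentile, q in [0, 100]."""
    s = sorted(xs)
    idx = min(len(s) - 1, max(0, int(round(q / 100 * (len(s) - 1)))))
    return s[idx]

def tmr(migration_count_delta, window_seconds):
    """Thread Migration Rate: migrations per second over the window."""
    return migration_count_delta / window_seconds

def rqs(runqueue_delays_us):
    """Runqueue Stretch: p95 - p50 of runqueue delay."""
    return pct(runqueue_delays_us, 95) - pct(runqueue_delays_us, 50)

def dgi(dispatch_gaps_us):
    """Dispatch Gap Inflation: p95 / median of inter-dispatch gaps."""
    return pct(dispatch_gaps_us, 95) / pct(dispatch_gaps_us, 50)
```

CLI and QDI need matched baseline windows rather than a single sample stream, so they live in the uplift model below rather than in a one-liner.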


Modeling approach

Use baseline + migration-uplift overlay.

Stage A: baseline cost model

Standard spread/impact/fill model with market-state features.

Stage B: migration uplift model

Predict the incremental migration uplift using features drawn from the desk metrics above (TMR, RQS, CLI, DGI, QDI) plus market state (spread, volatility regime, queue depth).

Final estimate:

[ \hat{IS}_{final} = \hat{IS}_{baseline} + \Delta\hat{IS}_{migration} ]

Use matched windows (same symbol/session/volatility regime) to separate infra uplift from market turbulence.
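The two-stage shape of the estimate can be sketched with stand-in linear stubs where fitted models would go (all coefficients and feature names here are illustrative assumptions, not calibrated values):

```python
# Sketch of the baseline + migration-uplift overlay. Both "models" are
# linear stubs standing in for fitted Stage A / Stage B models.

def baseline_is(spread_bps, participation):
    """Stage A stub: spread/impact proxy from market-state features."""
    return 0.5 * spread_bps + 4.0 * participation

def migration_uplift(tmr, rqs_us, dgi):
    """Stage B stub: incremental cost predicted from infra features."""
    return 0.02 * tmr + 0.001 * rqs_us + 0.3 * max(0.0, dgi - 1.0)

def final_is(spread_bps, participation, tmr, rqs_us, dgi):
    """IS_final = IS_baseline + delta-IS_migration."""
    return (baseline_is(spread_bps, participation)
            + migration_uplift(tmr, rqs_us, dgi))
```

Keeping the uplift additive means the infra term can be zeroed out for counterfactual attribution: rerun the estimate with calm-window infra features and difference the results.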


Controller state machine

State 1 — PINNED_STABLE

Action: standard policy.

State 2 — MIGRATION_PRESSURE

Action: reduce replace churn, smooth child cadence, increase passive selectivity.

State 3 — CACHE_COLD_DRIFT

Action: cap burst size, enforce min inter-send spacing, reduce tactic oscillation.

State 4 — SAFE_AFFINITY_MODE

Action: affinity-hardened execution profile (pin critical workers, tighten concurrency, conservative completion mode).

Use hysteresis and minimum dwell times to avoid policy flapping.
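A minimal controller skeleton with hysteresis (exit thresholds below entry thresholds) and a minimum dwell time before any downgrade. The single TMR trigger and all threshold values are illustrative assumptions; a production controller would gate on several metrics:

```python
# Hedged sketch of the four-state controller. Thresholds are illustrative.

PINNED_STABLE, MIGRATION_PRESSURE, CACHE_COLD_DRIFT, SAFE_AFFINITY = range(4)

class MigrationController:
    # enter/exit thresholds on TMR (migrations/sec); exit < enter = hysteresis
    ENTER = {MIGRATION_PRESSURE: 5.0, CACHE_COLD_DRIFT: 15.0, SAFE_AFFINITY: 40.0}
    EXIT = {MIGRATION_PRESSURE: 3.0, CACHE_COLD_DRIFT: 10.0, SAFE_AFFINITY: 25.0}
    MIN_DWELL = 5  # ticks before any downgrade, to avoid policy flapping

    def __init__(self):
        self.state = PINNED_STABLE
        self.dwell = 0

    def step(self, tmr):
        self.dwell += 1
        # escalate immediately when an entry threshold is crossed
        for s in (SAFE_AFFINITY, CACHE_COLD_DRIFT, MIGRATION_PRESSURE):
            if tmr >= self.ENTER[s] and s > self.state:
                self.state, self.dwell = s, 0
                return self.state
        # de-escalate one level only after the dwell time, below exit threshold
        if (self.state > PINNED_STABLE and self.dwell >= self.MIN_DWELL
                and tmr < self.EXIT[self.state]):
            self.state -= 1
            self.dwell = 0
        return self.state
```

Escalation is immediate while de-escalation is slow and stepwise: the asymmetry is the hysteresis that keeps the policy from flapping on a noisy TMR signal.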


Mitigation ladder

  1. Pin truly critical execution threads

    • CPU affinity/cpuset for decision + dispatch hot paths
    • keep housekeeping/background workers outside the same core island
  2. Tune migration sensitivity, not just CPU usage

    • review scheduler balancing behavior and migration-cost heuristics
    • avoid over-reactive balancing in latency-critical pools
  3. Isolate interrupt pressure

    • align IRQ affinity away from core(s) hosting execution-critical threads
    • avoid hidden preemption that triggers downstream migrations
  4. Bound catch-up behavior

    • never repay decision lag with uncontrolled child-order bursts
    • use capped repayment slope
  5. Topology-aware host classes

    • separate latency-critical execution nodes from noisy multi-tenant workloads
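Two rungs of the ladder are directly expressible in code: pinning the hot path and bounding catch-up. A sketch using Python's Linux-only affinity call (guarded so it no-ops elsewhere); the 1.25x cap ratio is an illustrative parameter, not a recommendation:

```python
# Sketch of two mitigations: affinity pinning and a capped repayment slope.

import os

def pin_to_cores(cores):
    """Pin the calling process/thread to a CPU set (no-op off Linux)."""
    if hasattr(os, "sched_setaffinity"):
        os.sched_setaffinity(0, set(cores))

def capped_repayment(backlog_children, base_rate, cap_ratio=1.25):
    """Children to send this interval: at most cap_ratio x the normal rate,
    so a post-migration backlog drains over several intervals instead of
    firing as one micro-burst."""
    return min(backlog_children, int(base_rate * cap_ratio))
```

The cap converts a decision-lag spike into a bounded slope of slightly larger intervals, which is the "capped repayment slope" from rung 4.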

Validation drills

  1. Synthetic migration stress

    • induce controlled scheduler churn and verify uplift detector response.
  2. Affinity A/B canary

    • compare pinned vs floating thread placement on matched symbols/time slices.
  3. Burst-policy A/B

    • naive catch-up vs capped repayment under migration stress.
  4. Confounder split

    • prove migration uplift remains after controlling for spread/volatility regime shifts.
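The confounder split in drill 4 amounts to differencing slippage between high-migration and calm windows inside each matched regime bucket. A toy sketch (the window record layout is an assumption):

```python
# Illustrative confounder split: mean slippage in high-TMR windows minus
# mean slippage in calm windows, matched on a volatility-regime bucket.

from collections import defaultdict

def matched_uplift(windows):
    """Average per-bucket slippage difference (hot minus calm), using only
    buckets that contain both kinds of windows."""
    buckets = defaultdict(lambda: {"hot": [], "calm": []})
    for w in windows:
        key = "hot" if w["high_tmr"] else "calm"
        buckets[w["vol_bucket"]][key].append(w["slippage_bps"])
    diffs = []
    for b in buckets.values():
        if b["hot"] and b["calm"]:
            diffs.append(sum(b["hot"]) / len(b["hot"])
                         - sum(b["calm"]) / len(b["calm"]))
    return sum(diffs) / len(diffs) if diffs else 0.0
```

A positive uplift that survives the bucketing is evidence the cost is infra-driven rather than a volatility artifact; matching on symbol and session as well would tighten the comparison further.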

Anti-patterns

  • Watching only average CPU utilization while p95/p99 execution cost degrades
  • Repaying decision lag with uncontrolled child-order catch-up bursts
  • Pinning every worker indiscriminately instead of isolating the true hot path
  • Writing migration-driven slippage off as "market noise"

Practical rollout checklist

  1. Instrument scheduler, locality, execution-path, and outcome telemetry
  2. Baseline the desk metrics (TMR, RQS, CLI, DGI, QDI) per host class
  3. Run the controller in monitor-only mode before enabling actions
  4. Enable state transitions with hysteresis and minimum dwell times
  5. Re-run the validation drills after any scheduler, kernel, or topology change

Bottom line

Cross-core task migration is not just an OS detail; in low-latency execution, it is a queue-priority and tail-cost control variable.

If you ignore scheduler locality dynamics, you may keep blaming “market noise” for slippage that is largely self-inflicted by cache-cold timing drift.

