TLB Shootdown IPI-Storm Slippage Playbook
Why this exists
Execution systems can show stable network telemetry while still suffering sudden latency-tail blowups.
One under-modeled cause is TLB shootdown storms: frequent page-table changes force cross-core IPIs (inter-processor interrupts), stalling critical threads at the worst moments.
If strategy, risk, and gateway threads share NUMA nodes/cores with memory-churn-heavy processes, these shootdowns can create bursty dispatch delays that look like "random slippage" unless you model them explicitly.
Core failure mode
TLB shootdowns occur when memory mapping metadata changes (unmap/remap, page migration, aggressive allocator behavior, THP split/defrag paths). The kernel sends IPIs so other cores invalidate stale TLB entries.
During stressed windows:
- IPI rate spikes,
- on-core execution is repeatedly interrupted,
- p99 run-queue delay and scheduler latency widen,
- child-order dispatch cadence bunches,
- queue-priority decay and catch-up aggression increase.
Result: arrival-to-send drift rises exactly when microstructure is least forgiving.
Slippage decomposition with shootdown term
For parent order (i):
[ IS_i = C_{delay} + C_{impact} + C_{miss} + C_{tlb} ]
Where:
[ C_{tlb} = C_{ipi-preempt} + C_{dispatch-jitter} + C_{burst-recovery} ]
- (C_{ipi-preempt}): direct CPU interruption cost from shootdown IPIs
- (C_{dispatch-jitter}): timing variance of decision→send pipeline
- (C_{burst-recovery}): post-stall bunching that worsens market impact
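The decomposition above can be sketched as a small accounting helper. Field names mirror the terms in the formula; the units (basis points) and any concrete values are illustrative, not calibrated:

```python
from dataclasses import dataclass

# Illustrative accounting of implementation shortfall (IS) per the
# decomposition above; all values are in basis points.
@dataclass
class SlippageDecomposition:
    c_delay: float            # C_delay
    c_impact: float           # C_impact
    c_miss: float             # C_miss
    c_ipi_preempt: float      # direct CPU interruption cost from shootdown IPIs
    c_dispatch_jitter: float  # decision->send timing variance cost
    c_burst_recovery: float   # post-stall bunching impact cost

    @property
    def c_tlb(self) -> float:
        """C_tlb = C_ipi-preempt + C_dispatch-jitter + C_burst-recovery"""
        return self.c_ipi_preempt + self.c_dispatch_jitter + self.c_burst_recovery

    @property
    def implementation_shortfall(self) -> float:
        """IS_i = C_delay + C_impact + C_miss + C_tlb"""
        return self.c_delay + self.c_impact + self.c_miss + self.c_tlb
```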
Feature set (production-ready)
1) Kernel/CPU pressure features
Collect per-host, per-core, and per-NUMA metrics:
- TLB shootdown IPI rate (per second)
- IPI burst length and burst frequency
- scheduler latency (p50/p95/p99)
- context-switch and involuntary preemption rate
- run-queue depth and CPU steal-like interference signals
- NUMA page migration events and auto-balancing activity
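On Linux, the raw shootdown counters live in the `TLB` row of `/proc/interrupts` (cumulative per-CPU counts; sample twice and difference to get a rate). A minimal parser for that row, written against the file's text so it can be tested offline:

```python
def tlb_shootdowns_per_cpu(interrupts_text: str) -> list[int]:
    """Extract per-CPU cumulative TLB shootdown counts from
    /proc/interrupts text. The row is labelled 'TLB:' followed by one
    counter per CPU and a trailing description ('TLB shootdowns')."""
    for line in interrupts_text.splitlines():
        fields = line.split()
        if fields and fields[0].rstrip(":") == "TLB":
            counts = []
            for f in fields[1:]:
                if f.isdigit():
                    counts.append(int(f))
                else:
                    break  # reached the trailing description
            return counts
    return []  # row absent (e.g. non-x86 layout)
```

In production you would read `/proc/interrupts` on a fixed cadence and difference successive samples per core to get the IPI rate and burst features listed above.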
2) Memory-churn context features
- page-fault rate (minor/major split)
- allocator churn proxies (high-frequency mmap/munmap behavior)
- THP split/collapse event rates
- compaction/reclaim activity indicators
- memory pressure/PSI (if available)
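Where PSI is available, `/proc/pressure/memory` exposes `some`/`full` stall averages in a simple `key=value` format. A small parser:

```python
def parse_psi_memory(psi_text: str) -> dict[str, dict[str, float]]:
    """Parse /proc/pressure/memory style output, e.g.
    'some avg10=1.50 avg60=0.40 avg300=0.10 total=12345', into
    {'some': {'avg10': 1.5, ...}, 'full': {...}}."""
    out: dict[str, dict[str, float]] = {}
    for line in psi_text.splitlines():
        kind, *pairs = line.split()
        out[kind] = {k: float(v) for k, v in (p.split("=") for p in pairs)}
    return out
```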
3) Execution-path timing features
- decision→risk-check latency
- risk-check→wire-send latency
- child inter-arrival jitter index
- cancel/replace burst ratio after latency spikes
- stale-quote attempt ratio
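One plausible definition of the child inter-arrival jitter index is the coefficient of variation of gaps between consecutive child sends: near 0 for evenly paced dispatch, large when dispatch bunches. A sketch under that assumption:

```python
import statistics

def inter_arrival_jitter_index(send_times_ns: list[int]) -> float:
    """Coefficient of variation of child inter-arrival gaps.
    ~0 means evenly paced dispatch; values >1 indicate heavy bunching.
    This is one illustrative definition, not a standard one."""
    gaps = [b - a for a, b in zip(send_times_ns, send_times_ns[1:])]
    if len(gaps) < 2:
        return 0.0  # too few children to measure jitter
    mean = statistics.fmean(gaps)
    return statistics.pstdev(gaps) / mean if mean else float("inf")
```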
4) Microstructure outcome features
- fill ratio by dispatch-latency bucket
- short-horizon markout by latency-stress state
- queue-age-at-fill distribution shift
- missed-liquidity incidence near target participation bands
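The first of these features is a simple bucketed aggregation; a sketch with an illustrative 250-microsecond bucket width:

```python
from collections import defaultdict

def fill_ratio_by_latency_bucket(children, bucket_us: int = 250) -> dict[int, float]:
    """Fill ratio per dispatch-latency bucket.
    children: iterable of (dispatch_latency_us, filled_bool) pairs.
    Bucket width is a placeholder; calibrate to your venue/stack."""
    agg = defaultdict(lambda: [0, 0])  # bucket -> [fills, attempts]
    for lat_us, filled in children:
        b = int(lat_us // bucket_us)
        agg[b][0] += int(filled)
        agg[b][1] += 1
    return {b: fills / n for b, (fills, n) in agg.items()}
```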
Labeling scheme for supervised overlay
Create host-time regime labels:
- CLEAN: low IPI/shootdown pressure, stable dispatch
- WATCH: rising IPI slope with mild p99 widening
- STORM: sustained bursty shootdown activity and dispatch distortion
- SAFE_DEGRADED: prolonged storm with elevated miss/deadline risk
Use hysteresis (entry/exit thresholds) + minimum dwell time to avoid state flapping.
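The hysteresis + dwell logic can be sketched as a small state machine. Thresholds (IPIs/s) and the dwell count are placeholders, and only three of the four states are shown; SAFE_DEGRADED escalation from STORM follows the same pattern:

```python
CLEAN, WATCH, STORM = "CLEAN", "WATCH", "STORM"

class RegimeLabeller:
    """Hysteresis labeller: entry thresholds sit above exit thresholds,
    and a transition is honoured only after min_dwell updates in the
    current state, which suppresses state flapping."""

    def __init__(self, enter_watch=2_000, exit_watch=1_000,
                 enter_storm=10_000, exit_storm=6_000, min_dwell=5):
        self.state, self.dwell = CLEAN, 0
        self.enter_watch, self.exit_watch = enter_watch, exit_watch
        self.enter_storm, self.exit_storm = enter_storm, exit_storm
        self.min_dwell = min_dwell

    def update(self, ipi_rate: float) -> str:
        """Feed one per-interval shootdown IPI rate; returns the label."""
        target = self.state
        if self.state == CLEAN and ipi_rate >= self.enter_watch:
            target = WATCH
        elif self.state == WATCH:
            if ipi_rate >= self.enter_storm:
                target = STORM
            elif ipi_rate < self.exit_watch:
                target = CLEAN
        elif self.state == STORM and ipi_rate < self.exit_storm:
            target = WATCH
        if target != self.state and self.dwell >= self.min_dwell:
            self.state, self.dwell = target, 0  # transition accepted
        else:
            self.dwell += 1  # keep accumulating dwell in current state
        return self.state
```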
Modeling architecture
Use a two-layer design:
Baseline slippage model
- your existing impact/fill/deadline model conditioned on market state
Kernel-interference uplift model
- predicts incremental uplift terms: delta_is_mean, delta_is_q95, delta_miss_prob
Final estimate:
[ \hat{IS}_{final} = \hat{IS}_{baseline} + \Delta\hat{IS}_{tlb} ]
Train with matched controls (same symbol/session/liquidity/volatility bins) so host-side interference uplift is disentangled from market regime changes.
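A minimal matched-controls estimator, assuming each sample already carries a bin key encoding symbol/session/liquidity/volatility: within each bin, uplift is mean realized IS in stressed windows minus mean IS in CLEAN windows, so market-regime effects shared by the bin cancel. Purely illustrative:

```python
from collections import defaultdict
from statistics import fmean

def matched_uplift(samples) -> dict:
    """samples: iterable of (bin_key, regime_label, realized_is_bps).
    Returns per-bin stressed-minus-clean mean IS; bins lacking either
    side are dropped rather than extrapolated."""
    by_bin = defaultdict(lambda: {"stress": [], "clean": []})
    for key, regime, realized in samples:
        side = "clean" if regime == "CLEAN" else "stress"
        by_bin[key][side].append(realized)
    return {key: fmean(g["stress"]) - fmean(g["clean"])
            for key, g in by_bin.items()
            if g["stress"] and g["clean"]}
```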
Online controller policy
State CLEAN
- normal passive/active mix
- standard participation schedule
State WATCH
- reduce cancel/replace churn
- widen minimum inter-child spacing slightly
- cap microburst retries
State STORM
- disable fragile queue-chasing templates
- switch to smoother, lower-variance dispatch profile
- tighten stale-signal gating
State SAFE_DEGRADED
- prioritize completion certainty over queue finesse
- lower target participation ceiling
- activate failover to healthier host pool when available
Add cooldown before returning to aggressive tactics.
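The policy above reduces to a lookup from regime label to dispatch knobs, plus a cooldown that keeps the controller at WATCH settings while re-entering CLEAN. All knob names and values here are placeholders, not production settings:

```python
# Illustrative regime -> controller-knob mapping; values are placeholders.
POLICY = {
    "CLEAN":         dict(min_child_spacing_ms=5,  max_replace_rate=20,
                          participation_cap=0.12, queue_chasing=True),
    "WATCH":         dict(min_child_spacing_ms=8,  max_replace_rate=10,
                          participation_cap=0.12, queue_chasing=True),
    "STORM":         dict(min_child_spacing_ms=15, max_replace_rate=4,
                          participation_cap=0.10, queue_chasing=False),
    "SAFE_DEGRADED": dict(min_child_spacing_ms=25, max_replace_rate=2,
                          participation_cap=0.07, queue_chasing=False),
}

def controller_knobs(regime: str, cooldown_left: int) -> dict:
    """While cooling down after a storm, a nominal CLEAN label still
    maps to WATCH settings instead of snapping back to aggression."""
    if regime == "CLEAN" and cooldown_left > 0:
        return POLICY["WATCH"]
    return POLICY[regime]
```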
Desk metrics to monitor
- SBI (Shootdown Burst Index): weighted burstiness of shootdown IPIs
- DVR (Dispatch Variance Ratio): stressed/clean latency variance ratio
- QDI (Queue Decay Index): queue-age deterioration under host stress
- CBR (Catch-up Burst Ratio): share of children emitted in post-stall bunches
- HUL (Host Uplift Loss): realized IS minus baseline IS in stress windows
Slice by host, symbol-liquidity bucket, strategy type, and session segment.
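DVR is the simplest of these to compute; a sketch, assuming you have already tagged dispatch-latency samples with the stress state they occurred in:

```python
from statistics import pvariance

def dispatch_variance_ratio(stressed_latencies, clean_latencies) -> float:
    """DVR: dispatch-latency variance in stressed windows over the
    clean-window baseline. Values well above 1 flag host-side timing
    degradation even when means look similar."""
    return pvariance(stressed_latencies) / pvariance(clean_latencies)
```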
Mitigation ladder (practical)
Isolation first
- pin critical execution threads to shielded cores
- isolate noisy background services from execution NUMA domains
Memory-path hygiene
- reduce high-frequency map/unmap behavior in latency-critical processes
- avoid allocator patterns that trigger mapping churn
- tune/contain auto-NUMA migration where harmful
Execution dampers
- bounded recovery pacing after detected stalls
- prevent immediate backlog flush that destroys queue priority
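Bounded recovery pacing can be as simple as spreading the post-stall backlog over fixed release intervals instead of flushing it at once. A sketch with illustrative parameters:

```python
def pace_backlog(backlog: int, resume_t: float, max_per_interval: int,
                 interval: float) -> list:
    """Spread `backlog` queued child intents over fixed intervals
    starting at resume_t; returns (release_time, n_children) pairs.
    Caps and interval are placeholders to be tuned per venue."""
    schedule = []
    t = resume_t
    while backlog > 0:
        n = min(backlog, max_per_interval)  # never exceed the per-interval cap
        schedule.append((t, n))
        backlog -= n
        t += interval
    return schedule
```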
Failover discipline
- route high-urgency flow away from storming hosts
- keep host health score in pre-trade routing logic
Validation drills
Historical replay drill
- replay known IPI-storm windows and verify early WATCH detection
Counterfactual dispatch drill
- compare naive catch-up vs bounded recovery pacing
Confounder drill
- separate kernel-interference effects from exchange/network-wide events
Failover drill
- validate stateful reroute to healthy hosts without tactic thrash
Anti-patterns
- Treating CPU utilization averages as sufficient observability
- Ignoring kernel-memory events because network RTT looks normal
- Letting delayed child intents flush immediately after stalls
- Mixing latency-critical threads with memory-churn workloads on same cores
Bottom line
TLB shootdown IPI storms are a real execution tax: not loud enough to look like outages, but large enough to erode basis points through timing quality decay.
If you do not model host-interference uplift explicitly, your slippage controller will misattribute losses to market noise and overfit the wrong levers.