Automatic NUMA Balancing Migration-Shock Slippage Playbook
Why this exists
In low-latency execution stacks, we often blame market volatility for tail slippage spikes.
But some spikes are self-inflicted by host-level memory locality churn:
- Linux automatic NUMA balancing periodically unmaps pages,
- accesses trigger NUMA hinting faults,
- pages and/or tasks are migrated,
- dispatch threads experience bursty latency and cache/TLB disruption,
- queue priority decays while strategy logic still believes it is “on-time”.
This playbook treats that churn as a first-class slippage driver.
Core failure mode
Automatic NUMA balancing is page-fault driven and adaptive. That is useful for generic throughput workloads, but it can hurt deterministic execution when scanner/migration activity aligns with decision bursts.
The practical path to slippage:
- Scanner marks regions for hinting faults
- Fault bursts raise per-thread service-time variance
- Page migration copies memory (overhead-heavy step)
- Dispatch cadence becomes uneven (micro-clustering)
- Cancel/replace timing drifts, queue age resets, adverse selection rises
Result: q95/q99 implementation shortfall rises even when p50 decision latency looks acceptable.
Slippage decomposition with NUMA terms
For parent order \(i\):
\[ IS_i = C_{delay} + C_{impact} + C_{miss} + C_{numa} \]
where
\[ C_{numa} = C_{hint-fault} + C_{migration-copy} + C_{dispatch-jitter} + C_{queue-decay} \]
Interpretation:
- Hint-fault term: interrupt/fault handling overhead during active decisions
- Migration-copy term: page-copy cost from misplaced memory correction
- Dispatch-jitter term: irregular child-order release cadence
- Queue-decay term: higher cancel/replace and late-arrival penalty
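The decomposition above is additive, so attribution is a straight sum once each term has been estimated. A minimal sketch, with all basis-point values hypothetical:

```python
# Illustrative decomposition of implementation shortfall (IS) into the
# NUMA-aware terms above. All basis-point inputs are hypothetical.
def is_decomposition(c_delay, c_impact, c_miss,
                     c_hint_fault, c_migration_copy,
                     c_dispatch_jitter, c_queue_decay):
    """Return (total IS, C_numa), both in the same units (e.g. bps)."""
    c_numa = c_hint_fault + c_migration_copy + c_dispatch_jitter + c_queue_decay
    return c_delay + c_impact + c_miss + c_numa, c_numa

total, c_numa = is_decomposition(
    c_delay=1.2, c_impact=2.5, c_miss=0.4,       # classic terms (bps)
    c_hint_fault=0.3, c_migration_copy=0.5,      # NUMA terms (bps)
    c_dispatch_jitter=0.4, c_queue_decay=0.6)
print(round(total, 2), round(c_numa, 2))  # → 5.9 1.8
```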
Observability blueprint (production-safe)
1) Kernel and NUMA control plane
- Global toggle: cat /proc/sys/kernel/numa_balancing
- Scan-rate knobs: numa_balancing_scan_delay_ms, numa_balancing_scan_period_min_ms, numa_balancing_scan_period_max_ms, numa_balancing_scan_size_mb
2) NUMA activity counters (/proc/vmstat)
Track deltas per 1s/5s bucket:
- numa_pte_updates
- numa_huge_pte_updates
- numa_hint_faults
- numa_hint_faults_local
- numa_pages_migrated
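A minimal delta tracker for these counters can be sketched as below. Parsing is shown against a string so the logic is easy to test; in production you would re-read /proc/vmstat once per bucket.

```python
# Per-bucket delta tracker for the /proc/vmstat NUMA counters listed above.
# The counters are cumulative and monotonic, so bucket values are simple
# differences between consecutive snapshots.
NUMA_KEYS = ("numa_pte_updates", "numa_huge_pte_updates",
             "numa_hint_faults", "numa_hint_faults_local",
             "numa_pages_migrated")

def parse_vmstat(text):
    """Extract the NUMA counters from /proc/vmstat-style 'name value' lines."""
    out = {}
    for line in text.splitlines():
        name, _, value = line.partition(" ")
        if name in NUMA_KEYS:
            out[name] = int(value)
    return out

def deltas(prev, curr):
    """Per-bucket increments between two snapshots."""
    return {k: curr[k] - prev[k] for k in NUMA_KEYS}

prev = parse_vmstat("numa_pte_updates 1000\nnuma_huge_pte_updates 2\n"
                    "numa_hint_faults 800\nnuma_hint_faults_local 700\n"
                    "numa_pages_migrated 50\n")
curr = parse_vmstat("numa_pte_updates 1600\nnuma_huge_pte_updates 2\n"
                    "numa_hint_faults 1400\nnuma_hint_faults_local 900\n"
                    "numa_pages_migrated 450\n")
print(deltas(prev, curr))
```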
3) Execution-path joins
Join NUMA deltas with execution telemetry on a common clock:
- decision→wire latency p50/p95/p99
- child dispatch inter-arrival CV
- cancel/replace burst density
- passive fill ratio by latency bucket
- short-horizon markout ladder
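The "common clock" join above can be sketched by keying both streams on a shared time bucket. Field names here are illustrative, not a fixed schema:

```python
# Join NUMA counter deltas with execution telemetry on a shared 1-second
# bucket. Both inputs are lists of dicts carrying a nanosecond timestamp.
def bucket(ts_ns, width_ns=1_000_000_000):
    """Map a nanosecond timestamp to its 1-second bucket index."""
    return ts_ns // width_ns

def join_on_clock(numa_rows, exec_rows):
    """Attach the matching NUMA-delta row (if any) to each telemetry row."""
    numa_by_bucket = {bucket(r["ts_ns"]): r for r in numa_rows}
    joined = []
    for r in exec_rows:
        n = numa_by_bucket.get(bucket(r["ts_ns"]))
        if n is not None:
            joined.append({**r, **{k: v for k, v in n.items() if k != "ts_ns"}})
    return joined

rows = join_on_clock(
    [{"ts_ns": 5_000_000_123, "hint_faults": 600, "pages_migrated": 400}],
    [{"ts_ns": 5_400_000_000, "p99_us": 180.0, "cr_bursts": 3}])
print(rows)
```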
Desk metrics
RHF (Remote Hint Fault share)
\( RHF = 1 - \frac{\Delta numa\_hint\_faults\_local}{\Delta numa\_hint\_faults + \epsilon} \)
MCR (Migration Churn Rate)
\( MCR = \frac{\Delta numa\_pages\_migrated}{\Delta t} \)
NSI (NUMA Scan Intensity)
Proxy from scan-size and scan-period settings plus hint-fault velocity
DCJ (Dispatch Cadence Jitter)
p99 child-gap / p50 child-gap
NUS (NUMA Uplift to Slippage)
Realized IS minus baseline IS during matched windows with elevated RHF/MCR
Segment all five by host, strategy, symbol-liquidity bucket, and session phase.
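The first three directly computable metrics can be sketched from the per-bucket counter deltas and child-order gap percentiles. Epsilon guards the zero-fault case:

```python
# Desk-metric sketches. Inputs are per-bucket deltas of the /proc/vmstat
# counters and child-order dispatch-gap percentiles; example values below
# are hypothetical.
def rhf(d_hint_faults, d_hint_faults_local, eps=1e-9):
    """Remote Hint Fault share: 1 - local/total hint faults."""
    return 1.0 - d_hint_faults_local / (d_hint_faults + eps)

def mcr(d_pages_migrated, dt_s):
    """Migration Churn Rate: pages migrated per second."""
    return d_pages_migrated / dt_s

def dcj(p99_child_gap_us, p50_child_gap_us):
    """Dispatch Cadence Jitter: tail-to-median dispatch-gap ratio."""
    return p99_child_gap_us / p50_child_gap_us

print(round(rhf(600, 200), 3))  # two thirds of hint faults were remote
print(mcr(400, 5.0))            # pages migrated per second
print(dcj(180.0, 20.0))         # heavy-tailed dispatch cadence
```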
Regime controller
State A: NUMA_STABLE
- low RHF, low MCR, tight DCJ
- normal tactics
State B: NUMA_WATCH
- RHF rising and hint-fault velocity inflecting
- soften cancel/replace aggressiveness
- cap child burst size
State C: NUMA_SHOCK
- sustained high MCR + degraded DCJ + queue decay signals
- switch to smoother pacing template
- avoid fragile queue-chasing logic
State D: SAFE_NUMA_CONSERVATIVE
- repeated shock episodes with deadline risk
- prioritize completion reliability over micro-priority games
- stricter participation and retry ceilings
Use hysteresis and minimum dwell to prevent flip-flop.
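The four-state controller with hysteresis and minimum dwell can be sketched as follows. All thresholds are placeholders to be calibrated per host and strategy, and the escalation rule to the conservative state is an assumption for illustration:

```python
# Sketch of the regime controller above. Minimum dwell suppresses
# flip-flop; repeated shock episodes escalate to the conservative state.
# Thresholds (mcr > 500 pages/s, dcj > 8, rhf > 0.3, 3 episodes) are
# illustrative placeholders, not calibrated values.
from dataclasses import dataclass

@dataclass
class RegimeController:
    state: str = "NUMA_STABLE"
    dwell: int = 0            # buckets spent since last transition
    shock_episodes: int = 0
    min_dwell: int = 5        # hysteresis: no transition before this

    def step(self, rhf, mcr, dcj):
        self.dwell += 1
        if self.dwell < self.min_dwell:
            return self.state                  # minimum dwell in force
        shock = mcr > 500.0 and dcj > 8.0      # sustained churn + jitter
        watch = rhf > 0.3 or mcr > 100.0
        if shock:
            self.shock_episodes += 1
            nxt = ("SAFE_NUMA_CONSERVATIVE"
                   if self.shock_episodes >= 3 else "NUMA_SHOCK")
        elif watch:
            nxt = "NUMA_WATCH"
        else:
            nxt = "NUMA_STABLE"
        if nxt != self.state:
            self.state, self.dwell = nxt, 0    # reset dwell on transition
        return self.state
```

With min_dwell = 5, even a hard shock signal is ignored for the first four buckets after a transition, which is exactly the flip-flop suppression the playbook calls for.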
Mitigation ladder
Placement first, balancing second
- Pin critical execution threads and memory policy where feasible.
- If workload is already statically NUMA-tuned, keep auto-balancing off for that path.
If balancing must stay enabled, slow the scanner for critical hosts
- Increase scan delay / reduce scan aggressiveness within tested guardrails.
- Validate no throughput cliff for non-latency-critical services.
Isolate migration-heavy components
- Separate research/backfill batch jobs from live execution NUMA domains.
Model-aware execution adaptation
- Feed RHF/MCR/DCJ into slippage overlay model and tactic gates in real time.
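Before applying any of the ladder, it helps to snapshot what the host currently exposes. A hedged sketch of a pre-trade knob check, assuming the classic /proc/sys locations (recent kernels moved the scan knobs out of /proc/sys, so every read is treated as optional):

```python
# Pre-trade host check: read the balancing toggle and, where present, the
# scan-rate knobs. Knob locations vary by kernel version, so a missing
# file simply yields None rather than an error.
import os

def read_numa_knobs(base="/proc/sys/kernel"):
    knobs = ("numa_balancing",
             "numa_balancing_scan_delay_ms",
             "numa_balancing_scan_period_min_ms",
             "numa_balancing_scan_period_max_ms",
             "numa_balancing_scan_size_mb")
    out = {}
    for k in knobs:
        try:
            with open(os.path.join(base, k)) as f:
                out[k] = int(f.read().split()[0])
        except (OSError, ValueError, IndexError):
            out[k] = None  # knob absent or unreadable on this kernel
    return out

print(read_numa_knobs())
```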
Validation protocol
A/B host policy test
- Compare baseline vs tuned NUMA settings using same symbols and session buckets.
Counterfactual uplift estimation
- Matched windows: same spread/volatility/participation, different RHF+MCR regimes.
Tail KPI acceptance gates
- Promote only if q95/q99 IS and deadline-miss rate improve without adverse completion drift.
Rollback criteria
- If completion deficit or market-impact term worsens beyond threshold, revert immediately.
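The counterfactual uplift step (the NUS metric) can be sketched as a matched-window difference: pair elevated-RHF/MCR windows with calm windows in the same spread/volatility/participation bucket, then average the IS gap. Window fields and thresholds here are illustrative:

```python
# Matched-window NUS estimate. Each window dict carries the regime
# signals (rhf, mcr), matching buckets, and realized IS in bps.
# Thresholds (rhf > 0.3, mcr > 100) are illustrative placeholders.
def match_key(w):
    return (w["spread_bkt"], w["vol_bkt"], w["part_bkt"])

def numa_uplift(windows, rhf_hi=0.3, mcr_hi=100.0):
    """Mean IS of elevated windows minus matched calm-window baseline."""
    hot = [w for w in windows if w["rhf"] > rhf_hi or w["mcr"] > mcr_hi]
    calm = [w for w in windows if w["rhf"] <= rhf_hi and w["mcr"] <= mcr_hi]
    calm_by_key = {}
    for w in calm:
        calm_by_key.setdefault(match_key(w), []).append(w["is_bps"])
    diffs = []
    for w in hot:
        base = calm_by_key.get(match_key(w))
        if base:  # only hot windows with a matched calm baseline count
            diffs.append(w["is_bps"] - sum(base) / len(base))
    return sum(diffs) / len(diffs) if diffs else None
```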
Anti-patterns
- Blaming all tail slippage on market regime without host-level attribution
- Watching only CPU% while ignoring hint-fault and migration counters
- Turning off auto-balancing globally without checking static-placement hygiene
- Tuning scan knobs without matching execution outcomes (queue age, markout, IS tails)
Bottom line
Automatic NUMA balancing is neither “always good” nor “always bad.”
For latency-sensitive execution, its hint-fault/migration dynamics can create hidden queue-priority tax. Treat NUMA activity as a modeled slippage factor, not background noise.
References
- Linux kernel sysctl docs (numa_balancing, memory-tiering mode): https://www.kernel.org/doc/html/latest/admin-guide/sysctl/kernel.html
- Linux kernel v5.9 sysctl docs (detailed scan knobs): https://www.kernel.org/doc/html/v5.9/admin-guide/sysctl/kernel.html
- Red Hat RHEL 7 virtualization tuning guide (automatic NUMA balancing behavior): https://docs.redhat.com/en/documentation/red_hat_enterprise_linux/7/html/virtualization_tuning_and_optimization_guide/sect-virtualization_tuning_optimization_guide-numa-auto_numa_balancing
- openSUSE tuning guide (NUMA balancing steps, vmstat counters, overhead notes): https://doc.opensuse.org/documentation/leap/tuning/html/book-tuning/cha-tuning-numactl.html
- proc_vmstat(5) field reference: https://www.man7.org/linux/man-pages/man5/proc_vmstat.5.html