Transparent Hugepage Compaction Shock Slippage Playbook
Why this exists
Execution stacks can look perfectly healthy on CPU utilization, NIC counters, and app-level p50 latency while still leaking tail slippage.
One overlooked source: Linux THP compaction behavior (direct reclaim/compaction on allocation paths, plus background khugepaged collapse pressure) creating short scheduler stalls and bursty dispatch cadence.
Those micro-stalls can desynchronize child-order timing and queue priority, especially in thin/fast books.
Core failure mode
When memory is fragmented and THP policy is too aggressive for low-latency workloads:
- allocation path hits compaction/reclaim work,
- event loop pauses briefly (or repeatedly),
- child orders bunch after the pause,
- queue-priority age is reset on clustered replaces,
- toxicity rises as fills happen late in micro-regime transitions.
Result: tail implementation shortfall rises without obvious infra alarms.
Slippage decomposition with memory-compaction terms
For parent order (i):
[ IS_i = C_{delay} + C_{impact} + C_{miss} + C_{compaction-shock} ]
Where:
[ C_{compaction-shock} = C_{stall} + C_{burst-bunch} + C_{queue-reset} ]
- Stall: execution-loop pause while memory path is contended
- Burst-bunch: multiple child actions released together after pause
- Queue-reset: replace/cancel clustering that destroys queue age
Feature set (production-ready)
1) Kernel/MM pressure features
- `thp_enabled_mode` (`always|madvise|never`)
- `thp_defrag_mode` (`always|defer|defer+madvise|madvise|never`)
- `/proc/vmstat`: `compact_stall`, `compact_fail`, `compact_success`
- `/proc/vmstat`: `thp_fault_alloc`, `thp_fault_fallback`, `thp_collapse_alloc`
- `pgscan_kswapd*`, `pgscan_direct*` deltas
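The counters above can be sampled directly from `/proc/vmstat`. A minimal Python sketch, assuming you snapshot on a timer and feed per-interval deltas into your feature pipeline (the tracked key set and helper names are illustrative; the counter names themselves are real vmstat fields):

```python
# Parse compaction/THP counters from /proc/vmstat text and diff two snapshots.
TRACKED = {
    "compact_stall", "compact_fail", "compact_success",
    "thp_fault_alloc", "thp_fault_fallback", "thp_collapse_alloc",
}

def parse_vmstat(text: str) -> dict:
    """Extract tracked counters (plus pgscan_* reclaim counters) from vmstat text."""
    out = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue
        key, value = parts
        if key in TRACKED or key.startswith("pgscan_"):
            out[key] = int(value)
    return out

def vmstat_delta(prev: dict, curr: dict) -> dict:
    """Per-interval increments; these deltas are the MM-pressure features."""
    return {k: v - prev.get(k, 0) for k, v in curr.items()}
```

In production you would read `/proc/vmstat` periodically, diff consecutive snapshots, and emit the deltas as host-level time series.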
2) Runtime stall features
- event-loop gap p95/p99 (µs)
- scheduler delay p95/p99 (runnable -> running)
- allocator latency tails for critical paths
- GC/pause tags (to avoid confounding with language runtime pauses)
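Event-loop gap tails can be captured with a tick recorded once per loop iteration; compaction or reclaim stalls then surface as outliers in the gap distribution. A sketch (the class name and quantile method are illustrative, not an existing API):

```python
import time

class LoopGapMonitor:
    """Record inter-iteration gaps of an event loop, in microseconds,
    so stall windows (compaction, reclaim, GC) show up in the gap tail."""

    def __init__(self):
        self._last_us = None
        self.gaps_us = []

    def tick(self, now_us=None):
        """Call once per loop iteration; now_us override is for testing."""
        now = time.monotonic_ns() / 1_000 if now_us is None else now_us
        if self._last_us is not None:
            self.gaps_us.append(now - self._last_us)
        self._last_us = now

    def quantile(self, q):
        """Empirical quantile of observed gaps (0 if no samples yet)."""
        if not self.gaps_us:
            return 0.0
        s = sorted(self.gaps_us)
        return s[min(len(s) - 1, int(q * len(s)))]
```

Export `quantile(0.95)` and `quantile(0.99)` per interval; pair them with GC/pause tags so language-runtime pauses are not misattributed to MM pressure.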
3) Dispatch microstructure features
- child inter-send gap distribution
- burstiness index (Fano/overdispersion of child sends)
- replace cluster size per 100ms window
- ACK dispersion before/after detected stall windows
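The burstiness index above is a plain Fano factor over child-send counts per fixed window; a minimal sketch:

```python
def fano_index(counts):
    """Fano factor (variance/mean) of child-send counts per fixed window.
    ~1 for Poisson-like dispatch; >> 1 signals post-stall bunching."""
    n = len(counts)
    if n == 0:
        return 0.0
    mean = sum(counts) / n
    if mean == 0:
        return 0.0
    var = sum((c - mean) ** 2 for c in counts) / n
    return var / mean
```

Evenly paced dispatch scores near zero here; the same total volume released in one post-stall burst scores far above one.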
4) Outcome features
- passive fill ratio by latency bucket
- markout ladders (10ms, 100ms, 1s, 5s)
- completion deficit vs deadline
- branch labels:
smooth,stall_then_catchup,stall_then_panic_cross
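The markout ladder can be computed per fill from mid prices sampled at each horizon; a sketch under the sign convention that positive markout means the market moved in the fill's favor (function names are illustrative):

```python
def markout_bps(fill_px, side, mid_after):
    """Signed markout in bps at one horizon: positive = favorable,
    negative = adverse (the late/toxic-fill signature)."""
    sign = 1.0 if side == "buy" else -1.0
    return sign * (mid_after - fill_px) / fill_px * 1e4

def markout_ladder(fill_px, side, mids_by_horizon):
    """mids_by_horizon: {horizon_ms: mid_price}, following the
    10ms/100ms/1s/5s ladder from the feature list."""
    return {h: markout_bps(fill_px, side, m)
            for h, m in sorted(mids_by_horizon.items())}
```

Bucketing these markouts by the `smooth` / `stall_then_catchup` / `stall_then_panic_cross` branch labels is what makes the stall-to-toxicity link visible.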
Model architecture
Use a baseline + overlay structure.
- Baseline slippage model
- existing desk model (impact/fill/toxicity/deadline)
- Compaction-shock overlay
- predicts incremental uplift during MM stress windows: `delta_is_mean`, `delta_is_q95`
Final estimate:
[ \hat{IS}_{final} = \hat{IS}_{baseline} + \Delta\hat{IS}_{mm-shock} ]
Train overlay with episode labeling around compaction-stall bursts and matched non-burst controls (same symbol/session regime) to separate MM effects from market volatility.
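The matched-control step can be sketched as follows, assuming windows are dicts and the caller supplies accessors for the stall flag and the symbol/session regime key (this schema is illustrative, not a fixed API):

```python
import random

def matched_controls(windows, is_stall, regime_key, seed=0):
    """Pair each stall-burst window with a non-stall control window drawn
    from the same regime bucket, so overlay training compares like with like."""
    rng = random.Random(seed)
    controls_by_regime = {}
    for w in windows:
        if not is_stall(w):
            controls_by_regime.setdefault(regime_key(w), []).append(w)
    pairs = []
    for w in windows:
        if is_stall(w):
            pool = controls_by_regime.get(regime_key(w))
            if pool:  # stall windows with no same-regime control are dropped
                pairs.append((w, rng.choice(pool)))
    return pairs
```

The overlay target is then the realized-IS difference within each (stall, control) pair, which nets out market-volatility effects shared by the regime bucket.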
Regime controller
State A: MM_CLEAN
- low compaction activity, stable dispatch gaps
- normal tactic mix
State B: MM_WATCH
- rising compaction stalls or THP fallback pressure
- reduce replace churn, tighten child-size cap
State C: MM_SHOCK
- confirmed stall+bunch signature
- avoid fragile passive tactics, switch to smoother pacing template
State D: SAFE_MM_INTEGRITY
- repeated bursts with completion risk
- conservative fallback policy + symbol-level kill-switch availability
Use hysteresis + minimum dwell times to avoid control flapping.
Desk metrics
- CSI (Compaction Stall Intensity): weighted `compact_stall` delta + stall-latency tail score
- TFR (THP Fallback Ratio): `thp_fault_fallback / (thp_fault_alloc + thp_fault_fallback)`
- DBI (Dispatch Burst Index): overdispersion of child-send counts per fixed window
- QRL (Queue Reset Load): replace/cancel clusters per minute
- MSU (Memory Shock Uplift): realized IS minus baseline IS during MM-shock windows
Track by host + strategy + symbol-liquidity bucket.
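Two of these metrics reduce to one-liners over interval deltas; a sketch (function names are illustrative):

```python
def thp_fallback_ratio(thp_fault_alloc, thp_fault_fallback):
    """TFR over an interval: share of THP page faults that fell back to
    base pages because a contiguous hugepage could not be allocated."""
    total = thp_fault_alloc + thp_fault_fallback
    return thp_fault_fallback / total if total else 0.0

def memory_shock_uplift(realized_is_bps, baseline_is_bps):
    """MSU: realized minus baseline-model IS, computed only inside
    windows the regime controller tagged as MM shock."""
    return realized_is_bps - baseline_is_bps
```

A rising TFR is an early fragmentation signal: allocation demand for hugepages is outrunning what compaction can supply, which is exactly when direct-compaction stalls start appearing.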
Practical mitigation ladder
- Observe first
- instrument vmstat + dispatch gaps + slippage branch labels
- Policy hardening
- prefer `madvise` over global `always` for THP in latency-critical services
- avoid aggressive defrag in critical execution paths
- Memory layout discipline
- isolate latency-critical processes from heavy memory churn workloads
- pin/control background tasks that trigger compaction pressure
- Execution fallback rails
- when `MM_SHOCK`, cap child-size jumps and replace frequency
- throttle panic catch-up to prevent queue-reset cascades
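The policy-hardening step can be automated as a host check. The kernel exposes the THP policy in `/sys/kernel/mm/transparent_hugepage/{enabled,defrag}` with the active mode in brackets; a sketch that parses that format and flags latency-risky settings (the helper names and warning wording are illustrative):

```python
def active_thp_mode(sysfs_text):
    """Parse the active mode from a THP sysfs file, where the kernel
    marks the current choice in brackets, e.g. 'always [madvise] never'."""
    return sysfs_text.split("[", 1)[1].split("]", 1)[0]

def check_thp_policy(enabled_text, defrag_text):
    """Flag policies risky for latency-critical hosts: global 'always'
    enablement, or defrag modes that compact synchronously on the fault
    path instead of deferring to kcompactd."""
    warnings = []
    if active_thp_mode(enabled_text) == "always":
        warnings.append(
            "THP enabled=always; prefer madvise for latency-critical services")
    defrag = active_thp_mode(defrag_text)
    if defrag in ("always", "madvise"):
        warnings.append(
            "defrag=%s compacts on the allocation path; prefer defer+madvise"
            % defrag)
    return warnings
```

Running this against the two sysfs files on every execution host turns the "prefer `madvise`, avoid aggressive defrag" guidance into a checkable invariant.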
Failure drills (must run)
- Synthetic memory-fragmentation drill
- verify `MM_WATCH` triggers before tail slippage explosion
- Compaction-burst replay drill
- confirm overlay uplift captures q95 deterioration
- Fallback-policy drill
- ensure the `SAFE_MM_INTEGRITY` action path is one-command reversible
- Confounder drill
- separate MM shock from GC/NIC spikes in incident tagging
Anti-patterns
- Treating THP as globally “on/off good/bad” with no workload segmentation
- Monitoring only CPU utilization while ignoring runnable delay/event-gap tails
- Letting catch-up logic fire unbounded after a short stall
- Explaining all tail slippage as “market noise” when host-level burst signatures are visible
Bottom line
THP can improve throughput/TLB behavior, but in execution systems the wrong compaction profile can quietly inject latency bursts that your router converts into queue-priority loss and tail-cost blowups.
Model memory-compaction shock as a first-class slippage component, and wire explicit regime controls before infra jitter becomes basis-point leakage.