IOMMU TLB Flush-Storm DMA-Remap Slippage Playbook
Why this exists
Execution stacks can pass ordinary CPU/network health checks and still leak implementation shortfall at the p95/p99 tails.
One under-modeled source is IOMMU translation pressure: NIC DMA mappings/unmappings trigger IOTLB invalidations, and under bursty traffic or allocator churn this can become a flush storm.
When that happens, packet/descriptor handling latency becomes uneven, and order-path timing starts paying an invisible tax.
Core failure mode
Under high packet turnover and frequent DMA map updates:
- IOTLB invalidations spike,
- DMA completion latency becomes bursty,
- RX/TX service cadence loses smoothness,
- market-data ingest and order-send timing dephase,
- cancel/replace loops bunch,
- queue quality decays,
- late-cycle urgency raises convex crossing cost.
Result: tail slippage rises even if average RTT/CPU still looks "normal."
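One way to see this failure mode in telemetry is to score IOTLB invalidation burstiness directly. Below is a minimal sketch: it assumes you already sample a per-millisecond invalidation-counter delta series (the `invalidations_per_ms` input is hypothetical; wire in whatever your IOMMU driver or perf counters expose) and flags how much the tail rate exceeds the median.

```python
def flush_burst_score(invalidations_per_ms, window=50, q=0.95):
    """Burstiness score for IOTLB invalidations: tail rate / median rate.

    `invalidations_per_ms` is a hypothetical per-millisecond counter-delta
    series sampled from the host's IOMMU telemetry. A score near 1.0 means
    steady invalidation pressure; a large score means flush bursts.
    """
    recent = invalidations_per_ms[-window:]
    ranked = sorted(recent)
    median = ranked[len(ranked) // 2]
    tail = ranked[int(q * (len(ranked) - 1))]
    return tail / median if median > 0 else float("inf")
```

A flat series scores 1.0; a series with occasional spikes scores roughly spike/median, which is the shape you would alert on.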
Slippage decomposition with IOMMU term
For parent order (i):
[ IS_i = C_{delay} + C_{impact} + C_{miss} + C_{iommu} ]
Where:
[ C_{iommu} = C_{dma-jitter} + C_{service-burst} + C_{queue-decay} ]
- DMA jitter cost: variable NIC DMA completion/descriptor service timing
- Service burst cost: uneven packet processing cadence (microbursting at app layer)
- Queue decay cost: stale reaction windows and reset-heavy retries
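The decomposition above is additive, so it can be carried as a plain accumulator per parent order. A minimal sketch, assuming all costs are already expressed in the same units (e.g. bps of arrival price); the component names are illustrative:

```python
def implementation_shortfall(c_delay, c_impact, c_miss, c_iommu_parts):
    """IS_i = C_delay + C_impact + C_miss + C_iommu, where
    C_iommu = C_dma_jitter + C_service_burst + C_queue_decay.

    `c_iommu_parts` maps component name -> cost, all in the same units.
    Returns (total IS, aggregated C_iommu) so the IOMMU term can be
    tracked separately from the classic decomposition.
    """
    c_iommu = sum(c_iommu_parts.values())
    return c_delay + c_impact + c_miss + c_iommu, c_iommu
```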
Feature set (production-ready)
1) Host / DMA-path features
- IOTLB flush rate and burst quantiles
- DMA map/unmap rate by queue
- NIC ring occupancy oscillation amplitude
- RX/TX NAPI poll duration variance
- per-NUMA memory locality for NIC buffers
2) Execution timing features
- market-data ingress gap variance
- decision-to-send latency quantiles (p50/p95/p99)
- cancel-to-ack and replace-to-ack drift
- child-order inter-dispatch burst index
- scheduler phase error vs intended dispatch grid
3) Outcome features
- passive fill ratio by flush-pressure bucket
- short-horizon markout ladder (10ms / 100ms / 1s / 5s)
- completion deficit under matched liquidity regime
- branch labels: map-stable, pressure-watch, flush-storm, deadline-chase
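The branch labels can be derived from the host features at bucketing time. A minimal sketch, assuming a scalar DMA Flush Intensity score and a boolean deadline-pressure flag; the thresholds are illustrative placeholders, not calibrated values:

```python
def branch_label(dfi_score, deadline_pressure, watch=2.0, storm=5.0):
    """Map flush pressure + deadline state to an outcome-feature label.

    `dfi_score` is the host's DMA Flush Intensity score; `watch` and
    `storm` are hypothetical thresholds to be fit per host pool.
    """
    if dfi_score >= storm and deadline_pressure:
        return "deadline-chase"
    if dfi_score >= storm:
        return "flush-storm"
    if dfi_score >= watch:
        return "pressure-watch"
    return "map-stable"
```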
Model architecture
Use a baseline + remap-overlay design:
- Baseline slippage model
- spread/impact/fill/deadline stack
- IOMMU pressure overlay
- predicts incremental uplift: delta_is_mean, delta_is_q95
Final estimate:
[ \hat{IS}_{final} = \hat{IS}_{baseline} + \Delta\hat{IS}_{iommu} ]
Train with matched market windows (symbol/session/volatility/liquidity bucket) across different remap-pressure states to isolate infra effects from market confounders.
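The combination step itself is trivial once both pieces exist. A minimal sketch, assuming the overlay is any fitted regressor exposing a scikit-learn-style `predict` method (the interface here is an assumption; swap in your own model stack):

```python
def final_is_estimate(baseline_is, overlay_features, overlay_model):
    """IS_final = IS_baseline + delta_IS_iommu.

    `baseline_is` comes from the spread/impact/fill/deadline stack;
    `overlay_model` is a hypothetical fitted regressor that predicts
    the incremental IOMMU-pressure uplift from DMA-path features.
    """
    delta = overlay_model.predict([overlay_features])[0]
    return baseline_is + delta
```

Keeping the overlay additive makes the infra term auditable: you can zero it out and recover the baseline estimate exactly.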
Regime controller
State A: MAP_STABLE
- low flush pressure, stable DMA cadence
- normal execution policy
State B: PRESSURE_WATCH
- flush bursts rising, timing tails widening
- reduce unnecessary replace churn, smooth pacing
State C: FLUSH_STORM
- sustained invalidation bursts + packet-service oscillation
- cap burst size, increase minimum spacing, avoid fragile queue races
State D: SAFE_DMA_CONTAIN
- repeated storm + deadline pressure
- route urgency-sensitive flow through validated low-pressure host/queue paths, conservative completion policy
Use hysteresis + minimum dwell time to prevent policy flapping.
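The hysteresis and minimum-dwell logic can be sketched as a small state machine. This is illustrative only: it covers the A/B/C ladder (SAFE_DMA_CONTAIN escalation is omitted for brevity), escalates immediately on breaching an upper threshold, and de-escalates only below a lower threshold after the dwell expires. All thresholds and dwell ticks are hypothetical placeholders:

```python
class RemapRegimeController:
    """Hysteresis + minimum-dwell controller over a DFI score stream."""

    ORDER = ["MAP_STABLE", "PRESSURE_WATCH", "FLUSH_STORM"]

    def __init__(self, up=(2.0, 5.0), down=(1.5, 4.0), min_dwell=10):
        self.up, self.down = up, down       # escalation / de-escalation thresholds
        self.min_dwell = min_dwell          # ticks before de-escalation is allowed
        self.state, self.dwell = "MAP_STABLE", 0

    def step(self, dfi):
        self.dwell += 1
        idx = self.ORDER.index(self.state)
        # Escalate immediately: storms are expensive, react fast.
        if idx < 2 and dfi >= self.up[idx]:
            self.state, self.dwell = self.ORDER[idx + 1], 0
        # De-escalate only below the *lower* threshold and after dwell,
        # which is what prevents policy flapping near the boundary.
        elif idx > 0 and dfi < self.down[idx - 1] and self.dwell >= self.min_dwell:
            self.state, self.dwell = self.ORDER[idx - 1], 0
        return self.state
```

The asymmetric up/down thresholds are the hysteresis band; the dwell counter is the flap guard.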
Desk metrics
- DFI (DMA Flush Intensity): invalidation pressure score
- DJS (DMA Jitter Spread): completion-time variability severity
- PSO (Packet Service Oscillation): ingest/send cadence instability
- QDL (Queue Decay Loss): passive quality degradation under remap pressure
- IUL (IOMMU Uplift Loss): realized IS minus baseline IS in high-pressure regimes
Track by host pool, NIC model/driver, NUMA placement, symbol-liquidity bucket, and session segment.
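IUL is the metric that ties the others back to realized cost. A minimal sketch, assuming per-window realized and baseline IS series plus a boolean high-pressure flag per window (the inputs are hypothetical; in practice the flag would come from the DFI bucketing):

```python
def iommu_uplift_loss(realized_is, baseline_is, pressure_flags):
    """IUL: mean (realized - baseline) IS over high-pressure windows only.

    Windows where `pressure_flags` is False are excluded, so the metric
    isolates the cost paid specifically under remap pressure.
    """
    gaps = [r - b
            for r, b, hot in zip(realized_is, baseline_is, pressure_flags)
            if hot]
    return sum(gaps) / len(gaps) if gaps else 0.0
```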
Mitigation ladder
- Mapping churn reduction
- prefer stable DMA mapping strategies and buffer lifecycle discipline
- NUMA and queue locality hygiene
- align NIC queues, CPU affinity, and memory locality
- Burst-containment execution policy
- bounded catch-up pacing over panic flushes
- Topology-aware routing
- route urgent flow away from hosts/queues with rising DFI/PSO
- Change-aware recalibration
- re-fit overlay after kernel/NIC-driver/IOMMU config updates
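The topology-aware routing rung can be sketched as a pressure-ranked host selection. This is illustrative, assuming each candidate pool reports its current DFI and PSO scores (the pool-record shape and the cap value are assumptions):

```python
def pick_host(pools, dfi_cap=5.0):
    """Route urgency-sensitive flow to the lowest-pressure eligible pool.

    `pools` is a list of dicts like {"name": ..., "dfi": ..., "pso": ...}.
    Pools at or above `dfi_cap` are excluded outright; among the rest,
    pick the minimum combined DFI+PSO. Returning None signals the caller
    to fall back to the conservative completion policy.
    """
    eligible = [p for p in pools if p["dfi"] < dfi_cap]
    if not eligible:
        return None
    return min(eligible, key=lambda p: p["dfi"] + p["pso"])["name"]
```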
Failure drills (must run)
- Flush-burst replay drill
- verify early transition to PRESSURE_WATCH
- Storm containment drill
- confirm bounded recovery beats panic catch-up on q95 IS
- Confounder separation drill
- distinguish remap-pressure effects from pure venue/network latency shocks
- Fallback path drill
- validate safe reroute to low-pressure host/queue pools under stress
Anti-patterns
- Treating average RTT as complete timing truth
- Ignoring DMA map/unmap churn in low-latency hosts
- Disabling IOMMU blindly without security/compliance review
- Running retry-heavy execution logic that amplifies cadence oscillation
Bottom line
IOMMU is often viewed as a security/performance toggle, but in execution systems the real issue is translation-pressure dynamics.
If IOTLB flush storms are not modeled as a slippage factor, tail cost will keep leaking through “normal-looking” infra dashboards.