MSI-X Vector Affinity Drift as a Hidden Slippage Driver (Practical Playbook)
Date: 2026-03-21
Category: research
Audience: low-latency execution teams running Linux multi-queue NIC paths
Why this matters
Most execution stacks assume packet-ingest latency is stationary once the host is “tuned.” In practice, MSI-X interrupt vector affinity can drift after events like:
- NIC driver reload/reset,
- firmware update or link flap,
- irqbalance policy changes,
- CPU hotplug / cgroup placement changes,
- automation scripts that re-apply only partial pinning.
When vector→CPU mapping drifts away from the intended topology, market-data and order-ACK paths pick up hidden timing taxes:
- more cross-NUMA memory traffic,
- extra cache misses and softirq wakeups,
- bursty p95/p99 handler latency,
- decision-time vs market-time phase drift.
That often shows up in TCA as “random tail slippage,” even though the root cause is operational and measurable.
Failure mechanism (affinity drift -> execution timing tax)
- RX/TX queues are created with MSI-X vectors and expected CPU locality.
- Affinity drifts (vector lands on non-target core/NUMA node).
- NAPI poll/softirq execution shifts away from execution-thread locality.
- Packet handoff and cache-coherency overhead rise under burst flow.
- Decision loop sees stale/phase-shifted market state at critical moments.
- Child-order timing degrades -> queue position loss and urgency catch-up.
Result: tail-heavy slippage with little change in median latency.
Slippage decomposition with affinity term
For parent order (i):
[ IS_i = C_{impact} + C_{timing} + C_{routing} + C_{affinity} ]
Where:
[ C_{affinity} = C_{numa-cross} + C_{cache-miss} + C_{softirq-jitter} + C_{causal-drift} ]
- (C_{numa-cross}): remote memory/queue access penalty
- (C_{cache-miss}): lower locality between NIC handling and consumer threads
- (C_{softirq-jitter}): bursty service-time variation in NAPI/softirq path
- (C_{causal-drift}): decision errors from ingest/ACK timeline skew
Operational metrics (new)
1) VAM - Vector Affinity Mismatch
Share of active vectors pinned outside intended CPU mask.
[ VAM = \frac{\#(vectors\ not\ in\ target\ mask)}{\#(active\ vectors)} ]
2) NRS - NUMA Remote Share
Fraction of packet-processing events executed on non-local NUMA node versus NIC-local target policy.
3) HJ95 - Handler Jitter p95
p95 delta between packet hardware/software timestamp and first userspace-consumable event timestamp.
4) SAD - Softirq Asymmetry Delta
Imbalance score across per-CPU softirq load in the target CPU set.
5) CDT - Causality Drift Tax
Incremental markout/IS during high-VAM regimes versus matched low-VAM windows.
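As a concrete sketch, VAM can be computed by checking each vector's live `/proc/irq/<irq>/smp_affinity_list` value against a versioned target mask. The helper names (`parse_cpu_list`, `vam`) are hypothetical, and the example runs on in-memory strings rather than reading `/proc` directly:

```python
def parse_cpu_list(s):
    """Parse a kernel CPU list string like '0,2-4' into a set of CPU ids."""
    cpus = set()
    for part in s.split(","):
        part = part.strip()
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        elif part:
            cpus.add(int(part))
    return cpus

def vam(actual_affinity, target_mask):
    """Vector Affinity Mismatch: share of active vectors pinned outside
    the intended CPU mask.

    actual_affinity: dict irq -> smp_affinity_list string (as read from
                     /proc/irq/<irq>/smp_affinity_list)
    target_mask:     set of intended CPU ids for this host class
    """
    if not actual_affinity:
        return 0.0
    mismatched = sum(
        1 for cpulist in actual_affinity.values()
        if not parse_cpu_list(cpulist) <= target_mask
    )
    return mismatched / len(actual_affinity)

# Example with in-memory data: one of three vectors drifted out of the mask.
drift_share = vam({"41": "2", "42": "3", "43": "17"}, target_mask={2, 3, 4, 5})
```

In production the same function would be fed by a collector that globs `/proc/irq/*/smp_affinity_list` for the NIC's vectors and tags each sample with host and timestamp.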
What to log in production
Host/kernel layer
- `/proc/interrupts` (per-vector IRQ distribution)
- `/proc/irq/<irq>/smp_affinity_list` (actual affinity)
- NIC queue config (`ethtool -l`, `ethtool -x` where supported)
- NUMA topology (`lscpu -e`, `numactl --hardware`)
- `/proc/net/softnet_stat` (drops, time_squeeze)
- per-CPU softirq utilization and ksoftirqd runtime
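The softnet_stat counters above are hex columns, one line per CPU. A minimal parser sketch follows; it assumes the conventional first-three-column layout (packets processed, dropped, time_squeeze), which should be verified against your kernel version since later columns have changed across releases:

```python
def parse_softnet_stat(text):
    """Parse /proc/net/softnet_stat contents into per-CPU counters.

    Each line is one CPU; columns are hex. Only the first three columns
    (processed, dropped, time_squeeze) are extracted here.
    """
    rows = []
    for line in text.strip().splitlines():
        cols = [int(c, 16) for c in line.split()]
        if len(cols) >= 3:
            rows.append({
                "processed": cols[0],      # packets handled in softirq
                "dropped": cols[1],        # backlog drops
                "time_squeeze": cols[2],   # budget/time exhausted events
            })
    return rows

# Demo on an in-memory sample (real use: open("/proc/net/softnet_stat")).
sample = "0000a1b2 00000000 00000003 00000000\n000000ff 00000001 00000000 00000000"
rows = parse_softnet_stat(sample)
```

Rising `time_squeeze` on the target CPU set is a useful early signal that NAPI polling is being cut short, which often precedes visible HJ95 inflation.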
Execution layer
- decision timestamp vs ingest timestamp
- ACK/fill latency conditional on IRQ vector cohort
- child-order send jitter around affinity-change events
- queue-entry quality proxy (fill delay, adverse markout)
- TCA by host+symbol+session with affinity regime labels
Identification strategy (causal)
- Build two regimes:
`AFFINITY_ALIGNED` (low VAM) and `AFFINITY_DRIFTED` (high VAM).
- Match windows by spread, volatility, participation, symbol liquidity, venue mix.
- Estimate incremental tail cost (CDT) with host/session fixed effects.
- Run controlled canary:
- re-apply deterministic IRQ pinning,
- freeze execution-thread CPU placement,
- disable conflicting auto-tuners during test.
- Promote only if CDT and p99 handler jitter improve without completion-rate damage.
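Once windows are matched on market conditions, the final CDT contrast reduces to a difference in means between regimes. The sketch below (with the hypothetical helper `causality_drift_tax`) shows only that last step; a real estimator would add host/session fixed effects and run the comparison on tail quantiles as well as means:

```python
from statistics import mean

def causality_drift_tax(windows):
    """CDT: incremental IS (bps) in high-VAM windows vs matched low-VAM windows.

    windows: list of dicts with keys 'regime' ('ALIGNED' or 'DRIFTED') and
    'is_bps' (implementation shortfall in bps). Matching on spread,
    volatility, participation, and venue mix is assumed done upstream.
    """
    drifted = [w["is_bps"] for w in windows if w["regime"] == "DRIFTED"]
    aligned = [w["is_bps"] for w in windows if w["regime"] == "ALIGNED"]
    if not drifted or not aligned:
        return None  # cannot form the contrast with an empty regime
    return mean(drifted) - mean(aligned)

# Example: drifted windows cost 4 bps more than matched aligned windows.
sample = [
    {"regime": "DRIFTED", "is_bps": 8.0},
    {"regime": "ALIGNED", "is_bps": 5.0},
    {"regime": "ALIGNED", "is_bps": 3.0},
]
cdt = causality_drift_tax(sample)
```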
Regime state machine
AFFINITY_ALIGNED
- low VAM, stable HJ95
- normal execution policy
AFFINITY_WARN
- intermittent mismatch, rising jitter
- tighten burst pacing and monitor softirq imbalance
AFFINITY_DRIFTED
- persistent mismatch + p95/p99 inflation
- reduce aggression and prioritize queue-safe placements
SAFE_CONTAIN
- repeated threshold breaches after attempted remap
- switch to conservative schedule until topology is restored
Use hysteresis and minimum dwell time to avoid policy flapping.
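The regime logic with hysteresis and minimum dwell can be sketched as below. Thresholds are illustrative, not recommendations; note that `SAFE_CONTAIN` is entered explicitly after failed remediation rather than from VAM readings alone:

```python
class AffinityRegime:
    """Track affinity regime from VAM samples with dwell-time hysteresis.

    A transition is committed only after min_dwell consecutive samples
    classify into the same new state, which prevents policy flapping on
    transient mismatches. warn_vam/drift_vam are illustrative defaults.
    """

    def __init__(self, warn_vam=0.05, drift_vam=0.20, min_dwell=5):
        self.state = "AFFINITY_ALIGNED"
        self.warn_vam = warn_vam
        self.drift_vam = drift_vam
        self.min_dwell = min_dwell
        self._pending = None       # candidate next state
        self._pending_count = 0    # consecutive samples supporting it

    def _classify(self, vam):
        if vam >= self.drift_vam:
            return "AFFINITY_DRIFTED"
        if vam >= self.warn_vam:
            return "AFFINITY_WARN"
        return "AFFINITY_ALIGNED"

    def update(self, vam):
        target = self._classify(vam)
        if target == self.state:
            self._pending, self._pending_count = None, 0
        elif target == self._pending:
            self._pending_count += 1
            if self._pending_count >= self.min_dwell:
                self.state = target
                self._pending, self._pending_count = None, 0
        else:
            self._pending, self._pending_count = target, 1
        return self.state

    def contain(self):
        """Explicit escalation after repeated breaches post-remap attempt."""
        self.state = "SAFE_CONTAIN"
        self._pending, self._pending_count = None, 0
```

In practice `update()` would be driven by the same periodic collector that computes VAM, and `contain()` by the reconciler when a remap fails verification.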
Control ladder
- Declare target topology explicitly
- maintain versioned vector->CPU policy by host class.
- Pin execution threads coherently
- align app critical threads with intended IRQ/NAPI CPUs.
- Apply idempotent affinity reconciler
- periodic verifier/remediator for `/proc/irq/*/smp_affinity_list` drift.
- Guard against automation conflicts
- coordinate irqbalance, orchestration agents, and boot scripts.
- Integrate host-regime into slippage model
- include VAM/NRS/HJ95 as live features for tactic gating.
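The idempotent reconciler in the ladder above can be as small as a read-compare-write loop over the versioned policy. A sketch, assuming the policy maps IRQ numbers to intended cpulist strings; note the kernel may normalize cpulist formatting, so a robust version should compare parsed CPU sets rather than raw strings:

```python
import os

def reconcile_irq_affinity(policy, dry_run=True):
    """Idempotent reconciler: re-apply intended vector->CPU pinning.

    policy: dict of irq (str) -> intended cpulist string, e.g. {"41": "2-3"}.
    Reads the live value first and writes only on mismatch, so repeated
    runs converge without churn. Writing requires root; with dry_run=True
    it only reports drift. Missing IRQs (e.g. after driver reload) are
    skipped and left for alerting.
    """
    actions = []
    for irq, want in policy.items():
        path = f"/proc/irq/{irq}/smp_affinity_list"
        if not os.path.exists(path):
            continue  # vector gone; surface via monitoring, not here
        with open(path) as f:
            have = f.read().strip()
        if have != want:  # caveat: string compare; parse to sets for rigor
            actions.append((irq, have, want))
            if not dry_run:
                with open(path, "w") as f:
                    f.write(want)
    return actions  # list of (irq, current, intended) drift findings
```

Run it from a single owner process (and disable irqbalance for the managed vectors) so the reconciler is the only writer, which is what makes the "guard against automation conflicts" step enforceable.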
Failure drills
- Driver-reset drift drill
- trigger controlled NIC reset; verify remap automation and alerting.
- CPU-set perturbation drill
- move execution workers; confirm locality alarms and rollback.
- Burst replay drill
- replay peak traffic with induced mismatch; validate CDT sensitivity.
- Containment drill
- force transition to `SAFE_CONTAIN` and confirm loss containment.
Common mistakes
- Assuming one-time IRQ tuning is permanent
- Measuring only median latency and ignoring p95/p99 handler jitter
- Pinning app threads without re-validating vector placement
- Letting multiple automation layers fight IRQ affinity
- Treating topology drift as “market noise” in TCA
Bottom line
MSI-X affinity drift is a microstructure-relevant infra risk, not just a systems hygiene issue.
If vector locality drifts, your decision clock drifts relative to the market clock. That leak shows up as tail slippage and weak queue-entry quality. Treat affinity state as a first-class model feature and attach explicit remediation and containment controls.
References
- Linux kernel docs: SMP IRQ affinity
  https://docs.kernel.org/core-api/irq/irq-affinity.html
- Linux kernel docs: Networking scaling (RSS/RPS/RFS/XPS)
  https://docs.kernel.org/networking/scaling.html
- Linux kernel docs: NAPI
  https://docs.kernel.org/networking/napi.html
- proc_interrupts(5) manual
  https://man7.org/linux/man-pages/man5/proc_interrupts.5.html
- ethtool(8) manual
  https://man7.org/linux/man-pages/man8/ethtool.8.html