XPS TX-Queue Polarization & Wire-Cadence Slippage Playbook
Why this exists
Execution stacks can look healthy on classic dashboards (median decision latency, low drop rate, acceptable CPU headroom) while still bleeding p95/p99 implementation shortfall.
A frequent blind spot is TX-path queue polarization:
- CPU→TX queue mappings (XPS) drift away from thread/IRQ reality,
- a subset of TX rings becomes chronically hot,
- qdisc -> driver -> NIC dequeue cadence turns bursty,
- cancel/replace/order-send timing loses microstructure phase alignment,
- queue-priority outcomes decay in tails.
This is an infra-originated slippage tax that is often mislabeled as "alpha decay" or "random venue noise."
Core failure mode
- XPS maps too many active senders onto a small set of TX queues.
- Hot TX queues absorb bursty enqueue pressure; cold queues idle.
- Driver/NIC completion locality diverges from application locality, increasing lock/cache contention.
- Wire-time spacing becomes uneven (packet bunching + short droughts).
- Order/cancel cadence phase-shifts versus true order-book replenishment cadence.
- Passive queue-capture probability falls; corrective aggression rises.
Result: tail slippage inflation with deceptively stable medians.
Slippage decomposition with TX-polarization term
For parent order (i):
[ IS_i = C_{impact} + C_{timing} + C_{routing} + C_{tx-pol} ]
Where:
[ C_{tx-pol} = C_{wire-jitter} + C_{completion-drift} + C_{queue-miss} ]
- (C_{wire-jitter}): submit→wire timing variance from TX ring hot spots
- (C_{completion-drift}): ACK/completion timing distortion from poor locality/lock contention
- (C_{queue-miss}): adverse queue-priority outcomes after cadence mismatch
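The decomposition above can be sketched as a residual calculation. The function name and the bps inputs below are illustrative; in practice the impact/timing/routing terms would come from an existing cost model, and C_tx-pol is whatever shortfall they leave unexplained:

```python
# Hypothetical residual form of the decomposition:
#   C_tx-pol = IS - (C_impact + C_timing + C_routing)
# All names are illustrative, not from a production system.

def tx_pol_residual(is_total_bps: float,
                    c_impact_bps: float,
                    c_timing_bps: float,
                    c_routing_bps: float) -> float:
    """TX-polarization cost as the residual shortfall, in basis points."""
    return is_total_bps - (c_impact_bps + c_timing_bps + c_routing_bps)

# Example: 9.0 bps measured shortfall, 5.0 modeled impact, 1.5 timing,
# 0.5 routing leaves 2.0 bps attributable to the TX-polarization term.
residual = tx_pol_residual(9.0, 5.0, 1.5, 0.5)
```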
Production feature set
1) TX queue / kernel features
- XPS maps per queue: `/sys/class/net/<dev>/queues/tx-<n>/xps_cpus` and `/sys/class/net/<dev>/queues/tx-<n>/xps_rxqs`
- per-queue TX packet/byte counters (`ethtool -S <dev>`)
- per-CPU NET_TX softirq load (`/proc/softirqs`)
- TX completion IRQ distribution (`/proc/interrupts`)
- qdisc backlog/dequeue/requeue stats (`tc -s qdisc show dev <dev>`)
- BQL pressure hints (`/sys/class/net/<dev>/queues/tx-<n>/byte_queue_limits/*`, where supported)
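A minimal sketch of extracting per-queue TX packet counters from `ethtool -S` text. Counter naming is driver-specific; the `tx_queue_<n>_packets` pattern below matches some Intel drivers and is an assumption, not a universal format — adjust the regex to your driver's output:

```python
import re

# Driver-specific assumption: counters named like "tx_queue_<n>_packets".
TX_QUEUE_RE = re.compile(r"tx_queue_(\d+)_packets:\s*(\d+)")

def per_queue_tx_packets(ethtool_output: str) -> dict[int, int]:
    """Return {queue_index: packet_count} parsed from `ethtool -S <dev>` text."""
    return {int(q): int(n) for q, n in TX_QUEUE_RE.findall(ethtool_output)}

# Illustrative output: queues 0 and 2 hot, queue 1 nearly idle.
sample = """
     tx_queue_0_packets: 120000
     tx_queue_1_packets: 450
     tx_queue_2_packets: 119500
"""
counts = per_queue_tx_packets(sample)
```

Feeding successive snapshots through this parser and differencing gives the per-queue rates that the concentration metrics below consume.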
2) execution-timing features
- decision→send syscall latency quantiles
- send syscall→NIC timestamp/wire proxy quantiles
- cancel/replace submit spacing CV and burst scores
- completion/ACK delay conditioned on TX-queue-hotness regime
3) outcome features
- passive fill ratio in `BALANCED` vs `POLARIZED` windows
- short-horizon markout ladder (10ms/100ms/1s/5s)
- incremental IS by urgency bucket under equal market state
Practical metrics (new)
TQCI (TX Queue Concentration Index): [ TQCI = \frac{\max_q \lambda^{tx}_q}{\frac{1}{Q}\sum_q \lambda^{tx}_q} - 1 ] where (\lambda^{tx}_q) is the per-queue TX packet rate and (Q) is the number of active TX queues.
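TQCI follows directly from per-queue rates; a balanced host scores 0.0 and the score grows as one queue dominates:

```python
def tqci(per_queue_rates: list[float]) -> float:
    """TQCI = max_q rate_q / mean_q rate_q - 1; 0.0 means perfectly balanced."""
    mean = sum(per_queue_rates) / len(per_queue_rates)
    return max(per_queue_rates) / mean - 1.0
```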
XMD (XPS Map Drift): distance between the configured CPU→queue map and the observed sender/IRQ affinity reality.
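The text leaves the distance metric for XMD open; one concrete choice (an assumption, not prescribed above) is the mean Jaccard distance between each queue's configured CPU set and the CPU set actually observed sending on it:

```python
# XMD as mean per-queue Jaccard distance between configured and observed CPU
# sets. The Jaccard choice is illustrative; any set/distribution distance works.

def xmd(configured: dict[int, set[int]], observed: dict[int, set[int]]) -> float:
    """0.0 = observed affinity matches the XPS map; 1.0 = fully divergent."""
    dists = []
    for q, conf in configured.items():
        obs = observed.get(q, set())
        union = conf | obs
        dists.append((1.0 - len(conf & obs) / len(union)) if union else 0.0)
    return sum(dists) / len(dists)
```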
WCV95 (Wire Cadence Variability p95): p95 coefficient of variation of inter-send/inter-wire spacing over rolling windows.
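A sketch of WCV95 over inter-send gaps. Non-overlapping windows and the window size are simplifying assumptions; a production version would use rolling windows aligned to wire timestamps:

```python
import statistics

def wcv95(gaps: list[float], window: int = 100) -> float:
    """p95 across windows of the CV (pstdev / mean) of inter-send gaps."""
    cvs = []
    # Non-overlapping windows for simplicity; rolling windows work the same way.
    for i in range(0, len(gaps) - window + 1, window):
        w = gaps[i:i + window]
        cvs.append(statistics.pstdev(w) / statistics.fmean(w))
    cvs.sort()
    return cvs[min(len(cvs) - 1, int(0.95 * len(cvs)))]
```

Perfectly even pacing yields 0.0; a single bursty window drags the p95 up even when most windows look clean, which is exactly the tail behavior the metric is meant to surface.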
CDI (Completion Drift Index): divergence between enqueue-time and completion-time locality/timing distributions.
RPU-TX (Realized Polarization Uplift, TX): matched-window tail-IS uplift attributable to TX polarization.
Track by host, NIC/driver, kernel version, XPS profile, qdisc profile, and strategy cohort.
Identification strategy (causal, not just correlation)
Use a matched-window design:
- Match on spread, volatility, participation, urgency, and session segment.
- Compare high-TQCI/XMD windows vs low-TQCI/XMD windows within same host class.
- Add host and strategy fixed effects with interactions (`TQCI × urgency`, `XMD × volatility`).
- Run controlled canaries by rebalancing XPS maps (CPU- and/or RXQ-based) while holding strategy logic constant.
If tail IS improves after map rebalance and cadence metrics normalize, the uplift is infra-causal.
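The matched-window comparison can be sketched as a paired difference. The `Window` fields, thresholds, and the nearest-neighbour matching below are illustrative stand-ins for a proper matching design (propensity or caliper matching on the full covariate set):

```python
from dataclasses import dataclass

@dataclass
class Window:
    tqci: float
    spread_bps: float     # market-state covariates to match on
    vol_bps: float
    tail_is_bps: float    # p95 implementation shortfall in the window

def matched_uplift(high: list[Window], low: list[Window]) -> float:
    """Mean tail-IS uplift of high-TQCI windows over their nearest low-TQCI match."""
    diffs = []
    for h in high:
        # Nearest neighbour on (spread, vol): a crude stand-in for real matching.
        m = min(low, key=lambda w: (w.spread_bps - h.spread_bps) ** 2
                                   + (w.vol_bps - h.vol_bps) ** 2)
        diffs.append(h.tail_is_bps - m.tail_is_bps)
    return sum(diffs) / len(diffs)
```

A persistently positive `matched_uplift` that collapses after an XPS rebalance canary is the infra-causal signature the playbook is looking for.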
Regime controller
State A: TX_BALANCED
- low TQCI/XMD/WCV95
- normal pacing and placement policy
State B: TX_DRIFT
- moderate concentration + rising cadence variance
- reduce cancel churn, modestly smooth child cadence
State C: TX_POLARIZED
- sustained hot rings + completion drift
- tighter burst caps, shorter passive horizon, stronger queue-risk penalties
State D: TX_CONTAIN
- persistent polarization + deadline stress
- reroute urgent flow to cleaner hosts/queues; prioritize certainty over queue capture
Use hysteresis + minimum dwell times to avoid policy flapping.
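The four-state controller with hysteresis and minimum dwell can be sketched as below. The enter/exit thresholds and dwell counts are placeholders to be calibrated per host class, and only TQCI drives transitions here for brevity (a real controller would also gate on XMD/WCV95/CDI):

```python
STATES = ["TX_BALANCED", "TX_DRIFT", "TX_POLARIZED", "TX_CONTAIN"]

class RegimeController:
    """Single-step state machine: enter thresholds above exit thresholds
    (hysteresis), and no transition before min_dwell updates in a state."""

    def __init__(self, enter=(0.5, 1.5, 3.0), exit=(0.3, 1.0, 2.0), min_dwell=5):
        self.enter, self.exit, self.min_dwell = enter, exit, min_dwell
        self.state = 0   # index into STATES
        self.dwell = 0   # updates spent in current state

    def update(self, tqci: float) -> str:
        self.dwell += 1
        if self.dwell >= self.min_dwell:
            if self.state < 3 and tqci >= self.enter[self.state]:
                self.state, self.dwell = self.state + 1, 0   # escalate
            elif self.state > 0 and tqci < self.exit[self.state - 1]:
                self.state, self.dwell = self.state - 1, 0   # de-escalate
        return STATES[self.state]
```

Because exit thresholds sit below enter thresholds and dwell resets on every transition, a TQCI value oscillating around a boundary cannot flap the policy.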
Mitigation ladder (ops + model)
- Audit and flatten XPS mapping
- rebalance `xps_cpus`/`xps_rxqs` so active sender sets are not over-collapsed.
- Align completion locality
- verify TX completion IRQ affinity and app thread pinning coherence.
- Control qdisc pacing behavior
- validate `fq` (or chosen qdisc) settings (`quantum`, `maxrate`, pacing mode) against the burst envelope.
- Validate BQL behavior under burst load
- detect queue overfill/underfill oscillation and retune driver/stack knobs where possible.
- Elevate TX-polarization features into live policy
- downshift tactical aggressiveness when TQCI/XMD/WCV95 breach guardrails.
- Recalibrate after kernel/driver/qdisc changes
- infra upgrades invalidate old coefficients and thresholds.
Failure drills (must run)
- Synthetic TX-map skew drill
- intentionally collapse many active CPUs into few TX queues in staging.
- Completion-affinity drift drill
- perturb IRQ affinity and validate CDI detection + containment response.
- Burst replay drill
- replay high-burst sessions and verify regime transitions suppress tail IS.
- Rollback drill
- prove deterministic return to baseline XPS/qdisc profile and stable cadence.
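For the synthetic TX-map skew drill, the `xps_cpus` sysfs attribute takes a comma-separated hex CPU bitmask. A small helper for rendering that mask (writing it requires root and a real device, so only the formatting is shown; the 8-hex-digit-per-word grouping matches the kernel's bitmap format):

```python
def cpus_to_xps_mask(cpus: set[int]) -> str:
    """Render a CPU set as comma-separated 32-bit hex words, most significant
    word first, as accepted by /sys/class/net/<dev>/queues/tx-<n>/xps_cpus."""
    mask = 0
    for c in cpus:
        mask |= 1 << c
    words = []
    while True:
        words.append(f"{mask & 0xFFFFFFFF:08x}")
        mask >>= 32
        if not mask:
            break
    return ",".join(reversed(words))

# Drill usage (requires root; path is the standard sysfs XPS attribute):
#   echo "0000000f" > /sys/class/net/eth0/queues/tx-0/xps_cpus
# collapses CPUs 0-3 onto tx-0 to provoke polarization in staging.
```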
Anti-patterns
- Treating XPS as one-time bring-up config
- Monitoring only aggregate TX throughput, ignoring per-queue concentration
- Using average send latency while ignoring inter-send cadence tails
- Tuning strategy logic without TX queue observability
Bottom line
RX-path balance is only half the story.
If TX queue polarization is left unmodeled, you get hidden cadence distortion that quietly taxes queue priority and markouts.
Make TX concentration and cadence first-class slippage features (TQCI/XMD/WCV95/CDI), and you convert a "mysterious tail" into an observable, controllable execution risk budget.
References
- Linux networking scaling guide (RPS/RFS/XPS, sysfs maps): https://docs.kernel.org/networking/scaling.html
- SMP IRQ affinity documentation: https://docs.kernel.org/core-api/irq/irq-affinity.html
- `ethtool` manual (per-queue stats and queue config visibility): https://man7.org/linux/man-pages/man8/ethtool.8.html
- `tc-fq` manual (pacing/qdisc behavior): https://man7.org/linux/man-pages/man8/tc-fq.8.html
- LWN: XPS background: https://lwn.net/Articles/409862/ and https://lwn.net/Articles/412062/
- BQL background and queue-latency intuition: https://www.coverfire.com/articles/queueing-in-the-linux-network-stack/