RPS/RFS Steering Churn as a Hidden Slippage Driver (Practical Playbook)
Date: 2026-03-21
Category: research
Audience: low-latency execution teams running Linux multi-queue ingest/routing paths
Why this matters
Most slippage stacks model market microstructure and tactic behavior, but ignore a kernel-side timing distortion: receive steering churn from RPS/RFS.
- RPS (Receive Packet Steering) hashes packets to target CPUs for protocol processing.
- RFS (Receive Flow Steering) tries to move packet processing toward the CPU where the consuming app thread runs.
In stable conditions this can improve cache locality and throughput. In unstable conditions (thread migration, undersized flow tables, poor queue/CPU maps), steering churn introduces:
- remote backlog queueing + extra IPIs,
- per-flow CPU handoff delays,
- event-time distortion between market-data ingest and order-state/control feedback.
That distortion often shows up in transaction cost analysis (TCA) as "random tail slippage", while the root cause is deterministic control-plane behavior.
Failure mechanism (flow-steering churn -> execution timing tax)
- Packet enters RX queue and receives RSS/RPS hash.
- get_rps_cpu() selects a target CPU from rps_cpus / RFS flow tables.
- Packet is enqueued on remote CPU backlog; remote CPU is kicked by IPI.
- Scheduler moves consumer thread (or flow-table entry collides/ages), so desired CPU changes.
- Flow-steering updates wait for packets still queued on the old CPU to drain (RFS defers the switch to preserve ordering), so CPU ownership flips in bursts.
- Market-data and order-state timelines de-synchronize at the application boundary.
Result: causality jitter - trading logic reacts to slightly stale or phase-shifted state, causing poorer queue entry timing and late urgency catch-up.
Slippage decomposition with steering term
For parent order \(i\):
\[ \mathrm{IS}_i = C_{\text{impact}} + C_{\text{timing}} + C_{\text{routing}} + C_{\text{steer}} \]
Where:
\[ C_{\text{steer}} = C_{\text{ipi}} + C_{\text{backlog}} + C_{\text{churn}} + C_{\text{causal-drift}} \]
- \(C_{\text{ipi}}\): cost of remote wakeups/CPU handoff overhead
- \(C_{\text{backlog}}\): queueing delay on target backlog under burst load
- \(C_{\text{churn}}\): repeated per-flow CPU migration overhead
- \(C_{\text{causal-drift}}\): decision errors from MD/ACK/control timeline skew
Operational metrics (new)
1) SMI - Steering Migration Intensity
Per-flow rate of target-CPU changes over short windows.
\[ \mathrm{SMI} = \frac{\#(\text{cpu\_target changes})}{\text{flow-time}} \]
2) RBI95 - Remote Backlog Injection p95
p95 delay from NIC receive timestamp (or earliest software ingress stamp) to start of protocol processing on target CPU.
3) IWB - IPI Wakeup Burden
Share of RX-path processing episodes requiring remote IPI wakeups.
4) FCR - Flow-Collision Ratio
Estimated collision pressure in rps_sock_flow_entries / per-queue rps_flow_cnt tables (proxy via high SMI + table occupancy/flow volume).
5) CDT - Causality Drift Tax
Incremental short-horizon markout / tail IS during high-SMI+high-RBI regimes versus matched low-churn regimes.
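As a rough illustration of how SMI and RBI95 could be computed offline, here is a minimal Python sketch. It assumes you already have per-packet samples of (timestamp, flow id, target CPU) and per-packet (ingress timestamp, processing-start timestamp) pairs from whatever tracing you run; those input shapes and all function names are hypothetical, not something the kernel exposes directly.

```python
def steering_migration_intensity(samples):
    """Pooled SMI: target-CPU changes per unit of observed flow-time.

    samples: iterable of (timestamp_s, flow_id, target_cpu), in time order
    per flow. The tuple layout is an assumption made for this sketch.
    """
    last_cpu, first_ts, last_ts = {}, {}, {}
    changes = 0
    for ts, flow, cpu in samples:
        if flow in last_cpu and cpu != last_cpu[flow]:
            changes += 1  # this flow's target CPU flipped
        last_cpu[flow] = cpu
        first_ts.setdefault(flow, ts)
        last_ts[flow] = ts
    # flow-time = sum over flows of (last seen - first seen)
    flow_time = sum(last_ts[f] - first_ts[f] for f in first_ts)
    return changes / flow_time if flow_time > 0 else 0.0


def p95(values):
    """Nearest-rank 95th percentile (integer arithmetic for the rank)."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("no samples")
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n)
    return ordered[rank - 1]


def rbi95(stamp_pairs):
    """RBI95 from (ingress_ts, processing_start_ts) pairs, same time unit."""
    return p95(start - ingress for ingress, start in stamp_pairs)
```

Pooling changes over total flow-time is one choice among several; a per-flow SMI followed by a high percentile across flows would highlight a few badly churning flows more clearly.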
What to log in production
Kernel / network layer
- /sys/class/net/<dev>/queues/rx-*/rps_cpus
- /proc/sys/net/core/rps_sock_flow_entries
- /sys/class/net/<dev>/queues/rx-*/rps_flow_cnt
- /proc/sys/net/core/netdev_max_backlog
- /proc/sys/net/core/flow_limit_cpu_bitmap (if Flow Limit enabled)
- IRQ affinity (/proc/irq/*/smp_affinity*, /proc/interrupts)
- /proc/net/softnet_stat (drops/time-squeeze/backlog stress signals)
- per-CPU ksoftirqd utilization and scheduler migration counters
Execution layer
- decision timestamp vs packet-ingest timestamp
- child-order emit jitter relative to intended cadence
- cancel/replace burst factor in high-SMI windows
- ACK/fill ordering anomalies around steering transitions
- markout/IS conditioned on steering regime
Identification strategy (causal)
- Match windows by spread, volatility, participation, and venue mix.
- Segment into STEER_STABLE vs STEER_CHURN using SMI/RBI thresholds.
- Estimate incremental tail cost (CDT) with host + symbol + session fixed effects.
- Run controlled canaries:
  - stabilize IRQ affinity / queue maps,
  - right-size rps_sock_flow_entries and rps_flow_cnt,
  - narrow rps_cpus to NUMA-local sets,
  - reduce app-thread migration on critical handlers.
- Promote only if CDT drops without completion-rate degradation.
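The simplest version of the CDT comparison can be sketched in Python as below. This is a deliberately naive estimator: it assumes the windows have already been matched on spread, volatility, participation, and venue mix, and it takes a raw difference in p95 implementation shortfall between regimes without the host/symbol/session fixed effects the steps above call for. The record layout and function names are hypothetical.

```python
def nearest_rank_percentile(values, pct):
    """Nearest-rank percentile for integer pct in 1..100."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("no samples")
    rank = (pct * len(ordered) + 99) // 100  # ceil(pct * n / 100)
    return ordered[rank - 1]


def causality_drift_tax(records, pct=95):
    """Tail-IS gap between churn and stable regimes.

    records: iterable of (regime_label, is_bps) for pre-matched windows.
    Returns p{pct}(IS | STEER_CHURN) - p{pct}(IS | STEER_STABLE).
    """
    by_regime = {"STEER_STABLE": [], "STEER_CHURN": []}
    for regime, is_bps in records:
        if regime in by_regime:  # other regimes are ignored in this sketch
            by_regime[regime].append(is_bps)
    return (nearest_rank_percentile(by_regime["STEER_CHURN"], pct)
            - nearest_rank_percentile(by_regime["STEER_STABLE"], pct))
```

A positive value means the tail is worse under churn. For promotion decisions, a regression with fixed effects or a quantile regression would be the more defensible estimator; this sketch only shows the shape of the comparison.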
Regime state machine
STEER_STABLE
- low SMI, low RBI, low causal drift
- normal tactic policy
STEER_IMBALANCED
- rising backlog/IPI burden but limited flow flips
- rebalance queue/CPU maps, reduce burst aggressiveness
STEER_CHURN
- high SMI + backlog jitter + ordering drift
- apply anti-burst pacing and stricter urgency gates
STEER_SAFE_CONTAIN
- repeated p95 breaches under persistent churn
- shift to conservative completion-safe policy until steering normalizes
Use hysteresis + minimum dwell to avoid controller flapping.
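One way the hysteresis and minimum-dwell logic could look is sketched below in Python. All thresholds, units, and the tick-based dwell counter are hypothetical placeholders, and the mapping from signals to states is a simplification of the four regimes above (high SMI maps to STEER_CHURN, high RBI alone to STEER_IMBALANCED, repeated joint breaches to STEER_SAFE_CONTAIN).

```python
from dataclasses import dataclass

STABLE, IMBALANCED = "STEER_STABLE", "STEER_IMBALANCED"
CHURN, SAFE_CONTAIN = "STEER_CHURN", "STEER_SAFE_CONTAIN"


@dataclass
class Thresholds:
    # All values are illustrative placeholders, not recommendations.
    smi_enter: float = 5.0
    smi_exit: float = 2.0
    rbi_enter_us: float = 50.0
    rbi_exit_us: float = 25.0
    breach_limit: int = 3   # consecutive joint breaches before containment
    min_dwell: int = 5      # ticks to stay in a state before switching


class SteeringRegime:
    def __init__(self, th=None):
        self.th = th or Thresholds()
        self.state = STABLE
        self.dwell = 0
        self.breaches = 0
        self._smi_high = False
        self._rbi_high = False

    @staticmethod
    def _latch(high, value, enter, exit_):
        # Hysteresis: switch on above `enter`, switch off only below `exit_`.
        return value >= exit_ if high else value > enter

    def update(self, smi, rbi95_us):
        th = self.th
        self._smi_high = self._latch(self._smi_high, smi, th.smi_enter, th.smi_exit)
        self._rbi_high = self._latch(self._rbi_high, rbi95_us,
                                     th.rbi_enter_us, th.rbi_exit_us)
        both = self._smi_high and self._rbi_high
        self.breaches = self.breaches + 1 if both else 0

        if both and self.breaches >= th.breach_limit:
            target = SAFE_CONTAIN
        elif self._smi_high:
            target = CHURN
        elif self._rbi_high:
            target = IMBALANCED
        else:
            target = STABLE

        self.dwell += 1
        # Minimum dwell: ignore the target until the current state has aged.
        if target != self.state and self.dwell >= th.min_dwell:
            self.state, self.dwell = target, 0
        return self.state
```

In this sketch the dwell requirement applies to every transition, including escalations. A production controller might let escalations toward STEER_SAFE_CONTAIN bypass the dwell and enforce it only on de-escalation.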
Control ladder
- Fix topology first
  - Align IRQ affinity, rps_cpus, and app CPU pinning by NUMA/cache locality.
- Right-size flow tables
  - Increase rps_sock_flow_entries / rps_flow_cnt when collision pressure is visible.
- Avoid redundant steering layers
  - If RSS already gives clean 1:1 queue/CPU mapping, aggressive RPS may add churn with little upside.
- Bound scheduler migration on critical consumers
  - Thread movement can create avoidable RFS target churn.
- Promote steering-health features into live execution logic
  - Treat SMI/RBI as first-class slippage features, not just infra telemetry.
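For the topology step, a small check that an rps_cpus mask stays within a NUMA node's CPUs can be sketched as follows. It assumes the usual sysfs text formats: rps_cpus as comma-separated hex words (for example `00000000,0000000f`) and a node CPU list such as `0-3,8` read from /sys/devices/system/node/node<N>/cpulist. The function names are hypothetical.

```python
def parse_cpumask(mask_text):
    """Decode a sysfs hex CPU mask like '00000000,0000000f' into a set of CPUs."""
    value = int(mask_text.strip().replace(",", ""), 16)
    return {bit for bit in range(value.bit_length()) if (value >> bit) & 1}


def parse_cpulist(list_text):
    """Decode a sysfs CPU list like '0-3,8' into a set of CPUs."""
    cpus = set()
    for part in list_text.strip().split(","):
        if not part:
            continue
        lo, _, hi = part.partition("-")
        cpus.update(range(int(lo), int(hi or lo) + 1))
    return cpus


def non_local_rps_cpus(rps_mask_text, node_cpulist_text):
    """CPUs enabled in rps_cpus that are outside the given NUMA node."""
    return parse_cpumask(rps_mask_text) - parse_cpulist(node_cpulist_text)
```

An empty result means the mask is NUMA-local with respect to that node. Which node counts as local for a given RX queue still has to come from your own NIC and IRQ topology mapping.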
Failure drills
- Flow-fanout stress drill
  - replay high concurrent-flow load; verify SMI/FCR alarms.
- Queue-map swap drill
  - test controlled IRQ/RPS remaps with rollback triggers.
- Migration stress drill
  - intentionally perturb app pinning; validate CDT sensitivity.
- Tail-protection drill
  - force transition to STEER_SAFE_CONTAIN on repeated p95 breach.
Common mistakes
- Enabling RPS/RFS globally without per-queue NUMA-aware mapping
- Tiny flow tables under high active-flow counts (collision-churn amplifier)
- Ignoring scheduler migration effects on RFS locality assumptions
- Treating remote-backlog delay as "normal network jitter"
- Optimizing median latency while p95/p99 causal drift worsens
Bottom line
RPS/RFS are not just throughput knobs - they are slippage-relevant timing controls.
When steering churn rises, execution clocks dephase from market clocks, and tail costs inflate. Model steering regime explicitly and attach controls to it; otherwise infra-side causality drift will keep leaking hidden basis points.
References
- Linux Kernel Docs: Scaling in the Linux Networking Stack (RSS/RPS/RFS/Accelerated RFS/XPS)
  https://docs.kernel.org/networking/scaling.html
- Linux Kernel Docs: NAPI
  https://docs.kernel.org/networking/napi.html
- Linux Kernel Docs: SMP IRQ affinity
  https://docs.kernel.org/core-api/irq/irq-affinity.html
- ethtool(8) manual (-c/-C, queue/coalesce visibility)
  https://man7.org/linux/man-pages/man8/ethtool.8.html
- Red Hat Docs: Interrupt coalescence tuning (driver-dependent behavior and low-latency caveats)
  https://docs.redhat.com/en/documentation/red_hat_enterprise_linux/10/html/network_troubleshooting_and_performance_tuning/tuning-interrupt-coalescence-settings