RPS/RFS Steering Churn as a Hidden Slippage Driver (Practical Playbook)

Date: 2026-03-21
Category: research
Audience: low-latency execution teams running Linux multi-queue ingest/routing paths

Why this matters

Most slippage models cover market microstructure and tactic behavior but ignore a kernel-side timing distortion: receive-steering churn from RPS/RFS.

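RPS (Receive Packet Steering) uses each packet's flow hash to pick a target CPU from the RX queue's rps_cpus set for protocol processing.
RFS (Receive Flow Steering) goes further and steers processing toward the CPU where the consuming application thread last ran, as recorded on its recvmsg/sendmsg calls.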

In stable conditions this can improve cache locality and throughput. In unstable conditions (thread migration, undersized flow tables, poor queue/CPU maps), steering churn introduces:

remote backlog queueing + extra IPIs,
per-flow CPU handoff delays,
event-time distortion between market-data ingest and order-state/control feedback.

In TCA, that distortion often shows up as "random tail slippage," even though the root cause is deterministic kernel steering behavior.

Failure mechanism (flow-steering churn -> execution timing tax)

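A packet arrives on an RX queue and carries a flow hash (supplied by the NIC for RSS, or computed in software if the NIC did not provide one).
get_rps_cpu() selects a target CPU from the queue's rps_cpus mask or the RFS flow tables.
The packet is enqueued on the remote CPU's backlog; that CPU is woken by an IPI if its backlog is not already being processed.
The scheduler migrates the consumer thread (or the flow's table entry is overwritten by a colliding flow), so the desired CPU changes.
RFS defers the switch until packets already queued on the old CPU have been processed (to avoid reordering), so CPU ownership lags the thread move and flips in bursts.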
Market-data and order-state timelines de-synchronize at the application boundary.

Result: causality jitter. Trading logic reacts to slightly stale or phase-shifted state, which worsens queue-entry timing and forces late urgency catch-up.
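The lag in the steps above comes from the RFS update rule in the kernel's scaling documentation: a flow's current CPU is switched to the desired CPU only if the current CPU's backlog head counter has reached the tail counter recorded when the flow last enqueued a packet, or if the current CPU is unset or offline. The following is a minimal Python model of that rule, not kernel code. `DevFlow` and `choose_cpu` are hypothetical names, and counter wraparound is ignored. It shows why ownership trails a thread migration.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class DevFlow:
    """Simplified model of one per-queue RFS flow entry (hypothetical, not kernel structs)."""
    cpu: Optional[int]      # CPU currently processing this flow; None means unset
    last_qtail: int = 0     # old CPU's backlog tail counter when this flow last enqueued


def choose_cpu(flow: DevFlow, desired_cpu: int,
               qhead: dict[int, int], online: set[int]) -> int:
    """Return the CPU the next packet is steered to, applying the documented switch rule.

    qhead maps CPU -> count of packets that CPU's backlog has dequeued so far.
    Counter wraparound, which the kernel handles, is ignored in this model.
    """
    cur = flow.cpu
    if cur != desired_cpu and (
        cur is None                          # current CPU unset
        or cur not in online                 # current CPU offline
        or qhead[cur] >= flow.last_qtail     # old backlog has drained this flow's packets
    ):
        flow.cpu = desired_cpu               # switch is now reorder-safe
    return flow.cpu


# Consumer thread migrated from CPU 0 to CPU 2 while CPU 0 still holds queued packets.
flow = DevFlow(cpu=0, last_qtail=105)
print(choose_cpu(flow, desired_cpu=2, qhead={0: 100}, online={0, 1, 2}))  # 0: still draining
print(choose_cpu(flow, desired_cpu=2, qhead={0: 105}, online={0, 1, 2}))  # 2: switch allowed
```

Between those two calls, packets keep landing on CPU 0 even though the consumer now runs on CPU 2. That window is the remote-backlog and handoff cost the decomposition below attributes to steering.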

Slippage decomposition with steering term

For parent order \(i\):

\[ \mathrm{IS}_i = C_{\text{impact}} + C_{\text{timing}} + C_{\text{routing}} + C_{\text{steer}} \]

Where:

\[ C_{\text{steer}} = C_{\text{ipi}} + C_{\text{backlog}} + C_{\text{churn}} + C_{\text{causal-drift}} \]

\(C_{\text{ipi}}\): cost of remote wakeups/CPU handoff overhead
\(C_{\text{backlog}}\): queueing delay on target backlog under burst load
\(C_{\text{churn}}\): repeated per-flow CPU migration overhead
\(C_{\text{causal-drift}}\): decision errors from MD/ACK/control timeline skew
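The following is a minimal bookkeeping sketch of the two equations. It assumes per-order attribution into each component already exists in basis points. The class and field names are hypothetical, and the numbers are illustrative, not measured.

```python
from dataclasses import dataclass


@dataclass
class SteerCost:
    """C_steer components for one parent order, in bps."""
    ipi: float
    backlog: float
    churn: float
    causal_drift: float

    @property
    def total(self) -> float:
        # C_steer = C_ipi + C_backlog + C_churn + C_causal-drift
        return self.ipi + self.backlog + self.churn + self.causal_drift


@dataclass
class ParentOrderIS:
    """IS_i = C_impact + C_timing + C_routing + C_steer, all in bps."""
    impact: float
    timing: float
    routing: float
    steer: SteerCost

    @property
    def total(self) -> float:
        return self.impact + self.timing + self.routing + self.steer.total

    @property
    def steer_share(self) -> float:
        """Fraction of IS_i attributed to steering (meaningful when IS_i > 0)."""
        return self.steer.total / self.total if self.total else 0.0


# Hypothetical attribution, not measured data.
order = ParentOrderIS(impact=4.0, timing=1.5, routing=0.5,
                      steer=SteerCost(ipi=0.25, backlog=0.5, churn=0.25, causal_drift=1.0))
print(order.total, order.steer_share)  # 8.0 0.25
```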

Operational metrics (new)

1) SMI - Steering Migration Intensity

Per-flow rate of target-CPU changes over short windows.

\[ \mathrm{SMI} = \frac{\#\,\text{target-CPU changes}}{\text{flow-time}} \]
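The following sketch computes SMI for one flow in one window. It assumes a time-ordered trace of (timestamp, target CPU) samples is available. How that trace is collected (for example with an eBPF probe) is an assumption and is not specified here.

```python
from typing import Sequence, Tuple


def smi(trace: Sequence[Tuple[float, int]]) -> float:
    """Steering Migration Intensity for one flow over one window.

    trace: (timestamp_s, target_cpu) samples sorted by time (hypothetical input format).
    Returns target-CPU changes per second of observed flow time.
    """
    if len(trace) < 2:
        return 0.0
    changes = sum(1 for (_, prev), (_, cur) in zip(trace, trace[1:]) if prev != cur)
    flow_time = trace[-1][0] - trace[0][0]
    return changes / flow_time if flow_time > 0 else 0.0


# Flow bounces 2 -> 5 -> 2 within half a second: 2 changes / 0.5 s.
print(smi([(0.0, 2), (0.1, 2), (0.2, 5), (0.3, 2), (0.5, 2)]))  # 4.0
```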

2) RBI95 - Remote Backlog Injection p95

p95 delay from the NIC receive timestamp (or the earliest software ingress stamp) to the start of protocol processing on the target CPU.
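The following sketch computes RBI95 with a nearest-rank percentile. It assumes paired per-packet stamps already exist on a single time base. Hardware NIC timestamps are taken on the NIC's PTP hardware clock and must be converted before subtraction. The function names are hypothetical.

```python
import math
from typing import Sequence


def p_nearest_rank(values: Sequence[float], p: float) -> float:
    """Nearest-rank p-th percentile, p in (0, 100]."""
    if not values:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))  # 1-based rank
    return ordered[rank - 1]


def rbi95(ingress_ns: Sequence[int], proc_start_ns: Sequence[int]) -> float:
    """p95 of (protocol-processing start on target CPU - ingress stamp), per packet."""
    if len(ingress_ns) != len(proc_start_ns):
        raise ValueError("unpaired stamps")
    delays = [start - rx for rx, start in zip(ingress_ns, proc_start_ns)]
    return p_nearest_rank(delays, 95)


print(rbi95([0] * 20, list(range(1, 21))))  # 19
```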

3) IWB - IPI Wakeup Burden

Share of RX-path processing episodes requiring remote IPI wakeups.

4) FCR - Flow-Collision Ratio

Estimated collision pressure in the global socket flow table (sized by rps_sock_flow_entries) and the per-queue flow tables (sized by rps_flow_cnt), proxied by high SMI together with the ratio of active flows to table entries.
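One way to turn "active flows vs. table entries" into a number is a birthday-style estimate. This is a swapped-in approximation that assumes uniform, independent hashing, which the real flow hash only approximates. The kernel rounds both table sizes up to a power of two, so the helper does the same. The kernel scaling docs report that 32768 socket-flow entries work fairly well on a moderately loaded server. The function names are hypothetical.

```python
def round_up_pow2(n: int) -> int:
    """Smallest power of two >= n."""
    return 1 if n <= 1 else 1 << (n - 1).bit_length()


def expected_collision_fraction(active_flows: int, table_entries: int) -> float:
    """Expected fraction of active flows sharing a slot with at least one other flow.

    Assumes uniform, independent hashing into round_up_pow2(table_entries) slots.
    """
    if table_entries <= 0:
        raise ValueError("table disabled or invalid size")
    if active_flows <= 1:
        return 0.0
    slots = round_up_pow2(table_entries)
    return 1.0 - (1.0 - 1.0 / slots) ** (active_flows - 1)


# 4096 active flows against a 32768-entry table.
print(round(expected_collision_fraction(4096, 32768), 3))  # ~0.117
```

Read this as a floor on collision pressure. It does not capture skewed hashing or bursty flow arrival, which is why the text pairs it with observed SMI.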

5) CDT - Causality Drift Tax

Incremental short-horizon markout / tail IS during high-SMI+high-RBI regimes versus matched low-churn regimes.
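The simplest form of CDT is a raw tail gap. The following sketch takes the p-th percentile of IS in churn windows and subtracts the same percentile in stable windows. It assumes the two samples already come from matched windows. It is not the fixed-effects estimate, and the names are hypothetical.

```python
import math
from typing import Sequence


def tail(values: Sequence[float], p: float = 95) -> float:
    """Nearest-rank p-th percentile."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    return ordered[max(1, math.ceil(p * len(ordered) / 100)) - 1]


def cdt(churn_is: Sequence[float], stable_is: Sequence[float], p: float = 95) -> float:
    """Raw Causality Drift Tax: tail IS in churn windows minus tail IS in stable windows."""
    return tail(churn_is, p) - tail(stable_is, p)


print(cdt(list(range(1, 21)), list(range(1, 11))))  # 9
```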

What to log in production

Kernel / network layer

/sys/class/net/<dev>/queues/rx-*/rps_cpus
/proc/sys/net/core/rps_sock_flow_entries
/sys/class/net/<dev>/queues/rx-*/rps_flow_cnt
/proc/sys/net/core/netdev_max_backlog
/proc/sys/net/core/flow_limit_cpu_bitmap and flow_limit_table_len (Flow Limit, if built with CONFIG_NET_FLOW_LIMIT)
IRQ affinity (/proc/irq/*/smp_affinity*, /proc/interrupts)
/proc/net/softnet_stat (per-CPU drops, time_squeeze, received_rps IPI wakeups, backlog stress signals)
per-CPU ksoftirqd utilization and scheduler migration counters
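The following sketch parses /proc/net/softnet_stat. The file prints one row per online CPU in hex. The column positions assumed here follow recent kernels: processed, dropped, and time_squeeze first; received_rps (IPI wakeups) at index 9; flow_limit_count at 10; backlog length at 11; CPU id at 12. Older kernels print fewer columns, so check softnet_seq_show() for your kernel. The helper names are hypothetical.

```python
from typing import Dict

# Assumed column positions on recent kernels; verify against your kernel.
COUNTERS = {0: "processed", 1: "dropped", 2: "time_squeeze",
            9: "received_rps", 10: "flow_limit_count"}
BACKLOG_COL, CPU_COL = 11, 12

Stats = Dict[int, Dict[str, int]]


def parse_softnet_stat(text: str) -> Stats:
    """Parse softnet_stat text into {cpu_id: {field: value}}."""
    stats: Stats = {}
    for row, line in enumerate(text.strip().splitlines()):
        cols = [int(c, 16) for c in line.split()]
        # Without a CPU-id column, fall back to row order (wrong if some CPUs are offline).
        cpu = cols[CPU_COL] if len(cols) > CPU_COL else row
        fields = {name: cols[i] for i, name in COUNTERS.items() if i < len(cols)}
        if len(cols) > BACKLOG_COL:
            fields["backlog_len"] = cols[BACKLOG_COL]  # instantaneous gauge, not a counter
        stats[cpu] = fields
    return stats


def counter_deltas(before: Stats, after: Stats) -> Stats:
    """Per-CPU counter increments between snapshots; fields are u32, so wrap mod 2**32."""
    out: Stats = {}
    for cpu, now in after.items():
        prev = before.get(cpu)
        if prev is None:
            continue
        out[cpu] = {k: (now[k] - prev[k]) % (1 << 32)
                    for k in COUNTERS.values() if k in now and k in prev}
    return out


def read_softnet_stat(path: str = "/proc/net/softnet_stat") -> Stats:
    with open(path) as f:
        return parse_softnet_stat(f.read())
```

To get per-CPU IPI wakeups per second, take two read_softnet_stat() snapshots a fixed interval apart and pass them to counter_deltas(). The received_rps delta is a kernel-side input for IWB.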

Execution layer

decision timestamp vs packet-ingest timestamp
child-order emit jitter relative to intended cadence
cancel/replace burst factor in high-SMI windows
ACK/fill ordering anomalies around steering transitions
markout/IS conditioned on steering regime
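The first two execution-layer signals reduce to simple differences. The following sketch assumes child orders are meant to go out on a fixed cadence from a known start time. Adaptive schedulers would need their own intended-time series. All function names are hypothetical.

```python
import math
from typing import List, Sequence


def decision_lags_ns(ingest_ns: Sequence[int], decision_ns: Sequence[int]) -> List[int]:
    """Decision timestamp minus the packet-ingest timestamp that triggered it."""
    if len(ingest_ns) != len(decision_ns):
        raise ValueError("unpaired stamps")
    return [d - i for i, d in zip(ingest_ns, decision_ns)]


def emit_jitter_ns(emit_ns: Sequence[int], start_ns: int, cadence_ns: int) -> List[int]:
    """Deviation of the k-th child-order emit from start + k * cadence (fixed cadence assumed)."""
    return [t - (start_ns + k * cadence_ns) for k, t in enumerate(emit_ns)]


def p95_abs(values: Sequence[int]) -> int:
    """Nearest-rank p95 of absolute deviations."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(abs(v) for v in values)
    return ordered[max(1, math.ceil(95 * len(ordered) / 100)) - 1]


jit = emit_jitter_ns([0, 1_000_050, 1_999_900], start_ns=0, cadence_ns=1_000_000)
print(jit, p95_abs(jit))  # [0, 50, -100] 100
```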

Identification strategy (causal)

Match windows by spread, volatility, participation, and venue mix.
Segment into STEER_STABLE vs STEER_CHURN using SMI/RBI thresholds.
Estimate incremental tail cost (CDT) with host + symbol + session fixed effects.
Run controlled canaries:
- stabilize IRQ affinity / queue maps,
- right-size rps_sock_flow_entries and rps_flow_cnt,
- narrow rps_cpus to NUMA-local sets,
- reduce app-thread migration on critical handlers.
Promote only if CDT drops without completion-rate degradation.
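The following sketch is a simpler stand-in for the fixed-effects estimate: exact matching on coarsened strata. Windows are bucketed by a matching key built from spread, volatility, participation, venue mix, and optionally host/symbol/session. The estimator then averages the churn-minus-stable cost gap over strata that contain both regimes, weighted by churn-window count. Strata without a stable match are dropped. The window format and key contents are hypothetical.

```python
from collections import defaultdict
from statistics import fmean
from typing import Iterable, Mapping

CHURN, STABLE = "STEER_CHURN", "STEER_STABLE"


def matched_cdt(windows: Iterable[Mapping]) -> float:
    """Churn-minus-stable cost gap averaged over matched strata.

    Each window is a mapping with hypothetical keys:
      'stratum' - hashable matching key,
      'regime'  - STEER_CHURN or STEER_STABLE (other regimes are ignored),
      'cost'    - tail-IS or short-horizon markout for that window.
    """
    groups = defaultdict(lambda: {CHURN: [], STABLE: []})
    for w in windows:
        if w["regime"] in (CHURN, STABLE):
            groups[w["stratum"]][w["regime"]].append(w["cost"])
    total, weight = 0.0, 0
    for g in groups.values():
        churn, stable = g[CHURN], g[STABLE]
        if churn and stable:  # unmatched strata are dropped
            total += len(churn) * (fmean(churn) - fmean(stable))
            weight += len(churn)
    if weight == 0:
        raise ValueError("no stratum contains both regimes")
    return total / weight


windows = [
    {"stratum": ("tight", "lowvol"), "regime": CHURN, "cost": 3.0},
    {"stratum": ("tight", "lowvol"), "regime": CHURN, "cost": 5.0},
    {"stratum": ("tight", "lowvol"), "regime": STABLE, "cost": 2.0},
    {"stratum": ("wide", "highvol"), "regime": CHURN, "cost": 1.0},
    {"stratum": ("wide", "highvol"), "regime": STABLE, "cost": 1.0},
    {"stratum": ("wide", "lowvol"), "regime": CHURN, "cost": 10.0},  # no stable match
]
print(matched_cdt(windows))  # (2 * 2.0 + 1 * 0.0) / 3
```

Running the same estimator before and after a canary gives a direct read on whether CDT dropped.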

Regime state machine

`STEER_STABLE`

low SMI, low RBI, low causal drift
normal tactic policy

`STEER_IMBALANCED`

rising backlog/IPI burden but limited flow flips
rebalance queue/CPU maps, reduce burst aggressiveness

`STEER_CHURN`

high SMI + backlog jitter + ordering drift
apply anti-burst pacing and stricter urgency gates

`STEER_SAFE_CONTAIN`

repeated p95 breaches under persistent churn
shift to conservative completion-safe policy until steering normalizes

Use hysteresis + minimum dwell to avoid controller flapping.
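The following is a minimal sketch of the four-state controller with both guards. Hysteresis comes from separate enter and exit thresholds. A minimum dwell applies before de-escalation. Escalation is immediate, which is a design choice favoring tail protection over stability of policy. All thresholds are hypothetical and need per-host calibration. The mapping of a single elevated signal to `STEER_IMBALANCED` is a simplification.

```python
from dataclasses import dataclass, field
from enum import Enum


class Regime(Enum):
    STABLE = "STEER_STABLE"
    IMBALANCED = "STEER_IMBALANCED"
    CHURN = "STEER_CHURN"
    SAFE_CONTAIN = "STEER_SAFE_CONTAIN"


_RANK = {r: i for i, r in enumerate(Regime)}  # severity order, lowest first


@dataclass
class Thresholds:
    """Hypothetical defaults; calibrate per host and venue."""
    smi_enter: float = 5.0      # SMI (target-CPU changes/s) needed to enter churn
    smi_exit: float = 2.0       # SMI must drop to this or below to leave churn
    rbi_enter_us: float = 50.0  # RBI95 (us) needed to enter an elevated state
    rbi_exit_us: float = 20.0   # RBI95 must drop to this or below to leave it
    breach_limit: int = 3       # consecutive p95 breaches that trigger containment
    min_dwell: int = 5          # ticks to hold a state before de-escalating


@dataclass
class SteeringRegimeController:
    th: Thresholds = field(default_factory=Thresholds)
    state: Regime = Regime.STABLE
    dwell: int = 0              # ticks spent in the current state
    breaches: int = 0           # consecutive p95-breach ticks

    def _target(self, smi: float, rbi95_us: float) -> Regime:
        th, s = self.th, self.state
        # Hysteresis: entering needs the high bar; staying only needs to exceed the low bar.
        smi_hi = smi >= th.smi_enter or (
            s in (Regime.CHURN, Regime.SAFE_CONTAIN) and smi > th.smi_exit)
        rbi_hi = rbi95_us >= th.rbi_enter_us or (
            s is not Regime.STABLE and rbi95_us > th.rbi_exit_us)
        if smi_hi and (self.breaches >= th.breach_limit or s is Regime.SAFE_CONTAIN):
            return Regime.SAFE_CONTAIN  # hold containment until churn subsides
        if smi_hi and rbi_hi:
            return Regime.CHURN
        if smi_hi or rbi_hi:
            return Regime.IMBALANCED    # one elevated signal (simplification)
        return Regime.STABLE

    def update(self, smi: float, rbi95_us: float, p95_breach: bool) -> Regime:
        """Feed one evaluation tick; returns the (possibly unchanged) regime."""
        self.breaches = self.breaches + 1 if p95_breach else 0
        target = self._target(smi, rbi95_us)
        self.dwell += 1
        # Escalate immediately; de-escalate only after min_dwell ticks in the current state.
        if target is not self.state and (
                _RANK[target] > _RANK[self.state] or self.dwell >= self.th.min_dwell):
            self.state, self.dwell = target, 0
        return self.state
```

With the defaults, a spike to SMI 6 and RBI95 60 us moves straight to churn. A quiet tick afterwards does not drop back to stable until five ticks have passed in churn.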

Control ladder

Fix topology first
- Align IRQ affinity, rps_cpus, and app CPU pinning by NUMA/cache locality.
Right-size flow tables
- Increase rps_sock_flow_entries / rps_flow_cnt when collision pressure is visible.
Avoid redundant steering layers
- If RSS already maps each hardware RX queue 1:1 to a CPU, RPS is probably redundant, and aggressive RPS may add churn with little upside.
Bound scheduler migration on critical consumers
- Thread movement can create avoidable RFS target churn.
Promote steering-health features into live execution logic
- Treat SMI/RBI as first-class slippage features, not just infra telemetry.
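The topology and table-sizing steps mostly come down to formatting the right values. The following dry-run sketch parses a NUMA node's cpulist (for example /sys/devices/system/node/node0/cpulist) and encodes CPU sets as rps_cpus hex masks. Masks use comma-separated 32-bit words, most significant first. It sizes rps_flow_cnt per queue as rps_sock_flow_entries / N, following the kernel scaling docs, pre-rounded to the power of two the kernel would apply. It writes nothing. The queue-to-CPU mapping is your input, and applying the plan (with rollback) is left to the operator. All function names are hypothetical.

```python
from typing import Dict, Iterable, List


def parse_cpulist(text: str) -> List[int]:
    """Parse a sysfs cpulist such as '0-3,8-11'."""
    cpus: List[int] = []
    for part in text.strip().split(","):
        if not part:
            continue
        lo, _, hi = part.partition("-")
        cpus.extend(range(int(lo), int(hi or lo) + 1))
    return cpus


def cpus_to_mask(cpus: Iterable[int]) -> str:
    """Hex CPU bitmap as comma-separated 32-bit words, most significant first."""
    bits = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError(f"bad CPU id {cpu}")
        bits |= 1 << cpu
    words = []
    while bits:
        words.append(bits & 0xFFFFFFFF)
        bits >>= 32
    if not words:
        return "0"
    words.reverse()
    return ",".join([f"{words[0]:x}"] + [f"{w:08x}" for w in words[1:]])


def round_up_pow2(n: int) -> int:
    return 1 if n <= 1 else 1 << (n - 1).bit_length()


def per_queue_flow_cnt(sock_flow_entries: int, n_queues: int) -> int:
    """rps_sock_flow_entries / N, rounded up to the power of two the kernel would use."""
    return round_up_pow2(max(1, sock_flow_entries // n_queues))


def plan(dev: str, queue_cpus: Dict[int, List[int]], sock_flow_entries: int) -> Dict[str, str]:
    """Dry-run map of path -> value to write; performs no I/O."""
    out = {"/proc/sys/net/core/rps_sock_flow_entries": str(round_up_pow2(sock_flow_entries))}
    cnt = per_queue_flow_cnt(sock_flow_entries, len(queue_cpus))
    for q, cpus in sorted(queue_cpus.items()):
        base = f"/sys/class/net/{dev}/queues/rx-{q}"
        out[f"{base}/rps_cpus"] = cpus_to_mask(cpus)
        out[f"{base}/rps_flow_cnt"] = str(cnt)
    return out


print(cpus_to_mask(parse_cpulist("0-3,8-11")))  # f0f
```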

Failure drills

Flow-fanout stress drill
- replay high concurrent-flow load; verify SMI/FCR alarms.
Queue-map swap drill
- test controlled IRQ/RPS remaps with rollback triggers.
Migration stress drill
- intentionally perturb app pinning; validate CDT sensitivity.
Tail-protection drill
- force transition to STEER_SAFE_CONTAIN on repeated p95 breach.

Common mistakes

Enabling RPS/RFS globally without per-queue NUMA-aware mapping
Tiny flow tables under high active-flow counts (collision-churn amplifier)
Ignoring scheduler migration effects on RFS locality assumptions
Treating remote-backlog delay as "normal network jitter"
Optimizing median latency while p95/p99 causal drift worsens

Bottom line

RPS/RFS are not just throughput knobs; they are slippage-relevant timing controls.

When steering churn rises, execution clocks dephase from market clocks and tail costs inflate. Model the steering regime explicitly and attach controls to it; otherwise, infra-side causality drift will keep leaking hidden basis points.
