Linux NIC Queue Steering Playbook (RSS, IRQ Affinity, RPS/RFS, XPS)
Date: 2026-03-17
Category: knowledge
Why this matters
On multi-core Linux hosts, packet latency problems are often not bandwidth problems. They are CPU-placement problems:
- hot IRQs pile onto a few cores,
- app threads run on different cores than packet processing,
- TX queue choice causes lock/cache contention.
Good queue steering reduces p99 latency and improves throughput consistency without touching application code.
1) Quick mental model
Think in layers:
- RSS (hardware) chooses RX queue using packet hash.
- IRQ affinity decides which CPU handles that RX queue interrupt.
- RPS (software) can re-steer protocol processing to another CPU.
- RFS adds application locality (prefer CPU where consuming thread runs).
- XPS chooses TX queue based on CPU to reduce TX-side contention.
Rule of thumb:
- Prefer RSS + sane IRQ affinity first.
- Add RPS/RFS only when hardware queues are insufficient or locality is poor.
- Add XPS for high TX contention / multi-queue NICs.
2) Fast decision matrix
A) NIC has enough RX/TX queues for your cores
- Use RSS + IRQ affinity as primary path.
- Keep RPS minimal or off.
- Add XPS mapping.
B) NIC has fewer RX queues than useful cores
- Keep RSS enabled.
- Add RPS CPU masks per RX queue.
- Consider RFS for app-locality gains.
C) Single very hot flow dominating one core
- Re-check hashing/flow distribution first.
- Consider RPS Flow Limit under overload.
- Don’t expect miracles if traffic is truly one giant flow.
D) Tail latency spikes during traffic bursts
- Inspect IRQ distribution + softirq pressure.
- Rebalance affinities before increasing queue counts blindly.
- Tune coalescing only after steering is sane.
3) Baseline discovery (10 minutes)
# Queue/channel capability: pre-set maximums and current counts
ethtool -l eth0
# Driver + firmware context
ethtool -i eth0
# Per-queue and protocol stats (driver-dependent)
ethtool -S eth0
# Interrupt distribution
grep -iE 'eth0|mlx|ixg|ena|bnx|virtio' /proc/interrupts
# Softirq pressure by CPU
watch -n1 'grep -E "NET_RX|NET_TX" /proc/softirqs'
Also capture nproc, NUMA layout (lscpu, numactl -H), and app CPU pinning policy before changes.
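As part of that snapshot, the NIC's own NUMA placement is worth recording. A minimal sketch, assuming a PCI NIC named eth0; the device/ sysfs files may be absent for virtio NICs or in containers, hence the fallbacks:

```shell
# Record the NIC's NUMA node and local CPU list before planning affinity.
# eth0 is an example name; -1 / "unknown" mean the sysfs files are missing.
dev=eth0
node=$(cat "/sys/class/net/$dev/device/numa_node" 2>/dev/null || echo "-1")
local_cpus=$(cat "/sys/class/net/$dev/device/local_cpulist" 2>/dev/null || echo "unknown")
echo "$dev: numa_node=$node local_cpulist=$local_cpus"
```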
4) Step-by-step tuning sequence
4.1 Set reasonable queue/channel count (RSS substrate)
# Example: set combined queues (RX/TX pair count)
sudo ethtool -L eth0 combined 8
Use physical-core-aware values first (not “max everything”). Too many queues can increase overhead and management complexity.
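One way to pick a starting value is to count physical cores rather than hardware threads. A sketch; the cap of 16 is an arbitrary example, and nproc (logical CPUs) is only a fallback for when lscpu is unavailable:

```shell
# Derive a physical-core-aware starting point for the combined queue count.
# lscpu -p=CORE,SOCKET prints one line per logical CPU; sort -u collapses
# SMT siblings that share the same physical core.
phys_cores=$(lscpu -p=CORE,SOCKET 2>/dev/null | grep -v '^#' | sort -u | wc -l)
[ "$phys_cores" -gt 0 ] || phys_cores=$(nproc)   # fallback: logical CPUs
want=$(( phys_cores < 16 ? phys_cores : 16 ))    # example cap, not a rule
echo "plan: ethtool -L eth0 combined $want"
```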
4.2 Align IRQ affinity with CPU/NUMA plan
For each NIC queue IRQ listed in /proc/interrupts, write a CPU mask to /proc/irq/<N>/smp_affinity (or a CPU list to /proc/irq/<N>/smp_affinity_list) so queues spread across your target cores, preferring the NIC’s local NUMA node.
If irqbalance is enabled, either:
- configure policy so it preserves your design, or
- disable it and manage affinity explicitly.
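A minimal round-robin sketch of explicit affinity management. The parsing assumes queue IRQ names in /proc/interrupts contain "eth0", which is driver-dependent, and the function names are ours; it prints the plan instead of writing to /proc/irq:

```shell
# List IRQ numbers whose /proc/interrupts entry mentions eth0 (driver-dependent).
nic_irqs() { awk -F: '/eth0/ { gsub(/ /, "", $1); print $1 }' "$1"; }

# Print a round-robin IRQ -> CPU plan. A real run would write each CPU to
# /proc/irq/$irq/smp_affinity_list instead of echoing.
plan_affinity() {   # $1 = interrupts file, $2 = number of CPUs to rotate over
    cpu=0
    for irq in $(nic_irqs "$1"); do
        echo "IRQ $irq -> CPU $cpu"
        cpu=$(( (cpu + 1) % $2 ))
    done
}

[ -r /proc/interrupts ] && plan_affinity /proc/interrupts 8 || true
```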
4.3 Add RPS when hardware queue fanout is insufficient
RPS is per RX queue:
# Example: allow CPUs 0-7 for rx-0 (bitmap value is example only)
echo ff | sudo tee /sys/class/net/eth0/queues/rx-0/rps_cpus
Notes:
- rps_cpus=0 means disabled (the default).
- RPS can increase IPIs; use it when it solves a real imbalance/locality issue.
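The bitmap values used here and below can be computed from a CPU list instead of hand-encoded. A small helper sketch (the function name is ours):

```shell
# Convert a comma-separated CPU list into the hex bitmap expected by
# rps_cpus / xps_cpus / smp_affinity (bit N set = CPU N allowed).
cpus_to_mask() {
    mask=0
    old_ifs=$IFS; IFS=,
    for cpu in $1; do
        mask=$(( mask | (1 << cpu) ))
    done
    IFS=$old_ifs
    printf '%x\n' "$mask"
}

cpus_to_mask 0,1,2,3,4,5,6,7   # ff
cpus_to_mask 2,3               # c
```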
4.4 Enable RFS for application locality
RFS needs both global and per-queue settings:
# Global flow table entries (rounded internally to power of two)
echo 32768 | sudo tee /proc/sys/net/core/rps_sock_flow_entries
# Per-RX-queue flow entries (example for 8 queues)
for q in /sys/class/net/eth0/queues/rx-*/rps_flow_cnt; do
echo 4096 | sudo tee "$q"
done
Total per-queue flow entries should roughly align with global sizing. Tune based on active connection cardinality, not total historical connections.
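That sizing relationship can be sketched as simple arithmetic. This prints the write plan for the example values above rather than touching procfs/sysfs:

```shell
# Per-queue rps_flow_cnt = global rps_sock_flow_entries / number of RX queues.
global=32768
queues=8
per_queue=$(( global / queues ))
echo "plan: echo $global > /proc/sys/net/core/rps_sock_flow_entries"
q=0
while [ "$q" -lt "$queues" ]; do
    echo "plan: echo $per_queue > /sys/class/net/eth0/queues/rx-$q/rps_flow_cnt"
    q=$(( q + 1 ))
done
```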
4.5 Configure XPS on TX queues
# Example: map TX queue 0 to CPUs 0-1, TX queue 1 to CPUs 2-3, etc.
echo 3 | sudo tee /sys/class/net/eth0/queues/tx-0/xps_cpus
echo c | sudo tee /sys/class/net/eth0/queues/tx-1/xps_cpus
echo 30 | sudo tee /sys/class/net/eth0/queues/tx-2/xps_cpus
echo c0 | sudo tee /sys/class/net/eth0/queues/tx-3/xps_cpus
Keep mapping simple and consistent with app/worker CPU placement.
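The four masks above follow a regular pattern (two adjacent CPUs per TX queue), so they can be generated rather than hand-written. A sketch that prints the plan:

```shell
# Generate 2-CPUs-per-queue XPS masks: tx-0 -> 3, tx-1 -> c, tx-2 -> 30, ...
cpus_per_q=2
pair_mask=$(( (1 << cpus_per_q) - 1 ))   # 0x3 = two adjacent CPUs
for q in 0 1 2 3; do
    printf 'plan: echo %x > /sys/class/net/eth0/queues/tx-%s/xps_cpus\n' \
        $(( pair_mask << (q * cpus_per_q) )) "$q"
done
```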
5) Optional advanced controls
RPS Flow Limit (protect small flows under RX CPU saturation)
# Enable on selected CPUs (bitmap)
echo ff | sudo tee /proc/sys/net/core/flow_limit_cpu_bitmap
# Queue depth guardrail often tuned with this
sudo sysctl -w net.core.netdev_max_backlog=4096
Use only when overload behavior justifies it; measure drop patterns carefully.
Coalescing (NIC interrupt moderation)
ethtool -c eth0
# then cautiously tune with ethtool -C if needed
Coalescing can lower CPU cost but may add latency; optimize for your SLO, not raw pps.
6) Observability checklist
Track before/after at minimum:
- p95/p99 request latency (and timeout/retry rate)
- per-CPU NET_RX/NET_TX softirq skew
- /proc/interrupts queue IRQ balance
- NIC drops/errors (ethtool -S)
- context switches, runqueue pressure, CPU migrations
Success signal:
- lower tail latency,
- smoother CPU distribution,
- no hidden increase in packet drops/retries.
7) Common mistakes
- Turning on every feature at once: you lose attribution and can’t tell what helped.
- Queue-count cargo culting: more queues is not always better.
- Ignoring NUMA locality: cross-node steering can erase gains.
- Using RPS where RSS already maps well: extra IPIs, little benefit.
- No persistence plan: sysfs/procfs tweaks disappear after reboot unless automated.
8) Practical rollout template
- Baseline metrics and IRQ/softirq snapshots.
- Tune queue count + IRQ affinity only.
- Re-measure.
- Add RPS selectively where imbalance remains.
- Add RFS only if locality still poor.
- Add XPS for TX contention cases.
- Persist config (systemd unit / tuned profile / provisioning script).
Treat steering as a control loop, not a one-time tweak.
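For the persistence step, one common pattern is a oneshot systemd unit that runs a steering script at boot. A sketch where the unit name and script path are our own examples, not established conventions:

```shell
# Print an example oneshot unit; install as e.g.
# /etc/systemd/system/nic-steering.service, then `systemctl enable` it.
# /usr/local/sbin/nic-steering.sh is a placeholder for your steering script.
unit_name=nic-steering.service
cat <<'EOF'
[Unit]
Description=Apply NIC queue steering (IRQ affinity, RPS/RFS, XPS)
# Ordering is environment-dependent; the NIC and its queues must exist first.
After=network.target

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/local/sbin/nic-steering.sh

[Install]
WantedBy=multi-user.target
EOF
```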
Closing
Linux network performance on modern servers is mostly a placement problem. If you stage tuning in this order—RSS/IRQ → RPS/RFS → XPS—you usually get better tail behavior with fewer surprises than random “sysctl bundle” tuning.