Linux NIC Queue Steering Playbook (RSS, IRQ Affinity, RPS/RFS, XPS)
Date: 2026-03-17
Category: knowledge
Why this matters
On multi-core Linux hosts, packet latency problems are often not bandwidth problems. They are CPU-placement problems:
- hot IRQs pile onto a few cores,
- app threads run on different cores than packet processing,
- TX queue choice causes lock/cache contention.
Good queue steering reduces p99 latency and improves throughput consistency without touching application code.
1) Quick mental model
Think in layers:
- RSS (hardware) chooses RX queue using packet hash.
- IRQ affinity decides which CPU handles that RX queue interrupt.
- RPS (software) can re-steer protocol processing to another CPU.
- RFS adds application locality (prefer CPU where consuming thread runs).
- XPS chooses TX queue based on CPU to reduce TX-side contention.
Rule of thumb:
- Prefer RSS + sane IRQ affinity first.
- Add RPS/RFS only when hardware queues are insufficient or locality is poor.
- Add XPS for high TX contention / multi-queue NICs.
2) Fast decision matrix
A) NIC has enough RX/TX queues for your cores
- Use RSS + IRQ affinity as primary path.
- Keep RPS minimal or off.
- Add XPS mapping.
B) NIC has fewer RX queues than useful cores
- Keep RSS enabled.
- Add RPS CPU masks per RX queue.
- Consider RFS for app-locality gains.
C) Single very hot flow dominating one core
- Re-check hashing/flow distribution first.
- Consider RPS Flow Limit under overload.
- Don’t expect miracles if traffic is truly one giant flow.
D) Tail latency spikes during traffic bursts
- Inspect IRQ distribution + softirq pressure.
- Rebalance affinities before increasing queue counts blindly.
- Tune coalescing only after steering is sane.
3) Baseline discovery (10 minutes)
# Queue/channel capability: pre-set maximums and current counts
ethtool -l eth0
# Driver + firmware context
ethtool -i eth0
# Per-queue and protocol stats (driver-dependent)
ethtool -S eth0
# Interrupt distribution
grep -iE 'eth0|mlx|ixg|ena|bnx|virtio' /proc/interrupts
# Softirq pressure by CPU
watch -n1 'grep -E "NET_RX|NET_TX" /proc/softirqs'
Also capture nproc, NUMA layout (lscpu, numactl -H), and app CPU pinning policy before changes.
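As part of that snapshot, the NIC's own NUMA placement is worth recording. A minimal sketch, assuming a PCI NIC named eth0; the device/ sysfs files may be absent for virtio NICs or in containers, hence the fallbacks:

```shell
# Record the NIC's NUMA node and local CPU list before planning affinity.
# eth0 is an example name; -1 / "unknown" mean the sysfs files are missing.
dev=eth0
node=$(cat "/sys/class/net/$dev/device/numa_node" 2>/dev/null || echo "-1")
local_cpus=$(cat "/sys/class/net/$dev/device/local_cpulist" 2>/dev/null || echo "unknown")
echo "$dev: numa_node=$node local_cpulist=$local_cpus"
```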
4) Step-by-step tuning sequence
4.1 Set reasonable queue/channel count (RSS substrate)
# Example: set combined queues (RX/TX pair count)
sudo ethtool -L eth0 combined 8
Use physical-core-aware values first (not “max everything”). Too many queues can increase overhead and management complexity.
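One way to pick a starting value is to count physical cores rather than hardware threads. A sketch; the cap of 16 is an arbitrary example, and nproc (logical CPUs) is only a fallback for when lscpu is unavailable:

```shell
# Derive a physical-core-aware starting point for the combined queue count.
# lscpu -p=CORE,SOCKET prints one line per logical CPU; sort -u collapses
# SMT siblings that share the same physical core.
phys_cores=$(lscpu -p=CORE,SOCKET 2>/dev/null | grep -v '^#' | sort -u | wc -l)
[ "$phys_cores" -gt 0 ] || phys_cores=$(nproc)   # fallback: logical CPUs
want=$(( phys_cores < 16 ? phys_cores : 16 ))    # example cap, not a rule
echo "plan: ethtool -L eth0 combined $want"
```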
4.2 Align IRQ affinity with CPU/NUMA plan
For each NIC queue IRQ listed in /proc/interrupts, write a CPU mask to /proc/irq/<N>/smp_affinity (or a CPU list to /proc/irq/<N>/smp_affinity_list) so queues spread across your target cores, preferring the NIC’s local NUMA node.
If irqbalance is enabled, either:
- configure policy so it preserves your design, or
- disable it and manage affinity explicitly.
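A minimal round-robin sketch of explicit affinity management. The parsing assumes queue IRQ names in /proc/interrupts contain "eth0", which is driver-dependent, and the function names are ours; it prints the plan instead of writing to /proc/irq:

```shell
# List IRQ numbers whose /proc/interrupts entry mentions eth0 (driver-dependent).
nic_irqs() { awk -F: '/eth0/ { gsub(/ /, "", $1); print $1 }' "$1"; }

# Print a round-robin IRQ -> CPU plan. A real run would write each CPU to
# /proc/irq/$irq/smp_affinity_list instead of echoing.
plan_affinity() {   # $1 = interrupts file, $2 = number of CPUs to rotate over
    cpu=0
    for irq in $(nic_irqs "$1"); do
        echo "IRQ $irq -> CPU $cpu"
        cpu=$(( (cpu + 1) % $2 ))
    done
}

[ -r /proc/interrupts ] && plan_affinity /proc/interrupts 8 || true
```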
4.3 Add RPS when hardware queue fanout is insufficient
RPS is per RX queue:
# Example: allow CPUs 0-7 for rx-0 (bitmap value is example only)
echo ff | sudo tee /sys/class/net/eth0/queues/rx-0/rps_cpus
Notes:
- rps_cpus=0 means disabled (the default).
- RPS can increase IPIs; use it when it solves a real imbalance/locality issue.
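The bitmap values used here and below can be computed from a CPU list instead of hand-encoded. A small helper sketch (the function name is ours):

```shell
# Convert a comma-separated CPU list into the hex bitmap expected by
# rps_cpus / xps_cpus / smp_affinity (bit N set = CPU N allowed).
cpus_to_mask() {
    mask=0
    old_ifs=$IFS; IFS=,
    for cpu in $1; do
        mask=$(( mask | (1 << cpu) ))
    done
    IFS=$old_ifs
    printf '%x\n' "$mask"
}

cpus_to_mask 0,1,2,3,4,5,6,7   # ff
cpus_to_mask 2,3               # c
```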
4.4 Enable RFS for application locality
RFS needs both global and per-queue settings:
# Global flow table entries (rounded internally to power of two)
echo 32768 | sudo tee /proc/sys/net/core/rps_sock_flow_entries
# Per-RX-queue flow entries (example for 8 queues)
for q in /sys/class/net/eth0/queues/rx-*/rps_flow_cnt; do
echo 4096 | sudo tee "$q"
done
Total per-queue flow entries should roughly align with global sizing. Tune based on active connection cardinality, not total historical connections.
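That sizing relationship can be sketched as simple arithmetic. This prints the write plan for the example values above rather than touching procfs/sysfs:

```shell
# Per-queue rps_flow_cnt = global rps_sock_flow_entries / number of RX queues.
global=32768
queues=8
per_queue=$(( global / queues ))
echo "plan: echo $global > /proc/sys/net/core/rps_sock_flow_entries"
q=0
while [ "$q" -lt "$queues" ]; do
    echo "plan: echo $per_queue > /sys/class/net/eth0/queues/rx-$q/rps_flow_cnt"
    q=$(( q + 1 ))
done
```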
4.5 Configure XPS on TX queues
# Example: map TX queue 0 to CPUs 0-1, TX queue 1 to CPUs 2-3, etc.
echo 3 | sudo tee /sys/class/net/eth0/queues/tx-0/xps_cpus
echo c | sudo tee /sys/class/net/eth0/queues/tx-1/xps_cpus
echo 30 | sudo tee /sys/class/net/eth0/queues/tx-2/xps_cpus
echo c0 | sudo tee /sys/class/net/eth0/queues/tx-3/xps_cpus
Keep mapping simple and consistent with app/worker CPU placement.
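The four masks above follow a regular pattern (two adjacent CPUs per TX queue), so they can be generated rather than hand-written. A sketch that prints the plan:

```shell
# Generate 2-CPUs-per-queue XPS masks: tx-0 -> 3, tx-1 -> c, tx-2 -> 30, ...
cpus_per_q=2
pair_mask=$(( (1 << cpus_per_q) - 1 ))   # 0x3 = two adjacent CPUs
for q in 0 1 2 3; do
    printf 'plan: echo %x > /sys/class/net/eth0/queues/tx-%s/xps_cpus\n' \
        $(( pair_mask << (q * cpus_per_q) )) "$q"
done
```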
5) Optional advanced controls
RPS Flow Limit (protect small flows under RX CPU saturation)
# Enable on selected CPUs (bitmap)
echo ff | sudo tee /proc/sys/net/core/flow_limit_cpu_bitmap
# Queue depth guardrail often tuned with this
sudo sysctl -w net.core.netdev_max_backlog=4096
Use only when overload behavior justifies it; measure drop patterns carefully.
Coalescing (NIC interrupt moderation)
ethtool -c eth0
# then cautiously tune with ethtool -C if needed
Coalescing can lower CPU cost but may add latency; optimize for your SLO, not raw pps.
6) Observability checklist
Track before/after at minimum:
- p95/p99 request latency (and timeout/retry rate)
- per-CPU NET_RX/NET_TX softirq skew
- /proc/interrupts queue IRQ balance
- NIC drops/errors (ethtool -S)
- context switches, runqueue pressure, CPU migrations
Success signal:
- lower tail latency,
- smoother CPU distribution,
- no hidden increase in packet drops/retries.
7) Common mistakes
- Turning on every feature at once: you lose attribution and can’t tell what helped.
- Queue-count cargo culting: more queues is not always better.
- Ignoring NUMA locality: cross-node steering can erase gains.
- Using RPS where RSS already maps well: extra IPIs, little benefit.
- No persistence plan: sysfs/procfs tweaks disappear after reboot unless automated.
8) Practical rollout template
- Baseline metrics and IRQ/softirq snapshots.
- Tune queue count + IRQ affinity only.
- Re-measure.
- Add RPS selectively where imbalance remains.
- Add RFS only if locality still poor.
- Add XPS for TX contention cases.
- Persist config (systemd unit / tuned profile / provisioning script).
Treat steering as a control loop, not a one-time tweak.
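For the persistence step, one common pattern is a oneshot systemd unit that runs a steering script at boot. A sketch where the unit name and script path are our own examples, not established conventions:

```shell
# Print an example oneshot unit; install as e.g.
# /etc/systemd/system/nic-steering.service, then `systemctl enable` it.
# /usr/local/sbin/nic-steering.sh is a placeholder for your steering script.
unit_name=nic-steering.service
cat <<'EOF'
[Unit]
Description=Apply NIC queue steering (IRQ affinity, RPS/RFS, XPS)
# Ordering is environment-dependent; the NIC and its queues must exist first.
After=network.target

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/local/sbin/nic-steering.sh

[Install]
WantedBy=multi-user.target
EOF
```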
Closing
Linux network performance on modern servers is mostly a placement problem. If you stage tuning in this order—RSS/IRQ → RPS/RFS → XPS—you usually get better tail behavior with fewer surprises than random “sysctl bundle” tuning.