Linux NIC Queue Steering Playbook (RSS, IRQ Affinity, RPS/RFS, XPS)

2026-03-17 · software


Why this matters

On multi-core Linux hosts, packet latency problems are often not bandwidth problems but CPU-placement problems: which core takes the RX interrupt, which core runs softirq protocol processing, and which core runs the thread that consumes the data.

Good queue steering reduces p99 latency and improves throughput consistency without touching application code.


1) Quick mental model

Think in layers:

  1. RSS (hardware) chooses RX queue using packet hash.
  2. IRQ affinity decides which CPU handles that RX queue interrupt.
  3. RPS (software) can re-steer protocol processing to another CPU.
  4. RFS adds application locality (prefer CPU where consuming thread runs).
  5. XPS chooses TX queue based on CPU to reduce TX-side contention.

Rule of thumb: steer in hardware first (RSS + IRQ affinity), add software steering (RPS/RFS) only where hardware fanout or application locality falls short, and keep TX placement (XPS) consistent with both.


2) Fast decision matrix

A) NIC has enough RX/TX queues for your cores
   Rely on RSS + IRQ affinity; leave RPS off unless locality is still poor.

B) NIC has fewer RX queues than useful cores
   Keep RSS, then fan out further in software with RPS (add RFS if application locality matters).

C) Single very hot flow dominating one core
   Hash-based steering cannot split one flow; shard the flow at the application or protocol level, or use flow limits to protect neighboring small flows.

D) Tail latency spikes during traffic bursts
   Check softirq saturation and backlog drops first; then consider RPS fanout, flow limits, and latency-oriented interrupt coalescing.


3) Baseline discovery (10 minutes)

# Queue/channel capability and current counts (one command shows both)
ethtool -l eth0

# Driver + firmware context
ethtool -i eth0

# Per-queue and protocol stats (driver-dependent)
ethtool -S eth0

# Interrupt distribution
grep -iE 'eth0|mlx|ixg|ena|bnx|virtio' /proc/interrupts

# Softirq pressure by CPU
watch -n1 'grep -E "NET_RX|NET_TX" /proc/softirqs'

Also capture nproc, NUMA layout (lscpu, numactl -H), and app CPU pinning policy before changes.
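The discovery commands above are easy to wrap into a snapshot script so that before/after diffs are trivial. This is a sketch, not a fixed tool: IFACE and OUTDIR are placeholders, and the ethtool calls simply record an error message on drivers that do not support them.

```shell
#!/bin/sh
# Hypothetical baseline-capture sketch: snapshot queue, IRQ, and
# softirq state into a timestamped directory for later diffing.
# IFACE and OUTDIR are assumptions -- adjust to your environment.
IFACE="${IFACE:-eth0}"
OUTDIR="${OUTDIR:-./steering-baseline-$(date +%Y%m%d-%H%M%S)}"
mkdir -p "$OUTDIR"

# Driver/queue context (output and support are driver-dependent)
ethtool -l "$IFACE" > "$OUTDIR/channels.txt" 2>&1
ethtool -i "$IFACE" > "$OUTDIR/driver.txt"   2>&1
ethtool -S "$IFACE" > "$OUTDIR/stats.txt"    2>&1

# CPU topology and current interrupt/softirq distribution
nproc                > "$OUTDIR/nproc.txt"
lscpu                > "$OUTDIR/lscpu.txt" 2>&1
cat /proc/interrupts > "$OUTDIR/interrupts.txt"
cat /proc/softirqs   > "$OUTDIR/softirqs.txt"
```

Run it once before any change and once after each step of the tuning sequence; `diff -u` between snapshots then shows exactly what moved.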


4) Step-by-step tuning sequence

4.1 Set reasonable queue/channel count (RSS substrate)

# Example: set combined queues (RX/TX pair count)
sudo ethtool -L eth0 combined 8

Use physical-core-aware values first (not “max everything”). Too many queues can increase overhead and management complexity.

4.2 Align IRQ affinity with CPU/NUMA plan

For each NIC queue IRQ in /proc/interrupts, set smp_affinity/smp_affinity_list to spread load across target cores (prefer local NUMA domain).

If irqbalance is enabled, either stop it (systemctl stop irqbalance) so your manual affinity sticks, or exclude the NIC's IRQs from rebalancing (e.g. via IRQBALANCE_BANNED_CPUS or --banirq) before setting affinity by hand.
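As a sketch of this step (assuming root, and that the driver names its IRQ lines after the interface, which varies by driver), queue IRQs can be spread round-robin over a target CPU list:

```shell
#!/bin/sh
# Round-robin this NIC's queue IRQs across a CPU list. IFACE and
# CPUS are assumptions -- pick cores on the NIC's local NUMA node.
# Stop irqbalance first (or ban these IRQs from it), or it will
# rewrite your affinity behind your back.
IFACE="${IFACE:-eth0}"
CPUS="${CPUS:-0 1 2 3 4 5 6 7}"

# pick_cpu I -> the I-th CPU from $CPUS, wrapping round-robin
pick_cpu() {
    idx=$1
    set -- $CPUS
    shift $(( idx % $# ))
    echo "$1"
}

i=0
# /proc/interrupts lines look like "IRQ:  counts...  name"; match by name
for irq in $(awk -F: -v dev="$IFACE" \
        '$0 ~ dev { gsub(/[[:space:]]/, "", $1); print $1 }' /proc/interrupts); do
    cpu=$(pick_cpu "$i")
    [ -w "/proc/irq/$irq/smp_affinity_list" ] \
        && echo "$cpu" > "/proc/irq/$irq/smp_affinity_list"
    i=$((i + 1))
done
```

Verify with a fresh look at /proc/interrupts: the per-queue counters should now grow on the intended CPUs.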

4.3 Add RPS when hardware queue fanout is insufficient

RPS is per RX queue:

# Example: allow CPUs 0-7 for rx-0 (bitmap value is example only)
echo ff | sudo tee /sys/class/net/eth0/queues/rx-0/rps_cpus

Notes:

  1. The value is a hex CPU bitmap (ff = CPUs 0-7); writing 0 disables RPS for that queue.
  2. RPS re-steers packets via inter-processor interrupts, which costs CPU; use it only where hardware fanout falls short.
  3. Prefer CPUs on the NUMA node local to the NIC.
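A sketch for applying one mask to every RX queue, with a small helper that builds the hex bitmap from a CPU range (names are assumptions; single-word masks only cover CPUs 0-62 — wider systems need comma-separated 32-bit chunks):

```shell
#!/bin/sh
# Apply one RPS CPU mask to every RX queue of IFACE. IFACE and the
# CPU range are assumptions; root is needed for the writes to stick.
IFACE="${IFACE:-eth0}"

# cpumask FIRST LAST -> hex bitmap covering CPUs FIRST..LAST inclusive
# (single-word masks only, i.e. CPUs 0-62)
cpumask() {
    printf '%x\n' $(( ((1 << ($2 - $1 + 1)) - 1) << $1 ))
}

MASK="${MASK:-$(cpumask 0 7)}"   # ff = CPUs 0-7, as in the example above

for q in /sys/class/net/"$IFACE"/queues/rx-*/rps_cpus; do
    [ -w "$q" ] && echo "$MASK" > "$q"
done
```

The -w guard makes a dry run on an unprivileged host (or a host without the NIC) harmless.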

4.4 Enable RFS for application locality

RFS needs both global and per-queue settings:

# Global flow table entries (rounded internally to power of two)
echo 32768 | sudo tee /proc/sys/net/core/rps_sock_flow_entries

# Per-RX-queue flow entries (example for 8 queues)
for q in /sys/class/net/eth0/queues/rx-*/rps_flow_cnt; do
  echo 4096 | sudo tee "$q"
done

Total per-queue flow entries should roughly align with global sizing. Tune based on active connection cardinality, not total historical connections.
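One way to keep the two sizes aligned (a sketch; 32768 is the example figure from above, not a recommendation) is to derive the per-queue count from the global table and the live queue count:

```shell
#!/bin/sh
# Split the global RFS flow table evenly across the NIC's RX queues.
# GLOBAL is an assumption -- scale it with concurrently active flows.
IFACE="${IFACE:-eth0}"
GLOBAL="${GLOBAL:-32768}"

nq=$(ls -d /sys/class/net/"$IFACE"/queues/rx-* 2>/dev/null | wc -l)
[ "$nq" -gt 0 ] || nq=1                  # fall back if the NIC is absent
per_queue=$(( GLOBAL / nq ))

# Needs root; guarded so a dry run on an unprivileged host is harmless
[ -w /proc/sys/net/core/rps_sock_flow_entries ] \
    && echo "$GLOBAL" > /proc/sys/net/core/rps_sock_flow_entries
for q in /sys/class/net/"$IFACE"/queues/rx-*/rps_flow_cnt; do
    [ -w "$q" ] && echo "$per_queue" > "$q"
done
echo "global=$GLOBAL queues=$nq per_queue=$per_queue"
```

With 8 queues this reproduces the 32768 / 4096 split shown above.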

4.5 Configure XPS on TX queues

# Example: map TX queue 0 to CPUs 0-1, TX queue 1 to CPUs 2-3, etc.
echo 3   | sudo tee /sys/class/net/eth0/queues/tx-0/xps_cpus
echo c   | sudo tee /sys/class/net/eth0/queues/tx-1/xps_cpus
echo 30  | sudo tee /sys/class/net/eth0/queues/tx-2/xps_cpus
echo c0  | sudo tee /sys/class/net/eth0/queues/tx-3/xps_cpus

Keep mapping simple and consistent with app/worker CPU placement.
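The hand-written masks above (3, c, 30, c0) follow one pattern: each TX queue gets the next pair of CPUs. They can be generated instead of typed (a sketch; assumes at most 63 CPUs and fewer than 10 queues, since the shell glob orders tx-10 before tx-2):

```shell
#!/bin/sh
# Give TX queue i the i-th group of CPUS_PER_Q consecutive CPUs.
# IFACE and CPUS_PER_Q are assumptions; root is needed to take effect.
IFACE="${IFACE:-eth0}"
CPUS_PER_Q="${CPUS_PER_Q:-2}"

# xps_mask QUEUE -> hex CPU bitmap for that queue's CPU group
xps_mask() {
    printf '%x\n' $(( ((1 << CPUS_PER_Q) - 1) << ($1 * CPUS_PER_Q) ))
}

q=0
for f in /sys/class/net/"$IFACE"/queues/tx-*/xps_cpus; do
    [ -w "$f" ] && xps_mask "$q" > "$f"
    q=$((q + 1))
done
```

For queues 0-3 this emits exactly the 3 / c / 30 / c0 mapping from the example.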


5) Optional advanced controls

RPS Flow Limit (protect small flows under RX CPU saturation)

# Enable on selected CPUs (bitmap)
echo ff | sudo tee /proc/sys/net/core/flow_limit_cpu_bitmap

# Queue depth guardrail often tuned with this
sudo sysctl -w net.core.netdev_max_backlog=4096

Use only when overload behavior justifies it; measure drop patterns carefully.

Coalescing (NIC interrupt moderation)

ethtool -c eth0
# then cautiously tune with ethtool -C if needed

Coalescing can lower CPU cost but may add latency; optimize for your SLO, not raw pps.


6) Observability checklist

Track before/after at minimum:

  1. p50/p99 request latency from the application's point of view.
  2. Per-CPU NET_RX/NET_TX counts (/proc/softirqs) and IRQ distribution (/proc/interrupts).
  3. Drops: per-queue counters in ethtool -S and softnet backlog drops (/proc/net/softnet_stat).

Success signal: per-CPU softirq load flattens and p99 falls without a throughput regression.


7) Common mistakes

  1. Turning on every feature at once
    You lose attribution and can’t tell what helped.

  2. Queue-count cargo culting
    More queues is not always better.

  3. Ignoring NUMA locality
    Cross-node steering can erase gains.

  4. Using RPS where RSS already maps well
    Extra IPIs, little benefit.

  5. No persistence plan
    sysfs/procfs tweaks disappear after reboot unless automated.


8) Practical rollout template

  1. Baseline metrics and IRQ/softirq snapshots.
  2. Tune queue count + IRQ affinity only.
  3. Re-measure.
  4. Add RPS selectively where imbalance remains.
  5. Add RFS only if locality still poor.
  6. Add XPS for TX contention cases.
  7. Persist config (systemd unit / tuned profile / provisioning script).

Treat steering as a control loop, not a one-time tweak.
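For step 7, one persistence option is a oneshot systemd unit that re-applies the settings at boot, once the NIC device exists. The unit and script names below are placeholders; a tuned profile or your provisioning tool are equivalent choices.

```ini
# /etc/systemd/system/nic-steering.service (hypothetical name)
[Unit]
Description=Apply NIC queue steering (RSS/IRQ/RPS/RFS/XPS)
# Wait for the NIC device; adjust eth0 to your interface
After=sys-subsystem-net-devices-eth0.device
BindsTo=sys-subsystem-net-devices-eth0.device

[Service]
Type=oneshot
RemainAfterExit=yes
# Script collecting the ethtool/sysfs commands from sections 4.1-4.5
ExecStart=/usr/local/sbin/nic-steering.sh

[Install]
WantedBy=multi-user.target
```

Enable it with systemctl enable nic-steering.service and confirm after a reboot that /proc/interrupts and the sysfs masks still match your plan.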


Closing

Linux network performance on modern servers is mostly a placement problem. If you stage tuning in this order—RSS/IRQ → RPS/RFS → XPS—you usually get better tail behavior with fewer surprises than random “sysctl bundle” tuning.