Linux PREEMPT_RT + Threaded IRQ Playbook (Practical Tail-Latency Control)

2026-03-18 · software


Why this matters

If you already tuned CPU isolation, IRQ affinity, and NIC queues but still see stubborn p99/p999 spikes, kernel scheduling behavior itself can be the bottleneck.

PREEMPT_RT helps by making Linux much more preemptible and moving most interrupt work into schedulable threads. That gives you better control over who runs first when latency pressure hits.

It is not a free lunch:

  - Throughput typically drops a few percent, because more code paths can be preempted and most locks can sleep.
  - Badly chosen RT priorities can starve kernel housekeeping (RCU callbacks, kworkers) outright.
  - Drivers and subsystems can behave differently under force-threaded interrupts.

Treat RT as a control-plane upgrade for latency, not a blanket speed boost.


1) Mental model: from “best effort latency” to “priority-governed latency”

Without RT:

  - Hardirqs and softirqs preempt any task, no matter how important that task is.
  - Spinlock-held kernel sections disable preemption, so low-priority work can delay high-priority work.
  - "Who runs first" under pressure is best effort, not something you can specify.

With PREEMPT_RT:

  - Nearly all kernel code is preemptible; only short raw-spinlock and interrupt-disabled regions are not.
  - Most interrupt handlers run as SCHED_FIFO kernel threads you can prioritize and pin like any other thread.
  - spinlock_t becomes a sleeping lock with priority inheritance, so priority inversion is bounded.

Practical effect: jitter sources become more visible and tunable.


2) What changes under PREEMPT_RT (operator view)

2.1 Locking behavior changes

Under PREEMPT_RT, spinlock_t and rwlock_t become sleeping locks built on rt_mutex, with priority inheritance to bound inversion; only raw_spinlock_t keeps the classic spin-with-preemption-disabled semantics.

Implication: better bounded latency, but different contention behavior than non-RT: waiters sleep and are woken instead of spinning, so lock-hot paths see more context switches.

2.2 Interrupt handling becomes more schedulable

Hardirq handlers are force-threaded (as if booted with threadirqs): each runs in a kernel thread named irq/<N>-<name>, SCHED_FIFO priority 50 by default, and most softirq work runs in thread context as well.

Implication: IRQ priority and CPU placement become first-class tuning knobs.

2.3 Scheduling policy matters more than before

Once interrupt work is schedulable, the relative SCHED_FIFO/SCHED_RR priorities of application threads, IRQ threads, and housekeeping decide who wins when a CPU is contended.

Implication: you need an explicit priority policy, not ad-hoc chrt tweaks.


3) When to use RT (and when not to)

Good candidates:

  - Market-data, order-path, and other services where p999 spikes carry direct cost
  - Audio pipelines, robotics, and industrial control loops with hard deadlines
  - Packet dataplanes that must bound per-packet latency

Use caution / reconsider:

  - Throughput-bound batch or analytics work: RT trades throughput for determinism
  - Hosts mixing many unpredictable tenants on shared cores
  - Stacks depending on drivers with known caveats under threaded IRQs

Rule of thumb: if tail spikes are expensive enough to justify extra ops complexity, RT is worth piloting.


4) Baseline before any RT rollout

Collect these on current (non-RT) canary hosts first:

  1. Request latency p50/p95/p99/p999
  2. CPU run-queue pressure and migration stats
  3. IRQ rate + per-CPU distribution
  4. Softirq pressure and ksoftirqd activity
  5. Worst-case scheduler latency (cyclictest or equivalent)
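
The worst-case number in item 5 is usually gathered with cyclictest from the rt-tests package. A hedged invocation (the flags are one reasonable starting point, not gospel):

```shell
# Scheduler-latency sweep with cyclictest (assumes the rt-tests package).
# Lengthen DUR for real baselines, e.g. DUR=4h across a peak window.
DUR=${DUR:-30s}
if command -v cyclictest >/dev/null 2>&1; then
  # one measurement thread per CPU, SCHED_FIFO 80, 200us interval, quiet summary
  cyclictest --mlockall --smp --priority=80 --interval=200 -D "$DUR" -q \
    || echo "cyclictest failed (SCHED_FIFO usually needs root)"
else
  echo "cyclictest not installed (package: rt-tests)"
fi
```

Record the reported max latency alongside your request-latency percentiles so the two baselines cover the same window.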

Quick checks:

uname -a
grep -E 'PREEMPT|HZ=' /boot/config-$(uname -r)
cat /proc/softirqs
cat /proc/interrupts
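
Raw /proc/interrupts is hard to eyeball on wide machines. A small helper (hypothetical, `irq_per_cpu` is not a standard tool) that sums the numbered IRQ lines per CPU:

```shell
# Sum interrupt counts per CPU from /proc/interrupts (numbered IRQ lines only)
irq_per_cpu() {
  awk 'NR == 1 { n = NF; next }                    # header row: one field per CPU
       $1 ~ /^[0-9]+:$/ {                          # numbered IRQ lines, e.g. "121:"
         for (i = 2; i <= n + 1; i++) sum[i-1] += $i
       }
       END { for (c = 1; c <= n; c++) printf "CPU%d: %d\n", c-1, sum[c] }' /proc/interrupts
}
irq_per_cpu
```

A lopsided distribution here is a Phase A problem to fix before any RT work.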

If you skip baseline, you won’t know whether RT improved tails or just moved bottlenecks.


5) Rollout strategy (safe sequence)

Phase A: Non-RT hygiene first

Before switching kernel type, ensure these are already sane:

  - CPU isolation for the latency-critical cores (cpusets or isolcpus)
  - Stable IRQ affinity, with irqbalance disabled or banned from those cores
  - nohz_full / rcu_nocbs on the isolated cores where appropriate
  - NUMA-local memory and NIC placement
  - Power management pinned down (C-states, frequency governor) to avoid wakeup latency

RT cannot compensate for chaotic placement.
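
This hygiene is mostly kernel cmdline work. A sketch of a grub fragment, where the CPU ranges are placeholders you must match to your own topology:

```shell
# Example /etc/default/grub fragment: isolate CPUs 2-5 for latency work,
# keep IRQ delivery and housekeeping on CPUs 0-1 by default.
GRUB_CMDLINE_LINUX="isolcpus=domain,managed_irq,2-5 nohz_full=2-5 rcu_nocbs=2-5 irqaffinity=0-1"
```

Regenerate the grub config and reboot for it to take effect; verify afterwards with `cat /proc/cmdline`.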

Phase B: Single-host RT canary

Install RT kernel on one production-like host.

Verify kernel mode:

uname -r
grep CONFIG_PREEMPT_RT /boot/config-$(uname -r)

Track same workload side-by-side vs non-RT control host.

Phase C: IRQ thread policy

Inspect IRQ threads:

ps -eLo pid,cls,rtprio,pri,psr,comm | grep -E 'irq/|softirq|ksoftirqd'

Then tune progressively:

  - Raise only the IRQ threads that feed latency-critical traffic (e.g. the NIC queues your service reads from)
  - Pin those threads to the isolated cores next to their consumers
  - Leave all other IRQ threads at the kernel default (SCHED_FIFO 50)

Avoid “everything high priority.” That usually creates hidden starvation.
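
One step of that progression can be sketched as follows; the IRQ number, priority, and CPU are hypothetical and must be adapted to your device and isolation layout:

```shell
# Promote and pin one NIC IRQ thread (all specifics below are examples).
IRQ=121                                      # hypothetical IRQ line of the critical NIC queue
PID=$(pgrep "irq/${IRQ}-" | head -n1)        # threaded handler appears as irq/<N>-<name>
if [ -n "$PID" ]; then
  chrt -f -p 70 "$PID"  2>/dev/null || echo "chrt needs root"      # SCHED_FIFO 70, above default 50
  taskset -cp 3 "$PID"  2>/dev/null || echo "taskset needs root"   # run on isolated CPU 3
  # keep hard-IRQ delivery on the same CPU as the thread
  echo 3 > "/proc/irq/${IRQ}/smp_affinity_list" 2>/dev/null || echo "affinity write needs root"
else
  echo "no threaded handler for IRQ ${IRQ} on this host"
fi
```

Change one thread at a time and re-measure before touching the next.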

Phase D: Application + IRQ priority contract

Define explicit ordering, for example:

  - Latency-critical app threads near the top
  - The IRQ threads that feed those apps just below (or just above, if wakeup latency dominates)
  - All other IRQ threads at the kernel default
  - Housekeeping, logging, and batch at SCHED_OTHER

The exact numbers matter less than a consistent hierarchy.
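
One way to encode such a contract is a tiny helper plus a written-down table; the numbers below are illustrative only, and APP_PID / NIC_IRQ_PID are hypothetical placeholders:

```shell
# One possible contract (values illustrative, not prescriptive):
#   FIFO 80  latency-critical app threads
#   FIFO 70  IRQ threads feeding those apps
#   FIFO 50  all other IRQ threads (kernel default, untouched)
#   OTHER    housekeeping, logging, batch
set_fifo() {  # set_fifo <prio> <pid> -- tolerate missing pids / lack of root
  chrt -f -p "$1" "$2" 2>/dev/null || echo "could not set FIFO $1 on pid $2"
}
# set_fifo 80 "$APP_PID"      # hypothetical PID of the critical app thread
# set_fifo 70 "$NIC_IRQ_PID"  # hypothetical PID of its NIC irq thread
```

Keeping the table in version control, next to the service config, makes the hierarchy auditable.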


6) Observability that actually catches RT failures

Latency outcomes

  - p99/p999 and max request latency per peak window, not just daily averages
  - Wakeup-to-run latency of the critical threads

Scheduler/IRQ health

  - IRQ thread CPU time and placement (did they stay where you pinned them?)
  - Run-queue delay and migrations on the isolated cores

Starvation indicators

  - RCU stall and hung-task warnings in dmesg
  - RT throttling engaging (the sched_rt_runtime_us budget being hit)
  - ksoftirqd or kworker threads on RT cores making no progress

Useful tools:

  - cyclictest (rt-tests), rtla timerlat / osnoise
  - ftrace/trace-cmd (wakeup_rt tracer), perf sched
  - bpftrace for custom wakeup and runqueue-latency probes
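
Two quick sketches for scheduler-latency tracing; availability varies (rtla ships with kernels >= 5.17, tracefs writes need root), so treat these as starting points rather than a fixed recipe:

```shell
# Per-CPU timer/wakeup latency summary, if rtla is available
if command -v rtla >/dev/null 2>&1; then
  rtla timerlat top -d 10s
fi

# ftrace: record the worst wakeup latency of the highest-priority RT task
TR=/sys/kernel/debug/tracing
if [ -w "$TR/current_tracer" ]; then
  echo wakeup_rt > "$TR/current_tracer"
  cat "$TR/tracing_max_latency"
else
  echo "tracefs not writable here (run as root to use ftrace)"
fi
```

Capture these during peak windows; quiet-hour traces rarely show the spike you care about.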


7) Common footguns

  1. No baseline control host
    You can’t distinguish real gain from placebo.

  2. Over-prioritizing everything
    RT priorities are relative scarcity, not badges.

  3. Ignoring IRQ affinity after enabling RT
    Threaded IRQs still need deliberate CPU placement.

  4. Mixing latency-critical and batch jobs on same RT cores
    Determinism collapses quickly.

  5. Judging success by average latency only
    RT is a tail-latency lever; p50 may barely move.
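
Footgun 2 often surfaces as RT throttling: by default the kernel caps how much of each period RT tasks may consume, and hitting that cap usually means a priority-policy bug rather than a kernel problem. A quick check:

```shell
# RT tasks get at most sched_rt_runtime_us out of every sched_rt_period_us
# (typically 950000 / 1000000, i.e. 950ms per 1s window; -1 means unlimited).
runtime=$(cat /proc/sys/kernel/sched_rt_runtime_us)
period=$(cat /proc/sys/kernel/sched_rt_period_us)
echo "RT budget: ${runtime}us per ${period}us window"
```

If throttling fires on your RT cores, fix the priority hierarchy first; raising the budget just hides the starvation.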


8) Minimal canary checklist (one-week loop)

  1. Pick one service and one host class.
  2. Capture 24h non-RT baseline.
  3. Switch one canary host to RT kernel.
  4. Keep affinity identical to baseline first.
  5. Tune IRQ thread priority/affinity in small steps.
  6. Compare p99/p999 + starvation signals daily.
  7. Promote only if tail gains persist through peak windows.
  8. Keep rollback path simple (kernel fallback + config revert).
