Linux Scheduler Policy Selection Playbook (SCHED_OTHER / FIFO / RR / DEADLINE)

2026-03-14 · software

Why this matters

For latency-sensitive services, scheduler policy is one of the highest-leverage choices after architecture.

Get it right, and tail latency becomes predictable. Get it wrong, and one runaway thread can starve the box.

This playbook is for practical policy selection under production constraints.


Mental model: class precedence first

On Linux, scheduler classes are not equal peers.

In broad priority order:

  1. SCHED_DEADLINE (earliest-deadline-first + CBS bandwidth control)
  2. SCHED_FIFO / SCHED_RR (fixed-priority real-time)
  3. SCHED_OTHER / SCHED_BATCH / SCHED_IDLE (normal/fair classes)

Implication: admitting RT/DEADLINE threads changes system-level fairness. You are not only tuning one process—you are redefining who can run.
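
Per-task policy membership can be inspected from Python's os module; a minimal sketch (the policy constants are Linux-specific, and SCHED_DEADLINE has no os-level constant, so it falls through to the numeric fallback):

```python
import os

# Map Linux policy constants to names (Linux-only constants).
POLICY_NAMES = {
    os.SCHED_OTHER: "SCHED_OTHER",
    os.SCHED_BATCH: "SCHED_BATCH",
    os.SCHED_IDLE: "SCHED_IDLE",
    os.SCHED_FIFO: "SCHED_FIFO",
    os.SCHED_RR: "SCHED_RR",
}

def describe(pid: int = 0) -> str:
    """Return the scheduling policy name for pid (0 = calling process)."""
    policy = os.sched_getscheduler(pid)
    # SCHED_DEADLINE is not exposed as an os.* constant; report the raw number.
    return POLICY_NAMES.get(policy, f"policy #{policy}")

print(describe())  # an ordinary unprivileged process reports SCHED_OTHER
```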


Start here: policy decision table

Use this as your default decision path.

  Workload shape                                      → Policy
  General services, batch, best-effort                → SCHED_OTHER
  One short, bounded, most-urgent critical path       → SCHED_FIFO
  Several equally critical workers at one priority    → SCHED_RR
  Measurable runtime/deadline/period contracts        → SCHED_DEADLINE

If unsure, default to SCHED_OTHER and optimize CPU topology + contention first.


Policy profiles

1) SCHED_OTHER (default, safest baseline)

Best for:

  • General-purpose services, batch jobs, and anything without a hard latency contract.
  • Workloads where overall fairness and recoverability matter more than worst-case latency.

Practical levers before touching RT:

  • CPU affinity and NUMA-aware placement for hot threads.
  • IRQ steering away from latency-sensitive cores.
  • nice values and cgroup cpu.weight to shape competition.

Most “need RT” incidents are actually “need less contention/jitter” incidents.
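
The affinity lever is available without leaving Python; a sketch that pins the calling process to a single allowed core and then restores the original mask (the choice of lowest-numbered core is illustrative):

```python
import os

def pin_to_one_cpu() -> set:
    """Pin the calling process to one allowed CPU; return the original mask."""
    original = os.sched_getaffinity(0)   # set of CPUs we may currently run on
    target = {min(original)}             # pick the lowest allowed CPU
    os.sched_setaffinity(0, target)      # scheduler may now use only that CPU
    return original

old = pin_to_one_cpu()
assert os.sched_getaffinity(0) == {min(old)}
os.sched_setaffinity(0, old)             # restore the original mask
```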

2) SCHED_FIFO (strict fixed-priority, no timeslice)

Best for:

  • A small set of short, bounded, latency-critical threads that must preempt everything below them.

Risk profile:

  • No timeslice: a spinning FIFO thread can monopolize its CPU until it blocks or a higher priority preempts it.
  • Priority inversion when an RT thread waits on a lock held by a starved normal task.

Use only with explicit watchdogs and bounded critical work.
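
The watchdog-plus-demotion pattern can be sketched in a few lines of Python (the 10 ms budget is an illustrative number; demoting to SCHED_OTHER never requires privileges, so the escape hatch always works):

```python
import os
import time

LOOP_BUDGET_S = 0.010  # illustrative per-iteration budget: 10 ms

def demote_to_other(pid: int = 0) -> None:
    """Drop a runaway (would-be FIFO) task back to the fair class."""
    os.sched_setscheduler(pid, os.SCHED_OTHER, os.sched_param(0))

def critical_loop(iterations: int) -> bool:
    """Run the bounded critical section; demote and stop on budget overrun."""
    for _ in range(iterations):
        start = time.monotonic()
        # ... bounded critical work would go here ...
        if time.monotonic() - start > LOOP_BUDGET_S:
            demote_to_other()  # fail safe: give the CPU back to everyone else
            return False
    return True
```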

3) SCHED_RR (fixed-priority + quantum at same priority)

Best for:

  • Several equally critical real-time workers that must share one priority level without mutual starvation.

Risk profile:

  • Same starvation risk toward lower-priority and normal tasks as FIFO; the quantum only rotates peers at the same priority.

Choose RR when FIFO fairness problems appear among same-priority workers.

4) SCHED_DEADLINE (runtime/deadline/period contracts)

Best for:

  • Periodic work with a measured worst-case execution time (WCET) and an explicit timing contract.

Kernel constraints include:

  • runtime ≤ deadline ≤ period must hold for each task.
  • Admission control rejects task sets whose summed bandwidth exceeds capacity.
  • A DEADLINE task cannot fork unless it resets policy on fork (fork otherwise fails).

This is the strongest model, but the easiest to misconfigure if workload math is weak.
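
The workload math can be sanity-checked before touching the kernel at all; a rough single-CPU admission sketch (the 0.95 bound mirrors the default RT throttling headroom and is an assumption here; the kernel's real admission test operates per root domain):

```python
def admits(tasks, bound=0.95):
    """Rough single-CPU admission check for SCHED_DEADLINE task sets.

    tasks: iterable of (runtime_ns, deadline_ns, period_ns) tuples.
    Each task must satisfy runtime <= deadline <= period, and the
    summed bandwidth runtime/period must stay under `bound`.
    """
    total = 0.0
    for runtime, deadline, period in tasks:
        if not (0 < runtime <= deadline <= period):
            return False        # malformed contract
        total += runtime / period
    return total <= bound

# 1 ms of runtime every 5 ms period -> 20% bandwidth: admissible.
assert admits([(1_000_000, 5_000_000, 5_000_000)])
# Adding a 4 ms / 5 ms hog pushes total bandwidth to 100%: rejected.
assert not admits([(1_000_000, 5_000_000, 5_000_000),
                   (4_000_000, 5_000_000, 5_000_000)])
```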


Production guardrails (non-optional)

  1. Protect rescue capacity for non-RT work
    • Validate RT throttling settings (sched_rt_runtime_us, sched_rt_period_us) and policy intent.
  2. Isolate critical threads
    • Use cpuset/affinity to avoid accidental interference.
  3. Bound work per activation
    • RT/DEADLINE tasks must have strict upper bounds on per-cycle compute.
  4. Add watchdog + demotion path
    • If loop time exceeds threshold, alert and demote policy automatically.
  5. Canary first
    • Never roll RT policy globally in one step.
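
Guardrail 1 can be checked mechanically; a sketch that computes the non-RT rescue fraction from the two sysctls (the procfs paths are the standard locations; the pure function is testable without reading them):

```python
def rescue_fraction(rt_runtime_us: int, rt_period_us: int) -> float:
    """Fraction of each RT period the kernel reserves for non-RT tasks.

    rt_runtime_us == -1 disables throttling: no guaranteed rescue capacity.
    """
    if rt_runtime_us < 0:
        return 0.0
    return 1.0 - rt_runtime_us / rt_period_us

def read_sysctl(name: str) -> int:
    """Read an integer sysctl from procfs (Linux-only)."""
    with open(f"/proc/sys/kernel/{name}") as f:
        return int(f.read())

# Kernel defaults: 950000 us of RT time per 1000000 us period -> 5% rescue.
assert abs(rescue_fraction(950_000, 1_000_000) - 0.05) < 1e-9
assert rescue_fraction(-1, 1_000_000) == 0.0
```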

Minimal command toolbox

Inspect current policy/priority:

chrt -p <pid>

Set a process to FIFO with priority 50:

sudo chrt -f -p 50 <pid>

Launch with DEADLINE parameters (priority must be 0):

sudo chrt -d --sched-runtime 1000000 --sched-deadline 5000000 --sched-period 5000000 0 ./app

The --sched-runtime, --sched-deadline, and --sched-period values are expressed in nanoseconds.


cgroup v2 + scheduler: practical layering

Use scheduler policy and cgroup controls together: policy decides who preempts whom, while cpu.weight and cpu.max decide how much CPU each group may consume.

Typical safe layering:

  1. Keep most services on SCHED_OTHER.
  2. Shape competition with cpu.weight and optional cpu.max.
  3. Reserve isolated CPUs for special low-latency workers.
  4. Introduce RT/DEADLINE only for the narrow critical path.

This avoids the common anti-pattern of solving every latency issue with RT priority inflation.
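
The cpu.max knob in step 2 takes a "quota period" string in microseconds; a small helper that formats it from a fractional cap (the cgroup path in apply_cpu_max is a hypothetical example and requires an existing cgroup v2 directory):

```python
def cpu_max_value(fraction: float, period_us: int = 100_000) -> str:
    """Format a cgroup v2 cpu.max value: '<quota_us> <period_us>' or 'max <period_us>'."""
    if fraction >= 1.0:
        return f"max {period_us}"          # uncapped
    quota_us = int(fraction * period_us)
    return f"{quota_us} {period_us}"

def apply_cpu_max(cgroup_dir: str, fraction: float) -> None:
    """Write the cap; cgroup_dir is a hypothetical path such as
    /sys/fs/cgroup/myservice and must already exist."""
    with open(f"{cgroup_dir}/cpu.max", "w") as f:
        f.write(cpu_max_value(fraction))

assert cpu_max_value(0.5) == "50000 100000"
assert cpu_max_value(1.0) == "max 100000"
```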


Validation checklist (before and after policy change)

  • Tail latency (p99/p99.9) and throughput under representative load, before vs. after.
  • Interactive access (SSH, console, monitoring agents) stays responsive at peak.
  • The demotion/rollback command works from a normal, non-RT shell.
  • RT throttling counters and starvation signals stay clean after a soak period.

If recoverability fails, rollout fails—regardless of benchmark gains.


Common failure patterns

  1. Priority inversion disguised as random jitter
    • Fix locking/resource ordering before raising priorities.
  2. RT policy applied to too many threads
    • Result: normal services starve; incident response gets harder.
  3. DEADLINE reservations copied from docs, not measured WCET
    • Admission passes in staging, misses explode in production.
  4. No rollback primitive
    • Real-time tuning without one-command rollback is operational debt.
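
A rollback primitive for pattern 4 can be as small as "find every RT task and demote it"; a sketch that scans /proc (demoting other users' tasks requires root, and PIDs that vanish mid-scan are skipped):

```python
import os

RT_POLICIES = {os.SCHED_FIFO, os.SCHED_RR}

def find_rt_pids():
    """Return pids currently running under a fixed-priority RT policy."""
    rt = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            if os.sched_getscheduler(int(entry)) in RT_POLICIES:
                rt.append(int(entry))
        except (ProcessLookupError, PermissionError):
            pass  # process exited or is inaccessible; skip it
    return rt

def demote_all():
    """One-command rollback: push every RT task back to SCHED_OTHER."""
    for pid in find_rt_pids():
        try:
            os.sched_setscheduler(pid, os.SCHED_OTHER, os.sched_param(0))
        except (ProcessLookupError, PermissionError):
            pass  # best effort; a real tool would report these failures
```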

Practical recommendation

For most production stacks:

  1. Win first with topology/affinity/cgroup tuning on SCHED_OTHER.
  2. Promote only a tiny critical set to SCHED_FIFO or SCHED_RR if strictly needed.
  3. Use SCHED_DEADLINE only when you can express real timing contracts and monitor them continuously.
  4. Treat scheduler policy as a reliability feature, not a benchmark trick.
