Linux Scheduler Policy Selection Playbook (SCHED_OTHER / FIFO / RR / DEADLINE)
Why this matters
For latency-sensitive services, scheduler policy is one of the highest-leverage choices after architecture.
Get it right, and tail latency becomes predictable. Get it wrong, and one runaway thread can starve the box.
This playbook is for practical policy selection under production constraints.
Mental model: class precedence first
On Linux, scheduler classes are not equal peers.
In broad priority order:
- SCHED_DEADLINE (earliest-deadline-first + CBS bandwidth control)
- SCHED_FIFO / SCHED_RR (fixed-priority real-time)
- SCHED_OTHER / SCHED_BATCH / SCHED_IDLE (normal/fair classes)
Implication: admitting RT/DEADLINE threads changes system-level fairness. You are not only tuning one process—you are redefining who can run.
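This precedence can be checked from userspace. A minimal sketch using Python's stdlib scheduling interface (Linux-only; note that CPython exposes no os.SCHED_DEADLINE constant, so a DEADLINE task would show up here as an unknown numeric policy):

```python
import os

# Policy constants exposed by CPython on Linux (see sched(7)).
POLICY_NAMES = {
    os.SCHED_OTHER: "SCHED_OTHER",
    os.SCHED_BATCH: "SCHED_BATCH",
    os.SCHED_IDLE: "SCHED_IDLE",
    os.SCHED_FIFO: "SCHED_FIFO",
    os.SCHED_RR: "SCHED_RR",
}

def policy_of(pid: int = 0) -> str:
    """Name the scheduling policy of a process (pid 0 = calling process)."""
    policy = os.sched_getscheduler(pid)
    # SCHED_DEADLINE has no os.* constant; it falls through as unknown(...).
    return POLICY_NAMES.get(policy, f"unknown({policy})")

print(policy_of())  # a normal process reports SCHED_OTHER
```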
Start here: policy decision table
Use this as your default decision path.
- General backend services, mixed I/O, bursty request flow
  - Start with SCHED_OTHER + affinity/isolation + cgroup cpu.weight/cpu.max
- Control loops needing strict fixed-priority preemption
  - Consider SCHED_FIFO (or SCHED_RR if equal-priority peers must time-slice)
- Periodic/sporadic jobs with explicit runtime/deadline/period guarantees
  - Consider SCHED_DEADLINE
- Best-effort batch
  - Keep on SCHED_OTHER/SCHED_BATCH; avoid RT unless proven necessary
If unsure, default to SCHED_OTHER and optimize CPU topology + contention first.
Policy profiles
1) SCHED_OTHER (default, safest baseline)
Best for:
- multi-tenant services,
- request/response systems with variable work,
- systems where fairness and recovery are more important than strict preemption.
Practical levers before touching RT:
- CPU affinity / NUMA locality,
- IRQ pinning,
- cgroup v2 cpu.weight and cpu.max,
- lock contention reduction,
- allocator and memory locality fixes.
Most “need RT” incidents are actually “need less contention/jitter” incidents.
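As a concrete example of the first lever, pinning a worker to one core needs no privileges for your own process. A minimal stdlib sketch (Linux-only; the single-core choice is illustrative):

```python
import os

# Snapshot the allowed CPU set, pin to a single core, verify, restore.
all_cpus = os.sched_getaffinity(0)        # 0 = calling process
target = {min(all_cpus)}                  # illustrative: dedicate one core
os.sched_setaffinity(0, target)
assert os.sched_getaffinity(0) == target  # kernel confirms the new mask
os.sched_setaffinity(0, all_cpus)         # restore the original mask
```

The same mask can be set externally with taskset -cp, or per-subtree via cgroup cpuset controls.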
2) SCHED_FIFO (strict fixed-priority, no timeslice)
Best for:
- short critical sections that must preempt normal tasks immediately,
- event loops where a bounded response time matters more than fairness.
Risk profile:
- no round-robin among same-priority tasks,
- a badly behaved high-priority thread can monopolize a CPU until it blocks, yields, or is preempted by a higher-priority task.
Use only with explicit watchdogs and bounded critical work.
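A hedged sketch of a bounded FIFO promotion via os.sched_setscheduler; the priority 50 and the immediate demotion are illustrative, and promotion requires CAP_SYS_NICE or root:

```python
import os

def try_fifo(pid: int = 0, prio: int = 50) -> bool:
    """Attempt SCHED_FIFO promotion; return False if privileges are missing."""
    try:
        os.sched_setscheduler(pid, os.SCHED_FIFO, os.sched_param(prio))
        return True
    except PermissionError:
        return False  # no CAP_SYS_NICE: the kernel refuses RT promotion

if try_fifo():
    # Demo only: demote right back. In production, promotion must be paired
    # with a watchdog, since a runaway FIFO thread can starve its CPU.
    os.sched_setscheduler(0, os.SCHED_OTHER, os.sched_param(0))
```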
3) SCHED_RR (fixed-priority + quantum at same priority)
Best for:
- same as FIFO but with multiple equal-priority peers that should share CPU.
Risk profile:
- still real-time class (can starve normal tasks if overused),
- quantum behavior can add jitter if mis-sized.
Choose RR when FIFO fairness problems appear among same-priority workers.
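The RR quantum is a system-wide sysctl, not a per-task knob. A small sketch for reading it (value in milliseconds; writing 0 resets it to the kernel default):

```python
from pathlib import Path
from typing import Optional

def rr_timeslice_ms() -> Optional[int]:
    """Read the SCHED_RR quantum; None if the sysctl is not exposed."""
    p = Path("/proc/sys/kernel/sched_rr_timeslice_ms")
    return int(p.read_text()) if p.exists() else None
```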
4) SCHED_DEADLINE (runtime/deadline/period contracts)
Best for:
- periodic/sporadic compute where timing contracts are explicit,
- workloads that can be modeled as (runtime, deadline, period).
Kernel constraints include:
- runtime <= deadline <= period,
- admission control may reject unschedulable reservations (EBUSY),
- typically requires a privileged capability (e.g., CAP_SYS_NICE).
This is the strongest model, but the easiest to misconfigure if workload math is weak.
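The workload math can be sanity-checked before touching the kernel. A first-order sketch of what admission control approximates: per-task parameter ordering plus a total-utilization cap, whose default of 0.95 comes from sched_rt_runtime_us / sched_rt_period_us = 950000 / 1000000:

```python
def admissible(tasks, cap: float = 0.95) -> bool:
    """First-order admission check for (runtime_ns, deadline_ns, period_ns) tasks.

    Mirrors two kernel-side conditions: runtime <= deadline <= period per task,
    and total utilization sum(runtime / period) within the RT bandwidth cap.
    The real kernel test is per-CPU/root-domain aware; this is a planning aid.
    """
    for runtime, deadline, period in tasks:
        if not (runtime <= deadline <= period):
            return False
    return sum(r / p for r, _, p in tasks) <= cap

# 1 ms of budget every 5 ms -> utilization 0.2, well under the 0.95 cap.
print(admissible([(1_000_000, 5_000_000, 5_000_000)]))  # True
```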
Production guardrails (non-optional)
- Protect rescue capacity for non-RT work
- Validate RT throttling settings (sched_rt_runtime_us, sched_rt_period_us) and policy intent.
- Isolate critical threads
- Use cpuset/affinity to avoid accidental interference.
- Bound work per activation
- RT/DEADLINE tasks must have strict upper bounds on per-cycle compute.
- Add watchdog + demotion path
- If loop time exceeds threshold, alert and demote policy automatically.
- Canary first
- Never roll RT policy globally in one step.
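The watchdog + demotion guardrail can be sketched in a few lines; the 2 ms budget and the in-process demotion path are placeholders for whatever your loop and alerting actually need:

```python
import os
import time

LOOP_BUDGET_S = 0.002  # illustrative per-cycle bound; tune to your loop

def guarded_cycle(work) -> bool:
    """Run one cycle; on overrun, drop out of the RT class and report False."""
    start = time.monotonic()
    work()
    if time.monotonic() - start > LOOP_BUDGET_S:
        # Demotion path: leave the RT class so the host stays recoverable,
        # then let monitoring raise the alert.
        if os.sched_getscheduler(0) in (os.SCHED_FIFO, os.SCHED_RR):
            os.sched_setscheduler(0, os.SCHED_OTHER, os.sched_param(0))
        return False
    return True
```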
Minimal command toolbox
Inspect current policy/priority:
chrt -p <pid>
Set a process to FIFO with priority 50:
sudo chrt -f -p 50 <pid>
Launch with DEADLINE parameters (priority must be 0):
sudo chrt -d --sched-runtime 1000000 --sched-deadline 5000000 --sched-period 5000000 0 ./app
The DEADLINE parameters are given in nanoseconds: here, a 1 ms runtime budget within a 5 ms deadline and period.
cgroup v2 + scheduler: practical layering
Use scheduler policy and cgroup controls together:
- Policy defines who can preempt whom.
- cgroup CPU controller defines how much bandwidth each subtree gets.
Typical safe layering:
- Keep most services on SCHED_OTHER.
- Shape competition with cpu.weight and optional cpu.max.
- Reserve isolated CPUs for special low-latency workers.
- Introduce RT/DEADLINE only for the narrow critical path.
This avoids the common anti-pattern of solving every latency issue with RT priority inflation.
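A sketch of that layering as data plus a dry-run applier. The slice names and numbers are hypothetical; note that cpu.max is written as "<quota> <period>" in microseconds, so "200000 100000" caps a subtree at two CPUs:

```python
from pathlib import Path

# Hypothetical cgroup v2 plan: weights shape competition, cpu.max hard-caps.
PLAN = {
    "services.slice": {"cpu.weight": "100"},
    "batch.slice": {"cpu.weight": "25", "cpu.max": "200000 100000"},
}

def apply_plan(root: str = "/sys/fs/cgroup", dry_run: bool = True) -> list:
    """Render (and, with dry_run=False, perform) the cgroup knob writes."""
    writes = []
    for group, knobs in PLAN.items():
        for knob, value in knobs.items():
            path = Path(root, group, knob)
            writes.append(f"{path} <- {value}")
            if not dry_run:
                path.write_text(value)  # needs root and an existing cgroup
    return writes
```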
Validation checklist (before and after policy change)
- p99/p999 latency by endpoint or loop cycle
- run queue pressure (cpu.pressure / PSI)
- throttling counters (nr_throttled, throttled_usec where relevant)
- deadline miss / overrun signals (for DEADLINE workloads)
- CPU steal time and IRQ saturation on isolated cores
- host recoverability under fault injection (can you still ssh, restart, roll back?)
If recoverability fails, rollout fails—regardless of benchmark gains.
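The PSI item on the checklist is cheap to collect. A sketch that parses the "some avg10=... total=..." line format of /proc/pressure/cpu (present on kernels built with PSI):

```python
from pathlib import Path

def parse_psi(text: str) -> dict:
    """Parse a PSI block into {"some"/"full": {"avg10": ..., "total": ...}}."""
    out = {}
    for line in text.strip().splitlines():
        kind, *fields = line.split()
        out[kind] = {k: float(v) for k, v in (f.split("=") for f in fields)}
    return out

psi = Path("/proc/pressure/cpu")
if psi.exists():  # absent if the kernel lacks PSI support
    print("cpu some avg10:", parse_psi(psi.read_text())["some"]["avg10"])
```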
Common failure patterns
Priority inversion disguised as random jitter
- Fix locking/resource ordering before raising priorities.
RT policy applied to too many threads
- Result: normal services starve; incident response gets harder.
DEADLINE reservations copied from docs, not measured WCET
- Admission passes in staging, misses explode in production.
No rollback primitive
- Real-time tuning without one-command rollback is operational debt.
Practical recommendation
For most production stacks:
- Win first with topology/affinity/cgroup tuning on SCHED_OTHER.
- Promote only a tiny critical set to SCHED_FIFO or SCHED_RR if strictly needed.
- Use SCHED_DEADLINE only when you can express real timing contracts and monitor them continuously.
- Treat scheduler policy as a reliability feature, not a benchmark trick.
References
- sched(7), Linux man-pages: https://man7.org/linux/man-pages/man7/sched.7.html
- chrt(1), Linux man-pages: https://man7.org/linux/man-pages/man1/chrt.1.html
- Linux kernel docs — SCHED_DEADLINE: https://docs.kernel.org/scheduler/sched-deadline.html
- Linux kernel docs — RT group scheduling / throttling context: https://docs.kernel.org/scheduler/sched-rt-group.html
- Linux kernel docs — cgroup v2: https://docs.kernel.org/admin-guide/cgroup-v2.html
- cgroup2 CPU controller overview: https://facebookmicrosites.github.io/cgroup2/docs/cpu-controller.html