Linux Scheduler Policy Selection Playbook (SCHED_OTHER / FIFO / RR / DEADLINE)
Why this matters
For latency-sensitive services, scheduler policy is one of the highest-leverage choices after architecture.
Get it right, and tail latency becomes predictable. Get it wrong, and one runaway thread can starve the box.
This playbook is for practical policy selection under production constraints.
Mental model: class precedence first
On Linux, scheduler classes are not equal peers.
In broad priority order:
- SCHED_DEADLINE (earliest-deadline-first + CBS bandwidth control)
- SCHED_FIFO / SCHED_RR (fixed-priority real-time)
- SCHED_OTHER / SCHED_BATCH / SCHED_IDLE (normal/fair classes)
Implication: admitting RT/DEADLINE threads changes system-level fairness. You are not only tuning one process—you are redefining who can run.
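This precedence can be checked from userspace. A minimal sketch using Python's stdlib scheduling interface (Linux-only; note that CPython exposes no os.SCHED_DEADLINE constant, so a DEADLINE task would show up here as an unknown numeric policy):

```python
import os

# Policy constants exposed by CPython on Linux (see sched(7)).
POLICY_NAMES = {
    os.SCHED_OTHER: "SCHED_OTHER",
    os.SCHED_BATCH: "SCHED_BATCH",
    os.SCHED_IDLE: "SCHED_IDLE",
    os.SCHED_FIFO: "SCHED_FIFO",
    os.SCHED_RR: "SCHED_RR",
}

def policy_of(pid: int = 0) -> str:
    """Name the scheduling policy of a process (pid 0 = calling process)."""
    policy = os.sched_getscheduler(pid)
    # SCHED_DEADLINE has no os.* constant; it falls through as unknown(...).
    return POLICY_NAMES.get(policy, f"unknown({policy})")

print(policy_of())  # a normal process reports SCHED_OTHER
```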
Start here: policy decision table
Use this as your default decision path.
- General backend services, mixed I/O, bursty request flow
  - Start with SCHED_OTHER + affinity/isolation + cgroup cpu.weight/cpu.max
- Control loops needing strict fixed-priority preemption
  - Consider SCHED_FIFO (or SCHED_RR if equal-priority peers must time-slice)
- Periodic/sporadic jobs with explicit runtime/deadline/period guarantees
  - Consider SCHED_DEADLINE
- Best-effort batch
  - Keep on SCHED_OTHER/SCHED_BATCH; avoid RT unless proven necessary
If unsure, default to SCHED_OTHER and optimize CPU topology + contention first.
Policy profiles
1) SCHED_OTHER (default, safest baseline)
Best for:
- multi-tenant services,
- request/response systems with variable work,
- systems where fairness and recovery are more important than strict preemption.
Practical levers before touching RT:
- CPU affinity / NUMA locality,
- IRQ pinning,
- cgroup v2 cpu.weight and cpu.max,
- lock contention reduction,
- allocator and memory locality fixes.
Most “need RT” incidents are actually “need less contention/jitter” incidents.
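As a concrete example of the first lever, pinning a worker to one core needs no privileges for your own process. A minimal stdlib sketch (Linux-only; the single-core choice is illustrative):

```python
import os

# Snapshot the allowed CPU set, pin to a single core, verify, restore.
all_cpus = os.sched_getaffinity(0)        # 0 = calling process
target = {min(all_cpus)}                  # illustrative: dedicate one core
os.sched_setaffinity(0, target)
assert os.sched_getaffinity(0) == target  # kernel confirms the new mask
os.sched_setaffinity(0, all_cpus)         # restore the original mask
```

The same mask can be set externally with taskset -cp, or per-subtree via cgroup cpuset controls.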
2) SCHED_FIFO (strict fixed-priority, no timeslice)
Best for:
- short critical sections that must preempt normal tasks immediately,
- event loops where a bounded response time matters more than fairness.
Risk profile:
- no round-robin among same-priority tasks,
- a badly behaved high-priority thread can monopolize a CPU until it blocks, yields, or is preempted by a higher-priority task.
Use only with explicit watchdogs and bounded critical work.
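A hedged sketch of a bounded FIFO promotion via os.sched_setscheduler; the priority 50 and the immediate demotion are illustrative, and promotion requires CAP_SYS_NICE or root:

```python
import os

def try_fifo(pid: int = 0, prio: int = 50) -> bool:
    """Attempt SCHED_FIFO promotion; return False if privileges are missing."""
    try:
        os.sched_setscheduler(pid, os.SCHED_FIFO, os.sched_param(prio))
        return True
    except PermissionError:
        return False  # no CAP_SYS_NICE: the kernel refuses RT promotion

if try_fifo():
    # Demo only: demote right back. In production, promotion must be paired
    # with a watchdog, since a runaway FIFO thread can starve its CPU.
    os.sched_setscheduler(0, os.SCHED_OTHER, os.sched_param(0))
```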
3) SCHED_RR (fixed-priority + quantum at same priority)
Best for:
- same as FIFO but with multiple equal-priority peers that should share CPU.
Risk profile:
- still real-time class (can starve normal tasks if overused),
- quantum behavior can add jitter if mis-sized.
Choose RR when FIFO fairness problems appear among same-priority workers.
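The RR quantum is a system-wide sysctl, not a per-task knob. A small sketch for reading it (value in milliseconds; writing 0 resets it to the kernel default):

```python
from pathlib import Path
from typing import Optional

def rr_timeslice_ms() -> Optional[int]:
    """Read the SCHED_RR quantum; None if the sysctl is not exposed."""
    p = Path("/proc/sys/kernel/sched_rr_timeslice_ms")
    return int(p.read_text()) if p.exists() else None
```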
4) SCHED_DEADLINE (runtime/deadline/period contracts)
Best for:
- periodic/sporadic compute where timing contracts are explicit,
- workloads that can be modeled as (runtime, deadline, period).
Kernel constraints include:
- runtime <= deadline <= period,
- admission control may reject unschedulable reservations (EBUSY),
- typically requires a privileged capability (e.g., CAP_SYS_NICE).
This is the strongest model, but the easiest to misconfigure if workload math is weak.
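The workload math can be sanity-checked before touching the kernel. A first-order sketch of what admission control approximates: per-task parameter ordering plus a total-utilization cap, whose default of 0.95 comes from sched_rt_runtime_us / sched_rt_period_us = 950000 / 1000000:

```python
def admissible(tasks, cap: float = 0.95) -> bool:
    """First-order admission check for (runtime_ns, deadline_ns, period_ns) tasks.

    Mirrors two kernel-side conditions: runtime <= deadline <= period per task,
    and total utilization sum(runtime / period) within the RT bandwidth cap.
    The real kernel test is per-CPU/root-domain aware; this is a planning aid.
    """
    for runtime, deadline, period in tasks:
        if not (runtime <= deadline <= period):
            return False
    return sum(r / p for r, _, p in tasks) <= cap

# 1 ms of budget every 5 ms -> utilization 0.2, well under the 0.95 cap.
print(admissible([(1_000_000, 5_000_000, 5_000_000)]))  # True
```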
Production guardrails (non-optional)
- Protect rescue capacity for non-RT work
- Validate RT throttling settings (sched_rt_runtime_us, sched_rt_period_us) and policy intent.
- Isolate critical threads
- Use cpuset/affinity to avoid accidental interference.
- Bound work per activation
- RT/DEADLINE tasks must have strict upper bounds on per-cycle compute.
- Add watchdog + demotion path
- If loop time exceeds threshold, alert and demote policy automatically.
- Canary first
- Never roll RT policy globally in one step.
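The watchdog + demotion guardrail can be sketched in a few lines; the 2 ms budget and the in-process demotion path are placeholders for whatever your loop and alerting actually need:

```python
import os
import time

LOOP_BUDGET_S = 0.002  # illustrative per-cycle bound; tune to your loop

def guarded_cycle(work) -> bool:
    """Run one cycle; on overrun, drop out of the RT class and report False."""
    start = time.monotonic()
    work()
    if time.monotonic() - start > LOOP_BUDGET_S:
        # Demotion path: leave the RT class so the host stays recoverable,
        # then let monitoring raise the alert.
        if os.sched_getscheduler(0) in (os.SCHED_FIFO, os.SCHED_RR):
            os.sched_setscheduler(0, os.SCHED_OTHER, os.sched_param(0))
        return False
    return True
```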
Minimal command toolbox
Inspect current policy/priority:
chrt -p <pid>
Set a process to FIFO with priority 50:
sudo chrt -f -p 50 <pid>
Launch with DEADLINE parameters (priority must be 0):
sudo chrt -d --sched-runtime 1000000 --sched-deadline 5000000 --sched-period 5000000 0 ./app
The DEADLINE parameters are given in nanoseconds: here, a 1 ms runtime budget within a 5 ms deadline and period.
cgroup v2 + scheduler: practical layering
Use scheduler policy and cgroup controls together:
- Policy defines who can preempt whom.
- cgroup CPU controller defines how much bandwidth each subtree gets.
Typical safe layering:
- Keep most services on SCHED_OTHER.
- Shape competition with cpu.weight and optional cpu.max.
- Reserve isolated CPUs for special low-latency workers.
- Introduce RT/DEADLINE only for the narrow critical path.
This avoids the common anti-pattern of solving every latency issue with RT priority inflation.
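A sketch of that layering as data plus a dry-run applier. The slice names and numbers are hypothetical; note that cpu.max is written as "<quota> <period>" in microseconds, so "200000 100000" caps a subtree at two CPUs:

```python
from pathlib import Path

# Hypothetical cgroup v2 plan: weights shape competition, cpu.max hard-caps.
PLAN = {
    "services.slice": {"cpu.weight": "100"},
    "batch.slice": {"cpu.weight": "25", "cpu.max": "200000 100000"},
}

def apply_plan(root: str = "/sys/fs/cgroup", dry_run: bool = True) -> list:
    """Render (and, with dry_run=False, perform) the cgroup knob writes."""
    writes = []
    for group, knobs in PLAN.items():
        for knob, value in knobs.items():
            path = Path(root, group, knob)
            writes.append(f"{path} <- {value}")
            if not dry_run:
                path.write_text(value)  # needs root and an existing cgroup
    return writes
```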
Validation checklist (before and after policy change)
- p99/p999 latency by endpoint or loop cycle
- run queue pressure (cpu.pressure / PSI)
- throttling counters (nr_throttled, throttled_usec where relevant)
- deadline miss / overrun signals (for DEADLINE workloads)
- CPU steal time and IRQ saturation on isolated cores
- host recoverability under fault injection (can you still ssh, restart, roll back?)
If recoverability fails, rollout fails—regardless of benchmark gains.
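The PSI item on the checklist is cheap to collect. A sketch that parses the "some avg10=... total=..." line format of /proc/pressure/cpu (present on kernels built with PSI):

```python
from pathlib import Path

def parse_psi(text: str) -> dict:
    """Parse a PSI block into {"some"/"full": {"avg10": ..., "total": ...}}."""
    out = {}
    for line in text.strip().splitlines():
        kind, *fields = line.split()
        out[kind] = {k: float(v) for k, v in (f.split("=") for f in fields)}
    return out

psi = Path("/proc/pressure/cpu")
if psi.exists():  # absent if the kernel lacks PSI support
    print("cpu some avg10:", parse_psi(psi.read_text())["some"]["avg10"])
```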
Common failure patterns
Priority inversion disguised as random jitter
- Fix locking/resource ordering before raising priorities.
RT policy applied to too many threads
- Result: normal services starve; incident response gets harder.
DEADLINE reservations copied from docs, not measured WCET
- Admission passes in staging, misses explode in production.
No rollback primitive
- Real-time tuning without one-command rollback is operational debt.
Practical recommendation
For most production stacks:
- Win first with topology/affinity/cgroup tuning on SCHED_OTHER.
- Promote only a tiny critical set to SCHED_FIFO or SCHED_RR if strictly needed.
- Use SCHED_DEADLINE only when you can express real timing contracts and monitor them continuously.
- Treat scheduler policy as a reliability feature, not a benchmark trick.
References
- sched(7), Linux man-pages: https://man7.org/linux/man-pages/man7/sched.7.html
- chrt(1), Linux man-pages: https://man7.org/linux/man-pages/man1/chrt.1.html
- Linux kernel docs — SCHED_DEADLINE: https://docs.kernel.org/scheduler/sched-deadline.html
- Linux kernel docs — RT group scheduling / throttling context: https://docs.kernel.org/scheduler/sched-rt-group.html
- Linux kernel docs — cgroup v2: https://docs.kernel.org/admin-guide/cgroup-v2.html
- cgroup2 CPU controller overview: https://facebookmicrosites.github.io/cgroup2/docs/cpu-controller.html