Linux cgroup v2 CPU Latency Protection Playbook (cpu.max, cpu.weight, cpuset, uclamp)

2026-03-17 · software


Why this matters

Most Linux performance incidents are not "CPU ran out" incidents. They are contention-shape incidents:

  - quota throttling that turns spare headroom into p99 cliffs
  - unfair runqueue sharing between batch and online work
  - noisy neighbors running hot on shared cores
  - frequency droop under mixed load
cgroup v2 gives practical controls to protect latency-critical workloads without pretending all services are equal.


1) Quick mental model

Use these knobs for different jobs:

  - cpu.weight: proportional sharing under contention (fairness, no cap when idle)
  - cpu.max: hard quota per period (blast-radius containment)
  - cpuset.cpus/mems: physical CPU and memory-node placement (isolation)
  - cpu.uclamp.min/max: utilization clamps that steer frequency selection

Rule of thumb: start with cpu.weight, add cpu.max only for untrusted or bursty classes, and reach for cpuset or uclamp only when an SLO demands hard guarantees.


2) Fast decision matrix

A) Background batch jobs hurting API p99
   → lower batch cpu.weight, then cap batch with cpu.max.

B) Multiple online services contending on the same node
   → tune cpu.weight ratios; avoid hard caps that cause throttling.

C) Strict noisy-neighbor isolation required
   → pin groups to disjoint cpuset.cpus (and sane cpuset.mems).

D) Latency-sensitive service with frequency-droop risk
   → raise cpu.uclamp.min on the critical group (kernel permitting).


3) Setup & discovery (10 minutes)

Check cgroup mode:

stat -fc %T /sys/fs/cgroup
# should be cgroup2fs

See available controllers:

cat /sys/fs/cgroup/cgroup.controllers
cat /sys/fs/cgroup/cgroup.subtree_control

Enable controllers at parent (example root-level):

# enable cpu + cpuset for child groups
sudo sh -c 'echo "+cpu +cpuset" > /sys/fs/cgroup/cgroup.subtree_control'

Create groups:

sudo mkdir -p /sys/fs/cgroup/api
sudo mkdir -p /sys/fs/cgroup/batch

Move a PID:

echo <PID> | sudo tee /sys/fs/cgroup/api/cgroup.procs
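To confirm a move took effect, you can read the PID's membership back from /proc. A minimal sketch, assuming a cgroup2-only host (where /proc/&lt;pid&gt;/cgroup contains a single "0::&lt;path&gt;" line); cgroup_of and PROC_ROOT are illustrative names, not standard tooling:

```shell
# Hypothetical helper: print the cgroup v2 path a PID currently belongs to,
# by parsing the "0::<path>" line in /proc/<pid>/cgroup.
PROC_ROOT="${PROC_ROOT:-/proc}"   # overridable so the helper can be dry-run

cgroup_of() {
  awk -F'::' '$1 == "0" { print $2 }' "${PROC_ROOT}/$1/cgroup"
}

# Usage: cgroup_of <PID>   # expected to print /api after the move above
```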

4) Minimal safe baseline policy

Example: protect api, constrain batch.

# API: higher share, no hard cap
echo 400 | sudo tee /sys/fs/cgroup/api/cpu.weight
echo "max 100000" | sudo tee /sys/fs/cgroup/api/cpu.max

# Batch: lower share + 2 CPU cap (period 100ms)
echo 100 | sudo tee /sys/fs/cgroup/batch/cpu.weight
echo "200000 100000" | sudo tee /sys/fs/cgroup/batch/cpu.max
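The two cpu.max values can be sanity-checked by decoding quota/period into an effective CPU count. A small sketch; cap_cpus is a made-up helper name for illustration:

```shell
# Sketch: decode a cpu.max value ("<quota> <period>" in microseconds, or
# "max <period>" for no quota) into an effective CPU cap.
cap_cpus() {
  set -- $1   # split "quota period" on whitespace
  if [ "$1" = "max" ]; then
    echo "uncapped"
  else
    awk -v q="$1" -v p="$2" 'BEGIN { printf "%.1f CPUs\n", q / p }'
  fi
}

# cap_cpus "200000 100000"   # -> 2.0 CPUs
# cap_cpus "max 100000"      # -> uncapped
```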

Interpretation:

  - api: "max 100000" means no quota over a 100ms period; cpu.weight 400 gives api a 4:1 share versus batch when they contend.
  - batch: "200000 100000" allows 200ms of CPU time per 100ms period, i.e. at most 2 CPUs' worth, and cpu.weight 100 keeps it deprioritized under contention.

5) cpuset isolation pattern (when fairness is not enough)

# ensure parent has a valid cpuset first
cat /sys/fs/cgroup/cpuset.cpus
cat /sys/fs/cgroup/cpuset.mems

# isolate API to cores 0-3, batch to 4-7 (example)
echo 0-3 | sudo tee /sys/fs/cgroup/api/cpuset.cpus
echo 0   | sudo tee /sys/fs/cgroup/api/cpuset.mems

echo 4-7 | sudo tee /sys/fs/cgroup/batch/cpuset.cpus
echo 0   | sudo tee /sys/fs/cgroup/batch/cpuset.mems

Use cpuset when strict latency SLOs justify lower average utilization efficiency.
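When partitioning cores, it helps to check that the groups together do not oversubscribe the node. A sketch that counts CPUs in a cpuset range string; count_cpus is an illustrative helper, not a kernel interface:

```shell
# Sketch: count how many CPUs a cpuset range string like "0-3,8" covers.
count_cpus() {
  echo "$1" | awk -F, '{
    n = 0
    for (i = 1; i <= NF; i++) {
      if (split($i, r, "-") == 2) n += r[2] - r[1] + 1
      else n += 1
    }
    print n
  }'
}

# count_cpus "0-3"     # -> 4
# count_cpus "0-3,8"   # -> 5
```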


6) uclamp usage (advanced, kernel-dependent)

If cpu.uclamp.min/cpu.uclamp.max files exist:

# keep critical group from dropping too low (example)
echo 25 | sudo tee /sys/fs/cgroup/api/cpu.uclamp.min

# cap non-critical burst aggressiveness (example)
echo 60 | sudo tee /sys/fs/cgroup/batch/cpu.uclamp.max

Caution: this influences scheduler utilization signals and can increase power draw. Treat as a canary-only feature first.
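Because the files are kernel-dependent, writes should be guarded rather than assumed to succeed. A defensive sketch (set_uclamp is an illustrative helper; availability depends on CONFIG_UCLAMP_TASK_GROUP, and values are percentages of maximum capacity):

```shell
# Defensive sketch: write uclamp values only when the kernel exposes the
# interface files; otherwise report and fail cleanly.
set_uclamp() {
  # $1 = cgroup dir, $2 = "min" or "max", $3 = percentage
  f="$1/cpu.uclamp.$2"
  if [ -w "$f" ]; then
    echo "$3" > "$f"
  else
    echo "uclamp not available at $1, skipping" >&2
    return 1
  fi
}

# set_uclamp /sys/fs/cgroup/api   min 25
# set_uclamp /sys/fs/cgroup/batch max 60
```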


7) Observability checklist

At minimum, track per-cgroup:

  - usage_usec, nr_throttled, and throttled_usec from cpu.stat
  - CPU pressure (PSI) some/full averages from cpu.pressure
  - service-level p95/p99 latency, attributed back to the cgroup
Quick read:

cat /sys/fs/cgroup/api/cpu.stat
cat /sys/fs/cgroup/batch/cpu.stat
cat /proc/pressure/cpu
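Raw counters are cumulative, so what matters is the delta between snapshots. A sketch that turns two cpu.stat snapshots into a throttled-time share (throttle_share is an illustrative helper; the field names match the kernel's cpu.stat output):

```shell
# Sketch: compare two cpu.stat snapshots and print throttled CPU time as
# a share of used CPU time over the interval.
throttle_share() {
  # $1 = "before" snapshot file, $2 = "after" snapshot file
  awk '
    FNR == NR { a[$1] = $2; next }
    { b[$1] = $2 }
    END {
      du = b["usage_usec"] - a["usage_usec"]
      dt = b["throttled_usec"] - a["throttled_usec"]
      if (du > 0) printf "%.1f%%\n", 100 * dt / du
      else print "no CPU usage in interval"
    }
  ' "$1" "$2"
}

# cat /sys/fs/cgroup/batch/cpu.stat > before; sleep 60
# cat /sys/fs/cgroup/batch/cpu.stat > after
# throttle_share before after
```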

Interpretation:

  - rising nr_throttled / throttled_usec on batch is expected; on api it means the cap or the weights are wrong.
  - PSI "some" climbing on api while usage still has headroom points to contention shape, not CPU exhaustion.

8) Rollout sequence (practical)

  1. Measure baseline: 24h diurnal p95/p99 + cpu.stat snapshots.
  2. Apply weights only on canary nodes.
  3. Add cpu.max caps to noisy batch classes.
  4. Use cpuset only where SLO still unstable.
  5. Use uclamp last, and only with power/thermal guardrails.
  6. Promote gradually (10% → 30% → 100%) with rollback script ready.
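Step 6 assumes a rollback script is ready before promotion. A minimal sketch that restores kernel defaults (cpu.weight 100, no quota); rollback_cpu_policy is an illustrative name, the group paths mirror this playbook's examples, and real nodes need root:

```shell
# Rollback sketch: restore default weight and remove the quota so a
# canary can be reverted in one command. The root argument allows a
# dry run against a fake directory tree.
rollback_cpu_policy() {
  root="${1:-/sys/fs/cgroup}"
  for g in "$root/api" "$root/batch"; do
    echo 100          > "$g/cpu.weight"
    echo "max 100000" > "$g/cpu.max"
  done
}

# rollback_cpu_policy            # real nodes (as root)
# rollback_cpu_policy /tmp/sim   # dry run against a fake tree
```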

9) Common mistakes

  1. Using only quota (cpu.max) for everything
    Leads to throttle storms and p99 cliffs.

  2. Skipping weight tuning
    Misses the easiest contention control lever.

  3. cpuset without parent/memory sanity
    Causes confusing task placement behavior.

  4. Treating cgroup controls as static
    Workload mix changes; policy should be periodically recalibrated.

  5. No per-cgroup telemetry
    You cannot tune what you cannot attribute.


10) One-page starter policy

If you need a default today:

This alone removes a large fraction of "mystery latency" incidents on shared Linux nodes.
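The section 4 baseline can be applied as a single idempotent script. A sketch, written as a function so it can be dry-run against a fake tree; apply_starter_policy is an illustrative name, it assumes a cgroup2-only host, and real nodes need root:

```shell
# One-shot starter sketch: higher share for api, lower share plus a
# 2-CPU cap for batch, matching the baseline policy above.
apply_starter_policy() {
  root="${1:-/sys/fs/cgroup}"
  echo "+cpu" > "$root/cgroup.subtree_control"
  mkdir -p "$root/api" "$root/batch"
  echo 400             > "$root/api/cpu.weight"
  echo "max 100000"    > "$root/api/cpu.max"
  echo 100             > "$root/batch/cpu.weight"
  echo "200000 100000" > "$root/batch/cpu.max"
}

# apply_starter_policy            # real nodes (as root)
# apply_starter_policy /tmp/sim   # dry run against a fake tree
```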


Closing

cgroup v2 CPU control is best treated as a latency-shaping system, not just a resource limiter.

When teams combine cpu.weight (fairness), cpu.max (blast-radius), and selective cpuset/uclamp (hard protection), they usually get: flatter p99 under load, contained batch interference, and far fewer unexplained latency incidents.