Linux resctrl + Intel RDT (CAT/MBA) Playbook for Latency Isolation
Date: 2026-03-19
Category: knowledge
Why this matters
In shared hosts, one memory-hungry workload can quietly destroy another workload’s p99:
- LLC thrash raises cache-miss penalties,
- memory bandwidth contention stretches service time,
- tail latency explodes even when average CPU looks “fine”.
resctrl (Linux Resource Control FS) gives you practical knobs to reduce this noisy-neighbor tax:
- CAT (Cache Allocation Technology): partition LLC ways,
- MBA (Memory Bandwidth Allocation): throttle memory bandwidth usage,
- CMT/MBM: monitor cache occupancy and bandwidth behavior.
Treat this as performance SLO protection, not as a benchmark toy.
1) Mental model: protect the critical path, constrain the bully path
For low-latency services, the target is not “maximum throughput for everyone”. It is:
- reserve enough cache residency for critical threads,
- cap/shape memory pressure from noisy jobs,
- verify with tail metrics (p95/p99/p999), not only CPU%.
Think of CAT/MBA as a QoS firewall between service classes.
2) Feature map (quick refresher)
From Linux resctrl docs and Intel RDT docs:
- CAT: control cache bitmasks (CBM) per class of service (CLOS/COS).
- CDP: separate code/data prioritization (optional).
- MBM: monitor local/total memory bandwidth counters.
- MBA: enforce bandwidth throttling by class.
- mba_MBps mode: software controller exposing MB/s-style caps via OS interface.
Important nuance:
- MBA settings are quantized (hardware granularity),
- actual applied bandwidth can differ from requested cap,
- sibling-thread behavior depends on platform mode (`thread_throttle_mode`: often `max` on some Intel generations).
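These modes are selected when the resctrl filesystem is mounted. A hedged sketch (option names follow the kernel resctrl docs; availability depends on CPU generation and kernel config, and mounting with an unsupported option simply fails):

```shell
# Mount resctrl with the MB/s software controller enabled.
mount -t resctrl resctrl -o mba_MBps /sys/fs/resctrl

# Default mount (percentage-based MBA, no CDP):
#   mount -t resctrl resctrl /sys/fs/resctrl
# With code/data prioritization (only if hardware supports CDP):
#   mount -t resctrl resctrl -o cdp /sys/fs/resctrl
```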
3) Preflight checklist before touching production
- Hardware/Kernel support
- Check CPU flags and kernel support for `resctrl`.
- Mount options clarity
- If you need MB/s style caps, mount/use the `mba_MBps` path.
- Single control plane rule
- Don’t mix MSR and OS interfaces casually (`pqos` defaults to MSR; use `-I` for OS mode).
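The support checks above can be scripted. A minimal sketch (the flag names are the usual Linux `/proc/cpuinfo` spellings; not every flag appears on every CPU generation):

```shell
#!/bin/sh
# RDT-related CPU flags: rdt_a (allocation), cat_l3, mba, cqm_* (monitoring).
grep -oE 'rdt_a|cat_l3|cdp_l3|mba|cqm_llc|cqm_mbm_total|cqm_mbm_local' \
    /proc/cpuinfo | sort -u

# Kernel support and current mount state.
grep resctrl /proc/filesystems || echo "kernel lacks resctrl"
mount | grep resctrl || echo "resctrl not mounted yet"
```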
- SLO-first baseline
- Capture current latency, throughput, cache miss rates, MBM counters.
- Canary scope defined
- Start with one service + one known noisy workload.
4) Baseline observability commands (before policy)
# resctrl capability tree
ls -R /sys/fs/resctrl/info
# Key resource info
cat /sys/fs/resctrl/info/L3/num_closids
cat /sys/fs/resctrl/info/L3/cbm_mask
cat /sys/fs/resctrl/info/L3/min_cbm_bits
# MBA properties
cat /sys/fs/resctrl/info/MB/min_bandwidth
cat /sys/fs/resctrl/info/MB/bandwidth_gran
cat /sys/fs/resctrl/info/MB/thread_throttle_mode
# Monitoring feature discovery
cat /sys/fs/resctrl/info/L3_MON/mon_features
Also track app-side telemetry in parallel:
- request latency quantiles,
- timeout/retry rates,
- LLC miss trend,
- CPU migrations and run queue pressure.
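Per-workload MBM/CMT counters can be captured before any control policy exists by using a monitoring-only group. A sketch (the group name `mon_canary` and `$TARGET_PID` are placeholders; one `mon_L3_*` directory exists per L3 cache domain):

```shell
# Create a monitoring group and assign the canary workload to it.
mkdir /sys/fs/resctrl/mon_groups/mon_canary
echo "$TARGET_PID" > /sys/fs/resctrl/mon_groups/mon_canary/tasks

# Read occupancy and bandwidth counters per L3 domain.
for d in /sys/fs/resctrl/mon_groups/mon_canary/mon_data/mon_L3_*; do
  echo "$d:"
  cat "$d/llc_occupancy" "$d/mbm_total_bytes" "$d/mbm_local_bytes"
done
```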
5) Practical policy design
A) Class layout (simple and operable)
Use 3 tiers first:
- Tier A (critical latency path): larger cache share, no/low MBA throttling.
- Tier B (important but tolerant): moderate cache share, moderate MBA limit.
- Tier C (batch/noisy/background): smaller cache share, stronger MBA cap.
Start simple. Over-fragmenting CLOS classes adds ops complexity and fragile tuning.
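The three-tier layout might be expressed like this. A sketch only: group names, cache masks, and MB percentages are illustrative and must be derived from your platform's `cbm_mask`, `min_cbm_bits`, and `bandwidth_gran` (the masks below are contiguous and non-overlapping, per the CAT principles that follow):

```shell
# Tier A: critical latency path -- wide cache share, no MBA throttle.
mkdir /sys/fs/resctrl/tierA
echo "L3:0=ff0" > /sys/fs/resctrl/tierA/schemata
echo "MB:0=100" > /sys/fs/resctrl/tierA/schemata

# Tier B: important but tolerant -- moderate share, moderate cap.
mkdir /sys/fs/resctrl/tierB
echo "L3:0=00c" > /sys/fs/resctrl/tierB/schemata
echo "MB:0=50"  > /sys/fs/resctrl/tierB/schemata

# Tier C: batch/noisy -- small share, strong cap.
mkdir /sys/fs/resctrl/tierC
echo "L3:0=003" > /sys/fs/resctrl/tierC/schemata
echo "MB:0=20"  > /sys/fs/resctrl/tierC/schemata

# Assign workloads by writing PIDs into each group's tasks file.
echo "$CRITICAL_PID" > /sys/fs/resctrl/tierA/tasks
```

Writing one resource line at a time updates only that resource and leaves the rest of the schemata untouched, which keeps changes reviewable.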
B) CAT principles
- Give Tier A contiguous, sufficient LLC ways.
- Avoid tiny partitions that increase miss churn.
- Respect `min_cbm_bits` and valid mask constraints.
- Confirm no accidental overlap unless sharing is intentional.
C) MBA principles
- Use MBA to contain sustained bandwidth hogs, not to “perfectly rate-limit” every spike.
- Expect granularity/rounding effects.
- In MB/s controller mode, validate observed MBM against intended caps.
- If `thread_throttle_mode=max`, one harsh setting can affect sibling threads on the same core.
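Validating observed bandwidth against the intended cap comes down to differencing the MBM byte counter over an interval. A small helper (pure integer arithmetic; the resctrl path in the usage comment is illustrative, and note that raw MBM counters can wrap on some platforms):

```shell
# mb_per_sec B0 B1 SECS -> integer MB/s from two mbm_total_bytes samples.
# The counter is a monotonically increasing byte count per domain.
mb_per_sec() {
  b0=$1; b1=$2; secs=$3
  echo $(( (b1 - b0) / secs / 1000000 ))
}

# Usage against a throttled group (path is a placeholder):
#   f=/sys/fs/resctrl/tierC/mon_data/mon_L3_00/mbm_total_bytes
#   b0=$(cat "$f"); sleep 5; b1=$(cat "$f")
#   mb_per_sec "$b0" "$b1" 5
```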
6) Rollout sequence (safe)
- Observe baseline for at least one realistic busy window.
- Apply CAT-only canary to protect critical cache residency.
- Re-measure tails and miss rates.
- Add MBA gradually on noisy class only.
- Validate no hidden regressions (throughput collapse, starvation, huge queueing).
- Expand by service class, not entire fleet.
- Keep rollback profile ready (known-good default schemata).
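A rollback profile can be as simple as snapshotting each group's schemata before a change and writing it back on demand. A generic sketch (paths are placeholders):

```shell
# snapshot_schemata <schemata-file> <backup-file>
snapshot_schemata() { cat "$1" > "$2"; }
# restore_schemata <schemata-file> <backup-file>
restore_schemata()  { cat "$2" > "$1"; }

# e.g. before tuning Tier C:
#   snapshot_schemata /sys/fs/resctrl/tierC/schemata /root/tierC.bak
# and to roll back:
#   restore_schemata /sys/fs/resctrl/tierC/schemata /root/tierC.bak
```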
7) Common failure modes
Mixing interface modes
- MSR and OS `resctrl` control paths collide and create config confusion.
Benchmark-only tuning
- Throughput may improve while p99 gets worse.
Too many classes too fast
- Human/operator error rises faster than performance gain.
Ignoring topology
- Socket/NUMA/core placement can dominate policy outcome.
Assuming requested MBA == exact delivered BW
- Always verify applied behavior with counters + app SLO.
8) What “good” looks like
After stable deployment, you should see:
- lower p99/p999 volatility on critical services,
- fewer cache-miss shock bursts during noisy-neighbor activity,
- controlled background-memory pressure,
- predictable performance tradeoff between classes.
If critical tails are still unstable, revisit:
- class-to-core placement,
- CAT mask widths,
- MBA aggressiveness,
- thread sibling interference.
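Class-to-core placement can be enforced directly: each resctrl group also accepts a CPU list, and tasks running on those CPUs (unless listed in another group's tasks file) fall under that group's policy. A sketch (group name and core IDs are illustrative):

```shell
# Dedicate cores 0-3 to the critical group.
# `cpus` takes a hex mask; `cpus_list` takes human-readable ranges.
echo "0-3" > /sys/fs/resctrl/tierA/cpus_list
cat /sys/fs/resctrl/tierA/cpus_list
```

Combine this with CPU affinity on the service itself so the critical threads actually run on those cores.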
9) Minimal operator checklist
- `resctrl` capability confirmed on target hosts
- One control interface policy (OS vs MSR) declared
- Baseline SLO + counter snapshot captured
- CAT canary applied first
- MBA added in small steps
- Tail metrics improved without unacceptable starvation
- Rollback command/profile tested
References
- Linux kernel docs — `resctrl` user interface: https://docs.kernel.org/filesystems/resctrl.html
- Intel RDT overview: https://www.intel.com/content/www/us/en/architecture-and-technology/resource-director-technology.html
- Intel `intel-cmt-cat` (`pqos`/`rdtset`): https://github.com/intel/intel-cmt-cat
- Intel MBM/MBA usage guide: https://github.com/intel/intel-cmt-cat/wiki/MBM-MBA-how-to-guide