Cgroup CPU Quota Throttling Burst Slippage Playbook
Date: 2026-03-18
Category: research
Why this exists
Execution stacks are increasingly containerized. That helps isolation, but it also introduces a subtle slippage tax: CFS quota throttling (cpu.max) can pause a latency-critical strategy exactly when microstructure is moving.
The host can still look “fine” on average CPU usage while the strategy experiences:
- short hard stalls,
- period-boundary burst release,
- queue-priority decay,
- panic catch-up that crosses spread.
This playbook treats quota throttling as a first-class slippage factor.
Core failure mode
In cgroup v2, `cpu.max = "<quota> <period>"` grants a group `<quota>` microseconds of CPU runtime in each `<period>` microseconds (e.g., `50000 100000` caps the group at 0.5 cores of sustained throughput). If the budget is exhausted, runnable threads in that cgroup are throttled until the next period refill.
For an execution engine, this creates a repeated pattern:
- Budget depletion during bursty compute/IO windows.
- Throttle pause in decision / amend / cancel loop.
- Unthrottle surge at period boundary.
- Dispatch clumping (late child-order burst).
- Queue reset + adverse selection.
Result: tail implementation shortfall worsens, even if mean completion stays acceptable.
Slippage decomposition with quota-throttle term
For parent order \(i\):
\[ IS_i = C_{spread} + C_{impact} + C_{opportunity} + C_{throttle} \]
Where:
\[ C_{throttle} = C_{pause} + C_{catchup} + C_{queue\_decay} + C_{timing\_alias} \]
- \(C_{pause}\): missed passive windows during throttled intervals.
- \(C_{catchup}\): convex impact from post-throttle burst sends.
- \(C_{queue\_decay}\): queue position loss from delayed amend/cancel decisions.
- \(C_{timing\_alias}\): pathological synchronization with fixed quota period boundaries.
Production observability (minimum)
1) cgroup CPU telemetry
From cpu.stat (cgroup v2):
- `nr_periods`
- `nr_throttled`
- `throttled_usec`
- `usage_usec`
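Sampling these counters is a one-file read. A minimal sketch (the field names come from the kernel's cgroup v2 `cpu.stat` format; the cgroup path is an illustrative assumption, not a fixed location):

```python
# Parse a cgroup v2 cpu.stat file into a dict of integer counters.
# The path below is a placeholder; substitute your service's cgroup.
from pathlib import Path

def read_cpu_stat(path="/sys/fs/cgroup/myservice/cpu.stat"):
    stats = {}
    for line in Path(path).read_text().splitlines():
        key, _, value = line.partition(" ")
        stats[key] = int(value)
    return stats
```

These are monotonically increasing counters, so the telemetry pipeline should store deltas between samples, not raw values.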
2) Scheduler / pressure context
- CPU PSI (`/proc/pressure/cpu`) for global contention backdrop
- per-thread run-queue delay quantiles (if available from scheduler telemetry)
3) Execution-path telemetry
- decision-to-send latency (p50/p95/p99)
- cancel-to-replace turnaround
- inter-dispatch gap variance
- child-order burst index
- passive fill ratio by latency bucket
4) Outcome telemetry
- IS by parent and urgency bucket
- short-horizon markout ladder (10ms / 100ms / 1s)
- completion deficit near deadline
Desk metrics to track
Define these over rolling windows (e.g., 1m / 5m):
- TRR (Throttle Ratio Rate)
\[ TRR = \frac{\Delta\,\text{nr\_throttled}}{\Delta\,\text{nr\_periods}} \]
- TDR (Throttle Duty Ratio)
\[ TDR = \frac{\Delta\,\text{throttled\_usec}}{\Delta\,\text{window\_usec}} \]
- PBA (Period-Boundary Aliasing)
Correlation between dispatch spikes and quota period refill boundaries.
- CBI (Catch-up Burst Index)
Post-unthrottle child-send rate divided by baseline send rate.
- QDL (Queue Decay Loss)
Passive fill-ratio drop conditioned on throttle events vs matched non-throttle windows.
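Given two successive `cpu.stat` samples, TRR and TDR reduce to simple deltas. A sketch (the sample format matches the counters listed above; the function name and window convention are assumptions):

```python
def throttle_ratios(prev, curr, window_usec):
    """Compute TRR and TDR from two cpu.stat samples taken window_usec apart.

    prev/curr: dicts of cgroup v2 cpu.stat counters (monotonic).
    """
    d_periods = curr["nr_periods"] - prev["nr_periods"]
    d_throttled = curr["nr_throttled"] - prev["nr_throttled"]
    d_throttled_usec = curr["throttled_usec"] - prev["throttled_usec"]
    trr = d_throttled / d_periods if d_periods else 0.0  # throttled periods / total periods
    tdr = d_throttled_usec / window_usec                 # fraction of wall time spent throttled
    return trr, tdr
```

PBA and CBI need execution-path timestamps as well, so they live in the dispatch telemetry rather than in the cgroup sampler.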
Modeling approach
Use a baseline + throttle-overlay architecture.
Stage A: Baseline slippage model
Normal spread/impact/fill model with market-state features.
Stage B: Quota-throttle uplift model
Predict incremental uplift:
`delta_is_mean` and `delta_is_q95`
using features:
- TRR, TDR, PBA, CBI, QDL
- effective core ceiling from `cpu.max` (quota/period)
- latency-tail stretch metrics
- urgency, participation, symbol-liquidity regime
Final estimate:
\[ \hat{IS}_{final} = \hat{IS}_{baseline} + \Delta\hat{IS}_{throttle} \]
Train with matched market windows (same symbol/session/volatility buckets) to isolate infra-induced uplift from market confounders.
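The matched-window comparison can be sketched as a toy difference-in-means estimator (the pairing of throttle-labeled and control windows into a common symbol/session/volatility bucket is assumed to be done upstream; names are illustrative):

```python
import statistics

def throttle_uplift(windows):
    """Estimate mean IS uplift: throttle-labeled windows vs matched controls.

    windows: list of (is_bps, throttled) pairs drawn from one matched
    bucket (same symbol/session/volatility), so market conditions are
    held roughly constant and the residual difference is infra-induced.
    """
    throttled = [x for x, t in windows if t]
    control = [x for x, t in windows if not t]
    return statistics.mean(throttled) - statistics.mean(control)
```

A production version would fit the uplift as a regression on the TRR/TDR/PBA/CBI/QDL features rather than a single bucket mean, but the identification idea is the same.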
Controller state machine
State 1 — NORMAL
- `TRR < 0.01`
- `TDR < 0.5%`
- no period-locking pattern
Action: normal pacing.
State 2 — QUOTA_EDGE
- `TRR >= 0.01` or `TDR >= 0.5%`
- latency tails widening
Action: reduce replace churn, smooth child spacing, raise passive selectivity.
State 3 — THROTTLE_BURST
- `TRR >= 0.05` or `TDR >= 3%`
- burst index elevated after unthrottle
Action: cap burst size, disable panic catch-up, enforce inter-send minimum gap.
State 4 — SAFE_CONTAIN
- persistent `THROTTLE_BURST` + deadline pressure
Action: conservative completion mode, stricter aggression cap, optional route to non-throttled host pool.
Use hysteresis + minimum dwell times to prevent flapping.
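The escalation/de-escalation logic can be sketched as follows (thresholds are the ones from the states above; the burst-index and deadline inputs needed for `THROTTLE_BURST` confirmation and `SAFE_CONTAIN` are omitted, and the class/dwell names are assumptions):

```python
# Minimal controller sketch: maps (TRR, TDR) to a state, escalating
# immediately but de-escalating only after a minimum dwell time, which
# provides the hysteresis that prevents flapping.
NORMAL, QUOTA_EDGE, THROTTLE_BURST = "NORMAL", "QUOTA_EDGE", "THROTTLE_BURST"
_RANK = [NORMAL, QUOTA_EDGE, THROTTLE_BURST]

class ThrottleController:
    def __init__(self, min_dwell_s=30.0):
        self.state = NORMAL
        self.entered_at = 0.0
        self.min_dwell_s = min_dwell_s

    def _target(self, trr, tdr):
        if trr >= 0.05 or tdr >= 0.03:   # TDR >= 3%
            return THROTTLE_BURST
        if trr >= 0.01 or tdr >= 0.005:  # TDR >= 0.5%
            return QUOTA_EDGE
        return NORMAL

    def update(self, now_s, trr, tdr):
        target = self._target(trr, tdr)
        escalating = _RANK.index(target) > _RANK.index(self.state)
        dwell_ok = now_s - self.entered_at >= self.min_dwell_s
        if target != self.state and (escalating or dwell_ok):
            self.state = target
            self.entered_at = now_s
        return self.state
```

Run it in shadow mode first (as the rollout checklist below suggests) and log every transition alongside the metric values that triggered it.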
Mitigation ladder
Right-size quota headroom
- Avoid sizing from average CPU use; size from p95/p99 burst demand.
- Explicitly budget risk checks + serialization + retry spikes.
Tune quota period deliberately
- Shorter period reduces max single stall, but can increase scheduling overhead.
- Validate period choice with latency-tail experiments, not CPU-average metrics.
Use `cpu.weight` + topology isolation together
- Quota alone is a blunt tool.
- Combine fair-share (`cpu.weight`) and cpuset isolation for critical paths.
Throttle-aware pacer
- Detect recent throttle events and suppress catch-up bursts.
- Prefer bounded repayment over immediate backlog flush.
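One way to sketch bounded repayment (the cap multiplier and function names are illustrative assumptions, not the desk's actual policy):

```python
def pace_backlog(backlog, baseline_rate, burst_cap=1.5):
    """Yield per-interval send counts that repay a backlog gradually.

    Instead of flushing `backlog` child orders at once after an
    unthrottle, each interval is capped at burst_cap x the baseline
    send rate, so catch-up never becomes a market-impact burst.
    """
    per_interval = max(1, int(baseline_rate * burst_cap))
    sends = []
    while backlog > 0:
        n = min(backlog, per_interval)
        sends.append(n)
        backlog -= n
    return sends
```

For example, a backlog of 10 children against a baseline rate of 4 per interval is repaid as `[6, 4]` rather than as a single burst of 10, keeping the realized CBI near the cap.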
Host-pool policy
- Keep urgent flow off aggressively capped multi-tenant pools.
- Separate “latency-critical” and “batch-contended” deployment classes.
Validation drills (must run)
Quota squeeze drill
- Temporarily tighten `cpu.max`; verify uplift detector catches IS tail rise.
Period sensitivity drill
- Sweep period values (e.g., 100ms -> 50ms -> 20ms) with fixed effective quota.
- Compare q95 dispatch latency, CBI, and q95 IS.
Catch-up policy A/B
- naive backlog flush vs bounded repayment policy.
- Select policy by q95 IS + completion stability, not mean IS alone.
Confounder separation drill
- Distinguish quota-throttle signatures from network/venue incidents.
Anti-patterns
- “CPU usage is only 40%, so throttling cannot matter.”
- fixed 100ms period with no aliasing analysis
- quota-capped strategy colocated with bursty batch jobs
- panic catch-up logic after unthrottle
- monitoring only host CPU, not cgroup `cpu.stat`
Practical rollout checklist
- Add cgroup `cpu.stat` sampling to the execution telemetry pipeline.
- Build TRR/TDR/PBA/CBI dashboards by host pool and strategy class.
- Label throttle windows in historical TCA.
- Fit throttle-uplift overlay model and backtest contribution.
- Enable state-machine controller in shadow mode.
- Run quota/period canary and compare q95 IS before broad rollout.
Bottom line
In containerized execution systems, cgroup quota is not just a resource-control setting; it is a microstructure timing control.
If you do not model and govern quota-throttle bursts explicitly, you will keep paying a hidden tail-slippage tax that looks like “market noise” but is mostly self-inflicted.
References
- Linux kernel docs — CFS bandwidth control: https://www.kernel.org/doc/html/latest/scheduler/sched-bwc.html
- Linux kernel docs — cgroup v2 CPU controller: https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html
- Linux kernel docs — PSI (Pressure Stall Information): https://www.kernel.org/doc/html/latest/accounting/psi.html