Linux Timer-Slack Coalescing & Timer-Migration Slippage Playbook
Date: 2026-03-24
Category: research
Scope: How Linux timer coalescing (timerslack) and cross-CPU timer migration leak into execution cadence and create hidden slippage tails
Why this matters
Execution stacks often focus on wire/network microbursts, but many child-order bursts are born before the socket write.
A common hidden path:
- strategy threads paced by nanosleep/clock_nanosleep/epoll_wait/futex timeouts,
- kernel groups near-expiry timers via timer slack,
- timer callbacks can be migrated away from idle CPUs (/proc/sys/kernel/timer_migration=1),
- wakeups arrive a bit late and sometimes together,
- scheduler/dispatcher catches up in bursts,
- market impact convexity turns small timing errors into large cost tails.
The median loop latency can still look fine while q95/q99 slippage degrades.
Failure mechanism (operator timeline)
- Parent execution loop targets smooth cadence (e.g., every 200–500µs).
- Critical thread keeps the default timer slack (typically 50µs, inherited from the parent at thread creation), or ends up with slack that is too large for the loop period.
- Under load, timer expirations are coalesced and/or wakeups land on a different CPU path.
- Effective wakeups slip by tens of microseconds, occasionally up to the sub-millisecond range, and tend to arrive in clusters.
- Child emission becomes “quiet then clustered” instead of evenly spaced.
- Schedule deficit accumulates; urgency logic increases aggression.
- Burst re-entry crosses thinner queue depth and pays impact + queue-reset tax.
Key point: this is OS timer-policy leakage into execution cost, not purely market randomness.
Extend slippage decomposition with timer-policy term
[ IS = IS_{market} + IS_{impact} + IS_{timing} + IS_{fees} + \underbrace{IS_{timer}}_{\text{coalescing/migration cadence tax}} ]
Practical approximation:
[ IS_{timer,t} \approx a\cdot TSR_t + b\cdot WJL_t + c\cdot CBI_t + d\cdot TMR_t + e\cdot PHE_t ]
Where:
- (TSR): Timer Slack Ratio,
- (WJL): Wakeup Jitter Lag,
- (CBI): Coalesced Burst Index,
- (TMR): Timer Migration Rate,
- (PHE): Phase Error between target and actual child dispatch.
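As a rough illustration of the linear approximation above, the sketch below evaluates IS_timer for a single window. The coefficient values, the units (bps) and the example readings are hypothetical placeholders, not fitted numbers; in practice a..e would come from a regression on your own fills.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimerObs:
    """One window of timer-policy metrics."""
    tsr: float  # Timer Slack Ratio (dimensionless)
    wjl: float  # Wakeup Jitter Lag, microseconds
    cbi: float  # Coalesced Burst Index (dimensionless)
    tmr: float  # Timer Migration Rate, fraction in [0, 1]
    phe: float  # Phase Error, microseconds


# Hypothetical coefficients (bps per unit of each metric); assumed for illustration only.
COEFFS = {"tsr": 2.0, "wjl": 0.01, "cbi": 0.1, "tmr": 1.0, "phe": 0.02}


def is_timer_bps(obs: TimerObs, coeffs: dict = COEFFS) -> float:
    """IS_timer ~= a*TSR + b*WJL + c*CBI + d*TMR + e*PHE."""
    return sum(coeffs[name] * getattr(obs, name) for name in coeffs)


example = TimerObs(tsr=0.25, wjl=40.0, cbi=3.0, tmr=0.1, phe=30.0)
print(round(is_timer_bps(example), 3))
```

Keeping the term as an explicit sum makes it easy to report each metric's contribution separately when attributing a bad window.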
Metrics to add in production
1) Timer Slack Ratio (TSR)
[ TSR = \frac{\text{timerslack\_ns}}{\text{loop\_period\_ns} + \epsilon} ]
If loop period is 200µs and slack is 50µs, TSR=0.25 (already material).
2) Wakeup Jitter Lag (WJL)
[ WJL = p99(t_{wake,actual} - t_{wake,target}) ]
Measured in microseconds from monotonic timestamps.
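A minimal sketch of the WJL computation, assuming the loop records a target and an actual monotonic timestamp (in nanoseconds) per wakeup. The nearest-rank percentile helper is a choice made here for simplicity, not something the metric prescribes.

```python
import math


def percentile(values, q):
    """Nearest-rank percentile for q in (0, 100]; no interpolation in the tail."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100.0))
    return ordered[rank - 1]


def wakeup_jitter_lag_us(target_ns, actual_ns, q=99.0):
    """p99 of (actual - target) wakeup time, reported in microseconds."""
    lateness_us = [(a - t) / 1_000.0 for t, a in zip(target_ns, actual_ns)]
    return percentile(lateness_us, q)


# Example: 100 wakeups on a 200µs cadence, mostly 5µs late, two badly late.
targets = [i * 200_000 for i in range(100)]
late_by = [5_000] * 100
late_by[10], late_by[70] = 80_000, 60_000
actuals = [t + d for t, d in zip(targets, late_by)]
wjl = wakeup_jitter_lag_us(targets, actuals)
```

With only two late wakeups out of 100, the median lateness stays at 5µs while the p99 jumps to tens of microseconds, which is the pattern this metric is meant to expose.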
3) Coalesced Burst Index (CBI)
[ CBI = \frac{p95(\text{child orders per 250}\mu s)}{median(\text{child orders per 250}\mu s)+\epsilon} ]
High CBI indicates “dribble then burst” behavior.
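One way to compute CBI from child-order send timestamps is sketched below. Counting empty windows as zero and using a nearest-rank p95 are assumptions of this sketch; if the median window is empty, the ratio becomes very large by construction.

```python
import math
from collections import Counter
from statistics import median


def coalesced_burst_index(child_ts_ns, window_ns=250_000, eps=1e-9):
    """p95 over median of child-order counts per fixed window (empty windows count as 0)."""
    if not child_ts_ns:
        return 0.0
    start = min(child_ts_ns)
    counts = Counter((ts - start) // window_ns for ts in child_ts_ns)
    n_windows = max(counts) + 1  # highest window index seen, plus one
    per_window = sorted(counts.get(i, 0) for i in range(n_windows))
    rank = max(1, math.ceil(95 * n_windows / 100))
    p95 = per_window[rank - 1]
    return p95 / (median(per_window) + eps)


# Example: 20 windows, one child each, except two windows that carry a burst of five.
per_window_counts = [1] * 20
per_window_counts[7] = per_window_counts[15] = 5
bursty = [i * 250_000 + j for i, n in enumerate(per_window_counts) for j in range(n)]
even = [i * 250_000 for i in range(20)]
```

Here `coalesced_burst_index(even)` is about 1 and `coalesced_burst_index(bursty)` is about 5, even though both streams have a median of one child per window.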
4) Timer Migration Rate (TMR)
[ TMR = \frac{\#(\text{timer wakeups where target CPU} \neq \text{dispatch CPU})}{\#(\text{timer wakeups})} ]
Proxy with scheduler tracepoint joins if direct timer ownership is hard.
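Once wakeups are reduced to (target CPU, dispatch CPU) pairs, the rate itself is a simple ratio. The sketch below assumes that pairing has already been done by the collector; how the pairs are derived from trace data is deliberately left out.

```python
def timer_migration_rate(wakeups):
    """Fraction of timer wakeups whose dispatch CPU differs from the targeted CPU.

    `wakeups` is an iterable of (target_cpu, dispatch_cpu) pairs supplied by
    the caller; this function does not read any kernel interface itself.
    """
    total = 0
    mismatched = 0
    for target_cpu, dispatch_cpu in wakeups:
        total += 1
        if target_cpu != dispatch_cpu:
            mismatched += 1
    return mismatched / total if total else 0.0


sample_pairs = [(2, 2), (2, 2), (2, 5), (3, 3)]
tmr = timer_migration_rate(sample_pairs)
```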
5) Phase Error (PHE)
[ PHE = p95\left(|t_{child,actual} - t_{child,target}|\right) ]
Directly translates kernel timing drift into execution timing damage.
6) Slack-At-Risk Exposure (SARE)
[ SARE = P(TSR > \tau \land urgency > u^*) ]
This interaction is usually where tail IS explodes.
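An empirical estimate of SARE over a set of observation windows might look like the following. The threshold values in the example are hypothetical; the function only counts how often both conditions hold in the same window.

```python
def slack_at_risk_exposure(windows, tsr_threshold, urgency_threshold):
    """Empirical P(TSR > tau and urgency > u*) over (tsr, urgency) window samples."""
    windows = list(windows)
    if not windows:
        return 0.0
    hits = sum(
        1
        for tsr, urgency in windows
        if tsr > tsr_threshold and urgency > urgency_threshold
    )
    return hits / len(windows)


# Hypothetical windows: (TSR, urgency score in [0, 1]).
samples = [(0.30, 0.9), (0.30, 0.2), (0.05, 0.9), (0.40, 0.8)]
sare = slack_at_risk_exposure(samples, tsr_threshold=0.2, urgency_threshold=0.7)
```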
Modeling architecture
Stage 1: timer-regime detector
Features:
- timerslack_ns by thread/process,
- wakeup lateness distribution,
- CPU residency / isolate-core status,
- timer_migration setting and wakeup CPU mismatch proxies,
- short-horizon CBI/PHE trends,
- urgency and schedule deficit.
Output:
- (p_{timer} = P(\text{TIMER\_DISTORTION\_REGIME}))
Stage 2: conditional slippage uplift model
[ \Delta IS \sim \beta_1\,urgency + \beta_2\,p_{timer} + \beta_3\,(urgency\times p_{timer}) ]
Interpretation: urgency alone hurts, timer distortion alone hurts, but the interaction hurts the most.
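To make the interaction term concrete, the sketch below evaluates the uplift model at the corners of the (urgency, p_timer) unit square. The beta values are hypothetical, chosen only so that the interaction is visibly larger than either main effect.

```python
def slippage_uplift(urgency, p_timer, beta1, beta2, beta3):
    """Delta IS = b1*urgency + b2*p_timer + b3*(urgency * p_timer)."""
    return beta1 * urgency + beta2 * p_timer + beta3 * urgency * p_timer


# Hypothetical coefficients in bps; assumed for illustration only.
B1, B2, B3 = 1.0, 0.5, 4.0

only_urgent = slippage_uplift(1.0, 0.0, B1, B2, B3)
only_timer = slippage_uplift(0.0, 1.0, B1, B2, B3)
both = slippage_uplift(1.0, 1.0, B1, B2, B3)

# The excess of the joint case over the sum of the single cases is exactly
# the interaction coefficient when both inputs equal 1.
interaction_excess = both - (only_urgent + only_timer)
```

If a fitted beta_3 is near zero, timer distortion is hurting roughly independently of urgency and the controller's urgency caps will buy less than expected.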
Controller state machine
GREEN — CADENCE_STABLE
- Low WJL/CBI/PHE, TSR below threshold.
- Baseline participation.
YELLOW — COALESCING_RISK
- TSR high or WJL drifting up.
- Actions:
- raise sampling rate,
- tighten alerting on dispatch phase,
- limit discretionary burst fanout.
ORANGE — DISTORTION_ACTIVE
- Sustained WJL + CBI elevation, timer migration proxies active.
- Actions:
- cap catch-up slope,
- switch to smoother schedule template,
- reduce route churn and avoid aggressive venue hopping.
RED — TAIL_CONTAINMENT
- Tail budget breach with active timer distortion.
- Actions:
- hard-limit urgency escalation,
- temporarily move to conservative completion policy,
- pin to known-stable thread/CPU profile.
Use hysteresis to avoid flip-flopping.
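A minimal sketch of the hysteresis, assuming the state is driven by a single score (here p_timer) with separate enter and exit thresholds per boundary. The threshold values are hypothetical, and a real controller would also fold in the tail-budget condition for RED.

```python
STATES = ["GREEN", "YELLOW", "ORANGE", "RED"]

# Hypothetical thresholds: escalate above ENTER[i], de-escalate below EXIT[i].
ENTER = [0.30, 0.55, 0.80]  # GREEN->YELLOW, YELLOW->ORANGE, ORANGE->RED
EXIT = [0.20, 0.45, 0.70]   # YELLOW->GREEN, ORANGE->YELLOW, RED->ORANGE


def next_state(current: str, p_timer: float) -> str:
    """Move at most one level per update; the ENTER/EXIT gap is the hysteresis band."""
    level = STATES.index(current)
    if level < len(STATES) - 1 and p_timer > ENTER[level]:
        return STATES[level + 1]
    if level > 0 and p_timer < EXIT[level - 1]:
        return STATES[level - 1]
    return current


# A score hovering around 0.25 does not flip the state back and forth.
state = "GREEN"
trace = []
for score in [0.35, 0.25, 0.28, 0.25, 0.15]:
    state = next_state(state, score)
    trace.append(state)
```

In this trace the state enters YELLOW at 0.35, holds through the readings between 0.20 and 0.30, and only returns to GREEN once the score drops below the exit threshold.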
Engineering mitigations (high ROI first)
1) Set explicit low timer slack on critical execution threads
Use prctl(PR_SET_TIMERSLACK, ...) or a /proc/<pid>/timerslack_ns policy. Keep non-critical threads relaxed for power.
2) Separate critical and non-critical work
Don’t let logging/housekeeping threads share timing policy with dispatch-critical loops.
3) Review kernel.timer_migration and CPU isolation strategy together
Co-tune with core pinning/isolcpus/nohz_full design; avoid one-size-fits-all toggles.
4) Prefer absolute-time pacing (TIMER_ABSTIME) over drift-prone relative loops
Reduces cumulative phase walk when wakeups are occasionally late.
5) Add short spin window only near deadline cliffs
Hybrid sleep-then-spin can reduce worst-tail phase error while controlling thermal burn.
6) Promote by tail metrics, not mean latency
Gate on q95/q99 slippage + completion quality.
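The first, fourth and fifth mitigations can be sketched together as below. Several assumptions apply: prctl is reached through ctypes with the option numbers from <linux/prctl.h>; Python has no direct clock_nanosleep binding, so the loop emulates absolute-deadline pacing by recomputing each deadline from a fixed start rather than using TIMER_ABSTIME itself; and the function names are illustrative. A latency-critical loop would normally do this in C or similar, so treat this as a description of the logic, not a production pacer.

```python
import ctypes
import sys
import time

PR_SET_TIMERSLACK = 29  # option numbers from <linux/prctl.h>
PR_GET_TIMERSLACK = 30


def set_timer_slack_ns(slack_ns: int):
    """Set timer slack for the calling thread.

    Returns the value read back via PR_GET_TIMERSLACK, or None when prctl
    is unavailable or fails. A value of 0 resets slack to the thread default.
    """
    if not sys.platform.startswith("linux"):
        return None
    libc = ctypes.CDLL(None, use_errno=True)
    if libc.prctl(PR_SET_TIMERSLACK, ctypes.c_ulong(slack_ns), 0, 0, 0) != 0:
        return None
    return libc.prctl(PR_GET_TIMERSLACK, 0, 0, 0, 0)


def paced_loop(period_ns: int, ticks: int, spin_ns: int, on_tick):
    """Absolute-deadline pacing with a short spin before each deadline.

    Deadlines are start + k * period, so one late wakeup does not shift the
    deadlines that follow. Returns the per-tick phase error in nanoseconds.
    """
    start = time.monotonic_ns()
    phase_errors = []
    for k in range(1, ticks + 1):
        deadline = start + k * period_ns
        coarse = deadline - spin_ns - time.monotonic_ns()
        if coarse > 0:
            time.sleep(coarse / 1e9)  # sleep until shortly before the deadline
        while time.monotonic_ns() < deadline:
            pass  # busy-wait only for the final stretch
        phase_errors.append(time.monotonic_ns() - deadline)
        on_tick(k)
    return phase_errors
```

A call such as `set_timer_slack_ns(1_000)` at the start of the dispatch thread, followed by `paced_loop(200_000, n, 50_000, send_child)`, keeps the spin window to a fraction of each period; `send_child` here stands for whatever callback emits the child order.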
Validation protocol
- Label windows by timer regime (stable / distorted) from telemetry.
- Match cohorts by symbol, spread, volatility, urgency, participation.
- Compare mean + q95/q99 slippage and markout between cohorts.
- Canary mitigations:
- explicit low slack on critical threads,
- CPU affinity/isolation adjustments,
- migration/pacing policy changes.
- Promote only when tail improves without reliability regressions.
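The cohort comparison step could be sketched as follows, assuming matching has already been done upstream and each cohort is a list of per-order slippage values in bps. The nearest-rank quantile is a simplification chosen for this sketch.

```python
import math


def tail_quantile(values, q):
    """Nearest-rank quantile for q in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def compare_cohorts(stable_bps, distorted_bps):
    """Return mean/q95/q99 slippage gaps (distorted minus stable)."""
    def summary(xs):
        return {
            "mean": sum(xs) / len(xs),
            "q95": tail_quantile(xs, 95),
            "q99": tail_quantile(xs, 99),
        }

    stable, distorted = summary(stable_bps), summary(distorted_bps)
    return {key: distorted[key] - stable[key] for key in stable}


# Synthetic example: the distorted cohort differs only in its worst 5% of orders.
stable_cohort = [1.0] * 100
distorted_cohort = [1.0] * 95 + [10.0] * 5
gaps = compare_cohorts(stable_cohort, distorted_cohort)
```

In this synthetic case the mean moves by under half a bp and q95 does not move at all, while q99 widens by 9 bps, which is why promotion should be gated on the tail columns.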
Observability checklist
- /proc/<pid>/timerslack_ns snapshots for strategy/execution PIDs
- wakeup target vs actual timestamps (monotonic)
- dispatch target vs actual child timestamp (phase error)
- per-CPU run queue and wakeup CPU mismatch proxy
- burstiness series (CBI) around schedule cutoffs
- slippage and short-horizon markout conditioned on timer regime
Success criterion: smaller tail slippage during urgency windows, not just lower average wakeup delay.
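For the first checklist item, a snapshot helper might look like this. It assumes a Linux procfs that exposes timerslack_ns; reading another process's file may need extra privileges, so the helper returns None rather than raising. The TSR helper simply applies the ratio defined earlier to the snapshot.

```python
import os


def read_timerslack_ns(pid=None):
    """Read /proc/<pid>/timerslack_ns; return None if missing or unreadable."""
    target = pid if pid is not None else os.getpid()
    path = f"/proc/{target}/timerslack_ns"
    try:
        with open(path) as fh:
            return int(fh.read().strip())
    except (OSError, ValueError):
        return None


def timer_slack_ratio(slack_ns, loop_period_ns, eps=1e-9):
    """TSR = timerslack_ns / (loop_period_ns + eps)."""
    return slack_ns / (loop_period_ns + eps)


snapshot = read_timerslack_ns()  # this process; None on systems without the file
if snapshot is not None:
    print(f"timerslack_ns={snapshot} TSR@200us={timer_slack_ratio(snapshot, 200_000):.3f}")
```

Logging the snapshot alongside the loop period at startup, and again on a slow timer, makes it visible when a thread has silently inherited a slack that is large relative to its cadence.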
Pseudocode sketch
obs = collect_timer_obs()  # TSR, WJL, CBI, TMR, PHE, urgency
p_timer = timer_regime_model.predict_proba(obs)
state = decode_state(p_timer, obs)

if state == "GREEN":
    params = baseline_policy()
elif state == "YELLOW":
    params = guarded_policy()
elif state == "ORANGE":
    params = smooth_catchup_policy()
else:  # RED
    params = containment_policy()

apply_execution_params(params)
log(state=state, p_timer=p_timer)
Bottom line
Timer policy is a real execution variable.
If your slippage stack ignores timer slack and timer-migration-induced wakeup distortion, you will over-attribute losses to “market conditions” and under-fix the true cadence problem.
References
- Linux man page: PR_SET_TIMERSLACK
  https://man7.org/linux/man-pages/man2/pr_set_timerslack.2const.html
- Linux man page: /proc/<pid>/timerslack_ns
  https://man7.org/linux/man-pages/man5/proc_pid_timerslack_ns.5.html
- Linux kernel docs: hrtimers subsystem
  https://docs.kernel.org/timers/hrtimers.html
- Linux kernel docs: NO_HZ (tick reduction / jitter trade-offs)
  https://docs.kernel.org/timers/no_hz.html
- Linux kernel docs: /proc/sys/kernel/timer_migration
  https://docs.kernel.org/admin-guide/sysctl/kernel.html#timer-migration
- Linux Foundation RT wiki: cyclictest latency methodology
  https://wiki.linuxfoundation.org/realtime/documentation/howto/tools/cyclictest/start