Linux Timer-Slack Coalescing & Timer-Migration Slippage Playbook

2026-03-24 · finance

Scope: How Linux timer coalescing (timerslack) and cross-CPU timer migration leak into execution cadence and create hidden slippage tails

Why this matters

Execution stacks often focus on wire/network microbursts, but many child-order bursts are born before the socket write.

A common hidden path: the pacing timer's slack lets the kernel coalesce or migrate its wakeups, wakeups slip by tens of microseconds to sub-millisecond, and child orders that should be evenly spaced get emitted in clusters.

The median loop latency can still look fine while q95/q99 slippage degrades.


Failure mechanism (operator timeline)

  1. Parent execution loop targets smooth cadence (e.g., every 200–500µs).
  2. Critical thread keeps default timer slack (often inherited), or slack drifts too large for loop period.
  3. Under load, timer expirations are coalesced and/or wakeups land on a different CPU path.
  4. Effective wakeups are delayed by tens of microseconds to sub-millisecond bursts.
  5. Child emission becomes “quiet then clustered” instead of evenly spaced.
  6. Schedule deficit accumulates; urgency logic increases aggression.
  7. Burst re-entry crosses thinner queue depth and pays impact + queue-reset tax.

Key point: this is OS timer-policy leakage into execution cost, not purely market randomness.
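Steps 3–6 can be made concrete with a toy simulation (illustrative numbers only): a hypothetical 1ms slack against a 200µs loop, with coalescing idealized as deferral of each expiration to the next shared slack boundary.

```python
PERIOD_NS = 200_000   # 200µs target cadence
SLACK_NS = 1_000_000  # 1ms timer slack: far too large for this loop period

targets = [k * PERIOD_NS for k in range(50)]
# Idealized coalescing: each wakeup is deferred to the next slack boundary.
actuals = [(t // SLACK_NS + 1) * SLACK_NS for t in targets]

def counts_per_bucket(times_ns, bucket_ns=250_000):
    """Child emissions per 250µs bucket (the same binning CBI uses below)."""
    counts = {}
    for t in times_ns:
        counts[t // bucket_ns] = counts.get(t // bucket_ns, 0) + 1
    return counts

print(max(counts_per_bucket(targets).values()))  # 2: evenly spread
print(max(counts_per_bucket(actuals).values()))  # 5: five wakeups collapse onto one tick
```

The total number of wakeups is unchanged; only their placement moves, which is exactly why mean latency dashboards miss it.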


Extend slippage decomposition with timer-policy term

[ IS = IS_{market} + IS_{impact} + IS_{timing} + IS_{fees} + \underbrace{IS_{timer}}_{\text{coalescing/migration cadence tax}} ]

Practical approximation:

[ IS_{timer,t} \approx a\cdot TSR_t + b\cdot WJL_t + c\cdot CBI_t + d\cdot TMR_t + e\cdot PHE_t ]

Where TSR, WJL, CBI, TMR, and PHE are the production metrics defined below, and a–e are coefficients fit per strategy/venue cohort.

Metrics to add in production

1) Timer Slack Ratio (TSR)

[ TSR = \frac{\text{timerslack\_ns}}{\text{loop\_period\_ns} + \epsilon} ]

If loop period is 200µs and slack is 50µs, TSR=0.25 (already material).
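As a sketch, TSR is a one-liner; the epsilon guard only prevents division by zero on a missing period.

```python
def timer_slack_ratio(timerslack_ns: int, loop_period_ns: int, eps: float = 1.0) -> float:
    """TSR = timerslack_ns / (loop_period_ns + eps)."""
    return timerslack_ns / (loop_period_ns + eps)

print(round(timer_slack_ratio(50_000, 200_000), 4))  # -> 0.25, already material
```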

2) Wakeup Jitter Lag (WJL)

[ WJL = p99(t_{wake,actual} - t_{wake,target}) ]

Measured in microseconds from monotonic timestamps.
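A minimal nearest-rank estimator over paired monotonic timestamps (the pairing of actual to target wakeups is assumed to come from your loop telemetry):

```python
def wakeup_jitter_lag_p99(actual_wake_ns, target_wake_ns):
    """p99 of (actual - target) wakeup lag, nearest-rank over the sample."""
    lags = sorted(a - t for a, t in zip(actual_wake_ns, target_wake_ns))
    return lags[min(len(lags) - 1, int(0.99 * len(lags)))]

# 100 wakeups whose lag ramps from 0ns to 99ns: p99 picks the worst here.
print(wakeup_jitter_lag_p99(list(range(100)), [0] * 100))  # -> 99
```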

3) Coalesced Burst Index (CBI)

[ CBI = \frac{p95(\text{child orders per }250\mu s)}{median(\text{child orders per }250\mu s)+\epsilon} ]

High CBI indicates “dribble then burst” behavior.
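A sketch over child-order timestamps; note it only counts occupied buckets, so production code should zero-fill empty 250µs windows over the full measurement interval.

```python
def coalesced_burst_index(child_ts_ns, bucket_ns=250_000, eps=1e-9):
    """p95 / median of child-order counts per 250µs bucket (occupied buckets only)."""
    counts = {}
    for t in child_ts_ns:
        counts[t // bucket_ns] = counts.get(t // bucket_ns, 0) + 1
    vals = sorted(counts.values())
    p95 = vals[min(len(vals) - 1, int(0.95 * len(vals)))]
    med = vals[len(vals) // 2]
    return p95 / (med + eps)

# Even cadence -> CBI ~ 1; a front-loaded burst -> CBI ~ 10.
print(coalesced_burst_index([i * 250_000 for i in range(20)]))
print(coalesced_burst_index([0] * 10 + [i * 250_000 for i in range(1, 11)]))
```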

4) Timer Migration Rate (TMR)

[ TMR = \frac{\#(\text{timer wakeups where target CPU} \neq \text{dispatch CPU})}{\#(\text{timer wakeups})} ]

Proxy with scheduler tracepoint joins if direct timer ownership is hard.
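Once target and dispatch CPUs are joined per wakeup (however you source them), TMR reduces to a count:

```python
def timer_migration_rate(wakeups) -> float:
    """wakeups: iterable of (target_cpu, dispatch_cpu) pairs, e.g. joined
    from scheduler/timer tracepoints keyed by timer id and timestamp."""
    wakeups = list(wakeups)
    if not wakeups:
        return 0.0
    migrated = sum(1 for target, dispatch in wakeups if target != dispatch)
    return migrated / len(wakeups)

print(timer_migration_rate([(0, 0), (0, 1), (2, 2), (3, 0)]))  # -> 0.5
```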

5) Phase Error (PHE)

[ PHE = p95\left(|t_{child,actual} - t_{child,target}|\right) ]

Directly translates kernel timing drift into execution timing damage.

6) Slack-At-Risk Exposure (SARE)

[ SARE = P(TSR > \tau \land urgency > u^*) ]

This interaction is usually where tail IS explodes.
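An empirical estimate over telemetry windows; the thresholds tau and u_star below are hypothetical placeholders to be calibrated per strategy.

```python
def slack_at_risk_exposure(tsr, urgency, tau=0.2, u_star=0.7):
    """Empirical P(TSR > tau AND urgency > u*): the fraction of windows where
    an oversized slack ratio coincides with high urgency."""
    n = len(tsr)
    hits = sum(1 for s, u in zip(tsr, urgency) if s > tau and u > u_star)
    return hits / n if n else 0.0

print(slack_at_risk_exposure([0.1, 0.3, 0.5, 0.05], [0.9, 0.8, 0.2, 0.95]))  # -> 0.25
```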


Modeling architecture

Stage 1: timer-regime detector

Features: TSR, WJL, CBI, TMR, PHE, plus load and urgency context.

Output: p_timer — the probability that the current window is in a distorted timer regime, consumed by Stage 2 and the controller.

Stage 2: conditional slippage uplift model

[ \Delta IS \sim \beta_1\,\text{urgency} + \beta_2\,p_{timer} + \beta_3\,(\text{urgency}\times p_{timer}) ]

Interpretation: urgency alone hurts, timer distortion alone hurts, but the interaction hurts the most.
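Mechanically, the uplift model is ordinary least squares with an interaction column. A synthetic sketch, where the "true" coefficients (2, 1, 5) are invented purely to show the fit recovering a dominant interaction term:

```python
import numpy as np

rng = np.random.default_rng(0)
urgency = rng.uniform(0.0, 1.0, 500)
p_timer = rng.uniform(0.0, 1.0, 500)
# Synthetic Delta-IS: interaction coefficient deliberately the largest.
d_is = 2.0 * urgency + 1.0 * p_timer + 5.0 * urgency * p_timer \
       + rng.normal(0.0, 0.05, 500)

X = np.column_stack([urgency, p_timer, urgency * p_timer])
beta, *_ = np.linalg.lstsq(X, d_is, rcond=None)
print(beta)  # ~ [2, 1, 5]: the urgency x p_timer term dominates
```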


Controller state machine

GREEN — CADENCE_STABLE: metrics nominal; run the baseline pacing policy.

YELLOW — COALESCING_RISK: TSR or WJL elevated; run a guarded policy and watch for clustering.

ORANGE — DISTORTION_ACTIVE: coalescing/migration visibly distorting cadence; catch up smoothly rather than bursting into the schedule deficit.

RED — TAIL_CONTAINMENT: tail damage live; contain urgency escalation until cadence recovers.

Use hysteresis to avoid flip-flopping.
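One way to implement the hysteresis (thresholds and cooldown below are hypothetical): escalate immediately on a breach, de-escalate one level only after several consecutive calm observations.

```python
STATES = ["GREEN", "YELLOW", "ORANGE", "RED"]

class TimerStateMachine:
    """Escalate as soon as p_timer crosses a threshold; de-escalate one
    level only after `cooldown` consecutive calmer observations."""

    def __init__(self, up_thresholds=(0.3, 0.6, 0.85), cooldown=3):
        self.up = up_thresholds   # hypothetical p_timer cut-offs for Y/O/R
        self.cooldown = cooldown
        self.level = 0            # index into STATES
        self.calm = 0             # consecutive observations below current level

    def step(self, p_timer: float) -> str:
        target = sum(p_timer >= t for t in self.up)
        if target > self.level:        # worsen immediately
            self.level, self.calm = target, 0
        elif target < self.level:      # recover slowly (hysteresis)
            self.calm += 1
            if self.calm >= self.cooldown:
                self.level, self.calm = self.level - 1, 0
        else:
            self.calm = 0
        return STATES[self.level]
```

A spike to p_timer = 0.9 flips the machine to RED in one step, but three calm readings are needed before it steps back down to ORANGE, so the controller cannot flap.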


Engineering mitigations (high ROI first)

  1. Set explicit low timer slack on critical execution threads
    Use prctl(PR_SET_TIMERSLACK, ...) or /proc/<pid>/timerslack_ns policy. Keep non-critical threads relaxed for power.

  2. Separate critical and non-critical work
    Don’t let logging/housekeeping threads share timing policy with dispatch-critical loops.

  3. Review kernel.timer_migration and CPU isolation strategy together
    Co-tune with core pinning/isolcpus/nohz_full design; avoid one-size-fits-all toggles.

  4. Prefer absolute-time pacing (TIMER_ABSTIME) over drift-prone relative loops
    Reduces cumulative phase walk when wakeups are occasionally late.

  5. Add short spin window only near deadline cliffs
    Hybrid sleep-then-spin can reduce worst-tail phase error while controlling thermal burn.

  6. Promote by tail metrics, not mean latency
    Gate on q95/q99 slippage + completion quality.
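For mitigation 1, per-thread slack can be set from Python via prctl(2) through ctypes on Linux (PR_SET_TIMERSLACK = 29, PR_GET_TIMERSLACK = 30 per <linux/prctl.h>); a Linux-only sketch with minimal error handling:

```python
import ctypes

PR_SET_TIMERSLACK, PR_GET_TIMERSLACK = 29, 30

libc = ctypes.CDLL("libc.so.6", use_errno=True)
libc.prctl.argtypes = [ctypes.c_int] + [ctypes.c_ulong] * 4
libc.prctl.restype = ctypes.c_int

def set_timer_slack_ns(ns: int) -> None:
    """Set the calling thread's timer slack; run this on the critical thread."""
    if libc.prctl(PR_SET_TIMERSLACK, ns, 0, 0, 0) != 0:
        raise OSError(ctypes.get_errno(), "prctl(PR_SET_TIMERSLACK) failed")

def get_timer_slack_ns() -> int:
    """Read back the calling thread's current timer slack in nanoseconds."""
    return libc.prctl(PR_GET_TIMERSLACK, 0, 0, 0, 0)

set_timer_slack_ns(10_000)   # 10µs: tight slack for a 200-500µs loop
print(get_timer_slack_ns())  # -> 10000
```

Slack is per-thread and inherited on thread creation, so set it explicitly in the critical loop's thread and leave non-critical threads at the kernel default (typically 50µs) for power.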
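For mitigation 4, Python's stdlib does not expose clock_nanosleep(TIMER_ABSTIME) directly, but the same drift-free property comes from deriving every deadline from the loop start instead of sleeping a relative period; a sketch:

```python
import time

PERIOD_NS = 200_000  # 200µs pacing target

def paced_deadlines(start_ns: int, n: int) -> list:
    # Deadline k is start + k*period: a late wakeup at step k-1 does not
    # shift any later deadline, so phase error cannot accumulate ("walk").
    return [start_ns + k * PERIOD_NS for k in range(1, n + 1)]

def run_paced(n: int, work=lambda: None) -> None:
    start = time.monotonic_ns()
    for deadline in paced_deadlines(start, n):
        work()
        remaining_ns = deadline - time.monotonic_ns()
        if remaining_ns > 0:
            time.sleep(remaining_ns / 1e9)  # sleep *to* the absolute deadline
```

Contrast with `while True: work(); time.sleep(period)`, where every late wakeup pushes all subsequent wakeups later, accumulating exactly the schedule deficit described in the failure timeline.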
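For mitigation 5, a hybrid sleep-then-spin wait might look like the sketch below; the spin margin is a hypothetical tuning knob, and the final loop burns CPU by design to cut the worst-tail phase error.

```python
import time

def sleep_then_spin(deadline_ns: int, spin_margin_ns: int = 50_000) -> int:
    """Coarse sleep until spin_margin_ns before the deadline, then busy-poll
    the monotonic clock. Returns the actual wake timestamp (ns)."""
    remaining = deadline_ns - time.monotonic_ns()
    if remaining > spin_margin_ns:
        time.sleep((remaining - spin_margin_ns) / 1e9)  # cheap, coalescible
    while time.monotonic_ns() < deadline_ns:            # precise, burns CPU
        pass
    return time.monotonic_ns()

wake = sleep_then_spin(time.monotonic_ns() + 200_000)
```

Keep the spin window near deadline cliffs only; spinning everywhere trades the cadence tax for a thermal and core-budget tax.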


Validation protocol

  1. Label windows by timer regime (stable / distorted) from telemetry.
  2. Match cohorts by symbol, spread, volatility, urgency, participation.
  3. Compare mean + q95/q99 slippage and markout between cohorts.
  4. Canary mitigations:
    • explicit low slack on critical threads,
    • CPU affinity/isolation adjustments,
    • migration/pacing policy changes.
  5. Promote only when tail improves without reliability regressions.

Observability checklist

  • TSR, WJL, CBI, TMR, PHE, SARE exported as time series.
  • Controller state, p_timer, and state transitions logged on every decision.
  • Timestamps retained so cadence telemetry can be joined with fills and markouts.

Success criterion: smaller tail slippage during urgency windows, not just lower average wakeup delay.


Pseudocode sketch

obs = collect_timer_obs()  # TSR, WJL, CBI, TMR, PHE, urgency
p_timer = timer_regime_model.predict_proba(obs)
state = decode_state(p_timer, obs)

if state == "GREEN":
    params = baseline_policy()
elif state == "YELLOW":
    params = guarded_policy()
elif state == "ORANGE":
    params = smooth_catchup_policy()
else:  # RED
    params = containment_policy()

apply_execution_params(params)
log(state=state, p_timer=p_timer)

Bottom line

Timer policy is a real execution variable.

If your slippage stack ignores timer slack and timer-migration-induced wakeup distortion, you will over-attribute losses to “market conditions” and under-fix the true cadence problem.

