DVFS / CPU Frequency-Scaling Jitter as a Slippage Driver (Practical Playbook)
Date: 2026-03-16
Category: research
Audience: small quant teams running Linux-based low-latency execution stacks
Why this matters
Most slippage attribution frameworks model market microstructure, not host power-management dynamics.
In production, dynamic voltage and frequency scaling (DVFS) can inject compute-latency variance exactly when execution loops need deterministic timing:
- child-order schedulers miss intended cadence,
- cancel/replace cycles arrive late,
- queue priority decays,
- catch-up aggressiveness rises,
- tail implementation shortfall widens.
This is not only an infra concern; it is a P&L pathway.
1) Failure mechanism (from power policy to bps leakage)
A typical chain:
- Runtime load shifts quickly (burst traffic, rebalance, event window)
- CPU governor / pstate logic lags or oscillates
- Effective compute per decision interval drops unpredictably
- Dispatch and amend timing drift from target schedule
- Queue-entry timing loses priority against faster participants
- Residual inventory rises toward deadline
- Router escalates to taker-heavy completion
- q95+ slippage degrades
In short: frequency instability → timing instability → microstructure disadvantage.
2) Metrics that expose frequency-linked slippage
System/runtime telemetry
- cpu_freq_mhz_p50/p95/p99 per pinned execution core
- freq_transition_count per second (up/down switches)
- cpu_throttle_time_ms (thermal/power caps)
- event_loop_lag_ms_p95 (or scheduler-loop lag)
- dispatch_gap_error_ms = |actual_child_gap - target_gap|
Execution-coupled telemetry
- Frequency Stability Index (FSI)
[ FSI = 1 - \frac{\sigma(f_{core})}{\mu(f_{core})} ]
Lower FSI indicates unstable compute budget.
- Transition Churn Rate (TCR)
[ TCR = \frac{\#\,\text{freq transitions}}{\text{second}} ]
High TCR often correlates with jittery decision latency.
- Dispatch Timing Deviation (DTD)
[ DTD = \text{median}(|\Delta t_{actual}-\Delta t_{target}|) ]
- Frequency-Linked Slippage Uplift (FLSU)
[ FLSU = IS_{bps}^{\text{low FSI / high TCR}} - IS_{bps}^{\text{stable FSI}} ]
If FLSU is persistent after market-state matching, DVFS is a first-class slippage factor.
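The FSI, TCR, and DTD definitions above can be sketched directly from windowed frequency and dispatch-gap samples. A minimal sketch (function and variable names are my own, not from any particular telemetry stack):

```python
import statistics

def fsi(freqs_mhz):
    """Frequency Stability Index: 1 - sigma/mu over core-frequency samples."""
    return 1.0 - statistics.pstdev(freqs_mhz) / statistics.mean(freqs_mhz)

def tcr(freqs_mhz, window_s):
    """Transition Churn Rate: frequency changes between consecutive
    samples, normalized to transitions per second."""
    transitions = sum(1 for a, b in zip(freqs_mhz, freqs_mhz[1:]) if a != b)
    return transitions / window_s

def dtd(actual_gaps_ms, target_gap_ms):
    """Dispatch Timing Deviation: median absolute gap error in ms."""
    return statistics.median(abs(g - target_gap_ms) for g in actual_gaps_ms)
```

FLSU then falls out as a plain difference of IS means between the matched low-FSI/high-TCR bucket and the stable bucket; the matching itself (by spread, depth, time-of-day) is the hard part, not the arithmetic.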
3) Modeling branch cost (frequency-aware expected cost)
At each control step, estimate:
[ E[\Delta IS] = p_{stable}C_{stable} + p_{transition}C_{transition} + p_{throttled}C_{throttled} + p_{deadline}C_{deadline} ]
Where branch probabilities are conditioned on:
- host state: freq variance, transition churn, throttle events,
- process state: queue backlog, decision-latency budget usage,
- market state: spread/depth resiliency/toxicity,
- schedule state: residual shares vs time left.
Most desks underprice C_transition and C_throttled tails; calibrate on q90/q95/q99, not mean only.
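The branch-cost expectation above is a straightforward probability-weighted sum; the modeling effort lives in the conditional branch probabilities and tail-calibrated costs. A sketch with hypothetical values (all probabilities and bps costs below are illustrative, not calibrated):

```python
def expected_delta_is(probs, costs):
    """E[dIS] = sum over branches of p_b * C_b.
    probs: branch -> conditional probability (must sum to ~1).
    costs: branch -> expected IS cost in bps, calibrated on
    q90/q95/q99 realizations rather than means."""
    assert abs(sum(probs.values()) - 1.0) < 1e-6
    return sum(p * costs[branch] for branch, p in probs.items())

# Hypothetical host/market/schedule-conditioned inputs:
probs = {"stable": 0.85, "transition": 0.10, "throttled": 0.04, "deadline": 0.01}
costs = {"stable": 0.5, "transition": 2.0, "throttled": 6.0, "deadline": 15.0}
```

Even with these toy numbers, the rare throttled and deadline branches contribute a disproportionate share of expected cost, which is why mean-only calibration underprices them.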
4) Control policy (state machine)
STABLE
- Frequency stable enough for normal cadence
- Standard passive/active mix
- Normal venue ranking
UNSTABLE
- Trigger: FSI drop or TCR spike for N windows
- Action: reduce child-size variance, widen dispatch buffers, cap cancel/replace churn
THROTTLED
- Trigger: throttle-time breach or sustained low effective frequency
- Action: disable burst catch-up, switch to paced repayment ladder, tighten max participation
SAFE_TAIL
- Trigger: repeated FLSU breach or deadline-risk escalation
- Action: conservative completion policy with explicit slippage budget stop
Key rule: never repay missed schedule debt in a single burst after a degraded-compute window.
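The state machine above can be sketched as a pure transition function over the telemetry from section 2. Thresholds here are hypothetical placeholders; real values come from your matched-window attribution:

```python
from enum import Enum

class HostState(Enum):
    STABLE = "stable"
    UNSTABLE = "unstable"
    THROTTLED = "throttled"
    SAFE_TAIL = "safe_tail"

def next_state(fsi, tcr, throttle_ms, flsu_breaches,
               fsi_floor=0.97, tcr_cap=50.0,
               throttle_cap_ms=5.0, breach_cap=3):
    """Illustrative transition logic, checked in severity order.
    All threshold defaults are assumptions, not recommendations."""
    if flsu_breaches >= breach_cap:
        return HostState.SAFE_TAIL    # repeated FLSU breach: conservative completion
    if throttle_ms > throttle_cap_ms:
        return HostState.THROTTLED    # throttle-time breach: paced repayment only
    if fsi < fsi_floor or tcr > tcr_cap:
        return HostState.UNSTABLE     # jitter rising: damp cadence and churn
    return HostState.STABLE
```

Evaluating triggers in severity order (SAFE_TAIL first) keeps the controller from oscillating between UNSTABLE and THROTTLED when both conditions fire in the same window.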
5) Mitigation levers (ranked by practicality)
A) Host/power-policy layer
- Pin execution-critical threads to isolated cores
- Use predictable governor/pstate mode for execution cores during market hours
- Bound deep idle-state latency where required (carefully, with thermal checks)
- Separate noisy batch workloads from execution CPUs
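On a typical Linux host these levers might look like the following (core IDs, latency bound, and workload names are placeholders; validate governor and idle-state changes against your own thermal envelope before market hours):

```shell
# Pin an execution-critical process to an isolated core
# (cores 2-3 assumed isolated at boot, e.g. isolcpus=2,3 nohz_full=2,3).
taskset -c 2 ./execution_engine &

# Use a predictable governor on execution cores during market hours.
echo performance | sudo tee /sys/devices/system/cpu/cpu2/cpufreq/scaling_governor
echo performance | sudo tee /sys/devices/system/cpu/cpu3/cpufreq/scaling_governor

# Bound deep idle-state exit latency (check thermals before relying on this).
sudo cpupower idle-set -D 10    # disable C-states with exit latency > 10 us

# Keep noisy batch workloads off the execution CPUs.
systemd-run --property=AllowedCPUs=0-1 ./batch_job.sh
```

Revert the governor and idle-state settings outside market hours so thermal headroom recovers.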
B) Process architecture
- Isolate dispatch loop from parsing/feature pipelines with bounded queues
- Precompute expensive features off critical timing path
- Add backpressure and shed non-critical work when timing budget is tight
C) Execution policy
- Replace naive catch-up with bounded repayment
- Add cooldown after unstable/throttled windows
- Couple aggression to tail-risk budget, not only schedule deficit
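The bounded-repayment lever can be sketched as a cap on catch-up child size, with a cooldown flag set by the controller after unstable or throttled windows (the cap multiple and cooldown semantics are assumptions for illustration):

```python
def repayment_child_size(deficit_shares, normal_child,
                         max_multiple=1.5, cooldown_active=False):
    """Bounded schedule-debt repayment: never burst the full deficit.

    deficit_shares: shares behind the target schedule.
    normal_child:   baseline child-order size.
    max_multiple:   hypothetical cap on catch-up size vs baseline.
    cooldown_active: set after unstable/throttled windows; suppresses catch-up.
    """
    if cooldown_active:
        return normal_child  # no catch-up during post-degradation cooldown
    capped = min(deficit_shares, int(normal_child * max_multiple))
    return max(normal_child, capped)
```

A naive catch-up would send the full 1000-share deficit at once; the bounded version repays it across several children, trading a slightly longer deficit window for less signaling and adverse selection.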
6) 7-day rollout plan
Day 1-2
Add core-frequency, transition, throttle, and dispatch-jitter telemetry.
Day 3-4
Build matched-window attribution: stable vs unstable/throttled host windows.
Day 5
Run state-machine policy in shadow mode (log decisions, no live action).
Day 6
Canary on low-risk symbol basket with hard rollback gates.
Day 7
Review q95 IS, completion reliability, burst incidence, and throttle-linked anomalies.
Example rollback gate:
- q95 IS worsens > 8 bps vs control for 2 sessions, or
- completion reliability falls below predefined floor.
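The rollback gate above is easy to make mechanical so the canary cannot linger past a breach. A sketch (the 8 bps / 2 session thresholds mirror the example above; the completion floor is a placeholder):

```python
def should_rollback(q95_is_canary_bps, q95_is_control_bps,
                    sessions_breached, completion_rate,
                    is_gap_bps=8.0, breach_sessions=2,
                    completion_floor=0.98):
    """Hard rollback gate for the canary basket.
    Fires if q95 IS underperforms control by more than is_gap_bps for
    breach_sessions sessions, OR completion reliability drops below floor."""
    is_breach = (q95_is_canary_bps - q95_is_control_bps > is_gap_bps
                 and sessions_breached >= breach_sessions)
    return is_breach or completion_rate < completion_floor
```

Wiring this into the canary harness removes the temptation to "wait one more session" after a breach.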
7) Common anti-patterns
"CPU frequency is ops-only" thinking
If timing moves queue rank, it is execution economics.Average-frequency dashboards only
Transition churn and tails matter more than mean MHz.Aggressive debt repayment
Burst recovery often converts timing debt into adverse selection.No market-state conditioning
Same compute jitter is cheap in calm books, expensive in fragile books.
Bottom line
DVFS behavior is effectively a hidden execution parameter.
If slippage models ignore host compute-state transitions, teams will misclassify avoidable tail losses as random market noise.
Make frequency stability observable, model branch costs explicitly, and enforce a bounded-recovery controller.
References
- Linux Kernel Docs, CPU Performance Scaling (CPUFreq): https://docs.kernel.org/admin-guide/pm/cpufreq.html
- Linux Kernel Docs, intel_pstate CPU Performance Scaling Driver: https://docs.kernel.org/admin-guide/pm/intel_pstate.html
- Linux Kernel Docs, amd-pstate CPU Performance Scaling Driver: https://docs.kernel.org/admin-guide/pm/amd-pstate.html
- Linux Kernel Docs, Schedutil: https://docs.kernel.org/scheduler/schedutil.html
- Linux Kernel Docs, CPU Idle Time Management (CPUIdle): https://docs.kernel.org/admin-guide/pm/cpuidle.html