Latency-Jitter Burst EVT Tail Slippage Controller Playbook
Date: 2026-02-27
Category: research (quant execution / slippage modeling)
Why this playbook
Most slippage models include a latency feature (or a few percentile buckets), but they usually treat latency as a smooth average effect. In production, real damage often comes from bursty jitter clusters:
- gateway/OMS queue buildup,
- exchange ACK micro-stalls,
- websocket backpressure,
- token-refresh or throttle contention,
- network micro-congestion.
Those bursts are short, but they coincide with adverse microstructure shifts and create disproportionate p95/p99 slippage. This playbook models that explicitly and connects it to a live controller.
Problem framing
For each child order decision at time (t), define:
- (c_t): intended control action (join/improve/cross, clip size, POV),
- (L_t): end-to-end decision-to-book latency,
- (J_t): latency jitter proxy,
- (S_t): realized short-horizon slippage contribution (bps).
A robust decomposition:
[ S_t = f(x_t, c_t) + g(x_t, c_t, L_t, J_t) + \epsilon_t ]
- (f): baseline microstructure cost (spread/impact/fill),
- (g): latency-jitter interaction term,
- (\epsilon_t): residual noise.
Key insight: (g) is highly nonlinear in the tails, so a mean-latency correction is not enough.
Latency state variables that matter
Instead of one scalar latency, track a state vector:
- Segment latencies
- decision→gateway, gateway→ACK, ACK→book update.
- Burstiness metrics
- rolling p95/p99, Fano factor of delayed ACK counts,
- exceedance count over dynamic threshold (u_t).
- Queue pressure proxies
- in-flight orders, cancel backlog, retry queue depth.
- Freshness drift
- quote age at decision and at actual exchange acceptance.
- Throttle pressure
- requests/sec utilization vs limit, near-limit dwell time.
A practical jitter proxy:
[ J_t = \frac{L_t - \operatorname{median}(L_{t-w:t})}{\operatorname{MAD}(L_{t-w:t}) + \delta} ]
with window (w) and small (\delta) for stability.
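A minimal sketch of the jitter proxy above in pure NumPy. The window size `w` and floor `delta` are illustrative values, and the sketch excludes the current sample from its own baseline window (a design choice so that a fresh spike is not absorbed into the median it is compared against):

```python
import numpy as np

def jitter_proxy(latencies: np.ndarray, w: int = 200, delta: float = 1e-3) -> float:
    """J_t = (L_t - median(window)) / (MAD(window) + delta) for the newest sample.

    `latencies` is a 1-D array of end-to-end latencies (e.g. ms), newest last.
    The baseline window covers the w samples before the current one.
    """
    window = latencies[-(w + 1):-1]           # L_{t-w:t-1}, excludes current sample
    med = np.median(window)
    mad = np.median(np.abs(window - med))     # robust scale estimate
    return float((latencies[-1] - med) / (mad + delta))
```

On a calm stream (J_t) hovers near zero; a single 10x latency spike pushes it far past any plausible burst threshold, which is the behavior the proxy is designed for.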
Tail model: EVT over latency exceedances
Use Peaks-Over-Threshold (POT) on latency exceedances:
[ Y_t = L_t - u_t \quad \text{for } L_t > u_t ]
Model (Y_t) by Generalized Pareto Distribution (GPD):
[ \Pr(Y \le y) = 1 - \left(1 + \xi \frac{y}{\beta}\right)^{-1/\xi}, \qquad \xi \neq 0 ]
(in the limit (\xi \to 0), the GPD reduces to an exponential with mean (\beta)).
- (\xi): tail shape,
- (\beta): scale,
- (u_t): adaptive threshold (e.g., rolling 90th/95th percentile by regime).
Then map tail risk to expected slippage surcharge. One practical form:
[ \Delta S^{tail}_t = \alpha_1 \cdot \mathbb{E}[Y_t \mid Y_t > 0] + \alpha_2 \cdot \Pr(Y_t > y^*) ]
where (y^*) is an operational shock level; under the GPD, (\mathbb{E}[Y \mid Y > 0] = \beta/(1-\xi)) for (\xi < 1).
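A hedged sketch of the POT fit and surcharge mapping. To stay dependency-free it uses a method-of-moments GPD estimator (ξ = (1 − m²/v)/2, β = m(1 + m²/v)/2, valid for ξ < 1/2); maximum likelihood (e.g. `scipy.stats.genpareto`) would be the usual production choice. The (\alpha) weights and shock level (y^*) are placeholders, not calibrated values:

```python
import numpy as np

def fit_gpd_mom(exceedances: np.ndarray) -> tuple[float, float]:
    """Method-of-moments GPD fit on exceedances Y = L - u > 0.

    Requires finite variance (xi < 1/2); MLE is preferable in production.
    """
    m, v = exceedances.mean(), exceedances.var()
    xi = 0.5 * (1.0 - m * m / v)
    beta = 0.5 * m * (1.0 + m * m / v)
    return xi, beta

def tail_surcharge_bps(exceedances: np.ndarray, y_star: float,
                       alpha1: float = 0.5, alpha2: float = 3.0) -> float:
    """Delta S^tail = a1 * E[Y | Y > 0] + a2 * Pr(Y > y*), per the playbook."""
    xi, beta = fit_gpd_mom(exceedances)
    mean_excess = beta / (1.0 - xi)                       # requires xi < 1
    p_shock = (1.0 + xi * y_star / beta) ** (-1.0 / xi)   # GPD survival function
    return alpha1 * mean_excess + alpha2 * p_shock
```

A quick sanity check is to simulate GPD exceedances by inverse CDF, (Y = (\beta/\xi)((1-U)^{-\xi} - 1)) with (U) uniform, and confirm the fit recovers the shape and scale.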
Coupling latency tails with microstructure fragility
Latency bursts hurt more when book resiliency is weak. Define fragility score (F_t) from spread expansion, cancel imbalance, refill half-life, and short-horizon OFI toxicity.
Final surcharge:
[ \Delta S_t = \Delta S^{tail}_t \cdot (1 + \kappa F_t) ]
- calm book (low (F_t)): latency tail cost partially absorbed,
- fragile book (high (F_t)): same latency burst causes convex slippage jump.
This interaction term is where most underestimation errors live.
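The coupling is a single multiplicative convexity term, so the sketch is short; the sensitivity (\kappa) is an assumed calibration constant, and clamping (F_t) to ([0, 1]) is a defensive choice, not part of the formula:

```python
def coupled_surcharge(tail_surcharge: float, fragility: float, kappa: float = 2.0) -> float:
    """Delta S_t = Delta S^tail_t * (1 + kappa * F_t).

    fragility F_t in [0, 1]: 0 = calm, resilient book; 1 = maximally fragile.
    kappa is an assumed sensitivity constant for illustration.
    """
    f = max(0.0, min(1.0, fragility))   # defensive clamp on the fragility score
    return tail_surcharge * (1.0 + kappa * f)
```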
Online controller (state machine)
Use an explicit latency-risk state machine:
- GREEN: tail stable, normal tactics
- AMBER: exceedance frequency rising
- RED: burst cluster / GPD tail worsening
- SAFE: hard degradation mode
Transition signals
Promote state when any sustained condition holds:
- (\Pr(Y>y^*)) over threshold,
- burst-rate z-score over threshold,
- p95 slippage residual breach while latency tail elevated.
Demote with hysteresis (minimum dwell + recovery margin) to avoid flap.
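The transition logic above can be sketched as a small state machine. The scalar risk score (standing in for a combination of the three promotion signals), the thresholds, the sustain count, and the minimum dwell are all illustrative assumptions:

```python
from dataclasses import dataclass

STATES = ["GREEN", "AMBER", "RED", "SAFE"]

@dataclass
class LatencyRiskFSM:
    promote_thr: float = 0.7   # promote when risk is sustained above this
    demote_thr: float = 0.4    # demote only below this (recovery margin)
    sustain: int = 3           # consecutive hot ticks required to promote
    min_dwell: int = 10        # minimum ticks in a state before demotion
    state: str = "GREEN"
    _hot: int = 0
    _dwell: int = 0

    def step(self, risk: float) -> str:
        """risk: scalar in [0, 1] combining Pr(Y > y*), burst z-score, residual breach."""
        self._dwell += 1
        i = STATES.index(self.state)
        self._hot = self._hot + 1 if risk >= self.promote_thr else 0
        if self._hot >= self.sustain and i < len(STATES) - 1:
            self.state, self._hot, self._dwell = STATES[i + 1], 0, 0
        elif risk <= self.demote_thr and self._dwell >= self.min_dwell and i > 0:
            self.state, self._dwell = STATES[i - 1], 0
        return self.state
```

The hysteresis comes from two mechanisms: the gap between `promote_thr` and `demote_thr` (risk in the middle band neither promotes nor demotes) and `min_dwell` (a freshly entered state cannot immediately demote), which together prevent flapping.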
Control actions by state
GREEN
- baseline POV and passive mix.
AMBER
- reduce clip size,
- widen cancel/replace spacing,
- avoid low-edge passive joins on thin queues.
RED
- cap aggressive crossing bursts,
- shift to safer venues/paths,
- tighten participation ceiling.
SAFE
- freeze non-urgent slices,
- fall back to the baseline deterministic policy,
- require operator/automatic health recovery gate before resume.
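The actions above can be pinned down as a control-parameter map keyed by state. Every number below is a placeholder to be calibrated per instrument and venue, not a recommendation:

```python
# state -> control knobs; each promotion tightens every knob monotonically
CONTROLS = {
    "GREEN": {"clip_mult": 1.0, "pov_cap": 0.10, "passive_joins": True},
    "AMBER": {"clip_mult": 0.6, "pov_cap": 0.08, "passive_joins": False},
    "RED":   {"clip_mult": 0.3, "pov_cap": 0.04, "passive_joins": False},
    "SAFE":  {"clip_mult": 0.0, "pov_cap": 0.00, "passive_joins": False},
}

def controls_for(state: str) -> dict:
    """Return the (assumed) control parameters for a latency-risk state."""
    return CONTROLS[state]
```

Keeping the degradation monotone (each promotion strictly tightens clip size and participation) makes the controller's behavior easy to reason about during incident review.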
Calibration loop
Run two linked calibrations daily, plus an intraday mini-refresh:
- Tail fit health
- stability of (\xi, \beta), exceedance count sufficiency,
- QQ diagnostics for exceedances by session/regime.
- Cost linkage health
- calibration of predicted (\Delta S_t) vs realized residual slippage,
- p95/p99 conditional coverage under latency stress.
If the tail fit is unstable (too few exceedances or a regime break), fall back to an empirical quantile surcharge instead of forcing the GPD.
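The fallback can be sketched as a guard around the GPD path. The minimum exceedance count, the shape-parameter gate, and the (\alpha) weights are illustrative assumptions, and the GPD branch reuses the method-of-moments fit described earlier:

```python
import numpy as np

def surcharge_with_fallback(exceedances: np.ndarray, y_star: float,
                            min_exceedances: int = 50, xi_cap: float = 0.9) -> float:
    """GPD surcharge when the tail fit is healthy, else empirical quantiles.

    All thresholds here are assumed for illustration, not prescriptive.
    """
    if len(exceedances) >= min_exceedances:
        m, v = exceedances.mean(), exceedances.var()
        xi = 0.5 * (1.0 - m * m / v)             # method-of-moments shape
        beta = 0.5 * m * (1.0 + m * m / v)
        if 0.0 < xi < xi_cap:                    # crude fit-health gate
            mean_excess = beta / (1.0 - xi)
            p_shock = (1.0 + xi * y_star / beta) ** (-1.0 / xi)
            return 0.5 * mean_excess + 3.0 * p_shock
    # fallback: empirical mean excess + empirical shock frequency
    mean_excess = float(exceedances.mean()) if len(exceedances) else 0.0
    p_shock = float(np.mean(exceedances > y_star)) if len(exceedances) else 0.0
    return 0.5 * mean_excess + 3.0 * p_shock
```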
Data contract (must-log fields)
Per decision / child order:
- timestamps for each latency segment,
- order intent vs accepted-at-exchange time,
- action/tactic and urgency level,
- queue position proxy / top-book snapshot,
- reject/cancel/replace outcomes,
- realized fill + markout horizons,
- throttle counters and API bucket utilization.
Without segment timestamps and throttle counters, root-cause separation is mostly guesswork.
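As a sketch, the contract above can be pinned down as a typed record; the field names and types here are illustrative, not a fixed schema:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ChildOrderRecord:
    # latency segment timestamps (epoch ns)
    t_decision: int
    t_gateway: int
    t_ack: int
    t_book_update: int
    # order intent vs acceptance
    t_accepted_at_exchange: int
    tactic: str                  # join / improve / cross
    urgency: str
    # queue / top-book context at decision
    queue_position_est: int
    top_bid: float
    top_ask: float
    # outcomes
    status: str                  # filled / rejected / cancelled / replaced
    fill_qty: float
    markout_1s_bps: float
    markout_10s_bps: float
    # throttle state
    req_rate_utilization: float  # requests/sec used vs limit, in [0, 1]
```

Freezing the record is deliberate: logged execution facts should be immutable, and segment latencies fall out as simple timestamp differences at analysis time.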
Validation scorecard
Minimum weekly scorecard:
- Tail metrics
- latency exceedance rate, GPD (\xi) drift, burst cluster frequency.
- Execution outcomes
- p50/p95/p99 slippage by state (GREEN/AMBER/RED/SAFE).
- Controller quality
- precision/recall of RED entry vs future p95 breach,
- false-positive burden (unnecessary defensive mode time).
- Business tradeoff
- slippage saved vs additional underfill/opportunity cost.
The promotion criterion should be tail-first: improved p95/p99 with bounded completion damage.
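The controller-quality row of the scorecard reduces to a labeling problem: did a RED entry precede a future p95 breach within some horizon? A sketch, where the horizon and the event-timestamp representation are assumptions:

```python
def red_entry_quality(red_entries: list[int], breaches: list[int],
                      horizon: int = 300) -> tuple[float, float]:
    """Precision/recall of RED entries vs future p95 breaches.

    red_entries / breaches: event timestamps (seconds). A RED entry is a
    true positive if a breach follows within `horizon` seconds; a breach
    counts as captured if some RED entry preceded it within `horizon`.
    """
    tp = sum(any(0 <= b - r <= horizon for b in breaches) for r in red_entries)
    captured = sum(any(0 <= b - r <= horizon for r in red_entries) for b in breaches)
    precision = tp / len(red_entries) if red_entries else 0.0
    recall = captured / len(breaches) if breaches else 0.0
    return precision, recall
```

Low precision shows up directly as false-positive burden (time spent in defensive mode for nothing); low recall means the controller enters RED too late to matter.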
Common failure modes
- Single latency average hides burst clustering.
- Static threshold (u) ignores intraday regime drift.
- No hysteresis causes controller oscillation.
- Missing segment clocks make network, venue, and internal queue issues indistinguishable.
- Tail-only control without fill governance improves cost but misses deadlines.
Rollout plan
Phase 1 — Shadow scoring
- Compute latency-tail state + suggested controls, but do not execute.
- Compare predicted surcharge to realized residual slippage.
Phase 2 — Soft guardrails
- Enable AMBER-only controls with tight bounds.
- Keep RED/SAFE advisory.
Phase 3 — Controlled activation
- Activate RED with canary traffic and strict rollback triggers.
- SAFE remains hard kill-switch fallback.
Phase 4 — Full governance
- Weekly tail model review,
- monthly incident replay for top latency-burst events,
- champion/challenger for thresholding and surcharge mapping.
Practical takeaway
Latency risk in execution is not about “a few ms slower on average.” It is about clustered tail events interacting with fragile liquidity.
If your slippage model treats latency as a linear average feature, you will keep paying unexpected p95 invoices. Model the tail explicitly (EVT), couple it with book fragility, and attach it to a stateful controller.
Pointers for deeper reading
- Embrechts, Klüppelberg, Mikosch — Modelling Extremal Events (EVT foundations).
- Coles — An Introduction to Statistical Modeling of Extreme Values.
- Pickands (1975), Balkema & de Haan (1974) — POT/GPD limit results.
- McNeil, Frey, Embrechts — Quantitative Risk Management (tail modeling in practice).
- Dean & Barroso — The Tail at Scale (tail-latency systems intuition).