Latency-Jitter Burst EVT Tail Slippage Controller Playbook

2026-02-27 · finance

Date: 2026-02-27
Category: research (quant execution / slippage modeling)

Why this playbook

Most slippage models include a latency feature (or a few percentile buckets), but they usually treat latency as a smooth average effect. In production, the real damage often comes from bursty jitter clusters.

Those bursts are short, but they coincide with adverse microstructure shifts and create disproportionate p95/p99 slippage. This playbook models that explicitly and connects it to a live controller.


Problem framing

For each child order decision at time (t), define:

A robust decomposition:

[ S_t = f(x_t, c_t) + g(x_t, c_t, L_t, J_t) + \epsilon_t ]

Key insight: (g) is highly nonlinear in the tails of (L_t) and (J_t), so a mean-latency correction is not enough.


Latency state variables that matter

Instead of one scalar latency, track a state vector:

  1. Segment latencies
    • decision→gateway, gateway→ACK, ACK→book update.
  2. Burstiness metrics
    • rolling p95/p99, Fano factor of delayed ACK counts,
    • exceedance count over dynamic threshold (u_t).
  3. Queue pressure proxies
    • in-flight orders, cancel backlog, retry queue depth.
  4. Freshness drift
    • quote age at decision and at actual exchange acceptance.
  5. Throttle pressure
    • requests/sec utilization vs limit, near-limit dwell time.
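Of the burstiness metrics above, the Fano factor is the easiest to illustrate. The following is a minimal sketch, assuming delayed ACKs are counted in fixed-length time bins; the helper name `fano_factor` and the sample counts are illustrative only. A Poisson-like arrival process gives a ratio near 1, while clustered bursts push it well above 1.

```python
from statistics import mean, pvariance

def fano_factor(counts):
    """Variance-to-mean ratio of per-bin event counts (hypothetical helper)."""
    m = mean(counts)
    if m == 0:
        return 0.0  # no delayed ACKs in the window: treat as no burstiness
    return pvariance(counts) / m

# Delayed-ACK counts per bin (illustrative numbers, same mean in both cases).
steady = [2, 3, 2, 3, 2, 3, 2, 3]
bursty = [0, 0, 0, 10, 0, 0, 10, 0]

print(fano_factor(steady), fano_factor(bursty))
```

Both series have the same average count, so a mean-based feature cannot tell them apart; the variance-to-mean ratio can.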

A practical jitter proxy:

[ J_t = \frac{L_t - \operatorname{median}(L_{t-w:t})}{\operatorname{MAD}(L_{t-w:t}) + \delta} ]

with window (w) and small (\delta) for stability.
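The jitter proxy can be sketched in a few lines. This is an illustrative version, assuming the window (L_{t-w:t}) includes the current sample (L_t) and that the MAD is used unscaled (no 1.4826 normal-consistency factor); the function name `jitter_proxy` is hypothetical.

```python
from statistics import median

def jitter_proxy(latencies, w, delta=1e-9):
    """Robust z-score of the latest latency against its trailing window.

    latencies: latency samples, most recent last.
    w: window length; assumed here to include the latest sample.
    delta: small constant that keeps the ratio finite when MAD is zero.
    """
    window = list(latencies[-w:])
    med = median(window)
    mad = median(abs(x - med) for x in window)
    return (window[-1] - med) / (mad + delta)

calm = [10, 11, 10, 12, 11, 10, 11]    # latencies in ms, illustrative
spike = [10, 11, 10, 12, 11, 10, 40]   # same window with a late burst

print(jitter_proxy(calm, 7), jitter_proxy(spike, 7))
```

Because median and MAD are robust to outliers, a single spike barely moves the baseline, so the spike itself stands out sharply in (J_t).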


Tail model: EVT over latency exceedances

Use Peaks-Over-Threshold (POT) on latency exceedances:

[ Y_t = L_t - u_t \quad \text{for } L_t > u_t ]

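Model (Y_t) with a Generalized Pareto Distribution (GPD) with shape (\xi) and scale (\beta > 0):

[ \Pr(Y \le y) = 1 - \left(1 + \xi \frac{y}{\beta}\right)^{-1/\xi}, \quad y \ge 0, \ \xi \neq 0 ]

As (\xi \to 0) this reduces to the exponential tail (1 - e^{-y/\beta}).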

Then map tail risk to expected slippage surcharge. One practical form:

[ \Delta S^{tail}_t = \alpha_1 \cdot \mathbb{E}[Y_t \mid Y_t>0] + \alpha_2 \cdot \Pr(Y_t > y^*) ]

where (y^*) is an operational shock level and (\alpha_1, \alpha_2) are coefficients that convert latency tail risk into slippage units.
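The mapping from exceedances to a surcharge can be sketched as follows. To stay dependency-free, this sketch fits the GPD by the method of moments rather than maximum likelihood, which is only valid for (\xi < 1/2); a production fit would more likely use MLE. All function names, the sample exceedances, and the coefficient values are assumptions for illustration.

```python
import math
from statistics import mean, pvariance

def fit_gpd_moments(exceedances):
    """Method-of-moments GPD estimates (xi, beta); assumes xi < 1/2."""
    m = mean(exceedances)
    v = pvariance(exceedances)
    if v == 0:
        raise ValueError("need at least two distinct exceedances")
    ratio = m * m / v                 # equals 1 - 2*xi under the GPD
    xi = 0.5 * (1.0 - ratio)
    beta = 0.5 * m * (ratio + 1.0)
    return xi, beta

def gpd_survival(y, xi, beta):
    """Pr(Y > y) under the GPD; exponential limit when xi is ~0."""
    if abs(xi) < 1e-12:
        return math.exp(-y / beta)
    base = 1.0 + xi * y / beta
    if base <= 0.0:
        return 0.0                    # past the finite endpoint when xi < 0
    return base ** (-1.0 / xi)

def tail_surcharge(exceedances, y_star, alpha1, alpha2):
    """alpha1 * E[Y | Y > 0] + alpha2 * Pr(Y > y_star)."""
    xi, beta = fit_gpd_moments(exceedances)
    mean_excess = beta / (1.0 - xi)   # finite only for xi < 1
    return alpha1 * mean_excess + alpha2 * gpd_survival(y_star, xi, beta)

# Latency exceedances over u_t in ms (illustrative numbers only).
exceedances = [0.5, 0.5, 1.0, 1.0, 2.0, 15.0]
print(tail_surcharge(exceedances, y_star=5.0, alpha1=0.2, alpha2=3.0))
```

With a moment fit, the mean excess equals the sample mean of the exceedances by construction, so the GPD adds information mainly through the shock probability term.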


Coupling latency tails with microstructure fragility

Latency bursts hurt more when book resiliency is weak. Define a fragility score (F_t) from spread expansion, cancel imbalance, refill half-life, and short-horizon order-flow imbalance (OFI) toxicity.

Final surcharge:

[ \Delta S_t = \Delta S^{tail}_t \cdot (1 + \kappa F_t) ]

This interaction term is where most underestimation errors live.
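A small sketch of the coupling step. The text does not specify how (F_t) is built from its four inputs, so the weighted average of non-negative z-scores below is an assumed form, and the names, weights, and (\kappa) value are illustrative.

```python
def fragility_score(components, weights):
    """Weighted average of stress z-scores, floored at zero (assumed form of F_t)."""
    total_w = sum(weights.values())
    return sum(weights[k] * max(0.0, components[k]) for k in weights) / total_w

def coupled_surcharge(tail_surcharge_bps, fragility, kappa):
    """Delta S_t = Delta S_tail * (1 + kappa * F_t)."""
    return tail_surcharge_bps * (1.0 + kappa * fragility)

# Hypothetical z-scores for the four fragility inputs.
components = {
    "spread_expansion": 2.0,
    "cancel_imbalance": 1.0,
    "refill_half_life": -0.5,   # calmer than usual: contributes zero
    "ofi_toxicity": 1.0,
}
weights = {k: 1.0 for k in components}

f_t = fragility_score(components, weights)
print(coupled_surcharge(2.0, f_t, kappa=0.5))
```

Flooring each component at zero means a calm book never discounts the tail surcharge below its standalone value; that is a conservative design choice, not something the formula requires.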


Online controller (state machine)

Use an explicit latency-risk state machine:

Transition signals

Promote state when any sustained condition holds:

  1. (\Pr(Y>y^*)) over threshold,
  2. burst-rate z-score over threshold,
  3. p95 slippage residual breach while latency tail elevated.

Demote with hysteresis (minimum dwell time plus a recovery margin) to avoid flapping between states.
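The promote/demote logic can be sketched as a small state machine. This sketch assumes the states are ordered GREEN < AMBER < RED < SAFE, moves one level at a time, and collapses the three transition signals into a single scalar risk signal (for example an estimate of (\Pr(Y>y^*))); the class name, parameters, and thresholds are hypothetical.

```python
from enum import IntEnum

class State(IntEnum):
    GREEN = 0
    AMBER = 1
    RED = 2
    SAFE = 3

class LatencyRiskController:
    def __init__(self, threshold, margin, sustain=3, min_dwell=5):
        self.threshold = threshold    # promote when signal stays above this
        self.margin = margin          # recovery margin for demotion
        self.sustain = sustain        # consecutive breaches needed to promote
        self.min_dwell = min_dwell    # minimum ticks in a state before demoting
        self.state = State.GREEN
        self.dwell = 0                # ticks spent in the current state
        self.breaches = 0             # consecutive ticks above threshold

    def step(self, signal):
        self.dwell += 1
        self.breaches = self.breaches + 1 if signal > self.threshold else 0
        if self.breaches >= self.sustain and self.state < State.SAFE:
            self._move(State(self.state + 1))
        elif (signal < self.threshold - self.margin
              and self.dwell >= self.min_dwell
              and self.state > State.GREEN):
            self._move(State(self.state - 1))
        return self.state

    def _move(self, new_state):
        self.state = new_state
        self.dwell = 0
        self.breaches = 0

ctrl = LatencyRiskController(threshold=0.10, margin=0.03)
for signal in [0.12, 0.12, 0.12, 0.09, 0.09, 0.05]:
    print(signal, ctrl.step(signal).name)
```

A signal sitting between `threshold - margin` and `threshold` neither promotes nor demotes, which is the band that prevents flapping. With several signals, one option is to feed the controller a boolean "any condition breached" in place of the scalar comparison.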

Control actions by state

GREEN

AMBER

RED

SAFE


Calibration loop

Run two linked calibrations daily, with an intraday mini-refresh:

  1. Tail fit health
    • stability of (\xi, \beta), exceedance count sufficiency,
    • QQ diagnostics for exceedances by session/regime.
  2. Cost linkage health
    • calibration of predicted (\Delta S_t) vs realized residual slippage,
    • p95/p99 conditional coverage under latency stress.

If the tail fit is unstable (too few exceedances or a regime break), fall back to an empirical quantile surcharge instead of forcing a GPD fit.
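The fallback rule can be sketched as a simple health gate. The text does not say which quantity the empirical quantile is taken over; this sketch assumes it is recent realized residual slippage observed under latency stress. The function names, the minimum exceedance count, and the drift tolerance are illustrative assumptions.

```python
import math

def empirical_quantile(values, q):
    """Nearest-rank empirical quantile for q in (0, 1]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]

def tail_fit_is_healthy(n_exceedances, xi_today, xi_reference,
                        min_count=50, max_xi_drift=0.15):
    """Enough exceedances and a shape estimate close to its reference value."""
    enough_data = n_exceedances >= min_count
    stable_shape = abs(xi_today - xi_reference) <= max_xi_drift
    return enough_data and stable_shape

def surcharge_bps(gpd_surcharge_bps, stressed_residuals_bps, healthy, q=0.95):
    """Use the GPD-based surcharge when the fit is healthy, else the fallback."""
    if healthy:
        return gpd_surcharge_bps
    return max(0.0, empirical_quantile(stressed_residuals_bps, q))

residuals = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]   # bps, illustrative
healthy = tail_fit_is_healthy(n_exceedances=12, xi_today=0.2, xi_reference=0.2)
print(healthy, surcharge_bps(3.0, residuals, healthy))
```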


Data contract (must-log fields)

Per decision / child order:

Without segment timestamps and throttle counters, root-cause separation is mostly guesswork.
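As an illustration of why segment timestamps and throttle counters matter, here is a hypothetical per-order record. The field names are assumptions derived from the latency state variables described earlier, not a definitive schema; the point is that segment latencies and throttle utilization can only be derived if the raw timestamps and counters are logged.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ChildOrderLog:
    # All field names are hypothetical; timestamps are nanoseconds since epoch.
    order_id: str
    t_decision_ns: int
    t_gateway_ns: int
    t_ack_ns: int
    t_book_update_ns: int
    quote_age_at_decision_ns: int
    in_flight_orders: int
    cancel_backlog: int
    retry_queue_depth: int
    requests_per_sec: float
    rate_limit_per_sec: float

    def segment_latencies_ns(self):
        """Split end-to-end latency into the three segments."""
        return {
            "decision_to_gateway": self.t_gateway_ns - self.t_decision_ns,
            "gateway_to_ack": self.t_ack_ns - self.t_gateway_ns,
            "ack_to_book_update": self.t_book_update_ns - self.t_ack_ns,
        }

    def throttle_utilization(self):
        """Request rate as a fraction of the venue limit."""
        return self.requests_per_sec / self.rate_limit_per_sec

rec = ChildOrderLog("o1", 1000, 1400, 2400, 2900, 300, 4, 1, 0, 80.0, 100.0)
print(rec.segment_latencies_ns(), rec.throttle_utilization())
```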


Validation scorecard

Minimum weekly scorecard:

  1. Tail metrics
    • latency exceedance rate, GPD (\xi) drift, burst cluster frequency.
  2. Execution outcomes
    • p50/p95/p99 slippage by state (GREEN/AMBER/RED/SAFE).
  3. Controller quality
    • precision/recall of RED entry vs future p95 breach,
    • false-positive burden (unnecessary defensive mode time).
  4. Business tradeoff
    • slippage saved vs additional underfill/opportunity cost.

The promotion criterion should be tail-first: improved p95/p99 slippage with bounded completion damage.
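The controller-quality line of the scorecard can be computed with a few counters. This sketch assumes each evaluation window carries two booleans: whether the controller entered RED, and whether a p95 slippage breach followed within the chosen look-ahead horizon. The function name and the sample flags are illustrative.

```python
def precision_recall(red_entries, p95_breaches):
    """Precision and recall of RED entry as a predictor of a later p95 breach.

    Both inputs are equal-length sequences of booleans, one per window.
    """
    pairs = list(zip(red_entries, p95_breaches))
    tp = sum(1 for r, b in pairs if r and b)
    fp = sum(1 for r, b in pairs if r and not b)       # unnecessary defensive mode
    fn = sum(1 for r, b in pairs if not r and b)       # missed breach
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

red = [True, True, False, False, True, False]
breach = [True, False, False, True, True, False]
print(precision_recall(red, breach))
```

Low precision corresponds to the false-positive burden in the scorecard, while low recall means breaches are arriving without the controller having gone defensive.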


Common failure modes


Rollout plan

Phase 1 — Shadow scoring

Phase 2 — Soft guardrails

Phase 3 — Controlled activation

Phase 4 — Full governance


Practical takeaway

Latency risk in execution is not about “a few ms slower on average.” It is about clustered tail events interacting with fragile liquidity.

If your slippage model treats latency as a linear average feature, you will keep paying unexpected p95 invoices. Model the tail explicitly (EVT), couple it with book fragility, and attach it to a stateful controller.


Pointers for deeper reading