Latency-Decomposed Slippage Attribution & Control Playbook
Date: 2026-02-25
Category: research
Scope: Practical execution modeling for live equities routing (KRX/KIS-compatible, venue-agnostic)
1) Why this playbook
Most slippage stacks answer: “How much did we lose?”
Operations need a better question: “Which latency segment created the loss, and what control should fire now?”
This playbook decomposes implementation shortfall into clock segments and connects each segment to an online control action.
2) Core idea: split execution by clocks, not just by cost buckets
Classical attribution (spread/impact/delay/opportunity) is necessary but insufficient in production because delay is often treated as one blob.
Instead, track a child order on four clocks:
- Decision → Gateway (L_dec)
  - Strategy + risk checks + serialization + queueing inside our stack
- Gateway → Venue ACK (L_wire)
  - Network + gateway + venue ingress latency
- ACK → First Fill / Cancel ACK (L_queue)
  - Queue priority + book dynamics + adverse selection exposure
- Fill → Hedge/Next Decision (L_react)
  - Post-fill reaction speed (inventory/risk updates)
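Then model slippage as a market-state baseline plus additive latency penalties, with a zero-mean residual \epsilon_t, so that expected slippage is the sum of the first five terms:
[ S_t = f(X_t) + g_1(L_{dec,t}) + g_2(L_{wire,t}) + g_3(L_{queue,t}) + g_4(L_{react,t}) + \epsilon_t ]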
Where:
- f(X_t): market-state baseline (spread, volatility, imbalance, depth, queue state)
- g_i(.): non-linear latency penalties (usually convex in stressed regimes)
3) Minimal data contract (production-first)
For every child order event, persist:
- IDs: parent_id, child_id, strategy_id, venue, symbol
- Timestamps (monotonic + wall): t_decision, t_sent_gateway, t_venue_ack, t_first_fill, t_cancel_sent, t_cancel_ack, t_last_fill
- Prices:
  - decision mid (mid_dec)
  - arrival mid (mid_arr) at venue ACK
  - fill VWAP (px_fill)
  - markouts (m5s, m30s, m60s)
- LOB state snapshots at decision and ACK:
  - spread, top depth, imbalance, short-horizon realized vol
- Outcome flags:
  - filled/partial/canceled/rejected/expired
If this contract is incomplete, any “model quality” discussion is premature.
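As a minimal sketch of how the contract's timestamps map onto the four latency segments, the Python below computes each segment from monotonic timestamps. The class and function names are illustrative. `t_next_decision` is a hypothetical field (the contract above does not list a timestamp that closes L_react), and measuring L_react from the first fill rather than the last is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChildOrderTimes:
    """Monotonic timestamps in nanoseconds; None means the event never occurred."""
    t_decision: int
    t_sent_gateway: int
    t_venue_ack: Optional[int] = None
    t_first_fill: Optional[int] = None
    t_cancel_ack: Optional[int] = None
    t_next_decision: Optional[int] = None  # hypothetical: not part of the contract above


def _delta(start: Optional[int], end: Optional[int]) -> Optional[int]:
    """Return end - start, or None if either side is missing or the pair is out of order."""
    if start is None or end is None or end < start:
        return None
    return end - start


def segment_latencies(t: ChildOrderTimes) -> dict:
    """Split one child order's life into L_dec, L_wire, L_queue, L_react."""
    # L_queue ends at the first fill, or at the cancel ACK if the order never filled.
    queue_end = t.t_first_fill if t.t_first_fill is not None else t.t_cancel_ack
    return {
        "L_dec": _delta(t.t_decision, t.t_sent_gateway),
        "L_wire": _delta(t.t_sent_gateway, t.t_venue_ack),
        "L_queue": _delta(t.t_venue_ack, queue_end),
        # Assumption: reaction latency is measured from the first fill.
        "L_react": _delta(t.t_first_fill, t.t_next_decision),
    }
```

Returning None for a missing or inverted pair keeps incomplete records visible instead of silently turning them into zero or negative latencies.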
4) Attribution formula that operators can trust
For a buy child order (flip the sign for a sell), measure slippage against the decision mid, including net fees:
[ S = px_{fill} - mid_{dec} + \Delta_{fees/rebates} ]
Decompose with observable intermediate anchors:
[ S = \underbrace{(mid_{arr} - mid_{dec})}_{\text{pre-arrival drift}} + \underbrace{(px_{fill} - mid_{arr})}_{\text{execution + queue + spread}} + \underbrace{\Delta_{fees/rebates}}_{\text{net fee effect}} ]
Then attribute pre-arrival drift to L_dec + L_wire and execution term to L_queue + instantaneous microstructure state.
Practical split:
- DelayCost = beta_d * (L_dec + L_wire) * sigma_1s
- QueueCost = beta_q * L_queue * ToxicityScore
- SpreadImpact = alpha_s * half_spread
- Residual = S - (DelayCost + QueueCost + SpreadImpact + FeeAdj)
Here FeeAdj is the net fee effect from the decomposition. Monitor the residual: if it is persistently non-zero, the model is missing state variables.
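The practical split can be worked through for a single buy order as follows. This is a sketch under stated assumptions: S is taken to include the net fee term (as in the three-term decomposition), prices and costs are in ticks, latencies are in milliseconds, and the coefficients `beta_d`, `beta_q`, `alpha_s` and all inputs are made-up illustrative numbers, not fitted values.

```python
def attribute_buy(mid_dec, mid_arr, px_fill, fee_adj,
                  l_dec, l_wire, l_queue,
                  sigma_1s, toxicity, half_spread,
                  beta_d, beta_q, alpha_s):
    """Attribute slippage for one buy child order; all costs share one unit (e.g. ticks)."""
    # Total slippage versus the decision mid, including net fees (assumption).
    s = (px_fill - mid_dec) + fee_adj
    # Observable anchors: drift before venue ACK, then execution versus arrival mid.
    drift = mid_arr - mid_dec
    execution = px_fill - mid_arr
    # Model-based split from the formulas above.
    delay_cost = beta_d * (l_dec + l_wire) * sigma_1s
    queue_cost = beta_q * l_queue * toxicity
    spread_impact = alpha_s * half_spread
    residual = s - (delay_cost + queue_cost + spread_impact + fee_adj)
    return {
        "S": s,
        "drift": drift,
        "execution": execution,
        "DelayCost": delay_cost,
        "QueueCost": queue_cost,
        "SpreadImpact": spread_impact,
        "FeeAdj": fee_adj,
        "Residual": residual,
    }


# Illustrative inputs only.
example = attribute_buy(
    mid_dec=10000, mid_arr=10002, px_fill=10005, fee_adj=1,
    l_dec=2, l_wire=3, l_queue=100,
    sigma_1s=0.5, toxicity=1.5, half_spread=2.5,
    beta_d=0.2, beta_q=0.01, alpha_s=1.0,
)
```

In this example the observable anchors sum exactly to S (2 + 3 + 1 = 6 ticks), while the model-based terms explain 5.5 ticks and leave a 0.5 tick residual to monitor.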
5) Regime-aware penalty surfaces
Latency penalties are not stationary. Build 3 regimes:
- Calm: low vol, balanced book
- Stress: elevated vol or thin top depth
- Toxic: high imbalance persistence + adverse markout
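One way to make the three labels operational is a rule-based classifier, sketched below. The threshold names and values, the markout sign convention (negative means adverse to us), and the precedence Toxic over Stress over Calm are all assumptions for illustration; real cutoffs would come from the calibration loop.

```python
def label_regime(vol, top_depth, imbalance_persistence, markout_bps, th):
    """Return 'Calm', 'Stress', or 'Toxic' from hypothetical thresholds in `th`."""
    # Toxic: persistent imbalance AND adverse markout (assumed negative = adverse).
    if (imbalance_persistence >= th["imb_persist_hi"]
            and markout_bps <= th["markout_adverse_bps"]):
        return "Toxic"
    # Stress: elevated vol OR thin top-of-book depth.
    if vol >= th["vol_hi"] or top_depth <= th["depth_lo"]:
        return "Stress"
    # Everything else is treated as Calm.
    return "Calm"


# Hypothetical thresholds, for illustration only.
THRESHOLDS = {
    "imb_persist_hi": 0.7,
    "markout_adverse_bps": -1.0,
    "vol_hi": 0.02,
    "depth_lo": 500,
}
```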
Fit separate quantile models per regime (q50/q90/q95), e.g.:
[ Q_{\tau}(S \mid Z) = \theta^{(r)}_{\tau} \cdot \phi(Z), \quad r \in \{\text{Calm}, \text{Stress}, \text{Toxic}\} ]
with Z = [L_dec, L_wire, L_queue, spread, depth, imbalance, vol, side].
Why quantiles, not only mean: live control needs tail survival, not pretty averages.
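Quantile models are typically fit by minimizing the pinball (check) loss rather than squared error. The pure-Python sketch below shows the loss and, on a made-up sample with a heavy right tail, why a mean forecast is a poor q95 forecast. The sample values are illustrative, not real slippage data.

```python
def pinball_loss(y_true, y_pred, tau):
    """Average pinball loss of forecasts y_pred for quantile level tau in (0, 1)."""
    total = 0.0
    for y, q in zip(y_true, y_pred):
        diff = y - q
        # Under-forecasts cost tau per unit; over-forecasts cost (1 - tau) per unit.
        total += tau * diff if diff >= 0 else (tau - 1.0) * diff
    return total / len(y_true)


# Illustrative slippage sample in bps: mostly small, with two tail events.
sample = [1.0] * 18 + [21.0, 21.0]
mean_forecast = sum(sample) / len(sample)  # 3.0 bps

loss_at_mean = pinball_loss(sample, [mean_forecast] * len(sample), tau=0.95)
loss_at_tail = pinball_loss(sample, [21.0] * len(sample), tau=0.95)
```

At tau = 0.95 the constant forecast at the tail value scores a lower loss than the mean forecast, which is the behaviour a tail budget needs from its model.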
6) Online controller (what to do when latency hurts)
State machine
- GREEN: tail budget healthy
- AMBER: q95 forecast approaching budget
- RED: projected breach
Controls by segment
If L_dec elevated:
- simplify pre-trade checks to fast path where legally allowed
- coalesce duplicate signal recomputations
- cap internal queue length
If L_wire elevated:
- venue/gateway reroute
- reduce child size in high-latency paths
- temporarily disable latency-sensitive passive logic
If L_queue elevated with high toxicity:
- reduce passive dwell time
- cancel/replace more aggressively
- shift from passive to bounded-aggression slices
If L_react elevated:
- prioritize inventory update path
- delay non-critical analytics tasks
Controller objective:
[ \min_{u_t} \; \mathbb{E}[S_t \mid u_t] + \lambda \cdot \text{UnderfillRisk}(u_t) ]
subject to:
- participation caps
- compliance/risk constraints
- p95 slippage budget
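Over a small discrete set of control templates, the objective and constraints reduce to filter-then-argmin. The sketch below assumes each candidate already carries forecasts of expected slippage, q95 slippage, and underfill risk. The field names, template names, and numbers are hypothetical, and compliance/risk constraints are assumed to have been applied upstream.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ControlForecast:
    name: str
    exp_slippage_bps: float   # forecast E[S | u]
    q95_slippage_bps: float   # forecast q95 of S under u
    underfill_risk: float     # e.g. probability of not completing
    participation: float      # expected participation rate


def choose_control(candidates, lam, q95_budget_bps, participation_cap):
    """Pick the feasible control minimizing E[S|u] + lam * UnderfillRisk(u)."""
    feasible = [
        c for c in candidates
        if c.q95_slippage_bps <= q95_budget_bps and c.participation <= participation_cap
    ]
    if not feasible:
        return None  # caller should fall back to the conservative template
    return min(feasible, key=lambda c: c.exp_slippage_bps + lam * c.underfill_risk)


# Hypothetical templates and forecasts.
CANDIDATES = [
    ControlForecast("passive", 2.0, 12.0, 0.30, 0.05),
    ControlForecast("bounded_aggression", 3.5, 7.0, 0.10, 0.08),
    ControlForecast("aggressive", 5.0, 6.0, 0.02, 0.20),
]
best = choose_control(CANDIDATES, lam=10.0, q95_budget_bps=8.0, participation_cap=0.10)
```

Here the passive template fails the q95 budget and the aggressive one breaches the participation cap, so the bounded-aggression template is selected. Raising `lam` shifts the choice toward templates with lower underfill risk once they are feasible.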
7) Calibration loop (weekly + intraday checks)
Weekly
- Re-label regimes from latest month
- Refit q50/q90/q95 models
- Coverage test (target: q95 exceedance ≈ 5%)
- Drift test by symbol bucket (liq tiers)
Intraday
- rolling 30m coverage monitor
- alert if q95 exceedance > 9% for 20+ mins
- auto-switch to conservative control template
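The intraday check can be sketched as a rolling-window exceedance counter with a sustain timer. The defaults mirror the numbers above (30-minute window, 9% threshold, 20 minutes sustained); the class name and the event-driven update style are assumptions, and a production version would likely also enforce a minimum sample count before alerting.

```python
from collections import deque


class CoverageMonitor:
    """Tracks the rolling q95 exceedance rate and flags sustained breaches."""

    def __init__(self, window_s=1800, threshold=0.09, sustain_s=1200):
        self.window_s = window_s
        self.threshold = threshold
        self.sustain_s = sustain_s
        self._events = deque()      # (timestamp_s, exceeded)
        self._exceed_count = 0
        self._breach_since = None   # time the rate first went above threshold

    def update(self, ts, realized, q95_forecast):
        """Record one order outcome; return (rolling_rate, alert)."""
        exceeded = realized > q95_forecast
        self._events.append((ts, exceeded))
        self._exceed_count += int(exceeded)
        # Evict events that have fallen out of the rolling window.
        while self._events[0][0] <= ts - self.window_s:
            _, old = self._events.popleft()
            self._exceed_count -= int(old)
        rate = self._exceed_count / len(self._events)
        if rate > self.threshold:
            if self._breach_since is None:
                self._breach_since = ts
        else:
            self._breach_since = None
        alert = (self._breach_since is not None
                 and ts - self._breach_since >= self.sustain_s)
        return rate, alert
```

When `alert` turns true, the caller would switch to the conservative control template; the monitor itself only reports.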
8) Dashboard that changes behavior
Show these in one panel:
- p50/p95 realized slippage vs forecast
- latency segment medians + p95 (L_dec, L_wire, L_queue, L_react)
- exceedance rate by regime
- budget burn-down by strategy
- top symbols by residual error
If a metric does not trigger an action, remove it.
9) Failure modes & anti-footgun checks
- Clock mismatch (NTP drift): use monotonic deltas for segment latency
- Selection bias in logging: persist events for all child orders, not only filled ones
- Overfitting tails: enforce minimum sample per regime/symbol bucket
- Control flapping: hysteresis in GREEN↔AMBER↔RED transitions
- Cost-only myopia: always track underfill + alpha decay jointly
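The control-flapping check can be sketched as a three-state machine in which leaving a state requires the q95 forecast to fall a margin below the level that triggered it. The entry levels (80% and 100% of budget) and the 10-point margin are hypothetical defaults, not values from this playbook.

```python
class BudgetStateMachine:
    """GREEN/AMBER/RED tail-budget state with hysteresis on downgrades."""

    def __init__(self, amber_enter=0.8, red_enter=1.0, margin=0.1):
        self.amber_enter = amber_enter  # budget utilization that triggers AMBER
        self.red_enter = red_enter      # budget utilization that triggers RED
        self.margin = margin            # extra headroom required to step back down
        self.state = "GREEN"

    def update(self, q95_forecast, budget):
        """Advance the state given the latest q95 forecast and the budget."""
        u = q95_forecast / budget
        if self.state == "GREEN":
            if u >= self.red_enter:
                self.state = "RED"
            elif u >= self.amber_enter:
                self.state = "AMBER"
        elif self.state == "AMBER":
            if u >= self.red_enter:
                self.state = "RED"
            elif u < self.amber_enter - self.margin:
                self.state = "GREEN"
        else:  # RED
            if u < self.amber_enter - self.margin:
                self.state = "GREEN"
            elif u < self.red_enter - self.margin:
                self.state = "AMBER"
        return self.state
```

Escalation is immediate while de-escalation needs headroom, so a forecast hovering around a boundary does not toggle controls on every update.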
10) 30-day rollout plan
- Week 1: enforce data contract, validate timestamp integrity
- Week 2: baseline attribution + regime labels + dashboards
- Week 3: shadow controller (no live actuation), compare counterfactuals
- Week 4: live with tight guardrails (small notional), daily review
Success criteria:
- q95 slippage reduced without underfill blow-up
- attribution residual variance down
- fewer unexplained tail events in stress regimes
References (starting points)
- Almgren, R. & Chriss, N. (2000), Optimal Execution of Portfolio Transactions
  https://www.smallake.kr/wp-content/uploads/2016/03/optliq.pdf
- Obizhaeva, A. & Wang, J. (2013), Optimal trading strategy and supply/demand dynamics
  http://web.mit.edu/wangj/www/pap/ObizhaevaWang13.pdf
- Huang, W., Lehalle, C.-A., Rosenbaum, M. (2015), Queue-reactive model
  https://arxiv.org/abs/1312.0563
- Taranto et al. (2016), Linear models for impact of order flow (propagators)
  https://arxiv.org/abs/1602.02735
One-line takeaway
Treat slippage as a latency-segmented control problem: when you can pinpoint which clock is bleeding, you can intervene before p95 damage compounds.