Latency-Decomposed Slippage Attribution & Control Playbook

Date: 2026-02-25
Category: research
Scope: Practical execution modeling for live equities routing (KRX/KIS-compatible, venue-agnostic)

1) Why this playbook

Most slippage stacks answer: “How much did we lose?”
Operations need a better question: “Which latency segment created the loss, and what control should fire now?”

This playbook decomposes implementation shortfall into clock segments and connects each segment to an online control action.

2) Core idea: split execution by clocks, not just by cost buckets

Classical attribution (spread/impact/delay/opportunity) is necessary but insufficient in production because delay is often treated as one blob.

Instead, track a child order on four clocks:

Decision → Gateway (L_dec): strategy + risk checks + serialization + queueing inside our stack
Gateway → Venue ACK (L_wire): network + gateway + venue ingress latency
ACK → First Fill / Cancel ACK (L_queue): queue priority + book dynamics + adverse selection exposure
Fill → Hedge/Next Decision (L_react): post-fill reaction speed (inventory/risk updates)

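Then model slippage as a state baseline plus one penalty per clock, so that expected slippage is the sum of the deterministic terms:

[ S_t = f(X_t) + g_1(L_{dec,t}) + g_2(L_{wire,t}) + g_3(L_{queue,t}) + g_4(L_{react,t}) + \epsilon_t, \quad \mathbb{E}[\epsilon_t \mid X_t, L_t] = 0 ]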

Where:

f(X_t): market-state baseline (spread, volatility, imbalance, depth, queue state)
g_i(.): non-linear latency penalties (usually convex in stressed regimes)
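
A minimal sketch of one possible shape for g_i, assuming a hinge-quadratic form: zero up to a knee, then linear plus quadratic in the excess latency. The function names, the knee, and every coefficient below are illustrative assumptions, not calibrated values; the form is convex only when both coefficients are non-negative.

```python
def latency_penalty_bps(latency_ms: float, knee_ms: float,
                        linear_bps_per_ms: float, quad_bps_per_ms2: float) -> float:
    """Convex penalty: zero up to the knee, then linear + quadratic in the excess."""
    excess = max(0.0, latency_ms - knee_ms)
    return linear_bps_per_ms * excess + quad_bps_per_ms2 * excess ** 2


def expected_slippage_bps(baseline_bps: float, latencies_ms: dict, params: dict) -> float:
    """Additive model: baseline f(X_t) plus one penalty per latency segment."""
    return baseline_bps + sum(
        latency_penalty_bps(latencies_ms[seg], *params[seg]) for seg in params
    )


# Hypothetical parameters per segment: (knee_ms, linear_bps_per_ms, quad_bps_per_ms2).
PARAMS = {
    "L_dec": (1.0, 0.05, 0.002),
    "L_wire": (2.0, 0.10, 0.005),
}

# L_dec is below its knee and contributes nothing; L_wire is 4 ms over its knee.
example_bps = expected_slippage_bps(1.5, {"L_dec": 0.8, "L_wire": 6.0}, PARAMS)
```

In practice the penalty parameters would differ by regime, which is why the surfaces are refit per regime rather than shared.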

3) Minimal data contract (production-first)

For every child order event, persist:

IDs: parent_id, child_id, strategy_id, venue, symbol
Timestamps (monotonic + wall):
- t_decision
- t_sent_gateway
- t_venue_ack
- t_first_fill
- t_cancel_sent
- t_cancel_ack
- t_last_fill
Prices:
- decision mid (mid_dec)
- arrival mid (mid_arr) at venue ACK
- fill VWAP (px_fill)
- markouts (m5s, m30s, m60s)
LOB state snapshots at decision and ACK:
- spread, top depth, imbalance, short-horizon realized vol
Outcome flags:
- filled/partial/canceled/rejected/expired

If this contract is incomplete, any “model quality” discussion is premature.

4) Attribution formula that operators can trust

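For a buy child order, measure all-in slippage against the decision mid, with fees counted as a cost (for a sell, flip the sign of the price difference):

[ S = px_{fill} - mid_{dec} + \Delta_{fees/rebates} ]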

Decompose with observable intermediate anchors:

[ S = \underbrace{(mid_{arr} - mid_{dec})}_{\text{pre-arrival drift}} + \underbrace{(px_{fill} - mid_{arr})}_{\text{execution + queue + spread}} + \underbrace{\Delta_{fees/rebates}}_{\text{net fee effect}} ]

Then attribute the pre-arrival drift to L_dec + L_wire, and the execution term to L_queue plus the instantaneous microstructure state.

Practical split:

DelayCost = beta_d * (L_dec + L_wire) * sigma_1s
QueueCost = beta_q * L_queue * ToxicityScore
SpreadImpact = alpha_s * half_spread
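Residual = S - (DelayCost + QueueCost + SpreadImpact + FeeAdj)

Here FeeAdj is the net fee/rebate term from the decomposition above, and latencies are in seconds to match sigma_1s. Monitor the residual: if it stays persistently away from zero, the model is missing state variables.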
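
The practical split can be sketched as a single function. This assumes S is measured all-in (price difference plus FeeAdj), latencies are in seconds, and all prices are in the same currency units; the function name and the coefficient values in the usage lines are hypothetical placeholders, not fitted numbers.

```python
def attribute_buy_slippage(px_fill: float, mid_dec: float, fee_adj: float,
                           l_dec_s: float, l_wire_s: float, l_queue_s: float,
                           sigma_1s: float, toxicity: float, half_spread: float,
                           beta_d: float, beta_q: float, alpha_s: float) -> dict:
    """Split all-in buy slippage into modeled components plus a residual."""
    s = px_fill - mid_dec + fee_adj                      # all-in slippage (assumption)
    delay_cost = beta_d * (l_dec_s + l_wire_s) * sigma_1s
    queue_cost = beta_q * l_queue_s * toxicity
    spread_impact = alpha_s * half_spread
    residual = s - (delay_cost + queue_cost + spread_impact + fee_adj)
    return {
        "S": s,
        "DelayCost": delay_cost,
        "QueueCost": queue_cost,
        "SpreadImpact": spread_impact,
        "FeeAdj": fee_adj,
        "Residual": residual,
    }


# Illustrative inputs only; betas and alpha would come from calibration.
parts = attribute_buy_slippage(
    px_fill=100.05, mid_dec=100.00, fee_adj=0.01,
    l_dec_s=0.002, l_wire_s=0.003, l_queue_s=0.5,
    sigma_1s=0.02, toxicity=0.04, half_spread=0.02,
    beta_d=100.0, beta_q=0.5, alpha_s=1.0,
)
```

By construction the components and the residual always sum back to S, so the residual absorbs whatever the three modeled terms fail to explain.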

5) Regime-aware penalty surfaces

Latency penalties are not stationary. Build 3 regimes:

Calm: low vol, balanced book
Stress: elevated vol or thin top depth
Toxic: high imbalance persistence + adverse markout
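
A rule-based labeler is one simple way to start before moving to anything learned. The sketch below assumes four precomputed inputs (a volatility z-score, top depth relative to its typical level, an imbalance-persistence score in [0, 1], and a short-horizon markout where negative means adverse). All names and thresholds are hypothetical and need tuning per symbol bucket.

```python
def label_regime(vol_z: float, depth_ratio: float,
                 imbalance_persistence: float, markout_bps: float,
                 vol_z_stress: float = 2.0, depth_ratio_stress: float = 0.5,
                 persistence_toxic: float = 0.7, markout_toxic_bps: float = -1.0) -> str:
    """Label one observation as Calm, Stress, or Toxic (Toxic takes priority)."""
    # Toxic: persistent imbalance AND adverse markout (negative = price moved against us).
    if imbalance_persistence >= persistence_toxic and markout_bps <= markout_toxic_bps:
        return "Toxic"
    # Stress: elevated volatility OR thin top-of-book depth.
    if vol_z >= vol_z_stress or depth_ratio <= depth_ratio_stress:
        return "Stress"
    return "Calm"
```

Checking Toxic first keeps a volatile, toxic tape from being filed under Stress, where the passive-order controls would be too lenient.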

Fit separate quantile models per regime (q50/q90/q95), e.g.:

[ Q_{\tau}(S \mid Z) = \theta^{(r)}_{\tau} \cdot \phi(Z), \quad r \in \{\text{Calm}, \text{Stress}, \text{Toxic}\} ]

with Z = [L_dec, L_wire, L_queue, spread, depth, imbalance, vol, side].

Why quantiles, not only mean: live control needs tail survival, not pretty averages.
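
Quantile models are usually fit by minimizing the pinball (quantile) loss, which is what makes the tail matter. The sketch below shows only the loss, in plain Python; the function name is an assumption and the optimizer that fits theta is left out.

```python
def pinball_loss(realized: list, forecast: list, tau: float) -> float:
    """Mean pinball loss of quantile forecasts at level tau (0 < tau < 1)."""
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must be strictly between 0 and 1")
    if len(realized) != len(forecast) or not realized:
        raise ValueError("inputs must be non-empty and the same length")
    total = 0.0
    for y, q in zip(realized, forecast):
        diff = y - q
        # Under-forecast (y > q) costs tau per unit; over-forecast costs (1 - tau).
        total += tau * diff if diff >= 0 else (tau - 1.0) * diff
    return total / len(realized)
```

At tau = 0.95 an under-forecast of one unit costs 0.95 while an over-forecast of one unit costs 0.05, so the fit is pushed toward covering the tail rather than matching the average.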

6) Online controller (what to do when latency hurts)

State machine

GREEN: tail budget healthy
AMBER: q95 forecast approaching budget
RED: projected breach
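
A minimal sketch of the state machine with hysteresis, driven by the ratio of the q95 forecast to the tail budget. The enter/exit ratios are hypothetical defaults; the only structural assumption is that each exit threshold sits below its enter threshold so the state does not flap.

```python
from enum import Enum


class Level(Enum):
    GREEN = "green"
    AMBER = "amber"
    RED = "red"


def next_level(current: Level, q95_forecast_bps: float, budget_bps: float,
               amber_enter: float = 0.8, amber_exit: float = 0.7,
               red_exit: float = 0.9) -> Level:
    """Advance the controller state from the forecast-to-budget ratio."""
    if budget_bps <= 0:
        raise ValueError("budget_bps must be positive")
    ratio = q95_forecast_bps / budget_bps
    if ratio >= 1.0:                       # projected breach
        return Level.RED
    if current is Level.RED and ratio >= red_exit:
        return Level.RED                   # stay RED until clearly back under budget
    if current is Level.GREEN:
        return Level.AMBER if ratio >= amber_enter else Level.GREEN
    # Coming from AMBER (or leaving RED): use the lower exit threshold.
    return Level.AMBER if ratio >= amber_exit else Level.GREEN
```

With these defaults a ratio of 0.75 keeps an AMBER controller in AMBER but does not promote a GREEN one, which is the band that prevents oscillation.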

Controls by segment

If L_dec elevated:

simplify pre-trade checks to fast path where legally allowed
coalesce duplicate signal recomputations
cap internal queue length

If L_wire elevated:

venue/gateway reroute
reduce child size in high-latency paths
temporarily disable latency-sensitive passive logic

If L_queue elevated with high toxicity:

reduce passive dwell time
cancel/replace more aggressively (lower the trigger threshold)
shift from passive to bounded-aggression slices

If L_react elevated:

prioritize inventory update path
delay non-critical analytics tasks
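
One way to wire the segment-to-control mapping is a plain lookup keyed by segment. The sketch assumes "elevated" means the current latency exceeds a multiple of a baseline (for example a trailing p95); the action identifiers, the multiple, and the toxicity threshold are all hypothetical.

```python
# Hypothetical action identifiers mirroring the controls listed above.
CONTROLS = {
    "L_dec": ["fast_path_pretrade_checks", "coalesce_signal_recompute", "cap_internal_queue"],
    "L_wire": ["reroute_venue_or_gateway", "reduce_child_size", "disable_passive_logic"],
    "L_queue": ["reduce_passive_dwell", "tighten_cancel_replace", "bounded_aggression_slices"],
    "L_react": ["prioritize_inventory_update", "defer_noncritical_analytics"],
}


def select_controls(current_ms: dict, baseline_ms: dict, toxicity: float,
                    elevated_ratio: float = 1.5, toxicity_threshold: float = 0.6) -> list:
    """Return the control actions for every segment that is currently elevated."""
    actions = []
    for segment in ("L_dec", "L_wire", "L_queue", "L_react"):
        if current_ms[segment] <= elevated_ratio * baseline_ms[segment]:
            continue  # segment is within its normal band
        if segment == "L_queue" and toxicity < toxicity_threshold:
            continue  # queue controls only fire when flow is also toxic
        actions.extend(CONTROLS[segment])
    return actions
```

Keeping the mapping as data rather than branching logic makes it easy to review with compliance and to swap in a conservative template.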

Controller objective:

[ \min_{u_t} ; \mathbb{E}[S_t \mid u_t] + \lambda \cdot \text{UnderfillRisk}(u_t) ]

subject to:

participation caps
compliance/risk constraints
p95 slippage budget
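
With a small, discrete set of control templates, the objective reduces to filtering by the constraints and taking the minimum. The sketch below assumes each candidate already carries forecast values from the models above; the class, field names, and numbers are hypothetical, and compliance/risk constraints are assumed to have been applied when building the candidate list.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    name: str
    expected_slippage_bps: float   # E[S_t | u_t]
    q95_slippage_bps: float        # tail forecast under this control
    underfill_risk: float          # e.g. probability of missing the target quantity
    participation: float           # fraction of market volume


def choose_control(candidates: list, lam: float,
                   q95_budget_bps: float, max_participation: float) -> Optional[Candidate]:
    """Pick the feasible candidate minimizing E[S] + lam * UnderfillRisk."""
    feasible = [
        c for c in candidates
        if c.q95_slippage_bps <= q95_budget_bps and c.participation <= max_participation
    ]
    if not feasible:
        return None  # caller should fall back to the conservative template
    return min(feasible, key=lambda c: c.expected_slippage_bps + lam * c.underfill_risk)


CANDIDATES = [
    Candidate("passive", 2.0, 12.0, 0.05, 0.05),
    Candidate("balanced", 3.0, 8.0, 0.10, 0.08),
    Candidate("aggressive", 5.0, 6.0, 0.01, 0.15),
]
```

Raising lambda shifts the choice toward controls that fill more reliably, which is how the underfill trade-off is made explicit rather than hidden in the cost numbers.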

7) Calibration loop (weekly + intraday checks)

Weekly

Re-label regimes from latest month
Refit q50/q90/q95 models
Coverage test (target: q95 exceedance ≈ 5%)
Drift test by symbol bucket (liquidity tiers)

Intraday

rolling 30m coverage monitor
alert if q95 exceedance > 9% for 20+ mins
auto-switch to conservative control template
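
The intraday check can be sketched as a rolling-window monitor: 30-minute window, 9% exceedance threshold, 20 minutes sustained, as above. The class name and the min_samples guard are assumptions; the guard is there so a handful of fills cannot trigger the switch.

```python
from collections import deque


class CoverageMonitor:
    """Rolling q95 exceedance monitor with a sustained-breach alert."""

    def __init__(self, window_s: float = 1800.0, alert_rate: float = 0.09,
                 sustain_s: float = 1200.0, min_samples: int = 50):
        self.window_s = window_s
        self.alert_rate = alert_rate
        self.sustain_s = sustain_s
        self.min_samples = min_samples      # assumption: not specified in the playbook
        self.events = deque()               # (timestamp_s, exceeded) pairs
        self.breach_start = None

    def update(self, ts_s: float, realized_bps: float, q95_forecast_bps: float) -> bool:
        """Record one fill; return True when the conservative template should engage."""
        self.events.append((ts_s, realized_bps > q95_forecast_bps))
        # Drop events that have aged out of the rolling window.
        while self.events and self.events[0][0] <= ts_s - self.window_s:
            self.events.popleft()
        n = len(self.events)
        rate = sum(exceeded for _, exceeded in self.events) / n
        if n >= self.min_samples and rate > self.alert_rate:
            if self.breach_start is None:
                self.breach_start = ts_s
        else:
            self.breach_start = None        # breach must be continuous to count
        return self.breach_start is not None and ts_s - self.breach_start >= self.sustain_s
```

Feed update() from the fill stream using a monotonic timestamp in seconds; a True return is the signal to switch templates, and the state machine's hysteresis decides when to switch back.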

8) Dashboard that changes behavior

Show these in one panel:

p50/p95 realized slippage vs forecast
latency segment medians + p95 (L_dec, L_wire, L_queue, L_react)
exceedance rate by regime
budget burn-down by strategy
top symbols by residual error

If a metric does not trigger an action, remove it.

9) Failure modes & anti-footgun checks

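Clock mismatch (NTP drift): use monotonic deltas for segment latency, taken on a single host; monotonic clocks are not comparable across hosts
Logging survivorship bias: log canceled, rejected, and expired orders too, not only fills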
Overfitting tails: enforce minimum sample per regime/symbol bucket
Control flapping: hysteresis in GREEN↔AMBER↔RED transitions
Cost-only myopia: always track underfill + alpha decay jointly

10) 30-day rollout plan

Week 1: enforce data contract, validate timestamp integrity
Week 2: baseline attribution + regime labels + dashboards
Week 3: shadow controller (no live actuation), compare counterfactuals
Week 4: live with tight guardrails (small notional), daily review

Success criteria:

q95 slippage reduced without underfill blow-up
attribution residual variance down
fewer unexplained tail events in stress regimes

References (starting points)

Almgren, R. & Chriss, N. (2000), Optimal Execution of Portfolio Transactions
https://www.smallake.kr/wp-content/uploads/2016/03/optliq.pdf
Obizhaeva, A. & Wang, J. (2013), Optimal trading strategy and supply/demand dynamics
http://web.mit.edu/wangj/www/pap/ObizhaevaWang13.pdf
Huang, W., Lehalle, C.-A., Rosenbaum, M. (2015), Queue-reactive model
https://arxiv.org/abs/1312.0563
Taranto et al. (2016), Linear models for impact of order flow (propagators)
https://arxiv.org/abs/1602.02735

One-line takeaway

Treat slippage as a latency-segmented control problem: when you can pinpoint which clock is bleeding, you can intervene before p95 damage compounds.