Tail-Conditional Slippage Model Playbook (Online Quantiles + CVaR Controller)
Date: 2026-02-26
Category: Research (Execution / Slippage Modeling)
Scope: Intraday single-name + basket execution (KR-focused, portable)
Why this model exists
Most desks still optimize average slippage. That helps the median day, but PnL pain usually comes from a handful of tail events:
- queue collapse right after posting,
- spread shock during urgency ramp,
- hidden liquidity vacuum near session transitions.
This playbook models slippage as a conditional distribution, not a point estimate:
- Predict multiple slippage quantiles online,
- Estimate tail loss (CVaR / expected shortfall),
- Spend a dynamic tail-risk budget while executing.
Core idea: execution should be controlled by tail risk per remaining notional, not only expected bps.
Problem setup
At decision tick (t), for child action over horizon (\Delta):
[ y_t = \text{signed slippage}_{t\rightarrow t+\Delta} \quad (\text{bps}) ]
Given context (x_t), estimate conditional quantiles:
[ q_\tau(x_t) = Q_{Y|X}(\tau \mid x_t), \quad \tau \in \{0.5, 0.75, 0.9, 0.95\} ]
Define tail-risk metric at confidence (\alpha) (e.g., 0.95):
[ \text{CVaR}_\alpha(x_t) = \mathbb{E}[Y \mid Y \ge q_\alpha(x_t), X = x_t] ]
Then control execution aggressiveness based on projected CVaR burn vs remaining budget.
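To make the two definitions concrete, here is a minimal sketch that computes an empirical quantile and the matching CVaR on a small, made-up slippage sample. It is unconditional (it ignores the context (x_t)) and uses level 0.9 rather than 0.95 only because the sample has ten points; the function names and the data are illustrative assumptions.

```python
import math

def empirical_quantile(samples, tau):
    """Smallest order statistic with at least a tau share of samples at or below it."""
    ordered = sorted(samples)
    # Small tolerance guards against float noise in tau * n.
    k = max(0, math.ceil(tau * len(ordered) - 1e-9) - 1)
    return ordered[k]

def empirical_cvar(samples, alpha):
    """Average of the samples at or above the alpha-quantile (Y >= q_alpha)."""
    q = empirical_quantile(samples, alpha)
    tail = [y for y in samples if y >= q]
    return sum(tail) / len(tail)

# Made-up slippage sample in bps, with two tail events.
slippage_bps = [1.0, 2.0, 2.5, 3.0, 3.5, 4.0, 5.0, 6.0, 12.0, 30.0]
mean_bps = sum(slippage_bps) / len(slippage_bps)
q90 = empirical_quantile(slippage_bps, 0.9)
cvar90 = empirical_cvar(slippage_bps, 0.9)
print(mean_bps, q90, cvar90)
```

On this sample the mean is 6.9 bps while the 0.9-level CVaR is 21 bps, which is the gap between "average slippage" and "tail slippage" that the rest of the playbook targets.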
Feature design (execution-realistic)
Market microstructure
- spread (ticks/bps), depth slope, top-of-book imbalance,
- microprice drift and short-horizon realized vol,
- cancel-to-trade ratio, queue depletion rate,
- last-trade sign run length (flow persistence).
Own execution state
- participation, remaining quantity, time-to-go,
- queue age (time since post), replace count,
- child order size percentile vs displayed depth,
- fill hazard proxy (recent passive fill probability).
Session / venue state
- open/close/auction proximity buckets,
- venue flag (KRX/NXT), VI/sidecar proximity,
- symbol liquidity class and beta bucket,
- event clock (macro/news windows).
Modeling stack
Stage 1 — Multi-quantile predictor
Use separate or multi-head quantile models:
- LightGBM/CatBoost quantile objectives, or
- linear quantile models for very low-latency paths.
Pinball loss for each (\tau):
[ \mathcal{L}_\tau(y,\hat{q}) = (\tau - \mathbf{1}\{y<\hat{q}\})(y-\hat{q}) ]
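The loss above translates directly into code. The sketch below is a plain-Python version for a single observation plus a sample average; the function names are illustrative, and in practice the quantile objective built into the chosen library would be used instead.

```python
def pinball_loss(y, q_hat, tau):
    """Pinball (quantile) loss for one observation y and predicted quantile q_hat."""
    indicator = 1.0 if y < q_hat else 0.0
    return (tau - indicator) * (y - q_hat)

def mean_pinball_loss(ys, q_hat, tau):
    """Average pinball loss of a constant prediction q_hat over a sample."""
    return sum(pinball_loss(y, q_hat, tau) for y in ys) / len(ys)

# Same 6 bps miss in both directions at tau = 0.95.
under = pinball_loss(y=10.0, q_hat=4.0, tau=0.95)  # prediction too low
over = pinball_loss(y=4.0, q_hat=10.0, tau=0.95)   # prediction too high
print(under, over)
```

At (\tau = 0.95) an under-prediction is penalized 19 times more heavily than an over-prediction of the same size (weights 0.95 vs 0.05), which is what pushes the fitted value up toward the 95th percentile.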
Monotonicity fix (avoid quantile crossing):
- either constrained training,
- or post-process with isotonic rearrangement so (\hat{q}_{0.5} \le \hat{q}_{0.75} \le \hat{q}_{0.9} \le \hat{q}_{0.95}).
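As a minimal sketch of the post-processing option, the simplest rearrangement is to sort the predicted quantiles so they are non-decreasing in (\tau). This is plain sorting, not a full isotonic regression (which would pool a crossed pair to its average instead of swapping it); `enforce_monotone` mirrors the name used in the pseudocode, but this body is an assumption.

```python
def enforce_monotone(*quantiles):
    """Return the predicted quantiles rearranged into non-decreasing order.

    Arguments are expected in increasing tau order (q50, q75, q90, q95).
    """
    return tuple(sorted(quantiles))

# q75 and q90 cross in the raw prediction (5.0 > 4.5).
q50, q75, q90, q95 = enforce_monotone(2.0, 5.0, 4.5, 7.0)
print(q50, q75, q90, q95)
```

Predictions that are already ordered pass through unchanged, so the step is safe to apply on every tick.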
Stage 2 — Online calibration layer
Raw quantiles drift intraday. Add rolling calibration:
- maintain calibration residuals by symbol-time bucket,
- adjust (\hat{q}_\tau) with adaptive offset (\delta_\tau(t)),
- use exponentially decayed windows to react faster during regime shifts.
Goal: keep empirical exceedance near target (e.g., 5% above (q_{0.95})).
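One simple way to realize an adaptive offset is an online update that raises the offset after each exceedance and lowers it slightly otherwise, so that the long-run exceedance rate drifts toward (1 - \tau). The sketch below uses that fixed-step rule as a stand-in for the exponentially decayed residual windows described above; the class name, the step size, and the bucket key are all assumptions.

```python
from collections import defaultdict

class OnlineQuantileOffset:
    """Tracks an additive offset delta so that q_raw + delta is exceeded ~(1 - tau) of the time."""

    def __init__(self, tau, step=0.5):
        self.tau = tau
        self.step = step      # bps moved per unit of exceedance error (assumed value)
        self.offset = 0.0

    def adjust(self, q_raw):
        return q_raw + self.offset

    def update(self, y, q_raw):
        """Feed one realized slippage y against the raw quantile predicted for it."""
        exceeded = 1.0 if y > self.adjust(q_raw) else 0.0
        # Exceedances push the offset up by step * tau; non-exceedances pull it down by step * (1 - tau).
        self.offset += self.step * (exceeded - (1.0 - self.tau))
        return self.offset

# One calibrator per symbol-time bucket (hypothetical bucket key).
calibrators = defaultdict(lambda: OnlineQuantileOffset(tau=0.95))
cal = calibrators[("005930", "open")]
cal.update(y=10.0, q_raw=5.0)   # breach: offset rises
cal.update(y=1.0, q_raw=5.0)    # no breach: offset decays a little
print(cal.offset)
```

A larger step reacts faster in regime shifts at the cost of a noisier offset, which plays the same role as a shorter decay window.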
Stage 3 — Tail estimator above (q_{0.95})
For observations beyond estimated (q_{0.95}), fit a lightweight excess-loss model:
- mean excess approximation, or
- EVT-style generalized Pareto tail on exceedances (if sample support is adequate).
This yields a more stable online (\widehat{\text{CVaR}}_{0.95}) than naive averaging.
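The mean-excess option can be sketched as follows: keep the recent excesses over the calibrated (q_{0.95}) and add their average to the current quantile, falling back to a conservative fixed excess when too few exceedances are available. The function name, the sample floor, and the fallback value are illustrative assumptions, not calibrated numbers.

```python
def estimate_cvar95(q95, excesses, min_count=30, fallback_excess_bps=10.0):
    """Mean-excess CVaR estimate: q95 + E[Y - q95 | Y > q95].

    excesses: recent values of (realized slippage - predicted q95), in bps.
              Non-positive entries are ignored, since they are not exceedances.
    """
    positive = [e for e in excesses if e > 0.0]
    if len(positive) < min_count:
        # Too little tail support: use a conservative proxy instead of a noisy mean.
        return q95 + fallback_excess_bps
    return q95 + sum(positive) / len(positive)

recent_excesses = [2.0, 4.0, 6.0]
enough = estimate_cvar95(10.0, recent_excesses, min_count=3)
too_few = estimate_cvar95(10.0, recent_excesses, min_count=5, fallback_excess_bps=8.0)
print(enough, too_few)
```

If a generalized Pareto tail is fitted instead, its mean excess (scale / (1 - shape), valid for shape < 1) can replace the sample average in the last line.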
Tail-budget controller
Let total slippage budget be (B_{tot}) (bps-weighted notional), and let (w_i) be the notional weight of child action (i). Remaining at (t):
[ B_t = B_{tot} - \sum_{i=t_0}^{t} y_i w_i ]
Projected tail burn over short horizon (h):
[ \rho_t^{tail} = \frac{\mathbb{E}[\sum_{j=t}^{t+h} \text{CVaR}_{0.95}(x_j) w_j \mid \mathcal{F}_t]}{\max(B_t,\epsilon)} ]
Controller tiers:
- GREEN: (\rho_t^{tail} < 0.30)
- YELLOW: (0.30 \le \rho_t^{tail} < 0.60)
- ORANGE: (0.60 \le \rho_t^{tail} < 0.90)
- RED: (\rho_t^{tail} \ge 0.90) or repeated q95 breaches
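The budget, burn ratio, and tier thresholds above can be sketched as three small functions. The conditional expectation in the numerator is replaced here by point forecasts of CVaR and weights over the horizon, which is a simplifying assumption; function and argument names are illustrative.

```python
def remaining_budget(b_tot, realized_bps, weights):
    """B_t = B_tot - sum_i y_i * w_i over child actions completed so far."""
    return b_tot - sum(y * w for y, w in zip(realized_bps, weights))

def tail_burn_ratio(cvar_forecast_bps, planned_weights, budget_left, eps=1e-6):
    """rho_t: projected CVaR-weighted burn over the horizon divided by remaining budget."""
    projected = sum(c * w for c, w in zip(cvar_forecast_bps, planned_weights))
    return projected / max(budget_left, eps)

def tier_from_rho(rho, repeated_q95_breach=False):
    """Map the burn ratio to a controller tier using the thresholds above."""
    if repeated_q95_breach or rho >= 0.90:
        return "RED"
    if rho >= 0.60:
        return "ORANGE"
    if rho >= 0.30:
        return "YELLOW"
    return "GREEN"

# Made-up numbers: 5 bps total budget, two fills done, two children planned.
b_left = remaining_budget(5.0, realized_bps=[4.0, 6.0], weights=[0.1, 0.1])
rho = tail_burn_ratio([8.0, 10.0], [0.1, 0.1], b_left)
print(round(b_left, 6), round(rho, 6), tier_from_rho(rho))
```

The `eps` floor keeps the ratio finite once the budget is exhausted, in which case any positive projected burn lands in RED.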
Actions by tier
GREEN
- baseline schedule,
- passive-first routing,
- normal repost cadence.
YELLOW
- reduce passive dwell timeout,
- trim child size percentile,
- tighten stale quote cancellation.
ORANGE
- cap participation escalation,
- prefer fill certainty over price improvement on the residual schedule,
- temporarily quarantine venue with persistent tail exceedance.
RED
- enter safe mode (controlled unwind protocol),
- freeze non-essential aggression,
- page operator with tail snapshot.
Use hysteresis (N-of-M confirms) to prevent flapping.
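A minimal sketch of N-of-M hysteresis: the controller only switches tier when the newly proposed tier has appeared in at least N of the last M proposals. The class name and the defaults (3 of 5) are assumptions; a production controller might also let escalation to RED bypass the confirmation count.

```python
from collections import deque

class TierHysteresis:
    """Accept a tier change only after it is proposed in >= n_confirm of the last m_window ticks."""

    def __init__(self, n_confirm=3, m_window=5, initial="GREEN"):
        self.n_confirm = n_confirm
        self.recent = deque(maxlen=m_window)  # rolling window of proposed tiers
        self.state = initial

    def step(self, proposed):
        self.recent.append(proposed)
        if proposed != self.state and self.recent.count(proposed) >= self.n_confirm:
            self.state = proposed
        return self.state

h = TierHysteresis(n_confirm=3, m_window=5)
trace = [h.step(p) for p in ["YELLOW", "GREEN", "YELLOW", "YELLOW", "GREEN"]]
print(trace)
```

In this trace a single YELLOW proposal does not move the controller, the third YELLOW within the window does, and the lone GREEN that follows is not enough to switch back.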
Practical training & validation
Data slicing
- Purged time splits by session/day,
- symbol-liquidity stratification,
- dedicated stress slices: open, close, VI-adjacent windows.
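One possible reading of "purged time splits by day" is a walk-forward scheme where each test block of days is preceded by a gap that is dropped from training, so labels near the boundary cannot leak. The sketch below assumes that reading; the function name and the one-day purge are illustrative, and liquidity stratification and stress slices are not shown.

```python
def purged_walk_forward(days, n_folds, purge_days=1):
    """Yield (train_days, test_days); training stops purge_days before each test block."""
    days = sorted(days)
    block = len(days) // (n_folds + 1)  # size of each test block
    for k in range(1, n_folds + 1):
        test = days[k * block:(k + 1) * block]
        train_end = max(0, k * block - purge_days)
        yield days[:train_end], test

# Ten trading days labelled 0..9, four folds, one purged day before each test block.
folds = list(purged_walk_forward(range(10), n_folds=4, purge_days=1))
for train, test in folds:
    print(train, test)
```

Each fold trains only on days strictly earlier than its test block, which keeps the evaluation aligned with how the model would be refit in production.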
Metrics (must report)
- Quantile calibration error (per (\tau)),
- q95/q99 realized slippage reduction vs baseline,
- CVaR reduction at fixed completion SLA,
- underfill opportunity cost,
- mode-switch stability (no overreaction).
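The first metric can be computed directly from replay output: compare the realized exceedance rate of each predicted quantile with its nominal rate (1 - \tau). A minimal sketch, with illustrative function names and made-up data:

```python
def exceedance_rate(realized, predicted_q):
    """Share of observations whose realized slippage exceeded the predicted quantile."""
    hits = sum(1 for y, q in zip(realized, predicted_q) if y > q)
    return hits / len(realized)

def calibration_error(realized, predicted_q, tau):
    """Realized exceedance rate minus the nominal rate (1 - tau); 0 means well calibrated."""
    return exceedance_rate(realized, predicted_q) - (1.0 - tau)

# Made-up replay: 100 child actions, 12 of which exceed a flat q95 prediction of 5 bps.
realized = [10.0] * 12 + [1.0] * 88
predicted_q95 = [5.0] * 100
err = calibration_error(realized, predicted_q95, tau=0.95)
print(exceedance_rate(realized, predicted_q95), round(err, 6))
```

A positive error means the quantile is too low (tail risk understated); reporting it per (\tau) and per session bucket shows where the calibration layer is not keeping up.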
Counterfactual replay protocol
- Replay historical order books + own order events,
- compare baseline controller vs tail-budget controller,
- enforce same parent-order intents for fair comparison.
Production guardrails
Feature freshness SLO
- stale key features invalidate tail estimates and force fallback policy.
Sample floor for tail fit
- if exceedance count too low, use conservative fallback CVaR proxy.
Session-aware parameters
- separate calibration states for open/close/auction windows.
Hard stop on repeated breaches
- N breaches within M minutes triggers automatic defensive mode.
Explainability payload
- top feature contributions,
- current quantiles + CVaR,
- budget burn and state-transition reason.
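Two of the guardrails above, feature freshness and the repeated-breach hard stop, reduce to small stateful checks. The sketch below is one way to write them; the names, the 300-second window, and the per-feature age limits are assumptions rather than recommended settings.

```python
from collections import deque

def stale_features(last_update_ts, now, max_age_s):
    """Return names of features whose last update is older than their allowed age."""
    return [name for name, ts in last_update_ts.items() if now - ts > max_age_s[name]]

class BreachMonitor:
    """Signals defensive mode when max_breaches q95 breaches fall within window_s seconds."""

    def __init__(self, max_breaches=3, window_s=300.0):
        self.max_breaches = max_breaches
        self.window_s = window_s
        self.breach_times = deque()

    def record(self, ts, breached):
        if breached:
            self.breach_times.append(ts)
        # Drop breaches that have aged out of the rolling window.
        while self.breach_times and ts - self.breach_times[0] > self.window_s:
            self.breach_times.popleft()
        return len(self.breach_times) >= self.max_breaches

stale = stale_features({"spread": 99.0, "imbalance": 90.0}, now=100.0,
                       max_age_s={"spread": 2.0, "imbalance": 5.0})
monitor = BreachMonitor(max_breaches=3, window_s=300.0)
signals = [monitor.record(ts, b) for ts, b in [(0, True), (100, True), (200, True), (600, False)]]
print(stale, signals)
```

A non-empty `stale` list would route the tick to the fallback policy, and a `True` from `record` would trigger the automatic defensive mode.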
Pseudocode
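for t in decision_ticks:
    x = build_features(t)
    # Stage 1: raw quantiles, then remove any crossing
    q50, q75, q90, q95 = quantile_model.predict(x)
    q50, q75, q90, q95 = enforce_monotone(q50, q75, q90, q95)
    # Stage 2: online calibration offset for this symbol-time bucket
    q95 = q95 + online_calibrator.offset("q95", bucket=t.bucket)
    # Stage 3: tail estimate above the calibrated q95
    cvar95 = tail_model.estimate_cvar95(x, q95)
    # Controller: projected tail burn relative to remaining budget
    tail_burn = forecast_tail_burn(cvar95, remain_qty, horizon=h)
    rho_tail = tail_burn / max(remaining_budget, 1e-6)
    # obs_slip and remaining_budget are updated elsewhere as fills arrive
    state = transition_with_hysteresis(state, rho_tail, breach_q95=(obs_slip > q95))
    action = policy_from_state(state, x, remain_qty)
    send(action)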
Common failure modes
Optimizing mean only → looks good in averages, fails in tail-heavy regimes.
Uncalibrated quantiles → the nominal 95% quantile is exceeded 12%+ of the time instead of the target 5%.
No quantile crossing control → unstable downstream tail estimates.
Ignoring completion constraints → apparent tail improvement via hidden underfill.
No venue/session separation → blended model over/underreacts in critical windows.
Next experiments
- Joint modeling of slippage tail + fill-hazard tail (multi-task).
- Adaptive (\alpha): shift from 0.95 to 0.99 during stress windows.
- Portfolio-aware tail budget: correlated names share a global risk envelope.
- Conformalized online wrappers for finite-sample exceedance guarantees.
One-line takeaway
Move from “expected slippage” to “tail-conditioned slippage”: online quantiles + CVaR budget control turns rare execution blowups into manageable, policy-driven events.