Tail-Conditional Slippage Model Playbook (Online Quantiles + CVaR Controller)

Date: 2026-02-26
Category: Research (Execution / Slippage Modeling)
Scope: Intraday single-name + basket execution (KR-focused, portable)

Why this model exists

Most desks still optimize average slippage. That helps the median day, but PnL pain usually comes from a handful of tail events:

queue collapse right after posting,
spread shock during urgency ramp,
hidden liquidity vacuum near session transitions.

This playbook models slippage as a conditional distribution, not a point estimate:

Predict multiple slippage quantiles online,
Estimate tail loss (CVaR / expected shortfall),
Spend a dynamic tail-risk budget while executing.

Core idea: execution should be controlled by tail risk per remaining notional, not only expected bps.

Problem setup

At decision tick (t), for a child action over horizon (\Delta), with positive values denoting cost:

[ y_t = \text{signed slippage}_{t\rightarrow t+\Delta} \quad (\text{bps}) ]

Given context (x_t), estimate conditional quantiles:

[ q_\tau(x_t) = Q_{Y|X}(\tau \mid x_t), \quad \tau \in \{0.5, 0.75, 0.9, 0.95\} ]

Define tail-risk metric at confidence (\alpha) (e.g., 0.95):

[ \text{CVaR}_\alpha(x_t) = \mathbb{E}[Y \mid Y \ge q_\alpha(x_t), X=x_t] ]

Then control execution aggressiveness based on projected CVaR burn vs remaining budget.
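
To make the definition concrete, the sketch below computes an empirical quantile and the matching CVaR from a small unconditional sample. It ignores the conditioning on (x_t), uses (\alpha = 0.9) only because ten observations cannot resolve a 95% tail, and the slippage values and function names are made up for illustration.

```python
import math

def empirical_quantile(values, tau):
    """Smallest observation with at least a tau fraction of the sample at or below it."""
    ordered = sorted(values)
    # round() guards against float noise in tau * n before taking the ceiling
    k = max(0, math.ceil(round(tau * len(ordered), 9)) - 1)
    return ordered[k]

def empirical_cvar(values, alpha):
    """Return (q_alpha, mean of observations at or above q_alpha); upper tail = cost."""
    q = empirical_quantile(values, alpha)
    tail = [v for v in values if v >= q]
    return q, sum(tail) / len(tail)

# Illustrative slippage sample in bps (hypothetical values).
slippage_bps = [1.0, 2.0, 2.5, 3.0, 3.5, 4.0, 5.0, 6.0, 12.0, 30.0]
q90, cvar90 = empirical_cvar(slippage_bps, 0.90)
mean_bps = sum(slippage_bps) / len(slippage_bps)
print(q90, cvar90, mean_bps)
```

In this toy sample the mean is about 6.9 bps while the 90% CVaR is 21 bps, which is the gap between "average slippage" and "tail slippage" that the rest of the playbook targets.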

Feature design (execution-realistic)

Market microstructure

spread (ticks/bps), depth slope, top-of-book imbalance,
microprice drift and short-horizon realized vol,
cancel-to-trade ratio, queue depletion rate,
last-trade sign run length (flow persistence).

Own execution state

participation, remaining quantity, time-to-go,
queue age (time since post), replace count,
child order size percentile vs displayed depth,
fill hazard proxy (recent passive fill probability).

Session / venue state

open/close/auction proximity buckets,
venue flag (KRX/NXT), VI/sidecar proximity,
symbol liquidity class and beta bucket,
event clock (macro/news windows).

Modeling stack

Stage 1 — Multi-quantile predictor

Use separate or multi-head quantile models:

LightGBM/CatBoost quantile objectives, or
linear quantile models for very low-latency paths.

Pinball loss for each (\tau):

[ \mathcal{L}_\tau(y,\hat{q}) = (\tau - \mathbf{1}\{y<\hat{q}\})(y-\hat{q}) ]
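
A direct transcription of the pinball loss, as a minimal sketch, shows why high quantiles are pushed upward: at (\tau = 0.95), under-predicting by a given amount costs 19 times more than over-predicting by the same amount. The function name and the example numbers are illustrative.

```python
def pinball_loss(y, q_hat, tau):
    """Pinball (quantile) loss for one realized value y and predicted quantile q_hat."""
    indicator = 1.0 if y < q_hat else 0.0
    return (tau - indicator) * (y - q_hat)

# Same 4 bps miss on either side of a 6 bps prediction, at tau = 0.95.
under = pinball_loss(y=10.0, q_hat=6.0, tau=0.95)  # realized above prediction: tau * 4
over = pinball_loss(y=2.0, q_hat=6.0, tau=0.95)    # realized below prediction: (1 - tau) * 4
print(under, over)
```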

Monotonicity fix (avoid quantile crossing):

either constrained training,
or post-process with isotonic rearrangement so (\hat{q}_{0.5} \le \hat{q}_{0.75} \le \hat{q}_{0.9} \le \hat{q}_{0.95}).
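
Two simple post-processing options are sketched below, assuming the quantiles arrive ordered by (\tau). Sorting is the plainest rearrangement; pool-adjacent-violators (the algorithm behind isotonic regression) instead averages any neighbours that cross. Both function names are illustrative, and which variant a desk should prefer is not specified here.

```python
def enforce_monotone(*quantiles):
    """Rearrangement by sorting: returns the same values in non-decreasing order."""
    return tuple(sorted(quantiles))

def isotonic_pava(values):
    """Pool-adjacent-violators with equal weights: crossing neighbours are averaged."""
    blocks = []  # each block is [sum, count]
    for v in values:
        blocks.append([v, 1])
        # merge backwards while the previous block mean exceeds the latest block mean
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    out = []
    for s, c in blocks:
        out.extend([s / c] * c)
    return out

raw = (3.0, 5.0, 4.5, 9.0)        # q75 and q90 cross in this hypothetical prediction
sorted_q = enforce_monotone(*raw)
pooled_q = isotonic_pava(raw)
print(sorted_q, pooled_q)
```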

Stage 2 — Online calibration layer

Raw quantiles drift intraday. Add rolling calibration:

maintain calibration residuals by symbol-time bucket,
adjust (\hat{q}_\tau) with adaptive offset (\delta_\tau(t)),
use exponentially decayed windows to react faster during regime shifts.

Goal: keep the empirical exceedance rate near target (e.g., 5% of realized observations above (q_{0.95})).
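
One possible form of the adaptive offset is a stochastic-approximation update: raise (\delta_\tau) after an exceedance and lower it slightly otherwise, so that the drift is zero only when the exceedance rate equals (1 - \tau). The sketch below assumes this scheme; the class name, the `update` method, and the 0.5 bps step size are illustrative choices, and a constant step is used here as a simple stand-in for an exponentially decayed window.

```python
from collections import defaultdict

class OnlineCalibrator:
    """Per-(quantile, bucket) additive offsets updated from exceedance feedback."""

    def __init__(self, step_bps=0.5):
        self.step_bps = step_bps           # hypothetical step size, in bps
        self.deltas = defaultdict(float)   # (name, bucket) -> current offset

    def offset(self, name, bucket):
        return self.deltas[(name, bucket)]

    def update(self, name, bucket, tau, y, q_calibrated):
        # An exceedance pushes the offset up; a non-exceedance pulls it down slightly.
        exceeded = 1.0 if y > q_calibrated else 0.0
        self.deltas[(name, bucket)] += self.step_bps * (exceeded - (1.0 - tau))
        return self.deltas[(name, bucket)]

cal = OnlineCalibrator(step_bps=0.5)
raw_q95 = 5.0                                     # raw model output, assumed too low
q95 = raw_q95 + cal.offset("q95", bucket="open")  # offset starts at zero
cal.update("q95", "open", 0.95, y=10.0, q_calibrated=q95)  # breach, so the offset rises
print(cal.offset("q95", bucket="open"))
```

Larger steps react faster to regime shifts but make the calibrated quantile noisier, so the step size would need tuning per symbol-time bucket.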

Stage 3 — Tail estimator above (q_{0.95})

For observations beyond estimated (q_{0.95}), fit a lightweight excess-loss model:

mean excess approximation, or
EVT-style generalized Pareto tail on exceedances (if sample support is adequate).

This yields a more stable online (\widehat{\text{CVaR}}_{0.95}) than naive averaging.
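
Because (\mathbb{E}[Y \mid Y \ge q] = q + \mathbb{E}[Y - q \mid Y \ge q]), the mean-excess route only needs a running average of the overshoot above (q_{0.95}). The sketch below assumes an exponentially decayed average and a conservative fallback while exceedances are scarce; the class name, decay, sample floor, and fallback value are all hypothetical.

```python
class MeanExcessTail:
    """CVaR estimate = q + exponentially decayed mean excess above q."""

    def __init__(self, decay=0.97, min_exceedances=20, fallback_excess_bps=15.0):
        self.decay = decay
        self.min_exceedances = min_exceedances
        self.fallback_excess_bps = fallback_excess_bps  # conservative proxy, assumed
        self.weighted_excess = 0.0
        self.weight = 0.0
        self.count = 0

    def update(self, y, q):
        """Record one realized slippage y against the quantile q predicted for it."""
        if y > q:
            self.weighted_excess = self.decay * self.weighted_excess + (y - q)
            self.weight = self.decay * self.weight + 1.0
            self.count += 1

    def cvar(self, q):
        mean_excess = self.weighted_excess / self.weight if self.weight > 0 else 0.0
        if self.count < self.min_exceedances:
            # Sample floor not met: never report less than the fallback excess.
            return q + max(self.fallback_excess_bps, mean_excess)
        return q + mean_excess

tail = MeanExcessTail(decay=1.0, min_exceedances=2, fallback_excess_bps=15.0)
before = tail.cvar(10.0)          # no data yet, so the fallback applies
for y in (14.0, 5.0, 16.0):       # two exceedances of q = 10, one non-exceedance
    tail.update(y, 10.0)
after = tail.cvar(10.0)
print(before, after)
```

If a generalized Pareto tail with shape (\xi < 1) and scale (\beta) is fitted to the excesses over (q_{0.95}) instead, the mean-excess term is replaced by (\beta / (1 - \xi)).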

Tail-budget controller

Let the total slippage budget be (B_{tot}), in bps-weighted notional, and let (w_i) be the notional weight of child (i). The remaining budget at (t) is:

[ B_t = B_{tot} - \sum_{i=t_0}^{t} y_i w_i ]

Projected tail burn over short horizon (h):

[ \rho_t^{tail} = \frac{\mathbb{E}[\sum_{j=t}^{t+h} \text{CVaR}_{0.95}(x_j) w_j \mid \mathcal{F}_t]}{\max(B_t,\epsilon)} ]

Controller tiers:

GREEN: (\rho_t^{tail} < 0.30)
YELLOW: (0.30 \le \rho_t^{tail} < 0.60)
ORANGE: (0.60 \le \rho_t^{tail} < 0.90)
RED: (\rho_t^{tail} \ge 0.90) or repeated q95 breaches
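
The budget, burn ratio, and tier thresholds above can be sketched as three small functions. The conditional expectation is replaced here by a point forecast of CVaR along the horizon, and the function names and example numbers are illustrative rather than calibrated.

```python
def remaining_budget(b_total, realized):
    """B_t = B_tot - sum(y_i * w_i) over children already executed."""
    return b_total - sum(y * w for y, w in realized)

def tail_burn_ratio(cvar_path, weights, budget_left, eps=1e-6):
    """rho_tail: projected CVaR-weighted burn over the horizon / remaining budget."""
    burn = sum(c * w for c, w in zip(cvar_path, weights))
    return burn / max(budget_left, eps)

def tier(rho_tail, repeated_breach=False):
    """Map the burn ratio (and the repeated-breach flag) to a controller tier."""
    if repeated_breach or rho_tail >= 0.90:
        return "RED"
    if rho_tail >= 0.60:
        return "ORANGE"
    if rho_tail >= 0.30:
        return "YELLOW"
    return "GREEN"

# Hypothetical numbers: budget in bps x notional units, two children already filled.
b_t = remaining_budget(500.0, [(2.0, 10.0), (4.0, 5.0)])  # 500 - 20 - 20
rho = tail_burn_ratio([12.0, 15.0], [8.0, 8.0], b_t)      # (96 + 120) / 460
state = tier(rho)
print(b_t, rho, state)
```

Once the budget is exhausted, the (\epsilon) floor makes the ratio very large, so the controller lands in RED rather than dividing by zero or a negative number.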

Actions by tier

GREEN

baseline schedule,
passive-first routing,
normal repost cadence.

YELLOW

reduce passive dwell timeout,
trim child size percentile,
tighten stale quote cancellation.

ORANGE

cap participation escalation,
prefer fill certainty over price improvement on the residual schedule,
temporarily quarantine venue with persistent tail exceedance.

RED

enter safe mode (controlled unwind protocol),
freeze non-essential aggression,
page operator with tail snapshot.

Use hysteresis (N-of-M confirms) to prevent flapping.
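
A minimal N-of-M confirmation rule might look like the sketch below: the tier changes only after the newly proposed tier has appeared in at least N of the last M ticks, and the vote window is cleared after each switch. The class name and the N=3, M=5 defaults are assumptions, and whether escalation to RED should bypass the confirmation is a policy choice this sketch does not make.

```python
from collections import deque

class TierHysteresis:
    """Switch tier only after the proposed tier appears in N of the last M ticks."""

    def __init__(self, n_confirm=3, m_window=5, initial="GREEN"):
        self.n_confirm = n_confirm
        self.state = initial
        self.recent = deque(maxlen=m_window)

    def step(self, proposed):
        self.recent.append(proposed)
        votes = sum(1 for p in self.recent if p == proposed)
        if proposed != self.state and votes >= self.n_confirm:
            self.state = proposed
            self.recent.clear()  # require fresh confirmations before the next switch
        return self.state

hyst = TierHysteresis(n_confirm=3, m_window=5)
proposals = ["YELLOW", "GREEN", "YELLOW", "YELLOW", "GREEN", "GREEN"]
path = [hyst.step(p) for p in proposals]
print(path)
```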

Practical training & validation

Data slicing

Purged time splits by session/day,
symbol-liquidity stratification,
dedicated stress slices: open, close, VI-adjacent windows.
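
One way to read "purged time splits" is a walk-forward scheme in which the training window stops a few days before each test block, so labels that overlap the boundary cannot leak. The sketch below assumes that reading; the function name and the one-day purge are illustrative, and liquidity stratification and stress slices would be layered on top.

```python
def purged_walk_forward(days, n_folds, purge_days=1):
    """Yield (train_days, test_days); training ends purge_days before each test block."""
    days = sorted(set(days))
    fold = len(days) // (n_folds + 1)
    for k in range(1, n_folds + 1):
        test = days[k * fold:(k + 1) * fold]
        train = days[:max(0, k * fold - purge_days)]
        yield train, test

# Hypothetical calendar of ten consecutive session dates (ISO strings sort correctly).
days = [f"2026-02-{d:02d}" for d in range(2, 12)]
folds = list(purged_walk_forward(days, n_folds=4, purge_days=1))
for train, test in folds:
    print(len(train), "train days ->", test)
```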

Metrics (must report)

Quantile calibration error (per (\tau)),
q95/q99 realized slippage reduction vs baseline,
CVaR reduction at fixed completion SLA,
underfill opportunity cost,
mode-switch stability (no overreaction).
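
Quantile calibration error can be reported as empirical coverage minus the nominal level, per (\tau). The sketch below assumes that definition; the function name and the toy numbers are illustrative.

```python
def calibration_error(y_true, q_pred, tau):
    """Empirical coverage minus tau: 0 is calibrated, negative means the quantile is too low."""
    covered = sum(1 for y, q in zip(y_true, q_pred) if y <= q)
    return covered / len(y_true) - tau

# Hypothetical realized slippage and a flat 5 bps q90 prediction.
realized = [1.0, 2.0, 3.0, 4.0, 10.0]
predicted_q90 = [5.0] * len(realized)
err = calibration_error(realized, predicted_q90, tau=0.90)
print(err)
```

Here coverage is 80% against a 90% target, so the error is about -0.10 and the predicted quantile is too low.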

Counterfactual replay protocol

Replay historical order books + own order events,
compare baseline controller vs tail-budget controller,
enforce same parent-order intents for fair comparison.

Production guardrails

Feature freshness SLO
- stale key features invalidate tail estimates and force fallback policy.
Sample floor for tail fit
- if exceedance count too low, use conservative fallback CVaR proxy.
Session-aware parameters
- separate calibration states for open/close/auction windows.
Hard stop on repeated breaches
- N breaches within M minutes triggers automatic defensive mode.
Explainability payload
- top feature contributions,
- current quantiles + CVaR,
- budget burn and state-transition reason.
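
The repeated-breach hard stop can be sketched as a sliding time window over q95 breach timestamps. The class name, the 3-breach threshold, and the 300-second window below are placeholders for the N and M a desk would actually choose.

```python
from collections import deque

class BreachMonitor:
    """Signal defensive mode when max_breaches q95 breaches fall within window_s seconds."""

    def __init__(self, max_breaches=3, window_s=300.0):
        self.max_breaches = max_breaches
        self.window_s = window_s
        self.times = deque()

    def record(self, ts, breached):
        """ts is a monotonically increasing timestamp in seconds."""
        if breached:
            self.times.append(ts)
        # Drop breaches that have aged out of the window.
        while self.times and ts - self.times[0] > self.window_s:
            self.times.popleft()
        return len(self.times) >= self.max_breaches

monitor = BreachMonitor(max_breaches=3, window_s=300.0)
signals = [monitor.record(ts, hit) for ts, hit in
           [(0.0, True), (100.0, True), (200.0, True), (600.0, False)]]
print(signals)
```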

Pseudocode

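# remain_qty, remaining_budget, and obs_slip are assumed to be refreshed from fills elsewhere.
state, prev_q95 = "GREEN", None

for t in decision_ticks:
    x = build_features(t)

    # Stage 1: raw quantiles, rearranged so they never cross.
    q50, q75, q90, q95 = quantile_model.predict(x)
    q50, q75, q90, q95 = enforce_monotone(q50, q75, q90, q95)

    # Stage 2: online calibration; keep q95 at or above q90 after the offset.
    q95 = max(q90, q95 + online_calibrator.offset("q95", bucket=t.bucket))

    # Stage 3: tail estimate above the calibrated q95.
    cvar95 = tail_model.estimate_cvar95(x, q95)

    tail_burn = forecast_tail_burn(cvar95, remain_qty, horizon=h)
    rho_tail = tail_burn / max(remaining_budget, 1e-6)

    # A breach compares the last realized slippage with the q95 predicted for it.
    breach = prev_q95 is not None and obs_slip > prev_q95
    state = transition_with_hysteresis(state, rho_tail, breach_q95=breach)
    action = policy_from_state(state, x, remain_qty)

    send(action)
    prev_q95 = q95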

Common failure modes

Optimizing mean only → looks good in averages, fails in tail-heavy regimes.
Uncalibrated quantiles → the nominal q95 should be exceeded 5% of the time, but realized exceedance is 12%+.
No quantile crossing control → unstable downstream tail estimates.
Ignoring completion constraints → apparent tail improvement via hidden underfill.
No venue/session separation → blended model over/underreacts in critical windows.

Next experiments

Joint modeling of slippage tail + fill-hazard tail (multi-task).
Adaptive (\alpha): shift from 0.95 to 0.99 during stress windows.
Portfolio-aware tail budget: correlated names share a global risk envelope.
Conformalized online wrappers for finite-sample exceedance guarantees.

One-line takeaway

Move from “expected slippage” to “tail-conditioned slippage”: online quantiles + CVaR budget control turns rare execution blowups into manageable, policy-driven events.