Conformal Slippage Control Playbook (Online Calibration for p95 Survival)

Date: 2026-02-24
Category: research
Context: Live trading readiness for Vellab (execution risk under regime drift)

Why this playbook

Most slippage models fail in production for one boring reason: calibration drift.

The mean forecast can still look “fine” while tails blow up.
Liquidity regimes shift faster than weekly retraining cycles.
Execution control is often keyed to expected bps, while real PnL damage comes from p90/p95 events.

This playbook combines:

A base slippage predictor (feature-rich, fast inference)
Online conformal calibration (distribution-free interval control)
A budget-aware execution controller

Goal: keep realized slippage tail risk inside explicit limits, even when market microstructure changes intraday.

Core idea in one line

Don’t trust point estimates; enforce calibrated upper bounds and drive execution urgency from remaining tail budget.

System design

1) Base predictor (fast, stable)

Predict per-child-order expected slippage (implementation shortfall) in bps with a lightweight model (e.g., gradient boosting / linear + interactions), using features such as:

Spread, effective spread
Short-horizon volatility (1m/5m)
Participation rate (realized and planned)
Queue/imbalance proxies (if available)
Time-of-day + session bucket
Latency + cancel/replace intensity
Venue / order type / side

Output:

mu_hat (expected slippage)
optional q_hat (raw quantile estimates, if model supports)

Design constraint

Inference must stay cheap enough for per-slice routing decisions (sub-second budget end-to-end).

2) Nonconformity score

For each completed child fill i:

observed slippage: y_i
model forecast: mu_hat_i
residual: r_i = y_i - mu_hat_i

Use one-sided upper-tail nonconformity:

a_i = max(0, r_i)

Why one-sided: execution control mainly cares about not underestimating the adverse-cost tail. Clipping at zero also means the calibrated bound never falls below mu_hat.

3) Online conformal wrapper (rolling)

Maintain rolling buffer A_t of recent nonconformity scores (e.g., last 1,000–5,000 fills, optionally regime-weighted).

For target miscoverage alpha:

q_alpha_t = Quantile(A_t, 1 - alpha)
(finite-sample version: with n scores in A_t, take the ceil((n + 1)(1 - alpha))-th smallest score)
upper bound: U_t(x) = mu_hat_t(x) + q_alpha_t

Interpretation:

For alpha = 0.05, target is ~95% coverage for future slippage under exchangeability-like conditions.
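
A minimal Python sketch of the rolling wrapper, assuming an unweighted buffer and the finite-sample rank ceil((n + 1)(1 - alpha)). The class name RollingConformal and its methods are illustrative, not part of any existing module. Returning infinity while the buffer is too small for the requested level is one conservative choice; a static fallback cap would be another.

```python
import math
from collections import deque


class RollingConformal:
    """Rolling one-sided conformal calibrator (illustrative sketch)."""

    def __init__(self, window: int = 2000, alpha: float = 0.05):
        self.scores = deque(maxlen=window)  # oldest scores fall off automatically
        self.alpha = alpha

    def update(self, y: float, mu_hat: float) -> None:
        # One-sided nonconformity score: a = max(0, y - mu_hat)
        self.scores.append(max(0.0, y - mu_hat))

    def quantile(self) -> float:
        n = len(self.scores)
        # Finite-sample rank; the small tolerance guards against float round-up.
        k = math.ceil((n + 1) * (1 - self.alpha) - 1e-9)
        if n == 0 or k > n:
            return math.inf  # not enough samples to certify this level
        return sorted(self.scores)[k - 1]

    def upper_bound(self, mu_hat: float) -> float:
        # U_t(x) = mu_hat_t(x) + q_alpha_t
        return mu_hat + self.quantile()
```

With alpha = 0.05 the quantile is infinite until the buffer holds at least 19 scores, so a controller built on this bound stays defensive during warm-up.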

Drift adaptation

Use weighted/segmented buffers, not a single global memory:

Separate by volatility regime (low/med/high)
Separate by open/close auction proximity
Decay old samples exponentially

This keeps calibration responsive without throwing away all history.
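
One way to combine segmentation and decay is a per-regime buffer with an exponentially weighted empirical quantile. The sketch below makes several assumptions: regimes are identified by an arbitrary hashable key, the half-life is measured in fills rather than trading time, and the plain weighted quantile is used without the finite-sample correction. RegimeBuffers is a hypothetical name.

```python
import math
from collections import defaultdict, deque


class RegimeBuffers:
    """Per-regime score buffers with exponential recency weighting (sketch)."""

    def __init__(self, window: int = 2000, half_life: float = 500.0):
        # Weight multiplier applied per fill of age; half_life is in fills (assumption).
        self.decay = 0.5 ** (1.0 / half_life)
        self.buffers = defaultdict(lambda: deque(maxlen=window))

    def update(self, regime, score: float) -> None:
        self.buffers[regime].append(score)

    def quantile(self, regime, level: float = 0.95) -> float:
        scores = self.buffers[regime]
        n = len(scores)
        if n == 0:
            return math.inf  # no calibration data for this regime yet
        # Newest score gets weight 1; a score k fills old gets decay**k.
        weights = [self.decay ** (n - 1 - i) for i in range(n)]
        total = sum(weights)
        cum = 0.0
        for s, w in sorted(zip(scores, weights)):
            cum += w
            if cum >= level * total:
                return s
        return max(scores)  # only reached through float rounding at level ~ 1
```

Because recent fills carry more weight, a burst of large recent scores raises the quantile faster than it would in an equally weighted window of the same length.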

4) Budget-aware execution controller

Define per-parent-order budget:

B_total (max allowed implementation shortfall in bps)
B_used
B_left = B_total - B_used
Q_left (remaining quantity fraction)

Use calibrated upper bound U_t to estimate worst-case incremental burn.

Control states

Harvest: U_t comfortably below per-slice budget
- more passive, lower urgency, wider patience window
Balance: U_t near budget edge
- mixed passive/aggressive, moderate urgency
Salvage: projected tail burn exceeds headroom
- prioritize completion reliability, reduce exposure window, tighter kill-switch checks

A simple gating score:

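stress = U_t / max(eps, target_slice_budget)

Here target_slice_budget is the per-slice share of B_left, and eps guards against division by zero. Apply hysteresis bands to the state thresholds to avoid flip-flop.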
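
A small sketch of hysteresis on the stress score, assuming escalation toward Salvage happens immediately while de-escalation needs both a margin below the entry threshold and a minimum dwell time. The band width of 0.1 and the HysteresisGate name are hypothetical; the 0.8 / 1.2 thresholds and the 60s dwell follow the values used elsewhere in this playbook.

```python
from dataclasses import dataclass

STATES = ["HARVEST", "BALANCE", "SALVAGE"]


@dataclass
class HysteresisGate:
    enter: tuple = (0.8, 1.2)   # stress levels that escalate to BALANCE / SALVAGE
    band: float = 0.1           # hypothetical margin required before stepping down
    min_dwell_s: float = 60.0   # minimum time in a state before stepping down
    level: int = 0              # index into STATES
    since: float = 0.0          # timestamp of the last state change

    def step(self, stress: float, now: float) -> str:
        # Level implied by the raw thresholds, ignoring hysteresis.
        target = sum(stress >= e for e in self.enter)
        if target > self.level:
            # Escalate immediately: risk increases should not wait.
            self.level, self.since = target, now
        elif target < self.level:
            # Step down one level only when stress is clearly below the
            # entry threshold and the dwell time has elapsed.
            exit_threshold = self.enter[self.level - 1] - self.band
            if stress < exit_threshold and now - self.since >= self.min_dwell_s:
                self.level, self.since = self.level - 1, now
        return STATES[self.level]
```

The asymmetry is deliberate in this sketch: a noisy stress score hovering around 1.2 enters Salvage once and stays there until stress falls below 1.1 for long enough, instead of toggling on every tick.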

Production algorithm (minimal)

for each decision tick t:
  x_t <- build microstructure/features
  mu_hat <- base_model.predict(x_t)

  q95 <- conformal_quantile(buffer=A_regime(t), level=0.95)
  U95 <- mu_hat + q95

  budget_slice <- B_left / max(1, slices_left)
  stress <- U95 / max(eps, budget_slice)

  if stress < 0.8: state = HARVEST
  else if stress < 1.2: state = BALANCE
  else: state = SALVAGE

  policy <- map_state_to_execution(state)
  place/modify/cancel orders via policy

on fill completion:
  y <- realized slippage(fill)
  a <- max(0, y - mu_hat_at_decision)
  update conformal buffer by regime
  update B_used
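
The decision half of the loop above can be transcribed into Python roughly as follows. This is a sketch under stated assumptions: the function name decide_state and its argument names are illustrative, the thresholds are the 0.8 / 1.2 values from the pseudocode, and no hysteresis is applied.

```python
def decide_state(mu_hat: float, q95: float, b_left: float,
                 slices_left: int, eps: float = 1e-9):
    """Map forecast, conformal quantile, and remaining budget to a control state."""
    u95 = mu_hat + q95                                 # calibrated upper bound (bps)
    budget_slice = b_left / max(1, slices_left)        # per-slice share of remaining budget
    stress = u95 / max(eps, budget_slice)              # eps also absorbs an exhausted budget
    if stress < 0.8:
        state = "HARVEST"
    elif stress < 1.2:
        state = "BALANCE"
    else:
        state = "SALVAGE"
    return state, u95, stress


state, u95, stress = decide_state(mu_hat=2.0, q95=3.0, b_left=50.0, slices_left=5)
```

Note that when b_left is zero or negative, the max(eps, ...) guard drives stress to a very large value, so any positive U95 lands in SALVAGE.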

Monitoring (what actually matters)

Track these online (5m/30m/day):

Coverage error
- realized frequency of y <= U95 vs. the 95% target
Tail exceedance magnitude
- mean/median of (y - U95)_+
Budget burn velocity
- bps consumed per elapsed participation/time
State occupancy
- Harvest/Balance/Salvage dwell time
Opportunity-cost companion metric
- underfill and alpha decay penalty (don’t “solve” slippage by never trading)

If coverage drops materially (e.g., below 90% for a sustained window), auto-tighten the controller and trigger a recalibration alarm.
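
A sketch of the first two metrics and the alarm rule, assuming a count-based window (a time-based 5m/30m window would need timestamps) and treating "sustained" as "the window is full". CoverageMonitor and its parameter names are hypothetical.

```python
from collections import deque


class CoverageMonitor:
    """Rolling coverage and tail-exceedance tracker (illustrative sketch)."""

    def __init__(self, window: int = 500, alarm_below: float = 0.90):
        self.hits = deque(maxlen=window)     # True where y <= U95
        self.excess = deque(maxlen=window)   # (y - U95)_+ per fill
        self.alarm_below = alarm_below

    def record(self, y: float, u95: float) -> None:
        self.hits.append(y <= u95)
        self.excess.append(max(0.0, y - u95))

    def coverage(self):
        return sum(self.hits) / len(self.hits) if self.hits else None

    def mean_exceedance(self) -> float:
        # Mean of (y - U95)_+ over all fills in the window, zeros included.
        return sum(self.excess) / len(self.excess) if self.excess else 0.0

    def alarm(self) -> bool:
        # Fire only once the window is full, so a few early misses do not trip it.
        full = len(self.hits) == self.hits.maxlen
        return full and self.coverage() < self.alarm_below
```

Coverage alone can hide severity, which is why the exceedance magnitude is tracked alongside it: two windows with identical coverage can differ widely in how far the breaches overshoot.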

Practical parameter defaults (starting point)

Conformal window: 2,000 fills
Exponential decay half-life: 1 trading day
Regimes: volatility terciles × TOD bucket (open/mid/close)
Primary risk target: U95
Secondary guardrail: U90 and U99 shadow metrics
Hysteresis dwell: minimum 60–120s before stepping down to a less defensive state

These are starter values; production values must be tuned from live paper-trading logs.

Backtest-to-live validation ladder

Historical replay
- Compare raw model vs conformal-wrapped coverage stability.
Paper trading (shadow)
- Log predicted U95, realized slippage, and state transitions.
Tiny capital canary
- Hard caps on notional + automatic freeze on coverage breach.
Progressive scale-up
- Increase size only if coverage and budget-burn SLOs hold for N sessions.

Failure modes and fixes

A) Coverage looks good, PnL still weak

Cause: controller too conservative, opportunity cost too high.
Fix: jointly optimize slippage + completion/alpha metrics; add explicit tradeoff coefficient.

B) Coverage collapses at open/close

Cause: non-stationary microstructure around auctions.
Fix: dedicated auction regime buffers and separate conformal quantiles.

C) State oscillation (thrashing)

Cause: no hysteresis / noisy stress score.
Fix: smoothing + minimum dwell + asymmetric enter/exit thresholds.

D) “Calibrated” but lagging on regime break

Cause: buffer memory too long.
Fix: stronger decay, drift detector (CUSUM/Page-Hinkley), temporary defensive multiplier.
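
For the drift detector in fix D, a Page-Hinkley test for an upward shift can be sketched in a few lines. The assumption here is that it is fed a stream such as the nonconformity scores or the exceedances (y - U95)_+; the delta and threshold values are placeholders to be tuned, not recommendations.

```python
class PageHinkley:
    """Page-Hinkley test for an upward mean shift (illustrative sketch)."""

    def __init__(self, delta: float = 0.05, threshold: float = 10.0):
        self.delta = delta            # tolerated drift per observation
        self.threshold = threshold    # alarm level for the test statistic
        self.n = 0
        self.mean = 0.0               # running mean of the stream
        self.cum = 0.0                # cumulative deviation from the running mean
        self.cum_min = 0.0            # minimum of the cumulative deviation so far

    def update(self, x: float) -> bool:
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.cum_min = min(self.cum_min, self.cum)
        # Alarm when the cumulative deviation rises far above its past minimum.
        return self.cum - self.cum_min > self.threshold
```

On an alarm, one option consistent with the fix above is to scale the conformal quantile by a temporary multiplier until the buffer has refilled with post-break fills; the size and duration of that multiplier would need to be set from replay data.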

Integration notes for Vellab execution stack

Treat conformal module as a thin post-model layer (no invasive model rewrite).
Keep a dedicated table for calibration artifacts:
- decision timestamp
- features hash/regime
- mu_hat, U90/U95/U99
- realized slippage
- chosen state and action
Expose real-time dashboard panel:
- live coverage gap
- tail breach count
- budget headroom
Couple with existing kill-switch ladder (coverage breach can escalate risk state).

Bottom line

A slippage model is only useful if it stays honest under drift.

Online conformal calibration gives a practical honesty layer: “How bad can this slice get with controlled error rate?”

Once that bound exists, execution policy becomes a disciplined control problem, not a vibes-based urgency argument.