Conformal Slippage Control Playbook (Online Calibration for p95 Survival)
- Date: 2026-02-24
- Category: research
- Context: Live trading readiness for Vellab (execution risk under regime drift)
Why this playbook
Most slippage models fail in production for one boring reason: calibration drift.
- The mean forecast can still look “fine” while tails blow up.
- Liquidity regimes shift faster than weekly retraining cycles.
- Execution control is often keyed to expected bps, while real PnL damage comes from p90/p95 events.
This playbook combines:
- A base slippage predictor (feature-rich, fast inference)
- Online conformal calibration (distribution-free interval control)
- A budget-aware execution controller
Goal: keep realized slippage tail risk inside explicit limits, even when market microstructure changes intraday.
Core idea in one line
Don’t trust point estimates; enforce calibrated upper bounds and drive execution urgency from remaining tail budget.
System design
1) Base predictor (fast, stable)
Predict per-child-order expected slippage (implementation shortfall) in bps with a lightweight model (e.g., gradient boosting or linear + interactions). Candidate features:
- Spread, effective spread
- Short-horizon volatility (1m/5m)
- Participation rate (realized and planned)
- Queue/imbalance proxies (if available)
- Time-of-day + session bucket
- Latency + cancel/replace intensity
- Venue / order type / side
Output:
- mu_hat (expected slippage)
- optional q_hat (raw quantile estimates, if model supports)
Design constraint
Inference must stay cheap enough for per-slice routing decisions (sub-second budget end-to-end).
2) Nonconformity score
For each completed child fill i:
- observed slippage: y_i
- model forecast: mu_hat_i
- residual: r_i = y_i - mu_hat_i
Use one-sided upper-tail nonconformity:
a_i = max(0, r_i)
Why one-sided: execution control mainly cares about not underestimating the bad-cost tail; over-forecasts of cost are harmless for risk limits.
3) Online conformal wrapper (rolling)
Maintain a rolling buffer A_t of recent nonconformity scores (e.g., last 1,000–5,000 fills, optionally regime-weighted).
For target miscoverage alpha:
- q_alpha_t = Quantile(A_t, 1 - alpha)
- upper bound: U_t(x) = mu_hat_t(x) + q_alpha_t
Interpretation:
- For alpha = 0.05, target is ~95% coverage for future slippage under exchangeability-like conditions.
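A minimal sketch of the rolling wrapper, assuming slippage is signed so that positive bps means cost. The class name `RollingConformal` and its methods are hypothetical, not part of any existing stack. It uses the standard finite-sample rank ceil((n + 1)(1 - alpha)) rather than a plain empirical quantile, which is slightly more conservative on small buffers.

```python
import math
from collections import deque


class RollingConformal:
    """Rolling one-sided conformal wrapper (illustrative sketch)."""

    def __init__(self, window: int = 2000, alpha: float = 0.05):
        self.alpha = alpha
        self.scores = deque(maxlen=window)  # oldest scores drop off automatically

    def update(self, realized_bps: float, forecast_bps: float) -> None:
        # One-sided score a_i = max(0, y_i - mu_hat_i): only under-forecast cost counts.
        self.scores.append(max(0.0, realized_bps - forecast_bps))

    def quantile(self) -> float:
        n = len(self.scores)
        if n == 0:
            return math.inf  # no evidence yet: be maximally conservative
        # Finite-sample rank; a rank beyond n means the buffer is too small
        # to support the requested coverage level.
        k = math.ceil((n + 1) * (1.0 - self.alpha))
        if k > n:
            return math.inf
        return sorted(self.scores)[k - 1]

    def upper_bound(self, forecast_bps: float) -> float:
        # U_t(x) = mu_hat_t(x) + q_alpha_t
        return forecast_bps + self.quantile()


cal = RollingConformal(window=100, alpha=0.05)
for i in range(1, 101):
    cal.update(realized_bps=float(i), forecast_bps=0.0)
bound = cal.upper_bound(forecast_bps=2.0)
```

Sorting on every call is fine for a sketch; a production version would likely keep the buffer in a sorted structure to stay inside the per-slice latency budget.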
Drift adaptation
Use weighted/segmented buffers, not a single global memory:
- Separate by volatility regime (low/med/high)
- Separate by open/close auction proximity
- Decay old samples exponentially
This keeps calibration responsive without throwing away all history.
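One way to combine segmentation and decay is a per-regime buffer with an exponentially weighted quantile. The sketch below is illustrative only: `RegimeBuffers`, the regime labels, and the half-life in seconds are assumptions, and the weighted quantile weakens the formal coverage guarantee in exchange for faster adaptation.

```python
import math
from collections import defaultdict, deque


class RegimeBuffers:
    """Per-regime score buffers with exponential time decay (illustrative sketch)."""

    def __init__(self, window: int, half_life_s: float):
        self.decay = math.log(2.0) / half_life_s
        self.buffers = defaultdict(lambda: deque(maxlen=window))

    def add(self, regime: str, ts: float, score: float) -> None:
        self.buffers[regime].append((ts, score))

    def quantile(self, regime: str, now: float, level: float = 0.95) -> float:
        items = self.buffers.get(regime)
        if not items:
            return math.inf  # unseen regime: stay conservative
        # Weight each score by its age, then walk the sorted scores until
        # the cumulative weight reaches the requested level.
        weighted = sorted((s, math.exp(-self.decay * (now - ts))) for ts, s in items)
        total = sum(w for _, w in weighted)
        cum = 0.0
        for score, weight in weighted:
            cum += weight
            if cum >= level * total:
                return score
        return weighted[-1][0]


buf = RegimeBuffers(window=10, half_life_s=100.0)
buf.add("high_vol", ts=0.0, score=10.0)     # stale observation
buf.add("high_vol", ts=1000.0, score=1.0)   # fresh observations
buf.add("high_vol", ts=1000.0, score=2.0)
q_decayed = buf.quantile("high_vol", now=1000.0)
```

In this toy run the stale score of 10.0 is ten half-lives old, so it carries almost no weight and the quantile is set by the fresh scores.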
4) Budget-aware execution controller
Define per-parent-order budget:
- B_total (max allowed implementation shortfall in bps)
- B_used
- B_left = B_total - B_used
- Q_left (remaining quantity fraction)
Use the calibrated upper bound U_t as a conservative estimate of incremental burn for the next slice. It is a (1 - alpha)-level bound, not an absolute worst case.
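The budget bookkeeping can be sketched as a small value object. The names below (`ParentBudget`, `record_burn`, `slice_budget`) are hypothetical, and the sketch deliberately leaves open how a child fill's slippage is converted into parent-level bps burn, since that accounting convention is not specified here.

```python
from dataclasses import dataclass


@dataclass
class ParentBudget:
    """Per-parent-order slippage budget in bps (illustrative sketch)."""

    b_total_bps: float
    b_used_bps: float = 0.0

    @property
    def b_left_bps(self) -> float:
        # B_left = B_total - B_used, floored at zero once the budget is exhausted.
        return max(0.0, self.b_total_bps - self.b_used_bps)

    def record_burn(self, burn_bps: float) -> None:
        # Caller supplies the parent-level burn attributed to a completed fill.
        self.b_used_bps += burn_bps

    def slice_budget(self, slices_left: int) -> float:
        # Even split of remaining headroom across remaining slices.
        return self.b_left_bps / max(1, slices_left)


budget = ParentBudget(b_total_bps=20.0)
budget.record_burn(5.0)
per_slice = budget.slice_budget(slices_left=3)
```

An even split is the simplest choice; front- or back-loaded schedules are equally plausible and would only change `slice_budget`.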
Control states
- Harvest: U_t comfortably below per-slice budget
  - more passive, lower urgency, wider patience window
- Balance: U_t near budget edge
  - mixed passive/aggressive, moderate urgency
- Salvage: projected tail burn exceeds headroom
  - prioritize completion reliability, reduce exposure window, tighter kill-switch checks
A simple gating score:
stress = U_t / max(eps, target_slice_budget)
with hysteresis bands to avoid flip-flop.
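A possible shape for the gate with hysteresis, written as a sketch: escalation happens immediately on the entry thresholds, while de-escalation needs both a lower exit threshold and a minimum dwell time. The 0.8 / 1.2 bands mirror the thresholds used in the production algorithm; the `exit_margin` value, the class name `StressGate`, and the immediate-escalation rule are assumptions.

```python
from enum import IntEnum


class State(IntEnum):
    HARVEST = 0
    BALANCE = 1
    SALVAGE = 2


class StressGate:
    """Three-state gate with asymmetric thresholds and minimum dwell (sketch)."""

    def __init__(self, enter=(0.8, 1.2), exit_margin=0.1, min_dwell_s=60.0):
        self.enter = enter
        # Exit bands sit below the entry bands so the state does not flip-flop.
        self.exit = (enter[0] - exit_margin, enter[1] - exit_margin)
        self.min_dwell_s = min_dwell_s
        self.state = State.HARVEST
        self.entered_at = float("-inf")

    @staticmethod
    def _classify(stress, bands):
        low, high = bands
        if stress < low:
            return State.HARVEST
        if stress < high:
            return State.BALANCE
        return State.SALVAGE

    def step(self, now_s, u_bound_bps, slice_budget_bps, eps=1e-9):
        stress = u_bound_bps / max(eps, slice_budget_bps)
        escalated = self._classify(stress, self.enter)
        relaxed = self._classify(stress, self.exit)
        if escalated > self.state:
            # Moving to a more defensive state is never delayed.
            self.state, self.entered_at = escalated, now_s
        elif relaxed < self.state and now_s - self.entered_at >= self.min_dwell_s:
            # Relaxing requires clearing the lower band and the dwell timer.
            self.state, self.entered_at = relaxed, now_s
        return self.state


gate = StressGate(min_dwell_s=60.0)
trace = [
    gate.step(0.0, 5.0, 10.0),    # stress 0.50
    gate.step(1.0, 13.0, 10.0),   # stress 1.30
    gate.step(2.0, 5.0, 10.0),    # stress 0.50, dwell not yet met
    gate.step(70.0, 11.5, 10.0),  # stress 1.15, inside the hysteresis band
    gate.step(71.0, 10.5, 10.0),  # stress 1.05, below the exit band
]
```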
Production algorithm (minimal)
for each decision tick t:
    x_t <- build microstructure/features
    mu_hat <- base_model.predict(x_t)
    q95 <- conformal_quantile(buffer=A_regime(t), level=0.95)
    U95 <- mu_hat + q95
    budget_slice <- B_left / max(1, slices_left)
    stress <- U95 / max(eps, budget_slice)
    if stress < 0.8: state = HARVEST
    else if stress < 1.2: state = BALANCE
    else: state = SALVAGE
    policy <- map_state_to_execution(state)
    place/modify/cancel orders via policy

on fill completion:
    y <- realized slippage(fill)
    a <- max(0, y - mu_hat_at_decision)
    update conformal buffer by regime
    update B_used
Monitoring (what actually matters)
Track these online (5m/30m/day):
- Coverage error
  - realized (y <= U95) vs target 95%
- Tail exceedance magnitude
  - mean/median of (y - U95)_+
- Budget burn velocity
  - bps consumed per elapsed participation/time
- State occupancy
  - Harvest/Balance/Salvage dwell time
- Opportunity-cost companion metric
  - underfill and alpha decay penalty (don’t “solve” slippage by never trading)
If coverage drops materially (e.g., below 90% for a sustained window), auto-tighten the controller and trigger a recalibration alarm.
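The coverage and exceedance metrics above could be computed per monitoring window roughly as follows. This is a sketch: `coverage_report` and its output keys are hypothetical, and the exceedance statistics are taken over breaching fills only, which is one of two reasonable readings of "mean/median of (y - U95)_+".

```python
import statistics


def coverage_report(realized_bps, bound_bps, target=0.95):
    """Summarize coverage and tail exceedance for one window (illustrative sketch)."""
    if not realized_bps or len(realized_bps) != len(bound_bps):
        raise ValueError("need equal-length, non-empty inputs")
    # (y - U95)_+ for every fill; zero when the bound held.
    excess = [max(0.0, y - u) for y, u in zip(realized_bps, bound_bps)]
    breaches = [e for e in excess if e > 0.0]
    coverage = 1.0 - len(breaches) / len(excess)
    return {
        "coverage": coverage,
        "coverage_gap": coverage - target,  # negative means under-covering
        "breach_count": len(breaches),
        "mean_exceedance_bps": statistics.fmean(breaches) if breaches else 0.0,
        "median_exceedance_bps": statistics.median(breaches) if breaches else 0.0,
    }


report = coverage_report(
    realized_bps=[1.0, 2.0, 3.0, 10.0],
    bound_bps=[5.0, 5.0, 5.0, 6.0],
)
```

Running the same function over 5m, 30m, and daily windows gives the three horizons listed above without separate logic.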
Practical parameter defaults (starting point)
- Conformal window: 2,000 fills
- Exponential decay half-life: 1 trading day
- Regimes: volatility terciles × TOD bucket (open/mid/close)
- Primary risk target: U95
- Secondary guardrail: U90 and U99 shadow metrics
- Hysteresis dwell: minimum 60–120s before state downgrade
These are starter values; production values must be tuned from live paper-trading logs.
Backtest-to-live validation ladder
- Historical replay
  - Compare raw model vs conformal-wrapped coverage stability.
- Paper trading (shadow)
  - Log predicted U95, realized slippage, and state transitions.
- Tiny capital canary
  - Hard caps on notional + automatic freeze on coverage breach.
- Progressive scale-up
  - Increase size only if coverage and budget-burn SLOs hold for N sessions.
Failure modes and fixes
A) Coverage looks good, PnL still weak
Cause: controller too conservative, opportunity cost too high.
Fix: jointly optimize slippage + completion/alpha metrics; add explicit tradeoff coefficient.
B) Coverage collapses at open/close
Cause: non-stationary microstructure around auctions.
Fix: dedicated auction regime buffers and separate conformal quantiles.
C) State oscillation (thrashing)
Cause: no hysteresis / noisy stress score.
Fix: smoothing + minimum dwell + asymmetric enter/exit thresholds.
D) “Calibrated” but lagging on regime break
Cause: buffer memory too long.
Fix: stronger decay, drift detector (CUSUM/Page-Hinkley), temporary defensive multiplier.
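For the drift detector mentioned in D, a Page-Hinkley test on the stream of breach indicators (1 when realized slippage exceeds U95, else 0) is one lightweight option. The sketch below detects upward shifts only; the `delta` and `threshold` values are placeholder assumptions that would need tuning on replay data.

```python
class PageHinkley:
    """Page-Hinkley detector for an upward shift in a stream's mean (sketch)."""

    def __init__(self, delta: float = 0.05, threshold: float = 5.0):
        self.delta = delta          # tolerated drift magnitude
        self.threshold = threshold  # alarm level for the test statistic
        self.n = 0
        self.mean = 0.0
        self.cum = 0.0
        self.cum_min = 0.0

    def update(self, x: float) -> bool:
        # Running mean, then cumulative deviation above mean + delta.
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.cum_min = min(self.cum_min, self.cum)
        # Alarm when the cumulative sum has risen far above its running minimum.
        return self.cum - self.cum_min > self.threshold


detector = PageHinkley()
calm_alarms = [detector.update(0.0) for _ in range(200)]    # no breaches
stress_alarms = [detector.update(1.0) for _ in range(10)]   # every fill breaches
```

An alarm could switch on the temporary defensive multiplier while the buffers catch up; the detector should be reset once recalibration completes.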
Integration notes for Vellab execution stack
- Treat conformal module as a thin post-model layer (no invasive model rewrite).
- Keep a dedicated table for calibration artifacts:
  - decision timestamp
  - features hash/regime
  - mu_hat, U90/U95/U99
  - realized slippage
  - chosen state and action
- Expose real-time dashboard panel:
  - live coverage gap
  - tail breach count
  - budget headroom
- Couple with existing kill-switch ladder (coverage breach can escalate risk state).
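As an illustration of the calibration-artifact table, one row could be modeled like this. The field names, the truncated SHA-256 feature hash, and the example values are all assumptions made for the sketch, not an existing Vellab schema.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class CalibrationRecord:
    """One row of the calibration artifact table (illustrative sketch)."""

    decision_ts: float
    features_hash: str
    regime: str
    mu_hat_bps: float
    u90_bps: float
    u95_bps: float
    u99_bps: float
    state: str
    action: str
    realized_bps: Optional[float] = None  # filled in once the child order completes


def hash_features(features: dict) -> str:
    # Key-order-independent digest so identical feature sets hash identically.
    payload = json.dumps(features, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:16]


record = CalibrationRecord(
    decision_ts=1767225600.0,
    features_hash=hash_features({"spread_bps": 1.2, "vol_1m": 0.004}),
    regime="high_vol/open",
    mu_hat_bps=2.1,
    u90_bps=4.0,
    u95_bps=5.3,
    u99_bps=9.8,
    state="BALANCE",
    action="mixed_passive_aggressive",
)
row = asdict(record)  # plain dict, ready for whatever storage layer is in use
```

Logging the bound at decision time, rather than recomputing it later, is what makes the coverage metrics auditable.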
Bottom line
A slippage model is only useful if it stays honest under drift.
Online conformal calibration gives a practical honesty layer: “How bad can this slice get with controlled error rate?”
Once that bound exists, execution policy becomes a disciplined control problem, not a vibes-based urgency argument.