Slippage Champion–Challenger Playbook (Online Regret Budget + Safe Promotion)
Date: 2026-02-27
Category: Research (Execution / Slippage Modeling)
Scope: Live model governance for single-name and basket execution
Why this playbook exists
Most slippage model stacks fail in one of two ways:
- Static champion lock-in: one model is blessed and kept too long while market microstructure drifts.
- Unsafe model hopping: frequent model swaps create instability and hidden tail risk.
This playbook introduces an operational middle path:
- keep a stable champion model for production actions,
- evaluate challengers continuously in shadow/advisory modes,
- promote only when performance and risk gates pass,
- use an explicit regret budget so experimentation never silently taxes PnL.
Core idea: model evolution should feel like controlled aviation maintenance, not mid-flight improvisation.
Problem framing
For each execution decision tick (t), define the side-adjusted slippage target in basis points:
[ y_t = \text{side} \times \frac{p^{exec}_t - p^{bench}_t}{p^{bench}_t} \times 10^4 ]
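In code, the target reduces to one line. The sign convention here (side = +1 for buys, -1 for sells, so positive values mean adverse slippage) is an assumption and should be confirmed against the desk's convention:

```python
def side_adjusted_slippage_bps(side: int, p_exec: float, p_bench: float) -> float:
    """y_t in bps; assumes side=+1 for buys, -1 for sells, so positive = adverse."""
    return side * (p_exec - p_bench) / p_bench * 1e4

# A buy filled 5 bps above its arrival benchmark yields ~ +5.0
```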
We have:
- champion predictor (\hat{y}^{(C)}_t)
- challenger set (\{\hat{y}^{(k)}_t\}_{k=1..K})
Define per-tick loss (example: asymmetric quantile-aware loss):
[ \ell_t(m) = w_{\text{tail}} \cdot \rho_{\tau}\big(y_t - \hat{y}^{(m)}_t\big) + w_{\text{mean}} \cdot \big(y_t - \hat{y}^{(m)}_t\big)^2 ]
where (\rho_{\tau}) is pinball loss at a high quantile (e.g., (\tau=0.9) or (0.95)).
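A minimal sketch of this loss; `tau` and the weights `w_tail`/`w_mean` are illustrative defaults, not recommendations:

```python
def pinball(tau: float, err: float) -> float:
    """Pinball (quantile) loss rho_tau applied to the residual err = y - y_hat."""
    return tau * err if err >= 0 else (tau - 1.0) * err

def tick_loss(y: float, y_hat: float, tau: float = 0.95,
              w_tail: float = 1.0, w_mean: float = 1.0) -> float:
    """ell_t(m) = w_tail * rho_tau(y - y_hat) + w_mean * (y - y_hat)^2."""
    err = y - y_hat
    return w_tail * pinball(tau, err) + w_mean * err ** 2
</imports_placeholder_removed>```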
Online regret
For challenger (k) over horizon (T):
[ R_T^{(k)} = \sum_{t=1}^{T} \big(\ell_t(C) - \ell_t(k)\big) ]
- (R_T^{(k)} > 0): the challenger is beating the champion.
- (R_T^{(k)} < 0): the challenger is underperforming.
But we only care if outperformance survives risk constraints (tail, fill, stability).
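A sketch of the regret computation with a normal-approximation confidence interval on mean per-tick regret. The 1.96 multiplier assumes an i.i.d. approximation that real tick data will violate; treat this as a rough screen, not a formal test:

```python
import math

def regret_summary(champ_losses, chall_losses):
    """Cumulative regret R_T and a ~95% CI on mean per-tick regret.

    Positive deltas favor the challenger (champion loss minus challenger loss).
    """
    deltas = [lc - lk for lc, lk in zip(champ_losses, chall_losses)]
    n = len(deltas)
    r_total = sum(deltas)
    mean = r_total / n
    var = sum((d - mean) ** 2 for d in deltas) / max(n - 1, 1)
    half = 1.96 * math.sqrt(var / n)
    return r_total, (mean - half, mean + half)
```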
Architecture: four operating modes
- Shadow: challenger predicts only; no execution impact.
- Advisory: challenger proposes actions; champion still executes.
- Bounded live: challenger controls a capped traffic slice (e.g., 5–10%).
- Champion: full promotion after governance checks.
Mandatory guardrail: no direct jump from Shadow to Champion.
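The no-skipping guardrail is easy to enforce mechanically. A sketch, with mode names from the list above; the transition table is an assumption about the intended ladder (rollback to Advisory is handled separately):

```python
from enum import Enum

class Mode(Enum):
    SHADOW = 0
    ADVISORY = 1
    BOUNDED_LIVE = 2
    CHAMPION = 3

# Allowed forward transitions: one step at a time, never Shadow -> Champion.
ALLOWED = {
    Mode.SHADOW: {Mode.ADVISORY},
    Mode.ADVISORY: {Mode.BOUNDED_LIVE},
    Mode.BOUNDED_LIVE: {Mode.CHAMPION},
    Mode.CHAMPION: set(),
}

def promote(current: Mode, target: Mode) -> Mode:
    """Reject any promotion that skips a step of the ladder."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal promotion {current.name} -> {target.name}")
    return target
```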
Data contract (must be strict)
Each decision tick must log:
- order intent id, parent/child ids
- timestamped features used by champion/challenger (same snapshot)
- champion prediction + uncertainty
- challenger predictions + uncertainty
- chosen action and realized fill outcomes
- benchmark definition metadata (arrival, decision, short-horizon markout)
- venue/session state (auction, VI, reopen, halt-adjacent)
If this contract is broken, model comparison is invalid. No promotion decisions allowed.
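One way to make the contract concrete is a single frozen record type per decision tick. Field names here are illustrative, not a schema mandate:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class DecisionTickRecord:
    """One row of the decision-tick log; all models see the same snapshot."""
    intent_id: str
    parent_id: str
    child_id: str
    feature_snapshot_ts: int          # shared timestamp for champion + challengers
    features: dict
    champion_pred_bps: float
    champion_uncertainty: float
    challenger_preds_bps: dict        # model_id -> prediction
    challenger_uncertainties: dict
    chosen_action: str
    realized_fill_bps: Optional[float]
    benchmark_meta: dict              # arrival / decision / markout definition
    venue_state: str                  # e.g. "continuous", "auction", "VI"
```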
Model scorecard (promotion is multi-metric)
A challenger is eligible only if all gates pass.
Gate A — Predictive quality
- Lower weighted loss vs champion on rolling windows (e.g., 5d/20d)
- Better calibration for q90/q95 exceedance
- No degradation in sparse-liquidity buckets
Gate B — Execution outcomes
- p95 slippage non-inferior or improved
- CVaR improvement at desk-approved confidence level
- Fill-rate / completion SLA not worse beyond tolerance
Gate C — Stability
- lower or equal action-flip rate under similar states
- no bursty mode oscillation (hysteresis respected)
- bounded sensitivity to noisy feature perturbations
Gate D — Operational reliability
- inference latency budget respected
- freshness/staleness handling tested
- fallback behavior deterministic under degraded data
A model that “wins average bps” but fails any tail/operability gate is rejected.
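The veto semantics can be encoded directly: eligibility is a conjunction over gates, never an average. The empirical CVaR helper below is a standard tail-mean estimator, included as an assumed definition of the Gate B metric:

```python
def eligible_for_promotion(gates: dict) -> bool:
    """All-gates-must-pass rule: one failed gate vetoes promotion."""
    return all(gates[g] for g in ("A_predictive", "B_execution",
                                  "C_stability", "D_operational"))

def empirical_cvar(losses, alpha=0.95):
    """Mean of the worst (1 - alpha) tail of realized slippage losses."""
    s = sorted(losses)
    k = max(1, round(len(s) * (1 - alpha)))
    return sum(s[-k:]) / k
```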
Regret budget policy
Treat challenger experimentation as controlled spend:
[ \text{RegretBudget}_{day} = B_{bps} \times \text{TradedNotional}_{day} ]
For bounded-live traffic, track realized excess loss vs champion baseline estimate:
[ \Delta L_t = \ell_t(\text{live challenger}) - \ell_t(\text{counterfactual champion}) ]
Cumulative daily spend:
[ S_d = \sum_{t \in d} \Delta L_t ]
Policy:
- if (S_d) crosses 50% of budget: auto-throttle challenger share.
- if (S_d) crosses 100%: immediate rollback to champion.
- if (S_d < 0) (challenger gains): allow gradual ramp, still gate by tail metrics.
This prevents “learning in production” from becoming silent strategy bleed.
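The three thresholds map to a small decision function. The 50%/100% levels come from the policy above; the return labels are illustrative:

```python
def budget_action(spent: float, budget: float) -> str:
    """Map cumulative regret spend S_d against the daily budget to an action."""
    if spent >= budget:
        return "rollback"        # 100% of budget: immediate rollback to champion
    if spent >= 0.5 * budget:
        return "throttle"        # 50% of budget: auto-throttle challenger share
    if spent < 0.0:
        return "ramp_eligible"   # challenger gains; ramp still gated by tail metrics
    return "continue"
```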
Safe promotion protocol
Phase 1: Shadow qualification (min sample size)
- Require (N) intents across liquidity and session strata
- Positive regret with confidence bounds
- Tail calibration within tolerance
Phase 2: Advisory consistency
- Compare champion-vs-challenger action differences
- Validate that challenger suggestions are not driven by stale/noisy features
- Operator review for major divergence clusters
Phase 3: Bounded-live canary
- Start 5% traffic cap, increase by fixed ladder (5→10→20%)
- Run regret-budget and tail-risk breakers continuously
- Require minimum dwell time per ladder step
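The ladder and dwell rules sketch out as follows; `MIN_DWELL_DAYS` and the breaker behavior are assumptions consistent with the phase description:

```python
LADDER = [0.05, 0.10, 0.20]   # canary traffic-share steps from the playbook
MIN_DWELL_DAYS = 1            # at least one full trading day per step

def next_share(current: float, days_at_step: int, breakers_clear: bool) -> float:
    """Advance at most one ladder step per dwell period; any breaker zeroes flow."""
    if not breakers_clear:
        return 0.0
    if days_at_step < MIN_DWELL_DAYS:
        return current
    i = LADDER.index(current)
    return LADDER[min(i + 1, len(LADDER) - 1)]
```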
Phase 4: Promotion vote
Promotion packet must include:
- scorecard summary (A/B/C/D gates)
- failure analyses from worst sessions
- rollback test evidence
- sign-off from quant + execution ops
If any mandatory evidence is missing: no promotion.
Rollback protocol (must be instant)
Automatic rollback triggers (example):
- 2 consecutive windows with p95 exceedance > threshold
- CVaR degradation beyond tolerance
- action oscillation > max flip rate
- model service latency/freshness breach
Rollback path:
- Freeze challenger traffic to 0%
- Restore champion policy template
- Open incident record with time-bounded RCA
- Add new negative test before next canary attempt
No blame loop; only learning loop.
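The trigger logic above can be sketched as an OR over conditions; every threshold default below is a placeholder for a desk-tuned value, not a recommendation:

```python
def should_auto_rollback(p95_exceed_windows: int,
                         cvar_degradation_bps: float,
                         flip_rate: float,
                         latency_breach: bool,
                         *,  # illustrative thresholds, desk-tunable
                         max_exceed_windows: int = 2,
                         max_cvar_degradation_bps: float = 1.0,
                         max_flip_rate: float = 0.15) -> bool:
    """Any single trigger fires the rollback; triggers are OR-ed, not averaged."""
    return (p95_exceed_windows >= max_exceed_windows
            or cvar_degradation_bps > max_cvar_degradation_bps
            or flip_rate > max_flip_rate
            or latency_breach)
```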
Reference controller pseudocode
state = "SHADOW"
traffic_share = 0.0
spent_budget = 0.0

for tick in decision_stream:
    x = build_feature_snapshot(tick)
    y_hat_c = champion.predict(x)
    y_hat_k = challenger.predict(x)

    # Always log both predictions against the same feature snapshot
    log_predictions(tick, x, y_hat_c, y_hat_k, state, traffic_share)

    if state in {"SHADOW", "ADVISORY"}:
        execute_with_champion(tick, y_hat_c)
        continue

    # BOUNDED_LIVE or CHAMPION mode
    use_challenger = random_uniform() < traffic_share
    action = policy(challenger, x) if use_challenger else policy(champion, x)
    result = execute(action)

    # Online counterfactual loss estimate (charged only on challenger ticks)
    loss_live = realized_loss(result)
    loss_counterfactual = estimate_counterfactual_loss(champion, tick)
    delta = loss_live - loss_counterfactual if use_challenger else 0.0
    spent_budget += delta

    if spent_budget > regret_budget_day:
        rollback_to_champion(reason="regret budget exceeded")
        state = "ADVISORY"
        traffic_share = 0.0

    if tail_guardrail_breached() or reliability_breached():
        rollback_to_champion(reason="risk/reliability guardrail")
        state = "ADVISORY"
        traffic_share = 0.0
Common failure modes
Benchmark drift between models
- comparing different benchmark definitions makes “wins” fake.
Selection bias in canary flow
- routing only easy orders to challenger inflates apparent quality.
Ignoring action stability
- slight predictive gains but high action churn can worsen real fills.
No rollback drill
- policy says “instant rollback” but operationally takes 30+ minutes.
Overfitting promotion windows
- model promoted on one good week, collapses next regime.
Tail blindness
- mean bps improves while p95/CVaR gets worse.
Minimal implementation checklist
- Unified decision-tick data contract for champion and challengers
- Rolling regret computation with confidence intervals
- Regret-budget accounting and automatic throttle/rollback
- Tail metrics dashboard (q90/q95/CVaR + fill SLA)
- Canary traffic allocator with deterministic audit logs
- Promotion packet template + explicit sign-off flow
- Rollback runbook tested in simulation and live drill
- Post-promotion 2-week heightened monitoring window
Practical defaults (starter values)
- Shadow minimum: 2 weeks + representative session coverage
- Initial canary: 5% of eligible flow
- Ramp cadence: no faster than one step per full trading day
- Regret budget: desk-defined (start conservative)
- Promotion requires: all gates pass on both short (5d) and medium (20d) windows
Tune by strategy turnover and liquidity regime, not by vanity speed.
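The defaults above collect naturally into one reviewable config object; `None` marks values the playbook leaves desk-defined:

```python
# Starter defaults from the playbook, expressed as a single reviewable config.
# Values marked desk-defined are placeholders, not recommendations.
STARTER_CONFIG = {
    "shadow_min_days": 10,            # ~2 trading weeks of shadow qualification
    "initial_canary_share": 0.05,
    "ramp_ladder": [0.05, 0.10, 0.20],
    "max_ramp_steps_per_day": 1,
    "regret_budget_bps": None,        # desk-defined; start conservative
    "promotion_windows_days": [5, 20],
    "post_promotion_watch_days": 10,  # 2-week heightened monitoring
}
```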
References (seed reading)
- Bubeck, S., Cesa-Bianchi, N. Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems (2012). https://arxiv.org/abs/1204.5721
- Sutton, R. S., Barto, A. G. Reinforcement Learning: An Introduction (2nd ed.). http://incompleteideas.net/book/the-book-2nd.html
- Almgren, R., Chriss, N. Optimal Execution of Portfolio Transactions (2000). https://www.smallake.kr/wp-content/uploads/2016/03/optliq.pdf
- Cartea, Á., Jaimungal, S., Penalva, J. Algorithmic and High-Frequency Trading (2015).
One-line takeaway
A better slippage model is not the one with the prettiest backtest; it is the one that survives promotion gates, respects regret budgets, and can be rolled back safely in minutes.