Slippage Champion–Challenger Playbook (Online Regret Budget + Safe Promotion)
Date: 2026-02-27
Category: Research (Execution / Slippage Modeling)
Scope: Live model governance for single-name and basket execution
Why this playbook exists
Most slippage model stacks fail in one of two ways:
- Static champion lock-in: one model is blessed and kept too long while market microstructure drifts.
- Unsafe model hopping: frequent model swaps create instability and hidden tail risk.
This playbook introduces an operational middle path:
- keep a stable champion model for production actions,
- evaluate challengers continuously in shadow/advisory modes,
- promote only when performance and risk gates pass,
- use an explicit regret budget so experimentation never silently taxes PnL.
Core idea: model evolution should feel like controlled aviation maintenance, not mid-flight improvisation.
Problem framing
For each execution decision tick (t), define the side-adjusted slippage target in basis points:
[ y_t = \text{side} \times \frac{p^{exec}_t - p^{bench}_t}{p^{bench}_t} \times 10^4 ]
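In code, the target reduces to one line. The sign convention here (side = +1 for buys, -1 for sells, so positive values mean adverse slippage) is an assumption and should be confirmed against the desk's convention:

```python
def side_adjusted_slippage_bps(side: int, p_exec: float, p_bench: float) -> float:
    """y_t in bps; assumes side=+1 for buys, -1 for sells, so positive = adverse."""
    return side * (p_exec - p_bench) / p_bench * 1e4

# A buy filled 5 bps above its arrival benchmark yields ~ +5.0
```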
We have:
- champion predictor (\hat{y}^{(C)}_t)
- challenger set (\{\hat{y}^{(k)}_t\}_{k=1..K})
Define per-tick loss (example: asymmetric quantile-aware loss):
[ \ell_t(m) = w_{\text{tail}} \cdot \rho_{\tau}\big(y_t - \hat{y}^{(m)}_t\big) + w_{\text{mean}} \cdot \big(y_t - \hat{y}^{(m)}_t\big)^2 ]
where (\rho_{\tau}) is pinball loss at a high quantile (e.g., (\tau=0.9) or (0.95)).
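A minimal sketch of this loss; `tau` and the weights `w_tail`/`w_mean` are illustrative defaults, not recommendations:

```python
def pinball(tau: float, err: float) -> float:
    """Pinball (quantile) loss rho_tau applied to the residual err = y - y_hat."""
    return tau * err if err >= 0 else (tau - 1.0) * err

def tick_loss(y: float, y_hat: float, tau: float = 0.95,
              w_tail: float = 1.0, w_mean: float = 1.0) -> float:
    """ell_t(m) = w_tail * rho_tau(y - y_hat) + w_mean * (y - y_hat)^2."""
    err = y - y_hat
    return w_tail * pinball(tau, err) + w_mean * err ** 2
</imports_placeholder_removed>```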
Online regret
For challenger (k) over horizon (T):
[ R_T^{(k)} = \sum_{t=1}^{T} \big(\ell_t(C) - \ell_t(k)\big) ]
- (R_T^{(k)} > 0): the challenger is beating the champion.
- (R_T^{(k)} < 0): the challenger is underperforming.
But we only care if outperformance survives risk constraints (tail, fill, stability).
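A sketch of the regret computation with a normal-approximation confidence interval on mean per-tick regret. The 1.96 multiplier assumes an i.i.d. approximation that real tick data will violate; treat this as a rough screen, not a formal test:

```python
import math

def regret_summary(champ_losses, chall_losses):
    """Cumulative regret R_T and a ~95% CI on mean per-tick regret.

    Positive deltas favor the challenger (champion loss minus challenger loss).
    """
    deltas = [lc - lk for lc, lk in zip(champ_losses, chall_losses)]
    n = len(deltas)
    r_total = sum(deltas)
    mean = r_total / n
    var = sum((d - mean) ** 2 for d in deltas) / max(n - 1, 1)
    half = 1.96 * math.sqrt(var / n)
    return r_total, (mean - half, mean + half)
```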
Architecture: four operating modes
- Shadow: challenger predicts only; no execution impact.
- Advisory: challenger proposes actions; champion still executes.
- Bounded live: challenger controls a capped traffic slice (e.g., 5–10%).
- Champion: full promotion after governance checks.
Mandatory guardrail: no direct jump from Shadow to Champion.
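The no-skipping guardrail is easy to enforce mechanically. A sketch, with mode names from the list above; the transition table is an assumption about the intended ladder (rollback to Advisory is handled separately):

```python
from enum import Enum

class Mode(Enum):
    SHADOW = 0
    ADVISORY = 1
    BOUNDED_LIVE = 2
    CHAMPION = 3

# Allowed forward transitions: one step at a time, never Shadow -> Champion.
ALLOWED = {
    Mode.SHADOW: {Mode.ADVISORY},
    Mode.ADVISORY: {Mode.BOUNDED_LIVE},
    Mode.BOUNDED_LIVE: {Mode.CHAMPION},
    Mode.CHAMPION: set(),
}

def promote(current: Mode, target: Mode) -> Mode:
    """Reject any promotion that skips a step of the ladder."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal promotion {current.name} -> {target.name}")
    return target
```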
Data contract (must be strict)
Each decision tick must log:
- order intent id, parent/child ids
- timestamped features used by champion/challenger (same snapshot)
- champion prediction + uncertainty
- challenger predictions + uncertainty
- chosen action and realized fill outcomes
- benchmark definition metadata (arrival, decision, short-horizon markout)
- venue/session state (auction, VI, reopen, halt-adjacent)
If this contract is broken, model comparison is invalid. No promotion decisions allowed.
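One way to make the contract concrete is a single frozen record type per decision tick. Field names here are illustrative, not a schema mandate:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class DecisionTickRecord:
    """One row of the decision-tick log; all models see the same snapshot."""
    intent_id: str
    parent_id: str
    child_id: str
    feature_snapshot_ts: int          # shared timestamp for champion + challengers
    features: dict
    champion_pred_bps: float
    champion_uncertainty: float
    challenger_preds_bps: dict        # model_id -> prediction
    challenger_uncertainties: dict
    chosen_action: str
    realized_fill_bps: Optional[float]
    benchmark_meta: dict              # arrival / decision / markout definition
    venue_state: str                  # e.g. "continuous", "auction", "VI"
```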
Model scorecard (promotion is multi-metric)
A challenger is eligible only if all gates pass.
Gate A — Predictive quality
- Lower weighted loss vs champion on rolling windows (e.g., 5d/20d)
- Better calibration for q90/q95 exceedance
- No degradation in sparse-liquidity buckets
Gate B — Execution outcomes
- p95 slippage non-inferior or improved
- CVaR improvement at desk-approved confidence level
- Fill-rate / completion SLA not worse beyond tolerance
Gate C — Stability
- lower or equal action-flip rate under similar states
- no bursty mode oscillation (hysteresis respected)
- bounded sensitivity to noisy feature perturbations
Gate D — Operational reliability
- inference latency budget respected
- freshness/staleness handling tested
- fallback behavior deterministic under degraded data
A model that “wins average bps” but fails any tail/operability gate is rejected.
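The veto semantics can be encoded directly: eligibility is a conjunction over gates, never an average. The empirical CVaR helper below is a standard tail-mean estimator, included as an assumed definition of the Gate B metric:

```python
def eligible_for_promotion(gates: dict) -> bool:
    """All-gates-must-pass rule: one failed gate vetoes promotion."""
    return all(gates[g] for g in ("A_predictive", "B_execution",
                                  "C_stability", "D_operational"))

def empirical_cvar(losses, alpha=0.95):
    """Mean of the worst (1 - alpha) tail of realized slippage losses."""
    s = sorted(losses)
    k = max(1, round(len(s) * (1 - alpha)))
    return sum(s[-k:]) / k
```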
Regret budget policy
Treat challenger experimentation as controlled spend:
[ \text{RegretBudget}_{day} = B_{bps} \times \text{TradedNotional}_{day} ]
For bounded-live traffic, track realized excess loss vs champion baseline estimate:
[ \Delta L_t = \ell_t(\text{live challenger}) - \ell_t(\text{counterfactual champion}) ]
Cumulative daily spend:
[ S_d = \sum_{t \in d} \Delta L_t ]
Policy:
- if (S_d) crosses 50% of budget: auto-throttle challenger share.
- if (S_d) crosses 100%: immediate rollback to champion.
- if (S_d < 0) (challenger gains): allow gradual ramp, still gate by tail metrics.
This prevents “learning in production” from becoming silent strategy bleed.
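The three thresholds map to a small decision function. The 50%/100% levels come from the policy above; the return labels are illustrative:

```python
def budget_action(spent: float, budget: float) -> str:
    """Map cumulative regret spend S_d against the daily budget to an action."""
    if spent >= budget:
        return "rollback"        # 100% of budget: immediate rollback to champion
    if spent >= 0.5 * budget:
        return "throttle"        # 50% of budget: auto-throttle challenger share
    if spent < 0.0:
        return "ramp_eligible"   # challenger gains; ramp still gated by tail metrics
    return "continue"
```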
Safe promotion protocol
Phase 1: Shadow qualification (min sample size)
- Require (N) intents across liquidity and session strata
- Positive regret with confidence bounds
- Tail calibration within tolerance
Phase 2: Advisory consistency
- Compare champion-vs-challenger action differences
- Validate that challenger suggestions are not driven by stale/noisy features
- Operator review for major divergence clusters
Phase 3: Bounded-live canary
- Start 5% traffic cap, increase by fixed ladder (5→10→20%)
- Run regret-budget and tail-risk breakers continuously
- Require minimum dwell time per ladder step
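The ladder and dwell rules sketch out as follows; `MIN_DWELL_DAYS` and the breaker behavior are assumptions consistent with the phase description:

```python
LADDER = [0.05, 0.10, 0.20]   # canary traffic-share steps from the playbook
MIN_DWELL_DAYS = 1            # at least one full trading day per step

def next_share(current: float, days_at_step: int, breakers_clear: bool) -> float:
    """Advance at most one ladder step per dwell period; any breaker zeroes flow."""
    if not breakers_clear:
        return 0.0
    if days_at_step < MIN_DWELL_DAYS:
        return current
    i = LADDER.index(current)
    return LADDER[min(i + 1, len(LADDER) - 1)]
```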
Phase 4: Promotion vote
Promotion packet must include:
- scorecard summary (A/B/C/D gates)
- failure analyses from worst sessions
- rollback test evidence
- sign-off from quant + execution ops
If any mandatory evidence is missing: no promotion.
Rollback protocol (must be instant)
Automatic rollback triggers (example):
- 2 consecutive windows with p95 exceedance > threshold
- CVaR degradation beyond tolerance
- action oscillation > max flip rate
- model service latency/freshness breach
Rollback path:
- Freeze challenger traffic to 0%
- Restore champion policy template
- Open incident record with time-bounded RCA
- Add new negative test before next canary attempt
No blame loop; only learning loop.
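The trigger logic above can be sketched as an OR over conditions; every threshold default below is a placeholder for a desk-tuned value, not a recommendation:

```python
def should_auto_rollback(p95_exceed_windows: int,
                         cvar_degradation_bps: float,
                         flip_rate: float,
                         latency_breach: bool,
                         *,  # illustrative thresholds, desk-tunable
                         max_exceed_windows: int = 2,
                         max_cvar_degradation_bps: float = 1.0,
                         max_flip_rate: float = 0.15) -> bool:
    """Any single trigger fires the rollback; triggers are OR-ed, not averaged."""
    return (p95_exceed_windows >= max_exceed_windows
            or cvar_degradation_bps > max_cvar_degradation_bps
            or flip_rate > max_flip_rate
            or latency_breach)
```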
Reference controller pseudocode
state = "SHADOW"
traffic_share = 0.0
spent_budget = 0.0

for tick in decision_stream:
    x = build_feature_snapshot(tick)
    y_hat_c = champion.predict(x)
    y_hat_k = challenger.predict(x)

    # Always log both predictions against the same feature snapshot
    log_predictions(tick, x, y_hat_c, y_hat_k, state, traffic_share)

    if state in {"SHADOW", "ADVISORY"}:
        execute_with_champion(tick, y_hat_c)
        continue

    # BOUNDED_LIVE or CHAMPION mode
    use_challenger = random_uniform() < traffic_share
    action = policy(challenger, x) if use_challenger else policy(champion, x)
    result = execute(action)

    # Online counterfactual loss estimate (charged only on challenger ticks)
    loss_live = realized_loss(result)
    loss_counterfactual = estimate_counterfactual_loss(champion, tick)
    delta = loss_live - loss_counterfactual if use_challenger else 0.0
    spent_budget += delta

    if spent_budget > regret_budget_day:
        rollback_to_champion(reason="regret budget exceeded")
        state = "ADVISORY"
        traffic_share = 0.0

    if tail_guardrail_breached() or reliability_breached():
        rollback_to_champion(reason="risk/reliability guardrail")
        state = "ADVISORY"
        traffic_share = 0.0
Common failure modes
Benchmark drift between models
- comparing different benchmark definitions makes “wins” fake.
Selection bias in canary flow
- routing only easy orders to challenger inflates apparent quality.
Ignoring action stability
- slight predictive gains but high action churn can worsen real fills.
No rollback drill
- policy says “instant rollback” but operationally takes 30+ minutes.
Overfitting promotion windows
- model promoted on one good week, collapses next regime.
Tail blindness
- mean bps improves while p95/CVaR gets worse.
Minimal implementation checklist
- Unified decision-tick data contract for champion and challengers
- Rolling regret computation with confidence intervals
- Regret-budget accounting and automatic throttle/rollback
- Tail metrics dashboard (q90/q95/CVaR + fill SLA)
- Canary traffic allocator with deterministic audit logs
- Promotion packet template + explicit sign-off flow
- Rollback runbook tested in simulation and live drill
- Post-promotion 2-week heightened monitoring window
Practical defaults (starter values)
- Shadow minimum: 2 weeks + representative session coverage
- Initial canary: 5% of eligible flow
- Ramp cadence: no faster than one step per full trading day
- Regret budget: desk-defined (start conservative)
- Promotion requires: all gates pass on both short (5d) and medium (20d) windows
Tune by strategy turnover and liquidity regime, not by vanity speed.
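The defaults above collect naturally into one reviewable config object; `None` marks values the playbook leaves desk-defined:

```python
# Starter defaults from the playbook, expressed as a single reviewable config.
# Values marked desk-defined are placeholders, not recommendations.
STARTER_CONFIG = {
    "shadow_min_days": 10,            # ~2 trading weeks of shadow qualification
    "initial_canary_share": 0.05,
    "ramp_ladder": [0.05, 0.10, 0.20],
    "max_ramp_steps_per_day": 1,
    "regret_budget_bps": None,        # desk-defined; start conservative
    "promotion_windows_days": [5, 20],
    "post_promotion_watch_days": 10,  # 2-week heightened monitoring
}
```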
References (seed reading)
- Bubeck, S., Cesa-Bianchi, N. Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems (2012). https://arxiv.org/abs/1204.5721
- Sutton, R. S., Barto, A. G. Reinforcement Learning: An Introduction (2nd ed.). http://incompleteideas.net/book/the-book-2nd.html
- Almgren, R., Chriss, N. Optimal Execution of Portfolio Transactions (2000). https://www.smallake.kr/wp-content/uploads/2016/03/optliq.pdf
- Cartea, Á., Jaimungal, S., Penalva, J. Algorithmic and High-Frequency Trading (2015).
One-line takeaway
A better slippage model is not the one with the prettiest backtest; it is the one that survives promotion gates, respects regret budgets, and can be rolled back safely in minutes.