Slippage Champion–Challenger Playbook (Online Regret Budget + Safe Promotion)


Date: 2026-02-27
Category: Research (Execution / Slippage Modeling)
Scope: Live model governance for single-name and basket execution


Why this playbook exists

Most slippage model stacks fail in one of two ways:

  1. Static champion lock-in: one model is blessed and kept too long while market microstructure drifts.
  2. Unsafe model hopping: frequent model swaps create instability and hidden tail risk.

This playbook introduces an operational middle path:

Core idea: model evolution should feel like controlled aviation maintenance, not mid-flight improvisation.


Problem framing

For each execution decision tick (t), define side-adjusted slippage target:

[ y_t = \text{side} \times \frac{p^{exec}_t - p^{bench}_t}{p^{bench}_t} \times 10^4 ]

We have a champion model (C) and a set of challengers (k); each model (m) emits a prediction (\hat{y}^{(m)}_t) before execution.

Define per-tick loss (example: asymmetric quantile-aware loss):

[ \ell_t(m) = w_{\text{tail}} \cdot \rho_{\tau}(y_t - \hat{y}^{(m)}_t) + w_{\text{mean}} \cdot (y_t - \hat{y}^{(m)}_t)^2 ]

where (\rho_{\tau}) is the pinball (quantile) loss evaluated at a high quantile (e.g., (\tau=0.9) or (0.95)).
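The target and loss above can be sketched directly; the weights and (\tau) below are illustrative placeholders, not calibrated defaults.

```python
def slippage_bps(side: int, p_exec: float, p_bench: float) -> float:
    """Side-adjusted slippage in basis points (positive = cost)."""
    return side * (p_exec - p_bench) / p_bench * 1e4

def pinball(residual: float, tau: float) -> float:
    """Quantile (pinball) loss rho_tau applied to a residual y - y_hat."""
    return tau * residual if residual >= 0 else (tau - 1.0) * residual

def tick_loss(y: float, y_hat: float, tau: float = 0.9,
              w_tail: float = 1.0, w_mean: float = 0.1) -> float:
    """Asymmetric per-tick loss: weighted pinball term plus squared error."""
    r = y - y_hat
    return w_tail * pinball(r, tau) + w_mean * r * r

# A buy (side = +1) filled 2 bps above the arrival benchmark:
y = slippage_bps(side=1, p_exec=100.02, p_bench=100.00)
```

Note that under-prediction of cost (positive residual) is penalized roughly (\tau / (1-\tau)) times harder than over-prediction, which is the point of the tail weight.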

Online regret

For challenger (k) over horizon (T):

[ R_T^{(k)} = \sum_{t=1}^{T} \big(\ell_t(C) - \ell_t(k)\big) ]

A positive regret alone is not enough: outperformance only counts if it survives the risk constraints (tail, fill, stability).
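Given per-tick loss series for the champion and a challenger, the regret sum above is a one-liner; the numbers below are made up for illustration.

```python
def cumulative_regret(champ_losses, chall_losses):
    """R_T = sum_t (loss_t(C) - loss_t(k)); positive => challenger beat champion."""
    return sum(lc - lk for lc, lk in zip(champ_losses, chall_losses))

# Three ticks of illustrative per-tick losses:
R = cumulative_regret([1.2, 0.8, 1.5], [1.0, 0.9, 1.1])
```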


Architecture: four operating modes

  1. Shadow: challenger predicts only; no execution impact.
  2. Advisory: challenger proposes actions; champion still executes.
  3. Bounded live: challenger controls a capped traffic slice (e.g., 5–10%).
  4. Champion: full promotion after governance checks.

Mandatory guardrail: no direct jump from Shadow to Champion.
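One way to enforce the guardrail in code is an explicit transition map; the mode names mirror the list above, while the exact allowed transitions are an assumption of this sketch.

```python
# Allowed mode transitions; there is deliberately no SHADOW -> CHAMPION edge.
ALLOWED = {
    "SHADOW":       {"ADVISORY"},
    "ADVISORY":     {"SHADOW", "BOUNDED_LIVE"},
    "BOUNDED_LIVE": {"ADVISORY", "CHAMPION"},
    "CHAMPION":     {"BOUNDED_LIVE"},   # demotion path for rollback
}

def promote(current: str, target: str) -> str:
    """Apply a mode transition, rejecting any path the map does not allow."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current} -> {target}")
    return target
```

Encoding the ladder as data (rather than scattered `if` checks) makes the guardrail auditable and trivially unit-testable.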


Data contract (must be strict)

Each decision tick must log:

  • the feature snapshot shared by both models,
  • champion and challenger predictions,
  • operating mode and traffic share,
  • benchmark definition and realized outcome.

If this contract is broken, model comparison is invalid. No promotion decisions allowed.
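The per-tick record implied by this contract and by the controller pseudocode can be pinned down as a frozen dataclass; fields beyond those in the pseudocode (`benchmark`, `realized_y`) are illustrative assumptions.

```python
from dataclasses import dataclass, asdict
from typing import Optional

@dataclass(frozen=True)
class TickRecord:
    tick_id: str
    features: dict               # same feature snapshot fed to BOTH models
    y_hat_champion: float
    y_hat_challenger: float
    state: str                   # SHADOW / ADVISORY / BOUNDED_LIVE / CHAMPION
    traffic_share: float
    benchmark: str               # benchmark definition must match across models
    realized_y: Optional[float] = None   # filled in after execution

rec = TickRecord("t-001", {"spread_bps": 1.2}, 3.1, 2.8, "SHADOW", 0.0, "arrival_mid")
```

Freezing the record makes it tamper-evident in downstream analysis; `asdict` gives a serializable form for the log sink.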


Model scorecard (promotion is multi-metric)

A challenger is eligible only if all gates pass.

Gate A — Predictive quality

Gate B — Execution outcomes

Gate C — Stability

Gate D — Operational reliability

A model that “wins average bps” but fails any tail/operability gate is rejected.
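The "all gates must pass" rule is deliberately conjunctive; a minimal sketch follows, where the gate predicates and thresholds are placeholders, not the playbook's calibrated values.

```python
def eligible(metrics: dict) -> bool:
    """Return True only if every promotion gate passes (thresholds illustrative)."""
    gates = {
        "A_predictive":  metrics["regret"] > 0.0,          # beats champion on loss
        "B_execution":   metrics["p95_bps_delta"] <= 0.0,  # tail outcomes no worse
        "C_stability":   metrics["action_churn"] <= 0.15,
        "D_operational": metrics["error_rate"] <= 0.001,
    }
    return all(gates.values())   # a single failed gate rejects the challenger
```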


Regret budget policy

Treat challenger experimentation as controlled spend:

[ \text{RegretBudget}_{day} = B_{bps} \times \text{TradedNotional}_{day} ]

For bounded-live traffic, track realized excess loss vs champion baseline estimate:

[ \Delta L_t = \ell_t(\text{live challenger}) - \ell_t(\text{counterfactual champion}) ]

Cumulative daily spend:

[ S_d = \sum_{t \in d} \Delta L_t ]

Policy:

  • If cumulative daily spend (S_d) exceeds the budget, freeze challenger traffic to 0% and drop the challenger back to Advisory mode.
  • A breach requires a new negative test before the next canary attempt.

This prevents “learning in production” from becoming silent strategy bleed.
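The budget arithmetic above is simple; in this sketch the budget is expressed in currency terms by treating (B_{bps}) as a fraction of notional, and both numbers below are made up for illustration.

```python
def regret_budget(b_bps: float, traded_notional: float) -> float:
    """Daily regret budget: b_bps (in basis points) of the day's traded notional."""
    return b_bps * 1e-4 * traded_notional

def within_budget(spend: float, budget: float) -> bool:
    """True while cumulative daily spend S_d stays inside the budget."""
    return spend <= budget

# 0.5 bps of $10mm traded notional:
budget = regret_budget(b_bps=0.5, traded_notional=10_000_000)
```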


Safe promotion protocol

Phase 1: Shadow qualification (min sample size)

Phase 2: Advisory consistency

Phase 3: Bounded-live canary

Phase 4: Promotion vote

Promotion packet must include:

If any mandatory evidence missing: no promotion.


Rollback protocol (must be instant)

Automatic rollback triggers (example):

  • daily regret budget exceeded,
  • tail guardrail breached (e.g., p95/CVaR degradation),
  • reliability breached (errors, latency, missing logs).

Rollback path:

  1. Freeze challenger traffic to 0%
  2. Restore champion policy template
  3. Open incident record with time-bounded RCA
  4. Add new negative test before next canary attempt

No blame loop; only learning loop.


Reference controller pseudocode

state = "SHADOW"
traffic_share = 0.0
spent_budget = 0.0

for tick in decision_stream:
    x = build_feature_snapshot(tick)
    y_hat_c = champion.predict(x)
    y_hat_k = challenger.predict(x)

    # Always log both predictions with same feature snapshot
    log_predictions(tick, x, y_hat_c, y_hat_k, state, traffic_share)

    if state in {"SHADOW", "ADVISORY"}:
        execute_with_champion(tick, y_hat_c)
        continue

    # BOUNDED_LIVE or CHAMPION mode
    use_challenger = random_uniform() < traffic_share
    action = policy(challenger, x) if use_challenger else policy(champion, x)
    result = execute(action)

    # online counterfactual loss estimate
    loss_live = realized_loss(result)
    loss_counterfactual = estimate_counterfactual_loss(champion, tick)
    delta = loss_live - loss_counterfactual if use_challenger else 0.0
    spent_budget += delta

    if spent_budget > regret_budget_day:
        rollback_to_champion(reason="regret budget exceeded")
        state = "ADVISORY"
        traffic_share = 0.0

    if tail_guardrail_breached() or reliability_breached():
        rollback_to_champion(reason="risk/reliability guardrail")
        state = "ADVISORY"
        traffic_share = 0.0

Common failure modes

  1. Benchmark drift between models

    • Comparing models against different benchmark definitions makes “wins” meaningless.
  2. Selection bias in canary flow

    • Routing only easy orders to the challenger inflates its apparent quality.
  3. Ignoring action stability

    • Small predictive gains with high action churn can worsen realized fills.
  4. No rollback drill

    • The policy says “instant rollback” but operationally it takes 30+ minutes.
  5. Overfitting promotion windows

    • A model promoted on one good week collapses in the next regime.
  6. Tail blindness

    • Mean bps improves while p95/CVaR gets worse.
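A crude guard against failure mode 2 is to compare the distribution of orders routed to each arm; this sketch checks only mean order size, and the 10% tolerance is an arbitrary illustrative threshold.

```python
from statistics import mean

def flow_balanced(champ_sizes, chall_sizes, tol: float = 0.10) -> bool:
    """Flag canary routing whose mean order size drifts more than tol from champion flow."""
    m_c, m_k = mean(champ_sizes), mean(chall_sizes)
    return abs(m_k - m_c) <= tol * m_c
```

In practice one would compare fuller covariates (spread, volatility, ADV participation), but even this cheap check catches the grossest routing skews.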

Minimal implementation checklist


Practical defaults (starter values)

Tune by strategy turnover and liquidity regime, not by vanity speed.



One-line takeaway

A better slippage model is not the one with the prettiest backtest; it is the one that survives promotion gates, respects regret budgets, and can be rolled back safely in minutes.