Dynamic Model Averaging for Slippage Under Regime Drift
Date: 2026-02-28
Category: research (execution / slippage modeling)
Why this exists
Most slippage models fail for the same reason: they are right on average, wrong at the worst time.
You calibrate one model on recent data, then market microstructure shifts (volatility, queue resiliency, spread regime, cancellation behavior, auction pressure). The model keeps predicting yesterday's world and your tail cost explodes.
This playbook uses Dynamic Model Averaging (DMA) to maintain a live ensemble of slippage experts and adapt weights online as regimes change.
1) Core idea
Instead of picking a single winner model, keep a small portfolio of experts:
- M1: linear micro-cost baseline (spread + fee + simple participation impact)
- M2: square-root impact model (large metaorder behavior)
- M3: transient-impact / propagator-style model (impact memory + decay)
- M4: nonlinear ML quantile model (GBDT / monotonic tree with microstructure features)
At each decision step, update model weights by recent predictive performance:
[ \tilde{w}_{k,t} \propto w_{k,t-1}^{\lambda} \cdot p(y_t \mid M_k, x_t) ]
[ w_{k,t} = \frac{\tilde{w}_{k,t}}{\sum_j \tilde{w}_{j,t}} ]
- (y_t): realized child-order slippage (bps)
- (x_t): decision-time features
- (\lambda \in (0,1]): forgetting factor (smaller = faster adaptation)
Then combine predictions:
[ \hat{Q}_{\alpha,t}^{\text{ens}} = \sum_k w_{k,t}\, \hat{Q}_{\alpha,t}^{(k)} ]
where (\hat{Q}_{\alpha,t}^{(k)}) is expert (k)'s quantile forecast (e.g. q50, q90, q95).
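The weight update and quantile combination above can be sketched in a few lines. This is illustrative only; `dma_update` works in log space to avoid underflow, and the function names are my own:

```python
import math

def dma_update(weights, logliks, lam=0.97):
    """One DMA step: discount prior weights by forgetting factor `lam`,
    multiply by each expert's predictive likelihood, renormalize.
    Done in log space so tiny likelihoods do not underflow."""
    logw = [lam * math.log(w) + ll for w, ll in zip(weights, logliks)]
    m = max(logw)                       # subtract max for numerical stability
    unnorm = [math.exp(v - m) for v in logw]
    z = sum(unnorm)
    return [u / z for u in unnorm]

def combine_quantiles(weights, expert_quantiles):
    """Weighted combination of per-expert quantile forecasts (as in the
    ensemble formula above) for a single quantile level, e.g. q95."""
    return sum(w * q for w, q in zip(weights, expert_quantiles))
```

Note that a weighted average of expert quantiles is the document's chosen combination rule; it is a pragmatic approximation, not the exact quantile of the mixture distribution.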
2) Data contract (minimum viable)
Unit = child order attempt/fill event.
Required fields
- ts_decision, symbol, side, venue
- parent_id, child_id, remaining_qty, remaining_time_sec
- arrival_mid, fill_px, fill_qty, fee_bps, rebate_bps
- slippage_bps_realized (vs decision benchmark)
- participation_rate, urgency_mode
- spread_bps, depth_topn, obi, cancel_rate, trade_rate
- rv_1m, rv_5m, jump_proxy, auction_flag, session_bucket
- optional: queue-position proxies, latency buckets
Guardrails
- Use point-in-time features only
- Separate partial-fill vs no-fill opportunity branch if possible
- Preserve event ordering and deterministic replay IDs
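One convention for `slippage_bps_realized` can be pinned down with a small helper. This is a sketch under an assumed sign convention (positive = adverse) and an assumed benchmark (arrival mid; other decision-time benchmarks are equally valid):

```python
def slippage_bps(side, arrival_mid, fill_px):
    """Signed realized slippage in bps vs the decision-time benchmark
    (arrival mid here). Positive = adverse: buys filled above the mid,
    sells filled below it."""
    sign = 1.0 if side == "BUY" else -1.0
    return sign * (fill_px - arrival_mid) / arrival_mid * 1e4
```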
3) Expert model design
Keep experts intentionally different (diversity > perfection).
M1 — robust linear baseline
[ y = \beta_0 + \beta_1\cdot spread + \beta_2\cdot POV + \beta_3\cdot RV + \beta_4\cdot OBI + \epsilon ]
- Fast, interpretable, stable fallback
- Fit with Huber/quantile regression for tail robustness
M2 — square-root impact expert
[ \text{impact} = Y\sigma\sqrt{Q/V} ]
- Works well for larger clips and metaorder footprint
- Use intraday volume curve normalization and symbol-specific scaling
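The M2 formula is simple enough to state directly in code. A minimal sketch, assuming (Y) is a per-symbol fitted prefactor and (Q/V) is the metaorder size over the (volume-curve-normalized) expected volume:

```python
import math

def sqrt_impact_bps(Y, sigma_bps, q, v):
    """Square-root law: expected impact ~ Y * sigma * sqrt(Q/V).
    Y: dimensionless prefactor, fit per symbol
    sigma_bps: volatility over the relevant horizon, in bps
    q: metaorder size; v: expected volume over the same horizon."""
    return Y * sigma_bps * math.sqrt(q / v)
```

The concavity matters for the controller: doubling the clip size raises predicted impact by only ~41%, which is exactly where M2 and the linear M1 disagree most.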
M3 — transient-impact expert
[ y_t = \sum_{\tau<t} G(t-\tau) q_\tau + \phi^\top z_t + \epsilon_t ]
- Captures impact memory/decay and clustering
- Valuable in bursty or one-sided tape
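The propagator sum above can be sketched with an exponential decay kernel. This is the simplest kernel choice for illustration (power-law kernels are common in the transient-impact literature), and it omits the feature term (\phi^\top z_t):

```python
import math

def propagator_impact(signed_flows, decay=0.2):
    """Transient-impact sketch: y_t = sum_{tau<t} G(t - tau) * q_tau
    with kernel G(u) = exp(-decay * u).
    signed_flows[tau]: signed child-order quantity at step tau."""
    t = len(signed_flows)
    return sum(math.exp(-decay * (t - tau)) * q
               for tau, q in enumerate(signed_flows))
```

Recent flow contributes more than old flow, which is what lets M3 anticipate cost clustering on a one-sided tape.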
M4 — nonlinear quantile ML expert
- Features: spread/depth/OBI/cancel bursts/volatility/session/auction flags
- Outputs q50/q90/q95 directly
- Apply monotonic constraints on obvious directions (e.g., higher POV should not reduce q95 impact)
4) Online weighting mechanics
Predictive likelihood
Use Student-t likelihood for fat tails:
[ p(y_t|M_k,x_t)=t_\nu\big(y_t;\mu_{k,t}, s_{k,t}\big) ]
This prevents a single outlier from instantly zeroing a good model.
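A self-contained Student-t log-density (location (\mu), scale (s), (\nu) degrees of freedom) needs only the standard library. A sketch; the default (\nu=4) is a typical fat-tail choice, not a calibrated value:

```python
import math

def student_t_loglik(y, mu, s, nu=4.0):
    """Log density of a Student-t with location mu, scale s, nu d.o.f.
    Heavier tails than a Gaussian, so one shock observation does not
    annihilate an otherwise-good expert's weight."""
    z = (y - mu) / s
    return (math.lgamma((nu + 1.0) / 2.0) - math.lgamma(nu / 2.0)
            - 0.5 * math.log(nu * math.pi) - math.log(s)
            - (nu + 1.0) / 2.0 * math.log1p(z * z / nu))
```

At a 5-sigma surprise this penalizes an expert by roughly 6 nats, versus roughly 13 for a Gaussian; that gap is the robustness the section is asking for.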
Forgetting factor
- Start with (\lambda \in [0.97, 0.995])
- Lower (\lambda) when drift detector fires (faster adaptation)
- Raise (\lambda) in calm regimes (stability)
Weight floor / cap
- Floor: (w_{k,t} \ge 0.05) to avoid premature extinction
- Cap: optional max weight 0.75 to prevent monoculture
Entropy monitor
[ H_t=-\sum_k w_{k,t}\log w_{k,t} ]
- Very low entropy for long periods may indicate overconfidence
- Very high entropy may signal unresolved regime ambiguity
5) Regime features and drift detection
Use lightweight drift signals to tune DMA responsiveness:
- rolling calibration error on q90/q95
- PIT uniformity drift
- CUSUM/Page-Hinkley on residual mean/variance
- spread/depth regime break (structural market state)
If drift alarm = ON:
- reduce (\lambda) (faster forgetting)
- increase uncertainty penalty in controller
- tighten the max POV cap and increase passive bias
- optionally trigger SAFE mode if burn-rate breaches
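Of the drift signals listed, Page-Hinkley is the easiest to state compactly. A sketch on a residual stream; `delta` and `threshold` are illustrative defaults that would need tuning on your own residual scale:

```python
class PageHinkley:
    """Page-Hinkley drift detector: alarms when the cumulative positive
    deviation of the stream mean exceeds `threshold`."""
    def __init__(self, delta=0.05, threshold=5.0):
        self.delta = delta          # tolerated drift magnitude
        self.threshold = threshold  # alarm level
        self.mean = 0.0             # running mean of the stream
        self.n = 0
        self.cum = 0.0              # cumulative deviation statistic
        self.cum_min = 0.0          # running minimum of the statistic

    def update(self, x):
        """Feed one residual; returns True when a drift alarm fires."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.cum_min = min(self.cum_min, self.cum)
        return (self.cum - self.cum_min) > self.threshold
```

Wire the returned alarm flag into the (\lambda) schedule: alarm on → use the fast-forgetting value; alarm off → revert to the calm value.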
6) Execution controller coupling
DMA forecast is useful only if tied to action.
Define per-decision score:
[ J(a)=\mathbb{E}[\text{slippage} \mid a] + \eta\,\text{OppCost}(a) + \rho\,\text{TailRisk}_{95}(a) ]
Action set (example):
- PASSIVE_JOIN
- PASSIVE_IMPROVE
- MID_PEG
- CONTROLLED_TAKE
- SWEEP_LIMITED
Policy chooses action minimizing (J(a)) under constraints:
- remaining-time SLA
- participation cap
- venue/session constraints
- kill-switch state
State ladder
- GREEN: normal DMA control
- AMBER: drift warning, tighter caps
- RED: tail/burn breach, defensive actions only
- SAFE: minimal-risk completion protocol
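The (J(a))-minimizing policy, with constraints expressed as a restricted action set, can be sketched as follows. The per-action forecast numbers in the usage example are hypothetical model outputs, not calibrated values:

```python
def choose_action(forecasts, eta=0.5, rho=1.0, allowed=None):
    """Pick the action minimizing J(a) = E[slip|a] + eta*OppCost(a)
    + rho*TailRisk95(a). `forecasts` maps action name ->
    (expected_slip_bps, opp_cost_bps, tail95_bps). `allowed` restricts the
    action set under SLA / participation / kill-switch constraints."""
    candidates = list(allowed) if allowed is not None else list(forecasts)
    def J(a):
        slip, opp, tail = forecasts[a]
        return slip + eta * opp + rho * tail
    return min(candidates, key=J)
```

In AMBER/RED states the `allowed` set shrinks (e.g. drop SWEEP_LIMITED) and `rho` rises, so the same scoring rule becomes progressively more defensive.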
7) Backtest & promotion protocol
Do not promote with mean bps only.
Offline checks
- walk-forward blocks by date/session
- q50/q90/q95 calibration error
- CVaR95 delta vs baseline
- completion reliability under time budget
- turnover of expert weights (too jumpy = unstable)
Shadow mode
Run live shadow for 1-2 weeks:
- emit proposed action + confidence
- compare realized vs predicted quantiles
- evaluate regret vs current production policy
Canary promotion
- start 5% flow → 15% → 30%
- rollback if any hard gate violated (q95 overshoot, burn-rate, reject surge)
8) Practical defaults (starter)
- Experts: 4 (M1..M4)
- Quantiles: q50, q90, q95
- Forgetting factor (\lambda): 0.985 (calm), 0.96 (drift alarm)
- Weight floor: 0.05
- Refit cadence:
- M1 daily
- M2 weekly (with daily scale update)
- M3 every 2-3 days
- M4 daily with rolling window
- Hard safety:
- max single-step POV cap
- max spread-cross budget
- emergency SAFE mode on burn-rate breach
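The starter defaults above fit in a single config object. The numeric safety caps (`max_pov`, `max_spread_cross_bps`) are placeholders I've inserted for illustration, not recommended values:

```python
STARTER_CONFIG = {
    "experts": ["M1", "M2", "M3", "M4"],
    "quantiles": [0.50, 0.90, 0.95],
    "lambda_calm": 0.985,       # forgetting factor, calm regime
    "lambda_drift": 0.96,       # forgetting factor, drift alarm on
    "weight_floor": 0.05,
    "refit_days": {"M1": 1, "M2": 7, "M3": 2, "M4": 1},
    "hard_safety": {
        "max_pov": 0.15,              # hypothetical cap
        "max_spread_cross_bps": 25.0, # hypothetical budget
        "safe_mode_on_burn": True,
    },
}

def lambda_for(drift_alarm, cfg=STARTER_CONFIG):
    """Select the forgetting factor based on the drift-detector state."""
    return cfg["lambda_drift"] if drift_alarm else cfg["lambda_calm"]
```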
9) Failure modes to avoid
Ensemble of near-clones
- If experts are too similar, DMA adds little adaptation.
Likelihood without heavy-tail robustness
- Gaussian likelihood overreacts to shock observations.
No feature-time hygiene
- Tiny leakage can fake great calibration.
No no-fill accounting
- Fill-only slippage underestimates true execution cost.
Action-policy mismatch
- Better forecasts are wasted if controller still uses static heuristics.
10) Minimal implementation skeleton
init models M1..M4
init weights w_k = 1/K

for each decision event t:
    x_t = point_in_time_features(t)
    for each model k:
        pred_k = model_k.predict_quantiles(x_t)        # q50/q90/q95
        ll_k = student_t_loglik(y_t_prev, pred_k_prev) # score last step's forecast
    w_k = normalize((w_k ^ lambda_t) * exp(ll_k))
    w_k = apply_floor_and_renorm(w_k, floor=0.05)
    pred_ens = weighted_quantiles(pred_k, w_k)
    regime = detect_regime_and_drift(metrics)
    action = controller(pred_ens, regime, constraints)
    execute(action)
    log(decision, pred_k, w_k, pred_ens, regime, action)

periodically refit/update each model on a rolling, point-in-time-clean window
11) What “good” looks like in production
- q95 calibration error materially lower than single-model baseline
- reduced tail slippage blowups during volatility/cancel-shock regimes
- stable completion rates (no hidden underfill tax)
- interpretable weight migration across regimes (human-auditable)
- clean rollback behavior under hard gates
If those are true, DMA is doing its job: adapting without thrashing.
Closing note
Slippage is not one distribution. It is a moving mixture generated by changing microstructure.
Dynamic Model Averaging gives you a practical way to treat that reality directly: multiple hypotheses alive, continuous evidence update, explicit tail-aware control.
In short: stop searching for one forever-model. Build an ensemble that learns the regime before the regime invoices you.