Dynamic Model Averaging for Slippage Under Regime Drift
Date: 2026-02-28
Category: research (execution / slippage modeling)
Why this exists
Most slippage models fail for the same reason: they are right on average, wrong at the worst time.
You calibrate one model on recent data, then market microstructure shifts (volatility, queue resiliency, spread regime, cancellation behavior, auction pressure). The model keeps predicting yesterday's world and your tail cost explodes.
This playbook uses Dynamic Model Averaging (DMA) to maintain a live ensemble of slippage experts and adapt weights online as regimes change.
1) Core idea
Instead of picking a single winner model, keep a small portfolio of experts:
- M1: linear micro-cost baseline (spread + fee + simple participation impact)
- M2: square-root impact model (large metaorder behavior)
- M3: transient-impact / propagator-style model (impact memory + decay)
- M4: nonlinear ML quantile model (GBDT / monotonic tree with microstructure features)
At each decision step, update model weights by recent predictive performance:
[ \tilde{w}_{k,t} \propto w_{k,t-1}^{\lambda} \cdot p(y_t \mid M_k, x_t) ]
[ w_{k,t} = \frac{\tilde{w}_{k,t}}{\sum_j \tilde{w}_{j,t}} ]
- (y_t): realized child-order slippage (bps)
- (x_t): decision-time features
- (\lambda \in (0,1]): forgetting factor (smaller = faster adaptation)
Then combine predictions:
[ \hat{Q}_{\alpha,t}^{\text{ens}} = \sum_k w_{k,t}\, \hat{Q}_{\alpha,t}^{(k)} ]
where (\hat{Q}_{\alpha,t}^{(k)}) is expert (k)'s quantile forecast (e.g. q50, q90, q95).
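The weight update and quantile combination above can be sketched in a few lines. This is illustrative only; `dma_update` works in log space to avoid underflow, and the function names are my own:

```python
import math

def dma_update(weights, logliks, lam=0.97):
    """One DMA step: discount prior weights by forgetting factor `lam`,
    multiply by each expert's predictive likelihood, renormalize.
    Done in log space so tiny likelihoods do not underflow."""
    logw = [lam * math.log(w) + ll for w, ll in zip(weights, logliks)]
    m = max(logw)                       # subtract max for numerical stability
    unnorm = [math.exp(v - m) for v in logw]
    z = sum(unnorm)
    return [u / z for u in unnorm]

def combine_quantiles(weights, expert_quantiles):
    """Weighted combination of per-expert quantile forecasts (as in the
    ensemble formula above) for a single quantile level, e.g. q95."""
    return sum(w * q for w, q in zip(weights, expert_quantiles))
```

Note that a weighted average of expert quantiles is the document's chosen combination rule; it is a pragmatic approximation, not the exact quantile of the mixture distribution.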
2) Data contract (minimum viable)
Unit = child order attempt/fill event.
Required fields
- ts_decision, symbol, side, venue
- parent_id, child_id, remaining_qty, remaining_time_sec
- arrival_mid, fill_px, fill_qty, fee_bps, rebate_bps
- slippage_bps_realized (vs decision benchmark)
- participation_rate, urgency_mode
- spread_bps, depth_topn, obi, cancel_rate, trade_rate
- rv_1m, rv_5m, jump_proxy, auction_flag, session_bucket
- optional: queue-position proxies, latency buckets
Guardrails
- Use point-in-time features only
- Separate partial-fill vs no-fill opportunity branch if possible
- Preserve event ordering and deterministic replay IDs
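One convention for `slippage_bps_realized` can be pinned down with a small helper. This is a sketch under an assumed sign convention (positive = adverse) and an assumed benchmark (arrival mid; other decision-time benchmarks are equally valid):

```python
def slippage_bps(side, arrival_mid, fill_px):
    """Signed realized slippage in bps vs the decision-time benchmark
    (arrival mid here). Positive = adverse: buys filled above the mid,
    sells filled below it."""
    sign = 1.0 if side == "BUY" else -1.0
    return sign * (fill_px - arrival_mid) / arrival_mid * 1e4
```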
3) Expert model design
Keep experts intentionally different (diversity > perfection).
M1 — robust linear baseline
[ y = \beta_0 + \beta_1\cdot spread + \beta_2\cdot POV + \beta_3\cdot RV + \beta_4\cdot OBI + \epsilon ]
- Fast, interpretable, stable fallback
- Fit with Huber/quantile regression for tail robustness
M2 — square-root impact expert
[ \text{impact} = Y\sigma\sqrt{Q/V} ]
- Works well for larger clips and metaorder footprint
- Use intraday volume curve normalization and symbol-specific scaling
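The M2 formula is simple enough to state directly in code. A minimal sketch, assuming (Y) is a per-symbol fitted prefactor and (Q/V) is the metaorder size over the (volume-curve-normalized) expected volume:

```python
import math

def sqrt_impact_bps(Y, sigma_bps, q, v):
    """Square-root law: expected impact ~ Y * sigma * sqrt(Q/V).
    Y: dimensionless prefactor, fit per symbol
    sigma_bps: volatility over the relevant horizon, in bps
    q: metaorder size; v: expected volume over the same horizon."""
    return Y * sigma_bps * math.sqrt(q / v)
```

The concavity matters for the controller: doubling the clip size raises predicted impact by only ~41%, which is exactly where M2 and the linear M1 disagree most.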
M3 — transient-impact expert
[ y_t = \sum_{\tau<t} G(t-\tau) q_\tau + \phi^\top z_t + \epsilon_t ]
- Captures impact memory/decay and clustering
- Valuable in bursty or one-sided tape
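The propagator sum above can be sketched with an exponential decay kernel. This is the simplest kernel choice for illustration (power-law kernels are common in the transient-impact literature), and it omits the feature term (\phi^\top z_t):

```python
import math

def propagator_impact(signed_flows, decay=0.2):
    """Transient-impact sketch: y_t = sum_{tau<t} G(t - tau) * q_tau
    with kernel G(u) = exp(-decay * u).
    signed_flows[tau]: signed child-order quantity at step tau."""
    t = len(signed_flows)
    return sum(math.exp(-decay * (t - tau)) * q
               for tau, q in enumerate(signed_flows))
```

Recent flow contributes more than old flow, which is what lets M3 anticipate cost clustering on a one-sided tape.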
M4 — nonlinear quantile ML expert
- Features: spread/depth/OBI/cancel bursts/volatility/session/auction flags
- Outputs q50/q90/q95 directly
- Apply monotonic constraints on obvious directions (e.g., higher POV should not reduce q95 impact)
4) Online weighting mechanics
Predictive likelihood
Use Student-t likelihood for fat tails:
[ p(y_t|M_k,x_t)=t_\nu\big(y_t;\mu_{k,t}, s_{k,t}\big) ]
This prevents a single outlier from instantly zeroing a good model.
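A self-contained Student-t log-density (location (\mu), scale (s), (\nu) degrees of freedom) needs only the standard library. A sketch; the default (\nu=4) is a typical fat-tail choice, not a calibrated value:

```python
import math

def student_t_loglik(y, mu, s, nu=4.0):
    """Log density of a Student-t with location mu, scale s, nu d.o.f.
    Heavier tails than a Gaussian, so one shock observation does not
    annihilate an otherwise-good expert's weight."""
    z = (y - mu) / s
    return (math.lgamma((nu + 1.0) / 2.0) - math.lgamma(nu / 2.0)
            - 0.5 * math.log(nu * math.pi) - math.log(s)
            - (nu + 1.0) / 2.0 * math.log1p(z * z / nu))
```

At a 5-sigma surprise this penalizes an expert by roughly 6 nats, versus roughly 13 for a Gaussian; that gap is the robustness the section is asking for.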
Forgetting factor
- Start with (\lambda \in [0.97, 0.995])
- Lower (\lambda) when drift detector fires (faster adaptation)
- Raise (\lambda) in calm regimes (stability)
Weight floor / cap
- Floor: (w_{k,t} \ge 0.05) to avoid premature extinction
- Cap: optional max weight 0.75 to prevent monoculture
Entropy monitor
[ H_t=-\sum_k w_{k,t}\log w_{k,t} ]
- Very low entropy for long periods may indicate overconfidence
- Very high entropy may signal unresolved regime ambiguity
5) Regime features and drift detection
Use lightweight drift signals to tune DMA responsiveness:
- rolling calibration error on q90/q95
- PIT uniformity drift
- CUSUM/Page-Hinkley on residual mean/variance
- spread/depth regime break (structural market state)
If drift alarm = ON:
- reduce (\lambda) (faster forgetting)
- increase uncertainty penalty in controller
- tighten the max POV cap and increase passive bias
- optionally trigger SAFE mode if burn-rate breaches
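Of the drift signals listed, Page-Hinkley is the easiest to state compactly. A sketch on a residual stream; `delta` and `threshold` are illustrative defaults that would need tuning on your own residual scale:

```python
class PageHinkley:
    """Page-Hinkley drift detector: alarms when the cumulative positive
    deviation of the stream mean exceeds `threshold`."""
    def __init__(self, delta=0.05, threshold=5.0):
        self.delta = delta          # tolerated drift magnitude
        self.threshold = threshold  # alarm level
        self.mean = 0.0             # running mean of the stream
        self.n = 0
        self.cum = 0.0              # cumulative deviation statistic
        self.cum_min = 0.0          # running minimum of the statistic

    def update(self, x):
        """Feed one residual; returns True when a drift alarm fires."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.cum_min = min(self.cum_min, self.cum)
        return (self.cum - self.cum_min) > self.threshold
```

Wire the returned alarm flag into the (\lambda) schedule: alarm on → use the fast-forgetting value; alarm off → revert to the calm value.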
6) Execution controller coupling
DMA forecast is useful only if tied to action.
Define per-decision score:
[ J(a)=\mathbb{E}[\text{slippage} \mid a] + \eta\,\text{OppCost}(a) + \rho\,\text{TailRisk}_{95}(a) ]
Action set (example):
- PASSIVE_JOIN
- PASSIVE_IMPROVE
- MID_PEG
- CONTROLLED_TAKE
- SWEEP_LIMITED
Policy chooses action minimizing (J(a)) under constraints:
- remaining-time SLA
- participation cap
- venue/session constraints
- kill-switch state
State ladder
- GREEN: normal DMA control
- AMBER: drift warning, tighter caps
- RED: tail/burn breach, defensive actions only
- SAFE: minimal-risk completion protocol
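The (J(a))-minimizing policy, with constraints expressed as a restricted action set, can be sketched as follows. The per-action forecast numbers in the usage example are hypothetical model outputs, not calibrated values:

```python
def choose_action(forecasts, eta=0.5, rho=1.0, allowed=None):
    """Pick the action minimizing J(a) = E[slip|a] + eta*OppCost(a)
    + rho*TailRisk95(a). `forecasts` maps action name ->
    (expected_slip_bps, opp_cost_bps, tail95_bps). `allowed` restricts the
    action set under SLA / participation / kill-switch constraints."""
    candidates = list(allowed) if allowed is not None else list(forecasts)
    def J(a):
        slip, opp, tail = forecasts[a]
        return slip + eta * opp + rho * tail
    return min(candidates, key=J)
```

In AMBER/RED states the `allowed` set shrinks (e.g. drop SWEEP_LIMITED) and `rho` rises, so the same scoring rule becomes progressively more defensive.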
7) Backtest & promotion protocol
Do not promote with mean bps only.
Offline checks
- walk-forward blocks by date/session
- q50/q90/q95 calibration error
- CVaR95 delta vs baseline
- completion reliability under time budget
- turnover of expert weights (too jumpy = unstable)
Shadow mode
Run live shadow for 1-2 weeks:
- emit proposed action + confidence
- compare realized vs predicted quantiles
- evaluate regret vs current production policy
Canary promotion
- start 5% flow → 15% → 30%
- rollback if any hard gate violated (q95 overshoot, burn-rate, reject surge)
8) Practical defaults (starter)
- Experts: 4 (M1..M4)
- Quantiles: q50, q90, q95
- Forgetting factor (\lambda): 0.985 (calm), 0.96 (drift alarm)
- Weight floor: 0.05
- Refit cadence:
- M1 daily
- M2 weekly (with daily scale update)
- M3 every 2-3 days
- M4 daily with rolling window
- Hard safety:
- max single-step POV cap
- max spread-cross budget
- emergency SAFE mode on burn-rate breach
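The starter defaults above fit in a single config object. The numeric safety caps (`max_pov`, `max_spread_cross_bps`) are placeholders I've inserted for illustration, not recommended values:

```python
STARTER_CONFIG = {
    "experts": ["M1", "M2", "M3", "M4"],
    "quantiles": [0.50, 0.90, 0.95],
    "lambda_calm": 0.985,       # forgetting factor, calm regime
    "lambda_drift": 0.96,       # forgetting factor, drift alarm on
    "weight_floor": 0.05,
    "refit_days": {"M1": 1, "M2": 7, "M3": 2, "M4": 1},
    "hard_safety": {
        "max_pov": 0.15,              # hypothetical cap
        "max_spread_cross_bps": 25.0, # hypothetical budget
        "safe_mode_on_burn": True,
    },
}

def lambda_for(drift_alarm, cfg=STARTER_CONFIG):
    """Select the forgetting factor based on the drift-detector state."""
    return cfg["lambda_drift"] if drift_alarm else cfg["lambda_calm"]
```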
9) Failure modes to avoid
Ensemble of near-clones
- If experts are too similar, DMA adds little adaptation.
Likelihood without heavy-tail robustness
- Gaussian likelihood overreacts to shock observations.
No feature-time hygiene
- Tiny leakage can fake great calibration.
No no-fill accounting
- Fill-only slippage underestimates true execution cost.
Action-policy mismatch
- Better forecasts are wasted if controller still uses static heuristics.
10) Minimal implementation skeleton
init models M1..M4
init weights w_k = 1/K

for each decision event t:
    x_t = point_in_time_features(t)
    for each model k:
        pred_k = model_k.predict_quantiles(x_t)        # q50/q90/q95
        ll_k = student_t_loglik(y_t_prev, pred_k_prev) # score last step's forecast
    w_k = normalize((w_k ^ lambda_t) * exp(ll_k))
    w_k = apply_floor_and_renorm(w_k, floor=0.05)
    pred_ens = weighted_quantiles(pred_k, w_k)
    regime = detect_regime_and_drift(metrics)
    action = controller(pred_ens, regime, constraints)
    execute(action)
    log(decision, pred_k, w_k, pred_ens, regime, action)

periodically refit/update each model on a rolling, point-in-time-clean window
11) What “good” looks like in production
- q95 calibration error materially lower than single-model baseline
- reduced tail slippage blowups during volatility/cancel-shock regimes
- stable completion rates (no hidden underfill tax)
- interpretable weight migration across regimes (human-auditable)
- clean rollback behavior under hard gates
If those are true, DMA is doing its job: adapting without thrashing.
Closing note
Slippage is not one distribution. It is a moving mixture generated by changing microstructure.
Dynamic Model Averaging gives you a practical way to treat that reality directly: multiple hypotheses alive, continuous evidence update, explicit tail-aware control.
In short: stop searching for one forever-model. Build an ensemble that learns the regime before the regime invoices you.