Slippage Modeling: Regime-Switching Impact Calibration Playbook

Stop Using One Global Impact Curve — Fit by Liquidity Regime and Update Online

Why this note: A single slippage model is usually “right on average, wrong when it hurts.” Production execution should calibrate impact by intraday/liquidity regime and switch policies when the regime changes.

1) Failure Mode in One Sentence

If you fit one global impact curve across all sessions and volatility states, you will underprice tails in stressed regimes and overtrade exactly when liquidity is fragile.

2) Practical Model Stack (Structural + Data-Driven)

At decision time \(t\), the expected implementation shortfall (IS) for action \(a\) is:

\[ \mathbb{E}[IS_t(a)] = C_{spread} + C_{temp\ impact} + C_{perm\ impact} + C_{timing} + C_{fees} \]

A production-friendly decomposition:

Spread term: crossing + queue-loss penalty
Temporary impact: participation/urgency-sensitive
Permanent impact proxy: adverse selection / information footprint
Timing/opportunity term: alpha decay while waiting
Fee/rebate term: venue + order-type economics

Use structure for interpretability and ML residuals for regime-specific corrections.

3) Core Equations You Actually Need

A) Baseline impact prior (square-root family)

\[ I_{prior}(Q) = Y \, \sigma \, \sqrt{\frac{Q}{V}} \]

\(Q\): parent size; \(V\): expected session volume (or bucket ADV)
\(\sigma\): horizon volatility
\(Y\): asset/regime coefficient

This is a useful prior, not a production truth.
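A minimal sketch of the prior as a function, assuming volatility and impact are both expressed in basis points over the same horizon. The function name and the example value Y = 0.8 are illustrative, not calibrated.

```python
import math

def sqrt_impact_prior_bps(parent_qty: float, expected_volume: float,
                          sigma_bps: float, y_coef: float) -> float:
    """Square-root prior: Y * sigma * sqrt(Q / V), in the units of sigma."""
    if parent_qty < 0 or expected_volume <= 0:
        raise ValueError("need parent_qty >= 0 and expected_volume > 0")
    return y_coef * sigma_bps * math.sqrt(parent_qty / expected_volume)

# Illustrative call: a parent order worth 1% of expected volume,
# 100 bps horizon volatility, Y = 0.8 (placeholder, not a fitted value).
prior_bps = sqrt_impact_prior_bps(10_000, 1_000_000, 100.0, 0.8)
```

Because the prior is concave in Q, doubling the parent size raises the estimate by a factor of sqrt(2), not 2.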

B) Participation form for child decisions

\[ I_{temp}(u) = \eta_r \cdot u^{\beta_r}, \quad u = \frac{\text{participation rate}}{\text{target liquidity}} \]

The regime index \(r\) allows a different \((\eta_r, \beta_r)\) pair per state.
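One way to hold regime-specific coefficients is a small lookup table. The sketch below assumes the R1/R2/R3 labels used in this note; every numeric coefficient is a hypothetical placeholder, not a fitted value.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ImpactParams:
    eta: float   # impact scale in bps at u = 1
    beta: float  # curvature exponent

# Hypothetical placeholder coefficients; real values come from calibration.
PARAMS_BY_REGIME = {
    "R1": ImpactParams(eta=2.0, beta=0.5),
    "R2": ImpactParams(eta=3.5, beta=0.6),
    "R3": ImpactParams(eta=8.0, beta=0.8),
}

def temp_impact_bps(u: float, regime: str) -> float:
    """Temporary impact eta_r * u ** beta_r for normalized participation u."""
    if u < 0:
        raise ValueError("u must be non-negative")
    params = PARAMS_BY_REGIME[regime]  # KeyError on an unknown regime is intentional
    return params.eta * u ** params.beta
```

Failing loudly on an unknown regime label is a deliberate choice here: silently falling back to a default curve would reintroduce the global-curve problem.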

C) Transient decay (propagator intuition)

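\[ \Delta p_t = \sum_{k \le t} G_r(t-k) \, q_k + \epsilon_t \]

Here \(q_k\) is the signed child quantity traded at step \(k\), \(\epsilon_t\) is noise, and the kernel \(G_r\) decays faster or slower depending on the regime (normal vs. stressed refill speed).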
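The propagator sum can be sketched with a simple kernel. The exponential half-life kernel below is an assumption chosen for brevity (other decay shapes, such as power laws, are also used), and the noise term is omitted. Function names are illustrative.

```python
def exp_kernel(half_life_steps: float):
    """Return G(lag) = 0.5 ** (lag / half_life_steps), so G(0) = 1."""
    def kernel(lag: int) -> float:
        return 0.5 ** (lag / half_life_steps)
    return kernel

def propagated_impact(signed_qty, kernel, scale=1.0):
    """Modelled displacement at each step t: scale * sum_{k <= t} G(t - k) * q_k."""
    path = []
    for t in range(len(signed_qty)):
        total = sum(kernel(t - k) * signed_qty[k] for k in range(t + 1))
        path.append(scale * total)
    return path

# One buy at step 0, then nothing: the displacement decays with the kernel.
fast_refill = propagated_impact([1.0, 0.0, 0.0], exp_kernel(1.0))
slow_refill = propagated_impact([1.0, 0.0, 0.0], exp_kernel(4.0))
```

A longer half-life leaves more residual impact for later child orders to trade into, which is why the kernel should be regime-specific.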

4) Regime Definition (Simple and Robust)

Do not overcomplicate with fragile HMMs on day one. Start with deterministic buckets:

R1 Calm: tight spread, deep book, low short-horizon vol
R2 Busy: medium spread/imbalance, normal refill
R3 Stressed: wide spread, shallow depth, high cancel rate, elevated volatility

Suggested online signals:

spread_bps
depth_l1_usd, depth_l5_usd
cancel_to_trade_ratio
microprice_drift_1s
vol_1s, vol_30s
queue_age_ms / refill half-life proxy

Promote to probabilistic regime labeling only after telemetry quality is stable.
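A deterministic bucket rule over a few of these signals might look like the sketch below. All thresholds are hypothetical placeholders and should be set per symbol bucket. The two-vote rule for R3 is an assumption, chosen so that a single noisy signal does not flip the regime.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MarketSnapshot:
    spread_bps: float
    depth_l1_usd: float
    cancel_to_trade_ratio: float
    vol_1s: float  # assumed to be in bps for this sketch

def classify_regime(s: MarketSnapshot) -> str:
    """Map a snapshot to R1 (calm), R2 (busy), or R3 (stressed)."""
    # Hypothetical thresholds; tune per symbol-liquidity bucket.
    stress_votes = sum([
        s.spread_bps > 8.0,
        s.depth_l1_usd < 20_000,
        s.cancel_to_trade_ratio > 15.0,
        s.vol_1s > 3.0,
    ])
    if stress_votes >= 2:
        return "R3"
    calm = (s.spread_bps <= 2.0 and s.depth_l1_usd >= 100_000
            and s.vol_1s <= 1.0)
    return "R1" if calm else "R2"
```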

5) Calibration Pipeline (Daily + Intraday Online Updates)

Step 1 — Offline robust fit (per symbol bucket × regime)

Fit \((\eta_r, \beta_r)\) using robust regression (Huber/quantile).
Winsorize extreme events; store raw tails separately for risk overlays.
Estimate spread/fee components directly from realized fills.
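As a rough illustration of the robust fit, the sketch below runs Huber-weighted iteratively reweighted least squares on the log-log form log I = log eta + beta log u, using only the standard library. It assumes strictly positive impact observations (negative-slippage rows would need a different treatment) and is run once per symbol bucket and regime. The function name and default tuning constants are illustrative.

```python
import math
from statistics import median

def huber_fit_loglog(u, impact_bps, delta=1.345, iters=50):
    """Fit log(impact) = log(eta) + beta * log(u) with Huber-weighted IRLS.

    Returns (eta, beta). Requires u > 0 and impact_bps > 0 for every row.
    """
    x = [math.log(v) for v in u]
    y = [math.log(v) for v in impact_bps]
    w = [1.0] * len(x)
    a = b = 0.0
    for _ in range(iters):
        # Weighted least squares for intercept a and slope b.
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        b = sxy / sxx
        a = my - b * mx
        resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
        # Robust residual scale from the median absolute residual (floored).
        scale = max(1.4826 * median(abs(r) for r in resid), 1e-12)
        cut = delta * scale
        # Huber weights: full weight inside the cut, down-weighted outside.
        w = [1.0 if abs(r) <= cut else cut / abs(r) for r in resid]
    return math.exp(a), b
```

In production a maintained library implementation is usually preferable; the point here is only that a single stressed print should not drag the curve for the whole regime.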

Step 2 — Tail model

Fit conditional quantiles (P50/P90/P97.5) of slippage, not only mean.
Track exceedance rate by bucket (coverage control).
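Two small helpers illustrate this step under stated assumptions: the pinball loss is the standard objective for scoring a quantile prediction, and the exceedance tracker assumes each row carries a bucket key, the realized slippage, and the predicted quantile. Names and the row layout are illustrative.

```python
from collections import defaultdict

def pinball_loss(realized: float, predicted_q: float, q: float) -> float:
    """Quantile (pinball) loss for one observation at level q in (0, 1)."""
    diff = realized - predicted_q
    return q * diff if diff >= 0 else (q - 1.0) * diff

def exceedance_by_bucket(rows):
    """rows: iterable of (bucket, realized_bps, predicted_quantile_bps).

    Returns the share of rows per bucket where realized cost exceeded
    the predicted quantile.
    """
    counts = defaultdict(int)
    hits = defaultdict(int)
    for bucket, realized, predicted_q in rows:
        counts[bucket] += 1
        hits[bucket] += int(realized > predicted_q)
    return {bucket: hits[bucket] / counts[bucket] for bucket in counts}
```

For a well-calibrated P90 model the exceedance rate in each bucket should sit near 0.10; a bucket running well above that is underpricing its tail.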

Step 3 — Online coefficient refresh

Exponential forgetting update for \(\eta_r\) and residual bias.
Hard guardrail: freeze updates when sample size is too small or data quality degrades.
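A minimal sketch of the refresh, assuming batch updates with beta held fixed between offline fits. The class name, the forgetting weight, and the minimum sample size are hypothetical. The batch estimate of eta is a least-squares fit through the origin on x = u ** beta.

```python
from dataclasses import dataclass

@dataclass
class OnlineImpactState:
    eta: float
    beta: float
    bias_bps: float = 0.0
    forgetting: float = 0.9  # hypothetical weight kept on the old estimate
    min_samples: int = 30    # hypothetical guardrail on batch size

    def update(self, u, realized_bps, data_ok: bool = True) -> bool:
        """Blend a batch estimate into eta; return True if an update was applied."""
        # Guardrail: freeze on thin or degraded data.
        if not data_ok or len(u) < self.min_samples:
            return False
        x = [v ** self.beta for v in u]
        sxx = sum(xi * xi for xi in x)
        if sxx <= 0.0:
            return False
        # Residuals are measured against the pre-update coefficient.
        resid = [yi - self.eta * xi for xi, yi in zip(x, realized_bps)]
        eta_batch = sum(xi * yi for xi, yi in zip(x, realized_bps)) / sxx
        lam = self.forgetting
        self.bias_bps = lam * self.bias_bps + (1.0 - lam) * sum(resid) / len(resid)
        self.eta = lam * self.eta + (1.0 - lam) * eta_batch
        return True
```

Returning a flag rather than raising lets the caller count frozen updates, which is itself a useful data-quality signal.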

Step 4 — Policy linkage

Select the action that minimizes expected cost plus a tail-risk penalty, with the tail term taken from the fitted cost distribution:

\[ a_t^* = \arg\min_a \; \mathbb{E}[IS_t(a)] + \lambda \, \mathrm{CVaR}_{q}(IS_t(a)) \]
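The selection rule can be sketched with sample-based estimates. This assumes each candidate action comes with a list of simulated or historical cost samples in bps. CVaR is computed as the mean of the worst (1 - q) share of samples, and the action names are made up for the example.

```python
import math

def cvar(samples, q: float) -> float:
    """Mean of the worst (1 - q) fraction of cost samples (higher = worse)."""
    xs = sorted(samples)
    # round() guards against float noise such as (1 - 0.95) * 20 > 1.
    k = max(1, math.ceil(round((1.0 - q) * len(xs), 9)))
    tail = xs[-k:]
    return sum(tail) / len(tail)

def select_action(cost_samples_by_action, lam: float, q: float) -> str:
    """Return the action minimizing mean cost + lam * CVaR_q of cost."""
    def objective(action: str) -> float:
        samples = cost_samples_by_action[action]
        return sum(samples) / len(samples) + lam * cvar(samples, q)
    return min(cost_samples_by_action, key=objective)

# Hypothetical cost samples (bps) for two tactics.
costs = {
    "passive": [1.0, 1.0, 1.0, 1.0, 20.0],
    "aggressive": [5.0, 5.0, 5.0, 5.0, 6.0],
}
mean_only = select_action(costs, lam=0.0, q=0.8)
tail_aware = select_action(costs, lam=0.5, q=0.8)
```

In this example the passive tactic has the lower mean but the heavier tail, so the choice flips once the tail penalty is switched on.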

6) Telemetry Contract (Must-Have Fields)

Decision features

decision_ts, symbol, side, parent_id, child_id
remaining_qty, time_to_deadline_ms, schedule_progress
regime_label, regime_prob

Market features

mid, spread_bps, depth_l1/l5, imbalance
trade_rate, cancel_rate, vol_1s/30s

Execution outcomes

child_qty, limit_offset_ticks, tactic
fill_qty, fill_px, fill_latency_ms, reject_count
effective_fee_bps, rebate_bps

Labels

realized_is_bps
markout_1s/5s/30s
tail_event_flag (e.g., P97.5 breach)

If regime label at decision time is missing, your calibration is untrustworthy.
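A simple gate in front of calibration can enforce this. The sketch below assumes telemetry rows arrive as dictionaries keyed by the field names above. The choice of which fields are mandatory is an assumption for illustration.

```python
# Fields treated as mandatory for calibration in this sketch.
REQUIRED_FIELDS = (
    "decision_ts", "symbol", "side", "parent_id", "child_id",
    "regime_label", "realized_is_bps",
)

def split_calibration_rows(rows):
    """Return (usable, rejected) lists; a row is usable only if every
    required field is present and not None."""
    usable, rejected = [], []
    for row in rows:
        complete = all(row.get(name) is not None for name in REQUIRED_FIELDS)
        (usable if complete else rejected).append(row)
    return usable, rejected
```

Tracking the rejected share over time doubles as a telemetry-quality metric.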

7) Control States for Live Routing

NORMAL: optimize mean+tail cost as configured
GUARD: triggered by rising tail undercoverage or regime shift to R3
- cap aggression step size
- shorten passive timeout
- tighten max queue age
RECOVERY: after repeated non-fill or deadline risk
- raise participation with bounded sweep size
SAFE_EXIT: near deadline breach
- prioritize completion certainty, log explicit exception reason

Use hysteresis + minimum dwell time to avoid oscillation.
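A minimal sketch of such a controller follows. It assumes coverage error and the regime label are the only GUARD triggers, that escalation is immediate while de-escalation waits out the dwell time, and that the thresholds shown are hypothetical placeholders.

```python
from dataclasses import dataclass

SEVERITY = {"NORMAL": 0, "GUARD": 1, "RECOVERY": 2, "SAFE_EXIT": 3}

@dataclass
class RoutingController:
    enter_guard: float = 0.05   # coverage error that triggers GUARD (hypothetical)
    exit_guard: float = 0.02    # lower exit threshold gives hysteresis
    min_dwell_s: float = 30.0   # minimum time in a state before de-escalating
    state: str = "NORMAL"
    entered_at: float = 0.0

    def _target(self, coverage_error, regime, deadline_risk, near_breach):
        if near_breach:
            return "SAFE_EXIT"
        if deadline_risk:
            return "RECOVERY"
        if self.state == "GUARD":
            stay = coverage_error > self.exit_guard or regime == "R3"
            return "GUARD" if stay else "NORMAL"
        if coverage_error > self.enter_guard or regime == "R3":
            return "GUARD"
        return "NORMAL"

    def step(self, now_s, coverage_error, regime,
             deadline_risk=False, near_breach=False) -> str:
        target = self._target(coverage_error, regime, deadline_risk, near_breach)
        if target != self.state:
            escalating = SEVERITY[target] > SEVERITY[self.state]
            dwelled = now_s - self.entered_at >= self.min_dwell_s
            # Escalate immediately; de-escalate only after the dwell time.
            if escalating or dwelled:
                self.state = target
                self.entered_at = now_s
        return self.state
```

Escalating fast and de-escalating slowly biases the controller toward caution, which is the cheaper error when liquidity is fragile.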

8) KPIs That Catch Model Drift Early

Coverage Error @ q: observed tail exceedance minus target
Regime Confusion Rate: post-hoc regime mismatch vs decision-time label
Tail Cost Inflation: P95/P50 ratio drift
Late Catch-Up Share: cost paid in final bucket
Venue Fee Drift: predicted vs realized fee/rebate gap

A common anti-pattern: average IS improves while Coverage Error worsens.
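Three of these KPIs reduce to a few lines each. The sketch below assumes slippage is recorded in bps and that post-hoc regime labels are available for comparison. Function names are illustrative.

```python
from statistics import quantiles

def coverage_error(observed_exceedance: float, q: float) -> float:
    """Observed tail exceedance minus the target exceedance 1 - q."""
    return observed_exceedance - (1.0 - q)

def regime_confusion_rate(decision_labels, posthoc_labels) -> float:
    """Share of decisions whose decision-time label differs from the post-hoc one."""
    pairs = list(zip(decision_labels, posthoc_labels))
    return sum(d != p for d, p in pairs) / len(pairs)

def tail_cost_inflation(slippage_bps) -> float:
    """P95 / P50 of realized slippage; requires a positive median."""
    cuts = quantiles(slippage_bps, n=20)  # 19 cut points at 5%, 10%, ..., 95%
    p50, p95 = cuts[9], cuts[18]
    if p50 <= 0:
        raise ValueError("median slippage must be positive for a ratio")
    return p95 / p50
```

A positive coverage error means the tail is breached more often than the model claims, which is the underpricing case this note warns about.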

9) Rollout Plan (Production-Safe)

Shadow predictions for 1–2 weeks (no policy impact)
Compare old vs new by symbol-liquidity deciles
Canary rollout with notional caps and kill-switch
Promote only if:
- tail coverage improves,
- deadline breaches do not increase,
- fee drift remains bounded
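The promotion criteria can be encoded as an explicit gate so the decision is reproducible. The metric keys and the fee-drift bound below are hypothetical and would be set by the desk.

```python
def should_promote(old: dict, new: dict, max_fee_drift_bps: float = 0.5) -> bool:
    """Gate promotion on tail coverage, deadline breaches, and fee drift.

    old/new: metric dicts with keys abs_coverage_error,
    deadline_breach_rate, and fee_drift_bps (hypothetical names).
    """
    return (
        new["abs_coverage_error"] < old["abs_coverage_error"]
        and new["deadline_breach_rate"] <= old["deadline_breach_rate"]
        and abs(new["fee_drift_bps"]) <= max_fee_drift_bps
    )
```

Mean slippage is deliberately absent from the gate: it can be reported alongside, but it should not be able to override a tail regression.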

10) Fast Checklist

[ ] Define 3 deterministic liquidity regimes from live microstructure signals
[ ] Fit impact coefficients per regime (not global)
[ ] Model tail quantiles and monitor coverage
[ ] Add online update with sample-size and data-quality guardrails
[ ] Wire model output to NORMAL/GUARD/RECOVERY/SAFE_EXIT state machine
[ ] Gate promotion on tail stability, not mean slippage alone

References

Almgren, R., Chriss, N. (2000), Optimal Execution of Portfolio Transactions.
Kyle, A. (1985), Continuous Auctions and Insider Trading.
Gatheral, J. (2010), No-Dynamic-Arbitrage and Market Impact.
Obizhaeva, A., Wang, J. (2013), Optimal Trading Strategy and Supply/Demand Dynamics.
Cartea, Á., Jaimungal, S., Penalva, J. (2015), Algorithmic and High-Frequency Trading.

TL;DR

Use a structural impact prior, calibrate by liquidity regime, monitor tail coverage continuously, and switch routing behavior when regime stress appears. One global slippage curve is convenient—but expensive.