Distributionally Robust Slippage Modeling with Wasserstein Ambiguity: Playbook
Why this matters
Most production slippage models implicitly assume that the future data distribution will look like the recent past. That assumption fails exactly when it hurts most: stress liquidity, volatility jumps, queue evaporations, and venue-specific microstructure breaks.
A distributionally robust approach asks a safer question:
- not only “what is expected slippage under one estimated distribution?”
- but “what is the worst plausible slippage in a neighborhood around that distribution?”
This gives cleaner tail protection and more stable execution controls under regime drift.
1) Problem setup
Target per decision slice:
y = realized slippage_bps
Context vector x includes:
- participation, clip size, residual horizon
- spread, top-of-book depth, queue imbalance
- short-horizon volatility, OFI/toxicity proxies
- session bucket (open/mid/close/auction), venue, tactic
Decision variable:
- action a (urgency / tactic / participation level)
Goal:
- minimize expected slippage while enforcing tail-risk and completion constraints
2) From ERM to DRO
Standard empirical-risk minimization (ERM):
[ \min_a \; \mathbb{E}_{\hat P}[\ell(a, x, y)] ]
Distributionally robust optimization (DRO):
[ \min_a \; \sup_{Q \in \mathcal{U}(\hat P)} \mathbb{E}_{Q}[\ell(a, x, y)] ]
Where:
- \hat P = empirical distribution from the recent data window
- \mathcal{U}(\hat P) = ambiguity set (plausible distributions near \hat P)
Interpretation: choose execution action that is good not only on average history, but against nearby adverse shifts.
3) Ambiguity set design (Wasserstein ball)
A practical choice:
[ \mathcal{U}(\hat P) = \{ Q : W_c(Q, \hat P) \le \rho \} ]
- W_c = Wasserstein distance under transport cost c
- \rho = robustness radius
Practical guidance:
- Normalize features before distance computation (z-score or robust scaling)
- Use anisotropic transport cost so critical dimensions (participation, spread, vol) get larger penalties
- Tune \rho by out-of-sample tail calibration (coverage + p95 regret)
Larger \rho = safer but more conservative execution.
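For Lipschitz losses, the 1-Wasserstein worst case admits a cheap upper bound: the empirical mean plus \rho times the Lipschitz constant of the loss in the transport metric. A minimal sketch under that assumption; the function names, the weight vector, and the idea of folding anisotropic cost into a feature rescaling are illustrative choices, not a prescribed implementation:

```python
import numpy as np

def robust_cost_upper_bound(losses, rho, lipschitz_const):
    """Worst-case expected loss over a 1-Wasserstein ball of radius rho.

    Uses the standard duality bound for an L-Lipschitz loss:
        sup_{W(Q, P_hat) <= rho} E_Q[loss] <= mean(losses) + rho * L
    `losses` are per-sample losses under the empirical distribution P_hat.
    """
    return float(np.mean(losses)) + rho * lipschitz_const

def anisotropic_scale(X, weights):
    """Rescale z-scored features so critical dimensions cost more to transport.

    With transport cost c(x, x') = ||diag(weights) (x - x')||_2, scaling the
    data by `weights` reduces the problem to plain Euclidean cost.
    """
    return np.asarray(X) * np.asarray(weights)
```

The bound is conservative but monotone in \rho, which is exactly the "safer but more conservative" dial described above.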
4) Loss function for execution control
Use a composite loss:
[ \ell = y(a) + \lambda \cdot \text{CVaR}_{\alpha}(y(a)) + \eta \cdot \text{MissPenalty}(a) ]
- y(a): expected implementation-shortfall component
- CVaR_α: tail-sensitive risk term (e.g., α = 0.95)
- MissPenalty: underfill/opportunity-cost penalty
This avoids a common failure: over-defensive throttling that protects slippage but misses completion deadlines.
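A minimal empirical version of the composite loss, using a sorted-tail CVaR estimator; the λ, η, and α defaults are illustrative, not tuned values:

```python
import numpy as np

def cvar(samples, alpha=0.95):
    """Empirical CVaR_alpha: mean of the worst (1 - alpha) fraction of samples."""
    s = np.sort(np.asarray(samples, dtype=float))
    k = max(1, int(np.ceil((1 - alpha) * len(s))))
    return float(s[-k:].mean())

def composite_loss(slip_samples, miss_penalty, lam=0.5, eta=1.0, alpha=0.95):
    """ell = E[y(a)] + lam * CVaR_alpha(y(a)) + eta * MissPenalty(a)."""
    return (float(np.mean(slip_samples))
            + lam * cvar(slip_samples, alpha)
            + eta * miss_penalty)
```

The miss-penalty term enters additively, so a defensive action that protects slippage but risks underfill is still charged for it.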
5) Conditional DRO (context-aware)
Global ambiguity is too blunt. In production, make it context-aware:
- bucket by liquidity/volatility/session/venue
- maintain per-bucket \hat P_b and \rho_b
- use fallback hierarchy (bucket → broader bucket → global) for sparse contexts
This preserves local realism while retaining robustness in data-thin segments.
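The fallback hierarchy can be sketched as a simple resolution walk; the bucket key scheme (venue/session/regime), the store layout, and the MIN_SUPPORT value are all hypothetical placeholders:

```python
MIN_SUPPORT = 200  # hypothetical minimum sample count before trusting a local bucket

def resolve_bucket(stores, venue, session, regime):
    """Walk bucket -> broader bucket -> global until enough samples exist.

    `stores` maps bucket keys to lists of observations; the global pool is
    always accepted as the final fallback.
    """
    for key in [(venue, session, regime), (session, regime), ("global",)]:
        data = stores.get(key, [])
        if len(data) >= MIN_SUPPORT or key == ("global",):
            return key, data
```

Per-bucket \rho_b overrides would gate on the same support threshold, so thin segments inherit the broader pool's robustness settings.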
6) Online adaptation loop
At each control interval:
- Observe context x_t
- Generate candidate actions a ∈ A
- For each action, compute robust score:
robust_cost(a) = sup_{Q ∈ U_t} E_Q[ℓ(a, x_t, y)]
- Pick action with minimum robust score under hard constraints
- Execute, observe outcomes, append to rolling buffer
- Update \hat P_t, drift stats, and \rho_t
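The per-interval loop above can be sketched as follows, assuming a `robust_score(a, x)` callable that approximates the sup over the ambiguity set and a `feasible(a, x)` predicate for the hard constraints; both names are placeholders:

```python
from collections import deque

# Rolling buffer of (context, action, realized slippage) used to refit P_hat.
buffer = deque(maxlen=50_000)

def control_step(x_t, actions, robust_score, feasible):
    """One control interval: pick the feasible action with the lowest robust score.

    `robust_score(a, x)` approximates sup_{Q in U_t} E_Q[loss(a, x, y)];
    `feasible(a, x)` encodes hard constraints (participation caps, price
    bands, deadline pressure).
    """
    candidates = [a for a in actions if feasible(a, x_t)]
    if not candidates:
        raise RuntimeError("no feasible action; fall back to SAFE policy")
    return min(candidates, key=lambda a: robust_score(a, x_t))
```

After execution, the realized outcome is appended to `buffer`, which feeds the \hat P_t refit on the next interval.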
Robustness radius policy:
- increase \rho_t when drift signals spike (CUSUM/Page-Hinkley/coverage break)
- decay \rho_t gradually when calibration recovers
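The radius policy can be sketched with a Page-Hinkley detector driving expansion and a geometric decay driving recovery; every constant below is an illustrative default, not a calibrated value:

```python
class RadiusPolicy:
    """Expand rho on drift alarms, decay it geometrically as calibration recovers.

    Drift detection is a basic Page-Hinkley test on a stream of loss
    residuals; thresholds here are placeholders.
    """
    def __init__(self, rho0=0.1, rho_max=1.0, expand=2.0, decay=0.97,
                 delta=0.005, threshold=0.5):
        self.rho, self.rho_max = rho0, rho_max
        self.expand, self.decay = expand, decay
        self.delta, self.threshold = delta, threshold
        self.mean, self.n, self.cum, self.cum_min = 0.0, 0, 0.0, 0.0

    def update(self, residual):
        # Page-Hinkley statistic on the residual stream
        self.n += 1
        self.mean += (residual - self.mean) / self.n
        self.cum += residual - self.mean - self.delta
        self.cum_min = min(self.cum_min, self.cum)
        if self.cum - self.cum_min > self.threshold:   # drift alarm
            self.rho = min(self.rho_max, self.rho * self.expand)
            self.cum, self.cum_min = 0.0, 0.0          # reset the detector
        else:
            self.rho *= self.decay                     # slow recovery decay
        return self.rho
```

The asymmetric design (fast expand, slow decay) mirrors the stated policy: react quickly to drift, relax robustness only gradually.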
7) Controller states (operational)
Map robust metrics to a state machine:
- GREEN: low robust mean and low robust tail → normal POV, standard clips
- AMBER: rising robust tail or calibration stress → reduce clip, widen randomization, slower pacing
- RED: robust CVaR breach or severe drift → defensive mode (tiny clips, venue filtering, optional pause)
- SAFE: kill-switch thresholds triggered → halt aggressive tactics, complete via safest fallback policy
Hysteresis is mandatory to avoid mode flapping.
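The state machine with hysteresis can be sketched as below; the threshold values and dwell length are illustrative placeholders, not tuned production settings:

```python
ORDER = ["GREEN", "AMBER", "RED", "SAFE"]
UP = {"GREEN": 30.0, "AMBER": 60.0, "RED": 120.0}    # escalate when robust tail (bps) exceeds
DOWN = {"AMBER": 20.0, "RED": 45.0, "SAFE": 90.0}    # eligible to de-escalate below

def step(state, tail_bps, calm_count, min_dwell=10):
    """One transition of the GREEN/AMBER/RED/SAFE controller with hysteresis.

    Escalation is immediate; de-escalation needs `min_dwell` consecutive calm
    intervals, and the gap between UP and DOWN thresholds prevents flapping.
    """
    i = ORDER.index(state)
    if i < len(ORDER) - 1 and tail_bps > UP[state]:
        return ORDER[i + 1], 0            # escalate at once, reset dwell count
    if i > 0 and tail_bps < DOWN[state]:
        calm_count += 1
        if calm_count >= min_dwell:
            return ORDER[i - 1], 0        # de-escalate after sustained calm
        return state, calm_count
    return state, 0                       # neither condition: hold, reset dwell
```

The overlap-free band between UP and DOWN plus the dwell counter is what makes mode flapping structurally impossible rather than merely unlikely.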
8) Evaluation framework
Predictive + robust quality
- pinball loss (q50/q90/q95)
- interval coverage stability under drift windows
- robust regret: realized cost minus robust policy baseline
Economic outcomes
- IS mean and p95/p99 deltas vs current policy
- completion reliability at deadline buckets
- frequency/duration of RED/SAFE states
Safety outcomes
- breach rate of slippage budget and CVaR budget
- policy flip-flop frequency (anti-chatter KPI)
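As one concrete piece of the predictive-quality suite, the pinball (quantile) loss used for the q50/q90/q95 checks is a standard formula, sketched here:

```python
import numpy as np

def pinball_loss(y_true, y_pred, q):
    """Quantile (pinball) loss: asymmetric penalty for under/over-prediction.

    Under-prediction of slippage (d > 0) is weighted by q, over-prediction
    by (1 - q), so high quantiles punish missed tails hardest.
    """
    d = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.maximum(q * d, (q - 1) * d)))
```

Tracking this loss separately per drift window is what surfaces the coverage instability the framework is meant to catch.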
9) Failure modes and mitigations
Over-conservatism (alpha bleed via underfill)
- add explicit miss-penalty term
- cap maximum defensiveness by residual-time urgency
Under-robustness (radius too small)
- tie \rho to drift diagnostics and coverage breakdowns
- keep a stress-window holdout in weekly calibration
Distance mis-specification
- feature scaling audits and transport-weight sanity checks
- sensitivity tests per dimension
Sparse-context instability
- hierarchical fallback pools
- minimum-support thresholds before local \rho overrides
10) Implementation blueprint (Vellab-friendly)
Phase 1 (1 week)
- Define robust-loss contract and runtime API
- Build rolling empirical distribution store by context bucket
- Add offline robust scorer for candidate policy replay
Phase 2 (1–2 weeks)
- Add Wasserstein-ball robust optimization for discrete action set
- Integrate CVaR + miss-penalty composite objective
- Ship GREEN/AMBER/RED state machine with hysteresis
Phase 3 (ongoing)
- Online \rho adaptation from drift + calibration signals
- Champion–challenger robust-vs-ERM governance
- Tail-budget SLO dashboard + rollback automation
11) Practical defaults
- Action grid: small finite set (e.g., 5–9 urgency/participation combos)
- Tail term: CVaR95 with bounded influence winsorization for pathological prints
- Radius \rho: initialize from the 90th percentile of recent shift magnitudes, then adapt slowly
- Refit cadence: daily baseline + intraday lightweight updates
- Hard rails: price bands, participation caps, reject-rate and latency kill-switches remain outside model
Bottom line
DRO slippage control with Wasserstein ambiguity is a practical upgrade when you care about tail survivability under distribution shift, not just average backtest bps.
Treat robustness as a dial (\rho) linked to live drift evidence, and pair it with completion-aware penalties so the policy stays both defensive and executable.