Slippage Model Risk Stress-Testing Playbook (Parameter Uncertainty + Adversarial Regimes)
Date: 2026-02-26
Category: Research (Execution / Slippage Modeling)
Scope: Pre-trade + intraday controls for single-name and basket execution
Why this playbook exists
A lot of execution stacks now have decent slippage predictors. The next failure mode is subtler:
- model works in calm data,
- confidence looks strong,
- then one hidden regime shift turns “optimized execution” into a fast way to burn the tail budget.
This playbook treats slippage modeling as a model-risk problem, not just a forecasting problem.
Core idea:
- Quantify parameter uncertainty explicitly,
- Stress the model with plausible microstructure shocks,
- Control execution with stressed risk (not base-case mean bps).
If the model is uncertain, policy must become conservative before realized shortfall explodes.
Problem framing
At decision time (t), define signed slippage target:
[ y_t = \frac{\text{execution price} - \text{benchmark}}{\text{benchmark}} \times 10^4 \quad (\text{bps, signed by side}) ]
Base model predicts:
[ \hat{y}_t = f(x_t; \theta) ]
But we need distribution-aware output under uncertainty:
[ Y_t \mid x_t \sim \mathcal{D}(\mu_t, \sigma^2_{\text{aleatoric},t}, \sigma^2_{\text{epistemic},t}) ]
and stressed quantiles:
[ q_{\tau}^{\text{stress}}(x_t) = Q_{Y \mid X, \mathcal{S}}(\tau \mid x_t) ]
where (\mathcal{S}) is a stress scenario (spread shock, depth collapse, cancel burst, latency shock, etc.).
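The signed-bps convention above can be made concrete in a few lines; a minimal sketch (the function name and the `side` flag convention of +1 buy / -1 sell are illustrative assumptions):

```python
def signed_slippage_bps(exec_price: float, benchmark: float, side: int) -> float:
    """Signed slippage in bps: positive means adverse for both buys and sells.

    side: +1 for buy, -1 for sell (illustrative convention).
    """
    raw_bps = (exec_price - benchmark) / benchmark * 1e4
    return side * raw_bps

# A buy filled above the benchmark is adverse (positive bps)
buy_slip = signed_slippage_bps(100.05, 100.00, side=+1)
# A sell filled above the benchmark is favorable (negative bps)
sell_slip = signed_slippage_bps(100.05, 100.00, side=-1)
```

Keeping the sign convention uniform across sides lets every downstream quantile and budget check treat "larger = worse" without branching on direction.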
Uncertainty decomposition (what can go wrong)
1) Aleatoric uncertainty (market randomness)
Irreducible noise even with perfect parameters:
- random queue jumps,
- hidden liquidity reveal,
- intermittent flow toxicity.
2) Epistemic uncertainty (model ignorance)
Parameter/model uncertainty due to:
- regime drift,
- sparse data for rare states,
- extrapolation outside training support.
3) Structural uncertainty
Wrong model class assumptions:
- linear impact where convexity matters,
- stationarity assumptions across session boundaries,
- features that leak in backtest but are stale in live.
Execution policy should react mostly to (2) and (3), because they are warning signs that the model itself is untrustworthy right now.
Feature set for stress engines
Use state variables that are both predictive and stressable:
- spread (ticks/bps), top-N depth, imbalance,
- cancel-to-trade ratio, queue depletion speed,
- short-horizon realized vol, microprice drift,
- participation rate, child-size percentile, time-to-go,
- own ACK/cancel/replace latency,
- session flags (open/close/auction/VI-adjacent), venue flags.
Avoid features that cannot be perturbed realistically in scenario generation.
Modeling stack
Layer A — Base slippage distribution model
Recommended minimum:
- quantile model for (q50, q75, q90, q95),
- plus mean/variance head (or residual model) for decomposition.
Use purged time splits and symbol-liquidity stratification.
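The purged-split requirement can be sketched as follows (a simplified illustration: `purge` is expressed in samples and stands in for the label horizon; fold boundaries are contiguous in time):

```python
def purged_time_splits(n: int, n_folds: int, purge: int):
    """Yield (train_idx, test_idx) pairs where a purge gap around each
    contiguous test block is dropped from training, so labels whose
    horizon overlaps the test window cannot leak into the fit."""
    fold = n // n_folds
    for k in range(n_folds):
        lo = k * fold
        hi = (k + 1) * fold if k < n_folds - 1 else n
        test = list(range(lo, hi))
        train = [i for i in range(n) if i < lo - purge or i >= hi + purge]
        yield train, test

splits = list(purged_time_splits(n=100, n_folds=5, purge=3))
```

Symbol-liquidity stratification then amounts to running these splits per liquidity bucket rather than pooling all symbols into one fit.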
Layer B — Parameter uncertainty estimator
Three practical options:
Block bootstrap ensemble
- resample by time blocks to preserve serial dependence,
- train (M) models (f^{(m)}), estimate epistemic spread.
Bayesian / variational approximation
- posterior over parameters (p(\theta|\mathcal{D})),
- sample predictive distributions.
Conformalized residual wrapper
- robust online interval calibration,
- fast operational guardrail even if model is misspecified.
Ensemble variance proxy:
[ \widehat{\sigma^2_{\text{epistemic},t}} = \text{Var}_{m=1..M}\left[f^{(m)}(x_t)\right] ]
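A block-bootstrap flavor of this estimator might look like the sketch below (the ridge base learner, fixed block length, and ensemble size are illustrative choices, not prescriptions):

```python
import numpy as np

def block_bootstrap_indices(n: int, block_len: int, rng) -> np.ndarray:
    """Sample contiguous blocks (with replacement) until n indices are
    drawn, preserving short-range serial dependence inside each block."""
    idx = []
    while len(idx) < n:
        start = rng.integers(0, n - block_len + 1)
        idx.extend(range(start, start + block_len))
    return np.array(idx[:n])

def epistemic_std(X, y, x_new, n_models=20, block_len=50, ridge=1e-3, seed=0):
    """Fit a small ridge ensemble on block-bootstrap resamples and return
    the std of their predictions at x_new (epistemic-spread proxy)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    preds = []
    for _ in range(n_models):
        b = block_bootstrap_indices(n, block_len, rng)
        Xb, yb = X[b], y[b]
        w = np.linalg.solve(Xb.T @ Xb + ridge * np.eye(d), Xb.T @ yb)
        preds.append(float(x_new @ w))
    return float(np.std(preds))
```

As expected from the extrapolation argument in the text, the spread grows sharply for query points far outside the training support.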
Layer C — Scenario stress transformer
Transform current state (x_t) into stressed states (x_t^{(s)}):
[ x_t^{(s)} = T_s(x_t), \quad s \in \{\text{spread+}, \text{depth-}, \text{cancel+}, \text{latency+}, \text{combo}\} ]
Examples:
- spread Ă— 1.5 / 2.0,
- top-of-book depth Ă— 0.5,
- cancel-to-trade ratio +2(\sigma),
- decision→ACK latency +20ms / +50ms,
- combined “fragile-close” scenario.
Stress transforms must be empirically bounded by historical extremes per symbol-liquidity bucket.
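One way to implement (T_s) with the empirical-bounds requirement baked in (feature names, scenario encoding, and the bounds table are all illustrative):

```python
def apply_scenario(x: dict, scenario: dict, bounds: dict) -> dict:
    """Apply multiplicative/additive shocks to a feature dict, then clamp
    each shocked feature to its historical [lo, hi] per-bucket bounds."""
    xs = dict(x)
    for feat, (op, val) in scenario.items():
        if op == "mul":
            xs[feat] = xs[feat] * val
        elif op == "add":
            xs[feat] = xs[feat] + val
        lo, hi = bounds[feat]
        xs[feat] = min(max(xs[feat], lo), hi)
    return xs

x = {"spread_bps": 4.0, "depth_top": 10_000.0}
bounds = {"spread_bps": (0.5, 6.0), "depth_top": (1_000.0, 50_000.0)}
fragile_close = {"spread_bps": ("mul", 2.0), "depth_top": ("mul", 0.5)}
xs = apply_scenario(x, fragile_close, bounds)
```

Here the spread x2 shock would produce 8.0 bps, but the historical bound clamps it to 6.0, keeping the scenario inside what this symbol bucket has actually exhibited.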
Stress metrics that matter operationally
For each parent order, report:
- Base p95 slippage (\hat{q}_{0.95})
- Stressed p95 slippage (\hat{q}^{\text{stress}}_{0.95})
- Stress uplift ratio: [ U = \frac{\hat{q}^{\text{stress}}_{0.95}}{\max(\hat{q}_{0.95}, \epsilon)} ]
- Budget breach probability under stress: [ P\left(\text{cumulative slippage} > B_{\text{rem}} \mid \mathcal{S}\right) ]
- Model fragility score (MFS): [ \text{MFS}_t = w_1\, z(\sigma_{\text{epistemic},t}) + w_2\, z(U_t) + w_3\, z(\text{coverage error}_t) ]
MFS is the single controller input for risk state transitions.
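The MFS formula reduces to a weighted sum of rolling z-scores; a sketch (the weights, window handling, and history container are illustrative assumptions):

```python
import numpy as np

def rolling_z(value: float, history: list) -> float:
    """z-score of the latest value against its rolling history."""
    mu = float(np.mean(history))
    sd = float(np.std(history))
    return (value - mu) / max(sd, 1e-9)

def model_fragility_score(sigma_e, uplift, cov_err, hist, w=(0.4, 0.4, 0.2)):
    """MFS_t = w1*z(sigma_epistemic,t) + w2*z(U_t) + w3*z(coverage error_t)."""
    return (w[0] * rolling_z(sigma_e, hist["sigma_e"])
            + w[1] * rolling_z(uplift, hist["uplift"])
            + w[2] * rolling_z(cov_err, hist["cov_err"]))
```

Z-scoring each component against its own recent history keeps the three inputs on a common scale, so the weights express relative importance rather than unit conversions.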
Controller: uncertainty-aware execution states
State machine:
- NORMAL: low MFS, normal schedule
- GUARDED: moderate MFS, lower passive dwell, smaller slices
- DEFENSIVE: high MFS, cap aggression escalation, venue selectivity
- SAFE: critical MFS, protect capital; controlled completion only
Example thresholds (calibrate by desk):
- NORMAL → GUARDED: MFS > 1.0 for 3/5 ticks
- GUARDED → DEFENSIVE: MFS > 1.8 or stress breach prob > 25%
- DEFENSIVE → SAFE: MFS > 2.5 or repeated realized q95 exceedance
Recovery requires hysteresis + hold time to avoid oscillation.
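A minimal transition function with both hysteresis and a hold time, using the example thresholds above (the recovery margin, hold length, and one-step de-escalation are illustrative mechanisms):

```python
STATES = ["NORMAL", "GUARDED", "DEFENSIVE", "SAFE"]

def transition_with_hysteresis(state, mfs, breach_prob, ticks_in_state,
                               min_hold=10, recovery_margin=0.2):
    """Escalate immediately on threshold breach; de-escalate one step at a
    time, only after a hold period AND with a margin below the entry
    threshold (hysteresis), to avoid oscillation."""
    up = state
    if mfs > 2.5:
        up = "SAFE"
    elif mfs > 1.8 or breach_prob > 0.25:
        up = "DEFENSIVE"
    elif mfs > 1.0:
        up = "GUARDED"
    if STATES.index(up) > STATES.index(state):
        return up, 0                       # escalate now, reset hold timer
    if ticks_in_state >= min_hold and state != "NORMAL":
        entry = {"GUARDED": 1.0, "DEFENSIVE": 1.8, "SAFE": 2.5}[state]
        if mfs < entry - recovery_margin and breach_prob <= 0.25:
            return STATES[STATES.index(state) - 1], 0
    return state, ticks_in_state + 1
```

The asymmetry is deliberate: escalation is instant because tail losses accrue fast, while recovery must be earned over several quiet ticks.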
Pre-trade workflow (before sending parent order)
- Build current base forecast distribution.
- Run scenario set (\mathcal{S}) (single + combo shocks).
- Compute stressed p95/CVaR and breach probability.
- Choose initial execution template by worst-case scenario:
- aggressive POV allowed only if worst-case breach prob < threshold,
- otherwise start guarded/defensive from first slice.
- Store a machine-readable “stress card” with assumptions.
This prevents starting too aggressively and “discovering” fragility after losses are already booked.
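The stress card itself can be a small JSON document; a sketch with entirely illustrative field names and values:

```python
import json

stress_card = {
    "parent_order_id": "EXAMPLE-001",      # illustrative identifier
    "model_version": "slip-q-2026.02",     # illustrative version tag
    "base": {"q50": 1.2, "q95": 6.5},      # bps
    "stress": {
        "spread_x2": {"q95": 9.8, "breach_prob": 0.08},
        "fragile_close": {"q95": 14.1, "breach_prob": 0.31},
    },
    "worst_case_scenario": "fragile_close",
    "initial_state": "GUARDED",
    "assumptions": ["bounds_version=2026-02-20", "purge=1d"],
}
card_json = json.dumps(stress_card, indent=2)
```

Because the card records the scenario set and assumptions used at decision time, post-trade review can distinguish "model was wrong" from "scenario library was stale".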
Intraday online adaptation
Every decision cycle:
- refresh feature freshness checks,
- update conformal coverage diagnostics,
- recompute MFS and stressed quantiles,
- transition controller state with hysteresis,
- attach explainability payload (top features + stress driver).
If freshness fails or model service is degraded, force fallback policy:
- participation cap,
- larger spacing,
- deterministic completion guardrails.
Validation protocol (must pass before production)
A) Forecast quality
- quantile calibration error by session bucket,
- exceedance rates for q90/q95,
- residual autocorrelation checks.
B) Uncertainty quality
- interval coverage under rolling windows,
- sharpness vs coverage tradeoff,
- epistemic spikes around known regime changes.
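Rolling interval coverage can be monitored with a simple empirical counter (window length and the error sign convention are illustrative):

```python
from collections import deque

class CoverageMonitor:
    """Track the rolling fraction of realized slippages that fall at or
    below the predicted q95; report the gap versus the 0.95 target
    (positive error = model under-covering the tail)."""
    def __init__(self, window: int = 500, target: float = 0.95):
        self.hits = deque(maxlen=window)
        self.target = target

    def update(self, realized: float, q95_pred: float) -> None:
        self.hits.append(1.0 if realized <= q95_pred else 0.0)

    def coverage_error(self) -> float:
        if not self.hits:
            return 0.0
        return self.target - sum(self.hits) / len(self.hits)

mon = CoverageMonitor(window=100)
for realized, q95 in [(3.0, 5.0), (6.0, 5.0), (4.0, 5.0), (4.5, 5.0)]:
    mon.update(realized, q95)
err = mon.coverage_error()   # 0.95 - 0.75 = 0.20 -> under-covering
```

This same quantity is the coverage-error input to the MFS formula, so the monitor doubles as a controller feed and a validation diagnostic.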
C) Stress realism
- scenario values within historical plausibility bounds,
- historical replay during known stress days,
- false-positive/false-negative cost analysis.
D) Policy outcomes (counterfactual replay)
Compare baseline vs uncertainty-aware controller on same intents:
- p95/CVaR slippage,
- completion SLA,
- opportunity cost of caution,
- number/duration of defensive states.
Do not ship if tail reduction comes only from unacceptable underfill.
Pseudocode
for t in decision_ticks:
    x = build_features(t)
    if not freshness_ok(x):
        send(fallback_policy(budget_remaining))  # participation cap, larger spacing
        continue

    # Base predictive distribution
    q = quantile_model.predict(x)                # q50, q75, q90, q95
    mu, sigma_a = mean_var_model.predict(x)

    # Epistemic uncertainty (ensemble/bootstrap)
    preds = [m.predict_mean(x) for m in ensemble]
    sigma_e = np.std(preds)

    # Scenario stress
    stressed_q95, breach_probs = [], []
    for s in scenarios:
        xs = apply_scenario(x, s)
        stressed_q95.append(quantile_model.predict(xs).q95)
        breach_probs.append(estimate_breach_prob(xs, budget_remaining))

    uplift = max(stressed_q95) / max(q.q95, 1e-6)
    mfs = w1 * z(sigma_e) + w2 * z(uplift) + w3 * z(online_coverage_error())

    state = transition_with_hysteresis(state, mfs, max(breach_probs))
    action = policy_from_state(state, x, budget_remaining)
    send(action)
Common failure modes
Bootstrap done IID, not block-based
- underestimates uncertainty in serially correlated flow.
Stress scenarios too mild
- dashboard looks safe, live is not.
No feature freshness gating
- stale depth/latency makes stress math meaningless.
Conformal window too long
- adapts too slowly during regime breaks.
Controller without hysteresis
- action flapping creates extra costs.
Tail win, fill disaster
- p95 improves only because execution quietly stopped participating.
Minimal rollout plan (practical)
Phase 1 (shadow):
- compute MFS + stressed metrics only,
- no policy effect, collect diagnostics.
Phase 2 (advisory):
- suggest state transitions to operator,
- compare accepted vs ignored suggestions.
Phase 3 (bounded automation):
- auto NORMAL↔GUARDED,
- require explicit gate for DEFENSIVE/SAFE.
Phase 4 (full with kill-switch):
- automated transitions with audit trail,
- operator override and fast rollback.
Implementation checklist
- Block-bootstrap or equivalent epistemic estimator in place
- Scenario library versioned (with per-symbol bounds)
- Online conformal / calibration monitor wired
- MFS thresholds backtested + approved
- Hysteresis + hold-time logic verified
- Fallback execution policy tested under feature staleness
- Stress card logging in TCA pipeline
- Weekly drift review (coverage + uplift + realized tails)
References (seed reading)
- Almgren, R., Chriss, N. Optimal Execution of Portfolio Transactions (2000). https://www.smallake.kr/wp-content/uploads/2016/03/optliq.pdf
- Jaisson, T. et al. Market impact as anticipation of the order flow imbalance (arXiv:1402.1288). https://arxiv.org/abs/1402.1288
- Jusselin, P. et al. No-arbitrage implies power-law market impact and rough volatility (arXiv:1805.07134). https://arxiv.org/abs/1805.07134
- Szymanski, G. et al. The two square root laws of market impact... (arXiv:2311.18283). https://arxiv.org/abs/2311.18283
One-line takeaway
A slippage model is only as good as its behavior under stress; if uncertainty rises, execution should automatically spend less risk before the market sends the invoice.