Slippage Model Risk Stress-Testing Playbook (Parameter Uncertainty + Adversarial Regimes)
Date: 2026-02-26
Category: Research (Execution / Slippage Modeling)
Scope: Pre-trade + intraday controls for single-name and basket execution
Why this playbook exists
A lot of execution stacks now have decent slippage predictors. The next failure mode is subtler:
- model works in calm data,
- confidence looks strong,
- then one hidden regime shift turns “optimized execution” into a fast way to burn the tail budget.
This playbook treats slippage modeling as a model-risk problem, not just a forecasting problem.
Core idea:
- Quantify parameter uncertainty explicitly,
- Stress the model with plausible microstructure shocks,
- Control execution with stressed risk (not base-case mean bps).
If the model is uncertain, policy must become conservative before realized shortfall explodes.
Problem framing
At decision time (t), define signed slippage target:
[ y_t = \frac{\text{execution price} - \text{benchmark}}{\text{benchmark}} \times 10^4 \quad (\text{bps, signed by side}) ]
Base model predicts:
[ \hat{y}_t = f(x_t; \theta) ]
But we need distribution-aware output under uncertainty:
[ Y_t \mid x_t \sim \mathcal{D}(\mu_t, \sigma^2_{\text{aleatoric},t}, \sigma^2_{\text{epistemic},t}) ]
and stressed quantiles:
[ q_{\tau}^{\text{stress}}(x_t) = Q_{Y \mid X, \mathcal{S}}(\tau \mid x_t) ]
where (\mathcal{S}) is a stress scenario (spread shock, depth collapse, cancel burst, latency shock, etc.).
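The signed-bps convention above can be made concrete in a few lines; a minimal sketch (the function name and the `side` flag convention of +1 buy / -1 sell are illustrative assumptions):

```python
def signed_slippage_bps(exec_price: float, benchmark: float, side: int) -> float:
    """Signed slippage in bps: positive means adverse for both buys and sells.

    side: +1 for buy, -1 for sell (illustrative convention).
    """
    raw_bps = (exec_price - benchmark) / benchmark * 1e4
    return side * raw_bps

# A buy filled above the benchmark is adverse (positive bps)
buy_slip = signed_slippage_bps(100.05, 100.00, side=+1)
# A sell filled above the benchmark is favorable (negative bps)
sell_slip = signed_slippage_bps(100.05, 100.00, side=-1)
```

Keeping the sign convention uniform across sides lets every downstream quantile and budget check treat "larger = worse" without branching on direction.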
Uncertainty decomposition (what can go wrong)
1) Aleatoric uncertainty (market randomness)
Irreducible noise even with perfect parameters:
- random queue jumps,
- hidden liquidity reveal,
- intermittent flow toxicity.
2) Epistemic uncertainty (model ignorance)
Parameter/model uncertainty due to:
- regime drift,
- sparse data for rare states,
- extrapolation outside training support.
3) Structural uncertainty
Wrong model class assumptions:
- linear impact where convexity matters,
- stationarity assumptions across session boundaries,
- features that leak in backtest but are stale in live.
Execution policy should react mostly to (2) and (3), because they are warning signs that the model itself is untrustworthy right now.
Feature set for stress engines
Use state variables that are both predictive and stressable:
- spread (ticks/bps), top-N depth, imbalance,
- cancel-to-trade ratio, queue depletion speed,
- short-horizon realized vol, microprice drift,
- participation rate, child-size percentile, time-to-go,
- own ACK/cancel/replace latency,
- session flags (open/close/auction/VI-adjacent), venue flags.
Avoid features that cannot be perturbed realistically in scenario generation.
Modeling stack
Layer A — Base slippage distribution model
Recommended minimum:
- quantile model for (q50, q75, q90, q95),
- plus mean/variance head (or residual model) for decomposition.
Use purged time splits and symbol-liquidity stratification.
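The purged-split requirement can be sketched as follows (a simplified illustration: `purge` is expressed in samples and stands in for the label horizon; fold boundaries are contiguous in time):

```python
def purged_time_splits(n: int, n_folds: int, purge: int):
    """Yield (train_idx, test_idx) pairs where a purge gap around each
    contiguous test block is dropped from training, so labels whose
    horizon overlaps the test window cannot leak into the fit."""
    fold = n // n_folds
    for k in range(n_folds):
        lo = k * fold
        hi = (k + 1) * fold if k < n_folds - 1 else n
        test = list(range(lo, hi))
        train = [i for i in range(n) if i < lo - purge or i >= hi + purge]
        yield train, test

splits = list(purged_time_splits(n=100, n_folds=5, purge=3))
```

Symbol-liquidity stratification then amounts to running these splits per liquidity bucket rather than pooling all symbols into one fit.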
Layer B — Parameter uncertainty estimator
Three practical options:
Block bootstrap ensemble
- resample by time blocks to preserve serial dependence,
- train (M) models (f^{(m)}), estimate epistemic spread.
Bayesian / variational approximation
- posterior over parameters (p(\theta|\mathcal{D})),
- sample predictive distributions.
Conformalized residual wrapper
- robust online interval calibration,
- fast operational guardrail even if model is misspecified.
Ensemble variance proxy:
[ \widehat{\sigma^2_{\text{epistemic},t}} = \text{Var}_{m=1..M}\left[f^{(m)}(x_t)\right] ]
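A block-bootstrap flavor of this estimator might look like the sketch below (the ridge base learner, fixed block length, and ensemble size are illustrative choices, not prescriptions):

```python
import numpy as np

def block_bootstrap_indices(n: int, block_len: int, rng) -> np.ndarray:
    """Sample contiguous blocks (with replacement) until n indices are
    drawn, preserving short-range serial dependence inside each block."""
    idx = []
    while len(idx) < n:
        start = rng.integers(0, n - block_len + 1)
        idx.extend(range(start, start + block_len))
    return np.array(idx[:n])

def epistemic_std(X, y, x_new, n_models=20, block_len=50, ridge=1e-3, seed=0):
    """Fit a small ridge ensemble on block-bootstrap resamples and return
    the std of their predictions at x_new (epistemic-spread proxy)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    preds = []
    for _ in range(n_models):
        b = block_bootstrap_indices(n, block_len, rng)
        Xb, yb = X[b], y[b]
        w = np.linalg.solve(Xb.T @ Xb + ridge * np.eye(d), Xb.T @ yb)
        preds.append(float(x_new @ w))
    return float(np.std(preds))
```

As expected from the extrapolation argument in the text, the spread grows sharply for query points far outside the training support.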
Layer C — Scenario stress transformer
Transform current state (x_t) into stressed states (x_t^{(s)}):
[ x_t^{(s)} = T_s(x_t), \quad s \in \{\text{spread+}, \text{depth-}, \text{cancel+}, \text{latency+}, \text{combo}\} ]
Examples:
- spread Ă— 1.5 / 2.0,
- top-of-book depth Ă— 0.5,
- cancel-to-trade ratio +2(\sigma),
- decision→ACK latency +20ms / +50ms,
- combined “fragile-close” scenario.
Stress transforms must be empirically bounded by historical extremes per symbol-liquidity bucket.
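One way to implement (T_s) with the empirical-bounds requirement baked in (feature names, scenario encoding, and the bounds table are all illustrative):

```python
def apply_scenario(x: dict, scenario: dict, bounds: dict) -> dict:
    """Apply multiplicative/additive shocks to a feature dict, then clamp
    each shocked feature to its historical [lo, hi] per-bucket bounds."""
    xs = dict(x)
    for feat, (op, val) in scenario.items():
        if op == "mul":
            xs[feat] = xs[feat] * val
        elif op == "add":
            xs[feat] = xs[feat] + val
        lo, hi = bounds[feat]
        xs[feat] = min(max(xs[feat], lo), hi)
    return xs

x = {"spread_bps": 4.0, "depth_top": 10_000.0}
bounds = {"spread_bps": (0.5, 6.0), "depth_top": (1_000.0, 50_000.0)}
fragile_close = {"spread_bps": ("mul", 2.0), "depth_top": ("mul", 0.5)}
xs = apply_scenario(x, fragile_close, bounds)
```

Here the spread x2 shock would produce 8.0 bps, but the historical bound clamps it to 6.0, keeping the scenario inside what this symbol bucket has actually exhibited.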
Stress metrics that matter operationally
For each parent order, report:
- Base p95 slippage (\hat{q}_{0.95})
- Stressed p95 slippage (\hat{q}^{\text{stress}}_{0.95})
- Stress uplift ratio: [ U = \frac{\hat{q}^{\text{stress}}_{0.95}}{\max(\hat{q}_{0.95}, \epsilon)} ]
- Budget breach probability under stress: [ P\left(\text{cumulative slippage} > B_{\text{rem}} \mid \mathcal{S}\right) ]
- Model fragility score (MFS): [ \text{MFS}_t = w_1\, z(\sigma_{\text{epistemic},t}) + w_2\, z(U_t) + w_3\, z(\text{coverage error}_t) ]
MFS is the single controller input for risk state transitions.
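The MFS formula reduces to a weighted sum of rolling z-scores; a sketch (the weights, window handling, and history container are illustrative assumptions):

```python
import numpy as np

def rolling_z(value: float, history: list) -> float:
    """z-score of the latest value against its rolling history."""
    mu = float(np.mean(history))
    sd = float(np.std(history))
    return (value - mu) / max(sd, 1e-9)

def model_fragility_score(sigma_e, uplift, cov_err, hist, w=(0.4, 0.4, 0.2)):
    """MFS_t = w1*z(sigma_epistemic,t) + w2*z(U_t) + w3*z(coverage error_t)."""
    return (w[0] * rolling_z(sigma_e, hist["sigma_e"])
            + w[1] * rolling_z(uplift, hist["uplift"])
            + w[2] * rolling_z(cov_err, hist["cov_err"]))
```

Z-scoring each component against its own recent history keeps the three inputs on a common scale, so the weights express relative importance rather than unit conversions.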
Controller: uncertainty-aware execution states
State machine:
- NORMAL: low MFS, normal schedule
- GUARDED: moderate MFS, lower passive dwell, smaller slices
- DEFENSIVE: high MFS, cap aggression escalation, venue selectivity
- SAFE: critical MFS, protect capital; controlled completion only
Example thresholds (calibrate by desk):
- NORMAL → GUARDED: MFS > 1.0 for 3/5 ticks
- GUARDED → DEFENSIVE: MFS > 1.8 or stress breach prob > 25%
- DEFENSIVE → SAFE: MFS > 2.5 or repeated realized q95 exceedance
Recovery requires hysteresis + hold time to avoid oscillation.
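A minimal transition function with both hysteresis and a hold time, using the example thresholds above (the recovery margin, hold length, and one-step de-escalation are illustrative mechanisms):

```python
STATES = ["NORMAL", "GUARDED", "DEFENSIVE", "SAFE"]

def transition_with_hysteresis(state, mfs, breach_prob, ticks_in_state,
                               min_hold=10, recovery_margin=0.2):
    """Escalate immediately on threshold breach; de-escalate one step at a
    time, only after a hold period AND with a margin below the entry
    threshold (hysteresis), to avoid oscillation."""
    up = state
    if mfs > 2.5:
        up = "SAFE"
    elif mfs > 1.8 or breach_prob > 0.25:
        up = "DEFENSIVE"
    elif mfs > 1.0:
        up = "GUARDED"
    if STATES.index(up) > STATES.index(state):
        return up, 0                       # escalate now, reset hold timer
    if ticks_in_state >= min_hold and state != "NORMAL":
        entry = {"GUARDED": 1.0, "DEFENSIVE": 1.8, "SAFE": 2.5}[state]
        if mfs < entry - recovery_margin and breach_prob <= 0.25:
            return STATES[STATES.index(state) - 1], 0
    return state, ticks_in_state + 1
```

The asymmetry is deliberate: escalation is instant because tail losses accrue fast, while recovery must be earned over several quiet ticks.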
Pre-trade workflow (before sending parent order)
- Build current base forecast distribution.
- Run scenario set (\mathcal{S}) (single + combo shocks).
- Compute stressed p95/CVaR and breach probability.
- Choose initial execution template by worst-case scenario:
- aggressive POV allowed only if worst-case breach prob < threshold,
- otherwise start guarded/defensive from first slice.
- Store a machine-readable “stress card” with assumptions.
This prevents starting too aggressively and “discovering” fragility after losses are already booked.
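The stress card itself can be a small JSON document; a sketch with entirely illustrative field names and values:

```python
import json

stress_card = {
    "parent_order_id": "EXAMPLE-001",      # illustrative identifier
    "model_version": "slip-q-2026.02",     # illustrative version tag
    "base": {"q50": 1.2, "q95": 6.5},      # bps
    "stress": {
        "spread_x2": {"q95": 9.8, "breach_prob": 0.08},
        "fragile_close": {"q95": 14.1, "breach_prob": 0.31},
    },
    "worst_case_scenario": "fragile_close",
    "initial_state": "GUARDED",
    "assumptions": ["bounds_version=2026-02-20", "purge=1d"],
}
card_json = json.dumps(stress_card, indent=2)
```

Because the card records the scenario set and assumptions used at decision time, post-trade review can distinguish "model was wrong" from "scenario library was stale".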
Intraday online adaptation
Every decision cycle:
- refresh feature freshness checks,
- update conformal coverage diagnostics,
- recompute MFS and stressed quantiles,
- transition controller state with hysteresis,
- attach explainability payload (top features + stress driver).
If freshness fails or model service is degraded, force fallback policy:
- participation cap,
- larger spacing,
- deterministic completion guardrails.
Validation protocol (must pass before production)
A) Forecast quality
- quantile calibration error by session bucket,
- exceedance rates for q90/q95,
- residual autocorrelation checks.
B) Uncertainty quality
- interval coverage under rolling windows,
- sharpness vs coverage tradeoff,
- epistemic spikes around known regime changes.
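Rolling interval coverage can be monitored with a simple empirical counter (window length and the error sign convention are illustrative):

```python
from collections import deque

class CoverageMonitor:
    """Track the rolling fraction of realized slippages that fall at or
    below the predicted q95; report the gap versus the 0.95 target
    (positive error = model under-covering the tail)."""
    def __init__(self, window: int = 500, target: float = 0.95):
        self.hits = deque(maxlen=window)
        self.target = target

    def update(self, realized: float, q95_pred: float) -> None:
        self.hits.append(1.0 if realized <= q95_pred else 0.0)

    def coverage_error(self) -> float:
        if not self.hits:
            return 0.0
        return self.target - sum(self.hits) / len(self.hits)

mon = CoverageMonitor(window=100)
for realized, q95 in [(3.0, 5.0), (6.0, 5.0), (4.0, 5.0), (4.5, 5.0)]:
    mon.update(realized, q95)
err = mon.coverage_error()   # 0.95 - 0.75 = 0.20 -> under-covering
```

This same quantity is the coverage-error input to the MFS formula, so the monitor doubles as a controller feed and a validation diagnostic.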
C) Stress realism
- scenario values within historical plausibility bounds,
- historical replay during known stress days,
- false-positive/false-negative cost analysis.
D) Policy outcomes (counterfactual replay)
Compare baseline vs uncertainty-aware controller on same intents:
- p95/CVaR slippage,
- completion SLA,
- opportunity cost of caution,
- number/duration of defensive states.
Do not ship if tail reduction comes only from unacceptable underfill.
Pseudocode
for t in decision_ticks:
    x = build_features(t)
    if not freshness_ok(x):
        send(fallback_policy(budget_remaining))  # participation cap, larger spacing
        continue

    # Base predictive distribution
    q = quantile_model.predict(x)                # q50, q75, q90, q95
    mu, sigma_a = mean_var_model.predict(x)

    # Epistemic uncertainty (ensemble/bootstrap)
    preds = [m.predict_mean(x) for m in ensemble]
    sigma_e = np.std(preds)

    # Scenario stress
    stressed_q95, breach_probs = [], []
    for s in scenarios:
        xs = apply_scenario(x, s)
        stressed_q95.append(quantile_model.predict(xs).q95)
        breach_probs.append(estimate_breach_prob(xs, budget_remaining))

    uplift = max(stressed_q95) / max(q.q95, 1e-6)
    mfs = w1 * z(sigma_e) + w2 * z(uplift) + w3 * z(online_coverage_error())

    state = transition_with_hysteresis(state, mfs, max(breach_probs))
    action = policy_from_state(state, x, budget_remaining)
    send(action)
Common failure modes
Bootstrap done IID, not block-based
- underestimates uncertainty in serially correlated flow.
Stress scenarios too mild
- dashboard looks safe, live is not.
No feature freshness gating
- stale depth/latency makes stress math meaningless.
Conformal window too long
- adapts too slowly during regime breaks.
Controller without hysteresis
- action flapping creates extra costs.
Tail win, fill disaster
- p95 improves only because execution quietly stopped participating.
Minimal rollout plan (practical)
Phase 1 (shadow):
- compute MFS + stressed metrics only,
- no policy effect, collect diagnostics.
Phase 2 (advisory):
- suggest state transitions to operator,
- compare accepted vs ignored suggestions.
Phase 3 (bounded automation):
- auto NORMAL↔GUARDED,
- require explicit gate for DEFENSIVE/SAFE.
Phase 4 (full with kill-switch):
- automated transitions with audit trail,
- operator override and fast rollback.
Implementation checklist
- Block-bootstrap or equivalent epistemic estimator in place
- Scenario library versioned (with per-symbol bounds)
- Online conformal / calibration monitor wired
- MFS thresholds backtested + approved
- Hysteresis + hold-time logic verified
- Fallback execution policy tested under feature staleness
- Stress card logging in TCA pipeline
- Weekly drift review (coverage + uplift + realized tails)
References (seed reading)
- Almgren, R., Chriss, N. Optimal Execution of Portfolio Transactions (2000). https://www.smallake.kr/wp-content/uploads/2016/03/optliq.pdf
- Jaisson, T. et al. Market impact as anticipation of the order flow imbalance (arXiv:1402.1288). https://arxiv.org/abs/1402.1288
- Jusselin, P. et al. No-arbitrage implies power-law market impact and rough volatility (arXiv:1805.07134). https://arxiv.org/abs/1805.07134
- Szymanski, G. et al. The two square root laws of market impact... (arXiv:2311.18283). https://arxiv.org/abs/2311.18283
One-line takeaway
A slippage model is only as good as its behavior under stress; if uncertainty rises, execution should automatically spend less risk before the market sends the invoice.