Liquidity Regime-Switching State-Space Slippage Playbook

Date: 2026-03-02
Category: finance / execution research

Why this model

Most slippage models fail in production for one reason: they assume a single market regime.

In reality, intraday execution alternates between at least three latent regimes:

Resilient liquidity (mean-reverting impact, fast refill)
Fragile liquidity (slow refill, queue depletion risk)
Stress/liquidity vacuum (impact convexity + high adverse selection)

A state-space model with regime switching lets us infer this latent state in real time and adapt schedule, aggression, and participation caps before costs explode.

Core setup

1) Observation layer (what we can measure each slice)

At decision time (t), define:

(y_t): realized slice slippage (bps vs decision benchmark)
(x_t): observable features
- spread, microprice skew, imbalance
- queue position survival proxy
- local volatility (short-horizon RV)
- signed market order pressure
- our participation rate and child-order type mix

Observation equation:

[ y_t = x_t^\top \beta_{z_t} + \eta_t, \quad \eta_t \sim t_{\nu}(0, \sigma^2_{z_t}) ]

Use Student-t errors to avoid overreacting to outlier prints.
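As a minimal sketch of the observation layer, the snippet below evaluates the per-regime Student-t log-likelihood of one slice. The function names, the list-based layout of `betas` and `sigmas`, and the single shared `nu` are illustrative assumptions, not part of the model specification above.

```python
import math

def student_t_logpdf(resid, sigma, nu):
    """Log density of a residual under a Student-t with scale sigma and nu degrees of freedom."""
    z = resid / sigma
    return (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
            - 0.5 * math.log(nu * math.pi) - math.log(sigma)
            - 0.5 * (nu + 1) * math.log1p(z * z / nu))

def regime_loglik(y, x, betas, sigmas, nu):
    """Log-likelihood of slice slippage y (bps) under each regime's (beta, sigma)."""
    out = []
    for beta, sigma in zip(betas, sigmas):
        mean = sum(xi * bi for xi, bi in zip(x, beta))  # x_t' beta_z
        out.append(student_t_logpdf(y - mean, sigma, nu))
    return out

# Hypothetical numbers: two features, three regimes (resilient, fragile, stress).
ll = regime_loglik(3.0, [1.0, 2.0],
                   betas=[[1.0, 1.0], [0.0, 0.0], [0.5, 0.5]],
                   sigmas=[1.0, 2.0, 4.0], nu=4.0)
```

The log term grows only logarithmically in the squared residual, so a single outlier print shifts the regime likelihoods far less than it would under Gaussian errors.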

2) Latent state layer

(z_t \in \{1,2,3\}): latent liquidity regime
Markov transition matrix (P_t), optionally time-varying with covariates:

[ \Pr(z_t=j\mid z_{t-1}=i) \propto \exp\left(a_{ij} + w_{ij}^\top g_t\right) ]

where (g_t) may include event clock, auction proximity, volatility burst indicators, and cross-asset stress index.
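One way to turn the proportionality above into a usable matrix is a row-wise softmax. The sketch below assumes `a` is a 3x3 nested list of intercepts, `w` a 3x3 nested list of weight vectors, and `g` the covariate vector; the names and layout are hypothetical.

```python
import math

def transition_matrix(a, w, g):
    """Row-stochastic P_t with P_t[i][j] proportional to exp(a[i][j] + w[i][j] . g)."""
    P = []
    for a_row, w_row in zip(a, w):
        logits = [a_ij + sum(wk * gk for wk, gk in zip(w_ij, g))
                  for a_ij, w_ij in zip(a_row, w_row)]
        m = max(logits)  # subtract the row max for numerical stability
        expd = [math.exp(l - m) for l in logits]
        total = sum(expd)
        P.append([e / total for e in expd])
    return P

# Hypothetical example: sticky regimes, with one covariate (a volatility-burst
# flag) that pushes probability mass toward the stress regime (index 2).
a = [[2.0, 0.0, -1.0], [0.0, 2.0, 0.0], [-1.0, 0.0, 2.0]]
w = [[[0.0], [0.0], [1.5]], [[0.0], [0.0], [1.5]], [[0.0], [0.0], [1.5]]]
P_calm = transition_matrix(a, w, [0.0])
P_burst = transition_matrix(a, w, [1.0])
```

Because a softmax is unchanged when a constant is added to every logit in a row, the parameters are only identified up to a per-row shift; pinning one reference column per row (for example the diagonal) is a common way to handle this.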

3) Impact memory state

Add a latent impact stock (h_t) for temporary impact decay:

[ h_t = \phi_{z_t} h_{t-1} + \kappa_{z_t} u_t + \epsilon_t ]

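(u_t): signed participation impulse (our trading pressure)
(\kappa_{z_t}): impact loading per unit of participation impulse
(\phi_{z_t}): impact persistence (near 0 in the resilient regime, so impact decays fast; closer to 1 in fragile/stress, so impact lingers)
(\epsilon_t): zero-mean state noise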

Final predicted cost:

[ \hat c_{t+1} = x_{t+1}^\top \beta_{z_{t+1}} + h_{t+1} ]
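Since (z_{t+1}) is not observed at decision time, a practical forecast has to average over regimes. The sketch below is one such reading, under stated assumptions: it weights each regime's cost by a predicted regime probability, uses a point estimate of the previous impact stock, and drops the zero-mean noise term. All function and argument names are hypothetical.

```python
def step_impact(h_prev, u, phi, kappa):
    """Mean update of the impact stock for a given regime: phi * h_{t-1} + kappa * u_t."""
    return phi * h_prev + kappa * u

def predicted_cost(x_next, h_prev, u_next, pred_probs, betas, phis, kappas):
    """Probability-weighted one-step cost forecast across regimes (bps)."""
    cost = 0.0
    for p, beta, phi, kappa in zip(pred_probs, betas, phis, kappas):
        linear = sum(xi * bi for xi, bi in zip(x_next, beta))  # x' beta_z
        h_next = step_impact(h_prev, u_next, phi, kappa)       # regime-specific impact stock
        cost += p * (linear + h_next)
    return cost

# Hypothetical inputs: one feature, regimes ordered resilient, fragile, stress.
c_hat = predicted_cost(x_next=[1.0], h_prev=10.0, u_next=1.0,
                       pred_probs=[0.5, 0.5, 0.0],
                       betas=[[2.0], [4.0], [8.0]],
                       phis=[0.1, 0.5, 0.9],
                       kappas=[1.0, 2.0, 3.0])
```

With these numbers the resilient regime contributes 2 + (1 + 1) = 4 bps and the fragile regime 4 + (5 + 2) = 11 bps, so the blended forecast is 7.5 bps.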

Estimation workflow

1. Offline initialization

Use EM (or variational EM) for switching linear models.
Start with constrained priors:
- (\sigma_{stress} > \sigma_{fragile} > \sigma_{resilient})
- (\phi_{stress} > \phi_{fragile} > \phi_{resilient})
Fit per-symbol first, then cluster symbols and partially pool parameters (hierarchical shrinkage).

2. Online filtering

Run forward filter every slice:
- filtered state probs (\pi_t(j)=\Pr(z_t=j\mid \mathcal F_t))
- impact stock posterior (h_t)
Keep the latency budget strict: under 10 ms per symbol at the decision tier is realistic with a vectorized implementation.
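The discrete part of the forward filter can be sketched as a predict-then-update step. This is a simplified illustration that filters the regime only; jointly filtering the continuous impact stock in a switching model generally needs an approximation (for example collapsing the Gaussian mixture at each step, as in the Kim filter), which is not shown. Names are hypothetical.

```python
import math

def filter_step(prior, P, logliks):
    """One forward-filter step for the regime probabilities.

    prior:   filtered probabilities at t-1
    P:       transition matrix, P[i][j] = Pr(z_t = j | z_{t-1} = i)
    logliks: log-likelihood of the slice observed at t under each regime
    Returns (filtered probabilities at t, one-step-ahead predicted probabilities).
    """
    n = len(prior)
    predicted = [sum(prior[i] * P[i][j] for i in range(n)) for j in range(n)]
    m = max(logliks)  # shift log-likelihoods to avoid underflow
    unnorm = [predicted[j] * math.exp(logliks[j] - m) for j in range(n)]
    total = sum(unnorm)
    return [v / total for v in unnorm], predicted

# Hypothetical slice whose slippage is far more likely under stress (index 2).
uniform = [1.0 / 3.0] * 3
P = [[1.0 / 3.0] * 3 for _ in range(3)]
post, pred = filter_step(uniform, P, [0.0, 0.0, 5.0])
```

Working in shifted log-likelihoods keeps the update stable when one regime assigns a vanishingly small density to the observation.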

3. Drift control

Weekly re-estimation of (\beta, \sigma, \phi, \kappa)
Daily calibration check:
- probability integral transform (PIT)
- binned expected vs realized slippage
- regime occupancy drift
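The binned expected-vs-realized check can be sketched as follows, assuming equal-count bins sorted by predicted cost. The bin count and function name are arbitrary illustrative choices.

```python
def binned_calibration(predicted, realized, n_bins=5):
    """Sort slices by predicted cost, split into equal-count bins,
    and return (mean predicted, mean realized, gap) per bin, all in bps."""
    pairs = sorted(zip(predicted, realized))
    n = len(pairs)
    rows = []
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        if not chunk:
            continue  # fewer slices than bins
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        mean_real = sum(r for _, r in chunk) / len(chunk)
        rows.append((mean_pred, mean_real, mean_real - mean_pred))
    return rows

# Hypothetical data: the model under-predicts every slice by exactly 1 bp.
pred = [float(i) for i in range(1, 11)]
real = [p + 1.0 for p in pred]
rows = binned_calibration(pred, real)
```

A gap that grows with predicted cost suggests the model is miscalibrated precisely in the expensive slices, which is where a daily check is most useful.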

Execution policy overlay

Convert state probabilities into action constraints:

[ \text{AggressionScore}_t = \sum_j \pi_t(j) A_j, \quad \text{MaxPOV}_t = \sum_j \pi_t(j) V_j ]

Example policy template:

If (\pi_t(stress) > 0.45):
- halve max POV
- favor passive + midpoint pegs
- widen no-trade band for alpha-neutral flow
If (\pi_t(resilient) > 0.6) and urgency high:
- allow controlled IOC bursts
- tighten completion risk guardrails with hard stop on marginal cost slope

Key point: use soft blending by probabilities, not hard regime labels, to reduce thrashing.
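The blending formula can be illustrated in a few lines. The per-regime aggression levels and POV caps below are made-up placeholders, not recommended settings.

```python
def blend_policy(probs, aggression_by_regime, max_pov_by_regime):
    """Probability-weighted aggression score and max-POV cap."""
    score = sum(p * a for p, a in zip(probs, aggression_by_regime))
    max_pov = sum(p * v for p, v in zip(probs, max_pov_by_regime))
    return score, max_pov

# Hypothetical settings, regimes ordered resilient, fragile, stress.
A = [1.0, 0.5, 0.0]     # aggression level per regime
V = [0.20, 0.10, 0.05]  # max participation rate per regime

score, max_pov = blend_policy([0.2, 0.3, 0.5], A, V)
# A small shift in the filtered probabilities moves the cap only slightly,
# whereas a hard argmax label could flip it between 0.05 and 0.10.
_, max_pov_shifted = blend_policy([0.2, 0.4, 0.4], A, V)
```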

Backtest design (must-have)

Counterfactual simulator with queue-aware fills (not bar-level toy fills)
Walk-forward splits by month/volatility regime
Compare against:
- static Almgren-Chriss-style schedule
- non-switching state-space model
- simple POV baseline
Scorecard:
- mean slippage (bps)
- 95th-percentile tail slippage
- implementation shortfall variance
- completion risk
- regime-conditioned performance

The success criterion is not just lower mean cost; it is a lower tail and lower variance on stress days.
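A minimal sketch of the first three scorecard rows, assuming per-order slippage in bps and a nearest-rank definition of the 95th percentile (other percentile conventions would give slightly different values). Completion risk and regime-conditioned splits are omitted.

```python
import statistics

def scorecard(slippage_bps):
    """Mean, nearest-rank 95th percentile, and population variance of slippage."""
    xs = sorted(slippage_bps)
    rank = (95 * len(xs) + 99) // 100  # integer ceil(0.95 * n), 1-based rank
    return {
        "mean_bps": statistics.fmean(xs),
        "p95_bps": xs[rank - 1],
        "variance": statistics.pvariance(xs),
    }

# Hypothetical sample: 100 orders with slippage 1..100 bps.
card = scorecard(range(1, 101))
```

Running the same function separately on stress days and on all other days gives the regime-conditioned comparison the success criterion asks for.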

Production guardrails

Enforce parameter monotonicity checks before model publish.
Freeze to safe baseline if:
- filter degeneracy (single-state collapse for long window)
- live calibration error beyond threshold
- data quality incidents (stale book, missing trade prints)
Log every decision tuple:
- ((\pi_t, h_t, action, realized_cost)) for post-trade attribution.
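The monotonicity check and the freeze conditions could be wired up roughly as below. The thresholds (`level`, `max_calib_error`) and all function names are hypothetical placeholders to be set per desk, and "single-state collapse" is interpreted here as one regime holding at least `level` probability on every slice in the window.

```python
def ordering_ok(params):
    """True if parameters strictly increase in regime order (resilient, fragile, stress)."""
    return all(a < b for a, b in zip(params, params[1:]))

def single_state_collapse(prob_history, level=0.99):
    """True if the same regime dominates with probability >= level on every slice."""
    if not prob_history:
        return False
    tops = {max(range(len(p)), key=p.__getitem__) for p in prob_history}
    return len(tops) == 1 and all(max(p) >= level for p in prob_history)

def should_freeze(prob_history, calib_error, data_ok, max_calib_error=2.0):
    """Fall back to the safe baseline if any guardrail trips."""
    return (single_state_collapse(prob_history)
            or calib_error > max_calib_error
            or not data_ok)

# Hypothetical checks before publish and during live trading.
publish_ok = ordering_ok([0.2, 0.6, 0.9]) and ordering_ok([1.0, 2.5, 6.0])
stuck = [[0.995, 0.004, 0.001]] * 50
healthy = [[0.7, 0.2, 0.1], [0.3, 0.5, 0.2], [0.1, 0.3, 0.6]]
```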

Failure modes to watch

Regime aliasing: model confuses volatility spike with true liquidity fragility.
Self-impact endogeneity: your own urgency policy changes future state transitions.
Under-modeled event risk: macro headline windows break transition stationarity.
Overfitting per-symbol: no pooling leads to brittle rare-state estimates.

Mitigation: event-conditioned transitions, partial pooling, and explicit policy-simulation loop in training.

Minimal implementation checklist

Define slice schema (features, cost target, action traces)
Train 3-state switching model + impact stock
Build real-time filter endpoint
Add policy mapper (state probs -> aggression/POV caps)
Run walk-forward + stress-day diagnostics
Deploy shadow mode (no control) for 2+ weeks
Graduate to capped-control mode with rollback switch

One-line takeaway

A regime-switching state-space slippage model turns execution from “average-day optimization” into adaptive survival + cost control across liquidity states, especially where tail slippage actually hurts PnL.