Latent Liquidity Resiliency via Sweep→Refill Modeling

Date: 2026-02-27
Category: research (quant execution / slippage modeling)

Why this playbook

Top-of-book depth is an unreliable proxy for true executable liquidity. What matters in production is how fast liquidity refills after you consume it.

This playbook models slippage as a function of:

Immediate sweep cost (how far you must walk the book),
Refill speed (how quickly liquidity comes back),
Adverse continuation risk (does price keep drifting against you before refill).

The goal is to replace static depth-based heuristics with an online, regime-aware controller.

Core idea

Treat every child-order interaction as a tiny experiment:

You consume some visible liquidity (sweep),
Observe post-trade microstructure response (refill and drift),
Update a latent resiliency state,
Adjust next child-order size/urgency.

So execution policy becomes:

“Trade more when refill is healthy; throttle when refill is slow or toxic.”

Slippage decomposition

For child order (i):

[ \text{Cost}i = C{\text{sweep}, i} + C_{\text{transient}, i} + C_{\text{continuation}, i} + C_{\text{timing}, i} ]

Where:

(C_{\text{sweep}}): immediate crossing + walk-up/down cost,
(C_{\text{transient}}): temporary impact until book recovers,
(C_{\text{continuation}}): adverse markout if imbalance persists,
(C_{\text{timing}}): delay/opportunity cost from waiting.

Traditional models usually fit the first term well and under-model the middle two.

Observables to log per child order

At minimum:

decision_ts, sent_ts, ack_ts, fill_ts
side, child notional, participation-at-send
pre-trade L1/L2 depth snapshot
sweep depth consumed (levels and shares)
spread, microprice, OFI, cancel rate
refill metrics for windows ({250ms, 500ms, 1s, 2s, 5s})
markouts ({1s, 5s, 30s})

Derived metrics

Depth Recovery Ratio (DRR) [ \text{DRR}_{\Delta t} = \frac{\text{same-side depth at } t+\Delta t}{\text{depth consumed at } t} ]
Refill Half-life (\tau_{1/2})

Time until DRR reaches 0.5.

Recovery Efficiency (RE) [ \text{RE}_{\Delta t} = \frac{\text{refilled depth at same/better price}}{\text{total refilled depth}} ]
Adverse Continuation Score (ACS)

Side-adjusted post-fill drift standardized by intraday volatility.

Sweep-to-Refill Elasticity (SRE) [ \text{SRE} = \frac{\partial \text{DRR}_{1s}}{\partial \text{sweep size percentile}} ]

If SRE is strongly negative, larger clips nonlinearly damage refill quality.

A practical online model

Use a two-stage model:

Stage A — refill/resiliency nowcast

Predict:(\hat{\tau}{1/2}), (\widehat{\text{DRR}}{1s}), (\widehat{\text{RE}}_{1s})

Features:

spread, L1 imbalance, microprice slope,
cancel burst intensity,
last-N sweep sizes,
auction/session proximity,
symbol liquidity bucket and volatility regime.

Good defaults:

Gradient-boosted trees for point forecasts,
Quantile heads (q50/q90) for conservative control,
Hourly isotonic recalibration for coverage drift.

Stage B — continuation/markout risk

Predict side-adjusted 5s/30s markout conditional on Stage A outputs + flow toxicity features.

Useful formula:

[ \widehat{\text{Cost}}(q) = \widehat{C}_{\text{sweep}}(q)

w_1,f_1(\hat{\tau}_{1/2}, q)
w_2,f_2(\widehat{\text{RE}}_{1s}, q)
w_3,\widehat{\text{Markout}}_{5s}(q) ]

Then optimize child size (q) under:

residual completion deadline,
participation cap,
p95 slippage budget.

Regime state machine (operational)

Define 3 states from resiliency + continuation signals:

1) HEALTHY

fast refill (low (\tau_{1/2})),
high RE,
contained ACS.

Action:

standard POV,
moderate child size,
passive-first allowed.

2) FRAGILE

refill slowing,
RE deterioration,
mild adverse continuation.

Action:

reduce child clip by 20–40%,
widen inter-child spacing,
prefer queue positions with better cancellation shelter,
tighten max chase distance.

3) EXHAUSTED

prolonged (\tau_{1/2}),
low DRR and RE,
persistent adverse continuation.

Action:

hard throttle participation,
opportunistic-only executions,
optionally pause venue/symbol slice for cooldown,
escalate if residual completion risk breaches deadline constraints.

Add hysteresis (e.g., 2 consecutive windows to enter, 3 to exit) to avoid flapping.

“Sentinel sweep” calibration trick

To estimate live resiliency without overtrading:

Inject tiny, controlled test clips (sentinels) at low frequency,
Measure refill response cleanly,
Use sentinels as high-quality labels for Stage A recalibration.

Guardrails:

cap sentinel participation (e.g., <0.5% of interval volume),
disable during known shock windows,
track sentinel-induced cost separately.

This gives online identifiability for refill dynamics when normal flow is sparse/noisy.

Data contract (example)

create table exec_micro_events (
  event_id text primary key,
  symbol text not null,
  side text not null,
  decision_ts timestamptz not null,
  fill_ts timestamptz,
  child_qty numeric not null,
  child_notional numeric not null,
  spread_bps numeric,
  l1_imbalance numeric,
  sweep_levels int,
  swept_qty numeric,
  drr_250ms numeric,
  drr_1s numeric,
  re_1s numeric,
  tau_half_ms numeric,
  markout_5s_bps numeric,
  markout_30s_bps numeric,
  regime_label text,
  created_at timestamptz default now()
);

Keep event-time semantics strict; slippage attribution dies when clocks drift.

Validation: what to track weekly

Forecast calibration
- q50/q90 coverage for (\tau_{1/2}), DRR, markout.
Control quality
- p95 child-order slippage vs baseline,
- underfill rate delta,
- completion lateness incidents.
Stability
- regime flip frequency,
- time spent in EXHAUSTED,
- false-positive throttles (cost saved vs opportunity lost).
Robustness
- stress windows (open/close/news) sliced separately,
- liquidity-bucket stratified performance.

Failure modes and anti-footguns

Depth mirage overfitting: model trusts displayed size too much.
- Fix: prioritize refill/markout features over raw depth.
Feedback loop blindness: model ignores that your own clips change state.
- Fix: include lagged own-flow features + clip percentile.
State flapping: noisy transitions create execution chaos.
- Fix: hysteresis + minimum dwell time.
Over-throttling: great slippage, missed completion.
- Fix: dual objective with deadline-aware penalty.
Session boundary leakage: open/close dynamics contaminate normal model.
- Fix: explicit session-phase feature and per-phase calibration.

Minimal rollout plan (production)

Phase 1 (Shadow)

compute resiliency state, no execution changes.
log counterfactual recommendations.

Phase 2 (Canary 5–10%)

apply clip-throttle logic only.
hard rollback if p95 or completion SLO degrades.

Phase 3 (Scaled)

enable full state-driven controls (clip + spacing + aggression).
add per-symbol override switches.

Phase 4 (Governance)

weekly champion/challenger review,
model-risk dashboard (coverage drift + tail budget burn),
automatic demotion on persistent miscalibration.

Practical takeaway

In live execution, liquidity quality is not “how much is visible now,” but how the book heals after being touched.

Modeling sweep→refill resiliency turns slippage control from static heuristics into an adaptive control loop that better protects p95/p99 cost without blindly sacrificing fills.