Cancel-Burst Queue Fragility Slippage Playbook

Date: 2026-02-23
Category: research

Why this matters

Many execution models treat displayed depth as if it were stable for the next few seconds. In practice, during stress windows, cancel intensity spikes first, then spread widening and impact acceleration follow. If your child-order policy still trusts stale queue depth, you pay a hidden slippage tax.

This playbook turns queue-fragility signals into a live execution control loop.

Core idea

Model short-horizon fill quality as a function of:

Queue state (depth, imbalance, distance to mid)
Cancel burst pressure (state-dependent cancellation intensity)
Recent self/market impact (markout and spread transition)

Then switch execution behavior by regime:

Normal: depth trustworthy enough
Fragile: queue likely to evaporate
Toxic: adverse-selection + cancel burst; protect tails first

Practical data contract (per child-order decision)

At each decision timestamp t:

Venue, symbol, side
Best bid/ask depth (Qb1, Qa1) and 2–5 level aggregate depth
Queue imbalance I=(Qb1-Qa1)/(Qb1+Qa1)
Recent cancellation count/size (e.g., 250ms, 1s windows)
Market order pressure (signed traded volume)
Spread state (1 tick vs widened)
Short markouts (e.g., +5s, +30s for recent fills)
Fill result of recent passive quotes (fill/no-fill, fill delay)

Without this event-level logging, you cannot distinguish “no fill because no flow” from “no fill because the queue evaporated.”
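The contract above can be pinned down as a single record type. This is a minimal sketch, not a prescribed schema: every field name, the nanosecond timestamp unit, the reading of "2–5 level aggregate depth" as levels 2 through 5, and the `VENUE_A`/`SYM` values are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DecisionSnapshot:
    """One record per child-order decision. Field names are illustrative."""
    ts_ns: int                  # decision timestamp t (nanoseconds, assumed unit)
    venue: str
    symbol: str
    side: str                   # "buy" or "sell"
    qb1: float                  # best-bid displayed depth (Qb1)
    qa1: float                  # best-ask displayed depth (Qa1)
    depth_deeper: float         # aggregate depth over levels 2-5 (assumed reading)
    cancel_size_250ms: float    # cancelled size in the last 250 ms
    cancel_size_1s: float       # cancelled size in the last 1 s
    signed_volume_1s: float     # market order pressure (+ buy, - sell)
    spread_ticks: int           # 1 = tight, > 1 = widened
    # Outcome fields are unknown at decision time and back-filled later.
    filled: Optional[bool] = None
    fill_delay_ms: Optional[float] = None
    markout_5s_bps: Optional[float] = None
    markout_30s_bps: Optional[float] = None

    def imbalance(self) -> Optional[float]:
        """I = (Qb1 - Qa1) / (Qb1 + Qa1); None when both sides are empty."""
        total = self.qb1 + self.qa1
        if total <= 0:
            return None
        return (self.qb1 - self.qa1) / total


snap = DecisionSnapshot(
    ts_ns=0, venue="VENUE_A", symbol="SYM", side="buy",
    qb1=300.0, qa1=100.0, depth_deeper=2500.0,
    cancel_size_250ms=40.0, cancel_size_1s=120.0,
    signed_volume_1s=-80.0, spread_ticks=1,
)
print(snap.imbalance())
```

Keeping the outcome fields on the same record as the decision-time state is what later lets you separate a quote that saw no flow from one whose queue disappeared.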

Signal engineering

1) Cancel-Burst Index (CBI)

For side s at time t:

CBI_s(t) = cancel_rate_s(t,1s) / median(cancel_rate_s, same TOD bucket, last 20 days)

Interpretation:

CBI <= 1.2: normal
1.2 < CBI <= 2.0: elevated
CBI > 2.0: burst

Use robust medians by time-of-day bucket to avoid open/close false alarms.
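As a rough sketch of the ratio above, assuming the baseline is passed in as a list of historical 1s cancel rates for the same time-of-day bucket (the function names and the `min_baseline` floor are illustrative, not part of the playbook):

```python
from statistics import median
from typing import Sequence


def cancel_burst_index(current_rate: float,
                       baseline_rates: Sequence[float],
                       min_baseline: float = 1e-9) -> float:
    """CBI = current 1s cancel rate / median baseline rate (same TOD bucket)."""
    if not baseline_rates:
        raise ValueError("need at least one baseline observation")
    # Floor the denominator so a dead-quiet bucket cannot divide by zero.
    base = max(median(baseline_rates), min_baseline)
    return current_rate / base


def cbi_label(cbi: float) -> str:
    """Boundaries follow the regime triggers: CBI <= 1.2 counts as normal."""
    if cbi > 2.0:
        return "burst"
    if cbi > 1.2:
        return "elevated"
    return "normal"


# Hypothetical numbers: five historical rates for one TOD bucket.
history = [40.0, 50.0, 60.0, 55.0, 45.0]
cbi = cancel_burst_index(120.0, history)
print(cbi, cbi_label(cbi))
```

The median, rather than the mean, keeps one historical burst day from inflating the baseline and masking the next burst.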

2) Queue Survival Score (QSS)

Estimate the probability that front-of-queue depth survives for horizon H (e.g., 1–3s):

QSS = P(queue_not_evaporated_by_H | depth, imbalance, CBI, spread_state)

In production, keep it simple first:

Logistic model or isotonic binning over (depth_bucket, imbalance_bucket, CBI_bucket, spread_state)
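One possible starting point is a plain bucketed frequency table with shrinkage toward a prior. Note that this is neither the logistic model nor isotonic binning named above, just the simplest baseline they would be compared against. The class name, the prior of 0.5, and the prior weight are assumptions.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple

Bucket = Tuple[Hashable, ...]


class BucketedQSS:
    """Empirical survival frequency per state bucket, shrunk toward a prior."""

    def __init__(self, prior: float = 0.5, prior_weight: float = 10.0) -> None:
        self.prior = prior
        self.prior_weight = prior_weight
        # bucket -> [survived_count, total_count]
        self._counts: Dict[Bucket, List[int]] = defaultdict(lambda: [0, 0])

    def fit(self, rows: Iterable[Tuple[Bucket, bool]]) -> "BucketedQSS":
        for bucket, survived in rows:
            counts = self._counts[bucket]
            counts[0] += int(survived)
            counts[1] += 1
        return self

    def predict(self, bucket: Bucket) -> float:
        # Unseen buckets return the prior; sparse buckets are pulled toward it.
        survived, total = self._counts.get(bucket, (0, 0))
        return (survived + self.prior * self.prior_weight) / (total + self.prior_weight)


# Hypothetical training rows: (depth, imbalance, CBI, spread) buckets + outcome.
stressed = ("thin", "against", "burst", "widened")
rows = [(stressed, i < 6) for i in range(30)]   # 6 of 30 queues survived
model = BucketedQSS().fit(rows)
print(model.predict(stressed))
print(model.predict(("deep", "with", "normal", "tight")))
```

The shrinkage matters in practice because stressed buckets are exactly the ones with the fewest observations.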

3) Fill-Quality Score (FQS)

Blend fill probability and expected adverse markout:

FQS = w1 * P(fill<=H) - w2 * E[adverse_markout_30s | state]

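Use this as the per-venue, per-side quality gate. Note that w1 and w2 also set the exchange rate between a unitless fill probability and a markout (typically in bps), so calibrate them together.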
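A minimal sketch of the score and gate, assuming adverse markout is expressed as a positive cost in bps. The weights, the gate threshold, and the venue names are placeholders, not recommended values.

```python
from typing import Dict, List


def fill_quality_score(p_fill: float, adverse_markout_bps: float,
                       w1: float = 1.0, w2: float = 0.1) -> float:
    """FQS = w1 * P(fill <= H) - w2 * E[adverse_markout_30s | state]."""
    if not 0.0 <= p_fill <= 1.0:
        raise ValueError("p_fill must be a probability")
    return w1 * p_fill - w2 * adverse_markout_bps


def venues_passing_gate(scores: Dict[str, float], gate: float) -> List[str]:
    """Venues whose FQS clears the gate, best first."""
    passing = [v for v, s in scores.items() if s >= gate]
    return sorted(passing, key=lambda v: scores[v], reverse=True)


# Hypothetical inputs for one side of one symbol.
scores = {
    "VENUE_A": fill_quality_score(0.60, 2.0),   # decent fills, mild markout
    "VENUE_B": fill_quality_score(0.70, 6.0),   # more fills, but toxic
    "VENUE_C": fill_quality_score(0.50, 0.5),   # fewer fills, clean
}
print(venues_passing_gate(scores, gate=0.3))
```

In this example the venue with the highest raw fill probability fails the gate, which is the tradeoff the score is meant to expose.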

Regime state machine

State A — Normal

Trigger (all):

CBI <= 1.2 and spread stable
QSS >= 0.6

Action:

Baseline POV
Standard passive placement
Normal re-quote cadence

State B — Fragile

Trigger (any):

CBI > 1.2
QSS in [0.35, 0.6)
rising frequency of transient spread widening

Action:

Reduce passive size per clip (e.g., -20% to -40%)
Shorten quote TTL
Increase cancel/replace discipline
Shift some flow to midpoint/less exposed levels

State C — Toxic

Trigger (any):

CBI > 2.0 and QSS < 0.35
adverse markout breach (rolling p90 beyond threshold)
repeated quote fade before touch

Action:

Cap participation aggressively
Use immediacy (crossing the spread) only when inventory or risk constraints require it
Temporarily quarantine worst venues
Enforce execution budget circuit-breaker

Recovery rule:

Require hysteresis (e.g., 3–5 consecutive clean windows) before stepping down a risk state.
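The triggers and the recovery rule can be sketched as a small state machine. Several choices here are assumptions rather than part of the playbook: any state that is neither Normal nor Toxic is treated as Fragile (the triggers as written leave some combinations, such as low CBI with QSS below 0.35, unassigned), escalation is immediate, a "clean window" means a window whose raw classification is below the current state, and recovery steps down one level at a time.

```python
from enum import IntEnum


class Regime(IntEnum):
    NORMAL = 0
    FRAGILE = 1
    TOXIC = 2


def raw_regime(cbi: float, qss: float, spread_stable: bool,
               markout_breach: bool = False, quote_fade: bool = False) -> Regime:
    """Instantaneous classification from the triggers, before hysteresis."""
    if (cbi > 2.0 and qss < 0.35) or markout_breach or quote_fade:
        return Regime.TOXIC
    if cbi <= 1.2 and spread_stable and qss >= 0.6:
        return Regime.NORMAL
    return Regime.FRAGILE  # assumption: catch-all for everything in between


class RegimeTracker:
    """Escalates immediately; steps down only after N consecutive clean windows."""

    def __init__(self, clean_windows_required: int = 3) -> None:
        self.state = Regime.NORMAL
        self.required = clean_windows_required
        self._clean = 0

    def update(self, observed: Regime) -> Regime:
        if observed >= self.state:
            # Same or worse: adopt it and reset the recovery counter.
            self.state = observed
            self._clean = 0
        else:
            self._clean += 1
            if self._clean >= self.required:
                self.state = Regime(self.state - 1)  # one level per recovery
                self._clean = 0
        return self.state


tracker = RegimeTracker(clean_windows_required=3)
path = [tracker.update(r) for r in
        [Regime.TOXIC] + [Regime.NORMAL] * 6]
print([r.name for r in path])
```

With three clean windows required, a single Toxic reading takes six calm windows to unwind fully back to Normal, which is the intended asymmetry: fast to protect, slow to relax.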

Control-loop template (every 1–5 seconds)

Recompute CBI, QSS, FQS
Classify regime (Normal/Fragile/Toxic)
Apply policy knobs:
- POV multiplier
- passive clip size
- quote TTL
- venue allowlist
Log decision + realized outcomes
Weekly (offline, outside the live loop): recalibrate on prediction error and tail slippage
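The classify-then-apply step of the loop might look like the following. All multipliers and TTLs are placeholder values chosen only to fall inside the ranges mentioned earlier (for example, a 30% clip reduction in Fragile); the rule of restricting the venue allowlist only in Toxic, and the names `PolicyKnobs` and `apply_policy`, are likewise assumptions.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass(frozen=True)
class PolicyKnobs:
    pov_multiplier: float    # scales the baseline participation rate
    clip_multiplier: float   # scales the passive clip size
    quote_ttl_ms: int        # how long a passive quote may rest


# Placeholder values only; calibrate per venue and symbol.
POLICY: Dict[str, PolicyKnobs] = {
    "Normal": PolicyKnobs(1.0, 1.0, 2000),
    "Fragile": PolicyKnobs(0.8, 0.7, 750),
    "Toxic": PolicyKnobs(0.3, 0.4, 250),
}


def apply_policy(regime: str, base_pov: float, base_clip: int,
                 venue_fqs: Dict[str, float], fqs_gate: float) -> dict:
    """Translate a regime into concrete knob settings for the next window."""
    knobs = POLICY[regime]
    if regime == "Toxic":
        # Quarantine venues whose fill-quality score is below the gate.
        allowlist: Set[str] = {v for v, s in venue_fqs.items() if s >= fqs_gate}
    else:
        allowlist = set(venue_fqs)
    return {
        "pov": base_pov * knobs.pov_multiplier,
        "clip": int(round(base_clip * knobs.clip_multiplier)),
        "quote_ttl_ms": knobs.quote_ttl_ms,
        "venues": allowlist,
    }


decision = apply_policy("Toxic", base_pov=0.10, base_clip=1000,
                        venue_fqs={"VENUE_A": 0.45, "VENUE_B": 0.05},
                        fqs_gate=0.2)
print(decision)
```

Returning the full decision as one record makes the "log decision + realized outcomes" step a matter of persisting this dict alongside the later fill and markout data.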

Calibration plan (minimal but production-usable)

Weekly

Refit/refresh QSS buckets or logistic coefficients
Re-estimate FQS weights using recent fill-vs-markout tradeoff
Check by venue and TOD segment

Daily risk review

p50/p90 shortfall by regime
state occupancy (% time in Normal/Fragile/Toxic)
underfill vs slippage tradeoff after policy changes
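For the per-regime shortfall percentiles, a standard-library sketch is enough to start. The input shape (pairs of regime label and shortfall in bps) and the choice of the inclusive quantile method are assumptions.

```python
from collections import defaultdict
from statistics import quantiles
from typing import Dict, Iterable, List, Tuple


def shortfall_by_regime(fills: Iterable[Tuple[str, float]]) -> Dict[str, dict]:
    """Group shortfall (bps) by regime and report n, p50, p90 for each."""
    grouped: Dict[str, List[float]] = defaultdict(list)
    for regime, bps in fills:
        grouped[regime].append(bps)

    report: Dict[str, dict] = {}
    for regime, values in grouped.items():
        if len(values) < 2:
            continue  # quantiles() needs at least two points
        cuts = quantiles(values, n=100, method="inclusive")  # 99 cut points
        report[regime] = {"n": len(values), "p50": cuts[49], "p90": cuts[89]}
    return report


# Hypothetical day: Fragile fills are uniformly worse than Normal fills.
fills = [("Normal", float(x)) for x in range(11)]
fills += [("Fragile", float(2 * x)) for x in range(11)]
print(shortfall_by_regime(fills))
```

Reporting `n` next to each percentile guards against reading too much into a Toxic bucket that only contains a handful of fills.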

Guardrail

If model confidence collapses (e.g., under feature drift), fall back to a conservative static schedule rather than feigning precision.
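One way to make "confidence collapses" testable is a Brier skill check on recent QSS predictions: if the model no longer beats a constant base-rate forecast, switch to the static schedule. This is a swapped-in technique, not something the playbook specifies, and it detects decayed predictive skill rather than feature drift directly. The function names and the zero-skill threshold are assumptions.

```python
from typing import Sequence


def brier_score(preds: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(preds, outcomes)) / len(preds)


def should_fall_back(preds: Sequence[float], outcomes: Sequence[int],
                     min_skill: float = 0.0) -> bool:
    """True when recent predictions are no better than the base rate."""
    if not preds or len(preds) != len(outcomes):
        raise ValueError("preds and outcomes must be non-empty and aligned")
    base_rate = sum(outcomes) / len(outcomes)
    reference = brier_score([base_rate] * len(outcomes), outcomes)
    if reference == 0.0:
        return False  # all outcomes identical; skill is undefined here
    skill = 1.0 - brier_score(preds, outcomes) / reference
    return skill < min_skill


# Hypothetical recent window: 1 = queue survived, 0 = queue evaporated.
outcomes = [1, 1, 0, 0]
print(should_fall_back([0.9, 0.8, 0.2, 0.1], outcomes))  # well-ordered predictions
print(should_fall_back([0.1, 0.2, 0.8, 0.9], outcomes))  # inverted predictions
```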

Common failure modes

Averaging away regime shifts
- Fix: evaluate metrics per state and TOD, not only daily average bps.
Overfitting to one venue
- Fix: maintain venue-specific overlays; avoid global coefficients only.
No hysteresis
- Fix: add recovery confirmation windows to prevent policy thrash.
Ignoring opportunity cost
- Fix: monitor both slippage and underfill risk together.

KPI set

Primary:

Implementation shortfall (bps), p50/p90/p95
Adverse markout (5s/30s)
Fill probability within target horizon

Stability:

Regime transition count per hour (thrash indicator)
Time in Toxic state
Venue quarantine frequency/duration
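The thrash indicator is simple to compute from the logged regime sequence. A sketch, assuming one regime label is logged per control-loop tick at a fixed interval (the function name and sampling assumption are illustrative):

```python
from typing import Sequence


def transitions_per_hour(states: Sequence[str], tick_seconds: float) -> float:
    """Count regime changes between consecutive ticks, normalized per hour."""
    if len(states) < 2:
        return 0.0
    changes = sum(1 for a, b in zip(states, states[1:]) if a != b)
    hours = (len(states) - 1) * tick_seconds / 3600.0
    return changes / hours


# Hypothetical log: five samples, 15 minutes apart, covering one hour.
log = ["Normal", "Normal", "Fragile", "Fragile", "Normal"]
print(transitions_per_hour(log, tick_seconds=900))
```

A rising value after a threshold change is a direct sign that the hysteresis window is too short for the new settings.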

Efficiency:

Underfill rate
Residual urgency at end of schedule

Implementation sequence (2-week sprint)

Add event-level cancellation + fill-outcome logging
Build CBI + QSS baseline buckets
Add state machine + 3 control knobs (POV, clip size, TTL)
Backtest with historical event replay
Shadow run in paper mode
Promote with conservative limits + daily review

References to revisit

Queue-reactive microstructure modeling (state-dependent order flow and cancellations):
- Huang, Lehalle, Rosenbaum (2013), Simulating and analyzing order book data: The queue-reactive model.
Queue-reactive Hawkes extensions for order-flow dynamics:
- Wu et al. (2019), Queue-reactive Hawkes models for the order flow.
Fill probability modeling in LOB state-space:
- Recent LOB fill-probability studies (e.g., arXiv 2403.02572).
Queue position valuation and adverse selection tradeoffs:
- Moallemi et al. queue-value line of work.

(Use these as conceptual anchors; production controls must be calibrated on your own venue/symbol stack.)

One-line takeaway

When cancel intensity spikes, displayed depth becomes perishable. Treat queue survival as a first-class risk signal, or your passive execution edge will vanish exactly when you need it most.