Cancel-Burst Queue Fragility Slippage Playbook
Date: 2026-02-23 Category: research
Why this matters
Many execution models treat displayed depth as if it were stable for the next few seconds. In practice, during stress windows, cancel intensity spikes first, then spread widening and impact acceleration follow. If your child-order policy still trusts stale queue depth, you pay a hidden slippage tax.
This playbook turns queue-fragility signals into a live execution control loop.
Core idea
Model short-horizon fill quality as a function of:
- Queue state (depth, imbalance, distance to mid)
- Cancel burst pressure (state-dependent cancellation intensity)
- Recent self/market impact (markout and spread transition)
Then switch execution behavior by regime:
- Normal: depth trustworthy enough
- Fragile: queue likely to evaporate
- Toxic: adverse-selection + cancel burst; protect tails first
Practical data contract (per child-order decision)
At each decision timestamp t:
- Venue, symbol, side
- Best bid/ask depth (Qb1, Qa1) and 2–5 level aggregate depth
- Queue imbalance I = (Qb1 - Qa1) / (Qb1 + Qa1)
- Recent cancellation count/size (e.g., 250ms, 1s windows)
- Market order pressure (signed traded volume)
- Spread state (1 tick vs widened)
- Short markouts (e.g., +5s, +30s for recent fills)
- Fill result of recent passive quotes (fill/no-fill, fill delay)
Without this event-level logging, you cannot distinguish “no fill because no flow” from “no fill because the queue evaporated.”
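One way to pin the contract down is a typed record written at every decision. The sketch below is only an illustration: the class name `DecisionSnapshot` and every field name are assumptions, not an established schema, and the markout and fill fields are optional because they are only known after the fact.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DecisionSnapshot:
    """One logged row per child-order decision. All field names are illustrative."""
    ts_ns: int                    # decision timestamp t, in nanoseconds
    venue: str
    symbol: str
    side: str                     # "buy" or "sell"
    qb1: float                    # displayed depth at best bid
    qa1: float                    # displayed depth at best ask
    bid_depth_l2_5: float         # aggregate bid depth, levels 2-5
    ask_depth_l2_5: float         # aggregate ask depth, levels 2-5
    cancel_size_250ms: float      # cancelled size on our side, last 250ms
    cancel_size_1s: float         # cancelled size on our side, last 1s
    signed_volume_1s: float       # buyer-initiated minus seller-initiated volume
    spread_ticks: int             # 1 = one-tick spread, >1 = widened
    markout_5s_bps: Optional[float] = None      # filled in later for recent fills
    markout_30s_bps: Optional[float] = None
    last_quote_filled: Optional[bool] = None    # outcome of the latest passive quote
    last_fill_delay_ms: Optional[float] = None  # None when the quote did not fill

    @property
    def imbalance(self) -> float:
        """Queue imbalance I = (Qb1 - Qa1) / (Qb1 + Qa1); 0.0 for an empty book."""
        total = self.qb1 + self.qa1
        return (self.qb1 - self.qa1) / total if total > 0 else 0.0


snap = DecisionSnapshot(
    ts_ns=1, venue="VENUE_A", symbol="XYZ", side="buy",
    qb1=300.0, qa1=100.0, bid_depth_l2_5=2000.0, ask_depth_l2_5=1500.0,
    cancel_size_250ms=50.0, cancel_size_1s=120.0,
    signed_volume_1s=-80.0, spread_ticks=1,
)
print(snap.imbalance)
```

Keeping the imbalance as a derived property rather than a stored column avoids logging a value that can disagree with the raw depths it was computed from.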
Signal engineering
1) Cancel-Burst Index (CBI)
For side s at time t:
CBI_s(t) = cancel_rate_s(t,1s) / median(cancel_rate_s, same TOD bucket, last 20 days)
Interpretation:
- CBI < 1.2: normal
- 1.2–2.0: elevated
- > 2.0: burst
Use robust medians by time-of-day bucket to avoid open/close false alarms.
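A minimal sketch of the CBI calculation follows. It assumes the caller has already collected the historical 1s cancel rates for the matching time-of-day bucket; the function names and the `min_baseline` floor (which guards against a zero median in a quiet bucket) are assumptions added for illustration.

```python
from statistics import median
from typing import Sequence


def cancel_burst_index(current_rate: float,
                       baseline_rates: Sequence[float],
                       min_baseline: float = 1e-9) -> float:
    """CBI = current 1s cancel rate / median rate for the same TOD bucket."""
    if not baseline_rates:
        raise ValueError("need at least one baseline observation")
    # Floor the baseline so a zero median cannot cause a division by zero.
    base = max(median(baseline_rates), min_baseline)
    return current_rate / base


def cbi_label(cbi: float) -> str:
    """Map a CBI value onto the three interpretation bands."""
    if cbi < 1.2:
        return "normal"
    if cbi <= 2.0:
        return "elevated"
    return "burst"


# Hypothetical baseline: cancels/second in the same bucket over prior days.
baseline = [40.0, 50.0, 60.0, 55.0, 45.0]
cbi = cancel_burst_index(125.0, baseline)
print(cbi, cbi_label(cbi))
```

The median is used instead of the mean so that a single historical burst day does not inflate the baseline and mask today's burst.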
2) Queue Survival Score (QSS)
Estimate probability the front-of-queue depth survives for horizon H (e.g., 1–3s):
QSS = P(queue_not_evaporated_by_H | depth, imbalance, CBI, spread_state)
In production, keep it simple first:
- Logistic model or isotonic binning over (depth_bucket, imbalance_bucket, CBI_bucket, spread_state)
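As a starting point even simpler than a fitted logistic or isotonic model, the sketch below uses a plain bucketed frequency table with shrinkage toward a prior. This is a stand-in technique, not the model named above; the class name `BucketedQSS`, the prior values, and the bucket labels are all assumptions.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple

Bucket = Tuple[Hashable, ...]


class BucketedQSS:
    """Empirical survival frequency per bucket, shrunk toward a prior."""

    def __init__(self, prior_survive: float = 0.5, prior_weight: float = 10.0):
        self.prior_survive = prior_survive   # assumed survival rate with no data
        self.prior_weight = prior_weight     # pseudo-observations behind the prior
        self._n: Dict[Bucket, int] = defaultdict(int)  # observations per bucket
        self._k: Dict[Bucket, int] = defaultdict(int)  # survivals per bucket

    def fit(self, rows: Iterable[Tuple[Bucket, bool]]) -> "BucketedQSS":
        """rows: (bucket, survived) where survived means the queue lasted H."""
        for bucket, survived in rows:
            self._n[bucket] += 1
            self._k[bucket] += int(survived)
        return self

    def predict(self, bucket: Bucket) -> float:
        """Shrunk estimate of P(queue not evaporated by H | bucket)."""
        n = self._n.get(bucket, 0)
        k = self._k.get(bucket, 0)
        return (k + self.prior_weight * self.prior_survive) / (n + self.prior_weight)


# Bucket = (depth_bucket, imbalance_bucket, CBI_bucket, spread_state)
calm = ("deep", "balanced", "normal", "tight")
history = [(calm, True)] * 27 + [(calm, False)] * 3
model = BucketedQSS().fit(history)
print(model.predict(calm))
print(model.predict(("thin", "skewed", "burst", "wide")))  # unseen bucket
```

Shrinkage keeps sparsely observed buckets near the prior instead of reporting 0% or 100% survival from a handful of events.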
3) Fill-Quality Score (FQS)
Blend fill probability and expected adverse markout:
FQS = w1 * P(fill<=H) - w2 * E[adverse_markout_30s | state]
Use this as the quality gate for each venue and side.
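The FQS formula can be sketched directly. Note that it mixes a probability with a markout in bps, so `w2` implicitly carries units of 1/bps and has to be calibrated together with `w1`. The default weights, the gate threshold, and the venue names below are placeholder assumptions.

```python
from typing import Dict, List


def fill_quality_score(p_fill: float, adverse_markout_bps: float,
                       w1: float = 1.0, w2: float = 0.2) -> float:
    """FQS = w1 * P(fill <= H) - w2 * E[adverse markout at 30s | state]."""
    if not 0.0 <= p_fill <= 1.0:
        raise ValueError("p_fill must be a probability")
    return w1 * p_fill - w2 * adverse_markout_bps


def venues_passing_gate(scores: Dict[str, float], min_fqs: float) -> List[str]:
    """Return venues whose FQS clears the gate, best first."""
    passing = [v for v, s in scores.items() if s >= min_fqs]
    return sorted(passing, key=lambda v: scores[v], reverse=True)


# Hypothetical per-venue inputs for one side: (P(fill <= H), adverse markout in bps).
inputs = {"VENUE_A": (0.70, 1.5), "VENUE_B": (0.85, 4.0), "VENUE_C": (0.40, 0.5)}
scores = {v: fill_quality_score(p, m) for v, (p, m) in inputs.items()}
print(venues_passing_gate(scores, min_fqs=0.25))
```

In this example the venue with the highest fill probability fails the gate because its expected adverse markout outweighs the fill benefit, which is the tradeoff the score is meant to expose.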
Regime state machine
State A — Normal
Trigger:
- CBI <= 1.2 and spread stable
- QSS >= 0.6
Action:
- Baseline POV
- Standard passive placement
- Normal re-quote cadence
State B — Fragile
Trigger (any):
- CBI > 1.2
- QSS in [0.35, 0.6)
- transient spread widening frequency rising
Action:
- Reduce passive size per clip (e.g., -20% to -40%)
- Shorten quote TTL
- Increase cancel/replace discipline
- Shift some flow to midpoint/less exposed levels
State C — Toxic
Trigger (any):
- CBI > 2.0 and QSS < 0.35
- adverse markout breach (rolling p90 beyond threshold)
- repeated quote fade before touch
Action:
- Cap participation aggressively
- Use immediacy (aggressive orders) only when inventory/risk requires it
- Temporarily quarantine worst venues
- Enforce execution budget circuit-breaker
Recovery rule:
- Require hysteresis (e.g., 3–5 consecutive clean windows) before stepping down risk state.
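The triggers and the recovery rule can be sketched as a small state machine. Two choices here are assumptions rather than part of the rules above: a QSS below 0.35 without a cancel burst is treated as at least Fragile (the listed triggers leave that case unassigned), and recovery steps down one level at a time. All names are illustrative.

```python
from dataclasses import dataclass
from enum import IntEnum


class Regime(IntEnum):
    NORMAL = 0
    FRAGILE = 1
    TOXIC = 2


def raw_regime(cbi: float, qss: float, *, markout_breach: bool = False,
               quote_fade: bool = False, spread_widening: bool = False) -> Regime:
    """Classify one window from the triggers, without hysteresis."""
    if (cbi > 2.0 and qss < 0.35) or markout_breach or quote_fade:
        return Regime.TOXIC
    # Assumption: any QSS below 0.6 is at least Fragile, including QSS < 0.35
    # without a cancel burst.
    if cbi > 1.2 or qss < 0.6 or spread_widening:
        return Regime.FRAGILE
    return Regime.NORMAL


@dataclass
class RegimeTracker:
    """Escalate immediately; step down only after N consecutive clean windows."""
    clean_windows_required: int = 3
    state: Regime = Regime.NORMAL
    _clean_streak: int = 0

    def update(self, observed: Regime) -> Regime:
        if observed >= self.state:
            # Same or worse than the current state: escalate and reset the streak.
            self.state = observed
            self._clean_streak = 0
        else:
            self._clean_streak += 1
            if self._clean_streak >= self.clean_windows_required:
                self.state = Regime(self.state - 1)  # one level per recovery
                self._clean_streak = 0
        return self.state


tracker = RegimeTracker(clean_windows_required=3)
tracker.update(raw_regime(cbi=2.4, qss=0.2))             # enters TOXIC at once
for _ in range(3):
    state = tracker.update(raw_regime(cbi=0.9, qss=0.8))  # three clean windows
print(state.name)
```

Asymmetric transitions (fast up, slow down) are what prevent policy thrash when CBI oscillates around a threshold.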
Control-loop template (every 1–5 seconds)
- Recompute CBI, QSS, FQS
- Classify regime (Normal/Fragile/Toxic)
- Apply policy knobs:
  - POV multiplier
  - passive clip size
  - quote TTL
  - venue allowlist
- Log decision + realized outcomes
- Feed the logged prediction error and tail slippage into the weekly recalibration
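The "apply policy knobs" step might look like the lookup below. Every number in the `POLICY` table is a placeholder to be calibrated (the Fragile clip multiplier merely sits inside the -20% to -40% range suggested earlier), and the rule that Toxic quarantines venues below an FQS floor is one assumed way to build the allowlist.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class PolicyKnobs:
    pov_multiplier: float        # scales the baseline participation rate
    clip_size_multiplier: float  # scales the passive clip size
    quote_ttl_ms: int            # how long a passive quote may rest


# Illustrative values only; calibrate per venue/symbol.
POLICY: Dict[str, PolicyKnobs] = {
    "normal": PolicyKnobs(1.0, 1.0, 2000),
    "fragile": PolicyKnobs(0.8, 0.7, 750),
    "toxic": PolicyKnobs(0.3, 0.4, 250),
}


def apply_policy(regime: str, base_clip: int, venue_fqs: Dict[str, float],
                 quarantine_below: float = 0.0) -> dict:
    """Translate the current regime into concrete order parameters."""
    knobs = POLICY[regime]
    if regime == "toxic":
        # Quarantine venues whose fill-quality score is below the floor.
        venues = sorted(v for v, s in venue_fqs.items() if s >= quarantine_below)
    else:
        venues = sorted(venue_fqs)
    return {
        "pov_multiplier": knobs.pov_multiplier,
        "clip_size": int(round(base_clip * knobs.clip_size_multiplier)),
        "quote_ttl_ms": knobs.quote_ttl_ms,
        "venues": venues,
    }


fqs_by_venue = {"VENUE_A": 0.4, "VENUE_B": -0.1}
print(apply_policy("fragile", 1000, fqs_by_venue))
print(apply_policy("toxic", 1000, fqs_by_venue))
```

Returning the full parameter set as one record makes it easy to log the decision alongside the realized outcome, as the loop requires.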
Calibration plan (minimal but production-usable)
Weekly
- Refit/refresh QSS buckets or logistic coefficients
- Re-estimate FQS weights using recent fill-vs-markout tradeoff
- Check by venue and TOD segment
Daily risk review
- p50/p90 shortfall by regime
- state occupancy (% time in Normal/Fragile/Toxic)
- underfill vs slippage tradeoff after policy changes
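The per-regime percentiles and state occupancy can be produced from the decision log with a few lines. This sketch assumes one row per control-loop window, so occupancy is measured as a share of windows rather than wall-clock time, and it uses a nearest-rank percentile; both choices and the function names are assumptions.

```python
from collections import defaultdict
from math import ceil
from typing import Dict, Iterable, List, Tuple


def nearest_rank(values: List[float], pct: int) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of data at or below it."""
    ordered = sorted(values)
    rank = max(1, ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def review_by_regime(rows: Iterable[Tuple[str, float]]) -> Dict[str, Dict[str, float]]:
    """rows: (regime, shortfall_bps), one per control-loop window."""
    by_regime: Dict[str, List[float]] = defaultdict(list)
    for regime, shortfall_bps in rows:
        by_regime[regime].append(shortfall_bps)
    total = sum(len(vals) for vals in by_regime.values())
    return {
        regime: {
            "occupancy": len(vals) / total,
            "p50_bps": nearest_rank(vals, 50),
            "p90_bps": nearest_rank(vals, 90),
        }
        for regime, vals in by_regime.items()
    }


# Hypothetical log: ten Normal windows and two Toxic windows.
log = [("normal", float(x)) for x in range(1, 11)] + [("toxic", 20.0), ("toxic", 40.0)]
for regime, stats in review_by_regime(log).items():
    print(regime, stats)
```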
Guardrail
If model confidence collapses (feature drift), fall back to a conservative static schedule rather than pretending to precision.
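One possible proxy for "confidence collapse" is to check whether recent QSS predictions still beat a constant base-rate forecast on the Brier score. This is an outcome-based check, not a direct feature-drift test, and the function names, `min_skill`, and `min_samples` are assumptions chosen for illustration.

```python
from typing import Sequence


def brier_score(preds: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(preds, outcomes)) / len(preds)


def should_fall_back(preds: Sequence[float], outcomes: Sequence[int],
                     min_skill: float = 0.1, min_samples: int = 200) -> bool:
    """True when the model no longer beats a constant base-rate forecast by min_skill."""
    n = len(preds)
    if n != len(outcomes):
        raise ValueError("preds and outcomes must have the same length")
    if n < min_samples:
        return True  # too little evidence to trust the model: stay conservative
    base_rate = sum(outcomes) / n
    reference = brier_score([base_rate] * n, outcomes)
    return brier_score(preds, outcomes) > (1.0 - min_skill) * reference


# Hypothetical recent window: 1 = queue survived the horizon, 0 = evaporated.
outcomes = [1, 0, 1, 0]
print(should_fall_back([0.9, 0.1, 0.8, 0.2], outcomes, min_samples=4))  # informative model
print(should_fall_back([0.5, 0.5, 0.5, 0.5], outcomes, min_samples=4))  # no skill
```

Treating "not enough samples" as a reason to fall back errs on the conservative side, in line with the guardrail's intent.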
Common failure modes
Averaging away regime shifts
- Fix: evaluate metrics per state and TOD, not only daily average bps.
Overfitting to one venue
- Fix: maintain venue-specific overlays; do not rely on global coefficients alone.
No hysteresis
- Fix: add recovery confirmation windows to prevent policy thrash.
Ignoring opportunity cost
- Fix: monitor both slippage and underfill risk together.
KPI set
Primary:
- Implementation shortfall (bps), p50/p90/p95
- Adverse markout (5s/30s)
- Fill probability within target horizon
Stability:
- Regime transition count per hour (thrash indicator)
- Time in Toxic state
- Venue quarantine frequency/duration
Efficiency:
- Underfill rate
- Residual urgency at end of schedule
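The thrash indicator among the stability KPIs is simple to compute from the regime log. The sketch assumes a time-ordered sequence of (timestamp in seconds, regime label) samples; the function name is hypothetical.

```python
from typing import Sequence, Tuple


def transitions_per_hour(samples: Sequence[Tuple[float, str]]) -> float:
    """Count regime changes between consecutive samples, scaled to one hour."""
    if len(samples) < 2:
        return 0.0
    elapsed_s = samples[-1][0] - samples[0][0]
    if elapsed_s <= 0:
        return 0.0
    changes = sum(1 for (_, a), (_, b) in zip(samples, samples[1:]) if a != b)
    return changes * 3600.0 / elapsed_s


# Hypothetical 30-minute log sampled every 10 minutes.
regime_log = [(0.0, "normal"), (600.0, "fragile"), (1200.0, "fragile"), (1800.0, "normal")]
print(transitions_per_hour(regime_log))
```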
Implementation sequence (2-week sprint)
- Add event-level cancellation + fill-outcome logging
- Build CBI + QSS baseline buckets
- Add state machine + 3 control knobs (POV, clip size, TTL)
- Backtest with historical event replay
- Shadow run in paper mode
- Promote with conservative limits + daily review
References to revisit
- Queue-reactive microstructure modeling (state-dependent order flow and cancellations):
  - Huang, Lehalle, Rosenbaum (2013), Simulating and analyzing order book data: The queue-reactive model.
- Queue-reactive Hawkes extensions for order-flow dynamics:
  - Wu et al. (2019), Queue-reactive Hawkes models for the order flow.
- Fill probability modeling in LOB state-space:
  - Recent LOB fill-probability studies (e.g., arXiv 2403.02572).
- Queue position valuation and adverse selection tradeoffs:
  - Moallemi et al. queue-value line of work.
(Use these as conceptual anchors; production controls must be calibrated on your own venue/symbol stack.)
One-line takeaway
When cancel intensity spikes, displayed depth becomes perishable. Treat queue survival as a first-class risk signal, or your passive execution edge will vanish exactly when you need it most.