Cancel-Burst Queue Fragility Slippage Playbook

2026-02-23 · finance

Cancel-Burst Queue Fragility Slippage Playbook

Date: 2026-02-23 Category: research

Why this matters

Many execution models treat displayed depth as if it were stable for the next few seconds. In practice, during stress windows, cancel intensity spikes first, then spread widening and impact acceleration follow. If your child-order policy still trusts stale queue depth, you pay a hidden slippage tax.

This playbook turns queue-fragility signals into a live execution control loop.


Core idea

Model short-horizon fill quality as a function of:

  1. Queue state (depth, imbalance, distance to mid)
  2. Cancel burst pressure (state-dependent cancellation intensity)
  3. Recent self/market impact (markout and spread transition)

Then switch execution behavior by regime:


Practical data contract (per child-order decision)

At each decision timestamp t:

Without this event-level logging, you cannot distinguish “no fill because no flow” vs “no fill because queue evaporated.”


Signal engineering

1) Cancel-Burst Index (CBI)

For side s at time t:

CBI_s(t) = cancel_rate_s(t,1s) / median(cancel_rate_s, same TOD bucket, last 20 days)

Interpretation:

Use robust medians by time-of-day bucket to avoid open/close false alarms.

2) Queue Survival Score (QSS)

Estimate probability the front-of-queue depth survives for horizon H (e.g., 1–3s):

QSS = P(queue_not_evaporated_by_H | depth, imbalance, CBI, spread_state)

In production, keep it simple first:

3) Fill-Quality Score (FQS)

Blend fill probability and expected adverse markout:

FQS = w1 * P(fill<=H) - w2 * E[adverse_markout_30s | state]

Use this as the per-venue side quality gate.


Regime state machine

State A — Normal

Trigger:

Action:

State B — Fragile

Trigger (any):

Action:

State C — Toxic

Trigger (any):

Action:

Recovery rule:


Control-loop template (every 1–5 seconds)

  1. Recompute CBI, QSS, FQS
  2. Classify regime (Normal/Fragile/Toxic)
  3. Apply policy knobs:
    • POV multiplier
    • passive clip size
    • quote TTL
    • venue allowlist
  4. Log decision + realized outcomes
  5. Weekly recalibration on prediction error and tail slippage

Calibration plan (minimal but production-usable)

Weekly

Daily risk review

Guardrail

If model confidence collapses (feature drift), fallback to conservative static schedule rather than pretending precision.


Common failure modes

  1. Averaging away regime shifts

    • Fix: evaluate metrics per state and TOD, not only daily average bps.
  2. Overfitting to one venue

    • Fix: maintain venue-specific overlays; avoid global coefficients only.
  3. No hysteresis

    • Fix: add recovery confirmation windows to prevent policy thrash.
  4. Ignoring opportunity cost

    • Fix: monitor both slippage and underfill risk together.

KPI set

Primary:

Stability:

Efficiency:


Implementation sequence (2-week sprint)

  1. Add event-level cancellation + fill-outcome logging
  2. Build CBI + QSS baseline buckets
  3. Add state machine + 3 control knobs (POV, clip size, TTL)
  4. Backtest with historical event replay
  5. Shadow run in paper mode
  6. Promote with conservative limits + daily review

References to revisit

(Use these as conceptual anchors; production controls must be calibrated on your own venue/symbol stack.)


One-line takeaway

When cancel intensity spikes, displayed depth becomes perishable. Treat queue survival as a first-class risk signal, or your passive execution edge will vanish exactly when you need it most.