Cancel-on-Disconnect Trigger Ambiguity Slippage Playbook

2026-03-30 · finance

Scope: How Cancel-on-Disconnect (COD) behavior, session-health ambiguity, and reconnect timing create hidden execution cost via forced cancels, stale intent, and panic re-entry.

1) Why this matters

Most desks treat disconnects as pure reliability incidents. In production, they are often execution incidents.

If COD is enabled, a transport/session failure can trigger exchange-side cancel waves. That can:

  1. erase queue position,
  2. invalidate child-order intent,
  3. force urgent re-entry at worse prices,
  4. inflate tail implementation shortfall (IS).

The cost is easy to misclassify as "market volatility" unless COD state is modeled explicitly.

2) Mechanism: where slippage appears

Typical failure chain:

  1. Session health degrades (heartbeat miss, abnormal socket drop, gateway reset, policy logout).
  2. Venue/market COD policy triggers (or partially triggers).
  3. Open passive inventory is canceled (with venue/session-specific exceptions).
  4. Parent schedule falls behind while order state is uncertain.
  5. Router over-corrects after reconnect (catch-up burst / forced crossing).
  6. Residual slippage shows up as a late-urgency tax on the parent order.

Key asymmetry: cancel finality and visibility are not perfectly synchronized. You may know the session dropped before you know exactly which child orders remain active.

3) External rule facts to encode (not memorize)

Operational implication: COD semantics are venue- and session-policy specific. Never treat COD as a single boolean.
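
One way to avoid the single-boolean trap is to carry COD policy as structured per-session configuration. The sketch below is illustrative only: the scope values, the grace-period field, and the exempt order types are hypothetical placeholders, not any venue's actual rule set, and must be replaced with what the venue documentation says.

```python
from dataclasses import dataclass
from enum import Enum


class CodScope(Enum):
    # Hypothetical scope levels; real venues define their own.
    NONE = "none"              # COD disabled for this session
    DAY_ORDERS = "day_orders"  # only day orders are canceled
    ALL_ORDERS = "all_orders"  # every resting order is canceled


@dataclass(frozen=True)
class CodPolicy:
    venue: str
    session_id: str
    scope: CodScope
    trigger_after_s: float         # assumed grace period before COD fires
    exempt_order_types: frozenset  # order types assumed to survive COD


def expected_cancelable(policy, order_type, time_in_force):
    """Return True if this policy is expected to cancel the order on disconnect."""
    if policy.scope is CodScope.NONE:
        return False
    if order_type in policy.exempt_order_types:
        return False
    if policy.scope is CodScope.DAY_ORDERS:
        return time_in_force == "DAY"
    return True
```

Keeping the expectation in one function also gives the Cancel Coverage Gap metric a single, auditable source for expected_cancelable_qty.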

4) Slippage decomposition with COD component

Let:

Decompose:

IS_total = IS_base + IS_COD + ε

Model COD component as:

IS_COD = C_reset + C_uncertain + C_reentry + C_backlog

Where:

This decomposition avoids blaming all post-reconnect cost on generic market regime.
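
A minimal bookkeeping sketch of the decomposition, assuming all terms are expressed in bps of parent notional. The field comments reflect one plausible reading of the component names (queue reset, state uncertainty, re-entry, backlog); they are an interpretation, not definitions taken from the text.

```python
from dataclasses import dataclass


@dataclass
class CodCost:
    c_reset: float      # assumed: cost of lost queue position
    c_uncertain: float  # assumed: cost of trading while order state is unknown
    c_reentry: float    # assumed: cost of re-entering at worse prices
    c_backlog: float    # assumed: cost of catching up a lagging schedule

    def total(self):
        """IS_COD as the sum of the four components."""
        return self.c_reset + self.c_uncertain + self.c_reentry + self.c_backlog


def residual_bps(is_total, is_base, cod):
    """Residual term: whatever IS_total leaves after IS_base and IS_COD."""
    return is_total - is_base - cod.total()
```

A persistently large residual on COD episodes suggests either a mis-specified baseline or a missing COD component.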

5) Core metrics (production-ready)

1) COD Trigger Latency (CTL)

Time from first hard session-health failure signal to first confirmed COD-related cancel event.
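
As a sketch, CTL can be computed from two event streams, assuming timestamps are in seconds on a common clock and that cancel confirmations have already been attributed to COD upstream (that attribution step is not shown).

```python
def cod_trigger_latency(failure_ts, cancel_confirm_ts):
    """Seconds from the first hard failure signal to the first confirmed
    COD-related cancel at or after it. Returns None if no such cancel exists."""
    later = [t for t in cancel_confirm_ts if t >= failure_ts]
    return min(later) - failure_ts if later else None
```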

2) Cancel Coverage Gap (CCG)

CCG = 1 - (confirmed_canceled_qty / expected_cancelable_qty)

High CCG means your cancellation expectation model is wrong (policy exceptions, in-flight uncertainty, stale state).
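
A direct translation of the formula, with one added assumption: when nothing was expected to be cancelable, the metric is reported as undefined (None) rather than zero.

```python
def cancel_coverage_gap(confirmed_canceled_qty, expected_cancelable_qty):
    """CCG = 1 - confirmed / expected. None when nothing was expected to cancel."""
    if expected_cancelable_qty <= 0:
        return None
    return 1.0 - confirmed_canceled_qty / expected_cancelable_qty
```

For example, 750 shares confirmed canceled against 1,000 expected gives a CCG of 0.25.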

3) In-Flight Survivorship Ratio (ISR)

Fraction of orders sent pre-disconnect that remain active, or fill, after the assumed COD window has closed.
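
A minimal sketch of ISR, assuming each child order record carries a send time, an optional terminal-state time, and an optional last-fill time. The ChildOrder fields are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChildOrder:
    sent_ts: float
    closed_ts: Optional[float]     # time the order reached a terminal state; None if still open
    last_fill_ts: Optional[float]  # time of the most recent fill; None if never filled


def in_flight_survivorship_ratio(orders, disconnect_ts, cod_window_end_ts):
    """Share of pre-disconnect orders that are still open, or fill, after the COD window."""
    pre = [o for o in orders if o.sent_ts < disconnect_ts]
    if not pre:
        return None

    def survived(o):
        still_active = o.closed_ts is None or o.closed_ts > cod_window_end_ts
        late_fill = o.last_fill_ts is not None and o.last_fill_ts > cod_window_end_ts
        return still_active or late_fill

    return sum(survived(o) for o in pre) / len(pre)
```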

4) Reconnect Stabilization Lag (RSL)

Time from transport reconnect to trustworthy order-state convergence (open_qty_truth confidence > threshold).
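
One simple way to operationalize RSL, assuming a reconciliation process emits (timestamp, confidence) samples for open-quantity truth. This version takes the first sample above the threshold; a stricter variant would require confidence to stay above the threshold for a hold period.

```python
def reconnect_stabilization_lag(reconnect_ts, confidence_samples, threshold=0.99):
    """Seconds from reconnect to the first post-reconnect sample whose
    confidence exceeds the threshold. None if that never happens."""
    for ts, conf in sorted(confidence_samples):
        if ts >= reconnect_ts and conf > threshold:
            return ts - reconnect_ts
    return None
```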

5) Catch-up Convexity Penalty (CCP)

Incremental bps paid per unit backlog removed during post-disconnect recovery window.

6) COD Episode Tail Lift (CETL)

p95(IS | COD episode) - p95(IS | matched non-COD controls)

This is your practical board-level KPI.
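
A small sketch of CETL using a nearest-rank percentile, assuming IS is recorded in bps per parent order and that the control sample has already been matched. The percentile convention is a choice made here for transparency; any consistent convention works as long as both samples use the same one.

```python
import math


def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of data at or below it."""
    if not values:
        raise ValueError("percentile of an empty sample is undefined")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def cod_episode_tail_lift(cod_is_bps, control_is_bps, pct=95):
    """p95(IS | COD episode) minus p95(IS | matched controls), in bps."""
    return (percentile_nearest_rank(cod_is_bps, pct)
            - percentile_nearest_rank(control_is_bps, pct))
```

With few episodes the p95 estimate is noisy, so it is worth reporting the sample sizes next to the number.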

6) State machine (must be explicit)

Use a deterministic execution state machine:

Transition conditions must be data-driven (heartbeat misses, socket events, COD flags, cancel confirms, state confidence).
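
A minimal sketch of such a state machine follows. The state names, the Signals fields, and the thresholds are all hypothetical, chosen only to show transitions driven by the inputs named above; a production version would define its own states and guards.

```python
from dataclasses import dataclass
from enum import Enum, auto


class ExecState(Enum):
    # Hypothetical states for illustration.
    NORMAL = auto()
    DEGRADED = auto()
    COD_ACTIVE = auto()
    RECONCILING = auto()
    RECOVERY = auto()


@dataclass
class Signals:
    heartbeat_misses: int = 0
    socket_up: bool = True
    cod_flag: bool = False
    cancels_confirmed: bool = False
    state_confidence: float = 1.0
    backlog_qty: int = 0


def next_state(state, s, miss_limit=2, conf_threshold=0.99):
    """Pure transition function: same state and signals always give the same result."""
    if state is ExecState.NORMAL:
        if not s.socket_up or s.heartbeat_misses >= miss_limit:
            return ExecState.DEGRADED
        return state
    if state is ExecState.DEGRADED:
        if s.cod_flag:
            return ExecState.COD_ACTIVE
        if s.socket_up and s.heartbeat_misses == 0:
            return ExecState.NORMAL
        return state
    if state is ExecState.COD_ACTIVE:
        # Wait for both the transport and the cancel confirms before reconciling.
        if s.socket_up and s.cancels_confirmed:
            return ExecState.RECONCILING
        return state
    if state is ExecState.RECONCILING:
        if not s.socket_up:
            return ExecState.DEGRADED
        if s.state_confidence > conf_threshold:
            return ExecState.RECOVERY
        return state
    # RECOVERY: work off the backlog, then return to normal trading.
    if not s.socket_up:
        return ExecState.DEGRADED
    if s.backlog_qty == 0:
        return ExecState.NORMAL
    return state
```

Keeping the transition function pure makes it straightforward to replay recorded incidents through it and check that the router would have behaved as intended.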

7) Data you need (minimum schema)

Per child order:

Per episode:

Without this lineage, COD cost cannot be estimated reliably.

8) Identification strategy (causal, not just correlation)

  1. Episode labeling
    • Build COD candidate episodes from session-health + unsolicited cancel patterns.
  2. Matched controls
    • Match by symbol liquidity bucket, time-of-day, spread/volatility regime, parent urgency.
  3. Difference-in-differences
    • Compare slippage drift pre/post incident between COD and control episodes.
  4. Backlog-path mediation
    • Quantify how much tail cost is mediated by recovery backlog, not just initial disconnect.
  5. Heterogeneity cuts
    • COD level/policy, venue type, order type mix, parent horizon.
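
The difference-in-differences step can be sketched as a comparison of mean slippage changes, assuming slippage is in bps and that controls are already matched and aligned to a pseudo-incident time. This is the simplest two-by-two estimator; a regression with covariates would be the natural next step.

```python
from statistics import mean


def diff_in_diff(cod_pre, cod_post, ctl_pre, ctl_post):
    """(post - pre) change for COD episodes minus the same change for controls."""
    return (mean(cod_post) - mean(cod_pre)) - (mean(ctl_post) - mean(ctl_pre))
```

For example, if COD episodes drift from 3 to 9 bps while controls drift from 3 to 4 bps, the estimate attributes 5 bps to the incident.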

9) Control policy design

A. Pre-incident hardening

B. Incident mode

C. Recovery mode

10) Practical rollout (small team, 2 weeks)

Week 1

  1. Add COD-aware event schema + episode labeling.
  2. Build baseline dashboard (CTL, CCG, ISR, RSL, CETL).
  3. Encode state machine and safe fallback router profile.

Week 2

  1. Add matched-control analytics and COD cost decomposition.
  2. Canary on limited symbols/notional.
  3. Define promotion gates:
    • CETL down,
    • no increase in completion failures,
    • no unresolved state ambiguity beyond SLA.
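
The three promotion gates can be written as a single check, sketched here under the assumption that CETL is in bps, completion failures are a rate, and state ambiguity is measured as the longest unresolved duration in seconds. Parameter names are illustrative.

```python
def promotion_ok(cetl_before_bps, cetl_after_bps,
                 fail_rate_before, fail_rate_after,
                 max_unresolved_s, sla_s):
    """True only if all three canary gates pass."""
    return (cetl_after_bps < cetl_before_bps         # CETL down
            and fail_rate_after <= fail_rate_before  # no increase in completion failures
            and max_unresolved_s <= sla_s)           # ambiguity resolved within SLA
```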

11) Common mistakes

  1. Treating COD as a binary flag independent of venue/session policy level.
  2. Assuming "disconnect -> all canceled" is always true.
  3. Ignoring the risk that cancels of in-flight orders are not guaranteed.
  4. Letting reconnect trigger panic catch-up without confidence gating.
  5. Measuring only average IS (missing p95/p99 COD tails).

12) What good looks like

A mature stack can answer, for every disconnect episode:

If you cannot answer these four questions, COD is still a hidden slippage tax.
