Cancel-on-Disconnect Trigger Ambiguity Slippage Playbook
Scope: How Cancel-on-Disconnect (COD) behavior, session-health ambiguity, and reconnect timing create hidden execution cost via forced cancels, stale intent, and panic re-entry.
1) Why this matters
Most desks treat disconnects as pure reliability incidents. In production, they are often execution incidents.
If COD is enabled, a transport/session failure can trigger exchange-side cancel waves. That can:
- erase queue position,
- invalidate child-order intent,
- force urgent re-entry at worse prices,
- inflate tail implementation shortfall (IS).
The cost is easy to misclassify as "market volatility" unless COD state is modeled explicitly.
2) Mechanism: where slippage appears
Typical failure chain:
- Session health degrades (heartbeat miss, abnormal socket drop, gateway reset, policy logout).
- Venue/market COD policy triggers (or partially triggers).
- Open passive inventory is canceled (with venue/session-specific exceptions).
- Parent schedule falls behind while order state is uncertain.
- Router over-corrects after reconnect (catch-up burst / forced crossing).
- Residual slippage appears as late urgency tax.
Key asymmetry: cancel finality and visibility are not perfectly synchronized. You may know the session dropped before you know exactly which child orders remain active.
3) External rule facts to encode (not memorize)
- TMX (TSX/TSXV/TSX Alpha): COD is configurable by level and can trigger on inactivity or disconnect; on trigger, the session bundle is blocked and the venue attempts to cancel open orders, with explicit exceptions (e.g., GTC/GTD, MOC/LOC, state-constrained cases).
- Coinbase FIX: logon supports CancelOrdersOnDisconnect behavior (S: session, Y: profile). Docs explicitly note that, like COD, some in-flight orders not yet acknowledged are not guaranteed to be canceled.
- CME Globex FAQ: during port closure events, working-order outcome depends on COD enablement; if COD is not enabled, alternate kill/cancel paths are needed.
Operational implication: COD semantics are venue- and session-policy specific. Never treat COD as a single boolean.
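One way to avoid the single-boolean trap is a small per-venue policy registry. The sketch below is illustrative only: `CodScope`, `CodPolicy`, `REGISTRY`, and the `VENUE_A`/`VENUE_B` entries are hypothetical names and placeholder values, not rules taken from any venue specification. Real exemptions must be loaded from each venue's current documentation.

```python
from dataclasses import dataclass, field
from enum import Enum


class CodScope(Enum):
    NONE = "none"        # COD not enabled for this session
    SESSION = "session"  # cancels orders placed on the dropped session
    PROFILE = "profile"  # cancels orders across the whole profile


@dataclass(frozen=True)
class CodPolicy:
    venue: str
    scope: CodScope
    # Attributes assumed to survive a COD trigger (placeholder values).
    exempt_tifs: frozenset = field(default_factory=frozenset)
    exempt_order_types: frozenset = field(default_factory=frozenset)

    def expected_cancelable(self, tif: str, order_type: str) -> bool:
        """True if this policy is expected to cancel an order with these attributes."""
        if self.scope is CodScope.NONE:
            return False
        return (tif not in self.exempt_tifs
                and order_type not in self.exempt_order_types)


# Placeholder entries for illustration; not authoritative venue rules.
REGISTRY = {
    "VENUE_A": CodPolicy(
        "VENUE_A", CodScope.SESSION,
        exempt_tifs=frozenset({"GTC", "GTD"}),
        exempt_order_types=frozenset({"MOC", "LOC"}),
    ),
    "VENUE_B": CodPolicy("VENUE_B", CodScope.NONE),
}
```

Summing order quantity where `expected_cancelable` is true gives the `expected_cancelable_qty` denominator used by the Cancel Coverage Gap metric in section 5.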
4) Slippage decomposition with COD component
Let:
- IS_total: implementation shortfall (bps)
- IS_base: baseline model cost (spread + impact + timing)
- IS_COD: additional COD-driven component
Decompose:
IS_total = IS_base + IS_COD + ε
Model COD component as:
IS_COD = C_reset + C_uncertain + C_reentry + C_backlog
Where:
- C_reset: queue-priority loss after forced cancels
- C_uncertain: cost while live-order truth is ambiguous
- C_reentry: urgency premium paid to regain schedule
- C_backlog: convex catch-up cost near deadline
This decomposition avoids blaming all post-reconnect cost on generic market regime.
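As a minimal sketch of the bookkeeping, the decomposition can be held in a small record so the residual ε is computed rather than assumed. The class and function names are hypothetical, and the numbers are made-up inputs in bps, not measured results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodCost:
    c_reset: float      # queue-priority loss after forced cancels (bps)
    c_uncertain: float  # cost while live-order truth is ambiguous (bps)
    c_reentry: float    # urgency premium paid to regain schedule (bps)
    c_backlog: float    # convex catch-up cost near deadline (bps)

    @property
    def is_cod(self) -> float:
        """IS_COD = C_reset + C_uncertain + C_reentry + C_backlog."""
        return self.c_reset + self.c_uncertain + self.c_reentry + self.c_backlog


def residual(is_total: float, is_base: float, cod: CodCost) -> float:
    """Unexplained remainder: epsilon = IS_total - IS_base - IS_COD."""
    return is_total - is_base - cod.is_cod


# Illustrative episode with made-up values.
episode = CodCost(c_reset=1.5, c_uncertain=0.5, c_reentry=2.0, c_backlog=1.0)
eps = residual(is_total=12.0, is_base=6.0, cod=episode)
```

A residual that is persistently large relative to IS_COD suggests the four components are not capturing the episode, or that IS_base is mis-specified for the regime.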
5) Core metrics (production-ready)
1) COD Trigger Latency (CTL)
Time from first hard session-health failure signal to first confirmed COD-related cancel event.
2) Cancel Coverage Gap (CCG)
CCG = 1 - (confirmed_canceled_qty / expected_cancelable_qty)
High CCG means your cancellation expectation model is wrong (policy exceptions, in-flight uncertainty, stale state).
3) In-Flight Survivorship Ratio (ISR)
Fraction of orders sent pre-disconnect that remain active or fill after the assumed COD window has closed.
4) Reconnect Stabilization Lag (RSL)
Time from transport reconnect to trustworthy order-state convergence (open_qty_truth confidence > threshold).
5) Catch-up Convexity Penalty (CCP)
Incremental bps paid per unit backlog removed during post-disconnect recovery window.
6) COD Episode Tail Lift (CETL)
p95(IS | COD episode) - p95(IS | matched non-COD controls)
This is your practical board-level KPI.
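Three of these metrics reduce to a few lines each. The sketch below assumes quantities and per-episode IS values have already been extracted; the function names are hypothetical, and the p95 uses the standard library's inclusive quantile method, which is one of several reasonable percentile conventions.

```python
import statistics


def cancel_coverage_gap(confirmed_canceled_qty: float,
                        expected_cancelable_qty: float) -> float:
    """CCG = 1 - confirmed / expected; 0.0 when nothing was expected to cancel."""
    if expected_cancelable_qty <= 0:
        return 0.0
    return 1.0 - confirmed_canceled_qty / expected_cancelable_qty


def in_flight_survivorship(survived_flags: list[bool]) -> float:
    """ISR: share of pre-disconnect orders still active or filled after the COD window."""
    if not survived_flags:
        return 0.0
    return sum(survived_flags) / len(survived_flags)


def p95(values: list[float]) -> float:
    # quantiles(n=20) returns 19 cut points; the last is the 95th percentile.
    return statistics.quantiles(values, n=20, method="inclusive")[-1]


def cod_episode_tail_lift(cod_is_bps: list[float],
                          control_is_bps: list[float]) -> float:
    """CETL = p95(IS | COD episode) - p95(IS | matched non-COD controls)."""
    return p95(cod_is_bps) - p95(control_is_bps)
```

With few COD episodes the p95 estimate is noisy, so it is worth reporting the episode count alongside CETL.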
6) State machine (must be explicit)
Use a deterministic execution state machine:
- HEALTHY: normal routing
- DISCONNECT_SUSPECT: heartbeat/session anomalies; freeze aggressive expansions
- COD_POSSIBLE: probable COD trigger; suppress new passive staging until truth improves
- RECONCILING: rebuild live-order truth from exchange acks/drop-copy/recovery queries
- RECOVERY_CONTROLLED: paced re-entry with backlog caps
- HEALTHY_RESTORED: return only after stability criteria
Transition conditions must be data-driven (heartbeat misses, socket events, COD flags, cancel confirms, state confidence).
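A minimal sketch of such a state machine follows. The `Signals` fields, the thresholds, and the specific transition rules are assumptions chosen for illustration; a production version would derive them from venue behavior and observed incident data. Here HEALTHY_RESTORED is treated like HEALTHY for the purpose of detecting a new degradation.

```python
from dataclasses import dataclass
from enum import Enum, auto


class ExecState(Enum):
    HEALTHY = auto()
    DISCONNECT_SUSPECT = auto()
    COD_POSSIBLE = auto()
    RECONCILING = auto()
    RECOVERY_CONTROLLED = auto()
    HEALTHY_RESTORED = auto()


@dataclass(frozen=True)
class Signals:
    missed_heartbeats: int = 0
    socket_dropped: bool = False
    unsolicited_cancels: int = 0
    reconnected: bool = False
    state_confidence: float = 1.0  # 0..1 confidence in live-order truth
    backlog_frac: float = 0.0      # share of parent schedule still behind


def next_state(state: ExecState, s: Signals,
               conf_threshold: float = 0.95,
               backlog_done: float = 0.02) -> ExecState:
    """One deterministic transition step; thresholds are illustrative."""
    hard_failure = s.socket_dropped or s.unsolicited_cancels > 0
    if state in (ExecState.HEALTHY, ExecState.HEALTHY_RESTORED):
        if hard_failure:
            return ExecState.COD_POSSIBLE
        if s.missed_heartbeats >= 1:
            return ExecState.DISCONNECT_SUSPECT
    elif state is ExecState.DISCONNECT_SUSPECT:
        if hard_failure:
            return ExecState.COD_POSSIBLE
        if s.missed_heartbeats == 0:
            return ExecState.HEALTHY
    elif state is ExecState.COD_POSSIBLE:
        if s.reconnected:
            return ExecState.RECONCILING
    elif state is ExecState.RECONCILING:
        if s.state_confidence >= conf_threshold:
            return ExecState.RECOVERY_CONTROLLED
    elif state is ExecState.RECOVERY_CONTROLLED:
        if s.socket_dropped:
            return ExecState.COD_POSSIBLE
        if s.backlog_frac <= backlog_done and s.state_confidence >= conf_threshold:
            return ExecState.HEALTHY_RESTORED
    return state  # no transition condition met
```

Keeping the transition function pure (state and signals in, state out) makes it straightforward to replay recorded incidents through it and check that the router would have behaved as intended.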
7) Data you need (minimum schema)
Per child order:
- parent_id, child_id, symbol, side, qty, price
- decision/send timestamps
- exchange ack/reject/cancel/fill timestamps
- session id, gateway id, connection state snapshots
- COD policy flags active at order time (session/profile level)
- cancel reason / unsolicited indicator fields (if venue provides)
- reconnect cycle id + reconciliation confidence score
Per episode:
- first anomaly timestamp
- COD trigger timestamp proxy
- reconnect start/end
- backlog path vs target schedule
- final slippage attribution breakdown
Without this lineage, COD cost cannot be estimated reliably.
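The per-child schema above could be captured as a typed record along these lines. Field names, the nanosecond timestamp convention, and the `in_flight_at` helper are assumptions for illustration; optional fields default to `None` because not every venue supplies them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChildOrderRecord:
    parent_id: str
    child_id: str
    symbol: str
    side: str
    qty: float
    price: float
    decision_ts_ns: int
    send_ts_ns: int
    session_id: str
    gateway_id: str
    cod_scope: str                        # COD policy flag active at order time
    ack_ts_ns: Optional[int] = None
    reject_ts_ns: Optional[int] = None
    cancel_ts_ns: Optional[int] = None
    fill_ts_ns: Optional[int] = None
    cancel_reason: Optional[str] = None
    unsolicited_cancel: Optional[bool] = None   # None if venue gives no indicator
    reconnect_cycle_id: Optional[int] = None
    reconcile_confidence: Optional[float] = None

    def in_flight_at(self, ts_ns: int) -> bool:
        """Sent but not yet acknowledged at ts_ns: the orders COD may miss."""
        sent = self.send_ts_ns <= ts_ns
        unacked = self.ack_ts_ns is None or self.ack_ts_ns > ts_ns
        return sent and unacked
```

Evaluating `in_flight_at` at the first-anomaly timestamp gives a direct list of the child orders whose cancel outcome is least certain.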
8) Identification strategy (causal, not just correlation)
Episode labeling
- Build COD candidate episodes from session-health + unsolicited cancel patterns.
Matched controls
- Match by symbol liquidity bucket, time-of-day, spread/volatility regime, parent urgency.
Difference-in-differences
- Compare slippage drift pre/post incident between COD and control episodes.
Backlog-path mediation
- Quantify how much tail cost is mediated by recovery backlog, not just initial disconnect.
Heterogeneity cuts
- COD level/policy, venue type, order type mix, parent horizon.
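The difference-in-differences step can be sketched in a few lines. This is a deliberately bare version using simple means: it assumes controls are already matched, assumes parallel pre-incident trends, and produces no standard errors. The function name and the sample numbers are hypothetical.

```python
from statistics import mean


def did_estimate(cod_pre: list[float], cod_post: list[float],
                 ctl_pre: list[float], ctl_post: list[float]) -> float:
    """(post - pre) for COD episodes minus (post - pre) for matched controls, in bps."""
    cod_drift = mean(cod_post) - mean(cod_pre)
    ctl_drift = mean(ctl_post) - mean(ctl_pre)
    return cod_drift - ctl_drift


# Made-up slippage samples (bps) around the incident timestamp.
effect = did_estimate(
    cod_pre=[2.0, 4.0], cod_post=[8.0, 10.0],
    ctl_pre=[2.0, 4.0], ctl_post=[3.0, 5.0],
)
```

Here the COD episodes drift by 6 bps and the controls by 1 bps, so the estimate attributes 5 bps to the incident. Running the same estimate within each heterogeneity cut shows where the effect concentrates.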
9) Control policy design
A. Pre-incident hardening
- Keep venue/session-specific COD policy registry versioned in code.
- Simulate heartbeat-loss and abnormal-disconnect scenarios in paper trading and game-day exercises.
- Enforce dual-source order truth (order-entry + drop-copy/recovery channel).
B. Incident mode
- Enter COD_POSSIBLE quickly; stop naive catch-up.
- Cap recovery aggression by observed RSL and confidence score.
- Prefer paced backlog reduction over immediate full crossing.
C. Recovery mode
- Ramp participation with convex caps:
- low confidence -> low cap,
- confidence improving -> gradual cap increase,
- stable truth -> normalize.
- Auto-block strategy if CETL and CCG breach hard thresholds.
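The recovery ramp and the auto-block rule might look like the sketch below. The cap levels, the confidence floor, the quadratic shape, and the threshold values are all placeholder assumptions to be calibrated per strategy; only the overall shape (low cap at low confidence, gradual increase, normalize when truth is stable) comes from the policy above.

```python
def participation_cap(confidence: float, base_cap: float = 0.10,
                      floor_cap: float = 0.01, conf_floor: float = 0.5,
                      power: float = 2.0) -> float:
    """Convex ramp from floor_cap to base_cap as confidence rises from conf_floor to 1."""
    if confidence <= conf_floor:
        return floor_cap
    # Normalize confidence into 0..1 above the floor, then apply a convex curve
    # so the cap stays low until confidence is well established.
    x = min(1.0, (confidence - conf_floor) / (1.0 - conf_floor))
    return floor_cap + (base_cap - floor_cap) * x ** power


def should_auto_block(cetl_bps: float, ccg: float,
                      cetl_limit_bps: float = 15.0,
                      ccg_limit: float = 0.2) -> bool:
    """Block only when both CETL and CCG breach their hard thresholds."""
    return cetl_bps > cetl_limit_bps and ccg > ccg_limit
```

With these defaults, a confidence of 0.75 yields a participation cap of 3.25%, roughly a third of the normal 10%, which keeps re-entry paced while reconciliation is still converging.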
10) Practical rollout (small team, 2 weeks)
Week 1
- Add COD-aware event schema + episode labeling.
- Build baseline dashboard (CTL, CCG, ISR, RSL, CETL).
- Encode state machine and safe fallback router profile.
Week 2
- Add matched-control analytics and COD cost decomposition.
- Canary on limited symbols/notional.
- Define promotion gates:
- CETL down,
- no increase in completion failures,
- no unresolved state ambiguity beyond SLA.
11) Common mistakes
- Treating COD as a binary flag independent of venue/session policy level.
- Assuming "disconnect -> all canceled" is always true.
- Ignoring in-flight non-guaranteed cancel risk.
- Letting reconnect trigger panic catch-up without confidence gating.
- Measuring only average IS (missing p95/p99 COD tails).
12) What good looks like
A mature stack can answer, for every disconnect episode:
- Did COD likely trigger?
- Which child orders were truly canceled, and which survived or filled late?
- How much slippage came from queue reset vs uncertainty vs catch-up convexity?
- Did policy reduce tail cost vs matched controls?
If you cannot answer these four questions, COD is still a hidden slippage tax.
References
- TMX (TSX/TSXV/TSX Alpha), Cancel on Disconnect overview and trigger-level behavior:
https://www.tsx.com/en/trading/products-and-services/cancel-on-disconnect
- Coinbase Exchange FIX Order Entry Messages 5.0 (incl. CancelOrdersOnDisconnect, mass-cancel caveat):
https://docs.cdp.coinbase.com/exchange/fix-api/order-entry-messages/order-entry-messages5
- CME Group, FAQ: Port Closure (COD dependency for working-order outcome):
https://www.cmegroup.com/solutions/market-access/globex/trade-on-globex/about-the-global-command-center/faq-port-closure.html