Cancel-on-Disconnect Trigger Ambiguity Slippage Playbook

Scope: How Cancel-on-Disconnect (COD) behavior, session-health ambiguity, and reconnect timing create hidden execution cost via forced cancels, stale intent, and panic re-entry.

1) Why this matters

Most desks treat disconnects as pure reliability incidents. In production, they are often execution incidents.

If COD is enabled, a transport/session failure can trigger exchange-side cancel waves. That can:

erase queue position,
invalidate child-order intent,
force urgent re-entry at worse prices,
inflate tail implementation shortfall (IS).

The cost is easy to misclassify as "market volatility" unless COD state is modeled explicitly.

2) Mechanism: where slippage appears

Typical failure chain:

Session health degrades (heartbeat miss, abnormal socket drop, gateway reset, policy logout).
Venue/market COD policy triggers (or partially triggers).
Open passive inventory is canceled (with venue/session-specific exceptions).
Parent schedule falls behind while order state is uncertain.
Router over-corrects after reconnect (catch-up burst / forced crossing).
Residual slippage appears as late urgency tax.

Key asymmetry: cancel finality and visibility are not perfectly synchronized. You may know the session dropped before you know exactly which child orders remain active.

3) External rule facts to encode (not memorize)

TMX (TSX/TSXV/TSX Alpha): COD is configurable by level and can trigger on inactivity or disconnect; on trigger, the session bundle is blocked and the venue attempts to cancel open orders, with explicit exceptions (e.g., GTC/GTD, MOC/LOC, state-constrained cases).
Coinbase FIX: logon supports CancelOrdersOnDisconnect behavior (S for session scope, Y for profile scope). Docs explicitly note that, as with mass cancel, some in-flight orders not yet acknowledged are not guaranteed to be canceled.
CME Globex FAQ: during port closure events, working-order outcome depends on COD enablement; if COD is not enabled, alternate kill/cancel paths are needed.

Operational implication: COD semantics are venue- and session-policy specific. Never treat COD as a single boolean.
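
One way to avoid the single-boolean trap is a small policy registry that the expectation model reads from. The sketch below is illustrative only: CodPolicy, TMX_EXAMPLE, expected_cancelable_qty, and the order dict keys are hypothetical names, the scope label is a placeholder, and the only venue facts used are the TMX exceptions listed above. Real entries must come from each venue's current documentation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CodPolicy:
    """Hypothetical registry entry; field names are illustrative, not a venue API."""
    venue: str
    scope: str                                   # placeholder label, e.g. "session" or "profile"
    exempt_tifs: frozenset = frozenset()         # time-in-force values COD is expected to skip
    exempt_order_types: frozenset = frozenset()  # order types COD is expected to skip
    in_flight_guaranteed: bool = False           # False: unacknowledged orders may survive

# Illustrative entry built only from the TMX exceptions cited in this section.
TMX_EXAMPLE = CodPolicy(
    venue="TMX",
    scope="configured-level",  # assumption: actual level comes from session config
    exempt_tifs=frozenset({"GTC", "GTD"}),
    exempt_order_types=frozenset({"MOC", "LOC"}),
)

def expected_cancelable_qty(orders, policy):
    """Sum the open quantity that the policy is expected to cancel on trigger."""
    total = 0
    for o in orders:
        if o["tif"] in policy.exempt_tifs:
            continue
        if o["order_type"] in policy.exempt_order_types:
            continue
        if not o["acked"] and not policy.in_flight_guaranteed:
            continue  # unacknowledged order: do not count on a cancel
        total += o["open_qty"]
    return total
```

The returned quantity is the denominator a cancel-coverage metric would compare against confirmed cancels, so errors in the registry surface as measurable gaps rather than silent assumptions.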

4) Slippage decomposition with COD component

Let:

IS_total: implementation shortfall (bps)
IS_base: baseline model cost (spread + impact + timing)
IS_COD: additional COD-driven component

Decompose:

IS_total = IS_base + IS_COD + ε

Model COD component as:

IS_COD = C_reset + C_uncertain + C_reentry + C_backlog

Where:

C_reset: queue-priority loss after forced cancels
C_uncertain: cost while live-order truth is ambiguous
C_reentry: urgency premium paid to regain schedule
C_backlog: convex catch-up cost near deadline

This decomposition avoids blaming all post-reconnect cost on generic market regime.
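
The decomposition can be carried as a small record so the residual ε is always computed the same way. This is a minimal sketch; CodCost and residual_bps are hypothetical names, and the numbers in the usage note are made up for illustration.

```python
from dataclasses import dataclass

@dataclass
class CodCost:
    """COD-driven cost components, all in bps (hypothetical container)."""
    c_reset: float      # queue-priority loss after forced cancels
    c_uncertain: float  # cost while live-order truth is ambiguous
    c_reentry: float    # urgency premium paid to regain schedule
    c_backlog: float    # convex catch-up cost near deadline

    @property
    def total(self) -> float:
        # IS_COD = C_reset + C_uncertain + C_reentry + C_backlog
        return self.c_reset + self.c_uncertain + self.c_reentry + self.c_backlog

def residual_bps(is_total: float, is_base: float, cod: CodCost) -> float:
    """epsilon = IS_total - IS_base - IS_COD"""
    return is_total - is_base - cod.total
```

For example, with made-up inputs CodCost(1.5, 0.5, 2.0, 1.0), IS_total = 12.0 and IS_base = 6.0, IS_COD is 5.0 bps and the residual is 1.0 bps. A persistently large residual suggests either the base model or the COD attribution is missing a term.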

5) Core metrics (production-ready)

1) COD Trigger Latency (CTL)

Time from first hard session-health failure signal to first confirmed COD-related cancel event.

2) Cancel Coverage Gap (CCG)

CCG = 1 - (confirmed_canceled_qty / expected_cancelable_qty)

High CCG means your cancellation expectation model is wrong (policy exceptions, in-flight uncertainty, stale state).

3) In-Flight Survivorship Ratio (ISR)

Fraction of orders sent pre-disconnect that remain active or fill after the assumed COD window.

4) Reconnect Stabilization Lag (RSL)

Time from transport reconnect to trustworthy order-state convergence (open_qty_truth confidence > threshold).

5) Catch-up Convexity Penalty (CCP)

Incremental bps paid per unit backlog removed during post-disconnect recovery window.

6) COD Episode Tail Lift (CETL)

p95(IS | COD episode) - p95(IS | matched non-COD controls)

This is your practical board-level KPI.
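
Three of these metrics reduce to a few lines each. The sketch below assumes hypothetical function names and a hypothetical order field, terminal_ts (time of cancel confirm or final fill, None if the order is still open); the percentile method is one reasonable choice, not the only one.

```python
import statistics

def cancel_coverage_gap(confirmed_canceled_qty, expected_cancelable_qty):
    """CCG = 1 - confirmed / expected; None when nothing was expected to cancel."""
    if expected_cancelable_qty <= 0:
        return None
    return 1.0 - confirmed_canceled_qty / expected_cancelable_qty

def in_flight_survivorship_ratio(pre_disconnect_orders, cod_window_end):
    """ISR: share of pre-disconnect orders still open, or terminated after the window."""
    if not pre_disconnect_orders:
        return None
    survivors = sum(
        1 for o in pre_disconnect_orders
        if o["terminal_ts"] is None or o["terminal_ts"] > cod_window_end
    )
    return survivors / len(pre_disconnect_orders)

def p95(values):
    # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile.
    return statistics.quantiles(values, n=20, method="inclusive")[18]

def cetl(cod_is_bps, control_is_bps):
    """CETL = p95(IS | COD episode) - p95(IS | matched non-COD controls)."""
    return p95(cod_is_bps) - p95(control_is_bps)
```

Each percentile needs at least two observations, and in practice far more before a p95 is stable, so CETL is best reported alongside the episode count.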

6) State machine (must be explicit)

Use a deterministic execution state machine:

HEALTHY: normal routing
DISCONNECT_SUSPECT: heartbeat/session anomalies; freeze aggressive expansions
COD_POSSIBLE: probable COD trigger; suppress new passive staging until truth improves
RECONCILING: rebuild live-order truth from exchange acks/drop-copy/recovery queries
RECOVERY_CONTROLLED: paced re-entry with backlog caps
HEALTHY_RESTORED: return only after stability criteria

Transition conditions must be data-driven (heartbeat misses, socket events, COD flags, cancel confirms, state confidence).
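
A minimal sketch of the state machine as a pure transition function follows. The signal names in sig and the three thresholds are assumptions chosen for illustration; real transition conditions depend on the venue's heartbeat interval and on which COD indicators it exposes.

```python
from enum import Enum, auto

class ExecState(Enum):
    HEALTHY = auto()
    DISCONNECT_SUSPECT = auto()
    COD_POSSIBLE = auto()
    RECONCILING = auto()
    RECOVERY_CONTROLLED = auto()
    HEALTHY_RESTORED = auto()

# Assumed thresholds; tune per venue and session.
SUSPECT_MISSES = 1
COD_MISSES = 3
MIN_CONFIDENCE = 0.95

def next_state(state, sig):
    """Deterministic transition: same state and signals always give the same result."""
    anomaly = sig["socket_dropped"] or sig["heartbeat_misses"] >= SUSPECT_MISSES
    if state in (ExecState.HEALTHY, ExecState.HEALTHY_RESTORED):
        return ExecState.DISCONNECT_SUSPECT if anomaly else ExecState.HEALTHY
    if state is ExecState.DISCONNECT_SUSPECT:
        if (sig["socket_dropped"] or sig["unsolicited_cancels"] > 0
                or sig["heartbeat_misses"] >= COD_MISSES):
            return ExecState.COD_POSSIBLE
        return state if anomaly else ExecState.HEALTHY
    if state is ExecState.COD_POSSIBLE:
        return ExecState.RECONCILING if sig["reconnected"] else state
    if state is ExecState.RECONCILING:
        if sig["socket_dropped"]:
            return ExecState.COD_POSSIBLE  # dropped again mid-reconcile
        if sig["state_confidence"] >= MIN_CONFIDENCE:
            return ExecState.RECOVERY_CONTROLLED
        return state
    if state is ExecState.RECOVERY_CONTROLLED:
        if sig["socket_dropped"]:
            return ExecState.COD_POSSIBLE
        if sig["backlog_cleared"] and sig["state_confidence"] >= MIN_CONFIDENCE:
            return ExecState.HEALTHY_RESTORED
        return state
    return state
```

Keeping the function free of side effects makes it easy to replay recorded incidents through it and check that the router would have entered COD_POSSIBLE when expected.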

7) Data you need (minimum schema)

Per child order:

parent_id, child_id, symbol, side, qty, price
decision/send timestamps
exchange ack/reject/cancel/fill timestamps
session id, gateway id, connection state snapshots
COD policy flags active at order time (session/profile level)
cancel reason / unsolicited indicator fields (if venue provides)
reconnect cycle id + reconciliation confidence score

Per episode:

first anomaly timestamp
COD trigger timestamp proxy
reconnect start/end
backlog path vs target schedule
final slippage attribution breakdown

Without this lineage, COD cost cannot be estimated reliably.
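
As a starting point, the schema can be expressed as two record types. This is a trimmed sketch: the field names are assumptions derived from the lists above, timestamps are assumed to be epoch seconds, and several listed fields (reject timestamps, connection state snapshots, backlog path, attribution breakdown) are omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ChildOrderRecord:
    parent_id: str
    child_id: str
    symbol: str
    side: str
    qty: int
    price: float
    decision_ts: float
    send_ts: float
    session_id: str
    gateway_id: str
    cod_policy_flags: dict = field(default_factory=dict)  # policy active at order time
    ack_ts: Optional[float] = None
    cancel_ts: Optional[float] = None
    last_fill_ts: Optional[float] = None
    cancel_reason: Optional[str] = None      # only if the venue provides it
    unsolicited: Optional[bool] = None       # only if the venue provides it
    reconnect_cycle_id: Optional[int] = None
    reconciliation_confidence: Optional[float] = None

@dataclass
class CodEpisode:
    first_anomaly_ts: float
    cod_trigger_ts_proxy: Optional[float] = None
    reconnect_start_ts: Optional[float] = None
    reconnect_end_ts: Optional[float] = None
    children: list = field(default_factory=list)

    def unresolved_children(self):
        """Children with neither a cancel confirm nor a fill timestamp recorded."""
        return [c for c in self.children
                if c.cancel_ts is None and c.last_fill_ts is None]
```

Optional fields default to None rather than zero so that "not yet observed" stays distinguishable from a real timestamp during reconciliation.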

8) Identification strategy (causal, not just correlation)

Episode labeling
- Build COD candidate episodes from session-health + unsolicited cancel patterns.
Matched controls
- Match by symbol liquidity bucket, time-of-day, spread/volatility regime, parent urgency.
Difference-in-differences
- Compare slippage drift pre/post incident between COD and control episodes.
Backlog-path mediation
- Quantify how much tail cost is mediated by recovery backlog, not just initial disconnect.
Heterogeneity cuts
- COD level/policy, venue type, order type mix, parent horizon.
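
The difference-in-differences step reduces to a two-by-two comparison of mean slippage. The sketch below is a simplified illustration with a hypothetical function name; it assumes the control episodes are already matched and ignores standard errors, which a real analysis would need.

```python
from statistics import mean

def diff_in_diff(cod_pre, cod_post, ctl_pre, ctl_post):
    """(post - pre) slippage change for COD episodes minus the same change for controls.

    Each argument is a list of slippage observations in bps.
    """
    cod_change = mean(cod_post) - mean(cod_pre)
    ctl_change = mean(ctl_post) - mean(ctl_pre)
    return cod_change - ctl_change
```

With made-up inputs cod_pre=[4, 6], cod_post=[10, 12], ctl_pre=[4, 6], ctl_post=[5, 7], the COD episodes drift by 6 bps and the controls by 1 bps, giving an estimate of 5 bps attributable to the incident rather than to the shared market regime.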

9) Control policy design

A. Pre-incident hardening

Keep venue/session-specific COD policy registry versioned in code.
Simulate heartbeat-loss and abnormal-disconnect scenarios in paper trading and game-day drills.
Enforce dual-source order truth (order-entry + drop-copy/recovery channel).

B. Incident mode

Enter COD_POSSIBLE quickly; stop naive catch-up.
Cap recovery aggression by observed RSL and confidence score.
Prefer paced backlog reduction over immediate full crossing.

C. Recovery mode

Ramp participation with convex caps:
- low confidence -> low cap,
- confidence improving -> gradual cap increase,
- stable truth -> normalize.
Auto-block the strategy if CETL and CCG breach their hard thresholds.
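
One possible shape for the confidence-gated cap is a power curve, which is convex for exponents above 1 and so stays low until order-state confidence is high. The function name, parameter names, and default values below are all assumptions for illustration.

```python
def participation_cap(confidence, normal_cap, floor_cap=0.0,
                      exponent=2.0, stable_at=0.95):
    """Map reconciliation confidence in [0, 1] to a participation-rate cap.

    Below stable_at the cap grows as confidence**exponent (convex for exponent > 1);
    at or above stable_at the cap returns to normal_cap.
    """
    c = min(max(confidence, 0.0), 1.0)  # clamp out-of-range inputs
    if c >= stable_at:
        return normal_cap
    return floor_cap + (normal_cap - floor_cap) * c ** exponent
```

With a normal cap of 20% participation, confidence 0.5 yields a cap of about 5%, and the cap normalizes only once confidence reaches the stability threshold. Raising the exponent makes recovery more conservative at mid-range confidence.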

10) Practical rollout (small team, 2 weeks)

Week 1

Add COD-aware event schema + episode labeling.
Build baseline dashboard (CTL, CCG, ISR, RSL, CETL).
Encode state machine and safe fallback router profile.

Week 2

Add matched-control analytics and COD cost decomposition.
Canary on limited symbols/notional.
Define promotion gates:
- CETL down,
- no increase in completion failures,
- no unresolved state ambiguity beyond SLA.
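
The promotion gates can be checked mechanically at the end of the canary. A minimal sketch, assuming hypothetical dict keys for the canary and baseline summaries and an ambiguity SLA expressed in seconds:

```python
def promotion_ok(canary, baseline, ambiguity_sla_s):
    """Return True only if all three promotion gates pass."""
    cetl_down = canary["cetl_bps"] < baseline["cetl_bps"]
    no_more_failures = canary["completion_failures"] <= baseline["completion_failures"]
    ambiguity_within_sla = canary["max_unresolved_ambiguity_s"] <= ambiguity_sla_s
    return cetl_down and no_more_failures and ambiguity_within_sla
```

All three gates are required together, so an improvement in CETL cannot be traded for more completion failures or longer unresolved state.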

11) Common mistakes

Treating COD as a binary flag independent of venue/session policy level.
Assuming "disconnect -> all canceled" is always true.
Ignoring in-flight non-guaranteed cancel risk.
Letting reconnect trigger panic catch-up without confidence gating.
Measuring only average IS (missing p95/p99 COD tails).

12) What good looks like

A mature stack can answer, for every disconnect episode:

Did COD likely trigger?
Which child orders were truly canceled, and which survived or filled late?
How much slippage came from queue reset vs uncertainty vs catch-up convexity?
Did policy reduce tail cost vs matched controls?

If you cannot answer these four questions, COD is still a hidden slippage tax.

References

TMX (TSX/TSXV/TSX Alpha), Cancel on Disconnect overview and trigger-level behavior:
https://www.tsx.com/en/trading/products-and-services/cancel-on-disconnect
Coinbase Exchange FIX Order Entry Messages 5.0 (incl. CancelOrdersOnDisconnect, mass-cancel caveat):
https://docs.cdp.coinbase.com/exchange/fix-api/order-entry-messages/order-entry-messages5
CME Group, FAQ: Port Closure (COD dependency for working-order outcome):
https://www.cmegroup.com/solutions/market-access/globex/trade-on-globex/about-the-global-command-center/faq-port-closure.html