Cancel-on-Disconnect Flapping Session Slippage Playbook

2026-03-14 · finance

Date: 2026-03-14
Category: research
Focus: Modeling hidden execution cost when connectivity flaps trigger venue-level Cancel-on-Disconnect (CoD) order purges.


1) Why this failure mode deserves first-class treatment

Most slippage models assume a clean order lifecycle:

submit -> rest -> partial fill -> amend/cancel -> complete

In production, however, session-level controls (FIX disconnect rules, gateway heartbeat timeouts, venue CoD settings) can force a different lifecycle:

disconnect -> venue purges resting orders -> reconnect -> strategy rebuilds inventory with fresh child orders

That creates a control-plane slippage tax that often gets misattributed to “market volatility”:

If this is not explicitly modeled, p95/p99 cost tails look random and “unfixable.”


2) Mechanism map (what actually causes the leak)

2.1 CoD purge branch

When a session drops for long enough to breach the venue or gateway heartbeat timeout, the venue cancels resting passive orders under its CoD policy.

Direct consequences

  1. Queue Age Reset: passive queue option value is destroyed.
  2. Re-entry Competition: replacement orders fight from the back of queue.
  3. Residual Compression: missed passive fills become future catch-up pressure.
  4. Asymmetric Replay: fills/acks/cancels can replay in mixed order after reconnect.

2.2 Flapping amplification

A single disconnect is manageable. Flapping (repeated disconnect/reconnect cycles) is dangerous:

So the tail is not linear in outage duration; it is convex in flap count × rebuild pressure.


3) Cost decomposition

Model total execution shortfall as:

\[ C_{total} = C_{base} + C_{CoD} + C_{rebuild} + C_{deadline} \]

Where:

A practical branch expectation form:

\[ \mathbb{E}[C] = p_{stable} C_{stable} + p_{single} C_{single\_CoD} + p_{flap} C_{flap\_cascade} \]

with \(p_{flap}\) estimated from session-health features, not market features alone.
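
As a rough illustration of the branch expectation above, the sketch below weights three branch costs by their probabilities. The function name, the dictionary keys, and every number are hypothetical placeholders chosen for arithmetic clarity, not measured values.

```python
def expected_cost(probs, costs):
    """Probability-weighted shortfall across the three session branches."""
    branches = ("stable", "single_cod", "flap_cascade")
    total_p = sum(probs[b] for b in branches)
    if abs(total_p - 1.0) > 1e-9:
        raise ValueError("branch probabilities must sum to 1")
    return sum(probs[b] * costs[b] for b in branches)


# Illustrative inputs only (costs in basis points); these are assumptions.
example_probs = {"stable": 0.95, "single_cod": 0.04, "flap_cascade": 0.01}
example_costs_bps = {"stable": 2.0, "single_cod": 10.0, "flap_cascade": 50.0}

example_expected_bps = expected_cost(example_probs, example_costs_bps)
flap_share = (example_probs["flap_cascade"] * example_costs_bps["flap_cascade"]
              / example_expected_bps)
```

With these made-up inputs the flap branch carries 1% of the probability mass but roughly 18% of the expected cost, which is the kind of imbalance the decomposition is meant to expose.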


4) Feature set for production modeling

4.1 Session-health features (must-have)

4.2 CoD/rebuild features

4.3 Queue-loss proxies

4.4 Coupling features with market stress

Important: CoD cost is worst when infra fragility and liquidity fragility co-occur.


5) Core metrics (dashboard + alarms)

5.1 DFI — Disconnect Flap Index

\[ DFI = w_1 \cdot \text{disconnect\_count}_{1m} + w_2 \cdot \text{heartbeat\_miss\_streak} + w_3 \cdot \text{reconnect\_duration\_z} \]

Measures session instability intensity.
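
A minimal sketch of the DFI formula follows. The default weights are placeholders (the source does not specify w_1, w_2, w_3), and the inputs are assumed to be computed upstream: a one-minute disconnect count, the current missed-heartbeat streak, and a z-score of reconnect duration.

```python
def disconnect_flap_index(disconnect_count_1m, heartbeat_miss_streak,
                          reconnect_duration_z, weights=(1.0, 0.5, 0.5)):
    """Weighted sum of the three session-instability signals.

    The default weights are hypothetical and would need calibration.
    """
    w1, w2, w3 = weights
    return (w1 * disconnect_count_1m
            + w2 * heartbeat_miss_streak
            + w3 * reconnect_duration_z)


# Example: 3 disconnects in the last minute, 2 missed heartbeats in a row,
# and a reconnect that took one standard deviation longer than usual.
example_dfi = disconnect_flap_index(3, 2, 1.0)
```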

5.2 QLT — Queue Loss Tax

\[ QLT = \frac{\text{post-CoD realized cost} - \text{counterfactual no-CoD cost}}{\text{executed notional}} \]

Primary KPI for this failure mode.
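
The ratio can be computed directly once a counterfactual cost is available. The sketch below assumes both costs and the notional are in the same currency and that the counterfactual comes from a replay; the function name and figures are illustrative only.

```python
def queue_loss_tax(realized_cost, counterfactual_cost, executed_notional):
    """QLT as a fraction of executed notional (multiply by 1e4 for bps)."""
    if executed_notional <= 0:
        raise ValueError("executed_notional must be positive")
    return (realized_cost - counterfactual_cost) / executed_notional


# Hypothetical episode: 1,500 realized cost vs 900 counterfactual cost
# on 2,000,000 of executed notional.
example_qlt_bps = queue_loss_tax(1500.0, 900.0, 2_000_000.0) * 1e4
```

A negative value is possible and simply means the purge happened to help in that episode; it should not be clipped if the goal is an unbiased average.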

5.3 RBS — Rebuild Burst Stress

\[ RBS = \frac{\text{rebuild notional in } 30\,\text{s}}{\text{remaining schedule notional}} \]

Captures urgency injection from forced restarts.
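
One way to evaluate RBS on a trailing window is sketched below. It assumes rebuild orders arrive as (timestamp in seconds, notional) pairs and treats a fully completed schedule as zero stress; both choices are assumptions, not part of the definition above.

```python
def rebuild_burst_stress(rebuild_orders, now, remaining_notional, window_s=30.0):
    """Rebuild notional sent in the trailing window over remaining notional.

    rebuild_orders: iterable of (timestamp_s, notional) pairs (hypothetical shape).
    """
    recent = sum(notional for ts, notional in rebuild_orders
                 if now - window_s < ts <= now)
    if remaining_notional <= 0:
        return 0.0  # assumption: nothing left to execute means no stress
    return recent / remaining_notional


# Orders at t=0, 75, 90 seconds; only the last two fall inside (70, 100].
example_orders = [(0.0, 100.0), (75.0, 200.0), (90.0, 300.0)]
example_rbs = rebuild_burst_stress(example_orders, now=100.0,
                                   remaining_notional=1000.0)
```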

5.4 FRR — Flap Recovery Reject rate

\[ FRR = \frac{\text{rejects during } T_{recovery}}{\text{orders sent during } T_{recovery}} \]

Detects control-plane saturation during re-entry.
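
A small sketch of the FRR count over a recovery window follows. The event shape, (timestamp, kind) with kind equal to "sent" or "reject", is a hypothetical simplification of a real order-event log, and the window bounds are assumed to be supplied by the session monitor.

```python
def flap_recovery_reject_rate(events, recovery_start, recovery_end):
    """Rejects divided by orders sent inside [recovery_start, recovery_end]."""
    sent = 0
    rejects = 0
    for ts, kind in events:
        if recovery_start <= ts <= recovery_end:
            if kind == "sent":
                sent += 1
            elif kind == "reject":
                rejects += 1
    # Assumption: with no orders sent, report 0.0 rather than dividing by zero.
    return rejects / sent if sent else 0.0


example_events = [
    (9.0, "sent"),     # before the recovery window, ignored
    (10.0, "sent"),
    (10.5, "reject"),
    (11.0, "sent"),
    (12.0, "sent"),
    (13.0, "sent"),
    (20.0, "reject"),  # after the recovery window, ignored
]
example_frr = flap_recovery_reject_rate(example_events, 10.0, 15.0)
```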


6) State machine for execution controls

STABLE

FLAP_WATCH (DFI above watch threshold)

COD_RECOVERY (confirmed CoD event)

SAFE_STABILIZE (repeated flaps or FRR spike)

Recovery toward STABLE requires hysteresis (a minimum healthy dwell time plus health thresholds), so the controller does not oscillate from one tick to the next.
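
The four states and the hysteresis rule can be sketched as a small controller. Everything numeric here (thresholds, flap limit, dwell time) is a hypothetical placeholder, and the transition rules are one plausible reading of the state list above, not a prescribed design: escalation is immediate, while de-escalation needs sustained health.

```python
from dataclasses import dataclass
from typing import Optional

STABLE = "STABLE"
FLAP_WATCH = "FLAP_WATCH"
COD_RECOVERY = "COD_RECOVERY"
SAFE_STABILIZE = "SAFE_STABILIZE"


@dataclass
class SessionController:
    watch_dfi: float = 2.0       # enter FLAP_WATCH at or above this DFI
    clear_dfi: float = 1.0       # "healthy" means DFI below this (hysteresis band)
    frr_spike: float = 0.2       # FRR at or above this forces SAFE_STABILIZE
    max_flaps: int = 2           # CoD events before SAFE_STABILIZE
    min_healthy_s: float = 60.0  # dwell time required before returning to STABLE
    state: str = STABLE
    flaps: int = 0
    healthy_since: Optional[float] = None

    def step(self, now, dfi, frr, cod_event):
        healthy = (not cod_event) and dfi < self.clear_dfi and frr < self.frr_spike
        if healthy:
            if self.healthy_since is None:
                self.healthy_since = now
        else:
            self.healthy_since = None  # any unhealthy tick restarts the dwell timer

        if cod_event:
            self.flaps += 1
            if self.flaps >= self.max_flaps or self.state == SAFE_STABILIZE:
                self.state = SAFE_STABILIZE
            else:
                self.state = COD_RECOVERY
        elif frr >= self.frr_spike:
            self.state = SAFE_STABILIZE
        elif self.state == STABLE:
            if dfi >= self.watch_dfi:
                self.state = FLAP_WATCH
        elif (self.healthy_since is not None
              and now - self.healthy_since >= self.min_healthy_s):
            self.state = STABLE
            self.flaps = 0
        return self.state
```

The gap between watch_dfi and clear_dfi, combined with the dwell timer, is what prevents a single good tick from dropping the controller back to STABLE.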


7) Backtest/replay methodology

  1. Episode labeling

    • Partition historical data into stable, single_CoD, flap_cascade episodes.
  2. Counterfactual reconstruction

    • Replay with observed market path but synthetic no-CoD order continuity.
  3. Tail-centric evaluation

    • Track q50/q90/q95 shortfall delta by episode class.
  4. Completion guardrail

    • Report cost improvement jointly with completion reliability and deadline misses.
  5. Stress slicing

    • Evaluate separately for open/close windows where queue reset is most expensive.
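
Steps 1 and 3 above can be sketched as a per-class quantile report on the shortfall delta (realized minus counterfactual). The episode tuple shape and the nearest-rank quantile are assumptions made for a dependency-free example; a production analysis would likely use a statistics library and confidence intervals.

```python
from collections import defaultdict


def nearest_rank_quantile(values, pct):
    """Nearest-rank quantile; pct is an integer percentage such as 95."""
    ordered = sorted(values)
    rank = max(1, -(-pct * len(ordered) // 100))  # integer ceiling division
    return ordered[rank - 1]


def tail_report(episodes, percentiles=(50, 90, 95)):
    """Shortfall-delta quantiles grouped by episode class.

    episodes: iterable of (episode_class, realized_cost, counterfactual_cost),
    a hypothetical shape where episode_class is one of
    "stable", "single_CoD", "flap_cascade".
    """
    deltas_by_class = defaultdict(list)
    for episode_class, realized, counterfactual in episodes:
        deltas_by_class[episode_class].append(realized - counterfactual)
    return {
        cls: {pct: nearest_rank_quantile(deltas, pct) for pct in percentiles}
        for cls, deltas in deltas_by_class.items()
    }


# Synthetic data: 20 flap episodes with deltas 1..20 and two stable episodes.
example_episodes = [("flap_cascade", float(i), 0.0) for i in range(1, 21)]
example_episodes += [("stable", 1.0, 1.0), ("stable", 2.0, 1.5)]
example_report = tail_report(example_episodes)
```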

8) Practical rollout plan (30 days)

Week 1: Instrumentation

Week 2: Shadow model

Week 3: Guarded activation

Week 4: Scale + stabilize


9) Anti-patterns


10) Bottom line

Cancel-on-Disconnect is a slippage regime, not an operational footnote.

If you do not model session health and CoD-induced queue resets explicitly, your execution stack will keep paying a hidden tail tax during connectivity turbulence. The winning setup is:

That turns reconnect chaos from “market bad luck” into an engineerable control problem.