Competing-Risks Fill/Cancel Slippage Modeling Playbook

2026-03-03 · finance

Date: 2026-03-03
Category: research
Focus: Treat passive execution as a competing-risks problem (fill vs cancel/replace vs timeout), then optimize tactics on expected + tail slippage, not fill-only optimism.

Why this matters

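Many execution stacks still estimate passive quality from fills only. That creates survivorship bias: orders that were cancelled, replaced, or timed out drop out of the sample, so passive tactics look cheaper than they are.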

The practical fix is simple in concept: model mutually exclusive order outcomes explicitly.

Key idea: fill is one event among several

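For each child order, define the first event within horizon \(T\) as exactly one of fill (\(k=1\)), cancel/replace (\(k=2\)), or timeout (\(k=3\)); if none occurs by \(T\), the order is still resting.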

Use cause-specific hazards:

\[ h_k(\tau\mid x)=h_{0,k}(\tau)\exp(\beta_k^\top x) \]

with state vector \(x\) (spread, queue ahead, imbalance, volatility, local order-flow toxicity, own urgency, etc.).

Then compute cumulative incidence functions (CIF):

\[ F_k(T\mid x)=\Pr(\text{event }k\text{ occurs first by }T\mid x) \]

and the survival function \(S(T\mid x)\), the probability that no event has occurred by \(T\).
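The step from cause-specific hazards to CIFs can be sketched numerically. The block below is a minimal illustration, assuming hazards that are piecewise constant on a time grid; the baseline rates, coefficients, and state vector are hypothetical numbers, not fitted values.

```python
import numpy as np

def cumulative_incidence(hazards, dt):
    """Turn cause-specific hazards into CIFs and survival.

    hazards: array of shape (K, N), hazard rate of cause k on grid step n.
    dt: width of each grid step.
    Returns (cif, surv): cif[k, n] = F_k at the end of step n,
    surv[n] = S at the end of step n.
    """
    hazards = np.asarray(hazards, dtype=float)
    total = hazards.sum(axis=0)                    # all-cause hazard per step
    surv_end = np.exp(-np.cumsum(total * dt))      # S(t) at the end of each step
    surv_start = np.concatenate(([1.0], surv_end[:-1]))
    p_any = surv_start - surv_end                  # P(some first event in this step)
    # Split each step's event mass across causes by hazard share
    # (exact when hazards are constant within a step).
    share = np.divide(hazards, total, out=np.zeros_like(hazards), where=total > 0)
    cif = np.cumsum(share * p_any, axis=1)
    return cif, surv_end

# Hypothetical proportional-hazards inputs: fill, cancel/replace, timeout.
baseline = np.array([0.05, 0.03, 0.01])            # assumed h_{0,k}, per second
beta = np.array([[0.8, -0.5], [0.2, 0.6], [0.0, 0.0]])  # assumed coefficients
x = np.array([0.5, 1.0])                           # assumed state vector
rates = baseline * np.exp(beta @ x)                # h_k(tau | x), constant in tau here

dt, n_steps = 0.1, 300                             # 30-second horizon
hazards = np.repeat(rates[:, None], n_steps, axis=1)
cif, surv = cumulative_incidence(hazards, dt)
print("F_k(T):", cif[:, -1], "S(T):", surv[-1])
```

The CIFs and the survival probability sum to one at every grid point, which is a useful sanity check on any fitted model before it feeds a cost objective.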

Slippage objective (the operational core)

Instead of "maximize fill probability," optimize expected execution cost including all branches:

\[ \mathbb{E}[C\mid x,T]=F_1\,C_{\text{fill}} + F_2\,C_{\text{cancel}} + F_3\,C_{\text{timeout}} + S(T)\,C_{\text{roll}} \]

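Where \(F_k\) abbreviates \(F_k(T\mid x)\), \(S(T)\) abbreviates \(S(T\mid x)\), and each \(C\) term is the expected cost conditional on that branch; \(C_{\text{roll}}\) applies to orders still resting at \(T\).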

Add a tail term for production safety:

\[ \min_{a\in\mathcal{A}}\;\mathbb{E}[C\mid a,x] + \lambda\,\text{CVaR}_{95}(C\mid a,x) \]

This prevents "great mean, terrible p95" behavior.
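A small numeric sketch makes the mean-versus-tail trade-off concrete. It treats each tactic's cost as a four-point distribution over the branches (fill, cancel, timeout, roll) and computes CVaR exactly as the average of the worst 5% of probability mass. The tactic names, probabilities, and costs (in bps) are hypothetical, and using a single cost per branch is a simplifying assumption.

```python
import numpy as np

def discrete_cvar(probs, costs, alpha=0.95):
    """CVaR of a discrete cost distribution: mean cost over the worst (1 - alpha) mass."""
    probs = np.asarray(probs, dtype=float)
    costs = np.asarray(costs, dtype=float)
    order = np.argsort(costs)[::-1]                # worst outcome first
    p, c = probs[order], costs[order]
    tail = 1.0 - alpha
    mass_before = np.concatenate(([0.0], np.cumsum(p)[:-1]))
    # Portion of each outcome's probability that falls inside the tail.
    weight = np.clip(tail - mass_before, 0.0, p)
    return float((weight * c).sum() / tail)

def risk_adjusted_cost(probs, costs, lam=0.2, alpha=0.95):
    """Expected cost plus lam times CVaR, mirroring the objective above."""
    probs = np.asarray(probs, dtype=float)
    costs = np.asarray(costs, dtype=float)
    return float(probs @ costs) + lam * discrete_cvar(probs, costs, alpha)

# Hypothetical tactics: (branch probabilities, branch costs in bps).
tactics = {
    "patient": ([0.60, 0.25, 0.10, 0.05], [-0.5, 1.0, 2.0, 12.0]),
    "steady":  ([0.50, 0.30, 0.15, 0.05], [0.2, 1.0, 1.5, 3.0]),
}
best_by_mean = min(tactics, key=lambda k: float(np.dot(*tactics[k])))
best_by_risk = min(tactics, key=lambda k: risk_adjusted_cost(*tactics[k]))
print(best_by_mean, best_by_risk)
```

With these assumed numbers the mean-only ranking and the risk-adjusted ranking disagree: the tactic with the slightly lower mean carries a much heavier roll branch, and the CVaR term penalizes it.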

What to model in features (minimum useful set)

  1. Queue state
    • queue ahead at order price
    • queue depletion rate
    • local cancel intensity
  2. Tightness/depth/resiliency
    • spread, top-of-book depth imbalance
    • refill speed proxy after recent shocks
  3. Flow toxicity
    • signed trade pressure (1s/5s/15s)
    • short-horizon adverse markout proxy
  4. Own execution context
    • parent residual %, time-to-deadline, urgency state
    • participation and recent child cadence
  5. Venue/session context
    • open/mid/close bucket, venue tag, event flag
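Two of the features above can be sketched directly from book and trade data. This is an illustration only: the function names are hypothetical, and normalizing signed pressure by gross traded volume is one assumed convention among several.

```python
import numpy as np

def depth_imbalance(bid_size, ask_size):
    """Top-of-book depth imbalance in [-1, 1]; positive means more resting bid size."""
    total = bid_size + ask_size
    return 0.0 if total == 0 else (bid_size - ask_size) / total

def signed_trade_pressure(trade_times, signed_volumes, now, windows=(1.0, 5.0, 15.0)):
    """Net signed volume over gross volume for each lookback window (seconds).

    signed_volumes: positive for buyer-initiated trades, negative for seller-initiated.
    Returns a dict keyed by window length; 0.0 when no trades fall in the window.
    """
    t = np.asarray(trade_times, dtype=float)
    v = np.asarray(signed_volumes, dtype=float)
    pressure = {}
    for w in windows:
        in_window = (t > now - w) & (t <= now)
        gross = np.abs(v[in_window]).sum()
        pressure[w] = float(v[in_window].sum() / gross) if gross > 0 else 0.0
    return pressure

# Hypothetical trade tape: times in seconds, signed sizes in shares.
times = [0.2, 9.5, 14.0, 14.8]
volumes = [100, -50, 30, -80]
features = {
    "imbalance": depth_imbalance(bid_size=700, ask_size=300),
    "pressure": signed_trade_pressure(times, volumes, now=15.0),
}
print(features)
```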

Why this is consistent with literature

Control policy translation (desk-ready)

At each decision step, evaluate actions:

For each action \(a\), estimate branch probabilities and branch costs, then pick the argmin of the risk-adjusted objective.
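The selection step can be sketched with a Monte Carlo estimate when branch costs are distributions rather than point values. Everything specific here is an assumption for illustration: the action names, the branch probabilities, the normal branch-cost model, and the choice of \(\lambda = 0.5\).

```python
import numpy as np

def simulate_costs(branch_probs, branch_mean, branch_std, n, rng):
    """Draw n total-cost samples: pick a branch, then draw that branch's cost."""
    branches = rng.choice(len(branch_probs), size=n, p=branch_probs)
    return rng.normal(np.asarray(branch_mean)[branches], np.asarray(branch_std)[branches])

def score(costs, lam=0.5, alpha=0.95):
    """Empirical mean plus lam times empirical CVaR (mean of costs at or above the alpha-quantile)."""
    var = np.quantile(costs, alpha)
    cvar = costs[costs >= var].mean()
    return float(costs.mean() + lam * cvar)

def pick_action(estimates, n=50_000, seed=0, lam=0.5):
    """Return the action with the lowest risk-adjusted score, plus all scores."""
    rng = np.random.default_rng(seed)
    scores = {a: score(simulate_costs(*est, n, rng), lam) for a, est in estimates.items()}
    return min(scores, key=scores.get), scores

# Hypothetical per-action estimates for branches (fill, cancel, timeout, roll):
# (probabilities, mean cost in bps, cost std in bps).
estimates = {
    "join_best":        ([0.45, 0.30, 0.15, 0.10], [-0.5, 1.0, 2.5, 6.0], [0.3, 0.5, 1.0, 3.0]),
    "improve_one_tick": ([0.65, 0.20, 0.10, 0.05], [0.3, 1.0, 2.0, 4.0], [0.3, 0.5, 0.8, 1.5]),
    "cross_spread":     ([1.00, 0.00, 0.00, 0.00], [2.5, 0.0, 0.0, 0.0], [0.2, 0.0, 0.0, 0.0]),
}
best, scores = pick_action(estimates)
print(best, scores)
```

In a live system the `estimates` dictionary would come from the fitted hazard and branch-cost models evaluated at the current state; the fixed seed here only keeps the illustration reproducible.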

Practical guardrails:

Calibration protocol

  1. Build child-order event table with strict first-event labeling.
  2. Fit cause-specific hazard models (Cox/AFT/GBDT-hazard) per event type.
  3. Reconstruct CIFs and validate probability calibration (Brier/reliability by horizon).
  4. Fit branch-cost models conditioned on event type.
  5. Backtest policy with full branching (not fill-only replay).
  6. Promote only if mean and p95/CVaR improve without completion-SLA breach.
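Steps 1 and 3 of the protocol can be sketched as follows. The block assumes every order is observed through the full horizon (no censoring), uses a per-event Brier score at a single horizon, and breaks exact timestamp ties alphabetically, which a production labeler would replace with exchange sequence numbers. Field and function names are hypothetical.

```python
import numpy as np

EVENTS = ("fill", "cancel", "timeout")

def first_event(event_times, horizon):
    """Label a child order by its first event within the horizon.

    event_times: dict mapping event name to seconds since submission, or None.
    Returns (label, time); label is "none" if nothing happened by the horizon.
    """
    candidates = [(t, k) for k, t in event_times.items() if t is not None and t <= horizon]
    if not candidates:
        return "none", horizon
    t, k = min(candidates)          # earliest time; ties fall back to name order
    return k, t

def brier_by_event(pred_cif, labels):
    """Brier score per event type for predicted F_k(T | x) against first-event labels."""
    pred = np.asarray(pred_cif, dtype=float)
    scores = {}
    for j, k in enumerate(EVENTS):
        y = np.array([lab == k for lab in labels], dtype=float)
        scores[k] = float(np.mean((pred[:, j] - y) ** 2))
    return scores

# Hypothetical child orders and model predictions at a 30-second horizon.
orders = [
    {"fill": 2.0, "cancel": 5.0, "timeout": None},
    {"fill": None, "cancel": 1.5, "timeout": None},
    {"fill": 40.0, "cancel": None, "timeout": None},
]
labels = [first_event(o, horizon=30.0)[0] for o in orders]
pred = [[0.70, 0.20, 0.05], [0.30, 0.60, 0.05], [0.40, 0.30, 0.10]]
brier = brier_by_event(pred, labels)
print(labels, brier)
```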

Monitoring (must-have)

Alarm examples:

Failure modes

Minimal implementation checklist


Bottom line: stop treating non-fills as nuisance censoring. Model fill/cancel/timeout as competing outcomes, attach explicit branch costs, and optimize execution on risk-adjusted total slippage. That is where a lot of hidden p95 leakage lives.