Competing-Risks Fill/Cancel Slippage Modeling Playbook

Date: 2026-03-03
Category: research
Focus: Treat passive execution as a competing-risks problem (fill vs cancel/replace vs timeout), then optimize tactics on expected + tail slippage, not fill-only optimism.

Why this matters

Many execution stacks still estimate passive-order quality from filled orders only. That creates survivorship bias:

easy fills dominate the dataset,
cancels/timeouts get treated like “missing data,”
re-entry and delay costs are underpriced,
tail slippage appears “unexpected” in live trading.

The practical fix is simple in concept: model mutually exclusive order outcomes explicitly.

Key idea: fill is one event among several

For each child order, define the first event within horizon \(T\):

\(k=1\): first fill (partial/full, configurable)
\(k=2\): cancel/replace before fill
\(k=3\): timeout/no fill (and forced follow-up action)

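Use cause-specific hazards, for example in proportional-hazards form:

\[ h_k(\tau\mid x)=h_{0,k}(\tau)\exp(\beta_k^\top x) \]

where \(\tau\) is the elapsed time since the order was placed, \(h_{0,k}\) is the baseline hazard for event \(k\), and \(x\) is the state vector (spread, queue ahead, imbalance, volatility, local order-flow toxicity, own urgency, etc.).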

Then compute cumulative incidence functions (CIF):

\[ F_k(T\mid x)=\Pr(\text{event }k\text{ occurs first by }T\mid x) \]

and the survival function \(S(T\mid x)=1-\sum_k F_k(T\mid x)\), the probability that no event has occurred by \(T\).
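A minimal sketch of how CIFs follow from cause-specific hazards, assuming time is cut into short bins and each bin carries a per-cause conditional event probability. The function name `cif_from_hazards` and the toy hazard values are illustrative only and do not come from any library or fitted model.

```python
from typing import Dict, List, Tuple


def cif_from_hazards(hazards: Dict[str, List[float]]) -> Tuple[Dict[str, float], float]:
    """Return (CIF per cause at the horizon, survival at the horizon).

    hazards[k][t] is the probability that cause k fires in bin t, given
    that no event of any cause fired before bin t (a discrete-time hazard).
    """
    n_bins = len(next(iter(hazards.values())))
    cif = {k: 0.0 for k in hazards}
    surv = 1.0  # probability that no event has fired before the current bin
    for t in range(n_bins):
        total = sum(h[t] for h in hazards.values())
        if total > 1.0:
            raise ValueError("per-bin hazards must sum to at most 1")
        for k, h in hazards.items():
            cif[k] += surv * h[t]  # survive to bin t, then fail from cause k
        surv *= 1.0 - total
    return cif, surv


# Toy hazards over three bins (made-up numbers).
hazards = {
    "fill": [0.10, 0.10, 0.10],
    "cancel": [0.05, 0.05, 0.05],
    "timeout": [0.00, 0.00, 0.20],
}
cif, surv = cif_from_hazards(hazards)
print(cif, surv)
```

The CIFs and the survival term sum to one by construction. Raising the cancel hazard lowers the fill CIF even when the fill hazard is unchanged, which is the coupling a fill-only model cannot see.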

Slippage objective (the operational core)

Instead of "maximize fill probability," optimize expected execution cost including all branches:

\[ \mathbb{E}[C\mid x,T]=F_1\,C_{\text{fill}} + F_2\,C_{\text{cancel}} + F_3\,C_{\text{timeout}} + S(T)\,C_{\text{roll}} \]

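Where \(F_k\) abbreviates \(F_k(T\mid x)\), \(S(T)\) abbreviates \(S(T\mid x)\), and every cost is signed so that positive means worse:

\(C_{\text{fill}}\): adverse short-horizon markout + fees, net of realized spread capture and rebates
\(C_{\text{cancel}}\): cancel/replace footprint + queue reset + likely worse re-entry
\(C_{\text{timeout}}\): urgency jump cost (often later crossing)
\(C_{\text{roll}}\): residual risk carried forward to next control step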
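To make the branch sum concrete, here is a small sketch that evaluates the expected cost for one order. The probabilities and per-branch costs (in bps, positive meaning worse) are made-up inputs, and `expected_cost` is a hypothetical helper name.

```python
from typing import Dict

BRANCHES = ("fill", "cancel", "timeout", "roll")


def expected_cost(probs: Dict[str, float], costs: Dict[str, float]) -> float:
    """Probability-weighted cost over the four branches (bps, positive = worse)."""
    total_p = sum(probs[b] for b in BRANCHES)
    if abs(total_p - 1.0) > 1e-9:
        raise ValueError(f"branch probabilities sum to {total_p}, expected 1")
    return sum(probs[b] * costs[b] for b in BRANCHES)


# Illustrative numbers only: a fill earns 0.4 bps, every other branch costs.
probs = {"fill": 0.5, "cancel": 0.2, "timeout": 0.2, "roll": 0.1}
costs = {"fill": -0.4, "cancel": 0.6, "timeout": 1.5, "roll": 0.8}

all_branches = expected_cost(probs, costs)
fill_only = costs["fill"]
print(f"fill-only view: {fill_only:+.2f} bps, all branches: {all_branches:+.2f} bps")
```

With these inputs the fill-only view reports a gain while the full branch sum is a net cost of about 0.30 bps, which is the gap the objective is meant to expose.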

Add a tail term for production safety, with risk-aversion weight \(\lambda\ge 0\) and action set \(\mathcal{A}\):

\[ \min_{a\in\mathcal{A}}\;\mathbb{E}[C\mid a,x] + \lambda\,\text{CVaR}_{95}(C\mid a,x) \]

This penalizes "great mean, terrible p95" behavior.
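One way to evaluate the mean-plus-CVaR objective is from simulated or replayed cost samples per action. The sketch below assumes an empirical CVaR defined as the mean of the worst 5% of samples; the helper names and the two toy cost distributions are hypothetical.

```python
import math
from typing import Sequence


def cvar(costs: Sequence[float], alpha: float = 0.95) -> float:
    """Empirical CVaR: mean of the worst (1 - alpha) share of cost samples."""
    if not costs:
        raise ValueError("need at least one cost sample")
    ordered = sorted(costs)
    # Round before ceil so that float noise (5.000000000000004) does not add a sample.
    n_tail = max(1, math.ceil(round((1.0 - alpha) * len(ordered), 9)))
    tail = ordered[-n_tail:]
    return sum(tail) / n_tail


def risk_adjusted(costs: Sequence[float], lam: float, alpha: float = 0.95) -> float:
    """Mean cost plus lam times the empirical CVaR."""
    return sum(costs) / len(costs) + lam * cvar(costs, alpha)


# Action A: usually cheap, occasionally very expensive. Action B: steady.
action_a = [0.2] * 95 + [10.0] * 5
action_b = [0.8] * 100

for lam in (0.0, 0.5):
    a, b = risk_adjusted(action_a, lam), risk_adjusted(action_b, lam)
    print(f"lambda={lam}: A={a:.2f}, B={b:.2f}, pick {'A' if a < b else 'B'}")
```

With no tail weight, action A wins on its lower mean; with a tail weight of 0.5, the steadier action B wins. The right value of the weight is a desk-level risk choice, not something this sketch determines.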

What to model in features (minimum useful set)

Queue state
- queue ahead at order price
- queue depletion rate
- local cancel intensity
Tightness/depth/resiliency
- spread, top-of-book depth imbalance
- refill speed proxy after recent shocks
Flow toxicity
- signed trade pressure (1s/5s/15s)
- short-horizon adverse markout proxy
Own execution context
- parent residual %, time-to-deadline, urgency state
- participation and recent child cadence
Venue/session context
- open/mid/close bucket, venue tag, event flag
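As an illustration of one feature from the list, signed trade pressure over 1s/5s/15s windows could be computed as below. This assumes trades arrive as (timestamp, signed quantity) pairs with positive quantity meaning buyer-initiated, and normalizes net by gross volume; both the sign convention and the normalization are assumptions, not prescribed by the text above.

```python
from typing import List, Tuple

Trade = Tuple[float, float]  # (timestamp in seconds, signed quantity; + = buyer-initiated)


def signed_trade_pressure(trades: List[Trade], now: float, window_s: float) -> float:
    """Net signed volume over gross volume in (now - window_s, now]; 0.0 if no trades."""
    recent = [qty for ts, qty in trades if now - window_s < ts <= now]
    gross = sum(abs(qty) for qty in recent)
    return sum(recent) / gross if gross > 0 else 0.0


# Toy tape: an early large sell, then mostly buying.
trades = [(0.5, -300.0), (6.0, 100.0), (9.5, 200.0), (9.9, -100.0)]
features = {
    f"pressure_{w}s": signed_trade_pressure(trades, now=10.0, window_s=w)
    for w in (1, 5, 15)
}
print(features)
```

The value lies in [-1, 1], so the three windows are directly comparable and a sign flip between short and long windows is itself informative.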

Why this is consistent with literature

Lo, MacKinlay, Zhang (2002): survival-analysis framing for limit-order execution times; time-to-first-fill and time-to-completion are different objects and sensitive to microstructure covariates.
Huang, Lehalle, Rosenbaum (2014/2015): queue-reactive modeling where order-flow intensities depend on current LOB state.
Xu et al. (2016): spread/depth/intensity co-evolve after liquidity shocks; resiliency behavior matters for tactic timing.
Lokin & Yu (2024/2026): state-dependent interacting-queue framework gives tractable fill-probability expressions at best/deeper levels.
Competing-risks statistics: treating competing events as ordinary censoring biases incidence estimates (the naive one-minus-Kaplan-Meier estimate overstates cumulative incidence); order-execution data has the same structural issue.
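The censoring bias in the last point can be reproduced on a toy dataset. The sketch below is a plain-Python Aalen-Johansen estimate compared with the naive approach of recoding cancels as censored and taking one minus Kaplan-Meier. The ten synthetic orders and the function name are illustrative, not real data or a library API.

```python
from typing import List, Tuple

CENSORED, FILL, CANCEL = 0, 1, 2


def incidence_and_survival(times: List[float], events: List[int], cause: int) -> Tuple[float, float]:
    """Aalen-Johansen cumulative incidence for `cause`, plus overall event-free
    survival, both evaluated at the last observed time."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv, cif = 1.0, 0.0
    i = 0
    while i < len(data):
        t = data[i][0]
        d_cause = d_any = n_here = 0
        while i < len(data) and data[i][0] == t:  # group tied times
            e = data[i][1]
            n_here += 1
            d_any += int(e != CENSORED)
            d_cause += int(e == cause)
            i += 1
        cif += surv * d_cause / at_risk
        surv *= 1.0 - d_any / at_risk
        at_risk -= n_here
    return cif, surv


# Ten orders ending at t = 1..10, alternating fill and cancel, no true censoring.
times = [float(t) for t in range(1, 11)]
events = [FILL, CANCEL] * 5

aj_fill, _ = incidence_and_survival(times, events, FILL)

# Naive approach: pretend cancels are censored, then use 1 - Kaplan-Meier.
recoded = [e if e == FILL else CENSORED for e in events]
_, km_surv = incidence_and_survival(times, recoded, FILL)
naive_fill = 1.0 - km_surv

print(f"Aalen-Johansen fill incidence: {aj_fill:.3f}, naive 1-KM: {naive_fill:.3f}")
```

Exactly half of these orders fill, and the Aalen-Johansen estimate recovers 0.5. The naive estimate comes out near 0.75 because it implicitly assumes cancelled orders would have gone on to fill at the same rate as the rest.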

Control policy translation (desk-ready)

At each decision step, evaluate actions:

stay passive at current level,
improve one tick,
cancel+repost deeper,
cross now,
pause briefly.

For each action \(a\), estimate branch probabilities and branch costs, then pick the argmin of the risk-adjusted objective.

Practical guardrails:

hysteresis to avoid cancel/repost flapping,
max cancel rate caps by venue,
emergency fallback to conservative baseline when calibration drift alarms fire.
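The selection rule and the three guardrails can be sketched together as follows. Everything named here is hypothetical: the action labels, the choice of which actions count against the cancel cap, the switching margin, and the use of "stay_passive" as the conservative fallback are all placeholders a desk would set for itself.

```python
from typing import Dict, Optional

# Assumption: these actions consume cancel/replace budget on the venue.
CANCEL_ACTIONS = {"improve_one_tick", "cancel_repost_deeper"}


def select_action(
    scores: Dict[str, float],      # risk-adjusted cost per action, lower is better
    current: Optional[str],        # action currently in force, if any
    switch_margin: float,          # hysteresis: minimum improvement needed to switch
    cancels_left: int,             # remaining cancel budget under the venue cap
    drift_alarm: bool = False,
    fallback: str = "stay_passive",
) -> str:
    if drift_alarm:
        return fallback  # model output is not trusted while the alarm is active
    allowed = {
        a: s for a, s in scores.items()
        if cancels_left > 0 or a not in CANCEL_ACTIONS
    }
    if not allowed:
        return fallback
    best = min(allowed, key=allowed.get)
    # Hysteresis: keep the current action unless the best one is clearly better.
    if current in allowed and allowed[current] - allowed[best] < switch_margin:
        return current
    return best


scores = {"stay_passive": 1.20, "improve_one_tick": 1.15, "cross_now": 1.90}
print(select_action(scores, "stay_passive", switch_margin=0.10, cancels_left=3))
print(select_action(scores, "stay_passive", switch_margin=0.02, cancels_left=3))
```

With a 0.10 margin the 0.05 improvement is not enough to leave the current action; with a 0.02 margin it is. Exhausting the cancel budget removes the repricing actions from consideration without touching the scores.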

Calibration protocol

Build child-order event table with strict first-event labeling.
Fit cause-specific hazard models (Cox/AFT/GBDT-hazard) per event type.
Reconstruct CIFs and validate probability calibration (Brier/reliability by horizon).
Fit branch-cost models conditioned on event type.
Backtest policy with full branching (not fill-only replay).
Promote only if mean and p95/CVaR improve without completion-SLA breach.
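The first step of the protocol, strict first-event labeling, might look like the sketch below. It assumes integer millisecond timestamps and an order log whose event kinds are "fill", "cancel" and "replace"; that schema and the function name are illustrative. Events stamped with the same millisecond are ordered arbitrarily here, so a production version would need exchange sequence numbers to break ties.

```python
from typing import List, Tuple

OrderEvent = Tuple[int, str]  # (timestamp in ms, kind: "fill" | "cancel" | "replace")


def label_first_event(submit_ms: int, events: List[OrderEvent], horizon_ms: int) -> Tuple[str, int]:
    """Return (label, duration_ms) for one child order.

    The label is the first event inside [submit, submit + horizon]; if nothing
    happens in that window the order is labeled "timeout" at the horizon.
    """
    deadline = submit_ms + horizon_ms
    in_window = sorted(e for e in events if submit_ms <= e[0] <= deadline)
    if not in_window:
        return "timeout", horizon_ms
    ts, kind = in_window[0]
    # Cancels and replaces share one class (k=2); any fill counts as k=1.
    label = "fill" if kind == "fill" else "cancel"
    return label, ts - submit_ms


# Replace at +300 ms precedes the fill at +800 ms, so the order is a cancel.
print(label_first_event(1000, [(1800, "fill"), (1300, "replace")], horizon_ms=2000))
# A fill after the horizon does not count: the order is a timeout.
print(label_first_event(1000, [(4000, "fill")], horizon_ms=2000))
```

Later events are deliberately ignored once the first one is found; a fill on the replacement order belongs to a new child-order row, not to this one.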

Monitoring (must-have)

Fill-CIF calibration error by regime bucket
Cancel/timeout incidence drift (PSI or KL vs training)
Expected vs realized branch mix
p95 slippage gap in high-urgency windows
Queue-reset penalty trend after replaces
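For the first monitor, a simple per-bucket calibration gap is mean predicted fill probability minus realized fill-first frequency. The sketch below assumes each row carries a regime bucket, the model's predicted fill CIF, and whether the order actually filled first; the row layout and the sample rows are made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Row = Tuple[str, float, bool]  # (regime bucket, predicted fill CIF, filled first?)


def calibration_gap_by_bucket(rows: Iterable[Row]) -> Dict[str, float]:
    """Mean predicted fill probability minus realized fill rate, per bucket.

    Positive values mean the model is over-predicting fills in that bucket.
    """
    pred_sum: Dict[str, float] = defaultdict(float)
    fills: Dict[str, int] = defaultdict(int)
    count: Dict[str, int] = defaultdict(int)
    for bucket, predicted, filled in rows:
        pred_sum[bucket] += predicted
        fills[bucket] += int(filled)
        count[bucket] += 1
    return {b: pred_sum[b] / count[b] - fills[b] / count[b] for b in count}


rows = [
    ("open", 0.6, True), ("open", 0.6, False), ("open", 0.6, False), ("open", 0.6, False),
    ("mid", 0.5, True), ("mid", 0.5, False),
]
gaps = calibration_gap_by_bucket(rows)
print(gaps)
```

Here the model is well calibrated mid-session but over-predicts fills at the open by about 0.35, which is the kind of regime-specific gap an aggregate metric would average away.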

Alarm examples:

15-min rolling CIF calibration error > threshold
timeout incidence doubles vs trailing 20-day matched bucket
realized p95 exceeds modeled p95 by > X bps for Y windows
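Two of these checks, branch-mix drift via PSI and the timeout-doubling alarm, can be sketched from raw outcome counts. The counts below are invented, the helper names are hypothetical, and the small floor `eps` is an assumption added to keep the logarithm finite when a class is empty.

```python
import math
from typing import Dict


def psi(expected: Dict[str, int], actual: Dict[str, int], eps: float = 1e-6) -> float:
    """Population stability index between two outcome-count dictionaries."""
    e_total, a_total = sum(expected.values()), sum(actual.values())
    total = 0.0
    for key in set(expected) | set(actual):
        e = max(expected.get(key, 0) / e_total, eps)  # floor avoids log(0)
        a = max(actual.get(key, 0) / a_total, eps)
        total += (a - e) * math.log(a / e)
    return total


def timeout_doubled(baseline: Dict[str, int], live: Dict[str, int]) -> bool:
    """True when the live timeout share is at least twice the baseline share."""
    base_share = baseline.get("timeout", 0) / sum(baseline.values())
    live_share = live.get("timeout", 0) / sum(live.values())
    return live_share >= 2.0 * base_share


baseline = {"fill": 500, "cancel": 300, "timeout": 200}  # matched trailing bucket
live = {"fill": 300, "cancel": 250, "timeout": 450}      # current window

print(f"PSI = {psi(baseline, live):.3f}, timeout doubled: {timeout_doubled(baseline, live)}")
```

The PSI threshold that should trigger an alarm is a calibration choice; any cutoff should be checked against the desk's own history of benign regime shifts before it drives the fallback mode.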

Failure modes

Event-label leakage: using post-decision info in predictors.
Partial-fill mishandling: class definitions inconsistent across venues.
Confounding from policy changes: model drift after router rule edits.
Clock inconsistency: bad timestamp alignment corrupts hazard estimates.

Minimal implementation checklist

First-event schema (fill/cancel/timeout) at child-order granularity
Cause-specific hazard models with calibrated CIF output
Event-conditional branch cost models
Risk-adjusted action selector (mean + CVaR)
Drift alarms + conservative fallback mode
Weekly scenario replay (open stress, close stress, thin-liquidity shock)

References

Lo, A. W., MacKinlay, A. C., & Zhang, J. (2002). Econometric models of limit-order executions. Journal of Financial Economics, 65(1), 31–71.
Huang, W., Lehalle, C.-A., & Rosenbaum, M. (2015). Simulating and Analyzing Order Book Data: The Queue-Reactive Model. Journal of the American Statistical Association. (arXiv:1312.0563)
Xu, H.-C., Chen, W., Xiong, X., Zhang, W., Zhou, W.-X., & Stanley, H. E. (2016). Limit-order book resiliency after effective market orders: Spread, depth and intensity. (arXiv:1602.00731)
Lokin, F., & Yu, F. (2026 update of 2024 preprint). Fill Probabilities in a Limit Order Book with State-Dependent Stochastic Order Flows. (arXiv:2403.02572)
Alfonsi, A., & Blanc, P. (2015). Extension and calibration of a Hawkes-based optimal execution model. (arXiv:1506.08740)
Kim, H., & Kim, H. J. (2024). Introduction to Survival Analysis in the Presence of Competing Risks. J Lipid Atheroscler.

Bottom line: stop treating non-fills as nuisance censoring. Model fill/cancel/timeout as competing outcomes, attach explicit branch costs, and optimize execution on risk-adjusted total slippage. That is where a lot of hidden p95 leakage lives.