Competing-Risks Fill/Cancel Slippage Modeling Playbook
Date: 2026-03-03
Category: research
Focus: Treat passive execution as a competing-risks problem (fill vs cancel/replace vs timeout), then optimize tactics on expected + tail slippage, not fill-only optimism.
Why this matters
Many execution stacks still estimate passive-order quality from fills only. That creates survivorship bias:
- easy fills dominate the dataset,
- cancels/timeouts get treated like “missing data,”
- re-entry and delay costs are underpriced,
- tail slippage appears “unexpected” in live trading.
The practical fix is simple in concept: model mutually exclusive order outcomes explicitly.
Key idea: fill is one event among several
For each child order, define the first event within horizon (T):
- (k=1): first fill (partial/full, configurable)
- (k=2): cancel/replace before fill
- (k=3): timeout/no fill (and forced follow-up action)
Use cause-specific hazards:
[ h_k(\tau\mid x)=h_{0,k}(\tau)\exp(\beta_k^\top x) ]
with state vector (x) (spread, queue ahead, imbalance, volatility, local order-flow toxicity, own urgency, etc.).
Then compute cumulative incidence functions (CIF):
[ F_k(T\mid x)=\Pr(\text{event }k\text{ occurs first by }T\mid x) ]
and the survival function (S(T\mid x)=1-\sum_k F_k(T\mid x)), the probability that no event has occurred by (T).
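The step from cause-specific hazards to CIFs can be sketched numerically. This is a minimal sketch, assuming the hazards have already been evaluated for a given (x) on a uniform time grid and are constant within each grid step; the function name `cumulative_incidence` and the hazard values are illustrative only, and the example treats all three outcomes as hazard-driven.

```python
import numpy as np

def cumulative_incidence(hazards, dt):
    """Discrete-time CIFs from cause-specific hazards.

    hazards: array of shape (n_steps, n_causes) holding h_k(tau | x) per step.
    dt: width of each time step.
    Returns (cif, surv) with cif[k] = F_k(T | x) and surv = S(T | x).
    """
    hazards = np.asarray(hazards, dtype=float)
    total = hazards.sum(axis=1)                      # all-cause hazard per step
    cum = np.concatenate(([0.0], np.cumsum(total * dt)))
    surv_start = np.exp(-cum[:-1])                   # S at the start of each step
    surv_end = np.exp(-cum[1:])                      # S at the end of each step
    p_any = surv_start - surv_end                    # P(some event inside the step)
    # Attribute each step's event mass to causes in proportion to their hazards
    # (exact when hazards are constant within the step).
    share = np.divide(hazards, total[:, None],
                      out=np.zeros_like(hazards), where=total[:, None] > 0)
    cif = (p_any[:, None] * share).sum(axis=0)
    return cif, float(surv_end[-1])

# Illustrative hazards per second for (fill, cancel, timeout) over T = 5 s.
h = np.tile([0.2, 0.1, 0.05], (50, 1))
cif, surv = cumulative_incidence(h, dt=0.1)
print(cif, surv, cif.sum() + surv)
```

The CIFs and the survival probability sum to one by construction, which is a useful sanity check on any fitted model's output.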
Slippage objective (the operational core)
Instead of "maximize fill probability," optimize expected execution cost including all branches:
[ \mathbb{E}[C\mid x,T]=F_1\,C_{\text{fill}} + F_2\,C_{\text{cancel}} + F_3\,C_{\text{timeout}} + S(T)\,C_{\text{roll}} ]
Where:
- (C_{\text{fill}}): realized spread capture + short-horizon markout + fees/rebates
- (C_{\text{cancel}}): cancel/replace footprint + queue reset + likely worse re-entry
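- (C_{\text{timeout}}): cost of the urgency jump after an unfilled order (often crossing the spread later, at a worse price)
- (C_{\text{roll}}): residual risk carried forward to the next control step when the order is still live at (T)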
Add tail term for production safety:
[ \min_{a\in\mathcal{A}}\;\mathbb{E}[C\mid a,x] + \lambda\,\text{CVaR}_{95}(C\mid a,x) ]
This penalizes "great mean, terrible p95" behavior; (\lambda) sets how much tail cost is traded against mean cost.
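The mean-plus-CVaR objective can be evaluated from simulated or replayed cost samples. The sketch below assumes an empirical CVaR (mean of costs at or above the 95% quantile), Normal branch costs, and made-up branch probabilities and costs in bps; `cvar`, `risk_adjusted_cost`, `simulate_costs`, and the value of `lam` are hypothetical names and settings, not part of any library.

```python
import numpy as np

def cvar(costs, alpha=0.95):
    """Empirical CVaR: mean of the costs at or above the alpha-quantile."""
    costs = np.asarray(costs, dtype=float)
    var = np.quantile(costs, alpha)
    return float(costs[costs >= var].mean())

def risk_adjusted_cost(costs, lam=0.5, alpha=0.95):
    """Mean cost plus lam times CVaR_alpha, as in the objective above."""
    costs = np.asarray(costs, dtype=float)
    return float(costs.mean() + lam * cvar(costs, alpha))

def simulate_costs(probs, means, sds, n, rng):
    """Draw n costs from a mixture: pick a branch, then a Normal branch cost."""
    branch = rng.choice(len(probs), size=n, p=probs)
    return rng.normal(np.asarray(means)[branch], np.asarray(sds)[branch])

rng = np.random.default_rng(0)
# Branch order: fill, cancel, timeout, roll. All numbers are illustrative (bps).
passive = simulate_costs([0.45, 0.25, 0.20, 0.10],
                         [-0.5, 1.0, 4.0, 1.5],
                         [0.5, 1.0, 2.0, 1.0], 20_000, rng)
cross = simulate_costs([1.0, 0.0, 0.0, 0.0],
                       [1.2, 0.0, 0.0, 0.0],
                       [0.3, 0.0, 0.0, 0.0], 20_000, rng)
for name, c in [("passive", passive), ("cross", cross)]:
    print(name, round(c.mean(), 2), round(cvar(c), 2),
          round(risk_adjusted_cost(c), 2))
```

With inputs like these, a passive tactic can win on mean cost while losing once the tail term is included, which is the comparison the objective is meant to surface.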
What to model in features (minimum useful set)
- Queue state
  - queue ahead at order price
  - queue depletion rate
  - local cancel intensity
- Tightness/depth/resiliency
  - spread, top-of-book depth imbalance
  - refill speed proxy after recent shocks
- Flow toxicity
  - signed trade pressure (1s/5s/15s)
  - short-horizon adverse markout proxy
- Own execution context
  - parent residual %, time-to-deadline, urgency state
  - participation and recent child cadence
- Venue/session context
  - open/mid/close bucket, venue tag, event flag
Why this is consistent with literature
- Lo, MacKinlay, Zhang (2002): survival-analysis framing for limit-order execution times; time-to-first-fill and time-to-completion are different objects and sensitive to microstructure covariates.
- Huang, Lehalle, Rosenbaum (2014/2015): queue-reactive modeling where order-flow intensities depend on current LOB state.
- Xu et al. (2016): spread/depth/intensity co-evolve after liquidity shocks; resiliency behavior matters for tactic timing.
- Lokin & Yu (2024/2026): state-dependent interacting-queue framework gives tractable fill-probability expressions at best/deeper levels.
- Competing-risks statistics: treating competing events as ordinary censoring biases incidence estimates; finance execution has the same structural issue.
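To see the size of the censoring bias in a simple case, consider a worked example under assumptions not made in the text: constant hazards (h_1=0.2) per second for fill and (h_2=0.3) per second for cancel, no other events, and horizon (T=5) seconds. The fill CIF is
[ F_1(T)=\frac{h_1}{h_1+h_2}\left(1-e^{-(h_1+h_2)T}\right)=0.4\,(1-e^{-2.5})\approx 0.367 ]
whereas treating cancels as independent censoring and reporting one minus the Kaplan-Meier survival targets
[ 1-e^{-h_1 T}=1-e^{-1}\approx 0.632 ]
The second number answers "how likely is a fill if cancels could not happen," which is not the situation a live order faces, and in this example it overstates fill probability by roughly 26 percentage points.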
Control policy translation (desk-ready)
At each decision step, evaluate actions:
- stay passive at current level,
- improve one tick,
- cancel+repost deeper,
- cross now,
- pause briefly.
For each action (a), estimate branch probabilities and branch costs, then pick argmin of risk-adjusted objective.
Practical guardrails:
- hysteresis to avoid cancel/repost flapping,
- max cancel rate caps by venue,
- emergency fallback to conservative baseline when calibration drift alarms fire.
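The selection step with two of the guardrails above (hysteresis and fallback) might look like the following. It is a minimal sketch: the action names, the `switch_margin` value, and the choice of `stay_passive` as the conservative baseline are all placeholders, and venue cancel-rate caps are left out.

```python
def select_action(scores, current, switch_margin=0.2,
                  drift_alarm=False, baseline="stay_passive"):
    """Choose an action from risk-adjusted scores (lower is better).

    scores: dict mapping action name -> mean + lambda * CVaR estimate.
    current: the action currently in force.
    switch_margin: hysteresis band; switch only if the best action beats
        the current one by more than this amount (same units as scores).
    drift_alarm: when True, ignore the model scores and return `baseline`.
    """
    if drift_alarm:
        return baseline
    best = min(scores, key=scores.get)
    if current in scores and scores[current] - scores[best] <= switch_margin:
        return current          # improvement too small to justify a change
    return best

scores = {"stay_passive": 1.00, "improve_one_tick": 0.90, "cross_now": 2.00}
print(select_action(scores, current="stay_passive"))   # gain of 0.10 is inside the band
print(select_action(scores, current="cross_now"))      # gain of 1.10 exceeds the band
print(select_action(scores, current="cross_now", drift_alarm=True))
```

The hysteresis band is expressed in the same units as the objective, so it can be tuned against the estimated cost of a cancel/repost cycle.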
Calibration protocol
- Build child-order event table with strict first-event labeling.
- Fit cause-specific hazard models (Cox/AFT/GBDT-hazard) per event type.
- Reconstruct CIFs and validate probability calibration (Brier/reliability by horizon).
- Fit branch-cost models conditioned on event type.
- Backtest policy with full branching (not fill-only replay).
- Promote only if mean and p95/CVaR improve without completion-SLA breach.
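Strict first-event labeling, the first step of the protocol, can be sketched as below. The `ChildOrder` fields, the event codes, and the tie-break rule (a fill and a cancel with the same timestamp count as a fill) are assumptions for illustration; a real event table would follow each venue's own message semantics.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

FILL, CANCEL, TIMEOUT = 1, 2, 3

@dataclass
class ChildOrder:
    submit_ts: float                 # seconds, on one consistent clock
    first_fill_ts: Optional[float]   # None if never filled
    cancel_ts: Optional[float]       # None if never cancelled/replaced

def label_first_event(order: ChildOrder, horizon: float) -> Tuple[int, float]:
    """Return (event_code, duration) for the first event within `horizon`."""
    candidates = []
    if order.first_fill_ts is not None:
        candidates.append((order.first_fill_ts - order.submit_ts, FILL))
    if order.cancel_ts is not None:
        candidates.append((order.cancel_ts - order.submit_ts, CANCEL))
    # Keep only events that fall inside the horizon.
    candidates = [c for c in candidates if 0 <= c[0] <= horizon]
    if not candidates:
        return TIMEOUT, horizon
    # Earliest event wins; on equal durations FILL (1) sorts before CANCEL (2).
    duration, event = min(candidates)
    return event, duration

print(label_first_event(ChildOrder(0.0, 2.0, 3.0), horizon=5.0))
print(label_first_event(ChildOrder(0.0, None, 1.5), horizon=5.0))
print(label_first_event(ChildOrder(0.0, 7.0, None), horizon=5.0))
```

Each order receives exactly one label, so the three outcomes stay mutually exclusive in the training table.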
Monitoring (must-have)
- Fill-CIF calibration error by regime bucket
- Cancel/timeout incidence drift (PSI or KL vs training)
- Expected vs realized branch mix
- p95 slippage gap in high-urgency windows
- Queue-reset penalty trend after replaces
Alarm examples:
- 15-min rolling CIF calibration error > threshold
- timeout incidence doubles vs trailing 20-day matched bucket
- realized p95 exceeds modeled p95 by > X bps for Y windows
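A drift check on the branch mix can be as small as the following. This is a sketch, assuming PSI is computed over the discrete fill/cancel/timeout shares; the function names, the `eps` floor, and the example shares are illustrative, and alarm thresholds are left to desk calibration.

```python
import numpy as np

def psi(expected, actual, eps=1e-6):
    """Population stability index between two discrete distributions."""
    e = np.clip(np.asarray(expected, dtype=float), eps, None)
    a = np.clip(np.asarray(actual, dtype=float), eps, None)
    e, a = e / e.sum(), a / a.sum()          # renormalize after clipping
    return float(np.sum((a - e) * np.log(a / e)))

def timeout_doubled(train_rate, live_rate):
    """True when live timeout incidence is at least twice the reference rate."""
    return live_rate >= 2.0 * train_rate

train_mix = [0.50, 0.30, 0.20]   # fill, cancel, timeout shares in training
live_mix = [0.30, 0.30, 0.40]    # shares in the live matched bucket
print(psi(train_mix, live_mix))
print(timeout_doubled(train_mix[2], live_mix[2]))
```

PSI is zero when the two mixes match and grows as they diverge, so it pairs naturally with the expected-versus-realized branch mix metric above.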
Failure modes
- Event-label leakage: using post-decision info in predictors.
- Partial-fill mishandling: class definitions inconsistent across venues.
- Confounding from policy changes: model drift after router rule edits.
- Clock inconsistency: bad timestamp alignment corrupts hazard estimates.
Minimal implementation checklist
- First-event schema (fill/cancel/timeout) at child-order granularity
- Cause-specific hazard models with calibrated CIF output
- Event-conditional branch cost models
- Risk-adjusted action selector (mean + CVaR)
- Drift alarms + conservative fallback mode
- Weekly scenario replay (open stress, close stress, thin-liquidity shock)
References
- Lo, A. W., MacKinlay, A. C., & Zhang, J. (2002). Econometric models of limit-order executions. Journal of Financial Economics, 65(1), 31–71.
- Huang, W., Lehalle, C.-A., & Rosenbaum, M. (2015). Simulating and Analyzing Order Book Data: The Queue-Reactive Model. JASA. (arXiv:1312.0563)
- Xu, H.-C., Chen, W., Xiong, X., Zhang, W., Zhou, W.-X., & Stanley, H. E. (2016). Limit-order book resiliency after effective market orders: Spread, depth and intensity. (arXiv:1602.00731)
- Lokin, F., & Yu, F. (2026 update of 2024 preprint). Fill Probabilities in a Limit Order Book with State-Dependent Stochastic Order Flows. (arXiv:2403.02572)
- Alfonsi, A., & Blanc, P. (2015). Extension and calibration of a Hawkes-based optimal execution model. (arXiv:1506.08740)
- Kim, H., & Kim, H. J. (2024). Introduction to Survival Analysis in the Presence of Competing Risks. J Lipid Atheroscler.
Bottom line: stop treating non-fills as nuisance censoring. Model fill/cancel/timeout as competing outcomes, attach explicit branch costs, and optimize execution on risk-adjusted total slippage. That is where a lot of hidden p95 leakage lives.