Censoring-Aware Slippage Modeling Playbook
(Partial Fills, Cancels, and Survivorship Bias in Live Execution)
Date: 2026-02-26
Category: Research (Execution / Slippage Modeling)
Scope: Intraday live execution for single-name and baskets (KRX/NXT portable)
Why this model exists
Most slippage models are trained on filled child orders only. That is convenient—and wrong.
In production, a meaningful share of child intents are:
- partially filled,
- canceled and replaced,
- expired with zero fill,
- rerouted after adverse microstructure shifts.
If we ignore these censored outcomes, the model learns from survivors (orders that did get fills), and systematically underestimates true execution drag.
This playbook adds a censoring-aware layer so expected cost reflects the real question:
“What is expected implementation cost for an intent, not only for already-filled prints?”
Core problem: MNAR labels in execution data
For each child intent at decision time (t), define:
- Features: (x_t)
- Fill outcome over horizon (\Delta): (F_t \in [0,1]) (fill ratio)
- Slippage if any fill occurs: (S_t) (bps, conditional on (F_t>0))
A naive model trains (S_t \sim f(x_t)) only where (F_t>0). But fills are not random; they depend on queue state, urgency, toxicity, and your own policy, so the missingness is MNAR (missing not at random).
Result:
- optimistic expected slippage,
- unstable controller behavior in thin/turbulent books,
- poor transfer across session buckets (open/close/news windows).
Target quantity (what to predict)
For each intent, predict expected all-in execution tax:
[ \mathbb{E}[C_t \mid x_t] = \mathbb{E}[F_t \cdot S_t \mid x_t] + \mathbb{E}[(1-F_t) \cdot O_t \mid x_t] ]
Where:
- (S_t): realized slippage on filled quantity,
- (O_t): opportunity/urgency penalty for unfilled residue (next reprice, chase, or deadline sweep).
Operationally, this decomposes into three heads:
- Fill head: (\hat{p}_t = P(F_t>0\mid x_t)), plus expected fill ratio (\widehat{\phi}_t=\mathbb{E}[F_t\mid x_t])
- Price head: (\hat{s}_t = \mathbb{E}[S_t\mid x_t, F_t>0])
- Opportunity head: (\hat{o}_t = \mathbb{E}[O_t\mid x_t, F_t=0]) (or from replay)
Then:
[ \widehat{C}_t = \widehat{\phi}_t \cdot \hat{s}_t + (1-\widehat{\phi}_t) \cdot \hat{o}_t ]
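As a minimal sketch, the three heads combine exactly as in the formula above; the function name and bps convention are illustrative, not from any production codebase:

```python
def expected_cost_bps(phi: float, s_hat: float, o_hat: float) -> float:
    """Combine the three heads into the all-in expected execution tax (bps).

    phi:   expected fill ratio E[F | x]           (fill head)
    s_hat: expected slippage given a fill, bps    (price head)
    o_hat: opportunity penalty on residue, bps    (opportunity head)
    """
    return phi * s_hat + (1.0 - phi) * o_hat

# Example: 70% expected fill at 3 bps slippage, 8 bps residue penalty
# gives 0.7 * 3 + 0.3 * 8 = 4.5 bps all-in.
```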
Modeling blueprint
1) Fill/Survival model
Use discrete-time survival or hazard model over short buckets (e.g., 100ms–1s):
[ h_{t,k} = P(\text{fill in bucket }k \mid \text{not filled before}, x_{t,k}) ]
Derive:
- fill probability within horizon,
- expected fill ratio,
- expected time-to-fill.
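Under the discrete-time hazard model above, the derived quantities fall out of a single pass over the per-bucket hazards; a sketch, assuming the hazards already come from the fitted model (helper name and bucket size are assumptions):

```python
def survival_summaries(hazards, bucket_ms=100):
    """From per-bucket hazards h_k = P(fill in bucket k | unfilled so far),
    derive P(fill within horizon) and expected time-to-fill given a fill.
    """
    surv = 1.0            # P(still unfilled before bucket k)
    p_fill = 0.0          # cumulative P(first fill within horizon)
    ett = 0.0             # E[time-to-fill * 1{filled}], in ms
    for k, h in enumerate(hazards):
        p_k = surv * h                    # P(first fill lands in bucket k)
        p_fill += p_k
        ett += p_k * (k + 1) * bucket_ms
        surv *= (1.0 - h)
    exp_ttf = ett / p_fill if p_fill > 0 else float("inf")
    return p_fill, exp_ttf
```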
Useful features:
- queue position estimate + queue age,
- spread/microprice drift,
- cancel burst intensity,
- own participation and recent replace frequency,
- venue/session flags (open/close/auction adjacency).
2) Conditional slippage model (filled branch)
Model (S_t) on filled events only, but correct the selection bias via weighting.
Two practical choices:
- IPW (inverse probability weighting) [ w_t = \frac{1}{\max(\hat{p}_t, \epsilon)} ]
- AIPW / doubly robust for better finite-sample stability.
Train with robust loss (Huber/quantile) because tails matter more than mean.
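A minimal sketch of the IPW weights plus a Huber loss, assuming `p_fill` carries the fill head's (\hat{p}_t) and clipping at (\epsilon) as in the formula above:

```python
def ipw_weights(p_fill, eps=0.05):
    """Inverse-probability weights for the filled branch.
    Clipping at eps keeps thin-book intents from dominating the loss."""
    return [1.0 / max(p, eps) for p in p_fill]

def weighted_huber(y_true, y_pred, w, delta=5.0):
    """IPW-weighted Huber loss on a bps scale; quadratic near zero,
    linear in the tails, so outlier prints do not swamp the fit."""
    total, wsum = 0.0, 0.0
    for yt, yp, wi in zip(y_true, y_pred, w):
        r = abs(yt - yp)
        loss = 0.5 * r * r if r <= delta else delta * (r - 0.5 * delta)
        total += wi * loss
        wsum += wi
    return total / wsum
```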
3) Opportunity-cost model (unfilled branch)
Estimate what happens to leftover quantity when the first intent fails:
- next decision slippage,
- sweep-to-complete premium near deadline,
- alpha decay penalty while waiting.
Best source: counterfactual/replay engine with policy logs. If unavailable, approximate with:
- next-touch arrival-to-fill delta,
- urgency bucket templates,
- symbol/session-specific fallback priors.
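When only fallback priors are available, a lookup keyed by liquidity bucket and session is enough to avoid the zero-opportunity-cost trap; the table values below are placeholders for illustration, not calibrated numbers:

```python
# Placeholder fallback priors in bps (NOT calibrated), keyed by
# (liquidity_bucket, session). Urgency templates could scale these further.
FALLBACK_O_BPS = {
    ("liquid", "continuous"): 4.0,
    ("liquid", "close"): 9.0,
    ("thin", "continuous"): 12.0,
    ("thin", "close"): 25.0,
}

def opportunity_prior(liquidity_bucket, session, default=15.0):
    """Residue penalty prior for the unfilled branch when no replay
    engine exists; unknown buckets fall back to a conservative default."""
    return FALLBACK_O_BPS.get((liquidity_bucket, session), default)
```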
4) Joint estimator and uncertainty
Return not just a point estimate but an interval:
[ \widehat{C}_t^{q50}, \widehat{C}_t^{q90}, \widehat{C}_t^{q99} ]
Use conformal or quantile calibration by symbol-liquidity bucket. Controllers should consume q90+ for guardrails, not mean only.
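A split-conformal correction per symbol-liquidity bucket can be sketched as follows, assuming `residuals` are realized cost minus predicted q90 on a held-out calibration set (the rank formula is the standard conformal one):

```python
import math

def conformal_offset(residuals, q=0.9):
    """Split-conformal offset for one symbol-liquidity bucket.
    Adding the returned offset to future q90 predictions restores
    approximately q coverage on exchangeable data."""
    rs = sorted(residuals)
    k = math.ceil(q * (len(rs) + 1)) - 1   # standard conformal rank, 0-indexed
    return rs[min(k, len(rs) - 1)]         # clip for tiny calibration sets
```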
Training dataset design (critical)
Unit of analysis = intent event, not trade print.
Include for each intent:
- parent order context (remaining qty, urgency, deadline),
- microstructure snapshot at send/cancel/replace,
- lifecycle outcomes (partial %, time-to-fill, canceled reason),
- linkage to subsequent residue handling.
Avoid leakage:
- purged split by parent-order time,
- no future-book states in features,
- keep policy version as explicit feature (behavior policy drift is real).
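The purged split can be sketched as an embargo around the split boundary, dropping any parent order whose lifetime straddles it; the `(start, end)` lifetime representation is an assumption:

```python
def purged_split(parent_times, split_time, embargo):
    """Purged train/test split by parent-order time.

    parent_times: list of (start, end) lifetimes, one per parent order.
    Any parent whose lifetime falls within `embargo` of split_time is
    dropped entirely, so overlapping intents cannot leak future book state.
    Returns (train_indices, test_indices).
    """
    train, test = [], []
    for i, (start, end) in enumerate(parent_times):
        if end <= split_time - embargo:
            train.append(i)
        elif start >= split_time + embargo:
            test.append(i)
        # else: purged (straddles or abuts the boundary)
    return train, test
```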
Evaluation scorecard
Primary offline metrics:
- calibration of (\widehat{\phi}_t) (Brier/ECE),
- MAE/Pinball for (S_t) and (O_t),
- error on combined (C_t) at q50/q90.
Policy-facing metrics:
- realized implementation-shortfall (IS) reduction at the same completion SLA,
- q95/q99 tail slippage,
- underfill rate and deadline breach rate,
- “surprise loss” frequency (realized > predicted q90).
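Two of these metrics are one-liners; a sketch with illustrative function names:

```python
def pinball_loss(y_true, y_pred, q):
    """Quantile (pinball) loss; scores the S_t / O_t quantile heads."""
    losses = [max(q * (yt - yp), (q - 1.0) * (yt - yp))
              for yt, yp in zip(y_true, y_pred)]
    return sum(losses) / len(losses)

def surprise_rate(realized, predicted_q90):
    """Fraction of intents whose realized cost exceeds predicted q90;
    should hover near 10% when the q90 head is calibrated."""
    breaches = sum(r > p for r, p in zip(realized, predicted_q90))
    return breaches / len(realized)
```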
Ablations to prove value:
1. filled-only baseline,
2. fill + price two-head,
3. full censoring-aware (fill + price + opportunity).
If (3) does not improve tail control over (1) and (2), the calibration is not production-ready.
Online controller integration
At each decision tick, compute:
- expected cost of resting passive now,
- expected cost of reprice,
- expected cost of cross/spread-taking.
Choose action minimizing risk-adjusted objective:
[ \arg\min_a \; \widehat{C}_t(a) + \lambda \cdot \text{CompletionRisk}_t(a) ]
with (\lambda) increasing as deadline approaches.
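A minimal sketch of the risk-adjusted selection, with (\lambda) ramping as the deadline approaches; the inverse-remaining-time scaling is an assumption, not a prescription:

```python
def choose_action(cost_by_action, risk_by_action, time_left, horizon):
    """argmin_a C_hat(a) + lam * CompletionRisk(a), where lam grows
    as time_left shrinks relative to the schedule horizon.
    Inputs are dicts keyed by action name (costs/risks in bps)."""
    lam = 1.0 / max(time_left / horizon, 1e-6)   # assumed ramp; tune per desk
    return min(cost_by_action,
               key=lambda a: cost_by_action[a] + lam * risk_by_action[a])
```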
Simple tiering:
- GREEN: q90 cost within budget → passive-first,
- YELLOW: budget pressure → shorten dwell/reprice faster,
- RED: high no-fill risk + high opportunity penalty → controlled aggression.
Pseudocode (intent-centric)
for intent in decision_stream:
    x = build_features(intent)
    phi = fill_ratio_model.predict(x)           # E[F | x], from the fill head
    s_q50, s_q90 = slippage_model.predict(x)    # conditional on fill, bias-corrected
    o_q50, o_q90 = opp_model.predict(x)         # unfilled-residue penalty
    c_q50 = phi * s_q50 + (1 - phi) * o_q50
    c_q90 = phi * s_q90 + (1 - phi) * o_q90
    action = policy_min_cost_under_budget(c_q90, completion_risk(intent))
    send(action)
Common failure modes
- Print-level training only: ignores canceled intents and produces chronic optimism.
- No policy-version feature: model drift appears as market drift.
- Opportunity cost set to zero: encourages fake patience now and deadline panic later.
- Mean-only optimization: tail blowups continue even when the average improves.
- No closed-loop monitoring: calibration decays silently after router/risk rule changes.
Minimal production monitoring
- Fill-prob calibration dashboard by symbol bucket,
- Predicted vs realized q90 gap (rolling),
- Surprise-loss alerts (q90 breaches),
- Feature freshness/latency SLO,
- Policy-regime drift alarms (new behavior policy hash).
Run weekly champion-challenger:
- champion = current production model,
- challenger = latest retrain + fresh censoring correction,
- promote only if tail + completion KPIs both improve.
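The promotion gate can be sketched as a strict AND over the two KPI families; the KPI names are assumptions for illustration:

```python
def promote(champion_kpis, challenger_kpis):
    """Champion-challenger gate: promote only if BOTH the tail KPI
    (q95 slippage) and the completion KPI (deadline breach rate) improve.
    Lower is better for both."""
    return (challenger_kpis["q95_slippage_bps"] < champion_kpis["q95_slippage_bps"]
            and challenger_kpis["deadline_breach_rate"] < champion_kpis["deadline_breach_rate"])
```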
One-line takeaway
A slippage model trained only on fills is a survivorship trap; model costs at the intent level (fill + no-fill opportunity branch) to control real-world execution tails instead of paper averages.