Censoring-Aware Slippage Modeling Playbook
(Partial Fills, Cancels, and Survivorship Bias in Live Execution)
Date: 2026-02-26
Category: Research (Execution / Slippage Modeling)
Scope: Intraday live execution for single-name and baskets (KRX/NXT portable)
Why this model exists
Most slippage models are trained on filled child orders only. That is convenient—and wrong.
In production, a meaningful share of child intents are:
- partially filled,
- canceled and replaced,
- expired with zero fill,
- rerouted after adverse microstructure shifts.
If we ignore these censored outcomes, the model learns from survivors (orders that did get fills), and systematically underestimates true execution drag.
This playbook adds a censoring-aware layer so expected cost reflects the real question:
“What is expected implementation cost for an intent, not only for already-filled prints?”
Core problem: MNAR labels in execution data
For each child intent at decision time (t), define:
- Features: (x_t)
- Fill outcome over horizon (\Delta): (F_t \in [0,1]) (fill ratio)
- Slippage if any fill occurs: (S_t) (bps, conditional on (F_t>0))
A naive model trains (S_t \sim f(x_t)) only where (F_t>0). But fills are not random; they depend on queue state, urgency, toxicity, and your own policy, so the missingness is MNAR (missing not at random).
Result:
- optimistic expected slippage,
- unstable controller behavior in thin/turbulent books,
- poor transfer across session buckets (open/close/news windows).
Target quantity (what to predict)
For each intent, predict expected all-in execution tax:
[ \mathbb{E}[C_t \mid x_t] = \mathbb{E}[F_t \cdot S_t \mid x_t] + \mathbb{E}[(1-F_t) \cdot O_t \mid x_t] ]
Where:
- (S_t): realized slippage on filled quantity,
- (O_t): opportunity/urgency penalty for unfilled residue (next reprice, chase, or deadline sweep).
Operationally, this decomposes into three heads:
- Fill head: (\hat{p}_t = P(F_t>0\mid x_t)), plus expected fill ratio (\widehat{\phi}_t=\mathbb{E}[F_t\mid x_t])
- Price head: (\hat{s}_t = \mathbb{E}[S_t\mid x_t, F_t>0])
- Opportunity head: (\hat{o}_t = \mathbb{E}[O_t\mid x_t, F_t=0]) (or from replay)
Then:
[ \widehat{C}_t = \widehat{\phi}_t \cdot \hat{s}_t + (1-\widehat{\phi}_t) \cdot \hat{o}_t ]
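As a minimal sketch, the three heads combine exactly as in the formula above; the function name and bps convention are illustrative, not from any production codebase:

```python
def expected_cost_bps(phi: float, s_hat: float, o_hat: float) -> float:
    """Combine the three heads into the all-in expected execution tax (bps).

    phi:   expected fill ratio E[F | x]           (fill head)
    s_hat: expected slippage given a fill, bps    (price head)
    o_hat: opportunity penalty on residue, bps    (opportunity head)
    """
    return phi * s_hat + (1.0 - phi) * o_hat

# Example: 70% expected fill at 3 bps slippage, 8 bps residue penalty
# gives 0.7 * 3 + 0.3 * 8 = 4.5 bps all-in.
```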
Modeling blueprint
1) Fill/Survival model
Use discrete-time survival or hazard model over short buckets (e.g., 100ms–1s):
[ h_{t,k} = P(\text{fill in bucket }k \mid \text{not filled before}, x_{t,k}) ]
Derive:
- fill probability within horizon,
- expected fill ratio,
- expected time-to-fill.
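Under the discrete-time hazard model above, the derived quantities fall out of a single pass over the per-bucket hazards; a sketch, assuming the hazards already come from the fitted model (helper name and bucket size are assumptions):

```python
def survival_summaries(hazards, bucket_ms=100):
    """From per-bucket hazards h_k = P(fill in bucket k | unfilled so far),
    derive P(fill within horizon) and expected time-to-fill given a fill.
    """
    surv = 1.0            # P(still unfilled before bucket k)
    p_fill = 0.0          # cumulative P(first fill within horizon)
    ett = 0.0             # E[time-to-fill * 1{filled}], in ms
    for k, h in enumerate(hazards):
        p_k = surv * h                    # P(first fill lands in bucket k)
        p_fill += p_k
        ett += p_k * (k + 1) * bucket_ms
        surv *= (1.0 - h)
    exp_ttf = ett / p_fill if p_fill > 0 else float("inf")
    return p_fill, exp_ttf
```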
Useful features:
- queue position estimate + queue age,
- spread/microprice drift,
- cancel burst intensity,
- own participation and recent replace frequency,
- venue/session flags (open/close/auction adjacency).
2) Conditional slippage model (filled branch)
Model (S_t) on filled events only, but correct the selection bias via weighting.
Two practical choices:
- IPW (inverse probability weighting) [ w_t = \frac{1}{\max(\hat{p}_t, \epsilon)} ]
- AIPW / doubly robust for better finite-sample stability.
Train with robust loss (Huber/quantile) because tails matter more than mean.
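A minimal sketch of the IPW weights plus a Huber loss, assuming `p_fill` carries the fill head's (\hat{p}_t) and clipping at (\epsilon) as in the formula above:

```python
def ipw_weights(p_fill, eps=0.05):
    """Inverse-probability weights for the filled branch.
    Clipping at eps keeps thin-book intents from dominating the loss."""
    return [1.0 / max(p, eps) for p in p_fill]

def weighted_huber(y_true, y_pred, w, delta=5.0):
    """IPW-weighted Huber loss on a bps scale; quadratic near zero,
    linear in the tails, so outlier prints do not swamp the fit."""
    total, wsum = 0.0, 0.0
    for yt, yp, wi in zip(y_true, y_pred, w):
        r = abs(yt - yp)
        loss = 0.5 * r * r if r <= delta else delta * (r - 0.5 * delta)
        total += wi * loss
        wsum += wi
    return total / wsum
```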
3) Opportunity-cost model (unfilled branch)
Estimate what happens to leftover quantity when the first intent fails:
- next decision slippage,
- sweep-to-complete premium near deadline,
- alpha decay penalty while waiting.
Best source: counterfactual/replay engine with policy logs. If unavailable, approximate with:
- next-touch arrival-to-fill delta,
- urgency bucket templates,
- symbol/session-specific fallback priors.
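When only fallback priors are available, a lookup keyed by liquidity bucket and session is enough to avoid the zero-opportunity-cost trap; the table values below are placeholders for illustration, not calibrated numbers:

```python
# Placeholder fallback priors in bps (NOT calibrated), keyed by
# (liquidity_bucket, session). Urgency templates could scale these further.
FALLBACK_O_BPS = {
    ("liquid", "continuous"): 4.0,
    ("liquid", "close"): 9.0,
    ("thin", "continuous"): 12.0,
    ("thin", "close"): 25.0,
}

def opportunity_prior(liquidity_bucket, session, default=15.0):
    """Residue penalty prior for the unfilled branch when no replay
    engine exists; unknown buckets fall back to a conservative default."""
    return FALLBACK_O_BPS.get((liquidity_bucket, session), default)
```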
4) Joint estimator and uncertainty
Return not just a point estimate but an interval:
[ \widehat{C}_t^{q50}, \widehat{C}_t^{q90}, \widehat{C}_t^{q99} ]
Use conformal or quantile calibration by symbol-liquidity bucket. Controllers should consume q90+ for guardrails, not mean only.
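A split-conformal correction per symbol-liquidity bucket can be sketched as follows, assuming `residuals` are realized cost minus predicted q90 on a held-out calibration set (the rank formula is the standard conformal one):

```python
import math

def conformal_offset(residuals, q=0.9):
    """Split-conformal offset for one symbol-liquidity bucket.
    Adding the returned offset to future q90 predictions restores
    approximately q coverage on exchangeable data."""
    rs = sorted(residuals)
    k = math.ceil(q * (len(rs) + 1)) - 1   # standard conformal rank, 0-indexed
    return rs[min(k, len(rs) - 1)]         # clip for tiny calibration sets
```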
Training dataset design (critical)
Unit of analysis = intent event, not trade print.
Include for each intent:
- parent order context (remaining qty, urgency, deadline),
- microstructure snapshot at send/cancel/replace,
- lifecycle outcomes (partial %, time-to-fill, canceled reason),
- linkage to subsequent residue handling.
Avoid leakage:
- purged split by parent-order time,
- no future-book states in features,
- keep policy version as explicit feature (behavior policy drift is real).
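The purged split can be sketched as an embargo around the split boundary, dropping any parent order whose lifetime straddles it; the `(start, end)` lifetime representation is an assumption:

```python
def purged_split(parent_times, split_time, embargo):
    """Purged train/test split by parent-order time.

    parent_times: list of (start, end) lifetimes, one per parent order.
    Any parent whose lifetime falls within `embargo` of split_time is
    dropped entirely, so overlapping intents cannot leak future book state.
    Returns (train_indices, test_indices).
    """
    train, test = [], []
    for i, (start, end) in enumerate(parent_times):
        if end <= split_time - embargo:
            train.append(i)
        elif start >= split_time + embargo:
            test.append(i)
        # else: purged (straddles or abuts the boundary)
    return train, test
```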
Evaluation scorecard
Primary offline metrics:
- calibration of (\widehat{\phi}_t) (Brier/ECE),
- MAE/Pinball for (S_t) and (O_t),
- error on combined (C_t) at q50/q90.
Policy-facing metrics:
- realized implementation-shortfall (IS) reduction at the same completion SLA,
- q95/q99 tail slippage,
- underfill rate and deadline breach rate,
- “surprise loss” frequency (realized > predicted q90).
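Two of these metrics are one-liners; a sketch with illustrative function names:

```python
def pinball_loss(y_true, y_pred, q):
    """Quantile (pinball) loss; scores the S_t / O_t quantile heads."""
    losses = [max(q * (yt - yp), (q - 1.0) * (yt - yp))
              for yt, yp in zip(y_true, y_pred)]
    return sum(losses) / len(losses)

def surprise_rate(realized, predicted_q90):
    """Fraction of intents whose realized cost exceeds predicted q90;
    should hover near 10% when the q90 head is calibrated."""
    breaches = sum(r > p for r, p in zip(realized, predicted_q90))
    return breaches / len(realized)
```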
Ablations to prove value:
1. filled-only baseline,
2. fill + price two-head,
3. full censoring-aware (fill + price + opportunity).
If (3) does not improve tail control over (1) and (2), the calibration is not production-ready.
Online controller integration
At each decision tick, compute:
- expected cost of resting passive now,
- expected cost of reprice,
- expected cost of cross/spread-taking.
Choose action minimizing risk-adjusted objective:
[ \arg\min_a \; \widehat{C}_t(a) + \lambda \cdot \text{CompletionRisk}_t(a) ]
with (\lambda) increasing as deadline approaches.
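A minimal sketch of the risk-adjusted selection, with (\lambda) ramping as the deadline approaches; the inverse-remaining-time scaling is an assumption, not a prescription:

```python
def choose_action(cost_by_action, risk_by_action, time_left, horizon):
    """argmin_a C_hat(a) + lam * CompletionRisk(a), where lam grows
    as time_left shrinks relative to the schedule horizon.
    Inputs are dicts keyed by action name (costs/risks in bps)."""
    lam = 1.0 / max(time_left / horizon, 1e-6)   # assumed ramp; tune per desk
    return min(cost_by_action,
               key=lambda a: cost_by_action[a] + lam * risk_by_action[a])
```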
Simple tiering:
- GREEN: q90 cost within budget → passive-first,
- YELLOW: budget pressure → shorten dwell/reprice faster,
- RED: high no-fill risk + high opportunity penalty → controlled aggression.
Pseudocode (intent-centric)
for intent in decision_stream:
    x = build_features(intent)
    phi = fill_ratio_model.predict(x)           # E[F | x], from the fill head
    s_q50, s_q90 = slippage_model.predict(x)    # conditional on fill, bias-corrected
    o_q50, o_q90 = opp_model.predict(x)         # unfilled-residue penalty
    c_q50 = phi * s_q50 + (1 - phi) * o_q50
    c_q90 = phi * s_q90 + (1 - phi) * o_q90
    action = policy_min_cost_under_budget(c_q90, completion_risk(intent))
    send(action)
Common failure modes
- Print-level training only: ignores canceled intents and produces chronic optimism.
- No policy-version feature: model drift appears as market drift.
- Opportunity cost set to zero: encourages fake patience now and deadline panic later.
- Mean-only optimization: tail blowups continue even when the average improves.
- No closed-loop monitoring: calibration decays silently after router/risk rule changes.
Minimal production monitoring
- Fill-prob calibration dashboard by symbol bucket,
- Predicted vs realized q90 gap (rolling),
- Surprise-loss alerts (q90 breaches),
- Feature freshness/latency SLO,
- Policy-regime drift alarms (new behavior policy hash).
Run weekly champion-challenger:
- champion = current production model,
- challenger = latest retrain + fresh censoring correction,
- promote only if tail + completion KPIs both improve.
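The promotion gate can be sketched as a strict AND over the two KPI families; the KPI names are assumptions for illustration:

```python
def promote(champion_kpis, challenger_kpis):
    """Champion-challenger gate: promote only if BOTH the tail KPI
    (q95 slippage) and the completion KPI (deadline breach rate) improve.
    Lower is better for both."""
    return (challenger_kpis["q95_slippage_bps"] < champion_kpis["q95_slippage_bps"]
            and challenger_kpis["deadline_breach_rate"] < champion_kpis["deadline_breach_rate"])
```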
One-line takeaway
A slippage model trained only on fills is a survivorship trap; model costs at the intent level (fill + no-fill opportunity branch) to control real-world execution tails instead of paper averages.