Causal Uplift Slippage Model for Execution Tactic Selection
TL;DR
Most slippage models answer: "How expensive will this order be?"
The more useful live question is: "Which tactic reduces slippage the most for this exact state?"
A practical way to do this is a causal uplift framework:
- treat each tactic (`join`, `improve`, `mid`, `take_small`, `pause`) as a treatment,
- estimate treatment propensities from historical router behavior,
- estimate counterfactual slippage outcomes,
- compute doubly robust uplift per tactic vs baseline,
- execute a non-baseline tactic only when its estimated uplift is favorable under tail-risk and deadline constraints.
This turns execution from static prediction into counterfactual decisioning.
1) Why plain predictive slippage models are not enough
A standard model predicts expected slippage:
[ \hat y = E[\text{slippage}\mid X] ]
Useful, but incomplete for control.
Two hard production issues:
- Action-selection bias: historical actions were chosen by old policies, so observed outcomes are confounded.
- Action ambiguity: low predicted slippage does not tell you whether `join` beats `take_small` in this state.
Result: policy upgrades can look good in offline averages but fail live when state mix shifts.
2) Causal setup
Define:
- (X_t): state features at decision time (queue, imbalance, spread, depth slope, cancel burst, volatility, urgency, residual/deadline),
- (A_t \in \mathcal{A}): chosen tactic,
- (Y_t(a)): potential slippage outcome if tactic (a) were used,
- observed (Y_t = Y_t(A_t)).
We care about uplift relative to baseline tactic (a_0):
[ \tau_a(X_t)=E[Y(a)-Y(a_0)\mid X_t] ]
For buys, lower slippage is better, so negative (\tau_a) means improvement.
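A toy illustration of the sign convention, with hypothetical outcome-model numbers (the tactic names match the action set above; the bps values are made up):

```python
# Toy illustration of the uplift sign convention (hypothetical numbers).
# For buys, slippage is in bps and lower is better, so a negative uplift
# means the candidate tactic is expected to beat the baseline in this state.

mu = {"join": 1.8, "take_small": 1.2, "pause": 2.5}  # E[Y(a) | X] in bps
baseline = "join"

uplift = {a: mu[a] - mu[baseline] for a in mu}  # tau_a(X) per tactic
best = min(uplift, key=uplift.get)              # most negative = best
```

Here `take_small` has uplift 1.2 - 1.8 = -0.6 bps (an improvement vs `join`), while `pause` has +0.7 bps (worse).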
3) Data contract (decision-level, not order-level aggregate)
Per decision event, log:
- parent/order metadata: side, residual qty, urgency bucket, deadline distance,
- market microstructure snapshot: top-of-book, microprice drift, imbalance, queue rank estimate,
- latency state: gateway/ack/book freshness,
- action set available at decision time,
- chosen action and router scores,
- realized outcomes:
- realized slippage (bps),
- fill ratio within horizon,
- miss penalty / catch-up cost,
- post-fill 5s/30s markout.
Critical: include action availability mask. If a tactic was infeasible in a state, don’t treat it as an unchosen alternative.
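The contract above can be sketched as a decision-event record. Field names here are illustrative, not a fixed schema; the load-bearing part is the availability mask, which downstream counterfactual comparisons must respect:

```python
from dataclasses import dataclass

# Hypothetical decision-level log record (illustrative field names).
# Tactics that were infeasible at decision time must be excluded from
# counterfactual comparisons downstream.

@dataclass
class DecisionEvent:
    side: str                 # "buy" / "sell"
    residual_qty: float
    urgency_bucket: int
    deadline_ms: int          # distance to hard deadline
    spread_bps: float
    imbalance: float
    queue_rank_est: float
    book_age_ms: float        # latency / staleness state
    available: dict           # tactic -> bool (availability mask)
    action: str               # chosen tactic
    router_scores: dict       # router scores at decision time
    slippage_bps: float       # realized outcome
    fill_ratio: float
    markout_5s_bps: float

ev = DecisionEvent(
    side="buy", residual_qty=500, urgency_bucket=2, deadline_ms=4_000,
    spread_bps=1.1, imbalance=0.35, queue_rank_est=0.6, book_age_ms=3.0,
    available={"join": True, "improve": True, "mid": False,
               "take_small": True, "pause": True},
    action="join", router_scores={"join": 0.41, "take_small": 0.33},
    slippage_bps=0.8, fill_ratio=1.0, markout_5s_bps=-0.2,
)
# Infeasible tactics ("mid" here) are dropped from any counterfactual set.
feasible = [a for a, ok in ev.available.items() if ok]
```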
4) Step A — propensity model (who chose what)
Estimate
[ \pi(a\mid X)=P(A=a\mid X) ]
with a calibrated multinomial model (GBDT/NN plus temperature scaling or isotonic calibration).
Operational rules:
- clip propensities: (\pi \leftarrow \max(\pi, \epsilon)),
- monitor effective sample size per tactic,
- reject offline evaluation when overlap collapses.
Overlap diagnostics (must-pass):
- propensity histograms by tactic,
- fraction below (\epsilon),
- covariate balance after IPTW.
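The clipping rule and the effective-sample-size diagnostic can be sketched as follows, assuming `pi_hat` is an `(n_decisions, n_tactics)` array of calibrated propensities and `A` holds the chosen-tactic index per decision. `EPS` is an illustrative knob, not a recommended value:

```python
import numpy as np

EPS = 0.01  # illustrative propensity floor

def clip_propensities(pi_hat: np.ndarray, eps: float = EPS) -> np.ndarray:
    """pi <- max(pi, eps), then renormalize each row to sum to 1."""
    pi = np.maximum(pi_hat, eps)
    return pi / pi.sum(axis=1, keepdims=True)

def effective_sample_size(pi: np.ndarray, A: np.ndarray, a: int) -> float:
    """Kish ESS of the IPS weights 1/pi(a|X) over decisions where A == a."""
    w = 1.0 / pi[A == a, a]
    if w.size == 0:
        return 0.0
    return w.sum() ** 2 / (w ** 2).sum()

# Synthetic placeholder data: 1000 decisions, 5 tactics.
rng = np.random.default_rng(0)
pi_hat = rng.dirichlet(np.ones(5), size=1000)
A = np.array([rng.choice(5, p=p) for p in pi_hat])

pi = clip_propensities(pi_hat)
ess = {a: effective_sample_size(pi, A, a) for a in range(5)}
overlap_violation_rate = (pi_hat < EPS).mean()  # fraction below epsilon
```

A collapsing ESS for some tactic, or a rising `overlap_violation_rate`, is exactly the signal to reject offline evaluation in that region.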
5) Step B — outcome models (what would happen)
Model conditional outcome for each tactic:
[ \mu_a(X)=E[Y\mid X,A=a] ]
Use quantile heads, not mean-only:
- (q_{0.5}), (q_{0.9}), (q_{0.95}) of slippage,
- optional joint head for fill-ratio and delay penalty.
Reason: execution incidents are tail-dominated; mean-only policies hide q95 blowups.
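A numpy-only sketch of per-tactic quantile heads on synthetic data: each "head" here is just an empirical conditional quantile within coarse state buckets, standing in for the quantile outputs of a production GBDT/NN. It makes the tail-dominance point concrete: q95 sits well above the median in every state bucket, which a mean-only model would hide.

```python
import numpy as np

# Synthetic decision log: 1-d state feature X, tactic A, slippage Y (bps).
rng = np.random.default_rng(1)
n = 20_000
X = rng.uniform(0, 1, size=n)
A = rng.choice(["join", "take_small"], size=n)
Y = 2.0 * X - 0.5 * (A == "take_small") + rng.normal(scale=0.5, size=n)

def quantile_head(a: str, q: float, n_buckets: int = 10) -> np.ndarray:
    """Empirical q-quantile of slippage per state bucket, for tactic a."""
    mask = A == a
    buckets = np.minimum((X[mask] * n_buckets).astype(int), n_buckets - 1)
    return np.array([np.quantile(Y[mask][buckets == b], q)
                     for b in range(n_buckets)])

q50_join = quantile_head("join", 0.5)
q95_join = quantile_head("join", 0.95)
```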
6) Step C — doubly robust uplift estimation
For each tactic (a), estimate value with AIPW/DR form:
[ \hat V(a)=\frac{1}{n}\sum_i \left[ \hat\mu_a(X_i) + \frac{\mathbf{1}[A_i=a]}{\hat\pi(a\mid X_i)}(Y_i-\hat\mu_a(X_i)) \right] ]
State-level uplift vs baseline (a_0):
[ \hat\tau_a(X)=\hat\mu_a(X)-\hat\mu_{a_0}(X) + \text{DR correction} ]
Why DR:
- consistent if either propensity or outcome model is right,
- lower bias than pure direct model or pure IPS under realistic misspecification.
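The AIPW form above transcribes directly. The sketch below assumes `mu[a]` holds outcome-model predictions per decision and `pi[:, a]` the clipped propensity of tactic `a`; the data is a synthetic placeholder in which the outcome model happens to be correct, so the DR estimate recovers the true uplift:

```python
import numpy as np

def dr_value(a, Y, A, mu_a, pi_a):
    """AIPW estimate of V(a): mean of mu_a(X) plus the IPS-weighted
    residual correction on decisions where tactic a was actually taken."""
    correction = (A == a) / pi_a * (Y - mu_a)
    return np.mean(mu_a + correction)

def dr_uplift(a, a0, Y, A, mu, pi):
    """DR uplift of tactic a vs baseline a0.
    Negative = improvement for buys, since lower slippage is better."""
    return (dr_value(a, Y, A, mu[a], pi[:, a])
            - dr_value(a0, Y, A, mu[a0], pi[:, a0]))

# Synthetic check: two tactics, uniform logging policy, known means.
rng = np.random.default_rng(2)
n = 20_000
pi = np.full((n, 2), 0.5)
A = rng.integers(0, 2, size=n)
true_mu = {0: 2.0, 1: 1.4}              # mean slippage in bps per tactic
Y = np.where(A == 1, true_mu[1], true_mu[0]) + rng.normal(scale=1.0, size=n)
mu = {0: np.full(n, true_mu[0]), 1: np.full(n, true_mu[1])}

tau = dr_uplift(1, 0, Y, A, mu, pi)     # should be close to -0.6 bps
```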
7) Policy objective: uplift + risk + completion
At runtime choose tactic:
[ a_t^* = \arg\min_{a\in\mathcal{A}_t} \big(\hat\mu_a^{\text{mean}}(X_t)+\lambda\,\hat q_{0.95,a}(X_t)+\gamma\,\text{missPenalty}_a(X_t)\big) ]
subject to:
- remaining slippage budget constraint,
- hard deadline completion floor,
- max aggression / max child notional,
- venue/session constraints.
Equivalent uplift rule:
- execute a non-baseline tactic only if:
  - expected uplift is favorable ((\hat\tau_a < 0) for buys, i.e., slippage is reduced),
  - q95 uplift is non-worsening,
  - the confidence interval excludes zero by the configured margin.
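The runtime objective and gating rule can be sketched as below. The weights `LAMBDA`/`GAMMA`, the CI margin, and the shape of `preds` are all illustrative stand-ins for the fitted models; the sign convention is the document's (negative uplift = improvement for buys):

```python
LAMBDA, GAMMA = 0.5, 1.0  # illustrative risk and completion weights

def choose_tactic(feasible, preds, baseline="join", ci_margin=0.1):
    """preds[a]: dict with 'mean', 'q95', 'miss_penalty', plus uplift
    diagnostics vs baseline ('uplift_hi' = CI upper bound, 'q95_uplift').
    Returns the tactic to execute."""
    def score(a):
        p = preds[a]
        return p["mean"] + LAMBDA * p["q95"] + GAMMA * p["miss_penalty"]

    best = min(feasible, key=score)
    if best == baseline:
        return baseline
    p = preds[best]
    # Execute non-baseline only if the uplift CI excludes zero by the
    # margin (upper bound sufficiently negative) and q95 does not worsen.
    if p["uplift_hi"] < -ci_margin and p["q95_uplift"] <= 0.0:
        return best
    return baseline

# Hypothetical predictions for one decision state.
preds = {
    "join":       {"mean": 1.8, "q95": 4.0, "miss_penalty": 0.2,
                   "uplift_hi": 0.0, "q95_uplift": 0.0},
    "take_small": {"mean": 1.1, "q95": 3.6, "miss_penalty": 0.2,
                   "uplift_hi": -0.3, "q95_uplift": -0.4},
}
tactic = choose_tactic(["join", "take_small"], preds)
```

If the CI upper bound were only -0.05 bps, the gate would fall back to the baseline even though `take_small` scores better on the point estimate.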
8) Conservative deployment ladder
- Offline replay only (no routing impact): DR uplift diagnostics by symbol/TOD/regime.
- Shadow scoring in production: log proposed tactic vs incumbent.
- Canary traffic (1–5%): strict rollback on q95 degradation.
- State-gated expansion: only expand in regimes with stable overlap + calibration.
- Full policy with fallback: instant revert to baseline on drift alarms.
Rollback triggers:
- q95 slippage breach for 3 consecutive windows,
- propensity overlap collapse,
- sharp rise in late panic crossing ratio,
- marked increase in action churn.
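The first rollback trigger is mechanical enough to sketch: revert when realized q95 slippage breaches its budget for 3 consecutive monitoring windows. The budget value and window handling are illustrative:

```python
class Q95RollbackTrigger:
    """Fires when realized q95 slippage breaches the budget for
    `consecutive` monitoring windows in a row (illustrative sketch)."""

    def __init__(self, q95_budget_bps: float, consecutive: int = 3):
        self.budget = q95_budget_bps
        self.needed = consecutive
        self.breaches = 0

    def update(self, realized_q95_bps: float) -> bool:
        """Feed one window's realized q95; True means 'roll back now'."""
        if realized_q95_bps > self.budget:
            self.breaches += 1
        else:
            self.breaches = 0  # any clean window resets the streak
        return self.breaches >= self.needed

trig = Q95RollbackTrigger(q95_budget_bps=5.0)
signals = [trig.update(v) for v in (4.0, 6.0, 6.5, 5.5)]
```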
9) Monitoring panel (minimum)
- `uplift_realized_minus_predicted_p50/p95`,
- `dr_ess_by_tactic`,
- `propensity_overlap_violation_rate`,
- `policy_switch_rate` and dwell-time distribution,
- `deadline_miss_rate`,
- `panic_take_ratio`,
- parent-level `mean/q90/q95` slippage by regime.
Add changepoint detectors on uplift residuals; when drift is detected, auto-tighten to safer action subset.
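One simple choice for that detector is a one-sided CUSUM on the uplift residuals (realized minus predicted uplift). The drift allowance `k` and threshold `h` below are illustrative:

```python
def cusum_alarm(residuals, k=0.05, h=1.0):
    """One-sided CUSUM: return the first index where the cumulative
    positive drift of the residuals exceeds h, else None."""
    s = 0.0
    for i, r in enumerate(residuals):
        s = max(0.0, s + r - k)  # accumulate drift above allowance k
        if s > h:
            return i
    return None
```

On a residual stream that jumps from 0.0 to a persistent +0.3 bias, the alarm fires a few windows after the change; on unbiased residuals it stays silent.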
10) Failure modes to preempt
- No overlap: model recommends tactics unseen in that region of state space.
- Leaky features: post-decision fields accidentally included in training.
- Objective mismatch: optimize slippage only, ignore completion risk.
- State aliasing: missing microstructure regime features causes unstable recommendations.
- Over-frequent policy flips: insufficient hysteresis increases signaling and fees.
Hard fixes:
- action masking + overlap gates,
- strict feature-time audit,
- multi-objective loss (mean + q95 + miss),
- minimum dwell + hysteresis,
- safe baseline fallback at all times.
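The "minimum dwell + hysteresis" fix can be sketched as a gate between the router's proposals and the executed tactic: switch only when the same proposal has persisted for `min_dwell` decisions and beats the incumbent by a hysteresis margin. Parameters are illustrative:

```python
class DwellGate:
    """Suppress over-frequent tactic flips (illustrative sketch)."""

    def __init__(self, min_dwell: int = 5, margin_bps: float = 0.2):
        self.min_dwell = min_dwell
        self.margin = margin_bps
        self.current = None
        self.pending = None
        self.pending_count = 0

    def update(self, proposal: str, improvement_bps: float) -> str:
        """Feed the proposed tactic and its score improvement over the
        incumbent; return the tactic actually allowed to run."""
        if self.current is None:
            self.current = proposal
            return self.current
        # Ignore proposals that do not clear the hysteresis margin.
        if proposal == self.current or improvement_bps < self.margin:
            self.pending, self.pending_count = None, 0
            return self.current
        if proposal == self.pending:
            self.pending_count += 1
        else:
            self.pending, self.pending_count = proposal, 1
        if self.pending_count >= self.min_dwell:
            self.current = proposal
            self.pending, self.pending_count = None, 0
        return self.current

gate = DwellGate(min_dwell=3, margin_bps=0.2)
gate.update("join", 0.0)                      # incumbent established
out = [gate.update("take_small", 0.5) for _ in range(3)]
```

A one-off flicker (or a sub-margin improvement) resets the pending streak, so the gate trades a little latency-to-switch for much lower action churn.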
11) Expected production impact
When implemented with strict overlap/risk controls:
- better tactic choice under heterogeneous regimes,
- lower tail slippage than one-size-fits-all policy,
- fewer false upgrades from confounded offline tests,
- clearer post-trade causality: what tactic helped, where, and by how much.
What it does not solve:
- poor market data quality,
- high latency infrastructure debt,
- venue-rule compliance logic.
Those remain prerequisite reliability layers.
Closing
Predicting slippage is useful; estimating action-specific causal uplift is operational.
If your router must decide among tactics every second, DR uplift modeling gives a practical way to move from "best average forecast" to "best action for this state, under tail and deadline constraints."