Causal Uplift Slippage Model for Execution Tactic Selection
TL;DR
Most slippage models answer: "How expensive will this order be?"
The more useful live question is: "Which tactic reduces slippage the most for this exact state?"
A practical way to do this is a causal uplift framework:
- treat each tactic (`join`, `improve`, `mid`, `take_small`, `pause`) as a treatment,
- estimate treatment propensities from historical router behavior,
- estimate counterfactual slippage outcomes,
- compute doubly robust uplift per tactic vs baseline,
- execute a non-baseline tactic only when its estimated uplift is favorable under tail-risk and deadline constraints.
This turns execution from static prediction into counterfactual decisioning.
1) Why plain predictive slippage models are not enough
A standard model predicts expected slippage:
[ \hat y = E[\text{slippage}\mid X] ]
Useful, but incomplete for control.
Two hard production issues:
- Action-selection bias: historical actions were chosen by old policies, so observed outcomes are confounded.
- Action ambiguity: low predicted slippage does not tell you whether `join` beats `take_small` in this state.
Result: policy upgrades can look good in offline averages but fail live when state mix shifts.
2) Causal setup
Define:
- (X_t): state features at decision time (queue, imbalance, spread, depth slope, cancel burst, volatility, urgency, residual/deadline),
- (A_t \in \mathcal{A}): chosen tactic,
- (Y_t(a)): potential slippage outcome if tactic (a) were used,
- observed (Y_t = Y_t(A_t)).
We care about uplift relative to baseline tactic (a_0):
[ \tau_a(X_t)=E[Y(a)-Y(a_0)\mid X_t] ]
For buys, lower slippage is better, so negative (\tau_a) means improvement.
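A toy illustration of the sign convention, with hypothetical outcome-model numbers (the tactic names match the action set above; the bps values are made up):

```python
# Toy illustration of the uplift sign convention (hypothetical numbers).
# For buys, slippage is in bps and lower is better, so a negative uplift
# means the candidate tactic is expected to beat the baseline in this state.

mu = {"join": 1.8, "take_small": 1.2, "pause": 2.5}  # E[Y(a) | X] in bps
baseline = "join"

uplift = {a: mu[a] - mu[baseline] for a in mu}  # tau_a(X) per tactic
best = min(uplift, key=uplift.get)              # most negative = best
```

Here `take_small` has uplift 1.2 - 1.8 = -0.6 bps (an improvement vs `join`), while `pause` has +0.7 bps (worse).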
3) Data contract (decision-level, not order-level aggregate)
Per decision event, log:
- parent/order metadata: side, residual qty, urgency bucket, deadline distance,
- market microstructure snapshot: top-of-book, microprice drift, imbalance, queue rank estimate,
- latency state: gateway/ack/book freshness,
- action set available at decision time,
- chosen action and router scores,
- realized outcomes:
- realized slippage (bps),
- fill ratio within horizon,
- miss penalty / catch-up cost,
- post-fill 5s/30s markout.
Critical: include action availability mask. If a tactic was infeasible in a state, don’t treat it as an unchosen alternative.
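The contract above can be sketched as a decision-event record. Field names here are illustrative, not a fixed schema; the load-bearing part is the availability mask, which downstream counterfactual comparisons must respect:

```python
from dataclasses import dataclass

# Hypothetical decision-level log record (illustrative field names).
# Tactics that were infeasible at decision time must be excluded from
# counterfactual comparisons downstream.

@dataclass
class DecisionEvent:
    side: str                 # "buy" / "sell"
    residual_qty: float
    urgency_bucket: int
    deadline_ms: int          # distance to hard deadline
    spread_bps: float
    imbalance: float
    queue_rank_est: float
    book_age_ms: float        # latency / staleness state
    available: dict           # tactic -> bool (availability mask)
    action: str               # chosen tactic
    router_scores: dict       # router scores at decision time
    slippage_bps: float       # realized outcome
    fill_ratio: float
    markout_5s_bps: float

ev = DecisionEvent(
    side="buy", residual_qty=500, urgency_bucket=2, deadline_ms=4_000,
    spread_bps=1.1, imbalance=0.35, queue_rank_est=0.6, book_age_ms=3.0,
    available={"join": True, "improve": True, "mid": False,
               "take_small": True, "pause": True},
    action="join", router_scores={"join": 0.41, "take_small": 0.33},
    slippage_bps=0.8, fill_ratio=1.0, markout_5s_bps=-0.2,
)
# Infeasible tactics ("mid" here) are dropped from any counterfactual set.
feasible = [a for a, ok in ev.available.items() if ok]
```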
4) Step A — propensity model (who chose what)
Estimate
[ \pi(a\mid X)=P(A=a\mid X) ]
with a calibrated multinomial model (GBDT/NN plus temperature scaling or isotonic calibration).
Operational rules:
- clip propensities: (\pi \leftarrow \max(\pi, \epsilon)),
- monitor effective sample size per tactic,
- reject offline evaluation when overlap collapses.
Overlap diagnostics (must-pass):
- propensity histograms by tactic,
- fraction below (\epsilon),
- covariate balance after IPTW.
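The clipping rule and the effective-sample-size diagnostic can be sketched as follows, assuming `pi_hat` is an `(n_decisions, n_tactics)` array of calibrated propensities and `A` holds the chosen-tactic index per decision. `EPS` is an illustrative knob, not a recommended value:

```python
import numpy as np

EPS = 0.01  # illustrative propensity floor

def clip_propensities(pi_hat: np.ndarray, eps: float = EPS) -> np.ndarray:
    """pi <- max(pi, eps), then renormalize each row to sum to 1."""
    pi = np.maximum(pi_hat, eps)
    return pi / pi.sum(axis=1, keepdims=True)

def effective_sample_size(pi: np.ndarray, A: np.ndarray, a: int) -> float:
    """Kish ESS of the IPS weights 1/pi(a|X) over decisions where A == a."""
    w = 1.0 / pi[A == a, a]
    if w.size == 0:
        return 0.0
    return w.sum() ** 2 / (w ** 2).sum()

# Synthetic placeholder data: 1000 decisions, 5 tactics.
rng = np.random.default_rng(0)
pi_hat = rng.dirichlet(np.ones(5), size=1000)
A = np.array([rng.choice(5, p=p) for p in pi_hat])

pi = clip_propensities(pi_hat)
ess = {a: effective_sample_size(pi, A, a) for a in range(5)}
overlap_violation_rate = (pi_hat < EPS).mean()  # fraction below epsilon
```

A collapsing ESS for some tactic, or a rising `overlap_violation_rate`, is exactly the signal to reject offline evaluation in that region.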
5) Step B — outcome models (what would happen)
Model conditional outcome for each tactic:
[ \mu_a(X)=E[Y\mid X,A=a] ]
Use quantile heads, not mean-only:
- (q_{0.5}), (q_{0.9}), (q_{0.95}) of slippage,
- optional joint head for fill-ratio and delay penalty.
Reason: execution incidents are tail-dominated; mean-only policies hide q95 blowups.
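A numpy-only sketch of per-tactic quantile heads on synthetic data: each "head" here is just an empirical conditional quantile within coarse state buckets, standing in for the quantile outputs of a production GBDT/NN. It makes the tail-dominance point concrete: q95 sits well above the median in every state bucket, which a mean-only model would hide.

```python
import numpy as np

# Synthetic decision log: 1-d state feature X, tactic A, slippage Y (bps).
rng = np.random.default_rng(1)
n = 20_000
X = rng.uniform(0, 1, size=n)
A = rng.choice(["join", "take_small"], size=n)
Y = 2.0 * X - 0.5 * (A == "take_small") + rng.normal(scale=0.5, size=n)

def quantile_head(a: str, q: float, n_buckets: int = 10) -> np.ndarray:
    """Empirical q-quantile of slippage per state bucket, for tactic a."""
    mask = A == a
    buckets = np.minimum((X[mask] * n_buckets).astype(int), n_buckets - 1)
    return np.array([np.quantile(Y[mask][buckets == b], q)
                     for b in range(n_buckets)])

q50_join = quantile_head("join", 0.5)
q95_join = quantile_head("join", 0.95)
```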
6) Step C — doubly robust uplift estimation
For each tactic (a), estimate value with AIPW/DR form:
[ \hat V(a)=\frac{1}{n}\sum_i \left[ \hat\mu_a(X_i) + \frac{\mathbf{1}[A_i=a]}{\hat\pi(a\mid X_i)}(Y_i-\hat\mu_a(X_i)) \right] ]
State-level uplift vs baseline (a_0):
[ \hat\tau_a(X)=\hat\mu_a(X)-\hat\mu_{a_0}(X) + \text{DR correction} ]
Why DR:
- consistent if either propensity or outcome model is right,
- lower bias than pure direct model or pure IPS under realistic misspecification.
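The AIPW form above transcribes directly. The sketch below assumes `mu[a]` holds outcome-model predictions per decision and `pi[:, a]` the clipped propensity of tactic `a`; the data is a synthetic placeholder in which the outcome model happens to be correct, so the DR estimate recovers the true uplift:

```python
import numpy as np

def dr_value(a, Y, A, mu_a, pi_a):
    """AIPW estimate of V(a): mean of mu_a(X) plus the IPS-weighted
    residual correction on decisions where tactic a was actually taken."""
    correction = (A == a) / pi_a * (Y - mu_a)
    return np.mean(mu_a + correction)

def dr_uplift(a, a0, Y, A, mu, pi):
    """DR uplift of tactic a vs baseline a0.
    Negative = improvement for buys, since lower slippage is better."""
    return (dr_value(a, Y, A, mu[a], pi[:, a])
            - dr_value(a0, Y, A, mu[a0], pi[:, a0]))

# Synthetic check: two tactics, uniform logging policy, known means.
rng = np.random.default_rng(2)
n = 20_000
pi = np.full((n, 2), 0.5)
A = rng.integers(0, 2, size=n)
true_mu = {0: 2.0, 1: 1.4}              # mean slippage in bps per tactic
Y = np.where(A == 1, true_mu[1], true_mu[0]) + rng.normal(scale=1.0, size=n)
mu = {0: np.full(n, true_mu[0]), 1: np.full(n, true_mu[1])}

tau = dr_uplift(1, 0, Y, A, mu, pi)     # should be close to -0.6 bps
```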
7) Policy objective: uplift + risk + completion
At runtime choose tactic:
[ a_t^* = \arg\min_{a\in\mathcal{A}_t} \big(\hat\mu_a^{\text{mean}}(X_t)+\lambda\,\hat q_{0.95,a}(X_t)+\gamma\,\text{missPenalty}_a(X_t)\big) ]
subject to:
- remaining slippage budget constraint,
- hard deadline completion floor,
- max aggression / max child notional,
- venue/session constraints.
Equivalent uplift rule:
- execute a non-baseline tactic only if:
  - expected uplift is favorable ((\hat\tau_a < 0) for buys, i.e., slippage is reduced),
  - q95 uplift is non-worsening,
  - the confidence interval excludes zero by the configured margin.
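The runtime objective and gating rule can be sketched as below. The weights `LAMBDA`/`GAMMA`, the CI margin, and the shape of `preds` are all illustrative stand-ins for the fitted models; the sign convention is the document's (negative uplift = improvement for buys):

```python
LAMBDA, GAMMA = 0.5, 1.0  # illustrative risk and completion weights

def choose_tactic(feasible, preds, baseline="join", ci_margin=0.1):
    """preds[a]: dict with 'mean', 'q95', 'miss_penalty', plus uplift
    diagnostics vs baseline ('uplift_hi' = CI upper bound, 'q95_uplift').
    Returns the tactic to execute."""
    def score(a):
        p = preds[a]
        return p["mean"] + LAMBDA * p["q95"] + GAMMA * p["miss_penalty"]

    best = min(feasible, key=score)
    if best == baseline:
        return baseline
    p = preds[best]
    # Execute non-baseline only if the uplift CI excludes zero by the
    # margin (upper bound sufficiently negative) and q95 does not worsen.
    if p["uplift_hi"] < -ci_margin and p["q95_uplift"] <= 0.0:
        return best
    return baseline

# Hypothetical predictions for one decision state.
preds = {
    "join":       {"mean": 1.8, "q95": 4.0, "miss_penalty": 0.2,
                   "uplift_hi": 0.0, "q95_uplift": 0.0},
    "take_small": {"mean": 1.1, "q95": 3.6, "miss_penalty": 0.2,
                   "uplift_hi": -0.3, "q95_uplift": -0.4},
}
tactic = choose_tactic(["join", "take_small"], preds)
```

If the CI upper bound were only -0.05 bps, the gate would fall back to the baseline even though `take_small` scores better on the point estimate.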
8) Conservative deployment ladder
- Offline replay only (no routing impact): DR uplift diagnostics by symbol/TOD/regime.
- Shadow scoring in production: log proposed tactic vs incumbent.
- Canary traffic (1–5%): strict rollback on q95 degradation.
- State-gated expansion: only expand in regimes with stable overlap + calibration.
- Full policy with fallback: instant revert to baseline on drift alarms.
Rollback triggers:
- q95 slippage breach for 3 consecutive windows,
- propensity overlap collapse,
- sharp rise in late panic crossing ratio,
- marked increase in action churn.
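The first rollback trigger is mechanical enough to sketch: revert when realized q95 slippage breaches its budget for 3 consecutive monitoring windows. The budget value and window handling are illustrative:

```python
class Q95RollbackTrigger:
    """Fires when realized q95 slippage breaches the budget for
    `consecutive` monitoring windows in a row (illustrative sketch)."""

    def __init__(self, q95_budget_bps: float, consecutive: int = 3):
        self.budget = q95_budget_bps
        self.needed = consecutive
        self.breaches = 0

    def update(self, realized_q95_bps: float) -> bool:
        """Feed one window's realized q95; True means 'roll back now'."""
        if realized_q95_bps > self.budget:
            self.breaches += 1
        else:
            self.breaches = 0  # any clean window resets the streak
        return self.breaches >= self.needed

trig = Q95RollbackTrigger(q95_budget_bps=5.0)
signals = [trig.update(v) for v in (4.0, 6.0, 6.5, 5.5)]
```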
9) Monitoring panel (minimum)
- `uplift_realized_minus_predicted_p50/p95`,
- `dr_ess_by_tactic`,
- `propensity_overlap_violation_rate`,
- `policy_switch_rate` and dwell-time distribution,
- `deadline_miss_rate`,
- `panic_take_ratio`,
- parent-level `mean/q90/q95` slippage by regime.
Add changepoint detectors on uplift residuals; when drift is detected, auto-tighten to safer action subset.
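One simple choice for that detector is a one-sided CUSUM on the uplift residuals (realized minus predicted uplift). The drift allowance `k` and threshold `h` below are illustrative:

```python
def cusum_alarm(residuals, k=0.05, h=1.0):
    """One-sided CUSUM: return the first index where the cumulative
    positive drift of the residuals exceeds h, else None."""
    s = 0.0
    for i, r in enumerate(residuals):
        s = max(0.0, s + r - k)  # accumulate drift above allowance k
        if s > h:
            return i
    return None
```

On a residual stream that jumps from 0.0 to a persistent +0.3 bias, the alarm fires a few windows after the change; on unbiased residuals it stays silent.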
10) Failure modes to preempt
- No overlap: model recommends tactics unseen in that region of state space.
- Leaky features: post-decision fields accidentally included in training.
- Objective mismatch: optimize slippage only, ignore completion risk.
- State aliasing: missing microstructure regime features causes unstable recommendations.
- Over-frequent policy flips: insufficient hysteresis increases signaling and fees.
Hard fixes:
- action masking + overlap gates,
- strict feature-time audit,
- multi-objective loss (mean + q95 + miss),
- minimum dwell + hysteresis,
- safe baseline fallback at all times.
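The "minimum dwell + hysteresis" fix can be sketched as a gate between the router's proposals and the executed tactic: switch only when the same proposal has persisted for `min_dwell` decisions and beats the incumbent by a hysteresis margin. Parameters are illustrative:

```python
class DwellGate:
    """Suppress over-frequent tactic flips (illustrative sketch)."""

    def __init__(self, min_dwell: int = 5, margin_bps: float = 0.2):
        self.min_dwell = min_dwell
        self.margin = margin_bps
        self.current = None
        self.pending = None
        self.pending_count = 0

    def update(self, proposal: str, improvement_bps: float) -> str:
        """Feed the proposed tactic and its score improvement over the
        incumbent; return the tactic actually allowed to run."""
        if self.current is None:
            self.current = proposal
            return self.current
        # Ignore proposals that do not clear the hysteresis margin.
        if proposal == self.current or improvement_bps < self.margin:
            self.pending, self.pending_count = None, 0
            return self.current
        if proposal == self.pending:
            self.pending_count += 1
        else:
            self.pending, self.pending_count = proposal, 1
        if self.pending_count >= self.min_dwell:
            self.current = proposal
            self.pending, self.pending_count = None, 0
        return self.current

gate = DwellGate(min_dwell=3, margin_bps=0.2)
gate.update("join", 0.0)                      # incumbent established
out = [gate.update("take_small", 0.5) for _ in range(3)]
```

A one-off flicker (or a sub-margin improvement) resets the pending streak, so the gate trades a little latency-to-switch for much lower action churn.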
11) Expected production impact
When implemented with strict overlap/risk controls:
- better tactic choice under heterogeneous regimes,
- lower tail slippage than one-size-fits-all policy,
- fewer false upgrades from confounded offline tests,
- clearer post-trade causality: what tactic helped, where, and by how much.
What it does not solve:
- poor market data quality,
- high latency infrastructure debt,
- venue-rule compliance logic.
Those remain prerequisite reliability layers.
Closing
Predicting slippage is useful; estimating action-specific causal uplift is operational.
If your router must decide among tactics every second, DR uplift modeling gives a practical way to move from "best average forecast" to "best action for this state, under tail and deadline constraints."