Causal Uplift Slippage Model for Execution Tactic Selection

2026-03-01 · finance

TL;DR

Most slippage models answer: "How expensive will this order be?"

The more useful live question is: "Which tactic reduces slippage the most for this exact state?"

A practical way to answer it is a causal uplift framework: estimate, per decision state, the counterfactual slippage of each feasible tactic and route to the best one.

This turns execution from static prediction into counterfactual decisioning.


1) Why plain predictive slippage models are not enough

A standard model predicts expected slippage:

\[ \hat y = E[\text{slippage}\mid X] \]

Useful, but incomplete for control.

Two hard production issues:

  1. Action-selection bias: historical actions were chosen by old policies, so observed outcomes are confounded.
  2. Action ambiguity: low predicted slippage does not tell you whether join beats take_small in this state.

Result: policy upgrades can look good in offline averages but fail live when state mix shifts.
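The first issue can be made concrete with a toy simulation (all numbers and the `spread` confounder here are hypothetical): an old policy that takes liquidity more often in wide-spread states makes the naive taken-vs-not comparison look far worse than the true causal cost of taking.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000

# Hidden confounder: spread at decision time. The OLD policy takes more
# often when spreads are wide, i.e. exactly when slippage is high anyway.
spread = rng.exponential(1.0, n)
p_take = 1.0 / (1.0 + np.exp(-3.0 * (spread - 1.0)))
took = rng.random(n) < p_take

# Ground truth we control: taking costs a flat 0.2 bps extra, everywhere.
true_effect = 0.2
slippage = spread + true_effect * took + rng.normal(0.0, 0.1, n)

# Naive offline comparison, confounded by the old policy's action selection.
naive_gap = slippage[took].mean() - slippage[~took].mean()
```

`naive_gap` greatly overstates `true_effect`, which is exactly why historical tactic choices cannot be compared by raw outcome averages.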


2) Causal setup

Define \(X_t\) as the decision-time state, \(A_t \in \mathcal{A}_t\) as the tactic chosen from the feasible set, and \(Y(a)\) as the potential slippage outcome under tactic \(a\).

We care about uplift relative to a baseline tactic \(a_0\):

\[ \tau_a(X_t)=E[Y(a)-Y(a_0)\mid X_t] \]

For buys, lower slippage is better, so negative \(\tau_a\) means improvement.


3) Data contract (decision-level, not order-level aggregate)

Per decision event, log the decision-time state features, the action availability mask, the chosen tactic, the policy version that chose it, and the realized outcome.

Critical: include action availability mask. If a tactic was infeasible in a state, don’t treat it as an unchosen alternative.
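A minimal decision-level record might look like the following sketch. Every field name here is illustrative, not a prescribed schema; the key property is that the availability mask travels with the event and the chosen action is validated against it.

```python
from dataclasses import dataclass

@dataclass
class DecisionEvent:
    """One routing decision (not an order-level aggregate). Field names are illustrative."""
    ts_ns: int            # decision timestamp, nanoseconds
    symbol: str
    features: dict        # pre-decision state only (spread, imbalance, regime, ...)
    available: tuple      # action availability mask at decision time
    action: str           # tactic actually chosen
    policy_id: str        # policy version that chose it, for propensity audits
    slippage_bps: float = float("nan")  # realized outcome, joined post-hoc

def validate(ev: DecisionEvent) -> None:
    """An infeasible tactic must never appear as the chosen action."""
    if ev.action not in ev.available:
        raise ValueError(f"tactic {ev.action!r} was not in the availability mask")
```

Rejecting events whose action falls outside the mask at ingest time is what keeps infeasible tactics from being counted as unchosen alternatives downstream.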


4) Step A — propensity model (who chose what)

Estimate

\[ \pi(a\mid X)=P(A=a\mid X) \]

with a calibrated multinomial model (GBDT/NN plus temperature or isotonic calibration).

Operational rules:

Overlap diagnostics (must-pass):
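As one concrete (assumed, not prescribed) diagnostic: flag states whose minimum fitted propensity falls below a positivity floor, and report the effective sample size of the IPW weights per tactic, since a tiny ESS means the DR correction for that tactic rests on a handful of fills.

```python
import numpy as np

def overlap_report(pi, actions, eps=0.02):
    """Overlap diagnostics for fitted propensities pi (n_states x n_tactics).

    eps is an assumed positivity floor, not a universal constant."""
    pi = np.asarray(pi, dtype=float)
    actions = np.asarray(actions)
    n, n_tactics = pi.shape
    # Share of states where some tactic has near-zero propensity (positivity failure).
    frac_below = float((pi.min(axis=1) < eps).mean())
    # Effective sample size of the IPW weights per tactic: small ESS means the
    # DR correction for that tactic hangs on very few observed fills.
    ess = {}
    for k in range(n_tactics):
        w = (actions == k).astype(float) / np.clip(pi[:, k], eps, None)
        ess[k] = float(w.sum() ** 2 / (w ** 2).sum())
    return {"frac_below_eps": frac_below, "ess_per_action": ess}
```

A gate of the form "frac_below_eps near zero and ESS above a minimum per tactic" is one way to make "must-pass" operational.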


5) Step B — outcome models (what would happen)

Model conditional outcome for each tactic:

\[ \mu_a(X)=E[Y\mid X,A=a] \]

Use quantile heads (e.g., q50 and q95), not a mean-only head.

Reason: execution incidents are tail-dominated; mean-only policies hide q95 blowups.
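A minimal sketch of a quantile head, assuming a linear model trained by full-batch subgradient descent on the pinball loss; in production this would be a head on the GBDT/NN, but the loss is the same idea.

```python
import numpy as np

def fit_linear_quantile(X, y, q, lr=0.05, steps=3000):
    """Fit a linear quantile head by subgradient descent on the pinball loss."""
    X1 = np.c_[np.ones(len(y)), np.asarray(X)]
    w = np.zeros(X1.shape[1])
    for _ in range(steps):
        r = y - X1 @ w
        # Pinball-loss subgradient: residuals above the fit pull with weight q,
        # residuals below it with weight (1 - q).
        grad = -X1.T @ np.where(r > 0, q, q - 1.0) / len(y)
        w -= lr * grad
    return w

# Toy data: slippage grows with a single feature plus noise (numbers illustrative).
rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, 4000)
y = 2.0 + x + rng.normal(0.0, 0.5, 4000)
w_q50 = fit_linear_quantile(x[:, None], y, 0.50)
w_q95 = fit_linear_quantile(x[:, None], y, 0.95)
```

The q95 intercept sits visibly above the q50 intercept, which is exactly the tail information a mean-only model discards.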


6) Step C — doubly robust uplift estimation

For each tactic (a), estimate value with AIPW/DR form:

[ \hat V(a)=\frac{1}{n}\sum_i \left[ \hat\mu_a(X_i) + \frac{\mathbf{1}[A_i=a]}{\hat\pi(a\mid X_i)}(Y_i-\hat\mu_a(X_i)) \right] ]

State-level uplift vs the baseline \(a_0\):

\[ \hat\tau_a(X)=\hat\mu_a(X)-\hat\mu_{a_0}(X) + \text{DR correction} \]
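The AIPW estimator translates directly to numpy. Here `pi_hat` and `mu_hat` are assumed to be cross-fitted propensity and outcome predictions, and the clipping floor `eps` is an assumption, not a recommendation.

```python
import numpy as np

def dr_value(y, a, pi_hat, mu_hat, tactic, eps=0.01):
    """AIPW value estimate for one tactic.

    pi_hat: (n, K) fitted propensities; mu_hat: (n, K) fitted outcomes.
    Assumes both were fit out-of-fold (cross-fitting)."""
    p = np.clip(pi_hat[:, tactic], eps, None)      # clipped propensity
    chosen = (a == tactic).astype(float)           # indicator 1[A_i = a]
    correction = chosen / p * (y - mu_hat[:, tactic])
    return float(np.mean(mu_hat[:, tactic] + correction))

def dr_uplift(y, a, pi_hat, mu_hat, tactic, baseline=0):
    """DR uplift of `tactic` vs the baseline tactic."""
    return dr_value(y, a, pi_hat, mu_hat, tactic) - dr_value(y, a, pi_hat, mu_hat, baseline)
```

When the outcome model is exact, the correction term vanishes and the estimate reduces to the model's average; when it is biased but propensities are right, the IPW residual term repairs it.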

Why DR:


7) Policy objective: uplift + risk + completion

At runtime choose tactic:

\[ a_t^* = \arg\min_{a\in\mathcal{A}_t} \big(\hat\mu_a^{\text{mean}}(X_t)+\lambda\,\hat q_{0.95,a}(X_t)+\gamma\,\text{missPenalty}_a(X_t)\big) \]

subject to feasibility (\(a \in \mathcal{A}_t\)) and completion/deadline constraints.

Equivalent uplift rule: subtract the baseline tactic's score from each candidate's, then pick the tactic whose risk-adjusted uplift \(\hat\tau_a\) is most negative.
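The runtime selection above can be sketched as follows, with the availability mask enforced by scoring infeasible tactics at +inf. `lam` and `gamma` correspond to \(\lambda\) and \(\gamma\); the values are placeholders, not recommendations.

```python
import numpy as np

def select_tactic(mu_mean, q95, miss_pen, available, lam=0.5, gamma=0.3):
    """Risk-adjusted argmin over feasible tactics.

    mu_mean / q95 / miss_pen: per-tactic predictions for the current state.
    available: boolean availability mask for the current state."""
    mu_mean, q95, miss_pen = map(np.asarray, (mu_mean, q95, miss_pen))
    score = mu_mean + lam * q95 + gamma * miss_pen
    score = np.where(np.asarray(available), score, np.inf)  # never pick an infeasible tactic
    return int(np.argmin(score))
```

Because the baseline's score is a constant within a state, subtracting it from every candidate leaves the argmin unchanged, which is why the score form and the uplift form pick the same tactic.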


8) Conservative deployment ladder

  1. Offline replay only (no routing impact): DR uplift diagnostics by symbol/TOD/regime.
  2. Shadow scoring in production: log proposed tactic vs incumbent.
  3. Canary traffic (1–5%): strict rollback on q95 degradation.
  4. State-gated expansion: only expand in regimes with stable overlap + calibration.
  5. Full policy with fallback: instant revert to baseline on drift alarms.

Rollback triggers:
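One possible shape for the canary-stage trigger (thresholds here are illustrative assumptions): compare canary q95 slippage against the incumbent's, and refuse to judge the tail on too few fills.

```python
import numpy as np

def q95_rollback(canary_slip, baseline_slip, max_ratio=1.10, min_n=500):
    """Rollback if canary tail slippage degrades vs the incumbent policy.

    max_ratio and min_n are illustrative thresholds, not recommendations."""
    canary = np.asarray(canary_slip, dtype=float)
    if canary.size < min_n:
        return False  # too few canary fills to estimate the tail honestly
    q95_canary = np.quantile(canary, 0.95)
    q95_base = np.quantile(np.asarray(baseline_slip, dtype=float), 0.95)
    return bool(q95_canary > max_ratio * q95_base)
```

The `min_n` guard matters because q95 estimates on small canary samples are noisy enough to fire spurious rollbacks.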


9) Monitoring panel (minimum)

Add changepoint detectors on uplift residuals; when drift is detected, auto-tighten to safer action subset.
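A simple detector of the kind described is a two-sided CUSUM on standardized uplift residuals; the burn-in window and the `k`/`h` tuning values below are assumptions, not prescriptions.

```python
import numpy as np

def cusum_alarm(residuals, burn_in=100, k=0.5, h=5.0):
    """Two-sided CUSUM on standardized uplift residuals.

    Returns the first alarm index, or -1 if no changepoint is flagged.
    burn_in, k, and h are assumed tuning values."""
    r = np.asarray(residuals, dtype=float)
    mu, sd = r[:burn_in].mean(), r[:burn_in].std()
    z = (r - mu) / (sd + 1e-9)
    s_pos = s_neg = 0.0
    for i, x in enumerate(z):
        s_pos = max(0.0, s_pos + x - k)  # drift toward worse (positive) residuals
        s_neg = max(0.0, s_neg - x - k)  # drift toward better (negative) residuals
        if s_pos > h or s_neg > h:
            return i
    return -1
```

On alarm, the natural reaction given the ladder above is to shrink the action set back toward the baseline rather than retrain in place.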


10) Failure modes to preempt

  1. No overlap: model recommends tactics unseen in that region of state space.
  2. Leaky features: post-decision fields accidentally included in training.
  3. Objective mismatch: optimize slippage only, ignore completion risk.
  4. State aliasing: missing microstructure regime features causes unstable recommendations.
  5. Over-frequent policy flips: insufficient hysteresis increases signaling and fees.

Hard fixes:


11) Expected production impact

When implemented with strict overlap/risk controls:

What it does not solve:

Those remain prerequisite reliability layers.


Closing

Predicting slippage is useful; estimating action-specific causal uplift is operational.

If your router must decide among tactics every second, DR uplift modeling gives a practical way to move from "best average forecast" to "best action for this state, under tail and deadline constraints."

