Self-Impact-Corrected Implementation Shortfall Playbook

2026-03-11 · finance

Scope: Live execution/TCA where benchmark drift is partially caused by your own flow


1) Why this matters in production

Most desks track implementation shortfall (IS) against a decision/arrival benchmark and call it “slippage.”

But in live trading, that benchmark is often contaminated by your own child-order footprint: the decision-time mid already embeds impact from your earlier fills (prior parents, sibling slices), so the yardstick moves with your flow.

Result: model upgrades are evaluated on a biased scorecard.


2) Core idea

Split the observed price path into:

  1. exogenous move (what market would do without your flow),
  2. endogenous move (your own impact).

Then compute two KPIs:

  1. observed IS against the raw decision mid,
  2. debiased IS against the counterfactual (flow-free) mid.

The gap is the benchmark-pollution tax.


3) Minimal model

Let the midprice be

[ m_t = f_t + I_t + \epsilon_t ]

where (f_t) is the exogenous (flow-free) component, (I_t) the endogenous impact of your own fills, and (\epsilon_t) microstructure noise.

For child fills ((t_i, q_i, s_i)) with side (s_i \in \{+1,-1\}):

[ I_t = \sum_{i : t_i \le t} s_i\,\phi(|q_i|)\,G(t - t_i) ]
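The propagator sum above can be sketched directly. The kernel forms and parameters here (square-root-style (\phi), exponential (G), the half-life) are illustrative assumptions for the sketch, not calibrated desk values:

```python
import numpy as np

def impact_path(fill_times, signed_qty, eval_times,
                phi_coef=0.1, phi_exp=0.5, decay_half_life=300.0):
    """Endogenous impact I_t = sum_i s_i * phi(|q_i|) * G(t - t_i).

    Illustrative kernel choices (assumptions, not calibrated values):
      phi(q) = phi_coef * q**phi_exp            (concave impact)
      G(tau) = exp(-ln(2) * tau / half_life)    (exponential decay)
    """
    fill_times = np.asarray(fill_times, dtype=float)
    signed_qty = np.asarray(signed_qty, dtype=float)
    s = np.sign(signed_qty)                       # side s_i in {+1, -1}
    mag = phi_coef * np.abs(signed_qty) ** phi_exp
    out = np.zeros(len(eval_times))
    for j, t in enumerate(eval_times):
        tau = t - fill_times
        live = tau >= 0.0                         # only fills with t_i <= t
        G = np.exp(-np.log(2.0) * tau[live] / decay_half_life)
        out[j] = np.sum(s[live] * mag[live] * G)
    return out
```

Any monotone non-increasing (G) (multi-exponential, power-law) slots into the same loop.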

3.1 Observed IS

[ IS_{obs} = \sum_i s_i q_i (p_i - m_{dec}) ]

where (m_{dec}) is the (possibly polluted) decision-time mid.

3.2 Debiased IS

Estimate counterfactual mid without own impact:

[ \hat f_t = m_t - \hat I_t ]

Then:

[ IS_{debias} = \sum_i s_i q_i (p_i - \hat f_{dec}) ]
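A minimal sketch of the observed/debiased pair, assuming an estimate (\hat I_{dec}) of your own impact embedded in the decision mid (from the impact model of section 3):

```python
import numpy as np

def shortfall_pair(sides, qtys, fill_prices, m_dec, I_hat_dec):
    """Observed vs debiased implementation shortfall.

    m_dec     : mid at decision time (possibly polluted by own impact)
    I_hat_dec : estimated own impact embedded in m_dec (assumed supplied
                by the section-3 impact model evaluated at t_dec)
    """
    sides = np.asarray(sides, dtype=float)
    qtys = np.asarray(qtys, dtype=float)
    p = np.asarray(fill_prices, dtype=float)
    f_hat_dec = m_dec - I_hat_dec              # counterfactual, flow-free mid
    is_obs = float(np.sum(sides * qtys * (p - m_dec)))
    is_debias = float(np.sum(sides * qtys * (p - f_hat_dec)))
    return is_obs, is_debias
```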

3.3 Benchmark Pollution Ratio (BPR)

[ BPR = \frac{IS_{obs} - IS_{debias}}{\max(|IS_{obs}|, \varepsilon)} ]
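One way to compute the ratio, with the (\varepsilon) guard from the formula:

```python
def bpr(is_obs, is_debias, eps=1e-9):
    """Benchmark Pollution Ratio: the share of observed IS that is
    explained by own-impact contamination of the benchmark.
    eps guards against division by a near-zero observed IS."""
    return (is_obs - is_debias) / max(abs(is_obs), eps)
```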

Interpretation:

    • BPR ≈ 0: the benchmark is essentially clean; observed and debiased IS agree.
    • |BPR| large: a material share of measured slippage is your own footprint; the sign tells you whether the polluted benchmark flatters or penalizes the policy.
    • Compare policies on (IS_{debias}); use BPR to flag when (IS_{obs}) is untrustworthy.


4) Practical calibration loop

Weekly (offline)

  1. Fit (\phi) by participation/liquidity buckets.
  2. Fit decay (G) (multi-exponential or power-law approximation).
  3. Enforce sanity constraints:
    • monotone non-increasing decay,
    • kernel shapes consistent with no-dynamic-arbitrage conditions,
    • buy/sell symmetry checks.
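As a minimal stand-in for the weekly kernel fit (the playbook calls for multi-exponential or power-law forms; a single exponential fitted by log-linear least squares is shown here purely for illustration, with the monotone-decay sanity check applied):

```python
import numpy as np

def fit_exponential_decay(lags, response):
    """Fit G(tau) ~ A * exp(-k * tau) to an empirical post-fill impact
    response curve via log-linear least squares.

    Raises if the fit violates the monotone non-increasing decay
    constraint (k < 0), per the sanity checks above.
    """
    lags = np.asarray(lags, dtype=float)
    r = np.asarray(response, dtype=float)
    if np.any(r <= 0):
        raise ValueError("response must be positive for a log-linear fit")
    slope, intercept = np.polyfit(lags, np.log(r), 1)
    A, k = float(np.exp(intercept)), float(-slope)
    if k < 0:
        raise ValueError("fitted decay is increasing; fails sanity check")
    return A, k
```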

Daily (online refresh)

  1. Re-scale (\phi) to current volatility/spread conditions.
  2. Monitor debiasing residuals; flag drift beyond tolerance for the weekly refit.


5) Desk metrics that actually help decisions

Track these together (not one metric alone):

    • observed IS (continuity with existing reports),
    • debiased IS (mean and q95),
    • BPR (how polluted the scorecard is),
    • completion rate.

If observed IS improves but debiased IS does not, you likely optimized the benchmark artifact, not real execution quality.


6) Controller integration (production)

Use debiased metrics in the routing/urgency controller: let urgency escalation respond to smoothed debiased IS rather than raw observed IS, and gate state transitions to avoid flapping.

This prevents tactical overreaction to polluted benchmark signals.
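A hedged sketch of such gating, assuming a discrete urgency state driven by an EWMA of per-share debiased shortfall; the smoothing factor and thresholds are hypothetical values, not desk settings:

```python
def update_urgency(state, is_debias_per_share, ewma, alpha=0.2,
                   raise_thr=2.0, lower_thr=0.5):
    """Gate urgency changes on an EWMA of *debiased* per-share shortfall.

    Hysteresis (raise_thr > lower_thr) prevents state flapping on noisy
    or benchmark-polluted signals. All parameters are illustrative.
    """
    ewma = (1.0 - alpha) * ewma + alpha * is_debias_per_share
    if ewma > raise_thr:
        state = min(state + 1, 3)   # escalate urgency (cap at level 3)
    elif ewma < lower_thr:
        state = max(state - 1, 0)   # relax urgency (floor at level 0)
    return state, ewma
```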


7) Validation design (A/B)

Replay the same parent orders with two scorecards:

    • scorecard A: observed IS against the raw decision mid,
    • scorecard B: debiased IS against the counterfactual mid.

Evaluate:

  1. mean and q95 debiased IS,
  2. completion rate,
  3. regime robustness (open/close/high-vol windows),
  4. policy stability (state flapping frequency).

Promotion condition: improved debiased tail metrics without completion degradation.
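The promotion condition can be encoded as a simple gate; the field names and tolerances here are hypothetical:

```python
def promote(baseline, candidate, tail_tol=0.0, completion_tol=0.005):
    """Promotion gate per the A/B design above.

    Each input is a dict with keys 'mean_is_debias', 'q95_is_debias',
    'completion' (names are illustrative). Candidate must improve mean
    and q95 *debiased* IS without degrading completion rate beyond
    completion_tol. Lower IS is better.
    """
    better_mean = candidate['mean_is_debias'] < baseline['mean_is_debias']
    better_tail = candidate['q95_is_debias'] < baseline['q95_is_debias'] - tail_tol
    completion_ok = candidate['completion'] >= baseline['completion'] - completion_tol
    return better_mean and better_tail and completion_ok
```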


8) Implementation checklist

  1. Log estimated (\hat I_t) alongside fills so both IS variants are reproducible.
  2. Compute (IS_{obs}), (IS_{debias}), and BPR in the same TCA pipeline.
  3. Schedule the weekly (\phi, G) refit and daily refresh jobs.
  4. Audit clock/timestamp quality before trusting attribution.


9) Failure modes to watch

  1. Overfitting decay kernel → unstable debiasing in stressed regimes
  2. Ignoring hidden liquidity shifts → residual bias attributed to model
  3. Venue-mix drift without refit → stale (\phi, G)
  4. Clock/timestamp quality issues → fake lead-lag and wrong attribution


One-line takeaway

If your benchmark includes your own footprint, “better slippage” can be a measurement illusion; debiased IS should be the promotion metric for execution-policy changes.