Self-Impact-Corrected Implementation Shortfall Playbook
Date: 2026-03-11
Category: research
Scope: Live execution/TCA where benchmark drift is partially caused by your own flow
1) Why this matters in production
Most desks track implementation shortfall (IS) against a decision/arrival benchmark and call it “slippage.”
But in live trading, that benchmark is often contaminated by your own child-order footprint:
- your early fills move the tape,
- later fills are compared against a benchmark path your flow already changed,
- strategy reviews then over/under-credit execution quality.
Result: model upgrades are evaluated on a biased scorecard.
2) Core idea
Split observed price path into:
- exogenous move (what market would do without your flow),
- endogenous move (your own impact).
Then compute two KPIs:
- Observed IS (legacy),
- Self-impact-corrected IS (debiased).
The gap is the benchmark-pollution tax.
3) Minimal model
Let midprice be
[ m_t = f_t + I_t + \epsilon_t ]
- (f_t): exogenous/fundamental component,
- (I_t): impact from own executions,
- (\epsilon_t): microstructure noise.
For child fills ((t_i, q_i, s_i)) with side (s_i \in \{+1,-1\}):
[ I_t = \sum_{i: t_i \le t} s_i \,\phi(|q_i|)\, G(t - t_i) ]
- (\phi(\cdot)): instantaneous impact shape (often concave),
- (G(\cdot)): decay kernel.
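The propagator sum above can be sketched directly. This is a minimal illustration assuming a square-root instantaneous impact (\phi(q) = k\sqrt{|q|}) and an exponential decay kernel (G(u) = e^{-u/\tau}); the function name and the default values of `k` and `tau` are hypothetical, not calibrated.

```python
import numpy as np

def impact_path(fill_times, fill_qtys, fill_sides, eval_times,
                k=0.8, tau=300.0):
    """Estimated self-impact I_t under a propagator model.

    Illustrative assumptions (not calibrated):
      phi(q) = k * sqrt(|q|)   -- concave instantaneous impact
      G(u)   = exp(-u / tau)   -- exponential decay kernel
    Times in seconds, quantities in shares, sides in {+1, -1}.
    """
    t = np.asarray(eval_times, float)[:, None]   # shape (T, 1)
    ti = np.asarray(fill_times, float)[None, :]  # shape (1, N)
    phi = k * np.sqrt(np.abs(fill_qtys))         # instantaneous impact per fill
    # Only fills at or before t contribute; decay by elapsed time.
    decay = np.where(t >= ti, np.exp(-(t - ti) / tau), 0.0)
    return (np.asarray(fill_sides, float) * phi * decay).sum(axis=1)

# Counterfactual mid (Section 3.2): f_hat_t = m_t - I_hat_t
# f_hat = mids - impact_path(fill_times, fill_qtys, fill_sides, mid_times)
```

A single buy fill of 100 shares at t=0 contributes k·√100 = 8.0 price units at t=0 under these defaults, decaying by e^{-1} after one `tau`.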
3.1 Observed IS
[ IS_{obs} = \sum_i s_i q_i (p_i - m_{dec}) ]
where (p_i) is the fill price and (m_{dec}) is the decision-time mid.
3.2 Debiased IS
Estimate counterfactual mid without own impact:
[ \hat f_t = m_t - \hat I_t ]
Then:
[ IS_{debias} = \sum_i s_i q_i (p_i - \hat f_{dec}) ]
3.3 Benchmark Pollution Ratio (BPR)
[ BPR = \frac{IS_{obs} - IS_{debias}}{\max(|IS_{obs}|, \varepsilon)} ]
Interpretation:
- large positive BPR: legacy IS overstates cost because of your own footprint,
- negative BPR: possible model misspecification or over-correction.
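The three KPIs in 3.1–3.3 fit in one small function. A sketch in price units; converting to bps against parent notional, and the `eps` guard value, are left as illustrative choices.

```python
import numpy as np

def is_metrics(prices, qtys, sides, m_dec, f_hat_dec, eps=1e-9):
    """Observed IS, debiased IS, and BPR per Sections 3.1-3.3.

    prices: fill prices p_i; qtys: fill sizes q_i; sides: +1 buy / -1 sell;
    m_dec: decision-time mid; f_hat_dec: impact-free decision-mid estimate.
    Returns costs in price units (convert to bps upstream).
    """
    s = np.asarray(sides, float)
    q = np.asarray(qtys, float)
    p = np.asarray(prices, float)
    is_obs = float(np.sum(s * q * (p - m_dec)))        # legacy IS
    is_deb = float(np.sum(s * q * (p - f_hat_dec)))    # debiased IS
    bpr = (is_obs - is_deb) / max(abs(is_obs), eps)    # pollution ratio
    return is_obs, is_deb, bpr
```

For example, one buy of 100 shares at 101 against m_dec = 100 and f_hat_dec = 100.5 gives observed IS 100, debiased IS 50, and BPR 0.5: half the measured "cost" was your own footprint.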
4) Practical calibration loop
Weekly (offline)
- Fit (\phi) by participation/liquidity buckets.
- Fit decay (G) (multi-exponential or power-law approximation).
- Enforce sanity constraints:
- monotone non-increasing decay,
- no-dynamic-arbitrage-consistent shape assumptions,
- side symmetry checks.
Daily (online refresh)
- Re-estimate scale multipliers by symbol-liquidity × session bucket.
- Recompute q90/q95 error of post-trade markout residuals.
- Gate model if residual drift exceeds threshold.
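One concrete piece of the weekly loop, assuming a single-exponential decay for illustration (production would fit a multi-exponential or power-law with non-negativity constraints): fit (\tau) by log-linear least squares on an empirical propagator, then apply the monotone-non-increasing sanity gate. Function names are hypothetical.

```python
import numpy as np

def fit_exp_decay(lags, g_hat):
    """Fit G(u) = exp(-u / tau) to an empirical propagator g_hat(u)
    by log-linear least squares. Requires g_hat > 0 at every lag."""
    lags = np.asarray(lags, float)
    y = np.log(np.asarray(g_hat, float))
    slope = np.polyfit(lags, y, 1)[0]      # log G(u) ~ -u / tau
    return -1.0 / slope

def decay_sanity_ok(g_fit):
    """Section 4 sanity constraint: fitted decay must be monotone
    non-increasing; gate the model if this fails."""
    g = np.asarray(g_fit, float)
    return bool(np.all(np.diff(g) <= 1e-12))
```

On a clean exponential with (\tau = 5) the fit recovers 5.0 exactly; any upward blip in the fitted curve trips the gate.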
5) Desk metrics that actually help decisions
Track these together (not one metric alone):
- Observed IS (bps)
- Debiased IS (bps)
- BPR
- Residual Exogenous Drift (RED): move in (\hat f_t) during schedule
- Completion Reliability: residual inventory at deadline
- Tail Health: q95/q99 of debiased IS
If observed IS improves but debiased IS does not, you likely optimized the benchmark artifact, not real execution quality.
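The "benchmark artifact" warning in the last sentence is mechanical enough to encode. A sketch with a hypothetical snapshot container and an illustrative minimum-improvement threshold:

```python
from dataclasses import dataclass

@dataclass
class DeskSnapshot:
    """One review period's desk metrics (Section 5). All costs in bps."""
    is_obs_bps: float        # Observed IS
    is_deb_bps: float        # Debiased IS
    bpr: float               # Benchmark Pollution Ratio
    red_bps: float           # Residual Exogenous Drift
    residual_inventory: float  # Completion Reliability input
    q95_deb_bps: float       # Tail Health

def benchmark_artifact_flag(prev: DeskSnapshot, cur: DeskSnapshot,
                            min_deb_gain_bps: float = 0.1) -> bool:
    """True when observed IS improved (cost fell) but debiased IS did
    not improve meaningfully: the likely signature of optimizing the
    benchmark artifact rather than real execution quality."""
    obs_improved = cur.is_obs_bps < prev.is_obs_bps
    deb_improved = cur.is_deb_bps < prev.is_deb_bps - min_deb_gain_bps
    return obs_improved and not deb_improved
```
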
6) Controller integration (production)
Use debiased metrics in routing urgency controller:
- GREEN: BPR stable, residual calibration healthy → normal policy
- AMBER: BPR rising or residual drift unstable → reduce aggression jumps, widen hysteresis
- RED: calibration break (coverage fail / residual bias) → conservative fallback policy
- SAFE: hard guard mode, bounded POV + strict deadline floor
This prevents tactical overreaction to polluted benchmark signals.
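A minimal state-mapping sketch for the GREEN/AMBER/RED ladder. Thresholds and the hysteresis band are illustrative; SAFE is assumed to be entered only by an external hard-guard trigger, so it is not produced here.

```python
def controller_state(bpr, bpr_slope, calib_ok, coverage_ok, prev_state,
                     bpr_hi=0.5, bpr_lo=0.35, slope_hi=0.1):
    """Map debiased-metric health to a controller state (Section 6).

    Hysteresis: once AMBER, require BPR below the lower band (bpr_lo)
    and a non-rising trend before returning to GREEN, to prevent
    state flapping. Thresholds are illustrative, not calibrated.
    """
    if not (calib_ok and coverage_ok):
        return "RED"                       # calibration break -> fallback
    if prev_state == "AMBER":
        ok = bpr < bpr_lo and bpr_slope <= 0
        return "GREEN" if ok else "AMBER"
    if bpr > bpr_hi or bpr_slope > slope_hi:
        return "AMBER"                     # rising/elevated pollution
    return "GREEN"
```

Note the asymmetry: entering AMBER needs BPR above `bpr_hi`, but leaving it needs BPR below `bpr_lo`, which is exactly the widened hysteresis the AMBER policy calls for.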
7) Validation design (A/B)
Replay same parent orders with two scorecards:
- A: legacy observed IS-only decisions
- B: debiased IS + BPR-aware controller
Evaluate:
- mean and q95 debiased IS,
- completion rate,
- regime robustness (open/close/high-vol windows),
- policy stability (state flapping frequency).
Promotion condition: improved debiased tail metrics without completion degradation.
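The promotion condition can be made explicit as a gate over the two scorecards. The dict keys and tolerance defaults below are hypothetical names for illustration:

```python
def promote(a, b, tail_tol=0.0, completion_tol=0.005):
    """Section 7 promotion gate: candidate B (debiased IS + BPR-aware
    controller) is promoted over A only if debiased tail metrics
    improve without completion degradation.

    a, b: dicts with keys 'mean_deb', 'q95_deb' (costs, lower is
    better) and 'completion' (fill-by-deadline rate, higher is better).
    """
    tail_better = b["q95_deb"] < a["q95_deb"] - tail_tol
    mean_ok = b["mean_deb"] <= a["mean_deb"]
    completion_ok = b["completion"] >= a["completion"] - completion_tol
    return tail_better and mean_ok and completion_ok
```
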
8) Implementation checklist
- Child-order logs include nanosecond timestamps, side, qty, venue, order lifecycle
- Impact kernel service supports intraday bucket refresh
- TCA pipeline emits observed + debiased metrics side-by-side
- Dashboard includes BPR, RED, and calibration error bands
- Controller has fallback switch when calibration health fails
- Canary rollout with explicit rollback gates (q95, completion, reject rate)
9) Failure modes to watch
- Overfitting decay kernel → unstable debiasing in stressed regimes
- Ignoring hidden liquidity shifts → residual bias attributed to model
- Venue-mix drift without refit → stale (\phi, G)
- Clock/timestamp quality issues → fake lead-lag and wrong attribution
10) References
- Gatheral, J. (2010), "No-Dynamic-Arbitrage and Market Impact", https://papers.ssrn.com/sol3/papers.cfm?abstract_id=1292353
- Obizhaeva, A. & Wang, J. (2013), "Optimal Trading Strategy and Supply/Demand Dynamics", https://academic.oup.com/jfec/article/11/1/1/816163
- Almgren, R. & Chriss, N. (2000), "Optimal Execution of Portfolio Transactions", https://www.smallake.kr/wp-content/uploads/2016/03/optliq.pdf
- Taranto et al. (2016), "Linear Models for the Impact of Order Flow on Prices" (propagator models), https://arxiv.org/abs/1602.02735
One-line takeaway
If your benchmark includes your own footprint, “better slippage” can be a measurement illusion; debiased IS should be the promotion metric for execution policy changes.