Self-Impact-Corrected Implementation Shortfall Playbook
Date: 2026-03-11
Category: research
Scope: Live execution/TCA where benchmark drift is partially caused by your own flow
1) Why this matters in production
Most desks track implementation shortfall (IS) against a decision/arrival benchmark and call it “slippage.”
But in live trading, that benchmark is often contaminated by your own child-order footprint:
- your early fills move the tape,
- later fills are compared against a benchmark path your flow already changed,
- strategy reviews then over/under-credit execution quality.
Result: model upgrades are evaluated on a biased scorecard.
2) Core idea
Split observed price path into:
- exogenous move (what market would do without your flow),
- endogenous move (your own impact).
Then compute two KPIs:
- Observed IS (legacy),
- Self-impact-corrected IS (debiased).
The gap is the benchmark-pollution tax.
3) Minimal model
Let midprice be
[ m_t = f_t + I_t + \epsilon_t ]
- (f_t): exogenous/fundamental component,
- (I_t): impact from own executions,
- (\epsilon_t): microstructure noise.
For child fills ((t_i, q_i, s_i)) with side (s_i \in \{+1,-1\}):
[ I_t = \sum_{i: t_i \le t} s_i \,\phi(|q_i|)\, G(t - t_i) ]
- (\phi(\cdot)): instantaneous impact shape (often concave),
- (G(\cdot)): decay kernel.
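The propagator sum above can be sketched directly. This is a minimal illustration assuming a square-root instantaneous impact (\phi(q) = k\sqrt{|q|}) and an exponential decay kernel (G(u) = e^{-u/\tau}); the function name and the default values of `k` and `tau` are hypothetical, not calibrated.

```python
import numpy as np

def impact_path(fill_times, fill_qtys, fill_sides, eval_times,
                k=0.8, tau=300.0):
    """Estimated self-impact I_t under a propagator model.

    Illustrative assumptions (not calibrated):
      phi(q) = k * sqrt(|q|)   -- concave instantaneous impact
      G(u)   = exp(-u / tau)   -- exponential decay kernel
    Times in seconds, quantities in shares, sides in {+1, -1}.
    """
    t = np.asarray(eval_times, float)[:, None]   # shape (T, 1)
    ti = np.asarray(fill_times, float)[None, :]  # shape (1, N)
    phi = k * np.sqrt(np.abs(fill_qtys))         # instantaneous impact per fill
    # Only fills at or before t contribute; decay by elapsed time.
    decay = np.where(t >= ti, np.exp(-(t - ti) / tau), 0.0)
    return (np.asarray(fill_sides, float) * phi * decay).sum(axis=1)

# Counterfactual mid (Section 3.2): f_hat_t = m_t - I_hat_t
# f_hat = mids - impact_path(fill_times, fill_qtys, fill_sides, mid_times)
```

A single buy fill of 100 shares at t=0 contributes k·√100 = 8.0 price units at t=0 under these defaults, decaying by e^{-1} after one `tau`.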
3.1 Observed IS
[ IS_{obs} = \sum_i s_i q_i (p_i - m_{dec}) ]
where (p_i) is the fill price and (m_{dec}) is the decision-time mid.
3.2 Debiased IS
Estimate counterfactual mid without own impact:
[ \hat f_t = m_t - \hat I_t ]
Then:
[ IS_{debias} = \sum_i s_i q_i (p_i - \hat f_{dec}) ]
3.3 Benchmark Pollution Ratio (BPR)
[ BPR = \frac{IS_{obs} - IS_{debias}}{\max(|IS_{obs}|, \varepsilon)} ]
Interpretation:
- large positive BPR: legacy IS overstates cost because of your own footprint,
- negative BPR: possible model misspecification or over-correction.
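The three KPIs in 3.1–3.3 fit in one small function. A sketch in price units; converting to bps against parent notional, and the `eps` guard value, are left as illustrative choices.

```python
import numpy as np

def is_metrics(prices, qtys, sides, m_dec, f_hat_dec, eps=1e-9):
    """Observed IS, debiased IS, and BPR per Sections 3.1-3.3.

    prices: fill prices p_i; qtys: fill sizes q_i; sides: +1 buy / -1 sell;
    m_dec: decision-time mid; f_hat_dec: impact-free decision-mid estimate.
    Returns costs in price units (convert to bps upstream).
    """
    s = np.asarray(sides, float)
    q = np.asarray(qtys, float)
    p = np.asarray(prices, float)
    is_obs = float(np.sum(s * q * (p - m_dec)))        # legacy IS
    is_deb = float(np.sum(s * q * (p - f_hat_dec)))    # debiased IS
    bpr = (is_obs - is_deb) / max(abs(is_obs), eps)    # pollution ratio
    return is_obs, is_deb, bpr
```

For example, one buy of 100 shares at 101 against m_dec = 100 and f_hat_dec = 100.5 gives observed IS 100, debiased IS 50, and BPR 0.5: half the measured "cost" was your own footprint.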
4) Practical calibration loop
Weekly (offline)
- Fit (\phi) by participation/liquidity buckets.
- Fit decay (G) (multi-exponential or power-law approximation).
- Enforce sanity constraints:
- monotone non-increasing decay,
- no-dynamic-arbitrage-consistent shape assumptions,
- side symmetry checks.
Daily (online refresh)
- Re-estimate scale multipliers by symbol-liquidity × session bucket.
- Recompute q90/q95 error of post-trade markout residuals.
- Gate model if residual drift exceeds threshold.
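One concrete piece of the weekly loop, assuming a single-exponential decay for illustration (production would fit a multi-exponential or power-law with non-negativity constraints): fit (\tau) by log-linear least squares on an empirical propagator, then apply the monotone-non-increasing sanity gate. Function names are hypothetical.

```python
import numpy as np

def fit_exp_decay(lags, g_hat):
    """Fit G(u) = exp(-u / tau) to an empirical propagator g_hat(u)
    by log-linear least squares. Requires g_hat > 0 at every lag."""
    lags = np.asarray(lags, float)
    y = np.log(np.asarray(g_hat, float))
    slope = np.polyfit(lags, y, 1)[0]      # log G(u) ~ -u / tau
    return -1.0 / slope

def decay_sanity_ok(g_fit):
    """Section 4 sanity constraint: fitted decay must be monotone
    non-increasing; gate the model if this fails."""
    g = np.asarray(g_fit, float)
    return bool(np.all(np.diff(g) <= 1e-12))
```

On a clean exponential with (\tau = 5) the fit recovers 5.0 exactly; any upward blip in the fitted curve trips the gate.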
5) Desk metrics that actually help decisions
Track these together (not one metric alone):
- Observed IS (bps)
- Debiased IS (bps)
- BPR
- Residual Exogenous Drift (RED): move in (\hat f_t) during schedule
- Completion Reliability: residual inventory at deadline
- Tail Health: q95/q99 of debiased IS
If observed IS improves but debiased IS does not, you likely optimized the benchmark artifact, not real execution quality.
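The "benchmark artifact" warning in the last sentence is mechanical enough to encode. A sketch with a hypothetical snapshot container and an illustrative minimum-improvement threshold:

```python
from dataclasses import dataclass

@dataclass
class DeskSnapshot:
    """One review period's desk metrics (Section 5). All costs in bps."""
    is_obs_bps: float        # Observed IS
    is_deb_bps: float        # Debiased IS
    bpr: float               # Benchmark Pollution Ratio
    red_bps: float           # Residual Exogenous Drift
    residual_inventory: float  # Completion Reliability input
    q95_deb_bps: float       # Tail Health

def benchmark_artifact_flag(prev: DeskSnapshot, cur: DeskSnapshot,
                            min_deb_gain_bps: float = 0.1) -> bool:
    """True when observed IS improved (cost fell) but debiased IS did
    not improve meaningfully: the likely signature of optimizing the
    benchmark artifact rather than real execution quality."""
    obs_improved = cur.is_obs_bps < prev.is_obs_bps
    deb_improved = cur.is_deb_bps < prev.is_deb_bps - min_deb_gain_bps
    return obs_improved and not deb_improved
```
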
6) Controller integration (production)
Use debiased metrics in routing urgency controller:
- GREEN: BPR stable, residual calibration healthy → normal policy
- AMBER: BPR rising or residual drift unstable → reduce aggression jumps, widen hysteresis
- RED: calibration break (coverage fail / residual bias) → conservative fallback policy
- SAFE: hard guard mode, bounded POV + strict deadline floor
This prevents tactical overreaction to polluted benchmark signals.
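A minimal state-mapping sketch for the GREEN/AMBER/RED ladder. Thresholds and the hysteresis band are illustrative; SAFE is assumed to be entered only by an external hard-guard trigger, so it is not produced here.

```python
def controller_state(bpr, bpr_slope, calib_ok, coverage_ok, prev_state,
                     bpr_hi=0.5, bpr_lo=0.35, slope_hi=0.1):
    """Map debiased-metric health to a controller state (Section 6).

    Hysteresis: once AMBER, require BPR below the lower band (bpr_lo)
    and a non-rising trend before returning to GREEN, to prevent
    state flapping. Thresholds are illustrative, not calibrated.
    """
    if not (calib_ok and coverage_ok):
        return "RED"                       # calibration break -> fallback
    if prev_state == "AMBER":
        ok = bpr < bpr_lo and bpr_slope <= 0
        return "GREEN" if ok else "AMBER"
    if bpr > bpr_hi or bpr_slope > slope_hi:
        return "AMBER"                     # rising/elevated pollution
    return "GREEN"
```

Note the asymmetry: entering AMBER needs BPR above `bpr_hi`, but leaving it needs BPR below `bpr_lo`, which is exactly the widened hysteresis the AMBER policy calls for.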
7) Validation design (A/B)
Replay same parent orders with two scorecards:
- A: legacy observed IS-only decisions
- B: debiased IS + BPR-aware controller
Evaluate:
- mean and q95 debiased IS,
- completion rate,
- regime robustness (open/close/high-vol windows),
- policy stability (state flapping frequency).
Promotion condition: improved debiased tail metrics without completion degradation.
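The promotion condition can be made explicit as a gate over the two scorecards. The dict keys and tolerance defaults below are hypothetical names for illustration:

```python
def promote(a, b, tail_tol=0.0, completion_tol=0.005):
    """Section 7 promotion gate: candidate B (debiased IS + BPR-aware
    controller) is promoted over A only if debiased tail metrics
    improve without completion degradation.

    a, b: dicts with keys 'mean_deb', 'q95_deb' (costs, lower is
    better) and 'completion' (fill-by-deadline rate, higher is better).
    """
    tail_better = b["q95_deb"] < a["q95_deb"] - tail_tol
    mean_ok = b["mean_deb"] <= a["mean_deb"]
    completion_ok = b["completion"] >= a["completion"] - completion_tol
    return tail_better and mean_ok and completion_ok
```
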
8) Implementation checklist
- Child-order logs include nanosecond timestamps, side, qty, venue, order lifecycle
- Impact kernel service supports intraday bucket refresh
- TCA pipeline emits observed + debiased metrics side-by-side
- Dashboard includes BPR, RED, and calibration error bands
- Controller has fallback switch when calibration health fails
- Canary rollout with explicit rollback gates (q95, completion, reject rate)
9) Failure modes to watch
- Overfitting decay kernel → unstable debiasing in stressed regimes
- Ignoring hidden liquidity shifts → residual bias attributed to model
- Venue-mix drift without refit → stale (\phi, G)
- Clock/timestamp quality issues → fake lead-lag and wrong attribution
10) References
- Gatheral, J. (2010), "No-Dynamic-Arbitrage and Market Impact", https://papers.ssrn.com/sol3/papers.cfm?abstract_id=1292353
- Obizhaeva, A. & Wang, J. (2013), "Optimal Trading Strategy and Supply/Demand Dynamics", https://academic.oup.com/jfec/article/11/1/1/816163
- Almgren, R. & Chriss, N. (2000), "Optimal Execution of Portfolio Transactions", https://www.smallake.kr/wp-content/uploads/2016/03/optliq.pdf
- Taranto et al. (2016), "Linear Models for the Impact of Order Flow on Prices" (propagator models), https://arxiv.org/abs/1602.02735
One-line takeaway
If your benchmark includes your own footprint, “better slippage” can be a measurement illusion; debiased IS should be the promotion metric for execution policy changes.