Slippage Modeling with Point-in-Time Feature Integrity
A Practical Playbook to Kill Lookahead Leakage Before It Hits Live PnL
Why this note: Many slippage models look great offline not because they are smart, but because they accidentally peek into the future (late prints, corrected books, revised reference data). This note is about making execution-cost modeling time-honest.
1) The Hidden Failure Mode
A model leaks when feature values used at decision time t_d are built from information that only became available at some time t > t_d.
In slippage pipelines, this happens more often than teams think:
- As-of join errors: joining on event time only, ignoring when data actually arrived.
- Late/repair data contamination: corrected trades, replayed packets, post-facto venue state updates.
- Feature recompute drift: rolling features rebuilt from “final” historical data instead of live-available snapshots.
- Label/feature clock mismatch: label anchored at decision time, features anchored at end-of-second batch.
Result: offline MAE improves, live markout worsens.
2) Time Model You Need (No Exceptions)
For every signal and every label, store at least:
- event_time: when the exchange/venue says the event happened.
- ingest_time: when your system first saw it.
- decision_time: when the tactic/router actually took the action.
- publish_version: immutable feature/lookup version at decision time.
Rule: feature eligibility is based on ingest_time <= decision_time, not event time.
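A minimal sketch of this rule, assuming a simple snapshot record (field names follow the clocks above; the record type itself is illustrative, not a prescribed schema):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FeatureSnapshot:
    event_time: float      # venue-reported event time (epoch seconds)
    ingest_time: float     # when our system first saw it
    value: float
    publish_version: str   # immutable feature/lookup version

def eligible(f: FeatureSnapshot, decision_time: float) -> bool:
    # Eligibility is based on ingest_time, never event_time:
    # a late print with an early event_time must not qualify.
    return f.ingest_time <= decision_time

late_print = FeatureSnapshot(event_time=10.0, ingest_time=12.5,
                             value=1.0, publish_version="v1")
print(eligible(late_print, decision_time=11.0))  # False: arrived after the decision
print(eligible(late_print, decision_time=13.0))  # True
```

Note that the late print's event_time (10.0) predates both decisions; judging eligibility by event time would silently admit it.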
3) Common Leak Sources in Execution Research
- Book-state leakage
- Using reconstructed L2 that includes packets received after action.
- Benchmark leakage
- Arrival/VWAP benchmark computed from cleaned bars unavailable in real-time.
- Reference data leakage
- Tick-size, lot-size, fee tier, SSR flags, corporate-action state applied with revision hindsight.
- Transport-state leakage
- Using final ACK/cancel status in pre-trade features.
- Cross-venue synchronization leakage
- Assuming zero skew between venues/feeds in historical replay.
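To make the first item concrete: a replayed book should be scrubbed so only packets that had actually arrived by the action contribute to the reconstructed state. A sketch, with a hypothetical BookUpdate record:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class BookUpdate:
    event_time: float
    ingest_time: float
    bid: float
    ask: float

def book_as_of(updates: List[BookUpdate], decision_time: float) -> Optional[BookUpdate]:
    """Last book state whose packet had actually arrived by decision_time."""
    visible = [u for u in updates if u.ingest_time <= decision_time]
    return max(visible, key=lambda u: u.ingest_time, default=None)

updates = [
    BookUpdate(event_time=1.0, ingest_time=1.1, bid=99.0, ask=101.0),
    BookUpdate(event_time=2.0, ingest_time=5.0, bid=100.0, ask=100.5),  # late packet
]
state = book_as_of(updates, decision_time=2.5)
print(state.bid, state.ask)  # 99.0 101.0 -- the late packet is excluded
```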
4) Point-in-Time Join Contract (Minimal SQL Pattern)
-- For each decision row d, pick the latest feature snapshot f
-- that had actually been ingested by decision time.
SELECT d.order_id,
       d.decision_time,
       f.feature_value,
       f.ingest_time AS feature_ingest_time
FROM decisions d
LEFT JOIN LATERAL (
    SELECT fs.feature_value, fs.ingest_time
    FROM feature_store fs
    WHERE fs.symbol = d.symbol
      AND fs.ingest_time <= d.decision_time
    ORDER BY fs.ingest_time DESC
    LIMIT 1
) f ON TRUE;
If this constraint is missing, assume leakage until proven otherwise.
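The same contract can be enforced outside SQL. A dependency-free sketch of the as-of join using binary search over snapshots sorted by ingest_time (names are illustrative):

```python
import bisect

def pit_asof_join(decision_times, snapshots):
    """For each decision time, return the latest snapshot value with
    ingest_time <= decision_time, or None if no snapshot qualifies.
    snapshots: list of (ingest_time, feature_value), sorted by ingest_time."""
    ingest_times = [t for t, _ in snapshots]
    out = []
    for d in decision_times:
        # bisect_right excludes every snapshot ingested strictly after d
        i = bisect.bisect_right(ingest_times, d)
        out.append(snapshots[i - 1][1] if i > 0 else None)
    return out

snaps = [(1.0, "a"), (2.0, "b"), (4.0, "c")]
print(pit_asof_join([0.5, 2.0, 3.0, 9.0], snaps))  # [None, 'b', 'b', 'c']
```

The None for the first decision is the honest answer: no feature existed yet, and the model must be trained to cope with that rather than backfilling.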
5) Leakage Diagnostics (Production KPIs)
Track these continuously:
- Forward Availability Violations (FAV): share of rows where any feature has ingest_time > decision_time.
- Leakage Gap (LG): median/95p of feature_ingest_time - decision_time, clipped to positive values.
- Shadow-Live Delta (SLD): drift between offline predictions and live shadow predictions on identical orders.
- PIT Coverage Ratio (PCR): fraction of features backed by versioned PIT snapshots.
- Replay Integrity Mismatch (RIM): discrepancy between packet-order replay and the stored feature stream.
Red line: FAV > 0.1% in active regimes should block promotion.
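FAV and LG fall straight out of the joined rows. A sketch assuming each row carries a decision timestamp and the ingest timestamps of its features (row layout is illustrative):

```python
def fav(rows):
    """Forward Availability Violations: share of rows where any feature
    was ingested after the decision."""
    bad = sum(1 for r in rows
              if any(t > r["decision_time"] for t in r["feature_ingest_times"]))
    return bad / len(rows) if rows else 0.0

def leakage_gaps(rows):
    """Positive (leaking) gaps feature_ingest_time - decision_time,
    sorted so median/95p can be read off by index."""
    gaps = [t - r["decision_time"]
            for r in rows for t in r["feature_ingest_times"]
            if t > r["decision_time"]]
    return sorted(gaps)

rows = [
    {"decision_time": 10.0, "feature_ingest_times": [9.0, 9.5]},    # clean
    {"decision_time": 10.0, "feature_ingest_times": [9.0, 10.25]},  # leaks by 0.25s
]
print(fav(rows))           # 0.5
print(leakage_gaps(rows))  # [0.25]
```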
6) Training/Evaluation Setup that Resists Leakage
- Purged walk-forward splits (López de Prado style): prevent temporal overlap contamination between train and test.
- Embargo window around split boundaries: drop samples near boundaries where latent leakage is highest.
- Decision-time feature freeze: train and infer from same feature materialization logic.
- Dual backtests:
- Idealized (clean final data)
- Deployable PIT (realistically delayed/partial data)
Promote only when deployable PIT metrics pass.
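A minimal sketch of walk-forward split indices with an embargo gap. This is a simplification of the purged CV in López de Prado: the embargo is shown, while purging of overlapping-label train samples is omitted for brevity, and all parameters are illustrative:

```python
def walk_forward_splits(n, train_size, test_size, embargo):
    """Yield (train_idx, test_idx) pairs over n samples; `embargo` samples
    after each train window are dropped so latent leakage cannot straddle
    the split boundary."""
    start = 0
    while start + train_size + embargo + test_size <= n:
        train = list(range(start, start + train_size))
        test_start = start + train_size + embargo
        test = list(range(test_start, test_start + test_size))
        yield train, test
        start += test_size

for train, test in walk_forward_splits(n=12, train_size=4, test_size=2, embargo=1):
    print(train, "->", test)
```

With these toy parameters the first split trains on samples 0-3 and tests on 5-6; sample 4 sits inside the embargo and is used by neither side.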
7) Model Design Implications
Leakage-safe slippage models should explicitly consume data-freshness state:
- feature age (decision_time - feature_ingest_time)
- feed skew estimates
- recent gap-fill/recovery flags
- per-source staleness buckets
Aging/freshness is not metadata; it is predictive microstructure context.
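Two of these inputs, feature age and its staleness bucket, are cheap to materialize at decision time so that train and live see identical values. A sketch with illustrative bucket edges:

```python
def freshness_features(decision_time, feature_ingest_time,
                       bucket_edges=(0.001, 0.01, 0.1, 1.0)):
    """Feature age plus a coarse staleness bucket, both computed at
    decision time. Bucket edges (seconds) are illustrative only."""
    age = decision_time - feature_ingest_time  # >= 0 under the PIT join contract
    bucket = sum(1 for edge in bucket_edges if age >= edge)
    return {"feature_age": age, "staleness_bucket": bucket}

print(freshness_features(10.0, 9.75))  # {'feature_age': 0.25, 'staleness_bucket': 3}
```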
8) Safe Rollout Pattern
- PIT linter in CI
- fail build if schema lacks event/ingest/decision clocks.
- Shadow mode with frozen features (2+ weeks)
- compare live-shadow vs offline-shadow residuals.
- Canary by symbol-liquidity buckets
- start with liquid names + low volatility sessions.
- Leakage circuit breaker
- auto-fallback if FAV/LG/SLD breaches threshold.
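The circuit breaker in the last item can start out as plain threshold checks evaluated on each monitoring window. A sketch; the thresholds below are illustrative, not recommendations (the FAV limit mirrors the 0.1% red line above):

```python
DEFAULT_LIMITS = {"fav": 0.001, "lg_p95": 0.050, "sld": 0.25}  # illustrative

def leakage_breaker(kpis, limits=DEFAULT_LIMITS):
    """Return (trip, breached). trip=True means auto-fallback to the
    baseline model; breached lists the KPIs over their limits."""
    breached = [k for k, limit in limits.items() if kpis.get(k, 0.0) > limit]
    return (bool(breached), breached)

trip, why = leakage_breaker({"fav": 0.004, "lg_p95": 0.020, "sld": 0.10})
print(trip, why)  # True ['fav'] -- FAV above the red line, fall back
```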
9) Fast Implementation Checklist
[ ] Add event_time + ingest_time + decision_time to all datasets
[ ] Enforce ingest_time <= decision_time in feature joins
[ ] Version reference data (fee tiers, lot/tick, SSR, corp actions)
[ ] Add PIT diagnostics (FAV, LG, SLD, PCR, RIM)
[ ] Run purged walk-forward + embargo evaluation
[ ] Gate release on deployable-PIT metrics, not clean-replay metrics
10) References
- López de Prado, M. (2018), Advances in Financial Machine Learning, Wiley (purged/embargo CV).
- Perold, A. F. (1988), The Implementation Shortfall: Paper Versus Reality, Journal of Portfolio Management.
- Almgren, R. & Chriss, N. (2001), Optimal Execution of Portfolio Transactions, Journal of Risk.
- Cartea, Á., Jaimungal, S., Penalva, J. (2015), Algorithmic and High-Frequency Trading.
TL;DR
If your slippage model can “see” data that was not available at decision time, your backtest edge is fake. Use strict point-in-time joins, version every mutable reference source, monitor leakage KPIs in production, and only ship models that survive deployable-PIT evaluation.