Slippage Modeling: Selective Prediction & Abstention Playbook
Trade Only When the Model Knows Enough — Otherwise Fall Back Safely
Why this note: In production, slippage models fail less from average error and more from overconfident errors during regime breaks. This playbook adds a reject option (abstention) to execution policy so the system can defer to robust baseline tactics when uncertainty is too high.
1) Failure Mode in One Sentence
If your slippage model is forced to predict and act in every state, it will eventually trade aggressively in out-of-distribution regimes and convert model overconfidence into real PnL leakage.
2) Core Idea: Add a Reject Option to Execution Decisions
At each decision time (t), choose either:
- a tactic (a \in \mathcal{A}) (passive/pegged/marketable/sweep variants), or
- abstain (\bot): hand control to a conservative baseline policy (\pi_b).
Define per-decision cost:
[ L_t(a) = IS_t(a) + \lambda \, Risk_t(a) ]
For abstention:
[ L_t(\bot)=L_t(\pi_b)+\kappa ]
Where:
- (IS_t): expected implementation shortfall component,
- (Risk_t): tail penalty (e.g., CVaR exceedance proxy),
- (\kappa): abstention friction (opportunity cost / reduced aggressiveness).
The system chooses a model tactic only when confidence exceeds the threshold; otherwise it falls back to (\pi_b).
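As a minimal Python sketch of this rule (all costs in bps; the weights `lam`, `kappa` and threshold `tau` below are illustrative placeholders, not tuned values):

```python
def decide(pred_is_bps: float, pred_risk_bps: float,
           baseline_cost_bps: float, confidence: float,
           lam: float = 0.5, kappa: float = 1.0, tau: float = 0.7):
    """Return ("act", cost) or ("abstain", cost) for one decision time."""
    cost_act = pred_is_bps + lam * pred_risk_bps      # L_t(a)
    cost_abstain = baseline_cost_bps + kappa          # L_t(bot)
    # Gate on confidence first: even a cheap-looking model action is
    # rejected when the model has not earned the right to act.
    if confidence < tau or cost_act > cost_abstain:
        return ("abstain", cost_abstain)
    return ("act", cost_act)
```

The confidence gate deliberately sits in front of the cost comparison: a low-support prediction of a low cost is exactly the overconfidence failure this playbook targets.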
3) Objective with Coverage Budget
A practical target is not "always predict," but bounded risk at controlled coverage:
[ \min \; \mathbb{E}[L_t] \quad \text{s.t.} \quad \mathbb{P}(\text{act}) \ge c_{min}, \; \mathbb{E}[L_t \mid \text{act}] \le r_{max} ]
Interpretation:
- Keep enough automation coverage (avoid abstaining on everything),
- Enforce quality on acted decisions,
- Let fallback absorb states the model does not understand.
This is the execution-ops equivalent of selective prediction / reject-option classification.
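A simple way to honor the coverage floor is to back the threshold (\tau) out of a recent window of confidence scores. This stdlib-only sketch assumes the window is representative of upcoming decisions:

```python
import math

def calibrate_threshold(confidences, c_min):
    """Highest tau such that empirical coverage P(C >= tau) >= c_min,
    estimated from a recent window of confidence scores."""
    xs = sorted(confidences, reverse=True)
    k = max(1, math.ceil(c_min * len(xs)))  # minimum number of acted decisions
    return xs[k - 1]                        # acting iff C >= tau keeps the top k
```

The quality constraint then comes from the other side: if (\mathbb{E}[L_t \mid \text{act}]) at that (\tau) exceeds (r_{max}), the coverage floor itself must be renegotiated rather than the threshold silently lowered.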
4) Confidence Stack (Do Not Use One Score Only)
Use a composite confidence signal (C_t) from three channels:
A) Aleatoric uncertainty (market randomness now)
- spread/depth instability,
- microprice variance,
- short-horizon volatility burst.
B) Epistemic uncertainty (model ignorance)
- ensemble disagreement,
- distance to training manifold,
- leaf occupancy / low support diagnostics.
C) Data quality confidence
- feature freshness age,
- missing-field ratio,
- market-data/order-path timestamp coherence,
- stale snapshot risk.
Then gate decisions with a calibrated threshold: act only when (C_t \ge \tau).
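One illustrative fusion is multiplicative, so that any single bad channel can veto a decision. The normalization convention below (each channel scored as a risk in [0, 1], with 1 = worst) is an assumption of this sketch, not a fixed recipe:

```python
def composite_confidence(aleatoric_risk: float, epistemic_risk: float,
                         data_risk: float) -> float:
    """Fuse the three channels into one gate score C_t in [0, 1].

    Each input is a normalized risk score in [0, 1] (1 = worst).
    Multiplicative fusion means a fully stale feed (data_risk = 1)
    zeroes out C_t regardless of how confident the model itself is.
    """
    return (1.0 - aleatoric_risk) * (1.0 - epistemic_risk) * (1.0 - data_risk)
```

Additive or learned fusions are equally valid; the key property to preserve is the veto, so that a strong model score can never mask dead market data.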
5) Telemetry Contract (Mandatory)
Decision facts
decision_ts, symbol, side, parent_id, child_id, remaining_qty, time_to_deadline_ms, urgency_state
Model outputs
pred_is_bps, pred_tail_q95_bps, confidence_score, coverage_bucket, uncertainty_aleatoric, uncertainty_epistemic
Abstention diagnostics
abstain_flag, abstain_reason (ood, low_support, stale_features, tail_risk, model_timeout), fallback_policy_id, fallback_action
Outcomes
realized_is_bps, realized_markout_5s/30s/120s, completion_prob_pred, completion_realized, deadline_breach_flag, forced_catchup_flag
Without explicit abstain reason codes, you cannot improve coverage policy safely.
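A minimal sketch of the abstention-diagnostics record as a Python dataclass. Field names mirror the contract above, but the types are illustrative and the symbol, policy id, and action values are hypothetical:

```python
from dataclasses import dataclass, asdict

@dataclass
class AbstainEvent:
    """Abstention-diagnostics record; schema is illustrative."""
    decision_ts: int          # epoch millis of the decision
    symbol: str
    abstain_flag: bool
    abstain_reason: str       # ood | low_support | stale_features | tail_risk | model_timeout
    fallback_policy_id: str
    fallback_action: str

# Hypothetical event: a stale-feature abstention routed to a baseline schedule.
evt = AbstainEvent(1700000000000, "XYZ", True, "stale_features",
                   "baseline_twap_v2", "reschedule_passive")
record = asdict(evt)          # plain dict, ready for structured logging
```

Keeping abstain_reason as a closed enum (rather than free text) is what makes the reason-mix KPIs in section 8 computable.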
6) Calibration and Guarantees
A) Confidence calibration
- Reliability curves by bucket (symbol × session × urgency),
- isotonic/Platt/temperature methods per model family.
B) Conformalized risk bands
Use conformal quantiles for finite-sample style risk control on predicted tail cost, refreshed on rolling windows.
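The rolling-window refresh can wrap a standard split-conformal quantile. This sketch implements the usual ceil((n+1)(1-alpha)) rank rule on cost residuals; exchangeability of the calibration window with the next decision is the operative assumption:

```python
import math

def conformal_margin(residuals, alpha=0.1):
    """Split-conformal additive margin: predicted_cost + margin covers the
    realized cost with probability >= 1 - alpha (finite-sample, assuming
    the calibration residuals are exchangeable with the next decision).

    residuals[i] = realized_cost_i - predicted_cost_i on a holdout window.
    """
    n = len(residuals)
    rank = math.ceil((n + 1) * (1 - alpha))   # standard conformal rank
    if rank > n:
        return float("inf")                   # window too small for this alpha
    return sorted(residuals)[rank - 1]
```

The infinite margin on a too-small window is a feature, not a bug: it forces abstention until enough calibration data has accumulated.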
C) Coverage guardrails
Set explicit bounds:
- coverage_floor (e.g., 65–80% depending on strategy),
- abstain_ceiling for prolonged fallback episodes,
- dynamic threshold widening during known stress windows.
D) Baseline safety floor
Fallback (\pi_b) must be audited, deterministic-enough, and capacity-tested. A weak baseline nullifies abstention benefits.
7) Runtime State Machine
NORMAL
- Standard threshold (\tau_0), normal tactic set.
CAUTION
Trigger: drift monitors or calibration warning.
- Raise threshold to (\tau_1 > \tau_0),
- reduce high-impact tactics,
- increase fallback share.
ABSTAIN_GUARD
Trigger: confidence collapse / stale critical features / severe undercoverage.
- Force abstention for affected buckets,
- route via conservative baseline schedule,
- page operator if duration > SLA.
RECOVERY
- Gradually lower threshold after calibration recovers,
- hysteresis + min-dwell to avoid thrash.
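The four states above can be sketched as a small class. The thresholds and dwell length are illustrative placeholders, and the two boolean triggers stand in for real drift and calibration monitors:

```python
class GateStateMachine:
    """NORMAL / CAUTION / ABSTAIN_GUARD / RECOVERY loop with hysteresis.
    tau0, tau1, and min_dwell are illustrative, not recommended values."""

    def __init__(self, tau0=0.6, tau1=0.8, min_dwell=50):
        self.tau0, self.tau1 = tau0, tau1
        self.state, self.tau = "NORMAL", tau0
        self.min_dwell = min_dwell   # minimum decisions between transitions
        self.dwell = 0

    def step(self, drift_alarm: bool, confidence_collapse: bool) -> str:
        self.dwell += 1
        if self.dwell < self.min_dwell:      # min-dwell: suppress thrash
            return self.state
        prev = self.state
        if confidence_collapse:
            self.state, self.tau = "ABSTAIN_GUARD", 1.0   # force abstention
        elif drift_alarm:
            self.state, self.tau = "CAUTION", self.tau1
        elif self.state in ("CAUTION", "ABSTAIN_GUARD"):
            self.state, self.tau = "RECOVERY", self.tau1  # keep tau wide first
        else:
            self.state, self.tau = "NORMAL", self.tau0
        if self.state != prev:
            self.dwell = 0                   # hysteresis: restart dwell clock
        return self.state
```

Note that RECOVERY keeps the widened threshold for at least one full dwell period before relaxing to NORMAL, which is what prevents alarm flapping from whipsawing the tactic set.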
8) KPIs That Actually Matter
Selective IS: [ \mathbb{E}[IS \mid \text{act}] ]
Coverage: [ \mathbb{P}(\text{act}) ]
Risk-at-Coverage (RaC): tail loss conditional on the acted subset.
Abstention Regret vs. Baseline: difference between fallback realized cost and the model-acted counterfactual estimate (with conservative bounds).
Bad Act Rate (BAR): share of acted decisions whose realized loss exceeds the abstention fallback by more than a set threshold.
Reason Mix Stability: share of abstain reasons over time (an ood spike is a regime-shift alarm).
Monitor these by liquidity bucket and session segment; aggregate daily averages hide failure clusters.
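Given the telemetry contract from section 5, the first three of these KPIs fall directly out of the decision logs. The record keys below are illustrative, and the counterfactual fallback_is_bps is assumed to come from a conservative baseline estimator:

```python
def kpis(decisions):
    """Coverage, selective IS, and Bad Act Rate from decision logs.
    Each record is a dict with (illustrative) keys: acted,
    realized_is_bps, fallback_is_bps, bar_threshold_bps."""
    acted = [d for d in decisions if d["acted"]]
    coverage = len(acted) / len(decisions)
    if not acted:
        return {"coverage": 0.0, "selective_is_bps": None, "bad_act_rate": None}
    selective_is = sum(d["realized_is_bps"] for d in acted) / len(acted)
    bad = sum(d["realized_is_bps"] - d["fallback_is_bps"] > d["bar_threshold_bps"]
              for d in acted)
    return {"coverage": coverage,
            "selective_is_bps": selective_is,
            "bad_act_rate": bad / len(acted)}
```

Run this per liquidity bucket and session segment, not on the pooled log, for exactly the reason stated above: daily averages hide failure clusters.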
9) Rollout Blueprint (Low-Regret)
- Shadow stage
- Log confidence + abstain decisions, but do not enforce.
- Soft gate
- Enforce abstain only for extreme low-confidence tail.
- Bucket canary
- Enable by symbols/venues with notional caps.
- Hard gate promotion
- Require improved tail metrics with bounded coverage loss.
- Incident drills
- Simulate data staleness/OOD bursts and verify automatic fallback.
10) Common Mistakes
- Using uncalibrated model score as "confidence".
- Penalizing abstention so hard that system never abstains.
- Ignoring fallback policy quality and assuming abstention is free.
- Tracking only mean IS and missing tail conditional on acted subset.
- Letting coverage collapse silently during drift.
11) Fast Implementation Checklist
[ ] Add abstain action with explicit fallback policy mapping
[ ] Build confidence stack (aleatoric + epistemic + data quality)
[ ] Calibrate confidence and conformalize tail-risk estimates
[ ] Define coverage floor / abstain ceiling / incident thresholds
[ ] Log abstain reasons as first-class telemetry
[ ] Roll out via shadow -> soft gate -> canary -> promotion
[ ] Gate promotion on tail-risk improvement, not mean-only IS
References
- Chow, C. K. (1970), On Optimum Recognition Error and Reject Tradeoff.
- Geifman, Y., El-Yaniv, R. (2017), Selective Classification for Deep Neural Networks.
- Vovk, V., Gammerman, A., Shafer, G. (2005), Algorithmic Learning in a Random World (conformal prediction).
- Angelopoulos, A. N., et al. (2022), Conformal Risk Control.
- Laroche, R., Trichelair, P., Des Combes, R. T. (2019), Safe Policy Improvement with Baseline Bootstrapping (SPIBB).
- Gatheral, J. (2010), No-Dynamic-Arbitrage and Market Impact.
TL;DR
A slippage model should earn the right to act. Add calibrated selective prediction with abstention, keep a strong baseline fallback, and optimize risk-at-coverage rather than forcing full automation in unknown regimes.