Slippage Modeling: Selective Prediction & Abstention Playbook
Trade Only When the Model Knows Enough — Otherwise Fall Back Safely
Why this note: In production, slippage models fail less from average error and more from overconfident errors during regime breaks. This playbook adds a reject option (abstention) to execution policy so the system can defer to robust baseline tactics when uncertainty is too high.
1) Failure Mode in One Sentence
If your slippage model is forced to predict and act in every state, it will eventually trade aggressively in out-of-distribution regimes and convert model overconfidence into real PnL leakage.
2) Core Idea: Add a Reject Option to Execution Decisions
At each decision time (t), choose either:
- a tactic (a \in \mathcal{A}) (passive/pegged/marketable/sweep variants), or
- abstain (\bot): hand control to a conservative baseline policy (\pi_b).
Define per-decision cost:
[ L_t(a) = IS_t(a) + \lambda \, Risk_t(a) ]
For abstention:
[ L_t(\bot)=L_t(\pi_b)+\kappa ]
Where:
- (IS_t): expected implementation shortfall component,
- (Risk_t): tail penalty (e.g., CVaR exceedance proxy),
- (\kappa): abstention friction (opportunity cost / reduced aggressiveness).
The system chooses a model tactic only when confidence exceeds the threshold; otherwise it falls back to (\pi_b).
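As a minimal Python sketch of this rule (all costs in bps; the weights `lam`, `kappa` and threshold `tau` below are illustrative placeholders, not tuned values):

```python
def decide(pred_is_bps: float, pred_risk_bps: float,
           baseline_cost_bps: float, confidence: float,
           lam: float = 0.5, kappa: float = 1.0, tau: float = 0.7):
    """Return ("act", cost) or ("abstain", cost) for one decision time."""
    cost_act = pred_is_bps + lam * pred_risk_bps      # L_t(a)
    cost_abstain = baseline_cost_bps + kappa          # L_t(bot)
    # Gate on confidence first: even a cheap-looking model action is
    # rejected when the model has not earned the right to act.
    if confidence < tau or cost_act > cost_abstain:
        return ("abstain", cost_abstain)
    return ("act", cost_act)
```

The confidence gate deliberately sits in front of the cost comparison: a low-support prediction of a low cost is exactly the overconfidence failure this playbook targets.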
3) Objective with Coverage Budget
A practical target is not "always predict," but bounded risk at controlled coverage:
[ \min \; \mathbb{E}[L_t] \quad \text{s.t.} \quad \mathbb{P}(\text{act}) \ge c_{min}, \; \mathbb{E}[L_t \mid \text{act}] \le r_{max} ]
Interpretation:
- Keep enough automation coverage (avoid abstaining on everything),
- Enforce quality on acted decisions,
- Let fallback absorb states the model does not understand.
This is the execution-ops equivalent of selective prediction / reject-option classification.
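A simple way to honor the coverage floor is to back the threshold (\tau) out of a recent window of confidence scores. This stdlib-only sketch assumes the window is representative of upcoming decisions:

```python
import math

def calibrate_threshold(confidences, c_min):
    """Highest tau such that empirical coverage P(C >= tau) >= c_min,
    estimated from a recent window of confidence scores."""
    xs = sorted(confidences, reverse=True)
    k = max(1, math.ceil(c_min * len(xs)))  # minimum number of acted decisions
    return xs[k - 1]                        # acting iff C >= tau keeps the top k
```

The quality constraint then comes from the other side: if (\mathbb{E}[L_t \mid \text{act}]) at that (\tau) exceeds (r_{max}), the coverage floor itself must be renegotiated rather than the threshold silently lowered.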
4) Confidence Stack (Do Not Use One Score Only)
Use a composite confidence signal (C_t) from three channels:
A) Aleatoric uncertainty (market randomness now)
- spread/depth instability,
- microprice variance,
- short-horizon volatility burst.
B) Epistemic uncertainty (model ignorance)
- ensemble disagreement,
- distance to training manifold,
- leaf occupancy / low support diagnostics.
C) Data quality confidence
- feature freshness age,
- missing-field ratio,
- market-data/order-path timestamp coherence,
- stale snapshot risk.
Then gate decisions with a calibrated threshold: act only when (C_t \ge \tau).
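One illustrative fusion is multiplicative, so that any single bad channel can veto a decision. The normalization convention below (each channel scored as a risk in [0, 1], with 1 = worst) is an assumption of this sketch, not a fixed recipe:

```python
def composite_confidence(aleatoric_risk: float, epistemic_risk: float,
                         data_risk: float) -> float:
    """Fuse the three channels into one gate score C_t in [0, 1].

    Each input is a normalized risk score in [0, 1] (1 = worst).
    Multiplicative fusion means a fully stale feed (data_risk = 1)
    zeroes out C_t regardless of how confident the model itself is.
    """
    return (1.0 - aleatoric_risk) * (1.0 - epistemic_risk) * (1.0 - data_risk)
```

Additive or learned fusions are equally valid; the key property to preserve is the veto, so that a strong model score can never mask dead market data.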
5) Telemetry Contract (Mandatory)
Decision facts
decision_ts, symbol, side, parent_id, child_id, remaining_qty, time_to_deadline_ms, urgency_state
Model outputs
pred_is_bps, pred_tail_q95_bps, confidence_score, coverage_bucket, uncertainty_aleatoric, uncertainty_epistemic
Abstention diagnostics
abstain_flag, abstain_reason (ood, low_support, stale_features, tail_risk, model_timeout), fallback_policy_id, fallback_action
Outcomes
realized_is_bps, realized_markout_5s/30s/120s, completion_prob_pred, completion_realized, deadline_breach_flag, forced_catchup_flag
Without explicit abstain reason codes, you cannot improve coverage policy safely.
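A minimal sketch of the abstention-diagnostics record as a Python dataclass. Field names mirror the contract above, but the types are illustrative and the symbol, policy id, and action values are hypothetical:

```python
from dataclasses import dataclass, asdict

@dataclass
class AbstainEvent:
    """Abstention-diagnostics record; schema is illustrative."""
    decision_ts: int          # epoch millis of the decision
    symbol: str
    abstain_flag: bool
    abstain_reason: str       # ood | low_support | stale_features | tail_risk | model_timeout
    fallback_policy_id: str
    fallback_action: str

# Hypothetical event: a stale-feature abstention routed to a baseline schedule.
evt = AbstainEvent(1700000000000, "XYZ", True, "stale_features",
                   "baseline_twap_v2", "reschedule_passive")
record = asdict(evt)          # plain dict, ready for structured logging
```

Keeping abstain_reason as a closed enum (rather than free text) is what makes the reason-mix KPIs in section 8 computable.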
6) Calibration and Guarantees
A) Confidence calibration
- Reliability curves by bucket (symbol × session × urgency),
- isotonic/Platt/temperature methods per model family.
B) Conformalized risk bands
Use conformal quantiles for finite-sample style risk control on predicted tail cost, refreshed on rolling windows.
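The rolling-window refresh can wrap a standard split-conformal quantile. This sketch implements the usual ceil((n+1)(1-alpha)) rank rule on cost residuals; exchangeability of the calibration window with the next decision is the operative assumption:

```python
import math

def conformal_margin(residuals, alpha=0.1):
    """Split-conformal additive margin: predicted_cost + margin covers the
    realized cost with probability >= 1 - alpha (finite-sample, assuming
    the calibration residuals are exchangeable with the next decision).

    residuals[i] = realized_cost_i - predicted_cost_i on a holdout window.
    """
    n = len(residuals)
    rank = math.ceil((n + 1) * (1 - alpha))   # standard conformal rank
    if rank > n:
        return float("inf")                   # window too small for this alpha
    return sorted(residuals)[rank - 1]
```

The infinite margin on a too-small window is a feature, not a bug: it forces abstention until enough calibration data has accumulated.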
C) Coverage guardrails
Set explicit bounds:
- coverage_floor (e.g., 65–80% depending on strategy),
- abstain_ceiling for prolonged fallback episodes,
- dynamic threshold widening during known stress windows.
D) Baseline safety floor
Fallback (\pi_b) must be audited, deterministic-enough, and capacity-tested. A weak baseline nullifies abstention benefits.
7) Runtime State Machine
NORMAL
- Standard threshold (\tau_0), normal tactic set.
CAUTION
Trigger: drift monitors or calibration warning.
- Raise threshold to (\tau_1 > \tau_0),
- reduce high-impact tactics,
- increase fallback share.
ABSTAIN_GUARD
Trigger: confidence collapse / stale critical features / severe undercoverage.
- Force abstention for affected buckets,
- route via conservative baseline schedule,
- page operator if duration > SLA.
RECOVERY
- Gradually lower threshold after calibration recovers,
- hysteresis + min-dwell to avoid thrash.
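The four states above can be sketched as a small class. The thresholds and dwell length are illustrative placeholders, and the two boolean triggers stand in for real drift and calibration monitors:

```python
class GateStateMachine:
    """NORMAL / CAUTION / ABSTAIN_GUARD / RECOVERY loop with hysteresis.
    tau0, tau1, and min_dwell are illustrative, not recommended values."""

    def __init__(self, tau0=0.6, tau1=0.8, min_dwell=50):
        self.tau0, self.tau1 = tau0, tau1
        self.state, self.tau = "NORMAL", tau0
        self.min_dwell = min_dwell   # minimum decisions between transitions
        self.dwell = 0

    def step(self, drift_alarm: bool, confidence_collapse: bool) -> str:
        self.dwell += 1
        if self.dwell < self.min_dwell:      # min-dwell: suppress thrash
            return self.state
        prev = self.state
        if confidence_collapse:
            self.state, self.tau = "ABSTAIN_GUARD", 1.0   # force abstention
        elif drift_alarm:
            self.state, self.tau = "CAUTION", self.tau1
        elif self.state in ("CAUTION", "ABSTAIN_GUARD"):
            self.state, self.tau = "RECOVERY", self.tau1  # keep tau wide first
        else:
            self.state, self.tau = "NORMAL", self.tau0
        if self.state != prev:
            self.dwell = 0                   # hysteresis: restart dwell clock
        return self.state
```

Note that RECOVERY keeps the widened threshold for at least one full dwell period before relaxing to NORMAL, which is what prevents alarm flapping from whipsawing the tactic set.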
8) KPIs That Actually Matter
Selective IS: [ \mathbb{E}[IS \mid \text{act}] ]
Coverage: [ \mathbb{P}(\text{act}) ]
Risk-at-Coverage (RaC): tail loss conditional on the acted subset.
Abstention Regret vs. Baseline: difference between fallback realized cost and the model-acted counterfactual estimate (with conservative bounds).
Bad Act Rate (BAR): share of acted decisions whose realized loss exceeds the abstention fallback by more than a set threshold.
Reason Mix Stability: share of abstain reasons over time (an ood spike is a regime-shift alarm).
Monitor these by liquidity bucket and session segment; aggregate daily averages hide failure clusters.
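Given the telemetry contract from section 5, the first three of these KPIs fall directly out of the decision logs. The record keys below are illustrative, and the counterfactual fallback_is_bps is assumed to come from a conservative baseline estimator:

```python
def kpis(decisions):
    """Coverage, selective IS, and Bad Act Rate from decision logs.
    Each record is a dict with (illustrative) keys: acted,
    realized_is_bps, fallback_is_bps, bar_threshold_bps."""
    acted = [d for d in decisions if d["acted"]]
    coverage = len(acted) / len(decisions)
    if not acted:
        return {"coverage": 0.0, "selective_is_bps": None, "bad_act_rate": None}
    selective_is = sum(d["realized_is_bps"] for d in acted) / len(acted)
    bad = sum(d["realized_is_bps"] - d["fallback_is_bps"] > d["bar_threshold_bps"]
              for d in acted)
    return {"coverage": coverage,
            "selective_is_bps": selective_is,
            "bad_act_rate": bad / len(acted)}
```

Run this per liquidity bucket and session segment, not on the pooled log, for exactly the reason stated above: daily averages hide failure clusters.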
9) Rollout Blueprint (Low-Regret)
- Shadow stage
- Log confidence + abstain decisions, but do not enforce.
- Soft gate
- Enforce abstain only for extreme low-confidence tail.
- Bucket canary
- Enable by symbols/venues with notional caps.
- Hard gate promotion
- Require improved tail metrics with bounded coverage loss.
- Incident drills
- Simulate data staleness/OOD bursts and verify automatic fallback.
10) Common Mistakes
- Using uncalibrated model score as "confidence".
- Penalizing abstention so hard that system never abstains.
- Ignoring fallback policy quality and assuming abstention is free.
- Tracking only mean IS and missing tail conditional on acted subset.
- Letting coverage collapse silently during drift.
11) Fast Implementation Checklist
[ ] Add abstain action with explicit fallback policy mapping
[ ] Build confidence stack (aleatoric + epistemic + data quality)
[ ] Calibrate confidence and conformalize tail-risk estimates
[ ] Define coverage floor / abstain ceiling / incident thresholds
[ ] Log abstain reasons as first-class telemetry
[ ] Roll out via shadow -> soft gate -> canary -> promotion
[ ] Gate promotion on tail-risk improvement, not mean-only IS
References
- Chow, C. K. (1970), On Optimum Recognition Error and Reject Tradeoff.
- Geifman, Y., El-Yaniv, R. (2017), Selective Classification for Deep Neural Networks.
- Vovk, V., Gammerman, A., Shafer, G. (2005), Algorithmic Learning in a Random World (conformal prediction).
- Angelopoulos, A. N., et al. (2022), Conformal Risk Control.
- Laroche, R., Trichelair, P., Des Combes, R. T. (2019), Safe Policy Improvement with Baseline Bootstrapping (SPIBB).
- Gatheral, J. (2010), No-Dynamic-Arbitrage and Market Impact.
TL;DR
A slippage model should earn the right to act. Add calibrated selective prediction with abstention, keep a strong baseline fallback, and optimize risk-at-coverage rather than forcing full automation in unknown regimes.