Slippage Modeling Selective Prediction & Abstention Playbook

2026-03-29 · finance

Trade Only When the Model Knows Enough — Otherwise Fall Back Safely

Why this note: In production, slippage models fail less from average error and more from overconfident errors during regime breaks. This playbook adds a reject option (abstention) to the execution policy so the system can defer to robust baseline tactics when uncertainty is too high.


1) Failure Mode in One Sentence

If your slippage model is forced to predict and act in every state, it will eventually trade aggressively in out-of-distribution regimes and convert model overconfidence into real PnL leakage.


2) Core Idea: Add a Reject Option to Execution Decisions

At each decision time (t), choose one of:

  • Act: execute the model-recommended execution tactic (a).
  • Abstain ((\bot)): defer to the baseline policy (\pi_b).

Define the per-decision cost:

[ L_t(a)=IS_t(a)+\lambda\,\mathrm{Risk}_t(a) ]

For abstention:

[ L_t(\bot)=L_t(\pi_b)+\kappa ]

Where:

  • (IS_t(a)): expected implementation shortfall of action (a),
  • (\mathrm{Risk}_t(a)): risk penalty, weighted by the aversion parameter (\lambda),
  • (\pi_b): the robust baseline (fallback) execution policy,
  • (\kappa): a fixed overhead charged for abstaining.

The system acts on the model only when confidence exceeds a calibrated threshold; otherwise it falls back to (\pi_b).
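The gate can be sketched in a few lines of Python; the constants (`LAMBDA`, `KAPPA`, `TAU`) and the cost inputs are illustrative placeholders, not production values:

```python
from dataclasses import dataclass

LAMBDA = 0.5  # risk-aversion weight lambda (illustrative)
KAPPA = 0.2   # fixed abstention overhead kappa (illustrative)
TAU = 0.7     # calibrated confidence threshold tau (illustrative)

@dataclass
class Decision:
    action: str           # "act" or "abstain"
    expected_cost: float  # L_t of the chosen branch

def cost(is_cost: float, risk: float) -> float:
    """Per-decision cost L_t(a) = IS_t(a) + lambda * Risk_t(a)."""
    return is_cost + LAMBDA * risk

def choose(model_is: float, model_risk: float,
           baseline_is: float, baseline_risk: float,
           confidence: float) -> Decision:
    """Act on the model only when confidence clears tau and the model
    branch is not worse than the fallback; otherwise defer to pi_b."""
    l_model = cost(model_is, model_risk)
    l_abstain = cost(baseline_is, baseline_risk) + KAPPA
    if confidence >= TAU and l_model <= l_abstain:
        return Decision("act", l_model)
    return Decision("abstain", l_abstain)
```

Note the extra cost comparison: even a confident model should not act when its own cost estimate is worse than the fallback plus overhead.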


3) Objective with Coverage Budget

A practical target is not "always predict," but bounded risk at controlled coverage:

[ \min \; \mathbb{E}[L_t] \quad \text{s.t.} \quad \mathbb{P}(\text{act}) \ge c_{min}, \; \mathbb{E}[L_t \mid \text{act}] \le r_{max} ]

Interpretation:

  1. Keep enough automation coverage (avoid abstaining on everything),
  2. Enforce quality on acted decisions,
  3. Let fallback absorb states the model does not understand.

This is the execution-ops equivalent of selective prediction / reject-option classification.
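One way to pick an operating threshold (\tau) against this budget is a grid scan over historical decision logs. The helper below is a sketch, with `c_min` and `r_max` standing in for (c_{min}) and (r_{max}):

```python
import numpy as np

def pick_threshold(conf, acted_cost, fallback_cost,
                   c_min=0.6, r_max=2.0):
    """Scan candidate thresholds tau over observed confidences; keep
    those meeting the coverage floor P(act) >= c_min and the acted-risk
    cap E[L | act] <= r_max, then return the feasible tau with the
    lowest overall expected cost (fallback cost where we abstain)."""
    conf = np.asarray(conf, dtype=float)
    acted_cost = np.asarray(acted_cost, dtype=float)
    fallback_cost = np.asarray(fallback_cost, dtype=float)
    best_tau, best_loss = None, np.inf
    for tau in np.unique(conf):
        act = conf >= tau
        if act.mean() < c_min:
            continue  # violates coverage floor
        if acted_cost[act].mean() > r_max:
            continue  # violates acted-risk cap
        loss = np.where(act, acted_cost, fallback_cost).mean()
        if loss < best_loss:
            best_tau, best_loss = tau, loss
    return best_tau, best_loss
```

In practice the scan should be run per liquidity bucket, since a single global (\tau) hides the failure clusters called out later in this note.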


4) Confidence Stack (Do Not Use One Score Only)

Use a composite confidence signal (C_t) from three channels:

A) Aleatoric uncertainty (market randomness now), e.g., predicted cost dispersion under current spread and volatility.

B) Epistemic uncertainty (model ignorance), e.g., ensemble disagreement or distance from the training regime (OOD).

C) Data quality confidence, e.g., feed staleness and missing critical features.

Then gate decisions with a calibrated threshold: act only when (C_t \ge \tau).
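A minimal composite, assuming each channel is already normalized to [0, 1]; the weights are illustrative and should be calibrated offline:

```python
def composite_confidence(aleatoric_u: float, epistemic_u: float,
                         data_quality: float,
                         w=(0.4, 0.4, 0.2)) -> float:
    """Combine the three channels into one score C_t in [0, 1].

    aleatoric_u, epistemic_u: uncertainties in [0, 1] (higher = worse).
    data_quality: quality score in [0, 1] (higher = better).
    Weights w are illustrative assumptions, not calibrated values."""
    c = (w[0] * (1.0 - aleatoric_u)
         + w[1] * (1.0 - epistemic_u)
         + w[2] * data_quality)
    return max(0.0, min(1.0, c))

def gate(c_t: float, tau: float = 0.7) -> bool:
    """Act only when the composite confidence clears the threshold."""
    return c_t >= tau
```

A common refinement is a hard veto (return 0) when a critical feed is stale, regardless of the weighted sum.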


5) Telemetry Contract (Mandatory)

Log, at minimum, four groups of fields per decision:

  • Decision facts (e.g., timestamp, instrument/bucket, act vs. abstain).
  • Model outputs (e.g., predicted cost and confidence (C_t)).
  • Abstention diagnostics (e.g., reason code and which confidence channel tripped).
  • Outcomes (e.g., realized IS and fallback cost).

Without explicit abstain reason codes, you cannot improve coverage policy safely.
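A sketch of such a record; the field names and reason codes below are hypothetical, not a fixed schema:

```python
from dataclasses import dataclass, asdict
from enum import Enum
from typing import Optional

class AbstainReason(str, Enum):
    """Hypothetical first-class abstain reason codes."""
    NONE = "none"
    LOW_CONFIDENCE = "low_confidence"
    STALE_FEATURES = "stale_features"
    OOD = "ood"

@dataclass
class DecisionRecord:
    ts_ns: int                    # decision timestamp
    acted: bool                   # act vs. abstain
    confidence: float             # composite C_t at decision time
    predicted_cost: float         # model's L_t estimate
    abstain_reason: AbstainReason
    realized_cost: Optional[float] = None  # filled in post-trade

    def to_row(self) -> dict:
        """Flatten for the telemetry sink."""
        return asdict(self)
```

Because the reason code is an enum rather than free text, the reason-mix KPI in Section 8 becomes a trivial group-by.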


6) Calibration and Guarantees

A) Confidence calibration

Recalibrate (C_t) on rolling windows so that the threshold (\tau) keeps a stable meaning across regimes.

B) Conformalized risk bands

Use conformal quantiles for finite-sample style risk control on predicted tail cost, refreshed on rolling windows.
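A rolling split-conformal band on realized cost can be maintained with a bounded deque of nonconformity scores; the window and alpha below are illustrative:

```python
from collections import deque
import math

class RollingConformal:
    """Rolling-window split-conformal upper band on realized cost.

    Stores the last `window` nonconformity scores (realized - predicted)
    and returns predicted + the finite-sample (1 - alpha) quantile."""

    def __init__(self, window: int = 500, alpha: float = 0.1):
        self.scores = deque(maxlen=window)
        self.alpha = alpha

    def update(self, predicted: float, realized: float) -> None:
        """Refresh the window with a new (prediction, outcome) pair."""
        self.scores.append(realized - predicted)

    def upper(self, predicted: float) -> float:
        """Conformal upper bound on realized cost for this prediction."""
        n = len(self.scores)
        if n == 0:
            return float("inf")  # no history: maximally conservative
        # standard split-conformal rank: ceil((n + 1) * (1 - alpha))
        k = min(n, math.ceil((n + 1) * (1 - self.alpha)))
        return predicted + sorted(self.scores)[k - 1]
```

The `float("inf")` default makes a cold-started or flushed window abstain-by-construction, which is the safe direction for this playbook.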

C) Coverage guardrails

Set explicit bounds: a coverage floor ((c_{min})), an abstain-rate ceiling, and incident-escalation thresholds.

D) Baseline safety floor

Fallback (\pi_b) must be audited, deterministic-enough, and capacity-tested. A weak baseline nullifies abstention benefits.


7) Runtime State Machine

NORMAL

Behavior: act whenever (C_t \ge \tau); standard monitoring.

CAUTION

Trigger: drift monitors or calibration warning.
Behavior: tighten the acting threshold and cap notional.

ABSTAIN_GUARD

Trigger: confidence collapse / stale critical features / severe undercoverage.
Behavior: force fallback to (\pi_b) for the affected buckets.

RECOVERY

Behavior: gradually restore coverage once calibration and drift metrics re-stabilize.
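The modes can be encoded as a small transition function; the trigger flags and the single `metrics_stable` recovery condition are a simplified sketch, not the full rule set:

```python
from enum import Enum, auto

class Mode(Enum):
    NORMAL = auto()
    CAUTION = auto()
    ABSTAIN_GUARD = auto()
    RECOVERY = auto()

def next_mode(mode: Mode, drift_warn: bool, confidence_collapse: bool,
              stale_features: bool, metrics_stable: bool) -> Mode:
    """One-step transition: severe triggers escalate immediately,
    recovery back to NORMAL is gradual (via RECOVERY)."""
    if confidence_collapse or stale_features:
        return Mode.ABSTAIN_GUARD          # hard guard dominates
    if mode is Mode.ABSTAIN_GUARD:
        return Mode.RECOVERY if metrics_stable else Mode.ABSTAIN_GUARD
    if mode is Mode.RECOVERY:
        return Mode.NORMAL if metrics_stable else Mode.CAUTION
    if drift_warn:
        return Mode.CAUTION
    if mode is Mode.CAUTION:
        return Mode.NORMAL if metrics_stable else Mode.CAUTION
    return Mode.NORMAL
```

The key design property is asymmetry: one bad signal escalates in a single step, while de-escalation always passes through an intermediate state.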


8) KPIs That Actually Matter

  1. Selective IS: [ \mathbb{E}[IS \mid \text{act}] ]

  2. Coverage: [ \mathbb{P}(\text{act}) ]

  3. Risk-at-Coverage (RaC): tail loss conditional on the acted subset.

  4. Abstention Regret vs. Baseline: difference between fallback realized cost and the model-acted counterfactual estimate (with conservative bounds).

  5. Bad Act Rate (BAR): rate of acted decisions whose realized loss exceeds the abstention fallback by a threshold.

  6. Reason Mix Stability: share of abstain reasons over time (an OOD spike is a regime-shift alarm).

Monitor these by liquidity bucket and session segment; aggregated daily averages hide failure clusters.
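Given per-decision logs, the headline KPIs reduce to a few array reductions. The `bar_threshold` value and the counterfactual `fallback_cost` estimate are assumptions of this sketch:

```python
import numpy as np

def kpis(acted, realized_cost, fallback_cost, bar_threshold=0.5):
    """Compute coverage, selective IS, and Bad Act Rate from logs.

    acted: bool array, act vs. abstain per decision.
    realized_cost: cost of the decision actually taken.
    fallback_cost: (counterfactual) baseline cost estimate per decision."""
    acted = np.asarray(acted, dtype=bool)
    realized_cost = np.asarray(realized_cost, dtype=float)
    fallback_cost = np.asarray(fallback_cost, dtype=float)
    coverage = acted.mean()                               # P(act)
    selective_is = (realized_cost[acted].mean()
                    if acted.any() else float("nan"))     # E[IS | act]
    # Bad Act Rate: acted decisions whose realized loss exceeds the
    # fallback estimate by more than bar_threshold
    bad = acted & (realized_cost > fallback_cost + bar_threshold)
    bar = bad.sum() / max(acted.sum(), 1)
    return {"coverage": coverage, "selective_is": selective_is, "bar": bar}
```

Run this per liquidity bucket and session segment, not just globally, for the reason stated above.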


9) Rollout Blueprint (Low-Regret)

  1. Shadow stage
    • Log confidence + abstain decisions, but do not enforce.
  2. Soft gate
    • Enforce abstain only for extreme low-confidence tail.
  3. Bucket canary
    • Enable by symbols/venues with notional caps.
  4. Hard gate promotion
    • Require improved tail metrics with bounded coverage loss.
  5. Incident drills
    • Simulate data staleness/OOD bursts and verify automatic fallback.
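The staged path above can be driven by a small promotion gate; the stage names, thresholds, and coverage-loss budget here are illustrative:

```python
# Hypothetical stage configs for the shadow -> hard-gate rollout path.
STAGES = {
    "shadow":    {"enforce": False, "tau": None, "notional_cap": 0.0},
    "soft_gate": {"enforce": True,  "tau": 0.3,  "notional_cap": None},
    "canary":    {"enforce": True,  "tau": 0.7,  "notional_cap": 1e6},
    "hard_gate": {"enforce": True,  "tau": 0.7,  "notional_cap": None},
}

def promote(stage: str, tail_improved: bool, coverage_loss: float,
            max_coverage_loss: float = 0.1) -> str:
    """Advance one stage only when tail metrics improved AND the
    coverage loss stays within budget; otherwise hold the stage."""
    order = list(STAGES)
    i = order.index(stage)
    if tail_improved and coverage_loss <= max_coverage_loss and i + 1 < len(order):
        return order[i + 1]
    return stage
```

Note that the gate keys on tail improvement, matching the checklist rule of never promoting on mean-only IS.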

10) Common Mistakes

  • Gating on a single uncertainty score instead of the full confidence stack.
  • Promoting on mean IS improvement while tail risk worsens.
  • Treating the fallback (\pi_b) as an afterthought; a weak baseline nullifies abstention benefits.
  • Shipping without abstain reason codes, which makes coverage policy impossible to improve safely.
  • Letting the abstain rate creep up until automation coverage collapses.


11) Fast Implementation Checklist

[ ] Add abstain action with explicit fallback policy mapping
[ ] Build confidence stack (aleatoric + epistemic + data quality)
[ ] Calibrate confidence and conformalize tail-risk estimates
[ ] Define coverage floor / abstain ceiling / incident thresholds
[ ] Log abstain reasons as first-class telemetry
[ ] Roll out via shadow -> soft gate -> canary -> promotion
[ ] Gate promotion on tail-risk improvement, not mean-only IS


TL;DR

A slippage model should earn the right to act. Add calibrated selective prediction with abstention, keep a strong baseline fallback, and optimize risk-at-coverage rather than forcing full automation in unknown regimes.