The Fill-Probability Trap: Why Easy Fills Can Be Bad Fills

Date: 2026-03-08
Category: explore (market microstructure)

Why this is interesting

A lot of execution logic still assumes:

high fill probability = good
low fill probability = bad

But in modern limit order books, this can fail hard. Sometimes the orders that fill fastest are exactly the ones with the worst post-fill markout.

This note explores that paradox and how to operationalize it.

Core intuition: queue mechanics create a hidden trade-off

At the touch (best bid/ask), your passive order quality depends on two competing effects:

Execution likelihood (you want this high)
Post-fill return quality / toxicity (you want post-fill markout that is not too negative)

In stressed or one-sided books, these often move in opposite directions.

If your quote is very likely to be hit, it may be because informed/toxic flow is about to run through that level.

Evidence stack (papers + practitioner tooling)

1) Order-flow imbalance drives short-horizon price moves

Cont, Kukanov, and Stoikov (2011/2014) show that short-horizon price changes are strongly linked to order-flow imbalance (OFI): the relation is approximately linear in OFI, with a slope inversely proportional to market depth.

Implication: your fill probability is not independent of future markout; it co-moves with local supply/demand pressure.

Source: https://arxiv.org/abs/1011.6402
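
As a rough illustration of the quantity involved, the sketch below computes OFI from consecutive top-of-book snapshots using the per-update contribution defined in the paper. The snapshot layout (`bid_price, bid_size, ask_price, ask_size`) and the function names are assumptions made for this example, not part of any library.

```python
from typing import List, Tuple

# Assumed snapshot layout: (bid_price, bid_size, ask_price, ask_size)
Quote = Tuple[float, float, float, float]


def ofi_contributions(quotes: List[Quote]) -> List[float]:
    """Per-update OFI contributions between consecutive top-of-book snapshots."""
    out = []
    for prev, cur in zip(quotes, quotes[1:]):
        pb0, qb0, pa0, qa0 = prev
        pb1, qb1, pa1, qa1 = cur
        e = 0.0
        # Bid side: a higher or unchanged bid adds the new bid size,
        # a lower or unchanged bid removes the previous bid size.
        if pb1 >= pb0:
            e += qb1
        if pb1 <= pb0:
            e -= qb0
        # Ask side: mirror image with opposite sign.
        if pa1 <= pa0:
            e -= qa1
        if pa1 >= pa0:
            e += qa0
        out.append(e)
    return out


def ofi(quotes: List[Quote]) -> float:
    """Order-flow imbalance over the whole window: sum of contributions."""
    return sum(ofi_contributions(quotes))
```

Positive OFI indicates net buying pressure (bid size building or ask size shrinking). A resting ask in that state is more likely to fill, and more likely to fill just before the price moves up.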

2) Queue-reactive dynamics: state matters, not just time averages

Huang, Lehalle, and Rosenbaum model the limit order book as a state-dependent queuing system (the queue-reactive model), in which order-flow intensities depend on the current queue state.

Implication: fill outcomes should be conditioned on queue state, not global unconditional rates.

Source: https://arxiv.org/abs/1312.0563
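
To make "conditioned on queue state" concrete, here is a minimal sketch of a count-over-time intensity estimate per queue-size bin. The observation layout and the event labels are assumptions for illustration, and the estimator is only in the spirit of the queue-reactive approach, not a reproduction of the paper's procedure.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Assumed observation layout:
# (seconds spent in the state, queue-size bin, event type that ended the interval)
Obs = Tuple[float, int, str]


def state_intensities(obs: List[Obs]) -> Dict[int, Dict[str, float]]:
    """Events per second for each event type, conditioned on the queue-size bin."""
    time_in_state: Dict[int, float] = defaultdict(float)
    counts: Dict[int, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for dt, state, event in obs:
        time_in_state[state] += dt
        counts[state][event] += 1
    return {
        state: {ev: n / time_in_state[state] for ev, n in events.items()}
        for state, events in counts.items()
        if time_in_state[state] > 0
    }
```

Comparing these per-state rates with a single pooled rate shows how much information the unconditional average throws away.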

3) Fill probabilities can be modeled semi-analytically under state-dependent flows

Yu et al. derive tractable expressions for fill probabilities at best and deeper levels in state-dependent stochastic LOB models, validated on FX data.

Implication: “probability of fill by horizon T” can be treated as a first-class model output in execution control.

Source: https://arxiv.org/abs/2403.02572
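
The paper's semi-analytic expressions are not reproduced here. As a stand-in, the sketch below estimates "probability of fill by horizon T" by Monte Carlo in a deliberately simple toy model: market orders arrive at a constant rate and remove one unit from the front of the queue, while each unit ahead of us cancels at its own rate. All parameter names and the model itself are assumptions for illustration.

```python
import random


def fill_probability(q_ahead: int, horizon: float, market_rate: float,
                     cancel_rate_per_unit: float, n_paths: int = 20000,
                     seed: int = 0) -> float:
    """Monte Carlo estimate of P(our order fills within `horizon` seconds).

    Toy assumptions: unit-size market orders at rate `market_rate` (> 0),
    cancellations ahead of us at rate `cancel_rate_per_unit` times the
    number of units still ahead, and no new orders joining ahead of us.
    """
    rng = random.Random(seed)
    fills = 0
    for _ in range(n_paths):
        t, n = 0.0, q_ahead
        while True:
            total = market_rate + cancel_rate_per_unit * n
            t += rng.expovariate(total)  # time to the next event
            if t > horizon:
                break
            if rng.random() < market_rate / total:
                if n == 0:  # nothing ahead: the market order hits us
                    fills += 1
                    break
                n -= 1      # market order consumes a unit ahead of us
            else:
                n -= 1      # a unit ahead of us cancels
    return fills / n_paths
```

Because the cancel rate scales with the queue ahead, the event intensities are state-dependent, which is the feature that matters for the argument; a production model would calibrate these rates per queue state.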

4) Live crypto evidence of the paradox

Albers et al. (2025) report a negative correlation between maker fill likelihood and post-fill returns on live Binance BTC perpetual experiments, framing a practical “market maker’s dilemma.”

Implication: optimizing fill-rate alone can destroy net edge.

Source: https://arxiv.org/abs/2502.18625

5) Practical backtest support: queue model choice matters

hftbacktest explicitly exposes multiple queue-position models (risk-averse, probabilistic variants), and documents that model choice materially changes simulated fills and performance.

Implication: naive queue assumptions create fake strategy quality.

Sources:

The paradox in one equation

If the objective is only:

\[ \max \; P(\text{fill in } T) \]

you often end up selecting states with poor conditional post-fill outcomes.

A safer objective is:

\[ \max_a \; \mathbb{E}[\text{NetEdge} \mid a] = \mathbb{E}[\text{spread capture} - \text{markout} - \text{fees/slippage} \mid a] \]

with explicit constraints on completion risk and tail outcomes.
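
A small worked comparison shows how the two objectives can disagree. The numbers are invented for illustration, markout is treated as a cost (positive means adverse), and the sketch assumes an unfilled order earns zero, so the per-order edge is the fill probability times the per-fill edge.

```python
from dataclasses import dataclass


@dataclass
class ActionStats:
    name: str
    p_fill: float          # probability of a fill within the horizon
    spread_capture: float  # expected capture per fill, in bps
    markout_cost: float    # expected adverse markout per fill, in bps
    fees: float            # fees and slippage per fill, in bps


def expected_net_edge(a: ActionStats) -> float:
    """Per-order expected net edge, assuming unfilled orders earn zero."""
    return a.p_fill * (a.spread_capture - a.markout_cost - a.fees)


# Hypothetical actions: an easy fill in a toxic state vs a slower, cleaner one.
actions = [
    ActionStats("join_touch_now", p_fill=0.9, spread_capture=1.0, markout_cost=1.2, fees=0.1),
    ActionStats("wait_for_balance", p_fill=0.4, spread_capture=1.0, markout_cost=0.3, fees=0.1),
]

best_by_fill = max(actions, key=lambda a: a.p_fill)
best_by_edge = max(actions, key=expected_net_edge)
```

Here the fill-maximizing choice is `join_touch_now`, whose expected net edge is negative, while the edge-maximizing choice is `wait_for_balance`.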

Practical operator playbook

1) Track both sides of the trade-off

At minimum, maintain joint dashboards for:

fill probability by horizon (e.g., 100ms/1s/5s)
post-fill markout ladder (e.g., 100ms/1s/5s/30s)
queue state bins (imbalance, depth, cancel intensity)
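
A markout ladder for a single fill can be sketched as below. The inputs (a sorted mid-price time series and a fill record) and the sign convention (positive means the price moved in our favor) are assumptions for this example.

```python
from bisect import bisect_right
from typing import Dict, Sequence


def markout_ladder(times: Sequence[float], mids: Sequence[float],
                   fill_time: float, fill_price: float, side: int,
                   horizons: Sequence[float] = (0.1, 1.0, 5.0, 30.0)) -> Dict[float, float]:
    """Signed markout of one fill at each horizon (in seconds).

    `times` must be sorted ascending and aligned with `mids`.
    `side` is +1 for a buy fill and -1 for a sell fill.
    Horizons that fall outside the observed data are skipped.
    """
    out: Dict[float, float] = {}
    for h in horizons:
        target = fill_time + h
        i = bisect_right(times, target) - 1  # last mid at or before target
        if i < 0 or times[-1] < target:
            continue  # no observation yet, or the series ends too early
        out[h] = side * (mids[i] - fill_price)
    return out
```

Averaging these ladders per queue-state bin, next to the fill probability for the same bin, gives the joint view the dashboard needs.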

2) Stop using fill-rate as a standalone KPI

Use a paired KPI:

FillQuality = FillRate × (markout-penalty-adjusted edge per fill)

or directly optimize expected net edge subject to guardrails on the q90/q95 markout quantiles.

3) Introduce a contrarian safety mode

When imbalance/toxicity proxies are extreme:

reduce passive posting size,
shorten quote lifetime,
prefer less adverse queue states,
allow selective crossing only when delay cost dominates.
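
One way to express such a mode is a pure function from market state to quoting parameters. The sketch below is hypothetical throughout: the threshold, the size and lifetime multipliers, and the field names are placeholders, not recommended values. It covers size, lifetime, and crossing; choosing less adverse queue states is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuoteParams:
    size: float
    lifetime_ms: int
    allow_cross: bool


def safety_mode(base: QuoteParams, adverse_imbalance: float,
                delay_cost_bps: float, cross_cost_bps: float,
                extreme: float = 0.8) -> QuoteParams:
    """Return defensive quoting parameters when imbalance is extreme.

    `adverse_imbalance` is assumed to lie in [-1, 1], signed so that
    positive values mean pressure against our resting quote.
    """
    if adverse_imbalance < extreme:
        return base
    return QuoteParams(
        size=base.size * 0.5,                       # post less
        lifetime_ms=max(1, base.lifetime_ms // 4),  # rest for less time
        # cross only when waiting is estimated to cost more than crossing
        allow_cross=delay_cost_bps > cross_cost_bps,
    )
```

Keeping the policy stateless makes it easy to replay against recorded book states before it touches live orders.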

4) Calibrate queue model against live, not only backtest

The backtest queue model should be selected by minimizing the mismatch in fills and markouts versus live trading, not by maximizing backtest Sharpe.
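
A minimal sketch of that selection rule follows. It assumes each candidate queue model has already been run in backtest over the same period as a live sample, and that both are summarized as a fill rate and a mean markout. The model labels, the distance function, and its weight are placeholders, not hftbacktest identifiers.

```python
from typing import Dict, Tuple

# Assumed summary: (fill_rate, mean_markout_bps)
Stats = Tuple[float, float]


def mismatch(sim: Stats, live: Stats, markout_weight: float = 1.0) -> float:
    """Weighted absolute gap between simulated and live statistics."""
    return abs(sim[0] - live[0]) + markout_weight * abs(sim[1] - live[1])


def select_queue_model(sim_by_model: Dict[str, Stats], live: Stats,
                       markout_weight: float = 1.0) -> str:
    """Pick the candidate whose simulated fills and markouts best match live."""
    return min(sim_by_model,
               key=lambda name: mismatch(sim_by_model[name], live, markout_weight))
```

Note that simulated PnL never enters the selection; the only criterion is agreement with what actually happened live.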

5) Keep model governance simple but strict

Promote policy changes only if all hold in canary:

completion non-inferior,
q95 markout not worse,
expected net edge improved,
no reject/cancel explosion.
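
The four conditions translate directly into a promotion gate. In this sketch the field names, the non-inferiority margin, and the cap on reject/cancel growth are assumptions; "explosion" is interpreted, arbitrarily, as more than 1.5 times the control rate.

```python
from dataclasses import dataclass


@dataclass
class CanaryStats:
    completion: float          # fraction of parent orders completed
    q95_markout_cost: float    # 95th percentile adverse markout, bps (higher is worse)
    net_edge: float            # expected net edge, bps
    reject_cancel_rate: float  # rejects plus cancels per order


def promote(control: CanaryStats, canary: CanaryStats,
            completion_margin: float = 0.01,
            reject_cancel_cap: float = 1.5) -> bool:
    """True only if every governance condition holds for the canary."""
    return (
        canary.completion >= control.completion - completion_margin
        and canary.q95_markout_cost <= control.q95_markout_cost
        and canary.net_edge > control.net_edge
        and canary.reject_cancel_rate <= control.reject_cancel_rate * reject_cancel_cap
    )
```

A real gate would also account for sampling noise, for example by comparing confidence intervals rather than point estimates.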

Minimal experiment design (for future implementation)

Partition events by queue-state deciles (imbalance + depth + cancel intensity).
For each bucket, estimate:
- fill probability by horizon,
- conditional markout distribution.
Compute Pareto frontier: high fill vs low toxicity.
Deploy policy that stays on/near frontier.
Monitor drift and retrain bucket mapping periodically.
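
The estimation and frontier steps above can be sketched as follows, assuming each posted order has already been labeled with a queue-state bucket. The event layout and bucket names are hypothetical, and markout is signed so that positive is favorable.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Assumed event layout: (bucket label, filled?, markout if filled else ignored)
Event = Tuple[str, bool, float]


def summarize(events: List[Event]) -> Dict[str, Tuple[float, float]]:
    """Per bucket: (fill probability, mean markout conditional on a fill)."""
    agg = defaultdict(lambda: [0, 0, 0.0])  # orders, fills, markout sum
    for bucket, filled, markout in events:
        a = agg[bucket]
        a[0] += 1
        if filled:
            a[1] += 1
            a[2] += markout
    # Buckets with no fills have no conditional markout and are dropped.
    return {b: (f / n, s / f) for b, (n, f, s) in agg.items() if f > 0}


def pareto_frontier(buckets: Dict[str, Tuple[float, float]]) -> List[str]:
    """Buckets not dominated on (fill probability, mean markout)."""
    frontier = []
    for name, (p, m) in buckets.items():
        dominated = any(
            p2 >= p and m2 >= m and (p2 > p or m2 > m)
            for other, (p2, m2) in buckets.items()
            if other != name
        )
        if not dominated:
            frontier.append(name)
    return frontier
```

A fuller version would use the whole conditional markout distribution (for example a lower quantile) rather than the mean, and would build the buckets from deciles of imbalance, depth, and cancel intensity.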

Bottom line

The key microstructure lesson:

Fast fills are not automatically good fills.

Queue position, state-dependent flows, and toxicity make fill probability and post-fill quality a coupled control problem. The edge is not “fill more,” but “fill selectively where conditional markout is survivable.”