The Fill-Probability Trap: Why Easy Fills Can Be Bad Fills
Date: 2026-03-08
Category: explore (market microstructure)
Why this is interesting
A lot of execution logic still assumes:
- high fill probability = good
- low fill probability = bad
But in modern limit order books, this can fail hard. Sometimes the orders that fill fastest are exactly the ones with the worst post-fill markout.
This note explores that paradox and how to operationalize it.
Core intuition: queue mechanics create a hidden trade-off
At the touch (best bid/ask), your passive order quality depends on two competing effects:
- Execution likelihood (you want this high)
- Post-fill return quality / toxicity (you want this not-too-negative)
In stressed or one-sided books, these often move in opposite directions.
If your quote is very likely to be hit, it may be because informed/toxic flow is about to run through that level.
Evidence stack (papers + practitioner tooling)
1) Order-flow imbalance drives short-horizon price moves
Cont, Kukanov, and Stoikov (2011/2014) show that short-horizon price changes are strongly linked to order-flow imbalance (OFI) and approximately linear in OFI, with a slope inversely proportional to market depth.
Implication: your fill probability is not independent of future markout; it co-moves with local supply/demand pressure.
Source: https://arxiv.org/abs/1011.6402
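As a concrete handle on OFI, here is a minimal sketch of the per-update contribution as defined in the paper, computed from consecutive best-quote snapshots. The `Quote` type and the function names are illustrative assumptions, not taken from any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quote:
    bid_px: float
    bid_sz: float
    ask_px: float
    ask_sz: float


def ofi_increment(prev: Quote, cur: Quote) -> float:
    """Signed contribution of one best-quote update to order-flow imbalance."""
    e = 0.0
    # Bid side: a bid that holds or improves contributes its current size,
    # a bid that holds or drops gives back the size that was there before.
    if cur.bid_px >= prev.bid_px:
        e += cur.bid_sz
    if cur.bid_px <= prev.bid_px:
        e -= prev.bid_sz
    # Ask side: same logic with the sign flipped (added supply is negative).
    if cur.ask_px <= prev.ask_px:
        e -= cur.ask_sz
    if cur.ask_px >= prev.ask_px:
        e += prev.ask_sz
    return e


def ofi(quotes: list[Quote]) -> float:
    """Sum the increments over a window of consecutive snapshots."""
    return sum(ofi_increment(a, b) for a, b in zip(quotes, quotes[1:]))


window = [
    Quote(100.0, 5.0, 101.0, 5.0),
    Quote(100.0, 8.0, 101.0, 5.0),  # 3 lots added at the bid
    Quote(100.0, 8.0, 101.0, 2.0),  # 3 lots removed from the ask
]
print(ofi(window))  # 6.0, i.e. net buy pressure
```

A resting bid in this window is less likely to fill and a resting ask more likely, which is exactly the state where an ask fill tends to be followed by an upward move.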
2) Queue-reactive dynamics: state matters, not just time averages
Huang, Lehalle, Rosenbaum model the book as a state-dependent queuing system (queue-reactive model), where order-flow intensities depend on current queue state.
Implication: fill outcomes should be conditioned on queue state, not global unconditional rates.
Source: https://arxiv.org/abs/1312.0563
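To illustrate why conditioning matters, the sketch below compares an unconditional fill rate with per-state fill rates. The state labels ("thin_queue", "thick_queue") and the event format are hypothetical; a real pipeline would derive states from queue sizes, imbalance, and similar features.

```python
from collections import defaultdict


def fill_rates_by_state(events):
    """events: iterable of (state_label, filled) pairs.

    Returns (unconditional_rate, {state_label: conditional_rate}).
    """
    counts = defaultdict(lambda: [0, 0])  # state -> [fills, orders]
    for state, filled in events:
        counts[state][0] += int(filled)
        counts[state][1] += 1
    total_orders = sum(n for _, n in counts.values())
    if total_orders == 0:
        raise ValueError("no events")
    total_fills = sum(f for f, _ in counts.values())
    by_state = {s: f / n for s, (f, n) in counts.items()}
    return total_fills / total_orders, by_state


events = (
    [("thin_queue", True)] * 3 + [("thin_queue", False)]
    + [("thick_queue", True)] + [("thick_queue", False)] * 3
)
overall, by_state = fill_rates_by_state(events)
print(overall, by_state)
```

In this toy sample the global rate of 0.5 describes neither state well: it understates fills in the thin-queue state and overstates them in the thick-queue state.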
3) Fill probabilities can be modeled semi-analytically under state-dependent flows
Yu et al. derive tractable expressions for fill probabilities at best and deeper levels in state-dependent stochastic LOB models, validated on FX data.
Implication: “probability of fill by horizon T” can be treated as a first-class model output in execution control.
Source: https://arxiv.org/abs/2403.02572
4) Live crypto evidence of the paradox
Albers et al. (2025) report a negative correlation between maker fill likelihood and post-fill returns on live Binance BTC perpetual experiments, framing a practical “market maker’s dilemma.”
Implication: optimizing fill-rate alone can destroy net edge.
Source: https://arxiv.org/abs/2502.18625
5) Practical backtest support: queue model choice matters
hftbacktest explicitly exposes multiple queue-position models (risk-averse, probabilistic variants), and documents that model choice materially changes simulated fills and performance.
Implication: naive queue assumptions create fake strategy quality.
Sources:
- https://hftbacktest.readthedocs.io/en/latest/tutorials/Probability%20Queue%20Models.html
- https://hftbacktest.readthedocs.io/en/v1.8.4/reference/queue_models.html
The paradox in one equation
If the objective is only:
\[ \max \; P(\text{fill in } T) \]
you often end up selecting states with poor conditional post-fill outcomes.
A safer objective is:
\[ \max_a \; \mathbb{E}[\text{NetEdge} \mid a] = \mathbb{E}[\text{spread capture} - \text{markout} - \text{fees/slippage} \mid a] \]
with explicit constraints on completion risk and tail outcomes.
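A small worked example shows how the two objectives can disagree. All numbers and action names below are made up for illustration, markout is expressed as an adverse cost (positive = bad), and the per-order expectation is assumed to be fill probability times per-fill edge, with unfilled orders contributing zero.

```python
# Hypothetical actions; values are in ticks per unit and purely illustrative.
actions = {
    "join_touch": {"p_fill": 0.9, "spread": 1.0, "markout": 1.2, "fees": 0.1},
    "post_deep": {"p_fill": 0.3, "spread": 2.0, "markout": 0.5, "fees": 0.1},
}


def expected_net_edge(a: dict) -> float:
    """Per-order expected net edge under the zero-if-unfilled assumption."""
    per_fill = a["spread"] - a["markout"] - a["fees"]
    return a["p_fill"] * per_fill


best_by_fill = max(actions, key=lambda k: actions[k]["p_fill"])
best_by_edge = max(actions, key=lambda k: expected_net_edge(actions[k]))

print(best_by_fill)  # join_touch
print(best_by_edge)  # post_deep
```

Here the easy fill loses about 0.27 ticks per order in expectation while the harder fill earns about 0.42, so a fill-maximizing policy picks the losing action. The completion-risk and tail constraints are not modeled in this sketch.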
Practical operator playbook
1) Track both sides of the trade-off
At minimum, maintain joint dashboards for:
- fill probability by horizon (e.g., 100ms/1s/5s)
- post-fill markout ladder (e.g., 100ms/1s/5s/30s)
- queue state bins (imbalance, depth, cancel intensity)
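The markout ladder can be computed per fill from a mid-price series. The following is a minimal sketch assuming integer millisecond timestamps sorted in ascending order, a sign convention where positive markout means the price moved in our favor, and the last mid observed at or before each horizon as the reference price. The function name and argument layout are illustrative.

```python
from bisect import bisect_right


def markout_ladder(fill_ts, fill_px, side, mid_ts, mid_px, horizons_ms):
    """Signed markout of one fill at each horizon.

    side: +1 for a buy fill, -1 for a sell fill.
    Returns {horizon_ms: markout}, with None where the mid series does not
    cover the horizon.
    """
    ladder = {}
    for h in horizons_ms:
        target = fill_ts + h
        # Index of the last mid observed at or before the target time.
        i = bisect_right(mid_ts, target) - 1
        if i < 0 or target > mid_ts[-1]:
            ladder[h] = None
        else:
            ladder[h] = side * (mid_px[i] - fill_px)
    return ladder


mid_ts = [0, 100, 1000, 5000]
mid_px = [100.0, 99.5, 99.0, 98.0]
ladder = markout_ladder(0, 100.0, +1, mid_ts, mid_px, [100, 1000, 5000, 30000])
print(ladder)  # {100: -0.5, 1000: -1.0, 5000: -2.0, 30000: None}
```

Aggregating these ladders per queue-state bin, next to the fill probability for the same bin, gives the joint view the dashboard needs.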
2) Stop using fill-rate as a standalone KPI
Use a paired KPI:
FillQuality = FillRate × (edge per fill, net of a markout penalty)
or directly optimize expected net edge with q90/q95 guardrails.
3) Introduce a contrarian safety mode
When imbalance/toxicity proxies are extreme:
- reduce passive posting size,
- shorten quote lifetime,
- prefer less adverse queue states,
- allow selective crossing only when delay cost dominates.
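The rules above can be sketched as a small parameter transform. The threshold of 0.8, the halving of size and lifetime, and the `QuoteParams` fields are all hypothetical choices for illustration; the "prefer less adverse queue states" rule is not modeled here.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class QuoteParams:
    size: float
    lifetime_ms: int
    allow_cross: bool = False


def apply_safety_mode(base, imbalance, delay_cost, cross_cost, extreme=0.8):
    """Tighten passive quoting when |imbalance| (in [-1, 1]) is extreme.

    delay_cost and cross_cost are expected costs in the same units; crossing
    is permitted only when waiting is estimated to cost more than crossing.
    """
    if abs(imbalance) < extreme:
        return base
    return replace(
        base,
        size=base.size * 0.5,                # reduce passive posting size
        lifetime_ms=base.lifetime_ms // 2,   # shorten quote lifetime
        allow_cross=delay_cost > cross_cost  # selective crossing
    )


base = QuoteParams(size=10.0, lifetime_ms=400)
calm = apply_safety_mode(base, imbalance=0.2, delay_cost=1.0, cross_cost=2.0)
stressed = apply_safety_mode(base, imbalance=-0.9, delay_cost=3.0, cross_cost=1.0)
print(calm)
print(stressed)
```

Keeping the transform pure (input parameters in, adjusted parameters out) makes it easy to replay against recorded book states before enabling it live.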
4) Calibrate queue model against live, not only backtest
The backtest queue model should be selected by minimizing the fill/markout mismatch versus live trading, not by maximizing backtest Sharpe.
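One way to make this selection rule explicit is a simple mismatch score. The model names, the metrics, and the equal weights below are placeholders for illustration and do not correspond to any specific backtester's API.

```python
def mismatch(sim, live, w_fill=1.0, w_markout=1.0):
    """Weighted absolute gap between simulated and live fill rate and markout.

    The weights also absorb the unit difference between the two metrics.
    """
    return (
        w_fill * abs(sim["fill_rate"] - live["fill_rate"])
        + w_markout * abs(sim["markout"] - live["markout"])
    )


def pick_queue_model(sim_by_model, live):
    """Return the model name whose simulated stats sit closest to live."""
    return min(sim_by_model, key=lambda name: mismatch(sim_by_model[name], live))


live = {"fill_rate": 0.40, "markout": -0.30}
sim_by_model = {
    "front_of_queue": {"fill_rate": 0.70, "markout": -0.10, "sharpe": 4.0},
    "back_of_queue": {"fill_rate": 0.35, "markout": -0.35, "sharpe": 1.5},
}

chosen = pick_queue_model(sim_by_model, live)
flattering = max(sim_by_model, key=lambda name: sim_by_model[name]["sharpe"])
print(chosen)      # back_of_queue
print(flattering)  # front_of_queue
```

In this toy case the model with the better backtest Sharpe is also the one that overstates fills and understates adverse markout, which is the failure mode the rule is meant to prevent.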
5) Keep model governance simple but strict
Promote policy changes only if all hold in canary:
- completion non-inferior,
- q95 markout not worse,
- expected net edge improved,
- no reject/cancel explosion.
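The four conditions can be encoded as an all-or-nothing gate. This is a minimal sketch: the `CanaryStats` fields, the 1% completion tolerance, and the 1.5x reject/cancel multiplier are assumptions chosen for illustration, and adverse markout is expressed as a cost (larger = worse).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CanaryStats:
    completion: float           # fraction of target quantity completed
    q95_adverse_markout: float  # 95th percentile adverse markout, larger = worse
    net_edge: float             # mean net edge per unit
    reject_cancel_rate: float   # rejects plus cancels per order sent


def promotion_gate(control, canary, completion_tol=0.01, reject_mult=1.5):
    """Return (promote, checks): promote is True only if every check passes."""
    checks = {
        "completion_non_inferior":
            canary.completion >= control.completion - completion_tol,
        "q95_markout_not_worse":
            canary.q95_adverse_markout <= control.q95_adverse_markout,
        "net_edge_improved":
            canary.net_edge > control.net_edge,
        "no_reject_cancel_explosion":
            canary.reject_cancel_rate <= reject_mult * control.reject_cancel_rate,
    }
    return all(checks.values()), checks


control = CanaryStats(0.95, 2.0, 0.10, 0.02)
good = CanaryStats(0.945, 1.8, 0.14, 0.025)
bad = CanaryStats(0.96, 2.5, 0.20, 0.02)  # better edge, worse tail

print(promotion_gate(control, good)[0])  # True
print(promotion_gate(control, bad)[0])   # False
```

Returning the individual checks alongside the verdict keeps the rejection reason visible, which matters when a candidate improves mean edge but fails on the tail.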
Minimal experiment design (for future implementation)
- Partition events by queue-state deciles (imbalance + depth + cancel intensity).
- For each bucket, estimate:
  - fill probability by horizon,
  - conditional markout distribution.
- Compute Pareto frontier: high fill vs low toxicity.
- Deploy policy that stays on/near frontier.
- Monitor drift and retrain bucket mapping periodically.
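The frontier step can be sketched once each bucket has been reduced to a pair of numbers. The bucket names and values below are invented, and toxicity is assumed to be summarized as a single adverse-markout figure per bucket (larger = worse).

```python
def pareto_frontier(buckets):
    """buckets: {name: (fill_prob, toxicity)}.

    A bucket is on the frontier if no other bucket has fill probability at
    least as high and toxicity at least as low, with one of them strictly
    better. Returns frontier names sorted alphabetically.
    """
    def dominates(b, a):
        return b[0] >= a[0] and b[1] <= a[1] and (b[0] > a[0] or b[1] < a[1])

    return sorted(
        name
        for name, stats in buckets.items()
        if not any(
            dominates(other, stats)
            for other_name, other in buckets.items()
            if other_name != name
        )
    )


buckets = {
    "A": (0.9, 1.2),  # fills easily, very toxic
    "B": (0.6, 0.5),
    "C": (0.5, 0.8),  # dominated by B: lower fill and higher toxicity
    "D": (0.3, 0.2),  # rarely fills, clean
}
print(pareto_frontier(buckets))  # ['A', 'B', 'D']
```

Where on the frontier to operate is then a policy choice, driven by how much completion risk is acceptable.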
Bottom line
The key microstructure lesson:
Fast fills are not automatically good fills.
Queue position, state-dependent flows, and toxicity make fill probability and post-fill quality a coupled control problem. The edge is not “fill more,” but “fill selectively where conditional markout is survivable.”