Market-by-Order vs Market-by-Price Queue Observability Slippage Playbook
Why this matters
A surprising amount of "slippage modeling" is really queue-observability modeling wearing a nicer shirt.
If you can see the full order book at the individual-order level, you can often track queue position directly or nearly directly. If you only see aggregated size by price level, you are no longer modeling the book itself; you are modeling an inference problem about what probably happened inside the level.
That difference is not cosmetic. It changes:
- passive fill forecasts,
- cancel/replace timing,
- queue-value estimates,
- urgency calibration,
- venue portability,
- and the realism of backtests that claim to simulate resting orders.
A slippage model trained or validated under MBO/L3-style visibility can look sharp in research and then quietly overpromise when deployed on MBP/L2-style data. The tax shows up as passive fills that never happen, patience that lasts too long, or catch-up aggression that arrives after queue edge has already evaporated.
Failure mode in one line
A strategy assumes order-level queue observability, but runs on aggregated depth; the resulting queue-position optimism distorts fill forecasts, passive-order value, and urgency decisions, creating systematic slippage that looks like "bad luck" instead of feed-granularity error.
The operational fact people blur together
These are not the same data regimes:
Market by Price (MBP / typical L2)
You receive updates keyed by price level:
- total displayed quantity at the level,
- sometimes order count,
- typically only a fixed number of visible levels,
- but not each individual resting order.
That means you generally cannot know exact queue position or the exact order-level composition at that price.
Market by Order (MBO / typical L3)
You receive updates keyed by individual orders:
- add / modify / cancel / execution events,
- order identifiers,
- often enough information to reconstruct per-order priority,
- and often full depth, not just a capped ladder.
That means you can often track queue position much more precisely and simulate passive fills with materially better fidelity.
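Under order-keyed events, "how much is ahead of me" becomes bookkeeping rather than inference. A minimal sketch of that bookkeeping, assuming a simplified add/cancel/execute event schema (hypothetical, not any venue's actual wire format):

```python
from dataclasses import dataclass, field

@dataclass
class MboQueueTracker:
    # Tracks shares ahead of our resting order at one price level,
    # given an order-keyed (MBO-like) feed. Assumes price-time priority.
    our_order_id: str
    ahead: dict = field(default_factory=dict)  # order_id -> size ahead of us
    joined: bool = False                       # have we seen our own add yet?

    def on_add(self, order_id: str, size: int) -> None:
        if order_id == self.our_order_id:
            self.joined = True                 # later adds queue behind us
        elif not self.joined:
            self.ahead[order_id] = size        # earlier add -> higher priority

    def on_cancel(self, order_id: str) -> None:
        self.ahead.pop(order_id, None)         # exact attribution by order ID

    def on_execute(self, order_id: str, size: int) -> None:
        if order_id in self.ahead:
            left = self.ahead[order_id] - size
            if left > 0:
                self.ahead[order_id] = left
            else:
                del self.ahead[order_id]

    def shares_ahead(self) -> int:
        return sum(self.ahead.values())
```

With price-keyed MBP updates, `on_cancel` has no order ID to act on; that is exactly where the inference problem discussed below begins.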
This is the whole game. If your model ignores the distinction, it is treating an observability downgrade like a harmless serialization detail. It isn't.
What the feed difference changes mechanically
1) Queue position goes from observable to latent
With MBO, queue state is explicit enough to track. With MBP, queue state becomes a hidden variable.
You are no longer asking:
"How much is ahead of me?"
You are asking:
"Given that aggregate depth changed, how much of that probably happened in front of me?"
That is a different model class.
2) Cancels and down-modifies become ambiguous
If depth at your price drops by 500 shares under MBP, you do not know whether those shares disappeared:
- entirely in front of you,
- entirely behind you,
- or partly both.
A queue estimator must assign some probability that the lost quantity was ahead of you. Bad assumptions here directly produce fill-rate optimism or pessimism.
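In expectation, the decomposition collapses to one calibrated number: the probability mass of the drop that sat in front of you. A toy sketch; `p_front` is an input you must estimate, since the MBP feed never reports it:

```python
def expected_queue_advance(depth_drop: float, p_front: float) -> float:
    # Expected reduction in shares ahead of us when aggregate depth at
    # our level falls by depth_drop, with each lost share having sat in
    # front of us with probability p_front. p_front must come from a
    # calibrated model, not the feed.
    if not 0.0 <= p_front <= 1.0:
        raise ValueError("p_front must be a probability")
    return p_front * depth_drop

# The same 500-share drop reads very differently under different beliefs:
optimistic = expected_queue_advance(500, 0.5)  # naive uniform-ish split
cautious = expected_queue_advance(500, 0.2)    # back-biased cancel profile
```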
3) Level truncation creates censoring
If your MBP feed shows only 10 levels and the market moves so your order falls outside the visible ladder, your queue estimator becomes partially blind. The order may still be live, but the price level is now out of scope.
4) Priority rules become easier to misapply
On MBO feeds, some venues expose explicit order priority or enough event detail to infer it robustly. On MBP feeds, the same venue may only expose level totals, forcing you to collapse venue-specific priority semantics into a rough heuristic.
5) Backtests get flattered
Many passive-order backtests built on MBP data end up too optimistic because they assume too much queue advancement from aggregate depth reductions. That optimism leaks into live routing, where the fills fail to arrive and the strategy pays later.
Concrete feed facts that matter
A few feed-level facts are operationally important:
CME explicitly distinguishes MBP and MBO. CME's MBO FAQ says MBP consolidates quantity at each price level, typically only for the top visible levels, while MBO provides individual orders across all price levels.
CME MBO exposes queue-tracking primitives. CME states that each order gets an anonymous OrderID and a PriorityID, and that proper queue sorting is done using PriorityID.
Nasdaq TotalView is full-depth, order-level data. Nasdaq describes TotalView as displaying every single quote and order at every price level on Nasdaq.
L3 feeds usually key events by order ID; L2 feeds key by price or price level. That keying choice is not just technical: it determines whether queue state is directly reconstructable or must be inferred.
Queue models on MBP can become censored when the level scrolls off the visible ladder. If your order falls outside MBP-10, queue estimation becomes materially weaker unless another data source restores visibility.
Modification semantics are venue-specific. Some feeds imply that size decreases retain priority while size increases lose it; some venues expose more explicit priority handling than others. A portable model cannot hand-wave this away.
The real production failure path
Step 1) Research is done with richer observability than production
Common setup:
- historical MBO/L3 data in research,
- queue-exact backtesting,
- precise fill replay,
- then live deployment on MBP or mixed-quality normalized feeds.
Step 2) A passive-order model learns too much confidence
The model learns:
- accurate queue position,
- accurate advancement after cancels,
- realistic passive fill timing,
- and a relatively precise value for resting near the front.
Step 3) Live deployment downgrades observability
Now the production system only sees:
- aggregate depth changes,
- partial visible ladder,
- no exact order identity,
- and sometimes normalized vendor semantics that flatten venue details.
Step 4) Queue advancement becomes a probabilistic guess
Each depth reduction at your level must be decomposed into:
- loss ahead of you,
- loss behind you,
- your own execution,
- or hidden state transition you never observed.
Step 5) The estimator is usually too optimistic
A naive model often assumes cancellations are uniformly distributed through the queue. Real queues are often less friendly: cancellations tend to be biased toward the back, which means aggregate depth reductions improve your true position less than the naive model hopes.
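The optimism is easy to quantify once p_front is written as a function of estimated relative queue position x (fraction of displayed depth ahead of us). A sketch comparing the naive uniform allocation against a back-biased alternative; the power-law shape and the gamma value are placeholders for a model fit on order-level history:

```python
def p_front_uniform(x: float) -> float:
    # Naive assumption: cancels are uniform through the queue, so a
    # fraction x of any depth drop lands in front of us.
    return x

def p_front_back_biased(x: float, gamma: float = 2.0) -> float:
    # Back-biased alternative: cancels concentrate behind us, so the
    # front share is below uniform for any interior x. The power-law
    # form and gamma = 2.0 are illustrative only.
    return x ** gamma

x, drop = 0.5, 400.0           # mid-queue by displayed depth, 400-share drop
naive_gain = p_front_uniform(x) * drop         # 200 shares of advancement
biased_gain = p_front_back_biased(x) * drop    # 100 shares
optimism = naive_gain - biased_gain            # what the naive model overclaims
```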
Step 6) Slippage appears somewhere else in the stack
The visible symptom is not labeled "queue observability mismatch." It appears as:
- passive non-fill,
- deadline catch-up,
- late aggressive cleanup,
- venue-selection drift,
- or TCA tail damage that gets blamed on volatility.
Core model
Let:
- Q_true(t): true shares ahead of our order at time t
- Q_hat(t): estimated shares ahead using the observed feed
- D(t): displayed quantity at our price level
- ΔD⁻(t): observed drop in displayed quantity at that level
- p_front(x,t): probability a depth reduction occurred in front of us, given estimated relative queue position x
- S(t): indicator that our level is still visible in the feed
- F_passive(t): passive fill probability over the decision horizon
- U(t): urgency chosen by the scheduler
Under MBO-like observability, Q_hat(t) can be very close to Q_true(t).
Under MBP-like observability, updates look more like:
Q_hat(t+) = Q_hat(t) - p_front(x,t) * ΔD⁻(t) + estimation_error(t)
When the level moves out of visible range:
S(t) = 0
and queue evolution becomes censored:
Q_hat(t+) = Q_hat(t) + censoring_error(t)
Then passive-fill estimates are formed from an uncertain queue state:
F_passive(t) = f(Q_hat(t), trade_arrival_rate(t), cancel_profile(t), time_to_deadline(t), venue_state(t))
The resulting slippage tax is roughly:
IS_obs_gap ≈ queue_optimism_cost + censoring_cost + portability_cost + late_catchup_cost + venue_misranking_cost
Interpretation:
- queue_optimism_cost: you think you are closer to the front than reality,
- censoring_cost: your level disappears from MBP-N visibility,
- portability_cost: model behavior changes when feed granularity changes,
- late_catchup_cost: missed passive completion forces urgent cleanup,
- venue_misranking_cost: a venue looks better simply because its feed is richer or easier to model.
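The update equations above can be wired into a small estimator that tracks both Q_hat and a crude uncertainty band, and that widens rather than freezes when the level is censored. The uncertainty bookkeeping is deliberately simple and the growth rates are placeholder assumptions:

```python
class MbpQueueEstimator:
    def __init__(self, q0: float):
        self.q_hat = q0      # estimated shares ahead, Q_hat(t)
        self.sigma = 0.0     # rough uncertainty band on q_hat, in shares

    def on_depth_drop(self, delta: float, p_front: float) -> None:
        # Q_hat(t+) = Q_hat(t) - p_front(x,t) * delta; estimation_error(t)
        # shows up as a widening band rather than a point correction.
        self.q_hat = max(0.0, self.q_hat - p_front * delta)
        self.sigma += p_front * (1.0 - p_front) * delta  # allocation risk

    def on_censored(self, dwell_ms: float, widen_per_ms: float = 0.5) -> None:
        # S(t) = 0: the level left visible MBP depth, so queue evolution
        # is censored and uncertainty must grow with dwell time. The
        # linear rate is a placeholder, not a calibrated value.
        self.sigma += widen_per_ms * dwell_ms
```

A policy layer should consume (q_hat, sigma), not q_hat alone; patience earned under a wide sigma is patience the feed never justified.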
Where the biggest mistakes happen
A) Train-on-MBO, deploy-on-MBP
Probably the most dangerous pattern. The research stack learns queue-exact behavior; production acts as if those same features still exist.
B) Mixing venues with different observability into one model
One venue exposes order-level adds/cancels, another only aggregates by price, and the model pretends these are feature-compatible. They are not.
C) Treating normalized vendor schemas as microstructure-equivalent
A nice normalized API can hide crucial differences in:
- priority handling,
- order modification semantics,
- iceberg refresh behavior,
- and aggressor/passive reporting.
D) Using MBP depth drops as if they imply equal queue advancement
They do not. Without a calibrated front-of-queue loss model, that assumption usually overstates passive fill probability.
E) Ignoring out-of-scope dwell time
If your order sits outside visible MBP depth for meaningful periods, queue estimates degrade sharply. Many systems silently ignore this censoring.
F) Letting fake depth ladders pass as real depth
Some consolidated or top-of-book feeds are presented in ladder form even though they are not true L2/L3 order book data. A ladder-shaped UI is not the same as decision-grade depth observability.
Feature set you actually want
Feed-granularity features
- feed_granularity = L1 | MBP | MBO
- visible_depth_levels
- full_depth_available
- per_order_id_available
- explicit_priority_available
- order_count_per_level_available
Queue-estimation features
- queue_estimator_family
- queue_ahead_estimate
- queue_ahead_uncertainty
- front_cancel_probability_estimate
- priority_loss_on_modify_probability
- out_of_scope_level_flag
- out_of_scope_dwell_ms
Venue-semantics features
- size_decrease_retains_priority_flag
- size_increase_loses_priority_flag
- iceberg_refresh_same_order_id_flag
- modify_event_explicitness
- native_vs_synthetic_iceberg_mode
Calibration / outcome features
- estimated_fill_prob_100ms_500ms_1s
- realized_fill_prob_100ms_500ms_1s
- predicted_queue_ahead_vs_realized_wait
- passive_fill_optimism_gap
- deadline_catchup_qty
- post_nonfill_markout_1s_5s_30s
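These groups can be carried as one typed record so the policy layer cannot silently drop the observability fields. The schema below is a sketch mirroring a subset of the names above, not a standard:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class QueueObservabilityFeatures:
    # One feature row per passive child order decision; field names
    # follow the feature lists in this section.
    feed_granularity: str            # "L1" | "MBP" | "MBO"
    visible_depth_levels: int
    per_order_id_available: bool
    explicit_priority_available: bool
    queue_ahead_estimate: float
    queue_ahead_uncertainty: float
    out_of_scope_level_flag: bool
    out_of_scope_dwell_ms: float
```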
Regime state machine for live execution
EXACT_QUEUE
Use when:
- order-level data is available,
- per-order reconstruction is healthy,
- and queue confidence is high.
Actions:
- allow precise passive-value estimates,
- enable queue-sensitive patience,
- use narrow uncertainty bands.
ESTIMATED_QUEUE
Use when:
- only aggregated depth is available,
- but the level remains visible and estimator quality is acceptable.
Actions:
- widen fill-probability intervals,
- reduce passive-size confidence,
- penalize aggressive claims about queue advancement.
CENSORED_QUEUE
Use when:
- the order's level is outside visible MBP depth,
- or feed drops / reconstruction gaps make queue tracking unreliable.
Actions:
- mark passive fill forecast as low-confidence,
- suppress queue-value-based patience,
- avoid pretending the order is still progressing normally.
HYBRID_MIXED
Use when:
- some venues are MBO-like and others MBP-like,
- or historical/live observability differs.
Actions:
- route with venue-specific models,
- forbid direct score comparability without observability normalization,
- track per-venue calibration separately.
SAFE_DEGRADED
Trigger when:
- queue uncertainty spikes,
- portability gap widens,
- or realized passive fills fall materially below model expectation.
Actions:
- increase uncertainty penalty,
- cap resting horizon,
- shorten re-evaluation loop,
- switch to safer execution logic rather than doubling down on false patience.
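One way to keep these regimes honest is to make the classification an explicit function instead of scattered if-statements. A sketch; the thresholds are placeholders to be calibrated, and HYBRID_MIXED is a routing-level concern resolved per venue upstream of this per-order classification:

```python
from enum import Enum

class QueueRegime(Enum):
    EXACT_QUEUE = "exact_queue"
    ESTIMATED_QUEUE = "estimated_queue"
    CENSORED_QUEUE = "censored_queue"
    SAFE_DEGRADED = "safe_degraded"

def classify_regime(per_order_feed: bool,
                    level_visible: bool,
                    uncertainty_ratio: float,   # sigma / queue_ahead_estimate
                    fill_shortfall: float) -> QueueRegime:
    # fill_shortfall: predicted minus realized passive fill rate over a
    # trailing window. Thresholds (0.3, 0.25) are illustrative only.
    if fill_shortfall > 0.3:
        return QueueRegime.SAFE_DEGRADED     # model is visibly overclaiming
    if not level_visible:
        return QueueRegime.CENSORED_QUEUE    # out of MBP-N scope
    if per_order_feed and uncertainty_ratio < 0.25:
        return QueueRegime.EXACT_QUEUE
    return QueueRegime.ESTIMATED_QUEUE
```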
Control rules that save money
1) Make feed granularity a first-class feature
Do not bury MBO vs MBP differences in comments or config lore. The model should explicitly know what observability regime it is operating in.
2) Never share one passive fill model across exact and inferred queue regimes
At minimum, use separate calibration layers for:
- exact queue tracking,
- estimated queue tracking,
- and censored/out-of-scope queue states.
3) Learn a front-of-queue cancellation model instead of assuming uniform cancels
Depth reductions are not equally likely to occur in front of you. A naive uniform assumption is often too optimistic.
4) Treat out-of-scope levels as censoring, not as "nothing happened"
If the level leaves MBP-N visibility, uncertainty should jump immediately. Silence is not stability.
5) Penalize venue scores for observability downgrade
A venue should not look superior just because its feed makes passive fills easier to simulate. Separate true venue quality from measurement quality.
6) Backtest portability explicitly
Take MBO ground truth, degrade it to MBP-N, rerun the same strategy, and measure the policy drift. If live production resembles the degraded case, that is the benchmark that matters.
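The degradation step itself is mechanical: collapse order-level state into price-level totals and truncate the ladder. A minimal sketch, assuming one side of the book is given as a list of (price, size) resting orders:

```python
from collections import defaultdict

def degrade_to_mbp(orders, n_levels, side="bid"):
    # Aggregate order-level (MBO-like) state into an MBP-N ladder:
    # sum size by price, then keep only the top n_levels. Everything
    # below the cut is exactly the censoring discussed above.
    depth = defaultdict(int)
    for price, size in orders:
        depth[price] += size
    ladder = sorted(depth.items(), key=lambda kv: kv[0],
                    reverse=(side == "bid"))
    return ladder[:n_levels]

mbo_side = [(100.0, 50), (100.0, 30), (99.9, 40), (99.8, 10)]
mbp_2 = degrade_to_mbp(mbo_side, n_levels=2)   # [(100.0, 80), (99.9, 40)]
```

Replaying the same strategy against `mbp_2`-style ladders instead of the full order list is the degraded benchmark that live production will actually resemble.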
7) Keep venue-specific priority semantics out of generic folklore
Whether size decreases retain priority, whether increases lose it, how refresh behaves, and how icebergs appear are venue-specific facts, not universal truths.
TCA / KPI layer
Track these directly:
PFOG: Passive Fill Optimism Gap
- predicted passive fill rate minus realized passive fill rate
QEU: Queue Estimate Uncertainty
- confidence interval width on estimated shares-ahead
FCAE: Front-Cancel Allocation Error
- gap between estimated and reconstructed fraction of depth loss that occurred ahead
OSD: Out-of-Scope Dwell
- time an active passive order's level spent outside visible MBP depth
GDS: Granularity Downgrade Share
- fraction of decisions made under lower observability than research-grade data
VPR: Venue Portability Residual
- score drift between venue models after controlling for spread/depth/toxicity
LCC: Late Catch-Up Cost
- bps paid because passive completion was overestimated earlier
QVM: Queue-Value Mispricing
- cost of waiting too long relative to a more honest uncertainty-aware benchmark
Segment these by:
- venue,
- feed type,
- visible depth count,
- symbol liquidity bucket,
- tactic,
- time-to-deadline,
- and whether the order ever became out-of-scope.
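The first KPIs in the list are cheap to compute from matched predictions and outcomes per passive child order. A sketch of PFOG and OSD with hypothetical input shapes:

```python
def pfog(predicted_fill_probs, realized_fills):
    # Passive Fill Optimism Gap: mean predicted fill probability minus
    # realized fill rate. Positive -> the queue model is optimistic.
    n = len(predicted_fill_probs)
    assert n == len(realized_fills) and n > 0
    return sum(predicted_fill_probs) / n - sum(realized_fills) / n

def osd_ms(visibility_intervals):
    # Out-of-Scope Dwell: total time (ms) the order's level spent
    # outside visible MBP depth, from (start_ms, end_ms, visible) spans.
    return sum(end - start
               for start, end, visible in visibility_intervals
               if not visible)

gap = pfog([0.8, 0.6, 0.7], [1, 0, 0])
dwell = osd_ms([(0, 400, True), (400, 650, False), (650, 900, True)])
```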
Validation design that actually works
A) Ground-truth degradation test
Use MBO data as truth. Then intentionally degrade it into:
- MBP-50,
- MBP-10,
- top-of-book-only,
- and mixed-venue normalized variants.
Measure how much:
- queue estimates drift,
- passive fill forecasts worsen,
- and policy choices change.
B) Counterfactual queue-estimator benchmark
Compare multiple p_front(x) families:
- uniform,
- back-biased monotone,
- symbol/venue-conditioned,
- and regime-conditioned models.
Select on fill forecast calibration and TCA tail stability, not on aesthetic simplicity.
C) Censoring stress test
Replay days where your orders frequently fall outside the visible MBP ladder. If the model still claims narrow confidence, it is lying.
D) Venue-portability audit
Take the same strategy and scorecard it separately under:
- native MBO venue,
- native MBP venue,
- and normalized mixed feed.
If the venue ranking flips after observability control, you found measurement leakage.
Anti-patterns
"L2 is basically L3 if you're clever enough." No. Clever inference is still inference.
One universal queue model for every venue. Convenient, but wrong.
Using only point estimates for passive fills. Queue inference needs uncertainty, not just a single number.
Ignoring out-of-scope states. If your order vanishes from the visible ladder, you did not preserve observability by optimism.
Treating normalized depth schemas as microstructure truth. Normalization can erase the exact semantics your model needed.
Promoting backtests without a degraded-observability replay. If research visibility is better than live visibility, the backtest is not deployment-grade.
Implementation sketch
A robust production stack usually needs:
feed-capability registry
- per venue/data source: MBO vs MBP, visible depth, priority semantics, modify rules
queue engine with confidence state
- exact / estimated / censored modes
venue-specific cancel-front model
- calibrated from richer historical data where possible
policy layer that consumes uncertainty, not just mean fill probability
- patience should shrink when observability degrades
degraded replay harness
- MBO truth degraded to MBP variants for portability testing
TCA attribution bucket for observability error
- otherwise the desk mislabels the loss as generic market impact
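The registry piece can be as simple as a frozen record per venue/data source that every downstream model must consult before choosing a queue-tracking mode. Field names and both entries below are illustrative; real values must come from venue documentation:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FeedCapability:
    venue: str
    granularity: str                      # "MBO" or "MBP"
    visible_depth_levels: int             # ladder cap; large if full depth
    size_decrease_retains_priority: bool  # venue-specific modify semantics
    size_increase_loses_priority: bool

# Hypothetical entries for two made-up venues.
REGISTRY = {
    "venue_a": FeedCapability("venue_a", "MBO", 10_000, True, True),
    "venue_b": FeedCapability("venue_b", "MBP", 10, True, True),
}

def queue_tracking_mode(venue: str) -> str:
    # Downstream queue engines branch on this instead of assuming
    # research-grade observability everywhere.
    cap = REGISTRY[venue]
    return "exact" if cap.granularity == "MBO" else "estimated"
```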
Bottom line
A lot of passive slippage pain is not about spread, impact, or alpha decay first. It is about pretending you know where you are in the queue when the feed never actually told you.
MBO and MBP are different execution realities:
- one gives you order-level queue observability,
- the other gives you aggregated depth and forces inference,
- and the gap between them compounds exactly where passive execution quality matters most.
If you train on order-level truth and deploy on price-level aggregates without explicitly pricing the observability downgrade, your slippage model will be overconfident by construction.
The fix is not mystical. Treat queue observability as a model input, queue position as an uncertainty distribution, and feed portability as a first-class slippage risk.
References
- CME Group, Market by Order (MBO) FAQ (MBP vs MBO, queue position, OrderID/PriorityID, full-depth vs aggregated depth): https://www.cmegroup.com/articles/faqs/market-by-order-mbo.html
- CME Group, Book Management Messages, Market by Order (MBO): https://www.cmegroup.com/tools-information/webhelp/acp-brokertec-chicago-mdp-price-precision/Content/book-management-messages-mbo.html
- Nasdaq, Nasdaq TotalView (full order book depth; every single quote and order at every price level): https://www.nasdaq.com/solutions/data/equities/nasdaq-totalview
- Databento, What is level 2 (L2) market data? (aggregated depth / market-by-price framing): https://databento.com/microstructure/level-2-market-data
- Databento, What is level 3 (L3) market data? (order-level feed keyed by order ID; queue-position implications): https://databento.com/microstructure/level-3-market-data
- Databento, Getting queue position from L2 and order book data (MBP queue estimation, back-of-queue cancel bias, out-of-scope depth censoring): https://databento.com/blog/getting-queue-position-from-l2-and-order-book-data
- Moallemi, C. C., and Yuan, K., A Model for Queue Position Valuation in a Limit Order Book (queue position has economic value and should be modeled explicitly): https://ssrn.com/abstract=2996221