Market-by-Order vs Market-by-Price Queue Observability Slippage Playbook

2026-04-11 · finance

Why this matters

A surprising amount of "slippage modeling" is really queue-observability modeling wearing a nicer shirt.

If you can see the full order book at the individual-order level, you can often track queue position directly or nearly directly. If you only see aggregated size by price level, you are no longer modeling the book itself; you are modeling an inference problem about what probably happened inside the level.

That difference is not cosmetic. It changes fill forecasts, the value you assign to passive orders, and when you decide to cross the spread.

A slippage model trained or validated under MBO/L3-style visibility can look sharp in research and then quietly overpromise when deployed on MBP/L2-style data. The tax shows up as passive fills that never happen, patience that lasts too long, or catch-up aggression that arrives after queue edge has already evaporated.


Failure mode in one line

A strategy assumes order-level queue observability, but runs on aggregated depth; the resulting queue-position optimism distorts fill forecasts, passive-order value, and urgency decisions, creating systematic slippage that looks like "bad luck" instead of feed-granularity error.


The operational fact people blur together

These are not the same data regimes:

Market by Price (MBP / typical L2)

You receive updates keyed by price level: aggregate displayed size at each price, typically only for the top N visible levels. Individual orders inside a level are invisible.

That means you generally cannot know exact queue position or the exact order-level composition at that price.

Market by Order (MBO / typical L3)

You receive updates keyed by individual orders: each add, modify, cancel, and execution arrives tagged with an order-level identifier, across all price levels.

That means you can often track queue position much more precisely and simulate passive fills with materially better fidelity.
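The keying difference can be made concrete with a toy book builder. The event shapes below are illustrative, not any venue's actual wire format; the point is what each representation can and cannot answer.

```python
class MBPBook:
    """Market-by-Price: state is aggregate size keyed by price level."""
    def __init__(self):
        self.depth = {}  # price -> total displayed size

    def apply(self, price, new_total):
        # The feed tells you the new total at the level; who added or
        # cancelled inside the level is invisible.
        if new_total == 0:
            self.depth.pop(price, None)
        else:
            self.depth[price] = new_total


class MBOBook:
    """Market-by-Order: state is individual orders keyed by order id."""
    def __init__(self):
        self.orders = {}  # order_id -> (price, size, arrival_seq)
        self._seq = 0

    def add(self, order_id, price, size):
        self._seq += 1
        self.orders[order_id] = (price, size, self._seq)

    def cancel(self, order_id):
        self.orders.pop(order_id, None)

    def queue_ahead(self, order_id):
        # Directly computable under MBO: resting size at the same price
        # with earlier arrival. There is no way to read this off an MBP book.
        price, _, seq = self.orders[order_id]
        return sum(s for (p, s, q) in self.orders.values()
                   if p == price and q < seq)
```

Note that `queue_ahead` has no MBP counterpart at all; under MBP the same question becomes an estimation problem.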

This is the whole game. If your model ignores the distinction, it is treating an observability downgrade like a harmless serialization detail. It isn't.


What the feed difference changes mechanically

1) Queue position goes from observable to latent

With MBO, queue state is explicit enough to track. With MBP, queue state becomes a hidden variable.

You are no longer asking:

"How much is ahead of me?"

You are asking:

"Given that aggregate depth changed, how much of that probably happened in front of me?"

That is a different model class.

2) Cancels and down-modifies become ambiguous

If depth at your price drops by 500 shares under MBP, you do not know whether those shares disappeared from orders ahead of your resting order, from orders behind it, or from a mix of both.

A queue estimator must assign some probability that the lost quantity was ahead of you. Bad assumptions here directly produce fill-rate optimism or pessimism.

3) Level truncation creates censoring

If your MBP feed shows only 10 levels and the market moves so your order falls outside the visible ladder, your queue estimator becomes partially blind. The order may still be live, but the price level is now out of scope.

4) Priority rules become easier to misapply

On MBO feeds, some venues expose explicit order priority or enough event detail to infer it robustly. On MBP feeds, the same venue may only expose level totals, forcing you to collapse venue-specific priority semantics into a rough heuristic.

5) Backtests get flattered

Many passive-order backtests built on MBP data end up too optimistic because they assume too much queue advancement from aggregate depth reductions. That optimism leaks into live routing, where the fills fail to arrive and the strategy pays later.


Concrete feed facts that matter

A few feed-level facts are operationally important:

  1. CME explicitly distinguishes MBP and MBO. CME's MBO FAQ says MBP consolidates quantity at each price level, typically only for the top visible levels, while MBO provides individual orders across all price levels.

  2. CME MBO exposes queue-tracking primitives. CME states that each order gets an anonymous OrderID and a PriorityID, and that proper queue sorting is done using PriorityID.

  3. Nasdaq TotalView is full-depth, order-level data. Nasdaq describes TotalView as displaying every single quote and order at every price level on Nasdaq.

  4. L3 feeds usually key events by order ID; L2 feeds key by price or price level. That keying choice is not just technical โ€” it determines whether queue state is directly reconstructable or must be inferred.

  5. Queue models on MBP can become censored when the level scrolls off the visible ladder. If your order falls outside MBP-10, queue estimation becomes materially weaker unless another data source restores visibility.

  6. Modification semantics are venue-specific. Some feeds imply that size decreases retain priority while size increases lose it; some venues expose more explicit priority handling than others. A portable model cannot hand-wave this away.
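A sketch of keeping modify semantics explicit and configurable per venue rather than hard-coded. The default rule shown (size decreases retain priority, increases re-queue) is one common convention mentioned above, not a universal fact; the helper and its parameters are hypothetical.

```python
def apply_modify(order, new_size, *, decrease_keeps_priority=True, next_seq=None):
    """Apply a size modify under explicit, venue-configurable priority rules.

    `order` is a dict {"size": int, "seq": int}; lower seq = earlier priority.
    The default (decreases keep priority, increases go to the back of the
    queue) is one common convention and must be configured per venue.
    """
    if new_size < order["size"] and decrease_keeps_priority:
        order["size"] = new_size      # shrink in place, priority retained
    else:
        order["size"] = new_size      # an increase re-queues the order
        order["seq"] = next_seq       # new sequence number = back of queue
    return order
```

Making the rule a parameter forces the capability registry, not folklore, to decide how each venue behaves.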


The real production failure path

Step 1) Research is done with richer observability than production

Common setup: research replays full-depth, order-level (MBO) history, while production consumes an aggregated MBP feed truncated to the top levels.

Step 2) A passive-order model learns too much confidence

The model learns that queue position is knowable, that passive fills arrive roughly on schedule, and that patience is cheap.

Step 3) Live deployment downgrades observability

Now the production system only sees aggregate size per price level, truncated to the top N visible levels.

Step 4) Queue advancement becomes a probabilistic guess

Each depth reduction at your level must be decomposed into executions at the front of the queue, cancellations ahead of your order, and cancellations behind it.

Step 5) The estimator is usually too optimistic

A naive model often assumes cancellations are uniformly distributed through the queue. Real queues are often less friendly: cancellations tend to be biased toward the back, which means aggregate depth reductions improve your true position less than the naive model hopes.
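The gap between the two assumptions is easy to quantify with a toy decomposition. The `front_bias` parameter below is illustrative and would be calibrated from order-level history in practice.

```python
def expected_advancement(depth_drop, q_ahead, q_behind, front_bias=1.0):
    """Expected reduction in queue-ahead quantity from an aggregate depth drop.

    front_bias = 1.0 reproduces the naive uniform assumption (cancels drawn
    proportionally to resting size). front_bias < 1.0 encodes back-of-queue
    bias: size ahead of you is `front_bias` times as likely to cancel per
    share as size behind you.
    """
    if q_ahead + q_behind == 0:
        return 0.0
    w_front = front_bias * q_ahead
    p_front = w_front / (w_front + q_behind)
    return depth_drop * p_front

# Same 500-share depth drop, 1000 shares ahead / 1000 behind:
uniform = expected_advancement(500, 1000, 1000)            # 250.0 shares
back_biased = expected_advancement(500, 1000, 1000, 0.25)  # 100.0 shares
```

The uniform model credits 2.5x more queue advancement than the back-biased one for the identical observable event, which is exactly the optimism described above.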

Step 6) Slippage appears somewhere else in the stack

The visible symptom is not labeled "queue observability mismatch." It appears as passive fill rates that undershoot forecasts, patience that runs too long, and late catch-up aggression that pays the spread after the edge is gone.


Core model

Let:

  • Q_true(t) = the true quantity ahead of your resting order at time t
  • Q_hat(t) = your estimate of that quantity
  • ΔD⁻(t) = an observed reduction in aggregate depth at your price level
  • p_front(x,t) = the estimated probability that a unit of that reduction was ahead of you
  • S(t) = a visibility indicator: 1 if your price level is inside the visible ladder, 0 otherwise

Under MBO-like observability, Q_hat(t) can be very close to Q_true(t). Under MBP-like observability, updates look more like:

Q_hat(t+) = Q_hat(t) - p_front(x,t) * ΔD⁻(t) + estimation_error(t)

When the level moves out of visible range:

S(t) = 0

and queue evolution becomes censored:

Q_hat(t+) = Q_hat(t) + censoring_error(t)

Then passive-fill estimates are formed from an uncertain queue state:

F_passive(t) = f(Q_hat(t), trade_arrival_rate(t), cancel_profile(t), time_to_deadline(t), venue_state(t))

The resulting slippage tax is roughly:

IS_obs_gap ≈ queue_optimism_cost + censoring_cost + portability_cost + late_catchup_cost + venue_misranking_cost

Interpretation: queue_optimism_cost is paid when inferred queue advancement exceeds the real thing; censoring_cost accrues while your level sits outside the visible ladder; portability_cost comes from applying one venue's priority semantics to another; late_catchup_cost is the aggression premium paid after passive fills fail to arrive; venue_misranking_cost comes from routing decisions that confuse feed quality with venue quality.
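A minimal sketch of the estimator the update equations above describe, with an explicit uncertainty term that grows on every inferred update and jumps under censoring. The growth constants are illustrative placeholders, not calibrated values.

```python
class QueueEstimator:
    """Tracks (Q_hat, uncertainty) for one resting order under MBP visibility.

    Implements Q_hat(t+) = Q_hat(t) - p_front * ΔD⁻, plus an uncertainty
    proxy that widens on inferred updates and accumulates with dwell time
    when the level is censored (S(t) = 0).
    """
    def __init__(self, q_init):
        self.q_hat = float(q_init)   # estimated quantity ahead of our order
        self.sigma = 0.0             # crude uncertainty proxy, in shares

    def on_depth_drop(self, delta_d, p_front):
        # Decompose an aggregate depth reduction probabilistically.
        inferred = p_front * delta_d
        self.q_hat = max(self.q_hat - inferred, 0.0)
        self.sigma += 0.5 * inferred     # every inference adds error

    def on_censored(self, dwell_seconds, rate=50.0):
        # Level left the visible ladder: no updates arrive, but
        # uncertainty must grow rather than stay flat.
        self.sigma += rate * dwell_seconds
```

The key design point is that silence (censoring) widens `sigma` instead of leaving it untouched, matching the rule that silence is not stability.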


Where the biggest mistakes happen

A) Train-on-MBO, deploy-on-MBP

Probably the most dangerous pattern. The research stack learns queue-exact behavior; production acts as if those same features still exist.

B) Mixing venues with different observability into one model

One venue exposes order-level adds/cancels, another only aggregates by price, and the model pretends these are feature-compatible. They are not.

C) Treating normalized vendor schemas as microstructure-equivalent

A nice normalized API can hide crucial differences in event keying (order-level vs price-level), visible depth, priority semantics, and modify rules.

D) Using MBP depth drops as if they imply equal queue advancement

They do not. Without a calibrated front-of-queue loss model, that assumption usually overstates passive fill probability.

E) Ignoring out-of-scope dwell time

If your order sits outside visible MBP depth for meaningful periods, queue estimates degrade sharply. Many systems silently ignore this censoring.

F) Letting fake depth ladders pass as real depth

Some consolidated or top-of-book feeds are presented in ladder form even though they are not true L2/L3 order book data. A ladder-shaped UI is not the same as decision-grade depth observability.


Feature set you actually want

Feed-granularity features: source type (MBO vs MBP), visible depth N, and whether your working price level is currently in scope.

Queue-estimation features: Q_hat and its uncertainty, the calibrated p_front profile, and dwell time outside visibility.

Venue-semantics features: priority rules for size decreases and increases, refresh behavior, and iceberg display behavior.

Calibration / outcome features: realized vs forecast passive fill rates and queue advancement, segmented by observability regime.


Regime state machine for live execution

EXACT_QUEUE

Use when: the feed is order-level (MBO/L3) and your order's book state is fully reconstructable.

Actions: track queue position directly, allow full passive patience, and quote fill probabilities with tight confidence.

ESTIMATED_QUEUE

Use when: the feed is price-level (MBP) but your price level sits inside the visible ladder.

Actions: update Q_hat through a calibrated p_front model, widen confidence intervals, and shorten patience relative to EXACT_QUEUE.

CENSORED_QUEUE

Use when: your order's price level has scrolled outside the visible MBP-N ladder.

Actions: stop crediting queue advancement, inflate uncertainty with dwell time, and re-evaluate whether the order should stay live.

HYBRID_MIXED

Use when: one parent order works across venues with different observability regimes.

Actions: keep per-venue queue engines and calibration layers; never pool their features into one model.

SAFE_DEGRADED

Trigger when: feed health degrades, sequence gaps appear, or the capability registry disagrees with observed behavior.

Actions: cut passive exposure, prefer simpler marketable tactics, and alert rather than guess.
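The regimes can be wired into a small classifier. The inputs and the priority ordering below are a sketch of one reasonable policy, not a production implementation.

```python
from enum import Enum, auto

class QueueRegime(Enum):
    EXACT_QUEUE = auto()
    ESTIMATED_QUEUE = auto()
    CENSORED_QUEUE = auto()
    HYBRID_MIXED = auto()
    SAFE_DEGRADED = auto()

def classify_regime(feed_is_mbo, level_visible, venues_mixed, feed_healthy):
    """Map observability facts to a regime. All inputs are booleans
    describing the current order/venue; safety outranks everything,
    then mixing, then feed granularity, then visibility."""
    if not feed_healthy:
        return QueueRegime.SAFE_DEGRADED
    if venues_mixed:
        return QueueRegime.HYBRID_MIXED
    if feed_is_mbo:
        return QueueRegime.EXACT_QUEUE
    if not level_visible:
        return QueueRegime.CENSORED_QUEUE
    return QueueRegime.ESTIMATED_QUEUE
```

The ordering matters: a sick feed should force SAFE_DEGRADED even when the venue nominally offers MBO.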


Control rules that save money

1) Make feed granularity a first-class feature

Do not bury MBO vs MBP differences in comments or config lore. The model should explicitly know what observability regime it is operating in.

2) Never share one passive fill model across exact and inferred queue regimes

At minimum, use separate calibration layers for exact-queue (MBO) regimes and inferred-queue (MBP) regimes, ideally split further by visible-depth configuration.

3) Learn a front-of-queue cancellation model instead of assuming uniform cancels

Depth reductions are not equally likely to occur in front of you. A naive uniform assumption is often too optimistic.

4) Treat out-of-scope levels as censoring, not as "nothing happened"

If the level leaves MBP-N visibility, uncertainty should jump immediately. Silence is not stability.

5) Penalize venue scores for observability downgrade

A venue should not look superior just because its feed makes passive fills easier to simulate. Separate true venue quality from measurement quality.

6) Backtest portability explicitly

Take MBO ground truth, degrade it to MBP-N, rerun the same strategy, and measure the policy drift. If live production resembles the degraded case, that is the benchmark that matters.
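A snapshot-level sketch of the degradation step, assuming a simple (price, size) order list; a full harness would degrade the event stream as well, not just snapshots.

```python
def degrade_to_mbp(orders, n_levels, side="bid"):
    """Collapse an order-level (MBO-style) snapshot into an MBP-N view.

    `orders` is a list of (price, size) tuples for resting orders on one
    side of the book. Returns the top n_levels as (price, aggregate_size)
    pairs; everything deeper is truncated, which is exactly the censoring
    the text describes.
    """
    levels = {}
    for price, size in orders:
        levels[price] = levels.get(price, 0) + size
    best_first = sorted(levels.items(), key=lambda kv: kv[0],
                        reverse=(side == "bid"))
    return best_first[:n_levels]

# Five resting orders, degraded to an MBP-2 bid ladder:
book = [(100, 300), (100, 200), (99, 100), (98, 50), (97, 10)]
mbp2 = degrade_to_mbp(book, 2)   # [(100, 500), (99, 100)]
```

Running the same strategy against the full list and against `degrade_to_mbp` output at several depths gives the policy-drift measurement the text calls for.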

7) Keep venue-specific priority semantics out of generic folklore

Whether size decreases retain priority, whether increases lose it, how refresh behaves, and how icebergs appear are venue-specific facts, not universal truths.


TCA / KPI layer

Track these directly: forecast vs realized passive fill rate, estimated vs realized queue advancement, time spent in the censored state, and the cost of late catch-up aggression.

Segment these by: venue, observability regime (exact / estimated / censored), visible depth configuration, and order dwell time outside the ladder.


Validation design that actually works

A) Ground-truth degradation test

Use MBO data as truth. Then intentionally degrade it into MBP-N variants at several depths (for example MBP-10 and top-of-book).

Measure how much fill forecasts, passive-order valuations, and routing decisions drift relative to the ground-truth run.

B) Counterfactual queue-estimator benchmark

Compare multiple p_front(x) families: uniform, back-biased, and state-dependent variants.

Select on fill forecast calibration and TCA tail stability, not on aesthetic simplicity.

C) Censoring stress test

Replay days where your orders frequently fall outside the visible MBP ladder. If the model still claims narrow confidence, it is lying.

D) Venue-portability audit

Take the same strategy and scorecard it separately under each venue's native observability and under a common degraded (MBP-N) floor applied to all venues.

If the venue ranking flips after observability control, you found measurement leakage.


Anti-patterns

In compressed form: training on MBO and deploying on MBP without repricing the downgrade; pooling venues with different observability into one model; treating normalized vendor schemas as microstructure-equivalent; converting depth drops one-for-one into queue advancement; ignoring censoring when levels scroll out of scope; and treating ladder-shaped UIs as decision-grade depth.

Implementation sketch

A robust production stack usually needs:

  1. feed-capability registry

    • per venue/data source: MBO vs MBP, visible depth, priority semantics, modify rules
  2. queue engine with confidence state

    • exact / estimated / censored modes
  3. venue-specific cancel-front model

    • calibrated from richer historical data where possible
  4. policy layer that consumes uncertainty, not just mean fill probability

    • patience should shrink when observability degrades
  5. degraded replay harness

    • MBO truth degraded to MBP variants for portability testing
  6. TCA attribution bucket for observability error

    • otherwise the desk mislabels the loss as generic market impact
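Item 1 can be as simple as one frozen dataclass per source. The field and venue names below are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class FeedCapability:
    """One registry row per venue/data source; fields are illustrative."""
    venue: str
    order_level: bool              # True = MBO/L3, False = MBP/L2
    visible_levels: Optional[int]  # None = full depth
    decrease_keeps_priority: bool
    increase_keeps_priority: bool

# Hypothetical registry entries:
REGISTRY = {
    "VENUE_A": FeedCapability("VENUE_A", True, None, True, False),
    "VENUE_B": FeedCapability("VENUE_B", False, 10, True, False),
}

def observability_tier(venue):
    """Collapse a capability row to the tier the policy layer consumes."""
    cap = REGISTRY[venue]
    if cap.order_level:
        return "exact"
    return "estimated" if cap.visible_levels else "estimated_full_depth"
```

Making the registry frozen and explicit keeps priority semantics out of config lore and lets the queue engine and TCA layer key off the same facts.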

Bottom line

A lot of passive slippage pain is not about spread, impact, or alpha decay first. It is about pretending you know where you are in the queue when the feed never actually told you.

MBO and MBP are different execution realities: one hands you queue state, the other hands you a queue inference problem.

If you train on order-level truth and deploy on price-level aggregates without explicitly pricing the observability downgrade, your slippage model will be overconfident by construction.

The fix is not mystical. Treat queue observability as a model input, queue position as an uncertainty distribution, and feed portability as a first-class slippage risk.


References