Market-by-Order vs Market-by-Price Queue Observability Slippage Playbook

2026-04-11 · finance

Why this matters

A surprising amount of "slippage modeling" is really queue-observability modeling wearing a nicer shirt.

If you can see the full order book at the individual-order level, you can often track queue position directly or nearly directly. If you only see aggregated size by price level, you are no longer modeling the book itself; you are modeling an inference problem about what probably happened inside the level.

That difference is not cosmetic. It changes fill forecasts, the value you assign to passive orders, and when you decide to cross the spread.

A slippage model trained or validated under MBO/L3-style visibility can look sharp in research and then quietly overpromise when deployed on MBP/L2-style data. The tax shows up as passive fills that never happen, patience that lasts too long, or catch-up aggression that arrives after queue edge has already evaporated.


Failure mode in one line

A strategy assumes order-level queue observability, but runs on aggregated depth; the resulting queue-position optimism distorts fill forecasts, passive-order value, and urgency decisions, creating systematic slippage that looks like "bad luck" instead of feed-granularity error.


The operational fact people blur together

These are not the same data regimes:

Market by Price (MBP / typical L2)

You receive updates keyed by price level: aggregate displayed size at each price, typically only for the top N visible levels. Individual orders inside a level are invisible.

That means you generally cannot know exact queue position or the exact order-level composition at that price.

Market by Order (MBO / typical L3)

You receive updates keyed by individual orders: each add, modify, cancel, and execution arrives tagged with an order-level identifier, across all price levels.

That means you can often track queue position much more precisely and simulate passive fills with materially better fidelity.
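The keying difference can be made concrete with a toy book builder. The event shapes below are illustrative, not any venue's actual wire format; the point is what each representation can and cannot answer.

```python
class MBPBook:
    """Market-by-Price: state is aggregate size keyed by price level."""
    def __init__(self):
        self.depth = {}  # price -> total displayed size

    def apply(self, price, new_total):
        # The feed tells you the new total at the level; who added or
        # cancelled inside the level is invisible.
        if new_total == 0:
            self.depth.pop(price, None)
        else:
            self.depth[price] = new_total


class MBOBook:
    """Market-by-Order: state is individual orders keyed by order id."""
    def __init__(self):
        self.orders = {}  # order_id -> (price, size, arrival_seq)
        self._seq = 0

    def add(self, order_id, price, size):
        self._seq += 1
        self.orders[order_id] = (price, size, self._seq)

    def cancel(self, order_id):
        self.orders.pop(order_id, None)

    def queue_ahead(self, order_id):
        # Directly computable under MBO: resting size at the same price
        # with earlier arrival. There is no way to read this off an MBP book.
        price, _, seq = self.orders[order_id]
        return sum(s for (p, s, q) in self.orders.values()
                   if p == price and q < seq)
```

Note that `queue_ahead` has no MBP counterpart at all; under MBP the same question becomes an estimation problem.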

This is the whole game. If your model ignores the distinction, it is treating an observability downgrade like a harmless serialization detail. It isn't.


What the feed difference changes mechanically

1) Queue position goes from observable to latent

With MBO, queue state is explicit enough to track. With MBP, queue state becomes a hidden variable.

You are no longer asking:

"How much is ahead of me?"

You are asking:

"Given that aggregate depth changed, how much of that probably happened in front of me?"

That is a different model class.

2) Cancels and down-modifies become ambiguous

If depth at your price drops by 500 shares under MBP, you do not know whether those shares disappeared from orders ahead of your resting order, from orders behind it, or from a mix of both.

A queue estimator must assign some probability that the lost quantity was ahead of you. Bad assumptions here directly produce fill-rate optimism or pessimism.

3) Level truncation creates censoring

If your MBP feed shows only 10 levels and the market moves so your order falls outside the visible ladder, your queue estimator becomes partially blind. The order may still be live, but the price level is now out of scope.

4) Priority rules become easier to misapply

On MBO feeds, some venues expose explicit order priority or enough event detail to infer it robustly. On MBP feeds, the same venue may only expose level totals, forcing you to collapse venue-specific priority semantics into a rough heuristic.

5) Backtests get flattered

Many passive-order backtests built on MBP data end up too optimistic because they assume too much queue advancement from aggregate depth reductions. That optimism leaks into live routing, where the fills fail to arrive and the strategy pays later.


Concrete feed facts that matter

A few feed-level facts are operationally important:

  1. CME explicitly distinguishes MBP and MBO. CME's MBO FAQ says MBP consolidates quantity at each price level, typically only for the top visible levels, while MBO provides individual orders across all price levels.

  2. CME MBO exposes queue-tracking primitives. CME states that each order gets an anonymous OrderID and a PriorityID, and that proper queue sorting is done using PriorityID.

  3. Nasdaq TotalView is full-depth, order-level data. Nasdaq describes TotalView as displaying every single quote and order at every price level on Nasdaq.

  4. L3 feeds usually key events by order ID; L2 feeds key by price or price level. That keying choice is not just technical โ€” it determines whether queue state is directly reconstructable or must be inferred.

  5. Queue models on MBP can become censored when the level scrolls off the visible ladder. If your order falls outside MBP-10, queue estimation becomes materially weaker unless another data source restores visibility.

  6. Modification semantics are venue-specific. Some feeds imply that size decreases retain priority while size increases lose it; some venues expose more explicit priority handling than others. A portable model cannot hand-wave this away.
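A sketch of keeping modify semantics explicit and configurable per venue rather than hard-coded. The default rule shown (size decreases retain priority, increases re-queue) is one common convention mentioned above, not a universal fact; the helper and its parameters are hypothetical.

```python
def apply_modify(order, new_size, *, decrease_keeps_priority=True, next_seq=None):
    """Apply a size modify under explicit, venue-configurable priority rules.

    `order` is a dict {"size": int, "seq": int}; lower seq = earlier priority.
    The default (decreases keep priority, increases go to the back of the
    queue) is one common convention and must be configured per venue.
    """
    if new_size < order["size"] and decrease_keeps_priority:
        order["size"] = new_size      # shrink in place, priority retained
    else:
        order["size"] = new_size      # an increase re-queues the order
        order["seq"] = next_seq       # new sequence number = back of queue
    return order
```

Making the rule a parameter forces the capability registry, not folklore, to decide how each venue behaves.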


The real production failure path

Step 1) Research is done with richer observability than production

Common setup: research replays full-depth, order-level (MBO) history, while production consumes an aggregated MBP feed truncated to the top levels.

Step 2) A passive-order model learns too much confidence

The model learns that queue position is knowable, that passive fills arrive roughly on schedule, and that patience is cheap.

Step 3) Live deployment downgrades observability

Now the production system only sees aggregate size per price level, truncated to the top N visible levels.

Step 4) Queue advancement becomes a probabilistic guess

Each depth reduction at your level must be decomposed into executions at the front of the queue, cancellations ahead of your order, and cancellations behind it.

Step 5) The estimator is usually too optimistic

A naive model often assumes cancellations are uniformly distributed through the queue. Real queues are often less friendly: cancellations tend to be biased toward the back, which means aggregate depth reductions improve your true position less than the naive model hopes.
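The gap between the two assumptions is easy to quantify with a toy decomposition. The `front_bias` parameter below is illustrative and would be calibrated from order-level history in practice.

```python
def expected_advancement(depth_drop, q_ahead, q_behind, front_bias=1.0):
    """Expected reduction in queue-ahead quantity from an aggregate depth drop.

    front_bias = 1.0 reproduces the naive uniform assumption (cancels drawn
    proportionally to resting size). front_bias < 1.0 encodes back-of-queue
    bias: size ahead of you is `front_bias` times as likely to cancel per
    share as size behind you.
    """
    if q_ahead + q_behind == 0:
        return 0.0
    w_front = front_bias * q_ahead
    p_front = w_front / (w_front + q_behind)
    return depth_drop * p_front

# Same 500-share depth drop, 1000 shares ahead / 1000 behind:
uniform = expected_advancement(500, 1000, 1000)            # 250.0 shares
back_biased = expected_advancement(500, 1000, 1000, 0.25)  # 100.0 shares
```

The uniform model credits 2.5x more queue advancement than the back-biased one for the identical observable event, which is exactly the optimism described above.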

Step 6) Slippage appears somewhere else in the stack

The visible symptom is not labeled "queue observability mismatch." It appears as passive fill rates that undershoot forecasts, patience that runs too long, and late catch-up aggression that pays the spread after the edge is gone.


Core model

Let:

  • Q_true(t) = the true quantity ahead of your resting order at time t
  • Q_hat(t) = your estimate of that quantity
  • ΔD⁻(t) = an observed reduction in aggregate depth at your price level
  • p_front(x,t) = the estimated probability that a unit of that reduction was ahead of you
  • S(t) = a visibility indicator: 1 if your price level is inside the visible ladder, 0 otherwise

Under MBO-like observability, Q_hat(t) can be very close to Q_true(t). Under MBP-like observability, updates look more like:

Q_hat(t+) = Q_hat(t) - p_front(x,t) * ΔD⁻(t) + estimation_error(t)

When the level moves out of visible range:

S(t) = 0

and queue evolution becomes censored:

Q_hat(t+) = Q_hat(t) + censoring_error(t)

Then passive-fill estimates are formed from an uncertain queue state:

F_passive(t) = f(Q_hat(t), trade_arrival_rate(t), cancel_profile(t), time_to_deadline(t), venue_state(t))

The resulting slippage tax is roughly:

IS_obs_gap ≈ queue_optimism_cost + censoring_cost + portability_cost + late_catchup_cost + venue_misranking_cost

Interpretation: queue_optimism_cost is paid when inferred queue advancement exceeds the real thing; censoring_cost accrues while your level sits outside the visible ladder; portability_cost comes from applying one venue's priority semantics to another; late_catchup_cost is the aggression premium paid after passive fills fail to arrive; venue_misranking_cost comes from routing decisions that confuse feed quality with venue quality.
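A minimal sketch of the estimator the update equations above describe, with an explicit uncertainty term that grows on every inferred update and jumps under censoring. The growth constants are illustrative placeholders, not calibrated values.

```python
class QueueEstimator:
    """Tracks (Q_hat, uncertainty) for one resting order under MBP visibility.

    Implements Q_hat(t+) = Q_hat(t) - p_front * ΔD⁻, plus an uncertainty
    proxy that widens on inferred updates and accumulates with dwell time
    when the level is censored (S(t) = 0).
    """
    def __init__(self, q_init):
        self.q_hat = float(q_init)   # estimated quantity ahead of our order
        self.sigma = 0.0             # crude uncertainty proxy, in shares

    def on_depth_drop(self, delta_d, p_front):
        # Decompose an aggregate depth reduction probabilistically.
        inferred = p_front * delta_d
        self.q_hat = max(self.q_hat - inferred, 0.0)
        self.sigma += 0.5 * inferred     # every inference adds error

    def on_censored(self, dwell_seconds, rate=50.0):
        # Level left the visible ladder: no updates arrive, but
        # uncertainty must grow rather than stay flat.
        self.sigma += rate * dwell_seconds
```

The key design point is that silence (censoring) widens `sigma` instead of leaving it untouched, matching the rule that silence is not stability.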


Where the biggest mistakes happen

A) Train-on-MBO, deploy-on-MBP

Probably the most dangerous pattern. The research stack learns queue-exact behavior; production acts as if those same features still exist.

B) Mixing venues with different observability into one model

One venue exposes order-level adds/cancels, another only aggregates by price, and the model pretends these are feature-compatible. They are not.

C) Treating normalized vendor schemas as microstructure-equivalent

A nice normalized API can hide crucial differences in event keying (order-level vs price-level), visible depth, priority semantics, and modify rules.

D) Using MBP depth drops as if they imply equal queue advancement

They do not. Without a calibrated front-of-queue loss model, that assumption usually overstates passive fill probability.

E) Ignoring out-of-scope dwell time

If your order sits outside visible MBP depth for meaningful periods, queue estimates degrade sharply. Many systems silently ignore this censoring.

F) Letting fake depth ladders pass as real depth

Some consolidated or top-of-book feeds are presented in ladder form even though they are not true L2/L3 order book data. A ladder-shaped UI is not the same as decision-grade depth observability.


Feature set you actually want

Feed-granularity features: source type (MBO vs MBP), visible depth N, and whether your working price level is currently in scope.

Queue-estimation features: Q_hat and its uncertainty, the calibrated p_front profile, and dwell time outside visibility.

Venue-semantics features: priority rules for size decreases and increases, refresh behavior, and iceberg display behavior.

Calibration / outcome features: realized vs forecast passive fill rates and queue advancement, segmented by observability regime.


Regime state machine for live execution

EXACT_QUEUE

Use when: the feed is order-level (MBO/L3) and your order's book state is fully reconstructable.

Actions: track queue position directly, allow full passive patience, and quote fill probabilities with tight confidence.

ESTIMATED_QUEUE

Use when: the feed is price-level (MBP) but your price level sits inside the visible ladder.

Actions: update Q_hat through a calibrated p_front model, widen confidence intervals, and shorten patience relative to EXACT_QUEUE.

CENSORED_QUEUE

Use when: your order's price level has scrolled outside the visible MBP-N ladder.

Actions: stop crediting queue advancement, inflate uncertainty with dwell time, and re-evaluate whether the order should stay live.

HYBRID_MIXED

Use when: one parent order works across venues with different observability regimes.

Actions: keep per-venue queue engines and calibration layers; never pool their features into one model.

SAFE_DEGRADED

Trigger when: feed health degrades, sequence gaps appear, or the capability registry disagrees with observed behavior.

Actions: cut passive exposure, prefer simpler marketable tactics, and alert rather than guess.
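The regimes can be wired into a small classifier. The inputs and the priority ordering below are a sketch of one reasonable policy, not a production implementation.

```python
from enum import Enum, auto

class QueueRegime(Enum):
    EXACT_QUEUE = auto()
    ESTIMATED_QUEUE = auto()
    CENSORED_QUEUE = auto()
    HYBRID_MIXED = auto()
    SAFE_DEGRADED = auto()

def classify_regime(feed_is_mbo, level_visible, venues_mixed, feed_healthy):
    """Map observability facts to a regime. All inputs are booleans
    describing the current order/venue; safety outranks everything,
    then mixing, then feed granularity, then visibility."""
    if not feed_healthy:
        return QueueRegime.SAFE_DEGRADED
    if venues_mixed:
        return QueueRegime.HYBRID_MIXED
    if feed_is_mbo:
        return QueueRegime.EXACT_QUEUE
    if not level_visible:
        return QueueRegime.CENSORED_QUEUE
    return QueueRegime.ESTIMATED_QUEUE
```

The ordering matters: a sick feed should force SAFE_DEGRADED even when the venue nominally offers MBO.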


Control rules that save money

1) Make feed granularity a first-class feature

Do not bury MBO vs MBP differences in comments or config lore. The model should explicitly know what observability regime it is operating in.

2) Never share one passive fill model across exact and inferred queue regimes

At minimum, use separate calibration layers for exact-queue (MBO) regimes and inferred-queue (MBP) regimes, ideally split further by visible-depth configuration.

3) Learn a front-of-queue cancellation model instead of assuming uniform cancels

Depth reductions are not equally likely to occur in front of you. A naive uniform assumption is often too optimistic.

4) Treat out-of-scope levels as censoring, not as "nothing happened"

If the level leaves MBP-N visibility, uncertainty should jump immediately. Silence is not stability.

5) Penalize venue scores for observability downgrade

A venue should not look superior just because its feed makes passive fills easier to simulate. Separate true venue quality from measurement quality.

6) Backtest portability explicitly

Take MBO ground truth, degrade it to MBP-N, rerun the same strategy, and measure the policy drift. If live production resembles the degraded case, that is the benchmark that matters.
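A snapshot-level sketch of the degradation step, assuming a simple (price, size) order list; a full harness would degrade the event stream as well, not just snapshots.

```python
def degrade_to_mbp(orders, n_levels, side="bid"):
    """Collapse an order-level (MBO-style) snapshot into an MBP-N view.

    `orders` is a list of (price, size) tuples for resting orders on one
    side of the book. Returns the top n_levels as (price, aggregate_size)
    pairs; everything deeper is truncated, which is exactly the censoring
    the text describes.
    """
    levels = {}
    for price, size in orders:
        levels[price] = levels.get(price, 0) + size
    best_first = sorted(levels.items(), key=lambda kv: kv[0],
                        reverse=(side == "bid"))
    return best_first[:n_levels]

# Five resting orders, degraded to an MBP-2 bid ladder:
book = [(100, 300), (100, 200), (99, 100), (98, 50), (97, 10)]
mbp2 = degrade_to_mbp(book, 2)   # [(100, 500), (99, 100)]
```

Running the same strategy against the full list and against `degrade_to_mbp` output at several depths gives the policy-drift measurement the text calls for.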

7) Keep venue-specific priority semantics out of generic folklore

Whether size decreases retain priority, whether increases lose it, how refresh behaves, and how icebergs appear are venue-specific facts, not universal truths.


TCA / KPI layer

Track these directly: forecast vs realized passive fill rate, estimated vs realized queue advancement, time spent in the censored state, and the cost of late catch-up aggression.

Segment these by: venue, observability regime (exact / estimated / censored), visible depth configuration, and order dwell time outside the ladder.


Validation design that actually works

A) Ground-truth degradation test

Use MBO data as truth. Then intentionally degrade it into MBP-N variants at several depths (for example MBP-10 and top-of-book).

Measure how much fill forecasts, passive-order valuations, and routing decisions drift relative to the ground-truth run.

B) Counterfactual queue-estimator benchmark

Compare multiple p_front(x) families: uniform, back-biased, and state-dependent variants.

Select on fill forecast calibration and TCA tail stability, not on aesthetic simplicity.

C) Censoring stress test

Replay days where your orders frequently fall outside the visible MBP ladder. If the model still claims narrow confidence, it is lying.

D) Venue-portability audit

Take the same strategy and scorecard it separately under each venue's native observability and under a common degraded (MBP-N) floor applied to all venues.

If the venue ranking flips after observability control, you found measurement leakage.


Anti-patterns

In compressed form: training on MBO and deploying on MBP without repricing the downgrade; pooling venues with different observability into one model; treating normalized vendor schemas as microstructure-equivalent; converting depth drops one-for-one into queue advancement; ignoring censoring when levels scroll out of scope; and treating ladder-shaped UIs as decision-grade depth.

Implementation sketch

A robust production stack usually needs:

  1. feed-capability registry

    • per venue/data source: MBO vs MBP, visible depth, priority semantics, modify rules
  2. queue engine with confidence state

    • exact / estimated / censored modes
  3. venue-specific cancel-front model

    • calibrated from richer historical data where possible
  4. policy layer that consumes uncertainty, not just mean fill probability

    • patience should shrink when observability degrades
  5. degraded replay harness

    • MBO truth degraded to MBP variants for portability testing
  6. TCA attribution bucket for observability error

    • otherwise the desk mislabels the loss as generic market impact
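Item 1 can be as simple as one frozen dataclass per source. The field and venue names below are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class FeedCapability:
    """One registry row per venue/data source; fields are illustrative."""
    venue: str
    order_level: bool              # True = MBO/L3, False = MBP/L2
    visible_levels: Optional[int]  # None = full depth
    decrease_keeps_priority: bool
    increase_keeps_priority: bool

# Hypothetical registry entries:
REGISTRY = {
    "VENUE_A": FeedCapability("VENUE_A", True, None, True, False),
    "VENUE_B": FeedCapability("VENUE_B", False, 10, True, False),
}

def observability_tier(venue):
    """Collapse a capability row to the tier the policy layer consumes."""
    cap = REGISTRY[venue]
    if cap.order_level:
        return "exact"
    return "estimated" if cap.visible_levels else "estimated_full_depth"
```

Making the registry frozen and explicit keeps priority semantics out of config lore and lets the queue engine and TCA layer key off the same facts.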

Bottom line

A lot of passive slippage pain is not about spread, impact, or alpha decay first. It is about pretending you know where you are in the queue when the feed never actually told you.

MBO and MBP are different execution realities: one hands you queue state, the other hands you a queue inference problem.

If you train on order-level truth and deploy on price-level aggregates without explicitly pricing the observability downgrade, your slippage model will be overconfident by construction.

The fix is not mystical. Treat queue observability as a model input, queue position as an uncertainty distribution, and feed portability as a first-class slippage risk.


References