Microprice + Order-Book Imbalance: A Practical Modeling Playbook

If you only look at mid-price, you miss the queue pressure that often drives the next short-horizon move.

For intraday execution and market making, microprice and imbalance features form a compact, high-signal baseline to establish before moving to full deep learning.

One-Line Intuition

Mid-price tells you where price is; microprice + imbalance tell you where price pressure is.

Core Definitions You Actually Need

Let:

Best bid price/size: ((P_b, Q_b))
Best ask price/size: ((P_a, Q_a))
Mid-price: (M = (P_b + P_a)/2)
Spread: (S = P_a - P_b)

1) Top-of-book imbalance

Two common forms:

[ I_t^{(ratio)} = \frac{Q_b}{Q_b + Q_a}, \qquad I_t^{(signed)} = \frac{Q_b - Q_a}{Q_b + Q_a} ]

Interpretation:

Higher bid-side size (relative to ask) -> upward short-horizon pressure
Higher ask-side size -> downward pressure

2) Classic level-1 microprice

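A common size-weighted estimator (often called the weighted mid-price; Stoikov (2018) refines it into a micro-price defined as the limit of expected future mid-prices given the current imbalance and spread):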

[ \mu_t = \frac{P_a Q_b + P_b Q_a}{Q_b + Q_a} ]

Equivalent view:

[ \mu_t = M_t + \frac{S_t}{2} \cdot I_t^{(signed)} ]

So the microprice is the mid-price shifted by up to half the spread: toward the ask when the bid queue is larger, toward the bid when the ask queue is larger.
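A minimal Python sketch of the level-1 definitions above, assuming a single snapshot with strictly positive sizes on both sides (the `TopOfBook` container is a hypothetical helper, not part of any library). It also checks numerically that the weighted form and the mid-plus-adjustment form of the microprice agree.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TopOfBook:
    """Hypothetical level-1 snapshot: best bid/ask price and size."""
    bid_px: float
    bid_sz: float
    ask_px: float
    ask_sz: float


def imbalance_ratio(b: TopOfBook) -> float:
    """I^(ratio) = Q_b / (Q_b + Q_a), in [0, 1]."""
    return b.bid_sz / (b.bid_sz + b.ask_sz)


def imbalance_signed(b: TopOfBook) -> float:
    """I^(signed) = (Q_b - Q_a) / (Q_b + Q_a), in [-1, 1]."""
    return (b.bid_sz - b.ask_sz) / (b.bid_sz + b.ask_sz)


def microprice(b: TopOfBook) -> float:
    """Size-weighted form: (P_a * Q_b + P_b * Q_a) / (Q_b + Q_a)."""
    return (b.ask_px * b.bid_sz + b.bid_px * b.ask_sz) / (b.bid_sz + b.ask_sz)


def microprice_from_mid(b: TopOfBook) -> float:
    """Equivalent form: M + (S / 2) * I^(signed)."""
    mid = (b.bid_px + b.ask_px) / 2
    spread = b.ask_px - b.bid_px
    return mid + spread / 2 * imbalance_signed(b)


# Bid queue is 3x the ask queue, so the microprice sits above the mid (100.01).
book = TopOfBook(bid_px=100.00, bid_sz=300, ask_px=100.02, ask_sz=100)
print(imbalance_ratio(book))                # 0.75
print(imbalance_signed(book))               # 0.5
print(round(microprice(book), 4))           # 100.015
print(round(microprice_from_mid(book), 4))  # 100.015
```

An empty side (zero total size) would divide by zero here; a real feed handler should treat that as a data-quality event rather than a feature value.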

3) Order-flow imbalance (OFI)

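At the event level, track queue changes (new orders, cancellations, executions) rather than static snapshots alone:

[ \text{OFI}_{[t,t+\Delta]} = \sum_{n:\, t < \tau_n \le t+\Delta} e_n ]

where (\tau_n) is the time of the (n)-th best bid/ask update and (e_n) is its signed contribution (roughly: bid-queue growth and ask-queue depletion count positive, the reverse negative). In practice, OFI often explains the immediate (\Delta P) better than plain traded volume does.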
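The (e_n) term can be made concrete with the event-level definition from Cont, Kukanov & Stoikov (2014), computed here from consecutive level-1 states. This is a sketch assuming each update arrives as a full L1 snapshot `(bid_px, bid_sz, ask_px, ask_sz)`; a production version would consume the venue's own message types.

```python
from typing import Sequence, Tuple

# (bid_px, bid_sz, ask_px, ask_sz) for one L1 state
Quote = Tuple[float, float, float, float]


def ofi_contribution(prev: Quote, cur: Quote) -> float:
    """Signed OFI contribution e_n between two consecutive L1 states."""
    pb0, qb0, pa0, qa0 = prev
    pb1, qb1, pa1, qa1 = cur
    e = 0.0
    # Bid side: size at an unchanged or higher bid adds buy pressure;
    # size at an unchanged or lower previous bid is removed buy pressure.
    if pb1 >= pb0:
        e += qb1
    if pb1 <= pb0:
        e -= qb0
    # Ask side: mirror image with opposite signs.
    if pa1 <= pa0:
        e -= qa1
    if pa1 >= pa0:
        e += qa0
    return e


def ofi(quotes: Sequence[Quote]) -> float:
    """OFI over a window: sum of e_n across consecutive updates."""
    return sum(ofi_contribution(a, b) for a, b in zip(quotes, quotes[1:]))


quotes = [
    (100.00, 500, 100.01, 400),
    (100.00, 700, 100.01, 400),  # 200 added to the bid queue: e = +200
    (100.00, 700, 100.01, 150),  # 250 removed from the ask queue: e = +250
    (100.01, 300, 100.02, 600),  # bid ticks up (+300), ask ticks up (+150)
]
print(ofi(quotes))  # 900.0
```

With both prices unchanged, the rule reduces to (change in bid size) minus (change in ask size), which is the intuition behind the formula.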

Empirical Facts Worth Building Around

Short-horizon price change is strongly linked to order-flow imbalance; a linear first-order approximation, (\Delta P \approx \beta \cdot \text{OFI}), is surprisingly useful.
Depth matters as a scaling denominator: the same OFI moves price less in a deeper book, so (\beta) shrinks roughly in proportion to 1/depth.
Volume alone is noisier than queue-aware flow metrics at short horizons.
State dependence is real: spread regime, queue shape, and event intensity alter the mapping from imbalance to future move.
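The first-order linear relation between price change and OFI, plus the depth scaling, can be sketched as a through-the-origin least-squares fit. The numbers below are made up purely to exercise the code, not empirical estimates; Cont, Kukanov & Stoikov (2014) report the impact coefficient to be roughly inversely proportional to depth, which is why a depth-normalized OFI is a common input.

```python
from typing import List, Sequence


def fit_impact_slope(ofi: Sequence[float], dmid: Sequence[float]) -> float:
    """Least-squares slope beta in dMid ~= beta * OFI (no intercept)."""
    sxy = sum(x * y for x, y in zip(ofi, dmid))
    sxx = sum(x * x for x in ofi)
    return sxy / sxx


def depth_normalized_ofi(ofi: Sequence[float], depth: Sequence[float]) -> List[float]:
    """OFI divided by average L1 depth over the same window, so the same
    OFI counts for less in a deeper book."""
    return [x / d for x, d in zip(ofi, depth)]


# Synthetic per-window values: OFI in shares, mid change in ticks.
ofi_windows = [100.0, -50.0, 200.0]
dmid_windows = [0.5, -0.25, 1.0]
print(fit_impact_slope(ofi_windows, dmid_windows))  # 0.005 ticks per share
```

Fitting the slope separately per spread or depth bucket is one simple way to expose the state dependence noted in the last bullet.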

Practical Feature Set (Strong Baseline)

For horizons like 100ms / 500ms / 1s:

L1 snapshot: (S_t, I_t^{(signed)}, \mu_t - M_t)
Multi-level imbalance (levels 1 to 5, or 1 to 10)
OFI windows: 50ms/100ms/500ms rolling
Event intensity: updates/sec, trades/sec, cancels/sec
Queue turnover: cancel-to-add ratio by side
Volatility proxy: microprice variance over a short rolling window
Time-of-day phase (open/close/midday)
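Two of the features above, sketched in Python under stated assumptions: multi-level imbalance with an optional geometric level weight (`decay` is a hypothetical tuning knob, not a standard), and a per-side cancel-to-add ratio over one window. Sizes are assumed to be ordered from the best level outward.

```python
import math
from typing import Sequence


def multilevel_imbalance(bid_sizes: Sequence[float], ask_sizes: Sequence[float],
                         levels: int = 5, decay: float = 1.0) -> float:
    """Signed imbalance over the top `levels` price levels of each side.

    Level k (0 = best) is weighted by decay**k; decay=1.0 is an equal-weight
    sum, and levels=1 reduces to the L1 signed imbalance.
    """
    wb = sum(q * decay**k for k, q in enumerate(bid_sizes[:levels]))
    wa = sum(q * decay**k for k, q in enumerate(ask_sizes[:levels]))
    total = wb + wa
    return 0.0 if total == 0 else (wb - wa) / total


def cancel_to_add_ratio(cancelled: float, added: float) -> float:
    """Queue-turnover proxy for one side over a window (NaN if nothing was added)."""
    return cancelled / added if added > 0 else math.nan


bids = [300, 200, 100]
asks = [100, 100, 100]
print(multilevel_imbalance(bids, asks, levels=3))             # equal weights
print(multilevel_imbalance(bids, asks, levels=3, decay=0.5))  # best level dominates
```

A decaying weight is one way to let deeper levels inform the signal without letting stale far-from-touch liquidity dominate it.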

Minimal target (label) examples:

Classification: (\operatorname{sign}(M_{t+h} - M_t))
Regression: (M_{t+h} - M_t)
Execution-oriented: expected markout at (h)
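A sketch of the regression and classification labels, assuming clock-time horizons on integer millisecond timestamps and an as-of lookup (the mid in force at (t+h) is the last one observed at or before it). Events whose horizon runs past the end of the data get NaN so they can be dropped rather than silently mislabeled.

```python
from bisect import bisect_right
from typing import List, Sequence


def forward_mid_change(ts_ms: Sequence[int], mid: Sequence[float],
                       h_ms: int) -> List[float]:
    """Regression label M_{t+h} - M_t for every event.

    ts_ms must be sorted ascending. M_{t+h} is the last mid observed at or
    before t+h; events whose horizon extends past the data are labeled NaN.
    """
    out: List[float] = []
    for t, m in zip(ts_ms, mid):
        if t + h_ms > ts_ms[-1]:
            out.append(float("nan"))
            continue
        j = bisect_right(ts_ms, t + h_ms) - 1
        out.append(mid[j] - m)
    return out


def sign_label(dm: float) -> int:
    """Classification label: +1 up, -1 down, 0 unchanged (drop NaN labels first)."""
    return (dm > 0) - (dm < 0)


ts = [0, 100, 250, 400, 600]
mid_ticks = [1000.0, 1000.5, 1001.0, 1001.0, 1000.5]  # illustrative mid, in ticks
labels = forward_mid_change(ts, mid_ticks, h_ms=200)
print(labels)  # [0.5, 0.5, 0.0, -0.5, nan]
```

Keeping the mid in ticks avoids floating-point noise in the zero-change class, which matters for the three-way sign label.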

Modeling Ladder (Start Simple, Then Escalate)

Stage A — Fast linear baseline

Ridge/Lasso on ([I, \text{OFI}, S, \text{depth}])
Separate models by spread regime (1 tick vs >1 tick)
Refit frequently (intraday/rolling)
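A dependency-free sketch of Stage A: closed-form ridge fitted separately per spread regime (Lasso needs an iterative solver, so it is omitted). The small Gaussian-elimination solver exists only to keep the example self-contained; in practice a linear-algebra library would do this. Features are assumed to be standardized beforehand, and the regime keys `"1tick"`/`"wide"` are illustrative names.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ridge_fit(X: Sequence[Sequence[float]], y: Sequence[float], lam: float) -> List[float]:
    """Ridge weights w = (X'X + lam*I)^-1 X'y, with no intercept term."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
            for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * t for row, t in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


def fit_by_spread_regime(X, y, spread_ticks, lam: float = 1.0) -> Dict[str, List[float]]:
    """One ridge model per regime: spread of 1 tick vs more than 1 tick."""
    groups = defaultdict(lambda: ([], []))
    for row, t, s in zip(X, y, spread_ticks):
        key = "1tick" if s <= 1 else "wide"
        groups[key][0].append(row)
        groups[key][1].append(t)
    return {k: ridge_fit(xs, ys, lam) for k, (xs, ys) in groups.items()}


# Rows could be [I, OFI/depth, S, depth]; two toy features keep it readable.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [2.0, 3.0, 5.0]
print(ridge_fit(X, y, lam=0.0))   # [2.0, 3.0]: exact fit, no shrinkage
print(ridge_fit(X, y, lam=10.0))  # weights shrunk toward zero
```

Frequent refits are cheap here because the solve is only (k \times k) for (k) features.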

Stage B — Regime-aware nonlinear model

Gradient boosting or small MLP
Interaction terms: (\text{OFI}/\text{depth}), (I \times S), cancel intensity × spread
Calibrate per symbol cluster (liquid/illiquid buckets)

Stage C — Sequence model

CNN/LSTM/Transformer over event stream or LOB tensors
Keep microprice/OFI features as explicit channels (helps robustness/debuggability)
Use strict walk-forward + latency-aware inference budget
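Strict walk-forward evaluation can be sketched as a rolling-window splitter over time-ordered rows. The `gap` parameter is an assumption of this sketch: it drops rows between the train and test blocks so that labels looking (h) ahead from the end of training cannot overlap the test period.

```python
from typing import Iterator, Tuple


def walk_forward_splits(n: int, train_size: int, test_size: int,
                        gap: int = 0) -> Iterator[Tuple[range, range]]:
    """Yield chronological (train, test) index ranges over n time-ordered rows.

    Training always precedes testing, `gap` rows are skipped in between, and
    the window rolls forward by one test block per split.
    """
    start = 0
    while start + train_size + gap + test_size <= n:
        test_start = start + train_size + gap
        yield range(start, start + train_size), range(test_start, test_start + test_size)
        start += test_size


for train, test in walk_forward_splits(10, train_size=4, test_size=2, gap=1):
    print(list(train), list(test))
# [0, 1, 2, 3] [5, 6]
# [2, 3, 4, 5] [7, 8]
```

Unlike random K-fold CV, no test row ever precedes a training row, which is the property that matters for order-book features with strong autocorrelation.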

Calibration & Monitoring (Where Most Systems Fail)

1) Calibration

Check reliability within each decile of predicted move probability (mean predicted vs realized frequency)
Maintain horizon-specific calibration (100ms model calibration rarely transfers to 1s)
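Decile reliability can be computed by sorting predictions, cutting them into equal-count groups, and comparing the mean predicted probability with the realized up-move rate in each group. A minimal sketch, assuming binary outcomes (1 = mid moved up at horizon (h)); one table per horizon keeps the calibration horizon-specific.

```python
from typing import List, Sequence, Tuple


def reliability_by_decile(p_up: Sequence[float], went_up: Sequence[int],
                          groups: int = 10) -> List[Tuple[float, float, int]]:
    """Return (mean predicted p, realized up-rate, count) per equal-count group."""
    pairs = sorted(zip(p_up, went_up))
    n = len(pairs)
    table = []
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        mean_p = sum(p for p, _ in chunk) / len(chunk)
        rate = sum(y for _, y in chunk) / len(chunk)
        table.append((mean_p, rate, len(chunk)))
    return table


def max_calibration_gap(table: List[Tuple[float, float, int]]) -> float:
    """Largest |mean predicted - realized| across groups; 0 means perfectly calibrated."""
    return max(abs(p - r) for p, r, _ in table)


t = reliability_by_decile([0.1, 0.9, 0.1, 0.9], [0, 1, 0, 1], groups=2)
print(t)  # [(0.1, 0.0, 2), (0.9, 1.0, 2)]
```

Tracking the gap per horizon over time gives a simple alarm for when a 100ms model's probabilities stop meaning what they used to.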

2) Drift diagnostics

Monitor at least:

hit-rate by spread regime
signed error conditional on imbalance decile
realized markout vs predicted markout
feature distribution drift (PSI/KS on (I), OFI, spread, depth)
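For the PSI half of the drift check, one common construction bins the current window using reference-window quantiles and sums ((a_i - e_i)\ln(a_i/e_i)) over bins. This is a sketch only: the bin count and the `eps` floor for empty bins are assumptions. For the KS side, `scipy.stats.ks_2samp` is an existing option.

```python
import math
from bisect import bisect_right
from typing import List, Sequence


def psi(reference: Sequence[float], current: Sequence[float], bins: int = 10,
        eps: float = 1e-6) -> float:
    """Population Stability Index of `current` against `reference`.

    Bin edges are reference quantiles; eps keeps empty bins from giving log(0).
    """
    ref = sorted(reference)
    edges = [ref[int(len(ref) * k / bins)] for k in range(1, bins)]

    def shares(xs: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for x in xs:
            counts[bisect_right(edges, x)] += 1
        return [max(c / len(xs), eps) for c in counts]

    e, a = shares(reference), shares(current)
    return sum((ai - ei) * math.log(ai / ei) for ai, ei in zip(a, e))


ref = [float(i) for i in range(100)]      # e.g. yesterday's imbalance values
print(psi(ref, ref))                      # 0.0: no drift
print(psi(ref, [x + 50 for x in ref]))    # large: distribution shifted
```

A frequently quoted rule of thumb treats PSI above roughly 0.25 as a material shift, but thresholds are better set per feature from their own history.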

3) Risk gating

Downweight or suspend signal when:

spread widens abruptly
cancellation burst exceeds threshold
market data gap / sequence uncertainty
auction or halt-transition regime detected
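The gating rules can be sketched as a multiplier applied to the forecast before it reaches the execution layer. The `GateInputs` fields and both thresholds (`spread_jump`, `cancel_burst`) are hypothetical placeholders chosen for illustration; real values would be tuned per symbol and venue.

```python
from dataclasses import dataclass


@dataclass
class GateInputs:
    """Hypothetical per-decision market-state summary."""
    spread_ticks: float         # current spread in ticks
    spread_ticks_median: float  # recent rolling median spread in ticks
    cancels_per_sec: float      # recent cancellation rate
    seq_gap: bool               # market-data gap or sequence uncertainty
    auction_or_halt: bool       # venue is not in continuous trading


def signal_weight(g: GateInputs, spread_jump: float = 3.0,
                  cancel_burst: float = 500.0) -> float:
    """Return a multiplier in [0, 1] for the forecast (0 = suspend)."""
    if g.seq_gap or g.auction_or_halt:
        return 0.0  # book state cannot be trusted: suspend
    if g.spread_ticks > spread_jump * g.spread_ticks_median:
        return 0.0  # abrupt spread widening: suspend
    if g.cancels_per_sec > cancel_burst:
        return 0.5  # cancellation burst: downweight
    return 1.0


print(signal_weight(GateInputs(1.0, 1.0, 50.0, False, False)))   # 1.0
print(signal_weight(GateInputs(1.0, 1.0, 600.0, False, False)))  # 0.5
```

Ordering the checks from "data untrustworthy" to "data unusual" keeps the hardest stops from being masked by softer downweights.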

Execution Integration (Decision Layer)

Use the forecast as one input, not the sole trigger:

If upward microprice pressure + low adverse-selection score -> lean passive bid / delay aggressive buy
If downward pressure while long inventory -> accelerate passive unwind or cross smaller slices
Tie decision to inventory, urgency, and remaining schedule budget

A practical control form:

[ \text{Aggressiveness} = f(\text{forecast edge}, \text{inventory risk}, \text{time urgency}, \text{slippage budget left}) ]
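One hypothetical choice of (f) is a logistic score over a weighted sum of the four inputs. The weights, the bias, and the exact definition of each input below are illustrative assumptions, not a calibrated policy; the point is only that each input pushes the score in a clear direction.

```python
import math
from typing import Sequence


def aggressiveness(adverse_edge: float, inventory_risk: float,
                   time_urgency: float, budget_left: float,
                   weights: Sequence[float] = (1.0, 1.0, 2.0, 1.0),
                   bias: float = -2.0) -> float:
    """Return a 0..1 score: 0 = rest passively, 1 = cross the spread now.

    adverse_edge:   forecast move against the order if it waits (e.g. ticks);
                    positive for a buy when microprice pressure is upward.
    inventory_risk: normalized inventory exposure, >= 0.
    time_urgency:   fraction of schedule elapsed minus fraction filled.
    budget_left:    fraction of the slippage budget still unspent, in [0, 1].
    """
    inputs = (adverse_edge, inventory_risk, time_urgency, budget_left)
    z = bias + sum(w * x for w, x in zip(weights, inputs))
    z = max(-50.0, min(50.0, z))  # keep exp() well inside float range
    return 1.0 / (1.0 + math.exp(-z))


print(aggressiveness(0.0, 0.0, 0.0, 0.0))  # low: nothing argues for crossing
print(aggressiveness(1.0, 0.5, 0.6, 0.8))  # higher: adverse pressure, behind schedule
```

How the score maps to concrete actions (rest, join, improve, cross a slice) remains a separate, venue-specific policy decision.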

Common Mistakes

Training on snapshot imbalance only; ignoring event-flow dynamics
Mixing horizons (features at 50ms, labels at 2s) without explicit rationale
Using random CV instead of chronological walk-forward
Ignoring queue position and fill uncertainty in execution evaluation
Treating paper alpha as deployable alpha without inference-latency accounting

Minimal Implementation Checklist

Build event-time LOB reconstruction with deterministic replay
Compute L1/Lk imbalance + OFI features
Train per-horizon baseline (linear + tree)
Evaluate with walk-forward and markout-oriented metrics
Add calibration + drift monitor
Integrate with execution policy under risk gates
Promote via champion/challenger rollout

One-Sentence Summary

Microprice and order-book imbalance are low-latency, high-value priors for short-horizon direction and execution timing; the edge comes from regime-aware calibration, drift control, and disciplined execution integration—not from a fancy model alone.

References (Starter Set)

Cont, R., Kukanov, A., Stoikov, S. (2014). The Price Impact of Order Book Events. Journal of Financial Econometrics. arXiv:1011.6402 — https://arxiv.org/abs/1011.6402
Huang, W., Lehalle, C.-A., Rosenbaum, M. (2015). Simulating and Analyzing Order Book Data: The Queue-Reactive Model. JASA. arXiv:1312.0563 — https://arxiv.org/abs/1312.0563
Stoikov, S. (2018). The Micro-Price: A High-Frequency Estimator of Future Prices. SSRN 2970694 — https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2970694
Zhang, Z., Zohren, S., Roberts, S. (2019). DeepLOB. IEEE TSP. arXiv:1808.03668 — https://arxiv.org/abs/1808.03668
Blakely, C. D. (2024). High resolution microprice estimates... arXiv:2411.13594 — https://arxiv.org/abs/2411.13594