Byte Queue Limits (BQL) Oscillation & Wire-Cadence Slippage Playbook
Why this matters
Many execution stacks optimize strategy logic, venue routing, and feed latency, but miss a kernel-level source of hidden cost: transmit queue-limit oscillation.
On Linux, BQL dynamically controls how many bytes can sit in each NIC TX queue. When this control loop becomes unstable (too permissive, then too tight, then permissive again), wire cadence becomes sawtoothed:
- brief serialization bursts,
- queue drain/starvation cycles,
- clustered ACK/fill visibility,
- execution-policy overreaction (late urgency, cancel/replace spikes),
- p95/p99 implementation-shortfall lift.
Median latency can look acceptable while tail slippage quietly worsens.
Failure mechanism (host TX control loop -> execution tails)
- Application + qdisc produce bursty enqueue patterns (often amplified by offloads).
- Driver/NIC TX queue drains asynchronously; the BQL `limit` adapts from completion feedback.
- Under unstable conditions, `limit` oscillates around the true operating point.
- Wire departure cadence alternates between mini-burst and underfill/starvation phases.
- Child-order timing dephases from intended schedule and queue-priority assumptions.
Result: tail IS inflation driven by host transmit-control instability, not purely market regime.
Slippage decomposition with BQL term
For parent order (i):
[ IS_i = C_{impact} + C_{timing} + C_{routing} + C_{bql} ]
Where:
[ C_{bql} = C_{serialize} + C_{starve} + C_{burst-recover} ]
- (C_{serialize}): excess delay from oversized TX queue occupancy windows
- (C_{starve}): missed dispatch opportunities when queue goes briefly empty
- (C_{burst-recover}): clustered send behavior after starvation/limit correction
Operational metrics (new)
1) BUI — Byte-Queue Utilization
[ BUI_t = \frac{inflight_t}{\max(limit_t, \epsilon)} ] Per-queue occupancy pressure relative to dynamic limit.
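A minimal sketch of the BUI computation; the ε guard value of 1 byte is an assumption:

```python
def bui(inflight_bytes: int, limit_bytes: int, eps: float = 1.0) -> float:
    """Byte-Queue Utilization: occupancy relative to the dynamic BQL limit."""
    return inflight_bytes / max(limit_bytes, eps)

# BUI near 1.0 means the queue runs at its limit; values well above 1
# transiently indicate enqueue outpacing completions.
```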
2) LOS — Limit Oscillation Score
[ LOS = p95\left(\left|\Delta \log(limit_t + 1)\right|\right) ] Captures instability in BQL control movement.
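A sketch of LOS over a window of sampled `limit` values; the nearest-rank quantile choice is an implementation assumption:

```python
import math

def los(limits: list[int], q: float = 0.95) -> float:
    """Limit Oscillation Score: p-quantile of |delta log(limit + 1)| over a window."""
    deltas = sorted(
        abs(math.log(b + 1) - math.log(a + 1))
        for a, b in zip(limits, limits[1:])
    )
    # Nearest-rank quantile; interpolation choice is an implementation detail.
    idx = min(len(deltas) - 1, int(math.ceil(q * len(deltas))) - 1)
    return deltas[idx]
```

A flat limit series scores 0; a limit that swings by multiples of itself scores far higher than one with small corrections.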
3) TSR — TX Stall Rate
[ TSR = \frac{\Delta \text{stall\_cnt}}{\Delta t} ] Uses kernel BQL stall counters (where available) to quantify completion-stall episodes.
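A sketch of TSR from two successive `stall_cnt` samples; counter-wrap handling is omitted:

```python
def tsr(stall_cnt_prev: int, stall_cnt_now: int, dt_seconds: float) -> float:
    """TX Stall Rate: stall-counter growth per second between two samples.

    stall_cnt comes from .../byte_queue_limits/stall_cnt where the
    kernel/driver exposes it; counter wrap is not handled here.
    """
    return (stall_cnt_now - stall_cnt_prev) / dt_seconds
```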
4) WCV95 — Wire Cadence Variability p95
p95 absolute deviation of inter-departure gaps from target pacing gap.
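WCV95 can be sketched from departure timestamps as follows (same nearest-rank quantile assumption as LOS):

```python
import math

def wcv95(departure_ts: list[float], target_gap: float) -> float:
    """p95 absolute deviation of inter-departure gaps from the target pacing gap."""
    devs = sorted(
        abs((b - a) - target_gap)
        for a, b in zip(departure_ts, departure_ts[1:])
    )
    idx = min(len(devs) - 1, int(math.ceil(0.95 * len(devs))) - 1)
    return devs[idx]
```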
5) BOT — BQL Oscillation Tax
Incremental IS in high-LOS/high-TSR windows vs matched stable windows.
What to log in production
Kernel/NIC queue layer (per TX queue)
- `.../byte_queue_limits/limit`
- `.../byte_queue_limits/inflight`
- `.../byte_queue_limits/limit_min`, `limit_max`, `hold_time`
- `.../byte_queue_limits/stall_cnt`, `stall_max`, `stall_thrs` (if kernel/driver supports them)
- NIC ring/queue stats and TX timeout counters
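A minimal sysfs poller sketch for these paths; stall-counter availability depends on kernel/driver support, and the `sysfs_root` override is an assumption added only to make the function testable:

```python
from pathlib import Path

# Per-queue BQL attributes under the kernel sysfs ABI; not all kernels
# or drivers expose the stall_* counters.
BQL_FIELDS = ("limit", "inflight", "limit_min", "limit_max",
              "hold_time", "stall_cnt", "stall_max")

def read_bql(iface: str, queue: str = "tx-0",
             sysfs_root: str = "/sys/class/net") -> dict[str, int]:
    """Read per-queue BQL counters; attributes absent on this host are skipped."""
    base = Path(sysfs_root) / iface / "queues" / queue / "byte_queue_limits"
    out: dict[str, int] = {}
    for name in BQL_FIELDS:
        path = base / name
        if path.exists():
            out[name] = int(path.read_text())
    return out
```

In production this would be sampled at a fixed cadence per TX queue and joined against dispatch timestamps.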
Qdisc/transport layer
- qdisc backlog bytes/packets
- pacing config (`sch_fq`/socket pacing caps)
- retransmit bursts and ACK inter-arrival variance
Execution layer
- dispatch gap deviation vs schedule
- cancel/replace burst ratio around high-LOS windows
- short-horizon markout and tail IS uplift (BOT)
Identification strategy (causal)
- Match windows by spread, volatility, participation, and TOD bucket.
- Segment into `BQL_STABLE` vs `BQL_OSCILLATING` by LOS/TSR thresholds.
- Estimate incremental tail IS with host and symbol fixed effects.
- Run intervention canaries:
- pacing/qdisc changes (e.g., fq tuning),
- TX queue/ring tuning,
- offload profile changes,
- BQL bound adjustments where policy permits.
- Confirm BOT reduction while market covariates stay matched.
If BOT falls after host-TX interventions, the effect is infra-causal.
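The matched-window BOT estimate can be sketched as a bucketed difference of means; field names are illustrative, and a production version would use the fixed-effects regression described above:

```python
from collections import defaultdict
from statistics import mean

def bot(windows: list[dict]) -> float:
    """BQL Oscillation Tax: mean tail-IS uplift of oscillating windows over
    stable windows, averaged across matched (spread/vol/participation/TOD)
    buckets that contain both regimes. Field names are illustrative."""
    buckets: dict[tuple, dict[str, list[float]]] = defaultdict(
        lambda: {"stable": [], "osc": []}
    )
    for w in windows:
        key = (w["spread_bkt"], w["vol_bkt"], w["part_bkt"], w["tod_bkt"])
        regime = "osc" if w["oscillating"] else "stable"
        buckets[key][regime].append(w["tail_is_bps"])
    # Only buckets with both regimes contribute; unmatched buckets are dropped.
    uplifts = [
        mean(b["osc"]) - mean(b["stable"])
        for b in buckets.values()
        if b["osc"] and b["stable"]
    ]
    return mean(uplifts) if uplifts else 0.0
```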
Regime state machine
BQL_STABLE
- low LOS, near-target BUI, no meaningful stall growth
- normal execution policy
BQL_SWING
- rising LOS, intermittent cadence distortion
- damp urgency escalation, tighten retry aggressiveness
BQL_STALLING
- elevated TSR/stall_max with cadence collapse episodes
- cap aggression, increase schedule smoothing, preserve control stability
BQL_SAFE_CONTAIN
- repeated tail breaches under unstable TX loop
- force conservative mode, isolate path, prioritize deterministic dispatch
Use hysteresis + minimum dwell to avoid control flapping.
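A dwell-based sketch of this state machine; the thresholds are illustrative, not tuned, and separate entry/exit thresholds per state would add fuller hysteresis:

```python
class BqlRegime:
    """Minimal regime tracker with a minimum-dwell guard against flapping."""

    def __init__(self, min_dwell: int = 5):
        self.state = "BQL_STABLE"
        self.dwell = 0          # samples spent since the last transition
        self.min_dwell = min_dwell

    def update(self, los: float, tsr: float, tail_breaches: int) -> str:
        # Raw classification from current telemetry (thresholds illustrative).
        if tail_breaches >= 3:
            target = "BQL_SAFE_CONTAIN"
        elif tsr > 1.0:
            target = "BQL_STALLING"
        elif los > 0.2:
            target = "BQL_SWING"
        else:
            target = "BQL_STABLE"
        self.dwell += 1
        # Dwell guard: only leave the current state after min_dwell samples.
        if target != self.state and self.dwell >= self.min_dwell:
            self.state, self.dwell = target, 0
        return self.state
```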
Control ladder
- Make TX queue state observable first
- without per-queue BQL telemetry, “random venue noise” diagnosis is unreliable.
- Stabilize pacing upstream of NIC queue
- use fair-queue pacing intentionally; avoid unbounded burst injection.
- Tune queue bounds conservatively
- over-large limits can hide latency in driver/NIC queues.
- Handle offload interactions explicitly
- TSO/GSO profiles can amplify byte-burst shape into cadence distortion.
- Use stall counters as hard safety signals
- repeated completion stalls should trigger automatic defensive execution mode.
- Model LOS/TSR as first-class slippage features
- include in mean + tail heads, not just dashboard alerts.
Failure drills (must run)
- Burst-injection drill
- reproduce high enqueue burstiness and validate LOS/TSR detection.
- Pacing-canary drill
- compare BOT before/after pacing policy changes.
- Bound-sensitivity drill
- controlled `limit_min`/`limit_max` experiments with a rollback plan.
- Stall-threshold drill
- validate `stall_thrs` alerting and SAFE_CONTAIN transition behavior.
Common mistakes
- Treating BQL as “kernel internals” irrelevant to execution quality
- Optimizing median latency while ignoring cadence distortion tails
- Raising queue depth to fix throughput and unintentionally increasing slippage tails
- Running pacing without validating per-queue TX stability outcomes
Bottom line
BQL is a control loop, not just a queue knob.
When that loop oscillates, execution timing becomes non-deterministic and tail slippage rises. Treat per-queue BQL telemetry and stall signals as first-class inputs to live slippage control.
References
- Linux kernel ABI: /sys/class/net/<iface>/queues/.../byte_queue_limits/*
- tc-fq(8) manual (Linux fair-queue pacing)
- Dan Siemon, Queueing in the Linux Network Stack (BQL and buffering behavior)
- LWN.net overview of Byte Queue Limits