TCP Small Queues (TSQ) Throttle Oscillation & Slippage Playbook
Why this matters
Execution teams often tune strategy logic and venue selection, but miss a kernel send-path control loop: TCP Small Queues (TSQ).
TSQ (governed by net.ipv4.tcp_limit_output_bytes) limits per-socket bytes queued in qdisc/device,
so one sender cannot create excessive local buffering.
When mis-tuned for low-latency execution traffic, TSQ can create a hidden slippage tax in two opposite ways:
- Too tight -> repeated throttle/unthrottle cycles (pushback bursts)
- Too loose -> local queue inflation (artificial RTT, stale dispatch)
Either way, child-order wire timing drifts from intended schedule, and p95/p99 implementation shortfall worsens.
Failure mechanism (socket pushback loop -> execution tails)
- Strategy emits clustered small writes (common near urgency transitions or multi-venue rebalancing).
- Per-socket queued bytes rise toward tcp_limit_output_bytes.
- The socket is throttled; app-side sends observe pushback (or a delayed dequeue opportunity).
- As queued skbs free, socket resumes; writes re-enter in mini-bursts.
- Child-order timing phase-locks to kernel wake/dequeue cycles instead of execution policy cadence.
Result: dispatch aliasing - a transport-side cadence imposed on trading logic.
Slippage decomposition with TSQ term
For parent order i:
[ IS_i = C_{impact} + C_{timing} + C_{routing} + C_{tsq} ]
Where:
[ C_{tsq} = C_{pushback} + C_{phase-lock} + C_{recovery-burst} ]
- C_{pushback}: delay from repeated TSQ throttling windows
- C_{phase-lock}: mismatch between intended child cadence and kernel release cadence
- C_{recovery-burst}: urgency overshoot after throttle release (cancel/replace or sweep clusters)
Operational metrics (new)
1) TTR - TSQ Throttle Ratio
[ TTR = \frac{t_{throttled}}{t_{active\_send}} ]
Fraction of active send time spent in a throttled condition.
2) TUR95 - Throttle-Unthrottle Run p95
p95 duration of contiguous throttle episodes (ms).
3) WDA95 - Write-to-Departure Age p95
p95 delay from app write timestamp to first observed wire departure for that write cohort.
4) SCA - Send Cadence Aliasing
Normalized mismatch between intended dispatch gap and realized wire gap.
5) TPT - TSQ Pushback Tax
Incremental IS in high-TTR/high-WDA windows versus matched low-TTR windows.
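The first three metrics reduce to percentile arithmetic over collected telemetry. A minimal sketch, assuming throttle episodes arrive as (start, end) second pairs and write/departure timestamps are paired per cohort; all function and field names here are illustrative, not from any existing library:

```python
# Sketch: computing TTR, TUR95, and WDA95 from collected telemetry.
def p95(values):
    """Nearest-rank 95th percentile; avoids numpy for self-containment."""
    s = sorted(values)
    idx = max(0, round(0.95 * len(s)) - 1)
    return s[idx]

def ttr(throttle_episodes, active_send_seconds):
    """TSQ Throttle Ratio: throttled time / active send time."""
    throttled = sum(end - start for start, end in throttle_episodes)
    return throttled / active_send_seconds

def tur95(throttle_episodes):
    """p95 duration of contiguous throttle episodes, in ms."""
    return p95([(end - start) * 1e3 for start, end in throttle_episodes])

def wda95(write_ts, depart_ts):
    """p95 write-to-departure age, in ms, over paired write cohorts."""
    return p95([(d - w) * 1e3 for w, d in zip(write_ts, depart_ts)])

episodes = [(0.000, 0.004), (0.100, 0.101), (0.250, 0.259)]
print(round(ttr(episodes, active_send_seconds=1.0), 3))  # 0.014
print(round(tur95(episodes), 3))                         # 9.0
print(round(wda95([0.0, 0.5], [0.002, 0.503]), 3))       # 3.0
```

SCA and TPT need intended-schedule and IS inputs respectively, but follow the same window-aggregation pattern.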
What to log in production
Kernel / transport layer
- net.ipv4.tcp_limit_output_bytes (current value and change history)
- qdisc backlog bytes/packets by interface/class
- socket send-memory signals (ss -tinm snapshots, per-session skmem trends)
- pacing/qdisc policy (fq, pacing caps, host-level shaping)
- relevant TCP send-path sysctls: tcp_autocorking, tcp_tso_win_divisor, tcp_pacing_ss_ratio, tcp_pacing_ca_ratio
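Two of these signals can be captured without special tooling: the sysctl is readable from procfs, and qdisc backlog is parseable from iproute2's `tc -s qdisc show` output. A sketch, with the regex and field positions being assumptions to validate against your iproute2 version:

```python
# Sketch: capturing TSQ budget and qdisc backlog for the logging pipeline.
import re

def read_tcp_limit_output_bytes(path="/proc/sys/net/ipv4/tcp_limit_output_bytes"):
    """Current per-socket TSQ queue budget; log alongside a change timestamp."""
    with open(path) as f:
        return int(f.read().strip())

def parse_qdisc_backlog(tc_stats_output):
    """Extract (bytes, packets) backlog pairs from `tc -s qdisc show dev <if>` text."""
    return [(int(b), int(p))
            for b, p in re.findall(r"backlog (\d+)b (\d+)p", tc_stats_output)]

sample = """qdisc fq 8001: root refcnt 2 limit 10000p flow_limit 100p
 Sent 123456 bytes 789 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 14480b 10p requeues 0"""
print(parse_qdisc_backlog(sample))  # [(14480, 10)]
```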
Execution layer
- per-child decision timestamp vs send timestamp vs ACK/fill visibility
- dispatch-gap error vs intended schedule
- cancel/replace burst factor during high-TTR windows
- short-horizon markout and tail IS uplift conditioned on TTR/WDA regime
Identification strategy (causal)
- Match windows by spread, volatility, participation, and time-of-day.
- Split into TSQ_BALANCED vs TSQ_CLIPPING by TTR/TUR95 thresholds.
- Estimate incremental tail IS (TPT) with symbol and host fixed effects.
- Run controlled canaries:
  - moderate tcp_limit_output_bytes adjustments,
  - pacing policy adjustments (fq / socket pacing),
  - TSO burst-shape adjustments (tcp_tso_win_divisor),
  - send-pattern smoothing in the execution gateway.
- Validate that TPT falls without degrading completion reliability.
If tail IS drops while market covariates remain matched, the excess slippage was infrastructure-caused (a TSQ regime effect), not alpha decay.
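The matched-window comparison can be sketched as: bucket windows by market covariates, then difference mean IS between clipping and balanced windows inside each bucket. Field names and the 0.05 TTR threshold below are illustrative placeholders, not calibrated values:

```python
# Sketch: matched-window TPT estimate (simple bucket matching, no fixed effects).
from collections import defaultdict
from statistics import mean

def tpt_estimate(windows, ttr_threshold=0.05):
    """windows: dicts with spread_bkt, vol_bkt, hour, ttr, is_bps keys."""
    buckets = defaultdict(lambda: {"clip": [], "bal": []})
    for w in windows:
        key = (w["spread_bkt"], w["vol_bkt"], w["hour"])  # covariate match
        side = "clip" if w["ttr"] > ttr_threshold else "bal"
        buckets[key][side].append(w["is_bps"])
    # Average the within-bucket differences over buckets with both regimes.
    diffs = [mean(b["clip"]) - mean(b["bal"])
             for b in buckets.values() if b["clip"] and b["bal"]]
    return mean(diffs) if diffs else None

windows = [
    {"spread_bkt": 1, "vol_bkt": 2, "hour": 10, "ttr": 0.12, "is_bps": 6.0},
    {"spread_bkt": 1, "vol_bkt": 2, "hour": 10, "ttr": 0.01, "is_bps": 4.5},
    {"spread_bkt": 1, "vol_bkt": 2, "hour": 10, "ttr": 0.09, "is_bps": 5.8},
]
print(round(tpt_estimate(windows), 6))  # 1.4 bps incremental IS in clipped windows
```

A production version would add symbol and host fixed effects per the identification strategy above; this sketch shows only the matching logic.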
Regime state machine
TSQ_BALANCED
- low TTR, stable WDA, low cadence aliasing
- run normal execution policy
TSQ_CLIPPING
- frequent short throttle bursts, rising SCA
- smooth dispatch cadence, damp urgency flips
TSQ_BLOATED
- low throttle but rising local queue age / WDA
- tighten queue budget, reduce burst size, preserve freshness
TSQ_SAFE_CONTAIN
- repeated tail breaches under unstable send-path dynamics
- force conservative pacing and protect completion deadline reliability
Use hysteresis and minimum dwell to avoid policy flapping.
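The state machine with hysteresis and minimum dwell can be sketched as follows. All thresholds (entry higher than exit, so transitions do not oscillate around one value) and the dwell length are placeholders to calibrate per host and session type:

```python
# Sketch: TSQ regime state machine with hysteresis and minimum dwell.
class TsqRegime:
    def __init__(self, min_dwell=30):
        self.state = "TSQ_BALANCED"
        self.dwell = 0
        self.min_dwell = min_dwell  # update ticks before another transition is allowed

    def update(self, ttr, wda95_ms, tail_breaches):
        self.dwell += 1
        if self.dwell < self.min_dwell:          # enforce minimum dwell
            return self.state
        nxt = self.state
        if tail_breaches >= 3:                   # repeated tail breaches dominate
            nxt = "TSQ_SAFE_CONTAIN"
        elif ttr > 0.10:                         # clipping entry threshold
            nxt = "TSQ_CLIPPING"
        elif wda95_ms > 5.0 and ttr < 0.02:      # low throttle but stale local queue
            nxt = "TSQ_BLOATED"
        elif ttr < 0.03 and wda95_ms < 2.0:      # exit thresholds sit below entries
            nxt = "TSQ_BALANCED"
        if nxt != self.state:
            self.state, self.dwell = nxt, 0      # reset dwell on transition
        return self.state

r = TsqRegime(min_dwell=2)
for _ in range(3):
    state = r.update(ttr=0.15, wda95_ms=1.0, tail_breaches=0)
print(state)  # TSQ_CLIPPING
```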
Control ladder
- Measure first, tune second
- blind tuning of tcp_limit_output_bytes is a classic tail-latency footgun.
- Stabilize application send cadence
- reduce write microbursts before touching kernel knobs.
- Tune TSQ together with pacing/qdisc
- TSQ alone cannot fix burst shape if upstream pacing is unstable.
- Control TSO burst size explicitly
- very large segmentation batches can reintroduce cadence spikes.
- Promote TTR/WDA into live execution features
- treat send-path stress as first-class slippage signal, not infra-only telemetry.
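The "stabilize application send cadence" rung can be sketched as a minimum-gap scheduler: requested write times are mapped onto a schedule that enforces spacing, flattening microbursts before the kernel ever sees them. The gap value is an illustrative placeholder:

```python
# Sketch: application-side cadence smoothing via a minimum inter-write gap.
def smooth_dispatch(write_times, min_gap_s=0.001):
    """Map requested write times onto a schedule with at least min_gap_s spacing."""
    scheduled = []
    last = float("-inf")
    for t in sorted(write_times):
        slot = max(t, last + min_gap_s)  # never earlier than requested or the gap
        scheduled.append(slot)
        last = slot
    return scheduled

# A 3-write microburst at t=0 spreads out; the already-spaced write is untouched.
print(smooth_dispatch([0.0, 0.0001, 0.0002, 0.010]))
# [0.0, 0.001, 0.002, 0.01]
```

This trades a bounded per-write delay for burst shape the TSQ/pacing layer can absorb without throttle cycling.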
Failure drills
- Synthetic microburst drill
- replay clustered child-write patterns and verify TTR/TUR95 alarms.
- TSQ-step canary drill
- test small up/down tcp_limit_output_bytes changes with rollback triggers.
- Cadence-recovery drill
- validate that recovery from throttled windows does not induce panic bursts.
- Tail-budget drill
- assert automatic transition to TSQ_SAFE_CONTAIN on repeated p95 breaches.
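The synthetic microburst drill can be exercised off the production path with a local socket pair: replay clustered small writes, time each one, and feed the latencies into the TTR/TUR95 alarm wiring. Burst-shape parameters below are illustrative:

```python
# Sketch: synthetic microburst replay over a local socket pair.
import socket
import time

def replay_microburst(burst_size=32, payload=b"x" * 256, bursts=4, gap_s=0.005):
    """Emit clustered small writes; return per-write latencies in seconds."""
    a, b = socket.socketpair()
    latencies = []
    try:
        for _ in range(bursts):
            for _ in range(burst_size):
                t0 = time.perf_counter()
                a.sendall(payload)
                latencies.append(time.perf_counter() - t0)
                b.recv(4096)          # drain so the pair never stalls
            time.sleep(gap_s)        # inter-burst gap
    finally:
        a.close()
        b.close()
    return latencies

lat = replay_microburst()
print(len(lat), max(lat) < 1.0)  # 128 True
```

On production-like hosts, the same pattern pointed at a real session (in a canary) is what drives the TTR/TUR95 alarm verification.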
Common mistakes
- Treating TSQ as "throughput tuning only" rather than timing-risk control
- Using one static tcp_limit_output_bytes profile across very different session types
- Ignoring app-side microburst shape while only tweaking kernel sysctls
- Celebrating median latency while WDA95 and tail IS deteriorate
Bottom line
TSQ is a slippage control surface, not just a TCP safeguard.
If per-socket queue budgets are misaligned with execution cadence, transport pushback becomes a hidden scheduler that taxes p95/p99 fills. Model and control TSQ regime directly in live execution operations.
References
- Linux kernel IP sysctl documentation (tcp_limit_output_bytes, pacing, TSO controls)
- LWN: TCP small queues
- LWN: initial TSQ patch discussion
- Queueing in the Linux Network Stack (BQL/TSQ/qdisc practical context)
- tc-fq(8) man page (pacing/qdisc behavior)