TCP Small Queues (TSQ) Throttle Oscillation & Slippage Playbook

2026-03-20 · finance

Why this matters

Execution teams often tune strategy logic and venue selection but overlook a control loop in the kernel send path: TCP Small Queues (TSQ).

TSQ (governed by net.ipv4.tcp_limit_output_bytes) limits how many bytes each socket may have queued in the qdisc and device layers, so that no single sender can build up excessive local buffering.
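Before any tuning it helps to record the limit a host is actually running with. The snippet below is a minimal sketch that reads the sysctl through its procfs path on Linux; the helper name read_tsq_limit is illustrative, and the function returns None on hosts where the file is missing or unreadable.

```python
from pathlib import Path
from typing import Optional

# procfs path that backs the net.ipv4.tcp_limit_output_bytes sysctl on Linux.
TSQ_SYSCTL = Path("/proc/sys/net/ipv4/tcp_limit_output_bytes")


def read_tsq_limit(path: Path = TSQ_SYSCTL) -> Optional[int]:
    """Return the configured TSQ byte limit, or None if it cannot be read."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        # Non-Linux host, restricted container, or unexpected file content.
        return None


if __name__ == "__main__":
    limit = read_tsq_limit()
    print("tcp_limit_output_bytes:", limit if limit is not None else "unavailable")
```

Logging this value next to execution metrics makes later canary comparisons easier to interpret.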

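When mis-tuned for low-latency execution traffic, TSQ can create a hidden slippage tax in two opposite ways: a limit that is too low throttles the socket and delays writes (clipping), while a limit that is too high lets bytes pile up in local queues (bloat).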

Either way, child-order wire timing drifts from intended schedule, and p95/p99 implementation shortfall worsens.


Failure mechanism (socket pushback loop -> execution tails)

  1. Strategy emits clustered small writes (common near urgency transitions or multi-venue rebalancing).
  2. Per-socket queued bytes rise toward tcp_limit_output_bytes.
  3. The socket is throttled; application sends see pushback (or must wait for the next dequeue opportunity).
  4. As queued skbs free, socket resumes; writes re-enter in mini-bursts.
  5. Child-order timing phase-locks to kernel wake/dequeue cycles instead of execution policy cadence.

Result: dispatch aliasing, a transport-side cadence imposed on trading logic.
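The loop above can be illustrated with a toy model. This is a sketch only: it is not the kernel's algorithm, and the names and parameters (simulate_admission, size, limit, drain_per_tick) are assumptions made for illustration. A fixed byte budget is drained at a constant rate, and a write is admitted only when it fits under the budget.

```python
def simulate_admission(intended_ticks, size, limit, drain_per_tick):
    """Return the tick at which each write is admitted under a byte budget.

    intended_ticks: sorted ticks at which the strategy wants to write.
    size:           bytes per write (toy assumption: all writes are equal).
    limit:          per-socket queued-byte budget (stands in for the TSQ limit).
    drain_per_tick: bytes the device frees per tick (toy assumption: constant).
    """
    if size > limit or drain_per_tick <= 0:
        raise ValueError("write can never be admitted with these parameters")
    queued = 0
    admitted = []
    i = 0
    tick = 0
    while i < len(intended_ticks):
        # Device completes some bytes, freeing budget.
        queued = max(0, queued - drain_per_tick)
        # Admit every due write that still fits under the budget.
        while (i < len(intended_ticks)
               and intended_ticks[i] <= tick
               and queued + size <= limit):
            queued += size
            admitted.append(tick)
            i += 1
        tick += 1
    return admitted


if __name__ == "__main__":
    # Four writes intended at the same instant, budget of two writes.
    print(simulate_admission([0, 0, 0, 0], size=100, limit=200, drain_per_tick=50))
    # -> [0, 0, 2, 4]
```

In this toy run the last two writes leave on the drain cycle (every two ticks) rather than at the intended instant, which is the phase-locking effect described in step 5.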


Slippage decomposition with TSQ term

For parent order \(i\):

\[ IS_i = C_{\text{impact}} + C_{\text{timing}} + C_{\text{routing}} + C_{\text{tsq}} \]

Where:

\[ C_{\text{tsq}} = C_{\text{pushback}} + C_{\text{phase-lock}} + C_{\text{recovery-burst}} \]


Operational metrics (new)

1) TTR - TSQ Throttle Ratio

\[ TTR = \frac{t_{\text{throttled}}}{t_{\text{active\_send}}} \]

Fraction of active send time spent in the throttled condition.

2) TUR95 - Throttle-Unthrottle Run p95

p95 duration of contiguous throttle episodes (ms).
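Both TTR and TUR95 can be derived from the same list of throttle episodes. The sketch below assumes episodes arrive as (start_ms, end_ms) pairs and that total active send time is measured separately; how episodes are captured is left open, and the nearest-rank percentile is one choice among several.

```python
import math


def percentile_nearest_rank(values, q):
    """Nearest-rank percentile for an integer q in 1..100."""
    if not values:
        raise ValueError("no values to summarize")
    ordered = sorted(values)
    rank = math.ceil(q * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def ttr(episodes, active_send_ms):
    """Throttle ratio: throttled time divided by active send time."""
    if active_send_ms <= 0:
        raise ValueError("active_send_ms must be positive")
    throttled_ms = sum(end - start for start, end in episodes)
    return throttled_ms / active_send_ms


def tur95(episodes):
    """p95 duration (ms) of contiguous throttle episodes."""
    return percentile_nearest_rank([end - start for start, end in episodes], 95)


if __name__ == "__main__":
    episodes = [(0, 2), (10, 11), (20, 25), (40, 42)]  # illustrative data
    print("TTR:", ttr(episodes, active_send_ms=100))
    print("TUR95 (ms):", tur95(episodes))
```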

3) WDA95 - Write-to-Departure Age p95

p95 delay from app write timestamp to first observed wire departure for that write cohort.
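A minimal sketch of the WDA95 calculation, assuming write and departure timestamps are already joined by a cohort identifier. The dictionary layout and the function name wda95 are illustrative; obtaining wire departure timestamps (for example from a capture point) is outside this sketch.

```python
import math


def wda95(write_ts_us, departure_ts_us):
    """p95 write-to-departure age in microseconds (nearest-rank).

    write_ts_us:     {cohort_id: app write timestamp}
    departure_ts_us: {cohort_id: first observed wire departure timestamp}
    Cohorts without an observed departure are skipped.
    """
    ages = sorted(
        departure_ts_us[c] - write_ts_us[c]
        for c in write_ts_us
        if c in departure_ts_us
    )
    if not ages:
        raise ValueError("no matched cohorts")
    rank = math.ceil(95 * len(ages) / 100)
    return ages[rank - 1]


if __name__ == "__main__":
    writes = {1: 0, 2: 100, 3: 200, 4: 300}       # illustrative data
    departures = {1: 30, 2: 110, 3: 280}          # cohort 4 not yet observed
    print("WDA95 (us):", wda95(writes, departures))
```

Skipping unmatched cohorts hides writes that never departed, so a production version would likely count them separately.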

4) SCA - Send Cadence Aliasing

Normalized mismatch between intended dispatch gap and realized wire gap.
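The text does not fix a formula for SCA. One possible reading, shown here as an assumption, is the mean absolute difference between realized and intended inter-send gaps, divided by the mean intended gap. A value of 0 means the wire cadence matches the schedule.

```python
def gaps(timestamps):
    """Differences between consecutive timestamps."""
    return [b - a for a, b in zip(timestamps, timestamps[1:])]


def sca(intended_ts, wire_ts):
    """Send cadence aliasing under the assumed definition:
    mean |realized gap - intended gap| / mean intended gap."""
    intended_gaps = gaps(intended_ts)
    wire_gaps = gaps(wire_ts)
    if not intended_gaps or len(intended_gaps) != len(wire_gaps):
        raise ValueError("need matched series with at least two sends each")
    mean_intended = sum(intended_gaps) / len(intended_gaps)
    if mean_intended <= 0:
        raise ValueError("intended gaps must have a positive mean")
    mismatch = sum(abs(w - i) for i, w in zip(intended_gaps, wire_gaps))
    return mismatch / len(intended_gaps) / mean_intended


if __name__ == "__main__":
    intended = [0, 10, 20, 30]
    print(sca(intended, [1, 11, 21, 31]))   # constant offset, same cadence
    print(sca(intended, [0, 18, 20, 38]))   # bunched departures
```

A constant offset between the two series does not move the score, which keeps SCA separate from WDA.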

5) TPT - TSQ Pushback Tax

Incremental IS in high-TTR/high-WDA windows versus matched low-TTR windows.
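A rough sketch of a TPT estimate, under simplifying assumptions: windows are pre-assigned to a matching bucket (a hypothetical key built from spread, volatility, and similar covariates), the split uses TTR only, and the comparison uses mean IS per bucket rather than a tail statistic. The Window type and tpt function are illustrative names.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class Window:
    bucket: str     # hypothetical matching key for market conditions
    ttr: float      # throttle ratio observed in the window
    is_bps: float   # implementation shortfall in basis points


def tpt(windows, ttr_threshold):
    """Weighted mean IS difference (high-TTR minus low-TTR) across buckets.

    Only buckets containing both high- and low-TTR windows contribute.
    Returns None if no bucket has both.
    """
    high, low = defaultdict(list), defaultdict(list)
    for w in windows:
        target = high if w.ttr >= ttr_threshold else low
        target[w.bucket].append(w.is_bps)

    weighted_sum, total_weight = 0.0, 0
    for bucket, high_vals in high.items():
        if bucket not in low:
            continue  # no matched control for this bucket
        diff = mean(high_vals) - mean(low[bucket])
        weighted_sum += diff * len(high_vals)
        total_weight += len(high_vals)
    return weighted_sum / total_weight if total_weight else None


if __name__ == "__main__":
    sample = [
        Window("A", 0.30, 12.0), Window("A", 0.40, 14.0), Window("A", 0.02, 10.0),
        Window("B", 0.25, 8.0), Window("B", 0.01, 5.0), Window("B", 0.03, 7.0),
    ]
    print("TPT (bps):", tpt(sample, ttr_threshold=0.20))
```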


What to log in production

Kernel / transport layer

Execution layer


Identification strategy (causal)

  1. Match windows by spread, volatility, participation, and time-of-day.
  2. Split into TSQ_BALANCED vs TSQ_CLIPPING by TTR/TUR95 thresholds.
  3. Estimate incremental tail IS (TPT) with symbol and host fixed effects.
  4. Run controlled canaries:
    • moderate tcp_limit_output_bytes adjustments,
    • pacing policy adjustments (fq/socket pacing),
    • TSO burst-shape adjustments (tcp_tso_win_divisor),
    • send-pattern smoothing in execution gateway.
  5. Validate that TPT falls without degrading completion reliability.

If tail IS drops while market covariates remain matched, the evidence points to the TSQ regime as an infrastructure cause of the slippage rather than to alpha decay.


Regime state machine

TSQ_BALANCED

TSQ_CLIPPING

TSQ_BLOATED

TSQ_SAFE_CONTAIN

Use hysteresis and minimum dwell to avoid policy flapping.
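Hysteresis with a minimum dwell can be sketched as follows. Only the TSQ_BALANCED and TSQ_CLIPPING states are modeled, the transition signal is TTR alone, and the thresholds and dwell time are placeholder values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class RegimeTracker:
    enter_ttr: float = 0.20    # placeholder: enter CLIPPING at or above this TTR
    exit_ttr: float = 0.10     # placeholder: return to BALANCED at or below this TTR
    min_dwell_s: float = 5.0   # placeholder: minimum time between state changes
    state: str = "TSQ_BALANCED"
    since_s: float = 0.0       # time of the last state change

    def update(self, now_s: float, ttr: float) -> str:
        """Feed one TTR observation and return the current regime."""
        if now_s - self.since_s < self.min_dwell_s:
            return self.state  # still inside the dwell period
        if self.state == "TSQ_BALANCED" and ttr >= self.enter_ttr:
            self.state, self.since_s = "TSQ_CLIPPING", now_s
        elif self.state == "TSQ_CLIPPING" and ttr <= self.exit_ttr:
            self.state, self.since_s = "TSQ_BALANCED", now_s
        return self.state


if __name__ == "__main__":
    tracker = RegimeTracker()
    for now_s, ttr in [(1, 0.5), (6, 0.25), (7, 0.05), (12, 0.15), (13, 0.05)]:
        print(now_s, ttr, tracker.update(now_s, ttr))
```

The gap between enter_ttr and exit_ttr stops a TTR hovering near one threshold from toggling the regime, and the dwell check suppresses rapid reversals after a change.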


Control ladder

  1. Measure first, tune second
    • blind tuning tcp_limit_output_bytes is a classic tail-latency footgun.
  2. Stabilize application send cadence
    • reduce write microbursts before touching kernel knobs.
  3. Tune TSQ together with pacing/qdisc
    • TSQ alone cannot fix burst shape if upstream pacing is unstable.
  4. Control TSO burst size explicitly
    • very large segmentation batches can reintroduce cadence spikes.
  5. Promote TTR/WDA into live execution features
    • treat send-path stress as first-class slippage signal, not infra-only telemetry.

Failure drills

  1. Synthetic microburst drill
    • replay clustered child-write patterns and verify TTR/TUR95 alarms.
  2. TSQ-step canary drill
    • test small up/down tcp_limit_output_bytes changes with rollback triggers.
  3. Cadence-recovery drill
    • validate that recovery from throttled windows does not induce panic bursts.
  4. Tail-budget drill
    • assert automatic transition to TSQ_SAFE_CONTAIN on repeated p95 breaches.
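For the tail-budget drill, the trigger condition might be sketched as a rolling breach counter. The class name, the budget, and the "3 breaches within the last 5 windows" rule are assumptions for illustration; the playbook does not specify which p95 metric is budgeted or how many breaches count as repeated.

```python
from collections import deque


class TailBudgetGuard:
    """Signals TSQ_SAFE_CONTAIN after repeated p95 budget breaches."""

    def __init__(self, budget, max_breaches=3, lookback=5):
        self.budget = budget                  # placeholder p95 budget
        self.max_breaches = max_breaches      # placeholder breach count
        self.recent = deque(maxlen=lookback)  # breach flags for recent windows

    def observe(self, p95_value) -> bool:
        """Record one window; return True if containment should trigger."""
        self.recent.append(p95_value > self.budget)
        return sum(self.recent) >= self.max_breaches


if __name__ == "__main__":
    guard = TailBudgetGuard(budget=10.0)
    for value in [12.0, 9.0, 15.0, 11.0, 5.0, 5.0]:
        action = "TSQ_SAFE_CONTAIN" if guard.observe(value) else "no change"
        print(value, action)
```

A drill can replay a breach sequence through a guard like this and assert that containment fires on the expected window.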

Common mistakes


Bottom line

TSQ is a slippage control surface, not just a TCP safeguard.

If per-socket queue budgets are misaligned with execution cadence, transport pushback becomes a hidden scheduler that taxes p95/p99 fills. Model and control TSQ regime directly in live execution operations.

