TCP Autocorking & Not-Sent Queue Drain-Burst Slippage Playbook
Date: 2026-03-23
Category: research
Scope: How Linux TCP autocorking + not-sent queue buildup creates hidden freeze→flush child-order cadence and tail slippage
Why this matters
Many execution stacks optimize for median wire latency and miss a transport behavior that quietly taxes tails: small writes are intentionally coalesced, then flushed in bursts.
On Linux, tcp_autocorking may hold tiny writes briefly to improve packet efficiency. Combined with application pacing jitter or busy sender threads, this can accumulate bytes in the TCP not-sent queue. When flush conditions trigger, child orders can leave in a compressed burst.
That burstiness degrades queue priority and increases impact/markout risk exactly when strategies think they are sending "smoothly".
Failure mechanism (operator timeline)
- Strategy emits frequent small writes (child-order/control frames).
- Kernel autocork logic defers immediate packetization for efficiency.
- Not-sent queue depth grows during micro-pauses or scheduling hiccups.
- Flush trigger fires (ACK progression, send-path wakeup, uncork-like condition).
- Accumulated bytes drain rapidly across few scheduling quanta.
- Venue sees clustered child arrivals instead of intended cadence.
- Queue-age and adverse-selection penalties rise.
This is a transport pacing artifact, not pure market microstructure drift.
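The timeline above can be sketched as a toy cork-and-flush simulation. The 1 ms write cadence and 5 ms hold window below are arbitrary illustrative assumptions, not kernel constants:

```python
# Toy model: small writes arrive every 1 ms; a "cork" holds bytes until a
# flush condition fires, so the receiver sees a few large bursts instead
# of the smooth cadence the application intended.
def simulate_cork(writes_ms, hold_ms):
    """writes_ms: list of (t_ms, nbytes); returns list of (flush_t_ms, nbytes)."""
    flushes, pending, opened = [], 0, None
    for t, b in writes_ms:
        if opened is None:
            opened = t              # cork window opens on first deferred write
        pending += b
        if t - opened >= hold_ms:   # flush condition: hold window elapsed
            flushes.append((t, pending))
            pending, opened = 0, None
    if pending:                     # drain whatever is still corked at the end
        flushes.append((writes_ms[-1][0], pending))
    return flushes

writes = [(t, 100) for t in range(10)]   # 10 small writes, 1 ms apart
print(simulate_cork(writes, hold_ms=5))  # → [(5, 600), (9, 400)]
```

Ten evenly spaced writes collapse into two bursts: exactly the clustered-arrival signature the venue-side timeline describes.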
Extend slippage decomposition with a cork/queue term
[ IS = IS_{market} + IS_{impact} + IS_{timing} + IS_{fees} + \underbrace{IS_{cork}}_{\text{autocork + not-sent drain tax}} ]
Operational approximation:
[ IS_{cork,t} \approx a\cdot NSQ95_t + b\cdot FCI_t + c\cdot BDR_t + d\cdot CJD_t + e\cdot NUD_t ]
Where:
- (NSQ95): p95 TCP not-sent queue bytes,
- (FCI): flush-compression index,
- (BDR): burst-drain ratio,
- (CJD): child-join delay (decision→wire inflation due to coalescing),
- (NUD): not-sent urgency divergence (urgency-weighted queue buildup).
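The operational approximation is a plain linear combination once the five metrics are in hand. The coefficients below are placeholders; in practice a..e would come from an offline regression of realized slippage on these transport metrics:

```python
# Linear cork-tax term IS_cork ≈ a*NSQ95 + b*FCI + c*BDR + d*CJD + e*NUD.
# The coefficient tuple is an assumed fit, shown only for illustration.
def is_cork(nsq95, fci, bdr, cjd, nud, coef):
    a, b, c, d, e = coef
    return a * nsq95 + b * fci + c * bdr + d * cjd + e * nud

tax = is_cork(nsq95=4096, fci=3.2, bdr=0.4, cjd=180.0, nud=0.9,
              coef=(1e-4, 0.05, 0.2, 1e-3, 0.1))
print(tax)  # → 0.9196 (in whatever IS units the fit was done in)
```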
Production metrics to add
1) Not-Sent Queue p95 (NSQ95)
[ NSQ95 = p95(\text{tcpi_notsent_bytes}) ]
Track by strategy, symbol bucket, and urgency tier.
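A minimal aggregation sketch, assuming a collector (not shown) that samples `tcpi_notsent_bytes` per socket, e.g. via `TCP_INFO` or the `SIOCOUTQNSD` ioctl, and tags each sample with strategy and urgency:

```python
from collections import defaultdict

# Nearest-rank p95 of sampled tcpi_notsent_bytes per (strategy, urgency) bucket.
def nsq95(samples):
    """samples: iterable of (strategy, urgency, notsent_bytes)."""
    buckets = defaultdict(list)
    for strat, urg, b in samples:
        buckets[(strat, urg)].append(b)
    out = {}
    for key, xs in buckets.items():
        xs.sort()
        idx = min(len(xs) - 1, int(0.95 * len(xs)))  # nearest-rank quantile
        out[key] = xs[idx]
    return out

samples = [("momo", "high", b) for b in range(100)]
print(nsq95(samples))  # → {('momo', 'high'): 95}
```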
2) Flush Compression Index (FCI)
[ FCI = \frac{p95(\Delta N_{wire}/\Delta t)}{p50(\Delta N_{wire}/\Delta t)} ]
Measures how sharply transmission rate spikes during drain windows.
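A direct computation under the nearest-rank quantile convention (an assumption; production systems may prefer interpolated quantiles), taking per-window bytes transmitted and window durations:

```python
# FCI = p95(rate) / p50(rate) over fixed observation windows; values near 1
# mean smooth transmission, large values mean drain-window rate spikes.
def fci(window_bytes, window_secs):
    rates = sorted(b / t for b, t in zip(window_bytes, window_secs))
    def q(p):
        return rates[min(len(rates) - 1, int(p * len(rates)))]
    return q(0.95) / q(0.50)

print(fci(window_bytes=list(range(1, 101)), window_secs=[1.0] * 100))
```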
3) Burst-Drain Ratio (BDR)
[ BDR = \frac{\text{bytes drained in top 1% drain windows}}{\text{total sent bytes}} ]
High BDR means most outbound flow is concentrated in short bursts.
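The same windowed byte counts give BDR directly; `top_frac=0.01` mirrors the "top 1% drain windows" in the definition:

```python
# BDR = bytes drained in the top 1% of drain windows / total sent bytes.
def bdr(window_bytes, top_frac=0.01):
    xs = sorted(window_bytes, reverse=True)
    k = max(1, int(top_frac * len(xs)))  # always keep at least one top window
    return sum(xs[:k]) / sum(xs)

# 99 quiet windows of 1 byte plus one 901-byte burst window:
print(bdr([1] * 99 + [901]))  # → 0.901, i.e. 90% of flow in one burst
```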
4) Child Join Delay (CJD)
[ CJD = p95(t_{wire}-t_{decision}) - p50(t_{wire}-t_{decision}) ]
Captures coalescing-driven tail expansion beyond baseline jitter.
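Given matched decision and wire timestamps, CJD is a tail-minus-median spread (nearest-rank quantiles assumed, as above):

```python
# CJD = p95(t_wire - t_decision) - p50(t_wire - t_decision): tail expansion
# of decision→wire delay beyond the baseline median jitter.
def cjd(t_decision, t_wire):
    deltas = sorted(w - d for d, w in zip(t_decision, t_wire))
    def q(p):
        return deltas[min(len(deltas) - 1, int(p * len(deltas)))]
    return q(0.95) - q(0.50)

# Uniform delays 0..99: p95 = 95, p50 = 50, so CJD = 45.
print(cjd([0] * 100, list(range(100))))
```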
5) Not-Sent Urgency Divergence (NUD)
Urgency-weighted average not-sent bytes. Detects when high-priority flow is delayed behind transport accumulation.
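One plausible construction of NUD (an assumption; the text fixes only "urgency-weighted queue buildup") is the ratio of the urgency-weighted mean not-sent bytes to the unweighted mean, so values above 1 flag high-urgency flow stuck behind transport accumulation:

```python
# NUD sketch: urgency-weighted mean not-sent bytes vs. plain mean.
def nud(notsent_bytes, urgency_weights):
    weighted = sum(b * w for b, w in zip(notsent_bytes, urgency_weights)) \
               / sum(urgency_weights)
    plain = sum(notsent_bytes) / len(notsent_bytes)
    return weighted / plain

# Backlog sits entirely on the high-urgency (weight 3) sample:
print(nud(notsent_bytes=[0, 100], urgency_weights=[1, 3]))  # → 1.5
```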
6) Compression-Conditioned Markout Delta (CCMD)
Matched-cohort post-fill markout delta between COMPRESSED_DRAIN vs SMOOTH_DRAIN windows.
Modeling architecture
Stage 1: transport-compression regime detector
Inputs:
- tcpi_notsent_bytes dynamics,
- decision→wire delay quantiles,
- sender CPU/run-queue pressure,
- write-size histogram + inter-write intervals,
- ACK-clock and send-queue wakeup context.
Output:
- (P(\text{COMPRESSED_DRAIN}))
Stage 2: conditional slippage model
Predict (E[IS]), tail IS, and completion risk conditioned on regime probability.
Key interaction:
[ \Delta IS \sim \beta_1 urgency + \beta_2 comp + \beta_3(urgency \times comp) ]
Urgent schedules usually pay the largest tax when compression is high.
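The interaction can be checked with an ordinary least-squares fit. The data below is synthetic and the true betas (1.0, 0.5, 2.0) are assumptions chosen to make the interaction visible:

```python
import numpy as np

# Fit ΔIS ~ β0 + β1*urgency + β2*comp + β3*(urgency*comp) on synthetic data.
rng = np.random.default_rng(0)
urg = rng.uniform(0, 1, 500)
comp = rng.uniform(0, 1, 500)
dis = 1.0 * urg + 0.5 * comp + 2.0 * urg * comp + rng.normal(0, 0.01, 500)

X = np.column_stack([np.ones_like(urg), urg, comp, urg * comp])
beta, *_ = np.linalg.lstsq(X, dis, rcond=None)
print(beta)  # recovers ≈ [0, 1.0, 0.5, 2.0]; β3 > 0 ⇒ urgency pays extra under compression
```

A materially positive, stable β3 estimate on production data is the quantitative version of "urgent schedules pay the largest tax when compression is high".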
Controller state machine
GREEN — SMOOTH_DRAIN
- Low not-sent buildup, stable cadence.
- Use baseline execution policy.
YELLOW — ACCUMULATION_RISK
- NSQ tail rising, mild cadence skew.
- Actions:
- tighten sender pacing budget,
- reduce write-fragmentation,
- increase telemetry granularity.
ORANGE — COMPRESSED_DRAIN
- Repeated queue buildup + burst flush.
- Actions:
- cap per-flush child fanout,
- smooth release via explicit micro-batching windows,
- lower urgency in thin books.
RED — CONTAINMENT
- Compression-linked tail-cost blowout.
- Actions:
- switch to containment schedule,
- enforce strict slippage budget clamps,
- trigger runtime/network co-diagnosis.
Use hysteresis + minimum dwell time to avoid policy thrash.
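The dwell/hysteresis rule can be sketched as a small controller: escalate immediately on a worse observation, but de-escalate one level at a time only after `min_dwell` consecutive calmer reads (the dwell count of 5 is an assumed default):

```python
# GREEN < YELLOW < ORANGE < RED severity ordering from the state machine above.
LEVELS = ["GREEN", "YELLOW", "ORANGE", "RED"]

class CompressionController:
    def __init__(self, min_dwell=5):
        self.state, self.calm, self.min_dwell = "GREEN", 0, min_dwell

    def update(self, observed):
        cur, obs = LEVELS.index(self.state), LEVELS.index(observed)
        if obs > cur:                        # escalate immediately, reset dwell
            self.state, self.calm = observed, 0
        elif obs < cur:
            self.calm += 1
            if self.calm >= self.min_dwell:  # de-escalate one level at a time
                self.state = LEVELS[cur - 1]
                self.calm = 0
        else:
            self.calm = 0                    # same level: dwell clock restarts
        return self.state

c = CompressionController(min_dwell=2)
print(c.update("ORANGE"), c.update("GREEN"), c.update("GREEN"))
# escalates at once, then needs two calm reads before stepping down to YELLOW
```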
Engineering mitigations (high ROI first)
1) Measure before toggling knobs
Add socket-level tcpi_notsent_bytes telemetry and decision→wire traces first.
2) Stabilize application write cadence
Avoid pathological tiny-write storms; use deterministic micro-batching rather than accidental coalescing.
3) Review tcp_autocorking behavior in low-latency paths
Consider targeted disable/override testing for latency-critical sockets.
4) Tune tcp_notsent_lowat with guardrails
Prevent excessive hidden backlog while preserving throughput where needed.
5) Pair TCP_NODELAY decisions with pacing discipline
Blindly disabling Nagle/autocork can shift cost into packet-rate overhead; validate end-to-end.
6) Add compression-aware routing penalties
Down-rank routes/hosts showing repeated compressed-drain signatures.
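The `TCP_NODELAY` / `tcp_notsent_lowat` knobs above can be applied per socket. The 64 KiB low-water mark is an assumed starting point, not a recommendation; `TCP_NOTSENT_LOWAT` is platform-specific, so it is guarded:

```python
import socket

# Sketch: tune a latency-critical socket. TCP_NODELAY disables Nagle
# coalescing; TCP_NOTSENT_LOWAT (Linux/macOS) caps hidden not-sent backlog.
def tune_socket(sock, notsent_lowat=64 * 1024):
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    if hasattr(socket, "TCP_NOTSENT_LOWAT"):  # not available on all platforms
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NOTSENT_LOWAT,
                        notsent_lowat)
    return sock

s = tune_socket(socket.socket(socket.AF_INET, socket.SOCK_STREAM))
print(s.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY))  # non-zero when set
```

Per the pacing-discipline caveat above, this should be canaried end-to-end, not rolled out on the strength of the setsockopt call alone.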
Validation protocol
- Label COMPRESSED_DRAIN windows via NSQ95 + FCI + BDR thresholds.
- Build matched cohorts by symbol, spread, volatility, participation, and urgency.
- Estimate uplift in mean and q95 slippage plus completion shortfall.
- Canary mitigations (cadence control, knob changes) on subset traffic.
- Promote only if tail improvements persist without unacceptable throughput regression.
Practical observability checklist
- tcpi_notsent_bytes distribution/time series
- write-size and inter-write interval histograms
- decision→wire latency quantiles split by strategy urgency
- per-window drain-rate spikes and burst concentration
- sender CPU pressure / run-queue overlays
- matched-cohort markout deltas (COMPRESSED_DRAIN vs SMOOTH_DRAIN)
Success criterion: stable tail execution quality under sender-side coalescing pressure, not just good median send latency.
Pseudocode sketch
features = collect_transport_compression_features()  # NSQ95, FCI, BDR, CJD, NUD
p_comp = compression_model.predict_proba(features)
state = decode_compression_state(p_comp, features)

if state == "GREEN":
    params = baseline_policy()
elif state == "YELLOW":
    params = tighten_pacing_and_reduce_fragmentation()
elif state == "ORANGE":
    params = controlled_micro_batch_release()
else:  # RED
    params = containment_with_hard_tail_budget()

execute_with(params)
log(state=state, p_comp=p_comp)
Bottom line
Autocorking and not-sent queue dynamics can make a strategy look smooth in application logs while arriving bursty at the venue. Model this transport-compression branch explicitly, or tail slippage will keep masquerading as market noise.
References
- Linux tcp(7) manual (socket behavior and options): https://man7.org/linux/man-pages/man7/tcp.7.html
- Linux kernel IP sysctl docs (tcp_autocorking, tcp_notsent_lowat): https://www.kernel.org/doc/html/latest/networking/ip-sysctl.html
- RFC 896 — Congestion Control in IP/TCP Internetworks (Nagle context): https://www.rfc-editor.org/rfc/rfc896
- RFC 1122 — Requirements for Internet Hosts (TCP behavior requirements): https://www.rfc-editor.org/rfc/rfc1122