TSO/GSO Segmentation-Burst Serialization Slippage Playbook
Why this exists
Execution stacks can show acceptable median latency while still leaking implementation shortfall at the p95/p99 tails.
One under-modeled source is transmit-path burst serialization from offload behavior:
- large socket writes are coalesced,
- TSO/GSO emits bigger wire bursts than the tactic logic intends,
- ACK timing compresses and bunches control decisions,
- cancel/replace cadence dephases,
- queue priority decays at exactly the wrong moments.
When this path is ignored, desks often classify the outcome as "market randomness" instead of a repeatable host-side timing tax.
Core failure mode
In low-latency execution, child-order schedules are usually designed at fine time granularity, but TX offload can warp the realized wire-time pattern:
- Strategy issues smooth child intents.
- Kernel/NIC aggregates payloads (TSO/GSO path).
- Wire sees short microbursts with larger serialization blocks.
- ACK/feedback timing gets phase-distorted (sometimes compressed, sometimes delayed).
- Follow-up decisions run on mis-timed control feedback.
- Queue placement quality degrades and fallback aggression rises.
Result: tail slippage inflation without obvious spread/volatility anomalies.
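The intent-to-wire warping above can be sketched with a toy coalescing model: smooth, evenly spaced small writes get greedily packed into much larger wire bursts. All parameters (1 KB payloads, a 64 KB aggregation cap) are illustrative assumptions, not real TSO/GSO internals.

```python
# Toy model of TSO/GSO-style coalescing: the strategy emits smooth 1 KB
# child-order payloads, but the TX path packs them into large wire bursts.
# The 64 KB cap is an illustrative stand-in for segment aggregation limits.

def coalesce(payloads, max_burst=64 * 1024):
    """Greedily pack payload sizes (bytes) into bursts capped at max_burst."""
    bursts, current = [], 0
    for size in payloads:
        if current + size > max_burst:
            bursts.append(current)
            current = 0
        current += size
    if current:
        bursts.append(current)
    return bursts

intents = [1024] * 256      # 256 smooth 1 KB writes from the strategy
wire = coalesce(intents)    # far fewer, far bigger serialization blocks

print(len(intents), "intents ->", len(wire), "wire bursts")
print("max intent:", max(intents), "bytes; max burst:", max(wire), "bytes")
```

The strategy's cadence (256 evenly sized writes) is invisible on the wire; only a handful of large serialization blocks remain, which is exactly the phase distortion the feedback loop then reacts to.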
Slippage decomposition with TX-burst term
For parent order (i):
[ IS_i = C_{delay} + C_{impact} + C_{miss} + C_{tx-burst} ]
Where:
[ C_{tx-burst} = C_{wire-phase} + C_{feedback-distortion} + C_{queue-reset} ]
- Wire-phase cost: intended schedule vs actual serialization pattern mismatch
- Feedback-distortion cost: ACK/markout feedback arrives in clustered or lagged form
- Queue-reset cost: extra amend/cancel/new churn after timing misses
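The decomposition above is additive, so it can be computed directly once the components are estimated. A minimal sketch, with all inputs in basis points and the example values purely hypothetical:

```python
# Hedged sketch of the additive decomposition above (all values in bps).
# The component estimates here are hypothetical placeholders.

def tx_burst_cost(c_wire_phase, c_feedback_distortion, c_queue_reset):
    """C_tx-burst = wire-phase + feedback-distortion + queue-reset costs."""
    return c_wire_phase + c_feedback_distortion + c_queue_reset

def implementation_shortfall(c_delay, c_impact, c_miss, c_tx_burst):
    """IS_i = C_delay + C_impact + C_miss + C_tx-burst."""
    return c_delay + c_impact + c_miss + c_tx_burst

# Example: a 0.9 bps TX-burst tax on top of 3.1 bps of classic components.
tax = tx_burst_cost(0.4, 0.3, 0.2)
total = implementation_shortfall(1.2, 1.5, 0.4, tax)
```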
Feature set (production-ready)
1) TX-path/offload features
- TSO/GSO enable-state per interface/queue
- packets-per-skb distribution (or segment fan-out proxy)
- bytes-per-transmit-burst quantiles (p50/p95/p99)
- qdisc enqueue/dequeue burst-shape metrics
- NIC TX queue occupancy + drain-interval jitter
2) Control-loop timing features
- decision-to-wire latency quantiles
- wire-to-ACK latency quantiles and tail slope
- ACK compression ratio by venue/session segment
- cancel/replace inter-arrival burst index
- child-order cadence phase error vs intended scheduler clock
3) Outcome features
- passive fill ratio drop under high burst-factor windows
- 10ms/100ms/1s/5s markout ladder shifts
- completion shortfall vs urgency target
- regime labels: WIRE_CLEAN, BURSTING, DEPHASED, SAFE_TX_CONTAIN
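One way to carry these three feature groups through the pipeline is a per-window record. This is a hedged sketch; the field names and selection are assumptions, not a fixed schema:

```python
# Hypothetical per-window feature record covering the three groups above.
from dataclasses import dataclass

@dataclass
class TxWindowFeatures:
    # TX-path / offload features
    tso_enabled: bool
    bytes_per_burst_p95: float
    nic_queue_drain_jitter_us: float
    # Control-loop timing features
    decision_to_wire_p99_us: float
    ack_compression_ratio: float
    cadence_phase_error_us: float
    # Outcome features
    passive_fill_ratio: float
    markout_10ms_bps: float
    regime: str = "WIRE_CLEAN"
```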
Practical metrics
- SBF (Segmentation Burst Factor): realized wire burstiness vs intended smooth cadence
- WST95 (Wire Serialization Tail p95): high-quantile serialization delay proxy
- ACR (ACK Compression Ratio): clustered ACK intensity vs baseline
- CPE (Cadence Phase Error): scheduler intent vs realized transmit phase drift
- BTU (Burst Tax Uplift): realized IS uplift attributable to TX-burst regimes
Track by host class, NIC model/driver, kernel version, and session segment (open/mid/close/event windows).
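Three of these metrics can be estimated directly from timestamped transmit/ACK logs. The definitions below are assumptions for this sketch (one reasonable operationalization each), not an industry-standard formula:

```python
# Illustrative estimators for SBF, ACR, and CPE from timestamped logs.
import statistics

def sbf(burst_bytes, intended_bytes):
    """Segmentation Burst Factor: p95 realized burst size over intended size."""
    q = statistics.quantiles(burst_bytes, n=20)  # q[18] approximates p95
    return q[18] / intended_bytes

def acr(ack_times_us, bucket_us=1000):
    """ACK Compression Ratio: peak ACKs per bucket over mean ACKs per bucket."""
    buckets = {}
    for t in ack_times_us:
        key = int(t // bucket_us)
        buckets[key] = buckets.get(key, 0) + 1
    counts = list(buckets.values())
    return max(counts) / (sum(counts) / len(counts))

def cpe(tx_times_us, period_us):
    """Cadence Phase Error: mean absolute offset from the intended grid."""
    errs = []
    for t in tx_times_us:
        r = t % period_us
        errs.append(min(r, period_us - r))
    return sum(errs) / len(errs)
```

For example, ACKs clustered into one bucket drive ACR well above 1, while transmits landing exactly on the scheduler grid give a CPE of zero.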
Model architecture
Use baseline + infra-overlay:
- Baseline slippage model
- spread/impact/urgency/deadline in infra-clean assumptions
- TX-burst overlay model
- incremental mean/tail uplift from SBF/WST95/ACR/CPE
Final estimator:
[ \hat{IS}_{final} = \hat{IS}_{baseline} + \Delta\hat{IS}_{tx-burst} ]
Calibration rule: compare like-for-like market states (liquidity, vol, participation bucket) so offload-induced timing distortion is not confounded with regime volatility.
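A minimal sketch of the baseline + overlay split, assuming linear stand-in models with made-up coefficients and thresholds; in production both parts would be fitted on the like-for-like market-state buckets described above:

```python
# Stand-in baseline + TX-burst overlay estimator (all outputs in bps).
# Coefficients and thresholds are illustrative assumptions.

def is_baseline(spread_bps, participation, urgency):
    """Infra-clean baseline slippage estimate."""
    return 0.5 * spread_bps + 2.0 * participation + 1.0 * urgency

def is_tx_burst_overlay(sbf, wst95_us, acr, cpe_us):
    """Incremental uplift; zero when burst metrics sit below clean thresholds."""
    uplift = 0.0
    if sbf > 1.5:
        uplift += 0.3 * (sbf - 1.5)
    if acr > 2.0:
        uplift += 0.2 * (acr - 2.0)
    uplift += 0.001 * max(wst95_us - 50.0, 0.0)
    uplift += 0.002 * max(cpe_us - 10.0, 0.0)
    return uplift

def is_final(market, infra):
    """hat{IS}_final = hat{IS}_baseline + Delta hat{IS}_tx-burst."""
    return is_baseline(**market) + is_tx_burst_overlay(**infra)
```

The key property is that the overlay contributes exactly zero in a clean TX regime, so the baseline model is never contaminated by infra effects.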
Regime controller
State A: WIRE_CLEAN
- low SBF/WST95, stable ACK timing
- normal tactic/pacing
State B: BURSTING
- rising SBF, intermittent ACK compression
- reduce discretionary replace churn, smooth child cadence
State C: DEPHASED
- persistent CPE + elevated ACR/WST95
- prioritize queue-preserving actions, cap bursty tactical switches
State D: SAFE_TX_CONTAIN
- sustained dephasing + deadline risk
- route flow toward cleaner host/path profiles, simplify tactic set, protect completion reliability
Use hysteresis and minimum dwell time to avoid oscillatory policy flips.
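The four-state controller with hysteresis and minimum dwell time can be sketched as follows; the thresholds and dwell count are illustrative assumptions, and real triggers would use the calibrated SBF/WST95/ACR/CPE bands:

```python
# Sketch of the A/B/C/D regime controller. Escalation is immediate;
# de-escalation waits out a minimum dwell time (hysteresis).
STATES = ["WIRE_CLEAN", "BURSTING", "DEPHASED", "SAFE_TX_CONTAIN"]

class TxRegimeController:
    def __init__(self, min_dwell=5):
        self.state = "WIRE_CLEAN"
        self.dwell = 0
        self.min_dwell = min_dwell

    def _target(self, sbf, acr, cpe_us, deadline_risk):
        # Illustrative thresholds, not calibrated values.
        if cpe_us > 20 and deadline_risk:
            return "SAFE_TX_CONTAIN"
        if cpe_us > 20 or acr > 3.0:
            return "DEPHASED"
        if sbf > 1.5:
            return "BURSTING"
        return "WIRE_CLEAN"

    def update(self, sbf, acr, cpe_us, deadline_risk=False):
        target = self._target(sbf, acr, cpe_us, deadline_risk)
        self.dwell += 1
        if target != self.state:
            escalating = STATES.index(target) > STATES.index(self.state)
            # De-escalate only after min_dwell clean observations.
            if escalating or self.dwell >= self.min_dwell:
                self.state, self.dwell = target, 0
        return self.state
```

A single bursty window escalates WIRE_CLEAN to BURSTING immediately, but several consecutive clean windows are required before the controller relaxes back, which prevents oscillatory policy flips.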
Mitigation ladder
- Offload policy by role
- execution-critical interfaces may need stricter TSO/GSO posture than throughput-oriented roles
- Queueing discipline alignment
- validate qdisc + pacing assumptions against real wire shape
- TX queue topology hygiene
- map critical threads and NIC queues to minimize incidental contention
- Control-loop damping
- avoid high-frequency amend/cancel reactions when ACK compression rises
- Post-change recalibration
- retrain overlay after kernel, driver, NIC firmware, or qdisc/pacing changes
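Control-loop damping can be made concrete with a token-bucket limiter whose refill rate shrinks as ACK compression rises, so amend/cancel churn is throttled exactly when feedback timing is least trustworthy. Rates and the ACR scaling rule here are illustrative assumptions:

```python
# Sketch of control-loop damping: a token bucket for amend/cancel actions
# whose refill rate is scaled down by the current ACK Compression Ratio.

class AmendDamper:
    def __init__(self, base_rate=10.0, capacity=10.0):
        self.base_rate = base_rate   # allowed amends/sec in a clean regime
        self.capacity = capacity
        self.tokens = capacity
        self.last_t = 0.0

    def allow(self, now, acr):
        """Return True if an amend/cancel may be sent at time `now` (seconds)."""
        # Refill more slowly as ACR rises above the clean baseline of 1.0.
        rate = self.base_rate / max(acr, 1.0)
        self.tokens = min(self.capacity, self.tokens + (now - self.last_t) * rate)
        self.last_t = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```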
Failure drills (must run)
- Synthetic burst-shape replay
- verify WIRE_CLEAN -> BURSTING -> DEPHASED transitions
- Offload A/B drill
- controlled TSO/GSO policy comparison with BTU tail impact tracking
- ACK-path stress drill
- validate controller behavior under compressed/delayed feedback timing
- Containment failover drill
- deterministic shift to safer host/path profile under sustained dephasing
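For the synthetic burst-shape replay drill, one simple approach is to script per-window metric traces that deliberately walk the regime path, then feed them to the controller and assert the expected transitions. The window counts and metric levels below are hypothetical:

```python
# Scripted (sbf, acr, cpe_us) windows for the burst-shape replay drill,
# walking WIRE_CLEAN -> BURSTING -> DEPHASED. Values are illustrative.

def replay_trace():
    clean    = [(1.0, 1.0, 2.0)] * 5    # low SBF, stable ACK timing
    bursting = [(2.2, 1.8, 8.0)] * 5    # rising SBF, intermittent compression
    dephased = [(2.5, 3.5, 30.0)] * 5   # persistent CPE + elevated ACR
    return clean + bursting + dephased
```

Replaying this trace through the regime controller should produce exactly one escalation per segment boundary; any extra flips indicate missing hysteresis or dwell-time handling.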
Anti-patterns
- Treating "TSO/GSO is enabled" as a binary tuning checkbox instead of a regime variable
- Assuming strategy cadence equals wire cadence without measuring serialization shape
- Chasing micro-tactic gains while control-loop phase drift remains unmodeled
- Using only median latency dashboards to certify TX-path health
Bottom line
TSO/GSO behavior can silently reshape execution timing by turning smooth intent into bursty wire reality.
If you model that transmit-path distortion explicitly, slippage tails become attributable and controllable. If you do not, you keep paying a recurring basis-point tax and mislabeling it as market noise.