Path-MTU Black-Hole & MSS-Collapse Slippage Playbook
Date: 2026-03-22
Category: research
Scope: How PMTU discovery failure (ICMP/PTB blind spots) creates hidden decision-to-fill latency tails and slippage drift
Why this matters
Many execution systems treat transport latency as a smooth background process.
But PMTU failure creates a branching transport regime:
- larger packets get dropped on a path bottleneck,
- sender waits for retransmission/timeout and eventually shrinks MSS,
- send cadence shifts from smooth flow to stall→burst behavior.
In practice, this looks like random microstructure toxicity, while root cause is often path-level packetization failure.
Failure mechanism (operator timeline)
- Route path includes a lower-MTU segment (overlay, tunnel, VPN, middlebox path).
- Sender transmits packets sized for a larger MTU belief.
- PMTU signal is missing, filtered, delayed, or distrusted (classic black-hole condition).
- Larger packets repeatedly fail; retransmission and RTO pressure rise.
- Stack falls back to smaller effective MSS (or probes down/up slowly).
- Order-flow dispatch cadence becomes discontinuous; child-order timing drifts.
- Queue priority decays and deadline urgency overpays into thinner books.
This is not a strategy bug; it is a transport-state regime shift.
Extend slippage decomposition with PMTU-blackhole term
[ IS = IS_{market} + IS_{impact} + IS_{timing} + IS_{fees} + \underbrace{IS_{pmtu}}_{\text{PMTU black-hole tax}} ]
Operational approximation:
[ IS_{pmtu,t} \approx a\cdot LSR_t + b\cdot RTO95_t + c\cdot MFD_t + d\cdot PRL_t + e\cdot SBC_t ]
Where:
- (LSR): large-segment retransmission rate,
- (RTO95): p95 retransmission-timeout burden,
- (MFD): MSS fallback depth,
- (PRL): probe recovery latency (time to regain stable MSS),
- (SBC): send-burst compression after stall windows.
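The operational approximation can be sketched as a plain linear combination. The coefficients below are illustrative placeholders, not fitted values; in practice a..e would be estimated by regressing realized shortfall on these transport features.

```python
def is_pmtu_estimate(lsr, rto95_ms, mfd, prl_ms, sbc,
                     a=2.0, b=0.01, c=5.0, d=0.002, e=0.5):
    """Linear approximation of the PMTU black-hole slippage term (bps).

    Coefficients a..e are illustrative placeholders; fit them against
    realized shortfall before trusting the output.
    """
    return a * lsr + b * rto95_ms + c * mfd + d * prl_ms + e * sbc

# Example: a moderately stressed window
print(is_pmtu_estimate(lsr=0.08, rto95_ms=240.0, mfd=0.35,
                       prl_ms=1500.0, sbc=4.0))
```

The point of keeping the form linear is interpretability: each feature's marginal contribution to the PMTU tax is directly auditable.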
What to measure in production
1) Large-Segment Retransmission Rate (LSR)
[ LSR = \frac{\#(\text{retransmissions on segments} > MSS_{safe})}{\#(\text{all segments} > MSS_{safe})} ]
Rising LSR with stable exchange-side health is a strong PMTU stress hint.
2) MSS Fallback Depth (MFD)
[ MFD = 1 - \frac{MSS_{effective}}{MSS_{baseline}} ]
Large MFD indicates costly downshift from expected wire efficiency.
3) Probe Recovery Latency (PRL)
Time from first black-hole signature to restored stable effective MSS.
Long PRL means prolonged degraded execution cadence.
4) Send-Burst Compression (SBC)
[ SBC = \frac{p95(\Delta t_{child\_send})}{p50(\Delta t_{child\_send})} ]
SBC expansion captures stall→flush packetization behavior leaking into execution timing.
5) Decision-to-Wire Tail Expansion (DWT95/99)
Primary KPI for policy impact.
Compare PMTU_STABLE vs PMTU_STRESS windows by cohort.
6) Markout Degradation Under PMTU Stress (MDP)
Matched-cohort post-fill markout delta between normal vs PMTU-stress episodes.
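The first four metrics are cheap to compute from flow telemetry. A minimal sketch, assuming segment records carry a byte size and a retransmission flag, and that MSS_SAFE and MSS_BASELINE are locally chosen constants:

```python
import statistics

MSS_SAFE = 1200       # assumed conservative "large segment" threshold (bytes)
MSS_BASELINE = 1460   # assumed baseline MSS for an untunneled 1500-MTU path

def lsr(segments):
    """Large-Segment Retransmission Rate over (size, retx) records."""
    large = [s for s in segments if s["size"] > MSS_SAFE]
    if not large:
        return 0.0
    return sum(s["retx"] for s in large) / len(large)

def mfd(mss_effective):
    """MSS Fallback Depth: 1 - effective/baseline."""
    return 1.0 - mss_effective / MSS_BASELINE

def sbc(child_send_gaps_ms):
    """Send-Burst Compression: p95/p50 of inter-child-send gaps."""
    q = statistics.quantiles(child_send_gaps_ms, n=100)
    return q[94] / q[49]   # 95th / 50th percentile cut points
```

PRL is omitted here because it needs episode boundaries (first black-hole signature, restored MSS), which depend on how your telemetry timestamps state transitions.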
Minimal model architecture
Stage 1: PMTU stress classifier
Inputs:
- retransmission profile by payload size,
- effective MSS evolution,
- RTO tails,
- burst-compression metrics,
- path/route class metadata (tunnel/VPN/overlay tags).
Output:
- (P(\text{PMTU\_STRESS}))
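Stage 1 does not need to be elaborate. A sketch using a logistic score over the features above; the weights here are hand-set for illustration only, where a production classifier would be trained on labeled PMTU-stress windows:

```python
import math

# Illustrative hand-set weights; a real deployment would fit these
# (e.g. logistic regression) on labeled PMTU_STRESS windows.
WEIGHTS = {"lsr": 12.0, "mfd": 6.0, "rto95_ms": 0.004,
           "sbc": 0.4, "tunnel_path": 1.5}
BIAS = -5.0

def p_pmtu_stress(features):
    """Map transport features to P(PMTU_STRESS) via a logistic score."""
    z = BIAS + sum(WEIGHTS[k] * features.get(k, 0.0) for k in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))
```

Note the route-class tag (tunnel_path) enters as a prior shift: encapsulated paths start closer to the stress boundary.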
Stage 2: Conditional cost model
Predict:
- (E[IS]), (q95(IS)), completion risk conditioned on PMTU stress probability.
Include interaction term:
[ \Delta IS \sim \beta_1\,\text{urgency} + \beta_2\,\text{pmtu} + \beta_3\,(\text{urgency} \times \text{pmtu}) ]
Urgency tends to become most expensive exactly when PMTU instability is active.
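The interaction term is the part worth verifying empirically. A synthetic sketch, assuming NumPy is available, where the ground truth makes urgency several times more expensive under PMTU stress and ordinary least squares recovers the interaction coefficient:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
urgency = rng.uniform(0.0, 1.0, n)
pmtu = rng.integers(0, 2, n).astype(float)   # stress indicator 0/1
# Synthetic ground truth: urgency costs 2 bps/unit normally,
# plus 6 bps/unit extra when PMTU stress is active.
d_is = 1.0 + 2.0 * urgency + 1.5 * pmtu + 6.0 * urgency * pmtu \
       + rng.normal(0.0, 0.2, n)

# Design matrix: intercept, urgency, pmtu, urgency x pmtu
X = np.column_stack([np.ones(n), urgency, pmtu, urgency * pmtu])
beta, *_ = np.linalg.lstsq(X, d_is, rcond=None)
print(beta.round(2))   # approximately [1.0, 2.0, 1.5, 6.0]
```

A materially positive beta_3 is the quantitative form of the claim above: urgency overpays precisely when the transport path is unstable.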
Controller state machine
GREEN — PMTU_STABLE
- Stable MSS, low retransmission tails
- Normal execution policy
YELLOW — PMTU_SUSPECT
- LSR/RTO rising, early MSS instability
- Actions:
- reduce burst fan-out,
- tighten pacing jitter bounds,
- increase transport observability sampling.
ORANGE — PMTU_BLACKHOLE_LIKELY
- Persistent large-packet failure + fallback behavior
- Actions:
- switch to conservative packetization profile,
- reduce aggression on thin books,
- prioritize robust route class (known-good path).
RED — PMTU_CONTAINMENT
- Repeated collapse/reprobe loops + slippage tail blowout
- Actions:
- containment execution mode,
- strict participation caps,
- incident escalation with packet evidence.
Apply hysteresis + minimum dwell time to prevent policy thrash.
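The hysteresis-plus-dwell rule can be sketched as a small controller. The escalate/de-escalate thresholds on P(PMTU_STRESS) and the dwell length are illustrative; the gap between the two threshold sets is what provides the hysteresis:

```python
STATES = ["GREEN", "YELLOW", "ORANGE", "RED"]
# Illustrative thresholds on P(PMTU_STRESS); escalation triggers above
# the upper band, de-escalation below the lower band (hysteresis gap).
ESCALATE = {"GREEN": 0.30, "YELLOW": 0.60, "ORANGE": 0.85}
DEESCALATE = {"RED": 0.70, "ORANGE": 0.45, "YELLOW": 0.15}
MIN_DWELL = 5   # ticks a state must be held before any transition

class PmtuController:
    def __init__(self):
        self.state = "GREEN"
        self.dwell = 0

    def step(self, p_stress):
        """Advance one tick; return the (possibly updated) state."""
        self.dwell += 1
        if self.dwell < MIN_DWELL:
            return self.state          # dwell lock: no thrash
        i = STATES.index(self.state)
        if self.state in ESCALATE and p_stress >= ESCALATE[self.state]:
            self.state, self.dwell = STATES[i + 1], 0
        elif self.state in DEESCALATE and p_stress <= DEESCALATE[self.state]:
            self.state, self.dwell = STATES[i - 1], 0
        return self.state
```

Because transitions are one step at a time and gated by dwell, a single noisy probability spike cannot jump the book straight into containment mode.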
Engineering mitigations (ROI order)
- Enable packetization-layer PMTU probing (PLPMTUD) where appropriate.
  Linux: net.ipv4.tcp_mtu_probing (0/1/2) with an explicit policy.
- Tune probe controls intentionally.
  tcp_base_mss, tcp_mtu_probe_floor, tcp_probe_interval, and tcp_probe_threshold should be reviewed for latency-critical links.
- Audit ICMP/PTB handling across network boundaries.
  Classic PMTUD relies on receiving trustworthy path-size feedback (IPv4 Fragmentation Needed / IPv6 Packet Too Big).
- Use MSS clamping on known encapsulation edges.
  Tunnels/overlays frequently create hidden MTU cliffs; enforce conservative MSS at boundaries.
- Canary network changes with PMTU telemetry gates.
  Promote only if LSR/RTO tails and MFD remain stable.
- Tag and route by path reliability class.
  Treat PMTU reliability as a first-class route feature in execution-stack decisions.
Validation protocol
- Label PMTU stress windows using LSR + MFD + PRL thresholds.
- Build matched cohorts by symbol, spread, volatility, participation, venue, and time bucket.
- Estimate uplift in (E[IS]), (q95(IS)), and completion shortfall.
- Run canary policy: probing/tuning + route-class controls.
- Promote only when tail-cost reduction persists without unacceptable completion drag.
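The matched-cohort uplift estimate can be sketched as a within-cohort difference of means. This assumes each fill record carries a precomputed cohort key (symbol, spread/volatility/participation buckets, venue, time bucket), a PMTU-stress label, and realized shortfall in bps:

```python
from collections import defaultdict
from statistics import mean

def pmtu_uplift(fills):
    """Mean IS uplift (stress minus normal) across matched cohorts.

    Cohorts that lack observations in either regime are dropped, so the
    comparison is always like-for-like. Returns None if no cohort has both.
    """
    cohorts = defaultdict(lambda: {"stress": [], "normal": []})
    for f in fills:
        regime = "stress" if f["pmtu_stress"] else "normal"
        cohorts[f["cohort"]][regime].append(f["is_bps"])
    deltas = [mean(c["stress"]) - mean(c["normal"])
              for c in cohorts.values() if c["stress"] and c["normal"]]
    return mean(deltas) if deltas else None
```

The same routine applied to q95(IS) instead of the mean gives the tail-uplift number that actually gates canary promotion.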
Practical observability checklist
- Effective MSS time series per flow class
- Retransmission by payload-size bucket
- RTO distribution conditioned on PMTU state
- Stall→burst send cadence metrics
- Decision-to-wire latency split by PMTU state
- Cohort markout comparison under PMTU stress
- Packet captures around collapse/reprobe episodes
Success criterion: stable tail latency and fill quality during path-MTU disturbances, not just normal-window average throughput.
Pseudocode sketch
features = collect_pmtu_features()  # LSR, MFD, RTO95, PRL, SBC
p_stress = pmtu_stress_model.predict_proba(features)
state = decode_pmtu_state(p_stress, features)
if state == "GREEN":
    params = default_execution_policy()
elif state == "YELLOW":
    params = bounded_fanout_with_tighter_pacing()
elif state == "ORANGE":
    params = conservative_packetization_and_route_hardening()
else:  # RED
    params = containment_mode_with_tail_budget_lock()
execute_with(params)
log(state=state, p_stress=p_stress)
Bottom line
PMTU failures are a hidden transport tax: they turn packetization assumptions into latency regime shifts, then into execution slippage tails.
Model PMTU stress as a first-class feature, instrument collapse/recovery dynamics, and wire explicit controller actions before path-level packet loss silently bills your basis points.
References
- RFC 1191 — Path MTU Discovery (IPv4): https://www.rfc-editor.org/rfc/rfc1191
- RFC 8201 — Path MTU Discovery for IP version 6: https://www.rfc-editor.org/rfc/rfc8201
- RFC 4821 — Packetization Layer Path MTU Discovery (PLPMTUD): https://www.rfc-editor.org/rfc/rfc4821
- RFC 8899 — Packetization Layer Path MTU Discovery for Datagram Transports (DPLPMTUD): https://www.rfc-editor.org/rfc/rfc8899
- Linux kernel IP/TCP sysctl documentation (tcp_mtu_probing, probe controls, PMTU behavior): https://www.kernel.org/doc/html/latest/networking/ip-sysctl.html