Coordinated Omission in Latency Benchmarks: A Practical Detection & Mitigation Playbook
Date: 2026-03-08
Category: knowledge
Domain: software / performance engineering / reliability
Why this matters
Many load tests underreport tail latency exactly when systems are unhealthy.
If your generator waits for a slow response before sending the next request, it silently stops sampling the bad period. This is coordinated omission (CO): the measurement process unintentionally synchronizes with the system under test (SUT) and omits the pain.
Result: p99/p99.9 can look great while users are having a terrible time.
One-line definition
Coordinated omission = a benchmarking artifact where request generation and latency sampling are coupled to completion, so delayed periods cause missed arrivals and missing bad samples.
Mental model: open vs closed workload
Closed model (CO-prone by default)
- New iteration starts after previous one finishes.
- Arrival rate depends on response time.
- When SUT slows down, load generator also slows down.
Open model (CO-resistant)
- Arrivals are scheduled independently of completion.
- SUT slowdown does not reduce scheduled arrival pressure.
- Better for modeling internet/mobile/API traffic with exogenous arrivals.
Rule of thumb:
- User-concurrency question (“what can N users do?”) → closed can be fine.
- Throughput/SLO question (“what happens at 20k rps?”) → use open/constant-arrival.
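As a sketch of the two loop structures (Python, with a placeholder `call()` standing in for one request; real generators dispatch to worker pools rather than calling inline), note that they differ only in what gates the next arrival:

```python
import time

def closed_loop(call, iterations):
    """Closed model: the next request is gated on the previous completion,
    so a slow SUT automatically throttles the generator (CO-prone)."""
    for _ in range(iterations):
        call()  # the next iteration cannot start until this returns

def open_loop(call, rate_hz, duration_s):
    """Open model: arrivals follow a fixed wall-clock schedule; a slow SUT
    does not reduce offered load (CO-resistant, given enough workers)."""
    interval = 1.0 / rate_hz
    start = time.perf_counter()
    for i in range(int(rate_hz * duration_s)):
        planned = start + i * interval
        delay = planned - time.perf_counter()
        if delay > 0:
            time.sleep(delay)
        call()  # a real generator hands this to a worker pool so the
                # scheduling loop itself never blocks on the SUT
```

The open loop's arrival times depend only on the clock, never on `call()`'s duration, which is exactly the property the closed loop lacks.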
Why CO destroys tail truth
Suppose the target arrival interval is 10 ms (100 rps), and the SUT then stalls for 2 seconds.
- Honest accounting: during the stall, ~200 arrivals are “late experiences” and should influence tail distribution.
- CO-prone accounting: you might record only one 2s sample, then resume normal samples.
This massively compresses tail probability mass and makes percentiles look safer than reality.
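The arithmetic above can be checked with a toy simulation (plain Python lists and a nearest-rank percentile; assumed numbers: 60 s of healthy 1 ms responses at 100 rps, plus one 2 s stall):

```python
interval_ms = 10.0                      # 100 rps -> one arrival per 10 ms
normal = [1.0] * 6000                   # 60 s of healthy 1 ms responses

# CO-prone accounting: the whole stall collapses into a single sample.
co_prone = sorted(normal + [2000.0])

# Honest accounting: the ~200 arrivals scheduled during the stall waited
# 2000, 1990, ..., 10 ms respectively before being served.
stalled = [2000.0 - i * interval_ms for i in range(200)]
honest = sorted(normal + stalled)

def p99(sorted_xs):
    # nearest-rank p99 on a pre-sorted list
    return sorted_xs[int(0.99 * len(sorted_xs))]

print("p99 CO-prone:", p99(co_prone), "ms")   # tail looks healthy
print("p99 honest:  ", p99(honest), "ms")     # tail shows the stall
```

The CO-prone view reports a p99 near the healthy 1 ms baseline, while the honest view puts p99 over a second: three orders of magnitude apart on identical events.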
Practical symptoms you likely have CO
- Latency tails barely move when you intentionally inject pauses/stalls.
- Throughput drops during degradation, but percentile charts remain oddly smooth.
- Tool uses fixed VU loops and no constant-arrival executor for your scenario.
- Results look much better than production telemetry at similar offered load.
Mitigation ladder (in order)
1) Pick the right load model first
For arrival-rate/SLO studies, prefer:
- constant-arrival/open-loop generators,
- explicit request schedule,
- a worker pool large enough that the scheduler itself never becomes the bottleneck.
2) Record latency relative to planned send time
Measure from when the request should have been sent (its scheduled slot), not only from the actual send timestamp.
That captures queueing delay caused by missed schedule slots.
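A sketch of the distinction, with a hypothetical `send_and_wait` standing in for one request/response cycle:

```python
import time

def measure_against_schedule(send_and_wait, planned_send_time):
    """Return (service_latency, schedule_latency) for one request.

    service_latency is what a naive tool reports (actual send -> response).
    schedule_latency starts the clock at the *planned* slot, so any time
    lost waiting to send (because the generator fell behind) is counted
    as queueing delay the user would have experienced.
    """
    actual_send = time.perf_counter()
    send_and_wait()
    done = time.perf_counter()
    service_latency = done - actual_send
    schedule_latency = done - planned_send_time
    return service_latency, schedule_latency
```

Whenever the generator runs late, `schedule_latency` exceeds `service_latency` by exactly the missed-slot delay; reporting only the former is where CO sneaks in.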
3) Use CO-aware histogram recording
HdrHistogram supports correction methods such as recording with expected interval and copy/add corrected variants.
Conceptually, when an observed latency L exceeds expected interval I, add synthetic samples at L-I, L-2I, ... down to I.
This approximates omitted arrivals that experienced long waits.
4) Publish full percentile spectrum
Always expose:
- max,
- p99, p99.9, p99.99 (as sample size allows),
- histogram/log-scale distribution,
- offered load vs achieved throughput.
Do not rely on median or p95-only dashboards.
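A minimal sketch of such a report over raw samples (nearest-rank percentiles on a sorted list; production tooling would use HdrHistogram rather than sorting lists):

```python
import math

def percentile(sorted_xs, p):
    """Nearest-rank percentile over a pre-sorted list, for 0 < p <= 100."""
    k = max(0, math.ceil(p / 100.0 * len(sorted_xs)) - 1)
    return sorted_xs[k]

def tail_report(samples):
    """The full spectrum this section asks for: never just p50/p95."""
    xs = sorted(samples)
    return {
        "max": xs[-1],
        "p99": percentile(xs, 99),
        "p99.9": percentile(xs, 99.9),
        "p99.99": percentile(xs, 99.99),
    }
```

Publishing max alongside the high percentiles is deliberate: with small sample counts, p99.99 is meaningless but max still carries signal.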
5) Run controlled “truth tests”
Inject deterministic stalls (e.g., a 1 s pause every 30 s) and verify your tooling reflects the expected tail inflation. If it does not, your instrumentation is lying.
Tooling notes
- wrk2: designed for constant-throughput load and CO-aware latency accounting via planned send-time semantics.
- Vegeta: emphasizes constant-rate attacks and explicitly calls out avoiding coordinated omission.
- k6: docs clearly separate closed vs open models; use arrival-rate executors for CO-sensitive tests.
- HdrHistogram: provides APIs for recording/copying while correcting for coordinated omission.
Minimal benchmark protocol (production-worthy)
- Define objective: saturation curve, SLO at fixed offered load, or capacity frontier.
- Choose workload model matching objective (open for arrival-rate realism).
- Fix offered load schedule (warmup, steady-state, stress steps).
- Capture both offered and achieved rates.
- Log full latency histogram per interval (not just summarized percentiles).
- Report CO-corrected and raw views when possible.
- Inject synthetic pauses to validate measurement honesty.
- Compare benchmark tails with production traces for sanity.
Common mistakes
- Using closed-loop VU tests to claim fixed-RPS SLOs → invalid inference.
- Plotting only p50/p95 → hides user-impactful outliers.
- Ignoring max values as “noise” → often throwing away the signal.
- Generator saturation mistaken for SUT saturation → benchmarking the client, not the server.
- No pause/fault injection in benchmark validation → cannot detect CO failure mode.
Decision cheatsheet
- Need realism for internet arrivals / API spikes? → Open model.
- Need “N users cycling” UX simulation? → Closed model (but don’t overclaim throughput truth).
- Need tail-SLO signoff? → Open + CO-aware histograms + stall validation.
Implementation snippet (conceptual)
Given expected interval I:
- observe latency L
- record L
- while L > I:
  - L = L - I
  - record L
This reconstructs omitted samples implied by schedule misses.
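A runnable version of that loop, as a sketch recording into a plain list (HdrHistogram's `recordValueWithExpectedInterval` performs the same correction into a real histogram):

```python
def record_with_expected_interval(samples, latency, expected_interval):
    """Record one observed latency plus synthetic samples for the arrivals
    that were omitted while the observed request was stalled."""
    samples.append(latency)
    if expected_interval <= 0:
        return
    # Reconstruct the waits the missed arrivals would have seen:
    # latency - I, latency - 2I, ..., down to (and including) I.
    remaining = latency - expected_interval
    while remaining >= expected_interval:
        samples.append(remaining)
        remaining -= expected_interval
```

With latency = 2000 ms and expected_interval = 10 ms this yields 200 samples (2000, 1990, ..., 10), matching the honest accounting in the stall example earlier.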
KPI set to track benchmark integrity
- Offered RPS vs achieved RPS gap
- p99/p99.9/p99.99 and max per interval
- Fraction of intervals with generator lag
- CO-corrected vs uncorrected tail ratio
- Tail amplification during synthetic stall tests
If these are absent, you are probably optimizing against comforting artifacts.
One-line takeaway
Before tuning your service, verify your benchmark is not coordinating away the very tail events your users actually feel.
References
- Gil Tene, wrk2 README (constant throughput + CO-aware latency accounting):
  https://github.com/giltene/wrk2
- HdrHistogram JavaDoc (CO correction APIs: recordValueWithExpectedInterval, copyCorrectedForCoordinatedOmission, etc.):
  https://hdrhistogram.github.io/HdrHistogram/JavaDoc/org/HdrHistogram/AbstractHistogram.html
  https://hdrhistogram.github.io/HdrHistogram/JavaDoc/org/HdrHistogram/Recorder.html
- Grafana k6 docs (open vs closed models and CO discussion):
  https://grafana.com/docs/k6/latest/using-k6/scenarios/concepts/open-vs-closed/
- Vegeta README (constant rate and CO-avoidance positioning):
  https://github.com/tsenart/vegeta
- ScyllaDB explainer on CO (open vs closed framing and practical pitfalls):
  https://www.scylladb.com/2021/04/22/on-coordinated-omission/
- HighScalability summary of Gil Tene’s “How NOT to Measure Latency”:
  https://highscalability.com/your-load-generator-is-probably-lying-to-you-take-the-red-pi/