Coordinated Omission in Latency Benchmarks: A Practical Detection & Mitigation Playbook

2026-03-08 · software

Category: knowledge
Domain: software / performance engineering / reliability

Why this matters

Many load tests underreport tail latency exactly when systems are unhealthy.

If your generator waits for a slow response before sending the next request, it silently stops sampling the bad period. This is coordinated omission (CO): the measurement process unintentionally synchronizes with the SUT and omits pain.

Result: p99/p99.9 can look great while users are having a terrible time.


One-line definition

Coordinated omission = a benchmarking artifact in which request generation and latency sampling are coupled to the completion of prior requests, so slow periods suppress new arrivals and the worst samples go unrecorded.


Mental model: open vs closed workload

Closed model (CO-prone by default)

Each virtual user sends a request, waits for the response (plus any think time), then sends the next. Arrival rate is coupled to SUT speed: when the SUT slows down, the generator slows down with it and stops sampling the slow period.

Open model (CO-resistant)

Requests are issued on an independent schedule (constant or Poisson arrivals), regardless of whether earlier responses have returned. A slow SUT causes queueing, which shows up as measured latency instead of disappearing.

Rule of thumb: if real-world arrivals do not wait for your responses (independent users, upstream callers, scheduled jobs), benchmark with an open model.
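Under illustrative assumptions (a caller-supplied blocking `send` function, one thread per scheduled request), the two loops can be sketched as:

```python
import threading
import time

def closed_loop(send, n_requests):
    """Closed model: the next request waits for the previous response.
    If the SUT stalls, the generator stalls with it (CO-prone)."""
    latencies = []
    for _ in range(n_requests):
        start = time.monotonic()
        send()  # blocks until the response arrives
        latencies.append(time.monotonic() - start)
    return latencies

def open_loop(send, interval_s, n_requests):
    """Open model: requests are issued on a fixed schedule regardless of
    how long earlier responses take (CO-resistant)."""
    latencies = []
    lock = threading.Lock()
    t0 = time.monotonic()
    threads = []
    for i in range(n_requests):
        planned = t0 + i * interval_s
        time.sleep(max(0.0, planned - time.monotonic()))
        def worker(planned=planned):
            send()
            with lock:
                # measure from the *planned* send time, not the actual one
                latencies.append(time.monotonic() - planned)
        t = threading.Thread(target=worker)
        t.start()
        threads.append(t)
    for t in threads:
        t.join()
    return latencies
```

Note that the open loop keeps issuing requests even while earlier ones are in flight; a stalled SUT inflates the recorded latencies instead of silencing the generator.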


Why CO destroys tail truth

Suppose the target arrival interval is 10ms (100 rps) and the SUT stalls for 2 seconds.

Under an open model, roughly 200 requests arrive during the stall and experience waits ranging from ~2s down to 10ms. A closed-loop generator instead blocks on the stalled request, records a single 2s sample, and never sends the other ~199. This massively compresses tail probability mass and makes percentiles look safer than reality.
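A toy simulation of exactly this scenario (numbers from the example above; the 1ms baseline service time and 60s run length are added assumptions) makes the distortion concrete:

```python
# 100 rps target (10 ms interval), a 60 s run in which the SUT stalls for 2 s.
# Base service time is 1 ms; arrivals during the stall wait for it to end.

def pctl(samples, p):
    """Simple percentile: value at index floor(p * n) of the sorted samples."""
    s = sorted(samples)
    return s[min(len(s) - 1, int(p * len(s)))]

interval_ms, run_ms, stall_start_ms, stall_ms, base_ms = 10, 60_000, 30_000, 2_000, 1

# Closed loop: the generator blocks on the stalled request, so the stall
# yields exactly ONE bad sample and ~200 planned arrivals are never sent.
closed = []
t = 0
while t < run_ms:
    lat = stall_ms if t == stall_start_ms else base_ms
    closed.append(lat)
    t += max(interval_ms, lat)  # next send waits for this response

# Open loop: arrivals keep coming every 10 ms; each arrival during the
# stall waits until the stall ends, then takes the base service time.
open_ = []
for t in range(0, run_ms, interval_ms):
    if stall_start_ms <= t < stall_start_ms + stall_ms:
        open_.append((stall_start_ms + stall_ms - t) + base_ms)
    else:
        open_.append(base_ms)

print(f"closed p99 = {pctl(closed, 0.99)} ms, open p99 = {pctl(open_, 0.99)} ms")
```

The closed-loop p99 stays at the 1ms baseline, while the open-model p99 lands well over a second: the same stall, two wildly different stories.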


Practical symptoms that you likely have CO

  1. Latency tails barely move when you intentionally inject pauses/stalls.
  2. Throughput drops during degradation, but percentile charts remain oddly smooth.
  3. Your tool uses fixed VU loops and offers no constant-arrival executor for your scenario.
  4. Results look much better than production telemetry at similar offered load.
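A quick self-check from your generator's actual send timestamps (the helper name and the 1.5x tolerance are assumptions): if a meaningful fraction of inter-send gaps exceed the target interval, sends were being paced by responses, i.e. the loop was coordinating with the SUT.

```python
def schedule_drift(send_times_s, target_interval_s, tolerance=1.5):
    """Return the fraction of inter-send gaps that exceeded the target
    interval by more than `tolerance` times."""
    gaps = [b - a for a, b in zip(send_times_s, send_times_s[1:])]
    late = sum(1 for g in gaps if g > target_interval_s * tolerance)
    return late / len(gaps) if gaps else 0.0

# Example: a 10 ms schedule where one response stalled the sender for ~2 s.
times = [i * 0.010 for i in range(100)] + [3.0 + i * 0.010 for i in range(100)]
print(f"{schedule_drift(times, 0.010):.1%} of gaps missed the schedule")
```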

Mitigation ladder (in order)

1) Pick the right load model first

For arrival-rate/SLO studies, prefer:

  1. Open-model / constant-arrival-rate executors (e.g., wrk2, k6's constant-arrival-rate, Vegeta's fixed-rate attacks).
  2. Fixed-interval or Poisson schedules that match how production traffic actually arrives.
  3. Enough generator concurrency headroom that the schedule never blocks on in-flight requests.

2) Record latency relative to planned send time

Measure latency from when the request should have been sent (its scheduled slot), not only from the actual send timestamp.

That captures queueing delay caused by missed schedule slots.
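A sketch of recording both views per request (names are assumptions; `call` is the blocking request function): the service latency is what the server took, the schedule-relative latency is what a user arriving on time would have experienced.

```python
import time

def timed_call(planned_send_s, call):
    """Time one request two ways: from actual send, and from planned send."""
    actual_send = time.monotonic()
    call()
    done = time.monotonic()
    service_latency = done - actual_send      # server-side view
    response_latency = done - planned_send_s  # includes missed-slot queueing
    return service_latency, response_latency
```

When the generator falls behind schedule, `response_latency` grows even if each individual call stays fast, which is exactly the delay CO hides.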

3) Use CO-aware histogram recording

HdrHistogram supports correction methods such as recordValueWithExpectedInterval and corrected copy/add variants (e.g., copyCorrectedForCoordinatedOmission).

Conceptually, when an observed latency L exceeds expected interval I, add synthetic samples at L-I, L-2I, ... down to I.

This approximates omitted arrivals that experienced long waits.

4) Publish full percentile spectrum

Always expose p50, p90, p99, p99.9, p99.99, and the observed max.

Do not rely on median or p95-only dashboards.

5) Run controlled “truth tests”

Inject deterministic stalls (e.g., 1s pause every 30s) and verify your tooling reflects expected tail inflation. If not, instrumentation is lying.


Tooling notes

  1. wrk2 extends wrk with a constant-throughput mode and records latency from the intended send time, correcting for CO.
  2. k6's constant-arrival-rate and ramping-arrival-rate executors implement an open model; plain VU loops are closed.
  3. Gatling distinguishes open and closed injection profiles explicitly.
  4. JMeter thread groups are closed-loop by default.
  5. HdrHistogram ports exist for Java, C, Go, Rust, Python, and other languages.


Minimal benchmark protocol (production-worthy)

  1. Define objective: saturation curve, SLO at fixed offered load, or capacity frontier.
  2. Choose workload model matching objective (open for arrival-rate realism).
  3. Fix offered load schedule (warmup, steady-state, stress steps).
  4. Capture both offered and achieved rates.
  5. Log full latency histogram per interval (not just summarized percentiles).
  6. Report CO-corrected and raw views when possible.
  7. Inject synthetic pauses to validate measurement honesty.
  8. Compare benchmark tails with production traces for sanity.
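Step 5 of the protocol can be sketched with a coarse per-interval histogram (bucket bounds and class names here are illustrative assumptions; a production setup would use an HdrHistogram per interval):

```python
from collections import defaultdict

# Illustrative power-of-two bucket upper bounds, in milliseconds.
BUCKETS_MS = (1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048)

def bucket(latency_ms):
    """Map a latency to its bucket's upper bound."""
    for b in BUCKETS_MS:
        if latency_ms <= b:
            return b
    return float("inf")  # overflow bucket: never discard the max

class IntervalHistograms:
    """One coarse latency histogram per one-second interval."""
    def __init__(self):
        # interval index -> {bucket upper bound -> count}
        self.hists = defaultdict(lambda: defaultdict(int))

    def record(self, t_s, latency_ms):
        self.hists[int(t_s)][bucket(latency_ms)] += 1

    def worst_bucket(self, interval):
        h = self.hists[interval]
        return max(h) if h else None
```

Keeping the full per-interval distribution means a later CO correction or percentile re-computation needs no re-run, only re-aggregation.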

Common mistakes

  1. Using closed-loop VU tests to claim fixed-RPS SLOs
    → invalid inference.

  2. Plotting only p50/p95
    → hides user-impactful outliers.

  3. Ignoring max values as “noise”
    → often throwing away the signal.

  4. Generator saturation mistaken for SUT saturation
    → benchmarking the client, not the server.

  5. No pause/fault injection in benchmark validation
    → cannot detect CO failure mode.


Decision cheatsheet

  1. Validating an SLO at a fixed offered load → open model + schedule-relative timing + CO-corrected histograms.
  2. Finding maximum sustainable throughput → a closed loop is acceptable, but report it as capacity, not latency-at-rate.
  3. Modeling interactive sessions with think time → a closed model can be realistic; still validate with pause injection.


Implementation snippet (conceptual)

Given expected interval I:
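A minimal sketch in plain Python (milliseconds assumed; a real implementation would record into a histogram rather than build a list):

```python
def corrected_record(samples_ms, expected_interval_ms):
    """For each observed latency L exceeding the expected interval I,
    also emit synthetic samples L-I, L-2I, ... down to I, approximating
    the arrivals the stalled generator never sent."""
    out = []
    for L in samples_ms:
        out.append(L)
        synthetic = L - expected_interval_ms
        while synthetic >= expected_interval_ms:
            out.append(synthetic)
            synthetic -= expected_interval_ms
    return out
```

For example, a single 35ms sample at a 10ms expected interval expands to 35, 25, and 15ms, standing in for the two arrivals that would have waited behind it.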

This reconstructs omitted samples implied by schedule misses.


KPI set to track benchmark integrity

  1. Offered vs. achieved rate, per interval.
  2. Count (or fraction) of missed schedule slots.
  3. Full percentile spectrum including max, in both raw and CO-corrected views.
  4. Delta between corrected and raw tails (a large delta means heavy omission).

If these are absent, you are probably optimizing against comforting artifacts.


One-line takeaway

Before tuning your service, verify your benchmark is not coordinating away the very tail events your users actually feel.

