Cloudflare Queues Idempotency, Concurrency, and DLQ Practical Playbook
Date: 2026-04-07
Category: knowledge
Domain: software / cloudflare / async systems
Why this matters
Cloudflare Queues looks deceptively simple: push messages in, let a Worker consume them later. That simplicity is real, but the operational traps are also real.
If you model it like a perfectly ordered, exactly-once job runner, you usually build the wrong thing. What actually bites people in production is more mundane:
- duplicate side effects because delivery is at least once,
- whole-batch retries because one message failed late,
- backlog growth because concurrency is capped too low,
- silent message loss because retries exhausted without a DLQ,
- and misleading latency expectations because batching and delay settings interact.
The practical mental model is:
Cloudflare Queues is a durable at-least-once buffer with batch delivery, autoscaling consumers, and explicit per-message acknowledgement.
That combination is powerful, but only if your consumer is designed around it.
1) Fast mental model
Think of a Queue pipeline as four independent concerns:
Producer durability
When `send()`/`sendBatch()` resolves, the message is written durably.
Delivery semantics
Delivery is at least once, not exactly once. Rare duplicates are part of the contract.
Batch processing semantics
Consumers receive batches, and without explicit `ack()`, a failure can cause the whole batch to be retried.
Consumer scaling semantics
Consumers can autoscale horizontally, but only up to configured or platform limits, and only after batches finish processing.
This leads to the core rule:
Treat correctness as an application concern, not something the queue guarantees for you.
2) What Queues guarantees — and what it does not
What you do get
- Durable writes after successful `send()`/`sendBatch()`.
- Push-based Worker consumers or pull-based HTTP consumers.
- Automatic batching.
- Automatic consumer concurrency for push consumers.
- Explicit `ack()`/`retry()` so partial failures do not always poison the whole batch.
- Retry policy plus optional dead-letter queue.
What you do not get
- Exactly-once processing.
- Global ordering guarantees.
- Infinite retention.
- Infinite throughput.
- Automatic deduplication of business side effects.
Cloudflare’s docs are quite direct here: Queues optimizes for at-least-once delivery because stronger guarantees usually cost extra latency and throughput. So if duplicate processing is dangerous, the fix is not wishful thinking. The fix is idempotency keys plus durable deduplication at the side-effect boundary.
3) The most important design decision: where idempotency lives
If a duplicated message can cause damage, your architecture should answer this question clearly:
What durable system decides whether this message has already been applied?
Good answers:
- a D1 table with a unique key on `message_id`,
- a Durable Object keyed by entity ID that serializes application of events,
- an upstream API that supports an idempotency key,
- or a relational insert/upsert whose primary key is the queue message’s business ID.
Bad answers:
- an in-memory `Set`,
- console logs,
- “duplicates are probably rare,”
- or acknowledging the message before the side effect is durably committed.
Practical pattern
Generate or carry a stable business ID at publish time:
- `email_id`
- `payment_attempt_id`
- `import_job_id + row_id`
- `tenant_id + event_id`
Then make the consumer do one of these:
- Insert-once using a unique constraint.
- Upstream idempotency using that ID as the provider idempotency key.
- Serialized apply inside a Durable Object for per-entity workflows.
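The insert-once pattern can be sketched as follows. Loud caveat: the in-memory `Set` below only stands in for a durable D1 table with a unique constraint on `message_id` (an in-memory set is explicitly a bad answer in production), and `sendWelcomeEmail` is a hypothetical side effect:

```typescript
// Sketch of insert-once deduplication. The Set stands in for a D1 table
// with a UNIQUE constraint on message_id; in production the claim would be
// an INSERT ... ON CONFLICT DO NOTHING whose changed-row count tells you
// whether you won. sendWelcomeEmail is a hypothetical side effect.
type EmailJob = { email_id: string; to: string };

const applied = new Set<string>(); // stand-in for durable dedup storage

let sent = 0;
async function sendWelcomeEmail(to: string): Promise<void> {
  sent += 1; // the side effect we must not duplicate
}

// Returns true if the side effect ran, false if this was a duplicate.
async function processOnce(job: EmailJob): Promise<boolean> {
  if (applied.has(job.email_id)) return false; // duplicate delivery: skip
  applied.add(job.email_id); // claim the ID before the side effect
  await sendWelcomeEmail(job.to);
  return true;
}
```

The key property is that the durable store, not the consumer's memory, decides whether the message was already applied.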
If you cannot name your idempotency boundary, you probably do not yet have a production-ready consumer.
4) Batch semantics are where most accidental rework comes from
By default, queues deliver batches, not isolated single messages. That is great for cost and throughput, but it changes failure behavior.
Important consequence:
- if a batch is delivered,
- one message fails late,
- and you did not explicitly acknowledge already-successful messages,
- the batch can be retried and successful work can repeat.
This is why explicit acknowledgement matters so much.
Use explicit ack() when messages cause side effects
Use msg.ack() after a message’s effect is durably complete.
Examples:
- DB write committed,
- external API accepted the request with an idempotency key,
- file/object persisted,
- state machine step committed.
Use msg.retry() when a specific message should be retried later without forcing the whole batch to fail.
That gives you a much safer partial-failure shape:
- good messages get acknowledged,
- bad/transient ones are retried,
- the whole batch is not re-run unnecessarily.
Rule of thumb
- Purely idempotent, cheap work: batch-level success may be enough.
- Anything state-changing or expensive: explicitly `ack()` per message.
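The per-message shape can be sketched like this. The `Message`/`MessageBatch` interfaces are minimal stand-ins for the Workers runtime types, and `handleOne` is a hypothetical side effect:

```typescript
// Sketch of per-message acknowledgement: good messages get acked as soon
// as their effect is durable, failing ones are retried individually, and
// the batch as a whole is never re-run.
interface Message<T> {
  id: string;
  body: T;
  ack(): void;
  retry(options?: { delaySeconds?: number }): void;
}
interface MessageBatch<T> {
  queue: string;
  messages: Message<T>[];
}

async function consume(batch: MessageBatch<{ job: string }>): Promise<void> {
  for (const msg of batch.messages) {
    try {
      await handleOne(msg.body); // durable side effect
      msg.ack(); // only after durable success
    } catch {
      msg.retry(); // retry just this message, not the whole batch
    }
  }
}

// Hypothetical per-message work; throws to simulate a transient failure.
async function handleOne(body: { job: string }): Promise<void> {
  if (body.job === "poison") throw new Error("transient failure");
}
```

Without the explicit calls, one late `throw` would mark the whole batch failed and the already-handled messages would come back.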
5) Concurrency is a backlog-control lever, not just a speed lever
Push consumers can autoscale horizontally. By default, Cloudflare can increase concurrent consumer invocations as backlog or write rate rises.
That is good, but there are two common misconceptions.
Misconception A — “One consumer” means “one invocation at a time”
A queue has one active consumer configuration, but that consumer can still run with multiple concurrent invocations. So “single consumer per queue” does not mean serialized processing of the entire queue.
If you need strict serialization, implement it in your application layer:
- shard by key across multiple queues, or
- route each entity through a coordinating Durable Object.
Misconception B — lower max_concurrency is always safer
Sometimes yes, especially when an upstream API is rate-limited.
But setting max_concurrency = 1 can create a hidden backlog trap.
If producers can write faster than one invocation can drain, messages age toward retention expiry.
The practical tradeoff is:
- higher concurrency → lower queue latency, more downstream pressure,
- lower concurrency → protects fragile upstreams, but increases backlog risk.
When to cap concurrency deliberately
Cap max_concurrency when:
- the downstream API has strict rate limits,
- the consumer hits a scarce DB lock/resource,
- or you intentionally prefer backlog over upstream overload.
Otherwise, leaving concurrency uncapped is usually the sane default.
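As a sketch, a deliberate cap might look like this in `wrangler.toml`; the queue names are placeholders, while `max_concurrency`, `max_retries`, and `dead_letter_queue` are the documented consumer settings:

```toml
# Sketch only — queue names are placeholders.
[[queues.consumers]]
queue = "payments"                 # rate-limited upstream behind this one
max_concurrency = 5                # deliberate cap: protect the upstream API
max_retries = 3
dead_letter_queue = "payments-dlq"

[[queues.consumers]]
queue = "analytics"
# max_concurrency omitted: let the platform autoscale up to its limit
```

Note that the cap is per consumer configuration, so the fragile upstream gets protection without throttling unrelated queues.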
6) DLQ is not optional in serious systems
Without a dead-letter queue, messages that exhaust retries are discarded. That is fine for disposable analytics noise. It is not fine for important work.
Configure a DLQ whenever messages represent:
- user-visible jobs,
- billing or payment side effects,
- emails or notifications you may need to audit,
- sync/import pipelines,
- or anything that would require postmortem analysis.
What a DLQ is actually for
A DLQ is not just a trash can. It is your:
- forensic lane,
- replay lane,
- schema-mismatch detection lane,
- and “something changed upstream” alarm bell.
Operational rule
A DLQ with no consumer and no monitoring is only half-configured. At minimum, have one of these:
- a dedicated DLQ consumer that logs or persists failures for inspection,
- an alert on DLQ depth,
- or a periodic drain/review workflow.
Cloudflare docs note that messages placed in a DLQ without an active consumer persist for four days before deletion. So “we will check later” has a deadline.
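A minimal DLQ consumer along those lines might look like this. The interfaces are stand-ins for the Workers runtime types, and the in-memory `failureLog` stands in for whatever durable store (D1, R2, a logging pipeline) you actually persist failures to:

```typescript
// Sketch of a minimal DLQ consumer: persist each failed message for
// inspection and replay, then ack it so it is not silently deleted when
// DLQ retention expires.
interface Message<T> { id: string; timestamp: Date; body: T; ack(): void }
interface MessageBatch<T> { queue: string; messages: Message<T>[] }

const failureLog: { id: string; queue: string; body: unknown }[] = [];

// Stand-in for a durable write (D1 insert, R2 put, log sink, ...).
async function persistFailure(queue: string, msg: Message<unknown>): Promise<void> {
  failureLog.push({ id: msg.id, queue, body: msg.body });
}

async function consumeDlq(batch: MessageBatch<unknown>): Promise<void> {
  for (const msg of batch.messages) {
    await persistFailure(batch.queue, msg); // capture first
    msg.ack(); // safe to ack: the failure is recorded for later replay
  }
}
```

Even this minimal version turns the DLQ from a trash can into a forensic and replay lane.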
7) Tune batch size for the downstream shape, not for aesthetics
Relevant queue-side limits/defaults:
- `max_batch_size` default: 10
- `max_batch_timeout` default: 5 seconds
- maximum batch size: 100 messages
- maximum batch timeout: 60 seconds
Batching should match what the downstream system wants.
Use smaller batches when
- per-message latency matters,
- each message is heavy,
- autoscaling reacts too slowly because batches take too long,
- or failures are frequent and you want tighter retry granularity.
Use larger batches when
- the downstream API accepts efficient bulk writes,
- the work is lightweight and homogeneous,
- cost per invocation matters,
- or you are smoothing bursty producer traffic.
A practical sizing heuristic
Choose batch size based on the slowest stable downstream unit:
- if the downstream rate limit is 2 requests/sec, do not create queue settings that encourage 50-message burst flushes into a single-threaded API wrapper;
- if the downstream supports bulk insert of 100 rows, exploit that and acknowledge each message only after the bulk transaction is committed.
The right batch size is not “biggest possible.” It is “largest size that preserves failure isolation and downstream stability.”
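The bulk-insert case can be sketched as "one transaction, then ack everything". `bulkInsert` and the interfaces are stand-ins, and an in-memory array plays the role of the downstream store:

```typescript
// Sketch of "bulk write, then ack": all rows go into one (pretend)
// transaction, and messages are acked only after it commits, so a failed
// commit retries the whole batch together.
interface Message<T> { body: T; ack(): void; retry(): void }
interface MessageBatch<T> { messages: Message<T>[] }

type Row = { id: string; value: number };
const table: Row[] = []; // stand-in for the downstream bulk store

// Stand-in for a single committed bulk transaction.
async function bulkInsert(rows: Row[]): Promise<void> {
  table.push(...rows);
}

async function consumeBulk(batch: MessageBatch<Row>): Promise<void> {
  try {
    await bulkInsert(batch.messages.map((m) => m.body));
    for (const m of batch.messages) m.ack(); // only after the commit
  } catch {
    for (const m of batch.messages) m.retry(); // commit failed: retry all
  }
}
```

Here the batch is the failure-isolation unit by design, which is exactly why its size should match what one downstream transaction can safely absorb.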
8) Delay is useful for backpressure, not just scheduling
Queues supports delaying messages both when sending and when retrying, up to 24 hours.
That matters for a simple reason: not every failure should be retried immediately.
Good uses of delay:
- backing off after HTTP 429s,
- smoothing spikes after an outage recovery,
- staging non-urgent work,
- separating hot-path user requests from cold-path background processing.
This gives you a lightweight backpressure control loop:
- call upstream,
- observe throttling or temporary failure,
- `retry()` with a delay,
- let the queue absorb the burst.
That is usually cleaner than implementing a fragile in-process sleep/retry storm inside one long-running consumer invocation.
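A sketch of that loop, assuming a hypothetical `callUpstream` that returns an HTTP status; `retry({ delaySeconds })` and `msg.attempts` match the Queues JavaScript API, and the backoff constants are arbitrary:

```typescript
// Sketch of delay-based backoff: on a 429, push the retry into the
// future instead of hammering the upstream from inside the invocation.
interface Message<T> {
  body: T;
  attempts: number; // delivery attempt count, as exposed by Queues
  ack(): void;
  retry(options?: { delaySeconds?: number }): void;
}

async function handleWithBackoff(
  msg: Message<string>,
  callUpstream: (body: string) => Promise<number>, // returns HTTP status
): Promise<void> {
  const status = await callUpstream(msg.body);
  if (status === 429) {
    // Exponential backoff: 30s, 60s, 120s, ... capped at 10 minutes
    // (the platform allows delays up to 24 hours).
    const delaySeconds = Math.min(600, 30 * 2 ** (msg.attempts - 1));
    msg.retry({ delaySeconds });
    return;
  }
  msg.ack();
}
```

The queue does the waiting, so the invocation ends immediately instead of sleeping inside a Worker.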
9) The limits that actually shape architecture
A few documented limits matter more than the rest:
- message size: 128 KB
- `sendBatch()` limit: 100 messages and 256 KB total payload
- per-queue throughput: 5,000 messages/sec
- concurrent consumer invocations: 250
- consumer wall time: 15 minutes
- consumer CPU time: configurable up to 5 minutes
- backlog size per queue: 25 GB
- retention: up to 14 days on paid plans, 24 hours on the free plan
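The `sendBatch()` caps can be respected with a small chunking helper on the producer side; a sketch, with the limits hard-coded from the list above:

```typescript
// Sketch: split a list of message bodies into sendBatch()-sized chunks,
// respecting both the 100-message and 256 KB-per-call limits.
// (A single body larger than maxBytes still gets its own chunk here;
// such a body needs the reference-passing pattern instead.)
function chunkForSendBatch(
  bodies: string[],
  maxCount = 100,
  maxBytes = 256 * 1024,
): string[][] {
  const chunks: string[][] = [];
  let current: string[] = [];
  let bytes = 0;
  for (const body of bodies) {
    const size = new TextEncoder().encode(body).length;
    const overBytes = bytes + size > maxBytes && current.length > 0;
    if (current.length >= maxCount || overBytes) {
      chunks.push(current); // flush the full chunk
      current = [];
      bytes = 0;
    }
    current.push(body);
    bytes += size;
  }
  if (current.length > 0) chunks.push(current);
  return chunks;
}
```

Each chunk can then be passed to one `sendBatch()` call without tripping either limit.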
Architecture implications:
Do not put fat payloads in messages
Use queue messages as work descriptors, not giant data envelopes. Store large objects in R2 / D1 / another system and pass references.
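The work-descriptor pattern can be sketched like this; the `Map` and array below are stand-ins for an R2 bucket and a queue producer binding, and the key scheme is made up:

```typescript
// Sketch of the "work descriptor" pattern: the large payload goes to
// object storage (a Map stands in for R2) and only a small reference is
// enqueued, keeping the message well under the 128 KB limit.
const objectStore = new Map<string, Uint8Array>(); // stand-in for R2
const queue: { key: string; kind: string }[] = []; // stand-in for a Queue

let nextId = 0; // in a Worker, crypto.randomUUID() would fit here

async function enqueueLargeJob(kind: string, payload: Uint8Array): Promise<string> {
  const key = `jobs/${++nextId}`;
  objectStore.set(key, payload); // store the blob first, durably
  queue.push({ key, kind });     // message carries only the reference
  return key;
}
```

The consumer then fetches the blob by key, which also means retries re-read the same durable payload instead of re-shipping it.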
One hot queue is not the only queue you can have
If one queue becomes a throughput or backlog hotspot, split by use case or shard key. Separate queues are often a simpler scaling primitive than overcomplicating one mega-pipeline.
Long wall time is not a license for sloppy consumers
A 15-minute wall clock limit sounds large, but it is not a reason to build giant “do everything in one invocation” handlers. Prefer small, restart-safe units of work.
10) Where Queues fits well
Queues is a strong fit when you want:
- request/response decoupling,
- burst absorption,
- eventually processed background work,
- lightweight event pipelines inside Workers,
- fan-out into downstream systems,
- or async workflows where duplicates are acceptable if side effects are idempotent.
Good examples:
- email / notification delivery,
- webhook ingestion and normalization,
- log/event enrichment,
- image/document processing triggers,
- import/export pipelines,
- retryable third-party API integration.
11) Where Queues is the wrong primitive
Do not force Queues into problems that actually want:
Strict per-entity serialization
Use Durable Objects if entity-local coordination is the hard part. Queues may still feed them, but the queue itself is not the serializer.
Exactly-once semantics as the core contract
You can emulate “effectively once” with idempotency, but if your team refuses that model conceptually, the architecture will keep fighting the platform.
Long, branching, stateful workflow orchestration
Queues is a delivery primitive, not a full workflow engine. Once the process becomes “many conditional steps with state transitions and retries across stages,” a workflow/state-machine layer usually becomes clearer.
12) A production checklist
Before calling a Queue consumer production-ready, I would want all of these answered:
Correctness
- What is the idempotency key?
- Where is dedup state stored durably?
- When exactly do we call `ack()`?
- What failures trigger `retry()` versus terminal failure?
Capacity
- What write rate can producers sustain?
- What drain rate can one invocation sustain?
- Is `max_concurrency` capped, and why?
- What happens if backlog grows for an hour?
Failure handling
- Is a DLQ configured?
- Who monitors DLQ depth?
- How are poisoned messages inspected and replayed?
- Are schema/version mismatches visible quickly?
Message design
- Is the payload small and versioned?
- Can consumers handle old and new message shapes during rollout?
- Are big blobs stored elsewhere and referenced indirectly?
If those answers are fuzzy, the queue setup is probably still in “demo works” territory.
13) The distilled advice
If I had to compress the whole playbook into five rules:
- Assume duplicates. Design for idempotency from day one.
- Acknowledge after durable success. Not before.
- Use DLQ for anything meaningful. Silent discard is a choice; make it consciously.
- Tune concurrency to the downstream bottleneck. Protect the real constraint, not your intuition.
- Keep messages small and boring. Put bulky state elsewhere.
That is the mindset that turns Cloudflare Queues from “nice async helper” into a production-safe building block.
References
- Cloudflare Queues overview: https://developers.cloudflare.com/queues/
- Delivery guarantees: https://developers.cloudflare.com/queues/reference/delivery-guarantees/
- How Queues works: https://developers.cloudflare.com/queues/reference/how-queues-works/
- Configure Queues: https://developers.cloudflare.com/queues/configuration/configure-queues/
- Batching, retries and delays: https://developers.cloudflare.com/queues/configuration/batching-retries/
- Consumer concurrency: https://developers.cloudflare.com/queues/configuration/consumer-concurrency/
- Dead Letter Queues: https://developers.cloudflare.com/queues/configuration/dead-letter-queues/
- Limits: https://developers.cloudflare.com/queues/platform/limits/
- JavaScript APIs: https://developers.cloudflare.com/queues/configuration/javascript-apis/