Redis Streams Consumer-Group Reliability Playbook

2026-03-27 Β· software

Why this note

Redis Streams is great for low-latency, operationally simple event pipelines when you already run Redis. But many teams hit reliability issues (stuck PEL entries, unbounded stream growth, duplicate side effects) before they hit throughput limits.

This playbook focuses on practical reliability patterns for production.


TL;DR

  1. Treat delivery as at-least-once; make every handler idempotent
  2. XACK only after the side effect is durably committed
  3. Sweep abandoned PEL entries with XAUTOCLAIM; dead-letter poison messages
  4. Trim streams explicitly with MAXLEN or MINID; never leave retention implicit
  5. Alert on oldest pending idle time, not just consumer lag

1) Correct mental model

Delivery semantics

Redis Streams + consumer groups give you:

  - At-least-once delivery within a group: an entry stays in the PEL from XREADGROUP until XACK
  - Single ownership: one consumer owns an entry per group at a time
  - Delivery of new entries in stream order

But practically, you should assume:

  - Duplicates: any crash between the side effect and XACK causes redelivery
  - Out-of-order redelivery once entries are reclaimed by another consumer
  - No exactly-once semantics; idempotency has to live in your handler

Core lifecycle

  1. Producer appends with XADD
  2. Consumer group reads with XREADGROUP
  3. Message enters group PEL until acked
  4. Consumer does side effect
  5. Consumer XACKs message

If consumer dies after side effect but before XACK, message is redelivered. That is expected behavior.
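
Under the hood this is just an append-only log plus a pending-entries map. A toy in-memory model makes the redelivery behavior concrete (all names here, such as `TinyGroup`, are invented for illustration; a real deployment talks to Redis through a client library):

```python
class TinyGroup:
    """In-memory stand-in for one stream plus one consumer group's PEL."""

    def __init__(self):
        self.stream = []   # appended (entry_id, payload) pairs
        self.pel = {}      # entry_id -> payload; pending until XACK
        self.cursor = 0    # group's last-delivered position
        self.seq = 0

    def xadd(self, payload):
        # Step 1: producer appends a new entry.
        entry_id = f"{self.seq}-0"
        self.seq += 1
        self.stream.append((entry_id, payload))
        return entry_id

    def xreadgroup(self):
        # Steps 2-3: deliver new entries; each enters the PEL until acked.
        new = self.stream[self.cursor:]
        self.cursor = len(self.stream)
        self.pel.update(new)
        return new

    def xack(self, entry_id):
        # Step 5: ack removes the entry from the PEL.
        self.pel.pop(entry_id, None)


g = TinyGroup()
g.xadd({"order": 1})
(eid, payload), = g.xreadgroup()
# Step 4 (the side effect) happens here. If the consumer crashes before
# xack, the entry stays pending and will be redelivered later.
assert eid in g.pel
g.xack(eid)
assert g.pel == {}
```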


2) Reliability design patterns

Pattern A β€” Idempotent handler boundary (non-negotiable)

Use one of:

  - A dedupe store keyed by the stream entry ID or a business-level idempotency key
  - Naturally idempotent writes (upserts and SET-style state, not increments)
  - A transactional inbox: record the processed ID in the same transaction as the side effect

Rule: a redelivered message must be safe to process again; the side effect happens at most once no matter how many times the entry is delivered.
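
A minimal sketch of that rule, using an in-memory set as a stand-in for a durable dedupe store:

```python
# `processed` stands in for a durable dedupe store (e.g. a DB table keyed by entry ID).
processed = set()
balance = {"acct-1": 0}

def handle(entry_id, payload):
    """Apply the side effect at most once per entry, even across redeliveries."""
    if entry_id in processed:
        return "skipped"                      # redelivery: already applied
    balance[payload["acct"]] += payload["amount"]
    processed.add(entry_id)                   # in production: same transaction as the write
    return "applied"

assert handle("1-0", {"acct": "acct-1", "amount": 5}) == "applied"
assert handle("1-0", {"acct": "acct-1", "amount": 5}) == "skipped"  # redelivery is safe
assert balance["acct-1"] == 5
```

Note the marking must be atomic with the side effect itself; a separate write reopens the crash window.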

Pattern B β€” Ack only after durable side effect

XACK means “done”. Ack only after the side effect is durably committed; acking on dequeue (before the work) silently converts every crash into data loss.

Prefer correctness; tune capacity to keep lag reasonable.
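
One way to keep that ordering honest, sketched with injected stand-ins for the durable write and the XACK call:

```python
def consume_batch(entries, durable_write, xack):
    """Ack only entries whose side effect committed; failures stay in the PEL for retry."""
    for entry_id, payload in entries:
        try:
            durable_write(payload)   # e.g. a DB transaction commit
        except Exception:
            continue                 # no XACK: entry remains pending and is retried
        xack(entry_id)               # safe: the work is durable before the ack

acked = []

def flaky_write(payload):
    if payload.get("bad"):
        raise RuntimeError("downstream rejected")

consume_batch([("1-0", {}), ("2-0", {"bad": True})], flaky_write, acked.append)
assert acked == ["1-0"]   # the failed entry was never acked
```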

Pattern C β€” Reclaim abandoned messages

Use periodic sweeper logic:

  - Run XAUTOCLAIM with an idle threshold to take over entries from dead consumers
  - Inspect delivery counts on reclaimed entries to spot poison messages
  - Escalate entries past the retry budget to the dead-letter stream (Pattern D)
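
The selection step can be sketched in isolation; rows here mimic XPENDING detail output, and the threshold matches what you would pass to XAUTOCLAIM:

```python
def entries_to_reclaim(pending, min_idle_ms):
    """Select PEL rows idle past the threshold, i.e. the filter XAUTOCLAIM applies
    server-side. Rows mirror XPENDING detail: (entry_id, consumer, idle_ms, deliveries)."""
    return [row for row in pending if row[2] >= min_idle_ms]

pending = [
    ("1-0", "worker-a", 120_000, 3),   # abandoned: idle for two minutes
    ("2-0", "worker-b", 1_500, 1),     # still actively being processed
]
assert [r[0] for r in entries_to_reclaim(pending, 60_000)] == ["1-0"]
```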

Pattern D β€” Dead-letter stream

After N failed attempts:

  1. XADD the payload plus failure metadata (origin ID, attempts, last error) to a dead-letter stream
  2. XACK the original entry so the group keeps moving
  3. Alert and inspect the DLQ offline

A DLQ keeps the main pipeline moving while preserving forensic data.
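
A sketch of the routing decision, with the retry budget (`MAX_ATTEMPTS = 5`) an assumed tuning knob and the Redis calls injected as plain callables:

```python
MAX_ATTEMPTS = 5   # assumed retry budget; tune per pipeline

def route(entry_id, payload, delivery_count, handle, xadd_dlq, xack):
    """Past the retry budget, park the entry in the DLQ and ack the original;
    otherwise run the handler and ack on success."""
    if delivery_count > MAX_ATTEMPTS:
        xadd_dlq({"origin_id": entry_id, "payload": payload, "attempts": delivery_count})
        xack(entry_id)                 # the main group keeps moving
        return "dead-lettered"
    handle(payload)
    xack(entry_id)
    return "processed"

dlq, acks = [], []
out = route("9-0", {"x": 1}, 7, handle=lambda p: None,
            xadd_dlq=dlq.append, xack=acks.append)
assert out == "dead-lettered" and acks == ["9-0"] and dlq[0]["attempts"] == 7
```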


3) Operational guardrails

Stream growth control

Never leave retention implicit.

Use one:

  - Approximate capping at write time: XADD mystream MAXLEN ~ 1000000 * field value
  - Periodic XTRIM MAXLEN ~ N from a maintenance job
  - Time-based trimming with XTRIM MINID, cutting everything older than the retention window

Choose retention from replay/SLA requirements, not guesswork.
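
For time-based trimming, stream entry IDs embed a millisecond Unix timestamp (`<ms>-<seq>`), so the MINID cutoff is simple arithmetic:

```python
import time

def minid_for_window(retention_ms, now_ms=None):
    """MINID cutoff for time-based trimming: entry IDs are `<unix-ms>-<seq>`,
    so anything older than the window sorts below this ID."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    return f"{now_ms - retention_ms}-0"

# 24 h retention against a fixed clock, for a deterministic example:
assert minid_for_window(24 * 3600 * 1000, now_ms=1_700_000_000_000) == "1699913600000-0"
```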

PEL hygiene

Track per group/consumer:

  - Pending entry count (XPENDING summary)
  - Oldest pending idle time
  - Max delivery count (climbing counts flag poison messages)
  - Consumer liveness (consumers that still own entries but no longer read)

High pending idle time is often a better early-warning signal than raw lag.
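
Computing that signal from XPENDING-style rows is a one-liner; the 5-minute alert threshold below is an assumed example:

```python
def oldest_pending_idle_ms(pending):
    """Max idle time across XPENDING-style rows: (entry_id, consumer, idle_ms, deliveries)."""
    return max((row[2] for row in pending), default=0)

pending = [("1-0", "w-a", 45_000, 2), ("2-0", "w-b", 310_000, 6)]
idle = oldest_pending_idle_ms(pending)
assert idle == 310_000
assert idle > 300_000   # example policy: page when oldest idle exceeds 5 minutes
```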

Consumer identity discipline

Use stable, explicit consumer names (host/pod + instance ID). A consumer that restarts under a fresh random name orphans its PEL entries; a stable name lets the restarted process pick its own pending work back up.
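
A sketch of such a naming scheme (hostname plus process ID is one reasonable discriminator; adapt to pod names or instance IDs):

```python
import os
import socket

def consumer_name(instance_id):
    """Stable consumer identity: host/pod name plus an instance discriminator."""
    return f"{socket.gethostname()}-{instance_id}"

# A restarted process that reuses this name picks its own PEL entries back up.
name = consumer_name(os.getpid())
assert name.endswith(str(os.getpid()))
```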

Backpressure policy

When downstream is unhealthy:

  - Shrink COUNT and lengthen BLOCK to slow intake
  - Back off retries exponentially instead of hot-looping on failures
  - Pause reads entirely (circuit-break) rather than hammering the dependency

Without explicit backpressure, retries become a feedback loop.
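
A capped exponential backoff is the usual building block; the base and cap below are assumed starting values:

```python
def backoff_ms(attempt, base_ms=200, cap_ms=30_000):
    """Capped exponential backoff; `attempt` is 1-based (parameters are assumed defaults)."""
    return min(cap_ms, base_ms * (2 ** (attempt - 1)))

assert [backoff_ms(a) for a in (1, 2, 3, 4)] == [200, 400, 800, 1600]
assert backoff_ms(10) == 30_000   # 200 * 2**9 = 102_400, capped at 30_000
```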


4) Failure-mode checklist

Symptom: pending count climbs forever

Likely causes:

  - Consumers that crashed (or were renamed) without anyone reclaiming their entries
  - A poison message that fails on every delivery
  - A code path that completes the side effect but never reaches XACK
  - A downstream outage stalling every handler

Actions:

  1. Check oldest pending idle + owning consumer
  2. Trigger XAUTOCLAIM for idle > threshold
  3. Scale workers if downstream healthy
  4. Route poison messages to DLQ
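
The actions above can be collapsed into a small per-entry triage function; all thresholds here are assumed defaults:

```python
def triage(idle_ms, delivery_count, downstream_healthy,
           idle_threshold_ms=60_000, max_attempts=5):
    """Map one pending entry onto an action from the checklist (thresholds assumed)."""
    if delivery_count > max_attempts:
        return "dead-letter"             # poison message: route to DLQ
    if not downstream_healthy:
        return "back off"                # scaling or reclaiming would amplify retries
    if idle_ms > idle_threshold_ms:
        return "xautoclaim"              # reclaim abandoned work
    return "wait"

assert triage(120_000, 2, True) == "xautoclaim"
assert triage(120_000, 7, True) == "dead-letter"
assert triage(120_000, 2, False) == "back off"
```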

Symptom: duplicates causing money-loss side effects

Likely cause: non-idempotent handler.

Actions:

  1. Add idempotency key on side-effect boundary
  2. Reprocess replay safely in staging
  3. Block deploys without idempotency tests

Symptom: Redis memory pressure from streams

Likely causes:

  - No trimming policy (MAXLEN/MINID never set)
  - Large blobs embedded directly in stream entries
  - Retention windows longer than any replay requirement

Actions:

  1. Set MAXLEN/MINID policy
  2. Move blobs to object storage, keep pointers in stream
  3. Revisit retention windows by business value
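
Action 2 can be sketched as a size cutoff at produce time; `INLINE_LIMIT` and the dict-based object store are stand-ins:

```python
INLINE_LIMIT = 32 * 1024   # assumed cutoff; tune to your payload size distribution

def to_stream_fields(key, body, object_store):
    """Keep small payloads inline; park large blobs externally and ship a pointer."""
    if len(body) <= INLINE_LIMIT:
        return {"inline": body}
    object_store[key] = body          # stand-in for an S3/GCS-style put
    return {"blob_ref": key}          # the stream entry stays small

store = {}
assert to_stream_fields("k1", b"x" * 10, store) == {"inline": b"x" * 10}
assert to_stream_fields("k2", b"x" * 100_000, store) == {"blob_ref": "k2"}
assert len(store["k2"]) == 100_000
```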

5) Suggested baseline defaults (starting point)

These are not universal constants; calibrate with production telemetry.

  - Read: XREADGROUP COUNT 100, BLOCK 2000 ms
  - Reclaim: XAUTOCLAIM with min-idle 60000 ms, swept periodically
  - Dead-letter: after roughly 5 delivery attempts
  - Trim: XTRIM MAXLEN ~ 1000000, revisited against replay requirements


6) When to move from Redis Streams to Kafka-class backbone

Consider migration when you need:

  - Long or effectively infinite retention with cheap replay from arbitrary offsets
  - Partitioned ordering and sustained throughput beyond a single Redis node
  - A connector and stream-processing ecosystem (CDC, exactly-once sinks)
  - A durable log shared as a contract across many teams

Stay on Redis Streams when:

  - You already operate Redis and want minimal extra infrastructure
  - End-to-end latency matters more than unbounded retention
  - Trimmed volume fits comfortably in memory


7) Minimal runbook commands

# create group (once)
XGROUP CREATE mystream mygroup $ MKSTREAM

# consume
XREADGROUP GROUP mygroup consumer-1 COUNT 100 BLOCK 2000 STREAMS mystream >

# inspect pending summary
XPENDING mystream mygroup

# inspect pending details
XPENDING mystream mygroup - + 20

# reclaim idle pending messages
XAUTOCLAIM mystream mygroup consumer-recovery 60000 0-0 COUNT 100

# ack after successful processing
XACK mystream mygroup 1711450000000-0

# trim stream (approximate)
XTRIM mystream MAXLEN ~ 1000000
