Fixing Silent Dropped Messages in Redis Pub/Sub Under High Throughput
You publish a message to a Redis channel and nothing complains. No error, no warning, no stack trace. But on the receiving end, your subscriber processed fewer messages than you sent. This is the most frustrating class of bug: silent data loss that only shows up in aggregate metrics, if you're measuring at all.
Redis Pub/Sub is fast and dead simple to set up. But under real load, its delivery guarantees are weaker than most developers expect, and the failure mode is invisible by design.
What you'll learn
- Why Redis Pub/Sub drops messages at the protocol level
- How to detect message loss in a running system
- Practical mitigation strategies without rearchitecting everything
- When to migrate to Redis Streams instead
- How to tune subscriber performance to reduce drop risk
Prerequisites
This article assumes you're running Redis 6 or later and have a working Pub/Sub setup already. Code examples use Python with redis-py, but the concepts apply to any client library. You should be comfortable with basic async patterns and have access to your Redis server configuration.
Why Redis Pub/Sub Drops Messages
Redis Pub/Sub is a fire-and-forget system. When you call PUBLISH, Redis immediately writes the message to every client subscribed at that moment and then forgets it. Nothing is persisted or queued for later, so a subscriber that isn't connected when the message is published never sees it. If a subscriber is connected but slow, the outgoing data piles up in that client's output buffer on the server. If the buffer grows past its limit before the subscriber drains it, Redis closes the connection and everything still in the buffer is discarded.
This is documented behavior, not a bug. The relevant config key is client-output-buffer-limit. The default for pubsub clients is:
```
client-output-buffer-limit pubsub 32mb 8mb 60
```
That means Redis disconnects a pubsub client if its output buffer reaches 32 MB at any point (the hard limit), or stays above 8 MB for 60 consecutive seconds (the soft limit). Nothing is sent to the subscriber to explain why; from its perspective, the connection just drops. Redis does write a warning about the client exceeding its output buffer limits, but only to the server log. If your reconnect logic re-subscribes cleanly, you'll never know how many messages you missed.
How to Detect Message Loss
The first step is confirming you actually have a problem. Redis gives you two commands worth knowing here.
Run CLIENT LIST and look at the omem field for each subscriber connection. That's the current output buffer memory usage. If you see values growing steadily or hitting the MB range, your subscriber isn't keeping up.
```shell
redis-cli client list type pubsub
```
You'll get output like this:
```
id=42 addr=127.0.0.1:54312 ... omem=4194304 ... cmd=subscribe
```
You can also check INFO stats, but counters such as total_commands_processed only tell you how busy the server is, not which messages a given subscriber missed. More usefully, add a sequence number to every message payload you publish. On the subscriber side, track the last-seen sequence and flag any gaps. This is the only reliable way to measure actual loss: Redis itself doesn't track undelivered Pub/Sub messages.
```python
import json
import time

import redis

r = redis.Redis()


def publish_with_seq(channel: str, data: dict, seq: int) -> None:
    # Embed a monotonically increasing sequence number so subscribers can detect gaps
    payload = json.dumps({"seq": seq, "ts": time.time(), **data})
    r.publish(channel, payload)


# Publisher side
for i in range(10000):
    publish_with_seq("events", {"type": "click", "user": i}, seq=i)
```
On the subscriber, check for gaps:
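```python
import json

last_seq = -1


def handle_message(message):
    # Register with pubsub.subscribe(events=handle_message); redis-py then calls it per message
    global last_seq
    payload = json.loads(message["data"])
    seq = payload["seq"]
    if seq != last_seq + 1:
        print(f"GAP detected: expected {last_seq + 1}, got {seq}")
    last_seq = seq
```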
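Printing each gap works for a quick check, but to alert on loss you usually want a running count. The class below is a hypothetical helper built on the same seq field. It treats the first message it sees as the starting point (so a subscriber that joins mid-stream doesn't report a false gap), and it resynchronises on a backwards jump, which would suggest a publisher restart or a duplicate.

```python
class GapTracker:
    """Counts sequence numbers missing from a subscription (hypothetical helper)."""

    def __init__(self) -> None:
        self.last_seq = None  # no message seen yet
        self.lost = 0         # total sequence numbers skipped so far
        self.resets = 0       # backwards jumps, e.g. a publisher restart

    def observe(self, seq: int) -> int:
        """Record one sequence number and return how many were skipped before it."""
        if self.last_seq is None:
            # First message: start counting from here, even if we joined mid-stream
            self.last_seq = seq
            return 0
        if seq <= self.last_seq:
            # Duplicate or restart: resynchronise instead of counting a bogus gap
            self.resets += 1
            self.last_seq = seq
            return 0
        missed = seq - self.last_seq - 1
        self.lost += missed
        self.last_seq = seq
        return missed
```

Inside handle_message you would call tracker.observe(payload["seq"]) and export tracker.lost to whatever metrics system you already run.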
The Slow Subscriber Problem
Most dropped messages trace back to one root cause: the subscriber is doing too much work per message. If each message triggers a database write, an HTTP call, or a CPU-heavy computation, the subscriber falls behind the publisher. Redis's output buffer fills up and the connection is killed.
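A back-of-the-envelope model shows how quickly this happens. The sketch assumes a steady publish rate and a fixed per-message processing time, and it ignores the protocol framing Redis adds around each payload, so real buffers fill somewhat faster than it predicts. The function name and the 200-byte payload are illustrative.

```python
import math


def seconds_until_limit(publish_rate: float, per_message_seconds: float,
                        payload_bytes: int, limit_bytes: int) -> float:
    """Estimate how long until a subscriber's output buffer reaches limit_bytes.

    publish_rate: messages per second arriving on the channel
    per_message_seconds: time the subscriber spends on each message
    payload_bytes: average payload size (framing overhead ignored)
    """
    drain_rate = 1.0 / per_message_seconds   # messages/s the subscriber can handle
    backlog_rate = publish_rate - drain_rate  # messages/s piling up on the server
    if backlog_rate <= 0:
        return math.inf                       # the subscriber keeps up
    return limit_bytes / (backlog_rate * payload_bytes)


# 50 ms per message, 5,000 msg/s, 200-byte payloads, default 32 MB hard limit
print(round(seconds_until_limit(5000, 0.05, 200, 32 * 1024**2), 1))  # 33.7
```

With those numbers the buffer grows by about 1 MB per second, crosses the 8 MB soft limit after roughly 8 seconds, and hits the 32 MB hard limit at around 34 seconds, before the 60-second soft-limit window has even expired.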
The fix is to decouple message receipt from message processing. Receive the message as fast as possible, push it onto an internal queue, and process it in a separate thread or coroutine.
```python
import json
import queue
import threading

import redis

local_queue = queue.Queue(maxsize=50000)


def process_event(payload: dict) -> None:
    ...  # placeholder: your expensive work (DB write, HTTP call, etc.)


def subscriber_thread():
    r = redis.Redis()
    pubsub = r.pubsub()
    pubsub.subscribe("events")
    for message in pubsub.listen():
        if message["type"] == "message":
            try:
                # Hand off immediately so the receive loop keeps draining Redis
                local_queue.put_nowait(message["data"])
            except queue.Full:
                print("Local queue full: consider scaling consumers")


def worker_thread():
    while True:
        data = local_queue.get()
        try:
            payload = json.loads(data)
            process_event(payload)  # do your expensive work here
        finally:
            local_queue.task_done()


threading.Thread(target=worker_thread, daemon=True).start()
subscriber = threading.Thread(target=subscriber_thread, daemon=True)
subscriber.start()
subscriber.join()  # keep the main thread alive; daemon threads exit with it
```
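This pattern separates the I/O path from the processing path. The subscriber loop stays tight and drains Redis's output buffer quickly, and your processing logic can take as long as it needs without adding buffer pressure on the server. The catch is that the local queue is bounded: if workers fall behind for long enough, it fills up and messages are dropped in your own process instead of in Redis. The difference is that you can see and count those drops.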
Tuning the Output Buffer Limit
If your workload is inherently bursty (a flood of publishes followed by quiet periods), raising the buffer limits can help absorb spikes without disconnecting subscribers. Edit your redis.conf or set it at runtime:
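```shell
redis-cli config set client-output-buffer-limit "pubsub 64mb 16mb 90"
```
This gives each subscriber a hard limit of 64 MB and a soft limit of 16 MB that triggers a disconnect if exceeded continuously for 90 seconds. A runtime CONFIG SET is lost on restart unless you also update redis.conf or run CONFIG REWRITE. Be careful here: raising these limits uses real server memory. Buffers only grow when subscribers fall behind, but if you have ten subscribers and each buffer grows to 64 MB, that's 640 MB of RAM in the worst case. Monitor your Redis instance's used_memory after tuning.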
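If you manage this setting from application code, redis-py's config_set and config_rewrite can make the same change. This is a sketch with hypothetical helper names; the worst-case helper simply multiplies the hard limit by the subscriber count, mirroring the ten-subscriber arithmetic above.

```python
MIB = 1024 * 1024


def pubsub_limit_value(hard_mb: int, soft_mb: int, soft_seconds: int) -> str:
    # Same "<class> <hard> <soft> <seconds>" format as the redis-cli example
    return f"pubsub {hard_mb}mb {soft_mb}mb {soft_seconds}"


def worst_case_buffer_bytes(num_subscribers: int, hard_mb: int) -> int:
    # Upper bound if every subscriber's buffer reaches the hard limit at once
    return num_subscribers * hard_mb * MIB


def apply_pubsub_limit(client, hard_mb: int, soft_mb: int, soft_seconds: int,
                       persist: bool = False) -> None:
    # client: assumed to be a redis.Redis instance; the value names only the pubsub class
    client.config_set("client-output-buffer-limit",
                      pubsub_limit_value(hard_mb, soft_mb, soft_seconds))
    if persist:
        # CONFIG REWRITE writes the running config back to redis.conf
        # (it errors if the server was started without a config file)
        client.config_rewrite()
```

Computing worst_case_buffer_bytes before applying a change is a cheap sanity check against your instance's available memory.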
Buffer tuning is a safety net, not a fix. You still need a fast subscriber. Use it to buy yourself time during burst absorption, not as a substitute for proper consumer design.
Using PSUBSCRIBE Carefully
Pattern subscriptions (PSUBSCRIBE) are convenient but carry a hidden cost. Every PUBLISH is matched against every pattern subscription on the server: its documented complexity is O(N+M), where N is the number of clients subscribed to the channel and M is the total number of subscribed patterns. Under high publish rates, this adds measurable CPU load on the server. If you're using PSUBSCRIBE "events:*" where a plain SUBSCRIBE events:clicks events:views would work, switch to the explicit form.
Also, a single Redis connection can handle multiple channel subscriptions. Avoid opening one connection per channel unless you have a strong reason.
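One way to serve several channels from a single connection is to subscribe to all of them on one PubSub object and route each message by its channel field. A sketch, assuming redis-py message dicts with type, channel, and data keys; make_dispatcher and run are hypothetical names. redis-py can also register per-channel callbacks directly with pubsub.subscribe(**{channel: handler}).

```python
def make_dispatcher(handlers: dict):
    """Route redis-py message dicts to a handler chosen by channel name."""

    def dispatch(message: dict) -> bool:
        if message["type"] != "message":
            return False  # skip subscribe confirmations and other control messages
        channel = message["channel"]
        if isinstance(channel, bytes):
            channel = channel.decode()
        handlers[channel](message["data"])
        return True

    return dispatch


def run(client, handlers: dict) -> None:
    # client: assumed to be a redis.Redis instance; one connection, many channels
    pubsub = client.pubsub()
    pubsub.subscribe(*handlers)  # e.g. "events:clicks", "events:views"
    dispatch = make_dispatcher(handlers)
    for message in pubsub.listen():
        dispatch(message)
```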
When to Switch to Redis Streams
If you've tuned buffers, optimized your subscriber, and you're still losing messages, Pub/Sub may be the wrong tool. Redis Streams were added specifically to address the durability gap.
The key differences:
| Feature | Pub/Sub | Streams |
|---|---|---|
| Message persistence | No | Yes |
| Replay after disconnect | No | Yes |
| Consumer groups | No | Yes |
| Delivery acknowledgement | No | Yes (XACK) |
| Setup complexity | Low | Medium |
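With Streams, entries stay in the stream after delivery. A consumer that disconnects and comes back continues from where its consumer group left off, and any entries it received but never acknowledged remain in the group's pending list. Nothing is missed unless trimming removes entries before they are read. The tradeoff is higher setup complexity and the need to cap stream length with MAXLEN to avoid unbounded memory growth.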
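On the producing side, the counterpart to PUBLISH is XADD. A minimal sketch, assuming redis-py's xadd with its maxlen and approximate arguments; publish_to_stream, the single payload field, and the 100,000-entry cap are illustrative choices. Approximate trimming (MAXLEN ~) is cheaper than exact trimming and keeps at least the requested number of entries, possibly a few more.

```python
import json


def publish_to_stream(client, stream: str, data: dict, maxlen: int = 100_000):
    """Append one event to a stream and cap its length (hypothetical helper).

    client: assumed to be a redis.Redis instance.
    Returns the entry ID Redis assigned.
    """
    return client.xadd(
        stream,
        {"payload": json.dumps(data)},  # stream entries are flat field/value maps
        maxlen=maxlen,
        approximate=True,  # MAXLEN ~ N: trim lazily, keeping at least about N entries
    )
```

Size maxlen to cover your longest expected consumer outage; an entry trimmed before any consumer reads it is gone, just like a dropped Pub/Sub message.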
A minimal Streams consumer looks like this:
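```python
import redis

r = redis.Redis()

STREAM = "events"
GROUP = "workers"
CONSUMER = "worker-1"


def process_event(fields: dict) -> None:
    ...  # placeholder: fields maps bytes to bytes, e.g. {b"payload": b"..."}


# Create the group (and the stream, via mkstream) if it doesn't exist yet.
# id="$" means the group only sees entries added after it is created.
try:
    r.xgroup_create(STREAM, GROUP, id="$", mkstream=True)
except redis.exceptions.ResponseError as exc:
    if "BUSYGROUP" not in str(exc):
        raise  # a real error, not "group already exists"

while True:
    # Block up to 5 s waiting for up to 100 entries not yet delivered to this group
    messages = r.xreadgroup(GROUP, CONSUMER, {STREAM: ">"}, count=100, block=5000)
    if not messages:
        continue
    for stream_name, entries in messages:
        for entry_id, fields in entries:
            process_event(fields)
            r.xack(STREAM, GROUP, entry_id)  # acknowledge only after processing
```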
The > ID means "give me messages not yet delivered to this group." After processing, XACK marks the message as handled. If the consumer crashes before acknowledging, the message stays pending and can be reclaimed.
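Reclaiming can be done with XAUTOCLAIM, available since Redis 6.2 (slightly newer than this article's Redis 6 baseline). The sketch below assumes redis-py's xautoclaim, which returns the next scan cursor followed by the claimed (id, fields) pairs; Redis 7 also appends a list of deleted IDs. reclaim_stale and process are hypothetical names. Choose min_idle_ms well above your normal processing time so you don't steal entries a healthy consumer is still working on.

```python
def reclaim_stale(client, stream: str, group: str, consumer: str,
                  min_idle_ms: int, process, batch: int = 100) -> int:
    """Claim and process entries left pending by crashed consumers.

    client: assumed to be a redis.Redis instance; process: your handler.
    Returns how many entries were reclaimed.
    """
    cursor = "0-0"
    reclaimed = 0
    while True:
        # Transfer entries idle for at least min_idle_ms to `consumer`
        response = client.xautoclaim(stream, group, consumer, min_idle_ms,
                                     start_id=cursor, count=batch)
        cursor, entries = response[0], response[1]
        for entry_id, fields in entries:
            if entry_id is None or fields is None:
                continue  # entry was deleted from the stream; nothing to process
            process(fields)
            client.xack(stream, group, entry_id)
            reclaimed += 1
        if cursor in ("0-0", b"0-0"):
            return reclaimed  # the scan has covered the whole pending list
```

Running this periodically from any live consumer, or once at startup, keeps a crashed worker's messages from sitting in the pending list indefinitely.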
Common Pitfalls
Reconnect without re-subscribe. If your client reconnects after a dropped connection but doesn't call SUBSCRIBE again, it silently receives nothing. Most client libraries handle this, but check yours explicitly, especially if you're using a lower-level client or custom connection pool.
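If your client doesn't restore subscriptions for you, subscribe again every time a new connection is established. A sketch under stated assumptions: make_pubsub is a hypothetical factory returning a fresh redis-py-style PubSub object, and retry_on is the tuple of connection errors to recover from (with redis-py that would be redis.exceptions.ConnectionError). Anything published while the connection was down is still lost, which is exactly what the sequence-number check exposes.

```python
import time


def listen_with_resubscribe(make_pubsub, channels, handle, retry_on,
                            backoff: float = 1.0, max_attempts=None) -> None:
    """Consume messages, re-subscribing after every reconnect (hypothetical helper)."""
    attempts = 0
    while max_attempts is None or attempts < max_attempts:
        attempts += 1
        pubsub = make_pubsub()  # a fresh connection starts with no subscriptions
        try:
            pubsub.subscribe(*channels)  # so subscribe again on every attempt
            for message in pubsub.listen():
                if message["type"] == "message":
                    handle(message)
        except retry_on:
            # Messages published while disconnected are gone; back off and retry
            time.sleep(backoff)
        finally:
            pubsub.close()
```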
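Blocking the event loop. In async frameworks (asyncio, Tornado), a blocking call inside a message handler stalls the entire subscriber. A 50 ms synchronous database call caps you at about 20 messages per second, so at 5,000 messages/sec the backlog grows by roughly 4,980 messages every second until the output buffer overflows. Use non-blocking, awaited I/O inside async handlers, and don't await slow work directly in the receive loop: hand it to separate tasks so the loop keeps reading.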
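In asyncio code, the same receive-then-queue split from the threaded example applies. This sketch assumes redis-py's asyncio client (redis.asyncio), whose PubSub.listen() is an async iterator. blocking_write is a hypothetical stand-in for a synchronous driver call, pushed onto a thread with asyncio.to_thread (Python 3.9+) so it can't stall the event loop.

```python
import asyncio
import json
import time


def blocking_write(payload: dict) -> int:
    # Hypothetical synchronous DB call taking about 50 ms
    time.sleep(0.05)
    return payload["seq"]


async def consume(pubsub, queue: asyncio.Queue) -> int:
    """Drain the subscription as fast as possible; returns messages dropped locally."""
    dropped = 0
    async for message in pubsub.listen():
        if message["type"] == "message":
            try:
                queue.put_nowait(message["data"])
            except asyncio.QueueFull:
                dropped += 1  # count local overflow instead of slowing Redis reads
    return dropped


async def worker(queue: asyncio.Queue, results: list) -> None:
    while True:
        data = await queue.get()
        try:
            payload = json.loads(data)
            # Run the blocking call in a thread so the event loop keeps reading
            results.append(await asyncio.to_thread(blocking_write, payload))
        finally:
            queue.task_done()


async def main(pubsub, num_workers: int = 5) -> list:
    # pubsub: e.g. redis.asyncio.Redis().pubsub() after `await pubsub.subscribe("events")`
    queue: asyncio.Queue = asyncio.Queue(maxsize=50_000)
    results: list = []
    workers = [asyncio.create_task(worker(queue, results)) for _ in range(num_workers)]
    await consume(pubsub, queue)  # returns when the subscription ends
    await queue.join()            # wait for in-flight work to finish
    for w in workers:
        w.cancel()
    return results
```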
No health monitoring on subscriber connections. Run a periodic check on CLIENT LIST and alert if any pubsub client's omem exceeds a threshold. This is the earliest signal you have before Redis starts dropping connections.
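A minimal version of that check, assuming redis-py's client_list(_type="pubsub"), which returns one dict of string values per connection. lagging_subscribers is a hypothetical name, and the default threshold (half of the default 8 MB soft limit) is only a starting point; feed the result into whatever alerting you already use.

```python
def lagging_subscribers(client, threshold_bytes: int = 4 * 1024 * 1024) -> list:
    """Return (id, addr, omem) for pubsub clients whose output buffer exceeds threshold_bytes.

    client: assumed to be a redis.Redis instance.
    """
    lagging = []
    for info in client.client_list(_type="pubsub"):
        omem = int(info["omem"])  # output buffer memory, in bytes
        if omem > threshold_bytes:
            lagging.append((info["id"], info["addr"], omem))
    # Largest buffers first, so the worst offender tops the alert
    return sorted(lagging, key=lambda item: item[2], reverse=True)
```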
Publishing from within a subscriber callback. Some developers publish a response message from inside the message handler. On the subscribed connection itself, this fails in most clients: under the default RESP2 protocol, a connection in subscribed mode only accepts subscription commands (SUBSCRIBE, UNSUBSCRIBE, PSUBSCRIBE, PUNSUBSCRIBE) plus a few others such as PING and QUIT. Use a separate Redis connection for publishing.
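With redis-py this split mostly happens for you: the object returned by Redis.pubsub() holds its own dedicated connection, while calling publish on the regular client borrows a different connection from the pool. The sketch below makes that split explicit; make_reply_handler and the events:replies channel are hypothetical.

```python
import json


def make_reply_handler(publisher, reply_channel: str):
    """Build a pubsub callback that replies over a different connection.

    publisher: assumed to be a regular redis.Redis client, not the PubSub object.
    """

    def on_message(message: dict) -> None:
        request = json.loads(message["data"])
        reply = json.dumps({"seq": request["seq"], "status": "ok"})
        publisher.publish(reply_channel, reply)  # never sent on the subscribed connection

    return on_message
```

Usage would look like pubsub.subscribe(**{"events": make_reply_handler(r, "events:replies")}), where r is the plain client and pubsub came from r.pubsub().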
Scaling Horizontally
A single subscriber process has a ceiling. If your message volume genuinely exceeds what one process can drain, you need multiple subscribers. With plain Pub/Sub, every subscriber receives every message, so you can't split the load. This is another reason Redis Streams with consumer groups is a better fit for high-throughput pipelines: multiple workers in the same group each receive a distinct subset of messages, giving you horizontal scale without duplicate processing.
If you must stick with Pub/Sub, split your channels. Instead of one events channel, shard into N channels, events:0 through events:N-1, and assign one subscriber process per shard. Your publisher hashes the message key to pick a channel.
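The publisher-side routing might look like this. It's a sketch with hypothetical names, and it deliberately uses zlib.crc32 rather than Python's built-in hash(), because hash() of a string is randomized per process and would send the same key to different shards from different publishers.

```python
import zlib


def shard_channel(key: str, num_shards: int, prefix: str = "events") -> str:
    """Map a message key to one of num_shards channels: events:0 .. events:{num_shards - 1}."""
    shard = zlib.crc32(key.encode("utf-8")) % num_shards  # stable across processes
    return f"{prefix}:{shard}"


def publish_sharded(client, key: str, payload: str, num_shards: int) -> int:
    # client: assumed to be a redis.Redis instance; returns the number of receivers
    return client.publish(shard_channel(key, num_shards), payload)
```

Because every message for a given key lands on the same shard, one subscriber sees all of that key's messages in publish order. Changing num_shards remaps keys, so plan the shard count up front or coordinate the switch.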
Wrapping Up
Silent message loss in Redis Pub/Sub is predictable once you understand the output buffer mechanism. Here's what to do next:
- Add sequence numbers to your messages now and log any gaps on the subscriber side. You can't fix what you can't measure.
- Profile your subscriber's per-message latency. If it's above a few milliseconds, move processing off the receive loop using an internal queue.
- Check CLIENT LIST under load and watch the omem field. Set an alert before it approaches your soft limit.
- Evaluate Redis Streams if you need replay, at-least-once delivery, or multi-consumer load balancing. The migration cost is real, but so is continued data loss.
- Tune client-output-buffer-limit to absorb burst traffic, but treat it as a buffer, not a fix for an underpowered subscriber.