Fixing Celery Tasks That Silently Fail Without Raising Exceptions
Your Celery task ran. The worker picked it up, logged "Task received", and marked it as SUCCESS. But the database row never updated. The email never sent. Nothing in the logs explains why. Silent task failures are one of the most frustrating problems in async Python work, precisely because they look like success.
This article walks you through why Celery tasks fail without raising exceptions, how to catch them in the act, and what patterns to put in place so it never blindsides you again.
What you'll learn
- The most common reasons Celery tasks fail silently
- How to use task signals and custom base classes to catch every failure
- How to configure retries correctly so transient errors don't turn into lost jobs
- How to use task_always_eager and result backends to confirm task outcomes
- Concrete patterns for alerting when something goes wrong in production
Prerequisites
This article assumes you're running Celery 5.x with a Redis or RabbitMQ broker. Code examples use Python 3.10+. You should already know how to define and call a basic task; this is a debugging and architecture article, not a getting-started guide.
Why Celery Tasks Fail Without Raising
Celery has a quirk that surprises nearly everyone eventually: it only marks a task as FAILURE when an exception actually escapes the task body. The task runner catches exceptions so a misbehaving task can't kill the worker, but it can only report what reaches it. If the exception is swallowed inside a try/except block in your own code, or your task raises a BaseException subclass that isn't a standard Exception (like SystemExit or KeyboardInterrupt) and so bypasses the normal failure path, the task ends cleanly from Celery's perspective.
There are four common causes:
- Broad bare except blocks in your task code that catch and log but never re-raise
- Third-party libraries that swallow exceptions internally and return None or a falsy value instead
- Database or network calls that time out silently when no timeout is set
- Soft time limit exceeded: Celery raises SoftTimeLimitExceeded, and if your code catches all exceptions, that gets eaten too
The result is a task state of SUCCESS with a return value of None, and a side effect that never happened.
Auditing Your Task Code for Silent Swallows
The first place to look is your own code. Search for bare except blocks that don't re-raise.
# Bad: swallows everything
@app.task
def send_invoice(order_id):
    try:
        order = Order.objects.get(id=order_id)
        email_client.send(order.email, render_invoice(order))
    except Exception as e:
        logger.error("Something went wrong: %s", e)
        # no raise: the task returns None and reports SUCCESS
The fix is simple: re-raise after logging, or better yet, let Celery's retry mechanism handle known transient errors and let everything else propagate.
# Better: log and re-raise
from smtplib import SMTPException

@app.task(bind=True, max_retries=3)
def send_invoice(self, order_id):
    try:
        order = Order.objects.get(id=order_id)
        email_client.send(order.email, render_invoice(order))
    except (SMTPException, ConnectionError) as exc:
        # retry transient failures
        raise self.retry(exc=exc, countdown=60)
    except Exception:
        logger.exception("Unhandled error in send_invoice for order %s", order_id)
        raise  # let Celery mark this FAILURE
The critical line is that final raise. Without it, your task reports success and you lose the error trail entirely.
Using a Custom Base Task Class
If you have many tasks across a large codebase, patching each one individually is error-prone. A better approach is a custom base class that wraps the on_failure hook.
from celery import Task
import logging

logger = logging.getLogger(__name__)

class AlertOnFailureTask(Task):
    abstract = True

    def on_failure(self, exc, task_id, args, kwargs, einfo):
        logger.error(
            "Task %s[%s] failed: %s",
            self.name,
            task_id,
            exc,
            exc_info=einfo,
        )
        # plug in your alerting here: Sentry, PagerDuty, Slack webhook, etc.
        super().on_failure(exc, task_id, args, kwargs, einfo)
Then use it as the base for all your tasks:
@app.task(bind=True, base=AlertOnFailureTask, max_retries=3)
def process_payment(self, payment_id):
    ...
The on_failure hook is only called when Celery actually marks the task as FAILURE, meaning you still need to make sure exceptions propagate out of your task body. This base class handles the reporting, not the propagation.
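As an example of what the reporting side might look like, here is a sketch of a Slack webhook notifier you could call from on_failure just before the super() call. The SLACK_WEBHOOK_URL environment variable, the notify_slack helper, and the message format are assumptions, and the sketch presumes the requests library is installed:

import logging
import os

import requests

logger = logging.getLogger(__name__)

def notify_slack(task_name, task_id, exc):
    """Post a short failure summary to a Slack incoming webhook."""
    webhook_url = os.environ.get("SLACK_WEBHOOK_URL")  # hypothetical env var
    if not webhook_url:
        return
    try:
        requests.post(
            webhook_url,
            json={"text": f"Celery task {task_name}[{task_id}] failed: {exc!r}"},
            timeout=5,
        )
    except requests.RequestException:
        # Alerting must never take down the failure handler itself.
        logger.exception("Could not deliver failure alert for task %s", task_id)

Inside on_failure, you would call notify_slack(self.name, task_id, exc) before delegating to super().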
Task Signals for Cross-Cutting Concerns
Celery provides task signals that fire independently of whatever the task itself does. These are useful for building observability without modifying task code.
from celery.signals import task_failure, task_retry

@task_failure.connect
def handle_task_failure(sender=None, task_id=None, exception=None, **kwargs):
    logger.error(
        "FAILURE | task=%s id=%s exception=%s",
        sender.name if sender else "unknown",
        task_id,
        repr(exception),
    )

@task_retry.connect
def handle_task_retry(sender=None, reason=None, **kwargs):
    logger.warning("RETRY | task=%s reason=%s", sender.name, reason)
Wire these up in your Celery app's __init__.py or wherever you configure the application. They fire globally, so you get coverage on every task without touching task definitions.
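One way to make sure the handlers are registered on every worker is to import the module that defines them from the same place you create the Celery app. A minimal sketch, assuming the handlers above live in a hypothetical myproject/celery_signals.py:

# myproject/celery.py
from celery import Celery

app = Celery("myproject", broker="redis://localhost:6379/0")
app.config_from_object("celeryconfig")

# Importing the module runs the @task_failure.connect / @task_retry.connect
# decorators, which registers the handlers globally for this process.
import myproject.celery_signals  # noqa: E402,F401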
Configuring a Result Backend to Confirm Outcomes
If you're running Celery without a result backend configured, you have no durable record of what happened. You're relying entirely on broker acknowledgment and worker logs. Adding a result backend gives you queryable task state.
# celeryconfig.py
result_backend = 'redis://localhost:6379/1'
result_expires = 3600 # keep results for 1 hour
task_track_started = True # also record when a task transitions to STARTED
With this in place, you can inspect task state from your application code or a management command:
from celery.result import AsyncResult
result = AsyncResult(task_id)
print(result.state) # PENDING, STARTED, SUCCESS, FAILURE, RETRY
print(result.result) # return value or exception instance
A state of SUCCESS with result.result == None when you expected a value is a clear signal that a bare except is hiding something. A state of FAILURE gives you the traceback via result.traceback.
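If you want to automate that check, a small helper can flag the suspicious combination for you. This is just a sketch of the idea; the audit_task helper is an assumption, not a Celery API:

import logging

from celery.result import AsyncResult

logger = logging.getLogger(__name__)

def audit_task(task_id):
    """Warn about the suspicious SUCCESS-with-None pattern described above."""
    result = AsyncResult(task_id)
    if result.state == "FAILURE":
        logger.error("Task %s failed:\n%s", task_id, result.traceback)
    elif result.state == "SUCCESS" and result.result is None:
        # A task that should return a value but reports None has probably
        # swallowed an exception somewhere in its body.
        logger.warning("Task %s reported SUCCESS but returned None", task_id)
    return result.state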
Soft and Hard Time Limits
A task that runs forever is a silent failure of a different kind. The worker is occupied, the job never finishes, and the caller gets no response. Set both soft and hard time limits.
from celery.exceptions import SoftTimeLimitExceeded

@app.task(
    bind=True,
    soft_time_limit=25,
    time_limit=30,
)
def generate_report(self, report_id):
    try:
        # expensive work here
        ...
    except SoftTimeLimitExceeded:
        logger.warning("Report %s timed out, cleaning up", report_id)
        # do any cleanup (close file handles, partial writes, etc.)
        raise  # let it propagate so the task is marked FAILURE
The soft limit gives you a chance to clean up gracefully. The hard limit kills the worker process if it's still running. Both are better than a hung worker blocking your queue for hours.
You can also set defaults globally to protect every task:
# celeryconfig.py
task_soft_time_limit = 60
task_time_limit = 120
Common Pitfalls
Acknowledging before the work is done
Celery's default acks_late=False means the broker removes the message from the queue the moment the worker picks it up, not when it finishes. If the worker crashes mid-task, the job is lost. Set acks_late=True on tasks where durability matters, but understand that this can cause duplicate execution on worker restart, so your task logic needs to be idempotent.
@app.task(bind=True, acks_late=True)
def charge_customer(self, customer_id, amount):
    ...
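One straightforward way to get that idempotency is to key the side effect on the task id, so a redelivered message is recognized and skipped. The Charge model and the payment_gateway client below are assumptions used only to illustrate the pattern:

@app.task(bind=True, acks_late=True)
def charge_customer(self, customer_id, amount):
    # The task id doubles as an idempotency key: a message redelivered after
    # a worker crash arrives with the same id and is skipped.
    if Charge.objects.filter(idempotency_key=self.request.id).exists():
        logger.info("Charge for task %s already recorded, skipping", self.request.id)
        return
    # hypothetical gateway client that also honors the idempotency key
    payment_gateway.charge(customer_id, amount, idempotency_key=self.request.id)
    Charge.objects.create(
        idempotency_key=self.request.id,
        customer_id=customer_id,
        amount=amount,
    )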
Ignoring the result when it matters
If you call task.delay() and never inspect the AsyncResult, you have no way to know it failed. For fire-and-forget tasks this is fine. For tasks whose output gates the next step in a workflow, always store the task ID and check the result.
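A minimal sketch of that pattern, assuming a hypothetical PaymentJob model for persisting the task id alongside your domain record:

from celery.result import AsyncResult

def start_payment(payment_id):
    async_result = process_payment.delay(payment_id)
    # Persist the task id so a later workflow step (or a periodic audit job)
    # can check whether the payment actually went through.
    PaymentJob.objects.create(payment_id=payment_id, task_id=async_result.id)

def payment_succeeded(payment_id):
    job = PaymentJob.objects.get(payment_id=payment_id)
    result = AsyncResult(job.task_id)
    if result.state == "FAILURE":
        raise RuntimeError(f"Payment task {job.task_id} failed:\n{result.traceback}")
    return result.state == "SUCCESS"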
Retrying without exponential backoff
Retrying immediately when a downstream service is under load often makes the problem worse. Use the countdown parameter with increasing delays, or Celery's retry_backoff option (used together with autoretry_for), available in recent versions.
raise self.retry(exc=exc, countdown=2 ** self.request.retries)
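In recent Celery versions, retry_backoff and autoretry_for let the task declaration handle both the retry and the backoff for you, avoiding the manual self.retry bookkeeping. A sketch, with a hypothetical sync_inventory task and inventory_api client standing in for a flaky downstream service:

@app.task(
    bind=True,
    autoretry_for=(ConnectionError, TimeoutError),
    retry_backoff=True,         # 1s, 2s, 4s, 8s, ... between attempts
    retry_backoff_max=600,      # never wait longer than 10 minutes
    retry_jitter=True,          # randomize delays to avoid thundering herds
    retry_kwargs={"max_retries": 5},
)
def sync_inventory(self, sku):
    inventory_api.push(sku)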
Max retries set to None
If max_retries is set to None (which disables the limit), a task will retry forever on a persistent error. Your queue will quietly fill up with endlessly retrying tasks. Always set a finite max_retries and handle the MaxRetriesExceededError explicitly.
from celery.exceptions import MaxRetriesExceededError

@app.task(bind=True, max_retries=5)
def sync_to_external_api(self, record_id):
    try:
        api_client.sync(record_id)
    except APIError as exc:
        try:
            raise self.retry(exc=exc, countdown=30)
        except MaxRetriesExceededError:
            logger.critical("sync_to_external_api exhausted retries for %s", record_id)
            # mark the record as failed in your DB, alert someone
            raise
Testing for Silent Failures Locally
Use CELERY_TASK_ALWAYS_EAGER = True in your test configuration to run tasks synchronously in the same process. This makes exceptions bubble up immediately, which is exactly what you want during development and in unit tests.
# settings.py (Django) or test config
CELERY_TASK_ALWAYS_EAGER = True
CELERY_TASK_EAGER_PROPAGATES = True # re-raises exceptions instead of swallowing
Without CELERY_TASK_EAGER_PROPAGATES, even eager mode can hide exceptions. Set both.
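With both settings in place, a silent failure surfaces as an ordinary failing test. A minimal pytest sketch, assuming a Django project tested with pytest-django, the send_invoice task from earlier, and hypothetical import paths:

import pytest

from myproject.models import Order        # hypothetical import paths
from myproject.tasks import send_invoice

@pytest.mark.django_db
def test_send_invoice_raises_on_missing_order():
    # With task_always_eager and eager_propagates enabled, .delay() runs the
    # task inline and re-raises anything that escapes the task body.
    with pytest.raises(Order.DoesNotExist):
        send_invoice.delay(order_id=999_999)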
Wrapping Up
Silent Celery failures are almost always the result of a combination of broad exception handling, missing result backends, and no time limits. Here's what to do next:
- Audit every except block in your task code and make sure exceptions propagate unless you're explicitly handling a retry.
- Add a result backend (Redis is the easiest choice) and set task_track_started = True so you have a queryable record of task state.
- Create a custom base task class or connect to the task_failure signal to push failures into your observability stack immediately.
- Set soft_time_limit and time_limit at the global level so no task can hang a worker indefinitely.
- Review your max_retries values and add exponential backoff so transient errors don't spiral into queue bloat.
Once you have these pieces in place, silent failures become loud ones, and loud failures are problems you can actually fix.