Fixing Celery Tasks That Silently Fail Without Raising Exceptions

May 17, 2026 · 7 min read

Your Celery task ran. The worker picked it up, logged Task received, and marked it as SUCCESS. But the database row never updated. The email never sent. Nothing in the logs explains why. Silent task failures are one of the most frustrating problems in async Python work, precisely because they look like success.

This article walks you through why Celery tasks fail without raising exceptions, how to catch them in the act, and what patterns to put in place so it never blindsides you again.

What you'll learn

  • The most common reasons Celery tasks fail silently
  • How to use task signals and custom base classes to catch every failure
  • How to configure retries correctly so transient errors don't turn into lost jobs
  • How to use task_always_eager and result backends to confirm task outcomes
  • Concrete patterns for alerting when something goes wrong in production

Prerequisites

This article assumes you're running Celery 5.x with a Redis or RabbitMQ broker. Code examples use Python 3.10+. You should already know how to define and call a basic task β€” this is a debugging and architecture article, not a getting-started guide.

Why Celery Tasks Fail Without Raising

Celery's contract surprises nearly everyone eventually: a task is marked SUCCESS whenever the task function returns without raising, regardless of whether it actually did its job. If an exception is swallowed inside a try/except block in your own code (or inside a library you call), the function returns None and Celery records a clean success. Exceptions that aren't ordinary Exception subclasses, like SystemExit or KeyboardInterrupt, take different code paths entirely and can bring down the worker process without the task ever being marked FAILURE.

There are four common causes:

  • Broad bare except blocks in your task code that catch and log but never re-raise
  • Third-party libraries that swallow exceptions internally and return None or a falsy value instead
  • Database or network calls that time out silently when no timeout is set
  • Soft time limit exceeded β€” Celery raises SoftTimeLimitExceeded, and if your code catches all exceptions, that gets eaten too

The result is a task state of SUCCESS with a return value of None, and a side effect that never happened.

Auditing Your Task Code for Silent Swallows

The first place to look is your own code. Search for bare except blocks that don't re-raise.

# Bad: swallows everything
@app.task
def send_invoice(order_id):
    try:
        order = Order.objects.get(id=order_id)
        email_client.send(order.email, render_invoice(order))
    except Exception as e:
        logger.error("Something went wrong: %s", e)
        # no raise β€” task returns None and reports SUCCESS

The fix is simple: re-raise after logging, or better yet, let Celery's retry mechanism handle known transient errors and let everything else propagate.

# Better: log and re-raise
from smtplib import SMTPException

@app.task(bind=True, max_retries=3)
def send_invoice(self, order_id):
    try:
        order = Order.objects.get(id=order_id)
        email_client.send(order.email, render_invoice(order))
    except (SMTPException, ConnectionError) as exc:
        # retry transient failures
        raise self.retry(exc=exc, countdown=60)
    except Exception:
        logger.exception("Unhandled error in send_invoice for order %s", order_id)
        raise  # let Celery mark this FAILURE

The critical line is that final raise. Without it, your task reports success and you lose the error trail entirely.

Using a Custom Base Task Class

If you have many tasks across a large codebase, patching each one individually is error-prone. A better approach is a custom base class that wraps the on_failure hook.

from celery import Task
import logging

logger = logging.getLogger(__name__)

class AlertOnFailureTask(Task):
    abstract = True

    def on_failure(self, exc, task_id, args, kwargs, einfo):
        logger.error(
            "Task %s[%s] failed: %s",
            self.name,
            task_id,
            exc,
            exc_info=einfo,
        )
        # plug in your alerting here: Sentry, PagerDuty, Slack webhook, etc.
        super().on_failure(exc, task_id, args, kwargs, einfo)

Then use it as the base for all your tasks:

@app.task(bind=True, base=AlertOnFailureTask, max_retries=3)
def process_payment(self, payment_id):
    ...

The on_failure hook is only called when Celery actually marks the task as FAILURE β€” meaning you still need to make sure exceptions propagate out of your task body. This base class handles the reporting, not the propagation.

Task Signals for Cross-Cutting Concerns

Celery provides task signals that fire independently of whatever the task itself does. These are useful for building observability without modifying task code.

from celery.signals import task_failure, task_success, task_retry

@task_failure.connect
def handle_task_failure(sender=None, task_id=None, exception=None, **kwargs):
    logger.error(
        "FAILURE | task=%s id=%s exception=%s",
        sender.name if sender else 'unknown',
        task_id,
        repr(exception),
    )

@task_retry.connect
def handle_task_retry(sender=None, reason=None, **kwargs):
    logger.warning("RETRY | task=%s reason=%s", sender.name if sender else 'unknown', reason)

Wire these up in your Celery app's __init__.py or wherever you configure the application. They fire globally, so you get coverage on every task without touching task definitions.

Configuring a Result Backend to Confirm Outcomes

If you're running Celery without a result backend configured, you have no durable record of what happened. You're relying entirely on broker acknowledgment and worker logs. Adding a result backend gives you queryable task state.

# celeryconfig.py
result_backend = 'redis://localhost:6379/1'
result_expires = 3600  # keep results for 1 hour
task_track_started = True  # also record when a task transitions to STARTED

With this in place, you can inspect task state from your application code or a management command:

from celery.result import AsyncResult

result = AsyncResult(task_id)
print(result.state)   # PENDING, STARTED, SUCCESS, FAILURE, RETRY
print(result.result)  # return value or exception instance

A state of SUCCESS with result.result == None when you expected a value is a clear signal that a bare except is hiding something. A state of FAILURE gives you the traceback via result.traceback.
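That check is worth automating. Here's a small helper of my own devising (the name audit_result and its return strings are not Celery APIs) that classifies a result object; it only touches .state, .result, and .traceback, so it works with an AsyncResult or anything shaped like one:

```python
def audit_result(res, expect_value=True):
    """Classify a task result, flagging the suspicious SUCCESS-with-None case.

    `res` is any object exposing .state, .result, and .traceback,
    such as celery.result.AsyncResult.
    """
    if res.state == "FAILURE":
        # the backend stored the traceback for us
        return f"failed: {res.traceback}"
    if res.state == "SUCCESS" and expect_value and res.result is None:
        # the classic signature of a bare except swallowing an error
        return "suspicious: SUCCESS but returned None"
    return res.state.lower()
```

Running this from a management command over recent task IDs gives you a quick census of how many "successes" are actually silent failures.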

Soft and Hard Time Limits

A task that runs forever is a silent failure of a different kind. The worker is occupied, the job never finishes, and the caller gets no response. Set both soft and hard time limits.

from celery.exceptions import SoftTimeLimitExceeded

@app.task(
    bind=True,
    soft_time_limit=25,
    time_limit=30,
)
def generate_report(self, report_id):
    try:
        # expensive work here
        ...
    except SoftTimeLimitExceeded:
        logger.warning("Report %s timed out, cleaning up", report_id)
        # do any cleanup (close file handles, partial writes, etc.)
        raise  # let it propagate so the task is marked FAILURE

The soft limit gives you a chance to clean up gracefully. The hard limit kills the worker process if it's still running. Both are better than a hung worker blocking your queue for hours.

You can also set defaults globally to protect every task:

# celeryconfig.py
task_soft_time_limit = 60
task_time_limit = 120

Common Pitfalls

Acknowledging before the work is done

Celery's default acks_late=False means the broker removes the message from the queue the moment the worker picks it up, not when it finishes. If the worker crashes mid-task, the job is lost. Set acks_late=True on tasks where durability matters, but understand that this can cause duplicate execution on worker restart, so your task logic needs to be idempotent.

@app.task(bind=True, acks_late=True)
def charge_customer(self, customer_id, amount):
    ...

Ignoring the result when it matters

If you call task.delay() and never inspect the AsyncResult, you have no way to know it failed. For fire-and-forget tasks this is fine. For tasks whose output gates the next step in a workflow, always store the task ID and check the result.

Retrying without an exponential backoff

Retrying immediately when a downstream service is under load often makes the problem worse. Use the countdown parameter with increasing delays, or Celery's retry_backoff option (used together with autoretry_for), available in recent versions.

raise self.retry(exc=exc, countdown=2 ** self.request.retries)

Max retries set to None

If max_retries is set to None, a task will retry forever on a persistent error. (Celery's default is actually 3, but it's easy to set None while debugging and forget to revert it.) Your queue will quietly fill up with endlessly retrying tasks. Always set a finite max_retries and handle the MaxRetriesExceededError explicitly.

from celery.exceptions import MaxRetriesExceededError

@app.task(bind=True, max_retries=5)
def sync_to_external_api(self, record_id):
    try:
        api_client.sync(record_id)
    except APIError as exc:
        try:
            raise self.retry(exc=exc, countdown=30)
        except MaxRetriesExceededError:
            logger.critical("sync_to_external_api exhausted retries for %s", record_id)
            # mark the record as failed in your DB, alert someone
            raise

Testing for Silent Failures Locally

Use CELERY_TASK_ALWAYS_EAGER = True in your test configuration (the Django-style name; in a plain Celery config it's task_always_eager) to run tasks synchronously in the same process. This makes exceptions bubble up immediately, which is exactly what you want during development and in unit tests.

# settings.py (Django) or test config
CELERY_TASK_ALWAYS_EAGER = True
CELERY_TASK_EAGER_PROPAGATES = True  # re-raises exceptions instead of swallowing

Without CELERY_TASK_EAGER_PROPAGATES, even eager mode can hide exceptions. Set both.

Wrapping Up

Silent Celery failures are almost always the result of a combination of broad exception handling, missing result backends, and no time limits. Here's what to do next:

  1. Audit every except block in your task code and make sure exceptions propagate unless you're explicitly handling a retry.
  2. Add a result backend (Redis is the easiest choice) and set task_track_started = True so you have a queryable record of task state.
  3. Create a custom base task class or connect to the task_failure signal to push failures into your observability stack immediately.
  4. Set soft_time_limit and time_limit at the global level so no task can hang a worker indefinitely.
  5. Review your max_retries values and add exponential backoff so transient errors don't spiral into queue bloat.

Once you have these pieces in place, silent failures become loud ones β€” and loud failures are problems you can actually fix.

πŸ“€ Share this article

Sign in to save

Comments (0)

No comments yet. Be the first!

Leave a Comment

Sign in to comment with your profile.

πŸ“¬ Weekly Newsletter

Stay ahead of the curve

Get the best programming tutorials, data analytics tips, and tool reviews delivered to your inbox every week.

No spam. Unsubscribe anytime.