Why do Celery tasks disappear without any error when a worker crashes?

By default, Celery acknowledges a message as soon as a worker picks it up, before the task finishes. If the worker crashes mid-execution, the broker already considers the message delivered and discards it. Setting acks_late=True makes the worker acknowledge only after the task completes, and reject_on_worker_lost=True makes it requeue the message if the worker process dies, so a crash causes redelivery instead of loss.

What visibility timeout should I set for Celery with a Redis broker?

Your visibility timeout must be longer than the maximum time any task in your queue can run, and longer than the longest countdown or ETA you use for retries. The Redis transport defaults to 3600 seconds (one hour), which is a reasonable starting point; if your tasks can run, or be scheduled, further out than that, increase it accordingly. Set it explicitly in broker_transport_options, as shown in Celery's documentation, rather than relying on the default.

How do I stop Celery Beat from flooding the queue with stale tasks after downtime?

Add an expires option to each Beat schedule entry, set to a value slightly less than the schedule interval. This tells Celery to discard a task if it has not started by the time the next instance is due, preventing a backlog of stale executions from all firing at once after a worker outage.

Can ChatGPT reliably generate production-ready Celery configurations?

It can, but you have to be explicit about your operational requirements in the prompt. ChatGPT defaults to tutorial-style code that works on the happy path. By specifying retry policy, acks_late, visibility timeout, and idempotency requirements upfront, and asking it to flag infrastructure-dependent values, you get substantially more accurate output.

How do I make a Celery task idempotent when it involves sending an external request?

Track execution state in a persistent store before performing the external action. A simple approach is a database log table with a unique constraint on task type and entity ID — check for an existing record at the start of the task and skip the work if one is found. This guards against duplicate execution caused by retries or broker redelivery.

ChatGPT Celery Task Configs Without Silent Failures

You ask ChatGPT to set up a Celery task for sending emails or syncing data, it hands you something that looks correct, you wire it up, and tasks just… disappear. No exception, no log entry, nothing in the dead-letter queue. Silent failures in Celery are uniquely nasty because the default configuration is tuned for convenience, not production resilience.

ChatGPT can produce working Celery boilerplate quickly, but it consistently misses the settings that matter most: acks_late, visibility timeouts, retry backoff, and idempotency guards. This article shows you exactly how to prompt it so you get configs that survive a crashed worker or a slow broker.

What You'll Learn

Why default ChatGPT Celery output creates silent failure conditions
A prompt structure that forces the model to address retry, acknowledgment, and idempotency together
How to configure acks_late, visibility timeout, and max retries correctly
How to get accurate Beat schedules without timezone drift
The gotchas ChatGPT routinely skips, and how to catch them in review

Prerequisites

You should have a working Python project with Celery installed (version 5.x) and a basic understanding of how task queues work. The examples use Redis as the broker, but the concepts apply to RabbitMQ and Amazon SQS as well. You'll be iterating with ChatGPT in a chat interface, so have a conversation window open alongside your editor.

Why Celery Configs Go Wrong With AI Assistance

ChatGPT's training data contains a huge number of Celery tutorials, and most of them are aimed at getting something running quickly. That means the model defaults to the happy path: a task fires, it succeeds, done. It rarely accounts for what happens when a worker crashes mid-execution or when a task is retried after an unacknowledged message sits on the queue past its visibility window.

The three failure modes that appear most often in AI-generated Celery configs are:

Tasks acknowledged before execution completes — the broker marks the message as delivered the moment the worker picks it up, so a crash loses the job entirely.
Unbounded or missing retry logic — either no retry at all, or retries with no backoff that hammer a failing downstream service.
Visibility timeout shorter than the task runtime — the broker requeues the message while the worker is still processing it, causing duplicate execution.

These are not edge cases. They surface under normal load the first time a worker process gets killed by a deployment or an OOM event. The good news is that once you know what to ask for, ChatGPT can generate correct configs consistently.

How ChatGPT Typically Approaches Celery Task Config

If you send a plain prompt like "Write a Celery task to send a welcome email", you'll usually get something like this:

from celery import shared_task

@shared_task
def send_welcome_email(user_id):
    user = User.objects.get(id=user_id)
    send_email(user.email, subject="Welcome!", body="...")

That decorator has no retry policy, no acknowledgment setting, no timeout. It will silently drop tasks if the worker dies. The model isn't wrong to produce this — it's answering the surface-level question. Your job is to ask the deeper one.

The Core Prompt Pattern That Changes Everything

The key insight is to front-load your constraints. Rather than describing what the task should do and hoping the model adds safety settings, you specify the operational requirements explicitly and ask the model to justify each setting it chooses.

Here's a prompt template that consistently produces production-grade output:

I need a Celery task configuration in Python for the following job:
[describe the task]

Requirements:
- Broker: Redis (or RabbitMQ / SQS — specify yours)
- The task must survive a worker crash without message loss
- Retry on failure with exponential backoff, max 5 attempts
- Execution must be idempotent (safe to run more than once)
- Include acks_late and reject_on_worker_lost settings
- Set an explicit task_soft_time_limit and task_time_limit
- Add a brief comment next to each config setting explaining why it exists

After the task code, provide the relevant Celery app settings I need to add.
Flag any setting where the correct value depends on my infrastructure (e.g., visibility timeout).

The "flag any setting where the value depends on infrastructure" instruction is particularly effective. It stops the model from hard-coding numbers that look authoritative but are actually wrong for your setup.
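If you reuse the template often, it can help to generate it from a few parameters so that no requirement is dropped by accident. The following is a minimal sketch; the function name build_celery_prompt and its parameters are hypothetical and not part of any library.

```python
def build_celery_prompt(task_description: str, broker: str = "Redis", max_attempts: int = 5) -> str:
    """Assemble the constraint-first prompt from the template above."""
    requirements = [
        f"Broker: {broker}",
        "The task must survive a worker crash without message loss",
        f"Retry on failure with exponential backoff, max {max_attempts} attempts",
        "Execution must be idempotent (safe to run more than once)",
        "Include acks_late and reject_on_worker_lost settings",
        "Set an explicit task_soft_time_limit and task_time_limit",
        "Add a brief comment next to each config setting explaining why it exists",
    ]
    lines = [
        "I need a Celery task configuration in Python for the following job:",
        task_description,
        "",
        "Requirements:",
        *[f"- {item}" for item in requirements],
        "",
        "After the task code, provide the relevant Celery app settings I need to add.",
        "Flag any setting where the correct value depends on my infrastructure "
        "(e.g., visibility timeout).",
    ]
    return "\n".join(lines)


prompt = build_celery_prompt("Send a welcome email to a new user", broker="RabbitMQ")
```

Printing prompt gives text you can paste straight into the chat window.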

This approach mirrors the prompting discipline discussed in the guide on getting ChatGPT to write accurate gRPC service definitions — giving the model explicit correctness criteria rather than relying on its defaults.

Configuring Retry Logic Correctly

A retry decorator without backoff is almost as dangerous as no retry at all. If a downstream API is down and you retry immediately, you're just adding load to an already-stressed system. Use exponential backoff with jitter.

from celery import shared_task
from celery.exceptions import SoftTimeLimitExceeded
from celery.utils.log import get_task_logger
import random

logger = get_task_logger(__name__)

@shared_task(
    bind=True,
    acks_late=True,                  # Acknowledge only after successful execution
    reject_on_worker_lost=True,      # Requeue if the worker dies mid-task
    max_retries=5,
    soft_time_limit=25,              # Raises SoftTimeLimitExceeded — lets you clean up
    time_limit=30,                   # Hard kill after 30s
    default_retry_delay=5,           # Fallback delay; the explicit countdown below overrides it
)
def send_welcome_email(self, user_id: int) -> None:
    try:
        user = User.objects.get(id=user_id)
        send_email(user.email)
    except SoftTimeLimitExceeded:
        logger.warning("Task timed out for user_id=%s", user_id)
        raise
    except Exception as exc:
        # Exponential backoff with jitter
        delay = (2 ** self.request.retries) + random.uniform(0, 1)
        logger.error("Retrying send_welcome_email in %.1fs: %s", delay, exc)
        raise self.retry(exc=exc, countdown=delay)

Notice bind=True — this gives you access to self, which exposes self.request.retries for calculating the backoff. Without bind=True, you can't inspect the current retry count inside the task, and ChatGPT sometimes omits it when generating retry logic.
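To see what the backoff formula produces, it helps to pull it into a small pure function. This is a sketch under assumptions: the helper name backoff_delay, the cap parameter, and the injectable jitter function are illustrative additions, not part of Celery.

```python
import random


def backoff_delay(retries: int, base: float = 1.0, cap: float = 300.0, jitter=random.random) -> float:
    """Seconds to wait before retry number `retries` (0-based).

    Doubles the delay on each retry, caps the exponential part at `cap`,
    then adds up to one second of jitter so workers do not retry in lockstep.
    """
    exponential = min(cap, base * (2 ** retries))
    return exponential + jitter()


# Delays for the five retries allowed above, with jitter disabled for readability.
schedule = [backoff_delay(n, jitter=lambda: 0.0) for n in range(5)]
```

Inside the task you would pass countdown=backoff_delay(self.request.retries) to self.retry(). Celery also offers declarative task options that cover similar ground (autoretry_for, retry_backoff, retry_backoff_max, retry_jitter); check the documentation for your version before relying on their exact behaviour.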

Making Tasks Idempotent

With acks_late=True and retries enabled, a task can run more than once. That's fine only if the task is idempotent. For a "send email" task, the usual approach is a database flag:

@shared_task(bind=True, acks_late=True, reject_on_worker_lost=True, max_retries=5)
def send_welcome_email(self, user_id: int) -> None:
    sent = EmailLog.objects.filter(user_id=user_id, type="welcome").exists()
    if sent:
        logger.info("Welcome email already sent for user_id=%s, skipping.", user_id)
        return

    user = User.objects.get(id=user_id)
    send_email(user.email)
    EmailLog.objects.create(user_id=user_id, type="welcome")

Ask ChatGPT explicitly: "Make this task idempotent using a database check, and explain the race condition risk." A good response will note that filter().exists() followed by create() is not atomic, so two workers can both pass the check before either writes the log row. It should suggest closing the gap with a database-level unique constraint, typically combined with get_or_create.
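The unique-constraint idea can be sketched without Django. The example below uses the standard library's sqlite3 module in place of the ORM, and the table layout and helper names (make_store, claim, release, send_once) are hypothetical. It lets the database decide which run wins by inserting the log row first and treating a constraint violation as "already handled".

```python
import sqlite3


def make_store() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE email_log ("
        " user_id INTEGER NOT NULL,"
        " type TEXT NOT NULL,"
        " UNIQUE (user_id, type))"
    )
    return conn


def claim(conn: sqlite3.Connection, user_id: int, email_type: str) -> bool:
    """Return True if this call inserted the row, False if it already existed."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO email_log (user_id, type) VALUES (?, ?)",
                (user_id, email_type),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def release(conn: sqlite3.Connection, user_id: int, email_type: str) -> None:
    with conn:
        conn.execute(
            "DELETE FROM email_log WHERE user_id = ? AND type = ?",
            (user_id, email_type),
        )


def send_once(conn: sqlite3.Connection, user_id: int, send) -> bool:
    """Send the welcome email at most once per user; returns True if sent."""
    if not claim(conn, user_id, "welcome"):
        return False  # another run already claimed (or sent) it
    try:
        send(user_id)
    except Exception:
        release(conn, user_id, "welcome")  # give a later retry a chance
        raise
    return True
```

Note the trade-off: claiming before sending means a worker that dies between the insert and the send leaves the email unsent, whereas the check-send-log order above can send twice. Which side to err on depends on the task.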

Getting acks_late and Visibility Timeout Right

The interaction between acks_late and your broker's visibility timeout is where most production failures hide. Here's the relationship: Redis has no native message acknowledgment, so Celery's Redis transport emulates it. With acks_late=True, a delivered message stays unacknowledged until the task finishes. If the task takes longer than the visibility timeout (set through Celery's broker_transport_options), the transport assumes the worker is gone and redelivers the message to another worker, causing duplicate execution even with acks_late.

# In your celery.py or settings file
app.conf.update(
    broker_url="redis://localhost:6379/0",
    result_backend="redis://localhost:6379/1",

    # Visibility timeout must be longer than your longest task and longer
    # than the longest countdown/ETA you schedule. The Redis transport
    # default is 3600s (1 hour); raise it if tasks can exceed that.
    broker_transport_options={
        "visibility_timeout": 3600,  # Adjust based on your max task duration
    },

    task_acks_late=True,
    task_reject_on_worker_lost=True,

    # Prevent tasks from running indefinitely
    task_soft_time_limit=60,
    task_time_limit=120,

    # Serialize safely
    task_serializer="json",
    result_serializer="json",
    accept_content=["json"],
)

When you prompt ChatGPT, include: "My tasks can run up to X minutes. Set the visibility timeout appropriately and explain the relationship between that setting and acks_late." This forces the model to reason about the interaction rather than copy a tutorial value.
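The relationship between these numbers is simple enough to check mechanically. Here is a minimal sketch of such a check; the function name visibility_timeout_problems and its arguments are hypothetical, and the values are plain seconds that you would read from your own settings.

```python
from typing import List


def visibility_timeout_problems(
    visibility_timeout: int,
    task_time_limit: int,
    longest_countdown: int = 0,
) -> List[str]:
    """Return human-readable problems; an empty list means the values are consistent."""
    problems = []
    if visibility_timeout <= task_time_limit:
        problems.append(
            f"visibility_timeout ({visibility_timeout}s) must exceed "
            f"task_time_limit ({task_time_limit}s) or running tasks may be redelivered"
        )
    if visibility_timeout <= longest_countdown:
        problems.append(
            f"visibility_timeout ({visibility_timeout}s) must exceed the longest "
            f"countdown/ETA ({longest_countdown}s) or delayed tasks may run twice"
        )
    return problems


# The settings shown above: 3600s visibility timeout, 120s hard time limit.
issues = visibility_timeout_problems(3600, 120)
```

A check like this could run in a unit test or at worker startup, so a later change to either setting fails loudly instead of silently reintroducing duplicates.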

This kind of infrastructure-aware prompting is similar to what's needed when getting ChatGPT to write accurate async code without race condition blind spots — the model needs to reason about timing, not just syntax.

Handling Beat Schedules Without Drift

Celery Beat generates periodic tasks. ChatGPT usually gets the schedule syntax right, but it consistently forgets two things: setting the timezone explicitly, and using USE_TZ=True in Django projects to prevent DST-related drift.

from celery.schedules import crontab

app.conf.beat_schedule = {
    "sync-inventory-every-hour": {
        "task": "myapp.tasks.sync_inventory",
        "schedule": crontab(minute=0),  # Top of every hour
        "options": {
            "queue": "periodic",
            "expires": 3500,  # Drop the task if it hasn't started before the next cycle
        },
    },
}

# Set these explicitly. Celery itself defaults to UTC, but an explicit value keeps
# the schedule independent of defaults and of Django's TIME_ZONE setting
app.conf.timezone = "UTC"
app.conf.enable_utc = True

The expires option in the task options is something ChatGPT almost never includes unprompted. If a Beat task is delayed (the worker was down) and then the backlog catches up, without expires you can get a flood of stale executions all firing at once. Add "Include an expiry so stale periodic tasks are dropped rather than queued" to your prompt.
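Deriving expires from the interval, rather than typing both numbers by hand, keeps the two from drifting apart. The sketch below is one way to do that; the helper periodic_entry and its margin_seconds parameter are hypothetical. It uses a timedelta interval instead of crontab so that it runs without Celery installed, which means it fires relative to Beat's start time rather than at the top of the hour.

```python
from datetime import timedelta


def periodic_entry(task: str, every: timedelta, queue: str = "periodic", margin_seconds: int = 100) -> dict:
    """Build a beat_schedule entry whose expires is slightly shorter than its interval."""
    interval = int(every.total_seconds())
    if margin_seconds >= interval:
        raise ValueError("margin_seconds must be smaller than the schedule interval")
    return {
        "task": task,
        "schedule": every,
        "options": {
            "queue": queue,
            "expires": interval - margin_seconds,  # seconds until the message is dropped
        },
    }


entry = periodic_entry("myapp.tasks.sync_inventory", timedelta(hours=1))
```

With the defaults, an hourly entry gets expires set to 3500 seconds, matching the hand-written value above.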

Prompting for Beat Schedule Validation

Ask ChatGPT to generate a small test that asserts the schedule is registered correctly:

from django.test import TestCase
from myproject.celery import app

class BeatScheduleTest(TestCase):
    def test_inventory_sync_is_scheduled(self):
        schedules = app.conf.beat_schedule
        self.assertIn("sync-inventory-every-hour", schedules)
        task = schedules["sync-inventory-every-hour"]
        self.assertEqual(task["task"], "myapp.tasks.sync_inventory")
        self.assertIn("expires", task.get("options", {}))

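This won't catch runtime failures, but it catches renamed schedule entries and missing configuration keys before you hit production. To catch a typo in the task path itself, also compare the configured name against the tasks the app has actually registered.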
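The same idea can be generalized so that every entry is checked, not just one. The following sketch is written against plain dictionaries; the function name beat_schedule_problems is hypothetical. In a real project the registered names could come from the Celery app's task registry (app.tasks), assuming your task modules have been imported by the time the check runs.

```python
from typing import Dict, Iterable, List


def beat_schedule_problems(schedule: Dict[str, dict], registered_tasks: Iterable[str]) -> List[str]:
    """List entries that point at unknown tasks or lack an expires option."""
    known = set(registered_tasks)
    problems = []
    for name, entry in schedule.items():
        task = entry.get("task")
        if task not in known:
            problems.append(f"{name}: task {task!r} is not registered")
        if "expires" not in entry.get("options", {}):
            problems.append(f"{name}: no expires option")
    return problems


example_schedule = {
    "sync-inventory-every-hour": {
        "task": "myapp.tasks.sync_inventory",
        "schedule": 3600,
        "options": {"queue": "periodic", "expires": 3500},
    },
    "nightly-report": {
        "task": "myapp.tasks.nightly_reprot",  # deliberate typo for the demo
        "schedule": 86400,
    },
}
found = beat_schedule_problems(example_schedule, ["myapp.tasks.sync_inventory", "myapp.tasks.nightly_report"])
```

Here found contains two messages, both for nightly-report: one for the misspelled task path and one for the missing expires.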

Common Pitfalls ChatGPT Misses

Even with a well-structured prompt, a few failure patterns show up repeatedly in AI-generated Celery code. Audit every output for these before merging:

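Missing bind=True on retry tasks. Without it, Celery does not pass the task instance as the first argument, so a function that declares self fails with a TypeError before it ever reaches self.retry(). If the generated code uses self anywhere, bind=True must be in the decorator.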
Hardcoded queue names without a corresponding worker routing config. If the task specifies queue="emails" but no worker is consuming that queue, tasks pile up silently.
Using mutable default arguments in task signatures. def process(items=[]) is a Python footgun: the default list is created once and shared by every call in the same worker process, so state leaks from one task run into the next. Always use None as the default and initialize inside the function.
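Chord and group callbacks without error handling. ChatGPT-generated chord() calls usually don't show what happens if one of the group tasks fails. By default the callback does not run at all; the failure is recorded on the chord's result as a ChordError, which is easy to miss unless you attach an error callback.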
No dead-letter queue configuration. After max retries are exhausted, the task is marked as failed and its message is not requeued, so the work is effectively dropped. Ask explicitly: "Where do messages go after max retries? Add configuration to route them to a dead-letter queue."
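The mutable-default pitfall in the list above is easy to reproduce in plain Python, which makes it a useful thing to show a reviewer. The function names here are illustrative only.

```python
def process_shared(item, items=[]):
    # BUG: the default list is created once, when the function is defined,
    # and reused by every call that omits `items`.
    items.append(item)
    return items


def process_fresh(item, items=None):
    # Fix: create a new list per call when none is supplied.
    if items is None:
        items = []
    items.append(item)
    return items


first = process_shared("a")
second = process_shared("b")  # same list object as `first`, now holding both items

clean_first = process_fresh("a")
clean_second = process_fresh("b")  # independent list
```

In a long-lived worker process, the shared list keeps growing across task invocations, which is why the bug tends to appear only after the worker has been up for a while.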

The pattern of silent failures here mirrors what you'll encounter in other infrastructure configs — the guide on getting ChatGPT to write accurate logging middleware without swallowing errors covers a similar class of problem where the error disappears rather than surfaces.

A Quick Review Checklist

Before committing any ChatGPT-generated Celery config, run through this list:

Does every retrying task have bind=True and acks_late=True?
Is the visibility timeout longer than task_time_limit and longer than the longest retry countdown or ETA?
Is the task idempotent, or is it safe to run exactly once (and why)?
Are all queues referenced in task options also configured in worker routing?
Does the Beat schedule set timezone and use expires on long-interval tasks?
Is there a dead-letter strategy after retries are exhausted?

Paste this checklist directly into your next ChatGPT prompt as a section titled "Before giving me the code, verify your output against this checklist and note any gaps." The model will often catch its own omissions when given explicit criteria to check against. This technique also works well when prompting ChatGPT for accurate webhook handlers, where missing edge cases are similarly invisible until production.

Wrapping Up: Next Steps

ChatGPT can write solid Celery configurations, but only if you give it the operational context it needs to make the right tradeoffs. Here are five concrete actions to take from here:

Run your existing Celery tasks through the review checklist above. Look specifically for tasks that retry without bind=True and periodic tasks missing expires.
Add acks_late=True and reject_on_worker_lost=True to any task that cannot afford to be silently dropped. Confirm your visibility timeout is set higher than your longest expected task runtime.
Prompt ChatGPT with explicit constraints every time. Use the template from the "Core Prompt Pattern" section and paste in the review checklist as a verification step.
Introduce a dead-letter queue. Whether you're on Redis, RabbitMQ, or SQS, tasks that exhaust retries should land somewhere observable, not disappear.
Write at least one integration test per task that exercises the retry path by mocking the downstream service to fail. Silent failures are only silent if you're not watching the right signals.
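For the dead-letter step, "somewhere observable" can be as modest as a table of failed tasks. The sketch below is an assumption-laden illustration, not a Celery feature: it uses sqlite3 from the standard library, and the names open_dead_letter_store, record_dead_letter, and pending_dead_letters are hypothetical. One place such a function could be called from is the on_failure handler of a custom task base class; verify that wiring against the documentation for your Celery version.

```python
import json
import sqlite3


def open_dead_letter_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dead_letters ("
        " task_id TEXT PRIMARY KEY,"
        " task_name TEXT NOT NULL,"
        " payload TEXT NOT NULL,"
        " error TEXT NOT NULL)"
    )
    return conn


def record_dead_letter(conn, task_name, task_id, args, kwargs, exc) -> None:
    """Persist enough information to inspect or replay a task that gave up."""
    payload = json.dumps({"args": list(args), "kwargs": kwargs})
    with conn:  # commit on success, roll back on error
        conn.execute(
            "INSERT OR REPLACE INTO dead_letters VALUES (?, ?, ?, ?)",
            (task_id, task_name, payload, repr(exc)),
        )


def pending_dead_letters(conn):
    return conn.execute(
        "SELECT task_id, task_name, payload, error FROM dead_letters ORDER BY task_id"
    ).fetchall()


store = open_dead_letter_store()
record_dead_letter(store, "myapp.tasks.send_welcome_email", "abc-123", (42,), {}, TimeoutError("smtp"))
rows = pending_dead_letters(store)
```

A table like this gives you something to alert on and a payload to replay once the downstream service recovers.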

