Getting ChatGPT to Write Accurate Feature Flag Logic Without Stale State Bugs

July 01, 2026

You ask ChatGPT to wire up feature flag logic for your service, and it produces something that looks reasonable at a glance. Then you deploy it and discover the flag is being read once at startup, cached forever, and your "gradual rollout" is actually an all-or-nothing switch that ignores every change you make in the dashboard. Stale state bugs in feature flag code are subtle, runtime-silent, and exactly the kind of thing a language model trained on static examples will miss.

The fix is not to stop using ChatGPT for this work. It is to give it enough context that it cannot produce the wrong pattern.

What You'll Learn

  • Why ChatGPT's default feature flag output almost always introduces stale state
  • Four prompt patterns that force accurate, production-ready flag evaluation
  • How to ask for runtime refresh logic without a service restart
  • How to make ChatGPT emit stale-read detection so bugs surface in logs, not in incidents
  • The most common gotchas in AI-generated flag logic and how to spot them in review

Prerequisites

You should be comfortable reading Python (the examples below use Python, but the prompt patterns translate to any language). You do not need to be using a specific feature flag platform: the concepts apply whether you are reading flags from LaunchDarkly, a homegrown Redis store, a database table, or a YAML file baked into your repo.

How ChatGPT Thinks About Feature Flags (and Where It Goes Wrong)

ChatGPT has seen a lot of feature flag code in its training data. Most of that code is tutorial-grade: a dictionary of booleans, a simple is_enabled(flag_name) helper, and a check in the business logic. That pattern is fine for a blog post. It is dangerous in production.

The three failure modes that appear most often in AI-generated flag code are:

  • Read-once initialization. The flags are loaded at process startup and stored in a module-level variable. Changes to the flag store require a service restart to take effect.
  • No TTL on the in-process cache. The model adds a cache to avoid hammering the flag store on every request, but forgets to set an expiry. The cache grows stale and never refreshes.
  • Context-free evaluation. The flag check receives no user, tenant, or request context, so percentage rollouts and targeted segments are impossible to implement correctly later.

These are not hypothetical. If you paste a bare prompt like "write feature flag logic for my Flask app" into ChatGPT right now, you will almost certainly get at least one of the three. Understanding why helps you write prompts that rule them out. This is the same class of problem that affects other stateful backend patterns; see how similar issues appear in AI-generated Redis caching logic and cache stampedes for a useful comparison.
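
To make the read-once failure mode concrete, here is a minimal sketch of the pattern and its symptom. The FLAG_STORE dictionary is a hypothetical stand-in for whatever backs your flags, and both helper functions are illustrative names rather than output from any particular prompt.

```python
# Hypothetical stand-in for an external flag store (Redis, a database, etc.).
FLAG_STORE: dict[str, bool] = {"new_checkout": False}

# Anti-pattern: a snapshot taken once, as would happen at module import.
_FLAGS_AT_STARTUP = dict(FLAG_STORE)


def is_enabled_stale(flag_name: str) -> bool:
    """Reads the startup snapshot, so later store changes are never seen."""
    return _FLAGS_AT_STARTUP.get(flag_name, False)


def is_enabled_fresh(flag_name: str) -> bool:
    """Reads the store on every call (no cache at all, shown for contrast)."""
    return FLAG_STORE.get(flag_name, False)


# Someone flips the flag in the dashboard after the process has started.
FLAG_STORE["new_checkout"] = True

stale_result = is_enabled_stale("new_checkout")  # still False
fresh_result = is_enabled_fresh("new_checkout")  # True
```

Neither extreme is what you want in production: the first never refreshes, the second hits the store on every call. The prompt patterns below aim for the middle ground of a bounded-age cache.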

Prompt Pattern 1: Require an Explicit Data Source and TTL

The first thing to nail down is where the flags come from and how long a cached value is allowed to live. If you do not specify this, ChatGPT will pick whatever the simplest option is.

A prompt that works:

Write a Python feature flag client that reads flag values from Redis. The client must cache each flag's value in memory with a TTL of 30 seconds. After the TTL expires, the next call must re-fetch from Redis. Use time.monotonic() for the TTL check, not wall-clock time. Include type hints. Do not read flags at module import time.

That prompt removes every ambiguity that leads to a stale-forever cache. The output you should expect:

import time
import redis
from typing import Any

class FeatureFlagClient:
    def __init__(self, redis_client: redis.Redis, ttl_seconds: float = 30.0) -> None:
        self._redis = redis_client
        self._ttl = ttl_seconds
        self._cache: dict[str, tuple[Any, float]] = {}  # flag -> (value, fetched_at)

    def is_enabled(self, flag_name: str) -> bool:
        value, fetched_at = self._cache.get(flag_name, (None, 0.0))
        if value is None or (time.monotonic() - fetched_at) > self._ttl:
            raw = self._redis.get(flag_name)
            value = raw is not None and raw.decode() == "1"
            self._cache[flag_name] = (value, time.monotonic())
        return bool(value)

Notice the guard: value is None or (time.monotonic() - fetched_at) > self._ttl. That is the line ChatGPT skips when you leave TTL unspecified. Always verify it is present in whatever the model generates.
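
One way to check the guard without waiting 30 seconds is to drive a client with a fake clock and a stub store. The sketch below is a simplified variant of the client above, not ChatGPT's literal output; FakeRedis, TtlFlagCache, and the clock parameter are hypothetical names introduced for illustration.

```python
from typing import Callable, Optional


class FakeRedis:
    """Hypothetical in-memory stub exposing only get(), counting calls."""

    def __init__(self) -> None:
        self.data: dict[str, bytes] = {}
        self.get_calls = 0

    def get(self, key: str) -> Optional[bytes]:
        self.get_calls += 1
        return self.data.get(key)


class TtlFlagCache:
    def __init__(self, store: FakeRedis, ttl_seconds: float,
                 clock: Callable[[], float]) -> None:
        self._store = store
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so a test can advance time
        self._cache: dict[str, tuple[bool, float]] = {}

    def is_enabled(self, flag_name: str) -> bool:
        entry = self._cache.get(flag_name)
        now = self._clock()
        if entry is None or (now - entry[1]) > self._ttl:
            raw = self._store.get(flag_name)
            entry = (raw is not None and raw.decode() == "1", now)
            self._cache[flag_name] = entry
        return entry[0]


now = [1000.0]  # mutable fake clock, in seconds
store = FakeRedis()
store.data["beta"] = b"0"
flags = TtlFlagCache(store, ttl_seconds=30.0, clock=lambda: now[0])

first = flags.is_enabled("beta")       # cache miss: fetches from the store
store.data["beta"] = b"1"              # flag flipped in the store
now[0] += 10.0
within_ttl = flags.is_enabled("beta")  # 10 s old: served from cache
now[0] += 25.0
after_ttl = flags.is_enabled("beta")   # 35 s old: re-fetched
```

If the generated client has no way to inject a clock, asking for one in the prompt is a cheap way to make the TTL behavior testable.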

Prompt Pattern 2: Force Context-Aware Evaluation

A feature flag that cannot accept a user or tenant identifier is not really a feature flag; it is a global switch. Percentage rollouts, beta cohorts, and kill-switches scoped to a single customer all require context at evaluation time.

Extend your prompt with an explicit signature requirement:

The is_enabled method must accept a context parameter of type dict[str, str] that carries keys like user_id, tenant_id, and region. For now, implement simple percentage rollout using user_id: hash the flag name and user_id together and return True if the result modulo 100 is less than the rollout percentage stored in Redis alongside the boolean. Show the Redis data structure you expect.

This forces the model to think about the data contract up front. A representative output:

import hashlib
import json
import time
import redis

class FeatureFlagClient:
    """Redis schema per flag: {"enabled": true, "rollout_pct": 20}"""

    def __init__(self, redis_client: redis.Redis, ttl_seconds: float = 30.0) -> None:
        self._redis = redis_client
        self._ttl = ttl_seconds
        self._cache: dict[str, tuple[dict, float]] = {}

    def _fetch(self, flag_name: str) -> dict:
        cached_value, fetched_at = self._cache.get(flag_name, ({}, 0.0))
        if not cached_value or (time.monotonic() - fetched_at) > self._ttl:
            raw = self._redis.get(flag_name)
            cached_value = json.loads(raw) if raw else {"enabled": False, "rollout_pct": 0}
            self._cache[flag_name] = (cached_value, time.monotonic())
        return cached_value

    def is_enabled(self, flag_name: str, context: dict[str, str] | None = None) -> bool:
        config = self._fetch(flag_name)
        if not config.get("enabled"):
            return False
        rollout_pct = config.get("rollout_pct", 100)
        if rollout_pct >= 100:
            return True
        user_id = (context or {}).get("user_id", "")
        bucket = int(hashlib.md5(f"{flag_name}:{user_id}".encode()).hexdigest(), 16) % 100
        return bucket < rollout_pct

The hash-based bucketing is deterministic: the same user always lands in the same bucket, so their experience is consistent across requests. Note that a call without a user_id hashes the empty string, so all anonymous callers share one bucket. If you omit the context requirement from the prompt, ChatGPT often skips the hashing and gives you a random roll on every call, which is not a rollout; it is noise.
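
The bucketing step is easy to check in isolation. The sketch below pulls it out into two hypothetical helpers, bucket_for and in_rollout, and uses made-up user IDs; it mirrors the hashing in the client above but is not part of it.

```python
import hashlib


def bucket_for(flag_name: str, user_id: str) -> int:
    """Map a (flag, user) pair to a stable bucket in the range 0-99."""
    digest = hashlib.md5(f"{flag_name}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100


def in_rollout(flag_name: str, user_id: str, rollout_pct: int) -> bool:
    """True when the user's bucket falls below the rollout percentage."""
    return bucket_for(flag_name, user_id) < rollout_pct


# The same inputs always give the same bucket.
stable = bucket_for("new_checkout", "user-42") == bucket_for("new_checkout", "user-42")

# Across many hypothetical users, the enabled share should land near 20%.
users = [f"user-{i}" for i in range(10_000)]
share = sum(in_rollout("new_checkout", u, 20) for u in users) / len(users)
```

Because the comparison is bucket < rollout_pct, raising the percentage only adds users: anyone enabled at 20% stays enabled at 50%. Including the flag name in the hash means a given user does not land in the same bucket for every flag.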

Prompt Pattern 3: Demand Runtime Refresh Without Restart

TTL-based cache expiry handles most cases. But some flag changes are urgent: a kill-switch for a broken feature needs to take effect in seconds, not in 30-second cache windows. A background refresh thread or a pub/sub invalidation mechanism covers this gap.

Ask for it explicitly:

Add a background thread that subscribes to a Redis pub/sub channel named feature_flag_updates. When a message arrives with a flag name as its payload, evict that flag from the in-memory cache immediately so the next call forces a fresh read. The thread must be a daemon thread and must not block the main application from starting.

The key output to look for:

import json
import threading
import time
import redis

class FeatureFlagClient:
    def __init__(self, redis_client: redis.Redis, ttl_seconds: float = 30.0) -> None:
        self._redis = redis_client
        self._ttl = ttl_seconds
        self._cache: dict[str, tuple[dict, float]] = {}
        self._lock = threading.Lock()
        self._start_invalidation_listener()

    def _start_invalidation_listener(self) -> None:
        def listen() -> None:
            pubsub = self._redis.pubsub()
            pubsub.subscribe("feature_flag_updates")
            for message in pubsub.listen():
                if message["type"] == "message":
                    flag_name = message["data"].decode()
                    with self._lock:
                        self._cache.pop(flag_name, None)

        thread = threading.Thread(target=listen, daemon=True)
        thread.start()

    def _fetch(self, flag_name: str) -> dict:
        with self._lock:
            cached_value, fetched_at = self._cache.get(flag_name, ({}, 0.0))
        if not cached_value or (time.monotonic() - fetched_at) > self._ttl:
            raw = self._redis.get(flag_name)
            result = json.loads(raw) if raw else {"enabled": False, "rollout_pct": 0}
            with self._lock:
                self._cache[flag_name] = (result, time.monotonic())
            return result
        return cached_value

Two things to verify here: the thread is marked daemon=True (so it does not prevent process shutdown), and all cache reads and writes are protected by a lock. ChatGPT sometimes forgets the lock when a background thread is involved. This is the same category of race condition covered in prompting ChatGPT for background job schedulers without race conditions.
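
The listener only helps if whatever changes a flag also publishes to the channel. Here is a minimal sketch of that writer side, assuming the JSON schema from Pattern 2 and the channel name from the prompt. update_flag and the SupportsSetAndPublish protocol are hypothetical names; the client passed in is assumed to be a redis-py Redis instance, of which only set and publish are used.

```python
import json
from typing import Any, Protocol


class SupportsSetAndPublish(Protocol):
    """The two redis-py client methods this sketch relies on."""

    def set(self, name: str, value: str) -> Any: ...

    def publish(self, channel: str, message: str) -> Any: ...


CHANNEL = "feature_flag_updates"


def update_flag(client: SupportsSetAndPublish, flag_name: str,
                enabled: bool, rollout_pct: int) -> None:
    """Write the new config first, then tell listeners to evict the flag."""
    if not 0 <= rollout_pct <= 100:
        raise ValueError("rollout_pct must be an integer from 0 to 100")
    payload = json.dumps({"enabled": enabled, "rollout_pct": rollout_pct})
    client.set(flag_name, payload)
    # Publish after the write so an evicted cache re-reads the new value.
    client.publish(CHANNEL, flag_name)
```

Redis pub/sub is fire-and-forget: a subscriber that is disconnected when the message is published never receives it. That is why the TTL stays in place as a backstop rather than being replaced by invalidation.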

Prompt Pattern 4: Ask for Stale-Read Detection

Even with TTL and pub/sub invalidation in place, there will be moments when a flag value is stale: during the window between a Redis write and the invalidation message arriving, or for up to the full TTL if that message is missed. You want that window to be observable in your logs rather than invisible in production.

Add this to your prompt:

When the client serves a cached flag value, log a structured warning if the cache age exceeds 25 seconds (five seconds before the 30-second TTL). Include the flag name, cache age in seconds, and the current value in the log line. Use Python's standard logging module and emit JSON-compatible structured output.

The output should contain something like:

import json
import logging
import time

logger = logging.getLogger(__name__)

STALE_WARN_THRESHOLD = 25.0  # seconds

# Method of FeatureFlagClient from Pattern 3 (class body omitted here).
def _fetch(self, flag_name: str) -> dict:
    with self._lock:
        cached_value, fetched_at = self._cache.get(flag_name, ({}, 0.0))

    age = time.monotonic() - fetched_at if fetched_at else float("inf")

    if cached_value and age > STALE_WARN_THRESHOLD:
        logger.warning(
            "Feature flag cache nearing expiry",
            extra={"flag": flag_name, "cache_age_s": round(age, 2), "value": cached_value},
        )

    if not cached_value or age > self._ttl:
        raw = self._redis.get(flag_name)
        result = json.loads(raw) if raw else {"enabled": False, "rollout_pct": 0}
        with self._lock:
            self._cache[flag_name] = (result, time.monotonic())
        return result

    return cached_value

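This turns an invisible correctness window into a visible operational signal. Any read in the last five seconds of the TTL window triggers the warning, so expect a steady baseline. A rate well above that baseline is worth alerting on as a leading indicator that your Redis connection is slow or your TTL is misconfigured.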
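
One thing to check in the generated code: passing extra to the standard logging module attaches the fields to the log record, but the default formatter does not print them. A formatter along the following lines is one way to get JSON lines out. JsonFormatter is a hypothetical name and this is a sketch, not a replacement for a maintained structured-logging library.

```python
import json
import logging

# Attribute names every LogRecord carries; anything else came from `extra`.
_STANDARD_ATTRS = set(vars(logging.makeLogRecord({}))) | {"message", "asctime"}


class JsonFormatter(logging.Formatter):
    """Hypothetical formatter that emits one JSON object per log line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Copy the caller-supplied `extra` fields (flag, cache_age_s, value).
        for key, value in vars(record).items():
            if key not in _STANDARD_ATTRS:
                payload[key] = value
        # default=str keeps the formatter from raising on unusual values.
        return json.dumps(payload, default=str)
```

To use it, create a handler such as logging.StreamHandler(), call setFormatter(JsonFormatter()) on it, and add the handler to the logger that the flag client writes to.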

Common Pitfalls in AI-Generated Flag Logic

Even with good prompts, ChatGPT output needs a systematic review pass. These are the issues that slip through most often.

Default-to-True on Missing Flags

Some generated code treats a missing flag as enabled. The safe default is always disabled. Check what is_enabled returns when the flag does not exist in Redis: it must be False, not True, and not an exception.

Non-Atomic Cache Population

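If two concurrent requests both find a cache miss at the same time, both will hit Redis, both will write to the cache, and one write will silently overwrite the other. In CPython with the GIL the overwrite itself is usually harmless, since both writers store near-identical values, but the duplicate fetches add load on the flag store and grow with the number of threads or coroutines that miss together. Prompt for a threading.Lock (or an asyncio.Lock in async code) around the check-then-set sequence, then verify it wraps both the read check and the write. Keep in mind that in multi-process deployments each process holds its own cache, so an in-process lock only coordinates callers within one process.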
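
For reference, a lock that covers the whole check-then-set sequence might look like the sketch below. LockedFlagCache and its loader parameter are hypothetical; the loader stands in for the Redis read so the example stays self-contained.

```python
import threading
import time
from typing import Callable


class LockedFlagCache:
    """Hypothetical cache that holds one lock across the check and the write."""

    def __init__(self, loader: Callable[[str], dict],
                 ttl_seconds: float = 30.0) -> None:
        self._loader = loader  # e.g. a function that reads and parses Redis
        self._ttl = ttl_seconds
        self._lock = threading.Lock()
        self._cache: dict[str, tuple[dict, float]] = {}

    def get(self, flag_name: str) -> dict:
        with self._lock:
            entry = self._cache.get(flag_name)
            if entry is not None and (time.monotonic() - entry[1]) <= self._ttl:
                return entry[0]
            # Still holding the lock: concurrent callers wait here instead
            # of issuing their own duplicate fetch.
            value = self._loader(flag_name)
            self._cache[flag_name] = (value, time.monotonic())
            return value
```

The trade-off is that a single lock held during a network call serializes lookups for every flag while one fetch is in flight. A per-flag lock is a possible refinement if that becomes a bottleneck.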

Float Comparison for Rollout Percentage

Rollout percentages are sometimes stored as floats in Redis ("rollout_pct": 0.2 meaning 20%). Comparing a 0–99 bucket against a fraction like 0.2 with < lets only bucket 0 through, so a flag meant for 20% of users reaches roughly 1%. Converting on the fly works only if every reader and writer agrees on the representation, and mixed int and float handling invites boundary errors. Standardize on integers representing percentage points (0–100) in your prompt.

Flag Name Case Sensitivity

ChatGPT will use whatever case you show it in the prompt. If your flag store uses snake_case keys but your call sites use camelCase, lookups silently return the disabled default. Lowercasing alone fixes pure case mismatches (New_Checkout versus new_checkout) but not camelCase versus snake_case, so ask ChatGPT to normalize flag names to a single convention in the _fetch method as a defensive measure.
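
Since lowercasing by itself does not turn camelCase into snake_case, a normalization helper along these lines may be closer to what you want. normalize_flag_name is a hypothetical helper, and the rules it applies (underscore at each lower-to-upper boundary, hyphens treated as underscores) are assumptions about your naming convention.

```python
import re

# Zero-width match between a lowercase letter or digit and an uppercase letter.
_CAMEL_BOUNDARY = re.compile(r"(?<=[a-z0-9])(?=[A-Z])")


def normalize_flag_name(name: str) -> str:
    """Convert camelCase or mixed-case names to lowercase snake_case."""
    with_underscores = _CAMEL_BOUNDARY.sub("_", name.strip())
    return with_underscores.replace("-", "_").lower()


examples = {
    raw: normalize_flag_name(raw)
    for raw in ("newCheckout", "New_Checkout", "new-checkout", "new_checkout")
}
```

All four example spellings map to new_checkout. Apply the same helper on the write path too, otherwise keys stored under another spelling become unreachable.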

No Fallback When Redis Is Down

The model usually wraps the Redis call in nothing at all. If Redis is unavailable, an exception propagates up and takes the request with it. Ask explicitly: "If the Redis call raises any exception, return the last cached value if one exists, otherwise return the safe default and log the error." This is a circuit-breaker concern closely related to what you would see in prompting ChatGPT for retry logic without infinite loop traps.
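
A sketch of what that fallback can look like is below. ResilientFlagCache, SAFE_DEFAULT, and the fetch method are hypothetical names; the Redis client is assumed to expose get(key), and the lock from Pattern 3 is left out to keep the example short.

```python
import json
import logging
import time
from typing import Any

logger = logging.getLogger(__name__)
SAFE_DEFAULT = {"enabled": False, "rollout_pct": 0}


class ResilientFlagCache:
    """Hypothetical client: serves the last known value when the store fails."""

    def __init__(self, redis_client: Any, ttl_seconds: float = 30.0) -> None:
        self._redis = redis_client  # assumed to expose get(key)
        self._ttl = ttl_seconds
        self._cache: dict[str, tuple[dict, float]] = {}

    def fetch(self, flag_name: str) -> dict:
        entry = self._cache.get(flag_name)
        if entry is not None and (time.monotonic() - entry[1]) <= self._ttl:
            return entry[0]
        try:
            raw = self._redis.get(flag_name)
            value = json.loads(raw) if raw else dict(SAFE_DEFAULT)
        except Exception:
            logger.exception("Flag store read failed for %s", flag_name)
            # Prefer the last cached value, even if expired; else fail closed.
            return entry[0] if entry is not None else dict(SAFE_DEFAULT)
        self._cache[flag_name] = (value, time.monotonic())
        return value
```

Note that the try block also covers json.loads, so a malformed payload is treated like an outage. Whether serving an expired value is acceptable depends on the flag: for a kill-switch you may prefer to fail closed instead.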

Ignoring Concurrent Flag Updates in Tests

Generated unit tests almost never test what happens when a flag changes between two calls in the same request lifetime. Add a test that manually evicts the cache between two is_enabled calls and asserts the second call picks up the new value. If you are using ChatGPT to write the tests too, specify this case explicitly; the model will not infer it. For related guidance on keeping AI-generated tests reliable, the techniques in avoiding flaky selectors in ChatGPT-generated Playwright scripts apply the same verification mindset.
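
The test described above could be sketched as follows. StubRedis and MiniFlagClient are hypothetical, cut-down stand-ins so the example runs without a Redis server; in a real suite you would point the same test at your actual client, which may need a public eviction method added.

```python
import json
import time
import unittest
from typing import Optional


class StubRedis:
    """Hypothetical stub with the single get() method the client needs."""

    def __init__(self) -> None:
        self.data: dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self.data.get(key)


class MiniFlagClient:
    """Cut-down stand-in for the article's client, for this test only."""

    def __init__(self, redis_client: StubRedis, ttl_seconds: float = 30.0) -> None:
        self._redis = redis_client
        self._ttl = ttl_seconds
        self._cache: dict[str, tuple[dict, float]] = {}

    def evict(self, flag_name: str) -> None:
        self._cache.pop(flag_name, None)

    def is_enabled(self, flag_name: str) -> bool:
        entry = self._cache.get(flag_name)
        if entry is None or (time.monotonic() - entry[1]) > self._ttl:
            raw = self._redis.get(flag_name)
            config = json.loads(raw) if raw else {"enabled": False}
            entry = (config, time.monotonic())
            self._cache[flag_name] = entry
        return bool(entry[0].get("enabled"))


class FlagUpdateTest(unittest.TestCase):
    def test_eviction_picks_up_new_value(self) -> None:
        store = StubRedis()
        store.data["beta"] = json.dumps({"enabled": False})
        client = MiniFlagClient(store)
        self.assertFalse(client.is_enabled("beta"))

        store.data["beta"] = json.dumps({"enabled": True})
        # Without eviction the cached value is still served.
        self.assertFalse(client.is_enabled("beta"))

        client.evict("beta")
        self.assertTrue(client.is_enabled("beta"))
```

The middle assertion matters as much as the last one: it documents that staleness within the TTL is expected, so the final assertion fails only if eviction itself is broken.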

Wrapping Up: Next Steps

ChatGPT is genuinely useful for scaffolding feature flag logic, but it needs guard rails built into your prompts. Here are the concrete actions to take:

  1. Add TTL and data-source constraints to every flag prompt. Never let the model choose the persistence mechanism or expiry window on its own.
  2. Require a context parameter from the start. Retrofitting user-scoped evaluation onto a global boolean is far harder than designing for it upfront.
  3. Ask for a background invalidation path explicitly. TTL alone is not fast enough for kill-switch use cases.
  4. Review every generated client for the five pitfalls above: default-to-true, non-atomic writes, float vs. int percentage, case sensitivity, and missing Redis fallback.
  5. Write a concurrent-update test case that the model will not generate itself. It is the single best regression guard for stale state bugs.

Frequently Asked Questions

Why does ChatGPT-generated feature flag code often use stale values after a deploy?

ChatGPT typically reads flag values once at startup and stores them in a module-level variable, so changes to the flag store never propagate until the process restarts. You need to explicitly prompt for TTL-based in-process caching and a runtime refresh mechanism to avoid this.

How do I make a ChatGPT-generated feature flag client handle Redis outages gracefully?

Prompt ChatGPT to catch all exceptions from the Redis call and fall back to the last cached value if one exists, or return the safe disabled default if the cache is empty. Always log the exception so the outage is visible in your monitoring.

What is the safest default when a feature flag key is missing from the store?

The safe default is always disabled (False). Returning True for a missing flag can accidentally enable unreleased or broken features for all users, which is the opposite of what a feature flag is designed to prevent.

How do I get ChatGPT to generate percentage rollout logic that is deterministic per user?

Ask it to hash the combination of the flag name and the user ID together (using MD5 or SHA-256) and take the result modulo 100, then compare it to the rollout percentage. This ensures the same user always falls in the same bucket across all requests.

Should I use a background thread or a TTL cache for real-time feature flag updates?

Use both. A TTL cache gives you regular refresh without hammering your flag store, while a Redis pub/sub background thread lets you force immediate invalidation for urgent changes like kill-switches. Together they cover both the normal and the emergency update path.
