Multi-Tenant Feature Flags Done Wrong: Isolation Bugs That Leak Settings
You ship a feature flag so your enterprise tier gets early access to a new analytics dashboard. Three days later, a free-tier customer emails your support team asking why the dashboard disappeared. Somehow they had it, and now they don't. Or worse: a tenant on the starter plan has quietly had your premium export feature enabled for two weeks.
Feature flag isolation bugs are among the sneakiest defects in multi-tenant SaaS. They don't throw exceptions. They don't show up in your error tracker. They leak silently, and they can damage customer trust, distort revenue data, and create compliance exposure.
What Can Go Wrong With Feature Flags in Multi-Tenant Systems
The core promise of feature flags in a SaaS context is simple: evaluate a flag for this tenant, right now, in this request. The problems start when "this tenant" slips. It happens in more places than most teams expect.
This article covers:
- Why feature flag context gets confused at the request level
- How global state and singletons cause cross-tenant leakage
- Why caching is the most common source of leaked flag values
- How async workers drop tenant context silently
- Concrete patterns to fix each class of bug
Prerequisites
This article assumes you're running a multi-tenant SaaS application (with either a shared database or separate schemas) and already use some form of feature flags (home-grown, LaunchDarkly, Unleash, Flagsmith, or similar). Basic familiarity with request lifecycle middleware and background job queues is helpful.
How Feature Flag Context Gets Confused
Every feature flag evaluation is only as reliable as the context you pass into it. That context, at minimum, needs to contain a stable tenant identifier. The bug pattern almost always follows the same shape: context is assembled too early, stored globally, or inherited from a previous request.
Consider a typical flag evaluation call:
is_enabled = flag_client.variation("new-analytics-dashboard", context, default=False)
If context carries the wrong tenant ID, or no tenant ID at all, the evaluation silently returns the wrong result. No error, no warning. The feature either lights up for someone who shouldn't see it, or goes dark for someone who paid for it.
The Global State Trap
The most dangerous pattern is storing the flag evaluation context as a module-level or class-level singleton. It feels convenient during development: you initialize the context once when a user logs in and reuse it everywhere. In production with concurrent requests, this is a data race waiting to happen.
# BAD: class-level context shared across all requests
class FeatureFlagService:
    _context = None  # overwritten by whichever request calls set_context last

    @classmethod
    def set_context(cls, tenant_id: str):
        cls._context = {"tenant_id": tenant_id}

    @classmethod
    def is_enabled(cls, flag_name: str) -> bool:
        return flag_client.variation(flag_name, cls._context, default=False)
Under concurrent load, Tenant A's request calls set_context, then Tenant B's request immediately overwrites _context. Tenant A's flag evaluation now runs against Tenant B's context. This is a classic race on shared mutable state: nothing ties the context that was set to the evaluation that later reads it.
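To make that interleaving concrete, here is a minimal sketch that forces it deterministically with two threads. The class mirrors the shared-context pattern above, but the flag evaluation is replaced by simply returning the tenant ID found in the context. The names SharedFlagContext and simulate_race are hypothetical, and the threading.Event handshakes are an assumption of the demo, standing in for unlucky scheduling in production.

```python
import threading


class SharedFlagContext:
    """Stand-in for the class-level context anti-pattern (hypothetical name)."""

    _context = None

    @classmethod
    def set_context(cls, tenant_id: str) -> None:
        cls._context = {"tenant_id": tenant_id}

    @classmethod
    def current_tenant(cls) -> str:
        return cls._context["tenant_id"]


def simulate_race() -> dict:
    """Force the bad ordering: A sets, B sets, then A reads."""
    a_has_set = threading.Event()
    b_has_set = threading.Event()
    observed = {}

    def request_a():
        SharedFlagContext.set_context("tenant-a")
        a_has_set.set()
        b_has_set.wait()  # B overwrites the shared context during this gap
        observed["tenant-a"] = SharedFlagContext.current_tenant()

    def request_b():
        a_has_set.wait()
        SharedFlagContext.set_context("tenant-b")
        b_has_set.set()
        observed["tenant-b"] = SharedFlagContext.current_tenant()

    threads = [threading.Thread(target=request_a), threading.Thread(target=request_b)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return observed
```

Calling simulate_race() returns a dict in which the request for "tenant-a" observed "tenant-b" as its tenant. No exception is raised anywhere, which is why this class of bug never reaches an error tracker.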
The fix is straightforward: never store tenant-bound context outside the lifecycle of a single request. Pass it explicitly, or use thread-local or async-local storage scoped to the request.
# GOOD: context passed explicitly per call
def is_feature_enabled(flag_name: str, tenant_id: str) -> bool:
    context = build_context(tenant_id)
    return flag_client.variation(flag_name, context, default=False)
If you need to avoid rebuilding the context object on every call, cache it in request-local storage (Python's contextvars.ContextVar, Java's ThreadLocal, Node's AsyncLocalStorage), never in a shared class attribute.
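As a rough illustration of the request-local option in Python, the sketch below keeps the flag context in a contextvars.ContextVar. The helper names (bind_tenant, unbind_tenant, current_flag_context, handle_request) are hypothetical, and the two asyncio tasks stand in for two concurrent requests. Each task runs in its own copy of the context, so one request's binding is invisible to the other.

```python
import asyncio
import contextvars

# Hypothetical name: holds the flag context for the current request only.
_flag_context = contextvars.ContextVar("flag_context", default=None)


def bind_tenant(tenant_id: str) -> contextvars.Token:
    """Call at the start of a request; keep the token to undo the binding."""
    return _flag_context.set({"tenant_id": tenant_id})


def unbind_tenant(token: contextvars.Token) -> None:
    """Restore whatever was bound before bind_tenant was called."""
    _flag_context.reset(token)


def current_flag_context() -> dict:
    """Return the bound context, failing loudly if no tenant is bound."""
    ctx = _flag_context.get()
    if ctx is None:
        raise RuntimeError("no tenant bound to this request")
    return ctx


async def handle_request(tenant_id: str) -> str:
    token = bind_tenant(tenant_id)
    try:
        await asyncio.sleep(0)  # yield so the other request runs in between
        return current_flag_context()["tenant_id"]
    finally:
        unbind_tenant(token)


async def main() -> list:
    # gather() wraps each coroutine in a task with its own context copy.
    return await asyncio.gather(
        handle_request("tenant-a"), handle_request("tenant-b")
    )
```

Running asyncio.run(main()) yields ["tenant-a", "tenant-b"]: each simulated request reads back its own tenant even though the two interleave. Raising when nothing is bound is a deliberate choice here, since a loud failure is easier to diagnose than a silent default.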
Request-Scoped Context Done Right
The cleanest approach is to resolve tenant identity in middleware, attach it to the request object, and pass that down the call stack. This makes tenant identity an explicit dependency rather than an ambient global.
# Django middleware example
class TenantContextMiddleware:
    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        tenant_id = resolve_tenant_from_request(request)  # from subdomain, header, JWT, etc.
        request.tenant_id = tenant_id
        response = self.get_response(request)
        return response
Then in your view or service layer, pull the tenant ID from the request object and pass it to every flag evaluation. Never let a flag evaluation function reach out to a global to figure out "which tenant is this for."
This also means your flag evaluation functions become trivially testable: you pass in a tenant ID string and assert the result. No mocking of globals required.
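A minimal sketch of such a test follows. FakeFlagClient is a hypothetical in-memory stand-in that only mimics the variation(flag_name, context, default=...) call shape used in this article; it is not any real SDK's API. As a further assumption, the client is passed in as a parameter here so the test needs no globals at all.

```python
class FakeFlagClient:
    """Hypothetical in-memory stand-in for a flag SDK client."""

    def __init__(self, enabled_for):
        # enabled_for maps flag name -> set of tenant IDs with the flag on
        self._enabled_for = enabled_for

    def variation(self, flag_name, context, default=False):
        tenant_id = context.get("tenant_id")
        if tenant_id is None:
            return default
        return tenant_id in self._enabled_for.get(flag_name, set())


def build_context(tenant_id):
    return {"tenant_id": tenant_id}


def is_feature_enabled(client, flag_name, tenant_id):
    return client.variation(flag_name, build_context(tenant_id), default=False)


def test_flag_is_scoped_to_tenant():
    client = FakeFlagClient({"new-analytics-dashboard": {"tenant-enterprise"}})
    # Same flag, two tenants, two different answers.
    assert is_feature_enabled(client, "new-analytics-dashboard", "tenant-enterprise")
    assert not is_feature_enabled(client, "new-analytics-dashboard", "tenant-free")
```

The test function follows the pytest naming convention but can also be called directly.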
Caching: The Silent Cross-Tenant Leak
Caching flag evaluations is a reasonable optimization: flag SDKs often do this internally, and teams sometimes add their own layer on top to reduce SDK calls or latency. This is where tenant isolation bugs become particularly insidious, because a wrong cache key means one tenant gets another's flag state for as long as the cache entry lives.
Cache Key Design for Tenant Isolation
Every cached flag value must be keyed by both the flag name and the tenant identifier. If your cache key is just the flag name, you've effectively created a single global flag state shared across all tenants.
# BAD: flag name only, so all tenants share the same cached value
cache_key = f"feature_flag:{flag_name}"
# GOOD: flag name + tenant ID, so each tenant has its own cache entry
cache_key = f"feature_flag:{tenant_id}:{flag_name}"
This sounds obvious, but it breaks in subtle ways. Consider a caching wrapper added months after the initial flag implementation, written by someone who didn't think about multi-tenancy. Or a Redis cache that's shared across environments because someone reused a connection string. Both scenarios produce the same symptom: wrong flag values for the wrong tenants.
Also watch for TTL mismatches. If you cache flag evaluations for 60 seconds and a tenant's plan changes mid-session (say, they upgrade during checkout), they'll get stale flag values for up to a minute. For billing-sensitive flags, that window can matter. Consider using shorter TTLs or event-driven cache invalidation keyed to tenant plan change events.
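One way to combine tenant-scoped keys, a TTL, and event-driven invalidation is sketched below as a small in-process cache. TenantFlagCache and its method names are hypothetical. The injectable clock is an assumption made so the behavior can be tested without sleeping, and a shared cache such as Redis would need the same key structure rather than this exact code.

```python
import time
from typing import Callable, Dict, Tuple


class TenantFlagCache:
    """In-process flag cache keyed by (tenant_id, flag_name) with a TTL."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        # key -> (cached value, expiry timestamp)
        self._entries: Dict[Tuple[str, str], Tuple[bool, float]] = {}

    def get_or_evaluate(self, tenant_id: str, flag_name: str,
                        evaluate: Callable[[], bool]) -> bool:
        key = (tenant_id, flag_name)
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]  # fresh hit for this tenant only
        value = evaluate()
        self._entries[key] = (value, now + self._ttl)
        return value

    def invalidate_tenant(self, tenant_id: str) -> None:
        """Drop every cached flag for one tenant, e.g. on a plan-change event."""
        for key in [k for k in self._entries if k[0] == tenant_id]:
            del self._entries[key]
```

A plan-change handler would call invalidate_tenant(tenant_id) as soon as the upgrade is recorded, so the next evaluation reflects the new plan instead of waiting out the TTL.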
This is the same category of revenue data integrity problem you'll find in billing systems, similar to the gotchas covered in Stripe metered billing edge cases that break revenue reports, where stale or mis-keyed data quietly corrupts downstream numbers.
Async Workers and Background Jobs: The Blind Spot
Request context doesn't survive the boundary between a web process and a background worker. When you enqueue a job from a request, the worker process has no knowledge of which tenant triggered it unless you explicitly pass that information in the job payload.
# BAD: tenant context not included in job payload
def enqueue_report_job(report_id: int):
    task_queue.enqueue(generate_report, report_id)

def generate_report(report_id: int):
    # Which tenant is this for? The worker has no idea.
    if is_feature_enabled("advanced-report-format"):  # evaluates against what context?
        ...

# GOOD: tenant ID travels with the job
def enqueue_report_job(report_id: int, tenant_id: str):
    task_queue.enqueue(generate_report, report_id, tenant_id)

def generate_report(report_id: int, tenant_id: str):
    if is_feature_enabled("advanced-report-format", tenant_id=tenant_id):
        ...
This seems mechanical, but it's easy to miss when a job calls into a shared service that internally evaluates flags. If that service reads tenant context from a global or thread-local, it'll find nothing (or worse, the context from whatever request last ran on that thread).
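Passing the tenant ID explicitly remains the preferred fix. For shared services that still read ambient context and cannot be refactored yet, one possible guard is a worker entry point that binds the tenant from the job payload and always resets it afterward, so a service sees either the job's own tenant or a loud error, never a leftover. This is only a sketch: run_job, current_job_tenant, and the payload shape are hypothetical and not tied to any particular job queue library.

```python
import contextvars
from typing import Any, Callable

# Hypothetical name: the tenant bound for the job currently executing.
_job_tenant = contextvars.ContextVar("job_tenant", default=None)


def current_job_tenant() -> str:
    """Read the bound tenant, failing loudly when none is bound."""
    tenant_id = _job_tenant.get()
    if tenant_id is None:
        raise RuntimeError("flag evaluated outside a tenant-bound job")
    return tenant_id


def run_job(handler: Callable[..., Any], payload: dict) -> Any:
    """Hypothetical worker entry point: bind tenant from payload, then clean up."""
    tenant_id = payload.get("tenant_id")
    if not tenant_id:
        raise ValueError("job payload is missing tenant_id")
    token = _job_tenant.set(tenant_id)
    try:
        return handler(**payload)
    finally:
        _job_tenant.reset(token)  # nothing leaks into the next job on this thread


def generate_report(report_id: int, tenant_id: str) -> str:
    # A shared service deeper in the call stack would read the same binding.
    return f"report {report_id} for {current_job_tenant()}"
```

With this in place, run_job(generate_report, {"report_id": 1, "tenant_id": "tenant-a"}) returns "report 1 for tenant-a", and a payload without tenant_id is rejected before the handler runs.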
Cron jobs are another blind spot. A scheduled job that processes all tenants must evaluate flags per tenant in each iteration, not once before the loop starts.
# BAD: flag evaluated once before the tenant loop
is_export_enabled = is_feature_enabled("bulk-export", tenant_id=None)  # wrong
for tenant in get_all_tenants():
    if is_export_enabled:
        run_bulk_export(tenant)

# GOOD: flag evaluated per tenant
for tenant in get_all_tenants():
    if is_feature_enabled("bulk-export", tenant_id=tenant.id):
        run_bulk_export(tenant)
Database-Level Isolation vs. Application-Level Isolation
Some teams store flag overrides in the database, in a table like tenant_feature_flags(tenant_id, flag_name, enabled). This is a solid approach for per-tenant flag configuration, but it introduces its own isolation risks.
The most common mistake is forgetting to filter by tenant_id in the query. If you fetch all rows for a flag name and cache the result, every tenant shares the same value. Or you fetch rows for the wrong tenant because the ORM query was constructed before the request context was resolved.
-- BAD: no tenant filter, returns flags for every tenant
SELECT enabled FROM tenant_feature_flags WHERE flag_name = 'new-analytics-dashboard';

-- GOOD: tenant-scoped query
SELECT enabled FROM tenant_feature_flags
WHERE flag_name = 'new-analytics-dashboard' AND tenant_id = '7a3f9c';
If you're using an ORM with a shared query scope or a row-level security policy, verify that multi-tenant filtering is enforced at the database level and not just at the application level. Application-level filtering can be bypassed by a missing .filter(tenant=...) call. Database-level row security (like PostgreSQL's RLS) catches that class of bug automatically, but you need to set it up intentionally.
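At the application level, one way to make the tenant filter hard to forget is to route every override lookup through a single helper that requires a tenant ID and uses bound parameters. The sketch below uses Python's built-in sqlite3 module purely so it runs anywhere. The helper names and the second tenant ID are hypothetical. It shows application-level scoping only and is not a substitute for database-level row security.

```python
import sqlite3
from typing import Optional


def get_flag_override(conn: sqlite3.Connection, tenant_id: str,
                      flag_name: str) -> Optional[bool]:
    """Return the tenant's override, or None when no row exists."""
    if not tenant_id:
        raise ValueError("tenant_id is required")
    row = conn.execute(
        "SELECT enabled FROM tenant_feature_flags "
        "WHERE tenant_id = ? AND flag_name = ?",
        (tenant_id, flag_name),
    ).fetchone()
    return None if row is None else bool(row[0])


def demo_connection() -> sqlite3.Connection:
    """Build an in-memory table with one flag set differently for two tenants."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tenant_feature_flags ("
        "tenant_id TEXT NOT NULL, flag_name TEXT NOT NULL, "
        "enabled INTEGER NOT NULL, PRIMARY KEY (tenant_id, flag_name))"
    )
    conn.executemany(
        "INSERT INTO tenant_feature_flags VALUES (?, ?, ?)",
        [
            ("7a3f9c", "new-analytics-dashboard", 1),
            ("b81e02", "new-analytics-dashboard", 0),  # hypothetical second tenant
        ],
    )
    return conn
```

The composite primary key on (tenant_id, flag_name) means a tenant-scoped lookup can match at most one row, and returning None for a missing row keeps "no override" distinct from "explicitly disabled".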
Tenant data isolation at the database layer is related to broader SaaS audit concerns. If you've ever had to trace which tenant had which feature enabled and when, you'll understand why a proper per-tenant audit log matters. It's the kind of rigor that's also essential when auditing paid licenses for seat creep across your customer base.
Common Pitfalls and Edge Cases
SDK initialization shared across tenants. Some feature flag SDKs initialize with a single user/context at startup. In a multi-tenant app, supply the tenant context on each evaluation call instead: use a server-side SDK that accepts context at evaluation time, not at initialization time (most modern SDKs support this).
Feature flag state in frontend bundles. If you inline flag state into the initial HTML payload or a JavaScript config object, make sure you're generating that payload per-tenant, not caching a single rendered page across all tenants. A CDN edge cache keyed only by URL, without a Vary header or tenant-specific cache key, will serve one tenant's flag state to another.
Merging flag sources without a clear hierarchy. Many teams combine flags from multiple sources: an SDK, a database table, environment variables, and maybe a config file. When the same flag exists in multiple sources, which value wins? If the hierarchy isn't explicit and documented, you'll get inconsistent results across environments and difficult-to-reproduce bugs.
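An explicit hierarchy can be as small as an ordered list that is walked from highest to lowest precedence. The sketch below is one possible shape, not a recommendation for any particular order. The source names and the resolve_flag function are hypothetical, and each source is assumed to have already been scoped to a single tenant before it is passed in.

```python
from typing import Mapping, Sequence, Tuple


def resolve_flag(flag_name: str,
                 sources: Sequence[Tuple[str, Mapping[str, bool]]],
                 default: bool = False) -> Tuple[bool, str]:
    """Return (value, winning source); the first source that defines the flag wins."""
    for source_name, values in sources:
        if flag_name in values:
            return values[flag_name], source_name
    return default, "default"


# Assumed precedence for illustration, highest first.
EXAMPLE_SOURCES = [
    ("tenant_db_override", {"bulk-export": False}),
    ("flag_sdk", {"bulk-export": True, "new-analytics-dashboard": True}),
    ("env_var", {"advanced-report-format": True}),
]
```

With these example sources, resolve_flag("bulk-export", EXAMPLE_SOURCES) returns (False, "tenant_db_override") because the database override outranks the SDK value. Returning the winning source alongside the value makes it much easier to log why a tenant saw a given result.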
Flags that gate data access, not just UI. If a flag controls whether a tenant can call an API endpoint or access a data export, make sure the flag is evaluated server-side on every request, not just on the frontend. A user who manipulates client-side state or calls the API directly bypasses a frontend-only flag entirely. This is a security concern, not just a UX one.
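A framework-agnostic way to enforce this is a decorator that checks the flag on the server before the handler body runs. The sketch below assumes handlers receive the tenant ID as their first argument. require_flag, FeatureDisabled, and the in-memory ENABLED set are hypothetical names used only for illustration.

```python
import functools
from typing import Callable


class FeatureDisabled(Exception):
    """Raised when a tenant calls a handler gated by a flag they don't have."""


def require_flag(flag_name: str, is_enabled: Callable[[str, str], bool]):
    """Gate a handler on a per-tenant flag, evaluated on every call."""
    def decorator(handler):
        @functools.wraps(handler)
        def wrapper(tenant_id: str, *args, **kwargs):
            if not is_enabled(flag_name, tenant_id):
                raise FeatureDisabled(flag_name)
            return handler(tenant_id, *args, **kwargs)
        return wrapper
    return decorator


ENABLED = {("premium-export", "tenant-premium")}  # hypothetical flag state


def is_feature_enabled(flag_name: str, tenant_id: str) -> bool:
    return (flag_name, tenant_id) in ENABLED


@require_flag("premium-export", is_feature_enabled)
def export_data(tenant_id: str) -> str:
    return f"export for {tenant_id}"
```

Here export_data("tenant-premium") succeeds, while export_data("tenant-starter") raises FeatureDisabled regardless of what the frontend believes. A real endpoint would typically translate that exception into a 403 or 404 response.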
This kind of access-control thinking connects directly to the risks described in debugging webhook failures in SaaS pipelines: when tenant context is missing or wrong, events and permissions go to the wrong place in ways that are hard to diagnose after the fact.
Testing in production with real tenants. It's tempting to test a new flag by enabling it for "all tenants" or by toggling the flag in the production environment against a test account. If your test account shares infrastructure with production tenants, enabling a flag for it may affect others depending on how your flag system resolves targeting rules.
One underappreciated tool here is integration tests that spin up multiple tenants in the same test run and assert that flag evaluations are completely independent. A test like this would have caught most of the bugs described in this article before they reached production. The same discipline of automated checks applied to shipping code is what makes turning a code review checklist into an enforceable ruleset so valuable: isolation rules can be linted, not just hoped for.
Next Steps
Feature flag isolation bugs are fixable, and the patterns are consistent once you know what to look for. Here's what to do next:
- Audit your flag evaluation calls. Search your codebase for every call to your flag evaluation function and verify each one explicitly passes a tenant ID. Any call that reads tenant ID from a global is a candidate for a bug.
- Review your cache keys. Check every place you cache flag values (in-process caches, Redis, CDN edge caches) and confirm the cache key includes a tenant identifier. Add a test that evaluates the same flag for two different tenants and asserts they can return different values.
- Audit your background jobs. List every job that evaluates a feature flag. Confirm tenant ID is in the job payload and is passed explicitly to the flag evaluation, not resolved from ambient context.
- Add multi-tenant integration tests. Write a test that creates two tenants, enables a flag for one, and asserts the other doesn't see it, across your web handlers, API endpoints, and background jobs.
- Document your flag source hierarchy. If you merge flag values from multiple sources (SDK, database, env vars), write down and enforce a clear precedence order. Make it part of your flag implementation guide so future contributors don't introduce ambiguity.
Frequently Asked Questions
How do feature flags leak between tenants in a multi-tenant SaaS app?
The most common causes are shared global state where the tenant context gets overwritten by concurrent requests, cache keys that don't include a tenant identifier, and background jobs that drop tenant context when crossing process boundaries. Each of these causes one tenant's flag state to be evaluated or served for a different tenant.
Should feature flags be evaluated server-side or client-side in a multi-tenant app?
Flags that control access to features, data, or API endpoints must be evaluated server-side on every request, because client-side evaluation can be bypassed by a user who manipulates state or calls your API directly. UI-only flags can be evaluated client-side, but make sure the initial flag state payload is generated per-tenant and not cached globally at the CDN or application layer.
What's the safest cache key format for multi-tenant feature flag caching?
Include both the tenant identifier and the flag name in the cache key, such as feature_flag:{tenant_id}:{flag_name}. Never cache by flag name alone, as this produces a single shared value across all tenants. Also consider short TTLs or event-driven invalidation for flags tied to billing or plan changes.
How do I make sure background jobs evaluate feature flags for the right tenant?
Always include the tenant ID explicitly in the job payload when you enqueue it, and pass that ID directly into the flag evaluation call inside the job handler. Never rely on thread-local or global context inside a worker process, as that context is not reliably set by the time the job runs.
Can a CDN cache cause feature flag values to leak between tenants?
Yes, if your application renders flag state into the initial HTML or a JavaScript config object and that page is cached at the CDN edge without a tenant-specific cache key or a Vary header, one tenant's flag state can be served to another. Always ensure CDN cache keys incorporate tenant identity for any response that contains tenant-specific configuration.