Fixing JWT Token Expiry Errors That Only Appear Under Load in FastAPI
Your FastAPI app passes every local test, your staging environment looks clean, and then production starts throwing 401 Unauthorized errors the moment traffic picks up. The logs point to expired JWTs, but the tokens are fresh. You've seen this pattern before and it's maddening.
These errors almost always come from one of a small set of root causes: clock skew between services, blocking code stalling the event loop during token validation, a race condition in your token refresh logic, or token state that isn't shared across worker processes. The good news is that each one is diagnosable and fixable.
What you'll learn
- Why JWT expiry errors appear only under load, not in isolation
- How to identify clock skew between your app servers and your auth service
- How blocking validation code silently delays async requests
- How to implement a safe, race-condition-free token refresh flow
- Practical hardening steps to prevent regressions
Prerequisites
This guide assumes you're running FastAPI with an async setup (Python 3.9+), using python-jose or PyJWT for token decoding, and have some form of load testing tool available; locust or k6 both work fine.
Why Load Exposes What Local Testing Hides
Single-request testing is sequential and fast. When you hit your endpoint once, the token decodes in microseconds and there's no contention anywhere. Under load, three things change simultaneously: your event loop gets stretched, multiple workers share resources, and small timing gaps compound into real failures.
JWT expiry is time-sensitive by design. A token whose exp claim is 14:00:00 is accepted just before that instant and rejected just after it. Any delay between the client sending a request and your server reaching the decode call eats into that margin. Under load, that delay can be hundreds of milliseconds.
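To make that margin concrete, here is a minimal sketch of the arithmetic. The function name and the example numbers are illustrative assumptions, not part of any library.

```python
def remaining_validity(exp: float, sent_at: float, server_delay: float, leeway: float = 0.0) -> float:
    """Seconds of validity left when the server finally reaches the decode call.

    A negative result means the token would be rejected as expired.
    """
    decoded_at = sent_at + server_delay
    return (exp + leeway) - decoded_at


# A token expires at t=100.0 and the client sends it at t=99.8.
print(remaining_validity(exp=100.0, sent_at=99.8, server_delay=0.05))  # positive: accepted
print(remaining_validity(exp=100.0, sent_at=99.8, server_delay=0.40))  # negative: rejected
```

The same request succeeds or fails depending only on how long it waited before being decoded, which is why the errors track load rather than token age.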
Root Cause 1: Clock Skew Between Services
If your FastAPI app and your identity provider (Auth0, Keycloak, a home-built auth service) run on different machines, their clocks can drift. Even a few seconds of drift is enough to cause intermittent expiry errors that look completely random.
Check the skew directly. On your app server, run:
ntpdate -q pool.ntp.org
You'll see an offset value. Anything beyond ±1 second is worth fixing. On most Linux systems, enabling chrony or systemd-timesyncd keeps drift under 50ms reliably.
On the code side, add a leeway parameter when you decode tokens. Both major Python JWT libraries support this:
# Using PyJWT
import jwt

def decode_token(token: str) -> dict:
    return jwt.decode(
        token,
        key=SECRET_KEY,
        algorithms=["HS256"],
        leeway=10,  # seconds of tolerance
    )
A leeway of 10 seconds is enough to cover normal NTP drift without meaningfully weakening your token security. Don't go higher than 30 seconds. At that point you're masking a clock sync problem rather than tolerating a small one.
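If you can't run NTP tools on the host, you can get a rough skew estimate from the application side by comparing the iat claim of a freshly issued token with the local clock at the moment you receive it. The sketch below uses only the standard library; the helper names are hypothetical, and it reads the payload without verifying the signature, so it is for diagnostics only.

```python
import base64
import json


def unverified_claims(token: str) -> dict:
    """Decode the payload segment of a JWT without checking the signature."""
    payload_b64 = token.split(".")[1]
    # JWT segments drop base64 padding; restore it before decoding.
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def estimated_skew_seconds(token: str, received_at: float) -> float:
    """Positive: the issuer's clock appears ahead of ours. Negative: behind."""
    return float(unverified_claims(token)["iat"]) - received_at
```

Call it with time.time() captured right after the token response arrives. The result also includes network and issuance latency, so treat it as an approximation and look at the trend across many samples rather than a single value.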
Root Cause 2: Blocking Code in the Async Path
FastAPI runs on an async event loop. If your token validation does anything synchronous and slow (a database lookup, a network call, even an expensive regex), it blocks every other coroutine waiting on that loop. Under load, requests queue up, and by the time a token actually gets decoded, it may have expired in transit.
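The effect is easy to reproduce outside FastAPI. This standard-library sketch uses time.sleep as a stand-in for any blocking call and measures how late an unrelated task wakes up; the function name and the timings are illustrative assumptions.

```python
import asyncio
import time


async def worst_tick_delay(blocking: bool, stall: float = 0.3) -> float:
    """Worst extra delay seen by a task that tries to wake every 10 ms."""

    async def ticker() -> float:
        worst = 0.0
        for _ in range(5):
            start = time.perf_counter()
            await asyncio.sleep(0.01)
            worst = max(worst, time.perf_counter() - start - 0.01)
        return worst

    tick = asyncio.create_task(ticker())
    await asyncio.sleep(0)            # let the ticker start its first sleep
    if blocking:
        time.sleep(stall)             # synchronous call: nothing else can run
    else:
        await asyncio.sleep(stall)    # cooperative wait: the ticker keeps running
    return await tick
```

Running asyncio.run(worst_tick_delay(True)) reports a delay close to the full stall, while the non-blocking variant stays near zero. Every request waiting on the loop pays that same delay before its token is decoded.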
The pattern that bites people most often looks like this:
from fastapi import Depends, HTTPException, status
from fastapi.security import OAuth2PasswordBearer
import jwt

oauth2_scheme = OAuth2PasswordBearer(tokenUrl="token")

def get_current_user(token: str = Depends(oauth2_scheme)):
    # This is a synchronous function, so FastAPI runs it in a threadpool.
    # But if you accidentally do async I/O inside here, you'll get subtle bugs.
    try:
        payload = jwt.decode(token, SECRET_KEY, algorithms=["HS256"])
        return payload
    except jwt.ExpiredSignatureError:
        raise HTTPException(status_code=status.HTTP_401_UNAUTHORIZED, detail="Token expired")
Pure in-memory decoding like the example above is actually fine as a sync function: jwt.decode is CPU-bound and FastAPI will run it in a thread pool automatically. The problem comes when developers add a database call to check a token revocation list inside that same sync function, or worse, call an external HTTP endpoint synchronously:
import requests  # blocking! never do this inside a dependency

def get_current_user(token: str = Depends(oauth2_scheme)):
    payload = jwt.decode(token, SECRET_KEY, algorithms=["HS256"])
    # This holds a threadpool worker for the whole round trip. Under load the
    # pool runs out and requests queue up. Inside an `async def` dependency the
    # same call would block the event loop itself.
    revoked = requests.get(f"http://auth-service/revoked/{payload['jti']}")
    if revoked.json()["revoked"]:
        raise HTTPException(status_code=401, detail="Token revoked")
    return payload
Replace any blocking I/O with its async equivalent. Use httpx.AsyncClient for HTTP and an async database driver for DB calls:
import httpx
from fastapi import Depends, HTTPException, status
from fastapi.security import OAuth2PasswordBearer
import jwt

oauth2_scheme = OAuth2PasswordBearer(tokenUrl="token")

async def get_current_user(token: str = Depends(oauth2_scheme)):
    try:
        payload = jwt.decode(token, SECRET_KEY, algorithms=["HS256"], leeway=10)
    except jwt.ExpiredSignatureError:
        raise HTTPException(status_code=status.HTTP_401_UNAUTHORIZED, detail="Token expired")
    async with httpx.AsyncClient() as client:
        resp = await client.get(f"http://auth-service/revoked/{payload['jti']}")
    if resp.json().get("revoked"):
        raise HTTPException(status_code=status.HTTP_401_UNAUTHORIZED, detail="Token revoked")
    return payload
If you need the revocation check on every request, cache the result in Redis with a short TTL rather than hitting the auth service every time. That single change typically eliminates most of the latency that causes race-condition expiry failures.
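The caching idea can be sketched without Redis. The class below is a hypothetical in-process stand-in: it wraps any async lookup function and remembers each answer for a few seconds. It is per-worker and never evicts entries, so treat it as an illustration of the TTL logic rather than a replacement for a shared cache.

```python
import time
from typing import Awaitable, Callable


class RevocationCache:
    """Short-lived cache in front of a slow revocation lookup (illustrative)."""

    def __init__(self, lookup: Callable[[str], Awaitable[bool]], ttl_seconds: float = 5.0):
        self._lookup = lookup          # e.g. a call to the auth service
        self._ttl = ttl_seconds
        self._entries: dict = {}       # jti -> (valid_until, revoked)

    async def is_revoked(self, jti: str) -> bool:
        now = time.monotonic()
        cached = self._entries.get(jti)
        if cached is not None and cached[0] > now:
            return cached[1]           # still fresh: skip the network call
        revoked = await self._lookup(jti)
        self._entries[jti] = (now + self._ttl, revoked)
        return revoked
```

The trade-off is that a newly revoked token can remain accepted for up to ttl_seconds, so pick a TTL you are comfortable with as a revocation delay.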
Root Cause 3: Race Conditions in Token Refresh
Token refresh is where things get genuinely tricky under load. Imagine ten concurrent requests arriving with a token that expires in two seconds. Each one independently detects the token is about to expire and fires a refresh request. Each refresh that succeeds returns a different token, so you can end up with as many as ten tokens competing for the same client state, and if your provider rotates refresh tokens on use, some of those refresh calls can fail outright.
On the server side, if you're validating short-lived tokens and your refresh endpoint is slow, requests that arrive during the refresh window will catch an expired token before the new one is distributed to the client.
The standard fix is a refresh mutex on the client side combined with a small leeway buffer and an overlap window:
import asyncio
import time

import httpx

# Placeholder host, matching the earlier examples; substitute your auth service.
AUTH_BASE_URL = "http://auth-service"
# REFRESH_TOKEN is assumed to be loaded elsewhere (config, secure storage).

_refresh_lock = asyncio.Lock()
_current_token: dict = {"access_token": "", "expires_at": 0}

async def get_valid_token() -> str:
    # Fast path: token is still valid
    if time.time() < _current_token["expires_at"] - 30:  # 30s buffer
        return _current_token["access_token"]
    # Slow path: acquire the lock so only one coroutine refreshes
    async with _refresh_lock:
        # Double-check after acquiring: another coroutine may have refreshed already
        if time.time() < _current_token["expires_at"] - 30:
            return _current_token["access_token"]
        # httpx needs an absolute URL, so the client is given a base_url
        async with httpx.AsyncClient(base_url=AUTH_BASE_URL) as client:
            resp = await client.post("/token/refresh", json={"refresh_token": REFRESH_TOKEN})
            resp.raise_for_status()  # don't store an error body as a token
            data = resp.json()
        _current_token["access_token"] = data["access_token"]
        _current_token["expires_at"] = time.time() + data["expires_in"]
        return _current_token["access_token"]
The double-check pattern inside the lock is critical. Without it, every coroutine that was waiting on the lock will also attempt a refresh once the first one releases it. The second check costs almost nothing and prevents a stampede of refresh calls.
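You can verify the stampede protection with a fake refresh call that counts how often it runs. This is a self-contained sketch using only asyncio; TokenHolder and fake_refresh are hypothetical names, and the simulated latency is an assumption.

```python
import asyncio
import time


class TokenHolder:
    def __init__(self, refresh, buffer_seconds: float = 30.0):
        self._refresh = refresh        # async callable returning (token, expires_in)
        self._buffer = buffer_seconds
        self._lock = asyncio.Lock()
        self._token = ""
        self._expires_at = 0.0
        self.refresh_calls = 0

    def _fresh(self) -> bool:
        return time.time() < self._expires_at - self._buffer

    async def get(self) -> str:
        if self._fresh():              # fast path, no lock
            return self._token
        async with self._lock:
            if not self._fresh():      # the double-check
                self.refresh_calls += 1
                self._token, expires_in = await self._refresh()
                self._expires_at = time.time() + expires_in
        return self._token


async def demo() -> int:
    async def fake_refresh():
        await asyncio.sleep(0.05)      # simulated network round trip
        return "new-token", 300

    holder = TokenHolder(fake_refresh)
    tokens = await asyncio.gather(*(holder.get() for _ in range(10)))
    assert set(tokens) == {"new-token"}
    return holder.refresh_calls
```

With ten concurrent callers, asyncio.run(demo()) returns 1. If you delete the check inside the lock, each waiting coroutine refreshes in turn and the count rises to 10.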
Root Cause 4: Worker Count and Token Timing
If you deploy FastAPI with multiple Uvicorn workers (or behind Gunicorn), each worker process has its own memory, and workers running on different hosts or containers may also see slightly different clocks. Token caches, in-memory revocation lists, and refresh state are not shared between workers automatically.
A token refreshed by worker 1 is not automatically visible to worker 2. If your client sends the new token to a request handled by worker 2 before that worker has seen it, and if you're using any server-side token tracking, you'll get spurious rejections.
The fix is to move shared state out of worker memory entirely. Use Redis or a fast external cache for anything that needs to be consistent across workers:
import redis.asyncio as aioredis
import jwt

redis_client = aioredis.from_url("redis://localhost")

async def is_token_revoked(jti: str) -> bool:
    result = await redis_client.get(f"revoked:{jti}")
    return result is not None

async def revoke_token(jti: str, ttl: int) -> None:
    await redis_client.setex(f"revoked:{jti}", ttl, "1")
Set the TTL on the revocation entry to match the token's remaining lifetime. There's no point keeping a revocation record for a token that's already expired.
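Computing that TTL from the token's exp claim might look like the sketch below. The function name is hypothetical. It adds the decode leeway on the assumption that your validator still accepts tokens for that many seconds past exp, so the revocation record should outlive that window too.

```python
import math
import time
from typing import Optional


def revocation_ttl(exp: int, leeway: int = 0, now: Optional[float] = None) -> int:
    """Seconds a revocation record needs to live, rounded up; 0 if already expired."""
    current = time.time() if now is None else now
    remaining = (exp + leeway) - current
    return max(0, math.ceil(remaining))
```

A result of 0 means the token can no longer be accepted, so the caller can skip the write entirely instead of passing a zero TTL to the cache.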
Diagnosing the Problem With Load Tests
Before you fix anything, confirm which root cause you actually have. Run a load test that mimics your production traffic pattern and watch for expiry errors specifically.
With locust, a simple test looks like this:
from locust import HttpUser, task, between
import time

class APIUser(HttpUser):
    wait_time = between(0.1, 0.5)
    token = None
    token_issued_at = 0.0
    token_expiry = 0.0

    def on_start(self):
        resp = self.client.post("/token", json={"username": "test", "password": "test"})
        data = resp.json()
        self.token = data["access_token"]
        # Record the issue time so token age does not depend on a hardcoded lifetime
        self.token_issued_at = time.time()
        self.token_expiry = self.token_issued_at + data["expires_in"]

    @task
    def call_protected_endpoint(self):
        headers = {"Authorization": f"Bearer {self.token}"}
        with self.client.get("/protected", headers=headers, catch_response=True) as resp:
            if resp.status_code == 401:
                age = time.time() - self.token_issued_at
                resp.failure(f"401 at token age {age:.1f}s")
Run this at progressively higher concurrency levels. The failure message includes how far into the token's lifetime the error occurred. If failures cluster near the expiry boundary, it's clock skew or event loop lag. If they appear randomly throughout the token's lifetime, look at your revocation check or refresh logic.
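Once you have collected the token ages from the failure messages, a few lines are enough to check for clustering. This is a rough sketch: the function name and the 10% window are arbitrary assumptions, not a statistical test.

```python
from typing import Sequence


def fraction_near_expiry(failure_ages: Sequence[float], lifetime: float, window: float = 0.1) -> float:
    """Share of 401s that occurred in the final `window` fraction of the token lifetime."""
    if not failure_ages:
        return 0.0
    cutoff = lifetime * (1.0 - window)
    near = sum(1 for age in failure_ages if age >= cutoff)
    return near / len(failure_ages)
```

For a 300-second token, a result close to 1.0 points toward clock skew or event loop lag, while a result near the window size itself (about 0.1 here) suggests failures are spread evenly and the revocation or refresh path deserves a closer look.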
Common Pitfalls
Using datetime.utcnow() to generate exp claims. In Python 3.12+, utcnow() is deprecated. Use datetime.now(timezone.utc) instead. utcnow() returns a naive datetime, and calling .timestamp() on a naive value interprets it as local time, so on any server not running in UTC the exp claim is silently shifted by the UTC offset.
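A minimal sketch of generating exp from a timezone-aware datetime follows. The helper name is hypothetical, and it rejects naive values outright rather than guessing their timezone.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def expiry_claim(lifetime: timedelta, now: Optional[datetime] = None) -> int:
    """Return an exp value (seconds since the epoch) from an aware datetime."""
    issued = now if now is not None else datetime.now(timezone.utc)
    if issued.tzinfo is None:
        raise ValueError("naive datetime: pass a timezone-aware value")
    return int((issued + lifetime).timestamp())
```

Because the input is always timezone-aware, the resulting claim is the same regardless of the server's local timezone setting.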
Setting token lifetime too short for your infrastructure. A 60-second access token is fine if your infrastructure is tight and fast. If you have high-latency network hops between client and server, a 60-second token has almost no safe window. Use 5 to 15 minutes for access tokens and rely on short-lived refresh tokens for revocation control.
Ignoring the nbf claim. The