Fixing AWS Secrets Manager Throttling That Breaks App Startup at Scale
Your deployment looks fine on staging with five containers. Then you push to production, auto-scaling kicks in, and forty pods all boot simultaneously, each one calling GetSecretValue the moment it starts. Within seconds, half of them crash with ThrottlingException and your alerting goes red. This is one of the most common scaling surprises teams hit with AWS Secrets Manager, and it's entirely fixable.
What you'll learn
- Why AWS Secrets Manager throttles at scale and what the actual limits are
- How to identify throttling as the root cause of failed startups
- In-process caching strategies to cut API call volume dramatically
- How to use AWS-provided tooling to handle this automatically
- Architectural patterns to prevent the problem from coming back
Prerequisites
This guide assumes you're running containerized workloads on AWS (ECS, EKS, or Lambda), using the AWS SDK in your application language, and have basic familiarity with IAM and CloudWatch. Code examples use Python with boto3, but the patterns translate directly to any supported SDK.
Why Throttling Happens at Boot Time
AWS Secrets Manager enforces API rate quotas per account and per Region. The documented default is a combined 10,000 requests per second for DescribeSecret and GetSecretValue, but that budget is shared by every caller in the account, and a sudden burst can be throttled well before sustained traffic approaches it. When you have a horizontal scale-out event (a traffic spike, a fresh deployment, or a scheduled job fleet starting), all your services request secrets at roughly the same instant.
The problem compounds when you have multiple secrets per service. A typical microservice might fetch a database password, an external API key, and a JWT signing secret separately. With 40 containers each making 3 calls, that's 120 API requests in under a second. For a busy account that already has background secret rotations and health checks running, this easily crosses the burst threshold.
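The arithmetic above generalizes to any fleet size. A minimal sketch, where `cold_start_calls` is a hypothetical helper name used only for illustration:

```python
def cold_start_calls(containers: int, secrets_per_container: int) -> int:
    """Total GetSecretValue calls if every container fetches every secret once."""
    return containers * secrets_per_container


# The example from the text: 40 containers, 3 secrets each
print(cold_start_calls(40, 3))  # 120
# The same fleet if each container fetched a single secret
print(cold_start_calls(40, 1))  # 40
```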
AWS throttles by rejecting the request with the error code ThrottlingException (an HTTP 400 response). If your application doesn't handle this gracefully, with retry logic and backoff, the startup sequence fails entirely.
Confirming Throttling Is the Actual Problem
Before changing anything, confirm the diagnosis. Misidentifying the root cause wastes time and can introduce unnecessary complexity.
Check CloudWatch metrics
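Navigate to CloudWatch and look at the AWS/Usage namespace, where Secrets Manager publishes API usage. The metric you want is CallCount, filtered to Service = Secrets Manager, Type = API, and Resource = GetSecretValue. Set the time window to match your last deployment or scale-out event. A spike in call volume that lines up with your incident timestamp is strong evidence; the rejected calls themselves should also show up in CloudTrail with an errorCode of ThrottlingException.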
Check application logs
The AWS SDK surfaces throttling as an exception. In Python with boto3, it looks like this:
botocore.errorfactory.ThrottlingException: An error occurred (ThrottlingException) when calling the GetSecretValue operation: Rate exceeded
Search your log aggregator (CloudWatch Logs, Datadog, or wherever you ship logs) for ThrottlingException combined with GetSecretValue. If these appear clustered at startup time across multiple instances, you've confirmed the cause.
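If you can export the relevant log lines, a short script makes the clustering visible. This is a rough sketch that assumes each line begins with an ISO-8601 timestamp; the sample lines, pod names, and the `throttles_per_minute` helper are hypothetical, so adjust the pattern to your own log format.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed line shape: an ISO-8601 timestamp first, then the message.
_MINUTE = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2})")


def throttles_per_minute(lines: Iterable[str]) -> Counter:
    """Count GetSecretValue throttling errors, bucketed by minute."""
    counts: Counter = Counter()
    for line in lines:
        if "ThrottlingException" not in line or "GetSecretValue" not in line:
            continue
        match = _MINUTE.match(line)
        bucket = match.group(1) if match else "unknown"
        counts[bucket] += 1
    return counts


_ERROR = ("An error occurred (ThrottlingException) when calling "
          "the GetSecretValue operation: Rate exceeded")

# Hypothetical sample data for illustration only
sample = [
    f"2024-05-01T12:00:03Z pod-a {_ERROR}",
    f"2024-05-01T12:00:04Z pod-b {_ERROR}",
    "2024-05-01T12:00:09Z pod-c Application started",
    f"2024-05-01T12:07:41Z pod-a {_ERROR}",
]

for minute, count in throttles_per_minute(sample).most_common():
    print(minute, count)
```

A minute with many hits across several instances, right after a deployment, points at startup throttling rather than steady-state load.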
Check Service Quotas
In the AWS console, go to Service Quotas and search for Secrets Manager. You'll see your current limits and whether you've requested increases. If your workload has grown substantially, a quota increase request is one part of the solution, but it shouldn't be the only one.
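The same check can be scripted. The sketch below assumes `client` is `boto3.client('service-quotas')` and that the service code for Secrets Manager is `secretsmanager`; verify both in your own account. The helper name `list_secrets_manager_quotas` is hypothetical.

```python
from typing import Any, Dict, List


def list_secrets_manager_quotas(client: Any) -> List[Dict[str, Any]]:
    """Return name/value pairs for the applied Secrets Manager quotas."""
    quotas: List[Dict[str, Any]] = []
    # 'secretsmanager' is the assumed service code
    kwargs: Dict[str, Any] = {"ServiceCode": "secretsmanager"}
    while True:
        page = client.list_service_quotas(**kwargs)
        for quota in page.get("Quotas", []):
            quotas.append({"name": quota["QuotaName"], "value": quota["Value"]})
        token = page.get("NextToken")
        if not token:
            return quotas
        # Follow pagination until the service stops returning a token
        kwargs["NextToken"] = token


# Example wiring (requires AWS credentials, so it is left commented out):
# import boto3
# for quota in list_secrets_manager_quotas(boto3.client("service-quotas")):
#     print(quota["name"], quota["value"])
```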
Fix 1: Add Retry Logic with Exponential Backoff
This is the minimum viable fix. The AWS SDK has built-in retry behavior, but the defaults may not be tuned for burst throttling at startup. Configure it explicitly.
import boto3
from botocore.config import Config

# Increase retries and use standard (exponential) backoff mode
client = boto3.client(
    'secretsmanager',
    region_name='us-east-1',
    config=Config(
        retries={
            'max_attempts': 10,
            'mode': 'standard'  # uses exponential backoff
        }
    )
)

response = client.get_secret_value(SecretId='prod/myapp/database')
Setting mode to 'standard' enables exponential backoff with jitter, which is important. If every container retries at the same intervals, you get synchronized retry storms that make throttling worse. The jitter breaks that pattern.
The 'adaptive' mode goes further: the client tracks throttling responses and pre-emptively slows its own request rate. For startup-heavy workloads, 'standard' is usually sufficient, but 'adaptive' is worth knowing about.
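To see why jitter matters, here is a sketch of the general "exponential backoff with full jitter" technique. It illustrates the idea only and is not a copy of botocore's internal retry code; the function name, base delay, and cap are assumptions chosen for the example.

```python
import random
from typing import List, Optional


def backoff_delays(attempts: int, base: float = 1.0, cap: float = 20.0,
                   rng: Optional[random.Random] = None) -> List[float]:
    """Delay before each retry, drawn uniformly from [0, min(cap, base * 2**n)]."""
    rng = rng or random.Random()
    delays = []
    for attempt in range(attempts):
        ceiling = min(cap, base * (2 ** attempt))
        delays.append(rng.uniform(0, ceiling))
    return delays


# Two containers that fail at the same moment draw different delays,
# so their retries no longer land on the API at the same instant.
print(backoff_delays(4, rng=random.Random(1)))
print(backoff_delays(4, rng=random.Random(2)))
```

The ceiling doubles on every attempt, while the random draw below that ceiling is what spreads retries from different containers apart.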
Fix 2: Cache Secrets In-Process
Retrying helps with occasional throttling, but the real fix is reducing the number of API calls in the first place. If your secrets don't change mid-run, there's no reason to call GetSecretValue more than once per process lifetime.
A simple module-level cache works well for most applications:
import json
import time

import boto3
from botocore.config import Config

# One shared client with explicit retry settings
_client = boto3.client(
    'secretsmanager',
    config=Config(retries={'max_attempts': 10, 'mode': 'standard'})
)

_cache = {}       # secret_id -> (parsed value, time fetched)
_CACHE_TTL = 300  # seconds (5 minutes)

def get_secret(secret_id: str) -> dict:
    now = time.monotonic()
    # Serve from the cache while the entry is still fresh
    if secret_id in _cache:
        value, fetched_at = _cache[secret_id]
        if now - fetched_at < _CACHE_TTL:
            return value
    # Cache miss or stale entry: fetch, parse, and store
    response = _client.get_secret_value(SecretId=secret_id)
    secret_string = response['SecretString']
    value = json.loads(secret_string)
    _cache[secret_id] = (value, now)
    return value
This gives you one API call on first access, then cached responses for five minutes. Adjust _CACHE_TTL based on how quickly you need to pick up rotated secrets. For most rotation schedules (30-day or weekly), even an hour-long TTL is safe.
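Two refinements are worth considering for multi-threaded services: a lock, so that concurrent callers at startup share one fetch, and a way to bypass the TTL when a rotated credential is suspected. The sketch below is one possible shape, not a definitive implementation; `SecretTTLCache` and its `fetch` callable are hypothetical names, and the fetch function is injected so the class works with any client.

```python
import threading
import time
from typing import Any, Callable, Dict, Tuple


class SecretTTLCache:
    """TTL cache around any fetch function, safe to share between threads."""

    def __init__(self, fetch: Callable[[str], Any], ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._lock = threading.Lock()
        self._entries: Dict[str, Tuple[Any, float]] = {}

    def get(self, secret_id: str, force_refresh: bool = False) -> Any:
        # Holding the lock during the fetch makes concurrent callers wait
        # for one API call instead of each issuing their own.
        with self._lock:
            entry = self._entries.get(secret_id)
            if entry is not None and not force_refresh:
                value, fetched_at = entry
                if self._clock() - fetched_at < self._ttl:
                    return value
            value = self._fetch(secret_id)
            self._entries[secret_id] = (value, self._clock())
            return value
```

A caller might pass `force_refresh=True` once after the database rejects a login, which picks up a rotated secret without waiting for the TTL to expire.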
Fix 3: Use the AWS Secrets Manager Agent or Caching Client
AWS maintains an official caching client for several languages. For Python, the package is aws-secretsmanager-caching. It implements an LRU cache with configurable TTLs and handles refresh logic for you.
pip install aws-secretsmanager-caching
import botocore
import botocore.session
from aws_secretsmanager_caching import SecretCache, SecretCacheConfig

botocore_session = botocore.session.get_session()
client = botocore_session.create_client('secretsmanager', region_name='us-east-1')

cache_config = SecretCacheConfig(
    max_cache_size=1000,
    exception_retry_delay_base=1,
    exception_retry_growth_factor=2,
    exception_retry_delay_max=3600,
    default_version_stage='AWSCURRENT',
    secret_refresh_interval=3600,  # refresh every hour
    secret_version_stage_refresh_interval=3600
)

cache = SecretCache(config=cache_config, client=client)

# This call hits the API once, then serves from cache
secret = cache.get_secret_string('prod/myapp/database')
The caching library is the most robust option if you want battle-tested refresh logic without writing it yourself. The key setting is secret_refresh_interval, which controls how long a cached secret is served before the library fetches a fresh copy.
Fix 4: Stagger Container Startup Timing
Even with caching, a brand-new container still makes one cold API call. When a scale-out event spins up 50 containers simultaneously, you get 50 near-simultaneous cold calls. Introduce startup jitter at the orchestration level to spread them out.
In ECS task definitions (using a startup delay script)
#!/bin/bash
# entrypoint.sh: add a random sleep before fetching secrets
SLEEP_TIME=$(( RANDOM % 10 ))
echo "Startup jitter: sleeping ${SLEEP_TIME}s"
sleep $SLEEP_TIME
exec python app.py
Ten seconds of random spread across 50 containers means roughly 5 containers starting per second, which is far easier for the API to absorb. This is a quick operational fix that pairs well with the caching strategies above.
In Kubernetes (EKS)
Use a Kubernetes initContainer that sleeps for a random duration before the main container starts. Or configure your Horizontal Pod Autoscaler to scale out more gradually, limiting how many pods come up in a single wave.
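Where changing the entrypoint or the orchestration config isn't practical, the same jitter can be applied inside the application before the first secret fetch. A minimal sketch, assuming a 10-second window like the shell example; `startup_jitter` is a hypothetical helper, and the `sleep` and `rng` parameters exist only so the function can be tested without waiting.

```python
import random
import time
from typing import Callable


def startup_jitter(max_seconds: float = 10.0,
                   sleep: Callable[[float], None] = time.sleep,
                   rng: Callable[[], float] = random.random) -> float:
    """Sleep for a random duration in [0, max_seconds) and return it."""
    delay = rng() * max_seconds
    sleep(delay)
    return delay


# Call once during initialization, before the first GetSecretValue call:
# startup_jitter(10.0)
```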
Fix 5: Reduce the Number of Secrets API Calls by Design
Sometimes the throttling problem is architectural. Services that fetch secrets one-by-one, or that re-fetch on every request handler invocation, create unnecessary load. A few design changes can cut call volume significantly.
- Consolidate secrets into a single JSON object. Instead of storing DB_HOST, DB_USER, and DB_PASS as three separate secrets, store a single secret with a JSON value containing all three. One API call replaces three.
- Fetch at startup, not at request time. Load secrets once during application initialization and store them in memory. Re-fetching on every HTTP request is a common antipattern.
- Use environment variable injection at deploy time for non-sensitive config. Not everything needs to be a secret. Database hostnames and port numbers can live in plain environment variables or SSM Parameter Store, reserving Secrets Manager for credentials only.
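The first two points can be combined in one small loader. This sketch assumes the consolidated secret is a JSON object with the keys DB_HOST, DB_USER, and DB_PASS named above; the `DatabaseSettings` class and `load_database_settings` function are hypothetical, and `client` is expected to be a Secrets Manager client.

```python
import json
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class DatabaseSettings:
    host: str
    user: str
    password: str


def load_database_settings(client: Any, secret_id: str) -> DatabaseSettings:
    """Fetch one JSON secret and unpack all three values from it."""
    response = client.get_secret_value(SecretId=secret_id)
    payload = json.loads(response["SecretString"])
    return DatabaseSettings(
        host=payload["DB_HOST"],
        user=payload["DB_USER"],
        password=payload["DB_PASS"],
    )


# Call once during initialization and keep the result in memory:
# import boto3
# SETTINGS = load_database_settings(
#     boto3.client("secretsmanager"), "prod/myapp/database")
```

Request handlers then read from the in-memory object, so serving traffic never triggers a Secrets Manager call.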
Common Pitfalls to Avoid
Setting TTL too short defeats the purpose. A 5-second cache TTL barely helps with a burst of 50 containers. Think in minutes, not seconds. Most secrets don't rotate faster than daily.
Sharing one IAM role across all services in an account. If all your microservices call Secrets Manager under the same role, it's harder to track which service is driving throttling. Use per-service IAM roles. It also limits the blast radius if a role is compromised.
Ignoring the SDK version. Older versions of boto3 and botocore have different retry defaults and may not support adaptive mode. Pin and update your SDK versions regularly.
Relying solely on a quota increase request. AWS will often approve a quota increase for Secrets Manager, but it takes time, and the underlying burst problem remains. Quota increases are a ceiling raise, not a fix. Use caching regardless.
Not testing throttling behavior locally. You can simulate throttling in your tests by mocking the SDK client to return a ThrottlingException on the first call, then a valid response. If your application crashes instead of retrying, you'll catch it before production does.
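One way to write that test is with `unittest.mock`. In this sketch, `fetch_with_retry` is a hypothetical application-level wrapper standing in for your own startup code; mocking the client method bypasses the SDK's built-in retries, so the test exercises only your handling. The fallback `ClientError` class is a stand-in so the example runs even where botocore isn't installed.

```python
import time
from typing import Any, Callable
from unittest import mock

try:
    from botocore.exceptions import ClientError
except ImportError:
    # Stand-in with the same constructor shape as botocore's ClientError
    class ClientError(Exception):
        def __init__(self, error_response, operation_name):
            super().__init__(operation_name)
            self.response = error_response


def fetch_with_retry(client: Any, secret_id: str, attempts: int = 3,
                     sleep: Callable[[float], None] = time.sleep) -> str:
    """Hypothetical wrapper: retry only when the error is ThrottlingException."""
    for attempt in range(attempts):
        try:
            return client.get_secret_value(SecretId=secret_id)["SecretString"]
        except ClientError as exc:
            code = exc.response.get("Error", {}).get("Code")
            if code != "ThrottlingException" or attempt == attempts - 1:
                raise
            sleep(2 ** attempt)
    raise RuntimeError("attempts must be at least 1")


def test_recovers_from_one_throttle() -> None:
    throttled = ClientError(
        {"Error": {"Code": "ThrottlingException", "Message": "Rate exceeded"}},
        "GetSecretValue",
    )
    client = mock.Mock()
    # First call raises, second call returns a valid response
    client.get_secret_value.side_effect = [throttled, {"SecretString": "s3cr3t"}]
    result = fetch_with_retry(client, "prod/myapp/database", sleep=lambda _: None)
    assert result == "s3cr3t"
    assert client.get_secret_value.call_count == 2
```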
Wrapping Up
Secrets Manager throttling at startup is a predictable scaling problem with well-established solutions. If you're hitting it right now, work through these steps in order:
- Confirm the diagnosis with CloudWatch metrics and application logs before making any changes.
- Configure SDK retry settings immediately. It's a one-line config change that prevents hard crashes from transient throttling.
- Add in-process caching using the AWS caching library or a simple TTL-based cache module in your application.
- Consolidate secrets into fewer, richer JSON objects to reduce the total number of API calls per service.
- Add startup jitter in your container entrypoint or orchestration config to spread cold-start API calls across time.
Once you've addressed the immediate problem, schedule a short architecture review to check whether all your services follow the same patterns. A throttling fix applied to one service won't help if ten others are still calling GetSecretValue on every request.