Cutting AWS Lambda Cold Starts in Python Without Provisioned Concurrency
Your Lambda function works perfectly in local tests, but the first request after a quiet period takes two seconds longer than every request that follows. Users notice. Alarms fire. The usual answer, Provisioned Concurrency, adds a line item to your AWS bill that runs 24/7 whether you need it or not. You can do better before reaching for that switch.
Cold starts are not a mystery. They follow predictable rules, and Python gives you several specific levers to pull. This article walks through each one in practical terms.
What You'll Learn
- Why Python cold starts are slower than other runtimes and what drives that cost
- How to restructure imports and module-level code for faster initialization
- How to reduce your deployment package size to speed up container loading
- How to reuse execution context across invocations to avoid repeat work
- How to use Lambda SnapStart and warm-up patterns as lower-cost alternatives
Why Python Cold Starts Are Notably Slow
When Lambda has no warm container ready for your function, it spins up a new execution environment. That process involves downloading your deployment package, initializing the runtime, and running all your module-level code before the handler even starts. For Python, that last step is expensive.
Python imports are not just declarations. When you write `import boto3`, Python executes that module's top-level code, builds internal data structures, and registers class hierarchies. A standard data-science Lambda with NumPy, Pandas, and boto3 can spend over a second in imports alone before your handler runs a single line.
The runtime itself (CPython) is also slower to start than compiled runtimes like Go or Java's GraalVM native image. You cannot change the runtime's fundamental behavior, but you can control what happens on top of it.
Understand What Your Cold Start Is Actually Doing
Before optimizing blindly, measure. Add an init_start timestamp at the very top of your module and an init_end timestamp just before your handler definition. Log both inside the handler on first invocation.
```python
import time

_INIT_START = time.perf_counter()

import boto3
import json
# ... other imports ...

_INIT_END = time.perf_counter()

def handler(event, context):
    if not hasattr(handler, "_logged_init"):
        print(f"INIT_DURATION_MS={((_INIT_END - _INIT_START) * 1000):.1f}")
        handler._logged_init = True
    # your actual logic
```
This tells you how long module initialization takes separately from handler execution. You can also read the REPORT line in CloudWatch Logs, which includes an Init Duration field on cold start invocations. Correlate the two to see exactly where time is going.
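The REPORT line is plain text, so a small parser makes it easy to pull Init Duration out of exported logs. This helper and its regex are illustrative, not an AWS-provided API:

```python
import re

def parse_init_duration_ms(report_line):
    """Extract the Init Duration (in ms) from a Lambda REPORT log line.

    Returns None for warm invocations, whose REPORT lines omit the field.
    """
    match = re.search(r"Init Duration: ([\d.]+) ms", report_line)
    return float(match.group(1)) if match else None

cold = "REPORT RequestId: abc Duration: 12.3 ms Billed Duration: 13 ms Init Duration: 842.51 ms"
warm = "REPORT RequestId: def Duration: 11.9 ms Billed Duration: 12 ms"
print(parse_init_duration_ms(cold))  # 842.51
print(parse_init_duration_ms(warm))  # None
```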
Defer Imports You Do Not Always Need
The single highest-impact change is moving imports inside the function that actually needs them, when those imports are only used conditionally or infrequently.
```python
# Before: always imported at module level
import pandas as pd
import numpy as np

def handler(event, context):
    action = event.get("action")
    if action == "generate_report":
        df = pd.DataFrame(...)
        # ...
    elif action == "ping":
        return {"statusCode": 200, "body": "ok"}
```

```python
# After: import only when the code path actually runs
def handler(event, context):
    action = event.get("action")
    if action == "generate_report":
        import pandas as pd  # only imported on this branch
        df = pd.DataFrame(...)
        # ...
    elif action == "ping":
        return {"statusCode": 200, "body": "ok"}
```
Python caches imported modules in `sys.modules`, so after the first invocation that hits the heavy branch, re-running the import statement is just a dictionary lookup. And the cold start for any invocation that does not hit that branch becomes much faster.
This pattern is not always clean. If your handler always uses a library, deferring its import just moves the cost without eliminating it. Save this technique for optional or path-specific dependencies.
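The caching behavior is easy to verify: after the first function-local import, the module sits in `sys.modules` and subsequent imports are dictionary lookups. A minimal sketch, using the stdlib `json` as a stand-in for a heavy library:

```python
import sys

def handler_like(event):
    # Function-local import: executes the module only on the first call
    import json  # stand-in for a heavy dependency like pandas
    return json.dumps(event)

handler_like({"a": 1})
assert "json" in sys.modules  # cached: later calls skip module execution
handler_like({"b": 2})        # this import is just a sys.modules lookup
```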
Trim Your Deployment Package
Lambda cold start time includes the time to load and decompress your deployment package. A 50 MB compressed zip loads noticeably faster than a 250 MB one. Here are the most effective reductions.
Strip unused sub-packages from large libraries
Some libraries ship with far more data than you need. botocore, which boto3 is built on, bundles JSON service definitions for every AWS service. If you only use S3 and DynamoDB, you can strip the unused service data directories at build time. (Note that boto3-stubs provides type hints for development only; it is not a lighter runtime replacement.) Also remember that the Lambda Python runtime includes boto3 preinstalled, so you can often omit it from your package entirely, at the cost of pinning to whatever SDK version Lambda ships.
Alternatively, use botocore directly for specific services instead of pulling in the full boto3 surface. For AWS SDK calls you control tightly, the savings can be significant.
Use Lambda Layers strategically
Splitting large, stable dependencies into a Lambda Layer does not by itself speed up cold starts (Lambda still loads the layer). However, it separates fast-changing application code from slow-changing dependencies, which lets you cache and reuse layers more effectively and keeps your main package small for iteration.
Exclude dev and test artifacts
Your requirements.txt probably includes pytest, black, mypy, and similar tools. Make sure your build step installs only production dependencies into the deployment artifact.
```bash
# Install only production deps into a target directory.
# pip has no --no-dev flag; keep production deps in their own requirements file.
pip install \
    --target ./package \
    -r requirements-prod.txt

# Then zip only that directory plus your source
cd package && zip -r9 ../function.zip . && cd ..
zip -g function.zip lambda_function.py
```
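Over-trimming can remove a transitive dependency you actually need, so it is worth smoke-testing imports against the trimmed artifact before deploying. A minimal, hypothetical checker; populate `REQUIRED` with your function's real runtime imports and run it with the packaged directory on `sys.path`:

```python
import importlib

def missing_modules(required):
    """Return the subset of module names that fail to import."""
    missing = []
    for name in required:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing

# Example with stdlib stand-ins; use your function's real dependencies.
REQUIRED = ["json", "decimal"]
assert missing_modules(REQUIRED) == []
```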
Keep Expensive Initialization Out of the Handler
Lambda reuses the execution environment for subsequent invocations. Any work done at module level, outside the handler function, runs once per container, not once per request. Use this to avoid repeating expensive setup on every call.
```python
import boto3
import os

# Runs once per container lifetime
_dynamodb = boto3.resource("dynamodb")
_table = _dynamodb.Table(os.environ["TABLE_NAME"])
_config = None  # lazy-loaded below

def _load_config():
    """Fetch config from Parameter Store; cached after first call."""
    ssm = boto3.client("ssm")
    response = ssm.get_parameter(Name=os.environ["CONFIG_PARAM"], WithDecryption=True)
    return response["Parameter"]["Value"]

def handler(event, context):
    global _config
    if _config is None:
        _config = _load_config()  # only happens on first invocation per container
    # use _config and _table normally
```
The DynamoDB resource is initialized at module load time, so connection setup happens once per container. The SSM config fetch is deferred to first invocation but cached after that. Neither repeats on subsequent warm calls.
Do not put network calls that can fail at module level: a failure there causes Lambda to report an initialization error and discard the container, triggering another cold start on the retry. The lazy initialization pattern in `_load_config` is safer.
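If several resources need the same treatment, the pattern generalizes to a tiny memoizing helper. This is a hypothetical utility, not part of any Lambda SDK:

```python
def lazy(factory):
    """Wrap a zero-argument factory so it runs at most once per container.

    Exceptions propagate to the invocation that triggered them, so a
    transient failure does not kill the execution environment the way a
    module-level failure would; the next invocation simply retries.
    """
    cache = []

    def get():
        if not cache:
            cache.append(factory())
        return cache[0]

    return get

# Usage sketch: the lambda stands in for an SSM or Secrets Manager fetch.
get_config = lazy(lambda: {"feature_flag": True})
```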
Choose the Right Memory Setting
Lambda allocates CPU proportional to memory. A function configured at 128 MB gets a small fraction of a vCPU; a full vCPU arrives at roughly 1,769 MB. Python's import machinery is CPU-bound, so doubling memory from 128 MB to 256 MB often cuts import time by 30-40%. Because you pay for memory times duration, the shorter duration offsets much of the higher per-millisecond price.
Run a quick test: deploy the same function at 128 MB, 256 MB, and 512 MB, trigger cold starts, and compare Init Duration in the logs. The inflection point varies by package size, but moving off the minimum memory allocation is almost always a win for Python functions with meaningful dependencies.
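The billing arithmetic behind that claim is worth spelling out: Lambda charges per GB-second, so cost is memory times duration, and doubling memory is cost-neutral whenever it at least halves duration. The numbers below are illustrative, not a measured benchmark:

```python
def gb_seconds(memory_mb, duration_ms):
    """Billable GB-seconds for one invocation (ignoring per-request fees)."""
    return (memory_mb / 1024) * (duration_ms / 1000)

# If doubling memory halves the duration, the GB-second cost is unchanged:
assert gb_seconds(128, 1000) == gb_seconds(256, 500)

# A 40% duration cut at double the memory costs ~20% more GB-seconds,
# but every cold start is 400 ms faster:
print(gb_seconds(128, 1000))  # 0.125
print(gb_seconds(256, 600))   # 0.15
```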
Use ARM64 (Graviton2) Architecture
Lambda supports both x86_64 and arm64 architectures. Graviton2 functions often show measurably shorter cold start durations for Python workloads and cost less per GB-second. The change is a single setting in your function configuration and requires building your deployment package for the arm64 target.
```bash
# Build for arm64 on an x86 machine using Docker.
# --entrypoint overrides the image's default Lambda runtime entrypoint
# so the container runs pip instead of looking for a handler.
docker run --rm \
    --platform linux/arm64 \
    --entrypoint pip \
    -v "$(pwd)":/var/task \
    public.ecr.aws/lambda/python:3.12-arm64 \
    install -r requirements-prod.txt -t /var/task/package
```
Check that all your dependencies have ARM wheels available before switching. Most popular packages (NumPy, Pandas, boto3) do. Pure-Python packages work without any changes.
Lambda SnapStart for Python (and Warm-Up Patterns)
AWS SnapStart was initially only available for Java runtimes; support has since been extended to newer Python runtimes (Python 3.12 and later at the time of writing). SnapStart takes a snapshot of the initialized execution environment and restores from it rather than re-initializing from scratch, which can eliminate most cold start overhead. Check the current AWS documentation for whether your target Python runtime version supports it, and note the pricing: unlike the Java version, SnapStart for Python is billed for snapshot caching and restores.
If SnapStart is not yet available for your runtime, the classic alternative is a scheduled warm-up ping. An EventBridge (formerly CloudWatch Events) rule triggers your function every few minutes with a known payload that bypasses real work.
```python
def handler(event, context):
    # Short-circuit on warm-up pings
    if event.get("source") == "warmup":
        return {"statusCode": 200, "body": "warm"}
    # ... real logic follows
```
This keeps at least one container alive between real requests. It is not the most elegant solution, and it does nothing for traffic spikes that need multiple concurrent containers. Think of it as a last resort, not a primary strategy.
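For reference, wiring the schedule in AWS SAM looks roughly like this. It is a sketch: the function name `ApiFunction` and the five-minute rate are placeholders for your own values.

```yaml
# Fragment of template.yaml -- schedules a warm-up ping every 5 minutes
Resources:
  ApiFunction:
    Type: AWS::Serverless::Function
    Properties:
      Handler: lambda_function.handler
      Events:
        WarmUp:
          Type: Schedule
          Properties:
            Schedule: rate(5 minutes)
            Input: '{"source": "warmup"}'
```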
Common Pitfalls to Avoid
- Putting SDK clients inside the handler body unconditionally. Each invocation reconstructs the client and discards the warm HTTPS connection pool, forcing repeated credential resolution and TLS handshakes. Move clients to module level.
- Importing heavy libraries at the top of utility modules that are imported at startup. Your thin `utils.py` might import `pandas`, and importing `utils` drags it in even when you do not need it.
- Assuming a cold start only happens on first deploy. Lambda scales horizontally. Any new container, for any concurrent invocation, is a cold start. Optimize for frequency, not just for the initial deploy.
- Over-trimming packages and causing import errors at runtime. Always test your trimmed deployment package in a staging environment before production. A missing transitive dependency is harder to debug than a slow cold start.
- Ignoring the `Init Duration` field in logs. Many teams optimize handler duration while the initialization cost sits hidden in the log report. Read those fields; they tell you exactly what users experience on cold starts.
Wrapping Up
Cold start reduction in Python Lambda is a series of small wins that compound. No single change eliminates the problem, but combining them can realistically cut your cold start time in half without any change to your billing tier. Here are the concrete next steps to take:
- Add init timing instrumentation to your function today and measure `Init Duration` in CloudWatch before making any changes.
- Move any conditional or infrequently-used imports inside the code paths that actually need them.
- Audit your deployment package: strip dev dependencies, check for unused large packages, and get your compressed artifact under 10 MB if possible.
- Move all SDK client construction and one-time setup to module level, with lazy loading for calls that can fail.
- Benchmark the same function at 256 MB and 512 MB memory versus your current setting, and consider switching to the `arm64` architecture.
Once you have applied all of these, re-measure. If you are still seeing cold starts that are too slow for your SLA, then Provisioned Concurrency is a reasonable next step β but at that point you will be paying for it on a much leaner, faster function.