Tracing Memory Leaks in Long-Running Python Processes with tracemalloc
Your Python service has been running fine for hours, then an alert fires: memory usage is climbing and not coming back down. You restart the process, the problem disappears, and you're left with no idea what actually happened. That's a memory leak, and without the right tools it's one of the most frustrating bugs to pin down.
Python's standard library ships a module called tracemalloc that records exactly where each allocation was made. No third-party dependencies, no recompilation, no guesswork. This guide walks through using it effectively in production-grade processes.
What you'll learn
- How tracemalloc works and what it actually tracks
- Taking snapshots and comparing them to isolate growth over time
- Filtering and sorting results to find the worst offenders
- Integrating leak detection into a long-running service without killing performance
- Common patterns that cause leaks in Python and how to recognize them
Prerequisites
You need Python 3.4 or later; tracemalloc has been part of the standard library since then. The examples assume a basic comfort with Python classes and decorators. No special setup is required beyond a working Python interpreter.
How tracemalloc Works
tracemalloc hooks into CPython's memory allocator at the C level. When you call tracemalloc.start(), Python begins recording a traceback for every allocation made from that point forward. By default each traceback holds only the most recent frame; pass a frame count to start() to keep more of the call stack. You can then ask for a snapshot of all currently live allocations, complete with the file, line number, and recorded call stack that created each one.
This is meaningfully different from a tool like memory_profiler, which reports the process's overall memory usage rather than individual allocations. tracemalloc tracks individual memory blocks and the source lines that allocated them. If a dictionary is leaking, you'll see the exact line of code that created it.
The tradeoff is overhead. Recording full tracebacks for every allocation is not free. In a tight loop allocating millions of small objects, you'll notice the slowdown. For most I/O-bound or moderately CPU-bound services, the cost is acceptable, especially when you limit the traceback depth.
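As a rough illustration of the depth tradeoff, the sketch below starts tracing with a small frame limit and reports how much memory tracemalloc itself is using for bookkeeping. The depth of 5 and the throwaway workload are arbitrary choices for the demo, not recommendations from the library.

```python
import tracemalloc

# Keep only 5 frames per allocation (illustrative value; the default is 1).
tracemalloc.start(5)

# Throwaway workload so there is something to trace.
payload = [str(i) * 10 for i in range(1000)]

frame_limit = tracemalloc.get_traceback_limit()          # frames stored per trace
tracer_overhead = tracemalloc.get_tracemalloc_memory()   # bytes used by tracemalloc itself
current, peak = tracemalloc.get_traced_memory()          # bytes in traced blocks

print(f"frames per trace: {frame_limit}")
print(f"tracemalloc bookkeeping: {tracer_overhead / 1024:.1f} KiB")
print(f"traced memory: current={current / 1024:.1f} KiB, peak={peak / 1024:.1f} KiB")

tracemalloc.stop()
```

Comparing the bookkeeping figure at two different depths under your own workload is a quick way to judge whether a deeper traceback is affordable.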
Taking Your First Snapshot
The minimal workflow is: start tracing, do some work, take a snapshot, inspect the top allocations.
import tracemalloc
tracemalloc.start() # begin recording allocations
# --- your application code runs here ---
data = [bytearray(1024) for _ in range(10_000)]
# ------------------------------------
snapshot = tracemalloc.take_snapshot()
top_stats = snapshot.statistics("lineno")
for stat in top_stats[:10]:
    print(stat)
The statistics("lineno") call groups allocations by source line. Each item in the list tells you the file, line number, total bytes currently allocated, and the number of memory blocks still allocated from that line. Running this in an interactive session is a good way to get comfortable with the output format before you try it in production.
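If you would rather work with the numbers than with the printed string, each Statistic exposes them as attributes. The following is a minimal sketch that pulls the fields out of the top entries; the workload and the row layout are illustrative assumptions.

```python
import tracemalloc

tracemalloc.start()
data = [bytearray(1024) for _ in range(1_000)]  # sample workload
snapshot = tracemalloc.take_snapshot()
tracemalloc.stop()

rows = []
for stat in snapshot.statistics("lineno")[:5]:
    # With "lineno" grouping each traceback holds a single frame.
    frame = stat.traceback[0]
    rows.append((frame.filename, frame.lineno, stat.size, stat.count))

for filename, lineno, size, count in rows:
    print(f"{filename}:{lineno}  {size / 1024:.1f} KiB in {count} blocks")
```

Having the values as plain tuples makes it easy to send them to a metrics backend instead of parsing log text later.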
The Diff Pattern: Finding What's Growing
A single snapshot shows you what's allocated right now, but it doesn't tell you what's growing. For a long-running process, you want to compare two snapshots taken at different points in time. Anything that appears in the second snapshot but not the first, or that grew significantly, is a candidate for investigation.
import time
import tracemalloc
_retained = []  # stand-in state so this example shows growth
def process_one_batch():
    # placeholder: replace with your real function
    _retained.append(bytearray(1024))
tracemalloc.start(25)  # keep up to 25 frames of traceback
# snapshot before your work begins
snap1 = tracemalloc.take_snapshot()
# simulate the process doing work over time
for cycle in range(100):
    process_one_batch()
    time.sleep(0.1)
# snapshot after
snap2 = tracemalloc.take_snapshot()
# compare: a positive size_diff means snap2 has MORE allocated from that line
stats = snap2.compare_to(snap1, "lineno")
for stat in stats[:15]:
    print(stat)
The compare_to result is sorted by the absolute value of the size difference, largest first, so the biggest changes appear at the top; entries with a positive difference are the growers. A line that shows up at the top of this list with positive growth every time you run a diff is a reliable signal that something there is not being released.
Reading the Output
A typical line from compare_to(..., "lineno") looks like this:
/app/worker.py:142: size=4802 KiB (+4801 KiB), count=9823 (+9822), average=501 B
Break it down: size is the total bytes currently held by memory blocks allocated on that line. The value in parentheses is the change since the earlier snapshot. count is the number of live memory blocks that came from that line, again with its change in parentheses. average is the mean size per block.
A high count with a small average often points to a container (list, dict, set) accumulating small items and never clearing them. A low count with a large size often points to a buffer or cache that isn't being evicted. Both patterns are worth investigating but require different fixes.
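The count-versus-average heuristic can be written down as a small helper. This is only a sketch: classify_growth is a hypothetical function, and both thresholds are arbitrary values you would tune for your own service. It takes the size_diff and count_diff values that a diff entry exposes.

```python
def classify_growth(size_diff: int, count_diff: int,
                    many_blocks: int = 1_000,
                    large_average: int = 64 * 1024) -> str:
    """Label a diff entry using the count/average heuristic.

    Both thresholds are illustrative assumptions, not library defaults.
    """
    if size_diff <= 0:
        return "not growing"
    if count_diff <= 0:
        # Size grew without new blocks; the heuristic does not apply.
        return "unclear: inspect the traceback"
    average = size_diff / count_diff
    if average >= large_average:
        return "few large blocks: look for a buffer or cache"
    if count_diff >= many_blocks:
        return "many small blocks: look for a growing container"
    return "unclear: inspect the traceback"


# Values taken from the sample output line above.
print(classify_growth(4801 * 1024, 9822))
# A hypothetical entry with two very large new blocks.
print(classify_growth(8 * 1024 * 1024, 2))
```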
Filtering to Cut Through the Noise
The raw output frequently includes a lot of standard library noise: internal list resizes, import machinery, codec buffers. You almost always want to filter this down to your own code.
import tracemalloc
tracemalloc.start(10)
# ... run your code ...
snap = tracemalloc.take_snapshot()
# only show allocations from your own package
filters = [
    tracemalloc.Filter(inclusive=True, filename_pattern="/app/*"),
]
filtered = snap.filter_traces(filters)
stats = filtered.statistics("traceback")
for stat in stats[:5]:
    print("=" * 60)
    print(f"{stat.count} allocations, {stat.size / 1024:.1f} KiB total")
    # format() returns the stored frames as printable lines
    for line in stat.traceback.format():
        print(line)
Using "traceback" instead of "lineno" groups by the full recorded call stack, which is slower to compute but gives you the context you need to understand why a particular line is being called so often. The traceback.format() output closely resembles a Python exception traceback, so it's immediately readable.
You can also use inclusive=False filters to exclude known-noisy paths like */site-packages/* or the standard library.
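A minimal sketch of the exclusion approach follows. The patterns are examples only; the two angle-bracket names are pseudo-filenames that commonly appear for import machinery and for allocations with no known source, and you would adjust the list to whatever noise shows up in your own output.

```python
import tracemalloc

tracemalloc.start(10)
data = [bytearray(256) for _ in range(1_000)]  # sample workload
snap = tracemalloc.take_snapshot()
tracemalloc.stop()

# inclusive=False drops traces whose most recent frame matches the pattern.
exclude_noise = [
    tracemalloc.Filter(inclusive=False, filename_pattern="*/site-packages/*"),
    tracemalloc.Filter(inclusive=False, filename_pattern="<frozen importlib._bootstrap>"),
    tracemalloc.Filter(inclusive=False, filename_pattern="<unknown>"),
]
quiet = snap.filter_traces(exclude_noise)

print(f"traces before: {len(snap.traces)}, after: {len(quiet.traces)}")
for stat in quiet.statistics("lineno")[:5]:
    print(stat)
```

Exclusion lists tend to be easier to maintain than inclusion lists when your code is spread across several top-level directories.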
Integrating tracemalloc into a Running Service
You don't want to manually trigger snapshots in production. A better pattern is to build periodic diffing into your service itself, writing the results to a log or metrics backend.
import tracemalloc
import logging
import threading
import time
logger = logging.getLogger(__name__)
_baseline_snapshot = None
def start_memory_tracing():
    global _baseline_snapshot
    tracemalloc.start(15)
    _baseline_snapshot = tracemalloc.take_snapshot()
    t = threading.Thread(target=_memory_report_loop, daemon=True)
    t.start()
def _memory_report_loop():
    global _baseline_snapshot
    while True:
        time.sleep(300)  # report every 5 minutes
        snap = tracemalloc.take_snapshot()
        filters = [tracemalloc.Filter(inclusive=True, filename_pattern="/app/*")]
        stats = snap.filter_traces(filters).compare_to(
            _baseline_snapshot.filter_traces(filters), "lineno"
        )
        for stat in stats[:10]:
            if stat.size_diff > 50 * 1024:  # only log if >50 KiB growth
                logger.warning(
                    "Memory growth detected",
                    extra={"location": str(stat.traceback), "size_diff_kb": stat.size_diff // 1024},
                )
        # roll the baseline forward so you see incremental growth
        _baseline_snapshot = snap
Rolling the baseline forward on each cycle lets you see which allocations are consistently growing across intervals rather than just a one-time spike. If the same line shows up as the top grower in five consecutive 5-minute windows, that's your leak.
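One way to automate the "same line in consecutive windows" check is a small streak counter. This is a sketch, not part of tracemalloc: GrowthStreaks and its methods are hypothetical names, and the second location in the sample data is made up for the demo. In a real service you would feed it the str(stat.traceback) values that exceeded your threshold in each interval.

```python
from collections import Counter


class GrowthStreaks:
    """Count how many consecutive intervals each location has shown growth."""

    def __init__(self):
        self._streaks = Counter()

    def update(self, growing_locations):
        growing = set(growing_locations)
        # A location that did not grow this interval loses its streak.
        for location in list(self._streaks):
            if location not in growing:
                del self._streaks[location]
        for location in growing:
            self._streaks[location] += 1

    def suspects(self, min_streak=5):
        return sorted(loc for loc, n in self._streaks.items() if n >= min_streak)


tracker = GrowthStreaks()
windows = [
    ["/app/worker.py:142", "/app/api.py:10"],
    ["/app/worker.py:142"],
    ["/app/worker.py:142", "/app/api.py:10"],
    ["/app/worker.py:142"],
    ["/app/worker.py:142"],
]
for window in windows:
    tracker.update(window)

print(tracker.suspects())
```

A location that grows only intermittently never builds a streak, which filters out one-off spikes such as cache warm-up.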
Common Leak Patterns in Python
Unbounded caches and registries
Global dictionaries used as caches are one of the most common culprits. If you keep adding entries to a dict and never evict old ones, it will grow without bound. Switch to functools.lru_cache with a maxsize, or use the third-party cachetools.TTLCache if you need time-based expiry.
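A minimal sketch of the bounded-cache fix using functools.lru_cache is shown below. The load_profile function and the bound of 256 are hypothetical; pick a size that matches your working set.

```python
from functools import lru_cache


@lru_cache(maxsize=256)  # illustrative bound; least recently used entries are evicted
def load_profile(user_id: int) -> dict:
    # Stand-in for an expensive lookup (database, HTTP call, and so on).
    return {"id": user_id, "name": f"user-{user_id}"}


# Simulate many distinct keys, as a long-running service would see.
for uid in range(10_000):
    load_profile(uid)

info = load_profile.cache_info()
print(f"entries held: {info.currsize} of max {info.maxsize}")
```

The cache_info() counters are also worth exporting as metrics, since a very low hit rate suggests the cache is costing memory without saving work.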
Circular references with __del__
CPython frees most objects through reference counting, which cannot reclaim reference cycles on its own; a separate cyclic garbage collector (the gc module) handles those. Objects that define __del__ methods and participate in reference cycles used to be uncollectable before Python 3.4. Even in modern Python, objects in a cycle stay alive until the cyclic collector runs. Use weakref.ref or weakref.WeakValueDictionary when a parent object needs to refer to something that also refers back to it.
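The sketch below shows one way to apply the weakref advice to a parent/child structure. Node is a hypothetical class; the cyclic collector is disabled during the demo only to show that reference counting alone is enough once the back-reference is weak.

```python
import gc
import weakref


class Node:
    def __init__(self, name, parent=None):
        self.name = name
        self.children = []
        # Hold the parent weakly so child -> parent does not form a strong cycle.
        self._parent = weakref.ref(parent) if parent is not None else None
        if parent is not None:
            parent.children.append(self)

    @property
    def parent(self):
        # Returns None once the parent has been freed.
        return self._parent() if self._parent is not None else None


gc.disable()  # demo only: rely on reference counting alone
root = Node("root")
child = Node("child", parent=root)
root_probe = weakref.ref(root)

del root  # last strong reference to the parent goes away
freed_without_gc = root_probe() is None
gc.enable()

print(f"parent freed without the cyclic collector: {freed_without_gc}")
print(f"child.parent is now: {child.parent}")
```

The cost of this design is that callers must handle parent returning None, so it fits best where the parent owns the child's lifetime.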
Event listeners and callbacks never removed
If you register a callback on an event emitter, message broker, or signal dispatcher, the emitter holds a reference to your callback. That callback often closes over a large object. If you never deregister, neither the callback nor the closed-over objects can be collected. Always pair connect/subscribe calls with a corresponding disconnect/unsubscribe in a finally block or context manager.
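A context manager is one way to guarantee the unsubscribe call. In this sketch EventEmitter is a hypothetical minimal emitter standing in for whatever event system you use, and subscribed is a made-up helper name.

```python
from contextlib import contextmanager


class EventEmitter:
    """Hypothetical minimal emitter, used only for this demo."""

    def __init__(self):
        self._listeners = []

    def subscribe(self, callback):
        self._listeners.append(callback)

    def unsubscribe(self, callback):
        self._listeners.remove(callback)

    def emit(self, event):
        for callback in list(self._listeners):
            callback(event)


@contextmanager
def subscribed(emitter, callback):
    emitter.subscribe(callback)
    try:
        yield callback
    finally:
        # Runs on normal exit and on exceptions, so the reference is always dropped.
        emitter.unsubscribe(callback)


emitter = EventEmitter()
received = []

with subscribed(emitter, received.append):
    emitter.emit("tick")

emitter.emit("ignored")  # no listeners remain at this point
print(received, len(emitter._listeners))
```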
Thread-local storage growing across requests
In web frameworks that reuse threads across requests, threading.local() data accumulates per-thread. If you append to a thread-local list on each request and never clear it, that list grows for the lifetime of the thread. Explicitly reset thread-local state at the start or end of each request handler.
Large objects kept alive by exception tracebacks
Python tracebacks hold references to the local variables in every frame on the call stack at the time an exception was raised. If you catch an exception and store it (e.g., for later logging), every local in every frame it passed through stays alive. Use traceback.format_exc() to store the string representation instead, then let the exception object go.
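The following sketch stores only the formatted text of a failure. The risky and run_job functions and the 10 MiB buffer are made up for the demo; the point is that the failures list holds strings, not exception objects.

```python
import traceback

failures = []  # holds formatted text only


def risky(buffer):
    raise ValueError("bad input")


def run_job():
    big_buffer = bytearray(10 * 1024 * 1024)  # large local we do not want kept alive
    try:
        risky(big_buffer)
    except ValueError:
        # format_exc() returns a str, so no frame or local is retained.
        failures.append(traceback.format_exc())


run_job()
print(failures[0].splitlines()[-1])
```

Appending the exception object itself would instead keep its traceback, the frames it references, and therefore big_buffer alive for as long as the list exists.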
Gotchas to Watch For
The overhead of tracemalloc scales with the traceback depth you request. A depth of 25 frames can add meaningful latency in tight loops. Start with a depth of 5 to 10 for production use and only increase it when you're actively hunting a specific leak.
tracemalloc only tracks memory allocated through CPython's own allocator APIs. Memory that C extensions obtain directly from the system allocator (as some database drivers and image processing libraries do) may not show up in the output at all. If you've ruled out pure-Python leaks and RSS is still growing, you may need a tool like gdb or valgrind to inspect the C heap.
Also note that Python's memory allocator doesn't always return memory to the OS immediately after freeing objects. RSS can look like it's leaking when Python has simply grown its internal free-list. Use tracemalloc data to confirm that the number of live objects is growing, not just the process RSS.
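To check whether traced memory really is growing, you can compare tracemalloc.get_traced_memory() readings instead of watching RSS. The sketch below uses an arbitrary workload to show the traced total rising while objects are referenced and falling once they are released.

```python
import tracemalloc

tracemalloc.start()
baseline, _ = tracemalloc.get_traced_memory()

retained = [bytearray(1024) for _ in range(1_000)]  # still referenced
after_alloc, _ = tracemalloc.get_traced_memory()

del retained  # last reference dropped, blocks are freed
after_free, _ = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"baseline:    {baseline / 1024:.1f} KiB")
print(f"after alloc: {after_alloc / 1024:.1f} KiB")
print(f"after free:  {after_free / 1024:.1f} KiB")
```

If the traced total returns close to its baseline while RSS stays high, the allocator is likely holding freed memory rather than your code leaking it.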
Wrapping Up
Memory leaks in long-running Python processes are fixable once you can see exactly what's accumulating and where it was created. Here are the concrete steps to take from here:
- Add tracemalloc.start(10) at your service's entry point and take a baseline snapshot at startup before any request handling begins.
- Build a background thread that diffs against the baseline every few minutes and logs results above a size threshold you define.
- Filter traces to your own package paths so you're not drowning in standard library noise.
- When you identify a suspicious line, switch to "traceback" grouping and increase the frame depth temporarily to see the full call path.
- Work through the common patterns above (unbounded caches, unremoved listeners, thread-local accumulation) and check whether any of them match what your trace data is pointing at.
Once you've fixed a suspected leak, leave tracing on for at least one full production cycle and verify that the growth has actually stopped. A flat diff across five consecutive intervals is as close to confirmation as you'll get.