Debugging Memory Leaks in Python Long-Running Services

May 11, 2026

Your service starts fine, but twelve hours later the container is eating 4 GB of RAM and climbing. A restart fixes it temporarily, but the leak is still there, waiting. Finding the culprit in a Python process that's been running for hours is genuinely tricky: the usual print-and-run loop doesn't apply.

This article gives you a systematic approach: the right tools, the right mental model, and concrete patterns to isolate the object that's quietly accumulating in memory.

What you'll learn

  • How Python's garbage collector works and why objects still leak
  • How to use tracemalloc to capture allocation snapshots
  • How to identify growing object counts with objgraph
  • Common leak patterns: caches, closures, global state, and circular references
  • How to add lightweight memory monitoring to a production service

Prerequisites

You'll need Python 3.6 or later. The built-in tracemalloc module ships with Python, so no install needed there. Install the third-party tools with:

pip install objgraph psutil memory-profiler

Some examples assume a long-running HTTP service, but the techniques apply equally to queue workers, schedulers, or any process that runs for hours.

Why Python Still Leaks Despite Garbage Collection

Python uses reference counting as its primary memory management strategy. When an object's reference count drops to zero, it's freed immediately. The cyclic garbage collector handles circular references that reference counting can't resolve on its own.
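
You can watch both mechanisms in a few lines; this is just an illustration of the model, not something you would ship:

import gc
import sys

a = []
b = [a]
a.append(b)                # a and b now reference each other: a cycle

print(sys.getrefcount(a))  # the count includes the temporary reference made by the call itself

del a, b                   # reference counting alone cannot free the cycle
print(gc.collect())        # the cyclic collector reclaims it; prints how many objects it found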

So where do leaks come from? Usually one of these places:

  • An object is still referenced by something you forgot about: a list, a dict, a module-level variable.
  • A __del__ method exists on one or more objects in a reference cycle, which historically prevented collection (fixed in Python 3.4+, but still a code smell).
  • A C extension doesn't release memory correctly.
  • You're growing a cache or a list indefinitely without bounding it.

The mental model to hold: a memory leak in Python is almost always a reference leak. Something is holding a reference to an object longer than it should. Your job is to find out what.

Establishing a Baseline with tracemalloc

Before you can find a leak, you need a before-and-after snapshot of memory allocations. tracemalloc is the right tool for this.

import tracemalloc

tracemalloc.start()

# Simulate your service doing work
do_some_work()

snapshot = tracemalloc.take_snapshot()
top_stats = snapshot.statistics("lineno")

for stat in top_stats[:10]:
    print(stat)

Each line of output shows a file and line number together with how much memory was allocated there, sorted largest first. For a leak hunt, though, comparing two snapshots is more useful than a single one.

import tracemalloc

tracemalloc.start()

# Take a snapshot before a suspected leak window
snapshot1 = tracemalloc.take_snapshot()

for _ in range(1000):
    do_work_cycle()

# Take a snapshot after
snapshot2 = tracemalloc.take_snapshot()

top_stats = snapshot2.compare_to(snapshot1, "lineno")
for stat in top_stats[:15]:
    print(stat)

The compare_to output shows you the delta: the lines that allocated the most new memory between the two snapshots. This narrows your search considerably.
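
One refinement worth knowing: the comparison can be dominated by allocations from the import machinery rather than your own code. Following the pattern in the tracemalloc documentation, you can filter those frames out of both snapshots before comparing:

import tracemalloc

def clean(snapshot):
    # Drop frames from the import machinery, which otherwise clutter the diff
    return snapshot.filter_traces((
        tracemalloc.Filter(False, "<frozen importlib._bootstrap>"),
        tracemalloc.Filter(False, "<unknown>"),
    ))

top_stats = clean(snapshot2).compare_to(clean(snapshot1), "lineno")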

Counting Live Objects with objgraph

objgraph works at a higher level. Instead of tracking allocations by source line, it counts live Python objects by type. This is excellent for catching the pattern where you keep creating instances of something and never destroying them.

import objgraph

# Show the 15 types with the most instances
objgraph.show_most_common_types(limit=15)

Run this at startup, then again after a few work cycles. If dict, list, or one of your own classes keeps climbing, you've found your candidate type.
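
objgraph can also do the before-and-after diff for you: show_growth() remembers the counts from its previous call and prints only the types whose counts increased. A sketch, reusing the do_work_cycle placeholder from earlier:

import objgraph

objgraph.show_growth(limit=10)   # first call establishes the baseline (its output is just the current counts)

for _ in range(1000):
    do_work_cycle()

objgraph.show_growth(limit=10)   # second call prints only the types whose counts grew, with the delta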

Once you have a suspect type, you can ask objgraph to show you what's holding references to instances of it:

import objgraph

# Get a sample of live MyWorker instances
objs = objgraph.by_type("MyWorker")

# Show what references the first one
objgraph.show_backrefs(objs[0], max_depth=3, filename="backrefs.png")

This generates a reference graph image. Follow the arrows backward and you'll find the root object keeping your instance alive. It's rarely more than three or four hops away.
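
Rendering the image requires Graphviz on the machine. When that's not available (on a production pod, say), find_backref_chain gives you similar information as a plain list of objects, tracing a shortest reference chain between a module and your instance:

import objgraph

objs = objgraph.by_type("MyWorker")
if objs:
    chain = objgraph.find_backref_chain(objs[0], objgraph.is_proper_module)
    for obj in chain:
        print(type(obj).__name__)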

Common Leak Patterns and How to Spot Them

Unbounded Caches

The most common leak in production services. A dict is used as a cache, keys are added on every request, and nothing ever evicts them. After a day of traffic, that dict has millions of entries.

# BAD: grows forever
_cache = {}

def get_user(user_id):
    if user_id not in _cache:
        _cache[user_id] = fetch_from_db(user_id)
    return _cache[user_id]

The fix is to use functools.lru_cache with a maxsize, or a proper LRU structure like cachetools.LRUCache.

from functools import lru_cache

@lru_cache(maxsize=1024)
def get_user(user_id):
    return fetch_from_db(user_id)
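
If entries should also expire by age (user records that can change, for example), cachetools covers that too. A sketch using its TTL cache, again with fetch_from_db as the placeholder:

from cachetools import TTLCache, cached

# Bounded at 1024 entries; each entry is also evicted after 5 minutes
@cached(cache=TTLCache(maxsize=1024, ttl=300))
def get_user(user_id):
    return fetch_from_db(user_id)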

Event Listeners and Callbacks Not Cleaned Up

When you register a callback with an event system, the event emitter holds a reference to your callback. If the callback is a bound method, it keeps the entire object alive. This is a classic leak in GUI frameworks, but it also shows up in async services and plugin architectures.

# BAD: the emitter keeps processor alive via the callback reference
class RequestProcessor:
    def __init__(self, event_bus):
        event_bus.on("request", self.handle)

    def handle(self, event):
        pass

# BETTER: unregister when done
class RequestProcessor:
    def __init__(self, event_bus):
        self._bus = event_bus
        event_bus.on("request", self.handle)

    def shutdown(self):
        self._bus.off("request", self.handle)
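
If you can't guarantee that shutdown() is always called, another option is to register a weak reference to the bound method, so the subscription alone doesn't keep the processor alive. A sketch against the same hypothetical event_bus API as above:

import weakref

class RequestProcessor:
    def __init__(self, event_bus):
        handle_ref = weakref.WeakMethod(self.handle)

        def dispatch(event):
            handler = handle_ref()    # becomes None once the processor has been collected
            if handler is not None:
                handler(event)

        event_bus.on("request", dispatch)

    def handle(self, event):
        pass

The small dispatch closure stays registered on the bus, but it no longer pins the whole processor in memory; a bus that supports weak subscribers natively is cleaner still.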

Closures Capturing Large Objects

A closure keeps a reference to every variable in its enclosing scope, even if you only need one small value from it. This can accidentally keep a large object alive long past its useful life.

# BAD: the lambda captures the entire 'data' dict
def make_handler(data):
    return lambda: process(data["key"])

# BETTER: capture only what you need
def make_handler(data):
    key_value = data["key"]
    return lambda: process(key_value)

Circular References with __del__

Circular references alone are handled by the cyclic GC. But if any object in the cycle defines __del__, Python historically couldn't collect the cycle safely. Python 3.4+ improved this, but defining __del__ is still worth auditing. Prefer context managers and explicit cleanup over finalizers.
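
A sketch of the preferred shape, with open_socket standing in for whatever resource you actually acquire:

class Connection:
    def __init__(self, address):
        self._sock = open_socket(address)   # placeholder for your real resource

    def close(self):
        self._sock.close()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()

# Cleanup happens at a known point, even if the body raises; no reliance on the GC
with Connection("db:5432") as conn:
    ...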

Adding Memory Monitoring to a Production Service

You can't run objgraph interactively on a production pod. Instead, emit memory metrics on a regular interval so you can spot a leak trend before it becomes an incident.

import os
import threading
import time
import psutil

def memory_monitor(interval_seconds=60):
    process = psutil.Process(os.getpid())
    while True:
        mem = process.memory_info()
        rss_mb = mem.rss / (1024 * 1024)
        # Replace this with your metrics client (Prometheus, Datadog, etc.)
        print(f"RSS memory: {rss_mb:.1f} MB")
        time.sleep(interval_seconds)

monitor_thread = threading.Thread(target=memory_monitor, daemon=True)
monitor_thread.start()

Track RSS (Resident Set Size) over time. A healthy service has a stable RSS after its warm-up period. A service with a leak shows a steady upward slope. If you see that slope, that's your cue to enable tracemalloc and take snapshots.

Using a Diagnostic Endpoint

In a Flask or FastAPI service, a hidden diagnostic endpoint lets you trigger a memory snapshot on demand without redeploying. Lock it down with an internal-only middleware or a secret header.

import tracemalloc
from fastapi import FastAPI, Header, HTTPException

app = FastAPI()
tracemalloc.start()

@app.get("/internal/memory-snapshot")
def memory_snapshot(x_internal_token: str = Header(None)):
    if x_internal_token != "your-secret-token":
        raise HTTPException(status_code=403)

    snapshot = tracemalloc.take_snapshot()
    top_stats = snapshot.statistics("lineno")
    return {
        "top_allocations": [
            {"location": str(stat.traceback), "size_kb": stat.size / 1024}
            for stat in top_stats[:20]
        ]
    }

Call this endpoint before and after a suspected leak window, then diff the two responses. This is far more targeted than trying to reproduce the leak locally.

Common Pitfalls

  • Profiling in development only: Many leaks only appear under real traffic volumes or after hours of runtime. Reproduce the conditions, not just the code path.
  • Confusing RSS with actual leaks: Python keeps a pool of free memory available for reuse. RSS can appear high even after objects are freed. Use object-count tools like objgraph to confirm that objects are actually accumulating, not just that RSS is large.
  • Overlooking third-party libraries: Your code may be clean, but a library you depend on could hold internal caches or connection pools that grow over time. Check library-specific documentation for bounding options.
  • Assuming async code is leak-free: Async tasks that are spawned and never awaited or cancelled keep their coroutine frames, and everything those frames reference, alive until they finish. Store a reference to every task and await or cancel it when you're done with it (see the sketch after this list).
  • Forgetting thread-local storage: Data stored in threading.local() is scoped to a thread's lifetime. In a thread-pool-based server, threads live for the process lifetime, so any data stored in thread-locals accumulates.
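
For the async pitfall, the pattern recommended in the asyncio documentation is to keep every task in a set and discard it when it completes, so nothing is dropped prematurely and nothing accumulates:

import asyncio

background_tasks = set()

def spawn(coro):
    task = asyncio.create_task(coro)
    background_tasks.add(task)                        # strong reference for the task's lifetime
    task.add_done_callback(background_tasks.discard)  # drop it when done, so the set stays bounded
    return task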

Wrapping Up

Memory leaks in long-running Python services are almost always reference leaks; something is holding onto an object that should have been released. The diagnostic path is repeatable: measure baseline RSS, take allocation snapshots with tracemalloc, confirm object growth with objgraph, and trace back the reference chain.

Concrete next steps to take right now:

  1. Add RSS monitoring to your service using psutil and emit the metric to your existing observability stack.
  2. Wrap your main work loop in a tracemalloc compare-snapshot block in a staging environment and run it for at least 1000 cycles.
  3. Run objgraph.show_most_common_types() at startup and after your warm-up phase; flag any type whose count keeps climbing.
  4. Audit every module-level dict or list that could act as a cache; add a maxsize or an explicit eviction strategy.
  5. Review any async tasks you create to confirm they are always awaited or cancelled before the owning object is destroyed.
