Tracing Memory Leaks in Long-Running Python Processes with tracemalloc

One of the most frustrating production issues in Python applications is a memory leak that appears slowly over time.

The application starts normally.

Everything looks healthy.

CPU usage remains stable.

Requests are processed successfully.

Then, hours or days later:

Memory Usage
↓
Keeps Growing
↓
Performance Degrades
↓
OOM Kill or Crash

The issue becomes even more challenging in long-running systems such as:

Django applications
Flask APIs
FastAPI services
Celery workers
Data pipelines
Background daemons
ETL processes
AI inference servers

Because the process rarely restarts, small memory leaks accumulate until they become serious operational problems.

Many developers assume:

Python Has Garbage Collection
↓
Memory Leaks Cannot Happen

Unfortunately, this is not true.

Garbage collection only removes objects that are no longer referenced.

If your application unintentionally retains references, memory usage can continue growing indefinitely.

This is where Python's built-in:

tracemalloc

module becomes invaluable.

It allows developers to track memory allocations, compare snapshots, and identify exactly where memory growth originates.

In this guide, you'll learn how to use tracemalloc effectively to diagnose memory leaks in production-grade Python applications.

What You Will Learn From This Article

After reading this guide, you'll understand:

What memory leaks look like in Python.
Why garbage collection doesn't prevent all leaks.
How tracemalloc works.
How to take and compare memory snapshots.
How to find allocation hotspots.
How to diagnose real-world leaks.
Which best practices to follow for memory monitoring.

What Is a Memory Leak?

A memory leak occurs when memory remains allocated longer than necessary.

Example:

cache = []

while True:
    cache.append(load_data())

Memory usage (illustrative, assuming each load_data() result is about 0.5 MB):

Iteration 1
↓
0.5 MB

Iteration 1000
↓
500 MB

Iteration 10000
↓
5 GB

The application never releases old data.

Memory continuously grows.

Why Python Applications Leak Memory

Common causes include:

Unbounded Caches

Data accumulates indefinitely.

Global Variables

Objects remain referenced forever.

Circular References

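Objects that reference each other are never freed by reference counting alone. They linger until the cyclic garbage collector runs, or indefinitely if collection is disabled.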

Event Listeners

Handlers accumulate over time.

Session Storage

User data remains in memory.

Queues

Items are never removed.

Background Workers

Temporary objects persist unexpectedly.

These problems are especially common in long-running services.
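The circular-reference case can be seen in a few lines. This is a minimal sketch, assuming CPython; the Node class is a hypothetical stand-in, and automatic collection is switched off only to make the timing visible.

```python
import gc
import weakref


class Node:
    """Hypothetical object that can point at another Node."""

    def __init__(self):
        self.partner = None


gc.disable()  # pause automatic cycle collection for this demonstration

a, b = Node(), Node()
a.partner, b.partner = b, a      # build a reference cycle
probe = weakref.ref(a)           # lets us check whether `a` is still alive

del a, b
# Reference counts never reach zero, so the cycle is still in memory.
alive_before_collect = probe() is not None

gc.collect()                     # the cycle collector finds and frees it
alive_after_collect = probe() is not None

gc.enable()
print(alive_before_collect, alive_after_collect)
```

In a service that allocates quickly, cycles like this can make memory look leaky between collector runs even though nothing is permanently retained.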

Why Garbage Collection Isn't Enough

Python's garbage collector removes objects that are no longer referenced.

Example:

data = load_data()

del data

Memory becomes eligible for cleanup.

However:

cache.append(data)

creates a reference.

Now:

Object Still Referenced
↓
Cannot Be Freed

Garbage collection cannot help.
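The difference between a dropped reference and a retained one can be checked directly. The sketch below assumes CPython and uses a hypothetical Payload class and a module-level cache list; weak references are used only as probes.

```python
import gc
import weakref


class Payload:
    """Hypothetical stand-in for any object the application loads."""


cache = []

# Case 1: the only reference is dropped, so the object can be freed.
temp = Payload()
temp_probe = weakref.ref(temp)
del temp
gc.collect()
freed_after_del = temp_probe() is None

# Case 2: a container still holds a reference, so nothing can free it.
kept = Payload()
kept_probe = weakref.ref(kept)
cache.append(kept)
del kept
gc.collect()
still_alive_in_cache = kept_probe() is not None

print(freed_after_del, still_alive_in_cache)
```

Calling gc.collect() makes no difference in the second case, because the object is still reachable through the list.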

Introducing tracemalloc

Python provides:

import tracemalloc

This module tracks:

Memory allocations
Allocation sources
File locations
Line numbers

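Unlike general memory monitoring tools, tracemalloc traces memory blocks allocated through Python's own allocators. Memory that a C extension obtains directly from the system allocator does not appear in its reports.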

Starting tracemalloc

Enable tracking:

import tracemalloc

tracemalloc.start()

From this point forward:

Memory Allocations
↓
Recorded

for later analysis.
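tracemalloc.start() also accepts the number of stack frames to store per allocation. The following sketch assumes a frame count of 10, which is an arbitrary choice; more frames give richer tracebacks at the cost of extra memory and CPU overhead.

```python
import tracemalloc

# Store up to 10 frames per allocation; the default is 1.
if not tracemalloc.is_tracing():
    tracemalloc.start(10)

tracing_enabled = tracemalloc.is_tracing()
frame_limit = tracemalloc.get_traceback_limit()
print(tracing_enabled, frame_limit)
```

Tracing can also be switched on without a code change by setting the PYTHONTRACEMALLOC environment variable or passing -X tracemalloc to the interpreter, which is convenient when allocations made during imports matter.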

Taking a Snapshot

Capture current memory state:

snapshot = tracemalloc.take_snapshot()

Think of a snapshot as:

Memory Photograph

at a specific moment.

Viewing Top Memory Consumers

Example:

top_stats = snapshot.statistics('lineno')

for stat in top_stats[:10]:
    print(stat)

Output may resemble:

app.py:42
50 MB

cache.py:18
35 MB

These lines indicate where memory was allocated.
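Put together, starting the tracer, taking a snapshot, and listing the top entries might look like the sketch below. The payload list is a hypothetical workload that allocates roughly 1 MB so that one line clearly dominates the report.

```python
import tracemalloc

tracemalloc.start()

# Hypothetical workload: about 1 MB of small bytes objects on one line.
payload = [bytes(1000) for _ in range(1000)]

snapshot = tracemalloc.take_snapshot()
top_stats = snapshot.statistics("lineno")  # sorted from largest to smallest

for stat in top_stats[:5]:
    frame = stat.traceback[0]
    print(f"{frame.filename}:{frame.lineno} "
          f"size={stat.size / 1024:.1f} KiB count={stat.count}")

tracemalloc.stop()
```

Each entry is a Statistic object, so size and count can be read programmatically instead of parsing the printed text.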

Why Snapshots Are Powerful

A single snapshot shows:

Current Memory State

Two snapshots reveal:

Memory Growth

which is often more useful.

Comparing Snapshots

Example:

snapshot1 = tracemalloc.take_snapshot()

# Run workload

snapshot2 = tracemalloc.take_snapshot()

Compare:

stats = snapshot2.compare_to(snapshot1, 'lineno')

Now you can see:

Which Lines Added Memory

during execution.

Example Output

Results may show:

cache.py:28
+120 MB

or:

worker.py:95
+50000 objects

These are strong indicators of leak locations.
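A complete comparison can be sketched as follows. The leaky_store list and handle_request function are hypothetical; they simulate a handler that retains about 1 KB per call.

```python
import tracemalloc

leaky_store = []  # hypothetical module-level list that is never cleared


def handle_request(i):
    # Each call retains about 1 KB that is never released.
    leaky_store.append(bytes(1024))


tracemalloc.start()
before = tracemalloc.take_snapshot()

for i in range(2000):
    handle_request(i)

after = tracemalloc.take_snapshot()

# Results are sorted by the absolute size difference, largest first.
diff = after.compare_to(before, "lineno")
for stat in diff[:3]:
    print(stat)

largest = diff[0]
print(largest.size_diff, largest.count_diff)
tracemalloc.stop()
```

The size_diff and count_diff attributes on each StatisticDiff hold the growth in bytes and in number of blocks, which is handy for alerting on thresholds.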

Finding Allocation Hotspots

Hotspots are areas where memory growth occurs repeatedly.

Example:

results.append(
    process(item)
)

If:

results

never clears,

memory usage grows continuously.

Snapshot comparisons quickly reveal this pattern.

Monitoring Long-Running Processes

Example workflow:

Application Starts
↓
Snapshot A
↓
1 Hour Later
↓
Snapshot B
↓
Compare

This identifies growth over time.

Particularly useful in:

APIs
Workers
Scheduled jobs
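One way to automate this workflow is a background thread that compares each new snapshot with the previous one. This is only a sketch: start_growth_monitor and its report callback are hypothetical names, and the 0.05 second interval exists purely so the demonstration finishes quickly. A real service would use something closer to the hourly interval described above.

```python
import threading
import time
import tracemalloc


def start_growth_monitor(interval_seconds, report, top_n=5):
    """Compare a fresh snapshot with the previous one every interval.

    `report` is any callable accepting a list of StatisticDiff objects.
    Returns the stop Event and the monitor thread.
    """
    stop = threading.Event()

    def loop():
        previous = tracemalloc.take_snapshot()
        # Event.wait returns False on timeout and True once stop is set.
        while not stop.wait(interval_seconds):
            current = tracemalloc.take_snapshot()
            report(current.compare_to(previous, "lineno")[:top_n])
            previous = current

    thread = threading.Thread(target=loop, daemon=True,
                              name="tracemalloc-monitor")
    thread.start()
    return stop, thread


tracemalloc.start()
reports = []
stop, thread = start_growth_monitor(0.05, reports.append)

retained = []  # simulated leak
for _ in range(5):
    retained.extend(bytes(1024) for _ in range(200))
    time.sleep(0.05)

stop.set()
thread.join()  # finish the monitor before tracing is switched off
tracemalloc.stop()
print(f"collected {len(reports)} growth reports")
```

Comparing against the previous snapshot highlights recent growth; comparing against a fixed baseline taken at startup would instead show cumulative growth.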

Real-World Leak Example

Problem:

processed_users = []

def handle_user(user):

    processed_users.append(user)

Initially:

100 Users
↓
Small Memory Usage

After weeks:

Millions of Users
↓
Massive Memory Usage

Snapshots reveal growth at:

processed_users.append()

as the list keeps resizing, along with growth at the lines that create the retained user objects.

Using Filters

Reduce noise:

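# filter_traces() returns a new snapshot; it does not modify the original.
# Patterns match the full file path, so a leading wildcard is needed.
snapshot = snapshot.filter_traces((
    tracemalloc.Filter(
        True,
        "*/myproject/*"
    ),
))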

Benefits:

Ignore third-party libraries
Focus on application code
Faster analysis

Especially useful in large projects.
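Filters can also exclude files rather than include them. The sketch below drops import machinery and tracemalloc's own bookkeeping, following the pattern shown in the standard library documentation; the data list is a hypothetical workload.

```python
import tracemalloc

tracemalloc.start()
data = [bytes(100) for _ in range(1000)]  # hypothetical workload
snapshot = tracemalloc.take_snapshot()
tracemalloc.stop()

filtered = snapshot.filter_traces((
    # A first argument of False makes the filter exclusive:
    # traces matching the pattern are dropped.
    tracemalloc.Filter(False, "<frozen importlib._bootstrap>"),
    tracemalloc.Filter(False, "<unknown>"),
    tracemalloc.Filter(False, tracemalloc.__file__),
))

for stat in filtered.statistics("filename")[:5]:
    print(stat)

remaining_bytes = sum(s.size for s in filtered.statistics("filename"))
```

Inclusive and exclusive filters can be combined in the same call, so one snapshot can be narrowed to application code while also hiding known noisy modules.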

Grouping Statistics

Different grouping options include:

statistics('lineno')

Line level.

statistics('filename')

File level.

statistics('traceback')

Full allocation path.

Each provides different debugging perspectives.

Investigating Tracebacks

Example:

statistics('traceback')

Shows:

Allocation Origin
↓
Function Chain
↓
Memory Usage

This helps locate leaks hidden behind multiple layers of abstraction.
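Traceback grouping is only informative when more than one frame is stored, so the tracer has to be started with a larger frame limit. In this sketch, build_record and load_batch are hypothetical functions that form a short call chain, and 25 frames is an arbitrary limit.

```python
import tracemalloc


def build_record():
    return bytes(10_000)


def load_batch(store):
    for _ in range(100):
        store.append(build_record())


store = []
tracemalloc.start(25)  # keep up to 25 frames per allocation
load_batch(store)
snapshot = tracemalloc.take_snapshot()
tracemalloc.stop()

# Allocations that share the same full call stack are grouped together.
biggest = snapshot.statistics("traceback")[0]
print(f"{biggest.count} blocks, {biggest.size / 1024:.1f} KiB")

traceback_lines = biggest.traceback.format()
for line in traceback_lines:
    print(line)
```

The formatted output lists the call chain that led to the allocation, which shows which caller triggered it rather than only the line that executed it.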

Measuring Current and Peak Memory

Example:

current, peak = tracemalloc.get_traced_memory()

Output:

Current:
150 MB

Peak:
400 MB

Useful for performance monitoring.

Resetting Peak Tracking

Example (requires Python 3.9 or later):

tracemalloc.reset_peak()

Allows:

Fresh Peak Measurements

during benchmarking.
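The two calls work together when measuring separate phases of a program. The sketch below uses hypothetical buffer sizes of about 5 MB and 100 KB to make the effect of the reset visible.

```python
import tracemalloc

tracemalloc.start()

# Phase 1: a temporary ~5 MB buffer pushes the peak up.
buffer = bytearray(5_000_000)
del buffer  # memory is released, but the peak still remembers it
current_1, peak_1 = tracemalloc.get_traced_memory()

# Phase 2: reset the peak so it reflects only what happens next.
tracemalloc.reset_peak()  # available since Python 3.9
small = bytearray(100_000)
current_2, peak_2 = tracemalloc.get_traced_memory()

tracemalloc.stop()
print(f"phase 1: current={current_1} peak={peak_1}")
print(f"phase 2: current={current_2} peak={peak_2}")
```

Both values are byte counts for memory traced by tracemalloc, so they will generally be lower than the process size reported by the operating system.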

Detecting Slow Memory Growth

Not all leaks are dramatic.

Example:

+1 MB Per Hour

appears harmless.

Yet:

24 MB Per Day
↓
720 MB Per Month

becomes significant.

Snapshot comparisons help identify slow leaks early.

Memory Leaks vs Memory Fragmentation

Important distinction:

Memory Leak

Memory usage continuously grows.

Fragmentation

Memory usage appears high even after objects are freed.

tracemalloc primarily helps identify allocation growth rather than OS-level fragmentation.

Integrating tracemalloc Into Testing

Example:

Run Test
↓
Snapshot Before
↓
Execute Logic
↓
Snapshot After
↓
Compare

Unexpected growth often reveals hidden issues before production deployment.
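The test workflow above can be sketched with unittest. The process_batch function is a hypothetical function under test, and the 500,000 byte tolerance is an assumed value that would need tuning for real code.

```python
import tracemalloc
import unittest


def process_batch(items):
    """Hypothetical function under test; it should not retain its input."""
    return sum(len(item) for item in items)


class MemoryGrowthTest(unittest.TestCase):
    MAX_GROWTH_BYTES = 500_000  # assumed tolerance; tune for your code

    def test_process_batch_does_not_retain_memory(self):
        tracemalloc.start()
        try:
            process_batch([bytes(1024)])  # warm-up run
            before = tracemalloc.take_snapshot()
            for _ in range(200):
                process_batch([bytes(1024) for _ in range(100)])
            after = tracemalloc.take_snapshot()
        finally:
            tracemalloc.stop()

        # Net growth across all files between the two snapshots.
        growth = sum(stat.size_diff
                     for stat in after.compare_to(before, "filename"))
        self.assertLess(growth, self.MAX_GROWTH_BYTES)
```

The loop allocates roughly 20 MB in total, so a function that retained its input would exceed the tolerance by a wide margin, while the snapshot bookkeeping itself stays well below it.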

Combining tracemalloc With Logging

Example:

logger.info(
    tracemalloc.get_traced_memory()
)

Periodic logging creates:

Memory Trend History

which helps correlate leaks with specific workloads.
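Logging the raw tuple works, but formatted values are easier to search and graph. A small helper along these lines is one option; log_traced_memory and the "memory" logger name are hypothetical.

```python
import logging
import tracemalloc

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("memory")


def log_traced_memory():
    """Log current and peak traced memory in MiB; return the byte counts."""
    current, peak = tracemalloc.get_traced_memory()
    logger.info("traced memory: current=%.2f MiB, peak=%.2f MiB",
                current / 2**20, peak / 2**20)
    return current, peak


tracemalloc.start()
retained = [bytes(1024) for _ in range(1000)]  # hypothetical workload
current, peak = log_traced_memory()
tracemalloc.stop()
```

Calling the helper from a scheduler or at the end of each job produces a series of log lines that can be lined up against request volume or job type.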

Best Practices Checklist

When debugging memory leaks:

✅ Enable tracemalloc early

✅ Take baseline snapshots

✅ Compare snapshots over time

✅ Monitor peak memory usage

✅ Filter irrelevant libraries

✅ Investigate large allocation deltas

✅ Review global state carefully

✅ Audit caches and queues

✅ Test long-running workloads

✅ Track memory trends continuously

Common Mistakes to Avoid

Avoid:

❌ Assuming garbage collection prevents leaks

❌ Relying only on OS memory metrics

❌ Taking only one snapshot

❌ Ignoring slow memory growth

❌ Overlooking global variables

❌ Using unbounded caches

❌ Forgetting to clear queues and buffers

Example Investigation Workflow

Production symptom:

API Memory
100 MB
↓
300 MB
↓
700 MB
↓
1.5 GB

Investigation:

Enable tracemalloc
↓
Take Snapshot
↓
Generate Traffic
↓
Take Second Snapshot
↓
Compare Results

Finding:

cache_manager.py:54
+800 MB

Root cause:

cache[key] = response

without expiration logic.

Fix:

Bounded Cache
↓
Stable Memory Usage

Problem resolved.
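The article does not specify how the cache was bounded. One common approach is a least-recently-used cache with a fixed entry limit, sketched below with a hypothetical BoundedCache class; a time-based expiry would be an equally valid fix.

```python
from collections import OrderedDict


class BoundedCache:
    """Least-recently-used cache with a fixed maximum number of entries."""

    def __init__(self, max_entries):
        self._data = OrderedDict()
        self._max_entries = max_entries

    def get(self, key, default=None):
        if key not in self._data:
            return default
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def set(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        while len(self._data) > self._max_entries:
            self._data.popitem(last=False)  # evict least recently used

    def __len__(self):
        return len(self._data)


cache = BoundedCache(max_entries=3)
for key in range(10):
    cache.set(key, f"response-{key}")

print(len(cache), cache.get(9), cache.get(0))
```

For caching the results of a pure function, functools.lru_cache with a maxsize argument gives the same bound without a custom class.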

Why tracemalloc Is Often Underused

Many developers immediately reach for:

External profilers
Container metrics
System monitoring tools

These tools show:

Memory Is Growing

but not:

Why Memory Is Growing

tracemalloc bridges that gap by connecting memory growth directly to source code.

Summary

Memory leaks in Python applications can be difficult to diagnose because they often develop gradually over hours, days, or weeks. Long-running services such as APIs, background workers, data pipelines, and machine learning systems are especially vulnerable because even small memory retention issues accumulate over time.

Python's built-in tracemalloc module provides a powerful way to identify the source of memory growth by tracking allocations, capturing snapshots, comparing memory states, and pinpointing the exact files and lines responsible for increasing memory consumption. Unlike system-level monitoring tools that only reveal that memory usage is rising, tracemalloc helps explain why it is happening.

By incorporating snapshot comparisons, memory trend analysis, allocation filtering, and regular monitoring into your debugging workflow, you can detect memory leaks early, maintain stable application performance, and prevent costly production incidents before they affect users.
