Fixing Django ORM Queries That Ignore Database Indexes at Scale
Your Django app runs fine with a few thousand rows. Then your dataset hits a few million and suddenly a simple list view is timing out. You check the logs, find a query taking four seconds, and wonder what changed. Usually nothing changed β you just grew into a problem that was always there.
The root cause, more often than not, is a query that performs a full table scan instead of using an index. Django's ORM makes it easy to write queries without thinking about the underlying SQL, which means it's equally easy to write queries that silently ignore every index you've defined.
- How database indexes work and why the ORM can miss them
- How to inspect query plans with
EXPLAINand Django's debug tools - The most common ORM patterns that prevent index use
- How to add targeted indexes and rewrite queries to use them
- How to catch these problems before they reach production
Prerequisites
This guide assumes you're running Django 3.2 or later with a PostgreSQL or MySQL backend. The concepts apply equally to both, but the EXPLAIN output shown here is PostgreSQL-flavored. You should be comfortable reading basic SQL and have access to your database shell or a tool like DBeaver.
Why the ORM Sometimes Skips Your Indexes
An index is only useful when the database's query planner decides using it is cheaper than scanning every row. The planner makes that decision based on statistics about your data β cardinality, row count, value distribution. The ORM has no visibility into any of this. It just generates SQL.
Three things commonly push the planner away from your index. First, you wrap a column in a function or expression, which means the index on the raw column value is useless. Second, you filter on a column with very low cardinality (a boolean, a status enum with two values) and the planner correctly decides a full scan is faster. Third, and most relevant to the ORM, you filter in a way that doesn't match the column type exactly β a string compared to an integer, or a case-insensitive lookup on a case-sensitive index.
Inspecting What the ORM Actually Generates
Before you can fix anything, you need to see what SQL Django is producing and whether the planner is using your indexes.
Using str() on a QuerySet
The fastest way to see the SQL for a queryset is to call str() on it before it evaluates:
from myapp.models import Order
qs = Order.objects.filter(status="pending", created_at__gte="2024-01-01")
print(str(qs.query))
This prints the raw SQL Django will execute. It won't show you the query plan, but it tells you whether your .filter() calls are producing the WHERE clause you expect.
Running EXPLAIN ANALYZE
To see whether PostgreSQL is using an index, you need the query plan. Use Django's database connection to run EXPLAIN ANALYZE directly:
from django.db import connection
with connection.cursor() as cursor:
cursor.execute(
"EXPLAIN ANALYZE SELECT * FROM myapp_order WHERE status = %s AND created_at >= %s",
["pending", "2024-01-01"]
)
rows = cursor.fetchall()
for row in rows:
print(row[0])
Look for Seq Scan in the output. That means a full table scan. Index Scan or Index Only Scan means the index is being used. Bitmap Heap Scan is a middle ground the planner uses when it expects to return a significant fraction of rows.
Common ORM Patterns That Break Index Usage
Filtering on a Function or Expression
This is the most common culprit. If you write a filter like this:
User.objects.filter(email__icontains="example.com")
Django generates SQL that calls LOWER(email) under the hood for a case-insensitive search. Your index on the raw email column is not a LOWER(email) index, so the planner can't use it. You end up scanning every row.
The fix is a functional index that matches exactly what the query does:
from django.db import models
from django.db.models.functions import Lower
class User(models.Model):
email = models.EmailField()
class Meta:
indexes = [
models.Index(Lower("email"), name="user_email_lower_idx"),
]
Run makemigrations and migrate, then check your query plan again. The planner will now find the functional index and use it for __icontains and __iexact lookups.
Filtering on a JSON Field
Django's JSONField lookups are powerful but easy to misuse. A filter like this will never use a standard B-tree index:
Product.objects.filter(metadata__category="electronics")
For PostgreSQL, you need a GIN index on the JSON column, or a generated column with an index if you always filter on the same key:
class Product(models.Model):
metadata = models.JSONField()
class Meta:
indexes = [
models.Index(
fields=["metadata"],
name="product_metadata_gin_idx",
# Requires psycopg2 and a custom Index class for GIN
)
]
For production workloads where you always filter on a specific JSON key, the more reliable approach is to extract that key into a dedicated database column with a proper index. It adds a bit of schema complexity but the query performance improvement is significant.
Ordering Without a Matching Index
Ordering by a column that isn't indexed forces the database to sort every row it retrieves. With millions of rows and pagination, this is expensive. If you commonly do:
Order.objects.filter(user=user).order_by("-created_at")
Then you want a composite index on (user_id, created_at) in descending order:
from django.db.models import Index
class Order(models.Model):
user = models.ForeignKey(User, on_delete=models.CASCADE)
created_at = models.DateTimeField(auto_now_add=True)
class Meta:
indexes = [
Index(fields=["user", "-created_at"], name="order_user_created_idx"),
]
The - prefix in the fields list tells Django to create a descending index on that column. PostgreSQL can use this for both ORDER BY created_at DESC and ORDER BY created_at ASC (by scanning the index backwards), but match the direction to your most common query for best results.
Using OR Filters Carelessly
Combining OR conditions with Django's Q objects can produce SQL that the planner can't satisfy with a single index scan:
from django.db.models import Q
Order.objects.filter(Q(status="pending") | Q(status="processing"))
For simple cases like this, rewrite it as an __in lookup instead. The planner handles it much better:
Order.objects.filter(status__in=["pending", "processing"])
The SQL changes from two OR conditions to a single IN (...) clause, which PostgreSQL can satisfy with a single index scan on status.
The N+1 Problem Is an Index Problem Too
N+1 queries are often framed as a lazy-loading issue, but they're also an index issue. If you access a related object in a loop without select_related or prefetch_related, you're firing one query per row. Each of those queries may be individually fast if the foreign key is indexed, but at scale you're still making thousands of round trips.
# Bad: fires one query per order
for order in Order.objects.filter(status="pending"):
print(order.user.email) # N extra queries
# Good: one query with a JOIN
for order in Order.objects.filter(status="pending").select_related("user"):
print(order.user.email)
select_related works for ForeignKey and OneToOne fields. Use prefetch_related for ManyToMany or reverse ForeignKey relationships. Both require that the joining columns are indexed, which Django handles automatically for primary keys and foreign keys.
Partial Indexes for High-Cardinality Filters
If you frequently filter on a small subset of rows β active users, open orders, unprocessed jobs β a partial index can dramatically reduce the index size and make scans faster:
class Job(models.Model):
STATUS_CHOICES = [("pending", "Pending"), ("done", "Done")]
status = models.CharField(max_length=20, choices=STATUS_CHOICES)
created_at = models.DateTimeField(auto_now_add=True)
class Meta:
indexes = [
models.Index(
fields=["created_at"],
name="job_pending_created_idx",
condition=models.Q(status="pending"),
)
]
This creates an index that only includes rows where status = 'pending'. If 95% of your rows are done, this index is much smaller and faster than a full index on created_at. Django exposes partial indexes through the condition argument on models.Index.
Common Pitfalls to Watch For
Don't add indexes speculatively. Every index slows down writes because the database must update the index on every INSERT, UPDATE, and DELETE. Profile first, then add the index that solves the specific slow query you've identified.
Statistics can be stale. PostgreSQL's autovacuum runs ANALYZE periodically to update statistics, but on very busy tables it may lag behind. If you've added an index and the planner still ignores it, run ANALYZE myapp_order; manually and check the plan again.
Composite index column order matters. An index on (user_id, created_at) is useful for queries that filter on user_id alone or on both user_id and created_at. It is not useful for queries that filter only on created_at. Put the most selective column, or the one you filter on most frequently alone, first.
Django's db_index=True is not always enough. Setting db_index=True on a field creates a single-column B-tree index. If your queries filter on multiple columns together, you need a composite models.Index defined in Meta.indexes.
Don't trust development data for query plan tests. A planner decision made on a 500-row development database will likely differ from one made on a 50-million-row production database. Always verify query plans in a staging environment loaded with production-scale data.
Automating Detection in CI
Catching slow queries before they reach production is much cheaper than fixing them after. A few practical approaches:
- Enable
log_min_duration_statementin PostgreSQL to log any query over a threshold (e.g., 100ms) in staging. Review the logs after each deployment. - Use
django-silkordjango-debug-toolbarduring development to surface slow queries and N+1 patterns on every page load. - Write tests that assert query count for critical views using Django's
assertNumQueriescontext manager. If a refactor adds unexpected queries, the test fails.
from django.test import TestCase
from myapp.models import Order, User
class OrderListQueryTest(TestCase):
def test_order_list_query_count(self):
user = User.objects.create_user(username="test", password="pass")
Order.objects.bulk_create(
[Order(user=user, status="pending") for _ in range(10)]
)
with self.assertNumQueries(2): # one for orders, one for user
list(Order.objects.filter(status="pending").select_related("user"))
Wrapping Up
Slow queries at scale almost always have a structural cause you can fix. Here are concrete next steps to take today:
- Pick your three slowest queries from production logs and run
EXPLAIN ANALYZEon each. Look forSeq Scanon large tables. - Audit your models for
__icontains,__iexact, and JSON field filters that lack matching functional or GIN indexes. - Replace any
Q(field=a) | Q(field=b)patterns withfield__in=[a, b]where the values are equivalent. - Add
select_relatedorprefetch_relatedto any queryset that accesses related objects in a loop. - Add
assertNumQueriestests for your most critical views so query regressions fail loudly in CI before they reach production.
π€ Share this article
Sign in to saveRelated Articles
Comments (0)
No comments yet. Be the first!