Diagnosing and Fixing N+1 Query Problems in SQLAlchemy ORM
Your endpoint feels fine in development with a handful of rows, but under real traffic the database is getting hammered with hundreds of tiny queries per request. The culprit is almost always an N+1 query problem, and SQLAlchemy's lazy-loading defaults make it easy to introduce one without noticing.
This article walks you through identifying the problem, understanding why SQLAlchemy generates those extra queries, and applying the right fix for your situation.
What you'll learn
- How to enable query logging to catch N+1 patterns in development
- Why lazy loading causes N+1 problems and when it's actually fine
- How to use
selectinload,joinedload, andsubqueryloadto eliminate excess queries - How to apply eager loading on a per-query basis without changing your model definitions
- Common mistakes that bring N+1 problems back after you've fixed them
Prerequisites
You should be comfortable writing SQLAlchemy ORM queries and have a basic understanding of Python. The examples use SQLAlchemy 2.x with the select() style API, but the loading strategies apply to 1.4 as well. A running PostgreSQL or SQLite database helps if you want to follow along.
What an N+1 Problem Actually Looks Like
The name comes from the query count: one query to fetch N parent rows, then one additional query for each of those rows to fetch related data. Fetch 100 posts and their authors, and you've just fired 101 queries instead of one or two.
Here's a minimal example. Say you have Post and Author models with a many-to-one relationship:
from sqlalchemy import Column, Integer, String, ForeignKey
from sqlalchemy.orm import relationship, DeclarativeBase
class Base(DeclarativeBase):
pass
class Author(Base):
__tablename__ = "authors"
id = Column(Integer, primary_key=True)
name = Column(String)
class Post(Base):
__tablename__ = "posts"
id = Column(Integer, primary_key=True)
title = Column(String)
author_id = Column(Integer, ForeignKey("authors.id"))
author = relationship("Author") # lazy="select" by default
Now you query all posts and print each author's name:
from sqlalchemy import select
posts = session.scalars(select(Post)).all()
for post in posts:
print(post.author.name) # triggers a new query for every post
SQLAlchemy fetches all posts in one query. Then, the moment you access post.author inside the loop, it fires a separate SELECT for each post's author. That's the N+1.
Turning On Query Logging to Confirm It
The fastest way to confirm you have an N+1 problem is to watch the SQL SQLAlchemy generates. Set the engine's echo flag to True during development:
from sqlalchemy import create_engine
engine = create_engine("postgresql+psycopg2://user:pass@localhost/mydb", echo=True)
Every SQL statement will print to stdout. If you see the same SELECT repeated with different primary key values, you've found your N+1. You can also route logging through Python's standard logging module if you want more control:
import logging
logging.basicConfig()
logging.getLogger("sqlalchemy.engine").setLevel(logging.INFO)
In production, consider using a query counting middleware or a profiling tool like SQLAlchemy's event system to track query counts per request without the full log noise.
Why Lazy Loading Is the Default
SQLAlchemy defaults to lazy="select" loading on relationships because it's the safest general choice. When you load a model, SQLAlchemy doesn't know whether you'll ever need the related objects. Fetching everything eagerly by default would make simple lookups wasteful.
Lazy loading is perfectly fine when you access a relationship on a single object. The problem appears when you iterate over a collection and touch the relationship for every item. That's when the query count multiplies.
Fixing It with selectinload
selectinload is usually the right default fix. It issues a second query that fetches all related objects in one shot using an IN clause, then maps them back to the parent objects in Python. No join required, and the result sets stay manageable.
from sqlalchemy.orm import selectinload
posts = session.scalars(
select(Post).options(selectinload(Post.author))
).all()
for post in posts:
print(post.author.name) # no additional queries fired here
SQLAlchemy now executes exactly two queries: one for posts, one for all the authors whose IDs appeared in the first result. It doesn't matter if you have 10 or 10,000 posts.
selectinload works well for both many-to-one and one-to-many relationships. It keeps query complexity low and avoids the row duplication issues you get with joins on collection relationships.
When to Use joinedload Instead
joinedload uses a SQL JOIN to fetch the parent and related rows in a single query. That sounds more efficient, but it comes with trade-offs you need to understand.
from sqlalchemy.orm import joinedload
posts = session.scalars(
select(Post).options(joinedload(Post.author))
).all()
For a many-to-one relationship (each post has one author), joinedload is a solid choice. The join produces exactly one row per post and there's no duplication.
For a one-to-many relationship (each author has many posts), the join multiplies rows. Fetch 5 authors, each with 20 posts, and the database returns 100 rows instead of 25. SQLAlchemy deduplicates them in Python, but you're still transferring more data over the wire. For large collections, selectinload is almost always better.
Use joinedload when the relationship is many-to-one or one-to-one and you're confident the result set won't balloon.
subqueryload: The Third Option
subqueryload issues a second query that embeds the original query as a subquery. It was more relevant in older SQLAlchemy versions; in most modern cases selectinload performs at least as well and generates simpler SQL. You may still encounter subqueryload in existing codebases, so it's worth knowing what it does.
from sqlalchemy.orm import subqueryload
posts = session.scalars(
select(Post).options(subqueryload(Post.comments))
).all()
If you're starting fresh, default to selectinload and only reach for subqueryload if profiling shows a reason.
Chaining Loaders for Nested Relationships
Real applications often have multiple levels of relationships. You can chain loaders to eagerly load nested associations without writing joins by hand.
from sqlalchemy.orm import selectinload
# Load posts -> comments -> comment authors
posts = session.scalars(
select(Post).options(
selectinload(Post.comments).selectinload(Comment.author)
)
).all()
Each level fires one additional query, so three levels deep means three queries total instead of exponentially more. Keep nesting in check β if you need four or five levels of eager loading, that's usually a sign the query or the data model needs rethinking.
Setting Load Strategies on the Model vs. Per Query
You can set a default loading strategy directly on the relationship definition:
from sqlalchemy.orm import relationship
class Post(Base):
__tablename__ = "posts"
id = Column(Integer, primary_key=True)
title = Column(String)
author_id = Column(Integer, ForeignKey("authors.id"))
author = relationship("Author", lazy="selectin") # always eager
This is convenient but blunt. Every query that touches Post will now eagerly load authors, even when you don't need them. That can create unnecessary overhead in queries where you only need post titles.
The better pattern for most applications is to keep the default lazy loading and apply eager loading explicitly in the queries that need it. That way you only pay for what you actually use.
Common Pitfalls That Bring N+1 Problems Back
Accessing relationships outside the session
If you load objects and then close the session before iterating, SQLAlchemy can't issue the lazy-load queries. You'll get a DetachedInstanceError. The fix is to eagerly load everything you need before the session closes, not to keep the session open longer than necessary.
Serializers triggering lazy loads
Libraries like Pydantic or Marshmallow will access every field you include in a schema. If your schema includes a nested relationship and you haven't eager-loaded it, the serializer fires a query per object. Add selectinload to any query feeding a serializer that touches relationships.
Forgetting about secondary relationships
You fix the first level but forget there's a second. Now you have a 1+N+NΒ² pattern. Use query logging every time you touch a new code path that involves relationships β don't assume a fix at one level covers the whole chain.
Mixing eager and lazy loading on the same relationship
If you set lazy="selectin" on the model but also call .options(lazyload(Post.author)) in a query, the per-query option wins. That's useful to know, but it also means a careless lazyload override can reintroduce the problem you just fixed.
Wrapping Up
N+1 query problems are quiet in development and painful in production. The steps to get ahead of them are straightforward:
- Enable SQL logging during development on any endpoint that reads collections with relationships. Count the queries.
- Apply
selectinloadas your default fix for most one-to-many and many-to-one relationships. Switch tojoinedloadonly for single-object lookups or simple many-to-one cases. - Chain loaders for nested relationships rather than hoping lazy loading will be fast enough.
- Audit your serializers β they're a common hidden trigger for lazy loads that don't show up until you profile a real request.
- Keep load strategies in queries, not model definitions, so you only pay for eager loading where it's actually needed.
With these habits in place, your SQLAlchemy app will scale to real data volumes without the database becoming the bottleneck.
π€ Share this article
Sign in to saveRelated Articles
Comments (0)
No comments yet. Be the first!