Fixing PostgreSQL EXPLAIN ANALYZE That Shows Seq Scan Despite an Index
You created an index on a column, ran EXPLAIN ANALYZE, and PostgreSQL is still doing a full sequential scan. The index is right there β you can see it in \d tablename β but the planner is ignoring it. This is one of the most common PostgreSQL debugging sessions developers go through, and there are almost always fixable causes.
Why PostgreSQL Ignores Your Index (And It's Not a Bug)
PostgreSQL's query planner is cost-based. It estimates the cost of every available plan and picks the cheapest one. A sequential scan reads pages in order from disk and is extremely efficient when it needs to touch a large fraction of the table. An index scan has overhead: it reads the index tree, then jumps around the heap to fetch rows. If the planner calculates that the index path costs more, it won't use it β and sometimes it's right.
The trick is distinguishing when the planner is making a correct choice from when it's working from bad information. The six reasons below cover both cases.
What You'll Learn
- How to read the critical numbers in
EXPLAIN ANALYZEoutput - The six most common reasons PostgreSQL skips an index
- How to fix stale statistics, type mismatches, and low-selectivity columns
- How to safely force index usage for diagnostic purposes
- When a sequential scan is genuinely the right answer
Prerequisites
You need access to a PostgreSQL database (version 12 or later covers everything here), and you should be able to run EXPLAIN ANALYZE and \d tablename in psql or a query tool like pgAdmin. No extensions are required.
Reading the EXPLAIN ANALYZE Output First
Before chasing a fix, make sure you're reading the plan correctly. Run your query with both EXPLAIN ANALYZE and the BUFFERS option:
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT *
FROM orders
WHERE customer_id = 42;
The line you care about looks like this when it's doing a sequential scan:
Seq Scan on orders (cost=0.00..4821.00 rows=1 width=128)
(actual time=0.041..18.432 rows=1 loops=1)
Filter: (customer_id = 42)
Rows Removed by Filter: 99999
Pay attention to three numbers: the estimated rows (the planner's guess), the actual rows (what really came back), and Rows Removed by Filter. A large gap between estimated and actual rows is a strong signal that your statistics are stale. The Buffers section tells you how many 8 KB pages were read from shared memory versus disk.
Reason 1: The Table Is Too Small
If your table fits in a handful of data pages, PostgreSQL will almost always choose a sequential scan. Reading a few pages sequentially is cheaper than navigating a B-tree index and then jumping to heap pages. This is correct behavior, not a bug.
Check the table size:
SELECT pg_size_pretty(pg_relation_size('orders')) AS table_size,
reltuples::bigint AS estimated_rows
FROM pg_class
WHERE relname = 'orders';
If the table has fewer than a few thousand rows and fits in under a megabyte, the planner is probably right. The fix here is patience β once the table grows, the planner will switch to an index scan automatically. If you need index scans in a test environment with tiny data sets, see the diagnostic section on enable_seqscan below.
Reason 2: Stale Planner Statistics
PostgreSQL's autovacuum daemon runs ANALYZE in the background to collect statistics about column distributions. If your table received a large batch insert or delete recently, the planner may be working from outdated row counts and value histograms, leading it to badly misjudge selectivity.
The gap between estimated and actual rows in EXPLAIN ANALYZE is the telltale sign. Fix it by running ANALYZE manually:
ANALYZE orders;
Then re-run your EXPLAIN ANALYZE. In most cases where stale statistics were the culprit, the estimated rows will now be close to the actual rows and the planner will switch to an index scan on its own.
If your table is constantly receiving bulk loads, consider tuning autovacuum_analyze_scale_factor for that specific table to trigger analysis more aggressively:
ALTER TABLE orders SET (autovacuum_analyze_scale_factor = 0.01);
This tells autovacuum to run ANALYZE after 1% of the table changes, instead of the default 20%. For a table with millions of rows, that's a meaningful difference. You can find related patterns in how PostgreSQL handles data integrity in bulk operations in this article on fixing PostgreSQL foreign key constraints that silently fail on bulk insert.
Reason 3: Low Cardinality β The Index Isn't Selective Enough
If the column you're filtering on has few distinct values relative to the number of rows, an index on that column provides almost no benefit. Imagine a status column with three possible values β 'pending', 'active', 'closed' β spread across a million rows. A query for WHERE status = 'active' might return 400,000 rows. Reading 400,000 index entries and then fetching 400,000 heap pages is slower than one clean sequential scan.
Check the actual cardinality of your column:
SELECT n_distinct, correlation
FROM pg_stats
WHERE tablename = 'orders'
AND attname = 'status';
A value close to 0 for n_distinct (or a small positive number like 3) means the column has very few distinct values. The planner is almost certainly making the right call here. Your options are:
- Create a partial index targeting only the rare value you actually query for:
CREATE INDEX ON orders (status) WHERE status = 'pending'. This works if one value is rare and you query it specifically. - Reconsider whether the filter should be combined with a more selective column in a composite index.
- Accept that a sequential scan is correct for high-selectivity queries on this column.
Reason 4: The Query Doesn't Match the Index Definition
PostgreSQL can only use a B-tree index when the query's predicate aligns with how the index was built. A few common mismatches:
Function applied to the indexed column
If you wrap the column in a function, PostgreSQL can't use the plain index on that column:
-- This will NOT use an index on email:
SELECT * FROM users WHERE lower(email) = 'alice@example.com';
The fix is a functional index that matches the expression exactly:
CREATE INDEX idx_users_email_lower ON users (lower(email));
Leading column not included in a composite index
A composite index on (last_name, first_name) cannot be used for a query filtering only on first_name. The index is ordered by last_name first. Either add a separate index on first_name alone, or reorder the composite index if first_name is the more common filter.
LIKE with a leading wildcard
A predicate like WHERE name LIKE '%smith' cannot use a standard B-tree index because the prefix is unknown. A leading wildcard forces a full scan. If you need substring search, look at pg_trgm trigram indexes or a full-text search approach instead.
Reason 5: The Column Type or Collation Mismatch
PostgreSQL uses strict type matching when evaluating index conditions. If the literal or bind parameter type doesn't match the indexed column's type, the planner inserts an implicit cast β and that cast breaks index usability.
A classic example is comparing an integer column to a string literal:
-- customer_id is INTEGER, but the literal is a string:
SELECT * FROM orders WHERE customer_id = '42';
PostgreSQL will cast '42' to integer here, and in most cases it's smart enough to still use the index. But the reverse β storing an ID as varchar and comparing to an integer β can prevent index use. Always make sure your application is sending the right type. In ORMs, check that the parameter is bound as an integer, not coerced to a string.
Collation mismatches on text columns can have the same effect. If the index was created with a specific collation and your query uses a different one, the index won't be used. Check with:
SELECT attname, atttypmod, attcollation
FROM pg_attribute
WHERE attrelid = 'orders'::regclass
AND attname = 'status';
Reason 6: You're Fetching Too Many Rows
Even on a highly selective column, if your query returns a large percentage of the table's rows, the planner will prefer a sequential scan. The crossover point depends on your storage configuration, but a rough mental model is: once a query needs more than 5β10% of a large table's rows, a sequential scan becomes competitive with random heap page reads.
This is especially relevant for range queries:
-- If this date range covers 60% of the table, expect a Seq Scan:
SELECT * FROM events WHERE created_at > now() - interval '1 year';
The solutions here are query-level: narrow your date range, add additional filters to reduce the result set, or use a covering index (INCLUDE clause) so PostgreSQL can answer the query from the index alone without hitting the heap.
This is also where correlation in pg_stats matters. If correlation is close to 1.0, the physical order of rows matches the index order, making index scans much cheaper. If it's close to 0 (random physical order), random heap reads become expensive and the planner favors sequential scans at a lower row count. Running CLUSTER orders USING idx_orders_customer_id rewrites the table in index order, improving correlation β at the cost of a table lock and significant I/O.
Forcing the Planner (For Diagnosis Only)
When you need to confirm that the index actually helps (or doesn't), you can disable sequential scans for the current session and re-run the query:
SET enable_seqscan = off;
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders WHERE customer_id = 42;
SET enable_seqscan = on; -- Always reset when you're done
With enable_seqscan = off, the planner will use the index if one exists and can serve the query. Compare the actual execution time in both plans. If the forced index scan is slower, the planner was right all along. If it's significantly faster, you likely have a statistics problem β run ANALYZE and check again.
Important: Never set
enable_seqscan = offat the database or role level in production. It is a diagnostic tool. Leaving it off globally causes the planner to pick index scans even when sequential scans would be far faster, degrading the entire cluster.
For a related pattern where the PostgreSQL planner produces wrong results from query structure rather than scan choice, the article on fixing PostgreSQL window functions that return incorrect partition results covers how planner decisions interact with more complex query shapes.
Common Pitfalls
- Running EXPLAIN without ANALYZE. Plain
EXPLAINshows estimates only. Always useEXPLAIN ANALYZE(on a staging copy if the query mutates data) so you see what actually happened. - Creating an index and immediately querying. Freshly created indexes don't automatically trigger a statistics refresh. Run
ANALYZEafter bulk-loading data. - Indexing the wrong table in a JOIN. If your plan shows a Seq Scan on the driving table of a nested loop join, check whether the join condition columns on both tables are indexed.
- Ignoring
work_mem. Some plans switch from index scans to hash joins or sort+merge joins whenwork_memis increased. Bumpingwork_memfor a session can change the plan in unexpected ways if you're diagnosing interactively. - Confusing an index existing with an index being usable. A unique index on a nullable column still has limitations. An index on an expression only helps if the query uses the exact same expression.
If you've been troubleshooting a related UPSERT operation that's silently skipping rows β sometimes caused by the same statistics issues β the article on fixing PostgreSQL UPSERT that silently skips updates on conflict is worth reading alongside this one.
Wrapping Up
A sequential scan in EXPLAIN ANALYZE when you expected an index scan almost always has an explainable cause. Work through this checklist:
- Run
ANALYZE tablenameand re-check the plan. Stale statistics are the most common culprit. - Check
pg_statsforn_distinctandcorrelationon the filtered column to understand selectivity. - Verify your query predicate matches the index definition exactly β no wrapping functions, correct leading column in composites, no leading wildcards in LIKE.
- Use
SET enable_seqscan = offin a test session to force the index path and compare actual execution times. - If the table is small or the query returns a large fraction of rows, accept the sequential scan β the planner may be right.
Frequently Asked Questions
Why does PostgreSQL use a sequential scan even when an index exists on the column?
PostgreSQL's cost-based planner estimates whether reading the index plus random heap pages is cheaper than a single sequential pass through the table. If the table is small, the query returns many rows, or planner statistics are stale, it will choose a sequential scan even when an index exists.
How do I force PostgreSQL to use an index instead of a sequential scan?
You can disable sequential scans for your session with SET enable_seqscan = off before running your query. This is a diagnostic tool only β always reset it with SET enable_seqscan = on afterward and never apply it globally in production.
Will running ANALYZE fix a PostgreSQL query that ignores an index?
Running ANALYZE refreshes the planner's statistics about column distributions and row counts. If the plan was wrong because of stale statistics β which is the most common cause β the planner will often choose the index scan automatically after ANALYZE completes.
Can a type mismatch between a query parameter and a column prevent index use in PostgreSQL?
Yes. If the data type of your filter literal or bind parameter doesn't match the indexed column, PostgreSQL may insert an implicit cast that prevents index usage. Always ensure your application binds parameters with the correct type matching the column definition.
What is a partial index in PostgreSQL and when should I use one instead of a full index?
A partial index is built over a subset of rows defined by a WHERE clause, for example CREATE INDEX ON orders (status) WHERE status = 'pending'. Use one when a column has low cardinality overall but one specific value is rare and frequently queried, making a full-column index too broad to be selective.
π€ Share this article
Sign in to saveRelated Articles
Comments (0)
No comments yet. Be the first!