Fixing Python sqlite3 That Returns Incorrect Results After Adding an Index

June 22, 2026 10 min read 2 views

You added a CREATE INDEX statement to your SQLite database expecting faster queries, and now your application is returning data that simply does not match what you put in. The results were correct before the index existed, and you have not changed any data. That is a disorienting place to be.

The good news is that there are only a handful of root causes, and each one has a clear fix. This guide walks through all of them systematically so you can get back to correct results fast.

What You'll Learn

  • How SQLite's query planner decides when to use an index
  • Why a stale or corrupt index silently returns wrong rows
  • How to diagnose which problem you actually have
  • How to rebuild an index safely from Python
  • How partial indexes and expression indexes can mislead the optimizer

Prerequisites

You should be comfortable with basic Python and the sqlite3 standard library. You need SQLite 3.8.9 or later (bundled with Python 3.4+). All examples use an in-memory or file-based database you can reproduce locally. No third-party packages are required.

How SQLite Uses Indexes Internally

SQLite stores each index as a separate B-tree that maps indexed column values to the corresponding rowid in the main table. When you run a query, the query planner decides whether scanning that B-tree is cheaper than a full table scan. If it chooses the index, it reads the B-tree to find matching rowids, then fetches the full rows from the table B-tree.

This two-step lookup means the index and the table must stay in sync. If they diverge for any reason β€” a crash, a schema change, or a bug in how data was written β€” the index points to stale or nonexistent rowids, and your query silently returns wrong rows or misses rows entirely.

SQLite also keeps statistics in a table called sqlite_stat1 (populated by ANALYZE). If those statistics are wildly out of date, the planner may choose the wrong access path, causing results that look incorrect even though the data itself is fine.

Common Causes of Incorrect Results After Indexing

1. Index built over data that was later modified outside SQLite

If you ever write directly to the database file with a tool that bypasses SQLite's write journal β€” a hex editor, a corrupted bulk import, or a file copy mid-write β€” the index B-tree can diverge from the table. The index says row 42 has the value "alice", but the table no longer does.

2. Stale statistics misleading the query planner

After a large insert or delete, sqlite_stat1 may show the table as having far fewer rows than it actually does. The planner then estimates that a full table scan is cheaper and ignores your index, or it uses the index when it should not, depending on the query shape. Either way, you can get rows in an order or count you did not expect.

3. Partial index with a WHERE clause that silently excludes rows

A partial index only covers rows that satisfy its WHERE clause. If you query without that same filter, SQLite cannot use the partial index for those rows, and the planner may fall back to a strategy that produces results you did not anticipate β€” especially if you combined the index with a query rewrite.

4. Expression index mismatch

SQLite can index the result of an expression, such as LOWER(email). The planner only uses that index when the query expression exactly matches the indexed expression. A tiny difference in casing or function call means the index is skipped, and the query runs differently than you designed.

5. Corruption from an unclean shutdown

A crash while SQLite was mid-write can leave the WAL or rollback journal in an inconsistent state. SQLite usually recovers automatically on the next open, but in rare cases the index pages can be left dirty. Running PRAGMA integrity_check is the fastest way to confirm this.

Diagnosing the Problem Step by Step

Start with the integrity check. Open your database and run this before anything else:

import sqlite3

con = sqlite3.connect("mydb.db")
cur = con.cursor()
cur.execute("PRAGMA integrity_check")
rows = cur.fetchall()
for row in rows:
    print(row)
con.close()

If the output is anything other than ('ok',), you have corruption. Skip to the section on rebuilding the index. If it says ok, continue below.

Next, check whether the query planner is actually using your index:

import sqlite3

con = sqlite3.connect("mydb.db")
cur = con.cursor()
cur.execute("EXPLAIN QUERY PLAN SELECT * FROM orders WHERE customer_id = 101")
for row in cur.fetchall():
    print(row)
con.close()

Look for USING INDEX in the output. If you see SCAN orders with no mention of your index, the planner is ignoring it. That points to stale statistics or an expression mismatch.

Now check whether your statistics are fresh:

import sqlite3

con = sqlite3.connect("mydb.db")
cur = con.cursor()
cur.execute("SELECT * FROM sqlite_stat1")
for row in cur.fetchall():
    print(row)
con.close()

If sqlite_stat1 does not exist, or the row counts look nothing like your actual table size, run ANALYZE and test your query again.

Fixing a Stale or Corrupt Index

The safest fix for both stale and corrupt indexes is REINDEX. It drops and rebuilds every index on the specified table (or the entire database if you call it with no argument) by reading directly from the table data, so the result is guaranteed to match.

import sqlite3

con = sqlite3.connect("mydb.db")
cur = con.cursor()

# Rebuild all indexes on one table
cur.execute("REINDEX orders")

# Or rebuild every index in the database
# cur.execute("REINDEX")

con.commit()
con.close()
print("Index rebuilt successfully.")

After rebuilding, run ANALYZE so the query planner gets fresh statistics:

import sqlite3

con = sqlite3.connect("mydb.db")
cur = con.cursor()
cur.execute("ANALYZE")
con.commit()
con.close()
print("Statistics updated.")

If the database was corrupt and integrity_check reported errors, the most reliable path is to export your data and reimport it into a fresh database file. REINDEX can only fix the index; it cannot repair damaged table pages. Use .dump via the SQLite CLI or dump via Python if the table data is still readable:

import sqlite3

src = sqlite3.connect("mydb.db")
dst = sqlite3.connect("mydb_clean.db")

with dst:
    for line in src.iterdump():
        dst.execute(line)

src.close()
dst.close()
print("Database cloned into mydb_clean.db")

This is similar to the approach described in the guide on fixing sqlite3 INSERT operations that never persist data to disk, where ensuring write integrity is the core of the fix.

Fixing a Query the Optimizer Refuses to Use

If EXPLAIN QUERY PLAN shows the index is being ignored and ANALYZE did not help, you may be hitting a genuine optimizer reluctance. SQLite's planner uses a cost model, and for small tables it often prefers a full scan even when an index exists. Confirm the table has enough rows that the index should matter (typically more than a few thousand rows).

If the table is large and the planner still ignores the index, check that your query predicate exactly matches the indexed column. Functions on the left side of a comparison prevent index use:

-- Index on email is NOT used here
SELECT * FROM users WHERE LOWER(email) = 'alice@example.com';

-- Index on email IS used here
SELECT * FROM users WHERE email = 'alice@example.com';

-- Index on LOWER(email) is used only when the expression matches exactly
SELECT * FROM users WHERE LOWER(email) = 'alice@example.com';
-- ... but only if the index was created as: CREATE INDEX idx ON users(LOWER(email));

You can also use INDEXED BY to force a specific index during testing, which isolates whether the index itself is correct:

import sqlite3

con = sqlite3.connect("mydb.db")
cur = con.cursor()
cur.execute(
    "SELECT * FROM orders INDEXED BY idx_orders_customer_id "
    "WHERE customer_id = 101"
)
print(cur.fetchall())
con.close()

If that query returns the right results, the index is good but the planner was choosing a suboptimal plan. If it returns wrong results, the index data itself is the problem and you need REINDEX.

Fixing Partial Index Mistakes

A partial index defined with a WHERE clause only indexes the rows that satisfy that condition. Consider this setup:

import sqlite3

con = sqlite3.connect("mydb.db")
cur = con.cursor()
cur.execute("""
    CREATE TABLE IF NOT EXISTS orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER,
        status TEXT
    )
""")
# Partial index: only active orders
cur.execute("""
    CREATE INDEX IF NOT EXISTS idx_active_orders
    ON orders(customer_id)
    WHERE status = 'active'
""")
con.commit()
con.close()

If you then query without the status = 'active' filter, SQLite cannot use this index for rows where status is anything else. Results may appear missing or incomplete. The fix is either to drop the partial index and create a full one, or to ensure every query that should benefit from the index includes the matching filter condition:

import sqlite3

con = sqlite3.connect("mydb.db")
cur = con.cursor()

# This query can use the partial index
cur.execute(
    "SELECT * FROM orders WHERE customer_id = 101 AND status = 'active'"
)
print(cur.fetchall())

# This query CANNOT use the partial index β€” results may be incomplete
cur.execute(
    "SELECT * FROM orders WHERE customer_id = 101"
)
print(cur.fetchall())

con.close()

Partial indexes are a powerful optimization, but they require discipline. Document the intended query pattern in a comment next to the CREATE INDEX statement so future you does not spend an afternoon puzzled by missing rows.

If you have run into other unexpected behaviors with SQLite in Python, the breakdown of how sqlite3 can return stale data after a commit covers a related class of consistency bugs that can also look like index problems.

When the Query Plan Looks Right but Results Are Still Wrong

Occasionally you will find that EXPLAIN QUERY PLAN shows the correct index being used, integrity_check passes, and statistics are fresh β€” yet the query still returns rows you do not expect. In these cases, the bug is almost never in SQLite itself. Check the following:

  • Connection isolation: another connection in the same process may have written uncommitted data that your connection cannot see yet, or has already committed data your connection has cached. Review transaction boundaries carefully.
  • Collation mismatch: SQLite's default text collation is case-sensitive binary. An index built on name with binary collation will return different results for WHERE name = 'Alice' vs WHERE name = 'alice'. If your data was inserted with mixed case, the index and your query filter may disagree on which rows match.
  • Type affinity: SQLite uses dynamic typing. If an integer was stored as text (e.g., '101' instead of 101), a comparison against an integer literal may or may not match depending on the column's declared affinity and the index's stored type.

For collation and type issues, inspect the raw stored values with SELECT typeof(customer_id), customer_id FROM orders LIMIT 5 to see exactly what SQLite recorded.

Concurrent write scenarios can also produce surprising results. The article on fixing sqlite3 OperationalError on concurrent writes covers how to structure connections to avoid these conflicts.

Common Pitfalls to Avoid

  • Creating an index mid-transaction: If you create an index inside a transaction that also inserts data, and the transaction rolls back, the index is gone but you may not notice immediately. Always create indexes in their own explicit transaction.
  • Forgetting ANALYZE after bulk inserts: Loading thousands of rows and then querying immediately gives the planner outdated statistics. Run ANALYZE as part of your ETL or import script.
  • Relying on REINDEX to fix corruption in the table pages: REINDEX only rebuilds index B-trees from table data. If the table pages themselves are corrupt, REINDEX will silently propagate the bad data into the new index.
  • Using the same database file from multiple processes without WAL mode: The default journal mode can cause read inconsistencies under concurrent access. Enable WAL mode with PRAGMA journal_mode=WAL when multiple readers and one writer are involved.
  • Not testing with representative data volumes: SQLite's planner behaves differently on a 100-row dev table versus a 10-million-row production table. An index that gets used correctly in dev may be bypassed in production once statistics shift.

A related trap happens when you alter the table schema after creating an index. The walkthrough on fixing sqlite3 wrong values after ALTER TABLE explains how column renaming and schema changes can leave indexes in an inconsistent state.

Next Steps

Here are four concrete actions to take right now:

  1. Run PRAGMA integrity_check on your database and confirm the output is ok before trusting any query results.
  2. Run ANALYZE after every significant bulk insert or delete, and add it to your maintenance scripts.
  3. Use EXPLAIN QUERY PLAN on every query that touches an indexed column to verify the planner is using the index you intend.
  4. If results are still wrong after the above, run REINDEX on the affected table and re-test immediately.
  5. Document every partial index with its intended query pattern as a SQL comment so the constraint is explicit and reviewable.

Frequently Asked Questions

Why does my SQLite query return wrong rows only after I added an index?

When an index becomes out of sync with the table data β€” due to a crash, a schema change, or stale statistics β€” the query planner can follow index pointers to incorrect or missing rows. Running REINDEX on the affected table and then ANALYZE will rebuild the index from the actual table data and update the planner's statistics.

How do I force SQLite to use a specific index in a Python query?

Use the INDEXED BY clause in your SQL statement, for example SELECT * FROM orders INDEXED BY idx_orders_customer_id WHERE customer_id = 101. This forces the query planner to use that index and lets you isolate whether the index itself contains correct data.

Can REINDEX fix SQLite database corruption?

REINDEX can only fix corruption in index B-tree pages by rebuilding them from the table data. If the underlying table pages are also corrupted, REINDEX will propagate bad data into the new index. Use PRAGMA integrity_check first to identify the scope of corruption, and use iterdump to export and reimport data if table pages are affected.

Does running ANALYZE slow down my SQLite database?

ANALYZE reads the index B-trees to collect row distribution statistics and writes the results to sqlite_stat1. On large databases it takes a few seconds, but it runs as a read operation and does not block other readers. Running it after bulk inserts or deletes is generally worth the brief overhead because it keeps the query planner's decisions accurate.

Why would a partial index cause missing rows in my SQLite results?

A partial index only covers rows matching its WHERE clause, so the planner can only use it for queries that include the same filter condition. If your query does not include that filter, SQLite may choose a different plan that returns unexpected results or simply misses rows the partial index would have excluded from consideration.

πŸ“€ Share this article

Sign in to save

Comments (0)

No comments yet. Be the first!

Leave a Comment

Sign in to comment with your profile.

πŸ“¬ Weekly Newsletter

Stay ahead of the curve

Get the best programming tutorials, data analytics tips, and tool reviews delivered to your inbox every week.

No spam. Unsubscribe anytime.