Fixing Python sqlite3 That Inserts Duplicate Rows on Every Script Run
You run your Python script once and everything looks fine. You run it again and your row count has doubled. Run it a third time and it triples. This is one of the most common sqlite3 frustrations, and it happens because the default INSERT statement in SQLite has no memory β it will happily add the same data over and over.
The fix is not complicated, but you need to choose the right strategy for your situation. This guide walks through every realistic option with working code examples.
What You'll Learn
- Why sqlite3 inserts duplicates by default and what the real root cause is
- How to use
UNIQUEconstraints to enforce data integrity at the schema level - How to use
INSERT OR IGNOREandINSERT OR REPLACEto control conflict behavior - How to write a proper upsert with
ON CONFLICT DO UPDATE - How to handle composite uniqueness across multiple columns
Prerequisites
You need Python 3.6 or later (the sqlite3 module ships in the standard library, no install required). Basic familiarity with SQL CREATE TABLE and INSERT statements is assumed. If you have run into the related problem of queries returning no rows despite data being there, the article on fixing sqlite3 queries that return no rows covers that angle separately.
Understanding the Root Cause
SQLite does not treat a repeated INSERT as an update or a no-op. Every time you call cursor.execute("INSERT INTO ..."), the database engine creates a new row with a new internal rowid, regardless of whether a row with identical column values already exists.
Consider this minimal example:
import sqlite3
conn = sqlite3.connect("products.db")
cur = conn.cursor()
cur.execute("""
CREATE TABLE IF NOT EXISTS products (
id INTEGER,
name TEXT,
sku TEXT
)
""")
cur.execute("INSERT INTO products VALUES (1, 'Widget', 'WGT-001')")
conn.commit()
conn.close()
Run that script twice and you get two rows with identical data. Run it ten times and you get ten rows. SQLite assigned a unique internal rowid to each one, so from the engine's perspective they are all distinct. The fix requires you to tell SQLite what your definition of uniqueness is.
Adding a UNIQUE Constraint to Your Table
The most important thing you can do is declare uniqueness in the schema itself. This is the foundation that every other strategy builds on. Without it, the database cannot enforce anything automatically.
If you are creating a new table, add a UNIQUE constraint on the column (or columns) that identify a row:
cur.execute("""
CREATE TABLE IF NOT EXISTS products (
id INTEGER PRIMARY KEY,
name TEXT,
sku TEXT UNIQUE
)
""")
The sku column is now a uniqueness anchor. If you try to insert a second row with sku = 'WGT-001', SQLite raises an IntegrityError by default. That is a good start β at least the database is protecting itself β but an unhandled exception will crash your script. You need to tell SQLite what to do on a conflict, not just detect it.
If your table already exists and you need to add a constraint after the fact, the cleanest approach is to rebuild it:
-- Run this in a migration script or a one-time setup block
CREATE TABLE products_new (
id INTEGER PRIMARY KEY,
name TEXT,
sku TEXT UNIQUE
);
INSERT OR IGNORE INTO products_new SELECT id, name, sku FROM products;
DROP TABLE products;
ALTER TABLE products_new RENAME TO products;
Using INSERT OR IGNORE to Skip Existing Rows
INSERT OR IGNORE tells SQLite: if this insert would violate a constraint, silently discard the new row and move on. It is the right tool when your existing data is already correct and you only want to add genuinely new rows.
import sqlite3
conn = sqlite3.connect("products.db")
cur = conn.cursor()
cur.execute("""
CREATE TABLE IF NOT EXISTS products (
id INTEGER PRIMARY KEY,
name TEXT,
sku TEXT UNIQUE
)
""")
rows_to_insert = [
(1, 'Widget', 'WGT-001'),
(2, 'Gadget', 'GDG-002'),
(1, 'Widget', 'WGT-001'), # duplicate β will be silently skipped
]
cur.executemany(
"INSERT OR IGNORE INTO products (id, name, sku) VALUES (?, ?, ?)",
rows_to_insert
)
conn.commit()
print(f"Rows in table: {cur.execute('SELECT COUNT(*) FROM products').fetchone()[0]}")
conn.close()
Running this script any number of times will always leave exactly two rows in the table. The third item in the list hits the UNIQUE constraint on sku and is discarded without an error or an exception.
One important note: INSERT OR IGNORE also silently swallows other constraint violations, such as a NOT NULL breach. If you need finer control, catch sqlite3.IntegrityError explicitly around plain INSERT statements instead.
Using INSERT OR REPLACE (Upsert) to Overwrite Stale Data
INSERT OR REPLACE takes the opposite approach: when a conflict is detected, it deletes the conflicting row and inserts the new one. This means your data is always up to date, but you should know what the deletion does to related tables.
cur.executemany(
"INSERT OR REPLACE INTO products (id, name, sku) VALUES (?, ?, ?)",
rows_to_insert
)
The key difference from INSERT OR IGNORE: the existing row is physically removed and a fresh one is written. That means any columns not included in your insert statement get their default values, not their previous values. It also means foreign keys referencing the old row's rowid may break. Use INSERT OR REPLACE when you want the latest version of a row to win, and when you are confident about the downstream effects.
Checking Before You Insert with SELECT
Sometimes you need to apply custom logic before deciding whether to insert or skip. In those cases, you can query first and branch in Python:
def insert_if_not_exists(cur, product_id, name, sku):
existing = cur.execute(
"SELECT 1 FROM products WHERE sku = ?", (sku,)
).fetchone()
if existing is None:
cur.execute(
"INSERT INTO products (id, name, sku) VALUES (?, ?, ?)",
(product_id, name, sku)
)
return "inserted"
return "skipped"
This is readable, but it has a race condition if multiple processes share the same database file. For single-process scripts it is fine. For anything concurrent, rely on the database-level conflict clauses instead and let SQLite be the arbiter. The article on fixing slow PostgreSQL queries at scale touches on similar concurrency considerations that apply to SQLite in a lighter form.
Using ON CONFLICT DO UPDATE for Partial Upserts
SQLite 3.24 (released with Python 3.8 effectively) added the ON CONFLICT DO UPDATE clause, which is the SQL-standard upsert syntax. It gives you surgical control: insert the row if it does not exist, but if it does, update only the columns you specify.
cur.execute("""
INSERT INTO products (id, name, sku)
VALUES (:id, :name, :sku)
ON CONFLICT(sku) DO UPDATE SET
name = excluded.name
WHERE excluded.name != products.name
""", {"id": 1, "name": "Widget Pro", "sku": "WGT-001"})
The keyword excluded refers to the row that was proposed for insertion but was rejected by the conflict. This lets you reference the incoming values in your update expression. The optional WHERE clause means the update only fires when the incoming name actually differs β useful for avoiding unnecessary write amplification on rows that have not changed.
This is the most expressive option and the one you should reach for in production scripts where data may evolve over time.
Handling Composite Uniqueness
Sometimes a single column is not enough to define a unique record. Imagine a price-history table where (product_id, recorded_date) together identify a row, but neither column alone is unique.
cur.execute("""
CREATE TABLE IF NOT EXISTS price_history (
product_id INTEGER,
recorded_date TEXT,
price REAL,
UNIQUE(product_id, recorded_date)
)
""")
Now all the conflict strategies above work exactly the same way β they just evaluate the constraint against the combination of both columns. An ON CONFLICT DO UPDATE upsert references the composite key in the ON CONFLICT clause:
cur.execute("""
INSERT INTO price_history (product_id, recorded_date, price)
VALUES (?, ?, ?)
ON CONFLICT(product_id, recorded_date) DO UPDATE SET
price = excluded.price
""", (42, '2024-06-15', 9.99))
This is the pattern you want for any time-series or event-log style table. It is similar in spirit to the kind of deduplication problems you encounter when debugging row loss in a Pandas left join, where the join key also needs to be well-defined before you can reason about matches.
Common Pitfalls and Gotchas
Forgetting to commit
If you call conn.commit() at the end of one run but your script crashes midway through the next, partial rows from that second run may or may not be committed depending on where the crash happened. Use a context manager to make commit/rollback automatic:
with sqlite3.connect("products.db") as conn:
cur = conn.cursor()
cur.execute("INSERT OR IGNORE INTO products (id, name, sku) VALUES (?, ?, ?)",
(3, 'Doohickey', 'DOO-003'))
# conn.commit() is called automatically on exit; rollback on exception
Using autocommit mode accidentally
If you set isolation_level=None on the connection, sqlite3 goes into autocommit mode. Every statement commits immediately, which means a failed batch job leaves partial state. Keep the default isolation level unless you have a specific reason to change it.
Missing the UNIQUE index after schema changes
If you added a UNIQUE constraint to a new version of your CREATE TABLE IF NOT EXISTS statement, it will not be applied to an existing database file. The IF NOT EXISTS clause means the statement is skipped entirely when the table already exists. Use a migration script or check the schema version with PRAGMA user_version.
INSERT OR REPLACE silently resets auto-increment IDs
When INSERT OR REPLACE deletes and re-inserts a row, the new row gets a fresh rowid. If your application stores that rowid anywhere externally (a config file, another table, a cache), those references are now stale. ON CONFLICT DO UPDATE avoids this because it updates in place without changing the rowid.
Confusing Python-level deduplication with database-level deduplication
Filtering duplicates out of your Python list before calling executemany is not a substitute for a database constraint. Another script, a background job, or a future developer could insert via a different code path. Always enforce uniqueness at the schema level, and treat application-level filtering as an optimization on top of that, not a replacement for it. This mirrors good practices in working with openpyxl sheet management, where relying solely on code logic without validating the target structure leads to subtle, hard-to-trace bugs.
Wrapping Up
Duplicate rows in sqlite3 always trace back to the same root issue: the database does not know what uniqueness means for your data until you tell it. Here are the concrete steps to take:
- Add a UNIQUE constraint to any column (or set of columns) that should identify a row. Do this first β everything else depends on it.
- Switch your INSERT to INSERT OR IGNORE if you want to keep existing rows unchanged and skip any incoming duplicates.
- Use ON CONFLICT DO UPDATE when you need existing rows to be refreshed with new values without changing their
rowid. - Test your schema migrations explicitly β verify that
PRAGMA table_info(your_table)shows the unique flag on the right columns before deploying. - Use a context manager (
with sqlite3.connect(...) as conn) to ensure commits and rollbacks are handled cleanly on every run.
Once your schema enforces uniqueness and your insert statements respect conflict behavior, your script becomes safe to run any number of times without accumulating garbage data.
Frequently Asked Questions
Why does Python sqlite3 insert duplicate rows every time my script runs?
By default, SQLite's INSERT statement has no concept of whether a row already exists β it just creates a new row each time. Without a UNIQUE constraint on your table, nothing stops the database from accepting identical data repeatedly. You need to define uniqueness at the schema level and then use a conflict-aware insert strategy like INSERT OR IGNORE or ON CONFLICT DO UPDATE.
How do I stop sqlite3 from inserting duplicates without deleting existing rows?
Use INSERT OR IGNORE along with a UNIQUE constraint on the identifying column. When a conflicting row is detected, the new insert is silently discarded and the existing row remains untouched. This makes your script safe to run multiple times without changing data that is already correct.
What is the difference between INSERT OR IGNORE and INSERT OR REPLACE in SQLite?
INSERT OR IGNORE skips the incoming row entirely when a conflict occurs, leaving the existing row unchanged. INSERT OR REPLACE deletes the conflicting row and inserts the new one, which resets the rowid and any columns not included in your insert statement to their defaults. Use INSERT OR IGNORE to preserve existing data and INSERT OR REPLACE when you want the newest version to always win.
How do I add a UNIQUE constraint to an existing SQLite table in Python?
SQLite does not support ALTER TABLE ADD CONSTRAINT, so you need to recreate the table. Create a new table with the UNIQUE constraint, copy data into it using INSERT OR IGNORE INTO new_table SELECT ... FROM old_table, drop the old table, and rename the new one. Wrap the whole operation in a transaction so it is atomic.
Can I use ON CONFLICT DO UPDATE in Python's sqlite3 module?
Yes, as long as your SQLite version is 3.24 or later, which ships with Python 3.8 and above. You write the ON CONFLICT clause directly in your SQL string passed to cursor.execute() or cursor.executemany(), and the sqlite3 module handles it transparently.
π€ Share this article
Sign in to saveRelated Articles
Comments (0)
No comments yet. Be the first!