Fixing Silently Missing Rows After SQLAlchemy Bulk Insert with on_conflict_do_nothing
You fire off a bulk insert through SQLAlchemy, the call returns without an exception, and your row count is lower than you expected. No error. No warning. Just missing data. This is the classic silent failure mode of on_conflict_do_nothing, and if you haven't seen it before, it can burn hours of debugging time.
- Why
on_conflict_do_nothingsilently drops rows and returns no error - How to detect exactly which rows were skipped
- How to log or re-route skipped rows without blocking your insert pipeline
- Patterns for testing that your conflict handling is working correctly
Prerequisites
This article assumes you're using SQLAlchemy 1.4 or 2.x with a PostgreSQL backend (the psycopg2 or asyncpg driver). The insert().on_conflict_do_nothing() construct is PostgreSQL-specific via sqlalchemy.dialects.postgresql. If you're on MySQL, the equivalent is INSERT IGNORE and the same concepts apply with slightly different syntax.
Why on_conflict_do_nothing Drops Rows Without Telling You
When PostgreSQL encounters a constraint violation during an INSERT, the default behavior is to raise an error and abort the transaction. ON CONFLICT DO NOTHING changes that contract: instead of erroring, PostgreSQL silently skips any row that would violate the constraint and continues inserting the rest.
SQLAlchemy faithfully passes this behavior through. The result object you get back has a rowcount attribute, but here's the catch: rowcount reflects how many rows were actually inserted, not how many you tried to insert. There is no separate count of skipped rows built into the result.
from sqlalchemy.dialects.postgresql import insert
from sqlalchemy import create_engine, Table, MetaData, Column, Integer, String
engine = create_engine("postgresql+psycopg2://user:pass@localhost/mydb")
meta = MetaData()
users = Table(
"users", meta,
Column("id", Integer, primary_key=True),
Column("email", String, unique=True),
Column("name", String),
)
rows = [
{"id": 1, "email": "alice@example.com", "name": "Alice"},
{"id": 2, "email": "bob@example.com", "name": "Bob"},
{"id": 1, "email": "alice@example.com", "name": "Alice Duplicate"}, # conflict
]
with engine.begin() as conn:
stmt = insert(users).on_conflict_do_nothing(index_elements=["email"])
result = conn.execute(stmt, rows)
print(result.rowcount) # prints 2, not 3 β no error raised
Two rows land in the table. The third is silently discarded. result.rowcount says 2. Nothing in the exception chain tells you about the skipped row.
The Difference Between rowcount and len(data)
The simplest detection technique is comparing what you sent against what was recorded. Before you insert, take the length of your input list. After the insert, read result.rowcount. If they differ, you have skipped rows.
with engine.begin() as conn:
stmt = insert(users).on_conflict_do_nothing(index_elements=["email"])
result = conn.execute(stmt, rows)
attempted = len(rows)
inserted = result.rowcount
skipped = attempted - inserted
if skipped > 0:
print(f"{skipped} row(s) were skipped due to conflicts")
This tells you how many were skipped, but not which ones. For a pipeline that just needs a health metric, this is often enough. For anything that requires auditability, you need more.
Identifying Which Rows Were Skipped
To find the exact rows that didn't make it in, you need to cross-reference your input against what actually exists in the table. The cleanest way is a post-insert query using the conflict column as the lookup key.
from sqlalchemy import select
def bulk_insert_with_skipped(conn, table, rows, conflict_col):
stmt = insert(table).on_conflict_do_nothing(index_elements=[conflict_col])
result = conn.execute(stmt, rows)
if result.rowcount == len(rows):
return rows, [] # nothing skipped
# Find which keys are already in the table
keys = [r[conflict_col] for r in rows]
existing = conn.execute(
select(table.c[conflict_col]).where(table.c[conflict_col].in_(keys))
).scalars().all()
existing_set = set(existing)
skipped = [r for r in rows if r[conflict_col] in existing_set]
inserted = [r for r in rows if r[conflict_col] not in existing_set]
return inserted, skipped
with engine.begin() as conn:
inserted, skipped = bulk_insert_with_skipped(conn, users, rows, "email")
print("Skipped:", skipped)
This approach adds one extra query per bulk call. For large batches, the IN clause could get expensive if you're pushing tens of thousands of keys β in that case, consider chunking your inserts or using a temporary table for the cross-reference.
Using RETURNING to Get Inserted Row IDs Directly
PostgreSQL supports a RETURNING clause that tells you exactly which rows came back from the insert. SQLAlchemy exposes this through the .returning() modifier. Rows that were skipped by on_conflict_do_nothing simply don't appear in the returned result set.
stmt = (
insert(users)
.on_conflict_do_nothing(index_elements=["email"])
.returning(users.c.id, users.c.email)
)
with engine.begin() as conn:
result = conn.execute(stmt, rows)
inserted_rows = result.fetchall()
inserted_emails = {row.email for row in inserted_rows}
all_emails = {r["email"] for r in rows}
skipped_emails = all_emails - inserted_emails
print("Inserted:", inserted_emails)
print("Skipped:", skipped_emails)
This is the most precise approach. You get the database-assigned state of every row that landed, and anything missing from that set was skipped. Note that RETURNING adds a small overhead but is generally the right trade-off when you need exact audit data.
Handling Skipped Rows: Log, Queue, or Raise
Once you can identify skipped rows, you need a policy for what to do with them. The right choice depends on your use case.
Log them for auditing
If duplicates are expected but you need a record, write the skipped rows to a dedicated audit log or dead-letter table. Avoid writing to application logs for anything beyond low-volume diagnostic use β structured storage is much easier to query later.
def log_skipped_rows(conn, skipped):
if skipped:
# write to an audit table instead of stdout
conn.execute(
insert(skipped_audit_table),
[{"email": r["email"], "reason": "duplicate"} for r in skipped]
)
Raise an error if skips are unexpected
If your business logic guarantees no duplicates should exist at insert time, treat any skip as a bug and raise immediately.
if skipped:
raise ValueError(f"Unexpected conflicts on {len(skipped)} row(s): {skipped}")
Re-queue for upsert
Sometimes a skipped row means the existing record needs to be updated rather than ignored. In that case, pull the skipped rows and run a second pass with on_conflict_do_update.
if skipped:
upsert_stmt = (
insert(users)
.on_conflict_do_update(
index_elements=["email"],
set_={"name": insert(users).excluded.name},
)
)
conn.execute(upsert_stmt, skipped)
Common Pitfalls
Assuming rowcount is always accurate
The rowcount value comes from the DBAPI cursor's rowcount attribute. Some drivers or configurations return -1 when the count is unavailable. Always check that result.rowcount != -1 before treating the number as reliable. With psycopg2 on PostgreSQL this is generally stable, but it's worth a quick assertion in your test suite.
Forgetting that on_conflict_do_nothing is constraint-scoped
If you omit index_elements, PostgreSQL applies the conflict check to all unique and primary key constraints on the table. A row might be skipped by a constraint you weren't thinking about. Always specify index_elements explicitly so the intent is clear and the behavior is predictable.
# Risky: conflicts on ANY constraint
insert(users).on_conflict_do_nothing()
# Safe: explicit about which constraint triggers the skip
insert(users).on_conflict_do_nothing(index_elements=["email"])
Using this inside an autocommit session
If your session is in autocommit mode, a partial batch insert might commit some rows before a later error occurs elsewhere in your code. Always use an explicit transaction context (engine.begin() or a session with commit() called explicitly) so that an unexpected error rolls back the whole batch.
Mixing ORM and Core bulk inserts
The on_conflict_do_nothing construct lives in sqlalchemy.dialects.postgresql and works with the Core insert() function. If you're using the ORM's session.bulk_insert_mappings() or session.add_all(), you cannot attach dialect-specific conflict handling directly. Switch to the Core insert path for any bulk operation that needs conflict control.
Testing That Your Conflict Handling Works
The worst thing about silent failures is that they're easy to miss in tests if you only assert that no exception was raised. Write tests that check actual row counts and skipped-row detection.
def test_bulk_insert_skips_duplicate_email(engine, users_table):
initial = [{"id": 10, "email": "test@example.com", "name": "Original"}]
duplicate = [{"id": 11, "email": "test@example.com", "name": "Copy"}]
with engine.begin() as conn:
conn.execute(insert(users_table), initial)
with engine.begin() as conn:
stmt = insert(users_table).on_conflict_do_nothing(index_elements=["email"])
result = conn.execute(stmt, duplicate)
assert result.rowcount == 0, "Expected duplicate to be skipped"
with engine.begin() as conn:
count = conn.execute(
select(users_table).where(users_table.c.email == "test@example.com")
).fetchall()
assert len(count) == 1, "Expected exactly one row in the table"
assert count[0].name == "Original", "Original row should not have been overwritten"
This pattern β insert a known row, attempt a conflicting insert, assert both the rowcount and the database state β gives you confidence that your conflict handling is doing what you think it is.
Next Steps
You now have a clear picture of why rows disappear silently and several tools to catch and handle them. Here are concrete actions to take:
- Add a
rowcountcheck immediately after every bulk insert that useson_conflict_do_nothingand log or raise when the count diverges from your input length. - Switch to the
.returning()pattern for any insert pipeline where you need exact audit data on what was skipped. - Always specify
index_elementsexplicitly so the conflict target is clear and unambiguous. - Write at least one test per insert function that asserts both the skipped-row behavior and the resulting database state.
- If skipped rows need to trigger updates, implement a two-pass strategy:
on_conflict_do_nothingfor the bulk insert, thenon_conflict_do_updatefor the identified skipped subset.
π€ Share this article
Sign in to saveRelated Articles
Comments (0)
No comments yet. Be the first!