Fixing Silently Skipped Rows in SQLAlchemy bulk_insert_mappings Calls
You call bulk_insert_mappings, the session commits without complaint, and then you query the table and the row count is wrong. No exception, no warning, just missing data sitting somewhere between your Python list and the database.
This is one of the more frustrating bugs in SQLAlchemy because the ORM's bulk APIs are deliberately stripped down for speed, which means the usual safety nets are quietly absent. Understanding exactly what gets skipped β and why β is the fastest path to a fix.
What You'll Learn
- Why
bulk_insert_mappingssilently drops rows instead of raising errors - The most common root causes: duplicate primary keys, constraint violations, and driver batching
- How to add lightweight validation before the insert
- How to switch to a safer insert strategy when data integrity matters more than raw speed
- A repeatable debugging workflow you can apply in production
Prerequisites
This article assumes you are working with SQLAlchemy 1.4 or 2.x, though most points apply to older versions too. You should be comfortable with sessions, mapped classes, and basic SQL. The code examples use PostgreSQL, but the behaviour is the same on MySQL and SQLite unless otherwise noted.
Why bulk_insert_mappings Skips Rows Without Telling You
The standard ORM path β session.add() followed by session.commit() β runs per-object validation, fires events, and checks constraints before the SQL hits the wire. bulk_insert_mappings bypasses almost all of that. It is a thin wrapper around a raw INSERT statement built from a list of plain dictionaries.
That speed trade-off has a direct consequence: when a row in the middle of a batch violates a constraint, what happens depends entirely on the database driver and how you have configured the session. Some drivers roll back the whole batch; others silently continue. PostgreSQL with psycopg2 will raise a psycopg2.errors.UniqueViolation and abort the transaction β but only if you are not using INSERT ... ON CONFLICT DO NOTHING. With certain MySQL modes and the executemany path, a duplicate key can be swallowed depending on your sql_mode setting.
The most common reason rows go missing without any Python exception is that you are running inside a try/except block that is catching and suppressing database errors, or you are using a dialect-level option that converts errors into warnings.
Root Cause 1: Duplicate Primary Keys in the Input Data
If your list of dictionaries contains two entries with the same primary key value, one of them will not make it into the database. On PostgreSQL the transaction will fail loudly; on MySQL with INSERT IGNORE semantics or a permissive sql_mode, it will fail silently.
Check for duplicates before you insert:
from collections import Counter
def find_duplicate_keys(mappings, key_field):
counts = Counter(row[key_field] for row in mappings)
return [k for k, v in counts.items() if v > 1]
duplicates = find_duplicate_keys(records, "id")
if duplicates:
raise ValueError(f"Duplicate primary keys before insert: {duplicates[:10]}")
This runs in O(n) time and catches the problem before it ever reaches the database.
Root Cause 2: Missing Required Fields in Some Dictionaries
bulk_insert_mappings uses the keys from the first dictionary in your list to build the column list for the INSERT statement. If a later dictionary is missing a key that is present in the first, the generated SQL will include that column but pass None for the missing rows. If the column has a NOT NULL constraint, that row gets dropped or raises a constraint error depending on the driver.
Worse: if the first dictionary is missing a column that later rows have, those extra values are silently ignored for every row, including the first.
Normalise your dictionaries to a consistent set of keys before the call:
def normalise_mappings(mappings, all_columns):
"""Ensure every mapping has exactly the columns in all_columns."""
normalised = []
for row in mappings:
normalised.append({col: row.get(col) for col in all_columns})
return normalised
columns = {"id", "name", "email", "created_at"}
records = normalise_mappings(raw_records, columns)
Define all_columns from the model's table definition rather than hard-coding it, so it stays in sync automatically.
Root Cause 3: Constraint Violations Swallowed by Exception Handling
Search your codebase for patterns like this:
try:
session.bulk_insert_mappings(MyModel, records)
session.commit()
except Exception:
session.rollback()
pass # <-- this is the problem
A bare except Exception: pass after a bulk insert is a silent-failure factory. The rollback call is correct, but swallowing the exception means you have no idea how many rows were lost. At a minimum, log the error:
import logging
logger = logging.getLogger(__name__)
try:
session.bulk_insert_mappings(MyModel, records)
session.commit()
except Exception as exc:
session.rollback()
logger.error("bulk_insert_mappings failed: %s", exc, exc_info=True)
raise
Re-raising after logging keeps the calling code aware that something went wrong and prevents phantom success states.
Root Cause 4: The Session Was Already in a Failed State
SQLAlchemy sessions become invalid after an unhandled exception. If an earlier operation in the same request or worker thread threw a database error that was caught without a rollback, the session is in a broken state. Any subsequent calls, including bulk_insert_mappings, will either silently do nothing or raise sqlalchemy.exc.InvalidRequestError.
Always roll back the session when catching database exceptions:
try:
do_some_earlier_query(session)
except sqlalchemy.exc.SQLAlchemyError:
session.rollback() # mandatory β do not skip this
raise
If you are using a web framework like Flask or FastAPI, confirm that your session lifecycle management rolls back automatically at the end of a failed request. Most SQLAlchemy integrations do this, but it is worth verifying in your own middleware.
Root Cause 5: Driver-Level Batching Splits the Insert
Some database drivers split large executemany calls into smaller sub-batches internally. If one sub-batch fails and the driver does not propagate the error correctly to SQLAlchemy, you can end up with a partial insert. This is particularly common with older versions of cx_Oracle and some MySQL connectors.
You can detect this by comparing the row count before and after the insert:
from sqlalchemy import func, select
def insert_with_count_check(session, model, records):
count_before = session.execute(
select(func.count()).select_from(model)
).scalar()
session.bulk_insert_mappings(model, records)
session.flush() # push to DB without committing
count_after = session.execute(
select(func.count()).select_from(model)
).scalar()
inserted = count_after - count_before
if inserted != len(records):
raise RuntimeError(
f"Expected {len(records)} inserts, got {inserted}. "
"Rolling back."
)
session.commit()
This adds two extra queries but gives you a hard guarantee on data completeness before committing.
When to Drop bulk_insert_mappings Entirely
If your data has any non-trivial validation requirements, foreign key dependencies, or business-logic constraints, the performance gains from bulk_insert_mappings are not worth the debugging cost. Consider these alternatives:
Use insert() with returning()
SQLAlchemy Core's insert construct supports RETURNING on PostgreSQL and SQLite (3.35+). You can capture which rows were actually inserted:
from sqlalchemy.dialects.postgresql import insert
stmt = insert(MyModel.__table__).values(records)
stmt = stmt.on_conflict_do_nothing(index_elements=["id"])
result = session.execute(stmt.returning(MyModel.__table__.c.id))
inserted_ids = result.scalars().all()
print(f"Inserted {len(inserted_ids)} rows")
on_conflict_do_nothing handles duplicates without raising an error, and RETURNING tells you exactly which IDs made it through.
Use add_all() for smaller batches
For batches under a few thousand rows where you need full ORM event support, session.add_all() is safer and still reasonably fast:
objects = [MyModel(**row) for row in records]
session.add_all(objects)
session.commit()
The per-object overhead is real, but so is the safety. Profile before assuming you need the bulk path.
Common Pitfalls Checklist
- Empty input list:
bulk_insert_mappingswith an empty list does nothing and returns silently. Add a guard:if not records: return. - Wrong model class: Passing a model that maps to a different table than you think is a surprisingly common copy-paste mistake. Print
MyModel.__tablename__if the inserts are going somewhere unexpected. - Timezone-naive datetimes: Some databases reject timezone-naive
datetimeobjects when the column isTIMESTAMP WITH TIME ZONE. The driver may silently coerce or reject the value. Always pass timezone-aware datetimes. - String truncation: If a string value exceeds the column's
VARCHARlength, MySQL in non-strict mode will silently truncate it. PostgreSQL will raise an error. Validate string lengths before inserting if portability matters. - Autoflush interfering: If you query the session between constructing
recordsand callingbulk_insert_mappings, andautoflush=True, pending state can cause unexpected behaviour. Callsession.flush()explicitly at the right moment or disable autoflush for the operation.
A Repeatable Debugging Workflow
When you hit a silent-skip bug in production, work through these steps in order. Each step either surfaces the root cause or rules it out.
- Enable SQLAlchemy's SQL echo:
create_engine(url, echo=True)and inspect the exactINSERTstatement that was generated. - Run the same
INSERTmanually in a database client (DBeaver, psql, mysql CLI) with a single problematic row. Note the exact error. - Count rows before and after with an explicit
SELECT COUNT(*)inside the same transaction before committing. - Check your exception handlers for bare
passor overly broad catches. - Validate the input list for duplicate keys and missing required fields using the helper functions above.
- If using MySQL, check
SHOW VARIABLES LIKE 'sql_mode'and confirmSTRICT_TRANS_TABLESis present.
Wrapping Up
Silent row skips in bulk_insert_mappings nearly always trace back to one of a handful of causes: duplicate keys, inconsistent dictionary shapes, swallowed exceptions, or a broken session state. The fix in each case is straightforward once you know where to look.
Here are the concrete actions to take next:
- Add a duplicate-key check on your input list before every
bulk_insert_mappingscall. - Normalise all dictionaries to the same set of keys derived from the model's table definition.
- Replace bare
except: passblocks with logging and re-raises in any code path that touches a session. - For PostgreSQL inserts where idempotency matters, switch to
insert().on_conflict_do_nothing().returning()so you get an explicit count of what was written. - Add a before/after row-count assertion in any pipeline where data completeness is business-critical, not just a nice-to-have.
π€ Share this article
Sign in to saveRelated Articles
Comments (0)
No comments yet. Be the first!