Fixing Silently Missing Rows After SQLAlchemy Bulk Insert with on_conflict_do_nothing

Bulk inserts are a critical optimization in modern applications.

Whether you're building:

ETL pipelines
Analytics platforms
Event processing systems
SaaS applications
Data warehouses
Machine learning pipelines

you will eventually need to insert thousands or millions of records efficiently.

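A common SQLAlchemy pattern looks like:

# on_conflict_do_nothing() lives on the PostgreSQL dialect's insert()
from sqlalchemy.dialects.postgresql import insert

stmt = insert(User).values(records)

stmt = stmt.on_conflict_do_nothing(
    index_elements=["email"]
)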

This approach offers several benefits:

Fast Bulk Inserts
↓
Automatic Deduplication
↓
No Duplicate-Key Errors

Developers often deploy this pattern and move on.

Everything appears successful.

No exceptions occur.

No transaction failures appear.

No PostgreSQL errors are reported.

Then a business stakeholder asks:

CSV File:
100,000 Rows

Database:
93,742 Rows

Questions immediately arise:

Where did the missing rows go?
Did PostgreSQL fail?
Did SQLAlchemy drop data?
Was the transaction incomplete?
Is the database corrupted?

In most cases:

Nothing failed.

The rows were intentionally skipped.

The challenge is that ON CONFLICT DO NOTHING is specifically designed to suppress conflicts, making missing records easy to overlook.

In this guide, you'll learn why rows disappear, how SQLAlchemy and PostgreSQL handle conflicts, and how to build reliable bulk ingestion pipelines that detect skipped records before they become production problems.

What You Will Learn From This Article

After reading this guide, you'll understand:

How ON CONFLICT DO NOTHING works.
Why rows silently disappear.
How SQLAlchemy executes bulk inserts.
Common causes of skipped records.
How to detect ignored rows.
Better monitoring techniques.
Best practices for production ETL systems.

Understanding the Problem

Consider:

records = [
    {"email": "alice@test.com"},
    {"email": "bob@test.com"},
    {"email": "alice@test.com"},
]

The table contains:

UNIQUE(email)

Bulk insert:

stmt = insert(User).values(records)

stmt = stmt.on_conflict_do_nothing(
    index_elements=["email"]
)

Result:

alice@test.com
Inserted

bob@test.com
Inserted

alice@test.com
Skipped

No error occurs.

The duplicate record simply disappears.

Why Developers Get Confused

Many developers assume:

Insert Statement
↓
Success
↓
All Rows Written

Reality:

Insert Statement
↓
Success
↓
Some Rows Skipped

The SQL executes successfully even when records are ignored.
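To watch the silent skip happen without a PostgreSQL server, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in. SQLite (3.24 or newer) accepts a similar ON CONFLICT ... DO NOTHING clause. The in-memory table and the helper name insert_ignoring_conflicts are assumptions made for illustration; they are not part of the pipeline described here.

```python
import sqlite3


def insert_ignoring_conflicts(emails):
    """Insert emails in one statement and return (attempted, stored)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (email TEXT UNIQUE)")
    # One multi-row INSERT, similar in shape to insert(User).values(records).
    placeholders = ", ".join("(?)" for _ in emails)
    conn.execute(
        f"INSERT INTO users (email) VALUES {placeholders} "
        "ON CONFLICT (email) DO NOTHING",
        emails,
    )
    conn.commit()
    stored = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
    conn.close()
    return len(emails), stored


attempted, stored = insert_ignoring_conflicts(
    ["alice@test.com", "bob@test.com", "alice@test.com"]
)
# The statement completed without raising, yet the two numbers differ.
print(f"attempted={attempted} stored={stored}")
```

The only signal that something was skipped is the gap between the two numbers, which is why the rest of this guide focuses on counting.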

What Does ON CONFLICT DO NOTHING Mean?

PostgreSQL behavior:

INSERT INTO users (email)
VALUES ('alice@test.com')
ON CONFLICT DO NOTHING;

Translation:

If Conflict Exists
↓
Ignore Row
↓
Continue Processing

No exception is raised.

Why This Feature Exists

Without conflict handling:

Duplicate Row
↓
Unique Constraint Error
↓
Transaction Fails

With:

ON CONFLICT DO NOTHING

workflow becomes:

Duplicate Row
↓
Skip Row
↓
Continue

This is ideal for:

Event ingestion
Data synchronization
Log processing
Incremental imports

The Hidden Cost

The trade-off is:

Error Visibility
↓
Reduced

Developers gain robustness but lose visibility into skipped data.

Common Cause #1

Duplicate Data Inside the Batch

Example:

records = [
    {"email": "a@test.com"},
    {"email": "a@test.com"},
    {"email": "a@test.com"},
]

Only one row survives.

The rest are ignored.
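In-batch duplicates can be counted before the statement ever reaches the database. The following is a minimal sketch, assuming records are plain dictionaries and that keys compare by exact equality. A database collation or a case-insensitive column may compare differently. The helper name split_batch_duplicates and its key_fields parameter are hypothetical.

```python
def split_batch_duplicates(records, key_fields):
    """Return (first_occurrences, in_batch_duplicates) keyed on key_fields."""
    seen = set()
    unique, duplicates = [], []
    for record in records:
        # Build a hashable key; a tuple also covers composite constraints.
        key = tuple(record[field] for field in key_fields)
        if key in seen:
            duplicates.append(record)
        else:
            seen.add(key)
            unique.append(record)
    return unique, duplicates


records = [
    {"email": "a@test.com"},
    {"email": "a@test.com"},
    {"email": "a@test.com"},
]
unique, duplicates = split_batch_duplicates(records, key_fields=("email",))
print(len(unique), len(duplicates))  # 1 2
```

Logging len(duplicates) per batch separates "the source file repeated itself" from "the row was already in the table". For a composite constraint, pass both columns, for example key_fields=("company_id", "email").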

Common Cause #2

Existing Database Records

Database already contains:

a@test.com

New import:

{"email": "a@test.com"}

Result:

Conflict
↓
Skipped

Nothing appears broken.

The row simply never enters the table.

Common Cause #3

Composite Unique Constraints

Example:

UNIQUE(company_id, email)

Developers often focus only on:

Email

while conflicts occur on:

Company + Email

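The same email under two different companies is not a conflict, but a repeated (company_id, email) pair is. When expectations are based on email alone, the skip counts look wrong in both directions.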

Common Cause #4

Incorrect Conflict Target

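Example:

stmt = stmt.on_conflict_do_nothing(
    index_elements=["email"]
)

Actual database constraint:

UNIQUE(username)

PostgreSQL requires the conflict target to match a unique index or constraint. If none exists on email, the statement fails with "there is no unique or exclusion constraint matching the ON CONFLICT specification" instead of skipping anything. If email is unique as well, a collision on username still raises a unique-violation error. The opposite surprise comes from omitting the conflict target entirely: DO NOTHING then applies to every unique constraint on the table and can skip rows you expected to keep.

Always verify actual indexes.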
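A mismatched conflict target is easy to test for, because the database rejects it outright rather than skipping rows. The sketch below again uses the built-in sqlite3 module as a stand-in, since SQLite also refuses a conflict target that matches no unique constraint. PostgreSQL's error text differs. The table layout and the helper name try_insert are assumptions for illustration.

```python
import sqlite3


def try_insert(conflict_column):
    """Run one insert whose conflict target is conflict_column."""
    conn = sqlite3.connect(":memory:")
    # Only username is unique, mirroring UNIQUE(username).
    conn.execute("CREATE TABLE users (username TEXT UNIQUE, email TEXT)")
    try:
        # The column name is interpolated for demonstration only.
        conn.execute(
            "INSERT INTO users (username, email) VALUES (?, ?) "
            f"ON CONFLICT ({conflict_column}) DO NOTHING",
            ("alice", "alice@test.com"),
        )
        return "ok"
    except sqlite3.Error as exc:
        return f"rejected: {exc}"
    finally:
        conn.close()


print(try_insert("username"))  # target matches the unique constraint
print(try_insert("email"))     # no unique constraint on email
```

A small test like this in the import job's test suite catches a conflict target that has drifted away from the schema.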

Why Missing Rows Often Go Undetected

Typical workflow:

session.execute(stmt)

session.commit()

No exception:

Success

Most applications stop here.

The skipped rows remain invisible.

Detecting Skipped Rows

The simplest check:

result = session.execute(stmt)

print(result.rowcount)

Example:

Input Records:
1000

Inserted:
973

Difference:

27 Skipped Rows

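This immediately reveals the problem. The count is meaningful here because insert(User).values(records) is sent as a single statement. Row counts for inserts are driver-dependent, and a value of -1 means the driver could not determine the number, so confirm the behavior in your own setup before alerting on it.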

Why rowcount Matters

Many ETL systems only log:

Import Successful

Instead log:

Rows Received
Rows Inserted
Rows Skipped

This dramatically improves observability.
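The three numbers above can be captured in one small object per batch. This is a minimal sketch using only the standard library. The class name ImportStats, the logger name, and the log field names are hypothetical. It treats a negative rowcount as "unknown", following the DB-API convention that -1 means the count could not be determined.

```python
import logging
from dataclasses import dataclass
from typing import Optional

logger = logging.getLogger("bulk_import")


@dataclass(frozen=True)
class ImportStats:
    received: int
    inserted: Optional[int]  # None when the driver reported no usable count

    @classmethod
    def from_rowcount(cls, received, rowcount):
        # DB-API drivers report -1 when the affected-row count is unknown.
        return cls(received, rowcount if rowcount >= 0 else None)

    @property
    def skipped(self):
        if self.inserted is None:
            return None
        return self.received - self.inserted

    @property
    def skip_rate(self):
        if self.inserted is None or self.received == 0:
            return None
        return self.skipped / self.received

    def log(self):
        logger.info(
            "rows_received=%s rows_inserted=%s rows_skipped=%s",
            self.received, self.inserted, self.skipped,
        )


stats = ImportStats.from_rowcount(received=1000, rowcount=973)
stats.log()
print(stats.skipped)  # 27
```

Returning None instead of a guessed number keeps an unknown count from being reported as "zero skipped".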

Using RETURNING

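PostgreSQL supports RETURNING, which combines with the conflict clause:

stmt = (
    insert(User)
    .values(records)
    .on_conflict_do_nothing(index_elements=["email"])
    .returning(User.id)
)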

Result:

Inserted Rows Returned

Only successful inserts appear.

Comparing counts reveals skipped records.

Example

Input:

100 Records

Returned:

92 IDs

Conclusion:

8 Rows Skipped

No guesswork required.

Identifying Which Rows Were Skipped

A common approach:

Input Dataset
↓
Bulk Insert
↓
Returned IDs
↓
Difference Analysis

Useful for:

Auditing
Compliance
Data quality reporting
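The difference analysis needs a value that exists in both the input and the result. Generated IDs do not appear in the input, so this sketch assumes the statement returns the conflict key instead, for example .returning(User.email). The helper name find_skipped is hypothetical. It uses a multiset comparison so that repeated keys inside one batch are handled correctly.

```python
from collections import Counter


def find_skipped(input_keys, returned_keys):
    """Return the input keys that were not inserted, in input order."""
    remaining = Counter(returned_keys)
    skipped = []
    for key in input_keys:
        if remaining[key] > 0:
            remaining[key] -= 1  # this occurrence was inserted
        else:
            skipped.append(key)  # already in the table or repeated in the batch
    return skipped


input_emails = ["alice@test.com", "bob@test.com", "alice@test.com"]
returned_emails = ["alice@test.com", "bob@test.com"]  # e.g. from RETURNING email
print(find_skipped(input_emails, returned_emails))  # ['alice@test.com']
```

Writing the skipped keys to an audit table or a rejected-rows file gives auditors something concrete to review.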

Common Mistake #1

Assuming Commit Means Everything Inserted

Many developers interpret:

session.commit()

as:

All Records Written

Actually:

Transaction Succeeded

which is not the same thing.

Common Mistake #2

Ignoring Import Metrics

Bad:

Import Complete

Good:

100,000 Received
98,542 Inserted
1,458 Skipped

Visibility prevents surprises.

Common Mistake #3

Treating Skips as Harmless

Sometimes skipped rows indicate:

Bad source data
Duplicate exports
Broken deduplication logic
Upstream bugs

Ignoring them can hide larger problems.

When DO NOTHING Is Appropriate

Excellent for:

Event Streams

Duplicate events are common.

Idempotent APIs

Safe retries matter.

Log Ingestion

Redundant data may occur.

Sync Jobs

Repeated imports should not fail.

In these cases, silent conflict handling is beneficial.

When DO NOTHING Is Dangerous

High-risk scenarios include:

Financial Transactions

Every row matters.

Compliance Records

Missing entries create risk.

Customer Billing

Skipped rows affect revenue.

Healthcare Data

Missing records can be serious.

Here, visibility becomes critical.

Alternative: ON CONFLICT DO UPDATE

Instead of:

DO NOTHING

use:

DO UPDATE

Workflow:

Conflict
↓
Update Existing Row

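The incoming row is applied to the existing one instead of being dropped.

This often provides better traceability. Note that PostgreSQL rejects a DO UPDATE statement that would affect the same row twice, so duplicates inside a single batch must be removed before the insert.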

Example

from sqlalchemy import func

# The method returns a new statement, so keep the result
stmt = stmt.on_conflict_do_update(
    index_elements=["email"],
    set_={
        "updated_at": func.now()
    }
)

The conflicting record remains visible.
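PostgreSQL refuses an INSERT ... ON CONFLICT DO UPDATE that would touch the same row twice in one statement, so a batch with repeated keys has to be collapsed first. The following is a minimal sketch, assuming that the last record for a key should win. That policy is an assumption and may not fit every pipeline. The helper name keep_last_per_key is hypothetical.

```python
def keep_last_per_key(records, key_fields):
    """Collapse records so each key appears once; later records win."""
    latest = {}
    for record in records:
        key = tuple(record[field] for field in key_fields)
        latest[key] = record  # overwrite any earlier occurrence of this key
    return list(latest.values())


records = [
    {"email": "a@test.com", "name": "Old"},
    {"email": "b@test.com", "name": "Bee"},
    {"email": "a@test.com", "name": "New"},
]
print(keep_last_per_key(records, key_fields=("email",)))
```

The number of records removed by this step is worth logging too, since it is the in-batch share of the duplicate rate.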

Monitoring Bulk Imports

Track:

Records Received

Records Inserted

Records Updated

Records Skipped

Duplicate Rate

Constraint Violations

These metrics reveal data quality trends.

Real-World Example

An analytics pipeline processes:

5 Million Events Daily

Import uses:

on_conflict_do_nothing()

Monitoring initially reports:

Import Successful

Weeks later:

Event Count Lower Than Expected

Investigation reveals:

12% Duplicate Events

The database behaved correctly.

The missing visibility created the confusion.

Adding insert metrics immediately exposed the issue.

Best Practices Checklist

When using ON CONFLICT DO NOTHING:

✅ Monitor inserted row counts

✅ Compare input and output totals

✅ Log skipped-record metrics

✅ Verify unique constraints

✅ Audit duplicate rates

✅ Use RETURNING when possible

✅ Validate source data quality

✅ Document expected skip behavior

✅ Build dashboards for ingestion metrics

✅ Test imports with duplicate data

Common Mistakes to Avoid

Avoid:

❌ Assuming successful execution means all rows inserted

❌ Ignoring rowcount

❌ Hiding skipped-record metrics

❌ Using DO NOTHING without understanding constraints

❌ Treating all duplicates as harmless

❌ Failing to audit import results

❌ Debugging PostgreSQL before checking conflicts

Performance Considerations

One reason developers love:

ON CONFLICT DO NOTHING

is performance.

Benefits:

No Exception Handling
↓
No Transaction Rollbacks
↓
High Throughput

The optimization is excellent.

The challenge is ensuring visibility remains intact.

Why This Issue Is So Common

The problem stems from a mismatch between:

Developer Expectations

and:

Database Behavior

Developers often expect:

Missing Rows
=
System Failure

PostgreSQL expects:

Missing Rows
=
Conflict Resolution

Understanding this distinction eliminates much of the confusion.

Wrapping Up

ON CONFLICT DO NOTHING is one of PostgreSQL's most useful features for building resilient bulk ingestion pipelines. It prevents duplicate-key errors, improves throughput, and enables idempotent imports. However, its greatest strength is also its biggest source of confusion: conflicting rows are silently ignored rather than generating errors.

As a result, developers often discover that imported row counts do not match source datasets and mistakenly assume the database, SQLAlchemy, or transaction system failed. In reality, PostgreSQL is behaving exactly as instructed. The missing rows are usually the result of unique-constraint conflicts, duplicate records, or previously imported data.

The solution is not to abandon ON CONFLICT DO NOTHING, but to add visibility around it. By tracking inserted counts, skipped rows, duplicate rates, and import metrics, teams can enjoy the performance benefits of conflict-free bulk inserts while maintaining confidence in their data ingestion processes.
