Fixing PostgreSQL COPY FROM That Silently Skips Rows With Null Delimiter Mismatch

July 02, 2026 10 min read 3 views

You run COPY FROM, PostgreSQL reports success, and your row count is short. No error, no rejected-row log, no warning β€” just missing data sitting somewhere between your file and your table. This is one of the quieter bugs in PostgreSQL bulk loading, and it almost always comes down to a null-string or delimiter mismatch.

What You'll Learn

  • How PostgreSQL interprets null strings and delimiters during COPY FROM
  • The exact conditions that cause rows to be silently skipped or truncated
  • How to inspect a raw data file to confirm what options you actually need
  • How to use a staging table as a safety net for bulk imports
  • How to verify import completeness with before-and-after row counts

Prerequisites

You should be comfortable running psql commands and basic SQL. The examples target PostgreSQL 13 and later, but the behavior described here has been consistent for many major versions. You will need shell access to the server or the ability to run \copy from a client.

How PostgreSQL COPY FROM Parses Null Values

When PostgreSQL reads a file with COPY FROM, it matches each field against the NULL option you supply. The default null string is an empty string for CSV format and \N for text format. If your file uses something else β€” say, the literal string NULL, or na, or nothing at all β€” PostgreSQL will not recognize those values as null. It will try to cast the string to your column's data type instead.

For NOT NULL columns or columns with strict type constraints, a failed cast raises an error and stops the entire COPY. But for nullable columns where the unexpected string happens to cast successfully to a wrong value, or where type coercion silently truncates, you get bad data with no complaint. The most dangerous case is when the unrecognized null string lands in a column used as a join key downstream β€” your data looks complete but joins silently return nothing.

The NULL option in COPY is not a regex or a list β€” it is a single exact string. If your file uses two different conventions for missing values, you need to preprocess the file or use a staging table.

How Delimiter Mismatches Cause Silent Row Drops

A delimiter mismatch is subtler. If your file uses a tab character but you specify DELIMITER ',', PostgreSQL reads each line as a single enormous field. Whether that causes an error or a silent skip depends on your table structure.

If the table has exactly one column, the entire line goes into that column β€” no error at all. If the table has multiple columns but the single-field line happens to be too long for a varchar limit without a CHECK constraint, PostgreSQL may silently truncate it (in older configurations) or error out. The really nasty case is a mismatch that produces the wrong number of fields for only some rows, not all, because the file mixes quoting styles. Those rows error out, and if you have COPY wrapped inside a transaction that you catch and retry, you may never notice which rows were skipped.

Similar issues appear in PostgreSQL's JSON handling β€” if you've ever debugged a JSONB query that silently falls back to a sequential scan, you know PostgreSQL will choose a valid but wrong execution path rather than raise an error. The same philosophy applies to data parsing.

Step 1: Inspect the Raw File Before You COPY

Never assume the file matches what the source system claims. Open it and verify the actual delimiter, the quoting character, the line endings, and the null representation.

For a quick shell inspection:

# Show the first 5 lines with visible special characters
cat -A yourfile.csv | head -5

# Check line endings (^M at end of line means CRLF)
file yourfile.csv

# Count fields on line 1 using comma as delimiter
awk -F',' '{print NF; exit}' yourfile.csv

# Count fields on line 1 using tab as delimiter
awk -F'\t' '{print NF; exit}' yourfile.csv

If cat -A shows ^I between fields, your file is tab-delimited. If you see ^M$ at line endings, you have Windows CRLF line endings. PostgreSQL's COPY handles CRLF correctly on most platforms, but it's worth confirming.

To check for inconsistent null representations across the file:

# Find every distinct value that looks like a null placeholder
awk -F',' '{for(i=1;i<=NF;i++) print $i}' yourfile.csv \
  | sort | uniq -c | sort -rn | head -20

Look for entries like NULL, \N, N/A, na, (null), or just an empty string. Each one requires a decision.

Step 2: Match Your COPY Options to the File Exactly

Once you know the file's actual format, write the COPY statement to match it precisely. The most important options are FORMAT, DELIMITER, NULL, QUOTE, and ESCAPE.

A typical CSV import where missing values are represented as the literal string NULL:

COPY orders (
  order_id,
  customer_id,
  amount,
  notes
)
FROM '/data/orders.csv'
WITH (
  FORMAT CSV,
  HEADER TRUE,
  DELIMITER ',',
  NULL 'NULL',
  QUOTE '"',
  ESCAPE '"'
);

A tab-delimited file where missing values are represented as \N (the text format default):

COPY orders (order_id, customer_id, amount, notes)
FROM '/data/orders.tsv'
WITH (
  FORMAT TEXT,
  NULL '\N'
);

Note that in text format, the delimiter defaults to tab β€” you don't have to specify it, but being explicit prevents future confusion.

If your file uses an empty string to represent null (common in CSV exports from Python's pandas):

COPY orders (order_id, customer_id, amount, notes)
FROM '/data/orders.csv'
WITH (
  FORMAT CSV,
  HEADER TRUE,
  NULL ''
);

This is the CSV format default, so leaving NULL unspecified achieves the same result β€” but being explicit documents your intent for anyone reading the code later.

The same class of silent-data-loss problems appears in Python when Pandas groupby silently drops groups whose key is None β€” the root cause is always the same: a tool's internal null representation doesn't match what's actually in your data.

Step 3: Use a Staging Table to Catch What Slips Through

Even after you've matched your COPY options carefully, it's worth routing the import through a staging table with permissive column types. This lets you inspect what arrived before you commit it to the real table.

-- Create a staging table with all text columns
CREATE TEMP TABLE orders_staging (
  order_id   text,
  customer_id text,
  amount      text,
  notes       text
);

-- Load everything as raw text; type issues become visible data, not errors
COPY orders_staging (order_id, customer_id, amount, notes)
FROM '/data/orders.csv'
WITH (
  FORMAT CSV,
  HEADER TRUE,
  NULL 'NULL'
);

-- Inspect unexpected values before casting
SELECT amount, COUNT(*) AS cnt
FROM orders_staging
WHERE amount !~ '^[0-9]+(\.[0-9]+)?$'
GROUP BY amount
ORDER BY cnt DESC;

Once you're satisfied with the staging data, move it to the real table with explicit casts:

INSERT INTO orders (order_id, customer_id, amount, notes)
SELECT
  order_id::integer,
  customer_id::integer,
  amount::numeric,
  NULLIF(notes, '') -- treat empty string as null in destination
FROM orders_staging
WHERE order_id IS NOT NULL;  -- explicit filter documents intent

The NULLIF trick is useful when your staging data has empty strings that should become real database nulls. You control exactly when that conversion happens, rather than relying on COPY's null-string matching to get it right for every column.

This pattern mirrors the advice for other PostgreSQL operations where silent wrong-row selection is the bug β€” see the breakdown of how UPDATE FROM can silently modify the wrong rows when join conditions are underspecified.

Step 4: Verify Row Counts Before and After

Make row count verification a non-negotiable part of your import pipeline. You should know the expected row count before the import starts and confirm it after.

# Count data rows in the file (subtract 1 for header)
wc -l yourfile.csv
-- After COPY, check the count
SELECT COUNT(*) FROM orders;

-- Or check the count added in this import if the table is not empty
SELECT COUNT(*) FROM orders
WHERE imported_at >= NOW() - INTERVAL '5 minutes';

If you are running imports inside application code, you can capture the row count from COPY programmatically. In Python with psycopg2:

import psycopg2

conn = psycopg2.connect("dbname=mydb user=myuser")
cur = conn.cursor()

with open("orders.csv", "r") as f:
    cur.copy_expert(
        """
        COPY orders (order_id, customer_id, amount, notes)
        FROM STDIN
        WITH (FORMAT CSV, HEADER TRUE, NULL 'NULL')
        """,
        f
    )

rows_imported = cur.rowcount
print(f"Imported {rows_imported} rows")

conn.commit()
cur.close()
conn.close()

cur.rowcount after copy_expert gives you the number of rows PostgreSQL accepted. Compare this against the file's line count minus the header, and fail loudly if they differ by more than your accepted tolerance.

Common Pitfalls and Gotchas

COPY stops at the first error, not row by row

COPY FROM is transactional: if any row fails, the whole statement rolls back. This means you typically either get all rows or none, not a partial import. What looks like "skipped rows" is usually rows that were never in the file, or rows that caused the statement to abort before PostgreSQL even reached them.

The NULL option is case-sensitive and exact

If your file has NULL (uppercase) and you specify NULL 'null' (lowercase), PostgreSQL will not treat those fields as null. The match is byte-for-byte. Run a quick grep to confirm the exact casing in your file.

Windows line endings on Linux servers

A file produced on Windows with CRLF line endings will usually work, but if your last column appears to have a trailing \r, your null-string checks will all fail because 'NULL' does not equal 'NULL\r'. Strip CRLF before importing with sed -i 's/\r//' yourfile.csv or the dos2unix utility.

Quoted fields and the QUOTE option

If some of your null-value placeholders appear inside quoted fields β€” for example "NULL" with quotes β€” PostgreSQL treats the quotes as part of the value unless you have FORMAT CSV with the correct QUOTE character. In CSV mode, "NULL" is the string NULL (without quotes), which will match your NULL 'NULL' option. But NULL without quotes in a CSV file is also treated the same way, so both forms work identically in CSV mode.

Encoding mismatches between client and server

If the file is UTF-8 but the database encoding is SQL_ASCII (or vice versa), multi-byte characters can cause parse failures that look like delimiter mismatches because the byte sequence for a multi-byte character may contain bytes that match your delimiter. Always check \l in psql to confirm database encoding, and specify ENCODING 'UTF8' in your COPY options when in doubt.

Encoding-driven silent failures also appear in lateral joins β€” the same "no rows, no error" symptom shows up when a LATERAL JOIN subquery loses its reference to the outer column. The debugging approach is similar: isolate one row, test it in isolation, then expand scope.

\copy vs COPY: file paths and permissions

COPY (server-side) reads files from the server's filesystem with the PostgreSQL service account's permissions. \copy (client-side, psql only) reads files from the client machine. If your file is on your laptop, use \copy. Mixing them up produces permission errors, not missing rows, but it's a common source of confusion during debugging.

Wrapping Up: Next Steps

Silent data loss during COPY FROM is almost always a formatting mismatch, not a PostgreSQL bug. Here's what to do right now:

  1. Inspect the file with cat -A and awk field-count checks before you write a single COPY statement. Know your delimiter, null string, quote character, and line ending.
  2. Be explicit in every COPY option. Never rely on defaults unless you have verified they match. Write FORMAT CSV, DELIMITER ',', NULL '', QUOTE '"' even when those are defaults β€” future readers will thank you.
  3. Route large or critical imports through a staging table with all-text columns so type-cast failures surface as data you can inspect, not errors that abort the import.
  4. Compare row counts programmatically and fail loudly if the count doesn't match. A 0.1% row drop is invisible to the eye but obvious to a simple integer comparison.
  5. Add encoding and line-ending normalization to your file-preparation step, especially for files arriving from Windows machines or third-party exports.

Frequently Asked Questions

Why does PostgreSQL COPY FROM complete without errors but import fewer rows than the file contains?

COPY FROM is transactional and stops at the first hard error, so if you got no error the file likely parsed completely. The missing rows are usually rows that were never in the file (header line counted, or blank trailing lines excluded), or rows where a null-string or type mismatch caused the value to be stored incorrectly rather than skipped outright.

How do I tell PostgreSQL COPY FROM to treat the string 'NULL' as a null value instead of the literal text?

Add NULL 'NULL' to your COPY options: COPY mytable FROM '/path/file.csv' WITH (FORMAT CSV, NULL 'NULL'). The match is exact and case-sensitive, so make sure the casing in your option matches the casing in the file byte for byte.

Can PostgreSQL COPY FROM skip individual bad rows and continue importing the rest?

No, standard COPY FROM is all-or-nothing within a transaction β€” one bad row rolls back the entire statement. To get partial-import behavior you need to preprocess the file, use a staging table with permissive text columns, or split the file into smaller chunks and import each in its own transaction.

What is the difference between using COPY and \copy in PostgreSQL, and which should I use for local files?

COPY is a server-side command that reads files from the PostgreSQL server's filesystem, while \copy is a psql client command that reads files from the machine running psql. If the file is on your laptop or workstation, use \copy; if the file is already on the database server, either works but COPY is faster.

How do I check whether a CSV file is tab-delimited or comma-delimited before running COPY FROM?

Run awk -F',' '{print NF; exit}' yourfile.csv and awk -F'\t' '{print NF; exit}' yourfile.csv and compare the field counts. The delimiter that gives you the expected number of columns (matching your table) is the correct one. You can also use cat -A to see tab characters displayed as ^I.

πŸ“€ Share this article

Sign in to save

Comments (0)

No comments yet. Be the first!

Leave a Comment

Sign in to comment with your profile.

πŸ“¬ Weekly Newsletter

Stay ahead of the curve

Get the best programming tutorials, data analytics tips, and tool reviews delivered to your inbox every week.

No spam. Unsubscribe anytime.