Fixing PostgreSQL UPDATE FROM That Silently Modifies Wrong Rows on Join
You run an UPDATE FROM in PostgreSQL, it reports thousands of rows affected, and everything looks fine β until someone notices the wrong values in the table. No error was raised, no warning was logged, and the query completed successfully. The data is just wrong.
This is one of the quietest bugs in SQL: a join in UPDATE FROM that matches more than one row per target row, causing PostgreSQL to pick a value that is effectively arbitrary. It happens in production, often during a data migration or a bulk sync, and the only clue is that your numbers are off.
- Understand exactly how PostgreSQL processes
UPDATE FROMjoins internally. - Reproduce the multiple-match problem with a small, auditable example.
- Learn three concrete fixes: subquery with aggregation, CTE rewrite, and lateral join.
- Know which diagnostic queries to run before executing any bulk update.
- Avoid the follow-on mistakes that re-introduce the same bug after fixing it.
The Trap Hidden in Plain Sight
PostgreSQL's UPDATE FROM clause is convenient β it lets you pull values from another table into an update in a single statement. But it does not behave like a standard SQL JOIN in a SELECT. When the source table in FROM produces multiple rows that match the target row's join condition, PostgreSQL does not aggregate them, raise an error, or pick the first one in any reliable order. It picks one β and which one is undefined.
The SQL standard doesn't even define multi-table updates this way. PostgreSQL borrowed the syntax from older Ingres-style SQL, and the semantics are subtly different from what most developers expect after years of writing SELECT ... JOIN.
How PostgreSQL UPDATE FROM Actually Works
Internally, PostgreSQL rewrites an UPDATE ... FROM as a join between the target table and the FROM list, then iterates over the join result. Each row in the target table that matches at least one row in the source gets updated once per match. If a target row matches three source rows, PostgreSQL updates it three times in sequence β and only the last write survives.
Which source row ends up being the "last" one depends on the query plan, the physical order of the source data, and any indexes involved. This is not guaranteed to be consistent between executions, between PostgreSQL versions, or even between runs on the same data after a VACUUM. You cannot rely on it.
Here is the simplest mental model: treat UPDATE FROM as if it expands to SELECT ... FROM target JOIN source ON ... and then updates the target table once for each row in that result set. If that SELECT returns duplicate target rows, your update will too.
A Concrete Example That Goes Wrong
Say you have an orders table and a price_adjustments table. You want to apply the latest adjustment to each order.
-- Schema setup
CREATE TABLE orders (
order_id INT PRIMARY KEY,
product_id INT,
price NUMERIC(10,2)
);
CREATE TABLE price_adjustments (
adjustment_id INT PRIMARY KEY,
product_id INT,
new_price NUMERIC(10,2),
applied_at TIMESTAMP
);
-- Seed data
INSERT INTO orders VALUES (1, 42, 100.00), (2, 42, 100.00);
INSERT INTO price_adjustments VALUES
(10, 42, 95.00, '2024-01-01 10:00:00'),
(11, 42, 88.00, '2024-01-02 10:00:00'), -- newer
(12, 42, 110.00, '2024-01-01 08:00:00');
Now run the naive update:
UPDATE orders o
SET price = pa.new_price
FROM price_adjustments pa
WHERE o.product_id = pa.product_id;
There are three rows in price_adjustments for product_id = 42, so each order row matches three source rows. PostgreSQL will update each order three times. The price that survives depends on the plan β you might get 95.00, 88.00, or 110.00. Run this twice with slightly different data, and you may get a different result each time.
Why PostgreSQL Picks an Arbitrary Row
PostgreSQL's query planner chooses a join strategy (hash join, nested loop, merge join) based on statistics and cost estimates. The order in which matched source rows are processed changes with the strategy. Even if you add an ORDER BY to the outer query, it does not control which source row wins β ORDER BY has no effect on the intermediate join used internally by an UPDATE.
Some engineers assume that the row with the lowest or highest primary key will "win" based on index order. This assumption breaks the moment PostgreSQL switches from an index scan to a seq scan, which can happen after a table grows past a planner threshold. The PostgreSQL EXPLAIN ANALYZE seq scan issue is a perfect example of how plan choices can shift unexpectedly, quietly changing your data.
How to Diagnose the Problem Before It Corrupts Data
Before running any bulk UPDATE FROM, check whether the join produces duplicate target rows. Wrap your intended join as a SELECT and count duplicates:
SELECT
o.order_id,
COUNT(pa.adjustment_id) AS match_count
FROM orders o
JOIN price_adjustments pa ON o.product_id = pa.product_id
GROUP BY o.order_id
HAVING COUNT(pa.adjustment_id) > 1;
If this query returns any rows, your UPDATE FROM will produce non-deterministic results. The fix must happen in the source side of the join, not in the UPDATE itself.
Also audit the actual values that would be applied by running:
SELECT
o.order_id,
o.price AS current_price,
pa.new_price AS would_set_price,
pa.applied_at
FROM orders o
JOIN price_adjustments pa ON o.product_id = pa.product_id
ORDER BY o.order_id, pa.applied_at DESC;
You want this to return exactly one source row per target row. If it returns more, pick the right one explicitly before updating.
Fix 1: Use a Subquery with DISTINCT or Aggregation
The most direct fix is to resolve the ambiguity in the FROM clause itself. Pre-aggregate the source table so it returns exactly one row per join key.
UPDATE orders o
SET price = pa.new_price
FROM (
SELECT DISTINCT ON (product_id)
product_id,
new_price
FROM price_adjustments
ORDER BY product_id, applied_at DESC
) pa
WHERE o.product_id = pa.product_id;
DISTINCT ON (product_id) is a PostgreSQL extension that keeps the first row per product_id in the given ORDER BY sequence. Combined with ORDER BY product_id, applied_at DESC, it gives you the most recently applied adjustment for each product β deterministically, every time.
If you want the aggregate value instead (for example, the minimum price across all adjustments), use a standard GROUP BY with an aggregate function:
UPDATE orders o
SET price = pa.min_price
FROM (
SELECT product_id, MIN(new_price) AS min_price
FROM price_adjustments
GROUP BY product_id
) pa
WHERE o.product_id = pa.product_id;
Either approach guarantees a 1-to-1 relationship between the subquery result and the target table's join key.
Fix 2: Rewrite with a CTE to Control the Join
A CTE gives you a named, readable staging area that you can verify separately before plugging into the UPDATE. This is especially useful when the deduplication logic is complex.
WITH latest_adjustment AS (
SELECT DISTINCT ON (product_id)
product_id,
new_price
FROM price_adjustments
ORDER BY product_id, applied_at DESC
)
UPDATE orders o
SET price = la.new_price
FROM latest_adjustment la
WHERE o.product_id = la.product_id;
You can run the CTE body as a standalone SELECT to confirm it returns exactly one row per product_id before committing the update. This makes auditing and code review much simpler.
One important note: by default, PostgreSQL may or may not materialize a CTE depending on the version and query shape. For the DISTINCT ON pattern this rarely matters, but if you're working with a more complex CTE that you need to force into a specific evaluation order, review how PostgreSQL CTE materialization affects query behavior before relying on a CTE for deduplication in a hot path.
Fix 3: Use a Lateral Join for Per-Row Lookups
When the correct source row depends on data from the target row (not just on a shared key), a lateral subquery gives you a correlated lookup that evaluates once per target row.
UPDATE orders o
SET price = pa.new_price
FROM LATERAL (
SELECT new_price
FROM price_adjustments
WHERE product_id = o.product_id
ORDER BY applied_at DESC
LIMIT 1
) pa
WHERE true;
The LATERAL keyword allows the subquery to reference columns from the outer table (o.product_id). The LIMIT 1 combined with ORDER BY applied_at DESC selects exactly the most recent row for each order. Because the subquery runs once per target row, you can never get a many-to-one ambiguity.
The trade-off is performance: a lateral join can be slower than a pre-aggregated subquery on large tables because it runs the inner query once per outer row. Check the query plan with EXPLAIN (ANALYZE, BUFFERS) and compare it to the subquery approach if you're updating millions of rows.
Common Pitfalls After Fixing
Fixing the immediate query doesn't mean you're done. Here are the mistakes that bring the same bug back.
Forgetting to wrap the fix in a transaction
Always run bulk updates inside an explicit transaction so you can inspect the result before committing:
BEGIN;
WITH latest_adjustment AS (
SELECT DISTINCT ON (product_id)
product_id,
new_price
FROM price_adjustments
ORDER BY product_id, applied_at DESC
)
UPDATE orders o
SET price = la.new_price
FROM latest_adjustment la
WHERE o.product_id = la.product_id
RETURNING o.order_id, o.price;
-- Inspect the RETURNING output, then:
COMMIT; -- or ROLLBACK;
The RETURNING clause lets you see exactly what was written without running a separate SELECT after the fact.
Adding a unique constraint and assuming the problem is gone
If you add a unique index to the source table on the join key, future inserts will be blocked β but existing duplicates won't be removed. Run the diagnostic query from earlier after any schema change to confirm the source table is actually clean.
Relying on application-layer ordering
Some engineers insert records into the source table in a specific order and assume PostgreSQL will read them back that way. Heap storage has no guaranteed row order. Data written in order today may be read in a different order after a VACUUM FULL or table rewrite. Always express your intended ordering explicitly in SQL, never in insertion order.
Not testing with production-scale data volumes
A query that works correctly on a 100-row staging table can behave differently at scale if the planner switches strategies. The wrong-row bug and the silent constraint failure on bulk inserts are both problems that often only appear under realistic data volumes. Test your fixed query with a representative data sample before deploying to production.
Wrapping Up
The core rule is simple: before any UPDATE FROM, verify that the join source returns exactly one row per target row. If it doesn't, fix the source β not the UPDATE clause itself.
Here are four concrete actions to take right now:
- Run the duplicate diagnostic on every existing
UPDATE FROMquery in your codebase. Any query joining to a table without a unique constraint on the join key is a candidate for this bug. - Standardize on the
DISTINCT ONsubquery pattern for updates that need the latest record per key. It's readable, performant, and deterministic. - Wrap all bulk updates in
BEGIN ... RETURNING ... ROLLBACK/COMMITduring development so you can audit the outcome before it lands in production. - Add a comment to complex update queries documenting why the source subquery is deduplicated and what the join cardinality is. Future reviewers (including yourself) will thank you.
- Check query plans with
EXPLAIN (ANALYZE, BUFFERS)when updating large tables, and compare the subquery vs. lateral approach to choose the faster one for your data distribution.
Frequently Asked Questions
Why does PostgreSQL UPDATE FROM update the wrong rows without throwing an error?
PostgreSQL silently applies the update for every row produced by the internal join between the target table and the FROM clause. When that join returns multiple source rows for a single target row, PostgreSQL updates the target multiple times and the last write wins β which source row is last depends on the query plan, not on any predictable ordering.
How can I check whether my UPDATE FROM join produces duplicate rows before running it?
Convert your UPDATE into a SELECT first, group by the target table's primary key, and use HAVING COUNT(*) > 1 to surface any target rows that match more than one source row. If this query returns results, your update will produce non-deterministic values and you must fix the join before proceeding.
Is DISTINCT ON safe to use inside an UPDATE FROM subquery in PostgreSQL?
Yes, DISTINCT ON is a PostgreSQL-specific feature that selects the first row per group based on the ORDER BY you specify, making the result deterministic. Using it inside a subquery in the FROM clause of an UPDATE is a well-established pattern for resolving many-to-one join ambiguity.
What is the difference between using a CTE versus a subquery to fix a bad UPDATE FROM join?
Both approaches work the same way functionally β they pre-aggregate the source data before the join so each target row matches exactly one source row. A CTE is often easier to read and audit because you can run its body as a standalone SELECT to verify its output before attaching the UPDATE.
Does adding ORDER BY to an UPDATE FROM query control which source row PostgreSQL uses?
No. ORDER BY in an UPDATE statement controls the order in which target rows are locked and processed, not which source row wins when there are multiple matches. To control which source row is selected, you must apply ORDER BY inside a subquery or lateral join within the FROM clause.
π€ Share this article
Sign in to saveRelated Articles
How-To Guides
Fixing PostgreSQL JSONB Query That Ignores Index and Falls Back to Seq Scan
9m read
How-To Guides
Fixing PostgreSQL LATERAL JOIN That Returns No Rows When Subquery References Outer Column
10m read
How-To Guides
Fixing Excel INDEX MATCH That Returns the Wrong Row When Data Is Sorted
8m read
Comments (0)
No comments yet. Be the first!