Fixing PostgreSQL DISTINCT ON That Returns Wrong Row When Order Matters
You wrote a DISTINCT ON query to pick one row per group, ran it, and the wrong row came back. Maybe you wanted the most recent record but got the oldest. Maybe you expected the highest value but got a random-looking one. This is one of PostgreSQL's most common gotchas, and the fix is simple once you understand the rule.
What you'll learn
- Exactly how PostgreSQL decides which row to keep with
DISTINCT ON - Why
ORDER BYis not optional when usingDISTINCT ON - How to return the latest, oldest, or highest-value row per group
- When to use a subquery or CTE instead of
DISTINCT ON - How
DISTINCT ONcompares toROW_NUMBER()window functions
What DISTINCT ON Actually Does
DISTINCT ON (expr) tells PostgreSQL: for each unique value of expr, keep exactly one row and discard the rest. The surviving row is always the first row in the result set for that group, based on the sort order you specify with ORDER BY.
That last part is the key. PostgreSQL does not pick a random row, and it does not pick the row with the highest or lowest value automatically. It picks the first row after sorting. If you do not specify ORDER BY, the database can sort however it likes, giving you unpredictable results.
Here is the basic syntax:
SELECT DISTINCT ON (column_to_deduplicate)
column_to_deduplicate,
other_column,
another_column
FROM your_table
ORDER BY column_to_deduplicate, sort_column DESC;
The ORDER BY clause must start with the same expressions used in DISTINCT ON. PostgreSQL enforces this β you will get an error if you try to order by something unrelated without including the distinct column first.
Why You're Getting the Wrong Row
The most common mistake is writing DISTINCT ON without an ORDER BY, or writing an ORDER BY that does not express which row you want to survive.
Consider a table of user events:
CREATE TABLE events (
id SERIAL PRIMARY KEY,
user_id INT,
event_type TEXT,
created_at TIMESTAMPTZ
);
You want the most recent event per user. You write:
-- Broken: no ORDER BY means unpredictable results
SELECT DISTINCT ON (user_id)
user_id,
event_type,
created_at
FROM events;
PostgreSQL is free to return any row for each user_id. In practice, it often returns rows in physical storage order, which might look consistent during development but will break after a VACUUM, an index scan, or a table rewrite.
A second common mistake is including an ORDER BY that sorts in the wrong direction:
-- Broken: ASC means the OLDEST row survives, not the newest
SELECT DISTINCT ON (user_id)
user_id,
event_type,
created_at
FROM events
ORDER BY user_id, created_at ASC;
Because DISTINCT ON keeps the first row and ASC sorts oldest-first, you get the oldest event per user. If you wanted the newest, the sort direction is backwards.
The Fix: Align ORDER BY With Your Intent
The rule is straightforward: the secondary sort columns in your ORDER BY determine which row survives. Put the row you want at the top of each group.
- You want the most recent row: sort by the timestamp
DESC. - You want the oldest row: sort by the timestamp
ASC. - You want the row with the highest value: sort by that column
DESC. - You want the row with the lowest value: sort by that column
ASC.
The first column in ORDER BY must always match what you put in DISTINCT ON. Everything after that is your sorting preference within each group.
Returning the Latest Row Per Group
This is the most frequent use case: one row per user, customer, or entity, where you want the most recent record.
-- Correct: DESC on created_at makes the newest row come first
SELECT DISTINCT ON (user_id)
user_id,
event_type,
created_at
FROM events
ORDER BY user_id, created_at DESC;
PostgreSQL sorts the entire events table by user_id first, then by created_at DESC within each user. After that sort, the first row in each user group is the most recent event, and that is what DISTINCT ON keeps.
If you also want to control the final output order (for example, display users sorted by their name), you need to wrap this in a subquery. More on that in a moment.
Returning the Row With the Highest Value
The same logic applies when you want to pick a row based on a numeric column, such as the highest score or the largest order amount.
-- One row per customer: the order with the highest total amount
SELECT DISTINCT ON (customer_id)
customer_id,
order_id,
total_amount
FROM orders
ORDER BY customer_id, total_amount DESC;
Sorting total_amount DESC puts the biggest order at the top of each customer group. DISTINCT ON picks that top row and discards the rest. Clean and fast.
Using a Subquery or CTE When You Need Both
Here is a real constraint: the outer ORDER BY in a DISTINCT ON query controls which row survives, not how the final results are presented. You cannot reorder the final output with a different ORDER BY in the same query level.
If you need the deduplication to pick the newest row and you want to sort the final results by username alphabetically, wrap the DISTINCT ON query in a subquery or CTE:
-- CTE approach: deduplicate first, then sort the output however you like
WITH latest_events AS (
SELECT DISTINCT ON (user_id)
user_id,
event_type,
created_at
FROM events
ORDER BY user_id, created_at DESC
)
SELECT
u.username,
le.event_type,
le.created_at
FROM latest_events le
JOIN users u ON u.id = le.user_id
ORDER BY u.username ASC;
The CTE handles deduplication with its own ORDER BY. The outer query joins and sorts however you need without interfering with which row was chosen. This pattern also makes your intent explicit and easier for teammates to read.
If you're dealing with unexpected NULL values in aggregated results from similar queries, the same subquery-isolation technique applies β it is the same mental model as fixing Pandas pivot_table queries that return NaN instead of zero: isolate the transformation step before applying the final projection.
DISTINCT ON vs ROW_NUMBER() β Which Should You Use?
DISTINCT ON is PostgreSQL-specific syntax. ROW_NUMBER() is standard SQL and works across most databases. For straightforward deduplication within PostgreSQL, DISTINCT ON is often shorter and slightly more readable. But ROW_NUMBER() gives you more control and is more portable.
Here is the equivalent query using a window function:
-- ROW_NUMBER equivalent: gives you the same result with more flexibility
SELECT
user_id,
event_type,
created_at
FROM (
SELECT
user_id,
event_type,
created_at,
ROW_NUMBER() OVER (
PARTITION BY user_id
ORDER BY created_at DESC
) AS rn
FROM events
) ranked
WHERE rn = 1;
Use DISTINCT ON when you are working exclusively in PostgreSQL and want concise code. Prefer ROW_NUMBER() when you need to keep rows that tie (use RANK() or DENSE_RANK() instead of filtering on rn = 1), or when portability matters. The window function approach is also easier to extend β you can add a second ranking criterion without restructuring the query.
Performance considerations
Both approaches can use an index to avoid a full sort. For DISTINCT ON, a multi-column index on (user_id, created_at DESC) lets PostgreSQL use an index scan and skip the sort step entirely. Create it like this:
CREATE INDEX idx_events_user_created
ON events (user_id, created_at DESC);
Run EXPLAIN ANALYZE on your query before and after adding the index. On large tables, the difference between a sequential scan with a sort and an index scan can be significant.
Common Pitfalls
Forgetting that DISTINCT ON columns must lead ORDER BY
PostgreSQL will raise an error if the expressions in DISTINCT ON do not appear at the beginning of ORDER BY. This is by design. The fix is to always put your distinct column(s) first in ORDER BY, then add your sort preference columns after them.
Using DISTINCT instead of DISTINCT ON
Plain SELECT DISTINCT deduplicates the entire row across all selected columns. DISTINCT ON (col) deduplicates based on one column while keeping all other columns from a single chosen row. These are not interchangeable. Using SELECT DISTINCT when you meant DISTINCT ON will give you more rows than expected, or a syntax error when combined with selective column projections.
NULL values in the distinct column
PostgreSQL treats NULL values as equal for the purpose of DISTINCT ON. All rows where the distinct column is NULL form a single group, and only one survives. If your column has NULLs and you did not expect this, filter them out with a WHERE col IS NOT NULL clause before deduplication.
Expecting DISTINCT ON to work across joined tables without a subquery
When you join tables before applying DISTINCT ON, the join can multiply rows. The DISTINCT ON then deduplicates after the join, which may not give you what you want. The safer approach is to deduplicate in a subquery first, then join β the same isolation technique shown in the CTE example above. This is conceptually identical to the isolation patterns needed when debugging stale data issues in SQLite queries β the source of truth matters as much as the query logic.
Assuming the result order matches DISTINCT ON order
The final output order of a DISTINCT ON query is only guaranteed insofar as it matches the ORDER BY you specified. If you add a second query level (pagination, a wrapper query), that order can change. Always apply your final ORDER BY at the outermost query level.
Wrapping Up
The root cause of DISTINCT ON returning the wrong row is almost always a missing or incorrectly directed ORDER BY. PostgreSQL keeps the first row per group, and ORDER BY is what defines
Frequently Asked Questions
Why does PostgreSQL DISTINCT ON return a different row each time I run the query?
Without an ORDER BY clause, PostgreSQL can return any row from each group, and the choice can change based on how the planner accesses the table. Always include ORDER BY with the DISTINCT ON column leading, followed by the column that defines which row you want to keep.
Can I use DISTINCT ON with multiple columns in PostgreSQL?
Yes. DISTINCT ON (col1, col2) deduplicates based on the combination of both columns, keeping one row for each unique pair. Your ORDER BY must start with those same columns in the same order before adding any secondary sort criteria.
Is DISTINCT ON faster than using ROW_NUMBER() in PostgreSQL?
Performance is similar when both can use an index, but DISTINCT ON often has a slight edge for simple cases because the planner can optimize it directly. For complex filtering or portability needs, ROW_NUMBER() is more flexible despite similar cost.
How do I return the two most recent rows per group in PostgreSQL, not just one?
DISTINCT ON only keeps exactly one row per group, so you need ROW_NUMBER() for this. Use ROW_NUMBER() OVER (PARTITION BY group_col ORDER BY sort_col DESC) in a subquery, then filter WHERE rn <= 2 in the outer query.
Does PostgreSQL DISTINCT ON work correctly with NULL values in the grouping column?
PostgreSQL treats all NULL values in the DISTINCT ON column as belonging to the same group, so only one NULL row survives. If you need to preserve all rows where the column is NULL, filter them out before applying DISTINCT ON and union the results back.
π€ Share this article
Sign in to saveRelated Articles
Comments (0)
No comments yet. Be the first!