Why does SELECT DISTINCT return more rows than GROUP BY on the same table?

SELECT DISTINCT returns more rows when your SELECT list includes a column with varying values that GROUP BY would collapse via aggregation. Every unique combination of all selected columns survives DISTINCT, so a high-cardinality column like a price or timestamp will produce far more rows than a GROUP BY on just the key columns.

Is SELECT DISTINCT slower than GROUP BY in SQL?

For identical column sets with no aggregation, most modern databases generate the same or very similar execution plans for both. The practical performance difference is negligible. GROUP BY becomes the better choice the moment you need aggregate functions, while DISTINCT is cleaner for pure deduplication.

Can DISTINCT and GROUP BY ever return different results even with the same columns?

Yes, in MySQL with ONLY_FULL_GROUP_BY disabled, GROUP BY can silently return an arbitrary value for non-grouped columns, while DISTINCT always uses the actual row values. Also, DISTINCT ON in PostgreSQL is a different operator entirely and will produce one row per specified key, which is not equivalent to SELECT DISTINCT on all columns.

How do NULL values affect DISTINCT vs GROUP BY row counts?

Both DISTINCT and GROUP BY treat multiple NULL values as a single group, so a column with many NULLs will collapse them into one output row under either approach. The divergence appears when a join produces NULLs on one side or when a WHERE clause filters NULLs differently across two versions of your query.

When should I use COUNT(DISTINCT column) instead of SELECT DISTINCT?

Use COUNT(DISTINCT column) when you want a single number representing how many unique non-NULL values exist in a column, without retrieving the individual values. Use SELECT DISTINCT when you need the actual deduplicated rows returned as a result set for further use or inspection.

SQL DISTINCT vs GROUP BY Different Row Counts Explained

You write a query with SELECT DISTINCT, check the row count, then rewrite it with GROUP BY expecting the same result — and the numbers don't match. Before you start doubting your data, know this: the difference is almost always intentional behavior, not a bug. Understanding why they diverge will save you a lot of time.

What you'll learn

The fundamental difference between DISTINCT and GROUP BY in SQL
Four concrete scenarios that cause different row counts between the two
How NULL handling differs and silently changes your results
How to verify which result is actually correct for your use case
Performance tradeoffs so you can pick the right tool for the job

The problem: same query, different counts

Consider a sales table with columns customer_id, region, and amount. You run two queries that look equivalent:

-- Query A
SELECT DISTINCT customer_id, region
FROM sales;

-- Query B
SELECT customer_id, region
FROM sales
GROUP BY customer_id, region;

In this simple form, both queries return the same rows. Now add an aggregated column to Query B:

-- Query B (modified)
SELECT customer_id, region, SUM(amount) AS total
FROM sales
GROUP BY customer_id, region;

Suddenly you cannot run the equivalent with DISTINCT without also selecting amount — and as soon as you do that, the row counts can diverge. The divergence always has a root cause. Let's go through each one.

What DISTINCT actually does

SELECT DISTINCT eliminates duplicate rows from the result set after the column list in your SELECT is evaluated. It operates on the entire projected row — every column you listed. Two rows are considered duplicates only if all selected column values are identical.

SELECT DISTINCT customer_id, region, amount
FROM sales;

If the same customer bought in the same region with the same exact amount twice, those two rows collapse into one. If the amounts differ, both rows survive. DISTINCT does no aggregation — it simply deduplicates the projected output.

What GROUP BY actually does

GROUP BY partitions rows into groups, one group per unique combination of the grouped columns. Any column in your SELECT that is not in the GROUP BY clause must be wrapped in an aggregate function (SUM, COUNT, MAX, etc.).

SELECT customer_id, region, COUNT(*) AS purchase_count
FROM sales
GROUP BY customer_id, region;

Here, each unique (customer_id, region) pair produces exactly one output row, regardless of how many source rows it contains. GROUP BY is fundamentally about collapsing and summarizing. DISTINCT is about deduplicating a flat projection. That distinction is where the trouble starts.

Why the counts diverge: four concrete scenarios

Scenario 1: NULL values

NULL handling is the most common hidden cause of mismatched counts. DISTINCT treats NULLs as equal — two NULL values in the same column collapse into one row. GROUP BY does the same thing. So far, identical. But when you join GROUP BY with COUNT, the behavior can look very different.

-- Returns one row for NULL region
SELECT DISTINCT region FROM sales;

-- Also returns one row for NULL region
SELECT region FROM sales GROUP BY region;

The counts match here. But if you filter NULLs in a WHERE clause on one query and not the other, or if a join produces NULLs on one side, counts can silently diverge. If you're already investigating NULL-related GROUP BY quirks, the article on SQL GROUP BY silently excluding NULLs covers that in detail.

Scenario 2: Aggregated columns included in SELECT

This is the most direct cause of different row counts. The moment you add a non-grouped column to your SELECT DISTINCT, you change what "duplicate" means.

-- Each unique (customer_id, region, amount) is a distinct row
SELECT DISTINCT customer_id, region, amount
FROM sales;

-- Each unique (customer_id, region) is one row, amounts summed
SELECT customer_id, region, SUM(amount) AS total
FROM sales
GROUP BY customer_id, region;

Say customer 42 in the North region made three purchases: $10, $20, and $20. The DISTINCT query returns two rows (the $20 pair collapses). The GROUP BY query returns one row with a total of $50. These are answering fundamentally different questions — and both results are correct, just for different problems.

Scenario 3: DISTINCT ON vs GROUP BY in PostgreSQL

PostgreSQL offers DISTINCT ON (column), which is not standard SQL and behaves differently from both SELECT DISTINCT and GROUP BY.

-- PostgreSQL only: keep the first row per customer_id
SELECT DISTINCT ON (customer_id)
  customer_id, region, amount
FROM sales
ORDER BY customer_id, amount DESC;

DISTINCT ON keeps exactly one row per unique value of the specified column — chosen by the ORDER BY. GROUP BY without aggregation is not allowed in standard SQL for the non-grouped columns. The row counts from DISTINCT ON (customer_id) will equal the number of unique customer_id values. A GROUP BY customer_id with all columns selected would require aggregating region and amount, which changes what you get back. These are genuinely different operations.

Scenario 4: Joins amplifying duplicates before deduplication

When you join tables before applying DISTINCT or GROUP BY, the join can multiply rows in ways that affect the two operations differently.

-- sales joined to order_items: each sale row may match multiple items
SELECT DISTINCT s.customer_id, s.region
FROM sales s
JOIN order_items oi ON s.sale_id = oi.sale_id;

SELECT s.customer_id, s.region
FROM sales s
JOIN order_items oi ON s.sale_id = oi.sale_id
GROUP BY s.customer_id, s.region;

Both of these return the same row count here — because the SELECT and GROUP BY columns are identical. But if you add oi.item_price to the SELECT DISTINCT, row counts explode because each item price variation survives deduplication. This is a common trap when you copy a GROUP BY query and swap it for DISTINCT without thinking about what columns are being exposed. For a deeper look at how joins amplify duplicates, check out this post on Power BI relationships causing duplicate rows in visuals — the join semantics apply equally in SQL.

How to verify which result is correct

When your counts don't match, run this diagnostic sequence. First, check what columns each query is actually comparing:

-- Are the column lists truly equivalent?
SELECT DISTINCT customer_id, region FROM sales;     -- compares both columns
SELECT customer_id FROM sales GROUP BY customer_id; -- compares only one

Second, count NULLs in the relevant columns:

SELECT
  COUNT(*) AS total_rows,
  COUNT(customer_id) AS non_null_customer,
  COUNT(region) AS non_null_region
FROM sales;

Third, use a HAVING clause to find groups with more than one source row, which tells you where GROUP BY is collapsing data that DISTINCT would not:

SELECT customer_id, region, COUNT(*) AS row_count
FROM sales
GROUP BY customer_id, region
HAVING COUNT(*) > 1;

Any row in this result is a place where GROUP BY reduced multiple source rows to one. SELECT DISTINCT customer_id, region would do the same — but if you add any varying column to the DISTINCT select list, those rows will reappear.

Finally, confirm your intent: are you trying to deduplicate a flat list, or summarize grouped data? If it's the former, use DISTINCT. If it's the latter, use GROUP BY. Mixing the two mental models is what causes the confusion in the first place.

Performance considerations

For simple deduplication of the same column set, most modern query optimizers (PostgreSQL, SQL Server, MySQL 8+) generate equivalent or near-equivalent execution plans for DISTINCT and GROUP BY. You can verify this with EXPLAIN or EXPLAIN ANALYZE.

EXPLAIN ANALYZE
SELECT DISTINCT customer_id, region FROM sales;

EXPLAIN ANALYZE
SELECT customer_id, region FROM sales GROUP BY customer_id, region;

In practice, DISTINCT is slightly cleaner to write when you only need deduplication. GROUP BY is the right choice the moment you need any aggregation. Using GROUP BY purely for deduplication (without aggregate functions) works, but a reader of your code will wonder if you forgot a COUNT or SUM. Clarity matters.

On large datasets, both operations need a sort or hash-based deduplication step. If the column you're deduplicating is indexed, the optimizer may use the index to skip the sort entirely. If your query is slow, look at whether adding an index on the GROUP BY or DISTINCT columns helps. The same principles that apply to window function performance apply here — see the breakdown on SQL window functions giving wrong totals when PARTITION BY is missing for context on how the optimizer handles partitioned vs. grouped operations.

Common pitfalls to watch for

Adding a non-grouped column to SELECT DISTINCT changes the granularity. Every extra column you add is a new dimension of uniqueness. A column with high cardinality (like a timestamp or price) will drastically increase the row count.
Using DISTINCT inside COUNT(DISTINCT column) is not the same as SELECT DISTINCT. COUNT(DISTINCT customer_id) counts unique non-NULL values of that column. It does not deduplicate the result set.
MySQL's historical lenient GROUP BY. In older MySQL configurations (ONLY_FULL_GROUP_BY disabled), you could write GROUP BY customer_id while selecting region without an aggregate. The database would silently pick an arbitrary value for region. This is a major source of wrong results. Always enable ONLY_FULL_GROUP_BY or use standard-compliant SQL.
Subqueries and CTEs hiding row multiplication. If your FROM clause references a subquery or CTE that joins tables, the row count entering your DISTINCT or GROUP BY may already be inflated. Debug by running the inner query first and checking its count.
DISTINCT on a SELECT * after a join. This is almost always a mistake. You're deduplicating every column from every joined table, which rarely does what you intend.

These patterns appear in many forms across different tools. If you're working in pandas and hitting similar deduplication issues after reshaping, the guide on pandas melt and stack producing duplicate rows covers the equivalent traps in Python.

Wrapping up

DISTINCT and GROUP BY are not interchangeable, even when they occasionally return the same result. Here are the concrete actions to take from here:

Audit your SELECT list. Every column you add to a SELECT DISTINCT changes what counts as a duplicate. Make sure the columns match the question you're asking.
Run EXPLAIN on both queries when you see a count mismatch. The execution plan often reveals where a join is multiplying rows before deduplication kicks in.
Check for NULLs in your key columns using COUNT(*) vs COUNT(column) and cross-reference with your GROUP BY output.
Use GROUP BY when you need aggregation, and DISTINCT when you only need a deduplicated list with no summary columns.
Enable ONLY_FULL_GROUP_BY in MySQL environments to prevent the database from silently picking arbitrary column values in non-grouped columns.

SQL DISTINCT vs GROUP BY Giving Different Row Counts Explained

What you'll learn

The problem: same query, different counts

What DISTINCT actually does

What GROUP BY actually does

Why the counts diverge: four concrete scenarios

Scenario 1: NULL values

Scenario 2: Aggregated columns included in SELECT

Scenario 3: DISTINCT ON vs GROUP BY in PostgreSQL

Scenario 4: Joins amplifying duplicates before deduplication

How to verify which result is correct

Performance considerations

Common pitfalls to watch for

Wrapping up

Frequently Asked Questions

Related Articles

Power BI Time Intelligence Giving Wrong Results with Non-Standard Calendars

Power BI Cumulative Totals Resetting Unexpectedly: DAX Fixes That Actually Work

Pandas melt and stack Producing Duplicate Rows: Reshaping Pitfalls Fixed

Comments (0)

Leave a Comment

SQL DISTINCT vs GROUP BY Giving Different Row Counts Explained

What you'll learn

The problem: same query, different counts

What DISTINCT actually does

What GROUP BY actually does

Why the counts diverge: four concrete scenarios

Scenario 1: NULL values

Scenario 2: Aggregated columns included in SELECT

Scenario 3: DISTINCT ON vs GROUP BY in PostgreSQL

Scenario 4: Joins amplifying duplicates before deduplication

How to verify which result is correct

Performance considerations

Common pitfalls to watch for

Wrapping up

Frequently Asked Questions

Related Articles

Power BI Time Intelligence Giving Wrong Results with Non-Standard Calendars

Power BI Cumulative Totals Resetting Unexpectedly: DAX Fixes That Actually Work

Pandas melt and stack Producing Duplicate Rows: Reshaping Pitfalls Fixed

Comments (0)

Leave a Comment

Stay ahead of the curve