Pandas groupby Wrong Results: Common Pitfalls and Fixes

Your groupby ran cleanly, no exceptions, and the output DataFrame looks reasonable — but the numbers don't match what you calculated by hand. That's one of the more frustrating bugs in data work because there's no error to chase. The problem is in how you called groupby, not in your data.

Most groupby surprises come from a handful of predictable patterns: NaN rows disappearing quietly, multi-column group keys producing unexpected cartesian-style combinations, or using apply where transform is what you actually needed. Once you've seen these patterns, you'll recognize them instantly.

Why NaN values in group keys silently drop rows
How sort=True and observed=False can distort results you then use downstream
The difference between agg, transform, and apply — and when each one misleads you
How duplicate index values after a merge cause inflated aggregation totals
How to verify your groupby output before it reaches a dashboard or report

Prerequisites

You should be comfortable with basic Pandas operations: loading a DataFrame, slicing columns, and running simple aggregations. The examples below use Pandas 1.5+ and Python 3.9+. Most behaviors described here exist in older versions too, but the exact warning messages may differ.

NaN in Group Keys Silently Drops Rows

This is the most common silent data loss in groupby. By default, if a row's group key column contains NaN, that row is excluded from every group — no warning, no error.

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'region': ['North', 'South', 'North', None, 'South'],
    'sales': [100, 200, 150, 400, 300]
})

result = df.groupby('region')['sales'].sum()
print(result)
# region
# North    250
# South    500
# dtype: int64

Notice that the row with sales=400 simply vanished. The total should be 1150, but you got 750. If you're not checking row counts before and after, this will slip through.

The fix is dropna=False, introduced in Pandas 1.1:

result = df.groupby('region', dropna=False)['sales'].sum()
print(result)
# region
# North      250
# South      500
# NaN        400
# dtype: int64

Now the NaN group appears as its own bucket. You can then decide whether to impute, label, or filter those rows explicitly rather than having them vanish.

Categorical Columns and observed=False

When your group key is a Pandas Categorical dtype, groupby defaults to observed=False in older Pandas versions (prior to 2.0) — meaning it returns a row for every category that exists in the dtype, even if no data rows belong to it. In Pandas 2.0 the default changed to observed=True, but if you're on an earlier version you'll see phantom zero rows.

df['region'] = pd.Categorical(
    df['region'].fillna('Unknown'),
    categories=['North', 'South', 'East', 'West', 'Unknown']
)

result = df.groupby('region', observed=False)['sales'].sum()
print(result)
# region
# North      250
# South      500
# East         0   <-- no data, but appears anyway
# West         0   <-- same
# Unknown    400

This becomes dangerous when you compute percentages or ratios downstream. East and West contribute zeros to totals but still appear as rows, which inflates your group count and can produce divide-by-zero issues. Use observed=True unless you specifically need the empty categories.

Using agg vs transform: Choosing the Wrong One

agg reduces each group to a single scalar and returns a shorter DataFrame. transform returns a result with the same index shape as the original DataFrame, broadcasting the group result back onto every row. Mixing them up is a frequent source of bugs.

When agg misleads you

If you call agg and then try to join the result back to the original DataFrame by index, you're likely to hit an index mismatch or accidental reindex that fills with NaN.

# Risky pattern: computing group mean then merging back
group_mean = df.groupby('region')['sales'].agg('mean')
df['group_mean'] = df['region'].map(group_mean)  # works but fragile

The clean way with transform

# Correct pattern: transform keeps the original index
df['group_mean'] = df.groupby('region')['sales'].transform('mean')
print(df)
#   region  sales  group_mean
# 0  North    100       125.0
# 1  South    200       250.0
# 2  North    150       125.0
# 3  Unknown  400       400.0
# 4  South    300       250.0

Every row gets the mean of its group's sales, aligned correctly by index. No merge required, no risk of row duplication from an accidental many-to-many join.

apply Can Run Your Function Multiple Times

When you pass a custom function to groupby().apply(), Pandas may call it more than once on the first group to detect the return type. If your function has side effects — writing to a list, incrementing a counter, calling an API — you'll get duplicated effects.

call_log = []

def logged_sum(group):
    call_log.append(len(group))
    return group['sales'].sum()

df.groupby('region').apply(logged_sum)
print(call_log)
# [2, 2, 1, 1]  <-- first group called twice

This is a known Pandas behavior documented in the source, not a bug. The fix is to avoid side effects inside apply functions entirely. If you need to log or accumulate something, do it outside the apply call, using the resulting Series or DataFrame as input.

Duplicate Index Values Inflate Aggregations

After a merge or concat operation, your DataFrame may contain duplicate index values. When you then call groupby on a column, the aggregation still runs over those rows — but if you later do an index-based operation (like .loc on the result), you'll get unexpected multi-row returns.

df_a = pd.DataFrame({'id': [1, 2], 'sales': [100, 200]}, index=[0, 1])
df_b = pd.DataFrame({'id': [1, 2], 'region': ['North', 'South']}, index=[0, 1])

# Incorrect merge that resets nothing
merged = pd.concat([df_a, df_b], axis=0)  # stacks rows, not joins
print(merged.groupby('id')['sales'].sum())
# id
# 1.0    100.0
# 2.0    200.0
# NaN      NaN   <-- region rows had no sales column

The real danger is when a many-to-one merge is accidentally done as many-to-many, duplicating rows before aggregation and inflating every sum. Always call df.shape and df.duplicated().sum() before and after a merge, and reset the index with df.reset_index(drop=True) when you don't need the original index.

Multi-Column Group Keys and Unexpected Combinations

When you group by multiple columns, each unique combination of values becomes a group. This is correct behavior, but it surprises people who expect a hierarchical result instead of a flat multi-index.

df2 = pd.DataFrame({
    'region': ['North', 'North', 'South', 'South'],
    'product': ['A', 'B', 'A', 'A'],
    'sales': [100, 200, 300, 400]
})

result = df2.groupby(['region', 'product'])['sales'].sum()
print(result)
# region  product
# North   A          100
#         B          200
# South   A          700
# dtype: int64

If you expected a separate row for South/B (which has no data), it won't appear unless you're using a Categorical with observed=False. More importantly, accessing the result with a single-level key like result['North'] returns a Series, not a scalar — which then propagates type errors downstream. Use result.loc[('North', 'A')] with a tuple key when accessing a specific cell in a multi-index result.

sort=True Changes Row Order Downstream

By default, groupby(sort=True) sorts groups alphabetically or numerically. This is fine for display, but if you're relying on positional row order in any downstream step — like pairing results with a separate list or using iloc — you may be pairing the wrong values.

labels = ['South total', 'North total']  # manually ordered
result = df.groupby('region', sort=True)['sales'].sum()
# result index is ['North', 'South'] after sort — opposite of labels
result.index = labels  # now North gets labeled 'South total'

Use sort=False if you need to preserve insertion order, and never assign an index or label list to a groupby result unless you've explicitly verified the group order first. Better yet, use a dictionary or named Series to pair labels with values rather than relying on positional alignment.

Common Pitfalls at a Glance

Pitfall	Default behavior	Fix
NaN in group key	Rows dropped silently	`groupby(dropna=False)`
Categorical with empty categories	Phantom zero rows appear	`groupby(observed=True)`
agg + manual join	Index mismatch or NaN fill	Use `transform` instead
apply with side effects	Function called twice on first group	Remove side effects from apply
Duplicate rows after merge	Inflated aggregation totals	Check duplicates before groupby
Multi-index key access	Single key returns Series, not scalar	Use tuple key with `.loc`
sort=True with positional labels	Wrong label-to-value pairing	Use `sort=False` or named mapping

How to Verify groupby Results Before They Reach Production

A few quick sanity checks can catch most of these issues before a wrong number ends up in a dashboard or report.

First, compare row counts. The sum of group sizes should equal the number of rows in your DataFrame (if dropna=False) or your DataFrame minus the NaN-key rows (if dropna=True):

group_sizes = df.groupby('region', dropna=False).size()
print(group_sizes.sum(), 'vs', len(df))  # should match

Second, verify totals against a global aggregation:

group_sum = df.groupby('region', dropna=False)['sales'].sum().sum()
global_sum = df['sales'].sum()
assert group_sum == global_sum, f"Totals diverge: {group_sum} vs {global_sum}"

Third, spot-check a known group manually. Pick one group value, filter the DataFrame directly with .loc, and compute the same metric. If the result differs from what groupby produced, you've isolated a bug.

Next Steps

Now that you know where the traps are, here are concrete things to do right now:

Audit any existing groupby calls in your codebase and check whether dropna=True is silently dropping rows you need.
Add a row-count assertion and total-sum assertion as automated checks in any ETL or reporting script that uses groupby.
Replace any pattern of agg + manual merge-back with transform where the use case fits — it's simpler and index-safe.
If you're on Pandas below 2.0 and using Categorical columns, set observed=True explicitly on every groupby call to avoid phantom rows.
Run df.duplicated().sum() after every merge before any aggregation step — it takes one line and has saved countless incorrect reports.

Pandas groupby Giving Wrong Results: Common Pitfalls and Fixes

Prerequisites

NaN in Group Keys Silently Drops Rows

Categorical Columns and observed=False

Using agg vs transform: Choosing the Wrong One

When agg misleads you

The clean way with transform

apply Can Run Your Function Multiple Times

Duplicate Index Values Inflate Aggregations

Multi-Column Group Keys and Unexpected Combinations

sort=True Changes Row Order Downstream

Common Pitfalls at a Glance

How to Verify groupby Results Before They Reach Production

Next Steps

Related Articles

Fixing Incorrect Totals in Power BI Matrix Visuals Without Rewriting Your Model

Tableau LOD Expressions Explained: Fixing Wrong Aggregation Levels

Fixing Slow Power BI Reports: DAX Measures That Kill Dashboard Performance

Comments (0)

Leave a Comment

Pandas groupby Giving Wrong Results: Common Pitfalls and Fixes

Prerequisites

NaN in Group Keys Silently Drops Rows

Categorical Columns and observed=False

Using agg vs transform: Choosing the Wrong One

When agg misleads you

The clean way with transform

apply Can Run Your Function Multiple Times

Duplicate Index Values Inflate Aggregations

Multi-Column Group Keys and Unexpected Combinations

sort=True Changes Row Order Downstream

Common Pitfalls at a Glance

How to Verify groupby Results Before They Reach Production

Next Steps

Related Articles

Fixing Incorrect Totals in Power BI Matrix Visuals Without Rewriting Your Model

Tableau LOD Expressions Explained: Fixing Wrong Aggregation Levels

Fixing Slow Power BI Reports: DAX Measures That Kill Dashboard Performance

Comments (0)

Leave a Comment

Stay ahead of the curve