Why does pandas pivot_table aggregate columns I didn't ask for?

When you don't specify the values parameter, pivot_table defaults to aggregating every numeric column that isn't used as index or column labels. Always pass values explicitly to avoid this silent behavior.

How do I use aggfunc with multiple columns in pivot_table without getting wrong results?

Pass a dictionary to aggfunc where each key exactly matches a column name listed in values. If any key is misspelled or missing, Pandas may silently apply the default aggregation instead of raising an error.

Does pandas pivot_table raise an error when aggregating the wrong column?

No, in most cases it does not. Pandas silently complies with the default behavior, which is why the bug is hard to catch. You need to verify the output explicitly against known totals or a manual calculation.

How can I verify that my pivot_table output is correct?

Cross-check a single cell value using a filtered groupby or a direct DataFrame filter on the same index-column combination. If the numbers match, the pivot is working as intended.

What changed in Pandas 2.0 that affects pivot_table behavior?

Pandas 2.0 removed the implicit numeric_only mode, so applying a numeric aggregation like sum or mean to a string column now raises a TypeError instead of silently ignoring that column. Code that worked silently in Pandas 1.x may break during an upgrade.

Pandas pivot_table Wrong Column Aggregation: How to Fix It

You run pd.pivot_table(), the output looks plausible, and you move on. Two hours later you realize the revenue column is actually averaging quantities, or the aggregation is running on a column you never asked for. Pandas didn't warn you — it just complied.

This is one of the most frustrating bugs in a data pipeline because the result looks correct. The fix is usually a single parameter, but you need to know which one and why.

What You'll Learn

How pivot_table chooses which columns to aggregate by default
Why omitting the values parameter causes silent wrong results
How aggfunc interacts with column selection in unexpected ways
How fill_value and margins can mask the underlying bug
A repeatable verification pattern to confirm your pivot output is correct

Prerequisites

You should be comfortable constructing a basic DataFrame and calling pd.pivot_table(). The examples use Pandas 1.5 and above, though the behavior described here has been consistent for many major versions. A working Python environment with Pandas installed is all you need to follow along.

How pivot_table Decides What to Aggregate

Before you can fix the problem, you need a clear mental model of what pivot_table is actually doing. The function signature looks straightforward:

pd.pivot_table(
    data,
    values=None,
    index=None,
    columns=None,
    aggfunc='mean',
    fill_value=None,
    margins=False,
    observed=False
)

The index and columns parameters define the shape of the output table — the row labels and column labels respectively. The values parameter tells Pandas which column(s) to place in the cells. If you leave values as None, Pandas aggregates every numeric column that isn't used as an index or column label. That default behavior is the source of most silent aggregation bugs.

The Most Common Cause: Not Specifying `values`

Here is a concrete example. Suppose you have a sales DataFrame with columns region, product, quantity, and revenue.

import pandas as pd

data = {
    'region':   ['North', 'North', 'South', 'South', 'North'],
    'product':  ['A', 'B', 'A', 'B', 'A'],
    'quantity': [10, 20, 15, 5, 8],
    'revenue':  [100, 400, 150, 80, 90],
}
df = pd.DataFrame(data)

# Intent: pivot revenue by region and product
table = pd.pivot_table(df, index='region', columns='product')
print(table)

The output will look like this:

        quantity          revenue
product        A     B        A      B
region
North        9.0  20.0     95.0  400.0
South       15.0   5.0    150.0   80.0

Notice the problem: you probably wanted only revenue, but you got both quantity and revenue. If you had written a downstream script that accessed table['revenue'], it would work — but if the column hierarchy differed in any way (say, you renamed a column upstream), you might accidentally pick up quantity instead, and nothing would throw an error.

The fix is to always be explicit about values:

table = pd.pivot_table(
    df,
    values='revenue',
    index='region',
    columns='product',
    aggfunc='sum'
)
print(table)

product      A      B
region
North    190.0   400.0
South    150.0    80.0

Clean, unambiguous, and exactly what you asked for. This single habit eliminates the majority of silent aggregation bugs.

The same kind of silent mismatch shows up in groupby workflows — if you've seen wrong numbers from Pandas groupby aggregation, the root cause is often the same default-column behavior.

When `aggfunc` Picks the Wrong Column

Even when you do specify values, you can still get wrong results if you pass a dictionary to aggfunc and the keys don't exactly match your column names.

# This looks correct but silently drops revenue aggregation
table = pd.pivot_table(
    df,
    values=['quantity', 'revenue'],
    index='region',
    columns='product',
    aggfunc={'quantity': 'sum', 'Revenue': 'sum'}  # typo: 'Revenue' != 'revenue'
)
print(table)

Pandas will not raise a KeyError here. It simply applies the default mean to any column in values that has no matching key in the aggfunc dict — or in some versions, it ignores that column entirely. The output will differ from what you intended, with no warning.

To guard against this, verify that every column you list in values has a corresponding key in aggfunc:

values_cols = ['quantity', 'revenue']
agg_map = {'quantity': 'sum', 'revenue': 'sum'}

assert set(values_cols) == set(agg_map.keys()), \
    f"Mismatch: {set(values_cols).symmetric_difference(agg_map.keys())}"

table = pd.pivot_table(
    df,
    values=values_cols,
    index='region',
    columns='product',
    aggfunc=agg_map
)

This assertion costs nothing in production and saves you from subtle pipeline corruption.

Numeric-Only Mode and Its Hidden Behavior

In older versions of Pandas (pre-2.0), pivot_table had an implicit numeric_only mode that silently dropped non-numeric columns from the aggregation. In Pandas 2.0, this behavior changed: Pandas will raise a TypeError if you try to apply a numeric aggregation to a string column instead of silently ignoring it.

This means code that used to produce a narrower-than-expected pivot table now raises an exception — which is actually an improvement, but it can catch you off guard during a version upgrade. If you're upgrading from Pandas 1.x to 2.x and your pivot calls suddenly break, check whether your values list contains any non-numeric columns that were previously silently excluded.

# Check your columns before pivoting
print(df[values_cols].dtypes)

Any column showing object dtype with a numeric aggfunc like sum or mean will be the culprit. Either cast it or remove it from values.

This kind of dtype-related silent failure also comes up in Pandas merge operations that silently drop rows, where type mismatches on join keys produce unexpectedly small output.

The `fill_value` and `margins` Trap

Two optional parameters — fill_value and margins — can make an already-wrong pivot table look convincing.

fill_value Hides Missing Combinations

When a combination of index and column values has no rows in the source data, pivot_table produces NaN by default. Setting fill_value=0 replaces those gaps with zeros, which is often correct. But if your aggregation was targeting the wrong column to begin with, those zeros will happily sit in your output without any signal that something upstream went wrong.

# NaN hints that data is missing -- fill_value hides this signal
table_without_fill = pd.pivot_table(df, values='revenue', index='region', columns='product', aggfunc='sum')
print(table_without_fill)  # NaN where no data exists -- a useful diagnostic

table_with_fill = pd.pivot_table(df, values='revenue', index='region', columns='product', aggfunc='sum', fill_value=0)
print(table_with_fill)  # Zeros everywhere -- harder to spot if something is wrong

The rule of thumb: only add fill_value after you've confirmed the pivot is correct. Use raw NaN output during development as a diagnostic signal.

margins=True Can Give Misleading Totals

margins=True adds a row and column labelled

Pandas pivot_table Silently Aggregating Wrong Columns: How to Fix It

What You'll Learn

Prerequisites

How pivot_table Decides What to Aggregate

The Most Common Cause: Not Specifying `values`

When `aggfunc` Picks the Wrong Column

Numeric-Only Mode and Its Hidden Behavior

The `fill_value` and `margins` Trap

fill_value Hides Missing Combinations

margins=True Can Give Misleading Totals

Frequently Asked Questions

Related Articles

Power BI Relationships Causing Duplicate Rows in Visuals: How to Fix

SQL GROUP BY Silently Excluding NULLs: How to Catch and Fix It

Diagnosing Slow Pandas read_csv Calls on Large Files: What to Fix First

Comments (0)

Leave a Comment

Pandas pivot_table Silently Aggregating Wrong Columns: How to Fix It

What You'll Learn

Prerequisites

How pivot_table Decides What to Aggregate

The Most Common Cause: Not Specifying values

When aggfunc Picks the Wrong Column

Numeric-Only Mode and Its Hidden Behavior

The fill_value and margins Trap

fill_value Hides Missing Combinations

margins=True Can Give Misleading Totals

Frequently Asked Questions

Related Articles

Power BI Relationships Causing Duplicate Rows in Visuals: How to Fix

SQL GROUP BY Silently Excluding NULLs: How to Catch and Fix It

Diagnosing Slow Pandas read_csv Calls on Large Files: What to Fix First

Comments (0)

Leave a Comment

Stay ahead of the curve

The Most Common Cause: Not Specifying `values`

When `aggfunc` Picks the Wrong Column

The `fill_value` and `margins` Trap