Fixing Python Pandas pivot_table That Returns NaN Instead of Zero
You build a pivot_table, the shape looks exactly right, but half the cells contain NaN where you expected a tidy zero. The report is broken, and the aggfunc you chose makes it worse. This is one of the most common Pandas gotchas, and it has a handful of distinct causes β each needing a slightly different fix.
What you'll learn
- Why
pivot_tableproducesNaNfor missing combinations - How to use
fill_valuecorrectly (and when it silently does nothing) - How
dropnainteracts with your index and columns - How custom
aggfuncchoices propagateNaNunexpectedly - A reliable post-pivot cleanup strategy as a safety net
Why pivot_table Produces NaN in the First Place
A pivot table is essentially a grouped aggregation laid out on a two-dimensional grid. When a particular combination of row index and column index has no rows in the source data, Pandas has nothing to aggregate β so it fills that cell with NaN rather than inventing a value.
This is the correct default behavior for many aggregations. The mean of zero rows is genuinely undefined, not zero. The problem is that for count or sum aggregations, zero is the right answer, and the default NaN breaks downstream arithmetic.
There is a second, less obvious cause: NaN values already present in your source columns propagate through the aggregation. A single NaN in a sum column causes the whole group to become NaN unless you tell Pandas otherwise.
The Quickest Fix: fill_value Parameter
pivot_table accepts a fill_value argument that replaces every NaN in the result after aggregation. For count and sum use-cases, this is usually all you need.
import pandas as pd
df = pd.DataFrame({
'region': ['North', 'North', 'South', 'East'],
'product': ['A', 'B', 'A', 'B'],
'sales': [100, 200, 150, 300],
})
table = pd.pivot_table(
df,
values='sales',
index='region',
columns='product',
aggfunc='sum',
fill_value=0, # <-- this is the key argument
)
print(table)
Without fill_value=0, the East / A and South / B cells would show NaN. With it, they show 0, which is correct for a sum: no sales means zero sales.
One important nuance: fill_value only replaces NaN cells that arise from missing combinations. It does not fix NaN values that come from the source data itself propagating through the aggregation. That requires a separate step covered below.
When fill_value Doesn't Work
There are two scenarios where fill_value silently fails to produce zeros.
Multiple aggfunc values returning a dict-of-DataFrames
When you pass a list or dict to aggfunc, the result has a MultiIndex on the columns. Pandas applies fill_value to each inner DataFrame independently, which usually works fine. But if you return a custom function that itself returns NaN for empty groups, fill_value has already been consumed before your function runs β it won't help.
The aggfunc returns NaN even for non-empty groups
Consider a lambda that computes a ratio:
table = pd.pivot_table(
df,
values='sales',
index='region',
columns='product',
aggfunc=lambda x: x.sum() / x.count() if x.count() > 0 else 0,
fill_value=0,
)
Here the lambda explicitly returns 0 for empty groups, making fill_value redundant but harmless. If you forget the guard clause, the lambda produces NaN from a 0/0 division, and fill_value does not replace it because the group was not technically missing β it just returned a bad value. You must handle the guard inside the function.
Handling NaN from the Source Data
If your values column contains NaN before the pivot, those NaNs can corrupt the aggregation. The standard Pandas aggregation functions like sum and mean use NumPy under the hood and β depending on the version and the data type β may or may not skip NaN automatically.
The safest strategy is to decide what a NaN source value means to your problem before you pivot, and clean it up there:
# Option 1: treat NaN source values as zero before pivoting
df['sales'] = df['sales'].fillna(0)
# Option 2: drop rows where the value is NaN entirely
df = df.dropna(subset=['sales'])
table = pd.pivot_table(
df,
values='sales',
index='region',
columns='product',
aggfunc='sum',
fill_value=0,
)
Which option you choose depends on business logic. Filling with zero says "a missing sale counts as no revenue". Dropping the row says "we have no data for that event and it should not influence totals". Both are valid β just be deliberate about it.
If you work with Excel files as your data source, you may also encounter cells that look empty but are stored as empty strings. The problem of openpyxl reading unexpected cell content is worth reviewing before you load the file into a DataFrame.
The dropna Parameter and What It Actually Controls
Many developers try dropna=False expecting it to replace NaN with zeros. It does not. This is a common source of confusion.
dropna controls whether index and column labels that are entirely NaN are excluded from the output. By default, dropna=True drops any row or column label that is itself NaN. Setting dropna=False keeps those labels in the result, which can introduce more NaN cells, not fewer.
df_with_nan_label = pd.DataFrame({
'region': ['North', None, 'South'],
'product': ['A', 'A', 'A'],
'sales': [100, 200, 150],
})
# With dropna=True (default), the None region row is excluded
table_default = pd.pivot_table(
df_with_nan_label,
values='sales', index='region', columns='product',
aggfunc='sum', fill_value=0,
)
# With dropna=False, a NaN index label appears in the result
table_keep = pd.pivot_table(
df_with_nan_label,
values='sales', index='region', columns='product',
aggfunc='sum', fill_value=0, dropna=False,
)
If you have rows with NaN in your index or column fields and you want them included with a real label, clean those fields before pivoting β replace NaN with "Unknown" or "N/A" rather than relying on dropna=False.
Custom aggfunc and NaN Propagation
The built-in string shortcuts 'sum', 'mean', 'count', and 'median' all have well-defined behavior around NaN. Lambda functions and custom callables do not.
When you write a custom aggfunc, Pandas calls it with a Series of values for that group. If that Series is empty (because no rows matched), your function receives an empty Series. Operations like .sum() on an empty Series return 0, but .mean() returns NaN, and a ratio like x.iloc[0] / x.iloc[1] will raise an IndexError.
def safe_ratio(x):
"""Return revenue per transaction, or 0 if the group is empty."""
if len(x) == 0:
return 0
total = x.sum()
count = len(x)
return total / count if count else 0
table = pd.pivot_table(
df,
values='sales',
index='region',
columns='product',
aggfunc=safe_ratio,
fill_value=0,
)
Defensive checks inside the aggfunc plus fill_value=0 on the outside covers both failure modes: the group exists but returns a bad value, and the group doesn't exist at all.
Checking Your Result With a Post-Pivot fillna
Even after applying all the fixes above, a final .fillna(0) on the result is a useful safety net in production code. It is cheap, explicit, and self-documenting.
table = pd.pivot_table(
df,
values='sales',
index='region',
columns='product',
aggfunc='sum',
fill_value=0,
).fillna(0) # belt-and-suspenders for any edge cases
The cost is negligible. If you later change the aggfunc to something that produces unexpected NaN, the post-pivot fillna catches it automatically. Just make sure you document why it is there so the next developer doesn't remove it thinking it is redundant.
This pattern is especially important when you export the pivot table to Excel. NaN values written by openpyxl become empty cells, which can break Excel formulas downstream. If you're using openpyxl for the export step, the guide on applying number formatting reliably pairs well with this cleanup strategy.
Common Pitfalls
- Confusing
fill_valuewithfillna. They behave differently.fill_valueis a pivot_table parameter applied during construction;fillnais a DataFrame method applied after. Both are useful; neither is a full replacement for the other. - Using
aggfunc='count'when you meanaggfunc='sum'.countcounts non-null entries; for combinations with no data it still returns 0 (not NaN), but it does not sum your values. Pick the right function for your metric. - Forgetting that
observed=Trueon a Categorical column can hide combinations. If your index or column is a PandasCategorical, all possible category values appear in the pivot even if absent from the data. Passobserved=Trueto suppress phantom categories, or leave itFalse(the default) if you want all combinations visible. - Applying
fill_valueto a string column. If yourvaluescolumn is non-numeric,fill_value=0will insert integer zeros into a string column, causing mixed-type issues. Usefill_value=""or handle the type explicitly. - Assuming
margins=Truetotals are correct after filling. When you addmargins=Trueto include row/column totals, those totals are computed beforefill_valueis applied. The totals reflect the true aggregated values, which is correct β just be aware the margin cells may still showNaNif the margin aggregation itself fails.
A related problem appears in Excel itself when you're building pivot tables natively. The fix for Excel pivot table blank values follows the same conceptual logic β the tool defaults to empty for missing data, and you have to explicitly opt into zeros.
Wrapping Up
The root cause of NaN in a Pandas pivot table is almost always one of three things: a combination of index and column values that doesn't exist in the source data, NaN values in the source column propagating through the aggregation, or a custom aggfunc that returns NaN for edge cases. Each has a targeted fix.
Here are the concrete steps to take right now:
- Add
fill_value=0to yourpivot_tablecall as the baseline fix for missing combinations. - Inspect your source DataFrame with
df[values_col].isna().sum()before pivoting and decide whether to fill or drop those rows. - If you use a custom aggfunc, add an explicit guard for empty input:
if len(x) == 0: return 0. - Chain a
.fillna(0)after the pivot as a production safety net, especially if the result feeds into an Excel export. - If
NaNstill appears on index or column labels, clean those fields in your source DataFrame before pivoting rather than relying ondropna=False.
Frequently Asked Questions
Why does pandas pivot_table show NaN even when I use fill_value=0?
fill_value only replaces NaN cells that arise from missing row/column combinations after aggregation. If your source data already contains NaN values in the column being aggregated, those propagate through the aggfunc and are not replaced by fill_value. You need to clean the source column with fillna or dropna before calling pivot_table.
What is the difference between fill_value and fillna in pivot_table?
fill_value is a parameter passed directly to pivot_table that replaces NaN cells produced during construction. fillna is a DataFrame method you call on the finished pivot table after the fact. Using both together gives you belt-and-suspenders coverage for different sources of NaN.
Does setting dropna=False in pivot_table replace NaN cells with zeros?
No. The dropna parameter only controls whether index or column labels that are themselves NaN get included or dropped from the output. It has no effect on NaN cell values inside the table. Use fill_value=0 or a post-pivot fillna(0) to replace cell-level NaN values.
How do I handle NaN in a custom aggfunc passed to pivot_table?
Add an explicit guard at the start of your function: if the input Series is empty or all-NaN, return 0 (or whatever sentinel value makes sense). Relying on fill_value alone won't help here because your function runs before fill_value is applied, and if it returns NaN for a non-empty group, fill_value won't catch it.
Can margins=True cause NaN values to appear in pivot_table totals?
Yes, if the margin aggregation itself encounters an edge case that produces NaN, the total cells can show NaN even when the body cells are fine. The safest fix is to chain a fillna(0) call after the pivot, which covers both body and margin cells in one step.
π€ Share this article
Sign in to saveRelated Articles
Comments (0)
No comments yet. Be the first!