How do I include NaN in pandas value_counts results?

Pass dropna=False to value_counts(), like series.value_counts(dropna=False). This tells pandas to treat NaN as a regular value and include it as a row in the output frequency table.

Does value_counts normalize=True include NaN in the percentage calculation?

Only when you also pass dropna=False. By default, normalize=True computes percentages over the non-null population, which inflates every percentage. With dropna=False the denominator is the full series length, giving you accurate proportions.

Why does my pandas value_counts output not add up to the total number of rows?

Pandas drops NaN values from value_counts by default, so the counts only cover non-missing rows. Add dropna=False to include missing values and make the totals match your full dataset size.

How can I count NaN values across all columns in a pandas DataFrame at once?

Use df.isna().sum() for absolute counts per column, or df.isna().mean() for the missing fraction. Both return a Series indexed by column name and are much faster than looping over value_counts for this purpose.

What is the difference between None and np.nan in pandas value_counts?

In most pandas column types, both None and np.nan are treated as missing values and grouped together under a single NaN label in the value_counts output. You do not need to handle them separately when using dropna=False.

Fix Pandas value_counts Silently Excluding NaN Values

You run value_counts() on a column and the frequency table looks perfectly clean. No warnings, no errors. Then you check df[col].isna().sum() and realize you have 400 null rows that never showed up. That silence is the bug.

Pandas drops NaN from value_counts() results by default. It does this quietly, and it can silently distort every percentage and ranking in your analysis if you don't know to look for it.

What You'll Learn

Why value_counts() excludes NaN by default and what that means for your results
How to include missing values with dropna=False
How normalization behaves differently once NaN is included
How to count missing values across multiple columns efficiently
When excluding NaN is actually the right call

Prerequisites

You need pandas installed (any version from 1.0 onward covers everything here) and a basic familiarity with Series and DataFrame objects. The examples use Python 3.8+.

Why value_counts() Drops NaN by Default

The dropna parameter on value_counts() defaults to True. This design decision mirrors SQL's treatment of NULL in aggregations: missing values are considered non-observations and excluded unless you explicitly ask for them.

The intent is defensible in some contexts. If you're generating a report on product categories, an unknown category might be noise rather than signal. But in data cleaning and quality analysis workflows, missing values are exactly what you're looking for. Dropping them by default means you need to remember to override the behavior every single time.

The problem compounds when you use normalize=True. The percentages are computed over the non-null population, so every percentage in your output is inflated relative to the true dataset size. You end up with a frequency table that sums to 100% while hiding a potentially large chunk of your data.

Reproducing the Silent NaN Exclusion

Here's a minimal example that shows exactly what happens. Build a small Series with some missing values baked in:

import pandas as pd
import numpy as np

status = pd.Series(["active", "inactive", "active", None, "active", np.nan, "inactive", None])

print(status.value_counts())

Output:

active      3
inactive    2
dtype: int64

Three NaN-like values exist in the series (two None and one np.nan), and none of them appear in the result. You get back 5 counts from an 8-element series with no indication that anything was dropped.

Now try with normalize=True:

print(status.value_counts(normalize=True))

active      0.600000
inactive    0.400000
dtype: float64

The percentages sum to 100%, which is factually wrong relative to the full dataset. Active is 3/8 = 37.5%, not 60%. That error flows silently into any dashboard, model feature, or business report built on this output.

The Fix: dropna=False

Pass dropna=False and NaN appears in the result like any other value:

print(status.value_counts(dropna=False))

active      3
NaN         3
inactive    2
dtype: int64

Now you see the full picture. Both None and np.nan are grouped together under a single NaN label. Pandas treats them as the same missing sentinel at this level of aggregation, so you don't need to handle them separately.

You can also sort by count descending (the default) or ascending if you want to spot rare values at the top:

print(status.value_counts(dropna=False, ascending=True))

inactive    2
active      3
NaN         3
dtype: int64

This is particularly useful when you're auditing a column and want to confirm that your most common values aren't missing ones masquerading as data.

Normalizing Percentages When NaN Is Present

Once you pass dropna=False, normalization is computed over the full series length, which is what you actually want:

print(status.value_counts(dropna=False, normalize=True))

active      0.375
NaN         0.375
inactive    0.250
dtype: float64

Now the percentages reflect reality: 37.5% active, 37.5% missing, 25% inactive. These are the numbers you need to make honest decisions about imputation, data collection quality, or whether a feature is usable in a model.

If you want to display these as formatted percentages, you can chain a simple transform:

pct = status.value_counts(dropna=False, normalize=True).mul(100).round(1).astype(str) + "%"
print(pct)

active      37.5%
NaN         37.5%
inactive    25.0%
dtype: object

Counting NaN Values Across Multiple Columns

On a full DataFrame you often want missing-value frequency counts for every column at once. value_counts() operates on a Series, so you need a different approach for the column-level overview.

The fastest way to get absolute counts of NaN per column:

df = pd.DataFrame({
    "status": ["active", "inactive", None, "active", np.nan],
    "region": ["north", None, "south", None, "east"],
    "score":  [8.5, 7.0, np.nan, 9.1, np.nan],
})

print(df.isna().sum())

status    2
region    2
score     2
dtype: int64

To get missing-value percentages per column:

print(df.isna().mean().mul(100).round(1))

status    40.0
region    40.0
score     40.0
dtype: float64

When you need the full value_counts() breakdown for every column in a loop, you can use a dictionary comprehension:

freq = {col: df[col].value_counts(dropna=False) for col in df.columns}
for col, counts in freq.items():
    print(f"\n--- {col} ---")
    print(counts)

This pattern is useful during exploratory data analysis when you want to audit each column before deciding on a cleaning strategy. For a deeper look at how NaN propagates through row-level operations, see the article on fixing Pandas dropna that removes rows with partial NaN when you need complete cases only.

When to Keep NaN Out (and When You Shouldn't)

There are legitimate situations where the default dropna=True behavior is exactly right.

Reporting known-good data: If missing rows have already been filtered upstream and you're summarizing a clean dataset, including NaN in output adds noise.
Model input validation: Confirming that a feature column is fully populated before training — you want a clean count without null entries polluting the frequency table.
Display-layer summaries: End-user dashboards often treat missing as non-existent by convention; showing a NaN row can confuse non-technical stakeholders.

But in data cleaning, quality audits, and any EDA phase, you should default to dropna=False. The discipline is simple: treat NaN as a first-class value until you consciously decide to remove it. The same principle applies when you're debugging other silent-drop behaviors, like the one covered in fixing Pandas groupby that returns NaN when the group key contains None.

Common Pitfalls With value_counts and Missing Data

Pitfall 1: Assuming the index tells you if NaN is present

Even with dropna=False, NaN appears in the index of the returned Series, not as a string. If you try to filter or look it up with result["NaN"] you'll get a KeyError. Use result[result.index.isna()] instead.

result = status.value_counts(dropna=False)
# Wrong:
# result["NaN"]  # KeyError

# Correct:
nan_count = result[result.index.isna()].sum()
print(nan_count)  # 3

Pitfall 2: Mixing pd.NA and np.nan in the same column

In columns with pandas nullable dtypes (Int64, StringDtype), missing values are represented as pd.NA, not np.nan. Pandas still groups them correctly under NaN in the value_counts output, but watch out when you're doing comparisons elsewhere in your pipeline. pd.isna() handles both; direct equality checks with np.nan do not.

Pitfall 3: Forgetting dropna=False after a resample or groupby

If you call value_counts() inside a groupby().apply() block, the dropna parameter applies to each group independently. A group might have zero nulls but another might have many — and the default behavior silently handles each group differently depending on what's in it. Be explicit in every call inside applied functions. Related: Pandas apply() can also silently swallow errors on axis=1, which compounds this kind of invisible data loss.

Pitfall 4: Using value_counts on object columns with whitespace

A string " " (space) and np.nan are different things. Columns imported from CSV sometimes have empty-string values that aren't caught by isna(). Run a quick df[col].replace("", np.nan) before counting if your source is CSV or Excel. For related CSV-to-DataFrame quirks, the article on fixing Pandas to_datetime that silently produces NaT on mixed formats is worth reading alongside this one.

Pitfall 5: Treating NaN frequency as always bad

Sometimes high NaN frequency in a column is the signal you needed to find. A column that is 80% missing might be a candidate for dropping from a model, for a targeted data collection effort, or for an upstream pipeline fix. Don't suppress that information before you've understood what it means.

Wrapping Up: Next Steps

The one-parameter fix is simple. But the habit of thinking about where missing values go is what separates reliable data pipelines from ones that produce subtly wrong results. Here's what to do now:

Audit your existing notebooks: Search for every call to value_counts() in your codebase and check whether dropna=False should have been there.
Add a missing-value audit step to your EDA template: Make df.isna().sum() and per-column value_counts(dropna=False) a standard first step before any analysis.
Be explicit in team code: If you're writing shared pipeline code, always pass dropna explicitly (either True or False) so the intent is clear to reviewers.
Check your normalization denominators: Anywhere you use normalize=True, verify the total against the original series length to confirm nothing was silently excluded.
Extend the pattern to groupby: Review how NaN group keys behave in your aggregations — the behavior is different from value_counts and equally easy to misread.

Fixing Python Pandas value_counts That Silently Excludes NaN From Results

What You'll Learn

Prerequisites

Why value_counts() Drops NaN by Default

Reproducing the Silent NaN Exclusion

The Fix: dropna=False

Normalizing Percentages When NaN Is Present

Counting NaN Values Across Multiple Columns

When to Keep NaN Out (and When You Shouldn't)

Common Pitfalls With value_counts and Missing Data

Pitfall 1: Assuming the index tells you if NaN is present

Pitfall 2: Mixing pd.NA and np.nan in the same column

Pitfall 3: Forgetting dropna=False after a resample or groupby

Pitfall 4: Using value_counts on object columns with whitespace

Pitfall 5: Treating NaN frequency as always bad

Wrapping Up: Next Steps

Frequently Asked Questions

Related Articles

Fixing Python Pandas read_csv That Silently Truncates Large Integer Columns

Fixing Python Pandas str.split That Silently Drops Rows With Missing Delimiter

Fixing a PostgreSQL Trigger That Fires Twice on UPDATE With a Row-Level Rule

Comments (0)

Leave a Comment

Fixing Python Pandas value_counts That Silently Excludes NaN From Results

What You'll Learn

Prerequisites

Why value_counts() Drops NaN by Default

Reproducing the Silent NaN Exclusion

The Fix: dropna=False

Normalizing Percentages When NaN Is Present

Counting NaN Values Across Multiple Columns

When to Keep NaN Out (and When You Shouldn't)

Common Pitfalls With value_counts and Missing Data

Pitfall 1: Assuming the index tells you if NaN is present

Pitfall 2: Mixing pd.NA and np.nan in the same column

Pitfall 3: Forgetting dropna=False after a resample or groupby

Pitfall 4: Using value_counts on object columns with whitespace

Pitfall 5: Treating NaN frequency as always bad

Wrapping Up: Next Steps

Frequently Asked Questions

Related Articles

Fixing Python Pandas read_csv That Silently Truncates Large Integer Columns

Fixing Python Pandas str.split That Silently Drops Rows With Missing Delimiter

Fixing a PostgreSQL Trigger That Fires Twice on UPDATE With a Row-Level Rule

Comments (0)

Leave a Comment

Stay ahead of the curve