Fixing Python Pandas resample That Returns NaN for Irregular Time Series

June 23, 2026 8 min read

You call df.resample('1h').mean() on a DataFrame that clearly has data, and the result comes back full of NaN. No error, no warning: just missing values where numbers should be. The problem is almost never the aggregation function. It's almost always something upstream about the index.

This guide walks through every common root cause and gives you a concrete fix for each one.

What you'll learn

  • Why Pandas silently produces NaN instead of raising an error during resample
  • How to detect and fix a DatetimeIndex that has no freq attribute set
  • How to handle duplicate timestamps that kill aggregations
  • How timezone mismatches cause resampled buckets to miss your data
  • Which fill strategy to use when your data genuinely has gaps

Prerequisites

  • Python 3.9+ with Pandas 1.5 or later (examples were verified on Pandas 2.x)
  • A DataFrame with a DatetimeIndex. If you're using a plain integer index, sort that out first with pd.to_datetime() and set_index()
  • Basic familiarity with time series concepts: frequency strings, upsampling vs. downsampling

Why resample Returns NaN in the First Place

Pandas resample works by binning timestamps into fixed-width buckets and then running an aggregation function over each bin. When a bin contains no data points, the result is NaN. That sounds obvious. The tricky part is that a bin can appear empty even when you have data, because the data never landed in the bucket you expected.
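Here is the mechanism in isolation: a minimal sketch with made-up readings that leave the 01:00 hour empty. Running count() alongside mean() is a cheap diagnostic. A NaN next to a count of 0 is an empty bucket, not a broken aggregation.

```python
import pandas as pd

# Made-up readings: two in the 00:00 hour, none at 01:00, one at 02:00
idx = pd.to_datetime(["2026-06-01 00:10", "2026-06-01 00:40", "2026-06-01 02:15"])
s = pd.Series([1.0, 3.0, 5.0], index=idx)

hourly_mean = s.resample("1h").mean()    # 00:00 -> 2.0, 01:00 -> NaN, 02:00 -> 5.0
hourly_count = s.resample("1h").count()  # 00:00 -> 2,   01:00 -> 0,   02:00 -> 1

print(pd.DataFrame({"mean": hourly_mean, "rows": hourly_count}))
```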

The four most common reasons this happens:

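  1. The index is irregular (freq=None is the usual sign), so some target bins genuinely contain no rows, or a reindex onto a regular grid silently drops the off-grid ones.
  2. Duplicate timestamps make reindex and asfreq raise a ValueError and stop pd.infer_freq from detecting a frequency.
  3. The index is unsorted, so position-based steps around resample (ffill, interpolate, infer_freq) work from the wrong neighbors, even though resample itself sorts before binning.
  4. The index is on a different clock than you think, so rows land in shifted buckets and the windows you look up or join against come back NaN.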
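Before you reach for a specific fix, it helps to check all four properties at once. The helper below, diagnose_index, is a hypothetical name for a small sketch built only from standard pandas attributes. Read its output as hints rather than verdicts: freq=None, for instance, is common and harmless on its own.

```python
import pandas as pd

def diagnose_index(df: pd.DataFrame) -> dict:
    """Hypothetical helper: report the index properties the fixes below address."""
    idx = df.index
    if not isinstance(idx, pd.DatetimeIndex):
        raise TypeError(f"expected a DatetimeIndex, got {type(idx).__name__}")
    return {
        "freq": idx.freq,  # None is common and not an error by itself
        # infer_freq needs at least 3 timestamps; it returns None if unsorted or duplicated
        "inferred_freq": pd.infer_freq(idx) if len(idx) >= 3 else None,
        "sorted": idx.is_monotonic_increasing,
        "duplicate_timestamps": int(idx.duplicated().sum()),
        "tz": idx.tz,  # None means tz-naive
    }

# Usage with a made-up, out-of-order index containing one duplicate
idx = pd.to_datetime(["2026-06-01 00:10", "2026-06-01 00:05", "2026-06-01 00:05"])
report = diagnose_index(pd.DataFrame({"value": [1.0, 2.0, 3.0]}, index=idx))
print(report)
```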

Let's fix each one in turn.

Fix 1: Set or Infer the DatetimeIndex Frequency

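A DatetimeIndex loaded from a CSV or database query usually has freq=None. That alone does not break resample, because the bins come from the rule you pass (for example '1h'), not from the index's freq attribute. What freq=None often signals is an irregular index (sensor data, API logs, event streams). Irregular data is exactly where some target bins end up with no rows, and those bins come back as NaN.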

Check your index first:

print(df.index.freq)        # None means no freq is set
print(df.index.is_monotonic_increasing)  # should be True

If freq is None but your data is regular, use pd.infer_freq to detect it, then assign it explicitly:

import pandas as pd

inferred = pd.infer_freq(df.index)
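print(inferred)             # e.g. '5min', 'h', 'D' ('5T', 'H' before pandas 2.2)

if inferred is not None:    # None means no regular spacing could be detected
    df.index.freq = inferred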

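If pd.infer_freq returns None, there are two possible reasons. The spacing may be genuinely irregular (missing rows, variable intervals), or the index may be unsorted or contain duplicate timestamps, which also make it give up. Rule out the second case with Fixes 2 and 3 first. For truly irregular data you don't assign a freq. Instead, resample to a target frequency that makes sense for your data and apply an appropriate fill strategy (covered in Fix 5).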
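This short sketch shows how pd.infer_freq behaves on three versions of the same made-up 5-minute index: regular, irregular, and shuffled. The exact string it returns depends on your pandas version, so the check compares offsets via to_offset rather than matching text.

```python
import pandas as pd
from pandas.tseries.frequencies import to_offset

regular = pd.date_range("2026-06-01 00:00", periods=6, freq="5min")
irregular = regular.delete(2)            # one missing reading -> uneven spacing
shuffled = regular[[3, 0, 5, 1, 4, 2]]   # same timestamps, wrong order

print(pd.infer_freq(regular))    # '5min' or '5T', depending on pandas version
print(pd.infer_freq(irregular))  # None
print(pd.infer_freq(shuffled))   # None: the index is not monotonic
```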

For data that has a few missing rows but is otherwise regular, the cleaner approach is to reindex to a complete range first:

full_range = pd.date_range(
    start=df.index.min(),
    end=df.index.max(),
    freq='5min'         # whatever your intended cadence is
)

df = df.reindex(full_range)   # missing periods become explicit NaN rows
result = df.resample('1h').mean()

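After reindex, missing periods show up as explicit NaN rows. mean() still skips NaN, so the hourly values match what you'd get without the reindex. The gain is visibility: gaps can now be counted with df.isna().sum() and filled at the native 5-minute cadence before you aggregate. One caution: reindex matches timestamps exactly. Readings with a few seconds of jitter (10:05:01 instead of 10:05:00) miss the grid and are silently replaced by NaN. Snap them onto the grid first with df.index.floor('5min') or df.index.round('5min'), then collapse any duplicates that creates (Fix 2).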
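The jitter problem is easy to reproduce. This sketch uses made-up readings a few seconds off a 5-minute cadence. Reindexing them directly keeps only the one row that happens to sit on the grid. Flooring first, and averaging anything that lands in the same slot, keeps all three. Whether floor or round is right depends on whether your timestamps mark the start of an interval; floor is an assumption here.

```python
import pandas as pd

# Made-up log on a nominal 5-minute cadence with jitter; the 10:10 reading is missing
idx = pd.to_datetime(["2026-06-01 10:00:03", "2026-06-01 10:05:01", "2026-06-01 10:15:02"])
df = pd.DataFrame({"value": [1.0, 2.0, 4.0]}, index=idx)

# Direct reindex: the grid starts at 10:00:03, so only that one row matches exactly
naive_grid = pd.date_range(df.index.min(), df.index.max(), freq="5min")
naive = df.reindex(naive_grid)

# Snap to the grid first; groupby collapses readings that land in the same slot
snapped = df.copy()
snapped.index = snapped.index.floor("5min")
snapped = snapped.groupby(level=0).mean()
grid = pd.date_range(snapped.index.min(), snapped.index.max(), freq="5min")
aligned = snapped.reindex(grid)

print(naive["value"].notna().sum())    # 1: two readings were silently lost
print(aligned["value"].notna().sum())  # 3, plus an explicit NaN at 10:10
```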

Fix 2: Remove or Aggregate Duplicate Timestamps

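Duplicate timestamps rarely break resample itself. mean() simply averages every row in the bin, duplicates included, which quietly gives the duplicated moment extra weight. The real damage happens in the steps around it. reindex and asfreq raise a ValueError on a duplicated index, so the reindex-then-resample approach from Fix 1 fails outright, and pd.infer_freq returns None. Note that mean() skips NaN values and has no min_count parameter, so a NaN in one of the duplicates is not what produces NaN output.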

Detect duplicates in your index:

dupes = df.index.duplicated(keep=False)
print(df[dupes])   # see all duplicated timestamps

The right fix depends on your data. If the duplicates are genuine measurement noise, aggregate them down before resampling:

# Collapse duplicates by averaging, then resample
df = df.groupby(df.index).mean()
result = df.resample('1h').mean()

If one duplicate is always the authoritative row (e.g., the last write wins), keep only that one:

df = df[~df.index.duplicated(keep='last')]

Either way, deduplicate before you call resample. Trying to work around duplicates inside the resample call leads to fragile code.
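This minimal sketch uses made-up values to show where duplicates actually bite and how the two deduplication options differ. The reindex fails until the duplicate is collapsed. After that, the missing 00:10 slot shows up as an ordinary NaN.

```python
import pandas as pd

idx = pd.to_datetime(["2026-06-01 00:00", "2026-06-01 00:05", "2026-06-01 00:05"])
df = pd.DataFrame({"value": [1.0, 2.0, 4.0]}, index=idx)
grid = pd.date_range("2026-06-01 00:00", "2026-06-01 00:10", freq="5min")

reindex_failed = False
try:
    df.reindex(grid)                      # duplicate labels cannot be aligned
except ValueError as exc:
    reindex_failed = True
    print("reindex failed:", exc)

averaged = df.groupby(level=0).mean()               # 00:05 becomes (2.0 + 4.0) / 2 = 3.0
last_wins = df[~df.index.duplicated(keep="last")]   # 00:05 keeps 4.0
print(averaged.reindex(grid))                       # works; 00:10 is an explicit NaN
```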

Fix 3: Sort the Index Before Resampling

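An unsorted DatetimeIndex is one of those problems that sometimes works and sometimes doesn't, which makes it particularly annoying to debug. Pandas resample itself sorts the rows by timestamp before binning, so the aggregated values usually come out right. What breaks is everything around it that works by position or expects a monotonic index. ffill() and interpolate() fill from the row physically above rather than from the previous timestamp, and pd.infer_freq returns None. Those steps are where the stray NaN and wrong values come from.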

Always sort before you resample:

df = df.sort_index()
assert df.index.is_monotonic_increasing, "Index is still unsorted!"
result = df.resample('1h').mean()

If you're reading data from multiple sources and concatenating them, sort after the concat:

combined = pd.concat([df_source_a, df_source_b]).sort_index()
result = combined.resample('1h').mean()

The assert is worth keeping in development. A failing assertion is far easier to diagnose than a silent column of NaN.
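The gap between resample and the positional steps around it is easy to demonstrate. In this sketch the values are made up and the rows are deliberately out of order. The hourly means come out identical either way, but ffill() on the unsorted series fills the 11:00 gap from the 12:00 reading.

```python
import pandas as pd

# The same three readings, stored out of order (e.g. after a concat)
idx = pd.to_datetime(["2026-06-01 10:00", "2026-06-01 12:00", "2026-06-01 11:00"])
unsorted = pd.Series([1.0, 3.0, float("nan")], index=idx)
ordered = unsorted.sort_index()

t = pd.Timestamp("2026-06-01 11:00")
print(unsorted.ffill().loc[t])  # 3.0: filled from the row above it, which is 12:00
print(ordered.ffill().loc[t])   # 1.0: filled from 10:00, the actual previous reading

# resample sorts internally, so both give the same hourly means
print(unsorted.resample("1h").mean().equals(ordered.resample("1h").mean()))
```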

Fix 4: Align Timezones Across the Series

Timezone mismatches are subtle. Suppose your index is timezone-aware (e.g., UTC) but the data was collected in a local timezone and never converted properly. Your timestamps can then be off by hours. When you resample to hourly or daily bins, those shifted rows land in the wrong bucket. When you later select a time window or join against another series on the correct clock, the buckets you look up come back NaN.

Check what timezone your index carries:

print(df.index.tz)   # None if tz-naive, e.g. UTC if tz-aware

If your index is timezone-naive and the timestamps really are UTC, localize it:

df.index = df.index.tz_localize('UTC')

If it's already correctly localized to some other zone and you want UTC, convert it. This keeps the same instants and only changes how they are labeled:

df.index = df.index.tz_convert('UTC')

Never mix tz-naive and tz-aware timestamps in the same DataFrame. Comparing a naive timestamp with an aware one raises a TypeError. Combining the two (for example with pd.concat) can, depending on the pandas version, leave you with an object-dtype index, and resample raises a TypeError on that. Normalize everything to UTC at ingestion time and convert to local time only for display.
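The classic version of this bug is localizing naive local-time data as if it were UTC. Here is a minimal sketch, assuming the made-up timestamps were recorded on New York wall-clock time (EDT in June, UTC-4):

```python
import pandas as pd

# Made-up naive timestamps that were actually recorded on New York wall-clock time
naive = pd.DatetimeIndex(["2026-06-01 09:00", "2026-06-01 09:30"])

wrong = naive.tz_localize("UTC")                                   # claims 09:00 was UTC
right = naive.tz_localize("America/New_York").tz_convert("UTC")   # 09:00 EDT -> 13:00 UTC

print(wrong[0])   # 2026-06-01 09:00:00+00:00
print(right[0])   # 2026-06-01 13:00:00+00:00
```

Around daylight-saving transitions, tz_localize can raise for wall-clock times that are ambiguous or don't exist. Its ambiguous and nonexistent parameters control how those are handled. Which setting is right depends on how your source recorded the clock change.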

Fix 5: Choose the Right Fill Strategy for Gaps

Sometimes the NaN in your resample output is correct β€” there genuinely was no data in that interval. The question is whether you want to keep those gaps or fill them. Choosing the wrong fill method is a common source of misleading results.

Forward fill (last observation carried forward)

Use this when your data represents a state that persists until it changes β€” sensor readings, configuration values, prices.

result = df.resample('1h').mean().ffill()
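One caution with ffill: by default it carries the last value across a gap of any length, which can make an outage look like a stable reading. Its limit parameter caps how many consecutive empty bins get filled. In this sketch the data is made up and the cap of 2 is an arbitrary choice:

```python
import pandas as pd

# Made-up state readings with a five-hour silence after 01:00
idx = pd.to_datetime(["2026-06-01 00:00", "2026-06-01 01:00", "2026-06-01 06:00"])
s = pd.Series([10.0, 12.0, 15.0], index=idx)

hourly = s.resample("1h").mean()   # 02:00 through 05:00 are empty bins -> NaN
filled = hourly.ffill(limit=2)     # fill at most 2 consecutive NaN bins

print(filled)  # 02:00 and 03:00 become 12.0; 04:00 and 05:00 stay NaN
```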

Interpolation

Use this for continuous measurements where a linear (or spline) estimate between two readings is meaningful:

result = df.resample('1h').mean().interpolate(method='time')

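method='time' weights the interpolation by the actual time distance between points. On the evenly spaced output of resample it gives the same result as the default 'linear'. The difference matters when you interpolate on the raw, irregular index before aggregating, because 'linear' treats every row as equally spaced regardless of the timestamps.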
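A small sketch with made-up values makes the distinction concrete. On an irregular raw index the two methods disagree. On the evenly spaced resample output they agree, up to floating-point rounding.

```python
import pandas as pd

# Irregular raw index: the gap after 01:00 is three hours long
idx = pd.to_datetime(["2026-06-01 00:00", "2026-06-01 01:00", "2026-06-01 04:00"])
raw = pd.Series([0.0, float("nan"), 40.0], index=idx)

by_position = raw.interpolate(method="linear").loc[idx[1]]  # halfway by row position
by_time = raw.interpolate(method="time").loc[idx[1]]        # one hour out of four
print(by_position)          # 20.0
print(round(by_time, 6))    # 10.0

# After resampling to 1h the index is evenly spaced, so the two methods agree
hourly = raw.resample("1h").mean()
gap = (hourly.interpolate(method="linear") - hourly.interpolate(method="time")).abs().max()
print(gap)  # ~0
```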

Fill with zero

Use this only when a missing interval genuinely means zero β€” for example, transaction counts, event logs, or error counts where no events were recorded.

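counts = df.resample('1h').sum(min_count=1)   # empty bins stay NaN so you can inspect them
result = counts.fillna(0)                     # fill only once gaps are known to mean zero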

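The min_count=1 argument on sum tells Pandas to return NaN (not zero) for bins with no data; plain sum() returns 0 for an empty bin. Keeping the NaN lets you review the empty bins (for example with .isna().sum()) and tell "zero events happened" apart from "data was missing". Calling fillna(0) converts every remaining NaN to zero and erases that distinction again. Fill only after you've confirmed the gaps really mean zero, because sum(min_count=1).fillna(0) on its own gives the same values as plain sum(). If you're dealing with similar NaN issues in pivot tables, a similar fill approach is covered in detail in fixing Pandas pivot_table that returns NaN instead of zero.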
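This minimal sketch with made-up event counts shows the two behaviors side by side:

```python
import pandas as pd

# Made-up per-event counts; nothing was logged during the 01:00 hour
idx = pd.to_datetime(["2026-06-01 00:10", "2026-06-01 00:40", "2026-06-01 02:15"])
events = pd.Series([2, 3, 1], index=idx)

default_sum = events.resample("1h").sum()            # empty bin -> 0
strict_sum = events.resample("1h").sum(min_count=1)  # empty bin -> NaN (dtype becomes float)

print(default_sum.tolist())  # [5, 0, 1]
print(strict_sum.tolist())   # [5.0, nan, 1.0]
```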

Common Pitfalls to Watch Out For

Using resample on a column instead of the index

By default, resample operates on the index, not on a column. If your timestamp lives in a regular column named timestamp, either set it as the index first, as here, or pass the column name through resample's on= parameter:

df['timestamp'] = pd.to_datetime(df['timestamp'])
df = df.set_index('timestamp').sort_index()
result = df.resample('1h').mean()
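If you'd rather not change the index, here is a sketch of the on= alternative with made-up data. Both routes give the same hourly means.

```python
import pandas as pd

raw = pd.DataFrame({
    "timestamp": ["2026-06-01 00:10", "2026-06-01 00:40", "2026-06-01 01:20"],
    "value": [1.0, 3.0, 5.0],
})
raw["timestamp"] = pd.to_datetime(raw["timestamp"])

via_index = raw.set_index("timestamp").sort_index().resample("1h").mean()
via_on = raw.resample("1h", on="timestamp").mean()   # timestamp column used as the bin key

print(via_on)
```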

Resampling a MultiIndex without specifying the level

If your DataFrame has a MultiIndex, resample without a level argument raises a TypeError, because it expects a DatetimeIndex. Pass the level that holds the timestamps, by name or by number. Be aware that this pools every other level into each bin:

# Resample on the timestamp level (assumes that level is named 'timestamp')
result = df.resample('1h', level='timestamp').mean()
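Pooling is not always what you want. If each value of the other level is its own series (say, one per sensor), grouping with two pd.Grouper objects keeps them apart. Here is a sketch with made-up data, assuming the levels are named sensor and timestamp:

```python
import pandas as pd

idx = pd.MultiIndex.from_arrays(
    [
        ["a", "b", "a", "b"],
        pd.to_datetime(["2026-06-01 00:10", "2026-06-01 00:20",
                        "2026-06-01 00:50", "2026-06-01 01:30"]),
    ],
    names=["sensor", "timestamp"],
)
df = pd.DataFrame({"value": [1.0, 10.0, 3.0, 20.0]}, index=idx)

# One hourly series per sensor instead of one series that mixes all sensors
per_sensor = df.groupby(
    [pd.Grouper(level="sensor"), pd.Grouper(level="timestamp", freq="1h")]
).mean()

# resample(level=...) pools the sensors: every bin averages all of them together
pooled = df.resample("1h", level="timestamp").mean()
print(per_sensor)
```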

Confusing asfreq with resample

asfreq reindexes to a fixed frequency with no aggregation. It builds a regular grid starting at your first timestamp and keeps only the rows whose timestamps match a grid point exactly. Every other row is discarded, and grid points without an exact match become NaN. On irregular data that usually produces a result that is mostly NaN, and on a duplicated index asfreq raises. Use resample when you need aggregation; use asfreq only when your data already sits on the target grid.
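A small sketch with made-up readings shows the difference. Note where the asfreq grid starts: at the first timestamp (00:10), not on the hour.

```python
import pandas as pd

idx = pd.to_datetime(["2026-06-01 00:10", "2026-06-01 00:40", "2026-06-01 01:20"])
s = pd.Series([1.0, 3.0, 5.0], index=idx)

asf = s.asfreq("1h")           # grid 00:10, 01:10: 00:10 -> 1.0, 01:10 -> NaN
agg = s.resample("1h").mean()  # bins on the hour: 00:00 -> 2.0, 01:00 -> 5.0

print(asf)
print(agg)
```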

Large gaps causing memory spikes

If your data spans months but has a gap of weeks in the middle, resampling to minute-level frequency allocates a row for every minute in the entire range, including the gap. Every one of those gap rows is NaN, and on wide frames or finer frequencies the memory adds up quickly. Consider resampling to a coarser frequency, or splitting the DataFrame at known gap boundaries and resampling each segment separately.
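Here is a sketch of the splitting approach with made-up data, assuming the index is already sorted. The one-hour max_gap threshold is an assumption; pick whatever counts as a real outage for your data. A new segment starts wherever the step from the previous timestamp exceeds it, and each segment is resampled on its own.

```python
import pandas as pd

# Made-up minute data with a ten-hour hole in the middle (index already sorted)
idx = pd.to_datetime([
    "2026-06-01 00:00", "2026-06-01 00:01", "2026-06-01 00:02",
    "2026-06-01 10:00", "2026-06-01 10:01",
])
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0], index=idx)

max_gap = pd.Timedelta("1h")  # assumption: any longer step is treated as a known gap
# Segment id increments wherever the step from the previous timestamp exceeds max_gap
segment_id = (s.index.to_series().diff() > max_gap).cumsum()

pieces = [part.resample("1min").mean() for _, part in s.groupby(segment_id.to_numpy())]
segmented = pd.concat(pieces)
full = s.resample("1min").mean()

print(len(full), len(segmented))  # 602 5
```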

Silent NaN propagation in chained operations

Pandas NaN values propagate through arithmetic. If you resample and then compute a rolling mean or a diff() on the result, any remaining NaN in the resampled output will spread through those derived columns too. Always inspect the intermediate resample result with result.isna().sum() before chaining further transformations. The same silent propagation issue shows up in other contexts. For example, incorrect window function results in PostgreSQL often stem from the same kind of undetected null in an upstream step.
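This minimal sketch with made-up hourly values shows how one missing bin spreads through a rolling mean, and how min_periods (a real rolling parameter) changes that. The lenient version hides the gap rather than fixing it, so count the NaN first either way.

```python
import pandas as pd

idx = pd.date_range("2026-06-01 00:00", periods=5, freq="1h")
hourly = pd.Series([1.0, float("nan"), 3.0, 4.0, 5.0], index=idx)

print(hourly.isna().sum())                          # 1 missing bin before any chaining

strict = hourly.rolling(3).mean()                   # NaN unless all 3 values are present
lenient = hourly.rolling(3, min_periods=1).mean()   # averages whatever is present

print(strict.isna().sum())  # 4: two warm-up rows plus two windows spoiled by the gap
print(lenient.tolist())     # [1.0, 1.0, 2.0, 3.5, 4.0]
```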

If you've hit threading-related data issues in Python before, it's worth knowing that silent data loss from concurrency bugs has a similar feel β€” the data appears to be there but the results are wrong. The debugging approach in fixing sqlite3 that fails to return results inside a thread covers that pattern well.

Wrapping Up

Most resample NaN bugs trace back to one of four root causes: irregular spacing (usually visible as freq=None), duplicate timestamps, an unsorted index, or a timezone mismatch. Here are the concrete steps to take right now:

  1. Run print(df.index.freq, df.index.is_monotonic_increasing, df.index.tz) and fix any problem you see before calling resample. freq=None on its own just tells you the spacing is irregular or was never set.
  2. Check for duplicate timestamps with df.index.duplicated().sum() and collapse them with df.groupby(level=0).mean() if any exist.
  3. Call df.sort_index() unconditionally before resampling. It's cheap, and it keeps ffill, interpolate, and infer_freq working from the right neighbors.
  4. For data that is regular but missing rows, snap timestamps onto the grid, reindex to a complete date_range, then resample. For genuinely irregular data, resample directly.
  5. Pick your fill strategy deliberately: ffill for state data, interpolate(method='time') for continuous measurements, and sum(min_count=1) for event counts, so empty bins stay visible until you decide they mean zero.
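Put together, the checklist might look like the hypothetical helper below. clean_and_resample is an illustrative name, and it bakes in choices you may not want: averaging duplicates, assuming naive timestamps are already in UTC, and using mean() as the aggregation. Adapt those before relying on it.

```python
import pandas as pd

def clean_and_resample(df: pd.DataFrame, rule: str = "1h", tz: str = "UTC") -> pd.DataFrame:
    """Hypothetical helper bundling the checklist; adapt the choices to your data."""
    out = df.copy()
    if not isinstance(out.index, pd.DatetimeIndex):
        out.index = pd.to_datetime(out.index)
    # Assumption: naive timestamps are already in `tz`; localize with the real source zone if not
    out.index = out.index.tz_localize(tz) if out.index.tz is None else out.index.tz_convert(tz)
    out = out.sort_index()
    out = out.groupby(level=0).mean()   # collapse duplicate timestamps by averaging
    result = out.resample(rule).mean()
    empty = int(result.isna().all(axis=1).sum())
    print(f"{empty} of {len(result)} bins are empty")
    return result

# Usage with made-up, unsorted data containing a duplicate timestamp
idx = pd.to_datetime(["2026-06-01 02:15", "2026-06-01 00:10", "2026-06-01 00:10"])
data = pd.DataFrame({"value": [5.0, 1.0, 3.0]}, index=idx)
hourly = clean_and_resample(data)   # prints "1 of 3 bins are empty"
```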

Frequently Asked Questions

Why does pandas resample return NaN even though there is data in the DataFrame?

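Usually some bins really are empty: the index is irregular, or the target frequency is finer than your data, so certain buckets contain no rows. The other frequent culprits sit around the resample call: a reindex onto a grid your timestamps don't match exactly, duplicate timestamps that break reindex, an unsorted index feeding ffill or interpolate, or a timezone mismatch. Sort the index, remove duplicates, and check freq and tz before calling resample.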

How do I fill NaN values after resampling an irregular time series in pandas?

Use ffill() for state-like data that persists between changes, interpolate(method='time') for continuous measurements, or sum(min_count=1) for event counts. With event counts, apply fillna(0) only once you've confirmed the empty bins really mean zero. The right method depends on what a missing interval actually means in your data.

Does pandas resample require the DatetimeIndex to be sorted?

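Not for the aggregation itself: resample sorts the rows by timestamp internally before binning. But ffill(), interpolate(), and pd.infer_freq work by position or need a monotonic index, so an unsorted index still produces wrong fills or a missing frequency without any error or warning. Call sort_index() before resampling anyway.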

What is the difference between pandas resample and asfreq for time series data?

resample aggregates multiple rows within each time bucket using a function like mean or sum. asfreq simply reindexes the DataFrame to a regular grid. It keeps only rows whose timestamps match a grid point exactly and returns NaN for grid points with no exact match.

How can I resample a pandas DataFrame that has duplicate timestamps?

Collapse the duplicates first by running df.groupby(df.index).mean() to average them, or df[~df.index.duplicated(keep='last')] to keep only the last occurrence. Then call resample on the deduplicated result.
