Pandas resample and asfreq Returning NaNs: Time Series Gaps Explained
You call resample('D').mean() on your DataFrame and half the rows come back as NaN. Or you use asfreq('H') to upsample to hourly data and get a wall of missing values you didn't ask for. This is one of the most common stumbling blocks when working with time series in Pandas, and the root cause is almost never what you first suspect.
This article explains what's actually happening under the hood, when each method is the right tool, and how to handle those gaps in a way that's appropriate for your data.
What you'll learn
- The difference between resample and asfreq, and when to use each
- Why NaNs appear after resampling or frequency conversion
- How to fill gaps using forward fill, backward fill, and interpolation
- Common mistakes that silently produce wrong results
- How to validate your time series index before resampling
Prerequisites
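You'll need Pandas installed (any version from 1.x onwards covers everything here). Pandas 2.2 deprecated the 'H' and 'M' frequency aliases used in a few examples below in favour of 'h' and 'ME'; on recent versions, use the new spellings to avoid warnings. The examples assume you have a DatetimeIndex on your DataFrame. If your timestamps are in a regular column, parse them with df['timestamp'] = pd.to_datetime(df['timestamp']), then run df = df.set_index('timestamp'), so the index holds datetime64 values, not strings.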
Why NaNs appear at all
Both resample and asfreq work by constructing a new, regular date range at the frequency you specify, then aligning your existing data against it. Any position in that new range where no original data point lands gets filled with NaN by default.
Think of it as a left join between a complete calendar and your messy real-world data. Weekends with no readings, server downtime gaps, public holidays: all of those produce empty slots in the output index. Pandas doesn't guess what the value should be; it just marks the slot as missing and leaves the decision to you.
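You can check the left-join picture directly. The sketch below uses made-up readings, builds the complete daily calendar with pd.date_range, and reindexes the data onto it; for a plain asfreq call with no fill method, the labels and values should line up with the reindexed result.

```python
import pandas as pd

# Made-up readings with 2024-01-02 and 2024-01-03 missing
idx = pd.to_datetime(["2024-01-01", "2024-01-04", "2024-01-05"])
s = pd.Series([10.0, 13.0, 14.0], index=idx)

# The "complete calendar": every day from the first to the last observation
calendar = pd.date_range(start=s.index.min(), end=s.index.max(), freq="D")

# Left join of calendar against data: days with no observation become NaN
joined = s.reindex(calendar)
converted = s.asfreq("D")

print(joined)
print(joined.isna().sum())  # 2 empty days

# Compare label by label and value by value (NaN replaced by a sentinel)
same_labels = bool((joined.index == converted.index).all())
same_values = joined.fillna(-1.0).tolist() == converted.fillna(-1.0).tolist()
print(same_labels and same_values)  # True
```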
resample vs asfreq: they are not interchangeable
resample is an aggregation tool. You use it to downsample (e.g., hourly data to daily averages) or to upsample (e.g., daily data to hourly rows). Downsampling groups existing rows into buckets and applies an aggregation function like .mean(), .sum(), or .first(). Upsampling creates new rows and leaves their values empty until you fill them.
asfreq is a frequency conversion tool. It does no aggregation. It selects the value at the exact timestamp matching the new frequency, or inserts NaN if no exact match exists. It's faster and simpler than resample when your data is already at a consistent frequency and you just want to regularise the index.
The key distinction: if you have one row per day and you call asfreq('H'), you're asking Pandas to produce 24 rows per day. Only the row that coincides with the original daily timestamp survives; the other 23 are NaN. That's not a bug; it's the function doing exactly what it promises.
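A small sketch makes the contrast concrete. With made-up readings every 30 minutes, resample aggregates every row in each hourly bucket, while asfreq keeps only the rows that sit exactly on the new timestamps. Going the other way, asfreq leaves NaN in every new slot that has no exact match.

```python
import pandas as pd

# Four made-up readings, 30 minutes apart (00:00 to 01:30)
idx = pd.date_range("2024-01-01 00:00", periods=4, freq="30min")
s = pd.Series([1.0, 2.0, 3.0, 4.0], index=idx)

# Downsampling: resample averages all rows in each hourly bucket...
hourly_mean = s.resample("60min").mean()
# ...while asfreq just picks the row that sits exactly on each new timestamp
hourly_pick = s.asfreq("60min")

print(hourly_mean)  # 00:00 -> 1.5, 01:00 -> 3.5
print(hourly_pick)  # 00:00 -> 1.0, 01:00 -> 3.0

# Upsampling with asfreq: 7 slots from 00:00 to 01:30, 3 without an exact match
quarter_hourly = s.asfreq("15min")
print(quarter_hourly.isna().sum())  # 3
```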
Diagnosing your time series before you resample
Before you can fix NaNs, you need to understand the shape of your gaps. Start here:
import pandas as pd
# Check the index type
print(df.index.dtype)
# Check for duplicates
print(df.index.duplicated().sum())
# Check for gaps (works when you expect a consistent frequency)
expected = pd.date_range(start=df.index.min(), end=df.index.max(), freq='D')
missing_dates = expected.difference(df.index)
print(f"Missing dates: {len(missing_dates)}")
print(missing_dates[:10])
Duplicate timestamps are a common culprit. If your source data has two rows for the same timestamp and you resample expecting one row per period, downsampling silently folds both rows into the same bucket (a sum double-counts), and asfreq, which reindexes under the hood, fails outright with a ValueError about duplicate labels. Deduplicate first with df = df[~df.index.duplicated(keep='first')].
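Also confirm the index is actually sorted by checking df.index.is_monotonic_increasing. Reindex-based fills such as asfreq(method='ffill') raise an error on a non-monotonic index, and gap checks are much easier to read in time order. Run df = df.sort_index() before doing anything else if you're unsure.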
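The checks above can be bundled into one preparation step. Below is a minimal sketch; prepare_index is a hypothetical helper name, and the assumption is that your timestamps live in a column called timestamp. It de-duplicates before sorting so that "first" means first in the original source order, because sort_index's default algorithm isn't guaranteed to be stable and sorting first could change which duplicate survives.

```python
import pandas as pd

def prepare_index(df: pd.DataFrame, ts_col: str = "timestamp") -> pd.DataFrame:
    """Hypothetical helper: parse, index, de-duplicate and sort a timestamp column."""
    out = df.copy()
    # Parse strings to datetime64 so the index becomes a DatetimeIndex
    out[ts_col] = pd.to_datetime(out[ts_col])
    out = out.set_index(ts_col)
    # Drop repeated timestamps, keeping the first row in source order
    out = out[~out.index.duplicated(keep="first")]
    # Sort chronologically so fills and gap checks see time in order
    return out.sort_index()

raw = pd.DataFrame({
    "timestamp": ["2024-01-03", "2024-01-01", "2024-01-01", "2024-01-02"],
    "value": [3.0, 1.0, 99.0, 2.0],
})
clean = prepare_index(raw)
print(clean.index.dtype)                    # a datetime64 dtype
print(clean.index.is_monotonic_increasing)  # True
print(clean.index.duplicated().sum())       # 0
```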
Handling NaNs after downsampling
When you downsample, NaNs typically appear because a time bucket had no data at all; for example, a sensor that was offline for a full day when you're resampling to daily frequency.
# Downsample to daily mean
daily = df['value'].resample('D').mean()
# How many NaN days?
print(daily.isna().sum())
Your options depend on what the NaN represents. If the sensor was offline, forward-filling (propagating the last known value) is often appropriate for slowly-changing signals like temperature. If the missing period genuinely had no activity β say, zero sales on a holiday β you should fill with 0, not the previous value.
# Forward fill: carry last known value forward
daily_ffill = daily.ffill()
# Fill with zero (e.g., no transactions = zero)
daily_zero = daily.fillna(0)
# Fill with the column mean (use carefully: it can distort trends)
daily_mean = daily.fillna(daily.mean())
The choice here is a domain decision, not a Pandas decision. Getting it wrong produces a technically valid DataFrame that contains quietly wrong analysis.
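One detail worth knowing before you reach for fillna(0): not every aggregation produces NaN for an empty bucket. In the sketch below (made-up sales figures), mean() returns NaN for the empty day, but sum() returns 0 because the sum of nothing is zero. Passing min_count=1 makes the empty day NaN again, so the fill decision stays explicit.

```python
import pandas as pd

# Made-up sales; nothing at all was recorded on 2024-01-02
idx = pd.to_datetime(["2024-01-01 09:00", "2024-01-01 15:00", "2024-01-03 10:00"])
sales = pd.Series([5.0, 7.0, 4.0], index=idx)

daily_mean = sales.resample("D").mean()                # empty day -> NaN
daily_sum = sales.resample("D").sum()                  # empty day -> 0.0
daily_sum_nan = sales.resample("D").sum(min_count=1)   # empty day -> NaN

print(pd.DataFrame({"mean": daily_mean, "sum": daily_sum, "sum_min1": daily_sum_nan}))
```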
Handling NaNs after upsampling
Upsampling is where most people hit the NaN wall hard. You take daily closing prices and resample to hourly, and suddenly 23 out of every 24 rows are NaN.
# Upsample daily price data to hourly
hourly = df['price'].resample('H').asfreq()
# Or equivalently, as long as the original timestamps fall on the hourly grid (e.g., midnight):
hourly = df['price'].asfreq('H')
You have three main filling strategies:
Forward fill (ffill)
Carry the last known value forward until the next real observation. This makes sense for prices, sensor readings, or any value that stays constant until it's updated.
hourly_ffill = df['price'].resample('H').ffill()
# Limit how many periods you'll fill forward
hourly_limited = df['price'].resample('H').ffill(limit=6)
Linear interpolation
Useful when you expect the signal to change smoothly between observations: for example, a temperature reading taken once per day that you want to estimate at an hourly granularity.
hourly_interp = df['price'].resample('H').interpolate(method='linear')
Backward fill (bfill)
Fill from the next known value. Less common, but appropriate when a data point logically applies to the period before it (e.g., a nightly batch report that covers the hours before it was written).
hourly_bfill = df['price'].resample('H').bfill()
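To compare the three strategies side by side, here is a sketch on two made-up readings one hour apart, upsampled to 15-minute slots. It also shows what limit does: with limit=2, forward fill stops after two new slots and leaves the third empty.

```python
import pandas as pd

# Two made-up hourly readings, upsampled to 15-minute slots (00:00 to 01:00)
idx = pd.to_datetime(["2024-01-01 00:00", "2024-01-01 01:00"])
s = pd.Series([10.0, 14.0], index=idx)

ffilled = s.resample("15min").ffill()                     # 10, 10, 10, 10, 14
ffilled_lim = s.resample("15min").ffill(limit=2)          # 10, 10, 10, NaN, 14
bfilled = s.resample("15min").bfill()                     # 10, 14, 14, 14, 14
interp = s.resample("15min").interpolate(method="linear") # 10, 11, 12, 13, 14

print(pd.DataFrame({"ffill": ffilled, "ffill_limit2": ffilled_lim,
                    "bfill": bfilled, "linear": interp}))
```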
Using asfreq with a fill method directly
asfreq accepts a method parameter so you can combine frequency conversion and filling in one call:
# Convert and forward-fill in one step
hourly = df['price'].asfreq('H', method='ffill')
# Or backward-fill
hourly = df['price'].asfreq('H', method='bfill')
# Fill with a fixed value
hourly = df['price'].asfreq('H', fill_value=0)
This is cleaner for simple cases. Use the chained .resample() form when you need more control: Resampler.ffill() and .bfill() accept a limit argument, which asfreq does not, and .resample().asfreq() gives you an unfilled frame you can treat column by column.
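One behaviour to watch for, which the asfreq docstring also notes: method and fill_value only apply to the new rows the conversion creates, not to NaNs that were already in your data. The sketch below uses made-up readings with a gap already present at 00:30. Note that 00:45 is forward-filled from 00:30, so it inherits that pre-existing NaN; chain .ffill() or .fillna() afterwards if you want those values filled too.

```python
import numpy as np
import pandas as pd

# Made-up readings; the 00:30 value was already missing in the source
idx = pd.date_range("2024-01-01 00:00", periods=3, freq="30min")
s = pd.Series([1.0, np.nan, 3.0], index=idx)

ffilled = s.asfreq("15min", method="ffill")
# 00:00 1.0 | 00:15 1.0 | 00:30 NaN (pre-existing) | 00:45 NaN (copied from 00:30) | 01:00 3.0

zeroed = s.asfreq("15min", fill_value=0)
# 00:00 1.0 | 00:15 0.0 | 00:30 NaN (pre-existing) | 00:45 0.0 | 01:00 3.0

print(pd.DataFrame({"method_ffill": ffilled, "fill_value_0": zeroed}))
```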
Filling different columns differently
Real-world DataFrames often have columns with different semantics. A temperature column might warrant interpolation while a fault-code column should be forward-filled and a count column should default to zero. You can handle this column by column after resampling:
upsampled = df.resample('H').asfreq()
# Temperature: interpolate
upsampled['temperature'] = upsampled['temperature'].interpolate(method='linear')
# Fault code: carry last known state
upsampled['fault_code'] = upsampled['fault_code'].ffill()
# Event count: missing means zero events
upsampled['event_count'] = upsampled['event_count'].fillna(0)
This pattern keeps your filling logic explicit and reviewable. Avoid calling .ffill() on the whole DataFrame in one shot unless every column genuinely has the same semantics.
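If you have many columns, one way to keep that logic reviewable is a single rule table. This is a sketch of one design choice, not a pandas feature: FILL_RULES and fill_by_rule are hypothetical names, and the readings are made up. Note that asfreq turns integer columns into floats once NaNs appear, so fault_code comes back as 3.0 rather than 3.

```python
import pandas as pd

# Hypothetical rule table: column name -> how to fill that column's gaps
FILL_RULES = {
    "temperature": lambda col: col.interpolate(method="linear"),
    "fault_code": lambda col: col.ffill(),
    "event_count": lambda col: col.fillna(0),
}

def fill_by_rule(frame: pd.DataFrame, rules: dict) -> pd.DataFrame:
    """Apply each column's rule; columns without a rule are left untouched."""
    out = frame.copy()
    for name, rule in rules.items():
        out[name] = rule(out[name])
    return out

idx = pd.to_datetime(["2024-01-01 00:00", "2024-01-01 00:30"])
readings = pd.DataFrame(
    {"temperature": [20.0, 21.0], "fault_code": [3, 5], "event_count": [2, 1]},
    index=idx,
)
upsampled = readings.resample("15min").asfreq()  # adds an empty 00:15 row
filled = fill_by_rule(upsampled, FILL_RULES)
print(filled)
```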
Common pitfalls
Timezone-naive vs timezone-aware indexes. Mixing them produces errors or silent misalignment. If your source data is UTC, localise it with df.index = df.index.tz_localize('UTC') before resampling. If it's already localised, convert with tz_convert. Never mix naive and aware timestamps in the same operation.
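Timezones matter even when nothing raises an error, because resample('D') cuts days at midnight in the index's own timezone. The sketch below uses made-up readings and picks 'America/New_York' purely as an example zone: it localises naive timestamps as UTC, converts them, and shows the daily buckets shifting.

```python
import pandas as pd

# Two made-up readings an hour apart, straddling midnight UTC
idx = pd.to_datetime(["2024-01-01 23:30", "2024-01-02 00:30"])
s = pd.Series([1.0, 2.0], index=idx)

utc = s.tz_localize("UTC")                # declare the naive timestamps as UTC
ny = utc.tz_convert("America/New_York")   # same instants, New York wall-clock time

utc_daily = utc.resample("D").sum()   # two buckets: 1 Jan and 2 Jan (UTC)
ny_daily = ny.resample("D").sum()     # one bucket: both fall on 1 Jan in New York
print(utc_daily)
print(ny_daily)
```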
Business day frequencies. Using resample('B') skips weekends but not public holidays. If your data source also skips holidays, the gap count won't match and you'll end up with NaNs for every holiday in your range. Consider using pandas_market_calendars or simply forward-filling with a limit if this affects you.
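The sketch below shows the holiday effect with made-up prices around New Year's Day 2024 (a Monday). resample('B') still creates a slot for 1 January. One alternative that stays inside pandas is a CustomBusinessDay offset with an explicit holiday list, passed to asfreq; the holiday list here is hand-written for illustration, and in practice you'd take it from a proper calendar.

```python
import pandas as pd

# Made-up closing prices: Fri 29 Dec 2023, then Tue 2 and Wed 3 Jan 2024
idx = pd.to_datetime(["2023-12-29", "2024-01-02", "2024-01-03"])
prices = pd.Series([100.0, 101.0, 102.0], index=idx)

# 'B' skips weekends but not holidays, so Mon 1 Jan gets an empty slot
b_daily = prices.resample("B").mean()
print(b_daily)

# Illustrative holiday list; CustomBusinessDay skips these dates as well as weekends
trading_day = pd.offsets.CustomBusinessDay(holidays=["2024-01-01"])
holiday_aware = prices.asfreq(trading_day)
print(holiday_aware)  # 29 Dec, 2 Jan, 3 Jan, no NaNs
```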
Aggregating with the wrong function. Calling resample('M').sum() on a column that contains a running total (not a delta) produces nonsense. Double-check whether your values are additive before summing.
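If a column is a running total (for example, a meter reading that only ever goes up), one approach is to turn it into per-interval deltas with diff() before summing. The readings below are made up, and the sketch assumes the counter never resets; a reset would show up as a negative delta that you'd need to handle separately. The first reading has no predecessor, so its delta is NaN and sum() skips it.

```python
import pandas as pd

# Made-up cumulative meter readings (the counter only ever increases)
idx = pd.to_datetime(["2024-01-01 00:00", "2024-01-01 12:00",
                      "2024-01-02 00:00", "2024-01-02 12:00"])
meter = pd.Series([100.0, 110.0, 120.0, 135.0], index=idx)

wrong = meter.resample("D").sum()   # adds running totals: 210, 255 (meaningless)
deltas = meter.diff()               # NaN, 10, 10, 15: usage since the previous reading
right = deltas.resample("D").sum()  # 10, 25: usage attributed to the day of each reading

print(pd.DataFrame({"summed_totals": wrong, "summed_deltas": right}))
```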
Forgetting that resample is lazy. df.resample('D') returns a Resampler object, not a DataFrame. Nothing is computed until you call an aggregation or filling method on it. Printing the resampler object will not show you NaNs; it shows you the object itself. Always chain a method: .mean(), .asfreq(), .ffill(), etc.
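A quick way to see the laziness for yourself, with a made-up two-point series:

```python
import pandas as pd

s = pd.Series([1.0, 3.0], index=pd.to_datetime(["2024-01-01", "2024-01-03"]))

r = s.resample("D")
print(type(r).__name__)  # a Resampler subclass, not a Series
daily = r.mean()         # the computation happens here
print(daily)             # 2024-01-02 appears as NaN only now
```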
Using ffill across a very long gap. If your sensor was offline for two weeks and you forward-fill hourly, you'll get two weeks of identical stale values. Use the limit parameter to cap propagation at a sensible number of periods, then decide how to handle the remainder separately.
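Here is a sketch of that two-step approach on made-up daily readings with a five-day outage. It caps the fill with limit, then keeps two boolean masks: one for slots that are still missing, and one recording which values were synthesised, so downstream users can tell real readings from filled ones.

```python
import pandas as pd

# Made-up daily readings; nothing arrives between 2 and 8 January
idx = pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-08"])
s = pd.Series([5.0, 6.0, 9.0], index=idx)

daily = s.asfreq("D")                       # 3 to 7 January become NaN
filled = daily.ffill(limit=2)               # only 3 and 4 January get 6.0
still_missing = filled.isna()               # 5 to 7 January: decide separately
was_filled = daily.isna() & filled.notna()  # which values are synthetic

print(pd.DataFrame({"value": filled, "filled": was_filled, "missing": still_missing}))
```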
A complete worked example
Here's a realistic end-to-end snippet: irregular sensor readings resampled to 15-minute intervals with appropriate gap handling.
import pandas as pd
import numpy as np
# Simulate irregular sensor data
times = pd.to_datetime([
'2024-01-01 00:00', '2024-01-01 00:12', '2024-01-01 00:45',
'2024-01-01 01:30', '2024-01-01 03:00', # gap from 01:30 to 03:00
])
df = pd.DataFrame({'temp': [20.1, 20.4, 20.9, 21.3, 19.8]}, index=times)
df.index.name = 'timestamp'
# Sort and deduplicate (good habit)
df = df.sort_index()
df = df[~df.index.duplicated(keep='first')]
# Resample to 15-minute frequency
resampled = df['temp'].resample('15min').mean()
print("Before filling:")
print(resampled)
# Interpolate across gaps (smooth signal expected)
filled = resampled.interpolate(method='linear', limit=4)
print("\nAfter interpolation (limit=4 periods):")
print(filled)
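The limit=4 caps interpolation at four consecutive 15-minute slots (one hour, 4 × 15 min). It doesn't skip longer gaps, though: the 01:30 to 03:00 gap has five empty slots, so 01:45 through 02:30 are interpolated and only 02:45 stays NaN. Whatever remains NaN can then be flagged for review rather than silently filled with extrapolated fiction.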
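Because limit counts how many consecutive NaNs to fill, a gap longer than the limit gets its first slots filled instead of being skipped. If you'd rather leave any gap longer than an hour completely empty, one approach is to measure each run of NaNs and mask the long runs after interpolating. interpolate_short_gaps below is a hypothetical helper, not a pandas function, and the sketch reuses the readings from the worked example.

```python
import pandas as pd

def interpolate_short_gaps(s: pd.Series, max_gap: int) -> pd.Series:
    """Hypothetical helper: interpolate runs of up to max_gap NaNs, leave longer runs NaN."""
    is_nan = s.isna()
    # A new run starts wherever NaN-ness changes; cumsum gives each run an id
    run_id = is_nan.ne(is_nan.shift(fill_value=False)).cumsum()
    # Length of the run each position belongs to
    run_len = run_id.map(run_id.value_counts())
    # Fill interior gaps only (no extrapolation past the first or last reading)
    filled = s.interpolate(method="linear", limit_area="inside")
    # Put NaN back in every slot that belongs to a run that is too long
    return filled.mask(is_nan & (run_len > max_gap))

# Same readings as the worked example above
times = pd.to_datetime([
    "2024-01-01 00:00", "2024-01-01 00:12", "2024-01-01 00:45",
    "2024-01-01 01:30", "2024-01-01 03:00",
])
temp = pd.Series([20.1, 20.4, 20.9, 21.3, 19.8], index=times)
resampled = temp.resample("15min").mean()

filled = interpolate_short_gaps(resampled, max_gap=4)
print(filled)  # 01:45 to 02:45 stay NaN; the two-slot gaps are interpolated
```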
Wrapping up
NaNs from resample and asfreq are not a Pandas failure; they're Pandas being honest about missing data. Your job is to decide what each gap means and choose the filling strategy that fits your domain. Here are the concrete next steps:
- Audit your index first. Check the dtype, sort order, and duplicate count before you touch resample or asfreq. Most mysterious NaN problems trace back here.
- Pick the right tool. Use resample when you need aggregation or want to group into buckets. Use asfreq when the data is already regular and you just need to regularise the index or upsample with a fill.
- Match the fill strategy to the column's meaning. Forward-fill states, interpolate smooth signals, zero-fill counts, and leave genuinely unknown data as NaN until you have a principled answer.
- Use the limit parameter. Any time you forward-fill or interpolate, cap the propagation. Unconstrained fills across long gaps are a silent data quality problem.
- Validate the output. After filling, check df.isna().sum() again. If NaNs remain, investigate them individually; they may indicate a real data pipeline issue upstream.