Fixing Python Pandas apply() That Silently Ignores Errors on Axis=1
You run df.apply(my_func, axis=1) on a DataFrame with tens of thousands of rows. It finishes in seconds, no traceback, no warnings. Then you look at the output column and find NaN scattered everywhere, or results that make no sense. The function broke on certain rows and pandas quietly moved on.
This is one of the most frustrating silent failure modes in pandas. The bug is in your applied function, but nothing tells you that. Here's how to find it, fix it, and stop it from happening again.
What You'll Learn
- Exactly why apply() on axis=1 can hide exceptions instead of raising them.
- How to write a debug wrapper that surfaces the real error immediately.
- How to log row-level failures without crashing your entire pipeline.
- How to test your applied function in isolation before unleashing it on the full DataFrame.
- The most common return-shape and type issues that trigger silent failures.
Prerequisites
You need Python 3.8+ and pandas 1.3 or later. The examples below use pandas, NumPy (installed alongside pandas), and the standard library logging module, with no extra dependencies. A basic understanding of row-wise DataFrame operations is assumed.
Why apply() on Axis=1 Seems to Swallow Errors
Pandas apply() does not swallow exceptions on its own. In current releases, an exception that escapes your function propagates out of df.apply() and stops the call. When bad rows come back as NaN with no traceback, the cause is almost always one of two things.
First, your function raises on some rows but not others, and the call is wrapped in a broad try/except inside the function itself that returns None or np.nan on failure. This looks like error handling, but it converts crashes into invisible bad data.
Second, nothing raises at all. Pandas stores a missing value in a numeric column as NaN, and NaN passes through float() and arithmetic without complaint, so a row with a missing quantity yields NaN rather than an exception. A code path that returns None implicitly has the same effect.
The key insight: pandas does not convert a TypeError or ValueError from your function into NaN. If the output contains NaN and you saw no traceback, either the function returned it or something caught the exception before pandas could raise it.
Reproducing the Silence: A Minimal Example
Start with a concrete case so you can see the failure mode before you fix it.
import pandas as pd
import numpy as np

data = {
    "price": [10.0, None, 25.0, "N/A", 40.0],
    "quantity": [3, 5, None, 2, 4],
}
df = pd.DataFrame(data)

def revenue(row):
    # Both columns must be numeric; this fails on bad rows
    return float(row["price"]) * float(row["quantity"])

result = df.apply(revenue, axis=1)
print(result)
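Run this on a current pandas release and the call stops at row 1 with a TypeError, because float(None) is not valid. That is the loud case. The silent case is what many pipelines actually ship: the same arithmetic wrapped in a broad try/except that returns np.nan on any failure. With that wrapper in place, the output looks like:
0     30.0
1      NaN
2      NaN
3      NaN
4    160.0
dtype: float64
Three rows produced NaN, and nothing distinguishes them. Row 1 failed because price is None. Row 2 never raised at all: quantity was None in the input, pandas stored it as NaN in a float column, and NaN multiplied through silently. Row 3 failed because price is the string "N/A". No exception, no warning, nothing in your logs.
If you call revenue(df.iloc[3]) directly, you get a ValueError: could not convert string to float: 'N/A'. That exception is the signal the broad except threw away.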
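To check how your installed pandas behaves, a short sketch like the one below runs the function bare and then inside a broad try/except. The swallowing_revenue name is hypothetical and exists only for this comparison; the data is the sample frame from this section.

```python
import pandas as pd

df = pd.DataFrame({
    "price": [10.0, None, 25.0, "N/A", 40.0],
    "quantity": [3, 5, None, 2, 4],
})

def revenue(row):
    return float(row["price"]) * float(row["quantity"])

def swallowing_revenue(row):
    # Hypothetical anti-pattern: every failure becomes NaN with no record.
    try:
        return revenue(row)
    except Exception:
        return float("nan")

# Bare function: check whether the exception reaches the caller.
try:
    df.apply(revenue, axis=1)
    bare_raised = False
except (TypeError, ValueError) as exc:
    bare_raised = True
    print(f"apply() raised {type(exc).__name__}: {exc}")

# Wrapped function: the same bad rows now come back as NaN.
silent = df.apply(swallowing_revenue, axis=1)
print(silent)
```

If the first call raises and the second prints NaN for rows 1 through 3, the silence is coming from the try/except, not from apply().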
Surfacing Errors Immediately With a Debug Wrapper
The fastest way to expose silent failures is to wrap your function so any exception is re-raised with the row index attached. Use this during development and remove it (or switch it to logging) before production.
def debug_apply(func):
    """Wrapper that re-raises exceptions with row context."""
    def wrapper(row):
        try:
            return func(row)
        except Exception as exc:
            raise RuntimeError(
                f"apply() failed on row index {row.name}: {exc}"
            ) from exc
    return wrapper

result = df.apply(debug_apply(revenue), axis=1)
Now you get a RuntimeError on the first bad row, with the original exception chained. The row.name attribute holds the DataFrame index label, so you know exactly which row caused the problem.
This approach also works as a decorator if you control the function definition:
@debug_apply
def revenue(row):
    return float(row["price"]) * float(row["quantity"])

result = df.apply(revenue, axis=1)
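During development it can help to catch the wrapped error and look at the original exception it chains. The sketch below assumes the debug_apply wrapper and sample data from this guide, repeated so the block runs on its own; `__cause__` is the standard attribute Python sets when you use `raise ... from exc`.

```python
import pandas as pd

df = pd.DataFrame({
    "price": [10.0, None, 25.0, "N/A", 40.0],
    "quantity": [3, 5, None, 2, 4],
})

def debug_apply(func):
    """Wrapper that re-raises exceptions with row context."""
    def wrapper(row):
        try:
            return func(row)
        except Exception as exc:
            raise RuntimeError(
                f"apply() failed on row index {row.name}: {exc}"
            ) from exc
    return wrapper

def revenue(row):
    return float(row["price"]) * float(row["quantity"])

caught = None
try:
    df.apply(debug_apply(revenue), axis=1)
except RuntimeError as err:
    caught = err
    # The message carries the row label; __cause__ is the original exception.
    print(err)
    print("original exception type:", type(err.__cause__).__name__)
```

The wrapper stops at the first failing row, so fix that row (or the function) and rerun to find the next one.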
Using result_type and Return-Shape Mismatches
A separate class of silent error comes from return-shape mismatches. If your function sometimes returns a scalar and sometimes returns a Series or dict, the shape of the combined result depends on what the function happens to return, and you can end up with a Series of dicts or mixed objects where you expected a DataFrame of columns.
Setting result_type="reduce" or result_type="expand" makes your intent explicit: "expand" turns list-like or dict results into columns, and "reduce" asks for a Series where possible. It does not change how exceptions are handled.
# When your function always returns a dict of columns:
def parse_row(row):
    return {"revenue": float(row["price"]) * float(row["quantity"]),
            "discount": float(row["price"]) * 0.1}

result = df.apply(parse_row, axis=1, result_type="expand")
result_type only controls how return values are assembled. An exception raised on any row still propagates, so on the sample df above this call stops at row 1 until the bad values are cleaned or handled.
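To see what result_type changes, the sketch below applies a dict-returning function to a small clean frame, once with the default and once with result_type="expand". The clean frame and its values are made up for illustration.

```python
import pandas as pd

# Hypothetical clean data, so no row raises.
clean = pd.DataFrame({"price": [10.0, 25.0], "quantity": [3, 4]})

def parse_row(row):
    return {"revenue": row["price"] * row["quantity"],
            "discount": row["price"] * 0.1}

# Default: one dict per row, held in an object-dtype Series.
as_series = clean.apply(parse_row, axis=1)

# Expand: dict keys become column names in a DataFrame.
as_frame = clean.apply(parse_row, axis=1, result_type="expand")

print(type(as_series).__name__, as_series.dtype)
print(as_frame)
```

A Series of dicts is easy to mistake for working output, because it prints without error and only fails later when you try to do arithmetic on it.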
Catching Errors Selectively and Logging Them
Sometimes you genuinely want to skip bad rows rather than crash. The right way to do this is explicit: catch the specific exception you expect, log it with the row context, and return a sentinel value you can filter later. Never use a bare except: or a broad except Exception that swallows everything quietly.
import logging

logging.basicConfig(level=logging.WARNING)
logger = logging.getLogger(__name__)

SENTINEL = float("nan")

def safe_revenue(row):
    try:
        return float(row["price"]) * float(row["quantity"])
    except (TypeError, ValueError) as exc:
        logger.warning(
            "Row %s skipped: %s | price=%r quantity=%r",
            row.name, exc, row["price"], row["quantity"]
        )
        return SENTINEL

result = df.apply(safe_revenue, axis=1)

# NaN marks a logged failure or a missing input that propagated without raising
bad_rows = result[result.isna()]
print(f"{len(bad_rows)} rows are NaN; inspect df.loc[bad_rows.index]")
This pattern gives you the best of both worlds: your pipeline doesn't crash on bad data, and you have an audit trail of every row that failed and why. You also know where the bad rows are, so you can decide whether to impute, drop, or escalate them. A related class of silent data corruption is covered in fixing pandas to_datetime that silently produces NaT on mixed formats.
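A NaN sentinel cannot tell a row that raised apart from a row whose input was simply missing. One possible variation, sketched below, records each failure in a list alongside the log call so the two cases can be separated afterwards. The failures list and recorded_revenue name are hypothetical, not part of pandas.

```python
import pandas as pd

df = pd.DataFrame({
    "price": [10.0, None, 25.0, "N/A", 40.0],
    "quantity": [3, 5, None, 2, 4],
})

failures = []  # hypothetical audit list: one dict per row that raised

def recorded_revenue(row):
    try:
        return float(row["price"]) * float(row["quantity"])
    except (TypeError, ValueError) as exc:
        failures.append({"index": row.name, "error": repr(exc)})
        return float("nan")

result = df.apply(recorded_revenue, axis=1)

# Rows that raised, versus rows that are NaN without any exception.
failed_index = [f["index"] for f in failures]
nan_without_error = result[result.isna()].index.difference(failed_index)

print("raised:", failed_index)
print("NaN but no exception:", list(nan_without_error))
```

On the sample data, the rows that are NaN without an exception are the ones where a missing quantity propagated through the multiplication.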
Testing Your Applied Function in Isolation
Before you apply any function to a full DataFrame, test it against individual rows and a set of representative edge-case rows. This is the simplest debugging step that most people skip.
# Test on a single row first
test_row = df.iloc[0]
print(revenue(test_row))  # Should print 30.0

# Test against every known edge-case row explicitly.
# dtype=object keeps None as None, as in a row taken from a mixed-type DataFrame;
# without it, pandas converts None to NaN and the first case would not raise.
edge_cases = [
    pd.Series({"price": None, "quantity": 5}, dtype=object),
    pd.Series({"price": "N/A", "quantity": 2}, dtype=object),
    pd.Series({"price": 0.0, "quantity": 0}, dtype=object),
]

for case in edge_cases:
    try:
        print(revenue(case))
    except Exception as exc:
        print(f"Edge case failed: {exc} | data={case.to_dict()}")
Running edge cases manually takes two minutes and will catch more bugs than an hour of staring at a DataFrame. Once your function handles every edge case correctly in isolation, applying it row-wise becomes much safer. This same discipline applies when debugging pandas groupby operations that return NaN when group keys contain None.
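Printing edge-case results is a start; turning them into checks makes the expectations repeatable. The sketch below pairs each hand-built row with either an expected value or an expected exception type. The check_cases helper and the case list are hypothetical, and the rows use dtype=object so None stays None as it would in a mixed-type DataFrame row.

```python
import pandas as pd

def revenue(row):
    return float(row["price"]) * float(row["quantity"])

# Each entry: (row data, expected result OR expected exception type).
cases = [
    ({"price": 10.0, "quantity": 3}, 30.0),
    ({"price": 0.0, "quantity": 0}, 0.0),
    ({"price": None, "quantity": 5}, TypeError),
    ({"price": "N/A", "quantity": 2}, ValueError),
]

def check_cases(func, cases):
    """Return a list of readable mismatches; an empty list means all passed."""
    problems = []
    for data, expected in cases:
        row = pd.Series(data, dtype=object)
        try:
            outcome = func(row)
        except Exception as exc:
            outcome = type(exc)
        if outcome != expected:
            problems.append(f"{data}: expected {expected!r}, got {outcome!r}")
    return problems

problems = check_cases(revenue, cases)
print(problems or "all edge cases behave as expected")
```

Listing the expected exception types also documents which failures the function is allowed to raise, which is the same list you would catch selectively in production.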
Common Pitfalls to Watch Out For
Returning None Implicitly
If your function has a code path that falls through without a return statement, Python returns None, which shows up as None in an object column or as NaN once pandas stores the result in a numeric column. Make sure every branch of your function has an explicit return.
# Bad: what happens when condition is False?
def classify(row):
    if row["score"] > 90:
        return "A"
    # Falls through, returns None silently

# Good
def classify(row):
    if row["score"] > 90:
        return "A"
    return "B"  # explicit fallback
Mutating the Row Inside apply()
Pandas passes each row as a Series. Mutating it inside your function is not a reliable way to change the original DataFrame: the pandas user guide warns against mutating inside user-defined functions, and in most cases you are writing to a temporary row object. If you need to update the DataFrame, return the new values and assign the result outside the apply call.
Relying on apply() When Vectorized Operations Exist
apply(axis=1) is a Python loop in disguise. It is slow on large DataFrames and harder to debug than vectorized operations. For arithmetic between columns, use direct column math: df["revenue"] = df["price"] * df["quantity"]. Vectorized operations propagate NaN through missing values predictably, and they raise actual TypeError exceptions when types are incompatible, rather than silently failing per row.
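For the revenue example, a vectorized version might look like the sketch below. It uses pd.to_numeric with errors="coerce" to turn unparseable values into NaN, then keeps a mask of values that were present but not numeric so the coercion itself does not become another silent failure. The coerced variable name is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "price": [10.0, None, 25.0, "N/A", 40.0],
    "quantity": [3, 5, None, 2, 4],
})

price = pd.to_numeric(df["price"], errors="coerce")
quantity = pd.to_numeric(df["quantity"], errors="coerce")

# Present in the input but not numeric, as opposed to simply missing.
coerced = price.isna() & df["price"].notna()

df["revenue"] = price * quantity

print(df)
print("Non-numeric price at index:", list(df.index[coerced]))
```

The mask separates genuinely missing prices from bad ones, which a plain NaN in the revenue column cannot do.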
Expecting raw=True to Change Error Handling
The raw=True parameter passes each row as a NumPy array instead of a Series. It does not change how exceptions are handled: an exception raised by your function still propagates. What it saves is the cost of building a Series for every row. Only use raw=True if your function is designed for arrays. Label-based indexing and row.name are unavailable, so row["price"] will break; you'd need row[0] instead.
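A small sketch of what raw=True changes, using a made-up numeric frame: positional indexing works, label indexing fails, and the failure reaches the caller as an ordinary exception. The function names are hypothetical.

```python
import pandas as pd

# Hypothetical all-numeric data; column order is price, quantity.
numeric = pd.DataFrame({"price": [10.0, 25.0], "quantity": [3, 4]})

def revenue_raw(values):
    # With raw=True, `values` is a NumPy array in column order.
    return values[0] * values[1]

result = numeric.apply(revenue_raw, axis=1, raw=True)
print(result)

def revenue_by_label(values):
    # Label lookup is not available on a plain NumPy array.
    return values["price"] * values["quantity"]

label_error = None
try:
    numeric.apply(revenue_by_label, axis=1, raw=True)
except Exception as exc:
    label_error = exc
    print(f"label indexing failed: {type(exc).__name__}")
```

Positional indexing ties the function to column order, so a reordered DataFrame would change the result without raising; that trade-off is worth weighing before reaching for raw=True.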
Large DataFrames Where One Bad Row Is Hard to Find
Use the debug wrapper described earlier, but add an early-exit mechanism for development: keep a counter and raise after N failures so you get a sample of the problem without waiting for the whole DataFrame to process.
def debug_apply_sampled(func, max_errors=3):
    errors = []
    def wrapper(row):
        try:
            return func(row)
        except Exception as exc:
            errors.append((row.name, exc))
            if len(errors) >= max_errors:
                raise RuntimeError(
                    f"Stopping after {max_errors} errors. Sample: {errors}"
                ) from exc
            return float("nan")
    # Expose the list so failures below the limit are not lost
    wrapper.errors = errors
    return wrapper

sampled = debug_apply_sampled(revenue, max_errors=3)
result = df.apply(sampled, axis=1)
print(sampled.errors)  # inspect this even when the limit was never reached
Wrapping Up
Silent errors in apply(axis=1) are dangerous precisely because pandas gives you no signal that anything went wrong. The fix isn't one trick; it's a habit of defensive coding around row-wise operations. Here are your next steps:
- Wrap your applied function with debug_apply during development so exceptions always surface with row context. Remove or replace it with logging before production.
- Test your function against edge-case rows in isolation before calling apply(). Use df.iloc[n] and manually constructed edge-case Series objects.
- Set result_type explicitly when your function returns a dict or Series. This makes the output shape explicit instead of dependent on what the function happens to return.
- Replace apply() with vectorized operations wherever the logic allows. Column arithmetic, pd.to_numeric(errors='coerce'), and np.where() are faster and fail more visibly.
- Log every row-level failure with the row index and the raw values so you can audit data quality issues after the fact rather than hunting for invisible NaNs.
If you're dealing with similar silent data issues in other parts of a pandas pipeline, the techniques for handling mixed-format date parsing that silently produces NaT follow the same diagnostic logic and are worth reading alongside this guide.
Frequently Asked Questions
Why does pandas apply() return NaN instead of raising an exception on bad rows?
In current pandas releases, an exception raised by your function propagates out of apply(). NaN without a traceback usually means one of three things: the function caught the exception itself and returned NaN or None, a code path returned None implicitly, or a missing input value propagated through arithmetic as NaN without ever raising.
How can I find which rows caused errors in a pandas apply() call?
Wrap your applied function in a debug wrapper that catches exceptions and re-raises them with the row's index label attached using row.name. This immediately tells you which DataFrame row caused the failure and preserves the original exception as context.
Does using axis=1 in pandas apply() make error handling different from axis=0?
The exception handling is the same: an exception raised by your function propagates either way. What differs is what your function receives. With axis=1 it gets one row as a Series, which for a mixed-type DataFrame has object dtype, so values arrive as individual Python objects and type problems surface row by row. The function also runs once per row rather than once per column, so a function that catches its own errors has many more chances to hide one.
Is it better to use vectorized operations instead of apply() on axis=1?
Whenever your logic can be expressed as direct column arithmetic or built-in pandas/numpy functions, vectorized operations are both faster and more transparent about errors. Use apply(axis=1) only for logic that genuinely requires per-row conditional branching that can't be expressed with vectorized methods.
How do I log failed rows in pandas apply() without stopping the entire pipeline?
Inside your applied function, catch only the specific exceptions you expect, log the row index and the raw values using Python's logging module, and return a sentinel value like float('nan') so processing continues. After the apply call, filter the result for sentinel values to get a complete list of failed rows for review.