Merging Multiple Excel Sheets into One DataFrame with Pandas
You've opened an Excel file and found twelve sheets β January through December β each with the same columns but different rows. Now you need them in one place for analysis. Copy-pasting is not the answer, and neither is doing it by hand sheet by sheet inside Python with a wall of repetitive code.
Pandas gives you a clean, repeatable way to pull every sheet into a single DataFrame in fewer than ten lines. Here's how to do it correctly, including the parts most tutorials skip.
What you'll learn
- How to read all sheets from an Excel file at once using
pd.read_excel - How to tag each row with its source sheet name so you can trace it later
- How to handle sheets with mismatched columns
- How to work with multiple Excel files instead of multiple sheets in one file
- Common errors and how to fix them
Prerequisites
You need Python 3.8 or later, and the following packages installed:
pip install pandas openpyxlopenpyxl is the engine Pandas uses to read .xlsx files. Without it, you'll get an unhelpful error about a missing engine. If your files are older .xls format, install xlrd instead.
Reading all sheets at once
The simplest approach is to pass sheet_name=None to pd.read_excel. This tells Pandas to read every sheet and return them as an ordered dictionary where each key is the sheet name and each value is a DataFrame.
import pandas as pd
all_sheets = pd.read_excel("sales_data.xlsx", sheet_name=None)
print(type(all_sheets)) # <class 'dict'>
print(list(all_sheets.keys())) # ['January', 'February', 'March', ...]Each value in all_sheets is a fully formed DataFrame. You can inspect any one of them with all_sheets["January"].head() before committing to a merge.
Concatenating into a single DataFrame
Once you have the dictionary, use pd.concat to stack all the DataFrames on top of each other. Pass ignore_index=True so the final index is a clean sequence instead of repeating row numbers from each sheet.
combined = pd.concat(all_sheets.values(), ignore_index=True)
print(combined.shape)That's the core of it. Two lines after the read, and you have one unified DataFrame. The rest of this article covers the details that make this production-ready rather than just functional.
Tagging rows with their source sheet
After concatenation, you lose the information about which sheet each row came from. If that matters for your analysis β and it usually does β add a column before you concatenate.
import pandas as pd
all_sheets = pd.read_excel("sales_data.xlsx", sheet_name=None)
frames = []
for sheet_name, df in all_sheets.items():
df["source_sheet"] = sheet_name
frames.append(df)
combined = pd.concat(frames, ignore_index=True)
print(combined["source_sheet"].value_counts())Now every row carries a source_sheet label. You can filter by it, group by it, or use it purely for debugging when something looks wrong downstream.
Handling mismatched columns
Real-world Excel files are rarely pristine. One sheet might have an extra column someone added mid-year, or a column name might be spelled differently. By default, pd.concat does an outer join on columns, filling missing values with NaN.
That default behavior is usually what you want, but you should know it's happening. Check for unexpected columns after the merge:
for sheet_name, df in all_sheets.items():
print(sheet_name, df.columns.tolist())If you want to only keep columns that appear in every sheet, use an inner join instead:
combined = pd.concat(frames, join="inner", ignore_index=True)And if column names differ only by casing or whitespace, normalize them before concatenating:
for sheet_name, df in all_sheets.items():
df.columns = df.columns.str.strip().str.lower()
df["source_sheet"] = sheet_name
frames.append(df)Selecting specific sheets
Sometimes you don't want every sheet. A file might have a summary tab or a lookup table that should not be included in the row data. You have two clean options.
Option 1 β Pass a list of sheet names:
sheets_to_load = ["January", "February", "March"]
selected = pd.read_excel("sales_data.xlsx", sheet_name=sheets_to_load)Option 2 β Filter after reading all sheets:
exclude = {"Summary", "Lookup"}
frames = [
df.assign(source_sheet=name)
for name, df in all_sheets.items()
if name not in exclude
]The second approach is more flexible when you don't know the sheet names in advance and just want to skip a known set of non-data tabs.
Merging sheets across multiple Excel files
If your data lives across several files rather than several sheets in one file, the pattern is almost identical. Use Python's pathlib module to find all matching files, then loop over them.
from pathlib import Path
import pandas as pd
data_dir = Path("monthly_reports")
frames = []
for filepath in sorted(data_dir.glob("*.xlsx")):
df = pd.read_excel(filepath)
df["source_file"] = filepath.name
frames.append(df)
combined = pd.concat(frames, ignore_index=True)
print(f"Loaded {len(frames)} files, {len(combined)} rows total")Sorting the file paths with sorted() ensures a consistent order, which matters if row order is meaningful to you. The source_file column gives you the same traceability you got from source_sheet in the earlier example.
Common pitfalls and how to avoid them
Missing engine error
If you see ValueError: Missing optional dependency 'openpyxl', install it with pip install openpyxl. For .xls files, use pip install xlrd and pass engine="xlrd" explicitly.
Header rows in the wrong place
Some Excel sheets have a title in row 1 and actual column headers in row 2. Use the header argument to tell Pandas where to find the real headers:
df = pd.read_excel("report.xlsx", header=1) # zero-indexed, so row 2Empty sheets causing errors
A sheet with no data returns an empty DataFrame. That's fine for pd.concat, but if you try to access a specific column on it before concatenating you'll get a KeyError. Guard against this:
for name, df in all_sheets.items():
if df.empty:
print(f"Skipping empty sheet: {name}")
continue
frames.append(df.assign(source_sheet=name))Data type inconsistencies
A column might be stored as integers in one sheet and strings in another because someone formatted a cell differently. After concatenating, check your dtypes:
print(combined.dtypes)Use pd.to_numeric(combined["amount"], errors="coerce") to force a column to numeric, turning anything unparseable into NaN rather than crashing.
Large files and memory
For very large Excel files, loading all sheets at once can use significant memory. In those cases, read and process one sheet at a time and write intermediate results to a file, or consider converting the Excel data to CSV or a database first.
Putting it all together
Here is a complete, reusable function that handles the most common real-world scenarios:
from pathlib import Path
import pandas as pd
def load_excel_sheets(
filepath: str | Path,
exclude_sheets: set[str] | None = None,
normalize_columns: bool = True,
) -> pd.DataFrame:
"""Read all sheets from an Excel file into one DataFrame."""
filepath = Path(filepath)
exclude_sheets = exclude_sheets or set()
all_sheets = pd.read_excel(filepath, sheet_name=None)
frames = []
for name, df in all_sheets.items():
if name in exclude_sheets or df.empty:
continue
if normalize_columns:
df.columns = df.columns.str.strip().str.lower()
df["source_sheet"] = name
frames.append(df)
if not frames:
raise ValueError(f"No usable sheets found in {filepath.name}")
return pd.concat(frames, ignore_index=True)
# Usage
combined = load_excel_sheets(
"sales_data.xlsx",
exclude_sheets={"Summary", "Notes"},
)
print(combined.shape)
print(combined["source_sheet"].unique())This function is defensive about empty sheets, optionally normalizes column names, tags rows with their source, and raises a clear error instead of returning an empty DataFrame silently.
Wrapping up
Reading multiple Excel sheets into a single Pandas DataFrame is a two-step process: read with sheet_name=None to get a dictionary, then concatenate the values. Everything else is about handling the messiness of real data. Here are your concrete next steps:
- Run the basic
sheet_name=Nonepattern on your own file and inspect the resulting dictionary before concatenating. - Add a
source_sheetcolumn immediately so you can always trace a row back to where it came from. - Print column names from every sheet before merging to catch casing or naming mismatches early.
- Wrap the logic in a function like the one above so it's reusable across projects.
- If you're dealing with files bigger than a few hundred megabytes, explore reading sheet by sheet and writing to Parquet or a SQL database to avoid memory pressure.
π€ Share this article
Sign in to saveRelated Articles
Comments (0)
No comments yet. Be the first!