Merging Multiple Excel Sheets into One DataFrame with Pandas

May 21, 2026 6 min read 6 views
Flat illustration of multiple colored Excel sheet tabs converging into a single unified data table on a soft blue background

You've opened an Excel file and found twelve sheets β€” January through December β€” each with the same columns but different rows. Now you need them in one place for analysis. Copy-pasting is not the answer, and neither is doing it by hand sheet by sheet inside Python with a wall of repetitive code.

Pandas gives you a clean, repeatable way to pull every sheet into a single DataFrame in fewer than ten lines. Here's how to do it correctly, including the parts most tutorials skip.

What you'll learn

  • How to read all sheets from an Excel file at once using pd.read_excel
  • How to tag each row with its source sheet name so you can trace it later
  • How to handle sheets with mismatched columns
  • How to work with multiple Excel files instead of multiple sheets in one file
  • Common errors and how to fix them

Prerequisites

You need Python 3.8 or later, and the following packages installed:

pip install pandas openpyxl

openpyxl is the engine Pandas uses to read .xlsx files. Without it, you'll get an unhelpful error about a missing engine. If your files are older .xls format, install xlrd instead.

Reading all sheets at once

The simplest approach is to pass sheet_name=None to pd.read_excel. This tells Pandas to read every sheet and return them as an ordered dictionary where each key is the sheet name and each value is a DataFrame.

import pandas as pd

all_sheets = pd.read_excel("sales_data.xlsx", sheet_name=None)
print(type(all_sheets))   # <class 'dict'>
print(list(all_sheets.keys()))  # ['January', 'February', 'March', ...]

Each value in all_sheets is a fully formed DataFrame. You can inspect any one of them with all_sheets["January"].head() before committing to a merge.

Concatenating into a single DataFrame

Once you have the dictionary, use pd.concat to stack all the DataFrames on top of each other. Pass ignore_index=True so the final index is a clean sequence instead of repeating row numbers from each sheet.

combined = pd.concat(all_sheets.values(), ignore_index=True)
print(combined.shape)

That's the core of it. Two lines after the read, and you have one unified DataFrame. The rest of this article covers the details that make this production-ready rather than just functional.

Tagging rows with their source sheet

After concatenation, you lose the information about which sheet each row came from. If that matters for your analysis β€” and it usually does β€” add a column before you concatenate.

import pandas as pd

all_sheets = pd.read_excel("sales_data.xlsx", sheet_name=None)

frames = []
for sheet_name, df in all_sheets.items():
    df["source_sheet"] = sheet_name
    frames.append(df)

combined = pd.concat(frames, ignore_index=True)
print(combined["source_sheet"].value_counts())

Now every row carries a source_sheet label. You can filter by it, group by it, or use it purely for debugging when something looks wrong downstream.

Handling mismatched columns

Real-world Excel files are rarely pristine. One sheet might have an extra column someone added mid-year, or a column name might be spelled differently. By default, pd.concat does an outer join on columns, filling missing values with NaN.

That default behavior is usually what you want, but you should know it's happening. Check for unexpected columns after the merge:

for sheet_name, df in all_sheets.items():
    print(sheet_name, df.columns.tolist())

If you want to only keep columns that appear in every sheet, use an inner join instead:

combined = pd.concat(frames, join="inner", ignore_index=True)

And if column names differ only by casing or whitespace, normalize them before concatenating:

for sheet_name, df in all_sheets.items():
    df.columns = df.columns.str.strip().str.lower()
    df["source_sheet"] = sheet_name
    frames.append(df)

Selecting specific sheets

Sometimes you don't want every sheet. A file might have a summary tab or a lookup table that should not be included in the row data. You have two clean options.

Option 1 β€” Pass a list of sheet names:

sheets_to_load = ["January", "February", "March"]
selected = pd.read_excel("sales_data.xlsx", sheet_name=sheets_to_load)

Option 2 β€” Filter after reading all sheets:

exclude = {"Summary", "Lookup"}
frames = [
    df.assign(source_sheet=name)
    for name, df in all_sheets.items()
    if name not in exclude
]

The second approach is more flexible when you don't know the sheet names in advance and just want to skip a known set of non-data tabs.

Merging sheets across multiple Excel files

If your data lives across several files rather than several sheets in one file, the pattern is almost identical. Use Python's pathlib module to find all matching files, then loop over them.

from pathlib import Path
import pandas as pd

data_dir = Path("monthly_reports")
frames = []

for filepath in sorted(data_dir.glob("*.xlsx")):
    df = pd.read_excel(filepath)
    df["source_file"] = filepath.name
    frames.append(df)

combined = pd.concat(frames, ignore_index=True)
print(f"Loaded {len(frames)} files, {len(combined)} rows total")

Sorting the file paths with sorted() ensures a consistent order, which matters if row order is meaningful to you. The source_file column gives you the same traceability you got from source_sheet in the earlier example.

Common pitfalls and how to avoid them

Missing engine error

If you see ValueError: Missing optional dependency 'openpyxl', install it with pip install openpyxl. For .xls files, use pip install xlrd and pass engine="xlrd" explicitly.

Header rows in the wrong place

Some Excel sheets have a title in row 1 and actual column headers in row 2. Use the header argument to tell Pandas where to find the real headers:

df = pd.read_excel("report.xlsx", header=1)  # zero-indexed, so row 2

Empty sheets causing errors

A sheet with no data returns an empty DataFrame. That's fine for pd.concat, but if you try to access a specific column on it before concatenating you'll get a KeyError. Guard against this:

for name, df in all_sheets.items():
    if df.empty:
        print(f"Skipping empty sheet: {name}")
        continue
    frames.append(df.assign(source_sheet=name))

Data type inconsistencies

A column might be stored as integers in one sheet and strings in another because someone formatted a cell differently. After concatenating, check your dtypes:

print(combined.dtypes)

Use pd.to_numeric(combined["amount"], errors="coerce") to force a column to numeric, turning anything unparseable into NaN rather than crashing.

Large files and memory

For very large Excel files, loading all sheets at once can use significant memory. In those cases, read and process one sheet at a time and write intermediate results to a file, or consider converting the Excel data to CSV or a database first.

Putting it all together

Here is a complete, reusable function that handles the most common real-world scenarios:

from pathlib import Path
import pandas as pd

def load_excel_sheets(
    filepath: str | Path,
    exclude_sheets: set[str] | None = None,
    normalize_columns: bool = True,
) -> pd.DataFrame:
    """Read all sheets from an Excel file into one DataFrame."""
    filepath = Path(filepath)
    exclude_sheets = exclude_sheets or set()

    all_sheets = pd.read_excel(filepath, sheet_name=None)
    frames = []

    for name, df in all_sheets.items():
        if name in exclude_sheets or df.empty:
            continue
        if normalize_columns:
            df.columns = df.columns.str.strip().str.lower()
        df["source_sheet"] = name
        frames.append(df)

    if not frames:
        raise ValueError(f"No usable sheets found in {filepath.name}")

    return pd.concat(frames, ignore_index=True)


# Usage
combined = load_excel_sheets(
    "sales_data.xlsx",
    exclude_sheets={"Summary", "Notes"},
)
print(combined.shape)
print(combined["source_sheet"].unique())

This function is defensive about empty sheets, optionally normalizes column names, tags rows with their source, and raises a clear error instead of returning an empty DataFrame silently.

Wrapping up

Reading multiple Excel sheets into a single Pandas DataFrame is a two-step process: read with sheet_name=None to get a dictionary, then concatenate the values. Everything else is about handling the messiness of real data. Here are your concrete next steps:

  • Run the basic sheet_name=None pattern on your own file and inspect the resulting dictionary before concatenating.
  • Add a source_sheet column immediately so you can always trace a row back to where it came from.
  • Print column names from every sheet before merging to catch casing or naming mismatches early.
  • Wrap the logic in a function like the one above so it's reusable across projects.
  • If you're dealing with files bigger than a few hundred megabytes, explore reading sheet by sheet and writing to Parquet or a SQL database to avoid memory pressure.

πŸ“€ Share this article

Sign in to save

Comments (0)

No comments yet. Be the first!

Leave a Comment

Sign in to comment with your profile.

πŸ“¬ Weekly Newsletter

Stay ahead of the curve

Get the best programming tutorials, data analytics tips, and tool reviews delivered to your inbox every week.

No spam. Unsubscribe anytime.