Fixing Python Pandas DataFrame That Exports Wrong Column Order to Excel

June 23, 2026 8 min read 4 views

You build a clean DataFrame, check df.columns in your terminal, and everything looks right. Then you open the Excel file and the columns are shuffled into a completely different order. This is one of those bugs that wastes time because the data itself is correct β€” only the layout is wrong, which is exactly the kind of thing a stakeholder will notice immediately.

The fix is usually one line, but understanding why it happens will stop you from chasing the same bug next time.

What You'll Learn

  • Why Pandas sometimes exports columns in an unexpected order
  • Four practical methods to lock in the column order you want
  • How merges and dict-based DataFrames silently reorder columns
  • Common pitfalls that cause the fix to stop working

The Problem: Your DataFrame Columns Land in the Wrong Order

Imagine you're exporting a sales report. You want the columns in this order: date, region, product, units, revenue. But the Excel file comes out with revenue first, or product buried at the end. Here's the kind of code that causes it:

import pandas as pd

data = {
    "revenue": [1200, 850, 2100],
    "product": ["Widget A", "Widget B", "Widget C"],
    "region": ["North", "South", "East"],
    "units": [12, 8, 21],
    "date": ["2024-01-01", "2024-01-02", "2024-01-03"],
}

df = pd.DataFrame(data)
df.to_excel("sales_report.xlsx", index=False)

The Excel file will have columns in dictionary insertion order: revenue, product, region, units, date. Not what you wanted.

Why Column Order Goes Wrong in Pandas

A Pandas DataFrame stores columns in the order they were added. When you construct a DataFrame from a Python dictionary, the column order follows dictionary insertion order (guaranteed since Python 3.7). That sounds predictable, but several common operations silently change it.

  • Dict construction: Column order mirrors the key order in the source dict, which may not match your intended report layout.
  • merge() and join(): Left columns come first, then right columns β€” but that order depends on which DataFrame is left and which is right.
  • groupby() aggregation: The resulting column order depends on the aggregation spec, not the original DataFrame order.
  • concat(): Columns from both DataFrames are union-merged alphabetically when they don't perfectly align.
  • Reading from a database or API: The source system may return columns in an arbitrary order that changes between runs.

None of these are bugs in Pandas. They're predictable behaviors β€” you just need to assert the column order explicitly before exporting.

Method 1: Reorder Columns With a List Slice

The most direct approach is to reassign df.columns order using a list of column names as an indexer. This creates a new DataFrame (or a view) with columns in exactly the sequence you specify.

import pandas as pd

data = {
    "revenue": [1200, 850, 2100],
    "product": ["Widget A", "Widget B", "Widget C"],
    "region": ["North", "South", "East"],
    "units": [12, 8, 21],
    "date": ["2024-01-01", "2024-01-02", "2024-01-03"],
}

df = pd.DataFrame(data)

column_order = ["date", "region", "product", "units", "revenue"]
df = df[column_order]

df.to_excel("sales_report.xlsx", index=False)

This works because df[list_of_columns] returns a new DataFrame with only those columns, in that order. If a column name in your list doesn't exist in df, Pandas raises a KeyError immediately β€” which is actually useful as an early sanity check.

Method 2: Use the columns Parameter in to_excel

You can pass a columns argument directly to to_excel to control which columns are exported and in what order. This avoids mutating the original DataFrame, which is handy if you're exporting a subset while keeping the full frame in memory.

import pandas as pd

df = pd.DataFrame({
    "revenue": [1200, 850, 2100],
    "product": ["Widget A", "Widget B", "Widget C"],
    "region": ["North", "South", "East"],
    "units": [12, 8, 21],
    "date": ["2024-01-01", "2024-01-02", "2024-01-03"],
})

df.to_excel(
    "sales_report.xlsx",
    index=False,
    columns=["date", "region", "product", "units", "revenue"],
)

The columns parameter accepts any ordered iterable of column names. Columns not in the list are silently omitted from the export, so this doubles as a filter. If you include a name that doesn't exist, Pandas raises a KeyError.

If you're writing to multiple sheets using an ExcelWriter β€” a pattern covered in detail in the guide on fixing Pandas to_excel that overwrites all sheets except the last β€” you can pass the columns argument to each df.to_excel(writer, ...) call independently for different sheets.

Method 3: Use DataFrame.reindex to Set a Canonical Order

DataFrame.reindex is the most explicit and safest option when you're working with DataFrames that might be missing columns at runtime. Instead of raising a KeyError for missing columns, it inserts them as NaN by default β€” which you can then handle explicitly.

import pandas as pd

df = pd.DataFrame({
    "revenue": [1200, 850, 2100],
    "product": ["Widget A", "Widget B", "Widget C"],
    "region": ["North", "South", "East"],
    "units": [12, 8, 21],
    # Note: 'date' column is missing from this DataFrame
})

canonical_columns = ["date", "region", "product", "units", "revenue"]
df = df.reindex(columns=canonical_columns)

print(df.columns.tolist())
# ['date', 'region', 'product', 'units', 'revenue']
# 'date' will be a column of NaN values

df.to_excel("sales_report.xlsx", index=False)

This pattern is useful in ETL pipelines where the source might not always provide every column. You define the schema upfront and let reindex enforce it, then fill or validate the NaN columns before export.

Method 4: Sorting Columns Alphabetically or by a Custom Key

Sometimes you don't have a hardcoded column list β€” you're working with a dynamic set of columns and just need a consistent, reproducible order. Sorting alphabetically is the simplest deterministic approach.

import pandas as pd

df = pd.DataFrame({
    "revenue": [1200, 850],
    "product": ["Widget A", "Widget B"],
    "region": ["North", "South"],
    "units": [12, 8],
    "date": ["2024-01-01", "2024-01-02"],
})

# Sort columns alphabetically
df = df.reindex(sorted(df.columns), axis=1)

df.to_excel("sales_report.xlsx", index=False)
# Columns will be: date, product, region, revenue, units

For a custom sort order that isn't fully alphabetical, use a priority mapping:

priority = {"date": 0, "region": 1, "product": 2, "units": 3, "revenue": 4}
df = df.reindex(sorted(df.columns, key=lambda c: priority.get(c, 999)), axis=1)

Columns not in the priority dict get a default sort key of 999, so they fall to the end in their original relative order. This is a clean pattern for reports with a known set of required columns and a variable set of optional ones.

Handling Column Order When Merging DataFrames

Merge operations are a common source of unexpected column ordering. When you call pd.merge(), the result puts left-DataFrame columns first, then right-DataFrame columns β€” but the key column appears only once, and its position depends on the merge type.

import pandas as pd

orders = pd.DataFrame({
    "order_id": [1, 2, 3],
    "product": ["Widget A", "Widget B", "Widget C"],
    "units": [12, 8, 21],
})

pricing = pd.DataFrame({
    "product": ["Widget A", "Widget B", "Widget C"],
    "unit_price": [100, 106, 100],
    "region": ["North", "South", "East"],
})

merged = pd.merge(orders, pricing, on="product")
print(merged.columns.tolist())
# ['order_id', 'product', 'units', 'unit_price', 'region']
# 'unit_price' and 'region' are appended at the end

# Apply your canonical order after the merge
final_columns = ["order_id", "region", "product", "units", "unit_price"]
merged = merged[final_columns]
merged.to_excel("orders_report.xlsx", index=False)

Always apply column reordering after all merges and transformations, not before. Doing it early gets undone by the next merge. For more on merge-related surprises, see how non-unique keys cause duplicate rows in Pandas merge β€” a related issue that often surfaces at the same time as column order problems.

Common Pitfalls and Gotchas

You reorder columns but the fix gets lost

If you assign df = df[column_order] and then call another operation that returns a new DataFrame β€” like groupby().agg(), merge(), or pivot_table() β€” the new DataFrame won't inherit your ordering. Always apply the final column reorder as the last step before to_excel.

MultiIndex columns cause silent reordering

If your DataFrame has a MultiIndex on the columns (common after certain groupby or pivot_table calls), simple list indexing may not behave as expected. Flatten the column MultiIndex first with df.columns = ['_'.join(col).strip() for col in df.columns.values], then reorder.

The columns parameter silently drops columns

When you pass columns=[...] to to_excel, any column not in the list is dropped from the export without any warning. If you're debugging a missing column issue, check whether you accidentally excluded it here before digging deeper.

Index column shifts everything

By default, to_excel writes the DataFrame index as the first column. If you set index=True (the default), the index appears before all your columns β€” which can make it look like column ordering is wrong when the real issue is an unexpected index. Use index=False unless you specifically need it, or handle it explicitly with openpyxl when you need more control over the final workbook layout.

Reading back and comparing

If you're verifying the export by reading it back with pd.read_excel, remember that read_excel returns columns in the order they appear in the file. So a quick round-trip check is a reliable way to confirm your fix worked:

verification = pd.read_excel("sales_report.xlsx")
print(verification.columns.tolist())
# Should print: ['date', 'region', 'product', 'units', 'revenue']

When you're building pipelines that also interact with databases, you might run into similar ordering and type surprises β€” for example, unexpected byte strings returned from SQLite TEXT columns that then land in your DataFrame before export.

Wrapping Up

Column order in a Pandas DataFrame is controlled by insertion order, not by any automatic sorting or schema enforcement. The fix is always the same shape: define your desired column order explicitly and apply it as the final step before exporting.

Here are the concrete next steps to take:

  1. Audit your pipeline: Find the last transformation before to_excel and add a column reorder there using df = df[column_order] or df.reindex(columns=column_order).
  2. Define a schema constant: Store your canonical column list as a module-level constant or config value so it's easy to update and reuse across multiple exports.
  3. Add a round-trip assertion: In any automated pipeline, read the file back after writing and assert list(df.columns) == expected_columns to catch regressions early.
  4. Use reindex for flexible pipelines: If your column set is dynamic, prefer reindex over list slicing so missing columns surface as NaN rather than a crash.
  5. Apply ordering after every merge: Make column reordering a habit at the end of merge chains β€” don't assume earlier ordering survives downstream operations.

Frequently Asked Questions

Why does Pandas export DataFrame columns to Excel in a different order than I defined them?

Pandas preserves column insertion order from however the DataFrame was constructed β€” typically dict key order, merge output order, or aggregation output order. If that doesn't match your intended report layout, you need to reorder the columns explicitly before calling to_excel.

How do I export only specific columns to Excel in a custom order using Pandas?

Pass a list of column names to the columns parameter of to_excel, like df.to_excel('file.xlsx', columns=['date', 'region', 'revenue'], index=False). This both selects and orders the columns in one step without modifying the original DataFrame.

What is the safest way to reorder DataFrame columns when some columns might be missing?

Use df.reindex(columns=your_column_list) instead of list slicing. If a column in your list doesn't exist in the DataFrame, reindex inserts it as a NaN column rather than raising a KeyError, giving you a chance to handle the missing data explicitly.

Does applying column reordering before a merge or groupby preserve the order in the final output?

No. Operations like merge(), groupby().agg(), and pivot_table() return new DataFrames with their own column order. Always apply your column reordering as the last step before to_excel, after all transformations are complete.

How can I verify that my Pandas Excel export has the correct column order?

Read the file back with pd.read_excel and check the columns attribute: list(pd.read_excel('file.xlsx').columns). This confirms the physical column order in the file matches your expected schema.

πŸ“€ Share this article

Sign in to save

Comments (0)

No comments yet. Be the first!

Leave a Comment

Sign in to comment with your profile.

πŸ“¬ Weekly Newsletter

Stay ahead of the curve

Get the best programming tutorials, data analytics tips, and tool reviews delivered to your inbox every week.

No spam. Unsubscribe anytime.