Fixing Python xlsxwriter That Generates Duplicate Header Rows on Sheet Merge

June 27, 2026 9 min read 12 views

You've written a loop to combine several sheets into one consolidated sheet, and when you open the file, headers appear between every block of data. It's one of the more frustrating xlsxwriter surprises because the bug isn't in xlsxwriter itself β€” it's in the merge logic you wrote around it.

xlsxwriter has no concept of "appending" a DataFrame or a block of rows. Every write call is absolute: you give it a row number, and it writes there. If your loop keeps resetting that row number β€” or if you're letting a helper function write the header each time it runs β€” you'll get headers wherever the loop restarts.

What You'll Learn

  • Why xlsxwriter produces duplicate headers during a merge loop
  • How to write headers exactly once and skip them for subsequent data blocks
  • How to maintain a reliable row offset across multiple DataFrames
  • How to handle DataFrames whose column sets differ slightly
  • Common mistakes that silently shift rows or overwrite data

Prerequisites

  • Python 3.8 or later
  • xlsxwriter installed (pip install xlsxwriter)
  • pandas installed (pip install pandas)
  • Familiarity with pd.ExcelWriter and basic DataFrame operations

The Problem: Headers Appearing Between Every Data Block

Picture this output when you open the merged file:

Name        | Region | Sales
Alice       | North  | 1200
Bob         | North  | 980
Name        | Region | Sales   <-- duplicate header
Carla       | South  | 1500
Dave        | South  | 760
Name        | Region | Sales   <-- duplicate header
Eve         | West   | 2100

Every data block starts with a fresh header row. If you're merging ten sheets, you'll have ten header rows polluting the output. Downstream processes β€” SUM formulas, pivot tables, Power Query β€” will break on those text rows hiding in the middle of numeric data.

Why xlsxwriter Doesn't Know It's Merging

xlsxwriter is a write-once, forward-only library. It does not read existing cell contents, and it has no append mode. When you call df.to_excel(writer, sheet_name='Merged', ...), pandas asks xlsxwriter to write starting at a specific row. By default, to_excel writes the DataFrame index and column headers at that row, then writes the data rows below.

If you call to_excel in a loop without explicitly controlling the startrow parameter β€” or without suppressing the header on subsequent passes β€” each call starts at row 0 (or whatever default you've set) and re-emits the column names. xlsxwriter happily writes them because it has no awareness that another DataFrame already wrote there.

This is closely related to other positional write issues in xlsxwriter. For example, the same kind of absolute-positioning model causes problems described in fixing xlsxwriter's freeze_panes behavior on workbooks with multiple sheets β€” the library never infers context from what's already on the sheet.

The Naive Loop That Causes Duplicate Headers

Here's the pattern that most developers write first:

import pandas as pd

sheets = {
    'north': df_north,
    'south': df_south,
    'west':  df_west,
}

with pd.ExcelWriter('merged.xlsx', engine='xlsxwriter') as writer:
    for name, df in sheets.items():
        df.to_excel(writer, sheet_name='Merged')  # BUG: no startrow, no header control

Every iteration overwrites from row 0. You don't even get stacked duplicates here β€” you get only the last DataFrame's data because each call silently overwrites the previous one. If someone passes startrow naively:

    row = 0
    for name, df in sheets.items():
        df.to_excel(writer, sheet_name='Merged', startrow=row)
        row += len(df)  # BUG: forgets the header row in the offset

Now you advance the row pointer, but you never account for the header row that to_excel writes above the data. After the first DataFrame (say 5 data rows), row becomes 5. But xlsxwriter wrote 6 rows (1 header + 5 data). The second DataFrame's header lands on row 5, right on top of the last data row from the first block β€” and its own header is still emitted.

Fixing the Loop: Write Headers Once, Data Only After

The fix has two parts: write the column headers exactly once, then pass header=False on every subsequent write and maintain an accurate row offset.

import pandas as pd

sheets = {
    'north': df_north,
    'south': df_south,
    'west':  df_west,
}

with pd.ExcelWriter('merged.xlsx', engine='xlsxwriter') as writer:
    current_row = 0

    for i, (name, df) in enumerate(sheets.items()):
        write_header = (i == 0)  # only on the first iteration

        df.to_excel(
            writer,
            sheet_name='Merged',
            startrow=current_row,
            index=False,
            header=write_header,
        )

        # Advance by number of data rows plus 1 if we wrote a header
        current_row += len(df) + (1 if write_header else 0)

Breaking this down:

  • index=False β€” strips the DataFrame index so you don't get an extra unnamed column.
  • header=write_header β€” on the first iteration this is True, on all subsequent ones it's False.
  • current_row += len(df) + (1 if write_header else 0) β€” adds 1 for the header row only when one was actually written.

Run this and open the file. You'll see one clean header at the top followed by uninterrupted data rows.

Handling DataFrames With Different Column Sets

Real-world data is rarely perfectly aligned. One sheet might have an extra Discount column, another might be missing Region. If you write them as-is, the columns won't line up with the single header row you wrote on the first pass.

The safest approach is to define a canonical column list upfront and reindex every DataFrame to it before writing:

import pandas as pd

all_frames = [df_north, df_south, df_west]

# Build the union of all columns while preserving order
all_columns = list(dict.fromkeys(
    col for df in all_frames for col in df.columns
))

with pd.ExcelWriter('merged.xlsx', engine='xlsxwriter') as writer:
    current_row = 0

    for i, df in enumerate(all_frames):
        # Reindex to canonical columns; missing columns become NaN
        df_aligned = df.reindex(columns=all_columns)
        write_header = (i == 0)

        df_aligned.to_excel(
            writer,
            sheet_name='Merged',
            startrow=current_row,
            index=False,
            header=write_header,
        )

        current_row += len(df_aligned) + (1 if write_header else 0)

dict.fromkeys preserves insertion order (Python 3.7+) while deduplicating column names. reindex fills any missing column with NaN, which xlsxwriter writes as an empty cell β€” clean and honest.

If you need finer control over how missing values appear, you can combine this with the techniques in fixing xlsxwriter cell format handling on merged cells to apply a format object to the entire empty range.

Tracking the Current Row Offset Correctly

The offset calculation is where most bugs hide after the obvious duplicate-header fix. Here are the cases you need to account for:

When you include the index

If you drop index=False and let pandas write the index, the header row still counts as exactly 1 row. The index values are written inline with the data rows, not as a separate row. So the offset formula doesn't change β€” but the sheet gains a left-most column you need to account for in any column-width formatting you apply afterward.

When you add blank separator rows between blocks

Some reports benefit from a blank row between data sections. Add it explicitly to the offset:

        SEPARATOR = 1  # blank rows between blocks

        current_row += len(df_aligned) + (1 if write_header else 0)
        if i < len(all_frames) - 1:   # no separator after last block
            current_row += SEPARATOR

When you apply header formatting

After writing, you may want to apply a bold format to the header row. Access the underlying xlsxwriter worksheet object through the writer and write directly to row 0:

workbook  = writer.book
worksheet = writer.sheets['Merged']
bold      = workbook.add_format({'bold': True, 'bg_color': '#D9E1F2'})

for col_num, col_name in enumerate(all_columns):
    worksheet.write(0, col_num, col_name, bold)

Write the formatted header after the loop finishes. xlsxwriter allows overwriting a cell with a format change as long as the workbook hasn't been closed yet β€” but do it before the with block exits.

Common Pitfalls When Merging Sheets

Forgetting that startrow is zero-indexed

Row 0 is the first row in xlsxwriter. If you initialize current_row = 1 thinking you're leaving space for a header, you'll get a blank first row and then the header on row 1. Start at 0 and let the offset math handle everything.

Using the same sheet name as an existing tab

If your source DataFrames were read from an Excel file and you write them back using pd.ExcelWriter with the same file path, xlsxwriter will not preserve the original file β€” it creates a new one from scratch. Any sheets not explicitly written in your current session will be gone. This is a different problem from the duplicate-header bug, but it often surfaces at the same time. For a related write-ordering issue, see fixing Pandas to_excel overwriting all sheets except the last.

Miscounting rows when DataFrames have a MultiIndex

If your DataFrame has a MultiIndex on the columns, to_excel writes multiple header rows β€” one per level. The offset must account for df.columns.nlevels instead of a flat 1:

header_rows = df.columns.nlevels if write_header else 0
current_row += len(df) + header_rows

Assuming len(df) counts empty rows

If a source DataFrame is empty (len(df) == 0), your offset won't advance. That's correct β€” no rows were written. But if you added a separator for that block, you'll have a blank separator row with nothing before it. Guard against it:

        if len(df_aligned) == 0:
            continue  # skip empty frames entirely

Not resetting the worksheet column widths after merge

Column widths set with worksheet.set_column() apply to the entire column, not a range of rows. If you set widths inside the loop based on per-DataFrame content, each iteration overwrites the previous setting. Collect the max width for each column across all DataFrames first, then apply once after the loop.

This kind of post-loop formatting step is the same principle discussed in fixing xlsxwriter row height after autofit β€” apply dimensions after all data is written, not interleaved with writes.

Next Steps

With the duplicate-header bug fixed, here are concrete actions to round out your merge pipeline:

  1. Add column validation before the loop. Assert that every DataFrame contains at least the required columns before you start writing. Fail fast with a clear error rather than discovering misaligned data in the spreadsheet.
  2. Apply header formatting after the loop. Use writer.sheets['Merged'] to access the worksheet, then style row 0 with a bold format object for readability.
  3. Handle MultiIndex columns explicitly. If any source DataFrame uses MultiIndex columns, flatten them to strings (e.g., df.columns = ['_'.join(c) for c in df.columns]) before the merge loop to keep the offset math simple.
  4. Write a unit test for the offset logic. Create three small test DataFrames of known sizes, run your merge function, then read the output back with pd.read_excel and assert len(result) equals the sum of input lengths with no duplicate header strings in any non-header row.
  5. Consider pre-concatenating with pandas. If you don't need per-block formatting, pd.concat(all_frames, ignore_index=True) followed by a single to_excel call is simpler and less error-prone. Reserve the manual offset approach for cases where you need visual separation or different formats per block.

Frequently Asked Questions

Why does xlsxwriter keep writing the column headers every time I loop through DataFrames?

xlsxwriter has no append mode and no awareness of what's already on the sheet. When you call to_excel inside a loop without setting header=False on iterations after the first, pandas writes the column names at the top of each write block. Pass header=False on every iteration except the first to suppress this.

How do I calculate the correct startrow offset when merging multiple DataFrames with xlsxwriter?

Initialize current_row to 0, write the first DataFrame with header=True, then add len(df) + 1 to current_row (the +1 accounts for the header row). For all subsequent DataFrames written with header=False, add only len(df) to the offset.

What happens if the DataFrames I'm merging have different columns?

xlsxwriter will write whatever columns each DataFrame has, causing data to land in wrong columns relative to your single header row. Reindex every DataFrame to a canonical column list using df.reindex(columns=all_columns) before writing so missing columns become empty cells rather than misaligned data.

Can I add a blank row between each merged data block in xlsxwriter?

Yes. After advancing current_row by the DataFrame length, add a separator value (e.g., 1 or 2) to current_row before the next write call. xlsxwriter will simply skip those row numbers, leaving them blank in the output file.

Is it better to use pd.concat before writing instead of looping with xlsxwriter?

If you don't need per-block formatting or visual separators between sections, concatenating all DataFrames with pd.concat and writing once is simpler and avoids offset bugs entirely. Use the manual loop approach only when you need different formats, colors, or spacing between each data block.

πŸ“€ Share this article

Sign in to save

Comments (0)

No comments yet. Be the first!

Leave a Comment

Sign in to comment with your profile.

πŸ“¬ Weekly Newsletter

Stay ahead of the curve

Get the best programming tutorials, data analytics tips, and tool reviews delivered to your inbox every week.

No spam. Unsubscribe anytime.