Fixing Python xlsxwriter That Generates Duplicate Header Rows on Sheet Merge
You've written a loop to combine several sheets into one consolidated sheet, and when you open the file, headers appear between every block of data. It's one of the more frustrating xlsxwriter surprises because the bug isn't in xlsxwriter itself β it's in the merge logic you wrote around it.
xlsxwriter has no concept of "appending" a DataFrame or a block of rows. Every write call is absolute: you give it a row number, and it writes there. If your loop keeps resetting that row number β or if you're letting a helper function write the header each time it runs β you'll get headers wherever the loop restarts.
What You'll Learn
- Why xlsxwriter produces duplicate headers during a merge loop
- How to write headers exactly once and skip them for subsequent data blocks
- How to maintain a reliable row offset across multiple DataFrames
- How to handle DataFrames whose column sets differ slightly
- Common mistakes that silently shift rows or overwrite data
Prerequisites
- Python 3.8 or later
xlsxwriterinstalled (pip install xlsxwriter)pandasinstalled (pip install pandas)- Familiarity with
pd.ExcelWriterand basic DataFrame operations
The Problem: Headers Appearing Between Every Data Block
Picture this output when you open the merged file:
Name | Region | Sales
Alice | North | 1200
Bob | North | 980
Name | Region | Sales <-- duplicate header
Carla | South | 1500
Dave | South | 760
Name | Region | Sales <-- duplicate header
Eve | West | 2100
Every data block starts with a fresh header row. If you're merging ten sheets, you'll have ten header rows polluting the output. Downstream processes β SUM formulas, pivot tables, Power Query β will break on those text rows hiding in the middle of numeric data.
Why xlsxwriter Doesn't Know It's Merging
xlsxwriter is a write-once, forward-only library. It does not read existing cell contents, and it has no append mode. When you call df.to_excel(writer, sheet_name='Merged', ...), pandas asks xlsxwriter to write starting at a specific row. By default, to_excel writes the DataFrame index and column headers at that row, then writes the data rows below.
If you call to_excel in a loop without explicitly controlling the startrow parameter β or without suppressing the header on subsequent passes β each call starts at row 0 (or whatever default you've set) and re-emits the column names. xlsxwriter happily writes them because it has no awareness that another DataFrame already wrote there.
This is closely related to other positional write issues in xlsxwriter. For example, the same kind of absolute-positioning model causes problems described in fixing xlsxwriter's freeze_panes behavior on workbooks with multiple sheets β the library never infers context from what's already on the sheet.
The Naive Loop That Causes Duplicate Headers
Here's the pattern that most developers write first:
import pandas as pd
sheets = {
'north': df_north,
'south': df_south,
'west': df_west,
}
with pd.ExcelWriter('merged.xlsx', engine='xlsxwriter') as writer:
for name, df in sheets.items():
df.to_excel(writer, sheet_name='Merged') # BUG: no startrow, no header control
Every iteration overwrites from row 0. You don't even get stacked duplicates here β you get only the last DataFrame's data because each call silently overwrites the previous one. If someone passes startrow naively:
row = 0
for name, df in sheets.items():
df.to_excel(writer, sheet_name='Merged', startrow=row)
row += len(df) # BUG: forgets the header row in the offset
Now you advance the row pointer, but you never account for the header row that to_excel writes above the data. After the first DataFrame (say 5 data rows), row becomes 5. But xlsxwriter wrote 6 rows (1 header + 5 data). The second DataFrame's header lands on row 5, right on top of the last data row from the first block β and its own header is still emitted.
Fixing the Loop: Write Headers Once, Data Only After
The fix has two parts: write the column headers exactly once, then pass header=False on every subsequent write and maintain an accurate row offset.
import pandas as pd
sheets = {
'north': df_north,
'south': df_south,
'west': df_west,
}
with pd.ExcelWriter('merged.xlsx', engine='xlsxwriter') as writer:
current_row = 0
for i, (name, df) in enumerate(sheets.items()):
write_header = (i == 0) # only on the first iteration
df.to_excel(
writer,
sheet_name='Merged',
startrow=current_row,
index=False,
header=write_header,
)
# Advance by number of data rows plus 1 if we wrote a header
current_row += len(df) + (1 if write_header else 0)
Breaking this down:
index=Falseβ strips the DataFrame index so you don't get an extra unnamed column.header=write_headerβ on the first iteration this isTrue, on all subsequent ones it'sFalse.current_row += len(df) + (1 if write_header else 0)β adds 1 for the header row only when one was actually written.
Run this and open the file. You'll see one clean header at the top followed by uninterrupted data rows.
Handling DataFrames With Different Column Sets
Real-world data is rarely perfectly aligned. One sheet might have an extra Discount column, another might be missing Region. If you write them as-is, the columns won't line up with the single header row you wrote on the first pass.
The safest approach is to define a canonical column list upfront and reindex every DataFrame to it before writing:
import pandas as pd
all_frames = [df_north, df_south, df_west]
# Build the union of all columns while preserving order
all_columns = list(dict.fromkeys(
col for df in all_frames for col in df.columns
))
with pd.ExcelWriter('merged.xlsx', engine='xlsxwriter') as writer:
current_row = 0
for i, df in enumerate(all_frames):
# Reindex to canonical columns; missing columns become NaN
df_aligned = df.reindex(columns=all_columns)
write_header = (i == 0)
df_aligned.to_excel(
writer,
sheet_name='Merged',
startrow=current_row,
index=False,
header=write_header,
)
current_row += len(df_aligned) + (1 if write_header else 0)
dict.fromkeys preserves insertion order (Python 3.7+) while deduplicating column names. reindex fills any missing column with NaN, which xlsxwriter writes as an empty cell β clean and honest.
If you need finer control over how missing values appear, you can combine this with the techniques in fixing xlsxwriter cell format handling on merged cells to apply a format object to the entire empty range.
Tracking the Current Row Offset Correctly
The offset calculation is where most bugs hide after the obvious duplicate-header fix. Here are the cases you need to account for:
When you include the index
If you drop index=False and let pandas write the index, the header row still counts as exactly 1 row. The index values are written inline with the data rows, not as a separate row. So the offset formula doesn't change β but the sheet gains a left-most column you need to account for in any column-width formatting you apply afterward.
When you add blank separator rows between blocks
Some reports benefit from a blank row between data sections. Add it explicitly to the offset:
SEPARATOR = 1 # blank rows between blocks
current_row += len(df_aligned) + (1 if write_header else 0)
if i < len(all_frames) - 1: # no separator after last block
current_row += SEPARATOR
When you apply header formatting
After writing, you may want to apply a bold format to the header row. Access the underlying xlsxwriter worksheet object through the writer and write directly to row 0:
workbook = writer.book
worksheet = writer.sheets['Merged']
bold = workbook.add_format({'bold': True, 'bg_color': '#D9E1F2'})
for col_num, col_name in enumerate(all_columns):
worksheet.write(0, col_num, col_name, bold)
Write the formatted header after the loop finishes. xlsxwriter allows overwriting a cell with a format change as long as the workbook hasn't been closed yet β but do it before the with block exits.
Common Pitfalls When Merging Sheets
Forgetting that startrow is zero-indexed
Row 0 is the first row in xlsxwriter. If you initialize current_row = 1 thinking you're leaving space for a header, you'll get a blank first row and then the header on row 1. Start at 0 and let the offset math handle everything.
Using the same sheet name as an existing tab
If your source DataFrames were read from an Excel file and you write them back using pd.ExcelWriter with the same file path, xlsxwriter will not preserve the original file β it creates a new one from scratch. Any sheets not explicitly written in your current session will be gone. This is a different problem from the duplicate-header bug, but it often surfaces at the same time. For a related write-ordering issue, see fixing Pandas to_excel overwriting all sheets except the last.
Miscounting rows when DataFrames have a MultiIndex
If your DataFrame has a MultiIndex on the columns, to_excel writes multiple header rows β one per level. The offset must account for df.columns.nlevels instead of a flat 1:
header_rows = df.columns.nlevels if write_header else 0
current_row += len(df) + header_rows
Assuming len(df) counts empty rows
If a source DataFrame is empty (len(df) == 0), your offset won't advance. That's correct β no rows were written. But if you added a separator for that block, you'll have a blank separator row with nothing before it. Guard against it:
if len(df_aligned) == 0:
continue # skip empty frames entirely
Not resetting the worksheet column widths after merge
Column widths set with worksheet.set_column() apply to the entire column, not a range of rows. If you set widths inside the loop based on per-DataFrame content, each iteration overwrites the previous setting. Collect the max width for each column across all DataFrames first, then apply once after the loop.
This kind of post-loop formatting step is the same principle discussed in fixing xlsxwriter row height after autofit β apply dimensions after all data is written, not interleaved with writes.
Next Steps
With the duplicate-header bug fixed, here are concrete actions to round out your merge pipeline:
- Add column validation before the loop. Assert that every DataFrame contains at least the required columns before you start writing. Fail fast with a clear error rather than discovering misaligned data in the spreadsheet.
- Apply header formatting after the loop. Use
writer.sheets['Merged']to access the worksheet, then style row 0 with a bold format object for readability. - Handle MultiIndex columns explicitly. If any source DataFrame uses MultiIndex columns, flatten them to strings (e.g.,
df.columns = ['_'.join(c) for c in df.columns]) before the merge loop to keep the offset math simple. - Write a unit test for the offset logic. Create three small test DataFrames of known sizes, run your merge function, then read the output back with
pd.read_exceland assertlen(result)equals the sum of input lengths with no duplicate header strings in any non-header row. - Consider pre-concatenating with pandas. If you don't need per-block formatting,
pd.concat(all_frames, ignore_index=True)followed by a singleto_excelcall is simpler and less error-prone. Reserve the manual offset approach for cases where you need visual separation or different formats per block.
Frequently Asked Questions
Why does xlsxwriter keep writing the column headers every time I loop through DataFrames?
xlsxwriter has no append mode and no awareness of what's already on the sheet. When you call to_excel inside a loop without setting header=False on iterations after the first, pandas writes the column names at the top of each write block. Pass header=False on every iteration except the first to suppress this.
How do I calculate the correct startrow offset when merging multiple DataFrames with xlsxwriter?
Initialize current_row to 0, write the first DataFrame with header=True, then add len(df) + 1 to current_row (the +1 accounts for the header row). For all subsequent DataFrames written with header=False, add only len(df) to the offset.
What happens if the DataFrames I'm merging have different columns?
xlsxwriter will write whatever columns each DataFrame has, causing data to land in wrong columns relative to your single header row. Reindex every DataFrame to a canonical column list using df.reindex(columns=all_columns) before writing so missing columns become empty cells rather than misaligned data.
Can I add a blank row between each merged data block in xlsxwriter?
Yes. After advancing current_row by the DataFrame length, add a separator value (e.g., 1 or 2) to current_row before the next write call. xlsxwriter will simply skip those row numbers, leaving them blank in the output file.
Is it better to use pd.concat before writing instead of looping with xlsxwriter?
If you don't need per-block formatting or visual separators between sections, concatenating all DataFrames with pd.concat and writing once is simpler and avoids offset bugs entirely. Use the manual loop approach only when you need different formats, colors, or spacing between each data block.
π€ Share this article
Sign in to saveRelated Articles
How-To Guides
Fixing PostgreSQL JSONB Query That Ignores Index and Falls Back to Seq Scan
9m read
How-To Guides
Fixing PostgreSQL LATERAL JOIN That Returns No Rows When Subquery References Outer Column
10m read
How-To Guides
Fixing PostgreSQL UPDATE FROM That Silently Modifies Wrong Rows on Join
9m read
Comments (0)
No comments yet. Be the first!