Fixing Python openpyxl That Reads Excel Formulas Instead of Values
You open a workbook with openpyxl, read a cell, and get back =SUM(B2:B50), a raw formula string, when you expected a number. Your downstream code breaks, your pipeline silently produces garbage, and the fix is not obvious from the docs alone.
This happens because of how Excel separates formula definitions from calculated results, and openpyxl's default behavior only gives you one of those two things. Once you understand the split, every fix in this article will make immediate sense.
What You'll Learn
- Why openpyxl returns formula strings by default and what "cached values" are
- How to use the data_only=True parameter correctly
- Why data_only can return None and how to work around it
- Alternative libraries (Pandas, xlrd, xlwings) for cases where openpyxl falls short
- A helper pattern to detect formula cells at runtime and handle them gracefully
Prerequisites
- Python 3.8 or later
- openpyxl 3.x installed (pip install openpyxl)
- A basic understanding of reading cells with openpyxl (ws['A1'].value)
- Optional: pandas, xlrd 1.2, xlwings (covered in the relevant sections)
How Excel Stores Formulas and Cached Values
An Excel .xlsx file is a ZIP archive containing XML files. Each cell that holds a formula stores two things in that XML: the formula text (e.g., =SUM(B2:B50)) and the last calculated result, called the cached value. Excel writes the cached value when you save the file so that other applications can read data without needing a full calculation engine.
The catch is that the cached value is only as fresh as the last save by a program that actually calculates. If the workbook was saved in manual calculation mode without recalculating, or another tool changed input cells afterwards without recomputing, the cache is stale or missing. This is the root of almost every problem this article covers.
openpyxl, by design, is a pure-Python reader and writer. It does not run Excel's calculation engine. It can only read whatever is in the XML: either the formula string or the cached result.
The data_only Parameter: The First Fix to Try
When you load a workbook without any flags, openpyxl defaults to returning formula strings for formula cells. Switching to cached values takes a single argument:
import openpyxl
# Default: returns formula strings
wb_formula = openpyxl.load_workbook("report.xlsx")
ws = wb_formula.active
print(ws["B51"].value) # =SUM(B2:B50)
# data_only=True: returns the cached calculated value
wb_values = openpyxl.load_workbook("report.xlsx", data_only=True)
ws = wb_values.active
print(ws["B51"].value) # 4823.75
This is the correct first step for any workbook that was last saved by Excel (desktop or online). The formula string disappears, and you get back the number, date, or string that Excel calculated.
Important: you cannot mix modes in the same workbook object. Load once with data_only=True when you want values, and load again without it when you need the formula text. Trying to toggle the flag after load does nothing.
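Because one workbook object holds either formulas or cached values, auditing both usually means loading the file twice. The sketch below pairs each formula with its cached result. The names formula_value_pairs and load_both are hypothetical helpers, not part of openpyxl, and the sketch assumes both worksheets come from the same file.

```python
def formula_value_pairs(ws_formulas, ws_values):
    """Yield (coordinate, formula, cached_value) for every formula cell.

    ws_formulas: worksheet loaded without data_only (cells hold formulas).
    ws_values:   the same sheet loaded with data_only=True (cached results).
    """
    for row in ws_formulas.iter_rows():
        for cell in row:
            # openpyxl marks formula cells with data_type "f".
            if cell.data_type == "f":
                yield cell.coordinate, cell.value, ws_values[cell.coordinate].value


def load_both(path, sheet_name):
    """Load one sheet twice: once for formulas, once for cached values."""
    # Imported here so formula_value_pairs itself has no hard dependency.
    from openpyxl import load_workbook

    ws_formulas = load_workbook(path)[sheet_name]
    ws_values = load_workbook(path, data_only=True)[sheet_name]
    return ws_formulas, ws_values
```

A typical call would be formula_value_pairs(*load_both("report.xlsx", "Summary")). A cached value of None in the third position means the file has no stored result for that formula.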
When data_only Returns None Instead of the Value
A frustratingly common follow-up problem: you add data_only=True, but now the cell returns None instead of the formula string or the value. This happens when the cached value was never written into the file, which occurs in three typical scenarios:
- The file was created or last written by openpyxl itself. openpyxl writes formula strings but never populates the cache, so there is nothing for data_only to read.
- The file was saved by a non-Excel tool (LibreOffice in some configurations, Google Sheets exports, certain reporting APIs) that skips the cache.
- The user saved with manual calculation mode and the sheet was not recalculated before saving.
You can verify this quickly. Open the .xlsx as a ZIP, look inside xl/worksheets/sheet1.xml, and find your formula cell. A cached value looks like this:
<c r="B51" t="n">
  <f>SUM(B2:B50)</f>
  <v>4823.75</v> <!-- this is the cached value -->
</c>
If the <v> element is missing entirely, data_only=True will return None. The cache simply is not there.
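The manual check can also be scripted with the standard library alone. This is a minimal sketch: formulas_without_cache is a hypothetical helper name, the default part name xl/worksheets/sheet1.xml is an assumption that only holds for the first sheet of a typical file, and the namespace is the usual (non-strict) SpreadsheetML one.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by ordinary SpreadsheetML worksheet parts.
SHEET_NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def formulas_without_cache(xlsx, sheet_part="xl/worksheets/sheet1.xml"):
    """Return coordinates of formula cells that have no cached <v> value.

    xlsx may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(xlsx) as archive:
        root = ET.fromstring(archive.read(sheet_part))

    missing = []
    for cell in root.iterfind(".//m:c", SHEET_NS):
        has_formula = cell.find("m:f", SHEET_NS) is not None
        cached = cell.find("m:v", SHEET_NS)
        # Treat an absent or empty <v> element as "no cached value".
        if has_formula and (cached is None or not (cached.text or "").strip()):
            missing.append(cell.get("r"))
    return missing
```

An empty list suggests every formula in that sheet has a stored result. Any coordinates returned are the cells where data_only=True would be expected to give None.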
Fix: Use Pandas with openpyxl as the Engine
Pandas' read_excel uses openpyxl under the hood for .xlsx files, and it passes data_only=True automatically. For many workflows this is the cleanest path because you get a DataFrame directly, without iterating cells manually.
import pandas as pd
df = pd.read_excel("report.xlsx", sheet_name="Summary", engine="openpyxl")
print(df.head())
Pandas still relies on the cached values in the file. If the cache is absent you will see NaN instead of a number, and if it is stale you will get an outdated number with no warning. However, for files that were properly saved by Excel, this one-liner is reliable and keeps your pipeline simple.
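One way to catch a missing cache early is to look for columns that came back entirely empty. The sketch below is only a heuristic: columns_with_no_values and read_and_check are hypothetical helper names, and a column that is legitimately blank in the sheet would be flagged as well.

```python
import pandas as pd


def columns_with_no_values(df):
    """Return names of columns where every entry is missing (NaN/None)."""
    return [name for name in df.columns if df[name].isna().all()]


def read_and_check(path, sheet_name):
    """Read a sheet and raise if any column came back completely empty."""
    df = pd.read_excel(path, sheet_name=sheet_name, engine="openpyxl")
    empty = columns_with_no_values(df)
    if empty:
        # Likely cause: formula cells whose cached values were never written.
        raise ValueError(f"No values found for columns: {empty}")
    return df
```

Failing loudly here is a design choice. A pipeline that prefers to continue could log the column names instead of raising.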
If you are already dealing with misread date columns after loading, the article on fixing Pandas silently misreading date columns as strings covers the parse_dates and dtype tricks that apply here too.
Fix: Use xlrd for Older .xls Files
If the problematic file is a legacy .xls format (Excel 97-2003), openpyxl cannot read it at all. Use xlrd, which reads .xls files and returns the results stored in the binary file rather than formula text:
import xlrd
wb = xlrd.open_workbook("legacy_report.xls")
ws = wb.sheet_by_name("Summary")
print(ws.cell_value(50, 1))  # zero-based row 50, col 1 (cell B51): always the computed value
Note that xlrd 2.x removed support for every format except .xls, so it cannot read .xlsx at all. Legacy .xls files can be read with either xlrd 1.2.0 or 2.x, so there is no need to pin an old version just for them. For .xlsx files, stick to openpyxl or Pandas.
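xlrd addresses cells by zero-based row and column indices, which is why B51 becomes (50, 1) in the call above. A small converter avoids off-by-one mistakes when your notes use A1-style references. The name a1_to_rowcol is a hypothetical helper, not an xlrd function.

```python
import re


def a1_to_rowcol(ref):
    """Convert an A1-style reference such as "B51" to zero-based (row, col)."""
    match = re.fullmatch(r"([A-Za-z]+)([1-9][0-9]*)", ref)
    if match is None:
        raise ValueError(f"Not an A1-style cell reference: {ref!r}")
    letters, digits = match.groups()

    # Column letters are a base-26 number with A=1 ... Z=26.
    col = 0
    for ch in letters.upper():
        col = col * 26 + (ord(ch) - ord("A") + 1)

    return int(digits) - 1, col - 1
```

With this in place, ws.cell_value(*a1_to_rowcol("B51")) reads the same cell as the example above.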
Fix: Use xlwings to Force Excel to Recalculate
When the cached values are stale or absent and you need the real computed result, xlwings is the most robust option. It automates the actual Excel application (or Excel for Mac) to open the file, trigger recalculation, and read the live values. This requires Excel to be installed on the machine.
import xlwings as xw

app = xw.App(visible=False)  # start Excel without showing a window
try:
    wb = app.books.open("report.xlsx")
    ws = wb.sheets["Summary"]
    # Force full recalculation
    app.calculate()
    value = ws.range("B51").value
    print(value)  # live, freshly computed result
    wb.close()
finally:
    app.quit()  # always shut Excel down, even if a step above fails
This approach works even when the cache was never written, because Excel opens the file, calculates every formula, and xlwings reads from the live application state, not the XML cache. The tradeoff is that it requires a licensed Excel installation and is significantly slower than a pure-Python reader.
If your Python scripts are already running in environments where network or resource reliability is a concern, see the guide on fixing Python requests that hang indefinitely for patterns on adding timeouts to long-running calls, which is useful if you are invoking xlwings in a pipeline with a deadline.
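If the goal is to repair the file so that later openpyxl or Pandas reads succeed, the same automation can recalculate and save in place. This is a sketch under assumptions: refresh_cache is a hypothetical helper name, and it expects an already started xlwings App object to be passed in.

```python
def refresh_cache(app, path):
    """Open path in the given Excel app, recalculate, and save in place.

    app is expected to be an xlwings App (or any object with the same
    books.open / calculate interface).
    """
    wb = app.books.open(path)
    try:
        app.calculate()  # recalculate all open workbooks
        wb.save()        # Excel writes fresh cached values on save
    finally:
        wb.close()


# Usage (requires Excel and xlwings on the machine):
#   import xlwings as xw
#   app = xw.App(visible=False)
#   try:
#       refresh_cache(app, "report.xlsx")
#   finally:
#       app.quit()
```

After this runs, loading the same file with data_only=True should return the recalculated numbers instead of None.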
Fix: Write a Helper to Detect and Skip Formula Cells
Sometimes you need to process a workbook without data_only=True because you also want to inspect formulas in some cells. A lightweight helper lets you decide at read time whether a cell is a formula and handle it explicitly:
import openpyxl

def safe_cell_value(cell):
    """Return the cell value, or None with a warning if the cell holds a formula."""
    if isinstance(cell.value, str) and cell.value.startswith("="):
        print(f"Warning: cell {cell.coordinate} contains a formula: {cell.value}")
        return None
    return cell.value

wb = openpyxl.load_workbook("report.xlsx")  # no data_only
ws = wb.active
for row in ws.iter_rows(min_row=2, max_row=ws.max_row, min_col=1, max_col=5):
    for cell in row:
        val = safe_cell_value(cell)
        if val is not None:
            print(cell.coordinate, val)
This pattern is useful when you own the file generation pipeline and need to audit which cells are formula-driven before deciding how to handle them downstream. You can extend the helper to log formulas to a separate list for later inspection rather than silently dropping them.
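The extension mentioned above, collecting formulas instead of dropping them, might look like the following sketch. split_values_and_formulas is a hypothetical name, and it checks the cell's data_type attribute ("f" for formulas in a default load) rather than the leading "=" character, which is a deliberate swap from the string test in the helper.

```python
def split_values_and_formulas(cells):
    """Separate plain values from formulas.

    Returns (values, formulas): two dicts keyed by cell coordinate.
    Empty cells are skipped.
    """
    values, formulas = {}, {}
    for cell in cells:
        if cell.value is None:
            continue
        if cell.data_type == "f":
            formulas[cell.coordinate] = cell.value
        else:
            values[cell.coordinate] = cell.value
    return values, formulas
```

To feed it a whole sheet, flatten the rows first, for example with (cell for row in ws.iter_rows() for cell in row). The formulas dict then doubles as an audit log of which cells are formula-driven.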
For a deeper look at how missing rows or unexpected None values slip through without raising errors in data pipelines, the article on debugging silent row loss in a Pandas merge covers related defensive patterns worth adopting.
Common Pitfalls and Gotchas
Saving with openpyxl breaks the cache for future reads
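If you load a workbook without data_only, make changes, and save it with openpyxl, the formulas are written back but their cached values are not. This affects every formula cell in the workbook, not only the ones you edited. The next person (or script) who opens that file with data_only=True will see None for those cells. The reverse case is worse: a workbook loaded with data_only=True and then saved has its formulas permanently replaced by the cached values. If you need computed values to be present after a write, open the file in Excel once to let it recalculate and resave.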
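The effect is easy to reproduce in memory, without touching a real file. This sketch writes a formula with openpyxl, saves to a buffer, and reads it back with data_only=True. The function name is hypothetical.

```python
import io

from openpyxl import Workbook, load_workbook


def cached_value_after_openpyxl_save(formula="=SUM(A1:A2)"):
    """Write a formula with openpyxl, then read it back with data_only=True."""
    wb = Workbook()
    ws = wb.active
    ws["A1"] = 10
    ws["A2"] = 32
    ws["A3"] = formula

    buffer = io.BytesIO()
    wb.save(buffer)  # openpyxl stores the formula text but no computed result
    buffer.seek(0)

    reloaded = load_workbook(buffer, data_only=True)
    return reloaded.active["A3"].value
```

Calling the function returns None rather than 42, because nothing ever calculated the sum.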
Named ranges and defined names return formula strings
Accessing a cell through a named range in openpyxl does not automatically resolve to the cached value even with data_only=True in some versions. Always verify by reading the cell directly via its coordinate (ws['B51']) rather than through the named range object.
Array formulas need special handling
Legacy array formulas (entered with Ctrl+Shift+Enter) are stored differently in the XML. openpyxl may return the formula wrapped in curly braces: {=SUMPRODUCT(...)}. In openpyxl 3.1 and later, a default load may instead give you an ArrayFormula object rather than a string, so a check based on startswith("=") can miss it. The data_only=True fix applies the same way, but be aware that the cache for array formulas is sometimes split across multiple cells and may read back as a single scalar or as None depending on the Excel version that last saved the file.
read_only mode combined with data_only has edge cases
If you load with read_only=True for performance on large files, note that read_only=True and data_only=True can be combined, but the behavior in edge cases (merged cells, array formulas) is less predictable than a standard load. Test on a representative sample of your actual files before deploying to production.
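For reference, combining the two flags looks like the sketch below, which streams rows of cached values from a large file. stream_cached_rows is a hypothetical helper name. It accepts a path or a binary file-like object.

```python
from openpyxl import load_workbook


def stream_cached_rows(source, sheet_name=None):
    """Yield each row as a tuple of cached values without loading the whole file."""
    wb = load_workbook(source, read_only=True, data_only=True)
    try:
        ws = wb[sheet_name] if sheet_name else wb.active
        # values_only=True yields plain values instead of cell objects.
        for row in ws.iter_rows(values_only=True):
            yield row
    finally:
        wb.close()  # read-only workbooks hold the file open until closed
```

Formula cells without a cached result still show up as None in the yielded tuples, so the checks from earlier sections apply here too.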
Merged cells can mask formula results
In a merged cell range, only the top-left cell holds the value or formula. All other cells in the merge return None. If your iteration is hitting the non-anchor cells, you may think the formula is missing when the data is actually in the first cell of the merge. Use ws.merged_cells to inspect the merge map if you suspect this is happening. For more on Excel formula debugging patterns, the article on fixing SUMIF that returns zero when criteria look correct shows how Excel's formula evaluation quirks surface in unexpected ways.
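A small lookup that follows merges back to their top-left cell can remove the guesswork. This is a sketch assuming a standard (not read_only) load, where ws.merged_cells.ranges is available. anchor_value is a hypothetical helper name.

```python
def anchor_value(ws, row, column):
    """Return the value for (row, column), following merges to the top-left cell.

    row and column are 1-based, as in ws.cell().
    """
    for rng in ws.merged_cells.ranges:
        inside_rows = rng.min_row <= row <= rng.max_row
        inside_cols = rng.min_col <= column <= rng.max_col
        if inside_rows and inside_cols:
            # Only the top-left cell of a merge holds the value or formula.
            return ws.cell(row=rng.min_row, column=rng.min_col).value
    return ws.cell(row=row, column=column).value
```

For a sheet where A1:B2 is merged, anchor_value(ws, 2, 2) returns whatever A1 holds instead of None.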
Next Steps
Here is a quick decision tree to help you pick the right fix and move on:
- Try data_only=True first. If it returns real values, you are done. This covers the majority of files saved by Excel.
- If you get None, inspect the XML cache. If the <v> element is missing, the file was generated programmatically or with a non-Excel tool.
- For programmatically generated files, Pandas read_excel will not help because it reads the same cache. Either open and resave the file in Excel, or rewrite your generation step to compute values before writing them as plain numbers instead of formulas.
- For legacy .xls files, use xlrd.
- When you absolutely need live recalculated values and Excel is available, use xlwings with app.calculate().
If your broader pipeline involves reading and transforming data from multiple Excel sheets into DataFrames, pairing the fixes above with the defensive patterns in the article on fixing Pandas GroupBy silently ignoring NaN values will save you from a whole class of silent data errors further downstream.
Frequently Asked Questions
Why does openpyxl return the formula string instead of the calculated value?
openpyxl reads formula strings by default because it has no calculation engine. To get the last calculated result, load the workbook with data_only=True, which tells openpyxl to read the cached value Excel stored at the last save.
Why does openpyxl data_only return None for some cells?
None is returned when the cached value element is absent from the file's XML, which happens when the file was created or last saved by a tool that does not write caches, including openpyxl itself. In that case you need to open the file in Excel to trigger recalculation, or use xlwings to automate Excel programmatically.
Can I use openpyxl to read live Excel formula results without opening Excel?
No. openpyxl is a pure-Python library with no calculation engine, so it can only return whatever is stored in the file. If you need live computed results without Excel, consider restructuring your file generation to write plain values rather than formulas.
Does pandas read_excel handle formula cells automatically?
Pandas read_excel passes data_only=True to openpyxl internally, so it returns cached values rather than formula strings. If the cache is absent, you will see NaN instead of a number, for the same underlying reason as with openpyxl directly. If the cache is stale, you will get the outdated number.
How do I read Excel formula results in Python if the file was generated by openpyxl?
Files written by openpyxl do not contain cached values, so data_only=True will return None. The most reliable fix is to open the file in Excel once and save it, which populates the cache, or use xlwings to automate that recalculation step programmatically.