Fixing PostgreSQL Window Functions That Return Incorrect Partition Results
You write what looks like a perfectly correct window function, run it, and the numbers are wrong. Ranks jump, running totals restart mid-partition, or every row gets the same value across what should be separate groups. The query does not error — it just quietly returns bad data.
Window functions are one of PostgreSQL's most powerful features, but they have several failure modes that are easy to stumble into. This guide walks through each one with reproducible examples and concrete fixes.
What You'll Learn
- How a wrong or missing
PARTITION BYcauses rows to bleed across logical groups - Why leaving out
ORDER BYinsideOVER()produces non-deterministic results - How the default frame clause silently changes your aggregate window functions
- Why wrapping a window query in a subquery or CTE can kill the partition entirely
- How duplicate rows break
RANK()andROW_NUMBER()in ways that are hard to spot
Why Window Functions Go Wrong
A window function computes a result across a set of rows related to the current row — that set is its window. The window is defined by three things: the partition, the ordering, and the frame. Get any of those wrong and every output value is wrong, even if the SQL parses cleanly.
The tricky part is that PostgreSQL will not warn you. Unlike a bad JOIN that returns too many rows, a misconfigured window function returns exactly the right number of rows with plausible-looking numbers. You need to know what to look for.
Wrong or Missing PARTITION BY
The PARTITION BY clause tells PostgreSQL where the window resets. If you omit it, the entire result set is one partition. If you specify the wrong column, you get partitions that do not match your business logic.
Omitting PARTITION BY entirely
Suppose you want a running sales total per salesperson:
-- Broken: no PARTITION BY, so the window spans all rows
SELECT
salesperson_id,
sale_date,
amount,
SUM(amount) OVER (ORDER BY sale_date) AS running_total
FROM sales;
Without PARTITION BY salesperson_id, the running total accumulates across every salesperson. The fix is straightforward:
-- Fixed
SELECT
salesperson_id,
sale_date,
amount,
SUM(amount) OVER (
PARTITION BY salesperson_id
ORDER BY sale_date
) AS running_total
FROM sales;
Partitioning on the wrong column
If your sales table has both a region_id and a salesperson_id, and you partition by region_id when you meant salesperson_id, you get one partition per region — not per person. The numbers look reasonable but are wrong. Always check the partition key against your actual grouping intent before assuming the query is correct.
ORDER BY Is Absent or Ambiguous
For ranking functions (RANK, DENSE_RANK, ROW_NUMBER) and navigation functions (LAG, LEAD), the ORDER BY inside OVER() determines the logical sequence within each partition. Leave it out and the result is non-deterministic.
Missing ORDER BY with ROW_NUMBER
-- Broken: row numbers are assigned in arbitrary storage order
SELECT
order_id,
customer_id,
ROW_NUMBER() OVER (PARTITION BY customer_id) AS rn
FROM orders;
On different runs — or after a vacuum — row numbers can shuffle. Fix it by providing an explicit ordering:
-- Fixed
SELECT
order_id,
customer_id,
order_date,
ROW_NUMBER() OVER (
PARTITION BY customer_id
ORDER BY order_date, order_id -- order_id breaks ties deterministically
) AS rn
FROM orders;
Notice the tie-breaker: order_id after order_date ensures two orders on the same date always get the same row numbers across runs. Without that, two rows with identical order_date values can swap positions non-deterministically.
Ambiguous ordering causing unstable RANK
If multiple rows share the same ORDER BY value inside OVER(), RANK() assigns them the same rank (which is correct), but ROW_NUMBER() will pick one arbitrarily. This is a frequent source of confusion when you expect unique row numbers but have duplicate sort keys. Adding a unique column as a secondary sort key resolves it every time.
The Frame Clause Is Silently Changing Your Aggregates
This is the most subtle failure mode. When you use an aggregate function (SUM, AVG, COUNT) with a window, PostgreSQL applies a default frame that depends on whether you specified ORDER BY.
- With no
ORDER BY: the frame isROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING— all rows in the partition. - With
ORDER BY: the frame defaults toRANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.
The second default is a running aggregate, not a partition-wide aggregate. Many developers expect a total but get a cumulative sum instead:
-- Developer expects total sales per region, gets running total instead
SELECT
region_id,
sale_date,
amount,
SUM(amount) OVER (
PARTITION BY region_id
ORDER BY sale_date -- this activates the running-total default frame
) AS total_sales
FROM sales;
If you want the partition total on every row (not a running sum), drop the ORDER BY inside the window or specify the frame explicitly:
-- Option A: no ORDER BY inside OVER — gives partition-wide total
SELECT
region_id,
sale_date,
amount,
SUM(amount) OVER (PARTITION BY region_id) AS total_sales
FROM sales;
-- Option B: explicit full frame with ORDER BY retained
SELECT
region_id,
sale_date,
amount,
SUM(amount) OVER (
PARTITION BY region_id
ORDER BY sale_date
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS total_sales
FROM sales;
The RANGE vs ROWS distinction also matters. RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW includes all rows with the same ORDER BY value as the current row in the frame — so if three sales share the same date, all three are included in the "current row" frame boundary. ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW counts the physical current row only. This difference causes running totals to jump at tied dates when you use RANGE.
DISTINCT, CTEs, and Subqueries That Swallow the Window
You cannot use a window function directly inside a WHERE clause, a HAVING clause, or alongside DISTINCT in a way that collapses rows before the window evaluates. PostgreSQL enforces this strictly — but the way developers work around it can introduce a new bug.
Filtering on window results with a subquery
A common pattern is wrapping the window query in a subquery to filter by the window result. That is fine. But if you accidentally apply DISTINCT or extra grouping inside the subquery, you can collapse the rows the window depends on:
-- Broken: DISTINCT collapses rows before the window function sees them
SELECT DISTINCT
customer_id,
SUM(amount) OVER (PARTITION BY customer_id) AS total
FROM orders;
DISTINCT here de-duplicates the result after the window runs, not before — so the SUM is still computed over all rows. That is actually fine in this specific case. But mixing DISTINCT with ordering functions like ROW_NUMBER() can produce confusing results because DISTINCT may eliminate rows you intended to rank.
The safe, readable pattern is always to window first, then filter in an outer query:
-- Correct: filter on window result in the outer query
SELECT *
FROM (
SELECT
customer_id,
order_id,
order_date,
ROW_NUMBER() OVER (
PARTITION BY customer_id
ORDER BY order_date, order_id
) AS rn
FROM orders
) sub
WHERE rn = 1;
This pattern — also called a top-N per group query — is similar in concept to what fixing DISTINCT ON ordering issues addresses for a different PostgreSQL construct.
CTEs that re-materialize
In PostgreSQL 12 and earlier, CTEs were always materialized as optimization fences. From PostgreSQL 12 onward, the planner can inline non-recursive CTEs. In either case, a CTE does not change the correctness of window results — but if you apply a GROUP BY inside a CTE before passing data to a window function, you lose the row-level detail the window needs. Always keep row granularity intact inside the CTE that feeds a window function.
Ties and Duplicate Rows Breaking RANK and ROW_NUMBER
Duplicate source rows are a silent killer for ranking queries. If your underlying table has duplicate rows that should not exist, the window function faithfully ranks all of them — and you get multiple rows with rn = 1 when you expected exactly one.
-- Diagnose: are there unexpected duplicates in your partition?
SELECT customer_id, order_date, COUNT(*)
FROM orders
GROUP BY customer_id, order_date
HAVING COUNT(*) > 1;
If that query returns rows, your data has duplicates. Fix the data upstream, or deduplicate before windowing:
-- Deduplicate first, then window
WITH deduped AS (
SELECT DISTINCT ON (customer_id, order_date)
customer_id,
order_id,
order_date,
amount
FROM orders
ORDER BY customer_id, order_date, order_id -- keep the earliest order_id
)
SELECT
customer_id,
order_date,
amount,
ROW_NUMBER() OVER (
PARTITION BY customer_id
ORDER BY order_date, order_id
) AS rn
FROM deduped;
Notice how DISTINCT ON is used here before the window, with a careful ORDER BY to ensure a deterministic row is kept. If you need a refresher on why that ordering matters, the article on DISTINCT ON returning the wrong row covers exactly that edge case.
Also note the difference between RANK and DENSE_RANK when ties exist:
| Function | Behavior on ties | Gap after tie? |
|---|---|---|
RANK() | Same rank for tied rows | Yes — next rank skips |
DENSE_RANK() | Same rank for tied rows | No — consecutive |
ROW_NUMBER() | Arbitrary order among ties | Always unique |
If your downstream code expects consecutive integers, use DENSE_RANK(). If it expects uniqueness, use ROW_NUMBER() with a tie-breaking sort key.
Common Pitfalls at a Glance
- No PARTITION BY: the window spans the entire result set. Every row gets the same rank or total.
- No ORDER BY inside OVER: ranking and navigation functions return non-deterministic results.
- Default frame with ORDER BY: aggregate functions become running aggregates, not partition totals.
- RANGE vs ROWS: ties in the sort key cause frame boundaries to expand unexpectedly with
RANGE. - Grouping or DISTINCT before windowing: row-level detail is lost before the window evaluates.
- Duplicate source rows: ranking produces multiple rows at the same rank position.
- Filtering in the same query level: you cannot use a window result in a
WHEREclause at the same level — always wrap in a subquery.
Data integrity issues in your source tables also affect window accuracy. If you have encountered similar data-consistency problems in other systems, the debugging approaches for Pandas pivot_table returning unexpected NaN values follow a comparable diagnostic mindset: check the input data first, then the aggregation logic.
Wrapping Up: Next Steps
Window functions fail quietly. The output looks plausible, and the query runs without error — which is exactly what makes these bugs so frustrating to track down.
Here are the concrete actions to take right now:
- Audit your PARTITION BY key. Run a quick
GROUP BYon that column and verify the groups match your business intent before trusting any window result. - Add a tie-breaker to ORDER BY. Any unique column (usually a primary key) as a secondary sort key makes ranking deterministic.
- State your frame clause explicitly. Never rely on the default. Write
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROWfor running totals, orROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWINGfor partition totals — even when the default would match. - Check for duplicate rows before windowing. A quick
HAVING COUNT(*) > 1query on your partition key will catch hidden duplicates. - Always filter on window results in an outer query. Window, then filter — never the other way around in the same query level.
Frequently Asked Questions
Why does my PostgreSQL window function return the same value for every row?
This usually means your PARTITION BY is missing or incorrect, causing the entire result set to be treated as one partition. Check that the column in your PARTITION BY clause actually groups rows the way you intend, and confirm the column is not NULL for all rows.
How do I stop SUM OVER from giving me a running total instead of the partition total?
When you include ORDER BY inside OVER() with an aggregate like SUM, PostgreSQL defaults to a running (cumulative) frame. To get the partition total on every row, either remove the ORDER BY from inside the OVER() clause, or add an explicit ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING frame clause.
Why does ROW_NUMBER return different results each time I run the same query?
ROW_NUMBER without a fully deterministic ORDER BY inside OVER() can produce different row assignments across runs, because PostgreSQL does not guarantee storage order. Add a unique column — typically the primary key — as a secondary sort key to make the ordering stable.
Can I use a window function result in a WHERE clause?
No. Window functions are evaluated after WHERE and GROUP BY, so you cannot reference their results in the same query's WHERE clause. Wrap the window query in a subquery or CTE, then filter on the window result in the outer query's WHERE clause.
What is the difference between RANK and DENSE_RANK in PostgreSQL?
Both assign the same rank to tied rows, but RANK skips subsequent rank numbers after a tie while DENSE_RANK does not. For example, two rows tied at rank 2 will be followed by rank 4 with RANK, but rank 3 with DENSE_RANK.
📤 Share this article
Sign in to saveRelated Articles
Comments (0)
No comments yet. Be the first!