SQL CTEs vs Subqueries: Why Your Query Returns Different Row Counts
You write a query with a subquery, check the result, then refactor it into a CTE for readability β and the row count changes. Nothing else changed. This isn't a fluke; it's a sign that CTEs and subqueries do not always behave identically, and the difference is almost always rooted in how your database engine handles materialization, deduplication, or scope.
This article walks through the specific cases where the two constructs diverge, so you can stop guessing and start debugging with intent.
What you'll learn
- How CTEs and subqueries differ in execution strategy
- The specific SQL patterns that produce different row counts between the two
- How
DISTINCT,UNION, and recursive logic affect results differently - How to write a diagnostic query to confirm which form is causing the discrepancy
- Rules of thumb for choosing CTEs vs subqueries based on the result you need
Prerequisites
You should be comfortable writing basic SELECT statements, JOINs, and nested queries. The examples use PostgreSQL syntax, but the concepts apply to MySQL 8+, SQL Server, and BigQuery with minor adjustments.
CTEs and Subqueries Are Not Always Equivalent
Most SQL tutorials treat CTEs as cosmetic β a way to name a subquery so the code is easier to read. That framing is mostly accurate, but it misses the cases where the two constructs produce genuinely different query plans and, as a result, different output.
A subquery is evaluated inline, at the point it appears in the outer query. A CTE is defined once at the top, then referenced by name. The critical question is: does the database engine materialize the CTE (run it once, store the result, reuse it) or does it inline it (substitute the definition everywhere it appears, like a macro)?
In PostgreSQL, CTEs were historically always materialized β they acted as an optimization fence. Since PostgreSQL 12, the planner can inline non-recursive CTEs unless you explicitly use MATERIALIZED. SQL Server inlines CTEs by default. BigQuery materializes them. This inconsistency is the root of a lot of cross-platform confusion.
Case 1: Duplicate Rows from Multiple References
This is the most common source of the
π€ Share this article
Sign in to saveRelated Articles
Comments (0)
No comments yet. Be the first!