Splitting a Delimited String into Multiple Rows in PostgreSQL
You inherited a table where someone stored tags, categories, or IDs as a comma-separated string in a single column. Now you need to query each value individually, and a LIKE '%tag%' hack is not going to cut it. You need actual rows.
PostgreSQL has several clean ways to do this, and picking the right one depends on whether your delimiter is simple, whether the data is messy, and whether you need to keep track of the original row. This guide walks through each approach with working queries.
What you'll learn
- How to use
string_to_arrayandunnesttogether to explode delimited values into rows - How
regexp_split_to_tablehandles irregular delimiters and whitespace - How to use a
LATERALjoin to keep the original row context intact - How to clean up nulls and empty strings that sneak in during splitting
- Common mistakes that produce duplicate or missing rows
Prerequisites
You need PostgreSQL 9.4 or later (all functions here are stable across modern versions). Basic familiarity with SELECT queries and joins is assumed. No extensions required β everything used here is built in.
The sample data
To keep examples concrete, assume you have a table like this:
CREATE TABLE articles (
id SERIAL PRIMARY KEY,
title TEXT,
tags TEXT
);
INSERT INTO articles (title, tags) VALUES
('PostgreSQL Indexing', 'database,performance,sql'),
('Python Basics', 'python,beginner,tutorial'),
('Docker Compose Tips', 'docker,devops,containers,linux');The tags column holds comma-separated values. Your goal is to produce one row per tag, still associated with its article.
Option 1: string_to_array + unnest
This is the most readable approach for simple, fixed delimiters. string_to_array converts the string into a PostgreSQL array, and unnest expands that array into a set of rows.
SELECT
id,
title,
unnest(string_to_array(tags, ',')) AS tag
FROM articles;Running this produces:
id | title | tag
----+----------------------+------------
1 | PostgreSQL Indexing | database
1 | PostgreSQL Indexing | performance
1 | PostgreSQL Indexing | sql
2 | Python Basics | python
2 | Python Basics | beginner
2 | Python Basics | tutorial
3 | Docker Compose Tips | docker
3 | Docker Compose Tips | devops
3 | Docker Compose Tips | containers
3 | Docker Compose Tips | linuxstring_to_array(tags, ',') does exactly what the name says β it splits on the literal comma and returns a text[]. Then unnest turns each element of that array into its own row, repeating all other selected columns automatically.
Trimming whitespace from each value
If your data has spaces after commas (e.g. 'python, beginner, tutorial'), wrap the unnested result in trim():
SELECT
id,
title,
trim(unnest(string_to_array(tags, ','))) AS tag
FROM articles;This keeps you from accidentally storing ' beginner' instead of 'beginner' when you later insert these values somewhere else.
Option 2: regexp_split_to_table
regexp_split_to_table does the split and the expansion in one step, and it accepts a regular expression as the delimiter. That makes it useful when your separator is inconsistent β for example, a comma that may or may not have surrounding spaces.
SELECT
id,
title,
regexp_split_to_table(tags, '\s*,\s*') AS tag
FROM articles;The pattern \s*,\s* matches a comma with any amount of whitespace on either side. You get the same row-per-value output, but the trimming is baked into the split itself.
Use this function when your delimiter varies or you need the power of regex. For a plain comma with no variation, string_to_array + unnest is slightly more explicit and arguably easier to read at a glance.
Option 3: LATERAL join
Both options above work fine in simple SELECT statements, but once you need to join the exploded values to another table or reference the split result more than once in the same query, a LATERAL join is cleaner and avoids repeating the function call.
SELECT
a.id,
a.title,
t.tag
FROM articles a
CROSS JOIN LATERAL unnest(string_to_array(a.tags, ',')) AS t(tag);The LATERAL keyword allows the subquery on the right side of the join to reference columns from the left side (a.tags in this case). The alias t(tag) gives the expanded function a proper column name, which makes referencing it in a WHERE clause or downstream join straightforward.
Filtering by a specific tag
SELECT
a.id,
a.title,
t.tag
FROM articles a
CROSS JOIN LATERAL unnest(string_to_array(a.tags, ',')) AS t(tag)
WHERE trim(t.tag) = 'python';This is much safer than a LIKE search on the original column, which would also match 'python-advanced' or any tag that contains the substring.
Keeping the array position (index)
Sometimes you need to know where in the original string each value appeared. unnest has a companion function called unnest with ordinality that returns a row number alongside each element:
SELECT
a.id,
a.title,
t.position,
t.tag
FROM articles a
CROSS JOIN LATERAL
unnest(string_to_array(a.tags, ',')) WITH ORDINALITY AS t(tag, position);WITH ORDINALITY appends a 1-based integer column representing the position of each element in the original array. This is useful when order matters β for example, if the first tag is always the primary category.
Filtering out empty strings and NULLs
Real-world data is messy. A trailing comma in 'database,performance,' produces an empty string as the last element. A null value in the tags column causes string_to_array to return NULL, and unnest(NULL) returns zero rows, which silently drops that article from your results.
Handle both cases explicitly:
SELECT
a.id,
a.title,
trim(t.tag) AS tag
FROM articles a
CROSS JOIN LATERAL unnest(string_to_array(coalesce(a.tags, ''), ',')) AS t(tag)
WHERE trim(t.tag) <> '';coalesce(a.tags, '') turns a null tags column into an empty string before splitting, so the article still appears in results (with zero expanded rows, which the WHERE clause filters out cleanly). The WHERE trim(t.tag) <> '' removes any empty strings that trailing or double commas produce.
Using the results in a practical query
Suppose you want to count how many articles use each tag, sorted by popularity:
SELECT
trim(t.tag) AS tag,
count(*) AS article_count
FROM articles a
CROSS JOIN LATERAL unnest(string_to_array(coalesce(a.tags, ''), ',')) AS t(tag)
WHERE trim(t.tag) <> ''
GROUP BY trim(t.tag)
ORDER BY article_count DESC;This gives you a clean tag frequency report without any application-side processing. The database does the heavy lifting, and you get a result set you can feed directly into a dashboard or a reporting query.
Common pitfalls
- Calling unnest in the SELECT list alongside aggregate functions β PostgreSQL allows set-returning functions in
SELECT, but mixing them withGROUP BYcan produce surprising results. Use aLATERALjoin when aggregating to keep the query intent clear. - Forgetting to trim β Even one extra space turns
'python'and' python'into two distinct values in aGROUP BY. Trim early and consistently. - Assuming unnest preserves order β Without
WITH ORDINALITY, PostgreSQL does not guarantee the order in which array elements are returned, even though in practice it usually matches the array order. If order matters, make it explicit. - NULL propagation β
string_to_array(NULL, ',')returnsNULL, not an empty array. Always usecoalesceif any rows might have a null in the split column. - Performance on large tables β Splitting strings in SQL is a query-time operation with no index support. If you run this query frequently against millions of rows, consider normalizing the data into a proper junction table instead.
When to normalize instead
These techniques are great for one-off analysis, data migration scripts, or querying legacy schemas you cannot change. But if you control the schema and the data grows or gets queried often, storing a delimited string is the wrong data model. A proper junction table β article_tags(article_id, tag) β lets PostgreSQL use indexes, enforce uniqueness, and join efficiently without any string gymnastics.
The queries above are a good bridge while you migrate. Run them in an INSERT INTO ... SELECT to populate the normalized table from the old one:
CREATE TABLE article_tags (
article_id INT REFERENCES articles(id),
tag TEXT NOT NULL
);
INSERT INTO article_tags (article_id, tag)
SELECT
a.id,
trim(t.tag)
FROM articles a
CROSS JOIN LATERAL unnest(string_to_array(coalesce(a.tags, ''), ',')) AS t(tag)
WHERE trim(t.tag) <> '';One query migrates all your data. After verifying the results, you can drop the old column.
Wrapping up
You now have a practical toolkit for splitting delimited strings in PostgreSQL without reaching for application code. Here are your next steps:
- Try the
string_to_array + unnestpattern first β it covers most cases and reads clearly. - Switch to
regexp_split_to_tableif your delimiter is inconsistent or regex-expressible. - Use a
LATERALjoin whenever you need to filter, join, or aggregate on the exploded values. - Always add a
coalesceand aWHERE trim(tag) <> ''guard to handle nulls and trailing delimiters in production data. - If this is a permanent query pattern, plan a migration to a normalized junction table β your future self will thank you at query-plan review time.
π€ Share this article
Sign in to saveRelated Articles
Comments (0)
No comments yet. Be the first!