Splitting a Delimited String into Multiple Rows in PostgreSQL

May 23, 2026 6 min read 44 views
Abstract illustration of a single bar splitting into multiple stacked rows, representing string splitting into database records

You inherited a table where someone stored tags, categories, or IDs as a comma-separated string in a single column. Now you need to query each value individually, and a LIKE '%tag%' hack is not going to cut it. You need actual rows.

PostgreSQL has several clean ways to do this, and picking the right one depends on whether your delimiter is simple, whether the data is messy, and whether you need to keep track of the original row. This guide walks through each approach with working queries.

What you'll learn

  • How to use string_to_array and unnest together to explode delimited values into rows
  • How regexp_split_to_table handles irregular delimiters and whitespace
  • How to use a LATERAL join to keep the original row context intact
  • How to clean up nulls and empty strings that sneak in during splitting
  • Common mistakes that produce duplicate or missing rows

Prerequisites

You need PostgreSQL 9.4 or later (all functions here are stable across modern versions). Basic familiarity with SELECT queries and joins is assumed. No extensions required β€” everything used here is built in.

The sample data

To keep examples concrete, assume you have a table like this:

CREATE TABLE articles (
  id   SERIAL PRIMARY KEY,
  title TEXT,
  tags  TEXT
);

INSERT INTO articles (title, tags) VALUES
  ('PostgreSQL Indexing', 'database,performance,sql'),
  ('Python Basics',       'python,beginner,tutorial'),
  ('Docker Compose Tips', 'docker,devops,containers,linux');

The tags column holds comma-separated values. Your goal is to produce one row per tag, still associated with its article.

Option 1: string_to_array + unnest

This is the most readable approach for simple, fixed delimiters. string_to_array converts the string into a PostgreSQL array, and unnest expands that array into a set of rows.

SELECT
  id,
  title,
  unnest(string_to_array(tags, ',')) AS tag
FROM articles;

Running this produces:

id | title                | tag
----+----------------------+------------
 1 | PostgreSQL Indexing  | database
 1 | PostgreSQL Indexing  | performance
 1 | PostgreSQL Indexing  | sql
 2 | Python Basics        | python
 2 | Python Basics        | beginner
 2 | Python Basics        | tutorial
 3 | Docker Compose Tips  | docker
 3 | Docker Compose Tips  | devops
 3 | Docker Compose Tips  | containers
 3 | Docker Compose Tips  | linux

string_to_array(tags, ',') does exactly what the name says β€” it splits on the literal comma and returns a text[]. Then unnest turns each element of that array into its own row, repeating all other selected columns automatically.

Trimming whitespace from each value

If your data has spaces after commas (e.g. 'python, beginner, tutorial'), wrap the unnested result in trim():

SELECT
  id,
  title,
  trim(unnest(string_to_array(tags, ','))) AS tag
FROM articles;

This keeps you from accidentally storing ' beginner' instead of 'beginner' when you later insert these values somewhere else.

Option 2: regexp_split_to_table

regexp_split_to_table does the split and the expansion in one step, and it accepts a regular expression as the delimiter. That makes it useful when your separator is inconsistent β€” for example, a comma that may or may not have surrounding spaces.

SELECT
  id,
  title,
  regexp_split_to_table(tags, '\s*,\s*') AS tag
FROM articles;

The pattern \s*,\s* matches a comma with any amount of whitespace on either side. You get the same row-per-value output, but the trimming is baked into the split itself.

Use this function when your delimiter varies or you need the power of regex. For a plain comma with no variation, string_to_array + unnest is slightly more explicit and arguably easier to read at a glance.

Option 3: LATERAL join

Both options above work fine in simple SELECT statements, but once you need to join the exploded values to another table or reference the split result more than once in the same query, a LATERAL join is cleaner and avoids repeating the function call.

SELECT
  a.id,
  a.title,
  t.tag
FROM articles a
CROSS JOIN LATERAL unnest(string_to_array(a.tags, ',')) AS t(tag);

The LATERAL keyword allows the subquery on the right side of the join to reference columns from the left side (a.tags in this case). The alias t(tag) gives the expanded function a proper column name, which makes referencing it in a WHERE clause or downstream join straightforward.

Filtering by a specific tag

SELECT
  a.id,
  a.title,
  t.tag
FROM articles a
CROSS JOIN LATERAL unnest(string_to_array(a.tags, ',')) AS t(tag)
WHERE trim(t.tag) = 'python';

This is much safer than a LIKE search on the original column, which would also match 'python-advanced' or any tag that contains the substring.

Keeping the array position (index)

Sometimes you need to know where in the original string each value appeared. unnest has a companion function called unnest with ordinality that returns a row number alongside each element:

SELECT
  a.id,
  a.title,
  t.position,
  t.tag
FROM articles a
CROSS JOIN LATERAL
  unnest(string_to_array(a.tags, ',')) WITH ORDINALITY AS t(tag, position);

WITH ORDINALITY appends a 1-based integer column representing the position of each element in the original array. This is useful when order matters β€” for example, if the first tag is always the primary category.

Filtering out empty strings and NULLs

Real-world data is messy. A trailing comma in 'database,performance,' produces an empty string as the last element. A null value in the tags column causes string_to_array to return NULL, and unnest(NULL) returns zero rows, which silently drops that article from your results.

Handle both cases explicitly:

SELECT
  a.id,
  a.title,
  trim(t.tag) AS tag
FROM articles a
CROSS JOIN LATERAL unnest(string_to_array(coalesce(a.tags, ''), ',')) AS t(tag)
WHERE trim(t.tag) <> '';

coalesce(a.tags, '') turns a null tags column into an empty string before splitting, so the article still appears in results (with zero expanded rows, which the WHERE clause filters out cleanly). The WHERE trim(t.tag) <> '' removes any empty strings that trailing or double commas produce.

Using the results in a practical query

Suppose you want to count how many articles use each tag, sorted by popularity:

SELECT
  trim(t.tag) AS tag,
  count(*) AS article_count
FROM articles a
CROSS JOIN LATERAL unnest(string_to_array(coalesce(a.tags, ''), ',')) AS t(tag)
WHERE trim(t.tag) <> ''
GROUP BY trim(t.tag)
ORDER BY article_count DESC;

This gives you a clean tag frequency report without any application-side processing. The database does the heavy lifting, and you get a result set you can feed directly into a dashboard or a reporting query.

Common pitfalls

  • Calling unnest in the SELECT list alongside aggregate functions β€” PostgreSQL allows set-returning functions in SELECT, but mixing them with GROUP BY can produce surprising results. Use a LATERAL join when aggregating to keep the query intent clear.
  • Forgetting to trim β€” Even one extra space turns 'python' and ' python' into two distinct values in a GROUP BY. Trim early and consistently.
  • Assuming unnest preserves order β€” Without WITH ORDINALITY, PostgreSQL does not guarantee the order in which array elements are returned, even though in practice it usually matches the array order. If order matters, make it explicit.
  • NULL propagation β€” string_to_array(NULL, ',') returns NULL, not an empty array. Always use coalesce if any rows might have a null in the split column.
  • Performance on large tables β€” Splitting strings in SQL is a query-time operation with no index support. If you run this query frequently against millions of rows, consider normalizing the data into a proper junction table instead.

When to normalize instead

These techniques are great for one-off analysis, data migration scripts, or querying legacy schemas you cannot change. But if you control the schema and the data grows or gets queried often, storing a delimited string is the wrong data model. A proper junction table β€” article_tags(article_id, tag) β€” lets PostgreSQL use indexes, enforce uniqueness, and join efficiently without any string gymnastics.

The queries above are a good bridge while you migrate. Run them in an INSERT INTO ... SELECT to populate the normalized table from the old one:

CREATE TABLE article_tags (
  article_id INT REFERENCES articles(id),
  tag        TEXT NOT NULL
);

INSERT INTO article_tags (article_id, tag)
SELECT
  a.id,
  trim(t.tag)
FROM articles a
CROSS JOIN LATERAL unnest(string_to_array(coalesce(a.tags, ''), ',')) AS t(tag)
WHERE trim(t.tag) <> '';

One query migrates all your data. After verifying the results, you can drop the old column.

Wrapping up

You now have a practical toolkit for splitting delimited strings in PostgreSQL without reaching for application code. Here are your next steps:

  1. Try the string_to_array + unnest pattern first β€” it covers most cases and reads clearly.
  2. Switch to regexp_split_to_table if your delimiter is inconsistent or regex-expressible.
  3. Use a LATERAL join whenever you need to filter, join, or aggregate on the exploded values.
  4. Always add a coalesce and a WHERE trim(tag) <> '' guard to handle nulls and trailing delimiters in production data.
  5. If this is a permanent query pattern, plan a migration to a normalized junction table β€” your future self will thank you at query-plan review time.

πŸ“€ Share this article

Sign in to save

Comments (0)

No comments yet. Be the first!

Leave a Comment

Sign in to comment with your profile.

πŸ“¬ Weekly Newsletter

Stay ahead of the curve

Get the best programming tutorials, data analytics tips, and tool reviews delivered to your inbox every week.

No spam. Unsubscribe anytime.