AI Prompt Engineering

Getting ChatGPT to Generate Accurate Data Migration Scripts You Can Trust

June 17, 2026 10 min read 3 views

You pasted your table definitions into ChatGPT, asked for a migration script, and got something that looked solid. Then you ran it on staging and discovered it silently dropped a NOT NULL constraint, or worse, migrated 90% of the rows and stopped without an error. ChatGPT is genuinely useful for migration work — but only if you treat it as a junior engineer who needs precise specs, not a senior DBA who reads between the lines.

What you'll learn

  • How to structure your prompt so ChatGPT has the full schema context it needs
  • How to force constraint and index specifications into the output
  • How to break migrations into reviewable stages
  • How to request rollback logic and why you always should
  • A practical checklist for validating the script before you touch production

Prerequisites

This guide assumes you are comfortable writing SQL, understand foreign keys and indexes, and have at least one migration to run — whether that's moving tables between schemas, restructuring columns, or migrating from one database engine to another. You don't need to be a DBA, but you do need to be able to read the output critically.

The problem with AI-generated migration scripts

ChatGPT predicts plausible SQL based on what migration scripts generally look like. It doesn't know your data. It doesn't know that user_id in your orders table references a uuid column added in a migration three years ago, not the integer primary key that a bare schema dump might suggest.

The failure modes are subtle. A script might run cleanly but omit a UNIQUE constraint. It might transform a VARCHAR(255) column to TEXT without telling you that your application relies on the length limit as a validation fence. It might insert rows in the wrong order and violate a foreign key — or, if you're using MySQL with checks disabled during the migration, silently succeed and leave your database in an inconsistent state.

The fix isn't to avoid ChatGPT. It's to give it less room to guess, and to build a verification layer into your workflow. As with writing SQL queries with ChatGPT, the pattern is: give precise input, get targeted output, verify before you trust.

Give ChatGPT the full schema, not a description of it

The single biggest improvement you can make is to paste actual DDL into your prompt — not a prose description of your tables. "I have a users table with an email and a created_at" gives ChatGPT wide latitude to invent types, sizes, and nullability. The real CREATE TABLE statement leaves no room for guessing.

Pull your schema using whatever tool your database provides:

# PostgreSQL
pg_dump --schema-only --no-owner -t users -t orders -t products mydb > schema.sql

# MySQL
mysqldump --no-data --compact mydb users orders products > schema.sql

# SQLite
sqlite3 mydb.sqlite ".schema" > schema.sql

Then paste that output directly into your prompt. Include every table that participates in the migration, not just the primary table. Foreign key relationships won't be obvious to ChatGPT if the referenced table's definition is missing.

Your prompt should follow this structure:

Here is the current schema for the tables involved in this migration:

[PASTE DDL HERE]

I need to migrate from [source database / current structure] to [target database / new structure].
The goal of this migration is: [specific objective — e.g., split the `address` column into five separate columns, or move this schema from MySQL 5.7 to PostgreSQL 15].

Requirements:
- Preserve all existing data
- Maintain all constraints (NOT NULL, UNIQUE, FK) in the new structure
- The script must be safe to run on a live database with a brief maintenance window
- Include a row count check before and after each major step

Please write the migration script step by step.

The row count check instruction is important. It costs ChatGPT nothing to add SELECT COUNT(*) FROM table_name; before and after a bulk operation, and it gives you an immediate sanity check when you run the script.

Specify constraints explicitly — ChatGPT won't infer them

Even with full DDL, ChatGPT can miss implicit behavior your team relies on. Indexes that are not primary or unique keys won't necessarily be recreated in the migration target. Check constraints added informally might not carry over. Triggers are almost certainly going to be ignored unless you mention them.

Add a constraints section to your prompt:

Additionally, the following must be preserved or recreated in the target:
- Index on orders(customer_id, created_at) for query performance
- CHECK constraint on products(price) ensuring price > 0
- Trigger `update_modified_at` on users that updates `modified_at` on every UPDATE
- Sequence `order_number_seq` used by orders.order_number, starting from its current value

Explicitly listing these things moves them from "things ChatGPT might include" to "things ChatGPT is instructed to include." You can then search the output for each item and verify it's present, making review mechanical rather than intuitive.

This same principle applies when using ChatGPT to analyze unfamiliar codebases — the model works best when it has explicit targets rather than open-ended questions.

Break the migration into stages, not one giant script

Asking ChatGPT for a single complete migration script for a complex schema change is asking for trouble. A 200-line script is hard to review, hard to test, and catastrophic if it fails halfway through without proper transaction handling.

Instead, prompt for a staged approach:

Break this migration into numbered stages. Each stage should:
1. Do one logical unit of work
2. Be safe to run independently
3. Include a verification query that confirms the stage succeeded
4. Be wrapped in a transaction where the database supports it

After listing all stages, write each stage as a separate SQL block.

A typical output for splitting an address column might look like this:

-- Stage 1: Add new columns (non-destructive)
BEGIN;
ALTER TABLE users
  ADD COLUMN address_street  VARCHAR(200),
  ADD COLUMN address_city    VARCHAR(100),
  ADD COLUMN address_state   VARCHAR(50),
  ADD COLUMN address_zip     VARCHAR(20),
  ADD COLUMN address_country VARCHAR(60);
COMMIT;

-- Verify Stage 1
SELECT column_name
FROM information_schema.columns
WHERE table_name = 'users'
  AND column_name LIKE 'address_%';

-- Stage 2: Populate new columns from existing data
BEGIN;
UPDATE users
SET
  address_street  = split_part(address, ',', 1),
  address_city    = trim(split_part(address, ',', 2)),
  address_state   = trim(split_part(address, ',', 3)),
  address_zip     = trim(split_part(address, ',', 4)),
  address_country = trim(split_part(address, ',', 5));
COMMIT;

-- Verify Stage 2
SELECT COUNT(*) AS total,
       COUNT(address_city) AS rows_with_city
FROM users;

This structure makes review practical. You can read Stage 1, understand it, run it, and check the verification query before moving to Stage 2. If Stage 3 fails, you haven't lost the work from Stages 1 and 2.

Ask for rollback logic in the same prompt

Always request rollback scripts in the same prompt as the migration itself. If you ask separately, ChatGPT may produce rollback logic that doesn't precisely mirror the forward migration, or that assumes a starting state that's already been partially modified.

After each stage, write the corresponding rollback block that exactly undoes that stage. Label each rollback clearly: "-- Rollback Stage N".

A good rollback for a column addition is straightforward:

-- Rollback Stage 1
BEGIN;
ALTER TABLE users
  DROP COLUMN IF EXISTS address_street,
  DROP COLUMN IF EXISTS address_city,
  DROP COLUMN IF EXISTS address_state,
  DROP COLUMN IF EXISTS address_zip,
  DROP COLUMN IF EXISTS address_country;
COMMIT;

Rollbacks for data transformation stages are trickier. If Stage 2 overwrites data, the rollback can't restore it unless you captured it first. Prompt ChatGPT to acknowledge this explicitly:

Where a stage involves data transformation that cannot be cleanly reversed, include a note explaining what backup step to take before running that stage.

This forces the output to include a comment like -- Before running this stage, back up the `address` column: ALTER TABLE users ADD COLUMN address_backup TEXT; UPDATE users SET address_backup = address; rather than leaving it implied.

Validate the output with a checklist before running anything

After you have the script, don't read it passively. Run through a structured checklist. This is the step most developers skip, and it's where silent errors get caught.

Structural checks

  • Every table referenced in the script exists in the DDL you provided
  • Every column name matches your actual schema exactly (watch for ChatGPT inventing created_date when your column is created_at)
  • Foreign key references point to the correct parent table and column type
  • Data types in ALTER TABLE statements match the target types you specified

Constraint checks

  • NOT NULL constraints that existed in the source are present in the target
  • UNIQUE constraints are recreated, not just described in a comment
  • Every index you listed is present as an explicit CREATE INDEX statement
  • CHECK constraints use the correct syntax for your target database engine

Safety checks

  • Destructive operations (DROP COLUMN, TRUNCATE, DELETE) appear only after non-destructive ones
  • Transactions wrap every data-modifying stage
  • Verification queries are present and actually test the right thing
  • The script does not disable foreign key checks without re-enabling them in a guaranteed block

Run the script against a restored copy of your production database before staging. A migration that passes on synthetic test data can fail on real data because of encoding edge cases, null values in columns you assumed were populated, or row counts that trip a timeout.

Common pitfalls when using ChatGPT for migrations

Type mapping errors between database engines

If you're migrating from MySQL to PostgreSQL, ChatGPT may map TINYINT(1) to BOOLEAN correctly most of the time — but it might leave DATETIME as TIMESTAMP without converting timezone behavior, which is a semantic change. Always prompt: "List any type mappings you applied and explain why you chose each target type." Then read that list carefully.

Sequence and auto-increment handling

PostgreSQL uses sequences; MySQL uses AUTO_INCREMENT. ChatGPT often generates the column definition correctly but forgets to set the sequence's current value to match your existing maximum ID. Add to your prompt: "Ensure any sequence or auto-increment value starts at the current maximum value of the column plus one."

Case sensitivity in column and table names

MySQL on Linux is case-sensitive for table names by default. PostgreSQL lowercases unquoted identifiers. If your schema uses mixed case, ChatGPT may drop the quotes, silently renaming "OrderItems" to orderitems. Specify: "Preserve the exact case of all identifiers using double quotes where necessary."

Missing WHERE clauses on UPDATE statements

This one is rare but dangerous. If ChatGPT writes a data transformation UPDATE, verify it has a WHERE clause if the transformation only applies to a subset of rows. An absent WHERE clause updates every row in the table. This is the kind of silent breakage covered in depth in the guide on debugging ChatGPT suggestions that break edge cases.

Assuming English locale for string parsing

If your migration involves splitting or parsing string data, ChatGPT will assume clean, well-formatted values. Your actual data probably has inconsistencies — extra spaces, missing commas, values that don't match the expected pattern. Ask explicitly: "What happens to rows where the source data doesn't match the expected format? Include handling for those cases."

The same skepticism that makes you a better reviewer of pull requests makes you a better user of AI-generated code. The advice in this guide on getting useful code reviews from ChatGPT applies here: give it something specific to react to, not an open canvas.

Wrapping up

ChatGPT can cut the time you spend drafting migration scripts significantly — but only if you treat prompt quality as the bottleneck, not the AI's capability. Here's what to do next:

  1. Dump your actual DDL for the next migration you need to write and paste it verbatim into the prompt instead of describing it in prose.
  2. Add an explicit constraints list covering indexes, CHECK constraints, triggers, and sequences that won't be obvious from the DDL alone.
  3. Request staged output with a verification query after each stage and a paired rollback block.
  4. Run the structural, constraint, and safety checklist before touching staging — not after.
  5. Test against a production-like dataset, not synthetic fixtures, to catch encoding and null-value edge cases that clean test data hides.

The migration script ChatGPT gives you after following this process will still need your eyes on it. But it will be structured, traceable, and far less likely to surprise you at 2am on a production deploy.

Frequently Asked Questions

Can ChatGPT really generate a working SQL migration script or does it always need heavy editing?

ChatGPT can generate a working migration script when given precise DDL input, explicit constraint requirements, and a staged output structure. Without that context it tends to guess at types and omit constraints, which requires significant editing before the script is safe to run.

What should I always include in a ChatGPT prompt for a database migration?

Always include the full CREATE TABLE DDL for every table involved, the source and target database engines, a list of indexes and constraints to preserve, and an explicit request for rollback logic and row count verification after each stage.

How do I handle a ChatGPT-generated migration script that fails halfway through?

If you requested staged scripts with each stage wrapped in a transaction, a failure will roll back only the current stage and leave previous stages intact. This is why breaking migrations into small, independently committable stages is critical — it limits the blast radius of any single failure.

Is it safe to run a ChatGPT-generated migration directly on a production database?

No — always run it against a restored copy of your production database first. Real production data often contains encoding edge cases, unexpected nulls, and value formats that clean test data doesn't expose, and these can cause a script that passed on staging to fail or produce incorrect results on production.

How do I get ChatGPT to handle type mapping correctly when migrating from MySQL to PostgreSQL?

Ask ChatGPT to explicitly list every type mapping it applies and explain the reasoning behind each choice. Then review that list against PostgreSQL documentation to catch semantic differences, particularly around boolean handling, timezone behavior in timestamps, and text length semantics.

📤 Share this article

Sign in to save

Comments (0)

No comments yet. Be the first!

Leave a Comment

Sign in to comment with your profile.

📬 Weekly Newsletter

Stay ahead of the curve

Get the best programming tutorials, data analytics tips, and tool reviews delivered to your inbox every week.

No spam. Unsubscribe anytime.