Database Migrations With AI: A Cautionary Tale
"Can you write a migration to add a 'deleted_at' column for soft deletes?" Simple request. The AI delivered a migration that looked perfect. Ran it in staging. Everything worked. Ran it in production. 3.2 million rows vanished.
Let me tell you about the longest Tuesday of my life.
The "Simple" Migration
Here's what I asked for:
"Write a PostgreSQL migration to add soft delete functionality to our users table"
Here's what the AI gave me:
-- Add soft delete column
ALTER TABLE users ADD COLUMN deleted_at TIMESTAMP;
-- Create index for performance
CREATE INDEX idx_users_deleted_at ON users(deleted_at);
-- Update existing rows
UPDATE users SET deleted_at = NULL;
-- Add check constraint
ALTER TABLE users ADD CONSTRAINT check_deleted_at
CHECK (deleted_at IS NULL OR deleted_at <= CURRENT_TIMESTAMP);
Looks good, right? I thought so too.
The Disaster Unfolds
What the AI didn't mention:
- Our ORM had default scopes that filter out soft-deleted records
- Adding the column triggered the ORM to apply the scope
- But we hadn't updated the queries yet
- So everything looked "deleted" to the application
Timeline of panic:
2:00 PM - Run migration
2:01 PM - "Looks good!"
2:02 PM - Support: "Users can't log in'
2:03 PM - Check database - all users still there
2:04 PM - Check application - WHERE deleted_at IS NULL everywhere
2:05 PM - All queries returning 0 rows
2:06 PM - 🚨🚨🚨
More AI Migration Disasters
The Index Optimizer
-- AI: "This will improve query performance"
DROP INDEX idx_users_email;
CREATE INDEX CONCURRENTLY idx_users_email_lower ON users(LOWER(email));
-- What actually happened:
-- 1. DROP INDEX locked the table
-- 2. Application queries piled up
-- 3. Connection pool exhausted
-- 4. Site down for 8 minutes
The Type "Improvement"
-- AI: "VARCHAR(255) is outdated, use TEXT"
ALTER TABLE products ALTER COLUMN sku TYPE TEXT;
-- Reality: sku column had a UNIQUE constraint
-- PostgreSQL had to rebuild the entire index
-- On a 50 million row table
-- During Black Friday
-- I still have nightmares
The Helpful Rename
-- Me: "Rename created_time to created_at"
-- AI: "I'll also update it to follow conventions!'
ALTER TABLE orders RENAME COLUMN created_time TO created_at;
ALTER TABLE orders ALTER COLUMN created_at TYPE TIMESTAMP WITH TIME ZONE;
-- Problem: Column was TIMESTAMP WITHOUT TIME ZONE
-- All times got converted to UTC
-- Customer orders "shifted" by hours
-- Accounting department wanted my head
Why AI Struggles With Migrations
1. No Understanding of Side Effects
- Doesn't know about triggers
- Doesn't understand ORM behavior
- Can't predict application-level impacts
- Doesn't consider cascade effects
2. Optimizes for "Correctness" Not Safety
-- AI version (technically correct)
ALTER TABLE users DROP COLUMN legacy_field;
-- Safe version
ALTER TABLE users RENAME COLUMN legacy_field TO legacy_field_deprecated;
-- Drop it next month after confirming nothing breaks
3. Doesn't Understand Your Specific Setup
- Your replication lag
- Your backup schedule
- Your peak traffic times
- Your specific PostgreSQL version quirks
The Rules for AI-Assisted Migrations
Rule 1: AI Writes, Human Reviews, DBA Approves
1. AI generates migration
2. Developer reviews for logic
3. Run EXPLAIN on affected queries
4. Test in dev environment
5. Test in staging with production data copy
6. DBA reviews execution plan
7. Schedule for low-traffic window
8. Have rollback ready
Rule 2: Always Make Migrations Reversible
-- AI usually forgets the DOWN migration
-- Always add:
-- UP
ALTER TABLE users ADD COLUMN deleted_at TIMESTAMP;
-- DOWN
ALTER TABLE users DROP COLUMN deleted_at;
-- Better: Make it safer
-- DOWN
ALTER TABLE users RENAME COLUMN deleted_at TO deleted_at_backup;
Rule 3: Break Complex Migrations Into Steps
-- Instead of AI's all-in-one migration
-- Do this:
-- Migration 1: Add column (safe)
ALTER TABLE users ADD COLUMN new_field INTEGER;
-- Migration 2: Populate data (can monitor)
UPDATE users SET new_field = calculation WHERE ...;
-- Migration 3: Add constraints (after verification)
ALTER TABLE users ALTER COLUMN new_field SET NOT NULL;
What to Never Let AI Do
- Drop anything - Tables, columns, constraints, indexes
- Rename without aliases - Always keep old names temporarily
- Change column types - Especially timestamps, numerics
- Add NOT NULL without defaults - On existing tables
- Modify primary keys - Just... don't
The Safe Prompting Strategy
"Write a PostgreSQL migration to add soft deletes.
Requirements:
- Must be reversible
- Must not lock tables
- Must work with existing data
- Must include rollback plan
- Assume millions of rows
- Include timing estimates
- Note any application changes needed"
My Migration Checklist
After too many disasters:
â–¡ Can this be done without locking?
â–¡ What happens to existing queries?
â–¡ What's the rollback plan?
â–¡ Have I tested with production data volume?
â–¡ What monitoring do I need?
â–¡ Who needs to be on-call?
â–¡ Is there a maintenance window?
â–¡ Did I update the runbook?
â–¡ Am I really sure about this?
â–¡ Am I REALLY sure?
The Happy Ending
We recovered the "soft deleted" users by:
- Immediately removing the default scope from the ORM
- Running UPDATE users SET deleted_at = NULL
- Redeploying the application
- Pretending it was "scheduled maintenance"
- Implementing the migration properly over the weekend
Total downtime: 23 minutes
Lessons learned: Priceless
Trust in AI migrations: Zero
AI is great at writing migrations that work in theory. Production doesn't care about theory. Always assume your migration will run at the worst possible time, on more data than you expect, with side effects you didn't consider. Because it will.