AI Regex: Now You Have Two Problems
"Some people, when confronted with a problem, think 'I know, I'll use regular expressions.' Now they have two problems." - Jamie Zawinski
When you ask AI to write regex, you get three problems. The third is explaining to your team why the email validator is 3,000 characters long and requires 2GB of RAM.
The Simple Request
Me: "I need a regex to validate email addresses"
AI: "Here's a comprehensive email validation regex!"
/(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")/
Me: "What have you done"
The Phone Number Catastrophe
// Request: "Validate US phone numbers"
// AI's response:
const phoneRegex = /^(?:(?:\+?1\s*(?:[.-]\s*)?)?(?:\(\s*([2-9]1[02-9]|[2-9][02-8]1|[2-9][02-8][02-9])\s*\)|([2-9]1[02-9]|[2-9][02-8]1|[2-9][02-8][02-9]))\s*(?:[.-]\s*)?)?([2-9]1[02-9]|[2-9][02-9]1|[2-9][02-9]{2})\s*(?:[.-]\s*)?([0-9]{4})(?:\s*(?:#|x\.?|ext\.?|extension)\s*(\d+))?$/;
// Matches:
// ✓ (555) 123-4567
// ✓ 555.123.4567
// ✓ +1-555-123-4567
// ✓ 555-GET-FOOD (wait, what?)
// ✓ 123456789012345 (that's... too many)
// ✓ My childhood trauma (somehow)
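One way to sidestep a pattern like that is to normalize first and count digits afterwards. A minimal sketch, assuming 10-digit US numbers with an optional leading 1 and no extensions; `isPlausibleUsPhone` is a made-up name, not a library function:

```javascript
// Hypothetical helper: accepts common US formats, rejects letters and wrong lengths.
function isPlausibleUsPhone(input) {
  // Only digits, whitespace, and the usual punctuation are allowed at all.
  if (!/^[\d\s().+-]+$/.test(input)) return false;
  // Drop everything that is not a digit.
  const digits = input.replace(/\D/g, '');
  // Strip an optional leading country code 1.
  const national =
    digits.length === 11 && digits[0] === '1' ? digits.slice(1) : digits;
  return national.length === 10;
}

console.log(isPlausibleUsPhone('+1-555-123-4567'));
console.log(isPlausibleUsPhone('555-GET-FOOD'));
```

It does not check area-code rules, which is a deliberate trade: a few false positives in exchange for something a teammate can read.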
The URL Validator of Doom
// AI's URL regex (actual output, shortened for sanity)
/^(?:(?:(?:https?|ftp):)?\/\/)(?:\S+(?::\S*)?@)?(?:(?!(?:10|127)(?:\.\d{1,3}){3})(?!(?:169\.254|192\.168)(?:\.\d{1,3}){2})(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[a-z\u00a1-\uffff0-9]-*)*[a-z\u00a1-\uffff0-9]+)(?:\.(?:[a-z\u00a1-\uffff0-9]-*)*[a-z\u00a1-\uffff0-9]+)*(?:\.(?:[a-z\u00a1-\uffff]{2,})))(?::\d{2,5})?(?:[\/?#]\S*)?$/i
// Browser: *catches fire*
// CPU: "I need a vacation"
// Me: "Maybe just check for 'http' at the start?"
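For what it's worth, the runtime already ships a URL parser. A hedged sketch using the standard `URL` constructor (available in browsers and Node.js); `isHttpUrl` is a hypothetical helper, and it only checks syntax and scheme, not whether the host is private or reachable:

```javascript
// Hypothetical helper: let the built-in parser decide whether the text is a URL.
function isHttpUrl(text) {
  try {
    const url = new URL(text); // throws a TypeError on unparseable input
    return url.protocol === 'http:' || url.protocol === 'https:';
  } catch {
    return false;
  }
}

console.log(isHttpUrl('https://example.com/path?q=1'));
console.log(isHttpUrl('not a url'));
```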
Real AI Regex Disasters
The Password Validator That Validates Everything
// Requirements: 8+ chars, 1 uppercase, 1 lowercase, 1 number, 1 special
// AI's regex:
/^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$/
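// Looks good until:
"Password1!" // ✓ Valid (and famously weak)
"P@ssw0rd" // ✓ Valid (ditto)
"Correct-Horse-Battery-9!" // ✗ Rejected ('-' is not on the approved list)
"Tr0ub4dor#3" // ✗ Rejected ('#' doesn't count as special)
// The bug: the final character class only allows seven special characters,
// so any other symbol fails the whole match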
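The same requirements read more clearly as separate checks, one per rule. A minimal sketch, assuming "special" means any character that is not an ASCII letter, a digit, or whitespace (the requirements above do not say); `meetsPasswordRules` is a made-up name:

```javascript
// Assumption: a "special" character is anything except ASCII letters, digits, and whitespace.
const SPECIAL = /[^A-Za-z0-9\s]/;

// Hypothetical helper: one readable test per requirement.
function meetsPasswordRules(password) {
  return (
    password.length >= 8 &&
    /[a-z]/.test(password) && // at least one lowercase letter
    /[A-Z]/.test(password) && // at least one uppercase letter
    /\d/.test(password) && // at least one digit
    SPECIAL.test(password) // at least one special character
  );
}

console.log(meetsPasswordRules('Correct-Horse-Battery-9!'));
console.log(meetsPasswordRules('AAAAAAAA1!'));
```

When a rule fails, it is also obvious which line to look at.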
The Credit Card Validator
// AI regex for credit cards
/^(?:4[0-9]{12}(?:[0-9]{3})?|5[1-5][0-9]{14}|3[47][0-9]{13}|3(?:0[0-5]|[68][0-9])[0-9]{11}|6(?:011|5[0-9]{2})[0-9]{12}|(?:2131|1800|35\d{3})\d{11})$/
// Validates card numbers!
// Also validates my social security number
// And my phone number with extra digits
// But not actual valid test cards
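A digit-pattern match says nothing about whether a number is internally consistent. Card numbers carry a Luhn check digit, which is easier to verify in a loop than in a pattern. A minimal sketch; `passesLuhn` is a hypothetical helper, and the 12 to 19 digit range is an assumption:

```javascript
// Hypothetical helper: Luhn checksum over a card number given as a string.
function passesLuhn(cardNumber) {
  // Allow spaces and hyphens as separators, then require digits only.
  const digits = cardNumber.replace(/[\s-]/g, '');
  if (!/^\d{12,19}$/.test(digits)) return false; // assumed length range

  let sum = 0;
  for (let i = 0; i < digits.length; i++) {
    // Walk from the rightmost digit; double every second one.
    let value = Number(digits[digits.length - 1 - i]);
    if (i % 2 === 1) {
      value *= 2;
      if (value > 9) value -= 9;
    }
    sum += value;
  }
  return sum % 10 === 0;
}

console.log(passesLuhn('4111 1111 1111 1111'));
console.log(passesLuhn('123-45-6789'));
```

A passing checksum only means the number is well formed; whether the card exists is a question for the payment processor.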
The Catastrophic Backtracking Special
// AI's "simple" regex for parsing HTML (don't do this)
/<(\w+)(\s+\w+\s*=\s*("[^"]*"|'[^']*'|[^>]*))*\s*>/
// Test string: <div class="test" id="boom">
// Time: 0.1ms ✓
// Test string: <div class="test" id="boom" data-value="<nested>">
// Time: 47 seconds
// CPU: 100%
// Fans: Taking off
// Catastrophic backtracking has entered the chat
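Much of the trouble comes from the `[^>]*` fallback sitting inside a repeated group: when a match fails, the engine can split the same run of attributes in many different ways before giving up. A hedged sketch of a tighter pattern, assuming attribute values are always quoted; it is still not an HTML parser, and `parseOpenTag` is a name invented here:

```javascript
// Each repetition must start with whitespace and a name and end with a quoted value,
// so there is only one way to divide the attributes.
const OPEN_TAG = /<(\w+)((?:\s+[\w-]+\s*=\s*(?:"[^"]*"|'[^']*'))*)\s*>/;

// Hypothetical helper: returns the tag name and the raw attribute text, or null.
function parseOpenTag(text) {
  const match = OPEN_TAG.exec(text);
  if (!match) return null;
  return { tag: match[1], attributes: match[2].trim() };
}

console.log(parseOpenTag('<div class="test" id="boom" data-value="<nested>">'));
console.log(parseOpenTag('<div class="test"'));
```

For anything beyond a single opening tag, a real HTML parser is the safer choice.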
The International Disaster
// Me: "Validate international names"
// AI: "I'll handle all Unicode!"
/^[\p{L}\p{M}\p{Zs}'-]{2,50}$/u
// Sounds good until:
"José" // ✓
"Māori" // ✓
"null" // ✓ (Actual name in some cultures)
"<script>alert('hi')</script>" // ✗ Good
"👨👩👧👦" // ✗ Also good (emoji are symbols, not letters)
"--" // ✓ That's... punctuation
"\u00A0\u00A0" // ✓ Those are two no-break spaces, so an invisible name
The Date Parser From Hell
// AI's date validation regex
/^(?:(?:31(\/|-|\.)(?:0?[13578]|1[02]))\1|(?:(?:29|30)(\/|-|\.)(?:0?[13-9]|1[0-2])\2))(?:(?:1[6-9]|[2-9]\d)?\d{2})$|^(?:29(\/|-|\.)0?2\3(?:(?:(?:1[6-9]|[2-9]\d)?(?:[048]|[2468][048]|[13579][26])|(?:(?:16|[2468][048]|[3579][26])00))))$|^(?:0?[1-9]|1\d|2[0-8])(\/|-|\.)(?:(?:0?[1-9])|(?:1[0-2]))\4(?:(?:1[6-9]|[2-9]\d)?\d{2})$/
// Handles leap years!
// Also handles:
// 31/02/2024 (February 31st?)
// 99/99/9999 (The end times)
// My will to live (gone)
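Calendar rules are arithmetic, not pattern matching, so it is usually simpler to let `Date` do the checking. A minimal sketch, assuming day/month/four-digit-year input with a consistent separator; `isRealDate` is a hypothetical name:

```javascript
// Hypothetical helper: a tiny regex splits the text, Date validates the calendar.
function isRealDate(text) {
  // \2 forces the second separator to match the first one.
  const match = /^(\d{1,2})([\/.-])(\d{1,2})\2(\d{4})$/.exec(text);
  if (!match) return false;

  const day = Number(match[1]);
  const month = Number(match[3]);
  const year = Number(match[4]);

  // Out-of-range values roll over (31 Feb becomes early March),
  // so a real date is one that survives the round trip unchanged.
  const date = new Date(Date.UTC(year, month - 1, day));
  return (
    date.getUTCFullYear() === year &&
    date.getUTCMonth() === month - 1 &&
    date.getUTCDate() === day
  );
}

console.log(isRealDate('29/02/2024'));
console.log(isRealDate('31/02/2024'));
```

Leap years come for free, and nobody has to maintain a list of which months have 31 days.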
The Regex That Became Sentient
// AI tried to validate... everything
/^(?=.*[a-zA-Z])(?=.*\d)(?=.*[!@#$%^&*()_+\-=\[\]{};':'\\|,.<>\/?])(?=.*[^\w\s])(?!.*\s)(?!.*(.)\1{2,})(?!.*(012|123|234|345|456|567|678|789|890|098|987|876|765|654|543|432|321|210))(?!.*(abc|bcd|cde|def|efg|fgh|ghi|hij|ijk|jkl|klm|lmn|mno|nop|opq|pqr|qrs|rst|stu|tuv|uvw|vwx|wxy|xyz))(?!.*(password|123456|12345678|qwerty|abc123|monkey|1234567|letmein|trustno1|dragon|baseball|111111|iloveyou|master|sunshine|ashley|bailey|shadow|123123|654321|superman|michael)).*$/
// I asked for password validation
// AI delivered existential dread
How to Not Regex with AI
Option 1: Use a Library
// Instead of regex hell
import validator from 'validator';
if (validator.isEmail(email)) {
  // Done. No regex. No tears.
}
Option 2: Simple and Sufficient
// Email: has @ and a dot after it
const isValidEmail = (email) => {
  const parts = email.split('@');
  return parts.length === 2 && parts[1].includes('.');
};
// Covers 99% of cases, readable by humans
Option 3: Progressive Enhancement
// Start simple
let emailRegex = /@/;
// Add complexity only if needed
emailRegex = /.+@.+/;
// Still readable
emailRegex = /.+@.+\..+/;
// Stop here. Please.
AI Regex Prompt That Actually Works
"Create a SIMPLE regex for [purpose].
Requirements:
- Maximum 50 characters
- No lookaheads/lookbehinds unless essential
- Must be readable by humans
- Prefer false positives over complexity
- Include test cases
- Explain what each part does"
The Debugging Nightmare
// Coworker: "The regex isn't working"
// Me: "Which part?"
// The regex:
/^(?:(?:(?:(?:(?:(?:[a-zA-Z0-9])|(?:[a-zA-Z0-9][a-zA-Z0-9-]{0,61}[a-zA-Z0-9]))\.)*(?:(?:[a-zA-Z0-9])|(?:[a-zA-Z0-9][a-zA-Z0-9-]{0,61}[a-zA-Z0-9])))|\[(?:(?:(?:(?:(?:[0-9])|(?:[1-9][0-9])|(?:1[0-9]{2})|(?:2[0-4][0-9])|(?:25[0-5]))\.){3}(?:(?:[0-9])|(?:[1-9][0-9])|(?:1[0-9]{2})|(?:2[0-4][0-9])|(?:25[0-5])))|(?:(?:(?:(?:(?:(?:[0-9a-fA-F]{1,4})):){6}(?:(?:(?:(?:(?:[0-9a-fA-F]{1,4})):(?:(?:[0-9a-fA-F]{1,4})))|(?:(?:(?:(?:(?:[0-9])|(?:[1-9][0-9])|(?:1[0-9]{2})|(?:2[0-4][0-9])|(?:25[0-5]))\.){3}(?:(?:[0-9])|(?:[1-9][0-9])|(?:1[0-9]{2})|(?:2[0-4][0-9])|(?:25[0-5])))))))|(?:::(?:(?:(?:[0-9a-fA-F]{1,4})):){5}(?:(?:(?:(?:(?:[0-9a-fA-F]{1,4})):(?:(?:[0-9a-fA-F]{1,4})))|(?:(?:(?:(?:(?:[0-9])|(?:[1-9][0-9])|(?:1[0-9]{2})|(?:2[0-4][0-9])|(?:25[0-5]))\.){3}(?:(?:[0-9])|(?:[1-9][0-9])|(?:1[0-9]{2})|(?:2[0-4][0-9])|(?:25[0-5]))))))))\])$/
// Me: "Yes."
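Buried in the first branch of that pattern is a fairly small idea: a hostname is a series of dot-separated labels, each 1 to 63 characters long and alphanumeric at both ends. A sketch that says so directly, covering only the hostname branch and not the bracketed IP literals; `isHostname` and `LABEL` are names invented here:

```javascript
// One label: starts and ends with a letter or digit, hyphens allowed inside, 63 chars max.
const LABEL = /^[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?$/i;

// Hypothetical helper: split on dots and check every label on its own.
function isHostname(text) {
  return text.split('.').every((label) => LABEL.test(label));
}

console.log(isHostname('mail.example.com'));
console.log(isHostname('-bad.example.com'));
```

When a coworker asks which part is broken, the answer is now at most one short pattern.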
The Lessons Learned
- If you can't read it, you can't debug it
- AI doesn't understand "simple"
- Most validation doesn't need regex
- Test with edge cases, not happy paths
- Sometimes "good enough" is perfect
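The "edge cases, not happy paths" lesson is cheap to put into practice with a table of inputs and expected results. A minimal sketch; the pattern and the cases are illustrative only, and `failingCases` is a made-up helper:

```javascript
// Illustrative pattern: something@something.something, with no spaces or extra @ signs.
const emailPattern = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

// Each row is [input, expected result]. Most rows are deliberately unhappy paths.
const cases = [
  ['user@example.com', true],
  ['first.last+tag@mail.example.org', true],
  ['', false],
  ['no-at-sign.example.com', false],
  ['two@@example.com', false],
  ['trailing@dot.', false],
  ['space in@example.com', false],
];

// Hypothetical helper: returns the rows where the pattern disagrees with the table.
function failingCases(pattern, table) {
  return table.filter(([input, expected]) => pattern.test(input) !== expected);
}

// An empty array means every case behaved as expected.
console.log(failingCases(emailPattern, cases));
```

Every bug report then becomes one more row in the table rather than one more clause in the regex.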
My Favorite AI Regex Moment
// Me: "I need to match a number"
// AI: "Here's a comprehensive number matcher!"
/^[+-]?(?:(?:(?:\d{1,3}(?:(?:,\d{3})|(?:\.\d{3}))*)|(?:\d+))(?:\.\d+)?|\.\d+)(?:[eE][+-]?\d+)?$/
// Me: "I meant like... \d+"
// AI: "But what about scientific notation?"
// Me: "It's for a ZIP code"
// AI: "...oh"
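For the record, the ZIP code case fits comfortably inside a 50-character budget. A minimal sketch, assuming five digits with an optional four-digit ZIP+4 extension; `isZipCode` is a made-up name:

```javascript
// Five digits, optionally followed by a hyphen and four more.
const ZIP = /^\d{5}(?:-\d{4})?$/;

// Hypothetical helper wrapping the pattern.
function isZipCode(text) {
  return ZIP.test(text);
}

console.log(isZipCode('90210'));
console.log(isZipCode('1e5'));
```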
AI writing regex is like using a Formula 1 car for your morning commute - technically it works, but you'll spend more time fixing problems than solving them. Regular expressions are already write-only code; adding AI to the mix creates write-never-read-never code. Sometimes the best regex is no regex. And if you must use regex, remember: the goal is to match patterns, not to prove the Riemann hypothesis. Keep it simple, keep it readable, and keep your sanity.