"We achieved 100% code coverage!" The junior dev was ecstatic. AI had generated 1,247 tests overnight. Every single line of code was tested. The coverage report was a beautiful sea of green.
Then we pushed a breaking change to production. Not a single test failed.
The Test That Tests Nothing
Here's what AI was generating:
// The function we're testing
function calculateDiscount(price, userType, promoCode) {
  if (userType === 'premium') {
    price *= 0.8;
  }
  if (promoCode === 'SAVE20') {
    price *= 0.8;
  }
  return Math.max(price, 10); // Minimum $10
}

// AI-generated test
it('should calculate discount', () => {
  const result = calculateDiscount(100, 'premium', 'SAVE20');
  expect(result).toBeDefined();
  expect(typeof result).toBe('number');
  expect(result).toBeGreaterThan(0);
});
Technically correct. Completely useless. The test passes whether the function returns 64 (the right answer), 80, or 99.99. It verifies that the function exists and returns a positive number, not that it calculates discounts correctly.
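To make that concrete, here's a deliberately broken version (hypothetical, written just for illustration) that still satisfies every assertion in the AI-generated test above:
// A broken implementation that ignores every input
function calculateDiscount(price, userType, promoCode) {
  return 99.99;
}
// Defined? Yes. A number? Yes. Greater than 0? Yes. Test passes. Business logic: gone.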
The Coverage Illusion
Our coverage report showed 100%. Here's what was actually tested:
- ✅ Functions exist
- ✅ Functions return something
- ✅ No syntax errors
- ❌ Business logic
- ❌ Edge cases
- ❌ Integration between components
- ❌ Actual user scenarios
// Another AI masterpiece
describe('UserService', () => {
  let userService;
  let mockDatabase;

  beforeEach(() => {
    mockDatabase = {
      findUser: jest.fn().mockResolvedValue({ id: 1, name: 'Test' }),
      saveUser: jest.fn().mockResolvedValue({ success: true })
    };
    userService = new UserService(mockDatabase);
  });

  it('should get user', async () => {
    const user = await userService.getUser(1);
    expect(user).toBeDefined();
    expect(mockDatabase.findUser).toHaveBeenCalled();
  });
});
The test passes even if getUser is completely broken. We're testing our mock, not our code: findUser always resolves with a user, so the assertions always hold.
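Here's a sketch of what the same test could assert instead, inside the same describe block and assuming UserService takes the database as a constructor dependency (as in the setup above), so that a broken getUser actually fails:
it('returns the user the database finds for that id', async () => {
  const user = await userService.getUser(1);

  // Fails if getUser ignores the id...
  expect(mockDatabase.findUser).toHaveBeenCalledWith(1);
  // ...or drops or mangles the data on the way back
  expect(user).toEqual({ id: 1, name: 'Test' });
});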
The Snapshot Trap
// AI loves snapshot tests
it('renders correctly', () => {
  const component = render(<UserProfile user={mockUser} />);
  expect(component).toMatchSnapshot();
});
// Generated snapshot: 15,000 lines of DOM
// Includes timestamps, random IDs, implementation details
Now every tiny CSS change breaks 50 tests. Developers just update snapshots without looking. Real bugs hide in the noise of constant snapshot updates.
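A sketch of the alternative, assuming the project uses React Testing Library with jest-dom, and assuming (purely for illustration) that UserProfile renders a name and an email: assert only on what the user actually sees.
import { render, screen } from '@testing-library/react';

it('shows the user name and email', () => {
  render(<UserProfile user={mockUser} />);

  // Only regressions in visible output fail this test
  expect(screen.getByText(mockUser.name)).toBeInTheDocument();
  expect(screen.getByText(mockUser.email)).toBeInTheDocument();
});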
The Copy-Paste Pattern
// AI noticed a pattern and ran with it
it('should handle null input', () => {
  expect(() => processData(null)).not.toThrow();
});

it('should handle undefined input', () => {
  expect(() => processData(undefined)).not.toThrow();
});

it('should handle empty string input', () => {
  expect(() => processData('')).not.toThrow();
});

it('should handle empty array input', () => {
  expect(() => processData([])).not.toThrow();
});
// 47 more tests like this...
High coverage? Check. Useful tests? Not even close. The AI treats test writing as a coverage optimization problem, not a quality assurance problem.
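If those inputs genuinely matter, a single parameterized test with a real expectation says more than 50 near-duplicates. A sketch, with processData's exact contract assumed here for illustration:
// Assumed contract: empty-ish input yields an empty result
it.each([
  ['null', null],
  ['undefined', undefined],
  ['an empty string', ''],
  ['an empty array', []],
])('returns an empty result for %s', (_label, input) => {
  expect(processData(input)).toEqual([]);
});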
The Mock Everything Approach
// AI's integration test
jest.mock('./database');
jest.mock('./emailService');
jest.mock('./logger');
jest.mock('./cache');
jest.mock('./metrics');
jest.mock('./auth');

it('should process order', async () => {
  // Every dependency is mocked
  const result = await processOrder(mockOrder);
  expect(result.success).toBe(true);
});
// What's actually being tested? Nothing real.
When everything is mocked, you're not testing integration - you're testing your ability to write mocks.
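Contrast that with an integration test that only stubs the true external boundary. The helpers below (createTestDatabase, buildTestOrder) are illustrative names, not real project code, but the shape is the point:
// Only the genuinely external service is mocked
jest.mock('./emailService');

it('persists a paid order end to end', async () => {
  const db = await createTestDatabase();          // real (in-memory) test database
  const order = await processOrder(buildTestOrder(), { db });

  const saved = await db.orders.findById(order.id);
  expect(saved.status).toBe('paid');              // real write, real read
});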
The Real Cost
- False confidence: "We have tests!" (that test nothing)
- Maintenance burden: 1000 bad tests to update
- Slow CI/CD: Running useless tests takes time
- Hidden bugs: Real issues slip through
- Developer fatigue: Updating broken tests constantly
What Good Tests Actually Look Like
// Test actual behavior, not implementation
describe('calculateDiscount', () => {
  it('applies 20% discount for premium users', () => {
    expect(calculateDiscount(100, 'premium', null)).toBe(80);
  });

  it('applies promo code discount', () => {
    expect(calculateDiscount(100, 'regular', 'SAVE20')).toBe(80);
  });

  it('stacks discounts for premium users with promo', () => {
    expect(calculateDiscount(100, 'premium', 'SAVE20')).toBe(64);
  });

  it('enforces minimum price of $10', () => {
    // 12 * 0.8 * 0.8 = 7.68, clamped up to the $10 floor
    expect(calculateDiscount(12, 'premium', 'SAVE20')).toBe(10);
  });

  it('handles invalid promo codes', () => {
    expect(calculateDiscount(100, 'regular', 'INVALID')).toBe(100);
  });
});
Notice the difference? We're testing actual business rules, not just checking if functions exist.
The Right Way to Use AI for Testing
1. AI generates test scenarios (not code). Ask: "What edge cases should I test for a discount calculation function?"
   - Negative prices
   - Extremely large numbers
   - Invalid user types
   - Null/undefined inputs
   - Multiple stacked discounts
2. Human reviews scenarios for completeness
   - Are these realistic?
   - What's missing?
   - What's the business impact?
3. AI writes initial test code
   - With specific scenarios
   - Clear assertions
   - Meaningful test names
4. Human verifies tests actually fail when code is broken (see the sketch after this list)
   - Break the implementation
   - Tests should catch it
   - If not, fix the test
5. Keep only valuable tests
   - Does it catch real bugs?
   - Is it maintainable?
   - Does it document behavior?
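Step 4 is the one that separates real tests from decoration. A minimal manual version of it, using the discount example above (mutation-testing tools automate this, but the idea fits in a few lines):
// A "mutant" that forgets the premium discount entirely
const mutantCalculateDiscount = (price, userType, promoCode) => {
  if (promoCode === 'SAVE20') price *= 0.8;
  return Math.max(price, 10);
};

// Point the behavioral suite at the mutant and run it:
// - 'applies 20% discount for premium users' expects 80, gets 100 -> fails (good)
// - the toBeDefined()/toBeGreaterThan(0) test still passes -> that test gets deleted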
Red Flags in AI-Generated Tests
- Tests that never fail when you break the code
- Excessive mocking, especially of the thing being tested (see the sketch after this list)
- Tests that just check types or existence
- Copy-pasted tests with minor variations
- No edge case testing
- No error case testing
- Snapshot tests of entire components
- Tests with no clear assertion
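Mocking the unit under test sounds too absurd to happen, and it happens constantly. A sketch (module path illustrative): once the module itself is mocked, the test can only ever pass.
// The module under test is mocked away...
jest.mock('./calculateDiscount', () => ({
  calculateDiscount: jest.fn().mockReturnValue(80),
}));
const { calculateDiscount } = require('./calculateDiscount');

it('should calculate discount', () => {
  // ...so this asserts the mock's return value, not the real code
  expect(calculateDiscount(100, 'premium', null)).toBe(80);
});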
The Lesson
After our production incident, we deleted 80% of our AI-generated tests. The remaining 20% we rewrote to actually test behavior. Coverage dropped from 100% to 75%, but bug detection increased by 300%.
AI is great at generating test boilerplate. It's terrible at understanding what actually needs testing. Use it as a starting point, not the final word.
Remember: A test that always passes is worse than no test at all. At least with no test, you know you're not protected. Bad tests give you false confidence while bugs sneak into production.
The goal isn't coverage. The goal is confidence that your code works correctly. AI-generated tests optimize for the metric, not the goal.