The Day I Accidentally DDoSed OpenAI (And Got Rate Limited Into Oblivion)
"We need to process 50,000 documents by tomorrow." Famous last words. I had a brilliant idea: parallelize everything. What could go wrong with firing off 1000 concurrent API calls to OpenAI?
Spoiler: Everything. Everything could go wrong.
The Setup for Disaster
The task seemed simple: analyze 50,000 customer support tickets, categorize them, and extract insights. Sequential processing would take 30 hours. But I'm smart! I know about async/await! I know about Promise.all!
// What could possibly go wrong?
const results = await Promise.all(
  tickets.map(ticket => analyzeTicket(ticket))
);
First 100 requests: Beautiful. Fast. This is working!
Next 200 requests: Still good, feeling like a genius.
Request 301: Rate limit error.
Requests 302-1000: Rate limit error.
Requests 1001-50000: You're funny if you think these even tried.
The Cascade of Failure
Here's what actually happened in those 30 seconds:
- Node.js happily spawned 1000 concurrent requests
- OpenAI's rate limiter saw 1000 requests from one API key in 2 seconds
- Rate limiter: "That's a paddlin'"
- My API key got temporarily banned
- Every request started returning 429 errors
- My retry logic kicked in
- Now I'm sending 2000 requests per second
- Rate limiter: "That's a bigger paddlin'"
The Retry Logic That Made Everything Worse
// My "smart" retry logic
async function makeRequest(data, retries = 3) {
try {
return await openai.complete(data);
} catch (error) {
if (error.status === 429 && retries > 0) {
await sleep(1000); // I thought 1 second was enough LOL
return makeRequest(data, retries - 1);
}
throw error;
}
}
See the problem? Every failed request waits exactly one second, then tries again. When 1000 requests fail together, they all retry at the same instant. It's a thundering herd, but dumber.
The Email From OpenAI
Two hours later:
"We've noticed unusual activity from your API key. Your account has been temporarily restricted. This appears to be unintentional, but please implement rate limiting..."
"Appears to be unintentional" - even OpenAI knew I was just being stupid, not malicious.
What Rate Limits Actually Mean
Here's what I learned the hard way about OpenAI's rate limits:
- Requests Per Minute (RPM): Not suggestions, hard limits
- Tokens Per Minute (TPM): The sneaky one that gets you
- Requests Per Day (RPD): Yes, this exists too
- Concurrent requests: There's an undocumented limit here
The kicker? You can hit the TPM limit even while staying under the RPM limit. One large request can eat your entire token budget for the minute.
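My defense now is to estimate tokens before each request and pause when the minute's budget is nearly spent. Below is a minimal sketch: TPM_LIMIT is a placeholder (use your tier's actual limit), estimateTokens is a crude roughly-4-characters-per-token heuristic rather than a real tokenizer, and the fixed 60-second reset window is a simplification.
// A rough tokens-per-minute guard (a sketch, not exact tokenization)
const TPM_LIMIT = 90000; // placeholder: check your tier's real TPM limit

// Crude heuristic: roughly 4 characters per token for English text
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

let tokensThisMinute = 0;
setInterval(() => { tokensThisMinute = 0; }, 60000); // fixed-window reset (simplified)

async function sendWithTokenBudget(prompt) {
  const estimated = estimateTokens(prompt);
  // Pause until the current window has room for this request
  while (tokensThisMinute + estimated > TPM_LIMIT) {
    await sleep(1000);
  }
  tokensThisMinute += estimated;
  return openai.complete(prompt);
}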
The Right Way to Process in Bulk
Here's what actually works:
// Batch processing with proper rate limiting
class RateLimiter {
  constructor(requestsPerMinute) {
    this.queue = [];
    this.processing = false;
    this.interval = 60000 / requestsPerMinute; // ms between requests
  }

  async add(fn) {
    return new Promise((resolve, reject) => {
      this.queue.push({ fn, resolve, reject });
      if (!this.processing) this.process();
    });
  }

  async process() {
    this.processing = true;
    while (this.queue.length > 0) {
      const { fn, resolve, reject } = this.queue.shift();
      try {
        const result = await fn();
        resolve(result);
      } catch (error) {
        reject(error);
      }
      await sleep(this.interval);
    }
    this.processing = false;
  }
}
// Usage: Promise.all is safe now because every call is gated through the limiter's queue
const limiter = new RateLimiter(50); // 50 requests per minute
const results = await Promise.all(
  tickets.map(ticket =>
    limiter.add(() => analyzeTicket(ticket))
  )
);
The Exponential Backoff That Actually Works
async function makeRequestWithBackoff(data, attempt = 0) {
  try {
    return await openai.complete(data);
  } catch (error) {
    if (error.status === 429 && attempt < 5) {
      const delay = Math.min(1000 * Math.pow(2, attempt), 30000);
      console.log(`Rate limited, waiting ${delay}ms...`);
      await sleep(delay + Math.random() * 1000); // Add jitter
      return makeRequestWithBackoff(data, attempt + 1);
    }
    throw error;
  }
}
The key? Exponential backoff WITH jitter. Without jitter, all your retries happen at the same time again.
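The two pieces compose. Here's a sketch of how I'd wire them together, reusing the limiter from the RateLimiter section above and assuming each ticket payload is what openai.complete expects:
// Pacing + backoff together: the limiter spaces requests out,
// and the backoff handles any 429s that still slip through
const analyzed = await Promise.all(
  tickets.map(ticket =>
    limiter.add(() => makeRequestWithBackoff(ticket))
  )
);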
Monitoring to Prevent Disaster
Now I always track:
let stats = {
  requests: 0,
  tokens: 0,
  errors: 0,
  rateLimit429s: 0,
  startTime: Date.now()
};

// Before each request
stats.requests++;
stats.tokens += estimateTokens(prompt);

// Check if we're approaching the per-minute limit
// (floor the window at one minute so the early average isn't inflated)
const elapsedMinutes = (Date.now() - stats.startTime) / 60000;
const requestsPerMinute = stats.requests / Math.max(elapsedMinutes, 1);
if (requestsPerMinute > 40) {
  console.warn('Approaching rate limit, slowing down...');
  await sleep(2000);
}
The Batching Strategy
Instead of firing 1000 individual requests at once, process them in controlled chunks:
// Process in chunks
const BATCH_SIZE = 20;
const results = [];

for (let i = 0; i < tickets.length; i += BATCH_SIZE) {
  const batch = tickets.slice(i, i + BATCH_SIZE);
  const batchResults = await Promise.all(
    batch.map(ticket => analyzeTicket(ticket))
  );
  results.push(...batchResults);

  // Wait between batches
  if (i + BATCH_SIZE < tickets.length) {
    await sleep(60000 / 3); // 3 batches per minute
  }
}
Lessons Learned the Hard Way
- Start slow, scale up - Test with 10 requests, then 100, then 1000
- Monitor everything - Track requests, tokens, errors in real-time
- Implement circuit breakers - Stop everything if the error rate spikes (see the sketch after this list)
- Use queues, not Promise.all - Control the flow
- Add jitter to retries - Prevent thundering herd
- Respect the limits - They exist for a reason
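That circuit breaker point deserves code. Here's a minimal sketch of the idea: count consecutive failures, and once too many pile up, refuse to send anything until a cooldown expires. The thresholds are illustrative, not tuned values:
// Minimal circuit breaker: trip after too many consecutive failures,
// then refuse new work until the cooldown expires
class CircuitBreaker {
  constructor(maxFailures = 5, cooldownMs = 60000) {
    this.maxFailures = maxFailures;
    this.cooldownMs = cooldownMs;
    this.failures = 0;
    this.openUntil = 0;
  }

  async run(fn) {
    // Refuse new work while the circuit is open
    if (Date.now() < this.openUntil) {
      throw new Error('Circuit open: cooling down, not sending requests');
    }
    try {
      const result = await fn();
      this.failures = 0; // any success closes the circuit
      return result;
    } catch (error) {
      this.failures++;
      if (this.failures >= this.maxFailures) {
        this.openUntil = Date.now() + this.cooldownMs; // trip the breaker
      }
      throw error;
    }
  }
}
Wrap each call in breaker.run(() => makeRequestWithBackoff(ticket)) and the whole pipeline pauses instead of hammering a service that's already telling you to stop.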
The Happy Ending
After implementing proper rate limiting:
- Processed all 50,000 tickets in 18 hours
- Zero rate limit errors
- Actually faster than my chaotic approach
- OpenAI support even complimented my implementation
The moral? Sometimes "slower" is faster when it means not getting banned from the service you depend on.
Pro tip: If you're building something that needs high throughput, talk to OpenAI sales about rate limit increases. They're surprisingly accommodating if you demonstrate you know what you're doing. Which I clearly didn't at first.