The Day I Accidentally DDoSed OpenAI (And Got Rate Limited Into Oblivion)
"We need to process 50,000 documents by tomorrow." Famous last words. I had a brilliant idea: parallelize everything. What could go wrong with firing off 1000 concurrent API calls to OpenAI?
Spoiler: Everything. Everything could go wrong.
The Setup for Disaster
The task seemed simple: analyze 50,000 customer support tickets, categorize them, and extract insights. Sequential processing would take 30 hours. But I'm smart! I know about async/await! I know about Promise.all!
// What could possibly go wrong?
const results = await Promise.all(
  tickets.map(ticket => analyzeTicket(ticket))
);
First 100 requests: Beautiful. Fast. This is working!
Next 200 requests: Still good, feeling like a genius.
Request 301: Rate limit error.
Requests 302-1000: Rate limit error.
Requests 1001-50000: You're funny if you think these even tried.
The Cascade of Failure
Here's what actually happened in those 30 seconds:
- Node.js happily spawned 1000 concurrent requests
- OpenAI's rate limiter saw 1000 requests from one API key in 2 seconds
- Rate limiter: "That's a paddlin'"
- My API key got temporarily banned
- Every request started returning 429 errors
- My retry logic kicked in
- Now I'm sending 2000 requests per second
- Rate limiter: "That's a bigger paddlin'"
The Retry Logic That Made Everything Worse
// My "smart" retry logic
async function makeRequest(data, retries = 3) {
try {
return await openai.complete(data);
} catch (error) {
if (error.status === 429 && retries > 0) {
await sleep(1000); // I thought 1 second was enough LOL
return makeRequest(data, retries - 1);
}
throw error;
}
}
See the problem? Every failed request waits exactly one second, then tries again. When 1000 requests fail together, they all retry at the same instant. It's a thundering herd, but dumber.
The Email From OpenAI
Two hours later:
"We've noticed unusual activity from your API key. Your account has been temporarily restricted. This appears to be unintentional, but please implement rate limiting..."
"Appears to be unintentional" - even OpenAI knew I was just being stupid, not malicious.
What Rate Limits Actually Mean
Here's what I learned the hard way about OpenAI's rate limits:
- Requests Per Minute (RPM): Not suggestions, hard limits
- Tokens Per Minute (TPM): The sneaky one that gets you
- Requests Per Day (RPD): Yes, this exists too
- Concurrent requests: There's an undocumented limit here
The kicker? You can hit the TPM limit even while staying under the RPM limit. One large request can eat your entire token budget for the minute.
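My defense now is to estimate tokens before each request and pause when the minute's budget is nearly spent. Below is a minimal sketch: TPM_LIMIT is a placeholder (use your tier's actual limit), estimateTokens is a crude roughly-4-characters-per-token heuristic rather than a real tokenizer, and the fixed 60-second reset window is a simplification.
// A rough tokens-per-minute guard (a sketch, not exact tokenization)
const TPM_LIMIT = 90000; // placeholder: check your tier's real TPM limit

// Crude heuristic: roughly 4 characters per token for English text
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

let tokensThisMinute = 0;
setInterval(() => { tokensThisMinute = 0; }, 60000); // fixed-window reset (simplified)

async function sendWithTokenBudget(prompt) {
  const estimated = estimateTokens(prompt);
  // Pause until the current window has room for this request
  while (tokensThisMinute + estimated > TPM_LIMIT) {
    await sleep(1000);
  }
  tokensThisMinute += estimated;
  return openai.complete(prompt);
}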
The Right Way to Process in Bulk
Here's what actually works:
// Batch processing with proper rate limiting
class RateLimiter {
  constructor(requestsPerMinute) {
    this.queue = [];
    this.processing = false;
    this.interval = 60000 / requestsPerMinute; // ms between requests
  }

  async add(fn) {
    return new Promise((resolve, reject) => {
      this.queue.push({ fn, resolve, reject });
      if (!this.processing) this.process();
    });
  }

  async process() {
    this.processing = true;
    while (this.queue.length > 0) {
      const { fn, resolve, reject } = this.queue.shift();
      try {
        const result = await fn();
        resolve(result);
      } catch (error) {
        reject(error);
      }
      await sleep(this.interval);
    }
    this.processing = false;
  }
}
// Usage: Promise.all is safe now because every call is gated through the limiter's queue
const limiter = new RateLimiter(50); // 50 requests per minute
const results = await Promise.all(
  tickets.map(ticket =>
    limiter.add(() => analyzeTicket(ticket))
  )
);
The Exponential Backoff That Actually Works
async function makeRequestWithBackoff(data, attempt = 0) {
  try {
    return await openai.complete(data);
  } catch (error) {
    if (error.status === 429 && attempt < 5) {
      const delay = Math.min(1000 * Math.pow(2, attempt), 30000);
      console.log(`Rate limited, waiting ${delay}ms...`);
      await sleep(delay + Math.random() * 1000); // Add jitter
      return makeRequestWithBackoff(data, attempt + 1);
    }
    throw error;
  }
}
The key? Exponential backoff WITH jitter. Without jitter, all your retries happen at the same time again.
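The two pieces compose. Here's a sketch of how I'd wire them together, reusing the limiter from the RateLimiter section above and assuming each ticket payload is what openai.complete expects:
// Pacing + backoff together: the limiter spaces requests out,
// and the backoff handles any 429s that still slip through
const analyzed = await Promise.all(
  tickets.map(ticket =>
    limiter.add(() => makeRequestWithBackoff(ticket))
  )
);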
Monitoring to Prevent Disaster
Now I always track:
let stats = {
  requests: 0,
  tokens: 0,
  errors: 0,
  rateLimit429s: 0,
  startTime: Date.now()
};

// Before each request
stats.requests++;
stats.tokens += estimateTokens(prompt);

// Check if we're approaching the per-minute limit
// (floor the window at one minute so the early average isn't inflated)
const elapsedMinutes = (Date.now() - stats.startTime) / 60000;
const requestsPerMinute = stats.requests / Math.max(elapsedMinutes, 1);
if (requestsPerMinute > 40) {
  console.warn('Approaching rate limit, slowing down...');
  await sleep(2000);
}
The Batching Strategy
Instead of firing 1000 individual requests at once, process them in controlled chunks:
// Process in chunks
const BATCH_SIZE = 20;
const results = [];

for (let i = 0; i < tickets.length; i += BATCH_SIZE) {
  const batch = tickets.slice(i, i + BATCH_SIZE);
  const batchResults = await Promise.all(
    batch.map(ticket => analyzeTicket(ticket))
  );
  results.push(...batchResults);

  // Wait between batches
  if (i + BATCH_SIZE < tickets.length) {
    await sleep(60000 / 3); // 3 batches per minute
  }
}
Lessons Learned the Hard Way
- Start slow, scale up - Test with 10 requests, then 100, then 1000
- Monitor everything - Track requests, tokens, errors in real-time
- Implement circuit breakers - Stop everything if the error rate spikes (see the sketch after this list)
- Use queues, not Promise.all - Control the flow
- Add jitter to retries - Prevent thundering herd
- Respect the limits - They exist for a reason
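That circuit breaker point deserves code. Here's a minimal sketch of the idea: count consecutive failures, and once too many pile up, refuse to send anything until a cooldown expires. The thresholds are illustrative, not tuned values:
// Minimal circuit breaker: trip after too many consecutive failures,
// then refuse new work until the cooldown expires
class CircuitBreaker {
  constructor(maxFailures = 5, cooldownMs = 60000) {
    this.maxFailures = maxFailures;
    this.cooldownMs = cooldownMs;
    this.failures = 0;
    this.openUntil = 0;
  }

  async run(fn) {
    // Refuse new work while the circuit is open
    if (Date.now() < this.openUntil) {
      throw new Error('Circuit open: cooling down, not sending requests');
    }
    try {
      const result = await fn();
      this.failures = 0; // any success closes the circuit
      return result;
    } catch (error) {
      this.failures++;
      if (this.failures >= this.maxFailures) {
        this.openUntil = Date.now() + this.cooldownMs; // trip the breaker
      }
      throw error;
    }
  }
}
Wrap each call in breaker.run(() => makeRequestWithBackoff(ticket)) and the whole pipeline pauses instead of hammering a service that's already telling you to stop.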
The Happy Ending
After implementing proper rate limiting:
- Processed all 50,000 tickets in 18 hours
- Zero rate limit errors
- Actually faster than my chaotic approach
- OpenAI support even complimented my implementation
The moral? Sometimes "slower" is faster when it means not getting banned from the service you depend on.
Pro tip: If you're building something that needs high throughput, talk to OpenAI sales about rate limit increases. They're surprisingly accommodating if you demonstrate you know what you're doing. Which I clearly didn't at first.