Context Windows Are Lying to You
"Claude now supports 100k tokens!" "GPT-4 Turbo has 128k context!" Cool. Last week I fed Claude a 80k token codebase and asked it to refactor a function. It forgot the function existed and wrote a new one. With a different name. That did something completely different.
Let's talk about why those impressive context window numbers are basically marketing fiction.
The Great Context Window Scam
Here's what they don't tell you: just because a model accepts 128k tokens doesn't mean it can actually use them. It's like having a 1TB hard drive that corrupts any file over 50GB.
I ran an experiment. Fed various models increasingly large contexts and asked questions about content at different positions:
- First 10k tokens: Near perfect recall
- 10k-30k tokens: Pretty good, occasional misses
- 30k-50k tokens: Hit or miss, depends on the day
- 50k+ tokens: "I don't see that in the context" (it's literally right there)
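You can reproduce this yourself in an afternoon. Here's a minimal sketch of the test, assuming a hypothetical call_model() wrapper around whatever API you use: plant a unique "needle" fact at a chosen token depth inside filler text, then ask the model to repeat it back.

```python
# Minimal position-recall test: bury a "needle" fact at a given token depth
# and check whether the model can retrieve it. call_model() is a hypothetical
# wrapper around whatever chat API you're using.

FILLER = "The quick brown fox jumps over the lazy dog. "  # roughly 10 tokens per repeat
NEEDLE = "The deployment password is MAGENTA-7431."

def build_prompt(needle_position_tokens: int, total_tokens: int) -> str:
    """Bury the needle roughly needle_position_tokens deep in a prompt of total_tokens."""
    tokens_per_filler = 10  # rough estimate; swap in a real tokenizer if you care
    before = FILLER * (needle_position_tokens // tokens_per_filler)
    after = FILLER * ((total_tokens - needle_position_tokens) // tokens_per_filler)
    return f"{before}\n{NEEDLE}\n{after}\nQuestion: what is the deployment password?"

def run_recall_test(call_model, positions=(5_000, 25_000, 45_000, 65_000), total=70_000):
    for pos in positions:
        answer = call_model(build_prompt(pos, total))
        found = "MAGENTA-7431" in answer
        print(f"needle at ~{pos:>6} tokens -> {'recalled' if found else 'MISSED'}")
```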
The Middle Child Problem
You know what's worse than information at the end being forgotten? Information in the middle. Models have this weird tendency to remember the beginning and end but completely forget the middle exists.
Real example from last month:
PROMPT STRUCTURE:
- Instructions (tokens 0-1k)
- Codebase context (tokens 1k-60k)
- Important configuration (tokens 60k-62k)
- The actual task (tokens 62k-63k)
MODEL RESPONSE: "I'll help with that task, but I don't see any configuration in the context."
ME: *screams internally*
What Actually Works With Large Contexts
After tons of testing, here's the real usable context for different tasks:
Code Analysis: ~20-30k tokens max
Beyond this, the model starts missing important relationships between components. It'll see individual functions but miss how they connect.
Document Writing: ~10-15k tokens
Any more and it starts contradicting earlier sections or forgetting the document structure entirely.
Debugging/Troubleshooting: ~5-10k tokens
Need laser focus for debugging. Too much context and the model gets distracted by irrelevant code.
Refactoring: ~15-20k tokens
Enough to understand the surrounding architecture but not so much it loses track of what needs changing.
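One habit that helps: count tokens before you hit send and compare against these budgets. Here's a rough sketch using tiktoken; the budget numbers are just my estimates from above, not anything official, and counts will be approximate for non-OpenAI models.

```python
import tiktoken  # OpenAI's tokenizer; counts are approximate for other models

# My rough "real usable context" budgets per task, in tokens (estimates, not gospel)
TASK_BUDGETS = {
    "code_analysis": 30_000,
    "document_writing": 15_000,
    "debugging": 10_000,
    "refactoring": 20_000,
}

def check_budget(prompt: str, task: str) -> int:
    """Count tokens and warn if the prompt blows past the usable budget for the task."""
    enc = tiktoken.get_encoding("cl100k_base")
    n_tokens = len(enc.encode(prompt))
    budget = TASK_BUDGETS[task]
    if n_tokens > budget:
        print(f"WARNING: {n_tokens} tokens exceeds the ~{budget} that actually works for {task}")
    return n_tokens
```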
The Attention Degradation Curve
Here's what I've observed about how attention degrades:
Token Position | Attention Quality
0-5k | 100% - Crystal clear
5-10k | 95% - Nearly perfect
10-20k | 85% - Good enough
20-30k | 70% - Starting to miss things
30-50k | 50% - Coin flip whether it remembers
50-70k | 30% - Probably forgot
70k+ | 10% - Might as well not exist
This isn't exact science, but it's close enough for planning purposes.
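If you want something you can plug into a planning script, here's the same curve as a lookup. These are purely my eyeballed numbers from the table above, not a benchmark.

```python
# Eyeballed attention-quality estimates by token position, straight from the table above.
# Useful only for back-of-envelope planning, not for anything rigorous.
ATTENTION_CURVE = [
    (5_000, 1.00), (10_000, 0.95), (20_000, 0.85),
    (30_000, 0.70), (50_000, 0.50), (70_000, 0.30),
]

def estimated_attention(position_tokens: int) -> float:
    """Return a rough attention-quality estimate for content at this token position."""
    for upper_bound, quality in ATTENTION_CURVE:
        if position_tokens < upper_bound:
            return quality
    return 0.10  # 70k+: might as well not exist
```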
Context Window Hacks That Actually Work
1. The Summary Sandwich
Start: "Here's what we're doing: [brief summary]"
Middle: [Actual context]
End: "Remember, we're doing: [brief summary again]"
Redundant? Yes. Effective? Also yes.
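In code it's a trivial wrapper; here's a sketch, with the prompt wording being whatever fits your task:

```python
def summary_sandwich(summary: str, context: str, task: str) -> str:
    """Repeat a short summary before and after the bulky context so the model
    sees the goal in the two positions it actually pays attention to."""
    return (
        f"Here's what we're doing: {summary}\n\n"
        f"{context}\n\n"
        f"Remember, we're doing: {summary}\n"
        f"Task: {task}"
    )
```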
2. The Chunking Strategy
Instead of one 80k token monster prompt, I do:
- First prompt: Analyze structure (20k tokens)
- Second prompt: Deep dive on specific area (20k tokens)
- Third prompt: Implementation with focused context (20k tokens)
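Sketched out, the flow looks like this. call_model() is again a stand-in for your API client; the trick is that each step's output becomes a compact summary carried into the next step instead of re-sending everything.

```python
# Three focused prompts instead of one 80k monster. Each step's output becomes
# a compact summary that gets carried into the next step.
def chunked_refactor(call_model, codebase_overview: str, target_area: str, task: str) -> str:
    structure_summary = call_model(
        f"Summarize the architecture and key modules:\n{codebase_overview}"
    )
    area_analysis = call_model(
        f"Architecture summary:\n{structure_summary}\n\n"
        f"Deep dive on this area and list everything relevant to changing it:\n{target_area}"
    )
    return call_model(
        f"Architecture summary:\n{structure_summary}\n\n"
        f"Relevant details:\n{area_analysis}\n\n"
        f"Now do the task: {task}"
    )
```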
3. The Reference System
[REF-AUTH-001] Authentication system uses JWT with refresh tokens
[REF-DB-001] PostgreSQL with Prisma ORM
[REF-API-001] REST API with Express
Later in prompt: "Following REF-AUTH-001, implement logout..."
Creates anchors the model can latch onto.
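Here's a rough sketch of how I assemble those anchors programmatically. The REF IDs are just strings I made up; there's nothing magic about the format beyond being short and distinctive.

```python
# Build a block of reference anchors, then cite them by ID later in the prompt.
REFERENCES = {
    "REF-AUTH-001": "Authentication system uses JWT with refresh tokens",
    "REF-DB-001": "PostgreSQL with Prisma ORM",
    "REF-API-001": "REST API with Express",
}

def reference_block() -> str:
    return "\n".join(f"[{ref_id}] {text}" for ref_id, text in REFERENCES.items())

def build_task_prompt(task: str, cited_refs: list[str]) -> str:
    citations = ", ".join(cited_refs)
    return f"{reference_block()}\n\nFollowing {citations}, {task}"

# Example: build_task_prompt("implement logout...", ["REF-AUTH-001"])
```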
The Context Compression Game
I've gotten good at compressing context without losing information:
Before (wasteful):
// Full 500-line React component with all imports, comments, styles
After (efficient):
UserDashboard component:
- Props: userId, refreshInterval
- State: userData, isLoading, error
- Effects: Fetches user data, auto-refreshes
- Renders: UserStats, ActivityChart, RecentActions
- Key functions: fetchUserData(), handleRefresh()
Same information, 90% fewer tokens.
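You can do this compression by hand, but it's easy to semi-automate with a cheap preliminary model call that boils each component down to that skeleton shape. A sketch, with call_model() as a hypothetical API wrapper:

```python
# One way to automate the compression: a cheap preliminary call that boils a
# component down to the skeleton format above. call_model() is hypothetical.
COMPRESSION_TEMPLATE = """Summarize this component in at most 8 lines using this shape:
<Name> component:
- Props: ...
- State: ...
- Effects: ...
- Renders: ...
- Key functions: ...

Component source:
{source}"""

def compress_component(call_model, source_code: str) -> str:
    return call_model(COMPRESSION_TEMPLATE.format(source=source_code))
```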
When You Actually Need Large Context
Sometimes you legitimately need to feed massive context. Here's how to make it work:
1. Put critical info at start AND end
The model pays most attention to the beginning and end. Use both.
2. Use explicit markers
### CRITICAL: Database Schema ###
[schema here]
### END CRITICAL ###
3. Reference important sections
"As defined in the CRITICAL: Database Schema section above..."
4. Test with questions
Before the actual task, ask: "What's the primary key of the users table?" If it can't answer, your context is too large.
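That sanity check is cheap to automate. A sketch below; the probe questions and expected answers are yours to supply, and the substring match is deliberately crude.

```python
# Ask a few probe questions about the context before trusting it with the real task.
# If any probe fails, the context is already past the model's usable range.
def context_sanity_check(call_model, context: str, probes: dict[str, str]) -> bool:
    for question, expected in probes.items():
        answer = call_model(f"{context}\n\nAnswer briefly: {question}")
        if expected.lower() not in answer.lower():
            print(f"FAILED probe: {question!r} -> context is too large or badly placed")
            return False
    return True

# Example:
# context_sanity_check(call_model, big_prompt,
#     {"What's the primary key of the users table?": "id"})
```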
The Future (It's Not More Tokens)
Everyone's racing to have the biggest context window. Million token contexts! Infinite memory!
Here's the thing: we don't need bigger contexts. We need smarter context usage. I'd take a model with perfect 30k token attention over sketchy 200k token attention any day.
What we really need:
- Consistent attention across the entire window
- The ability to specify "priority" sections
- Dynamic context loading (fetch relevant parts as needed)
- Better context compression techniques
My Context Window Survival Guide
- Assume 30k tokens is your real limit
- Put critical information within the first 10k tokens
- Repeat important points at the end
- Use chunking for large tasks
- Test attention with sanity checks
- Compress aggressively
- Build context incrementally
Next time someone brags about their model's huge context window, ask them about attention degradation at 50k tokens. Watch them change the subject. Context windows are like vacation days - the number they advertise and the number you can actually use are very different things.