"Claude now supports 100k tokens!" "GPT-4 Turbo has 128k context!" Cool. Last week I fed Claude a 80k token codebase and asked it to refactor a function. It forgot the function existed and wrote a new one. With a different name. That did something completely different.
Let's talk about why those impressive context window numbers are basically marketing fiction.
The Great Context Window Scam
Here's what they don't tell you: just because a model accepts 128k tokens doesn't mean it can actually use them. It's like having a 1TB hard drive that corrupts any file over 50GB.
I ran an experiment. Fed various models increasingly large contexts and asked questions about content at different positions:
- First 10k tokens: Near perfect recall
- 10k-30k tokens: Pretty good, occasional misses
- 30k-50k tokens: Hit or miss, depends on the day
- 50k+ tokens: "I don't see that in the context" (it's literally right there)
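If you want to run this yourself, the harness is trivial: plant a known fact at a chosen token offset inside filler text, then ask for it back. Here's a minimal sketch — callModel() is a stand-in for whichever API wrapper you use, and the "needle" value is made up:
// Position-recall test sketch (callModel() is hypothetical)
const FILLER = "The quick brown fox jumps over the lazy dog. ";

function buildHaystack(totalTokens, needlePosition, needle) {
  // Rough conversion: 1 token ≈ 4 characters
  const charsBefore = needlePosition * 4;
  const charsAfter = (totalTokens - needlePosition) * 4;
  const before = FILLER.repeat(Math.ceil(charsBefore / FILLER.length)).slice(0, charsBefore);
  const after = FILLER.repeat(Math.ceil(charsAfter / FILLER.length)).slice(0, charsAfter);
  return `${before}\n${needle}\n${after}`;
}

async function recallsNeedleAt(position) {
  const needle = "The secret deploy key is horse-battery-staple-9000.";
  const context = buildHaystack(80000, position, needle);
  const answer = await callModel(
    `${context}\n\nQuestion: What is the secret deploy key?`
  );
  return answer.includes("horse-battery-staple-9000");
}

// Run it at the positions from the list above:
// for (const pos of [5000, 20000, 40000, 60000]) console.log(pos, await recallsNeedleAt(pos));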
The Middle Child Problem
You know what's worse than information at the end being forgotten? Information in the middle. Models have this weird tendency to remember the beginning and end but completely forget the middle exists.
Real example from last month:
// 80k token codebase analysis
const prompt = `
Here's our React codebase... [35k tokens of components]
In the middle somewhere, I defined a critical helper function:
function calculateShippingCost(weight, distance, priority) {
// Complex logic for shipping calculations
}
[45k more tokens of other components]
Question: Can you refactor calculateShippingCost to handle international shipping?
`
// Claude's response:
"I don't see a calculateShippingCost function in your codebase.
Let me create one for you..."
The function was literally right there, buried at token position 35,000.
Real Usable Context Limits (Based on Actual Testing)
After tons of testing, here's the real usable context for different tasks:
Code Analysis: ~20k tokens. Beyond this, the model starts missing important relationships between components. It'll see individual functions but miss how they connect.
Documentation Writing: ~40k tokens. Any more and it starts contradicting earlier sections or forgetting the document structure entirely.
Debugging: ~15k tokens. Need laser focus for debugging. Too much context and the model gets distracted by irrelevant code.
Code Refactoring: ~25k tokens. Enough to understand the surrounding architecture but not so much it loses track of what needs changing.
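I've started encoding these as hard budgets so a prompt gets trimmed before it's ever sent. A rough sketch using the numbers above (they're my observations, not anything official):
// Usable-context budgets per task, from the observations above
const TASK_BUDGETS = {
  codeAnalysis: 20000,
  documentation: 40000,
  debugging: 15000,
  refactoring: 25000
};

function checkTaskBudget(task, text) {
  const budget = TASK_BUDGETS[task];
  if (!budget) throw new Error(`Unknown task: ${task}`);
  const estimatedTokens = Math.ceil(text.length / 4); // rough 4-chars-per-token estimate
  return { estimatedTokens, budget, fits: estimatedTokens <= budget };
}

// checkTaskBudget("debugging", bugReport + stackTrace + suspectModule)
// → { estimatedTokens: 18250, budget: 15000, fits: false } → trim before sending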
How Attention Actually Degrades
Here's what I've observed about how attention degrades:
// Attention degradation pattern
const attentionByPosition = {
"0-5k tokens": "Excellent retention (95%)",
"5k-15k tokens": "Good retention (85%)",
"15k-30k tokens": "Decent retention (70%)",
"30k-50k tokens": "Poor retention (40%)",
"50k+ tokens": "Mostly ignored (15%)"
};
This isn't an exact science, but it's close enough for planning purposes.
The Smart Chunking Strategy
Instead of one 80k token monster prompt, I break it down:
// Multi-stage analysis
const stages = [
{
stage: 1,
purpose: "Analyze overall structure",
tokens: "20k",
focus: "High-level architecture, main components"
},
{
stage: 2,
purpose: "Deep dive on specific area",
tokens: "20k",
focus: "Detailed implementation of target component"
},
{
stage: 3,
purpose: "Implementation with focused context",
tokens: "20k",
focus: "Only relevant code + specifications"
}
];
Redundant? Yes. Effective? Also yes.
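In practice that looks like a small driver loop: run each stage against its own ~20k token slice and carry a compressed summary of earlier findings forward. A sketch, where callModel() and summarize() are stand-ins for whatever wrappers you already have:
// Hypothetical driver for the staged analysis above
async function runStagedAnalysis(stages, codebaseSlices) {
  let carryover = "";
  const results = [];
  for (const stage of stages) {
    const prompt = [
      `Stage ${stage.stage}: ${stage.purpose}`,
      `Focus: ${stage.focus}`,
      carryover && `Findings from earlier stages:\n${carryover}`,
      codebaseSlices[stage.stage - 1] // each slice pre-trimmed to ~20k tokens
    ].filter(Boolean).join("\n\n");

    const output = await callModel(prompt);
    results.push(output);
    carryover = await summarize(output); // keep the carried context small
  }
  return results;
}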
Context Compression Techniques
I've gotten good at compressing context without losing the information that matters:
Before (Verbose):
// UserDashboard.jsx - Full component (2,400 tokens)
import React, { useState, useEffect } from 'react';
import { UserStats } from './UserStats';
import { ActivityChart } from './ActivityChart';
import { RecentActions } from './RecentActions';
const UserDashboard = ({ userId, refreshInterval = 30000 }) => {
const [userData, setUserData] = useState(null);
const [isLoading, setIsLoading] = useState(true);
const [error, setError] = useState(null);
useEffect(() => {
const fetchUserData = async () => {
// ... 50 lines of implementation
};
// ... rest of component
}, [userId]);
return (
<div className="user-dashboard">
{/* ... lots of JSX */}
</div>
);
};
After (Compressed):
// UserDashboard component summary (180 tokens)
UserDashboard:
- Props: userId, refreshInterval
- State: userData, isLoading, error
- Effects: Fetches user data, auto-refreshes every 30s
- Renders: UserStats, ActivityChart, RecentActions
- Key functions: fetchUserData(), handleRefresh()
- Dependencies: React hooks, 3 child components
### CRITICAL: User data structure ###
interface UserData {
id: string;
stats: { logins: number; purchases: number };
activity: ActivityRecord[];
recent: Action[];
}
### END CRITICAL ###
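Most of that compression can be a pre-pass: ask the model (or a cheaper one) to summarize each file before anything goes into the main prompt, and keep full source only for the files you're actually changing. A sketch, again assuming a hypothetical callModel() wrapper:
// Pre-pass: compress each file into a ~200 token summary
async function compressFile(path, source) {
  const summary = await callModel(
    `Summarize this component in under 200 tokens.\n` +
    `List: props, state, effects, rendered children, key functions, dependencies.\n\n` +
    `// ${path}\n${source}`
  );
  return `// ${path} (summary)\n${summary}`;
}

// Keep full source only for files that are actually being changed
async function compressCodebase(files, keepFullPaths) {
  const parts = [];
  for (const [path, source] of Object.entries(files)) {
    parts.push(keepFullPaths.includes(path) ? source : await compressFile(path, source));
  }
  return parts.join("\n\n");
}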
The Section Marking Trick
This creates anchors the model can latch onto:
### CRITICAL: Database Schema ###
users: id, email, created_at, subscription_tier
orders: id, user_id, total, status, created_at
products: id, name, price, inventory
### END CRITICAL ###
[lots of other context]
### CRITICAL: API Endpoints ###
GET /api/users/:id - Returns user with stats
POST /api/orders - Creates new order
PUT /api/users/:id - Updates user profile
### END CRITICAL ###
The "CRITICAL" markers help the model remember what's important even when buried in the middle.
Tools for Context Management
// Simple token counter
function estimateTokens(text) {
// Rough approximation: 1 token ≈ 4 characters
return Math.ceil(text.length / 4);
}
// Smart context builder
class ContextBuilder {
constructor(maxTokens = 20000) {
this.maxTokens = maxTokens;
this.sections = [];
}
addCritical(content, label) {
this.sections.push({
type: 'critical',
content: `### CRITICAL: ${label} ###\n${content}\n### END CRITICAL ###`,
priority: 1
});
}
addSupporting(content, label) {
this.sections.push({
type: 'supporting',
content: `// ${label}\n${content}`,
priority: 2
});
}
build() {
// Sort by priority, fit within token limit
let totalTokens = 0;
let result = [];
const sorted = this.sections.sort((a, b) => a.priority - b.priority);
for (const section of sorted) {
const tokens = estimateTokens(section.content);
if (totalTokens + tokens <= this.maxTokens) {
result.push(section.content);
totalTokens += tokens;
}
}
return result.join('\n\n');
}
}
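And a quick usage example — the schema string and the dashboard summary are placeholders for your own content:
// Example usage (placeholder content)
const ctx = new ContextBuilder(20000);
ctx.addCritical(
  "users: id, email, created_at, subscription_tier\norders: id, user_id, total, status, created_at",
  "Database Schema"
);
ctx.addSupporting(userDashboardSummary, "UserDashboard component summary"); // placeholder variable
const prompt = ctx.build() +
  "\n\nQuestion: Can you refactor calculateShippingCost to handle international shipping?";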
The Reality Check
Context windows aren't useless - they're just oversold. The real breakthroughs come from working with the limitations, not pretending they don't exist.
What works:
- Multiple focused conversations instead of one huge one
- Smart compression and summarization
- Critical section marking
- Token budgeting
What doesn't work:
- Dumping your entire codebase and expecting magic
- Assuming the model can juggle 50 different concerns at once
- Ignoring the middle-context attention problem
The 128k context window is like a sports car's top speed rating. Technically achievable under perfect conditions. Practically useless in real-world traffic.
Build your AI workflows around 20k token chunks, and you'll get consistently better results than trying to use the full "rated capacity."