
Building Long-Term Memory for AI: My System That Actually Works

April 12, 2024 Josh Butler Technical

"As we discussed yesterday..." I typed for the hundredth time before realizing the AI had no idea what we discussed yesterday. Or five minutes ago. That's when I decided to build a memory system that actually works.

Six months later, my AI assistant remembers our entire project history. Here's how.

The Problem With Stateless AI

Every new chat session:

  • "We're using React with TypeScript"
  • "The database is PostgreSQL with Prisma"
  • "We follow the Airbnb style guide"
  • "Remember, we can't use bleeding-edge features"
  • "As I mentioned before..." (I hadn't; it was a different session)

I was spending 10 minutes per session just setting context. Multiply that by 20 sessions per week...

Version 1: The Naive Approach

My first attempt: Just prepend all previous conversations!

const context = await loadAllPreviousChats();
const prompt = context + "\n\n" + currentQuery;

Problems appeared immediately:

  • Hit token limits after 3 conversations
  • AI got confused by contradictory old information
  • Costs skyrocketed (paying for the same context repeatedly)
  • Performance tanked (processing 50k tokens for a simple question)

Version 2: The Summary System

Next idea: Summarize conversations and use summaries as context.

// After each conversation
const summary = await ai.summarize(conversation);
await saveSummary(projectId, summary);

// For new conversations
const summaries = await loadSummaries(projectId);
const context = summaries.join('\n');

Better, but summaries lost crucial details. "We discussed authentication" doesn't help when you need to remember we're using JWT with 15-minute expiry and refresh tokens in httpOnly cookies.

Version 3: The Smart Memory System

What finally worked: A hierarchical memory system with different types of memory.

1. Project Constants (permanent memory)

// PROJECT_MEMORY.json
{
  "tech_stack": {
    "frontend": "React 18, TypeScript 5.2, MUI 5",
    "backend": "Node.js 20, Express, PostgreSQL 15",
    "tools": "Vite, ESLint, Prettier"
  },
  "constraints": [
    "Must support IE11 (client requirement)",
    "Cannot use experimental features",
    "Must pass WCAG 2.1 AA compliance"
  ],
  "patterns": {
    "state_management": "Zustand (not Redux)",
    "styling": "Emotion (not styled-components)",
    "testing": "Vitest + React Testing Library"
  }
}
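To actually use this file, I load it at session start and flatten it into a short preamble. Here's a rough sketch of that step (buildPreamble is an illustrative helper, not the post's actual code):

```javascript
// Sketch: flattening PROJECT_MEMORY.json into a prompt preamble.
// buildPreamble is an illustrative helper, not a library function.
function buildPreamble(memory) {
  const lines = ['Project facts (always true for this codebase):'];
  for (const [area, stack] of Object.entries(memory.tech_stack)) {
    lines.push(`- ${area}: ${stack}`);
  }
  for (const constraint of memory.constraints) {
    lines.push(`- constraint: ${constraint}`);
  }
  for (const [concern, choice] of Object.entries(memory.patterns)) {
    lines.push(`- ${concern}: ${choice}`);
  }
  return lines.join('\n');
}

// Abbreviated version of the file above
const memory = {
  tech_stack: { frontend: 'React 18, TypeScript 5.2, MUI 5' },
  constraints: ['Must support IE11 (client requirement)'],
  patterns: { state_management: 'Zustand (not Redux)' }
};

console.log(buildPreamble(memory));
```

A flat bullet list like this costs far fewer tokens than the raw JSON and reads more naturally to the model.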

2. Decision Log (episodic memory)

// DECISIONS.md
## 2024-03-15: Authentication Strategy
Decided: JWT with refresh tokens
Rejected: Session-based auth
Reason: Need stateless for microservices

## 2024-03-20: Database Schema
Decided: Soft deletes with deleted_at
Rejected: Hard deletes
Reason: Audit requirements

## 2024-03-22: API Versioning
Decided: URL versioning (/api/v1)
Rejected: Header versioning
Reason: Easier for client teams

3. Context Snippets (working memory)

// Generated from recent conversations
{
  "current_task": "Implementing user dashboard",
  "recent_issues": [
    "Performance problem with user list (>1000 items)",
    "CORS issues with staging environment"
  ],
  "pending_decisions": [
    "Choose between virtualization or pagination for lists"
  ],
  "code_context": {
    "working_on": "src/components/Dashboard/UserList.tsx",
    "related_files": ["api/users.ts", "hooks/useUsers.ts"]
  }
}

The Memory Pipeline

class AIMemorySystem {
  async buildContext(query) {
    // 1. Load permanent memory (always included)
    const projectMemory = await this.loadProjectConstants();
    
    // 2. Find relevant decisions (semantic search)
    const relevantDecisions = await this.searchDecisions(query);
    
    // 3. Get recent context (last 3 sessions)
    const recentContext = await this.getRecentContext();
    
    // 4. Smart truncation to fit token limits
    return this.optimizeContext({
      projectMemory,
      relevantDecisions,
      recentContext,
      maxTokens: 4000 // Leave room for conversation
    });
  }
  
  async saveInteraction(query, response) {
    // Extract important information
    const insights = await this.extractInsights(query, response);
    
    // Update different memory types
    if (insights.hasDecision) {
      await this.logDecision(insights.decision);
    }
    
    if (insights.hasNewPattern) {
      await this.updatePatterns(insights.pattern);
    }
    
    // Update working memory
    await this.updateRecentContext(insights);
  }
}
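The optimizeContext call does the heavy lifting. Here's a simplified sketch of that truncation logic, using the rough 4-characters-per-token heuristic (a real implementation would count with the model's tokenizer):

```javascript
// Simplified sketch of optimizeContext: permanent memory always goes in,
// then decisions and recent context are added until the budget runs out.
// estimateTokens uses the rough 4-chars-per-token heuristic.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

function optimizeContext({ projectMemory, relevantDecisions, recentContext, maxTokens }) {
  const parts = [projectMemory]; // permanent memory is non-negotiable
  let used = estimateTokens(projectMemory);

  // Decisions first (higher signal), then recent context
  for (const chunk of [...relevantDecisions, ...recentContext]) {
    const cost = estimateTokens(chunk);
    if (used + cost > maxTokens) break; // stop when the budget is spent
    parts.push(chunk);
    used += cost;
  }
  return parts.join('\n\n');
}
```

The ordering matters: if something has to be dropped, it should be old working memory, never the project constants.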

The Semantic Search Layer

Key innovation: Not all memory is relevant to every query.

// Embedding-based retrieval
async function findRelevantMemory(query) {
  const queryEmbedding = await getEmbedding(query);
  
  // Search different memory stores
  const decisions = await searchDecisions(queryEmbedding, { limit: 5 });
  const codePatterns = await searchPatterns(queryEmbedding, { limit: 3 });
  const previousIssues = await searchIssues(queryEmbedding, { limit: 3 });
  
  return {
    decisions: decisions.filter(d => d.similarity > 0.8),
    patterns: codePatterns.filter(p => p.similarity > 0.75),
    issues: previousIssues.filter(i => i.similarity > 0.7)
  };
}
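Those similarity thresholds assume cosine similarity over the embeddings. If you want to see the mechanics without a vector database, here's a minimal in-memory version (searchStore and the store shape are illustrative, not the system's actual code):

```javascript
// Minimal in-memory semantic search: cosine similarity over embedding
// vectors, filtered by threshold, best matches first.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function searchStore(store, queryEmbedding, { limit, threshold }) {
  return store
    .map(item => ({ ...item, similarity: cosineSimilarity(item.embedding, queryEmbedding) }))
    .filter(item => item.similarity > threshold)
    .sort((a, b) => b.similarity - a.similarity)
    .slice(0, limit);
}
```

In production the embeddings come from an embedding API and live in an index (the embeddings.db file below), but the ranking logic is the same.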

The Auto-Update System

The magic: Memory that updates itself.

// After each conversation
async function updateMemory(conversation) {
  const analysis = await ai.analyze({
    prompt: `Extract key information:
    - New decisions made
    - Technical details discovered
    - Problems encountered
    - Patterns identified`,
    conversation
  });
  
  // Update appropriate memory stores
  for (const decision of analysis.decisions) {
    await appendToDecisionLog(decision);
  }
  
  for (const pattern of analysis.patterns) {
    await updateProjectPatterns(pattern);
  }
  
  // Prune old/irrelevant information
  await pruneWorkingMemory();
}

Memory Optimization Tricks

1. Compression Through Abstraction

// Instead of storing:
"We use React.useState for component state,
React.useEffect for side effects,
React.useMemo for expensive computations..."

// Store:
"Standard React hooks patterns (useState, useEffect, useMemo)"

2. Reference System

// Memory stores references
"Auth implementation: See AUTH_SPEC.md"
"Database schema: See schema.prisma"

// AI knows to ask for these files when needed

3. Expiring Memory

{
  "memory_type": "temporary",
  "content": "Working on fixing login bug",
  "expires": "2024-04-20",
  "priority": "high"
}
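Pruning these is straightforward: drop any temporary entry whose expiry date has passed. A minimal sketch (pruneExpired is a hypothetical helper; the field names match the JSON above):

```javascript
// Sketch: keep permanent entries, drop temporary ones past their expiry.
// Field names (memory_type, expires) follow the JSON example above.
function pruneExpired(entries, now = new Date()) {
  return entries.filter(entry =>
    entry.memory_type !== 'temporary' || new Date(entry.expires) > now
  );
}
```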

Results

Before memory system:

  • 10 minutes setting context per session
  • Constant repetition of requirements
  • AI suggesting things we'd already rejected
  • No continuity between sessions

After memory system:

  • 30 seconds to load relevant context
  • AI remembers all project decisions
  • Suggests solutions based on established patterns
  • Feels like working with a team member

The Files That Make It Work

project/
├── .ai/
│   ├── PROJECT_MEMORY.json      # Permanent facts
│   ├── DECISIONS.md            # Decision log
│   ├── PATTERNS.json           # Code patterns
│   ├── CONTEXT.json            # Current working memory
│   └── embeddings.db           # Semantic search index
├── src/
└── ...

Getting Started With Your Own System

  1. Start with a simple PROJECT_MEMORY.json
  2. Log decisions as you make them
  3. Build context loading into your AI workflow
  4. Add semantic search when simple loading isn't enough
  5. Automate memory updates based on conversations

The key insight: AI doesn't need to remember everything, just the right things at the right time.

Building memory for AI is like building a second brain for your project. It takes effort to set up, but once it's running, you'll wonder how you ever worked without it. Start simple, iterate based on what you find yourself repeating, and soon you'll have an AI that truly understands your project.
