---
title: "Architecture Evolution"
description: "How claude-recall evolved from v3 to v5+"
---
# Architecture Evolution
## The Problem We Solved
**Goal:** Create a memory system that makes Claude smarter across sessions without the user noticing it exists.
**Challenge:** How do you observe AI agent behavior, compress it intelligently, and serve it back at the right time - all without slowing down or interfering with the main workflow?
This is the story of how claude-recall evolved from a simple idea to a production-ready system, and the key architectural decisions that made it work.
---
## v5.x: Maturity and User Experience
After establishing the solid v4 architecture, v5.x focused on user experience, visualization, and polish.
### v5.1.2: Theme Toggle (November 2025)
**What Changed**: Added light/dark mode theme toggle to viewer UI
**New Features**:
- User-selectable theme preference (light, dark, system)
- Persistent theme settings in localStorage
- Smooth theme transitions
- System preference detection
**Implementation**:
```typescript
import { createContext, useEffect, useState, type ReactNode } from 'react';

type Theme = 'light' | 'dark' | 'system';

const ThemeContext = createContext<{ theme: Theme; setTheme: (t: Theme) => void } | null>(null);

// Theme context with persistence
const ThemeProvider = ({ children }: { children: ReactNode }) => {
  const [theme, setTheme] = useState<Theme>(() => {
    // localStorage returns string | null, so narrow it to the Theme union
    return (localStorage.getItem('claude-recall-theme') as Theme | null) ?? 'system';
  });

  useEffect(() => {
    localStorage.setItem('claude-recall-theme', theme);
  }, [theme]);

  return (
    <ThemeContext.Provider value={{ theme, setTheme }}>
      {children}
    </ThemeContext.Provider>
  );
};
```
**Why It Matters**: Users working in different lighting conditions can now customize the viewer for comfort.
### v5.1.1: Worker Startup Fix (November 2025) - Now Deprecated
**Note**: This section describes a historical PM2-based approach that has been replaced with Bun in later versions.
**The Problem**: Worker startup failed on Windows with ENOENT error when using PM2
**Historical Solution**: Used full path to PM2 binary instead of relying on PATH
**Current Approach**: The project now uses Bun for process management, which provides better cross-platform compatibility and eliminates these PATH-related issues.
**Impact**: At the time, this restored cross-platform compatibility so Windows users could run claude-recall without issues.
### v5.1.0: Web-Based Viewer UI (October 2025)
**The Breakthrough**: Real-time visualization of memory stream
**What We Built**:
- React-based web UI at http://localhost:37777
- Server-Sent Events (SSE) for real-time updates
- Infinite scroll pagination
- Project filtering
- Settings persistence (sidebar state, selected project)
- Auto-reconnection with exponential backoff
- GPU-accelerated animations
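The auto-reconnection behavior can be sketched as a capped exponential backoff. `reconnectDelay` below is an illustrative helper name, a sketch of the idea rather than the actual `useSSE` internals:

```typescript
// Hypothetical sketch of the viewer's SSE reconnect delay: double the wait
// after each failed attempt, capped so a long outage doesn't grow unbounded.
function reconnectDelay(attempt: number, baseMs = 1000, maxMs = 30000): number {
  return Math.min(baseMs * 2 ** attempt, maxMs);
}
```

On each `EventSource` error the hook would wait `reconnectDelay(n)` before reopening the stream, resetting `n` to 0 once a connection succeeds.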
**New Worker Endpoints** (8 additions):
```
GET  /                  # Serves viewer HTML
GET  /stream            # SSE real-time updates
GET  /api/prompts       # Paginated user prompts
GET  /api/observations  # Paginated observations
GET  /api/summaries     # Paginated session summaries
GET  /api/stats         # Database statistics
GET  /api/settings      # User settings
POST /api/settings      # Save settings
```
**Database Enhancements**:
```typescript
// New SessionStore methods for viewer
getRecentPrompts(limit, offset, project?)
getRecentObservations(limit, offset, project?)
getRecentSummaries(limit, offset, project?)
getStats()
getUniqueProjects()
```
**React Architecture**:
```
src/ui/viewer/
├── components/
│   ├── Header.tsx           # Navigation + stats
│   ├── Sidebar.tsx          # Project filter
│   ├── Feed.tsx             # Infinite scroll
│   └── cards/
│       ├── ObservationCard.tsx
│       ├── PromptCard.tsx
│       ├── SummaryCard.tsx
│       └── SkeletonCard.tsx
├── hooks/
│   ├── useSSE.ts            # Real-time events
│   ├── usePagination.ts     # Infinite scroll
│   ├── useSettings.ts       # Persistence
│   └── useStats.ts          # Statistics
└── utils/
    ├── merge.ts             # Data deduplication
    └── format.ts            # Display formatting
```
**Build Process**:
```typescript
import * as esbuild from 'esbuild';

// esbuild bundles everything into a single HTML file
await esbuild.build({
  entryPoints: ['src/ui/viewer/index.tsx'],
  bundle: true,
  outfile: 'extension/web/viewer.html',
  loader: { '.tsx': 'tsx', '.woff2': 'dataurl' },
  define: { 'process.env.NODE_ENV': '"production"' },
});
```
**Why It Matters**: Users can now see exactly what's being captured in real-time, making the memory system transparent and debuggable.
### v5.0.3: Smart Install Caching (October 2025)
**The Problem**: `npm install` ran on every SessionStart (2-5 seconds)
**The Insight**: Dependencies rarely change between sessions
**The Solution**: Version-based caching
```typescript
// Check the version marker before installing
const currentVersion = getPackageVersion();
const installedVersion = existsSync('.install-version')
  ? readFileSync('.install-version', 'utf-8')
  : null;

if (currentVersion !== installedVersion) {
  // Only install if the version changed (or no marker exists yet)
  await runNpmInstall();
  writeFileSync('.install-version', currentVersion);
}
```
**Cached Check Logic**:
1. Does `node_modules` exist?
2. Does `.install-version` match `package.json` version?
3. Is `better-sqlite3` present? (Legacy: now uses bun:sqlite which requires no installation)
**Impact**:
- SessionStart hook: 2-5 seconds → 10ms (99.5% faster)
- Only installs on: first run, version change, missing deps
- Better Windows error messages with build tool help
### v5.0.2: Worker Health Checks (October 2025)
**What Changed**: More robust worker startup and monitoring
**New Features**:
```typescript
// Health check endpoint
app.get('/health', (req, res) => {
  res.json({
    status: 'ok',
    uptime: process.uptime(),
    port: WORKER_PORT,
    memory: process.memoryUsage(),
  });
});

// Smart worker startup
async function ensureWorkerHealthy() {
  const healthy = await isWorkerHealthy(1000);
  if (!healthy) {
    await startWorker();
    await waitForWorkerHealth(10000);
  }
}
```
**Benefits**:
- Graceful degradation when worker is down
- Auto-recovery from crashes
- Better error messages for debugging
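One possible shape for the `waitForWorkerHealth` helper referenced above, assuming the worker's `/health` endpoint on port 37777; the polling details here are illustrative, not the project's exact implementation:

```typescript
// Poll the worker's /health endpoint until it responds OK or the timeout
// elapses; returns true once healthy, false if the deadline passes.
async function waitForWorkerHealth(timeoutMs: number, intervalMs = 250): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    try {
      const res = await fetch('http://localhost:37777/health');
      if (res.ok) return true;
    } catch {
      // Worker not accepting connections yet; keep polling.
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  return false;
}
```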
### v5.0.1: Stability Improvements (October 2025)
**What Changed**: Various bug fixes and stability enhancements
**Key Fixes**:
- Fixed race conditions in observation queue processing
- Improved error handling in SDK worker
- Better cleanup of stale worker processes
- Enhanced logging for debugging
### v5.0.0: Hybrid Search Architecture (October 2025)
**The Evolution**: SQLite FTS5 + Chroma vector search
**What We Added**:
```
┌──────────────────────────────────────────────┐
│               HYBRID SEARCH                  │
│                                              │
│  Text Query → SQLite FTS5 (keyword matching) │
│                      ↓                       │
│       Chroma Vector Search (semantic)        │
│                      ↓                       │
│           Merge + Re-rank Results            │
└──────────────────────────────────────────────┘
```
**New Dependencies**:
- `chromadb` - Vector database for semantic search
- Python 3.8+ - Required by chromadb
**MCP Tools Enhancement**:
```typescript
// Chroma-backed semantic search
search_observations({
  query: "authentication bug",
  useSemanticSearch: true  // Uses Chroma
});

// Falls back to FTS5 if Chroma is unavailable
```
**Why Hybrid**:
- FTS5: Fast keyword matching, no dependencies
- Chroma: Semantic understanding, finds related concepts
- Graceful degradation: Works without Chroma (FTS5 only)
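The merge step can be sketched as score fusion over the two result lists. The scoring scheme below is a simplified assumption (already-normalized scores, summed on overlap), not the project's actual ranking code:

```typescript
interface SearchResult { id: number; score: number }

// Deduplicate by observation id and sum the normalized scores, so items
// surfaced by both FTS5 and Chroma rank above single-source hits.
function mergeResults(keyword: SearchResult[], semantic: SearchResult[]): SearchResult[] {
  const byId = new Map<number, number>();
  for (const r of [...keyword, ...semantic]) {
    byId.set(r.id, (byId.get(r.id) ?? 0) + r.score);
  }
  return [...byId.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}
```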
**Trade-offs**:
- Added Python dependency (optional)
- Increased installation complexity
- Better search relevance
---
## MCP Architecture Simplification (December 2025)
### The Problem: Complex MCP Implementation
**Before:**
```
9+ MCP tools registered at session start:
- search_observations
- find_by_type
- find_by_file
- find_by_concept
- get_recent_context
- get_observation
- get_session
- get_prompt
- help

Problems:
- Overlapping operations (search_observations vs find_by_type)
- Complex parameter schemas (~2,500 tokens in tool definitions)
- No built-in workflow guidance
- High cognitive load for Claude (which tool to use?)
- Code size: ~2,718 lines in mcp-server.ts
```
**The Insight:** Progressive disclosure should be built into tool design itself, not something Claude has to remember.
### The Solution: 3-Layer Workflow
**After:**
```
4 MCP tools following the 3-layer workflow:

1. __IMPORTANT - Workflow documentation (always visible)
   "3-LAYER WORKFLOW (ALWAYS FOLLOW):
    1. search(query) → Get index with IDs
    2. timeline(anchor=ID) → Get context
    3. get_observations([IDs]) → Fetch details
    NEVER fetch full details without filtering first."

2. search - Layer 1: Get index with IDs (~50-100 tokens/result)
3. timeline - Layer 2: Get chronological context
4. get_observations - Layer 3: Fetch full details (~500-1,000 tokens/result)

Benefits:
- Progressive disclosure enforced by tool structure
- No overlapping operations
- Simple schemas (additionalProperties: true)
- Clear workflow pattern
- Code size: ~312 lines in mcp-server.ts (88% reduction)
- ~10x token savings
```
### Migration: Skill-Based Search Removed
**Previously:** Used skill-based search
- history-search skill invoked via natural language
- HTTP API called directly via curl
- Progressive disclosure through skill loading
- 17 skill documentation files
**Now:** Removed skill-based approach
- MCP-only architecture
- Native MCP protocol (better Claude integration)
- Works with both Claude Desktop and Claude Code
- Simpler to maintain (no skill files)
- All 19 history-search skill files removed (~2,744 lines)
### Key Architectural Changes
**MCP Server Refactor:**
Before:
```typescript
// Complex parameter schemas
{
  name: "search_observations",
  inputSchema: {
    type: "object",
    properties: {
      query: { type: "string", description: "..." },
      type: { type: "array", items: { enum: [...] } },
      format: { enum: ["index", "full"] },
      limit: { type: "number", minimum: 1, maximum: 100 },
      // ... many more parameters
    }
  }
}
```
After:
```typescript
// Simple schemas with workflow guidance
{
  name: "search",
  description: "Step 1: Search memory. Returns index with IDs.",
  inputSchema: {
    type: "object",
    properties: {},
    additionalProperties: true  // Accept any parameters
  }
}
```
**Workflow Enforcement:**
Before: Claude had to remember progressive disclosure pattern
After: Tool structure makes it impossible to skip steps
- Can't get details without IDs from search
- Can't search without seeing __IMPORTANT reminder
- Timeline provides middle ground (context without full details)
### Impact
**Token Efficiency:**
```
Traditional: Fetch 20 observations upfront
  → 10,000-20,000 tokens
  → Only 2 observations relevant (90% waste)

3-Layer Workflow:
  → search (20 results): ~1,000-2,000 tokens
  → Review index, identify 3 relevant IDs
  → get_observations (3 IDs): ~1,500-3,000 tokens
  → Total: 2,500-5,000 tokens (50-75% savings)
```
**Code Simplicity:**
- MCP server: 2,718 lines → 312 lines (88% reduction)
- Removed: 19 skill files (~2,744 lines)
- Net reduction: ~5,150 lines of code removed
**User Experience:**
- Same natural language interaction
- Better token efficiency
- Clearer architecture
- Works identically on Claude Desktop and Claude Code
### Design Philosophy
**Progressive Disclosure Through Structure:**
The 3-layer workflow embodies progressive disclosure at the architectural level:
1. **Layer 1 (Index)** - "What exists?" - Cheap survey of options
2. **Layer 2 (Timeline)** - "What was happening?" - Context around specific points
3. **Layer 3 (Details)** - "Tell me everything" - Full details only when justified
Each layer provides a decision point where Claude can:
- Stop if irrelevant
- Get more context if uncertain
- Dive deep if confident
This makes it structurally difficult to waste tokens.
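From the client side, the workflow reduces to three ordered tool calls. In this sketch, `callTool` is a stand-in for the MCP client invocation and the response shapes are assumptions:

```typescript
type CallTool = (name: string, args: Record<string, unknown>) => Promise<any>;

// Walk the 3-layer workflow: survey the index, pull timeline context around
// the best candidate, then fetch full details for only the chosen IDs.
async function recall(callTool: CallTool, query: string) {
  const index = await callTool('search', { query });            // Layer 1: cheap index
  const ids = index.results.slice(0, 3).map((r: any) => r.id);  // filter before fetching
  await callTool('timeline', { anchor: ids[0] });               // Layer 2: context
  return callTool('get_observations', { ids });                 // Layer 3: full details
}
```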
---
## v1-v2: The Naive Approach
### The First Attempt: Dump Everything
**Architecture:**
```
PostToolUse Hook → Save raw tool outputs → Retrieve everything on startup
```
**What we learned:**
- ❌ Context pollution (thousands of tokens of irrelevant data)
- ❌ No compression (raw tool outputs are verbose)
- ❌ No search (had to scan everything linearly)
- ✅ Proved the concept: Memory across sessions is valuable
**Example of what went wrong:**
```
SessionStart loaded:
- 150 file read operations
- 80 grep searches
- 45 bash commands
- Total: ~35,000 tokens
- Relevant to current task: ~500 tokens (1.4%)
```
---
## v3: Smart Compression, Wrong Architecture
### The Breakthrough: AI-Powered Compression
**New idea:** Use Claude itself to compress observations
**Architecture:**
```
PostToolUse Hook → Queue observation → SDK Worker → AI compression → Store insights
```
**What we added:**
1. **Claude Agent SDK integration** - Use AI to compress observations
2. **Background worker** - Don't block main session
3. **Structured observations** - Extract facts, decisions, insights
4. **Session summaries** - Generate comprehensive summaries
**What worked:**
- ✅ Compression ratio: 10:1 to 100:1
- ✅ Semantic understanding (not just keyword matching)
- ✅ Background processing (hooks stayed fast)
- ✅ Search became useful
**What didn't work:**
- ❌ Still loaded everything upfront
- ❌ Session ID management was broken
- ❌ Aggressive cleanup interrupted summaries
- ❌ Multiple SDK sessions per Claude Code session
---
## The Key Realizations
### Realization 1: Progressive Disclosure
**Problem:** Even compressed observations can pollute context if you load them all.
**Insight:** Humans don't read everything before starting work. Why should AI?
**Solution:** Show an index first, fetch details on-demand.
```
❌ Old: Load 50 observations (8,500 tokens)

✅ New: Show index of 50 observations (800 tokens)
        Agent fetches 2-3 relevant ones (300 tokens)
        Total: 1,100 tokens vs 8,500 tokens
```
**Impact:**
- 87% reduction in context usage
- 100% relevance (only fetch what's needed)
- Agent autonomy (decides what's relevant)
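A minimal sketch of the index format, assuming per-observation metadata like the schema's `title` and `type` fields; the exact layout is illustrative:

```typescript
interface ObservationMeta { id: number; type: string; title: string }

// Render the cheap Layer-1 index: one short line per observation instead of
// the full narrative, so the agent can decide what to fetch.
function formatIndex(observations: ObservationMeta[]): string {
  return observations
    .map((o) => `[#${o.id}] (${o.type}) ${o.title}`)
    .join('\n');
}
```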
### Realization 2: Session ID Chaos
**Problem:** SDK session IDs change on every turn.
**What we thought:**
```typescript
// ❌ Wrong assumption
UserPromptSubmit → Capture session ID once → Use forever
```
**Reality:**
```typescript
// ✅ Actual behavior
Turn 1: session_abc123
Turn 2: session_def456
Turn 3: session_ghi789
```
**Why this matters:**
- Can't resume sessions without tracking ID updates
- Session state gets lost between turns
- Observations get orphaned
**Solution:**
```typescript
// Capture the SDK session ID from the system init message
for await (const msg of response) {
  if (msg.type === 'system' && msg.subtype === 'init') {
    sdkSessionId = msg.session_id;
    await updateSessionId(sessionId, sdkSessionId);
  }
}
```
### Realization 3: Graceful vs Aggressive Cleanup
**v3 approach:**
```typescript
// ❌ Aggressive: Kill worker immediately
SessionEnd → DELETE /worker/session → Worker stops
```
**Problems:**
- Summary generation interrupted mid-process
- Pending observations lost
- Race conditions everywhere
**v4 approach:**
```typescript
// ✅ Graceful: Let worker finish
SessionEnd → Mark session complete → Worker finishes → Exit naturally
```
**Benefits:**
- Summaries complete successfully
- No lost observations
- Clean state transitions
**Code:**
```typescript
// v3: Aggressive
async function sessionEnd(sessionId: string) {
  await fetch(`http://localhost:37777/sessions/${sessionId}`, {
    method: 'DELETE'
  });
}

// v4: Graceful
async function sessionEnd(sessionId: string) {
  await db.run(
    'UPDATE sdk_sessions SET completed_at = ? WHERE id = ?',
    [Date.now(), sessionId]
  );
}
```
### Realization 4: One Session, Not Many
**Problem:** We were creating multiple SDK sessions per Claude Code session.
**What we thought:**
```
Claude Code session → Create SDK session per observation → 100+ SDK sessions
```
**Reality should be:**
```
Claude Code session → ONE long-running SDK session → Streaming input
```
**Why this matters:**
- SDK maintains conversation state
- Context accumulates naturally
- Much more efficient
**Implementation:**
```typescript
// ✅ Streaming Input Mode
async function* messageGenerator(): AsyncIterable<UserMessage> {
  // Initial prompt
  yield {
    role: "user",
    content: "You are a memory assistant..."
  };

  // Then continuously yield observations
  while (session.status === 'active') {
    const observations = await pollQueue();
    for (const obs of observations) {
      yield {
        role: "user",
        content: formatObservation(obs)
      };
    }
    await sleep(1000);
  }
}

const response = query({
  prompt: messageGenerator(),
  options: { maxTurns: 1000 }
});
```
---
## v4: The Architecture That Works
### The Core Design
```
┌─────────────────────────────────────────────────────────┐
│                   CLAUDE CODE SESSION                   │
│     User → Claude → Tools (Read, Edit, Write, Bash)     │
│                            ↓                            │
│                    PostToolUse Hook                     │
│                  (queues observation)                   │
└─────────────────────────────────────────────────────────┘
                          ↓ SQLite queue
┌─────────────────────────────────────────────────────────┐
│                   SDK WORKER PROCESS                    │
│      ONE streaming session per Claude Code session      │
│                                                         │
│   AsyncIterable<UserMessage>                            │
│     → Yields observations from queue                    │
│     → SDK compresses via AI                             │
│     → Parses XML responses                              │
│     → Stores in database                                │
└─────────────────────────────────────────────────────────┘
                          ↓ SQLite storage
┌─────────────────────────────────────────────────────────┐
│                      NEXT SESSION                       │
│   SessionStart Hook                                     │
│     → Queries database                                  │
│     → Returns progressive disclosure index              │
│     → Agent fetches details via MCP                     │
└─────────────────────────────────────────────────────────┘
```
### The Five Hook Architecture
<Tabs>
<Tab title="SessionStart">
**Purpose:** Inject context from previous sessions
**Timing:** When Claude Code starts
**What it does:**
- Queries last 10 session summaries
- Formats as progressive disclosure index
- Injects into context via stdout
**Key change from v3:**
- ✅ Index format (not full details)
- ✅ Token counts visible
- ✅ MCP search instructions included
</Tab>
<Tab title="UserPromptSubmit">
**Purpose:** Initialize session tracking
**Timing:** Before Claude processes prompt
**What it does:**
- Creates session record
- Saves raw user prompt (v4.2.0+)
- Starts worker if needed
**Key change from v3:**
- ✅ Stores raw prompts for search
- ✅ Auto-starts worker service
</Tab>
<Tab title="PostToolUse">
**Purpose:** Capture tool observations
**Timing:** After every tool execution
**What it does:**
- Enqueues observation in database
- Returns immediately
**Key change from v3:**
- ✅ Just enqueues (doesn't process)
- ✅ Worker handles all AI calls
</Tab>
<Tab title="Summary">
**Purpose:** Generate session summaries
**Timing:** Worker-triggered (mid-session)
**What it does:**
- Gathers observations
- Sends to Claude for summarization
- Stores structured summary
**Key change from v3:**
- ✅ Multiple summaries per session
- ✅ Summaries are checkpoints, not endings
</Tab>
<Tab title="SessionEnd">
**Purpose:** Graceful cleanup
**Timing:** When session ends
**What it does:**
- Marks session complete
- Lets worker finish processing
**Key change from v3:**
- ✅ Graceful (not aggressive)
- ✅ No DELETE requests
- ✅ Worker finishes naturally
</Tab>
</Tabs>
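The PostToolUse fast path (enqueue and return) can be sketched like this. The `HookInput` fields and row shape are assumptions based on the hooks described above, not the actual schema:

```typescript
interface HookInput { session_id: string; tool_name: string; tool_input: unknown }

// Build the queue row the hook inserts before returning immediately; the
// actual SQLite insert is left to the queue layer the worker drains.
function buildQueueRow(input: HookInput) {
  return {
    session_id: input.session_id,
    tool: input.tool_name,
    payload: JSON.stringify(input.tool_input),
    queued_at: Date.now(),
  };
}
```

Keeping the hook to a single cheap insert is what lets it return in milliseconds while the worker handles all AI calls asynchronously.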
### Database Schema Evolution
**v3 schema:**
```sql
-- Simple, flat structure
CREATE TABLE observations (
  id INTEGER PRIMARY KEY,
  session_id TEXT,
  text TEXT,
  created_at INTEGER
);
```
**v4 schema:**
```sql
-- Rich, structured schema
CREATE TABLE observations (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  session_id TEXT NOT NULL,
  project TEXT NOT NULL,

  -- Progressive disclosure metadata
  title TEXT NOT NULL,
  subtitle TEXT,
  type TEXT NOT NULL,        -- decision, bugfix, feature, etc.

  -- Content
  narrative TEXT NOT NULL,
  facts TEXT,                -- JSON array

  -- Searchability
  concepts TEXT,             -- JSON array of tags
  files_read TEXT,           -- JSON array
  files_modified TEXT,       -- JSON array

  -- Timestamps
  created_at TEXT NOT NULL,
  created_at_epoch INTEGER NOT NULL,

  FOREIGN KEY(session_id) REFERENCES sdk_sessions(id)
);

-- FTS5 for full-text search
CREATE VIRTUAL TABLE observations_fts USING fts5(
  title, subtitle, narrative, facts, concepts,
  content=observations
);

-- Auto-sync triggers
CREATE TRIGGER observations_ai AFTER INSERT ON observations BEGIN
  INSERT INTO observations_fts(rowid, title, subtitle, narrative, facts, concepts)
  VALUES (new.id, new.title, new.subtitle, new.narrative, new.facts, new.concepts);
END;
```
**What changed:**
- ✅ Structured fields (title, subtitle, type)
- ✅ FTS5 full-text search
- ✅ Project-scoped queries
- ✅ Rich metadata for progressive disclosure
### Worker Service Redesign
**v3 worker:**
```typescript
// Multiple short SDK sessions
app.post('/process', async (req, res) => {
  const response = await query({
    prompt: buildPrompt(req.body),
    options: { maxTurns: 1 }
  });

  for await (const msg of response) {
    // Process a single observation
  }

  res.json({ success: true });
});
```
**v4 worker:**
```typescript
// ONE long-running SDK session
async function runWorker(sessionId: string) {
  const response = query({
    prompt: messageGenerator(), // AsyncIterable
    options: { maxTurns: 1000 }
  });

  for await (const msg of response) {
    if (msg.type === 'text') {
      parseObservations(msg.content);
      parseSummaries(msg.content);
    }
  }
}
```
**Benefits:**
- Maintains conversation state
- SDK handles context automatically
- More efficient (fewer API calls)
- Natural multi-turn flow
---
## Critical Fixes Along the Way
### Fix 1: Context Injection Pollution (v4.3.1)
**Problem:** SessionStart hook output polluted with npm install logs
```bash
# Hook output contained:
npm WARN deprecated ...
npm WARN deprecated ...
{"hookSpecificOutput": {"additionalContext": "..."}}
```
**Why it broke:**
- Claude Code expects clean JSON or plain text
- stderr/stdout from npm install mixed with hook output
- Context didn't inject properly
**Solution:**
```json
{
"command": "npm install --loglevel=silent && node context-hook.js"
}
```
**Result:** Clean JSON output, context injection works
### Fix 2: Double Shebang Issue (v4.3.1)
**Problem:** Hook executables had duplicate shebangs
```javascript
#!/usr/bin/env node
#!/usr/bin/env node // ← Duplicate!
// Rest of code...
```
**Why it happened:**
- Source files had shebang
- esbuild added another shebang during build
**Solution:**
```typescript
// Remove shebangs from source files
// Let esbuild add them during build
```
**Result:** Clean executables, no parsing errors
### Fix 3: FTS5 Injection Vulnerability (v4.2.3)
**Problem:** User input passed directly to FTS5 query
```typescript
// ❌ Vulnerable
const results = db.query(
  `SELECT * FROM observations_fts WHERE observations_fts MATCH '${userQuery}'`
);
```
**Attack:**
```typescript
userQuery = "'; DROP TABLE observations; --"
```
**Solution:**
```typescript
// ✅ Safe: use parameterized queries
const results = db.query(
  'SELECT * FROM observations_fts WHERE observations_fts MATCH ?',
  [userQuery]
);
```
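Parameterization stops SQL injection, but raw user input can still trip over FTS5's own query grammar (unbalanced quotes, bare operators). A companion helper, shown here as an illustrative sketch rather than the project's actual code, quotes each term per SQLite's string rules:

```typescript
// Quote each whitespace-separated term for FTS5 MATCH, doubling embedded
// double quotes, so user input can't be parsed as FTS5 query operators.
function toFtsQuery(userInput: string): string {
  return userInput
    .split(/\s+/)
    .filter(Boolean)
    .map((term) => `"${term.replace(/"/g, '""')}"`)
    .join(' ');
}
```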
### Fix 4: NOT NULL Constraint Violation (v4.2.8)
**Problem:** Session creation failed when prompt was empty
```sql
INSERT INTO sdk_sessions (claude_session_id, user_prompt, ...)
VALUES ('abc123', NULL, ...) -- ❌ user_prompt is NOT NULL
```
**Solution:**
```typescript
// Allow NULL user_prompts
user_prompt: input.prompt ?? null
```
**Schema change:**
```sql
-- Before
user_prompt TEXT NOT NULL
-- After
user_prompt TEXT -- Nullable
```
---
## Performance Improvements
### Optimization 1: Prepared Statements
**Before:**
```typescript
for (const obs of observations) {
  db.run(`INSERT INTO observations (...) VALUES (?, ?, ...)`, [obs.id, obs.text, ...]);
}
```
**After:**
```typescript
const stmt = db.prepare(`INSERT INTO observations (...) VALUES (?, ?, ...)`);
for (const obs of observations) {
  stmt.run([obs.id, obs.text, ...]);
}
stmt.finalize();
```
**Impact:** 5x faster bulk inserts
### Optimization 2: FTS5 Indexing
**Before:**
```typescript
// Manual full-text search via LIKE
const results = db.query(
  `SELECT * FROM observations WHERE text LIKE '%${query}%'`
);
```
**After:**
```typescript
// FTS5 virtual table
const results = db.query(
  `SELECT * FROM observations_fts WHERE observations_fts MATCH ?`,
  [query]
);
```
**Impact:** 100x faster searches on large datasets
### Optimization 3: Index Format Default
**Before:**
```typescript
// Always return full observations
search_observations({ query: "hooks" });
// Returns: 5,000 tokens
```
**After:**
```typescript
// Default to index format
search_observations({ query: "hooks", format: "index" });
// Returns: 200 tokens
// Fetch full only when needed
search_observations({ query: "hooks", format: "full", limit: 1 });
// Returns: 150 tokens
```
**Impact:** 25x reduction in average search result size
---
## What We Learned
### Lesson 1: Context is Precious
**Principle:** Every token you put in context window costs attention.
**Application:**
- Progressive disclosure reduces waste by 87%
- Index-first approach gives agent control
- Token counts make costs visible
### Lesson 2: Session State is Complicated
**Principle:** Distributed state is hard. SDK handles it better than we can.
**Application:**
- Use SDK's built-in session resumption
- Don't try to manually reconstruct state
- Track session IDs from init messages
### Lesson 3: Graceful Beats Aggressive
**Principle:** Let processes finish their work before terminating.
**Application:**
- Graceful cleanup prevents data loss
- Workers finish important operations
- Clean state transitions reduce bugs
### Lesson 4: AI is the Compressor
**Principle:** Don't compress manually. Let AI do semantic compression.
**Application:**
- 10:1 to 100:1 compression ratios
- Semantic understanding, not keyword extraction
- Structured outputs (XML parsing)
### Lesson 5: Progressive Everything
**Principle:** Show metadata first, fetch details on-demand.
**Application:**
- Progressive disclosure in context injection
- Index format in search results
- Layer 1 (titles) → Layer 2 (summaries) → Layer 3 (full details)
---
## The Road Ahead
### Planned: Adaptive Index Size
```typescript
SessionStart({ source: "startup" }):
  → Show last 10 sessions (normal)

SessionStart({ source: "resume" }):
  → Show only current session (minimal)

SessionStart({ source: "compact" }):
  → Show last 20 sessions (comprehensive)
```
### Planned: Relevance Scoring
```typescript
// Use embeddings to pre-sort the index by semantic relevance
search_observations({
  query: "authentication bug",
  sort: "relevance" // Based on embeddings
});
```
### Planned: Multi-Project Context
```typescript
// Cross-project pattern recognition
search_observations({
  query: "API rate limiting",
  projects: ["api-gateway", "user-service", "billing-service"]
});
```
### Planned: Collaborative Memory
```typescript
// Team-shared observations (optional)
createObservation({
  title: "Rate limit: 100 req/min",
  scope: "team" // vs "user"
});
```
---
## Migration Guide: v3 → v5
### Step 1: Backup Database
```bash
cp ~/.claude-recall/claude-recall.db ~/.claude-recall/claude-recall-v3-backup.db
```
### Step 2: Pull the Latest Marketplace Code
```bash
cd ~/.claude/plugins/marketplaces/nhevers
git pull
```
### Step 3: Update Plugin
```bash
/plugin update claude-recall
```
**What happens automatically:**
- Dependencies update (including new ones like chromadb for v5.0.0+)
- Database schema migrations run automatically
- Worker service restarts with new code
- Smart install caching activates (v5.0.3+)
### Step 4: Test
```bash
# Start Claude Code
claude
# Check that context is injected
# (Should see progressive disclosure index with v5 viewer link)
# Open viewer UI (v5.1.0+)
open http://localhost:37777
# Submit a prompt and watch real-time updates in viewer
```
### Step 5: Explore New Features
```bash
# View memory stream in browser (v5.1.0+)
open http://localhost:37777
# Toggle theme (v5.1.2+)
# Click theme button in viewer header
# Check worker health
npm run worker:status
curl http://localhost:37777/health
```
---
## Key Metrics
### v3 Performance
| Metric | Value |
|--------|-------|
| Context usage per session | ~25,000 tokens |
| Relevant context | ~2,000 tokens (8%) |
| Hook execution time | ~200ms |
| Search latency | ~500ms (LIKE queries) |
### v4 Performance
| Metric | Value |
|--------|-------|
| Context usage per session | ~1,100 tokens |
| Relevant context | ~1,100 tokens (100%) |
| Hook execution time | ~45ms |
| Search latency | ~15ms (FTS5) |
### v5 Performance
| Metric | Value |
|--------|-------|
| Context usage per session | ~1,100 tokens |
| Relevant context | ~1,100 tokens (100%) |
| Hook execution time | ~10ms (cached install) |
| Search latency | ~12ms (FTS5) or ~25ms (hybrid) |
| Viewer UI load time | ~50ms (bundled HTML) |
| SSE update latency | ~5ms (real-time) |
**v3 → v4 Improvements:**
- 96% reduction in context waste
- 12x increase in relevance
- 4x faster hooks
- 33x faster search
**v4 → v5 Improvements:**
- 78% faster hooks (smart caching)
- Real-time visualization (viewer UI)
- Better search relevance (hybrid)
- Enhanced UX (theme toggle, persistence)
---
## Conclusion
The journey from v3 to v5 was about understanding these fundamental truths:
1. **Context is finite** - Progressive disclosure respects attention budget
2. **AI is the compressor** - Semantic understanding beats keyword extraction
3. **Agents are smart** - Let them decide what to fetch
4. **State is hard** - Use SDK's built-in mechanisms
5. **Graceful wins** - Let processes finish cleanly
The result is a memory system that's both powerful and invisible. Users never notice it working - Claude just gets smarter over time.
**v5 adds visibility**: Now users CAN see the memory system working if they want (via viewer UI), but it's still non-intrusive.
---
## Further Reading
- [Progressive Disclosure](progressive-disclosure) - The philosophy behind v4
- [Hooks Architecture](hooks-architecture) - How hooks power the system
- [Context Engineering](context-engineering) - Foundational principles
- [Worker Service](/architecture/worker-service) - Real-time visualization (v5.1.0+)
---
*This architecture evolution reflects hundreds of hours of experimentation, dozens of dead ends, and the invaluable experience of real-world usage. v5 is the architecture that emerged from understanding what actually works - and making it visible to users.*