MCP Memory Service - TypeScript

ASYNC_EMBEDDING_OPTIMIZATION.md•9.23 KiB

# Async Embedding Optimization ## Overview The async embedding optimization reduces perceived latency for memory creation operations by **90-95%** (from ~500-2000ms to ~50ms) by deferring embedding generation to a background queue. ## Problem Statement Prior to this optimization: - Memory creation blocked waiting for OpenAI API response (500-2000ms) - Users experienced slow response times - High latency even though embedding isn't needed immediately - Background queue existed but wasn't used as primary strategy ## Solution Added three embedding generation modes: 1. **`'sync'`** (default, backward compatible) - Generate embedding before returning 2. **`'async'`** (optimized) - Queue embedding generation, return immediately 3. **`false`** (disabled) - Skip embedding generation entirely ## Performance Impact | Mode | Response Time | Use Case | |------|--------------|----------| | Sync | 500-2000ms | When semantic search is needed immediately | | Async | ~50ms | Standard operations (recommended) | | Disabled | ~30ms | When embeddings are not needed | **Speedup: 10-40x faster response time with async mode** ## Usage ### TypeScript API ```typescript import { MemoryCore } from 'mcp-memory-ts'; const memoryCore = new MemoryCore(db, embeddings, userEmail); // Option 1: Async mode (recommended, fast) await memoryCore.addMemory('Title', 'Content', MemoryType.MEMORY, { generateEmbedding: 'async', // ~50ms response tags: ['example'], }); // Option 2: Sync mode (backward compatible, slow) await memoryCore.addMemory('Title', 'Content', MemoryType.MEMORY, { generateEmbedding: true, // or 'sync', ~500-2000ms response tags: ['example'], }); // Option 3: Disabled (no embedding) await memoryCore.addMemory('Title', 'Content', MemoryType.MEMORY, { generateEmbedding: false, // ~30ms response tags: ['example'], }); ``` ### MCP Server (Claude Desktop) The MCP server now uses async mode by default: ```typescript // In desktop-mcp-server.ts await this.memoryCore.addMemory(title, content, type, { generateEmbedding: 'async', // Default changed to async }); ``` Users will see: ``` ✅ Memory stored successfully! 🔄 embedding queued (generating in background) ID: 12345 Content: Your memory content Type: personal Importance: 0.5 ``` ## How It Works ### Sync Mode (Traditional) ``` 1. Create memory data → 2. Call OpenAI API (500-2000ms) → 3. Save to DB → 4. Return Total: ~500-2000ms ``` ### Async Mode (Optimized) ``` 1. Create memory data → 2. Save to DB → 3. Return ↓ (background) 4. Queue embedding update Total: ~50ms (user waits) Background: Embedding generated asynchronously ``` ## Implementation Details ### Type Definitions ```typescript // src/types/base.ts export interface AddMemoryOptions { generateEmbedding?: boolean | 'sync' | 'async'; } ``` ### Core Logic ```typescript // src/core/memory-core.ts private normalizeEmbeddingOption( option?: boolean | 'sync' | 'async' ): 'sync' | 'async' | 'disabled' { if (option === false) return 'disabled'; if (option === 'async') return 'async'; return 'sync'; // Default for backward compatibility } ``` ### Background Processing The embedding updater service handles queued embeddings: - Processes in batches (default: 10 items) - Automatic retry on failure (default: 3 attempts) - Rate limiting to avoid API limits - Monitors for missing embeddings periodically ## Backward Compatibility ✅ **Fully backward compatible** - existing code continues to work: ```typescript // All these work identically to before: addMemory(title, content) // Default is sync addMemory(title, content, type, { generateEmbedding: true }) addMemory(title, content, type, { generateEmbedding: false }) ``` ## Search Behavior ### With Async Embeddings 1. **Immediate text search**: Memory is searchable by text immediately 2. **Background embedding**: Generated within seconds (typically < 5s) 3. **Semantic search**: Available once embedding completes 4. **No user action required**: Automatic upgrade to semantic search ### Example Timeline ``` t=0ms: Memory created, text search available t=50ms: API returns success to user t=2000ms: Embedding generated in background t=2001ms: Semantic search now available ``` ## Monitoring and Debugging ### Response Data ```typescript { status: 'success', message: 'Memory added successfully', data: { id: '12345', title: 'My Memory', hasEmbedding: false, // True if sync, false if async embeddingQueued: true // True if queued for background processing } } ``` ### Logs ``` [MemoryCore] 🚀 Async mode: Creating memory "Title" without embedding (will be queued) [MemoryCore] 🔄 Queuing memory 12345 for embedding update (async mode) [EmbeddingUpdater] Updated embedding for memory 12345 ``` ### Debug Mode ```bash MCP_DEBUG=1 mcp-memory server ``` Shows detailed timing and queue information. ## Configuration ### Embedding Updater Options ```typescript const embeddingUpdater = new EmbeddingUpdater(db, apiKey, { batchSize: 10, // Process 10 at a time retryAttempts: 3, // Retry failed embeddings 3 times retryDelay: 1000, // Wait 1s between retries }); ``` ### Enable Monitoring ```bash # .env or config ENABLE_EMBEDDING_MONITOR=true EMBEDDING_MONITOR_INTERVAL=60000 # Check every 60s ``` ## Best Practices ### When to Use Each Mode **Async Mode (Recommended):** - Standard memory creation - Interactive user operations - When low latency is important - Most MCP tool calls **Sync Mode:** - When semantic search is needed immediately - Batch operations where latency doesn't matter - Testing and development **Disabled Mode:** - Temporary/ephemeral data - When semantic search is never needed - High-volume operations where embeddings are wasteful ### Recommendations 1. **Default to async**: Use async mode for 95% of operations 2. **Monitor queue**: Ensure embedding updater is running 3. **Check completion**: Use logs to verify embeddings complete 4. **Fallback gracefully**: Text search works immediately if embedding fails ## Migration Guide ### For Existing Code No changes required! Existing code continues to work: ```typescript // Before: Sync by default await addMemory(title, content); // After: Still sync by default (backward compatible) await addMemory(title, content); ``` ### To Adopt Async Mode Simply add the option: ```typescript // Change this: await addMemory(title, content, type, { generateEmbedding: true }); // To this: await addMemory(title, content, type, { generateEmbedding: 'async' }); ``` ### For MCP Server The default has been changed in `desktop-mcp-server.ts`: ```typescript // Old: generateEmbedding: true // New: generateEmbedding: 'async' ``` ## Testing ### Unit Tests ```typescript it('should queue embedding in async mode', async () => { const result = await memoryCore.addMemory('Test', 'Content', MemoryType.MEMORY, { generateEmbedding: 'async' }); expect(result.status).toBe('success'); expect(result.data.hasEmbedding).toBe(false); expect(result.data.embeddingQueued).toBe(true); }); ``` ### Integration Tests ```bash npm test tests/integration/memory-core.test.ts ``` ### Performance Tests Run the demo to see actual timings: ```bash npm run build node examples/async-embedding-demo.js ``` ## Troubleshooting ### Embeddings Not Generated **Symptoms**: `embeddingQueued: true` but embeddings never appear **Solutions**: 1. Check embedding updater is initialized 2. Verify OpenAI API key is valid 3. Check logs for queue processing errors 4. Ensure monitor is enabled: `ENABLE_EMBEDDING_MONITOR=true` ### Slow Background Processing **Symptoms**: Embeddings take minutes to generate **Solutions**: 1. Reduce batch size if hitting rate limits 2. Increase monitor interval if too frequent 3. Check OpenAI API quota and limits 4. Review retry settings (may be waiting on failures) ### Memory Not Found in Semantic Search **Symptoms**: Text search works but semantic search doesn't find memory **Solutions**: 1. Wait a few seconds for embedding to complete 2. Check if embedding was generated: `hasEmbedding` field 3. Lower similarity threshold: `threshold: 0.3` 4. Verify embedding updater is running ## Future Enhancements Potential improvements: - Priority queue for important memories - Batch embedding generation for multiple memories - Configurable per-user embedding strategies - Real-time embedding status API - Webhook notifications when embeddings complete ## References - [Embedding Service Implementation](/Users/masa/Projects/managed/mcp-memory-ts/src/utils/embeddings.ts) - [Memory Core Implementation](/Users/masa/Projects/managed/mcp-memory-ts/src/core/memory-core.ts) - [Embedding Updater Service](/Users/masa/Projects/managed/mcp-memory-ts/src/services/embedding-updater.ts) - [Type Definitions](/Users/masa/Projects/managed/mcp-memory-ts/src/types/base.ts) ## Version History - **v1.6.0** (2025-10-08): Initial implementation of async embedding optimization - Added `'async'` mode to `generateEmbedding` option - Changed MCP server default to async mode - Added `embeddingQueued` to response data - Maintained full backward compatibility --- **Performance Improvement: 10-40x faster response time** **Backward Compatible: 100%** **Recommended: Use async mode for all interactive operations**

Loading blob content...

Latest Blog Posts

Redis vs ioredis vs valkey-glide
By punkpeye on January 26, 2026.
benchmark
Redis
valkey
Quickstart: Publish an MCP Server to the MCP Registry
By punkpeye on January 24, 2026.
mcp
official reference mirror
Official MCP Registry Server.json Requirements
By punkpeye on January 24, 2026.
mcp
official reference mirror

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/bobmatnyc/mcp-memory-ts'

If you have feedback or need assistance with the MCP directory API, please join our Discord server

ASYNC_EMBEDDING_OPTIMIZATION.md•9.23 KiB