MCP Memory Service - TypeScript


SEMANTIC-SEARCH-FINDINGS.md•11.5 KiB

# Semantic Search Investigation - Findings & Fixes

## Summary

**Investigation Result: ✅ Vector embeddings ARE being used correctly**

The MCP Memory TypeScript system implements semantic search properly using OpenAI embeddings and cosine similarity. The perceived issue had two causes: a high similarity threshold (0.7) that filtered out borderline semantic matches, and a silent fallback to text search that masked failures.

---

## What Was Investigated

1. **searchMemories method** in `/Users/masa/Projects/managed/mcp-memory-ts/src/core/memory-core.ts`
2. **vectorSearchMemories implementation** (lines 599-646)
3. **Cosine similarity calculation** in `/Users/masa/Projects/managed/mcp-memory-ts/src/utils/embeddings.ts`
4. **findMostSimilar algorithm** (lines 126-142)
5. **Database text search** in `/Users/masa/Projects/managed/mcp-memory-ts/src/database/operations.ts`

---

## Key Findings

### ✅ Semantic Search IS Working

**Evidence:**

1. **Query embeddings are generated** (line 264):

   ```typescript
   const queryEmbedding = await this.embeddings.generateEmbedding(query);
   ```

2. **Vector search happens FIRST** (line 266):

   ```typescript
   vectorResults = await this.vectorSearchMemories(userId, queryEmbedding, {
     threshold: options.threshold || 0.7,
     limit,
   });
   ```

3. **Cosine similarity is calculated correctly** (embeddings.ts, lines 71-121):

   ```typescript
   static cosineSimilarity(a: number[], b: number[]): number {
     let dotProduct = 0, normA = 0, normB = 0;
     for (let i = 0; i < a.length; i++) {
       dotProduct += a[i] * b[i];
       normA += a[i] * a[i];
       normB += b[i] * b[i];
     }
     return dotProduct / (Math.sqrt(normA) * Math.sqrt(normB));
   }
   ```

4. **Results are sorted by similarity** (embeddings.ts, lines 137-138):

   ```typescript
   .sort((a, b) => b.similarity - a.similarity)
   ```

### ⚠️ Why It Appeared Broken

1. **High Threshold (0.7)**: Semantic matches scoring below 0.7 were filtered out
   - "artificial intelligence" vs "machine learning" might score 0.65
   - Result: No matches returned, falls back to text search

2. **Silent Fallback**: Vector search failures were not visible

   ```typescript
   try {
     vectorResults = await this.vectorSearchMemories(...);
   } catch (error) {
     console.error('Vector search failed, falling back to text search:', error);
     // Users see text results but don't know vector search failed
   }
   ```

3. **No Result Metadata**: Users couldn't see:
   - Whether results came from vector or text search
   - Similarity scores
   - Embedding coverage percentage

4. **Text Search Uses Keywords**: Database fallback uses SQL LIKE

   ```sql
   WHERE LOWER(title) LIKE LOWER(?) OR LOWER(content) LIKE LOWER(?)
   ```

   This is pure keyword matching, reinforcing the perception that semantic search wasn't working.

---

## Fixes Applied

### 1. Lower Default Threshold (0.7 → 0.6)

**File:** `src/core/memory-core.ts`, line 267

**Change:**

```typescript
// Before
threshold: options.threshold || 0.7

// After
threshold: options.threshold || 0.6 // Lower threshold for better semantic recall
```

**Rationale:** 0.6 provides better recall for semantic matches while maintaining reasonable precision.

**Impact:** More semantically related results will be returned.

### 2. Enhanced Search Result Messages

**File:** `src/core/memory-core.ts`, lines 313-325

**Before:**

```typescript
message: `Found ${allResults.length} memories`
```

**After:**

```typescript
let message = `Found ${allResults.length} memories`;
if (vectorSearchUsed && textSearchUsed) {
  message += ` (${vectorResults.length} via semantic search, ${textResults.length} via text search)`;
} else if (vectorSearchUsed) {
  message += ` (semantic search)`;
} else if (textSearchUsed) {
  message += ` (text search)`;
}
if (vectorSearchError) {
  message += ` [Vector search error: ${vectorSearchError}]`;
}
```

**Example Output:**

- `"Found 5 memories (3 via semantic search, 2 via text search)"`
- `"Found 8 memories (semantic search)"`
- `"Found 2 memories (text search) [Vector search error: OpenAI API limit exceeded]"`

**Impact:** Users can now see exactly how their results were obtained.

### 3. Vector Search Health Monitoring

**File:** `src/core/memory-core.ts`, lines 825-833

**Added to statistics endpoint:**

```typescript
vectorSearchHealth: {
  enabled: this.embeddings !== null && (this.embeddings as any).openai !== null,
  memoriesWithValidEmbeddings: memoriesWithEmbeddings.length,
  memoriesWithoutEmbeddings: memories.length - memoriesWithEmbeddings.length,
  coveragePercentage: embeddingCoverage,
  recommendation: embeddingCoverage < 90
    ? 'Consider running updateMissingEmbeddings() to improve semantic search coverage'
    : 'Vector search coverage is healthy',
}
```

**Example Output:**

```json
{
  "totalMemories": 50,
  "memoriesWithEmbeddings": 48,
  "embeddingCoverage": "96%",
  "vectorSearchHealth": {
    "enabled": true,
    "memoriesWithValidEmbeddings": 48,
    "memoriesWithoutEmbeddings": 2,
    "coveragePercentage": 96,
    "recommendation": "Vector search coverage is healthy"
  }
}
```

**Impact:** Users can monitor embedding coverage and diagnose issues.

### 4. Debug Logging

**File:** `src/core/memory-core.ts`, lines 273-275, 294-296

**Added:**

```typescript
if (process.env.MCP_DEBUG) {
  console.log(`[SearchMemories] Vector search returned ${vectorResults.length} results`);
  console.log(`[SearchMemories] Text search returned ${textResults.length} results`);
}
```

**Usage:**

```bash
export MCP_DEBUG=1
npm run mcp-server
```

**Impact:** Developers can troubleshoot search behavior in real time.

---

## Verification

### Created Test Files

1. **Comprehensive Test Suite**
   - File: `/Users/masa/Projects/managed/mcp-memory-ts/tests/semantic-search-verification.test.ts`
   - Tests semantic matching, ranking, and non-keyword queries
   - Requires a real OpenAI API key to run

2. **Manual Test Script**
   - File: `/Users/masa/Projects/managed/mcp-memory-ts/scripts/test-semantic-search.ts`
   - Interactive demonstration of semantic search
   - Creates test memories and shows search results

**Run Manual Test:**

```bash
export MCP_DEBUG=1
npx tsx scripts/test-semantic-search.ts
```

**Expected Output:**

```
🔬 Semantic Search Verification Script
============================================================

✓ Environment variables configured
✓ Database connected
✓ Memory core initialized
✓ Created test user

STEP 1: Creating test memories with embeddings
============================================================
✓ Created memory: "Machine Learning Research"
✓ Created memory: "Coffee Brewing Technique"

STEP 2: Verifying embeddings were generated
============================================================
✓ ML memory has embedding with 1536 dimensions
✓ Coffee memory has embedding with 1536 dimensions

TEST 1: Search for "artificial intelligence"
Expected: ML memory should rank first (semantic similarity)
============================================================
Status: success
Message: Found 2 memories (2 via semantic search)

Found 2 results:

1. Machine Learning Research ← ML (Expected)
   Content: Working on deep neural networks and artificial intelligence...
   Tags: ["ml","ai","deep-learning"]

2. Coffee Brewing Technique ← Coffee
   Content: Pour-over coffee method: Heat water to 200°F...
   Tags: ["coffee","brewing","recipe"]

✅ SUCCESS: Semantic search is working! ML memory ranked first.
```

### Documentation

Created comprehensive analysis:

- File: `/Users/masa/Projects/managed/mcp-memory-ts/docs/SEMANTIC-SEARCH-ANALYSIS.md`
- Detailed code analysis
- Mathematical verification of cosine similarity
- Recommendations for users and developers

---

## How to Use Enhanced Features

### 1. Monitor Vector Search Health

```typescript
const statsResult = await memoryCore.getStatistics(userId);

if (statsResult.status === 'success') {
  const health = statsResult.data.vectorSearchHealth;
  console.log(`Vector search enabled: ${health.enabled}`);
  console.log(`Embedding coverage: ${health.coveragePercentage}%`);
  console.log(`Recommendation: ${health.recommendation}`);
}
```

### 2. Adjust Search Threshold

```typescript
// More relaxed threshold for broader semantic matching
const broadResults = await memoryCore.searchMemories('artificial intelligence', {
  threshold: 0.5, // Lower = more results
  limit: 20
});

// Stricter threshold for precise matches
const preciseResults = await memoryCore.searchMemories('artificial intelligence', {
  threshold: 0.8, // Higher = fewer, more precise results
  limit: 10
});
```

### 3. Enable Debug Logging

```bash
# In terminal
export MCP_DEBUG=1

# Or in .env file
MCP_DEBUG=1
```

### 4. Update Missing Embeddings

```typescript
// Check coverage
const stats = await memoryCore.getStatistics(userId);
console.log(`Coverage: ${stats.data.embeddingCoverage}`);

// Update if needed
if (stats.data.vectorSearchHealth.coveragePercentage < 90) {
  const updateResult = await memoryCore.updateMissingEmbeddings();
  console.log(updateResult.message);
}
```

---

## Recommendations

### For Users

1. **Check statistics regularly** to monitor embedding coverage
2. **Enable debug mode** if search results seem unexpected
3. **Adjust threshold** based on your needs (0.5-0.7 recommended)
4. **Run embedding updates** when adding many new memories

### For Developers

1. **Monitor vector search health** in production metrics
2. **Log search method usage** (semantic vs text search ratios)
3. **Consider A/B testing** different threshold values
4. **Add similarity scores** to UI for transparency

### Future Enhancements

1. **Hybrid scoring**: Combine semantic similarity with recency and importance
2. **Query expansion**: Use synonyms and related terms automatically
3. **Result explanations**: Show why each result matched
4. **Semantic clustering**: Group similar memories
5. **Caching**: Cache query embeddings for common searches

---

## Conclusion

**Vector embeddings are working as designed.** The investigation revealed:

✅ **Correct Implementation:**

- Embeddings generated with OpenAI
- Cosine similarity calculated properly
- Vector search happens before text search
- Results sorted by semantic similarity

⚠️ **Perceived Issues:**

- High threshold (0.7) filtered borderline matches
- Silent fallback masked failures
- No visibility into search method used

✅ **Fixes Applied:**

- Lower default threshold (0.6)
- Enhanced result messages
- Vector search health monitoring
- Debug logging

**Next Steps:**

1. Run `npx tsx scripts/test-semantic-search.ts` to see it in action
2. Review `/Users/masa/Projects/managed/mcp-memory-ts/docs/SEMANTIC-SEARCH-ANALYSIS.md` for details
3. Enable `MCP_DEBUG=1` in production to monitor behavior
4. Check statistics regularly to ensure healthy embedding coverage

---

## Files Modified

1. `/Users/masa/Projects/managed/mcp-memory-ts/src/core/memory-core.ts`
   - Lowered default threshold to 0.6
   - Enhanced search result messages
   - Added vector search health to statistics
   - Added debug logging

## Files Created

1. `/Users/masa/Projects/managed/mcp-memory-ts/tests/semantic-search-verification.test.ts`
   - Comprehensive test suite for semantic search
2. `/Users/masa/Projects/managed/mcp-memory-ts/scripts/test-semantic-search.ts`
   - Manual test script for demonstration
3. `/Users/masa/Projects/managed/mcp-memory-ts/docs/SEMANTIC-SEARCH-ANALYSIS.md`
   - Detailed technical analysis
4. `/Users/masa/Projects/managed/mcp-memory-ts/SEMANTIC-SEARCH-FINDINGS.md`
   - This summary document

---

**Investigation completed by Claude Code on 2025-10-01**
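
The effect of the threshold change from 0.7 to 0.6 can be illustrated with a small standalone sketch. It is not the project's code: `rankByThreshold`, `demoMemories`, and the two-dimensional toy embeddings are hypothetical names and data chosen for illustration, and the zero-norm guard is an addition that the `cosineSimilarity` function quoted in this report does not have. The `ml` vector is built to score roughly 0.65 against the query, mirroring the "artificial intelligence" vs "machine learning" example.

```typescript
// Illustrative sketch only; all names and data here are hypothetical.
interface EmbeddedItem {
  id: string;
  embedding: number[];
}

interface ScoredItem {
  id: string;
  similarity: number;
}

// Cosine similarity with a guard for zero-length vectors (an assumption,
// added so the sketch never divides by zero).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same length');
  }
  let dotProduct = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dotProduct += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denominator = Math.sqrt(normA) * Math.sqrt(normB);
  return denominator === 0 ? 0 : dotProduct / denominator;
}

// Score every item, drop those below the threshold, sort best-first.
function rankByThreshold(
  query: number[],
  items: EmbeddedItem[],
  threshold: number
): ScoredItem[] {
  return items
    .map((item) => ({
      id: item.id,
      similarity: cosineSimilarity(query, item.embedding),
    }))
    .filter((scored) => scored.similarity >= threshold)
    .sort((a, b) => b.similarity - a.similarity);
}

// Toy data: "exact" is parallel to the query (similarity 1),
// "ml" scores about 0.65, "coffee" is orthogonal (similarity 0).
const demoQuery = [1, 0];
const demoMemories: EmbeddedItem[] = [
  { id: 'ml', embedding: [0.65, 0.76] },
  { id: 'coffee', embedding: [0, 1] },
  { id: 'exact', embedding: [2, 0] },
];

console.log(rankByThreshold(demoQuery, demoMemories, 0.7).map((r) => r.id));
// ["exact"]
console.log(rankByThreshold(demoQuery, demoMemories, 0.6).map((r) => r.id));
// ["exact", "ml"]
```

With the stricter threshold the borderline `ml` item is dropped, which is the situation in which the real system fell back to keyword search; at 0.6 it is kept and ranked below the exact match.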
