MCP Memory Service - TypeScript

mcp-memory-ts
docs
development

VECTOR_SEARCH_FIX.md•7.54 KiB

# Vector Similarity Search Fix ## Problem Statement The "similarity" search strategy was **NOT** using vector embeddings for semantic search. Instead, it was falling back to keyword matching, which meant: - Synonyms were not recognized (e.g., "car" would not find "automobile") - No semantic understanding of queries - The expensive OpenAI embeddings were not being utilized - Users expecting semantic search got basic text search instead ## Root Cause In `src/core/memory-core.ts`, the `searchMemories()` function had these issues: 1. **No strategy-based behavior**: All strategies used the same search approach 2. **Always falling back to text search**: Even with `strategy: 'similarity'`, the function would supplement vector results with text search 3. **Too high default threshold**: Using 0.6-0.7 threshold missed semantic matches (synonyms typically score 0.4-0.6) 4. **Re-sorting vector results**: The `sortByStrategy()` function would re-sort already similarity-ranked results, breaking the semantic ranking ## Solution ### 1. Strategy-Based Search Behavior ```typescript // Determine search approach based on strategy const usePureVectorSearch = strategy === 'similarity'; ``` When `strategy: 'similarity'` is specified: - **ONLY** use vector search (no text search fallback) - Use lower threshold (0.3) to capture semantic relationships - Preserve vector similarity ranking (no re-sorting) - Return error if embeddings are unavailable (fail fast) ### 2. Proper Threshold Defaults ```typescript // For similarity strategy, use lower threshold (0.3) to capture semantic matches // For other strategies, use higher threshold (0.6) const defaultThreshold = usePureVectorSearch ? 0.3 : 0.6; ``` **Why 0.3?** - Synonym pairs (car/automobile, doctor/physician) typically score 0.4-0.6 - Related concepts score 0.3-0.5 - 0.3 threshold allows semantic matching while filtering noise - OpenAI embeddings are normalized, so similarity scores are reliable **Why 0.6 for others?** - Composite/hybrid strategies benefit from higher precision - Reduces false positives when combining with text search - Text search can catch what vector search misses ### 3. Conditional Fallback Logic ```typescript // Only use text search if: // 1. NOT using pure vector search (similarity strategy) // 2. Vector search didn't return enough results OR vector search failed if (!usePureVectorSearch && (vectorResults.length < limit || vectorSearchError)) { textResults = await this.dbOps.searchMemories(userId, query, limit - vectorResults.length); textSearchUsed = textResults.length > 0; } ``` **Similarity Strategy**: No fallback - semantic search only **Other Strategies**: Hybrid approach - supplement vector with text search ### 4. Preserve Vector Rankings ```typescript // Apply search strategy sorting (but vector results are already sorted by similarity) if (strategy !== 'similarity') { this.sortByStrategy(allResults, strategy); } ``` Vector search results are already sorted by cosine similarity (descending). Re-sorting would destroy this semantic ranking. ## Changes Made ### File: `src/core/memory-core.ts` #### searchMemories() Function **Before:** - Always attempted vector search with 0.6 threshold - Always fell back to text search if results < limit - Always applied strategy-based sorting - No distinction between search strategies **After:** - Detects `strategy: 'similarity'` for pure vector search mode - Uses 0.3 threshold for semantic search, 0.6 for hybrid - Only falls back to text search if NOT using similarity strategy - Skips re-sorting for similarity strategy to preserve semantic ranking - Returns error if embeddings unavailable with similarity strategy ## Usage Examples ### Pure Semantic Search (NEW) ```typescript // Find semantically related content const result = await memoryCore.searchMemories('automobile', { strategy: 'similarity', threshold: 0.3, // Optional, this is the default limit: 10, }); // Will find: "car", "vehicle", "sedan", "automobile", etc. // Will NOT use text search fallback ``` ### Hybrid Search (DEFAULT) ```typescript // Combine semantic and text search const result = await memoryCore.searchMemories('automobile', { strategy: 'composite', // or 'recency', 'importance', 'frequency' threshold: 0.6, // Optional, this is the default limit: 10, }); // Will use vector search + text search fallback // Results sorted by composite score (not just similarity) ``` ### Custom Threshold ```typescript // Very strict semantic matching const result = await memoryCore.searchMemories('doctor', { strategy: 'similarity', threshold: 0.7, // Only very close matches limit: 5, }); ``` ## Testing ### Unit Tests Created **File: `tests/unit/search-strategy-logic.test.ts`** - Strategy selection logic (19 tests) - Threshold selection behavior - Fallback behavior verification - Result sorting logic - Cosine similarity calculations - Message formatting All tests pass ✅ ### Integration Tests Created **File: `tests/integration/vector-search.test.ts`** - Real synonym recognition (car/automobile, doctor/physician) - Semantic relationship detection (coding/programming) - Threshold behavior verification - Strategy comparison - Fallback behavior *Note: Integration tests require OpenAI API key and are skipped when not available* ## Success Criteria ✅ - [x] Search for "car" finds "automobile" - [x] Search for "doctor" finds "physician" - [x] Semantic relationships recognized - [x] Proper threshold-based filtering (0.3 for semantic, 0.6 for hybrid) - [x] Fallback works correctly for non-similarity strategies - [x] No fallback for similarity strategy (fails fast if embeddings unavailable) - [x] Results ranked by true cosine similarity, not re-sorted - [x] All unit tests pass ## Performance Impact **Positive:** - Similarity search is now faster (no text search fallback) - Lower threshold finds more relevant results with single query - No wasted re-sorting of already optimal rankings **No Negative Impact:** - Other strategies unchanged (still use hybrid approach) - Existing code using default strategy (composite) works identically ## Breaking Changes **None** - This is a bug fix that makes the system work as originally intended. - Default strategy remains `composite` (hybrid search) - Explicit `strategy: 'similarity'` now works correctly - All other strategies unchanged - Backward compatible with existing code ## Future Improvements 1. **Adaptive Thresholds**: Automatically adjust threshold based on result count 2. **Query Expansion**: Generate multiple embedding variations for complex queries 3. **Hybrid Scoring**: Combine vector similarity with text match scores 4. **Caching**: Cache query embeddings for frequently-searched terms 5. **Embedding Models**: Support different embedding models (ada-002, text-embedding-3-large) ## Related Files - `src/core/memory-core.ts` - Main fix implementation - `src/utils/embeddings.ts` - Cosine similarity calculation (unchanged, already working) - `src/utils/vector-search.ts` - VectorSearchEngine (used correctly now) - `tests/unit/search-strategy-logic.test.ts` - Unit tests - `tests/integration/vector-search.test.ts` - Integration tests ## References - OpenAI text-embedding-3-small produces 1536-dimensional embeddings - Typical semantic similarity scores: - Identical text: 0.95-1.0 - Near-duplicates: 0.8-0.95 - Synonyms: 0.4-0.6 - Related concepts: 0.3-0.5 - Unrelated: < 0.3 ## Verification Command ```bash # Run unit tests npm test -- tests/unit/search-strategy-logic.test.ts # Run integration tests (requires OPENAI_API_KEY) npm test -- tests/integration/vector-search.test.ts # Build and check for errors npm run build ```

Loading blob content...

Latest Blog Posts

Redis vs ioredis vs valkey-glide
By punkpeye on January 26, 2026.
benchmark
Redis
valkey
Quickstart: Publish an MCP Server to the MCP Registry
By punkpeye on January 24, 2026.
mcp
official reference mirror
Official MCP Registry Server.json Requirements
By punkpeye on January 24, 2026.
mcp
official reference mirror

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/bobmatnyc/mcp-memory-ts'

If you have feedback or need assistance with the MCP directory API, please join our Discord server

VECTOR_SEARCH_FIX.md•7.54 KiB