MCP Memory Service

Overview Schema Related Servers Score Discussions

memory-quality-guide.md•15.5 KiB

# Memory Quality System Guide > **Version**: 8.45.0 > **Status**: Production Ready > **Feature**: Memento-Inspired Quality System (Issue #260) ## Overview The **Memory Quality System** transforms MCP Memory Service from static storage to a learning memory system. It automatically evaluates memory quality using AI-driven scoring and uses these scores to improve retrieval precision, consolidation efficiency, and overall system intelligence. ### Key Benefits - ✅ **40-70% improvement** in retrieval precision (top-5 useful rate: 50% → 70-85%) - ✅ **Zero cost** with local SLM (privacy-preserving, offline-capable) - ✅ **Smarter consolidation** - Preserve high-quality memories longer - ✅ **Quality-boosted search** - Prioritize best memories in results - ✅ **Automatic learning** - System improves from usage patterns ## How It Works ### Multi-Tier AI Scoring (Local-First) The system evaluates memory quality (0.0-1.0 score) using a multi-tier fallback chain: | Tier | Provider | Cost | Latency | Privacy | Default | |------|----------|------|---------|---------|---------| | **1** | **Local SLM (ONNX)** | **$0** | **50-100ms** | ✅ Full | ✅ Yes | | 2 | Groq API | ~$0.30/mo | 900ms | ❌ External | ❌ Opt-in | | 3 | Gemini API | ~$0.40/mo | 2000ms | ❌ External | ❌ Opt-in | | 4 | Implicit Signals | $0 | 10ms | ✅ Full | Fallback | **Default setup**: Local SLM only (zero cost, full privacy, no external API calls) ### Quality Score Components ``` quality_score = ( local_slm_score × 0.50 + # Cross-encoder evaluation implicit_signals × 0.50 # Usage patterns ) implicit_signals = ( access_frequency × 0.40 + # How often retrieved recency × 0.30 + # When last accessed retrieval_ranking × 0.30 # Average position in results ) ``` ### Local SLM (Tier 1 - Primary) **Model**: `ms-marco-MiniLM-L-6-v2` (23MB) **Architecture**: Cross-encoder (processes query + memory together) **Performance**: - CPU: 50-100ms per evaluation - GPU (CUDA/MPS/DirectML): 10-20ms per evaluation **Scoring Process**: 1. Tokenize: `[CLS] query [SEP] memory [SEP]` 2. Run ONNX inference (local, private) 3. Return relevance score 0.0-1.0 **GPU Acceleration** (automatic): - CUDA (NVIDIA) - CoreML/MPS (Apple Silicon) - DirectML (Windows) - CPU fallback (always works) ## Installation & Setup ### 1. Basic Setup (Local SLM Only) **Zero configuration required** - The quality system works out of the box with local SLM: ```bash # Install MCP Memory Service (if not already installed) pip install mcp-memory-service # Quality system is enabled by default with local SLM # No API keys needed, no external calls ``` ### 2. Optional: Cloud APIs (Opt-In) If you want cloud-based scoring (Groq or Gemini): ```bash # Enable Groq API (fast, cheap) export GROQ_API_KEY="your-groq-api-key" export MCP_QUALITY_AI_PROVIDER=groq # or "auto" to try all tiers # Enable Gemini API (Google) export GOOGLE_API_KEY="your-gemini-api-key" export MCP_QUALITY_AI_PROVIDER=gemini ``` ### 3. Configuration Options ```bash # Quality System Core export MCP_QUALITY_SYSTEM_ENABLED=true # Default: true export MCP_QUALITY_AI_PROVIDER=local # local|groq|gemini|auto|none # Local SLM Configuration (Tier 1) export MCP_QUALITY_LOCAL_MODEL=ms-marco-MiniLM-L-6-v2 # Model name export MCP_QUALITY_LOCAL_DEVICE=auto # auto|cpu|cuda|mps|directml # Quality-Boosted Search (Opt-In) export MCP_QUALITY_BOOST_ENABLED=false # Default: false (opt-in) export MCP_QUALITY_BOOST_WEIGHT=0.3 # 0.0-1.0 (30% quality, 70% semantic) # Quality-Based Retention (Consolidation) export MCP_QUALITY_RETENTION_HIGH=365 # Days for quality ≥0.7 export MCP_QUALITY_RETENTION_MEDIUM=180 # Days for quality 0.5-0.7 export MCP_QUALITY_RETENTION_LOW_MIN=30 # Min days for quality <0.5 export MCP_QUALITY_RETENTION_LOW_MAX=90 # Max days for quality <0.5 ``` ## Using the Quality System ### 1. Automatic Quality Scoring Quality scores are calculated automatically when memories are retrieved: ```bash # Normal retrieval - quality scoring happens in background claude /memory-recall "what did I work on yesterday" # Quality score is updated in metadata (non-blocking) ``` ### 2. Manual Rating (Optional) Override AI scores with manual ratings: ```bash # Rate a memory (MCP tool) rate_memory( content_hash="abc123...", rating=1, # -1 (bad), 0 (neutral), 1 (good) feedback="This was very helpful!" ) # Manual ratings weighted 60%, AI scores weighted 40% ``` **HTTP API**: ```bash curl -X POST http://127.0.0.1:8000/api/quality/memories/{hash}/rate \ -H "Content-Type: application/json" \ -d '{"rating": 1, "feedback": "Helpful!"}' ``` ### 3. Quality-Boosted Search Enable quality-based reranking for better results: **Method 1: Global Configuration** ```bash export MCP_QUALITY_BOOST_ENABLED=true claude /memory-recall "search query" # Uses quality boost ``` **Method 2: Per-Query (MCP Tool)** ```bash # Search with quality boost (MCP tool) retrieve_with_quality_boost( query="search query", n_results=10, quality_weight=0.3 # 30% quality, 70% semantic ) ``` **Algorithm**: 1. Over-fetch 3× candidates (30 results for top 10) 2. Rerank by: `0.7 × semantic_similarity + 0.3 × quality_score` 3. Return top N results **Performance**: <100ms total (50ms semantic search + 20ms reranking + 30ms quality scoring) ### 4. View Quality Metrics **MCP Tool**: ```bash get_memory_quality(content_hash="abc123...") # Returns: # - quality_score: Current composite score (0.0-1.0) # - quality_provider: Which tier scored it (ONNXRankerModel, etc.) # - access_count: Number of retrievals # - last_accessed_at: Last access timestamp # - ai_scores: Historical AI evaluation scores # - user_rating: Manual rating if present ``` **HTTP API**: ```bash curl http://127.0.0.1:8000/api/quality/memories/{hash} ``` ### 5. Quality Analytics **MCP Tool**: ```bash analyze_quality_distribution(min_quality=0.0, max_quality=1.0) # Returns: # - total_memories: Total count # - high_quality_count: Score ≥0.7 # - medium_quality_count: 0.5 ≤ score < 0.7 # - low_quality_count: Score < 0.5 # - average_score: Mean quality score # - provider_breakdown: Count by provider # - top_10_memories: Highest scoring # - bottom_10_memories: Lowest scoring ``` **Dashboard** (http://127.0.0.1:8000/): - Quality badges on all memory cards (color-coded by tier) - Analytics view with distribution charts - Provider breakdown pie chart - Top/bottom performers lists ## Quality-Based Memory Management ### 1. Quality-Based Forgetting (Consolidation) High-quality memories are preserved longer during consolidation: | Quality Tier | Score Range | Retention Period | |--------------|-------------|------------------| | **High** | ≥0.7 | 365 days inactive | | **Medium** | 0.5-0.7 | 180 days inactive | | **Low** | <0.5 | 30-90 days inactive (scaled by score) | **How it works**: - Weekly consolidation scans inactive memories - Applies quality-based thresholds - Archives low-quality memories sooner - Preserves high-quality memories longer ### 2. Quality-Weighted Decay High-quality memories decay slower in relevance scoring: ``` decay_multiplier = 1.0 + (quality_score × 0.5) # High quality (0.9): 1.45× multiplier # Medium quality (0.5): 1.25× multiplier # Low quality (0.2): 1.10× multiplier final_relevance = base_relevance × decay_multiplier ``` **Effect**: High-quality memories stay relevant longer in search results. ## Privacy & Cost ### Privacy Modes | Mode | Configuration | Privacy | Cost | |------|---------------|---------|------| | **Local Only** | `MCP_QUALITY_AI_PROVIDER=local` | ✅ Full (no external calls) | $0 | | **Hybrid** | `MCP_QUALITY_AI_PROVIDER=auto` | ⚠️ Cloud fallback | ~$0.30/mo | | **Cloud** | `MCP_QUALITY_AI_PROVIDER=groq` | ❌ External API | ~$0.30/mo | | **Implicit Only** | `MCP_QUALITY_AI_PROVIDER=none` | ✅ Full (no AI) | $0 | ### Cost Comparison (3500 memories, 100 retrievals/day) | Provider | Monthly Cost | Notes | |----------|--------------|-------| | **Local SLM** | **$0** | Free forever, runs locally | | Groq (Kimi K2) | ~$0.30-0.50 | Fast, good quality | | Gemini Flash | ~$0.40-0.60 | Slower, free tier available | | Implicit Only | $0 | No AI scoring, usage patterns only | **Recommendation**: Use default local SLM (zero cost, full privacy, fast). ## Performance Benchmarks | Operation | Latency | Notes | |-----------|---------|-------| | **Local SLM Scoring (CPU)** | 50-100ms | Per memory evaluation | | **Local SLM Scoring (GPU)** | 10-20ms | With CUDA/MPS/DirectML | | **Quality-Boosted Search** | <100ms | Over-fetch + rerank | | **Implicit Signals** | <10ms | Always fast | | **Quality Metadata Update** | <5ms | Storage backend write | **Target Metrics**: - Quality calculation overhead: <10ms - Search latency with boost: <100ms total - No user-facing blocking (async scoring) ## Troubleshooting ### Local SLM Not Working **Symptom**: `quality_provider: ImplicitSignalsEvaluator` (should be `ONNXRankerModel`) **Fixes**: 1. Check ONNX Runtime installed: ```bash pip install onnxruntime ``` 2. Check model downloaded: ```bash ls ~/.cache/mcp_memory/onnx_models/ms-marco-MiniLM-L-6-v2/ # Should contain: model.onnx, tokenizer.json ``` 3. Check logs for errors: ```bash tail -f logs/mcp_memory_service.log | grep quality ``` ### Quality Scores Always 0.5 **Symptom**: All memories have `quality_score: 0.5` (neutral default) **Cause**: Quality scoring not triggered yet (memories haven't been retrieved) **Fix**: Retrieve memories to trigger scoring: ```bash claude /memory-recall "any search query" # Quality scoring happens in background after retrieval ``` ### GPU Not Detected **Symptom**: Local SLM uses CPU despite having GPU **Fixes**: 1. Install GPU-enabled ONNX Runtime: ```bash # NVIDIA CUDA pip install onnxruntime-gpu # DirectML (Windows) pip install onnxruntime-directml ``` 2. Force device selection: ```bash export MCP_QUALITY_LOCAL_DEVICE=cuda # or mps, directml ``` ### Quality Boost Not Working **Symptom**: Search results don't show quality reranking **Checks**: 1. Verify enabled: ```bash echo $MCP_QUALITY_BOOST_ENABLED # Should be "true" ``` 2. Use explicit MCP tool: ```bash retrieve_with_quality_boost(query="test", quality_weight=0.5) ``` 3. Check debug info in results: ```python result.debug_info['reranked'] # Should be True result.debug_info['quality_score'] # Should exist ``` ## Best Practices ### 1. Start with Defaults Use local SLM (default) for: - Zero cost - Full privacy - Offline capability - Good accuracy (80%+ correlation with human ratings) ### 2. Enable Quality Boost Gradually ```bash # Week 1: Collect quality scores (boost disabled) export MCP_QUALITY_BOOST_ENABLED=false # Week 2: Test with low weight export MCP_QUALITY_BOOST_ENABLED=true export MCP_QUALITY_BOOST_WEIGHT=0.2 # 20% quality # Week 3+: Increase if helpful export MCP_QUALITY_BOOST_WEIGHT=0.3 # 30% quality (recommended) ``` ### 3. Monitor Quality Distribution Check analytics regularly: ```bash analyze_quality_distribution() # Target distribution (Pareto principle): # - High quality (≥0.7): 20-30% of memories # - Medium quality (0.5-0.7): 50-60% # - Low quality (<0.5): 10-20% ``` ### 4. Manual Rating for Edge Cases Rate important memories manually: ```bash # After finding a very helpful memory rate_memory(content_hash="abc123...", rating=1, feedback="Critical info!") # After finding unhelpful memory rate_memory(content_hash="def456...", rating=-1, feedback="Outdated") ``` Manual ratings weighted 60%, AI scores 40%. ### 5. Periodic Review Monthly checklist: - [ ] Check quality distribution (analytics dashboard) - [ ] Review top 10 performers (should be genuinely helpful) - [ ] Review bottom 10 (candidates for deletion) - [ ] Verify provider breakdown (mostly local SLM) - [ ] Check average quality score (target: 0.6-0.7) ## Advanced Configuration ### Custom Retention Policy ```bash # Conservative: Preserve longer export MCP_QUALITY_RETENTION_HIGH=730 # 2 years for high quality export MCP_QUALITY_RETENTION_MEDIUM=365 # 1 year for medium export MCP_QUALITY_RETENTION_LOW_MIN=90 # 90 days minimum for low # Aggressive: Archive sooner export MCP_QUALITY_RETENTION_HIGH=180 # 6 months for high export MCP_QUALITY_RETENTION_MEDIUM=90 # 3 months for medium export MCP_QUALITY_RETENTION_LOW_MIN=14 # 2 weeks minimum for low ``` ### Custom Quality Boost Weight ```bash # Semantic-first (default) export MCP_QUALITY_BOOST_WEIGHT=0.3 # 30% quality, 70% semantic # Balanced export MCP_QUALITY_BOOST_WEIGHT=0.5 # 50% quality, 50% semantic # Quality-first export MCP_QUALITY_BOOST_WEIGHT=0.7 # 70% quality, 30% semantic ``` **Recommendation**: Start with 0.3, increase if quality boost improves results. ### Hybrid Cloud Strategy Use local SLM primarily, cloud APIs as fallback: ```bash export MCP_QUALITY_AI_PROVIDER=auto # Try all available tiers export GROQ_API_KEY="your-key" # Groq as Tier 2 fallback ``` **Behavior**: 1. Try local SLM (99% success rate) 2. If fails, try Groq API 3. If fails, try Gemini API 4. Ultimate fallback: Implicit signals only ## Success Metrics (Phase 1 Targets) From Issue #260 and #261 roadmap: | Metric | Target | Measurement | |--------|--------|-------------| | **Retrieval Precision** | >70% useful (top-5) | Up from ~50% baseline | | **Quality Coverage** | >30% memories scored | Within 3 months | | **Quality Distribution** | 20-30% high-quality | Pareto principle | | **Search Latency** | <100ms with boost | SQLite-vec backend | | **Monthly Cost** | <$0.50 or $0 | Groq API or local SLM | | **Local SLM Usage** | >95% of scoring | Tier 1 success rate | ## FAQ ### Q: Do I need API keys for the quality system? **A**: No! The default local SLM works with zero configuration, no API keys, and no external calls. ### Q: How much does it cost? **A**: $0 with the default local SLM. Optional cloud APIs cost ~$0.30-0.50/month for typical usage. ### Q: Does quality scoring slow down searches? **A**: No. Scoring happens asynchronously in the background. Quality-boosted search adds <20ms overhead. ### Q: Can I disable the quality system? **A**: Yes, set `MCP_QUALITY_SYSTEM_ENABLED=false`. System works normally without quality scores. ### Q: How accurate is the local SLM? **A**: 80%+ correlation with human quality ratings. Good enough for ranking and retention decisions. ### Q: What if the local SLM fails to download? **A**: System falls back to implicit signals (access patterns). No failures, degraded gracefully. ### Q: Can I use my own quality scoring model? **A**: Yes! Implement the `QualityEvaluator` interface and configure via `MCP_QUALITY_AI_PROVIDER`. ### Q: Does this work offline? **A**: Yes! Local SLM works fully offline. No internet required for quality scoring. ## Related Documentation - [Issue #260](https://github.com/doobidoo/mcp-memory-service/issues/260) - Quality System Specification - [Issue #261](https://github.com/doobidoo/mcp-memory-service/issues/261) - Roadmap (Quality → Agentic RAG) - [Consolidation Guide](./memory-consolidation-guide.md) - Detailed consolidation documentation - [API Reference](../api/quality-endpoints.md) - HTTP API documentation ## Changelog **v8.45.0** (2025-01-XX): - Initial release of Memory Quality System - Local SLM (ONNX) as primary tier - Quality-based forgetting in consolidation - Quality-boosted search with reranking - Dashboard UI with quality badges and analytics - Comprehensive MCP tools and HTTP API --- **Need help?** Open an issue at https://github.com/doobidoo/mcp-memory-service/issues

Loading blob content...

Latest Blog Posts

Redis vs ioredis vs valkey-glide
By punkpeye on January 26, 2026.
benchmark
Redis
valkey
Quickstart: Publish an MCP Server to the MCP Registry
By punkpeye on January 24, 2026.
mcp
official reference mirror
Official MCP Registry Server.json Requirements
By punkpeye on January 24, 2026.
mcp
official reference mirror

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/doobidoo/mcp-memory-service'

If you have feedback or need assistance with the MCP directory API, please join our Discord server

memory-quality-guide.md•15.5 KiB