# Memory Quality System - Example Configurations

> **Version**: 8.49.0
> **Updated**: December 8, 2025
> **See Also**: [Memory Quality Guide](../guides/memory-quality-guide.md), [Evaluation Report](https://github.com/doobidoo/mcp-memory-service/wiki/Memory-Quality-System-Evaluation)

This document provides tested configuration examples for different use cases. **v8.49.0 introduces the NVIDIA DeBERTa quality classifier** as the new default model, eliminating self-matching bias and providing absolute quality assessment.

---

## Configuration 1: Default (Recommended for Most Users - v8.49.0+)

**Use case**: General usage with accurate quality assessment

```bash
# .env configuration (v8.49.0+ defaults)
MCP_QUALITY_SYSTEM_ENABLED=true
MCP_QUALITY_AI_PROVIDER=local                               # Local ONNX only
MCP_QUALITY_LOCAL_MODEL=nvidia-quality-classifier-deberta   # DeBERTa (default)
MCP_QUALITY_BOOST_ENABLED=true                              # Recommended with DeBERTa
MCP_QUALITY_BOOST_WEIGHT=0.3                                # 30% quality, 70% semantic

# Benefits over v8.48.x (MS-MARCO):
# ✅ No self-matching bias
# ✅ Absolute quality assessment (query-independent)
# ✅ Uniform distribution (mean: 0.60-0.70)
# ✅ Fewer false positives (<5% perfect scores)
```

**Why this works**:
- Zero cost, full privacy (runs locally)
- Accurate absolute quality assessment
- Suitable for archival and retention decisions
- GPU acceleration for fast inference (20-40ms)

**When to use**:
- ✅ **All new installations** (default)
- ✅ **All database sizes** (small to large)
- ✅ **Cost-conscious users** (zero API costs)
- ✅ **Privacy-focused setups** (no external calls)

---

## Configuration 2: Quality-Boosted (Large Databases)

**Use case**: Large memory databases where quality variance is high

```bash
# .env configuration
MCP_QUALITY_SYSTEM_ENABLED=true
MCP_QUALITY_AI_PROVIDER=local
MCP_QUALITY_BOOST_ENABLED=true   # Enable quality boost
MCP_QUALITY_BOOST_WEIGHT=0.3     # 30% quality, 70% semantic

# Note: Only provides 0-3% ranking improvement in v8.48.3
# Most beneficial when:
# - Database has >10,000 memories
# - Quality variance is high
# - Searching for "best practices" or "authoritative" content
```

**Why this works**:
- Minimal latency impact (+17%, 6.5ms)
- Still zero cost with local ONNX
- Small but measurable improvement in large databases

**When to use**:
- >10,000 memories
- Diverse quality levels
- Research- or documentation-heavy usage

---

## Configuration 3: Privacy-First (No AI Scoring)

**Use case**: Maximum privacy, implicit signals only

```bash
# .env configuration
MCP_QUALITY_SYSTEM_ENABLED=true
MCP_QUALITY_AI_PROVIDER=none   # Disable all AI scoring

# No ONNX model download, no API calls
# Quality scores based purely on:
# - Access frequency (40%)
# - Recency (30%)
# - Retrieval ranking (30%)
```

**Why this works**:
- No AI models, no external calls
- Still benefits from usage patterns
- Zero overhead (implicit signals are fast)

**When to use**:
- Air-gapped environments
- Strict privacy requirements
- Resource-constrained systems

---

## Configuration 4: Hybrid Strategy (Phase 2 - Coming Soon)

**Use case**: Combine ONNX + implicit signals for better scores

```bash
# Future configuration (Issue #268)
MCP_QUALITY_SYSTEM_ENABLED=true
MCP_QUALITY_AI_PROVIDER=local
MCP_QUALITY_HYBRID_ENABLED=true   # Enable hybrid scoring

# Hybrid formula (proposed):
# quality = 0.30*onnx + 0.25*access + 0.20*recency + 0.15*tags + 0.10*completeness
```
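For intuition, the proposed formula is a weighted sum of the model score and implicit signals. A minimal Python sketch, using the weights above; the function and argument names are illustrative, not the shipped API, and all inputs are assumed normalized to 0-1:

```python
# Illustrative sketch of the proposed Phase 2 hybrid formula (Issue #268).
# Function and argument names are hypothetical; all inputs assumed in 0-1.
def hybrid_quality(onnx: float, access: float, recency: float,
                   tags: float, completeness: float) -> float:
    return (0.30 * onnx + 0.25 * access + 0.20 * recency
            + 0.15 * tags + 0.10 * completeness)

# Example: a strong ONNX score, but little usage history pulls the total down.
print(round(hybrid_quality(onnx=0.9, access=0.2, recency=0.5,
                           tags=0.6, completeness=0.8), 2))  # 0.59
```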
**Why this will work better**:
- Reduces reliance on single model (ONNX)
- Incorporates actual usage patterns
- Mitigates self-matching bias

**Status**: Planned for Phase 2 (1-2 weeks)

---

## Configuration 5: Cloud-Enhanced (Opt-In)

**Use case**: Users who want better quality assessment and don't mind API costs

```bash
# .env configuration
MCP_QUALITY_SYSTEM_ENABLED=true
MCP_QUALITY_AI_PROVIDER=auto       # Try all tiers
GROQ_API_KEY="your-groq-api-key"   # Groq as Tier 2

# Behavior:
# 1. Try local ONNX (99% success rate)
# 2. Fallback to Groq API if needed
# 3. Ultimate fallback: Implicit signals

# Cost: ~$0.30-0.50/month for typical usage
```

**Why this works**:
- Local-first, cloud as fallback
- Better quality assessment with Groq
- Still privacy-preserving (API calls are opt-in)

**When to use**:
- Professional/commercial usage
- Budget for cloud APIs (~$0.50/month)
- Want better quality assessment

---

## Configuration 6: LLM-as-Judge (Phase 3 - Coming Soon)

**Use case**: Absolute quality assessment for high-value memories

```bash
# Future configuration (Issue #268)
MCP_QUALITY_SYSTEM_ENABLED=true
MCP_QUALITY_AI_PROVIDER=llm_judge   # LLM-as-judge mode
GROQ_API_KEY="your-groq-api-key"    # Use Groq for quality

# LLM evaluates memories with structured prompts:
# - Specificity: Is it detailed and actionable?
# - Accuracy: Is the information correct?
# - Completeness: Does it cover the topic fully?
# - Relevance: Is it still current/applicable?

# Cost: ~$0.05-0.10 per 1,000 memories (batch evaluation)
```

**Why this will work better**:
- LLM understands context and nuance
- Absolute quality assessment (not just relevance)
- No self-matching bias

**Status**: Planned for Phase 3 (1-3 months)

---

## Configuration 7: Conservative Retention (Preserve More)

**Use case**: Never want to lose potentially valuable memories

```bash
# .env configuration
MCP_QUALITY_SYSTEM_ENABLED=true
MCP_QUALITY_AI_PROVIDER=local

# Conservative retention periods
MCP_QUALITY_RETENTION_HIGH=730      # 2 years for high quality
MCP_QUALITY_RETENTION_MEDIUM=365    # 1 year for medium
MCP_QUALITY_RETENTION_LOW_MIN=180   # 6 months minimum for low

# Warning: Given self-matching bias, be extra conservative
# Many "low quality" memories may be false negatives
```

**Why this works**:
- Accounts for potential scoring bias
- Preserves memories longer
- Safer with uncertain quality scores

**When to use**:
- Critical knowledge bases
- Archival/research projects
- When in doubt, preserve

---

## Configuration 8: Aggressive Cleanup (Minimize Storage)

**Use case**: Want to keep only highest quality memories

```bash
# .env configuration
MCP_QUALITY_SYSTEM_ENABLED=true
MCP_QUALITY_AI_PROVIDER=local

# Aggressive retention periods
MCP_QUALITY_RETENTION_HIGH=180     # 6 months for high
MCP_QUALITY_RETENTION_MEDIUM=90    # 3 months for medium
MCP_QUALITY_RETENTION_LOW_MIN=30   # 1 month minimum for low

# ⚠️ DANGER: Given self-matching bias, this may delete valuable memories
# Recommendation: Manually review candidates before archival
# See: analyze_quality_distribution() -> Review bottom 10
```

**Why this is risky**:
- May archive genuinely useful memories (false negatives)
- ONNX scores not validated for absolute quality
- No user feedback to verify

**When to use**:
- Storage-constrained environments
- Ephemeral/temporary knowledge
- **ONLY after manual validation**
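Both retention profiles map a memory's quality tier to a retention period. A rough sketch of that mapping, assuming the ≥0.7 / 0.5-0.7 tier cut-offs used elsewhere in this document; the helper below is illustrative, not the service's actual lifecycle code:

```python
# Illustrative only: maps a quality score to a retention period (days).
# The 0.7 / 0.5 thresholds follow the high/medium/low tiers reported by
# analyze_quality_distribution(); this helper is a sketch, not the actual
# lifecycle implementation.
def retention_days(quality_score: float, high: int, medium: int, low_min: int) -> int:
    if quality_score >= 0.7:
        return high
    if quality_score >= 0.5:
        return medium
    return low_min

# Conservative profile (Configuration 7) vs aggressive profile (Configuration 8):
print(retention_days(0.82, high=730, medium=365, low_min=180))  # 730 days
print(retention_days(0.82, high=180, medium=90, low_min=30))    # 180 days
```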
---

## Configuration 9: DeBERTa Quality Classifier (Recommended - v8.49.0+)

**Use case**: Absolute quality assessment without self-matching bias

```bash
# .env configuration
MCP_QUALITY_SYSTEM_ENABLED=true
MCP_QUALITY_AI_PROVIDER=local
MCP_QUALITY_LOCAL_MODEL=nvidia-quality-classifier-deberta   # Default v8.49.0+
MCP_QUALITY_LOCAL_DEVICE=auto                                # Auto-detect GPU

# Quality boost recommended with DeBERTa (more accurate scores)
MCP_QUALITY_BOOST_ENABLED=true
MCP_QUALITY_BOOST_WEIGHT=0.3

# Expected improvements:
# - Mean score: 0.60-0.70 (vs 0.469 with MS-MARCO)
# - Perfect 1.0 scores: <5% (vs 20% with MS-MARCO)
# - Uniform distribution (vs bimodal clustering)
# - No self-matching bias
```
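With boost enabled, retrieval blends semantic similarity with the stored quality score (30% quality, 70% semantic at the default weight). A small sketch of that blend; the function name and signature are illustrative, not the service's internals:

```python
# Illustrative sketch of MCP_QUALITY_BOOST_WEIGHT=0.3 re-ranking:
# combined = 70% semantic similarity + 30% quality. Names are hypothetical.
def boosted_score(semantic_similarity: float, quality: float,
                  quality_weight: float = 0.3) -> float:
    return (1 - quality_weight) * semantic_similarity + quality_weight * quality

# Two hits with equal relevance: the higher-quality memory ranks first.
print(round(boosted_score(0.80, 0.90), 2))  # 0.83
print(round(boosted_score(0.80, 0.40), 2))  # 0.68
```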
**Why this works best**:
- ✅ **Eliminates self-matching bias** - Query-independent evaluation
- ✅ **Absolute quality assessment** - Designed for quality scoring
- ✅ **Uniform distribution** - More realistic score spread
- ✅ **Fewer false positives** - <5% perfect scores
- ✅ **Still zero cost** - Runs locally with GPU acceleration

**Performance**:
- Model size: 450MB (one-time download)
- CPU: 80-150ms per evaluation
- GPU (CUDA/MPS/DirectML): 20-40ms per evaluation
- ~20% slower than MS-MARCO but significantly more accurate

**Migration from MS-MARCO**:

```bash
# Export DeBERTa model (one-time)
python scripts/quality/export_deberta_onnx.py

# Re-evaluate existing memories
python scripts/quality/migrate_to_deberta.py

# Verify improved distribution
curl -ks https://127.0.0.1:8000/api/quality/distribution | python3 -m json.tool
```

**When to use**:
- ✅ **All new installations** (default in v8.49.0+)
- ✅ **Upgrading from v8.48.x** (migration script available)
- ✅ **When quality accuracy matters** (archival decisions, retention policies)
- ✅ **Large databases** (>5,000 memories)

**When NOT to use**:
- Extremely limited disk space (<500MB available)
- Legacy systems requiring MS-MARCO compatibility
- When the 450MB model download is not feasible

**Fallback to MS-MARCO** (not recommended):

```bash
# Override to legacy model (only if needed)
export MCP_QUALITY_LOCAL_MODEL=ms-marco-MiniLM-L-6-v2
```

---

## Monitoring & Validation

Regardless of configuration, **always validate** quality scores before making decisions:

### 1. Weekly Check

```bash
# Run analytics
analyze_quality_distribution()

# Expected output (v8.48.3):
# - High (≥0.7): 32.2% (includes ~25% false positives)
# - Medium (0.5-0.7): 27.4%
# - Low (<0.5): 40.4%
```

### 2. Monthly Review

```bash
# 1. Check top performers (verify genuinely helpful)
analyze_quality_distribution() | grep "Top 10"

# 2. Check bottom performers (candidates for archival)
analyze_quality_distribution() | grep "Bottom 10"

# 3. Manually validate before deleting
rate_memory(content_hash="...", rating=-1, feedback="Actually useful, ONNX scored wrong")
```

### 3. Quarterly Audit

```bash
# Sample random memories across quality tiers
# Manually rate them
# Compare AI scores vs human ratings

# Track correlation:
# - Target: >0.7 correlation
# - Current (v8.48.3): ~0.5-0.6 (due to self-matching bias)
```
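One way to track that correlation is to keep the sampled AI scores and manual ratings side by side and compute Pearson's r. A minimal sketch with made-up numbers:

```python
# Quarterly audit sketch: correlate AI quality scores with manual ratings
# for the same sampled memories. The values below are made up.
from statistics import correlation  # Pearson's r (Python 3.10+)

ai_scores     = [0.92, 0.35, 0.71, 0.48, 0.88, 0.15]
human_ratings = [0.90, 0.50, 0.60, 0.40, 0.80, 0.30]

r = correlation(ai_scores, human_ratings)
print(f"AI vs human correlation: {r:.2f}")  # target: > 0.7
```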
---

## Migration Path

### From v8.45.0 to v8.48.3

**Changes**:
- Added ONNX limitations documentation
- Updated best practices with manual validation
- Kept quality boost as opt-in (good default)

**Action required**: None (backward compatible)

**Recommendation**: Review the [Evaluation Report](https://github.com/doobidoo/mcp-memory-service/wiki/Memory-Quality-System-Evaluation)

### To Phase 2 (Hybrid Scoring)

**Planned changes** (Issue #268):
- Hybrid quality scoring (ONNX + implicit signals)
- User feedback system (manual ratings)
- A/B test framework

**Migration**:

```bash
# New config options
export MCP_QUALITY_HYBRID_ENABLED=true
export MCP_QUALITY_USER_FEEDBACK_ENABLED=true

# Existing scores will be recalculated with hybrid formula
```

### To Phase 3 (LLM-as-Judge)

**Planned changes** (Issue #268):
- LLM-based absolute quality assessment
- Quality-driven memory lifecycle
- Improved query generation

**Migration**:

```bash
# New config options
export MCP_QUALITY_AI_PROVIDER=llm_judge
export GROQ_API_KEY="your-key"

# Batch re-evaluation of existing memories (opt-in)
```

---

## Troubleshooting Common Issues

### Issue 1: All Memories Have Score 1.0

**Symptom**: Most memories have perfect quality scores

**Cause**: Self-matching bias from tag-generated queries

**Solution**:

```bash
# 1. Understand this is expected behavior (v8.48.3)
# 2. Use scores for relative ranking only
# 3. Combine with implicit signals
# 4. Wait for Phase 2 (hybrid scoring)
```

### Issue 2: Average Score Too Low (0.469)

**Symptom**: `analyze_quality_distribution()` shows avg 0.469

**Cause**: Bimodal distribution (many 1.0, many 0.0, few middle scores)

**Solution**:

```bash
# This is expected in v8.48.3
# Not a bug, but a model limitation
# Solutions coming in Phase 2/3 (Issue #268)
```

### Issue 3: Quality Boost Not Improving Results

**Symptom**: Enabling quality boost doesn't change ranking

**Cause**: Top results are already high quality (0-3% difference measured)

**Solution**:

```bash
# Keep boost disabled (default)
export MCP_QUALITY_BOOST_ENABLED=false

# Only enable for large databases or specific searches
retrieve_with_quality_boost(query="best practices", quality_weight=0.5)
```

---

## Best Practices Summary

**v8.49.0+ (DeBERTa)**:
1. **Use defaults** (Configuration 1 or 9) - DeBERTa provides accurate quality assessment
2. **Enable quality boost** - More effective with DeBERTa's accurate scores
3. **Trust quality scores** - Suitable for archival and retention decisions
4. **Monitor distribution monthly** with `analyze_quality_distribution()`
5. **Provide manual ratings** for important memories (enhances learning)
6. **Migrate from MS-MARCO** if upgrading: `python scripts/quality/migrate_to_deberta.py`

**v8.48.x and earlier (MS-MARCO)**:
1. **Upgrade to v8.49.0** for DeBERTa improvements
2. If staying on MS-MARCO: Use scores for **relative ranking only**
3. **Manually validate** before archival decisions (self-matching bias)
4. **Monitor false positives** (20% perfect 1.0 scores)

---

## Related Documentation

- [Memory Quality Guide](../guides/memory-quality-guide.md) - Comprehensive guide
- [Evaluation Report](https://github.com/doobidoo/mcp-memory-service/wiki/Memory-Quality-System-Evaluation) - Full analysis
- [Issue #268](https://github.com/doobidoo/mcp-memory-service/issues/268) - Planned improvements
- [CLAUDE.md](../../CLAUDE.md) - Quick reference

---

**Questions?** Open an issue at https://github.com/doobidoo/mcp-memory-service/issues
