# Memory Quality System - Example Configurations

> **Version**: 8.49.0
> **Updated**: December 8, 2025
> **See Also**: [Memory Quality Guide](../guides/memory-quality-guide.md), [Evaluation Report](https://github.com/doobidoo/mcp-memory-service/wiki/Memory-Quality-System-Evaluation)

This document provides tested configuration examples for different use cases. **v8.49.0 introduces the NVIDIA DeBERTa quality classifier** as the new default model, eliminating self-matching bias and providing absolute quality assessment.

---

## Configuration 1: Default (Recommended for Most Users - v8.49.0+)

**Use case**: General usage with accurate quality assessment

```bash
# .env configuration (v8.49.0+ defaults)
MCP_QUALITY_SYSTEM_ENABLED=true
MCP_QUALITY_AI_PROVIDER=local                               # Local ONNX only
MCP_QUALITY_LOCAL_MODEL=nvidia-quality-classifier-deberta   # DeBERTa (default)
MCP_QUALITY_BOOST_ENABLED=true                              # Recommended with DeBERTa
MCP_QUALITY_BOOST_WEIGHT=0.3                                # 30% quality, 70% semantic

# Benefits over v8.48.x (MS-MARCO):
# ✅ No self-matching bias
# ✅ Absolute quality assessment (query-independent)
# ✅ Uniform distribution (mean: 0.60-0.70)
# ✅ Fewer false positives (<5% perfect scores)
```

**Why this works**:
- Zero cost, full privacy (runs locally)
- Accurate absolute quality assessment
- Suitable for archival and retention decisions
- GPU acceleration for fast inference (20-40ms)

**When to use**:
- ✅ **All new installations** (default)
- ✅ **All database sizes** (small to large)
- ✅ **Cost-conscious users** (zero API costs)
- ✅ **Privacy-focused setups** (no external calls)

---

## Configuration 2: Quality-Boosted (Large Databases)

**Use case**: Large memory databases where quality variance is high

```bash
# .env configuration
MCP_QUALITY_SYSTEM_ENABLED=true
MCP_QUALITY_AI_PROVIDER=local
MCP_QUALITY_BOOST_ENABLED=true   # Enable quality boost
MCP_QUALITY_BOOST_WEIGHT=0.3     # 30% quality, 70% semantic

# Note: Only provides 0-3% ranking improvement in v8.48.3
# Most beneficial when:
# - Database has >10,000 memories
# - Quality variance is high
# - Searching for "best practices" or "authoritative" content
```

**Why this works**:
- Minimal latency impact (+17%, 6.5ms)
- Still zero cost with local ONNX
- Small but measurable improvement in large databases

**When to use**:
- >10,000 memories
- Diverse quality levels
- Research- or documentation-heavy usage

---

## Configuration 3: Privacy-First (No AI Scoring)

**Use case**: Maximum privacy, implicit signals only

```bash
# .env configuration
MCP_QUALITY_SYSTEM_ENABLED=true
MCP_QUALITY_AI_PROVIDER=none   # Disable all AI scoring

# No ONNX model download, no API calls
# Quality scores based purely on:
# - Access frequency (40%)
# - Recency (30%)
# - Retrieval ranking (30%)
```

**Why this works**:
- No AI models, no external calls
- Still benefits from usage patterns
- Zero overhead (implicit signals are fast)

**When to use**:
- Air-gapped environments
- Strict privacy requirements
- Resource-constrained systems

---

## Configuration 4: Hybrid Strategy (Phase 2 - Coming Soon)

**Use case**: Combine ONNX + implicit signals for better scores

```bash
# Future configuration (Issue #268)
MCP_QUALITY_SYSTEM_ENABLED=true
MCP_QUALITY_AI_PROVIDER=local
MCP_QUALITY_HYBRID_ENABLED=true   # Enable hybrid scoring

# Hybrid formula (proposed):
# quality = 0.30*onnx + 0.25*access + 0.20*recency + 0.15*tags + 0.10*completeness
```
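For intuition, the proposed formula is a weighted sum of the model score and implicit signals. A minimal Python sketch, using the weights above; the function and argument names are illustrative, not the shipped API, and all inputs are assumed normalized to 0-1:

```python
# Illustrative sketch of the proposed Phase 2 hybrid formula (Issue #268).
# Function and argument names are hypothetical; all inputs assumed in 0-1.
def hybrid_quality(onnx: float, access: float, recency: float,
                   tags: float, completeness: float) -> float:
    return (0.30 * onnx + 0.25 * access + 0.20 * recency
            + 0.15 * tags + 0.10 * completeness)

# Example: a strong ONNX score, but little usage history pulls the total down.
print(round(hybrid_quality(onnx=0.9, access=0.2, recency=0.5,
                           tags=0.6, completeness=0.8), 2))  # 0.59
```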
**Why this will work better**:
- Reduces reliance on single model (ONNX)
- Incorporates actual usage patterns
- Mitigates self-matching bias

**Status**: Planned for Phase 2 (1-2 weeks)

---

## Configuration 5: Cloud-Enhanced (Opt-In)

**Use case**: Users who want better quality assessment and don't mind API costs

```bash
# .env configuration
MCP_QUALITY_SYSTEM_ENABLED=true
MCP_QUALITY_AI_PROVIDER=auto       # Try all tiers
GROQ_API_KEY="your-groq-api-key"   # Groq as Tier 2

# Behavior:
# 1. Try local ONNX (99% success rate)
# 2. Fallback to Groq API if needed
# 3. Ultimate fallback: Implicit signals

# Cost: ~$0.30-0.50/month for typical usage
```

**Why this works**:
- Local-first, cloud as fallback
- Better quality assessment with Groq
- Still privacy-preserving (API calls are opt-in)

**When to use**:
- Professional/commercial usage
- Budget for cloud APIs (~$0.50/month)
- Want better quality assessment

---

## Configuration 6: LLM-as-Judge (Phase 3 - Coming Soon)

**Use case**: Absolute quality assessment for high-value memories

```bash
# Future configuration (Issue #268)
MCP_QUALITY_SYSTEM_ENABLED=true
MCP_QUALITY_AI_PROVIDER=llm_judge   # LLM-as-judge mode
GROQ_API_KEY="your-groq-api-key"    # Use Groq for quality

# LLM evaluates memories with structured prompts:
# - Specificity: Is it detailed and actionable?
# - Accuracy: Is the information correct?
# - Completeness: Does it cover the topic fully?
# - Relevance: Is it still current/applicable?

# Cost: ~$0.05-0.10 per 1,000 memories (batch evaluation)
```

**Why this will work better**:
- LLM understands context and nuance
- Absolute quality assessment (not just relevance)
- No self-matching bias

**Status**: Planned for Phase 3 (1-3 months)

---

## Configuration 7: Conservative Retention (Preserve More)

**Use case**: Never want to lose potentially valuable memories

```bash
# .env configuration
MCP_QUALITY_SYSTEM_ENABLED=true
MCP_QUALITY_AI_PROVIDER=local

# Conservative retention periods
MCP_QUALITY_RETENTION_HIGH=730      # 2 years for high quality
MCP_QUALITY_RETENTION_MEDIUM=365    # 1 year for medium
MCP_QUALITY_RETENTION_LOW_MIN=180   # 6 months minimum for low

# Warning: Given self-matching bias, be extra conservative
# Many "low quality" memories may be false negatives
```

**Why this works**:
- Accounts for potential scoring bias
- Preserves memories longer
- Safer with uncertain quality scores

**When to use**:
- Critical knowledge bases
- Archival/research projects
- When in doubt, preserve

---

## Configuration 8: Aggressive Cleanup (Minimize Storage)

**Use case**: Want to keep only highest quality memories

```bash
# .env configuration
MCP_QUALITY_SYSTEM_ENABLED=true
MCP_QUALITY_AI_PROVIDER=local

# Aggressive retention periods
MCP_QUALITY_RETENTION_HIGH=180     # 6 months for high
MCP_QUALITY_RETENTION_MEDIUM=90    # 3 months for medium
MCP_QUALITY_RETENTION_LOW_MIN=30   # 1 month minimum for low

# ⚠️ DANGER: Given self-matching bias, this may delete valuable memories
# Recommendation: Manually review candidates before archival
# See: analyze_quality_distribution() -> Review bottom 10
```

**Why this is risky**:
- May archive genuinely useful memories (false negatives)
- ONNX scores not validated for absolute quality
- No user feedback to verify

**When to use**:
- Storage-constrained environments
- Ephemeral/temporary knowledge
- **ONLY after manual validation**
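Both retention profiles map a memory's quality tier to a retention period. A rough sketch of that mapping, assuming the ≥0.7 / 0.5-0.7 tier cut-offs used elsewhere in this document; the helper below is illustrative, not the service's actual lifecycle code:

```python
# Illustrative only: maps a quality score to a retention period (days).
# The 0.7 / 0.5 thresholds follow the high/medium/low tiers reported by
# analyze_quality_distribution(); this helper is a sketch, not the actual
# lifecycle implementation.
def retention_days(quality_score: float, high: int, medium: int, low_min: int) -> int:
    if quality_score >= 0.7:
        return high
    if quality_score >= 0.5:
        return medium
    return low_min

# Conservative profile (Configuration 7) vs aggressive profile (Configuration 8):
print(retention_days(0.82, high=730, medium=365, low_min=180))  # 730 days
print(retention_days(0.82, high=180, medium=90, low_min=30))    # 180 days
```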
---

## Configuration 9: DeBERTa Quality Classifier (Recommended - v8.49.0+)

**Use case**: Absolute quality assessment without self-matching bias

```bash
# .env configuration
MCP_QUALITY_SYSTEM_ENABLED=true
MCP_QUALITY_AI_PROVIDER=local
MCP_QUALITY_LOCAL_MODEL=nvidia-quality-classifier-deberta   # Default v8.49.0+
MCP_QUALITY_LOCAL_DEVICE=auto                                # Auto-detect GPU

# Quality boost recommended with DeBERTa (more accurate scores)
MCP_QUALITY_BOOST_ENABLED=true
MCP_QUALITY_BOOST_WEIGHT=0.3

# Expected improvements:
# - Mean score: 0.60-0.70 (vs 0.469 with MS-MARCO)
# - Perfect 1.0 scores: <5% (vs 20% with MS-MARCO)
# - Uniform distribution (vs bimodal clustering)
# - No self-matching bias
```
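With boost enabled, retrieval blends semantic similarity with the stored quality score (30% quality, 70% semantic at the default weight). A small sketch of that blend; the function name and signature are illustrative, not the service's internals:

```python
# Illustrative sketch of MCP_QUALITY_BOOST_WEIGHT=0.3 re-ranking:
# combined = 70% semantic similarity + 30% quality. Names are hypothetical.
def boosted_score(semantic_similarity: float, quality: float,
                  quality_weight: float = 0.3) -> float:
    return (1 - quality_weight) * semantic_similarity + quality_weight * quality

# Two hits with equal relevance: the higher-quality memory ranks first.
print(round(boosted_score(0.80, 0.90), 2))  # 0.83
print(round(boosted_score(0.80, 0.40), 2))  # 0.68
```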
**Why this works best**:
- ✅ **Eliminates self-matching bias** - Query-independent evaluation
- ✅ **Absolute quality assessment** - Designed for quality scoring
- ✅ **Uniform distribution** - More realistic score spread
- ✅ **Fewer false positives** - <5% perfect scores
- ✅ **Still zero cost** - Runs locally with GPU acceleration

**Performance**:
- Model size: 450MB (one-time download)
- CPU: 80-150ms per evaluation
- GPU (CUDA/MPS/DirectML): 20-40ms per evaluation
- ~20% slower than MS-MARCO but significantly more accurate

**Migration from MS-MARCO**:

```bash
# Export DeBERTa model (one-time)
python scripts/quality/export_deberta_onnx.py

# Re-evaluate existing memories
python scripts/quality/migrate_to_deberta.py

# Verify improved distribution
curl -ks https://127.0.0.1:8000/api/quality/distribution | python3 -m json.tool
```

**When to use**:
- ✅ **All new installations** (default in v8.49.0+)
- ✅ **Upgrading from v8.48.x** (migration script available)
- ✅ **When quality accuracy matters** (archival decisions, retention policies)
- ✅ **Large databases** (>5,000 memories)

**When NOT to use**:
- Extremely limited disk space (<500MB available)
- Legacy systems requiring MS-MARCO compatibility
- When the 450MB model download is not feasible

**Fallback to MS-MARCO** (not recommended):

```bash
# Override to legacy model (only if needed)
export MCP_QUALITY_LOCAL_MODEL=ms-marco-MiniLM-L-6-v2
```

---

## Monitoring & Validation

Regardless of configuration, **always validate** quality scores before making decisions:

### 1. Weekly Check

```bash
# Run analytics
analyze_quality_distribution()

# Expected output (v8.48.3):
# - High (≥0.7): 32.2% (includes ~25% false positives)
# - Medium (0.5-0.7): 27.4%
# - Low (<0.5): 40.4%
```

### 2. Monthly Review

```bash
# 1. Check top performers (verify genuinely helpful)
analyze_quality_distribution() | grep "Top 10"

# 2. Check bottom performers (candidates for archival)
analyze_quality_distribution() | grep "Bottom 10"

# 3. Manually validate before deleting
rate_memory(content_hash="...", rating=-1, feedback="Actually useful, ONNX scored wrong")
```

### 3. Quarterly Audit

```bash
# Sample random memories across quality tiers
# Manually rate them
# Compare AI scores vs human ratings

# Track correlation:
# - Target: >0.7 correlation
# - Current (v8.48.3): ~0.5-0.6 (due to self-matching bias)
```
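One way to track that correlation is to keep the sampled AI scores and manual ratings side by side and compute Pearson's r. A minimal sketch with made-up numbers:

```python
# Quarterly audit sketch: correlate AI quality scores with manual ratings
# for the same sampled memories. The values below are made up.
from statistics import correlation  # Pearson's r (Python 3.10+)

ai_scores     = [0.92, 0.35, 0.71, 0.48, 0.88, 0.15]
human_ratings = [0.90, 0.50, 0.60, 0.40, 0.80, 0.30]

r = correlation(ai_scores, human_ratings)
print(f"AI vs human correlation: {r:.2f}")  # target: > 0.7
```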
---

## Migration Path

### From v8.45.0 to v8.48.3

**Changes**:
- Added ONNX limitations documentation
- Updated best practices with manual validation
- Kept quality boost as opt-in (good default)

**Action required**: None (backward compatible)

**Recommendation**: Review the [Evaluation Report](https://github.com/doobidoo/mcp-memory-service/wiki/Memory-Quality-System-Evaluation)

### To Phase 2 (Hybrid Scoring)

**Planned changes** (Issue #268):
- Hybrid quality scoring (ONNX + implicit signals)
- User feedback system (manual ratings)
- A/B test framework

**Migration**:

```bash
# New config options
export MCP_QUALITY_HYBRID_ENABLED=true
export MCP_QUALITY_USER_FEEDBACK_ENABLED=true

# Existing scores will be recalculated with hybrid formula
```

### To Phase 3 (LLM-as-Judge)

**Planned changes** (Issue #268):
- LLM-based absolute quality assessment
- Quality-driven memory lifecycle
- Improved query generation

**Migration**:

```bash
# New config options
export MCP_QUALITY_AI_PROVIDER=llm_judge
export GROQ_API_KEY="your-key"

# Batch re-evaluation of existing memories (opt-in)
```

---

## Troubleshooting Common Issues

### Issue 1: All Memories Have Score 1.0

**Symptom**: Most memories have perfect quality scores

**Cause**: Self-matching bias from tag-generated queries

**Solution**:

```bash
# 1. Understand this is expected behavior (v8.48.3)
# 2. Use scores for relative ranking only
# 3. Combine with implicit signals
# 4. Wait for Phase 2 (hybrid scoring)
```

### Issue 2: Average Score Too Low (0.469)

**Symptom**: `analyze_quality_distribution()` shows avg 0.469

**Cause**: Bimodal distribution (many 1.0, many 0.0, few middle scores)

**Solution**:

```bash
# This is expected in v8.48.3
# Not a bug, but a model limitation
# Solutions coming in Phase 2/3 (Issue #268)
```

### Issue 3: Quality Boost Not Improving Results

**Symptom**: Enabling quality boost doesn't change ranking

**Cause**: Top results are already high quality (0-3% difference measured)

**Solution**:

```bash
# Keep boost disabled (default)
export MCP_QUALITY_BOOST_ENABLED=false

# Only enable for large databases or specific searches
retrieve_with_quality_boost(query="best practices", quality_weight=0.5)
```

---

## Best Practices Summary

**v8.49.0+ (DeBERTa)**:
1. **Use defaults** (Configuration 1 or 9) - DeBERTa provides accurate quality assessment
2. **Enable quality boost** - More effective with DeBERTa's accurate scores
3. **Trust quality scores** - Suitable for archival and retention decisions
4. **Monitor distribution monthly** with `analyze_quality_distribution()`
5. **Provide manual ratings** for important memories (enhances learning)
6. **Migrate from MS-MARCO** if upgrading: `python scripts/quality/migrate_to_deberta.py`

**v8.48.x and earlier (MS-MARCO)**:
1. **Upgrade to v8.49.0** for DeBERTa improvements
2. If staying on MS-MARCO: Use scores for **relative ranking only**
3. **Manually validate** before archival decisions (self-matching bias)
4. **Monitor false positives** (20% perfect 1.0 scores)

---

## Related Documentation

- [Memory Quality Guide](../guides/memory-quality-guide.md) - Comprehensive guide
- [Evaluation Report](https://github.com/doobidoo/mcp-memory-service/wiki/Memory-Quality-System-Evaluation) - Full analysis
- [Issue #268](https://github.com/doobidoo/mcp-memory-service/issues/268) - Planned improvements
- [CLAUDE.md](../../CLAUDE.md) - Quick reference

---

**Questions?** Open an issue at https://github.com/doobidoo/mcp-memory-service/issues
