# MCP Memory Server - Comprehensive Guide

This guide provides a detailed overview of the Model Context Protocol (MCP) Memory Server, covering common operations, troubleshooting, and best practices.

## Table of Contents

1. [Overview](#overview)
2. [Server Modes](#server-modes)
3. [Basic Operations](#basic-operations)
4. [Content Ingestion](#content-ingestion)
5. [Memory Types](#memory-types)
6. [Querying Memory](#querying-memory)
7. [Troubleshooting](#troubleshooting)
8. [Best Practices](#best-practices)

## Overview

The MCP Memory Server is a vector database system that stores markdown content so that AI agents can perform semantic search and recall information. It is designed for:

- Storing reference documentation
- Recording learned insights and patterns
- Maintaining agent-specific context
- Enforcing policies and compliance
- Providing long-term memory for AI agents

The system uses [Qdrant](https://qdrant.tech/) as the vector database backend and [Sentence Transformers](https://www.sbert.net/) for generating embeddings.

## Server Modes

The server can run in three modes:

1. **Full mode** (default) - Both prompts and tools available
2. **Prompts-only mode** - Only prompts exposed (best for Cursor)
3. **Tools-only mode** - Only tools exposed (best for programmatic use)

You can specify the mode when starting the server:

```bash
# Full mode (default)
python memory_server.py

# Prompts-only mode
python memory_server.py --prompts-only
# OR
PROMPTS_ONLY=1 python memory_server.py

# Tools-only mode
python memory_server.py --tools-only
# OR
TOOLS_ONLY=1 python memory_server.py
```

## Basic Operations

### Starting the Server

```bash
# Basic startup
cd /path/to/mcp
poetry run python memory_server.py

# With custom configuration
QDRANT_HOST=localhost QDRANT_PORT=6333 poetry run python memory_server.py
```

### Configuration

The server can be configured using:

- Environment variables
- Config file (`config.yaml`)
- Command-line arguments

Example configuration:

```yaml
# config.yaml
server:
  name: "memory-server-prod"
  version: "1.0.0"
  description: "Production MCP Memory Server"

logging:
  level: "INFO"
  format: "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
  file: "/app/logs/server.log"

qdrant:
  mode: "remote"
  host: "qdrant"
  port: 6333
  timeout: 60

embedding:
  model_name: "all-MiniLM-L6-v2"
  dimension: 384
  device: "cpu"
  cache_folder: "/app/data/embeddings"
```

## Content Ingestion

The server provides three main ways to ingest markdown content:

### 1. Processing Individual Files

```python
result = memory_tool_handlers.handle_process_markdown_file({
    "path": "/path/to/file.md",
    "memory_type": "global",  # Optional, can auto-suggest
    "auto_suggest": True,     # Whether to auto-suggest memory type
    "agent_id": None          # Optional, for agent-specific memory
})
```

### 2. Batch Processing Multiple Files

```python
result = memory_tool_handlers.handle_batch_process_markdown_files({
    "file_assignments": [
        {"path": "/path/to/file1.md", "memory_type": "global"},
        {"path": "/path/to/file2.md", "memory_type": "learned"},
        {"path": "/path/to/file3.md"}  # Will use default or auto-suggest
    ],
    "default_memory_type": "global"  # Optional default type
})
```

### 3. Processing Entire Directories

```python
result = memory_tool_handlers.handle_batch_process_directory({
    "directory": "/path/to/docs",
    "memory_type": "global",  # Optional, can auto-suggest per file
    "recursive": True,        # Whether to scan subdirectories
    "agent_id": None          # Optional, for agent-specific memory
})
```

## Memory Types

The system supports three main memory types:

1. **Global Memory** (`global`)
   - Documentation, references, specs, standards
   - Shared across all agents
   - Permanent, factual information
2. **Learned Memory** (`learned`)
   - Insights, patterns, lessons learned
   - Shared knowledge that evolves over time
   - Best practices and observations
3. **Agent Memory** (`agent`)
   - Agent-specific context and preferences
   - Personal notes and drafts
   - Session-specific information

The markdown processor can automatically suggest the appropriate memory type based on content analysis.
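The guide does not describe how that suggestion is made, so the following is only a minimal keyword-based sketch of the idea. The `suggest_memory_type` function and its keyword lists are illustrative assumptions, not the markdown processor's actual logic:

```python
# Illustrative sketch only -- the real markdown processor's heuristics may differ.
KEYWORD_HINTS = {
    "learned": ["lesson", "insight", "pattern", "retrospective", "post-mortem"],
    "agent": ["my notes", "draft", "todo", "session", "preference"],
}

def suggest_memory_type(markdown_text: str) -> str:
    """Guess a memory type from content keywords; fall back to 'global'."""
    text = markdown_text.lower()
    scores = {
        memory_type: sum(text.count(word) for word in words)
        for memory_type, words in KEYWORD_HINTS.items()
    }
    best_type, best_score = max(scores.items(), key=lambda item: item[1])
    return best_type if best_score > 0 else "global"

print(suggest_memory_type("# Lessons learned while tuning Qdrant"))  # -> "learned"
```

In practice, pass `auto_suggest: True` to the ingestion tools shown above and let the server choose.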
## Querying Memory

To retrieve information from memory:

```python
result = memory_tool_handlers.handle_query_memory({
    "query": "How to configure Qdrant connection?",
    "memory_types": ["global", "learned", "agent"],  # Types to search
    "limit": 10,      # Maximum results to return
    "min_score": 0.3  # Minimum similarity score (0-1)
})
```

To compare a situation against learned patterns:

```python
result = memory_tool_handlers.handle_compare_against_learned_memory({
    "situation": "The server is failing to connect to Qdrant",
    "comparison_type": "troubleshooting",
    "limit": 5  # Maximum results to return
})
```
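The best practices later in this guide recommend progressive queries: start narrow and broaden only if nothing scores high enough. The sketch below shows that pattern using `handle_query_memory` as above; the assumption that the handler returns a dict with a `results` list is illustrative and may not match the actual return shape:

```python
# Progressive query sketch -- assumes the handler returns a dict whose "results"
# key holds the matches; adjust to the handler's actual return shape.
def progressive_query(handlers, query: str):
    passes = [
        {"memory_types": ["global"], "min_score": 0.8, "limit": 5},
        {"memory_types": ["global", "learned"], "min_score": 0.6, "limit": 10},
        {"memory_types": ["global", "learned", "agent"], "min_score": 0.3, "limit": 10},
    ]
    for params in passes:
        result = handlers.handle_query_memory({"query": query, **params})
        if result.get("results"):  # stop at the first pass that returns hits
            return result
    return result                  # broadest attempt, possibly empty

hits = progressive_query(memory_tool_handlers, "Qdrant connection timeout settings")
```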
## Troubleshooting

### Common Issues

#### Qdrant Connection Problems

**Symptom**: "Failed to initialize Qdrant: ConnectionError"

**Solutions**:

1. Check if Qdrant is running:
   ```bash
   curl http://localhost:6333/health
   ```
2. Verify configuration in `config.yaml` or environment variables:
   ```bash
   export QDRANT_HOST=localhost
   export QDRANT_PORT=6333
   ```
3. Check if the port is open and available:
   ```bash
   netstat -ln | grep 6333
   ```
4. Start Qdrant manually:
   ```bash
   docker run -p 6333:6333 -p 6334:6334 \
     -v $(pwd)/qdrant_storage:/qdrant/storage \
     qdrant/qdrant:latest
   ```

#### Embedding Model Issues

**Symptom**: "Failed to load embedding model"

**Solutions**:

1. Check internet connection (required for first download)
2. Try alternative models:
   ```bash
   export EMBEDDING_MODEL="all-MiniLM-L6-v2"
   ```
3. Clear model cache if corrupted:
   ```bash
   rm -rf ~/.cache/torch/sentence_transformers/
   ```
4. Check disk space (models can be 400MB+):
   ```bash
   df -h ~/.cache/
   ```

#### High Memory Usage

**Symptom**: System running slowly, high RAM usage

**Solutions**:

1. Monitor system resources:
   ```bash
   free -h
   top -p $(pgrep -f memory_server)
   ```
2. Optimize configuration:
   ```yaml
   embedding:
     model_name: "all-MiniLM-L6-v2"  # Smaller, faster model
     device: "cpu"                   # Use "cuda" if you have a GPU

   markdown:
     chunk_size: 500     # Reduce for lower memory usage
     chunk_overlap: 100
   ```

#### Slow Query Performance

**Solutions**:

1. Tune similarity thresholds:
   ```yaml
   deduplication:
     similarity_threshold: 0.85  # Higher = faster, fewer results
   ```
2. Optimize query parameters:
   ```python
   # Search specific memory types instead of all
   result = memory_tool_handlers.handle_query_memory({
       "query": "specific search terms",
       "memory_types": ["global"],  # Instead of all types
       "limit": 5                   # Reduce result set size
   })
   ```

### System Health Check

You can run a system health check to diagnose issues:

```python
result = memory_tool_handlers.handle_system_health({})
```

This returns a detailed report on the state of all components.

### Reset Collections

If you need to completely reset the database (CAUTION: this will lose all data):

```bash
# Stop the server
pkill -f memory_server

# Remove Qdrant data
rm -rf ./qdrant_storage/*

# Restart Qdrant
docker restart qdrant

# Restart the server
poetry run python memory_server.py
```

## Best Practices

### Memory Usage

1. **Choose the right memory type**:
   - Global: Documentation, specs, APIs, standards
   - Learned: Insights, patterns, lessons, best practices
   - Agent: Personal context, preferences, session state
2. **Optimize query parameters**:
   - Start with specific queries before broadening
   - Use technical terms when possible
   - Filter by memory type to narrow results
3. **Content chunking**:
   - The system automatically chunks long documents
   - Default chunk size is 900 tokens (configurable)
   - Headers are preserved for context
4. **Deduplication**:
   - The system checks for duplicates before storing
   - Similarity threshold is configurable (default 0.9)
   - Near-miss detection flags potential duplicates

### Performance Optimization

1. **Batch processing**:
   - Use `batch_process_directory` for processing many files
   - Process related files together for context
2. **Query optimization**:
   - Similarity thresholds: 0.9+ for exact matches, 0.8-0.9 for related content, 0.7-0.8 for discovery
   - Progressive queries: start specific, broaden if needed
   - Keyword optimization: technical terms, action words, context markers
3. **Resource usage**:
   - Use smaller embedding models for speed
   - Enable GPU acceleration if available
   - Adjust chunk sizes for memory constraints

### Maintenance

1. **Regular collection cleanup**:
   - Periodically remove outdated or duplicate entries
   - Archive rarely used memories
2. **Monitor system health**:
   - Check error logs and rates
   - Monitor disk usage for Qdrant storage
   - Track response times for queries
3. **Backups**:
   - Back up the Qdrant storage directory regularly (see the sketch below)
   - Document memory organization and structure
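For the backup item above, here is a minimal sketch using only the Python standard library. The `./qdrant_storage` and `./backups` paths are assumptions carried over from the reset example earlier, so adjust them to your deployment, and stop the server (or use Qdrant's snapshot feature) before archiving to get a consistent copy:

```python
# Minimal backup sketch -- paths are assumptions; adapt to your deployment.
import shutil
from datetime import datetime
from pathlib import Path

def backup_qdrant_storage(storage_dir: str = "./qdrant_storage",
                          backup_dir: str = "./backups") -> Path:
    """Archive the local Qdrant storage directory into a timestamped tarball."""
    Path(backup_dir).mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    archive = shutil.make_archive(f"{backup_dir}/qdrant-{stamp}", "gztar", storage_dir)
    return Path(archive)

if __name__ == "__main__":
    print(f"Backup written to {backup_qdrant_storage()}")
```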
