Qdrant RAG MCP Server

enhanced-qdrant-rag-guide.md•8.87 KiB

# Enhanced Qdrant RAG Server Guide

This guide covers the latest updates and improvements to the Qdrant RAG MCP Server, with a focus on Apple Silicon support and local model management.

## Recent Improvements

1. **Improved Environment Variable Handling**
   - Added proper .env file loading with python-dotenv
   - Enhanced environment variable inheritance in Docker
   - Fixed environment variable conflicts between Docker and local execution

2. **Enhanced Embedding Model Handling**
   - Better model fallback mechanisms
   - Improved model cache directory configuration
   - Added support for local models with proper cache paths

3. **Apple Silicon Optimization**
   - Added Metal Performance Shaders (MPS) support for M-series chips
   - Device detection for optimal performance
   - Clear error messaging when MPS isn't available

4. **Dual-Mode Execution**
   - New Docker mode (compatible with all platforms)
   - New Local mode (uses macOS native features like MPS)
   - Flexible switching between modes

## Setup Quick Start

Here's how to get started with the enhanced server:

```bash
# Clone the repository (if not already done)
cd ~/mcp-servers
git clone <repository-url> qdrant-rag
cd qdrant-rag

# Make scripts executable
chmod +x scripts/*.sh
chmod +x docker/start_with_env.sh

# Configure environment (review and edit as needed)
cp .env.example .env
```

## Running the Server

### Docker Mode (Best for compatibility)

This mode runs everything in Docker containers:

```bash
# Start using the enhanced script
./docker/start_with_env.sh

# Or use docker compose directly
cd docker
docker compose up -d
```

### Local Mode (Best for Apple Silicon)

This mode runs Qdrant in Docker but the RAG server directly on your Mac:

```bash
# Run with local flag to use MPS acceleration
./docker/start_with_env.sh --local
```

## Configuring Embedding Models

### Model Setup for Apple Silicon

To get maximum performance on M-series Macs, use these settings in your `.env` file:

```bash
# For macOS with Apple Silicon (M1/M2/M3):
EMBEDDING_MODEL=all-MiniLM-L12-v2
TOKENIZERS_PARALLELISM=false  # Avoid tokenizer warnings
MPS_DEVICE_ENABLE=1  # Enable Metal Performance Shaders

# Model Cache Directory
SENTENCE_TRANSFORMERS_HOME=~/mcp-servers/qdrant-rag/data/models  # Local path
```

### Understanding Docker vs Local Mode

1. **Docker Mode**
   - Runs everything in containers
   - Environment variables from `.env` pass through Docker
   - MPS acceleration is NOT available (containers run Linux)
   - Model cache is stored in container volumes

2. **Local Mode**
   - Runs only Qdrant in Docker
   - RAG server runs natively on your Mac
   - MPS acceleration IS available (when enabled)
   - Model cache is stored in your local filesystem
   - Uses your local Python installation

## Metal Performance Shaders (MPS) Support

MPS enables your RAG server to use the GPU on Apple Silicon devices:

1. **Requirements**
   - Apple Silicon Mac (M1/M2/M3)
   - macOS 12.3+
   - PyTorch with MPS support

2. **Setup**

   ```bash
   # Install PyTorch with MPS support
   pip install torch torchvision

   # Then enable MPS in your .env file
   MPS_DEVICE_ENABLE=1
   ```

3. **Verification**
   - Run in local mode with debug logging enabled
   - Look for "Using Apple Metal Performance Shaders (MPS) backend" in the logs
   - Look for "Using device: mps" when the model is loaded

4. **Troubleshooting**
   - If you see "Using device: cpu" even with MPS_DEVICE_ENABLE=1:
     - Check that you're running in local mode
     - Verify PyTorch is installed correctly
     - Ensure you're on macOS 12.3+

## Environment Variables in Detail

Here are the most important environment variables:

```bash
# Qdrant Configuration
QDRANT_HOST=localhost  # Use "qdrant" in Docker mode
QDRANT_PORT=6333
QDRANT_API_KEY=  # Optional

# Server Configuration
SERVER_PORT=8080
LOG_LEVEL=INFO

# Embedding Model Configuration
EMBEDDING_MODEL=all-MiniLM-L12-v2  # Or other model name
TOKENIZERS_PARALLELISM=false  # Recommended setting
MPS_DEVICE_ENABLE=1  # For Apple Silicon GPUs

# Model Cache Directory
SENTENCE_TRANSFORMERS_HOME=~/mcp-servers/qdrant-rag/data/models  # Local path
# SENTENCE_TRANSFORMERS_HOME=/app/data/models  # Docker path
```

## Example Workflows

### Indexing a Project

```bash
# If using Docker mode
docker exec rag_mcp_server python -c "from src.indexers import index_directory; index_directory('/claude-code/my-project')"

# If using Local mode (run from the clone location used in the quick start)
cd ~/mcp-servers/qdrant-rag
python -c "from src.indexers import index_directory; index_directory('/path/to/my-project')"
```

### Searching with the RAG Server

```bash
curl -X POST http://localhost:8080/search \
  -H "Content-Type: application/json" \
  -d '{"query": "implement authentication middleware", "n_results": 5}'
```

### Switching Models

```bash
# 1. Update .env (the empty '' after -i is macOS/BSD sed syntax)
sed -i '' 's/EMBEDDING_MODEL=.*/EMBEDDING_MODEL=all-mpnet-base-v2/' .env

# 2. Restart the server
./docker/start_with_env.sh
# Or for local mode:
./docker/start_with_env.sh --local
```

## Troubleshooting

### Common Issues

1. **"MPS_DEVICE_ENABLE=1 but running on Linux aarch64"**
   - This message appears in Docker mode. It is expected: Docker containers run Linux, which does not support MPS.
   - Solution: Use local mode if you need MPS acceleration.

2. **"Error loading model X: Y"**
   - The model couldn't be loaded from cache or downloaded.
   - Check your internet connection, disk space, and the spelling of the model name.

3. **Container keeps restarting**
   - Health checks might be failing.
   - Check logs with `docker logs rag_mcp_server`.

### Docker Mode Logs

```bash
# View service logs
docker logs rag_mcp_server
```

### Local Mode Logs

Logs appear directly in the terminal when running in local mode.

## Advanced Configuration

### Custom Model Download Location

```bash
# In .env
SENTENCE_TRANSFORMERS_HOME=/custom/path/to/models

# Then run
mkdir -p /custom/path/to/models
chmod 755 /custom/path/to/models
```

### Pre-downloading Models

```python
# Create a script: download_model.py
from sentence_transformers import SentenceTransformer
import os

model_name = "all-MiniLM-L12-v2"
cache_dir = os.path.expanduser("~/mcp-servers/qdrant-rag/data/models")

print(f"Downloading {model_name}...")
model = SentenceTransformer(model_name_or_path=model_name, cache_folder=cache_dir)
print(f"Successfully downloaded model with dimension: {model.get_sentence_embedding_dimension()}")
```

Then run:

```bash
python download_model.py
```

## Best Practices

1. **Use Local Mode for Development on Mac**
   - Local mode gives you the benefit of MPS acceleration
   - Better performance on Apple Silicon
   - More direct debugging capability

2. **Use Docker Mode for Production/Deployment**
   - More consistent environment
   - Better isolation
   - Easier to deploy to non-Mac systems

3. **Balance Model Size and Performance**
   - Smaller models (all-MiniLM-L6-v2) are faster but less accurate
   - Larger models (all-mpnet-base-v2) are more accurate but slower
   - Test different models to find the right balance

## Complete Updates List

1. **In qdrant_mcp_context_aware.py**
   - Added python-dotenv loading
   - Improved environment variable handling
   - Enhanced embedding model initialization
   - Added MPS support for Apple Silicon
   - Better error handling for model loading
   - Improved fallback mechanisms
   - Integrated specialized indexers for code and config files
   - Enhanced search methods with better filtering
   - Improved directory indexing with exclude patterns

2. **In docker-compose.yml**
   - Added environment variable pass-through
   - Added model cache volume mapping
   - Improved Docker networking config
   - Fixed health check issues

3. **In Dockerfile**
   - Added python-dotenv installation
   - Added .env file handling
   - Improved container security
   - Enhanced error handling

4. **Added start_with_env.sh script**
   - Dual-mode execution (Docker and Local)
   - Simplified startup process
   - Better environment variable handling
   - Clearer error messages

## Enhanced Indexers Integration

With the specialized indexers now integrated, the RAG server provides:

1. **Improved Code Indexing**
   - Language-specific parsing (Python, JavaScript, Java, etc.)
   - Structure-aware chunking (functions, classes, methods)
   - Line number tracking for precise locations
   - Rich metadata about imports, dependencies, and code structure

2. **Advanced Config Handling**
   - Support for multiple formats (JSON, XML, YAML, TOML, INI, ENV)
   - Hierarchical structure preservation
   - Path-based navigation and filtering
   - Schema extraction for better understanding

3. **Superior Search Experience**
   - Filter by language, chunk type, file type
   - Search within specific config paths
   - Get more meaningful results with structural context
   - Better result formatting with line numbers and previews

These enhancements improve the quality of RAG responses by returning more precise and contextually relevant results.
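
To make "structure-aware chunking" and "line number tracking" more concrete, here is a minimal sketch for Python sources using only the standard library `ast` module. It is not the server's actual indexer: the `CodeChunk` schema and the `chunk_python_source` function are hypothetical names chosen for illustration, and the real indexers may split files differently and attach more metadata.

```python
import ast
from dataclasses import dataclass


@dataclass
class CodeChunk:
    """One structural unit of a Python file (hypothetical schema)."""
    name: str
    chunk_type: str  # "function", "class", or "method"
    line_start: int  # 1-indexed, inclusive
    line_end: int    # 1-indexed, inclusive
    content: str


def chunk_python_source(source: str) -> list[CodeChunk]:
    """Split Python source into top-level function, class, and method chunks."""
    lines = source.splitlines()
    chunks: list[CodeChunk] = []

    def add(node, name: str, chunk_type: str) -> None:
        # ast line numbers are 1-indexed and end_lineno is inclusive,
        # so the slice below covers exactly the node's source lines.
        text = "\n".join(lines[node.lineno - 1:node.end_lineno])
        chunks.append(CodeChunk(name, chunk_type, node.lineno, node.end_lineno, text))

    func_types = (ast.FunctionDef, ast.AsyncFunctionDef)
    for node in ast.parse(source).body:
        if isinstance(node, func_types):
            add(node, node.name, "function")
        elif isinstance(node, ast.ClassDef):
            add(node, node.name, "class")
            # Also emit each method as its own chunk, qualified by class name.
            for child in node.body:
                if isinstance(child, func_types):
                    add(child, f"{node.name}.{child.name}", "method")
    return chunks


if __name__ == "__main__":
    sample = (
        "import os\n"
        "\n"
        "class Greeter:\n"
        "    def hello(self):\n"
        "        return 'hi'\n"
        "\n"
        "def main():\n"
        "    print(Greeter().hello())\n"
    )
    for chunk in chunk_python_source(sample):
        print(f"{chunk.chunk_type:8} {chunk.name:15} lines {chunk.line_start}-{chunk.line_end}")
```

Chunking along function and class boundaries keeps each embedded unit semantically complete, and storing the line range alongside the text is what would let a search result point back to an exact location in the file.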
