# Personal RAG MCP Server
A Model Context Protocol (MCP) server that provides a personal knowledge base with RAG (Retrieval-Augmented Generation) capabilities. Share context across Claude Desktop, Claude Code, VS Code, and Open WebUI.
## Features
- **Hybrid Storage**: SQLite for full-text documents + Qdrant for semantic search
- **Rich Metadata**: Comprehensive metadata capture for future extensibility
- **Dual Transport**: stdio (for Claude Desktop/VS Code) + HTTP Streaming (for Open WebUI)
- **Forward-Compatible**: Strategy pattern allows adding advanced RAG features without refactoring (see the sketch after this list)
- **Containerized**: Runs in Docker, connects to existing Qdrant/Ollama/LiteLLM infrastructure
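To illustrate the strategy pattern mentioned above, here is a minimal sketch. The class and method names (`RetrievalStrategy`, `VectorRetrieval`, `retrieve`) are hypothetical, not the actual interfaces in `pipeline/retriever.py`:

```python
from abc import ABC, abstractmethod


class RetrievalStrategy(ABC):
    """Hypothetical interface: each retrieval approach is a drop-in strategy."""

    @abstractmethod
    def retrieve(self, query: str, limit: int = 5) -> list[dict]:
        ...


class VectorRetrieval(RetrievalStrategy):
    """Phase 1: plain vector search against Qdrant."""

    def __init__(self, qdrant_store, embedder):
        self.qdrant_store = qdrant_store
        self.embedder = embedder

    def retrieve(self, query: str, limit: int = 5) -> list[dict]:
        vector = self.embedder.embed(query)
        return self.qdrant_store.search(vector, limit=limit)


# Phase 2 strategies (hybrid search, reranking, query expansion) would subclass
# RetrievalStrategy and be selected via config, without refactoring callers.
```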
## Architecture
```
Ingest (store_memory):

  User Input → MCP Tool
        ↓
  [1] Generate embedding (Ollama)
        ↓
  [2] Store full text + metadata in SQLite
        ↓
  [3] Store vector in Qdrant
        ↓
  Return confirmation

Query (ask_with_context):

  Search Query
        ↓
  [1] Embed query (Ollama)
        ↓
  [2] Search Qdrant (semantic search)
        ↓
  [3] Retrieve full text from SQLite
        ↓
  [4] Generate response (LiteLLM)
        ↓
  Return answer + sources
```
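As a concrete, hedged rendering of the ingest flow: the sketch below uses Ollama's `/api/embeddings` and Qdrant's REST upsert endpoint, but the collection name (`memories`), SQLite schema, and `ingest()` helper are illustrative assumptions, not the server's actual implementation:

```python
import os
import sqlite3
import uuid

import requests

OLLAMA_URL = os.environ.get("OLLAMA_URL", "http://localhost:11434")
QDRANT_URL = os.environ.get("QDRANT_URL", "http://localhost:6333")
SQLITE_PATH = os.environ.get("SQLITE_PATH", "./data/documents.db")


def ingest(text: str, namespace: str = "notes/personal") -> str:
    doc_id = str(uuid.uuid4())

    # [1] Generate embedding via Ollama
    resp = requests.post(f"{OLLAMA_URL}/api/embeddings",
                         json={"model": "nomic-embed-text", "prompt": text})
    vector = resp.json()["embedding"]

    # [2] Store full text + metadata in SQLite (illustrative schema)
    con = sqlite3.connect(SQLITE_PATH)
    con.execute("CREATE TABLE IF NOT EXISTS documents "
                "(id TEXT PRIMARY KEY, namespace TEXT, text TEXT)")
    con.execute("INSERT INTO documents VALUES (?, ?, ?)", (doc_id, namespace, text))
    con.commit()
    con.close()

    # [3] Store the vector in Qdrant (assumes the collection already exists)
    requests.put(f"{QDRANT_URL}/collections/memories/points",
                 json={"points": [{"id": doc_id, "vector": vector,
                                   "payload": {"namespace": namespace}}]})
    return doc_id
```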
## MCP Tools
### 1. `store_memory`
Store notes, documents, or snippets in the knowledge base.
```python
store_memory(
text="Your content here",
namespace="notes/personal", # Hierarchical organization
tags=["tag1", "tag2"],
title="Optional Title",
category="personal", # work, personal, family
content_type="note" # note, document, snippet
)
```
### 2. `search_memory`
Semantic search across your knowledge base.
```python
search_memory(
query="What did I learn about X?",
namespace="notes/personal", # Optional filter
limit=5,
content_type="note" # Optional filter
)
```
### 3. `ask_with_context`
Ask questions with RAG (retrieval + generation).
```python
ask_with_context(
question="What are my thoughts on X?",
namespace="notes/personal", # Optional filter
limit=5 # Context chunks to retrieve
)
```
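One way these tools might be exposed is via the `FastMCP` helper from the official `mcp` Python SDK. This is a sketch of a single tool registration; the actual `server.py` may wire things differently, and the tool body is deliberately stubbed:

```python
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("personal-rag")


@mcp.tool()
def store_memory(text: str, namespace: str = "notes/personal",
                 tags: list[str] | None = None, title: str | None = None,
                 category: str = "personal", content_type: str = "note") -> str:
    """Store a note, document, or snippet in the knowledge base."""
    # Embedding + SQLite + Qdrant writes would happen here
    # (see the ingest() sketch under Architecture).
    return f"Stored a {content_type} in {namespace}"


if __name__ == "__main__":
    mcp.run()  # stdio transport by default
```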
## Project Structure
```
personal-rag-mcp/
├── Dockerfile
├── requirements.txt
├── README.md
├── config/
│   ├── pipeline.yaml            # RAG pipeline config
│   └── server.yaml              # Server config
├── personal_rag_mcp/
│   ├── server.py                # MCP server entry point
│   ├── storage/
│   │   ├── sqlite_store.py      # SQLite document storage
│   │   ├── qdrant_store.py      # Qdrant vector storage
│   │   └── schema.py            # Pydantic metadata models
│   ├── pipeline/
│   │   ├── retriever.py         # Retrieval strategies
│   │   ├── reranker.py          # Reranking strategies
│   │   ├── expander.py          # Query expansion
│   │   ├── generator.py         # LLM generation
│   │   └── pipeline.py          # RAG orchestration
│   └── utils/
│       ├── embeddings.py        # Ollama embedding client
│       └── chunking.py          # Text chunking
├── scripts/
│   ├── init_db.py               # Initialize database
│   └── backup.py                # Backup utility
└── tests/
```
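The "rich metadata" captured by `storage/schema.py` might look something like the Pydantic model below. Field names and defaults are illustrative assumptions, not the project's actual schema:

```python
from datetime import datetime, timezone

from pydantic import BaseModel, Field


class DocumentMetadata(BaseModel):
    """Illustrative metadata record stored alongside each document."""

    id: str
    namespace: str = "notes/personal"   # hierarchical, e.g. "work/projects"
    tags: list[str] = Field(default_factory=list)
    title: str | None = None
    category: str = "personal"          # work, personal, family
    content_type: str = "note"          # note, document, snippet
    created_at: datetime = Field(default_factory=lambda: datetime.now(timezone.utc))
    source: str | None = None           # e.g. "claude-desktop", "open-webui"
```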
## Environment Variables
```bash
# Transport
TRANSPORT=http # or stdio
PORT=8765
# Storage
SQLITE_PATH=/app/data/documents.db
QDRANT_URL=http://qdrant:6333
# AI Services
OLLAMA_URL=http://ollama:11434
LITELLM_URL=http://litellm:4000
```
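Inside the server these would typically be read with `os.environ`; the defaults below are assumptions for local development, not documented behavior:

```python
import os

TRANSPORT   = os.environ.get("TRANSPORT", "stdio")       # "stdio" or "http"
PORT        = int(os.environ.get("PORT", "8765"))
SQLITE_PATH = os.environ.get("SQLITE_PATH", "./data/documents.db")
QDRANT_URL  = os.environ.get("QDRANT_URL", "http://localhost:6333")
OLLAMA_URL  = os.environ.get("OLLAMA_URL", "http://localhost:11434")
LITELLM_URL = os.environ.get("LITELLM_URL", "http://localhost:4000")
```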
## Development
### Setup
```bash
# Create virtual environment
python -m venv venv
source venv/bin/activate # or `venv\Scripts\activate` on Windows
# Install dependencies
pip install -r requirements.txt
```
### Run Locally (stdio)
```bash
export SQLITE_PATH=./data/documents.db
export QDRANT_URL=http://localhost:6333
export OLLAMA_URL=http://localhost:11434
export LITELLM_URL=http://localhost:4000
python -m personal_rag_mcp.server
```
### Run Locally (HTTP)
```bash
export TRANSPORT=http
export PORT=8765
python -m personal_rag_mcp.server
```
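The entry point presumably branches on `TRANSPORT`. A minimal sketch using the `mcp` SDK's `FastMCP` runner is shown below; the transport strings follow recent SDK versions, and the branching logic is an assumption about how `server.py` behaves:

```python
import os

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("personal-rag", port=int(os.environ.get("PORT", "8765")))

# ... tool registrations as sketched under "MCP Tools" ...

if __name__ == "__main__":
    if os.environ.get("TRANSPORT", "stdio") == "http":
        mcp.run(transport="streamable-http")   # HTTP streaming for Open WebUI
    else:
        mcp.run(transport="stdio")             # Claude Desktop / VS Code / Claude Code
```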
## Docker Deployment
### Prerequisites
This MCP server depends on the following AI infrastructure services:
- **Qdrant** (vector database) - Port 6333
- **Ollama** (embeddings) - Port 11434
- **LiteLLM** (LLM proxy) - Port 4000/8000
### Example Docker Compose Integration
```yaml
services:
  # Required: Qdrant vector database
  qdrant:
    image: qdrant/qdrant:latest
    container_name: qdrant
    ports:
      - "6333:6333"
    volumes:
      - qdrant-data:/qdrant/storage
    restart: unless-stopped

  # Required: Ollama for embeddings
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    ports:
      - "11434:11434"
    volumes:
      - ollama-data:/root/.ollama
    restart: unless-stopped

  # Required: LiteLLM proxy for LLM access
  litellm-proxy:
    image: ghcr.io/berriai/litellm:main-latest
    container_name: litellm-proxy
    ports:
      - "4080:8000"
    volumes:
      - ./litellm_config.yaml:/app/config.yaml
    environment:
      - AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID}
      - AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY}
      - AWS_REGION=${AWS_REGION}
      - OLLAMA_API_BASE=http://ollama:11434
    entrypoint: ["litellm", "--config", "/app/config.yaml", "--port", "8000"]
    depends_on:
      - ollama
    restart: unless-stopped

  # Personal RAG MCP Server
  personal-rag-mcp:
    build: ./personal-rag-mcp
    container_name: personal-rag-mcp
    ports:
      - "8765:8765"
    environment:
      - TRANSPORT=http
      - PORT=8765
      - QDRANT_URL=http://qdrant:6333
      - OLLAMA_URL=http://ollama:11434
      - LITELLM_URL=http://litellm-proxy:8000
      - OPENAI_API_KEY=${LITELLM_API_KEY}  # LiteLLM auth
      - SQLITE_PATH=/app/data/documents.db
    volumes:
      - personal-rag-data:/app/data
      - ./config/personal-rag:/app/config:ro
    depends_on:
      - qdrant
      - ollama
      - litellm-proxy
    restart: unless-stopped

volumes:
  qdrant-data:
  ollama-data:
  personal-rag-data:
```
### LiteLLM Configuration Example
The MCP server uses **LiteLLM** as a unified proxy, which means you can use any LLM provider:
- **Local**: Ollama (llama3, deepseek, qwen, etc.)
- **Cloud**: OpenAI, Anthropic Claude, Google Gemini, Cohere
- **AWS Bedrock**: Claude, Llama, Mistral, etc.
- **Azure OpenAI**: GPT-4, GPT-3.5
- **100+ other providers**: See [LiteLLM docs](https://docs.litellm.ai/docs/providers)
Simply configure your preferred models in `litellm_config.yaml`:
```yaml
model_list:
  # Local Ollama models (no API key needed)
  - model_name: deepseek-r1-1.5b
    litellm_params:
      model: ollama/deepseek-r1:1.5b
      api_base: http://ollama:11434

  # AWS Bedrock models
  - model_name: bedrock-claude-3-5-sonnet-v2
    litellm_params:
      model: bedrock/us.anthropic.claude-3-5-sonnet-20241022-v2:0
      aws_access_key_id: os.environ/AWS_ACCESS_KEY_ID
      aws_secret_access_key: os.environ/AWS_SECRET_ACCESS_KEY
      aws_region_name: us-east-2

  # OpenAI models
  - model_name: gpt-4o
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY

  # Anthropic Claude
  - model_name: claude-3-5-sonnet
    litellm_params:
      model: anthropic/claude-3-5-sonnet-20241022
      api_key: os.environ/ANTHROPIC_API_KEY

  # Embedding model (for semantic search)
  - model_name: nomic-embed-text
    litellm_params:
      model: ollama/nomic-embed-text
      api_base: http://ollama:11434

general_settings:
  master_key: sk-1234  # Set LITELLM_API_KEY in .env
```
The server defaults to using whatever model is configured in LiteLLM. You can easily switch between local and cloud models without changing the MCP server code.
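Because LiteLLM exposes an OpenAI-compatible API, the generation step can be as simple as pointing the `openai` client at `LITELLM_URL`. The helper below is a sketch, not the server's actual generator; the model name is whichever entry you defined in `model_list`:

```python
import os

from openai import OpenAI

client = OpenAI(
    base_url=f"{os.environ.get('LITELLM_URL', 'http://localhost:4000')}/v1",
    api_key=os.environ.get("OPENAI_API_KEY", "sk-1234"),  # the LiteLLM master key
)


def generate_answer(question: str, context_chunks: list[str],
                    model: str = "bedrock-claude-3-5-sonnet-v2") -> str:
    """Ask the configured LLM to answer using only the retrieved context."""
    context = "\n\n".join(context_chunks)
    response = client.chat.completions.create(
        model=model,  # any model_name from litellm_config.yaml
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return response.choices[0].message.content
```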
### Environment File (.env)
```bash
# LiteLLM API Key
LITELLM_API_KEY=sk-1234
# AWS Credentials (optional, for Bedrock models)
AWS_ACCESS_KEY_ID=your_access_key
AWS_SECRET_ACCESS_KEY=your_secret_key
AWS_REGION=us-east-2
```
### First-Time Setup
1. **Pull required Ollama models:**
```bash
docker exec ollama ollama pull nomic-embed-text
docker exec ollama ollama pull deepseek-r1:1.5b
```
2. **Verify services are running:**
```bash
curl http://localhost:6333/collections # Qdrant
curl http://localhost:11434/api/tags # Ollama
curl -H "Authorization: Bearer sk-1234" http://localhost:4080/v1/models # LiteLLM
```
3. **Test the MCP server:**
```bash
docker exec personal-rag-mcp python /app/scripts/test_e2e.py
```
For complete infrastructure setup, see the parent repository.
## Roadmap
### Phase 1 (Current)
- ✅ Hybrid SQLite + Qdrant storage
- ✅ Basic RAG pipeline (vector retrieval)
- ✅ MCP tools (store, search, ask)
- ✅ Dual transport (stdio + HTTP)
### Phase 2 (Future)
- [ ] Advanced RAG features (reranking, hybrid search, query expansion)
- [ ] Bulk document ingestion (PDF, DOCX parsing)
- [ ] Conversation history capture
- [ ] Multi-user support with authentication
## License
MIT