# Personal RAG MCP Server
A Model Context Protocol (MCP) server that provides a personal knowledge base with RAG (Retrieval-Augmented Generation) capabilities. Share context across Claude Desktop, Claude Code, VS Code, and Open WebUI.
## Features
- **Hybrid Storage**: SQLite for full-text documents + Qdrant for semantic search
- **Rich Metadata**: Comprehensive metadata capture for future extensibility
- **Dual Transport**: stdio (for Claude Desktop/VS Code) + HTTP Streaming (for Open WebUI)
- **Forward-Compatible**: Strategy pattern allows adding advanced RAG features without refactoring (see the sketch after this list)
- **Containerized**: Runs in Docker, connects to existing Qdrant/Ollama/LiteLLM infrastructure
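To illustrate the strategy pattern mentioned above, here is a minimal sketch. The class and method names (`RetrievalStrategy`, `VectorRetrieval`, `retrieve`) are hypothetical, not the actual interfaces in `pipeline/retriever.py`:

```python
from abc import ABC, abstractmethod


class RetrievalStrategy(ABC):
    """Hypothetical interface: each retrieval approach is a drop-in strategy."""

    @abstractmethod
    def retrieve(self, query: str, limit: int = 5) -> list[dict]:
        ...


class VectorRetrieval(RetrievalStrategy):
    """Phase 1: plain vector search against Qdrant."""

    def __init__(self, qdrant_store, embedder):
        self.qdrant_store = qdrant_store
        self.embedder = embedder

    def retrieve(self, query: str, limit: int = 5) -> list[dict]:
        vector = self.embedder.embed(query)
        return self.qdrant_store.search(vector, limit=limit)


# Phase 2 strategies (hybrid search, reranking, query expansion) would subclass
# RetrievalStrategy and be selected via config, without refactoring callers.
```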
## Architecture
```
Ingest (store_memory):

  User Input → MCP Tool
        ↓
  [1] Generate embedding (Ollama)
        ↓
  [2] Store full text + metadata in SQLite
        ↓
  [3] Store vector in Qdrant
        ↓
  Return confirmation

Query (ask_with_context):

  Search Query
        ↓
  [1] Embed query (Ollama)
        ↓
  [2] Search Qdrant (semantic search)
        ↓
  [3] Retrieve full text from SQLite
        ↓
  [4] Generate response (LiteLLM)
        ↓
  Return answer + sources
```
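As a concrete, hedged rendering of the ingest flow: the sketch below uses Ollama's `/api/embeddings` and Qdrant's REST upsert endpoint, but the collection name (`memories`), SQLite schema, and `ingest()` helper are illustrative assumptions, not the server's actual implementation:

```python
import os
import sqlite3
import uuid

import requests

OLLAMA_URL = os.environ.get("OLLAMA_URL", "http://localhost:11434")
QDRANT_URL = os.environ.get("QDRANT_URL", "http://localhost:6333")
SQLITE_PATH = os.environ.get("SQLITE_PATH", "./data/documents.db")


def ingest(text: str, namespace: str = "notes/personal") -> str:
    doc_id = str(uuid.uuid4())

    # [1] Generate embedding via Ollama
    resp = requests.post(f"{OLLAMA_URL}/api/embeddings",
                         json={"model": "nomic-embed-text", "prompt": text})
    vector = resp.json()["embedding"]

    # [2] Store full text + metadata in SQLite (illustrative schema)
    con = sqlite3.connect(SQLITE_PATH)
    con.execute("CREATE TABLE IF NOT EXISTS documents "
                "(id TEXT PRIMARY KEY, namespace TEXT, text TEXT)")
    con.execute("INSERT INTO documents VALUES (?, ?, ?)", (doc_id, namespace, text))
    con.commit()
    con.close()

    # [3] Store the vector in Qdrant (assumes the collection already exists)
    requests.put(f"{QDRANT_URL}/collections/memories/points",
                 json={"points": [{"id": doc_id, "vector": vector,
                                   "payload": {"namespace": namespace}}]})
    return doc_id
```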
## MCP Tools
### 1. `store_memory`
Store notes, documents, or snippets in the knowledge base.
```python
store_memory(
text="Your content here",
namespace="notes/personal", # Hierarchical organization
tags=["tag1", "tag2"],
title="Optional Title",
category="personal", # work, personal, family
content_type="note" # note, document, snippet
)
```
### 2. `search_memory`
Semantic search across your knowledge base.
```python
search_memory(
query="What did I learn about X?",
namespace="notes/personal", # Optional filter
limit=5,
content_type="note" # Optional filter
)
```
### 3. `ask_with_context`
Ask questions with RAG (retrieval + generation).
```python
ask_with_context(
question="What are my thoughts on X?",
namespace="notes/personal", # Optional filter
limit=5 # Context chunks to retrieve
)
```
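One way these tools might be exposed is via the `FastMCP` helper from the official `mcp` Python SDK. This is a sketch of a single tool registration; the actual `server.py` may wire things differently, and the tool body is deliberately stubbed:

```python
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("personal-rag")


@mcp.tool()
def store_memory(text: str, namespace: str = "notes/personal",
                 tags: list[str] | None = None, title: str | None = None,
                 category: str = "personal", content_type: str = "note") -> str:
    """Store a note, document, or snippet in the knowledge base."""
    # Embedding + SQLite + Qdrant writes would happen here
    # (see the ingest() sketch under Architecture).
    return f"Stored a {content_type} in {namespace}"


if __name__ == "__main__":
    mcp.run()  # stdio transport by default
```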
## Project Structure
```
personal-rag-mcp/
├── Dockerfile
├── requirements.txt
├── README.md
├── config/
│   ├── pipeline.yaml            # RAG pipeline config
│   └── server.yaml              # Server config
├── personal_rag_mcp/
│   ├── server.py                # MCP server entry point
│   ├── storage/
│   │   ├── sqlite_store.py      # SQLite document storage
│   │   ├── qdrant_store.py      # Qdrant vector storage
│   │   └── schema.py            # Pydantic metadata models
│   ├── pipeline/
│   │   ├── retriever.py         # Retrieval strategies
│   │   ├── reranker.py          # Reranking strategies
│   │   ├── expander.py          # Query expansion
│   │   ├── generator.py         # LLM generation
│   │   └── pipeline.py          # RAG orchestration
│   └── utils/
│       ├── embeddings.py        # Ollama embedding client
│       └── chunking.py          # Text chunking
├── scripts/
│   ├── init_db.py               # Initialize database
│   └── backup.py                # Backup utility
└── tests/
```
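The "rich metadata" captured by `storage/schema.py` might look something like the Pydantic model below. Field names and defaults are illustrative assumptions, not the project's actual schema:

```python
from datetime import datetime, timezone

from pydantic import BaseModel, Field


class DocumentMetadata(BaseModel):
    """Illustrative metadata record stored alongside each document."""

    id: str
    namespace: str = "notes/personal"   # hierarchical, e.g. "work/projects"
    tags: list[str] = Field(default_factory=list)
    title: str | None = None
    category: str = "personal"          # work, personal, family
    content_type: str = "note"          # note, document, snippet
    created_at: datetime = Field(default_factory=lambda: datetime.now(timezone.utc))
    source: str | None = None           # e.g. "claude-desktop", "open-webui"
```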
## Environment Variables
```bash
# Transport
TRANSPORT=http # or stdio
PORT=8765
# Storage
SQLITE_PATH=/app/data/documents.db
QDRANT_URL=http://qdrant:6333
# AI Services
OLLAMA_URL=http://ollama:11434
LITELLM_URL=http://litellm:4000
```
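Inside the server these would typically be read with `os.environ`; the defaults below are assumptions for local development, not documented behavior:

```python
import os

TRANSPORT   = os.environ.get("TRANSPORT", "stdio")       # "stdio" or "http"
PORT        = int(os.environ.get("PORT", "8765"))
SQLITE_PATH = os.environ.get("SQLITE_PATH", "./data/documents.db")
QDRANT_URL  = os.environ.get("QDRANT_URL", "http://localhost:6333")
OLLAMA_URL  = os.environ.get("OLLAMA_URL", "http://localhost:11434")
LITELLM_URL = os.environ.get("LITELLM_URL", "http://localhost:4000")
```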
## Development
### Setup
```bash
# Create virtual environment
python -m venv venv
source venv/bin/activate # or `venv\Scripts\activate` on Windows
# Install dependencies
pip install -r requirements.txt
```
### Run Locally (stdio)
```bash
export SQLITE_PATH=./data/documents.db
export QDRANT_URL=http://localhost:6333
export OLLAMA_URL=http://localhost:11434
export LITELLM_URL=http://localhost:4000
python -m personal_rag_mcp.server
```
### Run Locally (HTTP)
```bash
export TRANSPORT=http
export PORT=8765
python -m personal_rag_mcp.server
```
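The entry point presumably branches on `TRANSPORT`. A minimal sketch using the `mcp` SDK's `FastMCP` runner is shown below; the transport strings follow recent SDK versions, and the branching logic is an assumption about how `server.py` behaves:

```python
import os

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("personal-rag", port=int(os.environ.get("PORT", "8765")))

# ... tool registrations as sketched under "MCP Tools" ...

if __name__ == "__main__":
    if os.environ.get("TRANSPORT", "stdio") == "http":
        mcp.run(transport="streamable-http")   # HTTP streaming for Open WebUI
    else:
        mcp.run(transport="stdio")             # Claude Desktop / VS Code / Claude Code
```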
## Docker Deployment
### Prerequisites
This MCP server depends on the following AI infrastructure services:
- **Qdrant** (vector database) - Port 6333
- **Ollama** (embeddings) - Port 11434
- **LiteLLM** (LLM proxy) - Port 4000/8000
### Example Docker Compose Integration
```yaml
services:
  # Required: Qdrant vector database
  qdrant:
    image: qdrant/qdrant:latest
    container_name: qdrant
    ports:
      - "6333:6333"
    volumes:
      - qdrant-data:/qdrant/storage
    restart: unless-stopped

  # Required: Ollama for embeddings
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    ports:
      - "11434:11434"
    volumes:
      - ollama-data:/root/.ollama
    restart: unless-stopped

  # Required: LiteLLM proxy for LLM access
  litellm-proxy:
    image: ghcr.io/berriai/litellm:main-latest
    container_name: litellm-proxy
    ports:
      - "4080:8000"
    volumes:
      - ./litellm_config.yaml:/app/config.yaml
    environment:
      - AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID}
      - AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY}
      - AWS_REGION=${AWS_REGION}
      - OLLAMA_API_BASE=http://ollama:11434
    entrypoint: ["litellm", "--config", "/app/config.yaml", "--port", "8000"]
    depends_on:
      - ollama
    restart: unless-stopped

  # Personal RAG MCP Server
  personal-rag-mcp:
    build: ./personal-rag-mcp
    container_name: personal-rag-mcp
    ports:
      - "8765:8765"
    environment:
      - TRANSPORT=http
      - PORT=8765
      - QDRANT_URL=http://qdrant:6333
      - OLLAMA_URL=http://ollama:11434
      - LITELLM_URL=http://litellm-proxy:8000
      - OPENAI_API_KEY=${LITELLM_API_KEY}  # LiteLLM auth
      - SQLITE_PATH=/app/data/documents.db
    volumes:
      - personal-rag-data:/app/data
      - ./config/personal-rag:/app/config:ro
    depends_on:
      - qdrant
      - ollama
      - litellm-proxy
    restart: unless-stopped

volumes:
  qdrant-data:
  ollama-data:
  personal-rag-data:
```
### LiteLLM Configuration Example
The MCP server uses **LiteLLM** as a unified proxy, which means you can use any LLM provider:
- **Local**: Ollama (llama3, deepseek, qwen, etc.)
- **Cloud**: OpenAI, Anthropic Claude, Google Gemini, Cohere
- **AWS Bedrock**: Claude, Llama, Mistral, etc.
- **Azure OpenAI**: GPT-4, GPT-3.5
- **100+ other providers**: See [LiteLLM docs](https://docs.litellm.ai/docs/providers)
Simply configure your preferred models in `litellm_config.yaml`:
```yaml
model_list:
  # Local Ollama models (no API key needed)
  - model_name: deepseek-r1-1.5b
    litellm_params:
      model: ollama/deepseek-r1:1.5b
      api_base: http://ollama:11434

  # AWS Bedrock models
  - model_name: bedrock-claude-3-5-sonnet-v2
    litellm_params:
      model: bedrock/us.anthropic.claude-3-5-sonnet-20241022-v2:0
      aws_access_key_id: os.environ/AWS_ACCESS_KEY_ID
      aws_secret_access_key: os.environ/AWS_SECRET_ACCESS_KEY
      aws_region_name: us-east-2

  # OpenAI models
  - model_name: gpt-4o
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY

  # Anthropic Claude
  - model_name: claude-3-5-sonnet
    litellm_params:
      model: anthropic/claude-3-5-sonnet-20241022
      api_key: os.environ/ANTHROPIC_API_KEY

  # Embedding model (for semantic search)
  - model_name: nomic-embed-text
    litellm_params:
      model: ollama/nomic-embed-text
      api_base: http://ollama:11434

general_settings:
  master_key: sk-1234  # Set LITELLM_API_KEY in .env
```
The server defaults to using whatever model is configured in LiteLLM. You can easily switch between local and cloud models without changing the MCP server code.
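Because LiteLLM exposes an OpenAI-compatible API, the generation step can be as simple as pointing the `openai` client at `LITELLM_URL`. The helper below is a sketch, not the server's actual generator; the model name is whichever entry you defined in `model_list`:

```python
import os

from openai import OpenAI

client = OpenAI(
    base_url=f"{os.environ.get('LITELLM_URL', 'http://localhost:4000')}/v1",
    api_key=os.environ.get("OPENAI_API_KEY", "sk-1234"),  # the LiteLLM master key
)


def generate_answer(question: str, context_chunks: list[str],
                    model: str = "bedrock-claude-3-5-sonnet-v2") -> str:
    """Ask the configured LLM to answer using only the retrieved context."""
    context = "\n\n".join(context_chunks)
    response = client.chat.completions.create(
        model=model,  # any model_name from litellm_config.yaml
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return response.choices[0].message.content
```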
### Environment File (.env)
```bash
# LiteLLM API Key
LITELLM_API_KEY=sk-1234
# AWS Credentials (optional, for Bedrock models)
AWS_ACCESS_KEY_ID=your_access_key
AWS_SECRET_ACCESS_KEY=your_secret_key
AWS_REGION=us-east-2
```
### First-Time Setup
1. **Pull required Ollama models:**
```bash
docker exec ollama ollama pull nomic-embed-text
docker exec ollama ollama pull deepseek-r1:1.5b
```
2. **Verify services are running:**
```bash
curl http://localhost:6333/collections # Qdrant
curl http://localhost:11434/api/tags # Ollama
curl -H "Authorization: Bearer sk-1234" http://localhost:4080/v1/models # LiteLLM
```
3. **Test the MCP server:**
```bash
docker exec personal-rag-mcp python /app/scripts/test_e2e.py
```
For complete infrastructure setup, see the parent repository.
## Roadmap
### Phase 1 (Current)
- ✅ Hybrid SQLite + Qdrant storage
- ✅ Basic RAG pipeline (vector retrieval)
- ✅ MCP tools (store, search, ask)
- ✅ Dual transport (stdio + HTTP)
### Phase 2 (Future)
- [ ] Advanced RAG features (reranking, hybrid search, query expansion)
- [ ] Bulk document ingestion (PDF, DOCX parsing)
- [ ] Conversation history capture
- [ ] Multi-user support with authentication
## License
MIT