# RAG Server with MCP Integration
A Retrieval-Augmented Generation (RAG) system with Google AI Studio integration, featuring both a REST API and Model Context Protocol (MCP) server.
## Features
### Core Capabilities
- **Document Storage**: Upload and store text (.txt) and Markdown (.md) documents
- **Hierarchical Chunking**: Structure-aware chunking for markdown that preserves document hierarchy
- **Vector Search**: Efficient similarity search using Qdrant vector database
- **Google AI Integration**: Uses Google AI Studio for embeddings (text-embedding-004) and generation (gemini-1.5-flash)
- **REST API**: FastAPI-based REST API with automatic OpenAPI documentation
- **MCP Server**: Model Context Protocol server for seamless integration with Claude and other MCP clients
### Advanced Features
- **Tag-Based Organization**: Organize documents with multiple tags for easy categorization
- **Section-Aware Retrieval**: Query specific sections of documentation (e.g., "Installation > Prerequisites")
- **Markdown Structure Preservation**: Automatic extraction of heading hierarchy with breadcrumb paths
- **Context-Enhanced Answers**: LLM receives section context for more accurate responses
- **Flexible Filtering**: Filter documents by tags and/or section paths during queries
- **Document Structure API**: Explore table of contents and section organization
## Project Structure
```
rag/
    config/
        __init__.py
        settings.py              # Configuration and settings
    rag_server/
        __init__.py
        models.py                # Pydantic models for API
        rag_system.py            # Core RAG system logic
        server.py                # FastAPI server
    mcp_server/
        __init__.py
        server.py                # MCP server implementation
    utils/
        __init__.py
        document_processor.py    # Document processing
        embeddings.py            # Google AI embeddings
        text_chunker.py          # Text chunking utility
        vector_store.py          # Qdrant vector store wrapper
    .env.example                 # Example environment variables
    .gitignore
    main.py
    pyproject.toml
    README.md
```
## Installation
### Prerequisites
- Python 3.13 or higher
- Google AI Studio API key ([Get one here](https://makersuite.google.com/app/apikey))
### Setup
1. **Clone or navigate to the project directory**
2. **Install dependencies**
```bash
# Using pip
pip install -e .
# Or using uv (recommended)
uv pip install -e .
```
3. **Configure environment variables**
```bash
# Copy the example env file
cp .env.example .env
```
Then edit `.env` and add your Google API key:
```
GOOGLE_API_KEY=your_api_key_here
```
## Usage
### Running the FastAPI Server
Start the REST API server:
```bash
python -m rag_server.server
```
The server will start at `http://localhost:8000`. Visit `http://localhost:8000/docs` for interactive API documentation.
#### API Endpoints
- **POST /documents** - Upload a document
- **POST /query** - Query the RAG system
- **GET /documents** - List all documents
- **DELETE /documents/{doc_id}** - Delete a document
- **GET /stats** - Get system statistics
- **GET /health** - Health check
#### Example Usage with curl
```bash
# Upload a document
curl -X POST "http://localhost:8000/documents" \
-F "file=@example.txt"
# Query the RAG system
curl -X POST "http://localhost:8000/query" \
-H "Content-Type: application/json" \
-d '{"question": "What is the main topic of the documents?", "top_k": 5}'
# List documents
curl "http://localhost:8000/documents"
# Get statistics
curl "http://localhost:8000/stats"
```
### Running the MCP Server
The MCP server allows integration with Claude and other MCP-compatible clients.
```bash
python -m mcp_server.server
```
#### MCP Tools Available
1. **query_rag** - Query the RAG system with a question
2. **add_document** - Add a document to the RAG system
3. **list_documents** - List all stored documents
4. **delete_document** - Delete a document by ID
5. **get_rag_stats** - Get system statistics
#### Using with Claude Desktop
Add to your Claude Desktop configuration (`claude_desktop_config.json`):
```json
{
"mcpServers": {
"rag": {
"command": "python",
"args": ["-m", "mcp_server.server"],
"cwd": "/path/to/rag"
}
}
}
```
## Configuration
All configuration is managed through environment variables (defined in `.env`):
| Variable | Description | Default |
|----------|-------------|---------|
| `GOOGLE_API_KEY` | Google AI Studio API key | (required) |
| `CHUNK_SIZE` | Size of text chunks in characters | 1000 |
| `CHUNK_OVERLAP` | Overlap between chunks | 200 |
| `TOP_K_RESULTS` | Number of chunks to retrieve | 5 |
| `QDRANT_PATH` | Path to Qdrant storage | ./qdrant_storage |
| `QDRANT_COLLECTION_NAME` | Qdrant collection name | documents |
| `FASTAPI_HOST` | FastAPI server host | 0.0.0.0 |
| `FASTAPI_PORT` | FastAPI server port | 8000 |
| `EMBEDDING_MODEL` | Google embedding model | text-embedding-004 |
| `LLM_MODEL` | Google LLM model | gemini-1.5-flash |
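The actual `config/settings.py` may load these differently; as a minimal stdlib sketch (field names here are illustrative, not the project's real attribute names), the table above could map to a frozen dataclass like this:

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Mirrors the environment variables in the table above."""
    google_api_key: str
    chunk_size: int = 1000
    chunk_overlap: int = 200
    top_k_results: int = 5
    qdrant_path: str = "./qdrant_storage"
    qdrant_collection_name: str = "documents"
    fastapi_host: str = "0.0.0.0"
    fastapi_port: int = 8000
    embedding_model: str = "text-embedding-004"
    llm_model: str = "gemini-1.5-flash"

    @classmethod
    def from_env(cls) -> "Settings":
        # GOOGLE_API_KEY has no default and must be present
        return cls(
            google_api_key=os.environ["GOOGLE_API_KEY"],
            chunk_size=int(os.getenv("CHUNK_SIZE", "1000")),
            chunk_overlap=int(os.getenv("CHUNK_OVERLAP", "200")),
            top_k_results=int(os.getenv("TOP_K_RESULTS", "5")),
            qdrant_path=os.getenv("QDRANT_PATH", "./qdrant_storage"),
            qdrant_collection_name=os.getenv("QDRANT_COLLECTION_NAME", "documents"),
            fastapi_host=os.getenv("FASTAPI_HOST", "0.0.0.0"),
            fastapi_port=int(os.getenv("FASTAPI_PORT", "8000")),
            embedding_model=os.getenv("EMBEDDING_MODEL", "text-embedding-004"),
            llm_model=os.getenv("LLM_MODEL", "gemini-1.5-flash"),
        )
```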
## Architecture
### Document Processing Pipeline
1. **Upload** - User uploads a .txt or .md file
2. **Processing** - Document is read and metadata extracted
3. **Chunking** - Text is split into overlapping chunks
4. **Embedding** - Each chunk is converted to a vector using Google AI embeddings
5. **Storage** - Vectors and metadata are stored in Qdrant
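The chunking step (3) can be sketched as a simple sliding window; this is an illustration of the `CHUNK_SIZE`/`CHUNK_OVERLAP` semantics, not the project's actual `text_chunker.py` implementation:

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into overlapping chunks of at most chunk_size characters."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    step = chunk_size - overlap  # how far the window advances each time
    # max(..., 1) ensures short texts still yield exactly one chunk
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]
```

With the defaults, consecutive chunks share their last/first 200 characters, which helps retrieval when an answer straddles a chunk boundary.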
### Query Pipeline
1. **Query** - User submits a question
2. **Embedding** - Question is converted to a vector
3. **Retrieval** - Similar chunks are retrieved from Qdrant
4. **Generation** - Context is provided to Google AI Studio Flash model
5. **Response** - Answer is generated and returned with sources
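The retrieval step (3) is, at its core, nearest-neighbor search over embedding vectors. Qdrant handles this internally; the sketch below only illustrates the idea with plain cosine similarity (function names are hypothetical):

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 for a zero vector."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_top_k(query_vec: list[float],
                   chunks: list[tuple[str, list[float]]],
                   k: int = 5) -> list[str]:
    """Return the texts of the k chunks most similar to the query vector."""
    ranked = sorted(chunks, key=lambda c: cosine_similarity(query_vec, c[1]), reverse=True)
    return [text for text, _ in ranked[:k]]
```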
## Development
### Running Tests
```bash
# Install test dependencies
pip install pytest pytest-asyncio httpx
# Run tests
pytest
```
### Code Style
The codebase uses type hints and docstrings throughout; please match that style in contributions.
## Troubleshooting
### Common Issues
**Issue**: `GOOGLE_API_KEY not found`
- **Solution**: Ensure you've created a `.env` file and added your Google API key
**Issue**: `Unsupported file type`
- **Solution**: Only .txt and .md files are supported. Convert other formats first.
**Issue**: `Collection already exists` error
- **Solution**: Delete the `qdrant_storage/` directory to reset the database
## Advanced Usage
### Tag-Based Organization
Organize your documents with tags for easy categorization and filtering:
```bash
# Upload document with tags
curl -X POST "http://localhost:8000/documents" \
-F "file=@dagster-docs.md" \
-F "tags=dagster,python,orchestration"
# List all available tags
curl "http://localhost:8000/tags"
# Query only dagster-related documents
curl -X POST "http://localhost:8000/query" \
-H "Content-Type: application/json" \
-d '{"question": "How do I create a pipeline?", "tags": ["dagster"]}'
# List documents filtered by tags
curl "http://localhost:8000/documents?tags=dagster,python"
```
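Conceptually, tag filtering is set intersection over each document's tags. A minimal sketch, assuming OR semantics (a document matches if it carries at least one requested tag; the server's actual semantics may differ):

```python
def filter_by_tags(documents: list[dict], tags: list[str]) -> list[dict]:
    """Keep documents that carry at least one of the requested tags."""
    wanted = set(tags)
    return [d for d in documents if wanted & set(d.get("tags", []))]
```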
### Hierarchical Document Structure
For markdown documents, the system automatically preserves heading hierarchy:
```bash
# Get document structure (table of contents)
curl "http://localhost:8000/documents/{doc_id}/sections"
# Query specific section
curl -X POST "http://localhost:8000/query" \
-H "Content-Type: application/json" \
-d '{"question": "What are the prerequisites?", "section_path": "Installation > Prerequisites"}'
```
### Section-Aware Queries
The system includes section context when generating answers:
```python
# Example: Markdown document structure
# Installation
# Prerequisites
# Python Version
# Setup Steps
# When you query about "Python version requirements"
# The system will:
# 1. Retrieve relevant chunks from "Installation > Prerequisites > Python Version"
# 2. Include section path in context sent to LLM
# 3. Cite sources with full section paths
```
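The breadcrumb paths above come from walking the markdown heading hierarchy with a stack. A minimal sketch of that idea (not the project's actual `document_processor.py` code):

```python
import re


def section_paths(markdown: str) -> list[str]:
    """Build breadcrumb paths like 'Installation > Prerequisites' from headings."""
    stack: list[str] = []  # one entry per heading level currently open
    paths: list[str] = []
    for line in markdown.splitlines():
        m = re.match(r"^(#{1,6})\s+(.*)", line)
        if not m:
            continue
        level, title = len(m.group(1)), m.group(2).strip()
        del stack[level - 1:]  # pop headings at this level or deeper
        stack.append(title)
        paths.append(" > ".join(stack))
    return paths
```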
### MCP Tools
The MCP server provides enhanced tools for Claude and other MCP clients:
**query_rag** - Query with optional tags and section filtering
```json
{
"question": "How do I deploy?",
"tags": ["dagster"],
"section_path": "Deployment"
}
```
**add_document** - Upload with tags
```json
{
"file_path": "/path/to/doc.md",
"tags": ["dagster", "docs"]
}
```
**get_tags** - List all tags
**get_document_structure** - Get table of contents
```json
{
"doc_id": "abc123"
}
```
## API Reference
### Enhanced Endpoints
**POST /documents**
- Body: `file` (multipart), `tags` (comma-separated string)
- Response: Document info with tags and chunk count
**POST /query**
- Body: `{"question": "...", "tags": [...], "section_path": "..."}`
- Response: Answer with section-aware sources
**GET /tags**
- Response: `{"tags": [...], "total": N}`
**GET /documents/{doc_id}/sections**
- Response: Document structure with section hierarchy
**GET /documents?tags=tag1,tag2**
- Query filtered by tags
- Response: List of matching documents
## License
MIT License
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
## Acknowledgments
- Google AI Studio for embeddings and LLM capabilities
- Qdrant for vector database
- FastAPI for the REST API framework
- Anthropic MCP for the Model Context Protocol