# RAG Server with MCP Integration
A Retrieval-Augmented Generation (RAG) system with Google AI Studio integration, featuring both a REST API and Model Context Protocol (MCP) server.
## Features
### Core Capabilities
- **Document Storage**: Upload and store text (.txt) and Markdown (.md) documents
- **Hierarchical Chunking**: Structure-aware chunking for markdown that preserves document hierarchy
- **Vector Search**: Efficient similarity search using Qdrant vector database
- **Google AI Integration**: Uses Google AI Studio for embeddings (text-embedding-004) and generation (gemini-1.5-flash)
- **REST API**: FastAPI-based REST API with automatic OpenAPI documentation
- **MCP Server**: Model Context Protocol server for seamless integration with Claude and other MCP clients
- **OpenAI-Compatible API**: Supports OpenAI-compatible chat completions for web UI integration
- **Code Indexing**: Index and search source code repositories with semantic understanding
- **Smart Query Routing**: Automatic query classification and routing to appropriate retrieval methods
### Advanced Features
- **Tag-Based Organization**: Organize documents with multiple tags for easy categorization
- **Section-Aware Retrieval**: Query specific sections of documentation (e.g., "Installation > Prerequisites")
- **Markdown Structure Preservation**: Automatic extraction of heading hierarchy with breadcrumb paths
- **Context-Enhanced Answers**: LLM receives section context for more accurate responses
- **Flexible Filtering**: Filter documents by tags and/or section paths during queries
- **Document Structure API**: Explore table of contents and section organization
- **GitHub Integration**: Parse and extract content from GitHub URLs
- **Reference Following**: Automatically follow documentation references for comprehensive answers
- **Multi-Mode Retrieval**: Choose between standard, enhanced, or smart query modes
- **Rate Limiting**: Built-in rate limiting for API endpoints
## Project Structure
```
mcp-rag-docs/
├── config/
│   ├── __init__.py
│   └── settings.py             # Configuration and settings
├── rag_server/
│   ├── __init__.py
│   ├── models.py               # Pydantic models for API
│   ├── openai_api.py           # OpenAI-compatible API endpoints
│   ├── openai_models.py        # OpenAI API models
│   ├── rag_system.py           # Core RAG system logic
│   ├── server.py               # FastAPI server
│   └── smart_query.py          # Smart query routing
├── mcp_server/
│   ├── __init__.py
│   └── server.py               # MCP server implementation
├── utils/
│   ├── __init__.py
│   ├── code_indexer.py         # Source code indexing
│   ├── code_index_store.py     # Code index storage
│   ├── document_processor.py   # Document processing
│   ├── embeddings.py           # Google AI embeddings
│   ├── frontmatter_parser.py   # YAML frontmatter parsing
│   ├── github_parser.py        # GitHub URL parsing
│   ├── google_api_client.py    # Google AI API client
│   ├── hierarchical_chunker.py # Hierarchical document chunking
│   ├── markdown_parser.py      # Markdown parsing
│   ├── query_classifier.py     # Query type classification
│   ├── rate_limit_store.py     # Rate limiting
│   ├── reference_extractor.py  # Extract doc references
│   ├── retrieval_router.py     # Multi-mode retrieval routing
│   ├── source_extractor.py     # Extract source code snippets
│   ├── text_chunker.py         # Text chunking utility
│   └── vector_store.py         # Qdrant vector store wrapper
├── build_code_index.py         # Build code index from repository
├── check_github_urls.py        # Validate GitHub URLs
├── check_status.py             # System status checker
├── example_usage.py            # Example usage scripts
├── ingest_docs.py              # Document ingestion utility
├── main.py                     # Main entry point
├── .env.example                # Example environment variables
├── docker-compose.yml          # Docker setup for Qdrant
└── pyproject.toml              # Project dependencies
```
## Installation
### Prerequisites
- Python 3.13 or higher
- Google AI Studio API key ([Get one here](https://makersuite.google.com/app/apikey))
### Setup
1. **Clone or navigate to the project directory**
2. **Install dependencies**
```bash
# Using pip
pip install -e .
# Or using uv (recommended)
uv pip install -e .
```
3. **Configure environment variables**
```bash
# Copy the example env file
cp .env.example .env
# Edit .env and add your Google API key
GOOGLE_API_KEY=your_api_key_here
```
4. **Start Qdrant (optional)**
By default the server uses embedded on-disk storage at `QDRANT_PATH`. To run a standalone Qdrant instance instead, start it with Docker:
```bash
docker-compose up -d
```
## Usage
### Running the FastAPI Server
Start the REST API server:
```bash
python -m rag_server.server
```
The server will start at `http://localhost:8000`. Visit `http://localhost:8000/docs` for interactive API documentation.
#### API Endpoints
**Core Endpoints:**
- **POST /documents** - Upload a document
- **POST /query** - Query the RAG system (standard mode)
- **POST /query-enhanced** - Query with automatic reference following
- **POST /smart-query** - Smart query with automatic routing
- **GET /documents** - List all documents
- **DELETE /documents/{doc_id}** - Delete a document
- **GET /stats** - Get system statistics
- **GET /health** - Health check
- **GET /tags** - List all available tags
- **GET /documents/{doc_id}/sections** - Get document structure
**OpenAI-Compatible Endpoints:**
- **POST /v1/chat/completions** - OpenAI-compatible chat completions
- **GET /v1/models** - List available models
#### Example Usage with curl
```bash
# Upload a document
curl -X POST "http://localhost:8000/documents" \
-F "file=@example.txt"
# Upload with tags
curl -X POST "http://localhost:8000/documents" \
-F "file=@dagster-docs.md" \
-F "tags=dagster,python,orchestration"
# Query the RAG system
curl -X POST "http://localhost:8000/query" \
-H "Content-Type: application/json" \
-d '{"question": "What is the main topic of the documents?", "top_k": 5}'
# Smart query with automatic routing
curl -X POST "http://localhost:8000/smart-query" \
-H "Content-Type: application/json" \
-d '{"question": "How do I create a Dagster asset?"}'
# OpenAI-compatible chat completion
curl -X POST "http://localhost:8000/v1/chat/completions" \
-H "Content-Type: application/json" \
-d '{
"model": "rag-smart",
"messages": [{"role": "user", "content": "What is an asset in Dagster?"}],
"stream": false
}'
# List documents
curl "http://localhost:8000/documents"
# Get statistics
curl "http://localhost:8000/stats"
```
### Running the MCP Server
The MCP server allows integration with Claude and other MCP-compatible clients.
```bash
python -m mcp_server.server
```
#### MCP Tools Available
1. **query_rag** - Query the RAG system with a question
2. **query_rag_enhanced** - Query with automatic reference following
3. **smart_query** - Smart query with automatic routing and classification
4. **add_document** - Add a document to the RAG system
5. **list_documents** - List all stored documents
6. **delete_document** - Delete a document by ID
7. **get_rag_stats** - Get system statistics
8. **get_tags** - List all available tags
9. **get_document_structure** - Get document table of contents
#### Using with Claude Desktop
Add to your Claude Desktop configuration (`claude_desktop_config.json`):
```json
{
"mcpServers": {
"rag": {
"command": "uv",
"args": [
"--directory",
"/path/to/mcp-rag-docs",
"run",
"python",
"-m",
"mcp_server.server"
]
}
}
}
```
See [QUICK_START.md](QUICK_START.md) for a quick setup guide.
## Configuration
All configuration is managed through environment variables (defined in `.env`):
| Variable | Description | Default |
|----------|-------------|---------|
| `GOOGLE_API_KEY` | Google AI Studio API key | (required) |
| `CHUNK_SIZE` | Size of text chunks in characters | 1000 |
| `CHUNK_OVERLAP` | Overlap between chunks | 200 |
| `TOP_K_RESULTS` | Number of chunks to retrieve | 5 |
| `QDRANT_PATH` | Path to Qdrant storage | ./qdrant_storage |
| `QDRANT_COLLECTION_NAME` | Qdrant collection name | documents |
| `FASTAPI_HOST` | FastAPI server host | 0.0.0.0 |
| `FASTAPI_PORT` | FastAPI server port | 8000 |
| `EMBEDDING_MODEL` | Google embedding model | text-embedding-004 |
| `LLM_MODEL` | Google LLM model | gemini-1.5-flash |
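A minimal `.env` for local development might look like this (every value below except the API key matches the defaults above):
```bash
GOOGLE_API_KEY=your_api_key_here   # required; placeholder shown
CHUNK_SIZE=1000
CHUNK_OVERLAP=200
TOP_K_RESULTS=5
QDRANT_PATH=./qdrant_storage
QDRANT_COLLECTION_NAME=documents
FASTAPI_HOST=0.0.0.0
FASTAPI_PORT=8000
EMBEDDING_MODEL=text-embedding-004
LLM_MODEL=gemini-1.5-flash
```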
## Architecture
### Document Processing Pipeline
1. **Upload** - User uploads a .txt or .md file
2. **Processing** - Document is read and metadata extracted (including frontmatter)
3. **Chunking** - Text is split using hierarchical chunking for markdown or standard chunking for text
4. **Embedding** - Each chunk is converted to a vector using Google AI embeddings
5. **Storage** - Vectors and metadata are stored in Qdrant
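For orientation, here is a minimal, self-contained sketch of the same pipeline built directly on `google-generativeai` and `qdrant-client`. The project wraps these libraries in `utils/` with hierarchy-aware chunking for markdown; this sketch uses naive fixed-size chunking and is an illustration, not the project's internal code:
```python
import uuid
from pathlib import Path

import google.generativeai as genai
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams

genai.configure(api_key="your_api_key_here")  # placeholder key
client = QdrantClient(path="./qdrant_storage")

# text-embedding-004 produces 768-dimensional vectors
if not client.collection_exists("documents"):
    client.create_collection(
        collection_name="documents",
        vectors_config=VectorParams(size=768, distance=Distance.COSINE),
    )

def chunk(text: str, size: int = 1000, overlap: int = 200) -> list[str]:
    """Naive fixed-size chunking; the real system is hierarchy-aware for .md."""
    return [text[i : i + size] for i in range(0, len(text), size - overlap)]

def ingest(path: str) -> None:
    text = Path(path).read_text(encoding="utf-8")      # 1-2. read the document
    points = []
    for piece in chunk(text):                          # 3. split into chunks
        emb = genai.embed_content(                     # 4. embed with Google AI
            model="models/text-embedding-004", content=piece
        )["embedding"]
        points.append(PointStruct(id=str(uuid.uuid4()), vector=emb,
                                  payload={"text": piece, "source": path}))
    client.upsert(collection_name="documents", points=points)  # 5. store in Qdrant

ingest("example.txt")
```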
### Query Pipeline
#### Standard Query
1. **Query** - User submits a question
2. **Embedding** - Question is converted to a vector
3. **Retrieval** - Similar chunks are retrieved from Qdrant
4. **Generation** - Context is provided to Google AI Studio model
5. **Response** - Answer is generated and returned with sources
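Continuing the ingestion sketch above (same `genai` and `client` setup), the query side can be approximated like this; again an illustration, not the server's actual implementation:
```python
def answer(question: str, top_k: int = 5) -> str:
    q_emb = genai.embed_content(                       # 2. embed the question
        model="models/text-embedding-004", content=question
    )["embedding"]
    hits = client.search(collection_name="documents",  # 3. retrieve similar chunks
                         query_vector=q_emb, limit=top_k)
    context = "\n\n".join(h.payload["text"] for h in hits)
    llm = genai.GenerativeModel("gemini-1.5-flash")    # 4. generate grounded answer
    prompt = (f"Answer the question using only the context below.\n\n"
              f"Context:\n{context}\n\nQuestion: {question}")
    return llm.generate_content(prompt).text           # 5. return the response

print(answer("What is the main topic of the documents?"))
```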
#### Smart Query
1. **Classification** - Query is classified (documentation, code, conceptual, etc.)
2. **Routing** - Automatically selects best retrieval strategy
3. **Multi-Source** - May combine documentation search, code search, and direct answers
4. **Synthesis** - Generates comprehensive answer from multiple sources
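As a toy illustration of the classify-then-route idea (the real `query_classifier` and `retrieval_router` are considerably more sophisticated than this keyword check):
```python
def classify(question: str) -> str:
    """Toy keyword classifier standing in for the real query_classifier."""
    q = question.lower()
    if any(kw in q for kw in ("function", "class", "source", "implement")):
        return "code"
    if q.startswith(("what is", "why", "explain")):
        return "conceptual"
    return "documentation"

# The router then dispatches each class to its retrieval strategy:
ROUTES = {
    "code": "code index search",
    "conceptual": "direct answer with light retrieval",
    "documentation": "standard vector search",
}

print(ROUTES[classify("How do I create a Dagster asset?")])  # documentation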
### Code Indexing
The system can index source code repositories:
```bash
# Build code index
python build_code_index.py /path/to/repo
# Query code through the API or MCP server
```
Code is indexed with:
- Class and function definitions
- Docstrings and comments
- File structure and imports
- Semantic embeddings for natural language queries
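Once an index is built, a code-oriented question sent to the smart-query endpoint can be routed to the code index, for example:
```bash
curl -X POST "http://localhost:8000/smart-query" \
  -H "Content-Type: application/json" \
  -d '{"question": "Where is the rate limiter implemented?"}'
```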
## Development
### Running Tests
```bash
# Install test dependencies
pip install pytest pytest-asyncio httpx
# Run tests
pytest
# Run specific test files
pytest test_openai_api.py
pytest test_mcp_integration.py
```
### Code Style
Code is written with type hints and docstrings throughout.
## Troubleshooting
### Common Issues
**Issue**: `GOOGLE_API_KEY not found`
- **Solution**: Ensure you've created a `.env` file and added your Google API key
**Issue**: `Unsupported file type`
- **Solution**: Only .txt and .md files are supported. Convert other formats first.
**Issue**: `Collection already exists` error
- **Solution**: Delete the `qdrant_storage/` directory to reset the database
**Issue**: MCP server not connecting
- **Solution**: Check that the path in your MCP config is correct and the `.env` file is in the project root
## Advanced Usage
### Tag-Based Organization
Organize your documents with tags for easy categorization and filtering:
```bash
# Upload document with tags
curl -X POST "http://localhost:8000/documents" \
-F "file=@dagster-docs.md" \
-F "tags=dagster,python,orchestration"
# List all available tags
curl "http://localhost:8000/tags"
# Query only dagster-related documents
curl -X POST "http://localhost:8000/query" \
-H "Content-Type: application/json" \
-d '{"question": "How do I create a pipeline?", "tags": ["dagster"]}'
# List documents filtered by tags
curl "http://localhost:8000/documents?tags=dagster,python"
```
### Hierarchical Document Structure
For markdown documents, the system automatically preserves heading hierarchy:
```bash
# Get document structure (table of contents)
curl "http://localhost:8000/documents/{doc_id}/sections"
# Query specific section
curl -X POST "http://localhost:8000/query" \
-H "Content-Type: application/json" \
-d '{"question": "What are the prerequisites?", "section_path": "Installation > Prerequisites"}'
```
### Section-Aware Queries
The system includes section context when generating answers:
```python
# Example: Markdown document structure
# Installation
# Prerequisites
# Python Version
# Setup Steps
# When you query about "Python version requirements"
# The system will:
# 1. Retrieve relevant chunks from "Installation > Prerequisites > Python Version"
# 2. Include section path in context sent to LLM
# 3. Cite sources with full section paths
```
### Smart Query Modes
The system supports three query modes:
1. **Standard** (`/query`) - Basic vector search and retrieval
2. **Enhanced** (`/query-enhanced`) - Follows documentation references automatically
3. **Smart** (`/smart-query`) - Automatic classification and routing
Use the OpenAI-compatible API to access different modes:
```bash
# Standard mode
curl -X POST "http://localhost:8000/v1/chat/completions" \
-H "Content-Type: application/json" \
-d '{"model": "rag-standard", "messages": [{"role": "user", "content": "What is Dagster?"}]}'
# Enhanced mode with reference following
curl -X POST "http://localhost:8000/v1/chat/completions" \
-H "Content-Type: application/json" \
-d '{"model": "rag-enhanced", "messages": [{"role": "user", "content": "What is Dagster?"}]}'
# Smart mode with automatic routing
curl -X POST "http://localhost:8000/v1/chat/completions" \
-H "Content-Type: application/json" \
-d '{"model": "rag-smart", "messages": [{"role": "user", "content": "What is Dagster?"}]}'
```
### MCP Tools
The MCP server provides enhanced tools for Claude and other MCP clients:
**query_rag** - Query with optional tags and section filtering
```json
{
"question": "How do I deploy?",
"tags": ["dagster"],
"section_path": "Deployment"
}
```
**smart_query** - Smart query with automatic routing
```json
{
"question": "What is an asset and how do I use it?"
}
```
**add_document** - Upload with tags
```json
{
"file_path": "/path/to/doc.md",
"tags": ["dagster", "docs"]
}
```
**get_tags** - List all tags
**get_document_structure** - Get table of contents
```json
{
"doc_id": "abc123"
}
```
## API Reference
### Enhanced Endpoints
**POST /documents**
- Body: `file` (multipart), `tags` (comma-separated string)
- Response: Document info with tags and chunk count
**POST /query**
- Body: `{"question": "...", "tags": [...], "section_path": "..."}`
- Response: Answer with section-aware sources
**POST /smart-query**
- Body: `{"question": "..."}`
- Response: Smart answer with automatic routing and classification
**GET /tags**
- Response: `{"tags": [...], "total": N}`
**GET /documents/{doc_id}/sections**
- Response: Document structure with section hierarchy
**GET /documents?tags=tag1,tag2**
- Query filtered by tags
- Response: List of matching documents
**POST /v1/chat/completions**
- OpenAI-compatible chat completion endpoint
- Supports models: `rag-standard`, `rag-enhanced`, `rag-smart`
- Supports streaming with `stream: true`
**GET /v1/models**
- List available RAG models
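Because the `/v1` endpoints follow the OpenAI schema, the official OpenAI Python client can talk to the server directly. A sketch (the client requires some `api_key` string; whether the server validates it depends on your configuration):
```python
from openai import OpenAI

# Point the client at the local RAG server instead of api.openai.com
client = OpenAI(base_url="http://localhost:8000/v1", api_key="placeholder")

response = client.chat.completions.create(
    model="rag-smart",
    messages=[{"role": "user", "content": "What is an asset in Dagster?"}],
)
print(response.choices[0].message.content)

# Streaming variant
stream = client.chat.completions.create(
    model="rag-smart",
    messages=[{"role": "user", "content": "Explain Dagster assets"}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```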
## Additional Documentation
- [QUICK_START.md](QUICK_START.md) - Quick setup guide for MCP integration
- [MCP_SETUP.md](MCP_SETUP.md) - Detailed MCP server setup
- [OPENAI_API_GUIDE.md](OPENAI_API_GUIDE.md) - OpenAI-compatible API documentation
- [QUERY_ROUTING_GUIDE.md](QUERY_ROUTING_GUIDE.md) - Smart query routing guide
- [MULTI_MODE_RETRIEVAL_GUIDE.md](MULTI_MODE_RETRIEVAL_GUIDE.md) - Multi-mode retrieval documentation
- [CODE_INDEX_GUIDE.md](CODE_INDEX_GUIDE.md) - Code indexing and search guide
- [RATE_LIMITING.md](RATE_LIMITING.md) - Rate limiting configuration
- [TEST_COVERAGE.md](TEST_COVERAGE.md) - Test coverage and testing guide
## License
MIT License
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
## Acknowledgments
- Google AI Studio for embeddings and LLM capabilities
- Qdrant for vector database
- FastAPI for the REST API framework
- Anthropic for the Model Context Protocol