# RAG Server with MCP Integration

A Retrieval-Augmented Generation (RAG) system with Google AI Studio integration, featuring both a REST API and a Model Context Protocol (MCP) server.
## Features

### Core Capabilities

- **Document Storage**: Upload and store plain-text (`.txt`) and Markdown (`.md`) documents
- **Hierarchical Chunking**: Structure-aware chunking for Markdown that preserves document hierarchy
- **Vector Search**: Efficient similarity search using the Qdrant vector database
- **Google AI Integration**: Uses Google AI Studio for embeddings (`text-embedding-004`) and generation (`gemini-1.5-flash`)
- **REST API**: FastAPI-based REST API with automatic OpenAPI documentation
- **MCP Server**: Model Context Protocol server for seamless integration with Claude and other MCP clients
- **OpenAI-Compatible API**: Supports OpenAI-compatible chat completions for web UI integration
- **Code Indexing**: Index and search source code repositories with semantic understanding
- **Smart Query Routing**: Automatic query classification and routing to the appropriate retrieval method
### Advanced Features

- **Tag-Based Organization**: Organize documents with multiple tags for easy categorization
- **Section-Aware Retrieval**: Query specific sections of documentation (e.g., "Installation > Prerequisites")
- **Markdown Structure Preservation**: Automatic extraction of heading hierarchy with breadcrumb paths
- **Context-Enhanced Answers**: The LLM receives section context for more accurate responses
- **Flexible Filtering**: Filter documents by tags and/or section paths during queries
- **Document Structure API**: Explore the table of contents and section organization
- **GitHub Integration**: Parse and extract content from GitHub URLs
- **Reference Following**: Automatically follow documentation references for comprehensive answers
- **Multi-Mode Retrieval**: Choose between standard, enhanced, and smart query modes
- **Rate Limiting**: Built-in rate limiting for API endpoints
## Project Structure
## Installation

### Prerequisites

- Python 3.13 or higher
- A Google AI Studio API key (Get one here)
### Setup

1. Clone or navigate to the project directory
2. Install dependencies
3. Configure environment variables
4. Start Qdrant (optional, via Docker)
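The steps above might look like the following; the file names (`requirements.txt`, `.env.example`) are assumptions, so check the repository for the actual ones:

```bash
# 2. Install dependencies (assumes a requirements.txt)
pip install -r requirements.txt

# 3. Configure environment variables (assumes an example file exists)
cp .env.example .env
# ...then edit .env and set your Google AI Studio API key

# 4. Optionally run Qdrant as a server via Docker
docker run -p 6333:6333 \
  -v "$(pwd)/qdrant_storage:/qdrant/storage" \
  qdrant/qdrant
```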
## Usage

### Running the FastAPI Server

Start the REST API server:
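For example (the `main:app` module path is an assumption; adjust it to this project's entry point):

```bash
uvicorn main:app --host 0.0.0.0 --port 8000
```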
The server will start at `http://localhost:8000`. Visit `http://localhost:8000/docs` for interactive API documentation.
### API Endpoints

**Core Endpoints:**

- `POST /documents` - Upload a document
- `POST /query` - Query the RAG system (standard mode)
- `POST /query-enhanced` - Query with automatic reference following
- `POST /smart-query` - Smart query with automatic routing
- `GET /documents` - List all documents
- `DELETE /documents/{doc_id}` - Delete a document
- `GET /stats` - Get system statistics
- `GET /health` - Health check
- `GET /tags` - List all available tags
- `GET /documents/{doc_id}/sections` - Get document structure

**OpenAI-Compatible Endpoints:**

- `POST /v1/chat/completions` - OpenAI-compatible chat completions
- `GET /v1/models` - List available models
### Example Usage with curl
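A couple of illustrative requests (request field names follow the API Reference section; adjust paths and payloads as needed):

```bash
# Upload a Markdown document with tags
curl -X POST http://localhost:8000/documents \
  -F "file=@docs/guide.md" \
  -F "tags=docs,guide"

# Ask a question (standard mode)
curl -X POST http://localhost:8000/query \
  -H "Content-Type: application/json" \
  -d '{"question": "How do I install the project?"}'
```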
### Running the MCP Server

The MCP server allows integration with Claude and other MCP-compatible clients.

#### MCP Tools Available

- `query_rag` - Query the RAG system with a question
- `query_rag_enhanced` - Query with automatic reference following
- `smart_query` - Smart query with automatic routing and classification
- `add_document` - Add a document to the RAG system
- `list_documents` - List all stored documents
- `delete_document` - Delete a document by ID
- `get_rag_stats` - Get system statistics
- `get_tags` - List all available tags
- `get_document_structure` - Get a document's table of contents
### Using with Claude Desktop

Add to your Claude Desktop configuration (`claude_desktop_config.json`):
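A typical entry looks like this; the server name, command, script path, and key are placeholders you must adapt to your setup:

```json
{
  "mcpServers": {
    "rag-server": {
      "command": "python",
      "args": ["/absolute/path/to/project/mcp_server.py"],
      "env": {
        "GOOGLE_API_KEY": "your-api-key"
      }
    }
  }
}
```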
See `QUICK_START.md` for a quick setup guide.
## Configuration

All configuration is managed through environment variables (defined in `.env`):
| Variable | Description | Default |
|----------|-------------|---------|
| `GOOGLE_API_KEY` | Google AI Studio API key | (required) |
| | Size of text chunks, in characters | `1000` |
| | Overlap between chunks, in characters | `200` |
| | Number of chunks to retrieve | `5` |
| | Path to Qdrant storage | `./qdrant_storage` |
| | Qdrant collection name | `documents` |
| | FastAPI server host | `0.0.0.0` |
| | FastAPI server port | `8000` |
| | Google embedding model | `text-embedding-004` |
| | Google LLM model | `gemini-1.5-flash` |
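For illustration, a `.env` might look like the following. Apart from `GOOGLE_API_KEY` (referenced in Troubleshooting), the variable names here are guesses; the descriptions and defaults come from the table above, but check the project's example file for the real names:

```bash
GOOGLE_API_KEY=your-google-ai-studio-key
CHUNK_SIZE=1000
CHUNK_OVERLAP=200
TOP_K=5
QDRANT_PATH=./qdrant_storage
COLLECTION_NAME=documents
HOST=0.0.0.0
PORT=8000
EMBEDDING_MODEL=text-embedding-004
LLM_MODEL=gemini-1.5-flash
```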
## Architecture

### Document Processing Pipeline

1. **Upload** - The user uploads a `.txt` or `.md` file
2. **Processing** - The document is read and metadata is extracted (including frontmatter)
3. **Chunking** - Text is split using hierarchical chunking for Markdown, or standard chunking for plain text
4. **Embedding** - Each chunk is converted to a vector using Google AI embeddings
5. **Storage** - Vectors and metadata are stored in Qdrant
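The standard chunking step (step 3, for plain text) can be sketched as a sliding window using the default chunk size of 1000 and overlap of 200 from the Configuration table. This is an illustration, not the project's actual implementation:

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into fixed-size chunks that overlap to preserve context."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap  # step forward, keeping `overlap` chars of context
    return chunks

# A 3000-character document yields chunks starting at 0, 800, 1600, 2400
chunks = chunk_text("abcdefghij" * 300)
print(len(chunks))  # 4
```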
### Query Pipeline

#### Standard Query

1. **Query** - The user submits a question
2. **Embedding** - The question is converted to a vector
3. **Retrieval** - Similar chunks are retrieved from Qdrant
4. **Generation** - The retrieved context is provided to the Google AI Studio model
5. **Response** - The answer is generated and returned with sources
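At its core, the retrieval step is nearest-neighbour search over chunk embeddings by cosine similarity. A dependency-free sketch (the real system delegates this to Qdrant and uses Google AI embeddings):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec: list[float], chunk_vecs: list[list[float]], k: int = 5) -> list[int]:
    """Return indices of the k chunks most similar to the query."""
    ranked = sorted(range(len(chunk_vecs)),
                    key=lambda i: cosine(query_vec, chunk_vecs[i]),
                    reverse=True)
    return ranked[:k]

vecs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
print(top_k([1.0, 0.1], vecs, k=2))  # [0, 2]
```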
#### Smart Query

1. **Classification** - The query is classified (documentation, code, conceptual, etc.)
2. **Routing** - The best retrieval strategy is selected automatically
3. **Multi-Source** - Documentation search, code search, and direct answers may be combined
4. **Synthesis** - A comprehensive answer is generated from the combined sources
## Code Indexing

The system can also index source code repositories.

Code is indexed with:

- Class and function definitions
- Docstrings and comments
- File structure and imports
- Semantic embeddings for natural-language queries
## Development

### Running Tests
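Assuming the test suite uses pytest (see `TEST_COVERAGE.md` for details):

```bash
pytest
# or with a coverage report:
pytest --cov
```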
### Code Style

The project follows Python best practices, with type hints and docstrings throughout.
## Troubleshooting

### Common Issues

**Issue:** `GOOGLE_API_KEY` not found

**Solution:** Ensure you've created a `.env` file and added your Google API key.

**Issue:** Unsupported file type

**Solution:** Only `.txt` and `.md` files are supported; convert other formats first.

**Issue:** "Collection already exists" error

**Solution:** Delete the `qdrant_storage/` directory to reset the database.

**Issue:** MCP server not connecting

**Solution:** Check that the path in your MCP config is correct and that the `.env` file is in the project root.
## Advanced Usage

### Tag-Based Organization

Organize your documents with tags for easy categorization and filtering.

### Hierarchical Document Structure

For Markdown documents, the system automatically preserves the heading hierarchy.
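A simplified illustration of how heading hierarchy can be tracked to build breadcrumb paths like "Installation > Prerequisites" (the project's actual chunker is more sophisticated):

```python
import re

def heading_breadcrumbs(markdown: str) -> list[str]:
    """Walk ATX headings and emit an 'A > B > C' breadcrumb for each one."""
    stack: list[tuple[int, str]] = []  # open (level, title) pairs above the current heading
    crumbs = []
    for line in markdown.splitlines():
        match = re.match(r"^(#{1,6})\s+(.*)", line)
        if not match:
            continue
        level, title = len(match.group(1)), match.group(2).strip()
        # Close any headings at the same or deeper level before descending
        while stack and stack[-1][0] >= level:
            stack.pop()
        stack.append((level, title))
        crumbs.append(" > ".join(t for _, t in stack))
    return crumbs

doc = "# Installation\n## Prerequisites\n## Setup\n# Usage\n"
print(heading_breadcrumbs(doc))
# ['Installation', 'Installation > Prerequisites', 'Installation > Setup', 'Usage']
```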
### Section-Aware Queries

The system includes section context when generating answers.
### Smart Query Modes

The system supports three query modes:

1. **Standard** (`/query`) - Basic vector search and retrieval
2. **Enhanced** (`/query-enhanced`) - Follows documentation references automatically
3. **Smart** (`/smart-query`) - Automatic classification and routing

Use the OpenAI-compatible API to access the different modes:
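For example, selecting the enhanced mode through the model name:

```bash
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "rag-enhanced",
    "messages": [{"role": "user", "content": "How do I configure Qdrant?"}]
  }'
```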
### MCP Tools

The MCP server provides enhanced tools for Claude and other MCP clients:

- `query_rag` - Query with optional tag and section filtering
- `smart_query` - Smart query with automatic routing
- `add_document` - Upload with tags
- `get_tags` - List all tags
- `get_document_structure` - Get a document's table of contents
## API Reference

### Enhanced Endpoints

#### `POST /documents`

- Body: `file` (multipart), `tags` (comma-separated string)
- Response: Document info with tags and chunk count

#### `POST /query`

- Body: `{"question": "...", "tags": [...], "section_path": "..."}`
- Response: Answer with section-aware sources

#### `POST /smart-query`

- Body: `{"question": "..."}`
- Response: Smart answer with automatic routing and classification

#### `GET /tags`

- Response: `{"tags": [...], "total": N}`

#### `GET /documents/{doc_id}/sections`

- Response: Document structure with section hierarchy

#### `GET /documents?tags=tag1,tag2`

- Query filtered by tags
- Response: List of matching documents

#### `POST /v1/chat/completions`

- OpenAI-compatible chat completion endpoint
- Supported models: `rag-standard`, `rag-enhanced`, `rag-smart`
- Supports streaming with `stream: true`

#### `GET /v1/models`

- Response: List of available RAG models
## Additional Documentation

- `QUICK_START.md` - Quick setup guide for MCP integration
- `MCP_SETUP.md` - Detailed MCP server setup
- `OPENAI_API_GUIDE.md` - OpenAI-compatible API documentation
- `QUERY_ROUTING_GUIDE.md` - Smart query routing guide
- `MULTI_MODE_RETRIEVAL_GUIDE.md` - Multi-mode retrieval documentation
- `CODE_INDEX_GUIDE.md` - Code indexing and search guide
- `RATE_LIMITING.md` - Rate limiting configuration
- `TEST_COVERAGE.md` - Test coverage and testing guide
## License

MIT License

## Contributing

Contributions are welcome! Please feel free to submit a pull request.

## Acknowledgments

- Google AI Studio for embeddings and LLM capabilities
- Qdrant for the vector database
- FastAPI for the REST API framework
- Anthropic for the Model Context Protocol