# RAG Server with MCP Integration

A Retrieval-Augmented Generation (RAG) system with Google AI Studio integration, featuring both a REST API and a Model Context Protocol (MCP) server.
## Features

### Core Capabilities

- **Document Storage**: Upload and store plain-text (`.txt`) and Markdown (`.md`) documents
- **Hierarchical Chunking**: Structure-aware chunking for Markdown that preserves document hierarchy
- **Vector Search**: Efficient similarity search using the Qdrant vector database
- **Google AI Integration**: Uses Google AI Studio for embeddings (`text-embedding-004`) and generation (`gemini-1.5-flash`)
- **REST API**: FastAPI-based REST API with automatic OpenAPI documentation
- **MCP Server**: Model Context Protocol server for seamless integration with Claude and other MCP clients
- **OpenAI-Compatible API**: Supports OpenAI-compatible chat completions for web UI integration
- **Code Indexing**: Index and search source code repositories with semantic understanding
- **Smart Query Routing**: Automatic query classification and routing to the appropriate retrieval method
### Advanced Features

- **Tag-Based Organization**: Organize documents with multiple tags for easy categorization
- **Section-Aware Retrieval**: Query specific sections of documentation (e.g., "Installation > Prerequisites")
- **Markdown Structure Preservation**: Automatic extraction of heading hierarchy with breadcrumb paths
- **Context-Enhanced Answers**: The LLM receives section context for more accurate responses
- **Flexible Filtering**: Filter documents by tags and/or section paths during queries
- **Document Structure API**: Explore the table of contents and section organization
- **GitHub Integration**: Parse and extract content from GitHub URLs
- **Reference Following**: Automatically follow documentation references for comprehensive answers
- **Multi-Mode Retrieval**: Choose between standard, enhanced, and smart query modes
- **Rate Limiting**: Built-in rate limiting for API endpoints
## Project Structure
## Installation

### Prerequisites

- Python 3.13 or higher
- A Google AI Studio API key (Get one here)
### Setup

1. Clone or navigate to the project directory
2. Install dependencies
3. Configure environment variables
4. Start Qdrant (optional, via Docker)
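The steps above might look like the following; the file names (`requirements.txt`, `.env.example`) are assumptions, so check the repository for the actual ones:

```bash
# 2. Install dependencies (assumes a requirements.txt)
pip install -r requirements.txt

# 3. Configure environment variables (assumes an example file exists)
cp .env.example .env
# ...then edit .env and set your Google AI Studio API key

# 4. Optionally run Qdrant as a server via Docker
docker run -p 6333:6333 \
  -v "$(pwd)/qdrant_storage:/qdrant/storage" \
  qdrant/qdrant
```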
## Usage

### Running the FastAPI Server

Start the REST API server:
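For example (the `main:app` module path is an assumption; adjust it to this project's entry point):

```bash
uvicorn main:app --host 0.0.0.0 --port 8000
```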
The server will start at `http://localhost:8000`. Visit `http://localhost:8000/docs` for interactive API documentation.
### API Endpoints

**Core Endpoints:**

- `POST /documents` - Upload a document
- `POST /query` - Query the RAG system (standard mode)
- `POST /query-enhanced` - Query with automatic reference following
- `POST /smart-query` - Smart query with automatic routing
- `GET /documents` - List all documents
- `DELETE /documents/{doc_id}` - Delete a document
- `GET /stats` - Get system statistics
- `GET /health` - Health check
- `GET /tags` - List all available tags
- `GET /documents/{doc_id}/sections` - Get document structure

**OpenAI-Compatible Endpoints:**

- `POST /v1/chat/completions` - OpenAI-compatible chat completions
- `GET /v1/models` - List available models
### Example Usage with curl
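A couple of illustrative requests (request field names follow the API Reference section; adjust paths and payloads as needed):

```bash
# Upload a Markdown document with tags
curl -X POST http://localhost:8000/documents \
  -F "file=@docs/guide.md" \
  -F "tags=docs,guide"

# Ask a question (standard mode)
curl -X POST http://localhost:8000/query \
  -H "Content-Type: application/json" \
  -d '{"question": "How do I install the project?"}'
```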
### Running the MCP Server

The MCP server allows integration with Claude and other MCP-compatible clients.

#### MCP Tools Available

- `query_rag` - Query the RAG system with a question
- `query_rag_enhanced` - Query with automatic reference following
- `smart_query` - Smart query with automatic routing and classification
- `add_document` - Add a document to the RAG system
- `list_documents` - List all stored documents
- `delete_document` - Delete a document by ID
- `get_rag_stats` - Get system statistics
- `get_tags` - List all available tags
- `get_document_structure` - Get a document's table of contents
### Using with Claude Desktop

Add to your Claude Desktop configuration (`claude_desktop_config.json`):
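A typical entry looks like this; the server name, command, script path, and key are placeholders you must adapt to your setup:

```json
{
  "mcpServers": {
    "rag-server": {
      "command": "python",
      "args": ["/absolute/path/to/project/mcp_server.py"],
      "env": {
        "GOOGLE_API_KEY": "your-api-key"
      }
    }
  }
}
```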
See `QUICK_START.md` for a quick setup guide.
## Configuration

All configuration is managed through environment variables (defined in `.env`):
| Variable | Description | Default |
|----------|-------------|---------|
| `GOOGLE_API_KEY` | Google AI Studio API key | (required) |
| | Size of text chunks, in characters | `1000` |
| | Overlap between chunks, in characters | `200` |
| | Number of chunks to retrieve | `5` |
| | Path to Qdrant storage | `./qdrant_storage` |
| | Qdrant collection name | `documents` |
| | FastAPI server host | `0.0.0.0` |
| | FastAPI server port | `8000` |
| | Google embedding model | `text-embedding-004` |
| | Google LLM model | `gemini-1.5-flash` |
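For illustration, a `.env` might look like the following. Apart from `GOOGLE_API_KEY` (referenced in Troubleshooting), the variable names here are guesses; the descriptions and defaults come from the table above, but check the project's example file for the real names:

```bash
GOOGLE_API_KEY=your-google-ai-studio-key
CHUNK_SIZE=1000
CHUNK_OVERLAP=200
TOP_K=5
QDRANT_PATH=./qdrant_storage
COLLECTION_NAME=documents
HOST=0.0.0.0
PORT=8000
EMBEDDING_MODEL=text-embedding-004
LLM_MODEL=gemini-1.5-flash
```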
## Architecture

### Document Processing Pipeline

1. **Upload** - The user uploads a `.txt` or `.md` file
2. **Processing** - The document is read and metadata is extracted (including frontmatter)
3. **Chunking** - Text is split using hierarchical chunking for Markdown, or standard chunking for plain text
4. **Embedding** - Each chunk is converted to a vector using Google AI embeddings
5. **Storage** - Vectors and metadata are stored in Qdrant
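The standard chunking step (step 3, for plain text) can be sketched as a sliding window using the default chunk size of 1000 and overlap of 200 from the Configuration table. This is an illustration, not the project's actual implementation:

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into fixed-size chunks that overlap to preserve context."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap  # step forward, keeping `overlap` chars of context
    return chunks

# A 3000-character document yields chunks starting at 0, 800, 1600, 2400
chunks = chunk_text("abcdefghij" * 300)
print(len(chunks))  # 4
```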
### Query Pipeline

#### Standard Query

1. **Query** - The user submits a question
2. **Embedding** - The question is converted to a vector
3. **Retrieval** - Similar chunks are retrieved from Qdrant
4. **Generation** - The retrieved context is provided to the Google AI Studio model
5. **Response** - The answer is generated and returned with sources
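At its core, the retrieval step is nearest-neighbour search over chunk embeddings by cosine similarity. A dependency-free sketch (the real system delegates this to Qdrant and uses Google AI embeddings):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec: list[float], chunk_vecs: list[list[float]], k: int = 5) -> list[int]:
    """Return indices of the k chunks most similar to the query."""
    ranked = sorted(range(len(chunk_vecs)),
                    key=lambda i: cosine(query_vec, chunk_vecs[i]),
                    reverse=True)
    return ranked[:k]

vecs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
print(top_k([1.0, 0.1], vecs, k=2))  # [0, 2]
```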
#### Smart Query

1. **Classification** - The query is classified (documentation, code, conceptual, etc.)
2. **Routing** - The best retrieval strategy is selected automatically
3. **Multi-Source** - Documentation search, code search, and direct answers may be combined
4. **Synthesis** - A comprehensive answer is generated from the combined sources
## Code Indexing

The system can also index source code repositories.

Code is indexed with:

- Class and function definitions
- Docstrings and comments
- File structure and imports
- Semantic embeddings for natural-language queries
## Development

### Running Tests
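Assuming the test suite uses pytest (see `TEST_COVERAGE.md` for details):

```bash
pytest
# or with a coverage report:
pytest --cov
```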
### Code Style

The project follows Python best practices, with type hints and docstrings throughout.
## Troubleshooting

### Common Issues

**Issue:** `GOOGLE_API_KEY` not found

**Solution:** Ensure you've created a `.env` file and added your Google API key.

**Issue:** Unsupported file type

**Solution:** Only `.txt` and `.md` files are supported; convert other formats first.

**Issue:** "Collection already exists" error

**Solution:** Delete the `qdrant_storage/` directory to reset the database.

**Issue:** MCP server not connecting

**Solution:** Check that the path in your MCP config is correct and that the `.env` file is in the project root.
## Advanced Usage

### Tag-Based Organization

Organize your documents with tags for easy categorization and filtering.

### Hierarchical Document Structure

For Markdown documents, the system automatically preserves the heading hierarchy.
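A simplified illustration of how heading hierarchy can be tracked to build breadcrumb paths like "Installation > Prerequisites" (the project's actual chunker is more sophisticated):

```python
import re

def heading_breadcrumbs(markdown: str) -> list[str]:
    """Walk ATX headings and emit an 'A > B > C' breadcrumb for each one."""
    stack: list[tuple[int, str]] = []  # open (level, title) pairs above the current heading
    crumbs = []
    for line in markdown.splitlines():
        match = re.match(r"^(#{1,6})\s+(.*)", line)
        if not match:
            continue
        level, title = len(match.group(1)), match.group(2).strip()
        # Close any headings at the same or deeper level before descending
        while stack and stack[-1][0] >= level:
            stack.pop()
        stack.append((level, title))
        crumbs.append(" > ".join(t for _, t in stack))
    return crumbs

doc = "# Installation\n## Prerequisites\n## Setup\n# Usage\n"
print(heading_breadcrumbs(doc))
# ['Installation', 'Installation > Prerequisites', 'Installation > Setup', 'Usage']
```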
### Section-Aware Queries

The system includes section context when generating answers.
### Smart Query Modes

The system supports three query modes:

1. **Standard** (`/query`) - Basic vector search and retrieval
2. **Enhanced** (`/query-enhanced`) - Follows documentation references automatically
3. **Smart** (`/smart-query`) - Automatic classification and routing

Use the OpenAI-compatible API to access the different modes:
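For example, selecting the enhanced mode through the model name:

```bash
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "rag-enhanced",
    "messages": [{"role": "user", "content": "How do I configure Qdrant?"}]
  }'
```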
### MCP Tools

The MCP server provides enhanced tools for Claude and other MCP clients:

- `query_rag` - Query with optional tag and section filtering
- `smart_query` - Smart query with automatic routing
- `add_document` - Upload with tags
- `get_tags` - List all tags
- `get_document_structure` - Get a document's table of contents
## API Reference

### Enhanced Endpoints

#### `POST /documents`

- Body: `file` (multipart), `tags` (comma-separated string)
- Response: Document info with tags and chunk count

#### `POST /query`

- Body: `{"question": "...", "tags": [...], "section_path": "..."}`
- Response: Answer with section-aware sources

#### `POST /smart-query`

- Body: `{"question": "..."}`
- Response: Smart answer with automatic routing and classification

#### `GET /tags`

- Response: `{"tags": [...], "total": N}`

#### `GET /documents/{doc_id}/sections`

- Response: Document structure with section hierarchy

#### `GET /documents?tags=tag1,tag2`

- Query filtered by tags
- Response: List of matching documents

#### `POST /v1/chat/completions`

- OpenAI-compatible chat completion endpoint
- Supported models: `rag-standard`, `rag-enhanced`, `rag-smart`
- Supports streaming with `stream: true`

#### `GET /v1/models`

- Response: List of available RAG models
## Additional Documentation

- `QUICK_START.md` - Quick setup guide for MCP integration
- `MCP_SETUP.md` - Detailed MCP server setup
- `OPENAI_API_GUIDE.md` - OpenAI-compatible API documentation
- `QUERY_ROUTING_GUIDE.md` - Smart query routing guide
- `MULTI_MODE_RETRIEVAL_GUIDE.md` - Multi-mode retrieval documentation
- `CODE_INDEX_GUIDE.md` - Code indexing and search guide
- `RATE_LIMITING.md` - Rate limiting configuration
- `TEST_COVERAGE.md` - Test coverage and testing guide
## License

MIT License

## Contributing

Contributions are welcome! Please feel free to submit a pull request.

## Acknowledgments

- Google AI Studio for embeddings and LLM capabilities
- Qdrant for the vector database
- FastAPI for the REST API framework
- Anthropic for the Model Context Protocol