# RAG Server with MCP Integration

A Retrieval-Augmented Generation (RAG) system with Google AI Studio integration, featuring both a REST API and a Model Context Protocol (MCP) server.
## Features

### Core Capabilities

- **Document Storage**: Upload and store plain-text (`.txt`) and Markdown (`.md`) documents
- **Hierarchical Chunking**: Structure-aware chunking for Markdown that preserves document hierarchy
- **Vector Search**: Efficient similarity search using the Qdrant vector database
- **Google AI Integration**: Uses Google AI Studio for embeddings (`text-embedding-004`) and generation (`gemini-1.5-flash`)
- **REST API**: FastAPI-based REST API with automatic OpenAPI documentation
- **MCP Server**: Model Context Protocol server for seamless integration with Claude and other MCP clients

### Advanced Features

- **Tag-Based Organization**: Organize documents with multiple tags for easy categorization
- **Section-Aware Retrieval**: Query specific sections of documentation (e.g., "Installation > Prerequisites")
- **Markdown Structure Preservation**: Automatic extraction of heading hierarchy with breadcrumb paths
- **Context-Enhanced Answers**: The LLM receives section context for more accurate responses
- **Flexible Filtering**: Filter documents by tags and/or section paths during queries
- **Document Structure API**: Explore the table of contents and section organization
## Project Structure

## Installation

### Prerequisites

- Python 3.13 or higher
- A Google AI Studio API key

### Setup

1. Clone or navigate to the project directory
2. Install dependencies
3. Configure environment variables
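The steps above might look like the following in practice. The directory name, `requirements.txt` filename, and `.env.example` template are assumptions about the project layout, not confirmed details:

```shell
# 1. Navigate to the project directory (name is a placeholder)
cd rag-server

# 2. Install dependencies (requirements.txt is an assumed filename)
pip install -r requirements.txt

# 3. Configure environment variables, then edit .env to add your Google API key
cp .env.example .env
```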
## Usage

### Running the FastAPI Server

Start the REST API server:
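A typical launch command, assuming the FastAPI application object is exposed as `app` in a top-level `main.py` (the module path is an assumption, not the project's confirmed layout):

```shell
uvicorn main:app --host 0.0.0.0 --port 8000
```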
The server will start at `http://localhost:8000`. Visit `http://localhost:8000/docs` for interactive API documentation.
### API Endpoints

- `POST /documents` - Upload a document
- `POST /query` - Query the RAG system
- `GET /documents` - List all documents
- `DELETE /documents/{doc_id}` - Delete a document
- `GET /stats` - Get system statistics
- `GET /health` - Health check
### Example Usage with curl
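A few illustrative calls against the endpoints above. The request field names follow the API Reference below; the sample file path and question are placeholders:

```shell
# Upload a document with tags
curl -X POST http://localhost:8000/documents \
  -F "file=@docs/guide.md" \
  -F "tags=guides,setup"

# Query the RAG system
curl -X POST http://localhost:8000/query \
  -H "Content-Type: application/json" \
  -d '{"question": "How do I install the project?"}'

# List all documents
curl http://localhost:8000/documents

# Check system health
curl http://localhost:8000/health
```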
### Running the MCP Server

The MCP server allows integration with Claude and other MCP-compatible clients.

#### MCP Tools Available

- `query_rag` - Query the RAG system with a question
- `add_document` - Add a document to the RAG system
- `list_documents` - List all stored documents
- `delete_document` - Delete a document by ID
- `get_rag_stats` - Get system statistics
### Using with Claude Desktop

Add an entry to your Claude Desktop configuration (`claude_desktop_config.json`):
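A minimal sketch of an `mcpServers` entry, assuming the MCP server is started as a Python script. The server name, command, and file path are placeholders to adapt to your installation:

```json
{
  "mcpServers": {
    "rag-server": {
      "command": "python",
      "args": ["/path/to/rag-server/mcp_server.py"],
      "env": {
        "GOOGLE_API_KEY": "your-api-key-here"
      }
    }
  }
}
```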
## Configuration

All configuration is managed through environment variables (defined in `.env`):

| Variable | Description | Default |
|----------|-------------|---------|
| `GOOGLE_API_KEY` | Google AI Studio API key | (required) |
| | Size of text chunks in characters | `1000` |
| | Overlap between chunks | `200` |
| | Number of chunks to retrieve | `5` |
| | Path to Qdrant storage | `./qdrant_storage` |
| | Qdrant collection name | `documents` |
| | FastAPI server host | `0.0.0.0` |
| | FastAPI server port | `8000` |
| | Google embedding model | `text-embedding-004` |
| | Google LLM model | `gemini-1.5-flash` |
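A sketch of a matching `.env` file. Only `GOOGLE_API_KEY` is named elsewhere in this README; the other variable names below are illustrative placeholders whose values are the documented defaults:

```
# Required
GOOGLE_API_KEY=your-api-key-here

# Placeholder names — use the variable names your installation actually defines
CHUNK_SIZE=1000
CHUNK_OVERLAP=200
TOP_K=5
```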
## Architecture

### Document Processing Pipeline

1. **Upload** - User uploads a `.txt` or `.md` file
2. **Processing** - The document is read and metadata is extracted
3. **Chunking** - Text is split into overlapping chunks
4. **Embedding** - Each chunk is converted to a vector using Google AI embeddings
5. **Storage** - Vectors and metadata are stored in Qdrant
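The chunking step can be sketched as a sliding window over the text, using the documented defaults of 1000-character chunks with a 200-character overlap. The function and parameter names are illustrative, not the project's actual API:

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into overlapping, fixed-size character chunks."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    chunks = []
    step = chunk_size - overlap  # each chunk starts `step` characters after the last
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the final chunk reached the end of the text
    return chunks
```

The overlap keeps sentences that straddle a chunk boundary intact in at least one chunk, which tends to improve retrieval quality.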
### Query Pipeline

1. **Query** - User submits a question
2. **Embedding** - The question is converted to a vector
3. **Retrieval** - Similar chunks are retrieved from Qdrant
4. **Generation** - The retrieved context is provided to the Google AI Studio Flash model
5. **Response** - An answer is generated and returned with its sources
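The retrieval step boils down to ranking stored chunk vectors by cosine similarity to the query vector and keeping the top k results (k = 5 is the documented default). This standalone sketch mirrors what Qdrant does internally; the data shapes are illustrative:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is zero-length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(query_vec: list[float],
             chunks: list[tuple[str, list[float]]],
             top_k: int = 5) -> list[str]:
    """Return the top_k chunk texts most similar to the query vector."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:top_k]]
```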
## Development

### Running Tests

### Code Style

The project follows Python best practices with type hints and docstrings.
## Troubleshooting

### Common Issues

**Issue**: `GOOGLE_API_KEY` not found
**Solution**: Ensure you've created a `.env` file and added your Google API key.

**Issue**: Unsupported file type
**Solution**: Only `.txt` and `.md` files are supported. Convert other formats first.

**Issue**: Collection already exists error
**Solution**: Delete the `qdrant_storage/` directory to reset the database.
## Advanced Usage

### Tag-Based Organization

Organize your documents with tags for easy categorization and filtering:
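Conceptually, tag filtering is a set-intersection test over each document's tags. This sketch matches documents carrying *any* requested tag; the record fields are illustrative and the actual API may instead require all tags to match:

```python
def filter_by_tags(documents: list[dict], tags: list[str]) -> list[dict]:
    """Keep documents that carry at least one of the requested tags."""
    wanted = set(tags)
    return [doc for doc in documents if wanted & set(doc.get("tags", []))]
```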
### Hierarchical Document Structure

For Markdown documents, the system automatically preserves the heading hierarchy:
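A minimal sketch of heading extraction: walk the markdown lines, maintain a stack of open headings, and emit a breadcrumb path for each one. The `>` separator follows the "Installation > Prerequisites" example above; for brevity this ignores headings inside fenced code blocks:

```python
def heading_breadcrumbs(markdown: str) -> list[str]:
    """Return breadcrumb paths like 'Installation > Prerequisites' for each heading."""
    stack: list[tuple[int, str]] = []  # (level, title) of currently open headings
    paths = []
    for line in markdown.splitlines():
        if line.startswith("#"):
            level = len(line) - len(line.lstrip("#"))
            title = line.lstrip("#").strip()
            # Close any headings at the same or deeper level
            while stack and stack[-1][0] >= level:
                stack.pop()
            stack.append((level, title))
            paths.append(" > ".join(t for _, t in stack))
    return paths
```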
### Section-Aware Queries

The system includes section context when generating answers:
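A sketch of section filtering before generation: restrict retrieved chunks to those whose breadcrumb path equals or nests under the requested `section_path`, then hand those chunks (with their paths) to the LLM as context. The field names are illustrative:

```python
def filter_by_section(chunks: list[dict], section_path: str) -> list[dict]:
    """Keep chunks whose section breadcrumb equals or nests under section_path."""
    return [c for c in chunks
            if c["section"] == section_path
            or c["section"].startswith(section_path + " > ")]
```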
### MCP Tools

The MCP server provides enhanced tools for Claude and other MCP clients:

- `query_rag` - Query with optional tag and section filtering
- `add_document` - Upload with tags
- `get_tags` - List all tags
- `get_document_structure` - Get the table of contents
## API Reference

### Enhanced Endpoints

#### `POST /documents`

- Body: `file` (multipart), `tags` (comma-separated string)
- Response: Document info with tags and chunk count

#### `POST /query`

- Body: `{"question": "...", "tags": [...], "section_path": "..."}`
- Response: Answer with section-aware sources

#### `GET /tags`

- Response: `{"tags": [...], "total": N}`

#### `GET /documents/{doc_id}/sections`

- Response: Document structure with section hierarchy

#### `GET /documents?tags=tag1,tag2`

- Query filtered by tags
- Response: List of matching documents
## License

MIT License

## Contributing

Contributions are welcome! Please feel free to submit a pull request.

## Acknowledgments

- Google AI Studio for embeddings and LLM capabilities
- Qdrant for the vector database
- FastAPI for the REST API framework
- Anthropic's Model Context Protocol (MCP)