# RAG Server with MCP Integration
A Retrieval-Augmented Generation (RAG) system with Google AI Studio integration, featuring both a REST API and Model Context Protocol (MCP) server.
## Features
### Core Capabilities
- **Document Storage**: Upload and store text (.txt) and Markdown (.md) documents
- **Hierarchical Chunking**: Structure-aware chunking for markdown that preserves document hierarchy
- **Vector Search**: Efficient similarity search using Qdrant vector database
- **Google AI Integration**: Uses Google AI Studio for embeddings (text-embedding-004) and generation (gemini-1.5-flash)
- **REST API**: FastAPI-based REST API with automatic OpenAPI documentation
- **MCP Server**: Model Context Protocol server for seamless integration with Claude and other MCP clients
- **OpenAI-Compatible API**: Supports OpenAI-compatible chat completions for web UI integration
- **Code Indexing**: Index and search source code repositories with semantic understanding
- **Smart Query Routing**: Automatic query classification and routing to appropriate retrieval methods
### Advanced Features
- **Tag-Based Organization**: Organize documents with multiple tags for easy categorization
- **Section-Aware Retrieval**: Query specific sections of documentation (e.g., "Installation > Prerequisites")
- **Markdown Structure Preservation**: Automatic extraction of heading hierarchy with breadcrumb paths
- **Context-Enhanced Answers**: LLM receives section context for more accurate responses
- **Flexible Filtering**: Filter documents by tags and/or section paths during queries
- **Document Structure API**: Explore table of contents and section organization
- **GitHub Integration**: Parse and extract content from GitHub URLs
- **Reference Following**: Automatically follow documentation references for comprehensive answers
- **Multi-Mode Retrieval**: Choose between standard, enhanced, or smart query modes
- **Rate Limiting**: Built-in rate limiting for API endpoints
## Project Structure
```
mcp-rag-docs/
├── config/
│   ├── __init__.py
│   └── settings.py             # Configuration and settings
├── rag_server/
│   ├── __init__.py
│   ├── models.py               # Pydantic models for API
│   ├── openai_api.py           # OpenAI-compatible API endpoints
│   ├── openai_models.py        # OpenAI API models
│   ├── rag_system.py           # Core RAG system logic
│   ├── server.py               # FastAPI server
│   └── smart_query.py          # Smart query routing
├── mcp_server/
│   ├── __init__.py
│   └── server.py               # MCP server implementation
├── utils/
│   ├── __init__.py
│   ├── code_indexer.py         # Source code indexing
│   ├── code_index_store.py     # Code index storage
│   ├── document_processor.py   # Document processing
│   ├── embeddings.py           # Google AI embeddings
│   ├── frontmatter_parser.py   # YAML frontmatter parsing
│   ├── github_parser.py        # GitHub URL parsing
│   ├── google_api_client.py    # Google AI API client
│   ├── hierarchical_chunker.py # Hierarchical document chunking
│   ├── markdown_parser.py      # Markdown parsing
│   ├── query_classifier.py     # Query type classification
│   ├── rate_limit_store.py     # Rate limiting
│   ├── reference_extractor.py  # Extract doc references
│   ├── retrieval_router.py     # Multi-mode retrieval routing
│   ├── source_extractor.py     # Extract source code snippets
│   ├── text_chunker.py         # Text chunking utility
│   └── vector_store.py         # Qdrant vector store wrapper
├── build_code_index.py         # Build code index from repository
├── check_github_urls.py        # Validate GitHub URLs
├── check_status.py             # System status checker
├── example_usage.py            # Example usage scripts
├── ingest_docs.py              # Document ingestion utility
├── main.py                     # Main entry point
├── .env.example                # Example environment variables
├── docker-compose.yml          # Docker setup for Qdrant
└── pyproject.toml              # Project dependencies
```
## Installation
### Prerequisites
- Python 3.13 or higher
- Google AI Studio API key ([Get one here](https://makersuite.google.com/app/apikey))
### Setup
1. **Clone or navigate to the project directory**
2. **Install dependencies**
```bash
# Using pip
pip install -e .
# Or using uv (recommended)
uv pip install -e .
```
3. **Configure environment variables**
```bash
# Copy the example env file
cp .env.example .env
# Edit .env and add your Google API key
GOOGLE_API_KEY=your_api_key_here
```
4. **Start Qdrant (optional)**
By default the server uses embedded on-disk storage at `QDRANT_PATH`. To run a standalone Qdrant instance instead, start it with Docker:
```bash
docker-compose up -d
```
## Usage
### Running the FastAPI Server
Start the REST API server:
```bash
python -m rag_server.server
```
The server will start at `http://localhost:8000`. Visit `http://localhost:8000/docs` for interactive API documentation.
#### API Endpoints
**Core Endpoints:**
- **POST /documents** - Upload a document
- **POST /query** - Query the RAG system (standard mode)
- **POST /query-enhanced** - Query with automatic reference following
- **POST /smart-query** - Smart query with automatic routing
- **GET /documents** - List all documents
- **DELETE /documents/{doc_id}** - Delete a document
- **GET /stats** - Get system statistics
- **GET /health** - Health check
- **GET /tags** - List all available tags
- **GET /documents/{doc_id}/sections** - Get document structure
**OpenAI-Compatible Endpoints:**
- **POST /v1/chat/completions** - OpenAI-compatible chat completions
- **GET /v1/models** - List available models
#### Example Usage with curl
```bash
# Upload a document
curl -X POST "http://localhost:8000/documents" \
-F "file=@example.txt"
# Upload with tags
curl -X POST "http://localhost:8000/documents" \
-F "file=@dagster-docs.md" \
-F "tags=dagster,python,orchestration"
# Query the RAG system
curl -X POST "http://localhost:8000/query" \
-H "Content-Type: application/json" \
-d '{"question": "What is the main topic of the documents?", "top_k": 5}'
# Smart query with automatic routing
curl -X POST "http://localhost:8000/smart-query" \
-H "Content-Type: application/json" \
-d '{"question": "How do I create a Dagster asset?"}'
# OpenAI-compatible chat completion
curl -X POST "http://localhost:8000/v1/chat/completions" \
-H "Content-Type: application/json" \
-d '{
"model": "rag-smart",
"messages": [{"role": "user", "content": "What is an asset in Dagster?"}],
"stream": false
}'
# List documents
curl "http://localhost:8000/documents"
# Get statistics
curl "http://localhost:8000/stats"
```
### Running the MCP Server
The MCP server allows integration with Claude and other MCP-compatible clients.
```bash
python -m mcp_server.server
```
#### MCP Tools Available
1. **query_rag** - Query the RAG system with a question
2. **query_rag_enhanced** - Query with automatic reference following
3. **smart_query** - Smart query with automatic routing and classification
4. **add_document** - Add a document to the RAG system
5. **list_documents** - List all stored documents
6. **delete_document** - Delete a document by ID
7. **get_rag_stats** - Get system statistics
8. **get_tags** - List all available tags
9. **get_document_structure** - Get document table of contents
#### Using with Claude Desktop
Add to your Claude Desktop configuration (`claude_desktop_config.json`):
```json
{
"mcpServers": {
"rag": {
"command": "uv",
"args": [
"--directory",
"/path/to/mcp-rag-docs",
"run",
"python",
"-m",
"mcp_server.server"
]
}
}
}
```
See [QUICK_START.md](QUICK_START.md) for a quick setup guide.
## Configuration
All configuration is managed through environment variables (defined in `.env`):
| Variable | Description | Default |
|----------|-------------|---------|
| `GOOGLE_API_KEY` | Google AI Studio API key | (required) |
| `CHUNK_SIZE` | Size of text chunks in characters | 1000 |
| `CHUNK_OVERLAP` | Overlap between chunks | 200 |
| `TOP_K_RESULTS` | Number of chunks to retrieve | 5 |
| `QDRANT_PATH` | Path to Qdrant storage | ./qdrant_storage |
| `QDRANT_COLLECTION_NAME` | Qdrant collection name | documents |
| `FASTAPI_HOST` | FastAPI server host | 0.0.0.0 |
| `FASTAPI_PORT` | FastAPI server port | 8000 |
| `EMBEDDING_MODEL` | Google embedding model | text-embedding-004 |
| `LLM_MODEL` | Google LLM model | gemini-1.5-flash |
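A minimal `.env` for local development might look like this (every value below except the API key matches the defaults above):
```bash
GOOGLE_API_KEY=your_api_key_here   # required; placeholder shown
CHUNK_SIZE=1000
CHUNK_OVERLAP=200
TOP_K_RESULTS=5
QDRANT_PATH=./qdrant_storage
QDRANT_COLLECTION_NAME=documents
FASTAPI_HOST=0.0.0.0
FASTAPI_PORT=8000
EMBEDDING_MODEL=text-embedding-004
LLM_MODEL=gemini-1.5-flash
```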
## Architecture
### Document Processing Pipeline
1. **Upload** - User uploads a .txt or .md file
2. **Processing** - Document is read and metadata extracted (including frontmatter)
3. **Chunking** - Text is split using hierarchical chunking for markdown or standard chunking for text
4. **Embedding** - Each chunk is converted to a vector using Google AI embeddings
5. **Storage** - Vectors and metadata are stored in Qdrant
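For orientation, here is a minimal, self-contained sketch of the same pipeline built directly on `google-generativeai` and `qdrant-client`. The project wraps these libraries in `utils/` with hierarchy-aware chunking for markdown; this sketch uses naive fixed-size chunking and is an illustration, not the project's internal code:
```python
import uuid
from pathlib import Path

import google.generativeai as genai
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams

genai.configure(api_key="your_api_key_here")  # placeholder key
client = QdrantClient(path="./qdrant_storage")

# text-embedding-004 produces 768-dimensional vectors
if not client.collection_exists("documents"):
    client.create_collection(
        collection_name="documents",
        vectors_config=VectorParams(size=768, distance=Distance.COSINE),
    )

def chunk(text: str, size: int = 1000, overlap: int = 200) -> list[str]:
    """Naive fixed-size chunking; the real system is hierarchy-aware for .md."""
    return [text[i : i + size] for i in range(0, len(text), size - overlap)]

def ingest(path: str) -> None:
    text = Path(path).read_text(encoding="utf-8")      # 1-2. read the document
    points = []
    for piece in chunk(text):                          # 3. split into chunks
        emb = genai.embed_content(                     # 4. embed with Google AI
            model="models/text-embedding-004", content=piece
        )["embedding"]
        points.append(PointStruct(id=str(uuid.uuid4()), vector=emb,
                                  payload={"text": piece, "source": path}))
    client.upsert(collection_name="documents", points=points)  # 5. store in Qdrant

ingest("example.txt")
```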
### Query Pipeline
#### Standard Query
1. **Query** - User submits a question
2. **Embedding** - Question is converted to a vector
3. **Retrieval** - Similar chunks are retrieved from Qdrant
4. **Generation** - Context is provided to Google AI Studio model
5. **Response** - Answer is generated and returned with sources
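Continuing the ingestion sketch above (same `genai` and `client` setup), the query side can be approximated like this; again an illustration, not the server's actual implementation:
```python
def answer(question: str, top_k: int = 5) -> str:
    q_emb = genai.embed_content(                       # 2. embed the question
        model="models/text-embedding-004", content=question
    )["embedding"]
    hits = client.search(collection_name="documents",  # 3. retrieve similar chunks
                         query_vector=q_emb, limit=top_k)
    context = "\n\n".join(h.payload["text"] for h in hits)
    llm = genai.GenerativeModel("gemini-1.5-flash")    # 4. generate grounded answer
    prompt = (f"Answer the question using only the context below.\n\n"
              f"Context:\n{context}\n\nQuestion: {question}")
    return llm.generate_content(prompt).text           # 5. return the response

print(answer("What is the main topic of the documents?"))
```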
#### Smart Query
1. **Classification** - Query is classified (documentation, code, conceptual, etc.)
2. **Routing** - Automatically selects best retrieval strategy
3. **Multi-Source** - May combine documentation search, code search, and direct answers
4. **Synthesis** - Generates comprehensive answer from multiple sources
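As a toy illustration of the classify-then-route idea (the real `query_classifier` and `retrieval_router` are considerably more sophisticated than this keyword check):
```python
def classify(question: str) -> str:
    """Toy keyword classifier standing in for the real query_classifier."""
    q = question.lower()
    if any(kw in q for kw in ("function", "class", "source", "implement")):
        return "code"
    if q.startswith(("what is", "why", "explain")):
        return "conceptual"
    return "documentation"

# The router then dispatches each class to its retrieval strategy:
ROUTES = {
    "code": "code index search",
    "conceptual": "direct answer with light retrieval",
    "documentation": "standard vector search",
}

print(ROUTES[classify("How do I create a Dagster asset?")])  # documentation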
### Code Indexing
The system can index source code repositories:
```bash
# Build code index
python build_code_index.py /path/to/repo
# Query code through the API or MCP server
```
Code is indexed with:
- Class and function definitions
- Docstrings and comments
- File structure and imports
- Semantic embeddings for natural language queries
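Once an index is built, a code-oriented question sent to the smart-query endpoint can be routed to the code index, for example:
```bash
curl -X POST "http://localhost:8000/smart-query" \
  -H "Content-Type: application/json" \
  -d '{"question": "Where is the rate limiter implemented?"}'
```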
## Development
### Running Tests
```bash
# Install test dependencies
pip install pytest pytest-asyncio httpx
# Run tests
pytest
# Run specific test files
pytest test_openai_api.py
pytest test_mcp_integration.py
```
### Code Style
Code is written with type hints and docstrings throughout.
## Troubleshooting
### Common Issues
**Issue**: `GOOGLE_API_KEY not found`
- **Solution**: Ensure you've created a `.env` file and added your Google API key
**Issue**: `Unsupported file type`
- **Solution**: Only .txt and .md files are supported. Convert other formats first.
**Issue**: `Collection already exists` error
- **Solution**: Delete the `qdrant_storage/` directory to reset the database
**Issue**: MCP server not connecting
- **Solution**: Check that the path in your MCP config is correct and the `.env` file is in the project root
## Advanced Usage
### Tag-Based Organization
Organize your documents with tags for easy categorization and filtering:
```bash
# Upload document with tags
curl -X POST "http://localhost:8000/documents" \
-F "file=@dagster-docs.md" \
-F "tags=dagster,python,orchestration"
# List all available tags
curl "http://localhost:8000/tags"
# Query only dagster-related documents
curl -X POST "http://localhost:8000/query" \
-H "Content-Type: application/json" \
-d '{"question": "How do I create a pipeline?", "tags": ["dagster"]}'
# List documents filtered by tags
curl "http://localhost:8000/documents?tags=dagster,python"
```
### Hierarchical Document Structure
For markdown documents, the system automatically preserves heading hierarchy:
```bash
# Get document structure (table of contents)
curl "http://localhost:8000/documents/{doc_id}/sections"
# Query specific section
curl -X POST "http://localhost:8000/query" \
-H "Content-Type: application/json" \
-d '{"question": "What are the prerequisites?", "section_path": "Installation > Prerequisites"}'
```
### Section-Aware Queries
The system includes section context when generating answers:
```python
# Example: Markdown document structure
# Installation
# Prerequisites
# Python Version
# Setup Steps
# When you query about "Python version requirements"
# The system will:
# 1. Retrieve relevant chunks from "Installation > Prerequisites > Python Version"
# 2. Include section path in context sent to LLM
# 3. Cite sources with full section paths
```
### Smart Query Modes
The system supports three query modes:
1. **Standard** (`/query`) - Basic vector search and retrieval
2. **Enhanced** (`/query-enhanced`) - Follows documentation references automatically
3. **Smart** (`/smart-query`) - Automatic classification and routing
Use the OpenAI-compatible API to access different modes:
```bash
# Standard mode
curl -X POST "http://localhost:8000/v1/chat/completions" \
-H "Content-Type: application/json" \
-d '{"model": "rag-standard", "messages": [{"role": "user", "content": "What is Dagster?"}]}'
# Enhanced mode with reference following
curl -X POST "http://localhost:8000/v1/chat/completions" \
-H "Content-Type: application/json" \
-d '{"model": "rag-enhanced", "messages": [{"role": "user", "content": "What is Dagster?"}]}'
# Smart mode with automatic routing
curl -X POST "http://localhost:8000/v1/chat/completions" \
-H "Content-Type: application/json" \
-d '{"model": "rag-smart", "messages": [{"role": "user", "content": "What is Dagster?"}]}'
```
### MCP Tools
The MCP server provides enhanced tools for Claude and other MCP clients:
**query_rag** - Query with optional tags and section filtering
```json
{
"question": "How do I deploy?",
"tags": ["dagster"],
"section_path": "Deployment"
}
```
**smart_query** - Smart query with automatic routing
```json
{
"question": "What is an asset and how do I use it?"
}
```
**add_document** - Upload with tags
```json
{
"file_path": "/path/to/doc.md",
"tags": ["dagster", "docs"]
}
```
**get_tags** - List all tags
**get_document_structure** - Get table of contents
```json
{
"doc_id": "abc123"
}
```
## API Reference
### Enhanced Endpoints
**POST /documents**
- Body: `file` (multipart), `tags` (comma-separated string)
- Response: Document info with tags and chunk count
**POST /query**
- Body: `{"question": "...", "tags": [...], "section_path": "..."}`
- Response: Answer with section-aware sources
**POST /smart-query**
- Body: `{"question": "..."}`
- Response: Smart answer with automatic routing and classification
**GET /tags**
- Response: `{"tags": [...], "total": N}`
**GET /documents/{doc_id}/sections**
- Response: Document structure with section hierarchy
**GET /documents?tags=tag1,tag2**
- Query filtered by tags
- Response: List of matching documents
**POST /v1/chat/completions**
- OpenAI-compatible chat completion endpoint
- Supports models: `rag-standard`, `rag-enhanced`, `rag-smart`
- Supports streaming with `stream: true`
**GET /v1/models**
- List available RAG models
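Because the `/v1` endpoints follow the OpenAI schema, the official OpenAI Python client can talk to the server directly. A sketch (the client requires some `api_key` string; whether the server validates it depends on your configuration):
```python
from openai import OpenAI

# Point the client at the local RAG server instead of api.openai.com
client = OpenAI(base_url="http://localhost:8000/v1", api_key="placeholder")

response = client.chat.completions.create(
    model="rag-smart",
    messages=[{"role": "user", "content": "What is an asset in Dagster?"}],
)
print(response.choices[0].message.content)

# Streaming variant
stream = client.chat.completions.create(
    model="rag-smart",
    messages=[{"role": "user", "content": "Explain Dagster assets"}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```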
## Additional Documentation
- [QUICK_START.md](QUICK_START.md) - Quick setup guide for MCP integration
- [MCP_SETUP.md](MCP_SETUP.md) - Detailed MCP server setup
- [OPENAI_API_GUIDE.md](OPENAI_API_GUIDE.md) - OpenAI-compatible API documentation
- [QUERY_ROUTING_GUIDE.md](QUERY_ROUTING_GUIDE.md) - Smart query routing guide
- [MULTI_MODE_RETRIEVAL_GUIDE.md](MULTI_MODE_RETRIEVAL_GUIDE.md) - Multi-mode retrieval documentation
- [CODE_INDEX_GUIDE.md](CODE_INDEX_GUIDE.md) - Code indexing and search guide
- [RATE_LIMITING.md](RATE_LIMITING.md) - Rate limiting configuration
- [TEST_COVERAGE.md](TEST_COVERAGE.md) - Test coverage and testing guide
## License
MIT License
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
## Acknowledgments
- Google AI Studio for embeddings and LLM capabilities
- Qdrant for vector database
- FastAPI for the REST API framework
- Anthropic for the Model Context Protocol