RAG Document Server

by jaimeferj
# RAG Server with MCP Integration

A Retrieval-Augmented Generation (RAG) system with Google AI Studio integration, featuring both a REST API and a Model Context Protocol (MCP) server.

## Features

### Core Capabilities

- **Document Storage**: Upload and store text (.txt) and Markdown (.md) documents
- **Hierarchical Chunking**: Structure-aware chunking for Markdown that preserves document hierarchy
- **Vector Search**: Efficient similarity search using the Qdrant vector database
- **Google AI Integration**: Uses Google AI Studio for embeddings (text-embedding-004) and generation (gemini-1.5-flash)
- **REST API**: FastAPI-based REST API with automatic OpenAPI documentation
- **MCP Server**: Model Context Protocol server for seamless integration with Claude and other MCP clients

### Advanced Features

- **Tag-Based Organization**: Organize documents with multiple tags for easy categorization
- **Section-Aware Retrieval**: Query specific sections of documentation (e.g., "Installation > Prerequisites")
- **Markdown Structure Preservation**: Automatic extraction of heading hierarchy with breadcrumb paths
- **Context-Enhanced Answers**: The LLM receives section context for more accurate responses
- **Flexible Filtering**: Filter documents by tags and/or section paths during queries
- **Document Structure API**: Explore table of contents and section organization

## Project Structure

```
rag/
├── config/
│   ├── __init__.py
│   └── settings.py            # Configuration and settings
├── rag_server/
│   ├── __init__.py
│   ├── models.py              # Pydantic models for API
│   ├── rag_system.py          # Core RAG system logic
│   └── server.py              # FastAPI server
├── mcp_server/
│   ├── __init__.py
│   └── server.py              # MCP server implementation
├── utils/
│   ├── __init__.py
│   ├── document_processor.py  # Document processing
│   ├── embeddings.py          # Google AI embeddings
│   ├── text_chunker.py        # Text chunking utility
│   └── vector_store.py        # Qdrant vector store wrapper
├── .env.example               # Example environment variables
├── .gitignore
├── main.py
├── pyproject.toml
└── README.md
```

## Installation

### Prerequisites

- Python 3.13 or higher
- Google AI Studio API key ([Get one here](https://makersuite.google.com/app/apikey))

### Setup

1. **Clone or navigate to the project directory**

2. **Install dependencies**

   ```bash
   # Using pip
   pip install -e .

   # Or using uv (recommended)
   uv pip install -e .
   ```

3. **Configure environment variables**

   ```bash
   # Copy the example env file
   cp .env.example .env

   # Edit .env and add your Google API key
   GOOGLE_API_KEY=your_api_key_here
   ```

## Usage

### Running the FastAPI Server

Start the REST API server:

```bash
python -m rag_server.server
```

The server will start at `http://localhost:8000`. Visit `http://localhost:8000/docs` for interactive API documentation.

#### API Endpoints

- **POST /documents** - Upload a document
- **POST /query** - Query the RAG system
- **GET /documents** - List all documents
- **DELETE /documents/{doc_id}** - Delete a document
- **GET /stats** - Get system statistics
- **GET /health** - Health check

#### Example Usage with curl

```bash
# Upload a document
curl -X POST "http://localhost:8000/documents" \
  -F "file=@example.txt"

# Query the RAG system
curl -X POST "http://localhost:8000/query" \
  -H "Content-Type: application/json" \
  -d '{"question": "What is the main topic of the documents?", "top_k": 5}'

# List documents
curl "http://localhost:8000/documents"

# Get statistics
curl "http://localhost:8000/stats"
```

### Running the MCP Server

The MCP server allows integration with Claude and other MCP-compatible clients.

```bash
python -m mcp_server.server
```

#### MCP Tools Available

1. **query_rag** - Query the RAG system with a question
2. **add_document** - Add a document to the RAG system
3. **list_documents** - List all stored documents
4. **delete_document** - Delete a document by ID
5. **get_rag_stats** - Get system statistics

#### Using with Claude Desktop

Add to your Claude Desktop configuration (`claude_desktop_config.json`):

```json
{
  "mcpServers": {
    "rag": {
      "command": "python",
      "args": ["-m", "mcp_server.server"],
      "cwd": "/path/to/rag"
    }
  }
}
```

## Configuration

All configuration is managed through environment variables (defined in `.env`):

| Variable | Description | Default |
|----------|-------------|---------|
| `GOOGLE_API_KEY` | Google AI Studio API key | (required) |
| `CHUNK_SIZE` | Size of text chunks in characters | 1000 |
| `CHUNK_OVERLAP` | Overlap between chunks | 200 |
| `TOP_K_RESULTS` | Number of chunks to retrieve | 5 |
| `QDRANT_PATH` | Path to Qdrant storage | ./qdrant_storage |
| `QDRANT_COLLECTION_NAME` | Qdrant collection name | documents |
| `FASTAPI_HOST` | FastAPI server host | 0.0.0.0 |
| `FASTAPI_PORT` | FastAPI server port | 8000 |
| `EMBEDDING_MODEL` | Google embedding model | text-embedding-004 |
| `LLM_MODEL` | Google LLM model | gemini-1.5-flash |

## Architecture

### Document Processing Pipeline

1. **Upload** - User uploads a .txt or .md file
2. **Processing** - Document is read and metadata extracted
3. **Chunking** - Text is split into overlapping chunks
4. **Embedding** - Each chunk is converted to a vector using Google AI embeddings
5. **Storage** - Vectors and metadata are stored in Qdrant

### Query Pipeline

1. **Query** - User submits a question
2. **Embedding** - Question is converted to a vector
3. **Retrieval** - Similar chunks are retrieved from Qdrant
4. **Generation** - Context is provided to the Google AI Studio Flash model
5. **Response** - Answer is generated and returned with sources

## Development

### Running Tests

```bash
# Install test dependencies
pip install pytest pytest-asyncio httpx

# Run tests
pytest
```

### Code Style

The project follows Python best practices with type hints and docstrings.
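The chunking step in the document processing pipeline can be sketched as a simple character-based splitter using the documented `CHUNK_SIZE`/`CHUNK_OVERLAP` defaults. This is a minimal illustration only; the project's actual chunker (`utils/text_chunker.py`) is structure-aware for Markdown and may split differently.

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into overlapping character chunks.

    Defaults mirror the CHUNK_SIZE / CHUNK_OVERLAP settings above.
    Illustrative sketch; not the project's actual implementation.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # each chunk starts `step` chars after the previous
    return [text[i:i + chunk_size] for i in range(0, max(len(text), 1), step)]


chunks = chunk_text("x" * 2500)
print(len(chunks), [len(c) for c in chunks])  # 4 [1000, 1000, 900, 100]
```

With the defaults, consecutive chunks share their last/first 200 characters, which is what lets retrieval recover context that straddles a chunk boundary.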
## Troubleshooting

### Common Issues

**Issue**: `GOOGLE_API_KEY not found`
- **Solution**: Ensure you've created a `.env` file and added your Google API key

**Issue**: `Unsupported file type`
- **Solution**: Only .txt and .md files are supported. Convert other formats first.

**Issue**: `Collection already exists` error
- **Solution**: Delete the `qdrant_storage/` directory to reset the database

## Advanced Usage

### Tag-Based Organization

Organize your documents with tags for easy categorization and filtering:

```bash
# Upload document with tags
curl -X POST "http://localhost:8000/documents" \
  -F "file=@dagster-docs.md" \
  -F "tags=dagster,python,orchestration"

# List all available tags
curl "http://localhost:8000/tags"

# Query only dagster-related documents
curl -X POST "http://localhost:8000/query" \
  -H "Content-Type: application/json" \
  -d '{"question": "How do I create a pipeline?", "tags": ["dagster"]}'

# List documents filtered by tags
curl "http://localhost:8000/documents?tags=dagster,python"
```

### Hierarchical Document Structure

For Markdown documents, the system automatically preserves heading hierarchy:

```bash
# Get document structure (table of contents)
curl "http://localhost:8000/documents/{doc_id}/sections"

# Query specific section
curl -X POST "http://localhost:8000/query" \
  -H "Content-Type: application/json" \
  -d '{"question": "What are the prerequisites?", "section_path": "Installation > Prerequisites"}'
```

### Section-Aware Queries

The system includes section context when generating answers:

```python
# Example: Markdown document structure
#   Installation
#     Prerequisites
#       Python Version
#     Setup Steps
#
# When you query about "Python version requirements", the system will:
# 1. Retrieve relevant chunks from "Installation > Prerequisites > Python Version"
# 2. Include section path in context sent to LLM
# 3. Cite sources with full section paths
```

### MCP Tools

The MCP server provides enhanced tools for Claude and other MCP clients:

**query_rag** - Query with optional tags and section filtering

```json
{
  "question": "How do I deploy?",
  "tags": ["dagster"],
  "section_path": "Deployment"
}
```

**add_document** - Upload with tags

```json
{
  "file_path": "/path/to/doc.md",
  "tags": ["dagster", "docs"]
}
```

**get_tags** - List all tags

**get_document_structure** - Get table of contents

```json
{
  "doc_id": "abc123"
}
```

## API Reference

### Enhanced Endpoints

**POST /documents**
- Body: `file` (multipart), `tags` (comma-separated string)
- Response: Document info with tags and chunk count

**POST /query**
- Body: `{"question": "...", "tags": [...], "section_path": "..."}`
- Response: Answer with section-aware sources

**GET /tags**
- Response: `{"tags": [...], "total": N}`

**GET /documents/{doc_id}/sections**
- Response: Document structure with section hierarchy

**GET /documents?tags=tag1,tag2**
- Query filtered by tags
- Response: List of matching documents

## License

MIT License

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## Acknowledgments

- Google AI Studio for embeddings and LLM capabilities
- Qdrant for vector database
- FastAPI for the REST API framework
- Anthropic MCP for the Model Context Protocol
