# RAG Anything MCP Server

An MCP (Model Context Protocol) server that provides comprehensive RAG (Retrieval-Augmented Generation) capabilities for processing and querying directories of documents using the `raganything` library with full multimodal support.

## Features

- **End-to-End Document Processing**: Complete document parsing with multimodal content extraction
- **Multimodal RAG**: Support for images, tables, equations, and text processing
- **Batch Processing**: Process entire directories with multiple file types
- **Advanced Querying**: Both pure text and multimodal-enhanced queries
- **Multiple Query Modes**: hybrid, local, global, naive, mix, and bypass modes
- **Vision Processing**: Advanced image analysis using GPT-4V
- **Persistent Storage**: RAG instances maintained per directory for efficient querying

## Available Tools

### `process_directory`

Process all files in a directory for comprehensive RAG indexing with multimodal support.

**Required Parameters:**

- `directory_path`: Path to the directory containing files to process
- `api_key`: OpenAI API key for LLM and embedding functions

**Optional Parameters:**

- `working_dir`: Custom working directory for RAG storage
- `base_url`: OpenAI API base URL (for custom endpoints)
- `file_extensions`: List of file extensions to process (default: ['.pdf', '.docx', '.pptx', '.txt', '.md'])
- `recursive`: Process subdirectories (default: True)
- `enable_image_processing`: Enable image analysis (default: True)
- `enable_table_processing`: Enable table extraction (default: True)
- `enable_equation_processing`: Enable equation processing (default: True)
- `max_workers`: Concurrent processing workers (default: 4)

### `process_single_document`

Process a single document with full multimodal analysis.

**Required Parameters:**

- `file_path`: Path to the document to process
- `api_key`: OpenAI API key

**Optional Parameters:**

- `working_dir`: Custom working directory for RAG storage
- `base_url`: OpenAI API base URL
- `output_dir`: Output directory for parsed content
- `parse_method`: Document parsing method (default: "auto")
- `enable_image_processing`: Enable image analysis (default: True)
- `enable_table_processing`: Enable table extraction (default: True)
- `enable_equation_processing`: Enable equation processing (default: True)

### `query_directory`

Pure text query against processed documents using LightRAG.

**Parameters:**

- `directory_path`: Path to the processed directory
- `query`: The question to ask about the documents
- `mode`: Query mode - "hybrid", "local", "global", "naive", "mix", or "bypass" (default: "hybrid")

### `query_with_multimodal_content`

Enhanced query with additional multimodal content (tables, equations, etc.).

**Parameters:**

- `directory_path`: Path to the processed directory
- `query`: The question to ask
- `multimodal_content`: List of multimodal content dictionaries
- `mode`: Query mode (default: "hybrid")

**Example multimodal_content:**

```json
[
  {
    "type": "table",
    "table_data": "Method,Accuracy\\nRAGAnything,95.2%\\nBaseline,87.3%",
    "table_caption": "Performance comparison"
  },
  {
    "type": "equation",
    "latex": "P(d|q) = \\frac{P(q|d) \\cdot P(d)}{P(q)}",
    "equation_caption": "Document relevance probability"
  }
]
```

### `list_processed_directories`

List all directories that have been processed and are available for querying.

### `get_rag_info`

Get detailed information about the RAG configuration and status for a directory.
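
Because these tools are exposed over MCP, they can also be called programmatically from any MCP-compatible client. The following is a minimal sketch, not part of this repository, assuming the official `mcp` Python SDK and a stdio transport with `python main.py` as the server command; the directory path, query, and API key are placeholders.

```python
# Sketch: invoking the server's tools from a Python MCP client over stdio.
# Assumes the official `mcp` SDK (pip install mcp) and that `python main.py`
# starts this server on stdio; paths and the API key are placeholders.
import asyncio
import os

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

server_params = StdioServerParameters(command="python", args=["main.py"])


async def main() -> None:
    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # Index a directory of documents first...
            await session.call_tool(
                "process_directory",
                arguments={
                    "directory_path": "/path/to/documents",
                    "api_key": os.environ["OPENAI_API_KEY"],
                },
            )

            # ...then ask a question against the processed index.
            result = await session.call_tool(
                "query_directory",
                arguments={
                    "directory_path": "/path/to/documents",
                    "query": "What are the main findings?",
                    "mode": "hybrid",
                },
            )
            print(result.content)


if __name__ == "__main__":
    asyncio.run(main())
```
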
## Usage Examples

### 1. Basic Directory Processing

```
process_directory(
    directory_path="/path/to/documents",
    api_key="your-openai-api-key"
)
```

### 2. Advanced Directory Processing

```
process_directory(
    directory_path="/path/to/research_papers",
    api_key="your-openai-api-key",
    file_extensions=[".pdf", ".docx"],
    enable_image_processing=true,
    enable_table_processing=true,
    max_workers=6
)
```

### 3. Pure Text Query

```
query_directory(
    directory_path="/path/to/documents",
    query="What are the main findings in these research papers?",
    mode="hybrid"
)
```

### 4. Multimodal Query with Table Data

```
query_with_multimodal_content(
    directory_path="/path/to/documents",
    query="Compare these results with the document findings",
    multimodal_content=[{
        "type": "table",
        "table_data": "Method,Accuracy,Speed\\nRAGAnything,95.2%,120ms\\nBaseline,87.3%,180ms",
        "table_caption": "Performance comparison"
    }],
    mode="hybrid"
)
```

### 5. Single Document Processing

```
process_single_document(
    file_path="/path/to/important_paper.pdf",
    api_key="your-openai-api-key",
    enable_image_processing=true
)
```

## Setup Requirements

### 1. Environment Variables

```bash
export OPENAI_API_KEY="your-openai-api-key-here"
```

### 2. Install Dependencies

```bash
uv sync
```

### 3. Run the MCP Server

```bash
python main.py
```

## Query Modes Explained

- **hybrid**: Combines local and global search (recommended for most use cases)
- **local**: Focuses on local context and entity relationships
- **global**: Provides broader, document-level insights and summaries
- **naive**: Simple keyword-based search without graph reasoning
- **mix**: Combines multiple approaches for comprehensive results
- **bypass**: Direct access without RAG processing

## Multimodal Content Types

The server supports processing and querying with:

- **Images**: Automatic caption generation and visual analysis
- **Tables**: Structure extraction and content analysis
- **Equations**: LaTeX parsing and mathematical reasoning
- **Charts/Graphs**: Visual data interpretation
- **Mixed Content**: Combined analysis of multiple content types

## API Configuration

The server uses OpenAI's APIs by default:

- **LLM**: GPT-4o-mini for text processing
- **Vision**: GPT-4o for image analysis
- **Embeddings**: text-embedding-3-large (3072 dimensions)

You can customize the `base_url` parameter to use:

- Azure OpenAI
- OpenAI-compatible APIs
- Custom model endpoints

## File Support

Supported file formats include:

- PDF documents
- Microsoft Word (.docx)
- PowerPoint presentations (.pptx)
- Text files (.txt)
- Markdown files (.md)
- And more via the raganything library

## Performance Notes

- **Concurrent Processing**: Use `max_workers` to control parallel document processing
- **Memory Usage**: Large documents with many images may require significant memory
- **API Costs**: Vision processing (GPT-4o) is more expensive than text processing
- **Storage**: Processed data is stored locally for efficient re-querying
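
As noted under API Configuration, the `base_url` parameter can point the server at Azure OpenAI or any OpenAI-compatible endpoint. A hedged example in the same style as the usage examples above; the endpoint URL and key are placeholders, and the target service is assumed to expose OpenAI-compatible chat and embedding routes:

```
process_directory(
    directory_path="/path/to/documents",
    api_key="your-provider-api-key",
    base_url="https://your-endpoint.example.com/v1"
)
```
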

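To use the server from a stdio-based MCP client rather than running `python main.py` by hand, a client-side entry along the following lines is typically needed. This is a sketch assuming a Claude Desktop-style `mcpServers` configuration; the server name, absolute path, and key are placeholders, and the exact command depends on how the dependencies were installed (for example `uv run` instead of plain `python`).

```json
{
  "mcpServers": {
    "rag-anything": {
      "command": "python",
      "args": ["/path/to/rag-anything-mcp/main.py"],
      "env": {
        "OPENAI_API_KEY": "your-openai-api-key-here"
      }
    }
  }
}
```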