RAG Anything MCP Server
An MCP (Model Context Protocol) server that provides comprehensive RAG (Retrieval-Augmented Generation) capabilities for processing and querying directories of documents using the raganything library, with full multimodal support.
Features
- End-to-End Document Processing: Complete document parsing with multimodal content extraction
- Multimodal RAG: Support for images, tables, equations, and text processing
- Batch Processing: Process entire directories with multiple file types
- Advanced Querying: Both pure text and multimodal-enhanced queries
- Multiple Query Modes: hybrid, local, global, naive, mix, and bypass modes
- Vision Processing: Advanced image analysis using GPT-4o
- Persistent Storage: RAG instances maintained per directory for efficient querying
Available Tools
process_directory
Process all files in a directory for comprehensive RAG indexing with multimodal support.
Required Parameters:
- directory_path: Path to the directory containing files to process
- api_key: OpenAI API key for LLM and embedding functions

Optional Parameters:
- working_dir: Custom working directory for RAG storage
- base_url: OpenAI API base URL (for custom endpoints)
- file_extensions: List of file extensions to process (default: ['.pdf', '.docx', '.pptx', '.txt', '.md'])
- recursive: Process subdirectories (default: True)
- enable_image_processing: Enable image analysis (default: True)
- enable_table_processing: Enable table extraction (default: True)
- enable_equation_processing: Enable equation processing (default: True)
- max_workers: Concurrent processing workers (default: 4)
process_single_document
Process a single document with full multimodal analysis.
Required Parameters:
- file_path: Path to the document to process
- api_key: OpenAI API key

Optional Parameters:
- working_dir: Custom working directory for RAG storage
- base_url: OpenAI API base URL
- output_dir: Output directory for parsed content
- parse_method: Document parsing method (default: "auto")
- enable_image_processing: Enable image analysis (default: True)
- enable_table_processing: Enable table extraction (default: True)
- enable_equation_processing: Enable equation processing (default: True)
query_directory
Pure text query against processed documents using LightRAG.
Parameters:
- directory_path: Path to the processed directory
- query: The question to ask about the documents
- mode: Query mode - "hybrid", "local", "global", "naive", "mix", or "bypass" (default: "hybrid")
query_with_multimodal_content
Enhanced query with additional multimodal content (tables, equations, etc.).
Parameters:
- directory_path: Path to the processed directory
- query: The question to ask
- multimodal_content: List of multimodal content dictionaries
- mode: Query mode (default: "hybrid")
Example multimodal_content:
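The sketch below follows the common raganything convention of a type field plus type-specific keys; treat the exact key names (table_data, table_caption, latex, equation_caption) as illustrative rather than a guaranteed schema.

```python
# Illustrative multimodal_content payload; key names follow the common
# raganything pattern (type + type-specific fields) but are not guaranteed.
multimodal_content = [
    {
        "type": "table",
        "table_data": "Quarter,Revenue\nQ1,1.2M\nQ2,1.5M",
        "table_caption": "Quarterly revenue summary",
    },
    {
        "type": "equation",
        "latex": "E = mc^2",
        "equation_caption": "Mass-energy equivalence",
    },
]
```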
list_processed_directories
List all directories that have been processed and are available for querying.
get_rag_info
Get detailed information about the RAG configuration and status for a directory.
Usage Examples
1. Basic Directory Processing
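A minimal sketch of the arguments an MCP client would pass to the process_directory tool, using only the required parameters; the path and key values are placeholders.

```python
# process_directory with only the required parameters (placeholder values)
arguments = {
    "directory_path": "/path/to/documents",
    "api_key": "sk-...",  # your OpenAI API key
}
```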
2. Advanced Directory Processing
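A sketch that overrides the optional parameters documented above; the endpoint URL and paths are placeholders, and disabling equation processing is just an example of tuning cost.

```python
# process_directory with optional parameters tuned (placeholder values)
arguments = {
    "directory_path": "/path/to/documents",
    "api_key": "sk-...",
    "working_dir": "/path/to/rag_storage",
    "base_url": "https://my-endpoint.example.com/v1",  # custom OpenAI-compatible endpoint
    "file_extensions": [".pdf", ".docx", ".md"],
    "recursive": True,
    "enable_image_processing": True,
    "enable_table_processing": True,
    "enable_equation_processing": False,  # skip equations to reduce API cost
    "max_workers": 8,
}
```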
3. Pure Text Query
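A sketch of a query_directory call against a directory that has already been processed; the question is a placeholder.

```python
# query_directory: plain text query against an already-processed directory
arguments = {
    "directory_path": "/path/to/documents",
    "query": "What are the main findings across these reports?",
    "mode": "hybrid",  # any of: hybrid, local, global, naive, mix, bypass
}
```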
4. Multimodal Query with Table Data
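A sketch of query_with_multimodal_content that supplies a small table alongside the question; the table_data and table_caption key names are illustrative, as noted above.

```python
# query_with_multimodal_content: supply a table alongside the question
arguments = {
    "directory_path": "/path/to/documents",
    "query": "How do these quarterly figures compare with the trends in the indexed reports?",
    "multimodal_content": [
        {
            "type": "table",
            "table_data": "Quarter,Revenue\nQ1,1.2M\nQ2,1.5M",  # illustrative keys
            "table_caption": "Quarterly revenue",
        }
    ],
    "mode": "hybrid",
}
```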
5. Single Document Processing
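A sketch of process_single_document for one file, using the documented parameters with placeholder paths.

```python
# process_single_document for one file (placeholder values)
arguments = {
    "file_path": "/path/to/report.pdf",
    "api_key": "sk-...",
    "parse_method": "auto",
    "output_dir": "/path/to/parsed_output",
}
```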
Setup Requirements
1. Environment Variables
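The tools receive the key through the api_key parameter, so environment variables are mainly a convenience; the names below follow the usual OpenAI tooling convention and are assumptions, not documented requirements of this server.

```bash
# Assumed variable names (standard OpenAI convention); the server itself
# receives the key through the api_key tool parameter.
export OPENAI_API_KEY="sk-..."
export OPENAI_BASE_URL="https://api.openai.com/v1"   # optional, for custom endpoints
```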
2. Install Dependencies
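A sketch assuming the dependencies are the raganything library and the MCP Python SDK; check the project's requirements file for the authoritative list.

```bash
# Assumed dependency set; consult the project's requirements for exact versions
pip install raganything mcp
```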
3. Run the MCP Server
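The entry-point module name is not given in this document, so the command below assumes a server.py at the repository root; adjust it to the actual script or console entry point.

```bash
# Hypothetical entry point; replace server.py with the actual module name
python server.py
```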
Query Modes Explained
- hybrid: Combines local and global search (recommended for most use cases)
- local: Focuses on local context and entity relationships
- global: Provides broader, document-level insights and summaries
- naive: Simple keyword-based search without graph reasoning
- mix: Combines multiple approaches for comprehensive results
- bypass: Direct access without RAG processing
Multimodal Content Types
The server supports processing and querying with:
- Images: Automatic caption generation and visual analysis
- Tables: Structure extraction and content analysis
- Equations: LaTeX parsing and mathematical reasoning
- Charts/Graphs: Visual data interpretation
- Mixed Content: Combined analysis of multiple content types
API Configuration
The server uses OpenAI's APIs by default:
- LLM: GPT-4o-mini for text processing
- Vision: GPT-4o for image analysis
- Embeddings: text-embedding-3-large (3072 dimensions)
You can customize the base_url parameter to use:
- Azure OpenAI
- OpenAI-compatible APIs
- Custom model endpoints
File Support
Supported file formats include:
- PDF documents
- Microsoft Word (.docx)
- PowerPoint presentations (.pptx)
- Text files (.txt)
- Markdown files (.md)
- And more via the raganything library
Performance Notes
- Concurrent Processing: Use max_workers to control parallel document processing
- Memory Usage: Large documents with many images may require significant memory
- API Costs: Vision processing (GPT-4o) is more expensive than text processing
- Storage: Processed data is stored locally for efficient re-querying