
RAG Anything MCP Server

by jesse-merhi

An MCP (Model Context Protocol) server that indexes and queries directories of documents with Retrieval-Augmented Generation (RAG), built on the raganything library with multimodal support for images, tables, and equations.

Features

  • End-to-End Document Processing: Complete document parsing with multimodal content extraction

  • Multimodal RAG: Support for images, tables, equations, and text processing

  • Batch Processing: Process entire directories with multiple file types

  • Advanced Querying: Both pure text and multimodal-enhanced queries

  • Multiple Query Modes: hybrid, local, global, naive, mix, and bypass modes

  • Vision Processing: Image analysis using GPT-4o

  • Persistent Storage: RAG instances maintained per directory for efficient querying

Available Tools

process_directory

Index every file in a directory that matches file_extensions, with multimodal support.

Required Parameters:

  • directory_path: Path to the directory containing files to process

  • api_key: OpenAI API key for LLM and embedding functions

Optional Parameters:

  • working_dir: Custom working directory for RAG storage

  • base_url: OpenAI API base URL (for custom endpoints)

  • file_extensions: List of file extensions to process (default: ['.pdf', '.docx', '.pptx', '.txt', '.md'])

  • recursive: Process subdirectories (default: True)

  • enable_image_processing: Enable image analysis (default: True)

  • enable_table_processing: Enable table extraction (default: True)

  • enable_equation_processing: Enable equation processing (default: True)

  • max_workers: Concurrent processing workers (default: 4)
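The file-selection parameters (file_extensions and recursive) act as a filter over the directory tree. The sketch below illustrates that filter with the standard library only; collect_files is a hypothetical helper, not the server's code, and case-insensitive suffix matching is an assumption.

```python
from pathlib import Path

# Default list taken from the file_extensions description.
DEFAULT_EXTENSIONS = [".pdf", ".docx", ".pptx", ".txt", ".md"]


def collect_files(directory_path, file_extensions=None, recursive=True):
    """Hypothetical helper: return the files process_directory would pick up.

    Assumes extensions are matched case-insensitively against the file suffix.
    """
    root = Path(directory_path)
    if not root.is_dir():
        raise NotADirectoryError(f"Not a directory: {root}")
    if file_extensions is None:
        file_extensions = DEFAULT_EXTENSIONS
    wanted = {ext.lower() for ext in file_extensions}
    # rglob("*") descends into subdirectories; glob("*") stays at the top level.
    candidates = root.rglob("*") if recursive else root.glob("*")
    return sorted(p for p in candidates if p.is_file() and p.suffix.lower() in wanted)
```

For example, collect_files("/path/to/research_papers", [".pdf", ".docx"], recursive=False) would list only the top-level PDF and Word files.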

process_single_document

Process a single document with full multimodal analysis.

Required Parameters:

  • file_path: Path to the document to process

  • api_key: OpenAI API key

Optional Parameters:

  • working_dir: Custom working directory for RAG storage

  • base_url: OpenAI API base URL

  • output_dir: Output directory for parsed content

  • parse_method: Document parsing method (default: "auto")

  • enable_image_processing: Enable image analysis (default: True)

  • enable_table_processing: Enable table extraction (default: True)

  • enable_equation_processing: Enable equation processing (default: True)

query_directory

Pure text query against processed documents using LightRAG.

Parameters:

  • directory_path: Path to the processed directory

  • query: The question to ask about the documents

  • mode: Query mode - "hybrid", "local", "global", "naive", "mix", or "bypass" (default: "hybrid")
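Because mode accepts only the six listed values, a client can check it before calling the tool. This is a minimal sketch; build_query_arguments and VALID_MODES are hypothetical names, and how the server itself reacts to an unknown mode is not documented here.

```python
# Modes listed for query_directory; "hybrid" is the documented default.
VALID_MODES = ("hybrid", "local", "global", "naive", "mix", "bypass")


def build_query_arguments(directory_path, query, mode="hybrid"):
    """Hypothetical client-side helper: assemble query_directory arguments.

    Rejects unknown modes and blank queries before any request is sent.
    """
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    if not query.strip():
        raise ValueError("query must be a non-empty string")
    return {"directory_path": directory_path, "query": query, "mode": mode}
```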

query_with_multimodal_content

Enhanced query with additional multimodal content (tables, equations, etc.).

Parameters:

  • directory_path: Path to the processed directory

  • query: The question to ask

  • multimodal_content: List of multimodal content dictionaries

  • mode: Query mode (default: "hybrid")

Example multimodal_content:

[
  {
    "type": "table",
    "table_data": "Method,Accuracy\nRAGAnything,95.2%\nBaseline,87.3%",
    "table_caption": "Performance comparison"
  },
  {
    "type": "equation",
    "latex": "P(d|q) = \\frac{P(q|d) \\cdot P(d)}{P(q)}",
    "equation_caption": "Document relevance probability"
  }
]
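When the multimodal_content list is built in code rather than written by hand, small helpers keep the keys consistent with the example. A sketch under assumptions: table_item and equation_item are hypothetical names, and table_data is assumed to be CSV text with one row per line.

```python
def table_item(header, rows, caption):
    """Hypothetical helper: build a "table" entry with CSV-style table_data.

    Assumes one CSV row per line, joined with newline characters.
    """
    lines = [",".join(header)] + [",".join(str(cell) for cell in row) for row in rows]
    return {"type": "table", "table_data": "\n".join(lines), "table_caption": caption}


def equation_item(latex, caption):
    """Hypothetical helper: build an "equation" entry from a LaTeX string."""
    return {"type": "equation", "latex": latex, "equation_caption": caption}


multimodal_content = [
    table_item(["Method", "Accuracy"],
               [["RAGAnything", "95.2%"], ["Baseline", "87.3%"]],
               "Performance comparison"),
    # A raw string keeps the LaTeX backslashes without doubling them.
    equation_item(r"P(d|q) = \frac{P(q|d) \cdot P(d)}{P(q)}",
                  "Document relevance probability"),
]
```

Serializing this list with json.dumps escapes the backslashes and newlines automatically, so the JSON sent to the server never needs hand-escaping.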

list_processed_directories

List all directories that have been processed and are available for querying.

get_rag_info

Get detailed information about the RAG configuration and status for a directory.
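The Persistent Storage feature and these two tools suggest the server keeps one RAG instance per processed directory. The sketch below shows one way such a registry could work; RagRegistry and its methods are hypothetical, the factory argument stands in for whatever RAG object the server builds, and the real server may persist state differently (for example, on disk under working_dir).

```python
from pathlib import Path


class RagRegistry:
    """Hypothetical per-directory registry (illustration only).

    Keys are resolved absolute paths, so "docs" and "./docs" share one entry.
    """

    def __init__(self):
        self._instances = {}

    @staticmethod
    def _key(directory_path):
        return str(Path(directory_path).resolve())

    def get_or_create(self, directory_path, factory):
        # Build the instance only the first time a directory is seen.
        key = self._key(directory_path)
        if key not in self._instances:
            self._instances[key] = factory(key)
        return self._instances[key]

    def list_processed_directories(self):
        return sorted(self._instances)

    def get_info(self, directory_path):
        # Returns None for directories that were never processed.
        return self._instances.get(self._key(directory_path))
```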

Usage Examples

1. Basic Directory Processing

process_directory(
  directory_path="/path/to/documents",
  api_key="your-openai-api-key"
)

2. Advanced Directory Processing

process_directory(
  directory_path="/path/to/research_papers",
  api_key="your-openai-api-key",
  file_extensions=[".pdf", ".docx"],
  enable_image_processing=True,
  enable_table_processing=True,
  max_workers=6
)

3. Pure Text Query

query_directory(
  directory_path="/path/to/documents",
  query="What are the main findings in these research papers?",
  mode="hybrid"
)

4. Multimodal Query with Table Data

query_with_multimodal_content(
  directory_path="/path/to/documents",
  query="Compare these results with the document findings",
  multimodal_content=[{
    "type": "table",
    "table_data": "Method,Accuracy,Speed\nRAGAnything,95.2%,120ms\nBaseline,87.3%,180ms",
    "table_caption": "Performance comparison"
  }],
  mode="hybrid"
)

5. Single Document Processing

process_single_document(
  file_path="/path/to/important_paper.pdf",
  api_key="your-openai-api-key",
  enable_image_processing=True
)
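These usage examples are written as function calls for readability. Over the wire, an MCP client sends each one as a JSON-RPC 2.0 tools/call request that names the tool and carries its arguments. The sketch below builds such a message with the standard library; the transport (stdio or HTTP) and the initialization handshake are omitted, and tools_call_request is a hypothetical helper.

```python
import json
from itertools import count

_ids = count(1)  # request ids should not be reused within a session


def tools_call_request(name, arguments):
    """Serialize one MCP tools/call request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(message)


request = tools_call_request(
    "process_directory",
    {
        "directory_path": "/path/to/research_papers",
        "api_key": "your-openai-api-key",
        "file_extensions": [".pdf", ".docx"],
        "enable_image_processing": True,  # serialized as JSON true
        "max_workers": 6,
    },
)
```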

Setup Requirements

1. Environment Variables

export OPENAI_API_KEY="your-openai-api-key-here"

2. Install Dependencies

uv sync

3. Run the MCP Server

python main.py
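Every processing tool takes api_key as a required argument, so a client script will typically pass the OPENAI_API_KEY variable from step 1 through explicitly. A minimal sketch, assuming the client reads the variable itself; whether the server also falls back to the environment is not stated here, and resolve_api_key is hypothetical.

```python
import os


def resolve_api_key(explicit_key=None, env_var="OPENAI_API_KEY"):
    """Hypothetical helper: prefer an explicit key, else read the environment.

    Raises if neither source provides a non-empty key.
    """
    key = explicit_key or os.environ.get(env_var, "")
    if not key:
        raise RuntimeError(f"Set {env_var} or pass api_key explicitly")
    return key
```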

Query Modes Explained

  • hybrid: Combines local and global search (recommended for most use cases)

  • local: Focuses on local context and entity relationships

  • global: Focuses on high-level themes and relationships across the knowledge graph

  • naive: Basic vector similarity search over text chunks, without knowledge-graph reasoning

  • mix: Combines knowledge-graph retrieval with vector retrieval

  • bypass: Sends the query directly to the LLM without retrieval

Multimodal Content Types

The server supports processing and querying with:

  • Images: Automatic caption generation and visual analysis

  • Tables: Structure extraction and content analysis

  • Equations: LaTeX parsing and mathematical reasoning

  • Charts/Graphs: Visual data interpretation

  • Mixed Content: Combined analysis of multiple content types

API Configuration

The server uses OpenAI's APIs by default:

  • LLM: GPT-4o-mini for text processing

  • Vision: GPT-4o for image analysis

  • Embeddings: text-embedding-3-large (3072 dimensions)

Set the base_url parameter to target:

  • Azure OpenAI

  • OpenAI-compatible APIs

  • Custom model endpoints
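The defaults above and the base_url override can be summarized as a small configuration object. A sketch under assumptions: ModelConfig is a hypothetical name, the fallback endpoint is the public OpenAI v1 URL, and the localhost address in the usage note is a placeholder for an OpenAI-compatible server.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelConfig:
    """Hypothetical container for the documented defaults (illustration only)."""

    llm_model: str = "gpt-4o-mini"
    vision_model: str = "gpt-4o"
    embedding_model: str = "text-embedding-3-large"
    embedding_dim: int = 3072
    base_url: Optional[str] = None  # None means the default OpenAI endpoint

    def endpoint(self) -> str:
        # Assumed fallback; an OpenAI-compatible server would set base_url instead.
        return self.base_url or "https://api.openai.com/v1"
```

For example, ModelConfig(base_url="http://localhost:8000/v1") keeps the model names but points requests at a local endpoint.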

File Support

Supported file formats include:

  • PDF documents

  • Microsoft Word (.docx)

  • PowerPoint presentations (.pptx)

  • Text files (.txt)

  • Markdown files (.md)

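  • Other formats supported by the raganything library (add their extensions to file_extensions, since the default list covers only the five formats above)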

Performance Notes

  • Concurrent Processing: Use max_workers to control parallel document processing

  • Memory Usage: Large documents with many images may require significant memory

  • API Costs: Vision processing (GPT-4o) is more expensive than text processing

  • Storage: Processed data is stored locally for efficient re-querying
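max_workers caps how many documents are processed at once. The sketch below illustrates that idea, assuming an asyncio-based server and using asyncio.Semaphore; the actual server may bound concurrency differently, and process_all and process_one are hypothetical names.

```python
import asyncio


async def process_all(paths, process_one, max_workers=4):
    """Run process_one over paths with at most max_workers in flight.

    process_one is a hypothetical async callable standing in for per-document work.
    """
    semaphore = asyncio.Semaphore(max_workers)

    async def guarded(path):
        async with semaphore:  # waits while max_workers tasks are already running
            return await process_one(path)

    # gather returns results in the same order as the input paths.
    return await asyncio.gather(*(guarded(p) for p in paths))
```

Raising max_workers shortens wall-clock time for large directories but increases peak memory use and the rate of API calls, which matters most when image processing is enabled.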

License: permissive. Security and quality: not tested.
