📚 MCP-RAG
A Retrieval-Augmented Generation (RAG) system built on the Model Context Protocol (MCP) that handles large files (up to 200MB) using intelligent chunking strategies, multi-format document support, and enterprise-grade reliability.
🌟 Features
📄 Multi-Format Document Support
- PDF: Intelligent page-by-page processing with table detection
- DOCX: Paragraph and table extraction with formatting preservation
- Excel: Sheet-aware processing with column context (.xlsx/.xls)
- CSV: Smart row batching with header preservation
- PPTX: PowerPoint presentation support
- Images: JPEG, PNG, WebP, GIF, and similar formats with OCR text extraction (a format-dispatch sketch follows this list)
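To make the multi-format support concrete, the sketch below shows one way a format-aware loader can route files to per-format processors. The function names and registry are illustrative assumptions, not this project's actual API.

```python
from pathlib import Path

# Hypothetical processor stubs; the real project has its own processor logic.
def process_pdf(path): ...
def process_docx(path): ...
def process_excel(path): ...
def process_csv(path): ...
def process_pptx(path): ...
def process_image(path): ...  # would run OCR on the image

# Extension -> processor registry; adding a new format means adding one entry.
PROCESSORS = {
    ".pdf": process_pdf,
    ".docx": process_docx,
    ".xlsx": process_excel,
    ".xls": process_excel,
    ".csv": process_csv,
    ".pptx": process_pptx,
    ".jpg": process_image,
    ".jpeg": process_image,
    ".png": process_image,
    ".webp": process_image,
    ".gif": process_image,
}

def process_document(path: str):
    """Route a file to the processor registered for its extension."""
    ext = Path(path).suffix.lower()
    if ext not in PROCESSORS:
        raise ValueError(f"Unsupported file type: {ext}")
    return PROCESSORS[ext](path)
```

Keeping each format behind its own processor is also what makes this design easy to extend: supporting a new format is one registry entry plus one function.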
🚀 Large File Processing
- Adaptive chunking: Different strategies based on file size (see the sketch after this list)
- Memory management: Streaming processing for 50MB+ files
- Progress tracking: Real-time progress indicators
- Timeout handling: Graceful handling of long-running operations
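As a rough sketch of size-adaptive chunking, the snippet below picks chunking parameters from the file size. The 50MB streaming switch and the 200MB cap come from this README; the specific chunk sizes and overlaps are placeholder values.

```python
import os

STREAMING_THRESHOLD = 50 * 1024 * 1024   # 50 MB: switch to streaming reads
MAX_FILE_SIZE = 200 * 1024 * 1024        # 200 MB: documented upper limit

def choose_chunk_params(path: str) -> dict:
    """Pick a chunking strategy from the file size (illustrative heuristic)."""
    size = os.path.getsize(path)
    if size > MAX_FILE_SIZE:
        raise ValueError("File exceeds the 200MB limit")
    if size > STREAMING_THRESHOLD:
        # Large files: coarser chunks, stream from disk to bound memory use.
        return {"chunk_size": 2000, "chunk_overlap": 100, "streaming": True}
    if size > 5 * 1024 * 1024:
        return {"chunk_size": 1500, "chunk_overlap": 150, "streaming": False}
    # Small files: finer-grained chunks for better retrieval precision.
    return {"chunk_size": 1000, "chunk_overlap": 200, "streaming": False}
```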
🧠 Advanced RAG Capabilities
- Semantic search: Vector similarity with confidence scores
- Cross-document queries: Search across multiple documents simultaneously
- Source attribution: Citations with similarity scores
- Hybrid retrieval: Combine semantic and keyword search (see the sketch after this list)
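The sketch below illustrates one common way to blend the two retrieval signals into a single confidence score with source attribution. It assumes a LangChain-style vector store exposing `similarity_search_with_relevance_scores`; the naive keyword score and the 0.7 weighting are illustrative, not this project's implementation.

```python
def hybrid_search(query: str, vector_store, k: int = 5, alpha: float = 0.7):
    """Blend semantic similarity with a naive keyword-overlap score (illustrative)."""
    # Semantic leg: LangChain-style stores return (Document, relevance in [0, 1]).
    semantic = vector_store.similarity_search_with_relevance_scores(query, k=k)

    # Keyword leg: crude term overlap as a stand-in for BM25 or similar.
    terms = set(query.lower().split())

    def keyword_score(text: str) -> float:
        return len(terms & set(text.lower().split())) / max(len(terms), 1)

    results = []
    for doc, relevance in semantic:
        confidence = alpha * relevance + (1 - alpha) * keyword_score(doc.page_content)
        results.append({
            "source": doc.metadata.get("source"),   # source attribution for citations
            "confidence": round(confidence, 3),     # blended similarity score
            "excerpt": doc.page_content[:200],
        })
    return sorted(results, key=lambda r: r["confidence"], reverse=True)
```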
🔌 Model Context Protocol (MCP) Integration
- Universal tool interface: Standardized AI-to-tool communication
- Auto-discovery: LangChain agents automatically find and use tools
- Secure communication: Built-in permission controls
- Extensible architecture: Easy to add new document processors (see the tool sketch after this list)
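For orientation, this is roughly what exposing RAG tools over MCP looks like with the official Python SDK's `FastMCP` helper; the tool names, signatures, and bodies below are examples, not this project's actual tools.

```python
from mcp.server.fastmcp import FastMCP

# Illustrative MCP server; tool names and signatures are examples only.
mcp = FastMCP("mcp-rag")

@mcp.tool()
def ingest_document(path: str) -> str:
    """Chunk and index a document so it becomes searchable."""
    # A real implementation would dispatch to a format-specific processor
    # and write embeddings to the vector store.
    return f"Indexed {path}"

@mcp.tool()
def query_documents(question: str, k: int = 5) -> list[str]:
    """Retrieve the k most relevant chunks for a question."""
    # A real implementation would run semantic (or hybrid) retrieval here.
    return []

if __name__ == "__main__":
    mcp.run()  # serves the tools over stdio so MCP clients can discover them
```

Because MCP describes each tool's name, schema, and docstring to the client, an MCP-aware LangChain agent can discover and call these tools at startup without hand-written glue code.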
🏢 Enterprise Ready
- Custom LLM endpoints: Support for any OpenAI-compatible API
- Vector database options: ChromaDB (local) + Milvus (production)
- Batch processing: Handles API rate limits and batch size constraints
- Error recovery: Retry logic and graceful degradation (see the batching sketch after this list)
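As a sketch of the batching and retry behaviour described above, the snippet below pushes embedding requests through any OpenAI-compatible endpoint in rate-limit-friendly batches with exponential backoff. The base URL, model name, and batch size are placeholders.

```python
import time
from openai import OpenAI

# Any OpenAI-compatible endpoint works; base_url and api_key are placeholders.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="YOUR_KEY")

def embed_in_batches(texts, model="text-embedding-3-small",
                     batch_size=64, max_retries=3):
    """Embed texts in batches with simple retry/backoff (illustrative)."""
    vectors = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        for attempt in range(max_retries):
            try:
                response = client.embeddings.create(model=model, input=batch)
                vectors.extend(item.embedding for item in response.data)
                break
            except Exception:
                if attempt == max_retries - 1:
                    raise  # give up after the final attempt
                time.sleep(2 ** attempt)  # exponential backoff before retrying
    return vectors
```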
🏗️ Architecture
```
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│    Streamlit    │    │    LangChain     │    │   MCP Server    │
│    Frontend     │◄──►│      Agent       │◄──►│     (Tools)     │
└─────────────────┘    └──────────────────┘    └─────────────────┘
                                                        │
                               ┌────────────────────────┼────────────────────────┐
                               │                        ▼                        │
                       ┌───────▼────────┐      ┌─────────────────┐      ┌──────▼──────┐
                       │    Document    │      │ Vector Database │      │   LLM API   │
                       │   Processors   │      │   (ChromaDB)    │      │   Endpoint  │
                       └────────────────┘      └─────────────────┘      └─────────────┘
```
🚀 Quick Start
Prerequisites
- Python 3.11+
- OpenAI API key or compatible LLM endpoint
- 8GB+ RAM (for large file processing)
Installation
Clone the repository