📚 MCP-RAG
A Retrieval-Augmented Generation (RAG) system built on the Model Context Protocol (MCP) that handles large files (up to 200MB) using intelligent chunking strategies, multi-format document support, and enterprise-grade reliability.
🌟 Features
📄 Multi-Format Document Support
- PDF: Intelligent page-by-page processing with table detection
- DOCX: Paragraph and table extraction with formatting preservation
- Excel: Sheet-aware processing with column context (.xlsx/.xls)
- CSV: Smart row batching with header preservation
- PPTX: PowerPoint presentation support
- Images: JPEG, PNG, WebP, GIF, and similar formats with OCR text extraction (a format-dispatch sketch follows this list)
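To make the multi-format support concrete, the sketch below shows one way a format-aware loader can route files to per-format processors. The function names and registry are illustrative assumptions, not this project's actual API.

```python
from pathlib import Path

# Hypothetical processor stubs; the real project has its own processor logic.
def process_pdf(path): ...
def process_docx(path): ...
def process_excel(path): ...
def process_csv(path): ...
def process_pptx(path): ...
def process_image(path): ...  # would run OCR on the image

# Extension -> processor registry; adding a new format means adding one entry.
PROCESSORS = {
    ".pdf": process_pdf,
    ".docx": process_docx,
    ".xlsx": process_excel,
    ".xls": process_excel,
    ".csv": process_csv,
    ".pptx": process_pptx,
    ".jpg": process_image,
    ".jpeg": process_image,
    ".png": process_image,
    ".webp": process_image,
    ".gif": process_image,
}

def process_document(path: str):
    """Route a file to the processor registered for its extension."""
    ext = Path(path).suffix.lower()
    if ext not in PROCESSORS:
        raise ValueError(f"Unsupported file type: {ext}")
    return PROCESSORS[ext](path)
```

Keeping each format behind its own processor is also what makes this design easy to extend: supporting a new format is one registry entry plus one function.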
🚀 Large File Processing
- Adaptive chunking: Different strategies based on file size (see the sketch after this list)
- Memory management: Streaming processing for 50MB+ files
- Progress tracking: Real-time progress indicators
- Timeout handling: Graceful handling of long-running operations
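As a rough sketch of size-adaptive chunking, the snippet below picks chunking parameters from the file size. The 50MB streaming switch and the 200MB cap come from this README; the specific chunk sizes and overlaps are placeholder values.

```python
import os

STREAMING_THRESHOLD = 50 * 1024 * 1024   # 50 MB: switch to streaming reads
MAX_FILE_SIZE = 200 * 1024 * 1024        # 200 MB: documented upper limit

def choose_chunk_params(path: str) -> dict:
    """Pick a chunking strategy from the file size (illustrative heuristic)."""
    size = os.path.getsize(path)
    if size > MAX_FILE_SIZE:
        raise ValueError("File exceeds the 200MB limit")
    if size > STREAMING_THRESHOLD:
        # Large files: coarser chunks, stream from disk to bound memory use.
        return {"chunk_size": 2000, "chunk_overlap": 100, "streaming": True}
    if size > 5 * 1024 * 1024:
        return {"chunk_size": 1500, "chunk_overlap": 150, "streaming": False}
    # Small files: finer-grained chunks for better retrieval precision.
    return {"chunk_size": 1000, "chunk_overlap": 200, "streaming": False}
```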
🧠 Advanced RAG Capabilities
- Semantic search: Vector similarity with confidence scores
- Cross-document queries: Search across multiple documents simultaneously
- Source attribution: Citations with similarity scores
- Hybrid retrieval: Combine semantic and keyword search (see the sketch after this list)
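The sketch below illustrates one common way to blend the two retrieval signals into a single confidence score with source attribution. It assumes a LangChain-style vector store exposing `similarity_search_with_relevance_scores`; the naive keyword score and the 0.7 weighting are illustrative, not this project's implementation.

```python
def hybrid_search(query: str, vector_store, k: int = 5, alpha: float = 0.7):
    """Blend semantic similarity with a naive keyword-overlap score (illustrative)."""
    # Semantic leg: LangChain-style stores return (Document, relevance in [0, 1]).
    semantic = vector_store.similarity_search_with_relevance_scores(query, k=k)

    # Keyword leg: crude term overlap as a stand-in for BM25 or similar.
    terms = set(query.lower().split())

    def keyword_score(text: str) -> float:
        return len(terms & set(text.lower().split())) / max(len(terms), 1)

    results = []
    for doc, relevance in semantic:
        confidence = alpha * relevance + (1 - alpha) * keyword_score(doc.page_content)
        results.append({
            "source": doc.metadata.get("source"),   # source attribution for citations
            "confidence": round(confidence, 3),     # blended similarity score
            "excerpt": doc.page_content[:200],
        })
    return sorted(results, key=lambda r: r["confidence"], reverse=True)
```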
🔌 Model Context Protocol (MCP) Integration
- Universal tool interface: Standardized AI-to-tool communication
- Auto-discovery: LangChain agents automatically find and use tools
- Secure communication: Built-in permission controls
- Extensible architecture: Easy to add new document processors (see the tool sketch after this list)
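For orientation, this is roughly what exposing RAG tools over MCP looks like with the official Python SDK's `FastMCP` helper; the tool names, signatures, and bodies below are examples, not this project's actual tools.

```python
from mcp.server.fastmcp import FastMCP

# Illustrative MCP server; tool names and signatures are examples only.
mcp = FastMCP("mcp-rag")

@mcp.tool()
def ingest_document(path: str) -> str:
    """Chunk and index a document so it becomes searchable."""
    # A real implementation would dispatch to a format-specific processor
    # and write embeddings to the vector store.
    return f"Indexed {path}"

@mcp.tool()
def query_documents(question: str, k: int = 5) -> list[str]:
    """Retrieve the k most relevant chunks for a question."""
    # A real implementation would run semantic (or hybrid) retrieval here.
    return []

if __name__ == "__main__":
    mcp.run()  # serves the tools over stdio so MCP clients can discover them
```

Because MCP describes each tool's name, schema, and docstring to the client, an MCP-aware LangChain agent can discover and call these tools at startup without hand-written glue code.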
🏢 Enterprise Ready
- Custom LLM endpoints: Support for any OpenAI-compatible API
- Vector database options: ChromaDB (local) + Milvus (production)
- Batch processing: Handles API rate limits and batch size constraints
- Error recovery: Retry logic and graceful degradation (see the batching sketch after this list)
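As a sketch of the batching and retry behaviour described above, the snippet below pushes embedding requests through any OpenAI-compatible endpoint in rate-limit-friendly batches with exponential backoff. The base URL, model name, and batch size are placeholders.

```python
import time
from openai import OpenAI

# Any OpenAI-compatible endpoint works; base_url and api_key are placeholders.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="YOUR_KEY")

def embed_in_batches(texts, model="text-embedding-3-small",
                     batch_size=64, max_retries=3):
    """Embed texts in batches with simple retry/backoff (illustrative)."""
    vectors = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        for attempt in range(max_retries):
            try:
                response = client.embeddings.create(model=model, input=batch)
                vectors.extend(item.embedding for item in response.data)
                break
            except Exception:
                if attempt == max_retries - 1:
                    raise  # give up after the final attempt
                time.sleep(2 ** attempt)  # exponential backoff before retrying
    return vectors
```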
🏗️ Architecture
```
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│    Streamlit    │    │    LangChain     │    │   MCP Server    │
│    Frontend     │◄──►│      Agent       │◄──►│     (Tools)     │
└─────────────────┘    └──────────────────┘    └─────────────────┘
                                                        │
                               ┌────────────────────────┼────────────────────────┐
                               │                        ▼                        │
                       ┌───────▼────────┐      ┌─────────────────┐      ┌──────▼──────┐
                       │    Document    │      │ Vector Database │      │   LLM API   │
                       │   Processors   │      │   (ChromaDB)    │      │   Endpoint  │
                       └────────────────┘      └─────────────────┘      └─────────────┘
```
🚀 Quick Start
Prerequisites
- Python 3.11+
- OpenAI API key or compatible LLM endpoint
- 8GB+ RAM (for large file processing)
Installation
Clone the repository