MCP-RAG
A Retrieval-Augmented Generation (RAG) system built on the Model Context Protocol (MCP) that handles large files (up to 200MB) using intelligent chunking strategies, multi-format document support, and enterprise-grade reliability.
Features
Multi-Format Document Support
PDF: Intelligent page-by-page processing with table detection
DOCX: Paragraph and table extraction with formatting preservation
Excel: Sheet-aware processing with column context (.xlsx/.xls)
CSV: Smart row batching with header preservation
PPTX: PowerPoint presentation processing
Images: JPEG, PNG, WebP, GIF, and other formats with OCR text extraction
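Multi-format support typically comes down to dispatching a file to the right processor by its extension. A minimal sketch of such a dispatch table (the processor names and the registry itself are illustrative assumptions, not the project's actual classes):

```python
from pathlib import Path

# Hypothetical registry mapping file extensions to processor names;
# the real project's processors may be structured differently.
PROCESSORS = {
    ".pdf": "pdf_processor",
    ".docx": "docx_processor",
    ".xlsx": "excel_processor",
    ".xls": "excel_processor",
    ".csv": "csv_processor",
    ".pptx": "pptx_processor",
    ".jpeg": "image_ocr_processor",
    ".jpg": "image_ocr_processor",
    ".png": "image_ocr_processor",
    ".webp": "image_ocr_processor",
    ".gif": "image_ocr_processor",
}

def pick_processor(filename: str) -> str:
    """Return the processor name for a file, based on its extension."""
    ext = Path(filename).suffix.lower()
    try:
        return PROCESSORS[ext]
    except KeyError:
        raise ValueError(f"Unsupported file type: {ext}")
```

A lookup table like this keeps the "extensible architecture" promise: adding a new format is one new entry plus one new processor.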
Large File Processing
Adaptive chunking: Different strategies based on file size
Memory management: Streaming processing for 50MB+ files
Progress tracking: Real-time progress indicators
Timeout handling: Graceful handling of long-running operations
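"Adaptive chunking" usually means picking chunk parameters from the file size before processing begins. A sketch of that decision, with illustrative thresholds (the actual cutoffs and chunk sizes in this project may differ):

```python
def choose_chunk_params(file_size_bytes: int) -> dict:
    """Pick chunking parameters from file size.

    Thresholds here are illustrative assumptions, not the project's
    actual values: small files get fine-grained chunks with overlap,
    while 50MB+ files switch to streaming to bound memory use.
    """
    mb = file_size_bytes / (1024 * 1024)
    if mb < 5:
        return {"chunk_size": 1000, "overlap": 200, "streaming": False}
    if mb < 50:
        return {"chunk_size": 2000, "overlap": 100, "streaming": False}
    # 50MB+: coarser chunks, processed in a streaming pass
    return {"chunk_size": 4000, "overlap": 0, "streaming": True}
```

The streaming flag is what lets a 200MB upload be processed without loading the whole file into memory at once.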
Advanced RAG Capabilities
Semantic search: Vector similarity with confidence scores
Cross-document queries: Search across multiple documents simultaneously
Source attribution: Citations with similarity scores
Hybrid retrieval: Combine semantic and keyword search
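Hybrid retrieval commonly blends a vector-similarity score with a keyword score (e.g. BM25) into one ranking. A minimal sketch, assuming both scores are already normalized to [0, 1] and that a weighted sum is used (the project's actual fusion method is not specified):

```python
def hybrid_score(semantic: float, keyword: float, alpha: float = 0.7) -> float:
    """Blend a semantic similarity score with a keyword score.

    alpha weights the semantic side; both inputs are assumed to be
    normalized to [0, 1]. The 0.7 default is an illustrative choice.
    """
    return alpha * semantic + (1 - alpha) * keyword

def rank_hybrid(candidates):
    """Rank (doc_id, semantic_score, keyword_score) tuples, best first."""
    return sorted(
        candidates,
        key=lambda c: hybrid_score(c[1], c[2]),
        reverse=True,
    )
```

The same blended score can double as the confidence value attached to each cited source.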
Model Context Protocol (MCP) Integration
Universal tool interface: Standardized AI-to-tool communication
Auto-discovery: LangChain agents automatically find and use tools
Secure communication: Built-in permission controls
Extensible architecture: Easy to add new document processors
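Auto-discovery works because every MCP server advertises its tools with names, descriptions, and JSON Schema input definitions; an agent lists them at startup and can call any of them without hand-written glue. A sketch of what one tool descriptor might look like (the tool name and schema here are hypothetical, not this server's actual interface):

```python
import json

# Hypothetical descriptor for a document-search tool, in the general
# shape an MCP server returns when a client lists available tools.
SEARCH_TOOL = {
    "name": "search_documents",
    "description": "Semantic search over ingested documents.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "query": {"type": "string"},
            "top_k": {"type": "integer", "default": 5},
        },
        "required": ["query"],
    },
}

# The descriptor is plain JSON, so any MCP-aware agent can consume it.
print(json.dumps(SEARCH_TOOL, indent=2))
```

Adding a new document processor then only requires exposing one more descriptor like this; connected agents pick it up automatically.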
Enterprise Ready
Custom LLM endpoints: Support for any OpenAI-compatible API
Vector database options: ChromaDB (local) + Milvus (production)
Batch processing: Handles API rate limits and batch size constraints
Error recovery: Retry logic and graceful degradation
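Error recovery around rate-limited LLM APIs is usually retry with exponential backoff. A minimal sketch of that pattern, assuming the details (attempt count, delays) are tunable rather than the project's exact values:

```python
import random
import time

def with_retries(call, max_attempts: int = 4, base_delay: float = 0.5):
    """Run call(); on failure, retry with exponential backoff plus jitter.

    Illustrative error-recovery sketch: delays double each attempt
    (0.5s, 1s, 2s, ...) and a final failure is re-raised so the caller
    can degrade gracefully (e.g. return partial results).
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except Exception:
            if attempt == max_attempts:
                raise
            delay = base_delay * (2 ** (attempt - 1))
            time.sleep(delay + random.uniform(0, 0.1))
```

Wrapping each embedding or completion batch in `with_retries` absorbs transient rate-limit errors without failing the whole document.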
Related MCP server: MCP Excel Reader
Architecture
```
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│    Streamlit    │    │    LangChain     │    │    MCP Server   │
│    Frontend     │───▶│      Agent       │───▶│     (Tools)     │
└─────────────────┘    └──────────────────┘    └─────────────────┘
                                │
        ┌───────────────────────┼───────────────────────┐
        ▼                       ▼                       ▼
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│    Document     │    │ Vector Database  │    │     LLM API     │
│   Processors    │    │    (ChromaDB)    │    │    Endpoint     │
└─────────────────┘    └──────────────────┘    └─────────────────┘
```
Quick Start
Prerequisites
Python 3.11+
OpenAI API key or compatible LLM endpoint
8GB+ RAM (for large file processing)
Installation
Clone the repository