
MCP-RAG

by AnuragB7
MIT License

📚 MCP-RAG

MCP-RAG is a retrieval-augmented generation (RAG) system built with the Model Context Protocol (MCP). It handles large files (up to 200MB) using intelligent chunking strategies, supports multiple document formats, and is designed for enterprise-grade reliability.

🌟 Features

📄 Multi-Format Document Support

  • PDF: Intelligent page-by-page processing with table detection
  • DOCX: Paragraph and table extraction with formatting preservation
  • Excel: Sheet-aware processing with column context (.xlsx/.xls)
  • CSV: Smart row batching with header preservation
  • PPTX: PowerPoint presentation support
  • Image: Support for JPEG, PNG, WebP, GIF, and other formats, with OCR
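
The format list above implies some routing step from a file to its processor. A minimal sketch of that idea follows, assuming dispatch by file extension; the PROCESSORS table and pick_processor are hypothetical names for illustration and are not taken from the project's code.

```python
from pathlib import Path

# Hypothetical extension-to-processor table (an assumption, not the project's API).
PROCESSORS = {
    ".pdf": "pdf",
    ".docx": "docx",
    ".xlsx": "excel",
    ".xls": "excel",
    ".csv": "csv",
    ".pptx": "pptx",
    ".jpg": "image",
    ".jpeg": "image",
    ".png": "image",
    ".webp": "image",
    ".gif": "image",
}


def pick_processor(filename: str) -> str:
    """Return the processor label for a file, based on its extension."""
    ext = Path(filename).suffix.lower()  # case-insensitive match
    if ext not in PROCESSORS:
        raise ValueError(f"Unsupported file type: {filename!r}")
    return PROCESSORS[ext]
```

A lookup table keeps the "easy to add new document processors" property: supporting another format would mean adding one entry.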

🚀 Large File Processing

  • Adaptive chunking: Different strategies based on file size
  • Memory management: Streaming processing for 50MB+ files
  • Progress tracking: Real-time progress indicators
  • Timeout handling: Graceful handling of long-running operations
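
As a rough illustration of size-based chunking, the sketch below picks a chunk size from the file size and splits text into overlapping windows. The thresholds, chunk sizes, and function names are assumptions made for this example; the project's actual strategy may differ.

```python
from typing import Iterator

MB = 1024 * 1024


def chunk_size_for(file_size_bytes: int) -> int:
    """Pick a chunk size in characters. Thresholds are illustrative assumptions."""
    if file_size_bytes < 10 * MB:
        return 1000
    if file_size_bytes < 50 * MB:
        return 2000
    return 4000


def chunk_text(text: str, size: int, overlap: int = 100) -> Iterator[str]:
    """Yield windows of `size` characters; neighbours share `overlap` characters."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap
    start = 0
    while start < len(text):
        yield text[start:start + size]
        if start + size >= len(text):
            break  # the last window already reached the end of the text
        start += step
```

Because chunk_text is a generator, chunks can be embedded and stored one at a time instead of holding every chunk of a large file in memory.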

🧠 Advanced RAG Capabilities

  • Semantic search: Vector similarity with confidence scores
  • Cross-document queries: Search across multiple documents simultaneously
  • Source attribution: Citations with similarity scores
  • Hybrid retrieval: Combine semantic and keyword search
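
One common way to combine semantic and keyword search is a weighted sum of the two scores. The sketch below assumes that approach, with precomputed embedding vectors and a simple word-overlap keyword score; the alpha weight, the document layout, and all function names are hypothetical and not confirmed by the project.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def keyword_score(query: str, text: str) -> float:
    """Fraction of query words that also appear in the text."""
    q_words = set(query.lower().split())
    t_words = set(text.lower().split())
    return len(q_words & t_words) / len(q_words) if q_words else 0.0


def hybrid_search(query: str, query_vec: Sequence[float], docs: List[Dict],
                  alpha: float = 0.7, top_k: int = 3) -> List[Dict]:
    """Rank docs by alpha * semantic + (1 - alpha) * keyword score.

    Each doc is a dict with 'source', 'text', and 'vector' keys (assumed layout).
    """
    results = []
    for doc in docs:
        semantic = cosine(query_vec, doc["vector"])
        keyword = keyword_score(query, doc["text"])
        results.append({
            "source": doc["source"],  # kept so answers can cite their origin
            "semantic": semantic,
            "keyword": keyword,
            "score": alpha * semantic + (1 - alpha) * keyword,
        })
    results.sort(key=lambda r: r["score"], reverse=True)
    return results[:top_k]
```

Returning the source and both component scores with each hit is what makes source attribution with similarity scores possible downstream.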

🔌 Model Context Protocol (MCP) Integration

  • Universal tool interface: Standardized AI-to-tool communication
  • Auto-discovery: LangChain agents automatically find and use tools
  • Secure communication: Built-in permission controls
  • Extensible architecture: Easy to add new document processors

🏢 Enterprise Ready

  • Custom LLM endpoints: Support for any OpenAI-compatible API
  • Vector database options: ChromaDB (local) + Milvus (production)
  • Batch processing: Handles API rate limits and batch size constraints
  • Error recovery: Retry logic and graceful degradation
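
Batch processing and retry logic can be sketched with two small helpers: one that splits work into fixed-size batches, and one that retries a failing call with exponential backoff. The names batched and with_retry, the default delays, and the injectable sleep parameter are assumptions for illustration, not the project's implementation.

```python
import time
from typing import Callable, Iterator, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def batched(items: Sequence[T], batch_size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]


def with_retry(fn: Callable[[], R], retries: int = 3, base_delay: float = 1.0,
               sleep: Callable[[float], None] = time.sleep) -> R:
    """Call fn(); on failure wait base_delay * 2**attempt seconds and try again.

    The last exception is re-raised once `retries` extra attempts are used up.
    """
    if retries < 0:
        raise ValueError("retries must be non-negative")
    attempt = 0
    while True:
        try:
            return fn()
        except Exception:
            if attempt == retries:
                raise
            sleep(base_delay * 2 ** attempt)
            attempt += 1
```

In use, each batch of chunks would be wrapped in its own with_retry call, so a rate-limit error only repeats that batch and not the whole document.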

🏗️ Architecture

┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│    Streamlit    │    │    LangChain     │    │   MCP Server    │
│    Frontend     │◄──►│      Agent       │◄──►│     (Tools)     │
└─────────────────┘    └──────────────────┘    └─────────────────┘
                                 │
        ┌────────────────────────┼────────────────────────┐
        │                        ▼                        │
┌───────▼────────┐      ┌─────────────────┐        ┌──────▼──────┐
│    Document    │      │ Vector Database │        │   LLM API   │
│   Processors   │      │   (ChromaDB)    │        │  Endpoint   │
└────────────────┘      └─────────────────┘        └─────────────┘

🚀 Quick Start

Prerequisites

  • Python 3.11+
  • OpenAI API key or compatible LLM endpoint
  • 8GB+ RAM (for large file processing)

Installation

Clone the repository

git clone https://github.com/yourusername/rag-large-file-processor.git
cd rag-large-file-processor
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt

# Create .env file
cat > .env << EOF
OPENAI_API_KEY=your_openai_api_key_here
BASE_URL=https://api.openai.com/v1
MODEL_NAME=gpt-4o
VECTOR_DB_TYPE=chromadb
EOF

streamlit run streamlit_app.py
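
For reference, the four settings in the .env file could be read on the Python side roughly as follows. This is a sketch that assumes the variables are already present in the process environment (for example exported in the shell or loaded by a dotenv-style loader); load_settings is a hypothetical helper, and its defaults simply mirror the sample values above.

```python
import os
from typing import Mapping


def load_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Collect the settings named in the sample .env file.

    OPENAI_API_KEY is treated as required; the other defaults are assumptions
    that mirror the sample values.
    """
    api_key = env.get("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is not set")
    return {
        "api_key": api_key,
        "base_url": env.get("BASE_URL", "https://api.openai.com/v1"),
        "model_name": env.get("MODEL_NAME", "gpt-4o"),
        "vector_db_type": env.get("VECTOR_DB_TYPE", "chromadb").lower(),
    }
```

Pointing BASE_URL at another OpenAI-compatible endpoint is how a custom LLM endpoint would be selected under this sketch.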


