
πŸ“š MCP-RAG

MCP-RAG is a Retrieval-Augmented Generation (RAG) system built on the Model Context Protocol (MCP). It handles large files (up to 200MB) with intelligent chunking strategies, supports multiple document formats, and offers enterprise-grade reliability.

Python 3.11+ License: MIT MCP

🌟 Features

πŸ“„ Multi-Format Document Support

  • PDF: Intelligent page-by-page processing with table detection

  • DOCX: Paragraph and table extraction with formatting preservation

  • Excel: Sheet-aware processing with column context (.xlsx/.xls)

  • CSV: Smart row batching with header preservation

  • PPTX: PowerPoint presentation support

  • Images: Support for JPEG, PNG, WebP, GIF, and other formats with OCR text extraction
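Multi-format support of this kind is typically driven by an extension-to-processor dispatch table. A minimal sketch of that idea (the processor names and mapping here are hypothetical, not taken from this project's code):

```python
from pathlib import Path

# Hypothetical mapping from file extension to processor name;
# the actual project may organize its processors differently.
PROCESSORS = {
    ".pdf": "pdf_processor",
    ".docx": "docx_processor",
    ".xlsx": "excel_processor",
    ".xls": "excel_processor",
    ".csv": "csv_processor",
    ".pptx": "pptx_processor",
    ".jpg": "image_ocr_processor",
    ".jpeg": "image_ocr_processor",
    ".png": "image_ocr_processor",
    ".webp": "image_ocr_processor",
    ".gif": "image_ocr_processor",
}

def pick_processor(path: str) -> str:
    """Return the processor name for a file, or raise for unsupported types."""
    ext = Path(path).suffix.lower()
    try:
        return PROCESSORS[ext]
    except KeyError:
        raise ValueError(f"Unsupported file type: {ext}") from None
```

A table like this also makes the architecture easy to extend: adding a new format is one new entry plus one new processor.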

πŸš€ Large File Processing

  • Adaptive chunking: Different strategies based on file size

  • Memory management: Streaming processing for 50MB+ files

  • Progress tracking: Real-time progress indicators

  • Timeout handling: Graceful handling of long-running operations
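Adaptive chunking usually means choosing a processing mode from the file size before reading the content. A sketch of that selection logic, with illustrative thresholds and chunk sizes that are assumptions rather than the project's actual values:

```python
# Hypothetical size-based strategy selection; thresholds and chunk
# sizes are illustrative, not this project's documented defaults.
def choose_chunking(size_bytes: int) -> dict:
    mb = size_bytes / (1024 * 1024)
    if mb < 5:
        # Small files: load fully into memory, fine-grained chunks.
        return {"mode": "in_memory", "chunk_chars": 1000, "overlap": 200}
    if mb < 50:
        # Medium files: process in batches of pages/rows.
        return {"mode": "batched", "chunk_chars": 2000, "overlap": 200}
    # 50MB+: stream the file so peak memory stays bounded.
    return {"mode": "streaming", "chunk_chars": 4000, "overlap": 400}
```

The key property is that the streaming path never materializes the whole document, which is what makes 200MB inputs feasible on an 8GB machine.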

🧠 Advanced RAG Capabilities

  • Semantic search: Vector similarity with confidence scores

  • Cross-document queries: Search across multiple documents simultaneously

  • Source attribution: Citations with similarity scores

  • Hybrid retrieval: Combine semantic and keyword search
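One standard way to combine semantic and keyword results is reciprocal rank fusion (RRF). This is a generic sketch assuming each retriever returns a ranked list of document IDs; the project may well use a different fusion method:

```python
def reciprocal_rank_fusion(rankings, k: int = 60):
    """Merge ranked lists of doc IDs; higher fused score ranks earlier."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1/(k + rank); k=60 is a common default.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

semantic_hits = ["doc_a", "doc_b", "doc_c"]  # from vector similarity
keyword_hits = ["doc_a", "doc_d"]            # from keyword search
fused = reciprocal_rank_fusion([semantic_hits, keyword_hits])
```

Documents that appear high in both lists (here `doc_a`) accumulate the largest fused score, which is the behavior hybrid retrieval is after.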

πŸ”Œ Model Context Protocol (MCP) Integration

  • Universal tool interface: Standardized AI-to-tool communication

  • Auto-discovery: LangChain agents automatically find and use tools

  • Secure communication: Built-in permission controls

  • Extensible architecture: Easy to add new document processors

🏒 Enterprise Ready

  • Custom LLM endpoints: Support for any OpenAI-compatible API

  • Vector database options: ChromaDB (local) + Milvus (production)

  • Batch processing: Handles API rate limits and batch size constraints

  • Error recovery: Retry logic and graceful degradation
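Retry logic for rate-limited APIs is commonly implemented as exponential backoff with jitter. A generic sketch (the project's actual retry parameters and error handling are not documented here):

```python
import random
import time

def call_with_retry(fn, max_attempts: int = 5, base_delay: float = 1.0):
    """Call fn(), retrying with exponential backoff plus jitter on failure."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of retries: propagate the last error
            # Delay doubles each attempt; jitter avoids thundering herds.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            time.sleep(delay)
```

In a real client the `except` clause would match only retryable errors (e.g. HTTP 429 or 5xx responses) rather than bare `Exception`.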


πŸ—οΈ Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ Streamlit β”‚ β”‚ LangChain β”‚ β”‚ MCP Server β”‚ β”‚ Frontend │◄──►│ Agent │◄──►│ (Tools) β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ β–Ό β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β” β”‚ Document β”‚ β”‚ Vector Database β”‚ β”‚ LLM API β”‚ β”‚ Processors β”‚ β”‚ (ChromaDB) β”‚ β”‚ Endpoint β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

πŸš€ Quick Start

Prerequisites

  • Python 3.11+

  • OpenAI API key or compatible LLM endpoint

  • 8GB+ RAM (for large file processing)

Installation

```bash
# Clone the repository
git clone https://github.com/yourusername/rag-large-file-processor.git
cd rag-large-file-processor

# Create and activate a virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Create .env file
cat > .env << EOF
OPENAI_API_KEY=your_openai_api_key_here
BASE_URL=https://api.openai.com/v1
MODEL_NAME=gpt-4o
VECTOR_DB_TYPE=chromadb
EOF

# Launch the app
streamlit run streamlit_app.py
```
