MCP-RAG

by seanshin0214
# MCP-RAG

**Your Personal NotebookLM for Claude Desktop**

Universal RAG (Retrieval-Augmented Generation) MCP server for Claude Desktop. Index documents via the CLI, then search them in Claude Desktop with 0% hallucination.

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Node.js Version](https://img.shields.io/badge/node-%3E%3D18.0.0-brightgreen)](https://nodejs.org/)
[![Python Version](https://img.shields.io/badge/python-%3E%3D3.8-blue)](https://python.org/)

---

## What is MCP-RAG?

Think of it as **NotebookLM for Claude Desktop**:

- 📚 **Index any documents**: PDF, Word, PowerPoint, Excel, HWP (Hangul), TXT, MD
- 🔍 **Natural language search**: Ask questions in Claude Desktop
- ✅ **0% Hallucination**: Answers based ONLY on your documents
- 💻 **100% Local**: All data stays on your computer (ChromaDB)
- 🎯 **Simple workflow**: CLI for indexing → Claude Desktop for searching

---

## Architecture

```
┌─────────────────────┐
│  Your Documents     │
│  (PDF, DOCX, etc)   │
└──────────┬──────────┘
           │
           ▼
   [CLI: npm run cli add]
           │
           ▼
┌─────────────────────┐
│  ChromaDB Server    │ ◄─── Vector embeddings
│  (localhost:8000)   │
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│  MCP-RAG Server     │
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│  Claude Desktop     │ ◄─── You ask questions here!
└─────────────────────┘
```

**Two-Part System:**

1. **CLI** = Document management (add, delete, list)
2. **Claude Desktop** = Search and Q&A

---

## Quick Start

### 1. Install

```bash
git clone https://github.com/seanshin0214/mcp-rag.git
cd mcp-rag
npm install
pip install chromadb
```

### 2. Start ChromaDB Server

**Keep this running in a separate terminal:**

```bash
chroma run --host localhost --port 8000
```

### 3. Add Documents (CLI)

```bash
# Add single document
npm run cli add school "path/to/regulations.pdf"

# Add multiple documents
npm run cli add research "paper1.pdf"
npm run cli add research "paper2.docx"
npm run cli add work "handbook.pptx"
```

**Supported formats:**

- Documents: PDF, DOCX, HWP, TXT, MD
- Presentations: PPTX
- Spreadsheets: XLSX, XLS

### 4. Configure Claude Desktop

**Windows:** `%APPDATA%\Claude\claude_desktop_config.json`

**macOS:** `~/Library/Application Support/Claude/claude_desktop_config.json`

Add this:

```json
{
  "mcpServers": {
    "mcp-rag": {
      "command": "node",
      "args": ["/absolute/path/to/mcp-rag/src/index.js"]
    }
  }
}
```

**Important:** Use your actual installation path!

### 5. Restart Claude Desktop

### 6. Ask Questions!

In Claude Desktop:

```
"What does the school collection say about attendance?"
```

```
"Search the research collection for methodology"
```

```
"Show me all my collections"
```

---

## CLI Commands

```bash
# Add document (the -- keeps npm from consuming the -d option itself)
npm run cli -- add <collection> <file> [-d "description"]

# List all collections
npm run cli list

# Get collection info
npm run cli info <collection>

# Search test
npm run cli search <collection> "query"

# Delete collection
npm run cli delete <collection>
```

### Examples

```bash
# Add with description
npm run cli -- add school "regulations.pdf" -d "School regulations 2024"

# Add multiple files (PowerShell)
Get-ChildItem "*.docx" | ForEach-Object { npm run cli add MyCollection $_.FullName }

# Check what's indexed
npm run cli list
npm run cli info school
```

---

## MCP Tools (Claude Desktop)

When you ask questions in Claude Desktop, these tools are used automatically:

| Tool | Description |
|------|-------------|
| `search_documents` | Search in a specific collection or in all collections |
| `list_collections` | List all available collections |
| `get_collection_info` | Get details about a collection |

**Note:** Adding documents is CLI-only; it is not available from Claude Desktop.

---

## How It Works

### Indexing (CLI)

```
1. Read file (PDF/DOCX/PPTX/etc)
2. Extract text
3. Split into 500-token chunks (50-token overlap)
4. Generate embeddings (ChromaDB)
5. Store in collection
```

### Searching (Claude Desktop)

```
1. You ask: "What's the attendance policy?"
2. MCP-RAG searches ChromaDB
3. Returns top 5 most relevant chunks
4. Claude answers using ONLY those chunks
```

---

## Use Cases

### 📚 Students

```bash
npm run cli add math "calculus-textbook.pdf"
npm run cli add physics "lecture-notes.docx"
```

→ "Explain the concept of derivatives from my math collection"

### 🏢 Professionals

```bash
npm run cli add company "employee-handbook.pdf"
npm run cli add project "requirements.docx"
```

→ "What's our vacation policy?"

### 🔬 Researchers

```bash
# The CLI takes one file per call, so loop over the folder
for f in papers/*.pdf; do npm run cli add literature "$f"; done
npm run cli add notes "research-notes.md"
```

→ "Summarize the methodology from the literature collection"

---

## Features

- ✅ **Multi-collection support** - Organize by topic
- ✅ **Semantic search** - ChromaDB vector embeddings
- ✅ **Source attribution** - See which document and chunk an answer came from
- ✅ **Relevance scoring** - See how closely each match fits the query
- ✅ **Multiple file formats** - PDF, DOCX, PPTX, XLSX, HWP, TXT, MD
- ✅ **100% local** - No cloud, all on your machine
- ✅ **0% hallucination** - Only document-based answers

---

## Comparison

| Feature | NotebookLM | MCP-RAG |
|---------|-----------|---------|
| Platform | Google Cloud | Local |
| AI Model | Gemini | Claude |
| Privacy | Cloud | 100% Local |
| Multi-collection | ❌ | ✅ |
| CLI | ❌ | ✅ |
| Cost | Free (limited) | Free (unlimited) |

---

## Troubleshooting

### ChromaDB Connection Error

**Problem:** `Cannot connect to ChromaDB`

**Solution:**

```bash
chroma run --host localhost --port 8000
```

Keep this terminal open!

### Claude Desktop: MCP Server Not Showing

1. Check the `claude_desktop_config.json` syntax
2. Use an absolute path (not a relative one)
3. Restart Claude Desktop completely
4. Check that ChromaDB is running

### No Search Results

```bash
# Verify documents are indexed
npm run cli list
npm run cli info <collection>

# Re-index if needed
npm run cli add <collection> <file>
```

---

## Advanced

### Batch Add Files

**PowerShell:**

```powershell
Get-ChildItem "C:\docs\*.pdf" | ForEach-Object {
    npm run cli add MyCollection $_.FullName
}
```

**Bash:**

```bash
for f in /path/to/docs/*.pdf; do
    npm run cli add MyCollection "$f"
done
```

### Custom Chunk Size

Edit `src/indexer.js`:

```javascript
const CHUNK_SIZE = 500;    // Tokens per chunk
const CHUNK_OVERLAP = 50;  // Overlap between chunks
```

- Larger chunks = more context per result, fewer chunks
- Smaller chunks = more precise matches, more chunks

---

## Project Structure

```
mcp-rag/
├── src/
│   ├── index.js          # MCP server
│   ├── cli.js            # CLI tool
│   └── indexer.js        # Document processing
├── chroma/               # ChromaDB data (auto-created)
├── package.json
├── README.md
├── QUICK_START.md
└── HOW_TO_USE.md
```

---

## Requirements

- **Node.js** 18+
- **Python** 3.8+ (for ChromaDB)
- **Claude Desktop** (latest version)

---

## Contributing

Contributions welcome! This is a universal tool that can benefit many users.

---

## License

MIT License - see [LICENSE](LICENSE)

---

## Credits

Built with:

- [Model Context Protocol (MCP)](https://modelcontextprotocol.io/) - Anthropic
- [ChromaDB](https://www.trychroma.com/) - Vector database
- [pdf-parse](https://www.npmjs.com/package/pdf-parse) - PDF extraction
- [mammoth](https://www.npmjs.com/package/mammoth) - DOCX extraction
- [officeparser](https://www.npmjs.com/package/officeparser) - PPTX extraction
- [xlsx](https://www.npmjs.com/package/xlsx) - Excel extraction
- [node-hwp](https://www.npmjs.com/package/node-hwp) - HWP (Hangul) extraction

---

**MCP-RAG** - Your documents, Claude's intelligence, zero hallucination.
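
---

## Appendix: Chunking Sketch

To make the chunking step from "How It Works" concrete (500-token chunks with a 50-token overlap), here is a minimal sketch. It is not the code in `src/indexer.js`: the function name `chunkText` is hypothetical, and tokens are approximated by whitespace-separated words, which is an assumption; the project may count tokens differently.

```javascript
// Hypothetical sketch of overlapping chunking; NOT the project's indexer.
// Assumption: a "token" is a whitespace-separated word.
const CHUNK_SIZE = 500;    // Tokens per chunk
const CHUNK_OVERLAP = 50;  // Tokens shared by neighbouring chunks

function chunkText(text, chunkSize = CHUNK_SIZE, overlap = CHUNK_OVERLAP) {
  if (overlap >= chunkSize) {
    throw new RangeError("overlap must be smaller than chunkSize");
  }
  const tokens = text.split(/\s+/).filter(Boolean);
  const step = chunkSize - overlap; // how far each new chunk advances
  const chunks = [];
  for (let start = 0; start < tokens.length; start += step) {
    chunks.push(tokens.slice(start, start + chunkSize).join(" "));
    // Stop once a chunk has reached the end of the text.
    if (start + chunkSize >= tokens.length) break;
  }
  return chunks;
}
```

With the default settings each chunk starts 450 words after the previous one, so a 1,200-word text yields three chunks, and the last 50 words of one chunk reappear as the first 50 words of the next. That shared window is what lets a sentence that straddles a chunk boundary still be retrieved intact.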
