# MCP-RAG
**Your Personal NotebookLM for Claude Desktop**
Universal RAG (Retrieval-Augmented Generation) MCP server for Claude Desktop. Index documents via CLI, search them in Claude Desktop with 0% hallucination.
---
## What is MCP-RAG?
Think of it as **NotebookLM for Claude Desktop**:
- 📚 **Index any documents**: PDF, Word, PowerPoint, Excel, HWP (Hangul), TXT, MD
- 🔍 **Natural language search**: Ask questions in Claude Desktop
- ✅ **0% Hallucination**: Answers based ONLY on your documents
- 💻 **100% Local**: All data stays on your computer (ChromaDB)
- 🎯 **Simple workflow**: CLI for indexing → Claude Desktop for searching
---
## Architecture
```
┌─────────────────────┐
│   Your Documents    │
│  (PDF, DOCX, etc)   │
└──────────┬──────────┘
           │
           ▼
  [CLI: npm run cli add]
           │
           ▼
┌─────────────────────┐
│   ChromaDB Server   │ ◄─── Vector embeddings
│  (localhost:8000)   │
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│   MCP-RAG Server    │
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│   Claude Desktop    │ ◄─── You ask questions here!
└─────────────────────┘
```
**Two-Part System:**
1. **CLI** = Document management (add, delete, list)
2. **Claude Desktop** = Search and Q&A
---
## Quick Start
### 1. Install
```bash
git clone https://github.com/seanshin0214/mcp-rag.git
cd mcp-rag
npm install
pip install chromadb
```
### 2. Start ChromaDB Server
**Keep this running in a separate terminal:**
```bash
chroma run --host localhost --port 8000
```
### 3. Add Documents (CLI)
```bash
# Add single document
npm run cli add school "path/to/regulations.pdf"
# Add multiple documents
npm run cli add research "paper1.pdf"
npm run cli add research "paper2.docx"
npm run cli add work "handbook.pptx"
```
**Supported formats:**
- Documents: PDF, DOCX, HWP, TXT, MD
- Presentations: PPTX
- Spreadsheets: XLSX, XLS
### 4. Configure Claude Desktop
**Windows:** `%APPDATA%\Claude\claude_desktop_config.json`
**macOS:** `~/Library/Application Support/Claude/claude_desktop_config.json`
Add this:
```json
{
  "mcpServers": {
    "mcp-rag": {
      "command": "node",
      "args": ["/absolute/path/to/mcp-rag/src/index.js"]
    }
  }
}
```
**Important:** Use your actual installation path!
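
A relative path in `args` is an easy mistake to make here. The snippet below is a minimal sketch of a sanity check you could run on the parsed config before restarting Claude Desktop. The function name `checkServerEntry` is hypothetical and not part of MCP-RAG; it only inspects the `mcpServers` structure shown above.

```javascript
const path = require("node:path");

// Hypothetical helper (not part of MCP-RAG): returns a list of problems
// found in the "mcp-rag" entry of a parsed claude_desktop_config.json.
function checkServerEntry(config, serverName = "mcp-rag") {
  const entry = config && config.mcpServers && config.mcpServers[serverName];
  if (!entry) {
    return [`no "${serverName}" entry under mcpServers`];
  }
  const problems = [];
  if (typeof entry.command !== "string") {
    problems.push("command must be a string");
  }
  const args = Array.isArray(entry.args) ? entry.args : [];
  if (args.length === 0) {
    problems.push("args must contain the path to src/index.js");
  } else if (!path.posix.isAbsolute(args[0]) && !path.win32.isAbsolute(args[0])) {
    // Accept either POSIX or Windows absolute paths, whatever OS runs the check.
    problems.push(`args[0] is not an absolute path: ${args[0]}`);
  }
  return problems;
}

// Usage idea: parse the config file yourself and pass the object in, e.g.
// checkServerEntry(JSON.parse(fs.readFileSync(configPath, "utf8")))
```

An empty array means the entry has the expected shape; it does not prove that the file at that path exists.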
### 5. Restart Claude Desktop
### 6. Ask Questions!
In Claude Desktop:
```
"What does the school collection say about attendance?"
```
```
"Search the research collection for methodology"
```
```
"Show me all my collections"
```
---
## CLI Commands
```bash
# Add document
npm run cli add <collection> <file> [-d "description"]
# List all collections
npm run cli list
# Get collection info
npm run cli info <collection>
# Search test
npm run cli search <collection> "query"
# Delete collection
npm run cli delete <collection>
```
### Examples
```bash
# Add with description
npm run cli add school "regulations.pdf" -d "School regulations 2024"
# Add multiple files (PowerShell)
Get-ChildItem "*.docx" | ForEach-Object {
    npm run cli add MyCollection $_.FullName
}
# Check what's indexed
npm run cli list
npm run cli info school
```
---
## MCP Tools (Claude Desktop)
When you ask questions in Claude Desktop, these tools are automatically used:
| Tool | Description |
|------|-------------|
| `search_documents` | Search in specific collection or all collections |
| `list_collections` | List all available collections |
| `get_collection_info` | Get details about a collection |
**Note:** Document addition is CLI-only, not available in Claude Desktop.
---
## How It Works
### Indexing (CLI)
```
1. Read file (PDF/DOCX/PPTX/etc)
2. Extract text
3. Split into 500-token chunks (50-token overlap)
4. Generate embeddings (ChromaDB)
5. Store in collection
```
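
Step 3 above can be sketched as a sliding window over the token list. This is an illustration only: it assumes whitespace-separated words stand in for tokens, and the function name `chunkText` is hypothetical. The tokenizer and boundaries actually used in `src/indexer.js` may differ.

```javascript
// Illustrative sketch of fixed-size chunking with overlap.
// Assumption: "tokens" are whitespace-separated words, not model tokens.
function chunkText(text, chunkSize = 500, overlap = 50) {
  if (overlap >= chunkSize) {
    throw new Error("overlap must be smaller than chunkSize");
  }
  const tokens = text.split(/\s+/).filter(Boolean);
  const step = chunkSize - overlap; // each new chunk starts this far after the previous one
  const chunks = [];
  for (let start = 0; start < tokens.length; start += step) {
    chunks.push(tokens.slice(start, start + chunkSize).join(" "));
    if (start + chunkSize >= tokens.length) {
      break; // this chunk already reached the end of the document
    }
  }
  return chunks;
}
```

With the defaults, a 1000-word document yields three chunks starting at words 0, 450, and 900, so each chunk repeats the last 50 words of the previous one.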
### Searching (Claude Desktop)
```
1. You ask: "What's the attendance policy?"
2. MCP-RAG searches ChromaDB
3. Returns top 5 most relevant chunks
4. Claude answers using ONLY those chunks
```
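
Conceptually, steps 2 and 3 rank stored chunks by how close their embeddings are to the query embedding. The sketch below shows that idea with a brute-force cosine-similarity ranking over toy vectors. It is not how MCP-RAG or ChromaDB is implemented: ChromaDB uses its own index and distance metric, and the names `cosineSimilarity` and `topK` are hypothetical.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    return 0; // a zero vector has no direction, treat as unrelated
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the k chunks most similar to the query, highest score first.
// Each chunk is assumed to look like { text, embedding }.
function topK(queryEmbedding, chunks, k = 5) {
  return chunks
    .map((chunk) => ({
      ...chunk,
      score: cosineSimilarity(queryEmbedding, chunk.embedding),
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

Only the returned chunks are handed to Claude, which is why answers stay tied to the indexed documents.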
---
## Use Cases
### 🎓 Students
```bash
npm run cli add math "calculus-textbook.pdf"
npm run cli add physics "lecture-notes.docx"
```
→ "Explain the concept of derivatives from my math collection"
### 💼 Professionals
```bash
npm run cli add company "employee-handbook.pdf"
npm run cli add project "requirements.docx"
```
→ "What's our vacation policy?"
### 🔬 Researchers
```bash
# The add command takes one file at a time, so loop over the PDFs
for f in papers/*.pdf; do npm run cli add literature "$f"; done
npm run cli add notes "research-notes.md"
```
→ "Summarize the methodology from the literature collection"
---
## Features
- ✅ **Multi-collection support** - Organize by topic
- ✅ **Semantic search** - ChromaDB vector embeddings
- ✅ **Source attribution** - See which document/chunk
- ✅ **Relevance scoring** - Know how confident the match is
- ✅ **Multiple file formats** - PDF, DOCX, PPTX, XLSX, HWP, TXT, MD
- ✅ **100% local** - No cloud, all on your machine
- ✅ **0% hallucination** - Only document-based answers
---
## Comparison
| Feature | NotebookLM | MCP-RAG |
|---------|-----------|---------|
| Platform | Google Cloud | Local |
| AI Model | Gemini | Claude |
| Privacy | Cloud | 100% Local |
| Multi-collection | ❌ | ✅ |
| CLI | ❌ | ✅ |
| Cost | Free (limited) | Free (unlimited) |
---
## Troubleshooting
### ChromaDB Connection Error
**Problem:** `Cannot connect to ChromaDB`
**Solution:**
```bash
chroma run --host localhost --port 8000
```
Keep this terminal open!
### Claude Desktop: MCP Server Not Showing
1. Check `claude_desktop_config.json` syntax
2. Use absolute path (not relative)
3. Restart Claude Desktop completely
4. Check ChromaDB is running
### No Search Results
```bash
# Verify documents are indexed
npm run cli list
npm run cli info <collection>
# Re-index if needed
npm run cli add <collection> <file>
```
---
## Advanced
### Batch Add Files
**PowerShell:**
```powershell
Get-ChildItem "C:\docs\*.pdf" | ForEach-Object {
    npm run cli add MyCollection $_.FullName
}
```
**Bash:**
```bash
for f in /path/to/docs/*.pdf; do
    npm run cli add MyCollection "$f"
done
```
### Custom Chunk Size
Edit `src/indexer.js`:
```javascript
const CHUNK_SIZE = 500; // Tokens per chunk
const CHUNK_OVERLAP = 50; // Overlap between chunks
```
- Larger chunks = more context per result, fewer chunks overall
- Smaller chunks = more precise matches, more chunks overall
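
To get a feel for the trade-off, the arithmetic can be worked out directly. The sketch below assumes a simple sliding window where each chunk starts `CHUNK_SIZE - CHUNK_OVERLAP` tokens after the previous one. The function `estimateChunkCount` is hypothetical, and the real indexer may count differently.

```javascript
// Rough estimate of how many chunks a document produces, assuming a
// sliding window with step = chunkSize - overlap.
function estimateChunkCount(totalTokens, chunkSize = 500, overlap = 50) {
  if (overlap >= chunkSize) {
    throw new Error("overlap must be smaller than chunkSize");
  }
  if (totalTokens <= 0) return 0;
  if (totalTokens <= chunkSize) return 1;
  const step = chunkSize - overlap;
  return Math.ceil((totalTokens - overlap) / step);
}

// For a 10,000-token document:
console.log(estimateChunkCount(10000, 500, 50));   // 23
console.log(estimateChunkCount(10000, 1000, 100)); // 11
console.log(estimateChunkCount(10000, 250, 25));   // 45
```

Halving the chunk size roughly doubles the number of embeddings to store and search, while doubling it roughly halves that number.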
---
## Project Structure
```
mcp-rag/
├── src/
│   ├── index.js        # MCP server
│   ├── cli.js          # CLI tool
│   └── indexer.js      # Document processing
├── chroma/             # ChromaDB data (auto-created)
├── package.json
├── README.md
├── QUICK_START.md
└── HOW_TO_USE.md
```
---
## Requirements
- **Node.js** 18+
- **Python** 3.8+ (for ChromaDB)
- **Claude Desktop** (latest version)
---
## Contributing
Contributions welcome! This is a universal tool that can benefit many users.
---
## License
MIT License - see [LICENSE](LICENSE)
---
## Credits
Built with:
- [Model Context Protocol (MCP)](https://modelcontextprotocol.io/) - Anthropic
- [ChromaDB](https://www.trychroma.com/) - Vector database
- [pdf-parse](https://www.npmjs.com/package/pdf-parse) - PDF extraction
- [mammoth](https://www.npmjs.com/package/mammoth) - DOCX extraction
- [officeparser](https://www.npmjs.com/package/officeparser) - PPTX extraction
- [xlsx](https://www.npmjs.com/package/xlsx) - Excel extraction
- [node-hwp](https://www.npmjs.com/package/node-hwp) - HWP (Hangul) extraction
---
**MCP-RAG** - Your documents, Claude's intelligence, zero hallucination.