RAG MCP Server
MCP server with Streamable HTTP transport that indexes PDF documents into Qdrant and exposes semantic search as MCP tools. Works with any MCP-compatible client: Claude Desktop, Claude Code, or a custom agent.
Stack
Layer | Library |
HTTP framework | Hono + |
MCP protocol | |
Vector DB | Qdrant via |
Embeddings | FastEmbed (BAAI/bge-small-en-v1.5) |
PDF parsing | pdf-parse |
Architecture
┌──────────────────────────────────────────────────────────┐
│ MCP Client │
│ (Claude Desktop / Claude Code / Custom Agent) │
└────────────────────┬─────────────────────────────────────┘
│ HTTP port 38080 → /mcp
▼
┌──────────────────────────────────────────────────────────┐
│ RAG MCP Server (Hono) │
│ │
│ ┌──────────────────┐ ┌─────────────────┐ ┌────────┐ │
│ │ index_document │ │ semantic_search │ │ list │ │
│ └────────┬─────────┘ └────────┬────────┘ └───┬────┘ │
│ │ │ │ │
│ ┌────────▼─────────────────────▼───────────────▼──────┐ │
│ │ FastEmbed — BAAI/bge-small-en-v1.5 │ │
│ │ 384 dims · local ONNX │ │
│ └──────────────────────────┬──────────────────────────┘ │
└─────────────────────────────┼────────────────────────────┘
│ REST port 39333
▼
┌──────────────────────────────────────────────────────────┐
│ Qdrant │
│ Vector database │
│ collection: rag_documents │
│ volume: qdrant_storage (persistent) │
└──────────────────────────────────────────────────────────┘
Why RAG with Qdrant instead of a direct summary?
When you ask an LLM to summarize a full document, it receives all the text at once. With RAG, only the most relevant chunks for your query are retrieved first. The difference is substantial:
WITHOUT RAG — Full document as context
────────────────────────────────────────────────────────────
┌─────────┐ ┌──────────────────────────────────────┐ ┌─────┐
│ Query │────►│ Entire book (~150 000 tokens) │────►│ LLM │
└─────────┘ │ ⚠ May exceed the context window │ └─────┘
│ ⚠ High cost proportional to size │
│ ⚠ Model gets distracted by noise │
└──────────────────────────────────────┘
WITH RAG — Only relevant chunks
────────────────────────────────────────────────────────────
┌─────────┐ ┌────────────┐ ┌───────────────────────┐ ┌─────┐
│ Query │────►│ Qdrant │────►│ Top-k chunks │────►│ LLM │
└─────────┘ │ (semantic │ │ (~5 000–8 000 tokens)│ └─────┘
│ search) │ │ ✓ Only what matters │
└────────────┘ └───────────────────────┘
Benefits
Aspect | Without RAG | With RAG + Qdrant |
Input tokens | Entire document | Only relevant chunks |
Cost per query | High (scales with document size) | Low and predictable |
Accuracy | LLM must filter noise itself | Qdrant pre-filters by similarity |
Context window limit | Can be exceeded | Rarely a concern (input is bounded by top-k) |
Multiple documents | Impractical in a single call | Unified search across all |
Response speed | Slower (more tokens) | Faster |
Citable sources | No | Yes (chunk + score + source file) |
Core idea: the LLM reasons, Qdrant remembers. Each does what it does best.
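To make the token difference concrete, here is a small arithmetic sketch. It uses only the illustrative figures from the diagrams above (about 150 000 tokens for the whole book, 5 000 to 8 000 tokens for the retrieved chunks); real numbers depend on the document, the tokenizer, and the chosen top-k.

```typescript
// Illustrative figures taken from the diagrams above, not measurements.
const fullDocumentTokens = 150_000;
const ragContextTokens = { min: 5_000, max: 8_000 };

// Ratio of input tokens without RAG to input tokens with RAG.
function reductionFactor(fullTokens: number, contextTokens: number): number {
  return fullTokens / contextTokens;
}

const best = reductionFactor(fullDocumentTokens, ragContextTokens.min);  // 30
const worst = reductionFactor(fullDocumentTokens, ragContextTokens.max); // 18.75

console.log(`Input shrinks by a factor of about ${worst} to ${best} per query`);
```

Under these assumptions each query sends roughly 19 to 30 times fewer input tokens, which is where the cost and latency rows in the table come from.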
Quick start with Docker
The recommended way to run the server is Docker Compose. A single command starts both Qdrant and the MCP server.
Exposed ports
Service | Host port | Internal port | Description |
MCP Server | 38080 | 3000 | MCP endpoint |
Qdrant REST | 39333 | 6333 | Qdrant REST API (dashboard + client) |
Qdrant gRPC | 39334 | 6334 | Qdrant gRPC API |
Step 1 — Add your documents
Drop your PDFs into the .docs/ folder:
rag-demo/
└── .docs/
    └── my-document.pdf
Step 2 — Start the services
docker compose up -d
This starts:
Qdrant — vector database with a persistent volume
RAG MCP Server — auto-indexes all PDFs in .docs/ on startup
[+] Running 2/2
✔ Container rag-qdrant Started
✔ Container rag-mcp Started
First run: FastEmbed downloads BAAI/bge-small-en-v1.5 (~25 MB) into a persistent volume, so later starts skip the download.
Step 3 — Verify everything is running
curl http://localhost:38080/health
Expected response:
{
"status": "ok",
"sessions": 0,
"qdrant": "http://qdrant:6333",
"collection": "rag_documents"
}
Step 4 — Connect your MCP client
MCP endpoint:
http://localhost:38080/mcp
Claude Desktop — add to claude_desktop_config.json:
{
"mcpServers": {
"rag": {
"type": "http",
"url": "http://localhost:38080/mcp"
}
}
}
Claude Code:
claude mcp add rag --transport http http://localhost:38080/mcp
Quick start (local dev)
If you prefer to run without Docker:
# Start only Qdrant
docker compose up qdrant -d
# Configure environment
cp .env.example .env
# Install and run
npm install
npm run dev
The server listens on http://localhost:3000/mcp.
Step-by-step usage guide
Indexing flow
PDF on disk
│
▼
index_document
│
├─► Extract full text (pdf-parse)
│
├─► Split into chunks (512 chars · 64 overlap)
│
├─► Generate embeddings (FastEmbed · 384 dims)
│
└─► Store in Qdrant (vectors + metadata)
Search and summarization flow
Natural language query
│
▼
semantic_search
│
├─► Embed the query (FastEmbed)
│
├─► Cosine similarity search (Qdrant)
│
└─► Return top-k chunks with similarity score
│
▼
LLM (Claude)
│
└─► Generate answer grounded in the retrieved chunks
Full example: The 7 Habits of Highly Effective People
A real end-to-end walkthrough using Stephen R. Covey's book.
1. Index the document
Place the PDF in .docs/ and call index_document:
Input:
Tool: index_document
path: .docs/the-7-habits-of-highly-effective-people.pdf
Output:
╔══════════════════════════════════════╗
║ Document indexed successfully ║
╚══════════════════════════════════════╝
File : the-7-habits-of-highly-effective-people.pdf
Chunks : 1823
Vectors : 1823 × 384 dims
Time : 47.3s
Ready for semantic search.
Note: Large documents may take a few minutes on first indexing. Re-indexing the same file automatically replaces its previous vectors.
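The replace-on-re-index behaviour mentioned in the note can be pictured as delete-then-insert keyed by source file. The sketch below uses a hypothetical in-memory class (`InMemoryIndex`) instead of Qdrant; it is an assumption about the general approach, not a description of how this server is implemented.

```typescript
// Hypothetical in-memory stand-in for a vector collection; not the server's code.
interface StoredChunk {
  source: string;     // file name the chunk came from
  chunkIndex: number; // position of the chunk within that file
  text: string;
}

class InMemoryIndex {
  private points: StoredChunk[] = [];

  // Drop every chunk previously stored for `source`, then add the new ones,
  // so re-indexing never leaves stale or duplicated chunks behind.
  reindex(source: string, chunks: string[]): number {
    this.points = this.points.filter((p) => p.source !== source);
    chunks.forEach((text, chunkIndex) => {
      this.points.push({ source, chunkIndex, text });
    });
    return chunks.length;
  }

  // Count chunks for one source, or for the whole index when omitted.
  count(source?: string): number {
    return source === undefined
      ? this.points.length
      : this.points.filter((p) => p.source === source).length;
  }
}

const index = new InMemoryIndex();
index.reindex("book.pdf", ["a", "b", "c"]);
index.reindex("notes.pdf", ["x"]);
index.reindex("book.pdf", ["a2", "b2"]); // replaces the three earlier chunks
console.log(index.count("book.pdf"), index.count()); // 2 3
```

Keying the delete on the source file name is what keeps the chunk count for a document stable across repeated indexing runs.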
2. List indexed documents
Input:
Tool: list_indexed_documents
Output:
Indexed documents (1):
1. the-7-habits-of-highly-effective-people.pdf
3. Request a summary using semantic search
To summarize all 7 habits, run one targeted search per habit and then ask the LLM to synthesize the results.
Searches (one per habit):
semantic_search: "habit 1 be proactive personal vision"
semantic_search: "habit 2 begin with the end in mind personal mission"
semantic_search: "habit 3 put first things first quadrant II time management"
semantic_search: "habit 4 think win-win mutual benefit agreements"
semantic_search: "habit 5 seek first to understand empathic listening"
semantic_search: "habit 6 synergize creative cooperation"
semantic_search: "habit 7 sharpen the saw balanced renewal"
Each search returns ranked chunks with a similarity score and their source:
[1] score=0.8205 source="the-7-habits-..." chunk=261
"Habit 1 says: You are the creator. You are in charge.
It is based on the four human endowments — self-awareness,
imagination, conscience, and independent will..."
[2] score=0.8100 ...
[3] score=0.8095 ...
Summary generated from the retrieved chunks:
The 7 Habits of Highly Effective People — Summary
The book builds an effectiveness framework organized in two blocks: Private Victory (habits 1–3, independence) and Public Victory (habits 4–6, interdependence), topped by a seventh renewal habit.
Habit 1 — Be Proactive: You are the creator of your life. You choose your response to any stimulus using four human endowments: self-awareness, imagination, conscience, and independent will.
Habit 2 — Begin with the End in Mind: Everything is created twice — first mentally, then physically. Define a personal mission statement that guides every decision.
Habit 3 — Put First Things First: Focus on Quadrant II (important, not urgent): planning, relationships, personal renewal. Learn to say a firm "no" to urgent but unimportant demands.
Habit 4 — Think Win/Win: Seek agreements where all parties benefit. Built on five elements: desired results, guidelines, resources, accountability, and consequences.
Habit 5 — Seek First to Understand: Listen genuinely before speaking. Empathic listening requires consideration; being understood requires courage. Both are essential.
Habit 6 — Synergize: The whole is greater than the sum of its parts. High trust and high cooperation produce outcomes no individual party could reach alone.
Habit 7 — Sharpen the Saw: Renew yourself continuously across four dimensions: physical, mental, social/emotional, and spiritual. Without renewal, the other six habits deteriorate.
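When several targeted searches feed one summary, as in the per-habit searches above, the same chunk can come back more than once. The following is a minimal sketch of how a client might merge the result sets before handing them to the LLM. The `SearchHit` shape mirrors the fields in the sample output (score, source, chunk); the type, the function names, and the sample data are hypothetical.

```typescript
// Field names follow the sample output above; the type itself is an assumption.
interface SearchHit {
  score: number;
  source: string;
  chunk: number;
  text: string;
}

// Merge hits from several searches: keep the best score per (source, chunk),
// sort by score descending, and cap the total number of chunks.
function mergeHits(resultSets: SearchHit[][], maxChunks: number): SearchHit[] {
  const bestByKey: Record<string, SearchHit> = {};
  for (const hits of resultSets) {
    for (const hit of hits) {
      const key = `${hit.source}#${hit.chunk}`;
      const seen = bestByKey[key];
      if (seen === undefined || hit.score > seen.score) bestByKey[key] = hit;
    }
  }
  return Object.keys(bestByKey)
    .map((key) => bestByKey[key])
    .sort((a, b) => b.score - a.score)
    .slice(0, maxChunks);
}

// Render the merged hits as a context block that keeps each chunk citable.
function buildContext(hits: SearchHit[]): string {
  return hits
    .map((h, i) => `[${i + 1}] source=${h.source} chunk=${h.chunk} score=${h.score.toFixed(4)}\n${h.text}`)
    .join("\n\n");
}

// Made-up sample data for illustration only.
const searchA: SearchHit[] = [
  { score: 0.8205, source: "book.pdf", chunk: 261, text: "Habit 1 says: You are the creator." },
];
const searchB: SearchHit[] = [
  { score: 0.79, source: "book.pdf", chunk: 261, text: "Habit 1 says: You are the creator." },
  { score: 0.81, source: "book.pdf", chunk: 402, text: "Sample text of another chunk." },
];
const merged = mergeHits([searchA, searchB], 10);
console.log(buildContext(merged));
```

Keeping the source and chunk number in the context block is what lets the final summary cite where each statement came from.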
MCP endpoint
http://localhost:38080/mcp (Docker)
http://localhost:3000/mcp (local dev)
The server implements the Streamable HTTP MCP transport with stateful sessions:
Method | Purpose |
POST | Initialise session / send client messages |
GET | SSE stream for server → client notifications |
DELETE | Explicitly close a session |
Sessions are tracked with Mcp-Session-Id headers.
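The session lifecycle can be sketched as plain request descriptions. The header names and JSON-RPC fields below follow the MCP Streamable HTTP transport; the helper names, the client name and version, and the protocol version string are assumptions, and whether this server accepts that exact protocol version is not confirmed here.

```typescript
// Hypothetical helpers that only describe requests; nothing is sent.
interface HttpRequestInit {
  method: "POST" | "GET" | "DELETE";
  headers: Record<string, string>;
  body?: string;
}

// First POST of a session: there is no Mcp-Session-Id yet.
function buildInitializeRequest(): HttpRequestInit {
  return {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Accept: "application/json, text/event-stream",
    },
    body: JSON.stringify({
      jsonrpc: "2.0",
      id: 1,
      method: "initialize",
      params: {
        protocolVersion: "2025-03-26", // assumed version
        capabilities: {},
        clientInfo: { name: "example-client", version: "0.1.0" }, // placeholders
      },
    }),
  };
}

// Every later request echoes the session id returned by the server.
function withSession(init: HttpRequestInit, sessionId: string): HttpRequestInit {
  return { ...init, headers: { ...init.headers, "Mcp-Session-Id": sessionId } };
}

// Explicitly closing a session is a DELETE carrying the same header.
function buildCloseRequest(sessionId: string): HttpRequestInit {
  return withSession({ method: "DELETE", headers: {} }, sessionId);
}

// Usage with fetch (not executed here):
//   const res = await fetch("http://localhost:38080/mcp", buildInitializeRequest());
//   const sessionId = res.headers.get("mcp-session-id");
```

MCP clients such as Claude Desktop or Claude Code perform this handshake themselves; the sketch is only useful when writing a custom agent or debugging with raw HTTP.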
MCP Tools
index_document
Index a single PDF file. Re-indexing the same file replaces its previous vectors automatically.
{ "path": "/absolute/path/to/file.pdf" }
{ "path": ".docs/my-book.pdf" }
semantic_search
Search indexed documents with natural language.
{
"query": "What are the principles of the Private Victory?",
"limit": 5,
"source": "the-7-habits-of-highly-effective-people.pdf"
}
Parameter | Type | Description |
query | string | Natural language search query |
limit | number | Max results to return (default: 5, max: 20) |
source | string | Restrict search to a specific document (optional) |
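For a custom agent, the parameters above end up inside a JSON-RPC `tools/call` request. The sketch below builds such a request and applies the documented default (5) and maximum (20) on the client side. The helper names are hypothetical, and clamping out-of-range values is an assumption: the server may instead reject them.

```typescript
interface SemanticSearchArgs {
  query: string;
  limit?: number;
  source?: string;
}

// Apply the documented default and maximum before sending.
// Clamping (rather than failing) is an assumption made for this sketch.
function normalizeArgs(args: SemanticSearchArgs): SemanticSearchArgs {
  const query = args.query.trim();
  if (query.length === 0) throw new Error("query must not be empty");
  const requested =
    args.limit === undefined || !isFinite(args.limit) ? 5 : Math.floor(args.limit);
  const limit = Math.min(20, Math.max(1, requested));
  const normalized: SemanticSearchArgs = { query, limit };
  if (args.source !== undefined) normalized.source = args.source;
  return normalized;
}

// Wrap the arguments in a JSON-RPC 2.0 `tools/call` request.
function buildToolCall(id: number, args: SemanticSearchArgs) {
  return {
    jsonrpc: "2.0" as const,
    id,
    method: "tools/call" as const,
    params: { name: "semantic_search", arguments: normalizeArgs(args) },
  };
}

// A limit of 50 is clamped to the documented maximum of 20.
console.log(JSON.stringify(buildToolCall(2, { query: "Private Victory principles", limit: 50 })));
```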
list_indexed_documents
List all documents currently indexed.
{}
Configuration
Variable | Default | Description |
 |  | HTTP server port |
 |  | Qdrant instance URL |
 | (empty) | API key for Qdrant Cloud |
 |  | Collection name |
 |  | Default directory for auto-indexing on startup |
 |  | Characters per chunk |
 |  | Overlap between adjacent chunks |
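The last two settings control a sliding-window split. Here is a minimal sketch of character-based chunking with overlap, using the 512 and 64 values from the indexing flow as defaults. The function name and the exact boundary handling are assumptions; the server's own splitter may differ, for example by respecting sentence boundaries.

```typescript
// Split text into windows of `chunkSize` characters where each window
// starts `chunkSize - overlap` characters after the previous one.
function chunkText(text: string, chunkSize = 512, overlap = 64): string[] {
  if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
    throw new RangeError("require chunkSize > 0 and 0 <= overlap < chunkSize");
  }
  const step = chunkSize - overlap;
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once the current window has reached the end of the text.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}

const demo = chunkText("abcdefghij", 4, 1);
console.log(demo); // ["abcd", "defg", "ghij"]
```

The overlap means the last 64 characters of one chunk are repeated at the start of the next, so a sentence that straddles a boundary is still fully present in at least one chunk more often than with a hard cut.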
Scripts
npm run dev # development with hot-reload (tsx watch)
npm run build # production build (tsup → dist/)
npm start # run production build
npm run typecheck # type check without emitting
npm run test # unit tests (vitest)
npm run test:integration # integration tests (requires Qdrant)
npm run test:all # unit + integration