RAG MCP Server

by devsinsight

TypeScript

Remote

An MCP server that uses the Streamable HTTP transport, indexes PDF documents into Qdrant, and exposes semantic search as MCP tools. It works with any MCP-compatible client, such as Claude Desktop, Claude Code, or a custom agent.

Stack

Layer	Library
HTTP framework	Hono + `@hono/node-server`
MCP protocol	`@modelcontextprotocol/sdk` (Streamable HTTP transport)
Vector DB	Qdrant via `@qdrant/js-client-rest`
Embeddings	FastEmbed — `BAAI/bge-small-en-v1.5` (384 dims, local ONNX)
PDF parsing	`pdf-parse`

Architecture

┌──────────────────────────────────────────────────────────┐
│                    MCP Client                            │
│        (Claude Desktop / Claude Code / Custom Agent)     │
└────────────────────┬─────────────────────────────────────┘
                     │  HTTP  port 38080  →  /mcp
                     ▼
┌──────────────────────────────────────────────────────────┐
│              RAG MCP Server  (Hono)                      │
│                                                          │
│  ┌──────────────────┐  ┌─────────────────┐  ┌────────┐   │
│  │  index_document  │  │ semantic_search │  │  list  │   │
│  └────────┬─────────┘  └────────┬────────┘  └───┬────┘   │
│           │                     │               │        │
│  ┌────────▼─────────────────────▼───────────────▼──────┐ │
│  │          FastEmbed  —  BAAI/bge-small-en-v1.5       │ │
│  │               384 dims · local ONNX                 │ │
│  └──────────────────────────┬──────────────────────────┘ │
└─────────────────────────────┼────────────────────────────┘
                              │  REST  port 39333
                              ▼
┌──────────────────────────────────────────────────────────┐
│                      Qdrant                              │
│                  Vector database                         │
│             collection: rag_documents                    │
│             volume: qdrant_storage (persistent)          │
└──────────────────────────────────────────────────────────┘

Why RAG with Qdrant instead of a direct summary?

When you ask an LLM to summarize a full document, it receives all the text at once. With RAG, the server first retrieves only the chunks most relevant to your query and passes just those to the LLM. The difference is substantial:

WITHOUT RAG  —  Full document as context
────────────────────────────────────────────────────────────
┌─────────┐     ┌──────────────────────────────────────┐     ┌─────┐
│  Query  │────►│  Entire book  (~150 000 tokens)      │────►│ LLM │
└─────────┘     │  ⚠ May exceed the context window     │     └─────┘
                │  ⚠ High cost proportional to size    │
                │  ⚠ Model gets distracted by noise    │
                └──────────────────────────────────────┘

WITH RAG  —  Only relevant chunks
────────────────────────────────────────────────────────────
┌─────────┐     ┌────────────┐     ┌───────────────────────┐     ┌─────┐
│  Query  │────►│   Qdrant   │────►│  Top-k chunks         │────►│ LLM │
└─────────┘     │ (semantic  │     │  (~5 000–8 000 tokens)│     └─────┘
                │  search)   │     │  ✓ Only what matters  │
                └────────────┘     └───────────────────────┘

Benefits

Aspect	Without RAG	With RAG + Qdrant
Input tokens	Entire document	Only relevant chunks
Cost per query	High (scales with document size)	Low and predictable
Accuracy	LLM must filter noise itself	Qdrant pre-filters by similarity
Context window limit	Can be exceeded	Rarely a concern (retrieved chunks are bounded by `limit`)
Multiple documents	Often impractical in a single call	Unified search across all
Response speed	Slower (more tokens)	Faster
Citable sources	No	Yes (chunk + score + source file)

Core idea: the LLM reasons, Qdrant remembers. Each does what it does best.
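
To put rough numbers on the token savings, here is a back-of-the-envelope sketch. It assumes about 4 characters per token, which is a common rule of thumb for English text and not something measured on this server. The chunk size (512) and result limit (5) are the documented defaults; the function names are illustrative only.

```typescript
// Assumption: roughly 4 characters per token (rule of thumb, not measured).
const CHARS_PER_TOKEN = 4;

// Upper bound on tokens returned by `searches` calls that each return
// `limit` chunks of at most `chunkSize` characters.
function estimateRetrievedTokens(searches: number, limit: number, chunkSize: number): number {
  return Math.ceil((searches * limit * chunkSize) / CHARS_PER_TOKEN);
}

// How many times smaller the retrieved context is than the full document.
function reductionFactor(fullDocTokens: number, retrievedTokens: number): number {
  return fullDocTokens / retrievedTokens;
}

const single = estimateRetrievedTokens(1, 5, 512);  // 640
const several = estimateRetrievedTokens(7, 5, 512); // 4480
console.log(single, several, reductionFactor(150_000, several).toFixed(1));
```

Under these assumptions, even seven searches stay within a few thousand tokens, roughly 33 times smaller than a 150 000-token document.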

Quick start with Docker

The recommended way to run the server is Docker Compose. A single command starts both Qdrant and the MCP server.

Exposed ports

Service	Host port	Internal port	Description
MCP Server	38080	3000	MCP endpoint (`/mcp`) and health check (`/health`)
Qdrant REST	39333	6333	Qdrant REST API (dashboard + client)
Qdrant gRPC	39334	6334	Qdrant gRPC API

Step 1 — Add your documents

Drop your PDFs into the .docs/ folder:

rag-demo/
└── .docs/
    └── my-document.pdf

Step 2 — Start the services

docker compose up -d

This starts:

Qdrant — vector database with a persistent volume
RAG MCP Server — auto-indexes all PDFs in .docs/ on startup

[+] Running 2/2
 ✔ Container rag-qdrant  Started
 ✔ Container rag-mcp     Started

First run: FastEmbed downloads BAAI/bge-small-en-v1.5 (~25 MB) into a persistent volume. Subsequent starts reuse the cached model and skip the download.

Step 3 — Verify everything is running

curl http://localhost:38080/health

Expected response:

{
  "status": "ok",
  "sessions": 0,
  "qdrant": "http://qdrant:6333",
  "collection": "rag_documents"
}
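
If you want to run the same check from a script, for example before starting an agent, a minimal sketch could look like this. The field names come from the response above; `isHealthStatus` and `checkHealth` are hypothetical helper names, and the global `fetch` assumes Node 18 or later.

```typescript
// Shape of the /health payload shown above.
interface HealthStatus {
  status: string;
  sessions: number;
  qdrant: string;
  collection: string;
}

// Type guard: true only when every documented field has the expected type.
function isHealthStatus(value: unknown): value is HealthStatus {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.status === "string" &&
    typeof v.sessions === "number" &&
    typeof v.qdrant === "string" &&
    typeof v.collection === "string"
  );
}

// Fetches /health and reports whether the server says it is ready.
async function checkHealth(baseUrl = "http://localhost:38080"): Promise<boolean> {
  const res = await fetch(`${baseUrl}/health`);
  if (!res.ok) return false;
  const body: unknown = await res.json();
  return isHealthStatus(body) && body.status === "ok";
}
```

Calling `await checkHealth()` should resolve to `true` once both containers are up.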

Step 4 — Connect your MCP client

MCP endpoint:

http://localhost:38080/mcp

Claude Desktop — add to claude_desktop_config.json:

{
  "mcpServers": {
    "rag": {
      "type": "http",
      "url": "http://localhost:38080/mcp"
    }
  }
}

Claude Code:

claude mcp add rag --transport http http://localhost:38080/mcp

Quick start (local dev)

If you prefer to run without Docker:

# Start only Qdrant
docker compose up qdrant -d

# Configure environment
cp .env.example .env

# Install and run
npm install
npm run dev

Server listens on http://localhost:3000/mcp.

Step-by-step usage guide

Indexing flow

PDF on disk
    │
    ▼
index_document
    │
    ├─► Extract full text        (pdf-parse)
    │
    ├─► Split into chunks        (512 chars · 64 overlap)
    │
    ├─► Generate embeddings      (FastEmbed · 384 dims)
    │
    └─► Store in Qdrant          (vectors + metadata)
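
The chunking step can be sketched as a character-based sliding window. This is an illustration under the documented defaults (512 characters, 64 overlap), not the server's actual code: the real splitter may respect sentence or paragraph boundaries, and `chunkText` is a hypothetical name.

```typescript
// Splits text into windows of `size` characters where each window starts
// `size - overlap` characters after the previous one.
function chunkText(text: string, size = 512, overlap = 64): string[] {
  if (size <= 0) throw new RangeError("size must be positive");
  if (overlap < 0 || overlap >= size) {
    throw new RangeError("overlap must be in the range [0, size)");
  }
  const chunks: string[] = [];
  const step = size - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + size));
    // Stop once a window reaches the end, so no trailing chunk is fully
    // contained in the previous one.
    if (start + size >= text.length) break;
  }
  return chunks;
}

const chunks = chunkText("x".repeat(1000));
console.log(chunks.length); // 3 (windows start at 0, 448 and 896)
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.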

Search and summarization flow

Natural language query
    │
    ▼
semantic_search
    │
    ├─► Embed the query          (FastEmbed)
    │
    ├─► Cosine similarity search (Qdrant)
    │
    └─► Return top-k chunks with similarity score
            │
            ▼
       LLM (Claude)
            │
            └─► Generate answer grounded in the retrieved chunks
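
To make the similarity step concrete, the sketch below ranks stored vectors against a query by cosine similarity using a brute-force scan. Qdrant does this server-side with an approximate nearest-neighbour index, so this is only a model of the scoring, not of how the server queries Qdrant. The `StoredChunk` and `ScoredChunk` types are hypothetical.

```typescript
// Cosine similarity: dot product divided by the product of vector lengths.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new RangeError("vectors must have the same length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // a zero vector has no direction
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

interface StoredChunk { id: number; vector: number[]; text: string }
interface ScoredChunk { id: number; text: string; score: number }

// Scores every chunk against the query and keeps the k best.
function topK(query: number[], chunks: StoredChunk[], k: number): ScoredChunk[] {
  return chunks
    .map((c) => ({ id: c.id, text: c.text, score: cosineSimilarity(query, c.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

With real data the vectors have 384 dimensions and the query is embedded with the same model as the chunks; comparing vectors from different models would give meaningless scores.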

Full example: The 7 Habits of Highly Effective People

A real end-to-end walkthrough using Stephen R. Covey's book.

1. Index the document

Place the PDF in .docs/ and call index_document:

Input:

Tool: index_document
path: .docs/the-7-habits-of-highly-effective-people.pdf

Output:

╔══════════════════════════════════════╗
║   Document indexed successfully      ║
╚══════════════════════════════════════╝

  File    : the-7-habits-of-highly-effective-people.pdf
  Chunks  : 1823
  Vectors : 1823 × 384 dims
  Time    : 47.3s

Ready for semantic search.

Note: Large documents may take a few minutes on first indexing. Re-indexing the same file automatically replaces its previous vectors.
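
One way the replace-on-reindex behaviour could work is to delete every point belonging to the file before inserting the new vectors. The sketch below builds the request for Qdrant's delete-by-filter REST endpoint. It assumes the file name is stored in a payload field called `source`, which this README does not confirm, and the helper names are hypothetical.

```typescript
// Qdrant filter matching every point whose payload field equals `source`.
// Assumption: the payload field is named "source".
function sourceFilter(source: string, field = "source") {
  return { must: [{ key: field, match: { value: source } }] };
}

// Describes a call to Qdrant's "delete points" endpoint, selecting by filter.
// `wait=true` asks Qdrant to respond only after the deletion is applied.
function buildDeleteBySource(qdrantUrl: string, collection: string, source: string) {
  const base = qdrantUrl.replace(/\/+$/, ""); // drop trailing slashes
  return {
    url: `${base}/collections/${encodeURIComponent(collection)}/points/delete?wait=true`,
    method: "POST" as const,
    body: { filter: sourceFilter(source) },
  };
}

const req = buildDeleteBySource("http://localhost:39333", "rag_documents", "my-document.pdf");
console.log(req.url);
```

The same filter shape could also serve a per-document restriction at search time.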

2. List indexed documents

Input:

Tool: list_indexed_documents

Output:

Indexed documents (1):

1. the-7-habits-of-highly-effective-people.pdf

3. Request a summary using semantic search

To summarize all 7 habits, run one targeted search per habit and then ask the LLM to synthesize the results.

Searches (one per habit):

semantic_search: "habit 1 be proactive personal vision"
semantic_search: "habit 2 begin with the end in mind personal mission"
semantic_search: "habit 3 put first things first quadrant II time management"
semantic_search: "habit 4 think win-win mutual benefit agreements"
semantic_search: "habit 5 seek first to understand empathic listening"
semantic_search: "habit 6 synergize creative cooperation"
semantic_search: "habit 7 sharpen the saw balanced renewal"

Each search returns ranked chunks with a similarity score and their source:

[1] score=0.8205  source="the-7-habits-..."  chunk=261
    "Habit 1 says: You are the creator. You are in charge.
     It is based on the four human endowments — self-awareness,
     imagination, conscience, and independent will..."

[2] score=0.8100  ...
[3] score=0.8095  ...
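
Because several searches can return the same chunk, it may help to merge the result sets before handing them to the LLM. A minimal sketch, assuming each hit has been parsed into the hypothetical `SearchHit` shape below (the fields mirror the score, source, and chunk values in the output above):

```typescript
interface SearchHit {
  score: number;
  source: string;
  chunk: number;
  text: string;
}

// Merges hits from several searches, keeps the best score for each
// (source, chunk) pair, and returns at most `maxHits` sorted by score.
function mergeHits(resultSets: SearchHit[][], maxHits: number): SearchHit[] {
  const best = new Map<string, SearchHit>();
  for (const hits of resultSets) {
    for (const hit of hits) {
      const key = `${hit.source}#${hit.chunk}`;
      const seen = best.get(key);
      if (!seen || hit.score > seen.score) best.set(key, hit);
    }
  }
  return Array.from(best.values())
    .sort((a, b) => b.score - a.score)
    .slice(0, maxHits);
}
```

Deduplicating this way keeps the prompt short when related queries, such as the per-habit searches above, surface overlapping passages.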

Summary generated from the retrieved chunks:

The 7 Habits of Highly Effective People — Summary
The book builds an effectiveness framework organized in two blocks: Private Victory (habits 1–3, independence) and Public Victory (habits 4–6, interdependence), topped by a seventh renewal habit.
Habit 1 — Be Proactive: You are the creator of your life. You choose your response to any stimulus using four human endowments: self-awareness, imagination, conscience, and independent will.
Habit 2 — Begin with the End in Mind: Everything is created twice — first mentally, then physically. Define a personal mission statement that guides every decision.
Habit 3 — Put First Things First: Focus on Quadrant II (important, not urgent): planning, relationships, personal renewal. Learn to say a firm "no" to urgent but unimportant demands.
Habit 4 — Think Win/Win: Seek agreements where all parties benefit. Built on five elements: desired results, guidelines, resources, accountability, and consequences.
Habit 5 — Seek First to Understand: Listen genuinely before speaking. Empathic listening requires consideration; being understood requires courage. Both are essential.
Habit 6 — Synergize: The whole is greater than the sum of its parts. High trust and high cooperation produce outcomes no individual party could reach alone.
Habit 7 — Sharpen the Saw: Renew yourself continuously across four dimensions: physical, mental, social/emotional, and spiritual. Without renewal, the other six habits deteriorate.

MCP endpoint

http://localhost:38080/mcp        (Docker)
http://localhost:3000/mcp         (local dev)

The server implements the Streamable HTTP MCP transport with stateful sessions:

Method	Purpose
`POST /mcp`	Initialise session / send client messages
`GET /mcp`	SSE stream for server → client notifications
`DELETE /mcp`	Explicitly close a session

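Sessions are tracked with the `Mcp-Session-Id` header: the server returns it when a session is initialised, and the client sends it back on every later request.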
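
For a custom agent that does not use an MCP SDK, opening a session by hand might look like the sketch below. The JSON-RPC `initialize` message and the `Accept` header follow the MCP Streamable HTTP specification; the protocol revision string and the client name are assumptions, and the helper names are hypothetical.

```typescript
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id?: number;
  method: string;
  params?: Record<string, unknown>;
}

// Builds the JSON-RPC "initialize" message that opens a session.
function buildInitialize(id: number): JsonRpcRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "initialize",
    params: {
      protocolVersion: "2025-03-26", // assumption: a revision this server accepts
      capabilities: {},
      clientInfo: { name: "example-client", version: "0.0.1" }, // hypothetical
    },
  };
}

// Headers for POST /mcp. The session id is absent on the first request.
function mcpHeaders(sessionId?: string): Record<string, string> {
  const headers: Record<string, string> = {
    "Content-Type": "application/json",
    Accept: "application/json, text/event-stream",
  };
  if (sessionId) headers["Mcp-Session-Id"] = sessionId;
  return headers;
}

// Opens a session and returns the id assigned by the server, or null if none.
async function openSession(endpoint: string): Promise<string | null> {
  const res = await fetch(endpoint, {
    method: "POST",
    headers: mcpHeaders(),
    body: JSON.stringify(buildInitialize(1)),
  });
  return res.headers.get("mcp-session-id");
}
```

After `initialize` succeeds, the client is expected to send a `notifications/initialized` message with the session header before calling tools. In practice the official SDK client handles this handshake for you.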

MCP Tools

`index_document`

Index a single PDF file. Re-indexing the same file replaces its previous vectors automatically.

{ "path": "/absolute/path/to/file.pdf" }

{ "path": ".docs/my-book.pdf" }

`semantic_search`

Search indexed documents with natural language.

{
  "query": "What are the principles of the Private Victory?",
  "limit": 5,
  "source": "the-7-habits-of-highly-effective-people.pdf"
}

Parameter	Type	Description
`query`	`string`	Natural language search query
`limit`	`number`	Max results to return (default: 5, max: 20)
`source`	`string`	Restrict search to a specific document (optional)
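
On the wire, a call to this tool is a JSON-RPC `tools/call` request. The sketch below builds one and applies the documented default and maximum for `limit` on the client side. Clamping out-of-range values is an assumption made here for convenience, since the server may reject them instead. `buildSemanticSearchCall` is a hypothetical helper.

```typescript
interface SemanticSearchArgs {
  query: string;
  limit?: number;
  source?: string;
}

// Builds a JSON-RPC "tools/call" request for the semantic_search tool.
function buildSemanticSearchCall(id: number, args: SemanticSearchArgs) {
  const query = args.query.trim();
  if (query.length === 0) throw new Error("query must not be empty");
  // Documented range: default 5, max 20. Clamping is a client-side assumption.
  const limit = Math.min(20, Math.max(1, Math.floor(args.limit ?? 5)));
  const toolArgs: Record<string, unknown> = { query, limit };
  if (args.source) toolArgs["source"] = args.source; // optional document filter
  return {
    jsonrpc: "2.0" as const,
    id,
    method: "tools/call",
    params: { name: "semantic_search", arguments: toolArgs },
  };
}

console.log(JSON.stringify(buildSemanticSearchCall(2, { query: "Private Victory" })));
```

The resulting object is what a client would POST to `/mcp` together with the session header.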

`list_indexed_documents`

List all documents currently indexed.

{}

Configuration

Variable	Default	Description
`PORT`	`3000`	HTTP server port
`QDRANT_URL`	`http://localhost:6333`	Qdrant instance URL
`QDRANT_API_KEY`	(empty)	API key for Qdrant Cloud
`QDRANT_COLLECTION`	`rag_documents`	Collection name
`DOCS_DIR`	`.docs`	Default directory for auto-indexing on startup
`CHUNK_SIZE`	`512`	Characters per chunk
`CHUNK_OVERLAP`	`64`	Overlap between adjacent chunks

Scripts

npm run dev              # development with hot-reload (tsx watch)
npm run build            # production build (tsup → dist/)
npm start                # run production build
npm run typecheck        # type check without emitting
npm run test             # unit tests (vitest)
npm run test:integration # integration tests (requires Qdrant)
npm run test:all         # unit + integration
