RAG MCP Server

by devsinsight

MCP server with Streamable HTTP transport that indexes PDF documents into Qdrant and exposes semantic search as MCP tools. Works with any MCP-compatible client: Claude Desktop, Claude Code, or a custom agent.


Stack

Layer            Library
───────────────  ─────────────────────────────────────────────────────
HTTP framework   Hono + @hono/node-server
MCP protocol     @modelcontextprotocol/sdk (Streamable HTTP transport)
Vector DB        Qdrant via @qdrant/js-client-rest
Embeddings       FastEmbed with BAAI/bge-small-en-v1.5 (384 dims, local ONNX)
PDF parsing      pdf-parse


Architecture

┌──────────────────────────────────────────────────────────┐
│                    MCP Client                            │
│        (Claude Desktop / Claude Code / Custom Agent)     │
└────────────────────┬─────────────────────────────────────┘
                     │  HTTP  port 38080  →  /mcp
                     ▼
┌──────────────────────────────────────────────────────────┐
│              RAG MCP Server  (Hono)                      │
│                                                          │
│  ┌──────────────────┐  ┌─────────────────┐  ┌────────┐   │
│  │  index_document  │  │ semantic_search │  │  list  │   │
│  └────────┬─────────┘  └────────┬────────┘  └───┬────┘   │
│           │                     │               │        │
│  ┌────────▼─────────────────────▼───────────────▼──────┐ │
│  │          FastEmbed  —  BAAI/bge-small-en-v1.5       │ │
│  │               384 dims · local ONNX                 │ │
│  └──────────────────────────┬──────────────────────────┘ │
└─────────────────────────────┼────────────────────────────┘
                              │  REST  port 39333
                              ▼
┌──────────────────────────────────────────────────────────┐
│                      Qdrant                              │
│                  Vector database                         │
│             collection: rag_documents                    │
│             volume: qdrant_storage (persistent)          │
└──────────────────────────────────────────────────────────┘

Why RAG with Qdrant instead of a direct summary?

When you ask an LLM to summarize a full document, it receives all the text at once. With RAG, the chunks most relevant to your query are retrieved first, and only those are sent to the LLM. The difference is substantial:

WITHOUT RAG  —  Full document as context
────────────────────────────────────────────────────────────
┌─────────┐     ┌──────────────────────────────────────┐     ┌─────┐
│  Query  │────►│  Entire book  (~150 000 tokens)      │────►│ LLM │
└─────────┘     │  ⚠ May exceed the context window     │     └─────┘
                │  ⚠ High cost proportional to size    │
                │  ⚠ Model gets distracted by noise    │
                └──────────────────────────────────────┘

WITH RAG  —  Only relevant chunks
────────────────────────────────────────────────────────────
┌─────────┐     ┌────────────┐     ┌───────────────────────┐     ┌─────┐
│  Query  │────►│   Qdrant   │────►│  Top-k chunks         │────►│ LLM │
└─────────┘     │ (semantic  │     │  (~5 000–8 000 tokens)│     └─────┘
                │  search)   │     │  ✓ Only what matters  │
                └────────────┘     └───────────────────────┘

Benefits

Aspect                 Without RAG                        With RAG + Qdrant
─────────────────────  ─────────────────────────────────  ─────────────────────────────────
Input tokens           Entire document                    Only the relevant chunks
Cost per query         High (scales with document size)   Low and predictable (top-k chunks)
Accuracy               LLM must filter noise itself       Qdrant pre-filters by similarity
Context window limit   Can be exceeded                    Rarely a concern (top-k is small)
Multiple documents     Limited by the context window      Unified search across all
Response speed         Slower (more input tokens)         Faster (fewer input tokens)
Citable sources        No                                 Yes (chunk + score + source file)

Core idea: the LLM reasons, Qdrant remembers. Each does what it does best.


Quick start with Docker

The recommended way to run the server is Docker Compose. A single command starts both Qdrant and the MCP server.

Exposed ports

Service       Host port   Internal port   Description
────────────  ──────────  ──────────────  ──────────────────────────────────────────────
MCP Server    38080       3000            MCP endpoint (/mcp) and health check (/health)
Qdrant REST   39333       6333            Qdrant REST API (dashboard + client)
Qdrant gRPC   39334       6334            Qdrant gRPC API

Step 1 — Add your documents

Drop your PDFs into the .docs/ folder:

rag-demo/
└── .docs/
    └── my-document.pdf

Step 2 — Start the services

docker compose up -d

This starts:

  1. Qdrant — vector database with a persistent volume

  2. RAG MCP Server — auto-indexes all PDFs in .docs/ on startup

[+] Running 2/2
 ✔ Container rag-qdrant  Started
 ✔ Container rag-mcp     Started

First run: FastEmbed downloads BAAI/bge-small-en-v1.5 (~25 MB) into a persistent volume. Subsequent starts reuse the cached model and skip the download.

Step 3 — Verify everything is running

curl http://localhost:38080/health

Expected response:

{
  "status": "ok",
  "sessions": 0,
  "qdrant": "http://qdrant:6333",
  "collection": "rag_documents"
}
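While the containers are still starting, /health may refuse connections for a few seconds. The helper below is a minimal sketch for scripts that need to wait for readiness, assuming Node 18+ (global `fetch`) and the response shape shown above; `waitForHealth`, its retry count, and its delay are hypothetical choices, not part of the server.

```typescript
// Sketch: poll /health until it reports status "ok".
// Assumptions: Node 18+ global fetch; attempts/delay defaults are arbitrary.
interface HealthResponse {
  status: string;
  sessions: number;
  qdrant: string;
  collection: string;
}

// Minimal fetch-like signature so a fake can be injected in tests.
type FetchLike = (url: string) => Promise<{ ok: boolean; json(): Promise<unknown> }>;

function isHealthy(body: unknown): body is HealthResponse {
  return typeof body === "object" && body !== null && (body as { status?: unknown }).status === "ok";
}

async function waitForHealth(
  url = "http://localhost:38080/health",
  fetchFn: FetchLike = fetch,
  attempts = 30,
  delayMs = 2000,
): Promise<HealthResponse> {
  for (let i = 1; i <= attempts; i++) {
    try {
      const res = await fetchFn(url);
      if (res.ok) {
        const body = await res.json();
        if (isHealthy(body)) return body;
      }
    } catch {
      // Connection refused while the container is still starting; retry.
    }
    // Wait before the next attempt (no wait after the last one).
    if (i < attempts) await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  throw new Error(`server not healthy after ${attempts} attempts`);
}
```

Injecting `fetchFn` keeps the polling logic testable without a running server.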

Step 4 — Connect your MCP client

MCP endpoint:

http://localhost:38080/mcp

Claude Desktop — add to claude_desktop_config.json:

{
  "mcpServers": {
    "rag": {
      "type": "http",
      "url": "http://localhost:38080/mcp"
    }
  }
}

Claude Code:

claude mcp add rag --transport http http://localhost:38080/mcp

Quick start (local dev)

If you prefer to run without Docker:

# Start only Qdrant
docker compose up qdrant -d

# Configure environment (QDRANT_URL must point at the host-mapped Qdrant port, http://localhost:39333)
cp .env.example .env

# Install and run
npm install
npm run dev

Server listens on http://localhost:3000/mcp.


Step-by-step usage guide

Indexing flow

PDF on disk
    │
    ▼
index_document
    │
    ├─► Extract full text        (pdf-parse)
    │
    ├─► Split into chunks        (512 chars · 64 overlap)
    │
    ├─► Generate embeddings      (FastEmbed · 384 dims)
    │
    └─► Store in Qdrant          (vectors + metadata)
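The chunking step can be sketched as a fixed-size sliding window over characters. This is a minimal sketch assuming plain character windows with the documented defaults (CHUNK_SIZE=512, CHUNK_OVERLAP=64); `chunkText` is a hypothetical name, and the server's own splitter may treat word or sentence boundaries differently.

```typescript
// Sketch: fixed-size character chunking with overlap.
// Each window starts (chunkSize - overlap) characters after the previous one,
// so adjacent chunks share `overlap` characters.
function chunkText(text: string, chunkSize = 512, overlap = 64): string[] {
  if (chunkSize <= 0) throw new Error("chunkSize must be positive");
  if (overlap < 0 || overlap >= chunkSize) {
    throw new Error("overlap must be in [0, chunkSize)");
  }
  const step = chunkSize - overlap; // how far each window advances
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once the current window reaches the end of the text.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}
```

With the defaults, a 1,000-character string yields three chunks starting at offsets 0, 448, and 896, and each consecutive pair shares 64 characters.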

Search and summarization flow

Natural language query
    │
    ▼
semantic_search
    │
    ├─► Embed the query          (FastEmbed)
    │
    ├─► Cosine similarity search (Qdrant)
    │
    └─► Return top-k chunks with similarity score
            │
            ▼
       LLM (Claude)
            │
            └─► Generate answer grounded in the retrieved chunks
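Qdrant performs the similarity search server-side, using an index rather than a linear scan. The in-memory version below only illustrates what "cosine similarity, top-k" means; `StoredChunk`, `cosineSimilarity`, and `topK` are hypothetical names, and the fields mirror the `source` and `chunk` values shown in the search output.

```typescript
// Illustrative in-memory cosine-similarity search (Qdrant does this for real).
interface StoredChunk {
  source: string; // originating file name
  chunk: number; // chunk index within that file
  vector: number[]; // embedding, e.g. 384 dims for bge-small-en-v1.5
}

interface ScoredChunk extends StoredChunk {
  score: number;
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid division by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every stored chunk against the query and keep the k best.
function topK(query: number[], stored: StoredChunk[], k = 5): ScoredChunk[] {
  return stored
    .map((c) => ({ ...c, score: cosineSimilarity(query, c.vector) }))
    .sort((x, y) => y.score - x.score) // highest similarity first
    .slice(0, k);
}
```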

Full example: The 7 Habits of Highly Effective People

A real end-to-end walkthrough using Stephen R. Covey's book.

1. Index the document

Place the PDF in .docs/ and call index_document:

Input:

Tool: index_document
path: .docs/the-7-habits-of-highly-effective-people.pdf

Output:

╔══════════════════════════════════════╗
║   Document indexed successfully      ║
╚══════════════════════════════════════╝

  File    : the-7-habits-of-highly-effective-people.pdf
  Chunks  : 1823
  Vectors : 1823 × 384 dims
  Time    : 47.3s

Ready for semantic search.

Note: Large documents may take a few minutes on first indexing. Re-indexing the same file automatically replaces its previous vectors.


2. List indexed documents

Input:

Tool: list_indexed_documents

Output:

Indexed documents (1):

1. the-7-habits-of-highly-effective-people.pdf

3. Search and summarize

To summarize all 7 habits, run one targeted search per habit, then ask the LLM to synthesize the results.

Searches (one per habit):

semantic_search: "habit 1 be proactive personal vision"
semantic_search: "habit 2 begin with the end in mind personal mission"
semantic_search: "habit 3 put first things first quadrant II time management"
semantic_search: "habit 4 think win-win mutual benefit agreements"
semantic_search: "habit 5 seek first to understand empathic listening"
semantic_search: "habit 6 synergize creative cooperation"
semantic_search: "habit 7 sharpen the saw balanced renewal"
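The per-habit searches can also be scripted from a custom agent. This sketch assumes only an object exposing `callTool({ name, arguments })`, which a connected `Client` from @modelcontextprotocol/sdk provides; `ToolCaller` and `searchAllHabits` are hypothetical names, and the raw tool results are returned untouched so the LLM can synthesize them.

```typescript
// Sketch: one semantic_search call per habit, collected in query order.
// Assumption: `caller` is any object with an MCP-style callTool method.
interface ToolCaller {
  callTool(params: { name: string; arguments: Record<string, unknown> }): Promise<unknown>;
}

const HABIT_QUERIES = [
  "habit 1 be proactive personal vision",
  "habit 2 begin with the end in mind personal mission",
  "habit 3 put first things first quadrant II time management",
  "habit 4 think win-win mutual benefit agreements",
  "habit 5 seek first to understand empathic listening",
  "habit 6 synergize creative cooperation",
  "habit 7 sharpen the saw balanced renewal",
];

async function searchAllHabits(
  caller: ToolCaller,
  source = "the-7-habits-of-highly-effective-people.pdf",
  limit = 5,
): Promise<Map<string, unknown>> {
  const results = new Map<string, unknown>();
  for (const query of HABIT_QUERIES) {
    // Sequential calls keep the result order aligned with the habit order.
    const result = await caller.callTool({
      name: "semantic_search",
      arguments: { query, limit, source },
    });
    results.set(query, result);
  }
  return results;
}
```

With the SDK, the caller could be `new Client({ name: "habit-summary", version: "0.1.0" })` (a hypothetical client name) from `@modelcontextprotocol/sdk/client/index.js`, connected via `await client.connect(new StreamableHTTPClientTransport(new URL("http://localhost:38080/mcp")))` with the transport from `@modelcontextprotocol/sdk/client/streamableHttp.js`.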

Each search returns ranked chunks with a similarity score and their source:

[1] score=0.8205  source="the-7-habits-..."  chunk=261
    "Habit 1 says: You are the creator. You are in charge.
     It is based on the four human endowments — self-awareness,
     imagination, conscience, and independent will..."

[2] score=0.8100  ...
[3] score=0.8095  ...

Summary generated from the retrieved chunks:


The 7 Habits of Highly Effective People — Summary

The book builds an effectiveness framework organized in two blocks: Private Victory (habits 1–3, independence) and Public Victory (habits 4–6, interdependence), topped by a seventh renewal habit.

Habit 1 — Be Proactive: You are the creator of your life. You choose your response to any stimulus using four human endowments: self-awareness, imagination, conscience, and independent will.

Habit 2 — Begin with the End in Mind: Everything is created twice — first mentally, then physically. Define a personal mission statement that guides every decision.

Habit 3 — Put First Things First: Focus on Quadrant II (important, not urgent): planning, relationships, personal renewal. Learn to say a firm "no" to urgent but unimportant demands.

Habit 4 — Think Win/Win: Seek agreements where all parties benefit. Built on five elements: desired results, guidelines, resources, accountability, and consequences.

Habit 5 — Seek First to Understand: Listen genuinely before speaking. Empathic listening requires consideration; being understood requires courage. Both are essential.

Habit 6 — Synergize: The whole is greater than the sum of its parts. High trust and high cooperation produce outcomes no individual party could reach alone.

Habit 7 — Sharpen the Saw: Renew yourself continuously across four dimensions: physical, mental, social/emotional, and spiritual. Without renewal, the other six habits deteriorate.


MCP endpoint

http://localhost:38080/mcp        (Docker)
http://localhost:3000/mcp         (local dev)

The server implements the Streamable HTTP MCP transport with stateful sessions:

Method        Purpose
────────────  ───────────────────────────────────────────
POST /mcp     Initialize a session / send client messages
GET /mcp      SSE stream for server → client notifications
DELETE /mcp   Explicitly close a session

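Sessions are tracked with the Mcp-Session-Id header: the server issues a session ID when a session is initialized, and the client sends it back on every subsequent request.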
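At the protocol level, the handshake is an `initialize` POST without a session header, followed by requests that echo the `Mcp-Session-Id` the server returned. The sketch below assumes Node 18+ `fetch`, the Docker endpoint, and protocol version `2025-03-26`; the client name and the helper functions are hypothetical, and a real client would normally use the MCP SDK's Streamable HTTP transport instead.

```typescript
// Sketch of the Streamable HTTP session handshake at the JSON-RPC level.
const ENDPOINT = "http://localhost:38080/mcp";

interface HttpRequest {
  method: string;
  headers: Record<string, string>;
  body?: string;
}

// POSTs must accept both plain JSON and SSE responses.
function baseHeaders(sessionId?: string): Record<string, string> {
  const headers: Record<string, string> = {
    "Content-Type": "application/json",
    Accept: "application/json, text/event-stream",
  };
  if (sessionId) headers["Mcp-Session-Id"] = sessionId; // echo the server-issued ID
  return headers;
}

function initializeRequest(): HttpRequest {
  return {
    method: "POST",
    headers: baseHeaders(), // no session ID exists yet
    body: JSON.stringify({
      jsonrpc: "2.0",
      id: 1,
      method: "initialize",
      params: {
        protocolVersion: "2025-03-26", // assumed protocol revision
        capabilities: {},
        clientInfo: { name: "example-client", version: "0.1.0" }, // hypothetical
      },
    }),
  };
}

// Sent once initialize succeeds, carrying the session ID from the response header.
function initializedNotification(sessionId: string): HttpRequest {
  return {
    method: "POST",
    headers: baseHeaders(sessionId),
    body: JSON.stringify({ jsonrpc: "2.0", method: "notifications/initialized" }),
  };
}

// DELETE /mcp closes the session explicitly.
function closeSession(sessionId: string): HttpRequest {
  return { method: "DELETE", headers: { "Mcp-Session-Id": sessionId } };
}

async function openSession(): Promise<string> {
  const res = await fetch(ENDPOINT, initializeRequest());
  const sessionId = res.headers.get("mcp-session-id");
  if (!res.ok || !sessionId) throw new Error(`initialize failed: HTTP ${res.status}`);
  await res.body?.cancel(); // the InitializeResult itself is not needed here
  await fetch(ENDPOINT, initializedNotification(sessionId));
  return sessionId;
}
```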


MCP Tools

index_document

Index a single PDF file. Re-indexing the same file replaces its previous vectors automatically.

{ "path": "/absolute/path/to/file.pdf" }
{ "path": ".docs/my-book.pdf" }
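One way to get the replace-on-re-index behavior is to delete every point whose payload matches the file, then upsert the new chunks. This is a sketch of that strategy, not necessarily the server's implementation; the payload field names `source`, `chunk`, and `text` and the helper `planReindex` are assumptions.

```typescript
import { randomUUID } from "node:crypto";

// Sketch: build the delete filter and the replacement points for one file.
// Assumption: payload fields `source`, `chunk`, and `text` are illustrative.
interface QdrantPoint {
  id: string; // Qdrant point IDs must be unsigned integers or UUIDs
  vector: number[];
  payload: { source: string; chunk: number; text: string };
}

interface ReindexPlan {
  deleteFilter: { must: { key: string; match: { value: string } }[] };
  points: QdrantPoint[];
}

function planReindex(source: string, chunks: string[], vectors: number[][]): ReindexPlan {
  if (chunks.length !== vectors.length) {
    throw new Error("each chunk needs exactly one vector");
  }
  return {
    // Matches every existing point that came from this file.
    deleteFilter: { must: [{ key: "source", match: { value: source } }] },
    points: chunks.map((text, chunk) => ({
      id: randomUUID(),
      vector: vectors[chunk],
      payload: { source, chunk, text },
    })),
  };
}
```

With @qdrant/js-client-rest the plan would be applied as `await client.delete(collection, { filter: plan.deleteFilter, wait: true })` followed by `await client.upsert(collection, { points: plan.points, wait: true })`. Because the IDs are random on every run, the delete step is what prevents duplicates; the trade-off is a brief window in which the file has no vectors.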

semantic_search

Search indexed documents with natural language.

{
  "query": "What are the principles of the Private Victory?",
  "limit": 5,
  "source": "the-7-habits-of-highly-effective-people.pdf"
}

Parameter  Type     Description
─────────  ───────  ─────────────────────────────────────────────────
query      string   Natural language search query
limit      number   Max results to return (default: 5, max: 20)
source     string   Restrict search to a specific document (optional)
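A `semantic_search` call roughly translates into a Qdrant search request: embed the query, cap the limit, and turn `source` into a payload filter. The sketch assumes out-of-range limits are clamped (the server may reject them instead) and that the file name lives in a payload field named `source`; `buildSearchParams` is hypothetical.

```typescript
// Sketch: normalize semantic_search arguments into Qdrant search parameters.
interface SemanticSearchArgs {
  query: string;
  limit?: number;
  source?: string;
}

interface QdrantSearchParams {
  vector: number[];
  limit: number;
  with_payload: boolean;
  filter?: { must: { key: string; match: { value: string } }[] };
}

const DEFAULT_LIMIT = 5;
const MAX_LIMIT = 20;

function buildSearchParams(args: SemanticSearchArgs, queryVector: number[]): QdrantSearchParams {
  if (args.query.trim() === "") throw new Error("query must not be empty");
  const requested = args.limit ?? DEFAULT_LIMIT;
  // Clamp to [1, MAX_LIMIT] (an assumption about out-of-range handling).
  const limit = Math.min(Math.max(Math.trunc(requested), 1), MAX_LIMIT);
  const params: QdrantSearchParams = { vector: queryVector, limit, with_payload: true };
  if (args.source) {
    // Only return chunks whose payload.source equals the requested file.
    params.filter = { must: [{ key: "source", match: { value: args.source } }] };
  }
  return params;
}
```

The returned object has the shape that `client.search(collection, params)` in @qdrant/js-client-rest accepts.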

list_indexed_documents

List all documents currently indexed.

{}

Configuration

Variable            Default                 Description
──────────────────  ──────────────────────  ──────────────────────────────────────────────
PORT                3000                    HTTP server port
QDRANT_URL          http://localhost:6333   Qdrant instance URL
QDRANT_API_KEY      (empty)                 API key for Qdrant Cloud
QDRANT_COLLECTION   rag_documents           Collection name
DOCS_DIR            .docs                   Default directory for auto-indexing on startup
CHUNK_SIZE          512                     Characters per chunk
CHUNK_OVERLAP       64                      Characters shared by adjacent chunks
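The variables above can be read into a typed config object. This is a minimal sketch; `loadConfig` is a hypothetical helper, and the overlap check is an added assumption that mirrors the chunking rule (an overlap at least as large as the chunk would never advance).

```typescript
// Sketch: read the documented environment variables with their defaults.
interface RagConfig {
  port: number;
  qdrantUrl: string;
  qdrantApiKey?: string;
  collection: string;
  docsDir: string;
  chunkSize: number;
  chunkOverlap: number;
}

function intFromEnv(value: string | undefined, fallback: number, name: string): number {
  if (value === undefined || value === "") return fallback;
  const parsed = Number.parseInt(value, 10);
  if (!Number.isInteger(parsed) || parsed < 0) {
    throw new Error(`${name} must be a non-negative integer`);
  }
  return parsed;
}

function loadConfig(env: Record<string, string | undefined> = process.env): RagConfig {
  const config: RagConfig = {
    port: intFromEnv(env.PORT, 3000, "PORT"),
    qdrantUrl: env.QDRANT_URL || "http://localhost:6333",
    qdrantApiKey: env.QDRANT_API_KEY || undefined, // empty string means no key
    collection: env.QDRANT_COLLECTION || "rag_documents",
    docsDir: env.DOCS_DIR || ".docs",
    chunkSize: intFromEnv(env.CHUNK_SIZE, 512, "CHUNK_SIZE"),
    chunkOverlap: intFromEnv(env.CHUNK_OVERLAP, 64, "CHUNK_OVERLAP"),
  };
  // Overlap must be smaller than the chunk, otherwise chunking cannot advance.
  if (config.chunkOverlap >= config.chunkSize) {
    throw new Error("CHUNK_OVERLAP must be smaller than CHUNK_SIZE");
  }
  return config;
}
```

Inside Docker Compose, QDRANT_URL points at the service name (http://qdrant:6333, as the health check reports) rather than localhost.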


Scripts

npm run dev              # development with hot-reload (tsx watch)
npm run build            # production build (tsup → dist/)
npm start                # run production build
npm run typecheck        # type check without emitting
npm run test             # unit tests (vitest)
npm run test:integration # integration tests (requires Qdrant)
npm run test:all         # unit + integration