rag-mcp
A production-grade Model Context Protocol server that exposes the retrieval capabilities of two RAG systems as authenticated, strongly-typed MCP tools. Any MCP-capable client — Claude Desktop, Cursor, an agent framework, or the bundled thin client — can connect over Streamable HTTP, authenticate with a JWT, and call retrieval as a tool.
It wraps two existing projects:
rag_graph: graph-augmented RAG (Chroma + BM25 + Neo4j GraphRAG, RRF fusion, FlashRank rerank, LangGraph adaptive routing).
rag_kb: a hybrid-retrieval knowledge base (dense + BM25 + RRF, with HyDE / multi-query / rerank pipelines).
The point of the project is not the retrieval itself — it is the integration discipline: a clean adapter boundary, server-level auth, tool schemas that are locked by tests, and a stub seam that keeps the whole suite runnable in CI with no databases, models, or secrets.
Tools
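Tool | Backend | Parameters |
retrieve_graph | rag_graph | route: simple, complex, or adaptive |
retrieve_kb | rag_kb | one of the six named pipelines in app/retrieve/vector.py |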
Both return a list of typed Hit objects: doc_id, text, score, source (dense/sparse/graph/fused), and a string metadata map. The schemas are generated from Python type hints and frozen by a snapshot test (see Testing).
Each tool's parameter set is a 1:1 reflection of what its backend actually supports — retrieve_graph exposes the app's simple/complex routes plus the LangGraph router (adaptive); retrieve_kb exposes exactly the six named pipelines in app/retrieve/vector.py. There are no parameter combinations the backend can't honor.
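As a rough illustration of the typed Hit object described above, the sketch below models its five fields with a standard-library dataclass. The project presumably uses a schema-generating model class instead; the dataclass, the Source alias, and the runtime check are assumptions made only for this example.

```python
from dataclasses import dataclass, field
from typing import Literal, get_args

# The four provenance labels listed in the text above.
Source = Literal["dense", "sparse", "graph", "fused"]


@dataclass(frozen=True)
class Hit:
    """One retrieved chunk, mirroring the fields the tools return."""

    doc_id: str
    text: str
    score: float
    source: Source
    metadata: dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Dataclasses do not enforce Literal at runtime, so check explicitly.
        if self.source not in get_args(Source):
            raise ValueError(f"unknown source: {self.source!r}")


hit = Hit(doc_id="stub-0", text="stub result 0", score=1.0, source="fused")
print(hit.source, hit.metadata)
```

Constraining source to a closed set is what lets an enum appear in the generated schema, so a client can rely on the four values rather than on free-form strings.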
Architecture
client ──HTTP+JWT──▶ FastMCP server (auth) ──▶ retrieve_graph ─▶ GraphRetriever ─┐
                                          └──▶ retrieve_kb    ─▶ KbRetriever ────┤
                                                                                 ▼
                                    Protocols decouple the MCP layer from backends
                                                                                 │
                          ┌──────────────────────────────────────────────────────┤
                          ▼                                                      ▼
             Stub*Retriever (tests, CI)                               Rag*Retriever (adapters)
                                                                      └─ rag_graph / rag_kb

Four decisions carry the design:
Protocol + adapter decoupling. The MCP layer depends on two Retriever Protocols, never on backend internals. Each backend has a stub implementation (deterministic, dependency-free) and a real adapter. This is what lets the contract tests and integration tests run in CI without Neo4j, Chroma, Redis, torch, or an OpenAI key.

Server-level JWT auth. A JWTVerifier validates RS256 tokens (signature, expiry, audience, issuer) and enforces a rag:read scope. The server is a pure resource server: token issuance lives elsewhere (a real IdP in production; an RSAKeyPair for dev/tests). Auth is applied once at the server and covers both tools.

Tested tool schemas. The generated JSON schema for each tool is asserted against a frozen snapshot. Renaming a parameter, dropping a description, changing a bound, or drifting an enum fails CI: the MCP surface is a contract, and the contract is version-controlled.

Stub boundary as the CI seam. Because the real adapters are injected at a composition root and never imported by the core or the stubs, the test suite is fully hermetic. Adapters are unit-tested with fakes that assert the backend-row → Hit mapping in isolation.
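The Protocol-plus-stub boundary can be sketched roughly as follows. GraphRetriever appears in the diagram above, but the retrieve method, its parameters, the simplified Hit, and the stub's scoring are all assumptions for illustration, not the project's actual signatures.

```python
import asyncio
from typing import NamedTuple, Protocol


class Hit(NamedTuple):
    # Simplified stand-in for the project's Hit model.
    doc_id: str
    text: str
    score: float
    source: str


class GraphRetriever(Protocol):
    """What the MCP layer depends on; this signature is hypothetical."""

    async def retrieve(self, query: str, route: str, k: int) -> list[Hit]: ...


class StubGraphRetriever:
    """Deterministic, dependency-free stand-in for tests and CI."""

    async def retrieve(self, query: str, route: str, k: int) -> list[Hit]:
        return [
            Hit(f"stub-{i}", f"stub result {i} for: {query}", 1.0 / (i + 1), "fused")
            for i in range(k)
        ]


async def retrieve_graph(backend: GraphRetriever, query: str, k: int = 3) -> list[Hit]:
    # The tool body only sees the Protocol, so a stub or a real adapter
    # can be injected without changing this function.
    return await backend.retrieve(query, route="adaptive", k=k)


hits = asyncio.run(retrieve_graph(StubGraphRetriever(), "What is reciprocal rank fusion?"))
for h in hits:
    print(f"[{h.score:.3f}] {h.source} {h.doc_id} {h.text!r}")
```

Because Protocols are structural, StubGraphRetriever never has to import or subclass anything from the real adapter, which is the property that keeps the core free of backend dependencies.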
Quickstart
Requires Python 3.12 and uv.
uv sync --all-extras --dev
# End-to-end demo: spins up an authed server, mints a token, runs the loop
uv run python -m rag_mcp.client --demo
# Or run the server and connect manually
uv run python -m rag_mcp # prints a bearer token, serves on :8000
uv run python -m rag_mcp.client <token> # in another terminal

The demo prints something like:
Connected to http://127.0.0.1:8000/mcp
Tools: retrieve_graph, retrieve_kb
retrieve_graph(route=adaptive) (3 hits)
[1.000] fused stub-0 'stub result 0 for: What is reciprocal rank fusion?'
...

By default everything runs against the stub backend, so no external services are needed.
Authentication
The server validates JWTs as a resource server and requires the rag:read scope.
Dev: python -m rag_mcp generates an ephemeral RSA keypair, prints a signed token, and verifies against the matching public key.
Production: point the verifier at your IdP's public key (e.g. via a JWKS endpoint) and have clients obtain tokens from the IdP. No server code changes are needed; only the way the public key is supplied differs.
Rejected requests return 401 (no token, expired, wrong audience/issuer, or missing scope), each covered by a test against a real HTTP server.
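The claim checks behind those rejections might look roughly like the sketch below. It deliberately omits RS256 signature verification (which needs a JWT library) and assumes the claims dictionary has already passed it. The check_claims function, the AuthError type, the example audience and issuer values, and the space-delimited scope claim are assumptions, not the project's JWTVerifier.

```python
import time
from typing import Optional


class AuthError(Exception):
    """Raised when a token's claims should lead to a 401."""


def check_claims(claims: dict, *, audience: str, issuer: str,
                 required_scope: str = "rag:read",
                 now: Optional[float] = None) -> None:
    """Validate signature-verified claims; raise AuthError on any failure."""
    now = time.time() if now is None else now
    # A missing "exp" is treated the same as an expired token.
    if claims.get("exp", 0) <= now:
        raise AuthError("token expired")
    aud = claims.get("aud")
    audiences = aud if isinstance(aud, list) else [aud]
    if audience not in audiences:
        raise AuthError("wrong audience")
    if claims.get("iss") != issuer:
        raise AuthError("wrong issuer")
    # Assumes scopes arrive as one space-delimited string, a common OAuth convention.
    if required_scope not in str(claims.get("scope", "")).split():
        raise AuthError("missing scope")


good = {"exp": time.time() + 60, "aud": "rag-mcp",
        "iss": "https://idp.example", "scope": "rag:read"}
check_claims(good, audience="rag-mcp", issuer="https://idp.example")  # passes silently
```

Matching the scope against the split list rather than with a substring test avoids accepting a token whose only scope is something like rag:readonly.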
Testing & CI
uv run ruff check .
uv run pytest -q

The suite is split by concern and is entirely hermetic:
test_schema.py: locks each tool's input schema (the regression gate) and asserts the output exposes the Hit fields.
test_auth.py: happy path plus the four 401 rejection cases, against a real Streamable HTTP server (the in-process transport does not enforce auth, so these must run over HTTP).
test_adapters.py: backend-row → Hit mapping for both adapters, using injected fakes (no real backends).
test_smoke.py / test_wiring.py: tool wiring, parameter validation, and backend selection.
CI (.github/workflows/ci.yml) runs ruff check + pytest on every push and PR via uv sync --locked (which also fails if the lockfile is stale). No databases, models, or secrets are required.
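A schema regression gate of the kind test_schema.py provides can be approximated with plain JSON comparison. In this sketch only the three route values come from the text above; the snapshot contents, the query parameter, and both helper functions are hypothetical, and a real suite would compare against the schema the server actually generates.

```python
import json

# Hypothetical frozen snapshot. Only the route values come from the README;
# the "query" parameter and its description are illustrative assumptions.
FROZEN_RETRIEVE_GRAPH = {
    "type": "object",
    "properties": {
        "query": {"type": "string", "description": "Natural-language query"},
        "route": {"type": "string", "enum": ["simple", "complex", "adaptive"]},
    },
    "required": ["query"],
}


def canonical(schema: dict) -> str:
    """Serialize with sorted keys so dict ordering never causes a false diff."""
    return json.dumps(schema, sort_keys=True, indent=2)


def schema_matches_snapshot(current: dict, frozen: dict) -> bool:
    return canonical(current) == canonical(frozen)


# Simulate drift: an enum value is renamed in the tool signature.
drifted = json.loads(canonical(FROZEN_RETRIEVE_GRAPH))
drifted["properties"]["route"]["enum"][2] = "auto"

print(schema_matches_snapshot(FROZEN_RETRIEVE_GRAPH, FROZEN_RETRIEVE_GRAPH))  # True
print(schema_matches_snapshot(drifted, FROZEN_RETRIEVE_GRAPH))                # False
```

Sorting keys removes noise from dict ordering, while list order (for example the enum values) still counts as a change, which seems the right default for a published contract.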
Running against real backends
The real adapters call into rag_graph and rag_kb directly (the rag_kb HTTP endpoint returns a generated answer, not chunks, so retrieval goes through its Python pipelines). To run for real, the two repos must be importable and their dependencies installed in this environment:
export RAG_MCP_BACKEND=real
export RAG_GRAPH_PATH=/path/to/rag_graph
export RAG_KB_PATH=/path/to/rag_kb
uv run python -m rag_mcp

The composition root puts those paths on sys.path and lazy-imports the backends, so the import cost and the heavy dependencies are incurred only in this mode.
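The backend selection described above could be sketched along these lines. The three environment variable names come from the commands above; the prepare_backend function, its error handling, and the "stub" default value are assumptions about how such a composition root might be written.

```python
import os
import sys
from collections.abc import Mapping


def prepare_backend(env: Mapping[str, str] = os.environ,
                    path: list[str] = sys.path) -> str:
    """Return the selected backend mode, extending `path` only for real backends."""
    mode = env.get("RAG_MCP_BACKEND", "stub")
    if mode == "stub":
        return mode  # nothing heavy is imported on this branch
    if mode != "real":
        raise ValueError(f"RAG_MCP_BACKEND must be 'stub' or 'real', got {mode!r}")
    for var in ("RAG_GRAPH_PATH", "RAG_KB_PATH"):
        repo = env.get(var)
        if not repo:
            raise RuntimeError(f"{var} must be set when RAG_MCP_BACKEND=real")
        if repo not in path:
            path.insert(0, repo)
    # A real composition root would import the adapter modules here, inside
    # this branch, so stub runs never pay for Neo4j, Chroma, or torch imports.
    return mode


print(prepare_backend({}, []))
```

Passing env and path as parameters keeps the function testable without touching the real process environment or the interpreter's import path.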
Design notes
Schema matches backend, never the reverse. Both tools' parameters were redesigned once the real retrieval code was read, so no exposed option is unsupported (e.g. there is no hybrid + hyde because no such pipeline exists).

rag_kb via import, not HTTP. Its /query endpoint retrieves and generates; an MCP retrieval tool needs raw chunks, so the adapter calls the async pipeline functions and skips generation.

doc_id for rag_graph is a content hash. FlashRank rebuilds reranked documents without a stable id (and the id it adds is just a batch index), so a hash of the chunk text is the most reliable stable identifier; the original metadata is preserved in Hit.metadata. rag_kb uses its real chunk_id.

Sync retrieval is offloaded. rag_graph's LangChain retrievers are synchronous; the adapter wraps .invoke in asyncio.to_thread so the async server is never blocked.
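The last two notes can be illustrated together in a small sketch. The choice of SHA-256, the 16-character truncation, and the sync_invoke stand-in are assumptions; the notes above only say that the id is a hash of the chunk text and that the blocking call goes through asyncio.to_thread.

```python
import asyncio
import hashlib
import time


def content_doc_id(text: str) -> str:
    """Stable id derived from chunk text; SHA-256 and the 16-char cut are assumptions."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]


def sync_invoke(query: str) -> list[str]:
    """Stand-in for a blocking, LangChain-style retriever call."""
    time.sleep(0.05)
    return [f"chunk about {query}"]


async def retrieve(query: str) -> list[tuple[str, str]]:
    # Run the blocking call in a worker thread so the event loop stays free.
    chunks = await asyncio.to_thread(sync_invoke, query)
    return [(content_doc_id(c), c) for c in chunks]


print(asyncio.run(retrieve("reciprocal rank fusion")))
```

A content-derived id stays the same across reranking passes and process restarts, at the cost that two chunks with identical text collapse to one id.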
Project layout
src/rag_mcp/
  server.py        build_server(): the two MCP tools
  retrieval.py     Hit model, Retriever Protocols, stub backends
  auth.py          JWTVerifier builder
  wiring.py        composition root (stub | real)
  client.py        thin client + self-contained --demo
  __main__.py      dev entrypoint (prints token, serves over HTTP)
  adapters/
    rag_kb.py      RagKbRetriever (+ real wiring)
    rag_graph.py   RagGraphRetriever (+ real wiring)
tests/
  conftest.py      stub + authed-HTTP fixtures
  test_schema.py   tool-contract regression gate
  test_auth.py     401 cases + happy path (real HTTP)
  test_adapters.py row/Document -> Hit mapping
  test_smoke.py    tool wiring & validation
  test_wiring.py   backend selection