rag-mcp

by Keith-hoka

A production-grade Model Context Protocol server that exposes the retrieval capabilities of two RAG systems as authenticated, strongly-typed MCP tools. Any MCP-capable client — Claude Desktop, Cursor, an agent framework, or the bundled thin client — can connect over Streamable HTTP, authenticate with a JWT, and call retrieval as a tool.

It wraps two existing projects:

  • rag_graph — graph-augmented RAG (Chroma + BM25 + Neo4j GraphRAG, RRF fusion, FlashRank rerank, LangGraph adaptive routing).

  • rag_kb — a hybrid-retrieval knowledge base (dense + BM25 + RRF, with HyDE / multi-query / rerank pipelines).

The point of the project is not the retrieval itself — it is the integration discipline: a clean adapter boundary, server-level auth, tool schemas that are locked by tests, and a stub seam that keeps the whole suite runnable in CI with no databases, models, or secrets.

Tools

Tool            Backend    Parameters
retrieve_graph  rag_graph  query, top_k (1–20), route {simple, complex, adaptive}
retrieve_kb     rag_kb     query, top_k (1–20), pipeline {dense, multi_query, hyde, rerank, hybrid, hybrid_rerank}

Both return a list of typed Hit objects: doc_id, text, score, source (dense/sparse/graph/fused), and a string metadata map. The schemas are generated from Python type hints and frozen by a snapshot test (see Testing).

Each tool's parameter set is a 1:1 reflection of what its backend actually supports — retrieve_graph exposes the app's simple/complex routes plus the LangGraph router (adaptive); retrieve_kb exposes exactly the six named pipelines in app/retrieve/vector.py. There are no parameter combinations the backend can't honor.
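As a rough illustration of how type hints can carry this contract, the sketch below models the Hit fields and the two enums with the standard library only. The field names and allowed values come from the description above; the dataclass, the alias names (Source, Route, Pipeline) and the helpers check_top_k and check_choice are hypothetical and are not taken from the project's code, which may well use a different modelling library.

```python
from dataclasses import dataclass, field
from typing import Literal, get_args

# Allowed values as listed in the Tools section; the alias names are assumptions.
Source = Literal["dense", "sparse", "graph", "fused"]
Route = Literal["simple", "complex", "adaptive"]
Pipeline = Literal["dense", "multi_query", "hyde", "rerank", "hybrid", "hybrid_rerank"]


@dataclass(frozen=True)
class Hit:
    """One retrieved chunk, with the fields the README describes."""

    doc_id: str
    text: str
    score: float
    source: Source
    metadata: dict[str, str] = field(default_factory=dict)


def check_top_k(top_k: int) -> int:
    """Enforce the documented 1-20 bound (hypothetical helper)."""
    if not 1 <= top_k <= 20:
        raise ValueError(f"top_k must be between 1 and 20, got {top_k}")
    return top_k


def check_choice(value: str, literal_type) -> str:
    """Reject any value that is not one of the Literal's members."""
    allowed = get_args(literal_type)
    if value not in allowed:
        raise ValueError(f"{value!r} is not one of {allowed}")
    return value
```

Keeping the enums as Literal types means the set of accepted values lives in one place, which is the property a schema snapshot can then pin down.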

Architecture

client ──HTTP+JWT──▶ FastMCP server (auth) ──▶ retrieve_graph ─▶ GraphRetriever ─┐
                                          └──▶ retrieve_kb    ─▶ KbRetriever ────┤
                                                                                 ▼
                                          Protocols decouple the MCP layer from backends
                                                                                 │
                          ┌──────────────────────────────────────────────────────┤
                          ▼                                                        ▼
                 Stub*Retriever (tests, CI)                          Rag*Retriever (adapters)
                                                                     └─ rag_graph / rag_kb

Four decisions carry the design:

  1. Protocol + adapter decoupling. The MCP layer depends on two Retriever Protocols, never on backend internals. Each backend has a stub implementation (deterministic, dependency-free) and a real adapter. This is what lets the contract tests and integration tests run in CI without Neo4j, Chroma, Redis, torch, or an OpenAI key.

  2. Server-level JWT auth. A JWTVerifier validates RS256 tokens (signature, expiry, audience, issuer) and enforces a rag:read scope. The server is a pure resource server — token issuance lives elsewhere (a real IdP in production; an RSAKeyPair for dev/tests). Auth is applied once at the server and covers both tools.

  3. Tested tool schemas. The generated JSON schema for each tool is asserted against a frozen snapshot. Renaming a parameter, dropping a description, changing a bound, or drifting an enum fails CI — the MCP surface is a contract, and the contract is version-controlled.

  4. Stub boundary as the CI seam. Because the real adapters are injected at a composition root and never imported by the core or the stubs, the test suite is fully hermetic. Adapters are unit-tested with fakes that assert the backend-row → Hit mapping in isolation.
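The first and fourth decisions can be sketched in a few lines. This is a minimal illustration, not the project's code: the method name retrieve, its signature, and the stub's score formula are assumptions, and Hit is reduced to the fields needed here. The point is only that the tool function depends on a Protocol, so a deterministic stub and a real adapter are interchangeable at the call site.

```python
import asyncio
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Hit:
    doc_id: str
    text: str
    score: float
    source: str


class GraphRetriever(Protocol):
    """What the MCP layer needs from a graph backend (signature is assumed)."""

    async def retrieve(self, query: str, top_k: int, route: str) -> list[Hit]: ...


class StubGraphRetriever:
    """Deterministic and dependency-free: no databases, models, or secrets."""

    async def retrieve(self, query: str, top_k: int, route: str) -> list[Hit]:
        return [
            Hit(
                doc_id=f"stub-{i}",
                text=f"stub result {i} for: {query}",
                score=1.0 / (i + 1),  # arbitrary descending scores
                source="fused",
            )
            for i in range(top_k)
        ]


async def retrieve_graph(
    retriever: GraphRetriever, query: str, top_k: int = 3, route: str = "adaptive"
) -> list[Hit]:
    # The tool body only sees the Protocol, never a concrete backend.
    return await retriever.retrieve(query, top_k, route)


if __name__ == "__main__":
    for hit in asyncio.run(retrieve_graph(StubGraphRetriever(), "What is RRF?")):
        print(f"[{hit.score:.3f}] {hit.source}  {hit.doc_id}  {hit.text!r}")
```

Because StubGraphRetriever satisfies the Protocol structurally, it needs no import from the adapter module, which is what keeps the test suite hermetic.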

Quickstart

Requires Python 3.12 and uv.

uv sync --all-extras --dev

# End-to-end demo: spins up an authed server, mints a token, runs the loop
uv run python -m rag_mcp.client --demo

# Or run the server and connect manually
uv run python -m rag_mcp                 # prints a bearer token, serves on :8000
uv run python -m rag_mcp.client <token>  # in another terminal

The demo prints something like:

Connected to http://127.0.0.1:8000/mcp
Tools: retrieve_graph, retrieve_kb

  retrieve_graph(route=adaptive)  (3 hits)
    [1.000] fused  stub-0   'stub result 0 for: What is reciprocal rank fusion?'
  ...

By default everything runs against the stub backend, so no external services are needed.

Authentication

The server validates JWTs as a resource server and requires the rag:read scope.

  • Dev: python -m rag_mcp generates an ephemeral RSA keypair, prints a signed token, and verifies against the matching public key.

  • Production: point the verifier at your IdP's public key (e.g. via a JWKS endpoint) and have clients obtain tokens from the IdP. No server code changes — only how the public key is supplied.

Rejected requests return 401 (no token, expired, wrong audience/issuer, or missing scope), each covered by a test against a real HTTP server.

Testing & CI

uv run ruff check .
uv run pytest -q

The suite is split by concern and is entirely hermetic:

  • test_schema.py — locks each tool's input schema (the regression gate) and asserts the output exposes the Hit fields.

  • test_auth.py — happy path plus the four 401 rejection cases, against a real Streamable HTTP server (the in-process transport does not enforce auth, so these must run over HTTP).

  • test_adapters.py — backend-row → Hit mapping for both adapters, using injected fakes (no real backends).

  • test_smoke.py / test_wiring.py — tool wiring, parameter validation, and backend selection.

CI (.github/workflows/ci.yml) runs ruff check + pytest on every push and PR via uv sync --locked (which also fails if the lockfile is stale). No databases, models, or secrets are required.
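The schema regression gate follows the usual snapshot-test pattern, sketched below with the standard library. The SNAPSHOT contents are an illustrative guess at what a JSON schema for retrieve_graph could look like, not the project's actual snapshot, and assert_matches_snapshot is a hypothetical helper.

```python
import difflib
import json

# Illustrative frozen schema; the real snapshot lives in the test suite.
SNAPSHOT = {
    "properties": {
        "query": {"type": "string"},
        "top_k": {"type": "integer", "minimum": 1, "maximum": 20},
        "route": {"enum": ["simple", "complex", "adaptive"]},
    },
    "required": ["query"],
}


def canonical(schema: dict) -> str:
    """Stable text form, so key order never causes a false failure."""
    return json.dumps(schema, indent=2, sort_keys=True)


def assert_matches_snapshot(current: dict, snapshot: dict) -> None:
    """Fail with a readable diff if the generated schema has drifted."""
    if current == snapshot:
        return
    diff = "\n".join(
        difflib.unified_diff(
            canonical(snapshot).splitlines(),
            canonical(current).splitlines(),
            "snapshot",
            "current",
            lineterm="",
        )
    )
    raise AssertionError("tool schema drifted from snapshot:\n" + diff)
```

With this shape, widening top_k to 50 or dropping an enum member fails the test and shows exactly which line of the contract changed.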

Running against real backends

The real adapters call into rag_graph and rag_kb directly (the rag_kb HTTP endpoint returns a generated answer, not chunks, so retrieval goes through its Python pipelines). To run for real, the two repos must be importable and their dependencies installed in this environment:

export RAG_MCP_BACKEND=real
export RAG_GRAPH_PATH=/path/to/rag_graph
export RAG_KB_PATH=/path/to/rag_kb
uv run python -m rag_mcp

The composition root puts those paths on sys.path and lazy-imports the backends, so the import cost and the heavy dependencies are incurred only in this mode.
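A composition root of this kind could look roughly like the sketch below. The environment variable names come from the commands above; the function names, the default value "stub", and the error messages are assumptions made for illustration and may not match wiring.py.

```python
import importlib
import os
import sys
from collections.abc import Mapping
from types import ModuleType


def select_backend(env: Mapping[str, str] = os.environ) -> str:
    """Read RAG_MCP_BACKEND, defaulting to the stub backend."""
    backend = env.get("RAG_MCP_BACKEND", "stub")
    if backend not in ("stub", "real"):
        raise ValueError(f"unknown RAG_MCP_BACKEND: {backend!r}")
    return backend


def load_backend_module(
    path_var: str, module_name: str, env: Mapping[str, str] = os.environ
) -> ModuleType:
    """Put a backend checkout on sys.path, then import it on demand."""
    path = env.get(path_var)
    if not path:
        raise RuntimeError(f"{path_var} must be set when RAG_MCP_BACKEND=real")
    if path not in sys.path:
        sys.path.insert(0, path)
    # Importing here, not at module top level, keeps heavy dependencies
    # out of stub mode entirely.
    return importlib.import_module(module_name)
```

In stub mode load_backend_module is never called, so neither the paths nor the backend packages need to exist.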

Design notes

  • Schema matches backend, never the reverse. Both tools' parameters were redesigned once the real retrieval code was read, so no exposed option is unsupported (e.g. there is no hybrid + hyde because no such pipeline exists).

  • rag_kb via import, not HTTP. Its /query endpoint retrieves and generates; an MCP retrieval tool needs raw chunks, so the adapter calls the async pipeline functions and skips generation.

  • doc_id for rag_graph is a content hash. FlashRank rebuilds reranked documents without a stable id (and the id it adds is just a batch index), so a hash of the chunk text is the most reliable stable identifier; the original metadata is preserved in Hit.metadata. rag_kb uses its real chunk_id.

  • Sync retrieval is offloaded. rag_graph's LangChain retrievers are synchronous; the adapter wraps .invoke in asyncio.to_thread so the async server is never blocked.
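The last two notes can be illustrated together. In this sketch the hash function (SHA-256 over the UTF-8 text) is an assumption, since the README does not say which hash the adapter uses, and FakeSyncRetriever is a hypothetical stand-in for a synchronous LangChain-style retriever with an invoke method.

```python
import asyncio
import hashlib


def content_doc_id(text: str) -> str:
    """Derive a stable id from chunk text (hash choice is an assumption)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class FakeSyncRetriever:
    """Stand-in for a blocking retriever; returns plain chunk texts."""

    def invoke(self, query: str) -> list[str]:
        return [f"chunk about {query}", "another chunk"]


async def retrieve(retriever: FakeSyncRetriever, query: str) -> list[tuple[str, str]]:
    # Run the blocking call in a worker thread so the event loop stays free.
    texts = await asyncio.to_thread(retriever.invoke, query)
    return [(content_doc_id(t), t) for t in texts]


if __name__ == "__main__":
    for doc_id, text in asyncio.run(retrieve(FakeSyncRetriever(), "fusion")):
        print(doc_id[:12], text)
```

The same chunk text always maps to the same id regardless of rerank order, which is the property a batch index cannot provide.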

Project layout

src/rag_mcp/
  server.py        build_server(): the two MCP tools
  retrieval.py     Hit model, Retriever Protocols, stub backends
  auth.py          JWTVerifier builder
  wiring.py        composition root (stub | real)
  client.py        thin client + self-contained --demo
  __main__.py      dev entrypoint (prints token, serves over HTTP)
  adapters/
    rag_kb.py      RagKbRetriever  (+ real wiring)
    rag_graph.py   RagGraphRetriever (+ real wiring)
tests/
  conftest.py      stub + authed-HTTP fixtures
  test_schema.py   tool-contract regression gate
  test_auth.py     401 cases + happy path (real HTTP)
  test_adapters.py row/Document -> Hit mapping
  test_smoke.py    tool wiring & validation
  test_wiring.py   backend selection