
Agentic RAG MCP

CI License: MIT Python 3.10+ MCP

A multi-agent Retrieval-Augmented Generation system exposed as an MCP server. Ask a question and a LangGraph pipeline plans the retrieval, pulls evidence from a pgvector knowledge base, optionally augments it with live web research, drafts a cited answer, and then self-critiques it for grounding — revising until the answer is supported by the sources.

It plugs into any MCP client (Claude Code/Desktop, Cursor, Windsurf, …) as three tools: ingest, ask, and search.

Why this design? A bare RAG endpoint is easy to copy; a multi-agent system that verifies its own answers and ships as an MCP server is not. The architecture is the moat — "easy to buy, hard to replicate."


Architecture

flowchart LR
    Q([Question]) --> P[🧭 Planner<br/>plan + search queries]
    P --> R[📚 Retriever<br/>pgvector top-k]
    R --> W[🌐 Web Researcher<br/>Firecrawl • optional]
    W --> S[✍️ Synthesizer<br/>cited answer]
    S --> C{🔎 Critic<br/>grounded?}
    C -- needs revision --> S
    C -- grounded --> A([Answer + citations])

    subgraph Stores
      DB[(Supabase<br/>pgvector)]
    end
    R <-->|cosine search| DB

    classDef agent fill:#1e293b,stroke:#7C3AED,color:#e2e8f0;
    class P,R,W,S,C agent;

| Agent | Model / tool | Responsibility |
| --- | --- | --- |
| Planner | Claude (claude-opus-4-8, adaptive thinking) | Decompose the question into focused search queries |
| Retriever | Voyage embeddings + pgvector | Cosine top-k over the knowledge base |
| Web Researcher | Firecrawl (optional) | Augment with live web results when a key is set |
| Synthesizer | Claude | Draft an answer grounded in context, with [n] citations |
| Critic | Claude | Verify grounding; loop back for revision if unsupported |


MCP tools

| Tool | Arguments | Returns |
| --- | --- | --- |
| ingest | url: str | Scrapes the URL, chunks and embeds it, and stores it. Returns { url, chunks_added } |
| ask | question: str | Runs the full pipeline. Returns { answer, citations, plan, grounded } |
| search | query: str, k: int = 5 | Retrieval only: top-k chunks with similarity scores |
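
To make "top-k chunks with similarity scores" concrete, here is a minimal pure-Python sketch of cosine ranking. It is an illustration only: in the project the ranking runs inside pgvector, and the names `cosine_similarity`, `top_k`, and the toy 2-d vectors are assumptions made for this example, not part of the codebase.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], chunks: list[tuple[str, list[float]]], k: int = 5):
    """Return the k chunks most similar to query_vec as (text, score) pairs, best first."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in chunks]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 2-d "embeddings" (hypothetical); real vectors would come from the embedding model.
chunks = [
    ("about cats", [1.0, 0.0]),
    ("about dogs", [0.0, 1.0]),
    ("cats and dogs", [1.0, 1.0]),
]
results = top_k([1.0, 0.0], chunks, k=2)
```

pgvector reports cosine distance, so a similarity score of this kind corresponds to 1 minus that distance.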


Quickstart

# 1. Install (Python 3.10+)
uv venv && uv pip install -e ".[dev]"     # or: pip install -e ".[dev]"

# 2. Configure
cp .env.example .env                       # fill in ANTHROPIC_API_KEY, VOYAGE_API_KEY, DATABASE_URL

# 3. Create the vector table (Supabase SQL editor or psql)
psql "$DATABASE_URL" -f sql/schema.sql

# 4. Run the MCP server (stdio by default)
agentic-rag-mcp

Connect it to Claude Code

claude mcp add agentic-rag -s user \
  --env ANTHROPIC_API_KEY=sk-ant-... \
  --env VOYAGE_API_KEY=pa-... \
  --env DATABASE_URL=postgresql://... \
  -- agentic-rag-mcp

Then, from the client: "ingest https://example.com/docs", followed by "ask: how do I configure X?".


How it works

  1. Plan — Claude turns the question into a short plan + 1–5 search queries.

  2. Retrieve — each query is embedded (Voyage voyage-3.5) and matched against pgvector by cosine distance; results are de-duplicated and ranked.

  3. Research — if FIRECRAWL_API_KEY is set, live web results are added to the context.

  4. Synthesize — Claude writes an answer grounded only in the numbered context, citing each claim as [n].

  5. Critique — a strict fact-checker pass decides whether the answer is fully supported. If not (and revisions remain), it loops back to the synthesizer with feedback.
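
The control flow of the steps above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the project's LangGraph code: the agents are passed in as ordinary callables, the optional web research step is left out, and `RagState`, `merge_results`, `run_pipeline`, and the default of two revisions are hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class RagState:
    question: str
    context: list[str] = field(default_factory=list)
    answer: str = ""
    feedback: str = ""
    grounded: bool = False
    revisions: int = 0


def merge_results(result_lists: list[list[tuple[str, float]]]) -> list[str]:
    """De-duplicate chunks across queries, keep each chunk's best score, rank best first."""
    best: dict[str, float] = {}
    for results in result_lists:
        for text, score in results:
            if score > best.get(text, float("-inf")):
                best[text] = score
    return sorted(best, key=best.get, reverse=True)


def run_pipeline(question, plan, retrieve, synthesize, critique, max_revisions=2):
    """Plan, retrieve, then alternate synthesize/critique until grounded or out of revisions."""
    state = RagState(question=question)
    state.context = merge_results([retrieve(q) for q in plan(question)])
    while True:
        state.answer = synthesize(state)
        state.grounded, state.feedback = critique(state)
        if state.grounded or state.revisions >= max_revisions:
            return state
        state.revisions += 1  # loop back to the synthesizer with the critic's feedback


# Stub agents (hypothetical) standing in for the LLM and vector-store calls.
def fake_plan(question):
    return ["q1", "q2"]


def fake_retrieve(query):
    hits = {"q1": [("chunk A", 0.9), ("chunk B", 0.4)], "q2": [("chunk B", 0.7)]}
    return hits[query]


def fake_synthesize(state):
    return "revised draft [1]" if state.feedback else "draft"


def fake_critique(state):
    ok = "[1]" in state.answer
    return ok, ("" if ok else "add citations")


final = run_pipeline("how do I configure X?", fake_plan, fake_retrieve,
                     fake_synthesize, fake_critique)
```

With these stubs the first draft is rejected for lacking citations, the second passes, and the loop exits after one revision.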

Configurable via env: RAG_MODEL, RAG_TOP_K, RAG_MAX_REVISIONS, RAG_EMBED_MODEL.
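
As a rough sketch of how these variables might be read, the snippet below loads them into a settings object. The `Settings` class, the `load_settings` helper, and all default values are assumptions made for illustration: the model names and k = 5 are borrowed from elsewhere in this README, and the revision limit of 2 is a placeholder. The project's own config.py may differ.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    model: str
    top_k: int
    max_revisions: int
    embed_model: str


def load_settings(env=None) -> Settings:
    """Build settings from a mapping of environment variables (os.environ by default)."""
    env = os.environ if env is None else env
    return Settings(
        model=env.get("RAG_MODEL", "claude-opus-4-8"),          # assumed default
        top_k=int(env.get("RAG_TOP_K", "5")),                   # assumed default
        max_revisions=int(env.get("RAG_MAX_REVISIONS", "2")),   # placeholder default
        embed_model=env.get("RAG_EMBED_MODEL", "voyage-3.5"),   # assumed default
    )


# Passing a dict instead of os.environ keeps the example deterministic.
settings = load_settings({"RAG_TOP_K": "8"})
```

Parsing the integers once at startup means a malformed value such as RAG_TOP_K=abc fails immediately with a ValueError rather than midway through a request.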


Evaluation

Answer quality is tracked with promptfoo — faithfulness, citation presence, and latency — so quality is measured, not asserted:

cd evals && promptfoo eval -c promptfooconfig.yaml

See evals/ for the rubric and test cases.


Deploy

Containerised and ready for Railway (HTTP transport):

railway up        # uses Dockerfile + railway.json; set RAG_TRANSPORT=http

Expose the port set by RAG_HTTP_PORT and connect the client over --transport http. A cloudflared tunnel works for local demos.


Project layout

src/agentic_rag_mcp/
  config.py      # env-driven settings
  llm.py         # Anthropic (Claude) helper — adaptive thinking, JSON parsing
  embeddings.py  # Voyage embeddings
  store.py       # pgvector store (psycopg)
  web.py         # Firecrawl web research (optional)
  ingest.py      # chunking + ingestion
  state.py       # LangGraph state
  nodes.py       # planner / retriever / researcher / synthesizer / critic
  graph.py       # graph assembly
  server.py      # FastMCP server (ingest / ask / search)
sql/schema.sql   # pgvector schema
evals/           # promptfoo eval suite

License

MIT — see LICENSE.
