RAG Knowledge Base MCP Server

by RaulSaavedraDeLaRiera

RAG Knowledge Base — Hybrid Search + Evaluation

A production-style Retrieval-Augmented Generation pipeline over a document knowledge base. It combines dense and sparse retrieval, cross-encoder reranking, source-grounded answers with citations, and a full evaluation harness that measures both retrieval and answer quality. Exposed over a REST API and as an MCP server for agentic access.

Retrieval and storage run fully locally at zero cost (PostgreSQL + pgvector, on-device embeddings), and the embedding backend is pluggable, so the same code runs against an API provider by changing one config value. Answer generation calls Claude and needs an ANTHROPIC_API_KEY.

Why this is more than a basic RAG

Concern	Approach
Retrieval	Hybrid search: pgvector cosine (dense) + Postgres full-text (sparse), fused with Reciprocal Rank Fusion
Ranking	Cross-encoder reranker scores each (query, chunk) pair directly
Grounding	Answers cite sources with `[n]` markers and refuse when the context is insufficient
Evaluation	Retrieval metrics (precision@k, recall@k, MRR) + LLM-as-judge faithfulness and answer relevance + refusal accuracy
A/B evaluation	Same harness runs each retrieval mode (vector / hybrid / hybrid+rerank) and reports the lift with numbers
Streaming	Answers stream token by token over Server-Sent Events
UI	Minimal web frontend with live streaming and clickable citations
Portability	Pluggable embedding backend (local sentence-transformers or Voyage API)
Agentic access	MCP server exposing `search_knowledge_base` and `ask_knowledge_base` tools
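
As a rough illustration of the fusion step in the Retrieval row, here is a minimal sketch of Reciprocal Rank Fusion over two ranked lists of chunk ids. The function name, the example ids and the constant `k=60` (the value commonly used in the RRF literature) are assumptions for illustration; the repository's own implementation and constant may differ.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of chunk ids into a single ranking.

    Each id scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is an assumed default, not the repo's.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the order is stable.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


# Hypothetical chunk ids returned by the dense and sparse retrievers.
dense = [7, 3, 9]
sparse = [7, 5, 3]
fused = reciprocal_rank_fusion([dense, sparse])
print(fused)
```

Chunk 7 is ranked first by both retrievers, so it stays on top; chunk 3 appears in both lists and therefore outranks chunks 5 and 9, which each appear in only one.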

Architecture

graph LR
    subgraph Ingestion
        DOCS[Documents\nmd / txt / pdf]
        CHUNK[Chunker\nparagraph-aware + overlap]
        EMB[Embedding backend\nlocal or api]
    end

    subgraph Store ["Vector Store — PostgreSQL + pgvector"]
        VEC[(chunks\nvector + tsvector)]
    end

    subgraph Retrieval
        DENSE[Vector search\ncosine / hnsw]
        SPARSE[Keyword search\nfull-text / gin]
        RRF[Reciprocal Rank Fusion]
        RER[Cross-encoder rerank]
    end

    subgraph Generation
        GEN[Claude\ngrounded + cited answer]
    end

    DOCS --> CHUNK --> EMB --> VEC
    VEC --> DENSE --> RRF
    VEC --> SPARSE --> RRF
    RRF --> RER --> GEN
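
The "paragraph-aware + overlap" chunker in the diagram can be pictured with the sketch below. It is an assumption about the general technique, not the project's code: the function name, the character budget and the one-paragraph overlap are all hypothetical.

```python
def chunk_paragraphs(text, max_chars=500, overlap=1):
    """Pack whole paragraphs into chunks of roughly max_chars characters.

    The last `overlap` paragraphs of a chunk are repeated at the start of
    the next one, so context is not lost at chunk boundaries.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, current = [], []
    for para in paragraphs:
        candidate = current + [para]
        if current and len("\n\n".join(candidate)) > max_chars:
            # Budget exceeded: close the current chunk without splitting a paragraph.
            chunks.append("\n\n".join(current))
            # Carry the tail paragraphs over into the next chunk.
            tail = current[-overlap:] if overlap else []
            current = tail + [para]
        else:
            current = candidate
    if current:
        chunks.append("\n\n".join(current))
    return chunks


# Three 30-character paragraphs with a 70-character budget.
sample = "A" * 30 + "\n\n" + "B" * 30 + "\n\n" + "C" * 30
chunks = chunk_paragraphs(sample, max_chars=70, overlap=1)
print(len(chunks))
```

In this example the second chunk starts with the "B" paragraph that also ends the first chunk, which is the overlap at work. A chunk can exceed `max_chars` when the carried-over tail plus the next paragraph is long; a stricter version would need to split long paragraphs.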

Stack

Layer	Tool
Vector store	PostgreSQL + pgvector (HNSW index)
Keyword search	Postgres full-text search (GIN index)
Embeddings	sentence-transformers (local) / Voyage AI (optional)
Reranking	cross-encoder (sentence-transformers)
Generation	Claude (Anthropic)
Serving	FastAPI (REST + SSE streaming) + web UI + MCP server

Quickstart

# 1. start the vector store
make db

# 2. install dependencies and set your key
make install
cp .env.example .env      # add ANTHROPIC_API_KEY

# 3. ingest the sample knowledge base (fictional "Nimbus" product docs)
make ingest RESET=1

# 4. start the API and open the web UI
make api
# then open http://localhost:8000 in a browser, or query the API directly:
curl -X POST localhost:8000/ask \
  -H "content-type: application/json" \
  -d '{"question": "How much does the Standard tier cost?"}'

# 5. run the evaluation harness and the retrieval a/b comparison
make eval
make compare

Example response

{
  "answer": "The Standard tier costs 99 US dollars per month. [1]",
  "citations": [
    {"marker": 1, "source": "nimbus_pricing.md", "title": "nimbus_pricing", "score": 8.42}
  ],
  "retrieved": [
    {"chunk_id": 7, "source": "nimbus_pricing.md", "score": 8.42, "preview": "..."}
  ]
}
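
A client might consume this payload as in the sketch below, which renders the answer with a source list and checks that every `[n]` marker in the answer has a matching citation. The helper names are hypothetical; only the field names are taken from the example response above.

```python
import re

# The example response from above, as a Python dict.
payload = {
    "answer": "The Standard tier costs 99 US dollars per month. [1]",
    "citations": [
        {"marker": 1, "source": "nimbus_pricing.md", "title": "nimbus_pricing", "score": 8.42}
    ],
    "retrieved": [
        {"chunk_id": 7, "source": "nimbus_pricing.md", "score": 8.42, "preview": "..."}
    ],
}


def format_answer(payload):
    """Render the answer followed by a numbered list of cited sources."""
    lines = [payload["answer"], "", "Sources:"]
    for c in payload["citations"]:
        lines.append(f"[{c['marker']}] {c['source']} (score {c['score']:.2f})")
    return "\n".join(lines)


def uncited_markers(payload):
    """Return markers used in the answer text that have no citation entry."""
    cited = {c["marker"] for c in payload["citations"]}
    used = {int(m) for m in re.findall(r"\[(\d+)\]", payload["answer"])}
    return sorted(used - cited)


rendered = format_answer(payload)
print(rendered)
print(uncited_markers(payload))
```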

Evaluation

The harness runs a gold question set (eval/dataset.py) and reports:

Retrieval — precision@k, recall@k, mean reciprocal rank against known relevant sources
Generation — faithfulness (are all claims grounded in the retrieved context) and answer relevance (does it match the reference), both judged by an LLM on a 0-1 scale
Refusal accuracy — whether the system correctly declines to answer a question the knowledge base does not cover

python -m eval.run_eval

Results are printed as a summary table and written to eval/results/latest.json.
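
For readers unfamiliar with the retrieval metrics, the sketch below shows one common way to compute them per question at the source level. It illustrates the standard definitions and is not the code in the eval package: the function names are assumptions, and every file name except nimbus_pricing.md is made up.

```python
def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k retrieved sources that are relevant."""
    return sum(1 for src in retrieved[:k] if src in relevant) / k


def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant sources that appear in the top k."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def reciprocal_rank(retrieved, relevant, k):
    """1 / rank of the first relevant source in the top k, or 0 if none."""
    for rank, src in enumerate(retrieved[:k], start=1):
        if src in relevant:
            return 1.0 / rank
    return 0.0


def mean(values):
    """Average over questions; MRR is the mean of the reciprocal ranks."""
    values = list(values)
    return sum(values) / len(values) if values else 0.0


# One hypothetical gold question: the correct source is ranked second.
retrieved = ["nimbus_faq.md", "nimbus_pricing.md", "nimbus_setup.md"]
relevant = {"nimbus_pricing.md"}

p3 = precision_at_k(retrieved, relevant, 3)
r3 = recall_at_k(retrieved, relevant, 3)
rr = reciprocal_rank(retrieved, relevant, 3)
print(p3, r3, rr)
```

Here recall@3 is 1.0 because the relevant source is in the top three, while the reciprocal rank is 0.5 because it sits in second place. A reranker that moves it to first place would raise the reciprocal rank to 1.0 without changing recall@3.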

A/B comparison of retrieval modes

eval/compare.py runs the same gold set through each retrieval mode and reports the lift, so design decisions are backed by numbers rather than asserted. It uses only deterministic retrieval metrics, so it makes no LLM calls and costs nothing.

python -m eval.compare

On the sample corpus, reranking lifts recall@1 from 0.923 to 1.0 and MRR@1 from 0.846 to 0.923, while hybrid RRF fusion alone scores the same as vector-only search:

mode                           k=1             k=3             k=5
------------------------------------------------------------------
vector only         0.923 /  0.846    1.0 /  0.885    1.0 /  0.885
hybrid (rrf)        0.923 /  0.846    1.0 /  0.885    1.0 /  0.885
hybrid + rerank       1.0 /  0.923    1.0 /  0.923    1.0 /  0.923
                    (recall@k / mrr@k)

The cross-encoder reranker fixes the case where a semantically close distractor outranked the correct passage for the top position.

Web UI

Start the API with make api and open http://localhost:8000. The frontend streams the answer token by token and renders the cited sources with their rerank scores, so you can see exactly which passages grounded the response.

Adding your own documents

Drop .md, .txt or .pdf files into data/documents/ and re-run make ingest RESET=1. The schema adapts to the embedding dimension of the configured backend automatically.

Using it as an MCP server

The pipeline is exposed as an MCP server so an LLM agent can retrieve grounded facts on demand:

python -m mcp_server.server

Tools: search_knowledge_base(query, top_k) for raw passages and ask_knowledge_base(question) for a grounded, cited answer.
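
On the wire, an MCP client invokes these tools with JSON-RPC 2.0 `tools/call` requests. The sketch below only builds the request messages to show their shape. The helper name and the example argument values are assumptions, the argument names follow the tool signatures listed above, and a real client would normally use an MCP client library and perform the initialize handshake before calling tools.

```python
import json
from itertools import count

# JSON-RPC request ids must be unique per connection; a counter is enough here.
_request_ids = count(1)


def tool_call(name, arguments):
    """Build a JSON-RPC 2.0 `tools/call` request for an MCP tool."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Raw passages: argument names taken from search_knowledge_base(query, top_k).
search_req = tool_call(
    "search_knowledge_base", {"query": "Standard tier price", "top_k": 3}
)

# Grounded, cited answer: argument name taken from ask_knowledge_base(question).
ask_req = tool_call(
    "ask_knowledge_base", {"question": "How much does the Standard tier cost?"}
)

print(json.dumps(search_req, indent=2))
print(json.dumps(ask_req, indent=2))
```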

The retrieval, ranking, generation and evaluation core was designed by hand. AI agents assisted with documentation, the web frontend and peripheral scaffolding.
