mcp-ai-workspace

by nickbiird

Give an LLM a tool that searches your own documents — and measure whether it retrieves the right ones.

A small, from-scratch implementation of the pattern behind every serious "chat with your data" product: an MCP server that exposes document retrieval as a tool, a vector-RAG pipeline underneath it, and an evals harness that scores retrieval quality. No framework magic — ~200 lines you can read in one sitting.


The problem it solves

An LLM on its own can't see your private documents, and when asked about them it will confidently make things up. The fix is retrieval: look up the relevant real passages first, then answer from them, with citations.

The interesting question is how an agent gets access to that retrieval. The answer here is MCP (the Model Context Protocol) — the emerging standard for giving models tools. This repo wires retrieval up as an MCP tool, so any MCP-capable client (Claude Desktop, an agent framework, a self-hosted chat UI) can call it and answer grounded in your documents instead of guessing.

What it does

  • Indexes a folder of markdown docs into a local vector store.

  • Serves a single MCP tool, search_knowledge_base, that returns the passages most relevant to a question, each with its source file and a similarity score.

  • Ships an evals harness that checks, over a set of known question→document pairs, whether retrieval actually surfaces the right source (hit@k + MRR).

The bundled corpus is a fictional company handbook (corpus/), so the whole thing runs end-to-end with no setup beyond pip install.

Architecture

flowchart LR
    subgraph Offline["Indexing (ingest.py)"]
        D[corpus/*.md] --> C[chunk into passages]
        C --> E1[embed]
        E1 --> Q[(Qdrant<br/>vector store)]
    end

    subgraph Online["Serving (server.py)"]
        U[MCP client / LLM] -->|calls tool| T[search_knowledge_base]
        T --> E2[embed query]
        E2 --> Q
        Q -->|top-k passages + sources| T
        T -->|grounded context| U
    end

    EV[evals/run_evals.py] -.->|same retrieval path| Q

The model on the left never talks to the vector store directly. It calls the tool; the tool does the retrieval. That indirection is the whole point of MCP.
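To make the two halves of the diagram concrete, here is a minimal, dependency-free sketch of the same flow: index passages once, then answer tool calls by embedding the query and ranking passages by cosine similarity. It is not the code in ingest.py or server.py. The bag-of-words `embed` function is a toy stand-in for the real embedding model, the in-memory list stands in for Qdrant, and the sample passages and file names are made up for illustration.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Offline step: embed each passage once and keep it next to its source.
# These passages and file names are hypothetical, not the bundled corpus.
PASSAGES = [
    {"source": "vacation.md", "text": "Employees accrue vacation days every month."},
    {"source": "expenses.md", "text": "Submit expense reports with receipts within 30 days."},
    {"source": "security.md", "text": "Rotate passwords and enable two-factor authentication."},
]
INDEX = [(embed(p["text"]), p) for p in PASSAGES]


def search_knowledge_base(query: str, top_k: int = 3) -> list[dict]:
    """Online step: embed the query, score every passage, return the top-k."""
    query_vec = embed(query)
    scored = [
        {"source": p["source"], "text": p["text"], "score": cosine(query_vec, vec)}
        for vec, p in INDEX
    ]
    scored.sort(key=lambda r: r["score"], reverse=True)
    return scored[:top_k]


results = search_knowledge_base("How many vacation days do employees get?", top_k=2)
for r in results:
    print(f'{r["score"]:.2f}  {r["source"]}  {r["text"]}')
```

The caller only ever sees `search_knowledge_base` and its return value; swapping the toy index for a real vector store would not change that surface.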

How the agent knows what it can do

An MCP tool is defined by three things, and the model reads all three to decide when and how to call it:

| Part | In this repo | What it's for |
| --- | --- | --- |
| name | search_knowledge_base | how the model refers to the tool |
| description | the tool's docstring in server.py | the model reads this to decide when to call it |
| input schema | the typed arguments (query: str, top_k: int) | tells the model how to call it |

That contract — name + description + schema — is the entire interface between the model and your code. Get the description right and the model uses the tool well; that's most of the "prompt engineering" in an agentic system.
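As a rough illustration of that contract, the sketch below derives all three parts from an ordinary Python function using only the standard library. SDKs such as FastMCP do something similar from a function's name, docstring, and type hints, but this is not the SDK's code. The docstring wording, the `top_k` default of 3, and the `describe_tool` helper are assumptions made for the example; `name`, `description`, and `inputSchema` are the field names an MCP tool definition uses.

```python
import inspect
from typing import get_type_hints

# Minimal mapping from Python annotations to JSON Schema type names.
JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def search_knowledge_base(query: str, top_k: int = 3) -> list:
    """Search the company handbook and return the most relevant passages.

    Use this whenever the user asks about company policy or procedures.
    """
    return []  # retrieval omitted; only the signature matters here


def describe_tool(fn) -> dict:
    """Hypothetical helper: build a tool definition from a function."""
    hints = get_type_hints(fn)
    hints.pop("return", None)
    params = inspect.signature(fn).parameters
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn),
        "inputSchema": {
            "type": "object",
            "properties": {n: {"type": JSON_TYPES[t]} for n, t in hints.items()},
            # Parameters without a default must be supplied by the model.
            "required": [
                n for n, p in params.items() if p.default is inspect.Parameter.empty
            ],
        },
    }


tool = describe_tool(search_knowledge_base)
print(tool["name"])
print(tool["inputSchema"])
```

Because the description comes straight from the docstring, editing the docstring is how you change when the model decides to call the tool.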

Run it

make install      # create a venv, install deps
make ingest       # build the vector index from corpus/
make evals        # score retrieval quality
make serve        # run the MCP server (stdio)
# or just:
make demo         # ingest + evals, end to end

To use it from an MCP client (e.g. Claude Desktop), register the server using the example in mcp-client-config.example.json, replacing the absolute path with the location of your checkout. Restart the client, and the model gains a search_knowledge_base tool.
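For orientation, this is roughly the shape of a Claude Desktop server entry, generated from Python so the path is easy to substitute. The contents of mcp-client-config.example.json are not reproduced here: the server key `ai-workspace`, the `python` command, and the placeholder path are all assumptions, so prefer the repo's example file where they differ.

```python
import json

# Placeholder: replace with the real absolute path to your checkout.
SERVER_PATH = "/absolute/path/to/mcp-ai-workspace/server.py"

# Claude Desktop reads stdio servers from a top-level "mcpServers" object.
config = {
    "mcpServers": {
        "ai-workspace": {  # hypothetical server name
            "command": "python",
            "args": [SERVER_PATH],
        }
    }
}

print(json.dumps(config, indent=2))
```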

Evals

make evals runs every question in evals/evalset.json, each paired with its known correct source document, through retrieval and reports:

  • hit@k — fraction of questions where the right document is in the top-k

  • MRR — mean reciprocal rank, which rewards ranking the right doc first

The run exits non-zero if hit@k falls below the threshold, so it can gate CI. "I built RAG" is cheap; "I measure RAG, and here's the number" is the point.
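The two metrics and the CI gate fit in a few lines. The sketch below is not evals/run_evals.py: the result format, the file names, and the 0.7 threshold are assumptions for illustration. Each entry pairs the expected source with the ranked sources retrieval returned, and MRR is computed over the same top-k window as hit@k.

```python
def reciprocal_rank(ranked_sources: list[str], expected: str) -> float:
    """1/rank of the expected source, or 0.0 if it was not retrieved."""
    for rank, source in enumerate(ranked_sources, start=1):
        if source == expected:
            return 1.0 / rank
    return 0.0


def score(results: list[tuple[str, list[str]]], k: int) -> tuple[float, float]:
    """Return (hit@k, MRR) over (expected_source, ranked_sources) pairs."""
    hits = sum(expected in ranked[:k] for expected, ranked in results)
    rr_total = sum(reciprocal_rank(ranked[:k], expected) for expected, ranked in results)
    return hits / len(results), rr_total / len(results)


# Hypothetical retrieval results for four eval questions.
RESULTS = [
    ("vacation.md", ["vacation.md", "expenses.md", "security.md"]),    # rank 1
    ("expenses.md", ["security.md", "expenses.md", "vacation.md"]),    # rank 2
    ("security.md", ["vacation.md", "expenses.md", "onboarding.md"]),  # miss
    ("onboarding.md", ["onboarding.md", "vacation.md", "security.md"]),  # rank 1
]

THRESHOLD = 0.7  # assumed value; the repo's actual threshold may differ


def main() -> int:
    """Return a process exit code: 0 if hit@k meets the threshold, else 1."""
    hit_at_k, mrr = score(RESULTS, k=3)
    print(f"hit@3 = {hit_at_k:.3f}  MRR = {mrr:.3f}")
    return 0 if hit_at_k >= THRESHOLD else 1


hit_at_k, mrr = score(RESULTS, k=3)
exit_code = main()  # a script would end with: raise SystemExit(main())
```

Here three of four questions hit, so hit@3 is 0.75, and the reciprocal ranks 1, 0.5, 0, and 1 average to an MRR of 0.625. The gap between the two numbers shows how MRR penalises a correct document that is retrieved but not ranked first.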

Stack

| Layer | Choice | Why |
| --- | --- | --- |
| Tool protocol | MCP (mcp SDK, FastMCP) | the standard way to expose tools to an LLM |
| Vector store | Qdrant (local, on-disk) | a real vector DB API with no service to run |
| Embeddings | fastembed (BAAI/bge-small-en-v1.5) | ONNX, CPU-only, no torch, no API key |

Every choice is swappable: a stronger embedding model or a hosted Qdrant would drop in without changing the tool's interface, and a synthesis step that calls an LLM to write the final answer from the retrieved passages could be layered on top.

What I'd add next

  • An answer tool that calls an LLM to synthesise a cited answer from the retrieved passages (kept out of the core so the repo runs with no API key).

  • Chunking by semantics rather than character budget.

  • A reranker, and reporting precision/recall per document, not just hit@k.
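To show what the chunking item is getting at, the sketch below contrasts a fixed character-budget chunker with one that respects paragraph boundaries. Neither is taken from ingest.py: the budgets, the overlap, and the sample text are assumptions, and paragraph packing is only a cheap structural proxy for semantic chunking, which would typically split on meaning rather than blank lines.

```python
def chunk_by_chars(text: str, budget: int = 200, overlap: int = 40) -> list[str]:
    """Fixed-size character windows; consecutive windows share `overlap` chars."""
    if not 0 <= overlap < budget:
        raise ValueError("overlap must be non-negative and smaller than budget")
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + budget])
        if start + budget >= len(text):
            break  # this window already reached the end of the text
        start += budget - overlap
    return chunks


def chunk_by_paragraphs(text: str, budget: int = 200) -> list[str]:
    """Pack whole paragraphs into chunks without splitting any paragraph."""
    chunks, current = [], ""
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= budget or not current:
            current = candidate  # fits, or is a lone oversized paragraph
        else:
            chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks


# Hypothetical sample text, not from the bundled corpus.
DOC = (
    "Vacation policy.\n\n"
    "Employees accrue days monthly.\n\n"
    "Expenses must be filed within 30 days."
)

char_chunks = chunk_by_chars(DOC, budget=50, overlap=10)
para_chunks = chunk_by_paragraphs(DOC, budget=60)
```

The character chunker can cut a sentence in half at a window edge, while the paragraph chunker keeps each policy statement intact at the cost of uneven chunk sizes.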

Acknowledgements

The architecture here — a self-hosted LLM fronted by MCP tools and a vector-RAG layer — follows the pattern I learned from my DevOps professor, Oriol Rius, whose course stack first showed me how these pieces fit together. This repo is my own from-scratch, minimal re-implementation, written to internalise the concepts and demonstrate them honestly in code I wrote myself.

License

MIT — see LICENSE.
