mcp-ai-workspace
Give an LLM a tool that searches your own documents — and measure whether it retrieves the right ones.
A small, from-scratch implementation of the pattern behind every serious "chat with your data" product: an MCP server that exposes document retrieval as a tool, a vector-RAG pipeline underneath it, and an evals harness that scores retrieval quality. No framework magic — ~200 lines you can read in one sitting.
The problem it solves
An LLM on its own can't see your private documents, and when asked about them it will confidently make things up. The fix is retrieval: look up the relevant real passages first, then answer from them, with citations.
The interesting question is how an agent gets access to that retrieval. The answer here is MCP (the Model Context Protocol) — the emerging standard for giving models tools. This repo wires retrieval up as an MCP tool, so any MCP-capable client (Claude Desktop, an agent framework, a self-hosted chat UI) can call it and answer grounded in your documents instead of guessing.
What it does
Indexes a folder of markdown docs into a local vector store.
Serves a single MCP tool, search_knowledge_base, that returns the passages most relevant to a question, each with its source file and a similarity score.
Ships an evals harness that checks, over a set of known question→document pairs, whether retrieval actually surfaces the right source (hit@k + MRR).
The bundled corpus is a fictional company handbook (corpus/), so the whole
thing runs end-to-end with no setup beyond pip install.
Architecture
flowchart LR
subgraph Offline["Indexing (ingest.py)"]
D[corpus/*.md] --> C[chunk into passages]
C --> E1[embed]
E1 --> Q[(Qdrant<br/>vector store)]
end
subgraph Online["Serving (server.py)"]
U[MCP client / LLM] -->|calls tool| T[search_knowledge_base]
T --> E2[embed query]
E2 --> Q
Q -->|top-k passages + sources| T
T -->|grounded context| U
end
EV[evals/run_evals.py] -.->|same retrieval path| Q

The model on the left never talks to the vector store directly. It calls the tool; the tool does the retrieval. That indirection is the whole point of MCP.
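The two paths in the diagram can be sketched in plain Python. This is a minimal illustration, not the repo's code: a toy bag-of-words vector stands in for the fastembed model, an in-memory list stands in for Qdrant, and the function signatures, the 200-character budget, and the two sample documents are assumptions made for the example.

```python
import math
import re
from collections import Counter


def chunk(text, budget=200):
    """Pack paragraphs greedily into passages of roughly `budget` characters.

    A single paragraph longer than the budget is kept whole.
    """
    passages, current = [], ""
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        if current and len(current) + len(para) + 1 > budget:
            passages.append(current)
            current = para
        else:
            current = f"{current}\n{para}" if current else para
    if current:
        passages.append(current)
    return passages


def embed(text):
    """Toy embedding: token counts. A real pipeline would use a dense model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def ingest(corpus):
    """Offline path: chunk every document, embed each passage, store it."""
    return [
        {"source": source, "text": passage, "vector": embed(passage)}
        for source, text in corpus.items()
        for passage in chunk(text)
    ]


def search_knowledge_base(index, query, top_k=3):
    """Online path: embed the query, rank stored passages by similarity."""
    q = embed(query)
    scored = [
        {"source": e["source"], "text": e["text"], "score": cosine(q, e["vector"])}
        for e in index
    ]
    scored.sort(key=lambda r: r["score"], reverse=True)
    return scored[:top_k]


# Hypothetical two-document corpus, for illustration only.
corpus = {
    "remote-work.md": "Employees may work remotely up to three days per week.",
    "expenses.md": "Submit expense reports within thirty days of purchase.",
}
index = ingest(corpus)
results = search_knowledge_base(index, "How many days can I work remotely?", top_k=1)
print(results[0]["source"], round(results[0]["score"], 3))
```

The caller only ever sees the result list (text, source, score); the index itself stays behind the function, which mirrors the indirection the diagram describes.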
How the agent knows what it can do
An MCP tool is defined by three things, and the model reads all three to decide when and how to call it:
Part | In this repo | What it's for |
name | search_knowledge_base | how the model refers to the tool |
description | the tool's docstring | the model reads this to decide when to call it |
input schema | the tool's typed arguments | tells the model how to call it |
That contract — name + description + schema — is the entire interface between the model and your code. Get the description right and the model uses the tool well; that's most of the "prompt engineering" in an agentic system.
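To make the contract concrete, here is a rough sketch of how a name, a description, and an input schema can be derived from an ordinary Python function. It uses only the standard library and is an illustration of the idea, not the MCP SDK's implementation; the `describe_tool` helper and the `query` and `top_k` parameters are assumptions. The field names `name`, `description`, and `inputSchema` are the ones MCP uses for tool definitions.

```python
import inspect
import json
from typing import get_type_hints

# Only the handful of types this sketch needs.
JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def describe_tool(fn):
    """Build a tool definition from a function's name, docstring and type hints."""
    hints = get_type_hints(fn)
    hints.pop("return", None)
    params = inspect.signature(fn).parameters
    properties = {name: {"type": JSON_TYPES[hints[name]]} for name in params}
    # Parameters without a default value are required.
    required = [
        name for name, p in params.items() if p.default is inspect.Parameter.empty
    ]
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn),
        "inputSchema": {
            "type": "object",
            "properties": properties,
            "required": required,
        },
    }


# Hypothetical signature; the repo's actual arguments may differ.
def search_knowledge_base(query: str, top_k: int = 3) -> list:
    """Search the company handbook and return the most relevant passages with their sources."""
    return []


print(json.dumps(describe_tool(search_knowledge_base), indent=2))
```

Everything the model learns about the tool is in that printed structure, which is why the docstring deserves the same care as a prompt.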
Run it
make install # create a venv, install deps
make ingest # build the vector index from corpus/
make evals # score retrieval quality
make serve # run the MCP server (stdio)
# or just:
make demo # ingest + evals, end to end

To use it from an MCP client (e.g. Claude Desktop), register the server using the example in mcp-client-config.example.json (update the absolute path to match your checkout), restart the client, and the model gains a search_knowledge_base tool.
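For orientation, the snippet below prints a client entry in the `mcpServers` layout that Claude Desktop reads. It is a sketch under assumptions, not a copy of the repo's example file: the checkout path is a placeholder, and the `.venv` location and the position of server.py at the repo root are guesses. Treat mcp-client-config.example.json as the reference.

```python
import json
from pathlib import Path

# Placeholder: replace with the absolute path of your own checkout.
repo = Path("/absolute/path/to/mcp-ai-workspace")

config = {
    "mcpServers": {
        "mcp-ai-workspace": {
            # Assumed: the venv lives in .venv and server.py sits at the repo root.
            "command": str(repo / ".venv" / "bin" / "python"),
            "args": [str(repo / "server.py")],
        }
    }
}

print(json.dumps(config, indent=2))
```

Absolute paths matter here because the client launches the server itself over stdio, from its own working directory rather than from the repo.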
Evals
make evals runs evals/evalset.json — questions whose correct source document
is known — and reports:
hit@k — fraction of questions where the right document is in the top-k
MRR — mean reciprocal rank, which rewards ranking the right doc first
The run exits non-zero if hit@k falls below the threshold, so it can gate CI.
"I built RAG" is cheap; "I measure RAG, and here's the number" is the point.
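The two metrics and the CI gate can be sketched as follows. This is an illustration under assumptions rather than the contents of evals/run_evals.py: the input shape (a ranked list of sources plus the expected source per question), the sample data, and the 0.8 threshold are all hypothetical. MRR is computed here over the full returned ranking, not truncated at k.

```python
def reciprocal_rank(ranked_sources, expected):
    """1/rank of the expected source, or 0.0 if it was not retrieved."""
    for rank, source in enumerate(ranked_sources, start=1):
        if source == expected:
            return 1.0 / rank
    return 0.0


def score(results, k=3):
    """results: list of (ranked_sources, expected_source) pairs."""
    if not results:
        raise ValueError("eval set is empty")
    n = len(results)
    hits = sum(expected in ranked[:k] for ranked, expected in results)
    rr_total = sum(reciprocal_rank(ranked, expected) for ranked, expected in results)
    return {"hit@k": hits / n, "mrr": rr_total / n}


def exit_code(metrics, threshold=0.8):
    """0 when hit@k meets the threshold, 1 otherwise."""
    return 0 if metrics["hit@k"] >= threshold else 1


# Hypothetical retrieval outcomes for four questions.
results = [
    (["remote-work.md", "expenses.md", "leave.md"], "remote-work.md"),  # rank 1
    (["leave.md", "expenses.md", "remote-work.md"], "expenses.md"),     # rank 2
    (["leave.md", "remote-work.md", "security.md"], "expenses.md"),     # miss
    (["security.md", "leave.md", "expenses.md"], "expenses.md"),        # rank 3
]

metrics = score(results, k=2)
print(metrics, "exit code:", exit_code(metrics))
```

With k=2, two of the four questions count as hits, so hit@k is 0.5 and the gate fails at a 0.8 threshold. A script would pass that value to `sys.exit` so the CI job fails with it.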
Stack
Layer | Choice | Why |
Tool protocol | MCP | the standard way to expose tools to an LLM |
Vector store | Qdrant (local, on-disk) | a real vector DB API with no service to run |
Embeddings | fastembed | ONNX, CPU-only, no torch, no API key |
Every choice is swappable: a stronger embedding model, a hosted Qdrant, or a synthesis step that calls an LLM to write the final answer from the retrieved passages.
What I'd add next
An answer tool that calls an LLM to synthesise a cited answer from the retrieved passages (kept out of the core so the repo runs with no API key).
Chunking by semantics rather than character budget.
A reranker, and reporting precision/recall per document, not just hit@k.
Acknowledgements
The architecture here — a self-hosted LLM fronted by MCP tools and a vector-RAG layer — follows the pattern I learned from my DevOps professor, Oriol Rius, whose course stack first showed me how these pieces fit together. This repo is my own from-scratch, minimal re-implementation, written to internalise the concepts and demonstrate them honestly in code I wrote myself.
License
MIT — see LICENSE.