local-document-rag-agent
Click on "Install Server".
Wait a few minutes for the server to deploy. Once ready, it will show a "Started" state.
In the chat, type
@followed by the MCP server name and your instructions, e.g., "@local-document-rag-agentWhat's the cost of the Premium subscription?"
That's it! The server will respond to your query, and you can continue using it as needed.
Here is a step-by-step guide with screenshots.
Local Document RAG Agent — with MCP and LangChain
A retrieval-augmented (RAG) AI assistant that answers questions from my own local Word and PDF documents. I point it at a folder of documents, it builds a searchable vector index, and an LLM (Claude) answers questions grounded in those documents — with source citations.
I built it and implemented it two ways: once as an MCP server that Claude Desktop talks to, and once as a standalone LangChain agent that calls the Claude API directly. The two share the same retrieval engine, so the project shows both the raw mechanics of RAG and how a framework abstracts them.
The sample dataset is a fictional legal-information/customer-support platform ("BlueRiver Legal Platforms"): support docs about customer profiles, login/access setup, amendment email alerts, IP-based access, saved searches, subscription plans/pricing, onboarding, permissions, troubleshooting and escalation. All data is synthetic and safe to publish.
What it does
Ingests a folder of
.docxand.pdffiles (including their tables and hyperlinks).Splits them into overlapping chunks, embeds each chunk, and stores them in a local vector database.
Answers natural-language questions by retrieving the most relevant chunks and letting Claude write a grounded, cited answer.
Works in two modes:
Path A — MCP server + Claude Desktop: code exposes a search tool; Claude Desktop decides when to call it and writes the answer.
Path B — Standalone LangChain agent: Python code runs the agent loop and calls the Claude API directly.
Example
Q: What is the Premium plan monthly price, and what's the total if I add 20 extra users? A: (Path B, multi-tool) Searches the knowledge base for the Premium price and per-user add-on, then uses a calculator tool to compute the total — answer grounded in
BR-SUP-011_Subscription_Plans_and_Pricing_Guide.docx.
Related MCP server: MCP Data Server
Architecture
The whole system is one pipeline, split into an offline ingestion stage and an online query stage:
INGESTION (run once / when docs change):
source docs -> extract text -> chunk -> embed -> store in Chroma
QUERY (every question):
question -> embed -> retrieve nearest chunks -> LLM writes grounded answerThe same embedding model is used for both ingestion and querying, so the question and the chunks live in the same vector space and "closeness" is meaningful.
Two consumption paths over one engine:
+-----------------------------+
Path A (MCP) | extract / chunk / embed |
Claude Desktop --> | Chroma vector index | <-- Path B (LangChain)
calls the tool | retrieval (search tool) | agent loop
+-----------------------------+Tech stack
Layer | Choice |
Extraction |
|
Embeddings |
|
Vector DB | ChromaDB (local, persistent) |
MCP server | FastMCP (stdio transport) |
LLM | Claude (via Claude Desktop, or the Anthropic API) |
Agent framework (Path B) | LangChain + LangGraph ( |
Project structure
legal_support_agent/
extract_docs.py # extraction: Word + PDF, including tables and hyperlinks
chunk_docs.py # line-aware chunking
build_index.py # embed + store in Chroma (clean rebuild)
query_test.py # standalone retrieval test
kb_server.py # Path A: MCP server (exposes the search tool)
langchain_agent/ # Path B: standalone agent
hello.py # minimal Claude API connection test
tools.py # retrieval wrapped as a LangChain tool
agent.py # hand-written agent loop (bind_tools)
react_agent.py # prebuilt multi-tool agent (LangGraph create_react_agent)
data/chroma/ # the persisted vector index (shared by both paths)
requirements.txt
.envReusable design: change the folder, reuse the agent
A core design goal was that the code never has to change to switch knowledge domains. Everything domain-specific lives in configuration, so you can repoint the agent at a completely different set of documents (a different subject, a different company) and reuse the whole thing.
To switch domains:
Set
SOURCE_DIRto the new folder of documents.Give it a fresh
COLLECTION_NAME.Update
KB_DESCRIPTIONto describe the new topics (this is the tool description the LLM reads to decide when to search).Re-run
build_index.py.(Optional) Set a matching persona/grounding prompt in Claude Desktop's Project instructions, or in the LangChain agent's system prompt.
.env:
SOURCE_DIR=path/to/your/documents
CHROMA_DIR=path/to/data/chroma
COLLECTION_NAME=legal_support
SERVER_NAME=legal-support-kb
KB_DESCRIPTION=ALWAYS use this tool to answer any question about <your domain> ...
# Path B only:
ANTHROPIC_API_KEY=your_key_hereThe Python is generic; the domain is data + config.
Setup
python -m venv venv
venv\Scripts\activate # Windows
pip install -r requirements.txtrequirements.txt:
python-docx
pypdf
sentence-transformers
chromadb
fastmcp
python-dotenv
# Path B:
langchain
langchain-anthropic
langgraphCreate your .env (see above), then build the index:
python build_index.pyRun Path A (MCP server + Claude Desktop)
Register the server in claude_desktop_config.json as a sibling under mcpServers (use the venv's Python, and double every backslash on Windows):
"legal-support-kb": {
"command": "path\\to\\legal_support_agent\\venv\\Scripts\\python.exe",
"args": ["path\\to\\legal_support_agent\\kb_server.py"]
}Restart Claude Desktop, then ask questions inside a Project that has grounding instructions set.
Run Path B (standalone LangChain agent)
python langchain_agent/react_agent.pyWhen LangChain Is Used — and When It Isn't
The project implements both paths on purpose, because LangChain isn't always needed, and the two paths show where it actually adds value.
Path A (no LangChain). When Claude Desktop is the client, it is the agent — it decides when to call the tool and writes the answer. The code only does retrieval and exposes one MCP tool. There's no agent loop to orchestrate, so a framework would only hide the underlying mechanics. This path uses no LangChain at all.
Path B (LangChain, single tool). Here the code calls the Claude API directly and runs the loop. For a single tool and one step, LangChain is a thin convenience — it auto-generates the tool schema from the function, normalises messages/tool-calls, and abstracts the provider. The agent loop is still hand-written (agent.py), which shows that the framework provides tidier building blocks, not the agent itself.
Path B (LangChain, multi-tool) — where it earns its place. Adding a second tool (a calculator) and asking a question that needs both retrieval and computation, create_react_agent (LangGraph) provides the entire multi-step, multi-tool loop — routing, multiple rounds, error handling — in one line. That's the real value of the framework: as soon as there are multiple tools, multi-step chains, memory or streaming to handle robustly, a hand-written loop keeps growing edge cases, and the prebuilt agent absorbs them.
Rule of thumb: single tool / simple flow → raw SDK or MCP is fine; multiple tools / multi-step / production concerns → a framework like LangChain + LangGraph starts paying for itself.
Note: which tool gets called is decided by the LLM, based on each tool's description — not by LangChain. LangChain is the plumbing that passes the tools to the model, runs whichever it picks, and feeds the result back.
Limitations I hit, and how I improved the pipeline
The most valuable part of this project was that the naive version looked like it worked but quietly gave wrong answers. Most of those failures were in document processing, not in retrieval or the model — and the symptom almost always pointed at the wrong layer. Here's what broke and what I did about it.
Problem | Symptom | Fix |
| Pricing/feature tables never retrieved; model guessed | Read |
| The support email ( | Read every |
Indexing both | Duplicate chunks wasted the limited retrieval slots | Index one clean representation (the |
| Deleted PDFs still showed up in results | Clean rebuild: drop and recreate the collection each index build |
Fixed-size chunking cut headings off their content | A heading retrieved without its table/list, so the answer was missing | Line-aware chunking: group whole lines up to a size, never split a line, so a heading stays with what follows it |
Tool name collision across MCP servers | "tool not found" — two servers both named | Give each tool a domain-unique name ( |
Model invents facts not in the docs | Confident but wrong prices/emails | Grounding instructions: must search first, answer only from retrieved text, say "not found" otherwise, cite sources |
Dense retrieval is phrasing-sensitive | "support email" found it, "service email" didn't | Mitigations: raise |
Takeaway I'd carry into any RAG system: when a grounded answer is missing, check extraction first — confirm the fact is actually in the indexed text before blaming retrieval or the model. And trust the tool-call trace, not the model's claim that it searched.
Is this "enterprise-grade"? Not yet — and that's intentional
The pipeline was hand-built to understand each layer end to end. In order to address the shortcomings identified, the following tools and techniques may be valuable to study and apply when moving toward a production system:
Better extraction: Unstructured / Docling / PyMuPDF instead of hand-rolled
python-docx/pypdf.Structure-aware chunking: chunk on headings/sections so each unit stays coherent.
Reranking: retrieve ~20 candidates, re-score with a cross-encoder (
bge-reranker), keep the best few — the single highest-leverage fix for the phrasing-sensitivity misses.Hybrid search: BM25 keyword + vector, so exact terms (emails, IDs) always surface.
Evaluation: RAGAS / TruLens to turn "is it good enough?" into a measurable number.
LlamaIndex/LangChain ingestion to replace the bespoke pipeline.
Notes
All sample documents are fictional and exist only for testing local RAG/MCP.
Paths in this README are placeholders — set your own in
.envand the Claude Desktop config.Built and tested on Windows with Python 3.13.
This server cannot be installed
Maintenance
Resources
Unclaimed servers have limited discoverability.
Looking for Admin?
If you are the server author, to access and configure the admin panel.
Latest Blog Posts
MCP directory API
We provide all the information about MCP servers via our MCP API.
curl -X GET 'https://glama.ai/api/mcp/v1/servers/iupulk/local-document-rag-agent'
If you have feedback or need assistance with the MCP directory API, please join our Discord server