How do I use local-document-rag-agent?

1. Click on "Install Server". 2. Wait a few minutes for the server to deploy. Once ready, it will show a "Started" state. 3. In the chat, type @ followed by the MCP server name and your instructions, e.g., "@local-document-rag-agent What's the cost of the Premium subscription?" That's it! The server will respond to your query, and you can continue using it as needed. Here is a step-by-step guide with screenshots.

local-document-rag-agent

by iupulk

Overview Schema Related Servers Score Discussions

Python

Local

Local Document RAG Agent — with MCP and LangChain

A retrieval-augmented (RAG) AI assistant that answers questions from my own local Word and PDF documents. I point it at a folder of documents, it builds a searchable vector index, and an LLM (Claude) answers questions grounded in those documents — with source citations.

I built it and implemented it two ways: once as an MCP server that Claude Desktop talks to, and once as a standalone LangChain agent that calls the Claude API directly. The two share the same retrieval engine, so the project shows both the raw mechanics of RAG and how a framework abstracts them.

The sample dataset is a fictional legal-information/customer-support platform ("BlueRiver Legal Platforms"): support docs about customer profiles, login/access setup, amendment email alerts, IP-based access, saved searches, subscription plans/pricing, onboarding, permissions, troubleshooting and escalation. All data is synthetic and safe to publish.

What it does

Ingests a folder of .docx and .pdf files (including their tables and hyperlinks).
Splits them into overlapping chunks, embeds each chunk, and stores them in a local vector database.
Answers natural-language questions by retrieving the most relevant chunks and letting Claude write a grounded, cited answer.
Works in two modes:
- Path A — MCP server + Claude Desktop: code exposes a search tool; Claude Desktop decides when to call it and writes the answer.
- Path B — Standalone LangChain agent: Python code runs the agent loop and calls the Claude API directly.

Example

Q: What is the Premium plan monthly price, and what's the total if I add 20 extra users? A: (Path B, multi-tool) Searches the knowledge base for the Premium price and per-user add-on, then uses a calculator tool to compute the total — answer grounded in BR-SUP-011_Subscription_Plans_and_Pricing_Guide.docx.

Related MCP server: smart-search

Architecture

The whole system is one pipeline, split into an offline ingestion stage and an online query stage:

INGESTION (run once / when docs change):
  source docs  ->  extract text  ->  chunk  ->  embed  ->  store in Chroma

QUERY (every question):
  question  ->  embed  ->  retrieve nearest chunks  ->  LLM writes grounded answer

The same embedding model is used for both ingestion and querying, so the question and the chunks live in the same vector space and "closeness" is meaningful.

Two consumption paths over one engine:

                       +-----------------------------+
   Path A (MCP)        |   extract / chunk / embed   |
   Claude Desktop  --> |   Chroma vector index       | <-- Path B (LangChain)
   calls the tool      |   retrieval (search tool)   |     agent loop
                       +-----------------------------+

Tech stack

Layer	Choice
Extraction	`python-docx`, `pypdf`
Embeddings	`sentence-transformers` (`all-MiniLM-L6-v2`, 384-dim)
Vector DB	ChromaDB (local, persistent)
MCP server	FastMCP (stdio transport)
LLM	Claude (via Claude Desktop, or the Anthropic API)
Agent framework (Path B)	LangChain + LangGraph (`langchain-anthropic`)

Project structure

legal_support_agent/
  extract_docs.py        # extraction: Word + PDF, including tables and hyperlinks
  chunk_docs.py          # line-aware chunking
  build_index.py         # embed + store in Chroma (clean rebuild)
  query_test.py          # standalone retrieval test
  kb_server.py           # Path A: MCP server (exposes the search tool)
  langchain_agent/       # Path B: standalone agent
    hello.py             #   minimal Claude API connection test
    tools.py             #   retrieval wrapped as a LangChain tool
    agent.py             #   hand-written agent loop (bind_tools)
    react_agent.py       #   prebuilt multi-tool agent (LangGraph create_react_agent)
  data/chroma/           # the persisted vector index (shared by both paths)
  requirements.txt
  .env

Reusable design: change the folder, reuse the agent

A core design goal was that the code never has to change to switch knowledge domains. Everything domain-specific lives in configuration, so you can repoint the agent at a completely different set of documents (a different subject, a different company) and reuse the whole thing.

To switch domains:

Set SOURCE_DIR to the new folder of documents.
Give it a fresh COLLECTION_NAME.
Update KB_DESCRIPTION to describe the new topics (this is the tool description the LLM reads to decide when to search).
Re-run build_index.py.
(Optional) Set a matching persona/grounding prompt in Claude Desktop's Project instructions, or in the LangChain agent's system prompt.

.env:

SOURCE_DIR=path/to/your/documents
CHROMA_DIR=path/to/data/chroma
COLLECTION_NAME=legal_support
SERVER_NAME=legal-support-kb
KB_DESCRIPTION=ALWAYS use this tool to answer any question about <your domain> ...
# Path B only:
ANTHROPIC_API_KEY=your_key_here

The Python is generic; the domain is data + config.

Setup

python -m venv venv
venv\Scripts\activate          # Windows
pip install -r requirements.txt

requirements.txt:

python-docx
pypdf
sentence-transformers
chromadb
fastmcp
python-dotenv
# Path B:
langchain
langchain-anthropic
langgraph

Create your .env (see above), then build the index:

python build_index.py

Run Path A (MCP server + Claude Desktop)

Register the server in claude_desktop_config.json as a sibling under mcpServers (use the venv's Python, and double every backslash on Windows):

"legal-support-kb": {
  "command": "path\\to\\legal_support_agent\\venv\\Scripts\\python.exe",
  "args": ["path\\to\\legal_support_agent\\kb_server.py"]
}

Restart Claude Desktop, then ask questions inside a Project that has grounding instructions set.

Run Path B (standalone LangChain agent)

python langchain_agent/react_agent.py

When LangChain Is Used — and When It Isn't

The project implements both paths on purpose, because LangChain isn't always needed, and the two paths show where it actually adds value.

Path A (no LangChain). When Claude Desktop is the client, it is the agent — it decides when to call the tool and writes the answer. The code only does retrieval and exposes one MCP tool. There's no agent loop to orchestrate, so a framework would only hide the underlying mechanics. This path uses no LangChain at all.

Path B (LangChain, single tool). Here the code calls the Claude API directly and runs the loop. For a single tool and one step, LangChain is a thin convenience — it auto-generates the tool schema from the function, normalises messages/tool-calls, and abstracts the provider. The agent loop is still hand-written (agent.py), which shows that the framework provides tidier building blocks, not the agent itself.

Path B (LangChain, multi-tool) — where it earns its place. Adding a second tool (a calculator) and asking a question that needs both retrieval and computation, create_react_agent (LangGraph) provides the entire multi-step, multi-tool loop — routing, multiple rounds, error handling — in one line. That's the real value of the framework: as soon as there are multiple tools, multi-step chains, memory or streaming to handle robustly, a hand-written loop keeps growing edge cases, and the prebuilt agent absorbs them.

Rule of thumb: single tool / simple flow → raw SDK or MCP is fine; multiple tools / multi-step / production concerns → a framework like LangChain + LangGraph starts paying for itself.

Note: which tool gets called is decided by the LLM, based on each tool's description — not by LangChain. LangChain is the plumbing that passes the tools to the model, runs whichever it picks, and feeds the result back.

Limitations I hit, and how I improved the pipeline

The most valuable part of this project was that the naive version looked like it worked but quietly gave wrong answers. Most of those failures were in document processing, not in retrieval or the model — and the symptom almost always pointed at the wrong layer. Here's what broke and what I did about it.

Problem	Symptom	Fix
`python-docx` drops table text	Pricing/feature tables never retrieved; model guessed	Read `doc.tables` explicitly and linearise each row (`"IP-based access — Standard: ...; Enterprise: ..."`) so it stays self-contained
`python-docx` drops hyperlink text	The support email (`service@blueriver.com`) was never indexed	Read every `<w:t>` node in the paragraph XML, including those inside `<w:hyperlink>`
Indexing both `.pdf` and `.docx` of each doc	Duplicate chunks wasted the limited retrieval slots	Index one clean representation (the `.docx`, which has clean tables)
`upsert` never deletes removed chunks	Deleted PDFs still showed up in results	Clean rebuild: drop and recreate the collection each index build
Fixed-size chunking cut headings off their content	A heading retrieved without its table/list, so the answer was missing	Line-aware chunking: group whole lines up to a size, never split a line, so a heading stays with what follows it
Tool name collision across MCP servers	"tool not found" — two servers both named `search_knowledge_base`	Give each tool a domain-unique name (`search_legal_support_kb`)
Model invents facts not in the docs	Confident but wrong prices/emails	Grounding instructions: must search first, answer only from retrieved text, say "not found" otherwise, cite sources
Dense retrieval is phrasing-sensitive	"support email" found it, "service email" didn't	Mitigations: raise `top_k`; planned hybrid search + reranking

Takeaway I'd carry into any RAG system: when a grounded answer is missing, check extraction first — confirm the fact is actually in the indexed text before blaming retrieval or the model. And trust the tool-call trace, not the model's claim that it searched.

Is this "enterprise-grade"? Not yet — and that's intentional

The pipeline was hand-built to understand each layer end to end. In order to address the shortcomings identified, the following tools and techniques may be valuable to study and apply when moving toward a production system:

Better extraction: Unstructured / Docling / PyMuPDF instead of hand-rolled python-docx/pypdf.
Structure-aware chunking: chunk on headings/sections so each unit stays coherent.
Reranking: retrieve ~20 candidates, re-score with a cross-encoder (bge-reranker), keep the best few — the single highest-leverage fix for the phrasing-sensitivity misses.
Hybrid search: BM25 keyword + vector, so exact terms (emails, IDs) always surface.
Evaluation: RAGAS / TruLens to turn "is it good enough?" into a measurable number.
LlamaIndex/LangChain ingestion to replace the bespoke pipeline.

Notes

All sample documents are fictional and exist only for testing local RAG/MCP.
Paths in this README are placeholders — set your own in .env and the Claude Desktop config.
Built and tested on Windows with Python 3.13.

This server cannot be installed

license - not found

quality - not tested

maintenance

How are these scores calculated?

Maintenance

–Maintainers

–Response time

–Release cycle

–Releases (12mo)

Commit activity

Resources

GitHub Repository

Need Help?

Related Servers

Unclaimed servers have limited discoverability.

Looking for Admin?

If you are the server author, to access and configure the admin panel.

Related MCP Servers

LlamaCloud MCP Server
RAG Systems Cloud Platforms
run-llama
A
license
-
quality
D
maintenance
A local MCP server that integrates with Claude Desktop, enabling RAG capabilities to provide Claude with up-to-date private information from custom LlamaCloud indices.
Last updated 2026-03-03
225
MIT
smart-search
RAG Systems Vector Databases File Systems
ekmungi
F
license
-
quality
C
maintenance
A local-first MCP server that enables semantic search over PDF and DOCX documents using structure-aware parsing and vector storage. It allows users to query their local knowledge base through Claude Code without cloud dependencies or GPU requirements.
Last updated 2026-04-07
vectorise-mcp
RAG Systems Search Vector Databases
jameslovespancakes
A
license
-
quality
D
maintenance
Local MCP server that indexes folders of documents into a hybrid vector + keyword search index for Claude Desktop, with support for PDFs, Office files, and images via OCR.
Last updated 2026-05-08
MIT
Local Development MCP Server
Developer Tools File Systems Knowledge & Memory
lreb
A
license
-
quality
C
maintenance
A local MCP server for Claude Desktop with persistent task management, file operations, document generation, and PDF indexing.
Last updated 2026-03-19
305
1
MIT

View all related MCP servers

Related MCP Connectors

Darwin RAG
Local-first RAG engine with MCP server for AI agent integration.
mcp
Augments MCP Server - A comprehensive framework documentation provider for Claude Code
nyc-property-intel
MCP server giving Claude AI access to 22+ NYC public-record databases for real estate due diligence

View all MCP Connectors

Latest Blog Posts

Who's Calling? MCP Hosts Are an Identity Blind Spot (And the Spec Knows It)
By Om-Shree-0709 on July 25, 2026.
mcp
Agent Identity
OAuth 2.1
Your AI Chatbot Just Exposed Your CEO's Salary to an Intern
By Om-Shree-0709 on July 2, 2026.
Agent Identity
MCP Security
OAuth Delegation
Why MCP Servers Need Execution Sandboxing (And Why Your Current Stack Isn't Enough)
By Om-Shree-0709 on June 30, 2026.
Agentic Ai
Prompt Injection
WebAssembly

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/iupulk/local-document-rag-agent'

If you have feedback or need assistance with the MCP directory API, please join our Discord server