MCP-Knowledge-Toolbox
MCP-Knowledge-Toolbox is a local knowledge-base MCP toolbox built on top of the Project 1 DocuPilot-RAG baseline. Project 2 does not modify Project 1 core code. It packages local document ingest, retrieval, context reading, citation checking, and evaluation-report reading as MCP-callable tools.
This repository is currently an engineering MVP, not a production multi-tenant RAG platform.
Architecture
flowchart LR
A[Local Documents] --> B[Parser]
B --> C[Chunker]
C --> D[SQLite Metadata Store]
C --> E[Vector Index]
C --> F[BM25 Index]
E --> G[Hybrid Retriever]
F --> G
G --> H[Lightweight Reranker]
H --> I[MCP Tools]
I --> J[MCP stdio Client]
I --> K[Citation Verifier]
I --> L[Eval Report Reader]
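The diagram shows the Hybrid Retriever merging vector and BM25 results, but this README does not say how the two rankings are combined. One common option is reciprocal rank fusion (RRF). The sketch below assumes that approach; `rrf_fuse`, the constant `k=60`, and the chunk ids are illustrative, not taken from the repository.

```python
from collections import defaultdict
from typing import Dict, List


def rrf_fuse(rankings: List[List[str]], k: int = 60, top_n: int = 5) -> List[str]:
    """Merge several ranked lists of chunk ids with reciprocal rank fusion.

    Assumption: the project may use a different fusion rule (for example a
    weighted score sum). This only illustrates the general idea.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            # A chunk earns more credit the higher it appears in each list.
            scores[chunk_id] += 1.0 / (k + rank)
    # Sort by fused score (descending); break ties by id for determinism.
    ordered = sorted(scores, key=lambda cid: (-scores[cid], cid))
    return ordered[:top_n]


# Hypothetical first-stage results for one query.
vector_hits = ["c3", "c1", "c7"]
bm25_hits = ["c1", "c9", "c3"]
fused = rrf_fuse([vector_hits, bm25_hits])
```

Chunks that appear in both lists (`c1`, `c3`) rise above chunks found by only one retriever, which is the behaviour a hybrid stage is meant to provide.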
Tech Stack
Python 3.10/3.11 compatible code path
SQLite metadata store
MCP stdio JSON-RPC compatible MVP transport
Optional official MCP Python SDK when installed
sentence-transformers with BAAI/bge-small-zh-v1.5 as the default embedding model
hashing vector fallback when the embedding model is unavailable
PyMuPDF for PDF, python-docx for docx, native readers for Markdown/txt
pytest integration tests
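To make the "hashing vector fallback" concrete, here is a minimal sketch of a feature-hashing embedding that needs no model download. It is an assumption about what such a fallback could look like; the function names, the 256-dimension default, and the use of MD5 as a stable hash are illustrative and not taken from the repository.

```python
import hashlib
import math
import re
from typing import List


def hashing_embed(text: str, dim: int = 256) -> List[float]:
    """Map text to a fixed-size unit vector using the hashing trick."""
    vec = [0.0] * dim
    for token in re.findall(r"\w+", text.lower()):
        digest = hashlib.md5(token.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:4], "big") % dim  # which dimension
        sign = 1.0 if digest[4] % 2 == 0 else -1.0        # reduces collision bias
        vec[bucket] += sign
    norm = math.sqrt(sum(v * v for v in vec))
    # Leave the all-zero vector untouched (empty text or fully cancelled signs).
    return [v / norm for v in vec] if norm else vec


def cosine(a: List[float], b: List[float]) -> float:
    """Dot product; equals cosine similarity when both vectors are unit length."""
    return sum(x * y for x, y in zip(a, b))


query_vec = hashing_embed("retrieval")
```

A vector like this captures token overlap only, not meaning, which is why the README treats it strictly as a fallback.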
Tools
The server exposes 11 tools:
ingest_file, ingest_folder, search_knowledge, read_chunk_neighbors, summarize_document, query_table, verify_citation, get_eval_report, list_documents, delete_document, server_status.
MCP Compatibility
Current implementation is an MCP stdio JSON-RPC compatible MVP. It can use the official MCP Python SDK if installed; otherwise it uses the built-in stdio JSON-RPC transport.
| MCP capability | Status | Notes |
| --- | --- | --- |
| stdio transport | Supported | Used by |
| initialize | Supported | Returns protocol version, server info, and tool capability. |
| tools/list | Supported | Returns all registered tool schemas. |
| tools/call | Supported | Returns text content and structuredContent. |
| notifications/initialized | Accepted | Notification is ignored safely. |
| resources | Not implemented | No MCP resources are exposed yet. |
| prompts | Not implemented | No MCP prompts are exposed yet. |
| sampling | Not implemented | No LLM sampling bridge. |
| streaming progress | Not verified | Tool calls are request/response only. |
| official SDK mode | Optional | Depends on |
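The supported methods above map onto plain JSON-RPC 2.0 messages. The sketch below only builds the messages a stdio client would send, one JSON object per line; it does not start the server. The protocol version string, the client name, and the `search_knowledge` argument names (`query`, `collection`) are assumptions for illustration, so check the schemas returned by `tools/list` for the real parameters.

```python
import json
from itertools import count
from typing import Any, Dict, Optional

_ids = count(1)  # JSON-RPC request ids must be unique per session


def request(method: str, params: Optional[Dict[str, Any]] = None) -> str:
    """Serialize a JSON-RPC request (expects a response)."""
    msg: Dict[str, Any] = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        msg["params"] = params
    return json.dumps(msg)


def notification(method: str) -> str:
    """Serialize a JSON-RPC notification (no id, no response expected)."""
    return json.dumps({"jsonrpc": "2.0", "method": method})


handshake = [
    request("initialize", {
        "protocolVersion": "2024-11-05",  # example value, an assumption
        "capabilities": {},
        "clientInfo": {"name": "demo-client", "version": "0.1"},  # hypothetical
    }),
    notification("notifications/initialized"),
    request("tools/list"),
    request("tools/call", {
        "name": "search_knowledge",
        # Argument names are assumed, not confirmed by this README.
        "arguments": {"query": "hybrid retrieval benchmark results",
                      "collection": "demo"},
    }),
]

for line in handshake:
    print(line)  # a real client would write each line to the server's stdin
```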
Reproduce From Scratch
From a fresh clone:
pip install -r requirements.txt
python scripts/ingest_demo_docs.py --input data/raw --collection demo
python scripts/build_index.py --collection demo
python scripts/run_mcp_stdio_client_demo.py
pytest tests

Expected scale after ingest:
ingested files: 20
success: 20
failed: 0
chunks: 1201
documents: 20
collections: demo
embedding_provider: sentence-transformers

End-to-End Demo
Generate the full E2E MCP log:
python scripts/run_e2e_demo.py --collection e2e --input data/raw --output docs/e2e_demo_log.md

The log records:
MCP server startup through stdio subprocess
stdio client initialize
tools/list
tools/call ingest_folder
tools/call list_documents
tools/call search_knowledge
tools/call read_chunk_neighbors
tools/call verify_citation
final answer with citations
See docs/e2e_demo_log.md.
Retrieval Evaluation
Generate 50 QA samples and evaluate four retrieval strategies:
python scripts/run_retrieval_eval.py --collection demo

Outputs:
data/eval/demo_qa.jsonl
docs/retrieval_eval_report.md
Current measured metrics:
| Strategy | Hit@3 | Hit@5 | MRR | Avg Latency (ms) |
| --- | --- | --- | --- | --- |
| bm25 | 0.400 | 0.400 | 0.400 | 193.55 |
| vector | 0.340 | 0.340 | 0.340 | 82.97 |
| hybrid | 0.460 | 0.460 | 0.460 | 84.71 |
| hybrid_rerank | 0.460 | 0.460 | 0.460 | 80.97 |
Hybrid improved over individual retrieval modes on this demo set. Hybrid + rerank did not improve over hybrid; the report explains that the corpus is synthetic and repetitive, so first-stage retrieval already ranks many expected documents at the top.
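For readers unfamiliar with the metrics, the sketch below shows the standard definitions of Hit@k and MRR for a single expected document per question. It is not the code in `scripts/run_retrieval_eval.py`; the function names and the sample data are made up for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def hit_at_k(ranked: Sequence[str], expected: str, k: int) -> float:
    """1.0 if the expected document is among the top-k results, else 0.0."""
    return 1.0 if expected in ranked[:k] else 0.0


def reciprocal_rank(ranked: Sequence[str], expected: str) -> float:
    """1/rank of the expected document, or 0.0 if it was not retrieved."""
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id == expected:
            return 1.0 / rank
    return 0.0


def evaluate(samples: List[Tuple[List[str], str]],
             k_values: Sequence[int] = (3, 5)) -> Dict[str, float]:
    """Average Hit@k and MRR over (ranked results, expected id) pairs."""
    n = len(samples)
    report = {f"hit@{k}": sum(hit_at_k(r, e, k) for r, e in samples) / n
              for k in k_values}
    report["mrr"] = sum(reciprocal_rank(r, e) for r, e in samples) / n
    return report


results = ["d1", "d2", "d3", "d4", "d5"]  # hypothetical ranking
samples = [
    (results, "d1"),  # found at rank 1
    (results, "d2"),  # found at rank 2
    (results, "d4"),  # found at rank 4: counts for Hit@5 only
    (results, "d9"),  # never retrieved
]
metrics = evaluate(samples)
```

Under these definitions, Hit@3, Hit@5, and MRR coincide when every hit lands at rank 1 and every miss falls outside the top 5. That would be one explanation for the identical columns in the table above, though the report itself is the authority on how the numbers were computed.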
Final Acceptance Artifacts
docs/e2e_demo_log.md
docs/retrieval_eval_report.md
docs/final_acceptance.md
data/eval/demo_qa.jsonl
Limitations
hashing vector is only a fallback when the sentence-transformers model is unavailable.
verify_citation is a lightweight keyword/similarity check, not an LLM judge.
query_table is Markdown table caption/content matching, not complex table reasoning.
rerank is lightweight token-overlap reranking, not a cross-encoder reranker.
summarize_document uses extractive summarization when no LLM is configured.
current storage is local SQLite and local JSON indexes, not a distributed vector database.
current MCP support covers tools over stdio, not resources/prompts/sampling.
this is not a production-grade multi-tenant platform.
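To illustrate what "lightweight token-overlap reranking" can mean in practice, here is a minimal sketch. It assumes the score is the fraction of query tokens that appear in a candidate chunk; the repository's actual scoring rule, tokenizer, and function names may differ.

```python
import re
from typing import List, Set, Tuple


def tokenize(text: str) -> Set[str]:
    """Lowercase word tokens; a real system may need a CJK-aware tokenizer."""
    return set(re.findall(r"\w+", text.lower()))


def overlap_rerank(query: str,
                   candidates: List[Tuple[str, str]]) -> List[Tuple[str, float]]:
    """Reorder (chunk_id, text) candidates by query-token coverage."""
    q_tokens = tokenize(query)
    scored = []
    for chunk_id, text in candidates:
        shared = q_tokens & tokenize(text)
        score = len(shared) / len(q_tokens) if q_tokens else 0.0
        scored.append((chunk_id, score))
    # sorted() is stable, so ties keep their first-stage order.
    return sorted(scored, key=lambda item: -item[1])


# Hypothetical first-stage candidates.
candidates = [
    ("c1", "BM25 latency numbers"),
    ("c2", "Hybrid retrieval latency on the demo set"),
    ("c3", "Unrelated text"),
]
reranked = overlap_rerank("hybrid retrieval latency", candidates)
```

Because ties preserve the incoming order, a corpus with many near-identical chunks can leave the ranking unchanged. That behaviour is consistent with, though not proof of, the `hybrid_rerank` row matching `hybrid` in the evaluation table.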
Resume Wording
MCP-Knowledge-Toolbox: a local knowledge-base MCP toolbox for Agent workflows. Built an MCP stdio JSON-RPC compatible server exposing 11 tools for document ingest, SQLite metadata management, sentence-transformers vector retrieval, BM25, hybrid retrieval, context reading, citation verification, document deletion sync, and evaluation report reading. Added an end-to-end stdio client demo, 50-sample retrieval evaluation, and 37 pytest tests. Demo acceptance reached 20 documents and 1201 chunks across Markdown, txt, docx, and PDF.