# MCP RAG

MCP-RAG is a low-latency retrieval-augmented generation service that provides intelligent knowledge management through a modular MCP protocol architecture.

## Core Capabilities
### Knowledge Management

- Add text content manually (facts, definitions, notes, conversation summaries)
- Process 25+ document formats (PDF, DOCX, PPTX, XLSX, TXT, HTML, CSV, JSON, XML, ODT, ODP, ODS, RTF, images with OCR, emails) using semantic chunking with structure preservation, automatic denoising, and metadata extraction
- Get statistics on document counts, file type distribution, processing methods, and structural complexity
### Intelligent Retrieval

- Query the knowledge base with semantic search (<100 ms latency)
- Use Raw mode for direct retrieval or Summary mode for LLM-powered summarization
- Filter searches by file type, document structure (tables, titles), or processing method
### Performance Optimization

- Monitor and optimize vector database performance with health diagnostics
- Reindex with optimized profiles (small/medium/large/auto)
- Manage the embedding cache with performance monitoring (hit rates, memory usage) and cache clearing
### Technical Features

- Multi-provider support (Doubao and Ollama for LLMs; the Doubao API and local sentence-transformers for embeddings)
- Web interface for configuration management, document management, and API documentation (Swagger UI)
- HTTP API and MCP protocol support
# MCP-RAG

✨ 100% written by AI

A service-first RAG service for AI clients, currently centered on a FastAPI HTTP service and a Streamable HTTP MCP endpoint.

The codebase currently provides a unified backend shell:

- FastAPI HTTP service
- Streamable HTTP MCP
- Shared runtime, config hot reload, authentication, rate limiting, quotas, observability
- Retrieval and document management backed by a knowledge-base registry
## Current Capabilities

- Document ingestion: add text directly, or upload txt, md, pdf, docx files
- Retrieval: fused vector search + keyword search
- Q&A: /search, /chat, and the MCP rag_ask tool
- Multi-KB: single knowledge base, or aggregated retrieval/chat across several via kb_ids
- Knowledge-base scopes: public and agent_private
- Tenant context: base_collection + user_id + agent_id
- Runtime governance: API keys, in-memory rate limiting, upload/index quotas, request-level retrieval cache
- Provider governance: provider budget, circuit breaking, fallback
- Observability: /health, /ready, /metrics
- Frontend: built-in single-page admin panel at /app
## Architecture

Main request path:

```
HTTP / MCP
  -> app_factory.py
  -> http_server.py / mcp_server.py
  -> context.py
  -> service_facade.py
  -> services/
     - runtime.py
     - indexing_service.py
     - retrieval_service.py
     - chat_service.py
  -> knowledge_bases.py
  -> core/indexing/
  -> retrieval/
```

Key files:

- src/mcp_rag/cli.py: CLI entry point, provides serve and init
- src/mcp_rag/main.py: HTTP service startup entry
- src/mcp_rag/http_server.py: HTTP API, SPA entry, Streamable HTTP MCP mount
- src/mcp_rag/mcp_server.py: MCP tool definitions and rag_ask
- src/mcp_rag/app_factory.py: unified assembly of app context, runtime, and guardrails
- src/mcp_rag/knowledge_bases.py: knowledge-base registry and default-KB resolution
- src/mcp_rag/config.py: config models, JSON/SQLite persistence, hot reload
## Requirements

- Python >= 3.13
- uv

## Installation

Install the CLI:

```shell
uv tool install mcp-rag
```

Run it directly after installation:

```shell
mcp-rag serve
```

For development inside the repository:

```shell
uv sync
```

If you need local embeddings:

```shell
uv sync --extra local-embeddings
```

Boundary notes:

- Users who install via uv tool install mcp-rag do not need Node.js or pnpm
- pnpm is only used to maintain the frontend build; it is not a runtime dependency of the service
## Startup and Initialization

Start the service:

```shell
uv run mcp-rag serve
```

Initialize the data directory:

```shell
uv run mcp-rag init --data-dir ./data
```

The default port is 8060; by default the service listens on 0.0.0.0:8060.

Common entry points:

- Admin panel: http://127.0.0.1:8060/app
- API docs: http://127.0.0.1:8060/docs
- MCP endpoint: http://127.0.0.1:8060/mcp

Compatibility routes:

- / redirects to /app
- /doc redirects to /docs
- /documents-page redirects to /app/documents
- /config-page redirects to /app/config

First-start behavior:

- If ./data/config.json does not exist, defaults are used when the config is first read
- On startup the service calls ensure_config_file(), which writes the default config to disk
- ./data/chroma and the related SQLite files in the data directory are created on demand
## Frontend and Static Assets

Release packages bundle src/mcp_rag/static/ into the wheel / sdist. This means:

- Users who install via uv tool install mcp-rag can access /app directly
- No separate frontend build and no Node.js is required
- Frontend maintainers must regenerate the static assets before each release

The frontend source lives in frontend/, and the build output goes to src/mcp_rag/static/app.

Typical workflow:

```shell
cd frontend
pnpm install
pnpm build
```

## Knowledge-Base Model
The project no longer organizes data solely around bare collections; the knowledge-base registry is now primary.

Knowledge-base characteristics:

- The persistent registry lives in the SQLite file pointed to by knowledge_base_db_path
- A public knowledge base is guaranteed to exist by default
- When user_id + agent_id are provided, a corresponding default agent_private knowledge base is ensured
- Newly created knowledge bases are assigned a stable internal collection name, e.g. kb_<id>

The API layer still keeps the collection parameter for compatibility with older callers. The current behavior is:

- You can pass kb_id explicitly
- You can also keep passing the legacy collection
- The service resolves the request to a concrete knowledge base and its actual collection name
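The resolution order can be sketched roughly as follows. This is not the project's code: the function name and the agent_private naming scheme are invented for illustration; only the kb_<id> pattern comes from this README.

```python
# Hypothetical sketch of the knowledge-base resolution rules; the function
# name and the agent_private collection-name format are illustrative only.
def resolve_collection(kb_id=None, collection=None, user_id=None, agent_id=None):
    """Map request parameters to an internal collection name."""
    if kb_id is not None:
        # An explicit kb_id wins; internal collections are named kb_<id>.
        return f"kb_{kb_id}"
    if collection is not None:
        # Legacy callers may still pass a bare collection name.
        return collection
    if user_id is not None and agent_id is not None:
        # Tenant context selects the caller's default agent_private KB
        # (the exact naming scheme here is invented for illustration).
        return f"kb_agent_{user_id}_{agent_id}"
    # Otherwise fall back to the default public knowledge base.
    return "kb_public"
```

The point is the precedence: explicit kb_id, then legacy collection, then tenant context, then the public default.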
## HTTP API

System endpoints:

- GET /health
- GET /ready
- GET /metrics

Config endpoints:

- GET /config
- POST /config
- POST /config/bulk
- POST /config/reset
- POST /config/reload

Provider endpoints:

- GET /providers/{provider}/models

Knowledge-base endpoints:

- GET /collections
- GET /knowledge-bases
- POST /knowledge-bases

Document endpoints:

- POST /add-document
- POST /upload-files
- GET /list-documents
- DELETE /delete-document
- GET /list-files
- DELETE /delete-file

Search and Q&A:

- GET /search
- POST /chat

MCP debug endpoints:

- GET /debug/mcp/tools
- POST /debug/mcp/call
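As an illustration of the debug endpoint, here is a hedged sketch of calling a tool through it. The request body shape ({"name": ..., "arguments": ...}) is an assumption that mirrors the rag_ask example later in this README; verify the actual schema via /docs.

```python
import json
import urllib.request

def build_debug_call(base, tool, arguments):
    # Assumed body shape for POST /debug/mcp/call: the tool name plus its
    # arguments, mirroring the rag_ask example in this README.
    body = json.dumps({"name": tool, "arguments": arguments}).encode()
    return urllib.request.Request(
        f"{base}/debug/mcp/call",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_debug_call("http://127.0.0.1:8060", "rag_ask", {"query": "FastAPI 是什么"})
# With the server running: urllib.request.urlopen(req) returns the tool result.
```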
A few points to make explicit:

- /search and /chat support kb_id
- /search and /chat also support kb_ids for multi-KB aggregation
- /upload-files uses multipart/form-data
- /delete-document and /delete-file take their deletion parameters in the request body
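A minimal client sketch for GET /search. The parameter names come from this README; the comma-separated encoding of kb_ids is an assumption, so check /docs for the real format.

```python
from urllib.parse import urlencode

def build_search_url(base, query, kb_ids=None, limit=5):
    # GET /search takes its parameters in the query string; kb_ids enables
    # multi-KB aggregation. The comma-separated encoding is an assumption.
    params = {"query": query, "limit": limit}
    if kb_ids:
        params["kb_ids"] = ",".join(str(i) for i in kb_ids)
    return f"{base}/search?{urlencode(params)}"

url = build_search_url("http://127.0.0.1:8060", "what is FastAPI", kb_ids=[1, 2])
# With the server running: urllib.request.urlopen(url) returns the hits.
```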
If the security policy is enabled, the API key can be passed in any of these ways:

- HTTP header x-api-key
- Header Authorization: Bearer <token>
- api_key in the query string, JSON body, or form data
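A sketch of the two header forms listed above:

```python
import urllib.request

def with_api_key(url, api_key, bearer=False):
    # Attach the key either as the x-api-key header or as a Bearer token;
    # both transports come from the list above.
    headers = (
        {"Authorization": f"Bearer {api_key}"}
        if bearer
        else {"x-api-key": api_key}
    )
    return urllib.request.Request(url, headers=headers)

req = with_api_key("http://127.0.0.1:8060/search?query=test", "my-key")
```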
## MCP

The primary form is Streamable HTTP MCP:

```json
{
  "mcpServers": {
    "rag": {
      "url": "http://127.0.0.1:8060/mcp"
    }
  }
}
```

Implemented MCP tools:

- rag_ask

Main rag_ask parameters:

- query
- mode: raw or summary
- collection
- kb_id
- scope
- limit
- threshold
- tenant
- user_id / agent_id
- _user_id / _agent_id
- api_key
- request_id
- trace_id

Example:

```json
{
  "name": "rag_ask",
  "arguments": {
    "query": "FastAPI 是什么",
    "kb_id": 1,
    "mode": "summary",
    "limit": 5
  }
}
```

## Configuration
Default config file: ./data/config.json
Default knowledge-base database: ./data/knowledge_bases.sqlite3

There is one important change in the current configuration:

- Regular runtime config is stored in config.json
- Provider-related config is persisted to SQLite instead of being fully written back to config.json

That is, the following fields are stored in the service_provider_settings table in SQLite:

- embedding_provider
- embedding_fallback_provider
- provider_configs
- llm_provider
- llm_fallback_provider
- llm_model
- llm_base_url
- llm_api_key

The remaining config still lives in config.json, for example:
```json
{
  "http_port": 8060,
  "chroma_persist_directory": "./data/chroma",
  "knowledge_base_db_path": "./data/knowledge_bases.sqlite3",
  "enable_llm_summary": false,
  "security": {
    "enabled": false,
    "allow_anonymous": true,
    "api_keys": [],
    "tenant_api_keys": {}
  },
  "rate_limit": {
    "requests_per_window": 120,
    "window_seconds": 60,
    "burst": 30
  },
  "quotas": {
    "max_upload_files": 20,
    "max_upload_bytes": 52428800,
    "max_upload_file_bytes": 10485760,
    "max_index_documents": 500,
    "max_index_chunks": 2000,
    "max_index_chars": 500000
  },
  "cache": {
    "enabled": false,
    "max_entries": 256,
    "ttl_seconds": 300
  },
  "provider_budget": {
    "enabled": true
  }
}
```

Current built-in provider-related capabilities:
- Default embedding provider: zhipu
- Default LLM provider: doubao
- Built-in provider configs include doubao, zhipu, aliyun
- qwen/dashscope are normalized to aliyun
- /providers/{provider}/models can pull model lists from OpenAI-compatible model services
- Local embeddings support m3e-small and e5-small
- LLMs additionally support ollama
## Hot Reload and Runtime Refresh

Hot-reload behavior:

- After changes via /config, /config/bulk, /config/reset, or /config/reload, the runtime refreshes immediately
- On each incoming request, reload_if_changed() checks whether the on-disk config has changed
- When provider settings or retrieval config change, the relevant runtime dependencies are rebuilt and the retrieval cache is cleared
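For example, toggling a field and letting the runtime refresh. The request body shape for POST /config (a flat JSON object of fields to change) is an assumption; check /docs for the actual schema.

```python
import json
import urllib.request

def build_config_update(base, fields):
    # Assumption: POST /config accepts a JSON object of the fields to change;
    # verify the real request schema via the /docs Swagger UI.
    return urllib.request.Request(
        f"{base}/config",
        data=json.dumps(fields).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_config_update("http://127.0.0.1:8060", {"enable_llm_summary": True})
# With the server running, sending this refreshes the runtime immediately.
```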
## Readiness and Metrics

- /health returns a health summary, a runtime snapshot, and config_revision
- /ready returns 503 while bootstrap is incomplete or key dependencies are not ready
- /metrics returns observability metrics aggregated by operation / provider

The current readiness snapshot includes:

- document_processor
- embedding_model
- vector_store
- hybrid_service
- llm_model
- retrieval_cache
- provider_budget
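Since /ready returns 503 until bootstrap completes, a deployment script can poll it before routing traffic. The status fetcher is injected here so the sketch stays self-contained; in practice it would wrap a GET against http://127.0.0.1:8060/ready.

```python
import time

def wait_until_ready(fetch_status, attempts=30, delay=1.0):
    """Poll until /ready stops returning 503.

    fetch_status is any zero-argument callable returning an HTTP status
    code, e.g. a wrapper around GET http://127.0.0.1:8060/ready.
    """
    for _ in range(attempts):
        if fetch_status() == 200:
            return True
        time.sleep(delay)
    return False
```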
## Tests

Run the full test suite:

```shell
uv run python -m unittest discover -s tests
```

Compile check:

```shell
uv run python -m compileall src
```

Current test coverage:

- Config defaults, disk reload, and provider config migration
- HTTP shell and MCP shell behavior
- Request context / tenant resolution
- Request-level retrieval cache
- Provider budget / fallback
- Readiness / health / metrics
- Packaging metadata and static assets
## License

MIT