PDF RAG MCP Server

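A document knowledge base system that combines PDF processing, vector storage, and MCP (Model Context Protocol) to provide semantic search over PDF documents. You can upload, process, and query PDF documents through a modern web interface, or through the MCP protocol for integration with AI tools such as Cursor.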

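Features

PDF document upload and processing: upload PDFs and automatically extract, chunk, and vectorize their content
Real-time processing status: WebSocket-based live status updates while documents are being processed
Semantic search: vector-based semantic search across all processed documents
MCP protocol support: integrates with AI tools such as Cursor via the Model Context Protocol
Modern web interface: React/Chakra UI frontend for document management and querying
Fast dependency management: uses uv for efficient Python dependency management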
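
To make the "chunk" step in the feature list concrete, here is a minimal sketch of splitting extracted text into overlapping windows before embedding. It is an illustration only: the function name `chunk_text`, the character-based window, and the default sizes are assumptions, not the project's actual implementation in pdf_processor.py.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> List[str]:
    """Split text into overlapping character windows (hypothetical helper)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far each window advances
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

The overlap keeps a sentence that straddles a boundary visible in both neighboring chunks, which tends to help retrieval; a real pipeline might split on sentences or tokens instead of characters.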

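System Architecture

The system consists of:

FastAPI backend: handles API requests, PDF processing, and vector storage
React frontend: provides a user-friendly interface for managing documents
Vector database: stores the embeddings used for semantic search
WebSocket server: provides real-time updates on document processing
MCP server: exposes the knowledge base to MCP-compatible clients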
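
As a rough picture of what the vector database does at query time, the sketch below ranks stored embeddings by cosine similarity to a query embedding. It uses plain Python lists and a dictionary as a stand-in index; the names `cosine_similarity` and `search` are hypothetical, and the project's real vector store (vector_store.py) may work quite differently.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(
    query_vec: Sequence[float],
    index: Dict[str, List[float]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the top_k (chunk_id, score) pairs, highest similarity first."""
    scored = [(chunk_id, cosine_similarity(query_vec, vec)) for chunk_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

A brute-force scan like this is fine for a handful of documents; dedicated vector databases replace it with approximate nearest-neighbor indexes once the collection grows.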

Quick Start

Prerequisites

Python 3.8 or later
uv - a fast Python package installer and resolver
Git
Cursor (optional, for MCP integration)

Quick Install and Launch with uv and run.py

Clone the repository:
git clone https://github.com/yourusername/PdfRagMcpServer.git
cd PdfRagMcpServer
Install uv if you have not already:
curl -sS https://astral.sh/uv/install.sh | bash
Install the dependencies with uv:
uv init .
uv venv
source .venv/bin/activate
uv pip install -r backend/requirements.txt
Start the application with the convenience script:
uv run run.py
Open the web interface at http://localhost:8000

Using with Cursor

Go to Settings -> Cursor Settings -> MCP -> Add new global MCP server, then paste the following into Cursor's ~/.cursor/mcp.json file. For more information, see the Cursor MCP documentation.

{
  "mcpServers": {
    "pdf-rag": {
      "url": "http://localhost:7800/mcp"
    }
  }
}

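You can also replace localhost with the IP address of the host where you deployed the service. After adding this configuration to mcp.json, the MCP server appears on Cursor's MCP settings page; toggle it on to enable it.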
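
If ~/.cursor/mcp.json already lists other servers, pasting by hand risks overwriting them. The following sketch merges the pdf-rag entry shown above into an existing file instead. The helper name `add_mcp_server` is hypothetical, and the sketch assumes the file, if present, holds a single JSON object.

```python
import json
from pathlib import Path


def add_mcp_server(config_path: Path, name: str, url: str) -> dict:
    """Add or update one entry under "mcpServers", keeping existing entries."""
    config = {}
    if config_path.exists():
        raw = config_path.read_text(encoding="utf-8").strip()
        config = json.loads(raw) if raw else {}

    # Create the "mcpServers" object if it is missing, then set our entry.
    config.setdefault("mcpServers", {})[name] = {"url": url}

    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


# Example call (commented out so that running this file never edits a real config):
# add_mcp_server(Path.home() / ".cursor" / "mcp.json", "pdf-rag", "http://localhost:7800/mcp")
```

Swap localhost for the host IP in the URL argument when the service runs on another machine.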

Building the Frontend (for Developers)

If you need to rebuild the frontend, you have two options:

Option 1: Use the provided script (recommended)

# Make the script executable if needed
chmod +x build_frontend.py

# Run the script
./build_frontend.py

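The script automatically:

Installs the frontend dependencies
Builds the frontend
Copies the build output to the backend's static directory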
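
The three steps above can be sketched in Python as follows. This is not the contents of build_frontend.py, only an assumption about how such a script could be structured; the function names are hypothetical, and the directory layout (frontend/dist copied to backend/static) is taken from the manual build instructions in this README.

```python
import shutil
import subprocess
from pathlib import Path


def copy_build_output(dist_dir: Path, static_dir: Path) -> int:
    """Copy every file under dist_dir into static_dir; return the file count."""
    if not dist_dir.is_dir():
        raise FileNotFoundError(f"build output not found: {dist_dir}")
    static_dir.mkdir(parents=True, exist_ok=True)
    copied = 0
    for src in dist_dir.rglob("*"):
        if src.is_file():
            dest = static_dir / src.relative_to(dist_dir)
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dest)
            copied += 1
    return copied


def build_frontend(project_root: Path) -> int:
    """Install dependencies, build, and copy the result (requires npm on PATH)."""
    frontend = project_root / "frontend"
    subprocess.run(["npm", "install"], cwd=frontend, check=True)
    subprocess.run(["npm", "run", "build"], cwd=frontend, check=True)
    return copy_build_output(frontend / "dist", project_root / "backend" / "static")
```

Passing `check=True` makes a failed npm step raise immediately, so stale files are never copied after a broken build.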

Option 2: Manual build process

# Navigate to frontend directory
cd frontend

# Install dependencies
npm install

# Build the frontend
npm run build

# Create static directory if it doesn't exist
mkdir -p ../backend/static

# Copy build files
cp -r dist/* ../backend/static/

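After building the frontend, you can start the application with the run.py script.

Simple Production Setup

For a production environment where the static files are already built:

Place the pre-built frontend in the backend/static directory
Start the server:
cd backend
uv pip install -r requirements.txt
python -m app.main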

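Development Setup (Separate Services)

If you want to run the services separately for development:

Backend

Change to the backend directory:
cd backend
Install the dependencies with uv:
uv pip install -r requirements.txt
Run the backend server:
python -m uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload

Frontend

Change to the frontend directory:
cd frontend
Install the dependencies:
npm install
Run the development server:
npm run dev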

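Usage

Uploading Documents

Open the web interface at http://localhost:8000
Click "Upload New PDF" and select a PDF file
The system processes the file and shows progress in real time
Once processing is complete, the document is available for search

Searching Documents

Use the search feature in the web interface
Or integrate with Cursor via the MCP protocol

MCP Integration with Cursor

Open Cursor
Go to Settings → AI & MCP
Add a custom MCP server with the URL: http://localhost:8000/mcp/v1
Save the settings
You can now query your PDF knowledge base directly from Cursor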

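Troubleshooting

Connection Issues

Verify that port 8000 is not in use by another application
Check that the WebSocket connection is working
Make sure your browser supports WebSockets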
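
One quick way to check the first item is to try connecting to the port. The sketch below uses only the standard library; `port_in_use` is a hypothetical helper name, and the check only tells you that something is listening, not which application it is.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    state = "in use" if port_in_use(8000) else "free"
    print(f"Port 8000 is {state}")
```

If the port is reported as in use before you start the backend, another process is holding it; stop that process or start the backend on a different port.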

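Processing Issues

Check that your PDF contains extractable text (some scanned PDFs do not)
Make sure the system has enough resources (memory and CPU)
Check the backend logs for detailed error messages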

项目结构

PdfRagMcpServer/
├── backend/               # FastAPI backend
│   ├── app/
│   │   ├── __init__.py
│   │   ├── main.py        # Main FastAPI application
│   │   ├── database.py    # Database models
│   │   ├── pdf_processor.py # PDF processing logic
│   │   ├── vector_store.py # Vector database interface
│   │   └── websocket.py   # WebSocket handling
│   ├── static/            # Static files for the web interface
│   └── requirements.txt   # Backend dependencies
├── frontend/              # React frontend
│   ├── public/
│   ├── src/
│   │   ├── components/    # UI components
│   │   ├── context/       # React context
│   │   ├── pages/         # Page components
│   │   └── App.jsx        # Main application component
│   ├── package.json       # Frontend dependencies
│   └── vite.config.js     # Vite configuration
├── uploads/               # PDF file storage
└── README.md              # This documentation

Contributing

Contributions are welcome! Feel free to submit a pull request.

License

This project is licensed under the MIT License. See the LICENSE file for details.
