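PDF Reader MCP Server

A Model Context Protocol (MCP) server that provides tools for reading and extracting text from PDF files, supporting both local files and URLs.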

Author

Philip Van de Walker
Email: philip.vandewalker@gmail.com
GitHub: https://github.com/trafflux

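Features

- Read text content from local PDF files
- Read text content from PDF URLs
- Error handling for corrupted or invalid PDFs
- Volume mounting for access to local PDFs
- Automatic detection of PDF encoding
- Standardized JSON output format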

Installation

Clone the repository:

git clone https://github.com/trafflux/pdf-reader-mcp.git
cd pdf-reader-mcp

Build the Docker image:

docker build -t mcp/pdf-reader .

Usage

Running the server

To run the server with access to local PDF files:

docker run -i --rm -v /path/to/pdfs:/pdfs mcp/pdf-reader

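Replace /path/to/pdfs with the actual path to the directory that contains your PDF files. Files in that directory are visible to the server under /pdfs.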
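Because the server runs inside the container, tool calls need the container-side path rather than the host path. The following is a minimal sketch of that translation, assuming the /pdfs mount point from the command above; the helper name to_container_path and the example host paths are hypothetical and not part of this project.

```python
from pathlib import PurePosixPath


def to_container_path(host_file: str, host_dir: str, mount_point: str = "/pdfs") -> str:
    """Map a file under the mounted host directory to its path inside the container.

    Hypothetical helper for illustration; mount_point mirrors the
    `-v /path/to/pdfs:/pdfs` argument shown above.
    """
    # relative_to raises ValueError when host_file is not inside host_dir,
    # which is the right outcome: such a file is not visible to the server.
    relative = PurePosixPath(host_file).relative_to(PurePosixPath(host_dir))
    return str(PurePosixPath(mount_point) / relative)


if __name__ == "__main__":
    print(to_container_path("/home/me/docs/reports/q1.pdf", "/home/me/docs"))
    # prints /pdfs/reports/q1.pdf
```

A file outside the mounted directory raises ValueError, which signals early that the server would not be able to open it.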

If you do not need local PDF files:

docker run -i --rm mcp/pdf-reader

MCP configuration

Add the following to your MCP settings configuration:

{
  "mcpServers": {
    "pdf-reader": {
      "command": "docker",
      "args": [
        "run",
        "-i",
        "--rm",
        "-v",
        "/path/to/pdfs:/pdfs",
        "mcp/pdf-reader"
      ],
      "disabled": false,
      "autoApprove": []
    }
  }
}

Without local PDF files:

{
  "mcpServers": {
    "pdf-reader": {
      "command": "docker",
      "args": ["run", "-i", "--rm", "mcp/pdf-reader"],
      "disabled": false,
      "autoApprove": []
    }
  }
}
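The two configurations differ only in the volume arguments. If you generate settings files from a script, a sketch along these lines produces either variant; the function name build_mcp_config is hypothetical, and the keys simply mirror the JSON shown above.

```python
import json
from typing import Optional


def build_mcp_config(host_pdf_dir: Optional[str] = None, image: str = "mcp/pdf-reader") -> dict:
    """Build the settings entry for the pdf-reader server.

    Hypothetical helper: pass host_pdf_dir to include the -v mount,
    or leave it as None for the URL-only variant.
    """
    args = ["run", "-i", "--rm"]
    if host_pdf_dir is not None:
        # Mount the host directory at /pdfs inside the container.
        args += ["-v", f"{host_pdf_dir}:/pdfs"]
    args.append(image)
    return {
        "mcpServers": {
            "pdf-reader": {
                "command": "docker",
                "args": args,
                "disabled": False,
                "autoApprove": [],
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_mcp_config("/path/to/pdfs"), indent=2))
```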

Available tools

read_local_pdf
- Purpose: Read text content from a local PDF file
- Input:
  { "path": "/pdfs/document.pdf" }
- Output:
  { "success": true, "data": { "text": "Extracted content..." } }

read_pdf_url
- Purpose: Read text content from a PDF URL
- Input:
  { "url": "https://example.com/document.pdf" }
- Output:
  { "success": true, "data": { "text": "Extracted content..." } }
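MCP clients invoke tools with a JSON-RPC 2.0 "tools/call" request. The sketch below builds such a request for either tool and unpacks the JSON payload documented above. The helper names are hypothetical, and it assumes the payload has already been decoded into a dict; how your client wraps the payload inside the MCP result may differ.

```python
from itertools import count

_request_ids = count(1)  # simple increasing JSON-RPC ids


def tool_call_request(tool_name: str, arguments: dict) -> dict:
    """Build a JSON-RPC 2.0 'tools/call' message for an MCP tool."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


def extract_text(payload: dict) -> str:
    """Return the extracted text from a decoded tool payload, or raise on failure.

    Assumes the payload follows the documented
    {"success": ..., "data": {"text": ...}} / {"success": false, "error": ...} shape.
    """
    if payload.get("success"):
        return payload["data"]["text"]
    raise RuntimeError(payload.get("error", "unknown error"))


if __name__ == "__main__":
    request = tool_call_request("read_local_pdf", {"path": "/pdfs/document.pdf"})
    print(request["method"], request["params"]["name"])
```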

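Error handling

The server handles the following error cases and returns a clear error message for each:

- Invalid or corrupted PDF files
- Missing files
- Failed URL requests
- Permission problems
- Network connectivity problems

Error responses use the following format: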

{
  "success": false,
  "error": "Detailed error message"
}
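One way to produce this envelope is to catch every failure at the tool boundary and convert it to a message. The following is a sketch under stated assumptions, not the project's actual server.py: the helper names are hypothetical, the %PDF- header check is a deliberately simple validity test, and PyPDF2 is imported lazily so that the header check does not depend on it.

```python
import io


def ok(text: str) -> dict:
    """Success envelope in the documented format."""
    return {"success": True, "data": {"text": text}}


def fail(message: str) -> dict:
    """Error envelope in the documented format."""
    return {"success": False, "error": message}


def extract_text_from_bytes(data: bytes) -> dict:
    """Extract text from raw PDF bytes, never raising to the caller."""
    # Cheap sanity check (an assumption of this sketch): PDF files
    # normally begin with the "%PDF-" marker.
    if not data.startswith(b"%PDF-"):
        return fail("Not a PDF: missing %PDF- header")
    try:
        from PyPDF2 import PdfReader  # imported lazily; requires PyPDF2 2.x or later

        reader = PdfReader(io.BytesIO(data))
        # extract_text() can return an empty value for image-only pages.
        pages = [(page.extract_text() or "") for page in reader.pages]
        return ok("\n".join(pages))
    except Exception as exc:  # corrupted file, missing dependency, etc.
        return fail(f"Failed to parse PDF: {exc}")


if __name__ == "__main__":
    print(extract_text_from_bytes(b"plain text, not a PDF"))
```

Catching the broad Exception here is a design choice for the sketch: the caller always receives one of the two documented shapes, at the cost of less specific error categories.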

Dependencies

- Python 3.11+
- PyPDF2: PDF parsing and text extraction
- requests: HTTP client for fetching PDFs from URLs
- MCP SDK: Model Context Protocol implementation

Project structure

.
├── Dockerfile          # Container configuration
├── README.md          # This documentation
├── requirements.txt   # Python dependencies
└── src/
    ├── __init__.py    # Package initialization
    └── server.py      # Main server implementation

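License

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at:

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.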

Contributing

Contributions are welcome! Feel free to submit a pull request.

Contact

For questions, issues, or contributions, please contact Philip Van de Walker:

Email: philip.vandewalker@gmail.com
GitHub: https://github.com/trafflux
