
PaddleOCR MCP Server



A Model Context Protocol (MCP) server providing local OCR capabilities via PaddleOCR. Works with any MCP-compatible AI agent.

🚀 Key Advantages

| Feature | PaddleOCR MCP | Cloud OCR APIs |
|---|---|---|
| Cost | 💰 Free | Pay per request |
| Privacy | 🔒 100% local, data never leaves the device | Data sent to cloud |
| Network | 📡 Offline, no internet needed | Internet required |
| Latency | ⚡ ~1-3s | ~0.5-3s + network |
| Languages | 🌍 109 languages supported | Varies by provider |
| Model | 🤖 Works with DeepSeek-v4 or any other LLM | Vendor locked |
| Open Source | 📂 Fully open source | Closed source |
| Deployment | 🖥️ Runs on a low-cost CPU | Requires high-cost servers |

💡 Low-cost pairing with DeepSeek-v4: PaddleOCR extracts the text from an image, and any LLM (DeepSeek-v4, GPT, Claude, GLM, etc.) handles the understanding. The OCR step runs locally and needs no GPU, so the added development cost is close to zero.

🤖 Auto-OCR Mechanism

Even if your model is text-only and cannot accept images, the AI agent can call the ocr_image MCP tool automatically, extract the text from the image, and answer from that text.

How it works:

  1. User sends a screenshot

  2. AI Agent detects the model can't process images natively

  3. Auto-calls ocr_image to extract text from the image

  4. Analyzes and responds based on the extracted text

This means:

  • No model switching required — text-only models can "see" images

  • Zero extra config — works automatically once MCP is mounted

  • Transparent usage — just send the image, the agent handles the rest
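The fallback described above can be sketched in Python as follows. This is a minimal illustration, not the agent's actual implementation: `build_prompt`, the `supports_vision` flag, and the `call_tool` callable are hypothetical names standing in for whatever the MCP client provides. Only the tool name `ocr_image`, its `image_path` argument, and the `text` result field come from this README.

```python
from typing import Callable, Optional

# Hypothetical signature: call_tool(tool_name, arguments) -> result dict.
ToolCaller = Callable[[str, dict], dict]


def build_prompt(
    user_text: str,
    image_path: Optional[str],
    supports_vision: bool,
    call_tool: ToolCaller,
) -> str:
    """Return the text prompt to send to the model."""
    if image_path is None or supports_vision:
        # No image, or the model can read the image itself: nothing to add.
        return user_text
    # Text-only model: extract the text through the ocr_image MCP tool.
    result = call_tool("ocr_image", {"image_path": image_path})
    extracted = result.get("text", "")
    return f"{user_text}\n\n[Text extracted from {image_path}]\n{extracted}"


if __name__ == "__main__":
    # Stub standing in for a real MCP call, so the sketch runs without a server.
    def fake_tool(name: str, arguments: dict) -> dict:
        return {"text": "Hello World", "lines": ["Hello World"]}

    print(build_prompt("What does this say?", "/path/to/screenshot.png", False, fake_tool))
```

Passing the tool caller in as a parameter keeps the decision logic separate from the transport, so the same function can be exercised with a stub or with a real MCP client.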

✨ Features

  • Pure OCR: ocr_image(image_path) extracts all text from an image

  • Chinese + English: mixed-language recognition out of the box

  • 109 languages: global language coverage

  • Local & offline: no API keys, no network calls after the initial model download

  • Lightweight: PP-OCRv6 (34.5M params) runs on CPU

  • MCP standard: works with MCP clients that can launch a stdio server

🎯 Use Cases

  • 🤖 Image understanding pipelines that pair OCR with DeepSeek-v4, GLM, Qwen, or another LLM

  • 📄 Document scanning & screenshot OCR

  • 🏭 Industrial: license plates, labels, receipts

  • 🌐 Multilingual document processing

📦 Quick Start

Prerequisites

  • Python 3.8–3.12

  • Node.js 18+ (only for clients such as Claude Code; the server itself does not need it)

Install

# Install PaddleOCR and PaddlePaddle
pip install paddlepaddle==3.2.0 paddleocr

# Clone this repo
git clone https://github.com/YuanJinke/paddleocr_mcp.git
cd paddleocr_mcp

⚠️ Windows users: PaddlePaddle 3.3.1 has a known oneDNN bug. Pin to 3.2.0 as shown above.
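To check an existing environment against this warning, a small script along these lines could help. It is a sketch: the helper names `needs_downgrade` and `installed_paddle_version` are made up for illustration, and the check only flags the exact combination named above (Windows with PaddlePaddle 3.3.1). Other versions are not assessed.

```python
import platform
from importlib import metadata
from typing import Optional


def needs_downgrade(paddle_version: str, system: str) -> bool:
    """True only for the combination this README warns about."""
    return system == "Windows" and paddle_version == "3.3.1"


def installed_paddle_version() -> Optional[str]:
    """Return the installed paddlepaddle version, or None if it is missing."""
    try:
        return metadata.version("paddlepaddle")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_paddle_version()
    if version is None:
        print("paddlepaddle is not installed")
    elif needs_downgrade(version, platform.system()):
        print("Affected setup: run pip install paddlepaddle==3.2.0")
    else:
        print(f"paddlepaddle {version} is not the version flagged above")
```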

Configure in your MCP client

claude mcp add -s user paddleocr -- python /path/to/paddleocr_mcp.py

Or add to ~/.claude.json:

{
  "mcpServers": {
    "paddleocr": {
      "type": "stdio",
      "command": "python",
      "args": ["/path/to/paddleocr_mcp.py"]
    }
  }
}
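If you would rather script this edit than make it by hand, the following Python sketch merges the entry above into a JSON config file. The function name `add_paddleocr_entry` is hypothetical, and the sketch assumes the file is plain JSON with a top-level `mcpServers` object as shown. Back up the real file first, since the client may manage other keys in it.

```python
import json
import tempfile
from pathlib import Path


def add_paddleocr_entry(config_path: Path, script_path: str) -> dict:
    """Add or overwrite the "paddleocr" server entry, keeping other keys intact."""
    config = {}
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    servers = config.setdefault("mcpServers", {})
    servers["paddleocr"] = {
        "type": "stdio",
        "command": "python",
        "args": [script_path],
    }
    config_path.write_text(
        json.dumps(config, indent=2, ensure_ascii=False) + "\n",
        encoding="utf-8",
    )
    return config


if __name__ == "__main__":
    # Demo against a scratch file; point it at the real config once you trust it.
    with tempfile.TemporaryDirectory() as scratch:
        demo = Path(scratch) / "claude.json"
        print(add_paddleocr_entry(demo, "/path/to/paddleocr_mcp.py"))
```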

Add to ~/.hermes/config.yaml:

mcp_servers:
  paddleocr:
    command: python
    args: ["/path/to/paddleocr_mcp.py"]
    timeout: 120
    connect_timeout: 120

Then restart Hermes.

Usage Example

Send to your AI agent:

Call ocr_image on "/path/to/screenshot.png" and tell me what it says.

Example result:

{"text": "Hello World\n你好世界", "lines": ["Hello World", "你好世界"]}
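For callers that consume this result in code, here is a short Python sketch of reading it. It assumes, based only on the example above, that `text` holds the recognized lines joined with newlines and that `lines` holds the same content as a list. The helper name `parse_ocr_result` is hypothetical.

```python
import json
from typing import List


def parse_ocr_result(raw: str) -> List[str]:
    """Return the recognized lines from a JSON result string."""
    result = json.loads(raw)
    lines = result.get("lines")
    if lines is None:
        # Fall back to splitting the combined text if "lines" is absent.
        lines = result.get("text", "").splitlines()
    return list(lines)


if __name__ == "__main__":
    raw = '{"text": "Hello World\\n你好世界", "lines": ["Hello World", "你好世界"]}'
    for number, line in enumerate(parse_ocr_result(raw), start=1):
        print(number, line)
```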

🔧 Local Test

# Start the server on its own (it waits for MCP messages on stdin)
python paddleocr_mcp.py

# Or send one test request by piping it into a fresh server process
echo '{"jsonrpc":"2.0","method":"tools/call","params":{"name":"ocr_image","arguments":{"image_path":"/path/to/test.png"}},"id":1}' | python paddleocr_mcp.py
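The same request can be built and sent from Python, which avoids shell-quoting problems with the JSON payload. This is a sketch under assumptions: `build_tools_call` and `send_once` are hypothetical helper names, and the sketch sends the single `tools/call` message just as the shell example does. MCP clients normally open a session with an `initialize` exchange. Whether this server answers a bare `tools/call` is not stated here, so an error response may mean the handshake is required.

```python
import json
import subprocess
import sys


def build_tools_call(image_path: str, request_id: int = 1) -> str:
    """Serialize the JSON-RPC 2.0 request used in the shell example."""
    request = {
        "jsonrpc": "2.0",
        "method": "tools/call",
        "params": {"name": "ocr_image", "arguments": {"image_path": image_path}},
        "id": request_id,
    }
    # json.dumps emits a single line, which suits line-delimited stdio.
    return json.dumps(request, ensure_ascii=False)


def send_once(server_script: str, image_path: str, timeout: float = 120.0) -> str:
    """Start the server, write one request line to its stdin, return its stdout."""
    completed = subprocess.run(
        [sys.executable, server_script],
        input=build_tools_call(image_path) + "\n",
        capture_output=True,
        encoding="utf-8",
        timeout=timeout,
    )
    return completed.stdout


if __name__ == "__main__":
    print(build_tools_call("/path/to/test.png"))
    # Uncomment once both paths point at real files:
    # print(send_once("paddleocr_mcp.py", "/path/to/test.png"))
```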

⚠️ Known Issues

  • Windows + oneDNN: PaddlePaddle 3.3.1 has a oneDNN attribute conversion bug. Use 3.2.0 or set FLAGS_use_onednn=0.

  • First run: model weights (~50 MB) are downloaded automatically on first use. Later runs use the cache.
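One way to apply the `FLAGS_use_onednn=0` workaround is to set it in the server's environment at launch. The sketch below assumes the flag is read from an environment variable when Paddle starts, which this README does not spell out. `server_env` is a hypothetical helper name.

```python
import os
from typing import Dict, Mapping, Optional


def server_env(base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Copy an environment mapping and turn the oneDNN flag off in the copy."""
    env = dict(os.environ if base is None else base)
    env["FLAGS_use_onednn"] = "0"
    return env


if __name__ == "__main__":
    env = server_env()
    print("FLAGS_use_onednn =", env["FLAGS_use_onednn"])
    # To launch the server with this environment (the path is a placeholder):
    # import subprocess, sys
    # subprocess.run([sys.executable, "/path/to/paddleocr_mcp.py"], env=env)
```

Working on a copy leaves the calling process's own environment untouched, so the flag only affects the server that is started with it.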

📄 License

MIT — free to use, modify, and distribute.
