PaddleOCR MCP Server

text_image_unwarping.md•11.1 KiB

---
comments: true
---

# Text Image Unwarping Module Tutorial

## 1. Overview

Text image unwarping applies geometric transformations to an image to correct document distortion, skew, perspective deformation, and similar problems, so that subsequent text recognition is more accurate.

## 2. Supported Model List

> Inference time covers model inference only; it excludes pre-processing and post-processing time.

<table>
<thead>
<tr>
<th>Model</th><th>Model Download Link</th>
<th>CER</th>
<th>GPU Inference Time (ms)<br/>[Normal Mode / High-Performance Mode]</th>
<th>CPU Inference Time (ms)<br/>[Normal Mode / High-Performance Mode]</th>
<th>Model Storage Size (MB)</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>UVDoc</td>
<td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0.0/UVDoc_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/UVDoc_pretrained.pdparams">Training Model</a></td>
<td>0.179</td>
<td>19.05 / 19.05</td>
<td>- / 869.82</td>
<td>30.3</td>
<td>High-accuracy text image unwarping model</td>
</tr>
</tbody>
</table>

<strong>Test Environment Description:</strong>

<ul>
<li><b>Performance Test Environment</b>
<ul>
<li><strong>Test Dataset:</strong> <a href="https://www3.cs.stonybrook.edu/~cvl/docunet.html">DocUNet benchmark</a> dataset.</li>
<li><strong>Hardware Configuration:</strong>
<ul>
<li>GPU: NVIDIA Tesla T4</li>
<li>CPU: Intel Xeon Gold 6271C @ 2.60GHz</li>
</ul>
</li>
<li><strong>Software Environment:</strong>
<ul>
<li>Ubuntu 20.04 / CUDA 11.8 / cuDNN 8.9 / TensorRT 8.6.1.6</li>
<li>paddlepaddle 3.0.0 / paddleocr 3.0.3</li>
</ul>
</li>
</ul>
</li>
<li><b>Inference Mode Description</b></li>
</ul>

<table border="1">
<thead>
<tr>
<th>Mode</th>
<th>GPU Configuration</th>
<th>CPU Configuration</th>
<th>Acceleration Technology Combination</th>
</tr>
</thead>
<tbody>
<tr>
<td>Normal Mode</td>
<td>FP32 precision / no TRT acceleration</td>
<td>FP32 precision / 8 threads</td>
<td>PaddleInference</td>
</tr>
<tr>
<td>High-Performance Mode</td>
<td>Optimal combination of precision type and acceleration strategy, selected in advance</td>
<td>FP32 precision / 8 threads</td>
<td>Optimal backend selected in advance (Paddle/OpenVINO/TRT, etc.)</td>
</tr>
</tbody>
</table>

## 3. Quick Start

> ❗ Before starting, install the PaddleOCR wheel package. See the [installation guide](../installation.md) for details.

You can try the module with a single command:

```bash
paddleocr text_image_unwarping -i https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/doc_test.jpg
```

<b>Note:</b> Official PaddleOCR models are downloaded from HuggingFace by default. If HuggingFace is hard to reach from your environment, you can switch the model source to BOS with the environment variable `PADDLE_PDX_MODEL_SOURCE="BOS"`. More mainstream model sources will be supported in the future.

You can also integrate the model inference of the image unwarping module into your own project. Before running the following code, download the [example image](https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/doc_test.jpg) to your local machine.

```python
from paddleocr import TextImageUnwarping

# Instantiate the unwarping model (UVDoc is the only model listed above).
model = TextImageUnwarping(model_name="UVDoc")
output = model.predict("doc_test.jpg", batch_size=1)
for res in output:
    res.print()                                      # print the result to the terminal
    res.save_to_img(save_path="./output/")           # save the unwarped image
    res.save_to_json(save_path="./output/res.json")  # save the result as JSON
```

After running, the result is:

```bash
{'res': {'input_path': 'doc_test.jpg', 'page_index': None, 'doctr_img': '...'}}
```

The result fields have the following meanings:

- `input_path`: the path of the input image to be unwarped
- `doctr_img`: the unwarped image result. The data is too large to print directly, so it is replaced with `...` here. Use `res.save_to_img()` to save the prediction as an image and `res.save_to_json()` to save it as a JSON file.

The visualized image is as follows:

<img src="https://raw.githubusercontent.com/cuicheng01/PaddleX_doc_images/refs/heads/main/images/modules/image_unwarp/doc_test_res.jpg">

The relevant methods and parameters are described below:

* `TextImageUnwarping` instantiates the image unwarping model (`UVDoc` is used as the example here). Its parameters are:

<table>
<thead>
<tr>
<th>Parameter</th>
<th>Description</th>
<th>Type</th>
<th>Default</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>model_name</code></td>
<td>Model name. If set to <code>None</code>, <code>UVDoc</code> is used.</td>
<td><code>str|None</code></td>
<td><code>None</code></td>
</tr>
<tr>
<td><code>model_dir</code></td>
<td>Model storage path.</td>
<td><code>str|None</code></td>
<td><code>None</code></td>
</tr>
<tr>
<td><code>device</code></td>
<td>Device used for inference.<br/>
<b>Examples:</b> <code>"cpu"</code>, <code>"gpu"</code>, <code>"npu"</code>, <code>"gpu:0"</code>, <code>"gpu:0,1"</code>.<br/>
If multiple devices are specified, inference runs in parallel.<br/>
By default, GPU 0 is used if available; otherwise the CPU is used.
</td>
<td><code>str|None</code></td>
<td><code>None</code></td>
</tr>
<tr>
<td><code>enable_hpi</code></td>
<td>Whether to enable high-performance inference.</td>
<td><code>bool</code></td>
<td><code>False</code></td>
</tr>
<tr>
<td><code>use_tensorrt</code></td>
<td>Whether to enable the TensorRT subgraph engine of Paddle Inference. If the model does not support TensorRT acceleration, no acceleration is applied even when this flag is set.<br/>
For PaddlePaddle built with CUDA 11.8, the compatible TensorRT version is 8.x (x>=6); TensorRT 8.6.1.6 is recommended.<br/>
</td>
<td><code>bool</code></td>
<td><code>False</code></td>
</tr>
<tr>
<td><code>precision</code></td>
<td>Computation precision used with the TensorRT subgraph engine of Paddle Inference.<br/><b>Options:</b> <code>"fp32"</code>, <code>"fp16"</code>.</td>
<td><code>str</code></td>
<td><code>"fp32"</code></td>
</tr>
<tr>
<td><code>enable_mkldnn</code></td>
<td>
Whether to enable MKL-DNN accelerated inference. If MKL-DNN is unavailable or the model does not support MKL-DNN acceleration, no acceleration is applied even when this flag is set.<br/>
</td>
<td><code>bool</code></td>
<td><code>True</code></td>
</tr>
<tr>
<td><code>mkldnn_cache_capacity</code></td>
<td>
MKL-DNN cache capacity.
</td>
<td><code>int</code></td>
<td><code>10</code></td>
</tr>
<tr>
<td><code>cpu_threads</code></td>
<td>Number of threads used for inference on the CPU.</td>
<td><code>int</code></td>
<td><code>10</code></td>
</tr>
</tbody>
</table>

* Call the `predict()` method of the image unwarping model to run inference; it returns a list of results. The module also provides a `predict_iter()` method. The two accept the same parameters and return the same results; the difference is that `predict_iter()` returns a `generator`, which yields predictions step by step and suits large datasets or memory-constrained scenarios. Choose whichever fits your needs. The `predict()` method takes the parameters `input` and `batch_size`:

<table>
<thead>
<tr>
<th>Parameter</th>
<th>Description</th>
<th>Type</th>
<th>Default</th>
</tr>
</thead>
<tr>
<td><code>input</code></td>
<td>Data to be predicted. Required. Several input types are supported.
<ul>
<li><b>Python Var</b>: image data such as a <code>numpy.ndarray</code></li>
<li><b>str</b>: the local path of an image or PDF file, such as <code>/root/data/img.jpg</code>; <b>a URL</b> of an image or PDF file, such as this <a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/img_rot180_demo.jpg">example</a>; or <b>a local directory</b> containing the images to be predicted, such as <code>/root/data/</code> (prediction on directories that contain PDF files is not currently supported; a PDF must be specified by its exact file path)</li>
<li><b>list</b>: a list whose elements are of the types above, such as <code>[numpy.ndarray, numpy.ndarray]</code>, <code>["/root/data/img1.jpg", "/root/data/img2.jpg"]</code>, <code>["/root/data1", "/root/data2"]</code></li>
</ul>
</td>
<td><code>Python Var|str|list</code></td>
<td></td>
</tr>
<tr>
<td><code>batch_size</code></td>
<td>Batch size. Can be any positive integer.</td>
<td><code>int</code></td>
<td>1</td>
</tr>
</table>

* Process the prediction results. The prediction for each sample is a corresponding Result object, which supports printing, saving as an image, and saving as a `json` file:

<table>
<thead>
<tr>
<th>Method</th>
<th>Description</th>
<th>Parameter</th>
<th>Type</th>
<th>Parameter Description</th>
<th>Default</th>
</tr>
</thead>
<tr>
<td rowspan="3"><code>print()</code></td>
<td rowspan="3">Print the result to the terminal</td>
<td><code>format_json</code></td>
<td><code>bool</code></td>
<td>Whether to format the output with <code>JSON</code> indentation</td>
<td><code>True</code></td>
</tr>
<tr>
<td><code>indent</code></td>
<td><code>int</code></td>
<td>Indentation level used to pretty-print the <code>JSON</code> output for readability. Effective only when <code>format_json</code> is <code>True</code></td>
<td>4</td>
</tr>
<tr>
<td><code>ensure_ascii</code></td>
<td><code>bool</code></td>
<td>Whether to escape non-<code>ASCII</code> characters as <code>Unicode</code>. When <code>True</code>, all non-<code>ASCII</code> characters are escaped; when <code>False</code>, the original characters are kept. Effective only when <code>format_json</code> is <code>True</code></td>
<td><code>False</code></td>
</tr>
<tr>
<td rowspan="3"><code>save_to_json()</code></td>
<td rowspan="3">Save the result as a file in json format</td>
<td><code>save_path</code></td>
<td><code>str</code></td>
<td>File path to save to. When it is a directory, the saved file is named after the input file</td>
<td>None</td>
</tr>
<tr>
<td><code>indent</code></td>
<td><code>int</code></td>
<td>Indentation level used to pretty-print the <code>JSON</code> output for readability. Effective only when <code>format_json</code> is <code>True</code></td>
<td>4</td>
</tr>
<tr>
<td><code>ensure_ascii</code></td>
<td><code>bool</code></td>
<td>Whether to escape non-<code>ASCII</code> characters as <code>Unicode</code>. When <code>True</code>, all non-<code>ASCII</code> characters are escaped; when <code>False</code>, the original characters are kept. Effective only when <code>format_json</code> is <code>True</code></td>
<td><code>False</code></td>
</tr>
<tr>
<td><code>save_to_img()</code></td>
<td>Save the result as a file in image format</td>
<td><code>save_path</code></td>
<td><code>str</code></td>
<td>File path to save to. When it is a directory, the saved file is named after the input file</td>
<td>None</td>
</tr>
</table>

* In addition, the visualized image and the prediction result can be obtained through attributes:

<table>
<thead>
<tr>
<th>Attribute</th>
<th>Description</th>
</tr>
</thead>
<tr>
<td rowspan="1"><code>json</code></td>
<td rowspan="1">Get the prediction result in <code>json</code> format</td>
</tr>
<tr>
<td rowspan="1"><code>img</code></td>
<td rowspan="1">Get the visualized image as a <code>dict</code></td>
</tr>
</table>

## 4. Secondary Development

This module does not currently support fine-tuning; only inference integration is supported. Fine-tuning support for this module is planned for the future.

## 5. FAQ

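
The description of the `input` parameter says that a directory containing PDF files cannot be passed directly and that each PDF has to be given by its own file path. One way to handle a mixed folder is to expand it into an explicit list of file paths first. The following is a minimal sketch under that reading; the helper name `collect_inputs` and the list of image extensions are illustrative assumptions, not part of PaddleOCR.

```python
from pathlib import Path

# Hypothetical extension list; adjust it to the image formats you actually use.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def collect_inputs(folder):
    """Return sorted file paths (as str) for images and PDFs directly inside folder."""
    paths = []
    for entry in sorted(Path(folder).iterdir()):
        if not entry.is_file():
            continue  # skip sub-directories; this sketch does not recurse
        suffix = entry.suffix.lower()
        if suffix in IMAGE_SUFFIXES or suffix == ".pdf":
            paths.append(str(entry))
    return paths
```

Because a list of file paths is one of the documented input types, the result could then be passed on, for example as `model.predict(collect_inputs("/root/data/"), batch_size=1)`.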
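
The quick start notes that the model source can be switched to BOS through the `PADDLE_PDX_MODEL_SOURCE` environment variable. When exporting it in the shell is inconvenient, it can also be set from Python. The sketch below sets it before `paddleocr` is imported; that ordering is a cautious assumption about when the variable is read, not documented behavior, and `select_model_source` is a hypothetical helper name.

```python
import os


def select_model_source(source="BOS"):
    """Set the model-source environment variable for this process and return its value."""
    # Assumption: set this before importing paddleocr so the setting is visible at import time.
    os.environ["PADDLE_PDX_MODEL_SOURCE"] = source
    return os.environ["PADDLE_PDX_MODEL_SOURCE"]


select_model_source("BOS")
# from paddleocr import TextImageUnwarping  # import only after the variable is set
```

The setting applies only to the current process and any child processes it starts.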