References arXiv papers for OCR models including DeepSeek-OCR and Qwen-Image-Layered, providing access to research documentation for the integrated OCR technologies.
Uses FastAPI as the backend framework for the web app, providing a RESTful API server with async processing for document OCR operations.
Integrates multiple OCR models hosted on GitHub repositories including GOT-OCR2.0, providing access to state-of-the-art OCR engines and their source code.
Integrates multiple state-of-the-art OCR models from Hugging Face including DeepSeek-OCR, Florence-2, DOTS.OCR, PP-OCRv5, and Qwen-Image-Layered for comprehensive document processing capabilities.
Integrates PaddlePaddle's PP-OCRv5 OCR system for industrial-grade text extraction with high accuracy, fast inference, and edge deployment capabilities.
Uses Poetry for dependency management and installation of the OCR-MCP server and its required packages.
Leverages PyTorch for GPU-accelerated OCR model inference, enabling high-performance document processing with CUDA support.
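As a rough illustration of the CUDA point above, the following sketch picks an inference device and falls back to the CPU when PyTorch or a GPU is missing. The helper name `pick_device` is hypothetical and is not taken from the OCR-MCP code base; only `torch.cuda.is_available()` is a real PyTorch call.

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch can see a CUDA GPU, otherwise "cpu".

    Hypothetical helper for illustration; not part of OCR-MCP itself.
    """
    try:
        import torch  # optional dependency; may be absent on CPU-only setups
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print(f"OCR inference device: {pick_device()}")
```

Importing `torch` inside the function keeps the check usable on machines where PyTorch is not installed.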
Click on "Install Server".
Wait a few minutes for the server to deploy. Once ready, it will show a "Started" state.
In the chat, type @ followed by the MCP server name and your instructions, e.g., "@OCR-MCP scan this receipt and extract the total amount with DeepSeek-OCR"
That's it! The server will respond to your query, and you can continue using it as needed.
OCR-MCP
Complete AI OCR webapp and MCP server. A web app for people (drag‑and‑drop OCR, scanner, batch) and a FastMCP 3.1 MCP server for agentic IDEs—Claude, Cursor, Windsurf—so agents can run OCR, preprocessing, and workflows as tools. Same 10+ engines, WIA scanner (Windows), and pipelines; one repo.
Topics: ocr, mcp, fastmcp, document-processing, scanner, wia, pdf, computer-vision, model-context-protocol, llm
What it does
Web app — React + FastAPI: upload or scan, pick engine, get text/PDF/JSON. Ports 10858 (frontend) and 10859 (backend).
MCP server — Tools for OCR, preprocessing, scanner, workflows. Sampling and agentic workflow (SEP-1577) supported.
Features: 10+ backends (PaddleOCR-VL-1.5, DeepSeek-OCR-2, Mistral OCR, …) · Auto backend selection · Preprocessing (deskew, enhance, crop) · Layout & table extraction · Quality assessment · WIA scanner · Batch & pipelines · Multi-format export
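The "auto backend selection" feature could work along the following lines. This is only a sketch under stated assumptions: the engine names, capability flags, and priorities below are placeholders invented for illustration, and the project's actual selection logic may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backend:
    name: str             # hypothetical engine identifier
    needs_gpu: bool       # assumed capability flag
    handles_tables: bool  # assumed capability flag
    priority: int         # higher wins when several backends qualify


# Placeholder registry; the real project ships its own engine list.
BACKENDS = [
    Backend("gpu-vlm-engine", needs_gpu=True, handles_tables=True, priority=3),
    Backend("cpu-layout-engine", needs_gpu=False, handles_tables=True, priority=2),
    Backend("cpu-text-engine", needs_gpu=False, handles_tables=False, priority=1),
]


def select_backend(gpu_available: bool, want_tables: bool) -> Backend:
    """Pick the highest-priority backend that fits the hardware and the task."""
    candidates = [
        b for b in BACKENDS
        if (gpu_available or not b.needs_gpu)
        and (b.handles_tables or not want_tables)
    ]
    if not candidates:
        raise LookupError("no backend satisfies the request")
    return max(candidates, key=lambda b: b.priority)


if __name__ == "__main__":
    print(select_backend(gpu_available=False, want_tables=True).name)
```

Filtering first and ranking second keeps the hardware constraints separate from the preference order, so new engines can be added to the registry without touching the selection function.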
Docs
| Doc | Description |
| --- | --- |
| | Install, run MCP, Web UI (ports 10858/10859), client config |
| | Architecture, tools, config, development, packaging |
| | Engines, capabilities, hardware (see also AI_MODELS.md) |
| | Sampling, SEP-1577, agentic workflows, prompts |
Also: JUSTFILE.md (just recipes) · OCR-MCP_MASTER_PLAN.md (roadmap) · tests/README.md (testing)
Quick start
uv sync
just run
Web UI: just webapp → http://localhost:10858
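After starting the web app, a quick way to confirm that both services are listening is a plain TCP check against the documented ports (10858 for the frontend, 10859 for the backend). The helper `port_open` is a hypothetical name used for this sketch; it relies only on the Python standard library and makes no assumptions about the backend's HTTP routes.

```python
import socket


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolved hosts.
        return False


if __name__ == "__main__":
    for label, port in (("frontend", 10858), ("backend", 10859)):
        state = "up" if port_open("localhost", port) else "down"
        print(f"{label} on port {port}: {state}")
```

A "down" result for either port usually means the corresponding process has not finished starting or another program is holding the port.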
License
MIT — see LICENSE.