References arXiv papers for OCR models including DeepSeek-OCR and Qwen-Image-Layered, providing access to research documentation for the integrated OCR technologies.
Uses FastAPI as the backend framework for the WebApp interface, providing RESTful API server with async processing for document OCR operations.
Integrates multiple OCR models hosted on GitHub repositories including GOT-OCR2.0, providing access to state-of-the-art OCR engines and their source code.
Integrates multiple state-of-the-art OCR models from Hugging Face including DeepSeek-OCR, Florence-2, DOTS.OCR, PP-OCRv5, and Qwen-Image-Layered for comprehensive document processing capabilities.
Integrates PaddlePaddle's PP-OCRv5 OCR system for industrial-grade text extraction with high accuracy, fast inference, and edge deployment capabilities.
Uses uv for dependency management and installation of the OCR-MCP server and its required packages.
Leverages PyTorch for GPU-accelerated OCR model inference, enabling high-performance document processing with CUDA support.
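GPU acceleration with a CPU fallback usually comes down to a short device check at startup. A minimal sketch, assuming PyTorch is an optional dependency (the function name is ours, not from this repo):

```python
def pick_device():
    """Return "cuda" when a CUDA-capable GPU is usable, else fall back to "cpu"."""
    try:
        import torch  # optional dependency; degrade gracefully if absent
        if torch.cuda.is_available():
            return "cuda"
    except ImportError:
        pass
    return "cpu"
```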
Click on "Install Server".
Wait a few minutes for the server to deploy. Once ready, it will show a "Started" state.
In the chat, type @ followed by the MCP server name and your instructions, e.g., "@OCR-MCP scan this receipt and extract the total amount with DeepSeek-OCR".
That's it! The server will respond to your query, and you can continue using it as needed.
OCR-MCP
Complete AI OCR web app and MCP server. A web app for people (drag-and-drop OCR, scanner, batch) and a FastMCP 3.1 MCP server for agentic IDEs (Claude, Cursor, Windsurf), so agents can run OCR, preprocessing, and workflows as tools. Same 10+ engines, WIA scanner (Windows), and pipelines; one repo.
Topics: ocr, mcp, fastmcp, document-processing, scanner, wia, pdf, computer-vision, model-context-protocol, llm
What it does
Web app — React (web_sota/) + FastAPI (backend/app.py): upload or scan, pick an engine, get text/PDF/JSON. Ports 10858 (Vite) and 10859 (API). In-app Help (/help) documents the web UI, the MCP server, and OCR backends.

MCP server — FastMCP 3.1 over stdio: tools for OCR, preprocessing, scanner, and workflows. Sampling and agentic workflows (SEP-1577) are supported. Same Python env and engines as the web backend; the Mistral key for agents typically goes in MISTRAL_API_KEY in the client config (web Settings only affect the FastAPI process).
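For reference, a Claude Desktop-style client entry with MISTRAL_API_KEY set in the server's environment might look like the fragment below. This is an illustrative sketch only: the server name, command, and args are assumptions, not taken from this repo's docs.

```json
{
  "mcpServers": {
    "ocr-mcp": {
      "command": "uv",
      "args": ["run", "ocr-mcp"],
      "env": { "MISTRAL_API_KEY": "sk-..." }
    }
  }
}
```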
Features: 10+ backends (PaddleOCR-VL-1.5, DeepSeek-OCR-2, Mistral OCR, …) · Auto backend selection · Preprocessing (deskew, enhance, crop) · Layout & table extraction · Quality assessment · WIA scanner · Batch & pipelines · Multi-format export
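Auto backend selection can be thought of as a capability-scoring pass over the available engines. A minimal Python sketch of that idea, with hypothetical engine names and scores (this is not the repo's actual selection logic):

```python
# Illustrative auto-selection: score each engine against the job's needs
# and pick the best match. Engine names and scores are hypothetical.
ENGINES = {
    "paddleocr-vl": {"tables": 3, "handwriting": 1, "speed": 2},
    "deepseek-ocr": {"tables": 2, "handwriting": 3, "speed": 1},
    "mistral-ocr":  {"tables": 2, "handwriting": 2, "speed": 3},
}

def pick_engine(needs, engines=ENGINES):
    """Return the engine whose capabilities best match the requested needs."""
    def score(name):
        caps = engines[name]
        return sum(caps.get(n, 0) for n in needs)
    return max(engines, key=score)
```

In a real selector the scores would come from the engine registry and the document's detected properties (language, layout complexity, handwriting), but the ranking shape is the same.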
Docs
- Install, run MCP, Web UI
- Web FastAPI backend: same venv as the MCP server
- Architecture, tools, config, development, packaging
- Engines, capabilities, hardware (see also AI_MODELS.md)
- Per-model pip packages, system deps, env/config
- Sampling, SEP-1577, agentic workflows, prompts
- Source for …
- 🚀 Verified SOTA v12.0 Architecture
Also: JUSTFILE.md (just recipes) · OCR-MCP_MASTER_PLAN.md (roadmap) · tests/README.md (testing)
Quick start
uv sync
just run

Web UI (recommended): from the repo root, run web_sota\start.ps1 (PowerShell). It clears ports 10858/10859, runs uv sync, restores PyYAML if needed (see docs/INSTALL.md), starts the FastAPI backend in one new window and Vite in another, then opens http://localhost:10858 in your browser.

Or: just webapp, if your justfile wraps the same flow.
If the start script fails, use two terminals from the ocr-mcp repo root:
Terminal 1 (backend):
$env:PYTHONPATH = (Get-Location).Path; uv run uvicorn backend.app:app --host 127.0.0.1 --port 10859

Terminal 2 (frontend):
cd web_sota; npm run dev -- --port 10858 --host
Then open http://localhost:10858
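If the page doesn't load, a quick way to tell whether the backend came up is to probe it over HTTP. A small sketch using only the standard library; the URL assumes the API port above and FastAPI's default /docs endpoint, and the helper name is ours:

```python
import urllib.request
import urllib.error

def backend_up(url="http://127.0.0.1:10859/docs", timeout=2.0):
    """Return True if the backend answers with a non-5xx status, else False."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status < 500
    except (urllib.error.URLError, OSError):
        return False
```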
Tests: run uv sync --extra dev, then uv run python -m pytest or python scripts/run_tests.py --suite quick. See tests/README.md.
License
MIT — see LICENSE.