OCR-MCP
References arXiv papers for OCR models including DeepSeek-OCR and Qwen-Image-Layered, providing access to research documentation for the integrated OCR technologies.
Uses FastAPI as the backend framework for the WebApp interface, providing a RESTful API server with async processing for document OCR operations.
Integrates multiple OCR models hosted on GitHub repositories including GOT-OCR2.0, providing access to state-of-the-art OCR engines and their source code.
Integrates multiple state-of-the-art OCR models from Hugging Face including DeepSeek-OCR, Florence-2, DOTS.OCR, PP-OCRv5, and Qwen-Image-Layered for comprehensive document processing capabilities.
Integrates PaddlePaddle's PP-OCRv5 OCR system for industrial-grade text extraction with high accuracy, fast inference, and edge deployment capabilities.
Uses Poetry for dependency management and installation of the OCR-MCP server and its required packages.
Leverages PyTorch for GPU-accelerated OCR model inference, enabling high-performance document processing with CUDA support.
Complete AI OCR web app and MCP server. A web app for people (drag-and-drop OCR, scanner, batch) and a FastMCP 3.1 MCP server for agentic IDEs (Claude, Cursor, Windsurf), so agents can run OCR, preprocessing, and workflows as tools. Same 10+ engines, WIA scanner (Windows), and pipelines; one repo.
Topics: ocr, mcp, fastmcp, document-processing, scanner, wia, pdf, computer-vision, model-context-protocol, llm
What it does
Web app: React (web_sota/) + FastAPI (backend/app.py). Upload or scan, pick an engine, get text/PDF/JSON. Ports 10858 (Vite) and 10859 (API). In-app Help (/help) documents the web UI, the MCP server, and OCR backends.
MCP server: FastMCP 3.1 over stdio, with tools for OCR, preprocessing, scanner, and workflows. Sampling defaults to local Ollama (http://127.0.0.1:11434/v1, model llama3.2), so no cloud API key is needed. Set OCR_SAMPLING_USE_CLIENT_LLM=1 to use the host IDE's LLM instead. Mistral OCR uses MISTRAL_API_KEY when you call that backend. See AI_FEATURES.md.
Features: 10+ backends (PaddleOCR-VL-1.5, DeepSeek-OCR-2, Mistral OCR, ...); auto backend selection; preprocessing (deskew, enhance, crop); layout and table extraction; quality assessment; WIA scanner; batch and pipelines; multi-format export.
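The sampling defaults described above can be pictured as a small environment lookup. The sketch below is only an illustration under stated assumptions: `resolve_sampling` and `mistral_available` are hypothetical helper names, not functions from this repository, and the only facts taken from the text are the Ollama URL, the model name, and the two environment variable names.

```python
import os

# Defaults quoted from the text above.
OLLAMA_BASE_URL = "http://127.0.0.1:11434/v1"
OLLAMA_MODEL = "llama3.2"


def resolve_sampling(env=None):
    """Hypothetical helper: decide which LLM would handle sampling.

    Returns the local Ollama defaults unless OCR_SAMPLING_USE_CLIENT_LLM is "1",
    in which case sampling is delegated to the host IDE's LLM.
    """
    env = os.environ if env is None else env
    if env.get("OCR_SAMPLING_USE_CLIENT_LLM") == "1":
        return {"provider": "client", "base_url": None, "model": None}
    return {"provider": "ollama", "base_url": OLLAMA_BASE_URL, "model": OLLAMA_MODEL}


def mistral_available(env=None):
    """Hypothetical helper: True when MISTRAL_API_KEY is set to a non-empty value."""
    env = os.environ if env is None else env
    return bool(env.get("MISTRAL_API_KEY"))


if __name__ == "__main__":
    print(resolve_sampling())
    print("Mistral OCR key present:", mistral_available())
```

Passing a plain dict instead of `os.environ` makes the behavior easy to check without touching the real environment.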
Docs
Doc | Description |
| Install, run MCP, Web UI ( |
| Web FastAPI backend: same venv as |
| Architecture, tools, config, development, packaging |
| Engines, capabilities, hardware (see also AI_MODELS.md) |
| Per-model pip packages, system deps, env/config |
| Portmanteau tools, operation status, corpus v0 |
| Sampling, SEP-1577, agentic workflows, prompts |
| Source for |
| Verified SOTA v12.0 Architecture |
Also: JUSTFILE.md (just recipes), OCR-MCP_MASTER_PLAN.md (roadmap), tests/README.md (testing).
Install
Repository: github.com/sandraschi/ocr-mcp. Clone first; uv sync needs a project on disk:
git clone https://github.com/sandraschi/ocr-mcp.git
Set-Location ocr-mcp
uv sync
Quick start
uv sync
just run
Web UI (recommended): from the repo root, run web_sota\start.ps1 (PowerShell). It clears ports 10858/10859, runs uv sync, restores PyYAML if needed (see docs/INSTALL.md), starts the FastAPI backend in a new window, starts Vite in another window, then opens http://localhost:10858 in your browser.
Or: just webapp if your justfile wraps the same flow.
If the start script fails, use two terminals from the ocr-mcp repo root:
Terminal 1 (backend):
$env:PYTHONPATH = (Get-Location).Path; uv run uvicorn backend.app:app --host 127.0.0.1 --port 10859
Terminal 2 (frontend):
cd web_sota; npm run dev -- --port 10858 --host
Then open http://localhost:10858
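When starting the two processes by hand, it can help to confirm that both ports are accepting connections before opening the browser. The following is a minimal sketch using only the Python standard library; `port_is_open` and `check_services` are hypothetical helpers, not part of this repository, and the only values taken from the text are the ports 10858 (Vite) and 10859 (API).

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_services(host="127.0.0.1", ports=None):
    """Map each service label to whether its port currently accepts connections."""
    if ports is None:
        ports = {"frontend (Vite)": 10858, "backend (FastAPI)": 10859}
    return {name: port_is_open(host, port) for name, port in ports.items()}


if __name__ == "__main__":
    for name, is_up in check_services().items():
        print(f"{name}: {'listening' if is_up else 'not reachable'}")
```

A successful connection only shows that something is listening on the port; it does not verify that the process behind it is the OCR-MCP backend or frontend.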
Tests: uv sync --extra dev then uv run python -m pytest or python scripts/run_tests.py --suite quick. See tests/README.md.
🛡️ Industrial Quality Stack
This project adheres to SOTA 14.1 industrial standards for high-fidelity agentic orchestration:
Python (Core): Ruff for linting and formatting. Zero tolerance for print statements in core handlers (T201).
Webapp (UI): Biome for sub-millisecond linting. Strict noConsoleLog enforcement.
Protocol Compliance: Hardened stdout/stderr isolation to ensure crash-resistant JSON-RPC communication.
Automation: Justfile recipes for all fleet operations (just lint, just fix, just dev).
Security: Automated audits via bandit and safety.
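The stdout/stderr isolation point matters because a stdio MCP server exchanges JSON-RPC messages over stdout, so any stray print there can corrupt the stream. The sketch below illustrates the general idea with the standard library only; it is not this project's implementation, and `build_stderr_logger` and `send_message` are hypothetical names chosen for the example.

```python
import json
import logging
import sys


def build_stderr_logger(name, stream=None):
    """Create a logger whose only handler writes to stderr (or a given stream)."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid root handlers that might write elsewhere
    logger.handlers.clear()
    handler = logging.StreamHandler(stream if stream is not None else sys.stderr)
    handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))
    logger.addHandler(handler)
    return logger


def send_message(payload, out=None):
    """Write one JSON-RPC message as a single line to stdout (or a given stream)."""
    out = sys.stdout if out is None else out
    out.write(json.dumps(payload) + "\n")
    out.flush()


if __name__ == "__main__":
    log = build_stderr_logger("ocr_mcp.example")
    log.info("diagnostics go to stderr")
    send_message({"jsonrpc": "2.0", "id": 1, "result": {"ok": True}})
```

Keeping diagnostics on stderr is also why a lint rule against print in handlers (such as Ruff's T201) is a natural fit for stdio servers.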
License
MIT; see LICENSE.