OCR-MCP
References arXiv papers for OCR models including DeepSeek-OCR and Qwen-Image-Layered, providing access to research documentation for the integrated OCR technologies.
Uses FastAPI as the backend framework for the WebApp interface, providing a RESTful API server with async processing for document OCR operations.
Integrates multiple OCR models hosted on GitHub repositories including GOT-OCR2.0, providing access to state-of-the-art OCR engines and their source code.
Integrates multiple state-of-the-art OCR models from Hugging Face including DeepSeek-OCR, Florence-2, DOTS.OCR, PP-OCRv5, and Qwen-Image-Layered for comprehensive document processing capabilities.
Integrates PaddlePaddle's PP-OCRv5 OCR system for industrial-grade text extraction with high accuracy, fast inference, and edge deployment capabilities.
Uses Poetry for dependency management and installation of the OCR-MCP server and its required packages.
Leverages PyTorch for GPU-accelerated OCR model inference, enabling high-performance document processing with CUDA support.
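As a rough illustration of the CUDA-or-CPU choice mentioned above, the sketch below picks an inference device with PyTorch. The function name pick_device is hypothetical and not taken from the OCR-MCP source; the import is guarded so the snippet also runs where PyTorch is not installed.

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch can see a CUDA GPU, otherwise "cpu"."""
    try:
        import torch  # treated as optional in this sketch
    except ImportError:
        # Without PyTorch there is no GPU path, so fall back to CPU.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print(f"OCR inference would run on: {pick_device()}")
```

A model loaded with PyTorch would then typically be moved to the chosen device with model.to(pick_device()).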
Click on "Install Server".
Wait a few minutes for the server to deploy. Once ready, it will show a "Started" state.
In the chat, type @ followed by the MCP server name and your instructions, e.g., "@OCR-MCP scan this receipt and extract the total amount with DeepSeek-OCR".
That's it! The server will respond to your query, and you can continue using it as needed.
OCR-MCP
Complete AI OCR webapp and MCP server. A web app for people (drag-and-drop OCR, scanner, batch) and a FastMCP 3.1 MCP server for agentic IDEs (Claude, Cursor, Windsurf), so agents can run OCR, preprocessing, and workflows as tools. Same 13 engines, WIA scanner (Windows), and pipelines; one repo.
Topics: ocr, mcp, fastmcp, document-processing, scanner, wia, pdf, computer-vision, model-context-protocol, llm
What it does
Web app: React (web_sota/) + FastAPI (backend/app.py): upload or scan, pick engine, get text/PDF/JSON. Ports 10858 (Vite) and 10859 (API). In-app Help (/help) documents the web UI, the MCP server, and OCR backends.
MCP server: FastMCP 3.1 stdio: tools for OCR, preprocessing, scanner, workflows. Sampling defaults to local Ollama (http://127.0.0.1:11434/v1, model llama3.2), so no cloud API key is needed. Set OCR_SAMPLING_USE_CLIENT_LLM=1 to use the host IDE's LLM instead. Mistral OCR uses MISTRAL_API_KEY when you call that backend. See AI_FEATURES.md.
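The sampling defaults described above can be pictured as a small environment lookup. This is a minimal sketch, assuming the switch is read as the literal value 1; SamplingConfig and resolve_sampling are hypothetical names for illustration, and the server's actual configuration code may differ.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Defaults quoted in the text above.
OLLAMA_BASE_URL = "http://127.0.0.1:11434/v1"
OLLAMA_MODEL = "llama3.2"


@dataclass(frozen=True)
class SamplingConfig:
    """Hypothetical container for the resolved sampling target."""
    use_client_llm: bool
    base_url: Optional[str]
    model: Optional[str]


def resolve_sampling(env: Optional[Mapping[str, str]] = None) -> SamplingConfig:
    """Choose between the host IDE's LLM and the local Ollama default."""
    env = os.environ if env is None else env
    # Assumption: only the exact value "1" enables client-side sampling.
    if env.get("OCR_SAMPLING_USE_CLIENT_LLM", "").strip() == "1":
        return SamplingConfig(True, None, None)
    return SamplingConfig(False, OLLAMA_BASE_URL, OLLAMA_MODEL)


if __name__ == "__main__":
    print(resolve_sampling())
```

Passing a plain dict instead of os.environ keeps the lookup easy to exercise without touching the real environment.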
Features: 13 backends (PaddleOCR-VL-1.5, Nemotron VL 8B, DeepSeek-OCR-2, Mistral OCR, and others); auto backend selection; preprocessing (deskew, enhance, crop); layout and table extraction; quality assessment; WIA scanner; batch and pipelines; multi-format export.
Docs
Doc | Description |
Install, run MCP, Web UI ( | |
Web FastAPI backend: same venv as | |
Architecture, tools, config, development, packaging | |
Engines, capabilities, hardware (see also AI_MODELS.md) | |
Per-model pip packages, system deps, env/config | |
Portmanteau tools, operation status, corpus v0 | |
Sampling, SEP-1577, agentic workflows, prompts | |
Source for | |
Verified SOTA v12.0 Architecture |
Also: JUSTFILE.md (just recipes), OCR-MCP_MASTER_PLAN.md (roadmap), tests/README.md (testing).
Quick Start
git clone https://github.com/sandraschi/ocr-mcp
cd ocr-mcp
just
This opens an interactive dashboard showing all available commands. Run just bootstrap to install dependencies, then just serve or just dev to start.
Manual Setup
If you don't have just installed:
🛡️ Industrial Quality Stack
This project adheres to SOTA 14.1 industrial standards for high-fidelity agentic orchestration:
Python (Core): Ruff for linting and formatting. Zero tolerance for print statements in core handlers (T201).
Webapp (UI): Biome for sub-millisecond linting. Strict noConsoleLog enforcement.
Protocol Compliance: Hardened stdout/stderr isolation to ensure crash-resistant JSON-RPC communication.
Automation: Justfile recipes for all fleet operations (just lint, just fix, just dev).
Security: Automated audits via bandit and safety.
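The reason print is banned in core handlers is that a stdio MCP server uses stdout as its JSON-RPC channel, so any stray output there corrupts the message stream. The sketch below illustrates the general pattern of sending diagnostics to stderr and protocol messages to stdout. It is not the project's code: configure_logging, write_message, and the logger name are hypothetical.

```python
import json
import logging
import sys
from typing import Any, Dict, Optional, TextIO


def configure_logging() -> logging.Logger:
    """Route all diagnostics to stderr so stdout carries only protocol data."""
    logger = logging.getLogger("ocr_mcp_sketch")  # hypothetical logger name
    logger.handlers.clear()
    handler = logging.StreamHandler(sys.stderr)
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep root-logger handlers out of the picture
    return logger


def write_message(message: Dict[str, Any], out: Optional[TextIO] = None) -> None:
    """Write one JSON-RPC message as a single newline-terminated line."""
    out = sys.stdout if out is None else out
    out.write(json.dumps(message) + "\n")
    out.flush()


if __name__ == "__main__":
    log = configure_logging()
    log.info("handling request")  # goes to stderr, not stdout
    write_message({"jsonrpc": "2.0", "id": 1, "result": {"ok": True}})
```

With this split, a client reading stdout line by line sees only parseable JSON, while logs remain available on stderr.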
License
MIT; see LICENSE.
MCP directory API
We provide all the information about MCP servers via our MCP API.
curl -X GET 'https://glama.ai/api/mcp/v1/servers/sandraschi/ocr-mcp'
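The same request can be made from Python with the standard library. This is a sketch only: the URL pattern is copied from the curl command above, server_url and fetch_server are hypothetical helper names, and the assumption that the endpoint returns a JSON object is not confirmed by this page.

```python
import json
import urllib.request
from typing import Any, Dict

# Base path taken from the curl example above.
API_BASE = "https://glama.ai/api/mcp/v1/servers"


def server_url(owner: str, name: str) -> str:
    """Build the directory API URL for one MCP server."""
    return f"{API_BASE}/{owner}/{name}"


def fetch_server(owner: str, name: str, timeout: float = 10.0) -> Dict[str, Any]:
    """GET the server record; assumes the response body is a JSON object."""
    with urllib.request.urlopen(server_url(owner, name), timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    record = fetch_server("sandraschi", "ocr-mcp")
    print(sorted(record.keys()))
```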
If you have feedback or need assistance with the MCP directory API, please join our Discord server.