References arXiv papers for OCR models including DeepSeek-OCR and Qwen-Image-Layered, providing access to research documentation for the integrated OCR technologies.
Uses FastAPI as the backend framework for the WebApp interface, providing RESTful API server with async processing for document OCR operations.
Integrates multiple OCR models hosted on GitHub repositories including GOT-OCR2.0, providing access to state-of-the-art OCR engines and their source code.
Integrates multiple state-of-the-art OCR models from Hugging Face including DeepSeek-OCR, Florence-2, DOTS.OCR, PP-OCRv5, and Qwen-Image-Layered for comprehensive document processing capabilities.
Integrates PaddlePaddle's PP-OCRv5 OCR system for industrial-grade text extraction with high accuracy, fast inference, and edge deployment capabilities.
Uses uv for dependency management and installation of the OCR-MCP server and its required packages.
Leverages PyTorch for GPU-accelerated OCR model inference, enabling high-performance document processing with CUDA support.
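GPU acceleration with a CPU fallback usually comes down to a short device check at startup. A minimal sketch, assuming PyTorch is an optional dependency (the function name is ours, not from this repo):

```python
def pick_device():
    """Return "cuda" when a CUDA-capable GPU is usable, else fall back to "cpu"."""
    try:
        import torch  # optional dependency; degrade gracefully if absent
        if torch.cuda.is_available():
            return "cuda"
    except ImportError:
        pass
    return "cpu"
```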
Click on "Install Server".
Wait a few minutes for the server to deploy. Once ready, it will show a "Started" state.
In the chat, type @ followed by the MCP server name and your instructions, e.g., "@OCR-MCP scan this receipt and extract the total amount with DeepSeek-OCR".
That's it! The server will respond to your query, and you can continue using it as needed.
OCR-MCP
Complete AI OCR web app and MCP server. A web app for people (drag-and-drop OCR, scanner, batch) and a FastMCP 3.1 MCP server for agentic IDEs (Claude, Cursor, Windsurf), so agents can run OCR, preprocessing, and workflows as tools. Same 10+ engines, WIA scanner (Windows), and pipelines; one repo.
Topics: ocr, mcp, fastmcp, document-processing, scanner, wia, pdf, computer-vision, model-context-protocol, llm
What it does
Web app — React (web_sota/) + FastAPI (backend/app.py): upload or scan, pick an engine, get text/PDF/JSON. Ports 10858 (Vite) and 10859 (API). In-app Help (/help) documents the web UI, the MCP server, and OCR backends.

MCP server — FastMCP 3.1 over stdio: tools for OCR, preprocessing, scanner, and workflows. Sampling and agentic workflows (SEP-1577) are supported. Same Python env and engines as the web backend; the Mistral key for agents typically goes in MISTRAL_API_KEY in the client config (web Settings only affect the FastAPI process).
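For reference, a Claude Desktop-style client entry with MISTRAL_API_KEY set in the server's environment might look like the fragment below. This is an illustrative sketch only: the server name, command, and args are assumptions, not taken from this repo's docs.

```json
{
  "mcpServers": {
    "ocr-mcp": {
      "command": "uv",
      "args": ["run", "ocr-mcp"],
      "env": { "MISTRAL_API_KEY": "sk-..." }
    }
  }
}
```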
Features: 10+ backends (PaddleOCR-VL-1.5, DeepSeek-OCR-2, Mistral OCR, …) · Auto backend selection · Preprocessing (deskew, enhance, crop) · Layout & table extraction · Quality assessment · WIA scanner · Batch & pipelines · Multi-format export
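Auto backend selection can be thought of as a capability-scoring pass over the available engines. A minimal Python sketch of that idea, with hypothetical engine names and scores (this is not the repo's actual selection logic):

```python
# Illustrative auto-selection: score each engine against the job's needs
# and pick the best match. Engine names and scores are hypothetical.
ENGINES = {
    "paddleocr-vl": {"tables": 3, "handwriting": 1, "speed": 2},
    "deepseek-ocr": {"tables": 2, "handwriting": 3, "speed": 1},
    "mistral-ocr":  {"tables": 2, "handwriting": 2, "speed": 3},
}

def pick_engine(needs, engines=ENGINES):
    """Return the engine whose capabilities best match the requested needs."""
    def score(name):
        caps = engines[name]
        return sum(caps.get(n, 0) for n in needs)
    return max(engines, key=score)
```

In a real selector the scores would come from the engine registry and the document's detected properties (language, layout complexity, handwriting), but the ranking shape is the same.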
Docs
- Install, run MCP, Web UI
- Web FastAPI backend: same venv as the MCP server
- Architecture, tools, config, development, packaging
- Engines, capabilities, hardware (see also AI_MODELS.md)
- Per-model pip packages, system deps, env/config
- Sampling, SEP-1577, agentic workflows, prompts
- Source for …
- 🚀 Verified SOTA v12.0 Architecture
Also: JUSTFILE.md (just recipes) · OCR-MCP_MASTER_PLAN.md (roadmap) · tests/README.md (testing)
Quick start
uv sync
just run

Web UI (recommended): from the repo root, run web_sota\start.ps1 (PowerShell). It clears ports 10858/10859, runs uv sync, restores PyYAML if needed (see docs/INSTALL.md), starts the FastAPI backend in one new window and Vite in another, then opens http://localhost:10858 in your browser.

Or: just webapp, if your justfile wraps the same flow.
If the start script fails, use two terminals from the ocr-mcp repo root:
Terminal 1 (backend):
$env:PYTHONPATH = (Get-Location).Path; uv run uvicorn backend.app:app --host 127.0.0.1 --port 10859

Terminal 2 (frontend):
cd web_sota; npm run dev -- --port 10858 --host
Then open http://localhost:10858
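If the page doesn't load, a quick way to tell whether the backend came up is to probe it over HTTP. A small sketch using only the standard library; the URL assumes the API port above and FastAPI's default /docs endpoint, and the helper name is ours:

```python
import urllib.request
import urllib.error

def backend_up(url="http://127.0.0.1:10859/docs", timeout=2.0):
    """Return True if the backend answers with a non-5xx status, else False."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status < 500
    except (urllib.error.URLError, OSError):
        return False
```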
Tests: run uv sync --extra dev, then uv run python -m pytest or python scripts/run_tests.py --suite quick. See tests/README.md.
License
MIT — see LICENSE.