Skip to main content
Glama

OCR-MCP: Professional Document Processing Suite

Version Python FastMCP OCR Engines Web Interface License CI/CD Coverage Status

Complete document processing solution with 7 state-of-the-art OCR engines, intelligent preprocessing, document analysis, quality assessment, workflow automation, and professional web interface.

πŸ“‹ Table of Contents

🎯 What is OCR-MCP?

OCR-MCP is a complete document processing suite built on FastMCP, providing enterprise-grade OCR capabilities with intelligent automation, professional web interface, and comprehensive document understanding tools.

πŸš€ Complete Document Processing Suite (Integrated)

OCR-MCP provides a full document processing ecosystem:

πŸ“₯ Input Sources: Direct scanner control, file upload, batch processing πŸ–ΌοΈ Preprocessing: Deskew, enhance, crop, rotate, noise reduction πŸ” Analysis: Layout detection, table extraction, form analysis, metadata πŸ“Š Quality: OCR validation, backend comparison, confidence scoring πŸ”„ Workflows: Custom pipelines, intelligent routing, batch automation πŸ“„ Output: Multiple formats (text, HTML, PDF, JSON, searchable PDFs)

πŸ€– Intelligent Automation

  • Auto-Backend Selection: Automatically chooses best OCR engine per document

  • Quality-Gated Processing: Multiple attempts with quality thresholds

  • Document Classification: Auto-detects document types (invoices, forms, etc.)

  • Workflow Orchestration: Custom processing pipelines with conditional logic

  • Batch Optimization: Concurrent processing with intelligent resource management

Primary OCR Engines

πŸš€ Mistral OCR 3 (December 2025) - State-of-the-Art Document Processing

  • Performance: 74% win rate over Mistral OCR 2 on forms, scanned docs, complex tables, handwriting.

  • Latency: ~0.7s average processing time (OCR-2512 SOTA API).

  • Integration: Dedicated SOTA OCR payload for high-fidelity Markdown extraction.

  • Capabilities: Advanced handwriting recognition, form processing, scanned document handling, complex table reconstruction

  • Strengths: Superior accuracy on enterprise document types, cost-effective at $2/1K pages, HTML table reconstruction

  • Repository: https://mistral.ai/products/ocr

  • API: https://mistral.ai/docs (mistral-ocr-2512 model)

πŸ”₯ DeepSeek-OCR (October 2025) - Current State-of-the-Art

🎯 Florence-2 (June 2024) - Microsoft's Vision Foundation Model

  • Architecture: Unified vision-language model for various vision tasks

  • OCR Capabilities: Excellent text extraction and layout understanding

  • Strengths: Multi-task learning, fine-grained text recognition

  • Repository: https://huggingface.co/microsoft/Florence-2-base

πŸ“Š DOTS.OCR (July 2025) - Document Understanding Specialist

πŸš€ PP-OCRv5 (2025) - Industrial-Grade OCR

🎨 Qwen-Image-Layered (December 2025) - Advanced Image Decomposition

  • Technology: Decomposes images into multiple independent RGBA layers

  • OCR Integration: Isolate text, background, and structural elements for better OCR

  • Capabilities: Layer-independent editing, resizing, repositioning, recoloring

  • Repository: https://huggingface.co/Qwen/Qwen-Image-Layered

  • Paper: https://arxiv.org/abs/2512.15603

  • Use Case: Pre-process complex documents by separating text layers from backgrounds

OCR Capabilities

  • Plain Text OCR: Standard text extraction from images

  • Formatted Text OCR: Preserves layout and formatting structure

  • Fine-Grained OCR: Extract text from specific regions with coordinate precision

  • Multi-Crop OCR: Process documents with complex layouts by dividing into regions

  • HTML Rendering: Generate HTML output with visual layout preservation

  • Document Understanding: Table extraction, formula recognition, layout analysis

Auto-Backend Selection

OCR-MCP automatically selects the best backend based on:

  • Document Type: PDF, image, scanned document, or comic

  • Content Complexity: Plain text vs. structured documents

  • Language Requirements: Multilingual content detection

  • Performance Needs: Speed vs. accuracy trade-offs

Advanced Document Pre-processing

Qwen-Image-Layered Integration revolutionizes OCR through intelligent image decomposition:

  • Layer Separation: Decompose documents into independent RGBA layers (text, background, images, graphics)

  • Selective OCR: Process text layers independently for improved accuracy on complex documents

  • Noise Reduction: Isolate and remove background noise, watermarks, and interfering elements

  • Content Isolation: Separate handwritten notes, stamps, and annotations from main text

  • Layout Preservation: Maintain document structure while enabling targeted OCR processing

  • Multi-modal Enhancement: Combine with traditional OCR for hybrid processing pipelines

Community & Industry Adoption

Current OCR landscape shows rapid evolution:

  • DeepSeek-OCR: Leading downloads indicate community preference

  • Florence-2: Academic and research adoption

  • DOTS.OCR: Document processing industry standard

  • PP-OCRv5: Production deployment in enterprise applications

✨ Complete Feature Suite

🎯 Core OCR Capabilities

  • 7 State-of-the-Art OCR Engines: Mistral OCR 3, DeepSeek-OCR, Florence-2, DOTS.OCR, PP-OCRv5, Qwen-Image-Layered, EasyOCR

  • Intelligent Backend Selection: Auto-chooses optimal engine per document type

  • Multiple Processing Modes: Text, formatted, layout preservation, fine-grained extraction

  • Multi-language Support: 80+ languages across all backends

πŸ–ΌοΈ Advanced Image Preprocessing

  • Deskew: Automatic text straightening with multiple algorithms

  • Enhancement: Contrast, brightness, sharpness, noise reduction

  • Cropping: Auto-detect content boundaries, manual coordinates

  • Rotation: Auto-detect orientation, manual angle correction

  • Quality Pipeline: Complete preprocessing workflow

πŸ” Document Structure Analysis

  • Layout Detection: Headers, paragraphs, columns, sections

  • Table Extraction: Structured data from complex tables

  • Form Analysis: Checkbox, text field, signature detection

  • Reading Order: Logical text flow determination

  • Document Classification: Auto-detect document types

πŸ“Š Quality Assessment & Validation

  • OCR Accuracy Scoring: Character, word, and sequence accuracy

  • Backend Comparison: Performance analysis across engines

  • Confidence Analysis: Detailed confidence metrics and thresholds

  • Ground Truth Validation: Compare against known correct text

  • Quality Recommendations: Automated improvement suggestions

πŸ”„ Intelligent Workflow Automation

  • Custom Pipeline Builder: Drag-and-drop workflow creation

  • Quality Gates: Conditional processing based on results

  • Batch Orchestration: Concurrent processing with progress tracking

  • Error Recovery: Automatic retry with fallback strategies

  • Resource Optimization: Intelligent load balancing

πŸ”„ Professional Format Conversion

  • PDF Processing: Extract images, create searchable PDFs

  • Image Conversion: Format conversion with quality control

  • Document Assembly: Combine images into PDFs

  • Searchable PDFs: OCR text embedded as invisible layers

  • Multi-format Export: Text, HTML, JSON, XML, Word

πŸ“· Complete Scanner Integration

  • WIA Support: Direct Windows scanner control

  • Device Discovery: Auto-detect connected scanners

  • Advanced Settings: DPI, color modes, paper sizes, brightness/contrast

  • Batch Scanning: ADF support with page separation

  • Preview Mode: Positioning and cropping verification

🌐 Professional Web Interface

The OCR-MCP web interface is accessible at:

  • URL: http://localhost:8765

  • Dashboard: Real-time monitoring of all OCR and scanner operations

  • Scanner Control: Direct hardware acquisition with live preview

  • Batch Processing: Parallel document processing with progress tracking

  • Hardware Backend: Robust WIA 2.0 implementation with global singleton management for device stability.

πŸ—οΈ Architecture

AI Models & OCR Engines

OCR-MCP integrates 8 state-of-the-art AI models for comprehensive document processing:

Primary AI Models (7 Advanced Backends)

πŸš€ DeepSeek-OCR - Vision-language model for complex documents 🎨 Florence-2 - Microsoft's unified vision foundation model πŸ“Š DOTS.OCR - Document table and structure specialist 🏭 PP-OCRv5 - Industrial-grade PaddlePaddle OCR πŸ–ΌοΈ Qwen-Image-Layered - Advanced image decomposition 🎯 GOT-OCR 2.0 - General OCR theory implementation

Legacy/Compatibility Models

πŸ“– Tesseract OCR - Classic open-source OCR engine πŸ”€ EasyOCR - Ready-to-use OCR with GPU support

Model Capabilities Matrix

Model

Text OCR

Tables

Forms

Handwriting

Multi-lang

GPU Support

Speed

DeepSeek-OCR

βœ…

βœ…

βœ…

βœ…

βœ…

βœ…

Medium

Florence-2

βœ…

βœ…

βœ…

⚠️

βœ…

βœ…

Fast

DOTS.OCR

βœ…

βœ…

βœ…

⚠️

βœ…

βœ…

Fast

PP-OCRv5

βœ…

⚠️

⚠️

⚠️

βœ…

βœ…

Very Fast

Qwen-Layered

βœ…

βœ…

βœ…

βœ…

βœ…

βœ…

Slow

GOT-OCR 2.0

βœ…

βœ…

βœ…

βœ…

βœ…

βœ…

Medium

EasyOCR

βœ…

⚠️

⚠️

βœ…

βœ…

βœ…

Medium

Tesseract

βœ…

⚠️

⚠️

⚠️

βœ…

❌

Very Fast

πŸ“– Complete AI Models Documentation - Detailed information about all integrated AI models, performance benchmarks, and technical specifications.

Portmanteau Tool Ecosystem (6 Tools)

🎯 Document Processing (Portmanteau Tool)

document_processing(operation="...") - Consolidates OCR, analysis, and quality assessment

  • "process_document": Single document OCR with backend selection

  • "process_batch": Concurrent batch document processing

  • "extract_regions": Fine-grained region-based OCR

  • "analyze_layout": Document structure and layout detection

  • "extract_table_data": Structured table data extraction

  • "detect_form_fields": Form element identification

  • "analyze_reading_order": Logical text flow determination

  • "classify_document": Auto-document type classification

  • "extract_metadata": Dates, names, numbers extraction

  • "assess_quality": Comprehensive OCR quality scoring

  • "compare_backends": Backend performance comparison

  • "validate_accuracy": Ground truth accuracy validation

  • "analyze_image_quality": Pre-OCR quality assessment

πŸ–ΌοΈ Image Management (Portmanteau Tool)

image_management(operation="...") - Consolidates preprocessing and conversion operations

  • "deskew": Straighten skewed/scanned documents

  • "enhance": Improve image quality (contrast, sharpness, noise reduction)

  • "rotate": Rotate images by angle or auto-detect orientation

  • "crop": Remove unwanted borders or focus on content areas

  • "preprocess": Complete preprocessing pipeline for OCR

  • "convert_format": Convert between image formats with quality control

  • "convert_pdf_to_images": Extract images from PDF documents

  • "embed_ocr_text": Create searchable PDFs with embedded OCR text

πŸ“· Scanner Operations (Portmanteau Tool)

scanner_operations(operation="...") - Consolidates all scanner hardware control

  • "list_scanners": Discover and enumerate available scanners

  • "scanner_properties": Get detailed scanner capabilities and settings

  • "configure_scan": Set scan parameters (DPI, color mode, paper size)

  • "scan_document": Perform single document scan

  • "scan_batch": Batch scan multiple documents with ADF support

  • "preview_scan": Low-resolution preview scan for positioning

πŸ”„ Workflow Management (Portmanteau Tool)

workflow_management(operation="...") - Consolidates batch processing and system operations

  • "process_batch_intelligent": Intelligent batch processing with quality control

  • "create_processing_pipeline": Create custom processing workflows

  • "execute_pipeline": Run custom pipelines on documents

  • "monitor_batch_progress": Track batch processing status and metrics

  • "optimize_processing": Optimize batch processing parameters

  • "ocr_health_check": System health and backend status

  • "list_backends": Available OCR backends and capabilities

  • "manage_models": GPU memory and model lifecycle management

❓ Help & Documentation (Portmanteau Tool)

help(level="...", topic="...") - Contextual help and documentation

  • "basic": Quick start guide and essential commands

  • "intermediate": Detailed tool descriptions and workflows

  • "advanced": Technical architecture and implementation details

  • "expert": Development troubleshooting and system internals

πŸ“Š System Status (Portmanteau Tool)

status(level="...", focus="...") - System monitoring and diagnostics

  • "basic": Quick system health overview

  • "intermediate": Detailed backend and resource status

  • "advanced": Comprehensive diagnostics with performance metrics

  • Custom focus areas: "backends", "memory", "disk", "network"

WebApp Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ Professional Web Interface β”‚ β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€ β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ β”‚ β”‚ Single β”‚ β”‚ Batch β”‚ β”‚ Image β”‚ β”‚ Doc β”‚ β”‚ β”‚ β”‚ Upload β”‚ β”‚ Processing β”‚ β”‚ Preproc β”‚ β”‚ Analysis β”‚ β”‚ β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ β”‚ β”‚ Quality β”‚ β”‚ Workflows β”‚ β”‚ Conversion β”‚ β”‚ Scanner β”‚ β”‚ β”‚ β”‚ Assess β”‚ β”‚ & Pipelinesβ”‚ β”‚ & Export β”‚ β”‚ Control β”‚ β”‚ β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€ β”‚ FastMCP Server (20+ Tools) β”‚ β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€ β”‚ OCR Engines β”Œβ”€β”€β”¬β”€β”€β”¬β”€β”€β”¬β”€β”€β”¬β”€β”€β”¬β”€β”€β”¬β”€β”€β” Document Processing β”‚ β”‚ β”‚M β”‚D β”‚F β”‚D β”‚P β”‚Q β”‚E β”‚ Image Analysis β”‚ β”‚ β”‚3 β”‚S β”‚2 β”‚O β”‚P β”‚I β”‚O β”‚ Quality Assessment β”‚ β”‚ β””β”€β”€β”΄β”€β”€β”΄β”€β”€β”΄β”€β”€β”΄β”€β”€β”΄β”€β”€β”΄β”€β”€β”˜ Workflow Automation β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

πŸš€ Quick Start

Prerequisites

  • Python 3.11+

  • GPU recommended (for GOT-OCR2.0 and other ML models)

  • 8GB+ VRAM for optimal performance

Installation

# Clone the repository git clone https://github.com/sandraschi/ocr-mcp.git cd ocr-mcp # Install dependencies with Poetry (recommended) poetry install # For GPU support (optional but recommended) poetry run pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

MCP Configuration

Add to your claude_desktop_config.json:

{ "mcpServers": { "ocr-mcp": { "command": "python", "args": ["-m", "ocr_mcp.server"], "env": { "OCR_CACHE_DIR": "/path/to/model/cache", "OCR_DEVICE": "cuda" } } } }

WebApp Mode

OCR-MCP includes a full-featured web interface for document processing. The webapp can connect to a separately running OCR-MCP server instance.

# Run the web application (automatically starts MCP server) poetry run ocr-mcp-webapp # Or use the script directly python scripts/run_webapp.py

Option 2: Run MCP Server and Webapp Separately

If the automatic MCP server startup doesn't work, run them separately:

Terminal 1 - Start MCP Server:

python -m src.ocr_mcp.server

Terminal 2 - Start Webapp:

python scripts/run_webapp.py

The web interface provides:

  • πŸ“€ Drag & drop file upload - Support for PDF, images, CBZ

  • πŸ”„ Real-time processing - Live status updates and progress

  • πŸ“· Scanner integration - Direct scanner control via web interface

  • πŸ“Š Batch processing - Process multiple documents simultaneously

  • 🎨 OCR backend selection - Choose from 5 different OCR engines

  • πŸ“‹ Results visualization - Text, JSON, and HTML output formats

Access the webapp at: http://localhost:15550

🌐 Professional Web Interface

OCR-MCP features a comprehensive professional web interface designed for enterprise document processing workflows.

🎨 Interface Overview

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ πŸ” OCR-MCP Professional Document Processing Suite β”‚ β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€ β”‚ β”Œβ”€ Input ─┬─ Processing ─┬─ Analysis ─┬─ Quality ─┬─ Output ┐ β”‚ β”‚ β”‚ β”‚ β”‚ β”‚ β”‚ β”‚ β”‚ β”‚ β”‚ Upload β”‚ Preprocess β”‚ Structure β”‚ Assess β”‚ Export β”‚ β”‚ β”‚ β”‚ Batch β”‚ Enhance β”‚ Tables β”‚ Compare β”‚ Convert β”‚ β”‚ β”‚ β”‚ Scanner β”‚ Deskew β”‚ Forms β”‚ Validate β”‚ Search- β”‚ β”‚ β”‚ β”‚ β”‚ Rotate β”‚ Metadata β”‚ Monitor β”‚ able PDFβ”‚ β”‚ β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€ β”‚ Workflow Dashboard | Quality Metrics | Progress Tracking β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

πŸš€ Key Features

  • πŸ“Š Workflow-Based Processing: Step-by-step guidance through complex document processing

  • 🎯 Intelligent Automation: Auto-selection of optimal tools and settings

  • πŸ“ˆ Real-Time Analytics: Live quality metrics, confidence scores, processing times

  • πŸ”„ Batch Orchestration: Concurrent processing with detailed progress monitoring

  • 🎨 Visual Results: Multiple output viewers (text, structured data, analysis)

  • βš™οΈ Advanced Configuration: Fine-grained control over all processing parameters

  • πŸ“± Responsive Design: Works on desktop, tablet, and mobile devices

πŸ“± Interface Sections

πŸ“€ Single Document Processing

4-Step Intelligent Workflow:

  1. Upload: Drag-drop with format validation and preview

  2. Preprocessing: Visual before/after with deskew, enhance, crop tools

  3. OCR Processing: Backend selection with advanced options

  4. Results & Analysis: Multi-format output with quality metrics

Features:

  • Real-time processing status with progress bars

  • Quality score display (A-F grading system)

  • Confidence metrics and accuracy analysis

  • Export to 6+ formats (Text, JSON, HTML, PDF, Word, XML)

πŸ“¦ Intelligent Batch Processing

Smart Multi-Document Processing:

  • Strategy Selection: Auto, Quality-Focused, Speed, Custom Pipeline

  • Quality Gates: Configurable thresholds with automatic retries

  • Progress Dashboard: Real-time status for up to hundreds of documents

  • Concurrent Processing: Optimized resource utilization

  • Results Aggregation: Summary statistics and error reporting

Dashboard Features:

  • Individual document status tracking

  • Success/failure rates and time estimates

  • Quality distribution analysis

  • Bulk export and reporting tools

πŸ–ΌοΈ Image Preprocessing Studio

Professional Image Enhancement:

  • Visual Editor: Before/after comparison with split-view

  • Tool Palette: Deskew, enhance, crop, rotate with live preview

  • Quality Analysis: Automatic assessment of improvement effectiveness

  • Batch Processing: Apply pipelines to multiple images

  • Parameter Control: Fine-grained adjustment of all enhancement settings

πŸ” Document Analysis Lab

Advanced Structure Detection:

  • Layout Analysis: Header/footer detection, column identification

  • Table Extraction: Structured data from complex table layouts

  • Form Detection: Checkbox, text field, signature recognition

  • Reading Order: Logical text flow determination

  • Type Classification: Auto-document type identification

  • Metadata Extraction: Dates, names, numbers, addresses

πŸ“Š Quality Assessment Center

OCR Validation & Optimization:

  • Single Assessment: Comprehensive quality scoring for individual results

  • Backend Comparison: Performance analysis across all OCR engines

  • Accuracy Validation: Ground truth comparison with detailed metrics

  • Image Quality Check: Pre-OCR quality analysis and recommendations

  • Confidence Analysis: Detailed confidence scoring and error patterns

πŸ”„ Custom Pipeline Builder

Workflow Orchestration:

  • Visual Designer: Drag-and-drop pipeline creation

  • Step Library: All 20+ tools as reusable components

  • Conditional Logic: Quality gates and decision branches

  • Template System: Pre-built pipelines for common scenarios

  • Execution Monitoring: Real-time pipeline progress and debugging

πŸ“· Scanner Control Center

Professional Scanning:

  • Device Discovery: Auto-detection of WIA-compatible scanners

  • Advanced Settings: DPI, color modes, paper sizes, brightness/contrast

  • Preview Mode: Positioning verification before final scan

  • Batch Scanning: ADF support with automatic page separation

  • Integration: Seamless workflow connection to OCR processing

πŸ”§ Technical Architecture

Frontend Stack

  • Vanilla JavaScript: No heavy frameworks, fast loading

  • Modern CSS: Grid, Flexbox, CSS Variables, Animations

  • Responsive Design: Mobile-first approach

  • Progressive Enhancement: Works without JavaScript

  • Accessibility: WCAG 2.1 AA compliance

Backend Integration

  • FastAPI Server: Async processing with automatic MCP server management

  • RESTful API: Clean endpoints for all functionality

  • Real-time Updates: WebSocket-based progress monitoring

  • File Security: Secure temporary file handling

  • Error Recovery: Comprehensive error handling and user feedback

Performance Optimizations

  • Lazy Loading: Components load on demand

  • Background Processing: Non-blocking operations

  • Smart Caching: Results caching to avoid redundant processing

  • Resource Management: Intelligent memory and CPU utilization

  • Progressive Rendering: Fast initial load with incremental enhancement

🎯 User Experience Highlights

Smart Defaults

  • Intelligent backend selection based on document type

  • Automatic preprocessing pipeline recommendations

  • Quality threshold suggestions per document type

Guided Workflows

  • Step-by-step processing guidance

  • Contextual help and tooltips

  • Progressive disclosure of advanced options

Quality Assurance

  • Real-time quality metrics during processing

  • Automatic suggestions for improvement

  • Validation against quality thresholds

Batch Intelligence

  • Optimal concurrent processing limits

  • Automatic retry on failures

  • Quality-based prioritization

Export Flexibility

  • Multiple format support with one-click conversion

  • Bulk export capabilities

  • Custom export profiles

πŸ“Š Monitoring & Analytics

System Health

  • Real-time backend availability status

  • Resource utilization monitoring

  • Performance metrics dashboard

Processing Analytics

  • Success/failure rate tracking

  • Average processing times by backend

  • Quality score distributions

Batch Monitoring

  • Individual document status

  • Overall progress visualization

  • Error pattern analysis

πŸ”’ Security & Privacy

  • File Security: Secure temporary file handling with automatic cleanup

  • No External Calls: All processing happens locally

  • Data Privacy: No document content sent to external services

  • Local Processing: Complete offline capability

  • Audit Trail: Processing history and error logging

πŸ’‘ Usage Examples

Basic OCR Processing

# Auto-select best available backend result = await document_processing( operation="process_document", source_path="/path/to/document.png" ) print(result["text"]) # Extracted text

Formatted OCR with HTML Output

# DeepSeek-OCR formatted text preservation result = await document_processing( operation="process_document", source_path="/path/to/scanned_page.png", backend="deepseek-ocr", ocr_mode="format", output_format="html" ) # Returns: HTML with preserved layout and formatting

Fine-grained Region Extraction

# Extract text from specific coordinates result = await document_processing( operation="extract_regions", source_path="/path/to/document.png", region=[100, 200, 400, 300] # [x1,y1,x2,y2] ) # Returns: Structured text extraction by region

Batch Processing

# Process multiple documents results = await workflow_management( operation="process_batch_intelligent", document_paths=[ "/path/to/doc1.png", "/path/to/doc2.png", "/path/to/doc3.png" ], workflow_type="auto", quality_threshold=0.8 ) # Returns: Intelligent batch processing with quality control

🎨 Advanced Features

Document Layout Analysis

# Analyze document structure layout = await document_processing( operation="analyze_layout", source_path="/path/to/complex_document.png", analysis_type="comprehensive", detect_tables=True, detect_forms=True ) # Returns: Detected tables, columns, headers, text blocks

Multi-Backend Comparison

# Compare OCR accuracy across backends comparison = await document_processing( operation="compare_backends", source_path="/path/to/test_image.png", backends=["deepseek-ocr", "florence-2", "pp-ocrv5"] ) # Returns: Accuracy scores, processing times, confidence metrics

Image Preprocessing

# Enhance image quality for better OCR enhanced = await image_management( operation="preprocess", image_path="/path/to/skewed_document.png", operations=["deskew", "enhance", "crop"] ) # Returns: Preprocessed image optimized for OCR

πŸ”§ Configuration Options

Environment Variables

  • OCR_CACHE_DIR: Model cache directory (default: ~/.cache/ocr-mcp)

  • OCR_DEVICE: Computing device (cuda, cpu, auto)

  • OCR_MAX_MEMORY: Maximum GPU memory usage in GB

  • OCR_DEFAULT_BACKEND: Default OCR backend (got-ocr, tesseract, etc.)

  • OCR_BATCH_SIZE: Default batch processing size

Backend-Specific Settings

# config/ocr_config.yaml backends: got_ocr: model_size: "base" # or "large" cache_dir: "/models/got-ocr" device: "cuda:0" tesseract: language: "eng+fra+deu" config: "--psm 6" easyocr: languages: ["en", "fr", "de"] gpu: true

πŸ“Š Performance Benchmarks

Single Image Processing (GTX 3080)

Backend

Plain OCR

Formatted OCR

Fine-grained

GOT-OCR2.0

2.3s

3.1s

4.2s

Tesseract

0.8s

N/A

1.2s

EasyOCR

1.5s

N/A

2.1s

PaddleOCR

1.8s

2.9s

3.5s

Accuracy Comparison (Clean Documents)

Backend

Print Text

Handwriting

Mixed Content

GOT-OCR2.0

97.2%

89.1%

94.8%

Tesseract

92.1%

45.3%

78.9%

EasyOCR

94.7%

78.2%

88.5%

PaddleOCR

95.8%

82.1%

91.2%

πŸ› οΈ Development Status

  • βœ… Planning: Complete master plan and architecture

  • βœ… Phase 1: Core infrastructure (Completed)

  • βœ… Phase 2: Multi-backend OCR support (Completed)

  • βœ… Phase 3: Professional web interface (Completed)

  • βœ… Phase 4: Advanced document processing (Completed)

  • βœ… Phase 5: Scanner integration (Completed)

  • 🟑 Phase 6: Production deployment and optimization (Alpha Release)

  • πŸ”„ Phase 7: Beta testing and community feedback (Next)

  • πŸ”„ Phase 8: Production release preparation (Future)

βœ… Completed Features

  • FastMCP 2.14.3 Integration: State-of-the-art MCP server with conversational features

  • 8 AI Models: DeepSeek-OCR, Florence-2, DOTS.OCR, PP-OCRv5, Qwen-Image-Layered, GOT-OCR 2.0, EasyOCR, Tesseract

  • Professional React Webapp: Complete TypeScript frontend with modern UI/UX

  • Intelligent Backend Selection: Automatic model routing based on document analysis

  • Document Processing Pipeline: Multi-stage OCR with quality assessment

  • Advanced Image Preprocessing: Real-time enhancement with visual feedback

  • Scanner Integration: Direct WIA hardware control for Windows scanners

  • Batch Processing: Concurrent document processing with progress monitoring

  • Quality Assessment: OCR validation with accuracy metrics and recommendations

  • Format Conversion: Export to PDF, Word, JSON, HTML, and searchable PDFs

  • Comprehensive Error Handling: Structured errors with recovery suggestions

  • Cross-Platform Support: Windows and Linux with appropriate abstractions

  • Complete Documentation: AI models guide, technical specifications, testing framework

See OCR-MCP_MASTER_PLAN.md for detailed roadmap.

πŸ“š Documentation

πŸ“– Complete Documentation Suite

  • AI_MODELS.md - Comprehensive documentation of all 8 AI models used in OCR-MCP

    • Detailed model specifications and capabilities

    • Performance benchmarks and accuracy comparisons

    • Technical implementation details and integration guides

    • Model selection algorithms and optimization strategies

  • OCR-MCP_MASTER_PLAN.md - Technical master plan and architecture

    • System design and component architecture

    • Implementation roadmap and milestones

    • Technical specifications and requirements

    • Future development plans

  • tests/README.md - Testing framework documentation

    • Test organization and execution

    • Performance benchmarking procedures

    • Security testing methodologies

    • CI/CD integration guides

πŸ› οΈ Development Resources

πŸ“‹ Quick Reference

Resource

Purpose

Location

AI Models Guide

Model specifications & benchmarks

AI_MODELS.md

Technical Architecture

System design & roadmap

OCR-MCP_MASTER_PLAN.md

Testing Framework

Test execution & validation

tests/README.md

API Documentation

Interactive API explorer

http://localhost:15550/docs

Health Monitoring

System status & diagnostics

http://localhost:15550/api/health

🀝 Integration with Existing MCP Servers

CalibreMCP Integration

OCR-MCP enhances CalibreMCP's OCR capabilities:

# CalibreMCP can now use OCR-MCP for advanced processing result = await calibre_ocr( source="/path/to/scanned_book.pdf", provider="ocr-mcp", # New option! mode="format", render_html=True )

Document Processing Workflows

  • Research Papers: Extract structured text from academic PDFs

  • Receipt Processing: Automated data extraction from scanned receipts

  • Book Digitization: High-quality OCR for scanned books

  • Accessibility: Convert images to readable text for screen readers

πŸ“ˆ Roadmap

βœ… Completed Milestones

  • FastMCP 2.13+ Core Infrastructure

  • GOT-OCR2.0 Multi-mode Integration

  • Robust WIA 2.0 Hardware Integration (Canon LiDE 400 verified)

  • Professional React/Next.js Web Interface

  • Mistral OCR 3 (OCR-2512) SOTA Backend Implementation

  • Multi-format Pipeline (PDF, CBZ, Scanned Docs)

Immediate (Next 2-4 weeks)

  • Performance Benchmarking Suite

  • Advanced Image Preprocessing (Deskew/Enhance)

  • TWAIN Backend Support

  • Multi-language Model Fine-tuning

Medium-term (2-3 months)

  • Advanced Layout Intelligence (Panel analysis for Manga)

  • Batch processing concurrency optimizations

  • Cloud deployment (Docker/Kubernetes)

  • Mobile scanning workflow integration

🀝 Contributing

Development Setup

  1. Clone the repository

    git clone https://github.com/your-username/ocr-mcp.git cd ocr-mcp
  2. Install Poetry (if not already installed)

    pip install poetry
  3. Install dependencies

    poetry install
  4. Set up development environment (recommended)

    poetry run ocr-mcp-setup-dev # This installs pre-commit hooks and sets up the development environment
  5. Run tests

    poetry run pytest
  6. Start developing!

    • Pre-commit hooks will automatically format and lint your code

    • Run poetry run pre-commit run --all-files to check everything

    • Use poetry run python scripts/run_webapp.py to start the webapp

Pre-commit Hooks

This project uses pre-commit hooks to maintain code quality. The following tools are automatically run on each commit:

  • Ruff: Fast Python linter, formatter, and import sorter

  • MyPy: Type checker

  • Bandit: Security linter

  • Detect-secrets: Secret detection

  • Markdownlint: Markdown linter

To manually run all checks:

poetry run pre-commit run --all-files

OCR-MCP welcomes contributions! Areas of particular interest:

  • New OCR Backends: Integration of additional OCR engines

  • Performance Optimization: GPU memory management, batch processing

  • Specialized Models: Domain-specific OCR improvements

  • Documentation: Usage examples, integration guides

  • Testing: Comprehensive test coverage and benchmarks

πŸ“„ License

MIT License - see LICENSE for details.

πŸ™ Acknowledgments

  • GOT-OCR2.0 Team (UCAS): Revolutionary OCR model that inspired this project

  • FastMCP Community: Excellent framework for MCP server development

  • Open Source OCR Community: Tesseract, EasyOCR, PaddleOCR, and others


OCR-MCP: Democratizing state-of-the-art document understanding for the MCP ecosystem! 🌟

See OCR-MCP_MASTER_PLAN.md for technical details and implementation roadmap.

-
security - not tested
A
license - permissive license
-
quality - not tested

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/sandraschi/ocr-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server