OCR-MCP

CHANGELOG.md•7.66 KiB

# Changelog

All notable changes to OCR-MCP will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/), and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [0.2.0-alpha.0] - 2026-01-19

### 🚀 **Major Improvements**

#### **Development Environment Modernization**

- **Streamlined Tooling**: Removed the redundant Black and isort dependencies; Ruff now handles all linting, formatting, and import sorting
- **Enhanced Ruff Configuration**: Configured Ruff for comprehensive code quality checks, including import sorting and first-party package recognition
- **Comprehensive Pre-commit Hooks**: Added 20+ quality checks, including security scanning, complexity analysis, and secret detection
- **Automated Dev Setup**: Added an `ocr-mcp-setup-dev` script for one-command development environment setup
- **CI/CD Pipeline Enhancement**: Improved the workflow with better error handling, security audits, and quality reports

#### **Code Quality & Standards**

- **Advanced Linting**: Integrated mypy, Bandit, pip-audit, Radon, and detect-secrets
- **Security Hardening**: Added comprehensive security scanning and vulnerability detection
- **Documentation Generation**: Added pdoc for automatic API documentation generation
- **Type Safety**: Enhanced type checking with proper dependency management

#### **Infrastructure Improvements**

- **Port Standardization**: Consolidated all services on port 15550 for a consistent development experience
- **Frontend-Backend Integration**: Fixed React app serving issues with proper static file handling
- **Testing Framework**: Enhanced the test suite with advanced fixtures, mock servers, and comprehensive coverage
- **Build System**: Improved the Poetry configuration with proper dependency grouping

#### **Project Maturity**

- **Professional Documentation**: Added a dedicated `AI_MODELS.md` with detailed backend specifications
- **Comprehensive README**: Added development setup guides, pre-commit documentation, and troubleshooting
- **Changelog Management**: Established proper version tracking and release notes
- **Badge System**: Added version, CI/CD, coverage, and status badges

### 🛠️ **Technical Enhancements**

#### **Backend Improvements**

- **Ruff Integration**: Single tool for linting, formatting, and import sorting
- **Error Handling**: Improved exception chaining and fixed bare `except` clauses
- **Static File Serving**: Proper React SPA routing with catch-all handlers
- **CORS Configuration**: Updated allowed origins for localhost:15550

#### **Frontend Updates**

- **API Configuration**: Standardized the backend URL on localhost:15550
- **Build Process**: Fixed static file generation and distribution
- **Settings Management**: Updated default backend URLs

#### **Testing Infrastructure**

- **Advanced Fixtures**: Enhanced the pytest configuration with proper path handling
- **Mock Servers**: Improved testing utilities with configurable ports
- **Performance Testing**: Added a benchmark framework and load-testing capabilities
- **Cross-Platform**: Windows-compatible test execution

#### **CI/CD Pipeline**

- **Multi-OS Testing**: Ubuntu and Windows CI with Python 3.9-3.11
- **Quality Gates**: Security audits, complexity analysis, and coverage requirements
- **Documentation Deployment**: Automated pdoc generation and GitHub Pages deployment
- **Release Automation**: GitHub releases with comprehensive test validation

### 📊 **Quality Metrics**

- **Code Coverage**: Maintained the 90%+ test coverage requirement
- **Security**: Zero high-severity vulnerabilities (pip-audit, Bandit)
- **Complexity**: Cyclomatic complexity analysis with Radon
- **Dependencies**: Vulnerability scanning and license compliance

### 🔄 **Breaking Changes**

- **Tooling Migration**: Black and isort removed in favor of Ruff
- **Port Changes**: All services now use port 15550
- **Import Structure**: Ruff import sorting may reorganize imports

### 🧪 **Testing & Validation**

- **Unit Tests**: Comprehensive backend and frontend test coverage
- **Integration Tests**: End-to-end workflow validation
- **WebApp Tests**: Playwright-based UI testing with server readiness checks
- **Performance Benchmarks**: Automated performance regression detection

## [0.1.2] - 2026-01-01

### Added

- **Singleton Backend Manager**: Refactored `BackendManager` in `app.py` into a global singleton to keep the COM context stable.
- **Robust WIA 2.0 Acquisition**: Implemented explicitly scoped `CoInitialize` calls and reconnection logic in `wia_scanner.py` for hardware stability.
- **Hardware Stability**: Resolved `WIA_ERROR_BUSY` (0x8021006B) and acquisition failures on Canon LiDE 400 scanners.
- **Professional Web Interface**: Finalized integration of the modern React-based UI with the stable backend.

### Fixed

- Indentation errors and logic flow in the `/api/scan` endpoint of `webapp/backend/app.py`.
- Redundant backend re-initialization that caused resource churn and COM instability.
- Port conflicts resolved and documented (standardized on port 8765).
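
The singleton `BackendManager` described in 0.1.2 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's code: the `get_backend_manager()` accessor, the `backends` attribute, and the double-checked lock are hypothetical, and the scoped `CoInitialize` handling in `wia_scanner.py` is not shown.

```python
import threading
from typing import Dict, Optional


class BackendManager:
    """Hypothetical stand-in for the BackendManager in webapp/backend/app.py."""

    def __init__(self) -> None:
        # Expensive setup (model loading, scanner/COM handles) would happen here, once.
        self.backends: Dict[str, object] = {}


_manager: Optional[BackendManager] = None
_manager_lock = threading.Lock()


def get_backend_manager() -> BackendManager:
    """Return the process-wide BackendManager, creating it on first use (hypothetical accessor)."""
    global _manager
    if _manager is None:
        with _manager_lock:
            # Re-check inside the lock so two threads cannot both create a manager.
            if _manager is None:
                _manager = BackendManager()
    return _manager
```

A request handler such as `/api/scan` would call `get_backend_manager()` instead of constructing a new manager per request, which is the kind of change that avoids the redundant re-initialization listed under Fixed.
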

## [0.1.1] - 2025-12-23

### Added

- Complete implementation of all 6 advanced OCR backends:
  - Mistral OCR 3 (state-of-the-art API-based OCR, 74% win rate over OCR2)
  - DeepSeek-OCR (4.7M+ downloads)
  - Florence-2 (Microsoft vision foundation model)
  - DOTS.OCR (document structure specialist)
  - PP-OCRv5 (industrial PaddlePaddle OCR)
  - Qwen-Image-Layered (advanced image decomposition)
- Full scanner integration with WIA (Windows Image Acquisition)
- Comprehensive document processing for PDF, CBZ/CBR, and images
- Modern web application with a FastAPI backend and responsive frontend
- 7 fully functional MCP tools with portmanteau design
- Advanced comic/manga processing with scaffold separation
- Batch processing with concurrent operations
- Complete project documentation and usage guides

### Changed

- Updated README with the current backend matrix and tool ecosystem
- Enhanced documentation with detailed backend descriptions
- Improved error handling and user feedback throughout

### Fixed

- Unicode encoding issues in Windows environments
- Server startup problems in stdio mode
- Logging configuration conflicts
- Backend interface inconsistencies
- Missing dependencies and import errors
- PP-OCRv5 backend availability check (removed deprecated PaddleOCR parameters)
- PP-OCRv5 backend now fully functional with automatic model downloading
- Webapp startup issues with MCP client initialization
- MCP client JSON parsing errors from server log messages
- Incorrect webapp port documentation (8000 → 7460)
- Blocking webapp startup during MCP client initialization

### Tested

- PP-OCRv5 backend successfully tested and verified working
- All 5 OCR models automatically downloaded and initialized:
  - PP-LCNet_x1_0_doc_ori (document orientation detection)
  - UVDoc (document image unwarping)
  - PP-LCNet_x1_0_textline_ori (text line orientation)
  - PP-OCRv5_server_det (text detection)
  - en_PP-OCRv5_mobile_rec (text recognition)
- All 9 OCR backends successfully initialized with a graceful fallback system
- 6 out of 9 backends fully functional and available for use
- Backend manager properly handles failed backends with mock implementations

### Added

- `MockOCRBackend` class for graceful degradation of failed backends
- Comprehensive backend fallback system that prevents crashes
- All 9 OCR backends now available in the system:
  - DeepSeek-OCR: Working (4.7M+ downloads)
  - Florence-2: Working (Microsoft vision model)
  - DOTS.OCR: Working (document structure)
  - PP-OCRv5: Working (industrial PaddlePaddle)
  - GOT-OCR2.0: Working (legacy backend)
  - Tesseract: Working (classic OCR)
  - Mistral OCR 3: Ready (API-based, requires an API key)
  - Qwen-Image-Layered: Available (model not found)
  - EasyOCR: Available (Unicode issues)
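
The fallback behaviour described in 0.1.1, where a backend that fails to initialize is replaced by a `MockOCRBackend` instead of crashing the server, might look something like the sketch below. Only the `MockOCRBackend` name comes from this changelog; the `OCRBackend` base class, its `process()` method, the `load_backends()` factory loop, and the demo `EchoBackend` and `missing_model_backend` are hypothetical.

```python
from typing import Callable, Dict


class OCRBackend:
    """Hypothetical minimal interface shared by all OCR backends."""

    name = "base"

    def process(self, image: bytes) -> str:
        raise NotImplementedError


class MockOCRBackend(OCRBackend):
    """Stand-in registered when a real backend fails to initialize."""

    def __init__(self, name: str, reason: str) -> None:
        self.name = name
        self.reason = reason

    def process(self, image: bytes) -> str:
        # Fail per request with a clear message instead of taking the server down.
        raise RuntimeError(f"OCR backend {self.name!r} is unavailable: {self.reason}")


def load_backends(factories: Dict[str, Callable[[], OCRBackend]]) -> Dict[str, OCRBackend]:
    """Instantiate every backend, substituting a MockOCRBackend for any that fail."""
    backends: Dict[str, OCRBackend] = {}
    for name, factory in factories.items():
        try:
            backends[name] = factory()
        except Exception as exc:  # e.g. missing model files, import errors, no API key
            backends[name] = MockOCRBackend(name, str(exc))
    return backends


# Usage with two hypothetical backends: one that works and one whose model is missing.
class EchoBackend(OCRBackend):
    name = "echo"

    def process(self, image: bytes) -> str:
        return image.decode("utf-8")


def missing_model_backend() -> OCRBackend:
    raise FileNotFoundError("model not found")


backends = load_backends({"echo": EchoBackend, "qwen": missing_model_backend})
available = [name for name, b in backends.items() if not isinstance(b, MockOCRBackend)]
```

Keeping a mock in the registry, rather than dropping the failed backend, lets a status listing like the Working/Available one above still report every backend along with the reason it is unavailable.
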
