# OwlOCR MCP

[![Python 3.11+](https://img.shields.io/badge/python-3.11+-blue.svg)](https://www.python.org/downloads/) [![macOS](https://img.shields.io/badge/platform-macOS-lightgrey.svg)](https://www.apple.com/macos/) [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

MCP (Model Context Protocol) server for PDF and image OCR on macOS.

Supports two backends:

- **OwlOCR CLI** - Higher accuracy (recommended)
- **Vision Framework** - No external dependencies

## Features

- 📄 **PDF OCR** - Extract text from PDF files page by page with separators
- 🖼️ **Image OCR** - Extract text from PNG, JPEG, and other image formats
- 🌏 **Multi-language** - Korean + English by default (configurable)
- 🔄 **Dual Backend** - Auto-selects OwlOCR if available, falls back to Vision Framework
- ⚡ **Async** - Non-blocking execution for MCP clients

## Benchmark Results

Tested on a 4-page Korean theological document with Hebrew text:

| Metric | Vision Framework | OwlOCR CLI |
|--------|------------------|------------|
| **Time** | 9.87s | 9.30s |
| **Time/Page** | 2.47s | 2.33s |
| **Word Accuracy** | 85.62% | **91.79%** |
| **Character Accuracy** | 94.46% | **95.07%** |

**Winner: OwlOCR CLI** - Faster and more accurate.

## Requirements

- **macOS** (uses Apple Vision Framework / OwlOCR.app)
- **Python 3.11+**
- **[OwlOCR.app](https://owlocr.com)** (optional, for better accuracy)

## Installation

### Using uv (recommended)

```bash
git clone https://github.com/yourusername/owlocr-mcp.git
cd owlocr-mcp
uv sync
```

### Using pip

```bash
git clone https://github.com/yourusername/owlocr-mcp.git
cd owlocr-mcp
pip install -e .
```

## MCP Client Configuration

### Claude Desktop

Add to `~/Library/Application Support/Claude/claude_desktop_config.json`:

```json
{
  "mcpServers": {
    "owlocr": {
      "command": "uv",
      "args": ["run", "--directory", "/path/to/owlocr-mcp", "owlocr-mcp"]
    }
  }
}
```

### Generic MCP Client

```json
{
  "mcpServers": {
    "owlocr": {
      "command": "/path/to/owlocr-mcp/.venv/bin/python",
      "args": ["-m", "owlocr_mcp.server"]
    }
  }
}
```

## Available Tools

### `ocr_pdf_to_text`

Extract text from a PDF file.

**Parameters:**

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `pdf_path` | string | required | Absolute path to the PDF file |
| `pages` | list[int] | null | Page numbers to process (1-based). If null, all pages |
| `dpi` | int | 200 | Resolution for rendering. Higher = better quality but slower |
| `backend` | string | "auto" | `"auto"`, `"owlocr"`, or `"vision"` |
| `languages` | list[string] | null | Language codes (Vision only). Default: `["ko-KR", "en-US"]` |

**Example:**

```
Extract text from /Users/me/document.pdf using OwlOCR
```

**Output:**

```
첫 번째 페이지 내용...

===== Page 2 =====
두 번째 페이지 내용...

--- OCR Complete: 2 page(s) processed using OwlOCR CLI ---
```

### `ocr_image_to_text`

Extract text from an image file.

**Parameters:**

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `image_path` | string | required | Absolute path to the image file |
| `backend` | string | "auto" | `"auto"`, `"owlocr"`, or `"vision"` |
| `languages` | list[string] | null | Language codes (Vision only) |

### `check_ocr_backends`

Check available OCR backends on the system.

**Output:**

```
OCR Backend Status:

✅ Vision Framework: Available (macOS built-in)
✅ OwlOCR CLI: Available (/Applications/OwlOCR.app)

Recommendation: Use backend='owlocr' for best accuracy
```

## Backend Selection

| Backend | Accuracy | Speed | Requirements |
|---------|----------|-------|--------------|
| `owlocr` | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | OwlOCR.app installed |
| `vision` | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | None (macOS built-in) |
| `auto` | Best available | - | Uses OwlOCR if available |

## Running the Benchmark

Compare backends on your own PDF:

```bash
# Both backends
uv run python benchmark.py /path/to/your.pdf

# With accuracy comparison (requires ground truth)
uv run python benchmark.py /path/to/your.pdf --show-text

# Specific backend only
uv run python benchmark.py /path/to/your.pdf --method owlocr
uv run python benchmark.py /path/to/your.pdf --method vision
```

## Project Structure

```
owlocr-mcp/
├── src/owlocr_mcp/
│   ├── __init__.py
│   ├── server.py          # MCP server with tools
│   ├── ocr.py             # Vision Framework backend
│   ├── ocr_owlocr.py      # OwlOCR CLI backend
│   └── pdf.py             # PDF processing utilities
├── benchmark.py           # Performance comparison script
├── pyproject.toml
└── README.md
```

## How It Works

### OwlOCR Backend

1. Render PDF pages to PNG using `pypdfium2`
2. Copy images to OwlOCR sandbox: `~/Library/Containers/JonLuca-DeCaro.OwlOCR/Data/tmp/`
3. Run CLI: `/Applications/OwlOCR.app/Contents/MacOS/OwlOCR --cli --input <file>`
4. Combine results with page separators

### Vision Framework Backend

1. Render PDF pages to PNG using `pypdfium2`
2. Load as `CIImage` via PyObjC
3. Create `VNRecognizeTextRequest` with accurate recognition level
4. Process with `VNImageRequestHandler`
5. Sort results by position and combine

## Troubleshooting

### "OwlOCR.app not found"

Install OwlOCR from [owlocr.com](https://owlocr.com) or use `backend="vision"`.

### File picker dialog appears

This happens when OwlOCR can't access files outside its sandbox. The MCP server handles this by copying files to the sandbox temp directory automatically.

### Poor accuracy on specific languages

For Vision Framework, specify languages explicitly:

```python
ocr_pdf_to_text(pdf_path, languages=["ja-JP", "en-US"])
```

Supported language codes: `ko-KR`, `en-US`, `ja-JP`, `zh-Hans`, `zh-Hant`, etc.

## License

MIT License - see [LICENSE](LICENSE) file.

## Acknowledgments

- [OwlOCR](https://owlocr.com) by JonLuca DeCaro
- [MCP Python SDK](https://github.com/modelcontextprotocol/python-sdk)
- Apple Vision Framework
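
## Appendix: Backend Flow Sketch

The following is a minimal sketch, not the project's actual code, of how the `auto` backend choice, the OwlOCR steps from How It Works, and the page-separator output could fit together. The binary path, sandbox directory, and `--cli --input` flags come from this README. The function names (`choose_backend`, `stage_in_sandbox`, `build_owlocr_command`, `join_pages`) and the exact newline layout of the combined output are assumptions made for illustration.

```python
import shutil
from pathlib import Path

# Paths quoted from this README; everything else below is a hypothetical sketch.
OWLOCR_BIN = Path("/Applications/OwlOCR.app/Contents/MacOS/OwlOCR")
OWLOCR_SANDBOX_TMP = Path.home() / "Library/Containers/JonLuca-DeCaro.OwlOCR/Data/tmp"


def choose_backend(requested: str = "auto", owlocr_bin: Path = OWLOCR_BIN) -> str:
    """Resolve "auto" to "owlocr" if the CLI binary exists, otherwise "vision"."""
    if requested not in {"auto", "owlocr", "vision"}:
        raise ValueError(f"unknown backend: {requested!r}")
    if requested != "auto":
        return requested
    return "owlocr" if owlocr_bin.exists() else "vision"


def stage_in_sandbox(image: Path, sandbox_dir: Path = OWLOCR_SANDBOX_TMP) -> Path:
    """Copy an image into the OwlOCR sandbox temp directory and return the new path."""
    target = sandbox_dir / image.name
    shutil.copy2(image, target)
    return target


def build_owlocr_command(staged_image: Path, owlocr_bin: Path = OWLOCR_BIN) -> list[str]:
    """Build the CLI invocation listed under How It Works (argument list only)."""
    return [str(owlocr_bin), "--cli", "--input", str(staged_image)]


def join_pages(page_texts: list[str], backend_label: str) -> str:
    """Combine per-page text with separators.

    Assumed layout: the first page has no header, later pages are preceded by
    "===== Page N =====", and a completion line closes the output.
    """
    parts: list[str] = []
    for number, text in enumerate(page_texts, start=1):
        if number > 1:
            parts.append(f"===== Page {number} =====")
        parts.append(text.strip())
    parts.append(
        f"--- OCR Complete: {len(page_texts)} page(s) processed using {backend_label} ---"
    )
    return "\n".join(parts)
```

The argument list returned by `build_owlocr_command` is the kind of value that could be handed to `subprocess.run`; how the real server invokes the CLI and reads its result is not described here, so that step is left out of the sketch.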
