Click on "Install Server".
Wait a few minutes for the server to deploy. Once ready, it will show a "Started" state.
In the chat, type @ followed by the MCP server name and your instructions, e.g., "@OwlOCR MCP extract the text from /Users/username/Desktop/invoice.pdf"
That's it! The server will respond to your query, and you can continue using it as needed.
Here is a step-by-step guide with screenshots.
OwlOCR MCP
MCP (Model Context Protocol) server for PDF and image OCR on macOS. Supports two backends:
OwlOCR CLI - Higher accuracy (recommended)
Vision Framework - No external dependencies
Features
PDF OCR - Extract text from PDF files page by page with separators
Image OCR - Extract text from PNG, JPEG, and other image formats
Multi-language - Korean + English by default (configurable)
Dual Backend - Auto-selects OwlOCR if available, falls back to Vision Framework
Async - Non-blocking execution for MCP clients
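The README does not say how the non-blocking execution is implemented. One common pattern, shown here purely as a sketch, is to push the synchronous OCR call onto a worker thread with asyncio.to_thread so the event loop can keep serving other requests. The function run_ocr_blocking is a hypothetical stand-in, not part of the project.

```python
import asyncio
import time


def run_ocr_blocking(path: str) -> str:
    """Hypothetical stand-in for a slow, synchronous OCR call."""
    time.sleep(0.1)  # simulate OCR work
    return f"text from {path}"


async def ocr_async(path: str) -> str:
    # Run the blocking call in a worker thread so the event loop stays free.
    return await asyncio.to_thread(run_ocr_blocking, path)


async def main() -> list[str]:
    # The two requests overlap instead of running back to back.
    return await asyncio.gather(ocr_async("a.png"), ocr_async("b.png"))


if __name__ == "__main__":
    print(asyncio.run(main()))
```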
Benchmark Results
Tested on a 4-page Korean theological document with Hebrew text:
| Metric | Vision Framework | OwlOCR CLI |
|---|---|---|
| Time | 9.87s | 9.30s |
| Time/Page | 2.47s | 2.33s |
| Word Accuracy | 85.62% | 91.79% |
| Character Accuracy | 94.46% | 95.07% |
Winner: OwlOCR CLI - Faster and more accurate.
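How the two accuracy figures were computed is not stated. As an illustration only, one simple way to score OCR output against a ground-truth transcript is to count the characters (or words) that match in order, here using difflib from the standard library. The function names are hypothetical, and the project's benchmark may use a different metric.

```python
from difflib import SequenceMatcher


def _matched_fraction(reference, hypothesis) -> float:
    """Percentage of reference items that also appear, in order, in hypothesis."""
    if not reference:
        return 100.0 if not hypothesis else 0.0
    matcher = SequenceMatcher(None, reference, hypothesis, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return 100.0 * matched / len(reference)


def char_accuracy(reference: str, hypothesis: str) -> float:
    # Compare the two strings character by character.
    return _matched_fraction(reference, hypothesis)


def word_accuracy(reference: str, hypothesis: str) -> float:
    # Compare whitespace-separated tokens instead of characters.
    return _matched_fraction(reference.split(), hypothesis.split())


if __name__ == "__main__":
    truth = "the quick brown fox"
    ocr = "the quick brovvn fox"
    print(f"word: {word_accuracy(truth, ocr):.2f}%")
    print(f"char: {char_accuracy(truth, ocr):.2f}%")
```

Word accuracy is usually lower than character accuracy, as in the table above, because a single wrong character invalidates the whole word.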
Requirements
macOS (uses Apple Vision Framework / OwlOCR.app)
Python 3.11+
OwlOCR.app (optional, for better accuracy)
Installation
Using uv (recommended)
Using pip
MCP Client Configuration
Claude Desktop
Add to ~/Library/Application Support/Claude/claude_desktop_config.json:
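The project's own configuration snippet is not reproduced here. As a rough sketch, Claude Desktop reads server entries from a top-level "mcpServers" object, each with a "command" and "args". The entry name "owlocr" and the launch command below are placeholders, not the project's actual values; substitute whatever command starts the server on your machine.

```python
import json
from pathlib import Path

CONFIG_PATH = Path.home() / "Library/Application Support/Claude/claude_desktop_config.json"


def add_server(config: dict, name: str, command: str, args: list[str]) -> dict:
    """Return a copy of config with one entry added under "mcpServers"."""
    servers = dict(config.get("mcpServers", {}))
    servers[name] = {"command": command, "args": args}
    return {**config, "mcpServers": servers}


if __name__ == "__main__":
    existing = json.loads(CONFIG_PATH.read_text()) if CONFIG_PATH.exists() else {}
    # "owlocr", "uv" and the args are placeholders (assumptions), not the real command.
    updated = add_server(existing, "owlocr", "uv", ["run", "owlocr-mcp"])
    # Print for review rather than overwriting the config file directly.
    print(json.dumps(updated, indent=2))
```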
Generic MCP Client
Available Tools
ocr_pdf_to_text
Extract text from a PDF file.
Parameters:
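| Parameter | Type | Default | Description |
|---|---|---|---|
| | string | required | Absolute path to the PDF file |
| | list[int] | null | Page numbers to process (1-based). If null, all pages are processed |
| | int | 200 | Rendering resolution. Higher values give better quality but slower processing |
| backend | string | "auto" | OCR backend to use. "auto" selects OwlOCR if available and falls back to Vision Framework |
| | list[string] | null | Language codes (Vision only). Default: Korean + English |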
Example:
Output:
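As a sketch of what a call to this tool could look like on the wire, MCP clients invoke tools with a JSON-RPC 2.0 "tools/call" request. Only the tool name ocr_pdf_to_text and the backend argument come from this README; the other argument names (pdf_path, pages, dpi) are hypothetical placeholders for the parameters in the table above.

```python
import json


def build_tool_call(request_id: int, tool: str, arguments: dict) -> dict:
    """Build an MCP "tools/call" request (JSON-RPC 2.0)."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }


request = build_tool_call(
    1,
    "ocr_pdf_to_text",
    {
        # Hypothetical argument names, except "backend".
        "pdf_path": "/Users/username/Desktop/invoice.pdf",
        "pages": [1, 2],   # 1-based page numbers
        "dpi": 200,        # the documented default resolution
        "backend": "auto",
    },
)

if __name__ == "__main__":
    print(json.dumps(request, indent=2))
```

In practice an MCP client library builds this request for you; the sketch only shows where the tool name and arguments end up.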
ocr_image_to_text
Extract text from an image file.
Parameters:
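| Parameter | Type | Default | Description |
|---|---|---|---|
| | string | required | Absolute path to the image file |
| backend | string | "auto" | OCR backend to use. "auto" selects OwlOCR if available and falls back to Vision Framework |
| | list[string] | null | Language codes (Vision only) |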
check_ocr_backends
Check available OCR backends on the system.
Output:
Backend Selection
| Backend | Accuracy | Speed | Requirements |
|---|---|---|---|
| OwlOCR CLI | ★★★★★ | ★★★★ | OwlOCR.app installed |
| Vision Framework | ★★★★ | ★★★★ | None (macOS built-in) |
| Auto | Best available | - | Uses OwlOCR if available |
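The selection rule in the table can be sketched as follows. This is an illustration, not the project's code: it assumes availability is decided by checking for the OwlOCR binary at its standard location, and the value "owlocr" for explicitly requesting the OwlOCR backend is an assumption ("auto" and "vision" appear elsewhere in this README).

```python
import sys
from pathlib import Path

OWLOCR_BINARY = Path("/Applications/OwlOCR.app/Contents/MacOS/OwlOCR")


def available_backends(owlocr_binary: Path = OWLOCR_BINARY) -> dict[str, bool]:
    """Report which backends look usable on this machine."""
    return {
        "owlocr": owlocr_binary.is_file(),
        "vision": sys.platform == "darwin",  # Vision Framework ships with macOS
    }


def resolve_backend(requested: str, owlocr_available: bool) -> str:
    """Map a requested backend ("auto", "owlocr", "vision") to a concrete one."""
    if requested == "auto":
        # Prefer OwlOCR for accuracy, fall back to the built-in framework.
        return "owlocr" if owlocr_available else "vision"
    if requested not in ("owlocr", "vision"):
        raise ValueError(f"unknown backend: {requested!r}")
    if requested == "owlocr" and not owlocr_available:
        raise FileNotFoundError("OwlOCR.app not found")
    return requested


if __name__ == "__main__":
    status = available_backends()
    print(status, "->", resolve_backend("auto", status["owlocr"]))
```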
Running the Benchmark
Compare backends on your own PDF:
Project Structure
How It Works
OwlOCR Backend
Render PDF pages to PNG using pypdfium2
Copy images to OwlOCR sandbox: ~/Library/Containers/JonLuca-DeCaro.OwlOCR/Data/tmp/
Run CLI: /Applications/OwlOCR.app/Contents/MacOS/OwlOCR --cli --input <file>
Combine results with page separators
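The rendering, CLI, and combining steps above might look roughly like this. It is a sketch under several assumptions: that the CLI prints recognized text to stdout, that Pillow is installed for pypdfium2's to_pil(), and that pages are joined with a "--- Page N ---" separator (the actual separator format is not documented here). Copying into the sandbox is omitted.

```python
import subprocess
from pathlib import Path

OWLOCR_BINARY = "/Applications/OwlOCR.app/Contents/MacOS/OwlOCR"


def render_pages(pdf_path: str, out_dir: Path, dpi: int = 200) -> list[Path]:
    """Render each PDF page to a PNG file with pypdfium2."""
    import pypdfium2 as pdfium  # third-party, imported lazily

    pdf = pdfium.PdfDocument(pdf_path)
    paths = []
    for index in range(len(pdf)):
        # PDF user space is 72 units per inch, so scale = dpi / 72.
        image = pdf[index].render(scale=dpi / 72).to_pil()
        path = out_dir / f"page_{index + 1}.png"
        image.save(path)
        paths.append(path)
    return paths


def owlocr_command(image_path: Path) -> list[str]:
    """Command line from the README: OwlOCR --cli --input <file>."""
    return [OWLOCR_BINARY, "--cli", "--input", str(image_path)]


def run_owlocr(image_path: Path) -> str:
    # Assumes the recognized text is written to stdout.
    result = subprocess.run(
        owlocr_command(image_path), capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def combine_pages(texts: list[str]) -> str:
    # The separator format is a placeholder, not the project's actual one.
    return "\n\n".join(
        f"--- Page {number} ---\n{text}" for number, text in enumerate(texts, start=1)
    )
```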
Vision Framework Backend
Render PDF pages to PNG using pypdfium2
Load as CIImage via PyObjC
Create VNRecognizeTextRequest with accurate recognition level
Process with VNImageRequestHandler
Sort results by position and combine
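A minimal sketch of the Vision steps above, assuming the pyobjc-framework-Vision and pyobjc-framework-Quartz packages are installed. The helper names are hypothetical, and the ordering rule (top to bottom, then left to right, by each observation's bounding-box origin) is a simplification of whatever the project actually does.

```python
def sort_reading_order(items: list[tuple[str, float, float]]) -> list[str]:
    """Sort (text, x, y) items top-to-bottom, then left-to-right.

    Vision reports normalized coordinates with the origin at the bottom-left,
    so a larger y means higher on the page.
    """
    ordered = sorted(items, key=lambda item: (-item[2], item[1]))
    return [text for text, _x, _y in ordered]


def recognize_text(image_path: str, languages: list[str]) -> str:
    # PyObjC modules exist only on macOS, so import them lazily.
    import Quartz
    import Vision
    from Foundation import NSURL

    url = NSURL.fileURLWithPath_(image_path)
    ci_image = Quartz.CIImage.imageWithContentsOfURL_(url)

    request = Vision.VNRecognizeTextRequest.alloc().init()
    request.setRecognitionLevel_(Vision.VNRequestTextRecognitionLevelAccurate)
    request.setRecognitionLanguages_(languages)

    handler = Vision.VNImageRequestHandler.alloc().initWithCIImage_options_(ci_image, None)
    success, error = handler.performRequests_error_([request], None)
    if not success:
        raise RuntimeError(f"Vision request failed: {error}")

    items = []
    for observation in request.results() or []:
        candidates = observation.topCandidates_(1)
        if not candidates:
            continue
        box = observation.boundingBox()
        items.append((str(candidates[0].string()), box.origin.x, box.origin.y))
    return "\n".join(sort_reading_order(items))
```

Sorting on exact y values does not group words that sit on the same visual line with slightly different baselines; a fuller version would bucket observations into lines first.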
Troubleshooting
"OwlOCR.app not found"
Install OwlOCR from owlocr.com or use backend="vision".
File picker dialog appears
This happens when OwlOCR can't access files outside its sandbox. The MCP server handles this by copying files to the sandbox temp directory automatically.
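The copy-into-sandbox workaround could be sketched like this. It assumes the sandbox temp directory given under How It Works; the helper name, the unique-filename scheme, and the cleanup step are illustrative choices, not the project's confirmed behavior.

```python
import shutil
import uuid
from contextlib import contextmanager
from pathlib import Path

SANDBOX_TMP = Path.home() / "Library/Containers/JonLuca-DeCaro.OwlOCR/Data/tmp"


@contextmanager
def staged_in_sandbox(source: Path, sandbox_tmp: Path = SANDBOX_TMP):
    """Copy source into the sandbox temp directory and remove it afterwards."""
    sandbox_tmp.mkdir(parents=True, exist_ok=True)
    # A random name avoids collisions between concurrent requests.
    staged = sandbox_tmp / f"{uuid.uuid4().hex}{source.suffix}"
    shutil.copy2(source, staged)
    try:
        yield staged
    finally:
        staged.unlink(missing_ok=True)
```

A caller would run the OwlOCR CLI on the yielded path inside the with block, so the app only ever reads files it is allowed to access.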
Poor accuracy on specific languages
For Vision Framework, specify languages explicitly:
Supported language codes: ko-KR, en-US, ja-JP, zh-Hans, zh-Hant, etc.
License
MIT License - see LICENSE file.
Acknowledgments
OwlOCR by JonLuca DeCaro
Apple Vision Framework