# OwlOCR MCP
MCP (Model Context Protocol) server for PDF and image OCR on macOS. Supports two backends:
- **OwlOCR CLI** - Higher accuracy (recommended)
- **Vision Framework** - No external dependencies
## Features
- **PDF OCR** - Extract text from PDF files page by page with separators
- **Image OCR** - Extract text from PNG, JPEG, and other image formats
- **Multi-language** - Korean + English by default (configurable)
- **Dual Backend** - Auto-selects OwlOCR if available, falls back to Vision Framework
- **Async** - Non-blocking execution for MCP clients
## Benchmark Results
Tested on a 4-page Korean theological document with Hebrew text:
| Metric | Vision Framework | OwlOCR CLI |
|--------|------------------|------------|
| **Time** | 9.87s | 9.30s |
| **Time/Page** | 2.47s | 2.33s |
| **Word Accuracy** | 85.62% | **91.79%** |
| **Character Accuracy** | 94.46% | **95.07%** |
**Winner: OwlOCR CLI** - Slightly faster and more accurate on this document.
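The table does not state how the accuracy figures are computed. One common definition, shown here purely as an assumption, is `1 - edit_distance / reference_length`, measured over characters or over whitespace-separated words. The function names below are illustrative and are not taken from `benchmark.py`.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance between two sequences (strings or word lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # drop an item from the reference
                curr[j - 1] + 1,     # insert an item from the hypothesis
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


def accuracy(ref: Sequence, hyp: Sequence) -> float:
    """1 - normalized edit distance, clamped to the range [0, 1]."""
    if not ref:
        return 1.0 if not hyp else 0.0
    return max(0.0, 1.0 - edit_distance(ref, hyp) / len(ref))


def char_accuracy(reference: str, ocr_text: str) -> float:
    """Character-level accuracy against a ground-truth string."""
    return accuracy(reference, ocr_text)


def word_accuracy(reference: str, ocr_text: str) -> float:
    """Word-level accuracy, splitting both texts on whitespace."""
    return accuracy(reference.split(), ocr_text.split())
```

Under this definition a single wrong character spoils a whole word, which would be consistent with word accuracy sitting below character accuracy in the table.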
## Requirements
- **macOS** (uses Apple Vision Framework / OwlOCR.app)
- **Python 3.11+**
- **[OwlOCR.app](https://owlocr.com)** (optional, for better accuracy)
## Installation
### Using uv (recommended)
```bash
git clone https://github.com/yourusername/owlocr-mcp.git
cd owlocr-mcp
uv sync
```
### Using pip
```bash
git clone https://github.com/yourusername/owlocr-mcp.git
cd owlocr-mcp
pip install -e .
```
## MCP Client Configuration
### Claude Desktop
Add to `~/Library/Application Support/Claude/claude_desktop_config.json`:
```json
{
  "mcpServers": {
    "owlocr": {
      "command": "uv",
      "args": ["run", "--directory", "/path/to/owlocr-mcp", "owlocr-mcp"]
    }
  }
}
```
### Generic MCP Client
```json
{
  "mcpServers": {
    "owlocr": {
      "command": "/path/to/owlocr-mcp/.venv/bin/python",
      "args": ["-m", "owlocr_mcp.server"]
    }
  }
}
```
## Available Tools
### `ocr_pdf_to_text`
Extract text from a PDF file.
**Parameters:**
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `pdf_path` | string | required | Absolute path to the PDF file |
| `pages` | list[int] | null | Page numbers to process (1-based). If null, all pages |
| `dpi` | int | 200 | Resolution for rendering. Higher = better quality but slower |
| `backend` | string | "auto" | `"auto"`, `"owlocr"`, or `"vision"` |
| `languages` | list[string] | null | Language codes (Vision only). Default: `["ko-KR", "en-US"]` |
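MCP clients invoke tools through a JSON-RPC 2.0 `tools/call` request. As a rough sketch, the payload a client would send for this tool might look like the following. The parameter names come from the table above; the helper name `build_tool_call` and the request id are illustrative only.

```python
import json
from typing import Any


def build_tool_call(name: str, arguments: dict[str, Any], request_id: int = 1) -> dict[str, Any]:
    """Build a JSON-RPC 2.0 request for the MCP `tools/call` method."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


request = build_tool_call(
    "ocr_pdf_to_text",
    {
        "pdf_path": "/Users/me/document.pdf",
        "pages": [1, 2],      # 1-based page numbers
        "dpi": 200,
        "backend": "owlocr",
    },
)
print(json.dumps(request, indent=2))
```

In practice the MCP client builds this request for you; the sketch only shows how the table's parameters map onto the wire format.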
**Example:**
```
Extract text from /Users/me/document.pdf using OwlOCR
```
**Output:**
```
첫 번째 페이지 내용...
===== Page 2 =====
두 번째 페이지 내용...
--- OCR Complete: 2 page(s) processed using OwlOCR CLI ---
```
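If you need the pages separately, the combined output can be split on the `===== Page N =====` separators. The sketch below assumes the format matches the sample above exactly, and that any text before the first separator belongs to page 1. `split_pages` is a hypothetical helper, not part of the server.

```python
import re

PAGE_SEPARATOR = re.compile(r"^===== Page (\d+) =====\s*$", re.MULTILINE)
FOOTER = re.compile(r"^--- OCR Complete:.*---\s*$", re.MULTILINE)


def split_pages(ocr_output: str) -> dict[int, str]:
    """Split combined OCR output into {page_number: text}."""
    body = FOOTER.sub("", ocr_output)
    # With one capture group, re.split yields [before, num, text, num, text, ...].
    parts = PAGE_SEPARATOR.split(body)
    pages: dict[int, str] = {}
    leading = parts[0].strip()
    if leading:
        pages[1] = leading  # assumption: text before the first separator is page 1
    for number, text in zip(parts[1::2], parts[2::2]):
        pages[int(number)] = text.strip()
    return pages
```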
### `ocr_image_to_text`
Extract text from an image file.
**Parameters:**
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `image_path` | string | required | Absolute path to the image file |
| `backend` | string | "auto" | `"auto"`, `"owlocr"`, or `"vision"` |
| `languages` | list[string] | null | Language codes (Vision only) |
### `check_ocr_backends`
Check available OCR backends on the system.
**Output:**
```
OCR Backend Status:
✅ Vision Framework: Available (macOS built-in)
✅ OwlOCR CLI: Available (/Applications/OwlOCR.app)
Recommendation: Use backend='owlocr' for best accuracy
```
## Backend Selection
| Backend | Accuracy | Speed | Requirements |
|---------|----------|-------|--------------|
| `owlocr` | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | OwlOCR.app installed |
| `vision` | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | None (macOS built-in) |
| `auto` | Best available | - | Uses OwlOCR if available |
## Running the Benchmark
Compare backends on your own PDF:
```bash
# Both backends
uv run python benchmark.py /path/to/your.pdf
# With accuracy comparison (requires ground truth)
uv run python benchmark.py /path/to/your.pdf --show-text
# Specific backend only
uv run python benchmark.py /path/to/your.pdf --method owlocr
uv run python benchmark.py /path/to/your.pdf --method vision
```
## Project Structure
```
owlocr-mcp/
├── src/owlocr_mcp/
│   ├── __init__.py
│   ├── server.py         # MCP server with tools
│   ├── ocr.py            # Vision Framework backend
│   ├── ocr_owlocr.py     # OwlOCR CLI backend
│   └── pdf.py            # PDF processing utilities
├── benchmark.py          # Performance comparison script
├── pyproject.toml
└── README.md
```
## How It Works
### OwlOCR Backend
1. Render PDF pages to PNG using `pypdfium2`
2. Copy images to OwlOCR sandbox: `~/Library/Containers/JonLuca-DeCaro.OwlOCR/Data/tmp/`
3. Run CLI: `/Applications/OwlOCR.app/Contents/MacOS/OwlOCR --cli --input <file>`
4. Combine results with page separators
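Steps 2 and 3 can be sketched as follows. The binary path, sandbox path, and CLI flags are the ones listed above; everything else is an assumption made for illustration, including the helper names, the unique file naming, the cleanup, and the idea that recognized text arrives on stdout.

```python
import shutil
import subprocess
import uuid
from pathlib import Path

OWLOCR_BINARY = Path("/Applications/OwlOCR.app/Contents/MacOS/OwlOCR")
SANDBOX_TMP = Path.home() / "Library/Containers/JonLuca-DeCaro.OwlOCR/Data/tmp"


def stage_in_sandbox(image: Path, sandbox_dir: Path = SANDBOX_TMP) -> Path:
    """Copy an image into the sandbox under a unique name and return the new path."""
    sandbox_dir.mkdir(exist_ok=True)
    staged = sandbox_dir / f"{uuid.uuid4().hex}{image.suffix}"
    shutil.copy2(image, staged)
    return staged


def build_command(staged: Path) -> list[str]:
    """CLI invocation as documented in step 3."""
    return [str(OWLOCR_BINARY), "--cli", "--input", str(staged)]


def ocr_image(image: Path) -> str:
    """Stage the file, run the CLI, and remove the staged copy afterwards."""
    staged = stage_in_sandbox(image)
    try:
        result = subprocess.run(
            build_command(staged), capture_output=True, text=True, check=True
        )
        return result.stdout  # assumption: the CLI prints recognized text to stdout
    finally:
        staged.unlink(missing_ok=True)
```

Using a random file name avoids collisions when several pages are processed concurrently, and the `finally` block keeps the sandbox directory from filling up.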
### Vision Framework Backend
1. Render PDF pages to PNG using `pypdfium2`
2. Load as `CIImage` via PyObjC
3. Create `VNRecognizeTextRequest` with accurate recognition level
4. Process with `VNImageRequestHandler`
5. Sort results by position and combine
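Step 5 matters because Vision reports bounding boxes in normalized coordinates with the origin at the bottom-left, so a larger `y` means higher on the page. Below is a minimal sketch of the sort using a stand-in `TextBox` type instead of real PyObjC observations. The type, the function name, and the `line_tolerance` value are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextBox:
    """Stand-in for a Vision text observation (hypothetical, not a PyObjC class)."""
    text: str
    x: float       # left edge, normalized 0..1
    y: float       # bottom edge, normalized 0..1, origin at bottom-left
    height: float


def combine_in_reading_order(boxes: list[TextBox], line_tolerance: float = 0.01) -> str:
    """Group boxes into lines top-to-bottom, then order each line left-to-right."""
    lines: list[list[TextBox]] = []
    # A larger top-edge y is higher on the page, so sort descending.
    for box in sorted(boxes, key=lambda b: b.y + b.height, reverse=True):
        top = box.y + box.height
        if lines and abs((lines[-1][0].y + lines[-1][0].height) - top) <= line_tolerance:
            lines[-1].append(box)  # same visual line as the previous box
        else:
            lines.append([box])
    return "\n".join(
        " ".join(b.text for b in sorted(line, key=lambda b: b.x)) for line in lines
    )
```

This simple ordering suits single-column pages; multi-column layouts would need column detection first.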
## Troubleshooting
### "OwlOCR.app not found"
Install OwlOCR from [owlocr.com](https://owlocr.com) or use `backend="vision"`.
### File picker dialog appears
This happens when OwlOCR can't access files outside its sandbox. The MCP server handles this by copying files to the sandbox temp directory automatically.
### Poor accuracy on specific languages
For Vision Framework, specify languages explicitly:
```python
ocr_pdf_to_text(pdf_path="/path/to/your.pdf", backend="vision", languages=["ja-JP", "en-US"])
```
Supported language codes: `ko-KR`, `en-US`, `ja-JP`, `zh-Hans`, `zh-Hant`, etc.
## License
MIT License - see [LICENSE](LICENSE) file.
## Acknowledgments
- [OwlOCR](https://owlocr.com) by JonLuca DeCaro
- [MCP Python SDK](https://github.com/modelcontextprotocol/python-sdk)
- Apple Vision Framework