ocr_image
Extract text from local images and PDFs with Apple Vision OCR, returning plain text or structured blocks with bounding boxes. Works offline on macOS.
Instructions
Extract text from a local image or PDF file using Apple Vision OCR (offline, no API key needed).
USE WHEN: The user provides a local file path to an image, screenshot, scanned document, or PDF and wants to extract the text from it.
DO NOT USE FOR: images hosted at URLs (download them first), non-macOS systems, or face/barcode detection (use the dedicated tools).
Supported formats: jpg, jpeg, png, heic, heif, tiff, bmp, pdf
Parameters:
- path: absolute or relative path to the image or PDF file.
- format: "text" returns a single plain-text string (default); "blocks" returns JSON { pages: [{ page, paragraphs, textBlocks }] } with reading-order paragraphs and per-block bounding boxes. Each textBlock carries lineId, paragraphId, confidence, and a page-local bbox (0–1). PDFs return one entry per page.
- start_page: PDFs only. 1-based index of the first page to OCR (default 1). Ignored for images. A start_page past the end returns an empty result.
- max_pages: PDFs only. Maximum number of pages to OCR, counting from start_page (default: all). Ignored for images.
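
The interaction between start_page and max_pages can be sketched as a small page-window calculation. This is only an illustration of the documented behavior, not the tool's actual implementation: the function name select_pages and the total_pages argument are hypothetical, and rejecting start_page values below 1 is an assumption, since the description does not say how they are handled.

```python
from typing import List, Optional


def select_pages(total_pages: int,
                 start_page: int = 1,
                 max_pages: Optional[int] = None) -> List[int]:
    """Return the 1-based page numbers that would be OCR'd (illustrative only)."""
    if start_page < 1:
        # Assumption: the description only says the index is 1-based.
        raise ValueError("start_page is 1-based and must be >= 1")
    if start_page > total_pages:
        # Documented: a start_page past the end returns an empty result.
        return []
    if max_pages is None:
        # Documented default: OCR every page from start_page onward.
        last_page = total_pages
    else:
        last_page = min(total_pages, start_page + max_pages - 1)
    return list(range(start_page, last_page + 1))


print(select_pages(10, start_page=3, max_pages=4))
print(select_pages(10, start_page=11))
```

For a 10-page PDF, the first call selects pages 3 through 6 and the second selects nothing, because the start page lies past the end of the document.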
Returns: extracted text as a string (format="text") or a JSON document with per-page paragraphs and text blocks (format="blocks").
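
A caller consuming format="blocks" output might convert the normalized bounding boxes to pixels and flag low-confidence blocks, roughly as sketched below. The sample response is hypothetical. The keys pages, page, paragraphs, textBlocks, lineId, paragraphId, confidence, and bbox come from the description above, while the "text" key, the x/y/width/height layout of bbox, and the helper names are assumptions made for illustration. The coordinate origin is not documented, so the sketch scales values without flipping any axis.

```python
import json
from typing import Any, Dict, List, Tuple

# Hypothetical response. The "text" key and the bbox field names are assumed;
# the shape of "paragraphs" entries is not documented, so it is left empty here.
SAMPLE_RESPONSE = """
{
  "pages": [
    {
      "page": 1,
      "paragraphs": [],
      "textBlocks": [
        {"text": "Invoice", "lineId": 0, "paragraphId": 0, "confidence": 0.98,
         "bbox": {"x": 0.1, "y": 0.2, "width": 0.5, "height": 0.05}},
        {"text": "T0tal", "lineId": 1, "paragraphId": 0, "confidence": 0.42,
         "bbox": {"x": 0.1, "y": 0.3, "width": 0.25, "height": 0.05}}
      ]
    }
  ]
}
"""


def bbox_to_pixels(bbox: Dict[str, float],
                   page_width: int,
                   page_height: int) -> Tuple[int, int, int, int]:
    """Scale a page-local 0-1 bbox to (x, y, width, height) in pixels."""
    return (
        round(bbox["x"] * page_width),
        round(bbox["y"] * page_height),
        round(bbox["width"] * page_width),
        round(bbox["height"] * page_height),
    )


def low_confidence_blocks(doc: Dict[str, Any],
                          threshold: float = 0.5) -> List[Dict[str, Any]]:
    """Collect text blocks from every page whose confidence is below threshold."""
    return [
        block
        for page in doc["pages"]
        for block in page["textBlocks"]
        if block["confidence"] < threshold
    ]


doc = json.loads(SAMPLE_RESPONSE)
first_block = doc["pages"][0]["textBlocks"][0]
print(bbox_to_pixels(first_block["bbox"], page_width=1000, page_height=2000))
print([block["text"] for block in low_confidence_blocks(doc)])
```

The 0.5 confidence threshold is arbitrary; a suitable cutoff depends on the documents being scanned.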
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| path | Yes | Absolute or relative path to the image or PDF file | |
| format | No | "text" for plain string output, "blocks" for per-page paragraphs and text blocks | text |
| start_page | No | PDFs only: 1-based first page to OCR. Ignored for images. | 1 |
| max_pages | No | PDFs only: maximum number of pages to OCR. Ignored for images. | all |
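
A client could check its arguments against the schema before invoking the tool. The helper below is a minimal sketch under stated assumptions: build_ocr_arguments is a hypothetical name, matching file extensions case-insensitively is an assumption, and how the resulting dictionary is sent to the tool depends on the client and is not shown.

```python
from pathlib import Path
from typing import Any, Dict, Optional

# Supported formats as listed in the tool description.
SUPPORTED_EXTENSIONS = {"jpg", "jpeg", "png", "heic", "heif", "tiff", "bmp", "pdf"}


def build_ocr_arguments(path: str,
                        format: str = "text",
                        start_page: Optional[int] = None,
                        max_pages: Optional[int] = None) -> Dict[str, Any]:
    """Validate inputs and build an argument dict for ocr_image (illustrative)."""
    # Assumption: extensions are compared case-insensitively.
    extension = Path(path).suffix.lower().lstrip(".")
    if extension not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported file type: {extension or '(none)'}")
    if format not in ("text", "blocks"):
        raise ValueError('format must be "text" or "blocks"')

    arguments: Dict[str, Any] = {"path": path, "format": format}
    # The page options are ignored for images, so only send them for PDFs.
    if extension == "pdf":
        if start_page is not None:
            arguments["start_page"] = start_page
        if max_pages is not None:
            arguments["max_pages"] = max_pages
    return arguments


print(build_ocr_arguments("scan.pdf", format="blocks", start_page=2, max_pages=3))
print(build_ocr_arguments("screenshot.PNG", start_page=2))
```

Dropping the page options for images is a client-side convenience only; the tool is documented to ignore them either way.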