screen_ocr
Extract text from the screen using Apple Vision OCR, returning pixel coordinates for each text element so that the text can be clicked.
Instructions
OCR the screen using Apple Vision. Returns every text element with pixel coordinates (x, y, centerX, centerY). Use centerX/centerY with click_at to click on any text.
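The exact response format is not documented here. The sketch below assumes that each returned element is a dict with `text`, `x`, `y`, `centerX`, `centerY`, and `confidence` keys, and that the sample values are made up. The helper `find_click_point` is hypothetical and not part of the tool. It picks the most confident element whose text contains a target string and returns the center coordinates you would then pass to click_at.

```python
from typing import Optional, TypedDict


class OcrElement(TypedDict):
    """Assumed shape of one screen_ocr result element (hypothetical)."""
    text: str
    x: float
    y: float
    centerX: float
    centerY: float
    confidence: float


def find_click_point(
    elements: list[OcrElement], target: str
) -> Optional[tuple[float, float]]:
    """Return (centerX, centerY) of the most confident element whose text
    contains `target` (case-insensitive), or None if nothing matches."""
    needle = target.casefold()
    matches = [e for e in elements if needle in e["text"].casefold()]
    if not matches:
        return None
    # Several elements may contain the target; prefer the most confident one.
    best = max(matches, key=lambda e: e["confidence"])
    return best["centerX"], best["centerY"]


# Made-up OCR output, for illustration only.
elements: list[OcrElement] = [
    {"text": "File", "x": 10, "y": 4, "centerX": 24, "centerY": 12, "confidence": 0.98},
    {"text": "Save As...", "x": 120, "y": 300, "centerX": 160, "centerY": 310, "confidence": 0.91},
    {"text": "Save", "x": 400, "y": 500, "centerX": 430, "centerY": 512, "confidence": 0.99},
]
point = find_click_point(elements, "save")
print(point)  # (430, 512)
```

Substring matching keeps the sketch simple, but it means "save" also matches "Save As...". Depending on the screen, an exact-match pass before falling back to substrings may pick the intended element more reliably.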
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| app | No | App to OCR. Omit for full screen. | |
| min_confidence | No | Filter out OCR matches below this confidence (0.0-1.0). | |
| max_elements | No | Maximum number of OCR elements to return (for smaller, faster responses). | |
| compact | No | Return compact OCR objects (text + click coordinates + confidence). | |
| include_bounds | No | Include bounding boxes in results. | true |
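The following sketch builds an arguments object for a screen_ocr call. The table does not state parameter types, so it assumes that `app` is an application name string, `max_elements` is an integer, and `compact` and `include_bounds` are booleans. The helper `build_screen_ocr_args` and the app name "Safari" are hypothetical. Unset parameters are left out so that the tool's own defaults apply, for example full-screen OCR when `app` is omitted. The only validation is the documented 0.0-1.0 range for `min_confidence`.

```python
from typing import Any, Optional


def build_screen_ocr_args(
    app: Optional[str] = None,
    min_confidence: Optional[float] = None,
    max_elements: Optional[int] = None,
    compact: Optional[bool] = None,
    include_bounds: Optional[bool] = None,
) -> dict[str, Any]:
    """Build a screen_ocr argument dict (hypothetical helper), omitting any
    parameter that was not set so the tool's defaults are used."""
    # min_confidence is documented as a value between 0.0 and 1.0.
    if min_confidence is not None and not 0.0 <= min_confidence <= 1.0:
        raise ValueError("min_confidence must be between 0.0 and 1.0")
    args = {
        "app": app,
        "min_confidence": min_confidence,
        "max_elements": max_elements,
        "compact": compact,
        "include_bounds": include_bounds,
    }
    # Drop unset values; False is kept because it is an explicit choice.
    return {k: v for k, v in args.items() if v is not None}


# OCR one app, keep only confident matches, and cap the response size.
print(build_screen_ocr_args(app="Safari", min_confidence=0.8, max_elements=50, compact=True))
```

Calling `build_screen_ocr_args()` with no arguments yields an empty dict, which corresponds to OCR of the full screen with every default left in place.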