ocr_screenshot
Capture a screenshot and use OCR to extract visible text with tap coordinates, so UI elements can be located and tapped on both iOS and Android.
Instructions
RECOMMENDED: Use this tool FIRST when you need to find and tap UI elements. Takes a screenshot and extracts all visible text with tap-ready coordinates using OCR.

ADVANTAGES over accessibility trees:
(1) Works on ANY visible text regardless of accessibility labels.
(2) Returns ready-to-use tapX/tapY coordinates - no conversion needed.
(3) Faster than parsing accessibility hierarchies.
(4) Works consistently across iOS and Android.

USE THIS FOR: Finding buttons, labels, menu items, tab bars, or any text you need to tap. Simply find the text in the results and use its tapX/tapY with the tap command.
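The lookup step described above ("find the text in the results and use its tapX/tapY") can be sketched as follows. This is a minimal illustration, not the tool's actual output format: only the `tapX` and `tapY` field names come from this page, while the `text` key, the list-of-dicts shape, and the helper name `find_tap_target` are assumptions made for the example.

```python
from typing import Optional

# Assumed result shape: a list of dicts with "text", "tapX", "tapY".
# Only tapX/tapY are named in the tool description; "text" is hypothetical.
def find_tap_target(elements: list[dict], label: str) -> Optional[tuple[int, int]]:
    """Return (tapX, tapY) for the first element whose text matches label."""
    wanted = label.strip().lower()
    if not wanted:
        return None
    # Prefer an exact, case-insensitive match.
    for el in elements:
        if el.get("text", "").strip().lower() == wanted:
            return el["tapX"], el["tapY"]
    # Fall back to a substring match, since OCR may merge or split words.
    for el in elements:
        if wanted in el.get("text", "").lower():
            return el["tapX"], el["tapY"]
    return None

# Made-up sample data, for illustration only.
sample = [
    {"text": "Settings", "tapX": 60, "tapY": 120},
    {"text": "Sign In", "tapX": 200, "tapY": 640},
]
target = find_tap_target(sample, "sign in")
```

The returned pair would then be passed to the tap command. Trying an exact match before a substring match avoids tapping "Sign In with Apple" when a plain "Sign In" button is also on screen.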
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| platform | Yes | Platform to capture screenshot from | |
| deviceId | No | Optional device ID (Android) or UDID (iOS). Uses first available device if not specified. | |
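As a rough sketch of how the schema above maps to a call payload, the helper below builds an arguments dictionary with the required `platform` and the optional `deviceId`. The helper name `build_ocr_screenshot_args` and the example platform string `"android"` are assumptions; this page does not list the accepted platform values.

```python
from typing import Optional

def build_ocr_screenshot_args(platform: str, device_id: Optional[str] = None) -> dict:
    """Build the argument payload for ocr_screenshot (hypothetical helper)."""
    if not platform:
        raise ValueError("platform is required")
    args = {"platform": platform}
    # deviceId is optional; when omitted, the tool uses the first available device.
    if device_id is not None:
        args["deviceId"] = device_id
    return args

# "android" and "emulator-5554" are illustrative values, not confirmed by the schema.
default_device_args = build_ocr_screenshot_args("android")
specific_device_args = build_ocr_screenshot_args("android", "emulator-5554")
```

Leaving `deviceId` out of the payload entirely, rather than sending an empty string, keeps the "first available device" default in effect.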