screenshot_ocr
Extract text and click coordinates from Windows applications when standard UI automation fails, enabling interaction with custom-drawn interfaces, game overlays, and PDF viewers.
Instructions
Run Windows OCR on a window and return word-level text with screen-pixel clickAt coordinates. Use this when UIA returns no actionable elements (WinUI3 custom-drawn UIs, game overlays, PDF viewers). Note: screenshot(detail='text') automatically falls back to OCR when UIA output is sparse (ocrFallback='auto' is the default), so call screenshot_ocr directly only to force OCR unconditionally. language: BCP-47 tag (default 'ja'). Caveats: the first call may take about 1 s (WinRT cold start), and the matching Windows OCR language pack must be installed.
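As a rough illustration of how the word-level output might be consumed, the sketch below looks up a word and returns its click point. The response shape used here (a list of dicts with a `text` field and a `clickAt` field holding `x` and `y`) is an assumption made for illustration; this page only states that words come back with screen-pixel clickAt coordinates. `find_click_point` and `sample_words` are hypothetical names.

```python
from typing import Optional


def find_click_point(words: list[dict], target: str) -> Optional[tuple[int, int]]:
    """Return the (x, y) click point of the first word equal to target, ignoring case."""
    needle = target.casefold()
    for word in words:
        if word["text"].casefold() == needle:
            point = word["clickAt"]
            return (point["x"], point["y"])
    # No OCR word matched the requested text.
    return None


# Hypothetical OCR result; the field names are assumed, not taken from the tool's documentation.
sample_words = [
    {"text": "File", "clickAt": {"x": 24, "y": 12}},
    {"text": "Save", "clickAt": {"x": 88, "y": 12}},
]

print(find_click_point(sample_words, "save"))
```

Exact, case-insensitive matching is used here for simplicity; OCR output often contains recognition errors, so a real caller may prefer a fuzzy comparison.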
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| windowTitle | Yes | Title (partial match) of the window to OCR | |
| language | No | BCP-47 language tag (e.g. 'ja', 'en-US') | ja |
| region | No | Optional sub-region in window-local coordinates | |
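The schema above can be exercised with a small request builder. This is a minimal sketch that assumes the tool is exposed through an MCP-style JSON-RPC `tools/call` request; `build_ocr_call` is a hypothetical helper, and `region` is left out because its field layout is not specified here.

```python
import json
from typing import Optional


def build_ocr_call(window_title: str, language: Optional[str] = None, request_id: int = 1) -> dict:
    """Build a JSON-RPC request for screenshot_ocr (assumed MCP-style tools/call envelope)."""
    if not window_title:
        raise ValueError("windowTitle is required")
    arguments = {"windowTitle": window_title}
    # Omit language so the tool's documented default ('ja') applies.
    if language is not None:
        arguments["language"] = language
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "screenshot_ocr", "arguments": arguments},
    }


# "Notepad" is an example window title; partial matches are accepted per the schema.
request = build_ocr_call("Notepad", language="en-US")
print(json.dumps(request, indent=2))
```

Passing `language` explicitly is worth considering on systems without the Japanese OCR language pack, since the default is 'ja'.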