analyze_screen
Capture a screen region, run NPU UI detection and OCR in parallel, and return an ordered list of interactive elements with their visible text, enabling agents to understand and act on the screen.
Instructions
Capture a screen region, run NPU YOLO UI detection and system OCR in parallel, then spatially fuse the results. Returns an ordered list of interactive elements (buttons, fields, headings, …), each annotated with the visible text inside it. This suits agents that need to understand and act on the current screen. Pass region=[x1,y1,x2,y2] in screen coordinates; omit it to capture the full screen.
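The spatial fusion step can be pictured with a minimal Python sketch. The tool's actual fusion and ordering rules are not documented here, so everything below is an assumption for illustration: the `Element` type, the `fuse` function, the rule that an OCR word belongs to the smallest detected box containing its center, and the top-to-bottom, left-to-right ordering are all hypothetical.

```python
from dataclasses import dataclass, field

Box = tuple[float, float, float, float]  # (x1, y1, x2, y2) in screen coordinates


@dataclass
class Element:
    # Hypothetical shape of one detected UI element; not the tool's real output type.
    kind: str
    box: Box
    confidence: float
    text: list[str] = field(default_factory=list)


def contains(box: Box, x: float, y: float) -> bool:
    x1, y1, x2, y2 = box
    return x1 <= x <= x2 and y1 <= y <= y2


def area(box: Box) -> float:
    x1, y1, x2, y2 = box
    return (x2 - x1) * (y2 - y1)


def fuse(elements: list[Element],
         ocr_words: list[tuple[str, Box]],
         min_confidence: float = 0.30) -> list[Element]:
    # Drop detections below the threshold, copying so the inputs stay untouched.
    kept = [Element(e.kind, e.box, e.confidence)
            for e in elements if e.confidence >= min_confidence]
    for word, box in ocr_words:
        cx = (box[0] + box[2]) / 2
        cy = (box[1] + box[3]) / 2
        hosts = [e for e in kept if contains(e.box, cx, cy)]
        if hosts:
            # Assumed rule: the smallest enclosing element owns the word.
            min(hosts, key=lambda e: area(e.box)).text.append(word)
    # Assumed ordering: top-to-bottom, then left-to-right.
    return sorted(kept, key=lambda e: (e.box[1], e.box[0]))


detections = [
    Element("button", (100, 200, 200, 240), 0.90),
    Element("heading", (10, 10, 300, 50), 0.80),
    Element("field", (10, 300, 300, 340), 0.10),  # below the default threshold
]
words = [("Settings", (20, 15, 120, 45)), ("Save", (120, 210, 180, 230))]
fused = fuse(detections, words)
```

In this sketch, OCR words whose center falls outside every kept element are discarded; a real implementation might instead report them as standalone text.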
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| region | No | Capture region as [x1, y1, x2, y2] in screen coordinates; omit to capture the full screen | Full screen |
| min_confidence | No | Minimum confidence threshold | 0.30 |
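As a rough illustration of the schema above, the following Python sketch builds an arguments object for `analyze_screen`. The helper name `build_arguments`, the check that `x2 > x1` and `y2 > y1`, and the assumption that confidence lies between 0 and 1 are not part of the documented tool; how the arguments are actually sent depends on the client in use.

```python
import json


def build_arguments(region=None, min_confidence=0.30):
    # Hypothetical helper: assembles the two documented parameters into a dict.
    # Assumption: confidence is a probability-like score in [0, 1].
    if not 0.0 <= min_confidence <= 1.0:
        raise ValueError("min_confidence is assumed to lie in [0, 1]")
    args = {"min_confidence": min_confidence}
    if region is not None:
        x1, y1, x2, y2 = region
        # Assumption: a region must have positive width and height.
        if x2 <= x1 or y2 <= y1:
            raise ValueError("region must satisfy x2 > x1 and y2 > y1")
        args["region"] = [x1, y1, x2, y2]
    # Omitting "region" requests the full screen, per the tool description.
    return args


full_screen = build_arguments()
top_left = build_arguments(region=[0, 0, 800, 600], min_confidence=0.5)
payload = json.dumps({"name": "analyze_screen", "arguments": top_left})
```

Leaving `region` out of the dict, rather than sending a null value, mirrors the "omit for full screen" wording in the instructions.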