parse_screen
Identify on-screen UI elements with attributes like type, label, and interactivity, and save an annotated screenshot for analysis.
Instructions
Detects the on-screen elements of an instance and reports each element's id, type, label, interactive flag, and pixel center (backend = GLOVEBOX_VISION). Saves a numbered, annotated image to /tmp/glovebox_annotated_.png.
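As a rough illustration of how the detected elements might be consumed, the sketch below filters a list of element records down to the interactive ones and looks an element up by label. The result format is not documented here, so the dictionary keys (`id`, `type`, `label`, `interactive`, `center`), the sample data, and both helper functions are assumptions modeled on the attribute list above, not the tool's confirmed output.

```python
# Hypothetical sample of parsed elements. The key names mirror the attributes
# listed in the instructions, but the real result format is an assumption.
sample_elements = [
    {"id": 0, "type": "button", "label": "OK", "interactive": True, "center": (120, 340)},
    {"id": 1, "type": "text", "label": "Welcome", "interactive": False, "center": (200, 80)},
    {"id": 2, "type": "input", "label": "Search", "interactive": True, "center": (400, 40)},
]


def interactive_targets(elements):
    """Return (id, label, center) for each element flagged as interactive."""
    return [(e["id"], e["label"], e["center"]) for e in elements if e.get("interactive")]


def find_by_label(elements, text):
    """Return the first element whose label contains text (case-insensitive), or None."""
    needle = text.lower()
    for e in elements:
        if needle in str(e.get("label", "")).lower():
            return e
    return None


if __name__ == "__main__":
    for element_id, label, center in interactive_targets(sample_elements):
        print(element_id, label, center)
```

Keeping only the interactive elements and their pixel centers is one plausible way to turn the parse result into click targets; the numbered image can then be used to check that an id refers to the intended element.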
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| instance | No | | |
| box_threshold | No | | |
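Since both parameters are optional, a caller can leave out whichever it does not need. The sketch below shows one way to assemble the arguments so that unset parameters are omitted rather than sent as null. The helper name `build_arguments` and the example values are hypothetical, and the value types are assumptions because the schema above does not document them.

```python
def build_arguments(instance=None, box_threshold=None):
    """Build the arguments mapping for parse_screen, skipping unset optional fields."""
    args = {}
    if instance is not None:
        args["instance"] = instance
    if box_threshold is not None:
        args["box_threshold"] = box_threshold
    return args


if __name__ == "__main__":
    # "emu-1" and 0.05 are placeholder values chosen only for illustration.
    print(build_arguments())
    print(build_arguments(instance="emu-1", box_threshold=0.05))
```

Omitting a parameter lets the tool apply its own default, which the schema does not state.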
Output Schema
| Name | Required | Description | Default |
|---|---|---|---|
| result | Yes | | |