vision_find
Analyze a webpage screenshot to identify and number interactive elements like buttons and links, returning an annotated image for precise element targeting.
Instructions
Find elements using vision-based screenshot analysis. Returns an annotated screenshot with the elements numbered.
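MCP clients invoke a tool by sending a JSON-RPC 2.0 `tools/call` request that names the tool and passes its arguments. The sketch below builds such a request for vision_find in Python. Only tabId is required. The helper name `build_vision_find_request` and the tab ID value `"tab-123"` are hypothetical, and sending tabId as a string is an assumption (the schema does not state its type).

```python
import json


def build_vision_find_request(request_id: int, tab_id: str, **options) -> str:
    """Hypothetical helper: serialize a JSON-RPC 2.0 tools/call request.

    tabId is the only required argument; any other documented parameter
    (showGrid, interactiveOnly, format, mode, ...) can be passed by keyword.
    """
    arguments = {"tabId": tab_id, **options}
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "vision_find", "arguments": arguments},
    }
    return json.dumps(request)


# Ask for a coordinate grid on a (hypothetical) tab.
print(build_vision_find_request(1, "tab-123", showGrid=True))
```

Omitted parameters are left out of the arguments object entirely, so the server applies its own defaults.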
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| tabId | Yes | Tab ID to analyze | |
| instruction | No | Hint about what to look for. Reserved for future use. | |
| showGrid | No | Overlay a coordinate grid on the screenshot. | false |
| showBoundingBoxes | No | Show bounding boxes around elements. | true |
| interactiveOnly | No | Only show interactive elements (buttons, links, inputs). | true |
| format | No | Output format: "legacy" (text + image), "snapshot" (provider-neutral JSON), or "both". | legacy |
| includeImage | No | Include the annotated image in the output. | true for legacy/both; false for snapshot |
| occlusionFilter | No | When true, drops elements whose center point is covered by another element, as determined by elementFromPoint. Off by default so existing output is unchanged; set to true for stricter accuracy. | false |
| iframes | No | Frame traversal mode. "all" still respects the same-origin policy: cross-origin frames are not traversed and are listed in iframes.skipped instead. | none |
| mode | No | "viewport": a single capture of the visible viewport (the existing behavior). "tiled": scrolls through the full document in viewport-height steps and returns per-tile screenshots plus a unified element map. | viewport |
| recordTrajectory | No | Opt in to visual trajectory artifact capture for this call. Capture is also enabled when the environment variable OPENCHROME_VISUAL_TRAJECTORY=1 is set. | |
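Most defaults in the table are fixed, but includeImage depends on format, and recordTrajectory can also be switched on from the environment. The Python sketch below shows how a client might predict the effective arguments of a call. `resolve_options` and `STATIC_DEFAULTS` are hypothetical names. The rule that OPENCHROME_VISUAL_TRAJECTORY=1 enables capture even when recordTrajectory is omitted or false is an assumption read from "also enabled by".

```python
import os
from typing import Any, Dict, Mapping, Optional

# Fixed defaults copied from the Input Schema table.
STATIC_DEFAULTS: Dict[str, Any] = {
    "showGrid": False,
    "showBoundingBoxes": True,
    "interactiveOnly": True,
    "format": "legacy",
    "occlusionFilter": False,
    "iframes": "none",
    "mode": "viewport",
}


def resolve_options(args: Dict[str, Any], env: Optional[Mapping[str, str]] = None) -> Dict[str, Any]:
    """Hypothetical helper: fill in defaults the way the table describes them."""
    if "tabId" not in args:
        raise ValueError("tabId is required")
    if env is None:
        env = os.environ
    resolved = {**STATIC_DEFAULTS, **args}
    # includeImage has no fixed default: true for legacy/both, false for snapshot.
    if "includeImage" not in args:
        resolved["includeImage"] = resolved["format"] in ("legacy", "both")
    # recordTrajectory is opt-in per call; assumed here that the env var
    # turns it on even when the argument is omitted or false.
    resolved["recordTrajectory"] = (
        bool(args.get("recordTrajectory", False))
        or env.get("OPENCHROME_VISUAL_TRAJECTORY") == "1"
    )
    return resolved
```

For instance, `resolve_options({"tabId": "t1", "format": "snapshot"}, env={})` resolves includeImage to False, while passing `includeImage=True` explicitly overrides that format-dependent default.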
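The occlusionFilter check relies on `document.elementFromPoint`, which returns the topmost element at a point. The Python model below reproduces that idea with axis-aligned boxes and a numeric stacking order, so it runs without a browser. `Box`, `element_from_point`, and `occlusion_filter` are hypothetical names. Treating a hit on a descendant, such as a span inside a link, as "not occluded" is an assumption about how the real filter behaves, not documented behavior.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Box:
    """Hypothetical stand-in for a DOM element's bounding rect."""
    name: str
    x: float
    y: float
    w: float
    h: float
    z: int  # stacking order: a higher value paints on top
    parent: Optional[str] = None

    def center(self) -> Tuple[float, float]:
        return (self.x + self.w / 2, self.y + self.h / 2)

    def contains_point(self, px: float, py: float) -> bool:
        return self.x <= px < self.x + self.w and self.y <= py < self.y + self.h


def element_from_point(boxes: List[Box], px: float, py: float) -> Optional[Box]:
    """Model of document.elementFromPoint: the topmost box under the point."""
    hits = [b for b in boxes if b.contains_point(px, py)]
    return max(hits, key=lambda b: b.z, default=None)


def is_self_or_descendant(hit: Box, target: Box, by_name: Dict[str, Box]) -> bool:
    """Walk up parent links from the hit box looking for the target."""
    node: Optional[Box] = hit
    while node is not None:
        if node.name == target.name:
            return True
        node = by_name.get(node.parent) if node.parent else None
    return False


def occlusion_filter(candidates: List[Box], all_boxes: List[Box]) -> List[Box]:
    """Keep a candidate only if the topmost element at its center is itself
    or one of its descendants; otherwise it counts as occluded."""
    by_name = {b.name: b for b in all_boxes}
    kept = []
    for box in candidates:
        hit = element_from_point(all_boxes, *box.center())
        if hit is not None and is_self_or_descendant(hit, box, by_name):
            kept.append(box)
    return kept
```

With a modal overlay covering a button, the button's center resolves to the overlay and the button is dropped, while a link outside the overlay survives even though a child label sits on top of it.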
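In "tiled" mode the document is captured in viewport-height steps, and element positions from each tile have to be combined into one map. The Python sketch below shows one way to do that. It assumes the browser clamps scrolling at documentHeight minus viewportHeight, which means the last tile can overlap the one before it. It also assumes duplicates can be detected by an identical label and document-space rect. `tile_offsets` and `merge_tiles` are hypothetical names, and the real tool may merge tiles differently.

```python
from typing import Dict, List, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height)


def tile_offsets(document_height: int, viewport_height: int) -> List[int]:
    """scrollY values for viewport-tall steps; the last one is clamped to the
    maximum scroll position, so it may overlap the previous tile."""
    if viewport_height <= 0:
        raise ValueError("viewport_height must be positive")
    max_scroll = max(document_height - viewport_height, 0)
    offsets: List[int] = []
    y = 0
    while y < max_scroll:
        offsets.append(y)
        y += viewport_height
    offsets.append(max_scroll)
    return offsets


def merge_tiles(tiles: List[Tuple[int, List[Tuple[str, Rect]]]]) -> Dict[int, Tuple[str, Rect]]:
    """Translate per-tile (viewport-relative) rects into document coordinates,
    drop exact duplicates from overlapping tiles, and number the rest from 1."""
    seen = set()
    unified: Dict[int, Tuple[str, Rect]] = {}
    for scroll_y, elements in tiles:
        for label, (x, y, w, h) in elements:
            doc_rect = (x, y + scroll_y, w, h)
            if (label, doc_rect) in seen:
                continue
            seen.add((label, doc_rect))
            unified[len(unified) + 1] = (label, doc_rect)
    return unified
```

For a 2500 px document and a 1000 px viewport, `tile_offsets` yields scroll positions 0, 1000, and 1500. An element at document y = 1700 therefore appears in both of the last two tiles and is kept once. Exact-match deduplication is fragile with subpixel layout, so a tolerance or overlap-ratio test would be a more robust choice.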