interact
Find page elements by natural language and perform click, hover, double-click, or type actions. Waits for DOM to settle after each interaction.
Instructions
Find an element by natural-language query; click, hover, double_click, or type into it; wait for the DOM to settle; return the resulting state.
When to use: A single action on one described element, with a coordinate fallback for targets inside Shadow DOM, canvas, or iframes.
When NOT to use: Use act for multi-step flows and computer for general coordinate clicks.
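The two dispatch styles can be sketched as argument payloads. This is a minimal illustration, not the tool's client API: the helper names `ref_click` and `coordinate_click` are hypothetical, and the `[x, y]` shape of `coordinate` is an assumption, since the schema only says "pixel coordinates".

```python
from typing import Any, Dict, Optional


def ref_click(query: str, intent: Optional[str] = None) -> Dict[str, Any]:
    """Build arguments for the default "ref" mode (element found by description)."""
    args: Dict[str, Any] = {"query": query, "action": "click"}
    if intent is not None:
        # Optional label recorded for observability.
        args["intent"] = intent
    return args


def coordinate_click(x: int, y: int) -> Dict[str, Any]:
    """Build arguments for the coordinate fallback (e.g. a canvas target)."""
    # ASSUMPTION: coordinate is an [x, y] pair; the documented schema
    # does not state the exact shape.
    return {"mode": "coordinate", "coordinate": [x, y], "action": "click"}


if __name__ == "__main__":
    print(ref_click("Submit button", intent="submit login form"))
    print(coordinate_click(320, 240))
```

The ref-mode payload omits `mode` because "ref" is the documented default; the coordinate payload sets it explicitly.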
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| tabId | No | Tab ID to execute on | |
| taskId | No | Task id when using a task-scoped browser lane. | |
| laneId | No | Task-scoped browser lane id; validates tabId or defaults to the lane current target. | |
| query | No | Element to act on (natural language). Required when mode is "ref" (default). | |
| mode | No | Dispatch mode. "ref" (default) resolves the element by query; "coordinate" sends a CDP mouse event directly to pixel coordinates. | ref |
| coordinate | No | Pixel coordinates for coordinate mode. Required when mode is "coordinate". | |
| action | No | Action to perform. Use type with value to enter text. | click |
| value | No | Text to type when action is type. Supports vault://name in pilot mode. | |
| waitAfter | No | Time in ms to wait for the DOM to settle after the action. | 500 |
| returnFormat | No | Response content. | both |
| verify | No | Verify mode. Boolean values are legacy: true→"screenshot", false→"none". A string enum value returns a compact diff signal (AX-hash delta + pHash, ≤4KB). | |
| returnAfterState | No | Optional chaining hint. When "ax" or "dom", the response includes a page snapshot of that mode captured after the post-action wait, removing the need for a follow-up read_page call. | none |
| waitForMs | No | Poll timeout for element in ms. Max: 30000 | |
| pollInterval | No | Poll interval in ms. | 200 |
| ref | No | Snapshot ref ID (from read_page refs map). When provided, skips AX re-resolution and clicks the element directly via its cached backendDOMNodeId. | |
| intent | No | Optional short label (≤120 chars) describing the user-facing goal of this action, e.g. "submit login form". Recorded in the task journal for observability. | |
| capture_artifact | No | When true, stage a replay artifact step for oc_skill_record after a successful click. When false, this is a strict no-op. | false |
| locatorFallback | No | Opt-in AI locator fallback extension point. Disabled by default; when enabled, provider candidates are validated before any action. | |
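The conditional requirements in the table (query for "ref" mode, coordinate for "coordinate" mode, the 30000 ms cap on waitForMs, the 120-character cap on intent, and the legacy boolean mapping for verify) can be checked before a call is sent. The sketch below is a hypothetical client-side pre-check, not the tool's own validation. The function names are invented for illustration, and the rule that action "type" needs a value is inferred from the description of action rather than stated as a hard requirement.

```python
from typing import Any, Dict, List

MAX_WAIT_FOR_MS = 30000   # documented maximum for waitForMs
MAX_INTENT_CHARS = 120    # documented maximum length for intent


def normalize_verify(verify: Any) -> Any:
    """Map the legacy boolean form of verify to its string equivalent."""
    if verify is True:
        return "screenshot"
    if verify is False:
        return "none"
    return verify  # string values pass through unchanged


def check_interact_args(args: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in an argument dict (empty if none)."""
    problems: List[str] = []
    mode = args.get("mode", "ref")  # "ref" is the documented default

    if mode not in ("ref", "coordinate"):
        problems.append('mode must be "ref" or "coordinate"')
    # The table does not say whether supplying ref lifts the query
    # requirement, so this sketch follows the query row literally.
    if mode == "ref" and not args.get("query"):
        problems.append('query is required when mode is "ref"')
    if mode == "coordinate" and args.get("coordinate") is None:
        problems.append('coordinate is required when mode is "coordinate"')
    # ASSUMPTION: typing without a value is treated as an error here.
    if args.get("action") == "type" and "value" not in args:
        problems.append('value is expected when action is "type"')
    if args.get("waitForMs", 0) > MAX_WAIT_FOR_MS:
        problems.append("waitForMs exceeds 30000")
    if len(args.get("intent", "")) > MAX_INTENT_CHARS:
        problems.append("intent exceeds 120 characters")
    return problems


if __name__ == "__main__":
    print(check_interact_args({"query": "Save button"}))
    print(check_interact_args({"mode": "coordinate"}))
    print(normalize_verify(True))
```

Returning a list of messages rather than raising on the first failure lets a caller report every problem with a payload at once.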