act
Automate multi-step browser workflows by describing actions in natural language. Parses and executes clicks, typing, navigation, and more in sequence.
Instructions
Execute multi-step browser actions from a natural language instruction. Parses and runs click, type, select, scroll, hover, navigate, and wait steps in sequence.
When to use: Automating a known multi-step flow (login, form fill, navigation) in one call.
When NOT to use: For a single element action, use interact instead; for raw coordinate input, use computer.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| tabId | Yes | Tab ID to execute on | |
| instruction | Yes | Natural language description of actions (e.g., "click login, type admin in username, click submit") | |
| context | No | Additional context (e.g., "on the login page") | |
| verify | No | Verification mode. A boolean is accepted for legacy callers and maps to a string value: true → "screenshot", false → "none". Passing a string enum value returns a compact diff signal (AX-hash delta + pHash, ≤4KB). | |
| timeout | No | Maximum time in milliseconds for the entire sequence | 30000 |
| use_workflow_cache | No | Opt-in: try the guarded structured workflow cache before the legacy action cache | false |
| record_workflow_cache | No | Opt-in: record safe, successful parsed sequences into the structured workflow cache | false |
| allow_risky_replay | No | Allow replay of workflow cache entries marked risky | false |
| workflow_debug | No | Include concise workflow cache accept/reject metadata in the response | false |
| returnAfterState | No | Optional chaining hint. When "ax" or "dom", the response includes a page snapshot in that mode, captured after the post-action wait, so no follow-up read_page call is needed. | "none" |
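
The parameters above could be assembled into an arguments object along these lines. This is a minimal sketch, not the tool's own client code. The helper names (`build_act_arguments`, `normalize_verify`) and the tab ID value `"tab-1"` are hypothetical, and how the arguments are sent to the server is left out. Only the field names and the legacy boolean mapping for verify come from the schema.

```python
import json
from typing import Any, Dict, Union

# Optional field names taken from the Input Schema table.
OPTIONAL_FIELDS = {
    "context",
    "verify",
    "timeout",
    "use_workflow_cache",
    "record_workflow_cache",
    "allow_risky_replay",
    "workflow_debug",
    "returnAfterState",
}


def normalize_verify(verify: Union[bool, str]) -> str:
    """Map the legacy boolean form of `verify` to its string equivalent."""
    if isinstance(verify, bool):
        return "screenshot" if verify else "none"
    return verify  # already a string value; passed through unchanged


def build_act_arguments(tab_id: Any, instruction: str, **options: Any) -> Dict[str, Any]:
    """Hypothetical helper: build the arguments object for one `act` call."""
    unknown = set(options) - OPTIONAL_FIELDS
    if unknown:
        raise ValueError(f"unknown act parameters: {sorted(unknown)}")

    args: Dict[str, Any] = {"tabId": tab_id, "instruction": instruction}
    for name, value in options.items():
        if name == "verify":
            value = normalize_verify(value)
        args[name] = value
    return args


if __name__ == "__main__":
    example = build_act_arguments(
        "tab-1",  # hypothetical tab ID; the real type and value come from the server
        "click login, type admin in username, click submit",
        context="on the login page",
        verify=True,            # legacy boolean, normalized to "screenshot"
        returnAfterState="ax",  # ask for a snapshot after the post-action wait
    )
    print(json.dumps(example, indent=2))
```

Omitted optional fields are left out of the object so that the server-side defaults listed in the table apply.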