computer
Perform mouse clicks, keyboard input, scrolling, and screenshots on a browser tab using pixel coordinates or element references. Supports precise automation of user interactions.
Instructions
Mouse, keyboard, and screenshot actions on a tab. Supports click, type, scroll, key, hover, and screenshot by pixel coordinate or element ref.
When to use: Precise coordinate-based input, screenshots, or keyboard shortcuts. When NOT to use: Use interact for natural-language element actions, or act for multi-step sequences.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| tabId | Yes | Tab ID | |
| action | Yes | Action to perform | |
| coordinate | No | [x, y] for click/scroll actions | |
| text | No | Text to type or key to press | |
| duration | No | Wait duration in seconds | |
| scroll_direction | No | Scroll direction | |
| scroll_amount | No | Scroll wheel ticks. Default: 3 | |
| ref | No | Element ref or backendNodeId | |
| screenshotQuality | No | Screenshot quality. low: reduced resolution and quality for smaller payload. | |
| screenshotFormat | No | Only for action "screenshot". Image format for the returned base64. Default: "webp" (smallest payload). Use "png" for clients that cannot decode webp inline (e.g. some MCP UIs), at the cost of larger payloads. "screenshotQuality" is ignored for png (PNG is lossless). | |
| includeUserAgentShadowDOM | No | Include user-agent shadow DOM in hit detection. Default: false | |
| force | No | Only for action "screenshot". Force full screenshot, bypassing adaptive degradation. Default: false. | |
| returnAfterState | No | Optional chaining hint. When "ax" or "dom", the response includes a page snapshot of that mode captured after the post-action wait, removing the need for a follow-up read_page call. Default: "none". |