mouse_click
Click at specific screen coordinates or convert image-based coordinates to screen positions for automated desktop interaction when pixel-level precision is required.
Instructions
Click at screen-absolute coordinates (virtual screen pixels), or pass origin and scale from a dotByDot=true screenshot response to let the server convert image-local coordinates automatically: screen = origin + (x, y) / (scale ?? 1).

Set doubleClick: true for a double-click or tripleClick: true for a triple-click (selects a full line of text). If both are set, tripleClick wins. windowTitle optionally focuses the window first (for pinned-dock setups).

Prefer click_element (UIA) for stable text-addressed clicking in native apps, and browser_click_element for Chrome. Use mouse_click only when pixel coordinates are the only available option.

Pass lensId (from perception_register) to run safety guards (identity stable, foreground, coordinates in rect) before clicking and to receive post.perception state feedback without a screenshot.

Caveats: origin and scale are meaningful ONLY with dotByDot=true screenshot responses. Applying them to scaled detail='text'/'meta' output lands clicks in the wrong positions.
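
The conversion formula above can be sketched in a few lines. This is an illustrative sketch only, not the server's source: the `Point` type and the `toScreenCoords` function name are hypothetical, and the documentation does not say whether the server rounds the result, so no rounding is applied here.

```typescript
// Hypothetical helper mirroring the documented formula:
// screen = origin + (x, y) / (scale ?? 1)
interface Point {
  x: number;
  y: number;
}

function toScreenCoords(x: number, y: number, origin?: Point, scale?: number): Point {
  // Without origin, (x, y) are already screen-absolute and scale is ignored.
  if (origin === undefined) {
    return { x, y };
  }
  // scale is omitted when the screenshot was 1:1, so default to 1.
  const s = scale ?? 1;
  return { x: origin.x + x / s, y: origin.y + y / s };
}

// Assumed example values: a screenshot with origin (100, 200) and scale 0.5.
// Image-local (50, 40) maps to screen (200, 280).
const converted = toScreenCoords(50, 40, { x: 100, y: 200 }, 0.5);
console.log(converted);
```

In practice the caller does not need to run this math: passing origin and scale along with the image-local x and y lets the server do the conversion.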
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| x | Yes | X coordinate. Screen-absolute by default. When 'origin' is provided, treated as image-local (pixel position within the screenshot). | |
| y | Yes | Y coordinate. Screen-absolute by default. When 'origin' is provided, treated as image-local. | |
| origin | No | When set, (x,y) are image-local coords from a screenshot. Server converts to screen coords: screen_x = origin.x + x / (scale ?? 1), screen_y = origin.y + y / (scale ?? 1). Copy origin values directly from the screenshot response text. This eliminates manual coord math and prevents out-of-window clicks. | |
| scale | No | Scale factor from the screenshot response (present only when dotByDotMaxDimension caused a resize). Omit if the screenshot was 1:1. Only used when 'origin' is also provided. | |
| button | No | Mouse button to click | left |
| doubleClick | No | Whether to double-click | |
| tripleClick | No | Whether to triple-click (select a line of text). Takes precedence over doubleClick when both are true. | |
| narrate | No | Narration level. rich includes UIA or browser state diff when supported. | minimal |
| speed | No | Cursor movement speed in px/sec. 0 = instant. | |
| homing | No | Enable homing correction if the target window moved. | |
| windowTitle | No | Partial title of the target window. When set, the window is focused before clicking. | |
| elementName | No | Name or label of the UI element. | |
| elementId | No | AutomationId of the UI element. | |
| forceFocus | No | Bypass Windows foreground-stealing protection before focusing. | |
| trackFocus | No | Detect if focus was stolen after the action. | |
| settleMs | No | Milliseconds to wait before checking post-action state. | |
| lensId | No | Optional perception lens ID from perception_register. When provided, guards are evaluated before clicking (safe.clickCoordinates, target.identityStable) and a perception envelope is attached to post.perception in the response. | |
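
As a rough illustration of how the parameters in the table combine, the sketch below builds one argument object and resolves the click count using the documented precedence (tripleClick over doubleClick). The `MouseClickArgs` interface, the `resolveClickCount` function, and all concrete values are assumptions made for this example. How the arguments are actually sent depends on the MCP client and is not shown.

```typescript
// Hypothetical shape covering a subset of the documented parameters.
interface MouseClickArgs {
  x: number;
  y: number;
  origin?: { x: number; y: number };
  scale?: number;
  button?: string;
  doubleClick?: boolean;
  tripleClick?: boolean;
  windowTitle?: string;
  lensId?: string;
}

// Documented precedence: tripleClick wins when both flags are set.
function resolveClickCount(args: MouseClickArgs): 1 | 2 | 3 {
  if (args.tripleClick) return 3;
  if (args.doubleClick) return 2;
  return 1;
}

// Assumed example: image-local coordinates taken from a dotByDot=true
// screenshot, with origin and scale copied from that response.
const args: MouseClickArgs = {
  x: 50,
  y: 40,
  origin: { x: 100, y: 200 },
  scale: 0.5,
  doubleClick: true,
  tripleClick: true,
  windowTitle: "Notepad", // partial title, illustrative only
};

console.log(resolveClickCount(args)); // 3, because tripleClick takes precedence
```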