mouse_click
Click at screen coordinates, optionally guarded by target window identity and foreground. Supports double-click, triple-click, and coordinate conversion from screenshots using origin and scale.
Instructions
Click at screen coordinates. Normally pass windowTitle so the server auto-guards the click: it verifies the target identity, that the target is in the foreground, and that the coordinate is inside the target rect, then returns post.perception without a confirmation screenshot. origin+scale from dotByDot=true screenshots are converted to screen coords before guarding.

Set doubleClick:true for a double-click or tripleClick:true for a triple-click (selects a full line of text). Prefer click_element (UIA) for native apps and browser_click for Chrome.

Examples:
mouse_click({windowTitle:'Notepad', x:200, y:150}) // guarded: post.perception.status='ok'
mouse_click({x:100, y:100}) // unguarded: post.perception.status='unguarded'

If a guard failure returns a suggestedFix, pass its fixId to approve the fix:
mouse_click({fixId:'fix-...'}) // one-shot, expires in 15s

lensId is optional and only for advanced pinned-target workflows; omit it for normal use.

Caveats: origin+scale are meaningful ONLY with dotByDot=true screenshot responses.
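
The fix-approval flow described above can be sketched as a small helper that inspects a response and decides whether a follow-up call is needed. This is only an illustration: the `ClickResponse` shape and the `nextClickArgs` helper are hypothetical. The description names `post.perception.status`, `suggestedFix`, and `fixId`, but not where `suggestedFix` sits in the response, so the nesting below is an assumption.

```typescript
// Hypothetical response shape. Only post.perception.status, suggestedFix and
// fixId are named in the tool description; their exact nesting is assumed here.
interface ClickResponse {
  post?: { perception?: { status?: string } };
  suggestedFix?: { fixId: string };
}

// Arguments for a follow-up mouse_click call that approves a suggested fix.
interface FixApprovalArgs {
  fixId: string;
}

// Returns the arguments for a follow-up call when the server suggested a fix,
// or null when there is nothing to approve.
function nextClickArgs(res: ClickResponse): FixApprovalArgs | null {
  const fixId = res.suggestedFix?.fixId;
  return fixId ? { fixId } : null;
}
```

Because a fixId is one-shot and expires in 15 seconds, a caller would presumably send the follow-up call immediately rather than storing the ID for later.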
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| x | Yes | X coordinate. Screen-absolute by default. When 'origin' is provided, treated as image-local (pixel position within the screenshot). | |
| y | Yes | Y coordinate. Screen-absolute by default. When 'origin' is provided, treated as image-local. | |
| origin | No | When set, (x,y) are image-local coords from a screenshot. Server converts to screen coords: screen_x = origin.x + x / (scale ?? 1), screen_y = origin.y + y / (scale ?? 1). Copy origin values directly from the screenshot response text. This eliminates manual coord math and prevents out-of-window clicks. | |
| scale | No | Scale factor from screenshot response (only when dotByDotMaxDimension caused a resize). Omit if the screenshot was 1:1. Only used when 'origin' is also provided. | |
| button | No | Mouse button to click | left |
| doubleClick | No | Whether to double-click | |
| tripleClick | No | Whether to triple-click (select a line of text). Takes precedence over doubleClick when both are true. | |
| narrate | No | Narration level. rich includes UIA or browser state diff when supported. | minimal |
| speed | No | Cursor movement speed in px/sec. 0 = instant. | |
| homing | No | Enable homing correction if the target window moved. | |
| windowTitle | No | Partial title of the target window. | |
| elementName | No | Name or label of the UI element. | |
| elementId | No | AutomationId of the UI element. | |
| hwnd | No | Direct window handle ID (takes precedence over windowTitle). Obtain from get_windows response (hwnd field). String type to avoid 64-bit precision issues. | |
| forceFocus | No | Bypass Windows foreground-stealing protection before focusing. | |
| trackFocus | No | Detect if focus was stolen after the action. | |
| settleMs | No | Milliseconds to wait before checking post-action state. | |
| lensId | No | Optional perception lens ID for advanced pinned-target workflows. When provided, guards are evaluated before clicking (safe.clickCoordinates, target.identityStable) and a perception envelope is attached to post.perception in the response. For normal use, omit lensId and pass windowTitle directly — Auto Perception handles tracking. | |
| fixId | No | One-shot fix approval ID. If a previous mouse_click returned a suggestedFix, pass that fixId here to approve it. The server revalidates the fix and executes with corrected args. fixId expires in 15 seconds and can only be used once. | |
| include | No | Optional response-shape opt-in. `['envelope']` returns the self-documenting envelope (`_version` / `data` / `as_of` / `confidence`). `['raw']` forces raw shape (overrides DESKTOP_TOUCH_ENVELOPE=1 server default). Default behaviour is raw shape (compat with existing clients). | |
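
The conversion stated for the origin parameter, screen = origin + local / (scale ?? 1), can be checked by hand with a short sketch. The server performs this conversion itself, so the code below is illustrative only; `Point` and `imageToScreen` are hypothetical names, and any rounding the server may apply is not documented and is not modelled.

```typescript
interface Point {
  x: number;
  y: number;
}

// Mirrors the documented formula:
//   screen_x = origin.x + x / (scale ?? 1)
//   screen_y = origin.y + y / (scale ?? 1)
function imageToScreen(local: Point, origin: Point, scale?: number): Point {
  const s = scale ?? 1; // an omitted scale means the screenshot was 1:1
  return {
    x: origin.x + local.x / s,
    y: origin.y + local.y / s,
  };
}

// A screenshot whose origin is (100, 200), downscaled by a factor of 0.5:
// image pixel (50, 25) corresponds to screen point (200, 250).
const scaled = imageToScreen({ x: 50, y: 25 }, { x: 100, y: 200 }, 0.5);

// The same pixel in a 1:1 screenshot maps to (150, 225).
const unscaled = imageToScreen({ x: 50, y: 25 }, { x: 100, y: 200 });
```

In practice the caller passes x, y, origin, and scale to mouse_click unchanged and lets the server do this arithmetic; the sketch is only useful for sanity-checking where a click will land.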