| apps | List applications with accessible UI elements. Returns application names visible in the accessibility tree.
Use these names to scope other tools (find, elements, screenshot). |
| windows | List all open windows. Returns window IDs, titles, sizes, and app names.
Use window IDs to scope find/elements queries or activate_window.
|
| find | Search for UI elements by name. Finds elements matching a text query, ranked by match quality.
Returns element IDs that you can use with click, set_value, etc.
Use the FULL visible text for best results (e.g. "Send Message",
not just "Send").
Args:
query: Text to search for (e.g. "Send Message", "Submit", "Search").
app: Scope to this application (e.g. "Firefox", "Slack").
window_id: Scope to this window.
role: Only match this role (e.g. "button", "text_field", "link").
states: Only match elements with ALL these states (e.g. ["enabled", "visible"]).
max_results: Maximum matches to return.
fields: Which fields to search -- ["name"], ["name", "value"], or ["name", "value", "description"].
source: "full" (default, merged native+web), "ax" (CDP accessibility tree only), "native" (platform only), or "dom" (live DOM).
|
| elements | Get UI elements from the accessibility tree. Returns a broad view of available elements. Use find() instead
when you know the element's name -- it is faster and ranked.
Args:
app: Scope to this application.
window_id: Scope to this window.
tree: If true, include parent/child hierarchy.
max_depth: Maximum tree depth (0 = immediate children only).
root_element: Start from this element ID (drill into a container).
max_elements: Maximum elements to return (prevents huge results).
role: Only include this role (e.g. "button", "text_field").
states: Only include elements with ALL these states.
named_only: If true, exclude elements with empty names.
sort_by: None (default, tree order) or "position" for reading order (top-to-bottom, left-to-right).
source: "full" (default, merged native+web), "ax" (CDP accessibility tree only), "native" (platform only), or "dom" (live DOM).
|
| get_element | Get a single element by its ID with full detail. Returns a fresh snapshot with current states, value, supported
actions, and description. Use this to inspect an element
before calling the ``action()`` tool — the actions list shows
exactly which raw action names are available.
Args:
element_id: The element ID (from find/elements results).
|
| screenshot | Capture the screen and return an image. With no arguments, captures the full desktop. Specify one of
app, window_id, or element_id to crop to that target.
Args:
app: Crop to this application's window.
window_id: Crop to this specific window.
element_id: Crop to this element's bounding box.
padding: Extra pixels around the crop region.
monitor: Capture only this monitor (0-indexed).
|
| click | Click an element by ID, or at screen coordinates. Pass ``element_id`` to click via the element's native
accessibility action (most reliable). Pass ``x`` and ``y``
to click directly at screen coordinates instead — useful
when clicking by ID triggers an unintended action (e.g.
opens a dropdown instead of focusing a text entry).
Every element shows its position as @(x,y) in listings.
Coordinate clicks always report OK even if nothing was hit —
verify the result with a screenshot or find().
Args:
element_id: The element ID to click.
x: Screen X coordinate (use with y instead of element_id).
y: Screen Y coordinate (use with x instead of element_id).
button: "left" (default) or "right".
double_click: If true, perform a double-click instead.
Cannot be combined with button="right".
|
| set_value | Set text content of an editable element.
Args:
element_id: The element ID (a text field, combo box, etc.).
value: The text to write.
replace: If true, clear the field first and replace all content.
If false (default), insert at the current cursor position.
|
| set_numeric_value | Set the numeric value of a range element (slider, spinbox).
Args:
element_id: The element ID (a slider, spin button, etc.).
value: The numeric value to set.
|
| focus | Move keyboard focus to an element.
Args:
element_id: The element ID to focus.
|
| action | Perform a raw accessibility action by exact name. Use this when the convenience functions (click, focus, etc.)
do not cover what you need. Call ``get_element`` first to
see the element's actions list, then pass the exact name here.
Args:
element_id: The element ID.
action_name: Exact action name (e.g. "activate", "expand or collapse", "ShowMenu").
|
| type_text | Type text into the currently focused element. Simulates keyboard input. Focus a text field first with
click() or focus(), then type into it.
Special characters:
\n = Enter (line break), \t = Tab (next field),
\b = Backspace (delete previous character).
Args:
text: The text to type.
|
| press_key | Press a key or key combination. Single key: "enter", "tab", "escape", "f5", "backspace".
Combination: ["ctrl", "s"], ["ctrl", "shift", "p"], ["alt", "f4"].
Args:
keys: A single key name, or a list of keys for a combination
(all held together, then released in reverse order).
repeat: Number of times to press (default 1).
|
| mouse_move | Move the mouse cursor to an element or to screen coordinates. Use this before scroll() to scroll within a specific area.
Args:
element_id: The element ID to move the cursor to.
x: Screen X coordinate (use with y instead of element_id).
y: Screen Y coordinate (use with x instead of element_id).
|
| scroll | Scroll at the current cursor position. Move the cursor to the target area first with mouse_move(),
then call scroll().
Args:
direction: One of "up", "down", "left", "right".
amount: Number of scroll ticks (default 3).
|
| activate_window | Bring a window to the foreground. Use windows() to find the window ID first.
Args:
window_id: The window ID to activate.
|
| wait_for | Wait for elements to appear or disappear. Polls until matching elements are found (or gone) or timeout.
Use after actions that trigger UI changes.
Args:
element: Text to search for. Pass a single string (e.g.
"Submit") or a list of strings (e.g. ["Success", "Error"])
for multi-query mode. With mode="any", returns as soon
as any query matches. With mode="all", waits until every
query has matched.
app: Scope to this application.
window_id: Scope to this window.
role: Only match this role.
states: Only match elements with ALL these states.
fields: Which fields to search (default: ["name"]).
mode: "any" (return when any query matches) or "all"
(wait for all queries to match). Only meaningful when
element is a list.
timeout: Maximum seconds to wait (default 10).
source: "full" (default), "ax", "native", or "dom".
max_results: Maximum elements to return (default 5).
wait_for_new: If true, ignore elements already present -- wait for NEW ones.
gone: If true, wait for matching elements to DISAPPEAR instead.
|
| wait_for_app | Wait for an application to appear or disappear. Polls the application list until the app is found (or gone).
Use after launching or closing an application.
Args:
app: Application name to wait for (e.g. "Firefox", "Slack").
timeout: Maximum seconds to wait (default 10).
gone: If true, wait for the app to DISAPPEAR instead.
|
| wait_for_window | Wait for a window to appear or disappear. Polls the window list until a window with a matching title is
found (or gone). Use after actions that open or close windows.
Args:
title: Window title to search for (substring match).
app: Only look for windows in this application.
timeout: Maximum seconds to wait (default 10).
gone: If true, wait for the window to DISAPPEAR instead.
|
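
The tools above are usually chained: locate an element with find, act on it, then confirm the result with wait_for. The following is a minimal sketch of that pattern, not the actual client API. The `call(tool_name, **arguments)` transport, the `"id"` key on returned elements, the assumption that wait_for returns a truthy value on a match, and the query strings "Message", "Sent", and "Error" are all hypothetical; only the tool names and argument names come from the table. `RecordingStub` is a stand-in that logs calls so the sketch runs without a real desktop.

```python
from typing import Any, Callable

# Hypothetical transport: call(tool_name, **arguments) -> result.
ToolCaller = Callable[..., Any]


def send_message(call: ToolCaller, app: str, text: str) -> bool:
    """Find a message field, type into it, click Send, and wait for feedback."""
    # Scope the search by app and role so the top-ranked match is the field.
    fields = call("find", query="Message", app=app, role="text_field", max_results=1)
    if not fields:
        return False
    # Assumed result shape: each match is a dict with an "id" key.
    call("focus", element_id=fields[0]["id"])
    call("type_text", text=text)

    # Use the full visible text of the button, as the find description advises.
    buttons = call("find", query="Send Message", app=app, role="button", max_results=1)
    if not buttons:
        return False
    call("click", element_id=buttons[0]["id"])

    # Multi-query wait: return as soon as either outcome appears.
    outcome = call("wait_for", element=["Sent", "Error"], app=app, mode="any", timeout=10)
    return bool(outcome)


class RecordingStub:
    """Stand-in for a real client: logs every call and returns canned results."""

    def __init__(self) -> None:
        self.log: list[tuple[str, dict[str, Any]]] = []

    def __call__(self, tool: str, **arguments: Any) -> Any:
        self.log.append((tool, arguments))
        if tool == "find":
            return [{"id": f"el-{len(self.log)}", "name": arguments["query"]}]
        if tool == "wait_for":
            return [{"id": "el-sent", "name": "Sent"}]
        return "OK"


if __name__ == "__main__":
    stub = RecordingStub()
    print(send_message(stub, app="Slack", text="hello"))
    for tool, arguments in stub.log:
        print(tool, arguments)
```

Clicking by `element_id` rather than by coordinates follows the guidance in the click row: the native accessibility action is the most reliable path, and coordinate clicks report OK even when nothing was hit.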