Glama

get_ui_elements

Read-only

Query visible UI elements in macOS applications using Accessibility API to retrieve roles, positions, sizes, and text content for desktop automation tasks.

Instructions

Query visible UI elements of an application via macOS Accessibility API. Returns element roles, titles, positions (screen coordinates), sizes, and states. May return text content from visible UI elements including sensitive data (passwords in non-secure fields, messages, etc.). Positions are in logical screen coordinates — pass directly to click tool. Coverage varies: native apps expose rich trees; Electron/web apps may expose partial trees; games/custom UIs may expose nothing. Requires Accessibility permission.
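As a rough illustration of the "pass directly to click tool" guidance, the sketch below builds an MCP `tools/call` request for `get_ui_elements` and then forwards an element's position to a follow-up `click` call. The `UIElement` shape and the `x`/`y` argument names for `click` are assumptions made for this example; the page does not document the response format or the click tool's schema.

```typescript
// Hypothetical element shape: the actual response format is not documented on this page.
interface UIElement {
  role: string;
  title?: string;
  position: { x: number; y: number }; // logical screen coordinates
  size: { width: number; height: number };
}

// JSON-RPC envelope used by MCP for tool invocations.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

function buildToolCall(
  id: number,
  name: string,
  args: Record<string, unknown>,
): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// Query buttons in a named app, limiting traversal depth.
const query = buildToolCall(1, "get_ui_elements", {
  app: "Finder",
  role: "AXButton",
  max_depth: 3,
});

// Forward the reported position unchanged. The argument names x and y are
// an assumption about the click tool, not something this page confirms.
function clickRequestFor(id: number, el: UIElement): ToolCallRequest {
  return buildToolCall(id, "click", { x: el.position.x, y: el.position.y });
}
```

Because the coordinates are already logical screen coordinates, the sketch applies no scaling or offset before handing them to the next call.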

Input Schema

Name | Required | Description | Default
app | No | Target application name. Default: frontmost app. Supports fuzzy matching. |
role | No | Filter by AX role: "AXButton", "AXTextField", "AXStaticText", etc. |
title | No | Filter by element title (substring, case-insensitive). |
max_depth | No | Max tree traversal depth (integer from 1 to 10). | 5
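The constraints behind these parameters come from the Zod schema shown in the Implementation Reference: string length caps of 1,000 (`app`, `title`) and 200 (`role`), and an integer `max_depth` between 1 and 10 that falls back to 5 when omitted. The following is a dependency-free sketch of the same checks, useful for validating arguments on the client side before calling the tool. The names `normalizeArgs` and `checkString` are hypothetical and not part of the server.

```typescript
// Hypothetical mirror of the limits in GetUIElementsInputSchema (no Zod dependency).
interface GetUIElementsArgs {
  app?: string;
  role?: string;
  title?: string;
  max_depth: number;
}

// Accept an optional string and enforce a maximum length.
function checkString(name: string, value: unknown, maxLen: number): string | undefined {
  if (value === undefined) return undefined;
  if (typeof value !== "string") throw new TypeError(`${name} must be a string`);
  if (value.length > maxLen) throw new RangeError(`${name} exceeds ${maxLen} characters`);
  return value;
}

function normalizeArgs(input: Record<string, unknown>): GetUIElementsArgs {
  // max_depth is optional for the caller: it defaults to 5 when omitted.
  const depth = input.max_depth === undefined ? 5 : input.max_depth;
  if (typeof depth !== "number" || !Number.isInteger(depth) || depth < 1 || depth > 10) {
    throw new RangeError("max_depth must be an integer from 1 to 10");
  }

  const out: GetUIElementsArgs = { max_depth: depth };
  const app = checkString("app", input.app, 1000);
  const role = checkString("role", input.role, 200);
  const title = checkString("title", input.title, 1000);
  if (app !== undefined) out.app = app;
  if (role !== undefined) out.role = role;
  if (title !== undefined) out.title = title;
  return out;
}
```

Calling `normalizeArgs({})` yields an object whose only key is `max_depth`, set to 5, which matches what the handler forwards to its helper when no filters are supplied.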

Implementation Reference

  • Handler function for the get_ui_elements tool.
    async function handleGetUIElements(
      args: Record<string, unknown>,
    ): Promise<CallToolResult> {
      const parsed = GetUIElementsInputSchema.parse(args);
    
      const helperArgs: Record<string, unknown> = {
        max_depth: parsed.max_depth,
      };
      if (parsed.app) {
        helperArgs.app = await resolveAppName(parsed.app);
      }
      if (parsed.role) helperArgs.role = parsed.role;
      if (parsed.title) helperArgs.title = parsed.title;
    
      const response = await runInputHelper("get_ui_elements", helperArgs);
    
      return {
        content: [
          {
            type: "text" as const,
            text: JSON.stringify(response, null, 2),
          },
        ],
      };
    }
  • Input schema definition for get_ui_elements tool.
    const GetUIElementsInputSchema = z.object({
      app: z
        .string()
        .max(1_000)
        .optional()
        .describe(
          "Target application name. Default: frontmost app. Supports fuzzy matching.",
        ),
      role: z
        .string()
        .max(200)
        .optional()
        .describe(
          'Filter by AX role: "AXButton", "AXTextField", "AXStaticText", etc.',
        ),
      title: z
        .string()
        .max(1_000)
        .optional()
        .describe("Filter by element title (substring, case-insensitive)."),
      max_depth: z
        .number()
        .int()
        .min(1)
        .max(10)
        .default(5)
        .describe("Max tree traversal depth (default: 5)."),
    });
  • Tool definition registration for get_ui_elements.
    export const accessibilityToolDefinitions: Tool[] = [
      {
        name: "get_ui_elements",
        description:
          "Query visible UI elements of an application via macOS Accessibility API. " +
          "Returns element roles, titles, positions (screen coordinates), sizes, and states. " +
          "May return text content from visible UI elements including sensitive data (passwords in non-secure fields, messages, etc.). " +
          "Positions are in logical screen coordinates — pass directly to click tool. " +
          "Coverage varies: native apps expose rich trees; Electron/web apps may expose partial trees; " +
          "games/custom UIs may expose nothing. Requires Accessibility permission.",
        inputSchema: zodToToolInputSchema(GetUIElementsInputSchema),
        annotations: {
          readOnlyHint: true,
          destructiveHint: false,
        },
      },
    ];
  • Handler dispatcher registration for get_ui_elements.
    export const accessibilityToolHandlers: Record<
      string,
      (args: Record<string, unknown>) => Promise<CallToolResult>
    > = {
      get_ui_elements: (args) => enqueue(() => handleGetUIElements(args)),
    };
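The dispatcher wraps the handler in `enqueue`, whose implementation is not shown on this page. One plausible reading is that it serializes tool calls so that only one helper invocation runs at a time. The sketch below shows a minimal promise-chain queue under that assumption; it is not the server's actual code.

```typescript
// Hypothetical serial queue: each task starts only after the previous one settles.
let tail: Promise<unknown> = Promise.resolve();

function enqueue<T>(task: () => Promise<T>): Promise<T> {
  // Chain the task onto the current tail so tasks run one at a time.
  const run = tail.then(task);
  // Swallow failures on the tail so one rejected task does not block later ones.
  tail = run.catch(() => undefined);
  // The caller still sees the task's own result or rejection.
  return run;
}
```

With a queue like this, two overlapping `get_ui_elements` calls would execute in submission order, and a failure in the first would be reported to its caller without preventing the second from running.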
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations declare readOnlyHint=true and destructiveHint=false, which the description aligns with by describing a query operation. The description adds valuable context beyond annotations: security implications ('May return text content from visible UI elements including sensitive data'), coordinate system details ('Positions are in logical screen coordinates — pass directly to click tool'), and coverage limitations across different app types.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is efficiently structured with zero wasted sentences. It front-loads the core purpose, then sequentially covers output details, security notes, coordinate usage, coverage variability, and permission requirements—all in a compact, logical flow.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (UI element querying with security and coverage nuances), the description is complete despite no output schema. It explains what data is returned (roles, titles, positions, sizes, states, text content), how to use outputs (coordinates for 'click' tool), limitations, and prerequisites. No critical gaps remain for agent understanding.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema fully documents all parameters. The description does not add any parameter-specific information beyond what the schema provides, such as explaining 'app' fuzzy matching details or 'role' filter examples. Baseline 3 is appropriate when the schema handles parameter documentation.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('Query visible UI elements') and resource ('via macOS Accessibility API'), with detailed output information. It distinguishes itself from siblings like 'screenshot' or 'list_windows' by focusing on UI element properties rather than screenshots or window lists.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly states when to use the tool ('Query visible UI elements') and when it may fall short ('Coverage varies: native apps expose rich trees; Electron/web apps may expose partial trees; games/custom UIs may expose nothing'). It also mentions prerequisites ('Requires Accessibility permission') and implies alternatives like 'screenshot' for visual capture instead of element data.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/antbotlab/mac-use-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.