mobile_list_elements_on_screen
Retrieves the elements currently on screen, with their coordinates and display text or accessibility labels, for mobile automation on iOS and Android. Results reflect the live screen at the time of the call and should not be cached.
Instructions
List elements on screen and their coordinates, with display text or accessibility label. Do not cache this result.
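As a rough illustration of consuming this tool's output: the registration code in the Implementation Reference returns a string of the form `Found these elements on screen: ` followed by a JSON array, where each entry carries a `coordinates` rectangle. The sketch below assumes exactly that format; `ListedElement`, `parseElements`, and `centerOf` are hypothetical names introduced here and are not part of the server.

```typescript
// Shape of one entry in the JSON payload, mirroring the fields the handler emits.
interface ListedElement {
  type: string;
  text?: string;
  label?: string;
  name?: string;
  value?: string;
  identifier?: string;
  coordinates: { x: number; y: number; width: number; height: number };
  focused?: boolean;
}

// Fixed prefix the handler puts in front of the JSON array.
const PREFIX = "Found these elements on screen: ";

// Hypothetical helper: strip the prefix and parse the JSON array.
function parseElements(output: string): ListedElement[] {
  if (!output.startsWith(PREFIX)) {
    throw new Error("Unexpected tool output format");
  }
  return JSON.parse(output.slice(PREFIX.length)) as ListedElement[];
}

// Hypothetical helper: center point of an element's rectangle,
// which is a natural target for a tap.
function centerOf(el: ListedElement): { x: number; y: number } {
  const { x, y, width, height } = el.coordinates;
  return { x: Math.round(x + width / 2), y: Math.round(y + height / 2) };
}

// Made-up sample payload for illustration only.
const sample =
  PREFIX +
  JSON.stringify([
    {
      type: "android.widget.Button",
      text: "Sign in",
      label: "",
      coordinates: { x: 100, y: 200, width: 300, height: 80 },
    },
  ]);

const [button] = parseElements(sample);
console.log(centerOf(button)); // { x: 250, y: 240 }
```

Because the result must not be cached, a caller would typically request a fresh listing immediately before each interaction rather than reuse coordinates from an earlier call.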
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| noParams | Yes | | |
Implementation Reference
- src/server.ts:377-412 (registration) Registration of the 'mobile_list_elements_on_screen' tool. Includes the Zod input schema for the 'device' parameter and the handler function, which retrieves elements through the Robot interface and formats them into a JSON string for output.

      tool(
        "mobile_list_elements_on_screen",
        "List elements on screen and their coordinates, with display text or accessibility label. Do not cache this result.",
        {
          device: z.string().describe("The device identifier to use. Use mobile_list_available_devices to find which devices are available to you.")
        },
        async ({ device }) => {
          const robot = getRobotFromDevice(device);
          const elements = await robot.getElementsOnScreen();
          const result = elements.map(element => {
            const out: any = {
              type: element.type,
              text: element.text,
              label: element.label,
              name: element.name,
              value: element.value,
              identifier: element.identifier,
              coordinates: {
                x: element.rect.x,
                y: element.rect.y,
                width: element.rect.width,
                height: element.rect.height,
              },
            };
            if (element.focused) {
              out.focused = true;
            }
            return out;
          });
          return `Found these elements on screen: ${JSON.stringify(result)}`;
        }
      );
- src/robot.ts:26-37 (schema) Type definition for ScreenElement, which structures the data returned in the tool's output.

      export interface ScreenElement {
        type: string;
        label?: string;
        text?: string;
        name?: string;
        value?: string;
        identifier?: string;
        rect: ScreenElementRect;
        // currently only on android tv
        focused?: boolean;
      }
- src/android.ts:350-355 (handler) Handler implementation in AndroidRobot: parses the UI hierarchy from the uiautomator dump XML.

      public async getElementsOnScreen(): Promise<ScreenElement[]> {
        const parsedXml = await this.getUiAutomatorXml();
        const hierarchy = parsedXml.hierarchy;
        const elements = this.collectElements(hierarchy.node);
        return elements;
      }
- src/webdriver-agent.ts:281-284 (handler) Handler implementation in WebDriverAgent for iOS: fetches the page source tree and filters interactive elements.

      public async getElementsOnScreen(): Promise<ScreenElement[]> {
        const source = await this.getPageSource();
        return this.filterSourceElements(source.value);
      }
- src/android.ts:311-348 (helper) Recursive helper function in AndroidRobot that traverses the UI XML and collects the relevant ScreenElement instances.

      private collectElements(node: UiAutomatorXmlNode): ScreenElement[] {
        const elements: Array<ScreenElement> = [];
        if (node.node) {
          if (Array.isArray(node.node)) {
            for (const childNode of node.node) {
              elements.push(...this.collectElements(childNode));
            }
          } else {
            elements.push(...this.collectElements(node.node));
          }
        }
        if (node.text || node["content-desc"] || node.hint) {
          const element: ScreenElement = {
            type: node.class || "text",
            text: node.text,
            label: node["content-desc"] || node.hint || "",
            rect: this.getScreenElementRect(node),
          };
          if (node.focused === "true") {
            // only provide it if it's true, otherwise don't confuse llm
            element.focused = true;
          }
          const resourceId = node["resource-id"];
          if (resourceId !== null && resourceId !== "") {
            element.identifier = resourceId;
          }
          if (element.rect.width > 0 && element.rect.height > 0) {
            elements.push(element);
          }
        }
        return elements;
      }
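
The recursive traversal in the Android helper above can be tried in isolation with a small standalone sketch. It is a simplified approximation, not the project's code: the `XmlNode` and `SimpleElement` types and the `parseBounds` and `collect` functions are hypothetical, and `parseBounds` assumes that each node carries a uiautomator-style `bounds` attribute of the form `[left,top][right,bottom]`. The real `getScreenElementRect` is not shown in the reference and may differ.

```typescript
interface Rect { x: number; y: number; width: number; height: number }

// Hypothetical, trimmed-down view of a parsed uiautomator XML node.
interface XmlNode {
  class?: string;
  text?: string;
  "content-desc"?: string;
  hint?: string;
  bounds?: string; // assumed format: "[left,top][right,bottom]"
  node?: XmlNode | XmlNode[];
}

interface SimpleElement { type: string; text?: string; label: string; rect: Rect }

// Assumption: convert a "[l,t][r,b]" bounds string into x/y/width/height.
// Missing or malformed bounds yield an empty rectangle.
function parseBounds(bounds: string | undefined): Rect {
  const m = /^\[(-?\d+),(-?\d+)\]\[(-?\d+),(-?\d+)\]$/.exec(bounds ?? "");
  if (!m) {
    return { x: 0, y: 0, width: 0, height: 0 };
  }
  const [left, top, right, bottom] = m.slice(1).map(Number);
  return { x: left, y: top, width: right - left, height: bottom - top };
}

// Depth-first walk: children are visited first, then the node itself is kept
// only if it has text, a content description, or a hint, and a non-empty rect.
function collect(node: XmlNode): SimpleElement[] {
  const out: SimpleElement[] = [];
  // A parsed XML child may be a single object or an array; normalize to an array.
  const children =
    node.node === undefined ? [] : Array.isArray(node.node) ? node.node : [node.node];
  for (const child of children) {
    out.push(...collect(child));
  }
  if (node.text || node["content-desc"] || node.hint) {
    const rect = parseBounds(node.bounds);
    if (rect.width > 0 && rect.height > 0) {
      out.push({
        type: node.class || "text",
        text: node.text,
        label: node["content-desc"] || node.hint || "",
        rect,
      });
    }
  }
  return out;
}

// Made-up hierarchy for illustration only.
const tree: XmlNode = {
  node: [
    { class: "android.widget.Button", text: "OK", bounds: "[10,20][110,70]" },
    { class: "android.widget.ImageView", "content-desc": "Logo", bounds: "[0,0][0,0]" },
    { node: { class: "android.widget.EditText", hint: "Email", bounds: "[0,100][200,150]" } },
  ],
};

const found = collect(tree);
console.log(found.map(e => e.text ?? e.label)); // [ 'OK', 'Email' ]
```

The zero-sized "Logo" node is dropped by the width and height check, which mirrors how the original helper skips elements that occupy no visible area.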