get_ui_tree
Capture the current Android screen's UI hierarchy to analyze visible elements, their properties, text, bounds, and states for automation and debugging.
Instructions
Capture the current UI hierarchy from the Android screen. Returns a structured representation of all visible UI elements with their properties, text, bounds, and states. This is the primary way to understand what is currently on screen.
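Since the returned elements carry bounds, a small sketch of turning a bounds value into a tap target may help. It assumes the raw uiautomator dump notation `[left,top][right,bottom]`; how this tool exposes bounds in its own output is not shown here, and `Bounds`, `parseBounds`, and `centerOf` are hypothetical names used only for illustration.

```typescript
// Hypothetical helpers: not part of the tool's published code.
interface Bounds {
  left: number;
  top: number;
  right: number;
  bottom: number;
}

// Parse a uiautomator-style bounds string such as "[0,66][1080,210]".
// Returns null when the string does not match that notation.
function parseBounds(raw: string): Bounds | null {
  const match = /^\[(-?\d+),(-?\d+)\]\[(-?\d+),(-?\d+)\]$/.exec(raw.trim());
  if (!match) return null;
  return {
    left: Number(match[1]),
    top: Number(match[2]),
    right: Number(match[3]),
    bottom: Number(match[4]),
  };
}

// Centre point of a bounds rectangle, e.g. for use as a tap coordinate.
function centerOf(b: Bounds): { x: number; y: number } {
  return {
    x: Math.round((b.left + b.right) / 2),
    y: Math.round((b.top + b.bottom) / 2),
  };
}

const bounds = parseBounds("[0,66][1080,210]");
if (bounds) {
  console.log(centerOf(bounds)); // { x: 540, y: 138 }
}
```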
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| format | No | Output format: "summary" (readable, compact) or "full" (complete JSON tree) | summary |
| device_id | No | Device serial number | |
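As a rough illustration of the two parameters above, the sketch below builds an MCP `tools/call` request for this tool. It assumes a client that sends raw JSON-RPC messages; `buildGetUiTreeCall` is a hypothetical helper, and `emulator-5554` is only an example serial. The server applies the `summary` default itself, so setting it on the client is optional and done here only to make the default visible.

```typescript
type UiTreeFormat = "summary" | "full";

interface GetUiTreeArgs {
  format?: UiTreeFormat;
  device_id?: string;
}

// Hypothetical helper: builds a JSON-RPC "tools/call" request for get_ui_tree.
function buildGetUiTreeCall(id: number, args: GetUiTreeArgs = {}) {
  const toolArgs: { format: UiTreeFormat; device_id?: string } = {
    // Mirror the documented default so the request is explicit.
    format: args.format ?? "summary",
  };
  // Only include device_id when the caller supplied one.
  if (args.device_id !== undefined) {
    toolArgs.device_id = args.device_id;
  }
  return {
    jsonrpc: "2.0" as const,
    id,
    method: "tools/call" as const,
    params: { name: "get_ui_tree", arguments: toolArgs },
  };
}

// Default call: summary format, no explicit device.
console.log(JSON.stringify(buildGetUiTreeCall(1), null, 2));

// Full tree from a specific device (example serial).
console.log(
  JSON.stringify(buildGetUiTreeCall(2, { format: "full", device_id: "emulator-5554" }), null, 2)
);
```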
Implementation Reference
- The implementation of getUITree, which executes adb shell uiautomator dump, reads the XML, and parses it into a UIElement tree.

      export async function getUITree(deviceId?: string): Promise<UIElement> {
        const resolved = await deviceManager.resolveDeviceId(deviceId);

        // Dump UI hierarchy to device file
        await adbShell(['uiautomator', 'dump', '/sdcard/window_dump.xml'], resolved, 15000);

        // Read the XML content
        const catResult = await adbShell(['cat', '/sdcard/window_dump.xml'], resolved);
        const xmlContent = catResult.stdout;

        if (!xmlContent || xmlContent.includes('ERROR') || !xmlContent.includes('<hierarchy')) {
          throw new Error(`UIAutomator dump failed or returned empty content: ${xmlContent.substring(0, 200)}`);
        }

        // Parse XML
        const parser = new XMLParser({
          ignoreAttributes: false,
          attributeNamePrefix: '@_',
          isArray: (name) => name === 'node',
        });

        const parsed = parser.parse(xmlContent);
        const hierarchy = parsed?.hierarchy;

        if (!hierarchy) {
          throw new Error('Failed to parse UI hierarchy XML');
        }

        // The root hierarchy node wraps all content
        const rootElement = parseNode(hierarchy);
        rootElement.className = 'hierarchy';

        // Clean up temp file
        adbShell(['rm', '-f', '/sdcard/window_dump.xml'], resolved).catch(() => {});

        log.info('UI tree captured', { deviceId: resolved });
        return rootElement;
      }

- src/controllers/ui-tools.ts:21-65 (registration): Registration of the get_ui_tree tool within the MCP server.

      server.registerTool(
        'get_ui_tree',
        {
          description: 'Capture the current UI hierarchy from the Android screen. Returns a structured representation of all visible UI elements with their properties, text, bounds, and states. This is the primary way to understand what is currently on screen.',
          inputSchema: {
            format: z.enum(['summary', 'full']).optional().default('summary').describe('Output format: "summary" (readable, compact) or "full" (complete JSON tree)'),
            device_id: z.string().optional().describe('Device serial number'),
          },
        },
        async ({ format, device_id }) => {
          return await metrics.measure('get_ui_tree', device_id || 'default', async () => {
            const tree = await getUITree(device_id);

            if (format === 'full') {
              return {
                content: [{
                  type: 'text' as const,
                  text: JSON.stringify({ success: true, tree }, null, 2),
                }],
              };
            }

            // Summary format: more AI-friendly
            const summary = summarizeTree(tree);
            const allElements = flattenTree(tree);
            const clickableCount = allElements.filter(e => e.clickable).length;
            const textElements = allElements.filter(e => e.text).length;

            return {
              content: [{
                type: 'text' as const,
                text: JSON.stringify({
                  success: true,
                  stats: {
                    totalElements: allElements.length,
                    clickableElements: clickableCount,
                    textElements: textElements,
                  },
                  uiTree: summary,
                }, null, 2),
              }],
            };
          });
        }
      );
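
The summary branch above relies on flattenTree, whose body is not shown in this reference. The following is a minimal sketch of how such a helper and the resulting stats could work. The `UIElement` shape used here (with `children`, `clickable`, and `text` fields) is an assumption, and `computeStats` is a hypothetical name; the project's real helpers may differ.

```typescript
// Assumed element shape; the real UIElement type is not shown in the source.
interface UIElement {
  className: string;
  text?: string;
  clickable?: boolean;
  children?: UIElement[];
}

// Hypothetical stand-in for flattenTree: depth-first, parent before children.
function flattenTree(root: UIElement): UIElement[] {
  const out: UIElement[] = [root];
  for (const child of root.children ?? []) {
    out.push(...flattenTree(child));
  }
  return out;
}

// Compute the same three counters the summary response reports.
function computeStats(root: UIElement) {
  const all = flattenTree(root);
  return {
    totalElements: all.length,
    clickableElements: all.filter((e) => e.clickable).length,
    // Truthiness check: elements with empty-string text are not counted.
    textElements: all.filter((e) => e.text).length,
  };
}

const sample: UIElement = {
  className: "hierarchy",
  children: [
    {
      className: "android.widget.FrameLayout",
      children: [
        { className: "android.widget.TextView", text: "Settings" },
        { className: "android.widget.Button", text: "OK", clickable: true },
        { className: "android.widget.ImageView", clickable: true },
      ],
    },
  ],
};

console.log(computeStats(sample));
// { totalElements: 5, clickableElements: 2, textElements: 2 }
```

Note that totalElements includes the synthetic root "hierarchy" element in this sketch, because the root itself is part of the flattened list.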