detect_elements_visually

Identify clickable and focusable UI elements on an Android screen by pairing a screenshot with UI tree data, so that automation can locate interactive components.

Instructions

Detect interactive elements on the screen using a combination of screenshots and UI tree analysis. Returns a list of all clickable/focusable elements with their coordinates and descriptions. Use this when you need to find elements to interact with.
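As a rough illustration of how a caller might use the returned list, the sketch below looks up an element by its description and reads its tap coordinates. The element shape and the `[left,top][right,bottom]` bounds format are taken from the implementation reference further down this page; the helper names `findByDescription` and `parseBounds` and the sample data are hypothetical and not part of the tool.

```typescript
// Element shape as returned in `interactiveElements` (from the listing's return type).
interface InteractiveElement {
  description: string;
  centerX: number;
  centerY: number;
  bounds: string; // formatted as "[left,top][right,bottom]"
}

// Hypothetical helper: first element whose description contains `query`, ignoring case.
function findByDescription(
  elements: InteractiveElement[],
  query: string,
): InteractiveElement | undefined {
  const needle = query.toLowerCase();
  return elements.find((el) => el.description.toLowerCase().includes(needle));
}

// Hypothetical helper: parse "[left,top][right,bottom]" into numbers, or null if malformed.
function parseBounds(
  bounds: string,
): { left: number; top: number; right: number; bottom: number } | null {
  const m = /^\[(-?\d+),(-?\d+)\]\[(-?\d+),(-?\d+)\]$/.exec(bounds);
  if (!m) return null;
  return {
    left: Number(m[1]),
    top: Number(m[2]),
    right: Number(m[3]),
    bottom: Number(m[4]),
  };
}

// Made-up sample data, for illustration only.
const sample: InteractiveElement[] = [
  { description: 'Search', centerX: 540, centerY: 160, bounds: '[40,100][1040,220]' },
  { description: 'Settings', centerX: 980, centerY: 2200, bounds: '[900,2120][1060,2280]' },
];

const target = findByDescription(sample, 'settings');
if (target) {
  console.log(`tap at (${target.centerX}, ${target.centerY})`, parseBounds(target.bounds));
}
```

Substring matching is only one possible lookup strategy. Descriptions can fall back to a resource id or a class name, so an exact match on visible text will not always find the element.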

Input Schema

Name        Required   Description            Default
device_id   No         Device serial number   (none)
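Because `device_id` is the only input and it is optional, a call can be made with an empty arguments object. The sketch below builds the JSON-RPC `tools/call` request that an MCP client would send. The helper name `buildDetectRequest` and the serial `emulator-5554` are illustrative assumptions. How the server chooses a device when `device_id` is omitted is not shown on this page.

```typescript
// Arguments accepted by the tool: a single optional string.
interface DetectElementsArgs {
  device_id?: string;
}

// Hypothetical helper that builds an MCP `tools/call` JSON-RPC request.
function buildDetectRequest(id: number, deviceId?: string) {
  const args: DetectElementsArgs = {};
  if (deviceId !== undefined) {
    // Only include the key when a serial is given; otherwise the server resolves the device.
    args.device_id = deviceId;
  }
  return {
    jsonrpc: '2.0' as const,
    id,
    method: 'tools/call' as const,
    params: { name: 'detect_elements_visually', arguments: args },
  };
}

// Without a device id, then with an example serial (assumed value).
console.log(JSON.stringify(buildDetectRequest(1), null, 2));
console.log(JSON.stringify(buildDetectRequest(2, 'emulator-5554'), null, 2));
```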

Implementation Reference

  • The implementation of detectElementsVisually, which captures a screenshot and parses the UI tree to identify interactive elements.
    export async function detectElementsVisually(deviceId?: string): Promise<{
      screenshot: ScreenshotResult;
      uiTree: string;
      interactiveElements: Array<{
        description: string;
        centerX: number;
        centerY: number;
        bounds: string;
      }>;
    }> {
      const resolved = await deviceManager.resolveDeviceId(deviceId);
    
      const screenshot = await captureScreenshot(resolved);
    
      let uiTree = '';
      const interactiveElements: Array<{
        description: string;
        centerX: number;
        centerY: number;
        bounds: string;
      }> = [];
    
      try {
        const tree = await getUITree(resolved);
        uiTree = summarizeTree(tree);
    
        // Extract interactive elements
        const { flattenTree } = await import('../uiautomator/ui-tree-parser.js');
        const allElements = flattenTree(tree);
    
        for (const el of allElements) {
          if ((el.clickable || el.focusable) && el.bounds.width > 0 && el.bounds.height > 0) {
            const desc = el.text || el.contentDesc || el.resourceId || el.className.split('.').pop() || 'unknown';
            interactiveElements.push({
              description: desc,
              centerX: el.bounds.centerX,
              centerY: el.bounds.centerY,
              bounds: `[${el.bounds.left},${el.bounds.top}][${el.bounds.right},${el.bounds.bottom}]`,
            });
          }
        }
      } catch (error) {
        log.warn('UI tree unavailable for visual detection', {
          error: error instanceof Error ? error.message : String(error),
        });
      }
    
      log.info('Visual detection completed', {
        deviceId: resolved,
        interactiveCount: interactiveElements.length,
      });
    
      return { screenshot, uiTree, interactiveElements };
    }
  • The MCP tool registration for detect_elements_visually.
    server.registerTool(
      'detect_elements_visually',
      {
        description: 'Detect interactive elements on the screen using a combination of screenshots and UI tree analysis. Returns a list of all clickable/focusable elements with their coordinates and descriptions. Use this when you need to find elements to interact with.',
        inputSchema: {
          device_id: z.string().optional().describe('Device serial number'),
        },
      },
      async ({ device_id }) => {
        return await metrics.measure('detect_elements_visually', device_id || 'default', async () => {
          const detection = await detectElementsVisually(device_id);
    
          const content: Array<{ type: 'text'; text: string } | { type: 'image'; data: string; mimeType: string }> = [];
    
          content.push({
            type: 'image' as const,
            data: detection.screenshot.base64,
            mimeType: 'image/png',
          });
    
          content.push({
            type: 'text' as const,
            text: JSON.stringify({
              success: true,
              interactiveElements: detection.interactiveElements,
              elementCount: detection.interactiveElements.length,
              uiTree: detection.uiTree,
            }, null, 2),
          });
    
          return { content };
        });
      }
    );
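The handler above returns two content items: the screenshot as a base64 image, followed by a text item holding the JSON summary. A client could separate them along the lines of the sketch below. The helper name `splitDetectionResult` and the sample values are hypothetical. The payload fields mirror the `JSON.stringify` call in the registration.

```typescript
// Content items as typed in the handler above.
type ContentItem =
  | { type: 'text'; text: string }
  | { type: 'image'; data: string; mimeType: string };

// Fields of the JSON text item, mirroring the handler's JSON.stringify call.
interface DetectionPayload {
  success: boolean;
  interactiveElements: Array<{
    description: string;
    centerX: number;
    centerY: number;
    bounds: string;
  }>;
  elementCount: number;
  uiTree: string;
}

// Hypothetical helper: pick out the first image and the first JSON text item.
function splitDetectionResult(content: ContentItem[]): {
  imageBase64: string | undefined;
  payload: DetectionPayload | undefined;
} {
  let imageBase64: string | undefined;
  let payload: DetectionPayload | undefined;
  for (const item of content) {
    if (item.type === 'image' && imageBase64 === undefined) {
      imageBase64 = item.data;
    } else if (item.type === 'text' && payload === undefined) {
      // The cast trusts the server; validate the shape if the source is untrusted.
      payload = JSON.parse(item.text) as DetectionPayload;
    }
  }
  return { imageBase64, payload };
}

// Made-up response content, for illustration only.
const sampleContent: ContentItem[] = [
  { type: 'image', data: 'iVBORw0KGgo=', mimeType: 'image/png' },
  {
    type: 'text',
    text: JSON.stringify({
      success: true,
      interactiveElements: [
        { description: 'Search', centerX: 540, centerY: 160, bounds: '[40,100][1040,220]' },
      ],
      elementCount: 1,
      uiTree: '',
    }),
  },
];

const result = splitDetectionResult(sampleContent);
console.log(result.payload?.elementCount, result.payload?.interactiveElements);
```

In the implementation listing, a UI tree failure is only logged as a warning, so a response with `success: true` and `elementCount: 0` may mean either that the screen has no interactive elements or that the tree could not be read.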

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/divineDev-dotcom/android_mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.