by rahulkr

get_clickable_elements

Identify all interactive screen elements with coordinates to determine what can be tapped during Android UI testing and debugging.

Instructions

Get all clickable/interactive elements on screen with their coordinates. Perfect for understanding what can be tapped.

Input Schema

Name           Required   Description   Default
device_serial  No

Output Schema

Name    Required   Description   Default
result  Yes
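
Judging by the handler shown under Implementation Reference, each entry in result is a dictionary that may carry text, content_desc, resource_id, bounds, center, size, and class. The following is a minimal sketch of how a caller might consume that list. The sample values are made up, and find_by_label and tap_command are hypothetical helpers that are not part of the server; the only external command assumed is the standard adb shell input tap.

```python
# Illustrative sample only: the keys mirror what the handler emits,
# the values are invented for this sketch.
sample_result = [
    {
        "text": "Sign in",
        "content_desc": "",
        "resource_id": "com.example:id/sign_in",
        "bounds": {"x1": 100, "y1": 200, "x2": 300, "y2": 280},
        "center": {"x": 200, "y": 240},
        "size": {"width": 200, "height": 80},
        "class": "android.widget.Button",
    },
    {
        "text": "",
        "content_desc": "More options",
        "resource_id": "",
        "bounds": {"x1": 960, "y1": 60, "x2": 1080, "y2": 180},
        "center": {"x": 1020, "y": 120},
        "size": {"width": 120, "height": 120},
        "class": "android.widget.ImageView",
    },
]


def find_by_label(elements, label):
    """Return the first element whose text or content_desc equals label (hypothetical helper)."""
    for element in elements:
        if label in (element.get("text"), element.get("content_desc")):
            return element
    return None


def tap_command(element, device_serial=None):
    """Build an adb argv list that taps the element's center (hypothetical helper)."""
    center = element["center"]
    cmd = ["adb"]
    if device_serial:
        cmd += ["-s", device_serial]
    return cmd + ["shell", "input", "tap", str(center["x"]), str(center["y"])]


target = find_by_label(sample_result, "More options")
print(tap_command(target))
```

The command is only built here, not executed, so the sketch can be tried without a connected device.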

Implementation Reference

  • The handler function decorated with @mcp.tool() that implements the get_clickable_elements tool. It fetches the UI hierarchy XML, parses for clickable nodes using regex, extracts properties like text, content-desc, resource-id, bounds (with calculated center and size), and class name. Returns a list of dictionaries for each clickable element.
    import re

    @mcp.tool()
    def get_clickable_elements(device_serial: str | None = None) -> list[dict]:
        """
        Get all clickable/interactive elements on screen with their coordinates.
        Perfect for understanding what can be tapped.
        """
        xml = get_ui_hierarchy(device_serial)
        elements = []
        
        # Parse clickable elements. The leading \s keeps long-clickable="true"
        # from being mistaken for clickable="true".
        pattern = r'<node[^>]*\sclickable="true"[^>]*>'
        for match in re.finditer(pattern, xml):
            node = match.group()
            
            element = {}
            
            # Extract text
            text_match = re.search(r'text="([^"]*)"', node)
            if text_match:
                element['text'] = text_match.group(1)
            
            # Extract content-desc
            desc_match = re.search(r'content-desc="([^"]*)"', node)
            if desc_match:
                element['content_desc'] = desc_match.group(1)
            
            # Extract resource-id
            id_match = re.search(r'resource-id="([^"]*)"', node)
            if id_match:
                element['resource_id'] = id_match.group(1)
            
            # Extract bounds and calculate center
            bounds_match = re.search(r'bounds="\[(\d+),(\d+)\]\[(\d+),(\d+)\]"', node)
            if bounds_match:
                x1, y1 = int(bounds_match.group(1)), int(bounds_match.group(2))
                x2, y2 = int(bounds_match.group(3)), int(bounds_match.group(4))
                element['bounds'] = {'x1': x1, 'y1': y1, 'x2': x2, 'y2': y2}
                element['center'] = {'x': (x1 + x2) // 2, 'y': (y1 + y2) // 2}
                element['size'] = {'width': x2 - x1, 'height': y2 - y1}
            
            # Extract class
            class_match = re.search(r'class="([^"]*)"', node)
            if class_match:
                element['class'] = class_match.group(1)
            
            if element:
                elements.append(element)
        
        return elements
  • Helper function get_ui_hierarchy used by get_clickable_elements to dump and retrieve the UI hierarchy XML from the device via uiautomator.
    @mcp.tool()
    def get_ui_hierarchy(device_serial: str | None = None) -> str:
        """
        Dump the complete UI hierarchy as XML.
        Shows all visible elements, their properties, bounds, and content descriptions.
        """
        run_adb(["shell", "uiautomator", "dump", "/sdcard/ui_dump.xml"], device_serial)
        output = run_adb(["shell", "cat", "/sdcard/ui_dump.xml"], device_serial)
        run_adb(["shell", "rm", "/sdcard/ui_dump.xml"], device_serial)
        return output
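
The handler above pulls attributes out of the dump with regular expressions. As an alternative sketch, and not the author's implementation, the same extraction could be done with the standard-library xml.etree.ElementTree, which decodes XML entities such as &amp; and compares the clickable attribute exactly. The names parse_clickable, BOUNDS_RE, and SAMPLE_XML are hypothetical, and the sample XML is a hand-written approximation of a uiautomator dump.

```python
import re
import xml.etree.ElementTree as ET

# Bounds look like "[x1,y1][x2,y2]". Negative values are allowed here on the
# assumption that views partly off screen can report them.
BOUNDS_RE = re.compile(r"\[(-?\d+),(-?\d+)\]\[(-?\d+),(-?\d+)\]")

# Hand-written sample that approximates a uiautomator dump.
SAMPLE_XML = """<hierarchy rotation="0">
  <node index="0" text="" resource-id="" class="android.widget.FrameLayout" content-desc="" clickable="false" long-clickable="false" bounds="[0,0][1080,1920]">
    <node index="0" text="Save &amp; exit" resource-id="com.example:id/save" class="android.widget.Button" content-desc="" clickable="true" long-clickable="false" bounds="[100,200][300,280]" />
    <node index="1" text="Title" resource-id="" class="android.widget.TextView" content-desc="" clickable="false" long-clickable="true" bounds="[0,300][1080,400]" />
  </node>
</hierarchy>"""


def parse_clickable(xml_text):
    """Return one dict per node whose clickable attribute is exactly "true"."""
    root = ET.fromstring(xml_text)
    elements = []
    for node in root.iter("node"):
        if node.get("clickable") != "true":
            continue
        element = {
            "text": node.get("text", ""),
            "content_desc": node.get("content-desc", ""),
            "resource_id": node.get("resource-id", ""),
            "class": node.get("class", ""),
        }
        match = BOUNDS_RE.fullmatch(node.get("bounds", ""))
        if match:
            x1, y1, x2, y2 = map(int, match.groups())
            element["bounds"] = {"x1": x1, "y1": y1, "x2": x2, "y2": y2}
            element["center"] = {"x": (x1 + x2) // 2, "y": (y1 + y2) // 2}
            element["size"] = {"width": x2 - x1, "height": y2 - y1}
        elements.append(element)
    return elements


clickable = parse_clickable(SAMPLE_XML)
print(clickable)
```

Unlike the handler, this sketch always emits the four string keys, using an empty string when an attribute is absent. ET.fromstring raises xml.etree.ElementTree.ParseError when the text is not well-formed XML, so a caller would need to handle the case where the dump did not produce a valid file.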

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden of behavioral disclosure. It mentions the tool retrieves elements 'on screen' and includes coordinates, but fails to describe critical behaviors: whether it requires device connectivity, how it handles no elements, if it's read-only or has side effects, performance implications, or error conditions. This leaves significant gaps for safe and effective use.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is highly concise and well-structured: two short sentences that directly state the tool's function and a high-level use case. Every word earns its place, with no redundancy or fluff, making it easy for an agent to parse quickly.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's moderate complexity (interacting with device UI), no annotations, and an output schema (which likely handles return values), the description is incomplete. It covers the basic purpose but misses behavioral details, parameter context, and usage nuances. The output schema may help, but the description alone doesn't provide enough for reliable agent operation without additional inference.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 1 parameter with 0% description coverage, and the tool description provides no parameter information. Since there's only one parameter and no schema details, the baseline is 4, but the description doesn't compensate by explaining the 'device_serial' parameter's role (e.g., optional device targeting). This results in a score of 3, as the description adds no value beyond the bare schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Get all clickable/interactive elements on screen with their coordinates.' It specifies the verb ('Get'), resource ('clickable/interactive elements'), and key output ('coordinates'). However, it doesn't explicitly differentiate from sibling tools like 'find_element_by_id' or 'get_all_text_on_screen', which reduces the score from a perfect 5.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides minimal usage guidance with 'Perfect for understanding what can be tapped,' which implies a context for UI exploration or automation. However, it lacks explicit when-to-use instructions, alternatives (e.g., vs. 'find_element_by_text'), or exclusions. No specific scenarios or prerequisites are mentioned, leaving the agent with vague direction.
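
The gaps noted in the Behavior, Parameters, and Usage Guidelines assessments could plausibly be closed with a fuller docstring, since the handler's docstring is what appears as the tool description. The sketch below is one hypothetical wording, not the author's text. The behavioral statements are read off the implementation shown earlier; the remarks about sibling tools are assumptions based only on their names, and the function name and placeholder body are illustrative.

```python
from typing import List, Optional


def get_clickable_elements_documented(device_serial: Optional[str] = None) -> List[dict]:
    """
    List the clickable elements currently on screen, with tap coordinates.

    When to use: call this to discover what can be tapped before sending
    input. If a resource id or label is already known, find_element_by_id or
    find_element_by_text may be more direct (assumption based on their names).
    To read text regardless of clickability, see get_all_text_on_screen.

    Behavior: needs a device reachable over adb. Dumps the UI hierarchy with
    uiautomator to /sdcard/ui_dump.xml, reads it back, then deletes the file.
    Returns an empty list when no node is marked clickable.

    Args:
        device_serial: Optional adb serial of the target device. How the
            server picks a device when this is omitted depends on run_adb,
            which is not shown here.

    Returns:
        A list of dicts with any of: text, content_desc, resource_id, class,
        bounds (x1, y1, x2, y2), center (x, y), size (width, height).
    """
    # Placeholder body: this sketch documents the tool, it does not implement it.
    raise NotImplementedError("description sketch only")


print(get_clickable_elements_documented.__doc__)
```

A description of this length trades some of the conciseness score for coverage of side effects, the empty-result case, and the role of device_serial.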

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/rahulkr/r_adb_mcp_server'
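
The same request can be made from Python with only the standard library. This is a minimal sketch that assumes the endpoint returns a JSON body, which the page does not state; server_url and fetch_server are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://glama.ai/api/mcp/v1/servers"


def server_url(owner, name):
    """Build the directory API URL for one server, escaping each path segment."""
    return f"{API_BASE}/{urllib.parse.quote(owner, safe='')}/{urllib.parse.quote(name, safe='')}"


def fetch_server(owner, name, timeout=10.0):
    """GET the server record and decode it, assuming the response is JSON."""
    request = urllib.request.Request(
        server_url(owner, name),
        headers={"Accept": "application/json"},
        method="GET",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Only the URL is built here; call fetch_server("rahulkr", "r_adb_mcp_server")
# to perform the actual network request.
print(server_url("rahulkr", "r_adb_mcp_server"))
```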

If you have feedback or need assistance with the MCP directory API, please join our Discord server.