Glama

scroll_element

Scrolls a specific UI element in a chosen direction by a specified pixel distance, enabling targeted navigation within Android app interfaces.

Instructions

Scroll a specific UI element in the given direction for a specified distance.

Args:
    element: Either an integer (element index from annotated screenshot) or string (element name)
    direction: Direction to scroll - 'up', 'down', 'left', 'right'
    distance: Distance to scroll in pixels (default: 200)
    duration: Duration of scroll gesture in milliseconds (default: 300)
    device_id: Optional device ID to target specific device/emulator
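As an illustration, the arguments above could be packaged into an MCP `tools/call` request as follows. This is only a sketch: the helper name `build_scroll_request`, the request `id`, and the element name "Messages list" are made up for the example, and how the payload reaches the server depends on the client in use.

```python
import json


def build_scroll_request(element, direction, distance=200, duration=300,
                         device_id=None, request_id=1):
    """Build a JSON-RPC 2.0 'tools/call' payload for scroll_element (illustrative helper)."""
    arguments = {
        "element": element,      # int index or str name
        "direction": direction,  # 'up', 'down', 'left', or 'right'
        "distance": distance,    # pixels
        "duration": duration,    # milliseconds
    }
    if device_id is not None:
        # Only include device_id when a specific device/emulator is targeted.
        arguments["device_id"] = device_id
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "scroll_element", "arguments": arguments},
    }


# Scroll element 3 from the annotated screenshot up by 400 px over half a second.
by_index = build_scroll_request(3, "up", distance=400, duration=500)

# Scroll a named element left using the documented defaults (the name is hypothetical).
by_name = build_scroll_request("Messages list", "left")

print(json.dumps(by_index, indent=2))
```

Omitting `distance` and `duration` from the call relies on the documented defaults of 200 pixels and 300 milliseconds.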

Input Schema

Name        Required   Description   Default
device_id   No
direction   Yes
distance    No
duration    No
element     Yes

Implementation Reference

  • The handler function for the 'scroll_element' tool. It is decorated with @mcp.tool(), which registers the tool in the FastMCP server and automatically generates the input schema from the function signature and docstring. The function locates the target UI element by index or name, calculates scroll gesture coordinates within the element's bounds, and executes a swipe gesture using ADB to simulate scrolling.
    @mcp.tool()
    async def scroll_element(element, direction: str, distance: int = 200, duration: int = 300, device_id: str = None) -> dict:
        """Scroll a specific UI element in the given direction for a specified distance.
    
        Args:
            element: Either an integer (element index from annotated screenshot) or string (element name)
            direction: Direction to scroll - 'up', 'down', 'left', 'right'
            distance: Distance to scroll in pixels (default: 200)
            duration: Duration of scroll gesture in milliseconds (default: 300)
            device_id: Optional device ID to target specific device/emulator
        """
        try:
            # Validate direction parameter
            direction = direction.lower()
            if direction not in ['up', 'down', 'left', 'right']:
                return {
                    "success": False,
                    "error": f'Invalid direction: {direction}. Use "up", "down", "left", or "right"',
                    "element": element
                }
    
            # Validate distance parameter
            if distance <= 0:
                return {
                    "success": False,
                    "error": "Distance must be a positive integer",
                    "element": element,
                    "distance": distance
                }
    
            # Get UI elements
            elements = get_ui_elements(device_id)
    
            if not elements:
                return {
                    "success": False,
                    "error": "No UI elements found on screen",
                    "element": element
                }
    
            # Find target element by index or name
            target_element = None
            if isinstance(element, int):
                # Find by index
                if 0 <= element < len(elements):
                    target_element = elements[element]
                else:
                    return {
                        "success": False,
                        "error": f"Element index {element} is out of range (0-{len(elements)-1})",
                        "element": element,
                        "available_count": len(elements)
                    }
            else:
                # Find by name
                element_str = str(element)
                for elem in elements:
                    if elem.name == element_str:
                        target_element = elem
                        break
    
                if not target_element:
                    return {
                        "success": False,
                        "error": f"Element with name '{element_str}' not found",
                        "element": element,
                        "available_elements": [elem.name for elem in elements[:10]]  # Show first 10 for reference
                    }
    
            # Check if element is likely scrollable
            scrollable_classes = [
                "android.widget.ScrollView",
                "android.widget.HorizontalScrollView",
                "android.support.v7.widget.RecyclerView",
                "androidx.recyclerview.widget.RecyclerView",
                "android.widget.ListView",
                "android.widget.GridView",
                "androidx.viewpager.widget.ViewPager",
                "androidx.viewpager2.widget.ViewPager2"
            ]
    
            is_scrollable = target_element.class_name in scrollable_classes
    
            # Get element bounds
            bbox = target_element.bounding_box
            element_width = bbox.x2 - bbox.x1
            element_height = bbox.y2 - bbox.y1
    
            # Calculate center point of the element
            center_x = bbox.x1 + element_width // 2
            center_y = bbox.y1 + element_height // 2
    
            # Calculate scroll coordinates within element boundaries with margins
            margin = 20  # Keep scroll gesture away from element edges
    
            if direction == 'up':
                # Scroll up: start lower in element, end higher
                start_y = min(center_y + distance // 2, bbox.y2 - margin)
                end_y = max(center_y - distance // 2, bbox.y1 + margin)
                start_x = end_x = center_x
            elif direction == 'down':
                # Scroll down: start higher in element, end lower
                start_y = max(center_y - distance // 2, bbox.y1 + margin)
                end_y = min(center_y + distance // 2, bbox.y2 - margin)
                start_x = end_x = center_x
            elif direction == 'left':
                # Scroll left: start right in element, end left
                start_x = min(center_x + distance // 2, bbox.x2 - margin)
                end_x = max(center_x - distance // 2, bbox.x1 + margin)
                start_y = end_y = center_y
            else:  # right
                # Scroll right: start left in element, end right
                start_x = max(center_x - distance // 2, bbox.x1 + margin)
                end_x = min(center_x + distance // 2, bbox.x2 - margin)
                start_y = end_y = center_y
    
            # Ensure coordinates are within element bounds
            start_x = max(bbox.x1 + margin, min(start_x, bbox.x2 - margin))
            end_x = max(bbox.x1 + margin, min(end_x, bbox.x2 - margin))
            start_y = max(bbox.y1 + margin, min(start_y, bbox.y2 - margin))
            end_y = max(bbox.y1 + margin, min(end_y, bbox.y2 - margin))
    
            # Build adb swipe command to perform scroll gesture
            cmd = ['adb']
            if device_id:
                cmd.extend(['-s', device_id])
            cmd.extend(['shell', 'input', 'swipe', str(start_x), str(start_y), str(end_x), str(end_y), str(duration)])
    
            # Execute scroll command
            subprocess.run(cmd, capture_output=True, text=True, check=True)
    
            return {
                "success": True,
                "message": f"Successfully scrolled element '{target_element.name}' {direction} (requested {distance} pixels, swiped {abs(end_x - start_x) + abs(end_y - start_y)} pixels) in {duration}ms",
                "element": {
                    "name": target_element.name,
                    "class_name": target_element.class_name,
                    "bounding_box": {
                        "x1": bbox.x1, "y1": bbox.y1,
                        "x2": bbox.x2, "y2": bbox.y2
                    }
                },
                "scroll_coordinates": {
                    "start": {"x": start_x, "y": start_y},
                    "end": {"x": end_x, "y": end_y}
                },
                "direction": direction,
                "distance": distance,
                "duration": duration,
                "is_scrollable_element": is_scrollable,
                "action_type": f"element_scroll_{direction}",
                "device_id": device_id or "default"
            }
    
        except ConnectionError as e:
            return {
                "success": False,
                "error": f"Device connection failed: {e}",
                "element": element,
                "action_type": "element_scroll"
            }
        except subprocess.CalledProcessError as e:
            return {
                "success": False,
                "error": f"Failed to execute scroll: {e}",
                "stderr": e.stderr if e.stderr else "",
                "element": element,
                "action_type": "element_scroll"
            }
        except FileNotFoundError:
            return {
                "success": False,
                "error": "ADB not found. Please ensure Android SDK is installed and adb is in PATH.",
                "element": element,
                "action_type": "element_scroll"
            }
        except Exception as e:
            return {
                "success": False,
                "error": f"Unexpected error: {e}",
                "element": element,
                "action_type": "element_scroll"
            }
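The coordinate arithmetic in the handler can be easier to follow in isolation. The sketch below reproduces it as a pure function; `Box` and `swipe_points` are hypothetical names introduced here (the real code reads `target_element.bounding_box`), and the 20-pixel margin is taken from the handler. It shows that the swipe is centred on the element and clamped to the element's bounds minus the margin, so the gesture can be shorter than the requested distance.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Hypothetical stand-in for the element's bounding_box object."""
    x1: int
    y1: int
    x2: int
    y2: int


def swipe_points(bbox, direction, distance=200, margin=20):
    """Return (start_x, start_y, end_x, end_y) using the handler's arithmetic."""
    center_x = bbox.x1 + (bbox.x2 - bbox.x1) // 2
    center_y = bbox.y1 + (bbox.y2 - bbox.y1) // 2
    half = distance // 2

    def clamp(value, low, high):
        # Same expression the handler uses for its final bounds check.
        return max(low, min(value, high))

    if direction in ("up", "down"):
        low, high = bbox.y1 + margin, bbox.y2 - margin
        near = clamp(center_y - half, low, high)
        far = clamp(center_y + half, low, high)
        # 'up' drags from the lower point (larger y) to the higher one.
        start, end = (far, near) if direction == "up" else (near, far)
        return center_x, start, center_x, end
    if direction in ("left", "right"):
        low, high = bbox.x1 + margin, bbox.x2 - margin
        near = clamp(center_x - half, low, high)
        far = clamp(center_x + half, low, high)
        # 'left' drags from the right-hand point to the left-hand one.
        start, end = (far, near) if direction == "left" else (near, far)
        return start, center_y, end, center_y
    raise ValueError(f"Invalid direction: {direction}")


panel = Box(0, 100, 1080, 400)                   # 300 px tall element
print(swipe_points(panel, "up"))                 # (540, 350, 540, 150): full 200 px
print(swipe_points(panel, "up", distance=1000))  # (540, 380, 540, 120): capped at 260 px
print(swipe_points(Box(0, 0, 30, 30), "up"))     # (15, 20, 15, 20): zero-length swipe
```

The last case is worth noting: when an element is smaller than twice the margin, both endpoints collapse to the same point, so the gesture behaves more like a tap than a scroll.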

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries full burden but only states what the tool does, not behavioral traits like whether scrolling is smooth/animated, if it waits for completion, error conditions, or performance implications. It mentions default values but lacks operational context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
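Some of this behavior can be read off the handler itself: it shells out to `adb shell input swipe` through `subprocess.run(..., check=True)`, which returns only after the adb process exits, and a non-zero exit status is turned into a `"success": False` result. Whether the app has finished animating by then is not something the handler checks. The sketch below isolates the command construction; `build_swipe_command` is a hypothetical helper name and the serial `emulator-5554` is only an example value.

```python
from typing import List, Optional


def build_swipe_command(start_x: int, start_y: int, end_x: int, end_y: int,
                        duration: int = 300,
                        device_id: Optional[str] = None) -> List[str]:
    """Assemble the adb argument list the same way the handler does."""
    cmd = ["adb"]
    if device_id:
        # '-s <serial>' selects one device when several are attached.
        cmd.extend(["-s", device_id])
    cmd.extend(["shell", "input", "swipe",
                str(start_x), str(start_y), str(end_x), str(end_y),
                str(duration)])
    return cmd


print(" ".join(build_swipe_command(540, 350, 540, 150)))
print(" ".join(build_swipe_command(540, 350, 540, 150, 500, "emulator-5554")))
```

Passing the command as a list rather than a single shell string avoids quoting problems with device serials.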

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Perfectly structured with a clear purpose statement followed by a well-organized parameter breakdown. Every sentence earns its place, and the information is front-loaded with no wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a 5-parameter tool with no annotations and no output schema, the description adequately covers parameters but lacks behavioral context (how scrolling works, what happens on completion, error handling). It's minimally viable but has clear gaps in operational transparency.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
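Although no output schema is published, the handler source shows the shape of the returned dictionary, so a caller could interpret it roughly as sketched below. `summarize_result` is a hypothetical helper, not part of the server; it relies only on keys visible in the handler (`success`, `error`, `scroll_coordinates`, `distance`, `is_scrollable_element`), and the sample dictionaries are hand-written for illustration.

```python
def summarize_result(result: dict) -> str:
    """Turn a scroll_element result dict into a one-line summary."""
    if not result.get("success"):
        return f"failed: {result.get('error', 'unknown error')}"

    start = result["scroll_coordinates"]["start"]
    end = result["scroll_coordinates"]["end"]
    # Swipes are axis-aligned, so one of the two differences is always zero.
    travelled = abs(end["x"] - start["x"]) + abs(end["y"] - start["y"])

    notes = []
    if travelled < result["distance"]:
        notes.append(f"swipe covered {travelled} of {result['distance']} requested pixels")
    if not result.get("is_scrollable_element", False):
        notes.append("element class is not in the handler's scrollable list")
    return "ok" + (": " + "; ".join(notes) if notes else "")


# Hand-written sample results, shaped like the handler's return values.
clamped = {
    "success": True,
    "distance": 1000,
    "is_scrollable_element": True,
    "scroll_coordinates": {"start": {"x": 540, "y": 380}, "end": {"x": 540, "y": 120}},
}
failure = {"success": False, "error": "No UI elements found on screen", "element": 3}

print(summarize_result(clamped))  # ok: swipe covered 260 of 1000 requested pixels
print(summarize_result(failure))  # failed: No UI elements found on screen
```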

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description must compensate. It provides clear semantics for all 5 parameters: 'element' (index or name), 'direction' (with enum values), 'distance' (pixels with default), 'duration' (milliseconds with default), and 'device_id' (optional targeting). This adds substantial value beyond the bare schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('scroll'), target ('UI element'), and scope ('in the given direction for a specified distance'). It distinguishes itself from sibling tools like 'swipe' (which moves across the screen) and 'press' (which taps) by focusing on element-specific scrolling.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for scrolling UI elements but doesn't explicitly state when to use this vs. alternatives like 'swipe' (which might scroll the entire screen) or provide exclusions. It mentions 'element' parameter options but lacks contextual guidance on tool selection.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/pedro-rivas/android-puppeteer-mcp'
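The same request could be issued from Python with the standard library. This is a sketch under two assumptions that the page does not confirm: that the path after `/servers/` is always an owner followed by a server name, and that the response body is JSON. The function names are made up for the example.

```python
import json
import urllib.request

API_BASE = "https://glama.ai/api/mcp/v1/servers"


def server_url(owner: str, name: str) -> str:
    """Build the endpoint URL (owner/name split is an assumption from the curl example)."""
    return f"{API_BASE}/{owner}/{name}"


def fetch_server(owner: str, name: str, timeout: float = 10.0) -> dict:
    """GET the server record; assumes the response body is JSON."""
    with urllib.request.urlopen(server_url(owner, name), timeout=timeout) as response:
        return json.load(response)


# Performs a network request when uncommented:
# info = fetch_server("pedro-rivas", "android-puppeteer-mcp")
print(server_url("pedro-rivas", "android-puppeteer-mcp"))
```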

If you have feedback or need assistance with the MCP directory API, please join our Discord server.