Android-MCP

Android-MCP is a lightweight, open-source tool that bridges AI agents and Android devices. Running as an MCP server, it lets LLM agents perform real-world tasks such as app navigation, UI interaction, and automated QA testing without relying on traditional computer-vision pipelines or preprogrammed scripts.

Forked and customized by Hady Ahmed

Features

  • Direct Device Control: Click, swipe, drag, type, and press buttons on Android devices

  • UI State Inspection: Get device state with UI hierarchy and optional annotated screenshots

  • MCP Integration: Works with any MCP-compatible client (Claude Desktop, VS Code, etc.)

  • Emulator Support: Works with Android emulators (tested on emulator-5554)

  • Physical Device Support: Connect to real Android devices via ADB

  • Vision Capabilities: Generate annotated screenshots with numbered UI elements for vision-based AI agents

Requirements

  • Python 3.12+

  • A running Android device or emulator

  • ADB (Android Debug Bridge) installed and configured

  • A uiautomator2-compatible Android device (Android 4.4+)

Installation

From Source

  1. Clone the repository:

git clone https://github.com/HadyAhmed00/Android-MCP.git
cd Android-MCP

  2. Install dependencies:

pip install -e .

Or install manually:

pip install mcp uiautomator2 pillow ipykernel
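
To confirm the environment before starting the server, a quick check along these lines may help. It is a minimal sketch, not part of Android-MCP: the helper names missing_modules and python_ok are hypothetical, and the module list simply mirrors the packages in the manual install command (pillow is imported as PIL).

```python
import importlib.util
import sys

# Import names for the packages listed above (pillow installs the PIL module).
REQUIRED_MODULES = ("mcp", "uiautomator2", "PIL", "ipykernel")


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_ok(version=None):
    """Check an interpreter version against the Python 3.12+ requirement."""
    major_minor = tuple(version or sys.version_info[:2])
    return major_minor >= (3, 12)


print("Python 3.12+:", python_ok())
print("Missing modules:", missing_modules() or "none")
```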

Quick Start

1. Start the MCP Server

For emulator:

python main.py --emulator

For physical device:

python main.py

The server will connect to your Android device via ADB and start listening for MCP client connections.

2. Configure with MCP Client

To use this with Claude Code or other MCP clients, add the following to your MCP configuration:

Example MCP Config:

{
  "mcpServers": {
    "android-mcp": {
      "command": "python",
      "args": [
        "/path/to/Android-MCP/main.py",
        "--emulator"
      ]
    }
  }
}
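
If you prefer to generate this entry rather than edit it by hand, something like the following could work. It is a sketch under assumptions: build_mcp_config is a hypothetical helper, and where the resulting JSON has to be saved depends on your MCP client.

```python
import json


def build_mcp_config(main_py, emulator=True, python_cmd="python"):
    """Build the mcpServers entry for Android-MCP as a plain dict."""
    args = [str(main_py)]
    if emulator:
        # Omit this flag when connecting to a physical device.
        args.append("--emulator")
    return {"mcpServers": {"android-mcp": {"command": python_cmd, "args": args}}}


# Replace the placeholder path with the location of your clone.
print(json.dumps(build_mcp_config("/path/to/Android-MCP/main.py"), indent=2))
```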

Available Tools

The MCP server exposes 9 tools for controlling Android devices:

1. State-Tool

Get the current state of the device including UI hierarchy and optional screenshot.

Parameters:

  • use_vision (bool, optional): Include annotated screenshot with labeled UI elements

Example:

Get device state with screenshot

2. Click-Tool

Click on a specific coordinate on the screen.

Parameters:

  • x (int): X coordinate

  • y (int): Y coordinate

Example:

Click on coordinates 540, 800
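
A prompt like the one above is ultimately sent to the server as an MCP tools/call request. The sketch below builds that JSON-RPC message by hand to show its shape; tool_call_request is a hypothetical helper, and in practice your MCP client constructs and sends the message for you.

```python
import json
from itertools import count

# JSON-RPC requests need a unique id per request; a simple counter is enough here.
_request_ids = count(1)


def tool_call_request(name, arguments=None):
    """Build the JSON-RPC 2.0 message for an MCP tools/call request."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments or {}},
    }


print(json.dumps(tool_call_request("Click-Tool", {"x": 540, "y": 800}), indent=2))
```

The other tools follow the same pattern: only the name and the arguments object change.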

3. Long-Click-Tool

Long press (hold) on a specific coordinate.

Parameters:

  • x (int): X coordinate

  • y (int): Y coordinate

Example:

Long click on coordinates 540, 800

4. Swipe-Tool

Perform a swipe gesture from one point to another.

Parameters:

  • x1 (int): Starting X coordinate

  • y1 (int): Starting Y coordinate

  • x2 (int): Ending X coordinate

  • y2 (int): Ending Y coordinate

Example:

Swipe from top to bottom (refresh)
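
Swipe coordinates depend on the screen resolution, so it can help to derive them from the screen size instead of hard-coding them. A minimal sketch, assuming a hypothetical helper vertical_swipe and start and end points at 25% and 75% of the screen height (values chosen for illustration, not taken from the project):

```python
def vertical_swipe(width, height, start_frac=0.25, end_frac=0.75):
    """Return Swipe-Tool arguments for a vertical swipe along the screen center."""
    x = width // 2
    return {
        "x1": x,
        "y1": int(height * start_frac),
        "x2": x,
        "y2": int(height * end_frac),
    }


# Top-to-bottom swipe (pull to refresh) on a 1080x1920 screen.
print(vertical_swipe(1080, 1920))
```

Swapping the two fractions gives a bottom-to-top swipe, which scrolls the content down.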

5. Type-Tool

Type text at a specific coordinate (automatically focuses the field).

Parameters:

  • text (str): Text to type

  • x (int): X coordinate

  • y (int): Y coordinate

  • clear (bool, optional): Clear existing text before typing

Example:

Type "hello world" into the search field

6. Drag-Tool

Drag from one location and drop at another.

Parameters:

  • x1 (int): Starting X coordinate

  • y1 (int): Starting Y coordinate

  • x2 (int): Ending X coordinate

  • y2 (int): Ending Y coordinate

Example:

Drag and drop item from position to trash

7. Press-Tool

Press device buttons (home, back, power, volume, etc.).

Parameters:

  • button (str): Button name (back, home, power, volume_up, volume_down)

Example:

Press the back button
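
For comparison, the same buttons can be triggered from a shell with adb shell input keyevent. The sketch below only builds the command line; it is not how Press-Tool is implemented (the source does not show that), and keyevent_command is a hypothetical helper. The numeric values are the standard Android KeyEvent codes.

```python
# Standard Android KeyEvent codes for the buttons listed above.
KEYCODES = {"home": 3, "back": 4, "volume_up": 24, "volume_down": 25, "power": 26}


def keyevent_command(button, serial=None):
    """Build the adb command that sends one button press to a device."""
    if button not in KEYCODES:
        raise ValueError(f"Unsupported button: {button!r}")
    cmd = ["adb"]
    if serial:
        # Target a specific device when several are connected.
        cmd += ["-s", serial]
    return cmd + ["shell", "input", "keyevent", str(KEYCODES[button])]


print(" ".join(keyevent_command("back", serial="emulator-5554")))
```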

8. Notification-Tool

Open the notification bar to access notifications.

Parameters: None

Example:

Open notification bar

9. Wait-Tool

Wait for a specified duration (useful for allowing apps to load).

Parameters:

  • duration (int): Seconds to wait

Example:

Wait for 2 seconds

Usage Workflow

Basic Example: Navigate and Click

  1. Get device state with vision:

    Get the current state of the device with a screenshot

    This returns the UI hierarchy and an annotated screenshot showing numbered UI elements.

  2. Click on an element:

    Click on element 0 (the button that was labeled as 0)
  3. Get updated state:

    Get the device state again

Example: Form Submission

1. Get device state with vision to see the form
2. Click on the name field (element 2)
3. Type "John Doe"
4. Click on the email field (element 3)
5. Type "john@example.com"
6. Scroll down to find the submit button
7. Click the submit button
8. Wait 2 seconds for the form to process
9. Get device state to confirm submission
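
The same workflow can be written down as an ordered list of tool calls, which is handy for checking a plan before an agent runs it. The sketch below is illustrative only: TOOL_PARAMS restates the parameter lists from the Available Tools section, validate_step is a hypothetical helper, and every coordinate is a made-up placeholder that would normally come from State-Tool output. Because Type-Tool focuses the field itself, each "click, then type" pair is expressed as a single Type-Tool call.

```python
# (required, optional) parameter names per tool, as listed under Available Tools.
TOOL_PARAMS = {
    "State-Tool": (set(), {"use_vision"}),
    "Click-Tool": ({"x", "y"}, set()),
    "Long-Click-Tool": ({"x", "y"}, set()),
    "Swipe-Tool": ({"x1", "y1", "x2", "y2"}, set()),
    "Type-Tool": ({"text", "x", "y"}, {"clear"}),
    "Drag-Tool": ({"x1", "y1", "x2", "y2"}, set()),
    "Press-Tool": ({"button"}, set()),
    "Notification-Tool": (set(), set()),
    "Wait-Tool": ({"duration"}, set()),
}


def validate_step(tool, arguments):
    """Return a list of problems with one planned tool call (empty if fine)."""
    if tool not in TOOL_PARAMS:
        return [f"unknown tool: {tool}"]
    required, optional = TOOL_PARAMS[tool]
    given = set(arguments)
    problems = [f"missing: {name}" for name in sorted(required - given)]
    problems += [f"unexpected: {name}" for name in sorted(given - required - optional)]
    return problems


# Form submission plan; all coordinates are placeholders.
FORM_PLAN = [
    ("State-Tool", {"use_vision": True}),
    ("Type-Tool", {"text": "John Doe", "x": 540, "y": 600, "clear": True}),
    ("Type-Tool", {"text": "john@example.com", "x": 540, "y": 800, "clear": True}),
    ("Swipe-Tool", {"x1": 540, "y1": 1400, "x2": 540, "y2": 600}),  # scroll down
    ("Click-Tool", {"x": 540, "y": 1500}),  # submit button
    ("Wait-Tool", {"duration": 2}),
    ("State-Tool", {}),
]

for tool, arguments in FORM_PLAN:
    print(tool, validate_step(tool, arguments) or "ok")
```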

Example: App Navigation

1. Get device state to see current screen
2. Click the "Settings" button
3. Wait 1 second for settings to load
4. Get device state to see settings menu
5. Click on "About Phone"
6. Get state to view device information

Architecture

The project is organized into three main modules:

main.py

  • Entry point that creates the FastMCP server

  • Defines and exposes all 9 tools

  • Handles command-line arguments (--emulator)
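
Handling the --emulator flag could look roughly like this. This is a sketch that assumes the standard-library argparse module; the source does not show how main.py actually parses its arguments, and parse_args is a hypothetical name.

```python
import argparse


def parse_args(argv=None):
    """Parse the server's command-line arguments."""
    parser = argparse.ArgumentParser(description="Android-MCP server")
    parser.add_argument(
        "--emulator",
        action="store_true",
        help="Connect to an Android emulator instead of a physical device",
    )
    return parser.parse_args(argv)


# Passing an explicit list keeps the example independent of the real command line.
args = parse_args(["--emulator"])
print("emulator mode:", args.emulator)
```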

src/mobile/

  • Mobile class: Manages device connection and state

  • MobileState: Data class representing device state

  • Captures screenshots and processes them

src/tree/

  • Tree class: Parses Android UI XML hierarchy

  • Extracts interactive elements (buttons, inputs, etc.)

  • Generates annotated screenshots with numbered labels

  • Helper utilities for coordinate extraction
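
To illustrate the kind of work the tree module does, here is a simplified sketch. It assumes the UI hierarchy is the XML that uiautomator produces, where each node carries attributes such as class, text, clickable, and bounds in the form [x1,y1][x2,y2]. The function names are hypothetical, and filtering on the clickable attribute alone is a simplification, not the project's actual selection logic.

```python
import re
import xml.etree.ElementTree as ET

# Bounds look like "[x1,y1][x2,y2]"; coordinates can be negative for off-screen nodes.
BOUNDS_RE = re.compile(r"\[(-?\d+),(-?\d+)\]\[(-?\d+),(-?\d+)\]")


def center_of(bounds):
    """Convert a '[x1,y1][x2,y2]' bounds string into its center point."""
    match = BOUNDS_RE.fullmatch(bounds)
    if match is None:
        raise ValueError(f"Unrecognized bounds: {bounds!r}")
    x1, y1, x2, y2 = map(int, match.groups())
    return (x1 + x2) // 2, (y1 + y2) // 2


def interactive_elements(xml_text):
    """Return numbered clickable nodes with the coordinates to tap."""
    elements = []
    for node in ET.fromstring(xml_text).iter("node"):
        if node.get("clickable") == "true":
            x, y = center_of(node.get("bounds", ""))
            elements.append({
                "label": len(elements),  # number an annotated screenshot would show
                "class": node.get("class"),
                "text": node.get("text"),
                "x": x,
                "y": y,
            })
    return elements


SAMPLE = """<hierarchy rotation="0">
  <node class="android.widget.FrameLayout" clickable="false" bounds="[0,0][1080,1920]">
    <node class="android.widget.Button" text="Settings" clickable="true" bounds="[100,200][300,400]" />
    <node class="android.widget.EditText" text="" clickable="true" bounds="[0,500][1080,600]" />
  </node>
</hierarchy>"""

for element in interactive_elements(SAMPLE):
    print(element)
```

The x and y values of a labeled element are what you would pass to Click-Tool or Type-Tool.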

Troubleshooting

Device Not Connecting

  1. Check ADB:

    adb devices

    Your device should appear in the list.

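  2. For emulators: a running emulator is normally detected automatically and listed under a name such as emulator-5554. If it is missing, restart the ADB server and check the device list again:

    adb kill-server
    adb start-server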
  3. Enable USB Debugging (for physical devices):

    • Go to Settings > Developer Options

    • Enable USB Debugging

    • Connect your device
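
The device check above can also be scripted. A minimal sketch, assuming adb is on the PATH; parse_adb_devices and connected_devices are hypothetical helpers, and the parser relies on the usual adb devices output of a header line followed by one "serial, state" row per device.

```python
import subprocess


def parse_adb_devices(output):
    """Map each serial in `adb devices` output to its reported state."""
    devices = {}
    for line in output.splitlines():
        line = line.strip()
        # Skip blanks, the header, and "* daemon ..." status messages.
        if not line or line.startswith(("*", "List of devices")):
            continue
        parts = line.split()
        if len(parts) >= 2:
            devices[parts[0]] = parts[1]
    return devices


def connected_devices():
    """Run `adb devices` and return the serials that are ready to use."""
    result = subprocess.run(
        ["adb", "devices"], capture_output=True, text=True, check=True
    )
    states = parse_adb_devices(result.stdout)
    return [serial for serial, state in states.items() if state == "device"]


sample = "List of devices attached\nemulator-5554\tdevice\nR58M123ABC\tunauthorized\n\n"
print(parse_adb_devices(sample))
```

A state of unauthorized usually means the USB debugging prompt on the device has not been accepted yet.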

No Interactive Elements Found

This can happen if:

  • The app hasn't fully loaded - try waiting with Wait-Tool

  • The UI uses unusual widgets not in the INTERACTIVE_CLASSES list

  • Screen is at an unusual rotation

Try getting device state with vision to see what's actually on screen.

Screenshots Not Working

  • Ensure the device screen is on

  • Check that the device has enough storage

  • Try waiting a moment before requesting screenshots

Development

Running Tests

Tests for the UI tree parsing:

pytest tests/

Adding New Tools

To add a new tool to the MCP server:

  1. Add a function in main.py decorated with @mcp.tool()

  2. Define parameters with type hints

  3. Return results that can be serialized (strings, lists, or Images)

Example:

@mcp.tool(name='My-Tool', description='Description of what this does')
def my_tool(param1: str, param2: int):
    result = mobile.get_device().some_action(param1, param2)
    return f'Action completed: {result}'

License

This project is licensed under the MIT License. See the LICENSE file for details.

Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.

Support

If you encounter issues:

  1. Check the Troubleshooting section above

  2. Verify your Android device setup with ADB

  3. Open an issue on GitHub with:

    • Device details (type, Android version)

    • Error messages

    • Steps to reproduce
