Enables AI agents to control Android devices and emulators through direct UI interaction, including clicking, swiping, typing, and navigation, with UI state inspection and automated testing capabilities.
Android-MCP
Android-MCP is a lightweight, open-source tool that bridges AI agents and Android devices. Running as an MCP server, it lets LLM agents perform real-world tasks such as app navigation, UI interaction, and automated QA testing without relying on traditional computer-vision pipelines or preprogrammed scripts.
Forked and customized by Hady Ahmed
Features
Direct Device Control: Click, swipe, drag, type, and press buttons on Android devices
UI State Inspection: Get device state with UI hierarchy and optional annotated screenshots
MCP Integration: Works with any MCP-compatible client (Claude Desktop, VS Code, etc.)
Emulator Support: Works with Android emulators (tested on emulator-5554)
Physical Device Support: Connect to real Android devices via ADB
Vision Capabilities: Generate annotated screenshots with numbered UI elements for vision-based AI agents
Requirements
Python 3.12+
Android device or emulator running
ADB (Android Debug Bridge) installed and configured
uiautomator2-compatible Android device (Android 4.4+)
Installation
From Source
Clone the repository:
Install dependencies:
Or install manually:
Quick Start
1. Start the MCP Server
For emulator:
For physical device:
The server will connect to your Android device via ADB and start listening for MCP client connections.
2. Configure with MCP Client
To use this with Claude Code or other MCP clients, add the following to your MCP configuration:
Example MCP Config:
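A minimal sketch of what such a configuration might look like, built in Python and printed as JSON. It assumes the client reads a top-level `mcpServers` object, as several MCP clients do. The server name `android-mcp`, the `python` command, and the `/path/to/Android-MCP/main.py` path are placeholders, not values taken from this project. Only the `--emulator` flag comes from this README, and leaving it out for a physical device is an assumption.

```python
import json


def build_mcp_config(main_py: str, use_emulator: bool = True) -> dict:
    """Return a client config dict with one stdio server entry (placeholder values)."""
    args = [main_py]
    if use_emulator:
        args.append("--emulator")  # flag handled by main.py according to this README
    return {
        "mcpServers": {
            "android-mcp": {          # hypothetical server name
                "command": "python",  # assumption: the server is launched with Python
                "args": args,
            }
        }
    }


config = build_mcp_config("/path/to/Android-MCP/main.py")
print(json.dumps(config, indent=2))
```

Check your client's documentation for the exact file location and schema before copying the output.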
Available Tools
The MCP server exposes 9 tools for controlling Android devices:
1. State-Tool
Get the current state of the device including UI hierarchy and optional screenshot.
Parameters:
use_vision (bool, optional): Include an annotated screenshot with labeled UI elements
Example:
2. Click-Tool
Click on a specific coordinate on the screen.
Parameters:
x (int): X coordinate
y (int): Y coordinate
Example:
3. Long-Click-Tool
Long press (hold) on a specific coordinate.
Parameters:
x (int): X coordinate
y (int): Y coordinate
Example:
4. Swipe-Tool
Perform a swipe gesture from one point to another.
Parameters:
x1 (int): Starting X coordinate
y1 (int): Starting Y coordinate
x2 (int): Ending X coordinate
y2 (int): Ending Y coordinate
Example:
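Swipe coordinates depend on the screen size, so it can help to derive them rather than hard-code them. The sketch below computes `x1`, `y1`, `x2`, `y2` for a centered vertical swipe. The helper `vertical_swipe_args` and the 1080x1920 screen size are illustrative assumptions, not part of this project.

```python
def vertical_swipe_args(width: int, height: int, fraction: float = 0.5, up: bool = True) -> dict:
    """Return Swipe-Tool arguments for a vertical swipe along the screen's center line.

    `fraction` is the share of the screen height that the gesture covers.
    """
    if not 0 < fraction < 1:
        raise ValueError("fraction must be between 0 and 1")
    x = width // 2
    travel = int(height * fraction)
    top = (height - travel) // 2
    bottom = top + travel
    # Swiping up means the finger starts low on the screen and ends high
    y1, y2 = (bottom, top) if up else (top, bottom)
    return {"x1": x, "y1": y1, "x2": x, "y2": y2}


args = vertical_swipe_args(1080, 1920)
print(args)  # {'x1': 540, 'y1': 1440, 'x2': 540, 'y2': 480}
```

An upward swipe like this typically scrolls the content further down the page.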
5. Type-Tool
Type text at a specific coordinate (automatically focuses the field).
Parameters:
text (str): Text to type
x (int): X coordinate
y (int): Y coordinate
clear (bool, optional): Clear existing text before typing
Example:
6. Drag-Tool
Drag from one location and drop at another.
Parameters:
x1 (int): Starting X coordinate
y1 (int): Starting Y coordinate
x2 (int): Ending X coordinate
y2 (int): Ending Y coordinate
Example:
7. Press-Tool
Press device buttons (home, back, power, volume, etc.).
Parameters:
button (str): Button name (back, home, power, volume_up, volume_down)
Example:
8. Notification-Tool
Open the notification bar to access notifications.
Parameters: None
Example:
9. Wait-Tool
Wait for a specified duration (useful for allowing apps to load).
Parameters:
duration (int): Seconds to wait
Example:
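A client can also check arguments against the documented parameters before sending a call. The sketch below encodes the nine tools from this section in a table and validates a `Wait-Tool` call, among others. `TOOL_PARAMS` and `validate_call` are hypothetical names, and treating every parameter not marked optional as required is an assumption.

```python
# name -> {parameter: (type, required)}, transcribed from the tool list in this README
_XY = {"x": (int, True), "y": (int, True)}
_X1Y1X2Y2 = {"x1": (int, True), "y1": (int, True), "x2": (int, True), "y2": (int, True)}

TOOL_PARAMS = {
    "State-Tool": {"use_vision": (bool, False)},
    "Click-Tool": _XY,
    "Long-Click-Tool": _XY,
    "Swipe-Tool": _X1Y1X2Y2,
    "Type-Tool": {"text": (str, True), **_XY, "clear": (bool, False)},
    "Drag-Tool": _X1Y1X2Y2,
    "Press-Tool": {"button": (str, True)},
    "Notification-Tool": {},
    "Wait-Tool": {"duration": (int, True)},
}


def validate_call(tool: str, arguments: dict) -> list[str]:
    """Return a list of problems; an empty list means the call matches the table."""
    if tool not in TOOL_PARAMS:
        return [f"unknown tool: {tool}"]
    spec = TOOL_PARAMS[tool]
    problems = []
    for name, (expected, required) in spec.items():
        if name not in arguments:
            if required:
                problems.append(f"missing required parameter: {name}")
            continue
        value = arguments[name]
        # bool is a subclass of int in Python, so reject it explicitly for int parameters
        wrong_type = not isinstance(value, expected) or (expected is int and isinstance(value, bool))
        if wrong_type:
            problems.append(f"{name} should be {expected.__name__}")
    problems.extend(f"unexpected parameter: {name}" for name in arguments if name not in spec)
    return problems


print(validate_call("Wait-Tool", {"duration": 2}))  # []
print(validate_call("Click-Tool", {"x": 100}))      # ['missing required parameter: y']
```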
Usage Workflow
Basic Example: Navigate and Click
Get device state with vision:
Get the current state of the device with a screenshot
This returns the UI hierarchy and an annotated screenshot showing numbered UI elements.
Click on an element:
Click on element 0 (the button that was labeled as 0)
Get updated state:
Get the device state again
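The same inspect, click, inspect loop can be written as a sequence of tool calls. This is only a sketch: `call_tool` stands for whatever function your MCP client offers for invoking a tool, the `record` stand-in just logs calls so the snippet runs without a device, and the coordinates (540, 960) are a made-up center for the element labeled 0.

```python
from typing import Any, Callable

ToolCaller = Callable[[str, dict], Any]  # hypothetical client hook: (tool name, arguments) -> result


def click_and_refresh(call_tool: ToolCaller, x: int, y: int, settle_seconds: int = 1) -> Any:
    """Inspect the screen, click (x, y), wait briefly, then return the refreshed state."""
    call_tool("State-Tool", {"use_vision": True})
    call_tool("Click-Tool", {"x": x, "y": y})
    call_tool("Wait-Tool", {"duration": settle_seconds})
    return call_tool("State-Tool", {"use_vision": False})


# Stand-in caller that records calls instead of talking to a device
calls: list[tuple[str, dict]] = []


def record(name: str, arguments: dict) -> str:
    calls.append((name, arguments))
    return f"{name} ok"


result = click_and_refresh(record, 540, 960)
for name, arguments in calls:
    print(name, arguments)
```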
Example: Form Submission
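One way to picture a form submission is as an ordered plan of tool calls: inspect the screen, type into each field, press the submit button, wait, and inspect again. The sketch below builds such a plan. The helper `form_fill_steps`, the field texts, and all coordinates are hypothetical; in practice the coordinates would come from the State-Tool output.

```python
def form_fill_steps(
    fields: list[tuple[str, int, int]],
    submit: tuple[int, int],
    wait_seconds: int = 2,
) -> list[tuple[str, dict]]:
    """Build (tool name, arguments) steps for filling in and submitting a form.

    Each entry in `fields` is (text, x, y) for one input field.
    """
    steps: list[tuple[str, dict]] = [("State-Tool", {"use_vision": True})]
    for text, x, y in fields:
        # clear=True replaces any text already present in the field
        steps.append(("Type-Tool", {"text": text, "x": x, "y": y, "clear": True}))
    steps.append(("Click-Tool", {"x": submit[0], "y": submit[1]}))
    steps.append(("Wait-Tool", {"duration": wait_seconds}))
    steps.append(("State-Tool", {"use_vision": False}))
    return steps


plan = form_fill_steps(
    fields=[("alice@example.com", 540, 560), ("correct horse", 540, 720)],
    submit=(540, 860),
)
for tool, arguments in plan:
    print(tool, arguments)
```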
Example: App Navigation
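A navigation flow might combine button presses, a swipe, and a click. The plan below is purely illustrative: the coordinates are invented, and whether an upward swipe from the home screen opens the app drawer depends on the launcher.

```python
DOCUMENTED_BUTTONS = {"back", "home", "power", "volume_up", "volume_down"}

NAVIGATION_STEPS = [
    ("Press-Tool", {"button": "home"}),                              # start from the home screen
    ("Swipe-Tool", {"x1": 540, "y1": 1600, "x2": 540, "y2": 400}),   # swipe up (launcher-dependent)
    ("State-Tool", {"use_vision": True}),                            # find the app icon
    ("Click-Tool", {"x": 200, "y": 500}),                            # hypothetical icon position
    ("Wait-Tool", {"duration": 3}),                                  # give the app time to load
    ("Press-Tool", {"button": "back"}),                              # leave the app again
]

for number, (tool, arguments) in enumerate(NAVIGATION_STEPS, start=1):
    print(f"{number}. {tool} {arguments}")
```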
Architecture
The project is organized into three main modules:
main.py
Entry point that creates the FastMCP server
Defines and exposes all 9 tools
Handles command-line arguments (--emulator)
src/mobile/
Mobile class: Manages device connection and state
MobileState: Data class representing device state
Captures screenshots and processes them
src/tree/
Tree class: Parses Android UI XML hierarchy
Extracts interactive elements (buttons, inputs, etc.)
Generates annotated screenshots with numbered labels
Helper utilities for coordinate extraction
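To make the tree module's job concrete, here is a rough sketch of parsing a uiautomator-style XML dump, keeping interactive nodes, and turning each `bounds` attribute into a center coordinate. It is not the project's actual `Tree` implementation: the function names, the sample XML, and the contents of `INTERACTIVE_CLASSES` shown here are assumptions.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical subset; the project's real INTERACTIVE_CLASSES list may differ.
INTERACTIVE_CLASSES = {"android.widget.Button", "android.widget.EditText"}
BOUNDS_RE = re.compile(r"\[(-?\d+),(-?\d+)\]\[(-?\d+),(-?\d+)\]")


def center_of(bounds: str) -> tuple[int, int]:
    """Convert a bounds string such as "[0,0][100,50]" into its center point."""
    match = BOUNDS_RE.fullmatch(bounds)
    if match is None:
        raise ValueError(f"unexpected bounds format: {bounds!r}")
    x1, y1, x2, y2 = map(int, match.groups())
    return (x1 + x2) // 2, (y1 + y2) // 2


def interactive_elements(xml_text: str) -> list[dict]:
    """Return labeled interactive nodes in document order."""
    found = []
    for node in ET.fromstring(xml_text).iter("node"):
        interactive = node.get("clickable") == "true" or node.get("class") in INTERACTIVE_CLASSES
        if interactive and node.get("bounds"):
            found.append({
                "label": len(found),  # numbering like the annotated screenshot
                "class": node.get("class"),
                "text": node.get("text", ""),
                "center": center_of(node.get("bounds")),
            })
    return found


SAMPLE = """<hierarchy rotation="0">
  <node class="android.widget.FrameLayout" clickable="false" bounds="[0,0][1080,1920]">
    <node class="android.widget.EditText" text="" clickable="true" bounds="[100,500][980,620]" />
    <node class="android.widget.Button" text="Submit" clickable="true" bounds="[340,800][740,920]" />
  </node>
</hierarchy>"""

elements = interactive_elements(SAMPLE)
for element in elements:
    print(element["label"], element["class"], element["center"])
```

The center coordinates are what a caller would pass as `x` and `y` to Click-Tool or Type-Tool.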
Troubleshooting
Device Not Connecting
Check ADB:
adb devices
Your device should appear in the list.
For emulators:
adb connect emulator-5554
Enable USB Debugging (for physical devices):
Go to Settings > Developer Options
Enable USB Debugging
Connect your device
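The `adb devices` check can also be scripted. The sketch below parses the command's text output into a mapping of serial to state; a state of `device` means the device is ready, while `unauthorized` or `offline` point to a setup problem. The function names and the sample serial `R58M123ABC` are made up for illustration, and `connected_devices` needs `adb` on your PATH.

```python
import subprocess


def parse_adb_devices(output: str) -> dict[str, str]:
    """Map device serial -> state from the text printed by `adb devices`."""
    devices = {}
    for line in output.splitlines():
        line = line.strip()
        # Skip blanks, the header line, and daemon status lines that start with "*"
        if not line or line.startswith("List of devices") or line.startswith("*"):
            continue
        parts = line.split()
        if len(parts) >= 2:
            devices[parts[0]] = parts[1]
    return devices


def connected_devices() -> dict[str, str]:
    """Run `adb devices` and parse its output (requires adb to be installed)."""
    result = subprocess.run(["adb", "devices"], capture_output=True, text=True, check=True)
    return parse_adb_devices(result.stdout)


sample = "List of devices attached\nemulator-5554\tdevice\nR58M123ABC\tunauthorized\n\n"
print(parse_adb_devices(sample))
```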
No Interactive Elements Found
This can happen if:
The app hasn't fully loaded - try waiting with Wait-Tool
The UI uses unusual widgets not in the INTERACTIVE_CLASSES list
The screen is at an unusual rotation
Try getting device state with vision to see what's actually on screen.
Screenshots Not Working
Ensure the device screen is on
Check that the device has enough storage
Try waiting a moment before requesting screenshots
Development
Running Tests
Tests for the UI tree parsing:
Adding New Tools
To add a new tool to the MCP server:
Add a function in main.py decorated with @mcp.tool()
Define parameters with type hints
Return results that can be serialized (strings, lists, or Images)
Example:
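A hedged sketch of what a new tool function could look like. So that the snippet runs without the MCP SDK or a phone, the `@mcp.tool()` decorator is shown as a comment and `FakeDevice` stands in for the project's device wrapper. Both `FakeDevice` and `repeat_click_tool` are hypothetical names, not part of this project.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDevice:
    """Stand-in for the project's device wrapper; it only records clicks."""
    taps: list[tuple[int, int]] = field(default_factory=list)

    def click(self, x: int, y: int) -> None:
        self.taps.append((x, y))


device = FakeDevice()


# In main.py, the decorator described in the steps above would go on the next line:
# @mcp.tool()
def repeat_click_tool(x: int, y: int, times: int = 2) -> str:
    """Click the same coordinate several times in a row."""
    if times < 1:
        raise ValueError("times must be at least 1")
    for _ in range(times):
        device.click(x, y)
    # A plain string keeps the result easy to serialize
    return f"Clicked on ({x},{y}) {times} times"


print(repeat_click_tool(540, 960))
```

Type hints on the parameters are what allow the server to describe the tool's inputs to clients, so keep them accurate.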
License
This project is licensed under the MIT License - see the LICENSE file for details.
Contributing
Contributions are welcome! Please see CONTRIBUTING.md for guidelines.
Support
If you encounter issues:
Check the Troubleshooting section above
Verify your Android device setup with ADB
Open an issue on GitHub with:
Device details (type, Android version)
Error messages
Steps to reproduce