Android Puppeteer is a visual-first MCP server that enables AI agents to automate interactions with Android devices through intelligent UI detection and comprehensive control tools.
Device Management: List connected Android devices/emulators with their status and dimensions
Visual Element Detection: Capture annotated screenshots with numbered UI element overlays and extract detailed element information and live UI hierarchy
Touch Interactions: Perform taps, long presses, and precise coordinate-based interactions
Navigation & Input: Press hardware back button, execute directional or custom coordinate swipes, and type text into input fields with optional clearing
Element Control: Scroll specific UI elements in any direction with customizable distance and duration
Screen Recording: Start and stop video recordings with customizable quality settings using scrcpy
Multi-Device Support: Target specific devices/emulators simultaneously for parallel automation using device IDs
Enables AI agents to interact with Android devices through visual UI element detection, touch interactions, text input, navigation gestures, and screen recording capabilities using uiautomator2 automation.
mcp-name: io.github.pedro-rivas/android-puppeteer-mcp
Android Puppeteer is a lightweight, visual-first MCP (Model Context Protocol) server that enables AI agents to interact with Android devices through intelligent UI element detection and automated interactions. Built on uiautomator2, it provides comprehensive Android automation capabilities including visual element detection, touch interactions, text input, and video recording.
Features
Visual Element Detection: Automatically detects and annotates interactive UI elements with numbered overlays for precise targeting.
Comprehensive Touch Interactions: Support for tap, long press, swipe, scroll, and drag gestures with coordinate-based precision.
Multi-Device Support: Connect to multiple Android devices or emulators simultaneously with device-specific targeting.
Video Recording Integration: Built-in screen recording capabilities using scrcpy for documentation and testing workflows.
Real-Time UI Analysis: Live UI hierarchy parsing and element information extraction for dynamic interaction strategies.
MCP Protocol Integration: Seamless integration with Claude Desktop and other MCP-compatible AI platforms.
Supported Operating Systems
Android 10+
Windows, macOS, Linux (host systems)
Installation
Prerequisites
Python 3.10+
uiautomator2
Android 10+ (Emulator or Physical Device)
ADB (Android Debug Bridge)
scrcpy (for video recording features)
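A quick way to confirm the prerequisites above before installing is to look for the command-line tools on `PATH`. The following is a minimal sketch, not part of the project: the function names `missing_tools` and `python_ok` are illustrative, and it assumes `adb` and `scrcpy` are invoked under those exact executable names.

```python
import shutil
import sys

# Command-line tools named in the prerequisites list; scrcpy is only
# needed for the video recording features.
REQUIRED_TOOLS = ("adb",)
OPTIONAL_TOOLS = ("scrcpy",)


def missing_tools(tools, which=shutil.which):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


def python_ok(version_info=sys.version_info):
    """True when the interpreter meets the Python 3.10+ requirement."""
    return tuple(version_info[:2]) >= (3, 10)


if __name__ == "__main__":
    print("Python 3.10+:", python_ok())
    print("Missing required tools:", missing_tools(REQUIRED_TOOLS))
    print("Missing optional tools:", missing_tools(OPTIONAL_TOOLS))
```

The `which` parameter exists only so the lookup can be swapped out when testing; in normal use the default `shutil.which` is enough.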
Getting Started
Clone the repository
Install dependencies
Set up Android device
Connect to the MCP server
Locate your Claude Desktop configuration file:
Windows:
%APPDATA%\Claude\claude_desktop_config.json
macOS:
~/Library/Application Support/Claude/claude_desktop_config.json
Add the following JSON to your Claude Desktop config:
{
  "mcpServers": {
    "android-puppeteer": {
      "command": "path/to/uv",
      "args": [
        "--directory",
        "path/to/android-puppeteer",
        "run",
        "puppeteer.py"
      ]
    }
  }
}
Replace:
path/to/uv with the actual path to your uv executable
path/to/android-puppeteer with the absolute path to where you have cloned this repo
Restart Claude Desktop
Restart Claude Desktop. You should then see "android-puppeteer" listed as an available integration.
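If you would rather not edit the configuration file by hand, the `android-puppeteer` entry can be merged in programmatically. This is a minimal sketch under stated assumptions: the helper names `add_android_puppeteer` and `update_config_file` are hypothetical, and the paths you pass in are your own uv executable and repository locations. Existing `mcpServers` entries are preserved.

```python
import json
from pathlib import Path


def add_android_puppeteer(config: dict, uv_path: str, repo_path: str) -> dict:
    """Return a copy of `config` with the android-puppeteer entry added."""
    updated = dict(config)
    # Copy the server map so the caller's dictionary is left untouched.
    servers = dict(updated.get("mcpServers", {}))
    servers["android-puppeteer"] = {
        "command": uv_path,
        "args": ["--directory", repo_path, "run", "puppeteer.py"],
    }
    updated["mcpServers"] = servers
    return updated


def update_config_file(config_file: Path, uv_path: str, repo_path: str) -> None:
    """Read the config file (if present), merge the entry, and write it back."""
    config = json.loads(config_file.read_text()) if config_file.exists() else {}
    merged = add_android_puppeteer(config, uv_path, repo_path)
    config_file.write_text(json.dumps(merged, indent=2))
```

Back up the configuration file before calling `update_config_file`, since it overwrites the file in place.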
Available Tools
Android Puppeteer provides the following tools for comprehensive Android device interaction:
Device Management
list_emulators: List all available Android emulators and devices with their status and dimensions
get_device_dimensions: Get the screen dimensions of a specific Android device
get_ui_elements_info: Get detailed information about all interactive UI elements on screen
Visual Interaction
take_screenshot: Capture annotated screenshots with numbered UI element overlays
press: Tap on specific coordinates with optional long press duration
long_press: Perform long press gestures on specific coordinates
Navigation & Input
press_back: Press the hardware back button
swipe: Perform directional or custom coordinate swipes
type_text: Type text into focused input fields with optional text clearing
scroll_element: Scroll specific UI elements in any direction
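To illustrate how a directional swipe relates to a custom coordinate swipe, the sketch below converts a direction into start and end coordinates for a given screen size. It is an illustration only and not the server's implementation: the function name `swipe_coordinates` and the `fraction` parameter are hypothetical. It relies on the Android convention that the y axis grows downward, so swiping up means moving from a larger y to a smaller one.

```python
def swipe_coordinates(width, height, direction, fraction=0.5):
    """Return (start_x, start_y, end_x, end_y) for a centered directional swipe.

    `fraction` is the share of the relevant screen dimension the swipe travels.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    cx, cy = width // 2, height // 2
    # Half the travel distance on each side of the screen center.
    dx, dy = int(width * fraction / 2), int(height * fraction / 2)
    moves = {
        "up": (cx, cy + dy, cx, cy - dy),
        "down": (cx, cy - dy, cx, cy + dy),
        "left": (cx + dx, cy, cx - dx, cy),
        "right": (cx - dx, cy, cx + dx, cy),
    }
    if direction not in moves:
        raise ValueError(f"unknown direction: {direction!r}")
    return moves[direction]
```

For a 1080 by 1920 screen, `swipe_coordinates(1080, 1920, "up")` yields a vertical swipe through the screen center from y = 1440 to y = 480.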
Recording & Documentation
record_video: Start screen recording with customizable quality settings
stop_video: Stop active screen recordings and save to local storage
Usage Examples
Basic Device Interaction
Multi-Device Automation
Video Recording Workflow
Project Structure
Important Notes
Device Permissions: Ensure USB debugging is enabled on target Android devices
Network Access: Some features require network connectivity for device communication
Storage: Screenshot and video files are saved locally in ss/ and videos/ directories
Performance: Response times depend on device performance and network latency
Troubleshooting
Common Issues
Device not found: Verify ADB connection with adb devices
Permission denied: Check USB debugging and device authorization
Screenshot failures: Ensure device screen is unlocked and accessible
Video recording issues: Verify scrcpy installation and device compatibility
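The first two issues above can usually be told apart from the output of `adb devices`, which lists one serial and state per line after a header. The following is a minimal sketch for interpreting that output. The helper names `parse_adb_devices` and `diagnose` are hypothetical, and the sample serials are made up for illustration.

```python
def parse_adb_devices(output: str) -> dict[str, str]:
    """Map each device serial to its state from `adb devices` output."""
    devices = {}
    for line in output.splitlines():
        line = line.strip()
        # Skip blanks, the header line, and adb daemon startup messages.
        if not line or line.startswith("List of devices") or line.startswith("*"):
            continue
        parts = line.split()
        if len(parts) >= 2:
            devices[parts[0]] = parts[1]
    return devices


def diagnose(devices: dict[str, str]) -> list[str]:
    """Turn device states into hints matching the common issues list."""
    if not devices:
        return ["Device not found: check the cable or emulator, then rerun adb devices"]
    hints = []
    for serial, state in devices.items():
        if state == "unauthorized":
            hints.append(f"{serial}: permission denied, accept the USB debugging prompt")
        elif state != "device":
            hints.append(f"{serial}: not ready (state: {state})")
    return hints


# Made-up sample output for illustration.
sample = "List of devices attached\nemulator-5554\tdevice\nR58M123ABC\tunauthorized\n"
print(diagnose(parse_adb_devices(sample)))
```

To use it against a real setup, pass in the text captured from running `adb devices`, for example via `subprocess.run(["adb", "devices"], capture_output=True, text=True).stdout`.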
Debug Mode
Run the server directly for debugging:
License
This project is licensed under the MIT License. See LICENSE for details.
Contributing
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
Development Setup
Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Make your changes
Run tests and ensure code quality
Commit your changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request
Related Projects
Android MCP - Alternative Android automation MCP server
uiautomator2 - Core Android automation library
MCP Protocol - Model Context Protocol specification
Star this repo if you find it useful!
local-only server
The server can only run on the client's local machine because it depends on local resources.