mobile-device-mcp
Server Configuration
Environment variables that configure the server. All of them are optional.
| Name | Required | Description | Default |
|---|---|---|---|
| MCP_ADB_PATH | No | Custom ADB binary path | |
| MCP_AI_MODEL | No | Override AI model (e.g., gemini-2.5-flash or claude-sonnet-4-20250514) | gemini-2.5-flash |
| GEMINI_API_KEY | No | Google API key for Gemini vision (alias for GOOGLE_API_KEY) | |
| GOOGLE_API_KEY | No | Google API key for Gemini vision (recommended) | |
| MCP_AI_PROVIDER | No | Force AI provider: "anthropic" or "google" | |
| ANTHROPIC_API_KEY | No | Anthropic API key for Claude vision | |
| MCP_DEFAULT_DEVICE | No | Default device serial | |
| MCP_SCREENSHOT_FORMAT | No | "png" or "jpeg" | jpeg |
| MCP_SCREENSHOT_QUALITY | No | JPEG quality (1-100) | 80 |
| MOBILE_MCP_LICENSE_KEY | No | License key to unlock Pro tools | |
| MCP_SCREENSHOT_MAX_WIDTH | No | Resize screenshots to this max width | 720 |
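The defaults in the table can be read as a small configuration-resolution step. The sketch below is not the server's actual code; it only illustrates how the documented variable names and defaults might be resolved. The `ServerConfig` and `load_config` names are hypothetical, and the precedence of GOOGLE_API_KEY over GEMINI_API_KEY when both are set is an assumption based on the "recommended" and "alias" notes above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ServerConfig:
    """Hypothetical container for the documented settings."""
    adb_path: Optional[str]
    ai_model: str
    ai_provider: Optional[str]
    google_api_key: Optional[str]
    anthropic_api_key: Optional[str]
    default_device: Optional[str]
    screenshot_format: str
    screenshot_quality: int
    screenshot_max_width: int
    license_key: Optional[str]


def load_config(env: Mapping[str, str] = os.environ) -> ServerConfig:
    """Resolve the documented variables, falling back to the documented defaults."""
    fmt = env.get("MCP_SCREENSHOT_FORMAT", "jpeg").lower()
    if fmt not in ("png", "jpeg"):
        raise ValueError(f"MCP_SCREENSHOT_FORMAT must be 'png' or 'jpeg', got {fmt!r}")

    quality = int(env.get("MCP_SCREENSHOT_QUALITY", "80"))
    if not 1 <= quality <= 100:
        raise ValueError("MCP_SCREENSHOT_QUALITY must be between 1 and 100")

    provider = env.get("MCP_AI_PROVIDER")
    if provider is not None and provider not in ("anthropic", "google"):
        raise ValueError("MCP_AI_PROVIDER must be 'anthropic' or 'google'")

    return ServerConfig(
        adb_path=env.get("MCP_ADB_PATH"),
        ai_model=env.get("MCP_AI_MODEL", "gemini-2.5-flash"),
        ai_provider=provider,
        # Assumption: GOOGLE_API_KEY takes precedence over its alias.
        google_api_key=env.get("GOOGLE_API_KEY") or env.get("GEMINI_API_KEY"),
        anthropic_api_key=env.get("ANTHROPIC_API_KEY"),
        default_device=env.get("MCP_DEFAULT_DEVICE"),
        screenshot_format=fmt,
        screenshot_quality=quality,
        screenshot_max_width=int(env.get("MCP_SCREENSHOT_MAX_WIDTH", "720")),
        license_key=env.get("MOBILE_MCP_LICENSE_KEY"),
    )
```

Calling `load_config({})` yields the documented defaults (jpeg, quality 80, max width 720, gemini-2.5-flash), while an out-of-range quality raises `ValueError` in this sketch.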
Capabilities
Features and capabilities supported by this server
| Capability | Details |
|---|---|
| tools | {"listChanged": true} |
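In the MCP specification, `listChanged: true` means the server may send a `notifications/tools/list_changed` notification when its tool list changes, after which a client is expected to call `tools/list` again. The following is a minimal client-side sketch of that behaviour. The `ToolCache` class is hypothetical and does not come from this server or from any SDK.

```python
import json
from typing import Any, Dict, List, Optional


class ToolCache:
    """Caches the tool list and drops it when the server signals a change."""

    def __init__(self) -> None:
        self.tools: Optional[List[Dict[str, Any]]] = None

    def store(self, tools_list_result: Dict[str, Any]) -> None:
        # A `tools/list` result carries the tool definitions under "tools".
        self.tools = list(tools_list_result.get("tools", []))

    def handle_message(self, raw: str) -> bool:
        """Return True if the message invalidated the cached tool list."""
        msg = json.loads(raw)
        if msg.get("method") == "notifications/tools/list_changed":
            self.tools = None  # caller should re-issue `tools/list`
            return True
        return False
```

A client would pass every incoming JSON-RPC message through `handle_message` and refetch the tool list whenever it returns `True`.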
Tools
Functions exposed to the LLM to take actions
| Name | Description |
|---|---|
| list_devices | List all connected Android devices and emulators. Returns an array of DeviceInfo objects including id, model, manufacturer, Android version, connection status, and whether the device is an emulator. |
| get_device_info | Get detailed information about a specific Android device, including model, manufacturer, Android version, SDK version, connection status, screen size, and whether it is an emulator. |
| get_screen_size | Get the screen resolution of a connected Android device. Returns the width and height in pixels. |
| take_screenshot | Capture a screenshot of the device screen. Returns the image with metadata (width, height, file size). Supports PNG (lossless, larger) and JPEG (compressed, smaller). Use format='jpeg' with quality and max_width to reduce image size for AI analysis. |
| get_ui_elements | Retrieve the current UI element tree from the device screen. Each element includes its index, text, content description, class name, resource ID, bounding box with center coordinates (useful for tap targets), and boolean states (clickable, scrollable, focusable, enabled, selected, checked). By default only interactive elements are returned. Set interactive_only to false to get all elements. This is the primary tool for understanding what is on screen and deciding where to tap. |
| tap | Perform a single tap at the given (x, y) screen coordinates. Use get_ui_elements first to find the centerX/centerY of the element you want to tap. |
| double_tap | Perform a double tap at the given (x, y) screen coordinates. Useful for zooming into maps/images or selecting text. |
| long_press | Perform a long press (touch and hold) at the given (x, y) screen coordinates. Commonly used to open context menus, start drag operations, or trigger secondary actions. |
| swipe | Perform a swipe gesture from (start_x, start_y) to (end_x, end_y). Use this to scroll through lists, dismiss notifications, navigate between pages, or pull down the notification shade. A shorter duration makes the swipe faster (flick), while a longer duration makes it slower (drag). |
| type_text | Type text into the currently focused input field on the device. Make sure an input field is focused first (tap on it). Special characters and Unicode are supported. |
| press_key | Press a hardware or system key on the device. Accepts friendly key names such as 'home', 'back', 'enter', 'volume_up', 'volume_down', 'power', 'tab', 'delete', 'menu', 'search', 'app_switch', 'dpad_up', 'dpad_down', 'dpad_left', 'dpad_right', 'camera', 'escape', 'space', 'media_play_pause', 'media_next', 'media_previous'. You may also pass a raw Android keycode number as a string (e.g. '3' for HOME). |
| list_apps | List all installed applications on the device. By default only user-installed (non-system) apps are returned. Set include_system to true to also include system apps. Each entry contains the package name, display name, version, and whether it is a system app. |
| get_current_app | Get the package name and activity name of the app that is currently in the foreground on the device. Useful for determining what the user is currently looking at. |
| launch_app | [Pro] Launch an installed application by its package name (e.g. 'com.android.chrome'). The app will be started with its default/main activity. Use list_apps to discover available package names. |
| stop_app | [Pro] Force-stop a running application by its package name. This immediately terminates the app process. Useful for resetting app state or freeing resources. |
| install_app | [Pro] Install an Android application from an APK file on the host machine. Provide the full path to the .apk file. The APK will be pushed to the device and installed. |
| uninstall_app | [Pro] Uninstall an application from the device by its package name. This removes the app and all its data. System apps cannot be uninstalled without root access. |
| get_logs | Retrieve recent Android logcat entries from the device. You can filter by minimum log level (V=Verbose, D=Debug, I=Info, W=Warning, E=Error, F=Fatal) and/or by tag name. Returns structured JSON with timestamp, PID, TID, level, tag, and message for each entry. |
| analyze_screen | [Pro] Uses AI vision to analyze the current screen of a mobile device. Returns a structured analysis including app name, screen type, interactive elements with coordinates, visible text, and suggested next actions. This is the primary AI-based tool for understanding what is currently displayed on the device. |
| find_element | [Pro] Uses AI vision to find a specific UI element by natural language description. Returns the element's coordinates, type, and confidence score. Use this when you need to locate a specific button, field, or other UI element. Example queries: 'the login button', 'email input field', 'the red error message'. |
| suggest_actions | [Pro] Uses AI to analyze the current screen and suggest a sequence of actions to achieve a specified goal. Returns step-by-step instructions with exact coordinates for each action. Example goals: 'log into the app', 'navigate to settings', 'add an item to cart'. |
| visual_diff | [Pro] Compares the current screen with a previous screenshot to identify what changed. Provide a base64 PNG screenshot as the 'before' image; the tool captures the current screen as the 'after' image. Returns a list of changes with descriptions and regions. Useful for verifying that an action had the expected effect. |
| smart_tap | [Pro] Finds a UI element by natural language description and taps it. Combines element finding and tapping into a single action. Example: smart_tap('the Sign In button') will locate the button and tap its center coordinates. Returns whether the tap succeeded and which element was tapped. |
| smart_type | [Pro] Finds an input field by natural language description, taps it to focus, and types the specified text. Example: smart_type('email field', 'user@example.com') will find the email input, tap it, and type the email address. |
| extract_text | [Pro] Uses AI vision to extract all visible text from the current screen. Returns text in reading order (top to bottom, left to right). Useful for reading content, checking labels, or getting text that isn't in the accessibility tree. |
| verify_screen | [Pro] Uses AI to verify whether a specific assertion about the current screen is true. Returns a boolean result with confidence score and evidence. Example assertions: 'the login was successful', 'an error message is displayed', 'the cart has 3 items'. |
| wait_for_settle | [Pro] Waits for the screen to stop changing after a navigation or action. Polls the UI tree until two consecutive snapshots are identical, indicating animations and loading have completed. Use this after tapping navigation buttons or triggering screen transitions to ensure the new screen is ready for interaction. |
| wait_for_element | [Pro] Waits for a specific UI element to appear on screen by polling the UI tree. More reliable and faster than wait_for_settle: it returns as soon as the target element is found. Use after navigation to wait for a specific button, text, or field to appear. Example: wait_for_element('Find Routes') after selecting a station. |
| handle_popup | [Pro] Detects and handles system dialogs and popups (permission requests, update prompts, system alerts). Can automatically dismiss or accept the dialog. Use 'dismiss' to decline/skip, 'accept' to allow/confirm, or 'auto' to handle it automatically. |
| fill_form | [Pro] Fill multiple form fields in a single operation. Provide a map of field descriptions to values. Each field is found by natural language description, cleared, and filled with the specified value. More efficient than calling smart_type multiple times. |
| flutter_connect | [Pro] Discover and connect to a running Flutter app on the device via the Dart VM Service Protocol. The app must be running in debug or profile mode. Returns connection details including the isolate ID and app name. Call this before using other flutter_* tools. Optionally pass vm_service_url from 'flutter run' output if auto-discovery fails. |
| flutter_disconnect | [Pro] Disconnect from the currently connected Flutter app and clean up resources. |
| flutter_get_widget_tree | [Pro] Get the widget tree from the connected Flutter app. By default returns the summary tree (user-created widgets only), which maps directly to your source code. Each widget includes its type, properties, source code location (file:line), and children. Call flutter_connect first. |
| flutter_get_widget_details | [Pro] Get detailed information about a specific widget by its valueId (obtained from flutter_get_widget_tree). Returns properties, children, render bounds, and source location. |
| flutter_find_widget | [Pro] Search the widget tree for widgets matching a query. Searches by widget type name, description, and text content. Example queries: 'ElevatedButton', 'Text', 'AppBar', 'Login'. Returns matching widgets with their source locations. |
| flutter_get_source_map | [Pro] Map every user-created widget to its source code location (file:line:column). This is the key tool for connecting what you see on screen to where it is in code. Returns a list of {widget, file, line, column} entries. |
| flutter_screenshot_widget | [Pro] Take a screenshot of a specific Flutter widget by its valueId. Returns the widget rendered in isolation as a PNG image. |
| flutter_hot_reload | [Pro] Trigger a hot reload on the connected Flutter app. Pushes code changes without losing app state: variables, navigation stack, and scroll positions are preserved. Much faster than a full restart. Call flutter_connect first. |
| flutter_hot_restart | [Pro] Trigger a hot restart on the connected Flutter app. Restarts the app from scratch, losing all state, but applies all code changes including static field initializers and global variables. Use when hot reload is insufficient. Call flutter_connect first. |
| flutter_debug_paint | [Pro] Toggle the debug paint overlay on the Flutter app. Shows widget boundaries, padding, alignment guides, and construction lines. Useful for debugging layout issues. |
| record_screen | [Pro] Start recording the device screen as MP4 video. Android has a 3-minute maximum per recording. Only one recording per device at a time. Call stop_recording to finish and save the video. |
| stop_recording | [Pro] Stop an active screen recording and save the MP4 video. Optionally pull the recording file from the device to the host machine. |
| start_test_recording | [Pro] Start recording all MCP tool calls to generate a reproducible test script. All subsequent tool calls will be logged until stop_test_recording is called. |
| stop_test_recording | [Pro] Stop recording and generate a test script from the recorded actions. Supports TypeScript, Python, and JSON output formats. |
| get_recorded_actions | [Pro] View the actions recorded so far without stopping the recording. Useful for inspecting what has been captured. |
| ios_list_simulators | [Pro] List all available iOS simulators with their status (Booted/Shutdown), UDID, name, and iOS version. Works on macOS only. |
| ios_boot_simulator | [Pro] Boot an iOS simulator by its UDID. Get the UDID from ios_list_simulators. |
| ios_shutdown_simulator | [Pro] Shut down a running iOS simulator by its UDID. |
| ios_screenshot | [Pro] Take a screenshot of a running iOS simulator. Returns the image as base64. |
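The get_ui_elements and tap descriptions imply a two-step flow: read the element tree, then tap an element's center. The sketch below builds the corresponding MCP `tools/call` request as JSON-RPC 2.0. The request envelope follows the MCP specification. The element dictionaries, the flat `centerX`/`centerY` fields, and the `x`/`y` argument names are assumptions inferred from the descriptions above and may not match the server's real schema. `tool_call` and `find_by_text` are hypothetical helpers.

```python
import json
from typing import Any, Dict, List, Optional


def tool_call(request_id: int, name: str, arguments: Dict[str, Any]) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


def find_by_text(elements: List[Dict[str, Any]], text: str) -> Optional[Dict[str, Any]]:
    """Return the first element whose text matches, ignoring case."""
    wanted = text.casefold()
    for element in elements:
        if str(element.get("text", "")).casefold() == wanted:
            return element
    return None


# Hypothetical get_ui_elements output; the real field layout may differ.
elements = [
    {"index": 0, "text": "Email", "clickable": True, "centerX": 360, "centerY": 410},
    {"index": 1, "text": "Sign In", "clickable": True, "centerX": 360, "centerY": 620},
]

target = find_by_text(elements, "sign in")
if target is not None:
    # Assumed argument names: x and y, as suggested by the tap description.
    print(tool_call(2, "tap", {"x": target["centerX"], "y": target["centerY"]}))
```

The same `tool_call` helper applies to any tool in the table; only the tool name and argument map change.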
Prompts
Interactive templates invoked by user choice
| Name | Description |
|---|---|
| No prompts | |
Resources
Contextual data attached and managed by the client
| Name | Description |
|---|---|
| No resources | |
MCP directory API
We provide all the information about MCP servers via our MCP API.
curl -X GET 'https://glama.ai/api/mcp/v1/servers/saranshbamania/mobile-device-mcp'
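The same request can be made from Python using only the standard library. This is a minimal sketch: `server_url` and `fetch_server` are hypothetical helper names, and the assumption that the endpoint returns a JSON body is not confirmed by the text above.

```python
import json
import urllib.request
from typing import Any
from urllib.parse import quote

API_BASE = "https://glama.ai/api/mcp/v1/servers"


def server_url(owner: str, name: str) -> str:
    """Build the directory API URL for one server entry."""
    return f"{API_BASE}/{quote(owner, safe='')}/{quote(name, safe='')}"


def fetch_server(owner: str, name: str, timeout: float = 10.0) -> Any:
    """GET the server entry; assumes the response body is JSON."""
    request = urllib.request.Request(server_url(owner, name), method="GET")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

For this server, the call would be `fetch_server("saranshbamania", "mobile-device-mcp")`.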
If you have feedback or need assistance with the MCP directory API, please join our Discord server.