agentic-android
Provides tools for controlling an Android device via ADB, enabling AI agents to perform actions like tapping, swiping, typing, launching apps, and reading the screen.
Allows using a local Ollama model (tool-calling or vision) as the agent's brain, free and on-device.
Allows using OpenAI's API or any OpenAI-compatible endpoint (e.g., OpenRouter, Azure) as the agent's brain.
Click on "Install Server".
Wait a few minutes for the server to deploy. Once ready, it will show a "Started" state.
In the chat, type
@followed by the MCP server name and your instructions, e.g., "@agentic-androidOpen the Settings app and turn on Wi-Fi"
That's it! The server will respond to your query, and you can continue using it as needed.
Here is a step-by-step guide with screenshots.
Agentic Android
Drive an Android phone or emulator with any LLM — Claude, OpenAI, or a local Ollama / LM Studio model — over ADB, in plain English. No root, no app to install.
Agentic Android is a computer-use agent for phones. It reads the screen (as a
screenshot or a text list of on-screen elements), picks an action
(tap / swipe / type / launch app), runs it over adb, sees the result, and loops
until your task is done.
You ──"install WhatsApp"──▶ Agentic Android ──screen + tools──▶ LLM
▲ │
└────────── adb (tap/swipe/type) ◀┘Brains (providers)
| What it is | Needs |
| spawns your logged-in Claude Code CLI (live chat) | the |
| Anthropic API |
|
| OpenAI or any OpenAI-compatible endpoint (OpenRouter, Azure, …) |
|
| a local model via Ollama | nothing — free, on-device |
| a local model via LM Studio | nothing — free, on-device |
Runs in vision mode (screenshots) or text mode (a structured element list
from the UI tree) — text mode is cheaper/faster and works with non-vision and
text-only local models. Set it all in one file, agentic-android.toml.
Related MCP server: DeepADB
Requirements
A device: a phone with USB debugging, or an emulator — including a networked one (e.g. MuMu/an emulator VM at
192.168.1.79:5555).adb: provided automatically by the
adbutilsdependency (bundled binary), so no system install is needed. A systemadbon PATH is used if present. Verify withpython -m agentic_android --list-devices.A brain — pick one (set in
agentic-android.toml):claude-cli(recommended): theclaudeCLI installed and logged in (runclaudeonce). No API key.anthropic: an Anthropic API key.openai: an OpenAI key — or any OpenAI-compatible endpoint (OpenRouter, Azure…) via a custom base URL.ollama: a local model, free, no key. Auto-detects the loaded model fromhttp://localhost:11434/v1. The model must support tool calling (e.g.qwen3,llama3.1/3.2,mistral-nemo); vision models likeqwen3-vl/llavacan setvision = true. (Uses Ollama's native API so it can set a usable context window — Ollama's 4096 default is too small.)lmstudio: a local model via LM Studio, free, no key. Start the server (Developer tab), load a tool-calling model (e.g. Gemma 4, Qwen, Llama 3.x), and it auto-detects it fromhttp://localhost:1234/v1. Set a generous context length in LM Studio when loading.
Install
Easiest — use the launcher (creates its own virtualenv on first run, so you never touch the system Python):
./run.sh --list-devices # Linux/macOS
run.bat --list-devices # Windows (or just double-click run.bat)run.sh/run.bat is just a wrapper around python -m agentic_android, so any
argument in this README works after it (e.g. ./run.sh --provider openai "…").
Manual install (if you prefer):
cd /mnt/windows/Work/Ai/projects/agentic-android
python3 -m venv .venv
.venv/bin/pip install -e . # or: pip install -r requirements.txt
.venv/bin/python -m agentic_android --list-devices⚠️ Use the venv's Python, not the system one.
python -m agentic_androidwith/usr/bin/pythonfails with "No module named agentic_android" — that Python doesn't have the package or its dependencies. Use./run.sh, or activate the venv (source .venv/bin/activate), or call.venv/bin/pythondirectly.
Pillow is optional — it only downscales large screenshots to cut token cost. Without it (e.g. on Python 3.14 where wheels may be missing) the agent sends full-resolution screenshots and still works.
Configure (one file) ⭐
Everything you'd normally tweak lives in agentic-android.toml — open it, it's
commented. The three things that matter:
provider = "claude-cli" # claude-cli | anthropic | openai
[device]
serial = "192.168.1.79:5555" # your phone/emulator (see --list-devices)
[agent]
effort = 3 # 0 = ask early/cheap … 5 = try everything/priceyThen fill in the section for your provider ([claude_cli], [anthropic], or
[openai]). For OpenAI-compatible servers, just change [openai].base_url:
[openai]
api_key = "" # or set env OPENAI_API_KEY
base_url = "https://api.openai.com/v1" # OpenRouter / http://localhost:11434/v1 / …
model = "gpt-4o" # must be vision-capableRun it with no arguments and it uses your config:
python -m agentic_androidUse
Live chat — brain is your claude CLI (no API key) ⭐
The recommended way. It spawns a headless claude agent (using your logged-in
Claude Code subscription — no ANTHROPIC_API_KEY needed), wires the device
tools in over MCP, and relays your terminal to/from the agent. You chat, it
acts, and you can keep typing to steer it mid-task.
python -m agentic_android --chat -s 192.168.1.79:5555you> Install WhatsApp on my phone
· screenshot {}
agent> Opening the Play Store and searching for WhatsApp…
· launch_app {"package":"com.android.vending"}
· tap {"x":360,"y":140}
· type_text {"text":"WhatsApp"}
...
agent> WhatsApp is installing. I'll let you know when it's done.
you> actually also install Telegram while you're at it ← steer mid-taskRequirements for chat: the claude CLI installed and logged in (run claude
once interactively). Options: --model sonnet|haiku|opus (default sonnet),
--budget 2 (USD cap via claude --max-budget-usd). The agent is spawned with
its working directory set to this project folder, so its Claude Code session
history stays isolated here.
How it works: chat.py launches
claude -p --input-format stream-json --output-format stream-json --permission-mode bypassPermissions --mcp-config <agentic_android>; the MCP server
(agentic_android/mcp_server.py) exposes screenshot / tap / swipe / type_text / press_key / launch_app / list_apps / dump_ui, with screen-changing tools returning a fresh
screenshot image the agent can see.
Persistence level (how hard it tries before asking) & cost
When the agent gets stuck or hits an ambiguous choice, it asks you a question
with options instead of guessing or thrashing. How hard it tries to recover on
its own before asking is the persistence level, 0–5, set in
agentic-android.toml (or --effort N per run):
Level | Behaviour | Cost |
0 | Ask at the first ambiguity or failure; almost no self-recovery | cheapest |
1 | One quick retry, then ask | |
2 | A couple of recovery attempts, then ask | |
3 | Default — several strategies (re-screenshot, | |
4 | Persistent: exhaust visual + UI-tree approaches; ask only when truly blocked | |
5 | Maximum: try everything; only ask for what it can't resolve (passwords, 2FA, purchase confirmation) | most expensive |
Cost: higher levels mean more screenshots, tool calls and model turns before the agent pauses — which increases API / usage cost. Lower levels check in with you sooner and spend less.
python -m agentic_android --chat -s 192.168.1.79:5555 --effort 4 # try hard before asking
python -m agentic_android --chat -s 192.168.1.79:5555 --effort 0 # ask early, spend littleWhen it asks in chat, just type your answer (a number or freeform); the agent
continues from there. In the API-agent path it uses an ask_user tool that
prompts in the terminal.
API agent — anthropic or openai
Set provider to anthropic or openai (in the config, or --provider). The
agent loop is provider-agnostic — it keeps one ongoing conversation, sends a
fresh screenshot with each of your messages, and acts. ask_user lets it ask
you a question with options when stuck.
# OpenAI (or any OpenAI-compatible endpoint — note the custom --base-url)
python -m agentic_android --provider openai "Open Settings and turn on Airplane mode"
python -m agentic_android --provider openai # interactive chat
# Local models, free, no key (auto-detect the loaded model)
./run.sh --provider ollama "Open the Chrome browser app"
./run.sh --provider lmstudio "Open the Chrome browser app" # LM Studio
# Anthropic
python -m agentic_android --provider anthropic --model claude-opus-4-8Keys come from the config file or the env (OPENAI_API_KEY / ANTHROPIC_API_KEY).
Example tasks: "Open the Clock app and set an alarm for 7:30 AM", "In Settings,
find the Android version".
Text mode — for models that can't see screenshots
Many models (most local/text-only LLMs) can't read images. Set vision = false
(or pass --no-vision) and the agent stops sending screenshots — instead it
describes the screen as a numbered text list of on-screen elements parsed
from the Android UI tree (the DopeGram approach), each with a tap point:
Screen 720x1280px — 45 elements. Tap with the @(x,y) point.
#1 [ImageView] "Back" @(48,57) [tap]
#4 [ImageView] "Open notification settings" @(560,57) [tap]
#9 [View] "1,380 posts" id=profile_header_post_count @(298,236) [tap]
#20 [EditText] "Search" id=search_input @(360,140) [tap,input]The model just taps the @(x,y) of the element it wants (coordinates are real
device pixels). Enable per provider:
[openai]
model = "llama3.2-vision" # or any text model
vision = falsepython -m agentic_android --provider openai --base-url http://localhost:11434/v1 --no-visionVision (screenshots) is still better when the model supports it; text mode is the fallback that makes non-vision models usable.
Debugging — save the session
Add --debug (or [agent] debug = true) to save every API request and
response for the run to debug/session-<timestamp>.jsonl (one JSON object per
line):
./run.sh --provider openai --no-vision --debug "Open the Clock app"For
anthropic/openai: each request (model, messages, tools) and its response — including tokenusage— is recorded. Base64 screenshots are redacted to a size marker so the file stays readable; the text element list is kept verbatim.For
claude-cli: the agent's full raw stream-json (every tool call, result, and message) is saved.
debug/ is git-ignored. Use --debug-dir <path> to change the location.
How it works
File | Role |
| Thin |
| Screenshot capture with optional downscaling; maps Claude's image coordinates back to real device pixels. |
| MCP stdio server exposing the device tools (used by the |
| Spawns a |
| Provider backends — |
| Provider-agnostic agent loop — drives a brain; one-shot |
| Tool schemas (converted to OpenAI function format by |
| Loads |
| CLI / provider selection / device selection / dispatch. |
Model is claude-opus-4-8 with adaptive thinking. Coordinates Claude returns
are in the shown image's pixel space; Device.scale converts them to physical
pixels so taps land correctly even when screenshots are downscaled.
Notes & limits
type_textgoes throughadb shell input text; it escapes common characters but is best with plain ASCII. For unusual unicode, prefer an on-screen keyboard or a per-app text method.This automates whatever device you point it at — review tasks before running destructive ones. There is no built-in confirmation gate (add one in
agent._dispatchif you want human-in-the-loop approval).One action set per step keeps the loop legible; raise
--max-stepsfor longer flows.
Roadmap ideas
A confirmation gate for destructive actions (uninstall, factory reset, purchases).
scroll_to_text/tap_texthelpers built ondump_uifor more robust targeting.Multi-device fan-out; record/replay of a successful action trace.
This server cannot be installed
Maintenance
Resources
Unclaimed servers have limited discoverability.
Looking for Admin?
If you are the server author, to access and configure the admin panel.
Latest Blog Posts
MCP directory API
We provide all the information about MCP servers via our MCP API.
curl -X GET 'https://glama.ai/api/mcp/v1/servers/xDope7137/agentic-android'
If you have feedback or need assistance with the MCP directory API, please join our Discord server