Skip to main content
Glama

Agentic Android

Drive an Android phone or emulator with any LLM — Claude, OpenAI, or a local Ollama / LM Studio model — over ADB, in plain English. No root, no app to install.

Agentic Android is a computer-use agent for phones. It reads the screen (as a screenshot or a text list of on-screen elements), picks an action (tap / swipe / type / launch app), runs it over adb, sees the result, and loops until your task is done.

You ──"install WhatsApp"──▶ Agentic Android ──screen + tools──▶ LLM
                                 ▲                                 │
                                 └────────── adb (tap/swipe/type) ◀┘

Brains (providers)

provider

What it is

Needs

claude-cli

spawns your logged-in Claude Code CLI (live chat)

the claude CLI — no API key

anthropic

Anthropic API

ANTHROPIC_API_KEY

openai

OpenAI or any OpenAI-compatible endpoint (OpenRouter, Azure, …)

OPENAI_API_KEY

ollama

a local model via Ollama

nothing — free, on-device

lmstudio

a local model via LM Studio

nothing — free, on-device

Runs in vision mode (screenshots) or text mode (a structured element list from the UI tree) — text mode is cheaper/faster and works with non-vision and text-only local models. Set it all in one file, agentic-android.toml.

Related MCP server: DeepADB

Requirements

  1. A device: a phone with USB debugging, or an emulator — including a networked one (e.g. MuMu/an emulator VM at 192.168.1.79:5555).

  2. adb: provided automatically by the adbutils dependency (bundled binary), so no system install is needed. A system adb on PATH is used if present. Verify with python -m agentic_android --list-devices.

  3. A brain — pick one (set in agentic-android.toml):

    • claude-cli (recommended): the claude CLI installed and logged in (run claude once). No API key.

    • anthropic: an Anthropic API key.

    • openai: an OpenAI key — or any OpenAI-compatible endpoint (OpenRouter, Azure…) via a custom base URL.

    • ollama: a local model, free, no key. Auto-detects the loaded model from http://localhost:11434/v1. The model must support tool calling (e.g. qwen3, llama3.1/3.2, mistral-nemo); vision models like qwen3-vl / llava can set vision = true. (Uses Ollama's native API so it can set a usable context window — Ollama's 4096 default is too small.)

    • lmstudio: a local model via LM Studio, free, no key. Start the server (Developer tab), load a tool-calling model (e.g. Gemma 4, Qwen, Llama 3.x), and it auto-detects it from http://localhost:1234/v1. Set a generous context length in LM Studio when loading.

Install

Easiest — use the launcher (creates its own virtualenv on first run, so you never touch the system Python):

./run.sh --list-devices          # Linux/macOS
run.bat --list-devices           # Windows (or just double-click run.bat)

run.sh/run.bat is just a wrapper around python -m agentic_android, so any argument in this README works after it (e.g. ./run.sh --provider openai "…").

Manual install (if you prefer):

cd /mnt/windows/Work/Ai/projects/agentic-android
python3 -m venv .venv
.venv/bin/pip install -e .        # or: pip install -r requirements.txt
.venv/bin/python -m agentic_android --list-devices

⚠️ Use the venv's Python, not the system one. python -m agentic_android with /usr/bin/python fails with "No module named agentic_android" — that Python doesn't have the package or its dependencies. Use ./run.sh, or activate the venv (source .venv/bin/activate), or call .venv/bin/python directly.

Pillow is optional — it only downscales large screenshots to cut token cost. Without it (e.g. on Python 3.14 where wheels may be missing) the agent sends full-resolution screenshots and still works.

Configure (one file) ⭐

Everything you'd normally tweak lives in agentic-android.toml — open it, it's commented. The three things that matter:

provider = "claude-cli"          # claude-cli | anthropic | openai

[device]
serial = "192.168.1.79:5555"     # your phone/emulator (see --list-devices)

[agent]
effort = 3                       # 0 = ask early/cheap … 5 = try everything/pricey

Then fill in the section for your provider ([claude_cli], [anthropic], or [openai]). For OpenAI-compatible servers, just change [openai].base_url:

[openai]
api_key  = ""                            # or set env OPENAI_API_KEY
base_url = "https://api.openai.com/v1"   # OpenRouter / http://localhost:11434/v1 / …
model    = "gpt-4o"                      # must be vision-capable

Run it with no arguments and it uses your config:

python -m agentic_android

Use

Live chat — brain is your claude CLI (no API key) ⭐

The recommended way. It spawns a headless claude agent (using your logged-in Claude Code subscription — no ANTHROPIC_API_KEY needed), wires the device tools in over MCP, and relays your terminal to/from the agent. You chat, it acts, and you can keep typing to steer it mid-task.

python -m agentic_android --chat -s 192.168.1.79:5555
you> Install WhatsApp on my phone
   · screenshot {}
agent> Opening the Play Store and searching for WhatsApp…
   · launch_app {"package":"com.android.vending"}
   · tap {"x":360,"y":140}
   · type_text {"text":"WhatsApp"}
   ...
agent> WhatsApp is installing. I'll let you know when it's done.
you> actually also install Telegram while you're at it     ← steer mid-task

Requirements for chat: the claude CLI installed and logged in (run claude once interactively). Options: --model sonnet|haiku|opus (default sonnet), --budget 2 (USD cap via claude --max-budget-usd). The agent is spawned with its working directory set to this project folder, so its Claude Code session history stays isolated here.

How it works: chat.py launches claude -p --input-format stream-json --output-format stream-json --permission-mode bypassPermissions --mcp-config <agentic_android>; the MCP server (agentic_android/mcp_server.py) exposes screenshot / tap / swipe / type_text / press_key / launch_app / list_apps / dump_ui, with screen-changing tools returning a fresh screenshot image the agent can see.

Persistence level (how hard it tries before asking) & cost

When the agent gets stuck or hits an ambiguous choice, it asks you a question with options instead of guessing or thrashing. How hard it tries to recover on its own before asking is the persistence level, 0–5, set in agentic-android.toml (or --effort N per run):

Level

Behaviour

Cost

0

Ask at the first ambiguity or failure; almost no self-recovery

cheapest

1

One quick retry, then ask

2

A couple of recovery attempts, then ask

3

Default — several strategies (re-screenshot, dump_ui, scroll, alt path) before asking

4

Persistent: exhaust visual + UI-tree approaches; ask only when truly blocked

5

Maximum: try everything; only ask for what it can't resolve (passwords, 2FA, purchase confirmation)

most expensive

Cost: higher levels mean more screenshots, tool calls and model turns before the agent pauses — which increases API / usage cost. Lower levels check in with you sooner and spend less.

python -m agentic_android --chat -s 192.168.1.79:5555 --effort 4     # try hard before asking
python -m agentic_android --chat -s 192.168.1.79:5555 --effort 0     # ask early, spend little

When it asks in chat, just type your answer (a number or freeform); the agent continues from there. In the API-agent path it uses an ask_user tool that prompts in the terminal.

API agent — anthropic or openai

Set provider to anthropic or openai (in the config, or --provider). The agent loop is provider-agnostic — it keeps one ongoing conversation, sends a fresh screenshot with each of your messages, and acts. ask_user lets it ask you a question with options when stuck.

# OpenAI (or any OpenAI-compatible endpoint — note the custom --base-url)
python -m agentic_android --provider openai "Open Settings and turn on Airplane mode"
python -m agentic_android --provider openai                 # interactive chat

# Local models, free, no key (auto-detect the loaded model)
./run.sh --provider ollama "Open the Chrome browser app"
./run.sh --provider lmstudio "Open the Chrome browser app"   # LM Studio

# Anthropic
python -m agentic_android --provider anthropic --model claude-opus-4-8

Keys come from the config file or the env (OPENAI_API_KEY / ANTHROPIC_API_KEY). Example tasks: "Open the Clock app and set an alarm for 7:30 AM", "In Settings, find the Android version".

Text mode — for models that can't see screenshots

Many models (most local/text-only LLMs) can't read images. Set vision = false (or pass --no-vision) and the agent stops sending screenshots — instead it describes the screen as a numbered text list of on-screen elements parsed from the Android UI tree (the DopeGram approach), each with a tap point:

Screen 720x1280px — 45 elements. Tap with the @(x,y) point.
#1 [ImageView] "Back" @(48,57) [tap]
#4 [ImageView] "Open notification settings" @(560,57) [tap]
#9 [View] "1,380 posts" id=profile_header_post_count @(298,236) [tap]
#20 [EditText] "Search" id=search_input @(360,140) [tap,input]

The model just taps the @(x,y) of the element it wants (coordinates are real device pixels). Enable per provider:

[openai]
model  = "llama3.2-vision"   # or any text model
vision = false
python -m agentic_android --provider openai --base-url http://localhost:11434/v1 --no-vision

Vision (screenshots) is still better when the model supports it; text mode is the fallback that makes non-vision models usable.

Debugging — save the session

Add --debug (or [agent] debug = true) to save every API request and response for the run to debug/session-<timestamp>.jsonl (one JSON object per line):

./run.sh --provider openai --no-vision --debug "Open the Clock app"
  • For anthropic/openai: each request (model, messages, tools) and its response — including token usage — is recorded. Base64 screenshots are redacted to a size marker so the file stays readable; the text element list is kept verbatim.

  • For claude-cli: the agent's full raw stream-json (every tool call, result, and message) is saved.

debug/ is git-ignored. Use --debug-dir <path> to change the location.

How it works

File

Role

agentic_android/adb.py

Thin adb wrapper: screencap, input tap/swipe/text/key, uiautomator dump, app launch.

agentic_android/device.py

Screenshot capture with optional downscaling; maps Claude's image coordinates back to real device pixels.

agentic_android/mcp_server.py

MCP stdio server exposing the device tools (used by the claude-cli provider).

agentic_android/chat.py

Spawns a claude CLI agent with the MCP server attached and relays the terminal chat (no API key).

agentic_android/brains.py

Provider backends — AnthropicBrain and OpenAIBrain (custom base URL) behind one interface.

agentic_android/agent.py

Provider-agnostic agent loop — drives a brain; one-shot run() and interactive chat().

agentic_android/tools.py

Tool schemas (converted to OpenAI function format by brains.py).

agentic_android/config.py

Loads agentic-android.toml; persistence-level (0–5) text.

agentic_android/__main__.py

CLI / provider selection / device selection / dispatch.

Model is claude-opus-4-8 with adaptive thinking. Coordinates Claude returns are in the shown image's pixel space; Device.scale converts them to physical pixels so taps land correctly even when screenshots are downscaled.

Notes & limits

  • type_text goes through adb shell input text; it escapes common characters but is best with plain ASCII. For unusual unicode, prefer an on-screen keyboard or a per-app text method.

  • This automates whatever device you point it at — review tasks before running destructive ones. There is no built-in confirmation gate (add one in agent._dispatch if you want human-in-the-loop approval).

  • One action set per step keeps the loop legible; raise --max-steps for longer flows.

Roadmap ideas

  • A confirmation gate for destructive actions (uninstall, factory reset, purchases).

  • scroll_to_text / tap_text helpers built on dump_ui for more robust targeting.

  • Multi-device fan-out; record/replay of a successful action trace.

A
license - permissive license
-
quality - not tested
C
maintenance

Maintenance

Maintainers
Response time
Release cycle
Releases (12mo)
Commit activity

Resources

Unclaimed servers have limited discoverability.

Looking for Admin?

If you are the server author, to access and configure the admin panel.

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/xDope7137/agentic-android'

If you have feedback or need assistance with the MCP directory API, please join our Discord server