Mobile E2E MCP (2026)
AI-safe mobile device control via MCP: a policy-guarded, session-oriented mobile automation harness for AI agents, with deterministic-first Android/iOS execution, bounded visual fallback, and evidence-rich outcomes.
This repository is a pnpm monorepo that combines MCP tooling, adapter execution, and architecture docs for AI agents that need to inspect, act on, and debug mobile apps without turning raw device commands into ungoverned side effects.
Quick Start
{
  "mcpServers": {
    "mobile-e2e-mcp": {
      "command": "npx",
      "args": ["-y", "@shenyuexin/mobile-e2e-mcp@latest"]
    }
  }
}
Once installed, you get 66 MCP tools for governed mobile automation, plus a built-in Explorer for automatic page traversal.
Primary Product Surface: Explorer
Explorer is the main outward-facing product capability of this repository. It is not a standalone crawler bolted onto the side; it is the clearest product surface for the harness because broad app exploration requires the same core MCP capabilities a mobile AI agent needs in practice: device discovery, auditable sessions, UI inspection, bounded UI actions, rule-based risk gates, interruption/recovery handling, and structured evidence.
Use Explorer when you want to answer product questions such as:
Which screens are reachable from this app entry point?
Which flows are blocked by policy, external-app boundaries, risk gates, or repeated failures?
What changed between two exploration runs?
Which discovered paths should be promoted into deterministic replay or PR review evidence?
Explorer writes fixed, reviewable artifacts such as tree.txt, report.md, summary.json, config.json, and failure-review JSON/Markdown. Current local and tracked evidence includes large Settings explorations, including 100+ page runs, with rule decisions, interruption/failure context, and machine-consumable page metadata.
Core Wedge: Governed Agent Control
The strongest use case for this project is not "replace every mobile E2E framework." It is: give an AI agent a safer control plane for mobile devices.
Compared with a thin adb or platform-command wrapper, this harness adds:
Policy boundaries: actions are checked against policy profiles before execution.
Auditable sessions: actions run inside session, lease, audit, and evidence context.
Capability disclosure: agents can query supported platforms and boundaries before acting.
Structured outcomes: failures, denials, and evidence are machine-consumable instead of log-only.
Reproduce the dry-run proof:
pnpm run proof:governed-agent-mobile-control
The proof writes a timestamped bundle under output/showcase/governed-agent-mobile-control/<run-id>/ and verifies that a read-only session blocks an interactive action with structured POLICY_DENIED. See Governed Agent Mobile Control Proof.
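The kind of denial that proof checks for can be pictured with a small sketch. This is not the repository's implementation: the `ToolClass` type, the `PolicyDecision` shape, and the `checkPolicy` function are assumptions made for illustration. Only the `POLICY_DENIED` reason code and the read-only versus interactive distinction come from the text.

```typescript
// Hypothetical sketch: names below are illustrative, not the harness API.
type ToolClass = "read" | "interactive";
type PolicyProfile = "read-only" | "interactive";

interface PolicyDecision {
  allowed: boolean;
  reasonCode?: "POLICY_DENIED";
  detail?: string;
}

// Tool classes each (assumed) profile permits.
const allowedClasses: Record<PolicyProfile, Set<ToolClass>> = {
  "read-only": new Set<ToolClass>(["read"]),
  interactive: new Set<ToolClass>(["read", "interactive"]),
};

// Check a tool call against the session's profile before anything executes.
function checkPolicy(profile: PolicyProfile, toolClass: ToolClass): PolicyDecision {
  if (allowedClasses[profile].has(toolClass)) {
    return { allowed: true };
  }
  return {
    allowed: false,
    reasonCode: "POLICY_DENIED",
    detail: `profile "${profile}" does not permit ${toolClass} tools`,
  };
}

// A read-only session asking for an interactive action is denied with a structured reason.
const tapUnderReadOnly = checkPolicy("read-only", "interactive");
console.log(JSON.stringify(tapUnderReadOnly));
```

Returning a decision object rather than throwing keeps the denial machine-consumable, which is the property the proof is meant to demonstrate.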
Explorer: Automatic Page Traversal
Explorer is a DFS-based automatic page traversal engine built into the MCP server. It systematically navigates through your app's screens, builds a state graph, and produces structured coverage reports without requiring manual flow definitions.
npx -y @shenyuexin/mobile-e2e-mcp@latest explore \
  --app-id com.example.app \
  --platform android \
  --output ./explore-report
Key features:
DFS-based traversal: systematically explores every reachable screen from a starting point
State graph tracking: records visited states and detects cycles to avoid infinite loops
Circuit breaker: automatically stops when exploration hits diminishing returns or configured limits
Structured coverage reports: outputs machine-consumable reports showing which screens and elements were discovered
Rule-based gating: respects skip-page, skip-element, sampling, and risk-gating rules for safe exploration
Interruption-aware evidence: records blocked, interrupted, skipped, and failed traversal decisions with reasons so a run can be reviewed instead of treated as a raw pass/fail crawl
Experimental horizontal fallback: after vertical segments are exhausted, Explorer can probe horizontally scrollable content with bounded page-identity checks
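The traversal features listed above can be sketched over an in-memory graph. This is a minimal illustration under stated assumptions, not Explorer's code: the `NavGraph` map, the `ExploreLimits` fields, and the `explore` function are hypothetical, and a real run would discover edges from the device rather than from a fixed map.

```typescript
// Hypothetical sketch of bounded DFS with cycle detection, a skip rule, and a circuit breaker.
type PageId = string;
type NavGraph = Record<PageId, PageId[]>;

interface ExploreLimits {
  maxDepth: number;
  maxPages: number; // circuit breaker: stop once this many pages are visited
}

interface ExploreResult {
  visited: PageId[];
  skipped: PageId[]; // pages excluded by a skip-page rule
  stoppedEarly: boolean;
}

function explore(
  graph: NavGraph,
  start: PageId,
  limits: ExploreLimits,
  skipPages: Set<PageId> = new Set<PageId>(),
): ExploreResult {
  const visited = new Set<PageId>();
  const skipped = new Set<PageId>();
  let stoppedEarly = false;

  const visit = (page: PageId, depth: number): void => {
    if (visited.has(page)) return; // cycle or already-seen state
    if (skipPages.has(page)) {
      skipped.add(page);
      return;
    }
    if (depth > limits.maxDepth) return;
    if (visited.size >= limits.maxPages) {
      stoppedEarly = true;
      return;
    }
    visited.add(page);
    for (const next of graph[page] ?? []) visit(next, depth + 1);
  };

  visit(start, 0);
  return { visited: Array.from(visited), skipped: Array.from(skipped), stoppedEarly };
}

const demoGraph: NavGraph = {
  home: ["settings", "profile"],
  settings: ["wifi", "home"], // back-edge to home forms a cycle
  wifi: [],
  profile: ["payments"],
  payments: [],
};

const run = explore(demoGraph, "home", { maxDepth: 5, maxPages: 10 }, new Set(["payments"]));
console.log(run.visited.join(" > ")); // home > settings > wifi > profile
```

The skipped page is recorded rather than silently dropped, mirroring the idea that a run should be reviewable instead of a raw pass/fail crawl.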
Output
Explorer produces a directory of structured artifacts:
File | Description |
tree.txt | ASCII tree of all discovered pages and navigation paths |
report.md | Human-readable coverage report with module breakdown |
failure-review Markdown | Human-readable failure triage with grouped patterns and suggested next actions |
failure-review JSON | Machine-consumable failure triage summary |
summary.json | Machine-consumable metrics and page metadata |
config.json | Runtime configuration and rule settings used for the run |
Example output from a real run against iOS Settings (181 pages, max depth 5):
tree.txt: full page hierarchy
report.md: module breakdown and paths
summary.json: metrics and metadata
Local real-device runs are written under output/evidence/explorer/. These outputs are intentionally structured so they can be curated into public showcase evidence or consumed by follow-on tooling such as coverage diffing, PR summaries, and replay path extraction.
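As one example of follow-on tooling, coverage diffing between two runs might look like the sketch below. The `RunSummary` shape is an assumption: the actual summary.json schema is not shown here, so the `runId` and `pageIds` fields and the `diffCoverage` function are hypothetical.

```typescript
// Hypothetical shape: the real summary.json schema may differ.
interface RunSummary {
  runId: string;
  pageIds: string[];
}

interface CoverageDiff {
  added: string[];
  removed: string[];
  unchanged: string[];
}

// Compare the sets of discovered pages from two exploration runs.
function diffCoverage(before: RunSummary, after: RunSummary): CoverageDiff {
  const beforeSet = new Set(before.pageIds);
  const afterSet = new Set(after.pageIds);
  return {
    added: after.pageIds.filter((id) => !beforeSet.has(id)).sort(),
    removed: before.pageIds.filter((id) => !afterSet.has(id)).sort(),
    unchanged: before.pageIds.filter((id) => afterSet.has(id)).sort(),
  };
}

const baseline: RunSummary = { runId: "run-a", pageIds: ["home", "settings", "wifi"] };
const candidate: RunSummary = { runId: "run-b", pageIds: ["home", "settings", "bluetooth"] };
const coverageDiff = diffCoverage(baseline, candidate);
console.log(JSON.stringify(coverageDiff));
// {"added":["bluetooth"],"removed":["wifi"],"unchanged":["home","settings"]}
```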
For architecture details and rule configuration:
What This Repository Actually Is
This repo contains both:
Executable implementation (MCP server, adapters, contracts, core orchestration), and
Architecture and delivery knowledge base (design principles, capability model, phased rollout docs).
If you only remember one thing: this project is designed as an Explorer-led, governed mobile control layer for AI agents, not a single-framework test runner.
Mobile E2E Harness Positioning
This project is an AI mobile E2E harness: a policy-aware, session-oriented, deterministic-first execution harness for mobile automation where an AI agent needs controlled action, evidence, and support-boundary clarity.
If you're searching for terms like mobile test harness, real-device Android test harness, AI automation harness, or mobile CI harness, this repository is built for that exact workflow.
Why teams use this harness
Deterministic-first harness: stable selectors and structured retries before OCR/CV fallback
Failure-intelligence harness: reason codes, evidence artifacts, and remediation suggestions
Governance-aware harness: policy profiles, auditable sessions, and controlled tool surfaces
Explorer harness: the primary product surface, combining traversal, tool orchestration, risk gating, interruption handling, recovery, and structured coverage/failure evidence (available via CLI)
Real-device evidence: Explorer/probe artifacts plus historical videos for happy path and interruption recovery
Capability Showcase
If you want a quick hands-on tour before diving into architecture details, start here:
Happy path video (login -> scroll -> add to cart -> orders -> cart):
docs/showcase/videos/m2e-happy-path-scroll-pause-40s.mp4
Visible interruption + recovery video (HOME interruption -> recover_to_known_state -> continue action):
docs/showcase/videos/m2e-interruption-home-recovery-35s.mp4
Current real-device verification:
Android Explorer evidence:
docs/showcase/evidence/android-explorer-full-2026-04-28T03-38-20/
Android probe entrypoint:
pnpm run validate:android-tool-probe (latest Vivo V2405A run: 20/23 success, 0 partial, 3 expected diagnostic failures; core UI and interruption-resume paths passed)
iOS probe entrypoint:
pnpm run validate:ios-tool-probe
Governed-control proof:
pnpm run quickstart:governed-control (first-run readiness and next-command guide)
pnpm run proof:governed-agent-mobile-control
pnpm run proof:governed-agent-mobile-control:preflight (checks Android live-proof readiness)
pnpm run proof:governed-agent-mobile-control:live (requires an Android device/emulator)
pnpm run proof:governed-business-app-workflow (installs/launches the demo app, then switches to read-only governed agent observation)
pnpm run proof:governed-policy-escalation (denies under read-only, then retries under interactive policy)
pnpm run validate:governed-control-evidence
pnpm run validate:governed-business-app-evidence
pnpm run validate:governed-business-app-comparison
pnpm run validate:governed-policy-escalation-evidence
pnpm run validate:governed-evidence-brief
pnpm run validate:governed-pr-evidence-summary
pnpm run verify:mobile-change (one-command mobile change verification UX; use -- --live --contract=configs/readiness/mobile-change.android.json for contract-backed live mode)
pnpm run generate:mobile-change-readiness-contract / pnpm run validate:mobile-change-readiness-contract
pnpm run generate:mobile-change-repo-app-success-candidate / pnpm run validate:mobile-change-repo-app-success-candidate (repo-owned demo app success candidate; blocked output is not success evidence until a device/emulator run passes intake)
pnpm run generate:mobile-change-ci-pr-evidence / pnpm run validate:mobile-change-ci-pr-evidence (compact PR/CI artifact with proof-level-safe blocked/failed/success labels)
pnpm run generate:mobile-change-failure-memory / pnpm run validate:mobile-change-failure-memory (deterministic failure-pattern grouping and bounded next-action routing)
pnpm run verify:react-native-change (experimental RN lane that runs readiness plus evidence-pack orchestration; live success still requires device, Metro, debug target, stable selectors, and intake-backed proof)
pnpm run generate:react-native-readiness / pnpm run validate:react-native-readiness (RN preflight for device, Metro, JS debug target, readiness contract, and stable selectors)
pnpm run generate:react-native-evidence-pack / pnpm run validate:react-native-evidence-pack (RN review artifact that keeps Metro signals supplemental)
pnpm run proof:mobile-change-verification (fixture-backed mobile change verification bundle, failure packet, and scenario index)
pnpm run generate:mobile-change-device-readiness / pnpm run validate:mobile-change-device-readiness (structured device/app/readiness preflight before attempting live mobile change proof)
pnpm run proof:mobile-change-verification:live (optional live device/emulator proof; use M2E_LIVE_MOBILE_CHANGE_ALLOW_NO_DEVICE=1 for structured no-device output)
pnpm run proof:mobile-change-verification:live-settings (runnable no-APK Android Settings success lane; requires 10AEA40Z3Y000R5 or editing the device id)
pnpm run generate:mobile-change-live-settings-lane / pnpm run validate:mobile-change-live-settings-lane
pnpm run proof:mobile-change-verification:readiness-failure (controlled live-runner-derived app readiness failure packet)
pnpm run validate:mobile-change-verification
pnpm run validate:mobile-change-live-android-evidence (tracked Android device 10AEA40Z3Y000R5 live app-readiness failure evidence)
pnpm run validate:mobile-change-readiness-failure
pnpm run generate:mobile-change-handoff / pnpm run validate:mobile-change-handoff
pnpm run intake:mobile-change-live-proof / pnpm run validate:mobile-change-live-proof-intake (review live runner output before promoting it as tracked evidence)
docs/showcase/evidence/governed-control-vivo-2026-05-23/report.md
docs/showcase/evidence/governed-business-app-vivo-2026-05-24/report.md
docs/showcase/evidence/governed-business-app-vivo-2026-05-24/comparison.md
docs/showcase/evidence/governed-policy-escalation-dry-run-2026-05-25/report.md
docs/showcase/evidence/mobile-change-verification-fixture/report.md
docs/showcase/evidence/mobile-change-verification-fixture/failure-packet.md
docs/showcase/evidence/mobile-change-verification-fixture/scenario-index.md
docs/showcase/evidence/mobile-change-device-readiness/report.md
docs/showcase/evidence/mobile-change-live-android-10AEA40Z3Y000R5/report.md
docs/showcase/evidence/mobile-change-live-settings-lane/lane.md
docs/showcase/evidence/mobile-change-live-proof-intake/intake.md
docs/showcase/evidence/mobile-change-repo-app-success-candidate/candidate.md
docs/showcase/evidence/mobile-change-ci-pr-evidence/pr-summary.md
docs/showcase/evidence/mobile-change-failure-memory/remediation.md
docs/showcase/evidence/mobile-change-readiness-failure/failure-packet.md
docs/showcase/evidence/mobile-change-readiness-failure/handoff.md
Historical demo scripts:
bash scripts/legacy/dev/record-demo-happy-path-android.sh
bash scripts/legacy/dev/record-demo-interruption-home-recovery-android.sh
bash scripts/legacy/dev/publish-showcase-assets-android.sh (record + curate videos + refresh snapshots/GIFs)
Demo playbook and evidence index:
AI invocation and task guides:
CI evidence and boundary notes:
Quick GIF Preview
[GIF previews: happy path; interruption recovery]
FAQ
What is a mobile E2E harness for AI agents?
It is an execution layer that lets AI agents run mobile test actions safely and reproducibly. This harness adds session control, policy boundaries, deterministic action routing, and structured evidence beyond basic command execution.
Can this harness run on real Android devices?
Yes. Current real-device evidence is centered on Explorer/probe artifacts, with historical showcase scripts under scripts/legacy/dev/* and recordings under docs/showcase/*. The latest Android Vivo probe verified the core Settings UI action path and resume_interrupted_action with native_android; a few diagnostic/negative-path checks remain intentionally non-green unless their prerequisites, such as Metro, are present.
How does interruption recovery work in this harness?
It detects interruption signals, classifies likely interruption type, and applies bounded recovery actions (for example recover_to_known_state) before continuing the flow.
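A rough sketch of that detect, classify, recover loop follows. Everything in it is an assumption for illustration: the `ScreenSnapshot` fields, the `InterruptionKind` values, the `recoverBounded` helper, and the launcher package name are hypothetical and are not the harness's types or tools.

```typescript
// Hypothetical sketch: types and helpers are illustrative, not the harness API.
type InterruptionKind = "system_dialog" | "app_backgrounded";

interface ScreenSnapshot {
  foregroundApp: string;
  hasBlockingDialog: boolean;
}

// Classify the likely interruption, or return null if the target app is usable.
function classifyInterruption(snapshot: ScreenSnapshot, targetApp: string): InterruptionKind | null {
  if (snapshot.hasBlockingDialog) return "system_dialog";
  if (snapshot.foregroundApp !== targetApp) return "app_backgrounded";
  return null;
}

interface RecoveryOutcome {
  recovered: boolean;
  attempts: number; // recovery actions actually performed
  kinds: InterruptionKind[];
}

// Bounded recovery: perform at most maxAttempts recovery actions, then give up.
function recoverBounded(
  observe: () => ScreenSnapshot,
  recover: (kind: InterruptionKind) => void,
  targetApp: string,
  maxAttempts: number,
): RecoveryOutcome {
  const kinds: InterruptionKind[] = [];
  for (let attempts = 0; attempts <= maxAttempts; attempts++) {
    const kind = classifyInterruption(observe(), targetApp);
    if (kind === null) return { recovered: true, attempts, kinds };
    if (attempts === maxAttempts) break;
    kinds.push(kind);
    recover(kind);
  }
  return { recovered: false, attempts: maxAttempts, kinds };
}

// Simulated device: HOME was pressed, so an assumed launcher package is in the foreground.
const device: ScreenSnapshot = { foregroundApp: "com.android.launcher", hasBlockingDialog: false };
const outcome = recoverBounded(
  () => ({ ...device }),
  () => {
    device.foregroundApp = "com.example.app"; // stand-in for a recovery action
  },
  "com.example.app",
  2,
);
console.log(JSON.stringify(outcome));
// {"recovered":true,"attempts":1,"kinds":["app_backgrounded"]}
```

The attempt cap is the point of the sketch: recovery that cannot restore a known state stops and reports what it saw instead of retrying indefinitely.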
Is this a replacement for Appium or Maestro?
Not necessarily. It is better understood as an orchestration harness that can coexist with existing execution ecosystems while adding AI-oriented governance and diagnostics.
Which scenarios are the best fit?
AI agents that need safe mobile device access, release-gate mobile regression, flaky-flow triage, AI-driven exploratory checks, and real-device CI workflows that require auditable, evidence-rich outcomes.
What is the Explorer and when should I use it?
Explorer automatically traverses your app's screens without predefined flows. Use it when you need broad coverage discovery, want to map an unfamiliar app's navigation structure, or need to identify all reachable screens before writing targeted test flows. It is available via the explore CLI command.
Android physical-device Explorer evidence is tracked under docs/showcase/evidence/android-explorer-full-2026-04-28T03-38-20/: a full Settings traversal completed in 33m 50s with 45 pages, max depth 4, and 0 failures.
Validate that evidence offline with pnpm run validate:explorer-android-evidence -- --min-pages 45 --min-depth 4.
Appium / Maestro vs This Harness
Dimension | Appium / Maestro | Mobile E2E MCP Harness |
Core role | Automation framework / flow runner | AI-facing orchestration harness |
Execution strategy | Action execution centric | Deterministic-first + policy/session governance |
Failure handling | Assertion/command failure outputs | Structured diagnostics + ranked causes + remediation hints |
AI integration | Possible but not primary abstraction | Primary design target (tools for AI agents) |
Evidence model | Varies by setup | Built-in evidence-first action outcomes |
Helper app dependency | Required for iOS/Android replay | Android: owned-adb primary (no helper app needed for common commands); iOS simulator: axe CLI; iOS physical: WDA (one-time setup, see External Tools Guide) |
Official AI Mobile Tools vs This Harness
Android CLI/Journeys, Android Studio Journeys, and the Dart/Flutter MCP server are complementary upstream tools, not replacements for this harness. Treat them as source-native journey execution, authoring, or framework-context providers; feed their outputs into mobile-e2e evidence intake only when proof boundaries are explicit.
The machine-readable bridge contract is generated by:
pnpm run validate:official-tool-bridge
See docs/showcase/evidence/official-tool-bridge/bridge.md for the current relationship matrix.
How It Works (End-to-End)
Typical runtime path:
Agent/client invokes an MCP tool via stdio or dev CLI.
MCP server validates input and applies policy checks.
Session context is resolved (or created), with lease/scheduling guardrails.
Adapter router selects deterministic execution path first.
Action executes and returns a structured result envelope.
Artifacts/evidence (screens, logs, summaries) are attached for audit/debug.
If deterministic resolution fails and policy allows it, bounded OCR/CV fallback is attempted.
This is why the project emphasizes session + policy + evidence, not only UI actions.
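The ordering of that runtime path can be sketched as a short chain of stages that stops at the first failure. This is an illustration under assumptions, not the server's code: `ToolRequest`, `CallContext`, `handleToolCall`, the stage functions, and the `INVALID_INPUT` and `OK` codes are hypothetical. `tap` is used only as an example tool name from the catalog.

```typescript
// Hypothetical sketch of the stage ordering; type and function names are illustrative.
interface ToolRequest {
  tool: string;
  interactive: boolean; // does the tool change device state?
  sessionId?: string;
}

interface ResultEnvelope {
  status: "success" | "failed";
  reasonCode: string;
  artifacts: string[];
  stages: string[]; // stages that ran, in order
}

interface CallContext {
  request: ToolRequest;
  readOnlySession: boolean;
  stages: string[];
}

// Each stage returns null to continue, or a reason code to stop the call.
type Stage = (ctx: CallContext) => string | null;

const validateInput: Stage = (ctx) => {
  ctx.stages.push("validate");
  return ctx.request.tool.length > 0 ? null : "INVALID_INPUT";
};

const applyPolicy: Stage = (ctx) => {
  ctx.stages.push("policy");
  return ctx.readOnlySession && ctx.request.interactive ? "POLICY_DENIED" : null;
};

const resolveSession: Stage = (ctx) => {
  ctx.stages.push("session");
  ctx.request.sessionId = ctx.request.sessionId ?? "session-1"; // create if absent
  return null;
};

const executeAction: Stage = (ctx) => {
  ctx.stages.push("execute"); // the deterministic path would run here
  return null;
};

function handleToolCall(request: ToolRequest, readOnlySession: boolean): ResultEnvelope {
  const ctx: CallContext = { request, readOnlySession, stages: [] };
  for (const stage of [validateInput, applyPolicy, resolveSession, executeAction]) {
    const failure = stage(ctx);
    if (failure !== null) {
      return { status: "failed", reasonCode: failure, artifacts: [], stages: ctx.stages };
    }
  }
  // Evidence is attached last so the outcome can be audited.
  const artifacts = [`${ctx.request.sessionId}/screenshot.png`];
  return { status: "success", reasonCode: "OK", artifacts, stages: ctx.stages };
}

const denied = handleToolCall({ tool: "tap", interactive: true }, true);
console.log(denied.reasonCode, denied.stages.join(",")); // POLICY_DENIED validate,policy
```

Because the policy stage runs before session resolution and execution, a denied call never reaches the device in this sketch.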
High-Level Architecture
Reference split:
Control plane: tool contracts, policy checks, session orchestration, audit/evidence indexing
Execution plane: platform actions, UI resolution, retries, interruption handling, visual fallback
Architecture reference:
Source-of-truth note:
Architecture docs describe both current baseline and target-state design.
If a doc statement conflicts with strict validation behavior, prefer packages/contracts/*.schema.json and configs/policies/*.yaml for current enforced behavior.
Capability Map (Current Scope)
Environment & device control — discovery, lease/isolation, environment shaping
App lifecycle — install/launch/terminate/reset/deep-link entry
Perception & interaction — inspect/query UI, tap/type/wait, flow execution
Diagnostics & evidence — logs, crash signals, performance, screenshot/timeline artifacts
Reliability & remediation — reason-coded failures, bounded retries, remediation helpers
Tool registry/signature dispatch live in packages/mcp-server/src/server.ts, while descriptor metadata and wrapper composition live in packages/mcp-server/src/index.ts.
Complete MCP Tool Catalog (Current)
The server currently exposes 66 tools. For AI agents, this is the current tool surface.
1) Session & lifecycle
start_session, request_manual_handoff, end_session, run_flow, reset_app_state
2) Task orchestration & flow capture
execute_intent, complete_task, start_record_session, get_record_session_status, end_record_session, cancel_record_session, export_session_flow, record_task_flow, validate_flow
3) Device & app control
list_devices, install_app, launch_app, terminate_app, describe_capabilities, doctor
4) UI perception, targeting, and interaction
inspect_ui, query_ui, resolve_ui_target, scroll_only, scroll_and_resolve_ui_target, wait_for_ui, wait_for_ui_stable, tap, tap_element, scroll_and_tap_element, type_text, type_into_element, navigate_back
5) Evidence, observability, and diagnostics
take_screenshot, record_screen, get_logs, get_crash_signals, collect_diagnostics, collect_debug_evidence, get_screen_summary, get_session_state, get_page_context, capture_js_console_logs, capture_js_network_events, list_js_debug_targets, capture_element_screenshot, compare_visual_baseline
6) Interruption handling
detect_interruption, classify_interruption, resolve_interruption, resume_interrupted_action
7) Failure analysis, recovery, and remediation
perform_action_with_evidence, get_action_outcome, explain_last_failure, rank_failure_candidates, find_similar_failures, compare_against_baseline, recover_to_known_state, replay_last_stable_path, suggest_known_remediation, replay_checkpoint_chain
8) Performance profiling
measure_android_performance, measure_ios_performance
9) Network diagnostics
probe_network_readiness, diagnose_network_failure, inspect_network_policy
probe_network_readiness checks runtime connectivity, DNS, latency, and optional backend reachability. diagnose_network_failure starts from an observed failed request and attributes likely Android cleartext or iOS ATS release-policy blockers. inspect_network_policy remains the lower-level static checker for plain HTTP endpoints using decoded manifest, network-security-config, Info.plist, or readable APK/IPA ZIP artifact evidence. These tools do not proxy traffic or mutate app configuration.
For exact signatures and supported inputs/outputs, use packages/mcp-server/src/server.ts (the tool registry source of truth).
Deterministic Ladder and Fallback Policy
Action resolution order is intentional and strict:
Stable ID/resource-id/testID/accessibility identifier
Semantic tree match (text/label/role)
OCR text-region fallback (bounded)
CV/template fallback (bounded)
Fail with reason code + artifacts
Prohibited behavior:
OCR/CV as the default first path
Unbounded retries without state-change evidence
Silent downgrade from deterministic to probabilistic execution
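The ladder and its prohibitions can be sketched as an ordered loop that records every rung it tries. This is an illustrative sketch only: the `Rung` names, the `Resolver` signature, the `resolveTarget` function, the `allowVisualFallback` flag, and the `TARGET_NOT_FOUND` code are assumptions, not the harness's contracts.

```typescript
// Hypothetical sketch of the resolution ladder; names and result shapes are illustrative.
type Rung = "stable_id" | "semantic_tree" | "ocr" | "cv";

interface Resolution {
  status: "resolved" | "failed";
  rung?: Rung;
  reasonCode?: string;
  attempted: Rung[]; // recorded so any downgrade is visible, never silent
}

type Resolver = (query: string) => boolean;

const LADDER: Rung[] = ["stable_id", "semantic_tree", "ocr", "cv"];
const PROBABILISTIC = new Set<Rung>(["ocr", "cv"]);

function resolveTarget(
  query: string,
  resolvers: Record<Rung, Resolver>,
  allowVisualFallback: boolean,
): Resolution {
  const attempted: Rung[] = [];
  for (const rung of LADDER) {
    // OCR/CV rungs are tried only when policy explicitly allows them.
    if (PROBABILISTIC.has(rung) && !allowVisualFallback) continue;
    attempted.push(rung);
    if (resolvers[rung](query)) return { status: "resolved", rung, attempted };
  }
  return { status: "failed", reasonCode: "TARGET_NOT_FOUND", attempted };
}

// Stub resolvers: only the OCR rung can find the "Checkout" label.
const stubResolvers: Record<Rung, Resolver> = {
  stable_id: () => false,
  semantic_tree: () => false,
  ocr: (q) => q === "Checkout",
  cv: () => false,
};

const withFallback = resolveTarget("Checkout", stubResolvers, true);
const withoutFallback = resolveTarget("Checkout", stubResolvers, false);
console.log(withFallback.rung, withoutFallback.reasonCode); // ocr TARGET_NOT_FOUND
```

Each rung is tried exactly once per call, so there is no unbounded retry, and the `attempted` list makes a move from deterministic to probabilistic resolution explicit in the result.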
Repository-Wide Principles
Deterministic-first: use stable IDs/tree/native capabilities first; OCR/CV is bounded fallback.
Structured tool contracts: return machine-consumable result envelopes (status, reasonCode, artifacts).
Session-oriented execution: actions run in auditable sessions with explicit policy profiles.
Evidence-rich failures: failures should carry enough context for explain/replay/remediation.
Session, Policy, and Governance Model
Sessions are auditable execution units with timeline and artifact references.
Policy profiles can restrict tool classes (for example read-only vs interactive/full-control).
Lease/scheduler constraints prevent unsafe concurrent execution on the same target.
Redaction/governance paths exist to keep evidence useful while respecting data boundaries.
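The lease constraint can be illustrated with a minimal in-memory table. This is a sketch under assumptions rather than the scheduler's implementation: the `Lease` fields, the `LeaseTable` class, the TTL handling, and the `emulator-5554` device id are hypothetical examples.

```typescript
// Hypothetical sketch: a minimal in-memory lease table keyed by device id.
interface Lease {
  deviceId: string;
  sessionId: string;
  expiresAt: number; // epoch milliseconds
}

class LeaseTable {
  private leases = new Map<string, Lease>();

  // Grant the lease only if the device is free, expired, or already held by this session.
  acquire(deviceId: string, sessionId: string, ttlMs: number, now: number): Lease | null {
    const current = this.leases.get(deviceId);
    if (current && current.expiresAt > now && current.sessionId !== sessionId) {
      return null; // another session holds a live lease
    }
    const lease: Lease = { deviceId, sessionId, expiresAt: now + ttlMs };
    this.leases.set(deviceId, lease);
    return lease;
  }

  // Only the holding session may release its lease.
  release(deviceId: string, sessionId: string): boolean {
    const current = this.leases.get(deviceId);
    if (!current || current.sessionId !== sessionId) return false;
    this.leases.delete(deviceId);
    return true;
  }
}

const leaseTable = new LeaseTable();
const firstLease = leaseTable.acquire("emulator-5554", "session-a", 30_000, 0);
const secondLease = leaseTable.acquire("emulator-5554", "session-b", 30_000, 1_000);
const afterExpiry = leaseTable.acquire("emulator-5554", "session-b", 30_000, 31_000);
console.log(firstLease !== null, secondLease === null, afterExpiry !== null); // true true true
```

Passing `now` explicitly keeps the sketch deterministic; a real scheduler would read the clock itself.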
Key policy/config locations:
configs/policies/*.yaml
configs/profiles/*.yaml
Non-Goals (Important for Correct Expectations)
This is not a replacement for the internals of every mobile framework.
This is not OCR-first automation.
This does not imply separate full RN or Flutter backends, or immediate parity across all native/RN/Flutter edge cases.
This is not a single abstraction that erases all platform differences.
Selected Docs
README.zh-CN.md — Chinese overview
docs/README.md — public documentation index and publication policy
docs/guides/ai-agent-invocation.zh-CN.md — canonical AI-agent invocation guide
docs/guides/policy-profiles.md — policy profile usage and escalation boundaries
docs/engineering/ai-first-capability-expansion-guideline.md — feature expansion rules for AI-first harness capabilities
docs/architecture/overview.md — goals/scope/principles
docs/architecture/architecture.md — reference architecture
docs/architecture/capability-map.md — capability taxonomy/maturity
docs/architecture/governance-security.md — governance/security model
docs/architecture/README.zh-CN.md — architecture navigation index (zh-CN)
docs/architecture/session-orchestration-architecture.zh-CN.md — session lease/scheduler/runtime orchestration
docs/architecture/policy-engine-runtime-architecture.zh-CN.md — policy runtime/guard/scope mapping
docs/architecture/platform-implementation-matrix.zh-CN.md — cross-platform support matrix
docs/delivery/roadmap.md — delivery phases
docs/delivery/npm-release-and-git-tagging.zh-CN.md — npm release and Git tagging integration guide (includes layered doc-sync rules for PR/pre-tag/tag stages)
docs/showcase/README.md — real-device demo evidence and repro scripts
tests/README.md — test layers and CI scope
Roadmap Snapshot (Short)
Near term: harden deterministic session/action reliability and evidence model.
Mid term: broaden framework/profile maturity and real-run coverage.
Long term: stronger agentic remediation/governance and enterprise controls.
Detailed public planning references are maintained in docs/delivery/roadmap.md and docs/architecture/*.
Open Source Collaboration
License: MIT
Contributing guide: CONTRIBUTING.md
Changelog: CHANGELOG.md
Positioning
This project is not another isolated test framework. It is an AI-facing orchestration layer that routes mobile E2E actions through shared platform adapters and framework profiles, with deterministic-first behavior and strict governance boundaries.
Support This Project
If this project helps your team, you can support it by:
Starring and sharing the repository
Opening issues/PRs with reproducible evidence
Sponsoring the project

