Mobile E2E MCP

by shenyuexin

Mobile E2E MCP (2026)

AI-safe mobile device control via MCP: a policy-guarded, session-oriented mobile automation harness for AI agents, with deterministic-first Android/iOS execution, bounded visual fallback, and evidence-rich outcomes.

This repository is a pnpm monorepo that combines MCP tooling, adapter execution, and architecture docs for AI agents that need to inspect, act on, and debug mobile apps without turning raw device commands into ungoverned side effects.

Quick Start

{
  "mcpServers": {
    "mobile-e2e-mcp": {
      "command": "npx",
      "args": ["-y", "@shenyuexin/mobile-e2e-mcp@latest"]
    }
  }
}

Once installed, you get 66 MCP tools for governed mobile automation, plus a built-in Explorer for automatic page traversal.

Primary Product Surface: Explorer

Explorer is the main outward-facing product capability of this repository. It is not a standalone crawler bolted onto the side; it is the clearest product surface for the harness because broad app exploration requires the same core MCP capabilities a mobile AI agent needs in practice: device discovery, auditable sessions, UI inspection, bounded UI actions, rule-based risk gates, interruption/recovery handling, and structured evidence.

Use Explorer when you want to answer product questions such as:

  • Which screens are reachable from this app entry point?

  • Which flows are blocked by policy, external-app boundaries, risk gates, or repeated failures?

  • What changed between two exploration runs?

  • Which discovered paths should be promoted into deterministic replay or PR review evidence?

Explorer writes fixed, reviewable artifacts such as tree.txt, report.md, summary.json, config.json, and failure-review JSON/Markdown. Current local and tracked evidence includes large Settings explorations (some exceeding 100 pages) with rule decisions, interruption/failure context, and machine-consumable page metadata.

Core Wedge: Governed Agent Control

The strongest use case for this project is not "replace every mobile E2E framework." It is: give an AI agent a safer control plane for mobile devices.

Compared with a thin adb or platform-command wrapper, this harness adds:

  • Policy boundaries: actions are checked against policy profiles before execution.

  • Auditable sessions: actions run inside session, lease, audit, and evidence context.

  • Capability disclosure: agents can query supported platforms and boundaries before acting.

  • Structured outcomes: failures, denials, and evidence are machine-consumable instead of log-only.

Reproduce the dry-run proof:

pnpm run proof:governed-agent-mobile-control

The proof writes a timestamped bundle under output/showcase/governed-agent-mobile-control/<run-id>/ and verifies that a read-only session blocks an interactive action with structured POLICY_DENIED. See Governed Agent Mobile Control Proof.
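The denial described above can be pictured with a minimal sketch. Everything here except the POLICY_DENIED reason code is an assumption made for illustration: the profile names, the tool-to-class mapping, and the checkPolicy function are hypothetical, and the enforced rules live in the repository's policy configs, not in this snippet.

```typescript
// Hypothetical policy gate: names and shapes are illustrative, not the project's API.
type PolicyProfile = "read-only" | "interactive";
type ToolClass = "inspect" | "interact";

interface ToolResult {
  status: "success" | "denied";
  reasonCode?: string;
  tool: string;
}

// Assumed classification of a few tools from the catalog (illustrative subset).
const TOOL_CLASS: Record<string, ToolClass> = {
  inspect_ui: "inspect",
  take_screenshot: "inspect",
  tap: "interact",
  type_text: "interact",
};

function checkPolicy(profile: PolicyProfile, tool: string): ToolResult {
  const toolClass = TOOL_CLASS[tool];
  // Fail closed: a tool without a known class is denied.
  if (toolClass === undefined) {
    return { status: "denied", reasonCode: "POLICY_DENIED", tool };
  }
  // A read-only session may inspect but never act.
  if (profile === "read-only" && toolClass === "interact") {
    return { status: "denied", reasonCode: "POLICY_DENIED", tool };
  }
  return { status: "success", tool };
}
```

In this sketch the check runs before any device command is issued, so a denied action produces a structured result rather than a side effect.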

Explorer: Automatic Page Traversal

Explorer is a DFS-based automatic page traversal engine built into the MCP server. It systematically navigates through your app's screens, builds a state graph, and produces structured coverage reports without requiring manual flow definitions.

npx -y @shenyuexin/mobile-e2e-mcp@latest explore \
  --app-id com.example.app \
  --platform android \
  --output ./explore-report

Key features:

  • DFS-based traversal: systematically explores the screens reachable from a starting point, within the configured rules and limits

  • State graph tracking: records visited states and detects cycles to avoid infinite loops

  • Circuit breaker: automatically stops when exploration hits diminishing returns or configured limits

  • Structured coverage reports: outputs machine-consumable reports showing which screens and elements were discovered

  • Rule-based gating: respects skip-page, skip-element, sampling, and risk-gating rules for safe exploration

  • Interruption-aware evidence: records blocked, interrupted, skipped, and failed traversal decisions with reasons so a run can be reviewed instead of treated as a raw pass/fail crawl

  • Experimental horizontal fallback: after vertical segments are exhausted, Explorer can probe horizontally scrollable content with bounded page-identity checks
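The traversal idea behind the features above can be sketched as a depth-first walk over a page graph with a visited set, a skip list, and hard limits. This is a simplified illustration, not Explorer's implementation: the graph is an in-memory map instead of a live device, and the names explore, ExploreLimits, and skipPages are hypothetical.

```typescript
// Hypothetical DFS sketch over an in-memory page graph.
interface ExploreLimits {
  maxDepth: number;
  maxPages: number;
}

interface ExploreResult {
  visited: string[];
  stoppedByLimit: boolean;
}

function explore(
  graph: Record<string, string[]>,
  start: string,
  limits: ExploreLimits,
  skipPages: string[] = []
): ExploreResult {
  const seen: Record<string, boolean> = {};
  const visited: string[] = [];
  let stoppedByLimit = false;

  function visit(page: string, depth: number): void {
    // Cycle detection and skip-page rules: never enter these pages.
    if (seen[page] || skipPages.indexOf(page) !== -1) return;
    // Circuit breaker: stop descending once a configured limit is hit.
    if (visited.length >= limits.maxPages || depth > limits.maxDepth) {
      stoppedByLimit = true;
      return;
    }
    seen[page] = true;
    visited.push(page);
    const next = graph[page] || [];
    for (let i = 0; i < next.length; i++) {
      visit(next[i], depth + 1);
    }
  }

  visit(start, 0);
  return { visited, stoppedByLimit };
}
```

Recording whether a limit ended the run, instead of silently returning a shorter list, mirrors the interruption-aware evidence idea: a truncated run stays distinguishable from a complete one.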

Output

Explorer produces a directory of structured artifacts:

| File | Description |
| --- | --- |
| tree.txt | ASCII tree of all discovered pages and navigation paths |
| report.md | Human-readable coverage report with module breakdown |
| failure-review.md | Human-readable failure triage with grouped patterns and suggested next actions |
| failure-review.json | Machine-consumable failure triage summary |
| summary.json | Machine-consumable metrics and page metadata |
| config.json | Runtime configuration and rule settings used for the run |

A real run against iOS Settings covered 181 pages at a maximum depth of 5.

Local real-device runs are written under output/evidence/explorer/. These outputs are intentionally structured so they can be curated into public showcase evidence or consumed by follow-on tooling such as coverage diffing, PR summaries, and replay path extraction.
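As an illustration of follow-on tooling such as coverage diffing, the sketch below compares the page identifiers of two runs. It assumes each run has already been reduced to a list of page IDs; how those IDs are stored in summary.json is not specified here, so diffPages and CoverageDiff are hypothetical names.

```typescript
// Hypothetical coverage diff over two lists of page identifiers.
interface CoverageDiff {
  added: string[];
  removed: string[];
}

function diffPages(before: string[], after: string[]): CoverageDiff {
  return {
    // Pages seen only in the newer run.
    added: after.filter((page) => before.indexOf(page) === -1),
    // Pages seen only in the older run.
    removed: before.filter((page) => after.indexOf(page) === -1),
  };
}
```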

What This Repository Actually Is

This repo contains both:

  1. Executable implementation (MCP server, adapters, contracts, core orchestration), and

  2. Architecture and delivery knowledge base (design principles, capability model, phased rollout docs).

If you only remember one thing: this project is designed as an Explorer-led, governed mobile control layer for AI agents, not a single-framework test runner.

Mobile E2E Harness Positioning

This project is an AI mobile E2E harness: a policy-aware, session-oriented, deterministic-first execution harness for mobile automation where an AI agent needs controlled action, evidence, and support-boundary clarity.

If you're searching for terms like mobile test harness, real-device Android test harness, AI automation harness, or mobile CI harness, this repository is built for that exact workflow.

Why teams use this harness

  • Deterministic-first harness: stable selectors and structured retries before OCR/CV fallback

  • Failure-intelligence harness: reason codes, evidence artifacts, and remediation suggestions

  • Governance-aware harness: policy profiles, auditable sessions, and controlled tool surfaces

  • Explorer harness: the primary product surface, combining traversal, tool orchestration, risk gating, interruption handling, recovery, and structured coverage/failure evidence (available via CLI)

  • Real-device evidence: Explorer/probe artifacts plus historical videos for happy path and interruption recovery

Capability Showcase

If you want a quick hands-on tour before diving into architecture details, start here:

Quick GIF Preview

[GIF previews: happy path; interruption recovery]

FAQ

What is a mobile E2E harness for AI agents?

It is an execution layer that lets AI agents run mobile test actions safely and reproducibly. This harness adds session control, policy boundaries, deterministic action routing, and structured evidence beyond basic command execution.

Can this harness run on real Android devices?

Yes. Current real-device evidence is centered on Explorer/probe artifacts, with historical showcase scripts under scripts/legacy/dev/* and recordings under docs/showcase/*. The latest Android Vivo probe verified the core Settings UI action path and resume_interrupted_action with native_android; a few diagnostic/negative-path checks remain intentionally non-green unless their prerequisites, such as Metro, are present.

How does interruption recovery work in this harness?

It detects interruption signals, classifies likely interruption type, and applies bounded recovery actions (for example recover_to_known_state) before continuing the flow.
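A minimal sketch of that detect, classify, and recover loop follows. The text-matching heuristic, the interruption type names, and the recoverWithBound function are assumptions made for illustration; the project's own tools (for example detect_interruption and classify_interruption) may use different signals entirely.

```typescript
// Hypothetical bounded recovery loop; heuristics and names are illustrative only.
type InterruptionType = "permission_dialog" | "system_alert";

interface ScreenSnapshot {
  texts: string[];
}

// Classify a screen from its visible text; null means no interruption detected.
function classifyInterruption(screen: ScreenSnapshot): InterruptionType | null {
  const joined = screen.texts.join(" ").toLowerCase();
  if (joined.indexOf("allow") !== -1 && joined.indexOf("deny") !== -1) {
    return "permission_dialog";
  }
  if (joined.indexOf("not responding") !== -1) {
    return "system_alert";
  }
  return null;
}

function recoverWithBound(
  readScreen: () => ScreenSnapshot,
  recover: (kind: InterruptionType) => void,
  maxAttempts: number
): { recovered: boolean; attempts: number } {
  let attempts = 0;
  while (true) {
    const kind = classifyInterruption(readScreen());
    if (kind === null) return { recovered: true, attempts };
    // Bounded: give up with a structured result instead of retrying forever.
    if (attempts >= maxAttempts) return { recovered: false, attempts };
    recover(kind);
    attempts++;
  }
}
```

The screen is re-read after every recovery action, so the loop continues only when there is evidence that the interruption is still present.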

Is this a replacement for Appium or Maestro?

Not necessarily. It is better understood as an orchestration harness that can coexist with existing execution ecosystems while adding AI-oriented governance and diagnostics.

Which scenarios are the best fit?

AI agents that need safe mobile device access, release-gate mobile regression, flaky-flow triage, AI-driven exploratory checks, and real-device CI workflows that require auditable, evidence-rich outcomes.

What is the Explorer and when should I use it?

Explorer automatically traverses your app's screens without predefined flows. Use it when you need broad coverage discovery, want to map an unfamiliar app's navigation structure, or need to identify all reachable screens before writing targeted test flows. It is available via the explore CLI command.

Android physical-device Explorer evidence is tracked under docs/showcase/evidence/android-explorer-full-2026-04-28T03-38-20/: a full Settings traversal completed in 33m 50s with 45 pages, max depth 4, and 0 failures.

Validate that evidence offline with pnpm run validate:explorer-android-evidence -- --min-pages 45 --min-depth 4.

Appium / Maestro vs This Harness

| Dimension | Appium / Maestro | Mobile E2E MCP Harness |
| --- | --- | --- |
| Core role | Automation framework / flow runner | AI-facing orchestration harness |
| Execution strategy | Action execution centric | Deterministic-first + policy/session governance |
| Failure handling | Assertion/command failure outputs | Structured diagnostics + ranked causes + remediation hints |
| AI integration | Possible but not primary abstraction | Primary design target (tools for AI agents) |
| Evidence model | Varies by setup | Built-in evidence-first action outcomes |
| Helper app dependency | Required for iOS/Android replay | Android: owned-adb primary (no helper app needed for common commands); iOS simulator: axe CLI; iOS physical: WDA (one-time setup, see External Tools Guide) |

Official AI Mobile Tools vs This Harness

Android CLI/Journeys, Android Studio Journeys, and the Dart/Flutter MCP server are complementary upstream tools, not replacements for this harness. Treat them as source-native journey execution, authoring, or framework-context providers; feed their outputs into mobile-e2e evidence intake only when proof boundaries are explicit.

The machine-readable bridge contract is generated by:

pnpm run validate:official-tool-bridge

See docs/showcase/evidence/official-tool-bridge/bridge.md for the current relationship matrix.

How It Works (End-to-End)

Typical runtime path:

  1. Agent/client invokes an MCP tool via stdio or dev CLI.

  2. MCP server validates input and applies policy checks.

  3. Session context is resolved (or created), with lease/scheduling guardrails.

  4. Adapter router selects deterministic execution path first.

  5. Action executes and returns a structured result envelope.

  6. Artifacts/evidence (screens, logs, summaries) are attached for audit/debug.

  7. If deterministic resolution fails and policy allows it, bounded OCR/CV fallback is attempted.

This is why the project emphasizes session + policy + evidence, not only UI actions.

High-Level Architecture

Reference split:

  • Control plane: tool contracts, policy checks, session orchestration, audit/evidence indexing

  • Execution plane: platform actions, UI resolution, retries, interruption handling, visual fallback

Source-of-truth note:

  • Architecture docs describe both current baseline and target-state design.

  • If a doc statement conflicts with strict validation behavior, prefer packages/contracts/*.schema.json and configs/policies/*.yaml for current enforced behavior.

Capability Map (Current Scope)

  • Environment & device control — discovery, lease/isolation, environment shaping

  • App lifecycle — install/launch/terminate/reset/deep-link entry

  • Perception & interaction — inspect/query UI, tap/type/wait, flow execution

  • Diagnostics & evidence — logs, crash signals, performance, screenshot/timeline artifacts

  • Reliability & remediation — reason-coded failures, bounded retries, remediation helpers

Tool registration and signature dispatch live in packages/mcp-server/src/server.ts, while descriptor metadata and wrapper composition live in packages/mcp-server/src/index.ts.

Complete MCP Tool Catalog (Current)

The server currently exposes 66 tools. For AI agents, this is the current tool surface.

1) Session & lifecycle

start_session, request_manual_handoff, end_session, run_flow, reset_app_state

2) Task orchestration & flow capture

execute_intent, complete_task, start_record_session, get_record_session_status, end_record_session, cancel_record_session, export_session_flow, record_task_flow, validate_flow

3) Device & app control

list_devices, install_app, launch_app, terminate_app, describe_capabilities, doctor

4) UI perception, targeting, and interaction

inspect_ui, query_ui, resolve_ui_target, scroll_only, scroll_and_resolve_ui_target, wait_for_ui, wait_for_ui_stable, tap, tap_element, scroll_and_tap_element, type_text, type_into_element, navigate_back

5) Evidence, observability, and diagnostics

take_screenshot, record_screen, get_logs, get_crash_signals, collect_diagnostics, collect_debug_evidence, get_screen_summary, get_session_state, get_page_context, capture_js_console_logs, capture_js_network_events, list_js_debug_targets, capture_element_screenshot, compare_visual_baseline

6) Interruption handling

detect_interruption, classify_interruption, resolve_interruption, resume_interrupted_action

7) Failure analysis, recovery, and remediation

perform_action_with_evidence, get_action_outcome, explain_last_failure, rank_failure_candidates, find_similar_failures, compare_against_baseline, recover_to_known_state, replay_last_stable_path, suggest_known_remediation, replay_checkpoint_chain

8) Performance profiling

measure_android_performance, measure_ios_performance

9) Network diagnostics

probe_network_readiness, diagnose_network_failure, inspect_network_policy

probe_network_readiness checks runtime connectivity, DNS, latency, and optional backend reachability. diagnose_network_failure starts from an observed failed request and attributes likely Android cleartext or iOS ATS release-policy blockers. inspect_network_policy remains the lower-level static checker for plain HTTP endpoints using decoded manifest, network-security-config, Info.plist, or readable APK/IPA ZIP artifact evidence. These tools do not proxy traffic or mutate app configuration.

For exact signatures and supported inputs/outputs, use packages/mcp-server/src/server.ts (the tool registry source of truth).
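For orientation, an MCP client invokes any of these tools with a JSON-RPC 2.0 request whose method is tools/call. The sketch below only builds and frames such a message for a stdio transport; the helper names are hypothetical, the empty arguments object for list_devices is an assumption, and a real client must complete the MCP initialize handshake before sending it.

```typescript
// Builds a JSON-RPC 2.0 "tools/call" request as used by MCP clients.
// Helper names are illustrative; argument shapes for each tool are defined by the server.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

function buildToolCall(
  id: number,
  name: string,
  args: Record<string, unknown>
): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// Stdio transport frames each message as one line of JSON.
function frameForStdio(request: ToolCallRequest): string {
  return JSON.stringify(request) + "\n";
}

// Assumed: list_devices accepts an empty arguments object.
const listDevicesLine = frameForStdio(buildToolCall(1, "list_devices", {}));
```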

Deterministic Ladder and Fallback Policy

Action resolution order is intentional and strict:

  1. Stable ID/resource-id/testID/accessibility identifier

  2. Semantic tree match (text/label/role)

  3. OCR text-region fallback (bounded)

  4. CV/template fallback (bounded)

  5. Fail with reason code + artifacts

Prohibited behavior:

  • OCR/CV as the default first path

  • Unbounded retries without state-change evidence

  • Silent downgrade from deterministic to probabilistic execution
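The ladder and its prohibitions can be sketched as an ordered list of strategies that is walked exactly once, with visual strategies gated by policy and every attempt recorded. The strategy names, the TARGET_NOT_FOUND reason code, and the resolveTarget signature are assumptions for illustration, not the project's actual contract.

```typescript
// Hypothetical resolution ladder; names and reason code are illustrative.
type Strategy = "stable_id" | "semantic_tree" | "ocr" | "cv";

interface Resolution {
  status: "resolved" | "failed";
  strategy?: Strategy;
  reasonCode?: string;
  attempted: Strategy[];
}

type Resolver = (target: string) => boolean;

// Fixed order: deterministic strategies always come before visual ones.
const LADDER: Strategy[] = ["stable_id", "semantic_tree", "ocr", "cv"];

function resolveTarget(
  target: string,
  resolvers: Record<Strategy, Resolver>,
  allowVisualFallback: boolean
): Resolution {
  const attempted: Strategy[] = [];
  for (let i = 0; i < LADDER.length; i++) {
    const strategy = LADDER[i];
    const isVisual = strategy === "ocr" || strategy === "cv";
    // Policy gate: visual fallback is only tried when explicitly allowed.
    if (isVisual && !allowVisualFallback) break;
    attempted.push(strategy);
    if (resolvers[strategy](target)) {
      return { status: "resolved", strategy, attempted };
    }
  }
  // Each strategy is tried at most once, so the search is bounded.
  return { status: "failed", reasonCode: "TARGET_NOT_FOUND", attempted };
}
```

Returning the attempted list alongside the winning strategy makes any downgrade to OCR or CV visible to the caller, which is one way to avoid a silent switch from deterministic to probabilistic execution.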

Repository-Wide Principles

  • Deterministic-first: use stable IDs/tree/native capabilities first; OCR/CV is bounded fallback.

  • Structured tool contracts: return machine-consumable result envelopes (status, reasonCode, artifacts).

  • Session-oriented execution: actions run in auditable sessions with explicit policy profiles.

  • Evidence-rich failures: failures should carry enough context for explain/replay/remediation.

Session, Policy, and Governance Model

  • Sessions are auditable execution units with timeline and artifact references.

  • Policy profiles can restrict tool classes (for example read-only vs interactive/full-control).

  • Lease/scheduler constraints prevent unsafe concurrent execution on the same target.

  • Redaction/governance paths exist to keep evidence useful while respecting data boundaries.
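One way to picture the lease constraint is a table that grants a device to a single session until the lease expires or is released. This is a sketch under stated assumptions: LeaseTable, its method names, and the TTL-based expiry are hypothetical and are not taken from the repository.

```typescript
// Hypothetical device lease table; not the project's scheduler.
interface Lease {
  deviceId: string;
  sessionId: string;
  expiresAt: number;
}

class LeaseTable {
  private leases: Record<string, Lease> = {};

  // Grants or renews a lease; returns false if another session holds a live lease.
  acquire(deviceId: string, sessionId: string, now: number, ttlMs: number): boolean {
    const current = this.leases[deviceId];
    const heldByOther =
      current !== undefined &&
      current.expiresAt > now &&
      current.sessionId !== sessionId;
    if (heldByOther) return false;
    this.leases[deviceId] = { deviceId, sessionId, expiresAt: now + ttlMs };
    return true;
  }

  // Only the holding session can release its lease.
  release(deviceId: string, sessionId: string): void {
    const current = this.leases[deviceId];
    if (current !== undefined && current.sessionId === sessionId) {
      delete this.leases[deviceId];
    }
  }
}
```

Passing the clock value in as a parameter keeps the sketch deterministic and easy to test; a real scheduler would read the time itself.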

Non-Goals (Important for Correct Expectations)

  • This is not a replacement for the internals of every mobile framework.

  • This is not OCR-first automation.

  • This does not imply separate full RN or Flutter backends, or immediate parity across all native/RN/Flutter edge cases.

  • This is not a single abstraction that erases all platform differences.

Roadmap Snapshot (Short)

  • Near term: harden deterministic session/action reliability and evidence model.

  • Mid term: broaden framework/profile maturity and real-run coverage.

  • Long term: stronger agentic remediation/governance and enterprise controls.

Detailed public planning references are maintained in docs/delivery/roadmap.md and docs/architecture/*.

Open Source Collaboration

Positioning

This project is not another isolated test framework. It is an AI-facing orchestration layer that routes mobile E2E actions through shared platform adapters and framework profiles, with deterministic-first behavior and strict governance boundaries.

Support This Project

If this project helps your team, you can support it by:

  1. Starring and sharing the repository

  2. Opening issues/PRs with reproducible evidence

  3. Sponsoring the project

Donation note:

  • Donate via PayPal
