mcp-scrcpy-vision

An MCP server that gives AI agents complete vision and control over Android devices.

Features:

  • Real-time Vision: Continuous screen streaming via scrcpy H.264 + ffmpeg

  • Fast Input Control: When streaming, input uses scrcpy control protocol (~5-10ms latency vs ~100-300ms with adb shell)

  • UI Automation: Element detection via uiautomator with tap coordinates

  • Full Input Control: Tap, swipe, long press, pinch, drag-drop, text, keycodes

  • System Access: Shell commands, file transfer, clipboard, notifications

  • Multi-device: Control multiple Android devices simultaneously

  • WiFi ADB: Connect wirelessly for untethered automation


Quick Start

1. Prerequisites

Required:

  • Node.js 18+

  • ADB (Android Platform Tools) in PATH

  • Android device with USB debugging enabled

For streaming (recommended for fast input):

  • scrcpy - download release, extract scrcpy-server file

  • ffmpeg - install and add to PATH

2. Install

    git clone https://github.com/anthropics/mcp-scrcpy-vision.git
    cd mcp-scrcpy-vision
    npm install
    npm run build

3. Configure

Create .env file:

    # Required for streaming + fast input
    SCRCPY_SERVER_PATH="C:\scrcpy-win64-v3.2\scrcpy-server"
    SCRCPY_SERVER_VERSION="3.2"

    # Optional (defaults shown)
    ADB_PATH="adb"
    FFMPEG_PATH="ffmpeg"
    DEFAULT_MAX_SIZE="1024"
    DEFAULT_MAX_FPS="30"
    DEFAULT_FRAME_FPS="2"
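The settings above can be read into a typed object with the documented defaults applied. The following is a minimal sketch, not the server's actual loader: `ServerConfig`, `loadConfig`, and `canStream` are hypothetical names, and the rule that streaming needs both scrcpy settings is taken from the "Required for streaming" comment.

```typescript
// Hypothetical config shape; field names mirror the .env keys.
interface ServerConfig {
  scrcpyServerPath: string | undefined;
  scrcpyServerVersion: string | undefined;
  adbPath: string;
  ffmpegPath: string;
  defaultMaxSize: number;
  defaultMaxFps: number;
  defaultFrameFps: number;
}

type Env = Record<string, string | undefined>;

// Parse a positive integer setting, falling back to the documented default.
function intOrDefault(raw: string | undefined, fallback: number): number {
  const parsed = Number.parseInt(raw ?? "", 10);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : fallback;
}

function loadConfig(env: Env): ServerConfig {
  return {
    scrcpyServerPath: env["SCRCPY_SERVER_PATH"],
    scrcpyServerVersion: env["SCRCPY_SERVER_VERSION"],
    adbPath: env["ADB_PATH"] || "adb",
    ffmpegPath: env["FFMPEG_PATH"] || "ffmpeg",
    defaultMaxSize: intOrDefault(env["DEFAULT_MAX_SIZE"], 1024),
    defaultMaxFps: intOrDefault(env["DEFAULT_MAX_FPS"], 30),
    defaultFrameFps: intOrDefault(env["DEFAULT_FRAME_FPS"], 2),
  };
}

// Streaming needs both scrcpy settings; snapshot mode works without them.
function canStream(config: ServerConfig): boolean {
  return Boolean(config.scrcpyServerPath && config.scrcpyServerVersion);
}
```

In a Node.js process, `loadConfig(process.env)` would be the natural call once the .env file has been loaded.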

4. Add to MCP Client

Claude Desktop (%APPDATA%\Claude\claude_desktop_config.json on Windows):

    {
      "mcpServers": {
        "android": {
          "command": "node",
          "args": ["C:/path/to/mcp-scrcpy-vision/dist/index.js"],
          "env": {
            "SCRCPY_SERVER_PATH": "C:/scrcpy/scrcpy-server",
            "SCRCPY_SERVER_VERSION": "3.2"
          }
        }
      }
    }

Cursor (Settings > MCP):

    {
      "android": {
        "command": "node",
        "args": ["C:/path/to/mcp-scrcpy-vision/dist/index.js"],
        "env": {
          "SCRCPY_SERVER_PATH": "C:/scrcpy/scrcpy-server",
          "SCRCPY_SERVER_VERSION": "3.2"
        }
      }
    }

5. Connect Device

  1. Enable USB debugging on Android device (Settings > Developer Options > USB Debugging)

  2. Connect via USB

  3. Accept RSA fingerprint prompt on device

  4. Verify: adb devices should show your device
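The verification step can also be done programmatically. This sketch parses the text printed by `adb devices` (a header line followed by one `serial<TAB>state` line per device); `parseAdbDevices` and `AdbDevice` are illustrative names, not part of this server.

```typescript
interface AdbDevice {
  serial: string;
  state: string; // e.g. "device", "unauthorized", "offline"
}

// Parse the stdout of `adb devices` into a list of serial/state pairs.
function parseAdbDevices(output: string): AdbDevice[] {
  const devices: AdbDevice[] = [];
  for (const line of output.split(/\r?\n/)) {
    const trimmed = line.trim();
    // Skip blank lines, the header, and "* daemon ..." startup messages.
    if (trimmed === "" || trimmed.startsWith("List of devices") || trimmed.startsWith("*")) {
      continue;
    }
    const [serial, state] = trimmed.split(/\s+/);
    if (serial && state) {
      devices.push({ serial, state });
    }
  }
  return devices;
}
```

A device listed as `unauthorized` usually means the RSA fingerprint prompt from step 3 has not been accepted yet.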


How It Works

Two Modes of Operation

1. Snapshot Mode (No streaming required)

  • Uses android.vision.snapshot for screenshots

  • Input uses ADB shell commands (~100-300ms per action)

  • Works without scrcpy/ffmpeg

  • Best for simple automation or when streaming isn't available

2. Streaming Mode (Recommended)

  • Start with android.vision.startStream

  • Continuous JPEG frames available via resource URI

  • Input uses scrcpy control protocol (~5-10ms per action)

  • Input is roughly 10-20x faster than in snapshot mode

  • Best for real-time control and rapid interactions

Performance Comparison

| Operation | Snapshot Mode | Streaming Mode |
| --- | --- | --- |
| Tap | ~100-300ms | ~5-10ms |
| Swipe | ~300-500ms | ~50-100ms |
| Type text | ~50ms/char | ~5ms total |
| Screenshot | ~500ms | ~33ms (30fps) |
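To see what these figures mean for a whole interaction, the tap latencies can be multiplied out. This is a back-of-the-envelope sketch using only the approximate tap numbers quoted above; `estimateTotal` is a hypothetical helper and the result ignores the time the agent spends reading frames and deciding.

```typescript
interface LatencyRange {
  minMs: number;
  maxMs: number;
}

// Approximate per-tap latency taken from the comparison above.
const TAP_LATENCY: Record<"snapshot" | "streaming", LatencyRange> = {
  snapshot: { minMs: 100, maxMs: 300 },
  streaming: { minMs: 5, maxMs: 10 },
};

// Total input latency for a sequence of identical actions.
function estimateTotal(range: LatencyRange, actions: number): LatencyRange {
  return { minMs: range.minMs * actions, maxMs: range.maxMs * actions };
}
```

For a 20-tap sequence this gives about 2-6 seconds of input latency in snapshot mode against about 0.1-0.2 seconds in streaming mode.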


Tools Reference (33 tools)

Device Management

| Tool | Parameters | Description |
| --- | --- | --- |
| android.devices.list | - | List connected devices |
| android.devices.info | serial | Get device info (model, SDK, etc.) |
| android.adb.enableTcpip | serial, port? | Enable WiFi debugging |
| android.adb.getDeviceIp | serial | Get device WiFi IP |
| android.adb.connectWifi | ipAddress, port? | Connect via WiFi |
| android.adb.disconnectWifi | ipAddress? | Disconnect WiFi |

Vision

| Tool | Parameters | Description |
| --- | --- | --- |
| android.vision.startStream | serial, maxSize?, maxFps?, frameFps? | Start continuous stream (enables fast input) |
| android.vision.stopStream | serial | Stop stream |
| android.vision.snapshot | serial | Take PNG screenshot (works without streaming) |
| android.ui.dump | serial | Get UI hierarchy XML |
| android.ui.findElement | serial, text?, resourceId?, className?, contentDesc? | Find elements with tap coords |
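In a uiautomator dump, each node carries a `bounds` attribute of the form `[left,top][right,bottom]` in device pixels. One plausible way to derive tap coordinates from it is to take the center of that rectangle. The sketch below assumes that approach; whether android.ui.findElement computes its coordinates exactly this way is not stated here, and `centerOfBounds` is a hypothetical name.

```typescript
interface Point {
  x: number;
  y: number;
}

// Convert a uiautomator bounds string such as "[0,0][1080,2400]"
// into the center point of the rectangle, or null if it is malformed.
function centerOfBounds(bounds: string): Point | null {
  const match = /^\[(-?\d+),(-?\d+)\]\[(-?\d+),(-?\d+)\]$/.exec(bounds.trim());
  if (!match) {
    return null;
  }
  const left = Number(match[1]);
  const top = Number(match[2]);
  const right = Number(match[3]);
  const bottom = Number(match[4]);
  return {
    x: Math.round((left + right) / 2),
    y: Math.round((top + bottom) / 2),
  };
}
```

The resulting point can be passed as `x` and `y` to android.input.tap.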

Input Control

Note: These automatically use fast scrcpy control when streaming, otherwise fall back to ADB.

| Tool | Parameters | Description |
| --- | --- | --- |
| android.input.tap | serial, x, y | Tap at coordinates |
| android.input.swipe | serial, x1, y1, x2, y2, durationMs? | Swipe gesture |
| android.input.longPress | serial, x, y, durationMs? | Long press |
| android.input.pinch | serial, centerX, centerY, startDistance, endDistance, durationMs? | Pinch zoom |
| android.input.dragDrop | serial, startX, startY, endX, endY, durationMs? | Drag and drop |
| android.input.text | serial, text | Type text |
| android.input.keyevent | serial, keycode | Send keycode |
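The fallback described in the note above can be pictured as a transport choice followed by, on the ADB path, a call to the standard `adb shell input` command. This is a sketch under those assumptions, not the server's code: `chooseTransport`, `adbInputArgs`, and the `InputAction` type are hypothetical, and the scrcpy control path is omitted.

```typescript
type InputAction =
  | { kind: "tap"; x: number; y: number }
  | { kind: "swipe"; x1: number; y1: number; x2: number; y2: number; durationMs?: number }
  | { kind: "keyevent"; keycode: number };

// Use the scrcpy control channel only while a stream is active.
function chooseTransport(streamActive: boolean): "scrcpy-control" | "adb-shell" {
  return streamActive ? "scrcpy-control" : "adb-shell";
}

// Build the argument list for `adb` on the slow fallback path.
function adbInputArgs(serial: string, action: InputAction): string[] {
  const base = ["-s", serial, "shell", "input"];
  switch (action.kind) {
    case "tap":
      return [...base, "tap", String(action.x), String(action.y)];
    case "swipe": {
      const args = [...base, "swipe", String(action.x1), String(action.y1), String(action.x2), String(action.y2)];
      if (action.durationMs !== undefined) {
        args.push(String(action.durationMs));
      }
      return args;
    }
    case "keyevent":
      return [...base, "keyevent", String(action.keycode)];
  }
}
```

Each fallback action spawns a shell command on the device, which is consistent with the higher latency quoted for snapshot mode.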

App Control

| Tool | Parameters | Description |
| --- | --- | --- |
| android.app.start | serial, packageName, activity? | Launch app |
| android.app.stop | serial, packageName | Force-stop app |
| android.apps.list | serial, system? | List installed apps |
| android.activity.current | serial | Get foreground activity |

System

| Tool | Parameters | Description |
| --- | --- | --- |
| android.shell.exec | serial, command | Execute shell command |
| android.file.push | serial, localPath, remotePath | Push file to device |
| android.file.pull | serial, remotePath, localPath | Pull file from device |
| android.file.list | serial, path | List directory |
| android.clipboard.get | serial | Get clipboard |
| android.clipboard.set | serial, text | Set clipboard |
| android.notifications.get | serial | Get notifications |

Screen Control

| Tool | Parameters | Description |
| --- | --- | --- |
| android.screen.wake | serial | Wake screen |
| android.screen.sleep | serial | Sleep screen |
| android.screen.isOn | serial | Check if screen is on |
| android.screen.unlock | serial | Unlock (unsecured only) |


Resources

The server exposes these MCP resources:

  • android://devices - JSON list of connected devices

  • android://device/<serial>/frame/latest.jpg - Latest JPEG frame (when streaming)
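Client code that polls frames needs to build the per-device URI from a serial. A small sketch of that, with hypothetical helper names (`latestFrameUri`, `serialFromFrameUri`); it inserts the serial verbatim, and whether WiFi serials containing `:` need escaping is an open assumption.

```typescript
const DEVICES_URI = "android://devices";

// Build the resource URI for the latest streamed frame of one device.
// Assumption: the serial is inserted as-is, without URL encoding.
function latestFrameUri(serial: string): string {
  return `android://device/${serial}/frame/latest.jpg`;
}

// Recover the serial from a frame URI, or null if the URI has another shape.
function serialFromFrameUri(uri: string): string | null {
  const match = /^android:\/\/device\/(.+)\/frame\/latest\.jpg$/.exec(uri);
  return match ? (match[1] ?? null) : null;
}
```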


Usage Examples

Basic Automation Loop (Streaming Mode)

    1. Start stream: android.vision.startStream { serial: "ABC123" }
    2. Read resource: android://device/ABC123/frame/latest.jpg
    3. AI analyzes image, decides to tap "Login" button
    4. Find element: android.ui.findElement { serial: "ABC123", text: "Login" }
    5. Tap at returned coordinates: android.input.tap { serial: "ABC123", x: 540, y: 1200 }
    6. Wait 500ms, read resource again, repeat
    7. When done: android.vision.stopStream { serial: "ABC123" }
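MCP clients normally issue these calls for you, but it can help to see what they look like on the wire. In the MCP protocol a tool invocation is a JSON-RPC 2.0 request with method `tools/call`, carrying the tool name and its arguments. The sketch below builds the tool requests for one pass of the loop above; `toolCall` and `loginTapRequests` are hypothetical helpers, and the request ids and coordinates are illustrative.

```typescript
// JSON-RPC 2.0 envelope used by MCP for tool invocations.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

function toolCall(id: number, name: string, args: Record<string, unknown>): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// Tool requests for one pass of the loop; frame reads are resource
// reads rather than tool calls, so they are not included here.
function loginTapRequests(serial: string, tapAt: { x: number; y: number }): ToolCallRequest[] {
  return [
    toolCall(1, "android.vision.startStream", { serial }),
    toolCall(2, "android.ui.findElement", { serial, text: "Login" }),
    toolCall(3, "android.input.tap", { serial, x: tapAt.x, y: tapAt.y }),
    toolCall(4, "android.vision.stopStream", { serial }),
  ];
}
```

In a real loop the tap coordinates would come from the findElement result rather than being known up front.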

Simple Screenshot Mode

    1. Take screenshot: android.vision.snapshot { serial: "ABC123" }
    2. AI analyzes image
    3. Find and tap: android.ui.findElement + android.input.tap
    4. Take another screenshot to verify

WiFi Connection Workflow

    1. Connect device via USB
    2. android.adb.enableTcpip { serial: "ABC123" }
    3. android.adb.getDeviceIp { serial: "ABC123" } → "192.168.1.50"
    4. Disconnect USB cable
    5. android.adb.connectWifi { ipAddress: "192.168.1.50" }
    6. Now use "192.168.1.50:5555" as serial for all commands
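The serial used in step 6 is just the device IP joined with the ADB TCP port. A minimal sketch of that mapping, together with the plain `adb tcpip` and `adb connect` argument lists that the two WiFi tools presumably wrap (that mapping is an assumption); `wifiSerial`, `adbTcpipArgs`, and `adbConnectArgs` are hypothetical names, and 5555 is taken as the default port because it is the one shown in the workflow.

```typescript
const DEFAULT_ADB_TCP_PORT = 5555;

// Serial to use for a device connected over WiFi, e.g. "192.168.1.50:5555".
function wifiSerial(ipAddress: string, port: number = DEFAULT_ADB_TCP_PORT): string {
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new RangeError(`invalid TCP port: ${port}`);
  }
  return `${ipAddress}:${port}`;
}

// Arguments for `adb -s <serial> tcpip <port>` (run while still on USB).
function adbTcpipArgs(usbSerial: string, port: number = DEFAULT_ADB_TCP_PORT): string[] {
  return ["-s", usbSerial, "tcpip", String(port)];
}

// Arguments for `adb connect <ip>:<port>` (run after unplugging USB).
function adbConnectArgs(ipAddress: string, port: number = DEFAULT_ADB_TCP_PORT): string[] {
  return ["connect", wifiSerial(ipAddress, port)];
}
```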

App Testing Example

    1. android.app.start { serial: "ABC123", packageName: "com.example.app" }
    2. android.vision.startStream { serial: "ABC123" }
    3. Wait for app to load, read frame
    4. android.ui.findElement { serial: "ABC123", resourceId: "username_field" }
    5. android.input.tap { serial: "ABC123", x: 540, y: 300 }
    6. android.input.text { serial: "ABC123", text: "testuser@example.com" }
    7. android.input.keyevent { serial: "ABC123", keycode: 66 } // Enter
    8. Read frame, verify login succeeded
    9. android.vision.stopStream { serial: "ABC123" }

Common Keycodes

| Key | Code | Key | Code |
| --- | --- | --- | --- |
| HOME | 3 | BACK | 4 |
| VOLUME_UP | 24 | VOLUME_DOWN | 25 |
| POWER | 26 | ENTER | 66 |
| DELETE | 67 | TAB | 61 |
| MENU | 82 | APP_SWITCH | 187 |
| WAKEUP | 224 | SLEEP | 223 |
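Client code may prefer names to raw numbers. The sketch below copies the keycodes listed above into a typed lookup and builds the argument object for android.input.keyevent; `KEYCODES` and `keyeventArgs` are illustrative names, not exports of this server.

```typescript
// Keycodes copied from the list above.
const KEYCODES = {
  HOME: 3,
  BACK: 4,
  VOLUME_UP: 24,
  VOLUME_DOWN: 25,
  POWER: 26,
  TAB: 61,
  ENTER: 66,
  DELETE: 67,
  MENU: 82,
  APP_SWITCH: 187,
  SLEEP: 223,
  WAKEUP: 224,
} as const;

type KeyName = keyof typeof KEYCODES;

// Arguments for android.input.keyevent, looked up by key name.
function keyeventArgs(serial: string, key: KeyName): { serial: string; keycode: number } {
  return { serial, keycode: KEYCODES[key] };
}
```

For example, `keyeventArgs("ABC123", "ENTER")` yields the same arguments as step 7 of the app testing example.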


Troubleshooting

No devices found

    adb kill-server
    adb start-server
    adb devices

Ensure USB debugging is enabled and RSA fingerprint accepted.

Scrcpy version mismatch

SCRCPY_SERVER_VERSION must exactly match your scrcpy-server file. Check the scrcpy release version you downloaded.

ffmpeg not found

  • Windows: Download from https://ffmpeg.org/download.html, extract, add bin folder to PATH

  • macOS: brew install ffmpeg

  • Linux: apt install ffmpeg or yum install ffmpeg

Or set FFMPEG_PATH in .env to the full path.

uiautomator dump fails

Some devices need screen on. Try android.screen.wake first.

Clipboard not working (Android 10+)

Android 10+ restricts clipboard access. Use UI automation to paste instead.

Stream won't start

  1. Check scrcpy-server path is correct

  2. Verify version numbers match

  3. Try running scrcpy standalone first to verify it works


Notes & Limitations

  • Fast input when streaming: When a stream is active, tap/swipe/text/keyevent use the scrcpy control protocol (~5-10ms latency). Without streaming, they fall back to adb shell input (~100-300ms).

  • One stream per device at a time

  • Snapshot works without scrcpy - useful fallback when streaming is not needed

  • Clipboard has platform limitations on Android 10+

  • Notifications may require permissions on newer Android

  • Pinch gesture is currently simulated with a single finger; true multi-touch requires an active streaming session


Security Warning

This MCP server provides full control over connected Android devices:

  • Execute arbitrary shell commands

  • Read/write files on device

  • Control UI and input

  • Access clipboard and notifications

Only connect devices you own, and only use an AI agent you trust.


Development

    npm run dev      # Development with tsx
    npm run build    # Compile TypeScript
    npm start        # Run production build

See claude.md for developer documentation. See agents.md for AI agent integration guide.


License

MIT
