# mcp-scrcpy-vision
An **MCP server** that gives AI agents complete vision and control over Android devices.
**Features:**
- **Real-time Vision**: Continuous screen streaming via scrcpy H.264 + ffmpeg
- **Fast Input Control**: When streaming, input uses scrcpy control protocol (~5-10ms latency vs ~100-300ms with adb shell)
- **UI Automation**: Element detection via uiautomator with tap coordinates
- **Full Input Control**: Tap, swipe, long press, pinch, drag-drop, text, keycodes
- **System Access**: Shell commands, file transfer, clipboard, notifications
- **Multi-device**: Control multiple Android devices simultaneously
- **WiFi ADB**: Connect wirelessly for untethered automation
---
## Quick Start
### 1. Prerequisites
**Required:**
- Node.js 18+
- ADB (Android Platform Tools) in PATH
- Android device with USB debugging enabled
**For streaming (recommended for fast input):**
- [scrcpy](https://github.com/Genymobile/scrcpy/releases) - download a release and extract the `scrcpy-server` file
- [ffmpeg](https://ffmpeg.org/download.html) - install and add to PATH
### 2. Install
```bash
git clone https://github.com/anthropics/mcp-scrcpy-vision.git
cd mcp-scrcpy-vision
npm install
npm run build
```
### 3. Configure
Create `.env` file:
```bash
# Required for streaming + fast input
SCRCPY_SERVER_PATH="C:\scrcpy-win64-v3.2\scrcpy-server"
SCRCPY_SERVER_VERSION="3.2"
# Optional (defaults shown)
ADB_PATH="adb"
FFMPEG_PATH="ffmpeg"
DEFAULT_MAX_SIZE="1024"
DEFAULT_MAX_FPS="30"
DEFAULT_FRAME_FPS="2"
```
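As a rough illustration of how these settings and their defaults fit together, the sketch below reads them from an environment-like object. The `ServerConfig` shape and the `loadConfig`, `intOr`, and `canStream` names are hypothetical and not taken from the project source; only the variable names and default values come from the `.env` example above.

```typescript
// Hypothetical config shape; field names are illustrative only.
interface ServerConfig {
  scrcpyServerPath: string | undefined;
  scrcpyServerVersion: string | undefined;
  adbPath: string;
  ffmpegPath: string;
  defaultMaxSize: number;
  defaultMaxFps: number;
  defaultFrameFps: number;
}

type Env = Record<string, string | undefined>;

// Parse a positive integer, falling back to the default on missing or invalid input.
function intOr(value: string | undefined, fallback: number): number {
  const n = parseInt(value ?? "", 10);
  return !isNaN(n) && n > 0 ? n : fallback;
}

// Apply the defaults documented in the .env example.
function loadConfig(env: Env): ServerConfig {
  return {
    scrcpyServerPath: env.SCRCPY_SERVER_PATH,
    scrcpyServerVersion: env.SCRCPY_SERVER_VERSION,
    adbPath: env.ADB_PATH || "adb",
    ffmpegPath: env.FFMPEG_PATH || "ffmpeg",
    defaultMaxSize: intOr(env.DEFAULT_MAX_SIZE, 1024),
    defaultMaxFps: intOr(env.DEFAULT_MAX_FPS, 30),
    defaultFrameFps: intOr(env.DEFAULT_FRAME_FPS, 2),
  };
}

// Streaming needs both scrcpy settings; snapshot mode works without them.
function canStream(config: ServerConfig): boolean {
  return Boolean(config.scrcpyServerPath && config.scrcpyServerVersion);
}
```

In a Node.js process this could be called as `loadConfig(process.env)`.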
### 4. Add to MCP Client
**Claude Desktop** (`%APPDATA%\Claude\claude_desktop_config.json` on Windows):
```json
{
"mcpServers": {
"android": {
"command": "node",
"args": ["C:/path/to/mcp-scrcpy-vision/dist/index.js"],
"env": {
"SCRCPY_SERVER_PATH": "C:/scrcpy/scrcpy-server",
"SCRCPY_SERVER_VERSION": "3.2"
}
}
}
}
```
**Cursor** (Settings > MCP):
```json
{
"android": {
"command": "node",
"args": ["C:/path/to/mcp-scrcpy-vision/dist/index.js"],
"env": {
"SCRCPY_SERVER_PATH": "C:/scrcpy/scrcpy-server",
"SCRCPY_SERVER_VERSION": "3.2"
}
}
}
```
### 5. Connect Device
1. Enable USB debugging on the Android device (Settings > Developer Options > USB Debugging)
2. Connect the device via USB
3. Accept the RSA fingerprint prompt on the device
4. Verify: `adb devices` should list your device with the state `device`
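
The verification step can also be scripted. The sketch below parses the text printed by `adb devices` into serial/state pairs; `parseAdbDevices` and `AdbDevice` are hypothetical names, and the sample output is made up for illustration. A device that has not yet accepted the RSA prompt typically shows up as `unauthorized` rather than `device`.

```typescript
interface AdbDevice {
  serial: string;
  state: string; // e.g. "device", "unauthorized", "offline"
}

// Parse the text printed by `adb devices` into serial/state pairs.
function parseAdbDevices(output: string): AdbDevice[] {
  const devices: AdbDevice[] = [];
  for (const rawLine of output.split(/\r?\n/)) {
    const line = rawLine.trim();
    // Skip blank lines, the header, and daemon startup messages ("* daemon ...").
    if (line === "" || line.indexOf("List of devices") === 0 || line.charAt(0) === "*") {
      continue;
    }
    const parts = line.split(/\s+/);
    if (parts.length >= 2) {
      devices.push({ serial: parts[0], state: parts[1] });
    }
  }
  return devices;
}

// Illustrative sample output, not captured from a real device.
const sampleOutput = "List of devices attached\nABC123\tdevice\n192.168.1.50:5555\tunauthorized\n";

// Keep only devices that are ready for automation (here: ABC123).
const readyDevices = parseAdbDevices(sampleOutput).filter((d) => d.state === "device");
```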
---
## How It Works
### Two Modes of Operation
**1. Snapshot Mode (No streaming required)**
- Uses `android.vision.snapshot` for screenshots
- Input uses ADB shell commands (~100-300ms per action)
- Works without scrcpy/ffmpeg
- Best for simple automation or when streaming isn't available
**2. Streaming Mode (Recommended)**
- Start with `android.vision.startStream`
- Continuous JPEG frames available via resource URI
- Input uses scrcpy control protocol (~5-10ms per action)
- Input is typically **10-20x faster** than in snapshot mode
- Best for real-time control and rapid interactions
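
To make the snapshot-mode input path concrete, here is a minimal sketch of how an action might be translated into `adb shell input` arguments. The `InputAction` type and `adbInputArgs` function are hypothetical; the server's real fallback code may differ. Text input is left out on purpose because it needs shell escaping.

```typescript
type InputAction =
  | { kind: "tap"; x: number; y: number }
  | { kind: "swipe"; x1: number; y1: number; x2: number; y2: number; durationMs?: number }
  | { kind: "keyevent"; keycode: number };

// Build the argument list for the adb binary, e.g. to pass to child_process.execFile.
function adbInputArgs(serial: string, action: InputAction): string[] {
  const base = ["-s", serial, "shell", "input"];
  switch (action.kind) {
    case "tap":
      return base.concat(["tap", String(action.x), String(action.y)]);
    case "swipe": {
      const args = ["swipe", action.x1, action.y1, action.x2, action.y2].map(String);
      // The duration is optional in `input swipe`.
      if (action.durationMs !== undefined) {
        args.push(String(action.durationMs));
      }
      return base.concat(args);
    }
    case "keyevent":
      return base.concat(["keyevent", String(action.keycode)]);
  }
}
```

Each call in this mode spawns a new `adb` process, which is one plausible reason for the higher per-action latency compared with a persistent scrcpy control connection.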
### Performance Comparison
| Operation | Snapshot Mode | Streaming Mode |
|-----------|---------------|----------------|
| Tap | ~100-300ms | ~5-10ms |
| Swipe | ~300-500ms | ~50-100ms |
| Type text | ~50ms/char | ~5ms total |
| Screenshot | ~500ms | ~33ms (30fps) |
---
## Tools Reference (33 tools)
### Device Management
| Tool | Parameters | Description |
|------|------------|-------------|
| `android.devices.list` | - | List connected devices |
| `android.devices.info` | `serial` | Get device info (model, SDK, etc.) |
| `android.adb.enableTcpip` | `serial`, `port?` | Enable WiFi debugging |
| `android.adb.getDeviceIp` | `serial` | Get device WiFi IP |
| `android.adb.connectWifi` | `ipAddress`, `port?` | Connect via WiFi |
| `android.adb.disconnectWifi` | `ipAddress?` | Disconnect WiFi |
### Vision
| Tool | Parameters | Description |
|------|------------|-------------|
| `android.vision.startStream` | `serial`, `maxSize?`, `maxFps?`, `frameFps?` | Start continuous stream (enables fast input) |
| `android.vision.stopStream` | `serial` | Stop stream |
| `android.vision.snapshot` | `serial` | Take PNG screenshot (works without streaming) |
| `android.ui.dump` | `serial` | Get UI hierarchy XML |
| `android.ui.findElement` | `serial`, `text?`, `resourceId?`, `className?`, `contentDesc?` | Find elements with tap coords |
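
For context on where tap coordinates come from: uiautomator dumps describe each node with a `bounds` attribute of the form `[left,top][right,bottom]`, and the natural tap target is the center of that rectangle. The sketch below shows that calculation; `boundsCenter` is a hypothetical helper, and whether `android.ui.findElement` computes coordinates exactly this way is an assumption.

```typescript
interface Point {
  x: number;
  y: number;
}

// Convert a uiautomator bounds attribute such as "[420,1150][660,1250]"
// into the center point of the rectangle. Returns null for malformed input.
function boundsCenter(bounds: string): Point | null {
  const m = /^\[(-?\d+),(-?\d+)\]\[(-?\d+),(-?\d+)\]$/.exec(bounds.trim());
  if (!m) {
    return null;
  }
  const x1 = Number(m[1]);
  const y1 = Number(m[2]);
  const x2 = Number(m[3]);
  const y2 = Number(m[4]);
  return { x: Math.round((x1 + x2) / 2), y: Math.round((y1 + y2) / 2) };
}
```

For example, `boundsCenter("[420,1150][660,1250]")` yields `{ x: 540, y: 1200 }`.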
### Input Control
**Note:** These tools automatically use the fast scrcpy control protocol while a stream is active and fall back to ADB shell commands otherwise.
| Tool | Parameters | Description |
|------|------------|-------------|
| `android.input.tap` | `serial`, `x`, `y` | Tap at coordinates |
| `android.input.swipe` | `serial`, `x1`, `y1`, `x2`, `y2`, `durationMs?` | Swipe gesture |
| `android.input.longPress` | `serial`, `x`, `y`, `durationMs?` | Long press |
| `android.input.pinch` | `serial`, `centerX`, `centerY`, `startDistance`, `endDistance`, `durationMs?` | Pinch zoom |
| `android.input.dragDrop` | `serial`, `startX`, `startY`, `endX`, `endY`, `durationMs?` | Drag and drop |
| `android.input.text` | `serial`, `text` | Type text |
| `android.input.keyevent` | `serial`, `keycode` | Send keycode |
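
The pinch parameters can be read as the geometry of a two-finger gesture. The sketch below is an illustration under stated assumptions, not the server's implementation: it assumes `startDistance` and `endDistance` are the gap between the two fingers in pixels and that the fingers move symmetrically along the horizontal axis. `pinchPaths` and `fingerAt` are hypothetical names.

```typescript
interface Point {
  x: number;
  y: number;
}

interface FingerPath {
  start: Point;
  end: Point;
}

// Compute two symmetric finger paths around the center (horizontal axis assumed).
function pinchPaths(
  centerX: number,
  centerY: number,
  startDistance: number,
  endDistance: number,
): [FingerPath, FingerPath] {
  const s = startDistance / 2;
  const e = endDistance / 2;
  return [
    { start: { x: centerX - s, y: centerY }, end: { x: centerX - e, y: centerY } },
    { start: { x: centerX + s, y: centerY }, end: { x: centerX + e, y: centerY } },
  ];
}

// Linearly interpolate one finger's position at progress t in [0, 1].
function fingerAt(path: FingerPath, t: number): Point {
  return {
    x: path.start.x + (path.end.x - path.start.x) * t,
    y: path.start.y + (path.end.y - path.start.y) * t,
  };
}
```

With `endDistance` larger than `startDistance` the fingers move apart (zoom in); with a smaller value they move together (zoom out).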
### App Control
| Tool | Parameters | Description |
|------|------------|-------------|
| `android.app.start` | `serial`, `packageName`, `activity?` | Launch app |
| `android.app.stop` | `serial`, `packageName` | Force-stop app |
| `android.apps.list` | `serial`, `system?` | List installed apps |
| `android.activity.current` | `serial` | Get foreground activity |
### System
| Tool | Parameters | Description |
|------|------------|-------------|
| `android.shell.exec` | `serial`, `command` | Execute shell command |
| `android.file.push` | `serial`, `localPath`, `remotePath` | Push file to device |
| `android.file.pull` | `serial`, `remotePath`, `localPath` | Pull file from device |
| `android.file.list` | `serial`, `path` | List directory |
| `android.clipboard.get` | `serial` | Get clipboard |
| `android.clipboard.set` | `serial`, `text` | Set clipboard |
| `android.notifications.get` | `serial` | Get notifications |
### Screen Control
| Tool | Parameters | Description |
|------|------------|-------------|
| `android.screen.wake` | `serial` | Wake screen |
| `android.screen.sleep` | `serial` | Sleep screen |
| `android.screen.isOn` | `serial` | Check if screen is on |
| `android.screen.unlock` | `serial` | Unlock (devices without a secure lock screen only) |
---
## Resources
The server exposes these MCP resources:
- `android://devices` - JSON list of connected devices
- `android://device/<serial>/frame/latest.jpg` - Latest JPEG frame (when streaming)
---
## Usage Examples
### Basic Automation Loop (Streaming Mode)
```
1. Start stream: android.vision.startStream { serial: "ABC123" }
2. Read resource: android://device/ABC123/frame/latest.jpg
3. AI analyzes image, decides to tap "Login" button
4. Find element: android.ui.findElement { serial: "ABC123", text: "Login" }
5. Tap at returned coordinates: android.input.tap { serial: "ABC123", x: 540, y: 1200 }
6. Wait 500ms, read resource again, repeat
7. When done: android.vision.stopStream { serial: "ABC123" }
```
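The loop above can be sketched in TypeScript roughly as follows. `ToolClient` is a hypothetical minimal interface standing in for whatever MCP client is in use, and `decide` stands in for the AI step (frame analysis plus any `android.ui.findElement` lookup); only the tool names and the resource URI pattern come from this README.

```typescript
// Hypothetical minimal client surface; a real MCP client will look different.
interface ToolClient {
  callTool(name: string, args: Record<string, unknown>): Promise<unknown>;
  readResource(uri: string): Promise<unknown>;
}

interface Tap {
  x: number;
  y: number;
}

// Runs the observe/decide/act loop and returns the number of taps performed.
// decide() returns the next tap, or null when the task is finished.
async function automationLoop(
  client: ToolClient,
  serial: string,
  decide: (frame: unknown) => Promise<Tap | null>,
  maxSteps = 10,
  settleMs = 500,
): Promise<number> {
  const frameUri = `android://device/${serial}/frame/latest.jpg`;
  let taps = 0;
  await client.callTool("android.vision.startStream", { serial });
  try {
    for (let step = 0; step < maxSteps; step++) {
      const frame = await client.readResource(frameUri);
      const tap = await decide(frame);
      if (tap === null) {
        break;
      }
      await client.callTool("android.input.tap", { serial, x: tap.x, y: tap.y });
      taps++;
      // Give the UI time to settle before reading the next frame.
      await new Promise<void>((resolve) => setTimeout(resolve, settleMs));
    }
  } finally {
    // Always stop the stream, even if a step throws.
    await client.callTool("android.vision.stopStream", { serial });
  }
  return taps;
}
```

The `maxSteps` cap and the `finally` block are design choices of this sketch: they keep a confused agent from looping forever and avoid leaving a stream running after an error.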
### Simple Screenshot Mode
```
1. Take screenshot: android.vision.snapshot { serial: "ABC123" }
2. AI analyzes image
3. Find and tap: android.ui.findElement + android.input.tap
4. Take another screenshot to verify
```
### WiFi Connection Workflow
```
1. Connect device via USB
2. android.adb.enableTcpip { serial: "ABC123" }
3. android.adb.getDeviceIp { serial: "ABC123" } → "192.168.1.50"
4. Disconnect USB cable
5. android.adb.connectWifi { ipAddress: "192.168.1.50" }
6. Now use "192.168.1.50:5555" as serial for all commands
```
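Step 6 relies on the `<ip>:<port>` form of the serial. A small helper for building it might look like the sketch below; `wifiSerial` is a hypothetical name, and the default port of 5555 is taken from the example above.

```typescript
const DEFAULT_ADB_TCP_PORT = 5555;

// Build the serial used for a WiFi-connected device: "<ip>:<port>".
// Only IPv4 addresses are handled in this sketch.
function wifiSerial(ipAddress: string, port: number = DEFAULT_ADB_TCP_PORT): string {
  const octets = ipAddress.split(".");
  const validIp =
    octets.length === 4 &&
    octets.every((o) => /^\d{1,3}$/.test(o) && Number(o) <= 255);
  if (!validIp) {
    throw new Error(`Not an IPv4 address: ${ipAddress}`);
  }
  if (Math.floor(port) !== port || port < 1 || port > 65535) {
    throw new Error(`Invalid port: ${port}`);
  }
  return `${ipAddress}:${port}`;
}
```

For example, `wifiSerial("192.168.1.50")` returns `"192.168.1.50:5555"`.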
### App Testing Example
```
1. android.app.start { serial: "ABC123", packageName: "com.example.app" }
2. android.vision.startStream { serial: "ABC123" }
3. Wait for app to load, read frame
4. android.ui.findElement { serial: "ABC123", resourceId: "username_field" }
5. android.input.tap { serial: "ABC123", x: 540, y: 300 }
6. android.input.text { serial: "ABC123", text: "testuser@example.com" }
7. android.input.keyevent { serial: "ABC123", keycode: 66 } // Enter
8. Read frame, verify login succeeded
9. android.vision.stopStream { serial: "ABC123" }
```
---
## Common Keycodes
| Key | Code | Key | Code |
|-----|------|-----|------|
| HOME | 3 | BACK | 4 |
| VOLUME_UP | 24 | VOLUME_DOWN | 25 |
| POWER | 26 | ENTER | 66 |
| DELETE (backspace) | 67 | TAB | 61 |
| MENU | 82 | APP_SWITCH | 187 |
| WAKEUP | 224 | SLEEP | 223 |
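
When calling `android.input.keyevent` from TypeScript, a typed lookup table avoids magic numbers. This is only a sketch: `KEYCODES`, `KeyName`, and `keyeventArgs` are hypothetical names, and the values are the ones from the table above. For example, `keyeventArgs("ABC123", "ENTER")` produces `{ serial: "ABC123", keycode: 66 }`.

```typescript
// Values from the Common Keycodes table.
const KEYCODES = {
  HOME: 3,
  BACK: 4,
  VOLUME_UP: 24,
  VOLUME_DOWN: 25,
  POWER: 26,
  TAB: 61,
  ENTER: 66,
  DELETE: 67, // Android's KEYCODE_DEL, which acts as backspace
  MENU: 82,
  APP_SWITCH: 187,
  SLEEP: 223,
  WAKEUP: 224,
} as const;

type KeyName = keyof typeof KEYCODES;

// Build the argument object for an android.input.keyevent call.
function keyeventArgs(serial: string, key: KeyName): { serial: string; keycode: number } {
  return { serial, keycode: KEYCODES[key] };
}
```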
---
## Troubleshooting
### No devices found
```bash
adb kill-server
adb start-server
adb devices
```
Ensure USB debugging is enabled and the RSA fingerprint prompt has been accepted on the device.
### Scrcpy version mismatch
`SCRCPY_SERVER_VERSION` must exactly match the version of your `scrcpy-server` file, which is the version of the scrcpy release you downloaded.
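One way to catch a mismatch early is to compare the configured version with the version embedded in the path, when there is one. The sketch below assumes the path contains a `-v<version>` segment, as in the `scrcpy-win64-v3.2` folder from the configuration example; `versionFromPath` and `versionWarning` are hypothetical helpers, and a path without such a segment simply cannot be checked this way.

```typescript
// Extract a version like "3.2" from a path segment ending in "-v3.2".
// Returns null when the path carries no recognizable version.
function versionFromPath(serverPath: string): string | null {
  const m = /-v(\d+(?:\.\d+)*)(?=[\\\/]|$)/.exec(serverPath);
  return m ? m[1] : null;
}

// Return a warning message if the path suggests a different version, else null.
function versionWarning(serverPath: string, configuredVersion: string): string | null {
  const inferred = versionFromPath(serverPath);
  if (inferred === null || inferred === configuredVersion) {
    return null;
  }
  return `SCRCPY_SERVER_VERSION is "${configuredVersion}" but the path suggests "${inferred}"`;
}
```

For example, `versionFromPath("C:\\scrcpy-win64-v3.2\\scrcpy-server")` returns `"3.2"`, while `versionFromPath("C:/scrcpy/scrcpy-server")` returns `null`.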
### ffmpeg not found
- **Windows**: Download from https://ffmpeg.org/download.html, extract, add bin folder to PATH
- **macOS**: `brew install ffmpeg`
- **Linux**: `apt install ffmpeg` or `yum install ffmpeg`
Alternatively, set `FFMPEG_PATH` in `.env` to the full path of the ffmpeg executable.
### uiautomator dump fails
Some devices require the screen to be on for the dump to succeed. Call `android.screen.wake` first.
### Clipboard not working (Android 10+)
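Android 10+ restricts clipboard access to the app that currently has focus and the default input method, so clipboard reads from the background may fail. Use UI automation to paste instead.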
### Stream won't start
1. Check scrcpy-server path is correct
2. Verify version numbers match
3. Try running scrcpy standalone first to verify it works
---
## Notes & Limitations
- **Fast input when streaming**: When a stream is active, tap/swipe/text/keyevent use the scrcpy control protocol (~5-10ms latency). Without a stream, they fall back to `adb shell input` (~100-300ms).
- **One stream per device** at a time
- **Snapshot works without scrcpy** - useful fallback when streaming is not needed
- **Clipboard** has platform limitations on Android 10+
- **Notifications** may require additional permissions on newer Android versions
- **Pinch gesture** is currently simulated with a single finger when no stream is active; true multi-touch requires an active streaming session
---
## Security Warning
This MCP server provides **full control** over connected Android devices:
- Execute arbitrary shell commands
- Read/write files on device
- Control UI and input
- Access clipboard and notifications
**Only connect devices you own, and only when you trust the AI agent controlling them.**
---
## Development
```bash
npm run dev # Development with tsx
npm run build # Compile TypeScript
npm start # Run production build
```
See [claude.md](claude.md) for developer documentation.
See [agents.md](agents.md) for AI agent integration guide.
---
## License
MIT