# CLAUDE.md - mcp-scrcpy-vision
This file provides guidance to Claude Code when working with this MCP server.
---
## Project Overview
**mcp-scrcpy-vision** is an MCP (Model Context Protocol) server that gives AI agents full control over Android devices through vision and input capabilities. It combines:
- **Real-time screen streaming** via scrcpy's H.264 pipeline
- **UI element detection** via uiautomator
- **Full input control** (tap, swipe, gestures, text, keycodes)
- **System access** (shell, files, clipboard, notifications)
- **WiFi ADB** for wireless device control
**Primary Use Cases**: App testing, device automation, accessibility
---
## Architecture
```
┌─────────────────────────────────────────────────────────────┐
│ MCP Client (Claude) │
│ Uses vision to understand screen content │
│ Decides what actions to take │
└─────────────────────────┬───────────────────────────────────┘
│ MCP Protocol (stdio)
┌─────────────────────────▼───────────────────────────────────┐
│ mcp-scrcpy-vision (Node.js) │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Vision │ │ Input │ │ System │ │ ADB │ │
│ │ Tools │ │ Tools │ │ Tools │ │ Tools │ │
│ └────┬─────┘ └────┬─────┘ └────┬─────┘ └────┬─────┘ │
└───────┼────────────┼────────────┼────────────┼──────────────┘
│ │ │ │
┌───────▼────────────▼────────────▼────────────▼──────────────┐
│ ADB (Android Debug Bridge) │
│ USB or WiFi connection to device │
└─────────────────────────┬───────────────────────────────────┘
│
┌─────────────────────────▼───────────────────────────────────┐
│ Android Device(s) │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ scrcpy-srv │ │ uiautomator │ │ system │ │
│ │ (H.264) │ │ (UI tree) │ │ (shell,etc) │ │
│ └──────┬──────┘ └─────────────┘ └─────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────┐ │
│ │ ffmpeg │ ← Decodes H.264 to JPEG frames │
│ └─────────────┘ │
└─────────────────────────────────────────────────────────────┘
```
---
## File Structure
```
mcp-scrcpy-vision/
├── src/
│ ├── index.ts # MCP server entry, all tool registrations
│ ├── adb.ts # ADB command wrappers (30+ functions)
│ ├── scrcpySession.ts # Scrcpy streaming session + fast input control
│ ├── scrcpyControl.ts # Binary protocol encoders for scrcpy control
│ ├── jpegParser.ts # JPEG frame extraction from ffmpeg output
│ └── config.ts # Environment config loader
├── dist/ # Compiled JavaScript (after build)
├── package.json
├── tsconfig.json
├── .env # Local configuration (not committed)
├── mcp.sample.json # Example MCP client configuration
├── README.md # User documentation
└── CLAUDE.md # This file (developer guidance)
```
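
As a rough illustration of what "JPEG frame extraction from ffmpeg output" involves, the sketch below splits a concatenated JPEG byte stream into frames by scanning for the start-of-image (`FF D8`) and end-of-image (`FF D9`) markers. It is not the contents of `src/jpegParser.ts`; the class name `JpegFrameSplitter` is hypothetical, and the approach assumes frames carry no embedded thumbnails (which would contain their own markers).

```typescript
// Hypothetical sketch, not the actual src/jpegParser.ts implementation.
const SOI = Buffer.from([0xff, 0xd8]); // JPEG start-of-image marker
const EOI = Buffer.from([0xff, 0xd9]); // JPEG end-of-image marker

class JpegFrameSplitter {
  private pending: Buffer = Buffer.alloc(0);

  // Feed one chunk of ffmpeg stdout; returns every frame completed by this chunk.
  push(chunk: Buffer): Buffer[] {
    this.pending = Buffer.concat([this.pending, chunk]);
    const frames: Buffer[] = [];
    for (;;) {
      const start = this.pending.indexOf(SOI);
      if (start === -1) {
        // No frame start yet; keep the last byte in case it is half of a split marker.
        this.pending = this.pending.subarray(Math.max(0, this.pending.length - 1));
        break;
      }
      const end = this.pending.indexOf(EOI, start + SOI.length);
      if (end === -1) {
        // Frame is incomplete; discard bytes before its start and wait for more data.
        this.pending = this.pending.subarray(start);
        break;
      }
      const frameEnd = end + EOI.length;
      frames.push(Buffer.from(this.pending.subarray(start, frameEnd)));
      this.pending = this.pending.subarray(frameEnd);
    }
    return frames;
  }
}
```

Because the splitter buffers partial data between calls, it can be fed directly from a stream's `data` events regardless of where chunk boundaries fall.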
---
## Available Tools (33 total)
### Device Management
| Tool | Description |
|------|-------------|
| `android.devices.list` | List all connected Android devices |
| `android.devices.info` | Get device info (model, brand, SDK version) |
| `android.adb.enableTcpip` | Enable WiFi debugging (requires USB first) |
| `android.adb.getDeviceIp` | Get device's WiFi IP address |
| `android.adb.connectWifi` | Connect to device over WiFi |
| `android.adb.disconnectWifi` | Disconnect WiFi connection |
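
The WiFi tools map onto a standard adb sequence: switch the USB-connected device to TCP mode, look up its IP, then connect. The sketch below only builds the argument lists and parses an IP; the helper names are hypothetical, port 5555 is the conventional adb default rather than a confirmed project setting, and reading the address from `ip -f inet addr show wlan0` is an assumption about how the lookup could be done.

```typescript
// Hypothetical helpers; they build adb argument lists without running anything.
const DEFAULT_ADB_PORT = 5555; // conventional adb TCP port (assumed default)

// `adb -s <serial> tcpip <port>`: must be issued while the device is on USB.
function tcpipArgs(serial: string, port: number = DEFAULT_ADB_PORT): string[] {
  return ["-s", serial, "tcpip", String(port)];
}

// `adb connect <ip>:<port>`
function connectArgs(ip: string, port: number = DEFAULT_ADB_PORT): string[] {
  return ["connect", `${ip}:${port}`];
}

// `adb disconnect <ip>:<port>`
function disconnectArgs(ip: string, port: number = DEFAULT_ADB_PORT): string[] {
  return ["disconnect", `${ip}:${port}`];
}

// Extract the IPv4 address from `ip -f inet addr show wlan0` output (assumed source).
function parseWlanIp(output: string): string | null {
  const match = /inet\s+(\d{1,3}(?:\.\d{1,3}){3})\//.exec(output);
  return match?.[1] ?? null;
}
```

After `adb connect` succeeds, the device is addressed by `<ip>:<port>` as its serial, so the USB cable can be removed.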
### Vision
| Tool | Description |
|------|-------------|
| `android.vision.startStream` | Start continuous H.264 stream via scrcpy |
| `android.vision.stopStream` | Stop streaming session |
| `android.vision.snapshot` | Take single PNG screenshot (no scrcpy needed) |
| `android.ui.dump` | Dump UI hierarchy as XML (uiautomator) |
| `android.ui.findElement` | Find elements by text/id/class with tap coords |
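
Tap coordinates for a UI element can be derived from the `bounds` attribute that uiautomator writes on each node, in the form `[left,top][right,bottom]`. The sketch below shows one way to turn that string into a center point; `parseBounds` and the `ElementBounds` shape are hypothetical and may differ from what `android.ui.findElement` actually returns.

```typescript
// Hypothetical shape; the real tool output may use different field names.
interface ElementBounds {
  left: number;
  top: number;
  right: number;
  bottom: number;
  centerX: number;
  centerY: number;
}

// Parse a uiautomator bounds string such as "[100,200][300,400]".
function parseBounds(bounds: string): ElementBounds | null {
  const m = /^\[(-?\d+),(-?\d+)\]\[(-?\d+),(-?\d+)\]$/.exec(bounds.trim());
  if (!m) return null;
  const left = Number(m[1]);
  const top = Number(m[2]);
  const right = Number(m[3]);
  const bottom = Number(m[4]);
  return {
    left,
    top,
    right,
    bottom,
    // Integer center point, suitable as tap coordinates.
    centerX: Math.floor((left + right) / 2),
    centerY: Math.floor((top + bottom) / 2),
  };
}
```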
### Input Control
**Note:** When a stream is active, tap/swipe/text/keyevent/longPress use the fast scrcpy control protocol (~5-10ms latency). Without streaming, these fall back to adb shell input (~100-300ms).
| Tool | Description |
|------|-------------|
| `android.input.tap` | Tap at coordinates (x, y) |
| `android.input.swipe` | Swipe from (x1,y1) to (x2,y2) |
| `android.input.longPress` | Long press at coordinates |
| `android.input.pinch` | Pinch zoom gesture |
| `android.input.dragDrop` | Drag and drop gesture |
| `android.input.text` | Type text (fast: scrcpy, fallback: adb) |
| `android.input.keyevent` | Send keycode (HOME=3, BACK=4, etc.) |
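
The fast-path/fallback choice described in the note above can be pictured as a small routing decision. This is a sketch only: `planTap` and `TapPlan` are hypothetical names, and the real server may structure the dispatch differently. The fallback uses the standard `adb shell input tap` command.

```typescript
// Hypothetical routing result: either a scrcpy control message or an adb command.
type TapPlan =
  | { via: "scrcpy"; x: number; y: number }
  | { via: "adb"; args: string[] };

// Choose the input path for a tap based on whether a stream is active.
function planTap(serial: string, x: number, y: number, streamActive: boolean): TapPlan {
  if (streamActive) {
    // Fast path: the tap is sent over the existing scrcpy control socket.
    return { via: "scrcpy", x, y };
  }
  // Fallback: spawn `adb -s <serial> shell input tap <x> <y>`.
  return { via: "adb", args: ["-s", serial, "shell", "input", "tap", String(x), String(y)] };
}
```

The latency gap comes from process startup: the fallback launches a new `adb` process and an `input` command on the device for every action, while the fast path reuses an open connection.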
### App Control
| Tool | Description |
|------|-------------|
| `android.app.start` | Launch app by package name |
| `android.app.stop` | Force-stop app |
| `android.apps.list` | List installed apps (system/third-party filter) |
| `android.activity.current` | Get current foreground activity |
### System
| Tool | Description |
|------|-------------|
| `android.shell.exec` | Execute arbitrary shell command |
| `android.file.push` | Push file to device |
| `android.file.pull` | Pull file from device |
| `android.file.list` | List directory contents |
| `android.clipboard.get` | Get clipboard content |
| `android.clipboard.set` | Set clipboard content (limited on Android 10+) |
| `android.notifications.get` | Get notification dump |
### Screen Control
| Tool | Description |
|------|-------------|
| `android.screen.wake` | Wake device screen |
| `android.screen.sleep` | Put screen to sleep |
| `android.screen.isOn` | Check if screen is on |
| `android.screen.unlock` | Unlock screen (unsecured devices only) |
---
## Common Keycodes
```javascript
const KEYCODES = {
  HOME: 3,
  BACK: 4,
  CALL: 5,
  END_CALL: 6,
  VOLUME_UP: 24,
  VOLUME_DOWN: 25,
  POWER: 26,
  CAMERA: 27,
  ENTER: 66,
  DELETE: 67,
  MENU: 82,
  SEARCH: 84,
  APP_SWITCH: 187, // Recent apps
  SLEEP: 223,
  WAKEUP: 224,
};
```
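
A small resolver can make these codes easier to use by accepting either a name or a raw number. The sketch below is illustrative: `resolveKeycode` and `keyeventArgs` are hypothetical helpers, and only a subset of the table is repeated to keep the block self-contained. The argument list corresponds to the standard `adb shell input keyevent` command.

```typescript
// Subset of the keycode table; values are Android KeyEvent constants.
const KEYCODES: Record<string, number> = { HOME: 3, BACK: 4, ENTER: 66, APP_SWITCH: 187 };

// Hypothetical helper: accepts "HOME", "KEYCODE_HOME", or a raw numeric code.
function resolveKeycode(key: string | number): number {
  if (typeof key === "number") {
    if (!Number.isInteger(key) || key < 0) throw new Error(`invalid keycode: ${key}`);
    return key;
  }
  const name = key.trim().toUpperCase().replace(/^KEYCODE_/, "");
  const code = KEYCODES[name];
  if (code === undefined) throw new Error(`unknown keycode name: ${key}`);
  return code;
}

// Argument list for `adb -s <serial> shell input keyevent <code>`.
function keyeventArgs(serial: string, key: string | number): string[] {
  return ["-s", serial, "shell", "input", "keyevent", String(resolveKeycode(key))];
}
```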
---
## Agent Loop Pattern
The recommended pattern for AI agents controlling Android devices:
```typescript
// 1. Start vision stream (or use snapshot for simpler cases)
await android.vision.startStream({ serial: "device123" });
// 2. Get current screen state
const screenshot = await readResource("android://device/device123/frame/latest.jpg");
// 3. Analyze screen content (AI vision)
// Claude/GPT analyzes the image and decides action
// 4. Optionally get UI elements for precise targeting
const elements = await android.ui.findElement({
  serial: "device123",
  text: "Login"
});
// 5. Perform action based on analysis (skip the tap if nothing matched)
const target = elements.elements[0];
if (target) {
  await android.input.tap({
    serial: "device123",
    x: target.centerX,
    y: target.centerY
  });
}
// 6. Wait for UI to update, repeat loop
```
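
Step 6 is where loops tend to run away, so it can help to bound the number of iterations and pause between actions. The sketch below shows that shape with injected dependencies; `runAgentLoop`, `LoopDeps`, and `Step` are hypothetical names, and the default step limit and settle delay are arbitrary placeholders rather than tuned values.

```typescript
// Hypothetical decision returned by the model for one frame.
interface Step {
  done: boolean;
  x?: number;
  y?: number;
}

// Hypothetical dependencies, injected so the loop can run without a device.
interface LoopDeps {
  getFrame(): Promise<Uint8Array>;
  decide(frame: Uint8Array): Promise<Step>;
  tap(x: number, y: number): Promise<void>;
}

const sleep = (ms: number): Promise<void> => new Promise((resolve) => setTimeout(resolve, ms));

// Runs observe -> decide -> act until the model reports completion.
// Returns the number of non-final steps taken; throws if maxSteps is exhausted.
async function runAgentLoop(deps: LoopDeps, maxSteps = 20, settleMs = 500): Promise<number> {
  for (let step = 0; step < maxSteps; step++) {
    const frame = await deps.getFrame();
    const action = await deps.decide(frame);
    if (action.done) return step;
    if (action.x !== undefined && action.y !== undefined) {
      await deps.tap(action.x, action.y);
    }
    await sleep(settleMs); // give the UI time to update before the next frame
  }
  throw new Error(`agent loop did not finish within ${maxSteps} steps`);
}
```

A fixed delay is the simplest option; polling `android.activity.current` or `android.ui.findElement` until an expected element appears is usually more reliable when the target state is known.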
---
## Development
### Building
```bash
npm install
npm run build # Compile TypeScript
npm run dev # Run with tsx (development)
npm start # Run compiled version
```
### Testing with a Device
1. Connect Android device via USB
2. Enable USB debugging on device
3. Accept RSA fingerprint prompt
4. Run `adb devices` to verify connection
5. Start the MCP server: `npm start`
### Adding New Tools
1. Add function to `src/adb.ts`:
```typescript
export async function newFunction(adbPath: string, serial: string, ...args: string[]): Promise<Result> {
  // Result and parseResult are placeholders for the tool's own return type and parser
  const res = await adbShell(adbPath, serial, ["command", ...args]);
  if (res.code !== 0) throw new Error(`command failed: ${res.stderr || res.stdout}`);
  return parseResult(res.stdout);
}
```
2. Register tool in `src/index.ts`:
```typescript
server.registerTool(
  "android.category.name",
  {
    title: "Human readable title",
    description: "What this tool does",
    inputSchema: z.object({
      serial: z.string().min(1),
      // ... other params
    }).strict(),
  },
  async ({ serial /* , other params */ }) => {
    // Pass each extra param explicitly; an object rest cannot be spread into call arguments
    const result = await newFunction(cfg.adbPath, serial);
    return {
      content: [{ type: "text", text: safeJson(result) }],
      structuredContent: result,
    };
  }
);
```
3. Add the new function to the `adb.ts` import list at the top of `index.ts`
4. Rebuild: `npm run build`
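
The registration template calls a `safeJson` helper whose implementation is not shown in this file. If you need a mental model for it, one plausible shape is a `JSON.stringify` wrapper that never throws. The version below is an assumption for illustration, not the project's actual helper.

```typescript
// Assumed behavior: serialize any value without throwing.
function safeJson(value: unknown): string {
  const seen = new WeakSet<object>();
  try {
    const text = JSON.stringify(
      value,
      (_key, v: unknown) => {
        if (typeof v === "bigint") return v.toString(); // JSON has no bigint type
        if (typeof v === "object" && v !== null) {
          // Note: this also replaces repeated (non-circular) references to the same object.
          if (seen.has(v)) return "[Circular]";
          seen.add(v);
        }
        return v;
      },
      2
    );
    return text ?? "null"; // JSON.stringify(undefined) yields undefined at runtime
  } catch (err) {
    return JSON.stringify({ error: String(err) });
  }
}
```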
---
## Troubleshooting
### "No devices found"
- Check `adb devices` output
- Ensure USB debugging is enabled
- Accept RSA fingerprint on device
- Try `adb kill-server && adb start-server`
### "scrcpy server version mismatch"
- Ensure `SCRCPY_SERVER_VERSION` matches your scrcpy-server file exactly
- Download matching version from scrcpy releases
### "ffmpeg not found"
- Install ffmpeg and add to PATH
- Or set `FFMPEG_PATH` environment variable
### Vision stream not working
- Verify scrcpy works standalone first
- Check `SCRCPY_SERVER_PATH` points to valid file
- Try `SCRCPY_RAW_STREAM_ARG=raw_video_stream` for older scrcpy
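
Several of the failures above come down to missing or malformed environment variables, so a quick check at startup can save debugging time. The sketch below reads the variables named in this section and reports problems; the function names, the `ffmpeg` default, and the version-format check are assumptions, not the behavior of `src/config.ts`.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical config shape covering only the streaming-related variables.
interface StreamConfig {
  ffmpegPath: string;
  scrcpyServerPath?: string;
  scrcpyServerVersion?: string;
  rawStreamArg?: string;
}

function readStreamConfig(env: Env): StreamConfig {
  return {
    ffmpegPath: env.FFMPEG_PATH?.trim() || "ffmpeg", // assumed default: resolve via PATH
    scrcpyServerPath: env.SCRCPY_SERVER_PATH?.trim() || undefined,
    scrcpyServerVersion: env.SCRCPY_SERVER_VERSION?.trim() || undefined,
    rawStreamArg: env.SCRCPY_RAW_STREAM_ARG?.trim() || undefined,
  };
}

// Returns human-readable problems; an empty array means nothing obvious is wrong.
function findConfigProblems(cfg: StreamConfig): string[] {
  const problems: string[] = [];
  if (!cfg.scrcpyServerPath) problems.push("SCRCPY_SERVER_PATH is not set");
  if (!cfg.scrcpyServerVersion) {
    problems.push("SCRCPY_SERVER_VERSION is not set");
  } else if (!/^\d+(\.\d+)*$/.test(cfg.scrcpyServerVersion)) {
    // Assumption: versions look like "2.4", without a leading "v".
    problems.push(`SCRCPY_SERVER_VERSION looks malformed: ${cfg.scrcpyServerVersion}`);
  }
  return problems;
}
```

Calling `findConfigProblems(readStreamConfig(process.env))` before starting a stream would surface these issues with a clearer message than a failed scrcpy handshake.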
### uiautomator dump fails
- Some devices require screen to be on
- System animations may interfere; try `adb shell settings put global window_animation_scale 0`
---
## Security Considerations
This MCP server has full control over connected Android devices:
- **Shell access**: Can execute any command
- **File access**: Can read/write device storage
- **Input injection**: Can control the device UI
- **App control**: Can install/uninstall/launch apps
Treat this like root access. Only connect devices you own, and only when you trust the AI agent controlling them.
---
## Resources
- [scrcpy GitHub](https://github.com/Genymobile/scrcpy)
- [ADB Documentation](https://developer.android.com/studio/command-line/adb)
- [MCP Protocol](https://modelcontextprotocol.io/)
- [Android Keycodes](https://developer.android.com/reference/android/view/KeyEvent)