# ByteBot MCP Server
Production-grade Model Context Protocol (MCP) server for ByteBot's dual-API architecture, providing intelligent hybrid workflow orchestration for autonomous task execution and desktop computer control.
## Overview
This MCP server integrates ByteBot's Agent API (task management) and Desktop API (computer control) into a unified interface for AI assistants like Claude. It enables:
- **Autonomous Task Execution**: Create and manage tasks for ByteBot to execute independently
- **Direct Computer Control**: Mouse, keyboard, screen capture, and file operations
- **Hybrid Workflows**: Intelligent orchestration with automatic monitoring and intervention handling
- **Real-time Updates**: Optional WebSocket support for live task status notifications
## Features
### Agent API Tools (Task Management)
- `bytebot_create_task` - Create new tasks with priority levels
- `bytebot_list_tasks` - List and filter tasks by status/priority
- `bytebot_get_task` - Get detailed task information with message history
- `bytebot_get_in_progress_task` - Check currently running task
- `bytebot_update_task` - Update task status or priority
- `bytebot_delete_task` - Delete tasks
### Desktop API Tools (Computer Control)
**Mouse Operations:**
- `bytebot_move_mouse` - Move cursor to coordinates
- `bytebot_click` - Click with left/right/middle button
- `bytebot_drag` - Drag from one position to another
- `bytebot_scroll` - Scroll in any direction
**Keyboard Operations:**
- `bytebot_type_text` - Type text strings
- `bytebot_paste_text` - Paste text (for special characters)
- `bytebot_press_keys` - Keyboard shortcuts (Ctrl+C, Alt+Tab, etc.)
**Screen Operations:**
- `bytebot_screenshot` - Capture screen as base64 PNG
- `bytebot_cursor_position` - Get current cursor position
**File I/O:**
- `bytebot_read_file` - Read file content (base64)
- `bytebot_write_file` - Write file content (base64)
**System:**
- `bytebot_switch_application` - Switch to application
- `bytebot_wait` - Wait for specified duration
### Hybrid Orchestration Tools
- `bytebot_create_and_monitor_task` - Create task and wait for completion
- `bytebot_monitor_task` - Monitor existing task until terminal state
- `bytebot_intervene_in_task` - Provide help when task needs intervention
- `bytebot_execute_workflow` - Multi-step workflow with automatic error recovery
## Prerequisites
- **Node.js**: 20.x or higher
- **ByteBot Instance**: Running and accessible at the configured endpoints:
  - Agent API (default: `http://localhost:9991`)
  - Desktop API (default: `http://localhost:9990`)
## Installation
```bash
# Clone or download this repository
cd bytebot-mcp-server
# Install dependencies
npm install
# Build TypeScript code
npm run build
```
## Configuration
### 1. Create Environment File
Copy the example environment file and customize it:
```bash
cp .env.example .env
```
### 2. Edit `.env` File
```env
# ByteBot Agent API (Task Management)
BYTEBOT_AGENT_URL=http://localhost:9991
# ByteBot Desktop API (Computer Control)
BYTEBOT_DESKTOP_URL=http://localhost:9990
# WebSocket Configuration (Optional)
BYTEBOT_WS_URL=ws://localhost:9991
ENABLE_WEBSOCKET=false
# Server Configuration
MCP_SERVER_NAME=bytebot-mcp
# Timeouts (milliseconds)
REQUEST_TIMEOUT=30000
DESKTOP_ACTION_TIMEOUT=10000
# Retry Configuration
MAX_RETRIES=3
RETRY_DELAY=1000
# Monitoring Configuration
TASK_POLL_INTERVAL=2000
TASK_MONITOR_TIMEOUT=300000
# File Configuration
MAX_FILE_SIZE=10485760
# Logging
LOG_LEVEL=info
```
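The retry settings above leave the backoff policy implicit; the environment variable table later in this README describes `RETRY_DELAY` as the *initial* delay. The TypeScript sketch below assumes exponential backoff (the delay doubles after each failure) and that `MAX_RETRIES` counts retries after the first attempt. `withRetry` is a hypothetical helper, not part of this server's code.

```typescript
// Hypothetical helper: retries an async operation with exponential backoff.
// Assumptions: maxRetries counts retries after the first attempt, and the
// delay starts at initialDelayMs (RETRY_DELAY) and doubles on each failure.
async function withRetry<T>(
  operation: () => Promise<T>,
  maxRetries = 3,
  initialDelayMs = 1000,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await operation();
    } catch (err) {
      lastError = err;
      if (attempt === maxRetries) break; // out of retries
      // With the defaults: 1000 ms, 2000 ms, 4000 ms
      const delayMs = initialDelayMs * 2 ** attempt;
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw lastError;
}
```

With the defaults, a request that keeps failing is attempted four times, waiting 1, 2 and 4 seconds in between.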
### 3. Remote ByteBot Configuration
If ByteBot is running on a remote server:
```env
BYTEBOT_AGENT_URL=http://your-server.com:9991
BYTEBOT_DESKTOP_URL=http://your-server.com:9990
BYTEBOT_WS_URL=ws://your-server.com:9991
```
## MCP Client Setup
### Claude Desktop
Add to your Claude Desktop configuration file:
**macOS**: `~/Library/Application Support/Claude/claude_desktop_config.json`
**Windows**: `%APPDATA%\Claude\claude_desktop_config.json`
```json
{
  "mcpServers": {
    "bytebot": {
      "command": "node",
      "args": ["/absolute/path/to/bytebot-mcp-server/dist/index.js"],
      "env": {
        "BYTEBOT_AGENT_URL": "http://localhost:9991",
        "BYTEBOT_DESKTOP_URL": "http://localhost:9990"
      }
    }
  }
}
```
### Zed Editor
Add to your Zed `settings.json`:
```json
{
  "context_servers": {
    "bytebot": {
      "command": {
        "path": "node",
        "args": ["/absolute/path/to/bytebot-mcp-server/dist/index.js"],
        "env": {
          "BYTEBOT_AGENT_URL": "http://localhost:9991",
          "BYTEBOT_DESKTOP_URL": "http://localhost:9990"
        }
      }
    }
  }
}
```
### Continue.dev
Add to `.continue/config.json`:
```json
{
  "mcpServers": [
    {
      "name": "bytebot",
      "command": "node",
      "args": ["/absolute/path/to/bytebot-mcp-server/dist/index.js"],
      "env": {
        "BYTEBOT_AGENT_URL": "http://localhost:9991",
        "BYTEBOT_DESKTOP_URL": "http://localhost:9990"
      }
    }
  ]
}
```
## Usage Examples
### Example 1: Basic Task Creation
```
User: Create a task for ByteBot to search Wikipedia for "quantum computing"

Claude uses: bytebot_create_task
{
  "description": "Go to wikipedia.org and search for 'quantum computing'",
  "priority": "MEDIUM"
}

Response:
{
  "id": "task-123",
  "status": "PENDING",
  "priority": "MEDIUM",
  "createdAt": "2024-01-15T10:30:00Z"
}
```
### Example 2: Hybrid Workflow (Create → Monitor → Complete)
```
User: Create a task to log into example.com and wait for it to complete

Claude uses: bytebot_create_and_monitor_task
{
  "description": "Navigate to example.com and log in with credentials from keychain",
  "timeout": 60000,
  "pollInterval": 2000
}

Response:
{
  "taskId": "task-456",
  "finalStatus": "COMPLETED",
  "completedAt": "2024-01-15T10:31:45Z",
  "messagesCount": 12,
  "task": { ... full task details ... }
}
```
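`bytebot_create_and_monitor_task` and `bytebot_monitor_task` poll a task until it reaches a state where it will not progress on its own. The sketch below shows one way such a loop can be written, assuming a `getTask` function that returns the task's current status (for example, a thin wrapper around the Agent API) and using the `TASK_POLL_INTERVAL` and `TASK_MONITOR_TIMEOUT` defaults. Which states end monitoring is an assumption: it stops at the terminal states and at `NEEDS_HELP`/`NEEDS_REVIEW`, which is consistent with Example 3.

```typescript
// Minimal poll-until-done sketch modelled on the behaviour described for
// bytebot_monitor_task. The task shape and stop states are assumptions.
type TaskStatus =
  | "PENDING" | "IN_PROGRESS" | "NEEDS_HELP" | "NEEDS_REVIEW"
  | "COMPLETED" | "CANCELLED" | "FAILED";

interface TaskSnapshot {
  id: string;
  status: TaskStatus;
}

// Assumed stop states: terminal states plus the two that wait on a human.
const STOP_STATES = new Set<TaskStatus>([
  "COMPLETED", "CANCELLED", "FAILED", "NEEDS_HELP", "NEEDS_REVIEW",
]);

async function monitorTask(
  taskId: string,
  getTask: (id: string) => Promise<TaskSnapshot>, // hypothetical Agent API wrapper
  pollIntervalMs = 2000, // TASK_POLL_INTERVAL
  timeoutMs = 300000,    // TASK_MONITOR_TIMEOUT
): Promise<TaskSnapshot> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const task = await getTask(taskId);
    if (STOP_STATES.has(task.status)) return task;
    // Give up rather than sleep past the monitoring deadline.
    if (Date.now() + pollIntervalMs > deadline) {
      throw new Error(`Monitoring task ${taskId} timed out after ${timeoutMs} ms`);
    }
    await new Promise((resolve) => setTimeout(resolve, pollIntervalMs));
  }
}
```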
### Example 3: Task Needs Intervention
```
User: Create a task to fill out a complex form

Claude uses: bytebot_create_and_monitor_task
{
  "description": "Fill out the registration form at example.com/register"
}

Response (after monitoring):
{
  "taskId": "task-789",
  "finalStatus": "NEEDS_HELP",
  "task": {
    "id": "task-789",
    "status": "NEEDS_HELP",
    "messages": [
      {
        "role": "assistant",
        "content": "I need the user's phone number to complete this form"
      }
    ]
  }
}

User: My phone number is 555-1234

Claude uses: bytebot_intervene_in_task
{
  "taskId": "task-789",
  "message": "User's phone number is 555-1234",
  "action": "resume",
  "continueMonitoring": true
}

Response:
{
  "taskId": "task-789",
  "status": "COMPLETED",
  "intervention": "applied"
}
```
### Example 4: Interactive Desktop Control
```
User: Take a screenshot and click at position (500, 300)

Claude uses: bytebot_screenshot
Response: { "screenshot": "iVBORw0KG..." }

Claude uses: bytebot_click
{
  "x": 500,
  "y": 300,
  "button": "left"
}
Response: ✓ bytebot_click completed successfully
```
### Example 5: Multi-Step Workflow
```
User: Execute a workflow to open Firefox, navigate to GitHub, and take a screenshot

Claude uses: bytebot_execute_workflow
{
  "steps": [
    {
      "name": "Open Firefox",
      "description": "Switch to Firefox browser application"
    },
    {
      "name": "Navigate to GitHub",
      "description": "Navigate to github.com in the browser"
    },
    {
      "name": "Take Screenshot",
      "description": "Capture a screenshot of the GitHub homepage"
    }
  ],
  "priority": "HIGH"
}

Response:
{
  "steps": [
    { "name": "Open Firefox", "taskId": "task-001", "status": "COMPLETED" },
    { "name": "Navigate to GitHub", "taskId": "task-002", "status": "COMPLETED" },
    { "name": "Take Screenshot", "taskId": "task-003", "status": "COMPLETED" }
  ],
  "overallStatus": "completed",
  "totalInterventions": 0
}
```
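Example 5 shows each workflow step becoming its own task. Below is a minimal sketch of a sequential runner that produces the same response shape, assuming a hypothetical `runStep` function that creates and monitors one task. Unlike the real tool, it performs no error recovery: it simply stops at the first step that does not reach `COMPLETED`.

```typescript
// Hypothetical sequential runner mirroring the response shape in Example 5.
// Assumption: steps run in order and later steps depend on earlier ones.
interface WorkflowStep { name: string; description: string; }
interface StepResult { name: string; taskId: string; status: string; }
interface WorkflowResult { steps: StepResult[]; overallStatus: "completed" | "failed"; }

async function executeWorkflow(
  steps: WorkflowStep[],
  runStep: (step: WorkflowStep) => Promise<{ taskId: string; status: string }>,
): Promise<WorkflowResult> {
  const results: StepResult[] = [];
  for (const step of steps) {
    const { taskId, status } = await runStep(step);
    results.push({ name: step.name, taskId, status });
    if (status !== "COMPLETED") {
      // Stop early: running later steps on a broken state rarely helps.
      return { steps: results, overallStatus: "failed" };
    }
  }
  return { steps: results, overallStatus: "completed" };
}
```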
### Example 6: File Operations
```
User: Read the contents of /home/user/data.txt

Claude uses: bytebot_read_file
{
  "path": "/home/user/data.txt"
}
Response: { "content": "SGVsbG8gV29ybGQh..." } // Base64 encoded
```
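File content travels as base64 in both directions (`bytebot_read_file` returns it, `bytebot_write_file` expects it). In Node.js, `Buffer` converts between base64 and text; the helper names below are illustrative only.

```typescript
// Illustrative helpers for the base64 file content used by
// bytebot_read_file and bytebot_write_file.
function decodeFileContent(base64: string): string {
  return Buffer.from(base64, "base64").toString("utf8");
}

function encodeFileContent(text: string): string {
  return Buffer.from(text, "utf8").toString("base64");
}

console.log(decodeFileContent("SGVsbG8gV29ybGQh")); // Hello World!
```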
## Troubleshooting
### Error: "Cannot connect to ByteBot server"
**Cause**: ByteBot is not running or endpoint URL is incorrect
**Solution**:
1. Verify ByteBot is running: `curl http://localhost:9991/tasks`
2. Check `.env` file has correct URLs
3. Ensure no firewall is blocking the connection
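The same check can be scripted, which helps when the server runs on a remote host. This is a sketch using the global `fetch` and `AbortSignal.timeout` available in Node.js 18 and later; probing `/tasks` on the Agent API comes from the `curl` command above, while probing the root path of the Desktop API is an assumption (any HTTP response is treated as "reachable"). `checkEndpoint` is a hypothetical helper, and `fetchImpl` exists only so the check can be tested without a network.

```typescript
// Sketch of a reachability check for the ByteBot APIs.
type FetchLike = (url: string, init?: { signal?: AbortSignal }) => Promise<{ status: number }>;

async function checkEndpoint(
  url: string,
  timeoutMs = 5000,
  fetchImpl: FetchLike = fetch,
): Promise<{ url: string; reachable: boolean; detail: string }> {
  try {
    const res = await fetchImpl(url, { signal: AbortSignal.timeout(timeoutMs) });
    // Any HTTP status means a server answered; connection errors throw instead.
    return { url, reachable: true, detail: `HTTP ${res.status}` };
  } catch (err) {
    return { url, reachable: false, detail: err instanceof Error ? err.message : String(err) };
  }
}
```

To check both APIs at once: `await Promise.all([checkEndpoint("http://localhost:9991/tasks"), checkEndpoint("http://localhost:9990")])`.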
### Error: "Request to ByteBot timed out"
**Cause**: A request to ByteBot took longer than the configured timeout
**Solution**:
1. Increase `REQUEST_TIMEOUT` in `.env` for Agent API calls
2. Increase `DESKTOP_ACTION_TIMEOUT` for Desktop API calls
3. For long-running tasks, use `bytebot_create_and_monitor_task` with a custom `timeout` in milliseconds (`600000` is 10 minutes):
   ```json
   {
     "description": "Long running task",
     "timeout": 600000
   }
   ```
### Error: "Task with ID xyz not found"
**Cause**: Task was deleted or ID is incorrect
**Solution**:
1. List all tasks: `bytebot_list_tasks`
2. Verify task ID from response
3. Check if task was accidentally deleted
### Warning: "Screenshot size is 8.5MB"
**Cause**: The screenshot exceeds the 5 MB warning threshold, which is common on high-resolution displays
**Solution**:
1. No action is required; this is only a warning and the screenshot is still returned
2. If you capture screenshots frequently, consider reducing the screen resolution
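Assuming the threshold applies to the decoded PNG rather than the base64 text, the image size can be estimated from the base64 string without decoding it: every 4 base64 characters encode 3 bytes, minus one byte per trailing `=`. The sketch also assumes binary megabytes (1 MB = 1,048,576 bytes); the server's exact check is not documented here, and both function names are hypothetical.

```typescript
// Estimates the decoded byte size of a base64 payload such as a
// bytebot_screenshot result.
function base64DecodedBytes(b64: string): number {
  const clean = b64.replace(/\s/g, "");
  const padding = clean.endsWith("==") ? 2 : clean.endsWith("=") ? 1 : 0;
  return Math.floor((clean.length * 3) / 4) - padding;
}

// 5 MB threshold from this README; binary megabytes are an assumption.
const SCREENSHOT_WARN_BYTES = 5 * 1024 * 1024;

// Returns a warning message in the README's format, or null if under the limit.
function screenshotWarning(b64: string): string | null {
  const bytes = base64DecodedBytes(b64);
  if (bytes <= SCREENSHOT_WARN_BYTES) return null;
  return `Screenshot size is ${(bytes / (1024 * 1024)).toFixed(1)}MB`;
}
```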
### Error: "Task must be in NEEDS_HELP state"
**Cause**: Attempting to intervene in a task that is not in the `NEEDS_HELP` state
**Solution**:
1. Check task status first: `bytebot_get_task`
2. Only use `bytebot_intervene_in_task` when status is `NEEDS_HELP`
3. Use `bytebot_update_task` to manually change status if needed
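The "check status first" advice can be wrapped in a small guard. In this sketch, `getTask` and `intervene` are hypothetical wrappers around `bytebot_get_task` and `bytebot_intervene_in_task`; the function returns whether an intervention was actually sent.

```typescript
// Sketch: only intervene when the task is actually waiting for help.
interface TaskInfo { id: string; status: string; }

async function interveneIfNeeded(
  taskId: string,
  message: string,
  getTask: (id: string) => Promise<TaskInfo>,
  intervene: (id: string, message: string) => Promise<void>,
): Promise<boolean> {
  const task = await getTask(taskId);
  if (task.status !== "NEEDS_HELP") {
    // Intervening now would fail with "Task must be in NEEDS_HELP state".
    return false;
  }
  await intervene(taskId, message);
  return true;
}
```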
### WebSocket Connection Failed
**Cause**: WebSocket URL incorrect or ByteBot doesn't support WebSocket
**Solution**:
1. Set `ENABLE_WEBSOCKET=false` in `.env` to disable WebSocket
2. Server will automatically fall back to HTTP polling
3. WebSocket is optional - all features work without it
### Error: "File size exceeds maximum allowed size"
**Cause**: Trying to upload or read a file larger than `MAX_FILE_SIZE` (10 MB by default)
**Solution**:
1. Increase `MAX_FILE_SIZE` in `.env` (in bytes)
2. Split large files into smaller chunks
3. Compress files before uploading
## API Reference
### Task Priority Levels
- `LOW` - Background tasks, non-urgent
- `MEDIUM` - Default priority (recommended)
- `HIGH` - Important tasks, process soon
- `URGENT` - Critical tasks, process immediately
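When handling results from `bytebot_list_tasks` on the client side, the four levels can be modelled as a string union. The numeric ranks and the `sortByPriority` helper below are hypothetical, for client-side ordering only; they are not values defined by the ByteBot API.

```typescript
// Priority levels as listed above; ranks are a client-side assumption.
type TaskPriority = "LOW" | "MEDIUM" | "HIGH" | "URGENT";

const PRIORITY_RANK: Record<TaskPriority, number> = {
  LOW: 0,
  MEDIUM: 1,
  HIGH: 2,
  URGENT: 3,
};

const DEFAULT_PRIORITY: TaskPriority = "MEDIUM";

// Sorts most urgent first without mutating the input; tasks with no
// priority are treated as MEDIUM, the documented default.
function sortByPriority<T extends { priority?: TaskPriority }>(tasks: T[]): T[] {
  return [...tasks].sort(
    (a, b) =>
      PRIORITY_RANK[b.priority ?? DEFAULT_PRIORITY] -
      PRIORITY_RANK[a.priority ?? DEFAULT_PRIORITY],
  );
}
```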
### Task Lifecycle States
1. `PENDING` - Task created, waiting to start
2. `IN_PROGRESS` - Task currently executing
3. `NEEDS_HELP` - Task blocked, requires intervention
4. `NEEDS_REVIEW` - Task complete but needs verification
5. `COMPLETED` - Task finished successfully
6. `CANCELLED` - Task cancelled by user
7. `FAILED` - Task failed with error
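The seven states fall into three groups by what a client can do next. The grouping below is an interpretation of the descriptions above, not an official classification; `phaseOf` is a hypothetical helper.

```typescript
// Lifecycle states from this README, grouped by what happens next.
type TaskStatus =
  | "PENDING" | "IN_PROGRESS" | "NEEDS_HELP" | "NEEDS_REVIEW"
  | "COMPLETED" | "CANCELLED" | "FAILED";

type Phase = "active" | "waiting-on-user" | "terminal";

function phaseOf(status: TaskStatus): Phase {
  switch (status) {
    case "PENDING":
    case "IN_PROGRESS":
      return "active"; // ByteBot is working on it, or will be
    case "NEEDS_HELP":
    case "NEEDS_REVIEW":
      return "waiting-on-user"; // needs intervention or verification
    case "COMPLETED":
    case "CANCELLED":
    case "FAILED":
      return "terminal"; // no further transitions expected
  }
}
```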
### Mouse Buttons
- `left` - Primary button (default)
- `right` - Context menu button
- `middle` - Scroll wheel click
### Scroll Directions
- `up` - Scroll up
- `down` - Scroll down
- `left` - Scroll left
- `right` - Scroll right
### Common Applications
- `firefox` - Mozilla Firefox
- `chrome` - Google Chrome
- `safari` - Safari (macOS)
- `terminal` - Terminal/Command Prompt
- `vscode` - Visual Studio Code
## Architecture
```
┌─────────────────────────────────────────────┐
│             MCP Client (Claude)             │
└──────────────────────┬──────────────────────┘
                       │ stdio transport
┌──────────────────────▼──────────────────────┐
│             ByteBot MCP Server              │
│  ┌───────────────────────────────────────┐  │
│  │   Agent Tools    │   Desktop Tools    │  │
│  │          Hybrid Orchestrator          │  │
│  └───────┬───────────────────────┬───────┘  │
└──────────┼───────────────────────┼──────────┘
           │                       │
    ┌──────▼──────┐         ┌──────▼──────┐
    │  Agent API  │         │ Desktop API │
    │ (port 9991) │         │ (port 9990) │
    └──────┬──────┘         └──────┬──────┘
           │                       │
    ┌──────▼───────────────────────▼──────┐
    │          ByteBot Instance           │
    └─────────────────────────────────────┘
```
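Over the stdio transport, the MCP client writes JSON-RPC 2.0 messages to the server's stdin, one message per line. A real client first completes the MCP `initialize` handshake; this sketch shows only how a single `tools/call` request, such as the click in Example 4, is framed. `encodeToolCall` is a hypothetical helper name.

```typescript
// Builds one newline-delimited JSON-RPC "tools/call" request for the
// stdio transport.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

function encodeToolCall(id: number, name: string, args: Record<string, unknown>): string {
  const request: ToolCallRequest = {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name, arguments: args },
  };
  // JSON.stringify without indentation escapes newlines inside strings,
  // so the trailing "\n" is the only message delimiter.
  return JSON.stringify(request) + "\n";
}
```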
## Development
### Build
```bash
npm run build
```
### Type Check
```bash
npm run type-check
```
### Watch Mode
```bash
npm run dev
```
## Environment Variables Reference
| Variable | Default | Description |
|----------|---------|-------------|
| `BYTEBOT_AGENT_URL` | `http://localhost:9991` | ByteBot Agent API endpoint |
| `BYTEBOT_DESKTOP_URL` | `http://localhost:9990` | ByteBot Desktop API endpoint |
| `BYTEBOT_WS_URL` | `ws://localhost:9991` | WebSocket endpoint for real-time updates |
| `ENABLE_WEBSOCKET` | `false` | Enable WebSocket connections |
| `MCP_SERVER_NAME` | `bytebot-mcp` | Server identifier |
| `REQUEST_TIMEOUT` | `30000` | HTTP request timeout (ms) |
| `DESKTOP_ACTION_TIMEOUT` | `10000` | Desktop action timeout (ms) |
| `MAX_RETRIES` | `3` | Maximum retry attempts for failed requests |
| `RETRY_DELAY` | `1000` | Initial retry delay (ms) |
| `TASK_POLL_INTERVAL` | `2000` | Task status polling interval (ms) |
| `TASK_MONITOR_TIMEOUT` | `300000` | Maximum task monitoring duration (ms) |
| `MAX_FILE_SIZE` | `10485760` | Maximum file size in bytes (10MB) |
| `LOG_LEVEL` | `info` | Logging level (debug/info/warn/error) |
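The defaults in this table can be applied with a small loader. This is a hypothetical sketch, not the server's actual configuration code; it assumes numeric variables are base-10 integers and that invalid values fall back to the default rather than causing an error.

```typescript
// Hypothetical loader applying the documented defaults.
interface ByteBotConfig {
  agentUrl: string;
  desktopUrl: string;
  wsUrl: string;
  enableWebSocket: boolean;
  serverName: string;
  requestTimeout: number;
  desktopActionTimeout: number;
  maxRetries: number;
  retryDelay: number;
  taskPollInterval: number;
  taskMonitorTimeout: number;
  maxFileSize: number;
  logLevel: "debug" | "info" | "warn" | "error";
}

type Env = Record<string, string | undefined>;

// Parses a non-negative integer variable, falling back on missing or bad input.
function intFrom(env: Env, key: string, fallback: number): number {
  const parsed = Number.parseInt(env[key] ?? "", 10);
  return Number.isFinite(parsed) && parsed >= 0 ? parsed : fallback;
}

function loadConfig(env: Env = process.env): ByteBotConfig {
  const level = env.LOG_LEVEL;
  return {
    agentUrl: env.BYTEBOT_AGENT_URL ?? "http://localhost:9991",
    desktopUrl: env.BYTEBOT_DESKTOP_URL ?? "http://localhost:9990",
    wsUrl: env.BYTEBOT_WS_URL ?? "ws://localhost:9991",
    enableWebSocket: env.ENABLE_WEBSOCKET === "true",
    serverName: env.MCP_SERVER_NAME ?? "bytebot-mcp",
    requestTimeout: intFrom(env, "REQUEST_TIMEOUT", 30000),
    desktopActionTimeout: intFrom(env, "DESKTOP_ACTION_TIMEOUT", 10000),
    maxRetries: intFrom(env, "MAX_RETRIES", 3),
    retryDelay: intFrom(env, "RETRY_DELAY", 1000),
    taskPollInterval: intFrom(env, "TASK_POLL_INTERVAL", 2000),
    taskMonitorTimeout: intFrom(env, "TASK_MONITOR_TIMEOUT", 300000),
    maxFileSize: intFrom(env, "MAX_FILE_SIZE", 10485760),
    logLevel: level === "debug" || level === "warn" || level === "error" ? level : "info",
  };
}
```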
## License
MIT
## Support
For issues and questions:
- ByteBot Documentation: https://docs.bytebot.ai
- MCP Specification: https://modelcontextprotocol.io
- Report issues: Create an issue in this repository
## Version History
### 1.0.0 (2024-01-15)
- Initial release
- Agent API integration (task management)
- Desktop API integration (computer control)
- Hybrid orchestration tools
- WebSocket support for real-time updates
- Comprehensive error handling and retry logic
- Full TypeScript implementation with strict typing