# RPA MCP Server
Robotic Process Automation (RPA) REST API service for desktop automation, vision, and workflow orchestration.
## Overview
A Spring Boot service that exposes a REST API for desktop automation: screenshot capture, AI vision, mouse and keyboard control, file operations, OCR, and multi-step workflow execution.
## Technology Stack
- **Framework:** Spring Boot
- **Language:** Java
- **Build:** Gradle
- **AI Vision:** Ollama integration
- **Port:** 9100
## Features
### Screenshot & Vision
- Capture screenshots
- AI-powered image description
- OCR text extraction
### Desktop Automation
- Mouse clicks (coordinates)
- Keyboard input and key presses
- Window management (list, focus, close)
### File Operations
- Read/write files
- List directories
- Delete files
### Browser Control
- Open URLs in default browser
- Launch specific browsers (Chrome, Firefox)
### Workflow Orchestration
- Multi-step automation sequences
- Conditional execution
- Error handling
## API Endpoints
**Base URL:** `http://royaloak02.local:9100`
### Status
- `GET /rpa/status` - Service status
### Screenshot & Vision
- `GET /screenshot` - Capture screenshot (PNG)
- `GET /vision/describe` - AI description of last screenshot
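
As a rough illustration of the screenshot endpoint, the sketch below fetches `GET /screenshot` and writes the response bytes to a file. It uses only the Python standard library. The function names (`screenshot_request`, `save_screenshot`) and the timeout are hypothetical, and the code assumes the response body is the raw PNG image, as the endpoint description suggests.

```python
import urllib.request
from pathlib import Path

# Base URL as listed in this README; override it for other hosts.
BASE_URL = "http://royaloak02.local:9100"


def screenshot_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the GET /screenshot request without sending it."""
    return urllib.request.Request(f"{base_url}/screenshot", method="GET")


def save_screenshot(dest: str, base_url: str = BASE_URL, timeout: float = 10.0) -> Path:
    """Fetch a screenshot and write the response body to dest.

    Assumption: the body is the raw PNG; the service may behave differently.
    """
    with urllib.request.urlopen(screenshot_request(base_url), timeout=timeout) as resp:
        data = resp.read()
    path = Path(dest)
    path.write_bytes(data)
    return path
```

With the service running, `save_screenshot("screen.png")` would write the capture to the current directory.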
### Automation
- `POST /auto/click?x=100&y=200` - Click at screen coordinates (x, y)
- `POST /auto/type` - Type text (body: text)
- `POST /auto/key` - Press keys (body: key name)
- `GET /auto/windows` - List open windows
- `POST /auto/focus?title=<name>` - Focus window
- `POST /auto/close?title=<name>` - Close window
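
The automation endpoints above mix query parameters (`x`, `y`, `title`) with request bodies (text to type). A minimal client sketch, assuming the text body is sent as plain UTF-8 and that responses are text, might look like this. The helper names and the `Content-Type` header are assumptions, not part of the documented API.

```python
import urllib.request
from urllib.parse import quote, urlencode

# Base URL as listed in this README; override it for other hosts.
BASE_URL = "http://royaloak02.local:9100"


def click_request(x: int, y: int, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build POST /auto/click with the coordinates as query parameters."""
    query = urlencode({"x": x, "y": y})
    return urllib.request.Request(f"{base_url}/auto/click?{query}", data=b"", method="POST")


def type_request(text: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build POST /auto/type with the text as the request body.

    Assumption: the service accepts a plain-text UTF-8 body.
    """
    return urllib.request.Request(
        f"{base_url}/auto/type",
        data=text.encode("utf-8"),
        headers={"Content-Type": "text/plain; charset=utf-8"},
        method="POST",
    )


def focus_request(title: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build POST /auto/focus; the window title is percent-encoded."""
    query = urlencode({"title": title}, quote_via=quote)
    return urllib.request.Request(f"{base_url}/auto/focus?{query}", data=b"", method="POST")


def send(req: urllib.request.Request, timeout: float = 10.0) -> str:
    """Send a prepared request and return the response body as text."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

For example, `send(focus_request("Untitled - Notepad"))` followed by `send(type_request("hello"))` would focus a window and type into it, provided the service is reachable. Percent-encoding the title matters because window titles often contain spaces.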
### File Operations
- `GET /file/read?path=<path>` - Read file
- `POST /file/write?path=<path>` - Write file (body: content)
- `GET /file/list?path=<path>` - List directory
- `POST /file/delete?path=<path>` - Delete file
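
File paths passed in the `path` query parameter generally need percent-encoding, since they can contain slashes and spaces. The sketch below shows one way to build those URLs and a write request. The helper names are hypothetical, and the plain-text body for `/file/write` is an assumption based on the "body: content" note above.

```python
import urllib.request
from urllib.parse import quote, urlencode

# Base URL as listed in this README; override it for other hosts.
BASE_URL = "http://royaloak02.local:9100"


def file_url(operation: str, path: str, base_url: str = BASE_URL) -> str:
    """Return the URL for /file/<operation> with the path percent-encoded."""
    query = urlencode({"path": path}, quote_via=quote)
    return f"{base_url}/file/{operation}?{query}"


def write_request(path: str, content: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build POST /file/write with the file content as the request body.

    Assumption: the service accepts a plain-text UTF-8 body.
    """
    return urllib.request.Request(
        file_url("write", path, base_url),
        data=content.encode("utf-8"),
        headers={"Content-Type": "text/plain; charset=utf-8"},
        method="POST",
    )


def read_file(path: str, base_url: str = BASE_URL, timeout: float = 10.0) -> str:
    """Call GET /file/read and return the response body as text."""
    with urllib.request.urlopen(file_url("read", path, base_url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Note that the paths refer to the machine running the service, not the client.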
### Browser
- `POST /browser/open?url=<url>` - Open in default browser
- `POST /browser/chrome?url=<url>` - Open in Chrome
- `POST /browser/firefox?url=<url>` - Open in Firefox
### OCR
- `GET /ocr/screen` - Extract text from screenshot
- `GET /ocr/file?path=<path>` - Extract text from image
### Workflow
- `POST /workflow/execute` - Execute action sequence
**Example workflow:**
```json
[
{"action": "open", "url": "https://example.com"},
{"action": "wait", "ms": 2000},
{"action": "screenshot"},
{"action": "click", "x": 100, "y": 200}
]
```
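
A workflow like the one above could be submitted from a script roughly as follows. This is a sketch under stated assumptions: the service is assumed to accept the JSON array as the request body with a `Content-Type: application/json` header, and `workflow_request` and `run_workflow` are hypothetical helper names. The response format is not documented here, so the body is returned as raw text.

```python
import json
import urllib.request

# Base URL as listed in this README; override it for other hosts.
BASE_URL = "http://royaloak02.local:9100"


def workflow_request(steps: list, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build POST /workflow/execute with the steps serialized as a JSON array.

    Assumption: the service expects an application/json body.
    """
    return urllib.request.Request(
        f"{base_url}/workflow/execute",
        data=json.dumps(steps).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_workflow(steps: list, base_url: str = BASE_URL, timeout: float = 60.0) -> str:
    """Send the workflow and return the raw response body as text."""
    with urllib.request.urlopen(workflow_request(steps, base_url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# The same sequence as the JSON example above.
EXAMPLE_STEPS = [
    {"action": "open", "url": "https://example.com"},
    {"action": "wait", "ms": 2000},
    {"action": "screenshot"},
    {"action": "click", "x": 100, "y": 200},
]
```

Calling `run_workflow(EXAMPLE_STEPS)` would submit the sequence. The longer default timeout reflects that a workflow with `wait` steps can take several seconds to finish.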
## Building
```bash
./gradlew build
```
## Running
```bash
./gradlew bootRun
```
Or use the control script:
```bash
./control.sh start
./control.sh stop
./control.sh status
./control.sh log
```
## Documentation
- [API.md](API.md) - Complete API reference
- [IMPROVEMENTS.md](IMPROVEMENTS.md) - Planned improvements
- [SUGGESTIONS.md](SUGGESTIONS.md) - Enhancement suggestions
## Use Cases
- Desktop automation
- UI testing
- Screen scraping
- Workflow automation
- AI-powered vision tasks
- File management automation
- Browser automation