Click on "Install Server".
Wait a few minutes for the server to deploy. Once ready, it will show a "Started" state.
In the chat, type @ followed by the MCP server name and your instructions, e.g., "@RPA MCP Server Describe my current screen and extract any text using OCR."
That's it! The server will respond to your query, and you can continue using it as needed.
RPA MCP Server
Robotic Process Automation (RPA) REST API service for desktop automation, vision, and workflow orchestration.
Overview
A Spring Boot service that exposes a REST API for desktop automation, including screenshot capture, AI vision, mouse/keyboard control, file operations, OCR, and multi-step workflow execution.
Technology Stack
Framework: Spring Boot
Language: Java
Build: Gradle
AI Vision: Ollama integration
Port: 9100
Features
Screenshot & Vision
Capture screenshots
AI-powered image description
OCR text extraction
Desktop Automation
Mouse clicks (coordinates)
Keyboard input and key presses
Window management (list, focus, close)
File Operations
Read/write files
List directories
Delete files
Browser Control
Open URLs in default browser
Launch specific browsers (Chrome, Firefox)
Workflow Orchestration
Multi-step automation sequences
Conditional execution
Error handling
API Endpoints
Base URL: http://royaloak02.local:9100
Status
GET /rpa/status - Service status
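A quick reachability check against the status endpoint could look like the sketch below. It uses the JDK's built-in `java.net.http` client (Java 11+). The class name `RpaStatusSketch`, the helper names, and the 5-second timeout are illustrative assumptions, and the response format is not documented here, so the body is returned as raw text.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical helper class; not part of the RPA MCP Server code base.
class RpaStatusSketch {
    // Base URL as listed in this README; replace with your own host if it differs.
    static final String BASE_URL = "http://royaloak02.local:9100";

    /** Builds a GET request for the documented /rpa/status endpoint. */
    static HttpRequest statusRequest(String baseUrl) {
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/rpa/status"))
                .timeout(Duration.ofSeconds(5)) // assumed timeout, tune as needed
                .GET()
                .build();
    }

    /** Sends the request and returns "<status code> <raw body>". */
    static String fetchStatus(String baseUrl) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response =
                client.send(statusRequest(baseUrl), HttpResponse.BodyHandlers.ofString());
        return response.statusCode() + " " + response.body();
    }

    public static void main(String[] args) {
        // Only prints the request so the sketch runs without the service;
        // call fetchStatus(BASE_URL) once the server is reachable.
        HttpRequest request = statusRequest(BASE_URL);
        System.out.println(request.method() + " " + request.uri());
    }
}
```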
Screenshot & Vision
GET /screenshot - Capture screenshot (PNG)
GET /vision/describe - AI description of last screenshot
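Since /screenshot returns a PNG and /vision/describe works on the last screenshot, a client would typically capture first and describe second. The sketch below shows that order with the JDK HTTP client, streaming the image straight to disk. The class name, method names, and target file are assumptions; the format of the description response is not specified here and is treated as plain text.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;

// Hypothetical helper class; not part of the RPA MCP Server code base.
class RpaScreenshotSketch {
    static final String BASE_URL = "http://royaloak02.local:9100";

    /** GET /screenshot, which the README says returns a PNG. */
    static HttpRequest screenshotRequest(String baseUrl) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/screenshot")).GET().build();
    }

    /** GET /vision/describe, which describes the last captured screenshot. */
    static HttpRequest describeRequest(String baseUrl) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/vision/describe")).GET().build();
    }

    /** Captures a screenshot into the target file, then asks for its description. */
    static String captureAndDescribe(String baseUrl, Path target) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        // BodyHandlers.ofFile writes the binary response body directly to disk.
        client.send(screenshotRequest(baseUrl), HttpResponse.BodyHandlers.ofFile(target));
        return client.send(describeRequest(baseUrl), HttpResponse.BodyHandlers.ofString()).body();
    }

    public static void main(String[] args) {
        // Prints the two requests in the order they would be sent; no network call is made.
        System.out.println(screenshotRequest(BASE_URL).uri());
        System.out.println(describeRequest(BASE_URL).uri());
    }
}
```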
Automation
POST /auto/click?x=100&y=200 - Click coordinates
POST /auto/type - Type text (body: text)
POST /auto/key - Press keys (body: key name)
GET /auto/windows - List open windows
POST /auto/focus?title=<name> - Focus window
POST /auto/close?title=<name> - Close window
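The automation endpoints split into two shapes: parameters in the query string (click) and a plain request body (type, key). A minimal sketch of both follows. The class and method names are hypothetical, and the `text/plain` content type is an assumption, since the README only says the body carries the text.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper class; not part of the RPA MCP Server code base.
class RpaAutomationSketch {
    static final String BASE_URL = "http://royaloak02.local:9100";

    /** POST /auto/click with the coordinates passed as query parameters and no body. */
    static HttpRequest clickRequest(String baseUrl, int x, int y) {
        URI uri = URI.create(baseUrl + "/auto/click?x=" + x + "&y=" + y);
        return HttpRequest.newBuilder(uri)
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    /** POST /auto/type with the text to type sent as the request body. */
    static HttpRequest typeRequest(String baseUrl, String text) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/auto/type"))
                // Assumed content type; check API.md for what the server expects.
                .header("Content-Type", "text/plain; charset=UTF-8")
                .POST(HttpRequest.BodyPublishers.ofString(text))
                .build();
    }

    public static void main(String[] args) {
        // Prints the requests only; send them with HttpClient.newHttpClient().send(...)
        // when the service is reachable.
        HttpRequest click = clickRequest(BASE_URL, 100, 200);
        HttpRequest type = typeRequest(BASE_URL, "hello");
        System.out.println(click.method() + " " + click.uri());
        System.out.println(type.method() + " " + type.uri());
    }
}
```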
File Operations
GET /file/read?path=<path> - Read file
POST /file/write?path=<path> - Write file (body: content)
GET /file/list?path=<path> - List directory
POST /file/delete?path=<path> - Delete file
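Because the file endpoints take the path as a query parameter, paths containing spaces or other reserved characters need percent-encoding before they go into the URL. The sketch below shows one way to do that on the client side. The class and method names are hypothetical, and the `text/plain` content type for writes is an assumption.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; not part of the RPA MCP Server code base.
class RpaFileSketch {
    static final String BASE_URL = "http://royaloak02.local:9100";

    /** Percent-encodes a query parameter value, using %20 rather than '+' for spaces. */
    static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8).replace("+", "%20");
    }

    /** GET /file/read?path=<path> with the path safely encoded. */
    static HttpRequest readRequest(String baseUrl, String path) {
        URI uri = URI.create(baseUrl + "/file/read?path=" + encode(path));
        return HttpRequest.newBuilder(uri).GET().build();
    }

    /** POST /file/write?path=<path> with the file content as the request body. */
    static HttpRequest writeRequest(String baseUrl, String path, String content) {
        URI uri = URI.create(baseUrl + "/file/write?path=" + encode(path));
        return HttpRequest.newBuilder(uri)
                // Assumed content type; check API.md for what the server expects.
                .header("Content-Type", "text/plain; charset=UTF-8")
                .POST(HttpRequest.BodyPublishers.ofString(content))
                .build();
    }

    public static void main(String[] args) {
        // Prints the encoded URLs only; no network call is made.
        System.out.println(readRequest(BASE_URL, "/tmp/my notes.txt").uri());
        System.out.println(writeRequest(BASE_URL, "/tmp/out.txt", "hello").uri());
    }
}
```

Note that these endpoints read, write, and delete files on the machine running the service, so it is worth restricting which clients can reach the port.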
Browser
POST /browser/open?url=<url> - Open in default browser
POST /browser/chrome?url=<url> - Open in Chrome
POST /browser/firefox?url=<url> - Open in Firefox
OCR
GET /ocr/screen - Extract text from screenshot
GET /ocr/file?path=<path> - Extract text from image
Workflow
POST /workflow/execute - Execute action sequence
Example workflow:
Building
Running
Or use control script:
Documentation
API.md - Complete API reference
IMPROVEMENTS.md - Planned improvements
SUGGESTIONS.md - Enhancement suggestions
Use Cases
Desktop automation
UI testing
Screen scraping
Workflow automation
AI-powered vision tasks
File management automation
Browser automation