Which integrations are available for this server?

Enables launching specific URLs and controlling the Firefox web browser as part of desktop automation sequences and workflows. Provides AI-powered vision capabilities by integrating with Ollama to generate descriptions and analysis of captured desktop screenshots.

How do I use RPA MCP Server?

1. Click "Install Server".
2. Wait a few minutes for the server to deploy. Once ready, it shows a "Started" state.
3. In the chat, type @ followed by the MCP server name and your instructions, e.g., "@RPA MCP Server Describe my current screen and extract any text using OCR."

The server will respond to your query, and you can continue using it as needed.

RPA MCP Server

Robotic Process Automation (RPA) REST API service for desktop automation, vision, and workflow orchestration.

Overview

A Spring Boot service that provides a REST API for desktop automation, including screenshot capture, AI vision, mouse/keyboard control, file operations, OCR, and multi-step workflow execution.

Technology Stack

Framework: Spring Boot
Language: Java
Build: Gradle
AI Vision: Ollama integration
Port: 9100

Features

Screenshot & Vision

Capture screenshots
AI-powered image description
OCR text extraction

Desktop Automation

Mouse clicks (coordinates)
Keyboard input and key presses
Window management (list, focus, close)

File Operations

Read/write files
List directories
Delete files

Browser Control

Open URLs in default browser
Launch specific browsers (Chrome, Firefox)

Workflow Orchestration

Multi-step automation sequences
Conditional execution
Error handling

API Endpoints

Base URL: http://royaloak02.local:9100

Status

GET /rpa/status - Service status

Screenshot & Vision

GET /screenshot - Capture screenshot (PNG)
GET /vision/describe - AI description of last screenshot

Automation

POST /auto/click?x=100&y=200 - Click coordinates
POST /auto/type - Type text (body: text)
POST /auto/key - Press keys (body: key name)
GET /auto/windows - List open windows
POST /auto/focus?title=<name> - Focus window
POST /auto/close?title=<name> - Close window
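As a rough illustration of the automation endpoints listed above, the following Java sketch builds requests for a click and for typed text using the JDK's built-in HTTP client types. The class and method names are hypothetical, and sending the text as a plain-text body with a text/plain Content-Type is an assumption; the README only says "body: text".

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper, not part of the project: builds requests for two
// of the automation endpoints without sending them.
class RpaAutomationSketch {
    // Base URL as given in this README; adjust for your own host.
    static final String BASE_URL = "http://royaloak02.local:9100";

    // POST /auto/click?x=..&y=.. with an empty body.
    static HttpRequest click(int x, int y) {
        URI uri = URI.create(BASE_URL + "/auto/click?x=" + x + "&y=" + y);
        return HttpRequest.newBuilder(uri)
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    // POST /auto/type with the text as the request body.
    // Assumption: the service accepts a raw plain-text body.
    static HttpRequest typeText(String text) {
        return HttpRequest.newBuilder(URI.create(BASE_URL + "/auto/type"))
                .header("Content-Type", "text/plain; charset=UTF-8")
                .POST(HttpRequest.BodyPublishers.ofString(text))
                .build();
    }
}
```

A request built this way could then be sent with HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()), provided the service is reachable.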

File Operations

GET /file/read?path=<path> - Read file
POST /file/write?path=<path> - Write file (body: content)
GET /file/list?path=<path> - List directory
POST /file/delete?path=<path> - Delete file
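File paths passed in the path query parameter usually contain characters such as "/" and spaces that need percent-encoding. The sketch below shows one way to build those URLs in Java; the class and method names are hypothetical, and it assumes the service decodes the query parameter in the standard way.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the project: builds file-operation
// URLs with a safely encoded path parameter.
class RpaFileUriSketch {
    // Base URL as given in this README; adjust for your own host.
    static final String BASE_URL = "http://royaloak02.local:9100";

    // URLEncoder produces form encoding, where a space becomes "+".
    // Literal plus signs are already encoded as %2B at this point, so
    // replacing "+" with "%20" only affects spaces.
    static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8).replace("+", "%20");
    }

    // GET /file/read?path=<path>
    static URI read(String path) {
        return URI.create(BASE_URL + "/file/read?path=" + encode(path));
    }

    // GET /file/list?path=<path>
    static URI list(String path) {
        return URI.create(BASE_URL + "/file/list?path=" + encode(path));
    }
}
```

For example, RpaFileUriSketch.read("/tmp/my notes.txt") yields a URI whose query is path=%2Ftmp%2Fmy%20notes.txt.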

Browser

POST /browser/open?url=<url> - Open in default browser
POST /browser/chrome?url=<url> - Open in Chrome
POST /browser/firefox?url=<url> - Open in Firefox

OCR

GET /ocr/screen - Extract text from screenshot
GET /ocr/file?path=<path> - Extract text from image

Workflow

POST /workflow/execute - Execute action sequence

Example workflow:

[
  {"action": "open", "url": "https://example.com"},
  {"action": "wait", "ms": 2000},
  {"action": "screenshot"},
  {"action": "click", "x": 100, "y": 200}
]
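A workflow like the example above can also be assembled programmatically. The following Java sketch builds the JSON array from a list of steps and wraps it in a POST request to /workflow/execute. The class, the step helper, and the application/json Content-Type are assumptions for illustration; the minimal JSON writer handles only string and number values and is not a substitute for a real JSON library.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper, not part of the project: builds a workflow body
// in the shape of the README example and a request that carries it.
class RpaWorkflowSketch {
    // Base URL as given in this README; adjust for your own host.
    static final String BASE_URL = "http://royaloak02.local:9100";

    // Creates one step; keyValues are alternating parameter names and values.
    static Map<String, Object> step(String action, Object... keyValues) {
        Map<String, Object> step = new LinkedHashMap<>(); // keeps insertion order
        step.put("action", action);
        for (int i = 0; i + 1 < keyValues.length; i += 2) {
            step.put((String) keyValues[i], keyValues[i + 1]);
        }
        return step;
    }

    // Serializes the steps as a JSON array of flat objects.
    static String toJson(List<Map<String, Object>> steps) {
        return steps.stream()
                .map(s -> s.entrySet().stream()
                        .map(e -> quote(e.getKey()) + ": " + value(e.getValue()))
                        .collect(Collectors.joining(", ", "{", "}")))
                .collect(Collectors.joining(", ", "[", "]"));
    }

    // Numbers are written bare; everything else is written as a string.
    static String value(Object v) {
        return (v instanceof Number) ? v.toString() : quote(String.valueOf(v));
    }

    // Escapes backslashes and double quotes only (enough for this sketch).
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // POST /workflow/execute; the JSON Content-Type is an assumption.
    static HttpRequest executeRequest(List<Map<String, Object>> steps) {
        return HttpRequest.newBuilder(URI.create(BASE_URL + "/workflow/execute"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(toJson(steps)))
                .build();
    }
}
```

Calling RpaWorkflowSketch.step("wait", "ms", 2000) produces the same object shape as the second step in the example workflow.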

Building

./gradlew build

Running

./gradlew bootRun

Or use the control script:

./control.sh start
./control.sh stop
./control.sh status
./control.sh log

Documentation

API.md - Complete API reference
IMPROVEMENTS.md - Planned improvements
SUGGESTIONS.md - Enhancement suggestions

Use Cases

Desktop automation
UI testing
Screen scraping
Workflow automation
AI-powered vision tasks
File management automation
Browser automation
