WinPilot Computer Use MCP

by omidmanoochehri


Python

Local


WinPilot is a Windows computer-use MCP server for Codex-style agents. It operates GUI applications the same way a human does:

screen capture
computer vision
OCR
mouse movement
keyboard input
window management

It intentionally avoids application APIs, browser automation APIs, plugins, extensions, and application integrations. Chrome, Photoshop, Elementor, file dialogs, installers, and desktop apps are all treated as pixels plus OS input.
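As an illustration of the "mouse movement" item above, the sketch below generates a human-looking cursor path: a cubic Bezier curve bent sideways by a random amount, sampled with smoothstep easing so the cursor starts slowly, speeds up, and settles at the target. This is a swapped-in technique chosen for illustration, not WinPilot's actual motion model; the function name `human_mouse_path` and its parameters are hypothetical.

```python
from __future__ import annotations

import math
import random


def human_mouse_path(
    start: tuple[int, int],
    end: tuple[int, int],
    steps: int = 40,
    curviness: float = 0.2,
    rng: random.Random | None = None,
) -> list[tuple[int, int]]:
    """Return cursor positions from start to end along a randomly bent cubic Bezier curve.

    Hypothetical helper for illustration; not part of WinPilot's documented API.
    """
    rng = rng if rng is not None else random.Random()
    steps = max(1, steps)
    (x0, y0), (x3, y3) = start, end
    dx, dy = x3 - x0, y3 - y0
    dist = math.hypot(dx, dy)
    if dist == 0:
        return [start]
    # Unit vector perpendicular to the straight line; control points are pushed along it.
    px, py = -dy / dist, dx / dist
    off1 = rng.uniform(-curviness, curviness) * dist
    off2 = rng.uniform(-curviness, curviness) * dist
    x1, y1 = x0 + dx / 3 + px * off1, y0 + dy / 3 + py * off1
    x2, y2 = x0 + 2 * dx / 3 + px * off2, y0 + 2 * dy / 3 + py * off2

    points: list[tuple[int, int]] = []
    for i in range(steps + 1):
        s = i / steps
        t = s * s * (3 - 2 * s)  # smoothstep easing: slow start, fast middle, slow finish
        u = 1 - t
        x = u**3 * x0 + 3 * u * u * t * x1 + 3 * u * t * t * x2 + t**3 * x3
        y = u**3 * y0 + 3 * u * u * t * y1 + 3 * u * t * t * y2 + t**3 * y3
        points.append((round(x), round(y)))
    return points
```

Each point would then be passed to an OS cursor call (for example the Win32 `SetCursorPos` function) with a short delay between points; the curve always starts and ends exactly on the requested coordinates.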

Status

This repository contains the first production-oriented implementation skeleton:

MCP tools for observation, element lookup, waiting, input, windows, screenshots, workflows, permissions, and task execution.
A vision pipeline with OCR, UI primitive detection, scrollable/dialog heuristics, screenshot diffing, annotated screenshots, and a semantic desktop model.
A guarded executor with per-action safety options, before/after screenshots, human-like mouse motion, keyboard entry, and structured logging.
Memory and workflow recording so successful layouts and demonstrations can be reused.

Optional advanced detectors such as PaddleOCR, YOLO, OmniParser, Florence2, and Grounding DINO are wired through extension points. The baseline works with local screenshots, OpenCV, Tesseract, and Windows input primitives.
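With the baseline OCR stack, a screen ends up as a list of recognized words with bounding boxes, so looking up an element by its visible text can reduce to fuzzy string matching that tolerates small OCR misreads. The sketch below assumes that box shape (the `OcrBox` type and `find_by_text` are hypothetical names, not WinPilot's API) and uses `difflib.SequenceMatcher` from the standard library for the similarity score.

```python
from __future__ import annotations

from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Iterable, Optional


@dataclass
class OcrBox:
    """One OCR result: recognized text plus its pixel bounding box (assumed shape)."""
    text: str
    x: int
    y: int
    w: int
    h: int

    @property
    def center(self) -> tuple[int, int]:
        # The point a click would target.
        return (self.x + self.w // 2, self.y + self.h // 2)


def find_by_text(boxes: Iterable[OcrBox], query: str, min_score: float = 0.8) -> Optional[OcrBox]:
    """Return the box whose text best matches query, or None if nothing is close enough."""
    q = query.casefold().strip()
    best: Optional[OcrBox] = None
    best_score = 0.0
    for box in boxes:
        t = box.text.casefold().strip()
        # Exact matches win outright; otherwise score by similarity ratio in [0, 1].
        score = 1.0 if t == q else SequenceMatcher(None, q, t).ratio()
        if score > best_score:
            best, best_score = box, score
    return best if best_score >= min_score else None
```

The `min_score` threshold trades robustness against false positives: a misread such as "Pub1ish" still matches "Publish", while unrelated labels fall below the cut-off.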


Install

python -m venv .venv
. .venv\Scripts\Activate.ps1
pip install -e ".[dev]"

Install Tesseract OCR separately and make sure tesseract.exe is on PATH.
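A quick way to confirm the PATH requirement from Python, using only the standard library. `shutil.which` respects `PATHEXT` on Windows, so looking up `tesseract` also finds `tesseract.exe`. The helper names are illustrative.

```python
from __future__ import annotations

import shutil
import subprocess


def tesseract_path() -> str | None:
    """Return the full path of the Tesseract executable, or None if it is not on PATH."""
    return shutil.which("tesseract")


def tesseract_version() -> str | None:
    """Return the first line of `tesseract --version`, or None if Tesseract is unavailable."""
    exe = tesseract_path()
    if exe is None:
        return None
    result = subprocess.run([exe, "--version"], capture_output=True, text=True, check=False)
    # Depending on the build, the version banner may be written to stdout or stderr.
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else None


if __name__ == "__main__":
    path = tesseract_path()
    print(f"tesseract: {path or 'NOT FOUND on PATH'}")
    if path:
        print(tesseract_version())
```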

Optional vision stack:

pip install -e ".[vision]"

Run MCP Server

win-pilot-mcp

Or:

python -m win_pilot_mcp.mcp.server

For a background/local HTTP MCP endpoint:

$env:WIN_PILOT_MCP_TRANSPORT="streamable-http"
$env:WIN_PILOT_MCP_HOST="127.0.0.1"
$env:WIN_PILOT_MCP_PORT="8765"
python -m win_pilot_mcp.mcp.server

Endpoint:

http://127.0.0.1:8765/mcp
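If you prefer to launch the HTTP endpoint from a Python script instead of PowerShell, a minimal sketch is to copy the current environment, add the three `WIN_PILOT_MCP_*` variables from the example, and start the module in the background. The default host and port below only mirror the example values (they are not confirmed as the server's built-in defaults), and the assumption that the `/mcp` path stays the same for other hosts and ports is also unverified.

```python
from __future__ import annotations

import os
import subprocess
import sys


def http_server_env(host: str = "127.0.0.1", port: int = 8765) -> dict[str, str]:
    """Copy the current environment and set the WIN_PILOT_MCP_* variables for HTTP transport."""
    env = dict(os.environ)
    env["WIN_PILOT_MCP_TRANSPORT"] = "streamable-http"
    env["WIN_PILOT_MCP_HOST"] = host
    env["WIN_PILOT_MCP_PORT"] = str(port)
    return env


def endpoint_url(host: str = "127.0.0.1", port: int = 8765) -> str:
    """Build the endpoint URL in the http://HOST:PORT/mcp shape shown in this README (assumed)."""
    return f"http://{host}:{port}/mcp"


def start_http_server(host: str = "127.0.0.1", port: int = 8765) -> subprocess.Popen:
    """Run `python -m win_pilot_mcp.mcp.server` in the background; requires the package installed."""
    return subprocess.Popen(
        [sys.executable, "-m", "win_pilot_mcp.mcp.server"],
        env=http_server_env(host, port),
    )
```

Usage: `proc = start_http_server()`, point the MCP client at `endpoint_url()`, and call `proc.terminate()` when done.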

Core Loop

Every task is executed as:

Observe the screen.
Think and choose one next action.
Act through mouse, keyboard, or window controls.
Re-observe.
Verify the result.
Retry or recover if needed.

The planner never executes long blind action sequences.
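The loop above can be sketched as a small driver that takes the observe, choose, act, and verify steps as callables. It is a sketch under assumptions, not WinPilot's planner: `run_task`, `Action`, and `StepResult` are hypothetical names, "recover" is reduced to retrying the same action up to `max_retries` times, and the loop gives up (so a recovery routine could take over) when retries are exhausted.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Action:
    kind: str          # e.g. "click", "type_text", "hotkey"
    target: str = ""   # free-form description of what the action targets


@dataclass
class StepResult:
    done: bool
    attempts: int
    history: list[str] = field(default_factory=list)


def run_task(
    observe: Callable[[], str],
    choose_action: Callable[[str], Optional[Action]],
    act: Callable[[Action], None],
    verify: Callable[[Action, str, str], bool],
    max_steps: int = 20,
    max_retries: int = 2,
) -> StepResult:
    """Observe, choose ONE action, act, re-observe, verify, and retry until done or stuck."""
    history: list[str] = []
    attempts = 0
    state = observe()
    for _ in range(max_steps):
        action = choose_action(state)
        if action is None:  # the planner reports the goal is reached
            return StepResult(True, attempts, history)
        for _ in range(max_retries + 1):
            attempts += 1
            act(action)              # exactly one action, never a blind batch
            new_state = observe()    # re-observe before deciding anything else
            ok = verify(action, state, new_state)
            history.append(f"{action.kind}:{action.target}:{'ok' if ok else 'failed'}")
            state = new_state
            if ok:
                break
        else:
            # Retries exhausted without a verified result: stop and let recovery take over.
            return StepResult(False, attempts, history)
    return StepResult(False, attempts, history)
```

Because every action is followed by a fresh observation and a verification, a failed step is detected immediately instead of after a long sequence of actions built on a wrong assumption.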

Safety Levels

Actions are classified into:

read_only
standard
full_control
dangerous
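One way to model these levels is an ordered enum plus a gate that compares an action's level with the current permission level. The strict ordering, the tool-to-level table, and the "unknown tools count as dangerous" rule are assumptions made for this sketch; WinPilot's real classification may differ.

```python
from enum import IntEnum


class PermissionLevel(IntEnum):
    """The four levels from this README, ordered from least to most privileged (assumed)."""
    READ_ONLY = 0
    STANDARD = 1
    FULL_CONTROL = 2
    DANGEROUS = 3

    @classmethod
    def parse(cls, name: str) -> "PermissionLevel":
        # Accepts the README spellings, e.g. "read_only" or "full_control".
        return cls[name.strip().upper()]


# Hypothetical classification of a few tools, for illustration only.
ACTION_LEVELS = {
    "capture_screen": PermissionLevel.READ_ONLY,
    "find_element": PermissionLevel.READ_ONLY,
    "click": PermissionLevel.STANDARD,
    "type_text": PermissionLevel.STANDARD,
}


def is_allowed(tool: str, current: PermissionLevel) -> bool:
    """Allow a tool only if its level does not exceed the current permission level."""
    # Unclassified tools are treated as dangerous so they stay blocked by default.
    required = ACTION_LEVELS.get(tool, PermissionLevel.DANGEROUS)
    return required <= current
```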

Each action accepts:

{
  "dryRun": false,
  "requireConfirmation": true,
  "takeScreenshotBefore": null,
  "takeScreenshotAfter": null,
  "verificationMode": "auto"
}

The server's default permission level is standard, which allows normal navigation and input but blocks dangerous actions until the permission level is raised. Screenshot verification defaults to auto: low-value actions such as mouse moves, scrolls, focus changes, key presses, hotkeys, and waits skip the before/after screenshots, while uncertain targets, text entry, clicks, drags, window changes, and full-control or dangerous actions still capture them for accuracy. To override this, set verificationMode to always, or set takeScreenshotBefore / takeScreenshotAfter explicitly to true or false.
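The auto rule can be read as a small decision function. The sketch below mirrors the JSON options above in a dataclass so a payload can be loaded directly. Three things are assumptions: the tool names standing in for "mouse move, scroll, focus, key press, hotkey, and wait", the precedence (explicit booleans override verificationMode), and treating any mode other than always like auto.

```python
from __future__ import annotations

import json
from dataclasses import dataclass
from typing import Optional

# Assumed tool names for the README's low-value actions.
LOW_VALUE_ACTIONS = {"move_mouse", "scroll", "focus_window", "press_key", "hotkey", "wait"}
ELEVATED_LEVELS = {"full_control", "dangerous"}


@dataclass
class ActionOptions:
    # Field names mirror the JSON keys so ActionOptions(**payload) works directly.
    dryRun: bool = False
    requireConfirmation: bool = True
    takeScreenshotBefore: Optional[bool] = None
    takeScreenshotAfter: Optional[bool] = None
    verificationMode: str = "auto"


def options_from_json(text: str) -> ActionOptions:
    return ActionOptions(**json.loads(text))


def screenshot_plan(
    tool: str, level: str, opts: ActionOptions, uncertain_target: bool = False
) -> tuple[bool, bool]:
    """Return (before, after) screenshot flags for one action."""
    if opts.verificationMode == "always":
        default = True
    else:
        # auto: skip only low-value actions with a certain target and a non-elevated level.
        default = uncertain_target or level in ELEVATED_LEVELS or tool not in LOW_VALUE_ACTIONS
    # Explicit booleans win over the mode (assumed precedence).
    before = default if opts.takeScreenshotBefore is None else opts.takeScreenshotBefore
    after = default if opts.takeScreenshotAfter is None else opts.takeScreenshotAfter
    return before, after
```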

MCP Tools

Representative tools:

analyze_screen
configure_optimization
clear_observation_cache
get_performance_stats
analyze_application
get_desktop_model
get_canvas_state
get_photoshop_state
get_elementor_state
get_browser_state
get_word_state
get_excel_state
get_powerpoint_state
get_vscode_state
get_illustrator_state
get_player_state
get_settings_state
list_supported_apps
get_shortcuts
run_app_shortcut
get_vision_providers
detect_objects
find_element
wait_for_element
wait_until_disappears
wait_until_stable
detect_state_changes
compare_screenshots
capture_screen
capture_region
create_annotated_screenshot
move_mouse, click, double_click, right_click, drag, drag_and_drop, draw_path
scroll
type_text, press_key, hotkey, hold_key, paste_text, select_all
list_windows, focus_window, maximize_window, resize_window
get_permission_level, set_permission_level
get_memory, remember_preference, remember_element, get_remembered_element
start_recording, stop_recording, list_workflows, learn_from_user, replay_workflow
read_logs
recover_from_unexpected_state
decide_next_action
plan_task
execute_task
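MCP clients invoke these tools with JSON-RPC 2.0 requests: `tools/list` returns each tool's name and input schema, and `tools/call` runs one tool with a `name` and an `arguments` object. The sketch below only builds the request bodies; a real client must first complete the MCP `initialize` handshake, which an MCP client SDK normally handles for you. The `{"text": "Publish"}` argument for `find_element` is hypothetical, so check the schema returned by `tools/list` for the real parameter names.

```python
from __future__ import annotations

import itertools
import json
from typing import Any

_request_ids = itertools.count(1)


def jsonrpc_request(method: str, params: dict[str, Any] | None = None) -> str:
    """Serialize one JSON-RPC 2.0 request with a fresh integer id."""
    message: dict[str, Any] = {"jsonrpc": "2.0", "id": next(_request_ids), "method": method}
    if params is not None:
        message["params"] = params
    return json.dumps(message)


def call_tool(name: str, arguments: dict[str, Any]) -> str:
    """Build an MCP tools/call request for one tool."""
    return jsonrpc_request("tools/call", {"name": name, "arguments": arguments})


list_request = jsonrpc_request("tools/list")
# Hypothetical argument name; use the input schema reported by tools/list.
find_request = call_tool("find_element", {"text": "Publish"})
```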

Feature Coverage

Screen understanding: OCR text, buttons, icons, toolbars, menus, tabs, dropdowns, checkboxes, radio buttons, inputs, dialogs, images, canvas areas, loading indicators, context menus, notifications, file pickers, scrollables, selected elements, and a semantic desktop model.
Element lookup: text, type, description, image template, color, position, and remembered locations.
Vision stack: Tesseract/PaddleOCR OCR, OpenCV primitives and similarity, YOLO adapter, and explicit provider hooks for OmniParser, Florence2, and Grounding DINO.
Input: human-like mouse movement, click variants, scroll, drag/drop, drawing paths, text typing, key presses, hotkeys, key holds, paste, and select-all.
Windows: list, active window, focus, move, resize, maximize, minimize, and close.
Screenshots: full screen, region, window, comparison, change detection, annotated captures, and stability waits.
Agent loop: observe, plan one step, act, verify, recover, and retry. plan_task exposes the planned steps; execute_task executes one action at a time with re-observation.
Recovery: detects loading, dialogs, crashes, visible errors, and authentication blocks, then recommends the next recovery action.
Photoshop and Elementor: screenshot/OCR-only semantic state helpers for panels, canvas, widgets, navigator, publish controls, layers/properties/export dialogs, active tool, and inferred document size.
Professional app profiles: Word, Excel, PowerPoint, VSCode, Illustrator, media players, Windows Settings, and browsers expose semantic state helpers plus shortcut maps for faster actions without relying on app APIs.
Shortcut-first control: common commands such as word bold, excel format_cells, powerpoint new_slide, vscode command_palette, illustrator pen_tool, player play_pause, and settings open_settings map to keyboard shortcuts before falling back to mouse/vision.
Memory and workflows: remembered elements, preferences, action logs, macro recording, human demonstration capture, workflow listing, and replay.
Safety: read_only, standard, full_control, and dangerous permission levels, plus dryRun, requireConfirmation, and before/after screenshots on mutating actions (controlled by verificationMode).
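Shortcut-first control amounts to a lookup table keyed by application and command, with the mouse/vision path as the fallback. The key combinations below are the standard Windows defaults for those applications (Ctrl+B, Ctrl+1, Ctrl+M, Ctrl+Shift+P, P, Win+I), not values read from WinPilot's own shortcut maps, and `run_command` with its two callbacks is a hypothetical interface.

```python
from typing import Callable, Dict, Tuple

# Standard Windows default shortcuts; WinPilot's real maps may contain more or different entries.
SHORTCUTS: Dict[Tuple[str, str], Tuple[str, ...]] = {
    ("word", "bold"): ("ctrl", "b"),
    ("excel", "format_cells"): ("ctrl", "1"),
    ("powerpoint", "new_slide"): ("ctrl", "m"),
    ("vscode", "command_palette"): ("ctrl", "shift", "p"),
    ("illustrator", "pen_tool"): ("p",),
    ("settings", "open_settings"): ("win", "i"),
}


def run_command(
    app: str,
    command: str,
    press_hotkey: Callable[..., None],
    vision_fallback: Callable[[str, str], None],
) -> str:
    """Try the keyboard shortcut first; fall back to mouse/vision only if none is known."""
    app_key = app.lower()
    keys = SHORTCUTS.get((app_key, command))
    if keys is not None:
        press_hotkey(*keys)
        return "shortcut"
    vision_fallback(app_key, command)
    return "vision"
```

Shortcuts are cheaper and more deterministic than locating a button on screen, which is why they are tried before the vision path.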

Project Layout

src/win_pilot_mcp/
  mcp/
  agent/
  vision/
  executor/
  planner/
  memory/
  tools/
  workflows/
  permissions/
  logs/
  screenshots/

Runtime artifacts are written to runtime/ by default.
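As an example of what a recorded workflow under runtime/ might look like, the sketch below stores a workflow as an ordered list of tool calls in JSON and replays it with a verification after every step, in keeping with the core loop. The `runtime/workflows` subdirectory, the file layout, and the `Workflow`/`replay` names are assumptions for illustration, not WinPilot's actual format.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Any, Callable

# Hypothetical location under the default runtime/ directory.
WORKFLOW_DIR = Path("runtime") / "workflows"


@dataclass
class Step:
    tool: str
    arguments: dict[str, Any]


@dataclass
class Workflow:
    name: str
    steps: list[Step] = field(default_factory=list)

    def record(self, tool: str, **arguments: Any) -> None:
        self.steps.append(Step(tool, arguments))

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2)

    @classmethod
    def from_json(cls, text: str) -> Workflow:
        data = json.loads(text)
        return cls(data["name"], [Step(**s) for s in data["steps"]])

    def save(self, directory: Path = WORKFLOW_DIR) -> Path:
        directory.mkdir(parents=True, exist_ok=True)
        path = directory / f"{self.name}.json"
        path.write_text(self.to_json(), encoding="utf-8")
        return path


def replay(
    workflow: Workflow,
    execute: Callable[[str, dict[str, Any]], None],
    verify: Callable[[Step], bool],
) -> int:
    """Run steps in order, verifying each; return how many succeeded before the first failure."""
    for index, step in enumerate(workflow.steps):
        execute(step.tool, step.arguments)
        if not verify(step):
            return index
    return len(workflow.steps)
```

Stopping at the first unverified step keeps a replay from continuing blindly when the screen no longer matches the recorded layout.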


