WinPilot Computer Use MCP
Provides tools to analyze and understand the state of Elementor (a WordPress page builder) via screenshot and OCR, enabling AI agents to interact with Elementor interfaces through pixel-based control.
WinPilot is a Windows computer-use MCP server for Codex-style agents. It operates GUI applications the same way a human does:
screen capture
computer vision
OCR
mouse movement
keyboard input
window management
It intentionally avoids application APIs, browser automation APIs, plugins, extensions, and application integrations. Chrome, Photoshop, Elementor, file dialogs, installers, and desktop apps are all treated as pixels plus OS input.
Status
This repository contains the first production-oriented implementation skeleton:
MCP tools for observation, element lookup, waiting, input, windows, screenshots, workflows, permissions, and task execution.
A vision pipeline with OCR, UI primitive detection, scrollable/dialog heuristics, screenshot diffing, annotated screenshots, and a semantic desktop model.
A guarded executor with per-action safety options, before/after screenshots, human-like mouse motion, keyboard entry, and structured logging.
Memory and workflow recording so successful layouts and demonstrations can be reused.
Optional advanced detectors such as PaddleOCR, YOLO, OmniParser, Florence2, and Grounding DINO are wired through extension points. The baseline works with local screenshots, OpenCV, Tesseract, and Windows input primitives.
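One way to picture "wired through extension points" is a small provider registry that falls back to the baseline when an optional detector is not installed. This is only a sketch under stated assumptions: `register_provider`, `detect`, the provider names, and the result shape are hypothetical and are not taken from the WinPilot code base.

```python
from typing import Callable, Dict, List

# A detector takes raw image bytes and returns a list of detections.
# The dict-based result shape is an assumption made for illustration.
Detector = Callable[[bytes], List[dict]]

_PROVIDERS: Dict[str, Detector] = {}


def register_provider(name: str) -> Callable[[Detector], Detector]:
    """Decorator that registers a detector under a provider name."""
    def decorator(fn: Detector) -> Detector:
        _PROVIDERS[name] = fn
        return fn
    return decorator


@register_provider("baseline")
def baseline_detector(image: bytes) -> List[dict]:
    # Stand-in for the OpenCV + Tesseract baseline; no real vision happens here.
    return [{"source": "baseline", "bytes_seen": len(image)}]


def detect(image: bytes, preferred: str = "baseline") -> List[dict]:
    """Use the preferred provider if registered, otherwise fall back to baseline."""
    detector = _PROVIDERS.get(preferred, _PROVIDERS["baseline"])
    return detector(image)


# "yolo" is not registered in this sketch, so the baseline handles the call.
print(detect(b"fake-image-bytes", preferred="yolo"))
```

An optional stack would register itself the same way, so the rest of the pipeline never needs to know which detector produced a result.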
Install
python -m venv .venv
. .venv\Scripts\Activate.ps1
pip install -e ".[dev]"
Install Tesseract OCR separately and make sure tesseract.exe is on PATH.
Optional vision stack:
pip install -e ".[vision]"
Run MCP Server
win-pilot-mcp
Or:
python -m win_pilot_mcp.mcp.server
For a background/local HTTP MCP endpoint:
$env:WIN_PILOT_MCP_TRANSPORT="streamable-http"
$env:WIN_PILOT_MCP_HOST="127.0.0.1"
$env:WIN_PILOT_MCP_PORT="8765"
python -m win_pilot_mcp.mcp.server
Endpoint:
http://127.0.0.1:8765/mcp
Core Loop
Every task is executed as:
Observe the screen.
Think and choose one next action.
Act through mouse, keyboard, or window controls.
Re-observe.
Verify the result.
Retry or recover if needed.
The planner never executes long blind action sequences.
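The loop above can be sketched in a few lines of Python. This is an illustration of the control flow only, not WinPilot's planner: `run_task`, `Action`, the callback signatures, and the step and retry budgets are all assumptions, and a single dictionary flag stands in for real screen state.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Action:
    """One proposed step; `kind` and `target` are illustrative fields."""
    kind: str
    target: str


def run_task(
    observe: Callable[[], dict],
    choose_action: Callable[[dict], Optional[Action]],
    act: Callable[[Action], None],
    verify: Callable[[dict, dict, Action], bool],
    max_steps: int = 20,
    max_retries: int = 2,
) -> bool:
    """Observe, pick one action, act, re-observe, verify; never batch actions.

    Returns True when choose_action returns None (task done) and False when
    the step budget or the consecutive-retry budget runs out.
    """
    retries = 0
    for _ in range(max_steps):
        before = observe()                    # observe the screen
        action = choose_action(before)        # think: exactly one next action
        if action is None:
            return True
        act(action)                           # act
        after = observe()                     # re-observe
        if verify(before, after, action):     # verify the result
            retries = 0
            continue
        retries += 1                          # retry, or give up
        if retries > max_retries:
            return False
    return False


# Simulated desktop: one flag stands in for what a screenshot would show.
screen = {"calculator_open": False}


def observe() -> dict:
    return dict(screen)


def choose_action(obs: dict) -> Optional[Action]:
    return None if obs["calculator_open"] else Action("click", "Calculator")


def act(action: Action) -> None:
    screen["calculator_open"] = True


def verify(before: dict, after: dict, action: Action) -> bool:
    return after["calculator_open"]


print(run_task(observe, choose_action, act, verify))
```

Because each iteration starts from a fresh observation, a failed or misdirected action is noticed before the next one is chosen.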
Safety Levels
Actions are classified into:
read_only
standard
full_control
dangerous
Each action accepts:
{
  "dryRun": false,
  "requireConfirmation": true,
  "takeScreenshotBefore": null,
  "takeScreenshotAfter": null,
  "verificationMode": "auto"
}
The server defaults to standard, which allows normal navigation and input but blocks dangerous
actions unless the permission level is raised. Screenshot verification defaults to auto:
low-value actions such as mouse move, scroll, focus, key press, hotkey, and wait skip before/after
screenshots, while uncertain targets, text entry, clicks, drags, window changes, full-control
actions, and dangerous actions still capture screenshots for accuracy. To override, set
verificationMode to always or pass explicit takeScreenshotBefore / takeScreenshotAfter booleans.
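The screenshot rules described above can be read as a small decision function. The sketch below is an interpretation, not the server's code: `resolve_screenshots`, its parameter names, and the action labels in `LOW_VALUE_ACTIONS` are hypothetical, and it assumes that explicit booleans take precedence over `verificationMode`.

```python
from typing import Optional, Tuple

# Illustrative labels for the "low-value" categories named in the text.
LOW_VALUE_ACTIONS = {"move_mouse", "scroll", "focus", "press_key", "hotkey", "wait"}
# Safety levels that always keep screenshots in auto mode.
ELEVATED_LEVELS = {"full_control", "dangerous"}


def resolve_screenshots(
    action: str,
    safety_level: str = "standard",
    target_uncertain: bool = False,
    verification_mode: str = "auto",
    take_before: Optional[bool] = None,
    take_after: Optional[bool] = None,
) -> Tuple[bool, bool]:
    """Return (capture_before, capture_after) for one action."""
    if verification_mode == "always":
        default = True
    else:
        # auto: skip only for low-value actions with a certain target
        # at a non-elevated safety level.
        skip = (
            action in LOW_VALUE_ACTIONS
            and safety_level not in ELEVATED_LEVELS
            and not target_uncertain
        )
        default = not skip
    # Assumption: an explicit boolean wins over the mode-derived default.
    before = default if take_before is None else take_before
    after = default if take_after is None else take_after
    return before, after


print(resolve_screenshots("scroll"))
print(resolve_screenshots("click"))
print(resolve_screenshots("scroll", take_after=True))
```

Leaving the two booleans as `null` in the request therefore means "let the mode decide", which matches the `null` defaults in the options object.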
MCP Tools
Representative tools:
analyze_screen
configure_optimization
clear_observation_cache
get_performance_stats
analyze_application
get_desktop_model
get_canvas_state
get_photoshop_state
get_elementor_state
get_browser_state
get_vision_providers
detect_objects
find_element
wait_for_element
wait_until_disappears
wait_until_stable
detect_state_changes
compare_screenshots
capture_screen
capture_region
create_annotated_screenshot
move_mouse, click, double_click, right_click, drag, drag_and_drop, draw_path
scroll
type_text, press_key, hotkey, hold_key, paste_text, select_all
list_windows, focus_window, maximize_window, resize_window
get_permission_level, set_permission_level
get_memory, remember_preference, remember_element, get_remembered_element
start_recording, stop_recording, list_workflows, learn_from_user, replay_workflow
read_logs
recover_from_unexpected_state
decide_next_action
plan_task
execute_task
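An MCP client invokes any of these tools with a JSON-RPC `tools/call` request. The sketch below builds such a request with the standard library only. The envelope (`jsonrpc`, `method`, `params.name`, `params.arguments`) follows the MCP specification; everything tool-specific is an assumption: the `text` argument for `type_text` is hypothetical, and passing the safety options flat alongside the tool arguments is a guess about how this server expects them.

```python
import json
from itertools import count

_request_ids = count(1)

# Option names and values mirror the per-action options object in Safety Levels.
SAFETY_OPTIONS = {
    "dryRun": False,
    "requireConfirmation": True,
    "takeScreenshotBefore": None,
    "takeScreenshotAfter": None,
    "verificationMode": "auto",
}


def build_tool_call(name: str, arguments: dict, **safety) -> str:
    """Serialize one MCP tools/call request as a JSON string."""
    unknown = set(safety) - set(SAFETY_OPTIONS)
    if unknown:
        raise ValueError(f"unknown safety options: {sorted(unknown)}")
    options = {**SAFETY_OPTIONS, **safety}
    request = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        # Assumption: safety options travel in the same object as tool arguments.
        "params": {"name": name, "arguments": {**arguments, **options}},
    }
    return json.dumps(request, indent=2)


# Preview a text-entry action without executing it.
print(build_tool_call("type_text", {"text": "hello"}, dryRun=True))
```

In practice an MCP client library would send this over stdio or the HTTP endpoint; the point here is only the shape of the call.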
Feature Coverage
Screen understanding: OCR text, buttons, icons, toolbars, menus, tabs, dropdowns, checkboxes, radio buttons, inputs, dialogs, images, canvas areas, loading indicators, context menus, notifications, file pickers, scrollables, selected elements, and a semantic desktop model.
Element lookup: text, type, description, image template, color, position, and remembered locations.
Vision stack: Tesseract/PaddleOCR OCR, OpenCV primitives and similarity, YOLO adapter, and explicit provider hooks for OmniParser, Florence2, and Grounding DINO.
Input: human-like mouse movement, click variants, scroll, drag/drop, drawing paths, text typing, key presses, hotkeys, key holds, paste, and select-all.
Windows: list, active window, focus, move, resize, maximize, minimize, and close.
Screenshots: full screen, region, window, comparison, change detection, annotated captures, and stability waits.
Agent loop: observe, plan one step, act, verify, recover, and retry. plan_task exposes the planned steps; execute_task executes one action at a time with re-observation.
Recovery: detects loading, dialogs, crashes, visible errors, and authentication blocks, then recommends the next recovery action.
Photoshop and Elementor: screenshot/OCR-only semantic state helpers for panels, canvas, widgets, navigator, publish controls, layers/properties/export dialogs, active tool, and inferred document size.
Memory and workflows: remembered elements, preferences, action logs, macro recording, human demonstration capture, workflow listing, and replay.
Safety: read-only, standard, full-control, and dangerous permission levels, plus dryRun, requireConfirmation, and before/after screenshots on mutating actions.
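The safety controls in the last item can be sketched as a gate that runs before any input is sent. This is a minimal illustration under assumptions: `PermissionLevel`, `gate`, and the returned strings are hypothetical, and the strict ordering read_only < standard < full_control < dangerous is inferred from the level names rather than confirmed by the source.

```python
from enum import IntEnum


class PermissionLevel(IntEnum):
    # Assumed ordering: each level permits everything below it.
    READ_ONLY = 0
    STANDARD = 1
    FULL_CONTROL = 2
    DANGEROUS = 3


def gate(
    action_level: PermissionLevel,
    server_level: PermissionLevel = PermissionLevel.STANDARD,
    dry_run: bool = False,
    require_confirmation: bool = False,
    confirmed: bool = False,
) -> str:
    """Decide what happens to one action before any input reaches the OS."""
    if action_level > server_level:
        return "blocked"              # permission level must be raised first
    if dry_run:
        return "dry_run"              # report the plan, send no input
    if require_confirmation and not confirmed:
        return "needs_confirmation"   # wait for an explicit go-ahead
    return "execute"


print(gate(PermissionLevel.DANGEROUS))
print(gate(PermissionLevel.STANDARD, dry_run=True))
print(gate(PermissionLevel.STANDARD, require_confirmation=True))
```

Checking the permission level first means a dry run of a blocked action still reports it as blocked, which keeps previews honest about what would actually be allowed.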
Project Layout
src/win_pilot_mcp/
mcp/
agent/
vision/
executor/
planner/
memory/
tools/
workflows/
permissions/
logs/
screenshots/
Runtime artifacts are written to runtime/ by default.