windows-gui-mcp
Windows GUI Automation MCP server for AI coding agents.
windows-gui-mcp helps agents operate Windows desktop applications through
semantic UI Automation instead of brittle coordinate clicks. It is designed for
agent workflows that need to inspect a live Windows UI, act on stable
identifiers, verify every action, and turn successful sessions into reusable
scripts.
Why this exists
AI agents can work reliably with web pages because browsers expose structured DOM state. Windows desktop applications are harder: the visible UI is often stateful, asynchronous, and easy to break with raw coordinates.
This project exposes a small MCP toolset that keeps the agent in a safer loop:
1. Discover visible windows.
2. Focus the target window.
3. Dump the UI Automation tree.
4. Find controls by stable identifiers.
5. Act with post-action verification.
6. Use OCR or image fallback only after semantic lookup fails.
7. Generate a pywinauto replay script from the trace.
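
The ordering in this loop, semantic lookup first and OCR or image matching only afterwards, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the server's implementation: `locate_with_ladder`, `semantic_lookup`, and `ocr_lookup` are hypothetical names, and each lookup is assumed to return a handle string or `None`.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical type: a lookup takes a spec dict and returns a handle or None.
Lookup = Callable[[dict], Optional[str]]


def locate_with_ladder(
    spec: dict, ladder: Sequence[Tuple[str, Lookup]]
) -> Tuple[str, str]:
    """Try each lookup in order and return (strategy_name, handle).

    The ladder is ordered from most to least stable, so a fallback is
    only consulted after every earlier strategy returned None.
    """
    for name, lookup in ladder:
        handle = lookup(spec)
        if handle is not None:
            return name, handle
    raise LookupError(f"no strategy could locate {spec!r}")


# Stand-in lookups for illustration only; real ones would query the live UI.
def semantic_lookup(spec: dict) -> Optional[str]:
    return "uia:Save" if spec.get("name") == "Save" else None


def ocr_lookup(spec: dict) -> Optional[str]:
    return f"ocr:{spec['text']}" if "text" in spec else None


LADDER = [("semantic", semantic_lookup), ("ocr", ocr_lookup)]
```

Returning the strategy name alongside the handle lets a caller see when a fallback was used, which is useful if fallback hits should be treated as less trustworthy.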
Tooling model
AI coding agent
  |
  | MCP stdio
  v
windows_gui_mcp.server
  |
  v
tools/dispatch + trace recorder
  |
  +-- window / element / input / verify / wait
  +-- screenshot / OCR / fallback / trace-to-script
  |
  v
Windows backend ladder
  |
  +-- pywinauto UIA: first choice
  +-- pywinauto win32: legacy fallback
  +-- pyautogui image/coordinate: last resort
MCP tools
Tool | Purpose |
list_windows | Enumerate visible top-level windows. |
focus_window | Bring a title-matching window to the foreground and verify focus. |
dump_ui_tree | Dump the UIA tree so the agent can choose stable identifiers. |
find_element | Locate one control by
click_element | Click a semantically identified control and verify the post-condition. |
type_text | Type into a target control and optionally verify the value. |
hotkey | Send a pywinauto-style key chord such as
| Capture the screen, a window, or a region. |
| Wait for a control to exist, become visible, or become enabled. |
| Verify text through UIA first, OCR only when requested. |
| Last-resort click by image template or OCR anchor. |
generate_stable_script_from_trace | Convert the current trace into a pywinauto replay script. |
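
The "tools/dispatch + trace recorder" layer from the tooling model could look roughly like the sketch below. Everything here is an assumption made for illustration: `TraceRecorder`, `dispatch`, the trace entry fields, and the stub tool are hypothetical and not taken from the project's source.

```python
import time
from typing import Any, Callable, Dict, List


class TraceRecorder:
    """Collects one entry per tool call so a session can be replayed later."""

    def __init__(self) -> None:
        self.entries: List[Dict[str, Any]] = []

    def record(self, tool: str, args: Dict[str, Any], ok: bool) -> None:
        self.entries.append(
            {"tool": tool, "args": args, "ok": ok, "at": time.time()}
        )


def dispatch(
    tools: Dict[str, Callable[..., Any]],
    recorder: TraceRecorder,
    tool: str,
    /,
    **args: Any,
) -> Any:
    """Look up a tool by name, run it, and record the outcome either way."""
    if tool not in tools:
        raise KeyError(f"unknown tool: {tool}")
    try:
        result = tools[tool](**args)
    except Exception:
        recorder.record(tool, args, ok=False)
        raise
    recorder.record(tool, args, ok=True)
    return result


# Stub tool for illustration; a real tool would talk to the desktop.
def fake_list_windows() -> List[str]:
    return ["Untitled - Notepad", "Calculator"]


recorder = TraceRecorder()
windows = dispatch({"list_windows": fake_list_windows}, recorder, "list_windows")
```

Recording failures as well as successes keeps the trace honest: a script generator can then skip steps that never passed verification.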
Install
Python 3.12 or newer is required.
For normal Windows agent use:
py -3.12 -m venv .venv
.\.venv\Scripts\python -m pip install --upgrade pip
.\.venv\Scripts\python -m pip install "windows-gui-mcp[windows,ocr]"
For local development from this repository:
python -m venv .venv
./.venv/bin/python -m pip install --upgrade pip
./.venv/bin/python -m pip install -e ".[dev]"
On Windows, install the optional runtime extras when you want live GUI control:
.\.venv\Scripts\python -m pip install -e ".[dev,windows,ocr]"
OCR support is optional. If you use Tesseract OCR, install the Windows package separately and make sure tesseract.exe is on PATH.
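
One quick way to confirm that tesseract.exe is reachable is to ask Python to resolve it on PATH. The helper name `find_tesseract` is hypothetical; `shutil.which` is part of the standard library and, on Windows, also tries the extensions listed in PATHEXT.

```python
import shutil
from typing import Optional


def find_tesseract(executable: str = "tesseract") -> Optional[str]:
    """Return the full path to the Tesseract binary, or None if not on PATH."""
    return shutil.which(executable)


if __name__ == "__main__":
    path = find_tesseract()
    if path is None:
        print("tesseract not found on PATH; OCR features may not work")
    else:
        print(f"tesseract found at {path}")
```

Run it with the same interpreter as the virtual environment so that the PATH it sees matches the one the server will see.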
Run
Start the MCP server on the Windows machine that owns the desktop session:
windows-gui-mcp
Check CLI metadata without starting the MCP stdio transport:
windows-gui-mcp --help
windows-gui-mcp --version
Example local MCP client config:
{
  "mcpServers": {
    "windows-gui": {
      "command": "windows-gui-mcp"
    }
  }
}
Example SSH-based config from another machine:
{
  "mcpServers": {
    "windows-gui": {
      "command": "ssh",
      "args": [
        "user@windows-host",
        "C:\\path\\to\\windows-gui-mcp\\.venv\\Scripts\\windows-gui-mcp.exe"
      ]
    }
  }
}
Example workflow
This is the intended agent loop for a Notepad or Calculator task:
1. list_windows()
2. focus_window(title_regex="Notepad|Calculator")
3. dump_ui_tree(window_handle=...)
4. find_element(spec={"name": "Save", "control_type": "Button"})
5. click_element(
       spec={"name": "Save", "control_type": "Button"},
       expect_element_after={"class_name": "#32770"}
   )
6. type_text(
       spec={"automation_id": "1001"},
       text="agent-notes.txt",
       verify_value_contains="agent-notes.txt"
   )
7. hotkey("%{ENTER}")
8. generate_stable_script_from_trace()
See examples/notepad_calculator.md for a longer walkthrough.
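
To make the last step more concrete, here is a rough sketch of how a recorded trace might be turned into pywinauto source code. The trace entry format, the `keys` argument name, the spec-to-keyword mapping, and the function `trace_to_script` are all assumptions for illustration; the project's actual generator may emit different code. The pywinauto calls that appear in the generated text (`Application(backend=...).connect`, `child_window`, `click_input`, `set_edit_text`, `send_keys`) do exist in pywinauto.

```python
from typing import Any, Dict, List

# Assumed mapping from the spec keys used in this README to
# pywinauto child_window() keyword arguments.
SPEC_TO_KWARG = {
    "automation_id": "auto_id",
    "name": "title",
    "control_type": "control_type",
    "class_name": "class_name",
}


def _selector(spec: Dict[str, str]) -> str:
    """Render a spec dict as child_window(...) keyword arguments."""
    return ", ".join(f"{SPEC_TO_KWARG[k]}={v!r}" for k, v in spec.items())


def trace_to_script(title_regex: str, trace: List[Dict[str, Any]]) -> str:
    """Emit pywinauto source code that replays the successful trace steps."""
    lines = [
        "from pywinauto import Application",
        "from pywinauto.keyboard import send_keys",
        "",
        f"app = Application(backend='uia').connect(title_re={title_regex!r})",
        f"win = app.window(title_re={title_regex!r})",
    ]
    for step in trace:
        if not step.get("ok", True):
            continue  # skip steps whose verification failed
        tool, args = step["tool"], step["args"]
        if tool == "click_element":
            lines.append(
                f"win.child_window({_selector(args['spec'])}).click_input()"
            )
        elif tool == "type_text":
            lines.append(
                f"win.child_window({_selector(args['spec'])})"
                f".set_edit_text({args['text']!r})"
            )
        elif tool == "hotkey":
            lines.append(f"send_keys({args['keys']!r})")
    return "\n".join(lines) + "\n"


TRACE = [
    {
        "tool": "click_element",
        "args": {"spec": {"name": "Save", "control_type": "Button"}},
    },
    {
        "tool": "type_text",
        "args": {"spec": {"automation_id": "1001"}, "text": "agent-notes.txt"},
    },
    {"tool": "hotkey", "args": {"keys": "%{ENTER}"}},
]
print(trace_to_script(".*Notepad.*", TRACE))
```

The generator only builds text, so it runs anywhere; the emitted script itself needs pywinauto and a live Windows desktop session.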
Safety rules
Prefer automation_id, then name, then control_type, then class_name.
Do not start with screen coordinates.
Verify every click or text entry with a concrete post-condition.
Re-dump the UI tree after a failed verification instead of retrying blindly.
Treat OCR and image matching as fallbacks, not the primary automation path.
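
Two of these rules translate naturally into code: the identifier priority order, and re-reading the UI before any retry. The sketch below is illustrative only; `best_identifier` and `act_and_verify` are hypothetical helpers, and `plan` stands in for "dump the UI tree and choose an action from the fresh state".

```python
from typing import Callable, Dict, Optional

# Priority order taken from the first safety rule.
IDENTIFIER_PRIORITY = ("automation_id", "name", "control_type", "class_name")


def best_identifier(properties: Dict[str, str]) -> Optional[str]:
    """Return the highest-priority identifier key that has a non-empty value."""
    for key in IDENTIFIER_PRIORITY:
        if properties.get(key):
            return key
    return None


def act_and_verify(
    plan: Callable[[], Callable[[], None]],
    verify: Callable[[], bool],
    max_attempts: int = 2,
) -> bool:
    """Plan from a fresh UI dump, act, then check a concrete post-condition.

    plan() is called before every attempt, so a retry is always based on
    the current UI state rather than on a stale earlier dump.
    """
    for _ in range(max_attempts):
        action = plan()  # re-dump the UI tree and choose the action afresh
        action()
        if verify():
            return True
    return False
```

Capping the number of attempts keeps a failing step from looping forever; a caller that gets `False` back can stop and report what the last dump looked like.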
Development checks
python -m compileall -q src tests
python -m pytest -q
ruff check .
python -m build
twine check dist/*
Contributing and security
See CONTRIBUTING.md for development workflow and automation design rules. See SECURITY.md for vulnerability reporting and desktop automation safety expectations.
License
MIT