
windows-gui-mcp

by dcl632


Windows GUI Automation MCP server for AI coding agents.

windows-gui-mcp helps agents operate Windows desktop applications through semantic UI Automation instead of brittle coordinate clicks. It is designed for agent workflows that need to inspect a live Windows UI, act on stable identifiers, verify every action, and turn successful sessions into reusable scripts.

Why this exists

AI agents can work reliably with web pages because browsers expose structured DOM state. Windows desktop applications are harder: the UI is often stateful and asynchronous, and automation built on raw screen coordinates breaks easily.

This project exposes a small MCP toolset that keeps the agent in a safer loop:

  1. Discover visible windows.

  2. Focus the target window.

  3. Dump the UI Automation tree.

  4. Find controls by stable identifiers.

  5. Act with post-action verification.

  6. Use OCR or image fallback only after semantic lookup fails.

  7. Generate a pywinauto replay script from the trace.
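The core of this loop is "act once, then check a concrete post-condition." The following is a minimal Python sketch of that step. It assumes the action and the check are plain callables; `act_and_verify` is a hypothetical helper, not part of this server's API.

```python
import time
from typing import Callable


def act_and_verify(
    action: Callable[[], object],
    postcondition: Callable[[], bool],
    timeout: float = 5.0,
    interval: float = 0.25,
) -> bool:
    """Run one UI action once, then poll a concrete post-condition.

    Returns True as soon as the post-condition holds, or False if it still
    fails after `timeout` seconds. The action is never repeated here: on
    False the caller should re-dump the UI tree and re-plan instead.
    """
    action()
    deadline = time.monotonic() + timeout
    while True:
        if postcondition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Usage with a fake UI state standing in for a real window.
ui = {"save_dialog_open": False}
opened = act_and_verify(
    action=lambda: ui.update(save_dialog_open=True),
    postcondition=lambda: ui["save_dialog_open"],
    timeout=1.0,
)
print(opened)  # True
```

Returning False rather than retrying keeps the "re-dump instead of retrying blindly" rule in the caller's hands.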

Tooling model

AI coding agent
      |
      | MCP stdio
      v
windows_gui_mcp.server
      |
      v
tools/dispatch + trace recorder
      |
      +-- window / element / input / verify / wait
      +-- screenshot / OCR / fallback / trace-to-script
      |
      v
Windows backend ladder
      |
      +-- pywinauto UIA      first choice
      +-- pywinauto win32    legacy fallback
      +-- pyautogui          image/coordinate last resort
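The backend ladder amounts to "try the most semantic backend first and fall through only on a miss." Below is a sketch of that idea. It assumes each backend is wrapped as a finder callable. `ElementNotFound` and `resolve_with_ladder` are illustrative names, and the real pywinauto and pyautogui calls are stubbed out.

```python
from typing import Callable, Sequence


class ElementNotFound(LookupError):
    """Raised by a backend finder that cannot resolve a control spec."""


Backend = tuple[str, Callable[[dict], object]]


def resolve_with_ladder(spec: dict, backends: Sequence[Backend]) -> tuple[str, object]:
    """Try each (name, finder) pair in priority order and return the first hit.

    A finder raises ElementNotFound when it cannot locate the control,
    which moves resolution one rung down the ladder.
    """
    errors: list[str] = []
    for name, finder in backends:
        try:
            return name, finder(spec)
        except ElementNotFound as exc:
            errors.append(f"{name}: {exc}")
    raise ElementNotFound("; ".join(errors) or "no backends configured")


# Stub finders standing in for pywinauto UIA and win32 lookups.
def uia_finder(spec: dict) -> object:
    raise ElementNotFound("no UIA match")


def win32_finder(spec: dict) -> object:
    return f"hwnd-for-{spec['name']}"


print(resolve_with_ladder({"name": "Save"}, [("uia", uia_finder), ("win32", win32_finder)]))
# ('win32', 'hwnd-for-Save')
```

Only ElementNotFound triggers a fallback. Any other exception still surfaces, so a genuine backend bug is not hidden behind the next rung.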

MCP tools

| Tool | Purpose |
| --- | --- |
| list_windows | Enumerate visible top-level windows. |
| focus_window | Bring a title-matching window to the foreground and verify focus. |
| dump_ui_tree | Dump the UIA tree so the agent can choose stable identifiers. |
| find_element | Locate one control by automation_id, name, control_type, or class_name. |
| click_element | Click a semantically identified control and verify the post-condition. |
| type_text | Type into a target control and optionally verify the value. |
| hotkey | Send a pywinauto-style key chord such as ^s or %{F4}. |
| screenshot | Capture the screen, a window, or a region. |
| wait_until_element | Wait for a control to exist, become visible, or become enabled. |
| verify_text_exists | Verify text through UIA first, OCR only when requested. |
| fallback_click_by_image_or_ocr | Last-resort click by image template or OCR anchor. |
| generate_stable_script_from_trace | Convert the current trace into a pywinauto replay script. |
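find_element takes a spec built from the four identifier fields. One way to resolve such a spec against a dumped tree is sketched below. It assumes a hypothetical dump format of nested dicts with a `children` list, which may differ from what dump_ui_tree actually returns. `match_spec` is an illustrative name, not the server's implementation.

```python
from collections.abc import Iterator

SPEC_KEYS = ("automation_id", "name", "control_type", "class_name")


def walk(node: dict) -> Iterator[dict]:
    """Yield a node and all of its descendants, depth first."""
    yield node
    for child in node.get("children", []):
        yield from walk(child)


def match_spec(tree: dict, spec: dict) -> dict:
    """Return the single node whose properties equal every value in spec."""
    unknown = set(spec) - set(SPEC_KEYS)
    if unknown:
        raise ValueError(f"unsupported spec keys: {sorted(unknown)}")
    matches = [n for n in walk(tree) if all(n.get(k) == v for k, v in spec.items())]
    if len(matches) != 1:
        # Ambiguity is an error: acting on the wrong twin is worse than stopping.
        raise LookupError(f"expected one match for {spec}, got {len(matches)}")
    return matches[0]


# Hypothetical dump of a window with two buttons.
tree = {
    "name": "Untitled - Notepad",
    "control_type": "Window",
    "children": [
        {"name": "Save", "control_type": "Button", "automation_id": "btnSave"},
        {"name": "Cancel", "control_type": "Button", "automation_id": "btnCancel"},
    ],
}
print(match_spec(tree, {"name": "Save", "control_type": "Button"})["automation_id"])
# btnSave
```

Requiring exactly one match reflects the "locate one control" contract. A spec like `{"control_type": "Button"}` is rejected here instead of silently picking the first button.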

Install

Python 3.12 or newer is required.

For normal Windows agent use:

py -3.12 -m venv .venv
.\.venv\Scripts\python -m pip install --upgrade pip
.\.venv\Scripts\python -m pip install "windows-gui-mcp[windows,ocr]"

For local development from this repository:

python -m venv .venv
./.venv/bin/python -m pip install --upgrade pip
./.venv/bin/python -m pip install -e ".[dev]"

On Windows, install the optional runtime extras when you want live GUI control:

.\.venv\Scripts\python -m pip install -e ".[dev,windows,ocr]"

OCR support is optional. If you use Tesseract OCR, install the Tesseract engine for Windows separately and make sure tesseract.exe is on PATH.

Run

Start the MCP server on the Windows machine that owns the desktop session:

windows-gui-mcp

Check CLI metadata without starting the MCP stdio transport:

windows-gui-mcp --help
windows-gui-mcp --version

Example local MCP client config:

{
  "mcpServers": {
    "windows-gui": {
      "command": "windows-gui-mcp"
    }
  }
}

Example SSH-based config from another machine:

{
  "mcpServers": {
    "windows-gui": {
      "command": "ssh",
      "args": [
        "user@windows-host",
        "C:\\path\\to\\windows-gui-mcp\\.venv\\Scripts\\windows-gui-mcp.exe"
      ]
    }
  }
}
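The doubled backslashes in the `args` path are JSON string escaping, not part of the Windows path. If you generate the config with Python's `json` module, the escaping is handled for you. In this sketch, the host and install path are the same placeholders as above.

```python
import json

# Placeholder host and install path; substitute your own.
exe = r"C:\path\to\windows-gui-mcp\.venv\Scripts\windows-gui-mcp.exe"

config = {
    "mcpServers": {
        "windows-gui": {
            "command": "ssh",
            "args": ["user@windows-host", exe],
        }
    }
}

# json.dumps writes each backslash as "\\", matching the hand-written config.
text = json.dumps(config, indent=2)
print(text)

# Round-tripping recovers the original single-backslash path.
assert json.loads(text)["mcpServers"]["windows-gui"]["args"][1] == exe
```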

Example workflow

This is the intended agent loop for a Notepad or Calculator task:

1. list_windows()
2. focus_window(title_regex="Notepad|Calculator")
3. dump_ui_tree(window_handle=...)
4. find_element(spec={"name": "Save", "control_type": "Button"})
5. click_element(
     spec={"name": "Save", "control_type": "Button"},
     expect_element_after={"class_name": "#32770"}
   )
6. type_text(
     spec={"automation_id": "1001"},
     text="agent-notes.txt",
     verify_value_contains="agent-notes.txt"
   )
7. hotkey("%{ENTER}")
8. generate_stable_script_from_trace()
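Step 8 turns the recorded calls into a pywinauto replay script. This README does not document the server's trace format or generated output, so the following is only a sketch of the idea. It assumes a hypothetical trace of dicts and maps `automation_id` and `name` to pywinauto's `auto_id` and `title` criteria for `child_window()`. `trace_to_script` and `locator` are illustrative names.

```python
# Maps this README's spec keys to pywinauto child_window() criteria.
KEY_MAP = {
    "automation_id": "auto_id",
    "name": "title",
    "control_type": "control_type",
    "class_name": "class_name",
}


def locator(spec: dict) -> str:
    """Render a spec dict as keyword arguments for child_window()."""
    return ", ".join(f"{KEY_MAP[key]}={value!r}" for key, value in spec.items())


def trace_to_script(title_regex: str, trace: list[dict]) -> str:
    """Emit pywinauto replay code for a hypothetical list of trace steps."""
    lines = [
        "from pywinauto import Application",
        "from pywinauto.keyboard import send_keys",
        "",
        f'app = Application(backend="uia").connect(title_re={title_regex!r})',
        f"win = app.window(title_re={title_regex!r})",
    ]
    for step in trace:
        tool = step["tool"]
        if tool == "click_element":
            lines.append(f"win.child_window({locator(step['spec'])}).click_input()")
        elif tool == "type_text":
            # set_edit_text assumes the target is an Edit control.
            lines.append(
                f"win.child_window({locator(step['spec'])})"
                f".set_edit_text({step['text']!r})"
            )
        elif tool == "hotkey":
            lines.append(f"send_keys({step['keys']!r})")
        else:
            # Read-only steps (list_windows, dump_ui_tree, ...) need no replay.
            continue
        if "expect_element_after" in step:
            # Replay the recorded post-condition as an explicit wait.
            lines.append(
                f"win.child_window({locator(step['expect_element_after'])})"
                ".wait('exists', timeout=10)"
            )
    return "\n".join(lines)


trace = [
    {"tool": "click_element",
     "spec": {"name": "Save", "control_type": "Button"},
     "expect_element_after": {"class_name": "#32770"}},
    {"tool": "type_text", "spec": {"automation_id": "1001"},
     "text": "agent-notes.txt"},
    {"tool": "hotkey", "keys": "%{ENTER}"},
]
print(trace_to_script("Notepad|Calculator", trace))
```

Replaying each recorded post-condition as a `wait` keeps the generated script as strict as the live session. A change in the UI then makes the script fail loudly instead of clicking ahead.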

See examples/notepad_calculator.md for a longer walkthrough.

Safety rules

  • Prefer automation_id, then name, then control_type, then class_name.

  • Do not start with screen coordinates.

  • Verify every click or text entry with a concrete post-condition.

  • Re-dump the UI tree after a failed verification instead of retrying blindly.

  • Treat OCR and image matching as fallbacks, not the primary automation path.
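The identifier preference order can be applied mechanically: add properties in priority order until the spec matches exactly one control. Below is a sketch of that approach. It assumes the dumped tree has been flattened into a list of property dicts that includes the target itself. `stable_spec` is a hypothetical helper.

```python
PRIORITY = ("automation_id", "name", "control_type", "class_name")


def stable_spec(element: dict, candidates: list[dict]) -> dict:
    """Build the shortest priority-ordered spec that matches only `element`.

    `candidates` is every control from the dump, including `element`.
    Empty or missing properties are skipped.
    """
    spec: dict = {}
    for key in PRIORITY:
        value = element.get(key)
        if not value:
            continue
        spec[key] = value
        matches = [c for c in candidates if all(c.get(k) == v for k, v in spec.items())]
        if len(matches) == 1:
            return spec
    raise LookupError("no unique semantic spec; re-dump the tree or use a fallback")


candidates = [
    {"name": "Save", "control_type": "Button", "class_name": "Button"},
    {"name": "Save", "control_type": "MenuItem"},
]
print(stable_spec(candidates[0], candidates))
# {'name': 'Save', 'control_type': 'Button'}
```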

Development checks

python -m compileall -q src tests
python -m pytest -q
ruff check .
python -m build
twine check dist/*

Contributing and security

See CONTRIBUTING.md for development workflow and automation design rules. See SECURITY.md for vulnerability reporting and desktop automation safety expectations.

License

MIT
