
windows-computer-use-mcp

by sshh12


A Model Context Protocol server that gives a Claude agent full control of the local Windows desktop — native screen capture, low-level input injection, video recording, and a play-test loop for driving games and apps.

Unlike Anthropic's sandboxed computer-use tool, this runs on the machine it controls. It reads the actual current displays, so no fixed resolution has to be requested. It is multi-monitor and per-monitor DPI aware. It injects input via SendInput scan codes, so it works in games that ignore synthetic virtual-key events. It is built for full Claude control, with no security gating.
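The sketch below shows what "SendInput scan codes" means at the event level. In scan-code mode, Win32's KEYBDINPUT carries a hardware scan code in wScan and sets KEYEVENTF_SCANCODE in dwFlags, and wVk is ignored. Normally a virtual-key code would go in wVk instead.

This is a minimal illustration, not this server's code. Some notes on what it assumes:

- The flag values are the documented winuser.h constants.
- The scan codes are standard PC/AT set-1 make codes.
- `KeyEvent` and `tap` are hypothetical names.
- Packing these events into INPUT structs and calling SendInput through ctypes is left out, so the logic runs on any OS.

```python
from typing import Dict, List, NamedTuple, Tuple

# Documented Win32 KEYBDINPUT.dwFlags values (winuser.h).
KEYEVENTF_EXTENDEDKEY = 0x0001  # key is E0-prefixed (arrows, right Ctrl, ...)
KEYEVENTF_KEYUP = 0x0002        # release instead of press
KEYEVENTF_SCANCODE = 0x0008     # wScan identifies the key; wVk is ignored

# A few PC/AT set-1 make codes: name -> (scan code, needs E0 prefix).
SCAN_CODES: Dict[str, Tuple[int, bool]] = {
    "esc": (0x01, False),
    "w": (0x11, False),
    "a": (0x1E, False),
    "s": (0x1F, False),
    "d": (0x20, False),
    "space": (0x39, False),
    "up": (0x48, True),
    "left": (0x4B, True),
    "right": (0x4D, True),
    "down": (0x50, True),
}


class KeyEvent(NamedTuple):
    scan: int   # would go into KEYBDINPUT.wScan
    flags: int  # would go into KEYBDINPUT.dwFlags


def tap(key: str) -> List[KeyEvent]:
    """Return the press/release pair for one key in scan-code mode."""
    scan, extended = SCAN_CODES[key.lower()]
    flags = KEYEVENTF_SCANCODE | (KEYEVENTF_EXTENDEDKEY if extended else 0)
    return [KeyEvent(scan, flags), KeyEvent(scan, flags | KEYEVENTF_KEYUP)]


print(tap("w"))   # [KeyEvent(scan=17, flags=8), KeyEvent(scan=17, flags=10)]
print(tap("up"))  # [KeyEvent(scan=72, flags=9), KeyEvent(scan=72, flags=11)]
```

A held key (as in hold_key) would be the same two events with a delay between the press and the release.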

Why this exists

Anthropic's official computer use in the Claude Code CLI is a macOS-only research preview — Pro/Max only, interactive sessions only (not available with the -p flag). The cross-platform alternative is the Claude Desktop app. There is no official, non-Desktop computer-use for Windows: nothing you can drive headlessly from claude -p, from the API, or wire into an agent over MCP.

This server fills that gap. It's a standard MCP server, so it works on Windows in Claude Code (interactive and -p), in Claude Desktop, or from any MCP client / custom agent — with no plan gating — and it's tuned for what a Windows agent actually needs that the sandboxed cloud tool can't do: real multi-monitor capture, per-window GPU capture, game-grade input, and play-testing.

Tools

| Tool | What it does |
|---|---|
| screenshot | See the screen: the whole desktop, a display:N, a window (even occluded or DirectX windows, via PrintWindow), or a region. Returns a downscaled inline image plus a coordinate frame for clicks. |
| act | Perform input, batched: left_click, type, key, scroll, drag, hold_key, paste, click_element (UIA, no pixels), mouse_move_relative (game look), … Coordinates are in the last screenshot's image space; the server maps them to physical pixels. |
| record | Record N seconds into a single timestamped frame montage (not N images) plus an mp4, for judging motion, animation, and stutter. |
| play | Drive a timed input script at a fixed cadence while recording (scan codes plus relative mouse). probe reads telemetry at each sample and until stops the run early: the closed loop for play-testing. |
| window | Find, focus, close, or read windows (get_text via UI Automation + OCR) and click_element on their controls, all without a screenshot, which keeps token use low. |
| process | Launch (including shell:true for URLs, ms-settings: URIs, and Store apps), kill, wait, run a shell command (returning real stdout), and wait on readiness (wait_for_window, wait_for_file). |
| system | Monitor layout with DPI/scale, cursor position, and clipboard get/set. |

See TOOL_DESIGN.md for the full parameter reference and design rationale.
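One way to picture the play tool's closed loop: send one script step per tick at a fixed cadence, take a telemetry sample after each step, and stop as soon as a condition holds.

This is a minimal sketch under stated assumptions:

- `run_play_loop`, its `probe`/`until` callables, and the script format are hypothetical stand-ins; TOOL_DESIGN.md has the real parameters.
- Recording is omitted.
- The clock and sleep functions are injectable, so the loop can be tested without real waiting.

```python
import time
from typing import Any, Callable, List, Optional, Sequence


def run_play_loop(
    script: Sequence[Any],
    send: Callable[[Any], None],
    probe: Callable[[], Any],
    until: Optional[Callable[[Any], bool]] = None,
    cadence_s: float = 0.1,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Any]:
    """Send one script step per tick, sample telemetry, stop early on `until`."""
    samples: List[Any] = []
    start = clock()
    for i, step in enumerate(script):
        # Wait for this step's scheduled slot so slow steps do not accumulate drift.
        delay = start + i * cadence_s - clock()
        if delay > 0:
            sleep(delay)
        send(step)
        sample = probe()
        samples.append(sample)
        if until is not None and until(sample):
            break  # closed loop: condition met, skip the rest of the script
    return samples


if __name__ == "__main__":
    readings = iter(range(10))  # fake telemetry: 0, 1, 2, ...
    samples = run_play_loop(["w"] * 10, send=lambda step: None,
                            probe=lambda: next(readings),
                            until=lambda x: x >= 3, cadence_s=0.01)
    print(samples)  # [0, 1, 2, 3]
```

Each step is scheduled against `start + i * cadence_s` rather than sleeping a fixed interval after every step. That keeps the cadence stable even when sending input or probing takes time.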

Coordinates & multi-monitor

Coordinates are physical pixels in virtual-desktop space. The primary monitor's top-left is (0,0), and monitors to the left of or above it have negative coordinates.

You click in the image space of the last screenshot, and the server maps that back to physical pixels (handling downscaling, per-monitor offset, and DPI). Every screenshot returns a capture_id. If you click against a stale frame, act fails with an explicit error instead of mis-clicking.

Use the system tool (system displays) to see the monitor layout, then target a specific monitor with display:0 or display:primary|left|right.
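The mapping from image space to physical pixels reduces to one scale and one offset per capture. This is a minimal sketch with these assumptions:

- Each capture records its physical origin and size in virtual-desktop pixels, alongside the size of the downscaled image.
- DPI is already accounted for because the capture rectangle is stored in physical pixels.
- `Capture`, `to_physical`, and `StaleCaptureError` are hypothetical names.
- The real server may round or validate differently.

```python
from dataclasses import dataclass
from typing import Tuple


class StaleCaptureError(RuntimeError):
    """Raised instead of clicking against an outdated frame."""


@dataclass(frozen=True)
class Capture:
    capture_id: str
    left: int    # physical x of the captured area's top-left (may be negative)
    top: int     # physical y of the captured area's top-left (may be negative)
    phys_w: int  # captured width in physical pixels
    phys_h: int  # captured height in physical pixels
    img_w: int   # width of the downscaled image the model saw
    img_h: int   # height of the downscaled image the model saw


def to_physical(cap: Capture, latest_id: str, x: float, y: float) -> Tuple[int, int]:
    """Map a point in the screenshot's image space to virtual-desktop pixels."""
    if cap.capture_id != latest_id:
        raise StaleCaptureError(f"{cap.capture_id} is stale; latest is {latest_id}")
    if not (0 <= x < cap.img_w and 0 <= y < cap.img_h):
        raise ValueError(f"({x}, {y}) is outside the {cap.img_w}x{cap.img_h} image")
    # Scale up to physical size (floor to a whole pixel), then shift by the origin.
    px = cap.left + int(x * cap.phys_w / cap.img_w)
    py = cap.top + int(y * cap.phys_h / cap.img_h)
    return px, py


# A 2560x1440 monitor to the left of the primary, shown as a 1280x720 image.
left_monitor = Capture("c1", -2560, 0, 2560, 1440, 1280, 720)
print(to_physical(left_monitor, "c1", 640, 360))  # (-1280, 720)
```

Checking the capture_id before doing any arithmetic is what turns a would-be mis-click on an outdated frame into an explicit error.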

Install

As a Claude Code plugin:

/plugin marketplace add sshh12/claude-plugins
/plugin install windows-computer-use@shrivu-plugins

The plugin bootstraps a Python virtual environment and installs this package from GitHub on first run, then starts the MCP server automatically.

Standalone (project-local MCP)

Requires Python 3.10+ and (for video) ffmpeg on PATH.

pip install git+https://github.com/sshh12/windows-computer-use-mcp
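Before wiring the server into a client, a quick preflight can confirm the two stated requirements: the Python version and ffmpeg on PATH. This is a hypothetical helper, not part of the package. It compares the interpreter version to 3.10 and uses `shutil.which` to see whether `ffmpeg` resolves on PATH.

```python
import shutil
import sys
from typing import Callable, List, Optional, Tuple


def preflight(
    version: Tuple[int, ...] = tuple(sys.version_info[:3]),
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return human-readable problems; an empty list means ready to go."""
    problems: List[str] = []
    if version < (3, 10):
        problems.append(f"Python 3.10+ required, found {'.'.join(map(str, version))}")
    if which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH (needed for video output)")
    return problems


if __name__ == "__main__":
    issues = preflight()
    print("\n".join(issues) if issues else "OK")
```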

Then add to your MCP client config (e.g. a project .mcp.json):

{
  "mcpServers": {
    "windows-computer-use": {
      "command": "python",
      "args": ["-m", "windows_computer_use"]
    }
  }
}
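If a project already has a .mcp.json, the entry can be merged in instead of overwriting the whole file. This is a minimal sketch with a hypothetical `add_server` helper. It keeps any servers already listed.

It also deliberately differs from the config above: it uses `sys.executable` rather than a bare `python`. That way the client launches the same interpreter (for example, a virtual environment) that the package was installed into, instead of whatever `python` resolves to on PATH.

```python
import json
import sys
from pathlib import Path
from typing import Any, Dict


def add_server(config_path: Path, name: str = "windows-computer-use",
               python: str = sys.executable) -> Dict[str, Any]:
    """Add or replace one entry under "mcpServers", leaving other servers intact."""
    config: Dict[str, Any] = {}
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    config.setdefault("mcpServers", {})[name] = {
        "command": python,  # interpreter that has the package installed
        "args": ["-m", "windows_computer_use"],
    }
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


# Usage, run from the project root:
# add_server(Path(".mcp.json"))
```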

Development

python -m venv .venv
.venv\Scripts\python.exe -m pip install -e .
.venv\Scripts\python.exe tests\smoke_engine.py    # capture/input/display engine
.venv\Scripts\python.exe tests\smoke_server.py     # assembled MCP tool surface

MCP_OUTPUT_DIR overrides where screenshots and video are written (default: a client root → ~/Pictures/windows-computer-use → %TEMP%).
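The output-directory fallback order can be read as a precedence chain. This is a sketch under assumptions, and the server's actual checks may differ:

- The function and its parameters are hypothetical.
- "Client root" is treated as a directory the MCP client exposes (passed in here, or None).
- ~/Pictures/windows-computer-use is assumed to be used only when ~/Pictures exists.
- For the %TEMP% step, it uses `tempfile.gettempdir()`.

```python
import os
import tempfile
from pathlib import Path
from typing import Mapping, Optional


def resolve_output_dir(env: Mapping[str, str], client_root: Optional[Path],
                       home: Path) -> Path:
    """Hypothetical precedence: override, client root, ~/Pictures/..., temp dir."""
    override = env.get("MCP_OUTPUT_DIR")
    if override:
        return Path(override)
    if client_root is not None:
        return client_root  # assumption: files go directly under the client root
    pictures = home / "Pictures"
    if pictures.is_dir():  # assumption: used only when ~/Pictures exists
        return pictures / "windows-computer-use"
    return Path(tempfile.gettempdir())  # consults TMPDIR, TEMP, TMP in that order


print(resolve_output_dir(os.environ, None, Path.home()))
```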

License

MIT
