Win32 MCP Server
Enterprise-grade Windows automation for AI agents — 47 tools over MCP
The most comprehensive Windows desktop automation server for the Model Context Protocol. Give any MCP-compatible AI agent full control over Windows applications: intelligent text finding and clicking, structured OCR, screenshot capture, mouse/keyboard input, window management, process control, and multi-step batch operations — all through a single MCP server.
What's New in v2.0
47 tools (up from 25) — fully modular, enterprise-quality architecture
Smart automation tools — click_text, wait_for_text, fill_field, execute_sequence, and more
Structured OCR — bounding boxes, confidence scores, and screen coordinates for every word
Fuzzy window matching — find windows by partial title with intelligent suggestions
DPI-aware coordinates — automatic per-monitor DPI awareness for high-resolution displays
Image preprocessing — auto, light_bg, dark_bg, high_contrast modes for better OCR accuracy
Multi-step sequences — batch multiple tool calls in a single request
Screenshot comparison — pixel-level diff between current screen and a reference image
Window snapshots — combined screenshot + OCR in a single call
Robust error handling — structured JSON errors with actionable suggestions
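As an illustration of the "structured JSON errors with actionable suggestions" item, here is a minimal sketch of what such an error payload could look like. The class name and the field names (`tool`, `error`, `suggestions`) are assumptions made for this example and may not match the server's actual ToolError schema.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ToolErrorSketch:
    """Hypothetical error shape; the real ToolError fields may differ."""
    tool: str
    error: str
    suggestions: list = field(default_factory=list)

    def to_json(self) -> str:
        # Serialize to a JSON object that an agent can parse and act on.
        return json.dumps(asdict(self))


err = ToolErrorSketch(
    tool="click_text",
    error="Text 'Submit' not found on screen",
    suggestions=["Check the spelling", "Call wait_for_text first"],
)
payload = err.to_json()
```

An agent receiving a payload like this can retry with one of the suggestions instead of guessing from a free-form message.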
Features
Smart Automation (the most powerful tools)
Tool | Description |
click_text | Find text on screen and click it — no coordinates needed |
| Locate all occurrences of text with screen coordinates |
wait_for_text | Poll until text appears on screen (with timeout) |
| Verify text is or is not visible (for UI testing) |
fill_field | Click a labeled input field and type a value |
| Screenshot + structured OCR in one call |
| Right-click and OCR the context menu items |
execute_sequence | Run up to 50 tools in sequence without round-trips |
Screen Capture (6 tools)
Full screen, per-window, and per-monitor capture
PNG, JPEG, and WebP output with quality/scale controls
Pixel color sampling at any coordinate
Screenshot comparison with similarity metrics
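To make "similarity metrics" concrete, the sketch below computes one simple metric: the fraction of pixels that match between two images of equal size. It uses plain nested lists of RGB tuples so it runs without any imaging library. The function name and the `tolerance` parameter are assumptions for illustration, and the server's own comparison may use a different metric.

```python
Pixel = tuple  # (r, g, b), each channel 0-255


def similarity(a: list, b: list, tolerance: int = 0) -> float:
    """Fraction of pixels whose channels all differ by at most `tolerance`."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("images must have identical dimensions")
    total = sum(len(row) for row in a)
    if total == 0:
        return 1.0  # two empty images are trivially identical
    same = sum(
        all(abs(ca - cb) <= tolerance for ca, cb in zip(pa, pb))
        for ra, rb in zip(a, b)
        for pa, pb in zip(ra, rb)
    )
    return same / total


current = [[(0, 0, 0), (255, 255, 255)], [(10, 10, 10), (0, 0, 0)]]
reference = [[(0, 0, 0), (255, 255, 255)], [(12, 10, 10), (9, 9, 9)]]
exact = similarity(current, reference)                   # 2 of 4 pixels match
tolerant = similarity(current, reference, tolerance=2)   # 3 of 4 pixels match
```

A small tolerance helps ignore compression noise, for example when the reference image was saved as JPEG.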
OCR — Optical Character Recognition (5 tools)
Full screen and region-based text extraction
Per-window OCR with automatic focus and capture
Structured mode — every word with bounding box, confidence, line/block/word numbers
Intelligent preprocessing: auto-detects light/dark backgrounds
Coordinates map back to original screen space for accurate clicking
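The last point, mapping OCR coordinates back to screen space, can be sketched as follows. The example assumes OCR ran on a captured region whose top-left corner sits at (`region_left`, `region_top`) on screen and which was upscaled by `scale` before recognition. The `WordBox` fields and the function name are hypothetical and only illustrate the arithmetic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WordBox:
    """One OCR word in image coordinates (hypothetical field names)."""
    text: str
    left: int
    top: int
    width: int
    height: int
    confidence: float


def box_center_on_screen(box: WordBox, region_left: int, region_top: int,
                         scale: float = 1.0) -> tuple:
    """Convert the center of an OCR box to screen coordinates."""
    if scale <= 0:
        raise ValueError("scale must be positive")
    # Undo the preprocessing scale, then add the region's screen offset.
    cx = region_left + (box.left + box.width / 2) / scale
    cy = region_top + (box.top + box.height / 2) / scale
    return round(cx), round(cy)


word = WordBox("Submit", left=200, top=100, width=80, height=20, confidence=96.5)
click_point = box_center_on_screen(word, region_left=100, region_top=200, scale=2.0)
```

Here the box center (240, 110) in the upscaled image becomes (120, 55) in region space and (220, 255) on screen.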
Mouse Control (8 tools)
Click, double-click, triple-click (left/right/middle buttons)
Drag-and-drop with configurable duration and button
Mouse move with smooth animation
Vertical and horizontal scrolling at any position
Current position reporting
Keyboard Control (3 tools)
Type text with Unicode support (auto-fallback to clipboard paste)
Press individual keys or key combinations (ctrl+c, alt+f4)
Execute hotkey combos from arrays (["ctrl", "shift", "s"])
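The two input styles above, a combo string and a key array, are closely related. As a rough sketch, a combo string such as ctrl+shift+s can be normalized into the array form like this. `parse_combo` is a hypothetical helper, not a function exposed by the server.

```python
def parse_combo(combo: str) -> list:
    """Split 'Ctrl + Shift + S' into ['ctrl', 'shift', 's']."""
    keys = [part.strip().lower() for part in combo.split("+")]
    if not all(keys):
        # An empty part means a stray '+', e.g. 'ctrl++' or a trailing plus.
        raise ValueError(f"empty key in combo: {combo!r}")
    return keys


save_as = parse_combo("Ctrl + Shift + S")
close_window = parse_combo("alt+f4")
```

Note that this simple split cannot express the plus key itself, so a real parser would need a name such as "plus" for it.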
Clipboard (2 tools)
Copy text to system clipboard
Read current clipboard contents
Window Management (10 tools)
List all windows with fuzzy title filtering
Detailed window info (PID, position, size, state, process name, memory)
Focus, close, minimize, maximize, restore
Resize and move to exact coordinates
Wait for a window to appear (polling with timeout)
Fuzzy matching with intelligent suggestions on miss
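The fuzzy matching behaviour can be approximated with the standard library. The sketch below first tries a case-insensitive substring match and, on a miss, suggests titles containing a word that resembles the query. It uses `difflib` rather than whatever fuzzy matching package the server depends on, and `match_window` and its `cutoff` parameter are assumptions for this example.

```python
import difflib


def _best_word_ratio(query: str, title: str) -> float:
    # Compare the query with each word of the title and keep the best score.
    words = title.lower().split()
    return max(
        (difflib.SequenceMatcher(None, query, w).ratio() for w in words),
        default=0.0,
    )


def match_window(query: str, titles: list, cutoff: float = 0.6) -> tuple:
    """Return (matches, suggestions) for a partial window title."""
    q = query.lower().strip()
    matches = [t for t in titles if q in t.lower()]
    if matches:
        return matches, []
    scored = sorted(
        ((_best_word_ratio(q, t), t) for t in titles),
        key=lambda pair: pair[0],
        reverse=True,
    )
    return [], [t for score, t in scored if score >= cutoff]


titles = ["Untitled - Notepad", "Google Chrome", "Visual Studio Code"]
hit = match_window("chrome", titles)
miss = match_window("Notpad", titles)
```

A misspelled query such as "Notpad" matches nothing directly but still yields the Notepad window as a suggestion.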
Process Management (4 tools)
List processes with filtering, sorting, and pagination
Graceful termination with force-kill fallback
Launch applications with optional wait-for-completion
Wait for a process to become idle (CPU threshold monitoring)
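Waiting for a process to become idle is a polling loop with a threshold and a timeout. The sketch below shows that pattern in isolation. The CPU sampler is injected as a callable so the example stays self-contained. In practice it would read the process's CPU usage from a process monitoring library. All names and default values here are assumptions.

```python
import time
from typing import Callable


def wait_until_idle(sample_cpu: Callable[[], float], threshold: float = 5.0,
                    timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll until CPU usage drops to `threshold` percent or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        if sample_cpu() <= threshold:
            return True   # process is considered idle
        if time.monotonic() >= deadline:
            return False  # gave up: still busy after `timeout` seconds
        time.sleep(interval)


# Simulated readings: busy, busy, then idle.
readings = iter([80.0, 40.0, 3.0])
became_idle = wait_until_idle(lambda: next(readings), timeout=1.0, interval=0.0)
timed_out = wait_until_idle(lambda: 99.0, timeout=0.0, interval=0.0)
```

The same loop shape applies to the other polling tools in this list, such as waiting for text or for a window to appear.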
System (1 tool)
health_check — verify all dependencies, DPI, monitors, Tesseract, and tool count
Installation
Prerequisites
Python 3.10+
Tesseract OCR (optional — required only for OCR tools):
Install and ensure it's on PATH
Verify:
tesseract --version
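The same check can be scripted, for example in a setup script that decides whether OCR tools will be usable. This is a minimal sketch using only the standard library. `tesseract_version` is a hypothetical helper name.

```python
import shutil
import subprocess
from typing import Optional


def tesseract_version() -> Optional[str]:
    """Return the first line of `tesseract --version`, or None if unavailable."""
    exe = shutil.which("tesseract")
    if exe is None:
        return None  # not on PATH
    result = subprocess.run([exe, "--version"], capture_output=True,
                            text=True, check=False)
    # Some builds print the version to stderr instead of stdout.
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else None


version = tesseract_version()
print(version or "Tesseract not found; OCR tools will be unavailable")
```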
Install Package
From GitHub (recommended — latest v2.0):
pip install git+https://github.com/RandyNorthrup/win32-mcp-server.git
From PyPI:
pip install win32-mcp-server
Note: PyPI may lag behind the latest GitHub release. For the newest features, install from GitHub.
From source:
git clone https://github.com/RandyNorthrup/win32-mcp-server.git
cd win32-mcp-server
pip install -e .
Configuration
VS Code with GitHub Copilot
Add to your MCP configuration (%APPDATA%\Code\User\mcp.json):
{
  "servers": {
    "win32-inspector": {
      "type": "stdio",
      "command": "win32-mcp-server"
    }
  }
}
Or install from the VS Code Marketplace — search "Windows Automation Inspector".
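If you prefer to script the VS Code configuration step, the sketch below merges the server entry shown above into an existing mcp.json without touching other servers. The helper names are hypothetical. It assumes the file is plain JSON, so a file containing comments would need a more tolerant parser.

```python
import copy
import json
from pathlib import Path


def with_server(config: dict, name: str = "win32-inspector",
                command: str = "win32-mcp-server") -> dict:
    """Return a copy of `config` with the stdio server entry added."""
    updated = copy.deepcopy(config)
    updated.setdefault("servers", {})[name] = {"type": "stdio", "command": command}
    return updated


def update_config_file(path: Path) -> None:
    """Read, merge, and write the config file, creating it if missing."""
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(with_server(config), indent=2), encoding="utf-8")


existing = {"servers": {"other": {"type": "stdio", "command": "other-server"}}}
merged = with_server(existing)
```

Calling `update_config_file` with the mcp.json path given above would apply the same merge on disk.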
Claude Desktop
Add to %APPDATA%\Claude\claude_desktop_config.json:
{
  "mcpServers": {
    "win32-inspector": {
      "command": "win32-mcp-server"
    }
  }
}
Any MCP Client
The server uses STDIO transport and works with any MCP-compatible client.
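For readers writing their own client, the sketch below shows what a tool invocation looks like on the wire: MCP uses JSON-RPC 2.0, and the stdio transport sends one JSON message per line. `tool_call_line` is a hypothetical helper. The assumption that health_check takes no arguments is not confirmed by this README, and a real client must complete the MCP initialize handshake before sending calls.

```python
import json
from typing import Optional


def tool_call_line(request_id: int, tool: str,
                   arguments: Optional[dict] = None) -> str:
    """Build a newline-terminated JSON-RPC `tools/call` request."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments or {}},
    }
    # json.dumps escapes newlines inside strings, so the message stays on one line.
    return json.dumps(message, separators=(",", ":")) + "\n"


line = tool_call_line(1, "health_check")
```

In most cases an MCP SDK handles this framing for you. The raw form is mainly useful for debugging what the client and server exchange.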
Usage Examples
Smart Automation (Natural Language)
"Click the 'Submit' button"
"Wait for 'Loading complete' to appear, then click 'Continue'"
"Fill in the 'Username' field with 'admin@example.com'"
"Take a snapshot of the Chrome window and tell me what you see"
"Right-click the desktop and show me the menu options"
Screen Capture
"Capture a screenshot of the entire screen"
"Capture the Notepad window as a compressed JPEG at 50% scale"
"Compare the current screen to this reference image"
OCR
"Extract all text from the screen"
"Get structured OCR data from the region at (100, 200) size 800x600"
"Read all text in the Chrome window"
Mouse & Keyboard
"Click at (500, 300) with the right mouse button"
"Drag from (100, 100) to (500, 500)"
"Type 'Hello World' — use clipboard paste for Unicode characters"
"Press Ctrl+Shift+S"
Window & Process Management
"List all open windows containing 'Visual Studio'"
"Maximize the Chrome window"
"Resize Notepad to 800x600 and move it to (0, 0)"
"Wait for a window titled 'Installation Complete' to appear"
"List the top 20 processes by memory usage"
"Kill process with PID 1234"
Batch Operations
"Execute this sequence: click (100,100), wait 500ms, type 'hello', press Enter"
All 47 Tools
Smart Automation
Tool | Description |
click_text | Find text on screen and click it |
| Find all text occurrences with coordinates |
wait_for_text | Wait until text appears (polling) |
| Assert text is/isn't visible |
fill_field | Click labeled field and type value |
| Screenshot + OCR in one call |
| Right-click and OCR the menu |
execute_sequence | Batch up to 50 tool calls |
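To illustrate batching, the sketch below assembles a payload for execute_sequence and enforces the 50-call limit stated above before anything is sent. The payload shape (`steps`, `tool`, `arguments`) and the argument names (`text`, `label`, `value`) are assumptions, since this README does not document the tool's input schema.

```python
MAX_STEPS = 50  # limit stated in the tool description


def build_sequence(steps: list) -> dict:
    """Turn (tool_name, arguments) pairs into a hypothetical batch payload."""
    if not steps:
        raise ValueError("a sequence needs at least one step")
    if len(steps) > MAX_STEPS:
        raise ValueError(f"a sequence may contain at most {MAX_STEPS} steps")
    return {"steps": [{"tool": name, "arguments": args} for name, args in steps]}


login = build_sequence([
    ("wait_for_text", {"text": "Username"}),
    ("fill_field", {"label": "Username", "value": "admin@example.com"}),
    ("click_text", {"text": "Submit"}),
])
```

Validating the limit on the client side avoids a round-trip that the server would reject anyway.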
Screen Capture
Tool | Description |
| Full screen screenshot (PNG/JPEG/WebP) |
| Window screenshot with fuzzy title match |
| Capture specific monitor by index |
| List monitors with resolution and DPI |
| Get RGB/hex color at coordinates |
| Pixel-level comparison with similarity score |
OCR
Tool | Description |
| Full screen text extraction |
| Region text extraction |
| Window text extraction |
| Full screen OCR with bounding boxes |
| Region OCR with bounding boxes |
Mouse
Tool | Description |
| Click at coordinates (left/right/middle, N clicks) |
| Double-click at coordinates |
| Triple-click to select line/paragraph |
| Drag from start to end with duration |
| Get current cursor position |
| Move cursor with smooth animation |
| Vertical scroll at position |
| Horizontal scroll at position |
Keyboard
Tool | Description |
| Type text (auto Unicode detection, clipboard fallback) |
| Press key or combo (ctrl+c, alt+f4) |
| Hotkey from key array (["ctrl", "shift", "s"]) |
Clipboard
Tool | Description |
| Copy text to clipboard |
| Read clipboard contents |
Window Management
Tool | Description |
| List windows with optional title filter |
| Detailed window info (PID, process, memory) |
| Bring window to foreground |
| Close window by title |
| Minimize window |
| Maximize window |
| Restore from minimized/maximized |
| Resize to exact dimensions |
| Move to exact position |
| Wait for window to appear (polling) |
Process Management
Tool | Description |
| List processes (filter, sort, paginate) |
| Terminate process (graceful + force fallback) |
| Launch application with optional wait |
| Wait for process CPU to drop below threshold |
System
Tool | Description |
health_check | Full dependency and system status report |
Architecture
win32-mcp-server/
├── win32_mcp_server/
│   ├── __init__.py          # Package entry, version
│   ├── __main__.py          # python -m support
│   ├── config.py            # Dataclass config, PreprocessMode
│   ├── registry.py          # Decorator-based tool registry + dispatch
│   ├── server.py            # MCP server, stdio transport, health_check
│   ├── utils/
│   │   ├── coordinates.py   # DPI awareness, screen geometry, validation
│   │   ├── errors.py        # ToolError with suggestions
│   │   ├── imaging.py       # Image preprocessing, encoding, diffing
│   │   └── window_match.py  # Fuzzy title matching, deduplication, PID
│   └── tools/
│       ├── capture.py       # Screenshot tools (6)
│       ├── ocr.py           # OCR tools (5)
│       ├── mouse.py         # Mouse tools (8)
│       ├── keyboard.py      # Keyboard tools (3)
│       ├── clipboard.py     # Clipboard tools (2)
│       ├── window.py        # Window management tools (10)
│       ├── process.py       # Process management tools (4)
│       └── smart.py         # Smart automation tools (8)
├── extension.js             # VS Code extension bootstrap
├── package.json             # VS Code extension manifest
├── pyproject.toml           # Python package config
├── server.py                # Root entry point
└── LICENSE                  # MIT License
Security Considerations
This server has powerful system control capabilities. Only use in trusted environments where you control the MCP client.
The server can:
Capture screenshots of any window or the entire desktop
Read and write the system clipboard
Control mouse and keyboard input
Terminate processes
Launch applications
Recommended Practices
Enable only when needed — disable via VS Code settings when not in use
Review automation logs — all tool calls are logged to stderr
Test in sandboxed environments first
Restrict MCP client access — limit who can invoke the server
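Be aware: the PyAutoGUI failsafe (moving the mouse to a screen corner to abort) is disabled for uninterrupted automation, so a runaway sequence cannot be stopped that way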
Troubleshooting
Problem | Solution |
| Install from https://github.com/UB-Mannheim/tesseract/wiki and add to PATH |
| Run VS Code / MCP client as Administrator |
|
|
| Use partial title. Run |
OCR returns empty/garbled text | Try |
Coordinates are wrong on HiDPI | The server auto-enables DPI awareness. Run |
Dependencies
Package | Purpose |
| Model Context Protocol SDK |
| Fast cross-platform screen capture |
| Image processing and encoding |
| Image preprocessing for OCR |
| Mouse and keyboard automation |
| Window enumeration and control |
| Clipboard operations |
| Tesseract OCR wrapper |
| Process management |
| Fast fuzzy string matching |
Contributing
Contributions welcome!
Fork the repository
Create a feature branch (git checkout -b feature/my-feature)
Commit your changes (git commit -am 'Add my feature')
Push to the branch (git push origin feature/my-feature)
Open a Pull Request
License
MIT License — see LICENSE file.
Links
Repository: https://github.com/RandyNorthrup/win32-mcp-server
VS Code Marketplace: https://marketplace.visualstudio.com/items?itemName=RandyNorthrup.win32-mcp-inspector
Issues: https://github.com/RandyNorthrup/win32-mcp-server/issues
MCP Specification: https://modelcontextprotocol.io/
Author: Randy Northrup
Built for Windows automation and AI agents