MCP Windows Desktop Automation
by mario-andreschak
Verified
# MCP Windows Desktop Automation
A Model Context Protocol (MCP) server for Windows desktop automation using AutoIt.
## Overview
This project provides a TypeScript MCP server that wraps the [node-autoit-koffi](https://www.npmjs.com/package/node-autoit-koffi) package, allowing LLM applications to automate Windows desktop tasks through the MCP protocol.
The server exposes:
- **Tools**: All AutoIt functions as MCP tools
- **Resources**: File access and screenshot capabilities
- **Prompts**: Templates for common automation tasks
## Features
- Full wrapping of all AutoIt functions as MCP tools
- Support for both stdio and WebSocket transports
- File access resources for reading files and directories
- Screenshot resources for capturing the screen or specific windows
- Prompt templates for common automation tasks
- Strict TypeScript typing throughout
## Installation
```bash
# Clone the repository
git clone https://github.com/yourusername/mcp-windows-desktop-automation.git
cd mcp-windows-desktop-automation
# Install dependencies
npm install
# Build the project
npm run build
```
## Usage
### Starting the Server
```bash
# Start with stdio transport (default)
npm start
# Start with WebSocket transport
npm start -- --transport=websocket --port=3000
# Enable verbose logging
npm start -- --verbose
```
### Command Line Options
- `--transport=stdio|websocket`: Specify the transport protocol (default: stdio)
- `--port=<number>`: Specify the port for WebSocket transport (default: 3000)
- `--verbose`: Enable verbose logging
## Tools
The server provides tools for:
- **Mouse operations**: Move, click, drag, etc.
- **Keyboard operations**: Send keystrokes, clipboard operations, etc.
- **Window management**: Find, activate, close, resize windows, etc.
- **Control manipulation**: Interact with UI controls, buttons, text fields, etc.
- **Process management**: Start, stop, and monitor processes
- **System operations**: Shutdown, sleep, etc.
## Resources
The server provides resources for:
- **File access**: Read files and list directories
- **Screenshots**: Capture the screen or specific windows
## Prompts
The server provides prompt templates for:
- **Window interaction**: Find and interact with windows
- **Form filling**: Automate form filling tasks
- **Automation tasks**: Create scripts for repetitive tasks
- **Monitoring**: Wait for specific conditions
## Development
```bash
# Run in development mode
npm run dev
# Lint the code
npm run lint
# Run tests
npm run test
```
## License
MIT