ahk-mcp

An MCP server that gives AI agents hands on Windows — via AutoHotkey v2 and the Windows UI Automation accessibility tree.

This is the "cursed but effective" alternative to screenshot-based computer use. Instead of sending 1280x720 PNGs back and forth (burning 2000-3500 tokens per action on image encoding alone), ahk-mcp returns structured text: window titles, control lists, accessibility tree nodes, and action confirmations. Typical cost: 200-700 tokens per action.

The thesis is simple: the Windows accessibility tree already contains a machine-readable description of everything on screen. Screenshots throw that away and make the model re-derive it from pixels. Why?

How it works

ahk-mcp exposes 15 tools over MCP's stdio transport:

- Observation tools read the accessibility tree, window text, and UI element properties.
- Action tools send keystrokes, click coordinates, manage the clipboard, and launch programs.
- `ahk_eval` is the escape hatch: it executes arbitrary AHK v2 code for anything the built-in tools don't cover.
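
For orientation, a tool invocation over the stdio transport is a JSON-RPC 2.0 message written to the server's stdin. The sketch below builds a `tools/call` request for `ahk_uia_find`, using the argument names that appear in the browser examples (`target`, `control_type`). The helper name `make_tool_call` and the request id are illustrative only; an MCP client library would normally build this message for you.

```python
import json


def make_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a single line of JSON."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    # stdio messages are newline-delimited, so the payload must stay on one line.
    return json.dumps(message)


request = make_tool_call(
    1, "ahk_uia_find", {"target": "Firefox", "control_type": "Hyperlink"}
)
print(request)
```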

Every action tool reports context before and after execution (which window was focused, what changed). This "guardrails pattern" means the agent always knows what it just did and what state the system is in, without needing a follow-up screenshot.

Token cost comparison

| Approach | Tokens per action | What you get |
|---|---|---|
| Screenshot-based (1280x720 PNG) | ~2000-3500 | Pixels. Model must OCR, locate elements, interpret layout. |
| ahk-mcp (structured text) | ~200-700 | Window title, control names, accessibility tree nodes, action confirmation. |

The savings compound fast. A 20-step workflow that would cost ~50k tokens in screenshots costs ~8k tokens with ahk-mcp. More importantly, the structured output is more reliable — the model doesn't have to guess where the "Save" button is from pixels when the accessibility tree says Button: "Save" @1043,672,88x32.
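
The arithmetic behind those workflow totals is easy to check. This sketch multiplies the per-action ranges from the table by the step count; the function name `workflow_cost` is made up for illustration, and the per-action figures are the README's own estimates rather than measurements taken here.

```python
def workflow_cost(steps: int, low: int, high: int) -> tuple[int, int]:
    """Return the (min, max) token cost of a workflow with a fixed per-action range."""
    return steps * low, steps * high


screenshot = workflow_cost(20, 2000, 3500)  # screenshot-based range from the table
structured = workflow_cost(20, 200, 700)    # ahk-mcp range from the table

print(f"screenshots: {screenshot[0]}-{screenshot[1]} tokens")
print(f"ahk-mcp:     {structured[0]}-{structured[1]} tokens")
```

The quoted ~50k and ~8k figures sit inside the resulting 40k-70k and 4k-14k ranges.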

Installation

Prerequisites

- Windows 10/11 (this is an AutoHotkey project; Windows is the point)
- Python 3.10+
- AutoHotkey v2, installed via winget (the official distribution, which avoids dodgy third-party repackages):
```
winget install AutoHotkey.AutoHotkey
```
This puts it at `%LOCALAPPDATA%\Programs\AutoHotkey\v2\AutoHotkey64.exe`. If you prefer a manual install, download only from autohotkey.com and avoid other sources.

Setup

```
# Clone the repo
git clone https://github.com/anomalous3/ahk-mcp.git
cd ahk-mcp

# Create a virtual environment (uv or plain venv)
uv venv .venv
# or: python -m venv .venv

# Activate
.venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
# or, for a venv created with uv (which does not include pip by default):
# uv pip install -r requirements.txt
```

Verify AHK is found

The server looks for AutoHotkey at the default install path. If yours is elsewhere, set the AHK_EXE environment variable:

```
set AHK_EXE=C:\Path\To\AutoHotkey64.exe
```

Claude Code MCP configuration

Add this to your `~/.claude.json` (or to a project-level `.mcp.json` in the project root):

```
{
  "mcpServers": {
    "ahk": {
      "command": "C:\\Users\\YOUR_USER\\ahk-mcp\\.venv\\Scripts\\python.exe",
      "args": ["C:\\Users\\YOUR_USER\\ahk-mcp\\server.py"],
      "env": {
        "PYTHONIOENCODING": "utf-8",
        "PYTHONUNBUFFERED": "1"
      }
    }
  }
}
```

Replace YOUR_USER with your Windows username. The PYTHONIOENCODING env var prevents Windows cp1252 encoding crashes on Unicode output.
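
The encoding failure that `PYTHONIOENCODING` guards against can be reproduced without any server code. The sketch below assumes the legacy code page in play is cp1252 (as on a Western-locale Windows install) and uses a check mark as a stand-in for any character outside that code page; `can_encode` is a throwaway helper, not part of ahk-mcp.

```python
def can_encode(text: str, encoding: str) -> bool:
    """Report whether text survives encoding without raising UnicodeEncodeError."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# U+2713 CHECK MARK has no representation in cp1252.
sample = "Saved \u2713"

print("cp1252:", can_encode(sample, "cp1252"))
print("utf-8: ", can_encode(sample, "utf-8"))
```

A window title or page text containing such a character would trigger the same error when written to a cp1252 stdout, which is why the config forces UTF-8.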

After adding the config, restart Claude Code. The tools will appear with the mcp__ahk__ prefix.

Tool reference

Observation

| Tool | Description |
|---|---|
| `ahk_windows` | List all visible windows with title, class, PID, and position. The "what's on screen" overview. |
| `ahk_window_info` | Detailed info on the active window: title, class, PID, process, position, and a list of all UI controls with their text. |
| `ahk_read` | Read all text content from a window (active or by title). Cheap and fast. |
| `ahk_uia_tree` | Dump the UI Automation accessibility tree. This is how you read browser content, form fields, and UI state that WinGetText can't see. Configurable depth and node limit. |
| `ahk_uia_find` | Search for UI elements by name and/or control type. Returns matches with position and value. Find that "Submit" button without scanning the whole tree. |
| `ahk_uia_url` | Get the current URL from the active browser's address bar (Firefox, Chrome, Edge). |
| `ahk_screenshot` | Capture a screenshot of a window, region, or full screen. Returns a PNG file path. Uses PrintWindow for per-window capture (works even when the window is occluded). The fallback when you genuinely need pixels. |

Action

| Tool | Description |
|---|---|
| `ahk_focus` | Activate/focus a window by title (partial match). |
| `ahk_send` | Send keystrokes using AHK syntax: `{Enter}`, `^c` (Ctrl+C), `!{F4}` (Alt+F4), `+{Home}` (Shift+Home), etc. |
| `ahk_type` | Type plain text via SendText (no special key interpretation). Use this for typing into fields. |
| `ahk_click` | Click at screen coordinates. Reports which window was active before and after. |
| `ahk_clipboard` | Get or set the system clipboard. |
| `ahk_run` | Launch a program, open a file, or navigate to a URL. |
| `ahk_msgbox` | Show a message box to the user (notifications, confirmations). |
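
The key strings accepted by `ahk_send` follow AutoHotkey's standard modifier notation: `^` for Ctrl, `!` for Alt, `+` for Shift, and `#` for Win, with named keys wrapped in braces. A client-side helper can assemble them; `ahk_keys` below is a hypothetical convenience function, not something ahk-mcp ships.

```python
MODIFIERS = {"ctrl": "^", "alt": "!", "shift": "+", "win": "#"}
SPECIAL = set("^!+#{}")  # characters Send treats as syntax rather than literal keys


def ahk_keys(key: str, *modifiers: str) -> str:
    """Build an AHK Send string such as '^c' or '!{F4}'."""
    prefix = "".join(MODIFIERS[m.lower()] for m in modifiers)
    # Single ordinary characters go through as-is; named or special keys need braces.
    if len(key) == 1 and key not in SPECIAL:
        return prefix + key
    return prefix + "{" + key + "}"


print(ahk_keys("c", "ctrl"))
print(ahk_keys("F4", "alt"))
print(ahk_keys("Home", "shift"))
print(ahk_keys("Enter"))
```

For literal text, `ahk_type` remains the simpler choice because it bypasses this syntax entirely.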

Escape hatch

| Tool | Description |
|---|---|
| `ahk_eval` | Execute arbitrary AHK v2 code. Output via `FileAppend "text", "*"`, end with `ExitApp`. Full AHK v2 language: COM automation, DllCall, regex, file I/O, window manipulation, and anything else the language supports. |

Browser automation via UI Automation

Modern browsers (Firefox, Chrome, Edge) render everything through GPU surfaces — WinGetText returns nothing useful. The UIA tools solve this by reading the browser's accessibility tree, which is the same interface screen readers use.

Read page content, including every link, button, text element, and form field, with names and pixel coordinates:

```
> ahk_uia_find target="Firefox" control_type="Hyperlink"

Found 16 element(s) in: anomalous3/hearth — Mozilla Firefox
HyperlinkControl: "Issues" @4425,283,32x32 = https://github.com/issues
HyperlinkControl: "Pull requests" @4465,283,32x32 = https://github.com/pulls
...
```

Get the current URL without screenshots or clipboard tricks:

```
> ahk_uia_url

Window: anomalous3/hearth — Mozilla Firefox
URL: https://github.com/anomalous3/hearth
```

Dump the full accessibility tree to understand page structure:

```
> ahk_uia_tree target="Firefox" max_depth=5

Window: anomalous3/hearth — Mozilla Firefox
  ToolBarControl: "Navigation"
    ComboBoxControl = https://github.com/anomalous3/hearth
      EditControl: "Search with Google or enter address" = https://github.com/anomalous3/hearth
  DocumentControl = https://github.com/anomalous3/hearth
    HyperlinkControl: "Issues" = https://github.com/issues
    ...
```

The UIA approach gives you element names, types, values, and bounding rectangles — everything you need to read and interact with browser content. Combine with ahk_click (using coordinates from UIA) or ahk_send (for keyboard navigation) to drive the browser without any browser-specific automation framework.
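
To turn a result line into something `ahk_click` can use, a client needs the element's center point. The sketch below parses the line format shown in the `ahk_uia_find` example, assuming the `@x,y,WxH` suffix means top-left corner followed by width and height. That reading is inferred from the sample output rather than from a documented spec, and the names `UiaElement` and `parse_uia_line` are illustrative.

```python
import re
from typing import NamedTuple, Optional


class UiaElement(NamedTuple):
    control_type: str
    name: str
    x: int
    y: int
    width: int
    height: int
    value: Optional[str]

    def center(self) -> tuple[int, int]:
        """Midpoint of the bounding rectangle, suitable as a click target."""
        return self.x + self.width // 2, self.y + self.height // 2


_LINE = re.compile(
    r'^\s*(?P<type>\w+): "(?P<name>.*?)" '
    r'@(?P<x>-?\d+),(?P<y>-?\d+),(?P<w>\d+)x(?P<h>\d+)'
    r'(?: = (?P<value>.*))?$'
)


def parse_uia_line(line: str) -> Optional[UiaElement]:
    """Parse one result line; return None for headers and other non-element lines."""
    m = _LINE.match(line)
    if m is None:
        return None
    return UiaElement(
        m["type"], m["name"],
        int(m["x"]), int(m["y"]), int(m["w"]), int(m["h"]),
        m["value"],
    )


element = parse_uia_line(
    'HyperlinkControl: "Issues" @4425,283,32x32 = https://github.com/issues'
)
print(element.name, element.center())
```

Tree lines without a name or rectangle (for example `ComboBoxControl = ...`) are deliberately left unmatched by this pattern.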

Use Firefox. It exposes the richest accessibility tree of the major browsers — more element detail, better labeling, and more consistent structure than Chrome or Edge. The examples above are all from Firefox.

The guardrails pattern

Every action tool reports what happened:

```
Target: Untitled - Notepad [notepad.exe]
Sent: ^a

Before: Mozilla Firefox
Clicked: 500,300 left
After: Mozilla Firefox
```

This is deliberate. The agent always knows:

- What window was active when the action fired
- What the action was
- What changed afterward

No guessing, no "did that click land?" ambiguity. If the active window changed unexpectedly, the agent sees it immediately in the response and can course-correct.
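
The pattern itself is small enough to sketch in a few lines. The version below is not the server's implementation: it takes two hypothetical callables (one that returns the active window title, one that performs the action and describes it) so the wrapping logic can be shown in isolation. The trailing "Focus changed" line is an extra added here for illustration and does not appear in the sample output above.

```python
from typing import Callable


def run_with_report(
    get_active_window: Callable[[], str],
    action: Callable[[], str],
) -> str:
    """Run an action and report the active window before and after it."""
    before = get_active_window()
    description = action()
    after = get_active_window()
    report = [f"Before: {before}", description, f"After: {after}"]
    if after != before:
        # Hypothetical extra line that makes an unexpected focus change explicit.
        report.append(f"Focus changed: {before} -> {after}")
    return "\n".join(report)


# Stand-ins for real window queries and input synthesis.
print(run_with_report(lambda: "Mozilla Firefox", lambda: "Clicked: 500,300 left"))
```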

The `ahk_eval` philosophy

The built-in tools cover ~80% of common tasks. For the other 20%, ahk_eval gives the agent full access to AutoHotkey v2 — which is a surprisingly capable automation language with COM object support, DllCall for the entire Win32 API, regex, file I/O, and window manipulation primitives.

In practice, agents learn to use ahk_eval for things like:

- Multi-step sequences (select all, copy, process clipboard, paste result)
- COM automation (Excel, Outlook, Word via their COM interfaces)
- Fine-grained window manipulation (resize, move, set transparency)
- Anything the built-in tools don't cover

The convention: output results with `FileAppend "text", "*"` (writes to stdout) and end scripts with `ExitApp`.
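
One plausible way a Python host could honor that convention is to write the script to a temporary `.ahk` file, run it with the AutoHotkey executable, and capture stdout. The sketch below is an assumption about the general approach, not a copy of `server.py`; `finalize_script` and `run_ahk` are hypothetical names, and `run_ahk` only works on a Windows machine with AutoHotkey v2 installed.

```python
import os
import subprocess
import tempfile


def finalize_script(body: str) -> str:
    """Ensure the script ends with ExitApp so the interpreter does not linger."""
    lines = body.rstrip().splitlines()
    # AHK is case-insensitive, so compare in lowercase.
    if not lines or lines[-1].strip().lower() != "exitapp":
        lines.append("ExitApp")
    return "\n".join(lines) + "\n"


def run_ahk(ahk_exe: str, body: str, timeout: float = 10.0) -> str:
    """Run an AHK v2 script and return whatever it wrote to stdout."""
    with tempfile.NamedTemporaryFile(
        "w", suffix=".ahk", delete=False, encoding="utf-8"
    ) as handle:
        handle.write(finalize_script(body))
        path = handle.name
    try:
        result = subprocess.run([ahk_exe, path], capture_output=True, timeout=timeout)
        return result.stdout.decode("utf-8", errors="replace")
    finally:
        os.unlink(path)


# Writes the active window's title to stdout.
SCRIPT = 'FileAppend WinGetTitle("A"), "*"'
print(finalize_script(SCRIPT))
```

On a configured machine, `run_ahk(path_to_autohotkey, SCRIPT)` would return the title of the focused window.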

Platform notes

ahk-mcp is Windows-only because AutoHotkey is Windows-only. That said, the approach is portable:

- Linux: AT-SPI2 provides an equivalent accessibility tree. The pyatspi bindings or busctl can read it, and xdotool (X11) or ydotool (Wayland) can synthesize keyboard and mouse input.
- macOS: The Accessibility API exposes an equivalent tree. pyobjc can read it, and cliclick or AppleScript can drive input.

The core insight — that structured accessibility data is cheaper and more reliable than screenshots for most automation tasks — applies everywhere. The AHK-specific parts are just the Windows implementation.

Configuration

Environment variables:

| Variable | Default | Description |
|---|---|---|
| `AHK_EXE` | `%LOCALAPPDATA%\Programs\AutoHotkey\v2\AutoHotkey64.exe` | Path to AutoHotkey v2 executable |
| `AHK_TIMEOUT` | `10` | Default script execution timeout in seconds |
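
A minimal sketch of how these two variables could be resolved with their documented defaults follows. It is an illustration of the lookup order, not the server's actual code: `resolve_settings` is a hypothetical name, and it takes the environment as a mapping so the logic can be exercised on any platform.

```python
import ntpath
import os
from typing import Mapping


def resolve_settings(env: Mapping[str, str]) -> tuple[str, float]:
    """Return (AutoHotkey executable path, timeout in seconds) from an environment."""
    # ntpath builds Windows-style paths regardless of the host OS.
    default_exe = ntpath.join(
        env.get("LOCALAPPDATA", ""), "Programs", "AutoHotkey", "v2", "AutoHotkey64.exe"
    )
    exe = env.get("AHK_EXE") or default_exe
    timeout = float(env.get("AHK_TIMEOUT", "10"))
    return exe, timeout


exe_path, timeout_seconds = resolve_settings(
    {"LOCALAPPDATA": r"C:\Users\me\AppData\Local"}
)
print(exe_path, timeout_seconds)
# Against the real environment: resolve_settings(os.environ)
```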

License

MIT
