Branches

mac-cua ships two branches. Pick the one that matches your needs.

| | release | confirmed-delivery-pipeline |
|---|---|---|
| Stability | Base implementation, fewer moving parts | Experimental, actively developed |
| Input delivery | CGEventPostToPid with compound modifier events | Confirmed delivery pipeline: per-event transport confirmation via on-demand CGEventTap, SkyLight SPI fallback, micro-activation retry |
| Event source | Shared global CGEventSource | Per-session isolated CGEventSource (no cross-session state leakage) |
| Verification | Dual-monitor system (CGEvent transport + AX outcome) | Snapshot-based: trust the transport, return a fresh AX tree + screenshot as ground truth. The old event-driven monitors were removed (they raced against AX propagation and caused false negatives) |
| Mouse delivery | Mouse-move pre-positioning before every click/drag/scroll | No mouse-move events (eliminates cursor teleportation). Down/up events carry the point and window hints directly |
| Scroll | scroll_system with cursor warp + line-based deltas | Pixel-based scroll with dual integer + fixed-point deltas (Chromium + Cocoa compatibility). Cursor warp removed |
| Keyboard | Compound modifier events (modifiers as flags on keyDown/keyUp) | Same on the CGEvent path (discrete flagsChanged events leaked into global state). The SkyLight path uses discrete modifiers |
| New modules | (none) | skylight.py (CGS SPIs), delivery_tap.py (transport confirmation), confirmed_verification.py (ActionVerifier) |
| Tests | 216 passing | 270 passing |
| Known issues | Event-driven verification produces false "no observed effect" errors when AX lags | Event taps add brief overhead during action delivery (~50ms). SkyLight SPIs may not be available on all macOS versions |

Both branches are early-stage software. Contributors are welcome on either branch -- see Contributing below.

Use release if you want the base implementation with fewer moving parts.

Use confirmed-delivery-pipeline if you want the latest input delivery fixes, elimination of false-negative verification errors, and per-session isolation.

# Clone and use the stable branch (default)
git clone https://github.com/hyprcat/mac-cua.git
cd mac-cua && git checkout release

# Or use the experimental branch
git checkout confirmed-delivery-pipeline
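The Event source row of the branch table contrasts one shared global CGEventSource with one isolated source per session. A sketch of that bookkeeping, assuming a factory callable. On macOS the factory could call `Quartz.CGEventSourceCreate(Quartz.kCGEventSourceStatePrivate)`, which creates a source with its own private event state table. The `EventSources` class and `quartz_private_source` helper are hypothetical:

```python
from typing import Any, Callable


class EventSources:
    """Hands out one event source per session; sources are never shared."""

    def __init__(self, factory: Callable[[], Any]) -> None:
        self._factory = factory
        self._sources: dict[str, Any] = {}

    def for_session(self, session_id: str) -> Any:
        # Create lazily, then keep returning the same source for this session.
        if session_id not in self._sources:
            self._sources[session_id] = self._factory()
        return self._sources[session_id]

    def close(self, session_id: str) -> None:
        # Drop the source so a later session with the same id starts clean.
        self._sources.pop(session_id, None)


def quartz_private_source() -> Any:
    """Possible macOS factory (needs pyobjc-framework-Quartz)."""
    import Quartz

    return Quartz.CGEventSourceCreate(Quartz.kCGEventSourceStatePrivate)
```

On a Mac, `EventSources(quartz_private_source)` would give each session its own private-state source. Any other factory, such as `object`, works for testing the bookkeeping alone.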

Demo

Task: Search Google for the population of Tokyo, calculate what percentage that is of the world population (8.1 billion), write the result in Notes, then open the Wikipedia page for Tokyo in Safari and find the mayor's name. Only CUA tools — no other tools allowed.

Demo video

Result from Claude:

  1. Searched Google for "population of Tokyo" — found approximately 14.1–14.2 million residents

  2. Calculated the percentage: 14.15 million / 8.1 billion = 0.17% of the world's population

  3. Wrote the result in Notes — created a new note titled "Tokyo Population vs World Population"

  4. Opened the Wikipedia page for Tokyo in Safari and found that the leader of Tokyo is Governor Yuriko Koike (Tokyo has a Governor rather than a mayor, since it's officially the Tokyo Metropolitan Prefecture)


Most computer use agents take over your screen. They grab your cursor, steal window focus, and lock you out while they work. You sit and watch.

mac-cua works differently. It sends input events directly to target processes using CGEventPostToPid — a macOS API that delivers clicks, keystrokes, and gestures to a specific app without moving your cursor or activating any window. The AI works in the background. You keep working in the foreground. At the same time. On the same machine.
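A minimal sketch of a process-targeted left click with PyObjC's Quartz bindings, assuming pyobjc-framework-Quartz is installed and that coordinates are global screen points. `click_plan` and `post_click` are hypothetical helpers, not mac-cua's input.py. Following the confirmed-delivery-pipeline branch, the sequence contains only down/up events (no mouse-move). Each pair carries a click-state value, so a second pair reads as a double-click:

```python
Point = tuple[float, float]


def click_plan(x: float, y: float, click_count: int = 1) -> list[tuple[str, Point, int]]:
    """Return (kind, point, click_state) tuples for an n-click at (x, y).

    Only down/up pairs are produced; the sequence contains no mouse-move event.
    """
    if click_count < 1:
        raise ValueError("click_count must be >= 1")
    plan: list[tuple[str, Point, int]] = []
    for state in range(1, click_count + 1):
        plan.append(("down", (x, y), state))
        plan.append(("up", (x, y), state))
    return plan


def post_click(pid: int, x: float, y: float, click_count: int = 1) -> None:
    """Post a left click to one process (macOS only, needs pyobjc-framework-Quartz)."""
    import Quartz  # imported lazily so click_plan stays usable on any platform

    kinds = {"down": Quartz.kCGEventLeftMouseDown, "up": Quartz.kCGEventLeftMouseUp}
    for kind, point, state in click_plan(x, y, click_count):
        event = Quartz.CGEventCreateMouseEvent(None, kinds[kind], point, Quartz.kCGMouseButtonLeft)
        # Click state marks this as the 1st, 2nd, ... click so apps can detect double-clicks.
        Quartz.CGEventSetIntegerValueField(event, Quartz.kCGMouseEventClickState, state)
        # Deliver to the target process only, not the global event stream.
        Quartz.CGEventPostToPid(pid, event)
```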

OpenAI and Perplexity both shipped computer use agents in April 2026, each locked to its own platform and behind a paywall. mac-cua offers the same capability as an open MCP tool. Plug it into Claude Code, Cursor, Codex, or any MCP client. Free, open source, Apache 2.0.

Background-First

This is the core idea behind mac-cua, and it influences every design decision.

  Traditional computer use agent:            mac-cua:

  +----------------------------------+       +----------------------------------+
  |  YOUR SCREEN                     |       |  YOUR SCREEN                     |
  |                                  |       |                                  |
  |  +----------------------------+  |       |  +----------------------------+  |
  |  |                            |  |       |  |                            |  |
  |  |  [Agent controls this]     |  |       |  |  You're working here.      |  |
  |  |  You're locked out.        |  |       |  |  Writing code, browsing,   |  |
  |  |  Cursor hijacked.          |  |       |  |  whatever you want.        |  |
  |  |  Focus stolen.             |  |       |  |                            |  |
  |  |  Don't touch anything.     |  |       |  |  Your cursor. Your focus.  |  |
  |  |                            |  |       |  |                            |  |
  |  +----------------------------+  |       |  +----------------------------+  |
  |                                  |       |                                  |
  |  Cursor: [Agent's]               |       |  Meanwhile, in the background:   |
  |  Focus:  [Agent's]               |       |  mac-cua clicks, types, scrolls  |
  |  You:    Watching.               |       |  in Safari, Music, Finder...     |
  +----------------------------------+       +----------------------------------+

How it stays invisible

| What | How |
|---|---|
| Mouse clicks | CGEventPostToPid sends click events to the target PID. Your cursor doesn't move. |
| Keyboard input | Key events are posted to the target process, not the global event stream. |
| Window focus | mac-cua reads window state without activating windows. Temporary activation happens only when strictly required (e.g., key-window targeting) and is released immediately. |
| Screenshots | GPU-accelerated ScreenCaptureKit captures specific windows by ID, even when the window is behind other windows. |
| AX tree reads | Accessibility API queries are read-only and non-intrusive. They don't trigger any visual changes. |

A note on focus

Most operations are fully invisible, but a few macOS APIs have limitations that may cause a brief focus flash:

  • Launching an app — macOS activates apps when they start; mac-cua yields focus back immediately

  • Scroll events — some apps require momentary focus to receive scroll input

  • Key-window targeting — certain actions need the window to be key window briefly

These flashes are sub-second and mac-cua restores your previous focus automatically. The vast majority of interactions — clicks, typing, value setting, screenshots, tree reads — are completely invisible.
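Restoring focus after one of these flashes amounts to remembering which app was frontmost and handing focus back if it changed. A minimal sketch as a context manager, assuming two injected callables. On macOS they could be backed by `NSWorkspace.sharedWorkspace().frontmostApplication().processIdentifier()` and `NSRunningApplication.runningApplicationWithProcessIdentifier_(pid).activateWithOptions_(0)`. `restore_focus` is a hypothetical name, not mac-cua's focus.py API:

```python
from contextlib import contextmanager
from typing import Callable, Iterator, Optional


@contextmanager
def restore_focus(
    get_frontmost: Callable[[], Optional[int]],
    activate: Callable[[int], None],
) -> Iterator[None]:
    """Remember the frontmost app's PID and re-activate it when the block exits."""
    previous = get_frontmost()
    try:
        yield
    finally:
        # Only hand focus back if something inside the block actually took it.
        if previous is not None and get_frontmost() != previous:
            activate(previous)


# Usage with stand-ins for the real frontmost-app queries.
state = {"front": 100}


def fake_activate(pid: int) -> None:
    state["front"] = pid


with restore_focus(lambda: state["front"], fake_activate):
    state["front"] = 200  # e.g. launching an app stole focus
print(state["front"])  # 100
```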

What this means in practice

  • You can browse the web while mac-cua fills out a form in another app

  • You can write code while mac-cua navigates System Settings to change a preference

  • You can be in a video call while mac-cua organizes files in Finder

  • The agent never interrupts you. If a conflict arises, you win — mac-cua detects user interruption and backs off
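One way to express "you win" is a guard that runs before every agent action: if the app being automated is frontmost and the user touched the keyboard or mouse recently, stop and report instead of acting. A sketch, assuming a `probe` callable that returns the frontmost PID and the seconds since the last user input. On macOS that idle time could come from `Quartz.CGEventSourceSecondsSinceLastEventType(Quartz.kCGEventSourceStateHIDSystemState, Quartz.kCGAnyInputEventType)`. The one-second grace period and all names here are assumptions:

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class UserInterrupted(RuntimeError):
    """Raised so the caller can back off and tell the LLM the user took over."""


def user_conflict(target_pid: int, frontmost_pid: int, idle_seconds: float, grace: float = 1.0) -> bool:
    """True when the user is in the app being automated and used input recently."""
    return frontmost_pid == target_pid and idle_seconds < grace


def guarded_step(target_pid: int, probe: Callable[[], tuple[int, float]], action: Callable[[], T]) -> T:
    """Run one agent action unless the user is actively working in the same app."""
    frontmost_pid, idle_seconds = probe()
    if user_conflict(target_pid, frontmost_pid, idle_seconds):
        raise UserInterrupted(f"user is active in pid {target_pid}; yielding")
    return action()
```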

Why mac-cua?

| | Codex CUA | Perplexity Computer | mac-cua |
|---|---|---|---|
| Cost | $20–200/mo (ChatGPT tier) | $200/mo (Max only) | Free |
| Source | Closed | Closed | Open (Apache 2.0) |
| LLM | GPT only | Perplexity-routed | Any model |
| Protocol | Proprietary (in-app) | Proprietary (in-app) | MCP (open standard) |
| Integration | Codex app only | Perplexity app only | Claude Code, Cursor, VS Code, Codex, Zed, any MCP client |
| Background mode | Yes (virtual cursor) | Unknown | Yes (CGEventPostToPid) |
| Accessibility API | Yes (AX tree + screenshots) | Screenshots + AppleScript | Yes (AX tree + screenshots) |
| Platform | macOS only | macOS only | macOS |
| Availability | Not in EU/UK/CH | Waitlist (Max subscribers) | Everyone, everywhere |

Quickstart

Prerequisites

  • macOS 13+ (Ventura or later)

  • Python 3.13+

  • uv package manager

Install

git clone https://github.com/hyprcat/mac-cua.git
cd mac-cua
uv sync

Run

uv run python main.py

On first launch, macOS will prompt for two permissions:

| Permission | Why |
|---|---|
| Accessibility | Read UI element trees and perform actions on elements |
| Screen Recording | Capture window screenshots without activating windows |

Grant both, and the MCP server starts on stdio — ready for your AI tool to connect.

Set Up Your AI Tool

mac-cua is a standard MCP stdio server. It works with any tool that supports the Model Context Protocol — no plugins, no extensions, just config.

Note: Replace /path/to/mac-cua with the actual path where you cloned the repo.

Claude Code

Option A — CLI command (recommended):

claude mcp add mac-cua -- uv run --directory /path/to/mac-cua python main.py

Option B — Manual config in ~/.claude.json or project .mcp.json:

{
  "mcpServers": {
    "mac-cua": {
      "command": "uv",
      "args": ["run", "--directory", "/path/to/mac-cua", "python", "main.py"]
    }
  }
}

Claude Desktop

Edit ~/Library/Application Support/Claude/claude_desktop_config.json:

{
  "mcpServers": {
    "mac-cua": {
      "command": "uv",
      "args": ["run", "--directory", "/path/to/mac-cua", "python", "main.py"]
    }
  }
}

Restart Claude Desktop after saving.

Cursor

Option A — Project-level: Create .cursor/mcp.json in your project root:

{
  "mcpServers": {
    "mac-cua": {
      "command": "uv",
      "args": ["run", "--directory", "/path/to/mac-cua", "python", "main.py"]
    }
  }
}

Option B — Global: Create ~/.cursor/mcp.json with the same content.

VS Code (GitHub Copilot)

Create .vscode/mcp.json in your project root:

{
  "servers": {
    "mac-cua": {
      "command": "uv",
      "args": ["run", "--directory", "/path/to/mac-cua", "python", "main.py"]
    }
  }
}

Requires the GitHub Copilot extension with MCP support enabled.

Windsurf

Open Windsurf Settings > MCP and add:

{
  "mcpServers": {
    "mac-cua": {
      "command": "uv",
      "args": ["run", "--directory", "/path/to/mac-cua", "python", "main.py"]
    }
  }
}

Or edit ~/.codeium/windsurf/mcp_config.json directly.

Codex

Create or edit ~/.codex/config.toml (Codex reads MCP servers from TOML):

[mcp_servers.mac-cua]
command = "uv"
args = ["run", "--directory", "/path/to/mac-cua", "python", "main.py"]

Amp

Create .amp/mcp.json in your project root (or ~/.amp/mcp.json globally):

{
  "mcpServers": {
    "mac-cua": {
      "command": "uv",
      "args": ["run", "--directory", "/path/to/mac-cua", "python", "main.py"]
    }
  }
}

Zed

Add to your Zed settings.json (Zed > Settings > Open Settings):

{
  "context_servers": {
    "mac-cua": {
      "command": {
        "path": "uv",
        "args": ["run", "--directory", "/path/to/mac-cua", "python", "main.py"]
      }
    }
  }
}

Cline

Open Cline settings in VS Code, navigate to MCP Servers, and add:

{
  "mcpServers": {
    "mac-cua": {
      "command": "uv",
      "args": ["run", "--directory", "/path/to/mac-cua", "python", "main.py"]
    }
  }
}

Other MCP clients

mac-cua is a standard MCP stdio server. Point your client at:

Command:  uv
Args:     run --directory /path/to/mac-cua python main.py
Protocol: stdio

No API keys, no accounts, no network calls. It runs locally on your Mac.
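For clients whose config is a JSON file with a top-level `mcpServers` object (or `servers`, as in VS Code's .vscode/mcp.json), the entry can be merged in without overwriting other servers or settings. A sketch using only the standard library. `add_mac_cua` is a hypothetical helper; it does not handle Zed's `context_servers` shape or Codex's TOML file. Files like ~/.claude.json hold other state, so keeping a backup before editing is sensible:

```python
import json
from pathlib import Path


def add_mac_cua(config_path: Path, repo_dir: str, servers_key: str = "mcpServers") -> dict:
    """Add or update the mac-cua entry in a JSON MCP config, keeping everything else.

    Pass servers_key="servers" for VS Code's .vscode/mcp.json.
    """
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    servers = config.setdefault(servers_key, {})
    servers["mac-cua"] = {
        "command": "uv",
        "args": ["run", "--directory", repo_dir, "python", "main.py"],
    }
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config


# Example: add_mac_cua(Path(".cursor/mcp.json"), "/path/to/mac-cua")
```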

How It Works

mac-cua reads apps through two complementary channels and acts through background-targeted input:

                        +-----------------------+
                        |      LLM Client       |
                        |  (Claude, GPT, etc.)  |
                        +-----------+-----------+
                                    |
                              MCP (stdio)
                                    |
                        +-----------+-----------+
                        |     mac-cua Server    |
                        +-----------+-----------+
                                    |
                    +---------------+---------------+
                    |                               |
          +---------+---------+           +---------+---------+
          |   Accessibility   |           |    Screenshots    |
          |    API (AXTree)   |           | (ScreenCaptureKit)|
          +---------+---------+           +---------+---------+
                    |                               |
          Structured element              Visual pixel-level
          tree with roles,                context via GPU-
          states, actions                 accelerated window
          (read-only, non-               capture (works even
           intrusive)                     behind other windows)
                    |                               |
                    +---------------+---------------+
                                    |
                        +-----------+-----------+
                        |   Background Input    |
                        |   CGEventPostToPid    |
                        |                       |
                        |  Your cursor: unmoved |
                        |  Your focus: untouched|
                        +-----------------------+

Every tool call returns a fresh snapshot — the accessibility tree and a screenshot together — so the LLM always sees the current state before deciding what to do next.
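Because each call produces a fresh snapshot, an element index only makes sense relative to the snapshot that produced it. A minimal sketch of enforcing that, assuming each snapshot carries an id and a map from element index to an opaque AX element handle. The names `Snapshot`, `take_snapshot`, `resolve`, and `StaleIndexError` are hypothetical:

```python
import itertools
from dataclasses import dataclass
from typing import Any, Mapping

_snapshot_ids = itertools.count(1)


class StaleIndexError(LookupError):
    """An element index was used against a snapshot that did not produce it."""


@dataclass
class Snapshot:
    snapshot_id: int
    elements: dict[str, Any]  # element_index -> AX element handle (opaque here)


def take_snapshot(elements: Mapping[str, Any]) -> Snapshot:
    """Capture a new snapshot; every call gets a fresh id."""
    return Snapshot(next(_snapshot_ids), dict(elements))


def resolve(current: Snapshot, snapshot_id: int, element_index: str) -> Any:
    """Look up an element, refusing indices that came from an older snapshot."""
    if snapshot_id != current.snapshot_id:
        raise StaleIndexError(
            f"index {element_index!r} is from snapshot {snapshot_id}, current is {current.snapshot_id}"
        )
    try:
        return current.elements[element_index]
    except KeyError:
        raise StaleIndexError(f"no element {element_index!r} in snapshot {snapshot_id}") from None
```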

Tools

9 MCP tools that cover the full range of desktop interaction — all operating in the background.

Discovery

| Tool | Description |
|---|---|
| list_apps | List running and recently used apps with bundle IDs and usage stats |
| get_app_state | Capture a window's accessibility tree + screenshot. Called each turn before interaction. |

Interaction

| Tool | Description |
|---|---|
| click | Click by element index or pixel coordinates. Supports double-click and right-click. All clicks are background-targeted. |
| type_text | Type literal text via background keyboard input; keys go to the target process, not your focused app |
| press_key | Send key combos in xdotool syntax (super+c, Return, Tab) to a specific process |
| set_value | Directly set an accessibility element's value; no focus or typing needed |
| scroll | Scroll a specific element by direction and page count |
| drag | Drag between two pixel coordinates |
| perform_secondary_action | Invoke non-primary AX actions (expand, collapse, zoom, raise) |
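press_key takes xdotool-style combos, which have to become a macOS virtual key code plus modifier flags. A sketch of that mapping with a deliberately tiny key table (ANSI layout only). It posts keyDown/keyUp with the modifiers carried as flags on both events, the compound-modifier approach described in the Branches table. `parse_combo`, `post_combo`, and the table contents are illustrative, not mac-cua's keys.py; posting requires macOS and pyobjc-framework-Quartz:

```python
# Modifier masks (kCGEventFlagMask* values from CoreGraphics' CGEventTypes.h).
MODIFIER_FLAGS = {
    "shift": 0x00020000,
    "ctrl": 0x00040000,
    "control": 0x00040000,
    "alt": 0x00080000,
    "option": 0x00080000,
    "super": 0x00100000,
    "cmd": 0x00100000,
    "command": 0x00100000,
}

# A small subset of macOS virtual key codes (kVK_* constants, ANSI layout), keyed in lowercase.
KEY_CODES = {
    "a": 0x00, "z": 0x06, "c": 0x08, "v": 0x09,
    "return": 0x24, "tab": 0x30, "space": 0x31, "escape": 0x35, "end": 0x77,
}


def parse_combo(combo: str) -> tuple[int, int]:
    """Parse an xdotool-style combo like 'super+c' into (virtual key code, modifier flags)."""
    *modifiers, key = combo.split("+")
    flags = 0
    for name in modifiers:
        mask = MODIFIER_FLAGS.get(name.lower())
        if mask is None:
            raise ValueError(f"unknown modifier: {name!r}")
        flags |= mask
    code = KEY_CODES.get(key.lower())
    if code is None:
        raise ValueError(f"unknown key: {key!r}")
    return code, flags


def post_combo(pid: int, combo: str) -> None:
    """Send keyDown/keyUp with modifiers as flags on both events (macOS only)."""
    import Quartz  # lazy import keeps parse_combo usable anywhere

    code, flags = parse_combo(combo)
    for key_down in (True, False):
        event = Quartz.CGEventCreateKeyboardEvent(None, code, key_down)
        # Compound modifiers: no separate flagsChanged events are posted.
        Quartz.CGEventSetFlags(event, flags)
        Quartz.CGEventPostToPid(pid, event)
```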

Reliability Hierarchy

When multiple tools could accomplish the same thing, prefer them in this order:

  Most reliable                                          Least reliable
  +-------------------+------------------+-----------------+------------------+
  | AX secondary      | set_value        | click by        | click by         |
  | action            |                  | element         | coordinates      |
  +-------------------+------------------+-----------------+------------------+

Example Workflow

Here's what a typical interaction looks like. Notice: every step happens in the background.

# 1. Discover what's running
list_apps()
# => Safari (running), Music (running), Finder (running), ...

# 2. Get the current state of Safari (screenshot + AX tree)
get_app_state(app="Safari")
# => You don't even see Safari activate. mac-cua reads it silently.

# 3. Click the URL bar (element 12 from the tree)
click(app="Safari", element_index="12")
# => Click delivered to Safari's process. Your cursor didn't move.

# 4. Set the URL
set_value(app="Safari", element_index="12", value="https://example.com")
# => Value set directly via AX API. No typing animation. No focus change.

# 5. Press Enter
press_key(app="Safari", key="Return")
# => Key event sent to Safari. You didn't feel a thing.

# 6. Verify it worked
get_app_state(app="Safari")
# => Fresh screenshot shows the page loaded. All in the background.
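The calls above are pseudocode for what an LLM client sends. To replay the same sequence from a script, here is a sketch using the official MCP Python SDK (the `mcp` package, installed separately). Tool and argument names are copied from the workflow above. Element "12" is only valid for the snapshot returned just before it, so a real client would read the index from the get_app_state result instead of hard-coding it:

```python
# The workflow above as (tool name, arguments) pairs.
WORKFLOW: list[tuple[str, dict[str, str]]] = [
    ("list_apps", {}),
    ("get_app_state", {"app": "Safari"}),
    ("click", {"app": "Safari", "element_index": "12"}),
    ("set_value", {"app": "Safari", "element_index": "12", "value": "https://example.com"}),
    ("press_key", {"app": "Safari", "key": "Return"}),
    ("get_app_state", {"app": "Safari"}),
]


async def run(repo_dir: str) -> None:
    """Launch mac-cua over stdio and replay WORKFLOW (needs the `mcp` package)."""
    from mcp import ClientSession, StdioServerParameters
    from mcp.client.stdio import stdio_client

    params = StdioServerParameters(
        command="uv", args=["run", "--directory", repo_dir, "python", "main.py"]
    )
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            for name, arguments in WORKFLOW:
                result = await session.call_tool(name, arguments=arguments)
                print(name, "error" if result.isError else "ok")
```

Run it with `asyncio.run(run("/path/to/mac-cua"))` after `import asyncio`.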

Architecture

Three clean layers. No framework magic.

Layer 1 ─ MCP Protocol         app/server.py         Thin. Validates, delegates, formats.
Layer 2 ─ Session Manager       app/session.py        Per-app lifecycle, snapshots, recovery.
Layer 3 ─ Platform Backend      app/_lib/             One module per macOS subsystem.

Platform Backend Modules

| Module | Responsibility |
|---|---|
| accessibility.py | AX tree walking, batch attribute reads, element actions |
| screenshot.py | CGWindowListCreateImage, window ID resolution |
| screen_capture.py | GPU-accelerated ScreenCaptureKit capture |
| input.py | CGEventPostToPid: background mouse, keyboard, typing |
| apps.py | NSWorkspace app discovery, launch, PID/AX caching |
| focus.py | Focus tracking, user interruption detection, conflict resolution |
| virtual_cursor.py | Background cursor, input strategy, app-type detection |
| selection.py | Text selection extraction and formatting |
| tree.py | AX tree → indexed text serialization |
| pruning.py | Smart tree pruning to fit LLM context windows |
| keys.py | xdotool syntax → CGKeyCode + modifier mapping |
| event_tap.py | CGEventTap wrapper with auto-reenable |
| safety.py | App/URL blocklists, SSRF protection |
| retry.py | Exponential backoff policies |
| elicitation.py | App approval store (session + persistent) |
| lifecycle.py | Per-turn cleanup and step tracking |
| errors.py | Typed exceptions and AX error code table |
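retry.py is described only as "exponential backoff policies". A generic sketch of such a policy, assuming the defaults shown (0.1 s base, doubling, capped at 2 s) and an injectable `sleep` so it can be tested without waiting. `backoff_delays` and `retry` are hypothetical names, not mac-cua's API:

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def backoff_delays(attempts: int, base: float = 0.1, factor: float = 2.0, cap: float = 2.0) -> list[float]:
    """Delays between attempts: base, base*factor, base*factor**2, ... each capped at `cap`."""
    return [min(base * factor**i, cap) for i in range(attempts - 1)]


def retry(
    fn: Callable[[], T],
    attempts: int = 4,
    retry_on: tuple[type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
    **backoff: float,
) -> T:
    """Call fn until it succeeds or attempts run out, sleeping with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    delays = backoff_delays(attempts, **backoff)
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(delays[attempt])
    raise AssertionError("unreachable")
```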

Key Design Decisions

  • CGEventPostToPid, never CGEventPost: all input is process-targeted. The global event stream (your cursor, your keyboard) is never touched

  • Window capture without activation: ScreenCaptureKit captures by window ID, even if the window is fully occluded

  • User interruption detection: if you start using an app the agent is working in, mac-cua detects the conflict and yields to you

  • Snapshot-local indices: element indices are valid only for the snapshot that produced them; no stale references

  • Cross-app robustness: detects and adapts to Native Cocoa, Electron, Safari, Chrome, Java, and Qt apps

  • Event-driven settling: wait_for_settle with per-tool timeouts and debounce, not fixed sleep() calls

  • Per-app guidance: custom operational hints per bundle ID (e.g., app/guidance/com.apple.Music.md)
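Event-driven settling means waiting until UI changes stop arriving for a short quiet period, with an upper time limit, instead of sleeping for a fixed time. A sketch of such a debounced wait, assuming something (for example an AX observer callback) calls `notify()` on every UI change. `Settler` and its `quiet`/`timeout` parameters are hypothetical and not the signature of mac-cua's `wait_for_settle`:

```python
import threading
import time


class Settler:
    """Tracks UI-change notifications and waits until they go quiet."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._last_change = time.monotonic()

    def notify(self) -> None:
        """Record a UI change (call from the AX notification callback)."""
        with self._cond:
            self._last_change = time.monotonic()
            self._cond.notify_all()

    def wait_for_settle(self, quiet: float, timeout: float) -> bool:
        """Return True once no change has arrived for `quiet` seconds, or False on timeout."""
        deadline = time.monotonic() + timeout
        with self._cond:
            while True:
                now = time.monotonic()
                settle_at = self._last_change + quiet
                if now >= settle_at:
                    return True
                if now >= deadline:
                    return False
                # Wake on the next change, or when either time limit is reached.
                self._cond.wait(min(settle_at, deadline) - now)
```

Each new notification pushes `settle_at` forward, which is the debounce. The timeout bounds the total wait for apps that never stop updating.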

Cross-App Test Results

Tested across 4 apps and all 9 tools. ~30 tool invocations, zero crashes, zero focus theft, zero cursor hijacking.

App Compatibility

| App | Type | AX Tree | Notes |
|---|---|---|---|
| TextEdit | Native Cocoa | Rich (50+ elements) | All 9 tools work. Best overall compatibility. |
| VS Code | Electron | Very rich (900+ elements) | click, get_app_state, press_key work. set_value rejected (Electron limitation). |
| Terminal | Native Cocoa | Sparse | type_text and press_key work. Limited AX exposure (shell text area only). |
| Finder | Native Cocoa | Sparse (desktop) | get_app_state works. Desktop-only view has minimal interactive elements. |
| Macs Fan Control | Qt | None | get_app_state fails gracefully. AX window not resolvable. |

Tool Results

| Tool | Native Cocoa | Electron | Notes |
|---|---|---|---|
| list_apps | PASS | PASS | Returns all running apps with window IDs, PIDs, bounds |
| get_app_state | PASS | PASS | AX tree + screenshot on every call. Fails gracefully on Qt apps with no AX |
| click | PASS | PASS | Element clicks work cross-app. Double-click fails on Electron |
| type_text | PASS | PASS | Delivered to the correct process even with multiple apps. macOS auto-correct can mangle text |
| press_key | PASS | PASS | Cmd+A, Cmd+Z, Shift+End all land. Some complex shortcuts don't trigger in background |
| set_value | PASS | FAIL | Instant on Cocoa (replaced entire documents). Electron rejects AX value setting |
| scroll | PASS | PARTIAL | Works on Cocoa elements. Electron webviews block AX scroll |
| drag | PASS | | Coordinate-based; worked for precise UI manipulation (ruler markers) |
| perform_secondary_action | PASS | | Zoom and fullscreen work. Raise is blocked by design (it would steal focus) |

What works well

  • AX element clicks are the most reliable cross-app interaction

  • set_value is instant on Native Cocoa — replaces entire document contents without typing

  • type_text delivers to the correct process even with multiple apps running

  • press_key handles complex modifiers (Cmd+A, Cmd+Z, Shift+End) reliably

  • Screenshots capture every window correctly regardless of visibility

Known Limitations

  • Electron set_value — VS Code and other Electron apps reject AX value setting. Use type_text instead

  • Electron double-click — click_count > 1 fails on some Electron elements

  • Electron scroll — webview elements don't expose scroll to AX. Coordinate-based scrolling needed

  • macOS auto-correct — type_text triggers the system spell checker, which can mangle text

  • Qt apps — apps with zero AX exposure (like Macs Fan Control) can't be automated

  • Background shortcuts — some complex key combos don't trigger when sent via background events

Supported App Types

mac-cua works with any macOS application that exposes an accessibility tree:

  • Native Cocoa — Finder, Safari, Music, System Settings, Notes, Calendar, TextEdit

  • Electron — VS Code, Slack, Discord, Notion (click + type_text work; set_value limited)

  • Chromium — Chrome, Arc, Edge

  • Java — JetBrains IDEs (IntelliJ, PyCharm, WebStorm)

  • Qt — Limited. Falls back to screenshot-based coordinate interaction

Safety

  • App blocklist — prevents interaction with system security processes (Keychain, login)

  • URL blocklist — SSRF protection for web-based interactions

  • App approval flow — session and persistent approval gates before controlling new apps

  • Step limits — per-turn cleanup and step tracking to prevent runaway loops

  • Background-only — cannot inject events globally; input is always process-targeted

  • User wins — interruption detection yields control back to you immediately
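A URL blocklist with SSRF protection usually means refusing non-web schemes and addresses that point back into the machine or the local network. A minimal standard-library sketch of that check. `is_url_allowed` is hypothetical, not mac-cua's safety.py. It does not resolve hostnames, so a public name that resolves to a private address (DNS rebinding) would still pass; a production guard would also check resolved addresses:

```python
import ipaddress
from urllib.parse import urlsplit


def is_url_allowed(url: str) -> bool:
    """Allow only http(s) URLs whose host is not loopback, private, or link-local."""
    parts = urlsplit(url)
    if parts.scheme.lower() not in {"http", "https"}:
        return False  # file:, javascript:, data:, etc.
    host = parts.hostname  # already lowercased; IPv6 brackets stripped
    if not host:
        return False
    if host == "localhost" or host.endswith(".localhost"):
        return False
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return True  # a hostname; DNS resolution is out of scope for this sketch
    return not (
        ip.is_private
        or ip.is_loopback
        or ip.is_link_local
        or ip.is_reserved
        or ip.is_multicast
        or ip.is_unspecified
    )
```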

Development

# Install dependencies
uv sync

# Run tests
uv run pytest

# Run a specific test
uv run pytest tests/test_safety.py -v

# Run the server
uv run python main.py

Project Structure

mac-cua/
  main.py                  Entry point, permissions, logging
  app/
    server.py              MCP protocol layer
    session.py             Session lifecycle & orchestration
    response.py            Response dataclasses
    guidance/              Per-app operational hints
    _lib/                  Platform backend (17 modules, ~7300 LOC)
  tests/                   136 tests
  specs/                   Tool reference docs

Contributing

Contributions are welcome! mac-cua is a community-driven project and we'd love your help.

  1. Fork the repo

  2. Create a branch (git checkout -b my-feature)

  3. Make your changes — add tests if applicable

  4. Run the test suite (uv run pytest)

  5. Open a Pull Request

Whether it's a bug fix, new app guidance file, documentation improvement, or a whole new feature — all contributions are appreciated.

If you find mac-cua useful, consider giving it a star. It helps others discover the project.

License

Apache License 2.0 — use it, fork it, ship it, sell it. No strings attached.

Acknowledgments

mac-cua was inspired by Codex computer use (OpenAI, April 2026) and Personal Computer (Perplexity, April 2026). Both showed that background desktop automation is the future — mac-cua brings that capability to everyone as an open-source MCP tool that works with any LLM.

Built with MCP for universal LLM compatibility, PyObjC for macOS integration, and ScreenCaptureKit for GPU-accelerated background capture.
