macos-control-mcp

by PeterHdd

Give AI agents eyes and hands on macOS.

What is this?

An MCP server that lets AI agents see your screen, read text on it, and interact — click, type, scroll — just like a human sitting at the keyboard. Unlike blind script runners, this MCP gives agents state awareness: they screenshot the screen, OCR it to get text with pixel coordinates, then click exactly where they need to.

The See-Think-Act Loop

┌─────────────────────────────────────────────────┐
│                                                 │
│   1. SEE        screenshot / screen_ocr         │
│      ↓          "What's on the screen?"         │
│                                                 │
│   2. THINK      AI reasons about the content    │
│      ↓          "I need to click the Save btn"  │
│                                                 │
│   3. ACT        click_at / type_text / press_key│
│                 "Click at (425, 300)"           │
│                                                 │
│      ↻ repeat                                   │
└─────────────────────────────────────────────────┘

This is what makes it powerful: the agent sees the result of every action and can course-correct, retry, or move on — just like you would.
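
The loop above can be sketched in a few lines of TypeScript. This is only an illustration under stated assumptions: the `Tools` interface, the `TextElement` shape, and the fake screen are hypothetical stand-ins for `screen_ocr` and `click_at`, not the server's actual result format. Real tool calls would also be asynchronous; the sketch stays synchronous to keep it short.

```typescript
// Hypothetical shape of one OCR result; the real tool output may differ.
interface TextElement { text: string; x: number; y: number }

// Stand-ins for the screen_ocr and click_at tools, injected so the loop can run offline.
interface Tools {
  screenOcr(): TextElement[];
  clickAt(x: number, y: number): void;
}

// Repeats see -> think -> act until goalText is visible or the step budget runs out.
function clickUntilVisible(tools: Tools, buttonText: string, goalText: string, maxSteps = 5): boolean {
  for (let step = 0; step < maxSteps; step++) {
    const elements = tools.screenOcr();                            // SEE
    if (elements.some(e => e.text.includes(goalText))) return true;
    const target = elements.find(e => e.text === buttonText);      // THINK
    if (!target) return false;                                     // nothing to act on
    tools.clickAt(target.x, target.y);                             // ACT
  }
  return false;
}

// A fake screen: clicking "Save" at (425, 300) makes "Saved" appear on the next OCR pass.
function makeFakeTools(): Tools & { clicks: Array<[number, number]> } {
  let saved = false;
  const clicks: Array<[number, number]> = [];
  return {
    clicks,
    screenOcr: () => (saved ? [{ text: "Saved", x: 425, y: 300 }] : [{ text: "Save", x: 425, y: 300 }]),
    clickAt: (x, y) => {
      clicks.push([x, y]);
      if (x === 425 && y === 300) saved = true;
    },
  };
}

const fake = makeFakeTools();
const reachedGoal = clickUntilVisible(fake, "Save", "Saved");
```

The step budget matters: without it, an agent that keeps clicking a button that never changes the screen would loop forever.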

Quick Start

No install needed — run directly with npx:

npx -y macos-control-mcp

On first run, a Python virtual environment is created automatically at ~/.macos-control-mcp/.venv with the Python bindings for Apple's Vision and Quartz frameworks (pyobjc-framework-Vision and pyobjc-framework-Quartz). Setup takes about 60 seconds the first time, and the environment persists across updates.

Video Showcasing the MCP:

https://www.youtube.com/watch?v=aswlsElHV5o

Configure Your AI Client

All clients use the same command: npx -y macos-control-mcp

For Claude Desktop, edit ~/Library/Application Support/Claude/claude_desktop_config.json:

{
  "mcpServers": {
    "macos-control": {
      "command": "npx",
      "args": ["-y", "macos-control-mcp"]
    }
  }
}

Restart Claude Desktop after saving.

For Claude Code, run:

claude mcp add macos-control -- npx -y macos-control-mcp

Add to .vscode/mcp.json in your workspace:

{
  "servers": {
    "macos-control": {
      "command": "npx",
      "args": ["-y", "macos-control-mcp"]
    }
  }
}

Add to .cursor/mcp.json in your project:

{
  "mcpServers": {
    "macos-control": {
      "command": "npx",
      "args": ["-y", "macos-control-mcp"]
    }
  }
}

Open Cline extension settings → MCP Servers → Add:

{
  "macos-control": {
    "command": "npx",
    "args": ["-y", "macos-control-mcp"]
  }
}

Add to ~/.codeium/windsurf/mcp_config.json:

{
  "mcpServers": {
    "macos-control": {
      "command": "npx",
      "args": ["-y", "macos-control-mcp"]
    }
  }
}
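
The snippets above differ only in how the server entry is wrapped. As a rough sketch, the helper below generates each variant from one shared entry. The `Client` labels and the `configFor` function are hypothetical names introduced for illustration; only the JSON shapes come from the snippets above.

```typescript
type ServerEntry = { command: string; args: string[] };

// Hypothetical labels for the clients covered above.
type Client = "claude-desktop" | "vscode" | "cursor" | "cline" | "windsurf";

// The entry is identical for every client.
const entry: ServerEntry = { command: "npx", args: ["-y", "macos-control-mcp"] };

// Wraps the shared entry in the top-level key each client's snippet uses.
function configFor(client: Client): Record<string, unknown> {
  const servers = { "macos-control": entry };
  switch (client) {
    case "vscode":
      return { servers };              // .vscode/mcp.json uses "servers"
    case "cline":
      return servers;                  // the Cline snippet has no wrapper key
    default:
      return { mcpServers: servers };  // Claude Desktop, Cursor, Windsurf
  }
}

const vscodeJson = JSON.stringify(configFor("vscode"), null, 2);
```

If a config file already lists other servers, merge the `macos-control` entry into the existing object rather than overwriting the file.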

Permissions

macOS requires two permissions for full functionality:

Screen Recording — for screenshots and OCR
Accessibility — for clicking, typing, and reading UI elements

Go to System Settings → Privacy & Security and add your terminal app (Terminal, iTerm2, VS Code, etc.) to both lists. You'll be prompted on first use.

Tools (19)

See the screen

Tool	Description
`screenshot`	Capture full screen or app window as JPEG
`screen_ocr`	OCR the screen — returns text elements with pixel coordinates
`find_text_on_screen`	Find specific text and get clickable x,y coordinates
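
To show how text with pixel coordinates turns into a click target, here is a minimal sketch of a lookup over OCR results. The `OcrElement` shape (top-left corner plus width and height, in pixels) is an assumption made for this example; the source does not document the actual result format of `screen_ocr` or `find_text_on_screen`.

```typescript
// Assumed result shape: top-left corner plus size, all in pixels.
interface OcrElement { text: string; x: number; y: number; width: number; height: number }

// Returns the center of the first element whose text contains the query (case-insensitive).
function findClickPoint(elements: OcrElement[], query: string): { x: number; y: number } | undefined {
  const needle = query.toLowerCase();
  const hit = elements.find(e => e.text.toLowerCase().includes(needle));
  if (!hit) return undefined;
  return {
    x: Math.round(hit.x + hit.width / 2),
    y: Math.round(hit.y + hit.height / 2),
  };
}

// Made-up sample data for illustration.
const sampleScreen: OcrElement[] = [
  { text: "Email", x: 260, y: 240, width: 80, height: 20 },
  { text: "Submit", x: 310, y: 485, width: 80, height: 30 },
];

const emailPoint = findClickPoint(sampleScreen, "email");   // { x: 300, y: 250 }
const submitPoint = findClickPoint(sampleScreen, "Submit"); // { x: 350, y: 500 }
```

Clicking the center of the bounding box rather than its corner leaves some margin if the OCR box is slightly off.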

Interact with the screen

Tool	Description
`click_at`	Click at x,y coordinates (returns screenshot)
`double_click_at`	Double-click at x,y (returns screenshot)
`type_text`	Type text into the frontmost app
`press_key`	Press key combos (Cmd+S, Ctrl+C, etc.)
`scroll`	Scroll up/down/left/right
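
MCP clients invoke tools like these with a JSON-RPC `tools/call` request. The sketch below builds such requests by hand to show what goes over the wire. The argument names (`x`, `y`, `text`) are assumptions; check the server's tool schema for the real parameter names. In practice your MCP client constructs these messages for you.

```typescript
// Shape of an MCP tools/call request (JSON-RPC 2.0).
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

let nextId = 1;

// Builds a request with a fresh id; the caller is responsible for sending it.
function toolCall(name: string, args: Record<string, unknown>): ToolCallRequest {
  return { jsonrpc: "2.0", id: nextId++, method: "tools/call", params: { name, arguments: args } };
}

// Argument names here are illustrative guesses, not the documented schema.
const clickRequest = toolCall("click_at", { x: 425, y: 300 });
const typeRequest = toolCall("type_text", { text: "hello" });

// Over the stdio transport, each message is one line of JSON.
const wireLine = JSON.stringify(clickRequest) + "\n";
```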

App management

Tool	Description
`launch_app`	Open or focus an application
`list_running_apps`	List visible running apps

Accessibility tree

Tool	Description
`get_ui_elements`	Get accessibility tree of an app window
`click_element`	Click a named UI element (returns screenshot)

Browser automation

Tool	Description
`execute_javascript`	Run JavaScript in the active browser tab
`get_page_text`	Get all visible text from the page (faster than OCR)
`click_web_element`	Click element by CSS selector (instant, precise)
`fill_form_field`	Fill a form field by CSS selector

Utilities

Tool	Description
`open_url`	Open URL in Safari or Chrome
`get_clipboard`	Read clipboard contents
`set_clipboard`	Write to clipboard

Example Workflows

Fill out a web form

You: "Go to example.com/signup and fill in my details"

Agent:
1. open_url("https://example.com/signup")
2. screenshot() → sees the form
3. screen_ocr() → finds "Email" field at (300, 250)
4. click_at(300, 250) → clicks the email field
5. type_text("user@example.com")
6. find_text_on_screen("Submit") → gets button coordinates
7. click_at(350, 500) → submits the form
8. screenshot() → confirms success

Navigate an unfamiliar app

You: "Change the font size to 16 in TextEdit"

Agent:
1. launch_app("TextEdit")
2. screenshot() → sees the app
3. get_ui_elements("TextEdit") → finds menu items
4. press_key("t", ["command"]) → opens Fonts panel
5. screenshot() → sees the font panel
6. find_text_on_screen("Size") → locates the size field
7. click_at(x, y) → clicks size field
8. type_text("16")
9. press_key("return")
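
The workflows write key presses as a key plus a list of modifiers, while the tool table describes combos like Cmd+S. A small parser can bridge the two notations. This is a sketch: `parseCombo` is a hypothetical helper, and only the `"command"` and `"shift"` modifier names appear in the examples here; `"control"` and `"option"` are assumed to follow the same pattern.

```typescript
// Maps common shorthand to modifier names. Only "command" and "shift" are
// confirmed by the workflow examples; the other names are assumptions.
const MODIFIER_ALIASES: Record<string, string> = {
  cmd: "command", command: "command",
  ctrl: "control", control: "control",
  opt: "option", alt: "option", option: "option",
  shift: "shift",
};

// Splits "Cmd+Shift+N" into { key: "n", modifiers: ["command", "shift"] }.
function parseCombo(combo: string): { key: string; modifiers: string[] } {
  const parts = combo.split("+").map(p => p.trim().toLowerCase()).filter(p => p.length > 0);
  if (parts.length === 0) throw new Error(`Empty key combo: "${combo}"`);
  const key = parts[parts.length - 1];
  const modifiers = parts.slice(0, -1).map(p => {
    const modifier = MODIFIER_ALIASES[p];
    if (!modifier) throw new Error(`Unknown modifier: "${p}"`);
    return modifier;
  });
  return { key, modifiers };
}

const save = parseCombo("Cmd+S");             // { key: "s", modifiers: ["command"] }
const newFolder = parseCombo("Cmd+Shift+N");  // { key: "n", modifiers: ["command", "shift"] }
const enter = parseCombo("return");           // { key: "return", modifiers: [] }
```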

Extract info from an email

You: "Copy the tracking number from the shipping email in Mail"

Agent:
1. launch_app("Mail")
2. screenshot() → sees the inbox
3. find_text_on_screen("Your order has shipped") → locates the email
4. click_at(x, y) → opens the email
5. screenshot() → sees the email content
6. screen_ocr() → extracts all text including tracking number
7. set_clipboard("1Z999AA10123456784") → copies to clipboard

Take a screenshot and describe what's on screen

You: "What do I have open right now?"

Agent:
1. screenshot() → captures the full screen
2. Describes what it sees: apps, windows, content
3. list_running_apps() → ["Safari", "Slack", "VS Code", "Spotify"]
4. "You have Safari open on Twitter, Slack with 3 unread messages, and VS Code editing index.ts"

Debug a UI issue

You: "The submit button on this page isn't working, can you check what's happening?"

Agent:
1. screenshot() → sees the page
2. find_text_on_screen("Submit") → locates button at (500, 600)
3. get_ui_elements("Safari") → inspects accessibility tree
4. "The button has aria-disabled=true. There's a required field 'Phone' that's empty."
5. find_text_on_screen("Phone") → locates the field
6. click_at(x, y) → clicks the field
7. type_text("+1 555-0123")
8. click_at(500, 600) → clicks Submit again
9. screenshot() → confirms it worked

Control Spotify

You: "Pause the music and turn the volume down"

Agent:
1. launch_app("Spotify")
2. screenshot() → sees Spotify is playing
3. find_text_on_screen("Pause") → locates pause button
4. click_at(x, y) → pauses playback
5. find_text_on_screen("Volume") → locates volume slider area
6. click_at(x, y) → adjusts volume
7. screenshot() → confirms paused and volume lowered

Work with Finder

You: "Create a new folder on my Desktop called 'Project Assets'"

Agent:
1. launch_app("Finder")
2. press_key("d", ["command", "shift"]) → opens Desktop
3. screenshot() → sees Desktop in Finder
4. press_key("n", ["command", "shift"]) → creates new folder
5. type_text("Project Assets")
6. press_key("return")
7. screenshot() → confirms folder created

Send a message in Slack

You: "Send 'build is green, ready to deploy' in the #engineering channel on Slack"

Agent:
1. launch_app("Slack")
2. screenshot() → sees Slack
3. press_key("k", ["command"]) → opens Quick Switcher
4. type_text("engineering")
5. press_key("return") → opens #engineering
6. screenshot() → confirms channel is open
7. click_at(x, y) → clicks message input
8. type_text("build is green, ready to deploy")
9. press_key("return") → sends message
10. screenshot() → confirms sent

Research and copy data from a website

You: "Look up the current price of AAPL on Google Finance and copy it"

Agent:
1. open_url("https://google.com/finance/quote/AAPL:NASDAQ")
2. screenshot() → sees the page loading
3. screen_ocr() → reads all text on the page
4. Finds the price: "$187.42"
5. set_clipboard("$187.42")
6. "Copied AAPL price $187.42 to your clipboard"

Multi-app workflow

You: "Take what's in my clipboard, search for it in Safari, and screenshot the results"

Agent:
1. get_clipboard() → "best mechanical keyboards 2025"
2. launch_app("Safari")
3. press_key("l", ["command"]) → focuses address bar
4. type_text("best mechanical keyboards 2025")
5. press_key("return") → searches
6. screenshot() → captures the search results
7. "Here are the search results for 'best mechanical keyboards 2025'"

Navigate System Settings

You: "Turn on Dark Mode"

Agent:
1. launch_app("System Settings")
2. screenshot() → sees System Settings
3. find_text_on_screen("Appearance") → locates the option
4. click_at(x, y) → opens Appearance settings
5. screenshot() → sees Light/Dark/Auto options
6. find_text_on_screen("Dark") → locates Dark mode option
7. click_at(x, y) → enables Dark Mode
8. screenshot() → confirms Dark Mode is on

Requirements

macOS 13+ (Ventura or later)
Node.js 18+
Python 3.9+ (needed for OCR and mouse control; on recent macOS versions, python3 typically comes with the Xcode Command Line Tools rather than the base system)

How It Works

Screenshots — native screencapture CLI
OCR — Apple Vision framework (VNRecognizeTextRequest) via Python bridge, returns text with bounding box coordinates
Mouse — Quartz Core Graphics events via Python bridge for precise pixel-level control
Keyboard & Apps — AppleScript via osascript for key presses, app launching, and UI element interaction
Python env — auto-managed venv at ~/.macos-control-mcp/.venv/ with only two packages (pyobjc-framework-Vision, pyobjc-framework-Quartz)
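
One detail behind the OCR step: Apple's Vision framework reports each text bounding box in normalized coordinates (0 to 1) with the origin at the lower-left corner, while clicks use a top-left origin. The sketch below shows the conversion such a bridge needs. It illustrates the math only and is not the project's actual code; `boxCenterInPixels` and `NormalizedBox` are hypothetical names.

```typescript
// A Vision-style bounding box: normalized to [0, 1], origin at the lower-left.
interface NormalizedBox { x: number; y: number; width: number; height: number }

// Converts a normalized box to the pixel center in a top-left-origin image.
function boxCenterInPixels(
  box: NormalizedBox,
  imageWidth: number,
  imageHeight: number,
): { x: number; y: number } {
  const centerX = box.x + box.width / 2;
  const centerYFromBottom = box.y + box.height / 2;
  return {
    x: Math.round(centerX * imageWidth),
    // Flip the vertical axis: Vision measures up from the bottom edge.
    y: Math.round((1 - centerYFromBottom) * imageHeight),
  };
}

// Made-up box for a 1000 x 1000 image.
const saveButton = boxCenterInPixels({ x: 0.4, y: 0.69, width: 0.05, height: 0.02 }, 1000, 1000);
// saveButton is { x: 425, y: 300 }
```

On Retina displays, screenshot pixels and the point coordinates used for mouse events can differ by a scale factor, so a real implementation may also need to divide by that factor.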

Troubleshooting

"Permission denied" or blank screenshots → Add your terminal to System Settings → Privacy & Security → Screen Recording

Clicks don't work → Add your terminal to System Settings → Privacy & Security → Accessibility

Python setup fails → Ensure python3 is in your PATH. Run python3 --version to check. Non-Python tools (keyboard, apps, clipboard) still work without it.

OCR returns empty results → Make sure Screen Recording permission is granted. Try a full-screen OCR first (without the app parameter).

"App not found" errors → Use the exact app name as shown in Activity Monitor (e.g., "Google Chrome" not "Chrome").

License

MIT
