Mac Commander MCP Server
🤖 Enable AI assistants to visually interact with your macOS applications
An MCP (Model Context Protocol) server that allows AI coding tools like Claude Desktop, Claude Code, and Cursor to see, control, and test macOS applications. Perfect for automated testing, UI debugging, and error detection.
🎆 What makes this special?
- ✨ Visual AI: Your AI can actually see what's on your screen
- 🐛 Error Detection: Automatically spots error dialogs and error messages on screen
- 🔄 Full Control: Click, type, and navigate just like a human
- 📱 App Testing: Perfect for testing mobile apps, desktop software, or web interfaces
- 🚀 Easy Setup: Get started in under 5 minutes
🚀 Quick Start
Option 1: Automated Install (Easiest)
The installer will:
- ✅ Check your Node.js version
- ✅ Install dependencies and build the project
- ✅ Show you the exact configuration to copy
- ✅ Offer to open System Settings for permissions
Option 2: Manual Install
✨ In 2 minutes, your AI will be able to see and control your Mac!
📚 Table of Contents
- ✨ Features
- 🛠️ Prerequisites
- 📦 Installation
- ⚙️ Configuration
- 🚀 Usage Examples
- 📈 Available Tools
- ⚠️ Troubleshooting
- 🔒 Security
✨ Features
Core Features
- 📸 Screenshot Capture: Take full screen or region-specific screenshots with PNG export
- 🖱️ Mouse Control: Click, double-click, and move the mouse cursor
- ⌨️ Keyboard Input: Type text and press key combinations
- 🪟 Window Management: List, find, focus, and get information about application windows
- 🔍 OCR Text Recognition: Extract and find text on screen using Tesseract.js
- ⚠️ Error Detection: Automatically detect error dialogs and messages using OCR
- 📏 Screen Information: Get display dimensions and coordinates
Advanced Automation Features (New!)
- 🎯 Drag & Drop: Smooth, customizable drag operations with duration control
- 📜 Advanced Scrolling: Pixel-perfect and smooth scrolling in any direction
- 🖱️ Mouse Gestures: Hover, click-and-hold, and relative mouse movements
- ⌨️ Human-like Typing: Realistic typing with variable delays and optional typos
- 🔄 Complex Interactions: Chain multiple actions for sophisticated automation
- ⏱️ Precise Timing: Control duration and delays for natural interactions
- 🎨 Smooth Animations: Easing functions for natural mouse movements
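In practice, the "Complex Interactions" item above means sending several tool calls one after another. The TypeScript sketch below shows what such a chain could look like on the wire, assuming the standard MCP JSON-RPC 2.0 `tools/call` request shape. The tool names and argument names come from the Available Tools list in this README; `buildToolCalls`, the coordinates, and the typed text are illustrative placeholders and are not part of this server.

```typescript
// One step in a chained interaction: an MCP tool name plus its arguments.
interface ToolStep {
  name: string;
  arguments: Record<string, unknown>;
}

// Shape of an MCP "tools/call" request (JSON-RPC 2.0).
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: ToolStep;
}

// Hypothetical helper: wrap each step in a request with an increasing id.
function buildToolCalls(steps: ToolStep[], firstId = 1): ToolCallRequest[] {
  return steps.map(
    (step, i): ToolCallRequest => ({
      jsonrpc: "2.0",
      id: firstId + i,
      method: "tools/call",
      params: step,
    }),
  );
}

// Illustrative chain: focus a window, fill a field, submit, then look for errors.
const formFlow = buildToolCalls([
  { name: "focus_window", arguments: { title: "Safari" } },
  { name: "click", arguments: { x: 400, y: 300 } },
  { name: "type_text", arguments: { text: "hello@example.com" } },
  { name: "key_press", arguments: { key: "Enter" } },
  { name: "wait", arguments: { milliseconds: 1000 } },
  { name: "check_for_errors", arguments: {} },
]);

for (const request of formFlow) {
  console.log(JSON.stringify(request));
}
```

Your AI client normally issues these requests itself when you describe the workflow in plain language; the sketch only makes the sequence visible.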
🛠️ Prerequisites
System Requirements
- macOS 13+ (Ventura or later)
- Node.js 18+ and npm
- AI client with MCP support:
  - Claude Desktop (recommended)
  - Claude Code
  - Cursor with MCP support
  - Any other MCP-compatible client
Required macOS Permissions
⚠️ Important: You must grant these permissions or the server won't work!
- Screen Recording Permission:
  - Go to System Settings → Privacy & Security → Screen Recording
  - Click the + button and add your AI client (Claude Desktop, Cursor, etc.)
  - ✅ Check the box next to your AI client
- Accessibility Permission:
  - Go to System Settings → Privacy & Security → Accessibility
  - Click the + button and add your AI client
  - ✅ Check the box next to your AI client
💡 Tip: You might need to restart your AI client after granting permissions.
📦 Installation
💿 Automated Installation
Recommended for beginners:
The installer script will guide you through everything!
🔧 Manual Installation
For advanced users:
Option 2: Global Install
🔧 Verify Installation
Run the test script to make sure everything works:
You should see the server start and respond to test commands.
⚙️ Configuration
🖥️ Claude Desktop Setup
- Open Claude Desktop and go to Settings (gear icon)
- Click on the Developer tab
- Click Edit Config to open the configuration file
- Add the MCP server configuration:
🚨 Important: Replace /FULL/PATH/TO/ with the actual absolute path to where you cloned this repository!
Example with real path:
- Save the file and restart Claude Desktop
- Start a new chat - you should see a 🔨 hammer icon indicating MCP is active
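For reference, the entry added in step 4 generally follows the `mcpServers` layout that Claude Desktop reads from its config file: a server key mapped to a `command` and its `args`. This sketch assembles that JSON from an absolute path. The server key `macos-simulator` is borrowed from the Cursor instructions and is an assumption (any unique key should do), and `makeMacCommanderConfig` is a hypothetical helper.

```typescript
// Minimal shape of the "mcpServers" section of an MCP client config file.
interface McpServerEntry {
  command: string;
  args: string[];
}

interface McpConfig {
  mcpServers: Record<string, McpServerEntry>;
}

// Hypothetical helper: build a config entry that launches the built server with Node.
// ASSUMPTION: the key "macos-simulator" mirrors the Cursor instructions; any unique key works.
function makeMacCommanderConfig(indexJsPath: string): McpConfig {
  // The setup steps require an absolute path, so reject relative ones early.
  if (!indexJsPath.startsWith("/")) {
    throw new Error(`Expected an absolute path, got: ${indexJsPath}`);
  }
  return {
    mcpServers: {
      "macos-simulator": { command: "node", args: [indexJsPath] },
    },
  };
}

const config = makeMacCommanderConfig("/Users/yourname/Developer/mac-commander/build/index.js");
console.log(JSON.stringify(config, null, 2));
```

If your config file already lists other servers, merge the new key into the existing `mcpServers` object rather than replacing it.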
💻 Claude Code Setup
- Navigate to your project folder in terminal
- Create or edit .claude/config.json in your project root:
- Start Claude Code in that project folder:
🎯 Cursor Setup
- Open Cursor and go to Settings → Cursor Settings → MCP
- Click "Add new global MCP server"
- Add the configuration:
  - Name: macos-simulator
  - Command: node
  - Args: /FULL/PATH/TO/mac-commander/build/index.js
Or create ~/.cursor/mcp.json:
🔍 Finding Your Full Path
Not sure what your full path is? Run this in the project directory:
Example output: /Users/yourname/Developer/mac-commander/build/index.js
Copy this exact path and use it in your configuration files above.
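If you prefer to compute the path programmatically, Node's built-in `path.resolve` produces the same absolute path. This is a minimal sketch: `resolveServerEntry` is a hypothetical helper name, and the existence check only tells you whether `npm run build` has produced the file yet.

```typescript
import { existsSync } from "node:fs";
import * as path from "node:path";

// Hypothetical helper: resolve build/index.js against a project directory
// (default: the current working directory) and check whether it has been built.
function resolveServerEntry(projectDir: string = process.cwd()): { fullPath: string; exists: boolean } {
  const fullPath = path.resolve(projectDir, "build", "index.js");
  return { fullPath, exists: existsSync(fullPath) };
}

const entry = resolveServerEntry();
console.log(entry.fullPath);
if (!entry.exists) {
  console.log("build/index.js not found yet; run npm install and npm run build first.");
}
```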
✅ Verify It's Working
After configuration:
- Restart your AI client (Claude Desktop, Cursor, etc.)
- Start a new chat/session
- Look for the MCP indicator (hammer icon in Claude Desktop)
- Try a test command: "Take a screenshot of my screen"
If it works, you'll see the AI successfully take a screenshot! 🎉
📈 Available Tools
screenshot
Capture a screenshot of the screen or a specific region.
Parameters:
- outputPath (optional): Path to save the screenshot as PNG
- region (optional): Object with x, y, width, height to capture a specific area
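The `region` object uses screen coordinates. A client may want to keep a requested region inside the display bounds reported by `get_screen_info` before sending it, and the sketch below does exactly that. `clampRegion` is a hypothetical helper, the 1440x900 display is an example value, and whether the server itself clamps or rejects out-of-bounds regions is not documented here.

```typescript
// Region object accepted by screenshot (and by other tools' optional region).
interface Region {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Hypothetical helper: move the origin onto the screen and shrink width/height
// so the region does not extend past the right or bottom edge.
function clampRegion(r: Region, screenWidth: number, screenHeight: number): Region {
  const x = Math.max(0, Math.min(r.x, screenWidth));
  const y = Math.max(0, Math.min(r.y, screenHeight));
  const width = Math.max(0, Math.min(r.width, screenWidth - x));
  const height = Math.max(0, Math.min(r.height, screenHeight - y));
  return { x, y, width, height };
}

// "Capture just the top-left corner of the screen" on a 1440x900 display.
const topLeft: Region = { x: 0, y: 0, width: 720, height: 450 };
const screenshotArgs = { outputPath: "/tmp/top-left.png", region: clampRegion(topLeft, 1440, 900) };
console.log(JSON.stringify(screenshotArgs));
```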
click
Click at specific coordinates on the screen.
Parameters:
- x: X coordinate
- y: Y coordinate
- button: "left", "right", or "middle" (default: "left")
- doubleClick: boolean (default: false)
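Spelled out with the documented defaults, the arguments of a `click` call look like this. `withClickDefaults` is a hypothetical client-side helper written for illustration; the server applies the defaults itself, so omitting `button` and `doubleClick` should give the same result.

```typescript
type MouseButton = "left" | "right" | "middle";

interface ClickArgs {
  x: number;
  y: number;
  button?: MouseButton;
  doubleClick?: boolean;
}

// Hypothetical helper: make the documented defaults explicit and reject bad coordinates.
function withClickDefaults(args: ClickArgs): Required<ClickArgs> {
  if (!Number.isFinite(args.x) || !Number.isFinite(args.y)) {
    throw new Error("x and y must be finite numbers");
  }
  return {
    x: args.x,
    y: args.y,
    button: args.button ?? "left",
    doubleClick: args.doubleClick ?? false,
  };
}

// "Click the button at coordinates 100, 200"
console.log(withClickDefaults({ x: 100, y: 200 }));
// "Double-click on the center of the screen" on an example 1440x900 display
console.log(withClickDefaults({ x: 720, y: 450, doubleClick: true }));
```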
type_text
Type text using the keyboard.
Parameters:
- text: Text to type
- delay: Delay between keystrokes in milliseconds (default: 50)
mouse_move
Move the mouse to specific coordinates.
Parameters:
- x: X coordinate
- y: Y coordinate
- smooth: Whether to use smooth movement (default: true)
key_press
Press a key or key combination.
Parameters:
- key: Key to press (e.g., "Enter", "Escape", "cmd+a")
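Key combinations such as "cmd+a" are written as `+`-joined key names. The sketch below shows one way to split such a string into modifiers and a main key, which can also help when sanity-checking arguments for `key_hold` (e.g. "cmd+shift"). `parseKeyCombo` and its modifier list are assumptions for illustration, not the server's actual parsing logic, and a literal "+" key is not handled.

```typescript
// Modifier names this sketch recognizes (ASSUMPTION: illustrative list).
const MODIFIERS = new Set(["cmd", "command", "shift", "ctrl", "control", "alt", "option", "fn"]);

interface KeyCombo {
  modifiers: string[];
  key: string | null; // null for modifier-only combos such as "cmd+shift"
}

// Hypothetical parser: split a "+"-joined combination into modifiers and one main key.
function parseKeyCombo(combo: string): KeyCombo {
  const parts = combo
    .split("+")
    .map((p) => p.trim())
    .filter((p) => p.length > 0);
  if (parts.length === 0) {
    throw new Error("empty key combination");
  }
  const modifiers: string[] = [];
  let key: string | null = null;
  for (const part of parts) {
    const lower = part.toLowerCase();
    if (MODIFIERS.has(lower)) {
      modifiers.push(lower);
    } else if (key === null) {
      key = part;
    } else {
      throw new Error(`more than one non-modifier key in "${combo}"`);
    }
  }
  return { modifiers, key };
}

console.log(parseKeyCombo("cmd+a"));
console.log(parseKeyCombo("Enter"));
```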
check_for_errors
Check the screen for common error indicators.
Parameters:
- region (optional): Specific region to check
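Conceptually, OCR-based error detection means scanning the recognized text for error-like wording. This sketch illustrates the idea on a plain string; the keyword list and `findErrorIndicators` are hypothetical, and the server's real heuristics may differ. Plain substring matching like this also flags words such as "errorless", so expect some false positives.

```typescript
// ASSUMPTION: illustrative keywords only; the server's actual list is not documented here.
const ERROR_KEYWORDS = ["error", "failed", "exception", "warning", "cannot", "unable"];

interface ErrorHit {
  keyword: string;
  line: string;
}

// Hypothetical scan: report each OCR line that contains an error-like keyword.
function findErrorIndicators(ocrText: string, keywords: string[] = ERROR_KEYWORDS): ErrorHit[] {
  const hits: ErrorHit[] = [];
  for (const rawLine of ocrText.split(/\r?\n/)) {
    const line = rawLine.trim();
    const lower = line.toLowerCase();
    for (const keyword of keywords) {
      if (lower.includes(keyword)) {
        hits.push({ keyword, line });
        break; // report each line once, using the first matching keyword
      }
    }
  }
  return hits;
}

const sampleOcr = "Save Document\nThe operation failed: disk is full\nOK";
const hits = findErrorIndicators(sampleOcr);
console.log(hits);
```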
wait
Wait for a specified amount of time.
Parameters:
- milliseconds: Time to wait
get_screen_info
Get information about the screen dimensions.
No parameters required.
list_windows
List all open windows with their titles and positions.
No parameters required.
get_active_window
Get information about the currently active window.
No parameters required.
find_window
Find a window by its title (partial match supported).
Parameters:
- title: Window title to search for
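Partial matching can be pictured as a case-insensitive substring test over window titles. The `WindowInfo` shape, the sample windows, and `findWindowsByTitle` below are assumptions for illustration; the actual `list_windows` output may contain different fields.

```typescript
// ASSUMPTION: illustrative shape; the real list_windows output may differ.
interface WindowInfo {
  title: string;
  x: number;
  y: number;
  width: number;
  height: number;
}

// Hypothetical helper: case-insensitive partial match on the window title.
function findWindowsByTitle(windows: WindowInfo[], query: string): WindowInfo[] {
  const needle = query.trim().toLowerCase();
  if (needle === "") return [];
  return windows.filter((w) => w.title.toLowerCase().includes(needle));
}

const openWindows: WindowInfo[] = [
  { title: "Calculator", x: 100, y: 100, width: 230, height: 410 },
  { title: "Safari - Apple", x: 0, y: 25, width: 1280, height: 800 },
  { title: "Terminal - zsh", x: 300, y: 200, width: 800, height: 500 },
];

console.log(findWindowsByTitle(openWindows, "calc").map((w) => w.title));
```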
focus_window
Focus/activate a window by its title.
Parameters:
- title: Window title to focus
get_window_info
Get detailed information about a specific window.
Parameters:
- title: Window title to get info for
extract_text
Extract text from the screen using OCR (Optical Character Recognition).
Parameters:
- region (optional): Specific region to extract text from
find_text
Find specific text on the screen and get its location.
Parameters:
- text: Text to search for
- region (optional): Specific region to search in
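A common pattern is to locate a label with `find_text` and then `click` its center. This sketch assumes the match is reported as a bounding box in screen coordinates (`x`, `y`, `width`, `height`); that `TextMatch` shape and the sample values are hypothetical, so check the real `find_text` response before relying on them.

```typescript
// ASSUMPTION: a find_text match reported as a bounding box in screen coordinates.
interface TextMatch {
  text: string;
  x: number;
  y: number;
  width: number;
  height: number;
}

// Center of the bounding box, rounded to whole pixels for use with click.
function centerOf(match: TextMatch): { x: number; y: number } {
  return {
    x: Math.round(match.x + match.width / 2),
    y: Math.round(match.y + match.height / 2),
  };
}

// Illustrative match for "Find the 'Submit' button on screen".
const submitMatch: TextMatch = { text: "Submit", x: 600, y: 410, width: 80, height: 24 };
const clickArgs = { ...centerOf(submitMatch), button: "left" };
console.log(clickArgs);
```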
drag_drop
Drag from one point to another with customizable duration and smoothness.
Parameters:
- startX: Starting X coordinate
- startY: Starting Y coordinate
- endX: Ending X coordinate
- endY: Ending Y coordinate
- duration: Duration of the drag in milliseconds (default: 1000)
- smooth: Whether to use smooth movement (default: true)
- button: Mouse button to use for dragging (default: "left")
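To illustrate what a smooth drag with a duration can mean, the sketch below samples intermediate pointer positions along an ease-in-out quadratic curve, one sample every `stepMs` milliseconds. This is an assumption about the general technique (the feature list mentions easing functions), not this server's implementation; `dragPath` and its `stepMs` parameter are hypothetical.

```typescript
interface Point {
  x: number;
  y: number;
}

// A sampled pointer position plus its time offset (ms since the drag started).
interface TimedPoint extends Point {
  t: number;
}

// Standard ease-in-out quadratic curve on [0, 1]: slow start, fast middle, slow end.
function easeInOutQuad(p: number): number {
  return p < 0.5 ? 2 * p * p : 1 - Math.pow(-2 * p + 2, 2) / 2;
}

// Hypothetical helper: sample a drag path every stepMs milliseconds.
// With smooth = false it returns only the start and end points (a direct jump).
function dragPath(start: Point, end: Point, durationMs = 1000, stepMs = 16, smooth = true): TimedPoint[] {
  if (stepMs <= 0) throw new Error("stepMs must be positive");
  if (!smooth || durationMs <= 0) {
    return [
      { ...start, t: 0 },
      { ...end, t: Math.max(0, durationMs) },
    ];
  }
  const steps = Math.max(1, Math.ceil(durationMs / stepMs));
  const points: TimedPoint[] = [];
  for (let i = 0; i <= steps; i++) {
    const p = i / steps; // linear progress in time
    const e = easeInOutQuad(p); // eased progress in space
    points.push({
      x: Math.round(start.x + (end.x - start.x) * e),
      y: Math.round(start.y + (end.y - start.y) * e),
      t: Math.round(durationMs * p),
    });
  }
  return points;
}

const samples = dragPath({ x: 100, y: 100 }, { x: 500, y: 300 });
console.log(samples.length, samples[0], samples[samples.length - 1]);
```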
scroll
Scroll in any direction by steps or pixels with optional smooth animation.
Parameters:
- direction: Direction to scroll ("up", "down", "left", "right")
- amount: Amount to scroll (pixels for pixelScroll, steps for normal scroll)
- x (optional): X coordinate to scroll at (defaults to current mouse position)
- y (optional): Y coordinate to scroll at (defaults to current mouse position)
- smooth: Whether to use smooth scrolling animation (default: false)
- pixelScroll: Whether to scroll by pixels (true) or steps (false) (default: false)
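With `smooth` and `pixelScroll` both enabled, a scroll is naturally split into many small increments. The sketch below divides a total pixel amount into per-frame deltas that follow an ease-out cubic curve and still sum exactly to the requested total, because rounding is applied to cumulative positions rather than to each frame. `smoothScrollIncrements` and the frame count are hypothetical; the scroll direction is carried separately by the `direction` parameter.

```typescript
// Ease-out cubic: fast at first, then decelerating toward the end.
function easeOutCubic(p: number): number {
  return 1 - Math.pow(1 - p, 3);
}

// Hypothetical helper: split a total pixel amount into per-frame deltas.
// Rounding is applied to cumulative positions, so the deltas always sum to `total`.
function smoothScrollIncrements(total: number, frames: number): number[] {
  if (!Number.isInteger(total) || total < 0) throw new Error("total must be a non-negative integer");
  if (!Number.isInteger(frames) || frames < 1) throw new Error("frames must be a positive integer");
  const deltas: number[] = [];
  let previous = 0;
  for (let i = 1; i <= frames; i++) {
    const cumulative = Math.round(total * easeOutCubic(i / frames));
    deltas.push(cumulative - previous);
    previous = cumulative;
  }
  return deltas;
}

// e.g. direction "down", amount 300, pixelScroll true, smooth true, spread over 10 frames
console.log(smoothScrollIncrements(300, 10));
```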
hover
Hover the mouse at a specific position for a duration.
Parameters:
- x: X coordinate to hover at
- y: Y coordinate to hover at
- duration: Duration to hover in milliseconds (default: 1000)
click_hold
Click and hold a mouse button at specific coordinates for a duration.
Parameters:
- x: X coordinate to click and hold
- y: Y coordinate to click and hold
- duration: Duration to hold the click in milliseconds
- button: Mouse button to hold (default: "left")
relative_mouse_move
Move the mouse relative to its current position.
Parameters:
- offsetX: Relative X offset from current position
- offsetY: Relative Y offset from current position
- smooth: Whether to use smooth movement (default: true)
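The target of a relative move is the current position plus the offsets. This sketch also clamps the result to the screen, which is an assumption: the README does not say how the server handles offsets that would leave the display. `relativeTarget` is a hypothetical helper, and the example assumes y grows downward, as with top-left-origin screen coordinates.

```typescript
interface Point {
  x: number;
  y: number;
}

// Hypothetical helper: absolute target of a relative move, clamped to the screen.
// ASSUMPTION: clamping is illustrative; the server's behavior at screen edges is not documented.
function relativeTarget(
  current: Point,
  offsetX: number,
  offsetY: number,
  screenWidth: number,
  screenHeight: number,
): Point {
  return {
    x: Math.min(Math.max(current.x + offsetX, 0), screenWidth - 1),
    y: Math.min(Math.max(current.y + offsetY, 0), screenHeight - 1),
  };
}

// Nudge the pointer 50 px right and 20 px up (y grows downward) on a 1440x900 display.
console.log(relativeTarget({ x: 100, y: 100 }, 50, -20, 1440, 900));
```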
key_hold
Hold a key or key combination for a specific duration.
Parameters:
- key: Key to hold (e.g., 'shift', 'cmd', 'a', 'cmd+shift')
- duration: Duration to hold the key in milliseconds
type_with_delay
Type text with realistic human-like delays between keystrokes.
Parameters:
- text: Text to type
- minDelay: Minimum delay between keystrokes in milliseconds (default: 50)
- maxDelay: Maximum delay between keystrokes in milliseconds (default: 150)
- mistakes: Whether to simulate occasional typos (default: false)
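One way to plan human-like typing is shown below: each keystroke gets a random delay between `minDelay` and `maxDelay`, and with `mistakes` enabled an occasional wrong letter is typed and then erased with Backspace. The typo rate, the choice of wrong letter, and `planHumanTyping` itself are assumptions for illustration, not how this server simulates typos; the random source is injectable so a plan can be reproduced in tests.

```typescript
// A planned keyboard event and the pause (ms) to wait before sending it.
type Keystroke =
  | { kind: "char"; char: string; delayMs: number }
  | { kind: "backspace"; delayMs: number };

// Uniform random delay in [minDelay, maxDelay], rounded to whole milliseconds.
function randomDelay(minDelay: number, maxDelay: number, rng: () => number): number {
  return Math.round(minDelay + (maxDelay - minDelay) * rng());
}

// Hypothetical planner for type_with_delay-style input.
// ASSUMPTIONS: typoRate, the "wrong letter" choice, and the Backspace correction are illustrative.
function planHumanTyping(
  text: string,
  minDelay = 50,
  maxDelay = 150,
  mistakes = false,
  rng: () => number = Math.random,
  typoRate = 0.03,
): Keystroke[] {
  if (minDelay < 0 || maxDelay < minDelay) {
    throw new Error("require 0 <= minDelay <= maxDelay");
  }
  const plan: Keystroke[] = [];
  for (const ch of text) {
    if (mistakes && /[a-z]/i.test(ch) && rng() < typoRate) {
      // Type a wrong letter first, then erase it before typing the intended one.
      const wrong = ch.toLowerCase() === "x" ? "z" : "x";
      plan.push({ kind: "char", char: wrong, delayMs: randomDelay(minDelay, maxDelay, rng) });
      plan.push({ kind: "backspace", delayMs: randomDelay(minDelay, maxDelay, rng) });
    }
    plan.push({ kind: "char", char: ch, delayMs: randomDelay(minDelay, maxDelay, rng) });
  }
  return plan;
}

const demo = planHumanTyping("Hello World", 50, 150, true);
console.log(demo.map((k) => (k.kind === "char" ? k.char : "<BS>")).join(""));
```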
🚀 Usage Examples
🎯 Basic Commands
Once configured, you can ask your AI assistant to:
Screenshots & Visual Inspection:
- "Take a screenshot of my app"
- "Capture just the top-left corner of the screen"
- "Save a screenshot to ~/Desktop/app-screenshot.png"
Mouse & Keyboard Control:
- "Click the button at coordinates 100, 200"
- "Double-click on the center of the screen"
- "Type 'Hello World' in the current field"
- "Press cmd+s to save the file"
- "Press Enter to submit"
Window Management:
- "List all open windows"
- "Focus the Safari window"
- "Get information about the active window"
- "Find the window with 'Calculator' in the title"
Text Recognition & Search:
- "Extract all text from the screen"
- "Find the 'Submit' button on screen"
- "Look for any text containing 'error' on screen"
- "Read the text in the dialog box"
Error Detection:
- "Check if there are any error dialogs on screen"
- "Look for error messages in my app"
- "Scan for any warning or error indicators"
🔧 Advanced Automation Examples
UI Testing Workflow:
Bug Investigation:
Automated Form Filling:
🚀 Advanced Automation Features
Drag and Drop Operations:
Natural Scrolling:
Human-like Typing:
Complex Mouse Gestures:
Advanced Keyboard Shortcuts:
Smooth Navigation:
Development
- Run in development mode: npm run dev
- Test with MCP Inspector: npm run inspector
⚠️ Limitations & Troubleshooting
Known Limitations
- OCR Accuracy: Text recognition depends on font size, contrast, and clarity
- Permission Requirements: Must manually grant Screen Recording and Accessibility permissions
- First OCR Run: Initial text extraction may be slower due to model loading
- macOS Only: This server only works on macOS systems
🐛 Common Issues
"Permission denied" or "Screen recording not allowed"
- ✅ Grant Screen Recording permission to your AI client
- ✅ Grant Accessibility permission to your AI client
- 🔄 Restart your AI client after granting permissions
"Command not found" or "Cannot find module"
- ✅ Make sure you ran npm install and npm run build
- ✅ Use the absolute path to build/index.js in your config
- ✅ Verify Node.js is installed: node --version
"MCP server not showing up"
- ✅ Check your configuration JSON syntax is valid
- ✅ Restart your AI client completely
- ✅ Try the test script: node test-server.js
"Screenshots are black or empty"
- ✅ Grant Screen Recording permission
- ✅ Make sure the app you're screenshotting is visible (not minimized)
🆘 Getting Help
If you're still having issues:
- Run the test script: node test-server.js to verify basic functionality
- Check the console: Look for error messages in your AI client
- Open an issue: Create a GitHub issue with:
  - Your macOS version
  - Your AI client (Claude Desktop, Cursor, etc.)
  - The exact error message
  - Your configuration file (with paths anonymized)
🔒 Security & Privacy
Important Security Notes
⚠️ This server has powerful capabilities and requires significant system permissions.
What this server can access:
- ✅ Screen content: Can take screenshots of anything visible
- ✅ Keyboard input: Can type any text or key combinations
- ✅ Mouse control: Can click anywhere on screen
- ✅ Window information: Can see and control application windows
- ✅ Text recognition: Can read any text visible on screen
Security best practices:
- 🏠 Only use in trusted environments: Don't use on shared or public computers
- 🤝 Review AI requests: Be mindful of what you ask the AI to do
- 🔐 Sensitive data: Avoid using when sensitive information is visible
- 🚫 Revoke access: You can remove permissions anytime in System Settings
Privacy Notes
- No data is sent externally by this MCP server itself
- Your AI client (Claude Desktop, etc.) may process screenshots/data according to their privacy policies
- Screenshots are temporary and not permanently stored unless you specify a save path
- OCR processing happens locally on your machine
🤝 Contributing
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
📄 License
MIT License - see LICENSE file for details.
🙏 Acknowledgments
- Built with Model Context Protocol (MCP)
- Uses @nut-tree-fork/nut-js for system automation
- OCR powered by Tesseract.js
- Image processing with node-canvas
Made with ❤️ for the MCP community
Having issues? Open a GitHub issue • Want to contribute? Check our contributing guide