macos-control-mcp
Server Configuration
Describes the environment variables required to run the server.
| Name | Required | Description | Default |
|---|---|---|---|
No arguments | |||
Capabilities
Features and capabilities supported by this server
| Capability | Details |
|---|---|
| tools | {
"listChanged": true
} |
Tools
Functions exposed to the LLM to take actions
| Name | Description |
|---|---|
| screenshotA | Capture a screenshot of the entire screen or a specific app window. Returns a PNG image. |
| screen_ocrA | OCR the screen using Apple Vision. Returns every text element with pixel coordinates (x, y, centerX, centerY). Use centerX/centerY with click_at to click on any text. |
| find_text_on_screenA | Find specific text on screen and get its clickable coordinates. Like Ctrl+F for the entire screen. Returns matches with centerX/centerY for use with click_at. |
| click_atA | Click at x,y screen coordinates. Returns a screenshot by default (disable with return_screenshot=false). Use screenshot + screen_ocr to find coordinates first. Prefer batch_actions when combining with other actions. |
| double_click_atA | Double-click at x,y screen coordinates. Returns a screenshot by default (disable with return_screenshot=false). Prefer batch_actions when combining with other actions. |
| type_textA | Type text using keyboard input. If app is specified, focuses that app first to ensure keystrokes go to the right place. Without app, types into the frontmost app. Prefer batch_actions when combining with other actions. |
| press_keyA | Press a key combo (e.g. press 's' with ['command'] for Cmd+S). Supports a-z, 0-9, return, tab, space, delete, escape, arrows, f1-f12. If app is specified, focuses that app first. Prefer batch_actions when combining with other actions. |
| scrollA | Scroll in the frontmost application. Prefer batch_actions when combining with other actions. |
| launch_appA | Open or focus a macOS application by name. Prefer batch_actions when combining with other actions. |
| list_running_appsA | List all visible running macOS applications. |
| get_ui_elementsA | Get the accessibility tree of an app window. Returns UI element roles, names, and positions. |
| click_elementA | Click a named UI element in an app window. Returns a screenshot by default (disable with return_screenshot=false). Use get_ui_elements to discover element names. Prefer batch_actions when combining with other actions. |
| open_urlB | Open a URL in Safari or Chrome. |
| get_clipboardA | Read the current text contents of the macOS clipboard. |
| set_clipboardC | Write text to the macOS clipboard. |
| execute_javascriptB | Run JavaScript in the active browser tab. Much faster than screenshot+OCR for web pages. Returns the result. |
| get_page_textA | Get all visible text from the current browser page. Faster than OCR for web content. |
| get_page_elementsA | Get all interactive elements (buttons, inputs, selects, radios, checkboxes, links) from the current browser page. Returns structured text showing each element's type, label, value, and state. Use this instead of screenshot+OCR to understand web page content. Then use click_by_text or fill_by_label to interact. |
| click_by_textA | Click a button, link, tab, radio, or checkbox by its visible text. Much more reliable than coordinate clicking. Scrolls element into view before clicking. |
| fill_by_labelA | Fill a form field by its label text. Finds the input/textarea associated with the label and fills it. Works with React/Vue/Angular apps. On failure, lists available field labels for debugging. |
| select_optionA | Select a dropdown option by the dropdown's label and the option text. On failure, lists available options for debugging. |
| batch_actionsA | PREFERRED: Always use this tool instead of calling individual action tools (click_at, type_text, press_key, launch_app, etc.) one at a time. Combine multiple steps into a single batch call — this is dramatically faster. Returns a single screenshot by default (disable with return_screenshot=false). Only use individual tools when you need to read the result of one action before deciding the next. Stops on first error. Max 20 actions per call. Example — open Notes and write text (1 call instead of 6): [{ "action": "launch_app", "name": "Notes" }, { "action": "key", "key": "n", "modifiers": ["command"] }, { "action": "set_clipboard", "text": "Hello\nWorld" }, { "action": "key", "key": "v", "modifiers": ["command"] }] |
Prompts
Interactive templates invoked by user choice
| Name | Description |
|---|---|
No prompts | |
Resources
Contextual data attached and managed by the client
| Name | Description |
|---|---|
No resources | |
Latest Blog Posts
MCP directory API
We provide all the information about MCP servers via our MCP API.
curl -X GET 'https://glama.ai/api/mcp/v1/servers/PeterHdd/macos-control-mcp'
If you have feedback or need assistance with the MCP directory API, please join our Discord server