Schema | macos-control-mcp

macos-control-mcp


Server Configuration

Describes the environment variables required to run the server.

Name	Required	Description	Default
No arguments

Capabilities

Features and capabilities supported by this server

Capability	Details
`tools`	{ "listChanged": true }
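The `listChanged` flag means the server may announce that its tool list has changed, so a client should be ready to request the list again. A minimal client-side sketch, assuming messages arrive as JSON-RPC 2.0 strings and that the notification method is `notifications/tools/list_changed` as defined by the MCP specification; the helper names `needs_tool_refresh` and `tools_list_request` are hypothetical and not part of this server:

```python
import json

# Notification method defined by the MCP specification for tool-list changes.
TOOLS_LIST_CHANGED = "notifications/tools/list_changed"


def needs_tool_refresh(raw_message: str) -> bool:
    """Return True if the message is a tool-list-changed notification.

    JSON-RPC notifications carry a method but no id, so both are checked.
    """
    msg = json.loads(raw_message)
    return "id" not in msg and msg.get("method") == TOOLS_LIST_CHANGED


def tools_list_request(request_id: int) -> dict:
    """Build the JSON-RPC request that asks the server for its current tools."""
    return {"jsonrpc": "2.0", "id": request_id, "method": "tools/list"}
```

A client would typically call `tools_list_request` whenever `needs_tool_refresh` returns True and replace its cached tool list with the response.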

Tools

Functions exposed to the LLM to take actions

Name	Description
screenshot	Capture a screenshot of the entire screen or a specific app window. Returns a PNG image.
screen_ocr	OCR the screen using Apple Vision. Returns every text element with pixel coordinates (x, y, centerX, centerY). Use centerX/centerY with click_at to click on any text.
find_text_on_screen	Find specific text on screen and get its clickable coordinates. Like Ctrl+F for the entire screen. Returns matches with centerX/centerY for use with click_at.
click_at	Click at x,y screen coordinates. Returns a screenshot by default (disable with return_screenshot=false). Use screenshot + screen_ocr to find coordinates first. Prefer batch_actions when combining with other actions.
double_click_at	Double-click at x,y screen coordinates. Returns a screenshot by default (disable with return_screenshot=false). Prefer batch_actions when combining with other actions.
type_text	Type text using keyboard input. If app is specified, focuses that app first to ensure keystrokes go to the right place. Without app, types into the frontmost app. Prefer batch_actions when combining with other actions.
press_key	Press a key combo (e.g. press 's' with ['command'] for Cmd+S). Supports a-z, 0-9, return, tab, space, delete, escape, arrows, f1-f12. If app is specified, focuses that app first. Prefer batch_actions when combining with other actions.
scroll	Scroll in the frontmost application. Prefer batch_actions when combining with other actions.
launch_app	Open or focus a macOS application by name. Prefer batch_actions when combining with other actions.
list_running_apps	List all visible running macOS applications.
get_ui_elements	Get the accessibility tree of an app window. Returns UI element roles, names, and positions.
click_element	Click a named UI element in an app window. Returns a screenshot by default (disable with return_screenshot=false). Use get_ui_elements to discover element names. Prefer batch_actions when combining with other actions.
open_url	Open a URL in Safari or Chrome.
get_clipboard	Read the current text contents of the macOS clipboard.
set_clipboard	Write text to the macOS clipboard.
execute_javascript	Run JavaScript in the active browser tab. Much faster than screenshot+OCR for web pages. Returns the result.
get_page_text	Get all visible text from the current browser page. Faster than OCR for web content.
get_page_elements	Get all interactive elements (buttons, inputs, selects, radios, checkboxes, links) from the current browser page. Returns structured text showing each element's type, label, value, and state. Use this instead of screenshot+OCR to understand web page content. Then use click_by_text or fill_by_label to interact.
click_by_text	Click a button, link, tab, radio, or checkbox by its visible text. Much more reliable than coordinate clicking. Scrolls element into view before clicking.
fill_by_label	Fill a form field by its label text. Finds the input/textarea associated with the label and fills it. Works with React/Vue/Angular apps. On failure, lists available field labels for debugging.
select_option	Select a dropdown option by the dropdown's label and the option text. On failure, lists available options for debugging.
batch_actions	PREFERRED: Always use this tool instead of calling individual action tools (click_at, type_text, press_key, launch_app, etc.) one at a time. Combine multiple steps into a single batch call; this is dramatically faster. Returns a single screenshot by default (disable with return_screenshot=false). Only use individual tools when you need to read the result of one action before deciding the next. Stops on first error. Max 20 actions per call. Example, open Notes and write text (1 call instead of 6): [{ "action": "launch_app", "name": "Notes" }, { "action": "key", "key": "n", "modifiers": ["command"] }, { "action": "set_clipboard", "text": "Hello\nWorld" }, { "action": "key", "key": "v", "modifiers": ["command"] }]
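The tool descriptions above suggest two common call patterns: clicking the center of an OCR match, and bundling several steps into one `batch_actions` call. The sketch below builds the corresponding MCP `tools/call` requests as plain dictionaries. Several details are assumptions, not confirmed by this page: that OCR matches are dictionaries with `centerX`/`centerY` keys, that `click_at` takes `x`, `y`, and `return_screenshot` as argument names, and that `batch_actions` receives its list under an `actions` argument. The helper names are hypothetical.

```python
import json
from itertools import count

# "Max 20 actions per call" comes from the batch_actions description.
MAX_BATCH_ACTIONS = 20
_request_ids = count(1)


def tool_call(name, arguments):
    """Build a JSON-RPC 2.0 `tools/call` request for the named tool."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


def click_match(match, return_screenshot=True):
    """Turn one OCR match into a click_at call aimed at its center point.

    Assumption: the match is a dict with centerX and centerY keys.
    """
    return tool_call("click_at", {
        "x": match["centerX"],
        "y": match["centerY"],
        "return_screenshot": return_screenshot,
    })


def batch(actions):
    """Wrap actions in one batch_actions call, enforcing the 20-action limit.

    Assumption: the list is passed under an `actions` argument.
    """
    if not actions:
        raise ValueError("batch needs at least one action")
    if len(actions) > MAX_BATCH_ACTIONS:
        raise ValueError(f"at most {MAX_BATCH_ACTIONS} actions per call")
    return tool_call("batch_actions", {"actions": list(actions)})


# The Notes example from the batch_actions description, as a single request.
notes_request = batch([
    {"action": "launch_app", "name": "Notes"},
    {"action": "key", "key": "n", "modifiers": ["command"]},
    {"action": "set_clipboard", "text": "Hello\nWorld"},
    {"action": "key", "key": "v", "modifiers": ["command"]},
])
print(json.dumps(notes_request, indent=2))
```

Checking the limit on the client side avoids a round trip that the server would reject anyway. Because a batch stops on the first error, steps whose outcome decides the next step are better sent as individual calls.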

Prompts

Interactive templates invoked by user choice

Name	Description
No prompts

Resources

Contextual data attached and managed by the client

Name	Description
No resources


MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/PeterHdd/macos-control-mcp'
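The same request can be made from Python with only the standard library. This is a minimal sketch that mirrors the curl command above; it assumes the endpoint returns a JSON body, which this page does not state, and the function names `server_request` and `fetch_server` are hypothetical.

```python
import json
import urllib.request

API_BASE = "https://glama.ai/api/mcp/v1/servers"


def server_request(owner, name):
    """Build the GET request for one server entry in the MCP directory."""
    return urllib.request.Request(f"{API_BASE}/{owner}/{name}", method="GET")


def fetch_server(owner, name, timeout=10.0):
    """Send the request and decode the body, assuming it is JSON."""
    with urllib.request.urlopen(server_request(owner, name), timeout=timeout) as resp:
        return json.load(resp)


# Usage (performs a network call):
#   info = fetch_server("PeterHdd", "macos-control-mcp")
#   print(json.dumps(info, indent=2))
```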

If you have feedback or need assistance with the MCP directory API, please join our Discord server.