
Server Configuration

Describes the environment variables required to run the server.

Name | Required | Description | Default

No arguments

Capabilities

Features and capabilities supported by this server

Capability | Details
tools | { "listChanged": true }
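
In MCP, a `tools` capability with `listChanged: true` signals that the server may send a `notifications/tools/list_changed` notification when its set of tools changes. A minimal sketch of how a client might check for this, assuming the capabilities object has already been parsed into a Python dict (the function name `tools_list_may_change` is hypothetical):

```python
import json


def tools_list_may_change(capabilities: dict) -> bool:
    """Return True if the server declares tools.listChanged."""
    tools = capabilities.get("tools") or {}
    return bool(tools.get("listChanged", False))


# The capabilities shown for this server, parsed from JSON.
capabilities = json.loads('{"tools": {"listChanged": true}}')

if tools_list_may_change(capabilities):
    # A client would typically call tools/list again after each
    # notifications/tools/list_changed message.
    print("re-fetch tool list on notifications/tools/list_changed")
```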

Tools

Functions exposed to the LLM to take actions

Name | Description
screenshot

Capture a screenshot of the entire screen or a specific app window. Returns a PNG image.

screen_ocr

OCR the screen using Apple Vision. Returns every text element with pixel coordinates (x, y, centerX, centerY). Use centerX/centerY with click_at to click on any text.

find_text_on_screen

Find specific text on screen and get its clickable coordinates. Like Ctrl+F for the entire screen. Returns matches with centerX/centerY for use with click_at.

click_at

Click at x,y screen coordinates. Returns a screenshot by default (disable with return_screenshot=false). Use screenshot + screen_ocr to find coordinates first. Prefer batch_actions when combining with other actions.
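
The OCR-then-click flow described above could be sketched as follows. This is an illustration under assumptions, not the server's documented schema: the `text` field on OCR elements and the argument names `x` and `y` are guesses based on the descriptions, the helper names are hypothetical, and the sample data is made up. Only the `tools/call` JSON-RPC envelope comes from the MCP specification.

```python
import json


def find_center(ocr_elements, needle):
    """Return (centerX, centerY) of the first element whose text contains needle."""
    for element in ocr_elements:
        if needle.lower() in element["text"].lower():
            return element["centerX"], element["centerY"]
    return None


def click_at_request(x, y, request_id=1, return_screenshot=True):
    """Build an MCP tools/call request for click_at (argument names assumed)."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "click_at",
            "arguments": {"x": x, "y": y, "return_screenshot": return_screenshot},
        },
    }


# Made-up OCR output for illustration only.
sample = [
    {"text": "File", "x": 10, "y": 4, "centerX": 24, "centerY": 12},
    {"text": "Save As...", "x": 40, "y": 60, "centerX": 82, "centerY": 70},
]

center = find_center(sample, "save as")
if center is not None:
    print(json.dumps(click_at_request(*center, return_screenshot=False)))
```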

double_click_at

Double-click at x,y screen coordinates. Returns a screenshot by default (disable with return_screenshot=false). Prefer batch_actions when combining with other actions.

type_text

Type text using keyboard input. If app is specified, focuses that app first to ensure keystrokes go to the right place. Without app, types into the frontmost app. Prefer batch_actions when combining with other actions.

press_key

Press a key combo (e.g. press 's' with ['command'] for Cmd+S). Supports a-z, 0-9, return, tab, space, delete, escape, arrows, f1-f12. If app is specified, focuses that app first. Prefer batch_actions when combining with other actions.

scroll

Scroll in the frontmost application. Prefer batch_actions when combining with other actions.

launch_app

Open or focus a macOS application by name. Prefer batch_actions when combining with other actions.

list_running_apps

List all visible running macOS applications.

get_ui_elements

Get the accessibility tree of an app window. Returns UI element roles, names, and positions.

click_element

Click a named UI element in an app window. Returns a screenshot by default (disable with return_screenshot=false). Use get_ui_elements to discover element names. Prefer batch_actions when combining with other actions.

open_url

Open a URL in Safari or Chrome.

get_clipboard

Read the current text contents of the macOS clipboard.

set_clipboard

Write text to the macOS clipboard.

execute_javascript

Run JavaScript in the active browser tab. Much faster than screenshot+OCR for web pages. Returns the result.

get_page_text

Get all visible text from the current browser page. Faster than OCR for web content.

get_page_elements

Get all interactive elements (buttons, inputs, selects, radios, checkboxes, links) from the current browser page. Returns structured text showing each element's type, label, value, and state. Use this instead of screenshot+OCR to understand web page content. Then use click_by_text or fill_by_label to interact.

click_by_text

Click a button, link, tab, radio, or checkbox by its visible text. Much more reliable than coordinate clicking. Scrolls element into view before clicking.

fill_by_label

Fill a form field by its label text. Finds the input/textarea associated with the label and fills it. Works with React/Vue/Angular apps. On failure, lists available field labels for debugging.

select_option

Select a dropdown option by the dropdown's label and the option text. On failure, lists available options for debugging.

batch_actions

PREFERRED: Always use this tool instead of calling individual action tools (click_at, type_text, press_key, launch_app, etc.) one at a time. Combine multiple steps into a single batch call — this is dramatically faster. Returns a single screenshot by default (disable with return_screenshot=false). Only use individual tools when you need to read the result of one action before deciding the next. Stops on first error. Max 20 actions per call.

Example — open Notes and write text (1 call instead of 6): [{ "action": "launch_app", "name": "Notes" }, { "action": "key", "key": "n", "modifiers": ["command"] }, { "action": "set_clipboard", "text": "Hello\nWorld" }, { "action": "key", "key": "v", "modifiers": ["command"] }]
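
Because a batch is limited to 20 actions, a longer sequence has to be split across several calls. The sketch below shows one way to do that. The argument name `actions` and both helper names are assumptions made for illustration; the action objects reuse the shapes from the example above.

```python
import json

MAX_ACTIONS_PER_CALL = 20  # limit stated in the batch_actions description


def chunk_actions(actions, limit=MAX_ACTIONS_PER_CALL):
    """Split a list of actions into consecutive batches of at most `limit`."""
    return [actions[i:i + limit] for i in range(0, len(actions), limit)]


def batch_request(actions, request_id=1, return_screenshot=True):
    """Build an MCP tools/call request for batch_actions (argument names assumed)."""
    if len(actions) > MAX_ACTIONS_PER_CALL:
        raise ValueError("batch_actions accepts at most 20 actions per call")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "batch_actions",
            "arguments": {"actions": actions, "return_screenshot": return_screenshot},
        },
    }


# The Notes example from the description, as Python data.
notes_actions = [
    {"action": "launch_app", "name": "Notes"},
    {"action": "key", "key": "n", "modifiers": ["command"]},
    {"action": "set_clipboard", "text": "Hello\nWorld"},
    {"action": "key", "key": "v", "modifiers": ["command"]},
]

for index, chunk in enumerate(chunk_actions(notes_actions), start=1):
    print(json.dumps(batch_request(chunk, request_id=index)))
```

Since a batch stops on its first error, a caller splitting a long sequence would likely want to send the next chunk only after the previous one has succeeded.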

Prompts

Interactive templates invoked by user choice

Name | Description

No prompts

Resources

Contextual data attached and managed by the client

Name | Description

No resources

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/PeterHdd/macos-control-mcp'
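
The same request can be made from Python's standard library. This is a minimal sketch that assumes the endpoint returns a JSON body; the function names are hypothetical and the response structure is not described here, so the code only decodes it without relying on any particular field.

```python
import json
import urllib.request

API_BASE = "https://glama.ai/api/mcp/v1/servers"


def build_request(owner: str, name: str) -> urllib.request.Request:
    """Build the GET request for one server entry in the MCP directory."""
    return urllib.request.Request(f"{API_BASE}/{owner}/{name}", method="GET")


def fetch_server(owner: str, name: str, timeout: float = 10.0):
    """Send the request and decode the body as JSON (assumed response format)."""
    with urllib.request.urlopen(build_request(owner, name), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the request does not touch the network; fetch_server does.
request = build_request("PeterHdd", "macos-control-mcp")
print(request.get_method(), request.full_url)
```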

If you have feedback or need assistance with the MCP directory API, please join our Discord server.