Computer Use MCP Server

A production-grade macOS Computer Use MCP Server that exposes 33 tools across 10 categories for full desktop automation via the Model Context Protocol. Control mouse, keyboard, screenshots, clipboard, windows, and more from any MCP-compatible AI client.

Works with Claude Code, Cursor, VS Code, Windsurf, LM Studio, Ollama, llama.cpp, MLX, and any MCP-compatible tool.


Features

33 Tools Across 10 Categories

| Category | Tools | Description |
| --- | --- | --- |
| Mouse (12) | mouse_click, left_click, right_click, middle_click, double_click, triple_click, left_mouse_down, left_mouse_up, mouse_move, mouse_drag, scroll, mouse_scroll | Full mouse control with coordinate-based clicking, dragging with 20-step interpolation, directional scrolling |
| Keyboard (5) | key, hold_key, keyboard_type, keyboard_press, keyboard_hotkey | Unified key combos (cmd+c), hold-for-duration, Unicode text typing, individual key press, modifier hotkeys |
| Screenshot (1) | take_screenshot | Full-screen or region capture with Retina scaling, coordinate metadata, and configurable resolution |
| Display (3) | switch_display, zoom, list_displays | Multi-monitor switching, high-res region zoom for reading small text, display enumeration |
| Clipboard (2) | read_clipboard, write_clipboard | Read/write system clipboard via NSPasteboard |
| Window (2) | get_active_window, list_windows | Frontmost window info, enumerate all visible windows with position/size |
| Screen (2) | get_screen_info, get_cursor_position | Display dimensions, Retina scale, accessibility status, cursor coordinates |
| System (3) | open_application, wait, run_shell_command | Launch apps by name, timed waits, shell command execution |
| Access (2) | request_access, list_granted_applications | App permission tracking for session-based access control |
| Batch (1) | computer_batch | Execute multiple actions in a single call - eliminates round-trip latency |


Quick Start

# 1. Clone
git clone https://github.com/syedazharmbnr1/computer-use-mcp.git
cd computer-use-mcp

# 2. Setup
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt  # or: pip install mcp mss pillow pyobjc-framework-Quartz

# 3. Test
python3 __main__.py

The server communicates over stdio (stdin/stdout) using the MCP JSON-RPC protocol.
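To make the transport concrete, here is a minimal sketch of the JSON-RPC 2.0 messages an MCP client writes to the server's stdin, one JSON object per line. The method names tools/list and tools/call come from the MCP specification; the helper names jsonrpc_request and tool_call are illustrative only, and a real client must first complete the initialize handshake, which is omitted here.

```python
import json
from itertools import count
from typing import Optional

_ids = count(1)  # JSON-RPC request ids should be unique within a session


def jsonrpc_request(method: str, params: Optional[dict] = None) -> str:
    """Serialize one JSON-RPC 2.0 request as a single line (stdio framing)."""
    message = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        message["params"] = params
    return json.dumps(message)


def tool_call(name: str, arguments: dict) -> str:
    """Build a tools/call request for one of the server's tools."""
    return jsonrpc_request("tools/call", {"name": name, "arguments": arguments})


if __name__ == "__main__":
    # Each printed line is what a client would write to the server's stdin.
    print(jsonrpc_request("tools/list"))
    print(tool_call("left_click", {"coordinate": [100, 200]}))
```

In practice an MCP client library handles this framing for you; the sketch only shows what travels over the pipe.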


Installation

Prerequisites

  • macOS (uses Quartz framework for input simulation)

  • Python 3.10+

  • Accessibility permissions (System Settings > Privacy & Security > Accessibility)

Install Dependencies

git clone https://github.com/syedazharmbnr1/computer-use-mcp.git
cd computer-use-mcp
python3 -m venv .venv
source .venv/bin/activate
pip install "mcp>=1.26.0" mss pillow pyobjc-framework-Quartz  # quotes stop the shell from treating >= as a redirect

Verify Installation

python3 -c "
from server.computer_use_server import ComputerUseMCPServer
server = ComputerUseMCPServer()
tools = server._collect_all_tools()
print(f'Server OK - {len(tools)} tools registered')
"

Expected output: Server OK - 33 tools registered

Grant Accessibility Permission

The server needs macOS accessibility access to simulate mouse/keyboard input:

  1. Open System Settings > Privacy & Security > Accessibility

  2. Add your terminal app (Terminal, iTerm2, VS Code, etc.)

  3. Toggle the permission ON

Screenshot capture works without accessibility permission. Only mouse/keyboard tools require it.


Configuration for AI Coding Tools

The server uses stdio transport - it reads from stdin and writes to stdout. Every MCP client connects the same way: spawn the Python process and pipe stdio.
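Because most clients in this section expect the same command/args shape, the entry can be generated rather than hand-edited. The sketch below assumes the virtual environment lives at .venv inside the clone, as in the Quick Start; the function name mcp_config and its root_key parameter are hypothetical, and root_key exists only because some clients use a top-level key other than mcpServers.

```python
import json
from pathlib import PurePosixPath


def mcp_config(repo_dir: str, server_name: str = "computer-use",
               root_key: str = "mcpServers") -> dict:
    """Build a stdio MCP config entry pointing at the cloned repository."""
    repo = PurePosixPath(repo_dir)
    return {
        root_key: {
            server_name: {
                # Use the venv interpreter so the server's dependencies resolve.
                "command": str(repo / ".venv" / "bin" / "python3"),
                "args": [str(repo / "__main__.py")],
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(mcp_config("/path/to/computer-use-mcp"), indent=2))
```

Paste the printed JSON into the config file named for your client.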

Claude Code

Add the server to .mcp.json in your project root (project scope) or to ~/.claude.json (user scope):

{
  "mcpServers": {
    "computer-use": {
      "command": "/path/to/computer-use-mcp/.venv/bin/python3",
      "args": ["/path/to/computer-use-mcp/__main__.py"],
      "cwd": "/path/to/computer-use-mcp"
    }
  }
}

Then run /mcp in Claude Code to connect.

Cursor

Create .cursor/mcp.json in your project root (or ~/.cursor/mcp.json globally):

{
  "mcpServers": {
    "computer-use": {
      "command": "/path/to/computer-use-mcp/.venv/bin/python3",
      "args": ["/path/to/computer-use-mcp/__main__.py"]
    }
  }
}

VS Code + GitHub Copilot

Create .vscode/mcp.json in your workspace. VS Code uses a top-level servers key rather than mcpServers:

{
  "servers": {
    "computer-use": {
      "type": "stdio",
      "command": "/path/to/computer-use-mcp/.venv/bin/python3",
      "args": ["/path/to/computer-use-mcp/__main__.py"]
    }
  }
}

Windsurf

Edit ~/.codeium/windsurf/mcp_config.json:

{
  "mcpServers": {
    "computer-use": {
      "command": "/path/to/computer-use-mcp/.venv/bin/python3",
      "args": ["/path/to/computer-use-mcp/__main__.py"]
    }
  }
}

JetBrains IDEs

Add via Settings > Tools > MCP Servers, using the same command/args pattern.

Zed

Add to your Zed settings (~/.config/zed/settings.json) under the context_servers key:

{
  "context_servers": {
    "computer-use": {
      "command": "/path/to/computer-use-mcp/.venv/bin/python3",
      "args": ["/path/to/computer-use-mcp/__main__.py"]
    }
  }
}

Cline / Continue.dev

Both support the standard MCP JSON config format. Add to their respective config files using the same command + args pattern shown above.


Per LM Arena rankings and real-world testing, these are the best models for MCP tool calling:

Top Open Source Models (LM Arena Elo)

| Rank | Model | Provider | Parameters | Highlights |
| --- | --- | --- | --- | --- |
| 1 | GLM-5 | Zhipu AI | MoE | #1 open source (Elo 1451), 77.8% SWE-bench Verified |
| 2 | Kimi K2.5 | Moonshot AI | MoE | HumanEval 99.0, stable across 200-300 sequential tool calls |
| 3 | GLM-4.7 | Zhipu AI | MoE | HumanEval 94.2, AIME 2025 95.7, GPQA 85.7 |
| 4 | GLM-5.1 | Zhipu AI | 744B MoE / 40B active | MIT license, 200K context, 8+ hour continuous agentic sessions |
| 5 | Qwen 3.6 Plus | Alibaba | Dense | 1M context, native function calling, always-on CoT reasoning |
| 6 | Gemma 4 31B | Google | 31B Dense | #3 Arena text, Apache 2.0, native tool calling, 256K context |
| 7 | Llama 4 Scout | Meta | 17B active / 16 experts | 10M context window, multimodal, beats Gemini 2.0 Flash-Lite |
| 8 | Llama 4 Maverick | Meta | 17B active / 128 experts | Beats GPT-4o, best multimodal in class |
| 9 | Mistral Small 4 | Mistral AI | 119B MoE / 6B active | Unified instruct+reasoning+coding+vision, 256K context |
| 10 | Qwen 3.5 | Alibaba | Multiple sizes | Most stable tool calling, rarely hallucinates calls |

Best Models by Platform

Ollama (run locally via ollama pull <model>):

  • gemma4 (E2B / E4B / 26B MoE / 31B Dense) — native function calling, best sub-32B for agents

  • qwen3.5 / qwen3.6-plus — most stable tool calling, rarely drops parameters

  • llama4 (Scout / Maverick) — native multimodal + tools, 10M context

  • kimi-k2.5 — 200+ sequential tool calls without drift

  • glm-5.1 — long-horizon agentic coding (8+ hours continuous)

  • mistral-small4 — unified model, 6B active, fast

  • granite4 — enterprise-grade tool calling

  • phi-4-mini — compact with function calling support

  • deepseek-r1 — strong reasoning + tool use

llama.cpp (GGUF format):

  • bartowski/Gemma-4-31B-IT-GGUF — best open weight for agents

  • bartowski/Qwen3.5-32B-Instruct-GGUF — stable tool calling

  • bartowski/Llama-4-Scout-17B-GGUF — 10M context, multimodal

  • bartowski/GLM-5.1-40B-GGUF — top open source coding

  • Any model with Jinja chat template + function calling support

MLX (Apple Silicon via mlx-community):

  • mlx-community/Gemma-4-31B-IT-4bit — best performance/quality on Apple Silicon

  • mlx-community/Qwen3.5-32B-Instruct-4bit — stable tool calls

  • mlx-community/Llama-4-Scout-17B-4bit — multimodal + tools

  • mlx-community/Mistral-Small-4-6B-4bit — fast, 6B active

LM Studio: All of the above models are available through LM Studio's model browser with native MCP host support.


Configuration for Local Model Frameworks

LM Studio

LM Studio has native MCP host support since v0.3.17.

  1. Open LM Studio > Settings > MCP

  2. Add a new MCP server with:

    • Command: /path/to/computer-use-mcp/.venv/bin/python3

    • Args: ["/path/to/computer-use-mcp/__main__.py"]

  3. Select a model with tool calling support:

    • Top picks: Gemma 4 31B, Qwen 3.5/3.6, Llama 4 Scout, GLM-5.1, Mistral Small 4, Kimi K2.5

  4. The tools will appear in the chat interface

llama.cpp (Native MCP - March 2026+)

llama.cpp merged native MCP client support in March 2026 (PR #18655), adding a full agentic loop with MCP server management in the WebUI.

Start llama-server with MCP:

# Start with a top function-calling model (pick one)
llama-server --jinja -fa -hf bartowski/Gemma-4-31B-IT-GGUF:Q4_K_M --port 8080
llama-server --jinja -fa -hf bartowski/Qwen3.5-32B-Instruct-GGUF:Q4_K_M --port 8080
llama-server --jinja -fa -hf bartowski/Llama-4-Scout-17B-GGUF:Q4_K_M --port 8080

Then in the llama.cpp WebUI:

  1. Go to MCP Server Settings

  2. Add this server with command: /path/to/.venv/bin/python3 /path/to/__main__.py

  3. The 33 tools will be available in the agentic loop

Via llama-mcp-server bridge:

npm install -g llama-mcp-server

Configure in claude_desktop_config.json:

{
  "mcpServers": {
    "computer-use": {
      "command": "/path/to/computer-use-mcp/.venv/bin/python3",
      "args": ["/path/to/computer-use-mcp/__main__.py"]
    }
  }
}

Supported models for tool calling: Gemma 4, Qwen 3.5/3.6, Llama 4 Scout/Maverick, GLM-5.1, Kimi K2.5, Mistral Small 4, Llama 3.3, DeepSeek R1, Granite 4, Phi-4-mini, Hermes 3, Functionary v3.

Ollama

Ollama does not have native MCP support yet, but several bridge solutions work:

Option A: MCP-Bridge (recommended)

MCP-Bridge acts as middleware between Ollama's OpenAI-compatible API and MCP servers.

git clone https://github.com/SecretiveShell/MCP-Bridge.git
cd MCP-Bridge

Configure config.json:

{
  "inference_server": {
    "base_url": "http://localhost:11434/v1",
    "api_key": "ollama"
  },
  "mcp_servers": {
    "computer-use": {
      "command": "/path/to/computer-use-mcp/.venv/bin/python3",
      "args": ["/path/to/computer-use-mcp/__main__.py"]
    }
  }
}

Option B: ollama-mcp-bridge

git clone https://github.com/patruff/ollama-mcp-bridge.git
cd ollama-mcp-bridge
npm install && npm run build

Add the computer-use server to the bridge config.

Recommended Ollama models (April 2026):

  • gemma4:31b — best sub-32B for agents, native function calling

  • qwen3.5:32b — most stable tool calling

  • llama4:scout — 10M context, multimodal + tools

  • kimi-k2.5 — 200+ sequential tool calls without drift

  • glm-5.1 — long-horizon agentic (8+ hours continuous)

  • mistral-small4 — fast, 6B active params

  • granite4 — enterprise tool calling

MLX / Apple Silicon

On Apple Silicon Macs, use vLLM-MLX for optimized local inference and connect it to this server through MCP-Bridge:

Install vLLM-MLX:

pip install git+https://github.com/waybarrios/vllm-mlx.git

Start the inference server:

# Pick one model (top recommendations for tool calling); each command binds port 8000
vllm-mlx serve mlx-community/Gemma-4-31B-IT-4bit --port 8000
vllm-mlx serve mlx-community/Qwen3.5-32B-Instruct-4bit --port 8000
vllm-mlx serve mlx-community/Llama-4-Scout-17B-4bit --port 8000

Connect via MCP-Bridge:

{
  "inference_server": {
    "base_url": "http://localhost:8000/v1",
    "api_key": "not-needed"
  },
  "mcp_servers": {
    "computer-use": {
      "command": "/path/to/computer-use-mcp/.venv/bin/python3",
      "args": ["/path/to/computer-use-mcp/__main__.py"]
    }
  }
}

Performance: M4 Max achieves ~402 tokens/sec on small models, ~1112 tokens/sec with continuous batching.

Alternative: oMLX provides a macOS menu bar app with MCP tool integration.

vLLM

vLLM has native MCP integration with GPU-optimized inference.

pip install vllm
# Pick one model; each command starts a server on port 8000
vllm serve google/gemma-4-31b-it --port 8000        # or any tool-calling model
vllm serve Qwen/Qwen3.5-32B-Instruct --port 8000    # stable tool calling
vllm serve meta-llama/Llama-4-Scout-17B --port 8000 # multimodal + tools

Connect via MCP-Bridge using http://localhost:8000/v1 as the base URL.

Generic OpenAI-Compatible API

Any service exposing an OpenAI-compatible API (local or remote) can use this server through MCP-Bridge:

  1. Start your inference server (Ollama, llama.cpp, vLLM, MLX, TGI, etc.)

  2. Point MCP-Bridge at it with the base_url

  3. Add this server to MCP-Bridge's mcp_servers config

  4. MCP-Bridge intercepts API requests, enriches them with tool definitions, executes tool calls, and returns results
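The "enriches them with tool definitions" step amounts to translating each MCP tool description into the function-tool shape that OpenAI-compatible chat APIs accept. The sketch below illustrates that mapping only; it is not MCP-Bridge's actual code, and the left_click schema shown is an assumed example rather than the schema this server publishes.

```python
def mcp_tool_to_openai(tool: dict) -> dict:
    """Map an MCP tool description to an OpenAI-style function tool."""
    return {
        "type": "function",
        "function": {
            "name": tool["name"],
            "description": tool.get("description", ""),
            # MCP calls the JSON Schema "inputSchema"; OpenAI calls it "parameters".
            "parameters": tool.get(
                "inputSchema", {"type": "object", "properties": {}}
            ),
        },
    }


if __name__ == "__main__":
    # Assumed schema for illustration; query tools/list for the real one.
    left_click = {
        "name": "left_click",
        "description": "Left-click at coordinates",
        "inputSchema": {
            "type": "object",
            "properties": {
                "coordinate": {"type": "array", "items": {"type": "integer"}}
            },
            "required": ["coordinate"],
        },
    }
    print(mcp_tool_to_openai(left_click))
```

When the model answers with a tool call, the bridge performs the reverse step: it forwards the function name and arguments to the MCP server as a tools/call request.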


Tool Reference

Batch Operations — computer_batch

Execute multiple actions in a single call to eliminate round-trip latency:

{
  "actions": [
    {"action": "left_click", "coordinate": [100, 200]},
    {"action": "type", "text": "Hello, world!"},
    {"action": "key", "text": "Return"},
    {"action": "wait", "duration": 1},
    {"action": "screenshot"}
  ]
}

Supported actions: key, type, mouse_move, left_click, left_click_drag, right_click, middle_click, double_click, triple_click, scroll, hold_key, screenshot, cursor_position, left_mouse_down, left_mouse_up, wait
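A client can catch typos before sending a batch by checking each step against the supported-action list above. This is a client-side sketch; the helper name build_batch is hypothetical, and the server may apply further validation (for example on per-action parameters) that is not modeled here.

```python
SUPPORTED_ACTIONS = {
    "key", "type", "mouse_move", "left_click", "left_click_drag",
    "right_click", "middle_click", "double_click", "triple_click",
    "scroll", "hold_key", "screenshot", "cursor_position",
    "left_mouse_down", "left_mouse_up", "wait",
}


def build_batch(actions: list) -> dict:
    """Return computer_batch arguments, rejecting unknown action names."""
    for index, step in enumerate(actions):
        name = step.get("action")
        if name not in SUPPORTED_ACTIONS:
            raise ValueError(f"step {index}: unsupported action {name!r}")
    return {"actions": actions}


if __name__ == "__main__":
    batch = build_batch([
        {"action": "left_click", "coordinate": [100, 200]},
        {"action": "type", "text": "Hello, world!"},
        {"action": "key", "text": "Return"},
        {"action": "screenshot"},
    ])
    print(len(batch["actions"]), "actions ready to send")
```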

Mouse Tools

| Tool | Parameters | Description |
| --- | --- | --- |
| left_click | coordinate: [x, y] | Left-click at coordinates |
| right_click | coordinate: [x, y] | Right-click (context menu) |
| middle_click | coordinate: [x, y] | Middle-click (scroll wheel) |
| double_click | coordinate: [x, y] | Double-click (select word) |
| triple_click | coordinate: [x, y] | Triple-click (select line) |
| mouse_click | x, y, button, click_count | General click with full control |
| mouse_move | x, y or coordinate: [x, y] | Move cursor without clicking |
| mouse_drag | start_coordinate, coordinate | Drag with 20-step interpolation |
| left_mouse_down | (none) | Press and hold left button |
| left_mouse_up | (none) | Release left button |
| scroll | coordinate, scroll_direction, scroll_amount | Directional scroll (up/down/left/right) |
| mouse_scroll | amount, x, y | Scroll wheel (positive=up, negative=down) |
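The "20-step interpolation" used by mouse_drag can be pictured as a series of intermediate cursor positions between the start and end points. The source does not say which interpolation the server uses, so the sketch below assumes simple linear spacing; the function name interpolate_drag is illustrative.

```python
def interpolate_drag(start, end, steps=20):
    """Return `steps` evenly spaced points, ending exactly at `end`.

    Assumes linear interpolation; the server's actual easing may differ.
    """
    (x0, y0), (x1, y1) = start, end
    return [
        (x0 + (x1 - x0) * i / steps, y0 + (y1 - y0) * i / steps)
        for i in range(1, steps + 1)
    ]


if __name__ == "__main__":
    path = interpolate_drag((0, 0), (200, 100))
    print(path[0], path[-1])
```

Emitting intermediate positions matters because many applications ignore a drag that jumps from the press point to the release point in a single event.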

Keyboard Tools

| Tool | Parameters | Description |
| --- | --- | --- |
| key | text: "cmd+c", repeat | Unified key press with modifiers joined by + |
| hold_key | text: "shift", duration | Hold key for N seconds then release |
| keyboard_type | text | Type text character by character (Unicode) |
| keyboard_press | key | Press a single named key |
| keyboard_hotkey | keys: ["cmd", "c"] | Press key combination as array |

Supported keys: return, tab, space, delete, escape, arrows (left, right, up, down), home, end, pageup, pagedown, f1-f12, a-z, 0-9, symbols.

Modifiers: cmd/command, shift, alt/option, ctrl/control, fn
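To illustrate how a combo string such as "cmd+c" breaks down into modifiers and a final key, here is a small parsing sketch. It normalizes the modifier aliases listed above; the function name parse_combo and the canonical spellings it returns are assumptions, not the server's internal representation.

```python
MODIFIER_ALIASES = {
    "cmd": "cmd", "command": "cmd",
    "shift": "shift",
    "alt": "alt", "option": "alt",
    "ctrl": "ctrl", "control": "ctrl",
    "fn": "fn",
}


def parse_combo(text: str):
    """Split 'cmd+shift+4' into (['cmd', 'shift'], '4')."""
    parts = [p.strip().lower() for p in text.split("+") if p.strip()]
    if not parts:
        raise ValueError("empty key combo")
    *modifiers, key = parts  # the last element is the key being pressed
    unknown = [m for m in modifiers if m not in MODIFIER_ALIASES]
    if unknown:
        raise ValueError(f"unknown modifier(s): {unknown}")
    return [MODIFIER_ALIASES[m] for m in modifiers], key


if __name__ == "__main__":
    print(parse_combo("cmd+c"))
    print(parse_combo("Command+Shift+4"))
```

The same split yields the array form that keyboard_hotkey expects, for example ["cmd", "c"].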

Screenshot & Display Tools

| Tool | Parameters | Description |
| --- | --- | --- |
| take_screenshot | region (optional), max_dimension | Capture screen as base64 PNG with coordinate metadata |
| zoom | region: [x0, y0, x1, y1] | High-res crop of last screenshot (for reading small text) |
| switch_display | display | Switch active monitor for screenshots. Use "auto" for main. |
| list_displays | (none) | Enumerate all connected displays |
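Since take_screenshot returns the image as base64-encoded PNG, a client can recover the image dimensions without any imaging library by reading the PNG header. The sketch below relies only on the PNG file layout (8-byte signature followed by the IHDR chunk); which field of the tool result holds the base64 string is not specified here, so the function takes the string directly.

```python
import base64
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(b64_png: str):
    """Return (width, height) read from a base64-encoded PNG's IHDR chunk."""
    data = base64.b64decode(b64_png)
    # Bytes 0-7: signature, 8-11: chunk length, 12-15: chunk type "IHDR".
    if data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG image")
    # Bytes 16-23: width and height as big-endian unsigned 32-bit integers.
    width, height = struct.unpack(">II", data[16:24])
    return width, height
```

The resulting width and height are the image dimensions needed when converting image pixels to screen coordinates.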

Other Tools

| Tool | Parameters | Description |
| --- | --- | --- |
| read_clipboard | (none) | Read clipboard text |
| write_clipboard | text | Write text to clipboard |
| get_active_window | (none) | Frontmost window app, title, position, size |
| list_windows | (none) | All visible windows |
| get_screen_info | (none) | Screen dimensions, Retina scale, accessibility status |
| get_cursor_position | (none) | Current cursor coordinates |
| open_application | name or app | Launch macOS app by name |
| wait | duration | Pause for N seconds (0-100) |
| run_shell_command | command, timeout | Execute shell command |
| request_access | apps[], reason | Register apps for session access control |
| list_granted_applications | (none) | List currently granted apps |


Architecture

computer-use-mcp/
├── __main__.py                    # Entry point (python -m or direct)
├── __init__.py                    # Package metadata
├── pyproject.toml                 # Dependencies & build config
├── .mcp.json                     # Universal MCP client config
└── server/
    ├── __init__.py                # Re-exports all tool modules
    ├── computer_use_server.py     # MCP Server class, tool registry, stdio transport
    └── tools/
        ├── __init__.py            # Exports all tool getters/handlers
        ├── access_tools.py        # request_access, list_granted_applications
        ├── batch_tools.py         # computer_batch (action orchestrator)
        ├── clipboard_tools.py     # read/write clipboard (NSPasteboard)
        ├── display_tools.py       # switch_display, zoom, list_displays
        ├── keyboard_tools.py      # key, hold_key, type, press, hotkey (Quartz)
        ├── mouse_tools.py         # 12 mouse tools (Quartz CGEvent)
        ├── screen_tools.py        # screen info, cursor position (Quartz)
        ├── screenshot_tools.py    # screenshot capture (mss + PIL)
        ├── system_tools.py        # open app, wait, shell command
        └── window_tools.py        # active window, list windows (Quartz + AppKit)

How It Works

  1. Transport: stdio (JSON-RPC 2.0 over stdin/stdout)

  2. Tool Registry: ComputerUseMCPServer collects tools from 10 category modules, maps tool names to handlers

  3. Input Simulation: macOS Quartz CGEvent API for mouse/keyboard events posted to kCGHIDEventTap

  4. Screenshots: mss library for fast capture, PIL for resizing, base64 encoding

  5. Coordinate System: All tools use logical screen coordinates (Retina-aware). The server handles physical-to-logical scaling automatically.

Coordinate Mapping

Screenshots include metadata for mapping image pixels to screen coordinates:

click_x = (pixel_x / image_width) * logical_screen_width
click_y = (pixel_y / image_height) * logical_screen_height

On Retina displays, logical coordinates differ from physical pixels. The server handles this transparently.
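The mapping formula above translates directly into a helper that turns a point picked on a (possibly downscaled) screenshot into the logical coordinates the click tools expect. The function name image_to_screen is illustrative, and the example sizes are made up; real values come from the screenshot metadata and get_screen_info.

```python
def image_to_screen(pixel_x, pixel_y, image_size, screen_size):
    """Scale an image pixel position to logical screen coordinates."""
    image_w, image_h = image_size
    screen_w, screen_h = screen_size
    click_x = pixel_x / image_w * screen_w
    click_y = pixel_y / image_h * screen_h
    return round(click_x), round(click_y)


if __name__ == "__main__":
    # A 1280x800 screenshot of a 1512x982 logical screen (example values).
    print(image_to_screen(640, 400, (1280, 800), (1512, 982)))
```

Rounding to integers is a choice made in this sketch; pass the unrounded values through instead if sub-point precision matters for your use case.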


Troubleshooting

"Accessibility permission not granted"

Go to System Settings > Privacy & Security > Accessibility and add your terminal/IDE app.

Server fails to start

Ensure you're using the venv Python (not system Python):

/path/to/computer-use-mcp/.venv/bin/python3 __main__.py

Mouse/keyboard tools return errors but screenshots work

Screenshot capture doesn't need accessibility permission, but input simulation does. Grant accessibility access to the process running the server.

"ModuleNotFoundError: No module named 'server'"

The __main__.py adds its directory to sys.path automatically. If running as a module (python -m computer_use), set the cwd to the parent directory of computer_use/.

Multi-monitor: wrong screen captured

Use list_displays to see all monitors, then switch_display to select the correct one. Use switch_display("auto") to reset.


Contributing

Contributions are welcome! This server is designed to be extensible:

  1. Add new tools by creating a file in server/tools/

  2. Define get_*_tools() and handle_*_tool() functions

  3. Register in server/computer_use_server.py tool_sources list

  4. Update server/tools/__init__.py exports

Please ensure new tools follow the existing patterns for error handling and JSON response format.
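As a rough illustration of steps 1 and 2, a new tool module might look like the sketch below. Everything specific in it is an assumption: the get_unix_time tool is hypothetical, and the exact function signatures, whether handlers are sync or async, the tool description type, and the response keys should be copied from an existing module in server/tools/ rather than from this example.

```python
import json
import time


def get_time_tools() -> list:
    """Describe this module's tools (name, description, JSON Schema input)."""
    return [{
        "name": "get_unix_time",  # hypothetical tool, for illustration only
        "description": "Return the current Unix timestamp in seconds.",
        "inputSchema": {"type": "object", "properties": {}},
    }]


def handle_time_tool(name: str, arguments: dict) -> str:
    """Dispatch a call to one of this module's tools; always return JSON."""
    try:
        if name == "get_unix_time":
            return json.dumps({"success": True, "timestamp": int(time.time())})
        return json.dumps({"success": False, "error": f"unknown tool: {name}"})
    except Exception as exc:  # report failures as JSON instead of raising
        return json.dumps({"success": False, "error": str(exc)})
```

Returning a JSON error object instead of raising keeps a failed tool call from taking down the stdio session.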


License

MIT License - see LICENSE for details.
