Computer Use MCP Server
A production-grade macOS Computer Use MCP Server that exposes 33 tools across 10 categories for full desktop automation via the Model Context Protocol. Control mouse, keyboard, screenshots, clipboard, windows, and more from any MCP-compatible AI client.
Works with Claude Code, Cursor, VS Code, Windsurf, LM Studio, Ollama, llama.cpp, MLX, and any MCP-compatible tool.
Features
33 Tools Across 10 Categories
Category | Tools | Description |
Mouse (12) | | Full mouse control with coordinate-based clicking, dragging with 20-step interpolation, directional scrolling |
Keyboard (5) | | Unified key combos ( |
Screenshot (1) | | Full-screen or region capture with Retina scaling, coordinate metadata, and configurable resolution |
Display (3) | | Multi-monitor switching, high-res region zoom for reading small text, display enumeration |
Clipboard (2) | | Read/write system clipboard via NSPasteboard |
Window (2) | | Frontmost window info, enumerate all visible windows with position/size |
Screen (2) | | Display dimensions, Retina scale, accessibility status, cursor coordinates |
System (3) | | Launch apps by name, timed waits, shell command execution |
Access (2) | | App permission tracking for session-based access control |
Batch (1) | | Execute multiple actions in a single call - eliminates round-trip latency |
Quick Start
# 1. Clone
git clone https://github.com/syedazharmbnr1/computer-use-mcp.git
cd computer-use-mcp
# 2. Setup
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt # or: pip install mcp mss pillow pyobjc-framework-Quartz
# 3. Test
python3 __main__.py
The server communicates over stdio (stdin/stdout) using the MCP JSON-RPC protocol.
Installation
Prerequisites
macOS (uses Quartz framework for input simulation)
Python 3.10+
Accessibility permissions (System Settings > Privacy & Security > Accessibility)
Install Dependencies
git clone https://github.com/syedazharmbnr1/computer-use-mcp.git
cd computer-use-mcp
python3 -m venv .venv
source .venv/bin/activate
pip install "mcp>=1.26.0" mss pillow pyobjc-framework-Quartz
Verify Installation
python3 -c "
from server.computer_use_server import ComputerUseMCPServer
server = ComputerUseMCPServer()
tools = server._collect_all_tools()
print(f'Server OK - {len(tools)} tools registered')
"
Expected output: Server OK - 33 tools registered
Grant Accessibility Permission
The server needs macOS accessibility access to simulate mouse/keyboard input:
Open System Settings > Privacy & Security > Accessibility
Add your terminal app (Terminal, iTerm2, VS Code, etc.)
Toggle the permission ON
Screenshot capture works without accessibility permission. Only mouse/keyboard tools require it.
Configuration for AI Coding Tools
The server uses stdio transport - it reads from stdin and writes to stdout. Every MCP client connects the same way: spawn the Python process and pipe stdio.
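Because every client below takes the same command and args pair, the entry can be generated instead of copied by hand. The following is a minimal sketch, assuming the repository was cloned to a known directory with its virtual environment at `.venv`; the helper name `mcp_server_entry` is hypothetical.

```python
import json
from pathlib import PurePosixPath


def mcp_server_entry(repo_dir: str, include_cwd: bool = False) -> dict:
    """Build the command/args entry that launches the server from its venv."""
    repo = PurePosixPath(repo_dir)
    entry = {
        "command": str(repo / ".venv" / "bin" / "python3"),
        "args": [str(repo / "__main__.py")],
    }
    if include_cwd:
        # Some clients accept a working directory; include it only if needed.
        entry["cwd"] = str(repo)
    return entry


config = {"mcpServers": {"computer-use": mcp_server_entry("/path/to/computer-use-mcp")}}
print(json.dumps(config, indent=2))
```

The printed JSON can be pasted into whichever client config file applies, adjusting the top-level key where a client uses a different one.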
Claude Code
Edit ~/.claude/settings.json:
{
"mcpServers": {
"computer-use": {
"command": "/path/to/computer-use-mcp/.venv/bin/python3",
"args": ["/path/to/computer-use-mcp/__main__.py"],
"cwd": "/path/to/computer-use-mcp"
}
}
}
Then run /mcp in Claude Code to connect.
Cursor
Create .cursor/mcp.json in your project root (or ~/.cursor/mcp.json globally):
{
"mcpServers": {
"computer-use": {
"command": "/path/to/computer-use-mcp/.venv/bin/python3",
"args": ["/path/to/computer-use-mcp/__main__.py"]
}
}
}
VS Code + GitHub Copilot
Create .vscode/mcp.json in your workspace:
{
"servers": {
"computer-use": {
"command": "/path/to/computer-use-mcp/.venv/bin/python3",
"args": ["/path/to/computer-use-mcp/__main__.py"]
}
}
}
Windsurf
Edit ~/.codeium/windsurf/mcp_config.json:
{
"mcpServers": {
"computer-use": {
"command": "/path/to/computer-use-mcp/.venv/bin/python3",
"args": ["/path/to/computer-use-mcp/__main__.py"]
}
}
}
JetBrains IDEs
Add via Settings > Tools > MCP Servers, using the same command/args pattern.
Zed
Add to your Zed settings (~/.config/zed/settings.json):
{
"language_models": {
"mcp_servers": {
"computer-use": {
"command": "/path/to/computer-use-mcp/.venv/bin/python3",
"args": ["/path/to/computer-use-mcp/__main__.py"]
}
}
}
}
Cline / Continue.dev
Both support the standard MCP JSON config format. Add to their respective config files using the same command + args pattern shown above.
Recommended Models for Tool Calling (April 2026)
Per LM Arena rankings and real-world testing, these are the best models for MCP tool calling:
Top Open Source Models (LM Arena Elo)
Rank | Model | Provider | Parameters | Highlights |
1 | GLM-5 | Zhipu AI | MoE | #1 open source (Elo 1451), 77.8% SWE-bench Verified |
2 | Kimi K2.5 | Moonshot AI | MoE | HumanEval 99.0, stable across 200-300 sequential tool calls |
3 | GLM-4.7 | Zhipu AI | MoE | HumanEval 94.2, AIME 2025 95.7, GPQA 85.7 |
4 | GLM-5.1 | Zhipu AI | 744B MoE / 40B active | MIT license, 200K context, 8+ hour continuous agentic sessions |
5 | Qwen 3.6 Plus | Alibaba | Dense | 1M context, native function calling, always-on CoT reasoning |
6 | Gemma 4 31B | Google | 31B Dense | #3 Arena text, Apache 2.0, native tool calling, 256K context |
7 | Llama 4 Scout | Meta | 17B active / 16 experts | 10M context window, multimodal, beats Gemini 2.0 Flash-Lite |
8 | Llama 4 Maverick | Meta | 17B active / 128 experts | Beats GPT-4o, best multimodal in class |
9 | Mistral Small 4 | Mistral AI | 119B MoE / 6B active | Unified instruct+reasoning+coding+vision, 256K context |
10 | Qwen 3.5 | Alibaba | Multiple sizes | Most stable tool calling, rarely hallucinates calls |
Best Models by Platform
Ollama (run locally via ollama pull <model>):
gemma4 (E2B / E4B / 26B MoE / 31B Dense) - native function calling, best sub-32B for agents
qwen3.5 / qwen3.6-plus - most stable tool calling, rarely drops parameters
llama4 (Scout / Maverick) - native multimodal + tools, 10M context
kimi-k2.5 - 200+ sequential tool calls without drift
glm-5.1 - long-horizon agentic coding (8+ hours continuous)
mistral-small4 - unified model, 6B active, fast
granite4 - enterprise-grade tool calling
phi-4-mini - compact with function calling support
deepseek-r1 - strong reasoning + tool use
llama.cpp (GGUF format):
bartowski/Gemma-4-31B-IT-GGUF - best open weight for agents
bartowski/Qwen3.5-32B-Instruct-GGUF - stable tool calling
bartowski/Llama-4-Scout-17B-GGUF - 10M context, multimodal
bartowski/GLM-5.1-40B-GGUF - top open source coding
Any model with Jinja chat template + function calling support
MLX (Apple Silicon via mlx-community):
mlx-community/Gemma-4-31B-IT-4bit - best performance/quality on Apple Silicon
mlx-community/Qwen3.5-32B-Instruct-4bit - stable tool calls
mlx-community/Llama-4-Scout-17B-4bit - multimodal + tools
mlx-community/Mistral-Small-4-6B-4bit - fast, 6B active
LM Studio: All of the above models are available through LM Studio's model browser with native MCP host support.
Configuration for Local Model Frameworks
LM Studio
LM Studio has native MCP host support since v0.3.17.
Open LM Studio > Settings > MCP
Add a new MCP server with:
Command: /path/to/computer-use-mcp/.venv/bin/python3
Args: ["/path/to/computer-use-mcp/__main__.py"]
Select a model with tool calling support:
Top picks: Gemma 4 31B, Qwen 3.5/3.6, Llama 4 Scout, GLM-5.1, Mistral Small 4, Kimi K2.5
The tools will appear in the chat interface
llama.cpp (Native MCP - March 2026+)
llama.cpp merged native MCP client support in March 2026 (PR #18655), adding a full agentic loop with MCP server management in the WebUI.
Start llama-server with MCP:
# Start with a top function-calling model (pick one)
llama-server --jinja -fa -hf bartowski/Gemma-4-31B-IT-GGUF:Q4_K_M --port 8080
llama-server --jinja -fa -hf bartowski/Qwen3.5-32B-Instruct-GGUF:Q4_K_M --port 8080
llama-server --jinja -fa -hf bartowski/Llama-4-Scout-17B-GGUF:Q4_K_M --port 8080
Then in the llama.cpp WebUI:
Go to MCP Server Settings
Add this server with command:
/path/to/.venv/bin/python3 /path/to/__main__.py
The 33 tools will be available in the agentic loop
Via llama-mcp-server bridge:
npm install -g llama-mcp-server
Configure in claude_desktop_config.json:
{
"mcpServers": {
"computer-use": {
"command": "/path/to/computer-use-mcp/.venv/bin/python3",
"args": ["/path/to/computer-use-mcp/__main__.py"]
}
}
}
Supported models for tool calling: Gemma 4, Qwen 3.5/3.6, Llama 4 Scout/Maverick, GLM-5.1, Kimi K2.5, Mistral Small 4, Llama 3.3, DeepSeek R1, Granite 4, Phi-4-mini, Hermes 3, Functionary v3.
Ollama
Ollama does not have native MCP support yet, but several bridge solutions work:
Option A: MCP-Bridge (recommended)
MCP-Bridge acts as middleware between Ollama's OpenAI-compatible API and MCP servers.
git clone https://github.com/SecretiveShell/MCP-Bridge.git
cd MCP-Bridge
Configure config.json:
{
"inference_server": {
"base_url": "http://localhost:11434/v1",
"api_key": "ollama"
},
"mcp_servers": {
"computer-use": {
"command": "/path/to/computer-use-mcp/.venv/bin/python3",
"args": ["/path/to/computer-use-mcp/__main__.py"]
}
}
}
Option B: ollama-mcp-bridge
git clone https://github.com/patruff/ollama-mcp-bridge.git
cd ollama-mcp-bridge
npm install && npm run build
Add the computer-use server to the bridge config.
Recommended Ollama models (April 2026):
gemma4:31b - best sub-32B for agents, native function calling
qwen3.5:32b - most stable tool calling
llama4:scout - 10M context, multimodal + tools
kimi-k2.5 - 200+ sequential tool calls without drift
glm-5.1 - long-horizon agentic (8+ hours continuous)
mistral-small4 - fast, 6B active params
granite4 - enterprise tool calling
MLX / Apple Silicon
For Apple Silicon Macs, use vLLM-MLX for optimized local inference with MCP bridge:
Install vLLM-MLX:
pip install git+https://github.com/waybarrios/vllm-mlx.git
Start the inference server:
# Pick a model (top recommendations for tool calling)
vllm-mlx serve mlx-community/Gemma-4-31B-IT-4bit --port 8000
vllm-mlx serve mlx-community/Qwen3.5-32B-Instruct-4bit --port 8000
vllm-mlx serve mlx-community/Llama-4-Scout-17B-4bit --port 8000
Connect via MCP-Bridge:
{
"inference_server": {
"base_url": "http://localhost:8000/v1",
"api_key": "not-needed"
},
"mcp_servers": {
"computer-use": {
"command": "/path/to/computer-use-mcp/.venv/bin/python3",
"args": ["/path/to/computer-use-mcp/__main__.py"]
}
}
}
Performance: M4 Max achieves ~402 tokens/sec on small models, ~1112 tokens/sec with continuous batching.
Alternative: oMLX provides a macOS menu bar app with MCP tool integration.
vLLM
vLLM has native MCP integration with GPU-optimized inference.
pip install vllm
vllm serve google/gemma-4-31b-it --port 8000 # or any tool-calling model
vllm serve Qwen/Qwen3.5-32B-Instruct --port 8000 # stable tool calling
vllm serve meta-llama/Llama-4-Scout-17B --port 8000 # multimodal + tools
Connect via MCP-Bridge using http://localhost:8000/v1 as the base URL.
Generic OpenAI-Compatible API
Any service exposing an OpenAI-compatible API (local or remote) can use this server through MCP-Bridge:
Start your inference server (Ollama, llama.cpp, vLLM, MLX, TGI, etc.)
Point MCP-Bridge at it with the base_url
Add this server to MCP-Bridge's mcp_servers config
MCP-Bridge intercepts API requests, enriches them with tool definitions, executes tool calls, and returns results
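To make the "enriches them with tool definitions" step concrete, here is a minimal sketch of how a bridge could translate an MCP tool definition into the OpenAI function-calling tool format. It is an illustration of the idea and not MCP-Bridge's actual code; the function name `mcp_tool_to_openai` and the example input schema are assumptions.

```python
def mcp_tool_to_openai(tool: dict) -> dict:
    """Map an MCP tool definition onto an OpenAI-style function tool."""
    return {
        "type": "function",
        "function": {
            "name": tool["name"],
            "description": tool.get("description", ""),
            # MCP describes arguments with a JSON Schema under "inputSchema".
            "parameters": tool.get("inputSchema", {"type": "object", "properties": {}}),
        },
    }


example_tool = {
    "name": "computer_batch",
    "description": "Execute multiple actions in a single call",
    "inputSchema": {  # assumed schema, for illustration only
        "type": "object",
        "properties": {"actions": {"type": "array"}},
        "required": ["actions"],
    },
}

openai_tool = mcp_tool_to_openai(example_tool)
print(openai_tool["function"]["name"])
```

The bridge would attach a list of such entries to each chat completion request, then route any tool call the model emits back to the MCP server.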
Tool Reference
Batch Operations — computer_batch
Execute multiple actions in a single call to eliminate round-trip latency:
{
"actions": [
{"action": "left_click", "coordinate": [100, 200]},
{"action": "type", "text": "Hello, world!"},
{"action": "key", "text": "Return"},
{"action": "wait", "duration": 1},
{"action": "screenshot"}
]
}
Supported actions: key, type, mouse_move, left_click, left_click_drag, right_click, middle_click, double_click, triple_click, scroll, hold_key, screenshot, cursor_position, left_mouse_down, left_mouse_up, wait
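A client can check a batch against the supported action list before sending it. The sketch below wraps the actions in an MCP `tools/call` request for `computer_batch`; the helper name `build_batch_call` is hypothetical, and the client-side validation is an added convenience rather than something the server is known to require.

```python
import json

# Action names copied from the supported-actions list above.
SUPPORTED_ACTIONS = {
    "key", "type", "mouse_move", "left_click", "left_click_drag",
    "right_click", "middle_click", "double_click", "triple_click",
    "scroll", "hold_key", "screenshot", "cursor_position",
    "left_mouse_down", "left_mouse_up", "wait",
}


def build_batch_call(request_id: int, actions: list[dict]) -> dict:
    """Return a JSON-RPC tools/call request for computer_batch."""
    unknown = [a.get("action") for a in actions if a.get("action") not in SUPPORTED_ACTIONS]
    if unknown:
        raise ValueError(f"Unsupported actions: {unknown}")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "computer_batch", "arguments": {"actions": actions}},
    }


request = build_batch_call(1, [
    {"action": "left_click", "coordinate": [100, 200]},
    {"action": "type", "text": "Hello, world!"},
    {"action": "screenshot"},
])
print(json.dumps(request))
```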
Mouse Tools
Tool | Parameters | Description |
| | Left-click at coordinates |
| | Right-click (context menu) |
| | Middle-click (scroll wheel) |
| | Double-click (select word) |
| | Triple-click (select line) |
| | General click with full control |
| | Move cursor without clicking |
| | Drag with 20-step interpolation |
| (none) | Press and hold left button |
| (none) | Release left button |
| | Directional scroll (up/down/left/right) |
| | Scroll wheel (positive=up, negative=down) |
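The "20-step interpolation" used for dragging can be pictured as moving the cursor through evenly spaced intermediate points. The following is a minimal sketch assuming plain linear interpolation; the server's actual path, timing, or easing may differ, and the function name `interpolate_drag` is hypothetical.

```python
def interpolate_drag(start: tuple[float, float], end: tuple[float, float],
                     steps: int = 20) -> list[tuple[float, float]]:
    """Return `steps` evenly spaced points from just after start through end."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    (x0, y0), (x1, y1) = start, end
    return [
        (x0 + (x1 - x0) * i / steps, y0 + (y1 - y0) * i / steps)
        for i in range(1, steps + 1)
    ]


path = interpolate_drag((100, 200), (300, 400))
print(len(path), path[0], path[-1])
```

Posting a mouse-dragged event at each point, instead of jumping straight to the destination, is what lets applications register the gesture as a drag.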
Keyboard Tools
Tool | Parameters | Description |
| | Unified key press with modifiers joined by |
| | Hold key for N seconds then release |
| | Type text character by character (Unicode) |
| | Press a single named key |
| | Press key combination as array |
Supported keys: return, tab, space, delete, escape, arrows (left, right, up, down), home, end, pageup, pagedown, f1-f12, a-z, 0-9, symbols.
Modifiers: cmd/command, shift, alt/option, ctrl/control, fn
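Since each modifier has two accepted spellings, a client may want to normalize a combination before sending it. This is a minimal sketch, assuming the combination is passed as a list of key names; the canonical spellings chosen here and the helper name `split_combo` are illustrative and say nothing about how the server maps keys internally.

```python
# Alias pairs taken from the modifier list above; canonical form is the short name.
MODIFIER_ALIASES = {
    "cmd": "cmd", "command": "cmd",
    "shift": "shift",
    "alt": "alt", "option": "alt",
    "ctrl": "ctrl", "control": "ctrl",
    "fn": "fn",
}


def split_combo(keys: list[str]) -> tuple[list[str], list[str]]:
    """Separate a key combination into canonical modifiers and remaining keys."""
    modifiers: list[str] = []
    others: list[str] = []
    for key in keys:
        name = key.strip().lower()
        if name in MODIFIER_ALIASES:
            canonical = MODIFIER_ALIASES[name]
            if canonical not in modifiers:  # drop duplicates such as cmd + command
                modifiers.append(canonical)
        else:
            others.append(name)
    return modifiers, others


print(split_combo(["Command", "Shift", "4"]))
```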
Screenshot & Display Tools
Tool | Parameters | Description |
| | Capture screen as base64 PNG with coordinate metadata |
| | High-res crop of last screenshot (for reading small text) |
| | Switch active monitor for screenshots. Use |
| (none) | Enumerate all connected displays |
Other Tools
Tool | Parameters | Description |
| (none) | Read clipboard text |
| | Write text to clipboard |
| (none) | Frontmost window app, title, position, size |
| (none) | All visible windows |
| (none) | Screen dimensions, Retina scale, accessibility status |
| (none) | Current cursor coordinates |
| | Launch macOS app by name |
| | Pause for N seconds (0-100) |
| | Execute shell command |
| | Register apps for session access control |
| (none) | List currently granted apps |
Architecture
computer-use-mcp/
├── __main__.py                    # Entry point (python -m or direct)
├── __init__.py                    # Package metadata
├── pyproject.toml                 # Dependencies & build config
├── .mcp.json                      # Universal MCP client config
└── server/
    ├── __init__.py                # Re-exports all tool modules
    ├── computer_use_server.py     # MCP Server class, tool registry, stdio transport
    └── tools/
        ├── __init__.py            # Exports all tool getters/handlers
        ├── access_tools.py        # request_access, list_granted_applications
        ├── batch_tools.py         # computer_batch (action orchestrator)
        ├── clipboard_tools.py     # read/write clipboard (NSPasteboard)
        ├── display_tools.py       # switch_display, zoom, list_displays
        ├── keyboard_tools.py      # key, hold_key, type, press, hotkey (Quartz)
        ├── mouse_tools.py         # 12 mouse tools (Quartz CGEvent)
        ├── screen_tools.py        # screen info, cursor position (Quartz)
        ├── screenshot_tools.py    # screenshot capture (mss + PIL)
        ├── system_tools.py        # open app, wait, shell command
        └── window_tools.py        # active window, list windows (Quartz + AppKit)
How It Works
Transport: stdio (JSON-RPC 2.0 over stdin/stdout)
Tool Registry: ComputerUseMCPServer collects tools from 10 category modules, maps tool names to handlers
Input Simulation: macOS Quartz CGEvent API for mouse/keyboard events posted to kCGHIDEventTap
Screenshots: mss library for fast capture, PIL for resizing, base64 encoding
Coordinate System: All tools use logical screen coordinates (Retina-aware). The server handles physical-to-logical scaling automatically.
Coordinate Mapping
Screenshots include metadata for mapping image pixels to screen coordinates:
click_x = (pixel_x / image_width) * logical_screen_width
click_y = (pixel_y / image_height) * logical_screen_height
On Retina displays, logical coordinates differ from physical pixels. The server handles this transparently.
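The mapping above can be written as a small helper for clients that pick click targets from a screenshot. This is a minimal sketch; the function name `image_to_screen` and the example dimensions are assumptions, and the real values should come from the metadata returned with each screenshot.

```python
def image_to_screen(pixel_x: float, pixel_y: float,
                    image_width: int, image_height: int,
                    logical_screen_width: int, logical_screen_height: int) -> tuple[float, float]:
    """Convert a pixel position in a screenshot to logical screen coordinates."""
    if image_width <= 0 or image_height <= 0:
        raise ValueError("image dimensions must be positive")
    click_x = (pixel_x / image_width) * logical_screen_width
    click_y = (pixel_y / image_height) * logical_screen_height
    return click_x, click_y


# Example: a 1280x800 screenshot of a display that is 1440x900 in logical points.
print(image_to_screen(640, 400, 1280, 800, 1440, 900))  # (720.0, 450.0)
```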
Troubleshooting
"Accessibility permission not granted"
Go to System Settings > Privacy & Security > Accessibility and add your terminal/IDE app.
Server fails to start
Ensure you're using the venv Python (not system Python):
/path/to/computer-use-mcp/.venv/bin/python3 __main__.py
Mouse/keyboard tools return errors but screenshots work
Screenshot capture doesn't need accessibility permission, but input simulation does. Grant accessibility access to the process running the server.
"ModuleNotFoundError: No module named 'server'"
The __main__.py adds its directory to sys.path automatically. If running as a module (python -m computer_use), set the cwd to the parent directory of computer_use/.
Multi-monitor: wrong screen captured
Use list_displays to see all monitors, then switch_display to select the correct one. Use switch_display("auto") to reset.
Contributing
Contributions are welcome! This server is designed to be extensible:
Add new tools by creating a file in server/tools/
Define get_*_tools() and handle_*_tool() functions
Register in server/computer_use_server.py tool_sources list
Update server/tools/__init__.py exports
Please ensure new tools follow the existing patterns for error handling and JSON response format.
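As a starting point, a new tool module might look roughly like the sketch below. Everything in it is an assumption made for illustration: the tool name `echo_text`, the use of plain dicts for tool definitions, the handler signature, and the `success` / `error` response fields. Check an existing module such as `server/tools/clipboard_tools.py` for the actual types and response shape before copying this pattern.

```python
import json


def get_example_tools() -> list[dict]:
    """Describe the tools this module provides (hypothetical definition format)."""
    return [{
        "name": "echo_text",
        "description": "Return the given text unchanged",
        "inputSchema": {
            "type": "object",
            "properties": {"text": {"type": "string"}},
            "required": ["text"],
        },
    }]


def handle_example_tool(name: str, arguments: dict) -> str:
    """Run a tool and return a JSON string, reporting failures in the payload."""
    try:
        if name != "echo_text":
            raise ValueError(f"Unknown tool: {name}")
        return json.dumps({"success": True, "text": arguments["text"]})
    except (KeyError, ValueError) as exc:
        # Return errors as data so the client always receives valid JSON.
        return json.dumps({"success": False, "error": str(exc)})


print(handle_example_tool("echo_text", {"text": "hi"}))
```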
License
MIT License - see LICENSE for details.