Automating macOS using Claude Computer Use
Written by punkpeye on 2024-10-23.
- title: 'Automating macOS using Claude Computer Use'
altTitle: 'Using Python to automate macOS using Claude Computer Use'
description: 'Learn how to automate macOS operations using Claude''s Computer Use API and Python. Discover open-source projects that extend Claude''s screen viewing and control capabilities beyond browsers to system-wide automation.'
publishedDate: 2024-10-23 09:01:57
tags: anthropic, automation, macOS
author: 'Frank Fiegel'
authorGitHubUsername: 'punkpeye'
Just yesterday I shared A 5-Minute Setup Guide for Claude Computer Use. Claude's Computer Use API enables automation of computer operations through screen viewing, cursor control, and text input. While the original demo focused on browser automation, this guide extends these capabilities to system-wide macOS automation.
Automate macOS using Claude Computer Use and Python
System Requirements
- macOS 12 (Monterey) or later
- Python 3.8+
- Homebrew package manager
- Terminal application (Terminal.app, iTerm2, etc.)
Permissions Setup
Grant your terminal application (Terminal.app or iTerm2) permission to control the computer:
- Open System Settings
- Navigate to Privacy & Security → Accessibility
- Enable control for your terminal application
How It Works
The automation system consists of three main components:
- Claude's Computer Use API
  - Provides screen analysis capabilities
  - Interprets visual elements
  - Makes decisions about automation actions
- cliclick
  - Handles low-level system interaction
  - Manages mouse movements and clicks
  - Controls keyboard input
- Python Bridge (computer.py)
  - Coordinates between Claude and the system
  - Handles scaling and coordinate translation
  - Manages system commands
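To make the "scaling and coordinate translation" step concrete, here is a minimal sketch of how a bridge might map a point from the screenshot Claude sees to the real display. The function name `scale_point` and the two resolutions are assumptions chosen for illustration, not values taken from the actual computer.py.

```python
from typing import Tuple

Size = Tuple[int, int]


def scale_point(x: int, y: int, source: Size, target: Size) -> Tuple[int, int]:
    """Map a point from the `source` resolution to the `target` resolution."""
    src_w, src_h = source
    dst_w, dst_h = target
    if src_w <= 0 or src_h <= 0:
        raise ValueError("source size must be positive")
    # Scale each axis independently and round to the nearest pixel.
    return round(x * dst_w / src_w), round(y * dst_h / src_h)


# Hypothetical sizes: Claude sees a 1280x800 screenshot of a 2560x1600 display.
API_SIZE = (1280, 800)
SCREEN_SIZE = (2560, 1600)

# Claude asks for a click at the centre of the screenshot.
screen_x, screen_y = scale_point(640, 400, API_SIZE, SCREEN_SIZE)
print(screen_x, screen_y)  # 1280 800
```

The same function works in the other direction (screen to screenshot space) by swapping the `source` and `target` arguments.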
Setup Guide
- Install cliclick for mouse and keyboard emulation.
- Clone the Anthropic quickstart repository.
- Navigate to the demo directory.
- Replace computer-use-demo/computer_use_demo/tools/computer.py with the modified version below.
- Run the setup script.
- Activate the virtual environment.
- Export the required environment variables.
- Start the application.
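Before starting the application, a quick preflight check can catch a missing dependency or an unset variable. The sketch below is an illustration only: the helper name `preflight` is hypothetical, and the variable names `ANTHROPIC_API_KEY`, `WIDTH`, and `HEIGHT` are assumptions about what the demo expects, so adjust them to match your setup.

```python
import os
import shutil
from typing import Callable, List, Mapping, Optional, Sequence

# Assumed variable names; change these if your setup uses different ones.
REQUIRED_ENV = ("ANTHROPIC_API_KEY", "WIDTH", "HEIGHT")


def preflight(
    env: Mapping[str, str],
    required: Sequence[str] = REQUIRED_ENV,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    # cliclick must be discoverable on PATH for mouse and keyboard emulation.
    if which("cliclick") is None:
        problems.append("cliclick not found on PATH")
    # Every required variable must be present and non-empty.
    for name in required:
        if not env.get(name):
            problems.append(f"environment variable {name} is not set")
    return problems


if __name__ == "__main__":
    for problem in preflight(os.environ):
        print("WARNING:", problem)
```

The `which` parameter exists only so the check can be exercised without cliclick installed, for example in a unit test.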
computer.py
The meat and potatoes of the application.
Shoutout to wong2, the original author of this code. He shared the snippet on Twitter and Reddit. Consider contributing improvements to the original gist.
Never leave automated sessions unattended. The tool can control your mouse and keyboard, which could lead to unintended actions.
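As a rough illustration of the kind of translation such a bridge performs (this is not wong2's actual code), the sketch below turns a Computer Use action into a cliclick argument list. The helper name `build_cliclick_args` is hypothetical and only a handful of actions are covered. The cliclick commands used (`c:`, `rc:`, `dc:`, `m:`, `t:`) are part of its documented command set.

```python
from typing import List, Optional, Tuple

# Computer Use action name -> cliclick command prefix.
POINTER_COMMANDS = {
    "left_click": "c",
    "right_click": "rc",
    "double_click": "dc",
    "mouse_move": "m",
}


def build_cliclick_args(
    action: str,
    coordinate: Optional[Tuple[int, int]] = None,
    text: Optional[str] = None,
) -> List[str]:
    """Build the argv list for one action without executing anything."""
    if action in POINTER_COMMANDS:
        if coordinate is None:
            if action == "mouse_move":
                raise ValueError("mouse_move requires a coordinate")
            # "." tells cliclick to use the current pointer position.
            target = "."
        else:
            target = f"{coordinate[0]},{coordinate[1]}"
        return ["cliclick", f"{POINTER_COMMANDS[action]}:{target}"]
    if action == "type":
        if text is None:
            raise ValueError("type requires text")
        return ["cliclick", f"t:{text}"]
    raise ValueError(f"unsupported action: {action}")


print(build_cliclick_args("left_click", (1280, 800)))  # ['cliclick', 'c:1280,800']
print(build_cliclick_args("type", text="hello"))       # ['cliclick', 't:hello']
```

Returning an argument list rather than a shell string means the result can be passed to `subprocess.run(args, check=True)` without shell quoting issues, and it keeps the mapping easy to test without moving the real pointer.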
By using this tool, you accept responsibility for any actions performed through the automation interface. When in doubt, test in a safe environment first.
Related Projects
A few projects are mentioned here: https://os-world.github.io/
What follows are projects that I've discovered after the fact.
Another Python Project
Open Interface
Open Interface is one of the most impressive open-source projects I've seen so far in this category.
https://github.com/AmberSahdev/Open-Interface/
- Self-drives computers by sending user requests to an LLM backend (GPT-4V, etc.) to figure out the required steps.
- Automatically executes the steps by simulating keyboard and mouse input.
- Course-corrects by sending the LLM a current screenshot of the computer as needed.
The codebase is also written in Python.
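The request, execute, course-correct cycle described above can be sketched generically as follows. This is not Open Interface's code: `run_agent_loop` and its three callables are hypothetical stand-ins for a screenshot function, an LLM call, and an input simulator.

```python
from typing import Callable, List, Optional


def run_agent_loop(
    request: str,
    take_screenshot: Callable[[], bytes],
    plan_next_step: Callable[[str, bytes, List[str]], Optional[str]],
    execute_step: Callable[[str], None],
    max_steps: int = 10,
) -> List[str]:
    """Observe, plan, act; stop when the planner returns None or the cap is hit."""
    history: List[str] = []
    for _ in range(max_steps):
        screenshot = take_screenshot()  # fresh view of the screen each turn
        step = plan_next_step(request, screenshot, history)
        if step is None:  # planner reports the task is complete
            break
        execute_step(step)
        history.append(step)
    return history


# Demo with fakes: a scripted "planner" that finishes after two steps.
scripted = iter(["open Safari", "type URL", None])
executed: List[str] = []
steps = run_agent_loop(
    "open example.com",
    take_screenshot=lambda: b"",
    plan_next_step=lambda request, shot, history: next(scripted),
    execute_step=executed.append,
)
print(steps)  # ['open Safari', 'type URL']
```

The `max_steps` cap is a deliberate design choice: it bounds how long the loop can drive the mouse and keyboard if the planner never signals completion.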
open-interpreter
Surprised it took me so long to find this project, but this one takes the cake. 🍰
https://github.com/OpenInterpreter/open-interpreter
It has many contributors, it's well documented, and it has a simple CLI interface. Works on Windows and Mac.