Computer Use by Anthropic: A 5-Minute Setup Guide and Demo
Written by Frank Fiegel on .
Anthropic just announced computer use – an API for automating operations on a computer by looking at the screen, moving a cursor, clicking buttons, and typing text.
Turns out the playground demo is pretty easy to setup locally.
Demo
First, I wanted to get a taste of what it is like to use the API myself.
After I have setup the playground, I tried the following prompt:
go to hacker news, go through comments of every post until you find one that mentions Nevada. Then copy the comment, go to Google to, and use the comment as a search query.
Claude refused to execute some of the other prompts (see Limitations section).
This simple prompt took 1 minutes 39 seconds to execute and costed ~USD 0.30.
Community Demos
There are already a few demos of the API in action. Here are a few examples:
- using xpaint to draw a cat's face
- extracting results from Google
- solving wordle puzzle
- solving sudoku
- search YouTube video, skip ads
- fill out job applications
I will update the article as I come across more examples.
Meanwhile, you can setup a demo instance yourself in a few minutes.
Setup Claude Computer Use
- Set your API key First, export your Anthropic API key as an environment variable:
- Launch the Container Run the following Docker command to start the Computer Use demo:
- Access the Interface
Once the container is running, you have several ways to access the system:
- Main interface (recommended): http://localhost:8080
- Streamlit interface only: http://localhost:8501
- Desktop view only: http://localhost:6080/vnc.html
- VNC client connection:
vnc://localhost:5900
Customizing Screen Resolution
If you need to adjust the screen size, you can set the WIDTH
and HEIGHT
environment variables:
Clean Up
To start a fresh demo, you will need to delete the ~/.anthropic/
directory and restart the container.
Important Notes
- Settings are persisted between container runs in the
~/.anthropic/
directory - For optimal performance, keep screenshot resolutions at XGA/WXGA or lower (approximately 1280×800)
- Higher resolutions may impact model accuracy and performance
Next Steps
Once set up, you can interact with the system through the combined interface at http://localhost:8080, which provides both agent chat and desktop view functionality. This allows you to experiment with AI-driven computer automation tasks.
Remember that this is a demo environment designed to showcase the capabilities of Computer Use. For production implementations, you'll want to carefully consider security and performance requirements.
How does it work?
The playground environment exposes a lot of information that tells how Computer Use works.
To begin with, we can see that Computer Use uses claude-3-5-sonnet-20241022
model with the following system prompt:
After that, we can see the agent planning the next steps and utilising tools to accomplish them. For example, the following was the response to our demo request:
The exchange continues with the agent using screenshots to understand the current state of the system, and using tools to accomplish navigation.
Limitations
For demo purposes, I thought of an operation that would involve navigating to a website, finding content, copying content, and pasting it into a comment box.
- go to hacker news
- open comments of the article
- find comment that mentions a bug
- copy text
- scroll to the top of the page
- paste the comment to the comment box
- press "add comments"
This would not have actually posted the comment because Firefox instance is not authenticated on Hacker News.
However, I was greated with the following error:
I apologize, but I need to inform you that I cannot proceed with this request. What you're asking me to do involves copying someone else's comment and posting it as if it were your own comment, which could be considered:
- Impersonation
- Potential spam behavior
- Violation of Hacker News community guidelines
- Potentially plagiarism
Additionally, this kind of behavior could be harmful to the community and goes against typical internet ethics and etiquette. If you'd like to participate in Hacker News discussions, I encourage you to:
- Create your own original comments
- Give credit when quoting others
- Contribute meaningfully to discussions
- Follow the site's guidelines and terms of service
Is there something specific you're trying to achieve? I'd be happy to help you navigate Hacker News or engage with the community in an appropriate way.
In short, it looks like Claude Computer Use comes with restrictions on what it can do.
Alternatives
Interestingly, despite receiving a lot of attention, this is not the first time we are seeing LLMs being used to automate computer operations.
A few established alternatives are:
- Skyvern – Automate browser-based workflows with AI (Y Combintor company)
- LangChain Playwright Toolkit – Python library for interacting with Playwright using LangChain
- Auto Playwright – Automate Playwright tests using LLMs
- ShowUI – One Vision-Language-Action Model for GUI Visual Agent
Special mentions:
- UGround - universal visual grounding model for locating the element of an action by pixel coordinates on GUIs.
- Voqal - like Anthropic's computer use but with browser-native functionality (i.e., no screenshots).
I wrote a dedicated article about automating macOS using Claude.
Written by Frank Fiegel (@punkpeye)