plugin-computer-use
A persistent stdio MCP server that exposes the Anthropic computer-use action surface (screenshot, click, move, keyboard, clipboard, batch) against the SideButton agent desktop on DISPLAY=:10.
This repo is the scaffold + dispatch core for the Computer Use epic (SCRUM-1399). It is delivered by SCRUM-1397:
- the long-lived stdio MCP server loop (initialize / tools/list / tools/call),
- the ported computer.py dispatch base (DISPLAY targeting, screenshot → base64 PNG, coordinate scaling, single-owner lock, xdotool runner),
- the full tool surface declared so tools/list returns it,
- screenshot wired end-to-end as the proof action.
The individual tool bodies land in sibling tickets (SCRUM-1400…1405) and
hosting this as a runtime: "service" plugin is SCRUM-1406.
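The "coordinate scaling" part of the dispatch base can be pictured with a small sketch. This is not the code in computer.py; `scale_point` and its parameters are hypothetical names, and the sketch assumes a plain proportional mapping from screenshot-space coordinates to screen-space coordinates.

```python
def scale_point(
    x: int, y: int, src: tuple[int, int], dst: tuple[int, int]
) -> tuple[int, int]:
    """Map (x, y) from a src-sized image onto a dst-sized screen.

    Assumption: purely proportional scaling, no letterboxing or offsets.
    """
    src_w, src_h = src
    dst_w, dst_h = dst
    if not (0 <= x < src_w and 0 <= y < src_h):
        raise ValueError(f"({x}, {y}) is outside the {src_w}x{src_h} source")
    # Clamp so rounding can never land one pixel past the screen edge.
    sx = min(dst_w - 1, round(x * dst_w / src_w))
    sy = min(dst_h - 1, round(y * dst_h / src_h))
    return sx, sy
```

For instance, the centre of a 1024x768 screenshot maps to the centre of a 1920x1080 screen: `scale_point(512, 384, (1024, 768), (1920, 1080))` gives `(960, 540)`.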
Why a persistent server
The current SideButton plugin model
(the-assistant packages/server/src/plugins)
spawns a fresh, stateless handler process per tools/call and SIGKILLs it at
a 30s timeout. That cannot host the computer-use surface, which needs cross-call
state: a held mouse button (left_mouse_down … left_mouse_up), the
screenshot→coordinate session, session grants, and holds up to ~100s. So this is
a single, long-lived child process that speaks MCP over stdio.
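To make the contrast with a per-call handler concrete, here is a minimal sketch of a long-lived, newline-delimited JSON-RPC loop that keeps state between calls. It is not the code in src/server.py: `serve`, `handle`, and the `state` dict are hypothetical names, and the result shape is simplified rather than a full MCP tool result.

```python
import json


def handle(request: dict, state: dict) -> dict:
    """Dispatch one JSON-RPC request; `state` survives across calls."""
    result: dict = {}
    if request.get("method") == "tools/call":
        name = request["params"]["name"]
        if name == "left_mouse_down":
            state["button_held"] = True
        elif name == "left_mouse_up":
            state["button_held"] = False
        result = {"button_held": state.get("button_held", False)}
    return {"jsonrpc": "2.0", "id": request.get("id"), "result": result}


def serve(stdin, stdout) -> None:
    """Read one JSON object per line until EOF, answering each in order."""
    state: dict = {}  # lives as long as the process, unlike a per-call handler
    for line in stdin:
        line = line.strip()
        if not line:
            continue
        request = json.loads(line)
        response = handle(request, state)
        if "id" in request:  # JSON-RPC notifications get no reply
            stdout.write(json.dumps(response) + "\n")
            stdout.flush()
```

Because `state` is created once per process, a `left_mouse_down` in one call is still in effect when `left_mouse_up` arrives in a later one, which a spawn-per-call model cannot offer.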
Related MCP server: openowl
Tool surface
24 tools, grouped by the sibling ticket that owns each body. Only screenshot
is implemented here; the rest are declared and return a clear pending-owner error
until their ticket lands. Full input schemas:
docs/computer-use-mcp-tools-schema.md.
Group | Ticket | Tools |
capture | SCRUM-1400 | |
click | SCRUM-1401 | |
move / drag / scroll | SCRUM-1402 | |
keyboard | SCRUM-1403 | |
clipboard + session | SCRUM-1404 | |
utility / batch | SCRUM-1405 | |
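The "pending-owner error" for declared-but-unimplemented tools could look roughly like the sketch below. This is an assumption about the shape, not the repo's actual behaviour: it uses an MCP tool result with `isError` set, and `PENDING_OWNERS` and `pending_result` are hypothetical names. Only the three SCRUM-1404 tool names that this README spells out are listed.

```python
# Tool name -> owning ticket. Only the names this README states explicitly;
# the real mapping would come from src/tools.py.
PENDING_OWNERS: dict[str, str] = {
    "read_clipboard": "SCRUM-1404",
    "write_clipboard": "SCRUM-1404",
    "list_granted_applications": "SCRUM-1404",
}


def pending_result(tool: str) -> dict:
    """Build an error result that names the ticket owning the tool body."""
    ticket = PENDING_OWNERS[tool]
    message = f"{tool} is declared but not implemented yet; owned by {ticket}."
    return {"content": [{"type": "text", "text": message}], "isError": True}
```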
Surface count. This is the 24-tool surface the epic (SCRUM-1399) specifies. The clipboard + session group follows the explicit enumeration in SCRUM-1404 (read_clipboard / write_clipboard split + list_granted_applications), which is the 2-tool delta over the work plan's interim count of 22. src/tools.py is the single source of truth; docs/computer-use-mcp-tools-schema.md (AC4) is generated from it.
Bare names + collisions. Names are the canonical Anthropic action ids. screenshot, type, scroll, wait, click collide with core SideButton MCP tools, and the current loader drops the entire plugin on any collision. That is fine standalone (this server owns its namespace); namespacing on aggregation is deferred to SCRUM-1406 (recommended: bare names in the child, prefix/slug-namespace on the host).
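The recommended split (bare names in the child, a prefix on the host) might be sketched as follows. The `__` separator and both function names are assumptions made for illustration; SCRUM-1406 decides the actual scheme.

```python
def namespace_tools(
    slug: str, child_tools: list[str], sep: str = "__"
) -> dict[str, str]:
    """Return {host-visible name: bare child name} for routing tools/call."""
    return {f"{slug}{sep}{name}": name for name in child_tools}


def collisions(core_tools: set[str], names) -> set[str]:
    """Names that would clash with tools the host already exposes."""
    return core_tools & set(names)
```

With a core set containing `screenshot`, the bare child names collide, while the mapping returned by `namespace_tools("computer-use", [...])` does not, and the host can still translate each prefixed name back to the bare id before forwarding the call.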
Layout
plugin-computer-use/
├── plugin.json # generated service-plugin manifest (proposes runtime:"service")
├── src/
│ ├── server.py # stdio MCP loop: initialize / tools/list / tools/call
│ ├── computer.py # dispatch base (ported computer.py)
│ └── tools.py # canonical tool surface (single source of truth)
├── scripts/
│ └── build_manifest.py # regenerates plugin.json + the schema doc from tools.py
├── tests/ # unittest: dispatch-base unit + stdio round-trip + manifest
├── docs/
│ └── computer-use-mcp-tools-schema.md # generated; the AC4 schema doc
├── run_tests.sh # runs the suite (xvfb-wrapped when no DISPLAY)
├── pyproject.toml # dependency-free, python>=3.10
├── README.md LICENSE .gitignore
Run it standalone
# speak MCP by hand (newline-delimited JSON-RPC):
printf '%s\n' \
'{"jsonrpc":"2.0","id":1,"method":"initialize","params":{}}' \
'{"jsonrpc":"2.0","id":2,"method":"tools/list"}' \
'{"jsonrpc":"2.0","id":3,"method":"tools/call","params":{"name":"screenshot","arguments":{}}}' \
| DISPLAY=:10 python3 src/server.py
initialize returns the handshake, tools/list returns the 24-tool surface, and the screenshot call returns a base64 PNG image block.
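As a rough illustration of what a "base64 PNG image block" means on the wire, the sketch below wraps raw PNG bytes in an MCP image content item. The helper names are hypothetical and this is not the code in computer.py; the `type` / `data` / `mimeType` fields follow the MCP image content shape.

```python
import base64

PNG_MAGIC = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of every PNG file


def image_block(png_bytes: bytes) -> dict:
    """Wrap raw PNG bytes as an MCP image content item."""
    if not png_bytes.startswith(PNG_MAGIC):
        raise ValueError("screenshot backend did not produce a PNG")
    return {
        "type": "image",
        "data": base64.b64encode(png_bytes).decode("ascii"),
        "mimeType": "image/png",
    }


def screenshot_result(png_bytes: bytes) -> dict:
    """Build the `result` payload for a tools/call screenshot reply."""
    return {"content": [image_block(png_bytes)], "isError": False}
```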
Test
./run_tests.sh # uses $DISPLAY if set, else wraps in xvfb-run
# or directly:
DISPLAY=:10 python3 -m unittest discover -s tests -v
- tests/test_dispatch_base.py: coordinate-scaling math, xdotool command construction, single-owner lock, screenshot-backend detection, surface shape.
- tests/test_stdio_roundtrip.py: initialize → tools/list → tools/call screenshot over a spawned server (AC1/AC2/AC3), plus error paths.
- tests/test_manifest.py: plugin.json + schema doc are present and in sync with src/tools.py.
The screenshot round-trip needs an X display; run_tests.sh provides one via
xvfb-run when $DISPLAY is unset, so AC3 is still exercised in headless CI.
System dependencies
System packages (apt), not pip — the plugin install copies no node_modules/venv
and runs no build step, so the server is stdlib-only and shells out to:
Tool | Used for | Notes |
a screenshot backend | | |
| pointer/keyboard actions | required by the click/move/keyboard groups (siblings). |
| | already on the runner. |
| window ops | optional. |
scrot and gnome-screenshot are absent on the runner image, so the
screenshot backend falls through to ImageMagick import -window root (verified
on DISPLAY=:10). When SCRUM-1407 adds this plugin to the agent-runners catalog,
declare xdotool, a screenshot backend, and xclip in its system_deps.
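The fall-through described above can be sketched as a first-match search over candidate binaries. The probe order and the names `BACKENDS` and `screenshot_argv` are assumptions for illustration; the real detection lives in computer.py and may differ.

```python
import shutil
from typing import Callable, Optional

# (binary, argv template). The order is an assumption; "{out}" is replaced
# with the output path.
BACKENDS: list[tuple[str, list[str]]] = [
    ("scrot", ["scrot", "{out}"]),
    ("gnome-screenshot", ["gnome-screenshot", "-f", "{out}"]),
    ("import", ["import", "-window", "root", "{out}"]),  # ImageMagick
]


def screenshot_argv(
    out_path: str,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return the command line for the first backend found on PATH."""
    for binary, template in BACKENDS:
        if which(binary):
            return [arg.replace("{out}", out_path) for arg in template]
    raise RuntimeError(
        "no screenshot backend found (tried scrot, gnome-screenshot, import)"
    )
```

Passing `which` as a parameter keeps the detection testable without installing any of the binaries.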
DISPLAY and single-owner
- The server targets the inherited $DISPLAY, defaulting to :10 (the runner desktop). It never hardcodes a display; the screen-record plugin's bug was capturing a non-existent :1.0.
- It takes a process-lifetime single-owner lock (flock, /tmp/sidebutton-computer-use.lock, override with CU_LOCK_PATH) so only one session drives the shared pointer/keyboard; a second instance exits non-zero.
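A minimal sketch of such a process-lifetime lock, assuming a non-blocking exclusive flock on the lock file. `acquire_single_owner_lock` is a hypothetical name, not the function in computer.py; the default path and the CU_LOCK_PATH override are taken from the description above.

```python
import fcntl
import os
from typing import Optional

DEFAULT_LOCK_PATH = "/tmp/sidebutton-computer-use.lock"


def acquire_single_owner_lock(path: Optional[str] = None) -> Optional[int]:
    """Try to take the lock; return the held fd, or None if already owned.

    The caller keeps the fd open for the life of the process; the kernel
    releases the lock when the fd is closed or the process exits.
    """
    path = path or os.environ.get("CU_LOCK_PATH", DEFAULT_LOCK_PATH)
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    return fd
```

A server entry point would call this once at startup and exit with a non-zero status when it returns None.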
Service-manifest contract (input to SCRUM-1406)
plugin.json proposes the service shape the engine ticket implements against:
{
"name": "computer-use",
"runtime": "service", // new: not understood by today's loader
"service": {
"protocol": "mcp-stdio",
"command": ["python3", "src/server.py"],
"toolDiscovery": "tools/list", // host discovers the surface at runtime
"singleOwner": true,
"display": ":10"
},
"tools": [ /* full surface, mirrored from tools.py */ ]
}
Intentionally not loadable today. The current readPluginManifest / loader.ts require a per-tool handler and know no runtime field, so sidebutton plugin install will reject this manifest by design. That is the exact gap SCRUM-1406 closes (teach the loader runtime: "service": launch the command, discover tools via tools/list, route tools/call to the child, namespace on aggregation). This ticket does not modify the loader or the agent-runners catalog.
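The host-side "launch the command, discover tools via tools/list" step can be sketched in a few lines. This is an illustration only, written in Python for consistency with the rest of the repo even though the loader is TypeScript. `discover_tools` is a hypothetical name, the sketch is one-shot (a real host would keep the child running and route tools/call to it), and it assumes the child exits when stdin closes.

```python
import json
import subprocess


def discover_tools(command: list[str], timeout: float = 10.0) -> list[str]:
    """Launch `command`, send initialize + tools/list, return tool names."""
    requests = [
        {"jsonrpc": "2.0", "id": 1, "method": "initialize", "params": {}},
        {"jsonrpc": "2.0", "id": 2, "method": "tools/list"},
    ]
    payload = "".join(json.dumps(r) + "\n" for r in requests)
    done = subprocess.run(
        command,
        input=payload,
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,
    )
    for line in done.stdout.splitlines():
        reply = json.loads(line)
        if reply.get("id") == 2:  # the tools/list answer
            return [tool["name"] for tool in reply["result"]["tools"]]
    raise RuntimeError("child never answered tools/list")
```

Given the manifest above, the host would call it with the manifest's `service.command`, for example `discover_tools(["python3", "src/server.py"])`, with DISPLAY set in the environment.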
Configuration (env)
Var | Default | Purpose |
DISPLAY | :10 | target X display |
| | screen size for coordinate scaling |
| | post-action settle before a screenshot |
CU_LOCK_PATH | /tmp/sidebutton-computer-use.lock | single-owner lock file |
License
MIT © 2026 SideButton