
computer-use

by sidebutton

plugin-computer-use

A persistent stdio MCP server that exposes the Anthropic computer-use action surface (screenshot, click, move, keyboard, clipboard, batch) against the SideButton agent desktop on DISPLAY=:10.

This repo is the scaffold + dispatch core for the Computer Use epic (SCRUM-1399). SCRUM-1397 delivers:

  • the long-lived stdio MCP server loop (initialize / tools/list / tools/call),

  • the ported computer.py dispatch base (DISPLAY targeting, screenshot → base64 PNG, coordinate scaling, single-owner lock, xdotool runner),

  • the full tool surface declared so tools/list returns it,

  • screenshot wired end-to-end as the proof action.
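The dispatch base's DISPLAY targeting and xdotool runner can be pictured as below. This is a minimal sketch, not the ported computer.py: the helper names (resolve_display, xdotool_click_cmd, run_xdotool) are hypothetical, and the exact xdotool flags the click bodies use belong to SCRUM-1401. It only relies on standard xdotool command chaining (mousemove --sync, then click with --repeat).

```python
import os
import subprocess


def resolve_display(env=None):
    """Use the inherited DISPLAY, falling back to the runner desktop :10 (never hardcoded)."""
    env = os.environ if env is None else env
    return env.get("DISPLAY") or ":10"


def xdotool_click_cmd(x, y, button=1, repeat=1):
    """Hypothetical builder: move the pointer, wait for the move, then click.

    button: 1 = left, 2 = middle, 3 = right (X11 button numbering).
    repeat: 2 for a double click, 3 for a triple click.
    """
    cmd = ["xdotool", "mousemove", "--sync", str(int(x)), str(int(y)), "click"]
    if repeat > 1:
        cmd += ["--repeat", str(repeat)]
    cmd.append(str(button))
    return cmd


def run_xdotool(cmd, display):
    """Run an xdotool command against the target display without raising on failure."""
    env = dict(os.environ, DISPLAY=display)
    return subprocess.run(cmd, env=env, capture_output=True, text=True, check=False)
```

Keeping command construction separate from execution is what makes it unit-testable without an X server, which matches the "xdotool command construction" coverage listed under Test.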

The individual tool bodies land in sibling tickets (SCRUM-1400…1405); hosting this as a runtime: "service" plugin is tracked in SCRUM-1406.

Why a persistent server

The current SideButton plugin model (the-assistant packages/server/src/plugins) spawns a fresh, stateless handler process per tools/call and SIGKILLs it at a 30s timeout. That cannot host the computer-use surface, which needs cross-call state: a held mouse button (left_mouse_down → left_mouse_up), the screenshot→coordinate session, session grants, and holds lasting up to ~100s, well past the 30s kill. So this is a single, long-lived child process that speaks MCP over stdio.
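A long-lived stdio MCP loop is small. The sketch below is an illustration, not src/server.py: it reads one JSON-RPC message per line, answers initialize, tools/list and tools/call, and skips notifications (messages without an id). Because the process stays alive between calls, any module-level state (a held button, a session grant) survives from one tools/call to the next. The protocol version string and the one-entry TOOLS list are assumptions.

```python
import json
import sys

PROTOCOL_VERSION = "2024-11-05"  # assumption: an MCP revision the host accepts

# Hypothetical stand-in for the real surface in src/tools.py.
TOOLS = [{"name": "screenshot", "description": "Capture the desktop.",
          "inputSchema": {"type": "object", "properties": {}}}]


def handle(request):
    """Return a JSON-RPC response dict, or None for notifications."""
    req_id = request.get("id")
    if req_id is None:
        return None  # e.g. notifications/initialized: no reply expected
    method = request.get("method")
    if method == "initialize":
        result = {"protocolVersion": PROTOCOL_VERSION,
                  "capabilities": {"tools": {}},
                  "serverInfo": {"name": "computer-use", "version": "0.0.0"}}
    elif method == "tools/list":
        result = {"tools": TOOLS}
    elif method == "tools/call":
        name = (request.get("params") or {}).get("name")
        result = {"content": [{"type": "text", "text": f"{name!r} is not wired in this sketch"}],
                  "isError": True}
    else:
        return {"jsonrpc": "2.0", "id": req_id,
                "error": {"code": -32601, "message": f"Method not found: {method}"}}
    return {"jsonrpc": "2.0", "id": req_id, "result": result}


def serve(stdin=sys.stdin, stdout=sys.stdout):
    """Newline-delimited JSON-RPC loop; runs until stdin closes."""
    for line in stdin:
        line = line.strip()
        if not line:
            continue
        try:
            response = handle(json.loads(line))
        except json.JSONDecodeError:
            response = {"jsonrpc": "2.0", "id": None,
                        "error": {"code": -32700, "message": "Parse error"}}
        if response is not None:
            stdout.write(json.dumps(response) + "\n")
            stdout.flush()  # the host is blocked reading our stdout


if __name__ == "__main__":
    serve()
```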


Tool surface

24 tools, grouped by the sibling ticket that owns each body. Only screenshot is implemented here; the rest are declared and return a clear pending-owner error until their ticket lands. Full input schemas: docs/computer-use-mcp-tools-schema.md.

| Group | Ticket | Tools |
|---|---|---|
| capture | SCRUM-1400 | screenshot ✅, zoom |
| click | SCRUM-1401 | left_click, right_click, middle_click, double_click, triple_click |
| move / drag / scroll | SCRUM-1402 | mouse_move, left_click_drag, scroll, left_mouse_down, left_mouse_up |
| keyboard | SCRUM-1403 | type, key, hold_key |
| clipboard + session | SCRUM-1404 | read_clipboard, write_clipboard, request_access, list_granted_applications, open_application, switch_display |
| utility / batch | SCRUM-1405 | computer_batch, wait, cursor_position |
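One way to get the "declared but pending" behaviour is a registry keyed by owning ticket, as sketched below. The registry shape, call_tool and the error wording are hypothetical; the group/ticket/tool mapping is copied from the table. Unimplemented tools return an MCP tool result with isError set, so the host sees a clean error instead of a crash.

```python
# Group -> (owning ticket, tools), mirroring the table above.
SURFACE = {
    "capture": ("SCRUM-1400", ["screenshot", "zoom"]),
    "click": ("SCRUM-1401", ["left_click", "right_click", "middle_click",
                             "double_click", "triple_click"]),
    "move / drag / scroll": ("SCRUM-1402", ["mouse_move", "left_click_drag", "scroll",
                                            "left_mouse_down", "left_mouse_up"]),
    "keyboard": ("SCRUM-1403", ["type", "key", "hold_key"]),
    "clipboard + session": ("SCRUM-1404", ["read_clipboard", "write_clipboard",
                                           "request_access", "list_granted_applications",
                                           "open_application", "switch_display"]),
    "utility / batch": ("SCRUM-1405", ["computer_batch", "wait", "cursor_position"]),
}

IMPLEMENTED = {"screenshot"}  # only the proof action is wired in this repo

# Flatten to tool -> owning ticket.
OWNER = {tool: ticket for ticket, tools in SURFACE.values() for tool in tools}


def _error(text):
    return {"content": [{"type": "text", "text": text}], "isError": True}


def call_tool(name, implementations):
    """Dispatch a tools/call; implementations maps tool name -> zero-arg callable."""
    if name not in OWNER:
        return _error(f"unknown tool: {name}")
    if name not in IMPLEMENTED:
        return _error(f"{name} is declared but not implemented yet (pending {OWNER[name]})")
    return implementations[name]()
```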

Surface count. This is the 24-tool surface the epic (SCRUM-1399) specifies. The clipboard + session group follows the explicit enumeration in SCRUM-1404 (read_clipboard / write_clipboard split + list_granted_applications), which is the 2-tool delta over the work plan's interim count of 22. src/tools.py is the single source of truth; docs/computer-use-mcp-tools-schema.md (AC4) is generated from it.
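"Single source of truth" means plugin.json and the schema doc are derived from the tool list and a test checks they have not drifted. The sketch below shows that shape under assumptions: the two-entry TOOLS list and its schemas are invented stand-ins for src/tools.py, and the doc layout is hypothetical. The manifest fields mirror the service manifest shown later in this README.

```python
import json

# Hypothetical stand-in for src/tools.py (two entries, schemas invented).
TOOLS = [
    {"name": "screenshot", "description": "Capture the agent desktop as a PNG.",
     "inputSchema": {"type": "object", "properties": {}}},
    {"name": "wait", "description": "Pause before the next action.",
     "inputSchema": {"type": "object", "properties": {"duration": {"type": "number"}}}},
]


def build_manifest(tools):
    """Derive plugin.json from the tool list so the two cannot drift."""
    return {
        "name": "computer-use",
        "runtime": "service",
        "service": {
            "protocol": "mcp-stdio",
            "command": ["python3", "src/server.py"],
            "toolDiscovery": "tools/list",
            "singleOwner": True,
            "display": ":10",
        },
        "tools": tools,
    }


def render_schema_doc(tools):
    """Render a Markdown schema doc with one section per tool."""
    out = ["# computer-use MCP tool schemas", ""]
    for tool in tools:
        out += [f"## {tool['name']}", "", tool["description"], ""]
        # Indented block keeps the generated doc free of nested code fences.
        schema = json.dumps(tool["inputSchema"], indent=2, sort_keys=True)
        out.extend("    " + line for line in schema.splitlines())
        out.append("")
    return "\n".join(out)


def manifest_in_sync(manifest_text, tools):
    """A test_manifest-style check: committed tools must equal the source list."""
    return json.loads(manifest_text)["tools"] == tools
```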

Bare names + collisions. Names are the canonical Anthropic action ids. screenshot, type, scroll, wait, click collide with core SideButton MCP tools, and the current loader drops the entire plugin on any collision. That is fine standalone (this server owns its namespace); namespacing on aggregation is deferred to SCRUM-1406 (recommended: bare names in the child, prefix/slug-namespace on the host).
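The recommended split (bare names in the child, namespacing on the host) could look like the sketch below. It is a hypothetical illustration for SCRUM-1406, not the loader: the "__" separator, the route table and the example core tool set are assumptions. The host keeps a map from exposed name back to the child's bare name, so tools/call can be forwarded unchanged.

```python
def collisions(child_tools, core_names):
    """Bare child names that would clash with core tools (today: plugin dropped)."""
    return sorted(set(child_tools) & set(core_names))


def namespace_tools(plugin_slug, child_tools, core_names, sep="__"):
    """Map exposed (prefixed) names -> (plugin, bare name) for routing tools/call."""
    routes = {}
    for tool in child_tools:
        exposed = f"{plugin_slug}{sep}{tool}"
        if exposed in core_names or exposed in routes:
            raise ValueError(f"collision even after namespacing: {exposed}")
        routes[exposed] = (plugin_slug, tool)
    return routes
```

With this shape, a collision no longer drops the whole plugin: the host exposes computer-use__screenshot and forwards the call to the child as plain screenshot.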

Layout

```
plugin-computer-use/
├── plugin.json        # generated service-plugin manifest (proposes runtime:"service")
├── src/
│   ├── server.py      # stdio MCP loop: initialize / tools/list / tools/call
│   ├── computer.py    # dispatch base (ported computer.py)
│   └── tools.py       # canonical tool surface (single source of truth)
├── scripts/
│   └── build_manifest.py   # regenerates plugin.json + the schema doc from tools.py
├── tests/             # unittest: dispatch-base unit + stdio round-trip + manifest
├── docs/
│   └── computer-use-mcp-tools-schema.md   # generated; the AC4 schema doc
├── run_tests.sh       # runs the suite (xvfb-wrapped when no DISPLAY)
├── pyproject.toml     # dependency-free, python>=3.10
└── README.md  LICENSE  .gitignore
```

Run it standalone

```sh
# speak MCP by hand (newline-delimited JSON-RPC):
printf '%s\n' \
  '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{}}' \
  '{"jsonrpc":"2.0","id":2,"method":"tools/list"}' \
  '{"jsonrpc":"2.0","id":3,"method":"tools/call","params":{"name":"screenshot","arguments":{}}}' \
  | DISPLAY=:10 python3 src/server.py
```

initialize returns the handshake, tools/list the 24-tool surface, and the screenshot call a base64 PNG image block.
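To check the screenshot reply programmatically rather than by eye, a client can decode the image block and verify the PNG signature. This sketch assumes the standard MCP image content shape ({"type": "image", "data": <base64>, "mimeType": "image/png"}); extract_png is a hypothetical helper that takes the third response line from the command above.

```python
import base64
import json

PNG_MAGIC = b"\x89PNG\r\n\x1a\n"  # the fixed 8-byte PNG file signature


def extract_png(response_line):
    """Return raw PNG bytes from a tools/call screenshot response line."""
    response = json.loads(response_line)
    if "error" in response:
        raise RuntimeError(f"JSON-RPC error: {response['error']}")
    for block in response["result"].get("content", []):
        if block.get("type") == "image" and block.get("mimeType") == "image/png":
            data = base64.b64decode(block["data"])
            if not data.startswith(PNG_MAGIC):
                raise ValueError("image block is not a PNG")
            return data
    raise ValueError("no image/png block in result")
```

For example, piping the server's output through `sed -n 3p` and passing that line to extract_png yields bytes that can be written straight to a .png file.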

Test

```sh
./run_tests.sh          # uses $DISPLAY if set, else wraps in xvfb-run
# or directly:
DISPLAY=:10 python3 -m unittest discover -s tests -v
```

  • tests/test_dispatch_base.py: coordinate-scaling math, xdotool command construction, single-owner lock, screenshot-backend detection, surface shape.

  • tests/test_stdio_roundtrip.py: initialize → tools/list → tools/call screenshot over a spawned server (AC1/AC2/AC3), plus error paths.

  • tests/test_manifest.py: plugin.json and the schema doc are present and in sync with src/tools.py.

The screenshot round-trip needs an X display; run_tests.sh provides one via xvfb-run when $DISPLAY is unset, so AC3 is still exercised in headless CI.
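The run_tests.sh decision (use the existing display, otherwise wrap in xvfb-run) is sketched here in Python for illustration; the real script is shell and may differ. suite_command is hypothetical, and the 1920x1080x24 virtual screen is an assumption chosen to match the CU_WIDTH / CU_HEIGHT defaults. xvfb-run's -a (pick a free server number) and -s (server arguments) are standard options.

```python
import os
import shutil
import sys


def suite_command(env=None, which=shutil.which):
    """Build the unittest command, wrapping it in xvfb-run when no display exists."""
    env = os.environ if env is None else env
    base = [sys.executable, "-m", "unittest", "discover", "-s", "tests", "-v"]
    if env.get("DISPLAY"):
        return base  # a real or already-provided X display
    if which("xvfb-run") is None:
        raise RuntimeError("no DISPLAY set and xvfb-run is not installed")
    # -a: choose a free display number; -s: args for the Xvfb server.
    return ["xvfb-run", "-a", "-s", "-screen 0 1920x1080x24"] + base
```

Injecting `which` keeps the headless branch testable on machines that do not have xvfb installed.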

System dependencies

System packages (apt), not pip — the plugin install copies no node_modules/venv and runs no build step, so the server is stdlib-only and shells out to:

| Tool | Used for | Notes |
|---|---|---|
| a screenshot backend | screenshot | gnome-screenshot, scrot, or ImageMagick (import/convert). The runner ships ImageMagick. |
| xdotool | pointer/keyboard actions | required by the click/move/keyboard groups (siblings). |
| xclip | clipboard | already on the runner. |
| wmctrl | window ops | optional. |

scrot and gnome-screenshot are absent on the runner image, so the screenshot backend falls through to ImageMagick import -window root (verified on DISPLAY=:10). When SCRUM-1407 adds this plugin to the agent-runners catalog, declare xdotool, a screenshot backend, and xclip in its system_deps.
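The fall-through can be expressed as an ordered probe of PATH. This is a sketch, not the code in computer.py: BACKENDS and pick_backend are hypothetical names. The command lines use documented options only: gnome-screenshot -f FILE, scrot FILE, and ImageMagick's import -window root FILE.

```python
import shutil

# Ordered preference; each entry maps a binary to a command builder.
BACKENDS = [
    ("gnome-screenshot", lambda path: ["gnome-screenshot", "-f", path]),
    ("scrot", lambda path: ["scrot", path]),
    ("import", lambda path: ["import", "-window", "root", path]),  # ImageMagick
]


def pick_backend(which=shutil.which):
    """Return (binary, command builder) for the first backend found on PATH."""
    for binary, build in BACKENDS:
        if which(binary):
            return binary, build
    raise RuntimeError(
        "no screenshot backend: install gnome-screenshot, scrot, or ImageMagick")
```

On the runner image only import is present, so the probe lands on ImageMagick, matching the behaviour verified on DISPLAY=:10.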

DISPLAY and single-owner

  • The server targets the inherited $DISPLAY, defaulting to :10 (the runner desktop). It never hardcodes a display — the screen-record plugin's bug was capturing a non-existent :1.0.

  • It takes a process-lifetime single-owner lock (flock, /tmp/sidebutton-computer-use.lock, override with CU_LOCK_PATH) so only one session drives the shared pointer/keyboard; a second instance exits non-zero.
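A process-lifetime flock lock needs only the standard library. The sketch below is illustrative (acquire_single_owner_lock is a hypothetical name): it takes an exclusive, non-blocking flock and keeps the descriptor open, so the kernel releases the lock automatically when the process exits, even after a crash or SIGKILL. On Linux, flock locks belong to the open file description, so a second open() of the same path is refused even within one process.

```python
import fcntl
import os
import sys

DEFAULT_LOCK = "/tmp/sidebutton-computer-use.lock"


def acquire_single_owner_lock(path=None):
    """Return a locked fd, or None if another owner already holds the lock."""
    path = path or os.environ.get("CU_LOCK_PATH", DEFAULT_LOCK)
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)  # fail fast, never wait
    except BlockingIOError:
        os.close(fd)
        return None
    return fd  # deliberately never closed: held for the process lifetime


def main():
    if acquire_single_owner_lock() is None:
        print("computer-use: another instance owns the desktop", file=sys.stderr)
        sys.exit(1)  # second instance exits non-zero
    # ... run the stdio MCP loop here ...
```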

Service-manifest contract (input to SCRUM-1406)

plugin.json proposes the service shape the engine ticket implements against:

```jsonc
{
  "name": "computer-use",
  "runtime": "service",            // new: not understood by today's loader
  "service": {
    "protocol": "mcp-stdio",
    "command": ["python3", "src/server.py"],
    "toolDiscovery": "tools/list", // host discovers the surface at runtime
    "singleOwner": true,
    "display": ":10"
  },
  "tools": [ /* full surface, mirrored from tools.py */ ]
}
```

Intentionally not loadable today. The current readPluginManifest/loader.ts require a per-tool handler and know no runtime field, so sidebutton plugin install will reject this manifest by design — that is the exact gap SCRUM-1406 closes (teach the loader runtime: "service": launch the command, discover tools via tools/list, route tools/call to the child, namespace on aggregation). This ticket does not modify the loader or the agent-runners catalog.
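The branch SCRUM-1406 has to add to the loader can be outlined as follows. The real loader is TypeScript (loader.ts); this Python sketch only illustrates the decision logic. plan_load is hypothetical, and "handler" stands in for whatever per-tool field the current loader actually requires. The service fields are taken from the manifest above.

```python
def plan_load(manifest):
    """Decide how a host would load a plugin manifest; raise on invalid shapes."""
    runtime = manifest.get("runtime")
    if runtime is None:
        # Legacy path: every tool must carry its own handler ("handler" is assumed).
        missing = [t.get("name") for t in manifest.get("tools", []) if "handler" not in t]
        if missing:
            raise ValueError(f"tools without a handler: {missing}")
        return {"kind": "handler", "tools": [t["name"] for t in manifest["tools"]]}
    if runtime == "service":
        svc = manifest.get("service") or {}
        if svc.get("protocol") != "mcp-stdio":
            raise ValueError(f"unsupported service protocol: {svc.get('protocol')!r}")
        cmd = svc.get("command")
        if not isinstance(cmd, list) or not cmd:
            raise ValueError("service.command must be a non-empty argv list")
        return {
            "kind": "service",
            "command": cmd,                                     # launch once, keep alive
            "discover": svc.get("toolDiscovery", "tools/list"),  # ask the child, not the manifest
            "single_owner": bool(svc.get("singleOwner")),
            "env": {"DISPLAY": svc["display"]} if "display" in svc else {},
        }
    raise ValueError(f"unknown runtime: {runtime!r}")
```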

Configuration (env)

| Var | Default | Purpose |
|---|---|---|
| DISPLAY | :10 | target X display |
| CU_WIDTH / CU_HEIGHT | 1920 / 1080 | screen size for coordinate scaling |
| CU_SCREENSHOT_DELAY | 2.0 | post-action settle (seconds) before a screenshot |
| CU_LOCK_PATH | /tmp/sidebutton-computer-use.lock | single-owner lock file |
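Reading these variables and using CU_WIDTH / CU_HEIGHT for scaling might look like the sketch below. Config, load_config, to_screen and to_api are hypothetical names, and the scaling model is an assumption: the model is shown a screenshot resized to some API resolution (1280x720 in the example is arbitrary), so its coordinates must be scaled back to real screen pixels before xdotool runs, and vice versa for cursor_position.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    display: str
    width: int
    height: int
    screenshot_delay: float  # seconds
    lock_path: str


def load_config(env=None):
    """Read the documented env vars, applying the documented defaults."""
    env = os.environ if env is None else env
    return Config(
        display=env.get("DISPLAY") or ":10",
        width=int(env.get("CU_WIDTH", "1920")),
        height=int(env.get("CU_HEIGHT", "1080")),
        screenshot_delay=float(env.get("CU_SCREENSHOT_DELAY", "2.0")),
        lock_path=env.get("CU_LOCK_PATH", "/tmp/sidebutton-computer-use.lock"),
    )


def to_screen(x, y, cfg, api_w, api_h):
    """Model coordinates (in the resized image) -> real screen pixels."""
    return round(x * cfg.width / api_w), round(y * cfg.height / api_h)


def to_api(x, y, cfg, api_w, api_h):
    """Real screen pixels -> coordinates in the resized image the model sees."""
    return round(x * api_w / cfg.width), round(y * api_h / cfg.height)
```

For instance, with the defaults, the centre of a 1280x720 image, to_screen(640, 360, cfg, 1280, 720), maps to (960, 540) on the 1920x1080 desktop.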

License

MIT © 2026 SideButton
