screen-mcp

by 88plug

Give a model eyes and hands on a Linux Wayland desktop: screenshot, click, type, scroll, drag, and read any visible app.

License: FSL-1.1-ALv2

screen-mcp is an MCP server for Claude Code that lets a model see and operate your GNOME/Wayland desktop. It captures any monitor through PipeWire, drives the pointer and keyboard through the xdg-desktop-portal RemoteDesktop portal, and optionally reads the screen with OCR and grounds icons with an OmniParser ONNX model. It is for developers who want an agent that can use real desktop apps, not just a browser. It is pure Python and runs CPU-only.

Quickstart

Install the plugin in Claude Code:

/plugin marketplace add 88plug/screen-mcp
/plugin install screen-mcp@screen-mcp

Then run the one-time dependency setup (the server has system and Python deps the manifest cannot install for you):

# in the installed plugin dir (or a clone)
python3 -m venv .venv && .venv/bin/pip install -r requirements.txt

On first use the desktop portal pops a consent dialog asking which monitor(s) to share. Pick one, and ask the model to take a screenshot:

Take a screenshot of my desktop and tell me which window is focused.

You should get back a labeled capture plus the focused-window name within a few seconds. The portal returns a restore token, cached at ~/.config/mcp-screen/token, so later runs skip the consent dialog.

IMPORTANT

screen-mcp runs on Linux + Wayland + GNOME only, and grounding is CPU-only by design. See Requirements before installing.

What it does

The server turns one capture-and-act loop into MCP tools a model can call directly:

Screenshot any monitor or region, with numbered Set-of-Marks overlays and click coordinates.
Click, type, scroll, and drag in any visible app, including native Wayland apps that xdotool and XTEST cannot reach.
Read on-screen text with OCR (RapidOCR) and ground icons with an OmniParser ONNX model (both optional).
Sense changes: an ambient layer diffs frames so the agent knows when something opened or when an action was a no-op.
Cache learned screens: a write-through world model lets a recognized screen skip OCR on the next visit.
Gate destructive actions: an opt-in ack guard blocks close-combos and destructive-keyword clicks until the caller passes a confirmation token.

It also ships a drive-screen skill that encodes the locate → ground → act → confirm loop, so the model knows how to use the tools well out of the box.
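The frame diff behind the ambient layer can be pictured with a small sketch. This is not the server's implementation: it assumes frames are equally sized 2D lists of pixel values, and `changed_bbox` is a hypothetical helper that returns the bounding box of the differing pixels, or `None` when nothing changed (the no-op case).

```python
from typing import Optional, Sequence, Tuple

Frame = Sequence[Sequence[int]]  # assumption: rows of grayscale pixel values


def changed_bbox(before: Frame, after: Frame) -> Optional[Tuple[int, int, int, int]]:
    """Return (left, top, right, bottom) of differing pixels, or None if identical."""
    if len(before) != len(after) or any(len(a) != len(b) for a, b in zip(before, after)):
        raise ValueError("frames must have the same dimensions")
    xs, ys = [], []
    for y, (row_a, row_b) in enumerate(zip(before, after)):
        for x, (a, b) in enumerate(zip(row_a, row_b)):
            if a != b:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None  # nothing changed: the action was likely a no-op
    return (min(xs), min(ys), max(xs), max(ys))


before = [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
after = [[0, 0, 0, 0], [0, 9, 9, 0], [0, 0, 9, 0]]
print(changed_bbox(before, after))   # (1, 1, 2, 2)
print(changed_bbox(before, before))  # None
```

A real implementation would likely compare hashes or downscaled frames first and tolerate small pixel noise; the exact-equality test here is only for illustration.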

Principles — The Agent Oath

screen-mcp is a reference enforcer of The Agent Oath: the user-takeover guard yields control the instant a human moves the mouse (a STOPPED result), keeping the human in charge of their own desktop. That's §2 (human agency) and §11 (human oversight) made executable — don't fight the human for the mouse. The opt-in ack gate (§7, don't bypass safety) and the on-screen visibility of every action (§5, transparency) round it out.
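A minimal sketch of how a user-takeover check of this kind could work, assuming the guard compares the live pointer position with the last position the agent commanded. The Euclidean distance metric, the function names, and the `"OK"` result string are assumptions for illustration; only the `STOPPED` result, the `force` override, and the 40-pixel threshold (`MCP_SCREEN_GUARD_PX=40`) come from this README.

```python
import math
from typing import Tuple

GUARD_PX = 40  # mirrors the documented MCP_SCREEN_GUARD_PX=40 setting

Point = Tuple[float, float]


def user_took_over(live: Point, last_commanded: Point,
                   threshold_px: float = GUARD_PX) -> bool:
    """True when the pointer has drifted past the threshold since the last command."""
    dx = live[0] - last_commanded[0]
    dy = live[1] - last_commanded[1]
    return math.hypot(dx, dy) > threshold_px


def guarded_action(live: Point, last_commanded: Point, force: bool = False) -> str:
    """Yield to the human unless the caller explicitly forces the action."""
    if not force and user_took_over(live, last_commanded):
        return "STOPPED"
    return "OK"  # hypothetical success marker


print(guarded_action((100, 100), (100, 100)))              # OK
print(guarded_action((130, 140), (100, 100)))              # STOPPED
print(guarded_action((130, 140), (100, 100), force=True))  # OK
```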

MCP tools

These are the tools the server exposes. Every action tool also accepts these common parameters:

space: the coordinate space, one of view / desktop / norm. The default, view, uses coordinates as seen in the last screenshot.
shot: true returns a screenshot after the action.
verify: true warns on no-change misclicks.
force: true bypasses the user-takeover guard.
element: <id> clicks an element id from the last annotated shot.

Tool	What it does
`screen_screenshot`	Capture the desktop. `region` / `monitor` to zoom, `annotate=true` for numbered marks, `use_cache=true` to reuse learned elements, `fresh=true` to force a current frame on a static monitor.
`screen_list_monitors`	Monitors (origin/size/scale), desktop bounds, focused windows.
`screen_move_mouse`	Move the pointer to `x,y`.
`screen_click`	Click at `x,y` or in place. `button`, `double`.
`screen_scroll`	Wheel scroll by direction and amount.
`screen_drag`	Press-drag from one point to another.
`screen_key`	Press a key or combo, e.g. `Ctrl+L`, `Enter`, `Alt+Tab`.
`screen_type`	Type text into the focused window (Unicode via clipboard paste, ASCII via keysyms).
`screen_focus`	Raise and give keyboard focus to a window by app/title/id.
`screen_do`	Run a batch of ordered actions in one call.
`screen_tour`	Visit several UI states, return a labeled thumbnail of each.
`screen_read_page`	Auto-scroll a scrollable view and accumulate every interactable.
`screen_wait`	Block until the screen settles, then optionally screenshot.
`screen_session`	Recorder: `start` / `stop` / `list` / `status` / `replay-path`.
`screen_reload`	Hot-reload the server in place after edits.
`screen_diag`	Health dump: session, cursor, grounding backends, world-model stats.
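For orientation, here is roughly what calls to these tools look like on the wire. An MCP client such as Claude Code normally builds these requests itself; the sketch below only serializes the standard MCP `tools/call` JSON-RPC message. The tool and parameter names come from the table above, while the argument types and the element id `7` are assumptions.

```python
import json


def tool_call(request_id: int, name: str, **arguments) -> str:
    """Serialize an MCP `tools/call` JSON-RPC 2.0 request."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


# Take an annotated screenshot, then click a numbered element from it.
annotated = tool_call(1, "screen_screenshot", annotate=True)
click = tool_call(2, "screen_click", element=7, shot=True, verify=True)
print(annotated)
print(click)
```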

Requirements

IMPORTANT

screen-mcp targets a specific stack. It will not run elsewhere.

Linux + Wayland + GNOME. The awareness layer uses a bundled GNOME Shell extension; AT-SPI is the fallback for GTK apps.
Python 3.10+ (tested on 3.14).
GStreamer >= 1.28 (uses leaky-type; the older drop= was removed in 1.28).
PipeWire and xdg-desktop-portal-gnome.
wl-clipboard for the Unicode paste path in screen_type.
A DejaVu Sans Bold font for Set-of-Marks labels (falls back to PIL's default).

Grounding is CPU-only by design: the server hard-disables the GPU (CUDA_VISIBLE_DEVICES="") for predictable latency and no driver flake.
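As an illustration of the pattern (not a copy of the server's code), hiding the GPU this way only works if the variable is set before any CUDA-aware library is imported. The sketch assumes the inference runtime is imported later in the same process.

```python
import os

# Hide all CUDA devices from libraries imported after this point.
os.environ["CUDA_VISIBLE_DEVICES"] = ""

# Assumption: the inference runtime is imported only after the line above,
# so it initializes without seeing any GPU and runs on the CPU.
print(repr(os.environ["CUDA_VISIBLE_DEVICES"]))  # ''
```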

Install the system deps before the Python deps. See requirements.txt for the full pacman / apt one-liners.

# Arch / Manjaro
sudo pacman -S python-gobject gobject-introspection \
               gstreamer gst-plugins-base gst-plugins-good gst-libav \
               pipewire pipewire-pulse xdg-desktop-portal-gnome \
               wl-clipboard ttf-dejavu

# Python deps
pip install -r requirements.txt

The bundled window-info extension gives the awareness layer reliable focused-window and window-list data, and lets screen_focus activate windows directly. Installing it needs a one-time Wayland re-login.

gnome-shell-extension/window-info@local/install.sh
gnome-extensions enable window-info@local

For the kernel input backend, grant access to /dev/uinput by adding your user to the input group. The launcher (bin/screen-mcp) fails with a clear message if a required dep is missing, so a misconfigured install never silently half-works.

Manual MCP setup

If you are not using the plugin, wire the server in directly. Add to ~/.claude.json under mcpServers:

{
  "screen": {
    "command": "python3",
    "args": ["/path/to/screen-mcp/server.py"]
  }
}
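If you would rather script this than edit the file by hand, the following sketch merges the entry into an existing config without touching other servers. The helper name `add_screen_server` is hypothetical, and the demo runs against a temporary file rather than your real ~/.claude.json.

```python
import json
import tempfile
from pathlib import Path


def add_screen_server(config_path: Path, server_py: str) -> dict:
    """Add or overwrite the "screen" entry under mcpServers, keeping other keys."""
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    servers = config.setdefault("mcpServers", {})
    servers["screen"] = {"command": "python3", "args": [server_py]}
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config


# Demo on a throwaway file that already lists another (made-up) server.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "claude.json"
    demo.write_text('{"mcpServers": {"other": {"command": "x"}}}')
    result = add_screen_server(demo, "/path/to/screen-mcp/server.py")
    print(sorted(result["mcpServers"]))  # ['other', 'screen']
```

To apply it for real, pass `Path.home() / ".claude.json"` and the actual path to server.py, ideally after backing up the file.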

Configuration

screen-mcp reads optional environment variables. A few common ones:

MCP_SCREEN_GUARD=1 — enable the ack gate. Destructive combos (Ctrl+W, Alt+F4), OCR-matched destructive keywords, and out-of-allowlist actions block unless the caller passes ack=<reason>.
MCP_SCREEN_APPS="firefox,terminal" — with the guard on, restrict actions to this allowlist of focused apps.
MCP_SCREEN_AMBIENT=0 — disable the ambient SENSE hint block.
MCP_SCREEN_CPU_THREADS=6 — ONNX intra-op thread count for OmniParser.
MCP_SCREEN_MAX_EDGE=2576 — screenshot downscale target (long edge).

Variable	Effect
`MCP_SCREEN_GUARD=1`	Enable the reliability ack gate (destructive combos / keywords / out-of-allowlist actions block until `ack`).
`MCP_SCREEN_APPS`	With guard on, allowlist of focused apps.
`MCP_SCREEN_AUDIT_FRAMES=1`	Add pre/post frame hash + `changed_bbox` to each audit line (~100-500ms per action).
`MCP_SCREEN_AMBIENT=0`	Disable the ambient `SENSE` hint block.
`MCP_SCREEN_GUARD_PX=40`	Threshold for the user-takeover guard (live pointer vs last-commanded).
`MCP_SCREEN_CPU_THREADS=6`	ONNX intra-op thread count for OmniParser.
`MCP_SCREEN_MAX_EDGE=2576`	Screenshot downscale target (long edge).
`MCP_SCREEN_NO_FRESH=1`	Disable forced fresh-frame capture on static monitors.
`MCP_SCREEN_FOCUS_SETTLE_MS=150`	Delay after `screen_focus` activates a window.
`MCP_SCREEN_NO_NUDGE=1`	Disable the pointer damage-nudge that primes a static monitor's frame.
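To make the interaction between `MCP_SCREEN_GUARD`, `MCP_SCREEN_APPS`, and `ack` concrete, here is a simplified decision sketch. It is an assumption-laden model, not the server's logic: the `gate` function and its `"allow"` / `"block"` results are hypothetical, the combo set contains only the two combos named above, and OCR keyword matching is left out.

```python
from typing import Mapping, Optional

DESTRUCTIVE_COMBOS = {"ctrl+w", "alt+f4"}  # the two combos this README names


def gate(env: Mapping[str, str], focused_app: str,
         combo: Optional[str] = None, ack: Optional[str] = None) -> str:
    """Return "allow" or "block" for one action (hypothetical result strings)."""
    if env.get("MCP_SCREEN_GUARD") != "1":
        return "allow"  # the guard is opt-in
    if ack:
        return "allow"  # caller passed ack=<reason>
    allowlist = [a.strip().lower()
                 for a in env.get("MCP_SCREEN_APPS", "").split(",") if a.strip()]
    if allowlist and focused_app.lower() not in allowlist:
        return "block"  # focused app is outside the allowlist
    if combo and combo.lower() in DESTRUCTIVE_COMBOS:
        return "block"  # destructive combo without an ack
    return "allow"


env = {"MCP_SCREEN_GUARD": "1", "MCP_SCREEN_APPS": "firefox,terminal"}
print(gate(env, "firefox", combo="Ctrl+L"))                       # allow
print(gate(env, "firefox", combo="Alt+F4"))                       # block
print(gate(env, "firefox", combo="Alt+F4", ack="user asked"))     # allow
print(gate(env, "gimp"))                                          # block
```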

Path	What
`~/.config/mcp-screen/token`	Portal restore token (one-time consent).
`~/.local/share/mcp-screen/world/map.db`	World-model cache (per-screen learned elements).
`~/.local/share/mcp-screen/sessions/<sid>/`	Recorder trajectories, frames, `replay.html`.
`~/.local/state/mcp-screen/actions.jsonl`	Reliability audit log (one JSON line per action).
`/tmp/screen_err.txt`	Last unhandled tool traceback (dev diagnostic).
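Because the audit log is one JSON object per line, it can be read with a few lines of Python. This sketch makes no assumption about which fields each entry contains; the `read_audit` helper is hypothetical and simply yields each line as a dict.

```python
import json
from pathlib import Path
from typing import Iterator

AUDIT_LOG = Path.home() / ".local/state/mcp-screen/actions.jsonl"


def read_audit(path: Path = AUDIT_LOG) -> Iterator[dict]:
    """Yield one dict per non-empty line; yield nothing if the log is absent."""
    if not path.exists():
        return
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                yield json.loads(line)


entries = list(read_audit())
print(f"{len(entries)} audit entries")
```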

Development

pytest -q          # tests run without a live D-Bus (conftest stubs)

Edit a .py, then call screen_reload in the running session to re-exec the server in place while preserving the MCP connection. On any tool exception the dispatcher writes the full traceback to /tmp/screen_err.txt; read it when debugging crashes.

Contributing

Issues and pull requests are welcome. Please run pytest -q before opening a PR. See CLAUDE.md for architecture notes and the hard-won ops details behind the capture and input layers.

License

FSL-1.1-ALv2
