screen-mcp
An MCP server that gives a model eyes and hands on a Linux Wayland desktop. It provides screenshots via PipeWire; pointer and keyboard input via the RemoteDesktop portal; OCR and icon detection via RapidOCR plus an OmniParser ONNX model; an ambient sense layer that diffs frames so the agent knows when something opened or nothing changed; a write-through world-model cache so a recognised screen skips OCR; and an opt-in ack gate that blocks close combos and destructive-keyword clicks until the caller passes a confirmation token.
Current version: 1.3.2.
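The frame-diffing idea behind the ambient sense layer (and behind verify: true) can be illustrated with a short sketch. It assumes two raw frames of the same size arrive as bytes. CHUNK, chunk_hashes, describe_change and the 30% threshold are hypothetical choices, not the server's implementation, which is not documented here.

```python
import hashlib

CHUNK = 4096  # hypothetical chunk size in bytes


def chunk_hashes(frame: bytes, chunk: int = CHUNK) -> list[str]:
    """Hash fixed-size byte chunks of a raw frame. In a row-major buffer each
    chunk covers a horizontal strip of pixels, so changes stay roughly localised."""
    return [hashlib.blake2b(frame[i:i + chunk], digest_size=8).hexdigest()
            for i in range(0, len(frame), chunk)]


def describe_change(before: bytes, after: bytes, opened_ratio: float = 0.3) -> str:
    """Classify the change between two frames as "unchanged", "minor", or
    "opened" (a large share of chunks differ, e.g. a window or menu appeared).
    The threshold is illustrative."""
    if len(before) != len(after):
        return "opened"  # different geometry: treat as a major change
    a, b = chunk_hashes(before), chunk_hashes(after)
    changed = sum(x != y for x, y in zip(a, b))
    if changed == 0:
        return "unchanged"
    return "opened" if changed / len(a) >= opened_ratio else "minor"
```

A result of "unchanged" right after a click is the kind of signal a no-screen-change misclick warning can key on.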
Requirements
Linux + Wayland + GNOME (the awareness layer uses a bundled GNOME Shell extension; AT-SPI is the fallback for GTK apps).
Python 3.10+ (tested on 3.14).
GStreamer >= 1.28 (uses leaky-type; the older drop= was removed in 1.28).
PipeWire + xdg-desktop-portal-gnome.
wl-clipboard (for the Unicode paste path in screen_type).
A DejaVu Sans Bold font (for Set-of-Marks labels; falls back to PIL's default).
Install
System packages first — see requirements.txt for the full
pacman / apt one-liners.
```sh
# Arch
sudo pacman -S python-gobject gobject-introspection \
  gstreamer gst-plugins-base gst-plugins-good gst-libav \
  pipewire pipewire-pulse xdg-desktop-portal-gnome \
  wl-clipboard ttf-dejavu
# Python deps
pip install -r requirements.txt
```
Install the GNOME Shell extension (optional but recommended; it gives the awareness layer reliable focused-window and window-list data):
```sh
gnome-shell-extension/window-info@local/install.sh
# then enable it
gnome-extensions enable window-info@local
```
Wire it into Claude Code
Add to ~/.claude.json under mcpServers:
```json
{
  "mcp-screen": {
    "command": "python3",
    "args": ["/path/to/mcp-screen/server.py"]
  }
}
```
The first run triggers an xdg-desktop-portal consent dialog (pick which
monitor(s) to share). The portal returns a restore token, which is persisted to
~/.config/mcp-screen/token, so subsequent runs are silent.
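Persisting the restore token amounts to a read-or-None on startup and an owner-only write after the portal answers. The sketch below assumes the token is stored as plain text at the path named above. load_restore_token and save_restore_token are hypothetical helpers, not the server's functions.

```python
from __future__ import annotations

import os
from pathlib import Path

# Location named in the README; the helper functions are hypothetical.
TOKEN_PATH = Path.home() / ".config" / "mcp-screen" / "token"


def load_restore_token(path: Path = TOKEN_PATH) -> str | None:
    """Return the saved portal restore token, or None on a first run."""
    try:
        token = path.read_text().strip()
    except FileNotFoundError:
        return None
    return token or None


def save_restore_token(token: str, path: Path = TOKEN_PATH) -> None:
    """Store the token the portal returned so the next run can skip the dialog."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # Create the file owner-only from the start; chmod also covers a pre-existing file.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as f:
        f.write(token)
    os.chmod(path, 0o600)
```

In the ScreenCast portal, the stored value is passed back as the restore_token option of SelectSources, alongside persist_mode. Start then returns a replacement token, so a helper like save_restore_token would be called again after every successful start.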
Tools
| Name | What it does |
|---|---|
| | Capture the desktop. |
| | Monitors (origin/size/scale), desktop bounds, focused windows. |
| | Move pointer to |
| | Click at |
| | Wheel scroll. |
| | Press-drag from |
| | Press a key/combo: |
| screen_type | Type text (Unicode via |
| | Raise + give KEYBOARD focus to a window ( |
| | Batched ordered actions in one call. |
| | Visit several UI states and get a labeled thumbnail of each. |
| | Auto-scroll a scrollable view in one call; accumulates every interactable. |
| | Block until the screen settles, then optionally screenshot. |
| | Recorder: |
| screen_reload | Hot-reload the server in place after edits (no |
| | Health dump: session/geo, cursor, grounding backends, world-model stats. |
Every action accepts these common parameters:
space: 'view' | 'desktop' | 'norm' (default 'view', i.e. coords as seen in the last screenshot).
shot: true returns a screenshot after the action.
verify: true warns on misclicks that cause no screen change.
force: true bypasses the user-takeover guard.
element: <id> clicks an element id returned by the last annotate=true shot (the server resolves exact coords; no guessing).
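The three coordinate spaces could map onto desktop coordinates as sketched below. The sketch assumes that 'view' means pixels in the last (possibly downscaled) screenshot, that 'norm' means fractions 0..1 of the captured area, and that 'desktop' is already absolute. LastShot and to_desktop are hypothetical names. The real server also deals with per-monitor fractional scaling, which this sketch omits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LastShot:
    """Geometry of the last screenshot (hypothetical record, for illustration)."""
    origin_x: float  # desktop x of the screenshot's left edge
    origin_y: float  # desktop y of the screenshot's top edge
    width: float     # desktop-space width covered by the screenshot
    height: float    # desktop-space height covered by the screenshot
    view_w: int      # pixel width of the (possibly downscaled) image the model saw
    view_h: int      # pixel height of that image


def to_desktop(x: float, y: float, space: str, shot: LastShot) -> tuple[float, float]:
    """Convert a point given in `space` into absolute desktop coordinates."""
    if space == "desktop":
        return x, y
    if space == "norm":
        return shot.origin_x + x * shot.width, shot.origin_y + y * shot.height
    if space == "view":
        # Undo the screenshot downscale, then offset by where the shot starts.
        return (shot.origin_x + x * shot.width / shot.view_w,
                shot.origin_y + y * shot.height / shot.view_h)
    raise ValueError(f"unknown space {space!r}; expected 'view', 'desktop' or 'norm'")
```

Defaulting to 'view' lets the model reuse coordinates read straight off the image it was shown, without knowing the downscale factor.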
Environment variables
| Variable | Effect |
|---|---|
| | Enable the reliability ack gate. Destructive combos ( |
| | With guard on, restrict actions to this allowlist of focused apps. |
| | Add pre/post frame hash + |
| | Disable the ambient |
| MCP_SCREEN_GUARD_PX | Threshold, in pixels, for the user-takeover guard (live pointer vs last-commanded position). |
| | ONNX intra-op thread count for OmniParser. |
| | Screenshot downscale target (long edge). |
| | Disable forced fresh-frame capture on static monitors (screenshots may then return the keepalive-resent stale frame). |
| | Delay after |
| | Disable the pointer damage-nudge used to prime/refresh a static monitor's frame. |
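An ack gate of the kind the first row describes can be sketched as follows. The sketch assumes actions arrive as small dicts. The combo list, the keyword list, AckGate and AckRequired are all hypothetical, and the server's real lists and token format are not documented here. A risky action is refused with a one-shot token, and only a retry of the identical action that carries that token goes through.

```python
from __future__ import annotations

import secrets

# Illustrative lists only; the server's actual combos and keywords may differ.
CLOSE_COMBOS = {"alt+f4", "ctrl+w", "ctrl+q"}
DESTRUCTIVE_WORDS = ("delete", "remove", "discard", "erase")


class AckRequired(Exception):
    """Raised for a risky action; carries the one-shot token to retry with."""

    def __init__(self, reason: str, token: str) -> None:
        super().__init__(f"{reason}; retry with ack={token!r}")
        self.token = token


class AckGate:
    def __init__(self) -> None:
        self._pending: dict[str, str] = {}  # token -> fingerprint of the action

    @staticmethod
    def _risk(action: dict[str, str]) -> str | None:
        """Return why an action is risky, or None if it is safe."""
        if action.get("kind") == "key":
            combo = "+".join(p.strip().lower() for p in action.get("combo", "").split("+"))
            if combo in CLOSE_COMBOS:
                return f"close combo {combo}"
        if action.get("kind") == "click":
            label = action.get("label", "").lower()
            if any(word in label for word in DESTRUCTIVE_WORDS):
                return f"destructive click on {label!r}"
        return None

    def check(self, action: dict[str, str], ack: str | None = None) -> None:
        """Let safe actions through; a risky one needs the token issued for it."""
        reason = self._risk(action)
        if reason is None:
            return
        fingerprint = repr(sorted(action.items()))
        if ack is not None and self._pending.get(ack) == fingerprint:
            del self._pending[ack]  # tokens are single use
            return
        token = secrets.token_hex(4)
        self._pending[token] = fingerprint
        raise AckRequired(reason, token)
```

Binding each token to the exact action means a confirmation for one close combo cannot be replayed to approve a different one.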
Data paths
| Path | What |
|---|---|
| ~/.config/mcp-screen/token | Portal restore token (one-time consent). |
| | World-model SQLite cache (per-screen learned elements). |
| | Recorder trajectories + WebP frames + |
| | Reliability audit log (one JSON line per action). |
| /tmp/screen_err.txt | Last unhandled tool traceback (dev-diagnostic only). |
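The world-model cache is described as write-through. A recognised screen returns its stored elements and skips OCR, while a miss runs OCR and stores the result immediately. The sqlite3 sketch below shows that pattern. The table layout, the WorldModel class, and keying by a precomputed screen fingerprint are assumptions, not the server's schema.

```python
from __future__ import annotations

import json
import sqlite3
from typing import Callable


class WorldModel:
    """Write-through cache of per-screen elements, keyed by a screen fingerprint.

    Illustrative schema only; the server's actual tables are not documented here.
    """

    def __init__(self, path: str = ":memory:") -> None:
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS screens ("
            " fingerprint TEXT PRIMARY KEY,"
            " elements TEXT NOT NULL)")

    def elements(self, fingerprint: str, ocr: Callable[[], list[dict]]) -> list[dict]:
        """Return cached elements for a known screen; otherwise run OCR and store."""
        row = self.db.execute(
            "SELECT elements FROM screens WHERE fingerprint = ?", (fingerprint,)
        ).fetchone()
        if row is not None:
            return json.loads(row[0])  # hit: OCR is skipped
        found = ocr()  # miss: run the expensive grounding pass
        with self.db:  # commit right away (write-through)
            self.db.execute(
                "INSERT OR REPLACE INTO screens VALUES (?, ?)",
                (fingerprint, json.dumps(found)))
        return found
```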
Dev workflow
```sh
pytest -q   # 78 tests, ~0.7s, no live D-Bus needed (conftest stubs)
```
Edit a .py file, then in the running Claude Code session:
```
screen_reload   # re-execs the server in place (preserves the MCP connection)
```
On any tool exception the dispatcher writes the full traceback to
/tmp/screen_err.txt (the JSON-RPC error only carries the message); read it
when debugging crashes.
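One way to get this behaviour is a decorator around each tool handler. It writes the full traceback to disk and then re-raises, so the JSON-RPC layer still sees only the exception message. This is a sketch, not the dispatcher's code, and dump_traceback_on_error is a hypothetical name.

```python
import functools
import traceback
from pathlib import Path

ERR_PATH = Path("/tmp/screen_err.txt")  # path from the README


def dump_traceback_on_error(fn, err_path: Path = ERR_PATH):
    """Wrap a tool handler so any exception leaves its full traceback on disk.

    The exception is re-raised unchanged, so callers still get just the message
    while the file keeps the detail.
    """
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        try:
            return fn(*args, **kwargs)
        except Exception:
            err_path.write_text(traceback.format_exc())
            raise
    return wrapper
```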
Ops notes (hard-won — read before touching capture/input)
Fractional scaling: NotifyPointerMotionAbsolute coords are logical and local to each stream (keyed by node_id). Don't add a global logical origin; the portal clamps with "Invalid position". See input.global_to_logical.

Cursor position: cursor_mode=METADATA (4) means the cursor is NOT baked into frames. PipeWire attaches a SPA_META_Cursor to its src pad, but videoconvert strips it and PyGObject can't downcast it, so capture.py reads it via a ctypes pad-probe with x86-64 offsets. We composite a marker back into plain screenshots so the pointer stays visible.

User-takeover guard: input.guard_user compares the live pointer to where WE last commanded it; drift of more than MCP_SCREEN_GUARD_PX px ⇒ the caller took the mouse ⇒ STOP. Pass force=true to bypass / take control back. Fails open if the cursor can't be read.

Unicode typing: the portal keysym path drops non-ASCII; input.type_text auto-pastes any non-ASCII string via wl-copy + Ctrl+V, with a finally block restoring the prior clipboard (or running wl-copy --clear if it couldn't be saved) so sensitive text never outlives the call. Falls back to ASCII-only keysyms if wl-clipboard is absent. xdotool / XTEST can NOT reach native-Wayland apps.

Modifier+letter combos: input.key lowercases single-letter trailing parts when modifiers are present, so "Ctrl+A" is select-all, not Ctrl+Shift+a (capital A is the X11 keysym for shifted A). Standalone key("A") keeps its case for legacy text-input behavior.

GPU is hard-disabled (CUDA_VISIBLE_DEVICES="" at server top); grounding is CPU-only by design: predictable latency, no driver flake.
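Three of the notes above reduce to small, testable rules, and the sketch below illustrates them under stated assumptions. normalize_combo, guard_should_stop and type_via_paste are hypothetical names, not input.key, input.guard_user or input.type_text. The modifier set is illustrative. The clipboard helper takes an injectable run function so it can be exercised without a Wayland session, and it restores the previous clipboard as raw bytes only, ignoring MIME types.

```python
from __future__ import annotations

import math
import subprocess
from typing import Callable

# Illustrative modifier names; the server's own key table is not shown here.
MODIFIERS = {"ctrl", "control", "alt", "shift", "super", "meta"}


def normalize_combo(combo: str) -> list[str]:
    """Split "Ctrl+A" into key names, lowercasing a trailing single letter when
    a modifier is present, since an uppercase letter keysym implies Shift."""
    parts = [p.strip() for p in combo.split("+") if p.strip()]
    has_modifier = any(p.lower() in MODIFIERS for p in parts[:-1])
    if has_modifier and len(parts[-1]) == 1 and parts[-1].isalpha():
        parts[-1] = parts[-1].lower()
    return parts


def guard_should_stop(live: tuple[int, int] | None,
                      commanded: tuple[int, int] | None,
                      threshold_px: float,
                      force: bool = False) -> bool:
    """True when the live pointer has drifted more than threshold_px from where
    we last moved it (the user grabbed the mouse). Fails open: unknown positions
    never block, and force=True always proceeds."""
    if force or live is None or commanded is None:
        return False
    drift = math.hypot(live[0] - commanded[0], live[1] - commanded[1])
    return drift > threshold_px


def type_via_paste(text: str,
                   send_ctrl_v: Callable[[], None],
                   run: Callable[..., object] = subprocess.run) -> None:
    """Paste text (meant for strings where text.isascii() is False) through the
    Wayland clipboard, then restore the previous contents.
    Assumes the caller already checked that wl-copy exists (shutil.which)."""
    saved: bytes | None = None
    try:
        saved = run(["wl-paste", "--no-newline"], capture_output=True,
                    check=True, timeout=2).stdout
    except (OSError, subprocess.SubprocessError):
        pass  # empty clipboard or wl-paste unavailable: nothing to restore
    try:
        run(["wl-copy"], input=text.encode(), check=True, timeout=2)
        send_ctrl_v()
    finally:
        # Never leave the typed text sitting on the clipboard.
        if saved is not None:
            run(["wl-copy"], input=saved, timeout=2)
        else:
            run(["wl-copy", "--clear"], timeout=2)
```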
Install as a Claude Code plugin
screen-mcp ships as a Claude Code plugin that bundles the MCP server and a
drive-screen skill (the locate → ground → act → confirm loop).
```
/plugin marketplace add 88plug/screen-mcp
/plugin install screen-mcp@screen-mcp
```
One-time setup after install (the server has system and Python deps the manifest can't install for you):
```sh
# in the installed plugin dir (or a clone)
python3 -m venv .venv && .venv/bin/pip install -r requirements.txt
# system packages (Arch/Manjaro names; use your distro equivalents):
# gstreamer>=1.28, pipewire, python-gobject, xdg-desktop-portal-gnome, wl-clipboard
```
Requirements: Linux + Wayland + GNOME. The first run pops up an xdg-desktop-portal
RemoteDesktop + ScreenCast consent dialog (token cached at ~/.config/mcp-screen).
Optional: /dev/uinput (group input) for the kernel input backend, and the
bundled GNOME-Shell extension for full window awareness (one-time Wayland re-login).
The launcher (bin/screen-mcp) fails with a clear message if the deps are missing,
so a misconfigured install never silently half-works.
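The launcher's fail-fast behaviour is a common preflight pattern: collect every missing dependency and exit with one readable message. The Python sketch below shows that pattern. The launcher's real checklist is not documented here, so the module list and the missing_deps and preflight helpers are hypothetical.

```python
from __future__ import annotations

import importlib.util
import shutil
import sys
from collections.abc import Iterable

# Illustrative requirements; the launcher's real checklist is not documented here.
REQUIRED_MODULES = ("gi",)   # PyGObject, needed for the GStreamer bindings
REQUIRED_EXECUTABLES = ()    # add any binaries the server must shell out to


def missing_deps(modules: Iterable[str] = REQUIRED_MODULES,
                 executables: Iterable[str] = REQUIRED_EXECUTABLES) -> list[str]:
    """List every missing dependency instead of stopping at the first one."""
    missing = [f"Python module '{m}'" for m in modules
               if importlib.util.find_spec(m) is None]
    missing += [f"executable '{e}'" for e in executables if shutil.which(e) is None]
    return missing


def preflight() -> None:
    """Exit with one readable message rather than starting half-configured."""
    problems = missing_deps()
    if sys.version_info < (3, 10):
        problems.insert(0, "Python 3.10+")
    if problems:
        sys.exit("screen-mcp: missing dependencies: " + ", ".join(problems)
                 + " (see requirements.txt)")
```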
License
FSL-1.1-ALv2 © 2026 88plug — Functional Source License; converts to Apache 2.0 two years after each release.