Screen Agent

Give AI coding tools eyes and hands.

An MCP server that lets Claude Code, Cursor, and other AI tools see your screen and interact with your desktop.

Why?

AI coding assistants are powerful but blind — they can edit files and run commands, but they can't see what's on your screen. Screen Agent fixes that by providing screen capture and desktop interaction as MCP tools.

You: "The form in the browser has a bug — can you see it?"
Claude: [captures screen] I see the registration form. The email
        validation shows an error even though the format is correct.
        The regex pattern in validators.ts is too restrictive...

Install

pip install screen-agent

Quick Start

Use with Claude Code

  1. Add to your MCP config (~/.claude/mcp.json or .mcp.json):

{
  "mcpServers": {
    "screen": {
      "command": "screen-agent",
      "args": ["serve"]
    }
  }
}
  2. Restart Claude Code. That's it: Claude can now see your screen.
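
If the config file already lists other MCP servers, the entry above has to be merged in rather than pasted over them. The following is a minimal sketch of that merge using only the Python standard library. The helper name add_screen_server is hypothetical and is not part of screen-agent; the "screen" entry is copied from the config shown above.

```python
import json
from pathlib import Path


def add_screen_server(config_path: Path) -> dict:
    """Merge the Screen Agent entry into an MCP config file.

    Hypothetical helper (not shipped with screen-agent). Existing
    servers in the file are preserved; only the "screen" key is set.
    """
    config = {}
    if config_path.exists():
        text = config_path.read_text(encoding="utf-8")
        if text.strip():
            config = json.loads(text)

    # Same entry as in the Quick Start config above.
    servers = config.setdefault("mcpServers", {})
    servers["screen"] = {"command": "screen-agent", "args": ["serve"]}

    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


# Example (writes to the project-level config in the current directory):
# add_screen_server(Path(".mcp.json"))
```

Restart Claude Code after running it, as in step 2.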

Use as Python library

import asyncio
from screen_agent import capture_screen, mouse_click, keyboard_type

async def main():
    screenshot = await capture_screen()
    print(f"Captured {screenshot['width']}x{screenshot['height']}px")

    await mouse_click(400, 300)
    await keyboard_type("Hello from screen-agent!")

asyncio.run(main())

Tools

Tool                  Description
capture_screen        Screenshot the full screen or a region
click                 Click at screen coordinates
type_text             Type text at cursor position
press_key             Press key / key combo (e.g. Cmd+C)
scroll                Scroll up or down
move_mouse            Move cursor
drag                  Click and drag
get_cursor_position   Get cursor coordinates
list_windows          List visible windows
focus_window          Focus a window by title
get_active_window     Get active window info

Optional: OCR Plugin

pip install screen-agent[ocr]

Adds two more tools:

Tool        Description
ocr         Extract all screen text with positions
find_text   Find text on screen and get coordinates
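
A typical use of find_text is to locate a label and then click it. The sketch below shows only the coordinate arithmetic: given text boxes with positions, pick the first match and compute its center as a click target. The box shape (text, x, y, width, height keys) and the function name find_click_point are assumptions made for illustration; the actual result format of the ocr and find_text tools is not documented here and may differ.

```python
from typing import Optional


def find_click_point(boxes: list, query: str) -> Optional[tuple]:
    """Return the center (x, y) of the first box whose text contains query.

    Assumed box shape (hypothetical, not the plugin's documented output):
    {"text": str, "x": int, "y": int, "width": int, "height": int}
    with (x, y) as the top-left corner in screen pixels.
    """
    needle = query.casefold()
    for box in boxes:
        if needle in box["text"].casefold():
            center_x = box["x"] + box["width"] // 2
            center_y = box["y"] + box["height"] // 2
            return (center_x, center_y)
    return None  # text not found on screen


# Made-up OCR output for a page with two buttons.
boxes = [
    {"text": "Cancel", "x": 10, "y": 20, "width": 60, "height": 20},
    {"text": "Sign up", "x": 100, "y": 200, "width": 80, "height": 30},
]
print(find_click_point(boxes, "sign up"))
```

The returned point could then be passed to the click tool.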

Safety: Input Guardian

Screen Agent is designed with user-first safety:

User always has priority. The moment you touch your keyboard or mouse, the agent pauses instantly. It only resumes after you've been idle for 1.5 seconds (configurable). The agent never fights you for control.
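
The pause-and-resume rule can be sketched as a small gate that records the time of the last user input and lets the agent act only once the idle threshold has passed. This is an illustrative sketch, not the project's implementation: the class name IdleGate and its methods are hypothetical, and the real server presumably feeds it from global keyboard and mouse listeners, which are left out here.

```python
import time


class IdleGate:
    """Blocks agent actions while the user is active (hypothetical sketch)."""

    def __init__(self, idle_seconds: float = 1.5, clock=time.monotonic):
        self.idle_seconds = idle_seconds  # configurable idle threshold
        self._clock = clock               # injectable so tests can fake time
        self._last_user_input = None      # None: no user input seen yet

    def note_user_input(self) -> None:
        """Call on every keyboard or mouse event from the user."""
        self._last_user_input = self._clock()

    def agent_may_act(self) -> bool:
        """True only if the user has been idle for at least idle_seconds."""
        if self._last_user_input is None:
            return True
        return self._clock() - self._last_user_input >= self.idle_seconds
```

An action loop would check agent_may_act() before each click or keystroke and wait while it returns False.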

App allowlist. The agent must declare which apps it needs access to. It can only interact with apps on the list. Need to work across Chrome and Figma? Just add both.

Claude: [calls add_app("Chrome")]
        [calls add_app("Figma")]
        I can now operate in Chrome and Figma.

        [clicks in Chrome]      ← allowed
        [clicks in Figma]       ← allowed
        [clicks in Slack]       ← rejected, not on the list

User:   *moves mouse*
Claude: [paused — waiting for user to finish]
        ...user stops...
Claude: [resumes after 1.5s idle] Continuing where I left off.
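
The allowlist part of the transcript above can be modeled as a set of app names that is checked before every action. This is a minimal sketch under stated assumptions: the class AppScope is hypothetical, names are compared case-insensitively, and an empty list rejects everything. How the real server matches app names, and how it treats an empty list, is not specified here.

```python
class AppScope:
    """Allowlist of apps the agent may interact with (hypothetical sketch)."""

    def __init__(self):
        self._allowed = set()

    def add_app(self, name: str) -> None:
        self._allowed.add(name.casefold())

    def remove_app(self, name: str) -> None:
        self._allowed.discard(name.casefold())

    def is_allowed(self, app_name: str) -> bool:
        # Assumption: exact, case-insensitive match on the app name,
        # and an empty allowlist permits nothing.
        return app_name.casefold() in self._allowed


scope = AppScope()
scope.add_app("Chrome")
scope.add_app("Figma")
for app in ("Chrome", "Figma", "Slack"):
    verdict = "allowed" if scope.is_allowed(app) else "rejected, not on the list"
    print(f"click in {app}: {verdict}")
```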

Safety Tool        Description
add_app            Add an app to the allowed list (e.g. "Chrome", "Figma")
remove_app         Remove an app from the allowed list
set_region         Restrict to a pixel region on screen
clear_scope        Remove all restrictions
get_agent_status   Check guardian state, user activity, allowed apps
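
A region restriction like the one set_region describes comes down to a bounds check on each target coordinate. The sketch below is illustrative only: the Region type, its fields (top-left corner plus width and height), and the click_allowed helper are assumptions, not the tool's documented parameters.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Region:
    """Rectangle in screen pixels; (x, y) is the top-left corner (assumed layout)."""
    x: int
    y: int
    width: int
    height: int

    def contains(self, px: int, py: int) -> bool:
        # Right and bottom edges are exclusive.
        return (self.x <= px < self.x + self.width
                and self.y <= py < self.y + self.height)


def click_allowed(region: Optional[Region], px: int, py: int) -> bool:
    """With no region set, every point is allowed; otherwise it must be inside."""
    return region is None or region.contains(px, py)


region = Region(x=100, y=100, width=200, height=50)
print(click_allowed(region, 150, 120))  # point inside the region
print(click_allowed(region, 400, 120))  # point outside, would be rejected
```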

Platform Support

Platform   Screenshot   Input Control   Window Management
macOS      mss          pyautogui       AppleScript
Linux      mss          pyautogui       wmctrl
Windows    mss          pyautogui       Planned

macOS Permissions

Screen Agent needs two permissions on macOS:

  • Screen Recording — for screenshots

  • Accessibility — for keyboard/mouse control

Grant them in: System Settings → Privacy & Security

Architecture

┌──────────────────────────────────────────────┐
│  MCP Client (Claude Code / Cursor / etc.)    │
└──────────────┬───────────────────────────────┘
               │  MCP Protocol (stdio/SSE)
               ▼
┌──────────────────────────────────────────────┐
│  Screen Agent MCP Server                     │
│                                              │
│  ┌────────────────────────────────────────┐  │
│  │  Input Guardian (pynput)               │  │
│  │  • Monitors keyboard + mouse globally  │  │
│  │  • User active? → PAUSE all actions    │  │
│  │  • Scope lock → reject out-of-bounds   │  │
│  └────────────────────────────────────────┘  │
│       │ clearance granted                    │
│       ▼                                      │
│  capture.py  ─  mss (cross-platform)         │
│  input.py    ─  pyautogui                    │
│  window.py   ─  AppleScript / wmctrl         │
│  plugins/    ─  OCR, CV (optional)           │
└──────────────────────────────────────────────┘

Configuration

Transport modes

# stdio (default) — for Claude Code and most MCP clients
screen-agent serve

# SSE — for HTTP-based clients
screen-agent serve --transport sse --port 8765

System check

screen-agent check

Verifies all dependencies and platform permissions.
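
For a rough idea of what the dependency half of such a check involves, the sketch below tests whether the Python packages named in this README (mss, pyautogui, pynput) are importable. It is not the implementation of screen-agent check: the function name check_dependencies is hypothetical, and platform permissions are not checked at all.

```python
import importlib.util

# Packages this README names for capture, input control, and input monitoring.
REQUIRED_MODULES = ("mss", "pyautogui", "pynput")


def check_dependencies(modules=REQUIRED_MODULES) -> dict:
    """Map each top-level module name to True if it can be found, else False.

    Uses find_spec, so nothing is actually imported or executed.
    """
    return {name: importlib.util.find_spec(name) is not None for name in modules}


for name, found in check_dependencies().items():
    print(f"{name}: {'ok' if found else 'MISSING'}")
```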

Development

git clone https://github.com/chriswu727/screen-agent.git
cd screen-agent
pip install -e ".[dev]"
pytest

License

MIT
