Screen Agent
Allows AI agents to view and interact with desktop applications such as Figma through screen capture and UI automation (mouse clicks, keyboard input, and scrolling), once the app has been added to the allowlist.
Give AI coding tools eyes and hands.
An MCP server that lets Claude Code, Cursor, and other AI tools see your screen and interact with your desktop.
Why?
AI coding assistants are powerful but blind — they can edit files and run commands, but they can't see what's on your screen. Screen Agent fixes that by providing screen capture and desktop interaction as MCP tools.
You: "The form in the browser has a bug — can you see it?"
Claude: [captures screen] I see the registration form. The email
validation shows an error even though the format is correct.
The regex pattern in validators.ts is too restrictive...
Install
pip install screen-agent
Quick Start
Use with Claude Code
Add to your MCP config (~/.claude/mcp.json or .mcp.json):
{
  "mcpServers": {
    "screen": {
      "command": "screen-agent",
      "args": ["serve"]
    }
  }
}
Restart Claude Code. That's it: Claude can now see your screen.
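If the config file already lists other MCP servers, the entry has to be merged in rather than pasted over the whole file. The following is a minimal sketch of that merge, assuming the file is plain JSON with a top-level "mcpServers" object as shown above. The helper names `merge_screen_server` and `update_config_file` are hypothetical and are not part of screen-agent.

```python
import json
from pathlib import Path


def merge_screen_server(config: dict) -> dict:
    """Return a copy of an MCP config dict with the "screen" server added.

    Hypothetical helper for illustration; other server entries are preserved.
    """
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    # Same entry as in the JSON snippet above.
    servers["screen"] = {"command": "screen-agent", "args": ["serve"]}
    merged["mcpServers"] = servers
    return merged


def update_config_file(path: Path) -> None:
    """Read the config at `path` (if it exists), add the entry, and write it back."""
    existing = json.loads(path.read_text()) if path.exists() else {}
    path.write_text(json.dumps(merge_screen_server(existing), indent=2) + "\n")
```

Calling `update_config_file(Path(".mcp.json"))` would update the project-level file; pass `Path.home() / ".claude" / "mcp.json"` for the user-level one.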
Use as Python library
import asyncio

from screen_agent import capture_screen, mouse_click, keyboard_type


async def main():
    # Capture the screen; the result carries the image width and height.
    screenshot = await capture_screen()
    print(f"Captured {screenshot['width']}x{screenshot['height']}px")
    # Click at the given screen coordinates, then type at the cursor position.
    await mouse_click(400, 300)
    await keyboard_type("Hello from screen-agent!")


asyncio.run(main())
Tools
Tool | Description |
| Screenshot the full screen or a region |
| Click at screen coordinates |
| Type text at cursor position |
| Press key / key combo (e.g. Cmd+C) |
| Scroll up or down |
| Move cursor |
| Click and drag |
| Get cursor coordinates |
| List visible windows |
| Focus a window by title |
| Get active window info |
Optional: OCR Plugin
pip install "screen-agent[ocr]"
Adds two more tools:
Tool | Description |
| Extract all screen text with positions |
| Find text on screen and get coordinates |
Safety: Input Guardian
Screen Agent is designed with user-first safety:
User always has priority. The moment you touch your keyboard or mouse, the agent pauses instantly. It only resumes after you've been idle for 1.5 seconds (configurable). The agent never fights you for control.
App allowlist. The agent must declare which apps it needs access to. It can only interact with apps on the list. Need to work across Chrome and Figma? Just add both.
Claude: [calls add_app("Chrome")]
        [calls add_app("Figma")]
        I can now operate in Chrome and Figma.
        [clicks in Chrome] ← allowed
        [clicks in Figma] ← allowed
        [clicks in Slack] ← rejected, not on the list
User:   *moves mouse*
Claude: [paused, waiting for user to finish]
        ...user stops...
Claude: [resumes after 1.5s idle] Continuing where I left off.
Safety Tool | Description |
| Add an app to the allowed list (e.g. "Chrome", "Figma") |
| Remove an app from the allowed list |
| Restrict to a pixel region on screen |
| Remove all restrictions |
| Check guardian state, user activity, allowed apps |
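The three rules in this section (user priority, app allowlist, optional region lock) can be modeled in a few lines. This is a rough sketch under stated assumptions, not Screen Agent's actual implementation: the class `GuardianSketch` and its method names are hypothetical, and it reports "not allowed" where the real guardian pauses and later resumes. The clock is injectable so the idle timeout can be exercised without waiting.

```python
import time
from typing import Callable, Optional, Set, Tuple


class GuardianSketch:
    """Hypothetical model of the safety rules above; not Screen Agent's code."""

    def __init__(self, idle_seconds: float = 1.5,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.idle_seconds = idle_seconds  # idle time required before resuming
        self._clock = clock
        self._last_user_input: Optional[float] = None
        self.allowed_apps: Set[str] = set()
        # Optional region lock as (left, top, width, height) in pixels.
        self.region: Optional[Tuple[int, int, int, int]] = None

    def note_user_input(self) -> None:
        """Record that the user just touched the keyboard or mouse."""
        self._last_user_input = self._clock()

    def user_is_active(self) -> bool:
        """True until the user has been idle for at least `idle_seconds`."""
        if self._last_user_input is None:
            return False
        return self._clock() - self._last_user_input < self.idle_seconds

    def add_app(self, name: str) -> None:
        self.allowed_apps.add(name)

    def may_act(self, app: str, x: int, y: int) -> bool:
        """Check all three rules for an action at (x, y) inside `app`."""
        if self.user_is_active():
            return False  # the user always has priority
        if app not in self.allowed_apps:
            return False  # app is not on the allowlist
        if self.region is not None:
            left, top, width, height = self.region
            if not (left <= x < left + width and top <= y < top + height):
                return False  # outside the locked region
        return True
```

With "Chrome" added and no recent user input, `may_act("Chrome", 10, 10)` is true while `may_act("Slack", 10, 10)` is false, which mirrors the transcript above.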
Platform Support
Platform | Screenshot | Input Control | Window Management |
macOS | mss | pyautogui | AppleScript |
Linux | mss | pyautogui | wmctrl |
Windows | mss | pyautogui | Planned |
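For scripts that need to branch on platform, the window-management column of the table can be expressed as a lookup keyed on `platform.system()`. This is only an illustrative sketch: the names `WINDOW_BACKENDS` and `window_backend` are hypothetical, and "Planned" is represented as `None`.

```python
import platform
from typing import Optional

# Window-management backend per platform.system() value, taken from the table above.
# None marks a platform where window management is planned but not yet available.
WINDOW_BACKENDS = {"Darwin": "AppleScript", "Linux": "wmctrl", "Windows": None}


def window_backend(system: Optional[str] = None) -> Optional[str]:
    """Return the backend name for `system` (default: the current platform)."""
    return WINDOW_BACKENDS.get(system or platform.system())
```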
macOS Permissions
Screen Agent needs two permissions on macOS:
Screen Recording — for screenshots
Accessibility — for keyboard/mouse control
Grant them in: System Settings → Privacy & Security
Architecture
┌──────────────────────────────────────────────┐
│  MCP Client (Claude Code / Cursor / etc.)    │
└──────────────┬───────────────────────────────┘
               │ MCP Protocol (stdio/SSE)
               ▼
┌──────────────────────────────────────────────┐
│  Screen Agent MCP Server                     │
│                                              │
│  ┌────────────────────────────────────────┐  │
│  │  Input Guardian (pynput)               │  │
│  │  • Monitors keyboard + mouse globally  │  │
│  │  • User active? → PAUSE all actions    │  │
│  │  • Scope lock → reject out-of-bounds   │  │
│  └────────────────────────────────────────┘  │
│              │ clearance granted             │
│              ▼                               │
│  capture.py ─ mss (cross-platform)           │
│  input.py   ─ pyautogui                      │
│  window.py  ─ AppleScript / wmctrl           │
│  plugins/   ─ OCR, CV (optional)             │
└──────────────────────────────────────────────┘
Configuration
Transport modes
# stdio (default) — for Claude Code and most MCP clients
screen-agent serve
# SSE — for HTTP-based clients
screen-agent serve --transport sse --port 8765
System check
screen-agent check
Verifies all dependencies and platform permissions.
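To give a sense of what a dependency check involves, here is a small sketch that tests whether the Python packages named in the architecture diagram are importable and whether wmctrl is on the PATH on Linux. It is an assumption about what such a check could look like, not the code behind `screen-agent check`; the function names are hypothetical, and it does not attempt to verify macOS permissions.

```python
import importlib.util
import platform
import shutil
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def missing_dependencies(system: str = "") -> List[str]:
    """List missing Python packages, plus wmctrl on Linux (hypothetical check)."""
    system = system or platform.system()
    # Packages named in the Platform Support table and the architecture diagram.
    missing = missing_modules(["mss", "pyautogui", "pynput"])
    if system == "Linux" and shutil.which("wmctrl") is None:
        missing.append("wmctrl")
    return missing


if __name__ == "__main__":
    problems = missing_dependencies()
    print("All dependencies found" if not problems else f"Missing: {', '.join(problems)}")
```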
Development
git clone https://github.com/chriswu727/screen-agent.git
cd screen-agent
pip install -e ".[dev]"
pytest
License
MIT