MCP Screenshot Server

by batteryshark
# MCP Screenshot Server

A FastMCP server that captures screenshots and returns them as images for visual analysis by LLMs. Perfect for UI debugging, troubleshooting, and visual inspection of applications.

## ⚠️ SECURITY WARNING

**This MCP server can capture and record your screen content.** It has access to everything visible on your display, including sensitive information like passwords, private messages, financial data, and confidential documents.

**Recommendations:**

- Only enable this MCP server when you specifically need screenshot functionality
- Disable it when working with sensitive information
- Be aware that any connected LLM can request screenshots of your current screen
- Consider using window-specific capture instead of full-screen capture when possible

**You have been warned.** Use responsibly.

## Features

- **🤖 Enhanced Smart Capture**: Natural language queries like "what am I watching on YouTube" with auto-zoom
- **🧠 Smart Capture**: Automatically finds the most relevant window (browsers, media, dev tools)
- **🎯 Context-Aware**: Provide hints like "youtube", "code", "browser" for targeted capture
- **📱 Active Window Capture**: Automatically capture the focused window
- **📋 Window Management**: List, activate, and capture specific windows by ID
- **📐 Region Capture**: Capture specific rectangular areas
- **🔍 Text Enhancement**: OpenCV-powered image processing for better text readability
- **⚡ Quality Modes**: Optimize for overview, readable text, or pixel-perfect detail
- **🔧 Permission Checking**: Verify macOS permissions and available functionality
- **💻 Cross-Platform**: macOS (native), Windows/Linux (fallback)
- **🚫 Ultra-Wide Friendly**: Avoids massive full-screen captures by default

## Installation

```bash
# Clone and setup
git clone https://github.com/yourusername/mcp-screenshot.git
cd mcp-screenshot
uv venv --python 3.12
uv add fastmcp pillow pyautogui opencv-python numpy

# Run the server
uv run python server.py
```

## MCP Configuration

### Local/stdio Configuration

```json
{
  "mcpServers": {
    "screenshot": {
      "type": "stdio",
      "command": "uv",
      "args": ["run", "--directory", "/path/to/mcp-screenshot", "python", "server.py"]
    }
  }
}
```

**Configuration Locations:**

- **VS Code**: `.vscode/mcp.json` (project) or user settings
- **Claude Desktop**: `claude_desktop_config.json`
- **Cursor**: `.cursor/mcp.json` (project) or `~/.cursor/mcp.json` (user)

## Tools

### `screenshot_smart_enhanced` 🤖 **NEW PRIMARY TOOL**

Enhanced smart screenshot with natural language understanding and auto-zoom.

- `query`: Natural language query like "what am I watching on YouTube" or "show me my code"
- `auto_zoom`: Automatically capture focused regions if initial screenshot is unclear
- `quality_mode`: "overview" (0.4x), "readable" (0.8x), "detail" (1.0x)
- `enhance_text`: Apply sharpening/contrast for text readability
- `format`: "png" or "jpeg"

**Enhanced Logic:**

1. Parses natural language intent (media consumption, development, communication)
2. Intelligently finds matching windows based on user query
3. Auto-activates target window for clean capture
4. Auto-zooms into interesting content regions if needed
5. Provides contextual response messages

### `screenshot_smart` ⭐ **STANDARD SMART TOOL**

Intelligently finds and captures the most relevant window automatically.

- `context`: Optional hint ("youtube", "browser", "code", "slack", etc.)
- Same quality/enhancement options as enhanced version

**Smart Logic:**

1. Looks for windows matching context hint
2. Falls back to active window if relevant
3. Prioritizes browsers, media apps, development tools
4. Avoids tiny/system windows
5. Last resort: center region (not full screen!)

### `screenshot_full` ⚠️ **AVOID ON ULTRA-WIDE**

Capture entire desktop/screen (use sparingly).

- Same parameters as smart capture
- Default: `quality_mode="overview"` to handle large screens

### `screenshot_active_window`

Capture currently focused window.

- Same parameters as `screenshot_full`
- Default: `quality_mode="readable"`

### `screenshot_region`

Capture specific rectangular area.

- `x`, `y`: Top-left coordinates
- `width`, `height`: Region dimensions
- Same quality/enhancement options
- Default: `quality_mode="detail"`

### `check_permissions` 🔧 **NEW DIAGNOSTIC TOOL**

Check what macOS permissions are available and what functionality works. Returns detailed permission status and available features.

### `list_windows`

Get list of all open windows with IDs and bounds. Returns structured data with window information for use with `screenshot_window`.

### `activate_window` 🔧 **NEW UTILITY**

Activate/focus a specific window by bringing it to front.

- `window_id`: Window identifier from `list_windows`

### `screenshot_window`

Capture specific window by ID from `list_windows`.

- `window_id`: Window identifier
- Same quality/enhancement options
- Automatically activates window first for clean capture

## Quality Modes

- **overview** (0.4x scale): Quick layout checks, fits ultra-wide screens
- **readable** (0.8x scale): Text stays legible, good for debugging
- **detail** (1.0x scale): Full resolution for pixel-perfect analysis

## Text Enhancement

Automatic image processing for better LLM readability:

- OpenCV sharpening filters
- Contrast enhancement
- Unsharp masking
- Optimized for text recognition

## Platform Support

- **macOS**: Native `screencapture` + AppleScript for window management
- **Windows/Linux**: PyAutoGUI fallback (basic functionality)

## Permissions

**macOS Permissions Required:**

1. **Screen Recording**: System Preferences > Security & Privacy > Privacy > Screen Recording
   - Add Terminal, Cursor, or whatever app runs the MCP server
   - Required for: All screenshot capture functionality
2. **Accessibility**: System Preferences > Security & Privacy > Privacy > Accessibility
   - Add Terminal, Cursor, or whatever app runs the MCP server
   - Required for: Window listing (`list_windows`) and selective window capture
   - **Note**: Smart capture works without this (uses active window detection)

**Without Accessibility permissions:**

- ✅ `screenshot_smart` works (uses active window)
- ✅ `screenshot_active_window` works
- ✅ `screenshot_full` and `screenshot_region` work
- ❌ `list_windows` returns empty (can't enumerate all windows)
- ❌ `screenshot_window` won't work (needs window list)

## Examples

```python
# Enhanced smart capture (recommended):
"What am I watching on YouTube?"
"Show me what I'm working on"
"What am I listening to?"
"Show me my conversation"

# Standard smart capture:
"Take a smart screenshot of what I'm watching on YouTube"
"Show me what's in my browser"
"Capture my development environment"
"Take a smart screenshot with context 'slack'"

# Utility tools:
"Check my macOS permissions"
"Show me all open windows"
"Activate window ID 5"
"Screenshot window ID 3 in detail mode"
"Capture region at coordinates 100,100 with size 800x600"
"Take full screen screenshot in overview mode"  # Last resort
```

The server returns images directly to the LLM context for immediate visual analysis.
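
To make the quality modes concrete, here is a minimal sketch of how the documented scale factors would translate into output dimensions. The 0.4x/0.8x/1.0x factors come from this README; the `scaled_size` helper, the rounding rule, and the one-pixel minimum are assumptions for illustration and may not match what `server.py` actually does.

```python
# Scale factors as listed under "Quality Modes" in this README.
QUALITY_SCALES = {"overview": 0.4, "readable": 0.8, "detail": 1.0}


def scaled_size(width: int, height: int, quality_mode: str = "readable") -> tuple[int, int]:
    """Return output dimensions for a capture.

    Hypothetical helper for illustration; not taken from the server's code.
    """
    try:
        scale = QUALITY_SCALES[quality_mode]
    except KeyError:
        raise ValueError(f"unknown quality_mode: {quality_mode!r}") from None
    # Assumption: round to the nearest pixel and never return a zero-sized image.
    return max(1, round(width * scale)), max(1, round(height * scale))


if __name__ == "__main__":
    # An ultra-wide 5120x1440 desktop in each mode.
    for mode in QUALITY_SCALES:
        print(mode, scaled_size(5120, 1440, mode))
```

On a 5120x1440 display, "overview" shrinks the capture to 2048x576, which suggests why `screenshot_full` defaults to that mode on large screens.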

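The fallback order listed under **Smart Logic** for `screenshot_smart` can be sketched roughly as follows. This is an illustration of the documented steps only: the `Window` dataclass, `pick_window`, the `PRIORITY_APPS` names, and the `MIN_SIDE` threshold are all hypothetical, and "active window if relevant" is interpreted here as "active and not tiny".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Window:
    """Hypothetical window record; the real server's structure may differ."""
    window_id: int
    app: str
    title: str
    width: int
    height: int
    active: bool = False


# Assumed values for illustration only.
PRIORITY_APPS = ("chrome", "safari", "firefox", "vlc", "spotify", "code", "terminal")
MIN_SIDE = 200  # windows smaller than this on either side count as "tiny"


def pick_window(windows: list[Window], context: Optional[str] = None) -> Optional[Window]:
    # Step 4 (avoid tiny/system windows) is applied up front as a filter.
    candidates = [w for w in windows if w.width >= MIN_SIDE and w.height >= MIN_SIDE]

    # Step 1: a window whose app name or title matches the context hint.
    if context:
        hint = context.lower()
        for w in candidates:
            if hint in w.title.lower() or hint in w.app.lower():
                return w

    # Step 2: the active window, if it survived the size filter.
    for w in candidates:
        if w.active:
            return w

    # Step 3: browsers, media apps, development tools.
    for w in candidates:
        if any(name in w.app.lower() for name in PRIORITY_APPS):
            return w

    # Step 5: nothing suitable; the caller would capture a center region instead.
    return None


if __name__ == "__main__":
    windows = [
        Window(1, "Finder", "Desktop", 80, 40),
        Window(2, "Google Chrome", "Cats - YouTube", 1600, 900),
        Window(3, "Code", "server.py", 1400, 900, active=True),
    ]
    print(pick_window(windows, "youtube"))  # hint match wins over the active window
    print(pick_window(windows))             # no hint: the active window is used
```

Returning `None` instead of a full-screen capture mirrors the README's point that the last resort is a center region, not the whole display.
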
MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/batteryshark/mcp-screenshot'
