MCP Screenshot Server

by batteryshark
requirements.md (4.64 kB)
# MCP Screenshot Server Requirements

## Overview

An MCP server that captures screenshots and returns them as image content to LLMs for visual analysis and troubleshooting.

## Core Features

### 1. Full Screen Capture

- **Tool**: `screenshot_full`
- **Description**: Capture entire desktop/screen
- **Parameters**:
  - `quality_mode` (optional): "overview", "readable", "detail" (default "overview")
  - `enhance_text` (optional): Apply sharpening/contrast for text readability (default true)
  - `format` (optional): Image format ("png", "jpeg", default "png")
- **Returns**: Image content directly to LLM context

### 2. Active Window Capture

- **Tool**: `screenshot_active_window`
- **Description**: Capture currently focused/active window
- **Parameters**:
  - `quality_mode` (optional): "overview", "readable", "detail" (default "readable")
  - `enhance_text` (optional): Apply sharpening/contrast for text readability (default true)
  - `format` (optional): Image format ("png", "jpeg", default "png")
- **Returns**: Image content of active window

### 3. Window List & Selective Capture

- **Tool**: `list_windows`
- **Description**: Get list of all open windows with IDs/titles
- **Returns**: Structured data with window information

```json
{
  "windows": [
    {"id": 123, "title": "VS Code", "app": "Code", "bounds": [x, y, w, h]},
    {"id": 456, "title": "Chrome", "app": "Google Chrome", "bounds": [x, y, w, h]}
  ]
}
```

- **Tool**: `screenshot_window`
- **Description**: Capture specific window by ID
- **Parameters**:
  - `window_id`: Window identifier from list_windows
  - `quality_mode` (optional): "overview", "readable", "detail" (default "readable")
  - `enhance_text` (optional): Apply sharpening/contrast for text readability (default true)
  - `format` (optional): Image format ("png", "jpeg", default "png")

### 4. Region/Area Capture

- **Tool**: `screenshot_region`
- **Description**: Capture specific rectangular area of screen
- **Parameters**:
  - `x`: Left coordinate
  - `y`: Top coordinate
  - `width`: Region width
  - `height`: Region height
  - `quality_mode` (optional): "overview", "readable", "detail" (default "detail")
  - `enhance_text` (optional): Apply sharpening/contrast for text readability (default true)
  - `format` (optional): Image format ("png", "jpeg", default "png")

## Technical Implementation

### Platform Support

- **Primary**: macOS (using `screencapture`, `osascript`)
- **Secondary**: Windows (using `pyautogui` or Windows API)
- **Tertiary**: Linux (using `scrot`, `gnome-screenshot`)

### Image Handling

- Use PIL/Pillow for image processing and scaling
- Return images as MCP `ImageContent` type with base64 encoding
- Implement smart scaling to keep images under reasonable size limits
- Default scale factors chosen to balance quality vs. token usage
- **Text Enhancement**: Apply sharpening filters and contrast enhancement for better OCR/readability
- **Adaptive Scaling**: Detect high-DPI displays and adjust scaling accordingly
- **Quality Modes**:
  - `overview` (0.3-0.5x): For layout/general UI structure
  - `readable` (0.7-1.0x): For text-heavy content that needs to be legible
  - `detail` (1.0x): For pixel-perfect analysis

### Dependencies

```
fastmcp>=2.10.0
pillow>=10.0.0
pyautogui>=0.9.54  # Cross-platform fallback
opencv-python>=4.8.0  # For advanced image enhancement
numpy>=1.24.0  # For image processing arrays
```

### Error Handling

- Handle permission errors (screen recording permissions on macOS)
- Validate coordinates and window IDs
- Graceful fallbacks between platform-specific methods

## Use Cases

### Agent Troubleshooting

- "Take a screenshot of the error dialog"
- "Show me what's currently on screen"
- "Capture the browser window with the issue"

### UI Analysis

- "Screenshot the settings panel so I can see the options"
- "Capture this specific area where the bug appears"
- "Show me all open windows so I can identify the problem"

### Documentation & Support

- Visual context for debugging sessions
- Automated issue diagnosis based on UI state
- Screen state verification during automated processes

## Security Considerations

- Screen recording permissions required on macOS
- Sensitive information in screenshots (passwords, personal data)
- Consider adding blur/redaction options for sensitive areas
- Rate limiting to prevent screenshot spam

## Future Enhancements

- OCR integration for text extraction from screenshots
- Annotation tools (arrows, highlights, text overlay)
- Screenshot comparison tools
- Automatic sensitive data detection/blurring
- Multi-monitor support with monitor selection
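
## Appendix: Scaling and Validation Sketch

The quality modes under Image Handling and the coordinate validation under Error Handling could be sketched roughly as follows. This is a minimal illustration using only the standard library, not the server's actual implementation. The names `QUALITY_SCALES`, `pick_scale`, `scaled_size`, and `validate_region` are hypothetical, and choosing the low end of each range on high-DPI displays is an assumption about how "adaptive scaling" might behave.

```python
# Scale ranges copied from the Quality Modes list (low, high).
QUALITY_SCALES = {
    "overview": (0.3, 0.5),
    "readable": (0.7, 1.0),
    "detail": (1.0, 1.0),
}


def pick_scale(quality_mode: str, high_dpi: bool = False) -> float:
    """Return a scale factor for the given quality mode.

    Assumption: high-DPI captures use the low end of the range, since
    they already contain more pixels per logical point.
    """
    if quality_mode not in QUALITY_SCALES:
        raise ValueError(f"unknown quality_mode: {quality_mode!r}")
    low, high = QUALITY_SCALES[quality_mode]
    return low if high_dpi else high


def scaled_size(width: int, height: int, quality_mode: str,
                high_dpi: bool = False) -> tuple[int, int]:
    """Compute the output size in pixels, never smaller than 1x1."""
    scale = pick_scale(quality_mode, high_dpi)
    return max(1, round(width * scale)), max(1, round(height * scale))


def validate_region(x: int, y: int, width: int, height: int,
                    screen_width: int, screen_height: int) -> tuple[int, int, int, int]:
    """Check that a region lies fully on screen.

    Returns a (left, top, right, bottom) box, the layout Pillow's
    Image.crop expects.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    if x < 0 or y < 0 or x + width > screen_width or y + height > screen_height:
        raise ValueError("region extends beyond the screen bounds")
    return (x, y, x + width, y + height)
```

The size returned by `scaled_size` could then be passed to Pillow's `Image.resize`, and the box from `validate_region` to `Image.crop`, before the result is base64-encoded for the `ImageContent` response.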
