requirements.md•4.64 kB
# MCP Screenshot Server Requirements
## Overview
An MCP server that captures screenshots and returns them as image content to LLMs for visual analysis and troubleshooting.
## Core Features
### 1. Full Screen Capture
- **Tool**: `screenshot_full`
- **Description**: Capture entire desktop/screen
- **Parameters**:
- `quality_mode` (optional): "overview", "readable", "detail" (default "overview")
- `enhance_text` (optional): Apply sharpening/contrast for text readability (default true)
- `format` (optional): Image format ("png", "jpeg", default "png")
- **Returns**: Image content directly to LLM context
### 2. Active Window Capture
- **Tool**: `screenshot_active_window`
- **Description**: Capture currently focused/active window
- **Parameters**:
- `quality_mode` (optional): "overview", "readable", "detail" (default "readable")
- `enhance_text` (optional): Apply sharpening/contrast for text readability (default true)
- `format` (optional): Image format ("png", "jpeg", default "png")
- **Returns**: Image content of active window
### 3. Window List & Selective Capture
- **Tool**: `list_windows`
- **Description**: Get list of all open windows with IDs/titles
- **Returns**: Structured data with window information
```json
{
"windows": [
{"id": 123, "title": "VS Code", "app": "Code", "bounds": [x, y, w, h]},
{"id": 456, "title": "Chrome", "app": "Google Chrome", "bounds": [x, y, w, h]}
]
}
```
- **Tool**: `screenshot_window`
- **Description**: Capture specific window by ID
- **Parameters**:
- `window_id`: Window identifier from list_windows
- `quality_mode` (optional): "overview", "readable", "detail" (default "readable")
- `enhance_text` (optional): Apply sharpening/contrast for text readability (default true)
- `format` (optional): Image format ("png", "jpeg", default "png")
### 4. Region/Area Capture
- **Tool**: `screenshot_region`
- **Description**: Capture specific rectangular area of screen
- **Parameters**:
- `x`: Left coordinate
- `y`: Top coordinate
- `width`: Region width
- `height`: Region height
- `quality_mode` (optional): "overview", "readable", "detail" (default "detail")
- `enhance_text` (optional): Apply sharpening/contrast for text readability (default true)
- `format` (optional): Image format ("png", "jpeg", default "png")
## Technical Implementation
### Platform Support
- **Primary**: macOS (using `screencapture`, `osascript`)
- **Secondary**: Windows (using `pyautogui` or Windows API)
- **Tertiary**: Linux (using `scrot`, `gnome-screenshot`)
### Image Handling
- Use PIL/Pillow for image processing and scaling
- Return images as MCP `ImageContent` type with base64 encoding
- Implement smart scaling to keep images under reasonable size limits
- Default scale factors chosen to balance quality vs. token usage
- **Text Enhancement**: Apply sharpening filters and contrast enhancement for better OCR/readability
- **Adaptive Scaling**: Detect high-DPI displays and adjust scaling accordingly
- **Quality Modes**:
- `overview` (0.3-0.5x): For layout/general UI structure
- `readable` (0.7-1.0x): For text-heavy content that needs to be legible
- `detail` (1.0x): For pixel-perfect analysis
### Dependencies
```
fastmcp>=2.10.0
pillow>=10.0.0
pyautogui>=0.9.54 # Cross-platform fallback
opencv-python>=4.8.0 # For advanced image enhancement
numpy>=1.24.0 # For image processing arrays
```
### Error Handling
- Handle permission errors (screen recording permissions on macOS)
- Validate coordinates and window IDs
- Graceful fallbacks between platform-specific methods
## Use Cases
### Agent Troubleshooting
- "Take a screenshot of the error dialog"
- "Show me what's currently on screen"
- "Capture the browser window with the issue"
### UI Analysis
- "Screenshot the settings panel so I can see the options"
- "Capture this specific area where the bug appears"
- "Show me all open windows so I can identify the problem"
### Documentation & Support
- Visual context for debugging sessions
- Automated issue diagnosis based on UI state
- Screen state verification during automated processes
## Security Considerations
- Screen recording permissions required on macOS
- Sensitive information in screenshots (passwords, personal data)
- Consider adding blur/redaction options for sensitive areas
- Rate limiting to prevent screenshot spam
## Future Enhancements
- OCR integration for text extraction from screenshots
- Annotation tools (arrows, highlights, text overlay)
- Screenshot comparison tools
- Automatic sensitive data detection/blurring
- Multi-monitor support with monitor selection