# MCP Screenshot Server
A FastMCP server that captures screenshots and returns them as images for visual analysis by LLMs. Perfect for UI debugging, troubleshooting, and visual inspection of applications.
## ⚠️ SECURITY WARNING
**This MCP server can capture and record your screen content.** It has access to everything visible on your display, including sensitive information like passwords, private messages, financial data, and confidential documents.
**Recommendations:**
- Only enable this MCP server when you specifically need screenshot functionality
- Disable it when working with sensitive information
- Be aware that any connected LLM can request screenshots of your current screen
- Consider using window-specific capture instead of full-screen capture when possible
**You have been warned.** Use responsibly.
## Features
- **🤖 Enhanced Smart Capture**: Natural language queries like "what am I watching on YouTube" with auto-zoom
- **🧠 Smart Capture**: Automatically finds the most relevant window (browsers, media, dev tools)
- **🎯 Context-Aware**: Provide hints like "youtube", "code", "browser" for targeted capture
- **📱 Active Window Capture**: Automatically capture the focused window
- **📋 Window Management**: List, activate, and capture specific windows by ID
- **📐 Region Capture**: Capture specific rectangular areas
- **🔍 Text Enhancement**: OpenCV-powered image processing for better text readability
- **⚡ Quality Modes**: Optimize for overview, readable text, or pixel-perfect detail
- **🔧 Permission Checking**: Verify macOS permissions and available functionality
- **💻 Cross-Platform**: macOS (native), Windows/Linux (fallback)
- **🚫 Ultra-Wide Friendly**: Avoids massive full-screen captures by default
## Installation
```bash
# Clone and set up
git clone https://github.com/yourusername/mcp-screenshot.git
cd mcp-screenshot
uv venv --python 3.12
uv add fastmcp pillow pyautogui opencv-python numpy
# Run the server
uv run python server.py
```
## MCP Configuration
### Local/stdio Configuration
```json
{
"mcpServers": {
"screenshot": {
"type": "stdio",
"command": "uv",
"args": ["run", "--directory", "/path/to/mcp-screenshot", "python", "server.py"]
}
}
}
```
**Configuration Locations:**
- **VS Code**: `.vscode/mcp.json` (project) or user settings
- **Claude Desktop**: `claude_desktop_config.json`
- **Cursor**: `.cursor/mcp.json` (project) or `~/.cursor/mcp.json` (user)
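If you would rather generate the entry than edit it by hand, a short script can fill in the project path. This is a minimal sketch: `build_server_entry` is a hypothetical helper, not part of the server, and it assumes the stdio layout shown above. It resolves the directory to an absolute path so the entry does not depend on the client's working directory.

```python
import json
from pathlib import Path


def build_server_entry(project_dir: str) -> dict:
    """Build an mcpServers entry for the given checkout (hypothetical helper)."""
    # Resolve to an absolute path so the MCP client can launch the server
    # regardless of its own working directory.
    directory = str(Path(project_dir).expanduser().resolve())
    return {
        "mcpServers": {
            "screenshot": {
                "type": "stdio",
                "command": "uv",
                "args": ["run", "--directory", directory, "python", "server.py"],
            }
        }
    }


if __name__ == "__main__":
    # Print the JSON so it can be pasted into one of the config files listed above.
    print(json.dumps(build_server_entry("."), indent=2))
```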
## Tools
### `screenshot_smart_enhanced` 🤖 **NEW PRIMARY TOOL**
Enhanced smart screenshot with natural language understanding and auto-zoom.
- `query`: Natural language query like "what am I watching on YouTube" or "show me my code"
- `auto_zoom`: Automatically capture focused regions if initial screenshot is unclear
- `quality_mode`: "overview" (0.4x), "readable" (0.8x), "detail" (1.0x)
- `enhance_text`: Apply sharpening/contrast for text readability
- `format`: "png" or "jpeg"
**Enhanced Logic:**
1. Parses natural language intent (media consumption, development, communication)
2. Intelligently finds matching windows based on user query
3. Auto-activates target window for clean capture
4. Auto-zooms into interesting content regions if needed
5. Provides contextual response messages
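Step 1 can be pictured as simple keyword matching. The sketch below is illustrative only: the keyword table and the `parse_intent` name are assumptions, and the server's actual parsing rules may differ.

```python
# Hypothetical keyword table; the server's real rules may differ.
INTENT_KEYWORDS = {
    "media": ("watching", "listening", "youtube", "video", "music"),
    "development": ("code", "working on", "terminal", "editor"),
    "communication": ("conversation", "chat", "message", "slack"),
}


def parse_intent(query: str) -> str:
    """Return the intent with the most keyword hits, or "unknown" if none match."""
    text = query.lower()
    scores = {
        intent: sum(keyword in text for keyword in keywords)
        for intent, keywords in INTENT_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"


if __name__ == "__main__":
    for query in ("what am I watching on YouTube", "show me my code"):
        print(query, "->", parse_intent(query))
```

The resulting intent would then drive window matching in step 2.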
### `screenshot_smart` ⭐ **STANDARD SMART TOOL**
Intelligently finds and captures the most relevant window automatically.
- `context`: Optional hint ("youtube", "browser", "code", "slack", etc.)
- Same quality/enhancement options as enhanced version
**Smart Logic:**
1. Looks for windows matching context hint
2. Falls back to active window if relevant
3. Prioritizes browsers, media apps, development tools
4. Avoids tiny/system windows
5. Last resort: center region (not full screen!)
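The fallback chain above can be sketched as a single selection function. Everything here is an assumption made for illustration: the `Window` type, the `MIN_SIZE` threshold, the `PRIORITY_APPS` list, and reading "relevant" as "belongs to a priority app". Returning `None` stands for the center-region fallback.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical values; the README does not state the real threshold or app list.
MIN_SIZE = 200
PRIORITY_APPS = ("chrome", "safari", "firefox", "spotify", "vlc", "code", "terminal")


@dataclass
class Window:
    app: str
    title: str
    width: int
    height: int
    active: bool = False


def is_priority(window: Window) -> bool:
    """True if the window belongs to a browser, media app, or dev tool."""
    return any(name in window.app.lower() for name in PRIORITY_APPS)


def pick_window(windows: List[Window], context: Optional[str] = None) -> Optional[Window]:
    """Return the window to capture, or None to signal the center-region fallback."""
    # Step 4: ignore tiny/system windows throughout.
    usable = [w for w in windows if w.width >= MIN_SIZE and w.height >= MIN_SIZE]

    # Step 1: a context hint wins if any window matches it.
    if context:
        hint = context.lower()
        for w in usable:
            if hint in w.app.lower() or hint in w.title.lower():
                return w

    # Step 2: otherwise prefer the active window, if it looks relevant.
    for w in usable:
        if w.active and is_priority(w):
            return w

    # Step 3: otherwise take the first browser/media/dev window.
    for w in usable:
        if is_priority(w):
            return w

    # Step 5: nothing suitable; the caller captures a center region instead.
    return None
```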
### `screenshot_full` ⚠️ **AVOID ON ULTRA-WIDE**
Capture entire desktop/screen (use sparingly).
- Same parameters as smart capture
- Default: `quality_mode="overview"` to handle large screens
### `screenshot_active_window`
Capture currently focused window.
- Same parameters as `screenshot_full`
- Default: `quality_mode="readable"`
### `screenshot_region`
Capture specific rectangular area.
- `x`, `y`: Top-left coordinates
- `width`, `height`: Region dimensions
- Same quality/enhancement options
- Default: `quality_mode="detail"`
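A region that extends past the screen edge has to be clipped or rejected before capture. The README does not say which the server does, so the sketch below is an assumption: `clamp_region` is a hypothetical helper that clips to the screen and raises if nothing is left.

```python
def clamp_region(x, y, width, height, screen_width, screen_height):
    """Clip a region to the screen and return (x, y, width, height)."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    # Convert to edges, clip each edge to the screen, then convert back.
    left = max(0, x)
    top = max(0, y)
    right = min(screen_width, x + width)
    bottom = min(screen_height, y + height)
    if right <= left or bottom <= top:
        raise ValueError("region lies entirely off screen")
    return left, top, right - left, bottom - top


if __name__ == "__main__":
    # A region hanging off the bottom-right corner of a 1920x1080 screen.
    print(clamp_region(1800, 1000, 800, 600, 1920, 1080))
```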
### `check_permissions` 🔧 **NEW DIAGNOSTIC TOOL**
Check what macOS permissions are available and what functionality works.
Returns detailed permission status and available features.
### `list_windows`
Get list of all open windows with IDs and bounds.
Returns structured data with window information for use with `screenshot_window`.
### `activate_window` 🔧 **NEW UTILITY**
Activate/focus a specific window by bringing it to front.
- `window_id`: Window identifier from `list_windows`
### `screenshot_window`
Capture specific window by ID from `list_windows`.
- `window_id`: Window identifier
- Same quality/enhancement options
- Automatically activates window first for clean capture
## Quality Modes
- **overview** (0.4x scale): Quick layout checks, fits ultra-wide screens
- **readable** (0.8x scale): Text stays legible, good for debugging
- **detail** (1.0x scale): Full resolution for pixel-perfect analysis
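To see what the scale factors mean for output size, here is a minimal sketch. The factors come from the list above; `scaled_size` and its rounding behavior are assumptions, not the server's code.

```python
from typing import Tuple

# Scale factors as documented above.
QUALITY_SCALES = {"overview": 0.4, "readable": 0.8, "detail": 1.0}


def scaled_size(width: int, height: int, quality_mode: str = "readable") -> Tuple[int, int]:
    """Return the output dimensions for a capture of the given size."""
    try:
        scale = QUALITY_SCALES[quality_mode]
    except KeyError:
        raise ValueError(f"unknown quality_mode: {quality_mode!r}") from None
    # Round to whole pixels and never collapse a dimension to zero.
    return max(1, round(width * scale)), max(1, round(height * scale))


if __name__ == "__main__":
    for mode in QUALITY_SCALES:
        print(mode, scaled_size(3440, 1440, mode))
```

For example, a 3440x1440 ultra-wide capture comes out at 1376x576 in `overview` mode and 2752x1152 in `readable` mode.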
## Text Enhancement
When `enhance_text` is enabled, the server post-processes the image for better LLM readability:
- OpenCV sharpening filters
- Contrast enhancement
- Unsharp masking
- Optimized for text recognition
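Unsharp masking subtracts a blurred copy from the original to exaggerate edges: `sharpened = original + amount * (original - blurred)`. The server uses OpenCV for this. The dependency-free sketch below only illustrates the arithmetic on a small grayscale grid, and it substitutes a 3x3 box blur for the Gaussian blur normally used. The function names and the `amount` parameter are assumptions.

```python
def box_blur(pixels):
    """3x3 mean blur over a grayscale grid, replicating edge pixels."""
    height, width = len(pixels), len(pixels[0])
    blurred = []
    for y in range(height):
        row = []
        for x in range(width):
            total = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    # Clamp neighbor coordinates so borders reuse the nearest pixel.
                    yy = min(max(y + dy, 0), height - 1)
                    xx = min(max(x + dx, 0), width - 1)
                    total += pixels[yy][xx]
            row.append(total / 9)
        blurred.append(row)
    return blurred


def unsharp_mask(pixels, amount=1.0):
    """Sharpen by adding back the difference from a blurred copy, clipped to 0..255."""
    blurred = box_blur(pixels)
    return [
        [min(255, max(0, round(p + amount * (p - b)))) for p, b in zip(row, blur_row)]
        for row, blur_row in zip(pixels, blurred)
    ]


if __name__ == "__main__":
    # A vertical edge between dark (50) and light (200) pixels.
    edge = [[50, 50, 200, 200] for _ in range(3)]
    print(unsharp_mask(edge)[1])
```

Pixels next to the edge are pushed further apart while flat areas stay unchanged, which is what makes text strokes stand out.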
## Platform Support
- **macOS**: Native `screencapture` + AppleScript for window management
- **Windows/Linux**: PyAutoGUI fallback (basic functionality)
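The platform split can be sketched as a small dispatch on `platform.system()`, which returns `"Darwin"` on macOS. The `capture_backend` name and the returned labels are assumptions for illustration only.

```python
import platform
from typing import Optional


def capture_backend(system: Optional[str] = None) -> str:
    """Return which capture path applies on the given (or current) platform."""
    system = system or platform.system()
    # macOS gets the native screencapture path; everything else falls back to PyAutoGUI.
    return "screencapture" if system == "Darwin" else "pyautogui"


if __name__ == "__main__":
    print(capture_backend())
```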
## Permissions
**macOS Permissions Required:**
1. **Screen Recording**: System Preferences > Security & Privacy > Privacy > Screen Recording (on macOS 13 and later, look under System Settings > Privacy & Security)
   - Add Terminal, Cursor, or whichever app runs the MCP server
   - Required for: all screenshot capture functionality
2. **Accessibility**: System Preferences > Security & Privacy > Privacy > Accessibility (on macOS 13 and later, look under System Settings > Privacy & Security)
   - Add Terminal, Cursor, or whichever app runs the MCP server
   - Required for: window listing (`list_windows`) and selective window capture
   - **Note**: Smart capture works without this (uses active window detection)
**Without Accessibility permissions:**
- ✅ `screenshot_smart` works (uses active window)
- ✅ `screenshot_active_window` works
- ✅ `screenshot_full` and `screenshot_region` work
- ❌ `list_windows` returns empty (can't enumerate all windows)
- ❌ `screenshot_window` won't work (needs window list)
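The permission rules above can be summarized as a lookup. This is a sketch of the documented behavior, not the output of `check_permissions`: the `available_tools` helper is hypothetical, it covers only the tools this section mentions, and it assumes `check_permissions` itself needs no permission.

```python
from typing import Set


def available_tools(screen_recording: bool, accessibility: bool) -> Set[str]:
    """Return the tools expected to work for a given set of macOS permissions."""
    # Assumption: the diagnostic tool works regardless of granted permissions.
    tools = {"check_permissions"}
    if screen_recording:
        # All capture needs Screen Recording; these work without Accessibility.
        tools |= {
            "screenshot_smart",
            "screenshot_active_window",
            "screenshot_full",
            "screenshot_region",
        }
    if accessibility:
        # Window enumeration needs Accessibility.
        tools.add("list_windows")
    if screen_recording and accessibility:
        # Capturing a listed window needs both the list and capture rights.
        tools.add("screenshot_window")
    return tools


if __name__ == "__main__":
    print(sorted(available_tools(screen_recording=True, accessibility=False)))
```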
## Examples
```text
# Enhanced smart capture (recommended):
"What am I watching on YouTube?"
"Show me what I'm working on"
"What am I listening to?"
"Show me my conversation"
# Standard smart capture:
"Take a smart screenshot of what I'm watching on YouTube"
"Show me what's in my browser"
"Capture my development environment"
"Take a smart screenshot with context 'slack'"
# Utility tools:
"Check my macOS permissions"
"Show me all open windows"
"Activate window ID 5"
"Screenshot window ID 3 in detail mode"
"Capture region at coordinates 100,100 with size 800x600"
"Take full screen screenshot in overview mode" # Last resort
```
The server returns images directly to the LLM context for immediate visual analysis.