# 📸 MCP Screenshot Server

[![NPM Version](https://img.shields.io/npm/v/@ai-capabilities-suite/mcp-screenshot)](https://www.npmjs.com/package/@ai-capabilities-suite/mcp-screenshot)
[![GitHub Release](https://img.shields.io/github/v/release/digital-defiance/mcp-screenshot?label=Release&logo=github)](https://github.com/digital-defiance/mcp-screenshot/releases)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Node.js Version](https://img.shields.io/badge/node-%3E%3D18.0.0-brightgreen)](https://nodejs.org/)
[![Docker Pulls](https://img.shields.io/docker/pulls/digidefiance/mcp-screenshot)](https://hub.docker.com/r/digidefiance/mcp-screenshot)

**Give AI agents visual superpowers to see, analyze, and document your applications like senior UX designers.**

This enterprise-grade MCP server transforms AI from code-only assistants into visual experts capable of UI analysis, accessibility auditing, documentation generation, and responsive design testing.

## 🔗 Repository

This package is now maintained in its own repository: **[https://github.com/Digital-Defiance/mcp-screenshot](https://github.com/Digital-Defiance/mcp-screenshot)**

This repository is part of the [AI Capabilities Suite](https://github.com/Digital-Defiance/ai-capabilitites-suite) on GitHub.

## 🤔 Why Do AI Agents Need Visual Capabilities?

**AI agents today are powerful but visually blind:**

- ❌ Can read HTML/CSS but can't see actual layouts
- ❌ Can only suggest UI improvements without seeing the real user experience
- ❌ Can't detect accessibility issues like poor contrast or spacing
- ❌ Can't create visual documentation or bug reports
- ❌ Can't analyze responsive design across different screen sizes

**Result:** You're stuck manually creating screenshots, documentation, and visual analysis that AI should handle.
---

## 🎯 Revolutionary Use Cases

### 📚 **"AI, create professional documentation"**

```
You: "Document the checkout process with screenshots"
AI: *Captures each step, analyzes UI flow*
AI: "Generated complete user guide with 6 annotated screenshots, highlighting key interactions and error states..."
```

### 🔍 **"AI, audit this page for accessibility"**

```
You: "Check if this form meets WCAG guidelines"
AI: *Captures form, analyzes visual elements*
AI: "Found 4 accessibility issues: contrast ratio 2.1:1 (needs 4.5:1), missing focus indicators, touch targets too small..."
```

### 🐛 **"AI, create a detailed bug report"**

```
You: "The mobile navigation is broken"
AI: *Captures different viewport sizes*
AI: "Created bug report with before/after screenshots, exact breakpoint where it fails, and CSS fix suggestions..."
```

### 🎨 **"AI, compare these design variations"**

```
You: "Which hero section design works better?"
AI: *Captures both versions, analyzes visual hierarchy*
AI: "Version B has 28% better visual flow: CTA more prominent, text hierarchy clearer, better use of whitespace..."
```

### 📱 **"AI, test responsive design"**

```
You: "How does this look on different screen sizes?"
AI: *Captures multiple viewport sizes*
AI: "Layout breaks at 768px: sidebar overlaps content. Here's the media query fix with visual proof..."
```

---

## ✨ What This Changes

**Before:** AI worked blind, relying on code descriptions

- ❌ "The button looks wrong" → AI guesses the issue
- ❌ "Create documentation" → AI writes generic text
- ❌ "Check accessibility" → AI only reviews code
- ❌ "Test responsive design" → AI can't see actual breakpoints

**After:** AI sees and analyzes your actual user interface

- ✅ **Visual debugging** - AI identifies exact pixel-level issues
- ✅ **Smart documentation** - AI creates guides with real screenshots and annotations
- ✅ **Accessibility audits** - AI measures actual contrast ratios and spacing
- ✅ **Responsive testing** - AI captures and compares different screen sizes
- ✅ **Design analysis** - AI evaluates visual hierarchy and user experience
- ✅ **Professional reports** - AI creates detailed visual evidence for bugs and improvements

---

## 🚀 Features

- **Multi-format Support**: PNG, JPEG, WebP, BMP with configurable quality
- **Flexible Capture**: Full screen, specific windows, or custom regions
- **Privacy Protection**: PII masking with OCR-based detection for emails, phone numbers, and credit cards
- **Security Controls**: Path validation, rate limiting, audit logging, and configurable policies
- **Cross-platform**: Linux (X11/Wayland), macOS, Windows with native APIs
- **Multi-monitor Support**: Capture from specific displays in multi-monitor setups
- **Enterprise Security**: Window exclusion, audit logging, rate limiting
- **AI-Optimized**: Structured responses perfect for AI agent workflows

## Installation

### NPM Installation

```bash
npm install @ai-capabilities-suite/mcp-screenshot
```

### System Requirements

**Linux:**

- X11: `imagemagick` package (provides `import` command)
- Wayland: `grim` package

```bash
# Ubuntu/Debian
sudo apt-get install imagemagick grim

# Fedora
sudo dnf install ImageMagick grim

# Arch
sudo pacman -S imagemagick grim
```

**macOS:**

- Built-in `screencapture` command (no additional dependencies)
- Screen Recording permission required (System Preferences > Security & Privacy > Privacy > Screen Recording)

**Windows:**

- No additional dependencies required

### MCP Configuration

Add to your MCP settings file (e.g., `~/.kiro/settings/mcp.json` or `.kiro/settings/mcp.json`):

```json
{
  "mcpServers": {
    "screenshot": {
      "command": "node",
      "args": ["/path/to/mcp-screenshot/dist/cli.js"],
      "env": {
        "SCREENSHOT_ALLOWED_DIRS": "/home/user/screenshots,/tmp",
        "SCREENSHOT_MAX_CAPTURES_PER_MIN": "60",
        "SCREENSHOT_ENABLE_AUDIT_LOG": "true"
      }
    }
  }
}
```

## 🛠️ 5 Professional MCP Tools

**Purpose-built for AI agents to capture, analyze, and work with visual information:**

The server exposes 5 comprehensive MCP tools that enable AI agents to see and understand your applications:

### 1. screenshot_capture_full

Capture full screen or specific display.

**Parameters:**

- `display` (string, optional): Display ID to capture (defaults to primary display)
- `format` (string, optional): Image format - `png`, `jpeg`, `webp`, or `bmp` (default: `png`)
- `quality` (number, optional): Compression quality 1-100 for lossy formats (default: 90)
- `savePath` (string, optional): File path to save screenshot (returns base64 if not provided)
- `enablePIIMasking` (boolean, optional): Enable PII detection and masking (default: false)

**Example:**

```json
{
  "name": "screenshot_capture_full",
  "arguments": {
    "format": "png",
    "savePath": "/home/user/screenshots/desktop.png",
    "enablePIIMasking": true
  }
}
```

**Response:**

```json
{
  "status": "success",
  "filePath": "/home/user/screenshots/desktop.png",
  "metadata": {
    "width": 1920,
    "height": 1080,
    "format": "png",
    "fileSize": 245678,
    "timestamp": "2024-12-01T10:30:00.000Z",
    "display": {
      "id": "0",
      "name": "Primary Display",
      "resolution": { "width": 1920, "height": 1080 },
      "position": { "x": 0, "y": 0 },
      "isPrimary": true
    },
    "piiMasking": {
      "emailsRedacted": 2,
      "phonesRedacted": 1,
      "creditCardsRedacted": 0,
      "customPatternsRedacted": 0
    }
  }
}
```

### 2. screenshot_capture_window

Capture specific application window by ID or title pattern.

**Parameters:**

- `windowId` (string, optional): Window identifier (use `windowId` or `windowTitle`)
- `windowTitle` (string, optional): Window title pattern to match (use `windowId` or `windowTitle`)
- `includeFrame` (boolean, optional): Include window frame and title bar (default: false)
- `format` (string, optional): Image format (default: `png`)
- `quality` (number, optional): Compression quality 1-100 (default: 90)
- `savePath` (string, optional): File path to save screenshot

**Example:**

```json
{
  "name": "screenshot_capture_window",
  "arguments": {
    "windowTitle": "Chrome",
    "includeFrame": false,
    "format": "jpeg",
    "quality": 85
  }
}
```

**Response:**

```json
{
  "status": "success",
  "data": "iVBORw0KGgoAAAANSUhEUgAA...",
  "mimeType": "image/jpeg",
  "metadata": {
    "width": 1280,
    "height": 720,
    "format": "jpeg",
    "fileSize": 89234,
    "timestamp": "2024-12-01T10:31:00.000Z",
    "window": {
      "id": "12345",
      "title": "Google Chrome",
      "processName": "chrome",
      "pid": 5678,
      "bounds": { "x": 100, "y": 100, "width": 1280, "height": 720 }
    }
  }
}
```

### 3. screenshot_capture_region

Capture specific rectangular region of the screen.

**Parameters:**

- `x` (number, required): X coordinate of top-left corner
- `y` (number, required): Y coordinate of top-left corner
- `width` (number, required): Width of region in pixels
- `height` (number, required): Height of region in pixels
- `format` (string, optional): Image format (default: `png`)
- `quality` (number, optional): Compression quality 1-100 (default: 90)
- `savePath` (string, optional): File path to save screenshot

**Example:**

```json
{
  "name": "screenshot_capture_region",
  "arguments": {
    "x": 100,
    "y": 100,
    "width": 800,
    "height": 600,
    "format": "png"
  }
}
```

**Response:**

```json
{
  "status": "success",
  "data": "iVBORw0KGgoAAAANSUhEUgAA...",
  "mimeType": "image/png",
  "metadata": {
    "width": 800,
    "height": 600,
    "format": "png",
    "fileSize": 123456,
    "timestamp": "2024-12-01T10:32:00.000Z",
    "region": { "x": 100, "y": 100, "width": 800, "height": 600 }
  }
}
```

### 4. screenshot_list_displays

List all connected displays with resolution and position information.

**Parameters:** None

**Example:**

```json
{
  "name": "screenshot_list_displays",
  "arguments": {}
}
```

**Response:**

```json
{
  "status": "success",
  "displays": [
    {
      "id": "0",
      "name": "Primary Display",
      "resolution": { "width": 1920, "height": 1080 },
      "position": { "x": 0, "y": 0 },
      "isPrimary": true
    },
    {
      "id": "1",
      "name": "Secondary Display",
      "resolution": { "width": 1920, "height": 1080 },
      "position": { "x": 1920, "y": 0 },
      "isPrimary": false
    }
  ]
}
```

### 5. screenshot_list_windows

List all visible windows with title, process, and position information.

**Parameters:** None

**Example:**

```json
{
  "name": "screenshot_list_windows",
  "arguments": {}
}
```

**Response:**

```json
{
  "status": "success",
  "windows": [
    {
      "id": "12345",
      "title": "Google Chrome",
      "processName": "chrome",
      "pid": 5678,
      "bounds": { "x": 100, "y": 100, "width": 1280, "height": 720 },
      "isMinimized": false
    },
    {
      "id": "67890",
      "title": "Terminal",
      "processName": "gnome-terminal",
      "pid": 9012,
      "bounds": { "x": 200, "y": 200, "width": 800, "height": 600 },
      "isMinimized": false
    }
  ]
}
```

## Security Configuration

The server enforces security policies to control screenshot operations. Configure via environment variables or security policy file.

### Environment Variables

- `SCREENSHOT_ALLOWED_DIRS`: Comma-separated list of allowed directories for saving screenshots
- `SCREENSHOT_MAX_CAPTURES_PER_MIN`: Maximum captures per minute (default: 60)
- `SCREENSHOT_ENABLE_AUDIT_LOG`: Enable audit logging (default: true)
- `SCREENSHOT_BLOCKED_WINDOWS`: Comma-separated list of window title patterns to exclude

### Security Policy File

Create a `security-policy.json` file:

```json
{
  "allowedDirectories": [
    "/home/user/screenshots",
    "/tmp/screenshots"
  ],
  "blockedWindowPatterns": [
    ".*Password.*",
    ".*1Password.*",
    ".*LastPass.*",
    ".*Bitwarden.*",
    ".*Authentication.*"
  ],
  "maxCapturesPerMinute": 60,
  "enableAuditLog": true
}
```

Load the policy when starting the server:

```typescript
import { MCPScreenshotServer } from '@ai-capabilities-suite/mcp-screenshot';
import * as fs from 'fs';

const policy = JSON.parse(fs.readFileSync('security-policy.json', 'utf-8'));
const server = new MCPScreenshotServer(policy);
await server.start();
```

## Error Handling

All tools return structured error responses with error codes and remediation suggestions.
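
On the client side, a caller could branch on the `status` field before touching any image data. The following is a minimal TypeScript sketch, assuming the response shapes shown in this README's examples; the type names (`ToolSuccess`, `ToolError`, `ToolResponse`) and the `describeResponse` helper are hypothetical and are not part of the package's API.

```typescript
// Hypothetical types modelled on the example responses in this README;
// they are NOT exported by the package.
interface ToolSuccess {
  status: "success";
  filePath?: string; // present when a savePath was given
  data?: string;     // base64 image data when no savePath was given
  mimeType?: string;
}

interface ToolError {
  status: "error";
  error: {
    code: string;
    message: string;
    details?: Record<string, unknown>;
    remediation?: string;
  };
}

type ToolResponse = ToolSuccess | ToolError;

// Turn a parsed tool response into a short human-readable summary.
function describeResponse(res: ToolResponse): string {
  if (res.status === "error") {
    const hint = res.error.remediation ? ` (${res.error.remediation})` : "";
    return `${res.error.code}: ${res.error.message}${hint}`;
  }
  return res.filePath ? `saved to ${res.filePath}` : "returned inline image data";
}

const failure: ToolResponse = {
  status: "error",
  error: {
    code: "WINDOW_NOT_FOUND",
    message: "Window with ID '12345' not found",
    remediation: "Use screenshot_list_windows to see available windows.",
  },
};

console.log(describeResponse(failure));
```

Modelling the response as a discriminated union on `status` lets the compiler reject any access to `error` on a success response, which is one reasonable way to keep agent-side handling honest.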

### Error Codes

| Code | Description | Remediation |
|------|-------------|-------------|
| `PERMISSION_DENIED` | Insufficient permissions to capture | Grant Screen Recording permission (macOS) or check user permissions |
| `INVALID_PATH` | File path outside allowed directories | Use a path within configured allowed directories |
| `WINDOW_NOT_FOUND` | Specified window does not exist | Use `screenshot_list_windows` to find available windows |
| `DISPLAY_NOT_FOUND` | Specified display does not exist | Use `screenshot_list_displays` to find available displays |
| `UNSUPPORTED_FORMAT` | Requested format not supported | Use png, jpeg, webp, or bmp |
| `CAPTURE_FAILED` | Screenshot capture failed | Check permissions and try again |
| `RATE_LIMIT_EXCEEDED` | Too many captures in time window | Wait before making additional requests |
| `INVALID_REGION` | Invalid region coordinates or dimensions | Ensure coordinates are non-negative and dimensions are positive |
| `OUT_OF_MEMORY` | Insufficient memory for operation | Reduce capture size or close other applications |
| `ENCODING_FAILED` | Image encoding failed | Try different format or reduce quality |
| `FILE_SYSTEM_ERROR` | File system operation failed | Check permissions and disk space |

### Error Response Format

```json
{
  "status": "error",
  "error": {
    "code": "WINDOW_NOT_FOUND",
    "message": "Window with ID '12345' not found",
    "details": { "windowId": "12345" },
    "remediation": "Verify the window exists and is visible. Use screenshot_list_windows to see available windows."
  }
}
```

## Troubleshooting

### Linux Issues

**Problem:** `import: command not found` or `grim: command not found`

**Solution:** Install required packages:

```bash
# X11
sudo apt-get install imagemagick

# Wayland
sudo apt-get install grim
```

**Problem:** Black screen or empty captures

**Solution:** Check display server environment variables:

```bash
echo $DISPLAY          # Should show :0 or similar for X11
echo $WAYLAND_DISPLAY  # Should show wayland-0 or similar for Wayland
```

### macOS Issues

**Problem:** `PERMISSION_DENIED` error

**Solution:** Grant Screen Recording permission:

1. Open System Preferences > Security & Privacy > Privacy
2. Select "Screen Recording" from the list
3. Add your terminal application or Node.js to the allowed list
4. Restart the application

**Problem:** Retina display captures are double resolution

**Solution:** This is expected behavior. Retina displays have 2x pixel density. Use the `width` and `height` from metadata to determine actual dimensions.

### Windows Issues

**Problem:** Capture fails with access denied

**Solution:** Run the application with administrator privileges or check Windows Defender settings.

**Problem:** Multi-monitor captures show wrong display

**Solution:** Use `screenshot_list_displays` to get correct display IDs and positions.

### General Issues

**Problem:** `RATE_LIMIT_EXCEEDED` error

**Solution:** The server limits captures to prevent abuse. Wait 60 seconds or adjust `maxCapturesPerMinute` in security policy.

**Problem:** `INVALID_PATH` error when saving

**Solution:** Ensure the save path is within allowed directories configured in security policy.
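
To catch an `INVALID_PATH` before calling a tool, the containment check can be approximated locally. This is a rough sketch of one way to test whether a save path falls inside an allowed directory, using Node's `path` module; `isPathAllowed` is a hypothetical helper, and the server's actual validation may differ (for example in how it treats symlinks).

```typescript
import * as path from "node:path";

// Hypothetical helper: returns true when `savePath` resolves to a location
// strictly inside one of `allowedDirs`. Symlinks are not resolved here.
function isPathAllowed(savePath: string, allowedDirs: string[]): boolean {
  const target = path.resolve(savePath);
  return allowedDirs.some((dir) => {
    const rel = path.relative(path.resolve(dir), target);
    // "" means the directory itself; ".." (or a "../" prefix) or an absolute
    // result means the target lies outside `dir`.
    return (
      rel !== "" &&
      rel !== ".." &&
      !rel.startsWith(".." + path.sep) &&
      !path.isAbsolute(rel)
    );
  });
}

const allowed = ["/home/user/screenshots", "/tmp"];
console.log(isPathAllowed("/home/user/screenshots/desktop.png", allowed));    // true
console.log(isPathAllowed("/home/user/screenshots/../secrets.png", allowed)); // false
```

Comparing via `path.relative` rather than a string prefix avoids treating a sibling such as `/tmpfoo` as if it were inside `/tmp`.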

**Problem:** PII masking not working

**Solution:**

- Ensure tesseract.js is properly installed
- Check that `eng.traineddata` language file is available
- PII masking requires OCR which may be slow on large images

**Problem:** Large file sizes

**Solution:**

- Use JPEG format with lower quality (60-80) for smaller files
- Use WebP format for best compression
- Reduce capture region size if possible

**Problem:** Out of memory errors

**Solution:**

- Capture smaller regions instead of full screen
- Reduce quality settings
- Close other applications to free memory
- Use streaming for very large captures

## Programmatic Usage

### TypeScript/JavaScript

```typescript
import { MCPScreenshotServer } from '@ai-capabilities-suite/mcp-screenshot';

// Create server with custom security policy
const server = new MCPScreenshotServer({
  allowedDirectories: ['/home/user/screenshots'],
  maxCapturesPerMinute: 30,
  enableAuditLog: true,
  blockedWindowPatterns: ['.*Password.*']
});

// Start server
await server.start();

// Server will handle MCP protocol requests via stdio
// Keep process running
process.on('SIGINT', async () => {
  await server.stop();
  process.exit(0);
});
```

### Direct Capture Engine Usage

```typescript
import { createCaptureEngine } from '@ai-capabilities-suite/mcp-screenshot';

// Create platform-specific capture engine
const engine = createCaptureEngine();

// Capture full screen
const fullScreen = await engine.captureScreen();

// List and capture windows
const windows = await engine.getWindows();
const window = windows.find(w => w.title.includes('Chrome'));
if (window) {
  const buffer = await engine.captureWindow(window.id, false);
}

// Capture region
const region = await engine.captureRegion(100, 100, 800, 600);

// List displays
const displays = await engine.getDisplays();
console.log(`Found ${displays.length} displays`);
```

## Development

This package is part of the AI Capabilities Suite monorepo.

### Build

```bash
npm run build
```

### Test

```bash
# Run all tests
npm test

# Run specific test suites
npm test -- capture
npm test -- security
npm test -- property

# Run with coverage
npm test -- --coverage
```

### Project Structure

```
packages/mcp-screenshot/
├── src/
│   ├── capture/       # Platform-specific capture engines
│   ├── processing/    # Image processing and encoding
│   ├── privacy/       # PII detection and masking
│   ├── security/      # Security policy enforcement
│   ├── storage/       # File operations
│   ├── tools/         # MCP tool implementations
│   ├── interfaces/    # TypeScript interfaces
│   ├── types/         # Type definitions
│   ├── errors/        # Error classes
│   ├── server.ts      # MCP server implementation
│   └── cli.ts         # CLI entry point
├── README.md
├── TESTING.md
└── package.json
```

## Contributing

Contributions are welcome! Please ensure:

- All tests pass (`npm test`)
- Code follows TypeScript best practices
- New features include tests and documentation
- Security considerations are addressed

## License

MIT

## Support

For issues and questions:

- GitHub Issues: [Create an issue](https://github.com/your-org/ai-capabilities-suite/issues)
- Documentation: See TESTING.md for testing guide
- Security: Report security issues privately to security@example.com
