Skip to main content
Glama

πŸ“Έ MCP ACS Screenshot Server

NPM Version GitHub Release License: MIT Node.js Version Docker Pulls

Give AI agents visual superpowers to see, analyze, and document your applications like senior UX designers.

This enterprise-grade MCP server transforms AI from code-only assistants into visual experts capable of UI analysis, accessibility auditing, documentation generation, and responsive design testing.

πŸ”— Repository

This package is now maintained in its own repository: https://github.com/Digital-Defiance/mcp-screenshot

This repository is part of the AI Capabilitites Suite on GitHub.

πŸ€” Why Do AI Agents Need Visual Capabilities?

AI agents today are powerful but visually blind:

  • ❌ Can read HTML/CSS but can't see actual layouts

  • ❌ Can suggest UI improvements without seeing the real user experience

  • ❌ Can't detect accessibility issues like poor contrast or spacing

  • ❌ Can't create visual documentation or bug reports

  • ❌ Can't analyze responsive design across different screen sizes

Result: You're stuck manually creating screenshots, documentation, and visual analysis that AI should handle.


🎯 Revolutionary Use Cases

πŸ“š "AI, create professional documentation"

You: "Document the checkout process with screenshots" AI: *Captures each step, analyzes UI flow* AI: "Generated complete user guide with 6 annotated screenshots, highlighting key interactions and error states..."

πŸ” "AI, audit this page for accessibility"

You: "Check if this form meets WCAG guidelines" AI: *Captures form, analyzes visual elements* AI: "Found 4 accessibility issues: contrast ratio 2.1:1 (needs 4.5:1), missing focus indicators, touch targets too small..."

πŸ› "AI, create a detailed bug report"

You: "The mobile navigation is broken" AI: *Captures different viewport sizes* AI: "Created bug report with before/after screenshots, exact breakpoint where it fails, and CSS fix suggestions..."

🎨 "AI, compare these design variations"

You: "Which hero section design works better?" AI: *Captures both versions, analyzes visual hierarchy* AI: "Version B has 28% better visual flowβ€”CTA more prominent, text hierarchy clearer, better use of whitespace..."

πŸ“± "AI, test responsive design"

You: "How does this look on different screen sizes?" AI: *Captures multiple viewport sizes* AI: "Layout breaks at 768pxβ€”sidebar overlaps content. Here's the media query fix with visual proof..."

✨ What This Changes

Before: AI worked blind, relying on code descriptions

  • ❌ "The button looks wrong" β†’ AI guesses the issue

  • ❌ "Create documentation" β†’ AI writes generic text

  • ❌ "Check accessibility" β†’ AI only reviews code

  • ❌ "Test responsive design" β†’ AI can't see actual breakpoints

After: AI sees and analyzes your actual user interface

  • βœ… Visual debugging - AI identifies exact pixel-level issues

  • βœ… Smart documentation - AI creates guides with real screenshots and annotations

  • βœ… Accessibility audits - AI measures actual contrast ratios and spacing

  • βœ… Responsive testing - AI captures and compares different screen sizes

  • βœ… Design analysis - AI evaluates visual hierarchy and user experience

  • βœ… Professional reports - AI creates detailed visual evidence for bugs and improvements


πŸš€ Features

  • Multi-format Support: PNG, JPEG, WebP, BMP with configurable quality

  • Flexible Capture: Full screen, specific windows, or custom regions

  • Privacy Protection: PII masking with OCR-based detection for emails, phone numbers, and credit cards

  • Security Controls: Path validation, rate limiting, audit logging, and configurable policies

  • Cross-platform: Linux (X11/Wayland), macOS, Windows with native APIs

  • Multi-monitor Support: Capture from specific displays in multi-monitor setups

  • Enterprise Security: Window exclusion, audit logging, rate limiting

  • AI-Optimized: Structured responses perfect for AI agent workflows

Installation

NPM Installation

npm install @ai-capabilities-suite/mcp-screenshot

System Requirements

Linux:

  • X11: imagemagick package (provides import command)

  • Wayland: grim package

# Ubuntu/Debian sudo apt-get install imagemagick grim # Fedora sudo dnf install ImageMagick grim # Arch sudo pacman -S imagemagick grim

macOS:

  • Built-in screencapture command (no additional dependencies)

  • Screen Recording permission required (System Preferences > Security & Privacy > Privacy > Screen Recording)

Windows:

  • No additional dependencies required

MCP Configuration

Add to your MCP settings file (e.g., ~/.kiro/settings/mcp.json or .kiro/settings/mcp.json):

{ "mcpServers": { "screenshot": { "command": "node", "args": ["/path/to/mcp-screenshot/dist/cli.js"], "env": { "SCREENSHOT_ALLOWED_DIRS": "/home/user/screenshots,/tmp", "SCREENSHOT_MAX_CAPTURES_PER_MIN": "60", "SCREENSHOT_ENABLE_AUDIT_LOG": "true" } } } }

πŸ› οΈ 5 Professional MCP Tools

Purpose-built for AI agents to capture, analyze, and work with visual information:

The server exposes 5 comprehensive MCP tools that enable AI agents to see and understand your applications:

1. screenshot_capture_full

Capture full screen or specific display.

Parameters:

  • display (string, optional): Display ID to capture (defaults to primary display)

  • format (string, optional): Image format - png, jpeg, webp, or bmp (default: png)

  • quality (number, optional): Compression quality 1-100 for lossy formats (default: 90)

  • savePath (string, optional): File path to save screenshot (returns base64 if not provided)

  • enablePIIMasking (boolean, optional): Enable PII detection and masking (default: false)

Example:

{ "name": "screenshot_capture_full", "arguments": { "format": "png", "savePath": "/home/user/screenshots/desktop.png", "enablePIIMasking": true } }

Response:

{ "status": "success", "filePath": "/home/user/screenshots/desktop.png", "metadata": { "width": 1920, "height": 1080, "format": "png", "fileSize": 245678, "timestamp": "2024-12-01T10:30:00.000Z", "display": { "id": "0", "name": "Primary Display", "resolution": { "width": 1920, "height": 1080 }, "position": { "x": 0, "y": 0 }, "isPrimary": true }, "piiMasking": { "emailsRedacted": 2, "phonesRedacted": 1, "creditCardsRedacted": 0, "customPatternsRedacted": 0 } } }

2. screenshot_capture_window

Capture specific application window by ID or title pattern.

Parameters:

  • windowId (string, optional): Window identifier (use windowId or windowTitle)

  • windowTitle (string, optional): Window title pattern to match (use windowId or windowTitle)

  • includeFrame (boolean, optional): Include window frame and title bar (default: false)

  • format (string, optional): Image format (default: png)

  • quality (number, optional): Compression quality 1-100 (default: 90)

  • savePath (string, optional): File path to save screenshot

Example:

{ "name": "screenshot_capture_window", "arguments": { "windowTitle": "Chrome", "includeFrame": false, "format": "jpeg", "quality": 85 } }

Response:

{ "status": "success", "data": "iVBORw0KGgoAAAANSUhEUgAA...", "mimeType": "image/jpeg", "metadata": { "width": 1280, "height": 720, "format": "jpeg", "fileSize": 89234, "timestamp": "2024-12-01T10:31:00.000Z", "window": { "id": "12345", "title": "Google Chrome", "processName": "chrome", "pid": 5678, "bounds": { "x": 100, "y": 100, "width": 1280, "height": 720 } } } }

3. screenshot_capture_region

Capture specific rectangular region of the screen.

Parameters:

  • x (number, required): X coordinate of top-left corner

  • y (number, required): Y coordinate of top-left corner

  • width (number, required): Width of region in pixels

  • height (number, required): Height of region in pixels

  • format (string, optional): Image format (default: png)

  • quality (number, optional): Compression quality 1-100 (default: 90)

  • savePath (string, optional): File path to save screenshot

Example:

{ "name": "screenshot_capture_region", "arguments": { "x": 100, "y": 100, "width": 800, "height": 600, "format": "png" } }

Response:

{ "status": "success", "data": "iVBORw0KGgoAAAANSUhEUgAA...", "mimeType": "image/png", "metadata": { "width": 800, "height": 600, "format": "png", "fileSize": 123456, "timestamp": "2024-12-01T10:32:00.000Z", "region": { "x": 100, "y": 100, "width": 800, "height": 600 } } }

4. screenshot_list_displays

List all connected displays with resolution and position information.

Parameters: None

Example:

{ "name": "screenshot_list_displays", "arguments": {} }

Response:

{ "status": "success", "displays": [ { "id": "0", "name": "Primary Display", "resolution": { "width": 1920, "height": 1080 }, "position": { "x": 0, "y": 0 }, "isPrimary": true }, { "id": "1", "name": "Secondary Display", "resolution": { "width": 1920, "height": 1080 }, "position": { "x": 1920, "y": 0 }, "isPrimary": false } ] }

5. screenshot_list_windows

List all visible windows with title, process, and position information.

Parameters: None

Example:

{ "name": "screenshot_list_windows", "arguments": {} }

Response:

{ "status": "success", "windows": [ { "id": "12345", "title": "Google Chrome", "processName": "chrome", "pid": 5678, "bounds": { "x": 100, "y": 100, "width": 1280, "height": 720 }, "isMinimized": false }, { "id": "67890", "title": "Terminal", "processName": "gnome-terminal", "pid": 9012, "bounds": { "x": 200, "y": 200, "width": 800, "height": 600 }, "isMinimized": false } ] }

Security Configuration

The server enforces security policies to control screenshot operations. Configure via environment variables or security policy file.

Environment Variables

  • SCREENSHOT_ALLOWED_DIRS: Comma-separated list of allowed directories for saving screenshots

  • SCREENSHOT_MAX_CAPTURES_PER_MIN: Maximum captures per minute (default: 60)

  • SCREENSHOT_ENABLE_AUDIT_LOG: Enable audit logging (default: true)

  • SCREENSHOT_BLOCKED_WINDOWS: Comma-separated list of window title patterns to exclude

Security Policy File

Create a security-policy.json file:

{ "allowedDirectories": ["/home/user/screenshots", "/tmp/screenshots"], "blockedWindowPatterns": [ ".*Password.*", ".*1Password.*", ".*LastPass.*", ".*Bitwarden.*", ".*Authentication.*" ], "maxCapturesPerMinute": 60, "enableAuditLog": true }

Load the policy when starting the server:

import { MCPScreenshotServer } from "@ai-capabilities-suite/mcp-screenshot"; import * as fs from "fs"; const policy = JSON.parse(fs.readFileSync("security-policy.json", "utf-8")); const server = new MCPScreenshotServer(policy); await server.start();

Error Handling

All tools return structured error responses with error codes and remediation suggestions.

Error Codes

Code

Description

Remediation

PERMISSION_DENIED

Insufficient permissions to capture

Grant Screen Recording permission (macOS) or check user permissions

INVALID_PATH

File path outside allowed directories

Use a path within configured allowed directories

WINDOW_NOT_FOUND

Specified window does not exist

Use

screenshot_list_windows

to find available windows

DISPLAY_NOT_FOUND

Specified display does not exist

Use

screenshot_list_displays

to find available displays

UNSUPPORTED_FORMAT

Requested format not supported

Use png, jpeg, webp, or bmp

CAPTURE_FAILED

Screenshot capture failed

Check permissions and try again

RATE_LIMIT_EXCEEDED

Too many captures in time window

Wait before making additional requests

INVALID_REGION

Invalid region coordinates or dimensions

Ensure coordinates are non-negative and dimensions are positive

OUT_OF_MEMORY

Insufficient memory for operation

Reduce capture size or close other applications

ENCODING_FAILED

Image encoding failed

Try different format or reduce quality

FILE_SYSTEM_ERROR

File system operation failed

Check permissions and disk space

Error Response Format

{ "status": "error", "error": { "code": "WINDOW_NOT_FOUND", "message": "Window with ID '12345' not found", "details": { "windowId": "12345" }, "remediation": "Verify the window exists and is visible. Use screenshot_list_windows to see available windows." } }

Troubleshooting

Linux Issues

Problem: import: command not found or grim: command not found

Solution: Install required packages:

# X11 sudo apt-get install imagemagick # Wayland sudo apt-get install grim

Problem: Black screen or empty captures

Solution: Check display server environment variables:

echo $DISPLAY # Should show :0 or similar for X11 echo $WAYLAND_DISPLAY # Should show wayland-0 or similar for Wayland

macOS Issues

Problem: PERMISSION_DENIED error

Solution: Grant Screen Recording permission:

  1. Open System Preferences > Security & Privacy > Privacy

  2. Select "Screen Recording" from the list

  3. Add your terminal application or Node.js to the allowed list

  4. Restart the application

Problem: Retina display captures are double resolution

Solution: This is expected behavior. Retina displays have 2x pixel density. Use the width and height from metadata to determine actual dimensions.

Windows Issues

Problem: Capture fails with access denied

Solution: Run the application with administrator privileges or check Windows Defender settings.

Problem: Multi-monitor captures show wrong display

Solution: Use screenshot_list_displays to get correct display IDs and positions.

General Issues

Problem: RATE_LIMIT_EXCEEDED error

Solution: The server limits captures to prevent abuse. Wait 60 seconds or adjust maxCapturesPerMinute in security policy.

Problem: INVALID_PATH error when saving

Solution: Ensure the save path is within allowed directories configured in security policy.

Problem: PII masking not working

Solution:

  • Ensure tesseract.js is properly installed

  • Check that eng.traineddata language file is available

  • PII masking requires OCR which may be slow on large images

Problem: Large file sizes

Solution:

  • Use JPEG format with lower quality (60-80) for smaller files

  • Use WebP format for best compression

  • Reduce capture region size if possible

Problem: Out of memory errors

Solution:

  • Capture smaller regions instead of full screen

  • Reduce quality settings

  • Close other applications to free memory

  • Use streaming for very large captures

Programmatic Usage

TypeScript/JavaScript

import { MCPScreenshotServer } from "@ai-capabilities-suite/mcp-screenshot"; // Create server with custom security policy const server = new MCPScreenshotServer({ allowedDirectories: ["/home/user/screenshots"], maxCapturesPerMinute: 30, enableAuditLog: true, blockedWindowPatterns: [".*Password.*"], }); // Start server await server.start(); // Server will handle MCP protocol requests via stdio // Keep process running process.on("SIGINT", async () => { await server.stop(); process.exit(0); });

Direct Capture Engine Usage

import { createCaptureEngine } from "@ai-capabilities-suite/mcp-screenshot"; // Create platform-specific capture engine const engine = createCaptureEngine(); // Capture full screen const fullScreen = await engine.captureScreen(); // List and capture windows const windows = await engine.getWindows(); const window = windows.find((w) => w.title.includes("Chrome")); if (window) { const buffer = await engine.captureWindow(window.id, false); } // Capture region const region = await engine.captureRegion(100, 100, 800, 600); // List displays const displays = await engine.getDisplays(); console.log(`Found ${displays.length} displays`);

Development

This package is part of the AI Capabilities Suite monorepo.

Build

npm run build

Test

# Run all tests npm test # Run specific test suites npm test -- capture npm test -- security npm test -- property # Run with coverage npm test -- --coverage

Project Structure

packages/mcp-screenshot/ β”œβ”€β”€ src/ β”‚ β”œβ”€β”€ capture/ # Platform-specific capture engines β”‚ β”œβ”€β”€ processing/ # Image processing and encoding β”‚ β”œβ”€β”€ privacy/ # PII detection and masking β”‚ β”œβ”€β”€ security/ # Security policy enforcement β”‚ β”œβ”€β”€ storage/ # File operations β”‚ β”œβ”€β”€ tools/ # MCP tool implementations β”‚ β”œβ”€β”€ interfaces/ # TypeScript interfaces β”‚ β”œβ”€β”€ types/ # Type definitions β”‚ β”œβ”€β”€ errors/ # Error classes β”‚ β”œβ”€β”€ server.ts # MCP server implementation β”‚ └── cli.ts # CLI entry point β”œβ”€β”€ README.md β”œβ”€β”€ TESTING.md └── package.json

Contributing

Contributions are welcome! Please ensure:

  • All tests pass (npm test)

  • Code follows TypeScript best practices

  • New features include tests and documentation

  • Security considerations are addressed

License

MIT

Support

For issues and questions:

-
security - not tested
F
license - not found
-
quality - not tested

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/Digital-Defiance/mcp-screenshot'

If you have feedback or need assistance with the MCP directory API, please join our Discord server