Skip to main content
Glama
upnorthmedia

Screenshot MCP

by upnorthmedia

capture_screenshot

Capture full-page website screenshots with device presets, viewport customization, and wait conditions for reliable webpage documentation.

Instructions

Capture a full-page screenshot of a webpage with advanced options

Input Schema

TableJSON Schema
NameRequiredDescriptionDefault
urlYesThe URL of the webpage to screenshot
viewportNoViewport configuration
waitForNoWait condition before taking screenshot
standardDelayNoWhether to apply standard 2.5s delay after networkidle2 for better stability
delayNoAdditional delay in milliseconds before taking screenshot
waitUntilNoWhen to consider navigation completenetworkidle2

Implementation Reference

  • Core handler function that executes the screenshot capture using Puppeteer for browser automation, Sharp for image processing and resizing if necessary, handles viewport emulation, various wait conditions, and returns base64 PNG data with metadata.
    async captureScreenshot(url, options = {}) {
      if (this.activeScreenshots >= this.options.maxConcurrent) {
        throw createError('Too many concurrent screenshot requests', 'RATE_LIMIT_EXCEEDED');
      }
    
      this.activeScreenshots++;
    
      try {
        const validatedUrl = validateUrl(url);
        await this.initialize();
    
        const page = await this.browser.newPage();
        
        try {
          // Configure viewport
          await this.configureViewport(page, options.viewport);
          
          // Set timeouts
          page.setDefaultTimeout(this.options.timeout);
          page.setDefaultNavigationTimeout(this.options.timeout);
    
          // Navigate to URL
          await page.goto(validatedUrl, {
            waitUntil: options.waitUntil || 'networkidle2',
            timeout: this.options.timeout
          });
    
          // Wait for specific conditions if provided
          if (options.waitFor) {
            await this.waitForCondition(page, options.waitFor);
          }
    
          // Standard delay after networkidle2 for better stability
          const standardDelay = options.standardDelay !== false ? 2500 : 0;
          if (standardDelay > 0) {
            await new Promise(resolve => setTimeout(resolve, standardDelay));
          }
    
          // Additional delay if specified
          if (options.delay) {
            await new Promise(resolve => setTimeout(resolve, options.delay));
          }
    
          // Capture screenshot with dimension validation
          const screenshotOptions = {
            type: 'png',
            fullPage: true,
            encoding: 'base64',
            ...options.screenshotOptions
          };
    
          // * Take the screenshot
          const screenshotBase64 = await page.screenshot(screenshotOptions);
    
          // * Decode base64 to buffer for sharp processing
          let screenshotBuffer = Buffer.from(screenshotBase64, 'base64');
          let metadata;
          try {
            metadata = await sharp(screenshotBuffer).metadata();
          } catch (err) {
            throw createError('Failed to read screenshot metadata', 'IMAGE_METADATA_ERROR', { originalError: err.message });
          }
    
          // * Check if resizing is needed
          const maxDimension = 8000;
          if (metadata.width > maxDimension || metadata.height > maxDimension) {
            // * Calculate scale factor to fit within 8000x8000
            const scale = Math.min(maxDimension / metadata.width, maxDimension / metadata.height);
            const newWidth = Math.floor(metadata.width * scale);
            const newHeight = Math.floor(metadata.height * scale);
            try {
              screenshotBuffer = await sharp(screenshotBuffer)
                .resize({ width: newWidth, height: newHeight })
                .png()
                .toBuffer();
            } catch (err) {
              throw createError('Failed to resize screenshot image', 'IMAGE_RESIZE_ERROR', { originalError: err.message });
            }
          }
    
          // * Encode back to base64
          const finalBase64 = screenshotBuffer.toString('base64');
    
          return {
            success: true,
            data: finalBase64,
            metadata: {
              url: validatedUrl,
              timestamp: new Date().toISOString(),
              viewport: await page.viewport(),
              title: await page.title(),
              imageWidth: metadata.width,
              imageHeight: metadata.height
            }
          };
    
        } finally {
          await page.close();
        }
    
      } catch (error) {
        throw createError(
          `Screenshot capture failed: ${error.message}`,
          'CAPTURE_FAILED',
          { url, originalError: error.message }
        );
      } finally {
        this.activeScreenshots--;
      }
    }
  • Input schema for the capture_screenshot tool, defining parameters like url (required), viewport options with device presets, wait conditions, delays, and validation rules.
    inputSchema: {
      type: 'object',
      properties: {
        url: {
          type: 'string',
          description: 'The URL of the webpage to screenshot'
        },
        viewport: {
          type: 'object',
          properties: {
            preset: {
              type: 'string',
              enum: Object.keys(DEVICE_PRESETS),
              description: 'Device preset (mobile, tablet, desktop)'
            },
            width: {
              type: 'number',
              minimum: 100,
              maximum: 5000,
              description: 'Viewport width in pixels'
            },
            height: {
              type: 'number',
              minimum: 100,
              maximum: 5000,
              description: 'Viewport height in pixels'
            },
            deviceScaleFactor: {
              type: 'number',
              minimum: 0.1,
              maximum: 3,
              description: 'Device scale factor'
            },
            isMobile: {
              type: 'boolean',
              description: 'Whether to emulate mobile device'
            },
            hasTouch: {
              type: 'boolean',
              description: 'Whether device has touch support'
            }
          },
          description: 'Viewport configuration'
        },
        waitFor: {
          type: 'object',
          properties: {
            type: {
              type: 'string',
              enum: ['selector', 'function', 'timeout', 'networkidle'],
              description: 'Type of wait condition'
            },
            value: {
              type: 'string',
              description: 'Value for wait condition (selector, function, timeout in ms, or idle time for networkidle)'
            },
            timeout: {
              type: 'number',
              default: 10000,
              description: 'Timeout for wait condition in milliseconds'
            },
            idleTime: {
              type: 'number',
              default: 2000,
              description: 'Network idle time in milliseconds (for networkidle type)'
            }
          },
          description: 'Wait condition before taking screenshot'
        },
        standardDelay: {
          type: 'boolean',
          default: true,
          description: 'Whether to apply standard 2.5s delay after networkidle2 for better stability'
        },
        delay: {
          type: 'number',
          description: 'Additional delay in milliseconds before taking screenshot'
        },
        waitUntil: {
          type: 'string',
          enum: ['load', 'domcontentloaded', 'networkidle0', 'networkidle2'],
          default: 'networkidle2',
          description: 'When to consider navigation complete'
        }
      },
      required: ['url']
  • index.js:30-120 (registration)
    Tool registration in the MCP tools list, defining name, description, and references the input schema. Used by the server's listTools request handler.
    {
      name: 'capture_screenshot',
      description: 'Capture a full-page screenshot of a webpage with advanced options',
      inputSchema: {
        type: 'object',
        properties: {
          url: {
            type: 'string',
            description: 'The URL of the webpage to screenshot'
          },
          viewport: {
            type: 'object',
            properties: {
              preset: {
                type: 'string',
                enum: Object.keys(DEVICE_PRESETS),
                description: 'Device preset (mobile, tablet, desktop)'
              },
              width: {
                type: 'number',
                minimum: 100,
                maximum: 5000,
                description: 'Viewport width in pixels'
              },
              height: {
                type: 'number',
                minimum: 100,
                maximum: 5000,
                description: 'Viewport height in pixels'
              },
              deviceScaleFactor: {
                type: 'number',
                minimum: 0.1,
                maximum: 3,
                description: 'Device scale factor'
              },
              isMobile: {
                type: 'boolean',
                description: 'Whether to emulate mobile device'
              },
              hasTouch: {
                type: 'boolean',
                description: 'Whether device has touch support'
              }
            },
            description: 'Viewport configuration'
          },
          waitFor: {
            type: 'object',
            properties: {
              type: {
                type: 'string',
                enum: ['selector', 'function', 'timeout', 'networkidle'],
                description: 'Type of wait condition'
              },
              value: {
                type: 'string',
                description: 'Value for wait condition (selector, function, timeout in ms, or idle time for networkidle)'
              },
              timeout: {
                type: 'number',
                default: 10000,
                description: 'Timeout for wait condition in milliseconds'
              },
              idleTime: {
                type: 'number',
                default: 2000,
                description: 'Network idle time in milliseconds (for networkidle type)'
              }
            },
            description: 'Wait condition before taking screenshot'
          },
          standardDelay: {
            type: 'boolean',
            default: true,
            description: 'Whether to apply standard 2.5s delay after networkidle2 for better stability'
          },
          delay: {
            type: 'number',
            description: 'Additional delay in milliseconds before taking screenshot'
          },
          waitUntil: {
            type: 'string',
            enum: ['load', 'domcontentloaded', 'networkidle0', 'networkidle2'],
            default: 'networkidle2',
            description: 'When to consider navigation complete'
          }
        },
        required: ['url']
      }
    },
  • Dispatch handler in MCP callTool request handler that invokes the captureScreenshot method with parsed arguments and formats the MCP response with text, image, and metadata content blocks.
    case 'capture_screenshot':
      const result = await screenshotCapture.captureScreenshot(args.url, {
        viewport: args.viewport,
        waitFor: args.waitFor,
        delay: args.delay,
        waitUntil: args.waitUntil,
        standardDelay: args.standardDelay
      });
    
      return {
        content: [
          {
            type: 'text',
            text: `Screenshot captured successfully from ${args.url}`
          },
          {
            type: 'image',
            data: result.data,
            mimeType: 'image/png'
          },
          {
            type: 'text',
            text: `Metadata: ${JSON.stringify(result.metadata, null, 2)}`
          }
        ]
      };
  • Initializes the ScreenshotCapture instance used by the capture_screenshot tool handler, configured via environment variables for headless mode, timeouts, and concurrency limits.
    const screenshotCapture = new ScreenshotCapture({
      headless: process.env.BROWSER_HEADLESS !== 'false',
      timeout: parseInt(process.env.BROWSER_TIMEOUT) || 30000,
      maxConcurrent: parseInt(process.env.MAX_CONCURRENT_SCREENSHOTS) || 5
    });
Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries full burden but offers minimal behavioral insight. It mentions 'advanced options' but doesn't explain what these entail (e.g., viewport configuration, wait conditions, delays), nor does it cover performance implications, error handling, or output format. This leaves significant gaps for an agent to understand tool behavior.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that directly states the tool's purpose without unnecessary words. It's appropriately sized and front-loaded with the core functionality.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a complex tool with 6 parameters, nested objects, no annotations, and no output schema, the description is inadequate. It doesn't explain the 'advanced options', behavioral traits like performance or errors, or what the tool returns (e.g., image format, size). The agent lacks sufficient context to use this tool effectively.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, providing detailed documentation for all parameters. The description adds no parameter-specific information beyond the generic 'advanced options' reference, which doesn't clarify individual parameters. Baseline 3 is appropriate since the schema does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('capture a full-page screenshot') and resource ('webpage'), with the qualifier 'with advanced options' hinting at additional capabilities. However, it doesn't explicitly differentiate from sibling tools like 'capture_element' which likely captures specific elements rather than full pages.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided about when to use this tool versus alternatives like 'capture_element' or 'list_device_presets'. The description mentions 'advanced options' but doesn't specify scenarios where these are beneficial or when simpler alternatives might suffice.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

Install Server

Other Tools

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/upnorthmedia/ScreenshotMCP'

If you have feedback or need assistance with the MCP directory API, please join our Discord server