Glama

screenshot

Capture macOS screen images in full screen, specific regions, or windows. Returns base64-encoded images with dimension metadata for documentation or automation tasks.

Instructions

Capture a screenshot of the macOS screen. Supports full screen, a rectangular region, or a specific window by title. Returns a base64-encoded image with dimension metadata. Do not narrate visual observations or coordinate calculations. Brief task progress updates are acceptable.

Input Schema

| Name | Required | Description | Default |
|---|---|---|---|
| mode | Yes | Capture mode: "full" (entire screen), "region" (rectangular area), or "window" (specific window) | full |
| x | No | Left edge x-coordinate in screen pixels (may be negative for secondary displays; required when mode is region) | |
| y | No | Top edge y-coordinate in screen pixels (may be negative for secondary displays; required when mode is region) | |
| width | No | Region width in screen pixels (required when mode is region) | |
| height | No | Region height in screen pixels (required when mode is region) | |
| window_title | No | Window title to capture (required when mode is window) | |
| max_dimension | Yes | Maximum width or height of the returned image. 0 means no resize (default). When set, must be 256–4096. | |
| format | Yes | Output image format: "png" (default) or "jpeg" | png |
| ruler | Yes | When true, overlay coordinate rulers on the top and left edges of the screenshot. Tick labels show screen coordinates for precise positioning. | |
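
As a rough illustration of the schema above, the sketch below types the arguments and builds one argument object per mode. The `ScreenshotArgs` type and the `missingFields` helper are hypothetical names introduced here and are not part of the server; they only mirror the "required when mode is ..." notes in the table, and the sample values are made up.

```typescript
// Hypothetical type mirroring the input schema table; not exported by the server.
type ScreenshotArgs = {
  mode: "full" | "region" | "window";
  x?: number;
  y?: number;
  width?: number;
  height?: number;
  window_title?: string;
  max_dimension: number; // 0 = no resize; otherwise 256–4096
  format: "png" | "jpeg";
  ruler: boolean;
};

// Hypothetical helper: lists the fields the table marks as required for the chosen mode
// that are absent from the given arguments.
function missingFields(args: ScreenshotArgs): string[] {
  const required: (keyof ScreenshotArgs)[] =
    args.mode === "region"
      ? ["x", "y", "width", "height"]
      : args.mode === "window"
        ? ["window_title"]
        : [];
  return required.filter((key) => args[key] === undefined);
}

// Full screen, resized so the longest side is at most 1024 pixels.
const fullScreen: ScreenshotArgs = {
  mode: "full",
  max_dimension: 1024,
  format: "jpeg",
  ruler: false,
};

// A region on a secondary display to the left of the main one (negative x).
const region: ScreenshotArgs = {
  mode: "region",
  x: -1920,
  y: 0,
  width: 800,
  height: 600,
  max_dimension: 0,
  format: "png",
  ruler: true,
};

// Window mode without window_title: the helper reports the missing field.
const incompleteWindow: ScreenshotArgs = {
  mode: "window",
  max_dimension: 0,
  format: "png",
  ruler: false,
};

console.log(missingFields(fullScreen));
console.log(missingFields(region));
console.log(missingFields(incompleteWindow));
```

Checking arguments on the client side like this is optional; the server runs its own validation, as shown in the implementation reference.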

Implementation Reference

  • The handleScreenshot function executes the tool logic, including input validation, capturing the screen via `captureScreen`, and formatting the resulting image/metadata for the MCP response.
    async function handleScreenshot(
      args: Record<string, unknown>,
    ): Promise<CallToolResult> {
      const parsed = ScreenshotInputSchema.parse(args);
    
      try {
        const result = await captureScreen({
          mode: parsed.mode,
          region:
            parsed.mode === "region"
              ? { x: parsed.x!, y: parsed.y!, w: parsed.width!, h: parsed.height! }
              : undefined,
          windowTitle: parsed.mode === "window" ? parsed.window_title : undefined,
          maxDimension: parsed.max_dimension,
          format: parsed.format,
          ruler: parsed.ruler,
        });
    
        const mimeType = parsed.format === "jpeg" ? "image/jpeg" : "image/png";
    
        // Build coordinate mapping hint for agents
        const isIdentity =
          result.scaleX === 1 &&
          result.scaleY === 1 &&
          result.originX === 0 &&
          result.originY === 0;
    
        const coordinateHint = isIdentity
          ? "Coordinate mapping: screen = image pixel (1:1, no conversion needed)"
          : `Coordinate mapping: screen_x = ${result.originX} + image_x * ${result.scaleX}, screen_y = ${result.originY} + image_y * ${result.scaleY}`;
    
        return {
          content: [
            {
              type: "image" as const,
              data: result.base64,
              mimeType,
            },
            {
              type: "text" as const,
              text: `Image: ${result.width}x${result.height}\n${coordinateHint}`,
            },
          ],
        };
      } catch (error: unknown) {
        const message = error instanceof Error ? error.message : String(error);
    
        // Detect permission-related failures and include setup instructions
        const isPermissionError =
          /permission/i.test(message) ||
          /not permitted/i.test(message) ||
          /screen recording/i.test(message) ||
          /cannot be opened/i.test(message);
    
        const text = isPermissionError
          ? `Screenshot failed: ${message}\n\n${PERMISSION_INSTRUCTIONS}`
          : `Screenshot failed: ${message}`;
    
        return {
          content: [{ type: "text" as const, text }],
          isError: true,
        };
      }
    }
  • The ScreenshotInputSchema uses Zod for runtime input validation and adds cross-field refinements so that the parameters each mode requires are present: x, y, width, and height for 'region', and window_title for 'window'.
    /** Full runtime validation schema with cross-field refinements. */
    const ScreenshotInputSchema = ScreenshotBaseSchema.refine(
      (data) => {
        if (data.mode === "region") {
          return (
            data.x !== undefined &&
            data.y !== undefined &&
            data.width !== undefined &&
            data.height !== undefined
          );
        }
        return true;
      },
      { message: 'x, y, width, and height are required when mode is "region"' },
    ).refine(
      (data) => {
        if (data.mode === "window") {
          return data.window_title !== undefined;
        }
        return true;
      },
      { message: 'window_title is required when mode is "window"' },
    );
  • The screenshot tool definition is exported as part of screenshotToolDefinitions and is used to register the tool with the MCP server.
    export const screenshotToolDefinitions: Tool[] = [
      {
        name: "screenshot",
        description:
          "Capture a screenshot of the macOS screen. Supports full screen, a rectangular region, or a specific window by title. Returns a base64-encoded image with dimension metadata. Do not narrate visual observations or coordinate calculations. Brief task progress updates are acceptable.",
        inputSchema: zodToToolInputSchema(ScreenshotBaseSchema),
        annotations: {
          readOnlyHint: true,
          destructiveHint: false,
        },
      },
    ];
  • The screenshotToolHandlers object maps the 'screenshot' tool name to its handler function, wrapping it in an enqueue function to manage execution flow.
    export const screenshotToolHandlers: Record<
      string,
      (args: Record<string, unknown>) => Promise<CallToolResult>
    > = {
      screenshot: (args) => enqueue(() => handleScreenshot(args)),
    };
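
A client can apply the coordinate hint returned by handleScreenshot roughly as follows. This is a minimal sketch: the `CaptureMapping` type and the `imageToScreen` function are hypothetical names, and only the formula itself (screen = origin + image * scale) comes from the handler above. The sample numbers are invented for illustration.

```typescript
// Hypothetical shape holding the four mapping values named in the handler's hint.
interface CaptureMapping {
  originX: number;
  originY: number;
  scaleX: number;
  scaleY: number;
}

// Applies the formula from the hint: screen = origin + image * scale.
function imageToScreen(
  mapping: CaptureMapping,
  imageX: number,
  imageY: number,
): { x: number; y: number } {
  return {
    x: mapping.originX + imageX * mapping.scaleX,
    y: mapping.originY + imageY * mapping.scaleY,
  };
}

// Illustrative values: a capture whose top-left corner is at screen (100, 50)
// and where each image pixel covers two screen pixels.
const mapping: CaptureMapping = { originX: 100, originY: 50, scaleX: 2, scaleY: 2 };

console.log(imageToScreen(mapping, 10, 20)); // { x: 120, y: 90 }
```

When all four values match the identity case checked in the handler (scale 1, origin 0), the function returns the image coordinates unchanged, which corresponds to the "1:1, no conversion needed" hint.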

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/antbotlab/mac-use-mcp'
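
The same request can be sketched in TypeScript. The helper names `serverUrl` and `fetchServer` are hypothetical, and reading the last two path segments as an owner and a repository name is an assumption based on the URL in the curl command. The sketch also assumes the endpoint returns JSON and that a global `fetch` is available (for example Node 18 or later); the response shape is not described on this page, so it is left as `unknown`.

```typescript
// Base path taken from the curl command above.
const API_BASE = "https://glama.ai/api/mcp/v1/servers";

// Hypothetical helper: builds the server URL from an owner and a repository name.
function serverUrl(owner: string, repo: string): string {
  return `${API_BASE}/${encodeURIComponent(owner)}/${encodeURIComponent(repo)}`;
}

// Hypothetical helper: requests the server record and parses the body as JSON (assumed format).
async function fetchServer(owner: string, repo: string): Promise<unknown> {
  const response = await fetch(serverUrl(owner, repo));
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return response.json();
}

console.log(serverUrl("antbotlab", "mac-use-mcp"));
```

Calling `fetchServer("antbotlab", "mac-use-mcp")` would issue the same GET request as the curl command.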

If you have feedback or need assistance with the MCP directory API, please join our Discord server.