screenshot
Capture macOS screen images in full screen, specific regions, or windows. Returns base64-encoded images with dimension metadata for documentation or automation tasks.
Instructions
Capture a screenshot of the macOS screen. Supports full screen, a rectangular region, or a specific window by title. Returns a base64-encoded image with dimension metadata. Do not narrate visual observations or coordinate calculations. Brief task progress updates are acceptable.
Input Schema
TableJSON Schema
| Name | Required | Description | Default |
|---|---|---|---|
| mode | Yes | Capture mode: "full" (entire screen), "region" (rectangular area), or "window" (specific window) | full |
| x | No | Left edge x-coordinate in screen pixels (may be negative for secondary displays; required when mode is region) | |
| y | No | Top edge y-coordinate in screen pixels (may be negative for secondary displays; required when mode is region) | |
| width | No | Region width in screen pixels (required when mode is region) | |
| height | No | Region height in screen pixels (required when mode is region) | |
| window_title | No | Window title to capture (required when mode is window) | |
| max_dimension | Yes | Maximum width or height of the returned image. 0 means no resize (default). When set, must be 256–4096. | |
| format | Yes | Output image format: "png" (default) or "jpeg" | png |
| ruler | Yes | When true, overlay coordinate rulers on the top and left edges of the screenshot. Tick labels show screen coordinates for precise positioning. |
Implementation Reference
- src/tools/screenshot.ts:130-193 (handler)The handleScreenshot function executes the tool logic, including input validation, capturing the screen via `captureScreen`, and formatting the resulting image/metadata for the MCP response.
async function handleScreenshot( args: Record<string, unknown>, ): Promise<CallToolResult> { const parsed = ScreenshotInputSchema.parse(args); try { const result = await captureScreen({ mode: parsed.mode, region: parsed.mode === "region" ? { x: parsed.x!, y: parsed.y!, w: parsed.width!, h: parsed.height! } : undefined, windowTitle: parsed.mode === "window" ? parsed.window_title : undefined, maxDimension: parsed.max_dimension, format: parsed.format, ruler: parsed.ruler, }); const mimeType = parsed.format === "jpeg" ? "image/jpeg" : "image/png"; // Build coordinate mapping hint for agents const isIdentity = result.scaleX === 1 && result.scaleY === 1 && result.originX === 0 && result.originY === 0; const coordinateHint = isIdentity ? "Coordinate mapping: screen = image pixel (1:1, no conversion needed)" : `Coordinate mapping: screen_x = ${result.originX} + image_x * ${result.scaleX}, screen_y = ${result.originY} + image_y * ${result.scaleY}`; return { content: [ { type: "image" as const, data: result.base64, mimeType, }, { type: "text" as const, text: `Image: ${result.width}x${result.height}\n${coordinateHint}`, }, ], }; } catch (error: unknown) { const message = error instanceof Error ? error.message : String(error); // Detect permission-related failures and include setup instructions const isPermissionError = /permission/i.test(message) || /not permitted/i.test(message) || /screen recording/i.test(message) || /cannot be opened/i.test(message); const text = isPermissionError ? `Screenshot failed: ${message}\n\n${PERMISSION_INSTRUCTIONS}` : `Screenshot failed: ${message}`; return { content: [{ type: "text" as const, text }], isError: true, }; } } - src/tools/screenshot.ts:88-110 (schema)The ScreenshotInputSchema uses Zod for runtime input validation and includes cross-field refinements to ensure mandatory parameters for different screenshot modes (e.g., 'region' vs 'window').
/** Full runtime validation schema with cross-field refinements. */ const ScreenshotInputSchema = ScreenshotBaseSchema.refine( (data) => { if (data.mode === "region") { return ( data.x !== undefined && data.y !== undefined && data.width !== undefined && data.height !== undefined ); } return true; }, { message: 'x, y, width, and height are required when mode is "region"' }, ).refine( (data) => { if (data.mode === "window") { return data.window_title !== undefined; } return true; }, { message: 'window_title is required when mode is "window"' }, ); - src/tools/screenshot.ts:114-125 (registration)The screenshot tool definition exported as part of screenshotToolDefinitions, used to register the tool in the MCP server.
export const screenshotToolDefinitions: Tool[] = [ { name: "screenshot", description: "Capture a screenshot of the macOS screen. Supports full screen, a rectangular region, or a specific window by title. Returns a base64-encoded image with dimension metadata. Do not narrate visual observations or coordinate calculations. Brief task progress updates are acceptable.", inputSchema: zodToToolInputSchema(ScreenshotBaseSchema), annotations: { readOnlyHint: true, destructiveHint: false, }, }, ]; - src/tools/screenshot.ts:198-203 (registration)The screenshotToolHandlers object maps the 'screenshot' tool name to its handler function, wrapping it in an enqueue function to manage execution flow.
export const screenshotToolHandlers: Record< string, (args: Record<string, unknown>) => Promise<CallToolResult> > = { screenshot: (args) => enqueue(() => handleScreenshot(args)), };