mcp-screenshot

capture

Capture a screenshot of the left half, the right half, or the full screen, run OCR on it, and return the recognized text in JSON, markdown, vertical, or horizontal format. The PNG is saved to a dated directory in Downloads.

Instructions

Captures a screenshot of the specified region and performs OCR. Options:

- region: 'left'/'right'/'full' (default: 'left')
- format: 'json'/'markdown'/'vertical'/'horizontal' (default: 'markdown')

The screenshot is saved to a dated directory in Downloads.

Input Schema

Name	Required	Description	Default
`format`	No		markdown
`region`	No		left
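
For illustration, an MCP client invokes the tool with a `tools/call` request. The sketch below assumes a JSON-RPC 2.0 transport; `buildCaptureRequest` and its types are hypothetical and not part of mcp-screenshot. Fields left out of `arguments` fall back to the schema defaults on the server.

```typescript
// Hypothetical helper (not part of mcp-screenshot): builds the JSON-RPC 2.0
// message an MCP client sends to call the "capture" tool.
type Region = "left" | "right" | "full";
type Format = "json" | "markdown" | "vertical" | "horizontal";

interface CaptureArgs {
	region?: Region;
	format?: Format;
}

interface ToolsCallRequest {
	jsonrpc: "2.0";
	id: number;
	method: "tools/call";
	params: { name: "capture"; arguments: CaptureArgs };
}

function buildCaptureRequest(id: number, args: CaptureArgs = {}): ToolsCallRequest {
	// Any field omitted here is defaulted by the server (region "left", format "markdown").
	return {
		jsonrpc: "2.0",
		id,
		method: "tools/call",
		params: { name: "capture", arguments: args },
	};
}

// Capture the right half of the screen and request JSON-formatted OCR text.
const request = buildCaptureRequest(1, { region: "right", format: "json" });
console.log(JSON.stringify(request));
```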

Implementation Reference

index.ts:242-284 (handler)

Main CallToolRequestSchema handler implementing the 'capture' tool: it rejects unknown tool names, validates the arguments, captures the screenshot with takeScreenshot, runs OCR with performOCR, and returns either the result text or an error result with isError set.

server.setRequestHandler(CallToolRequestSchema, async (request) => {
	try {
		const { name, arguments: args } = request.params;

		if (name !== "capture") {
			throw new Error(`Unknown tool: ${name}`);
		}

		const parsed = ScreenshotArgsSchema.safeParse(args);
		if (!parsed.success) {
			throw new Error(`Invalid arguments: ${parsed.error}`);
		}

		console.error(
			`Debug: Starting screenshot capture for region: ${parsed.data.region}, format: ${parsed.data.format}`,
		);
		const imagePath = await takeScreenshot(parsed.data.region);
		console.error(`Debug: Screenshot saved to: ${imagePath}`);

		const ocrText = await performOCR(imagePath, parsed.data.format);
		console.error("Debug: OCR completed");

		return {
			content: [
				{
					type: "text",
					text: `Screenshot saved to: ${imagePath}\n\nOCR Results:\n${ocrText}`,
				},
			],
		};
	} catch (error) {
		console.error("Error:", error);
		return {
			content: [
				{
					type: "text",
					text: `Error: ${error instanceof Error ? error.message : String(error)}`,
				},
			],
			isError: true,
		};
	}
});
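
The handler returns a single text item rather than structured data, so a client that needs the image path on its own has to parse the text. A minimal sketch, assuming the exact success template ("Screenshot saved to: ...\n\nOCR Results:\n...") and the "Error: ..." template used above; `parseCaptureResult` and its types are hypothetical, and the path in the usage example is made up.

```typescript
// Hypothetical client-side helper (not part of mcp-screenshot): splits the text
// returned by the "capture" handler into the saved image path and the OCR text.
interface TextContent {
	type: "text";
	text: string;
}

interface CallToolResult {
	content: TextContent[];
	isError?: boolean;
}

type CaptureOutcome =
	| { ok: true; imagePath: string; ocrText: string }
	| { ok: false; message: string };

function parseCaptureResult(result: CallToolResult): CaptureOutcome {
	const text = result.content[0]?.text ?? "";
	if (result.isError) {
		// Error results are formatted as "Error: <message>"
		return { ok: false, message: text.replace(/^Error: /, "") };
	}
	// Path is on the first line; everything after "OCR Results:\n" is OCR text
	const match = /^Screenshot saved to: (.*)\n\nOCR Results:\n([\s\S]*)$/.exec(text);
	if (!match) {
		return { ok: false, message: `Unexpected result text: ${text}` };
	}
	return { ok: true, imagePath: match[1], ocrText: match[2] };
}

const outcome = parseCaptureResult({
	content: [
		{
			type: "text",
			text: "Screenshot saved to: /tmp/example.png\n\nOCR Results:\nHello\nWorld",
		},
	],
});
console.log(outcome);
```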

index.ts:227-240 (registration)

Registration of the 'capture' tool in the ListToolsRequestSchema handler, including name, description, and input schema.

server.setRequestHandler(ListToolsRequestSchema, async () => ({
	tools: [
		{
			name: "capture",
			description:
				"Captures a screenshot of the specified region and performs OCR. " +
				"Options:\n" +
				"- region: 'left'/'right'/'full' (default: 'left')\n" +
				"- format: 'json'/'markdown'/'vertical'/'horizontal' (default: 'markdown')\n" +
				"The screenshot is saved to a dated directory in Downloads.",
			inputSchema: zodToJsonSchema(ScreenshotArgsSchema) as ToolInput,
		},
	],
}));

index.ts:26-31 (schema)

Zod schema defining input parameters for the 'capture' tool: region (left/right/full) and format (json/markdown/vertical/horizontal).

const ScreenshotArgsSchema = z.object({
	region: z.enum(["left", "right", "full"]).default("left"),
	format: z
		.enum(["json", "markdown", "vertical", "horizontal"])
		.default("markdown"),
});
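
To make the schema's behaviour concrete, here is a dependency-free sketch that mirrors what `ScreenshotArgsSchema.safeParse` does: missing fields get their defaults, values outside the enums are rejected, and unknown keys are dropped. `parseScreenshotArgs` and its error messages are hypothetical (Zod's own messages differ). Like `z.object()`, it rejects `undefined`, so a client that omits `arguments` entirely may get "Invalid arguments" where sending `{}` would succeed.

```typescript
// Hypothetical mirror of ScreenshotArgsSchema, without the Zod dependency.
const REGIONS = ["left", "right", "full"] as const;
const FORMATS = ["json", "markdown", "vertical", "horizontal"] as const;

type Region = (typeof REGIONS)[number];
type Format = (typeof FORMATS)[number];

type ParseResult =
	| { success: true; data: { region: Region; format: Format } }
	| { success: false; error: string };

// True when value is one of the allowed string literals.
function isOneOf<T extends string>(allowed: readonly T[], value: unknown): value is T {
	return typeof value === "string" && (allowed as readonly string[]).includes(value);
}

function parseScreenshotArgs(args: unknown): ParseResult {
	// Like z.object(), reject anything that is not a plain object (including undefined).
	if (typeof args !== "object" || args === null || Array.isArray(args)) {
		return { success: false, error: "Expected object" };
	}
	// Defaults apply only when a field is undefined, as with .default() in Zod.
	const { region = "left", format = "markdown" } = args as Record<string, unknown>;
	if (!isOneOf(REGIONS, region)) {
		return { success: false, error: `Invalid region: ${String(region)}` };
	}
	if (!isOneOf(FORMATS, format)) {
		return { success: false, error: `Invalid format: ${String(format)}` };
	}
	// Unknown keys are dropped, matching z.object()'s default "strip" behaviour.
	return { success: true, data: { region, format } };
}

console.log(parseScreenshotArgs({}));
console.log(parseScreenshotArgs({ region: "top" }));
```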

index.ts:87-146 (helper)

Helper that captures the full screen with the macOS `screencapture` command, crops it to the left or right half when requested, and saves the PNG to the dated Downloads folder.

async function takeScreenshot(
	region: z.infer<typeof ScreenshotArgsSchema>["region"],
): Promise<string> {
	const dateDir = await ensureDateDirectory();
	const timestamp = new Date().toISOString().replace(/[:.]/g, "-");
	const filename = `screenshot-${region}-${timestamp}.png`;
	const filepath = join(dateDir, filename);

	try {
		// Get main display dimensions (only logged; cropping uses the captured image's metadata)
		const { width, height } = await getDisplayDimensions();
		console.error(
			`Debug: Display dimensions - width: ${width}, height: ${height}`,
		);

		// Always capture full screen
		await execFileAsync("screencapture", [filepath]);

		// Process image if needed
		if (region !== "full") {
			const tempFilePath = `${filepath}.temp.png`;
			await sharp(filepath).toFile(tempFilePath);

			const metadata = await sharp(tempFilePath).metadata();
			if (!metadata.width || !metadata.height) {
				throw new Error("Failed to get image dimensions");
			}

			const halfWidth = Math.floor(metadata.width / 2);

			// Extract left or right half
			if (region === "left") {
				await sharp(tempFilePath)
					.extract({
						left: 0,
						top: 0,
						width: halfWidth,
						height: metadata.height,
					})
					.toFile(filepath);
			} else if (region === "right") {
				await sharp(tempFilePath)
					.extract({
						left: halfWidth,
						top: 0,
						width: metadata.width - halfWidth, // remaining columns, so an odd width loses nothing
						height: metadata.height,
					})
					.toFile(filepath);
			}

			// Remove temporary file
			await execFileAsync("rm", [tempFilePath]);
		}

		return filepath;
	} catch (error) {
		throw new Error(`Screenshot capture failed: ${error}`);
	}
}
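
The cropping and naming logic in `takeScreenshot` can be isolated into pure functions, which makes edge cases easy to check without touching the screen. In this sketch, `cropRect` and `screenshotFilename` are hypothetical names; the right half takes `width - halfWidth` columns so an odd-width capture loses no column, and the filename follows the same `screenshot-<region>-<timestamp>.png` pattern as the helper. The dated directory from `ensureDateDirectory` is not shown in the source, so it is left out here.

```typescript
// Hypothetical helpers (not from index.ts) isolating the pure logic of takeScreenshot.
type Region = "left" | "right" | "full";

interface CropRect {
	left: number;
	top: number;
	width: number;
	height: number;
}

// Rectangle suitable for sharp().extract(); the right half takes whatever
// columns the left half does not, so odd widths are fully covered.
function cropRect(region: Region, width: number, height: number): CropRect {
	const halfWidth = Math.floor(width / 2);
	if (region === "left") {
		return { left: 0, top: 0, width: halfWidth, height };
	}
	if (region === "right") {
		return { left: halfWidth, top: 0, width: width - halfWidth, height };
	}
	return { left: 0, top: 0, width, height };
}

// Same naming scheme as takeScreenshot: ISO timestamp with ":" and "." replaced by "-".
function screenshotFilename(region: Region, now: Date = new Date()): string {
	const timestamp = now.toISOString().replace(/[:.]/g, "-");
	return `screenshot-${region}-${timestamp}.png`;
}

console.log(cropRect("right", 1513, 982));
console.log(screenshotFilename("left", new Date(Date.UTC(2024, 0, 2, 3, 4, 5, 678))));
```

For a 1513-pixel-wide capture, `cropRect("right", 1513, 982)` starts at column 756 and is 757 pixels wide, so the two halves together cover all 1513 columns.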

index.ts:147-212 (helper)

Helper that runs OCR on the screenshot: it first sends the image to the configured OCR API and, if that fails, falls back to Tesseract.js (Japanese and English), formatting the output as requested.

async function performOCR(
	imagePath: string,
	format = "markdown",
): Promise<string> {
	try {
		const formData = new FormData();
		formData.append("file", createReadStream(imagePath), {
			filename: imagePath.split("/").pop(),
		});

		const response = await axios.post(
			`${API_CONFIG.OCR_API_URL}${API_CONFIG.OCR_API_PATH}?format=${format}`,
			formData,
			{
				headers: formData.getHeaders(),
			},
		);

		if (response.status !== 200) {
			throw new Error(`OCR API returned status ${response.status}`);
		}

		// Remove <br> tags
		const content = response.data.content.replace(/<br\s*\/?>/g, "");
		return content;
	} catch (error) {
		console.error("OCR API error, falling back to Tesseract.js:", error);

		try {
			// Configure worker for both Japanese and English recognition
			console.error("OCR: Creating worker for Japanese and English...");
			const worker = await createWorker("jpn+eng");
			console.error("OCR: Starting recognition...");

			// finally() terminates the worker even if recognition throws, so it is never leaked
			const {
				data: { text },
			} = await worker.recognize(imagePath).finally(() => worker.terminate());
			console.error("OCR: Recognition completed");

			// Format output according to specified format
			let formattedText = text.trim();
			switch (format) {
				case "json":
					formattedText = JSON.stringify({ content: text.trim() });
					break;
				case "markdown":
					formattedText = `\`\`\`\n${text.trim()}\n\`\`\``;
					break;
				case "vertical":
					formattedText = text.trim().split("\n").join("\n\n");
					break;
				case "horizontal":
					formattedText = text.trim().replace(/\n/g, " ");
					break;
			}

			return formattedText;
		} catch (tesseractError) {
			console.error("Tesseract.js error details:", tesseractError);
			throw new Error(
				`Both OCR API and Tesseract.js failed. API error: ${error instanceof Error ? error.message : String(error)}. Tesseract error: ${tesseractError instanceof Error ? tesseractError.message : String(tesseractError)}`,
			);
		}
	}
}
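
The post-processing in `performOCR` can be read as two small pure functions. This sketch (hypothetical names `stripBreaks` and `formatOcrText`) reproduces the `<br>` removal applied to API responses and the switch applied to Tesseract.js output. Local formatting only happens on the fallback path; on the API path the format is passed as a `format` query parameter and the content is returned as-is apart from the `<br>` removal.

```typescript
// Hypothetical pure-function versions of performOCR's post-processing.
type Format = "json" | "markdown" | "vertical" | "horizontal";

// API path: drop <br>, <br/> and <br /> tags from the returned content.
function stripBreaks(content: string): string {
	return content.replace(/<br\s*\/?>/g, "");
}

// Tesseract.js path: trim, then shape the text according to the requested format.
function formatOcrText(text: string, format: Format): string {
	const trimmed = text.trim();
	const fence = "`".repeat(3); // a Markdown code fence
	let formatted = trimmed;
	switch (format) {
		case "json":
			formatted = JSON.stringify({ content: trimmed });
			break;
		case "markdown":
			formatted = `${fence}\n${trimmed}\n${fence}`;
			break;
		case "vertical":
			// Blank line between OCR lines
			formatted = trimmed.split("\n").join("\n\n");
			break;
		case "horizontal":
			// All lines joined into one, separated by spaces
			formatted = trimmed.replace(/\n/g, " ");
			break;
	}
	return formatted;
}

console.log(stripBreaks("line one<br>line two"));
console.log(formatOcrText("line one\nline two", "horizontal"));
```

For example, OCR text "line one\nline two" becomes "line one line two" in horizontal format and gains a blank line between the two lines in vertical format.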

Tool Definition Quality

Overall: A (4.2/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It effectively describes key behaviors: the tool captures screenshots, performs OCR, saves files to a dated directory in Downloads, and provides default values for parameters. However, it misses details like error handling or performance characteristics.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is front-loaded with the core purpose, followed by a bulleted list of options for clarity. Every sentence earns its place by providing essential information without redundancy, making it efficient and well-structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's moderate complexity (2 parameters, no output schema, no annotations), the description is mostly complete. It covers purpose, parameters, and behavioral aspects like file saving. A minor gap is the lack of detail on OCR output or error cases, but overall it suffices for informed use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema has 0% description coverage, so the description must compensate. It adds meaningful context by explaining what 'region' and 'format' control, including their allowed values and defaults. This goes beyond the schema's enum lists, though it could elaborate on the effects of each format option.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with specific verbs ('captures a screenshot', 'performs OCR') and a specific resource ('specified region'), making it immediately understandable. It also makes clear that a single call does two things: capture a screenshot and run OCR on it.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage through the mention of options like region and format, but does not explicitly state when to use this tool versus alternatives. Since there are no sibling tools, this is less critical, but it lacks guidance on scenarios or prerequisites for effective use.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

Other Tools

capture (A)

Related Tools

TakeSourceScreenshot: @yshk-mrt/obs-mcp
getScreenshot (B): @livoras/better-playwright-mcp
playwright_screenshot (C): @executeautomation/mcp-playwright
browser_take_screenshot: @maywzh/playwright-mcp
browser_take_screenshot: @lewisvoncken/playwright-mcp
browser_take_screenshot: @Angeluis001/playwright-mcp


MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/kazuph/mcp-screenshot'
