capture

Capture a screenshot of a screen region (left half, right half, or full screen), run OCR on it, and return the recognized text in JSON, markdown, vertical, or horizontal format. The screenshot itself is saved as a PNG file in a dated directory in Downloads.

Instructions

Captures a screenshot of the specified region and performs OCR. Options:

  • region: 'left'/'right'/'full' (default: 'left')

  • format: 'json'/'markdown'/'vertical'/'horizontal' (default: 'markdown')

The screenshot is saved to a dated directory in Downloads.
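
As an illustration of how these options reach the server, the sketch below builds the JSON-RPC message an MCP client would send to invoke the tool. The `tools/call` method and the `name`/`arguments` parameters follow the MCP protocol; the helper name `buildCaptureRequest` and the `CaptureArgs` type are hypothetical and exist only for this example.

```typescript
// Argument shape taken from the options listed above.
type Region = "left" | "right" | "full";
type Format = "json" | "markdown" | "vertical" | "horizontal";

interface CaptureArgs {
  region?: Region; // server default: "left"
  format?: Format; // server default: "markdown"
}

// Hypothetical helper: builds the JSON-RPC 2.0 message for an MCP "tools/call" request.
function buildCaptureRequest(id: number, args: CaptureArgs = {}) {
  return {
    jsonrpc: "2.0" as const,
    id,
    method: "tools/call" as const,
    params: { name: "capture", arguments: args },
  };
}

// Ask for the full screen with JSON-formatted OCR output.
const request = buildCaptureRequest(1, { region: "full", format: "json" });
console.log(JSON.stringify(request, null, 2));
```

Omitting both options, as in `buildCaptureRequest(2)`, sends an empty `arguments` object and leaves the server to apply its defaults.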

Input Schema

| Name | Required | Description | Default |
|---|---|---|---|
| format | No | | markdown |
| region | No | | left |
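
To make the defaults concrete, here is a minimal dependency-free sketch of how missing arguments resolve. It is a hand-written stand-in, not the server's validation code (which uses a Zod schema); the function name `resolveCaptureArgs` is hypothetical.

```typescript
const REGIONS = ["left", "right", "full"] as const;
const FORMATS = ["json", "markdown", "vertical", "horizontal"] as const;

type Region = (typeof REGIONS)[number];
type Format = (typeof FORMATS)[number];

interface ResolvedArgs {
  region: Region;
  format: Format;
}

// Hypothetical stand-in for schema validation:
// fills in defaults for missing fields and rejects values outside the allowed sets.
function resolveCaptureArgs(
  input: { region?: unknown; format?: unknown } = {},
): ResolvedArgs {
  const region = input.region === undefined ? "left" : input.region;
  const format = input.format === undefined ? "markdown" : input.format;

  if (!REGIONS.includes(region as Region)) {
    throw new Error(`Invalid region: ${String(region)}`);
  }
  if (!FORMATS.includes(format as Format)) {
    throw new Error(`Invalid format: ${String(format)}`);
  }
  return { region: region as Region, format: format as Format };
}

console.log(resolveCaptureArgs()); // region "left", format "markdown"
console.log(resolveCaptureArgs({ region: "full" })); // region "full", format "markdown"
```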

Implementation Reference

  • Main handler for CallToolRequestSchema implementing the 'capture' tool logic: validates input, captures screenshot using takeScreenshot, performs OCR with performOCR, and returns the result or error.
    server.setRequestHandler(CallToolRequestSchema, async (request) => {
      try {
        const { name, arguments: args } = request.params;
        if (name !== "capture") {
          throw new Error(`Unknown tool: ${name}`);
        }
        const parsed = ScreenshotArgsSchema.safeParse(args);
        if (!parsed.success) {
          throw new Error(`Invalid arguments: ${parsed.error}`);
        }
        console.error(
          `Debug: Starting screenshot capture for region: ${parsed.data.region}, format: ${parsed.data.format}`,
        );
        const imagePath = await takeScreenshot(parsed.data.region);
        console.error(`Debug: Screenshot saved to: ${imagePath}`);
        const ocrText = await performOCR(imagePath, parsed.data.format);
        console.error("Debug: OCR completed");
        return {
          content: [
            {
              type: "text",
              text: `Screenshot saved to: ${imagePath}\n\nOCR Results:\n${ocrText}`,
            },
          ],
        };
      } catch (error) {
        console.error("Error:", error);
        return {
          content: [
            {
              type: "text",
              text: `Error: ${error instanceof Error ? error.message : String(error)}`,
            },
          ],
          isError: true,
        };
      }
    });
  • index.ts:227-240 (registration)
    Registration of the 'capture' tool in the ListToolsRequestSchema handler, including name, description, and input schema.
    server.setRequestHandler(ListToolsRequestSchema, async () => ({
      tools: [
        {
          name: "capture",
          description:
            "Captures a screenshot of the specified region and performs OCR. " +
            "Options:\n" +
            "- region: 'left'/'right'/'full' (default: 'left')\n" +
            "- format: 'json'/'markdown'/'vertical'/'horizontal' (default: 'markdown')\n" +
            "The screenshot is saved to a dated directory in Downloads.",
          inputSchema: zodToJsonSchema(ScreenshotArgsSchema) as ToolInput,
        },
      ],
    }));
  • Zod schema defining input parameters for the 'capture' tool: region (left/right/full) and format (json/markdown/vertical/horizontal).
    const ScreenshotArgsSchema = z.object({
      region: z.enum(["left", "right", "full"]).default("left"),
      format: z
        .enum(["json", "markdown", "vertical", "horizontal"])
        .default("markdown"),
    });
  • Helper function that captures the full screen with screencapture, crops the image to the left or right half when a region is requested, and saves the result to a dated Downloads folder.
    async function takeScreenshot(
      region: z.infer<typeof ScreenshotArgsSchema>["region"],
    ): Promise<string> {
      const dateDir = await ensureDateDirectory();
      const timestamp = new Date().toISOString().replace(/[:.]/g, "-");
      const filename = `screenshot-${region}-${timestamp}.png`;
      const filepath = join(dateDir, filename);
      try {
        // Get main display dimensions
        const { width, height } = await getDisplayDimensions();
        console.error(
          `Debug: Display dimensions - width: ${width}, height: ${height}`,
        );
        // Always capture full screen
        await execFileAsync("screencapture", [filepath]);
        // Process image if needed
        if (region !== "full") {
          const tempFilePath = `${filepath}.temp.png`;
          await sharp(filepath).toFile(tempFilePath);
          const metadata = await sharp(tempFilePath).metadata();
          if (!metadata.width || !metadata.height) {
            throw new Error("Failed to get image dimensions");
          }
          const halfWidth = Math.floor(metadata.width / 2);
          // Extract left or right half
          if (region === "left") {
            await sharp(tempFilePath)
              .extract({
                left: 0,
                top: 0,
                width: halfWidth,
                height: metadata.height,
              })
              .toFile(filepath);
          } else if (region === "right") {
            await sharp(tempFilePath)
              .extract({
                left: halfWidth,
                top: 0,
                width: halfWidth,
                height: metadata.height,
              })
              .toFile(filepath);
          }
          // Remove temporary file
          await execFileAsync("rm", [tempFilePath]);
        }
        return filepath;
      } catch (error) {
        throw new Error(`Screenshot capture failed: ${error}`);
      }
    }
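  • Helper function that runs OCR on the screenshot image. It first sends the image to the OCR API, passing the requested format as a query parameter; if that fails, it falls back to Tesseract.js (Japanese and English) and formats the recognized text locally.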
    async function performOCR(
      imagePath: string,
      format = "markdown",
    ): Promise<string> {
      try {
        const formData = new FormData();
        formData.append("file", createReadStream(imagePath), {
          filename: imagePath.split("/").pop(),
        });
        const response = await axios.post(
          `${API_CONFIG.OCR_API_URL}${API_CONFIG.OCR_API_PATH}?format=${format}`,
          formData,
          {
            headers: formData.getHeaders(),
          },
        );
        if (response.status !== 200) {
          throw new Error(`OCR API returned status ${response.status}`);
        }
        // Remove <br> tags
        const content = response.data.content.replace(/<br\s*\/?>/g, "");
        return content;
      } catch (error) {
        console.error("OCR API error, falling back to Tesseract.js:", error);
        try {
          // Configure worker for both Japanese and English recognition
          console.error("OCR: Creating worker for Japanese and English...");
          const worker = await createWorker("jpn+eng");
          console.error("OCR: Starting recognition...");
          const {
            data: { text },
          } = await worker.recognize(imagePath);
          console.error("OCR: Recognition completed");
          await worker.terminate();
          // Format output according to specified format
          let formattedText = text.trim();
          switch (format) {
            case "json":
              formattedText = JSON.stringify({ content: text.trim() });
              break;
            case "markdown":
              formattedText = `\`\`\`\n${text.trim()}\n\`\`\``;
              break;
            case "vertical":
              formattedText = text.trim().split("\n").join("\n\n");
              break;
            case "horizontal":
              formattedText = text.trim().replace(/\n/g, " ");
              break;
          }
          return formattedText;
        } catch (tesseractError) {
          console.error("Tesseract.js error details:", tesseractError);
          throw new Error(
            `Both OCR API and Tesseract.js failed. API error: ${error instanceof Error ? error.message : String(error)}. Tesseract error: ${tesseractError instanceof Error ? tesseractError.message : String(tesseractError)}`,
          );
        }
      }
    }
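
Two pure computations are embedded in the helpers above: the crop rectangle used for the left and right halves, and the text formatting applied in the Tesseract.js fallback branch. The sketch below isolates both so they can be checked without a screen or an OCR engine. The function names `cropRect` and `formatOcrText` are hypothetical, and the formatting shown covers only the fallback path; how the OCR API formats its own output is not visible in the code above.

```typescript
type Region = "left" | "right" | "full";
type Format = "json" | "markdown" | "vertical" | "horizontal";

interface Rect {
  left: number;
  top: number;
  width: number;
  height: number;
}

// Mirrors the extract() rectangles in takeScreenshot.
// "full" needs no crop, so it returns null.
function cropRect(region: Region, width: number, height: number): Rect | null {
  if (region === "full") return null;
  const halfWidth = Math.floor(width / 2);
  return {
    left: region === "left" ? 0 : halfWidth,
    top: 0,
    width: halfWidth,
    height,
  };
}

// Mirrors the switch in the Tesseract.js fallback branch of performOCR.
function formatOcrText(raw: string, format: Format): string {
  const text = raw.trim();
  const fence = "`".repeat(3); // code fence marker, built to avoid a literal fence here
  switch (format) {
    case "json":
      return JSON.stringify({ content: text });
    case "markdown":
      return `${fence}\n${text}\n${fence}`;
    case "vertical":
      return text.split("\n").join("\n\n");
    case "horizontal":
      return text.replace(/\n/g, " ");
  }
}

console.log(cropRect("right", 2881, 1800));
console.log(formatOcrText("line one\nline two", "horizontal"));
```

Because the half width is rounded down, an image with an odd pixel width loses its last column in the right half: for a width of 2881, both halves are 1440 pixels wide and the right half starts at x = 1440.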

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/kazuph/mcp-screenshot'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.