by kazuph

capture

Capture a screenshot of the specified screen region (left half, right half, or full screen), run OCR on it, and return the recognized text in JSON, markdown, vertical, or horizontal format. The screenshot itself is saved to a dated directory in Downloads.

Instructions

Captures a screenshot of the specified region and performs OCR. Options:

  • region: 'left'/'right'/'full' (default: 'left')

  • format: 'json'/'markdown'/'vertical'/'horizontal' (default: 'markdown')

The screenshot is saved to a dated directory in Downloads.
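As a rough illustration of how a client might pass these options, the sketch below builds a JSON-RPC 2.0 `tools/call` request for the capture tool. The helper name `buildCaptureRequest` and the interface names are hypothetical and not part of the server; the request shape assumes the standard MCP `tools/call` method, and the transport that sends the request is out of scope.

```typescript
// Hypothetical client-side helper; not part of the mcp-screenshot server.
type Region = "left" | "right" | "full";
type Format = "json" | "markdown" | "vertical" | "horizontal";

interface CaptureArgs {
  region?: Region; // server default: "left"
  format?: Format; // server default: "markdown"
}

interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: "capture"; arguments: CaptureArgs };
}

// Builds the request object only; sending it is left to the MCP client.
function buildCaptureRequest(id: number, args: CaptureArgs = {}): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "capture", arguments: args },
  };
}

const request = buildCaptureRequest(1, { region: "full", format: "json" });
console.log(JSON.stringify(request, null, 2));
```

Omitted options are left out of `arguments` so that the server-side defaults apply.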

Input Schema

Name      Required   Description   Default
format    No                       markdown
region    No                       left
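The table says both parameters are optional and fall back to defaults. The sketch below shows that behaviour in dependency-free TypeScript. It is an approximation for illustration only: the server itself validates with a Zod schema, and the names `parseCaptureArgs` and `pick` are hypothetical.

```typescript
// Dependency-free approximation of the input validation; names are hypothetical.
const REGIONS = ["left", "right", "full"] as const;
const FORMATS = ["json", "markdown", "vertical", "horizontal"] as const;

type Region = (typeof REGIONS)[number];
type Format = (typeof FORMATS)[number];

interface CaptureArgs {
  region: Region;
  format: Format;
}

// Returns the fallback when the value is missing, the value when it is allowed,
// and throws otherwise.
function pick<T extends string>(
  value: unknown,
  allowed: readonly T[],
  fallback: T,
  label: string,
): T {
  if (value === undefined) return fallback;
  if (typeof value === "string" && (allowed as readonly string[]).indexOf(value) !== -1) {
    return value as T;
  }
  throw new Error(`Invalid ${label}: ${JSON.stringify(value)}`);
}

function parseCaptureArgs(input: unknown): CaptureArgs {
  if (typeof input !== "object" || input === null) {
    throw new Error("Arguments must be an object");
  }
  const raw = input as Record<string, unknown>;
  return {
    region: pick(raw.region, REGIONS, "left", "region"),
    format: pick(raw.format, FORMATS, "markdown", "format"),
  };
}

console.log(parseCaptureArgs({})); // { region: 'left', format: 'markdown' }
console.log(parseCaptureArgs({ region: "full" })); // { region: 'full', format: 'markdown' }
```

Unknown keys are ignored here, and a value outside the listed options raises an error rather than silently falling back to the default.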

Implementation Reference

  • Main handler for CallToolRequestSchema implementing the 'capture' tool logic: validates input, captures screenshot using takeScreenshot, performs OCR with performOCR, and returns the result or error.
    server.setRequestHandler(CallToolRequestSchema, async (request) => {
      try {
        const { name, arguments: args } = request.params;
        if (name !== "capture") {
          throw new Error(`Unknown tool: ${name}`);
        }

        const parsed = ScreenshotArgsSchema.safeParse(args);
        if (!parsed.success) {
          throw new Error(`Invalid arguments: ${parsed.error}`);
        }

        console.error(
          `Debug: Starting screenshot capture for region: ${parsed.data.region}, format: ${parsed.data.format}`,
        );
        const imagePath = await takeScreenshot(parsed.data.region);
        console.error(`Debug: Screenshot saved to: ${imagePath}`);

        const ocrText = await performOCR(imagePath, parsed.data.format);
        console.error("Debug: OCR completed");

        return {
          content: [
            {
              type: "text",
              text: `Screenshot saved to: ${imagePath}\n\nOCR Results:\n${ocrText}`,
            },
          ],
        };
      } catch (error) {
        console.error("Error:", error);
        return {
          content: [
            {
              type: "text",
              text: `Error: ${error instanceof Error ? error.message : String(error)}`,
            },
          ],
          isError: true,
        };
      }
    });
  • index.ts:227-240 (registration)
    Registration of the 'capture' tool in the ListToolsRequestSchema handler, including name, description, and input schema.
    server.setRequestHandler(ListToolsRequestSchema, async () => ({
      tools: [
        {
          name: "capture",
          description:
            "Captures a screenshot of the specified region and performs OCR. " +
            "Options:\n" +
            "- region: 'left'/'right'/'full' (default: 'left')\n" +
            "- format: 'json'/'markdown'/'vertical'/'horizontal' (default: 'markdown')\n" +
            "The screenshot is saved to a dated directory in Downloads.",
          inputSchema: zodToJsonSchema(ScreenshotArgsSchema) as ToolInput,
        },
      ],
    }));
  • Zod schema defining input parameters for the 'capture' tool: region (left/right/full) and format (json/markdown/vertical/horizontal).
    const ScreenshotArgsSchema = z.object({
      region: z.enum(["left", "right", "full"]).default("left"),
      format: z
        .enum(["json", "markdown", "vertical", "horizontal"])
        .default("markdown"),
    });
  • Helper function that captures the full screen, crops the image to the left or right half if requested, and saves the result to a dated folder in Downloads.
    async function takeScreenshot(
      region: z.infer<typeof ScreenshotArgsSchema>["region"],
    ): Promise<string> {
      const dateDir = await ensureDateDirectory();
      const timestamp = new Date().toISOString().replace(/[:.]/g, "-");
      const filename = `screenshot-${region}-${timestamp}.png`;
      const filepath = join(dateDir, filename);

      try {
        // Get main display dimensions
        const { width, height } = await getDisplayDimensions();
        console.error(
          `Debug: Display dimensions - width: ${width}, height: ${height}`,
        );

        // Always capture full screen
        await execFileAsync("screencapture", [filepath]);

        // Process image if needed
        if (region !== "full") {
          const tempFilePath = `${filepath}.temp.png`;
          await sharp(filepath).toFile(tempFilePath);
          const metadata = await sharp(tempFilePath).metadata();
          if (!metadata.width || !metadata.height) {
            throw new Error("Failed to get image dimensions");
          }
          const halfWidth = Math.floor(metadata.width / 2);

          // Extract left or right half
          if (region === "left") {
            await sharp(tempFilePath)
              .extract({
                left: 0,
                top: 0,
                width: halfWidth,
                height: metadata.height,
              })
              .toFile(filepath);
          } else if (region === "right") {
            await sharp(tempFilePath)
              .extract({
                left: halfWidth,
                top: 0,
                width: halfWidth,
                height: metadata.height,
              })
              .toFile(filepath);
          }

          // Remove temporary file
          await execFileAsync("rm", [tempFilePath]);
        }

        return filepath;
      } catch (error) {
        throw new Error(`Screenshot capture failed: ${error}`);
      }
    }
  • Helper function that runs OCR on the screenshot image. It first tries the OCR API and, if that fails, falls back to Tesseract.js and formats the recognized text according to the requested format.
    async function performOCR(
      imagePath: string,
      format = "markdown",
    ): Promise<string> {
      try {
        const formData = new FormData();
        formData.append("file", createReadStream(imagePath), {
          filename: imagePath.split("/").pop(),
        });

        const response = await axios.post(
          `${API_CONFIG.OCR_API_URL}${API_CONFIG.OCR_API_PATH}?format=${format}`,
          formData,
          {
            headers: formData.getHeaders(),
          },
        );

        if (response.status !== 200) {
          throw new Error(`OCR API returned status ${response.status}`);
        }

        // Remove <br> tags
        const content = response.data.content.replace(/<br\s*\/?>/g, "");
        return content;
      } catch (error) {
        console.error("OCR API error, falling back to Tesseract.js:", error);
        try {
          // Configure worker for both Japanese and English recognition
          console.error("OCR: Creating worker for Japanese and English...");
          const worker = await createWorker("jpn+eng");

          console.error("OCR: Starting recognition...");
          const {
            data: { text },
          } = await worker.recognize(imagePath);
          console.error("OCR: Recognition completed");

          await worker.terminate();

          // Format output according to specified format
          let formattedText = text.trim();
          switch (format) {
            case "json":
              formattedText = JSON.stringify({ content: text.trim() });
              break;
            case "markdown":
              formattedText = `\`\`\`\n${text.trim()}\n\`\`\``;
              break;
            case "vertical":
              formattedText = text.trim().split("\n").join("\n\n");
              break;
            case "horizontal":
              formattedText = text.trim().replace(/\n/g, " ");
              break;
          }
          return formattedText;
        } catch (tesseractError) {
          console.error("Tesseract.js error details:", tesseractError);
          throw new Error(
            `Both OCR API and Tesseract.js failed. API error: ${error instanceof Error ? error.message : String(error)}. Tesseract error: ${tesseractError instanceof Error ? tesseractError.message : String(tesseractError)}`,
          );
        }
      }
    }
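Two pieces of the helpers above are pure computation and can be checked without a screen or an OCR engine: the crop rectangle used for the left and right halves, and the text formatting applied in the Tesseract.js fallback. The sketch below isolates both under hypothetical names (`cropRect`, `formatOcrText`); it mirrors the arithmetic and the switch statement shown above but is not code from the server. Returning the whole image for 'full' is a convenience assumed here, since the original skips cropping entirely in that case.

```typescript
// Hypothetical, dependency-free extraction of two pure steps from the helpers above.
type Region = "left" | "right" | "full";
type Format = "json" | "markdown" | "vertical" | "horizontal";

interface Rect {
  left: number;
  top: number;
  width: number;
  height: number;
}

// Same arithmetic as the extract() calls in takeScreenshot.
function cropRect(region: Region, width: number, height: number): Rect {
  if (region === "full") {
    return { left: 0, top: 0, width, height }; // assumption: no crop means the whole image
  }
  const halfWidth = Math.floor(width / 2);
  return { left: region === "left" ? 0 : halfWidth, top: 0, width: halfWidth, height };
}

// Same cases as the switch in the Tesseract.js fallback of performOCR.
function formatOcrText(text: string, format: Format): string {
  const trimmed = text.trim();
  const fence = "`".repeat(3); // built indirectly to avoid a literal code fence here
  switch (format) {
    case "json":
      return JSON.stringify({ content: trimmed });
    case "markdown":
      return `${fence}\n${trimmed}\n${fence}`;
    case "vertical":
      return trimmed.split("\n").join("\n\n");
    case "horizontal":
      return trimmed.replace(/\n/g, " ");
    default:
      return trimmed;
  }
}

console.log(cropRect("right", 2561, 1440)); // { left: 1280, top: 0, width: 1280, height: 1440 }
console.log(formatOcrText(" line one\nline two ", "horizontal")); // line one line two
```

With an odd image width such as 2561, both halves are 1280 pixels wide, so the rightmost pixel column is not part of either crop.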
MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/kazuph/mcp-screenshot'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.