Glama

takeScreenshot

Capture a specific screen as a base64 encoded image for web development analysis. Use screen IDs from listScreens to target the desired display.

Instructions

Take a screenshot of a specific screen and return it as a base64 encoded string.

Input Schema

Name | Required | Description | Default
screenId | No | ID of the screen to capture. Use listScreens to find available screens. | 1 (main screen)
timeout | No | Maximum time to wait in milliseconds. | 0 (no timeout)
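
Both parameters are optional, so a call with no arguments is valid. As a minimal sketch of how the defaults resolve, mirroring the `screenId !== undefined ? screenId : 1` and `options.timeout || 0` expressions in the implementation reference (the `ScreenshotInput` type and `resolveInput` helper are hypothetical names used only for illustration):

```typescript
// Hypothetical input shape matching the two documented parameters.
interface ScreenshotInput {
	screenId?: number
	timeout?: number
}

// Hypothetical helper: applies the documented defaults
// (screen 1 = main screen, timeout 0 = no timeout).
function resolveInput(input: ScreenshotInput = {}): Required<ScreenshotInput> {
	return {
		screenId: input.screenId !== undefined ? input.screenId : 1,
		timeout: input.timeout || 0
	}
}

console.log(resolveInput()) // { screenId: 1, timeout: 0 }
console.log(resolveInput({ screenId: 2 })) // { screenId: 2, timeout: 0 }
```

Note that the schema itself declares no defaults; in the source they are applied by the handler and the core function.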

Implementation Reference

  • Core implementation of takeScreenshot: captures screenshot using platform-specific shell commands, handles multiple screens on macOS, resizes for screenId=2 using sharp, returns base64 PNG data.
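    // Note: url, width, height, fullPage and waitForSelector are accepted but not used in this function.
    export async function takeScreenshot(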
    	url: string,
    	options: {
    		width?: number
    		height?: number
    		fullPage?: boolean
    		waitForSelector?: string
    		timeout?: number
    		screenId?: number
    	} = {}
    ): Promise<string> {
    	try {
    		const screenId = options.screenId !== undefined ? options.screenId : 1
    
    		// Create a temporary file path
    		const tmpDir = os.tmpdir()
    		const screenshotPath = path.join(tmpDir, `screenshot-${Date.now()}.png`)
    
    		// Determine which command to use based on OS
    		const platform = os.platform()
    		let cmd = ""
    
    		if (platform === "darwin") {
    			// Get display info first to check if the requested display exists
    			const screens = await getAvailableScreens()
    			const displayExists = screens.some(
    				(screen) => screen.id === screenId
    			)
    
    			if (displayExists || screenId === 1) {
    				// On macOS, use screencapture with -D flag to specify display ID
    				cmd = `screencapture -D ${screenId} -x "${screenshotPath}"`
    			} else {
    				// Default to main display if the specified one doesn't exist
    				console.warn(
    					`Display ID ${screenId} not found, defaulting to main display`
    				)
    				cmd = `screencapture -x "${screenshotPath}"`
    			}
    		} else if (platform === "win32") {
    			// Windows implementation - doesn't support multiple screens yet
    			cmd = `powershell -command "Add-Type -AssemblyName System.Windows.Forms; [System.Windows.Forms.SendKeys]::SendWait('{PRTSC}'); Start-Sleep -Milliseconds 500; $img = [System.Windows.Forms.Clipboard]::GetImage(); $img.Save('${screenshotPath}')"`
    		} else if (platform === "linux") {
    			// Linux implementation - doesn't support multiple screens yet
    			cmd = `import -window root "${screenshotPath}"`
    		} else {
    			throw new Error(`Unsupported platform: ${platform}`)
    		}
    
    		// Execute the command
    		await execAsync(cmd, { timeout: options.timeout || 0 })
    
    		// If screenId is 2, resize the image to 819x1456 px
    		if (screenId === 2) {
    			const resizedPath = path.join(
    				tmpDir,
    				`screenshot-resized-${Date.now()}.png`
    			)
    
    			await sharp(screenshotPath)
    				.resize(819, 1456, {
    					fit: "contain",
    					background: { r: 255, g: 255, b: 255, alpha: 1 }
    				})
    				.toFile(resizedPath)
    
    			// Delete the original screenshot and use the resized one
    			await fs.promises.unlink(screenshotPath)
    
    			// Read the resized screenshot file and convert to base64
    			const screenshotData = await fs.promises.readFile(resizedPath)
    			const base64Data = screenshotData.toString("base64")
    
    			// Clean up the temporary file
    			await fs.promises.unlink(resizedPath)
    
    			return base64Data
    		}
    
    		// For other screenIds, continue with the original behavior
    		const screenshotData = await fs.promises.readFile(screenshotPath)
    		const base64Data = screenshotData.toString("base64")
    
    		// Clean up the temporary file
    		await fs.promises.unlink(screenshotPath)
    
    		return base64Data
    	} catch (error) {
    		console.error("Error taking screenshot:", error)
    		throw new Error(
    			`Failed to capture screenshot: ${error instanceof Error ? error.message : String(error)}`
    		)
    	}
    }
  • src/index.ts:54-111 (registration)
    MCP tool registration for 'takeScreenshot': sets the name, description, and Zod input schema (screenId, timeout), plus a wrapper handler that calls the core takeScreenshot and formats the MCP response with the image.
    server.tool(
    	"takeScreenshot",
    	"Take a screenshot of a specific screen and return it as a base64 encoded string.",
    	{
    		screenId: z
    			.number()
    			.optional()
    			.describe(
    				"ID of the screen to capture. Use listScreens to find available screens. Default is 1 (main screen)"
    			),
    		timeout: z
    			.number()
    			.optional()
    			.describe(
    				"Maximum time to wait in milliseconds (default: 0, no timeout)"
    			)
    	},
    	async ({ screenId, timeout }) => {
    		try {
    			const targetScreenId = screenId !== undefined ? screenId : 1
    			console.log(`Taking screenshot of screen ${targetScreenId}...`)
    
    			// Call the screenshot function with options
    			const screenshot = await takeScreenshot("", {
    				screenId: targetScreenId,
    				timeout
    			})
    
    			console.log("Screenshot captured successfully")
    
    			return {
    				content: [
    					{
    						type: "text",
    						text: `Screenshot of screen ${targetScreenId} captured successfully`
    					},
    					{
    						type: "image",
    						data: screenshot,
    						mimeType: "image/png"
    					}
    				]
    			}
    		} catch (error: unknown) {
    			console.error("Error taking screenshot:", error)
    			const errorMessage =
    				error instanceof Error ? error.message : String(error)
    			return {
    				content: [
    					{
    						type: "text",
    						text: `Error taking screenshot: ${errorMessage}`
    					}
    				]
    			}
    		}
    	}
    )
  • Input schema using Zod for takeScreenshot tool parameters: optional screenId (number) and timeout (number).
    {
    	screenId: z
    		.number()
    		.optional()
    		.describe(
    			"ID of the screen to capture. Use listScreens to find available screens. Default is 1 (main screen)"
    		),
    	timeout: z
    		.number()
    		.optional()
    		.describe(
    			"Maximum time to wait in milliseconds (default: 0, no timeout)"
    		)
    },
  • Supporting function getAvailableScreens() that lists detectable displays using system_profiler on macOS (parsing its output for multiple screens), with placeholder fallbacks for other operating systems. Used by takeScreenshot to validate screenId and by the listScreens tool.
    export async function getAvailableScreens(): Promise<
    	{ id: number; description: string }[]
    > {
    	try {
    		const platform = os.platform()
    
    		if (platform === "darwin") {
    			// Use system_profiler to get display information
    			const { stdout } = await execAsync(
    				"system_profiler SPDisplaysDataType"
    			)
    
    			// Parse the system_profiler output to properly identify displays
    			// First, separate the output into sections for each display
    			const displayData = stdout
    				.split("Graphics/Displays:")
    				.filter((section) => section.trim().length > 0)
    
    			if (displayData.length === 0) {
    				// Fallback if no displays were found
    				return [{ id: 1, description: "Main Display" }]
    			}
    
    			// Display list, filled in from the parsed sections below
    			const displays: { id: number; description: string }[] = []
    
    			// Scan through raw output and extract display sections
    			const displaySections = stdout
    				.split(/^\s*\w+:/m)
    				.filter(
    					(section) =>
    						section.includes("Display Type") ||
    						section.includes("Resolution") ||
    						section.includes("Type:")
    				)
    
    			// Check if we're dealing with a more specific format
    			if (displaySections.length === 0) {
    				// Fallback to main display only
    				return [{ id: 1, description: "Main Display" }]
    			}
    
    			// Process each section to extract display info
    			for (let i = 0; i < displaySections.length; i++) {
    				const section = displaySections[i]
    				const lines = section.split("\n").map((line) => line.trim())
    
    				let displayName = ""
    				let resolution = ""
    				let displayType = ""
    
    				// Extract relevant display information
    				for (const line of lines) {
    					if (
    						line.includes("Display Type:") ||
    						line.includes("Type:")
    					) {
    						displayType = line.split(":")[1]?.trim() || "Unknown"
    					} else if (line.includes("Resolution:")) {
    						resolution = line.split(":")[1]?.trim() || ""
    					} else if (line.includes("Name:")) {
    						displayName = line.split(":")[1]?.trim() || ""
    					}
    				}
    
    				// Combine information for a descriptive name
    				let description = displayName || displayType || "Display"
    				if (resolution) {
    					description += ` (${resolution})`
    				}
    
    				// Add to display list
    				displays.push({
    					id: i + 1, // Display IDs in screencapture start at 1
    					description
    				})
    			}
    
    			// If we found displays but none matched our parsing, add a fallback
    			if (displays.length === 0) {
    				displays.push({ id: 1, description: "Main Display" })
    			}
    
    			// If at most one display was parsed, fall back to counting "Resolution" lines
    			if (displays.length <= 1) {
    				// Try running a different command to get screen info
    				try {
    					const { stdout: screenInfo } = await execAsync(
    						"system_profiler SPDisplaysDataType | grep Resolution"
    					)
    					const resolutions = screenInfo
    						.split("\n")
    						.filter((line) => line.includes("Resolution"))
    
    					// If we have multiple resolutions, we likely have multiple screens
    					if (resolutions.length > 1) {
    						displays.length = 0 // Clear existing displays
    
    						// Add each detected display
    						for (let i = 0; i < resolutions.length; i++) {
    							const resolution =
    								resolutions[i].split(":")[1]?.trim() || ""
    							displays.push({
    								id: i + 1,
    								description:
    									i === 0
    										? `Main Display (${resolution})`
    										: `External Display ${i} (${resolution})`
    							})
    						}
    					}
    				} catch (err) {
    					console.error("Error getting alternative screen info:", err)
    				}
    			}
    
    			return displays
    		}
    
    		if (platform === "win32") {
    			// Windows implementation (placeholder)
    			return [
    				{
    					id: 0,
    					description:
    						"Primary Screen (multi-screen selection not supported yet)"
    				}
    			]
    		}
    
    		if (platform === "linux") {
    			// Linux implementation (placeholder)
    			return [
    				{
    					id: 0,
    					description:
    						"Primary Screen (multi-screen selection not supported yet)"
    				}
    			]
    		}
    
    		throw new Error(`Unsupported platform: ${platform}`)
    	} catch (error) {
    		console.error("Error listing screens:", error)
    		return [{ id: 1, description: "Main Display" }]
    	}
    }
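
The "Resolution" fallback at the end of getAvailableScreens() can be sketched on its own as a pure function over the command output. This is an illustration under assumptions, not the server's code: `ScreenInfo` and `screensFromResolutionLines` are hypothetical names, the sample text is made up, and unlike the source (which only switches to this path when more than one resolution line is found) the sketch maps every matching line.

```typescript
// Hypothetical shape, matching the return type of getAvailableScreens().
interface ScreenInfo {
	id: number
	description: string
}

// One entry per line containing "Resolution", numbered from 1
// in the order the lines appear in the output.
function screensFromResolutionLines(output: string): ScreenInfo[] {
	return output
		.split("\n")
		.filter((line) => line.includes("Resolution"))
		.map((line, i) => {
			const resolution = line.split(":")[1]?.trim() || ""
			return {
				id: i + 1,
				description:
					i === 0
						? `Main Display (${resolution})`
						: `External Display ${i} (${resolution})`
			}
		})
}

// Illustrative input only; real system_profiler output varies by machine.
const sample = [
	"          Resolution: 2560 x 1600",
	"          Resolution: 1920 x 1080"
].join("\n")

console.log(screensFromResolutionLines(sample))
```

As in the source, the IDs simply follow the order of the lines; whether that order matches the numbering `screencapture -D` expects is an assumption the source makes and this sketch does not verify.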
Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden. It discloses the output behavior (returns base64 encoded string) and implies a capture action, but lacks details on permissions needed, potential side effects (e.g., screen flicker), error conditions, or rate limits. It's adequate but has gaps for a tool that interacts with system resources.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
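
Some of the consequences this dimension asks about can be read off the implementation reference: on each call the core function runs one platform-specific shell command that writes a PNG into the OS temp directory, then reads the file back and deletes it. A minimal sketch of that command selection as a pure function follows. `buildCaptureCommand` and its parameters are hypothetical names, and the Windows PowerShell branch from the source is left out for brevity, so this sketch treats `win32` as unsupported even though the source handles it.

```typescript
// Hypothetical helper condensing the platform branch of takeScreenshot.
// Returns the shell command that writes a PNG to outputPath.
function buildCaptureCommand(
	platform: string,
	screenId: number,
	knownScreenIds: number[],
	outputPath: string
): string {
	if (platform === "darwin") {
		const exists =
			knownScreenIds.some((id) => id === screenId) || screenId === 1
		// -x suppresses the capture sound; -D selects a display.
		return exists
			? `screencapture -D ${screenId} -x "${outputPath}"`
			: `screencapture -x "${outputPath}"`
	}
	if (platform === "linux") {
		// ImageMagick's import; the source ignores screenId on Linux.
		return `import -window root "${outputPath}"`
	}
	// The source also has a win32 branch, omitted from this sketch.
	throw new Error(`Unsupported platform: ${platform}`)
}

console.log(buildCaptureCommand("darwin", 2, [1, 2], "/tmp/shot.png"))
// screencapture -D 2 -x "/tmp/shot.png"
```

Keeping the command construction separate from its execution makes the per-platform behavior easy to inspect without actually capturing the screen.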

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that front-loads the core action and output. Every word earns its place, with no redundancy or unnecessary elaboration, making it highly concise and well-structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's moderate complexity (capturing screenshots with two parameters), no annotations, and no output schema, the description is reasonably complete. It covers the purpose, output format, and references a sibling tool, but could benefit from more behavioral context (e.g., permissions or errors) to be fully comprehensive.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
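
Because the tool publishes no output schema, the shape of its result has to be inferred from the registration handler in the implementation reference: a success returns a text item followed by a PNG image item, while a failure returns only a text item. A sketch of how a caller might tell the two apart, with hypothetical type and helper names (`ToolResult`, `extractScreenshot`) and a placeholder base64 string:

```typescript
// Content items as they appear in the handler's return value.
type TextItem = { type: "text"; text: string }
type ImageItem = { type: "image"; data: string; mimeType: string }
type ToolResult = { content: Array<TextItem | ImageItem> }

// Hypothetical helper: returns the base64 image payload, or null when
// the result carries only text (the handler's error path).
function extractScreenshot(result: ToolResult): string | null {
	for (const item of result.content) {
		if (item.type === "image") return item.data
	}
	return null
}

// Placeholder data for illustration, not a real screenshot.
const ok: ToolResult = {
	content: [
		{ type: "text", text: "Screenshot of screen 1 captured successfully" },
		{ type: "image", data: "iVBORw0KGgo=", mimeType: "image/png" }
	]
}
const failed: ToolResult = {
	content: [{ type: "text", text: "Error taking screenshot: ..." }]
}

console.log(extractScreenshot(ok) !== null) // true
console.log(extractScreenshot(failed) !== null) // false
```

Checking for the image item is more robust than matching on the text message, since the handler shown reports errors as ordinary text content.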

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already fully documents both parameters (screenId and timeout). The description adds no additional parameter semantics beyond what's in the schema, such as format details or constraints. Baseline 3 is appropriate when the schema does all the work.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('Take a screenshot') and resource ('of a specific screen'), and distinguishes from the sibling tool 'listScreens' by mentioning it for finding available screens. It's precise about the verb, target, and output format.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context by referencing the sibling tool 'listScreens' to find screen IDs, which implies when to use this tool (after identifying screens). However, it doesn't explicitly state when not to use it or name alternatives beyond the implied workflow.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/zueai/webdev-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.