Mobile Next MCP Server

Official
by mobile-next

Take Screenshot

mobile_take_screenshot
Read-only

Capture screenshots from mobile devices to analyze on-screen content and identify interactive elements for mobile automation tasks.

Instructions

Take a screenshot of the mobile device. Use this to understand what's on screen, if you need to press an element that is available through view hierarchy then you must list elements on screen instead. Do not cache this result.

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| device | Yes | The device identifier to use. Use mobile_list_available_devices to find which devices are available to you. | |
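
As a rough illustration of the schema above, an MCP client would invoke this tool with a JSON-RPC `tools/call` request whose only argument is `device`. The sketch below only builds the request object. The helper name `buildScreenshotCall` and the device identifier `emulator-5554` are placeholders for illustration, not values taken from the server.

```typescript
// Shape of an MCP `tools/call` request (JSON-RPC 2.0).
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper: builds the request for mobile_take_screenshot.
function buildScreenshotCall(id: number, device: string): ToolCallRequest {
  if (device.trim() === "") {
    // The schema marks `device` as required.
    throw new Error("device is required; call mobile_list_available_devices first");
  }
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "mobile_take_screenshot", arguments: { device } },
  };
}

// "emulator-5554" is a placeholder identifier, not a real device.
const request = buildScreenshotCall(1, "emulator-5554");
```

In practice the identifier would come from a prior call to mobile_list_available_devices rather than being hard-coded.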

Implementation Reference

  • Handler for the mobile_take_screenshot tool: gets the Robot for the device, captures a screenshot, validates that it is a PNG, optionally downsizes it and re-encodes it as JPEG, base64-encodes the result, and returns it as image content.
    async ({ device }) => {
    	try {
    		const robot = getRobotFromDevice(device);
    		const screenSize = await robot.getScreenSize();
    
    		let screenshot = await robot.getScreenshot();
    		let mimeType = "image/png";
    
    		// validate we received a png, will throw exception otherwise
    		const image = new PNG(screenshot);
    		const pngSize = image.getDimensions();
    		if (pngSize.width <= 0 || pngSize.height <= 0) {
    			throw new ActionableError("Screenshot is invalid. Please try again.");
    		}
    
    		if (isScalingAvailable()) {
    			trace("Image scaling is available, resizing screenshot");
    			const image = Image.fromBuffer(screenshot);
    			const beforeSize = screenshot.length;
    			screenshot = image.resize(Math.floor(pngSize.width / screenSize.scale))
    				.jpeg({ quality: 75 })
    				.toBuffer();
    
    			const afterSize = screenshot.length;
    			trace(`Screenshot resized from ${beforeSize} bytes to ${afterSize} bytes`);
    
    			mimeType = "image/jpeg";
    		}
    
    		const screenshot64 = screenshot.toString("base64");
    		trace(`Screenshot taken: ${screenshot.length} bytes`);
    		posthog("tool_invoked", {
    			"ToolName": "mobile_take_screenshot",
    			"ScreenshotFilesize": screenshot64.length,
    			"ScreenshotMimeType": mimeType,
    			"ScreenshotWidth": pngSize.width,
    			"ScreenshotHeight": pngSize.height,
    		}).then();
    
    		return {
    			content: [{ type: "image", data: screenshot64, mimeType }]
    		};
    	} catch (err: any) {
    		error(`Error taking screenshot: ${err.message} ${err.stack}`);
    		return {
    			content: [{ type: "text", text: `Error: ${err.message}` }],
    			isError: true,
    		};
    	}
    }
  • Tool metadata including title, description, and input schema (device ID).
    {
    	title: "Take Screenshot",
    	description: "Take a screenshot of the mobile device. Use this to understand what's on screen, if you need to press an element that is available through view hierarchy then you must list elements on screen instead. Do not cache this result.",
    	inputSchema: {
    		device: z.string().describe("The device identifier to use. Use mobile_list_available_devices to find which devices are available to you.")
    	}
    }
  • src/server.ts:521-580 (registration)
    Registration of the mobile_take_screenshot tool using server.registerTool, including schema and inline handler.
    server.registerTool(
    	"mobile_take_screenshot",
    	{
    		title: "Take Screenshot",
    		description: "Take a screenshot of the mobile device. Use this to understand what's on screen, if you need to press an element that is available through view hierarchy then you must list elements on screen instead. Do not cache this result.",
    		inputSchema: {
    			device: z.string().describe("The device identifier to use. Use mobile_list_available_devices to find which devices are available to you.")
    		}
    	},
    	async ({ device }) => {
    		try {
    			const robot = getRobotFromDevice(device);
    			const screenSize = await robot.getScreenSize();
    
    			let screenshot = await robot.getScreenshot();
    			let mimeType = "image/png";
    
    			// validate we received a png, will throw exception otherwise
    			const image = new PNG(screenshot);
    			const pngSize = image.getDimensions();
    			if (pngSize.width <= 0 || pngSize.height <= 0) {
    				throw new ActionableError("Screenshot is invalid. Please try again.");
    			}
    
    			if (isScalingAvailable()) {
    				trace("Image scaling is available, resizing screenshot");
    				const image = Image.fromBuffer(screenshot);
    				const beforeSize = screenshot.length;
    				screenshot = image.resize(Math.floor(pngSize.width / screenSize.scale))
    					.jpeg({ quality: 75 })
    					.toBuffer();
    
    				const afterSize = screenshot.length;
    				trace(`Screenshot resized from ${beforeSize} bytes to ${afterSize} bytes`);
    
    				mimeType = "image/jpeg";
    			}
    
    			const screenshot64 = screenshot.toString("base64");
    			trace(`Screenshot taken: ${screenshot.length} bytes`);
    			posthog("tool_invoked", {
    				"ToolName": "mobile_take_screenshot",
    				"ScreenshotFilesize": screenshot64.length,
    				"ScreenshotMimeType": mimeType,
    				"ScreenshotWidth": pngSize.width,
    				"ScreenshotHeight": pngSize.height,
    			}).then();
    
    			return {
    				content: [{ type: "image", data: screenshot64, mimeType }]
    			};
    		} catch (err: any) {
    			error(`Error taking screenshot: ${err.message} ${err.stack}`);
    			return {
    				content: [{ type: "text", text: `Error: ${err.message}` }],
    				isError: true,
    			};
    		}
    	}
    );
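
Two details of the handler above are easy to misread, so here is a small sketch of the arithmetic, written independently of the server code. When scaling is available, the width passed to `resize` is the PNG width divided by the screen scale factor, rounded down. The `ScreenshotFilesize` analytics property is the length of the base64 string, which is about 4/3 of the raw byte count. The function names and the sample numbers (a 1170-pixel-wide capture at scale 3) are assumptions for illustration.

```typescript
// Width passed to resize(): PNG width divided by the scale factor, floored.
function scaledWidth(pngWidth: number, scale: number): number {
  if (pngWidth <= 0 || scale <= 0) {
    throw new RangeError("pngWidth and scale must be positive");
  }
  return Math.floor(pngWidth / scale);
}

// Length of the padded base64 string for a buffer of `byteLength` bytes.
function base64Length(byteLength: number): number {
  return 4 * Math.ceil(byteLength / 3);
}

const width = scaledWidth(1170, 3); // 390
const encodedLength = base64Length(300000); // 400000
```

Note that the pixel dimensions reported to analytics are those of the original PNG, not of the resized JPEG.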
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds valuable behavioral context beyond the readOnlyHint annotation: it specifies that the result should not be cached (implying freshness or temporary nature) and clarifies the tool's purpose for understanding screen content rather than interaction. No contradictions with annotations exist.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise and front-loaded, with three sentences that each serve a distinct purpose: stating the action, providing usage guidelines, and adding a behavioral constraint. There is no wasted verbiage.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's moderate complexity (single parameter, read-only operation) and the presence of annotations (readOnlyHint), the description is mostly complete. It lacks details on output format (e.g., image data or file path) since there's no output schema, but it compensates with strong usage and behavioral guidance.
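
Because there is no output schema, the result shape has to be read off the handler source: on success, a single `image` item carrying base64 data and a MIME type (`image/png`, or `image/jpeg` when the screenshot was rescaled); on failure, a `text` item with `isError: true`. A minimal sketch of client-side types and a check follows. The type and function names are hypothetical and not part of the server.

```typescript
// Content items as returned by the handler shown in the implementation reference.
type ImageContent = { type: "image"; data: string; mimeType: string };
type TextContent = { type: "text"; text: string };

interface ScreenshotResult {
  content: Array<ImageContent | TextContent>;
  isError?: boolean;
}

// Returns the image item, or throws with the server's error text.
function extractScreenshot(result: ScreenshotResult): ImageContent {
  if (result.isError) {
    const message = result.content
      .filter((c): c is TextContent => c.type === "text")
      .map((c) => c.text)
      .join("\n");
    throw new Error(message || "mobile_take_screenshot failed");
  }
  const image = result.content.find(
    (c): c is ImageContent => c.type === "image"
  );
  if (!image) {
    throw new Error("No image content in result");
  }
  return image;
}
```

A caller should branch on `mimeType` rather than assume PNG, since the format depends on whether scaling is available on the server.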

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description does not mention the 'device' parameter, but schema description coverage is 100% (the schema fully documents the parameter with a clear description and reference to mobile_list_available_devices). With high schema coverage, the baseline is 3, as the description adds no additional parameter semantics.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('Take a screenshot') and resource ('mobile device'), distinguishing it from siblings like mobile_save_screenshot (which saves) or mobile_list_elements_on_screen (which lists elements). It explicitly mentions the purpose 'to understand what's on screen'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit guidance on when to use this tool ('to understand what's on screen') and when not to use it ('if you need to press an element... then you must list elements on screen instead'), with a clear alternative (mobile_list_elements_on_screen). It also includes a prohibition ('Do not cache this result').

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/mobile-next/mobile-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.