Skip to main content
Glama
acchuang

Jina AI Remote MCP Server

by acchuang

capture_screenshot_url

Capture web page screenshots in base64 JPEG format for visual inspection, analysis, or display. Specify URL and choose single-screen or full-page capture options.

Instructions

Capture high-quality screenshots of web pages in base64 encoded JPEG format. Use this tool when you need to visually inspect a website, take a snapshot for analysis, or show users what a webpage looks like.

Input Schema

TableJSON Schema
NameRequiredDescriptionDefault
urlYesThe complete HTTP/HTTPS URL of the webpage to capture (e.g., 'https://example.com')
firstScreenOnlyNoSet to true for a single screen capture (faster), false for full page capture including content below the fold
return_urlNoSet to true to return screenshot URLs instead of downloading images as base64

Implementation Reference

  • The core handler function that implements capture_screenshot_url tool logic: calls r.jina.ai API with URL, handles response to get screenshot URL, downloads the image, encodes as base64 PNG, and returns as MCP image content with error handling.
    async ({ url, firstScreenOnly }: { url: string; firstScreenOnly?: boolean }) => {
    	try {
    		const props = getProps();
    		const headers: Record<string, string> = {
    			'Accept': 'application/json',
    			'Content-Type': 'application/json',
    			'X-Return-Format': firstScreenOnly === true ? 'screenshot' : 'pageshot',
    		};
    
    		// Add Authorization header if bearer token is available
    		if (props.bearerToken) {
    			headers['Authorization'] = `Bearer ${props.bearerToken}`;
    		}
    
    		const response = await fetch('https://r.jina.ai/', {
    			method: 'POST',
    			headers,
    			body: JSON.stringify({ url }),
    		});
    
    		if (!response.ok) {
    			return handleApiError(response, "Screenshot capture");
    		}
    
    		const data = await response.json() as any;
    
    		// Fetch and return the screenshot as base64-encoded image
    		const imageUrl = data.data.screenshotUrl || data.data.pageshotUrl;
    		if (imageUrl) {
    			try {
    				// Download the image from the URL
    				const imageResponse = await fetch(imageUrl);
    				if (!imageResponse.ok) {
    					return {
    						content: [
    							{
    								type: "text" as const,
    								text: `Error: Failed to download screenshot from ${imageUrl}`,
    							},
    						],
    						isError: true,
    					};
    				}
    
    				// Convert to base64
    				const imageBuffer = await imageResponse.arrayBuffer();
    				const base64Image = Buffer.from(imageBuffer).toString('base64');
    
    				return {
    					content: [
    						{
    							type: "image" as const,
    							data: base64Image,
    							mimeType: "image/png",
    						},
    					],
    				};
    			} catch (downloadError) {
    				return {
    					content: [
    						{
    							type: "text" as const,
    							text: `Error downloading screenshot: ${downloadError instanceof Error ? downloadError.message : String(downloadError)}`,
    						},
    					],
    					isError: true,
    				};
    			}
    		} else {
    			return {
    				content: [
    					{
    						type: "text" as const,
    						text: "Error: No screenshot URL received from API",
    					},
    				],
    				isError: true,
    			};
    		}
    	} catch (error) {
    		return {
    			content: [
    				{
    					type: "text" as const,
    					text: `Error: ${error instanceof Error ? error.message : String(error)}`,
    				},
    			],
    			isError: true,
    		};
    	}
    },
  • Zod schema defining input parameters for the tool: required 'url' (string URL) and optional 'firstScreenOnly' (boolean).
    {
    	url: z.string().url().describe("The complete HTTP/HTTPS URL of the webpage to capture (e.g., 'https://example.com')"),
    	firstScreenOnly: z.boolean().optional().describe("Set to true for a single screen capture (faster), false for full page capture including content below the fold (default: false)")
    },
  • Registers the 'capture_screenshot_url' tool on the MCP server using server.tool(), providing name, description, input schema, and handler function.
    server.tool(
    	"capture_screenshot_url",
    	"Capture high-quality screenshots of web pages. Use this tool when you need to visually inspect a website, take a snapshot for analysis, or show users what a webpage looks like. Returns the screenshot as a base64-encoded PNG image that can be displayed directly.",
    	{
    		url: z.string().url().describe("The complete HTTP/HTTPS URL of the webpage to capture (e.g., 'https://example.com')"),
    		firstScreenOnly: z.boolean().optional().describe("Set to true for a single screen capture (faster), false for full page capture including content below the fold (default: false)")
    	},
    	async ({ url, firstScreenOnly }: { url: string; firstScreenOnly?: boolean }) => {
    		try {
    			const props = getProps();
    			const headers: Record<string, string> = {
    				'Accept': 'application/json',
    				'Content-Type': 'application/json',
    				'X-Return-Format': firstScreenOnly === true ? 'screenshot' : 'pageshot',
    			};
    
    			// Add Authorization header if bearer token is available
    			if (props.bearerToken) {
    				headers['Authorization'] = `Bearer ${props.bearerToken}`;
    			}
    
    			const response = await fetch('https://r.jina.ai/', {
    				method: 'POST',
    				headers,
    				body: JSON.stringify({ url }),
    			});
    
    			if (!response.ok) {
    				return handleApiError(response, "Screenshot capture");
    			}
    
    			const data = await response.json() as any;
    
    			// Fetch and return the screenshot as base64-encoded image
    			const imageUrl = data.data.screenshotUrl || data.data.pageshotUrl;
    			if (imageUrl) {
    				try {
    					// Download the image from the URL
    					const imageResponse = await fetch(imageUrl);
    					if (!imageResponse.ok) {
    						return {
    							content: [
    								{
    									type: "text" as const,
    									text: `Error: Failed to download screenshot from ${imageUrl}`,
    								},
    							],
    							isError: true,
    						};
    					}
    
    					// Convert to base64
    					const imageBuffer = await imageResponse.arrayBuffer();
    					const base64Image = Buffer.from(imageBuffer).toString('base64');
    
    					return {
    						content: [
    							{
    								type: "image" as const,
    								data: base64Image,
    								mimeType: "image/png",
    							},
    						],
    					};
    				} catch (downloadError) {
    					return {
    						content: [
    							{
    								type: "text" as const,
    								text: `Error downloading screenshot: ${downloadError instanceof Error ? downloadError.message : String(downloadError)}`,
    							},
    						],
    						isError: true,
    					};
    				}
    			} else {
    				return {
    					content: [
    						{
    							type: "text" as const,
    							text: "Error: No screenshot URL received from API",
    						},
    					],
    					isError: true,
    				};
    			}
    		} catch (error) {
    			return {
    				content: [
    					{
    						type: "text" as const,
    						text: `Error: ${error instanceof Error ? error.message : String(error)}`,
    					},
    				],
    				isError: true,
    			};
    		}
    	},
    );
  • src/index.ts:21-21 (registration)
    Calls registerJinaTools in the MCP agent's init() method, which registers all Jina tools including capture_screenshot_url.
    registerJinaTools(this.server, () => this.props);
Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. While it mentions the output format (base64 JPEG), it doesn't cover important aspects like performance characteristics (e.g., timeouts, size limits), error handling, authentication requirements, or rate limits. For a tool that interacts with external websites, this is a significant gap in transparency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is perfectly concise - two sentences that efficiently communicate the tool's purpose and usage scenarios. Every word earns its place with no redundancy or unnecessary elaboration.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool with 3 parameters, 100% schema coverage, but no annotations or output schema, the description provides adequate purpose and usage context. However, it lacks important behavioral details (performance, errors, limits) that would be crucial for an agent to use this tool effectively in production scenarios.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema description coverage is 100%, so all parameters are well-documented in the schema itself. The description doesn't add any additional parameter semantics beyond what's in the schema (e.g., it doesn't explain trade-offs between firstScreenOnly options or when to use return_url). Baseline 3 is appropriate when the schema does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('capture screenshots'), resource ('web pages'), and output format ('base64 encoded JPEG format'). It distinguishes from siblings like 'read_url' or 'parallel_read_url' by focusing on visual capture rather than text extraction or parallel processing.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for when to use the tool ('visually inspect a website, take a snapshot for analysis, or show users what a webpage looks like'), which helps differentiate it from text-based or search-oriented siblings. However, it doesn't explicitly state when NOT to use it or name specific alternatives among the siblings.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

Install Server

Other Tools

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/acchuang/jina-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server