Skip to main content
Glama
wlmwwx

Jina AI Remote MCP Server

by wlmwwx

capture_screenshot_url

Capture web page screenshots in base64 JPEG format for visual inspection, analysis, or demonstration. Specify URLs and choose between single-screen or full-page capture.

Instructions

Capture high-quality screenshots of web pages in base64 encoded JPEG format. Use this tool when you need to visually inspect a website, take a snapshot for analysis, or show users what a webpage looks like.

Input Schema

TableJSON Schema
NameRequiredDescriptionDefault
urlYesThe complete HTTP/HTTPS URL of the webpage to capture (e.g., 'https://example.com')
firstScreenOnlyNoSet to true for a single screen capture (faster), false for full page capture including content below the fold
return_urlNoSet to true to return screenshot URLs instead of downloading images as base64

Implementation Reference

  • Full implementation of the capture_screenshot_url tool handler, including input schema validation with Zod, API call to r.jina.ai for screenshot generation, optional image downloading/processing with downloadImages utility, and error handling. This is the core execution logic.
    server.tool(
    	"capture_screenshot_url",
    	"Capture high-quality screenshots of web pages in base64 encoded JPEG format. Use this tool when you need to visually inspect a website, take a snapshot for analysis, or show users what a webpage looks like.",
    	{
    		url: z.string().url().describe("The complete HTTP/HTTPS URL of the webpage to capture (e.g., 'https://example.com')"),
    		firstScreenOnly: z.boolean().default(false).describe("Set to true for a single screen capture (faster), false for full page capture including content below the fold"),
    		return_url: z.boolean().default(false).describe("Set to true to return screenshot URLs instead of downloading images as base64")
    	},
    	async ({ url, firstScreenOnly, return_url }: { url: string; firstScreenOnly: boolean; return_url: boolean }) => {
    		try {
    			const props = getProps();
    			const headers: Record<string, string> = {
    				'Accept': 'application/json',
    				'Content-Type': 'application/json',
    				'X-Return-Format': firstScreenOnly === true ? 'screenshot' : 'pageshot',
    			};
    
    			// Add Authorization header if bearer token is available
    			if (props.bearerToken) {
    				headers['Authorization'] = `Bearer ${props.bearerToken}`;
    			}
    
    			const response = await fetch('https://r.jina.ai/', {
    				method: 'POST',
    				headers,
    				body: JSON.stringify({ url }),
    			});
    
    			if (!response.ok) {
    				return handleApiError(response, "Screenshot capture");
    			}
    
    			const data = await response.json() as any;
    
    			// Get the screenshot URL from the response
    			const imageUrl = data.data.screenshotUrl || data.data.pageshotUrl;
    			if (!imageUrl) {
    				throw new Error("No screenshot URL received from API");
    			}
    
    			// Prepare response content - always return as list structure for consistency
    			const contentItems: Array<{ type: 'text'; text: string } | { type: 'image'; data: string; mimeType: string }> = [];
    
    			if (return_url) {
    				// Return the URL as text
    				contentItems.push({
    					type: "text" as const,
    					text: imageUrl,
    				});
    			} else {
    				// Download and process the image (resize to max 800px, convert to JPEG)
    				const processedResults = await downloadImages(imageUrl, 1, 10000);
    				const processedResult = processedResults[0];
    
    				if (!processedResult.success) {
    					throw new Error(`Failed to process screenshot: ${processedResult.error}`);
    				}
    
    				contentItems.push({
    					type: "image" as const,
    					data: processedResult.data!,
    					mimeType: "image/jpeg",
    				});
    			}
    
    			return {
    				content: contentItems,
    			};
    		} catch (error) {
    			return createErrorResponse(`Error: ${error instanceof Error ? error.message : String(error)}`);
    		}
    	},
    );
  • Zod schema for input parameters of capture_screenshot_url tool: url (required URL), firstScreenOnly (boolean, default false for full page), return_url (boolean, default false for base64 image).
    {
    	url: z.string().url().describe("The complete HTTP/HTTPS URL of the webpage to capture (e.g., 'https://example.com')"),
    	firstScreenOnly: z.boolean().default(false).describe("Set to true for a single screen capture (faster), false for full page capture including content below the fold"),
    	return_url: z.boolean().default(false).describe("Set to true to return screenshot URLs instead of downloading images as base64")
    },
  • src/index.ts:21-21 (registration)
    Top-level registration of all Jina tools, including capture_screenshot_url, by calling registerJinaTools on the MCP server instance.
    registerJinaTools(this.server, () => this.props);
Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden of behavioral disclosure. While it mentions the output format (base64 JPEG), it doesn't cover critical aspects like performance implications (e.g., speed differences between firstScreenOnly options), potential failures (e.g., timeouts, unsupported sites), authentication needs, or rate limits. The description is insufficient for a mutation-like tool (capturing external resources).

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is efficiently structured in two sentences: the first states the core functionality and output format, the second provides usage scenarios. Every sentence adds value with zero wasted words, making it easy to parse and front-loaded with essential information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (interacting with external websites, multiple parameters) and lack of annotations and output schema, the description is incomplete. It doesn't address behavioral aspects like error handling, performance, or security considerations, which are crucial for a tool that captures web content. The description should provide more context to compensate for missing structured data.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents all three parameters thoroughly. The description doesn't add any parameter-specific information beyond what's in the schema (e.g., it doesn't explain trade-offs between firstScreenOnly options or when to use return_url). Baseline 3 is appropriate when the schema does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with specific verbs ('capture screenshots') and resources ('web pages'), specifying the output format ('base64 encoded JPEG format'). It distinguishes from sibling tools like 'read_url' or 'search_web' by focusing on visual capture rather than text extraction or search.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for when to use the tool ('visually inspect a website, take a snapshot for analysis, or show users what a webpage looks like'), giving practical scenarios. However, it doesn't explicitly state when NOT to use it or name specific alternatives among siblings (e.g., 'read_url' for text content).

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

Install Server

Other Tools

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/wlmwwx/jina-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server