
Jina AI Remote MCP Server

by wlmwwx

parallel_read_url

Extract clean content from multiple web pages simultaneously to compare sources or gather information efficiently. Supports up to 5 URLs with options for links and images.

Instructions

Read multiple web pages in parallel to extract clean content efficiently. For best results, provide multiple URLs that you need to extract simultaneously. This is useful for comparing content across multiple sources or gathering information from multiple pages at once.

Input Schema

Name     Required  Description                                                                        Default
urls     Yes       Array of URL configurations to read in parallel (maximum 5 URLs for performance)   —
timeout  No        Timeout in milliseconds for all URL reads                                          30000
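As a concrete illustration, a tool-call arguments object conforming to this schema might look like the following (the URLs are hypothetical placeholders):

```typescript
// Hypothetical arguments for a parallel_read_url call.
// Field names follow the input schema above.
const args = {
  urls: [
    { url: "https://example.com/a", withAllLinks: true, withAllImages: false },
    { url: "https://example.com/b", withAllLinks: false, withAllImages: false },
  ],
  timeout: 30000, // milliseconds, applies to the whole batch
};

console.log(args.urls.length); // 2 — must stay at or below the schema cap of 5
```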

Implementation Reference

  • The complete handler and registration block for the 'parallel_read_url' tool, including the Zod input schema, the tool description, and the execution logic that performs input validation, URL deduplication, parallel execution via a utility function, and response formatting.
    server.tool(
    	"parallel_read_url",
    	"Read multiple web pages in parallel to extract clean content efficiently. For best results, provide multiple URLs that you need to extract simultaneously. This is useful for comparing content across multiple sources or gathering information from multiple pages at once. 💡 Use this when you need to analyze multiple sources simultaneously for efficiency.",
    	{
    		urls: z.array(z.object({
    			url: z.string().url().describe("The complete URL of the webpage or PDF file to read and convert"),
    			withAllLinks: z.boolean().default(false).describe("Set to true to extract and return all hyperlinks found on the page as structured data"),
    			withAllImages: z.boolean().default(false).describe("Set to true to extract and return all images found on the page as structured data")
    		})).max(5).describe("Array of URL configurations to read in parallel (maximum 5 URLs for optimal performance)"),
    		timeout: z.number().default(30000).describe("Timeout in milliseconds for all URL reads")
    	},
    	async ({ urls, timeout }: { urls: Array<{ url: string; withAllLinks: boolean; withAllImages: boolean }>; timeout: number }) => {
    		try {
    			const props = getProps();
    
    			const uniqueUrls = urls.filter((urlConfig, index, self) =>
    				index === self.findIndex(u => u.url === urlConfig.url)
    			);
    
    			// Import the utility functions
    			const { executeParallelUrlReads } = await import("../utils/read.js");
    
    			// Execute parallel URL reads using the utility
    			const results = await executeParallelUrlReads(uniqueUrls, props.bearerToken, timeout);
    
    			// Format results for consistent output
    			const contentItems: Array<{ type: 'text'; text: string }> = [];
    
    			for (const result of results) {
    				if ('success' in result && result.success) {
    					contentItems.push({
    						type: "text" as const,
    						text: yamlStringify(result.structuredData),
    					});
    				}
    			}
    
    			return {
    				content: contentItems,
    			};
    		} catch (error) {
    			return createErrorResponse(`Error: ${error instanceof Error ? error.message : String(error)}`);
    		}
    	},
    );
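The deduplication step inside the handler keeps only the first occurrence of each URL. The same `filter`/`findIndex` pattern can be isolated into a standalone sketch (the helper name is hypothetical, not from the repository):

```typescript
// Hypothetical standalone version of the first-occurrence dedup above.
type UrlConfig = { url: string };

function dedupeByUrl<T extends UrlConfig>(configs: T[]): T[] {
  // Keep a config only if it is the first entry with that URL.
  return configs.filter(
    (c, i, self) => i === self.findIndex((u) => u.url === c.url)
  );
}

const out = dedupeByUrl([
  { url: "https://a.com" },
  { url: "https://b.com" },
  { url: "https://a.com" }, // duplicate, dropped
]);
console.log(out.length); // 2
```

The pattern is quadratic, but with the schema capping the array at 5 entries that is irrelevant in practice.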
  • Helper utility to execute multiple single-URL reads in parallel with a global timeout, using Promise.all raced against a timeout promise.
    export async function executeParallelUrlReads(
        urlConfigs: ReadUrlConfig[],
        bearerToken?: string,
        timeout: number = 30000
    ): Promise<ReadUrlResponse[]> {
        const timeoutPromise = new Promise<never>((_, reject) =>
            setTimeout(() => reject(new Error('Parallel URL read timeout')), timeout)
        );
    
        const readPromises = urlConfigs.map(urlConfig => readUrlFromConfig(urlConfig, bearerToken));
    
        return Promise.race([
            Promise.all(readPromises),
            timeoutPromise
        ]);
    }
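One behavior worth noting: `Promise.race` rejects when the timeout fires, but it does not abort the losing promises, so any in-flight URL reads keep running in the background. A minimal standalone sketch (hypothetical names, no network) demonstrates this:

```typescript
// Sketch (not part of the server code): Promise.race with a timeout does not
// cancel the losing promise; the slow task still completes after the race rejects.
const delay = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

let slowFinished = false;

async function demo(): Promise<string> {
  // "slow" stands in for an in-flight URL read that outlives the timeout.
  const slow = delay(50).then(() => { slowFinished = true; });
  const timeout = new Promise<never>((_, reject) =>
    setTimeout(() => reject(new Error("Parallel URL read timeout")), 10)
  );
  try {
    await Promise.race([slow, timeout]);
    return "finished in time";
  } catch {
    await delay(60); // give the losing promise time to settle
    return slowFinished ? "timed out, slow still completed" : "timed out";
  }
}

demo().then(console.log); // "timed out, slow still completed"
```

Truly cancelling the reads on timeout would require threading an `AbortController` signal into each `fetch` call, which this utility does not do.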
  • Core helper function for reading a single URL: normalizes URL, sends POST to r.jina.ai with custom headers for auth/links/images, parses JSON response into structured data (url, title, content, optional links/images).
    export async function readUrlFromConfig(
        urlConfig: ReadUrlConfig,
        bearerToken?: string
    ): Promise<ReadUrlResponse> {
        try {
            // Normalize the URL first
            const normalizedUrl = normalizeUrl(urlConfig.url);
            if (!normalizedUrl) {
                return { error: "Invalid or unsupported URL", url: urlConfig.url };
            }
    
            const headers: Record<string, string> = {
                'Accept': 'application/json',
                'Content-Type': 'application/json',
                'X-Md-Link-Style': 'discarded',
            };
    
            // Add Authorization header if bearer token is available
            if (bearerToken) {
                headers['Authorization'] = `Bearer ${bearerToken}`;
            }
    
            if (urlConfig.withAllLinks) {
                headers['X-With-Links-Summary'] = 'all';
            }
    
            if (urlConfig.withAllImages) {
                headers['X-With-Images-Summary'] = 'true';
            } else {
                headers['X-Retain-Images'] = 'none';
            }
    
            const response = await fetch('https://r.jina.ai/', {
                method: 'POST',
                headers,
                body: JSON.stringify({ url: normalizedUrl }),
            });
    
            if (!response.ok) {
                return { error: `HTTP ${response.status}: ${response.statusText}`, url: urlConfig.url };
            }
    
            const data = await response.json() as any;
    
            if (!data.data) {
                return { error: "Invalid response data from r.jina.ai", url: urlConfig.url };
            }
    
            // Prepare structured data
            const structuredData: any = {
                url: data.data.url,
                title: data.data.title,
            };
    
            if (urlConfig.withAllLinks && data.data.links) {
                structuredData.links = data.data.links.map((link: [string, string]) => ({
                    anchorText: link[0],
                    url: link[1]
                }));
            }
    
            if (urlConfig.withAllImages && data.data.images) {
                structuredData.images = data.data.images;
            }
            structuredData.content = data.data.content || "";
    
            return {
                success: true,
                url: urlConfig.url,
                structuredData,
                withAllLinks: urlConfig.withAllLinks || false,
                withAllImages: urlConfig.withAllImages || false
            };
        } catch (error) {
            return {
                error: error instanceof Error ? error.message : String(error),
                url: urlConfig.url
            };
        }
    }
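The header branching in `readUrlFromConfig` can be isolated into a pure function, which makes the auth/links/images combinations easy to test without a network call. This is a hypothetical refactoring sketch, not code from the repository:

```typescript
// Hypothetical helper mirroring the header branching in readUrlFromConfig.
type ReadFlags = { withAllLinks?: boolean; withAllImages?: boolean };

function buildReaderHeaders(
  flags: ReadFlags,
  bearerToken?: string
): Record<string, string> {
  const headers: Record<string, string> = {
    Accept: "application/json",
    "Content-Type": "application/json",
    "X-Md-Link-Style": "discarded",
  };
  if (bearerToken) headers["Authorization"] = `Bearer ${bearerToken}`;
  if (flags.withAllLinks) headers["X-With-Links-Summary"] = "all";
  if (flags.withAllImages) {
    headers["X-With-Images-Summary"] = "true";
  } else {
    headers["X-Retain-Images"] = "none"; // images stripped by default
  }
  return headers;
}

console.log(buildReaderHeaders({ withAllLinks: true }, "jina_xxx"));
```

Note the asymmetry: `X-Retain-Images: none` is sent only when images are not requested, so the two image-related headers are mutually exclusive.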
  • Zod input schema for the parallel_read_url tool.
    {
    	urls: z.array(z.object({
    		url: z.string().url().describe("The complete URL of the webpage or PDF file to read and convert"),
    		withAllLinks: z.boolean().default(false).describe("Set to true to extract and return all hyperlinks found on the page as structured data"),
    		withAllImages: z.boolean().default(false).describe("Set to true to extract and return all images found on the page as structured data")
    	})).max(5).describe("Array of URL configurations to read in parallel (maximum 5 URLs for optimal performance)"),
    	timeout: z.number().default(30000).describe("Timeout in milliseconds for all URL reads")
    }

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden of behavioral disclosure. While it mentions 'extract clean content efficiently' and 'optimal performance' with a maximum of 5 URLs, it doesn't disclose important behavioral traits such as error handling (failed URLs are silently omitted from the output), rate limits, authentication needs, or what 'clean content' specifically means. The description adds some context but leaves significant gaps for a tool that performs web requests.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is appropriately sized with three sentences that each serve a purpose: stating the core function, providing usage guidance, and explaining benefits. It's front-loaded with the main purpose first. While efficient, the third sentence could be slightly more concise by combining the two use cases.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no annotations and no output schema, the description should do more to compensate. It adequately covers the purpose and basic usage but lacks details about behavioral traits, error handling, and what 'clean content' extraction entails. For a web reading tool with potential complexity around failures and content processing, this leaves important gaps despite the good schema coverage.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already fully documents both parameters (urls array with maxItems:5 and timeout with default). The description adds marginal value by reinforcing the 'multiple URLs' concept and 'optimal performance' with up to 5 URLs, but doesn't provide additional semantic meaning beyond what's in the schema descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Read multiple web pages in parallel to extract clean content efficiently.' It specifies the verb ('read'), resource ('multiple web pages'), and key behavior ('in parallel'), distinguishing it from sibling tools like 'read_url' (singular) and 'parallel_search_web' (searching rather than reading/extracting).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for when to use this tool: 'For best results, provide multiple URLs that you need to extract simultaneously' and 'useful for comparing content across multiple sources or gathering information from multiple pages at once.' It implies this is for parallel extraction of multiple pages, but doesn't explicitly state when NOT to use it or name alternatives like 'read_url' for single URLs.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
