Jina AI Remote MCP Server

Official
by jina-ai

parallel_read_url

Extract clean content from multiple web pages simultaneously to compare information across sources or gather data from several pages at once.

Instructions

Read multiple web pages in parallel to extract clean content efficiently. For best results, provide multiple URLs that you need to extract simultaneously. This is useful for comparing content across multiple sources or gathering information from multiple pages at once.

Input Schema

| Name    | Required | Description                                                                              | Default |
| ------- | -------- | ---------------------------------------------------------------------------------------- | ------- |
| urls    | Yes      | Array of URL configurations to read in parallel (maximum 5 URLs for optimal performance) | —       |
| timeout | No       | Timeout in milliseconds for all URL reads                                                | 30000   |
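
A minimal sketch of valid arguments for this tool, assuming the schema above (the URLs are placeholders; `withAllLinks` and `withAllImages` default to `false` when omitted, and `timeout` defaults to 30000 ms):

```json
{
  "urls": [
    { "url": "https://example.com/a", "withAllLinks": true, "withAllImages": false },
    { "url": "https://example.com/b" }
  ],
  "timeout": 30000
}
```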

Implementation Reference

  • The main handler function for the 'parallel_read_url' tool. Deduplicates URLs, dynamically imports and calls the parallel read utility, processes results into YAML-formatted text content items, applies token guardrail, and handles errors.
    async ({ urls, timeout }: { urls: Array<{ url: string; withAllLinks: boolean; withAllImages: boolean }>; timeout: number }) => {
    	try {
    		const props = getProps();
    
    		const uniqueUrls = urls.filter((urlConfig, index, self) =>
    			index === self.findIndex(u => u.url === urlConfig.url)
    		);
    
    		// Import the utility functions
    		const { executeParallelUrlReads } = await import("../utils/read.js");
    
    		// Execute parallel URL reads using the utility
    		const results = await executeParallelUrlReads(uniqueUrls, props.bearerToken, timeout);
    
    		// Format results for consistent output
    		const contentItems: Array<{ type: 'text'; text: string }> = [];
    
    		for (const result of results) {
    			if ('success' in result && result.success) {
    				contentItems.push({
    					type: "text" as const,
    					text: yamlStringify(result.structuredData),
    				});
    			}
    		}
    
    		return applyTokenGuardrail({
    			content: contentItems,
    		}, props.bearerToken, getClientName());
    	} catch (error) {
    		return createErrorResponse(`Error: ${error instanceof Error ? error.message : String(error)}`);
    	}
    },
  • Zod input schema defining the parameters for the parallel_read_url tool: array of up to 5 URL configs with optional link/image extraction flags, and timeout.
    {
    	urls: z.array(z.object({
    		url: z.string().url().describe("The complete URL of the webpage or PDF file to read and convert"),
    		withAllLinks: z.boolean().default(false).describe("Set to true to extract and return all hyperlinks found on the page as structured data"),
    		withAllImages: z.boolean().default(false).describe("Set to true to extract and return all images found on the page as structured data")
    	})).max(5).describe("Array of URL configurations to read in parallel (maximum 5 URLs for optimal performance)"),
    	timeout: z.number().default(30000).describe("Timeout in milliseconds for all URL reads")
    },
  • Conditional registration of the parallel_read_url tool on the MCP server using server.tool(), including name, description, and references to schema and handler.
    if (isToolEnabled("parallel_read_url")) {
    	server.tool(
    		"parallel_read_url",
  • Helper utility that runs multiple single-URL reads in parallel via Promise.all, wrapped in a timeout promise for overall operation timeout control.
    export async function executeParallelUrlReads(
        urlConfigs: ReadUrlConfig[],
        bearerToken?: string,
        timeout: number = 30000
    ): Promise<ReadUrlResponse[]> {
        const timeoutPromise = new Promise<never>((_, reject) =>
            setTimeout(() => reject(new Error('Parallel URL read timeout')), timeout)
        );
    
        const readPromises = urlConfigs.map(urlConfig => readUrlFromConfig(urlConfig, bearerToken));
    
        return Promise.race([
            Promise.all(readPromises),
            timeoutPromise
        ]);
    }
  • Core helper for single URL content extraction: normalizes URL, sets custom headers for Jina reader API (r.jina.ai), handles auth/links/images options, parses response into structured data (title, content, optional links/images), returns success/error.
    export async function readUrlFromConfig(
        urlConfig: ReadUrlConfig,
        bearerToken?: string
    ): Promise<ReadUrlResponse> {
        try {
            // Normalize the URL first
            const normalizedUrl = normalizeUrl(urlConfig.url);
            if (!normalizedUrl) {
                return { error: "Invalid or unsupported URL", url: urlConfig.url };
            }
    
            const headers: Record<string, string> = {
                'Accept': 'application/json',
                'Content-Type': 'application/json',
                'X-Md-Link-Style': 'discarded',
            };
    
            // Add Authorization header if bearer token is available
            if (bearerToken) {
                headers['Authorization'] = `Bearer ${bearerToken}`;
            }
    
            if (urlConfig.withAllLinks) {
                headers['X-With-Links-Summary'] = 'all';
            }
    
            if (urlConfig.withAllImages) {
                headers['X-With-Images-Summary'] = 'true';
            } else {
                headers['X-Retain-Images'] = 'none';
            }
    
            const response = await fetch('https://r.jina.ai/', {
                method: 'POST',
                headers,
                body: JSON.stringify({ url: normalizedUrl }),
            });
    
            if (!response.ok) {
                return { error: `HTTP ${response.status}: ${response.statusText}`, url: urlConfig.url };
            }
    
            const data = await response.json() as any;
    
            if (!data.data) {
                return { error: "Invalid response data from r.jina.ai", url: urlConfig.url };
            }
    
            // Prepare structured data
            const structuredData: any = {
                url: data.data.url,
                title: data.data.title,
            };
    
            if (urlConfig.withAllLinks && data.data.links) {
                structuredData.links = data.data.links.map((link: [string, string]) => ({
                    anchorText: link[0],
                    url: link[1]
                }));
            }
    
            if (urlConfig.withAllImages && data.data.images) {
                structuredData.images = data.data.images;
            }
            structuredData.content = data.data.content || "";
    
            return {
                success: true,
                url: urlConfig.url,
                structuredData,
                withAllLinks: urlConfig.withAllLinks || false,
                withAllImages: urlConfig.withAllImages || false
            };
        } catch (error) {
            return {
                error: error instanceof Error ? error.message : String(error),
                url: urlConfig.url
            };
        }
    }
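
The overall-timeout pattern in `executeParallelUrlReads` can be sketched in isolation. This is a standalone sketch, not the library's API (the `withTimeout` helper name is mine): all tasks run concurrently via `Promise.all`, raced against a single deadline. One caveat the original omits is that `Promise.race` leaves the losing promise pending, so the sketch clears the timer once the race settles:

```typescript
// Run a set of promises under one overall deadline, mirroring the
// Promise.race([Promise.all(...), timeout]) pattern used above.
async function withTimeout<T>(tasks: Promise<T>[], ms: number): Promise<T[]> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error("Parallel URL read timeout")), ms);
  });
  try {
    // First settlement wins: either every task resolves, or the deadline fires.
    return await Promise.race([Promise.all(tasks), timeout]);
  } finally {
    clearTimeout(timer); // avoid a dangling timer after the race settles
  }
}

// Usage: two already-resolved tasks comfortably beat a one-second deadline.
withTimeout([Promise.resolve(1), Promise.resolve(2)], 1000)
  .then((results) => console.log(results)); // [ 1, 2 ]
```

Because the deadline covers the whole batch rather than each URL individually, one slow page can time out the entire call even if the other reads have finished.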
Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It mentions efficiency benefits and the parallel nature of the operation, but doesn't disclose important behavioral traits like error handling, rate limits, authentication requirements, or what happens when URLs fail. It mentions 'optimal performance' with max 5 URLs but doesn't explain consequences of exceeding this limit.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is perfectly concise with three sentences that each earn their place: first states the core functionality, second provides usage guidance, third gives concrete use cases. It's front-loaded with the main purpose and wastes no words while being comprehensive.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no annotations and no output schema, the description should do more to explain behavioral aspects and expected outputs. While it covers purpose and usage well, it doesn't describe what the tool returns (clean content format, error responses, or structured data from links/images). For a tool with 2 parameters and no annotation coverage, the description is adequate but leaves gaps in behavioral transparency.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents both parameters thoroughly. The description adds minimal parameter semantics beyond what's in the schema: it mentions 'multiple URLs', which aligns with the 'urls' parameter, but provides no additional context about parameter usage or relationships beyond what the schema already specifies.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with specific verbs ('read multiple web pages in parallel', 'extract clean content efficiently') and distinguishes it from sibling tools like 'read_url' by emphasizing parallel processing and multi-URL capability. It explicitly mentions the resource ('web pages') and the efficiency benefit.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit guidance on when to use this tool: 'provide multiple URLs that you need to extract simultaneously' and gives concrete use cases ('comparing content across multiple sources', 'gathering information from multiple pages at once'). It distinguishes from single-URL alternatives by emphasizing parallel processing for multiple URLs.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.


MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/jina-ai/MCP'

If you have feedback or need assistance with the MCP directory API, please join our Discord server