# parallel_read_url
Extract clean content from multiple web pages simultaneously to compare sources or gather information efficiently. Supports up to 5 URLs with options for links and images.
## Instructions
Read multiple web pages in parallel to extract clean content efficiently. For best results, provide multiple URLs that you need to extract simultaneously. This is useful for comparing content across multiple sources or gathering information from multiple pages at once.
## Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| urls | Yes | Array of URL configurations to read in parallel (maximum 5 URLs for optimal performance) | |
| timeout | No | Timeout in milliseconds for all URL reads | 30000 |
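A hypothetical invocation might pass arguments shaped like the following (the URLs are placeholders; field names and defaults come from the schema above):

```typescript
// Hypothetical arguments for parallel_read_url (placeholder URLs).
// withAllLinks/withAllImages default to false; timeout defaults to 30000 ms.
const args = {
  urls: [
    { url: "https://example.com/page-a", withAllLinks: true,  withAllImages: false },
    { url: "https://example.com/page-b", withAllLinks: false, withAllImages: false },
  ],
  timeout: 30000, // milliseconds, applies to the whole batch
};
```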
## Implementation Reference
- **src/tools/jina-tools.ts:490-534 (handler)** — The complete handler and registration block for the `parallel_read_url` tool, including the Zod input schema, description, and the execution logic that handles input validation, deduplication, parallel execution via a utility, and response formatting.

  ```ts
  server.tool(
    "parallel_read_url",
    "Read multiple web pages in parallel to extract clean content efficiently. For best results, provide multiple URLs that you need to extract simultaneously. This is useful for comparing content across multiple sources or gathering information from multiple pages at once. 💡 Use this when you need to analyze multiple sources simultaneously for efficiency.",
    {
      urls: z.array(z.object({
        url: z.string().url().describe("The complete URL of the webpage or PDF file to read and convert"),
        withAllLinks: z.boolean().default(false).describe("Set to true to extract and return all hyperlinks found on the page as structured data"),
        withAllImages: z.boolean().default(false).describe("Set to true to extract and return all images found on the page as structured data")
      })).max(5).describe("Array of URL configurations to read in parallel (maximum 5 URLs for optimal performance)"),
      timeout: z.number().default(30000).describe("Timeout in milliseconds for all URL reads")
    },
    async ({ urls, timeout }: { urls: Array<{ url: string; withAllLinks: boolean; withAllImages: boolean }>; timeout: number }) => {
      try {
        const props = getProps();
        const uniqueUrls = urls.filter((urlConfig, index, self) =>
          index === self.findIndex(u => u.url === urlConfig.url)
        );

        // Import the utility functions
        const { executeParallelUrlReads } = await import("../utils/read.js");

        // Execute parallel URL reads using the utility
        const results = await executeParallelUrlReads(uniqueUrls, props.bearerToken, timeout);

        // Format results for consistent output
        const contentItems: Array<{ type: 'text'; text: string }> = [];
        for (const result of results) {
          if ('success' in result && result.success) {
            contentItems.push({
              type: "text" as const,
              text: yamlStringify(result.structuredData),
            });
          }
        }

        return {
          content: contentItems,
        };
      } catch (error) {
        return createErrorResponse(`Error: ${error instanceof Error ? error.message : String(error)}`);
      }
    },
  );
  ```
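The handler deduplicates the input before fanning out, keeping only the first configuration for each distinct URL. A standalone sketch of that `filter`/`findIndex` pattern (the interface name is illustrative):

```typescript
// Hypothetical local type mirroring the tool's per-URL input shape.
interface UrlConfig { url: string; withAllLinks: boolean; withAllImages: boolean }

// Keep only the first config seen for each distinct URL,
// same filter/findIndex logic as the handler.
function dedupeByUrl(urls: UrlConfig[]): UrlConfig[] {
  return urls.filter((urlConfig, index, self) =>
    index === self.findIndex(u => u.url === urlConfig.url)
  );
}

const deduped = dedupeByUrl([
  { url: "https://example.com/a", withAllLinks: true,  withAllImages: false },
  { url: "https://example.com/a", withAllLinks: false, withAllImages: false },
  { url: "https://example.com/b", withAllLinks: false, withAllImages: false },
]);
// Two entries remain; the first config for /a wins.
```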
- **src/utils/read.ts:119-134 (helper)** — Helper utility that executes multiple single-URL reads in parallel under one global timeout, racing `Promise.all` against a timeout promise.

  ```ts
  export async function executeParallelUrlReads(
    urlConfigs: ReadUrlConfig[],
    bearerToken?: string,
    timeout: number = 30000
  ): Promise<ReadUrlResponse[]> {
    const timeoutPromise = new Promise<never>((_, reject) =>
      setTimeout(() => reject(new Error('Parallel URL read timeout')), timeout)
    );

    const readPromises = urlConfigs.map(urlConfig => readUrlFromConfig(urlConfig, bearerToken));

    return Promise.race([
      Promise.all(readPromises),
      timeoutPromise
    ]);
  }
  ```
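Note that the timeout is global, not per URL: because `Promise.all` is raced against a single rejecting timer, one read exceeding the budget fails the whole batch. A minimal self-contained sketch of that behavior (the timings are illustrative):

```typescript
// Race a batch of promises against a single shared timeout,
// mirroring the structure of executeParallelUrlReads.
function withGlobalTimeout<T>(promises: Promise<T>[], timeoutMs: number): Promise<T[]> {
  const timeoutPromise = new Promise<never>((_, reject) =>
    setTimeout(() => reject(new Error('Parallel URL read timeout')), timeoutMs)
  );
  return Promise.race([Promise.all(promises), timeoutPromise]);
}

async function demo(): Promise<string> {
  const after = (v: string, ms: number) =>
    new Promise<string>(resolve => setTimeout(() => resolve(v), ms));
  try {
    // Both settle well under the 200 ms budget: the batch resolves.
    const ok = await withGlobalTimeout([after("a", 10), after("b", 20)], 200);
    if (ok.length !== 2) return "unexpected batch size";
    // One read exceeds the 50 ms budget, so the whole batch rejects.
    await withGlobalTimeout([after("c", 10), after("slow", 500)], 50);
    return "no timeout";
  } catch (e) {
    return (e as Error).message;
  }
}
```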
- **src/utils/read.ts:35-114 (helper)** — Core helper for reading a single URL: normalizes the URL, sends a POST to r.jina.ai with custom headers for auth/links/images, and parses the JSON response into structured data (url, title, content, optional links/images).

  ```ts
  export async function readUrlFromConfig(
    urlConfig: ReadUrlConfig,
    bearerToken?: string
  ): Promise<ReadUrlResponse> {
    try {
      // Normalize the URL first
      const normalizedUrl = normalizeUrl(urlConfig.url);
      if (!normalizedUrl) {
        return { error: "Invalid or unsupported URL", url: urlConfig.url };
      }

      const headers: Record<string, string> = {
        'Accept': 'application/json',
        'Content-Type': 'application/json',
        'X-Md-Link-Style': 'discarded',
      };

      // Add Authorization header if bearer token is available
      if (bearerToken) {
        headers['Authorization'] = `Bearer ${bearerToken}`;
      }

      if (urlConfig.withAllLinks) {
        headers['X-With-Links-Summary'] = 'all';
      }

      if (urlConfig.withAllImages) {
        headers['X-With-Images-Summary'] = 'true';
      } else {
        headers['X-Retain-Images'] = 'none';
      }

      const response = await fetch('https://r.jina.ai/', {
        method: 'POST',
        headers,
        body: JSON.stringify({ url: normalizedUrl }),
      });

      if (!response.ok) {
        return { error: `HTTP ${response.status}: ${response.statusText}`, url: urlConfig.url };
      }

      const data = await response.json() as any;

      if (!data.data) {
        return { error: "Invalid response data from r.jina.ai", url: urlConfig.url };
      }

      // Prepare structured data
      const structuredData: any = {
        url: data.data.url,
        title: data.data.title,
      };

      if (urlConfig.withAllLinks && data.data.links) {
        structuredData.links = data.data.links.map((link: [string, string]) => ({
          anchorText: link[0],
          url: link[1]
        }));
      }

      if (urlConfig.withAllImages && data.data.images) {
        structuredData.images = data.data.images;
      }

      structuredData.content = data.data.content || "";

      return {
        success: true,
        url: urlConfig.url,
        structuredData,
        withAllLinks: urlConfig.withAllLinks || false,
        withAllImages: urlConfig.withAllImages || false
      };
    } catch (error) {
      return {
        error: error instanceof Error ? error.message : String(error),
        url: urlConfig.url
      };
    }
  }
  ```
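The header logic above is pure and easy to check in isolation: `withAllLinks` and `withAllImages` each toggle a dedicated r.jina.ai header, and images are stripped entirely when not requested. A sketch of just that mapping, factored into a standalone function for illustration (the real code builds the headers inline):

```typescript
// Hypothetical options type covering only the header-relevant flags.
interface ReadOptions { withAllLinks?: boolean; withAllImages?: boolean }

// Mirrors the inline header construction in readUrlFromConfig.
function buildReaderHeaders(opts: ReadOptions, bearerToken?: string): Record<string, string> {
  const headers: Record<string, string> = {
    'Accept': 'application/json',
    'Content-Type': 'application/json',
    'X-Md-Link-Style': 'discarded',
  };
  if (bearerToken) headers['Authorization'] = `Bearer ${bearerToken}`;
  if (opts.withAllLinks) headers['X-With-Links-Summary'] = 'all';
  if (opts.withAllImages) {
    headers['X-With-Images-Summary'] = 'true';
  } else {
    headers['X-Retain-Images'] = 'none'; // drop images when not requested
  }
  return headers;
}
```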
- **src/tools/jina-tools.ts:493-499 (schema)** — Zod input schema for the `parallel_read_url` tool.

  ```ts
  {
    urls: z.array(z.object({
      url: z.string().url().describe("The complete URL of the webpage or PDF file to read and convert"),
      withAllLinks: z.boolean().default(false).describe("Set to true to extract and return all hyperlinks found on the page as structured data"),
      withAllImages: z.boolean().default(false).describe("Set to true to extract and return all images found on the page as structured data")
    })).max(5).describe("Array of URL configurations to read in parallel (maximum 5 URLs for optimal performance)"),
    timeout: z.number().default(30000).describe("Timeout in milliseconds for all URL reads")
  }
  ```