# parallel_read_url
Extract clean content from multiple web pages simultaneously to compare information across sources or gather data from several pages at once.
## Instructions

Read multiple web pages in parallel to extract clean content efficiently. Provide all the URLs you need in a single call; this is useful for comparing content across multiple sources or gathering information from several pages at once.
## Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| urls | Yes | Array of URL configurations to read in parallel (maximum 5 URLs for optimal performance) | |
| timeout | No | Timeout in milliseconds for all URL reads | 30000 |
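For illustration, a request body matching this schema might look like the following (the URLs are placeholders):

```json
{
  "urls": [
    { "url": "https://example.com/a", "withAllLinks": true, "withAllImages": false },
    { "url": "https://example.com/b", "withAllLinks": false, "withAllImages": false }
  ],
  "timeout": 30000
}
```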
## Implementation Reference
- `src/tools/jina-tools.ts:783-815` (handler) — The main handler function for the `parallel_read_url` tool. It deduplicates URLs, dynamically imports and calls the parallel read utility, formats each successful result as a YAML text content item, applies the token guardrail, and handles errors.

```typescript
async ({ urls, timeout }: { urls: Array<{ url: string; withAllLinks: boolean; withAllImages: boolean }>; timeout: number }) => {
  try {
    const props = getProps();
    const uniqueUrls = urls.filter((urlConfig, index, self) =>
      index === self.findIndex(u => u.url === urlConfig.url)
    );

    // Import the utility functions
    const { executeParallelUrlReads } = await import("../utils/read.js");

    // Execute parallel URL reads using the utility
    const results = await executeParallelUrlReads(uniqueUrls, props.bearerToken, timeout);

    // Format results for consistent output
    const contentItems: Array<{ type: 'text'; text: string }> = [];
    for (const result of results) {
      if ('success' in result && result.success) {
        contentItems.push({
          type: "text" as const,
          text: yamlStringify(result.structuredData),
        });
      }
    }

    return applyTokenGuardrail({
      content: contentItems,
    }, props.bearerToken, getClientName());
  } catch (error) {
    return createErrorResponse(`Error: ${error instanceof Error ? error.message : String(error)}`);
  }
},
```
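The deduplication step in the handler uses the common `filter`/`findIndex` idiom, which keeps only the first occurrence of each distinct URL. A minimal standalone sketch of that pattern (the sample configs are made up for illustration):

```typescript
interface UrlConfig {
  url: string;
  withAllLinks: boolean;
  withAllImages: boolean;
}

// Keep only the first config for each distinct URL (same idiom as the handler).
function dedupeByUrl(configs: UrlConfig[]): UrlConfig[] {
  return configs.filter((c, index, self) =>
    index === self.findIndex(u => u.url === c.url)
  );
}

const deduped = dedupeByUrl([
  { url: "https://example.com", withAllLinks: true, withAllImages: false },
  { url: "https://example.com", withAllLinks: false, withAllImages: false },
  { url: "https://example.org", withAllLinks: false, withAllImages: false },
]);
// The first occurrence wins, so the flags of the first example.com entry are retained.
```

Note that `findIndex` makes this O(n²); for the tool's cap of 5 URLs that cost is negligible.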
- `src/tools/jina-tools.ts:775-782` (schema) — Zod input schema defining the parameters for the `parallel_read_url` tool: an array of up to 5 URL configs with optional link/image extraction flags, plus a timeout.

```typescript
{
  urls: z.array(z.object({
    url: z.string().url().describe("The complete URL of the webpage or PDF file to read and convert"),
    withAllLinks: z.boolean().default(false).describe("Set to true to extract and return all hyperlinks found on the page as structured data"),
    withAllImages: z.boolean().default(false).describe("Set to true to extract and return all images found on the page as structured data")
  })).max(5).describe("Array of URL configurations to read in parallel (maximum 5 URLs for optimal performance)"),
  timeout: z.number().default(30000).describe("Timeout in milliseconds for all URL reads")
},
```
- `src/tools/jina-tools.ts:771-773` (registration) — Conditional registration of the `parallel_read_url` tool on the MCP server via `server.tool()`, passing the name, description, schema, and handler.

```typescript
if (isToolEnabled("parallel_read_url")) {
  server.tool(
    "parallel_read_url",
```
- `src/utils/read.ts:119-134` (helper) — Utility that runs multiple single-URL reads in parallel via `Promise.all`, raced against a timeout promise for overall operation timeout control.

```typescript
export async function executeParallelUrlReads(
  urlConfigs: ReadUrlConfig[],
  bearerToken?: string,
  timeout: number = 30000
): Promise<ReadUrlResponse[]> {
  const timeoutPromise = new Promise<never>((_, reject) =>
    setTimeout(() => reject(new Error('Parallel URL read timeout')), timeout)
  );

  const readPromises = urlConfigs.map(urlConfig => readUrlFromConfig(urlConfig, bearerToken));

  return Promise.race([
    Promise.all(readPromises),
    timeoutPromise
  ]);
}
```
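The timeout wrapper above is the standard `Promise.race` pattern: whichever settles first, the batch of reads or the timer rejection, decides the outcome. A self-contained sketch of the same pattern (with the timer cleared on completion, a refinement not present in the source; `withTimeout` and `demo` are illustrative names):

```typescript
// Race a promise against a deadline; clear the timer so it cannot keep the process alive.
function withTimeout<T>(promise: Promise<T>, ms: number, message = "timeout"): Promise<T> {
  let timer: NodeJS.Timeout | undefined;
  const timeoutPromise = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(message)), ms);
  });
  return Promise.race([promise, timeoutPromise]).finally(() => clearTimeout(timer));
}

// Usage: every read shares one deadline, mirroring executeParallelUrlReads.
async function demo(): Promise<string[]> {
  const fakeRead = (v: string) => new Promise<string>(res => setTimeout(() => res(v), 10));
  return withTimeout(Promise.all([fakeRead("a"), fakeRead("b")]), 1000, "Parallel URL read timeout");
}
```

One consequence of this design is that the timeout applies to the whole batch rather than per URL: if any single read exceeds the deadline, the entire call rejects.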
- `src/utils/read.ts:35-114` (helper) — Core helper for single-URL content extraction: normalizes the URL, sets custom headers for the Jina reader API (`r.jina.ai`), handles auth/link/image options, and parses the response into structured data (title, content, optional links/images), returning a success or error object.

```typescript
export async function readUrlFromConfig(
  urlConfig: ReadUrlConfig,
  bearerToken?: string
): Promise<ReadUrlResponse> {
  try {
    // Normalize the URL first
    const normalizedUrl = normalizeUrl(urlConfig.url);
    if (!normalizedUrl) {
      return {
        error: "Invalid or unsupported URL",
        url: urlConfig.url
      };
    }

    const headers: Record<string, string> = {
      'Accept': 'application/json',
      'Content-Type': 'application/json',
      'X-Md-Link-Style': 'discarded',
    };

    // Add Authorization header if bearer token is available
    if (bearerToken) {
      headers['Authorization'] = `Bearer ${bearerToken}`;
    }

    if (urlConfig.withAllLinks) {
      headers['X-With-Links-Summary'] = 'all';
    }

    if (urlConfig.withAllImages) {
      headers['X-With-Images-Summary'] = 'true';
    } else {
      headers['X-Retain-Images'] = 'none';
    }

    const response = await fetch('https://r.jina.ai/', {
      method: 'POST',
      headers,
      body: JSON.stringify({ url: normalizedUrl }),
    });

    if (!response.ok) {
      return {
        error: `HTTP ${response.status}: ${response.statusText}`,
        url: urlConfig.url
      };
    }

    const data = await response.json() as any;

    if (!data.data) {
      return {
        error: "Invalid response data from r.jina.ai",
        url: urlConfig.url
      };
    }

    // Prepare structured data
    const structuredData: any = {
      url: data.data.url,
      title: data.data.title,
    };

    if (urlConfig.withAllLinks && data.data.links) {
      structuredData.links = data.data.links.map((link: [string, string]) => ({
        anchorText: link[0],
        url: link[1]
      }));
    }

    if (urlConfig.withAllImages && data.data.images) {
      structuredData.images = data.data.images;
    }

    structuredData.content = data.data.content || "";

    return {
      success: true,
      url: urlConfig.url,
      structuredData,
      withAllLinks: urlConfig.withAllLinks || false,
      withAllImages: urlConfig.withAllImages || false
    };
  } catch (error) {
    return {
      error: error instanceof Error ? error.message : String(error),
      url: urlConfig.url
    };
  }
}
```