batch_crawl
Crawl multiple URLs in parallel to process URL lists, compare web pages, or extract bulk data efficiently.
Instructions
[STATELESS] Crawl multiple URLs concurrently for efficiency. Use when: processing URL lists, comparing multiple pages, or bulk data extraction. Faster than sequential crawling. Max 5 concurrent by default. Each URL gets a fresh browser. Cannot maintain state between URLs. For persistent operations use create_session + crawl.
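For example, a minimal set of arguments for this tool might look like the following sketch; the URLs and option values are placeholders, not taken from the source.

```typescript
// Hypothetical batch_crawl arguments; URLs and values are placeholders.
const args = {
  urls: [
    'https://example.com/page-1',
    'https://example.com/page-2',
    'https://example.com/page-3',
  ],
  max_concurrent: 3,   // lower than the default of 5 to respect rate limits
  remove_images: true, // excludes img, picture, and svg tags from the output
};
```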
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| bypass_cache | No | Bypass cache for all URLs | false |
| max_concurrent | No | Parallel request limit. Higher values are faster but more resource intensive; adjust based on server capacity and rate limits | 5 |
| remove_images | No | Remove images from output by excluding img, picture, and svg tags | false |
| urls | Yes | List of URLs to crawl | |
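As shown in the Implementation Reference below, the handler folds remove_images and bypass_cache into a single crawler_config before posting to the /crawl endpoint. A minimal sketch of the resulting legacy-format request body, with placeholder URLs:

```typescript
// Sketch of the legacy-format body the handler posts to /crawl when
// remove_images and bypass_cache are both set; URLs are placeholders.
const requestBody = {
  urls: ['https://example.com/a', 'https://example.com/b'],
  max_concurrent: 5,
  crawler_config: {
    exclude_tags: ['img', 'picture', 'svg'], // derived from remove_images
    cache_mode: 'BYPASS',                    // derived from bypass_cache
  },
};
```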
Implementation Reference
- src/handlers/crawl-handlers.ts:16-88 (handler): Main handler implementation for the batch_crawl tool. Handles both the legacy and new per-URL config formats, calls the /crawl endpoint, processes results, adds server metrics, and formats the response.

  ```typescript
  async batchCrawl(options: BatchCrawlOptions) {
    try {
      let response;

      // Check if we have per-URL configs (new in 0.7.3/0.7.4)
      if (options.configs && options.configs.length > 0) {
        // Use the new configs array format
        // Extract URLs from configs for the urls field
        const urls = options.configs.map((config) => config.url);
        const requestBody = {
          urls: urls,
          configs: options.configs,
          max_concurrent: options.max_concurrent,
        };
        response = await this.axiosClient.post('/crawl', requestBody);
      } else {
        // Use the legacy format with single crawler_config
        // Build crawler config if needed
        const crawler_config: Record<string, unknown> = {};

        // Handle remove_images by using exclude_tags
        if (options.remove_images) {
          crawler_config.exclude_tags = ['img', 'picture', 'svg'];
        }

        if (options.bypass_cache) {
          crawler_config.cache_mode = 'BYPASS';
        }

        response = await this.axiosClient.post('/crawl', {
          urls: options.urls,
          max_concurrent: options.max_concurrent,
          crawler_config: Object.keys(crawler_config).length > 0 ? crawler_config : undefined,
        });
      }

      const results = response.data.results || [];

      // Add memory metrics if available
      let metricsText = '';
      const responseData = response.data as CrawlEndpointResponse;
      if (
        responseData.server_memory_delta_mb !== undefined ||
        responseData.server_peak_memory_mb !== undefined
      ) {
        const memoryInfo = [];
        if (responseData.server_processing_time_s !== undefined) {
          memoryInfo.push(`Processing time: ${responseData.server_processing_time_s.toFixed(2)}s`);
        }
        if (responseData.server_memory_delta_mb !== undefined) {
          memoryInfo.push(`Memory delta: ${responseData.server_memory_delta_mb.toFixed(1)}MB`);
        }
        if (responseData.server_peak_memory_mb !== undefined) {
          memoryInfo.push(`Peak memory: ${responseData.server_peak_memory_mb.toFixed(1)}MB`);
        }
        if (memoryInfo.length > 0) {
          metricsText = `\n\nServer metrics: ${memoryInfo.join(', ')}`;
        }
      }

      return {
        content: [
          {
            type: 'text',
            text: `Batch crawl completed. Processed ${results.length} URLs:\n\n${results
              .map(
                (r: CrawlResultItem, i: number) =>
                  `${i + 1}. ${options.urls[i]}: ${r.success ? 'Success' : 'Failed'}`,
              )
              .join('\n')}${metricsText}`,
          },
        ],
      };
    } catch (error) {
      throw this.formatError(error, 'batch crawl');
    }
  }
  ```
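  The BatchCrawlOptions type consumed by this handler is defined elsewhere in the repository; inferred from the Zod schema in the next item, its shape is roughly the following sketch (field names come from the schema, and the real type may differ in detail).

  ```typescript
  // Approximate shape of BatchCrawlOptions, inferred from BatchCrawlSchema;
  // the actual type is defined elsewhere in the repository and may differ.
  interface BatchCrawlOptionsSketch {
    urls: string[];
    max_concurrent?: number;
    remove_images?: boolean;
    bypass_cache?: boolean;
    configs?: Array<{
      url: string;
      browser_config?: Record<string, unknown>;
      crawler_config?: Record<string, unknown>;
      extraction_strategy?: Record<string, unknown>;
      table_extraction_strategy?: Record<string, unknown>;
      markdown_generator_options?: Record<string, unknown>;
      matcher?: string | ((...args: unknown[]) => unknown);
    }>;
  }
  ```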
- Zod schema defining the input parameters and validation for the batch_crawl tool, including the urls array, concurrency limit, option flags, and per-URL configs.

  ```typescript
  export const BatchCrawlSchema = createStatelessSchema(
    z.object({
      urls: z.array(z.string().url()),
      max_concurrent: z.number().optional(),
      remove_images: z.boolean().optional(),
      bypass_cache: z.boolean().optional(),
      // New: Support per-URL configs array (0.7.3/0.7.4)
      configs: z
        .array(
          z.object({
            url: z.string().url(),
            browser_config: z.record(z.unknown()).optional(),
            crawler_config: z.record(z.unknown()).optional(),
            extraction_strategy: z.record(z.unknown()).optional(),
            table_extraction_strategy: z.record(z.unknown()).optional(),
            markdown_generator_options: z.record(z.unknown()).optional(),
            matcher: z.union([z.string(), z.function()]).optional(),
          }),
        )
        .optional(),
    }),
    'batch_crawl',
  );
  ```
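  Per this schema, callers may also pass a per-URL configs array (added in 0.7.3/0.7.4). A hedged example with placeholder URLs, using only crawler_config keys that appear in the handler above:

  ```typescript
  // Example arguments using the per-URL configs format (0.7.3/0.7.4).
  // URLs are placeholders; urls is still required by the schema even though
  // the handler derives the request URLs from configs when they are present.
  const argsWithConfigs = {
    urls: ['https://example.com/docs', 'https://example.com/blog'],
    configs: [
      { url: 'https://example.com/docs', crawler_config: { cache_mode: 'BYPASS' } },
      { url: 'https://example.com/blog', crawler_config: { exclude_tags: ['img', 'picture', 'svg'] } },
    ],
    max_concurrent: 2,
  };
  ```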
- src/server.ts:212-241 (registration): Tool registration in the listTools response, defining the name, description, and inputSchema for batch_crawl.

  ```typescript
  name: 'batch_crawl',
  description:
    '[STATELESS] Crawl multiple URLs concurrently for efficiency. Use when: processing URL lists, comparing multiple pages, or bulk data extraction. Faster than sequential crawling. Max 5 concurrent by default. Each URL gets a fresh browser. Cannot maintain state between URLs. For persistent operations use create_session + crawl.',
  inputSchema: {
    type: 'object',
    properties: {
      urls: {
        type: 'array',
        items: { type: 'string' },
        description: 'List of URLs to crawl',
      },
      max_concurrent: {
        type: 'number',
        description:
          'Parallel request limit. Higher = faster but more resource intensive. Adjust based on server capacity and rate limits',
        default: 5,
      },
      remove_images: {
        type: 'boolean',
        description: 'Remove images from output by excluding img, picture, and svg tags',
        default: false,
      },
      bypass_cache: {
        type: 'boolean',
        description: 'Bypass cache for all URLs',
        default: false,
      },
    },
    required: ['urls'],
  },
  ```
- src/server.ts:847-850 (dispatch): Dispatch logic in the CallToolRequest handler that validates arguments against BatchCrawlSchema and calls the batchCrawl handler method.

  ```typescript
  case 'batch_crawl':
    return await this.validateAndExecute('batch_crawl', args, BatchCrawlSchema, async (validatedArgs) =>
      this.crawlHandlers.batchCrawl(validatedArgs as BatchCrawlOptions),
    );
  ```
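  validateAndExecute itself is not shown in this reference; a plausible sketch, assuming it only parses the arguments with the supplied Zod schema and surfaces validation failures as errors (the actual implementation in src/server.ts may differ):

  ```typescript
  import { z } from 'zod';

  // Hypothetical sketch of validateAndExecute (in reality a method on the
  // server class): parse the arguments with the schema, then run the handler.
  async function validateAndExecute<T>(
    toolName: string,
    args: unknown,
    schema: z.ZodTypeAny,
    execute: (validatedArgs: unknown) => Promise<T>,
  ): Promise<T> {
    const parsed = schema.safeParse(args);
    if (!parsed.success) {
      throw new Error(`Invalid arguments for ${toolName}: ${parsed.error.message}`);
    }
    return execute(parsed.data);
  }
  ```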