# batch_webpage_scrape
Scrape multiple webpages concurrently to extract their content efficiently, without requiring official APIs.
## Instructions
Batch scrape multiple webpages with concurrent processing support.
## Input Schema

| Name | Required | Description | Default |
|---|---|---|---|
| urls | Yes | List of webpage URLs to scrape, up to 20. | — |
| maxConcurrent | No | Maximum number of concurrent scrape requests (1–10). | 3 |
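For reference, a tool call might be shaped like this. Only the tool name and the argument fields come from the schema above; the surrounding request object is an illustrative sketch, not a specific MCP client API:

```javascript
// Hypothetical tool-call payload. The `name` and `arguments`
// fields follow the input schema; URLs are placeholders.
const request = {
  name: 'batch_webpage_scrape',
  arguments: {
    urls: [
      'https://example.com/a',
      'https://example.com/b',
      'https://example.com/c',
    ],
    maxConcurrent: 3, // optional; defaults to 3, capped at 10
  },
};
```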
## Implementation Reference
- `src/mcp/server.js:283-339` (handler) — the handler function that implements the `batch_webpage_scrape` tool. It validates input, processes URLs in concurrent batches using `scrapeWebpage` from `searchService`, collects successes and errors, and returns structured results.

```javascript
async function handleBatchWebpageScrape(args) {
  const { urls, maxConcurrent = 3 } = args;

  if (!Array.isArray(urls) || urls.length === 0) {
    throw new Error('urls must be a non-empty array');
  }
  if (urls.length > 20) {
    throw new Error('A maximum of 20 URLs is supported');
  }
  if (maxConcurrent < 1 || maxConcurrent > 10) {
    throw new Error('maxConcurrent must be between 1 and 10');
  }

  const searchService = (await import('../services/searchService.js')).default;
  const results = [];
  const errors = [];

  // Process in batches
  for (let i = 0; i < urls.length; i += maxConcurrent) {
    const batch = urls.slice(i, i + maxConcurrent);
    const batchPromises = batch.map(async (url) => {
      try {
        const result = await searchService.scrapeWebpage(url);
        return { success: true, url, data: result };
      } catch (error) {
        return { success: false, url, error: error.message };
      }
    });

    const batchResults = await Promise.allSettled(batchPromises);
    batchResults.forEach((result) => {
      if (result.status === 'fulfilled') {
        if (result.value.success) {
          results.push(result.value);
        } else {
          errors.push(result.value);
        }
      } else {
        errors.push({ url: 'unknown', error: result.reason?.message || 'Unknown error' });
      }
    });
  }

  return {
    tool: 'batch_webpage_scrape',
    totalUrls: urls.length,
    successful: results.length,
    failed: errors.length,
    maxConcurrent,
    results,
    errors,
    timestamp: new Date().toISOString()
  };
}
```
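The handler's batching pattern can be exercised in isolation. The sketch below extracts just that loop: slice the input into windows of `maxConcurrent`, await each window with `Promise.allSettled`, and partition the outcomes. Here `fakeScrape` is a hypothetical stand-in for `searchService.scrapeWebpage`:

```javascript
// Minimal sketch of the windowed-batch pattern. The `rejected`
// branch of allSettled is omitted because the mapped callbacks
// catch their own errors and always fulfill.
async function batchProcess(urls, worker, maxConcurrent = 3) {
  const results = [];
  const errors = [];
  for (let i = 0; i < urls.length; i += maxConcurrent) {
    const batch = urls.slice(i, i + maxConcurrent);
    const settled = await Promise.allSettled(
      batch.map(async (url) => {
        try {
          return { success: true, url, data: await worker(url) };
        } catch (error) {
          return { success: false, url, error: error.message };
        }
      })
    );
    for (const outcome of settled) {
      if (outcome.status === 'fulfilled' && outcome.value.success) {
        results.push(outcome.value);
      } else if (outcome.status === 'fulfilled') {
        errors.push(outcome.value);
      }
    }
  }
  return { results, errors };
}

// Stand-in worker: one URL fails, exercising the error path.
const fakeScrape = async (url) =>
  url.includes('bad')
    ? Promise.reject(new Error('boom'))
    : { url, html: '<p>ok</p>' };

batchProcess(
  ['https://a.test', 'https://bad.test', 'https://c.test'],
  fakeScrape,
  2
).then(({ results, errors }) => {
  console.log(results.length, errors.length); // 2 successes, 1 failure
});
```

Note that the windows run sequentially: a slow page in one window delays the next window, which keeps at most `maxConcurrent` requests in flight at a time.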
- `src/mcp/server.js:74-99` (schema) — tool schema definition for `batch_webpage_scrape`, declaring the `urls` array and `maxConcurrent` parameters.

```javascript
{
  name: 'batch_webpage_scrape',
  description: 'Batch scrape multiple webpages with concurrent processing support.',
  inputSchema: {
    type: 'object',
    properties: {
      urls: {
        type: 'array',
        items: { type: 'string' },
        description: 'List of webpage URLs to scrape, up to 20.',
        minItems: 1,
        maxItems: 20
      },
      maxConcurrent: {
        type: 'number',
        description: 'Maximum concurrency',
        default: 3,
        minimum: 1,
        maximum: 10
      }
    },
    required: ['urls']
  }
}
```
- `src/mcp/server.js:142-144` (registration) — registration of the `batch_webpage_scrape` handler in the main tool dispatcher switch statement.

```javascript
case 'batch_webpage_scrape':
  result = await handleBatchWebpageScrape(args);
  break;
```