
MCP Server for Crawl4AI

by omgwtfwow

batch_crawl

Crawl multiple URLs concurrently to process URL lists, compare pages, or extract bulk web data efficiently.

Instructions

[STATELESS] Crawl multiple URLs concurrently for efficiency. Use when: processing URL lists, comparing multiple pages, or bulk data extraction. Faster than sequential crawling. Max 5 concurrent by default. Each URL gets a fresh browser. Cannot maintain state between URLs. For persistent operations use create_session + crawl.

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| urls | Yes | List of URLs to crawl | (none) |
| max_concurrent | No | Parallel request limit. Higher = faster but more resource intensive. Adjust based on server capacity and rate limits | 5 |
| remove_images | No | Remove images from output by excluding img, picture, and svg tags | false |
| bypass_cache | No | Bypass cache for all URLs | false |
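As a sketch, a valid set of arguments for this tool might look like the following (the URLs are placeholders, and the values are illustrative, not recommendations):

```typescript
// Hypothetical batch_crawl arguments matching the input schema above.
// The URLs are placeholders; tune max_concurrent to your server capacity.
const batchCrawlArgs = {
  urls: ['https://example.com/page-1', 'https://example.com/page-2'],
  max_concurrent: 2, // default is 5
  remove_images: true, // strips img, picture, and svg tags from the output
  bypass_cache: false,
};
```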

Implementation Reference

  • Implements the core logic for the batch_crawl tool by sending batch requests to the Crawl4AI /crawl endpoint, handling both legacy and new config formats, processing results, and formatting the response with success status and server metrics.
    async batchCrawl(options: BatchCrawlOptions) {
      try {
        let response;
    
        // Check if we have per-URL configs (new in 0.7.3/0.7.4)
        if (options.configs && options.configs.length > 0) {
          // Use the new configs array format
          // Extract URLs from configs for the urls field
          const urls = options.configs.map((config) => config.url);
          const requestBody = {
            urls: urls,
            configs: options.configs,
            max_concurrent: options.max_concurrent,
          };
          response = await this.axiosClient.post('/crawl', requestBody);
        } else {
          // Use the legacy format with single crawler_config
          // Build crawler config if needed
          const crawler_config: Record<string, unknown> = {};
    
          // Handle remove_images by using exclude_tags
          if (options.remove_images) {
            crawler_config.exclude_tags = ['img', 'picture', 'svg'];
          }
    
          if (options.bypass_cache) {
            crawler_config.cache_mode = 'BYPASS';
          }
    
          response = await this.axiosClient.post('/crawl', {
            urls: options.urls,
            max_concurrent: options.max_concurrent,
            crawler_config: Object.keys(crawler_config).length > 0 ? crawler_config : undefined,
          });
        }
    
        const results = response.data.results || [];
    
        // Add memory metrics if available
        let metricsText = '';
        const responseData = response.data as CrawlEndpointResponse;
        if (responseData.server_memory_delta_mb !== undefined || responseData.server_peak_memory_mb !== undefined) {
          const memoryInfo = [];
          if (responseData.server_processing_time_s !== undefined) {
            memoryInfo.push(`Processing time: ${responseData.server_processing_time_s.toFixed(2)}s`);
          }
          if (responseData.server_memory_delta_mb !== undefined) {
            memoryInfo.push(`Memory delta: ${responseData.server_memory_delta_mb.toFixed(1)}MB`);
          }
          if (responseData.server_peak_memory_mb !== undefined) {
            memoryInfo.push(`Peak memory: ${responseData.server_peak_memory_mb.toFixed(1)}MB`);
          }
          if (memoryInfo.length > 0) {
            metricsText = `\n\nServer metrics: ${memoryInfo.join(', ')}`;
          }
        }
    
        return {
          content: [
            {
              type: 'text',
              text: `Batch crawl completed. Processed ${results.length} URLs:\n\n${results
                .map(
                  (r: CrawlResultItem, i: number) => `${i + 1}. ${options.urls[i]}: ${r.success ? 'Success' : 'Failed'}`,
                )
                .join('\n')}${metricsText}`,
            },
          ],
        };
      } catch (error) {
        throw this.formatError(error, 'batch crawl');
      }
    }
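The per-URL summary formatting in the handler above can be viewed in isolation as a small pure function. This is an illustrative sketch, not part of the actual handler; `formatBatchSummary` is a made-up name:

```typescript
// Sketch of the numbered Success/Failed summary the handler builds.
// formatBatchSummary is a hypothetical helper, not part of the source.
interface ResultItem {
  success: boolean;
}

function formatBatchSummary(urls: string[], results: ResultItem[]): string {
  const lines = results.map(
    (r, i) => `${i + 1}. ${urls[i]}: ${r.success ? 'Success' : 'Failed'}`,
  );
  return `Batch crawl completed. Processed ${results.length} URLs:\n\n${lines.join('\n')}`;
}
```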
  • Zod schema defining and validating the input parameters for the batch_crawl tool, supporting both simple urls array and advanced per-URL configs.
    export const BatchCrawlSchema = createStatelessSchema(
      z.object({
        urls: z.array(z.string().url()),
        max_concurrent: z.number().optional(),
        remove_images: z.boolean().optional(),
        bypass_cache: z.boolean().optional(),
        // New: Support per-URL configs array (0.7.3/0.7.4)
        configs: z
          .array(
            z.object({
              url: z.string().url(),
              browser_config: z.record(z.unknown()).optional(),
              crawler_config: z.record(z.unknown()).optional(),
              extraction_strategy: z.record(z.unknown()).optional(),
              table_extraction_strategy: z.record(z.unknown()).optional(),
              markdown_generator_options: z.record(z.unknown()).optional(),
              matcher: z.union([z.string(), z.function()]).optional(),
            }),
          )
          .optional(),
      }),
      'batch_crawl',
    );
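In plain terms, the schema requires a `urls` array of valid URLs and treats everything else as optional. A dependency-free approximation of those core checks (a sketch only, not the real Zod validation path):

```typescript
// Hand-rolled approximation of BatchCrawlSchema's core checks.
// Uses the WHATWG URL constructor to test validity, roughly as
// z.string().url() does; this is not the actual validation code.
function isValidBatchCrawlInput(input: unknown): boolean {
  if (typeof input !== 'object' || input === null) return false;
  const obj = input as Record<string, unknown>;
  if (!Array.isArray(obj.urls)) return false;
  for (const u of obj.urls) {
    if (typeof u !== 'string') return false;
    try {
      new URL(u); // throws on strings that are not absolute URLs
    } catch {
      return false;
    }
  }
  if (obj.max_concurrent !== undefined && typeof obj.max_concurrent !== 'number') return false;
  if (obj.remove_images !== undefined && typeof obj.remove_images !== 'boolean') return false;
  if (obj.bypass_cache !== undefined && typeof obj.bypass_cache !== 'boolean') return false;
  return true;
}
```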
  • src/server.ts:211-241 (registration)
    Defines the batch_crawl tool in the listTools response, including name, description, and inputSchema matching the validation schema.
    {
      name: 'batch_crawl',
      description:
        '[STATELESS] Crawl multiple URLs concurrently for efficiency. Use when: processing URL lists, comparing multiple pages, or bulk data extraction. Faster than sequential crawling. Max 5 concurrent by default. Each URL gets a fresh browser. Cannot maintain state between URLs. For persistent operations use create_session + crawl.',
      inputSchema: {
        type: 'object',
        properties: {
          urls: {
            type: 'array',
            items: { type: 'string' },
            description: 'List of URLs to crawl',
          },
          max_concurrent: {
            type: 'number',
            description:
              'Parallel request limit. Higher = faster but more resource intensive. Adjust based on server capacity and rate limits',
            default: 5,
          },
          remove_images: {
            type: 'boolean',
            description: 'Remove images from output by excluding img, picture, and svg tags',
            default: false,
          },
          bypass_cache: {
            type: 'boolean',
            description: 'Bypass cache for all URLs',
            default: false,
          },
        },
        required: ['urls'],
      },
    },
  • src/server.ts:847-850 (registration)
    Registers the routing for batch_crawl tool calls in the main switch statement, invoking validation with BatchCrawlSchema and delegating to CrawlHandlers.batchCrawl.
    case 'batch_crawl':
      return await this.validateAndExecute('batch_crawl', args, BatchCrawlSchema, async (validatedArgs) =>
        this.crawlHandlers.batchCrawl(validatedArgs as BatchCrawlOptions),
      );
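The validate-then-execute pattern used in this dispatch can be sketched as a generic helper. The real `validateAndExecute` lives in `src/server.ts`; the signature and error message below are assumptions for illustration:

```typescript
// Hypothetical sketch of the validate-then-execute dispatch pattern above.
// The real implementation in server.ts may differ; names are assumptions.
interface Validator<T> {
  parse(input: unknown): T; // throws on invalid input, like a Zod schema
}

async function validateAndExecute<T, R>(
  toolName: string,
  args: unknown,
  schema: Validator<T>,
  handler: (validated: T) => Promise<R>,
): Promise<R> {
  let validated: T;
  try {
    validated = schema.parse(args);
  } catch (error) {
    throw new Error(`Invalid arguments for ${toolName}: ${(error as Error).message}`);
  }
  return handler(validated);
}
```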
  • Supporting service method for legacy batch crawling (used by other handlers); it builds the request body and posts it to the /crawl endpoint.
    async batchCrawl(options: BatchCrawlOptions) {
      // Validate URLs
      if (!options.urls || options.urls.length === 0) {
        throw new Error('URLs array cannot be empty');
      }
    
      // Build crawler config if needed
      const crawler_config: Record<string, unknown> = {};
    
      // Handle remove_images by using exclude_tags
      if (options.remove_images) {
        crawler_config.exclude_tags = ['img', 'picture', 'svg'];
      }
    
      if (options.bypass_cache) {
        crawler_config.cache_mode = 'BYPASS';
      }
    
      try {
        const response = await this.axiosClient.post('/crawl', {
          urls: options.urls,
          max_concurrent: options.max_concurrent,
          crawler_config: Object.keys(crawler_config).length > 0 ? crawler_config : undefined,
        });
    
        return response.data;
      } catch (error) {
        return handleAxiosError(error);
      }
    }
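The mapping from the boolean tool options onto `crawler_config` appears in both methods above and can be factored as a small pure function. A sketch, assuming a hypothetical `buildCrawlerConfig` helper that does not exist in the source:

```typescript
// Sketch of how remove_images and bypass_cache map onto Crawl4AI's
// crawler_config. buildCrawlerConfig is a made-up name for illustration.
function buildCrawlerConfig(options: {
  remove_images?: boolean;
  bypass_cache?: boolean;
}): Record<string, unknown> | undefined {
  const crawler_config: Record<string, unknown> = {};
  if (options.remove_images) {
    crawler_config.exclude_tags = ['img', 'picture', 'svg'];
  }
  if (options.bypass_cache) {
    crawler_config.cache_mode = 'BYPASS';
  }
  // Return undefined when empty so the field is omitted from the request body.
  return Object.keys(crawler_config).length > 0 ? crawler_config : undefined;
}
```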
