firecrawl_crawl
Crawl multiple web pages from a starting URL for data collection, with support for depth control, path filtering, and webhook notifications.
Instructions
Start an asynchronous crawl of multiple pages from a starting URL. Supports depth control, path filtering, and webhook notifications.
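As a rough illustration of how this tool might be invoked from an MCP client, the sketch below starts a small crawl and prints the handler's text response (the job-ID message shown in the implementation reference further down). Only the tool name and the argument names come from the schema on this page; the client setup, the `firecrawl-mcp` package name, and the environment variable are assumptions that may differ in your setup.

```typescript
import { Client } from '@modelcontextprotocol/sdk/client/index.js';
import { StdioClientTransport } from '@modelcontextprotocol/sdk/client/stdio.js';

// Assumed setup: the server is launched from the 'firecrawl-mcp' package and
// reads its API key from FIRECRAWL_API_KEY. Adjust for your environment.
const transport = new StdioClientTransport({
  command: 'npx',
  args: ['-y', 'firecrawl-mcp'],
  env: { FIRECRAWL_API_KEY: '<your-api-key>' },
});

const client = new Client({ name: 'example-client', version: '1.0.0' });
await client.connect(transport);

// Start an asynchronous crawl; only 'url' is required, the rest are optional.
const result = await client.callTool({
  name: 'firecrawl_crawl',
  arguments: {
    url: 'https://example.com/docs',
    maxDepth: 2,
    limit: 50,
  },
});

// The handler responds with a text message containing the crawl job ID.
console.log(result.content);
```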
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | Starting URL for the crawl | |
| excludePaths | No | URL paths to exclude from crawling | |
| includePaths | No | Only crawl these URL paths | |
| maxDepth | No | Maximum link depth to crawl | |
| ignoreSitemap | No | Skip sitemap.xml discovery | |
| limit | No | Maximum number of pages to crawl | |
| allowBackwardLinks | No | Allow crawling links that point to parent directories | |
| allowExternalLinks | No | Allow crawling links to external domains | |
| webhook | No | Webhook URL (string) or an object with a url and custom headers to notify when the crawl is complete (see example below) | |
| deduplicateSimilarURLs | No | Remove similar URLs during crawl | |
| ignoreQueryParameters | No | Ignore query parameters when comparing URLs | |
| scrapeOptions | No | Options for scraping each page | |
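To make the parameter shapes concrete, here is a sketch of an arguments object that should satisfy the schema above. All values are placeholders; the webhook is shown in its object form, but a plain string URL is also accepted.

```typescript
// Example arguments for firecrawl_crawl (values are illustrative only).
const crawlArguments = {
  url: 'https://example.com/blog',      // required: starting URL
  includePaths: ['/blog/.*'],           // only crawl blog pages
  excludePaths: ['/blog/drafts/.*'],    // skip draft pages
  maxDepth: 3,                          // follow links at most 3 levels deep
  limit: 100,                           // stop after 100 pages
  ignoreSitemap: false,                 // still use sitemap.xml for discovery
  deduplicateSimilarURLs: true,
  ignoreQueryParameters: true,
  // Either a plain string URL or an object with url and custom headers.
  webhook: {
    url: 'https://example.com/hooks/firecrawl',
    headers: { Authorization: 'Bearer <token>' },
  },
  // Options applied when scraping each crawled page.
  scrapeOptions: {
    formats: ['markdown', 'links'],
    onlyMainContent: true,
    waitFor: 1000,
  },
};
```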
Implementation Reference
- src/index.ts:1068-1097 (handler)
  Handler for the 'firecrawl_crawl' tool: validates arguments with isCrawlOptions, calls client.asyncCrawlUrl to initiate the asynchronous crawl, handles the response, and returns the crawl job ID.

  ```typescript
  case 'firecrawl_crawl': {
    if (!isCrawlOptions(args)) {
      throw new Error('Invalid arguments for firecrawl_crawl');
    }
    const { url, ...options } = args;
    const response = await withRetry(
      async () => client.asyncCrawlUrl(url, options),
      'crawl operation'
    );
    if (!response.success) {
      throw new Error(response.error);
    }

    // Monitor credits for cloud API
    if (!FIRECRAWL_API_URL && hasCredits(response)) {
      await updateCreditUsage(response.creditsUsed);
    }

    return {
      content: [
        {
          type: 'text',
          text: `Started crawl for ${url} with job ID: ${response.id}`,
        },
      ],
      isError: false,
    };
  }
  ```
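  The handler relies on two helpers defined elsewhere in src/index.ts and not shown in this reference: withRetry and hasCredits. Below is a minimal sketch of what they might look like, assuming a simple retry loop and a structural check for creditsUsed; the real implementations (attempt counts, backoff, logging) may differ.

  ```typescript
  // Sketch only: the actual withRetry/hasCredits in src/index.ts may differ.
  async function withRetry<T>(
    operation: () => Promise<T>,
    context: string,
    attempts = 3
  ): Promise<T> {
    let lastError: unknown;
    for (let i = 0; i < attempts; i++) {
      try {
        return await operation();
      } catch (error) {
        lastError = error;
        // Simple linear backoff between attempts.
        await new Promise((resolve) => setTimeout(resolve, 1000 * (i + 1)));
      }
    }
    throw new Error(`${context} failed after ${attempts} attempts: ${String(lastError)}`);
  }

  // Narrow the crawl response to one that reports credit usage (cloud API only).
  function hasCredits(response: unknown): response is { creditsUsed: number } {
    return (
      typeof response === 'object' &&
      response !== null &&
      'creditsUsed' in response &&
      typeof (response as { creditsUsed: unknown }).creditsUsed === 'number'
    );
  }
  ```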
- src/index.ts:215-325 (schema)
  Tool schema definition for 'firecrawl_crawl', specifying the name, description, and detailed inputSchema with parameters such as url, excludePaths, maxDepth, limit, webhook, and scrapeOptions.

  ```typescript
  const CRAWL_TOOL: Tool = {
    name: 'firecrawl_crawl',
    description:
      'Start an asynchronous crawl of multiple pages from a starting URL. ' +
      'Supports depth control, path filtering, and webhook notifications.',
    inputSchema: {
      type: 'object',
      properties: {
        url: {
          type: 'string',
          description: 'Starting URL for the crawl',
        },
        excludePaths: {
          type: 'array',
          items: { type: 'string' },
          description: 'URL paths to exclude from crawling',
        },
        includePaths: {
          type: 'array',
          items: { type: 'string' },
          description: 'Only crawl these URL paths',
        },
        maxDepth: {
          type: 'number',
          description: 'Maximum link depth to crawl',
        },
        ignoreSitemap: {
          type: 'boolean',
          description: 'Skip sitemap.xml discovery',
        },
        limit: {
          type: 'number',
          description: 'Maximum number of pages to crawl',
        },
        allowBackwardLinks: {
          type: 'boolean',
          description: 'Allow crawling links that point to parent directories',
        },
        allowExternalLinks: {
          type: 'boolean',
          description: 'Allow crawling links to external domains',
        },
        webhook: {
          oneOf: [
            {
              type: 'string',
              description: 'Webhook URL to notify when crawl is complete',
            },
            {
              type: 'object',
              properties: {
                url: {
                  type: 'string',
                  description: 'Webhook URL',
                },
                headers: {
                  type: 'object',
                  description: 'Custom headers for webhook requests',
                },
              },
              required: ['url'],
            },
          ],
        },
        deduplicateSimilarURLs: {
          type: 'boolean',
          description: 'Remove similar URLs during crawl',
        },
        ignoreQueryParameters: {
          type: 'boolean',
          description: 'Ignore query parameters when comparing URLs',
        },
        scrapeOptions: {
          type: 'object',
          properties: {
            formats: {
              type: 'array',
              items: {
                type: 'string',
                enum: [
                  'markdown',
                  'html',
                  'rawHtml',
                  'screenshot',
                  'links',
                  'screenshot@fullPage',
                  'extract',
                ],
              },
            },
            onlyMainContent: { type: 'boolean' },
            includeTags: { type: 'array', items: { type: 'string' } },
            excludeTags: { type: 'array', items: { type: 'string' } },
            waitFor: { type: 'number' },
          },
          description: 'Options for scraping each page',
        },
      },
      required: ['url'],
    },
  };
  ```
- src/index.ts:862-874 (registration)
  Registration of the firecrawl_crawl tool (as CRAWL_TOOL) in the MCP server's listTools request handler.

  ```typescript
  server.setRequestHandler(ListToolsRequestSchema, async () => ({
    tools: [
      SCRAPE_TOOL,
      MAP_TOOL,
      CRAWL_TOOL,
      BATCH_SCRAPE_TOOL,
      CHECK_BATCH_STATUS_TOOL,
      CHECK_CRAWL_STATUS_TOOL,
      SEARCH_TOOL,
      EXTRACT_TOOL,
      DEEP_RESEARCH_TOOL,
    ],
  }));
  ```
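  Because firecrawl_crawl only starts a job and returns an ID, the crawled pages are retrieved separately; CHECK_CRAWL_STATUS_TOOL in the list above is the companion tool for that. Assuming it follows the same naming pattern as the other tools ('firecrawl_check_crawl_status') and takes the job ID under an 'id' argument, and reusing the connected client from the earlier example, a status check might look like this; both strings are assumptions not confirmed on this page.

  ```typescript
  // Assumption: the companion tool is named 'firecrawl_check_crawl_status'
  // and accepts the job ID returned by firecrawl_crawl as 'id'.
  const status = await client.callTool({
    name: 'firecrawl_check_crawl_status',
    arguments: { id: '<job-id-from-the-firecrawl_crawl-response>' },
  });
  console.log(status.content);
  ```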
- src/index.ts:630-637 (helper)
  Type guard helper function used to validate input arguments for the firecrawl_crawl tool.

  ```typescript
  function isCrawlOptions(args: unknown): args is CrawlParams & { url: string } {
    return (
      typeof args === 'object' &&
      args !== null &&
      'url' in args &&
      typeof (args as { url: unknown }).url === 'string'
    );
  }
  ```
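  Note that the guard only verifies that url is a string; the remaining crawl options are passed through to asyncCrawlUrl without further validation. A small illustration, not taken from the source, assuming isCrawlOptions from above is in scope:

  ```typescript
  const args: unknown = { url: 'https://example.com', limit: 25, maxDepth: 2 };

  if (isCrawlOptions(args)) {
    // Inside this branch, args is typed as CrawlParams & { url: string },
    // even though only the presence and type of 'url' was actually checked.
    const { url, ...options } = args;
    console.log(url, options); // 'https://example.com' { limit: 25, maxDepth: 2 }
  }
  ```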