
firecrawl_crawl

Crawl multiple web pages from a starting URL with depth control, path filtering, and webhook notifications for asynchronous data collection.

Instructions

Start an asynchronous crawl of multiple pages from a starting URL. Supports depth control, path filtering, and webhook notifications.

Input Schema

| Name | Required | Description |
|------|----------|-------------|
| url | Yes | Starting URL for the crawl |
| excludePaths | No | URL paths to exclude from crawling |
| includePaths | No | Only crawl these URL paths |
| maxDepth | No | Maximum link depth to crawl |
| ignoreSitemap | No | Skip sitemap.xml discovery |
| limit | No | Maximum number of pages to crawl |
| allowBackwardLinks | No | Allow crawling links that point to parent directories |
| allowExternalLinks | No | Allow crawling links to external domains |
| webhook | No | Webhook URL (string), or an object with `url` and optional `headers`, notified when the crawl completes |
| deduplicateSimilarURLs | No | Remove similar URLs during crawl |
| ignoreQueryParameters | No | Ignore query parameters when comparing URLs |
| scrapeOptions | No | Options for scraping each page |
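For orientation, here is a hypothetical argument object a client might pass to firecrawl_crawl. Every value below (the URL, paths, limits, and webhook endpoint) is an illustrative assumption, not a default from the schema:

```typescript
// Hypothetical firecrawl_crawl arguments; all values are illustrative.
const crawlArgs = {
  url: 'https://example.com/docs',      // required: where the crawl starts
  includePaths: ['/docs'],              // only follow documentation paths
  excludePaths: ['/docs/archive'],      // skip archived pages
  maxDepth: 2,                          // follow links at most two hops deep
  limit: 50,                            // stop after 50 pages
  ignoreQueryParameters: true,          // URLs differing only in ?query count as one
  scrapeOptions: {
    formats: ['markdown'],              // request markdown output per page
    onlyMainContent: true,              // drop nav, headers, and footers
  },
  webhook: 'https://example.com/hooks/crawl-done', // optional completion callback
};
```

Because the crawl is asynchronous, the call returns a job ID immediately rather than page content; results are retrieved later, for example via the crawl-status tool registered alongside this one (CHECK_CRAWL_STATUS_TOOL in the registration below).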

Implementation Reference

  • Defines the Tool object including name, description, and detailed input schema for the firecrawl_crawl tool.
```typescript
const CRAWL_TOOL: Tool = {
  name: 'firecrawl_crawl',
  description:
    'Start an asynchronous crawl of multiple pages from a starting URL. ' +
    'Supports depth control, path filtering, and webhook notifications.',
  inputSchema: {
    type: 'object',
    properties: {
      url: {
        type: 'string',
        description: 'Starting URL for the crawl',
      },
      excludePaths: {
        type: 'array',
        items: { type: 'string' },
        description: 'URL paths to exclude from crawling',
      },
      includePaths: {
        type: 'array',
        items: { type: 'string' },
        description: 'Only crawl these URL paths',
      },
      maxDepth: {
        type: 'number',
        description: 'Maximum link depth to crawl',
      },
      ignoreSitemap: {
        type: 'boolean',
        description: 'Skip sitemap.xml discovery',
      },
      limit: {
        type: 'number',
        description: 'Maximum number of pages to crawl',
      },
      allowBackwardLinks: {
        type: 'boolean',
        description: 'Allow crawling links that point to parent directories',
      },
      allowExternalLinks: {
        type: 'boolean',
        description: 'Allow crawling links to external domains',
      },
      webhook: {
        oneOf: [
          {
            type: 'string',
            description: 'Webhook URL to notify when crawl is complete',
          },
          {
            type: 'object',
            properties: {
              url: {
                type: 'string',
                description: 'Webhook URL',
              },
              headers: {
                type: 'object',
                description: 'Custom headers for webhook requests',
              },
            },
            required: ['url'],
          },
        ],
      },
      deduplicateSimilarURLs: {
        type: 'boolean',
        description: 'Remove similar URLs during crawl',
      },
      ignoreQueryParameters: {
        type: 'boolean',
        description: 'Ignore query parameters when comparing URLs',
      },
      scrapeOptions: {
        type: 'object',
        properties: {
          formats: {
            type: 'array',
            items: {
              type: 'string',
              enum: [
                'markdown',
                'html',
                'rawHtml',
                'screenshot',
                'links',
                'screenshot@fullPage',
                'extract',
              ],
            },
          },
          onlyMainContent: { type: 'boolean' },
          includeTags: { type: 'array', items: { type: 'string' } },
          excludeTags: { type: 'array', items: { type: 'string' } },
          waitFor: { type: 'number' },
        },
        description: 'Options for scraping each page',
      },
    },
    required: ['url'],
  },
};
```
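Note the oneOf on webhook in the schema above: it accepts either a bare URL string or an object carrying custom headers (for example, an auth token). Both values below are illustrative, not taken from the source:

```typescript
// webhook as a plain string: notify this URL when the crawl completes.
const webhookAsString = 'https://example.com/hooks/crawl-done';

// webhook as an object: same URL plus custom request headers;
// only `url` is required by the schema.
const webhookAsObject = {
  url: 'https://example.com/hooks/crawl-done',
  headers: { Authorization: 'Bearer <token>' },
};
```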
  • src/index.ts:961-973 (registration)
    Registers the CRAWL_TOOL (firecrawl_crawl) in the list of available tools served by the MCP server.
```typescript
tools: [
  SCRAPE_TOOL,
  MAP_TOOL,
  CRAWL_TOOL,
  BATCH_SCRAPE_TOOL,
  CHECK_BATCH_STATUS_TOOL,
  CHECK_CRAWL_STATUS_TOOL,
  SEARCH_TOOL,
  EXTRACT_TOOL,
  DEEP_RESEARCH_TOOL,
  GENERATE_LLMSTXT_TOOL,
],
}));
```
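The trailing `}));` indicates this list is the body of a request handler. A minimal sketch of the likely surrounding code, assuming the standard MCP SDK ListTools pattern (the actual wiring in src/index.ts may differ):

```typescript
import { ListToolsRequestSchema } from '@modelcontextprotocol/sdk/types.js';

// Sketch: advertise every tool, including CRAWL_TOOL, in response to a
// tools/list request. `server` is the MCP Server instance created
// elsewhere in src/index.ts.
server.setRequestHandler(ListToolsRequestSchema, async () => ({
  tools: [SCRAPE_TOOL, MAP_TOOL, CRAWL_TOOL /* ...remaining tools */],
}));
```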
  • Handler implementation that validates arguments, initiates asynchronous crawl using Firecrawl client, handles credits and errors, and returns the crawl job ID.
```typescript
case 'firecrawl_crawl': {
  if (!isCrawlOptions(args)) {
    throw new Error('Invalid arguments for firecrawl_crawl');
  }
  const { url, ...options } = args;
  const response = await withRetry(
    async () =>
      // @ts-expect-error Extended API options including origin
      client.asyncCrawlUrl(url, { ...options, origin: 'mcp-server' }),
    'crawl operation'
  );
  if (!response.success) {
    throw new Error(response.error);
  }
  // Monitor credits for cloud API
  if (!FIRECRAWL_API_URL && hasCredits(response)) {
    await updateCreditUsage(response.creditsUsed);
  }
  return {
    content: [
      {
        type: 'text',
        text: trimResponseText(
          `Started crawl for ${url} with job ID: ${response.id}`
        ),
      },
    ],
    isError: false,
  };
}
```
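`withRetry`, `hasCredits`, `updateCreditUsage`, and `trimResponseText` are helpers defined elsewhere in src/index.ts and are not shown on this page. As a rough sketch of what `withRetry` plausibly does (an assumption about its shape, not the project's actual implementation), a generic exponential-backoff wrapper looks like this:

```typescript
// Hypothetical retry helper: re-attempts a failing async operation with
// exponential backoff. The real helper in src/index.ts may differ.
async function withRetry<T>(
  operation: () => Promise<T>,
  label: string,
  maxAttempts = 3,
  baseDelayMs = 1000
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts) {
        // Wait 1s, 2s, 4s, ... before the next attempt.
        await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** (attempt - 1)));
      }
    }
  }
  throw new Error(`${label} failed after ${maxAttempts} attempts: ${lastError}`);
}
```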
  • Type guard helper to validate that arguments match CrawlParams with required 'url' string.
```typescript
function isCrawlOptions(args: unknown): args is CrawlParams & { url: string } {
  return (
    typeof args === 'object' &&
    args !== null &&
    'url' in args &&
    typeof (args as { url: unknown }).url === 'string'
  );
}
```
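The guard checks only that `url` is a string, leaving validation of the remaining optional fields to the Firecrawl API. A small usage sketch, with illustrative sample arguments:

```typescript
// `args` arrives untyped from the MCP request; the guard narrows it
// before `url` is destructured. The object literal is illustrative.
const args: unknown = { url: 'https://example.com', maxDepth: 2 };

if (isCrawlOptions(args)) {
  // Inside this branch, TypeScript knows args.url is a string.
  const { url, ...options } = args;
  console.log(`would crawl ${url} with`, options);
}
```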
