Firecrawl MCP Server

firecrawl_crawl

Extract content from multiple pages of a website by crawling through its links, gathering comprehensive data from related pages.

Instructions

Starts a crawl job on a website and extracts content from all pages.

**Best for:** Extracting content from multiple related pages, when you need comprehensive coverage.

**Not recommended for:** Extracting content from a single page (use scrape); when token limits are a concern (use map + batch_scrape); when you need fast results (crawling can be slow).

**Warning:** Crawl responses can be very large and may exceed token limits. Limit the crawl depth and number of pages, or use map + batch_scrape for better control.

**Common mistakes:** Setting limit or maxDiscoveryDepth too high (causes token overflow) or too low (causes missing pages); using crawl for a single page (use scrape instead). Using a /* wildcard is not recommended.

**Prompt Example:** "Get all blog posts from the first two levels of example.com/blog."

**Usage Example:**

{ "name": "firecrawl_crawl", "arguments": { "url": "https://example.com/blog/*", "maxDiscoveryDepth": 5, "limit": 20, "allowExternalLinks": false, "deduplicateSimilarURLs": true, "sitemap": "include" } }

Returns: Operation ID for status checking; use firecrawl_check_crawl_status to check progress.
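For example, once the crawl has started, progress can be polled with a call like the one below (a sketch; the exact argument name for the job ID comes from the firecrawl_check_crawl_status schema, which is not shown on this page):

```json
{
  "name": "firecrawl_check_crawl_status",
  "arguments": {
    "id": "<operation-id-returned-by-firecrawl_crawl>"
  }
}
```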

Input Schema

| Name | Required | Type (from the Zod schema below) |
|------|----------|----------------------------------|
| url | Yes | string |
| prompt | No | string |
| excludePaths | No | string[] |
| includePaths | No | string[] |
| maxDiscoveryDepth | No | number |
| sitemap | No | "skip" \| "include" \| "only" |
| limit | No | number |
| allowExternalLinks | No | boolean |
| allowSubdomains | No | boolean |
| crawlEntireDomain | No | boolean |
| delay | No | number |
| maxConcurrency | No | number |
| webhook | No | string or { url, headers? } (disabled in SAFE_MODE) |
| deduplicateSimilarURLs | No | boolean |
| ignoreQueryParameters | No | boolean |
| scrapeOptions | No | scrapeParamsSchema without url, all fields optional |
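For instance, a call that limits the crawl to a docs subtree while skipping an archive section might look like this (hypothetical values; the exact path-pattern syntax for includePaths/excludePaths is defined by the Firecrawl API and not documented on this page):

```json
{
  "name": "firecrawl_crawl",
  "arguments": {
    "url": "https://example.com",
    "includePaths": ["/docs/.*"],
    "excludePaths": ["/docs/archive/.*"],
    "limit": 50,
    "sitemap": "include"
  }
}
```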

Implementation Reference

  • Handler function that executes the crawl using the Firecrawl client.crawl() method, cleaning arguments and logging the operation. A sketch of the cleaning helper appears after this list.
```typescript
execute: async (args, { session, log }) => {
  const { url, ...options } = args as Record<string, unknown>;
  const client = getClient(session);
  const cleaned = removeEmptyTopLevel(options as Record<string, unknown>);
  log.info('Starting crawl', { url: String(url) });
  const res = await client.crawl(String(url), {
    ...(cleaned as any),
    origin: ORIGIN,
  });
  return asText(res);
},
```
  • Zod schema defining the input parameters for the firecrawl_crawl tool, including options for crawling behavior and scrape configurations. A validation example appears after this list.
```typescript
parameters: z.object({
  url: z.string(),
  prompt: z.string().optional(),
  excludePaths: z.array(z.string()).optional(),
  includePaths: z.array(z.string()).optional(),
  maxDiscoveryDepth: z.number().optional(),
  sitemap: z.enum(['skip', 'include', 'only']).optional(),
  limit: z.number().optional(),
  allowExternalLinks: z.boolean().optional(),
  allowSubdomains: z.boolean().optional(),
  crawlEntireDomain: z.boolean().optional(),
  delay: z.number().optional(),
  maxConcurrency: z.number().optional(),
  ...(SAFE_MODE
    ? {}
    : {
        webhook: z
          .union([
            z.string(),
            z.object({
              url: z.string(),
              headers: z.record(z.string(), z.string()).optional(),
            }),
          ])
          .optional(),
      }),
  deduplicateSimilarURLs: z.boolean().optional(),
  ignoreQueryParameters: z.boolean().optional(),
  scrapeOptions: scrapeParamsSchema.omit({ url: true }).partial().optional(),
}),
```
  • src/index.ts:449-521 (registration)
    Registration of the firecrawl_crawl tool with FastMCP server.addTool, including name, description, parameters schema, and execute handler.
```typescript
server.addTool({
  name: 'firecrawl_crawl',
  description: `
Starts a crawl job on a website and extracts content from all pages.

**Best for:** Extracting content from multiple related pages, when you need comprehensive coverage.
**Not recommended for:** Extracting content from a single page (use scrape); when token limits are a concern (use map + batch_scrape); when you need fast results (crawling can be slow).
**Warning:** Crawl responses can be very large and may exceed token limits. Limit the crawl depth and number of pages, or use map + batch_scrape for better control.
**Common mistakes:** Setting limit or maxDiscoveryDepth too high (causes token overflow) or too low (causes missing pages); using crawl for a single page (use scrape instead). Using a /* wildcard is not recommended.
**Prompt Example:** "Get all blog posts from the first two levels of example.com/blog."
**Usage Example:**
\`\`\`json
{
  "name": "firecrawl_crawl",
  "arguments": {
    "url": "https://example.com/blog/*",
    "maxDiscoveryDepth": 5,
    "limit": 20,
    "allowExternalLinks": false,
    "deduplicateSimilarURLs": true,
    "sitemap": "include"
  }
}
\`\`\`
**Returns:** Operation ID for status checking; use firecrawl_check_crawl_status to check progress.
${
  SAFE_MODE
    ? '**Safe Mode:** Read-only crawling. Webhooks and interactive actions are disabled for security.'
    : ''
}
`,
  parameters: z.object({
    url: z.string(),
    prompt: z.string().optional(),
    excludePaths: z.array(z.string()).optional(),
    includePaths: z.array(z.string()).optional(),
    maxDiscoveryDepth: z.number().optional(),
    sitemap: z.enum(['skip', 'include', 'only']).optional(),
    limit: z.number().optional(),
    allowExternalLinks: z.boolean().optional(),
    allowSubdomains: z.boolean().optional(),
    crawlEntireDomain: z.boolean().optional(),
    delay: z.number().optional(),
    maxConcurrency: z.number().optional(),
    ...(SAFE_MODE
      ? {}
      : {
          webhook: z
            .union([
              z.string(),
              z.object({
                url: z.string(),
                headers: z.record(z.string(), z.string()).optional(),
              }),
            ])
            .optional(),
        }),
    deduplicateSimilarURLs: z.boolean().optional(),
    ignoreQueryParameters: z.boolean().optional(),
    scrapeOptions: scrapeParamsSchema.omit({ url: true }).partial().optional(),
  }),
  execute: async (args, { session, log }) => {
    const { url, ...options } = args as Record<string, unknown>;
    const client = getClient(session);
    const cleaned = removeEmptyTopLevel(options as Record<string, unknown>);
    log.info('Starting crawl', { url: String(url) });
    const res = await client.crawl(String(url), {
      ...(cleaned as any),
      origin: ORIGIN,
    });
    return asText(res);
  },
});
```
  • Shared scrapeParamsSchema, referenced by the scrapeOptions field of the firecrawl_crawl parameters, defining advanced per-page scraping options. A usage example appears after this list.
```typescript
const scrapeParamsSchema = z.object({
  url: z.string().url(),
  formats: z
    .array(
      z.union([
        z.enum([
          'markdown',
          'html',
          'rawHtml',
          'screenshot',
          'links',
          'summary',
          'changeTracking',
          'branding',
        ]),
        z.object({
          type: z.literal('json'),
          prompt: z.string().optional(),
          schema: z.record(z.string(), z.any()).optional(),
        }),
        z.object({
          type: z.literal('screenshot'),
          fullPage: z.boolean().optional(),
          quality: z.number().optional(),
          viewport: z
            .object({ width: z.number(), height: z.number() })
            .optional(),
        }),
      ])
    )
    .optional(),
  parsers: z
    .array(
      z.union([
        z.enum(['pdf']),
        z.object({
          type: z.enum(['pdf']),
          maxPages: z.number().int().min(1).max(10000).optional(),
        }),
      ])
    )
    .optional(),
  onlyMainContent: z.boolean().optional(),
  includeTags: z.array(z.string()).optional(),
  excludeTags: z.array(z.string()).optional(),
  waitFor: z.number().optional(),
  ...(SAFE_MODE
    ? {}
    : {
        actions: z
          .array(
            z.object({
              type: z.enum(allowedActionTypes),
              selector: z.string().optional(),
              milliseconds: z.number().optional(),
              text: z.string().optional(),
              key: z.string().optional(),
              direction: z.enum(['up', 'down']).optional(),
              script: z.string().optional(),
              fullPage: z.boolean().optional(),
            })
          )
          .optional(),
      }),
  mobile: z.boolean().optional(),
  skipTlsVerification: z.boolean().optional(),
  removeBase64Images: z.boolean().optional(),
  location: z
    .object({
      country: z.string().optional(),
      languages: z.array(z.string()).optional(),
    })
    .optional(),
  storeInCache: z.boolean().optional(),
  maxAge: z.number().optional(),
});
```
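The removeEmptyTopLevel helper referenced in the handler is not shown on this page. A minimal sketch of what such a helper plausibly does (an assumption, not the actual implementation) is:

```typescript
// Hypothetical sketch of removeEmptyTopLevel: drop top-level keys whose
// values are undefined, null, empty strings, empty arrays, or empty objects,
// so that unset optional tool arguments are not forwarded to the API.
function removeEmptyTopLevel(
  obj: Record<string, unknown>
): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(obj)) {
    if (value === undefined || value === null) continue;
    if (typeof value === 'string' && value.length === 0) continue;
    if (Array.isArray(value) && value.length === 0) continue;
    if (
      typeof value === 'object' &&
      !Array.isArray(value) &&
      Object.keys(value as object).length === 0
    )
      continue;
    out[key] = value;
  }
  return out;
}
```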
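As a small illustration of how the parameters schema behaves (assuming the z.object(...) above is bound to a local parameters constant; the argument values are hypothetical):

```typescript
// Valid arguments parse cleanly; sitemap must be one of the three enum values.
const args = parameters.parse({
  url: 'https://example.com/blog',
  sitemap: 'include',
  limit: 20,
});

// An out-of-enum value such as sitemap: 'all' would throw a ZodError instead.
```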
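Putting it together, the scrapeOptions argument of firecrawl_crawl accepts any of these per-page fields (minus url). A crawl that extracts markdown from main content only might look like this (hypothetical values, using only fields defined in the schema above):

```json
{
  "name": "firecrawl_crawl",
  "arguments": {
    "url": "https://example.com/docs",
    "limit": 30,
    "scrapeOptions": {
      "formats": ["markdown"],
      "onlyMainContent": true,
      "excludeTags": ["nav", "footer"]
    }
  }
}
```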
