Firecrawl MCP Server

firecrawl_crawl

Extract content from multiple pages of a website by crawling through its links, gathering comprehensive data from related pages.

Instructions

Starts a crawl job on a website and extracts content from all pages.

**Best for:** Extracting content from multiple related pages, when you need comprehensive coverage.

**Not recommended for:** Extracting content from a single page (use scrape); when token limits are a concern (use map + batch_scrape); when you need fast results (crawling can be slow).

**Warning:** Crawl responses can be very large and may exceed token limits. Limit the crawl depth and number of pages, or use map + batch_scrape for better control.

**Common mistakes:** Setting limit or maxDiscoveryDepth too high (causes token overflow) or too low (causes missing pages); using crawl for a single page (use scrape instead). Using a /* wildcard is not recommended.

**Prompt Example:** "Get all blog posts from the first two levels of example.com/blog."

**Usage Example:**

{ "name": "firecrawl_crawl", "arguments": { "url": "https://example.com/blog/*", "maxDiscoveryDepth": 5, "limit": 20, "allowExternalLinks": false, "deduplicateSimilarURLs": true, "sitemap": "include" } }

Returns: Operation ID for status checking; use firecrawl_check_crawl_status to check progress.
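For example, once the crawl has started, progress can be polled with a call like the one below (a sketch; the exact argument name for the job ID comes from the firecrawl_check_crawl_status schema, which is not shown on this page):

```json
{
  "name": "firecrawl_check_crawl_status",
  "arguments": {
    "id": "<operation-id-returned-by-firecrawl_crawl>"
  }
}
```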

Input Schema

| Name | Required | Type (from the Zod schema below) |
|------|----------|----------------------------------|
| url | Yes | string |
| prompt | No | string |
| excludePaths | No | string[] |
| includePaths | No | string[] |
| maxDiscoveryDepth | No | number |
| sitemap | No | "skip" \| "include" \| "only" |
| limit | No | number |
| allowExternalLinks | No | boolean |
| allowSubdomains | No | boolean |
| crawlEntireDomain | No | boolean |
| delay | No | number |
| maxConcurrency | No | number |
| webhook | No | string or { url, headers? } (disabled in SAFE_MODE) |
| deduplicateSimilarURLs | No | boolean |
| ignoreQueryParameters | No | boolean |
| scrapeOptions | No | scrapeParamsSchema without url, all fields optional |
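For instance, a call that limits the crawl to a docs subtree while skipping an archive section might look like this (hypothetical values; the exact path-pattern syntax for includePaths/excludePaths is defined by the Firecrawl API and not documented on this page):

```json
{
  "name": "firecrawl_crawl",
  "arguments": {
    "url": "https://example.com",
    "includePaths": ["/docs/.*"],
    "excludePaths": ["/docs/archive/.*"],
    "limit": 50,
    "sitemap": "include"
  }
}
```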

Implementation Reference

  • Handler function that executes the crawl using the Firecrawl client.crawl() method, cleaning arguments and logging the operation. A sketch of the cleaning helper appears after this list.
```typescript
execute: async (args, { session, log }) => {
  const { url, ...options } = args as Record<string, unknown>;
  const client = getClient(session);
  const cleaned = removeEmptyTopLevel(options as Record<string, unknown>);
  log.info('Starting crawl', { url: String(url) });
  const res = await client.crawl(String(url), {
    ...(cleaned as any),
    origin: ORIGIN,
  });
  return asText(res);
},
```
  • Zod schema defining the input parameters for the firecrawl_crawl tool, including options for crawling behavior and scrape configurations. A validation example appears after this list.
```typescript
parameters: z.object({
  url: z.string(),
  prompt: z.string().optional(),
  excludePaths: z.array(z.string()).optional(),
  includePaths: z.array(z.string()).optional(),
  maxDiscoveryDepth: z.number().optional(),
  sitemap: z.enum(['skip', 'include', 'only']).optional(),
  limit: z.number().optional(),
  allowExternalLinks: z.boolean().optional(),
  allowSubdomains: z.boolean().optional(),
  crawlEntireDomain: z.boolean().optional(),
  delay: z.number().optional(),
  maxConcurrency: z.number().optional(),
  ...(SAFE_MODE
    ? {}
    : {
        webhook: z
          .union([
            z.string(),
            z.object({
              url: z.string(),
              headers: z.record(z.string(), z.string()).optional(),
            }),
          ])
          .optional(),
      }),
  deduplicateSimilarURLs: z.boolean().optional(),
  ignoreQueryParameters: z.boolean().optional(),
  scrapeOptions: scrapeParamsSchema.omit({ url: true }).partial().optional(),
}),
```
  • src/index.ts:449-521 (registration)
    Registration of the firecrawl_crawl tool with FastMCP server.addTool, including name, description, parameters schema, and execute handler.
```typescript
server.addTool({
  name: 'firecrawl_crawl',
  description: `
Starts a crawl job on a website and extracts content from all pages.

**Best for:** Extracting content from multiple related pages, when you need comprehensive coverage.
**Not recommended for:** Extracting content from a single page (use scrape); when token limits are a concern (use map + batch_scrape); when you need fast results (crawling can be slow).
**Warning:** Crawl responses can be very large and may exceed token limits. Limit the crawl depth and number of pages, or use map + batch_scrape for better control.
**Common mistakes:** Setting limit or maxDiscoveryDepth too high (causes token overflow) or too low (causes missing pages); using crawl for a single page (use scrape instead). Using a /* wildcard is not recommended.
**Prompt Example:** "Get all blog posts from the first two levels of example.com/blog."
**Usage Example:**
\`\`\`json
{
  "name": "firecrawl_crawl",
  "arguments": {
    "url": "https://example.com/blog/*",
    "maxDiscoveryDepth": 5,
    "limit": 20,
    "allowExternalLinks": false,
    "deduplicateSimilarURLs": true,
    "sitemap": "include"
  }
}
\`\`\`
**Returns:** Operation ID for status checking; use firecrawl_check_crawl_status to check progress.
${
  SAFE_MODE
    ? '**Safe Mode:** Read-only crawling. Webhooks and interactive actions are disabled for security.'
    : ''
}
`,
  parameters: z.object({
    url: z.string(),
    prompt: z.string().optional(),
    excludePaths: z.array(z.string()).optional(),
    includePaths: z.array(z.string()).optional(),
    maxDiscoveryDepth: z.number().optional(),
    sitemap: z.enum(['skip', 'include', 'only']).optional(),
    limit: z.number().optional(),
    allowExternalLinks: z.boolean().optional(),
    allowSubdomains: z.boolean().optional(),
    crawlEntireDomain: z.boolean().optional(),
    delay: z.number().optional(),
    maxConcurrency: z.number().optional(),
    ...(SAFE_MODE
      ? {}
      : {
          webhook: z
            .union([
              z.string(),
              z.object({
                url: z.string(),
                headers: z.record(z.string(), z.string()).optional(),
              }),
            ])
            .optional(),
        }),
    deduplicateSimilarURLs: z.boolean().optional(),
    ignoreQueryParameters: z.boolean().optional(),
    scrapeOptions: scrapeParamsSchema.omit({ url: true }).partial().optional(),
  }),
  execute: async (args, { session, log }) => {
    const { url, ...options } = args as Record<string, unknown>;
    const client = getClient(session);
    const cleaned = removeEmptyTopLevel(options as Record<string, unknown>);
    log.info('Starting crawl', { url: String(url) });
    const res = await client.crawl(String(url), {
      ...(cleaned as any),
      origin: ORIGIN,
    });
    return asText(res);
  },
});
```
  • Shared scrapeParamsSchema, referenced by the scrapeOptions field of the firecrawl_crawl parameters, defining advanced per-page scraping options. A usage example appears after this list.
```typescript
const scrapeParamsSchema = z.object({
  url: z.string().url(),
  formats: z
    .array(
      z.union([
        z.enum([
          'markdown',
          'html',
          'rawHtml',
          'screenshot',
          'links',
          'summary',
          'changeTracking',
          'branding',
        ]),
        z.object({
          type: z.literal('json'),
          prompt: z.string().optional(),
          schema: z.record(z.string(), z.any()).optional(),
        }),
        z.object({
          type: z.literal('screenshot'),
          fullPage: z.boolean().optional(),
          quality: z.number().optional(),
          viewport: z
            .object({ width: z.number(), height: z.number() })
            .optional(),
        }),
      ])
    )
    .optional(),
  parsers: z
    .array(
      z.union([
        z.enum(['pdf']),
        z.object({
          type: z.enum(['pdf']),
          maxPages: z.number().int().min(1).max(10000).optional(),
        }),
      ])
    )
    .optional(),
  onlyMainContent: z.boolean().optional(),
  includeTags: z.array(z.string()).optional(),
  excludeTags: z.array(z.string()).optional(),
  waitFor: z.number().optional(),
  ...(SAFE_MODE
    ? {}
    : {
        actions: z
          .array(
            z.object({
              type: z.enum(allowedActionTypes),
              selector: z.string().optional(),
              milliseconds: z.number().optional(),
              text: z.string().optional(),
              key: z.string().optional(),
              direction: z.enum(['up', 'down']).optional(),
              script: z.string().optional(),
              fullPage: z.boolean().optional(),
            })
          )
          .optional(),
      }),
  mobile: z.boolean().optional(),
  skipTlsVerification: z.boolean().optional(),
  removeBase64Images: z.boolean().optional(),
  location: z
    .object({
      country: z.string().optional(),
      languages: z.array(z.string()).optional(),
    })
    .optional(),
  storeInCache: z.boolean().optional(),
  maxAge: z.number().optional(),
});
```
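The removeEmptyTopLevel helper referenced in the handler is not shown on this page. A minimal sketch of what such a helper plausibly does (an assumption, not the actual implementation) is:

```typescript
// Hypothetical sketch of removeEmptyTopLevel: drop top-level keys whose
// values are undefined, null, empty strings, empty arrays, or empty objects,
// so that unset optional tool arguments are not forwarded to the API.
function removeEmptyTopLevel(
  obj: Record<string, unknown>
): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(obj)) {
    if (value === undefined || value === null) continue;
    if (typeof value === 'string' && value.length === 0) continue;
    if (Array.isArray(value) && value.length === 0) continue;
    if (
      typeof value === 'object' &&
      !Array.isArray(value) &&
      Object.keys(value as object).length === 0
    )
      continue;
    out[key] = value;
  }
  return out;
}
```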
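As a small illustration of how the parameters schema behaves (assuming the z.object(...) above is bound to a local parameters constant; the argument values are hypothetical):

```typescript
// Valid arguments parse cleanly; sitemap must be one of the three enum values.
const args = parameters.parse({
  url: 'https://example.com/blog',
  sitemap: 'include',
  limit: 20,
});

// An out-of-enum value such as sitemap: 'all' would throw a ZodError instead.
```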
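Putting it together, the scrapeOptions argument of firecrawl_crawl accepts any of these per-page fields (minus url). A crawl that extracts markdown from main content only might look like this (hypothetical values, using only fields defined in the schema above):

```json
{
  "name": "firecrawl_crawl",
  "arguments": {
    "url": "https://example.com/docs",
    "limit": 30,
    "scrapeOptions": {
      "formats": ["markdown"],
      "onlyMainContent": true,
      "excludeTags": ["nav", "footer"]
    }
  }
}
```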
