Firecrawl MCP Server

by Jaycee1996

firecrawl_crawl

Extract content from multiple pages on a website by starting a crawl job. Use it to gather data comprehensively from related pages, with configurable depth and page limits.

Instructions

Starts a crawl job on a website and extracts content from all pages.

**Best for:** Extracting content from multiple related pages when you need comprehensive coverage.

**Not recommended for:** Extracting content from a single page (use scrape); when token limits are a concern (use map + batch_scrape); when you need fast results (crawling can be slow).

**Warning:** Crawl responses can be very large and may exceed token limits. Limit the crawl depth and number of pages, or use map + batch_scrape for better control.

**Common mistakes:** Setting limit or maxDiscoveryDepth too high (causes token overflow) or too low (causes missing pages); using crawl for a single page (use scrape instead). Using a /* wildcard is not recommended.

**Prompt Example:** "Get all blog posts from the first two levels of example.com/blog."

**Usage Example:**

    {
      "name": "firecrawl_crawl",
      "arguments": {
        "url": "https://example.com/blog/*",
        "maxDiscoveryDepth": 5,
        "limit": 20,
        "allowExternalLinks": false,
        "deduplicateSimilarURLs": true,
        "sitemap": "include"
      }
    }

**Returns:** Operation ID for status checking; use firecrawl_check_crawl_status to check progress.
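
Since the tool returns an operation ID rather than the crawl results themselves, a typical client starts the crawl and then polls. Below is a minimal TypeScript sketch assuming the official @modelcontextprotocol/sdk, a stdio launch via npx firecrawl-mcp, and an id argument on firecrawl_check_crawl_status; these client-side details are assumptions about a typical setup, not part of this reference.

    // Minimal sketch: start a crawl, then poll its status.
    // The SDK imports and the server launch command are assumptions.
    import { Client } from '@modelcontextprotocol/sdk/client/index.js';
    import { StdioClientTransport } from '@modelcontextprotocol/sdk/client/stdio.js';

    async function main() {
      const transport = new StdioClientTransport({
        command: 'npx',
        args: ['-y', 'firecrawl-mcp'], // assumed launch command for this server
        env: { FIRECRAWL_API_KEY: process.env.FIRECRAWL_API_KEY ?? '' },
      });
      const client = new Client({ name: 'crawl-example', version: '1.0.0' });
      await client.connect(transport);

      // Start the crawl; the response text carries the operation ID.
      const started = await client.callTool({
        name: 'firecrawl_crawl',
        arguments: { url: 'https://example.com/blog', maxDiscoveryDepth: 2, limit: 20 },
      });
      console.log(JSON.stringify(started, null, 2));

      // Poll for progress with the companion tool. The 'id' argument name is
      // an assumption; check the firecrawl_check_crawl_status schema.
      const status = await client.callTool({
        name: 'firecrawl_check_crawl_status',
        arguments: { id: '<operation-id-from-the-response-above>' },
      });
      console.log(JSON.stringify(status, null, 2));

      await client.close();
    }

    main().catch(console.error);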

Input Schema

| Name                   | Required | Description | Default |
| ---------------------- | -------- | ----------- | ------- |
| url                    | Yes      |             |         |
| prompt                 | No       |             |         |
| excludePaths           | No       |             |         |
| includePaths           | No       |             |         |
| maxDiscoveryDepth      | No       |             |         |
| sitemap                | No       |             |         |
| limit                  | No       |             |         |
| allowExternalLinks     | No       |             |         |
| allowSubdomains        | No       |             |         |
| crawlEntireDomain      | No       |             |         |
| delay                  | No       |             |         |
| maxConcurrency         | No       |             |         |
| webhook                | No       |             |         |
| deduplicateSimilarURLs | No       |             |         |
| ignoreQueryParameters  | No       |             |         |
| scrapeOptions          | No       |             |         |
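
Read alongside the Zod schema in the Implementation Reference below, these parameters map to a TypeScript argument object like the following. Field names and types come from that schema; the inline comments are interpretive glosses, not official descriptions.

    // Illustrative argument object; comments are interpretation, not docs.
    const crawlArgs = {
      url: 'https://example.com/blog',  // string (required): crawl entry point
      includePaths: ['/blog/*'],        // string[]: only crawl matching paths
      excludePaths: ['/blog/tag/*'],    // string[]: skip matching paths
      maxDiscoveryDepth: 2,             // number: link levels to follow
      sitemap: 'include',               // 'skip' | 'include' | 'only'
      limit: 20,                        // number: maximum pages to crawl
      allowExternalLinks: false,        // boolean: follow off-site links
      allowSubdomains: false,           // boolean: include subdomains
      crawlEntireDomain: false,         // boolean: expand beyond the start path
      delay: 1,                         // number: pause between requests (units per Firecrawl API)
      maxConcurrency: 2,                // number: cap on parallel requests
      deduplicateSimilarURLs: true,     // boolean: collapse near-duplicate URLs
      ignoreQueryParameters: true,      // boolean: treat ?a=1 and ?a=2 as one page
    };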

Implementation Reference

  • The handler function that executes the firecrawl_crawl tool. It extracts the URL and options from the arguments, creates a Firecrawl client, strips empty top-level options (a sketch of the removeEmptyTopLevel helper follows this list), logs the action, calls client.crawl(), and returns the result as text.

    execute: async (args, { session, log }) => {
      const { url, ...options } = args as Record<string, unknown>;
      const client = getClient(session);
      const cleaned = removeEmptyTopLevel(options as Record<string, unknown>);
      log.info('Starting crawl', { url: String(url) });
      const res = await client.crawl(String(url), {
        ...(cleaned as any),
        origin: ORIGIN,
      });
      return asText(res);
    },
  • Zod schema defining the input parameters for the firecrawl_crawl tool, including the url, optional prompt, path filters, limits, webhook (omitted in SAFE_MODE), and scrape options.

    parameters: z.object({
      url: z.string(),
      prompt: z.string().optional(),
      excludePaths: z.array(z.string()).optional(),
      includePaths: z.array(z.string()).optional(),
      maxDiscoveryDepth: z.number().optional(),
      sitemap: z.enum(['skip', 'include', 'only']).optional(),
      limit: z.number().optional(),
      allowExternalLinks: z.boolean().optional(),
      allowSubdomains: z.boolean().optional(),
      crawlEntireDomain: z.boolean().optional(),
      delay: z.number().optional(),
      maxConcurrency: z.number().optional(),
      ...(SAFE_MODE
        ? {}
        : {
            webhook: z
              .union([
                z.string(),
                z.object({
                  url: z.string(),
                  headers: z.record(z.string(), z.string()).optional(),
                }),
              ])
              .optional(),
          }),
      deduplicateSimilarURLs: z.boolean().optional(),
      ignoreQueryParameters: z.boolean().optional(),
      scrapeOptions: scrapeParamsSchema.omit({ url: true }).partial().optional(),
    }),
  • src/index.ts:449-521 (registration)
    Registration of the firecrawl_crawl tool using server.addTool, combining the name, description, parameters schema, and execute handler shown above.

    server.addTool({
      name: 'firecrawl_crawl',
      description: `
    Starts a crawl job on a website and extracts content from all pages.

    **Best for:** Extracting content from multiple related pages, when you need comprehensive coverage.
    **Not recommended for:** Extracting content from a single page (use scrape); when token limits are a concern (use map + batch_scrape); when you need fast results (crawling can be slow).
    **Warning:** Crawl responses can be very large and may exceed token limits. Limit the crawl depth and number of pages, or use map + batch_scrape for better control.
    **Common mistakes:** Setting limit or maxDiscoveryDepth too high (causes token overflow) or too low (causes missing pages); using crawl for a single page (use scrape instead). Using a /* wildcard is not recommended.
    **Prompt Example:** "Get all blog posts from the first two levels of example.com/blog."
    **Usage Example:**
    \`\`\`json
    {
      "name": "firecrawl_crawl",
      "arguments": {
        "url": "https://example.com/blog/*",
        "maxDiscoveryDepth": 5,
        "limit": 20,
        "allowExternalLinks": false,
        "deduplicateSimilarURLs": true,
        "sitemap": "include"
      }
    }
    \`\`\`
    **Returns:** Operation ID for status checking; use firecrawl_check_crawl_status to check progress.
    ${
      SAFE_MODE
        ? '**Safe Mode:** Read-only crawling. Webhooks and interactive actions are disabled for security.'
        : ''
    }
    `,
      parameters: z.object({
        url: z.string(),
        prompt: z.string().optional(),
        excludePaths: z.array(z.string()).optional(),
        includePaths: z.array(z.string()).optional(),
        maxDiscoveryDepth: z.number().optional(),
        sitemap: z.enum(['skip', 'include', 'only']).optional(),
        limit: z.number().optional(),
        allowExternalLinks: z.boolean().optional(),
        allowSubdomains: z.boolean().optional(),
        crawlEntireDomain: z.boolean().optional(),
        delay: z.number().optional(),
        maxConcurrency: z.number().optional(),
        ...(SAFE_MODE
          ? {}
          : {
              webhook: z
                .union([
                  z.string(),
                  z.object({
                    url: z.string(),
                    headers: z.record(z.string(), z.string()).optional(),
                  }),
                ])
                .optional(),
            }),
        deduplicateSimilarURLs: z.boolean().optional(),
        ignoreQueryParameters: z.boolean().optional(),
        scrapeOptions: scrapeParamsSchema.omit({ url: true }).partial().optional(),
      }),
      execute: async (args, { session, log }) => {
        const { url, ...options } = args as Record<string, unknown>;
        const client = getClient(session);
        const cleaned = removeEmptyTopLevel(options as Record<string, unknown>);
        log.info('Starting crawl', { url: String(url) });
        const res = await client.crawl(String(url), {
          ...(cleaned as any),
          origin: ORIGIN,
        });
        return asText(res);
      },
    });
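
The handler above calls a removeEmptyTopLevel helper whose body is not shown in this reference. Below is a minimal sketch, assuming its job is only to drop unset or empty top-level options before they are forwarded to the Firecrawl client; the real implementation in src/index.ts may differ.

    // Hypothetical sketch of the removeEmptyTopLevel helper referenced by the
    // execute handler. Assumed behavior: drop top-level options that are
    // undefined, null, empty strings, empty arrays, or empty objects.
    function removeEmptyTopLevel(
      obj: Record<string, unknown>
    ): Record<string, unknown> {
      const out: Record<string, unknown> = {};
      for (const [key, value] of Object.entries(obj)) {
        if (value === undefined || value === null) continue;
        if (typeof value === 'string' && value.trim() === '') continue;
        if (Array.isArray(value) && value.length === 0) continue;
        if (
          typeof value === 'object' &&
          !Array.isArray(value) &&
          Object.keys(value).length === 0
        ) {
          continue;
        }
        out[key] = value;
      }
      return out;
    }

Because every schema field except url is optional, unset values would otherwise be forwarded as undefined and could shadow the Firecrawl API's server-side defaults; cleaning them first avoids that.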
