Firecrawl MCP Server

firecrawl_crawl

Extract content from multiple pages on a website by starting a crawl job. Use it to comprehensively gather data from related pages, with configurable depth and page limits.

Instructions

Starts a crawl job on a website and extracts content from all pages.

**Best for:** Extracting content from multiple related pages, when you need comprehensive coverage.
**Not recommended for:** Extracting content from a single page (use scrape); when token limits are a concern (use map + batch_scrape); when you need fast results (crawling can be slow).
**Warning:** Crawl responses can be very large and may exceed token limits. Limit the crawl depth and number of pages, or use map + batch_scrape for better control.
**Common mistakes:** Setting limit or maxDiscoveryDepth too high (causes token overflow) or too low (causes missing pages); using crawl for a single page (use scrape instead). Using a /* wildcard is not recommended.
**Prompt Example:** "Get all blog posts from the first two levels of example.com/blog."
**Usage Example:**

```json
{
  "name": "firecrawl_crawl",
  "arguments": {
    "url": "https://example.com/blog/*",
    "maxDiscoveryDepth": 5,
    "limit": 20,
    "allowExternalLinks": false,
    "deduplicateSimilarURLs": true,
    "sitemap": "include"
  }
}
```

**Returns:** Operation ID for status checking; use firecrawl_check_crawl_status to check progress.
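
Because the crawl runs asynchronously, results are retrieved with a follow-up status call. A minimal sketch of that call, assuming firecrawl_check_crawl_status accepts the job ID from the crawl response as `id` (the placeholder below is not a real ID):

```json
{
  "name": "firecrawl_check_crawl_status",
  "arguments": {
    "id": "<operation-id-from-crawl-response>"
  }
}
```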

Input Schema

| Name | Required | Description | Default |
|------|----------|-------------|---------|
| url | Yes | | |
| prompt | No | | |
| excludePaths | No | | |
| includePaths | No | | |
| maxDiscoveryDepth | No | | |
| sitemap | No | | |
| limit | No | | |
| allowExternalLinks | No | | |
| allowSubdomains | No | | |
| crawlEntireDomain | No | | |
| delay | No | | |
| maxConcurrency | No | | |
| webhook | No | | |
| deduplicateSimilarURLs | No | | |
| ignoreQueryParameters | No | | |
| scrapeOptions | No | | |
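
The table above lists the accepted parameters without descriptions. As a rough illustration of how the path filters can be combined with depth and page limits, here is a hedged example; the path patterns and their regex-style matching are assumptions, not confirmed by this page:

```json
{
  "name": "firecrawl_crawl",
  "arguments": {
    "url": "https://example.com",
    "includePaths": ["^/blog/.*"],
    "excludePaths": ["^/blog/archive/.*"],
    "maxDiscoveryDepth": 2,
    "limit": 20
  }
}
```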

Implementation Reference

  • The handler function that executes the firecrawl_crawl tool. It extracts the URL and options from arguments, cleans the options, creates a Firecrawl client, logs the action, calls client.crawl(), and returns the result as text.
    ```typescript
    execute: async (args, { session, log }) => {
      const { url, ...options } = args as Record<string, unknown>;
      const client = getClient(session);
      const cleaned = removeEmptyTopLevel(options as Record<string, unknown>);
      log.info('Starting crawl', { url: String(url) });
      const res = await client.crawl(String(url), {
        ...(cleaned as any),
        origin: ORIGIN,
      });
      return asText(res);
    },
    ```
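
    `removeEmptyTopLevel` is not included in this excerpt. A plausible sketch, consistent with how the handler uses it (its exact semantics are an assumption), would drop top-level keys with empty values before the options reach the API:

    ```typescript
    // Hypothetical sketch: strip top-level keys whose values are undefined,
    // null, '', [], or {} so they are not forwarded to client.crawl().
    function removeEmptyTopLevel(
      obj: Record<string, unknown>
    ): Record<string, unknown> {
      const out: Record<string, unknown> = {};
      for (const [key, value] of Object.entries(obj)) {
        if (value === undefined || value === null) continue;
        if (typeof value === 'string' && value.length === 0) continue;
        if (Array.isArray(value) && value.length === 0) continue;
        if (
          typeof value === 'object' &&
          !Array.isArray(value) &&
          Object.keys(value).length === 0
        )
          continue;
        out[key] = value;
      }
      return out;
    }
    ```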
  • Zod schema defining the input parameters for the firecrawl_crawl tool, including url, optional prompts, paths, limits, and scrape options.
    ```typescript
    parameters: z.object({
      url: z.string(),
      prompt: z.string().optional(),
      excludePaths: z.array(z.string()).optional(),
      includePaths: z.array(z.string()).optional(),
      maxDiscoveryDepth: z.number().optional(),
      sitemap: z.enum(['skip', 'include', 'only']).optional(),
      limit: z.number().optional(),
      allowExternalLinks: z.boolean().optional(),
      allowSubdomains: z.boolean().optional(),
      crawlEntireDomain: z.boolean().optional(),
      delay: z.number().optional(),
      maxConcurrency: z.number().optional(),
      ...(SAFE_MODE
        ? {}
        : {
            webhook: z
              .union([
                z.string(),
                z.object({
                  url: z.string(),
                  headers: z.record(z.string(), z.string()).optional(),
                }),
              ])
              .optional(),
          }),
      deduplicateSimilarURLs: z.boolean().optional(),
      ignoreQueryParameters: z.boolean().optional(),
      scrapeOptions: scrapeParamsSchema.omit({ url: true }).partial().optional(),
    }),
    ```
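
    For a sense of what such a schema accepts, here is a self-contained Zod sketch using a pared-down subset of the fields above (not the full schema, and the sample arguments are illustrative only):

    ```typescript
    import { z } from 'zod';

    // Pared-down stand-in for the crawl parameters schema above.
    const crawlArgs = z.object({
      url: z.string(),
      maxDiscoveryDepth: z.number().optional(),
      limit: z.number().optional(),
      sitemap: z.enum(['skip', 'include', 'only']).optional(),
    });

    // safeParse reports success or a structured list of validation issues.
    const result = crawlArgs.safeParse({
      url: 'https://example.com/blog',
      maxDiscoveryDepth: 2,
      limit: 20,
      sitemap: 'include',
    });

    if (result.success) {
      console.log(result.data); // typed, validated arguments
    } else {
      console.error(result.error.issues); // e.g. a wrong enum value for sitemap
    }
    ```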
  • src/index.ts:449-521 (registration)
    Registration of the firecrawl_crawl tool using server.addTool, including name, description, parameters schema, and execute handler.
    ````typescript
    server.addTool({
      name: 'firecrawl_crawl',
      description: `
     Starts a crawl job on a website and extracts content from all pages.

     **Best for:** Extracting content from multiple related pages, when you need comprehensive coverage.
     **Not recommended for:** Extracting content from a single page (use scrape); when token limits are a concern (use map + batch_scrape); when you need fast results (crawling can be slow).
     **Warning:** Crawl responses can be very large and may exceed token limits. Limit the crawl depth and number of pages, or use map + batch_scrape for better control.
     **Common mistakes:** Setting limit or maxDiscoveryDepth too high (causes token overflow) or too low (causes missing pages); using crawl for a single page (use scrape instead). Using a /* wildcard is not recommended.
     **Prompt Example:** "Get all blog posts from the first two levels of example.com/blog."
     **Usage Example:**
     \`\`\`json
     {
       "name": "firecrawl_crawl",
       "arguments": {
         "url": "https://example.com/blog/*",
         "maxDiscoveryDepth": 5,
         "limit": 20,
         "allowExternalLinks": false,
         "deduplicateSimilarURLs": true,
         "sitemap": "include"
       }
     }
     \`\`\`
     **Returns:** Operation ID for status checking; use firecrawl_check_crawl_status to check progress.
     ${
       SAFE_MODE
         ? '**Safe Mode:** Read-only crawling. Webhooks and interactive actions are disabled for security.'
         : ''
     }
     `,
      parameters: z.object({
        url: z.string(),
        prompt: z.string().optional(),
        excludePaths: z.array(z.string()).optional(),
        includePaths: z.array(z.string()).optional(),
        maxDiscoveryDepth: z.number().optional(),
        sitemap: z.enum(['skip', 'include', 'only']).optional(),
        limit: z.number().optional(),
        allowExternalLinks: z.boolean().optional(),
        allowSubdomains: z.boolean().optional(),
        crawlEntireDomain: z.boolean().optional(),
        delay: z.number().optional(),
        maxConcurrency: z.number().optional(),
        ...(SAFE_MODE
          ? {}
          : {
              webhook: z
                .union([
                  z.string(),
                  z.object({
                    url: z.string(),
                    headers: z.record(z.string(), z.string()).optional(),
                  }),
                ])
                .optional(),
            }),
        deduplicateSimilarURLs: z.boolean().optional(),
        ignoreQueryParameters: z.boolean().optional(),
        scrapeOptions: scrapeParamsSchema.omit({ url: true }).partial().optional(),
      }),
      execute: async (args, { session, log }) => {
        const { url, ...options } = args as Record<string, unknown>;
        const client = getClient(session);
        const cleaned = removeEmptyTopLevel(options as Record<string, unknown>);
        log.info('Starting crawl', { url: String(url) });
        const res = await client.crawl(String(url), {
          ...(cleaned as any),
          origin: ORIGIN,
        });
        return asText(res);
      },
    });
    ````
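
    The `...(SAFE_MODE ? {} : { webhook: ... })` spread is what removes `webhook` from the schema in safe mode: spreading an empty object contributes no keys. A minimal, self-contained illustration of the pattern (the environment-variable check for SAFE_MODE is an assumption made for this sketch):

    ```typescript
    import { z } from 'zod';

    // Assumed for this sketch: SAFE_MODE derived from an environment flag.
    const SAFE_MODE = process.env.SAFE_MODE === 'true';

    const parameters = z.object({
      url: z.string(),
      // When SAFE_MODE is true the spread adds nothing, so the resulting
      // schema simply has no `webhook` key.
      ...(SAFE_MODE ? {} : { webhook: z.string().optional() }),
    });
    ```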
