
Firecrawl MCP Server

firecrawl_crawl

Initiates an asynchronous crawl of a website, extracting content from multiple pages. Use it for comprehensive site coverage; monitor progress with the returned operation ID.

Instructions

Starts an asynchronous crawl job on a website and extracts content from all pages.

**Best for:** Extracting content from multiple related pages, when you need comprehensive coverage.

**Not recommended for:** Extracting content from a single page (use scrape); when token limits are a concern (use map + batch_scrape); when you need fast results (crawling can be slow).

**Warning:** Crawl responses can be very large and may exceed token limits. Limit the crawl depth and number of pages, or use map + batch_scrape for better control.

**Common mistakes:** Setting limit or maxDepth too high (causes token overflow); using crawl for a single page (use scrape instead).

**Prompt Example:** "Get all blog posts from the first two levels of example.com/blog."

**Usage Example:**

```json
{
  "name": "firecrawl_crawl",
  "arguments": {
    "url": "https://example.com/blog/*",
    "maxDepth": 2,
    "limit": 100,
    "allowExternalLinks": false,
    "deduplicateSimilarURLs": true
  }
}
```

**Returns:** Operation ID for status checking; use firecrawl_check_crawl_status to check progress.
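
Once the crawl is started, the returned ID is passed to firecrawl_check_crawl_status. A minimal sketch of that follow-up call, assuming the status tool takes the ID under an `id` argument (the ID value here is a placeholder):

```json
{
  "name": "firecrawl_check_crawl_status",
  "arguments": {
    "id": "550e8400-e29b-41d4-a716-446655440000"
  }
}
```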

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| url | Yes | Starting URL for the crawl | |
| excludePaths | No | URL paths to exclude from crawling | |
| includePaths | No | Only crawl these URL paths | |
| maxDepth | No | Maximum link depth to crawl | |
| ignoreSitemap | No | Skip sitemap.xml discovery | |
| limit | No | Maximum number of pages to crawl | |
| allowBackwardLinks | No | Allow crawling links that point to parent directories | |
| allowExternalLinks | No | Allow crawling links to external domains | |
| webhook | No | Webhook URL (string), or an object with url and optional headers, notified when the crawl completes | |
| deduplicateSimilarURLs | No | Remove similar URLs during crawl | |
| ignoreQueryParameters | No | Ignore query parameters when comparing URLs | |
| scrapeOptions | No | Options for scraping each page | |
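
For the richer options, a sketch combining the object form of webhook with scrapeOptions, using only fields defined in the schema above (the webhook URL and header value are placeholders):

```json
{
  "name": "firecrawl_crawl",
  "arguments": {
    "url": "https://example.com/docs",
    "limit": 25,
    "maxDepth": 3,
    "webhook": {
      "url": "https://hooks.example.com/crawl-done",
      "headers": { "Authorization": "Bearer <token>" }
    },
    "scrapeOptions": {
      "formats": ["markdown", "links"],
      "onlyMainContent": true,
      "waitFor": 1000
    }
  }
}
```

The object form of webhook is useful when the receiving endpoint requires authentication headers; the string form suffices otherwise.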

Implementation Reference

  • src/index.ts:248-378 (registration): tool definition/registration for firecrawl_crawl with its name, description, and input schema:
    const CRAWL_TOOL: Tool = {
      name: 'firecrawl_crawl',
      description: `
    Starts an asynchronous crawl job on a website and extracts content from all pages.
    
    **Best for:** Extracting content from multiple related pages, when you need comprehensive coverage.
    **Not recommended for:** Extracting content from a single page (use scrape); when token limits are a concern (use map + batch_scrape); when you need fast results (crawling can be slow).
    **Warning:** Crawl responses can be very large and may exceed token limits. Limit the crawl depth and number of pages, or use map + batch_scrape for better control.
    **Common mistakes:** Setting limit or maxDepth too high (causes token overflow); using crawl for a single page (use scrape instead).
    **Prompt Example:** "Get all blog posts from the first two levels of example.com/blog."
    **Usage Example:**
    \`\`\`json
    {
      "name": "firecrawl_crawl",
      "arguments": {
        "url": "https://example.com/blog/*",
        "maxDepth": 2,
        "limit": 100,
        "allowExternalLinks": false,
        "deduplicateSimilarURLs": true
      }
    }
    \`\`\`
    **Returns:** Operation ID for status checking; use firecrawl_check_crawl_status to check progress.
    `,
      inputSchema: {
        type: 'object',
        properties: {
          url: {
            type: 'string',
            description: 'Starting URL for the crawl',
          },
          excludePaths: {
            type: 'array',
            items: { type: 'string' },
            description: 'URL paths to exclude from crawling',
          },
          includePaths: {
            type: 'array',
            items: { type: 'string' },
            description: 'Only crawl these URL paths',
          },
          maxDepth: {
            type: 'number',
            description: 'Maximum link depth to crawl',
          },
          ignoreSitemap: {
            type: 'boolean',
            description: 'Skip sitemap.xml discovery',
          },
          limit: {
            type: 'number',
            description: 'Maximum number of pages to crawl',
          },
          allowBackwardLinks: {
            type: 'boolean',
            description: 'Allow crawling links that point to parent directories',
          },
          allowExternalLinks: {
            type: 'boolean',
            description: 'Allow crawling links to external domains',
          },
          webhook: {
            oneOf: [
              {
                type: 'string',
                description: 'Webhook URL to notify when crawl is complete',
              },
              {
                type: 'object',
                properties: {
                  url: {
                    type: 'string',
                    description: 'Webhook URL',
                  },
                  headers: {
                    type: 'object',
                    description: 'Custom headers for webhook requests',
                  },
                },
                required: ['url'],
              },
            ],
          },
          deduplicateSimilarURLs: {
            type: 'boolean',
            description: 'Remove similar URLs during crawl',
          },
          ignoreQueryParameters: {
            type: 'boolean',
            description: 'Ignore query parameters when comparing URLs',
          },
          scrapeOptions: {
            type: 'object',
            properties: {
              formats: {
                type: 'array',
                items: {
                  type: 'string',
                  enum: [
                    'markdown',
                    'html',
                    'rawHtml',
                    'screenshot',
                    'links',
                    'screenshot@fullPage',
                    'extract',
                  ],
                },
              },
              onlyMainContent: {
                type: 'boolean',
              },
              includeTags: {
                type: 'array',
                items: { type: 'string' },
              },
              excludeTags: {
                type: 'array',
                items: { type: 'string' },
              },
              waitFor: {
                type: 'number',
              },
            },
            description: 'Options for scraping each page',
          },
        },
        required: ['url'],
      },
    };
Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses key behaviors: the asynchronous nature, potential token overflow, the operation ID returned for status checking, and the warning about large responses. No annotations exist, so the description carries the full burden and meets it well.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Well-structured with sections (Best for, Warning, etc.), usage example, and prompt example. Every sentence is informative without redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a complex tool with 12 parameters and no output schema, the description covers purpose, usage, behavioral traits, and return value. It is complete enough for an agent to invoke correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 92%, so baseline is 3. Description adds value through usage example and common mistakes (e.g., setting limit too high), enhancing understanding beyond schema, but does not detail every parameter.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
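
To illustrate the schema's single hard constraint: url is the only required property, so the smallest valid invocation is simply the following. In practice the warning above suggests also setting limit and maxDepth to keep the response within token limits.

```json
{
  "name": "firecrawl_crawl",
  "arguments": {
    "url": "https://example.com"
  }
}
```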

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool starts an asynchronous crawl and extracts content from all pages, distinguishing it from siblings like scrape (single page) and map (for token concerns).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly provides 'Best for' and 'Not recommended for' sections, with specific alternatives (scrape, map + batch_scrape) and common mistakes, guiding when to use this tool versus others.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
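
As a sketch of the recommended map + batch_scrape alternative: first map the site to collect URLs, then batch-scrape a bounded subset. The sibling tool names (firecrawl_map, firecrawl_batch_scrape) and their argument shapes follow this server's naming pattern but are assumptions, not confirmed on this page:

```json
[
  {
    "name": "firecrawl_map",
    "arguments": { "url": "https://example.com/blog" }
  },
  {
    "name": "firecrawl_batch_scrape",
    "arguments": {
      "urls": [
        "https://example.com/blog/post-1",
        "https://example.com/blog/post-2"
      ],
      "options": { "formats": ["markdown"], "onlyMainContent": true }
    }
  }
]
```

This keeps token usage under explicit control, since the agent chooses exactly which URLs to scrape.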

