firecrawl_crawl
Crawl a website and extract content from multiple related pages. Returns final crawl status and data after polling.
Instructions
Starts a crawl job on a website, polls until it reaches a terminal state, and returns the final crawl status/data.
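The start-then-poll behavior described above can be sketched as follows. This is only an illustration of the pattern, not the tool's actual implementation: `poll_until_terminal`, the `get_status` callable, and the status names `completed`, `failed`, and `cancelled` are assumptions made for the example.

```python
import time
from typing import Any, Callable, Dict

# Assumed terminal states; the real service may use different names.
TERMINAL_STATES = {"completed", "failed", "cancelled"}


def poll_until_terminal(
    get_status: Callable[[], Dict[str, Any]],
    interval_s: float = 2.0,
    timeout_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call get_status() repeatedly until it reports a terminal state.

    get_status is a hypothetical callable that returns the current crawl
    status as a dict with at least a "status" key.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        result = get_status()
        if result.get("status") in TERMINAL_STATES:
            return result
        # Give up once the deadline has passed rather than polling forever.
        if time.monotonic() >= deadline:
            raise TimeoutError("crawl did not reach a terminal state in time")
        sleep(interval_s)
```

The `sleep` parameter is injectable so the loop can be exercised without real delays.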
Best for: Extracting content from multiple related pages when you need comprehensive coverage.
Not recommended for: Extracting content from a single page (use scrape); when token limits are a concern (use map + scrape for tighter control); when you need fast results (crawling can be slow).
Warning: Crawl responses can be very large and may exceed token limits. Limit the crawl depth and number of pages, or use map + scrape for tighter control.
Common mistakes: Setting limit or maxDiscoveryDepth too high (causes token overflow) or too low (causes missing pages); using crawl for a single page (use scrape instead); using a /* wildcard in the URL (not recommended).
Prompt Example: "Get all blog posts from the first two levels of example.com/blog."
Usage Example:
{
  "name": "firecrawl_crawl",
  "arguments": {
    "url": "https://example.com/blog",
    "maxDiscoveryDepth": 2,
    "limit": 20,
    "allowExternalLinks": false,
    "deduplicateSimilarURLs": true,
    "sitemap": "include"
  }
}
Returns: Final crawl status and data after internal polling, including the crawl ID. Use firecrawl_check_crawl_status only when you need to re-check an existing crawl ID later.
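Since oversized crawls are the main failure mode noted above, a caller might clamp `limit` and `maxDiscoveryDepth` and drop a trailing `/*` before sending the request. The following is a minimal sketch under stated assumptions: `build_crawl_call` is a hypothetical helper, and the ceilings `MAX_LIMIT` and `MAX_DEPTH` are illustrative values, not defaults documented by the tool.

```python
import json
from typing import Any, Dict

MAX_LIMIT = 50  # hypothetical ceiling on the number of pages
MAX_DEPTH = 3   # hypothetical ceiling on discovery depth


def build_crawl_call(url: str, limit: int = 20, max_discovery_depth: int = 2) -> Dict[str, Any]:
    """Build a firecrawl_crawl tool call with conservative bounds."""
    # The guidance above advises against a /* wildcard, so strip it if present.
    if url.endswith("/*"):
        url = url[:-2]
    return {
        "name": "firecrawl_crawl",
        "arguments": {
            "url": url,
            # Clamp both values into the assumed safe ranges.
            "limit": max(1, min(limit, MAX_LIMIT)),
            "maxDiscoveryDepth": max(0, min(max_discovery_depth, MAX_DEPTH)),
            "allowExternalLinks": False,
            "deduplicateSimilarURLs": True,
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_crawl_call("https://example.com/blog/*", limit=500), indent=2))
```

Clamping on the client side keeps an accidental `limit` of several hundred pages from turning into a response too large to process.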
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | ||
| delay | No | ||
| limit | No | ||
| prompt | No | ||
| sitemap | No | ||
| webhook | No | ||
| excludePaths | No | ||
| includePaths | No | ||
| scrapeOptions | No | ||
| maxConcurrency | No | ||
| webhookHeaders | No | ||
| allowSubdomains | No | ||
| crawlEntireDomain | No | ||
| maxDiscoveryDepth | No | ||
| allowExternalLinks | No | ||
| ignoreQueryParameters | No | ||
| deduplicateSimilarURLs | No | ||
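As a client-side sanity check, arguments can be compared against the parameter names in the table above before the call is made. This is a rough sketch: `validate_crawl_arguments` is a hypothetical helper, and it checks names only, because the table does not list types or defaults.

```python
from typing import Any, Dict, List

# Parameter names taken from the Input Schema table.
REQUIRED = {"url"}
OPTIONAL = {
    "delay", "limit", "prompt", "sitemap", "webhook",
    "excludePaths", "includePaths", "scrapeOptions", "maxConcurrency",
    "webhookHeaders", "allowSubdomains", "crawlEntireDomain",
    "maxDiscoveryDepth", "allowExternalLinks", "ignoreQueryParameters",
    "deduplicateSimilarURLs",
}


def validate_crawl_arguments(arguments: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the names look valid."""
    given = set(arguments)
    problems = []
    for name in sorted(REQUIRED - given):
        problems.append(f"missing required parameter: {name}")
    for name in sorted(given - REQUIRED - OPTIONAL):
        problems.append(f"unknown parameter: {name}")
    return problems
```

A check like this catches misspelled or outdated parameter names before a crawl job is started.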