firecrawl_crawl
Extract content from multiple related web pages by crawling a website. Use this tool to gather information comprehensively from all pages within the specified depth and page limits.
Instructions
Starts an asynchronous crawl job on a website and extracts content from all pages.
- Best for: extracting content from multiple related pages when you need comprehensive coverage.
- Not recommended for: extracting content from a single page (use scrape); when token limits are a concern (use map + batch_scrape); when you need fast results (crawling can be slow).
- Warning: crawl responses can be very large and may exceed token limits. Limit the crawl depth and number of pages, or use map + batch_scrape for better control.
- Common mistakes: setting limit or maxDepth too high (causes token overflow); using crawl for a single page (use scrape instead).
- Prompt Example: "Get all blog posts from the first two levels of example.com/blog."
- Usage Example: see the sketch below.
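A minimal usage sketch for the prompt above, assuming the MCP-style JSON tool-call format; the URL and numeric values are illustrative, not defaults:

```json
{
  "name": "firecrawl_crawl",
  "arguments": {
    "url": "https://example.com/blog",
    "maxDepth": 2,
    "limit": 100
  }
}
```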
Returns: Operation ID for status checking; use firecrawl_check_crawl_status to check progress.
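A follow-up sketch of the status check, assuming the crawl call returned an operation ID; the ID shown is a placeholder:

```json
{
  "name": "firecrawl_check_crawl_status",
  "arguments": {
    "id": "550e8400-e29b-41d4-a716-446655440000"
  }
}
```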
Input Schema
Name | Required | Description | Default |
---|---|---|---|
allowBackwardLinks | No | Allow crawling links that point to parent directories | |
allowExternalLinks | No | Allow crawling links to external domains | |
deduplicateSimilarURLs | No | Remove similar URLs during crawl | |
excludePaths | No | URL paths to exclude from crawling | |
ignoreQueryParameters | No | Ignore query parameters when comparing URLs | |
ignoreSitemap | No | Skip sitemap.xml discovery | |
includePaths | No | Only crawl these URL paths | |
limit | No | Maximum number of pages to crawl | |
maxDepth | No | Maximum link depth to crawl | |
scrapeOptions | No | Options for scraping each page | |
url | Yes | Starting URL for the crawl | |
webhook | No | Webhook URL to notify when the crawl completes | |
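As a sketch of how these options combine, the call below restricts the crawl to blog paths, caps depth and page count, and requests per-page scrape output. The path patterns and the scrapeOptions fields (formats, onlyMainContent) are assumptions carried over from the scrape tool, not confirmed by this page:

```json
{
  "name": "firecrawl_crawl",
  "arguments": {
    "url": "https://example.com/blog",
    "includePaths": ["blog/.*"],
    "excludePaths": ["blog/tag/.*"],
    "maxDepth": 2,
    "limit": 50,
    "deduplicateSimilarURLs": true,
    "scrapeOptions": {
      "formats": ["markdown"],
      "onlyMainContent": true
    }
  }
}
```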