# crawl_site
Crawl website pages to analyze SEO elements, detect duplicates, identify issues, and classify page types for comprehensive site audits.
## Instructions
Crawl multiple pages of a website starting from a URL. Discovers internal links and analyzes each page.
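The crawl described above is essentially a breadth-first traversal bounded by a page budget and a depth limit. A minimal sketch of that logic, using an in-memory link graph as a stand-in for fetching and parsing real pages (the function name and graph shape are illustrative, not the tool's implementation):

```python
from collections import deque

def crawl(link_graph, start_url, max_pages=50, max_depth=5):
    """Breadth-first crawl over an in-memory link graph, honoring
    page and depth limits the way the tool's maxPages/maxDepth do.

    link_graph maps each URL to the internal links found on that page.
    """
    visited = []
    queue = deque([(start_url, 0)])
    seen = {start_url}
    while queue and len(visited) < max_pages:
        url, depth = queue.popleft()
        visited.append(url)
        if depth >= max_depth:
            continue  # don't follow links past the depth limit
        for link in link_graph.get(url, []):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return visited

graph = {
    "https://example.com/": ["https://example.com/a", "https://example.com/b"],
    "https://example.com/a": ["https://example.com/c"],
}
print(crawl(graph, "https://example.com/", max_pages=3))
# ['https://example.com/', 'https://example.com/a', 'https://example.com/b']
```

A real crawler would also deduplicate URLs after normalization and sleep between requests; this sketch only shows how the two limits interact.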
Returns:
- Aggregated statistics (pages with titles, meta descriptions, schema, etc.)
- Page type classification (job detail, category landing, location pages, etc.)
- Duplicate detection (titles, meta descriptions)
- Critical issues and warnings
- All individual page analyses
Use this for comprehensive site audits. Respects crawl limits and delays.
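Duplicate detection over the per-page analyses can be sketched as a simple frequency count. The dict shape below (a `"title"` key per page) is an assumption for illustration; the tool's actual result structure may differ:

```python
from collections import Counter

def find_duplicate_titles(pages):
    """Return titles that appear on more than one crawled page.

    `pages` is assumed to be a list of per-page analysis dicts
    with a "title" key (hypothetical shape).
    """
    counts = Counter(p["title"] for p in pages if p.get("title"))
    return {title: n for title, n in counts.items() if n > 1}

pages = [
    {"url": "https://example.com/jobs/1", "title": "Jobs"},
    {"url": "https://example.com/jobs/2", "title": "Jobs"},
    {"url": "https://example.com/about", "title": "About Us"},
]
print(find_duplicate_titles(pages))  # {'Jobs': 2}
```

The same pattern applies to meta descriptions: count the field across pages and flag any value with a count above one.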
## Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| startUrl | Yes | The URL to start crawling from | |
| maxPages | No | Maximum pages to crawl | 50 |
| maxDepth | No | Maximum link depth to follow | 5 |
| includePatterns | No | Regex patterns - only crawl URLs matching these patterns | |
| excludePatterns | No | Regex patterns - skip URLs matching these patterns | |
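A sample input using these parameters might look like the following. The regex values are illustrative, and the assumption that the pattern fields accept arrays of regex strings (rather than a single string) is mine, not stated in the schema:

```json
{
  "startUrl": "https://example.com",
  "maxPages": 100,
  "maxDepth": 3,
  "includePatterns": ["^https://example\\.com/(jobs|careers)/"],
  "excludePatterns": ["\\?page=", "/login"]
}
```

Exclude patterns are useful for skipping paginated duplicates and authenticated pages that would waste the crawl budget.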