smart_crawl
Automatically detect and crawl web content by analyzing URL types and formats, handling HTML pages, sitemaps, RSS feeds, and text content with adaptive parsing strategies.
Instructions
[STATELESS] Auto-detects and handles different content types (HTML, sitemap, RSS, text). Use when the URL type is unknown, when crawling feeds or sitemaps, or when you want automatic format handling. Adapts its strategy to the content. Creates a new browser on each call; for persistent operations, use create_session + crawl.
Input Schema
Name | Required | Description | Default |
---|---|---|---|
bypass_cache | No | Force fresh crawl | false |
follow_links | No | For sitemaps/RSS: crawl found URLs (max 10). For HTML: no effect | false |
max_depth | No | Maximum crawl depth for sitemaps | 2 |
url | Yes | The URL to crawl intelligently | |
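The parameters above can be assembled into a request as in the following minimal sketch. It assumes the tool is exposed through a JSON-RPC `tools/call` request of the kind used by Model Context Protocol servers; the source does not state the transport, so treat the envelope as an assumption. The helper name `build_smart_crawl_call` and the example URL are hypothetical and used for illustration only.

```python
import json


def build_smart_crawl_call(url, follow_links=False, max_depth=2,
                           bypass_cache=False, request_id=1):
    """Build a request payload for smart_crawl.

    Assumption: the server accepts an MCP-style JSON-RPC "tools/call"
    envelope. Only the "arguments" object is taken from the schema on
    this page; the defaults mirror the schema defaults.
    """
    if not isinstance(url, str) or not url:
        raise ValueError("url is required and must be a non-empty string")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "smart_crawl",
            "arguments": {
                "url": url,
                "bypass_cache": bypass_cache,
                "follow_links": follow_links,
                "max_depth": max_depth,
            },
        },
    }


# Hypothetical sitemap URL; follow_links asks the tool to crawl the
# URLs it finds (the description caps this at 10).
request = build_smart_crawl_call("https://example.com/sitemap.xml",
                                 follow_links=True)
print(json.dumps(request, indent=2))
```

Because the tool is stateless, each such request would start a new browser; nothing carries over between calls.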
Input Schema (JSON Schema)
{
  "properties": {
    "bypass_cache": {
      "default": false,
      "description": "Force fresh crawl",
      "type": "boolean"
    },
    "follow_links": {
      "default": false,
      "description": "For sitemaps/RSS: crawl found URLs (max 10). For HTML: no effect",
      "type": "boolean"
    },
    "max_depth": {
      "default": 2,
      "description": "Maximum crawl depth for sitemaps",
      "type": "number"
    },
    "url": {
      "description": "The URL to crawl intelligently",
      "type": "string"
    }
  },
  "required": [
    "url"
  ],
  "type": "object"
}
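As a rough illustration of how a client might apply this schema before sending a request, the sketch below fills in the declared defaults and checks the required field and basic types. It is a hand-rolled check covering only the keywords that appear in the schema above (`required`, `default`, `type`), not a full JSON Schema validator. The names `SMART_CRAWL_SCHEMA` and `normalize_arguments` are hypothetical.

```python
# Schema copied from this page (descriptions omitted for brevity).
SMART_CRAWL_SCHEMA = {
    "properties": {
        "bypass_cache": {"default": False, "type": "boolean"},
        "follow_links": {"default": False, "type": "boolean"},
        "max_depth": {"default": 2, "type": "number"},
        "url": {"type": "string"},
    },
    "required": ["url"],
    "type": "object",
}

# Mapping from the JSON Schema type names used above to Python types.
JSON_TYPES = {"boolean": bool, "number": (int, float), "string": str}


def normalize_arguments(args, schema=SMART_CRAWL_SCHEMA):
    """Return a copy of args with schema defaults applied and types checked."""
    missing = [name for name in schema["required"] if name not in args]
    if missing:
        raise ValueError(f"missing required argument(s): {', '.join(missing)}")

    result = dict(args)
    for name, spec in schema["properties"].items():
        if name not in result:
            if "default" in spec:
                result[name] = spec["default"]
            continue
        value = result[name]
        # bool is a subclass of int in Python, so reject it explicitly
        # where the schema asks for a number.
        if isinstance(value, bool) and spec["type"] != "boolean":
            raise TypeError(f"{name} must be of type {spec['type']}")
        if not isinstance(value, JSON_TYPES[spec["type"]]):
            raise TypeError(f"{name} must be of type {spec['type']}")
    return result


normalized = normalize_arguments({"url": "https://example.com/feed.xml"})
print(normalized)
```

Keys that the schema does not list are passed through unchanged, since the schema does not forbid additional properties.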