crawl_url
Extract web content from URLs with JavaScript support for SPAs, pagination control, media extraction, and markdown generation.
Instructions
Extract web page content with JavaScript support. Set wait_for_js=true for single-page applications (SPAs). Use content_offset and content_limit together to page through long content.
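The pagination parameters can be pictured as a sliding window over the extracted text. The sketch below is a minimal illustration, assuming that content_offset and content_limit are both counted in characters and that a caller advances the offset by the limit on each request. The helper name `page_arguments` and the example URL are hypothetical, and the code only builds the argument dictionaries; it does not call the tool.

```python
def page_arguments(url, page_index, page_size, wait_for_js=False):
    """Build crawl_url arguments for one page of content (hypothetical helper)."""
    if page_index < 0:
        raise ValueError("page_index must be 0 or greater")
    if page_size <= 0:
        # content_limit=0 means "unlimited", so it cannot be used as a page size.
        raise ValueError("page_size must be positive")
    return {
        "url": url,
        "wait_for_js": wait_for_js,
        # Assumption: the offset is a 0-indexed character position.
        "content_offset": page_index * page_size,
        "content_limit": page_size,
    }


# Arguments for the first two 5000-character pages of a JavaScript-heavy page.
first = page_arguments("https://example.com/app", 0, 5000, wait_for_js=True)
second = page_arguments("https://example.com/app", 1, 5000, wait_for_js=True)
```

Under these assumptions, a caller would keep requesting pages until a response comes back shorter than content_limit.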
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | URL to crawl | |
| css_selector | No | CSS selector for extraction | |
| extract_media | No | Extract images/videos | |
| take_screenshot | No | Take screenshot | |
| generate_markdown | No | Generate markdown | |
| include_cleaned_html | No | Include cleaned HTML | |
| wait_for_selector | No | Wait for element to load | |
| timeout | No | Timeout in seconds | |
| wait_for_js | No | Wait for JavaScript | |
| auto_summarize | No | Auto-summarize large content | |
| use_undetected_browser | No | Bypass bot detection | |
| content_limit | No | Max characters to return (0=unlimited) | |
| content_offset | No | Start position for content (0-indexed) | |
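
As a rough illustration of how the table maps to a call, the sketch below assembles an arguments object and rejects names that the table does not list. The table gives no types, so the value types used here (booleans for flags, a number of seconds for timeout, a string for css_selector) are assumptions, and `build_crawl_arguments` is a hypothetical helper, not part of the tool.

```python
import json

# Optional parameter names, copied from the table above.
OPTIONAL_PARAMETERS = {
    "css_selector", "extract_media", "take_screenshot", "generate_markdown",
    "include_cleaned_html", "wait_for_selector", "timeout", "wait_for_js",
    "auto_summarize", "use_undetected_browser", "content_limit",
    "content_offset",
}


def build_crawl_arguments(url, **options):
    """Return a crawl_url arguments dict, rejecting names not in the table."""
    if not url:
        raise ValueError("url is required")
    unknown = set(options) - OPTIONAL_PARAMETERS
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    return {"url": url, **options}


# Example: extract the <main> element as markdown, with a 30-second timeout.
arguments = build_crawl_arguments(
    "https://example.com/docs",
    css_selector="main",
    generate_markdown=True,
    timeout=30,
)
print(json.dumps(arguments, indent=2))
```

Checking names on the client side catches typos such as `wait_for_javascript` before a request is sent; how the tool itself treats unknown names is not stated in the table.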