validate_robots
Checks whether a URL may be scraped by verifying compliance with the site's robots.txt, so that scrapers respect site rules and avoid disallowed paths.
Instructions
Check robots.txt compliance for specific URLs
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | The URL to check robots.txt compliance for | |
| userAgent | No | User agent to check against | * |
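
The check this tool performs can be sketched with Python's standard-library `urllib.robotparser`. This is an illustrative implementation, not the tool's actual code; the function name `is_allowed` is a placeholder. It parses robots.txt content and asks whether the given user agent may fetch the URL, mirroring the `url` and `userAgent` inputs above:

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    # Parse the robots.txt rules and check whether user_agent
    # is permitted to fetch the given URL.
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Example rules: everyone is blocked from /private/.
rules = "User-agent: *\nDisallow: /private/"
print(is_allowed(rules, "https://example.com/page"))        # allowed
print(is_allowed(rules, "https://example.com/private/x"))   # disallowed
```

In practice the robots.txt content would first be fetched from `https://<host>/robots.txt`; separating the fetch from the check keeps the compliance logic testable offline.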