# check_robots_txt
Analyze a website's robots.txt file to determine crawl permissions and ensure compliance with ethical web scraping practices. Provides insights into allowed and disallowed paths for crawling.
## Instructions
Check a domain's robots.txt file to understand its crawling permissions. This supports ethical scraping by showing which crawling rules the site has in place before any pages are fetched.
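The tool's actual implementation is not shown here, but a minimal sketch of the same kind of check can be written with Python's standard-library `urllib.robotparser`; the domain and sample paths below are placeholders, not part of the tool.

```python
# A minimal sketch of a robots.txt permission check using the standard library.
# The domain and paths are placeholders for illustration only.
from urllib.robotparser import RobotFileParser

site = "https://example.com"              # hypothetical target site
parser = RobotFileParser()
parser.set_url(f"{site}/robots.txt")
parser.read()                             # fetch and parse robots.txt

# Report whether a generic crawler ("*") may fetch a couple of sample paths.
for path in ("/", "/private/"):
    allowed = parser.can_fetch("*", f"{site}{path}")
    print(f"{path}: {'allowed' if allowed else 'disallowed'}")

# If the site declares a crawl delay, it is worth respecting as well.
print("crawl delay:", parser.crawl_delay("*"))
```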
## Input Schema
| Name | Required | Description | Default |
| --- | --- | --- | --- |
| url | Yes | URL of the website whose robots.txt file should be checked. | |
## Input Schema (JSON Schema)
```json
{
  "properties": {
    "url": {
      "title": "Url",
      "type": "string"
    }
  },
  "required": [
    "url"
  ],
  "type": "object"
}
```
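As a rough illustration of how an input matching this schema might be validated before the tool is called, the sketch below uses the third-party `jsonschema` package; the payload value is a placeholder and the validation step is an assumption, not part of the tool itself.

```python
# Validate an example input against the schema above before invoking the tool.
# Requires the third-party "jsonschema" package; the URL value is a placeholder.
from jsonschema import validate

INPUT_SCHEMA = {
    "properties": {
        "url": {"title": "Url", "type": "string"},
    },
    "required": ["url"],
    "type": "object",
}

payload = {"url": "https://example.com"}           # hypothetical tool input
validate(instance=payload, schema=INPUT_SCHEMA)    # raises ValidationError if invalid
print("payload is valid:", payload)
```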