# crawl-site
Scan and extract all unique URLs from a website by recursively crawling from a given URL up to a specified depth. Designed for web content analysis.
## Instructions
Recursively crawls a website starting from a given URL up to a specified maximum depth. It follows links within the same origin and returns a list of all unique URLs found during the crawl.
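The tool's actual implementation is not shown here; the sketch below only illustrates the behavior described above, under stated assumptions: a breadth-first, same-origin crawl bounded by `maxDepth` that returns every unique URL discovered. The function name `crawlSite`, the breadth-first strategy, the use of the global `fetch` and `URL` (Node 18+), and the regex-based link extraction are all illustrative assumptions, not the tool's code.

```typescript
// Sketch only: breadth-first, same-origin crawl bounded by maxDepth.
// Relies on the global fetch and URL available in Node 18+; the
// regex-based link extraction stands in for a real HTML parser.
async function crawlSite(startUrl: string, maxDepth = 2): Promise<string[]> {
  const start = new URL(startUrl);
  start.hash = ""; // ignore fragments when comparing URLs
  const origin = start.origin;
  const seen = new Set<string>([start.href]);
  let frontier: string[] = [start.href];

  // Depth 0 fetches only the starting URL; each pass fetches one level.
  for (let depth = 0; depth <= maxDepth && frontier.length > 0; depth++) {
    const next: string[] = [];
    for (const pageUrl of frontier) {
      let html: string;
      try {
        const res = await fetch(pageUrl);
        if (!res.ok) continue;
        html = await res.text();
      } catch {
        continue; // skip pages that fail to load
      }
      for (const match of html.matchAll(/href="([^"]+)"/g)) {
        try {
          const link = new URL(match[1], pageUrl); // resolve relative links
          link.hash = "";
          if (link.origin !== origin) continue; // same-origin links only
          if (!seen.has(link.href)) {
            seen.add(link.href);
            next.push(link.href); // fetched on the next pass, if any
          }
        } catch {
          // ignore hrefs that are not valid URLs
        }
      }
    }
    frontier = next;
  }
  return [...seen];
}
```

On one reasonable reading of the description, links discovered on the deepest fetched pages are still reported, since the tool returns URLs *found* rather than URLs fetched; the real implementation may differ on details like this, as well as on redirects and robots handling.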
## Input Schema
| Name | Required | Description | Default |
| --- | --- | --- | --- |
| maxDepth | No | The maximum depth to crawl relative to the starting URL. 0 means only the starting URL is fetched. The maximum allowed depth is 5, to prevent excessive crawling. | 2 |
| url | Yes | The starting URL for the crawl. Must be a valid HTTP or HTTPS URL. | |
## Input Schema (JSON Schema)
```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "additionalProperties": false,
  "properties": {
    "maxDepth": {
      "default": 2,
      "description": "The maximum depth to crawl relative to the starting URL. 0 means only the starting URL is fetched. Max allowed depth is 5 to prevent excessive crawling. Defaults to 2.",
      "maximum": 5,
      "minimum": 0,
      "type": "integer"
    },
    "url": {
      "description": "The starting URL for the crawl. Must be a valid HTTP or HTTPS URL.",
      "format": "uri",
      "type": "string"
    }
  },
  "required": [
    "url"
  ],
  "type": "object"
}
```
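For example, the following argument object validates against the schema above (the URL is just a placeholder):

```json
{
  "url": "https://example.com",
  "maxDepth": 2
}
```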