crawl
Recursively crawl a website from a starting URL and extract page content, with configurable crawl depth, page limit, and output format.
Instructions
Recursively crawl websites and extract content.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| depth | No | Maximum depth | 2 |
| format | No | Output format | markdown |
| limit | No | Maximum number of pages to crawl | 5 |
| url | Yes | Starting URL | |
Input Schema (JSON Schema)
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "additionalProperties": false,
  "properties": {
    "depth": {
      "default": 2,
      "description": "Maximum depth",
      "type": "number"
    },
    "format": {
      "default": "markdown",
      "description": "Output format",
      "enum": [
        "markdown",
        "text",
        "raw"
      ],
      "type": "string"
    },
    "limit": {
      "default": 5,
      "description": "Maximum number of pages to crawl",
      "type": "number"
    },
    "url": {
      "description": "Starting URL",
      "format": "uri",
      "type": "string"
    }
  },
  "required": [
    "url"
  ],
  "type": "object"
}
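Example
A minimal input that conforms to the schema above. The URL is a placeholder and the optional values are illustrative; any field left out falls back to its default (depth 2, limit 5, markdown output).
{
  "url": "https://example.com",
  "depth": 2,
  "format": "markdown",
  "limit": 5
}
Supplying only "url" is also valid, since it is the sole required property.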