parse_sitemap
Extract URLs from XML sitemaps to discover all site pages, plan crawl strategies, or verify sitemap validity. Supports regex filtering for targeted URL extraction.
Instructions
[STATELESS] Extracts URLs from XML sitemaps. Use it to discover all pages on a site, plan crawl strategies, or check sitemap validity. Supports regex filtering of the extracted URLs. To locate a sitemap, try the site's sitemap.xml or robots.txt first. Each call creates a new browser instance.
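The extraction and filtering described above can be illustrated with a minimal Python sketch. It is not the tool's actual implementation: the function name `extract_urls`, the use of `re.search` (match anywhere in the URL), and the inline sample sitemap are all assumptions made for illustration, and the sketch parses an XML string directly rather than fetching it through a browser.

```python
import re
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol for <urlset> and <sitemapindex>.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def extract_urls(sitemap_xml, filter_pattern=None):
    """Return every <loc> value in the sitemap, optionally filtered by a regex.

    Hypothetical helper: the real tool's matching semantics are not documented
    here, so this sketch assumes a pattern may match anywhere in the URL.
    """
    root = ET.fromstring(sitemap_xml)
    urls = [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text]
    if filter_pattern:
        regex = re.compile(filter_pattern)
        urls = [u for u in urls if regex.search(u)]
    return urls


# Illustrative sitemap content, not taken from a real site.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/blog/first-post</loc></url>
  <url><loc>https://example.com/about</loc></url>
</urlset>"""

if __name__ == "__main__":
    print(extract_urls(SAMPLE))
    print(extract_urls(SAMPLE, r"/blog/"))
```

Because the sketch iterates over every `<loc>` element, it also returns the child sitemap URLs when given a sitemap index file; whether the tool follows those links itself is not stated in this page.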
Input Schema
Name | Required | Description | Default |
---|---|---|---|
filter_pattern | No | Optional regex pattern to filter URLs | |
url | Yes | URL of the sitemap (e.g., https://example.com/sitemap.xml) | |
Input Schema (JSON Schema)
{
  "properties": {
    "filter_pattern": {
      "description": "Optional regex pattern to filter URLs",
      "type": "string"
    },
    "url": {
      "description": "URL of the sitemap (e.g., https://example.com/sitemap.xml)",
      "type": "string"
    }
  },
  "required": [
    "url"
  ],
  "type": "object"
}
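As a rough illustration of what the schema above requires, the following Python sketch checks an arguments object before it is sent to the tool. The helper name `validate_arguments` is hypothetical, and the regex compilation check goes beyond what the schema itself enforces (the schema only says `filter_pattern` is a string); it is included on the assumption that an invalid pattern would be rejected by the tool anyway.

```python
import re


def validate_arguments(args):
    """Return a list of problems; an empty list means the arguments fit the schema."""
    if not isinstance(args, dict):
        return ["arguments must be an object"]

    problems = []
    # "url" is the only property listed under "required".
    if "url" not in args:
        problems.append("missing required property: url")

    # Both declared properties have type "string".
    for name in ("url", "filter_pattern"):
        if name in args and not isinstance(args[name], str):
            problems.append(f"{name} must be a string")

    # Extra check, not part of the schema: the pattern should compile as a regex.
    pattern = args.get("filter_pattern")
    if isinstance(pattern, str):
        try:
            re.compile(pattern)
        except re.error as exc:
            problems.append(f"filter_pattern is not a valid regex: {exc}")

    return problems


if __name__ == "__main__":
    print(validate_arguments({"url": "https://example.com/sitemap.xml"}))
    print(validate_arguments({"filter_pattern": "/blog/"}))
```

The sketch uses Python's `re` syntax for the compilation check; the regex dialect the tool actually applies is not specified in this page.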