alterlab_map
Discover all URLs on a website by parsing sitemaps and extracting links. Use include/exclude patterns to scope discovery, and a search query to rank the results by relevance.
Instructions
Discover all URLs on a website via sitemap parsing and link extraction. This is lightweight URL discovery only: no JS rendering and no content scraping. Each call costs $0.001 regardless of how many URLs are found. The result is a flat list of URLs, each with its source (sitemap or link) and depth. Use include_patterns/exclude_patterns to scope discovery to specific sections of the site. Use search to rank URLs by relevance to a query. Set include_metadata=true to also fetch page titles and descriptions.
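To make the scoping rules concrete, here is a minimal local sketch of how include and exclude patterns could narrow a URL list. It assumes the globs are matched against the URL path with semantics similar to Python's `fnmatch` (where `*` also matches `/`); the tool's actual matching rules are not documented here and may differ. The helper name `in_scope` and the example URLs are hypothetical.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlparse


def in_scope(url, include_patterns=None, exclude_patterns=None):
    """Return True if the URL path passes the include/exclude globs.

    Assumption: fnmatch-style matching on the path only. This mirrors the
    documented behaviour (include = match at least one, exclude = match any)
    but is not the tool's own implementation.
    """
    path = urlparse(url).path or "/"
    # Include: when patterns are given, the path must match at least one.
    if include_patterns and not any(fnmatchcase(path, p) for p in include_patterns):
        return False
    # Exclude: the path must not match any exclude pattern.
    if exclude_patterns and any(fnmatchcase(path, p) for p in exclude_patterns):
        return False
    return True


# Hypothetical discovered URLs, for illustration only.
urls = [
    "https://example.com/docs/intro",
    "https://example.com/docs/api/auth",
    "https://example.com/tag/python",
    "https://example.com/blog/launch",
]

docs_only = [u for u in urls if in_scope(u, include_patterns=["/docs/*"])]
no_tags = [u for u in urls if in_scope(u, exclude_patterns=["/tag/*", "/page/*"])]
```

Under these assumptions, `docs_only` keeps the two `/docs/` URLs and `no_tags` drops only the `/tag/` URL.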
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | Starting URL for site discovery | |
| max_pages | No | Maximum URLs to discover | |
| max_depth | No | Link-following depth (0 = start page + sitemap only) | |
| include_patterns | No | Glob patterns — only include URLs whose path matches at least one (e.g., ['/docs/*']) | |
| exclude_patterns | No | Glob patterns — exclude URLs whose path matches any (e.g., ['/tag/*', '/page/*']) | |
| search | No | Query to filter and rank discovered URLs by relevance (returns relevance_score per URL) | |
| sitemap | No | Sitemap handling: include (parse sitemaps + follow links), skip (links only), only (sitemap URLs only) | include |
| include_metadata | No | Fetch title and meta description for each URL via lightweight GET (adds latency) | |
| include_subdomains | No | Include URLs from subdomains of the target domain | |
| respect_robots | No | Respect robots.txt directives | |
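As an illustration of the schema above, the following sketch assembles an arguments object for a call. Only the parameter names and the three `sitemap` modes come from the table; the helper `build_map_arguments`, the target URL, and every value shown are hypothetical, and how the arguments are actually sent (client, transport) is left out because it is not described here.

```python
import json

# Parameter names taken from the input schema table.
OPTIONAL_PARAMS = {
    "max_pages", "max_depth", "include_patterns", "exclude_patterns",
    "search", "sitemap", "include_metadata", "include_subdomains",
    "respect_robots",
}
SITEMAP_MODES = {"include", "skip", "only"}


def build_map_arguments(url, **options):
    """Build an arguments dict, omitting unset options (hypothetical helper)."""
    unknown = set(options) - OPTIONAL_PARAMS
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    sitemap = options.get("sitemap")
    if sitemap is not None and sitemap not in SITEMAP_MODES:
        raise ValueError(f"sitemap must be one of {sorted(SITEMAP_MODES)}")
    arguments = {"url": url}
    # Leave out None values so the tool's own defaults apply.
    arguments.update({k: v for k, v in options.items() if v is not None})
    return arguments


# Illustrative values only; none of these are documented defaults.
arguments = build_map_arguments(
    "https://example.com",
    max_depth=1,
    include_patterns=["/docs/*"],
    exclude_patterns=["/tag/*", "/page/*"],
    search="authentication",
    sitemap="include",
    include_metadata=True,
    max_pages=None,  # unset, so it is omitted from the payload
)

payload = json.dumps(arguments, indent=2)
print(payload)
```

Omitting unset options rather than sending explicit nulls keeps the payload small and avoids guessing at defaults the table leaves blank.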