multi_url_crawl
Crawl up to 5 URLs with pattern-based configuration and save each page as a markdown file, together with an index.json, to a directory.
Instructions
Crawls multiple URLs using pattern-based configuration, with a maximum of 5 URL patterns per call. Set output_path (a directory) to persist the full markdown for each URL along with an index.json. The return value remains a list, and each successful item gains an output_file key.
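
A call's arguments might be assembled as below. This is a minimal sketch: the parameter names come from the input schema on this page, while the helper `build_crawl_args`, the example URLs, and the directory `/tmp/crawl_out` are hypothetical. How the arguments are actually sent depends on your client.

```python
import json
import os

MAX_URL_PATTERNS = 5  # limit stated in the instructions above


def build_crawl_args(url_configurations, output_path=None, **options):
    """Assemble an arguments dict for multi_url_crawl (hypothetical helper)."""
    if len(url_configurations) > MAX_URL_PATTERNS:
        raise ValueError(f"at most {MAX_URL_PATTERNS} URL patterns per call")
    args = {"url_configurations": dict(url_configurations)}
    if output_path is not None:
        # The schema asks for an absolute directory path.
        if not os.path.isabs(output_path):
            raise ValueError("output_path must be an absolute directory path")
        args["output_path"] = output_path
    # Remaining schema parameters (pattern_matching, overwrite, ...) pass through.
    args.update(options)
    return args


args = build_crawl_args(
    {
        "https://site1.com": {"wait_for_js": True},
        "https://site2.com/docs/*": {"wait_for_js": False},  # example pattern
    },
    output_path="/tmp/crawl_out",  # hypothetical directory
    pattern_matching="wildcard",
    overwrite=False,
)
print(json.dumps(args, indent=2))
```
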
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| url_configurations | Yes | URL-config mapping (max 5 URLs). Example: {'https://site1.com': {'wait_for_js': true}} | |
| pattern_matching | No | Pattern-matching mode: 'wildcard' or 'regex' | wildcard |
| default_config | No | Default config | |
| base_timeout | No | Timeout per URL | 30 |
| max_concurrent | No | Maximum concurrency | 3 |
| output_path | No | Absolute directory path to persist per-URL markdown files + index.json. Existing regular files at this path are rejected; otherwise the directory is created if missing (dot-containing names are fine). The list return shape is preserved; each successful item gains an 'output_file' key. Failed items (success=False) are NOT written as .md but still appear in index.json with file=null. | |
| include_content_in_response | No | When True (with output_path), keep full markdown/content in each list item. | False |
| overwrite | No | Overwrite existing per-URL files inside output_path. | False |
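
The tool's matching logic is not documented on this page, so the following only illustrates how 'wildcard' and 'regex' modes could differ, using Python's standard `fnmatch` and `re` modules. The function `resolve_config`, the first-match-wins ordering, and the overlay of the per-pattern config on top of `default_config` are all assumptions, not the tool's confirmed behavior.

```python
import fnmatch
import re


def resolve_config(url, url_configurations, pattern_matching="wildcard",
                   default_config=None):
    """Hypothetical resolver: default_config overlaid with the first matching pattern."""
    merged = dict(default_config or {})
    for pattern, config in url_configurations.items():
        if pattern_matching == "regex":
            matched = re.search(pattern, url) is not None
        elif pattern_matching == "wildcard":
            # Shell-style matching: '*' matches any run of characters.
            matched = fnmatch.fnmatchcase(url, pattern)
        else:
            raise ValueError("pattern_matching must be 'wildcard' or 'regex'")
        if matched:
            merged.update(config)  # assumed: per-pattern keys override defaults
            break
    return merged


wildcard_configs = {"https://site1.com/docs/*": {"wait_for_js": True}}
regex_configs = {r"^https://site1\.com/blog/\d+$": {"wait_for_js": True}}
defaults = {"wait_for_js": False}

docs_cfg = resolve_config("https://site1.com/docs/intro", wildcard_configs,
                          default_config=defaults)
about_cfg = resolve_config("https://site1.com/about", wildcard_configs,
                           default_config=defaults)
blog_cfg = resolve_config("https://site1.com/blog/42", regex_configs,
                          pattern_matching="regex", default_config=defaults)
```
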
Output Schema
| Name | Required | Description | Default |
|---|---|---|---|
| result | Yes | | |
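
Since the return value is a list in which successful items carry an `output_file` key and failed items have `success=False`, a caller might separate the two as sketched below. The `success` and `output_file` keys come from the descriptions above. The helper `split_results`, the `url` key, and the sample items (including the file name) are hypothetical.

```python
def split_results(results):
    """Return (saved file paths, failed items) from a multi_url_crawl result list."""
    saved = [
        item["output_file"]
        for item in results
        if item.get("success") and "output_file" in item
    ]
    failed = [item for item in results if not item.get("success")]
    return saved, failed


# Hypothetical sample shaped like the description above, not real tool output.
sample = [
    {"url": "https://site1.com", "success": True,
     "output_file": "/tmp/crawl_out/site1.md"},
    {"url": "https://site2.com", "success": False},
]

saved, failed = split_results(sample)
```
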