fetch_urls
Retrieve and process web page content from multiple URLs, with configurable timeouts, main-content extraction, HTML or Markdown output, and optional blocking of media resources. Pages are fetched with a Playwright headless browser.
Instructions
Retrieve web page content from multiple specified URLs
Input Schema
Name | Required | Description | Default |
---|---|---|---|
debug | No | Whether to enable debug mode (shows the browser window). Overrides the --debug command line flag if specified. | |
disableMedia | No | Whether to disable media resources (images, stylesheets, fonts, media). | true |
extractContent | No | Whether to intelligently extract the main content. | true |
maxLength | No | Maximum length of returned content, in characters. | no limit |
navigationTimeout | No | Maximum time to wait for additional navigation, in milliseconds. | 10000 (10 seconds) |
returnHtml | No | Whether to return HTML content instead of Markdown. | false |
timeout | No | Page loading timeout, in milliseconds. | 30000 (30 seconds) |
urls | Yes | Array of URLs to fetch. | |
waitForNavigation | No | Whether to wait for additional navigation after the initial page load (useful for sites with anti-bot verification). | false |
waitUntil | No | When navigation is considered complete. Options: 'load', 'domcontentloaded', 'networkidle', 'commit'. | 'load' |
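
As an illustration of how the parameters above fit together, here is a minimal Python sketch that assembles an arguments object for `fetch_urls`. The helper names `build_fetch_urls_args` and `build_tool_call` are hypothetical and not part of the tool. The JSON-RPC `tools/call` envelope assumes the tool is exposed through an MCP-style server, and rejecting an empty `urls` list is also an assumption; the schema only states that `urls` is required.

```python
import json

# Defaults as stated in the parameter table above.
DEFAULTS = {
    "timeout": 30000,
    "waitUntil": "load",
    "extractContent": True,
    "returnHtml": False,
    "waitForNavigation": False,
    "navigationTimeout": 10000,
    "disableMedia": True,
}

# Optional parameters with no concrete default value in the table.
OPTIONAL_NO_DEFAULT = {"maxLength", "debug"}

WAIT_UNTIL_OPTIONS = {"load", "domcontentloaded", "networkidle", "commit"}


def build_fetch_urls_args(urls, **overrides):
    """Hypothetical helper: merge caller overrides onto the documented defaults."""
    # Assumption: an empty list is treated as an error on the client side.
    if not urls:
        raise ValueError("urls is required and must contain at least one URL")
    unknown = set(overrides) - set(DEFAULTS) - OPTIONAL_NO_DEFAULT
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    args = {"urls": list(urls), **DEFAULTS, **overrides}
    if args["waitUntil"] not in WAIT_UNTIL_OPTIONS:
        raise ValueError(f"waitUntil must be one of {sorted(WAIT_UNTIL_OPTIONS)}")
    return args


def build_tool_call(arguments, request_id=1):
    """Assumed MCP-style JSON-RPC envelope; adjust to whatever client you use."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "fetch_urls", "arguments": arguments},
    }


if __name__ == "__main__":
    args = build_fetch_urls_args(
        ["https://example.com", "https://example.org"],
        waitUntil="networkidle",  # wait for network activity to settle
        maxLength=5000,           # cap returned content at 5000 characters
    )
    print(json.dumps(build_tool_call(args), indent=2))
```

Spelling out the defaults in the request is optional, since the server applies them anyway; doing so here just makes the effective settings visible when debugging a call.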