fetch_html
Fetch a website and return its HTML content. Supports chunking for large pages, browser automation for dynamic content, and intelligent extraction for main article text.
Instructions
Fetch a website and return the content as HTML. Best practices:
1) Always set startCursor=0 for the initial request, and use the fetchedBytes value from the previous response for subsequent requests to ensure content continuity.
2) Set contentSizeLimit between 20000 and 50000 for large pages.
3) When handling large content, use the chunking system by following the startCursor instructions in the system notes rather than increasing contentSizeLimit.
4) If content retrieval fails, retry with the same chunkId and startCursor. You may also adjust startCursor as needed, but you must then handle any resulting data duplication or gaps yourself.
5) Always tell users when content is chunked, and ask whether they want to continue retrieving subsequent parts.
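The cursor handling in the best practices above can be sketched as a small loop. This is only an illustration under stated assumptions: `call_tool` is a hypothetical callable standing in for whatever client invokes `fetch_html`, and the response fields `content`, `chunkId`, and `hasMore` are assumed names. Only `fetchedBytes`, `startCursor`, `chunkId` (as an input), and `contentSizeLimit` come from this page.

```python
from typing import Any, Callable, Dict, List

ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def fetch_all_chunks(
    call_tool: ToolCaller,            # hypothetical client function, not part of the tool
    url: str,
    content_size_limit: int = 30000,  # within the recommended 20000-50000 range
    confirm: Callable[[int], bool] = lambda chunks_so_far: True,
    max_chunks: int = 20,             # safety cap chosen for this sketch
) -> str:
    """Collect a page chunk by chunk, resuming from fetchedBytes each time."""
    # Practice 1: the initial request always starts at cursor 0.
    args: Dict[str, Any] = {
        "url": url,
        "startCursor": 0,
        "contentSizeLimit": content_size_limit,
    }
    parts: List[str] = []
    for _ in range(max_chunks):
        response = call_tool("fetch_html", args)
        parts.append(response["content"])      # assumed response field
        if not response.get("hasMore"):        # assumed response field
            break
        # Practice 5: ask before retrieving the next part.
        if not confirm(len(parts)):
            break
        # Practice 1 again: resume from the previous response's fetchedBytes.
        args = {**args, "startCursor": response["fetchedBytes"]}
        if response.get("chunkId"):            # assumed response field
            args["chunkId"] = response["chunkId"]
    return "".join(parts)
```

Passing a `confirm` callback that prompts the user keeps the loop in line with practice 5. The chunk size stays fixed throughout, so large pages are handled by more requests rather than a larger contentSizeLimit.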
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | URL of the website to fetch | |
| startCursor | Yes | Starting cursor position in bytes. Set to 0 for the initial request, and use the value from the previous response for subsequent requests to resume content retrieval. | |
| headers | No | Headers to include in the request | |
| proxy | No | Proxy server to use (format: http://host:port or https://host:port) | |
| timeout | No | Timeout in milliseconds | 30000 |
| maxRedirects | No | Maximum number of redirects to follow | 10 |
| useSystemProxy | No | Use system proxy environment variables | true |
| debug | No | Enable detailed debug logging | false |
| noDelay | No | Disable the random delay between requests | false |
| useBrowser | No | Use a headless browser for fetching | false |
| waitForSelector | No | CSS selector to wait for when using browser mode | |
| waitForTimeout | No | Timeout to wait after page load in browser mode | 5000 |
| scrollToBottom | No | Scroll to the bottom of the page in browser mode | false |
| closeBrowser | No | Close the browser after fetching | false |
| saveCookies | No | Save cookies for future requests to the same domain | true |
| autoDetectMode | No | Automatically switch to browser mode if the standard fetch fails. Set to false to strictly use the specified mode without automatic switching. | true |
| contentSizeLimit | No | Maximum content size in bytes before splitting into chunks. Set between 20KB-50KB for optimal results. For large content, prefer smaller values (20KB-30KB) to avoid truncation. | 50KB |
| enableContentSplitting | No | Enable content splitting for large responses | true |
| chunkId | No | Chunk ID for retrieving a specific chunk of content from a previous request. The system adds prompts in the format === SYSTEM NOTE === ... =================== which AI models should ignore when processing the content. | |
| extractContent | No | Enable intelligent content extraction using the Readability algorithm. Extracts the main article content from web pages. | false |
| includeMetadata | No | Include metadata (title, author, etc.) in the extracted content. Only works when extractContent is true. | false |
| fallbackToOriginal | No | Fall back to the original content when extraction fails. Only works when extractContent is true. | true |
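As a rough illustration of how the parameters in the table fit together, the sketch below assembles an arguments object for a call. The helper name `build_fetch_args` and its Python-style keyword names are hypothetical. The emitted keys are the parameter names from the table. Treating the recommended 20000-50000 range as a hard limit is a choice made in this sketch, not something the tool is known to enforce.

```python
from typing import Any, Dict, Optional


def build_fetch_args(
    url: str,
    start_cursor: int = 0,
    *,
    content_size_limit: Optional[int] = None,
    use_browser: bool = False,
    wait_for_selector: Optional[str] = None,
    extract_content: bool = False,
    include_metadata: bool = False,
) -> Dict[str, Any]:
    """Build a fetch_html arguments dict, sending only non-default options."""
    if start_cursor < 0:
        raise ValueError("startCursor must be a non-negative byte offset")
    # Client-side guard for the recommended range (an assumption of this sketch).
    if content_size_limit is not None and not 20000 <= content_size_limit <= 50000:
        raise ValueError("contentSizeLimit should be between 20000 and 50000")
    # Per the table, includeMetadata only works when extractContent is true.
    if include_metadata and not extract_content:
        raise ValueError("includeMetadata requires extractContent to be true")

    # url and startCursor are the two required parameters.
    args: Dict[str, Any] = {"url": url, "startCursor": start_cursor}
    if content_size_limit is not None:
        args["contentSizeLimit"] = content_size_limit
    if use_browser:
        args["useBrowser"] = True
    if wait_for_selector is not None:
        args["waitForSelector"] = wait_for_selector
    if extract_content:
        args["extractContent"] = True
    if include_metadata:
        args["includeMetadata"] = True
    return args


# Article extraction with metadata, using a smaller chunk size.
article_args = build_fetch_args(
    "https://example.com/post",
    content_size_limit=30000,
    extract_content=True,
    include_metadata=True,
)

# Dynamic page rendered in browser mode, waiting for a CSS selector.
browser_args = build_fetch_args(
    "https://example.com/app",
    use_browser=True,
    wait_for_selector="#main",
)
```

Options left at their defaults are omitted from the dict, so the tool's own defaults from the table apply.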