fetch_page
Extract full text content from any web page URL with customizable output formats and options for metadata, tables, comments, and images.
Instructions
Extracts the full text content from a web page URL. Use this to read the details of a specific result found via web_search.
Args:
- url: The URL to fetch and extract content from
- output_format: Format for extracted content ('csv', 'html', 'json', 'markdown', 'python', 'txt', 'xml', 'xmltei')
- include_metadata: Whether to include document metadata (title, author, date, etc.)
- include_tables: Whether to include table content in extraction
- include_comments: Whether to include comment content in extraction
- include_images: Whether to include image descriptions in extraction
- deduplicate: Whether to remove duplicated content
- max_length: Maximum length of content to return (default 15000)
- timeout: Request timeout in seconds (default 30)
- backend: HTTP backend to use ('httpx' for lightweight, 'curl' to bypass bot detection, 'auto' to try httpx first, then fall back to curl)
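The 'auto' backend is described as trying httpx first and falling back to curl. The following is a minimal sketch of that selection logic, not the tool's actual implementation. The function `fetch_with_backend` and the `fetchers` mapping are hypothetical names introduced for illustration, and treating any exception from the first backend as the trigger for fallback is an assumption.

```python
from typing import Callable, Dict

# Hypothetical fetcher signature: (url, timeout_seconds) -> page body as text.
Fetcher = Callable[[str, float], str]

VALID_BACKENDS = ("httpx", "curl", "auto")


def fetch_with_backend(
    url: str,
    backend: str,
    timeout: float,
    fetchers: Dict[str, Fetcher],
) -> str:
    """Dispatch to one backend, or try httpx then curl when backend is 'auto'."""
    if backend not in VALID_BACKENDS:
        raise ValueError(f"unknown backend: {backend!r}")

    if backend != "auto":
        # An explicit backend is used as-is, with no fallback.
        return fetchers[backend](url, timeout)

    try:
        # 'auto': try the lightweight backend first.
        return fetchers["httpx"](url, timeout)
    except Exception:
        # Assumption: any failure (for example a bot-detection block)
        # triggers a retry with the curl backend.
        return fetchers["curl"](url, timeout)
```

Passing the fetchers in as plain callables keeps the sketch independent of any particular HTTP library, so it can be exercised with stub functions.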
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
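| url | Yes | The URL to fetch and extract content from | |
| output_format | No | Format for extracted content ('csv', 'html', 'json', 'markdown', 'python', 'txt', 'xml', 'xmltei') | txt |
| include_metadata | No | Whether to include document metadata (title, author, date, etc.) | |
| include_tables | No | Whether to include table content in extraction | |
| include_comments | No | Whether to include comment content in extraction | |
| include_images | No | Whether to include image descriptions in extraction | |
| deduplicate | No | Whether to remove duplicated content | |
| max_length | No | Maximum length of content to return | 15000 |
| timeout | No | Request timeout in seconds | 30 |
| backend | No | HTTP backend to use ('httpx' for lightweight, 'curl' to bypass bot detection, 'auto' to try httpx first, then fall back to curl) | auto |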
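As an illustration of the input schema, here is a hedged sketch of a helper that assembles and validates an arguments object for fetch_page before it is sent to the tool. The helper name `build_fetch_page_args` is hypothetical. The sketch fills in only the defaults documented here (txt, 15000, 30, auto) and leaves the boolean options unset unless the caller supplies them, since their defaults are not stated.

```python
from typing import Any, Dict

OUTPUT_FORMATS = {"csv", "html", "json", "markdown", "python", "txt", "xml", "xmltei"}
BACKENDS = {"httpx", "curl", "auto"}
BOOLEAN_OPTIONS = {
    "include_metadata",
    "include_tables",
    "include_comments",
    "include_images",
    "deduplicate",
}


def build_fetch_page_args(
    url: str,
    output_format: str = "txt",
    max_length: int = 15000,
    timeout: int = 30,
    backend: str = "auto",
    **options: bool,
) -> Dict[str, Any]:
    """Return a validated arguments dict for a fetch_page call."""
    if not url:
        raise ValueError("url is required")
    if output_format not in OUTPUT_FORMATS:
        raise ValueError(f"unsupported output_format: {output_format!r}")
    if backend not in BACKENDS:
        raise ValueError(f"unsupported backend: {backend!r}")
    unknown = set(options) - BOOLEAN_OPTIONS
    if unknown:
        raise ValueError(f"unknown options: {sorted(unknown)}")

    args: Dict[str, Any] = {
        "url": url,
        "output_format": output_format,
        "max_length": max_length,
        "timeout": timeout,
        "backend": backend,
    }
    # Boolean options are passed through only when set explicitly,
    # because their defaults are not documented.
    args.update(options)
    return args


# Example: request Markdown output with tables included.
example_args = build_fetch_page_args(
    "https://example.com/article",
    output_format="markdown",
    include_tables=True,
)
```

Validating on the caller's side is a design choice for this sketch; the tool may apply its own validation as well.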
Output Schema
| Name | Required | Description | Default |
|---|---|---|---|
| result | Yes | | |