fetch_webpage
Extracts text content from a web page at a given URL. Optional parameters control resource blocking, custom request headers, basic authentication, the navigation timeout, and the range of characters returned.
Instructions
Retrieve text content from a web page
Input Schema
Name | Required | Description | Default |
---|---|---|---|
blockResources | No | Whether to block images, stylesheets, and fonts to improve performance | true |
headers | No | Custom headers to include in the request | |
maxLength | No | Maximum number of characters to return for content extraction | 2000 |
password | No | Password for basic authentication | |
resourceTypesToBlock | No | List of resource types to block (e.g., "image", "stylesheet", "font") | |
startIndex | No | Start character index for content extraction | 0 |
timeout | No | Navigation timeout in milliseconds | 60000 |
url | Yes | The URL of the webpage to fetch | |
username | No | Username for basic authentication | |
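As an illustration of how the parameters above fit together, the following is a minimal sketch of a helper that assembles an arguments object for fetch_webpage. The helper name `build_fetch_args` and its keyword names are hypothetical and not part of the tool; only the resulting dictionary keys come from the table. Optional parameters are omitted when unset, on the assumption that the server then applies its documented defaults.

```python
from typing import Any, Dict, List, Optional


def build_fetch_args(
    url: str,
    *,
    max_length: Optional[int] = None,
    start_index: Optional[int] = None,
    timeout_ms: Optional[int] = None,
    block_resources: Optional[bool] = None,
    resource_types_to_block: Optional[List[str]] = None,
    headers: Optional[Dict[str, str]] = None,
    username: Optional[str] = None,
    password: Optional[str] = None,
) -> Dict[str, Any]:
    """Build an arguments dict for fetch_webpage (hypothetical helper)."""
    # "url" is the only required parameter in the schema.
    if not url:
        raise ValueError("url is required")

    # Map the helper's keyword names to the schema's property names.
    optional = {
        "maxLength": max_length,
        "startIndex": start_index,
        "timeout": timeout_ms,
        "blockResources": block_resources,
        "resourceTypesToBlock": resource_types_to_block,
        "headers": headers,
        "username": username,
        "password": password,
    }

    args: Dict[str, Any] = {"url": url}
    # Leave out anything the caller did not set, so defaults stay with the server.
    args.update({key: value for key, value in optional.items() if value is not None})
    return args
```

For example, `build_fetch_args("https://example.com", max_length=5000, block_resources=False)` yields a dictionary with only `url`, `maxLength`, and `blockResources` set.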
Input Schema (JSON Schema)
{
  "properties": {
    "blockResources": {
      "description": "Whether to block images, stylesheets, and fonts to improve performance (default: true)",
      "type": "boolean"
    },
    "headers": {
      "description": "Custom headers to include in the request",
      "type": "object"
    },
    "maxLength": {
      "description": "Maximum number of characters to return for content extraction (default: 2000 if not provided)",
      "type": "number"
    },
    "password": {
      "description": "Password for basic authentication",
      "type": "string"
    },
    "resourceTypesToBlock": {
      "description": "List of resource types to block (e.g., \"image\", \"stylesheet\", \"font\")",
      "items": {
        "type": "string"
      },
      "type": "array"
    },
    "startIndex": {
      "description": "Start character index for content extraction (default: 0)",
      "type": "number"
    },
    "timeout": {
      "description": "Navigation timeout in milliseconds (default: 60000)",
      "type": "number"
    },
    "url": {
      "description": "The URL of the webpage to fetch",
      "type": "string"
    },
    "username": {
      "description": "Username for basic authentication",
      "type": "string"
    }
  },
  "required": [
    "url"
  ],
  "type": "object"
}
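Because maxLength caps the number of characters returned and startIndex sets where extraction begins, a long page can presumably be read in consecutive windows. The sketch below shows one way to do that. It assumes a caller-supplied `fetch` callable (hypothetical, standing in for whatever client actually invokes fetch_webpage) that takes an arguments dict and returns the extracted text as a string, and it assumes that a chunk shorter than maxLength means the end of the content was reached. Neither assumption is confirmed by the schema.

```python
from typing import Any, Callable, Dict, Iterator


def iter_page_chunks(
    fetch: Callable[[Dict[str, Any]], str],
    url: str,
    chunk_size: int = 2000,
) -> Iterator[str]:
    """Yield successive text chunks of a page by advancing startIndex.

    `fetch` is a hypothetical stand-in for a client call to fetch_webpage.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")

    start = 0
    while True:
        text = fetch({"url": url, "startIndex": start, "maxLength": chunk_size})
        if not text:
            # Nothing left to read.
            break
        yield text
        if len(text) < chunk_size:
            # Assumed end-of-content signal: a short chunk.
            break
        start += len(text)
```

Joining the yielded chunks reconstructs the full extracted text under the assumptions above. A larger `chunk_size` means fewer round trips at the cost of larger responses.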