webextrator_extract
Extract structured data such as product details, articles, or general page content from any URL. Configure the expected page type, wait conditions, and blocked resource types to optimize extraction.
Instructions
Extract structured content from a web page using the WebExtrator API.
Navigates to the specified URL, renders the page, and extracts structured data
such as product details, article content, or general page information.
Use this when:
- You need to extract structured data from a web page
- You want product details, article content, or general page data
- You need LLM-enhanced semantic normalization of extracted content
Returns:
JSON response containing the extracted structured content.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | The URL of the web page to extract content from. | |
| delay | No | Extra delay in seconds after page load before extracting. | |
| headers | No | Extra HTTP headers to include with the page request. | |
| timeout | No | Total timeout in seconds for page load. | 30 |
| enable_llm | No | Enable LLM-based semantic normalization for richer structured output. | false |
| user_agent | No | Override the User-Agent header for the page request. | |
| wait_until | No | Page load wait condition before extracting. Options: 'load', 'domcontentloaded', 'networkidle', 'commit'. | 'networkidle' |
| callback_url | No | Callback URL for async processing. If provided, the task runs asynchronously and results are sent to this URL when complete. | |
| expected_type | No | Hint about expected page type. Options: 'product', 'article', 'general'. Helps the extractor optimize for the content structure. | |
| block_resources | No | Resource types to block during page load to speed up rendering. Options: 'image', 'font', 'media', 'stylesheet', 'xhr', 'fetch'. | |
| wait_for_selector | No | CSS selector to wait for before extracting content. | |
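
The parameters above can be assembled into a single arguments object before the tool is called. The following is a minimal sketch, not part of the WebExtrator API: `build_extract_args` is a hypothetical helper, the example URL and selector are placeholders, and passing `block_resources` as a list of strings is an assumption, since the schema above does not state the value types. It checks only the option values listed in the table and leaves out unset parameters so the documented defaults apply.

```python
from typing import Any

# Allowed values copied from the Input Schema table above.
WAIT_UNTIL = {"load", "domcontentloaded", "networkidle", "commit"}
EXPECTED_TYPES = {"product", "article", "general"}
BLOCKABLE = {"image", "font", "media", "stylesheet", "xhr", "fetch"}


def build_extract_args(url: str, **options: Any) -> dict[str, Any]:
    """Hypothetical helper: validate options and return tool arguments."""
    if not url:
        raise ValueError("url is required")

    wait_until = options.get("wait_until")
    if wait_until is not None and wait_until not in WAIT_UNTIL:
        raise ValueError(f"unsupported wait_until: {wait_until!r}")

    expected_type = options.get("expected_type")
    if expected_type is not None and expected_type not in EXPECTED_TYPES:
        raise ValueError(f"unsupported expected_type: {expected_type!r}")

    # Assumption: block_resources is passed as a list of strings.
    unknown = set(options.get("block_resources") or []) - BLOCKABLE
    if unknown:
        raise ValueError(f"unsupported block_resources: {sorted(unknown)}")

    # Drop unset options so the documented defaults apply.
    args: dict[str, Any] = {"url": url}
    args.update({k: v for k, v in options.items() if v is not None})
    return args


# Example: a product page where images, fonts, and media are not needed.
args = build_extract_args(
    "https://example.com/item/123",      # placeholder URL
    expected_type="product",
    wait_until="domcontentloaded",
    block_resources=["image", "font", "media"],
    wait_for_selector=".product-title",  # placeholder selector
    timeout=20,
)
```

The resulting dictionary would then be passed as the arguments of a `webextrator_extract` call by whatever client is in use; that transport step is left out here because it depends on the client.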
Output Schema
| Name | Required | Description | Default |
|---|---|---|---|
| result | Yes | JSON response containing the extracted structured content. | |