query_webpage_data
Extract structured data from webpages using natural language descriptions or AgentQL queries. Accepts live URLs or raw HTML as input.
Instructions
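Extracts structured data from a webpage. Describe the data to extract with either an AgentQL query (the query parameter) or a natural language description (the prompt parameter); when query is omitted, a query is auto-generated from prompt. Supply the page as either a live URL (url) or raw HTML (html).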
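The two ways of describing the data can be sketched as two argument sets. This is an illustration only: the example URL, the HTML snippet, the field names inside the query, and the 'application/json' content type are assumptions, not values taken from this page.

```python
# Hypothetical argument sets for query_webpage_data.
# All values below are illustrative assumptions.

# Variant 1: an explicit AgentQL query against a live URL.
query_args = {
    "content-type": "application/json",  # assumed MIME type
    "url": "https://example.com/products",
    "query": "{ products[] { name price } }",
}

# Variant 2: a natural language prompt against raw HTML.
# No query is given, so one would be auto-generated from the prompt.
prompt_args = {
    "content-type": "application/json",  # assumed MIME type
    "html": "<ul><li>Widget, $5</li><li>Gadget, $9</li></ul>",
    "prompt": "the name and price of every product listed",
}
```

An explicit query is the better fit when the exact output fields matter; a prompt is quicker to write when the shape of the result is flexible.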
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| content-type | Yes | MIME type of the request body, required to correctly parse the payload. | |
| query | No | An AgentQL (AQL) query string specifying the exact data fields to extract from the page. If omitted, a query will be auto-generated from the prompt. | |
| prompt | No | A natural language description of the data to extract, used to auto-generate an AgentQL query when no explicit query is provided. | |
| url | No | The fully qualified URL of the webpage to load and query. Either url or html must be provided as the data source. | |
| html | No | Raw HTML content of the webpage to query, used as an alternative to providing a live URL. | |
| mode | No | Controls the response generation strategy: 'fast' prioritizes speed, 'standard' prioritizes accuracy and completeness. | |
| wait_for | No | Number of seconds to wait for dynamic page content to load before capturing the snapshot. Maximum allowed wait time is 10 seconds. | |
| is_scroll_to_bottom_enabled | No | When enabled, the browser scrolls to the bottom of the page before capturing the snapshot, useful for triggering lazy-loaded content. | |
| is_screenshot_enabled | No | When enabled, a screenshot of the page is captured during the query session, which may be useful for debugging or visual verification. | |
| browser_profile | No | Determines the browser profile used for the session: 'light' uses a fast headless browser, 'stealth' applies anti-detection techniques for bot-protected pages. | |
| proxy | No | Optional proxy configuration to route the browser session through a specific proxy server, useful for geo-restricted or access-controlled pages. | |
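
The constraints in the table can be checked before a call is made. The following is a minimal sketch, not part of the tool itself: build_arguments is a hypothetical helper, the rule that at least one of query or prompt must be present is inferred from their descriptions, and rejecting negative wait_for values is an added assumption.

```python
from typing import Any

# Allowed values as listed in the parameter table.
ALLOWED_MODES = {"fast", "standard"}
ALLOWED_PROFILES = {"light", "stealth"}
MAX_WAIT_SECONDS = 10


def build_arguments(content_type: str, **options: Any) -> dict[str, Any]:
    """Hypothetical helper: validate arguments against the parameter table."""
    if not content_type:
        raise ValueError("content-type is required")
    args: dict[str, Any] = {"content-type": content_type, **options}

    # Either url or html must be provided as the data source.
    if not args.get("url") and not args.get("html"):
        raise ValueError("provide url or html as the data source")

    # Inferred rule: with no query, a prompt is needed to generate one.
    if not args.get("query") and not args.get("prompt"):
        raise ValueError("provide query or prompt")

    if "mode" in args and args["mode"] not in ALLOWED_MODES:
        raise ValueError("mode must be 'fast' or 'standard'")

    if "browser_profile" in args and args["browser_profile"] not in ALLOWED_PROFILES:
        raise ValueError("browser_profile must be 'light' or 'stealth'")

    # The table caps wait_for at 10 seconds; the lower bound is an assumption.
    if "wait_for" in args and not 0 <= args["wait_for"] <= MAX_WAIT_SECONDS:
        raise ValueError("wait_for must be between 0 and 10 seconds")

    return args


# Usage: a bot-protected page with lazy-loaded content.
example = build_arguments(
    "application/json",  # assumed MIME type
    url="https://example.com/listings",
    prompt="the title and price of each listing",
    browser_profile="stealth",
    is_scroll_to_bottom_enabled=True,
    wait_for=5,
)
```

Validating locally keeps obvious mistakes, such as a wait_for above the 10 second cap, from costing a round trip.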