extract_page_data
Extract structured data like pricing, product info, or tables from any webpage using a provided schema. Prioritizes static DOM extraction, with fallback to browser and vision for complex content.
Instructions
Extract structured data from a webpage that matches a provided schema. The tool is DOM-first: it tries static extraction, then escalates to a headless browser, and falls back to MiniMax vision only if needed. Use it for pricing tiers, product info, contact details, speaker lists, tables, API docs, or any other structured content.
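The escalation order described above can be pictured with a small sketch. This is not the tool's actual implementation: the stage functions (`static_dom`, `headless_browser`, `vision`) and the "return `None` when nothing was found" convention are hypothetical stand-ins, used only to show the cheapest-first ordering and that vision is tried only when `use_vision_if_needed` is set.

```python
from typing import Callable, Optional, Tuple

# Hypothetical stage signature: takes a URL, returns extracted data or None.
Stage = Callable[[str], Optional[dict]]


def extract_with_escalation(
    url: str,
    static_dom: Stage,
    headless_browser: Stage,
    vision: Stage,
    use_vision_if_needed: bool = False,
) -> Tuple[Optional[str], Optional[dict]]:
    """Try the cheapest stage first; escalate only when a stage finds nothing."""
    stages = [("static_dom", static_dom), ("headless_browser", headless_browser)]
    if use_vision_if_needed:
        # Vision is opt-in, mirroring the use_vision_if_needed parameter.
        stages.append(("vision", vision))
    for name, stage in stages:
        data = stage(url)
        if data:
            return name, data
    return None, None


# Stub stages for illustration only: static extraction finds nothing,
# the headless browser succeeds, so vision is never called.
def stub_static(url: str) -> Optional[dict]:
    return None


def stub_browser(url: str) -> Optional[dict]:
    return {"plans": [{"name": "Pro", "price": "$20", "features": ["API access"]}]}


def stub_vision(url: str) -> Optional[dict]:
    raise AssertionError("vision should not be reached in this example")


stage_used, data = extract_with_escalation(
    "https://example.com/pricing", stub_static, stub_browser, stub_vision,
    use_vision_if_needed=True,
)
```

Returning the stage name alongside the data is a design choice of this sketch; it makes it easy to see how far the escalation had to go.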
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | The URL to extract data from. | |
| schema | Yes | JSON schema describing the data to extract. Example: {"plans": [{"name": "string", "price": "string", "features": ["string"]}]} | |
| use_vision_if_needed | No | If true, falls back to MiniMax vision when data cannot be extracted from the DOM (e.g. data in images, canvas, charts). | |
| viewport | No | | |
| wait_until | No | | load |
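
As a rough illustration of the parameters in the table, the following sketch assembles an argument payload for `extract_page_data`. The helper `build_extract_args` is hypothetical, and so are two assumptions it makes: that the schema is passed as a JSON-serializable object, and that optional parameters can simply be omitted. How the payload is actually sent to the tool depends on the client and is not shown. `viewport` is left out because its shape is not documented here.

```python
import json
from typing import Any, Dict, Optional


def build_extract_args(
    url: str,
    schema: Dict[str, Any],
    use_vision_if_needed: Optional[bool] = None,
    wait_until: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble arguments for extract_page_data, enforcing the required fields."""
    if not url:
        raise ValueError("url is required")
    if not schema:
        raise ValueError("schema is required")
    args: Dict[str, Any] = {"url": url, "schema": schema}
    # Optional parameters are only included when the caller sets them.
    if use_vision_if_needed is not None:
        args["use_vision_if_needed"] = use_vision_if_needed
    if wait_until is not None:
        args["wait_until"] = wait_until
    return args


# Schema taken from the example in the table above.
pricing_schema = {
    "plans": [{"name": "string", "price": "string", "features": ["string"]}]
}

args = build_extract_args(
    "https://example.com/pricing",
    pricing_schema,
    use_vision_if_needed=True,
)
print(json.dumps(args, indent=2))
```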