extract_structured_data
Extract structured JSON data from any public webpage by defining the fields you need. Specify a schema and URL to get validated, typed JSON output.
Instructions
Extract structured JSON from any public webpage using Extrapify's schema-guided extraction engine. Define the fields you want (title, price, author, tags, etc.) and their types, point the tool at a URL, and get back validated, typed JSON. Handles JavaScript-heavy pages via Browserless rendering. Ideal for scraping product pages, articles, job listings, company data, search results, and any other structured web content. Returns extracted fields, confidence score, item count, and tokens used.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | Fully qualified public webpage URL to extract structured data from (e.g. https://example.com/article). The page must be publicly accessible; login-protected and paywalled pages are not supported. | |
| schema | Yes | Schema definition that controls what fields to extract. Each key is the field name and each value is the field type. Supported types: "string", "number", "integer", "float", "boolean", "date", "datetime", "url", and array variants using [] suffix (e.g. "string[]"). Example: { "title": "string", "price": "number", "tags": "string[]", "published_at": "date" }. Nested objects are supported for grouped fields. | |
| mode | No | Extraction mode controlling how many items are returned. "auto" detects the mode from the page structure (recommended). "single" forces extraction of one primary item only (use for product pages, articles, profiles). "list" extracts all matching items as an array (use for search results, directories, tables). | auto |
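
The parameters above can be checked client-side before a call is made. The following is a minimal sketch, not part of the tool itself: the helper names `check_schema` and `build_arguments` are hypothetical, nested objects are assumed to be written as nested dictionaries, and only one `[]` suffix per type is assumed to be valid. It covers only the arguments object and does not assume anything about how the tool is invoked or what its response looks like.

```python
import json
from urllib.parse import urlparse

# Base types and modes as listed in the input schema table.
BASE_TYPES = {"string", "number", "integer", "float",
              "boolean", "date", "datetime", "url"}
MODES = {"auto", "single", "list"}


def check_schema(schema, path=""):
    """Return a list of problems found in a schema dict (empty if none)."""
    problems = []
    for name, spec in schema.items():
        where = f"{path}{name}"
        if isinstance(spec, dict):
            # Assumption: a nested object is expressed as a nested dict.
            problems += check_schema(spec, path=f"{where}.")
        elif isinstance(spec, str):
            # Strip a single "[]" suffix to get the base type.
            base = spec[:-2] if spec.endswith("[]") else spec
            if base not in BASE_TYPES:
                problems.append(f"{where}: unsupported type {spec!r}")
        else:
            problems.append(f"{where}: type must be a string or nested object")
    return problems


def build_arguments(url, schema, mode="auto"):
    """Assemble the arguments object for extract_structured_data."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"url must be fully qualified: {url!r}")
    if mode not in MODES:
        raise ValueError(f"mode must be one of {sorted(MODES)}")
    problems = check_schema(schema)
    if problems:
        raise ValueError("; ".join(problems))
    return {"url": url, "schema": schema, "mode": mode}


args = build_arguments(
    "https://example.com/article",
    {"title": "string", "price": "number",
     "tags": "string[]", "published_at": "date"},
    mode="single",
)
print(json.dumps(args, indent=2))
```

Rejecting an unsupported type or an unknown mode locally avoids spending a request on input the tool would not accept. Whether a public page is actually reachable can only be determined by the tool itself.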