extract_data
Extract structured data from web pages including tables, lists, fields, and JSON-LD. Automatically detect content or specify selectors for targeted extraction.
Instructions
Extract structured data from web pages.
Extracts tables, lists, or specific fields from HTML pages and returns
structured data. Much more efficient than parsing full page text.
Extract Types:
- "table": Extract HTML tables as a list of dicts
- "list": Extract lists (ul/ol/dl) as a structured list
- "fields": Extract specific elements using CSS selectors
- "json-ld": Extract JSON-LD structured data
- "auto": Automatically detect and extract structured content
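To make the "table" return shape concrete, here is a minimal sketch of turning an HTML table into a list of dicts using only the Python standard library. It is not the tool's actual implementation. The names `TableToDicts` and `table_to_dicts` are hypothetical, the sample markup and its values are made up, and the sketch assumes the first row of the table holds the column headers.

```python
from html.parser import HTMLParser


class TableToDicts(HTMLParser):
    """Collect table rows as lists of cell strings (illustrative helper, not part of the tool)."""

    def __init__(self):
        super().__init__()
        self.rows = []     # every completed row, as a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("th", "td") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a <th> or <td>.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def table_to_dicts(html):
    """Return one dict per body row, keyed by the header row (assumed to be the first row)."""
    parser = TableToDicts()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


# Made-up sample markup, for illustration only.
sample = """
<table>
  <tr><th>Version</th><th>Released</th></tr>
  <tr><td>1.0</td><td>2024-01-01</td></tr>
  <tr><td>1.1</td><td>2024-03-15</td></tr>
</table>
"""
records = table_to_dicts(sample)
```

Each entry in `records` maps a header cell to the matching body cell, which is the kind of compact structure the "table" type is described as returning in place of full page text.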
Examples:
- extract_data("https://pypi.org/project/fastapi/", reasoning="Get package info")
- extract_data("https://github.com/user/repo/releases", reasoning="Get releases", extract_type="list")
- extract_data(
    "https://example.com/product",
    reasoning="Extract product details",
    extract_type="fields",
    selectors={"price": ".price", "title": "h1.product-name"}
  )
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | | |
| reasoning | Yes | | |
| extract_type | No | | auto |
| selectors | No | | |
| max_items | No | | |
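The schema above can be read as a small set of argument rules. The sketch below assembles an argument dict that follows them. `build_arguments` is a hypothetical helper, not part of the tool. The requirement that "fields" comes with selectors is an assumption drawn from the description of that extract type, and the schema does not state the type of `max_items`, so the value is passed through unchecked.

```python
# Extract types listed in the tool description.
EXTRACT_TYPES = {"table", "list", "fields", "json-ld", "auto"}


def build_arguments(url, reasoning, extract_type="auto", selectors=None, max_items=None):
    """Assemble an argument dict for extract_data (hypothetical helper)."""
    if not url or not reasoning:
        raise ValueError("url and reasoning are required")
    if extract_type not in EXTRACT_TYPES:
        raise ValueError(f"unknown extract_type: {extract_type!r}")
    # Assumption: "fields" extraction is only meaningful with CSS selectors.
    if extract_type == "fields" and not selectors:
        raise ValueError('extract_type="fields" needs a selectors mapping')

    args = {"url": url, "reasoning": reasoning, "extract_type": extract_type}
    # Optional parameters are included only when the caller supplied them.
    if selectors is not None:
        args["selectors"] = dict(selectors)
    if max_items is not None:
        args["max_items"] = max_items  # passed through as-is; type not documented
    return args


product_args = build_arguments(
    "https://example.com/product",
    reasoning="Extract product details",
    extract_type="fields",
    selectors={"price": ".price", "title": "h1.product-name"},
)
```

Omitting `extract_type` falls back to "auto", matching the default shown in the table.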
Output Schema
| Name | Required | Description | Default |
|---|---|---|---|
| result | Yes | | |