extract_structured_data
Extract structured data from web pages using CSS selectors, an LLM, or table extraction. Optionally save the full extraction to disk as JSON.
Instructions
Extract structured data using CSS selectors, an LLM, or table extraction. Set output_path to persist the full extraction (including table_data) to disk as JSON and receive a slimmed response.
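As a rough illustration of the output_path behavior described above, the following sketch builds an argument payload for a CSS-based extraction. The URL, selectors, and file path are hypothetical, and the shape of css_selectors (field name mapped to selector string) is an assumption; how the payload is sent depends on the client in use.

```python
import json

# Hypothetical arguments for an extract_structured_data call.
arguments = {
    "url": "https://example.com/products",
    "extraction_type": "css",
    # Assumed shape: output field name -> CSS selector.
    "css_selectors": {"title": "h1", "price": ".price"},
    # Must be an absolute path; the schema notes an automatic .json extension.
    "output_path": "/tmp/products_extraction",
    # Leave any existing file in place (the documented default).
    "overwrite": False,
}

# Serialize the arguments as they might appear in a tool-call request.
payload = json.dumps(arguments, indent=2)
print(payload)
```

With output_path set and include_content_in_response left unset, the full extraction should land on disk while the response itself stays small.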
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | Target URL | |
| extraction_type | No | One of 'css', 'llm', 'table' | css |
| css_selectors | No | CSS selector mapping | |
| extraction_schema | No | Schema definition | |
| generate_markdown | No | Generate markdown | |
| wait_for_js | No | Wait for JavaScript | |
| timeout | No | Timeout in seconds | |
| use_llm_table_extraction | No | Use LLM table extraction | |
| table_chunking_strategy | No | One of 'intelligent', 'fixed', 'semantic' | intelligent |
| output_path | No | Absolute file path (auto .json extension) to persist the full extracted_data + table_data as JSON. When set, the response is slimmed (content, markdown, table_data, extracted_data.raw_content removed). | |
| include_content_in_response | No | When True (with output_path set), also keep extracted_data/table_data/content in the response. | False |
| overwrite | No | Overwrite an existing output file at output_path. | False |
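The constraints in the table above can be checked on the client side before a call is made. This is a minimal sketch, not part of the tool itself: check_arguments is a hypothetical helper, and the absolute-path check assumes POSIX-style paths.

```python
from pathlib import PurePosixPath

# Allowed values taken from the input schema table.
EXTRACTION_TYPES = {"css", "llm", "table"}
CHUNKING_STRATEGIES = {"intelligent", "fixed", "semantic"}


def check_arguments(args: dict) -> list[str]:
    """Return a list of problems found in an argument dict (hypothetical helper)."""
    problems = []
    if not args.get("url"):
        problems.append("url is required")
    # Fall back to the documented defaults when a value is omitted.
    if args.get("extraction_type", "css") not in EXTRACTION_TYPES:
        problems.append("extraction_type must be one of css, llm, table")
    if args.get("table_chunking_strategy", "intelligent") not in CHUNKING_STRATEGIES:
        problems.append("table_chunking_strategy must be one of intelligent, fixed, semantic")
    output_path = args.get("output_path")
    # Assumption: POSIX-style paths; Windows paths would need a different check.
    if output_path is not None and not PurePosixPath(output_path).is_absolute():
        problems.append("output_path must be an absolute path")
    # The schema says this flag applies when output_path is set.
    if args.get("include_content_in_response") and output_path is None:
        problems.append("include_content_in_response has no effect without output_path")
    return problems


print(check_arguments({"url": "https://example.com", "output_path": "results/out"}))
```

An empty list means the arguments are consistent with the table; the server may still enforce rules this sketch does not cover.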
Output Schema
| Name | Required | Description | Default |
|---|---|---|---|
| No arguments | | | |