intelligent_extract
Extract specific data from web pages using AI. Define what to extract and optionally save full output as JSON.
Instructions
Extract specific data from web pages using an LLM. Set output_path to persist the full extraction output to disk as JSON and receive a slim response.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | Target URL | |
| extraction_goal | Yes | Data to extract | |
| content_filter | No | One of 'bm25', 'pruning', or 'llm' | bm25 |
| filter_query | No | BM25 filter keywords | |
| chunk_content | No | Split content | |
| use_llm | No | Enable LLM | |
| llm_provider | No | LLM provider | |
| llm_model | No | LLM model | |
| custom_instructions | No | LLM instructions | |
| output_path | No | Absolute file path (.json extension added automatically) where the full extracted data and content are persisted as JSON. When set, the response is slimmed to metadata plus the file path (extracted_data.raw_content, content, markdown, and table_data are removed). | |
| include_content_in_response | No | When True (with output_path set), also keep extracted_data/content in the response. | False |
| overwrite | No | Overwrite an existing output file at output_path. | False |
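
The parameters above can be combined into a single arguments payload. The sketch below is only an illustration: `build_extract_args` is a hypothetical helper, not part of the tool, and the URL, goal text, and output path are made-up values. It assumes POSIX-style paths when checking that output_path is absolute, and it does not show how the payload is sent to the tool, since that depends on the client in use.

```python
import json
from pathlib import PurePosixPath

# Optional parameter names taken from the Input Schema table.
OPTIONAL_PARAMS = {
    "content_filter", "filter_query", "chunk_content", "use_llm",
    "llm_provider", "llm_model", "custom_instructions",
    "include_content_in_response", "overwrite",
}


def build_extract_args(url, extraction_goal, output_path=None, **options):
    """Assemble an arguments dict for intelligent_extract (hypothetical helper)."""
    if not url or not extraction_goal:
        raise ValueError("url and extraction_goal are required")

    unknown = set(options) - OPTIONAL_PARAMS
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")

    args = {"url": url, "extraction_goal": extraction_goal}
    if output_path is not None:
        # The schema asks for an absolute path; POSIX-style paths are assumed here.
        if not PurePosixPath(output_path).is_absolute():
            raise ValueError("output_path must be an absolute path")
        args["output_path"] = output_path

    args.update(options)
    return args


# Illustrative values only.
args = build_extract_args(
    "https://example.com/pricing",
    "List each plan name and its monthly price",
    output_path="/tmp/pricing_extract",
    content_filter="bm25",
    filter_query="plan price monthly",
    overwrite=True,
)
print(json.dumps(args, indent=2))
```

With output_path set and include_content_in_response left unset, the table above indicates the response would carry only metadata and the file path, so the full extraction would be read back from the saved JSON file.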
Output Schema
| Name | Required | Description | Default |
|---|---|---|---|
| No arguments | | | |