intelligent_extract
Extract specific data from web pages using language models. Filter, chunk, and save extracted content to JSON files.
Instructions
Extract specific data from web pages using an LLM. Set output_path to persist the full extraction output to disk as JSON and receive a slim response in return.
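The persist-and-slim behavior described above can be sketched roughly as follows. This is not the tool's actual implementation: `persist_and_slim` is a hypothetical helper, the layout of the result dictionary is assumed (heavy fields as top-level keys, with `raw_content` nested under `extracted_data`), and raising an error when the file exists and overwrite is not set is also an assumption.

```python
import json
import os
from copy import deepcopy

# Fields dropped from the slim response, per the output_path description.
# Assumption: these are top-level keys; raw_content sits under extracted_data.
HEAVY_TOP_LEVEL = ("content", "markdown", "table_data")


def persist_and_slim(result, output_path, include_content=False, overwrite=False):
    """Hypothetical helper: write the full result to JSON, return a slim response."""
    if not os.path.isabs(output_path):
        raise ValueError("output_path must be absolute")
    if not output_path.endswith(".json"):
        output_path += ".json"  # mirrors the "auto .json extension" note
    if os.path.exists(output_path) and not overwrite:
        # Assumed behavior: refuse to clobber an existing file.
        raise FileExistsError(output_path)

    # The file always receives the full, unslimmed result.
    with open(output_path, "w", encoding="utf-8") as fh:
        json.dump(result, fh, ensure_ascii=False, indent=2)

    slim = deepcopy(result)  # leave the caller's dictionary untouched
    if not include_content:
        for key in HEAVY_TOP_LEVEL:
            slim.pop(key, None)
        if isinstance(slim.get("extracted_data"), dict):
            slim["extracted_data"].pop("raw_content", None)
    slim["output_path"] = output_path
    return slim
```

The point of the design is that large page content never has to travel back through the response; the caller reads it from disk only when needed.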
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | Target URL | |
| extraction_goal | Yes | Data to extract | |
| content_filter | No | One of 'bm25', 'pruning', or 'llm' | bm25 |
| filter_query | No | BM25 filter keywords | |
| chunk_content | No | Split content | |
| use_llm | No | Enable LLM | |
| llm_provider | No | LLM provider | |
| llm_model | No | LLM model | |
| custom_instructions | No | LLM instructions | |
| output_path | No | Absolute file path (a .json extension is added automatically) where the full extracted data and content are persisted as JSON. When set, the response is slimmed to metadata plus the file path; extracted_data.raw_content, content, markdown, and table_data are removed. | |
| include_content_in_response | No | When True (with output_path set), also keep extracted_data/content in the response. | False |
| overwrite | No | Overwrite an existing output file at output_path. | False |
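As an illustration of how the parameters above fit together, here is a minimal client-side sketch that assembles an arguments dictionary and checks it against the table. `build_arguments` is a hypothetical helper and not part of the tool; the example URL, extraction goal, and file name are made up for illustration.

```python
import os
import tempfile

ALLOWED_FILTERS = {"bm25", "pruning", "llm"}  # values listed for content_filter


def build_arguments(url, extraction_goal, **optional):
    """Hypothetical helper: assemble and sanity-check intelligent_extract arguments."""
    if not url or not extraction_goal:
        raise ValueError("url and extraction_goal are required")
    args = {"url": url, "extraction_goal": extraction_goal}

    content_filter = optional.get("content_filter", "bm25")  # documented default
    if content_filter not in ALLOWED_FILTERS:
        raise ValueError(f"unsupported content_filter: {content_filter!r}")
    args["content_filter"] = content_filter

    output_path = optional.get("output_path")
    if output_path is not None and not os.path.isabs(output_path):
        raise ValueError("output_path must be an absolute path")

    # Pass the remaining optional parameters through unchanged.
    for key, value in optional.items():
        if key != "content_filter" and value is not None:
            args[key] = value
    return args


example = build_arguments(
    "https://example.com/pricing",
    "List each plan name with its monthly price",
    filter_query="plan price monthly",
    output_path=os.path.join(tempfile.gettempdir(), "pricing_extract"),
    overwrite=True,
)
```

Validating before the call keeps obvious mistakes, such as a relative output_path or an unknown filter name, from costing a round trip to the tool.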
Output Schema
| Name | Required | Description | Default |
|---|---|---|---|
| No arguments | | | |