enhanced_process_large_content
Process large web content by chunking and filtering with BM25 to extract key information. Save chunks and summaries to disk as JSON for reuse.
Instructions
Process large content with chunking and BM25 filtering. Use output_path to persist chunks + summaries to disk as JSON and receive a slim response.
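To make the chunk-then-filter idea concrete, here is a minimal sketch of sentence chunking followed by BM25 ranking, using only the standard library. It is not the tool's actual implementation: the helper names (`split_sentences`, `bm25_scores`, `top_chunks`) and the `k1`/`b` defaults are assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    # Naive sentence chunking: split on whitespace that follows ., !, or ?
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(text):
    # Lowercase alphanumeric tokens; a real tokenizer would be more careful.
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(chunks, query, k1=1.5, b=0.75):
    # Standard Okapi BM25 with a smoothed, non-negative IDF term.
    docs = [tokenize(c) for c in chunks]
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n if n else 0.0
    df = Counter(term for d in docs for term in set(d))
    query_terms = tokenize(query)
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def top_chunks(text, query, top_k=2):
    # Keep the highest-scoring chunks, dropping those with no query match.
    chunks = split_sentences(text)
    scores = bm25_scores(chunks, query)
    ranked = sorted(zip(scores, chunks), key=lambda p: p[0], reverse=True)
    return [chunk for score, chunk in ranked[:top_k] if score > 0]
```

In this sketch, `query` plays the role of `filter_query` and `top_k` the role of `extract_top_chunks`; how the tool maps those parameters internally is not documented here.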
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | URL to process | |
| chunking_strategy | No | 'topic', 'sentence', 'overlap', or 'regex' | sentence |
| filtering_strategy | No | 'bm25', 'pruning', or 'llm' | bm25 |
| filter_query | No | Keywords for BM25 filtering | |
| max_chunk_tokens | No | Max tokens per chunk | |
| chunk_overlap | No | Overlap tokens | |
| extract_top_chunks | No | Top chunks to extract | |
| similarity_threshold | No | Min similarity 0-1 | |
| summarize_chunks | No | Summarize chunks | |
| merge_strategy | No | 'hierarchical' or 'linear' | linear |
| final_summary_length | No | 'short', 'medium', or 'long' | short |
| output_path | No | Absolute file path (a .json extension is added automatically) where the full chunks and summaries are persisted as JSON. When set, the response is reduced to metadata plus the file path; chunks, chunk_summaries, merged_summary, and final_summary are omitted. | |
| include_content_in_response | No | When True (with output_path set), also include chunks/summaries in the response. Defaults to False. | |
| overwrite | No | Overwrite an existing output file at output_path. Defaults to False. | |
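As an illustration of how these parameters combine, the following sketch assembles an arguments payload for the tool. The helper `build_arguments` is hypothetical, the URL and values are placeholders, and how the payload is actually sent depends on the MCP client in use, which is not shown.

```python
import json
import os


def build_arguments(url, filter_query, output_path=None, **options):
    # Hypothetical helper: collects tool arguments into a plain dict.
    args = {"url": url, "filter_query": filter_query}
    if output_path is not None:
        # The schema asks for an absolute path, so reject relative ones early.
        if not os.path.isabs(output_path):
            raise ValueError("output_path must be an absolute path")
        args["output_path"] = output_path
    args.update(options)
    return args


arguments = build_arguments(
    "https://example.com/long-article",       # placeholder URL
    "bm25 chunking",                          # keywords for BM25 filtering
    output_path=os.path.abspath("chunks.json"),
    chunking_strategy="sentence",
    extract_top_chunks=5,
    overwrite=False,
)
print(json.dumps(arguments, indent=2))
```

With `output_path` set and `include_content_in_response` left at its default, the response should carry only metadata and the file path, so the full chunks and summaries would be read back from the JSON file.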
Output Schema
| Name | Required | Description | Default |
|---|---|---|---|
| No arguments | | | |