process_file
Convert PDF, Office documents, and ZIP files to markdown. Extract content from URLs or local paths, with options for summarization, slicing, and saving full output to disk.
Instructions
Convert PDF, Word, Excel, PowerPoint, and ZIP files to markdown. Use output_path to persist the full unsliced converted markdown to disk and receive a slim response.
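
As a rough illustration of the output_path workflow described above, the arguments might be assembled like this. This is only a sketch: the URL and the file path are hypothetical placeholders, and how the arguments are actually sent to the tool depends on the client in use.

```python
import json

# Hypothetical arguments for a process_file call.
# The URL and output path are placeholders, not values from the tool's docs.
arguments = {
    "url": "https://example.com/report.pdf",  # http/https URL, file:// URI, or absolute path
    "output_path": "/tmp/report.md",          # absolute path for the full converted markdown
    "overwrite": True,                        # replace an existing file at output_path
}

# Serialize to JSON, the form most tool-calling clients accept for arguments.
payload = json.dumps(arguments, indent=2)
print(payload)
```

With output_path set and include_content_in_response left unset, the response is expected to carry only metadata and the file path, while the full markdown is written to disk.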
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | File URL or local path (PDF, Office, ZIP). Supports http/https URLs, file:// URIs, and absolute paths. | |
| max_size_mb | No | Max file size in MB | |
| extract_all_from_zip | No | Extract ZIP contents | |
| include_metadata | No | Include metadata | |
| auto_summarize | No | Auto-summarize large content | |
| max_content_tokens | No | Max tokens before summarization | |
| summary_length | No | 'short', 'medium', or 'long' | medium |
| llm_provider | No | LLM provider | |
| llm_model | No | LLM model | |
| content_limit | No | Max characters to return (0=unlimited) | |
| content_offset | No | Start position for content (0-indexed) | |
| output_path | No | Absolute file path (auto .md extension) to persist the full unsliced converted markdown. When set, the response is slimmed to metadata and the file path. content_limit/content_offset still affect the response copy but not the on-disk file. | |
| include_content_in_response | No | When True (with output_path set), keep content in the response too. Note: the response copy is still subject to content_limit/content_offset slicing; only the on-disk file holds the full unsliced payload. Defaults to False. | |
| overwrite | No | Overwrite an existing output file at output_path. Defaults to False. | |
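
The interaction between content_offset and content_limit can be modeled with a small helper. This is a sketch of the documented semantics (0-indexed start position, limit of 0 meaning unlimited), not the tool's actual implementation. The function name `slice_response` is hypothetical, and it assumes characters are counted the way Python indexes strings.

```python
def slice_response(markdown: str, content_limit: int, content_offset: int) -> str:
    """Model the response copy: skip content_offset characters,
    then return at most content_limit characters (0 = unlimited)."""
    if content_limit < 0 or content_offset < 0:
        raise ValueError("content_limit and content_offset must be non-negative")
    remainder = markdown[content_offset:]
    if content_limit == 0:
        return remainder  # 0 means no cap on the returned characters
    return remainder[:content_limit]


# Page through a converted document in 10-character chunks.
full = "# Title\n\nBody text of the converted document."
page1 = slice_response(full, content_limit=10, content_offset=0)
page2 = slice_response(full, content_limit=10, content_offset=10)
everything = slice_response(full, content_limit=0, content_offset=0)
```

Under this model, only the response copy is sliced. The file written to output_path would still hold the equivalent of `everything`.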
Output Schema
| Name | Required | Description | Default |
|---|---|---|---|
| No arguments | | | |