extract_streaming
Stream PDF extraction events as NDJSON for large documents, receiving page results as they become available without waiting for full extraction.
Instructions
Stream extraction events for a PDF as NDJSON.
Use for large documents (100+ pages) where waiting for the full extraction is impractical. The response body is newline-delimited JSON with one object per line:
{"type":"classified","data":{"page_count":N,"page_types":[...]}}
{"type":"page","data":{"page_num":0,"text":"...","confidence":0.92,...}}
{"type":"warning","data":{"message":"..."}} (zero or more)
{"type":"complete","data":{"total_confidence":0.94,"ocr_pages":[...],...}}
The first event is always classified and the last is always complete. Each page event arrives as soon as that page is extracted, including OCR re-extraction in the standard and high quality modes.
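A minimal client-side sketch of consuming this stream, assuming each line is one JSON object with "type" and "data" keys as in the examples above. It checks the documented ordering (classified first, complete last) and handles each page as it arrives instead of buffering the whole response. The helpers `iter_events` and `consume` are hypothetical, and keying pages by page_num is a defensive choice: the text above does not say whether a page re-extracted with OCR is emitted a second time, so a later event for the same page simply replaces the earlier one.

```python
import json
from typing import Iterable, Iterator


def iter_events(lines: Iterable[str]) -> Iterator[dict]:
    """Parse NDJSON lines one at a time and enforce the documented event order."""
    first = True
    last_type = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines between events
        event = json.loads(line)
        if first and event["type"] != "classified":
            raise ValueError(f"expected 'classified' first, got {event['type']!r}")
        if last_type == "complete":
            raise ValueError("event received after 'complete'")
        first = False
        last_type = event["type"]
        yield event
    if last_type != "complete":
        raise ValueError("stream ended without a 'complete' event")


def consume(lines: Iterable[str]) -> dict:
    """Collect page text, warnings, and summary fields (hypothetical helper)."""
    pages: dict = {}
    warnings: list = []
    summary: dict = {}
    for event in iter_events(lines):
        kind, data = event["type"], event["data"]
        if kind == "classified":
            summary["page_count"] = data["page_count"]
        elif kind == "page":
            # Process each page immediately; a repeated page_num overwrites
            # the earlier text (assumption: possible after OCR re-extraction).
            pages[data["page_num"]] = data["text"]
        elif kind == "warning":
            warnings.append(data["message"])
        elif kind == "complete":
            summary["total_confidence"] = data["total_confidence"]
    summary["pages"] = pages
    summary["warnings"] = warnings
    return summary
```

Because `iter_events` is a generator, it can wrap any line iterator (for example, the lines of an HTTP response body read incrementally), so page results are available before the complete event arrives.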
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
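| file_path | Yes | Path to the PDF to extract | |
| quality | No | Extraction quality mode; OCR re-extraction runs in standard and high modes | standard |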
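A sketch of building a call to this tool from the schema above. It assumes the tool is exposed by an MCP server and invoked with a JSON-RPC 2.0 `tools/call` request; that transport is an assumption, not something this page states. The helper `build_call` and the example path are hypothetical. Only file_path is required, and quality falls back to the documented default of standard.

```python
import json
from typing import Optional

DEFAULT_QUALITY = "standard"  # default for `quality` per the Input Schema table


def build_call(file_path: str, quality: Optional[str] = None, request_id: int = 1) -> dict:
    """Build a JSON-RPC tools/call request for extract_streaming (assumed MCP transport)."""
    if not file_path:
        raise ValueError("file_path is required")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "extract_streaming",
            "arguments": {
                "file_path": file_path,
                # Send the documented default explicitly when the caller omits quality.
                "quality": quality if quality is not None else DEFAULT_QUALITY,
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical path; "high" is one of the modes mentioned above.
    print(json.dumps(build_call("reports/annual.pdf", quality="high")))
```

Sending quality explicitly keeps the request self-describing; omitting the key should behave the same if the server applies the schema default.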
Output Schema
| Name | Required | Description | Default |
|---|---|---|---|
| result | Yes | Newline-delimited JSON event stream, one object per line | |