knowledge_ingest
Import documents, extract text, chunk into digest packs, and write structured wiki summaries with provenance links.
Instructions
Knowledge ingestion pipeline. Set `mode` to choose the behavior:
batch: Batch import, extract, chunk, and pack source documents into digest packs. Scans a directory (or a single file), imports the files to raw/, extracts text with structural provenance (per-page for PDF, per-sheet for XLSX, per-slide for PPTX), splits the text into fixed-line chunks, then packs the chunks into markdown digest packs under raw/digest-packs/{topic}/. Files already in raw/ are skipped.
digest_write: Write LLM-generated digest summaries to the wiki with structured provenance. Creates or updates one or more wiki pages from digested content, linking each page back to its source raw files and digest packs. The index is rebuilt once, after all pages are written. Each page is auto-classified, auto-routed, and timestamped.
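The chunk-then-pack step of `batch` can be pictured with the minimal sketch below. It assumes chunks are consecutive runs of at most `chunkLines` lines and that packs are filled greedily with whole chunks up to `packLines` lines; the function names are hypothetical, and the real tool's pack format (markdown headers, provenance markers) is not reproduced here.

```python
from typing import List


def chunk_text(text: str, chunk_lines: int = 100) -> List[List[str]]:
    """Split extracted text into consecutive chunks of at most chunk_lines lines."""
    if chunk_lines < 1:
        raise ValueError("chunk_lines must be >= 1")
    lines = text.splitlines()
    return [lines[i:i + chunk_lines] for i in range(0, len(lines), chunk_lines)]


def pack_chunks(chunks: List[List[str]], pack_lines: int = 500) -> List[str]:
    """Greedily group whole chunks into packs of at most pack_lines lines.

    Assumption (hypothetical): a chunk is never split across packs, so a chunk
    longer than pack_lines ends up alone in its own, oversized pack.
    """
    packs: List[List[str]] = []
    current: List[str] = []
    for chunk in chunks:
        # Start a new pack when adding this chunk would exceed the line budget.
        if current and len(current) + len(chunk) > pack_lines:
            packs.append(current)
            current = []
        current.extend(chunk)
    if current:
        packs.append(current)
    return ["\n".join(pack) for pack in packs]


if __name__ == "__main__":
    text = "\n".join(f"line {i}" for i in range(250))
    chunks = chunk_text(text, chunk_lines=100)
    packs = pack_chunks(chunks, pack_lines=200)
    print([len(c) for c in chunks], len(packs))
```

Packing whole chunks keeps each chunk's provenance intact inside a single pack, at the cost of packs sometimes falling short of the `packLines` budget.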
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| mode | Yes | Ingestion mode: batch (import and pack source documents) or digest_write (write LLM summaries to wiki) | |
| source_path | No | [batch] Absolute path to a directory or single file to ingest | |
| pattern | No | [batch] Glob filter when source_path is a directory (e.g. '*.pdf', '*.{xlsx,docx}') | |
| maxFiles | No | [batch] Maximum number of files to process (max: 1000) | 100 |
| topic | No | [batch] Topic name for organizing digest packs | general |
| chunkLines | No | [batch] Maximum lines per chunk | 100 |
| packLines | No | [batch] Maximum lines per digest pack | 500 |
| continueOnError | No | [batch] Continue processing when an individual file fails | true |
| pages | No | [digest_write] Array of wiki pages to write | |
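A caller might fill in the `batch` defaults from the table before invoking the tool, as in the sketch below. The `resolve_batch_args` helper is hypothetical. Two behaviors are assumptions rather than documented facts: a `maxFiles` value above 1000 is clamped instead of rejected, and `source_path` is treated as mandatory for `batch` even though the schema marks it optional.

```python
import os
from typing import Any, Dict

# Default values taken from the Input Schema table.
BATCH_DEFAULTS: Dict[str, Any] = {
    "pattern": None,
    "maxFiles": 100,
    "topic": "general",
    "chunkLines": 100,
    "packLines": 500,
    "continueOnError": True,
}
MAX_FILES_LIMIT = 1000  # documented upper bound for maxFiles


def resolve_batch_args(args: Dict[str, Any]) -> Dict[str, Any]:
    """Merge caller arguments over the documented batch defaults (hypothetical helper)."""
    if args.get("mode") != "batch":
        raise ValueError("resolve_batch_args only handles mode='batch'")
    source = args.get("source_path")
    # Assumption: batch mode needs an absolute source_path.
    if not source or not os.path.isabs(source):
        raise ValueError("source_path must be an absolute path")
    # Explicit None values fall back to the defaults.
    provided = {k: v for k, v in args.items() if v is not None}
    resolved = {**BATCH_DEFAULTS, **provided}
    # Assumption: out-of-range maxFiles is clamped to [1, 1000].
    resolved["maxFiles"] = max(1, min(int(resolved["maxFiles"]), MAX_FILES_LIMIT))
    return resolved


if __name__ == "__main__":
    print(resolve_batch_args({"mode": "batch", "source_path": "/data/reports", "pattern": "*.pdf"}))
```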