analyze_document
Extract text, faces, barcodes, and rectangles from images and PDFs in reading order for document reconstruction. Works fully offline with no API keys.
Instructions
Run a full analysis pipeline on a local image or PDF and return structured JSON for document reconstruction: OCR (with line/paragraph grouping in reading order), face detection, barcode/QR detection, and rectangle detection — all in parallel, fully offline, no API key needed.
USE WHEN: The user wants the model to reconstruct a document into Markdown, HTML, DOCX, or any other format: invoices, scanned reports, contracts, IDs, receipts, mixed-content scans. Returns enough structure (paragraphs plus raw text blocks with bounding boxes) that the model can render the output in whatever format the user asks for.
DO NOT USE WHEN: The user needs only one capability. Use the dedicated tool instead; it will be faster.
Returns: JSON with this shape:

    {
      "source": { "path", "pageCount", "isPdf" },
      "pages": [
        {
          "page": 0,
          "paragraphs": [{ "paragraphId", "lineIds", "text" }, ...],  // primary surface
          "textBlocks": [{ "text", "lineId", "paragraphId", "confidence", "bbox": { "x","y","width","height" } }, ...],
          "faces": [{ "x","y","width","height" }, ...],
          "barcodes": [{ "value","symbology","bbox" }, ...],
          "rectangles": [{ "confidence","bbox" }, ...]
        },
        ...
      ],
      "summary": { "totalTextBlocks","totalParagraphs","totalFaces","totalBarcodes","totalRectangles" }
    }
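As a quick sanity check on a parsed result, the per-page lists can be recounted and compared with the `summary` block. The sketch below is illustrative only: `recompute_summary` is a hypothetical helper, and it assumes (the description does not say so explicitly) that each summary total is a plain sum over all returned pages.

```python
from typing import Any


def recompute_summary(result: dict[str, Any]) -> dict[str, int]:
    """Recount the per-page lists of a parsed analyze_document result.

    Assumption: each summary total is the sum of the matching list
    lengths across every entry in "pages".
    """
    pages = result.get("pages", [])
    # Map each summary key to the per-page list it is assumed to count.
    fields = {
        "totalTextBlocks": "textBlocks",
        "totalParagraphs": "paragraphs",
        "totalFaces": "faces",
        "totalBarcodes": "barcodes",
        "totalRectangles": "rectangles",
    }
    return {
        total_key: sum(len(page.get(list_key, [])) for page in pages)
        for total_key, list_key in fields.items()
    }
```

Comparing `recompute_summary(result)` with `result["summary"]` is a cheap way to notice a truncated or partially parsed response before attempting reconstruction.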
Use paragraphs[].text as the primary surface for reading-order content. Use textBlocks[] when spatial information matters: multi-column layouts, tables, forms. PDFs return one entry per page; all coordinates are normalized to the 0–1 range relative to their own page. Face/barcode/rectangle detection on PDFs is best-effort (the underlying binary analyzes the PDF as a whole rather than per page).
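The two surfaces can be consumed along these lines. This is a minimal sketch, not part of the tool: `page_text` and `bbox_to_pixels` are hypothetical helper names, and the page width and height in pixels must come from the caller. The description does not state where the coordinate origin sits (top-left or bottom-left), so the scaling below makes no assumption about it and only multiplies the normalized values out.

```python
from typing import Any


def page_text(page: dict[str, Any]) -> str:
    """Join a page's paragraphs in the order returned (reading order)."""
    return "\n\n".join(p["text"] for p in page.get("paragraphs", []))


def bbox_to_pixels(
    bbox: dict[str, float], page_width: float, page_height: float
) -> dict[str, float]:
    """Scale a normalized 0-1 bounding box to pixel units.

    The origin convention is not specified in the tool description,
    so no axis is flipped here; flip y yourself if your renderer needs it.
    """
    return {
        "x": bbox["x"] * page_width,
        "y": bbox["y"] * page_height,
        "width": bbox["width"] * page_width,
        "height": bbox["height"] * page_height,
    }
```

For plain reading-order output, `page_text` is usually enough; the pixel conversion matters only when text blocks have to be placed back onto a page image, for example when rebuilding a table or form.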
Parameters:
- path: absolute or relative path to the image or PDF file.
- start_page: PDFs only. 1-based index of the first page to analyze (default 1). Only narrows the OCR pass; face/barcode/rectangle detections are still whole-document and attached to the first returned page. Ignored for images.
- max_pages: PDFs only. Maximum number of pages to OCR from start_page (default: all). Ignored for images.
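For long PDFs, start_page and max_pages can be combined to process the document in fixed-size windows. The sketch below only builds the argument dictionaries; how they are sent to the tool depends on the client and is not shown. `page_windows` is a hypothetical helper, and it assumes the total page count is already known, for instance from `source.pageCount` in an earlier response.

```python
def page_windows(path: str, page_count: int, chunk_size: int) -> list[dict[str, object]]:
    """Build one argument dict per window of at most chunk_size pages.

    start_page is 1-based, matching the parameter description.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [
        {
            "path": path,
            "start_page": start,
            # The last window may be shorter than chunk_size.
            "max_pages": min(chunk_size, page_count - start + 1),
        }
        for start in range(1, page_count + 1, chunk_size)
    ]
```

Because face, barcode, and rectangle detections are whole-document and attached to the first returned page of each call, merging several windows will likely repeat those detections once per window; keeping them from a single call avoids double counting.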
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| path | Yes | Absolute or relative path to the image or PDF file | |
| start_page | No | PDFs only. 1-based index of the first page to analyze. Ignored for images. | 1 |
| max_pages | No | PDFs only. Maximum number of pages to analyze from start_page. Ignored for images. | all |