read_pdf_structure
Extract PDF page layout including text, coordinates, fonts, and colors to understand document structure before making edits.
Instructions
Extract the complete structural content of a PDF document.
Returns detailed information about each page, including:
- Page dimensions (width, height)
- All text elements, each with:
  - Exact text content
  - Bounding box coordinates (x0, y0, x1, y1)
  - Origin point for text insertion
  - Font name and size
  - Color value
Use this tool FIRST to understand the document layout before making any modifications. The output helps identify exact text to target for replacements.
Args:
- input_path: Absolute path to the PDF file to analyze. Must be a valid, accessible PDF file.
- password: Optional password if the PDF is encrypted.
Returns: JSON string containing the complete page structure. On error, returns JSON with success=false and error details.
Example: read_pdf_structure("/home/user/documents/invoice.pdf")
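Because the tool returns a JSON string, a caller usually parses it and checks the success flag before looking for text to replace. The following is a minimal sketch of that step, not the tool's actual schema: only `success` is documented above, while the key names `error`, `pages`, `elements`, `text`, and `bbox`, the helper `find_text_elements`, and the sample payload are all assumptions made for illustration.

```python
import json


def find_text_elements(result_json: str, needle: str) -> list[dict]:
    """Return text elements whose text contains `needle`, tagged with a page index."""
    data = json.loads(result_json)

    # The tool documents success=false on error; the "error" key name is assumed.
    if data.get("success") is False:
        raise RuntimeError(
            f"read_pdf_structure failed: {data.get('error', 'unknown error')}"
        )

    matches = []
    # "pages" and "elements" are assumed key names, not confirmed by the tool docs.
    for page_index, page in enumerate(data.get("pages", [])):
        for element in page.get("elements", []):
            if needle in element.get("text", ""):
                matches.append({"page": page_index, **element})
    return matches


# Hand-written sample payload standing in for real tool output (hypothetical shape).
sample = json.dumps({
    "success": True,
    "pages": [
        {
            "width": 612,
            "height": 792,
            "elements": [
                {"text": "Invoice #1001", "bbox": [72, 72, 200, 90]},
                {"text": "Total due", "bbox": [72, 700, 150, 715]},
            ],
        }
    ],
})

for hit in find_text_elements(sample, "Invoice"):
    print(hit["page"], hit["text"], hit["bbox"])
```

Inspect the real output once before relying on any of these key names, and adjust the lookups to match what the tool actually returns.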
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
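| input_path | Yes | Absolute path to the PDF file to analyze. Must be a valid, accessible PDF file. | |
| password | No | Optional password if the PDF is encrypted. | |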
Output Schema
| Name | Required | Description | Default |
|---|---|---|---|
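| result | Yes | JSON string containing the complete page structure. On error, JSON with success=false and error details. | |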