pdf_to_csv
Convert PDF files and scanned images into structured CSV format, extracting tables, columns, and rows for data analysis and processing.
Instructions
Convert PDF and scanned images into CSV representation with layout, columns, rows, and tables.
Ref: https://developer.pdf.co/api-reference/pdf-to-csv.md
Input Schema
TableJSON Schema
| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | URL to the source file. Supports publicly accessible links including Google Drive, Dropbox, PDF.co Built-In Files Storage. Use 'upload_file' tool to upload local files. | |
| httpusername | No | HTTP auth user name if required to access source url. (Optional) | |
| httppassword | No | HTTP auth password if required to access source url. (Optional) | |
| pages | No | Comma-separated page indices (e.g., '0, 1, 2-' or '1, 3-7'). Use '!' for inverted page numbers (e.g., '!0' for last page). Processes all pages if None. (Optional) | |
| unwrap | No | Unwrap lines into a single line within table cells when lineGrouping is enabled. Must be true or false. (Optional) | |
| rect | No | Defines coordinates for extraction (e.g., '51.8,114.8,235.5,204.0'). (Optional) | |
| lang | No | Language for OCR for scanned documents. Default is 'eng'. See PDF.co docs for supported languages. (Optional, Default: 'eng') | eng |
| line_grouping | No | Enables line grouping within table cells when set to '1'. (Optional) | 0 |
| password | No | Password of the PDF file. (Optional) | |
| name | No | File name for the generated output. (Optional) | |
| api_key | No | PDF.co API key. If not provided, will use X_API_KEY environment variable. (Optional) |