pdf_to_csv
Convert PDFs and scanned images into structured CSV files with accurate layout, columns, rows, and table data extraction. Supports OCR, custom page ranges, and secure file access.
Instructions
Input Schema
Name | Required | Description | Default |
---|---|---|---|
api_key | No | PDF.co API key. If not provided, will use X_API_KEY environment variable. (Optional) | |
httppassword | No | HTTP auth password if required to access source url. (Optional) | |
httpusername | No | HTTP auth user name if required to access source url. (Optional) | |
lang | No | Language for OCR for scanned documents. Default is 'eng'. See PDF.co docs for supported languages. (Optional, Default: 'eng') | eng |
line_grouping | No | Enables line grouping within table cells when set to '1'. (Optional) | 0 |
name | No | File name for the generated output. (Optional) | |
pages | No | Comma-separated page indices (e.g., '0, 1, 2-' or '1, 3-7'). Use '!' for inverted page numbers (e.g., '!0' for last page). Processes all pages if None. (Optional) | |
password | No | Password of the PDF file. (Optional) | |
rect | No | Defines coordinates for extraction (e.g., '51.8,114.8,235.5,204.0'). (Optional) | |
unwrap | No | Unwrap lines into a single line within table cells when lineGrouping is enabled. Must be true or false. (Optional) | |
url | Yes | URL to the source file. Supports publicly accessible links including Google Drive, Dropbox, PDF.co Built-In Files Storage. Use 'upload_file' tool to upload local files. |