pdf_to_csv
Convert PDFs and scanned images into structured CSV files, preserving layout, columns, rows, and tables. Extract data from specific pages or regions, with configurable options for OCR language and line grouping.
Instructions
Input Schema
Name | Required | Description | Default |
---|---|---|---|
api_key | No | PDF.co API key. If not provided, will use X_API_KEY environment variable. (Optional) | |
httppassword | No | HTTP auth password if required to access source url. (Optional) | |
httpusername | No | HTTP auth user name if required to access source url. (Optional) | |
lang | No | OCR language for scanned documents. See PDF.co docs for supported languages. (Optional) | eng |
line_grouping | No | Enables line grouping within table cells when set to '1'. (Optional) | 0 |
name | No | File name for the generated output. (Optional) | |
pages | No | Comma-separated page indices and ranges (e.g., '0, 1, 2-' or '1, 3-7'). Use '!' for inverted page numbers (e.g., '!0' for last page). Processes all pages if omitted. (Optional) | |
password | No | Password of the PDF file. (Optional) | |
rect | No | Defines coordinates for extraction (e.g., '51.8,114.8,235.5,204.0'). (Optional) | |
unwrap | No | Unwraps lines into a single line within table cells when line_grouping is enabled. Must be true or false. (Optional) | |
url | Yes | URL to the source file. Supports publicly accessible links, including Google Drive, Dropbox, and PDF.co Built-In Files Storage. Use the 'upload_file' tool to upload local files. | |
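
As a rough illustration of how the parameters in the table fit together, the sketch below assembles an argument dictionary for a `pdf_to_csv` call. The helper `build_pdf_to_csv_args` and the example URL are hypothetical and not part of the tool; only the parameter names come from the schema. The exact value types a client should send (for instance, whether `line_grouping` is the string '1') are assumed from the descriptions above.

```python
import json
from typing import Any, Dict

# Optional parameter names, copied from the input schema table.
OPTIONAL_PARAMS = {
    "api_key", "httpusername", "httppassword", "lang", "line_grouping",
    "name", "pages", "password", "rect", "unwrap",
}


def build_pdf_to_csv_args(url: str, **options: Any) -> Dict[str, Any]:
    """Hypothetical helper: build the arguments for a pdf_to_csv call.

    `url` is the only required parameter. Options set to None are dropped
    so the tool falls back to its own defaults.
    """
    if not url:
        raise ValueError("url is required")
    unknown = set(options) - OPTIONAL_PARAMS
    if unknown:
        raise ValueError(f"unknown parameter(s): {sorted(unknown)}")
    args: Dict[str, Any] = {"url": url}
    for key, value in options.items():
        if value is not None:
            args[key] = value
    return args


if __name__ == "__main__":
    # Placeholder URL; the option values reuse the examples from the table.
    args = build_pdf_to_csv_args(
        "https://example.com/invoice.pdf",
        pages="0, 1, 2-",
        lang="eng",
        line_grouping="1",
        rect="51.8,114.8,235.5,204.0",
    )
    print(json.dumps(args, indent=2))
```

Rejecting unknown keys up front catches misspellings such as `lineGrouping` before the request is sent. How the resulting dictionary is passed to the tool depends on the client in use.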