pdf_to_json
Convert PDFs and scanned images into JSON format, preserving text, fonts, images, vectors, and formatting. Extract data from specific pages, regions, or encrypted files for structured output.
Instructions
Convert PDFs and scanned images into a JSON representation that preserves text, fonts, images, vectors, and formatting, using the /pdf/convert/to/json2 endpoint.
Ref: https://developer.pdf.co/api-reference/pdf-to-json/basic.md
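For orientation, here is a minimal sketch of how a direct call to this endpoint might be prepared. The base URL `https://api.pdf.co/v1` and the `x-api-key` header name are assumptions, since this page does not state them, and `build_request` is a hypothetical helper. The request is only constructed, not sent.

```python
import json
import os
import urllib.request
from typing import Optional

# Assumption: the base URL and the header name are not given on this page.
API_BASE = "https://api.pdf.co/v1"
ENDPOINT = "/pdf/convert/to/json2"  # endpoint path taken from the Instructions


def build_request(source_url: str, api_key: Optional[str] = None) -> urllib.request.Request:
    """Prepare (but do not send) a POST request for the json2 endpoint."""
    # Fall back to the X_API_KEY environment variable, as the tool itself does.
    key = api_key or os.environ.get("X_API_KEY", "")
    body = json.dumps({"url": source_url}).encode("utf-8")
    return urllib.request.Request(
        API_BASE + ENDPOINT,
        data=body,
        headers={"Content-Type": "application/json", "x-api-key": key},
        method="POST",
    )


req = build_request("https://example.com/sample.pdf", api_key="demo-key")
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would perform the call. Check the reference link above for the actual request and response format before relying on this.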
Input Schema
Name | Required | Description | Default |
---|---|---|---|
api_key | No | PDF.co API key. If not provided, will use X_API_KEY environment variable. (Optional) | |
httppassword | No | HTTP auth password if required to access source url. (Optional) | |
httpusername | No | HTTP auth user name if required to access source url. (Optional) | |
lang | No | OCR language for scanned documents. See PDF.co docs for supported languages. (Optional) | eng |
line_grouping | No | Enables line grouping within table cells when set to '1'. (Optional) | 0 |
name | No | File name for the generated output. (Optional) | |
pages | No | Comma-separated page indices or ranges (e.g., '0, 1, 2-' or '1, 3-7'). Prefix an index with '!' to count from the end (e.g., '!0' for the last page). All pages are processed if omitted. (Optional) | |
password | No | Password of the PDF file. (Optional) | |
rect | No | Defines coordinates for extraction (e.g., '51.8,114.8,235.5,204.0'). (Optional) | |
unwrap | No | Unwrap lines into a single line within table cells when line_grouping is enabled. Must be true or false. (Optional) | false |
url | Yes | URL to the source file. Supports publicly accessible links including Google Drive, Dropbox, PDF.co Built-In Files Storage. Use the 'upload_file' tool to upload local files. | |
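As a rough illustration of how the parameters in the table combine, the sketch below assembles an arguments dictionary for one call. `build_pdf_to_json_args` is a hypothetical helper, not part of the tool, and the example URL is a placeholder. Only parameter names and defaults from the table are used.

```python
from typing import Any, Dict, Optional


def build_pdf_to_json_args(
    url: str,
    pages: Optional[str] = None,
    rect: Optional[str] = None,
    password: Optional[str] = None,
    lang: str = "eng",
    line_grouping: str = "0",
    unwrap: bool = False,
) -> Dict[str, Any]:
    """Assemble tool arguments, leaving out optional values that were not set."""
    if not url:
        raise ValueError("url is required")
    args: Dict[str, Any] = {
        "url": url,
        "lang": lang,
        "line_grouping": line_grouping,
        "unwrap": unwrap,
    }
    # Only include optional string parameters that the caller supplied.
    for key, value in (("pages", pages), ("rect", rect), ("password", password)):
        if value:
            args[key] = value
    return args


# Extract one region from the first page and from the third page onward.
args = build_pdf_to_json_args(
    "https://example.com/invoice.pdf",
    pages="0,2-",
    rect="51.8,114.8,235.5,204.0",
)
print(args)
```

Note that `line_grouping` is passed as the string '0' or '1', while `unwrap` is a boolean, matching the types in the schema.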
Input Schema (JSON Schema)
{
"properties": {
"api_key": {
"default": "",
"description": "PDF.co API key. If not provided, will use X_API_KEY environment variable. (Optional)",
"title": "Api Key",
"type": "string"
},
"httppassword": {
"default": "",
"description": "HTTP auth password if required to access source url. (Optional)",
"title": "Httppassword",
"type": "string"
},
"httpusername": {
"default": "",
"description": "HTTP auth user name if required to access source url. (Optional)",
"title": "Httpusername",
"type": "string"
},
"lang": {
"default": "eng",
"description": "Language for OCR for scanned documents. Default is 'eng'. See PDF.co docs for supported languages. (Optional, Default: 'eng')",
"title": "Lang",
"type": "string"
},
"line_grouping": {
"default": "0",
"description": "Enables line grouping within table cells when set to '1'. (Optional)",
"title": "Line Grouping",
"type": "string"
},
"name": {
"default": "",
"description": "File name for the generated output. (Optional)",
"title": "Name",
"type": "string"
},
"pages": {
"default": "",
"description": "Comma-separated page indices (e.g., '0, 1, 2-' or '1, 3-7'). Use '!' for inverted page numbers (e.g., '!0' for last page). Processes all pages if None. (Optional)",
"title": "Pages",
"type": "string"
},
"password": {
"default": "",
"description": "Password of the PDF file. (Optional)",
"title": "Password",
"type": "string"
},
"rect": {
"default": "",
"description": "Defines coordinates for extraction (e.g., '51.8,114.8,235.5,204.0'). (Optional)",
"title": "Rect",
"type": "string"
},
"unwrap": {
"default": false,
"description": "Unwrap lines into a single line within table cells when lineGrouping is enabled. Must be true or false. (Optional)",
"title": "Unwrap",
"type": "boolean"
},
"url": {
"description": "URL to the source file. Supports publicly accessible links including Google Drive, Dropbox, PDF.co Built-In Files Storage. Use 'upload_file' tool to upload local files.",
"title": "Url",
"type": "string"
}
},
"required": [
"url"
],
"type": "object"
}
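The schema above can be checked by hand before a call is made. The following is a minimal sketch using only the standard library. `PARAM_TYPES`, `REQUIRED`, and `validate_args` are hypothetical names condensed from the schema, and a full JSON Schema validator would be the more thorough choice.

```python
from typing import Any, Dict, List

# Condensed from the schema: parameter name -> expected Python type.
PARAM_TYPES = {
    "api_key": str, "httppassword": str, "httpusername": str, "lang": str,
    "line_grouping": str, "name": str, "pages": str, "password": str,
    "rect": str, "unwrap": bool, "url": str,
}
REQUIRED = ["url"]


def validate_args(args: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the arguments look valid."""
    errors = []
    for name in REQUIRED:
        if name not in args:
            errors.append(f"missing required parameter: {name}")
    for name, value in args.items():
        expected = PARAM_TYPES.get(name)
        # Names not listed in the schema are skipped, since the schema
        # does not forbid additional properties.
        if expected is not None and not isinstance(value, expected):
            errors.append(f"{name} must be {expected.__name__}")
    return errors


print(validate_args({"url": "https://example.com/a.pdf", "unwrap": True}))
print(validate_args({"line_grouping": 1}))
```

The second call reports both the missing `url` and the integer passed for `line_grouping`, which the schema types as a string.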