pdf_to_json
Convert PDFs and scanned images into structured JSON, preserving text, fonts, images, vectors, and formatting for easy data extraction and integration using the PDF.co MCP Server.
Instructions
Input Schema
Name | Required | Description | Default |
---|---|---|---|
api_key | No | PDF.co API key. If not provided, will use X_API_KEY environment variable. (Optional) | |
httppassword | No | HTTP auth password, if required to access the source URL. (Optional) | |
httpusername | No | HTTP auth user name, if required to access the source URL. (Optional) | |
lang | No | Language for OCR of scanned documents. See PDF.co docs for supported languages. (Optional, Default: 'eng') | eng |
line_grouping | No | Enables line grouping within table cells when set to '1'. (Optional) | 0 |
name | No | File name for the generated output. (Optional) | |
pages | No | Comma-separated page indices or ranges (e.g., '0, 1, 2-' or '1, 3-7'). Use '!' for inverted page numbers (e.g., '!0' for the last page). If omitted, all pages are processed. (Optional) | |
password | No | Password of the PDF file. (Optional) | |
rect | No | Defines coordinates for extraction (e.g., '51.8,114.8,235.5,204.0'). (Optional) | |
unwrap | No | Unwraps lines into a single line within table cells when line_grouping is enabled. Must be true or false. (Optional) | |
url | Yes | URL to the source file. Supports publicly accessible links, including Google Drive, Dropbox, and PDF.co Built-In Files Storage. Use the 'upload_file' tool to upload local files. | |
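
As an illustration of the schema above, here is a minimal sketch of assembling an arguments object for pdf_to_json. The helper name `build_pdf_to_json_args` and the example URL are hypothetical and not part of the PDF.co MCP Server; the sketch also assumes that `line_grouping` is passed as a string, as the '1' in its description suggests. How the arguments reach the server depends on the MCP client in use.

```python
import json


def build_pdf_to_json_args(url, pages=None, lang="eng", line_grouping="0",
                           rect=None, **extra):
    """Hypothetical helper: collect pdf_to_json arguments into a dict.

    Only `url` is required by the schema; optional fields that are not
    given are left out so the server can apply its own defaults.
    """
    if not url:
        raise ValueError("url is required")
    args = {"url": url, "lang": lang, "line_grouping": line_grouping}
    if pages is not None:
        args["pages"] = pages  # e.g. "0,2-" or "!0" for the last page
    if rect is not None:
        args["rect"] = rect    # coordinate string, as in the table's example
    args.update(extra)         # e.g. password, name, unwrap
    return args


args = build_pdf_to_json_args(
    "https://example.com/sample.pdf",  # placeholder URL, an assumption
    pages="0,2-",
    rect="51.8,114.8,235.5,204.0",
)
print(json.dumps(args, indent=2))
```

Omitting `api_key` here relies on the documented fallback to the X_API_KEY environment variable.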