pdf_to_json
Convert PDFs and scanned images to JSON while preserving text, fonts, images, vectors, and formatting using the /pdf/convert/to/json2 endpoint.
Instructions
Convert PDFs and scanned images into a JSON representation, preserving text, fonts, images, vectors, and formatting, using the /pdf/convert/to/json2 endpoint.
Ref: https://developer.pdf.co/api-reference/pdf-to-json/basic.md
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | URL to the source file. Supports publicly accessible links including Google Drive, Dropbox, PDF.co Built-In Files Storage. Use 'upload_file' tool to upload local files. | |
| httpusername | No | HTTP auth user name if required to access source url. (Optional) | |
| httppassword | No | HTTP auth password if required to access source url. (Optional) | |
| pages | No | Comma-separated page indices (e.g., '0, 1, 2-' or '1, 3-7'). Use '!' for inverted page numbers (e.g., '!0' for last page). Processes all pages if None. (Optional) | |
| unwrap | No | Unwrap lines into a single line within table cells when lineGrouping is enabled. Must be true or false. (Optional) | |
| rect | No | Defines coordinates for extraction (e.g., '51.8,114.8,235.5,204.0'). (Optional) | |
| lang | No | Language for OCR for scanned documents. Default is 'eng'. See PDF.co docs for supported languages. (Optional, Default: 'eng') | eng |
| line_grouping | No | Enables line grouping within table cells when set to '1'. (Optional) | 0 |
| password | No | Password of the PDF file. (Optional) | |
| name | No | File name for the generated output. (Optional) | |
| api_key | No | PDF.co API key. If not provided, will use X_API_KEY environment variable. (Optional) | |
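As a rough illustration of how the arguments in the table fit together, the sketch below assembles an argument dictionary for a `pdf_to_json` call. The helper name `build_pdf_to_json_args` and the sample URL are hypothetical and not part of the PDF.co MCP server; the parameter names and defaults come from the table above, and the language code `deu` is assumed to be among the supported OCR languages.

```python
# Hypothetical helper (not part of the PDF.co MCP server): assembles tool-call
# arguments for pdf_to_json using the names and defaults from the input schema.

DEFAULTS = {
    "httpusername": "",
    "httppassword": "",
    "pages": "",
    "unwrap": False,
    "rect": "",
    "lang": "eng",
    "line_grouping": "0",
    "password": "",
    "name": "",
    "api_key": "",
}


def build_pdf_to_json_args(url: str, **options) -> dict:
    """Return arguments for a pdf_to_json call, omitting values left at their defaults."""
    if not url:
        raise ValueError("url is required")
    unknown = set(options) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    args = {"url": url}
    for key, value in options.items():
        # Only send options that differ from the documented default.
        if value != DEFAULTS[key]:
            args[key] = value
    return args


# Example: OCR a German scan, first three pages only, with line grouping on.
example_args = build_pdf_to_json_args(
    "https://example.com/scan.pdf",  # placeholder URL
    pages="0-2",
    lang="deu",  # assumed to be a supported OCR language code
    line_grouping="1",
)
print(example_args)
```

Note that `line_grouping` is a string (`'0'` or `'1'`) while `unwrap` is a boolean, so the two flags are not interchangeable even though both act as switches.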
Implementation Reference
- pdfco/mcp/tools/apis/conversion.py:8-72 (handler): The actual handler function for the 'pdf_to_json' MCP tool. Decorated with @mcp.tool(), it accepts parameters (url, httpusername, httppassword, pages, unwrap, rect, lang, line_grouping, password, name, api_key) and delegates to the 'convert_to' service function with source='pdf' and target='json2'.

    @mcp.tool()
    async def pdf_to_json(
        url: str = Field(
            description="URL to the source file. Supports publicly accessible links including Google Drive, Dropbox, PDF.co Built-In Files Storage. Use 'upload_file' tool to upload local files."
        ),
        httpusername: str = Field(
            description="HTTP auth user name if required to access source url. (Optional)",
            default="",
        ),
        httppassword: str = Field(
            description="HTTP auth password if required to access source url. (Optional)",
            default="",
        ),
        pages: str = Field(
            description="Comma-separated page indices (e.g., '0, 1, 2-' or '1, 3-7'). Use '!' for inverted page numbers (e.g., '!0' for last page). Processes all pages if None. (Optional)",
            default="",
        ),
        unwrap: bool = Field(
            description="Unwrap lines into a single line within table cells when lineGrouping is enabled. Must be true or false. (Optional)",
            default=False,
        ),
        rect: str = Field(
            description="Defines coordinates for extraction (e.g., '51.8,114.8,235.5,204.0'). (Optional)",
            default="",
        ),
        lang: str = Field(
            description="Language for OCR for scanned documents. Default is 'eng'. See PDF.co docs for supported languages. (Optional, Default: 'eng')",
            default="eng",
        ),
        line_grouping: str = Field(
            description="Enables line grouping within table cells when set to '1'. (Optional)",
            default="0",
        ),
        password: str = Field(
            description="Password of the PDF file. (Optional)", default=""
        ),
        name: str = Field(
            description="File name for the generated output. (Optional)", default=""
        ),
        api_key: str = Field(
            description="PDF.co API key. If not provided, will use X_API_KEY environment variable. (Optional)",
            default="",
        ),
    ) -> BaseResponse:
        """
        Convert PDF and scanned images into JSON representation with text, fonts, images, vectors, and formatting preserved using the /pdf/convert/to/json2 endpoint.
        Ref: https://developer.pdf.co/api-reference/pdf-to-json/basic.md
        """
        return await convert_to(
            "pdf",
            "json2",
            ConversionParams(
                url=url,
                httpusername=httpusername,
                httppassword=httppassword,
                pages=pages,
                unwrap=unwrap,
                rect=rect,
                lang=lang,
                line_grouping=line_grouping,
                password=password,
                name=name,
            ),
            api_key=api_key,
        )

- pdfco/mcp/services/pdf.py:6-10 (helper): The 'convert_to' helper function called by pdf_to_json. It constructs the API endpoint as '{_from}/convert/to/{_to}' (i.e., 'pdf/convert/to/json2') and delegates to the 'request' function.

    async def convert_to(
        _from: str, _to: str, params: ConversionParams, api_key: str | None = None
    ) -> BaseResponse:
        return await request(f"{_from}/convert/to/{_to}", params, api_key=api_key)

- pdfco/mcp/services/pdf.py:125-153 (helper): The 'request' helper that executes the actual HTTP POST to the PDF.co API. It calls the PDFCoClient, sends the payload, and wraps the response in a BaseResponse.

    async def request(
        endpoint: str,
        params: ConversionParams,
        custom_payload: dict | None = None,
        api_key: str | None = None,
    ) -> BaseResponse:
        payload = params.parse_payload(async_mode=True)
        if custom_payload:
            payload.update(custom_payload)

        try:
            async with PDFCoClient(api_key=api_key) as client:
                url = f"/v1/{endpoint}"
                print(f"Requesting {url} with payload {payload}", file=sys.stderr)
                response = await client.post(url, json=payload)
                print(f"response: {response}", file=sys.stderr)
                json_data = response.json()
                return BaseResponse(
                    status="working",
                    content=json_data,
                    credits_used=json_data.get("credits"),
                    credits_remaining=json_data.get("remainingCredits"),
                    tips=f"You **should** use the 'wait_job_completion' tool to wait for the job [{json_data.get('jobId')}] to complete if a jobId is present.",
                )
        except Exception as e:
            return BaseResponse(
                status="error",
                content=f"{type(e)}: {[arg for arg in e.args if arg]}",
            )
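The call chain in these excerpts can be traced without network access. The following sketch re-creates only the endpoint composition and the error formatting; `fake_request` and `format_error` are stand-ins written for illustration, the simplified `convert_to` drops the `params` and `api_key` arguments, and nothing here reproduces `PDFCoClient`, `ConversionParams`, or `BaseResponse`.

```python
import asyncio


async def fake_request(endpoint: str) -> str:
    # Stand-in for `request`: returns the URL path it would POST to
    # instead of contacting the API.
    return f"/v1/{endpoint}"


async def convert_to(_from: str, _to: str) -> str:
    # Same path composition as the convert_to excerpt, minus params and api_key.
    return await fake_request(f"{_from}/convert/to/{_to}")


def format_error(e: Exception) -> str:
    # Same expression the request excerpt uses to build the error content:
    # the exception type followed by its non-empty arguments.
    return f"{type(e)}: {[arg for arg in e.args if arg]}"


path = asyncio.run(convert_to("pdf", "json2"))
print(path)  # /v1/pdf/convert/to/json2
print(format_error(ValueError("bad url")))  # <class 'ValueError'>: ['bad url']
```

Because `request` sets `status="working"` whenever the POST itself succeeds, a caller would typically inspect the returned content for a `jobId` and then use the 'wait_job_completion' tool, as the `tips` field suggests.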