parse_pdf
Extract content from PDF files via local paths or URLs and convert it to structured JSON or Markdown format for data processing.
Instructions
Parses a PDF file from a local file path or a remote URL and returns the extracted content in the requested format: structured JSON or a Markdown string.
:param source: The source of the PDF file to be parsed.
- If it is a string starting with "http://" or "https://", it will be treated as a remote URL.
- Otherwise, it will be treated as a local file path (absolute path recommended, e.g. "/Users/yourname/file.pdf").
:param format: The desired format for the parsed output. Supports:
- "json": Returns the extracted content as a dictionary.
- "markdown": Returns the extracted content as a Markdown-formatted string.
:return: The extracted content in the specified format (JSON dictionary or Markdown string).
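The source and format rules above can be sketched in a few lines of Python. This is only an illustration of the documented behavior, not the tool's actual implementation: the helper names `classify_source` and `normalize_request`, the `kind` field, and the choice to expand local paths to absolute form are all assumptions made for the example.

```python
from pathlib import Path

# The two output formats named in the parameter description.
SUPPORTED_FORMATS = ("json", "markdown")


def classify_source(source: str) -> str:
    """Return "url" for sources starting with http:// or https://, else "path".

    Hypothetical helper: it applies the documented prefix rule literally,
    so the check is case-sensitive.
    """
    if source.startswith(("http://", "https://")):
        return "url"
    return "path"


def normalize_request(source: str, format: str = "json") -> dict:
    """Validate the arguments and return them in a normalized form.

    Hypothetical helper: `format` defaults to "json", matching the input
    schema. Local paths are made absolute because the documentation
    recommends absolute paths; that step is an assumption of this sketch.
    """
    if format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {format!r}")
    kind = classify_source(source)
    if kind == "path":
        # resolve() does not require the file to exist.
        source = str(Path(source).expanduser().resolve())
    return {"source": source, "kind": kind, "format": format}
```

A caller could run `normalize_request("https://example.com/report.pdf")` before invoking the tool to confirm that the source is classified as a URL and that the format falls back to "json".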
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
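| source | Yes | Source of the PDF to parse: a remote URL if it starts with "http://" or "https://", otherwise a local file path (absolute path recommended). | |
| format | No | Output format: "json" (dictionary) or "markdown" (Markdown-formatted string). | json |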