crw_parse_file
Parse a PDF document uploaded as base64-encoded bytes and return its content as markdown. Handles text-based PDFs; scanned or image-only PDFs are not supported because no OCR is performed.
Instructions
Parse an uploaded PDF document and return its content as markdown. Pass the file bytes base64-encoded in contentBase64. Use this for local PDFs that have no URL. No OCR is performed, so scanned or image-only PDFs (which have no text layer) return a warning and empty markdown.
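As a minimal sketch of preparing the input, the snippet below base64-encodes PDF bytes into the contentBase64 argument. The helper name `build_parse_file_args` and the `%PDF-` magic-byte check are illustrative assumptions, not part of the tool; how the arguments are actually sent depends on your client and is not shown.

```python
import base64
from typing import Optional


def build_parse_file_args(pdf_bytes: bytes, filename: Optional[str] = None) -> dict:
    """Build an arguments dict for crw_parse_file (hypothetical helper)."""
    # Client-side sanity check (an assumption, not a documented requirement):
    # PDF files begin with the magic bytes "%PDF-".
    if not pdf_bytes.startswith(b"%PDF-"):
        raise ValueError("input does not look like a PDF")

    args = {
        # The tool expects the raw file bytes, base64-encoded, as a string.
        "contentBase64": base64.b64encode(pdf_bytes).decode("ascii"),
    }
    if filename is not None:
        # Optional; the tool echoes it in metadata.sourceFilename.
        args["filename"] = filename
    return args
```

For a local file, one would typically read the bytes first, for example `build_parse_file_args(Path("report.pdf").read_bytes(), "report.pdf")` with `pathlib.Path`.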
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| contentBase64 | Yes | Base64-encoded bytes of the PDF file | |
| filename | No | Original filename; echoed in metadata.sourceFilename | |
| formats | No | Output formats. The json and summary formats require a server LLM. | ["markdown"] |
| jsonSchema | No | JSON schema for LLM-based structured extraction (when formats includes json) | |
| parsers | No | Document parsers to apply | ["pdf"] |
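To illustrate how the optional parameters combine, here is a hedged sketch of arguments that request both markdown and structured JSON output. The helper `build_extraction_args` and the invoice schema are hypothetical, and passing jsonSchema as a plain object (rather than a serialized string) is an assumption the schema table does not confirm. The json format only works when the server has an LLM configured.

```python
import base64
from typing import Optional


def build_extraction_args(pdf_bytes: bytes, schema: dict,
                          filename: Optional[str] = None) -> dict:
    """Build crw_parse_file arguments for LLM-based extraction (hypothetical helper)."""
    args = {
        "contentBase64": base64.b64encode(pdf_bytes).decode("ascii"),
        # Ask for markdown plus structured JSON; json requires a server LLM.
        "formats": ["markdown", "json"],
        # Assumption: the schema is passed as a JSON object.
        "jsonSchema": schema,
        # Matches the documented default; spelled out here for clarity.
        "parsers": ["pdf"],
    }
    if filename is not None:
        args["filename"] = filename
    return args


# Hypothetical schema used purely for illustration.
invoice_schema = {
    "type": "object",
    "properties": {
        "invoiceNumber": {"type": "string"},
        "total": {"type": "number"},
    },
    "required": ["invoiceNumber"],
}

example_args = build_extraction_args(b"%PDF-1.4\n%%EOF", invoice_schema, "invoice.pdf")
```

Leaving out formats, jsonSchema, and parsers falls back to the documented defaults, so plain markdown conversion needs only contentBase64.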