Document to Markdown
talonic_to_markdown
Retrieve the full text of any document as markdown using OCR. Ideal for summarization, translation, or general text extraction.
Instructions
Get the OCR-converted markdown text of a document.
USE WHEN: the user wants the full text, e.g. asks 'what does it say', or wants a document summarised or translated.
NOT FOR: specific structured fields (use talonic_extract with a schema).
BY NAME: if the user names a file, call talonic_search first to get its document_id, then call this.
ARGS: provide exactly one input source. Prefer document_id (a document already in the workspace; one cheap call). Otherwise use file_url, or file_data plus filename for small local files (or file_path, where the MCP server has read access to that path). Supplying a file ingests the document first and consumes credits; document_id does not.
RETURNS: document_id and markdown (the full text).
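The "provide exactly one" rule from ARGS can be checked before the tool is called. The following is a minimal sketch, not part of the Talonic API: the helper name `build_to_markdown_args` is hypothetical, and it only assembles the arguments dictionary from the three sources named in ARGS; how the tool is actually invoked depends on your MCP client.

```python
from typing import Optional


def build_to_markdown_args(
    document_id: Optional[str] = None,
    file_url: Optional[str] = None,
    file_data: Optional[str] = None,
    filename: Optional[str] = None,
) -> dict:
    """Hypothetical helper: return tool arguments with exactly one input source."""
    sources = {
        "document_id": document_id,
        "file_url": file_url,
        "file_data": file_data,
    }
    # Collect the names of the sources the caller actually supplied.
    provided = [name for name, value in sources.items() if value]
    if len(provided) != 1:
        raise ValueError(f"provide exactly one input source, got: {provided}")
    # The schema requires filename whenever file_data is used.
    if file_data and not filename:
        raise ValueError("filename is required when file_data is provided")

    args = {provided[0]: sources[provided[0]]}
    if file_data:
        args["filename"] = filename
    return args
```

For a document already in the workspace, `build_to_markdown_args(document_id="doc_123")` yields `{"document_id": "doc_123"}`, which corresponds to the cheap path that consumes no credits. The id `doc_123` is a placeholder.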
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| document_id | No | The Talonic document id whose markdown you want. Get this from a previous talonic_extract or talonic_search response. | |
| file_data | No | Base64-encoded file bytes. Recommended path when the agent already has the file in memory (e.g., the user attached a PDF to the conversation). Pair with `filename` so MIME type can be inferred. | |
| filename | No | Original filename including extension, e.g. 'invoice.pdf'. Used to infer MIME type when uploading via `file_data`. Required when `file_data` is provided. | |
| file_path | No | Local path to a document file. Only works if the MCP server has read access to that path. In sandboxed chat clients (Claude Desktop, Cowork) use `file_data` instead. | |
| file_url | No | URL to a document file. The Talonic API fetches it server-side. | |
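When the file bytes are already in memory, `file_data` expects them base64-encoded and paired with `filename`. A minimal sketch of preparing those two fields with the Python standard library follows; the helper name `encode_file_input` is hypothetical, and the extension check is an assumption based on the schema's "including extension" wording.

```python
import base64


def encode_file_input(data: bytes, filename: str) -> dict:
    """Hypothetical helper: build the file_data + filename pair from raw bytes."""
    # The schema asks for the original filename including its extension,
    # since the MIME type is inferred from it.
    if "." not in filename:
        raise ValueError("filename should include an extension, e.g. 'invoice.pdf'")
    return {
        # b64encode returns bytes; decode to str so the value is JSON-serialisable.
        "file_data": base64.b64encode(data).decode("ascii"),
        "filename": filename,
    }
```

Usage: `encode_file_input(pdf_bytes, "invoice.pdf")`, where `pdf_bytes` holds the content of the attached file.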
Output Schema
| Name | Required | Description | Default |
|---|---|---|---|
| document_id | Yes | ID of the document the markdown was extracted from. | |
| markdown | Yes | OCR-converted markdown text content of the document. | |
| cost | No | Per-call cost and post-call balance from the underlying extract step, parsed from the X-Talonic-* response headers. `null` when the document was already ingested (document_id path) and no extract call ran. Not always present on legacy clients. | |
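Because `cost` may be absent (legacy clients) or `null` (document_id path), code that consumes the result should treat both cases the same way. A minimal sketch, assuming the result has already been parsed into a Python dict; the helper name `read_markdown_result` is hypothetical, and the internal shape of `cost` is not specified here, so it is passed through untouched.

```python
from typing import Any, Optional, Tuple


def read_markdown_result(result: dict) -> Tuple[str, str, Optional[Any]]:
    """Hypothetical helper: unpack the tool result, tolerating a missing or null cost."""
    # document_id and markdown are required by the output schema.
    document_id = result["document_id"]
    markdown = result["markdown"]
    # .get() returns None both when the key is absent and when its value is null.
    cost = result.get("cost")
    return document_id, markdown, cost
```

A caller can then branch on `cost is None` to tell whether an extract step ran and consumed credits.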