pdf_to_markdown
Converts a PDF to reading-order Markdown for LLM consumption. Reconstructs up to 2 content columns, infers headings from font size, and detects lists; tables are rendered as plain text.
Instructions
Convert a PDF to clean, reading-order Markdown for LLM consumption: reconstructs up to 2 content columns (plus full-width title/footer bands), infers headings from font size, and detects bullet/numbered lists. Pages with 3 or more columns fall back to single-column reading order. Tables are emitted as plain reading-order text, NOT reconstructed as Markdown tables. Best on clean, digital (text-based) PDFs; degrades on scanned/image-only PDFs (use pdf_render_pages for those) and very complex layouts. Returns the first 10 pages by default.
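This page does not document the tool's internals. As a rough illustration of what "infers headings from font size" and "detects bullet/numbered lists" can mean, here is a minimal Python sketch. It makes several assumptions that do not come from pdf_to_markdown. Lines are taken to be already extracted in reading order as hypothetical `(text, font_size)` pairs. The most common font size, weighted by character count, is treated as body text. The heading-level ratio thresholds (1.6 / 1.3 / 1.1) are arbitrary.

```python
import re
from collections import Counter

# Hypothetical list markers: "-", "*" or "•" bullets, and "1." / "1)" numbering.
BULLET_RE = re.compile(r"^\s*[-*\u2022]\s+")
NUMBERED_RE = re.compile(r"^\s*(\d+)[.)]\s+")


def line_to_markdown(text: str, size: float, body_size: float) -> str:
    """Map one line to Markdown from its font size relative to body text.

    The thresholds below are illustrative assumptions, not the tool's values.
    """
    ratio = size / body_size
    if ratio >= 1.6:
        return "# " + text.strip()
    if ratio >= 1.3:
        return "## " + text.strip()
    if ratio >= 1.1:
        return "### " + text.strip()
    if BULLET_RE.match(text):
        return "- " + BULLET_RE.sub("", text, count=1).strip()
    m = NUMBERED_RE.match(text)
    if m:
        return f"{m.group(1)}. " + NUMBERED_RE.sub("", text, count=1).strip()
    return text.strip()


def to_markdown(lines: list[tuple[str, float]]) -> str:
    """Convert hypothetical (text, font_size) lines to Markdown."""
    # Body size = the font size that covers the most characters.
    weights: Counter = Counter()
    for text, size in lines:
        weights[size] += len(text)
    body_size = weights.most_common(1)[0][0]
    return "\n".join(line_to_markdown(t, s, body_size) for t, s in lines)


if __name__ == "__main__":
    sample = [
        ("Introduction", 18.0),
        ("This is body text that is long.", 10.0),
        ("\u2022 first point", 10.0),
        ("2) second item", 10.0),
    ]
    print(to_markdown(sample))
```

Using relative rather than absolute sizes keeps the sketch independent of each document's base font. A production converter would likely also weigh boldness, spacing, and position on the page.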
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| filePath | Yes | Absolute path to the PDF file | |
| pages | No | Page range, e.g. '1-5' or '1,3,5' | First 10 pages |
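The `pages` parameter accepts ranges such as '1-5' and lists such as '1,3,5'. The minimal Python sketch below shows one way such a spec could be expanded. It is not the tool's implementation. `parse_page_range` is a hypothetical name. The sketch assumes 1-based page numbers and that ranges and lists can be mixed (e.g. '1-3,7'). It also assumes out-of-range pages are rejected rather than clamped. The real tool's validation rules may differ.

```python
from typing import List, Optional


def parse_page_range(spec: Optional[str], page_count: int, default_limit: int = 10) -> List[int]:
    """Expand a hypothetical page spec like '1-5' or '1,3,5' into sorted 1-based pages.

    With no spec, return the first `default_limit` pages (mirroring the
    documented default of 10), capped at the document's page count.
    """
    if not spec:
        return list(range(1, min(default_limit, page_count) + 1))
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas such as "1,,3"
        if "-" in part:
            start_s, end_s = part.split("-", 1)
            start, end = int(start_s), int(end_s)
            if start > end:
                raise ValueError(f"descending range: {part!r}")
            pages.update(range(start, end + 1))
        else:
            pages.add(int(part))
    # Assumption: reject pages outside the document instead of silently dropping them.
    out_of_range = sorted(p for p in pages if p < 1 or p > page_count)
    if out_of_range:
        raise ValueError(f"pages out of range 1-{page_count}: {out_of_range}")
    return sorted(pages)


if __name__ == "__main__":
    print(parse_page_range("1-5", 20))
    print(parse_page_range("1,3,5", 20))
    print(parse_page_range(None, 20))
```

Returning a sorted, de-duplicated list means overlapping specs such as '2-3,3' yield each page once.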