Extract text from a PDF (page-by-page)
obsidian_read_pdf
Extracts plain text from PDF files, returning per-page content, full text, and document metadata. Supports optional page ranges and detects image-only PDFs for OCR recommendation.
Instructions
Extracts plain text from one PDF, returning per-page text + a full_text join + doc-level metadata (title/author/subject/etc). Image-only / scanned PDFs surface has_text: false so agents can detect-and-recommend OCR via obsidian_ocr_pdf (v2.10.0). Optional pages slice (1-indexed inclusive range) for partial reads of long documents. Read-only. Same path-safety + privacy filter as obsidian_read_note. Powered by Mozilla's PDF.js (Apache-2.0).
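As a rough illustration of the has_text check described above, the sketch below shows how a caller might choose between using the extracted text and recommending OCR. The result shape is an assumption: the description names only full_text and has_text, so treating both as top-level keys of a dict is a guess, and the helper name next_step and the sample results are hypothetical.

```python
from typing import Any, Dict


def next_step(result: Dict[str, Any]) -> str:
    """Decide what to do with an obsidian_read_pdf result.

    Assumes (not confirmed by the tool description) that the result is a
    dict with top-level `has_text` and `full_text` keys.
    """
    if not result.get("has_text", False):
        # Image-only / scanned PDF: there is no text layer to extract,
        # so the sensible follow-up is the OCR tool.
        return "recommend obsidian_ocr_pdf"
    return "use full_text"


# Hypothetical results, for illustration only.
scanned = {"has_text": False, "full_text": ""}
digital = {"has_text": True, "full_text": "Chapter 1 ..."}

print(next_step(scanned))
print(next_step(digital))
```

Defaulting a missing has_text to False is a conservative choice in this sketch: an unexpected result shape leads to an OCR recommendation rather than silently passing along empty text.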
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| path | Yes | Vault-relative path of the PDF file (the .pdf extension may be omitted) | |
| pages | No | Optional 1-indexed inclusive page range, e.g. [2, 5] reads pages 2..5 | |
| include_metadata | No | Include doc-level metadata in result | true |
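The schema above can be sketched as a small argument builder that enforces the 1-indexed inclusive page range before the call is made. This is a minimal sketch, not the tool's own validation: the helper name build_read_pdf_args and the sample path are hypothetical, and how the resulting dict is sent to the tool depends on the client in use.

```python
from typing import Any, Dict, Optional, Tuple


def build_read_pdf_args(
    path: str,
    pages: Optional[Tuple[int, int]] = None,
    include_metadata: bool = True,
) -> Dict[str, Any]:
    """Build an arguments dict matching the obsidian_read_pdf input schema."""
    args: Dict[str, Any] = {"path": path, "include_metadata": include_metadata}
    if pages is not None:
        first, last = pages
        # Pages are 1-indexed and the range is inclusive on both ends.
        if first < 1 or last < first:
            raise ValueError(f"invalid page range: {pages!r}")
        args["pages"] = [first, last]
    return args


# Hypothetical vault path; reads pages 2 through 5 only.
print(build_read_pdf_args("papers/report.pdf", pages=(2, 5)))
```

Omitting pages leaves the key out entirely, which matches the schema's treatment of it as optional rather than sending an empty or null range.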