ocr_document
Force OCR text extraction on scanned PDFs when normal text extraction returns garbled or empty text.
Instructions
Force OCR text extraction on a PDF, bypassing normal text extraction. Use this when read_document returns garbled or empty text from a scanned PDF. Requires tesseract and pdftoppm to be installed. Read-only.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| filename | Yes | The PDF filename to OCR | |
| language | No | Tesseract language code. Use "spa" for Spanish, "fra" for French, etc. Run 'tesseract --list-langs' to see available languages. | "eng" |
| page | No | Page number to OCR (1-based). If omitted, all pages are OCRed. | |
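
To make the parameters concrete, here is a minimal sketch of how the three inputs could map onto the two required command-line programs. It is not the tool's actual implementation: the function names, the `out_prefix` argument, and the 300 DPI setting are assumptions made for illustration. Only the `pdftoppm` flags (`-r`, `-png`, `-f`, `-l`) and the `tesseract` arguments (`stdout`, `-l`) are standard options of those programs. The functions only build the commands and do not run them.

```python
from typing import List, Optional


def pdftoppm_command(filename: str, out_prefix: str,
                     page: Optional[int] = None, dpi: int = 300) -> List[str]:
    """Build a pdftoppm command that rasterizes a PDF to PNG images.

    out_prefix and dpi are hypothetical; the real tool may choose differently.
    """
    if page is not None and page < 1:
        raise ValueError("page is 1-based and must be >= 1")
    cmd = ["pdftoppm", "-r", str(dpi), "-png"]
    if page is not None:
        # -f and -l set the first and last page, so equal values select one page.
        cmd += ["-f", str(page), "-l", str(page)]
    return cmd + [filename, out_prefix]


def tesseract_command(image_path: str, language: str = "eng") -> List[str]:
    """Build a tesseract command that prints recognized text to stdout."""
    return ["tesseract", image_path, "stdout", "-l", language]
```

For example, `pdftoppm_command("scan.pdf", "page", page=2)` selects only page 2, and `tesseract_command("page-2.png", "spa")` recognizes the resulting image as Spanish. Passing either list to `subprocess.run` would execute it, provided both programs are installed.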