read-pdf
Extracts text and metadata from PDF files. Supports OCR for scanned documents in multiple languages, regex content search, and customizable page ranges.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| file | Yes | Path to the PDF file | |
| pages | No | Page range (e.g. '1-5', '1,3,5', 'all') | 'all' |
| include_metadata | No | Whether to include PDF metadata | true |
| clean_text | No | Whether to clean and normalize the extracted text | false |
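
As a rough illustration of the schema above, the sketch below builds an argument set for a call and expands a `pages` value into page numbers. The helper `parse_pages`, the `total_pages` parameter, and the path `report.pdf` are hypothetical and used only for illustration. How the tool actually parses page ranges is not documented here, so treat this as one plausible reading of the '1-5', '1,3,5', and 'all' forms.

```python
from __future__ import annotations


def parse_pages(spec: str, total_pages: int) -> list[int]:
    """Expand a page spec ('1-5', '1,3,5', or 'all') into sorted 1-based page numbers.

    Hypothetical helper: the tool's own parsing rules may differ.
    """
    spec = spec.strip().lower()
    if spec == "all":
        return list(range(1, total_pages + 1))

    pages: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            # A range such as '1-5' is treated as inclusive on both ends.
            start_s, end_s = part.split("-", 1)
            start, end = int(start_s), int(end_s)
        else:
            start = end = int(part)
        if start < 1 or end > total_pages or start > end:
            raise ValueError(f"invalid page range: {part!r}")
        pages.update(range(start, end + 1))
    return sorted(pages)


# Example arguments matching the input schema; only 'file' is required.
arguments = {
    "file": "report.pdf",      # hypothetical path
    "pages": "1-5",            # omitted -> 'all'
    "include_metadata": True,  # omitted -> true
    "clean_text": False,       # omitted -> false
}

selected = parse_pages(arguments["pages"], total_pages=12)
```

Rejecting out-of-bounds ranges with an error, rather than silently clamping them, is a design choice made for this sketch. The tool itself may behave differently.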