read-pdf
Extract text and metadata from PDF files, supporting text extraction from scanned documents with OCR in multiple languages, content search with regex, and customizable page ranges.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| file | Yes | Path to the PDF file | |
| pages | No | Page range (e.g. '1-5', '1,3,5', 'all') | 'all' |
| include_metadata | No | Whether to include the PDF metadata | true |
| clean_text | No | Whether to clean and normalize the extracted text | false |
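The `pages` parameter takes strings such as '1-5', '1,3,5' or 'all'. The sketch below shows one way such a string could expand into concrete page numbers. It is a hypothetical helper (`parse_page_range` is not part of the tool) and it assumes behavior the schema does not specify: pages are 1-based, 'a-b' ranges are inclusive, ranges and single pages may be mixed in one comma-separated string, and out-of-range pages are rejected instead of silently dropped.

```python
def parse_page_range(spec: str, page_count: int) -> list[int]:
    """Expand a page-range string into sorted, 1-based page numbers.

    Hypothetical helper. Assumptions (not confirmed by the tool docs):
    pages are 1-based, 'a-b' ranges are inclusive, and comma-separated
    parts may mix single pages and ranges.
    """
    spec = spec.strip().lower()
    if spec == "all":
        # 'all' selects every page in the document.
        return list(range(1, page_count + 1))

    pages: set[int] = set()  # a set removes duplicates such as '1-3,2'
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas like '1,,3'
        if "-" in part:
            start_str, end_str = part.split("-", 1)
            start, end = int(start_str), int(end_str)
            if start > end:
                raise ValueError(f"descending range: {part!r}")
            pages.update(range(start, end + 1))  # inclusive upper bound
        else:
            pages.add(int(part))

    # Reject pages outside the document rather than ignoring them.
    out_of_range = sorted(p for p in pages if p < 1 or p > page_count)
    if out_of_range:
        raise ValueError(f"pages out of range 1-{page_count}: {out_of_range}")
    return sorted(pages)


if __name__ == "__main__":
    print(parse_page_range("1-5", 10))    # [1, 2, 3, 4, 5]
    print(parse_page_range("1,3,5", 10))  # [1, 3, 5]
    print(parse_page_range("all", 3))     # [1, 2, 3]
```

Returning a sorted, de-duplicated list keeps the extracted text in document order even when the range string lists pages out of order.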