search_pdf
Search a PDF file for text or regular-expression patterns and return the matching pages, with a context snippet around each match.
Instructions
Search for text patterns (including regular expressions) within a PDF file and return the matching pages with context snippets. Supports Python-style page ranges and early stopping (max_results, max_pages_scanned) to keep searches fast. Use the '/pattern/flags' format for a regex (e.g., '/budget|forecast/gi') or plain text for a literal search.
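The '/pattern/flags' convention could be interpreted roughly as follows. This is a minimal Python sketch, not the tool's own implementation. Several details are assumptions: the function name `parse_search_pattern`, the mapping of the JavaScript-style flags `i`, `m` and `s` onto Python's `re` flags, and the fallback to a literal search for any input that does not fit the `/.../flags` shape.

```python
import re

# Assumption: JavaScript-style flag letters mapped onto Python re flags.
# 'g' (global) needs no equivalent because finditer() already yields every match.
_FLAG_MAP = {"g": 0, "i": re.IGNORECASE, "m": re.MULTILINE, "s": re.DOTALL}

# '/body/flags': greedy body up to the last '/', followed only by known flags.
_REGEX_FORM = re.compile(r"/(.+)/([gims]*)", re.DOTALL)


def parse_search_pattern(search_pattern: str) -> re.Pattern:
    """Hypothetical helper: compile '/regex/flags' as a regex, anything else literally."""
    m = _REGEX_FORM.fullmatch(search_pattern)
    if m:
        body, flag_letters = m.group(1), m.group(2)
        flags = 0
        for letter in flag_letters:
            flags |= _FLAG_MAP[letter]
        return re.compile(body, flags)
    # Plain text: escape it so characters such as '.' or '(' match literally.
    return re.compile(re.escape(search_pattern))
```

With this sketch, `parse_search_pattern('/budget|forecast/gi')` compiles `budget|forecast` case-insensitively. By contrast, `parse_search_pattern('a.b')` matches only the literal text `a.b`, and a path-like string such as `/path/to/file` falls back to a literal search because `file` is not a valid flag string.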
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| absolute_path | No | Absolute path to the PDF file (e.g., '/Users/john/documents/report.pdf') | |
| relative_path | No | Path relative to ~/pdf-agent/ directory (e.g., 'reports/annual.pdf') | |
| use_pdf_home | No | Resolve relative_path against the PDF agent home directory (~/pdf-agent/) | true |
| page_range | No | Page range in an enhanced Python-style format, using 1-based page numbers with both endpoints included: '5' (page 5), '5:10' (pages 5-10), '7:' (page 7 to the end), ':5' (first page to page 5). Comma-separated combinations are also supported, and '-' also works as a range separator: '1,3:5,7' (pages 1, 3-5, and 7), '1-3,7,10:' (pages 1-3, 7, and 10 to the end). Default: '1:' (all pages) | 1: |
| search_pattern | No | Search pattern in '/regex/flags' format (e.g., '/budget\|forecast/gi') or plain text for a literal search. Required, even though the schema does not mark it as a required field. | |
| max_results | No | Stop after this many matches in total. Optional; useful for quick searches. | |
| max_pages_scanned | No | Stop after scanning this many pages. Optional; useful for quick searches. | |
| context_chars | No | Number of characters of context to include before and after each match | 150 |
| search_timeout | No | Timeout for the search operation, in milliseconds | 10000 (10 seconds) |
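The table does not say how `absolute_path`, `relative_path` and `use_pdf_home` interact. The Python sketch below shows one plausible reading. Its assumptions are the precedence (`absolute_path` wins), the fallback to the current working directory when `use_pdf_home` is false, and the hypothetical name `resolve_pdf_path`.

```python
from pathlib import Path
from typing import Optional

# Home directory taken from the relative_path description (~/pdf-agent/).
PDF_HOME = Path.home() / "pdf-agent"


def resolve_pdf_path(
    absolute_path: Optional[str] = None,
    relative_path: Optional[str] = None,
    use_pdf_home: bool = True,
) -> Path:
    """Hypothetical resolver: absolute_path first, then relative_path."""
    if absolute_path:
        path = Path(absolute_path)
        if not path.is_absolute():
            raise ValueError(f"absolute_path is not absolute: {absolute_path}")
        return path
    if relative_path:
        # Assumption: with use_pdf_home=False, resolve against the current directory.
        base = PDF_HOME if use_pdf_home else Path.cwd()
        return base / relative_path
    raise ValueError("provide absolute_path or relative_path")
```

Under this reading, `resolve_pdf_path(relative_path='reports/annual.pdf')` points at `~/pdf-agent/reports/annual.pdf`.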
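The page_range grammar in the table could be expanded into concrete page numbers as sketched below. The sketch makes three assumptions: ranges are 1-based and inclusive (as the examples imply), both ':' and '-' act as separators, and ranges are clipped to the document length. `parse_page_range` and its `page_count` argument are hypothetical names.

```python
from typing import List, Set


def parse_page_range(page_range: str, page_count: int) -> List[int]:
    """Hypothetical helper: expand a page_range string into sorted 1-based pages.

    Accepts '5', '5:10', '7:', ':5', '1-3' and comma-separated mixes.
    Both endpoints are inclusive, matching the examples in the table.
    """
    pages: Set[int] = set()
    for part in page_range.split(","):
        part = part.strip()
        if not part:
            continue
        # The table uses both ':' and '-' as range separators.
        sep = ":" if ":" in part else ("-" if "-" in part else None)
        if sep is None:
            start = end = int(part)
        else:
            left, right = part.split(sep, 1)
            start = int(left) if left.strip() else 1           # ':5' -> from page 1
            end = int(right) if right.strip() else page_count  # '7:' -> to last page
        # Clip to the document, so '10:' on a 5-page PDF selects nothing.
        start = max(start, 1)
        end = min(end, page_count)
        pages.update(range(start, end + 1))
    return sorted(pages)
```

For a 12-page document, the sketch expands '1-3,7,10:' to pages 1, 2, 3, 7, 10, 11 and 12. Unlike a Python slice, the end page is included.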
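The context_chars, max_results, max_pages_scanned and search_timeout parameters suggest a scan loop along these lines. This is a hypothetical Python sketch with the following assumptions. `page_texts` maps 1-based page numbers to text that has already been extracted from the PDF, so text extraction is out of scope here. `search_pages` and `PageMatch` are illustrative names. The timeout is checked only between pages.

```python
import re
import time
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class PageMatch:
    page: int     # 1-based page number
    text: str     # the matched text itself
    snippet: str  # match plus up to context_chars on each side


def search_pages(
    page_texts: Dict[int, str],
    pattern: re.Pattern,
    context_chars: int = 150,
    max_results: Optional[int] = None,
    max_pages_scanned: Optional[int] = None,
    search_timeout: int = 10_000,
) -> List[PageMatch]:
    """Hypothetical scan: visit pages in order and stop early on any limit."""
    deadline = time.monotonic() + search_timeout / 1000  # timeout is in milliseconds
    results: List[PageMatch] = []
    for scanned, page in enumerate(sorted(page_texts), start=1):
        if max_pages_scanned is not None and scanned > max_pages_scanned:
            break
        if time.monotonic() > deadline:  # checked between pages only
            break
        text = page_texts[page]
        for m in pattern.finditer(text):
            start = max(m.start() - context_chars, 0)
            end = min(m.end() + context_chars, len(text))
            results.append(PageMatch(page, m.group(0), text[start:end]))
            if max_results is not None and len(results) >= max_results:
                return results
    return results
```

This sketch returns a flat list of matches. Grouping them by `page` would produce the per-page output described at the top of this page.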