pdf_text_extract
Extract text from a text-based PDF by providing its URL. Handles FlateDecode compression and empty-password RC4 encryption, returning clean plain text.
Instructions
Fetch a PDF from a URL and extract its text content. Handles FlateDecode-compressed streams (the most common compression in modern PDFs) and RC4-encrypted PDFs that open with an empty password. Works on text-based PDFs (those generated from Word, LaTeX, web pages, etc.); it does not perform OCR on scanned or image-only PDFs. Returns clean plain text; use it instead of loading raw PDF bytes into your context.
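
To illustrate what handling FlateDecode-compressed streams involves, here is a minimal Python sketch that inflates the stream objects found in raw PDF bytes using the standard `zlib` module. It is not this tool's implementation: the function name `inflate_streams` and the regex-based stream scan are assumptions made for illustration, and the sketch skips RC4 decryption, object streams, and the parsing of text operators that real extraction requires.

```python
import re
import zlib

# Hypothetical helper: capture the bytes between "stream" and "endstream".
# A real parser would use each stream dictionary's /Length entry instead.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def inflate_streams(pdf_bytes: bytes) -> list[bytes]:
    """Return the inflated payload of every stream that zlib can decode."""
    inflated = []
    for match in STREAM_RE.finditer(pdf_bytes):
        raw = match.group(1)
        try:
            # decompressobj tolerates the end-of-line bytes that sit between
            # the compressed data and the "endstream" keyword.
            inflated.append(zlib.decompressobj().decompress(raw))
        except zlib.error:
            # Not zlib data (uncompressed or another filter): skip it.
            continue
    return inflated


# Usage with a tiny hand-built fragment rather than a real PDF file.
fragment = (
    b"1 0 obj\n<< /Filter /FlateDecode >>\nstream\n"
    + zlib.compress(b"BT (Hello) Tj ET")
    + b"\nendstream\nendobj\n"
)
print(inflate_streams(fragment))
```

The inflated bytes are still PDF content-stream operators (here `BT (Hello) Tj ET`), which is why a further text-extraction pass is needed before plain text can be returned.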
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | URL of the PDF to fetch (http/https). Must be a PDF file. | |
| maxChars | No | Max characters to return (max 100000). | 20000 |
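
As a sketch of how a caller might prepare arguments that satisfy this schema, the Python snippet below checks the URL scheme and keeps `maxChars` within the documented limits. The helper name `build_arguments` is hypothetical, and clamping out-of-range values on the client side is an assumption; the source does not say how the tool itself treats values above 100000.

```python
from urllib.parse import urlparse

DEFAULT_MAX_CHARS = 20000   # documented default
MAX_CHARS_LIMIT = 100000    # documented maximum


def build_arguments(url, max_chars=None):
    """Build an argument dict for pdf_text_extract (hypothetical helper)."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError("url must be an http or https URL")
    if max_chars is None:
        max_chars = DEFAULT_MAX_CHARS
    if max_chars < 1:
        raise ValueError("maxChars must be positive")
    # Assumption: clamp locally rather than rely on server-side behavior.
    return {"url": url, "maxChars": min(max_chars, MAX_CHARS_LIMIT)}


print(build_arguments("https://example.com/paper.pdf", max_chars=50000))
```

Only `url` is required, so calling `build_arguments` with just a URL yields the default of 20000 characters.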