Download a PDF from a URL and extract all text content, page by page.
Use this to read the full text of a specific document — for example, an annual
report PDF linked from a search_filings result. Best combined with search_filings:
use search_filings to locate the document, then parse_pdf_to_text for the full text.
Do not use for PDFs that are already well-represented in the database — search_filings
is faster and returns pre-ranked, relevant excerpts.
Not suitable for scanned (image-only) PDFs: any page without embedded text
is returned as "(no extractable text)".
Args:
pdf_url: Direct HTTPS URL to the PDF file, e.g. https://example.com/report.pdf.
Must be publicly accessible; authentication-protected URLs will fail.
Returns:
All text from the PDF with "--- Page N ---" separators between pages.
Returns an error string if the download fails, the URL does not point to a
valid PDF, or the download exceeds the 60-second timeout.
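
Example (illustrative sketch, not part of the tool): one way a caller might split
the returned string back into pages and flag pages that had no embedded text. It
assumes every page, including the first, is preceded by its own "--- Page N ---"
line and that the marker sits alone on that line. The helper names split_pages and
pages_without_text are hypothetical.

```python
import re

# Assumed marker format, taken from the Returns description above.
PAGE_MARKER = re.compile(r"^--- Page (\d+) ---[ \t]*$", re.MULTILINE)
NO_TEXT = "(no extractable text)"


def split_pages(output: str) -> dict[int, str]:
    """Map each page number to the text that follows its marker line."""
    pages: dict[int, str] = {}
    matches = list(PAGE_MARKER.finditer(output))
    for i, match in enumerate(matches):
        # A page's text runs until the next marker, or to the end of the string.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(output)
        pages[int(match.group(1))] = output[match.end():end].strip()
    return pages


def pages_without_text(pages: dict[int, str]) -> list[int]:
    """Return page numbers whose content is the no-text placeholder."""
    return [number for number, text in pages.items() if text == NO_TEXT]


# Hand-written sample in the documented output format (not real tool output).
sample = (
    "--- Page 1 ---\n"
    "Annual Report 2023\n"
    "--- Page 2 ---\n"
    "(no extractable text)\n"
)
pages = split_pages(sample)
scanned = pages_without_text(pages)
```

If the tool returns an error string instead, no marker lines are found and
split_pages returns an empty dict, which a caller can treat as a failed parse.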