pdf_fetch
Download PDFs that web fetch returns as binary content and extract their text, giving access to reports, bulletins, and circulars.
Instructions
Download a PDF directly and extract its text with pypdf.
Use this WHENEVER a `web_fetch` or `web_fetch_structured` call comes
back saying the content was "binary" or "not extractable" — that's
almost always a Tavily limitation on PDFs that are actually text-based
and perfectly extractable with a proper PDF library. Common cases:
PPAC monthly reports, RBI bulletins, MoSPI press release PDFs, PIB
statements, regulator circulars.
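As a rough illustration of the trigger condition described above, a caller could check the fetch result's message for the two phrases before falling back to `pdf_fetch`. The helper name `needs_pdf_fetch` and the idea that the message arrives as a plain string are assumptions made for this sketch; the real `web_fetch` response shape is not specified here.

```python
def needs_pdf_fetch(fetch_message: str) -> bool:
    """Return True when a fetch result reports unreadable (binary) content.

    Hypothetical helper: it only looks for the two phrases named in the
    instructions and assumes the message is available as a string.
    """
    lowered = fetch_message.lower()
    return "binary" in lowered or "not extractable" in lowered


# Example: decide whether to retry a URL with pdf_fetch.
if needs_pdf_fetch("Content is binary and not extractable"):
    print("retry with pdf_fetch")
```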
Args:
url: The PDF URL (.pdf in path, or a server that returns
Content-Type: application/pdf).
pages: Optional 1-indexed list of pages to extract (e.g. [1, 2, 5]).
If omitted, the first `max_pages` are extracted.
max_pages: Cap on auto-extracted pages when `pages` is omitted.
Returns:
{url, domain, content, fetched_at, page_count, pages_extracted,
content_truncated, kind: "pdf"}.
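The behaviour described above might be sketched roughly as follows. This is not the tool's actual implementation: the defaults `max_pages=20` and `max_chars=50_000`, the helper `select_pages`, the choice to silently skip out-of-range page numbers, and the use of `urllib` for the download are all assumptions made for illustration. Only the argument names and the returned keys come from the description.

```python
import io
from datetime import datetime, timezone
from urllib.parse import urlparse
from urllib.request import Request, urlopen


def select_pages(page_count, pages=None, max_pages=20):
    """Return the 1-indexed page numbers to extract (hypothetical helper)."""
    if pages:
        # Keep only pages that exist, preserving order and dropping duplicates.
        selected = []
        for p in pages:
            if 1 <= p <= page_count and p not in selected:
                selected.append(p)
        return selected
    # No explicit list: take the first max_pages pages.
    return list(range(1, min(page_count, max_pages) + 1))


def pdf_fetch(url, pages=None, max_pages=20, max_chars=50_000):
    """Download a PDF and extract text with pypdf (sketch, assumed defaults)."""
    from pypdf import PdfReader  # third-party dependency, imported lazily

    request = Request(url, headers={"User-Agent": "pdf-fetch-sketch"})
    with urlopen(request, timeout=30) as response:
        data = response.read()

    reader = PdfReader(io.BytesIO(data))
    page_count = len(reader.pages)
    selected = select_pages(page_count, pages, max_pages)

    # extract_text() can return an empty result for image-only pages.
    text = "\n\n".join(
        reader.pages[p - 1].extract_text() or "" for p in selected
    )

    return {
        "url": url,
        "domain": urlparse(url).netloc,
        "content": text[:max_chars],
        "fetched_at": datetime.now(timezone.utc).isoformat(),
        "page_count": page_count,
        "pages_extracted": selected,
        "content_truncated": len(text) > max_chars,
        "kind": "pdf",
    }
```

Keeping page selection in its own function makes the 1-indexed convention easy to check without downloading anything, for example `select_pages(10, [1, 2, 5])` or `select_pages(3, max_pages=2)`.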
Input Schema
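| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | The PDF URL (.pdf in path, or a server that returns Content-Type: application/pdf). | |
| pages | No | Optional 1-indexed list of pages to extract (e.g. [1, 2, 5]). If omitted, the first `max_pages` are extracted. | |
| max_pages | No | Cap on auto-extracted pages when `pages` is omitted. | |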