pdf_discover
List every PDF link on a web page with its anchor text, enabling selection of specific PDFs by name for structured data extraction.
Instructions
List every PDF link on an HTML landing page, with its anchor text.
Use this on hub pages (PPAC consumption / production / imports, RBI
bulletin month index, MoSPI press-release listings, MoRTH notification
indexes, MCA filing pages) where the actual data lives in attached
PDFs and the page's Year/Month/Product dropdowns are often just
client-side filters over the same anchor set. Returns each PDF's
absolute URL and its human-readable anchor text so you can pick by
name (e.g. "Domestic Consumption of Petroleum Products-2026-27",
"Flash Report May 26").
Workflow: pdf_discover → pick by anchor text → pdf_fetch_structured.
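The discovery step of this workflow could look roughly like the sketch below. It is not the tool's actual implementation: it assumes a PDF link is any anchor whose resolved URL path ends in ".pdf", and the names `PdfAnchorCollector` and `collect_pdf_links` are hypothetical. It uses only the Python standard library and omits fetching the page.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class PdfAnchorCollector(HTMLParser):
    """Collect {href, text} entries for anchors that point at PDFs."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.pdfs = []
        self._href = None  # absolute href of the PDF anchor currently open
        self._text = []    # text chunks seen inside that anchor

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links against the landing page URL.
        absolute = urljoin(self.base_url, href)
        # Assumption: a PDF link is one whose URL path ends in ".pdf".
        if urlparse(absolute).path.lower().endswith(".pdf"):
            self._href = absolute
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace in the anchor text.
            text = " ".join("".join(self._text).split())
            self.pdfs.append({"href": self._href, "text": text})
            self._href = None


def collect_pdf_links(html, base_url):
    """Return the PDF anchors found in an HTML string, in document order."""
    parser = PdfAnchorCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.pdfs
```

Because the sketch reads anchors straight from the HTML, it sees every PDF link regardless of which dropdown value is selected, which matches the note above that the dropdowns are often client-side filters over the same anchor set.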
Args:
url: The HTML landing page URL.
link_text_filter: Optional case-insensitive substring; only anchors
whose text contains it are returned. E.g. "2026-27", "Flash".
max_links: Cap on links returned (default 40).
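As a rough illustration of how these two arguments might interact, the sketch below applies the case-insensitive substring filter and then the cap. The function name `apply_link_filters` is hypothetical, and the order (filter first, cap second) is an assumption; the description above does not state it.

```python
def apply_link_filters(pdfs, link_text_filter=None, max_links=40):
    """Filter PDF entries by anchor text, then cap the number returned."""
    if link_text_filter:
        needle = link_text_filter.lower()
        # Keep only anchors whose text contains the substring, ignoring case.
        pdfs = [p for p in pdfs if needle in p["text"].lower()]
    # Assumption: the cap is applied after filtering.
    return pdfs[:max_links]
```

Under that assumption, a narrow filter such as "2026-27" keeps the matching links from being crowded out by the cap on pages with many PDFs.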
Returns:
{url, domain, pdfs: [{href, text, label_hint}], count,
page_title, fetched_at}
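A minimal sketch of the "pick by anchor text" step, working on a result of the shape shown above. The helper name `pick_pdf_href` and the sample values (example.org URLs, title, timestamp) are hypothetical; only the key names come from the Returns description, and the meaning of `label_hint` is not documented here, so the sample leaves it as None.

```python
def pick_pdf_href(result, wanted_text):
    """Return the href of the single PDF whose anchor text contains wanted_text."""
    wanted = wanted_text.casefold()
    matches = [p for p in result["pdfs"] if wanted in p["text"].casefold()]
    if len(matches) != 1:
        # Refuse to guess when the name is missing or ambiguous.
        raise LookupError(f"expected exactly 1 match for {wanted_text!r}, found {len(matches)}")
    return matches[0]["href"]


# Hypothetical result; the values are illustrative, not real tool output.
sample_result = {
    "url": "https://example.org/hub",
    "domain": "example.org",
    "pdfs": [
        {"href": "https://example.org/files/consumption.pdf",
         "text": "Domestic Consumption of Petroleum Products-2026-27",
         "label_hint": None},
        {"href": "https://example.org/files/flash.pdf",
         "text": "Flash Report May 26",
         "label_hint": None},
    ],
    "count": 2,
    "page_title": "Example hub page",
    "fetched_at": "2026-06-01T00:00:00Z",
}

href = pick_pdf_href(sample_result, "flash report")
```

The returned `href` is the value you would then hand to pdf_fetch_structured.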
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
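| url | Yes | The HTML landing page URL. | |
| link_text_filter | No | Optional case-insensitive substring; only anchors whose text contains it are returned. E.g. "2026-27", "Flash". | |
| max_links | No | Cap on links returned. | 40 |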