| today | Return the SERVER'S CURRENT DATE (UTC). Call this FIRST whenever the
user mentions a temporal phrase like "latest", "current", "today",
"yesterday", "this quarter", "this year" — your training-data cutoff is
NOT a reliable anchor for what 'today' actually is. Use the returned
iso_date (YYYY-MM-DD) and year to construct concrete queries. Returns:
{iso_date, iso_datetime, year, month, day, weekday, quarter,
fiscal_year_in: "FY26", note}
|
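As a rough illustration of the `today` payload, the sketch below derives the listed fields from a UTC timestamp. `today_payload` is a hypothetical helper, not the server's implementation. It assumes `quarter` is the calendar quarter and `fiscal_year_in` is India's April-to-March fiscal year labelled by its ending year; the description above confirms neither. The `note` field is omitted.

```python
from datetime import datetime, timezone
from typing import Optional


def today_payload(now: Optional[datetime] = None) -> dict:
    """Hypothetical helper: build a today-style payload from a UTC timestamp."""
    now = now or datetime.now(timezone.utc)
    d = now.date()
    # Assumption: Indian fiscal year runs April to March and is named by its ending year.
    fy_end = d.year + 1 if d.month >= 4 else d.year
    return {
        "iso_date": d.isoformat(),            # YYYY-MM-DD
        "iso_datetime": now.isoformat(),
        "year": d.year,
        "month": d.month,
        "day": d.day,
        "weekday": d.strftime("%A"),
        "quarter": (d.month - 1) // 3 + 1,    # assumption: calendar quarter
        "fiscal_year_in": f"FY{fy_end % 100:02d}",
    }


if __name__ == "__main__":
    payload = today_payload()
    # Anchor a query on the returned date rather than on a remembered year.
    print(f"repo rate decision {payload['year']} (as of {payload['iso_date']})")
```

Building the query string from `iso_date` and `year` is the point of the tool: the date comes from the server, not from the model's training data.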
| pick_authority_domains | Decide which AUTHORITY domains a query should be restricted to. Call this BEFORE any web search when the query has an authoritative
answer — official regulators, statistical agencies, market exchanges,
industry SROs. Returns the ranked list keyed by `primary` and `secondary`.
Args:
query: The user question (free text).
indicators: Optional indicator hints (e.g. ["repo_rate", "cpi_inflation"]).
jurisdiction: ISO code: "IN", "US", "UK", "EU". Drives jurisdiction defaults.
topic_hint: One of "regulator", "market", "news", "company_ir",
"statistics", "academic", "any".
additional_authority_sources: If you've already resolved authority
sources for the indicators (e.g. ["RBI", "MOSPI"]), pass them
here — they will be expanded to domains via the AUTHORITY_DOMAINS
registry and used as the primary set.
Returns:
{primary, secondary, primary_sources, secondary_sources,
hints, landing_pages, rationale, query, current_date}
`hints` (when non-empty) are DOMAIN-SPECIFIC INSTRUCTIONS you MUST
follow for this query — e.g. "for GST collections, prefer the Excel
files at gst.gov.in/download/gststatistics over PDF press releases".
`landing_pages` are URLs you should fetch DIRECTLY (web_fetch_structured)
before broadening the search.
|
| web_search_authoritative | Three-pass authoritative web search. Pass 1: primary authority domains (catalog rules + curated registry).
Pass 2: secondary authority + Tier-1 business press.
Pass 3: open web (toggle via `allow_open_web_fallback`).
Prefer this over plain web search when the query has an authoritative
answer — automates the include-domains discipline and the fallback ladder.
Args:
query: Search query (free text).
indicators: Indicator hints; piped to pick_authority_domains.
jurisdiction: ISO code: "IN", "US", "UK", "EU".
topic_hint: As in pick_authority_domains.
include_domains: Manual override; skips pick_authority_domains.
additional_authority_sources: See pick_authority_domains.
max_results: Per-pass cap (default 6).
topic: "general" or "news". When "news", set days for recency window.
days: Days back for news topic (e.g. 7 = last week).
allow_open_web_fallback: If False, refuses pass 3.
Returns:
Tavily result shape plus `pass`, `domains_used`, `authority_score`
(1.0 primary, 0.6 secondary, 0.3 open web), and `rationale`.
|
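The fallback ladder described above could be sketched roughly as follows. This is not the tool's source: `three_pass_search` and the `search` callable are hypothetical, and the sketch assumes the ladder stops at the first pass that returns any hits, which the description implies but does not state. Only the authority scores are taken from the Returns text.

```python
from typing import Callable, Dict, List, Optional

# Scores from the Returns description: 1.0 primary, 0.6 secondary, 0.3 open web.
AUTHORITY_SCORE = {1: 1.0, 2: 0.6, 3: 0.3}

# Hypothetical search backend: (query, include_domains or None) -> list of hits.
SearchFn = Callable[[str, Optional[List[str]]], List[dict]]


def three_pass_search(
    query: str,
    primary: List[str],
    secondary: List[str],
    search: SearchFn,
    max_results: int = 6,
    allow_open_web_fallback: bool = True,
) -> Dict:
    ladder = [(1, primary), (2, secondary)]
    if allow_open_web_fallback:
        ladder.append((3, None))  # None means no include-domains restriction
    for pass_no, domains in ladder:
        if domains is not None and len(domains) == 0:
            continue  # nothing to restrict to, so skip this rung
        hits = search(query, domains)[:max_results]  # per-pass cap
        if hits:
            # Assumption: stop at the first pass that produces results.
            return {
                "pass": pass_no,
                "domains_used": domains or [],
                "authority_score": AUTHORITY_SCORE[pass_no],
                "results": hits,
            }
    return {"pass": None, "domains_used": [], "authority_score": 0.0, "results": []}
```

With `allow_open_web_fallback=False` the sketch never reaches pass 3, so an empty result means the authority domains had nothing, not that the open web was searched and came up empty.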
| web_fetch_structured | Fetch a URL and extract STRUCTURED data via a focused LLM pass. Better than plain fetch when you need SPECIFIC numbers from a long
press release / annual report / regulatory document. The extraction is
LLM-mediated, so it reads values in context and is meant to return only
values that appear on the page.
Args:
url: The page URL.
focus: What to extract, e.g. "CPI YoY April 2025, food inflation,
core CPI". The LLM uses this to bias its extraction.
Returns:
{title, dateline, summary, key_facts[], numeric_values[],
dates[], tables_summary[]}
Requires ANTHROPIC_API_KEY env var. Without it, returns raw text only.
|
| web_compare_across_sources | Issue THE SAME query across N authority domains in one call. Returns
a per-domain top hit plus an agreement matrix. Use to cross-validate a load-bearing number (e.g. "India CPI April 2025")
across MoSPI / RBI / IMF / press without writing N separate web_search
calls.
Args:
claim_or_query: The claim or query to test.
domains: Authority domains to compare (2-6 sweet spot).
max_results_per_domain: 1-5.
Returns:
{per_domain[], domains_covered, domains_total,
agreement: "agree"|"partial"|"conflict"|"insufficient", summary}
|
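The description does not say how the `agreement` label is derived. One plausible rule, shown here purely as an assumption, is to compare each domain's extracted number against the median and count how many fall within a tolerance. `classify_agreement`, the 0.5% tolerance, and the majority threshold are all hypothetical, and the input values in the usage note are placeholders, not real statistics.

```python
import math
from statistics import median
from typing import Dict, Optional


def classify_agreement(values: Dict[str, Optional[float]], rel_tol: float = 0.005) -> str:
    """Hypothetical rule: label agreement among per-domain numeric values."""
    found = [v for v in values.values() if v is not None]
    if len(found) < 2:
        return "insufficient"  # fewer than two domains produced a number
    centre = median(found)
    matching = sum(math.isclose(v, centre, rel_tol=rel_tol) for v in found)
    if matching == len(found):
        return "agree"
    if matching * 2 > len(found):
        return "partial"       # a strict majority clusters around the median
    return "conflict"
```

For example, `classify_agreement({"a.example": 5.0, "b.example": 5.0, "c.example": 5.2})` yields `"partial"` under these assumptions, because two of the three values sit on the median and the third is outside the tolerance.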
| web_sitemap_walk | Locate the canonical landing page for a topic on an authority domain
via /sitemap.xml or /robots.txt Sitemap: declarations. Falls back to
Tavily site-restricted search if the domain doesn't expose a sitemap.
Args:
domain: e.g. "rbi.org.in".
topic: The topic to score sitemap entries against, e.g. "press releases".
max_candidates: 1-20.
Returns:
{domain, sitemap_urls[], candidates[], method, notes[]}
|
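To make the sitemap discovery step concrete, here is a small sketch that reads `Sitemap:` declarations out of a robots.txt body and ranks candidate URLs against a topic. Both function names are hypothetical, and the token-overlap scoring is an assumption; the tool's actual scoring is not described above.

```python
import re
from typing import List


def sitemap_urls_from_robots(robots_txt: str) -> List[str]:
    """Collect the URLs declared on 'Sitemap:' lines (field name is case-insensitive)."""
    urls = []
    for line in robots_txt.splitlines():
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep and key.strip().lower() == "sitemap":
            urls.append(value.strip())
    return urls


def rank_candidates(urls: List[str], topic: str, max_candidates: int = 5) -> List[str]:
    """Assumed scoring: count how many topic tokens appear in each URL."""
    tokens = [t for t in re.split(r"\W+", topic.lower()) if t]
    scored = [(sum(t in u.lower() for t in tokens), u) for u in urls]
    scored = [s for s in scored if s[0] > 0]
    scored.sort(key=lambda s: -s[0])  # stable sort keeps input order on ties
    return [u for _, u in scored[:max_candidates]]
```

Splitting on the first colon matters because the URL itself contains colons; a naive `split(":")` would truncate it.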
| web_search | Plain Tavily search: an escape hatch for free-form exploration. Prefer `web_search_authoritative` when the query has an authoritative
answer.
|
| web_fetch | Plain Tavily extract: returns clean text from a URL. Prefer `web_fetch_structured` when you need typed key_facts /
numeric_values rather than prose. For PDFs, use `pdf_fetch` instead —
Tavily's Extract often returns "binary / not extractable" for them.
|
| pdf_discover | List every PDF link on an HTML landing page, with its anchor text. Use this on HUB pages (PPAC consumption / production / imports, RBI
bulletin month index, MoSPI press-release listings, MoRTH notification
indexes, MCA filing pages) where the actual data lives in attached
PDFs and the page often has Year/Month/Product dropdowns that are
really just client-side filters over the same anchor set. Returns
each PDF's absolute URL and the human-readable anchor text so you can
pick by name (e.g. "Domestic Consumption of Petroleum Products-2026-27",
"Flash Report May 26").
Workflow: pdf_discover → pick by anchor text → pdf_fetch_structured.
Args:
url: The HTML landing page URL.
link_text_filter: Optional case-insensitive substring; only anchors
whose text contains it are returned. E.g. "2026-27", "Flash".
max_links: Cap on links returned (default 40).
Returns:
{url, domain, pdfs: [{href, text, label_hint}], count,
page_title, fetched_at}
|
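The anchor-listing behaviour can be approximated with the standard library alone. The sketch below is an assumption about how such a tool might work, not its actual code: `discover_pdfs` is a hypothetical name, it only recognises links whose URL path ends in `.pdf`, and it leaves out `label_hint`, `page_title`, and the HTTP fetch itself.

```python
from html.parser import HTMLParser
from typing import List, Optional
from urllib.parse import urljoin, urlparse


class _PdfAnchors(HTMLParser):
    """Collect <a> elements whose href path ends in .pdf, with their anchor text."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.pdfs: List[dict] = []
        self._href: Optional[str] = None
        self._text: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href and urlparse(href).path.lower().endswith(".pdf"):
                self._href = urljoin(self.base_url, href)  # make the link absolute
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = " ".join("".join(self._text).split())  # collapse whitespace
            self.pdfs.append({"href": self._href, "text": text})
            self._href = None


def discover_pdfs(html: str, base_url: str,
                  link_text_filter: Optional[str] = None, max_links: int = 40) -> List[dict]:
    parser = _PdfAnchors(base_url)
    parser.feed(html)
    pdfs = parser.pdfs
    if link_text_filter:
        needle = link_text_filter.lower()  # case-insensitive substring match
        pdfs = [p for p in pdfs if needle in p["text"].lower()]
    return pdfs[:max_links]
```

Because the filter runs on anchor text rather than on the URL, a call such as `discover_pdfs(html, url, link_text_filter="2026-27")` picks files by their human-readable label even when the file names are opaque.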
| http_post_form | POST a form (application/x-www-form-urlencoded) and return the response, parsed as JSON when possible. The escape hatch for Year/Month/Product dropdowns on government dashboards
that don't change the page URL — the dropdown triggers an AJAX POST and
only renders the result client-side, so pdf_discover and web_fetch can't
see it. Use this when a landing page's dropdown isn't a `<select>` whose
value becomes a query param.
Example — PPAC prior-year (FY2025-26) petroleum consumption:
url = "https://ppac.gov.in/AjaxController/getConsumptionPetroleumProductsChartData"
form = {"financialYear": "2025-2026", "reportBy": "1", "pageId": "43"}
referer = "https://ppac.gov.in/consumption/products-wise"
Returns the full FY2025-26 monthly JSON (April 2025 → March 2026).
Args:
url: The POST endpoint (usually `/AjaxController/...` on gov sites).
form: Form fields to submit.
referer: Optional Referer header — many gov AJAX endpoints reject
requests without one.
parse: "json" / "text" / "auto" (default — try JSON, fall back to text).
Returns:
{url, status, content_type, json (when parseable), text, fetched_at}.
|
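For readers who want to see what such a request looks like on the wire, the sketch below builds the urlencoded POST and applies the `parse` modes using only `urllib`. `build_form_post` and `parse_body` are hypothetical helpers written for illustration; the tool's real HTTP client, retry behaviour, and error handling are not described above and are not modelled here.

```python
import json
from typing import Dict, Optional
from urllib.parse import urlencode
from urllib.request import Request


def build_form_post(url: str, form: Dict[str, str], referer: Optional[str] = None) -> Request:
    """Build (but do not send) an application/x-www-form-urlencoded POST."""
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    if referer:
        headers["Referer"] = referer  # some AJAX endpoints reject requests without it
    return Request(url, data=urlencode(form).encode("ascii"), headers=headers, method="POST")


def parse_body(body: str, parse: str = "auto") -> dict:
    """Assumed semantics: 'json' must parse, 'text' never parses, 'auto' tries JSON first."""
    if parse == "text":
        return {"json": None, "text": body}
    try:
        return {"json": json.loads(body), "text": body}
    except json.JSONDecodeError:
        if parse == "json":
            raise
        return {"json": None, "text": body}


req = build_form_post(
    "https://ppac.gov.in/AjaxController/getConsumptionPetroleumProductsChartData",
    {"financialYear": "2025-2026", "reportBy": "1", "pageId": "43"},
    referer="https://ppac.gov.in/consumption/products-wise",
)
```

Passing `req` to `urllib.request.urlopen` would send it; the sketch stops short of that so it can be read and run without network access.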
| pdf_fetch | Download a PDF directly and extract its text with pypdf. Use this WHENEVER a `web_fetch` or `web_fetch_structured` call comes
back saying the content was "binary" or "not extractable" — that's
almost always a Tavily limitation on PDFs that are actually text-based
and perfectly extractable with a proper PDF library. Common cases:
PPAC monthly reports, RBI bulletins, MoSPI press release PDFs, PIB
statements, regulator circulars.
Args:
url: The PDF URL (.pdf in path, or a server that returns
Content-Type: application/pdf).
pages: Optional 1-indexed list of pages to extract (e.g. [1, 2, 5]).
If omitted, the first `max_pages` are extracted.
max_pages: Cap on auto-extracted pages when `pages` is omitted.
Returns:
{url, domain, content, fetched_at, page_count, pages_extracted,
content_truncated, kind: "pdf"}.
|
| pdf_fetch_structured | Direct PDF download + pypdf extraction → focused LLM pass → structured JSON. Same returned shape as `web_fetch_structured` (title, dateline,
key_facts[], numeric_values[], dates[], tables_summary[]) but goes
through the PDF path. Use when you have a PDF URL and want the values
extracted into a structured shape rather than just raw text.
|
| visit | Open a URL with a real Chromium and return its rendered state. Use when the cheaper fetch tools (web_fetch, pdf_fetch, http_post_form)
fail because the page is a SPA, renders its charts with JS, sits behind
a login wall, or has a dropdown that's not a separate URL.
Args:
url: The page URL.
wait_for_selector: Optional CSS selector to await before reading the
DOM. Use when data appears only after an AJAX call returns —
e.g. ".chart svg", "table#monthly tbody tr".
wait_extra_ms: Extra settle time after the wait fires (default 1500).
timeout_ms: Hard navigation timeout in milliseconds (default 45000, i.e. 45 s).
screenshot: Whether to capture a PNG INTERNALLY (default True). Adds
~200ms; the bytes are used by extract()/act() for Sonnet vision.
full_page_screenshot: Scroll-stitch the whole page (default False).
text_cap: Cap on extracted text length (default 30000).
return_screenshot_b64: Whether to ECHO the base64 PNG back in the
response. DEFAULT False — typical screenshots are 700KB-1MB and
accumulating them across an agent's tool-call history blows the
1M-token context window in ~3 calls. Only opt in when the caller
actually consumes the bytes (e.g. a browser-canvas UI).
Returns:
{url, title, domain, text, screenshot_bytes, screenshot_b64 (opt-in),
fetched_at, current_date}
|
| act | Drive a real Chromium through a sequence of steps, then run Sonnet
structured extraction on the final state. Use this when the data is BEHIND an interaction — a Year/Month dropdown
that fires AJAX inline, a tab to click, a "Load more" button, a form
to submit. `visit` and `extract` only read the page as it loaded;
`act` clicks/types/selects first.
Steps are a list of single-key dicts:
{"click": "css-selector"}
{"fill": {"selector": "#q", "value": "x"}}
{"select": {"selector": "#year", "value": "2024-2025"}}
{"press": {"selector": "#q", "key": "Enter"}}
{"scroll": {"to": "bottom"|"top"|<int px>}}
{"wait_for_selector": "css-selector"}
{"wait_for_load_state": "networkidle"|"load"}
{"wait_ms": 1500}
{"goto": "https://…"} // mid-flow navigation
{"screenshot": {"name": "after-select"}} // logged, not returned
Example — pull PPAC FY2024-25 monthly consumption (a flow that needs
the year dropdown change to fire an AJAX request):
act(
url="https://ppac.gov.in/consumption/products-wise",
steps=[
{"wait_for_selector": "#financialYear"},
{"select": {"selector": "#financialYear", "value": "2024-2025"}},
{"wait_for_load_state": "networkidle"},
{"wait_ms": 2000},
],
focus="FY2024-25 monthly LPG, MS, HSD, ATF consumption",
)
Returns the same shape as `extract` PLUS `step_results` (per-step
timing + ok/error) and `final_url`.
Args:
url: Starting page URL.
steps: Ordered list of action dicts (vocabulary above).
focus: Extraction focus passed to Sonnet.
timeout_ms: Per-step navigation / wait timeout.
full_page_screenshot: Whether the final screenshot is full-page.
Returns:
{url, domain, title, dateline, summary, key_facts[],
numeric_values[], dates[], tables_summary[], step_results[],
final_url, kind: "browser"}.
|
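Since each step must be a single-key dict drawn from the vocabulary above, a caller may want to check a step list before spending a browser session on it. The validator below is a hypothetical client-side helper, not part of the tool; it checks only the shape and the action name, not the selector syntax or the per-action payload.

```python
from typing import List

# Action names copied from the step vocabulary in the description above.
ALLOWED_ACTIONS = {
    "click", "fill", "select", "press", "scroll",
    "wait_for_selector", "wait_for_load_state", "wait_ms",
    "goto", "screenshot",
}


def validate_steps(steps: List[object]) -> List[str]:
    """Return a list of human-readable problems; an empty list means the shape is fine."""
    errors = []
    for i, step in enumerate(steps):
        if not isinstance(step, dict) or len(step) != 1:
            errors.append(f"step {i}: expected a single-key dict")
            continue
        (action,) = step  # unpack the only key
        if action not in ALLOWED_ACTIONS:
            errors.append(f"step {i}: unknown action {action!r}")
    return errors


steps = [
    {"wait_for_selector": "#financialYear"},
    {"select": {"selector": "#financialYear", "value": "2024-2025"}},
    {"wait_for_load_state": "networkidle"},
    {"wait_ms": 2000},
]
problems = validate_steps(steps)
```

For the PPAC step list from the example above, `problems` is empty; a step such as `{"click": "#a", "wait_ms": 10}` would be reported because it carries two keys.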
| extract | Visit a URL → focused Sonnet structured extraction. Sends BOTH rendered text AND a screenshot to Sonnet, so numbers drawn
via canvas / SVG (chart values on PPAC, RBI, NSE dashboards) that don't
appear in the DOM still get extracted. Same returned shape as
pdf_fetch_structured / web_fetch_structured on authority-web-search-mcp.
Args:
url: The page URL.
focus: What to extract, e.g. "monthly LPG, MS, HSD consumption for
FY2024-25" or "Q4 FY26 EBITDA margin and revenue".
wait_for_selector: Optional CSS selector to await (see visit).
full_page_screenshot: Default True so charts below the fold are seen.
Returns:
{url, domain, title, dateline, summary, key_facts[], numeric_values[],
dates[], tables_summary[], kind: "browser"}.
|