| search | Run a multi-engine web search and return a ranked, deduplicated link list. Best for:
- Discovery queries ("what is X", "find me X", "who is X").
- Getting a list of URLs you can hand to `fetch` / `fetch_batch` next.
- Topics likely to be after your knowledge cutoff (use `freshness="week"`).
- Filtering to specific domains (`include_domains=["python.org"]`) or
content types (`category="paper"|"pdf"|"github"|"news"|"forum"|"blog"`).
Not recommended for:
- You already know the URL -> use `fetch` instead.
- You want both links AND their full text in one call -> use `research`.
- You want to query pages already in the local cache -> use `cache_search`.
- Reading PDFs/DOCX from a known URL -> use `read_doc`.
Returns:
- markdown (default): numbered list of `n. title`, `<url>`, snippet — ~40%
fewer tokens than json.
- json: dict with `results` (list of {title,url,snippet,engines,score}),
`engines`, `cached`, optional `errors` map, optional `hint` string.
Common mistakes:
- Passing a URL as `query` — that's `fetch`'s job.
- Cranking `max_results` to 50 hoping for better recall; engines cap around
10-20 results each, so anything beyond that is duplicate noise.
- Adding `engines=["brave","bing","baidu"]` by default — those engines are
prone to captchas and UA gating; stick with the defaults unless they
returned 0 results.
- Using `category="news"` for breaking news without also setting
`freshness="day"` — the index lag is days, not minutes.
Args:
query: Natural-language query (the same string a human would type).
engines: Subset of `engines()`. None = duckduckgo+mojeek+startpage.
max_results: Merged result count after dedup. 5-20 is the useful range.
use_cache: Reuse the cached result for this exact (query, engines,
max_results) tuple within the cache TTL. False forces a re-fetch.
max_age_hours: Treat cached results older than this as a miss. Use
0 to force-refresh while keeping cache writes; None = use server
default TTL (7 days).
freshness: "day"|"week"|"month"|"year" — restrict to recent results.
include_domains: List of domains to restrict to (e.g. ["python.org"]).
exclude_domains: List of domains to exclude.
category: "news"|"pdf"|"github"|"paper"|"forum"|"blog" — content-type
shortcut. "paper" => arxiv/acm/springer/ieee/etc; "forum" =>
reddit/HN/stackexchange; "github" => github.com only.
include_text: Substring required in title or snippet (case-insensitive).
exclude_text: Substring forbidden in title or snippet.
format: "markdown" (default) or "json".
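
A minimal sketch of consuming the `format="json"` result shape documented
above and collecting URLs to hand to `fetch_batch` next. The response dict
here is hypothetical sample data, not real tool output:

```python
# Hypothetical json-format search response (shape per the Returns section).
response = {
    "results": [
        {"title": "PEP 703", "url": "https://peps.python.org/pep-0703/",
         "snippet": "Making the GIL optional...",
         "engines": ["duckduckgo"], "score": 0.91},
        {"title": "What's new in Python 3.13",
         "url": "https://docs.python.org/3.13/whatsnew/",
         "snippet": "Release highlights...",
         "engines": ["mojeek", "startpage"], "score": 0.84},
    ],
    "engines": ["duckduckgo", "mojeek", "startpage"],
    "cached": False,
}

# Results are already ranked and deduplicated; just take the top URLs.
urls = [r["url"] for r in response["results"][:5]]
```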
|
| fetch | Fetch one URL and return reader-mode Markdown of the main content. Best for:
- You already have a URL (from `search`, the user, or your own knowledge)
and need the actual page text.
- Verifying a single claim by reading the source.
- Pages that need reader-mode cleanup (nav/footer/scripts stripped).
Not recommended for:
- Multiple URLs at once -> use `fetch_batch` (concurrent, one round-trip).
- "Search then read top N" -> use `research` (one call, not two).
- PDF/DOCX URLs -> use `read_doc` (proper binary parsing).
- You don't have a URL yet -> use `search` first.
Returns:
- markdown (default): a small header (URL, render method, token count)
plus the cleaned page body.
- json: {url, title, content, method, truncated, tokens_estimated,
author, published_date, sitename}.
Common mistakes:
- Passing a search query instead of a URL.
- Using `render="http"` on a JS-only SPA — it returns near-empty content;
use "auto" (default) or "browser".
- Forgetting that results are cached 7 days — use `force_refresh=True`
or `max_age_hours=0` for a fresh pull.
Args:
url: Absolute http(s) URL.
render: "auto" (try HTTP, fall back to stealth Chromium), "http"
(fast, fails on JS), "browser" (slow, robust).
force_refresh: Bypass the page cache entirely.
max_age_hours: Treat cached pages older than this as a miss. 0 = same
as force_refresh. None = server default TTL (7 days).
format: "markdown" or "json".
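
A hedged sketch of the SPA caveat above: if an HTTP-mode pull came back
near-empty, retry with `render="browser"`. The `page` dict is hypothetical
sample data in the documented json shape, and the 200-character threshold
is an arbitrary illustration, not a tool-defined cutoff:

```python
# Hypothetical json-format fetch response for a JS-only SPA.
page = {"url": "https://example.com/app", "title": "", "content": "",
        "method": "http", "truncated": False, "tokens_estimated": 3,
        "author": None, "published_date": None, "sitename": None}

# Near-empty content from the HTTP path suggests the page needs JS.
needs_browser = page["method"] == "http" and len(page["content"].strip()) < 200
# if needs_browser: re-call fetch(url=page["url"], render="browser")
```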
|
| fetch_batch | Fetch a list of URLs in parallel. Per-URL failures do not raise. Best for:
- 2+ URLs you want to read in one round-trip.
- Reading the top N results of a previous `search` call.
Not recommended for:
- A single URL -> `fetch` (no list-wrapping overhead).
- "Search and then read" -> `research` collapses both into one tool call.
- PDFs/DOCX -> `read_doc` per file.
Returns:
- markdown (default): each page rendered as a Markdown section, separated
by horizontal rules; failed URLs become inline error notes.
- json: list[dict], one entry per URL, with `error` set on failures.
Common mistakes:
- Passing a single URL inside a 1-element list — use `fetch` directly.
- Assuming an exception means the whole batch failed; check each item's
`error` field instead.
Args:
urls: List of absolute http(s) URLs.
render: Same as `fetch`.
format: "markdown" or "json".
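
The per-item error handling from "Common mistakes" can be sketched like
this, assuming the json shape above; the batch data is hypothetical:

```python
# Hypothetical json-format fetch_batch response: one success, one failure.
batch = [
    {"url": "https://example.com/a", "content": "# Page A\n...", "error": None},
    {"url": "https://example.com/b", "content": None, "error": "timeout"},
]

# Check each item's error field; an exception was never raised.
ok = [item for item in batch if not item.get("error")]
failed = {item["url"]: item["error"] for item in batch if item.get("error")}
```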
|
| read_doc | Read a local file or http(s) document into Markdown. Best for:
- Local or remote PDFs and DOCX (parsed locally, no remote API).
- Local text/HTML/Markdown files the user pointed at.
- Paginating through a long document via `start` / `length`.
Not recommended for:
- Arbitrary HTML web pages -> `fetch` does reader-mode cleanup that this
tool does not.
- Pages discovered through search -> `fetch` or `research`.
Returns:
- markdown (default): rendered document text with a small header.
- json: {content, title, format, total_chars, start, returned_chars,
truncated}. Use `total_chars` and `returned_chars` to drive pagination.
Common mistakes:
- Calling this on a normal article URL — you'll get raw HTML noise; use
`fetch` instead.
- Forgetting to advance `start` when paginating: next call should pass
`start = previous_start + returned_chars`.
Args:
source: Local path (e.g. "~/papers/x.pdf") or http(s) URL.
start: Character offset to begin reading from. Default 0.
length: Max characters to return; None = read to end (still capped
by per-call max content size).
format: "markdown" or "json".
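
The pagination rule above (`start = previous_start + returned_chars`) can
be sketched as a loop. `read_doc_sim` below is a local stand-in for the
tool, written only to show the arithmetic, not the tool itself:

```python
# Stand-in with the documented json fields that drive pagination.
def read_doc_sim(text, start=0, length=1000):
    chunk = text[start:start + length]
    return {"content": chunk, "total_chars": len(text), "start": start,
            "returned_chars": len(chunk),
            "truncated": start + len(chunk) < len(text)}

doc = "x" * 2500
start, pages = 0, []
while True:
    page = read_doc_sim(doc, start=start, length=1000)
    pages.append(page["content"])
    if not page["truncated"]:
        break
    # Advance by what was actually returned, not by the requested length.
    start = page["start"] + page["returned_chars"]
```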
|
| research | One-shot research: search the web, fetch the top results, return both. Best for:
- Open-ended questions that need finding sources AND reading them
("what's new with X", "summarize the controversy around Y").
- Replacing a `search` + N x `fetch` chain with one call.
- Producing a citable brief with [n]-style source references.
Not recommended for:
- You only need links -> `search` (cheaper, no fetching).
- You only need to read one URL you already have -> `fetch`.
- You want to query previously-fetched cached pages -> `cache_search`.
Returns:
- markdown (default): a "Research brief" with a Sources index then the
full Markdown body of each fetched document, separated by horizontal
rules; includes a token estimate.
- json: {question, engines, sources:[{rank,title,url,snippet,...}],
documents:[...], tokens_estimated, errors}.
Common mistakes:
- Using `depth=8` for a quick lookup — that's 8 page fetches; 2-3 is
almost always enough.
- Calling `research` for a known URL — that's `fetch` territory.
- Forgetting that `fetch=False` returns sources only (much cheaper if
the LLM only needs to pick which one to read).
Args:
question: What you want to know, in natural language.
depth: How many top results to fetch (1-8). 3 is a good default.
engines: Override the engine set (see `engines()` for names).
fetch: If False, return source list without reading them.
use_cache: Reuse cached search/page data within TTL.
max_age_hours: Treat cached search results older than this as a miss
(0 = force-refresh search; None = server default TTL).
format: "markdown" or "json".
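
A sketch of the cheap two-phase pattern noted above: call with
`fetch=False`, pick one source, then `fetch` only that URL. The brief
below is hypothetical sample data in the documented json shape:

```python
# Hypothetical json-format research response with fetch=False.
brief = {
    "question": "what's new with X",
    "sources": [
        {"rank": 1, "title": "Changelog",
         "url": "https://example.com/changelog", "snippet": "..."},
        {"rank": 2, "title": "Blog post",
         "url": "https://example.com/blog", "snippet": "..."},
    ],
    "documents": [],  # empty: nothing was fetched yet
}

# Pick the top-ranked source; the follow-up is a single fetch call.
best = min(brief["sources"], key=lambda s: s["rank"])["url"]
# next call: fetch(url=best)
```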
|
| cache_search | Full-text search over pages already fetched into the local SQLite FTS5 index. Best for:
- Recalling something the user/agent fetched earlier in the conversation
("what did that Wikipedia page say about X").
- Avoiding re-fetching content already in the local cache.
- Quick keyword grep across the corpus you've built up.
Not recommended for:
- Discovering new pages on the open web -> use `search` or `research`.
- When the cache is empty (fresh install) -> `search`/`research` first to
populate it.
Returns:
- markdown (default): a per-hit list of title, URL, and a `[bracket]`-
highlighted snippet around the matched terms.
- json: list of {url, title, snippet}.
Common mistakes:
- Treating this like web search — it ONLY hits pages already in the local
cache. If the user hasn't fetched anything, you'll get zero hits.
- Using natural-language phrases without quoting them; FTS5 treats
whitespace-separated terms as AND. For an exact phrase, quote it:
`"like this"`.
Args:
query: FTS5 query. Bare terms = AND. Supports OR / NOT, prefix
(`term*`), and phrase (`"exact phrase"`).
limit: Max hits to return.
format: "markdown" or "json".
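
The query semantics above (bare terms = AND, quoted = exact phrase) can be
demonstrated against a throwaway FTS5 table. This assumes the bundled
SQLite was compiled with FTS5; the table layout is illustrative, not the
server's actual schema:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE pages USING fts5(url, title, body)")
db.execute("INSERT INTO pages VALUES (?, ?, ?)",
           ("https://example.com/a", "GIL notes",
            "the global interpreter lock explained"))
db.execute("INSERT INTO pages VALUES (?, ?, ?)",
           ("https://example.com/b", "Lock types", "a mutex is a lock"))

# Bare terms are ANDed: only /a contains both "interpreter" and "lock".
both = db.execute("SELECT url FROM pages WHERE pages MATCH ?",
                  ("interpreter lock",)).fetchall()
# A quoted phrase must appear verbatim: only /b contains "is a lock".
exact = db.execute("SELECT url FROM pages WHERE pages MATCH ?",
                   ('"is a lock"',)).fetchall()
```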
|
| engines | List engine names accepted by the engines= parameter of search / research. Best for:
- Discovering what's installable before passing a non-default engine.
- Building user-facing UIs that let humans pick engines.
Not recommended for:
- Calling on every search — the list is static; cache it.
Returns:
- A list of engine name strings (e.g. ["duckduckgo", "mojeek",
"startpage", "brave", "bing", "baidu"]).
Common mistakes:
- Passing one of these names as a query to `search` — they go in the
`engines=` argument, not `query`.
Defaults: duckduckgo + mojeek + startpage (all reliable, no captchas).
Opt-in: brave (PoW captcha after a few calls), bing (UA-gated),
baidu (results wrapped in baidu.com/link redirects).
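
The "the list is static; cache it" advice can be as simple as memoizing
one call. `engine_names` is a hypothetical wrapper, and the hardcoded
tuple stands in for a real `engines()` call:

```python
from functools import lru_cache

@lru_cache(maxsize=1)
def engine_names():
    # Stand-in for one real engines() tool call; the list is static,
    # so one call per session is enough.
    return ("duckduckgo", "mojeek", "startpage", "brave", "bing", "baidu")
```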
|