Survey Corpus (exhaustive)
survey_corpus
Survey the entire Pāli Tipiṭaka for a term with guaranteed complete lexical results, returning exact counts, per-pitaka breakdowns, and distinct matched forms.
Instructions
Exhaustively survey the WHOLE Tipiṭaka for a term — guaranteed complete.
Use this (not search_by_keyword) when the question is about coverage or
counting rather than "show me the best passages":
"How many times does Kusinārā appear in the canon?"
"Every place ānāpānassati is mentioned — don't miss any"
"Which pitakas/how many suttas mention this term?"
Unlike search_by_keyword (ranked, capped at 50, no total), this returns an
exact count, a per-pitaka breakdown, the distinct surface forms
that matched (so you can audit and discard over-matches), and a paginated
enumeration. The lexical result carries complete: true — a hard
guarantee that nothing was dropped for the chosen match_scope.
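To make the audit step concrete, here is a toy sketch of what "word" versus "stem" matching and the distinct-forms report could look like over a small token list. It is not the server's implementation: the `fold` and `survey_tokens` helpers, the sample tokens, and the returned dictionary shape are all assumptions made for illustration. Only the diacritic folding (`ā→a`, `ṁ→m`) and the prefix behaviour of "stem" come from the description on this page.

```python
import unicodedata
from collections import Counter


def fold(text: str) -> str:
    """Lowercase and strip combining marks, so that ā becomes a and ṁ becomes m."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def survey_tokens(tokens, keyword, match_scope="word"):
    """Toy lexical survey over a token list (hypothetical helper, not the real tool)."""
    key = fold(keyword)
    if match_scope == "word":
        hits = [t for t in tokens if fold(t) == key]
    elif match_scope == "stem":
        # Prefix match on the folded form: higher recall, may over-match.
        hits = [t for t in tokens if fold(t).startswith(key)]
    else:
        raise ValueError(f"unknown match_scope: {match_scope!r}")
    # Report surface forms in lowercase NFC so they can be audited by eye.
    forms = Counter(unicodedata.normalize("NFC", t.lower()) for t in hits)
    return {"count": len(hits), "matched_forms": dict(forms)}


# Made-up sample tokens, for illustration only.
TOKENS = ["Kusinārā", "kusinārāyaṁ", "kusināravagga", "kusalaṁ", "kusinārā"]

word_result = survey_tokens(TOKENS, "kusinara", match_scope="word")
stem_result = survey_tokens(TOKENS, "kusinara", match_scope="stem")
```

In this sketch the "stem" result lists each matched surface form separately, which is the information needed to spot and discard a form that only shares the prefix by accident.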
Two layers, two different promises:
lexical: the word and its forms. Deterministic + EXHAUSTIVE.
semantic (mode="thorough", hosted only): passages teaching the same concept with DIFFERENT vocabulary (e.g. ānāpānassati via assasati/passasati). Approximate, NOT exhaustive: it never claims completeness, it only boosts recall.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| keyword | Yes | Term to survey (Romanised Pāli preferred; diacritics optional — matching folds `ā→a`, `ṁ→m`, etc.). | |
| language | No | "pali" (default) or "english". Thai is not indexed yet. | pali |
| pitaka | No | Restrict to "vinaya" / "sutta" / "abhidhamma", or None for all. | |
| match_scope | No | "word" (default) matches the exact word/phrase only. "stem" also matches inflections + compounds via prefix (kusinārā → kusinārāyaṁ, kusināravagga …) — higher recall, may over-match (audit via `matched_forms`). | word |
| mode | No | "fast" (default) = lexical only — quick, no server-side ML, works offline. "thorough" = also run the semantic layer (hosted only; this is the heavier part). The lexical guarantee holds in BOTH. | fast |
| page_size | No | Lexical results per page (default 20, max 100). Counts/forms cover the WHOLE corpus regardless of this. | 20 |
| cursor | No | Offset into the full lexical result set for pagination. | |
| sem_threshold | No | Max cosine distance for semantic hits (default 0.7; lower = stricter). Only used when mode="thorough". | 0.7 |
| sem_limit | No | Max semantic hits (default 50, max 200). `capped` flags when reached. Only used when mode="thorough". | 50 |
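
The parameters in the table can be assembled and checked on the client side before a call is made. The sketch below is one way to do that, under stated assumptions: `build_survey_args` is a hypothetical helper, not part of the tool; the starting `cursor` of 0 is assumed, since the table gives no default; and how the resulting dictionary is sent to `survey_corpus` depends on the client in use.

```python
def build_survey_args(keyword, *, language="pali", pitaka=None,
                      match_scope="word", mode="fast", page_size=20,
                      cursor=0, sem_threshold=0.7, sem_limit=50):
    """Validate and assemble arguments for a survey_corpus call (hypothetical helper)."""
    if not keyword or not keyword.strip():
        raise ValueError("keyword is required")
    if language not in {"pali", "english"}:
        raise ValueError("language must be 'pali' or 'english'")
    if pitaka not in {None, "vinaya", "sutta", "abhidhamma"}:
        raise ValueError("pitaka must be 'vinaya', 'sutta', 'abhidhamma', or None")
    if match_scope not in {"word", "stem"}:
        raise ValueError("match_scope must be 'word' or 'stem'")
    if mode not in {"fast", "thorough"}:
        raise ValueError("mode must be 'fast' or 'thorough'")
    if not 1 <= page_size <= 100:
        raise ValueError("page_size must be between 1 and 100")
    if cursor < 0:
        raise ValueError("cursor must be a non-negative offset")

    args = {
        "keyword": keyword,
        "language": language,
        "match_scope": match_scope,
        "mode": mode,
        "page_size": page_size,
        "cursor": cursor,  # assumed to start at 0; the table gives no default
    }
    if pitaka is not None:
        args["pitaka"] = pitaka
    if mode == "thorough":
        # The semantic settings are only used in thorough mode.
        if not 1 <= sem_limit <= 200:
            raise ValueError("sem_limit must be between 1 and 200")
        args["sem_threshold"] = sem_threshold
        args["sem_limit"] = sem_limit
    return args


fast_args = build_survey_args("ānāpānassati", match_scope="stem", page_size=100)
thorough_args = build_survey_args("ānāpānassati", mode="thorough", sem_limit=200)
```

Omitting the `sem_*` keys in fast mode mirrors the table's note that they are only used when mode="thorough".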
Output Schema
| Name | Required | Description | Default |
|---|---|---|---|
| No arguments | | | |