Skip to main content
Glama

Survey Corpus (exhaustive)

survey_corpus
Read-onlyIdempotent

Survey the entire Pāli Tipiṭaka for a term with guaranteed complete lexical results, returning exact counts, per-pitaka breakdowns, and distinct matched forms.

Instructions

Exhaustively survey the WHOLE Tipiṭaka for a term — guaranteed complete.

Use this (not search_by_keyword) when the question is about coverage or counting rather than "show me the best passages":

  • "How many times does Kusinārā appear in the canon?"

  • "Every place ānāpānassati is mentioned — don't miss any"

  • "Which pitakas/how many suttas mention this term?"

Unlike search_by_keyword (ranked, capped at 50, no total), this returns an exact count, a per-pitaka breakdown, the distinct surface forms that matched (so you can audit and discard over-matches), and a paginated enumeration. The lexical result carries complete: true — a hard guarantee that nothing was dropped for the chosen match_scope.

Two layers, two different promises:

  • lexical — the word and its forms. Deterministic + EXHAUSTIVE.

  • semantic (mode="thorough", hosted only) — passages teaching the same concept with DIFFERENT vocabulary (e.g. ānāpānassati via assasati/passasati). Approximate, NOT exhaustive — it never claims completeness, it only boosts recall.

Input Schema

TableJSON Schema
NameRequiredDescriptionDefault
keywordYesTerm to survey (Romanised Pāli preferred; diacritics optional — matching folds `ā→a`, `ṁ→m`, etc.).
languageNo"pali" (default) or "english". Thai is not indexed yet.pali
pitakaNoRestrict to "vinaya" / "sutta" / "abhidhamma", or None for all.
match_scopeNo"word" (default) matches the exact word/phrase only. "stem" also matches inflections + compounds via prefix (kusinārā → kusinārāyaṁ, kusināravagga …) — higher recall, may over-match (audit via `matched_forms`).word
modeNo"fast" (default) = lexical only — quick, no server-side ML, works offline. "thorough" = also run the semantic layer (hosted only; this is the heavier part). The lexical guarantee holds in BOTH.fast
page_sizeNoLexical results per page (default 20, max 100). Counts/forms cover the WHOLE corpus regardless of this.
cursorNoOffset into the full lexical result set for pagination.
sem_thresholdNoMax cosine distance for semantic hits (default 0.7; lower = stricter). Only used when mode="thorough".
sem_limitNoMax semantic hits (default 50, max 200). `capped` flags when reached. Only used when mode="thorough".

Output Schema

TableJSON Schema
NameRequiredDescriptionDefault

No arguments

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint and idempotentHint, so the description doesn't need to restate those. However, it adds valuable behavioral context: guarantees completeness for lexical mode, explicitly states that semantic mode is approximate and NOT exhaustive, explains that the lexical guarantee holds in both modes, and mentions pagination behavior. The only minor gap is not explicitly stating that it does not modify data, but annotations already cover that.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with clear sections, bullet points, and examples. It front-loads the core guarantee and usage guidance. However, it is somewhat verbose, particularly in the second paragraph about layers and promises. Some sentences could be trimmed without loss of clarity, but overall it is still effective and organized.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (9 parameters, output schema exists), the description is comprehensive. It covers core functionality, usage context, mode differences, guarantees, and limitations. The output schema is referenced indirectly but its existence means return value details are not needed. The sibling tools list confirms differentiation. The description leaves no major gaps for an agent to misuse the tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema already describes all 9 parameters with 100% coverage, so the description's role is additive. It adds useful context like the lexical guarantee holding in both modes, that the Tp language is not indexed yet, and that audits via matched_forms are possible. While it doesn't add extensive new info for each parameter, it provides enough extra guidance to justify above baseline.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states that the tool exhaustively surveys the whole Tipiṭaka for a term with a guaranteed complete lexical search. It distinguishes itself from `search_by_keyword` by emphasizing coverage and counting rather than ranked results. The verb 'survey' combined with the resource 'corpus' and the term 'exhaustive' make the purpose very specific and unambiguous.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly states when to use this tool ('when the question is about coverage or counting') and when not to ('not search_by_keyword for best passages'). It provides concrete examples of appropriate questions and directly names the alternative tool (`search_by_keyword`) with a clear contrast in capabilities (ranked, capped, no total). This makes usage guidance excellent.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

Install Server

Other Tools

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/dhamma-seeker/tripitaka-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server