Vocab Voyage

Name: Vocab Voyage
Author: jaideepdhanoa

by io.github.jaideepdhanoa

Server Details

20 MCP tools + 17 widgets for SAT/ISEE/SSAT/GRE/GMAT/LSAT prep. Flashcards, quizzes & games. Hosted.

Status: Healthy
Last Tested: 2026-05-25 11:33
Transport: Streamable HTTP
URL
Repository: jaideepdhanoa/vocab-voyage-mcp
GitHub Stars: 0
Server Listing: Vocab Voyage

Glama MCP Gateway

Connect through Glama MCP Gateway for full control over tool access and complete visibility into every call.

MCP client

Glama

MCP server

Full call logging

Every tool call is logged with complete inputs and outputs, so you can debug issues and audit what your agents are doing.

Tool access control

Enable or disable individual tools per connector, so you decide what your agents can and cannot do.

Managed credentials

Glama handles OAuth flows, token storage, and automatic rotation, so credentials never expire on your clients.

Usage analytics

See which tools your agents call, how often, and when, so you can understand usage patterns and catch anomalies.

100% free. Your data is private.

Tool Definition Quality

A3.6/5.0

Tool DescriptionsA

Average 4/5 across 31 of 31 tools scored. Lowest: 2.7/5.

Server CoherenceA

Disambiguation4/5

Tools are mostly well-disambiguated with specific descriptions and usage notes, but the high number (31) creates potential confusion. Some pairs like get_session_detail and get_child_session_detail or get_class_session_trends and get_session_trends could cause misselection if the agent doesn't pay close attention to the audience restrictions.

Naming Consistency4/5

Naming broadly follows verb_noun pattern (e.g., get_definition, generate_quiz, file_support_ticket). However, there are minor inconsistencies like set_persona (set_) and update_adaptive_level (update_) among many get_ calls, and mixing of 'detail' vs 'trends' for similar endpoints.

Tool Count2/5

With 31 tools, the surface is overly fine-grained. Several tools could be merged (e.g., mark_word_difficult and mark_word_known into one tool with a parameter). The presence of meta-tools like list_starter_prompts and get_recommended_next_action adds clutter. This number will likely overwhelm agents.

Completeness5/5

The tool set covers nearly every aspect of a vocabulary learning app: definitions, flashcards, quizzes, games, study plans, progress tracking, mistakes, session details, adaptive levels, XP/streaks, support tickets, parent/tutor features, and guidance. No obvious gaps for the stated domain.

Available Tools

31 tools

award_game_xpBInspect

Award score-based XP from a game/activity (separate from study-time XP). Cascades to the leaderboard via DB trigger. Requires sign-in.

ParametersJSON Schema

Name	Required	Description	Default
`xp`	Yes	XP to award (>= 0).
`reason`	No	Optional human label for analytics.

Tool Definition Quality

B3.4/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses cascading to leaderboard via DB trigger and sign-in requirement. Since annotations are absent, this adds needed behavioral context. However, could elaborate on side effects (e.g., irreversibility, rate limits).

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences, each adding value: purpose, side effect, prerequisite. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema and no annotations, the description provides purpose, a key side effect, and auth requirement. Could improve by specifying success/error responses, but adequate for a simple input tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the schema already documents both parameters. The description adds no additional meaning beyond 'score-based' and 'separate from study-time XP'.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it awards score-based XP from a game/activity, distinguishing it from study-time XP. The verb 'Award' and resource 'XP' are specific, but the tool name itself is redundant.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Implied usage: use for game/activity XP, not study-time XP. No explicit alternatives or when-not-to-use mentioned, but the context from siblings suggests other tools handle different XP types.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

explain_word_in_contextB

Read-onlyIdempotent

Inspect

Explain what a word means inside a specific sentence — useful when a word has multiple meanings.

ParametersJSON Schema

Name	Required	Description	Default
`word`	Yes
`sentence`	Yes	The sentence containing the word

Tool Definition Quality

B3.2/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden of behavioral disclosure. It states the tool 'explains' a word in context, implying a read-only operation, but doesn't describe the output format, potential errors (e.g., if the word isn't in the sentence), or any rate limits. For a tool with no annotations, this leaves significant behavioral gaps.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise and front-loaded: a single sentence stating the purpose, followed by a brief usage note. Every word earns its place with no redundancy or fluff, making it easy to parse quickly.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no annotations, no output schema, and incomplete parameter documentation (50% schema coverage), the description is insufficient. It doesn't explain what the explanation output looks like, how it handles edge cases, or the tool's limitations. For a tool that likely returns textual explanations, more context is needed.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 50% (only 'sentence' has a description). The description adds no parameter-specific semantics beyond what's implied by the tool's purpose. It doesn't explain the 'word' parameter's format or constraints, nor does it clarify the relationship between 'word' and 'sentence'. With low schema coverage, the description fails to compensate adequately.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Explain what a word means inside a specific sentence.' It specifies the verb ('explain'), resource ('word'), and context ('inside a specific sentence'), distinguishing it from generic dictionary tools. However, it doesn't explicitly differentiate from siblings like 'get_definition' beyond mentioning multiple meanings.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides implied usage guidance: 'useful when a word has multiple meanings.' This suggests when to use this tool (for disambiguation in context) but doesn't explicitly state when not to use it or name alternatives like 'get_definition' for standalone definitions. No prerequisites or exclusions are mentioned.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

file_support_ticketAInspect

File a real human-followup support ticket on behalf of the signed-in user. Use this when the user reports a billing problem, bug, account lockout, complaint about a tutor, or anything Sparkle/the agent cannot resolve from data. The ticket is emailed to the support team and a confirmation is sent to the user with a 1-business-day SLA. Categories: billing, bug, account, complaint, feedback, other. Requires sign-in.

ParametersJSON Schema

Name	Required	Description
`summary`	Yes	One-line description of the issue (what the user needs).
`category`	Yes	Issue category. Use 'billing' for refunds/charges, 'bug' for crashes/data loss, 'account' for lockouts/access, 'complaint' for tutor/quality issues, 'feedback' for feature requests.
`conversation_snippet`	No	Optional: last few turns of the conversation for context.

Tool Definition Quality

A4.6/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description discloses key behavioral traits beyond annotations: the ticket is emailed to support, a confirmation is sent to the user, and there is a 1-business-day SLA. It also notes the requirement for sign-in. These details are not captured in annotations and add important context for the agent.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise at four sentences, front-loaded with the primary purpose. Each sentence serves a distinct function: purpose, usage guidance, process details, and list of categories. No extraneous information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description covers the tool's purpose, usage guidance, behavioral expectations, and parameter categories comprehensively. However, it does not mention what the tool returns (e.g., ticket ID or confirmation), and there is no output schema to compensate. For a tool that likely produces a result, this leaves a minor gap in understanding the full interaction.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% coverage with descriptions for all parameters. The tool description does not add significant new information about parameters beyond what the schema already provides (e.g., category enum descriptions are present in schema). Thus, the description adds minimal semantic value, earning a baseline score of 3.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('File a real human-followup support ticket'), the resource (a support ticket), and the scope (on behalf of the signed-in user). It also lists specific use cases, making it distinct from sibling tools like get_flashcards or award_game_xp.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly says when to use this tool ('when the user reports a billing problem, bug, account lockout...') and implicitly when not to use it ('anything Sparkle/the agent cannot resolve from data'). This provides clear decision criteria for the agent.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

generate_quizB

Read-only

Inspect

Use this when the user wants to practice, be quizzed, or test their knowledge across multiple words at once. Generates a 1–10 question multiple-choice quiz for a test family (isee, ssat, sat, psat, gre, gmat, lsat, general). Renders the interactive Vocab Voyage quiz widget on supporting hosts; per-answer taps persist mastery for signed-in users. Do not use for definition lookups — call get_definition instead. Do not use for spaced-repetition flashcards — call get_flashcards instead.

ParametersJSON Schema

Name	Required	Description
`count`	No	Number of questions (1–10), default 5
`level`	No	Optional difficulty hint
`test_family`	Yes

Tool Definition Quality

B3.4/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It mentions the tool generates quizzes and returns questions with answers, but lacks details on behavioral traits such as whether it requires authentication, rate limits, how it selects vocabulary (e.g., random vs. based on level), or if the output is deterministic. For a tool with no annotations, this is a significant gap in transparency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is highly concise and front-loaded, consisting of a single sentence that efficiently conveys the core functionality (generate quiz), target (test families), and output (1–10 questions with answers). Every word earns its place with no redundancy or unnecessary details.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's moderate complexity (3 parameters, no output schema, no annotations), the description is adequate but incomplete. It covers the purpose and basic parameters but lacks details on behavioral aspects (e.g., how difficulty is handled, quiz format) and doesn't explain return values beyond 'questions with answers'. For a quiz generation tool, more context on output structure would be helpful.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description adds meaning beyond the input schema by specifying the test families (e.g., isee, ssat, sat) which are not enumerated in the schema's 'test_family' property. It also clarifies the 'count' parameter range (1–10) and default (5), though the schema already describes this. With 67% schema description coverage (2 of 3 parameters described), the description compensates well by providing context for the undocumented 'test_family' parameter.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Generate a multiple-choice vocabulary quiz for a test family' with specific test families listed. It distinguishes from siblings like 'explain_word_in_context' or 'get_definition' by focusing on quiz generation rather than word explanations or definitions. However, it doesn't explicitly differentiate from 'study_plan_preview' which might also involve test preparation content.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage by specifying the target test families (isee, ssat, sat, etc.) and that it returns 1–10 questions. However, it doesn't provide explicit guidance on when to use this tool versus alternatives like 'get_course_word_list' for vocabulary lists or 'study_plan_preview' for broader study plans. No exclusions or prerequisites are mentioned.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_child_session_detailA

Read-onlyIdempotent

Inspect

Auth-only. Parent-only. Detailed breakdown for a single child's study/quiz session — accuracy, missed words, duration. Defaults to the most recent session for the parent's first linked child if no child_user_id / session_id is supplied. Ownership-gated: returns an error for unlinked children.

ParametersJSON Schema

Name	Required	Description	Default
`session_id`	No	Optional. Defaults to the child's most recent session.
`child_user_id`	No	Optional. Defaults to first linked child.

Tool Definition Quality

A4.4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true and destructiveHint=false. The description adds behavioral traits beyond annotations: 'Auth-only', 'Parent-only', 'Ownership-gated: returns an error for unlinked children'. This provides important behavioral context not captured by annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences with no wasted words. It front-loads key constraints (Auth-only, Parent-only) and provides essential details in a compact, readable format.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a read-only tool with no output schema and 0 required parameters, the description covers purpose, default behavior, authentication, and ownership. While it doesn't list all return fields beyond accuracy, missed words, and duration, this is sufficient given the tool's simplicity and annotations.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so baseline is 3. The description adds value by specifying default behavior for both parameters: defaults to the child's most recent session if session_id is omitted, and defaults to the first linked child if child_user_id is omitted. This behavioral context enhances the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool retrieves a detailed breakdown of a child's study/quiz session, listing specific fields (accuracy, missed words, duration). It distinguishes itself from siblings like get_session_detail by specifying parent-only and child scope.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context: requires authentication and parent role, defaults to most recent session for first linked child if parameters omitted, and ownership-gated for unlinked children. It lacks explicit mention of when not to use this tool versus alternatives like get_session_trends, but the context is sufficient for an agent to infer.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_class_session_trendsA

Read-onlyIdempotent

Inspect

Auth-only. Tutor-only. Aggregate class-level trends across the tutor's classes (default 14 days, max 30). Pass class_id to scope to one class; omit it to get a worst-first rollup across up to 25 classes plus 1–3 struggling students.

ParametersJSON Schema

Name	Required	Description	Default
`days`	No	Window size in days (default 14, max 30).
`class_id`	No	Optional class id to scope to one class.

Tool Definition Quality

A4.7/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint and destructiveHint, but the description adds valuable behavioral context: auth/role requirements, default and maximum time window, and the rollup behavior with/without class_id. It does not contradict annotations. Slight deduction for not mentioning potential data limits or caching.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, front-loaded with auth requirements, then behavior. Every sentence adds value, no fluff. Highly efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

No output schema exists, but the description provides a reasonable notion of what is returned ('aggregate trends', 'worst-first rollup', 'struggling students'). Could be more explicit about the exact fields or format, but it's sufficient for an AI agent to understand the tool's output.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions, but the description adds extra meaning beyond the schema: it explains the effect of omitting 'class_id' (rollup across 25 classes + struggling students) and reinforces defaults/max for 'days'.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb ('Aggregate class-level trends') and resource ('class session trends'), and distinguishes from siblings like 'get_session_trends' by specifying scope ('across the tutor's classes').

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states auth and role requirement ('Auth-only. Tutor-only.'), and provides clear guidance on when to use the 'class_id' parameter versus omitting it, including the behavior difference ('worst-first rollup across up to 25 classes plus 1–3 struggling students').

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_class_standingA

Read-onlyIdempotent

Inspect

Use this when a signed-in student asks how they're doing in their tutor class, who's ahead, who their rival is, or who they should challenge. Auth-only. Returns weekly XP rank inside the user's tutor class plus a winnable rival suggestion (similar weekly XP). NEVER name the class leader unless the user is rank #1 — the response uses '(top student)' as a deliberate placeholder. Renders the interactive class-standing widget on supporting hosts; falls back to markdown elsewhere. Anonymous callers receive a sign-in prompt.

ParametersJSON Schema

Name	Required	Description	Default
No parameters

Tool Definition Quality

A4.3/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description reveals key behaviors: auth-only, privacy of class leader (placeholder 'top student' unless user is rank #1), and rendering differences based on host support. Annotations confirm read-only and idempotent nature; description adds context beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is somewhat verbose but front-loaded with purpose and usage. Each sentence contributes value, though a slight reduction could improve conciseness.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With no parameters and no output schema, the description adequately covers what the tool returns (rank, rival) and key behaviors (auth, leader privacy, rendering fallback). Complete for a zero-parameter tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

No parameters exist, so baseline is 4. The description adds no parameter info, which is appropriate since there are none.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states that the tool returns weekly XP rank inside the user's tutor class and a winnable rival suggestion. It uses specific verbs and resource names, and distinguishes itself from siblings by focusing on class standing and rival challenges.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly lists when to use this tool (signed-in student asking about class standing, rivals) and notes auth requirement and anonymous fallback. It lacks explicit alternative guidance but is clear enough for the intended scenarios.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_course_word_listA

Read-onlyIdempotent

Inspect

Get a sample of vocabulary words from a specific Vocab Voyage course. Use list_courses to discover slugs.

ParametersJSON Schema

Name	Required	Description	Default
`limit`	No	1–50, default 20
`course_slug`	Yes

Tool Definition Quality

A3.5/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden of behavioral disclosure. It mentions that it returns a 'sample' of words, which hints at limited output, but doesn't specify details like pagination, rate limits, authentication needs, or error handling. For a read operation with no annotation coverage, this leaves significant gaps in understanding how the tool behaves beyond basic functionality.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences with zero waste: the first states the purpose, and the second provides essential usage guidance. It's front-loaded with the core action and efficiently includes only necessary information, making it easy for an agent to parse quickly.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has 2 parameters, 50% schema coverage, no annotations, and no output schema, the description is minimally adequate. It covers purpose and basic usage but lacks details on return values, error cases, or behavioral traits. For a simple read tool, it's passable but could be more complete to fully guide the agent.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 50%, with 'limit' having a description ('1–50, default 20') but 'course_slug' undocumented. The description adds value by explaining that 'course_slug' comes from 'list_courses,' providing context beyond the schema. However, it doesn't fully compensate for the lack of schema info on 'course_slug,' such as format or examples, keeping it at the baseline for partial coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Get a sample of vocabulary words') and the target resource ('from a specific Vocab Voyage course'), making the purpose understandable. It distinguishes from siblings like 'get_definition' or 'list_courses' by specifying it's about course vocabulary samples. However, it doesn't explicitly differentiate from 'study_plan_preview' which might also involve course content, keeping it from a perfect score.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for when to use this tool by stating 'Use list_courses to discover slugs,' which implies this tool should be used after identifying a course via 'list_courses.' It doesn't explicitly mention when not to use it or name alternatives among siblings, but the guidance is practical and helpful for agent workflow.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_definitionA

Read-onlyIdempotent

Inspect

Use this when the user asks what a specific word means, requests its definition, part of speech, synonyms/antonyms, or an example sentence. Returns curated dictionary data from the Vocab Voyage corpus. Do not use for sentence-level meaning disambiguation (call explain_word_in_context) or for daily word prompts (call get_word_of_the_day).

ParametersJSON Schema

Name	Required	Description	Default
`word`	Yes	The word to define

Tool Definition Quality

A3.5/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden of behavioral disclosure. While it states what information is returned (definition, part of speech, etc.), it doesn't describe behavioral traits such as error handling (e.g., for misspelled words), rate limits, authentication needs, or whether it accesses a specific dictionary source. This leaves gaps for an agent to understand operational constraints.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that front-loads all key information without unnecessary words. It directly lists the outputs (definition, part of speech, example sentence, synonyms/antonyms) and the action ('look up'), making it easy for an agent to parse quickly.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's low complexity (one parameter, no nested objects) and lack of annotations or output schema, the description is minimally adequate. It covers the core purpose but lacks details on behavioral aspects like error responses or data sources. For a simple lookup tool, this might suffice, but it doesn't provide complete operational context for an agent.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema description coverage is 100%, with the parameter 'word' clearly documented as 'The word to define'. The description adds no additional semantic context beyond what the schema provides, such as format constraints (e.g., case sensitivity) or examples. With high schema coverage, the baseline score of 3 is appropriate as the description doesn't compensate but doesn't detract either.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with specific verbs ('look up') and resources ('definition, part of speech, example sentence, and synonyms/antonyms for a vocabulary word'). It distinguishes from siblings like 'explain_word_in_context' (which focuses on contextual usage) and 'generate_quiz' (which creates assessments).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for vocabulary word lookup but doesn't explicitly state when to use this tool versus alternatives. For example, it doesn't clarify if this should be used instead of 'explain_word_in_context' for basic definitions or 'get_word_of_the_day' for curated words. No explicit exclusions or prerequisites are mentioned.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_flashcardsA

Read-only

Inspect

Use this when the user asks for flashcards, wants to drill words individually, or wants a tap-to-flip review session. Returns 1–12 cards for a test family. Renders the interactive Vocab Voyage flashcards widget on supporting hosts; per-card 'I knew it / I didn't' buttons persist mastery for signed-in users. Do not use for multiple-choice testing (call generate_quiz) or for a single word lookup (call get_definition).

ParametersJSON Schema

Name	Required	Description	Default
`count`	No	Number of cards (1–12), default 5
`test_family`	No	isee, ssat, sat, psat, gre, gmat, lsat, general

Tool Definition Quality

A4/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations are not provided, so description carries full burden. It discloses rendering behavior (widget vs markdown) and persistence of mastery for signed-in users, which is helpful. However, it does not mention any destructive actions or rate limits. Without annotations, a 3 is appropriate as it adds some context but not comprehensive behavioral disclosure.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is concise (three sentences) and front-loaded: first sentence states core purpose, second explains rendering behavior, third adds detail on persistence. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema, the description adequately covers the tool's function and behavior. It explains the return format (deck of cards), rendering differences, and persistence. The complexity is low; missing details like whether the cards are shuffled or sorted are minor. A 4 is appropriate.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with clear descriptions for count (number of cards, default 5) and test_family (enum of test names). The description adds context about the range (1-12) and default, and explains the rendering and persistence behavior, which goes beyond the schema. Thus a 4 is justified.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the tool returns a small flashcard deck (1–12 cards) for a test family. It also distinguishes itself by mentioning widget rendering and persistence of mastery, which sets it apart from sibling tools like generate_quiz or get_sparkle_guidance.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Description implies usage for reviewing vocabulary flashcards but doesn't explicitly state when to use this tool vs alternatives like generate_quiz. No guidance on when not to use it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_my_progressA

Read-only

Inspect

Use this when the signed-in user asks about their own streak, XP, words mastered, recent activity, or 'how am I doing'. Auth-only personal dashboard. Renders the interactive Vocab Voyage progress widget on supporting hosts; falls back to markdown elsewhere. Anonymous callers receive a sign-in prompt. Do not use for global stats or other users' progress.

ParametersJSON Schema

Name	Required	Description	Default
No parameters

Tool Definition Quality

A4.4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided, so the description carries full burden. It clearly indicates that the tool is auth-only, renders a widget on supporting hosts or falls back to markdown, and handles anonymous users with a prompt. This provides good behavioral context beyond the name.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, each adding distinct value. First sentence lists data fields, second sentence explains rendering behavior and auth handling. No redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given zero parameters, no output schema, and no annotations, the description covers the essential aspects: what data it returns, auth requirement, fallback behavior. Could optionally mention if data is read-only, but overall complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema has zero parameters and schema coverage is 100%. The description explains that it returns user-specific progress data without needing any parameters, which is sufficient.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Describes the tool as an auth-only dashboard for the signed-in user, listing specific data returned (streak, XP, mastery split, next-up words, recent misses). Distinguishes clearly from siblings by focusing on personal progress summary.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states it is for the signed-in user, and that anonymous callers get a sign-in prompt. However, it does not explicitly mention when not to use this tool vs alternatives (e.g., when to use get_recommended_next_action instead).

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_pending_invitesA

Read-onlyIdempotent

Inspect

Use this when the signed-in user asks about pending parent invites, share codes, or whether their parent invite has been accepted yet. Returns each pending invite with hours_until_expiry. RULE: if any invite has hours_until_expiry < 24 (and not expired), proactively offer to resend it via the resend-parent-invite flow. If expired, offer to send a fresh invite. Requires sign-in.

ParametersJSON Schema

Name	Required	Description	Default
No parameters

Tool Definition Quality

A4.5/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnly and idempotent. Description adds valuable behavioral context: returns hours_until_expiry, proactive resend rule, and sign-in requirement.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Concise, front-loaded with usage condition, and includes a clear rule. Every sentence adds value with no wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given zero parameters and no output schema, description covers purpose, usage, return info, and behavioral rules comprehensively.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

No parameters, so schema coverage is 100% trivially. Description adds value by mentioning output fields (hours_until_expiry), fulfilling baseline expectations.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states it handles pending parent invites, share codes, and invite acceptance status. Distinguishes from sibling tools as no other tool mentions invites.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states when to use ('when the signed-in user asks about...'), includes a rule for proactive action based on expiry, and notes requirement for sign-in. Lacks explicit when-not or alternatives but sufficient.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_recent_mistakesA

Read-onlyIdempotent

Inspect

Use this when the signed-in user asks about words they've gotten wrong, missed words, words to review, or wants to revisit recent mistakes. Returns up to 25 words from the last N days (default 7) with miss-rate and last-seen timestamp, plus a link to the in-app Recent Mistakes page. SUMMARISE — never dump every row; tell the user the count, name 2–3 sample words, and recommend the page URL. Requires sign-in.

ParametersJSON Schema

Name	Required	Description	Default
`days`	No	Lookback window in days (1–90, default 7)
`limit`	No	Max words to return (1–50, default 10)

Tool Definition Quality

A4.7/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate read-only and idempotent. Description adds concrete behavioral details: returns up to 25 words, default 7 days, fields (miss-rate, last-seen, URL), and requires sign-in. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences plus a succinct instruction. Front-loaded with use case, then returns, then summarization rule. Every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Completely covers the tool's purpose, parameters, return fields, and usage guidelines. No gaps given the simple nature and existing schema/annotations.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema covers 100% of parameters with descriptions. Description adds default values for days (7) and hints at max limit (25, though schema allows up to 50—minor inconsistency, but still helpful). Also clarifies summarization behavior.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clear verb+resource: retrieves recent mistakes (words gotten wrong). Distinguishes from siblings like get_flashcards or get_word_of_the_day by specifying the use case exactly.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states when to use ('when the signed-in user asks about words they've gotten wrong...') and provides a summarization instruction. Does not explicitly mention when not to use, but the context is clear given sibling tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_recommended_next_actionA

Read-only

Inspect

One-line 'do this next' hint derived from the user's current lifecycle phase. Useful when the agent wants a quick recommendation without rendering a full guidance card.

ParametersJSON Schema

Name	Required	Description	Default
`persona`	No	Optional persona override.

Tool Definition Quality

A3.8/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided, so description carries full burden. It discloses that result is derived from lifecycle phase and is a 'hint', but does not clarify whether it modifies state, requires authentication, or what happens if no recommendation exists.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence with clear, front-loaded purpose and usage context. No unnecessary words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the simplicity (0 required params, 1 optional, no output schema), the description covers purpose and usage adequately. Could add brief note about return type or lifecycle phase sources.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% and parameter 'persona' is described as 'Optional persona override.' Description does not add additional meaning beyond schema, but the schema is clear. Baseline 3.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states it provides a 'do this next' hint based on lifecycle phase, distinguishing it from full guidance tools like get_sparkle_guidance. Verb 'get' and resource 'recommended next action' are specific.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly says 'useful when the agent wants a quick recommendation without rendering a full guidance card', which implies when to use over alternatives. Does not specify when not to use or name sibling alternatives explicitly.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_session_detailA

Read-onlyIdempotent

Inspect

Use this when the signed-in user asks 'what did I miss in [that session]', 'which words tripped me up', or 'what was my accuracy on session X'. Pass a session_id (study_sessions.id or adaptive_sessions.id, usually obtained from get_recent_session_results / a picker chip). Returns title, accuracy %, wrong_words[] (max 10), and a per-card timeline (truncated to first 20 events). Cite at least one wrong word and the accuracy in your reply.

ParametersJSON Schema

Name	Required	Description	Default
`session_id`	Yes	study_sessions.id or adaptive_sessions.id (UUID)
`include_timeline`	No	Include per-card timeline (default true). Truncated to 20 events.

Tool Definition Quality

A4/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, idempotentHint=true, and destructiveHint=false, so the description's behavioral detail is supplementary. It adds truncation limits (max 10 wrong words, first 20 timeline events) which is useful, but not necessary beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three concise sentences. First sentence gives usage context, second explains parameter and return value, third gives usage instruction. No wasted words; front-loaded with crucial information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Despite lacking an output schema, the description fully explains return fields (title, accuracy percentage, wrong_words max 10, per-card timeline truncated to 20). Combined with input schema and annotations, the tool is completely specified for a read operation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions for both parameters. The description adds context: session_id is obtained from get_recent_session_results or a picker chip, and include_timeline defaults to true with truncation. This adds meaning beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool returns session details (title, accuracy, wrong words, timeline) and provides example user queries. It distinguishes slightly by mentioning how to obtain the session_id from related tools, but does not explicitly differentiate from siblings.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly specifies when to use this tool with concrete user utterances (e.g., 'what did I miss in that session'). Provides context for when to invoke, but does not mention when not to use or suggest alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_session_trendsA

Read-onlyIdempotent

Inspect

Auth-only. Personal study trends over a window (default 14 days, max 90): session count, total minutes, accuracy trend (up/down/flat), and top-missed words. Use after a user asks 'how am I trending / am I improving / which words keep tripping me up'.

ParametersJSON Schema

Name	Required	Description	Default
`days`	No	Window size in days (default 14, max 90).

Tool Definition Quality

A4.5/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate read-only and idempotent. The description adds 'Auth-only', specifies the default window (14 days) and maximum (90), and lists the output components (session count, minutes, accuracy trend, top-missed words), providing transparency beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences cover purpose, output details, usage trigger, and parameter constraints. No redundant or extraneous text; every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple read-only tool with one optional parameter and no output schema, the description is complete. It explains what the tool returns, its default/max, and when to use it, meeting all needs for correct invocation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% and the schema description already includes 'Window size in days (default 14, max 90).' The tool description repeats this info but does not add new parameter semantics; however, it reinforces the parameter's role in context.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool provides personal study trends (session count, total minutes, accuracy trend, top-missed words) over a window. It also gives example user queries that trigger usage, distinguishing it from siblings like get_class_session_trends.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly says 'Use after a user asks...' which defines the context. It lacks explicit 'do not use' statements but the intention is clear and the sibling get_class_session_trends exists as an alternative for class-level trends.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_sparkle_guidanceA

Read-only

Inspect

Returns Vocab Voyage's lifecycle-aware guidance: the user's current phase (e.g. student.at_risk), a friendly greeting, 2–3 recommended tool calls, and an optional CTA. Renders the session-debrief widget on supporting hosts. Anonymous callers get visitor.* phase suggestions.

ParametersJSON Schema

Name	Required	Description	Default
`persona`	No	Optional persona override: student \| parent \| tutor \| explorer.

Tool Definition Quality

A4.1/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided, so description carries full burden. It discloses widget rendering on supporting hosts and anonymous behavior. However, does not mention if it has side effects (e.g., logging), rate limits, or permissions needed.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences with clear structure, front-loaded with key outputs. No wasted words, but could be slightly more structured (e.g., bullet points).

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the simple input and no output schema, the description adequately covers what the tool returns and its behavior. Could mention more about the format of recommended tool calls or CTA structure.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% for the single optional parameter. Description doesn't add much beyond the schema's description, which already covers the persona override. Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the tool returns lifecycle-aware guidance including user's current phase, greeting, recommended tool calls, and CTA. It distinguishes from siblings by specifying the widget rendering and anonymous visitor behavior.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Description implies when to use (e.g., when needing lifecycle-aware guidance) and covers anonymous callers, but doesn't explicitly state when not to use or alternative tools. Sibling names like get_recommended_next_action might overlap but no comparison given.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_study_plan_recommendationA

Read-only

Inspect

Auth-only. Returns a personalized N-day study plan (default 7, range 3–7) chosen from one of four focus modes (weak-topic-drill / streak-recovery / new-words / review-mastery) based on the user's recent trends. Inline only the first 3 days; full plan persists when the user clicks the Vocab Voyage start link.

ParametersJSON Schema

Name	Required	Description	Default
`horizon_days`	No	Plan length in days (3–7, default 7).

Tool Definition Quality

A4.4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint and non-destructive. Description adds behavioral details on focus modes, inline limitation to first 3 days, and persistence via a start link, enhancing transparency beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two concise sentences: first covers authentication and core function, second details behavioral specifics. No wasted words, well-structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given a single parameter and no output schema, the description covers focus modes, default/range, and inline behavior, sufficiently complete for the tool's complexity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema covers the parameter with 100% description, but description adds contextual info on default (7) and range (3-7), adding value beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it returns a personalized N-day study plan, specifies focus modes, and differs from siblings like 'get_recommended_next_action' or 'study_plan_preview' by detailing output and behavior.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Indicates 'Auth-only' and describes when to use for personalized study plans based on recent trends, but does not explicitly exclude alternatives or state when not to use.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_word_of_the_dayA

Read-onlyIdempotent

Inspect

Use this when the user asks for today's word, a daily vocabulary nudge, or a single-word warmup. Returns today's deterministic Word of the Day (definition, part of speech, example, synonyms/antonyms), optionally scoped to a test family (isee, ssat, sat, psat, gre, gmat, lsat, general). Do not use for arbitrary lookups — call get_definition instead.

ParametersJSON Schema

Name	Required	Description	Default
`test_family`	No	Optional test family: isee, ssat, sat, psat, gre, gmat, lsat, general

Tool Definition Quality

A3.9/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden. It discloses the tool's read-only nature implicitly through 'Returns' and adds context about the optional test family parameter, but it lacks details on rate limits, authentication needs, or what happens if no data is available, leaving behavioral gaps.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is appropriately sized with two sentences that are front-loaded and efficient. The first sentence states the core purpose, and the second adds necessary detail about the optional parameter, with zero wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's low complexity (one optional parameter, no annotations, no output schema), the description is complete enough for basic usage but lacks details on output format, error handling, or dependencies, which could be helpful for an AI agent in more complex scenarios.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema description coverage is 100%, so the schema already documents the optional 'test_family' parameter with its enum values. The description adds minimal value by mentioning the optional scoping but does not provide additional semantics beyond what the schema specifies, meeting the baseline for high coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with a specific verb ('Returns') and resource ('today's vocabulary Word of the Day'), and it distinguishes from siblings by focusing on a daily vocabulary feature rather than definitions, quizzes, or course lists.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for usage by specifying the optional test family scoping, but it does not explicitly state when to use this tool versus alternatives like 'get_definition' or 'get_course_word_list', missing explicit exclusions or comparisons.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_coursesA

Read-onlyIdempotent

Inspect

Lists all 13 Vocab Voyage courses with their slugs and descriptions.

ParametersJSON Schema

Name	Required	Description	Default
No parameters

Tool Definition Quality

A3.9/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden. It discloses that the tool returns a fixed set of 13 courses, which is useful behavioral context. However, it doesn't mention whether the list is static or dynamic, if there are rate limits, or what format the output takes (e.g., JSON array).

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, clear sentence that efficiently conveys the tool's purpose and output details without any redundant or unnecessary words. It is front-loaded with the core action and resource.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity (0 parameters, no annotations, no output schema), the description is adequate but has gaps. It specifies the number of courses and attributes returned, but doesn't describe the output structure (e.g., list format) or behavioral constraints, which could be helpful for an agent to interpret results correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The tool has 0 parameters, and schema description coverage is 100% (though empty). The description appropriately states there are no inputs needed, which aligns with the schema. Since there are no parameters to document, a baseline of 4 is applied as it doesn't need to compensate for any gaps.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description explicitly states the action ('Lists') and the resource ('all 13 Vocab Voyage courses'), including specific attributes returned ('slugs and descriptions'). It clearly distinguishes from siblings like 'get_course_word_list' (which focuses on words within a course) and 'study_plan_preview' (which is about planning).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for retrieving course metadata, but provides no explicit guidance on when to use this tool versus alternatives like 'get_course_word_list' (for word-level details) or 'explain_word_in_context' (for word explanations). It lacks any 'when-not' or prerequisite information.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_starter_promptsA

Read-onlyIdempotent

Inspect

Lists Vocab Voyage's MCP starter prompts (also exposed via the standard MCP prompts/list endpoint). Useful for hosts that don't yet support prompts/list.

ParametersJSON Schema

Name	Required	Description	Default
No parameters

Tool Definition Quality

A4.6/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries full burden. The description notes that these prompts are also exposed via the standard MCP endpoint, which provides transparency about duplication. However, it does not describe any other behavioral traits (e.g., whether it's read-only, performance, or error cases). Since it's a simple list operation with no parameters, the description is adequate but not rich.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences, each serving a purpose: the first states what the tool does and links it to the standard MCP endpoint, the second gives usage guidance. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The tool is simple (0 parameters, no output schema) with no sibling overlap. The description fully covers its purpose and usage context. Nothing is missing.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has no parameters (0), so schema description coverage is 100%. The description adds no parameter information because none is needed. Baseline for 0 params is 4, but the description is concise and complete, so a 5 is justified.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it lists Vocab Voyage's MCP starter prompts and explicitly distinguishes them from the standard MCP prompts/list endpoint. The verb 'lists' and resource 'starter prompts' are specific, and the context about the alternative endpoint adds differentiation from siblings (none of which are similar).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly says 'Useful for hosts that don't yet support prompts/list', which tells the agent when to use this tool (when the standard endpoint is unavailable) and implies when not to use it (when prompts/list works). This is clear guidance without needing to name a sibling.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

mark_word_difficultB

Idempotent

Inspect

Manually mark a word as still-learning for the signed-in user (resets mastery toward learning band). Requires sign-in.

ParametersJSON Schema

Name	Required	Description	Default
`word`	No
`card_id`	No

Tool Definition Quality

B3.2/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description discloses that the action resets mastery toward the learning band, which adds context beyond a simple 'mark as difficult'. However, with no annotations provided, it could detail more behavioral traits like whether the action is reversible or affects other data.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise with a single sentence, but the requirement note is separated by a period. It efficiently conveys the core purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has no output schema and two undocumented parameters, the description provides essential purpose and a side effect, but lacks details on parameter roles and return value, leaving gaps for the agent.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema has 0% description coverage for the two parameters (word and card_id). The description does not explain what each parameter represents or how they relate (e.g., whether both are required or how card_id is used).

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'mark' and the resource 'word as still-learning', and specifies the user scope. It distinguishes from sibling tool 'mark_word_known' by indicating a different mastery band effect.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description mentions the requirement of sign-in, which is helpful, but does not provide explicit guidance on when to use this tool versus alternatives like mark_word_known or record_word_result.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

mark_word_knownC

Idempotent

Inspect

Manually mark a word as mastered for the signed-in user (same as the flashcard 'I knew this' override). Requires sign-in.

ParametersJSON Schema

Name	Required	Description	Default
`word`	No
`card_id`	No

Tool Definition Quality

C2.7/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description mentions it requires sign-in and marks a word as mastered, which implies a write operation with authentication. With no annotations provided, the description carries the full burden, but it lacks details on whether the action is reversible, if it affects other data, or any rate limits.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is short and front-loaded with the main action. Every sentence adds value: the first states the purpose, the second provides analogy and constraint. No redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no annotations, no output schema, and 0% parameter coverage, the description is insufficient. It does not explain return values, side effects, or how the two parameters interact. For a tool that likely mutates user state, more context is needed.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 0%, so the description must compensate. It does not explain what 'word' or 'card_id' are used for, nor their relationship. The description only adds that the tool requires sign-in, which is behavioral, not parameter-specific.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool marks a word as mastered for the signed-in user, and equates it to a flashcard 'I knew this' override. It specifies the action and resource, though it doesn't differentiate from 'mark_word_difficult' which is a sibling with similar purpose.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus alternatives like 'mark_word_difficult' or 'record_word_result'. The mention of 'same as the flashcard override' gives some context, but no explicit when-to-use or when-not-to-use.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

nudge_childAInspect

Parent-only. Sends a 'check-in' push notification (and email fallback) to a linked child. Use when the parent says things like 'remind my kid to study', 'nudge my child', 'tell Sam to do their words today'. The server enforces a 24h cooldown per child — if rate-limited the response includes retry_after_hours. NEVER spoof a different parent — the calling user must already be linked to the child. Requires sign-in.

ParametersJSON Schema

Name	Required	Description
`reason`	No	Optional short reason (≤200 chars), e.g. 'streak at risk'
`message`	No	Optional personal message (≤280 chars) shown to the child
`child_user_id`	Yes	user_id of the linked child to nudge

Tool Definition Quality

A4.4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Describes server-enforced 24h cooldown with retry_after_hours in response, and email fallback. Annotations already declare non-read-only and non-destructive, so description adds value with these behavioral details.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four sentences, no fluff. Front-loaded with key info. Every sentence earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers prerequisites, behavior, cooldown, and fallback. No output schema, but response details (retry_after_hours) are mentioned. Missing explicit error handling but sufficient for a tool of this complexity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%. Description adds examples and character limits for parameters (e.g., 'Optional short reason (≤200 chars)') which go beyond the schema descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Explicitly states 'Sends a check-in push notification (and email fallback) to a linked child' and gives specific example phrases that trigger the tool. Clearly distinguishes itself from sibling tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit when-to-use examples like 'remind my kid to study'. Also states constraints: parent-only, requires sign-in, never spoof parent, and mentions 24h cooldown. Does not explicitly mention alternative tools but given siblings are unrelated, this is sufficient.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

play_gameAInspect

Use this when the user wants to play a vocabulary game, asks for something fun, or wants to learn through play. Launches one of 11 mini-games inside the host chat. Renders the matching ui://vocab-voyage/game/{slug} widget on supporting hosts; falls back to a deep link elsewhere. Per-question answers persist via record_word_result; round completion fires record_session_complete + award_game_xp so MCP play counts toward streaks, XP, and mastery for signed-in users. Supported slugs: word_match, spelling_bee, speed_round, synonym_showdown, word_scramble, fill_in_blank, context_clues, word_guess, picture_match, crossword, word_search. Do not use for a serious test-prep quiz — call generate_quiz instead.

ParametersJSON Schema

Name	Required	Description
`slug`	Yes	Game slug: word_match \| spelling_bee \| speed_round \| synonym_showdown \| word_scramble \| fill_in_blank \| context_clues \| word_guess \| picture_match \| crossword \| word_search
`count`	No	Words in the round (4–12, default 8)
`test_family`	No	Optional: isee, ssat, sat, psat, gre, gmat, lsat, general

Tool Definition Quality

A4.6/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description does a good job disclosing behavior: it renders a widget or falls back to a deep link, persists answers via record_word_result, and fires record_session_complete + award_game_xp. However, it doesn't clarify if any destructive side effects occur (e.g., overwriting progress) or mention authentication needs.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is dense but efficient, covering all key aspects in a few sentences. It could be slightly more structured (e.g., separate sections for behavior, parameters, usage), but it's not verbose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema and no annotations, the description covers purpose, behavior, parameters, side effects, and related tools. It provides enough information for an agent to decide when and how to invoke this tool correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so baseline is 3. The description adds value by explaining the purpose of 'count' (words in round, default 8) and 'test_family' (optional for tailored vocab), and reiterates the slugs list. It doesn't add much beyond what's in the schema, but the additional context on defaults and intent justifies a 4.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it launches a Vocab Voyage mini-game, specifies the UI widget or fallback behavior, and lists all supported slugs. It distinguishes itself from sibling tools by describing its role as launching a game, not scoring or managing words.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explains when to use this tool (to launch a game) and explicitly mentions alternatives: use list_courses + test_family hint for tailored vocab. It also hints at when not to use it (for word lookup or progress tracking, which have their own tools).

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

record_session_completeBInspect

Record a completed study session: writes study_sessions, awards study-time XP (+1/min, capped 30/day), and updates the daily streak. Use after a play_game / quiz / flashcard session ends. Requires sign-in.

ParametersJSON Schema

Name	Required	Description
`deck_id`	No	Optional deck UUID; omit for ad-hoc MCP sessions.
`total_count`	No
`session_type`	No	e.g. mcp_word_match, mcp_flashcard, mcp_quiz.
`cards_studied`	Yes
`correct_count`	No
`session_title`	No
`time_spent_seconds`	Yes

Tool Definition Quality

B3.4/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided, so description carries full burden. It discloses mutation (writes, awards, updates), but doesn't specify error states, idempotency, or concurrency behavior. The XP cap and daily streak update are useful behavioral details.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Concise, three sentences with clear structure: action + effects + usage context. No wasted words, but could be more efficient by removing redundant phrasing.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a complex tool with 7 params and no output schema, the description provides essential behavioral context (XP rate, daily cap, streak) but lacks detail on many input parameters and return value. The sign-in requirement is helpful.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is low (29%). Description only mentions time_spent_seconds implicitly via XP rate, but doesn't explain cards_studied, correct_count, total_count, or session_title beyond schema. Many parameters lack meaningful description in both schema and description.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the tool records a completed study session and lists three specific side effects (writes to study_sessions, awards XP, updates streak). It also distinguishes from siblings by suggesting use after specific activities (play_game, quiz, flashcard). However, it doesn't differentiate from sibling tools like record_word_result.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly says 'Use after a play_game / quiz / flashcard session ends' and mentions sign-in requirement. Lacks explicit when-not-to-use or alternatives for similar tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

record_word_resultAInspect

Persist a single word answer (correct/incorrect) to the user's mastery progress. Mirrors the web app's word-mastery scaling so MCP study counts toward leaderboards and streaks. Requires sign-in.

ParametersJSON Schema

Name	Required	Description
`word`	No	The word answered (preferred for human input).
`card_id`	No	Card UUID (preferred when known from a prior tool result).
`is_correct`	Yes
`question_type`	No	e.g. multiple_choice, fill_in_blank, flashcard.
`quiz_attempt_id`	No	Optional quiz_attempt UUID to record a per-question row.
`selected_answer`	No
`time_taken_seconds`	No

Tool Definition Quality

A4.2/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries full burden. It discloses that results count toward leaderboards and streaks, and that it requires sign-in. However, it does not mention whether the operation is idempotent, what happens on duplicate answers, or if there are rate limits. The description adds moderate value but lacks depth for a mutation tool.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three sentences with no wasted words. It front-loads the purpose, then adds behavioral context (mirroring, leaderboards), and ends with a prerequisite. Every sentence earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema and moderate parameter complexity (7 params, 1 required), the description is reasonably complete. It explains the tool's role in the larger system (mastery progress, leaderboards). However, it does not describe return values or error conditions, which would be expected for a mutation tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 57%, and the description does not detail parameters. However, the schema itself provides descriptions for most parameters, including usage hints (e.g., 'preferred for human input' for word, 'preferred when known from a prior tool result' for card_id). The description compensates by explaining the overall purpose but does not add meaning beyond schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it persists a single word answer (correct/incorrect) to mastery progress, with specific verb 'persist' and resource 'word answer'. It distinguishes from sibling tools like 'mark_word_known' or 'record_session_complete' by specifying it records individual answer correctness for mastery scaling.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explains when to use it: to record a single word answer for mastery progress, and that it mirrors web app scaling. It mentions the prerequisite 'requires sign-in' but does not explicitly state when not to use it (e.g., for batch recording or session-level recording, which would be covered by 'record_session_complete').

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

resend_pending_inviteAInspect

Resend a pending parent invite by id. Use after get_pending_invites surfaces an invite expiring in <24h, or when the user explicitly asks to resend. Re-emails the existing invite_token; no new code is generated. 60s per-invite cooldown. Caller must own the invite. Requires sign-in.

ParametersJSON Schema

Name	Required	Description	Default
`invite_id`	Yes	The id field returned by get_pending_invites.

Tool Definition Quality

A4.3/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Adds behavioral details beyond annotations: 60s cooldown, re-emails existing token, caller must own invite. Annotations already indicate mutation, so credit for extra context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Multiple sentences each add value; no fluff. Front-loaded with core action. Could be slightly more structured but clear and efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple 1-param tool with no output schema, covers purpose, usage, behavioral traits, and permissions. Lacks error handling details but adequate given simplicity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema covers invite_id with adequate description. The tool description adds little beyond 'by id' and referencing get_pending_invites. Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states 'Resend a pending parent invite by id' and provides specific use cases when to invoke, distinguishing from siblings like get_pending_invites.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly tells when to use (after expiring invite or user request), what it doesn't do (no new code), and includes cooldown and ownership prerequisites.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

set_personaA

Read-onlyIdempotent

Inspect

Bias subsequent Sparkle guidance toward a persona (student | parent | tutor | explorer). Session-scoped: the host should pass the chosen persona back to get_sparkle_guidance.

ParametersJSON Schema

Name	Required	Description	Default
`persona`	Yes	student \| parent \| tutor \| explorer

Tool Definition Quality

A4.1/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description discloses that the effect is session-scoped, which is a key behavioral trait beyond what annotations (none provided) would indicate. It also mentions that the persona needs to be passed back to get_sparkle_guidance, which clarifies a dependency. However, it does not disclose any other behaviors like whether it overwrites previous settings or has side effects.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise: two sentences that cover purpose, scope, and usage. Every sentence is necessary and informative. There is no wasted text.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity (single parameter, no output schema), the description is nearly complete. It explains what the tool does, its scope, and how it fits with a sibling tool. The only minor gap is that it does not specify whether setting the persona persists across multiple calls or only affects the next guidance call, but 'session-scoped' implies the duration.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description adds meaning beyond the schema by explaining the role of the persona parameter: it biases guidance toward a specific persona. The schema lists the allowed values but the description clarifies that this is used to influence get_sparkle_guidance. Since schema coverage is 100% and there is only one parameter, the description provides useful context, earning a score above baseline.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: to bias subsequent Sparkle guidance toward a specific persona. The verb 'bias' and resource 'Sparkle guidance' are specific. It distinguishes itself from siblings like 'get_sparkle_guidance' by indicating it sets a session-scoped parameter that affects future guidance, rather than directly retrieving guidance.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context: it is session-scoped and should be used before calling get_sparkle_guidance. It tells when to use it (to bias guidance) and hints that the host should pass the chosen persona back to get_sparkle_guidance, implying a workflow. However, it does not explicitly exclude any alternatives or provide when-not-to-use guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

study_plan_previewA

Read-onlyIdempotent

Inspect

Use this when the user asks for a study plan, a multi-day prep schedule, or how to prepare for a test by date. Returns a 7-day plan (5 words/day) for a given test family. Renders the interactive Vocab Voyage study-plan widget on supporting hosts; tapping 'Start Day N' launches a flashcard session seeded with that day's words. Do not use for a single quiz session — call generate_quiz instead. Do not use for one-off lookups — call get_definition instead.

ParametersJSON Schema

Name	Required	Description	Default
`target_date`	No	Optional ISO date (YYYY-MM-DD)
`test_family`	Yes

Tool Definition Quality

A3.7/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden of behavioral disclosure. It states the tool returns a sample plan but does not describe what 'sample' entails (e.g., mock data, limited scope), whether it requires authentication, rate limits, or error handling. The description lacks details on behavioral traits beyond the basic operation, leaving significant gaps for an agent to understand its full behavior.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is front-loaded with the core purpose in the first clause and efficiently adds optional details in a second clause. Both sentences earn their place by specifying key constraints (7-day, 5 words/day) and parameter roles, with zero wasted words or redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's moderate complexity (2 parameters, no annotations, no output schema), the description is minimally adequate. It covers the basic operation and parameters but lacks details on output format, error conditions, or behavioral nuances. Without annotations or output schema, more completeness is needed for an agent to use it effectively, but it meets a baseline for a read-only tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 50% (only 'target_date' has a description). The description adds value by clarifying that 'test_family' is required and 'target_date' is optional for context, which is not evident from the schema alone. However, it does not explain the semantics of 'test_family' (e.g., what values are valid) or provide examples, partially compensating for the coverage gap.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('Returns a sample 7-day study plan') with precise resource details (5 words/day for a given test family). It distinguishes itself from sibling tools like 'generate_quiz', 'get_course_word_list', and 'get_word_of_the_day' by focusing on multi-day planning rather than immediate content generation or retrieval.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage context by mentioning 'for a given test family' and 'optional target date for context', but it does not explicitly state when to use this tool versus alternatives like 'list_courses' or 'get_course_word_list'. No exclusions or specific prerequisites are provided, leaving the guidelines at an implied level.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

update_adaptive_levelCInspect

Run the adaptive-mastery promotion logic for the signed-in user (delegates to the web app's update-adaptive-mastery function). Requires sign-in.

ParametersJSON Schema

Name	Required	Description	Default
`course_id`	No
`total_count`	Yes
`correct_count`	Yes
`words_studied`	Yes

Tool Definition Quality

C2.9/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so description must disclose behavioral traits. It mentions delegation to a web function and requires sign-in, but doesn't explain if it mutates data, returns a result, or has side effects. It doesn't mention rate limits, idempotency, or error conditions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise: two sentences. However, the first sentence is slightly dense and could be more readable. It front-loads the main action, but the parenthetical could be clearer.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the lack of output schema and annotations, the description is incomplete. It doesn't explain what the tool returns, if any, or how the parameters affect the logic. For a tool with 4 parameters and no additional documentation, it leaves significant gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0% (no descriptions in schema), but the description does not explain what the parameters mean. 'words_studied', 'correct_count', and 'total_count' are self-explanatory from their names, but 'course_id' is not described. However, the schema already provides types, so the description adds no semantic value.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states it runs adaptive-mastery promotion logic, which is a specific action on a resource (the user's adaptive level). It distinguishes from siblings like 'generate_quiz' or 'study_plan_preview', but doesn't fully clarify what the tool outputs or changes.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description says 'requires sign-in' but provides no guidance on when to use this tool vs alternatives. Among siblings like 'get_my_progress' or 'record_word_result', it's unclear when one should call this function. No exclusions or alternatives mentioned.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

Claim this connector by publishing a /.well-known/glama.json file on your server's domain with the following structure:

{
  "$schema": "https://glama.ai/mcp/schemas/connector.json",
  "maintainers": [{ "email": "your-email@example.com" }]
}

The email address must match the email associated with your Glama account. Once published, Glama will automatically detect and verify the file within a few minutes.

Vocab Voyage

Server Details

Tool Definition Quality

Available Tools

Tool Definition Quality

Tool Definition Quality

Tool Definition Quality

Tool Definition Quality

Tool Definition Quality

Tool Definition Quality

Tool Definition Quality

Tool Definition Quality

Tool Definition Quality

Tool Definition Quality

Tool Definition Quality

Tool Definition Quality

Tool Definition Quality

Tool Definition Quality

Tool Definition Quality

Tool Definition Quality

Tool Definition Quality

Tool Definition Quality

Tool Definition Quality

Tool Definition Quality

Tool Definition Quality

Tool Definition Quality

Tool Definition Quality

Tool Definition Quality

Tool Definition Quality

Tool Definition Quality

Tool Definition Quality

Tool Definition Quality

Tool Definition Quality

Tool Definition Quality

Tool Definition Quality

Discussions

Your Connectors

Resources