Glama

Server Quality Checklist

Profile completion: 58%

A complete profile improves this server's visibility in search results.
  • Disambiguation: 3/5

    The tools cover a wide range of distinct resources (e.g., accounts, courses, orders, teachers) with clear CRUD operations, reducing ambiguity in most cases. However, some overlap exists, such as 'create_teacher_enrollment' and 'create_teacher_enrollment_by_planned_course_id', which could cause confusion due to similar purposes with slight variations in parameters or context.

    Naming Consistency: 4/5

    Tool names follow a consistent verb_noun pattern (e.g., create_account, get_accounts, update_course) throughout the set, with verbs like create, get, update, delete, and cancel used predictably. Minor deviations include 'open_invoice' (which uses 'open' instead of a standard verb) and some longer names like 'get_teacher_enrollments_by_planned_course_id', but overall the naming is highly consistent and readable.

    Tool Count: 2/5

    With 189 tools, the count is excessively high for an educational management system, making it overwhelming and likely to cause agent confusion or inefficiency. While the domain is broad, this many tools suggests over-fragmentation or lack of consolidation, far exceeding the typical well-scoped range of 3-15 tools and indicating a heavy, unwieldy surface.

    Completeness: 5/5

    The tool set provides comprehensive CRUD and lifecycle coverage across all major domains (e.g., accounts, courses, enrollments, orders, teachers, invoices), with no obvious gaps. It includes operations for creation, retrieval, updating, deletion, and specific actions like approvals or cancellations, ensuring agents can handle full workflows without dead ends in the educational management context.

  • Average 2.4/5 across 189 of 189 tools scored. Lowest: 1.6/5.

    See the tool scores section below for per-tool breakdowns.

  • This repository includes a README.md file.

  • This repository includes a LICENSE file.

  • Latest release: v2.0.1

  • No tool usage detected in the last 30 days. Usage tracking helps demonstrate server value.

    Tip: use the "Try in Browser" feature on the server page to seed initial usage.

  • Add a glama.json file to provide metadata about your server.
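A minimal glama.json typically just declares the maintainers. The sketch below uses the commonly seen shape; verify the field names against Glama's own documentation before committing:

```json
{
  "$schema": "https://glama.ai/mcp/schemas/server.json",
  "maintainers": ["your-github-username"]
}
```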

  • This server provides 189 tools.
  • No known security issues or vulnerabilities reported.

  • Add related servers to improve discoverability.

Tool Scores

  • Behavior: 1/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    The description adds no behavioral context beyond what annotations provide. While annotations declare idempotentHint=true and destructiveHint=false, the description fails to explain what happens if the option already exists, what validation rules apply, or any side effects of adding an option to an active custom field.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness: 3/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description is extremely brief (6 words) and front-loaded, but this represents under-specification rather than useful conciseness. The single sentence fails to earn its place by providing actionable information beyond the tool name itself.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness: 1/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given the presence of four parameters with confusing schema descriptions and multiple sibling tools handling the same resource (get, update, delete options), the description is critically incomplete. It lacks return value documentation, error conditions, and relationships between object_type and field_slug parameters.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters: 2/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With 75% schema description coverage, the description needed to compensate for the confusing schema (where both object_type and field_slug are erroneously described as 'ID of the custom field option'). The description adds no clarification for these parameters or the value parameter, leaving the agent to rely on misleading schema documentation.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose: 2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Add an option to a custom field' is a tautology that restates the tool name with minimal elaboration. While it identifies the verb and resource, it fails to differentiate from siblings like update_option_of_custom_field or delete_option_of_custom_field, and adds no specificity about scope or constraints.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines: 1/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No usage guidance is provided. There is no indication of when to use this tool versus update_option_of_custom_field (e.g., for creating new vs. modifying existing options), no prerequisites mentioned (e.g., requiring the custom field to exist first), and no warnings about duplicate values.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
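The fixes this section calls for (a non-tautological description, clarified object_type/field_slug semantics, explicit sibling guidance) can all live in the tool definition itself. The following is an illustrative sketch only: the tool name, parameter names, and all wording are inferred from the review text, not taken from the actual server.

```typescript
// Illustrative rewrite of the reviewed tool definition. Every identifier and
// sentence here is hypothetical, reconstructed from the review's critique.
const addOptionToCustomField = {
  name: "add_option_to_custom_field",
  description:
    "Append a new selectable option to an existing custom field, identified " +
    "by object_type (the entity the field is attached to) plus field_slug. " +
    "Adding a value that already exists is a no-op, matching the declared " +
    "idempotent behavior. Use update_option_of_custom_field to rename an " +
    "existing option and delete_option_of_custom_field to remove one. " +
    "Returns the created option, including its generated id.",
  inputSchema: {
    type: "object",
    properties: {
      object_type: {
        type: "string",
        description: "Type of entity the custom field belongs to (e.g. 'course').",
      },
      field_slug: {
        type: "string",
        description: "Slug of the custom field receiving the option.",
      },
      value: {
        type: "string",
        description: "Display value of the new option.",
      },
    },
    required: ["object_type", "field_slug", "value"],
  },
  annotations: { idempotentHint: true, destructiveHint: false },
};
```

Note how the description now disambiguates the two parameters the schema mislabels, states the idempotency consequence in prose, and routes agents to the sibling tools.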

  • Behavior: 1/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    No annotations are provided, so the description carries the full burden of behavioral disclosure. It fails to explain side effects (e.g., whether this updates invoice status, allows partial payments, or handles overpayments), idempotency, required permissions, or what the tool returns. For a mutation operation, this lack of behavioral context is a critical gap.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness: 2/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    While extremely brief (3 words), this represents under-specification rather than efficient conciseness. Every word is obvious from the tool name, wasting the opportunity to provide value. The description is front-loaded with no structure, but the content is too minimal to be useful.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness: 1/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    For a 5-parameter mutation tool with no output schema and no annotations, the description is woefully incomplete. It should explain the payment-invoice relationship, validation rules (e.g., currency matching invoice), and business logic effects, but provides none of this context.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters: 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    The input schema has 100% description coverage, establishing a baseline of 3. The tool description adds no parameter-specific context (e.g., date format expectations, whether payment_method_id refers to an existing entity), but also doesn't contradict the schema. Note: The schema itself contains an error where invoice_id is described as 'ID of the payment' rather than 'ID of the invoice', which the description fails to clarify or correct.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose: 2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Create a payment.' is a tautology that restates the tool's action without clarifying the domain context. It omits that this specifically creates an invoice payment (not a generic payment) and fails to distinguish what kind of payment record is being created despite the tool name implying invoice-specific functionality.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines: 1/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance provided on when to use this tool versus alternatives, prerequisites (e.g., invoice existence/status requirements), or workflow context. Sibling tools like delete_invoice_payment_by_id_and_invoice_id and get_invoice_payments_by_invoice_id suggest a CRUD pattern, but the description doesn't explain the relationship between these operations.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
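The mislabeled invoice_id called out above is a one-line schema fix. A hypothetical corrected fragment follows (parameter names are inferred from the review; the other parameters are omitted):

```typescript
// Hypothetical corrected input-schema fragment for the payment-creation tool.
// Only the invoice_id text changes: "ID of the payment" -> "ID of the invoice".
const createInvoicePaymentParams = {
  type: "object",
  properties: {
    invoice_id: {
      type: "string",
      description: "ID of the invoice to record this payment against.",
    },
    payment_method_id: {
      type: "string",
      description: "ID of an existing payment method used for the payment.",
    },
  },
  required: ["invoice_id"],
};
```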

  • Behavior: 1/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With no annotations provided, the description carries full responsibility for behavioral disclosure but provides none. It does not explain the order lifecycle state after creation (draft vs pending), financial implications (whether payment is immediate), idempotency, or the meaning of the 'approve' parameter in the creation context.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness: 2/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    While brief (two words), this represents under-specification rather than efficient conciseness. For a complex tool with 13 parameters including nested objects, financial fields (cost, payment_method_id), and deprecated fields, the description is inappropriately sized and front-loaded with zero useful information.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness: 1/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Completely inadequate for the complexity level. The tool has 13 parameters, nested objects, enum constraints, deprecated fields, and no output schema or annotations. The description addresses none of the complexity: no return value documentation, no workflow context, no error conditions, and no explanation of the cost_scheme enum implications.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters: 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    Input schema has 100% description coverage, establishing baseline 3. However, the description adds no context about parameter relationships (e.g., the mutual exclusivity between student_ids and enrollments_attributes mentioned in schema descriptions) or the deprecation of planned_course_id in favor of catalog_variant_id.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose: 2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Create an order.' is tautological, merely restating the tool name without distinguishing it from siblings like approve_order, cancel_order, or deny_order. It fails to clarify what distinguishes 'creating' an order from other order lifecycle operations available in the sibling set.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines: 1/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance provided on when to use this tool versus alternatives, nor prerequisites for invocation. The description omits the evident workflow implications (e.g., relationship to approve_order, whether creation requires subsequent approval) and fails to mention that account_id defaults to the personal account if omitted.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
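Deprecations and mutually exclusive parameters like those flagged above are worth surfacing in the description, not just in individual schema fields. A hedged sketch follows (parameter names are taken from the review; the wording and workflow claims are hypothetical and would need confirming against the real API):

```typescript
// Hypothetical description rewrite for the order-creation tool, surfacing the
// constraints the review says are buried in individual schema descriptions.
const createOrderDescription =
  "Create a new order for an account (defaults to the caller's personal " +
  "account if account_id is omitted). Provide either student_ids or " +
  "enrollments_attributes, never both. planned_course_id is deprecated; " +
  "use catalog_variant_id instead. Creation does not approve the order: " +
  "follow up with approve_order, or use cancel_order/deny_order to reject it.";
```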

  • Behavior: 1/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    No annotations are provided, so the description carries the full burden of behavioral disclosure. It fails to mention whether this creates a persistent association, what happens if the referenced course/edition doesn't exist, or what the mutation returns.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness: 2/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    While brief (3 words), this represents under-specification rather than effective conciseness. The single sentence provides no actionable information beyond the tool name itself.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness: 1/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    For a 3-parameter mutation tool in a complex domain (evidenced by 100+ sibling tools) with no output schema or annotations, the description is radically incomplete. It should explain the entity relationships (course + edition + optional planned course) and side effects.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters: 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    Schema description coverage is 100%, so the schema already documents all three parameters (course_id, edition_id, planned_course_id). The description adds no parameter-specific context, but baseline 3 applies when schema coverage is complete.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose: 2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Create a program element' is a tautology that merely restates the tool name. It fails to define what a 'program element' represents in this domain (likely a course-edition linkage) or how it differs from sibling entities like 'program', 'program_edition', or 'planned_course'.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines: 1/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance provided on when to use this tool versus alternatives like create_planned_course or create_program_edition. Given the complex domain model suggested by sibling tools (programs, editions, elements, planned courses), the absence of selection criteria is a critical gap.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior: 1/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With no annotations provided, the description carries the full burden of behavioral disclosure but reveals nothing about pagination limits, response format, or performance characteristics. It does not explain what data structure constitutes a 'conflict', how conflicts are detected, or what scope 'all' encompasses.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness: 2/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    While the three-word description is brief, it fails the 'every sentence earns its place' standard because the single sentence provides only tautological information without actionable context. The extreme brevity constitutes under-specification rather than efficient communication.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness: 1/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given the complexity of the planning domain evident in sibling tools (courses, teachers, enrollments, events), the description inadequately explains what planning conflicts are, what causes them, or what the return structure contains. With no output schema provided, the description should explain the return values but omits this entirely.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters: 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    The input schema has 100% description coverage for both `cursor` and `per_page` parameters, adequately documenting the pagination interface. Since the schema fully documents the parameters, the description does not need to compensate and meets the baseline expectation for this dimension.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose: 2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Get all conflicts' is tautological, essentially restating the tool name while omitting the 'planning' qualifier that provides domain context. It fails to specify what constitutes a conflict (scheduling, resource, or availability conflicts) or distinguish this from sibling tools like `get_planning_events` or `get_planned_courses_by_course_id`.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines: 1/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    The description provides no guidance on when to use this tool versus alternatives, nor does it specify prerequisites such as required filters or permissions. There is no indication of whether this returns conflicts for all entities in the system or requires specific planning context to be useful.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior: 1/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With no annotations provided, the description fails to disclose behavioral traits. It misleadingly claims to get 'all' elements while the schema reveals paginated behavior via cursor/per_page. It omits whether results are sorted, the default page size behavior, or what data structure is returned.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness: 2/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The three-word description is brief but constitutes under-specification rather than efficient conciseness. No information is front-loaded; the sentence merely restates the tool name without earning its place through substantive guidance.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness: 1/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Lacking annotations, output schema, and any description of return values or error conditions, the tool definition is incomplete. For a paginated list operation with 3 parameters and numerous siblings, the description should explain the pagination model and result set scope.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters: 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    The input schema has 100% description coverage (cursor, per_page, and edition_id are all documented), establishing a baseline of 3. The description adds no additional semantic context regarding parameter interactions or data types beyond the schema.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose: 2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Get all elements' is essentially a tautology of the tool name 'get_program_elements', merely dropping the word 'program'. It fails to specify what constitutes an 'element' in this domain or distinguish from siblings like get_program_element (singular) or get_elements_of_program_edition.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines: 1/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance is provided on when to use this tool versus the sibling get_elements_of_program_edition or when to apply the edition_id filter. There is no mention of pagination workflows despite the presence of cursor and per_page parameters.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
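For cursor-paginated list tools like this one, the pagination contract and sibling routing fit in a couple of sentences. An illustrative sketch (all wording hypothetical; the domain meaning of "element" is inferred from the review's own guess of a course-edition linkage):

```typescript
// Hypothetical description for a cursor-paginated list tool.
const getProgramElementsDescription =
  "List program elements (the course-edition linkages that make up a " +
  "program), optionally filtered with edition_id. Results are paginated: " +
  "per_page caps the page size, and the cursor returned with each page " +
  "fetches the next one. To list elements of a single program edition, " +
  "prefer get_elements_of_program_edition.";
```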

  • Behavior: 1/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    No annotations are provided, so the description carries the full burden of behavioral disclosure. It fails to indicate whether updates are partial (PATCH-like) or full replacement, whether the operation is reversible, if it triggers side effects (e.g., notifications to enrolled students), or required permissions. Only the word 'Update' hints at mutation.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness: 2/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    While the three-word description is brief, it is inappropriately sized for a mutation tool with multiple relational parameters (edition_id, planned_course_id). The extreme brevity represents under-specification rather than efficient information density; no sentences 'earn their place' with valuable context.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness: 1/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Completely inadequate for a mutation tool affecting program structures. Given the complexity of the sibling tool ecosystem (90+ tools including program edition management), the description provides no domain context, no output expectations, no error conditions, and no relationship explanation between program elements, editions, and planned courses.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters: 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With 100% schema description coverage, all three parameters (id, edition_id, planned_course_id) are documented in the input schema itself. The description adds no additional parameter semantics, syntax guidance, or examples, but the high schema coverage establishes a baseline score of 3.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose: 2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Update an element' is a tautology that restates the tool name while stripping the 'program' context, making it more ambiguous. It fails to specify what a program element represents (e.g., a course component, module, or curriculum item) and does not distinguish this update operation from siblings like create_program_element or delete_program_element.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines: 1/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance provided on when to use this tool versus alternatives such as create_program_element, or prerequisites for updating (e.g., whether the element must exist, be in a draft state, etc.). The description offers zero usage context.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior: 1/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    Description adds no behavioral context beyond what annotations provide. Does not explain implications of idempotentHint:false (risk of duplicates), error conditions when user/account doesn't exist, or what the tool returns.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness: 2/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    Extremely brief (three words), yet it still contains a wasteful redundancy/error. The front-loaded structure is appropriate, but severe under-specification makes the description ineffective.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness: 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Incomplete for a creation tool with 3 parameters and no output schema. Lacks explanation of affiliation semantics, business rules (can a user have multiple affiliations?), and differentiation from organization-level affiliations.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters: 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    Schema description coverage is 100%, clearly documenting the user-account relationship and key_contact flag. Description adds no parameter details, but baseline adequacy is met by the schema itself.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose: 2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description states the action ('Create') but contains redundancy/typographical error ('affiliation affiliations'). It fails to define what an affiliation represents (a user-account relationship evident from schema) or distinguish from sibling tool `create_organization_affiliation`.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines: 1/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    Provides no guidance on when to use this tool versus alternatives like `create_organization_affiliation`, `update_affiliation`, or prerequisites for creating affiliations.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior: 1/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    While annotations correctly mark this as destructive and idempotent, the description adds zero behavioral context beyond these hints. It does not specify whether deletion is permanent, if cascading effects occur on related records, or what the tool returns upon success.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness: 2/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description is extremely brief (three words), but this represents under-specification rather than efficient conciseness. Every sentence should earn its place; here, the sentence merely echoes the tool name without adding value.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness: 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given the existence of the confusing sibling tool 'delete_organization_affiliation' and the destructive nature of the operation, the description is incomplete. It lacks necessary disambiguation and behavioral details that would enable confident invocation despite the simple parameter structure.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters: 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    The input schema has 100% description coverage (the 'id' parameter is documented), establishing a baseline of 3. The description adds no additional parameter semantics (e.g., examples, format constraints, or lookup methods), but none are required given complete schema documentation.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose: 2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Delete an affiliation' is tautological, merely restating the tool name without specifying what an 'affiliation' represents in this domain. Crucially, it fails to distinguish this tool from the sibling 'delete_organization_affiliation', leaving ambiguity about which deletion primitive to use.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 1/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance is provided regarding when to use this tool versus alternatives (e.g., 'delete_organization_affiliation' or 'update_affiliation' to terminate rather than destroy). There are no prerequisites, warnings about dependencies, or conditions mentioned.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
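    The critique above can be made concrete with a sketch. The following is an illustrative MCP-style tool definition of the kind the review asks for; the wording, sibling-tool names, and cascade behavior are hypothetical, not taken from the actual server.

    ```python
    # Illustrative sketch only: a fuller tool definition for a destructive
    # deletion tool. All behavioral claims here are invented for illustration.
    improved_delete_affiliation = {
        "name": "delete_affiliation",
        "description": (
            "Permanently delete a person-level affiliation record by its ID. "
            "This is irreversible and does not cascade to enrollments or grades. "
            "Use delete_organization_affiliation for organization-level links, "
            "or update_affiliation to end an affiliation without destroying it."
        ),
        "inputSchema": {
            "type": "object",
            "properties": {
                "id": {
                    "type": "integer",
                    "description": "ID of the affiliation to delete",
                }
            },
            "required": ["id"],
        },
    }
    ```

    A description in this shape answers the three questions the scorecard keeps raising: what the resource is, how the tool differs from its siblings, and what the operation does to the world.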

  • Behavior 1/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    No annotations are provided, so the description carries the full burden of behavioral disclosure, yet it discloses nothing: whether deletion is permanent, whether it cascades to related records (enrollments, grades), or what permissions are required.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 2/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    While brief (3 words), this represents under-specification rather than valuable conciseness. The single sentence fails to earn its place by providing actionable context about the operation's scope or consequences.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Despite low complexity (1 parameter) and complete schema coverage, the description is inadequate. It omits what a 'program element' represents (e.g., a course within a program, a module) and fails to explain the deletion impact, which is critical for a destructive operation even with good parameter documentation.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With 100% schema description coverage, the baseline is 3. The description adds no additional semantics about the 'id' parameter (e.g., where to obtain it, validation rules) beyond what the schema already provides.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Delete a element' is largely tautological, restating the tool name without the 'program' context. It fails to distinguish this tool from sibling deletion tools (delete_program, delete_program_edition) or explain what constitutes a 'program element' in this domain.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 1/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance provided on when to use this tool versus alternatives like delete_program or delete_program_edition, nor any mention of prerequisites (e.g., whether the element must be unlinked from enrollments first).

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 1/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    No annotations are provided, and the description fails to disclose the read-only nature, error handling behavior (e.g., when ID is not found), caching, or required permissions.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 2/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    While brief, the description suffers from under-specification rather than efficient conciseness. The four words provide no actionable detail beyond the tool name itself.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Because the tool lacks an output schema, the description should explain the return structure or resource relationships, but it provides no information about what data is returned or how it relates to parent programs.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    The input schema has 100% coverage with a clear description for the 'id' parameter, meeting baseline expectations. The description adds no parameter-specific context but does not need to given the complete schema documentation.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Get a program edition' is a tautology that restates the tool name without clarifying what constitutes a 'program edition' or how it differs from sibling resources like 'program', 'program element', or 'program enrollment'.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 1/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance is provided on when to use this single-item retrieval versus alternatives such as 'get_program_edition_of_elements_batch' or list operations like 'get_programs'.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    No annotations are provided, so the description carries the full burden. While 'Create' implies a write operation, the description omits critical behavioral details present in the schema: that login credentials may be emailed (with_authentication parameter), what error conditions occur for duplicate emails, or what the return value contains.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 2/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    While brief (3 words), the description is inappropriately sized for an 11-parameter tool with nested address objects and authentication flows. It is front-loaded with minimal information rather than high-value context like side effects or return format.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 1/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Grossly inadequate for the complexity (11 parameters, nested objects, no output schema). Missing: return value structure, whether the created user ID is returned, email sending behavior, locale defaults, and the distinction between address_attributes and invoice_address_attributes.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With 73% schema description coverage, the parameter documentation is reasonably complete in the schema itself. The description adds no parameter guidance, but the baseline score of 3 is appropriate since the schema covers most fields (first_name, email, address_attributes, etc.).

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Create a user.' is a tautology that restates the tool name. It fails to specify the user type (student, teacher, admin) or distinguish from siblings like create_teacher, create_lead, or update_user.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 1/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance provided on when to use this versus create_teacher (which also creates a user-type entity), prerequisites for creation, or whether to use update_user for existing records. No mention of the email authentication flow.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
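    As a hypothetical rewrite of the three-word "Create a user." description, the following string front-loads the side effects and return shape the review says are missing. All behavioral claims here (user type, email behavior, failure mode, sibling-tool names) are invented for illustration, not verified against the server.

    ```python
    # Hypothetical improved description; behavioral details are assumptions.
    create_user_description = (
        "Create a student user account. Returns the new user's ID. "
        "If with_authentication is true, login credentials are emailed to the "
        "user. Fails on duplicate email addresses. Use create_teacher for "
        "teaching staff and update_user to modify an existing record."
    )
    ```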

  • Behavior 1/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    The description adds zero behavioral context beyond what the annotations already provide (destructiveHint=true, idempotentHint=true). It does not explain whether deletion is permanent or soft, what happens to child resources, or any side effects, despite this being a destructive operation.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 3/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description is extremely concise at only four words, which prevents verbosity but results in under-specification. While it is front-loaded with the action verb, the brevity fails to meet the information density required for a destructive operation.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given that this is a destructive mutation tool with no output schema, the description is inadequate. It should explain deletion semantics (hard vs soft), recoverability, or cascading effects. The 100% input schema coverage is the only factor preventing a score of 1.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With 100% schema description coverage, the parameters are fully documented in the schema itself (object_slug as 'ID of the parent resource' and id as 'ID of the custom record to delete'). The description adds no additional parameter semantics, meeting the baseline expectation for high-coverage schemas.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Delete a custom record' is essentially a tautology that restates the tool name with minor grammatical changes. While it identifies the verb (Delete) and resource (custom record), it fails to differentiate this tool from sibling deletion tools (delete_comment, delete_grade, etc.) or explain what constitutes a 'custom record' in this domain.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 1/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No usage guidelines are provided. The description lacks any indication of when to use this tool versus alternatives like update_custom_record, prerequisites for deletion, or warnings about when deletion might fail (e.g., if the record is referenced elsewhere).

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 1/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With no annotations provided, the description carries the full burden of behavioral disclosure yet provides no information about destructiveness scope, cascade effects, soft vs hard deletion, or required permissions. The word 'Delete' implies mutation but lacks critical safety context.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 3/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description is extremely brief (3 words), which constitutes under-specification rather than effective conciseness. There is no full sentence to evaluate for earning its place, though the fragment is at least not verbose.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    For a destructive operation with no output schema and no annotations, the description is inadequate. It omits expected details about deletion side effects, return values (success/failure indicators), or recovery options.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    The input schema has 100% description coverage for its single parameter (id), documenting it as 'ID of the program program to delete'. The description adds no additional semantic context beyond the schema, meeting the baseline for high-coverage schemas.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Delete a program' is a tautology that restates the tool name. While it identifies the verb and resource, it fails to distinguish from siblings like delete_program_edition or delete_program_element, leaving ambiguity about which program entity is targeted.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 1/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance provided on when to use this tool versus alternatives (e.g., cancel_program_enrollment), prerequisites (e.g., program state requirements), or what happens to dependent objects like editions or enrollments.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    Annotations already establish idempotent, non-destructive, and non-read-only behavior. The description adds no behavioral context beyond the word 'Update'—it doesn't clarify if omitted optional fields (email, phone, address) are cleared or preserved, nor does it mention side effects like triggering validation or webhooks.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 2/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    While brief (3 words), this represents under-specification rather than efficient conciseness. No value is front-loaded; the description wastes the opportunity to convey critical context about the update behavior or account scope.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 1/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    For a complex mutation tool with 7 parameters (including nested address objects, custom properties, and label arrays), the description is grossly inadequate. No output schema is present, yet the description doesn't indicate what the tool returns (updated object, success boolean, etc.).

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With 86% schema description coverage, the input schema carries most of the semantic weight. The description mentions no parameters, but the schema adequately documents all fields including the nested address_attributes structure. Baseline score applies since the schema compensates.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Update an account' is a tautology that restates the tool name (update_account). While it implies a mutation operation, it fails to specify what constitutes an 'account' in this context (billing entity, user profile, organization) or distinguish scope from sibling tools like create_account or get_account.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 1/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance provided on when to use this tool versus create_account (for new records) or get_account (for retrieval). No mention of prerequisites such as requiring an existing account ID, or whether this performs partial updates (PATCH) or full replacements (PUT) given the required 'name' field.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
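    The PATCH-vs-PUT ambiguity flagged above is easy to state precisely. This minimal sketch (field names are illustrative; the real update_account behavior is undocumented) shows why an agent cannot safely omit a field without knowing which semantics apply:

    ```python
    # Two plausible readings of an "update" tool; the description should say which.
    def patch_update(record: dict, changes: dict) -> dict:
        """Partial update (PATCH): omitted fields are preserved."""
        return {**record, **changes}

    def put_update(record: dict, replacement: dict) -> dict:
        """Full replacement (PUT): omitted fields are dropped or reset."""
        return dict(replacement)

    account = {"id": 1, "name": "Acme", "email": "billing@example.test"}
    patched = patch_update(account, {"name": "Acme Corp"})
    replaced = put_update(account, {"id": 1, "name": "Acme Corp"})
    # patched still carries the email field; replaced does not.
    ```

    One sentence in the tool description ("Omitted fields are left unchanged", or the opposite) resolves the ambiguity for every optional parameter at once.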

  • Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    Annotations indicate idempotentHint=true and destructiveHint=false, but the description adds no behavioral context about side effects, validation rules, or what happens to nested course_tab_contents_attributes when updating.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 2/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    While brief (3 words), it is inappropriately sized for a 14-parameter tool with nested objects. It lacks front-loaded critical information about partial update semantics and required ID parameter context.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 1/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Completely inadequate for the complexity: 14 parameters including nested arrays (course_tab_contents_attributes), polymorphic custom fields, and financial settings (cost_scheme). No output schema means the description should explain return values or success indicators, but provides nothing.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    Schema coverage is 79% (high), so the baseline is 3. The description mentions no parameters, but the schema adequately documents most fields including the enum for cost_scheme. No compensation provided for undocumented custom/custom_associations objects.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Update a course.' is tautological, merely restating the tool name without specifying scope (partial vs full update), distinguishing from sibling create_course, or indicating which fields are updatable.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 1/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance provided on when to use this versus create_course, prerequisites for updating (e.g., draft vs published states), or whether omitted fields are preserved or reset.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With no annotations provided, the description carries the full burden of behavioral disclosure. It implies mutation via 'Update' but fails to specify that this is a partial update (only 'id' is required), omits the fixed versus flexible course type distinction evident in the schema, and provides no information about side effects, permissions, or reversibility.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 2/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    While brief (3 words), it is under-specified rather than efficiently concise. The single sentence fails to front-load critical information about the tool's scope or behavior, wasting the opportunity to clarify the complex 15-parameter operation.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 1/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Grossly inadequate for a complex tool with 15 parameters, nested objects (custom, custom_associations), and conditional logic (fixed vs. flexible courses indicated by start_date vs. duration fields). No output schema exists, yet the description provides no indication of return values or operation outcomes.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    Schema description coverage is 87% (high), establishing a baseline of 3. The description adds no parameter-specific context beyond the schema, but the schema adequately documents fields like cost_scheme enums and date requirements without additional narrative support.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Update a planned course' restates the tool name with minimal elaboration. While it identifies the verb (Update) and resource (planned course), it fails to define what constitutes a 'planned course' or distinguish this tool from siblings like create_planned_course or cancel_planned_course.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 1/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance provided on when to use this tool versus alternatives (e.g., cancel_planned_course for cancellations), prerequisites for use (e.g., needing the planned course ID), or whether this performs partial updates versus full replacements.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With no annotations provided, the description carries the full burden of behavioral disclosure but provides none. It omits mutation semantics (idempotency, side effects), error handling for invalid IDs, and the significance of nested operations like course_tab_contents_attributes which modifies child resources.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 2/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    While brief, three words is under-specification rather than effective conciseness for an 11-parameter mutation tool with nested objects. The sentence fails to earn its place by providing actionable context for the complexity involved.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 1/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Completely inadequate for a complex mutation tool with 11 parameters including nested arrays (course_tab_contents_attributes) and schemaless objects (custom). No output schema or annotations exist to compensate for the description's failure to explain partial update behavior, validation rules, or return values.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    Schema coverage is 73%, documenting standard fields like cost_scheme and is_published. The description adds no parameter context, but the schema adequately covers most fields including the enum values for cost_scheme. However, it doesn't clarify the open-ended custom/custom_associations objects or the nested update behavior.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Update a program' is a tautology that restates the tool name without specifying what constitutes a 'program' in this domain or distinguishing from siblings like create_program, delete_program, or update_program_edition. It fails to clarify the scope of updatable fields.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 1/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance provided on when to use this tool versus alternatives (e.g., update_program_edition for editions vs programs), prerequisites (e.g., obtaining the program ID first), or whether partial updates are supported versus requiring all fields.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    No annotations are provided, so the description carries the full burden of behavioral disclosure. While 'Update' implies mutation, the description fails to clarify critical behavioral traits: whether unspecified fields are preserved or cleared (PATCH vs PUT semantics), potential side effects on related enrollments, or idempotency guarantees. For an 11-parameter mutation tool with nested objects, this is insufficient.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 2/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    At four words, the description is concise but suffers from under-specification rather than efficient information density. The single sentence does not earn its place as it adds no value beyond the tool name. It lacks front-loading of critical behavioral context needed for a complex update operation with nested objects.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 1/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given the complexity (11 parameters including nested objects, no output schema, no annotations), the description is completely inadequate. It omits critical context: the partial update nature (only 'id' is required), the relationship to program editions vs programs, and the implications of updating fields like is_published or cost_scheme on existing enrollments.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With 82% schema description coverage, the input schema effectively documents most parameters (name, dates, cost, participants, etc.). The description adds no parameter-specific guidance beyond the schema, but given the high coverage baseline of 3 applies. The description does not compensate for the 'custom' and 'custom_associations' object parameters which lack detailed schemas.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Update a program edition' is essentially a tautology that restates the tool name (update_program_edition) with minor grammatical changes. While it confirms the operation type and resource, it fails to distinguish from sibling tools like create_program_edition or delete_program_edition, and provides no insight into what constitutes a 'program edition' in this domain.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 1/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    The description provides no guidance on when to use this tool versus alternatives such as create_program_edition or get_program_edition. It does not mention prerequisites, partial update semantics (implied by 10 optional parameters), or whether this requires specific permissions or states (e.g., unpublished editions only).

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 1/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    No annotations are provided, so the description carries the full burden of behavioral disclosure. It fails to mention that this is a partial update (only 'id' is required), what happens to unspecified fields, idempotency characteristics, or any side effects. No mutation warnings or permission requirements are stated.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
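The missing behavioral disclosure could be carried by MCP annotation hints plus a single sentence. A hedged sketch: the hint names follow the MCP annotations referenced throughout this review, while the chosen values are assumptions about the server's semantics, not documented facts:

```python
# Assumed MCP-style behavioral annotations for an update tool.
annotations = {
    "readOnlyHint": False,     # mutates server state
    "destructiveHint": False,  # edits in place rather than deleting data
    "idempotentHint": True,    # assumption: repeating the same partial
                               # update produces no further change
}

# One sentence of disclosure of the kind the review finds missing.
description = (
    "Partial update: only 'id' is required and unspecified fields "
    "are left unchanged."
)
```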

    Conciseness 2/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    While brief at only three words, this represents under-specification rather than effective conciseness. For a tool with 9 parameters and complex relationships (subject_type/subject_id), the description fails to front-load critical information or justify its brevity.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 1/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given the tool's complexity—9 parameters including relationships between subject_type and subject_id, partial update semantics, and no output schema—the description is entirely inadequate. It reveals nothing about parameter interactions, return values, or error conditions.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With 100% schema description coverage, the baseline score applies. The schema comprehensively documents all 9 parameters including the subject_type enum and required id field. The description adds no parameter-specific context, but the schema fully compensates.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Update a task' is a tautology that restates the tool name with minimal elaboration. While it identifies the resource (task) and action (update), it fails to distinguish from siblings like create_task or delete_task, and offers no scope clarification (e.g., partial vs full updates).

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    The description provides no guidance on when to use this tool versus alternatives like create_task, nor does it mention prerequisites such as needing a valid task ID. It lacks explicit when-not-to-use conditions or workflow context.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 1/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    No annotations are provided, placing full disclosure burden on the description. The description fails to specify mutation behavior (partial vs full updates, idempotency), error handling for invalid IDs, or side effects. Only the word 'Update' hints at mutability, providing minimal transparency.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 2/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    While brief (4 words), the description suffers from under-specification rather than efficient conciseness. The single sentence provides no actionable information beyond the tool name itself, failing the 'every sentence should earn its place' standard.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 1/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    For a mutation tool with 4 parameters and no annotations or output schema, the description is grossly inadequate. It omits critical context such as which fields are updatable (implied only by schema), whether relationships (planned_course_id, teacher_role_id) can be changed independently, and success/failure behaviors.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With 100% schema description coverage, the structured schema already documents all four parameters (id, planned_course_id, teacher_id, teacher_role_id) and their types. The description adds no additional parameter context, syntax guidance, or examples, meeting the baseline for well-documented schemas.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Update a teacher enrollment' restates the tool name (update_teacher_enrollment) without adding specificity. It identifies the action and resource but fails to distinguish from sibling tools like create_teacher_enrollment or delete_teacher_enrollment, or specify which attributes can be modified.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance provided on when to use this tool versus alternatives (e.g., create_teacher_enrollment_by_planned_course_id), nor any mention of prerequisites such as requiring an existing enrollment ID or whether reassigning teachers requires update versus create/delete workflows.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    While the description doesn't contradict the annotations (readOnlyHint: false aligns with 'Create'), it adds no behavioral context beyond what the annotations provide. It omits whether creation is conditional, what identifiers are returned, or how this relates to the broader course management workflow.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 2/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The three-word description is brief but constitutes under-specification rather than effective conciseness. No information is front-loaded; the single sentence merely restates the obvious operation implied by the tool name without earning its place through utility.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given the tool creates a resource with complex nested address parameters and lacks an output schema, the description is insufficient. It fails to explain the entity lifecycle, address field requirements, or expected return behavior, leaving critical gaps for an agent attempting to invoke the tool correctly.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 2/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With only 50% top-level schema coverage (name is documented, address_attributes object lacks a description), the description fails to compensate for the undocumented nested structure. It provides no guidance on address formatting, required fields within the address object, or the relationship between the location name and address fields.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
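To show what compensating for the undocumented nested object could look like, here is a hypothetical schema fragment for address_attributes. The field names (street, zipcode, city) are assumptions about a typical postal address, not the server's real schema:

```python
# Illustrative nested-object documentation for address_attributes.
# Every property below is an assumption chosen for the example.
address_attributes = {
    "type": "object",
    "description": "Postal address of the course location.",
    "properties": {
        "street": {"type": "string", "description": "Street and number"},
        "zipcode": {"type": "string", "description": "Postal code"},
        "city": {"type": "string", "description": "City name"},
    },
}
```

With per-field descriptions like these in place, the top-level description only needs to cover intent, not structure.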

    Purpose 2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Create a course location' is a tautology that restates the tool name without adding specificity. It fails to distinguish this tool from siblings like create_meeting_location or clarify what constitutes a 'course location' in this domain.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance is provided on when to use this tool versus alternatives (e.g., update_course_location for modifying existing locations), nor are prerequisites or validation rules mentioned. The agent receives no signals about appropriate usage contexts.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    Annotations indicate this is a non-idempotent write operation (readOnlyHint: false, idempotentHint: false), but the description adds no behavioral context about side effects, return values (e.g., the created lead ID), or whether partial creation is atomic. It does not contradict annotations, but provides no additional transparency.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 2/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    While brief, 'Create a lead' is inappropriately sized for an 18-parameter tool with nested objects (address_attributes, lead_products). It is front-loaded but underspecified—conciseness should not come at the cost of essential context for complex operations.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given the high parameter complexity (18 optional params, nested structures) and lack of output schema, the description is severely incomplete. It fails to explain return values, required field combinations, or relationships between linked entities (account vs user vs administrator).

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With 94% schema description coverage, the parameter definitions are already comprehensive. The description adds no parameter guidance, but given the high schema quality, it meets the baseline expectation without needing to compensate for schema gaps.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Create a lead' is a tautology that restates the tool name. It fails to define what constitutes a 'lead' in this context (e.g., a sales prospect/CRM record) or differentiate from sibling tools like update_lead or get_lead.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 1/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance provided on when to use this tool versus update_lead, or prerequisites for creation. The schema reveals relationships (account_id, user_id, administrator_id) that suggest ownership requirements, but the description mentions none of these constraints.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    While annotations declare readOnly/idempotent/destructive hints, the description fails to disclose critical behavioral traits: that results are paginated (cursor/per_page parameters exist), how the search filter behaves (fuzzy vs. exact), or whether filters combine with AND/OR logic. It mentions 'all' but the parameters clearly enable filtering.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 2/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    While brief (3 words), this represents under-specification rather than efficient conciseness. The single sentence fails to earn its place by providing only tautological information that adds no value beyond the tool name itself.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given the tool's complexity (6 optional parameters supporting pagination and multi-field filtering), the description is inadequate. It omits pagination behavior, filter combinators, and the relationship to singular 'get_account', leaving significant gaps despite the rich input schema.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With 100% schema description coverage across all 6 parameters (cursor, per_page, search, filters), the schema carries the semantic burden adequately. The description adds no parameter-specific context, but baseline 3 is appropriate since no compensation is needed for undocumented parameters.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Get all accounts' is a tautology that restates the tool name (get_accounts). It fails to specify the domain of 'accounts' (business vs. personal) or distinguish from the sibling tool 'get_account' (singular), which likely retrieves a specific account by ID rather than listing multiple.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 1/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance provided on when to use this versus 'get_account' for single-record retrieval, or when to apply filters versus fetching unfiltered results. The existence of 6 optional filtering parameters suggests complex usage patterns that are not addressed.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
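As an illustration of the "use X instead of Y when Z" wording this dimension asks for, here is a hypothetical pair of descriptions for the singular/plural tools; the wording is ours, not the server's:

```python
# Assumed, disambiguating descriptions for the get_accounts /
# get_account pair. Each one points the agent at its sibling.
descriptions = {
    "get_accounts": (
        "List accounts with optional search and filters; results are "
        "paginated via cursor/per_page. Use get_account when you "
        "already know the account's ID."
    ),
    "get_account": (
        "Fetch a single account by ID. Use get_accounts to search "
        "or list."
    ),
}
```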

  • Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    While the annotations correctly indicate this is a read-only, idempotent, non-destructive operation, the description adds no behavioral context beyond this. It does not disclose that results are paginated (despite the cursor parameter), does not explain the default page size (25), and does not describe what constitutes a 'catalog variant' in this domain.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 2/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description is four words long, but this represents under-specification rather than efficient conciseness. The single sentence fails to front-load critical behavioral information (pagination, filtering capabilities) and wastes the opportunity to clarify the tool's scope.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given 6 parameters, 100% schema coverage, and no output schema, the description is inadequate. It omits essential context for a paginated list operation: the pagination mechanism, the relationship between filters, and the nature of the returned catalog variants. Without an output schema, the description should compensate by describing the return structure, which it does not.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
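The pagination mechanism the description leaves implicit can be sketched as a cursor loop. Here `call_tool` is a stand-in for an MCP client call, and the response shape (`items`, `next_cursor`) is an assumption, not the server's documented format:

```python
# Cursor-based pagination loop implied by the cursor/per_page
# parameters. The per_page default of 25 matches the schema's
# stated default; the response keys are assumptions.
def fetch_all(call_tool, per_page=25):
    items, cursor = [], None
    while True:
        page = call_tool("get_catalog_variants",
                         {"cursor": cursor, "per_page": per_page})
        items.extend(page["items"])
        cursor = page.get("next_cursor")
        if cursor is None:
            return items

# Fake two-page client used only to exercise the loop.
def fake_call(name, args):
    pages = {None: {"items": [1, 2], "next_cursor": "c2"},
             "c2": {"items": [3], "next_cursor": None}}
    return pages[args["cursor"]]

fetch_all(fake_call)  # → [1, 2, 3]
```

A description that states this contract in one sentence would spare the agent from inferring it.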

    Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With 100% schema description coverage, the parameter semantics are fully handled by the input schema. The description adds no parameter-specific guidance, but the baseline score of 3 applies since the schema comprehensively documents all 6 parameters including enum values and defaults.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Get all catalog variants' essentially restates the tool name with the addition of 'all', which is misleading since the tool supports filtering (product_id, variantable_type) and pagination. It fails to distinguish this list operation from the singular sibling tool get_catalog_variant or explain what distinguishes a 'catalog variant' from other variant types (e.g., course_variants).

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 1/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance is provided on when to use this tool versus alternatives like get_catalog_variant (singular) or get_catalog_products. There is no mention of when to apply specific filters (variantable_id vs product_id) or how to handle pagination workflows with the cursor parameter.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    The description adds no behavioral context beyond the annotations (which indicate read-only, idempotent, non-destructive). It fails to describe return value structure, error conditions (e.g., record not found), or the relationship between the parent object_slug and the record id.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 2/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    While brief (3 words), this constitutes under-specification rather than efficient conciseness. The single sentence fails to earn its place by providing actionable information beyond the tool name itself.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given the presence of easily confused sibling tools (get_custom_records vs get_custom_record) and no output schema, the description should clarify scope (single vs list) and return behavior. It provides the minimum possible information, leaving critical gaps in context.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With 100% schema description coverage, the baseline is 3. The description 'Get a custom record' adds no additional parameter context beyond what the schema already documents for 'object_slug' and 'id'.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Get a custom record' is tautological, essentially restating the tool name. It fails to distinguish from sibling tool 'get_custom_records' (plural/list) or clarify that this retrieves a single specific record by ID versus listing multiple records.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 1/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance provided on when to use this tool versus alternatives like 'get_custom_records' or 'get_custom_object_by_object_slug'. No mention of prerequisites or required context (e.g., needing the object_slug from a parent custom object).

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With no annotations provided, the description carries the full burden of behavioral disclosure. While 'Get' implies a read-only operation, the description fails to disclose what happens if the ID doesn't exist (404 error?), whether the user needs specific permissions to view the element, or the structure of the return value.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 2/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    At three words, the description is technically concise, but this brevity represents under-specification rather than efficiency. Given the complexity of the domain (evidenced by 100+ sibling tools including create_program_element and update_program_element), the description fails to front-load critical context about the tool's specific purpose.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    For a tool in what appears to be a complex LMS/curriculum management system (with related tools for program editions, enrollments, and planning), the description is inadequate. It does not explain the relationship between a program element and other entities (programs, editions), nor does it clarify the distinction between this tool and related retrieval siblings, leaving significant gaps in an agent's understanding.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With 100% schema description coverage (the 'id' parameter is documented as 'ID of the program element to retrieve'), the baseline score applies. The description adds no additional parameter context (such as where to obtain valid IDs or formatting requirements), but the schema is self-sufficient.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Get an element' uses a generic verb and resource that fails to specify what constitutes a 'program element' in this domain (e.g., a course module, lesson, or curriculum component). Crucially, it omits 'program' from the resource name, making it less informative than the tool name itself, and provides no differentiation from siblings like get_program_elements (plural list) or get_elements_of_program_edition.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 1/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    The description provides no guidance on when to use this single-item retrieval tool versus the plural get_program_elements (likely a list operation) or get_elements_of_program_edition. There are no prerequisites mentioned (e.g., obtaining the ID from another call) and no error conditions described.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With no annotations provided, the description carries the full burden of behavioral disclosure. While 'Get' implies a read-only operation, the description omits pagination behavior (despite cursor/per_page parameters), side effects, rate limits, or whether 'all' implies unbounded retrieval or respects the pagination parameters.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 2/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The single sentence is brief but constitutes under-specification rather than efficient conciseness. It fails to earn its place by adding no informational value beyond the tool name itself, leaving the agent without actionable context.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given the presence of multiple similar sibling tools and the ambiguous domain terminology ('program personal program elements'), the description is insufficient. Without an output schema or annotations, the description needed to explain the entity type and relationships, but provided none.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    Input schema has 100% description coverage for both parameters (cursor and per_page), establishing a baseline score of 3. The description adds no additional parameter context (e.g., default pagination limits, cursor format), but the schema compensates adequately.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Get all program personal program elements' is tautological, merely expanding the snake_case tool name into a sentence without clarifying what 'program personal program elements' actually refers to. It fails to distinguish this tool from siblings like 'get_program_elements' or 'get_elements_of_program_edition'.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 1/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance provided on when to use this tool versus the many sibling tools that appear to retrieve similar entities (e.g., 'get_program_elements', 'get_program_edition_of_elements_batch'). No mention of prerequisites or filtering capabilities.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With no annotations provided, the description carries the full burden of behavioral disclosure but provides minimal information. It does not explain pagination behavior (despite having pagination parameters), rate limits, authorization requirements, or whether 'all' refers to all programs in the system or just those accessible to the authenticated user.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 2/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    While brief (3 words), this is under-specification rather than effective conciseness. The single sentence fails to earn its place by not conveying essential context like pagination behavior or differentiation from singular retrieval tools.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given the presence of pagination parameters and numerous sibling tools (including 'get_program', 'create_program', etc.), the description is incomplete. It lacks explanation of pagination, scope of 'all', and differentiation from related tools, leaving agents to infer usage from parameter names alone.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    The input schema has 100% description coverage for both parameters (cursor and per_page), so the baseline score applies. The description adds no additional semantic information about the parameters, but the schema adequately documents them.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Get all programs' is essentially a tautology of the tool name 'get_programs'. While it confirms the action (Get) and resource (programs), it fails to distinguish from the sibling tool 'get_program' (singular) or clarify whether this returns a complete unfiltered list versus a scoped subset.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 1/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    There is no guidance on when to use this tool versus 'get_program' (singular) or other related tools like 'get_program_edition'. No mention of pagination strategy despite the presence of cursor and per_page parameters.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With no annotations provided, the description carries the full burden of behavioral disclosure. It implies a read operation but fails to disclose pagination behavior (despite cursor/per_page params), default page sizes, or whether results are cached. The term 'all' is misleading since the endpoint appears paginated.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
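    A description that fixes the gaps flagged above might look like the following sketch. This is purely hypothetical wording, not the server's actual definition: the default page size, the user-scoped visibility, and the cursor termination behavior are all assumptions made for illustration.

```python
# Hypothetical improved definition for get_programs. The description
# discloses scope, pagination, and a sibling-tool disambiguation
# instead of restating the tool name. All behavioral specifics below
# (default 25, null cursor at end, user-scoped results) are assumed.
improved_tool = {
    "name": "get_programs",
    "description": (
        "List programs visible to the authenticated user, one page at "
        "a time. Returns up to per_page results (assumed default: 25) "
        "plus a cursor for the next page; call again with the returned "
        "cursor until it is null to enumerate every program. Use "
        "get_program instead when you already know a program's ID."
    ),
    "inputSchema": {
        "type": "object",
        "properties": {
            "cursor": {
                "type": "string",
                "description": "Opaque cursor from the previous page.",
            },
            "per_page": {
                "type": "integer",
                "description": "Maximum results per page.",
            },
        },
    },
}
```

    Note how the rewrite front-loads the scope of 'all', explains the pagination contract the parameters imply, and names the sibling tool an agent should prefer for single-record lookups.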

    Conciseness 2/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    While brief (4 words), this represents under-specification rather than efficient conciseness. The single sentence fails to front-load critical information about pagination or filtering capabilities that would help an agent invoke the tool correctly.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    For a tool with 7 filter parameters including complex enums (for_type with 6 values) and pagination, the description is inadequate. It mentions none of the filtering capabilities (visibility, use_as_duplicate_indicator) or the relationship between the for_* parameters.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    The input schema has 100% description coverage (every parameter has a basic description), establishing a baseline of 3. The description itself adds no semantic context about the filtering logic (e.g., how for_user vs for_account interact) or enum meanings beyond what the schema provides.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Get all signup_question records' restates the tool name with minimal elaboration. While it identifies the resource (signup_question) and action (get), it fails to distinguish from the many sibling 'get_' tools or explain what distinguishes a 'signup_question' from other record types.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 1/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance provided on when to use this tool versus alternatives, when pagination is required, or how to use the filtering parameters (for_user, for_account, for_type) effectively. The agent has no signals about prerequisites or search strategies.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With no annotations provided, the description carries the full burden. While 'Get' implies a read-only operation, the description lacks explicit safety guarantees, error handling details (e.g., behavior when ID doesn't exist), or return value structure.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 2/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    At 4 words, it is brief, but this is under-specification rather than efficient conciseness. The phrase 'an user' contains a grammatical error, and the sentence adds no value beyond the tool name itself.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given the presence of siblings like 'get_users', 'create_user', and 'update_user', the description should clarify this is a singular retrieval operation. Without an output schema, some mention of return behavior would improve completeness.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    The schema has 100% description coverage ('ID of the user to retrieve'), so the baseline is 3. The description adds no additional parameter context, but the schema adequately documents the single required parameter.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Get an user record' is a tautology that restates the tool name with minimal variation. It fails to distinguish this tool from sibling 'get_users' (which likely returns a list) or specify that this retrieves a single user by ID.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 1/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance provided on when to use this single-record lookup versus 'get_users' (list operation), or versus 'get_current_educator'. No mention of prerequisites (e.g., knowing the user ID) or error conditions.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    Annotations already indicate this is a non-destructive, idempotent write operation (readOnlyHint=false, destructiveHint=false, idempotentHint=true). The description confirms the mutation nature but adds no behavioral context about side effects, what happens to omitted fields, or how the nested course_tab_contents_attributes are processed (e.g., replaced, appended, or merged).

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
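    The missing behavioral detail could be carried by the description alongside the declared annotations, as in the sketch below. The annotation values mirror those reported in this review; the claim that submitted tabs replace the existing set is an assumption made only to show what a disclosing description looks like.

```python
# Hypothetical annotated definition for update_catalog_product.
# Annotation values match those the review reports; the nested-array
# semantics stated in the description ("replace the existing set")
# are an ASSUMPTION for illustration, not documented server behavior.
tool = {
    "name": "update_catalog_product",
    "description": (
        "Update fields of an existing catalog product. Omitted fields "
        "are left unchanged. If course_tab_contents_attributes is "
        "provided, the submitted tabs replace the existing set "
        "(assumed semantics). Returns the updated product."
    ),
    "annotations": {
        "readOnlyHint": False,
        "destructiveHint": False,
        "idempotentHint": True,
    },
}
```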

    Conciseness 2/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    While the four-word description is brief, it is underspecified rather than efficiently concise. It front-loads no critical information beyond the tool name itself and wastes the opportunity to explain complex nested structures like course_tab_contents_attributes.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given the tool's complexity—including nested object arrays, 5 parameters, mutation semantics, and no output schema—the description is inadequate. It fails to explain the relationship between catalog products and course tabs, the purpose of the custom object, or provide any return value information.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With 80% schema description coverage, the input schema already documents most parameters (e.g., 'ID of the catalog product to update', 'The custom properties of the product'). The description adds no additional parameter guidance, examples, or constraints, meeting the baseline expectation for high-coverage schemas.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Update a catalog product' essentially restates the tool name (update_catalog_product) with minimal elaboration. While it identifies the verb and resource, it fails to distinguish this tool from siblings like update_catalog_variant or update_course, and provides no domain context about what constitutes a catalog product in this system.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 1/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance is provided regarding when to use this tool versus alternatives (e.g., update_catalog_variant), prerequisites for invocation (such as required permissions or pre-fetching the product), or whether this supports partial updates (PATCH) versus full replacements (PUT).

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    Annotations declare idempotentHint=true and destructiveHint=false, covering the safety profile. However, the description adds no behavioral context beyond these annotations—omitting whether partial updates are supported, what happens if the ID doesn't exist, or any side effects on child categories via parent_id.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 2/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    At three words, the description is under-specified rather than appropriately concise. It front-loads no useful information beyond the tool name itself, failing to earn its place as helpful documentation.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    For a mutation tool with no output schema, the description inadequately explains the operation's scope. Despite having well-documented parameters, it fails to clarify whether updates are partial (PATCH-like) or full replacement, and omits hierarchical implications of the parent_id field.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With 100% schema description coverage, the schema itself documents all 6 parameters (id, name, slug, description, is_published, parent_id) adequately. The description mentions no parameters, meeting the baseline expectation when schema coverage is comprehensive.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Update a category' is a tautology that restates the tool name (update_category). While it contains a verb and resource, it fails to specify what 'category' means in this domain or distinguish from sibling tools like create_category or get_category.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 1/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance provided on when to use this tool versus alternatives (e.g., create_category for new categories), nor any mention of prerequisites like requiring an existing category ID. The description offers zero contextual hints for selection.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    The description adds no behavioral context beyond what annotations already provide. Annotations indicate the operation is non-destructive and idempotent, but the description doesn't clarify what happens when optional parameters are omitted or whether the operation is reversible.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 2/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    While extremely brief (three words), this represents under-specification rather than efficient conciseness. The single sentence fails to earn its place by providing meaningful information beyond the tool name.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given the presence of sibling CRUD operations and the fact that 'content' is optional (while 'id' is required), the description fails to explain the partial update semantics or expected behavior, leaving critical gaps despite the simple schema.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With 100% schema description coverage, the schema already fully documents both parameters (id and content). The description mentions neither, but baseline 3 is appropriate when the schema carries the semantic load.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Update a comment' is a tautology that merely restates the tool name. It fails to specify what 'update' entails (e.g., partial vs full replacement) and does not distinguish this tool from siblings like create_comment or delete_comment.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 1/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance provided on when to use this tool versus alternatives (create_comment, delete_comment), nor any mention of prerequisites such as the comment needing to exist beforehand.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    Annotations indicate idempotentHint=true and destructiveHint=false, which align with the 'update' operation name, so no contradiction exists. However, the description adds no behavioral context beyond these annotations—missing details on partial update support (evident in schema but not described), error handling for invalid IDs, or side effects like notification triggers.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 2/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description is three words and severely under-specified for a 7-parameter mutation tool. While not verbose, it fails the 'appropriately sized' criterion for the complexity involved—every sentence should earn its place, but this provides insufficient value for the agent's decision-making.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    For a write operation with 7 parameters and no output schema, the description is incomplete. It omits the grading domain context (enrollments, gradeables), relationships to sibling CRUD tools (create_grade, delete_grade), and whether partial updates are supported (implied by optional parameters in schema but not confirmed in description).

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    Schema description coverage is 100%, with all 7 parameters (id, grade, score, gradeable_id, gradeable_type, comment, enrollment_id) documented in the input schema. The description adds no parameter-specific guidance, but the baseline score of 3 applies when schema coverage is comprehensive.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Update a grade' is a tautology that restates the tool name. While it identifies the action (update) and resource (grade), it fails to specify the domain context (academic/LMS grading system evident from enrollment/gradeable parameters) or distinguish from sibling operations like create_grade or delete_grade.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 1/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance provided on when to use update_grade versus create_grade (for new grades) or set_attendance (for related academic records). No mention of prerequisites like requiring an existing grade ID, or behavior when the grade doesn't exist.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With no annotations provided, the description bears full responsibility for behavioral disclosure. While 'Update' implies mutation, it fails to specify whether unspecified fields are preserved (PATCH) or reset (PUT), nor does it mention idempotency, validation rules, or side effects like webhook deactivation/reactivation.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
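    The PATCH-versus-PUT ambiguity matters because the two semantics yield different results for any field the agent omits. A minimal, generic illustration (not this server's code):

```python
def patch_update(record: dict, changes: dict) -> dict:
    """PATCH-style merge: fields absent from `changes` are preserved."""
    merged = dict(record)
    merged.update(changes)
    return merged

def put_update(record: dict, changes: dict) -> dict:
    """PUT-style replacement: fields absent from `changes` are lost."""
    return dict(changes)

webhook = {"url": "https://old.example", "active": True,
           "events": ["order.created"]}

print(patch_update(webhook, {"active": False}))  # url and events survive
print(put_update(webhook, {"active": False}))    # url and events are gone
```

    A description that simply stated which of these two behaviors applies would let an agent safely toggle `active` without re-sending `url` and `events`.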

    Conciseness 2/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    At 3 words, the description is brief, but this represents under-specification rather than efficient conciseness. The single sentence fails to earn its place by providing zero information beyond what is already obvious from the tool name.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    For a mutation tool with no annotations and no output schema, the description should explain operational semantics (partial updates, field persistence) and expected outcomes. Currently, it provides insufficient context for safe invocation.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    Input schema has 100% description coverage for all 4 parameters (id, url, active, events). The description adds no parameter-specific guidance, but with high schema coverage, the baseline score of 3 is appropriate as it does not need to compensate for undocumented parameters.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Update a webhook.' is tautological, merely restating the tool name without elaborating on the specific update semantics (e.g., partial vs. full replacement). It does not differentiate from sibling webhook operations like create_webhook or delete_webhook.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 1/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance provided on when to use this tool versus create_webhook (for new webhooks) or delete_webhook (for removal). No mention of prerequisites such as requiring a valid webhook ID or whether all fields must be provided.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 1/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    No annotations are provided, so the description carries the full burden of behavioral disclosure. It fails to explain what 'cancel' means (status change vs. hard deletion), whether the operation is reversible, or if there are side effects like notifications or refund triggers.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 3/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The four-word description contains no waste, but it suffers from under-specification rather than efficient conciseness. Given the complexity implied by the sibling tool cancel_enrollment, a single sentence restating the name is insufficiently informative.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    For a destructive/mutative operation on a program enrollment, the description is inadequate. It lacks explanation of the cancellation logic, status implications, and differentiation from similar tools, leaving critical gaps despite the simple single-parameter schema.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With 100% schema description coverage, the schema fully documents the 'id' parameter as 'ID of the program enrollment'. The description adds no additional parameter semantics, meeting the baseline expectation when the schema does the heavy lifting.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Cancel a program enrollment' is a direct tautology of the tool name cancel_program_enrollment, merely replacing underscores with spaces. While the verb and resource are clear, it fails to distinguish from the sibling tool cancel_enrollment, leaving ambiguity about which cancellation tool to use.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance provided on when to use this tool versus cancel_enrollment or other state-changing operations. No prerequisites are mentioned (e.g., whether the enrollment must be in a specific status to be cancellable).

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
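    The "use X instead of Y when Z" pattern the rubric asks for could be as short as the sketch below. The wording is illustrative only: the program-versus-course distinction and the status precondition are assumptions, since the actual relationship between the two cancel tools is undocumented.

```python
# Hypothetical disambiguating description for cancel_program_enrollment.
# The program-vs-course distinction is ASSUMED for illustration.
description = (
    "Cancel a program enrollment (a learner's enrollment in an entire "
    "program). Use cancel_enrollment instead to cancel enrollment in a "
    "single course. Requires the program enrollment ID; any required "
    "prior status (e.g. active) is not documented and should be "
    "verified before relying on this tool."
)
```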

  • Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    Annotations establish this is a non-idempotent write operation (readOnlyHint: false, idempotentHint: false), but the description adds no behavioral context about side effects, uniqueness constraints on uid, or what happens if the authentication_provider_type is invalid.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 3/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The single sentence is appropriately sized and front-loaded, but its extreme brevity results in under-specification rather than efficient communication of essential details.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    For a creation operation with no output schema, the description lacks critical context: return value structure, error conditions (e.g., duplicate uid), and the entity lifecycle. The 100% schema coverage for inputs does not compensate for missing behavioral documentation.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    Schema coverage is 100% with adequate descriptions for all three parameters (uid, user_id, authentication_provider_type). The description adds no additional parameter context, meeting the baseline for high schema coverage.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Create an authentication' is essentially a tautology that restates the tool name. It fails to explain what an 'authentication' entity represents (e.g., a login method linking a user to an auth provider) or how it differs from siblings like delete_authentication_from_user.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 1/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance provided on when to use this tool versus alternatives, prerequisites (e.g., whether the User must exist first), or constraints on the uid parameter. The agent must infer usage solely from parameter names.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 1/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    No annotations are provided, and the description fails to disclose critical behavioral traits: whether deletion is permanent, if it cascades to attendees/planned courses, or if it requires specific authorization.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 3/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description is front-loaded and contains no waste, but it is arguably under-specified for a destructive operation—every sentence earns its place, yet more sentences are needed.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    For a destructive operation with no annotations and no output schema, the description is inadequate. It lacks critical safety context (permanence, side effects, recovery options) that agents need to invoke this tool responsibly.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With 100% schema description coverage, the schema fully documents the 'id' parameter. The description adds no parameter details, earning the baseline score for high schema coverage.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose3/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description states the specific verb (delete) and resource (meeting), but is extremely minimal and does not differentiate from similar deletion tools like delete_meeting_location or delete_planning_event.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines1/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance provided on when to use this tool versus alternatives, prerequisites (e.g., permissions), or whether to use cancel versus delete for meetings.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
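For a destructive tool like this one, the missing behavioral contract can live in both the annotations and the prose. A hedged sketch; whether deletion is permanent, whether it cascades to attendance records, and whether a cancel alternative exists are assumptions the real server would need to confirm:

```python
# Illustrative rewrite for a destructive tool: annotations carry the hints,
# while the description spells out permanence and cascade behavior in prose.
delete_meeting = {
    "name": "delete_meeting",
    "description": (
        "Permanently delete a meeting; this cannot be undone. Attendance "
        "records for the meeting are also removed. Prefer a cancel "
        "operation, if one exists, when the meeting should stay on record."
    ),
    "annotations": {
        "readOnlyHint": False,
        "destructiveHint": True,
        "idempotentHint": True,  # deleting an already-deleted meeting changes nothing
    },
    "inputSchema": {
        "type": "object",
        "properties": {
            "id": {"type": "string", "description": "ID of the meeting to delete"}
        },
        "required": ["id"],
    },
}
```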

  • Behavior2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    No annotations are present, so the description carries the full burden. It states the destructive action but omits critical behavioral details: whether deletion is permanent or soft, what happens to related planning_attendees (per sibling tools), or required permissions.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness3/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

The description is appropriately brief at three words, but the single sentence adds no value beyond the tool name. It is front-loaded but vacuous.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    For a destructive operation with no output schema or annotations, the description is inadequate. It lacks warnings about cascading effects on related entities (planning attendees, teachers) or error conditions (e.g., attempting to delete non-existent events).

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    Schema coverage is 100% with the 'id' parameter fully documented as 'ID of the planning event to delete'. The description adds no additional parameter context, meeting the baseline for high schema coverage.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Delete a planning event' is a tautology that restates the tool name without adding specificity. It fails to distinguish 'planning events' from similar sibling resources like 'meetings' or 'planned courses', or clarify what constitutes a planning event in this domain.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines1/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance provided on when to use this tool versus alternatives (e.g., cancel_planned_course), prerequisites (such as checking event existence via get_planning_event), or whether the event must be in a specific state before deletion.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    While the description does not contradict the annotations (readOnly/idempotent/destructive hints), it adds no behavioral context beyond them. It fails to mention pagination behavior (cursor/per_page), sorting capabilities, or what data structure is returned, all critical for a list endpoint with 7 parameters.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness3/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

The description is brief (four words) and front-loaded, but the word 'all' creates ambiguity given the filtering capabilities. It spends its limited space on a misleading absolute rather than qualifying the scope.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    For a tool with 7 parameters including pagination, filtering, and sorting—and no output schema provided—the description is inadequate. It omits critical context about the tool's capabilities, return format, and relationship to sibling tools that an agent would need to invoke it correctly.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With 100% schema description coverage, the baseline is 3. The description itself adds no information about parameter semantics, usage patterns (e.g., that 'published' only accepts the string 'published'), or examples of valid 'sort' values, relying entirely on the schema documentation.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Get all catalog products' is essentially a tautology of the tool name. The word 'all' is misleading given the tool supports filtering (category_id, search, published) and pagination. It fails to distinguish from the sibling tool 'get_catalog_product' (singular) or clarify what constitutes a 'catalog product' versus other product types in the system.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines1/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No usage guidelines are provided. The description does not indicate when to use this list endpoint versus the singular 'get_catalog_product', nor does it mention prerequisites like authentication requirements or suggest when to apply specific filters (e.g., using 'published' vs fetching all).

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
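A rewrite could drop the misleading 'all', qualify the scope, and surface the pagination contract directly. A sketch using the parameter names from the review above; the exact phrasing and return shape are assumptions:

```python
# Illustrative rewrite of a list-endpoint description: scope is qualified,
# filters and pagination are named, and the singular sibling is referenced.
get_catalog_products = {
    "name": "get_catalog_products",
    "description": (
        "List catalog products one page at a time. Narrow results with "
        "category_id, search, or published ('published' is the only "
        "accepted value); order them with sort; pass the cursor from the "
        "previous page to continue. Use get_catalog_product to fetch a "
        "single product by ID."
    ),
}
```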

  • Behavior2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    While annotations correctly declare this as read-only and non-destructive, the description adds no behavioral context about pagination (despite having cursor/per_page parameters), filtering logic, or return format. The 'all' claim contradicts the actual paginated behavior where only 25 items are returned by default.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness3/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description is extremely concise at four words, but this represents under-specification rather than efficient information density. The single sentence fails to earn its place by providing substantive guidance beyond the tool name itself.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given that this is a paginated list endpoint with filtering capabilities and no output schema, the description is incomplete. It should explain pagination behavior, the optional student_id filter, and what constitutes a 'credit record' in this domain.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    The input schema has 100% description coverage for all three parameters (cursor, per_page, student_id). The description adds no additional semantic information about these parameters, meeting the baseline expectation when schema coverage is high.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Get all credit records' is tautological, essentially restating the tool name 'get_credits'. It fails to distinguish from sibling tools like 'get_credit_categories' or clarify what type of 'credits' these are (academic vs. financial). The claim to get 'all' records is misleading given the presence of a student_id filter parameter.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines1/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives, when to page with the cursor versus fetch everything, or how the student_id filter affects behavior.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
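The cursor contract the description leaves implicit follows a common pattern: request a page, read the next cursor from the response, and repeat until none is returned. A hypothetical client loop; call_tool, the data field, and next_cursor are assumed names, not this server's documented response shape:

```python
def fetch_all_credits(call_tool, student_id=None):
    """Page through get_credits until the server stops returning a cursor.

    call_tool(name, args) is a hypothetical MCP client helper; the
    'data'/'next_cursor' response keys are assumptions for illustration.
    """
    records, cursor = [], None
    while True:
        args = {"per_page": 25}
        if student_id is not None:
            args["student_id"] = student_id  # optional server-side filter
        if cursor is not None:
            args["cursor"] = cursor  # continue from the previous page
        page = call_tool("get_credits", args)
        records.extend(page["data"])
        cursor = page.get("next_cursor")
        if cursor is None:  # no further pages
            return records
```

A description that states this contract in one sentence spares the agent from discovering it by trial and error.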

  • Behavior2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    While annotations declare readOnly/idempotent hints, the description fails to resolve the semantic contradiction between the singular 'current educator' name and the list-oriented pagination parameters. It does not disclose what constitutes 'current' (active session? authenticated user?) or explain why a singular getter supports cursor-based pagination typically used for collections.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness3/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description is extremely concise (four words), but this brevity is insufficient given the tool's ambiguous semantics. While no words are wasted, the lack of content fails to address critical distinctions needed for proper tool selection.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Absent an output schema, the description fails to clarify the return structure (single object vs paginated list). The 'current' qualifier remains unexplained, and the relationship between the educator entity and teacher entities (per sibling tools) is undefined, leaving significant gaps for an agent attempting to use this tool correctly.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With 100% schema description coverage, the parameters cursor and per_page are fully documented in the schema. The description adds no parameter-specific context, but the baseline score of 3 is appropriate given the schema self-documents the pagination intent.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Get an educator record' is tautological, merely restating the tool name with 'an' instead of 'the current'. It fails to distinguish 'educator' from sibling 'teacher' entities (get_teacher, get_teachers) or clarify what 'current' means (authenticated user?). The mismatch between the singular 'record' and the pagination parameters (cursor, per_page) creates ambiguity about whether this retrieves one or many records.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines1/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance provided on when to use this versus get_teacher or get_teachers. No mention of prerequisites (e.g., authentication requirements to identify the 'current' educator) or filtering capabilities implied by the pagination parameters.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
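Resolving the contradictions the review identifies means committing, in the description, to what 'current' means and what is returned. One possible disambiguation, assuming 'current' refers to the authenticated user and that a single object comes back; both assumptions would need confirming against the real server:

```python
# Illustrative rewrite: 'current' is defined, the return shape is stated,
# and the teacher siblings are pointed to for every other lookup.
get_current_educator = {
    "name": "get_current_educator",
    "description": (
        "Return the educator record of the authenticated user as a single "
        "object; the pagination parameters are accepted but have no "
        "effect. Use get_teacher or get_teachers to look up other staff."
    ),
}
```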

  • Behavior2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    Annotations already declare readOnlyHint=true and idempotentHint=true, confirming safe read operations. However, the description adds no context about pagination behavior (cursor-based), default page sizes, or how multiple filters interact (AND logic).

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness3/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The four-word description is efficiently structured and front-loaded, but it is underspecified for a tool with 6 parameters and pagination support. It prioritizes brevity over necessary functional context.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given 6 parameters including cursor-based pagination and multiple filters, the description is incomplete. It lacks explanation of pagination mechanics, filtering logic, or return value structure (no output schema exists to compensate).

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    Input schema has 100% description coverage for all 6 parameters (cursor, per_page, student_id, planned_course_id, status, with_canceled). Since the schema fully documents the parameters, the description baseline is 3, though it adds no additional semantic context.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Get all enrollment records' essentially restates the tool name (get_enrollments) without clarifying scope. It fails to distinguish from sibling tool 'get_enrollment' (singular) or explain whether this returns a list vs. single record.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines1/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance provided on when to use this versus 'get_enrollment' (singular), 'get_program_enrollments', or 'cancel_enrollment'. No mention of pagination requirements or filtering best practices despite having 6 optional parameters.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
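The filter interaction left undocumented (the AND logic noted in the review above) is the kind of detail one clause can state. A hypothetical helper that builds a get_enrollments argument dict, omitting unset filters; the AND semantics and the default page size of 25 are assumptions:

```python
def enrollment_filters(student_id=None, planned_course_id=None,
                       status=None, with_canceled=None, per_page=25):
    """Build a get_enrollments argument dict, dropping unset filters.

    Filters are assumed to combine with AND on the server side, so
    supplying both student_id and planned_course_id would return only
    that student's enrollments in that course.
    """
    args = {"per_page": per_page}
    for key, value in {
        "student_id": student_id,
        "planned_course_id": planned_course_id,
        "status": status,
        "with_canceled": with_canceled,
    }.items():
        if value is not None:
            args[key] = value
    return args
```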

  • Behavior2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With no annotations provided, the description carries the full burden of behavioral disclosure but offers minimal information. It states 'all' implying unfiltered retrieval, but lacks details on rate limits, default sorting, maximum page sizes, or whether the operation is idempotent/safe.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness3/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description is extremely concise at four words, but this brevity results in under-specification rather than efficiency. While not verbose or poorly structured, it wastes the opportunity to provide necessary context in the front-loaded position.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    For a list endpoint with standard pagination, the description fails to explain the referral entity structure, relationships to other objects (like users or programs visible in siblings), or the pagination implementation details. Without an output schema, the description should compensate but does not.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    The input schema has 100% description coverage for both parameters (cursor and per_page), clearly documenting pagination mechanics. The description adds no parameter-specific context, but the schema adequately compensates, meeting the baseline expectation.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Get all referral records' is tautological—it merely restates the tool name 'get_referrals' with the addition of 'all' and 'records'. It fails to specify what constitutes a 'referral' in this domain (e.g., student referrals, program referrals) or how this relates to similar concepts like leads visible in the sibling list.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines1/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance provided on when to use this tool versus alternatives, nor any mention of pagination behavior beyond the parameter names. The description does not indicate whether this retrieves historical referrals, active referrals, or filtered subsets.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With no annotations provided, the description carries the full burden of behavioral disclosure but provides none. It does not indicate whether the operation is safe/idempotent, what happens when the task ID doesn't exist, or what data structure is returned.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness3/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

The description is appropriately brief at three words but offers no front-loaded value. While not verbose, the extreme brevity amounts to under-specification rather than efficient communication of essential details.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given the lack of output schema and annotations, the description should explain the task entity structure, relationships to other objects (courses, teachers), or error handling. As a simple CRUD retrieval tool, it meets minimum viability only for the happy path.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    The input schema has 100% description coverage ('ID of the task to retrieve'), adequately documenting the single parameter. The description adds no additional semantic information, but the schema is sufficient for baseline understanding.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Get a task record' is essentially a tautology that restates the tool name without clarifying what constitutes a 'task' in this domain (e.g., to-do item, assignment, course task). It fails to distinguish from sibling tool 'get_tasks' which likely retrieves multiple records.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines1/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance provided on when to use this single-record retrieval versus the 'get_tasks' list operation. No mention of prerequisites, required permissions, or error conditions (e.g., invalid ID).

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    Annotations already indicate this is a non-read-only (readOnlyHint: false), non-idempotent write operation. The description adds no behavioral context beyond these annotations—it doesn't clarify whether this creates a new record or updates an existing one, nor does it explain the side effects of multiple invocations despite the idempotentHint: false flag.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness3/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description is extremely brief at three words, avoiding verbosity or redundancy. However, this conciseness results from under-specification rather than efficient information density, leaving the agent with minimal actionable context.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    For a state-mutating tool with four parameters and no output schema, the description is insufficient. It fails to explain the business logic (what attendance tracking means), the relationship between required parameters (meeting_id and enrollment_id), or the implications of the different enum states.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With 100% schema description coverage, the input parameters are fully documented in the structured schema. The description adds no parameter-specific guidance, but the baseline score of 3 is appropriate since the schema carries the semantic burden without needing supplementation.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Set an attendance' is essentially a tautology that restates the tool name with minimal elaboration. While it identifies the domain (attendance), it fails to explain what 'setting' entails (e.g., marking presence/absence) or distinguish this mutation operation from the sibling get_attendances tool.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines1/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    The description provides no guidance on when to invoke this tool versus alternatives, nor does it mention prerequisites like obtaining valid meeting_id or enrollment_id values. There is no discussion of workflow context or exclusion criteria.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
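A fuller definition would name the two required IDs, explain what 'setting' records, and document each enum state. A sketch; the three states below and their meanings are hypothetical, since the review does not list the actual values, and the sibling lookup tools named in the description are likewise assumed:

```python
# Illustrative rewrite: the description explains the business action and
# the parameter relationship; the schema documents each enum state inline.
set_attendance = {
    "name": "set_attendance",
    "description": (
        "Record whether the student on an enrollment attended a specific "
        "meeting. Both meeting_id and enrollment_id are required; look "
        "them up first via the meeting and enrollment list tools."
    ),
    "inputSchema": {
        "type": "object",
        "properties": {
            "meeting_id": {
                "type": "string",
                "description": "Meeting the status applies to",
            },
            "enrollment_id": {
                "type": "string",
                "description": "Enrollment (student) being marked",
            },
            "status": {
                "type": "string",
                "enum": ["present", "absent", "excused"],  # hypothetical states
                "description": (
                    "present = attended, absent = missed, "
                    "excused = absence approved"
                ),
            },
        },
        "required": ["meeting_id", "enrollment_id"],
    },
}
```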

  • Behavior2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    The description does not contradict the annotations (idempotentHint: true, destructiveHint: false, readOnlyHint: false), but adds no behavioral context beyond what the annotations already provide, such as side effects or partial update behavior.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness3/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

While extremely brief at three words, the description is not verbose or poorly structured. It lacks informational density, however: every sentence should earn its place, and this one provides minimal value.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Despite good annotations and schema coverage, the description fails to clarify the domain concept of an affiliation (linking users to accounts) or explain the update semantics (e.g., partial updates, immutability of certain fields).

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With 100% schema description coverage, the parameters are well-documented in the schema itself. The description adds no additional parameter guidance, meeting the baseline for high schema coverage scenarios.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Update an affiliation' is essentially a tautology that restates the tool name without explaining what an affiliation represents (a user-account relationship per the schema) or how it differs from create_affiliation or delete_affiliation siblings.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines1/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance provided on when to use this tool versus alternatives, prerequisites for the update, or which fields are optional versus required beyond the schema itself.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
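A rewrite can carry the domain concept and the update semantics in two sentences. A sketch, assuming partial-update behavior; the review notes this is never actually stated, so the "omitted fields keep their values" claim would need confirming:

```python
# Illustrative rewrite: the description defines the entity, states the
# (assumed) partial-update contract, and routes to the create/delete siblings.
update_affiliation = {
    "name": "update_affiliation",
    "description": (
        "Update an existing affiliation: the link between a user and an "
        "account. Only the fields supplied are changed; omitted fields "
        "keep their current values. Use create_affiliation to add a link "
        "and delete_affiliation to remove one."
    ),
}
```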

  • Behavior2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    Annotations already declare this is a non-destructive, idempotent write operation (readOnlyHint: false, idempotentHint: true, destructiveHint: false). The description adds no behavioral context beyond these annotations—failing to mention that unspecified fields are preserved, what validation rules apply to addresses, or the response format.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness3/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The four-word description is maximally concise, but this brevity renders it unhelpful rather than efficient. While not verbose, the sentence fails to earn its place by providing information beyond the tool name itself.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given the tool's complexity—including a nested address object with seven sub-fields and no output schema—the description is inadequate. It omits the scope of updatable fields, partial update behavior (only 'id' is required), and return value expectations, leaving significant gaps the agent must resolve through trial and error.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With 67% schema description coverage, the input schema adequately documents the id, name, and nested address_attributes fields. The description adds no parameter-specific guidance (e.g., that address_attributes is optional or that id identifies the existing record), but the schema carries sufficient semantic weight to meet baseline expectations.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
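To illustrate full coverage on a nested parameter, here is a hypothetical JSON Schema fragment for an address_attributes object; the field names are invented and do not reflect the server's actual schema.

```python
# Hypothetical JSON Schema fragment for an address_attributes parameter;
# field names are invented to illustrate full description coverage.
address_schema = {
    "type": "object",
    "description": "Optional. Replaces the location's address when provided.",
    "properties": {
        "street": {"type": "string", "description": "Street name and number."},
        "zip_code": {"type": "string", "description": "Postal code."},
        "city": {"type": "string", "description": "City name."},
        "country_code": {
            "type": "string",
            "description": "ISO 3166-1 alpha-2 code, e.g. 'NL'.",
        },
    },
}

# Every sub-field carries a description, closing the coverage gap noted above.
assert all("description" in p for p in address_schema["properties"].values())
```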

    Purpose 2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Update a course location' is essentially a tautology that restates the tool name. While it identifies the verb (update) and resource (course location), it fails to distinguish this tool from siblings like create_course_location or delete_course_location, and omits the critical detail that this specifically handles address attributes.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 1/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    The description provides no guidance on when to use this tool versus alternatives (create_course_location), prerequisites (existing location ID required), or whether partial updates are supported. The agent must infer usage patterns solely from the schema structure.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    Annotations indicate idempotent, non-destructive mutation, but the description adds no behavioral context beyond this. It doesn't explain the update semantics (partial vs full replacement), what happens to unset fields, or the business logic constraint implied by end_date (fixed vs flexible courses mentioned in schema).

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 3/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    At three words, the description is maximally concise but falls into under-specification. Structure is front-loaded but content-free; it wastes no words yet fails to earn its place by providing actionable context.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    For a mutation tool with no output schema, the description inadequately covers the operation scope. It omits what constitutes a valid update (e.g., end_date restrictions), relationships to enrollment lifecycle states, and whether partial updates are supported.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    Schema description coverage is 100%, with both id and end_date fully documented. The description adds no parameter-specific guidance, but baseline 3 is appropriate since the schema carries the full semantic load.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Update an enrollment' is a tautology that restates the tool name without adding specificity. It fails to distinguish from siblings like cancel_enrollment or award_certificate_to_program_enrollment, and doesn't clarify what enrollment attributes are mutable (only end_date per schema).

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 1/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance provided on when to use this tool versus alternatives like cancel_enrollment or create_teacher_enrollment. No prerequisites or conditions mentioned (e.g., can any enrollment be updated or only active ones?).

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    The annotations already declare idempotentHint=true, readOnlyHint=false, and destructiveHint=false. The description adds no behavioral context beyond these annotations—it doesn't explain what happens during the update, side effects, or validation rules. No contradiction exists, but no value is added either.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 3/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description is extremely brief at three words. While it avoids verbosity, it is under-specified rather than elegantly concise. The content is front-loaded, but there is insufficient substance to evaluate structural effectiveness.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given the tool's simplicity (2 parameters, no output schema), the description could suffice if it clarified scope. However, it fails to mention that only status can be updated (not other lead attributes), nor does it address the enum values or their business meanings, leaving gaps given the sibling tool ecosystem.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With 100% schema description coverage, both parameters (id and status) are fully documented in the schema itself. The description adds no parameter-specific semantics, but the baseline score of 3 is appropriate since the schema carries the full burden of documentation.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Update a lead' is essentially a tautology that restates the tool name with minimal expansion. While it identifies the resource (lead), it fails to specify what aspects can be updated (only status per the schema) or distinguish this tool from siblings like create_lead or delete_lead.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 1/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    The description provides no guidance on when to use this tool versus alternatives. It does not mention that this is specifically for status transitions, nor does it clarify prerequisites (e.g., that the lead must exist) or when to use create_lead instead.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
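A hypothetical rewrite shows the "use X instead of Y when Z" pattern the rubric asks for; the sibling tool names match those cited in this review, but the wording is invented.

```python
# Hypothetical description rewrite for update_lead; sibling tool names
# match those mentioned in this review, but the wording is invented.
description = (
    "Transition an existing lead to a new status. "
    "Only the status field can be changed; use create_lead to register a "
    "new lead and delete_lead to remove one. The lead must already exist."
)

# Explicit alternatives and a prerequisite are now present in the text.
assert "create_lead" in description
assert "must already exist" in description
```

One sentence of routing guidance like this is usually enough to move a Usage Guidelines score off the floor.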

  • Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    Annotations already disclose readOnlyHint=false, idempotentHint=true, and destructiveHint=false, establishing this as a safe, retryable write operation. The description adds no behavioral context about partial versus full updates, field preservation rules, or side effects, despite the ambiguity of whether omitted parameters (name, material_group_id) are cleared or ignored.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 3/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    At three words, the description is not verbose, but it is inefficiently terse—it wastes the opportunity to provide necessary context. The single sentence states the obvious without earning its place through valuable additive information.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    For a mutation tool with idempotent semantics and no output schema, the description lacks critical context about the relationship between materials and material groups (given the material_group_id parameter) and whether unspecified fields are preserved during update. The distinction from 'update_material_group' remains unclear despite the sibling tool's existence.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With 100% schema description coverage, the schema fully documents all three parameters (id, name, material_group_id). The description adds no semantic clarification about valid material states, naming constraints, or material group relationships, meriting the baseline score for complete schemas.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Update a material' is essentially a tautology that restates the tool name with spaces added. While it identifies the verb (Update) and resource (material), it fails to distinguish from siblings like 'update_material_group' or clarify what constitutes a 'material' in this domain.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 1/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance provided on when to use this tool versus alternatives such as 'create_material' (for new materials) or 'update_material_group' (for group classification changes). No prerequisites or conditions are mentioned.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    The description does not contradict annotations (readOnlyHint=false, idempotentHint=true, destructiveHint=false), but adds no behavioral context beyond them. It fails to explain the idempotent nature, error handling for non-existent IDs, or that omitted fields preserve existing values.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 3/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description is brief (4 words), but verges on under-specification rather than purposeful conciseness. It lacks front-loaded key details that would help an agent quickly identify this as a partial update operation.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given the simple 2-parameter schema and presence of annotations, the description still falls short by failing to clarify the partial-update nature (name is optional) and not mentioning what constitutes success or the expected interaction pattern with the idempotentHint.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With 100% schema description coverage, the schema adequately documents both id and name parameters. The description adds no additional parameter semantics, meeting the baseline score for high-coverage schemas.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Update a material group' is a tautology that restates the tool name. While it indicates the general action, it fails to distinguish this tool from siblings like create_material_group or update_material, and does not specify the scope of the update operation.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 1/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance provided on when to use this tool versus create_material_group, or prerequisites such as requiring an existing material group ID. No mention of partial update semantics despite the optional name parameter.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    Annotations declare idempotentHint=true and destructiveHint=false, establishing the safety profile. However, the description adds no behavioral context about partial updates (implied by single required 'id' field), error handling for non-existent IDs, or authentication requirements. It contributes nothing beyond the structured annotations.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 3/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The four-word description is not verbose, but it is under-specified rather than efficiently concise. It front-loads no useful constraints or scoping information, failing the 'every sentence should earn its place' standard for this complexity level.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    For an update operation with 5 parameters including nested objects and no output schema, the description is insufficient. It omits crucial update semantics (partial vs. full replacement), does not clarify the idempotent behavior declared in annotations, and provides no guidance on the optional nature of most fields.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    Schema description coverage is 80%, with all top-level fields except address_attributes having descriptions. Per guidelines, high schema coverage establishes a baseline of 3. The description adds no parameter semantics (e.g., explaining the relationship between course_location_id and the meeting location, or address_attributes structure), but the schema adequately documents individual fields.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Update a meeting location' is a tautology that restates the tool name without adding specificity. While it identifies the verb (update) and resource (meeting location), it fails to distinguish from sibling tools like create_meeting_location or delete_meeting_location.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 1/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance provided on when to use this tool versus alternatives like create_meeting_location, nor any mention of prerequisites (e.g., that the location must exist). The description offers no contextual signals for proper tool selection.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With no annotations provided, the description carries full burden for behavioral disclosure but provides none. It does not specify whether this performs partial or full updates (though the schema requires both parameters, suggesting full replacement), does not describe side effects on existing teacher associations, and does not indicate the return value or error conditions.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 3/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description is extremely brief (4 words) without wasted language, but it is underspecified rather than elegantly concise. It lacks the necessary detail to earn a higher score despite its brevity.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    For a mutation operation with no annotations and no output schema, the description is inadequate. It omits the nature of teacher roles, behavioral constraints, and return value information that would be necessary for an agent to use this tool confidently.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    The input schema has 100% description coverage ('ID of the teacher role to update' and 'The name of the teacher role'), so the structured data sufficiently explains the parameters. The description adds no semantic meaning beyond the schema, meeting the baseline for high-coverage schemas.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Update a teacher role' is a tautology that restates the tool name without adding specificity. It fails to define what constitutes a 'teacher role' (e.g., job title, permission set, or classification) and does not differentiate from siblings like create_teacher_role or delete_teacher_role beyond the implicit verb.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 1/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance provided on when to use this tool versus alternatives (e.g., create_teacher_role for new roles), nor any prerequisites such as role existence verification or required permissions. The agent must infer usage solely from the tool name.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    The description adds no behavioral context beyond what the annotations already provide (idempotentHint: true, destructiveHint: false). It does not clarify whether adding a label replaces existing labels or appends to them, what validation occurs, or what constitutes a successful operation.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 4/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description is extremely concise at four words with no redundant or wasted language. However, it is under-specified rather than efficiently informative—appropriate length for the content provided, but lacking necessary detail.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given the parameter ambiguity (two ID fields with confusing descriptions) and lack of output schema, the description should clarify parameter usage and expected outcomes. It leaves critical gaps regarding which identifier corresponds to which resource and what the tool returns upon success.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With 100% schema description coverage, the baseline applies. However, the schema descriptions are confusingly similar (both reference 'label'), and the tool description does not clarify that 'id' likely refers to the order while 'label_id' refers to the label. It neither explains the relationship nor adds semantic meaning beyond the schema's flawed descriptions.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
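A hypothetical corrected schema fragment shows how little it takes to disambiguate the two IDs; the wording below is invented, not the server's actual schema.

```python
# Hypothetical corrected parameter descriptions for the add-label-to-order
# tool; the original schema described both IDs in terms of "label".
params = {
    "id": {"type": "string", "description": "ID of the order to label."},
    "label_id": {"type": "string", "description": "ID of the label to attach."},
}

# Each ID now names the resource it identifies, removing the ambiguity.
assert "order" in params["id"]["description"]
assert "label" in params["label_id"]["description"]
```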

    Purpose 2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Add label to an order' is a tautology that merely restates the tool name in sentence case. While it identifies the verb (add) and resources (label, order), it fails to distinguish this tool from siblings like create_label (which creates the label entity itself) or clarify the nature of the association being created.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 1/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    The description provides no guidance on when to use this tool versus alternatives, prerequisites (e.g., whether the label and order must already exist), or expected workflow sequences. There is no mention of related operations like removing labels or updating orders.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    Annotations indicate the operation is not read-only, idempotent, and non-destructive. The description adds no behavioral context beyond what annotations provide—such as whether cancellation is permanent, reversible, triggers notifications, or has side effects on related enrollments.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 2/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    While brief (three words), the description is under-specified rather than efficiently concise. The single sentence fails to earn its place by adding information beyond the tool name itself, functioning merely as a placeholder.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Although the schema itself is simple (one well-documented parameter, no output schema), the description is incomplete relative to the surrounding tool ecosystem. Given the numerous sibling cancellation tools with overlapping semantics, its failure to clarify scope or differentiate usage makes it inadequate for reliable agent selection.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With 100% schema description coverage ('ID of the enrollment'), the schema fully documents the single parameter. The description adds no additional semantic information (such as where to obtain the ID or validation rules), warranting the baseline score of 3.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Cancel an enrollment' is tautological—it simply restates the tool name with articles added. While it identifies the verb (cancel) and resource (enrollment), it fails to distinguish this tool from siblings like 'cancel_program_enrollment' or 'cancel_order' that operate on similar concepts.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    The description provides no guidance on when to use this tool versus alternatives such as 'cancel_program_enrollment' or 'delete_teacher_enrollment'. Given the extensive list of sibling cancellation tools, the absence of selection criteria leaves the agent without decision-making context.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
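The selection problem can be made concrete with a small routing sketch; the sibling tool names come from this review, but the mapping itself is invented for illustration.

```python
# Hypothetical selection guide; sibling tool names are taken from this
# review, while the routing logic is invented for illustration.
def pick_cancellation_tool(kind: str) -> str:
    """Route a cancellation request to the most specific sibling tool."""
    return {
        "course": "cancel_enrollment",
        "program": "cancel_program_enrollment",
        "order": "cancel_order",
    }[kind]

# A description that encoded this mapping in prose would spare the agent
# the guesswork the review criticizes.
assert pick_cancellation_tool("program") == "cancel_program_enrollment"
```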

  • Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    The description adds no behavioral context beyond the provided annotations. While annotations correctly indicate this is a non-read-only, non-destructive, non-idempotent operation, the description fails to disclose what the tool returns (the created object ID or full object), error conditions (duplicate names), or side effects.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 2/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The three-word description suffers from under-specification rather than genuine conciseness. While it contains no fluff, the single sentence fails to earn its place by providing actionable information beyond the tool name, leaving the agent without necessary context.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    For a creation tool with 4 parameters and no output schema, the description is inadequate. It omits return value documentation, error handling behavior, and domain context (how categories relate to courses/programs in this system) that would be necessary for correct agent invocation.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With 100% schema description coverage, the input parameters are fully documented in the schema itself. The description adds no additional parameter context, but baseline expectations are met since name, description, is_published, and parent_id are all clearly defined in the JSON schema.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Create a category.' is a tautology that restates the tool name (create_category) with minimal variation. While it indicates a write operation, it fails to specify what domain these categories belong to (courses, products, content) or how they relate to the sibling tools like create_course or create_program.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance is provided on when to use this tool versus alternatives like update_category, or prerequisites for creation. The description does not mention that categories can be hierarchical (implied by parent_id parameter) or explain the publishing workflow indicated by is_published.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    The description adds no behavioral context beyond what annotations already provide (readOnlyHint: false confirms it's a write operation). It fails to disclose that the tool creates polymorphic associations across eight different entity types, or what happens upon successful creation.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 2/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    While brief (3 words), the description suffers from under-specification rather than efficient conciseness. The single sentence provides no information beyond the tool name itself and so fails to earn its place.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    For a creation tool with polymorphic behavior (commenting on multiple entity types) and no output schema, the description is inadequate. It omits the scope of commentable entities, return value expectations, and side effects.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With 67% schema description coverage, the schema adequately documents content and commentable_id. The description adds no parameter semantics, but the baseline of 3 applies since the schema covers most parameter documentation without requiring compensation from the description.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Create a comment' is a tautology that restates the tool name without adding specificity. It fails to mention the polymorphic nature of comments (can attach to Accounts, Invoices, Leads, etc. per the commentable_type enum) or distinguish from sibling tools like update_comment and delete_comment.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance provided on when to use this tool versus update_comment, or prerequisites for creating comments. The description lacks any 'when-to-use' or 'when-not-to-use' clauses despite the existence of related mutation tools in the sibling list.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    While annotations correctly mark this as non-readOnly and non-idempotent, the description adds no behavioral context. It fails to disclose that creation likely requires admin privileges, what validation occurs on the code field (uniqueness?), or that setting is_published=true likely makes the course visible to students immediately.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 2/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The three-word description is technically concise but constitutes under-specification rather than efficient communication. For a tool with 9 parameters, including complex nested objects and business-logic dependencies, this length fails to front-load critical constraints or earn its brevity.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given the tool's complexity (nested arrays, conditional required fields like cost, enum logic for cost_scheme), the description is grossly incomplete. It omits the course-template nature of the resource, billing implications, and the prerequisite that category_id must reference an existing category. No output schema is present to compensate.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    Schema coverage is 67% (6 of 9 parameters described), placing it between the 50-80% threshold. The description adds no parameter guidance, but the schema adequately documents most fields including the cost/cost_scheme dependency and nested course_tab_contents_attributes structure. The 'custom' and 'custom_associations' objects remain opaque.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
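
    The coverage percentages used as baselines throughout this report (100%, 75%, 67%) can be reproduced mechanically from the input schema. A minimal sketch of that calculation, using an illustrative 9-parameter schema where 6 fields are documented (the placeholder field names are assumptions for the example):

```python
def description_coverage(input_schema: dict) -> float:
    """Fraction of top-level schema properties carrying a non-empty description."""
    props = input_schema.get("properties", {})
    if not props:
        return 1.0  # nothing to document
    described = sum(1 for spec in props.values()
                    if spec.get("description", "").strip())
    return described / len(props)

# Illustrative schema: 6 documented fields, 3 opaque ones (names assumed).
schema = {"properties": {
    **{f"field_{i}": {"type": "string", "description": f"Documented field {i}"}
       for i in range(1, 7)},
    "custom": {"type": "object"},
    "custom_associations": {"type": "object"},
    "attachments": {"type": "array"},
}}
print(f"{description_coverage(schema):.0%}")  # → 67%
```

    Coverage measures presence, not quality: a field can be "described" by a tautology and still count, which is why the report scores description value separately.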

    Purpose 2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Create a course' restates the tool name without distinguishing from critical siblings like 'create_planned_course' (likely for scheduling instances) or 'create_program' (curriculum containers). It fails to clarify what a 'course' represents in this domain (apparently a catalog template given the is_published and cost_scheme parameters).

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance provided on when to use this versus 'update_course' (if a course code already exists) or 'create_planned_course' (for scheduling instances). The relationship between category_id requirement and prerequisite category creation is not mentioned.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    While annotations declare readOnlyHint=false and idempotentHint=false, the description adds no behavioral context beyond these structured hints. It does not explain what constitutes a duplicate (given idempotentHint=false), what relationships are formed, or what the return value represents.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
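
    The structured hints referenced throughout this section look roughly like the following in a tool definition; the point being made is that they cap, but do not replace, prose disclosure. The values and the rewritten description are illustrative assumptions matching the create_course_variant case above, not the vendor's actual metadata.

```python
# Illustrative MCP-style tool annotations (hint field names per the MCP spec).
annotations = {
    "readOnlyHint": False,     # the tool mutates state
    "destructiveHint": False,  # creation, not deletion
    "idempotentHint": False,   # repeated calls may create duplicates
}

# A description should spell out the consequence the hints only gesture at:
description = (
    "Create a course variant. Not idempotent: calling this twice with the "
    "same name creates two variants, so check get_course_variants first."
)

print(annotations["idempotentHint"])  # → False
```

    An agent reading only `idempotentHint: False` knows retries are unsafe; the prose above additionally tells it what the duplicate would be and which read tool to consult first.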

    Conciseness 2/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The three-word description is technically concise, but this represents under-specification rather than efficient communication. The single sentence provides no actionable information, leaving the agent with no more knowledge than the tool name itself.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given the existence of 'create_course' and 'get_course_variants' siblings, the description inadequately explains the domain concept of a 'variant'. Without clarifying the relationship between courses and their variants, the agent lacks critical context for correct invocation despite the simple schema.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With 100% schema description coverage (the 'name' parameter is documented as 'The name of the course variant'), the description meets the baseline expectation. However, the description itself mentions no parameters, so it adds no semantic value beyond the schema.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Create a course variant' is a tautology that restates the tool name (create_course_variant) with spaces added. It fails to define what a 'course variant' is or how it differs from the sibling tool 'create_course', leaving the agent without domain context to select the correct tool.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    The description provides no guidance on when to use this tool versus 'create_course' or other sibling tools like 'create_program'. There are no prerequisites, conditions, or workflow context indicating when variants are appropriate.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    While annotations correctly indicate idempotent, non-destructive write behavior, the description adds no context about what happens on creation (e.g., duplicate handling, side effects, or the relationship between object_slug and the created record). No contradiction with annotations exists.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 2/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The four-word description is under-specified rather than efficiently concise, providing no actionable information beyond the tool name. No structural organization exists to prioritize critical constraints or usage patterns.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    For a mutation tool handling nested property structures within a custom object system (implied by siblings), the description inadequately explains the data model, what constitutes valid properties content, or post-creation behavior. Critical contextual gaps remain despite existing annotations.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With 75% schema description coverage, the baseline semantics are handled by the schema itself. However, the description adds nothing for the undocumented 'properties' object parameter and does not clarify the confusing schema description of 'object_slug' (labeled as the ID of the custom record despite this being a creation operation).

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Create a custom record' is a tautology that merely restates the tool name in sentence case. It fails to define what a 'custom record' is, how it relates to 'custom objects' (referenced in sibling tools like get_custom_object_by_object_slug), or distinguish its purpose from update_custom_record.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance provided on when to use this tool versus update_custom_record, or prerequisites such as requiring an existing custom object (implied by the object_slug parameter). The description lacks any mention of validation constraints or required conditions beyond the schema.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    While annotations declare non-idempotent, non-destructive mutation behavior, the description adds zero behavioral context beyond this. It fails to explain that repeated calls create duplicate labels or clarify the scope of created labels across different model types.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 2/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    Though extremely brief, the description is inappropriately sized: too short to provide useful context. It is not a complete sentence and fails to front-load critical distinctions from sibling tools.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given the risk of confusion with 'add_label_to_order' and the importance of the model_type enum (Lead, Order, etc.), the description is incomplete. It fails to explain that this creates reusable label definitions rather than applying labels to specific records.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With 100% schema description coverage (name, color, model_type all documented), the schema carries the full burden. The description adds no parameter guidance, meeting the baseline expectation for high-coverage schemas.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Create a label' tautologically restates the tool name 'create_label' without distinguishing from siblings like 'add_label_to_order' (which applies labels) or clarifying that this creates label definitions.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    Provides no guidance on when to use this tool versus alternatives such as 'add_label_to_order', 'update_label', or 'delete_label'. No mention of prerequisites or expected workflow.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    The description adds no behavioral context beyond the annotations. It fails to explain non-idempotency (annotations indicate idempotentHint: false), what validation occurs (e.g., for the nested address_attributes), or side effects of creation.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 2/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    While brief (3 words), it is under-specified rather than efficiently concise. Given the complexity (4 parameters including a nested address object), the description is inappropriately sized and fails to front-load critical context.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Inadequate for the tool's complexity. The nested address_attributes object and required course_location_id suggest business logic relationships that are completely unexplained. No output schema means the description should clarify what gets created, but it doesn't.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With 75% schema description coverage, the parameters are reasonably documented in the schema. The description adds no parameter context, but the high schema coverage means it doesn't need to compensate significantly. Baseline score applies.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Create a meeting location' is tautological, essentially restating the tool name without elaborating what distinguishes a 'meeting location' from similar resources like 'course_location' (a sibling tool) or what this entity represents.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance provided on when to use this versus alternatives (e.g., update_meeting_location), nor prerequisites (e.g., requiring an existing course_location_id), nor relationships between meeting locations and course locations.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With no annotations provided, the description carries the full burden of behavioral disclosure but provides none. It fails to mention side effects (notifications triggered), idempotency, validation rules for the due_date format, or whether the operation returns the created task ID.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 2/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    While brief, this is under-specification rather than effective conciseness. For an 8-parameter tool with complex entity relationships, three words is insufficient to earn a higher score on this dimension.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given the tool's complexity (8 parameters including polymorphic subject references) and lack of annotations or output schema, the description is incomplete. It omits critical context about task assignment workflows, the subject linking mechanism, and expected behavior.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    Schema description coverage is 100%, establishing a baseline of 3. The description adds no parameter-specific context (e.g., explaining the relationship between subject_type and subject_id, or that only name is required), but the schema adequately documents individual fields.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Create a task.' is a tautology that restates the tool name without elaborating on the resource type or its domain context. It fails to distinguish this from sibling creation tools (e.g., create_course, create_lead) or explain that tasks link to subjects like Courses, Accounts, or Leads.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance provided on when to use this tool versus alternatives (update_task vs create_task), prerequisites for creation, or required relationships (e.g., whether subject_id is mandatory when subject_type is provided).

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
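
    Polymorphic pairs like subject_type/subject_id are a common source of first-call failures. A hedged sketch of how a description could document that interaction, plus the pairing rule as code (the entity names and the must-be-provided-together rule are assumptions for illustration):

```python
# Hypothetical rewrite of the create_task description (assumed semantics).
create_task_description = (
    "Create a task, optionally linked to another record. subject_type and "
    "subject_id must be provided together: subject_type names the entity "
    "(e.g. 'Course', 'Account', 'Lead') and subject_id is that entity's ID. "
    "Only name is required; omit both subject fields for a standalone task."
)

def validate_subject(args: dict) -> bool:
    """Enforce the pairing rule the description states: both or neither."""
    return ("subject_type" in args) == ("subject_id" in args)

print(validate_subject({"name": "Follow up"}))                  # → True
print(validate_subject({"name": "x", "subject_type": "Lead"}))  # → False
```

    Stating the pairing rule in prose lets an agent construct a valid call on the first attempt instead of discovering the constraint through a validation error.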

  • Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With no annotations provided, the description carries the full burden of behavioral disclosure but reveals nothing about persistence (how long the webhook lasts), side effects (when event delivery starts), authentication/verification requirements, or error handling.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 2/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    At three words, the description is technically concise but suffers from under-specification. Given that the tool has multiple parameters and no annotations, this brevity represents a missed opportunity rather than efficient communication.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    For a 3-parameter mutation tool with no output schema and no annotations, the description is inadequate. It omits critical context such as available event types, whether webhooks are verified upon creation, and the structure of any response.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    The input schema has 100% description coverage for all three parameters (url, active, events). The description adds no parameter semantics beyond the schema, meeting the baseline score for high schema coverage.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Register a webhook' uses a synonym for 'create' but fails to distinguish from the sibling tool update_webhook or clarify what 'register' entails in this context. It essentially restates the tool name with minimal elaboration.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance provided on when to use create_webhook versus update_webhook, nor any prerequisites (e.g., URL validation requirements, supported event types) or conditions for usage.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
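
    A fuller create_webhook description would front-load exactly the items flagged above. A hypothetical rewrite (the event name, HTTPS requirement, and delivery semantics are assumptions for illustration, not documented behavior of this server):

```python
# Hypothetical create_webhook description covering the flagged gaps.
create_webhook_description = (
    "Register a webhook endpoint that receives POST callbacks for the "
    "subscribed events (e.g. 'order.created'; see the API docs for the "
    "full list). Delivery starts immediately when active=true. The URL "
    "must be publicly reachable over HTTPS. Use update_webhook to change "
    "an existing subscription instead of registering a duplicate."
)

# The rewrite addresses the four gaps scored above: event types, when
# delivery starts, URL requirements, and the update_webhook alternative.
for gap in ("events", "active=true", "HTTPS", "update_webhook"):
    assert gap in create_webhook_description
print("all gaps covered")  # → all gaps covered
```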

  • Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    Annotations already declare this as destructive, idempotent, and non-read-only, but the description adds no behavioral context beyond these structured hints. It fails to clarify whether deletion is permanent, what happens to associated objects currently using this label, or whether the operation can be reversed. The agent gains no additional safety or operational context from the description text.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 2/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The single sentence 'Delete a label' is brief but constitutes under-specification rather than efficient, valuable communication. It front-loads no critical context about implications, constraints, or return values. The extreme brevity leaves significant gaps in the agent's understanding despite consuming minimal tokens.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    For a destructive operation affecting a resource with clear relationships to other entities (evidenced by sibling `add_label_to_order`), the description inadequately addresses cascading effects or dependency checks. No output schema exists to clarify return values, yet the description fails to compensate by explaining success/failure states or idempotency behavior. The minimal description creates operational risk for a destructive tool.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    The input schema has 100% description coverage, with the `id` parameter explicitly documented as 'ID of the label to delete'. The description adds no additional semantics, validation rules, or format examples beyond what the schema provides. With the schema carrying the full descriptive burden, the baseline score applies.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Delete a label' merely restates the tool name 'delete_label' without expanding on scope, side effects, or specific resource constraints. It fails to distinguish this tool from sibling operations like `update_label` or clarify the permanence of the deletion. While not misleading, it provides no semantic value beyond the function name itself.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    The description offers no guidance on when to use this tool versus alternatives such as `update_label` (for modifying existing labels) or prerequisites such as removing label associations. It omits critical preconditions like whether the label must be unassigned from orders (via `add_label_to_order`) before deletion. No exclusions or error conditions are specified.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
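
    For destructive tools the same rewrite pattern applies with higher stakes. A hypothetical delete_label description adding the missing guardrails (the permanence and cascade behavior described here are assumptions for illustration, not confirmed server behavior):

```python
# Hypothetical delete_label description with safety context (assumed semantics).
delete_label_description = (
    "Permanently delete a label definition. This cannot be undone. Any "
    "orders currently carrying the label (applied via add_label_to_order) "
    "lose it. To rename or recolor a label instead, use update_label."
)

# Each sentence addresses one flagged gap: permanence, cascading effect
# on associated orders, and the non-destructive alternative.
print("update_label" in delete_label_description)  # → True
```

    Disclosing permanence and cascade effects in prose is what lets an agent decide whether to confirm with the user before invoking a destructive tool.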

  • Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With no annotations provided, the description carries the full burden of behavioral disclosure but offers only the word 'Delete'. It does not specify whether this is a hard delete or soft delete, what happens to associated enrollments/elements, or whether the operation is reversible.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 2/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    While brief at only four words, this represents under-specification rather than efficient conciseness. The single sentence merely restates the tool name and wastes the opportunity to front-load critical behavioral warnings or sibling differentiations.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    For a destructive single-parameter operation, the description is inadequate. With no output schema, no annotations, and significant ambiguity regarding sibling tools (delete_program), the description should provide behavioral context and usage guardrails that are completely absent.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    The input schema has 100% description coverage ('ID of the program edition to delete'), so the schema adequately documents the parameter. The description adds no supplemental parameter guidance, which is acceptable given the complete schema documentation.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Delete a program edition' is essentially a tautology that restates the tool name without adding specificity. It fails to distinguish this tool from sibling delete_program, which deletes the entire program rather than a specific edition.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance is provided on when to use this tool versus delete_program or cancel_program_enrollment. There are no warnings about prerequisites, cascading effects on enrollments, or irreversibility considerations for this destructive operation.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    No annotations are provided, so the description carries the full burden of behavioral disclosure. While 'Delete' implies a destructive operation, the description fails to specify if deletion is permanent, if related data (subtasks, comments) are cascaded, or if the operation is reversible.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
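    For comparison, part of this disclosure belongs in MCP tool annotations. A hedged sketch of what a better-documented delete tool might declare follows; the annotation keys (readOnlyHint, destructiveHint, idempotentHint) come from the MCP specification, while the description text and cascade behavior are invented for illustration:

```python
# Hypothetical tool declaration; annotation keys follow the MCP spec,
# but the description text and cascade semantics are assumptions.
delete_program_edition = {
    "name": "delete_program_edition",
    "description": (
        "Permanently delete a program edition. This cannot be undone and "
        "cascades to the edition's enrollments; use delete_program to remove "
        "the whole program instead."
    ),
    "inputSchema": {
        "type": "object",
        "properties": {
            "id": {
                "type": "integer",
                "description": "ID of the program edition to delete",
            },
        },
        "required": ["id"],
    },
    "annotations": {
        "readOnlyHint": False,    # mutates server state
        "destructiveHint": True,  # irreversible deletion
        "idempotentHint": True,   # deleting twice has no further effect
    },
}

print(delete_program_edition["annotations"]["destructiveHint"])  # True
```

    Even with annotations present, the description still has to carry the consequences (permanence, cascades) that the boolean hints cannot express.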

    Conciseness2/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description is three words and contains no wasted text, but this represents under-specification rather than effective conciseness. The single sentence fails to earn its place by providing minimal value beyond the tool name.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    For a destructive operation with no annotations and no output schema, the description is inadequate. It lacks safety warnings, permanence disclosures, or return value information that would help an agent understand the consequences of invocation.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With 100% schema description coverage, the single 'id' parameter is fully documented in the schema itself ('ID of the task to delete'). The description adds no additional semantic context beyond the schema, meeting the baseline for high-coverage schemas.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Delete a task' is a tautology that restates the tool name (delete_task). While it technically contains the verb and resource, it fails to distinguish from sibling operations like update_task or create_task, and offers no scope clarification (e.g., permanent vs. soft delete).

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    The description provides no guidance on when to use this tool versus alternatives such as update_task (to mark a task complete instead of deleting it); no cancel_task alternative exists. No prerequisites are mentioned (e.g., task existence, user permissions).

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

    Behavior2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    Annotations declare readOnly/idempotent/destructive hints, so the description carries reduced burden. However, it adds zero behavioral context beyond these annotations—no mention of pagination behavior, default page sizes, or what data structure is returned. It does not contradict annotations.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness2/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    While brief (3 words), this represents under-specification rather than effective conciseness. The single sentence fails to earn its place by providing meaningful operational context. No information is front-loaded because virtually no information is present.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    For a tool with 4 parameters including pagination controls (cursor, per_page) and filters (user_id, account_id), the description is inadequate. It omits that results are paginated, that filters exist, and what an affiliation represents. Despite good annotations, the agent lacks sufficient context to use this tool effectively.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
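    The pagination gap called out above matters because an agent must know to loop. A minimal sketch of cursor-based consumption follows; the cursor/per_page names are taken from the schema described in the review, and the stubbed fetch (with its page data and next_cursor field) is an assumption standing in for the real get_affiliations call:

```python
# Stub standing in for the real get_affiliations tool; in practice this would
# be an MCP tool call. Page contents and the next_cursor field are invented.
_PAGES = {
    None: {"affiliations": [1, 2], "next_cursor": "p2"},
    "p2": {"affiliations": [3], "next_cursor": None},
}

def get_affiliations(cursor=None, per_page=2):
    return _PAGES[cursor]

def fetch_all_affiliations():
    """Follow next_cursor until the server signals the final page."""
    items, cursor = [], None
    while True:
        page = get_affiliations(cursor=cursor)
        items.extend(page["affiliations"])
        cursor = page["next_cursor"]
        if cursor is None:
            return items

print(fetch_all_affiliations())  # [1, 2, 3]
```

    A description that says "results are paginated; pass the returned cursor to fetch the next page" is what makes this loop discoverable on the first attempt.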

    Parameters3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    Schema description coverage is 100%, establishing a baseline of 3. The description mentions 'all', which conflicts slightly with the filtering parameters (user_id, account_id) present in the schema, though the schema clearly documents these optional filters. The description adds no semantic value beyond the schema.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Get all affiliations' restates the tool name with minimal addition. While it identifies the verb (Get) and resource (affiliations), it fails to distinguish from sibling tools like get_organization_affiliations or define what constitutes an affiliation in this domain. It borders on tautology.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance is provided on when to use this tool versus alternatives (e.g., get_organization_affiliations), or on prerequisites. The description does not mention the pagination pattern (cursor/per_page) or explain that results can be filtered by user_id/account_id despite claiming 'all' affiliations.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

    Behavior2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    While annotations correctly flag this as read-only and idempotent, the description adds no behavioral context: it omits error handling (what happens if the ID doesn't exist?), return format, and cache characteristics, relying entirely on structured metadata.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness2/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    At three words, it is brief but underwritten rather than concise. The sentence fails to earn its place, providing no information beyond what the tool name itself already conveys.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    For a simple CRUD getter, the description is insufficient given the presence of sibling 'get_labels'. It omits the critical distinction that this retrieves a single entity by identifier versus listing multiples, and provides no domain context for the 'label' resource.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With 100% schema description coverage ('ID of the label to retrieve'), the parameter is well-documented in structured form. The description adds no semantic detail about the ID format or constraints, meeting the baseline for high-coverage schemas.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Get a label' is tautological, restating the function name without clarifying what a 'label' represents in this domain (e.g., category tag, physical label). Critically, it fails to distinguish from sibling tool 'get_labels' (plural), leaving ambiguity about whether this retrieves one by ID or performs a filtered search.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance is provided on when to use this singular retrieval versus 'get_labels' for listing, or 'add_label_to_order' for applying a label. No prerequisites (e.g., needing a valid label ID) are mentioned.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

    Behavior2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    While annotations correctly indicate this is a read-only, non-destructive operation, the description adds no context about pagination, filtering logic (e.g., whether filters are combined with AND or OR), or performance characteristics (e.g., 'all' implies potentially large result sets).

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness4/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description is extremely concise at only three words. However, it wastes its brevity by not front-loading critical information such as pagination support or filtering capabilities that would help an agent select this tool correctly.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given the tool has 5 parameters including complex filtering (model_type enum with 8 values, id array) and pagination, and lacks an output schema, the description is insufficient. It should mention the ability to filter by entity type (model_type) and that results are paginated.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With 100% schema description coverage, the baseline is 3. The description mentions no parameters, so it neither adds meaning beyond the schema nor contradicts it. It fails to explain the relationship between parameters (e.g., that cursor/per_page enable pagination through the 'all' results).

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Get all labels' essentially restates the tool name (tautology) and fails to distinguish this tool from its sibling 'get_label' (singular). It does not clarify what 'labels' refers to in this domain or how this listing operation differs from retrieving a single label.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines1/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance is provided on when to use this tool versus 'get_label', 'create_label', 'update_label', or 'delete_label'. There is no mention of pagination requirements (despite cursor/per_page parameters) or when to use the various filters (model_type, search, id).

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

    Behavior2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With no annotations provided, the description carries the full burden of behavioral disclosure but fails to state what happens when the order ID is not found, what fields are returned, or whether authentication is required, beyond the implied read-only nature of the verb 'Get'.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness2/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    While the four-word sentence is technically concise, it exemplifies under-specification rather than meaningful conciseness: it offers no actionable guidance for tool selection or differentiation from siblings, and so fails to earn its place.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given the absence of an output schema and annotations, and the presence of closely named sibling tools like `get_orders`, the description is insufficiently complete: it fails to clarify the tool's specific scope, return-value structure, or filtering behavior.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    The schema has 100% description coverage for the single `id` parameter ('ID of the order to retrieve'), establishing a baseline score of 3. The description adds no additional semantic context about the parameter, but none is needed given the comprehensive schema documentation.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Get an order record' restates the tool name 'get_order' without adding specificity about what constitutes an order record or how it differs from retrieving multiple orders via the sibling tool `get_orders`.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    The description provides no guidance on when to use this single-record retrieval tool versus the `get_orders` sibling tool, nor does it mention prerequisites such as needing a valid order ID beforehand.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

    Behavior2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    While annotations correctly identify the operation as idempotent and non-destructive, the description adds no behavioral context. It does not clarify that partial updates are supported (only 'id' is required), nor does it mention error behavior for non-existent IDs.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness2/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    Although brief at only three words, the description suffers from under-specification rather than efficient conciseness. The sentence fails to earn its place by providing zero value beyond the tool name itself.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given the presence of 100+ sibling tools including create_label, delete_label, and get_labels, the description fails to establish proper context for when this specific mutation should be employed. The lack of output schema increases the burden on the description to explain return values, which it does not address.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    The input schema has 100% description coverage (id, name, color, model_type are all documented), so the description meets the baseline expectation. However, the description itself adds no semantic information about parameters beyond what the schema already provides.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Update a label' is a tautology that restates the tool name. It fails to specify what constitutes a 'label' in this domain (e.g., a classification tag) and does not differentiate from siblings like create_label or delete_label.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance is provided on when to use this tool versus create_label (which creates new labels) or add_label_to_order (which applies existing labels). There is no mention of prerequisites, such as needing to know the label ID beforehand.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

    Behavior2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    No annotations are provided, so the description carries the full burden of disclosure, yet it fails to explain mutation behavior, idempotency, whether unspecified fields are preserved or cleared, or what the response structure contains. It is also silent on the security implications noted in the schema (email field restrictions for admins).

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness2/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    At only three words, the description is technically brief, but given the tool's complexity (12 parameters including nested address objects), this constitutes under-specification rather than efficient conciseness. Every sentence must earn its place, and this provides insufficient value for the complexity.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    The description is inadequate for a complex mutation tool with nested objects and no output schema. It omits critical context such as partial-update semantics, the significance of the required id parameter, and the handling of the with_authentication field, which the schema indicates is 'only relevant when creating' despite appearing in an update tool.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    The input schema has 75% description coverage with documented fields including email, locale, address_attributes, and invoice_address_attributes. While the description adds no additional parameter guidance, the high schema coverage establishes a baseline score of 3.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Update a user.' restates the tool name (update_user) without specifying what aspects of the user can be updated or distinguishing its scope from sibling tools like create_user or delete_webhook. While the verb and resource are clear, the tautological nature and lack of differentiation warrant a low score.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance is provided on when to use this tool versus alternatives such as create_user, or prerequisites like requiring an existing user ID. The description fails to clarify whether this performs partial updates (PATCH) or full replacements (PUT).

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
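    The PATCH-versus-PUT ambiguity flagged here is exactly what one sentence in the description could settle. As a sketch, partial-update (PATCH) semantics mean fields omitted from the request keep their existing values, which a simple merge illustrates; the user record and field names below are invented, not the real update_user behavior:

```python
def apply_partial_update(existing: dict, changes: dict) -> dict:
    """PATCH semantics: fields absent from `changes` keep their old values."""
    updated = dict(existing)
    updated.update(changes)
    return updated

# Hypothetical user record; only locale is changed, email survives untouched.
user = {"id": 7, "email": "a@example.com", "locale": "en"}
patched = apply_partial_update(user, {"locale": "nl"})
print(patched)  # {'id': 7, 'email': 'a@example.com', 'locale': 'nl'}
```

    Under PUT (full replacement) semantics the omitted email field would instead be cleared, which is why a description stating which behavior applies prevents accidental data loss.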

    Behavior2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    While annotations indicate this is a non-idempotent write operation (readOnlyHint: false, idempotentHint: false), the description adds no behavioral context about side effects, duplicate creation risks, or what constitutes a successful creation.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness3/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The single sentence is not verbose, but it severely under-specifies the tool and fails to earn its place, providing no actionable information beyond the tool name itself.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    For a complex mutation tool with 7 parameters including nested objects (address, signup answers) and relationships (labels), the description is severely incomplete. It lacks output expectations, error conditions, or domain context needed to invoke this tool correctly.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With 71% schema description coverage, the schema carries most of the documentation burden for parameters like `address_attributes` and `signup_answers_attributes`. The description adds no parameter semantics, but the baseline is adequate given the schema's decent coverage.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Create an account' is a tautology that merely restates the tool name. It fails to distinguish this tool from sibling `create_user` or clarify what type of account is being created (billing, customer, organizational, etc.) in this complex system.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance is provided on when to use this tool versus alternatives like `create_user` or `create_lead`, nor are prerequisites mentioned. The agent cannot determine if this is for initial registration, administrative account creation, or bulk imports.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

    Behavior2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    Annotations indicate this is a non-idempotent write operation (readOnlyHint: false, idempotentHint: false), but the description adds no behavioral context beyond this. It fails to disclose that grade and score are mutually exclusive (at least one required), what happens if a grade already exists for the gradeable, or side effects like notifications triggered.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness3/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The single sentence is front-loaded and wastes no words, but given the tool's complexity (polymorphic references, conditional required fields), it is undersized rather than appropriately concise. It provides insufficient information density for an agent to use the tool correctly without guessing.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    For a tool with 5 parameters including a polymorphic association pattern and conditional logic (grade vs score), the description is incomplete. While the schema is well-documented, the description omits domain context (grading enrollments/courses), return value behavior, and the critical XOR relationship between grade and score parameters.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
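    The grade/score XOR constraint described above can be encoded in the schema and checked before the call. The sketch below shows one way the constraint could be expressed in JSON Schema, plus an equivalent hand-rolled check; the schema fragment is an assumption about how this server might encode it, not its actual schema:

```python
# oneOf encodes the XOR: exactly one of grade/score may be present
# (hypothetical fragment, assuming standard JSON Schema semantics).
GRADE_XOR_SCORE = {
    "oneOf": [
        {"required": ["grade"], "not": {"required": ["score"]}},
        {"required": ["score"], "not": {"required": ["grade"]}},
    ]
}

def satisfies_grade_xor_score(args: dict) -> bool:
    """Hand-rolled equivalent of the oneOf fragment above."""
    return ("grade" in args) != ("score" in args)

print(satisfies_grade_xor_score({"grade": "A"}))              # True
print(satisfies_grade_xor_score({"grade": "A", "score": 9}))  # False
print(satisfies_grade_xor_score({}))                          # False
```

    Because neither the description nor a flat properties list can surface this relationship, a sentence like "provide exactly one of grade or score" is the cheapest fix.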

    Parameters3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With 100% schema description coverage, the baseline is 3. The description adds no semantic clarification beyond the schema (e.g., it doesn't explain the polymorphic pattern requiring gradeable_id + gradeable_type, or the grade/score relationship), but it doesn't detract from the well-documented schema.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Create a grade' is a tautology that restates the tool name (create_grade) with minimal expansion. It fails to specify what constitutes a 'grade' in this domain (academic grading), what resource it creates (a grade record), or how it differs from sibling tools like update_grade or delete_grade.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance is provided on when to use this tool versus update_grade (which also exists), nor on prerequisites such as requiring an existing enrollment or gradeable entity first. The polymorphic nature of gradeable_id/gradeable_type (implied by the parameter descriptions) is not explained.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

    Behavior2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    While the annotations already indicate this is a write operation (readOnlyHint: false) and not idempotent (idempotentHint: false), the description adds no behavioral context beyond what the name and annotations provide. It does not disclose what gets created, whether duplicates are allowed, or side effects on existing invoices.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness3/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description is extremely brief (four words) with no filler content, meeting conciseness requirements. However, it suffers from under-specification rather than verbosity: the single sentence is redundant with the tool name and so fails to earn its place.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    For a creation tool with 2 parameters and no output schema, the description is inadequate. It does not explain the domain model (VAT configurations vs. invoice line items), return values, or how this relates to the broader invoicing system evidenced by sibling tools like get_invoice_vats and create_invoice.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With 100% schema description coverage (name and percentage parameters are both documented), the baseline score is 3. The description adds no additional semantic context about the parameters (e.g., expected format for percentage, uniqueness constraints for name), but the schema carries the full burden adequately.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Create an invoice vat' is a tautology that restates the tool name (create_invoice_vat → 'Create an invoice vat'). It fails to clarify what an 'invoice vat' actually represents (likely a VAT tax rate/category) and does not distinguish this tool from the sibling create_invoice, which could confuse agents about whether this creates a tax configuration or a line item on an invoice.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    The description provides no guidance on when to use this tool versus alternatives like create_invoice, nor does it mention prerequisites (e.g., whether a corresponding invoice must exist first). The score reflects 'no guidance' per the rubric.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

• Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    While annotations indicate this is a non-idempotent write operation (idempotentHint: false, readOnlyHint: false), the description adds no context about this behavior, side effects, or what the tool returns upon success (no output schema exists to compensate). It does not explain that calling this twice creates duplicate materials.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 3/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description is extremely brief (three words) with no redundant text, but it is under-specified rather than appropriately concise. It fails to front-load critical information about return values or behavioral constraints that structured fields do not cover.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given the absence of an output schema, the description should explain what is returned (e.g., the created material ID or object), but it omits this. It also fails to clarify domain-specific concepts like the relationship between materials and material groups, leaving significant gaps despite the well-documented parameter schema.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    The input schema has 100% description coverage for its 3 parameters (name, use_type, material_group_id). Since the schema fully documents the parameters, the baseline score applies; the description itself adds no additional semantic context about parameter relationships or enum implications.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Create a material' is tautological, merely restating the tool name with an article added. It fails to specify what constitutes a 'material' in this domain (e.g., physical inventory, digital asset) or how it differs from the sibling tool create_material_group.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance is provided on when to use this tool versus alternatives like update_material or create_material_group. There is no mention of prerequisites (e.g., requiring an existing material_group_id) or workflow integration despite the presence of related CRUD siblings.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

• Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    While annotations declare readOnlyHint=false and destructiveHint=false, the description adds no behavioral context beyond this. It fails to disclose that idempotentHint=false means duplicate calls create duplicate groups, nor does it describe success indicators or side effects.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
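To illustrate the point that annotations carry only part of the burden, here is a sketch of a description that restates the non-idempotency an agent would otherwise have to infer. The annotation field names follow the MCP tool-annotations convention; the description wording is a hypothetical rewrite, not the server's actual text.

```python
# Illustrative only: how MCP-style annotations and prose can complement
# each other. The description text is a hypothetical rewrite.
tool = {
    "name": "create_material_group",
    "description": (
        "Create a new material group. Not idempotent: calling this twice "
        "with the same name creates two groups. On success returns the "
        "created group, including its id."
    ),
    "annotations": {
        "readOnlyHint": False,     # mutates server state
        "destructiveHint": False,  # additive, does not delete data
        "idempotentHint": False,   # repeated calls create duplicates
    },
}

# The prose restates the one annotation agents most often miss.
assert not tool["annotations"]["idempotentHint"]
assert "idempotent" in tool["description"].lower()
```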

Conciseness 3/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

The description is appropriately brief at four words and front-loaded, but its extreme brevity constitutes under-specification rather than efficient communication. Every word is necessary, yet collectively the words are insufficient.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given no output schema exists, the description should explain return values or success behavior, which it omits. It also fails to define what constitutes a material group (implied by parameter description to be related to courses) or how it differs from a material.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With schema description coverage at 100%, the input schema fully documents the 'name' parameter including its relation to courses. The description adds no parameter semantics, meeting the baseline score for high schema coverage.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Create a material group' is a tautology that merely restates the snake_case tool name in sentence form. It fails to distinguish from siblings like 'create_material' or explain what a material group represents in this domain.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    Provides no guidance on when to use this tool versus alternatives such as 'create_material', 'update_material_group', or 'delete_material_group'. No prerequisites, context, or exclusion criteria are mentioned.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

• Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    No annotations are provided, so the description carries full burden. While 'Create' implies mutation, the description discloses nothing about side effects (e.g., whether teachers are notified, if it creates draft or published states by default), authorization requirements, or error conditions.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 3/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description is three words with no waste, but it is under-specified rather than appropriately concise. Given the tool's complexity (15 parameters with type-specific conditionals), this brevity represents inadequate structure rather than efficient communication.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    For a complex creation tool with 15 parameters, conditional logic (Fixed vs Flexible types), and nested objects, the description is inadequate. No output schema exists, and without annotations, the description fails to compensate for missing behavioral context.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    Schema description coverage is high at 87%, with clear documentation of conditional parameters (e.g., 'Only needed for fixed planned courses'). The description adds no parameter-specific context, but the baseline 3 is appropriate given the schema's quality.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
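As a sketch of the conditional-parameter documentation praised here, the fragment below models the Fixed-vs-Flexible split directly in the input schema. The parameter names and descriptions are assumptions for illustration, not the server's actual schema.

```python
# Hypothetical schema fragment showing conditional-parameter documentation
# for a planned-course tool. Names and wording are illustrative only.
input_schema = {
    "type": "object",
    "properties": {
        "type": {
            "type": "string",
            "enum": ["Fixed", "Flexible"],
            "description": "Scheduling model for the planned course.",
        },
        "start_date": {
            "type": "string",
            "description": (
                "ISO 8601 date. Only needed for Fixed planned courses; "
                "ignored when type is Flexible."
            ),
        },
    },
    # The conditional field is deliberately not in required; the
    # description carries the Fixed-only constraint instead.
    "required": ["type"],
}

assert "Only needed" in input_schema["properties"]["start_date"]["description"]
assert "start_date" not in input_schema["required"]
```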

Purpose 2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Create a planned course' is a tautology that restates the tool name without explaining what distinguishes a 'planned course' from a 'course' (sibling tool create_course exists) or what 'planned' signifies in this domain. It fails to differentiate from siblings.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance provided on when to use this versus create_course, create_program, or create_program_edition. No mention of prerequisites (e.g., requiring an existing course_id) despite the schema showing it as required.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

• Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With no annotations provided, the description carries the full burden of behavioral disclosure. It states the mutation type ('Create') but lacks critical details: idempotency, error handling (what if category_id is invalid?), side effects, or the return value structure (despite no output schema being defined).

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 3/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The three-word description is technically concise with no filler, but it is under-specified rather than efficiently informative. Given the tool's complexity (9 parameters including nested objects), extreme brevity here represents a failure to front-load critical domain context.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    For a tool with 9 parameters, nested array objects (course_tab_contents_attributes), conditional logic (cost required based on cost_scheme), and numerous siblings with similar names, the description is grossly incomplete. It fails to explain business rules or the domain model hierarchy (program vs edition vs element).

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    Schema description coverage is 67% (6 of 9 parameters have descriptions). While the description adds no parameter-specific guidance, the baseline score of 3 applies since the schema adequately documents most fields, though it leaves custom, custom_associations, and course_tab_contents_attributes under-documented.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Create a program' is a tautology that restates the tool name (create_program) without adding specificity. It fails to distinguish this tool from siblings like create_program_edition or create_program_element, leaving ambiguity about what constitutes a 'program' in this domain.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance provided on when to use this tool versus alternatives (e.g., create_program_edition), prerequisites (e.g., existing category_id), or the relationship between cost_scheme and cost parameters. The description offers no operational context.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

• Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    No annotations are provided, so the description carries the full burden of behavioral disclosure, yet it provides none. It does not indicate whether creation is immediate, if it triggers side effects (notifications, emails), whether it is idempotent, or what authorization is required for the 11-parameter mutation operation.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 3/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description is extremely brief (3 words) and front-loaded, but it is inappropriately sized for the tool's complexity. While not verbose or redundant, the single sentence fails to earn its place because it provides only tautological information without substance.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    For a complex creation tool with 11 parameters (including nested objects and conditional requirements like cost depending on cost_scheme), no output schema, and no annotations, the description is severely inadequate. It lacks any explanation of return values, error conditions, or relationships to the broader program management workflow.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With 82% schema description coverage, the input schema already documents most parameters effectively (e.g., 'cost_scheme' enum values, 'min_participants' semantics). The description adds no parameter-specific guidance, but the high schema coverage establishes a baseline score of 3.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Create a program edition' is tautological, merely restating the tool name with spaces added. It fails to distinguish this tool from siblings like 'create_program' or 'update_program_edition', and offers no explanation of what constitutes a 'program edition' in this domain.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance is provided on when to use this tool versus alternatives (e.g., 'create_program' for the parent entity), nor are prerequisites mentioned (e.g., requiring an existing program_id from a previously created program). The description offers zero contextual usage hints.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

• Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    No annotations are provided, so the description carries the full burden of behavioral disclosure. It fails to indicate whether this operation is idempotent, what happens if the user_id doesn't exist, whether it sends notifications, or what the return value indicates.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 3/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    While appropriately brief at three words, the description is under-specified rather than efficiently informative. The single sentence does not earn its place by delivering actionable context beyond the function name.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    For a single-parameter tool without output schema or annotations, the description remains insufficient. It omits the critical domain relationship that teachers are created from existing users (implied by the user_id parameter but never stated), leaving ambiguity about the tool's exact function in the user management workflow.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With 100% schema description coverage for the single user_id parameter, the baseline score applies. The description adds no additional parameter context (e.g., explaining that this converts an existing user rather than creating a new account), but the schema adequately documents the parameter.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Create a new teacher' essentially restates the tool name (tautology) and fails to distinguish from siblings like create_teacher_role, create_teacher_enrollment, or activate_teacher. It does not clarify whether this creates a new user account or promotes an existing user to teacher status.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance provided on when to use this tool versus alternatives like activate_teacher (for reactivating) or create_teacher_enrollment (for assigning to courses). No prerequisites mentioned, such as whether the user_id must already exist via create_user.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

• Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    Annotations already establish the operation is destructive and idempotent. The description adds no behavioral context beyond this, failing to disclose what happens to dependent entities (e.g., planned courses using this location), whether deletion is permanent, or permission requirements.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 3/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description is a single, brief sentence that is front-loaded, but it is underspecified rather than appropriately concise. It lacks the necessary detail for a destructive operation while not being verbose.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given this is a destructive operation (destructiveHint: true) with no output schema, the description is insufficient. It omits critical context about irreversibility, cascading effects on related entities, and recovery options that would be necessary for safe invocation.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With 100% schema description coverage (the 'id' parameter is fully documented in the schema), the baseline score is 3. The description adds no additional semantic information about the parameter beyond what the schema provides.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Delete a course location' is a tautology that restates the tool name (delete_course_location). While it identifies the verb and resource, it fails to distinguish from siblings like delete_meeting_location or other deletion tools.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    The description provides no guidance on when to use this tool versus alternatives (e.g., update_course_location to disable vs delete), nor does it mention prerequisites such as checking for existing courses at this location.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

• Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    No annotations are provided, so the description carries the full burden of behavioral disclosure. It states the operation is a deletion but fails to specify whether it is permanent, reversible, or what happens to associated data (e.g., attendance records, grades).

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 3/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

The description is brief (4 words), avoiding verbosity, but the single sentence fails to earn its place, providing no substantive information beyond the tool name itself.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    For a destructive operation with no annotations, the description is inadequate. It lacks critical context regarding side effects, data integrity impacts, or the distinction between deletion and cancellation, which is necessary given the sibling tool cancel_enrollment.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    The input schema has 100% description coverage for the single 'id' parameter. The description adds no additional parameter context, meeting the baseline expectation when schema documentation is complete.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Delete a teacher enrollment' is a tautology that restates the tool name verbatim. While it identifies the verb and resource, it fails to distinguish this tool from siblings like cancel_enrollment or indicate the scope/permanence of the deletion.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance is provided on when to use this tool versus alternatives such as cancel_enrollment or update_teacher_enrollment, nor are prerequisites (e.g., enrollment status requirements) specified.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

• Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With no annotations provided, the description carries the full burden of behavioral disclosure, yet provides none. It does not specify whether deletion is permanent, whether it cascades to teacher_enrollments or other related entities, or if the operation can be reversed. The single word 'Delete' implies destruction but lacks crucial safety context for a destructive operation.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 3/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description is extremely brief (three words), which prevents verbosity, but this is under-specification rather than efficient conciseness. The single sentence fails to earn its place because it merely labels the tool rather than describing it, providing no actionable information beyond the name.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    As a destructive operation with no output schema and no annotations, the description should explain side effects, cascade behavior, and recovery options. Given the ecosystem complexity (evidenced by 100+ sibling tools including teacher_enrollments and planned_courses), the description is dangerously incomplete for a deletion tool.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    The input schema has 100% description coverage ('ID of the teacher role to delete'), so the schema adequately documents the parameter. The description adds no additional semantic information about the parameter format, validation rules, or how to obtain valid IDs, warranting the baseline score.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Delete a teacher role' is essentially a tautology that restates the tool name with spaces added. While it identifies the verb (delete) and resource (teacher role), it fails to distinguish this tool from siblings like delete_teacher_enrollment or clarify the scope/semantics of 'teacher role' versus other teacher-related entities.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance is provided on when to use this tool versus alternatives like update_teacher_role (modification vs deletion) or deactivate_teacher. There is no mention of prerequisites (e.g., whether the role must be unassigned from all teachers first) or consequences of deletion.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

• Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    No annotations are provided, so the description carries the full burden of behavioral disclosure. The description mentions 'Delete' but does not clarify whether this is a permanent hard delete, what happens to pending webhook deliveries, or whether the operation is idempotent. For a destructive operation, this lack of safety context is a significant gap.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
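A sketch of what a safer delete_webhook description could look like; the wording, including the claims about queued deliveries, is hypothetical illustration, not the server's documented behavior.

```python
# Hypothetical description for a destructive tool that front-loads safety
# context. Wording is illustrative, not the server's actual text.
tool = {
    "name": "delete_webhook",
    "description": (
        "Permanently delete a webhook. Irreversible: the endpoint stops "
        "receiving events immediately and any pending deliveries are "
        "dropped. To pause delivery without losing the configuration, "
        "use update_webhook instead."
    ),
}

# Safety context an agent needs before a destructive call: permanence,
# immediate effects, and the non-destructive alternative.
assert "irreversible" in tool["description"].lower()
assert "update_webhook" in tool["description"]
```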

Conciseness 3/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description is extremely concise at three words, avoiding verbosity. However, the single sentence fails to earn its place by merely restating the tool name rather than adding value. While appropriately sized for a simple single-parameter tool, it represents under-specification rather than efficient information density.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given the absence of annotations, output schema, and the presence of numerous sibling webhook tools, the description is incomplete. It fails to explain the implications of deletion (e.g., immediate cessation of calls), error scenarios, or how this operation fits into the webhook lifecycle management alongside create_webhook and update_webhook.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    The input schema has 100% description coverage with the 'id' parameter clearly documented as 'ID of the webhook to delete'. Since the schema fully describes the parameter semantics, the baseline score of 3 applies. The description adds no additional parameter context (e.g., format constraints, where to obtain the ID), but is not penalized given the complete schema coverage.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Delete a webhook' is essentially a tautology that restates the tool name 'delete_webhook' with slight grammatical variation. While it identifies the verb (delete) and resource (webhook), it fails to distinguish this tool from sibling operations like update_webhook or create_webhook, and provides no scope clarification (e.g., permanent vs. soft delete).

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    The description provides no guidance on when to use this tool versus alternatives such as update_webhook (for instance, disabling a webhook rather than deleting it). It lacks prerequisites (e.g., checking webhook status before deletion), warnings about irreversibility, or conditions where deletion might fail.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
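
    To make the critique concrete, a hypothetical rewrite of the delete_webhook definition is sketched below in MCP tool-definition shape. The wording, the integer ID type, and the failure behavior are illustrative assumptions, not taken from the actual server:

```python
# Hypothetical rewrite of the delete_webhook tool definition.
# The original description was just "Delete a webhook"; this version
# front-loads purpose, irreversibility, and sibling-tool guidance.
delete_webhook_tool = {
    "name": "delete_webhook",
    "description": (
        "Permanently delete a webhook subscription by ID. The endpoint "
        "stops receiving events immediately and the deletion cannot be "
        "undone. To change the target URL or subscribed events instead, "
        "use update_webhook. Fails if the webhook ID does not exist."
    ),
    "inputSchema": {
        "type": "object",
        "properties": {
            "id": {
                # Type is an assumption; the schema only says
                # "ID of the webhook to delete".
                "type": "integer",
                "description": "ID of the webhook to delete",
            },
        },
        "required": ["id"],
    },
}
```

    Even this version stays under forty words while covering purpose, consequence, and the nearest alternative.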

  • Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    While annotations correctly declare readOnlyHint=true and destructiveHint=false, the description adds no behavioral context beyond this—omitting details about pagination behavior, default page sizes, or what constitutes a 'custom object' in this domain.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 3/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    At three words, it is appropriately brief, but the brevity reflects under-specification rather than meaningful conciseness: the description is structurally sound yet delivers no value per sentence.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given the presence of a sibling specific-getter tool and pagination parameters, the description is incomplete. It should clarify the 'get all' versus 'get one' distinction and acknowledge pagination behavior.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With 100% schema description coverage (cursor and per_page are fully documented), the baseline score applies. The description provides no additional parameter semantics, but the schema carries the load.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Get all custom objects' is a tautology that restates the tool name. It fails to distinguish from the sibling tool get_custom_object_by_object_slug (which retrieves a specific object by slug) or clarify the scope of 'all'.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    Provides no guidance on when to use this tool versus get_custom_object_by_object_slug for fetching specific objects, nor does it mention pagination requirements despite having cursor/per_page parameters.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
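
    The pagination mechanics the description leaves unstated are the usual cursor loop. A minimal sketch, assuming the endpoint returns a page of records plus an opaque continuation cursor; the data/next_cursor field names and the fetch_page callable are hypothetical stand-ins for a get_custom_objects call:

```python
def fetch_all(fetch_page, per_page=50):
    """Drain a cursor-paginated list endpoint into a single list.

    fetch_page(cursor, per_page) is assumed to return a dict with
    'data' (the page of records) and 'next_cursor' (None on the
    last page) -- field names are illustrative, not confirmed.
    """
    items, cursor = [], None
    while True:
        page = fetch_page(cursor=cursor, per_page=per_page)
        items.extend(page["data"])
        cursor = page.get("next_cursor")
        if cursor is None:
            return items

# Stub standing in for the real tool call, for demonstration only.
def fake_fetch(cursor, per_page):
    pages = {None: {"data": [1, 2], "next_cursor": "c1"},
             "c1": {"data": [3], "next_cursor": None}}
    return pages[cursor]

# fetch_all(fake_fetch) -> [1, 2, 3]
```

    A single sentence in the description ("results are paginated; pass the returned cursor to fetch the next page") would spare the agent from inferring this loop from parameter names.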

  • Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With no annotations provided, the description carries the full burden of behavioral disclosure but provides almost none. It does not clarify what constitutes an 'element' in this domain, whether results are paginated (despite cursor/per_page parameters), or the return structure.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 3/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description is appropriately brief at one sentence, but the brevity reflects under-specification rather than efficient information density. It is front-loaded by default due to its length, though the content is tautological.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given the complexity of the domain (evidenced by 100+ sibling tools with similar naming patterns like 'program', 'edition', and 'element'), the description is inadequate. It fails to clarify relationships between entities or explain what differentiates this from similar list operations.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    Schema description coverage is 100%, with all three parameters (id, cursor, per_page) documented in the schema itself. The description adds no parameter-specific context, meeting the baseline score for high schema coverage.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Get the elements of a program edition' essentially restates the tool name without adding specificity. It fails to distinguish this tool from siblings like 'get_program_elements' or 'get_program_element', leaving ambiguity about whether this retrieves elements within a specific edition instance or something else.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance provided on when to use this tool versus alternatives like 'get_program_elements' or 'get_program_edition'. There is no mention of prerequisites (e.g., needing a valid program edition ID) or when pagination parameters are required.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    While annotations declare readOnlyHint=true and destructiveHint=false, the description adds no behavioral context beyond this. It does not disclose what happens when the invoice ID is not found (error vs null), what fields are returned, or any rate limiting considerations.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 3/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description is brief (4 words), but it suffers from under-specification rather than efficient information density. The single sentence does not earn its place by conveying unique value—it merely repeats the function name.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given the presence of multiple sibling invoice tools and no output schema, the description should explain the return structure or distinguish this from related operations. It provides insufficient context for an agent to confidently select this tool over get_invoice_pdf or other invoice getters.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With 100% schema description coverage ('ID of the invoice to retrieve'), the description meets the baseline expectation. However, the description text itself contributes no additional parameter semantics (e.g., where to find the ID, format requirements beyond the schema).

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Get an invoice record' is tautological—it restates the tool name 'get_invoice' without adding specificity. It fails to distinguish from siblings like get_invoice_pdf, get_invoice_vats, or get_invoice_payments_by_invoice_id, leaving the agent uncertain which invoice-related tool to use.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance provided on when to use this tool versus alternatives. The description does not indicate whether this retrieves the full invoice details, how it differs from get_invoice_pdf (which gets a PDF representation), or when to use create_invoice instead.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With no annotations provided, the description carries the full burden of behavioral disclosure but fails to mention pagination behavior, default sorting, result limits, or whether results are user-scoped or global. The word 'all' is potentially misleading given the pagination parameters.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 3/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description is extremely brief at four words. While not verbose, it wastes the opportunity to add value given the tool's complexity (6 parameters, pagination). It is appropriately sized but underweight for the functional complexity.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    For a list endpoint with 6 parameters including pagination controls and no output schema, the description is inadequate. It should explain the pagination model (cursor-based), filtering capabilities, and relationship to singular retrieval tools.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    Schema description coverage is 100%, with each parameter (cursor, per_page, creator_id, created_at_after, catalog_variant_id, sort) already documented. The description adds no parameter-specific guidance, meeting the baseline expectation when schema coverage is comprehensive.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Get all order records' is tautological, restating the tool name without adding specificity. It fails to distinguish from sibling tool 'get_order' (singular) or clarify whether this returns a comprehensive list or filtered subset.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance provided on when to use this tool versus 'get_order' (singular) or other order-related tools like 'approve_order' or 'cancel_order'. The description omits critical context about pagination despite the presence of cursor and per_page parameters.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
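
    The schema-documented filters (creator_id, created_at_after, catalog_variant_id, sort) imply a narrowing pattern the description never states. A hedged sketch of how an agent might assemble such a request; the value formats (ISO date, '-' prefix for descending sort) are assumptions, not confirmed by the schema:

```python
# Hypothetical arguments for a get_orders call. Parameter names come
# from the schema quoted above; the value formats are assumptions.
params = {
    "creator_id": 42,                  # only orders created by this user
    "created_at_after": "2024-01-01",  # ISO-date lower bound (assumed format)
    "sort": "-created_at",             # newest first (assumed sort syntax)
    "per_page": 25,                    # page size; continue with 'cursor'
}

def build_query(params):
    """Serialize the non-empty filters into a query string."""
    return "&".join(f"{k}={v}" for k, v in params.items() if v is not None)
```

    A description that named even one of these filters would tell the agent the list can be narrowed server-side instead of fetched wholesale.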

  • Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    No annotations are provided, yet the description discloses no behavioral traits such as pagination behavior (despite cursor/per_page parameters), read-only nature, or return value structure. The agent must infer behavior solely from parameter names.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 3/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The single sentence is brief, but its brevity constitutes under-specification rather than efficient conciseness: it restates the tool name without adding actionable information, and so fails to earn its place.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    For a 3-parameter tool with no annotations or output schema, the description is insufficient. It omits domain context (what constitutes an organization affiliation), pagination details, and does not reference the user_id filtering capability in prose.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    Schema description coverage is 100%, documenting cursor, per_page, and user_id adequately. The description adds no parameter-specific context, meeting the baseline expectation for well-documented schemas.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Get all organization affiliation records' is tautological, restating the tool name without clarifying what 'organization affiliations' represent. It claims to get 'all' records despite supporting a user_id filter parameter, and fails to distinguish from sibling tool get_affiliations.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance provided on when to use this tool versus alternatives like get_affiliations or create_organization_affiliation. No mention of prerequisites or typical workflows.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With no annotations provided, the description carries full responsibility for behavioral disclosure but offers none. It does not indicate error handling (e.g., ID not found), authentication requirements, rate limits, or whether this is a safe read operation beyond the implicit 'Get' verb.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 3/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    While the single sentence is brief, it represents under-specification rather than efficient conciseness. The extreme brevity forces the agent to rely entirely on inference from the tool name and schema rather than providing actionable context.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given the rich ecosystem of sibling tools (create_program, update_program, delete_program, get_program_edition), the description is inadequate. It lacks domain-specific context (educational programs vs. software) and fails to explain relationships to related program entities.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    The input schema has 100% coverage with the 'id' parameter already documented. The description adds no parameter-specific context, syntax examples, or domain meaning beyond what the schema provides, warranting the baseline score for high schema coverage.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Get a program' largely restates the tool name (get_program) without distinguishing it from siblings like 'get_programs' (list vs. single) or 'get_program_edition'. It fails to clarify what constitutes a 'program' in this educational/training context.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance provided on when to use this single-retrieval tool versus the bulk 'get_programs' alternative, nor prerequisites like needing a valid program ID beforehand. The description offers no usage constraints or workflow positioning.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    No annotations are present, so the description carries the full burden of behavioral disclosure. It fails to mention pagination behavior, safety characteristics (read-only vs. destructive), or what data structure is returned.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 3/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description is brief (4 words), but the single sentence fails to earn its place by delivering useful information beyond the tool name. It is appropriately sized but not effectively front-loaded with critical details.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    For a tool with 4 parameters including pagination controls, no output schema, and a complex domain with multiple sibling enrollment-related tools, the description is inadequate. It omits filtering behavior, return value structure, and pagination mechanics.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    The input schema has 100% description coverage for all 4 parameters (cursor, per_page, student_id, edition_id). Since the schema already documents the purpose of each parameter, the description meets the baseline requirement despite adding no additional parameter context.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Get all program enrollments' largely restates the tool name (tautology). While 'all' implies a list operation, it fails to distinguish from the sibling tool 'get_program_enrollment' (singular) or explain what constitutes a program enrollment in this domain.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance provided on when to use this tool versus 'get_program_enrollment' (singular) or 'get_enrollments'. No mention of pagination strategy (cursor-based) or when to apply the optional filters (student_id, edition_id).

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    No annotations are provided, so the description carries the full burden of behavioral disclosure. It fails to indicate whether this is a read-only operation, what data the 'teacher record' contains, what happens if the ID is not found (error behavior), or whether the operation is idempotent.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 3/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description is extremely brief (four words), but this brevity constitutes under-specification rather than efficient communication. The single sentence fails to earn its place by not conveying any information beyond the tool name itself.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given the existence of 'get_teachers' (plural) and the lack of an output schema or annotations, the description should clarify that this returns a single teacher object by ID. It omits this crucial distinction and provides no hint about the return value structure or content.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    The input schema has 100% description coverage ('ID of the teacher to retrieve'), so the schema fully documents the parameter. The description adds no additional semantic context about the parameter, meeting the baseline expectation for high-coverage schemas.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Get a teacher record' is tautological, essentially restating the tool name 'get_teacher'. While it identifies the verb (Get) and resource (teacher record), it fails to distinguish from the sibling tool 'get_teachers', leaving ambiguity about whether this retrieves a single record by ID or a collection.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance is provided on when to use this tool versus 'get_teachers' (for listing multiple teachers) or other teacher-related operations. There is no mention of prerequisites, such as needing to obtain the teacher ID from another call first.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
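
    The unstated prerequisite (obtain the teacher ID before calling get_teacher) amounts to a two-step list-then-fetch workflow. Sketched here with stubbed callables standing in for the get_teachers and get_teacher tools; the field names and return shapes are assumptions:

```python
def find_teacher_by_email(list_teachers, get_teacher, email):
    """Resolve an email to a full teacher record in two steps:
    list to discover the ID, then fetch the single record by ID.
    Both callables are hypothetical stand-ins for the MCP tools.
    """
    for summary in list_teachers():
        if summary["email"] == email:
            return get_teacher(summary["id"])
    return None

# Stubs for demonstration only; the real list call would be paginated.
teachers = [{"id": 1, "email": "a@example.com"},
            {"id": 2, "email": "b@example.com"}]
records = {1: {"id": 1, "name": "Ada"}, 2: {"id": 2, "name": "Bob"}}

result = find_teacher_by_email(lambda: teachers, lambda i: records[i],
                               "b@example.com")
# result -> {"id": 2, "name": "Bob"}
```

    A description that said "retrieve a single teacher by ID; to discover IDs, use get_teachers" would encode this workflow in one clause.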

  • Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    No annotations are provided, so the description carries the full burden of behavioral disclosure. While 'Get' implies a read-only operation, the description does not confirm the read-only nature, describe the response format, mention caching behavior, or specify what happens when the role ID does not exist (404 vs empty).

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 3/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description is extremely brief (four words) and lacks verbosity, but this brevity reflects under-specification rather than efficient information density. While appropriately sized for a simple get-by-ID operation, it fails to front-load any meaningful context about the resource or return value.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given the absence of annotations and output schema, the description should compensate by explaining what constitutes a 'teacher role' or what fields are returned. It fails to do so. While the operation is simple (single ID parameter), the description remains inadequate for an agent to understand the domain concept or expected result.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    The input schema has 100% description coverage (the 'id' parameter is documented as 'ID of the teacher role to retrieve'). Since the schema fully documents the parameter semantics, the baseline score of 3 applies. The description adds no additional context about the parameter format or constraints beyond what the schema provides.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Get a teacher role' is a tautology that essentially restates the tool name. While it contains a specific verb (Get) and identifies the resource, it fails to distinguish from the sibling tool 'get_teacher_roles' (plural) or clarify that this retrieves a single specific record by ID versus listing multiple.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    The description provides no guidance on when to use this tool versus alternatives. It does not indicate that this should be used when a specific teacher role ID is known, or contrast it with 'get_teacher_roles' for listing purposes, nor does it mention error handling for invalid IDs.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    While annotations declare idempotentHint=true and destructiveHint=false, the description adds no behavioral context about what happens to unspecified fields (partial vs full replacement), does not mention the beta status of the edition_description_section_contents_attributes parameter (indicated in schema), and omits any note about side effects or publication workflows.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 3/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The single sentence is not wordy, but it is under-specified rather than efficiently informative. It fails to earn its place by merely echoing the tool name without adding distinguishing details, falling short of the 'zero waste' standard.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    For a mutation tool with nested object parameters and beta-feature flags in the schema, the four-word description is inadequate. It does not address the output behavior, the implications of the is_published flag, or the beta nature of certain parameters, leaving significant gaps despite the rich structured context.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With 100% schema description coverage, the schema carries the full burden of parameter documentation. The description adds no syntax details, examples, or elucidation of the nested array structure for edition_description_section_contents_attributes, warranting the baseline score.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Update a catalog variant' is tautological, restating the tool name with minimal variation. It fails to distinguish this tool from siblings like 'update_catalog_product' or clarify what constitutes a 'variant' versus other catalog entities.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    The description provides no guidance on when to use this tool versus alternatives (e.g., 'update_catalog_product'), nor does it mention prerequisites like obtaining the variant ID from 'get_catalog_variants' first. It offers no 'when-not-to-use' constraints.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
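    To make the preceding critique concrete, here is a hypothetical rewrite of the update_catalog_variant description. The is_published flag and edition_description_section_contents_attributes parameter appear in the review above; the partial-update semantics and return behavior described here are illustrative assumptions, not confirmed server facts.

    ```python
    # Hypothetical rewrite of the four-word "Update a catalog variant" description.
    # The semantics attributed to is_published and the nested attributes array are
    # assumptions made for illustration only.
    IMPROVED_DESCRIPTION = (
        "Update an existing catalog variant by ID. Performs a partial update: "
        "fields omitted from the request are left unchanged. Setting "
        "is_published=true makes the variant publicly visible. "
        "edition_description_section_contents_attributes takes a nested array of "
        "section-content objects; beta-flagged parameters may change without "
        "notice. To modify the parent product rather than a variant, use "
        "update_catalog_product. Returns the updated variant record."
    )
    ```

    A description along these lines covers purpose, behavior, parameters, and usage guidance in a handful of sentences, without sacrificing the conciseness score.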

  • Behavior2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    While annotations declare readOnlyHint=false and idempotentHint=true, the description adds no behavioral context beyond these structured fields. It does not explain partial update semantics (whether unspecified fields are preserved), what the properties object accepts, or any side effects of activation/deactivation.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness3/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description is extremely concise at four words, but this brevity results in under-specification rather than efficient communication. The single sentence fails to earn its place by providing actionable information beyond the tool name.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    For a mutation tool with 5 parameters including a complex nested object (properties) and no output schema, the description is inadequate. It fails to explain the flexible properties schema, required identifiers, or return behavior, leaving critical gaps the agent must infer from parameter names alone.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With 80% schema description coverage, the baseline is appropriately met by the schema itself. However, the description adds no value for the undocumented 'properties' parameter (which accepts arbitrary additionalProperties) or the relationship between object_slug and id.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Update a custom record' is tautological, merely restating the tool name without distinguishing it from siblings like create_custom_record or delete_custom_record. It fails to specify what aspects of the record are updatable or the scope of the operation.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance is provided on when to use this tool versus create_custom_record, delete_custom_record, or other update operations. There is no mention of prerequisites (e.g., record must exist) or idempotency behavior despite the idempotentHint annotation.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With no annotations provided, the description carries the full burden of behavioral disclosure but offers none. It does not clarify whether cancellation is reversible, if it triggers refunds, affects associated enrollments/invoices, or requires specific permissions. For a mutation operation, this omission is significant.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness4/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description is extremely brief (3 words) and front-loaded. While efficient in word count, it errs toward under-specification rather than optimal information density. Structure is simple and direct.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    For a destructive operation in a complex domain (evidenced by 100+ sibling tools including invoicing and enrollment management), the description is inadequate. It should explain cancellation semantics, side effects on related records, and output behavior, especially absent annotations or output schema.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    Schema coverage is 100% with the parameter already documented as 'ID of the order'. The description adds no additional semantic context about the ID format or where to obtain it, but baseline 3 is appropriate given the schema's completeness.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Cancel an order' is essentially a tautology that restates the tool name. While it identifies the verb (cancel) and resource (order), it fails to distinguish from sibling cancellation tools like cancel_enrollment, cancel_planned_course, or cancel_program_enrollment, leaving ambiguity about which entity type to cancel.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance provided on when to use this versus approve_order, deny_order, or delete operations. Given the complex domain with orders potentially linked to enrollments and invoices, the lack of usage constraints or prerequisites (e.g., 'only cancel pending orders') creates selection risk.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
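    As a sketch of how the cancel_order gaps flagged above could be closed, here is a hypothetical tool definition pairing a fuller description with MCP annotations. The lifecycle rules (pending-only cancellation, refund and enrollment behavior) are illustrative assumptions, not confirmed server semantics.

    ```python
    # Hypothetical cancel_order definition. The annotation keys follow the MCP
    # tool-annotation convention referenced elsewhere in this review; the
    # business rules in the description are assumptions for illustration.
    CANCEL_ORDER_TOOL = {
        "name": "cancel_order",
        "description": (
            "Cancel a pending order by ID. Only orders not yet approved can be "
            "cancelled; use deny_order to reject an order under review. "
            "Cancellation is irreversible, releases any linked enrollments, and "
            "does not issue refunds automatically. Returns the order with its "
            "new 'cancelled' status."
        ),
        "annotations": {
            "readOnlyHint": False,
            "destructiveHint": True,
            "idempotentHint": True,
        },
    }
    ```

    Pairing prose with annotations this way lets the description carry the consequences while the structured hints carry the safety profile.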

  • Behavior2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With no annotations provided, the description carries the full burden of behavioral disclosure. While 'Create' implies a mutation, the description lacks critical details: whether the operation is idempotent, what happens if a role with the same name exists, what fields/ID are returned, or any side effects (e.g., webhook triggers).

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness4/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description is extremely concise with no filler words or redundant sentences. However, it is under-specified rather than efficiently informative. The structure is a simple sentence fragment, which is acceptable but not exemplary.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given the dense sibling tool list containing multiple teacher-related concepts (create_teacher, create_teacher_enrollment, create_teacher_role), the description fails to clarify the conceptual model. For a mutation tool with no output schema and no annotations, the description should explain the entity's purpose and relationship to other teacher entities.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    The input schema has 100% description coverage for its single 'name' parameter. The description adds no semantic information about the parameter (e.g., naming conventions, uniqueness constraints), but baseline 3 is appropriate when the schema is fully self-documenting.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Create a teacher role' is a tautology that merely restates the tool name (create_teacher_role). It fails to define what a 'teacher role' represents (e.g., a permission template, job title, or classification) or distinguish it from siblings like create_teacher or create_teacher_enrollment.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    The description provides no guidance on when to use this tool versus alternatives such as create_teacher (which creates the educator entity) or create_teacher_enrollment (which assigns teachers to courses). No prerequisites, exclusions, or workflow context are provided.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    While annotations correctly indicate the operation is destructive and idempotent, the description adds no behavioral context beyond what the name and annotations already provide (e.g., whether deletion is permanent, cascades to student records, or requires specific permissions).

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness3/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description is extremely brief (3 words), which is appropriate for a single-parameter tool, but the content is uninformative (restates the name) rather than being efficiently packed with useful context.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness3/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given the simple arity (1 parameter), complete schema coverage, and comprehensive annotations covering safety profiles, the description is minimally viable, though it could specify the domain concept (academic grade/score) and deletion permanence.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With 100% schema description coverage ('ID of the grade to delete'), the description appropriately relies on the schema to document the parameter, meeting the baseline expectation when structured documentation is complete.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Delete a grade' is a tautology that restates the function name (delete_grade) without adding specificity about what constitutes a 'grade' in this educational context or distinguishing it from siblings like update_grade or create_grade.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance is provided on when to use this tool versus update_grade (for modifying vs removing), prerequisites for deletion, or consequences for related enrollments/records.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
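    A hypothetical delete_grade description that complements the existing destructive/idempotent annotations with the missing prose might read as follows; the cascade behavior and the definition of 'grade' are assumptions for illustration.

    ```python
    # Hypothetical delete_grade description. Permanence, cascade behavior, and
    # the academic-score reading of "grade" are illustrative assumptions.
    DELETE_GRADE_DESCRIPTION = (
        "Permanently delete a grade (an individual academic score record) by "
        "ID. Deletion cannot be undone and removes the grade from the "
        "associated student's enrollment history. Use update_grade to correct "
        "a score instead of deleting it. Deleting an already-removed ID "
        "succeeds as a no-op."
    )
    ```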

  • Behavior2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    The annotations already indicate destructiveHint=true, readOnlyHint=false, and idempotentHint=true. The description adds no behavioral context beyond these annotations, such as warning about irreversibility, explaining cascading effects on related records, or confirming the idempotent nature.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness4/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description is extremely terse at three words, containing no redundant or filler content. While appropriately sized in terms of lack of verbosity, the extreme brevity comes at the cost of omitting necessary context for a destructive operation.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness3/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given the simple single-parameter schema with complete annotations covering safety profiles, the description is minimally sufficient. However, for a destructive operation without an output schema, the lack of warning about data loss or impact scope leaves a notable gap.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With 100% schema description coverage, the parameter 'id' is fully documented in the schema itself ('ID of the lead to delete'). The description adds no supplementary parameter guidance (e.g., where to find the ID, format constraints), meeting the baseline expectation for high-coverage schemas.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Delete a lead.' is a tautology that restates the tool name (delete_lead) with minimal variation. While it identifies the verb and resource, it fails to specify scope (e.g., permanent vs. soft delete) or distinguish it from siblings like update_lead or create_lead.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    The description provides no guidance on when to use this tool versus alternatives (e.g., when to delete vs. update a lead), nor does it mention prerequisites such as lead status or permissions. It lacks any 'when-not-to-use' warnings.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    No annotations are provided, leaving the description to carry the full burden of behavioral disclosure. It states 'Delete' but does not specify if the operation is permanent, if it performs cascading deletions of related records, or what authorization is required.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness4/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description consists of a single sentence that is appropriately sized and free of redundancy, though it offers minimal information beyond what is already present in the tool name.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given the presence of closely named siblings (`delete_affiliation`, `create_organization_affiliation`) and the destructive nature of the operation, the description lacks necessary context to distinguish this tool's specific domain and explain the consequences of deletion.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    The input schema has 100% description coverage for its single parameter ('id'). The description text does not add parameter-specific semantics, meeting the baseline score for schemas with high description coverage.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose3/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description states the basic operation ('Delete') and target resource ('organization affiliation record'), but fails to differentiate from the sibling tool `delete_affiliation` or clarify whether this removes the organization entity or merely the affiliation relationship.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines1/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    The description provides no guidance on when to use this tool versus the sibling `delete_affiliation`, nor does it mention prerequisites such as the affiliation needing to exist or the impact on associated users/organizations.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With no annotations provided, the description carries the full burden of behavioral disclosure but omits whether denial is permanent, reversible, or triggers side effects like notifications. It does not clarify permissions required or the specific state changes enacted on the order.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness4/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description consists of a single three-word sentence with no redundancy or wasted language. While appropriately brief for its content, the extreme brevity contributes to under-specification in other dimensions.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given the presence of closely related sibling tools (approve, cancel) and lack of output schema, the description should clarify the specific semantics of 'deny' within the order lifecycle. The current description is insufficient to distinguish this operation from related state transitions or explain business consequences.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    The input schema has 100% description coverage for the single `id` parameter, documenting it as 'ID of the order'. Since the schema fully documents the parameter semantics, the description meets the baseline without requiring additional parameter details.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Deny an order' is tautological, merely converting the tool name from snake_case to a sentence fragment without clarifying scope or business logic. It fails to differentiate from semantically similar siblings like `approve_order` and `cancel_order`, leaving ambiguity about what distinguishes denial from cancellation.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance is provided regarding when to use this tool versus alternatives such as `cancel_order` or `approve_order`, nor are prerequisites or state conditions mentioned. The description lacks any indication of the order lifecycle stage (e.g., pending vs. active) where denial applies.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    The annotations already declare readOnlyHint=true, idempotentHint=true, and destructiveHint=false. The description adds no behavioral context beyond these annotations, such as error handling when the ID doesn't exist, return value structure, or caching behavior.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness3/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description is extremely concise at four words, but brevity here reflects under-specification rather than efficiency. The single sentence fails to earn its place by providing distinct value beyond the tool name itself.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness3/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    For a simple single-resource retrieval tool with complete annotations and a single parameter, the description is minimally adequate. However, it misses the opportunity to clarify the singular vs. plural distinction from 'get_categories' or describe the expected record structure returned.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With 100% schema description coverage ('ID of the categorie to retrieve'), the schema fully documents the single parameter. The description adds no semantic meaning to the 'id' parameter, meeting the baseline expectation when schema coverage is high.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Get a category record' is essentially a tautology of the tool name 'get_category'. While it confirms the action and resource, it fails to distinguish this singular retrieval-by-ID tool from its sibling 'get_categories' (plural list endpoint), leaving ambiguity about when to use each.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance is provided on when to use this tool versus alternatives like 'get_categories' for bulk retrieval, or prerequisites such as needing a valid category ID. The description offers no selection criteria or contextual hints for the agent.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    Annotations already declare readOnlyHint=true and idempotentHint=true, establishing the safety profile. The description adds no behavioral context beyond this—no information about what data is returned, relationships to courses, or caching implications. It merely repeats the implied read-only nature of 'Get'.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness4/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description is extremely brief at four words. While not wasteful, it borders on under-specification rather than elegant conciseness. No structural issues, but the brevity limits informational value.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness3/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    For a simple CRUD retrieval operation with comprehensive schema annotations and a single parameter, the description is minimally adequate. However, it lacks differentiation from the plural variant and domain context about course variants, leaving gaps in contextual completeness.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With 100% schema description coverage ('ID of the course variant to retrieve'), the schema fully documents the single parameter. The description contributes no additional parameter semantics, meeting the baseline expectation when schema coverage is complete.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Get a course variant record' largely restates the tool name (get_course_variant) with minimal expansion. While it confirms the action is a retrieval, it fails to distinguish it from the plural sibling get_course_variants or explain what constitutes a 'course variant' in this domain.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance provided on when to use this single-record retrieval versus the batch get_course_variants, nor any mention of prerequisites like needing a valid course variant ID. The description offers no decision-making criteria for the agent.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    The annotations already declare readOnlyHint, idempotentHint, and destructiveHint, establishing this is a safe read operation. The description adds no behavioral context beyond this, failing to disclose that results are paginated, that the default page size is 25, or any rate limiting considerations.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness4/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description is a single, efficient sentence with no redundant words. However, it may be overly terse given the complexity of pagination and sibling tool differentiation that should have been addressed.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given the existence of a singular variant endpoint and the paginated nature of this endpoint, the description is incomplete. It should clarify that this returns a paginated collection (not truly 'all' at once) and differentiate from 'get_course_variant'. The lack of output schema increases the burden on the description to explain return behavior.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With 100% schema description coverage (both cursor and per_page are fully documented), the baseline score applies. The description mentions no parameters, but the schema adequately documents them without needing additional textual support.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose3/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description states the basic action (Get) and resource (course variant records), but uses 'all', which is misleading given the pagination parameters (cursor, per_page). It also fails to distinguish this tool from the singular sibling 'get_course_variant', leaving ambiguity about when to use the list vs. single-record endpoint.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines1/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance is provided on when to use this tool versus alternatives. Specifically, there is no mention of the sibling 'get_course_variant' (singular) or when pagination is necessary versus fetching a specific record by ID.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
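The pagination gap flagged above is concrete for an agent: a "Get all" description hides the loop the agent must actually run. A minimal sketch, assuming a cursor/next_cursor response shape that the server does not document (fetch_page stands in for a real tool call such as get_course_variants):

```python
# Minimal cursor-pagination sketch. The {"data": ..., "next_cursor": ...}
# response shape is an assumption, not the server's documented format.

def fetch_all(fetch_page, per_page=25):
    """Collect every record by following the cursor until it is exhausted."""
    records, cursor = [], None
    while True:
        page = fetch_page(cursor=cursor, per_page=per_page)
        records.extend(page["data"])
        cursor = page.get("next_cursor")
        if not cursor:  # no further pages: the list is now complete
            return records

# Stub standing in for two paginated responses from the server.
_pages = {
    None: {"data": ["variant-1", "variant-2"], "next_cursor": "c2"},
    "c2": {"data": ["variant-3"], "next_cursor": None},
}

def fake_fetch(cursor=None, per_page=25):
    return _pages[cursor]

all_records = fetch_all(fake_fetch)  # ["variant-1", "variant-2", "variant-3"]
```

An agent that takes "all" literally would stop after the first page; only the cursor loop makes the description's claim true, which is why the rubric penalizes the unqualified wording.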

  • Behavior2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With no annotations provided, the description carries the full burden of behavioral disclosure but fails to deliver. It does not specify what data the meeting record contains, whether the operation is idempotent, what happens when the ID doesn't exist, or any authentication/authorization requirements. The verb 'Get' implies read-only, but this is weak inference without explicit confirmation.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness4/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description is appropriately brief at four words and is front-loaded with the action verb. However, the grammatical error ('an meeting' instead of 'a meeting') and extreme brevity that sacrifices necessary context prevent a perfect score.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Despite the tool's simplicity (single integer parameter), the description is incomplete. With no output schema and no annotations, it should at least indicate what fields or data structure the meeting record contains, or behavior on missing IDs. As written, it provides only marginally more information than the tool name itself.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    The input schema has 100% description coverage ('ID of the meeting to retrieve'), so the baseline score applies. The description itself adds no semantic information about the 'id' parameter beyond what the schema already provides, but the schema is sufficiently self-documenting.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Get an meeting record' is essentially a tautology that restates the tool name with minimal expansion. While it contains a verb (Get) and resource (meeting record), it fails to distinguish from sibling get_meetings_by_planned_course_id or explain what constitutes a 'meeting record'. The grammatical error ('an meeting' instead of 'a meeting') further reduces clarity.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance is provided on when to use this tool versus alternatives like get_meetings_by_planned_course_id. There is no mention of prerequisites (e.g., needing the meeting ID first), error conditions (e.g., ID not found), or selection criteria for this single-record retrieval versus batch retrieval.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
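The fixes the rubric asks for can be combined into one rewritten tool definition. This is an illustrative sketch in the MCP tool-definition shape; the description text is proposed copy, not the server's actual metadata:

```python
# Hypothetical rewrite of get_meeting's metadata addressing the gaps above:
# the article error, the missing sibling reference, and the error behavior.
improved = {
    "name": "get_meeting",
    "description": (
        "Get a single meeting record by its numeric ID. "
        "Returns the meeting's fields, or an error if the ID does not exist. "
        "Use get_meetings_by_planned_course_id instead when you need every "
        "meeting for a planned course and do not yet know a meeting ID."
    ),
    "inputSchema": {
        "type": "object",
        "properties": {
            "id": {
                "type": "integer",
                "description": "ID of the meeting to retrieve",
            }
        },
        "required": ["id"],
    },
}
```

The rewrite fixes the article ("a meeting"), names the sibling tool, and states the not-found behavior: three of the gaps scored above.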

  • Behavior2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    While the annotations correctly identify this as read-only and idempotent, the description adds no behavioral context beyond what the annotations provide. It does not disclose error handling (e.g., what happens if the ID is not found), return format, or whether the operation is atomic.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness4/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description is extremely brief (6 words) and consists of a single sentence. While this avoids verbosity, it is arguably under-specified rather than elegantly concise. However, the sentence is front-loaded and contains no wasted words or filler.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given the existence of the sibling tool get_options_of_custom_field, the description is incomplete because it fails to clarify that this retrieves a specific option by ID versus listing all options. For a 3-parameter read operation with annotations, the description should have addressed this distinction to prevent incorrect tool selection.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With 100% schema description coverage, the baseline is 3. The description itself adds no parameter-specific context (e.g., clarifying that object_type and field_slug identify the parent custom field, or explaining the relationship between the parameters). It neither improves upon nor contradicts the schema's generic 'ID of the parent resource' labels.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Get an option of a custom field' is essentially a tautology that restates the tool name with spaces added. It fails to specify what constitutes an 'option' in this domain (e.g., a dropdown value) and critically omits that this retrieves a single option by ID, distinguishing it from the sibling tool get_options_of_custom_field.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance is provided on when to use this tool versus the plural alternative get_options_of_custom_field, nor are prerequisites (like knowing the field_slug and object_type) mentioned. The description offers no selection criteria or workflow context.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With no annotations provided, the description carries the full burden of behavioral disclosure but provides none. It does not describe error behavior (e.g., what happens if the ID is not found), authentication requirements, or the structure/content of the returned payment data.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness4/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description is extremely concise at four words with no filler or redundancy. While efficient, its brevity contributes to under-specification. However, there is no structural waste or misplaced information.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    For a simple retrieval tool with a single well-documented parameter, the description is minimally functional but incomplete. It lacks information about the return value (critical given no output schema exists), error states, or relationships to the broader payment/invoice workflow evident in sibling tools.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    The input schema has 100% description coverage for the 'id' parameter ('ID of the payment to retrieve'), so the baseline score applies. The description does not add semantic context (such as ID format examples or where to obtain the ID), but it does not need to compensate for schema deficiencies.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Get one payment record' is tautological, essentially restating the tool name 'get_payment' with the quantity 'one' added. While it confirms the singular nature of the retrieval, it fails to distinguish from siblings like 'get_invoice_payments_by_invoice_id' or clarify what constitutes a 'payment record' in this system.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No usage guidelines are provided. The description does not indicate when to use this tool versus related payment retrieval tools (e.g., 'get_invoice_payments_by_invoice_id'), nor does it mention prerequisites such as needing a valid payment ID from prior operations.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    The annotations already establish this is a write operation (readOnlyHint: false) and non-idempotent (idempotentHint: false). The description adds no behavioral context beyond these annotations: it does not explain that repeated calls create duplicate invoices, does not describe the invoice lifecycle (draft vs. final), and does not clarify the nested invoice_items structure.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness3/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description is extremely brief (3 words), which prevents verbosity, but it is too minimal to be genuinely useful. The single sentence structure is efficient, yet it wastes the opportunity to provide substantive guidance that an AI agent would need.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    For a tool with 4 parameters including a complex nested array structure (invoice_items_attributes) and no output schema, the description is inadequate. It fails to mention the account relationship, line item requirements, or the existence of the nested invoice items structure, leaving critical gaps in understanding.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With 75% schema description coverage, the input schema adequately documents most fields (account_id, feature, footnote, and nested item fields). The description adds no additional parameter context, but the baseline score of 3 is appropriate since the schema carries the semantic burden.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose3/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description states the basic verb and resource ('Create an invoice'), but fails to distinguish from siblings like create_invoice_payment_by_invoice_id or create_invoice_vat. It provides minimal information beyond the tool name itself, falling into vagueness regarding scope and specific functionality.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    There is no guidance on when to use this tool versus alternatives (e.g., when to use create_invoice_payment_by_invoice_id versus creating an invoice first). No prerequisites are mentioned, such as requiring a valid account_id or the relationship to catalog variants for line items.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
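The nested structure the description omits matters in practice. A hypothetical argument payload, using only the field names the review cites (account_id, feature, footnote, invoice_items_attributes); the line-item keys are assumptions:

```python
# Illustrative create_invoice arguments. The nested invoice_items_attributes
# array is the structure the description never mentions; its item keys
# (description, quantity, price) are assumed, not confirmed by the schema.
payload = {
    "account_id": 42,                  # must reference an existing account
    "feature": "course fees",          # assumed free-text field
    "footnote": "Payable within 30 days.",
    "invoice_items_attributes": [      # one dict per invoice line item
        {"description": "Course enrollment", "quantity": 1, "price": 250.0},
    ],
}
```

A description that named this nested array and its role would close the largest completeness gap scored above.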

  • Behavior2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    Annotations cover safety profile (readOnly/idempotent), but description fails to disclose pagination behavior implied by cursor/per_page parameters or explain what data structure constitutes an 'association'.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness3/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    Single sentence is efficiently structured without redundancy, but brevity crosses into under-specification given the tool's pagination complexity and abstract domain concepts.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Inadequate for a paginated read tool with no output schema; omits explanation of return values, association types, and the relationship between 'system object' (description) and 'parent resource' (schema).

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    Schema coverage is 100% with clear descriptions for each parameter; the tool description adds no supplemental parameter guidance beyond what the schema already provides, meeting the baseline.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose3/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    States the core action (Get associations) but uses vague jargon ('system object') without defining the domain concept or distinguishing from specific sibling getters like get_account or get_course.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    Provides no guidance on when to use this generic association tool versus specific resource endpoints (e.g., get_program_edition), nor prerequisites for the object_type parameter.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    Annotations already declare readOnlyHint=true and idempotentHint=true. The description adds no behavioral context beyond these annotations: no information on caching, rate limits, or what constitutes a 'record' (fields returned, depth of data).

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness3/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    Extremely brief at four words, which prevents verbosity, but crosses into under-specification. The single sentence states the obvious without earning its place by adding distinctive value.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Despite low complexity and good structured metadata, the description fails to address the critical distinction between singular retrieval (this tool) and list retrieval (get_courses). For a tool with 100% schema coverage and safety annotations, this omission makes it incomplete.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    Schema description coverage is 100% (parameter 'id' is documented as 'ID of the course to retrieve'), so the description is not required to compensate. However, the description itself mentions no parameters, earning only the baseline score.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose3/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    States the basic action (Get) and resource (course record) but is vague regarding scope compared to sibling 'get_courses'. It implies singular retrieval but doesn't clarify whether this fetches by ID or returns the first course found, leaving ambiguity against the plural variant.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    Provides no guidance on when to use this tool versus 'get_courses' or other course-related siblings. No mention of prerequisites (e.g., needing a valid course ID) or error conditions (e.g., course not found).

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
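The annotation/description interplay discussed above can be made concrete. readOnlyHint and idempotentHint are real fields in MCP tool annotations; how an agent might combine them with the description is sketched here as an assumption about agent-side logic, not prescribed behavior:

```python
# Annotations as declared on get_course (per the review above).
annotations = {
    "readOnlyHint": True,    # tool does not modify server state
    "idempotentHint": True,  # repeated calls with the same ID return the same record
}

# An agent can combine both signals: a "Get" verb alone is weak inference,
# but paired with readOnlyHint=True the call is confirmed safe to retry.
safe_to_retry = annotations["readOnlyHint"] and annotations["idempotentHint"]
```

This is why the rubric treats annotations as carrying part of the behavioral burden, while still expecting the description to cover what annotations cannot express (error states, payload shape).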

  • Behavior2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    While annotations correctly indicate this is read-only and idempotent, the description adds no behavioral context beyond what the name implies. It fails to disclose that results are paginated (requiring cursor management), what the default page size is, or how the 'published' filter affects the result set.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness3/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description is extremely brief (four words), which avoids verbosity, but it is under-specified for a tool with pagination and filtering capabilities. The single sentence does not earn its place effectively because it lacks critical operational context.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given the presence of pagination parameters (cursor, per_page) and a filter (published), the description is incomplete. It should explain the pagination pattern and filtering behavior. With no output schema provided, the description misses the opportunity to clarify the return structure (list of records).

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    The input schema has 100% description coverage for all three parameters (cursor, per_page, published). The description adds no additional semantic information about these parameters, meeting the baseline expectation when the schema is self-documenting.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose3/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description states the basic action ('Get') and resource ('course records'), but it is essentially a tautology of the tool name 'get_courses'. It fails to distinguish from the sibling tool 'get_course' (singular) or clarify whether this returns a list versus a single record.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance is provided on when to use this tool versus 'get_course' or other course-related tools. There is no mention of pagination strategy (cursor parameter) or when to apply the 'published' filter versus retrieving all courses.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    Annotations declare the operation as read-only and idempotent, which the description doesn't contradict, but the description adds no behavioral context beyond this. It fails to disclose that results are paginated (despite claiming 'all'), doesn't describe the return structure (no output schema exists), and doesn't explain whether tabs include inactive/disabled ones.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness3/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The four-word description is not verbose, but it is underspecified rather than efficiently concise. Given the complexity of pagination and the ambiguous scope (global vs. course-specific), additional sentences are needed to earn full marks for appropriate sizing.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Inadequate for a paginated list operation with no output schema. The description should explain the pagination pattern (cursor-based), the scope of 'all' (global list), and ideally hint at the return structure. As written, it leaves critical behavioral and contextual gaps unfilled.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With 100% schema description coverage for the two pagination parameters, the baseline is met. However, the description adds no semantic value regarding why these specific pagination controls exist or why there's no course filtering parameter, which would be expected given the tool name and sibling patterns.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose3/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description states the basic action (Get) and resource (course tab records), but 'all' is misleading given the pagination parameters (cursor, per_page), and it fails to clarify what distinguishes a 'tab' from other course-related resources (courses, variants, locations) or explain why no course_id filter is required despite the tool name implying course-scoping.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance provided on when to use this tool versus siblings like get_course, get_courses, or get_course_location. Given the unusual absence of a course identifier parameter compared to other course-related getters (e.g., get_planned_courses_by_course_id), the description should explicitly clarify this retrieves tabs across all courses.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    Annotations already declare readOnlyHint=true and idempotentHint=true. The description adds no behavioral context about pagination defaults (25), cursor mechanics, or the parent-child relationship implied by object_slug. The claim 'all' contradicts the paginated nature of the endpoint.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness3/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    Extremely brief (3 words) and front-loaded, but the word 'all' wastes the limited space by inaccurately suggesting non-paginated, global retrieval rather than scoped, paginated listing.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Inadequate for a 3-parameter list endpoint with required parent filtering. Fails to explain the custom record concept, the mandatory object_slug relationship, or the paginated response structure despite lacking an output schema.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    Input schema has 100% description coverage documenting object_slug as parent resource ID and pagination controls. Description mentions no parameters, but with full schema coverage, baseline 3 is appropriate.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose3/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    States basic verb ('Get') and resource ('custom records'), but 'all' is misleading given pagination parameters (cursor, per_page) exist. Fails to mention the required parent resource scoping via object_slug. Distinguishes poorly from sibling get_custom_record (singular).

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    Provides no guidance on when to use this versus the singular get_custom_record, create_custom_record, or update_custom_record. Does not mention that object_slug is required to scope the query to a specific parent object.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
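The required parent scoping can be illustrated with a small argument builder. The parameter names (object_slug, cursor, per_page) and the default page size of 25 come from the review above; everything else, including the slug value, is hypothetical:

```python
# Sketch of the call shape implied by get_custom_records' schema:
# a required parent scope (object_slug) plus optional pagination.

def build_arguments(object_slug, cursor=None, per_page=25):
    """Assemble tool arguments; per_page defaults to 25 as the review notes."""
    args = {"object_slug": object_slug, "per_page": per_page}
    if cursor is not None:  # omit cursor on the first page
        args["cursor"] = cursor
    return args

first_page = build_arguments("course_feedback")  # hypothetical slug
```

A description that stated "records are always scoped to the parent object named by object_slug" would resolve the ambiguity the rubric flags between this tool and a truly global listing.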

  • Behavior2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    While annotations declare the operation as read-only and idempotent, the description adds no behavioral context beyond this. It fails to disclose that results are paginated (contradicting the 'all' claim) or explain the data structure of VAT records.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness3/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description is brief (four words), but 'all' creates ambiguity given the pagination support. It is efficiently structured but under-specified regarding the actual behavior of the endpoint.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    For a list endpoint with pagination support, the description inadequately explains the pagination mechanism or the scope of returned data. The misleading 'all' claim combined with no mention of output structure leaves significant gaps despite the simple parameter schema.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With 100% schema description coverage for both parameters (cursor and per_page), the schema carries the semantic burden. The description mentions no parameters, which is acceptable given the baseline for high-coverage schemas, though it misses the opportunity to explain the pagination workflow.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose3/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description states the basic action ('Get') and resource ('invoice vat records'), distinguishing it from the sibling 'create_invoice_vat'. However, the claim 'all' is misleading given the pagination parameters (cursor, per_page) exist, and it fails to clarify what constitutes a 'VAT record' or its relationship to invoices.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance is provided on when to use this tool versus alternatives like 'get_invoice' or 'create_invoice_vat'. There is no mention of pagination patterns or prerequisites for using the cursor parameter.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
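The pagination workflow the description omits can be made concrete. A sketch of the loop an agent would have to run to actually get "all" records, assuming a hypothetical `call_tool` helper and a response shape of `items` plus `next_cursor`, neither of which the server documents:

```python
from typing import Optional

# Simulated pages, standing in for a server whose real response
# shape is undocumented (an assumption of this sketch).
_PAGES = {
    None: {"items": [1, 2], "next_cursor": "c1"},
    "c1": {"items": [3, 4], "next_cursor": "c2"},
    "c2": {"items": [5], "next_cursor": None},
}

def call_tool(name: str, cursor: Optional[str] = None, per_page: int = 2) -> dict:
    """Stub for an MCP tool call; returns one page of results."""
    return _PAGES[cursor]

def fetch_all(name: str) -> list:
    """Follow the cursor until the server signals the last page."""
    items, cursor = [], None
    while True:
        page = call_tool(name, cursor=cursor)
        items.extend(page["items"])
        cursor = page.get("next_cursor")
        if cursor is None:  # no further pages
            return items

print(fetch_all("get_invoice_vats"))  # → [1, 2, 3, 4, 5]
```

A description that mentioned this loop (or even just the existence of paging) would spare the agent from trusting the misleading "all".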

  • Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    While the description doesn't contradict the annotations (readOnlyHint=true aligns with 'Get'), it adds no behavioral context beyond what annotations already provide. It fails to disclose error handling (e.g., an invalid ID), return value structure, or cache behavior, which matters all the more given the absence of an output schema.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 3/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description is extremely brief (three words), which could be efficient, but the grammatical error ('an meeting') and lack of specificity mean it doesn't earn its place effectively. It is front-loaded but undersized for the tool's context.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given the absence of an output schema and the presence of similarly-named sibling tools, the description should clarify that this retrieves a single specific record by ID. It fails to address these gaps, leaving critical contextual information missing.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    The input schema has 100% description coverage ('ID of the meeting location to retrieve'), establishing a baseline score. The description itself mentions no parameters, but does not need to compensate given the complete schema documentation.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 3/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description provides a basic verb ('Get') and resource ('meeting location'), but remains vague about scope (singular vs. plural retrieval). It fails to distinguish from sibling 'get_meeting_locations' and contains a grammatical error ('an meeting' instead of 'a meeting'), detracting from clarity.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance is provided regarding when to use this tool versus alternatives like 'get_meeting_locations' (which lists all locations) or 'create_meeting_location'. There are no prerequisites, exclusions, or workflow hints mentioned.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With no annotations provided, the description carries the full burden of behavioral disclosure but offers none. It omits critical context: whether 'all' implies unbounded retrieval or paginated results, default sorting, permission filtering, or response structure (array vs. wrapped object).

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 3/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    At four words, the description is technically concise, but it is under-structured and front-loaded with minimal value. It achieves brevity by sacrificing necessary context (pagination notes, filtering limitations) that would help an agent invoke the tool correctly.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    For a collection-fetching tool with pagination parameters, the description is incomplete. It fails to mention the pagination mechanism, the relationship to singular retrieval, or any filtering constraints. Given the lack of output schema, additional descriptive context was required but absent.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    The input schema provides 100% description coverage for both parameters (cursor and per_page), establishing adequate baseline documentation. The description adds no supplementary semantic context—such as cursor format or pagination strategy—but the schema sufficiency prevents a lower score.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 3/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description states the basic action ('Get') and resource ('planning event records'), but remains vague regarding scope—'all' is ambiguous (global vs. filtered) and it fails to distinguish from the sibling tool 'get_planning_event' (singular), leaving the agent to infer the difference from naming conventions alone.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance is provided on when to use this list endpoint versus the singular 'get_planning_event', nor does it explain pagination workflow despite the presence of cursor/per_page parameters. The agent receives no signal about query patterns or result set expectations.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With no annotations provided, the description carries the full burden of behavioral disclosure but offers no information about side effects, return format, pagination behavior beyond the schema, or what criteria determine teacher availability. The description implies a read-only list operation but does not confirm this or explain the data freshness.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 3/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description consists of a single short sentence that is not verbose, but it is under-specified rather than meaningfully concise. Every word should earn its place, but 'available' is left undefined and 'planning' is unmentioned, wasting the opportunity to clarify the tool's specific purpose.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given the existence of the sibling `get_teachers` tool and the domain-specific 'planning' terminology in the name, the description fails to provide necessary context about the planning domain or the availability criteria. Without an output schema, the description should have explained what distinguishes these teachers, but this critical contextual information is missing.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    The input schema provides complete descriptions for both parameters (`cursor` and `per_page`), documenting pagination behavior adequately. Since schema description coverage is 100%, the description does not need to add parameter semantics, meeting the baseline expectation.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 3/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description states the tool retrieves 'all teachers that are available' but fails to define what 'available' means (available for teaching? not busy? active status?) or distinguish this from the sibling `get_teachers` tool. The 'planning' context implied by the tool name is unexplained, leaving the scope ambiguous.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    There is no guidance on when to use this tool versus the generic `get_teachers` or other teacher-related tools, nor any mention of prerequisites or filtering capabilities. The description does not clarify what makes a teacher 'available' in this context.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    No annotations are provided, so the description carries the full burden of behavioral disclosure. It fails to explain pagination behavior, the effect of combining filters, or what constitutes a 'teacher record'. No mention of rate limits, permissions, or response format.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 3/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description is extremely terse (4 words), which avoids verbosity but underserves the tool's complexity. The front-loaded claim of 'all' records contradicts the filtering capabilities documented in the schema, creating potential confusion rather than clarity.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    For a list endpoint with 5 parameters supporting pagination and multiple filter types, the description is insufficient. Without an output schema or annotations, the description should explain the return structure, pagination mechanics, or filtering behavior—none of which are present.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With 100% schema description coverage, the parameters are well-documented in the schema itself. The description adds no specific parameter guidance (e.g., explaining that 'search' performs text filtering or that 'cursor' enables pagination), meriting the baseline score for high schema coverage.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 3/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description states the basic verb and resource ('Get' + 'teacher records'), but the term 'all' is misleading since the tool supports filtering parameters (id, label_id, search). It also fails to distinguish from sibling 'get_teacher' (singular) or other teacher-related list tools like 'get_planning_teachers'.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance is provided on when to use this bulk endpoint versus 'get_teacher' (singular) or other teacher-related tools. The description does not mention pagination requirements despite having cursor/per_page parameters, nor when filtering is preferable to fetching 'all'.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
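An illustrative rewrite showing what a description addressing these gaps could look like. The wording, and the claims about sibling tools, are assumptions inferred from the tool names discussed above, not the server's actual text:

```python
# Hypothetical rewrite of the tool metadata; the original server
# description is just "Get all teacher records".
IMPROVED_TOOL = {
    "name": "get_teachers",
    "description": (
        "List teacher records as a paginated set: use `cursor` and "
        "`per_page` to walk pages rather than expecting everything at "
        "once. Optionally narrow results by `id`, `label_id`, or a "
        "free-text `search`. Use `get_teacher` instead when you already "
        "have a single teacher's ID, and `get_planning_teachers` when "
        "you need availability within the planning module."
    ),
}

print(IMPROVED_TOOL["description"])
```

Note how the rewrite replaces the misleading "all" with the pagination contract, and folds in the "use X instead of Y when Z" guidance the Usage Guidelines dimension asks for.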

  • Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    No annotations are provided, so the description carries full burden of behavioral disclosure. It fails to clarify that results are paginated despite claiming 'all', does not indicate if the operation is read-only/safe, and omits any mention of rate limits, authentication requirements, or return value structure.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 3/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    While brief (4 words), the description is under-specified rather than efficiently concise. It fails to leverage available space to clarify the pagination behavior or safety characteristics, wasting the opportunity to add value beyond the tool name.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Despite low parameter complexity, the absence of an output schema and annotations creates an information gap. The description does not compensate by describing the return format, pagination mechanics, or the scope of 'thesis records', leaving the agent under-informed for invocation.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With 100% schema description coverage, the schema adequately documents the 'cursor' and 'per_page' parameters. The description adds no parameter-specific context, but the high schema coverage meets the baseline expectation.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 3/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    States the basic verb ('Get') and resource ('thesis records'), but uses vague quantifier 'all' which conflicts with the pagination parameters (cursor, per_page) present in the schema. Does not differentiate from other list tools (get_courses, get_users, etc.) or clarify what constitutes a thesis record.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    Provides no guidance on when to use this tool versus other data retrieval tools in the sibling set. No mention of pagination workflow (how to use cursor), default behavior, or prerequisites.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With no annotations provided, the description fails to disclose that results are paginated (despite cursor/per_page parameters), whether the operation is read-only, rate limits, or whether 'all' respects the filter parameters. It omits critical behavioral context that annotations would typically cover.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 3/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description is extremely brief (4 words) and front-loaded, but undersized for the tool's complexity. While not verbose, it sacrifices necessary information about pagination and filtering capabilities.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given the rich input schema with 6 parameters supporting pagination, filtering, and sorting—and no output schema or annotations—the description is inadequate. It fails to mention pagination behavior, filtering capabilities, or the relationship to the singular 'get_user' endpoint.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    Schema description coverage is 100%, so parameters are fully documented in the schema itself. The description adds no additional semantic context about parameters (e.g., that cursor/per_page enable pagination, or that label_id accepts multiple values), meriting the baseline score.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 3/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description states the basic function (retrieving user records) using the generic verb 'get', but fails to specify that this returns a paginated list versus the sibling 'get_user' which likely returns a single user. The word 'all' implies unfiltered retrieval, which contradicts the available filter parameters (role, email, label_id).

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance provided on when to use this versus the singular 'get_user', or when to apply specific filters. No mention of pagination requirements for large datasets or authentication prerequisites.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    No annotations are provided, so the description carries the full burden of behavioral disclosure. While 'Update' implies mutation, the description lacks critical details: whether the operation is idempotent, what happens if the ID doesn't exist, side effects on related records, or required permissions for modifying affiliation status.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 3/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description is extremely brief (5 words), which prevents redundancy, but it is too minimal to be genuinely useful. The single sentence does not earn its place effectively because it fails to convey any information beyond the tool name itself.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    For a mutation tool affecting organizational relationships, the description is insufficient. It lacks explanation of the 'key_contact' concept, return values (though no output schema exists), error handling, and the business logic implications of updating an affiliation versus creating one.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    The input schema has 100% description coverage, documenting the 'id' and 'key_contact' parameters. The main description adds no semantic information beyond what the schema already provides, warranting the baseline score for high-coverage schemas.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 3/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description states the action ('Update') and resource ('organization affiliation record'), satisfying basic clarity. However, it is essentially a restatement of the tool name and fails to distinguish from the sibling tool 'update_affiliation' or explain what constitutes an 'organization affiliation' in this domain.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance is provided on when to use this tool versus 'create_organization_affiliation', 'delete_organization_affiliation', or the sibling 'update_affiliation'. There are no stated prerequisites, conditions, or exclusion criteria.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    No annotations are provided, so the description must carry the full burden of behavioral disclosure. While 'Cancel' implies a state change, the description fails to specify whether this is reversible, what happens to existing enrollments/attendees, whether refunds are triggered, or required permissions/authorization levels.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 4/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description is extremely brief (4 words) and front-loaded. While no words are wasted, the brevity is excessive given the lack of annotations and behavioral context, preventing a score of 5.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    For a cancellation operation (potentially destructive with business impact), the description is inadequate. It lacks explanation of side effects, output behavior (no output schema exists), error conditions (e.g., cannot cancel past courses), or relationships to other planning entities that might be affected.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With 100% schema description coverage (the 'id' parameter is fully documented in the schema), the baseline score is 3. The description adds no additional context about the parameter (e.g., where to obtain the ID, format examples), but the schema adequately covers the semantics.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 3/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description states the basic action (Cancel) and resource (planned course), but borders on tautology by nearly restating the tool name. It does not clarify what constitutes a 'planned course' in this domain or how it differs from similar cancellation siblings like cancel_enrollment or cancel_program_enrollment beyond the resource name.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance provided on when to use this tool versus alternatives such as update_planned_course (for modifications) or delete_planning_event (for removal). No mention of prerequisites, side effects, or business rules (e.g., can only cancel future courses).

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    No annotations are provided, so the description carries the full burden. It only indicates this is a write operation ('Create') but fails to disclose side effects, error conditions (e.g., if the affiliation already exists), or whether the operation is reversible via 'delete_organization_affiliation'.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 4/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description is extremely concise (5 words) with no redundant text. However, given the presence of similarly-named sibling tools and lack of annotations, this brevity sacrifices necessary context, making it slightly too terse for optimal utility.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    For a tool creating a relationship between two entities (user and organization), the description should explain the business logic (linking users to orgs) and distinguish from 'create_affiliation'. With no output schema and no annotations, this minimal description leaves significant gaps in understanding.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    The input schema has 100% description coverage for all three parameters. The description adds no additional semantic context, but given the schema documents 'key_contact' as optional and the IDs clearly, it meets the baseline expectation without adding extra value.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 3/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description states the verb ('Create') and resource ('organization affiliation record'), but it essentially restates the tool name without clarifying how this differs from the sibling tool 'create_affiliation'. It lacks specificity about what an organization affiliation represents (e.g., linking a user to an organization).

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance provided on when to use this tool versus alternatives like 'create_affiliation' or 'update_organization_affiliation'. No mention of prerequisites (e.g., whether the user and organization must exist first) or idempotency concerns.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With no annotations provided, the description carries the full burden of behavioral disclosure. It offers only the word 'Delete' which implies destruction but provides no details on irreversibility, side effects (e.g., invoice balance recalculation), or required permissions. This is insufficient for a destructive financial operation.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 4/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    Extremely terse at three words. While no words are wasted, the description is under-specified rather than elegantly concise. It is front-loaded with the action verb, but lacks the necessary qualifying information to be useful.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Incomplete for a financial deletion tool. It omits the invoice context (critical given the required 'invoice_id' parameter), provides no output guidance (though none is required without an output schema), and fails to address behavioral implications given the lack of annotations.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    Schema description coverage is 100%, with 'invoice_id' and 'id' well-described in the schema itself. The description adds no additional semantic context (e.g., explaining that 'id' refers to the payment ID within the context of the specified invoice). Baseline 3 is appropriate when schema documentation is complete.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 3/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    States the basic action ('Delete') and resource ('payment') but fails to specify the invoice relationship implied by the tool name and required 'invoice_id' parameter. Sibling tools like 'create_invoice_payment_by_invoice_id' suggest payments are invoice-specific, yet the description doesn't clarify this is for deleting payments associated with invoices versus general payments.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    Provides no guidance on when to use this tool versus alternatives (e.g., 'get_invoice_payments_by_invoice_id' to verify existence first), no prerequisites (e.g., invoice must exist), and no warnings about when deletion is appropriate (e.g., regarding reconciliation or refund workflows).

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
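Pulling these dimension verdicts together, the recommended fixes can be sketched as a revised tool definition. This is a hypothetical illustration: the tool name is assumed from the sibling names cited above, and the description wording is ours, not the server's; only the parameter names (invoice_id, id) and the sibling tools come from the evaluation.

```python
# Hypothetical rewrite of the invoice-payment deletion tool, folding in the
# behavioral disclosure, invoice scoping, and usage guidance the review asks
# for. Names and wording are illustrative, not the server's actual API.
revised_tool = {
    "name": "delete_invoice_payment_by_invoice_id",  # assumed from sibling names
    "description": (
        "Permanently delete a payment from a specific invoice. Irreversible: "
        "the payment record is removed and the invoice balance is recalculated. "
        "Verify the payment exists first via get_invoice_payments_by_invoice_id, "
        "and prefer a refund workflow over deletion for reconciled payments."
    ),
    # The review notes no annotations were provided; declaring them is part of the fix.
    "annotations": {"destructiveHint": True, "idempotentHint": True},
    "inputSchema": {
        "type": "object",
        "required": ["invoice_id", "id"],
        "properties": {
            "invoice_id": {
                "type": "string",
                "description": "ID of the invoice the payment belongs to",
            },
            "id": {
                "type": "string",
                "description": "ID of the payment to delete, scoped to that invoice",
            },
        },
    },
}
```

Each sentence maps to a scored dimension: irreversibility and balance recalculation address Behavior, the invoice scoping addresses Purpose and Completeness, and the verify-first/refund guidance addresses Usage Guidelines.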

  • Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    Annotations already declare destructiveHint=true and idempotentHint=true. The description adds no behavioral context beyond this, such as whether the deletion is soft/hard, if it cascades to related records, or what recovery options exist. It merely restates the operation type already implied by annotations.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 3/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description is extremely brief (three words) and front-loaded, containing no filler. However, this extreme brevity constitutes under-specification rather than efficient conciseness, as the single sentence fails to deliver additional value beyond the tool name.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 3/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given the simple input schema (single ID parameter with full documentation) and comprehensive annotations covering safety, the description meets minimum viability. However, for a destructive operation, the absence of any mention of side effects or data recovery keeps it from being fully complete.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With 100% schema description coverage ('ID of the material to delete'), the baseline score applies. The description provides no additional parameter semantics, examples, or validation rules beyond what the schema already documents.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 3/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description states the core action ('Delete') and resource ('material'), but is essentially a tautology of the tool name. It does not differentiate from sibling deletion tools like 'delete_material_group' or explain the specific scope of 'material' in this context.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance is provided on when to use this tool versus alternatives (e.g., 'update_material' to archive instead of delete), nor are prerequisites mentioned (such as whether the material must be unused). The description lacks explicit when/when-not direction.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    While annotations correctly flag destructiveness and idempotency, the description adds no behavioral context such as cascade effects (what happens to scheduled meetings at this location?), reversibility, or required permissions. For a destructive operation, this lack of side-effect disclosure is a gap.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 4/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description is a single, efficient sentence with no filler words. However, for a destructive operation with potential entity confusion, this brevity may be excessive rather than optimal.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given the destructive nature, the existence of similar sibling tools (delete_course_location), and lack of output schema, the description fails to provide sufficient context. It does not clarify the domain model relationship between meeting locations and course locations, nor explain operational constraints.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    The input schema has 100% description coverage for the 'id' parameter. The description itself adds no additional semantic information about the parameter, but with full schema coverage, the baseline score of 3 is appropriate.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 3/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description states the action (delete) and resource (course location), but uses 'course location' while the tool name uses 'meeting_location'. Given the sibling tool 'delete_course_location' exists, this terminology inconsistency creates ambiguity about which entity this tool actually removes.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    Provides no guidance on when to use this tool versus the sibling 'delete_course_location', nor any prerequisites (e.g., whether the location must be unused). The description stands alone without contextual usage signals.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
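The terminology clash between this tool and its sibling could be resolved with mutually disambiguating descriptions. The entity semantics below (per-meeting venue vs. course-wide venue) are assumptions for illustration; the review only establishes that both tools exist and that their current descriptions do not distinguish them.

```python
# Hypothetical disambiguated descriptions for the two location-deletion tools.
# The meeting/course distinction is an assumed domain model, not confirmed
# by the server's documentation.
descriptions = {
    "delete_meeting_location": (
        "Delete a meeting location (a venue attached to individual meetings). "
        "Destructive and irreversible. For venues attached to a course as a "
        "whole, use delete_course_location instead."
    ),
    "delete_course_location": (
        "Delete a course location (a venue attached to a course as a whole). "
        "Destructive and irreversible. For per-meeting venues, use "
        "delete_meeting_location instead."
    ),
}
```

The point is the cross-reference: each description names the sibling and states the boundary condition, giving an agent the "use X instead of Y when Z" signal the Usage Guidelines dimension asks for.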

  • Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    While the description does not contradict the annotations (readOnlyHint, idempotentHint, destructiveHint), it adds zero behavioral context beyond them. No information about return values, error conditions (e.g., 404 if ID not found), or side effects is provided.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 4/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    Extremely concise at four words with no filler. While appropriately front-loaded, the brevity borders on under-specification given the available sibling tools.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Despite being a simple read operation with good annotations, the description fails to leverage the available complexity budget. It omits output structure, error scenarios, and differentiation from the plural 'get_accounts', leaving significant gaps in context.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With 100% schema description coverage ('ID of the account to retrieve'), the baseline is met. The description itself does not mention parameters, but the schema fully compensates.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 3/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Get a single account' restates the tool name with the modifier 'single', which distinguishes it from the sibling 'get_accounts'. However, it remains tautological and minimal, barely clearing the threshold of specificity.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance is provided on when to use this single-record fetch versus the bulk 'get_accounts', nor are prerequisites (like needing a valid ID) mentioned. The agent receives no decision-making criteria.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    The annotations declare readOnlyHint=true, idempotentHint=true, and destructiveHint=false, covering the safety profile. The description adds no behavioral context beyond these annotations—no information on what the 'record' contains, caching behavior, or return structure.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 4/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    Extremely brief at four words with no filler. While efficient, it is under-specified rather than optimally concise—front-loading is impossible with so little content, but there is no structural waste.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Without an output schema, the description should indicate what data is returned or what a 'catalog product' represents. It also omits the critical distinction from 'get_catalog_products'. Given the rich sibling tool ecosystem, this description is insufficient for proper tool selection.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    The input schema has 100% description coverage ('ID of the catalog product to retrieve'), so the schema carries the full burden. The description mentions no parameters, but baseline 3 is appropriate given complete schema documentation.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 3/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description states the basic action ('Get') and resource ('catalog product record'), but is vague regarding scope. It fails to distinguish from the sibling tool 'get_catalog_products' (plural) or clarify that this retrieves a single specific record by ID versus a list.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance provided on when to use this tool versus 'get_catalog_products' or other retrieval tools. No mention of prerequisites (e.g., needing a valid product ID from a previous search) or error conditions.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    Annotations cover the safety profile (read-only, non-destructive, idempotent), but the description adds no behavioral context about pagination mechanics, default page sizes, or what constitutes a 'category record' in this domain. It misses the opportunity to clarify that 'all' means 'list endpoint with pagination' rather than a complete dump.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 3/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    While the single sentence wastes no words and is front-loaded with the verb, it is arguably too concise for a tool with four parameters including pagination and filtering. It meets brevity standards but fails the 'appropriately sized' criterion for this complexity level.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 3/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given the rich input schema (100% coverage) and complete annotations, the description provides minimum viable context by identifying the resource type. However, gaps remain regarding pagination behavior and differentiation from singular-fetch siblings, leaving agents to infer these from schema inspection alone.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With 100% schema description coverage, the schema fully documents all four parameters (cursor, per_page, published, sort). The description adds no parameter-specific guidance, meeting the baseline expectation for well-schematized tools.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 3/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    States the basic verb and resource ('Get all category records'), and the word 'all' distinguishes it from sibling 'get_category' (singular). However, it fails to indicate that results are paginated (not truly 'all' at once) or that filtering/sorting capabilities exist, which could mislead about the tool's scope.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    Provides no guidance on when to use this versus 'get_category' for fetching single records, or when pagination parameters are required. No mention of alternative tools or prerequisites for effective usage.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    While annotations correctly declare the operation as read-only and idempotent, the description adds no behavioral context beyond these hints. It fails to disclose what happens when the ID is not found (404 vs null), response format, or whether the lead data includes related entities.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 3/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    At four words, the description is certainly concise, but it approaches tautology (restating the function name 'get_lead' as 'Get one lead record'). While no sentences are wasted, the extreme brevity constitutes underspecification rather than efficient information density.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 3/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    For a simple single-resource lookup with good annotations and complete schema coverage, the description is minimally adequate. However, given the lack of output schema, it should ideally characterize the returned lead object or mention error handling to be fully complete.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    The input schema has 100% description coverage ('ID of the lead to retrieve'), so the description is not required to compensate. The description mentions no parameters, meeting the baseline expectation when the schema fully documents the single required field.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 3/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description states the basic action ('Get') and resource ('lead record'), and uses 'one' to implicitly distinguish from the sibling 'get_leads'. However, it lacks domain context about what constitutes a 'lead' and does not explicitly differentiate from other lead-related operations like update_lead or delete_lead beyond the verb used.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance is provided on when to use this single-record lookup versus the bulk 'get_leads' alternative, nor are prerequisites (like ID availability) or error scenarios (e.g., invalid ID) mentioned. The agent must infer usage solely from the singular parameter schema.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    Annotations already declare read-only, idempotent, and non-destructive properties. The description adds no behavioral context about pagination mechanics, filtering behavior, or return structure. It does not clarify that results are paginated despite claiming 'all' records.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 4/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

Extremely concise at four words and front-loaded with the verb 'Get'. While efficient, this brevity sacrifices necessary context about pagination and filtering capabilities.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Inadequate for a paginated list tool with filtering options. Missing clarification on pagination workflow, email filtering behavior, and differentiation from singular retrieval tools. No output schema exists to compensate.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    Schema has 100% description coverage for all three parameters (cursor, per_page, email). The description mentions none of them, but meets the baseline score since the schema fully documents the semantics.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 3/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    States the basic action (get) and resource (lead records) but fails to distinguish from sibling 'get_lead' (singular). The word 'all' is slightly misleading given the pagination parameters (cursor, per_page) suggest batch retrieval rather than a single complete dump.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    Provides no guidance on when to use this versus the singular 'get_lead', or when to apply the email filter versus retrieving unfiltered lists. No mention of pagination workflow.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 3/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The annotations already establish readOnlyHint=true, idempotentHint=true, and destructiveHint=false, covering the safety profile. The description adds no behavioral context beyond this, failing to disclose pagination behavior, rate limiting, or whether 'all' implies unfiltered access (the schema suggests no filter parameters exist). It does not, however, contradict the annotations.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 4/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely brief at three words, avoiding unnecessary verbosity. However, given the tool's complexity (a paginated list endpoint among 100+ siblings), this brevity constitutes under-specification rather than efficient communication: every word earns its place, but there are simply too few of them.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    For a list endpoint with pagination parameters (cursor, per_page) and no output schema, the description omits critical context: it does not explain the pagination mechanism, the maximum page size, or what fields are returned. Given the presence of related material operations (create_material, delete_material, etc.), the lack of cross-reference guidance is a significant gap.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With 100% schema description coverage ('Cursor for fetching the next page of results', 'Number of results per page'), the baseline score applies. The description adds no supplementary parameter guidance, examples, or validation rules beyond what the schema already provides.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Get all material records' is essentially a tautology of the tool name 'get_materials', merely restating it with articles. While 'all' hints at plurality distinguishing it from the sibling 'get_material' (singular), it fails to specify that this is a paginated list operation rather than a bulk fetch, and omits any domain-specific context about what 'materials' represent in this system.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance is provided on when to use this tool versus siblings like 'get_material' (single fetch), 'get_material_group', or 'get_material_groups'. The description does not mention pagination requirements, default limits (per_page defaults to 25 per schema), or cursor management strategies necessary for successful usage.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    No annotations are provided, so the description carries the full burden of behavioral disclosure. While 'Get' implies a read operation, the description doesn't confirm safety, idempotency, or what happens when the organization ID doesn't exist.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 4/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    Extremely brief at only four words with no filler, but borders on under-specification. The single sentence does earn its place, though additional sentences would be warranted for a complete tool definition.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    For a simple single-parameter lookup with no output schema and no annotations, the description is insufficient. It lacks context on what data is returned, what an 'organization' represents in this system, or how it relates to 'accounts' or 'users'.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    The input schema has 100% description coverage ('ID of the organization to retrieve'), establishing baseline documentation. The description adds no additional parameter context, but the schema sufficiently covers the single 'id' parameter.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 3/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    States a clear verb ('Get') and resource ('organization record'), but is extremely minimal and fails to distinguish from sibling 'get_organization_affiliations' or clarify what constitutes an 'organization' in this domain.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    Provides no guidance on when to use this tool versus alternatives like 'get_organization_affiliations' or 'get_accounts', nor does it mention prerequisites or error conditions (e.g., invalid ID).

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    No annotations are provided, so the description carries the full burden of behavioral disclosure. While 'Get' implies a read-only operation, the description lacks details about error handling (e.g., 404 behavior), return format, or whether the operation is idempotent.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 3/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description is appropriately brief (4 words), but contains a grammatical error ('an planning') and the word 'record' is slightly redundant given the 'get' prefix. The brevity is acceptable, but the error and lack of front-loaded distinction from the plural variant reduce the score.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 3/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    For a simple CRUD retrieval tool with one parameter and no output schema, the description is minimally adequate. However, it misses the opportunity to clarify that this retrieves a single entity by ID (unlike the plural variant) or to hint at the return structure.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    The input schema has 100% description coverage ('ID of the planning event to retrieve'), so the schema fully documents the parameter. The description adds no additional parameter semantics, but with complete schema coverage, the baseline score of 3 is appropriate.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 3/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description states the basic verb ('Get') and resource ('planning event record'), but it does not distinguish this tool from the sibling 'get_planning_events' (plural). It also contains a grammatical error ('an planning' instead of 'a planning') and borders on tautology by nearly restating the tool name.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance is provided on when to use this single-record retrieval tool versus the 'get_planning_events' list endpoint. There is no mention of prerequisites (e.g., needing to know the specific ID) or error conditions (e.g., invalid ID).

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
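Taken together, the gaps flagged above (no differentiation from the plural sibling, no error disclosure) suggest a rewrite along these lines. This is a hypothetical sketch, not the server's actual definition; the 404 behavior and schema wording are assumptions:

```typescript
// Hypothetical improved tool definition for the singular retrieval endpoint.
// The description front-loads the verb, names the plural sibling, and
// discloses read-only and not-found behavior (both assumed here).
const getPlanningEventTool = {
  name: "get_planning_event",
  description:
    "Get a single planning event by its ID. " +
    "Use get_planning_events (plural) to list or discover events; " +
    "use this tool only when the specific ID is already known. " +
    "Read-only; returns a 404 error if the ID does not exist.",
  inputSchema: {
    type: "object",
    properties: {
      id: {
        type: "string",
        description: "ID of the planning event to retrieve",
      },
    },
    required: ["id"],
  },
};
```

The same pattern (verb first, sibling named, failure mode stated) would lift most of the dimension scores in this section without materially lengthening the description.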

  • Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    No annotations provided, yet the description discloses no behavioral traits: it doesn't explain pagination behavior (despite cursor/per_page params), what constitutes 'available', auth requirements, or return value structure.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 4/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    Single sentence format is appropriately sized and front-loaded, though 'that are available' adds vagueness rather than value. No structural waste.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    With numerous location-related siblings and no output schema, the description should explain what planning locations are and how they differ from course/meeting locations. Currently insufficient for correct tool selection.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    Schema has 100% description coverage for both parameters (cursor and per_page). The description adds no additional semantics beyond the schema, meeting the baseline for high-coverage schemas.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 3/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    States 'Get' and 'locations' but 'available' is vague and undefined. With siblings like get_course_locations and get_meeting_locations, the description fails to clarify what distinguishes a 'planning location' or what 'available' means in this context.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    Provides no guidance on when to use this tool versus siblings like get_meeting_locations or get_course_locations, nor does it mention prerequisites or filtering capabilities.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    No annotations are provided, so the description carries the full burden. It only states 'Get' implying a read operation, but provides no details on error handling (e.g., invalid ID), required permissions, or what data is returned.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 4/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description is extremely brief (5 words, single sentence). While it doesn't waste words, it is arguably under-specified rather than optimally concise. However, every word does contribute to stating the basic purpose.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    For a retrieval tool with no output schema and no annotations, the description should do more to clarify the scope (single record vs list) and differentiate from similar siblings. It leaves significant gaps in contextual understanding.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With 100% schema description coverage, the parameter 'id' is fully documented in the schema itself. The description adds no additional parameter semantics, which is acceptable given the high schema coverage establishes a baseline of 3.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 3/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description states the basic action ('Get') and resource ('program enrollment record'), but it essentially restates the tool name. It fails to distinguish from the plural sibling 'get_program_enrollments' or clarify the difference between 'program_enrollment' and 'enrollment' (siblings 'get_enrollment'/'get_enrollments' exist).

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance provided on when to use this single-record lookup versus the plural 'get_program_enrollments' for listing, or versus 'get_enrollment'. No prerequisites or conditions are mentioned.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    No annotations are provided, so the description carries the full burden of behavioral disclosure. It fails to indicate that this is a read-only operation, does not describe the pagination behavior implied by the cursor parameter, and does not mention any rate limits or default sorting of results.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 4/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description is extremely brief at four words. While it contains no wasted text, it is arguably under-specified rather than efficiently concise. However, it avoids the verbosity of unnecessary fluff and front-loads the key verb.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given the presence of closely named siblings (get_teacher_role singular) and pagination parameters, the description is incomplete. It fails to indicate that results are paginated, does not specify what constitutes a 'teacher role', and lacks any indication of the return structure (list vs single object).

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    The input schema has 100% description coverage for both parameters (cursor and per_page), establishing a baseline score of 3. The description 'Get all teacher roles' adds no additional semantic context about the parameters (e.g., when to use the cursor, typical page sizes), but the schema adequately documents them.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 3/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Get all teacher roles' states the basic action and resource but offers minimal specificity. While the word 'all' hints at a collection/list operation (distinguishing from the singular 'get_teacher_role' sibling), it essentially restates the tool name without clarifying what 'teacher roles' represent or the scope of the retrieval.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance provided on when to use this tool versus the singular 'get_teacher_role' or other related tools like 'create_teacher_role'. There is no mention of pagination strategy or when to stop iterating through cursors, despite the presence of pagination parameters.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    The annotations already establish that this is a non-destructive, idempotent write operation (readOnlyHint: false, idempotentHint: true, destructiveHint: false). The description adds no further behavioral context, such as what happens to existing data if only specific fields are provided, or whether the update is partial or full.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 4/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description is a single, short sentence with no redundant words. However, its extreme brevity results in under-specification rather than efficient information density; it wastes the opportunity to provide crucial context in the same space.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    For a mutation tool with 5 parameters and no output schema, the description is inadequate. It fails to explain what aspects of an option can be modified (e.g., 'value' and 'enabled' status), what the 'id' parameter refers to specifically, or how this operation relates to the broader custom field lifecycle.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With 80% schema description coverage (4 of 5 parameters documented), the baseline score applies. The description itself adds no parameter-specific guidance (e.g., it does not clarify the confusing schema descriptions where both 'object_type' and 'field_slug' are labeled as 'ID of the parent resource', nor explain what 'value' represents).

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 3/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description states the basic purpose (updating an option of a custom field) with a clear verb and resource, but it is extremely minimal and does not differentiate from siblings like 'add_option_to_custom_field' or 'delete_option_of_custom_field'. It borders on tautology by nearly restating the tool name.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    The description provides no guidance on when to use this tool versus alternatives (e.g., when to update an existing option vs. adding a new one with 'add_option_to_custom_field'). It does not mention prerequisites, such as needing to identify the custom field via 'object_type' and 'field_slug' first.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
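As a hedged illustration of both points above — behavioral annotations plus a description that states update semantics and names the add_option_to_custom_field alternative — a definition might look like the following. The partial-update claim and exact wording are assumptions about the API, not documented behavior:

```typescript
// Hypothetical sketch: annotations matching the ones the review cites,
// paired with a description that answers "partial or full update?" and
// points to the sibling tool for adding options.
const updateOptionTool = {
  name: "update_option_of_custom_field",
  description:
    "Update an existing option of a custom field. Partial update: only the " +
    "fields provided are changed; omitted fields keep their current values. " +
    "To add a new option instead, use add_option_to_custom_field.",
  annotations: {
    readOnlyHint: false,
    idempotentHint: true,
    destructiveHint: false,
  },
};
```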

  • Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    Annotations already declare readOnlyHint, idempotentHint, and destructiveHint. The description adds no behavioral context beyond these annotations—such as what data structure is returned, error handling for invalid user_ids, or that results are paginated.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 4/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description is extremely brief (6 words), which avoids verbosity, though it contains a grammatical error ('an user' instead of 'a user'). It is front-loaded with the verb but arguably underspecified rather than optimally concise.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 3/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given the tool's simplicity (4 parameters, 1 required, no output schema) and high schema coverage, the description is minimally sufficient. However, it fails to explain the domain concept of 'authentications' (evident from the provider enum values like azure_active_directory) or mention pagination behavior implied by cursor/per_page.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With 100% schema description coverage, the parameters are well-documented in the schema itself. The description mentions no parameters, so it adds no additional semantic value beyond the baseline.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 3/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description states the basic action ('Get') and resource ('authentications of an user'), but it essentially restates the function name without adding specificity about what 'authentications' represents (e.g., identity provider connections). It does not differentiate from sibling tools like get_user.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    The description provides no guidance on when to use this tool versus alternatives (e.g., create_authentication or get_user), nor does it mention prerequisites or typical use cases.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    Annotations already declare readOnlyHint=true, idempotentHint=true, and destructiveHint=false. The description adds no behavioral context beyond these annotations—it does not describe the return structure, error cases (e.g., invalid ID), or what constitutes a 'course location' in this domain.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 4/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description is appropriately brief at five words and one sentence. However, while concise, it borders on under-specification given the lack of differentiation from sibling tools.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 3/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    For a simple CRUD retrieval with complete input schema and behavioral annotations, the description is minimally adequate. However, it lacks critical context regarding the output structure and fails to clarify the singular-vs-plural distinction necessary for correct tool selection among siblings.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With 100% schema description coverage ('ID of the course location to retrieve'), the schema fully documents the single parameter. The description adds no semantic information about the parameter, warranting the baseline score of 3 for high schema coverage.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 3/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description uses the verb 'Get' and resource 'course location record', but this largely restates the tool name `get_course_location` with minimal elaboration. It fails to distinguish this singular retrieval from the sibling tool `get_course_locations` (plural list), leaving the scope ambiguous.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance is provided on when to use this tool versus alternatives like `get_course_locations` (for listing) or `get_meeting_location` (for related location types). There are no stated prerequisites or conditions for usage.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    While annotations correctly indicate read-only, idempotent, and non-destructive behavior, the description adds no additional behavioral context such as error handling when the ID is not found, the structure of the returned record, or whether related entities are included.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 4/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description is extremely concise at only four words, with no redundant or wasted language. However, it may be overly terse given the lack of output schema and behavioral details, suggesting insufficient information density rather than optimal conciseness.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 3/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given the low complexity (single required parameter) and presence of safety annotations, the description is minimally adequate. However, the absence of an output schema creates a gap that the description fails to fill regarding return value structure or content.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With 100% schema description coverage for the single 'id' parameter, the description meets the baseline expectation. However, it adds no supplementary context about the parameter's semantics, format requirements, or how to obtain valid IDs beyond what the schema already provides.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 3/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description states the basic action ('Get') and resource ('material group record'), but lacks specificity regarding scope (single record retrieval by ID) and fails to differentiate from sibling tools like 'get_material_groups' (plural) or explain what constitutes a material group.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance is provided on when to use this tool versus alternatives such as 'get_material_groups' for listing multiple records. There are no prerequisites mentioned, though the required 'id' parameter implies the user must know the specific identifier beforehand.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    No annotations are provided, so the description carries the full burden. It omits critical behavioral traits: it does not disclose that results are paginated (despite cursor/per_page parameters), does not clarify the read-only nature of the operation, and does not indicate expected response size or limits.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 4/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    Extremely concise at only four words with no redundant or wasteful content. However, brevity comes at the cost of missing contextual details, preventing a perfect score.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 3/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given the simple two-parameter schema with full coverage and no output schema, the description is minimally adequate. However, it should mention the pagination model to be complete, as 'Get all' implies a complete dataset retrieval while the parameters suggest paginated access.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With 100% schema description coverage, the schema adequately documents both 'cursor' and 'per_page' parameters. The description adds no additional parameter semantics, meeting the baseline expectation for well-documented schemas.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 3/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description states a clear verb ('Get') and resource ('payment option records'), but fails to differentiate from similar sibling tools like 'get_payment_methods' and 'get_payment'. It does not clarify what distinguishes a 'payment option' from a 'payment method' or a single 'payment'.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance provided on when to use this tool versus siblings like 'get_payment_methods', nor when pagination (via cursor/per_page) is required versus fetching the full dataset.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    No annotations are provided, so the description carries the full burden of behavioral disclosure. It fails to mention pagination behavior, rate limits, authentication requirements, or whether 'all' refers to system-wide tasks or user-scoped tasks. The description implies a complete dump but the schema reveals paginated access.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 4/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    Extremely concise at only four words. While not verbose, it is front-loaded and efficient. However, given the lack of annotations and the presence of pagination features, the description is arguably too minimal to be fully effective.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 3/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    For a low-complexity list endpoint with 100% schema coverage and no output schema, the description meets minimum viability but has clear gaps. It should clarify pagination behavior and distinguish from 'get_task' to be complete.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    Schema description coverage is 100%, with both 'cursor' and 'per_page' fully documented in the input schema. The description adds no additional parameter context (e.g., default pagination behavior), warranting the baseline score for high-coverage schemas.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 3/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    States a basic verb-resource action ('Get all task records'), but the term 'all' is misleading given the pagination parameters (cursor, per_page) exist. It also fails to distinguish from sibling tool 'get_task' (singular), leaving ambiguity about when to use the list vs. single-resource endpoint.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    Provides no guidance on when to use this tool versus alternatives like 'get_task' or 'create_task'. Does not mention that pagination is required to retrieve the full dataset implied by 'all', or that results are paginated by default.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
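The cursor iteration the review asks the description to document can be sketched as a simple loop. The fetchPage stub and the { items, nextCursor } shape are stand-ins, since the server's actual response format is not documented here:

```typescript
// Hedged sketch of cursor-based pagination: keep requesting pages until the
// server stops returning a next cursor. Data and field names are invented.
type Page = { items: string[]; nextCursor?: string };

// Fake two-page dataset standing in for the real get_tasks endpoint.
const pages: Record<string, Page> = {
  start: { items: ["task-1", "task-2"], nextCursor: "p2" },
  p2: { items: ["task-3"] }, // no nextCursor: this is the last page
};

function fetchPage(cursor: string): Page {
  return pages[cursor];
}

// "Get all" in the description really means: iterate pages until exhausted.
function fetchAll(): string[] {
  const all: string[] = [];
  let cursor: string | undefined = "start";
  while (cursor !== undefined) {
    const page = fetchPage(cursor);
    all.push(...page.items);
    cursor = page.nextCursor;
  }
  return all;
}
```

Stating this contract ("results are paginated; follow nextCursor until absent") in the description would resolve the tension between 'all' and the cursor/per_page parameters.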

  • Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    No annotations are provided, so the description carries the full burden. It states the intended state change but fails to disclose idempotency (can it be called on already active teachers?), reversibility, side effects, or return value structure.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 5/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    Extremely efficient at three words with zero redundancy. The entire description is action-oriented and front-loaded, though minimal.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    For a state-mutation tool with no annotations and no output schema, the description is insufficient. It lacks explanation of the 'active' state semantics, error conditions (e.g., invalid ID), or whether the operation is idempotent.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    Schema description coverage is 100% (the 'id' parameter is fully documented), establishing the baseline score. The description adds no parameter-specific guidance, but none is needed given the schema completeness.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 4/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description clearly states the action ('Mark') and resource ('teacher') with the specific state change ('as active'). However, it does not explicitly distinguish from the sibling tool 'deactivate_teacher' or explain what 'active' status means in this domain.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance provided on when to use this tool versus alternatives, prerequisites (e.g., teacher must exist), or when not to use it. The agent must infer usage patterns solely from the tool name.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
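
The gaps flagged above (state semantics, the deactivate_teacher sibling, idempotency) could be closed inside the tool definition itself. A hypothetical rewrite, sketched as a Python dict in MCP tool-definition shape — the tool name and all wording are illustrative assumptions, not the server's actual text:

```python
# Hypothetical fuller definition for the teacher-activation tool.
# Name, description wording, and behavior claims are assumptions for illustration.
activate_teacher = {
    "name": "activate_teacher",
    "description": (
        "Mark a teacher as active so they can be scheduled and enrolled. "
        "Reverses deactivate_teacher; calling this on an already-active "
        "teacher is a no-op (idempotent). Fails if no teacher with the "
        "given ID exists."
    ),
    "inputSchema": {
        "type": "object",
        "properties": {
            "id": {"type": "integer", "description": "ID of the teacher"},
        },
        "required": ["id"],
    },
}
```

A description of this length still scores well on conciseness while answering the state-semantics, error-condition, and idempotency questions raised above.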

  • Behavior2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    No annotations are provided, so the description must carry the full burden. It fails to disclose whether approval is reversible, what side effects occur (e.g., triggering invoicing or notifications), or idempotency behavior for repeated calls.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness5/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    Extremely concise at three words with no redundant or wasted text. Every word earns its place, though extreme brevity contributes to under-specification in other dimensions.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    For a workflow state-transition tool with siblings like deny_order and cancel_order, the description is incomplete. It lacks explanation of the order lifecycle, valid state transitions, or the business logic consequences of approval.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    Schema description coverage is 100% ('ID of the order'), so the baseline applies. The description adds no semantic details about the parameter beyond the schema, but no compensation is needed given the complete schema coverage.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose4/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    States a specific verb (approve) and resource (order) clearly. However, it does not distinguish from sibling tools like deny_order or cancel_order, which are part of the same order workflow.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    Provides no guidance on when to use this tool versus alternatives (e.g., deny_order), nor does it mention prerequisites such as order state requirements or authorization levels needed to approve.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
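
The missing "use X instead of Y when Z" guidance can live directly in the description string. A hypothetical example — the sibling names come from the review above, but the order lifecycle described here is an assumption:

```python
# Hypothetical description adding lifecycle and usage guidance for approve_order.
# The 'pending'/'approved' state names are assumptions for illustration.
approve_order_description = (
    "Approve a pending order, moving it to the 'approved' state. "
    "Only valid for orders in the 'pending' state. "
    "Use deny_order to reject a pending order instead, or cancel_order "
    "to withdraw an order regardless of its current state."
)
```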

  • Behavior2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With no annotations provided, the description carries the full burden of behavioral disclosure. While 'Awards' implies a write operation, the description fails to specify side effects (e.g., email notifications, PDF generation), whether the operation is idempotent, or what happens if a certificate already exists for the enrollment.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness5/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description is a single, efficient sentence with zero redundancy. It front-loads the action and target without wasting tokens on tautology, making it appropriately sized for the tool's complexity.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    For a mutation tool with no output schema or annotations, the description is insufficient. It lacks disclosure of success indicators, error conditions (e.g., invalid template ID), or whether awarding is reversible. The minimal description leaves significant gaps in the agent's understanding of the operation's consequences.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    Schema description coverage is 100%, with both 'id' and 'certificate_template_id' adequately documented in the input schema. The description adds no additional context about parameter relationships or valid value ranges, meeting the baseline expectation when the schema is self-documenting.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose4/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description uses a specific verb ('Awards') and clearly identifies the resource (certificate) and target entity (program enrollment). It implicitly distinguishes from sibling tools like 'delete_certificate_from_program_enrollment' and 'get_certificate' through its action verb, though it could explicitly clarify the create-vs-update semantics if awarding multiple times is allowed.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    The description provides no guidance on when to use this tool versus alternatives (e.g., when to award vs. delete), nor does it mention prerequisites such as enrollment completion status or certificate template availability. It assumes the agent knows the business logic for awarding certificates.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
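
The behavioral disclosure this card asks for can be folded into one extra sentence. A hypothetical sketch — the repeat-call behavior and the undo path named here are assumptions, not documented server behavior:

```python
# Hypothetical disclosure-bearing description for the certificate-award tool.
# Repeat-call semantics and the undo tool are assumptions for illustration.
award_certificate_description = (
    "Award a certificate to a program enrollment using the given "
    "certificate_template_id. Awarding again for the same enrollment "
    "creates a second certificate record; use "
    "delete_certificate_from_program_enrollment to undo an award."
)
```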

  • Behavior2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With no annotations provided, the description carries full disclosure burden but reveals nothing about idempotency (can the same teacher be assigned twice?), side effects, error conditions, or the return value. The term 'Assign' implies creation, but behavioral specifics for this mutation are absent.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness4/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The single sentence is front-loaded with the action verb and contains no redundant words. While extremely brief, it efficiently communicates the core function without filler, though additional sentences would improve completeness.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    For a creation tool with 4 parameters, no annotations, and no output schema, the description is insufficient. It lacks critical context such as prerequisite conditions, conflict handling (if teacher already assigned), and whether the operation is reversible (though 'delete_planning_attendee' exists as a sibling, this relationship isn't noted).

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    Schema description coverage is 100%, establishing a baseline of 3. The description adds marginal value by mapping 'attendable' to 'meeting or planning event', but does not explain the optional 'teacher_role_id' parameter or provide usage examples for the polymorphic attendable fields beyond what the schema already documents.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose4/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description uses specific verbs ('Assign') and resources ('teacher', 'meeting or planning event') that clearly map to the tool name and schema. However, it fails to distinguish from the sibling tool 'create_planning_required_teacher_group_attendee', leaving ambiguity about individual vs. group attendee creation.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance provided on when to use this tool versus alternatives like 'create_planning_required_teacher_group_attendee', nor any mention of prerequisites (e.g., whether the meeting/event must exist beforehand). The description offers no 'when-not' exclusions or workflow context.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    No annotations are provided, so the description must carry the full burden of behavioral disclosure. While 'Assign' implies a write operation, the description fails to clarify if this action is reversible (how to unassign), idempotent (safe to call twice), or what side effects occur (e.g., validation constraints on the teacher or group).

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness4/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The single-sentence description is efficiently structured with no redundant words. However, given the lack of annotations and output schema, the extreme brevity leaves critical behavioral and contextual information unstated.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    As a creation tool with no output schema and no annotations, the description should explain the domain concept of 'required teacher group' and what the assignment enables (e.g., scheduling constraints, eligibility). It currently provides only the minimal operation label without functional context.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With 100% schema description coverage, the schema already documents both parameters clearly ('Unique identifier of the teacher to assign', 'Unique identifier of the required teacher group to satisfy'). The description reinforces the assignment concept but does not add syntax details, format constraints, or examples beyond what the schema provides.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose4/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description uses a specific verb ('Assign') and identifies both resources involved (teacher and required teacher group), clearly stating what the tool accomplishes. However, it does not differentiate from similar sibling tools like 'create_planning_attendee' or 'create_teacher_enrollment'.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    The description provides no guidance on when to use this tool versus alternatives. Given siblings like 'create_planning_attendee' and 'create_teacher_enrollment', the agent needs explicit criteria to distinguish between general attendees, teacher enrollments, and required teacher group assignments.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    No annotations are provided, so the description must carry the full burden of behavioral disclosure. It states the action ('Enroll') but does not clarify if the operation is idempotent, what data is returned (no output schema exists), or what side effects occur (e.g., notifications sent, validation rules).

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness4/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description consists of a single efficient sentence with no redundant words. However, given the presence of similarly-named sibling tools and the mutation nature of the operation, it may be overly terse rather than appropriately concise.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    As a mutation tool with no annotations and no output schema, the description should explain the operation's effects and return behavior. It lacks information about what constitutes a successful enrollment, error conditions, or the relationship to the optional 'teacher_role_id' parameter.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With 100% schema description coverage, the input parameters are fully documented in the schema itself. The description does not add syntactic details or semantic context beyond the schema, meeting the baseline expectation for high-coverage schemas.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose4/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description uses a specific verb ('Enroll') with clear resource ('teacher') and target ('planned_course'), establishing what the tool does. However, it fails to differentiate from the sibling tool 'create_teacher_enrollment_by_planned_course_id', which appears to offer similar functionality.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    The description provides no guidance on when to use this tool versus alternatives (particularly 'create_teacher_enrollment_by_planned_course_id'), nor does it mention prerequisites such as whether the teacher or planned course must exist beforehand, or required permissions.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
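
Disambiguating the two overlapping enrollment tools takes only a sentence each. A hypothetical sketch — the distinction drawn here (payload vs. path ID) is an assumption about how the two endpoints differ:

```python
# Hypothetical disambiguation text for the two overlapping enrollment tools.
# The payload-vs-path distinction is an assumption for illustration.
descriptions = {
    "create_teacher_enrollment": (
        "Enroll a teacher in a planned course by passing the planned course "
        "reference in the request body. Prefer this when building a full "
        "enrollment payload."
    ),
    "create_teacher_enrollment_by_planned_course_id": (
        "Enroll a teacher in the planned course identified by a "
        "planned_course_id path parameter. Prefer this when only the "
        "course ID is known."
    ),
}
```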

  • Behavior2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    While the description does not contradict the annotations (destructiveHint=true, idempotentHint=true, readOnlyHint=false), it adds no behavioral context beyond what the annotations already provide. It fails to clarify whether deletion is permanent, if recovery is possible, or what side effects occur.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness3/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely brief ('Delete a comment.'), which prevents bloat, but the single sentence merely restates the function name without adding actionable detail. It is front-loaded yet carries almost no information beyond the name itself.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness3/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given the simple single-parameter structure and complete schema coverage, the description meets minimum viability. However, as a destructive operation, it lacks important context about irreversibility, cascading effects, or authorization requirements that would make it fully complete.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With 100% schema description coverage ('ID of the comment to delete'), the schema adequately documents the parameter. The description adds no additional semantic meaning regarding the ID format or constraints, warranting the baseline score for high-coverage schemas.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose4/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description states a clear verb ('Delete') and resource ('comment'), accurately describing the operation. However, it does not differentiate from sibling delete operations (e.g., delete_task, delete_grade) beyond the resource name inherent in the function name itself.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    The description provides no guidance on when to use this tool versus alternatives like update_comment (for editing instead of removing), nor does it mention prerequisites such as ownership requirements or administrative permissions needed to delete comments.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
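
The annotations referenced throughout these cards (readOnlyHint, destructiveHint, idempotentHint) are standard MCP tool annotations. A minimal sketch of how a destructive delete can pair those annotations with a disclosure-bearing description — the wording and permanence claim are hypothetical:

```python
# Hypothetical delete_comment definition pairing MCP annotations with
# an explicit disclosure of permanence. Wording is illustrative only.
delete_comment = {
    "name": "delete_comment",
    "description": (
        "Permanently delete a comment; the deletion cannot be undone. "
        "Repeating the call for the same ID has no further effect and "
        "returns a not-found error."
    ),
    "annotations": {
        "readOnlyHint": False,
        "destructiveHint": True,
        "idempotentHint": True,
    },
}
```

The description then carries the consequences (permanence, repeat behavior) that the structured hints alone cannot express.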

  • Behavior2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    While annotations declare the operation as destructive and idempotent, the description adds no behavioral context beyond the verb 'Delete'. It fails to disclose what happens to materials contained within the group (cascade delete vs. orphan), whether the deletion is permanent, or any side effects.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness4/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise with no redundant words or filler. However, it borders on under-specification: it could add crucial context (like cascade behavior) with only a few additional words without sacrificing clarity.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness3/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    For a single-parameter destructive operation, the definition is minimally adequate given the rich schema and annotations. However, it lacks domain context regarding sibling tools and the implications of deletion, which are important for an AI agent to avoid destructive errors.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With 100% schema description coverage for the 'id' parameter, the baseline is met. However, the description contributes no additional semantic information about the parameter (e.g., where to obtain the ID, format constraints beyond the schema) or how it identifies the specific group.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose4/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description uses a clear verb ('Delete') and identifies the resource ('material group'), making the basic purpose unambiguous. However, it does not differentiate from sibling tools like 'delete_material' or explain what constitutes a 'material group' in this domain.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    The description provides no guidance on when to use this tool versus alternatives (e.g., 'delete_material' or 'update_material_group' to empty a group), nor does it mention prerequisites such as permissions or whether the group must be empty before deletion.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    While annotations correctly flag the operation as destructive and idempotent, the description adds no behavioral context about what happens to existing records referencing this option, whether deletion is permanent, or side effects beyond what the annotations state.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness4/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

The single-sentence description is efficiently structured with the action verb front-loaded. However, the extreme brevity amounts to under-specification rather than true conciseness: every word present is necessary, but too few words are present.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    For a destructive operation with three required parameters and no output schema, the description is inadequate. It fails to explain the hierarchical relationship between object_type, field_slug, and id, or to describe the idempotency behavior implied by the annotation.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With 100% schema description coverage, the baseline is 3. The description does not add parameter semantics, but it doesn't need to compensate. Note: The schema confusingly uses identical descriptions ('ID of the parent resource') for both object_type and field_slug, which the description does not clarify.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose4/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description uses a clear verb ('Delete') and identifies the resource ('option from custom field'), distinguishing it from siblings like 'add_option_to_custom_field' or 'update_option_of_custom_field'. However, it lacks specificity about what 'option' means (e.g., dropdown value) and omits scope details.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    The description provides no guidance on when to use this tool versus alternatives like 'update_option_of_custom_field', nor does it mention prerequisites (e.g., whether the option must be unused) or consequences of deletion.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior3/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    Annotations already declare readOnlyHint=true and idempotentHint=true, so the description's burden is lower. The description confirms the read-only nature with 'Get', but fails to disclose pagination behavior or typical result set sizes despite having cursor-based pagination parameters.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness4/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description is extremely concise (three words) with no redundant or filler text. However, it may be overly terse given the presence of pagination and filtering capabilities that could benefit from brief mention.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness3/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given the simple parameter structure (3 optional primitives), good annotations, and high schema coverage, the description is minimally adequate. However, it lacks mention of pagination mechanics or filtering behavior, which would be helpful given the lack of an output schema.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With 100% schema description coverage, the schema fully documents cursor, per_page, and meeting_id. The description adds no additional semantic meaning, but meets the baseline expectation for high-coverage schemas. The word 'all' slightly conflicts with the filtering capability implied by meeting_id.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose3/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description states the verb ('Get') and resource ('attendance records'), but the word 'all' is misleading given the meeting_id filter parameter exists. It does not distinguish from the sibling tool 'set_attendance' or clarify whether this retrieves records for all meetings or a specific one.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    The description provides no guidance on when to use this tool versus alternatives, nor does it mention the pagination parameters (cursor, per_page) or when to apply the meeting_id filter versus retrieving unfiltered results.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
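
The unmentioned pagination and filter parameters can be surfaced in one or two sentences. A hypothetical sketch — the cursor semantics described here are assumed from the parameter names, not confirmed behavior:

```python
# Hypothetical description surfacing the cursor/per_page/meeting_id parameters
# the review says go unmentioned. Parameter semantics are assumptions.
get_attendances_description = (
    "Get attendance records, paginated. Pass meeting_id to restrict "
    "results to a single meeting, or omit it to list records across all "
    "meetings. Results are returned per_page at a time; pass the cursor "
    "from the previous response to fetch the next page."
)
```

This also resolves the misleading 'all' by stating exactly when the result set is unfiltered.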

  • Behavior3/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    Annotations already declare the operation as read-only, idempotent, and non-destructive. The description adds no further behavioral context, such as error handling when the ID is not found, caching behavior, or clarification regarding the 'slug' terminology mismatch with the integer 'id' parameter.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness4/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description is a single sentence and appropriately brief for a simple retrieval tool. It is front-loaded with the action verb, though the content provides minimal additive value beyond the tool name itself.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness3/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    For a single-parameter read operation with comprehensive annotations and no output schema, the description is minimally sufficient. However, the unresolved discrepancy between the 'slug' terminology in the name/description and the 'id' parameter in the schema leaves a notable gap in contextual clarity.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With 100% schema description coverage, the baseline is established. However, the description refers to an 'object slug' while the input schema accepts an integer 'id', creating semantic confusion without clarifying whether these terms are interchangeable in this context or which identifier format is actually expected.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose3/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description states the specific action ('Get') and resource ('custom object'), but it is essentially a tautology of the tool name ('get_custom_object_by_object_slug'). It fails to distinguish from the sibling tool 'get_custom_objects' (plural) or clarify what constitutes an 'object slug'.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance is provided on when to use this single-object retrieval tool versus the sibling 'get_custom_objects' (likely a list operation) or 'get_custom_record'. There are no exclusions, prerequisites, or alternative suggestions.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 3/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    The description aligns with annotations (read-only operation) and implies pagination scope with the word 'all', but adds no information about return format, rate limits, or what data the records contain.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 4/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    Extremely concise single sentence with verb-fronted structure. While efficient, it borders on under-specification given the domain ambiguity of 'edition description sections'.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 3/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Adequate for a simple paginated list endpoint with strong annotations, but lacks explanation of the return data structure or the relationship between description sections and program editions.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With 100% schema description coverage, the schema adequately documents cursor and per_page parameters. The description adds no additional parameter context, meeting the baseline expectation.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 3/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    States the verb (Get) and resource (edition description section records) clearly, but fails to distinguish from similar sibling tools like get_elements_of_program_edition or explain what constitutes a 'description section' versus other edition-related data.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    Provides no guidance on when to use this tool versus alternatives such as get_program_edition or get_elements_of_program_edition, nor does it mention prerequisites like needing a specific edition ID.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 3/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    Annotations already declare readOnlyHint=true, idempotentHint=true, and destructiveHint=false, covering the safety profile. The description adds no further behavioral context (e.g., error handling when ID is missing, what fields are returned, caching behavior), but does not contradict the annotations.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
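
For reference, the safety profile credited above is typically declared as an MCP annotations object alongside the tool. A sketch of what such a declaration looks like for a read-only list endpoint (this is the standard MCP annotation shape, not this server's actual source):

```python
# Sketch of MCP tool annotations for a read-only, idempotent list endpoint.
annotations = {
    "readOnlyHint": True,      # calling the tool does not modify any state
    "idempotentHint": True,    # repeat calls with the same args have the same effect
    "destructiveHint": False,  # no data is deleted or overwritten
    "openWorldHint": False,    # operates only on the server's own records
}

# With the safety profile carried by annotations, the description is free
# to cover what annotations cannot: return shape, error cases, pagination.
assert annotations["readOnlyHint"] and not annotations["destructiveHint"]
```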

Conciseness 4/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    Extremely brief at four words with no redundancy. While efficient, it borders on under-specification rather than optimal conciseness given the lack of sibling differentiation.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 3/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    For a simple read operation with one parameter and good annotations, the description is minimally adequate. However, it fails to mention the relationship to 'get_enrollments' or describe what constitutes an 'enrollment record' in this domain.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With 100% schema description coverage ('ID of the enrollment to retrieve'), the baseline is 3. The description adds no parameter details, but the schema fully documents the single 'id' parameter.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 3/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description states the basic action ('Get') and resource ('enrollment record'), but offers no differentiation from the sibling tool 'get_enrollments' (plural). It implies a single record retrieval but doesn't explicitly clarify this is for fetching by ID versus listing/filtering.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance provided on when to use this singular fetch versus 'get_enrollments' for bulk retrieval, nor any mention of prerequisites like needing a valid enrollment ID. The description provides zero usage context.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With annotations already declaring readOnlyHint=true and destructiveHint=false, the description carries a lower burden but still fails to add meaningful behavioral context such as error handling, authentication requirements, or what the returned grade record contains.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 4/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description is extremely brief at four words with no redundant information. While appropriately concise for a simple single-parameter tool, its brevity contributes to the lack of contextual depth.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 3/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given the tool's simplicity (one required parameter, read-only operation) and the presence of safety annotations, the description is minimally serviceable. However, it lacks return value documentation which is important given the absence of an output schema.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    Schema description coverage is 100% (the 'id' parameter is fully documented in the schema), establishing a baseline score of 3. The description adds no additional parameter semantics beyond what the schema provides.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 4/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description states a specific verb ('Get') and resource ('grade record'), clearly identifying this as a retrieval operation. It adequately distinguishes from siblings like create_grade, update_grade, and delete_grade through the CRUD verb, though it lacks additional scope details (e.g., whether it retrieves a single record vs. list).

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    The description provides no guidance on when to use this tool versus alternatives, nor does it mention prerequisites like requiring a valid grade ID or error conditions (e.g., not found).

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With no annotations provided, the description carries the full burden of behavioral disclosure but provides minimal information. It does not mention that results are paginated (despite cursor/per_page parameters existing), does not describe error behavior for invalid invoice IDs, and omits any mention of rate limits or permissions required.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 4/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description is appropriately front-loaded with the verb 'Get' and contains no wasted words. However, it may be overly concise given the lack of annotations and output schema—an additional sentence explaining pagination behavior or return structure would improve utility without sacrificing clarity.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 3/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    For a list retrieval tool with 100% schema coverage, the description is minimally viable but has clear gaps. It fails to mention the paginated nature of results (implied by parameters but not stated), does not describe the return value structure (list of payment objects), and lacks differentiation from related payment tools in the sibling set.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    The input schema has 100% description coverage (invoice_id, cursor, per_page all documented), establishing a baseline score of 3. The description adds no additional parameter context (e.g., explaining that cursor is for pagination or that per_page defaults to 25), but none is needed given the comprehensive schema.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
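
The cursor/per_page mechanics referenced above follow a common pattern: call the tool, read the next cursor from the response, and repeat until no cursor is returned. A minimal sketch against a hypothetical `call_tool` function (the tool name and response field names are assumptions, since this server documents neither):

```python
def fetch_all_payments(call_tool, invoice_id, per_page=25):
    """Collect every page of an invoice's payments via cursor pagination."""
    records, cursor = [], None
    while True:
        args = {"invoice_id": invoice_id, "per_page": per_page}
        if cursor is not None:
            args["cursor"] = cursor
        page = call_tool("get_payments_of_invoice", args)
        records.extend(page["records"])
        cursor = page.get("next_cursor")  # absent or None on the last page
        if not cursor:
            return records
```

Stating this loop (or even just the phrase "paginated; pass the returned cursor to fetch the next page") in the description would spare the agent from inferring it from parameter names.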

Purpose 4/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description provides a clear verb ('Get') and resource ('payment records of an invoice'), accurately describing the retrieval operation. However, it fails to differentiate from siblings like 'get_payment' (singular) or contrast with 'create_invoice_payment_by_invoice_id', which would help the agent select the correct tool in a workflow.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No usage guidance is provided. The description does not indicate when to use this tool versus 'get_payment', nor does it mention prerequisites (e.g., needing a valid invoice_id) or suggest using pagination parameters for large result sets.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 3/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    Annotations already indicate read-only, idempotent, non-destructive behavior. The description adds the 'all' scope indicating bulk retrieval, but omits pagination mechanics, default page sizes, or what material groups represent in the domain.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 4/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

Extremely brief at four words with no filler. While efficient, it borders on under-specification for a tool with 100+ siblings. No structural issues or wasted sentences.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 3/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Adequate for a simple list operation given the schema quality and annotations, but lacks output format description (no output schema exists to compensate) and fails to explain the relationship between material groups and other entities.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    Schema description coverage is 100% (cursor and per_page fully documented), establishing baseline 3. The description adds no parameter context, usage examples, or constraints beyond what the schema provides.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 3/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description states the basic action (Get) and resource (material group records) with scope (all), but fails to distinguish from sibling 'get_material_group' (singular). It does not clarify that this tool requires no identifier and returns a collection, while the singular variant likely retrieves a specific record.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance provided on when to use this tool versus 'get_material_group' or other material-related operations. No mention of pagination strategy or when to stop iterating through cursors.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

While annotations declare readOnlyHint=true and idempotentHint=true, the description adds no behavioral context beyond the tautology. It does not disclose that results are paginated (despite cursor/per_page parameters), nor does it explain pagination mechanics or clarify what 'all' encompasses (historical vs. upcoming).

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 4/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The single sentence is front-loaded with the action verb and wastes no words. However, it borders on under-specification given the pagination complexity, leaving the agent to discover cursor behavior solely from the schema.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    For a paginated list operation with sorting capabilities and no output schema, the description is insufficient. It fails to prepare the agent for pagination handling, result set limits, or the nature of the returned meeting records.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With 100% schema description coverage, the baseline score applies. The description does not mention any parameters or add semantic context about the planned_course_id relationship, but the schema fully documents all four parameters including the sort enum values.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 4/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description clearly states the verb (Get) and resource (meeting records) with scope (of a planned course). However, it does not explicitly distinguish from sibling `get_meeting` (singular), which likely retrieves a specific meeting by ID rather than listing all meetings for a course.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    The description provides no guidance on when to use this tool versus alternatives. It fails to mention that this is for listing meetings when you have a planned_course_id, or that `get_meeting` should be used when you have a specific meeting ID.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
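
The "use X instead of Y when Z" pattern called for above can be applied directly to the meeting tools. A sketch of a description that carries that guidance in three sentences (the wording and the referenced sibling names are illustrative):

```python
# Illustrative tool description with explicit selection guidance for agents:
# front-loaded verb, stated prerequisite, and a named alternative.
description = (
    "List all meeting records of a planned course, paginated by cursor. "
    "Requires a planned_course_id (obtain one via get_planned_courses_by_course_id). "
    "Use get_meeting instead when you already have a specific meeting ID."
)

assert description.startswith("List")  # verb remains front-loaded
```

Guidance of this shape costs roughly twenty extra tokens while closing both the Usage Guidelines and sibling-differentiation gaps.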

  • Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With no annotations provided, the description carries full burden but offers minimal behavioral disclosure. It does not clarify pagination behavior (despite cursor/per_page parameters), cache policies, rate limits, or whether 'all' means global scope or user-scoped data. The 'Get' implies read-only but safety is not explicit.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 4/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence of nine words with no redundancy. Efficiently structured, though arguably too minimal given the lack of annotations and potential for confusion with sibling tools.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Inadequate given the context: no output schema, no annotations, and confusingly similar sibling `get_payment_options`. The description should explain what distinguishes a 'payment method' from a 'payment option' and what fields the records contain.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    Schema description coverage is 100%, documenting cursor and per_page sufficiently. The description adds no parameter-specific context, but baseline 3 is appropriate when the schema carries the full load.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 4/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description clearly states the verb (Get) and resource (payment method records) with scope (available for customers). However, it fails to distinguish from the sibling tool `get_payment_options`, which could cause selection ambiguity since these terms are often synonymous.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance provided on when to use this tool versus siblings like `get_payment` (likely for transactions) or `get_payment_options`. No mention of prerequisites or specific use cases where this is preferred.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With no annotations provided, the description carries the full burden of behavioral disclosure. While 'Get' implies a safe read operation, the description fails to mention pagination (cursor, per_page), extensive filtering capabilities (type, status, date ranges), or sorting behavior—all critical behavioral traits evident in the 12-parameter schema.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 4/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description is a single, efficient sentence with no filler words or redundancy. It is appropriately front-loaded with the action and resource. While extremely minimal for a 12-parameter endpoint, it avoids the verbosity that would harm its conciseness score.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given the high complexity (12 parameters with rich filtering, pagination, and sorting) and the absence of both output schema and annotations, the description is insufficient. It fails to indicate that this is a paginated list endpoint or that it supports filtering by status, type, and date ranges—information essential for correct invocation.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    Schema description coverage is 100%, so the schema fully documents all 12 parameters including course_id, cursor, filters, and sort options. The description mentions 'single course' which aligns with the required course_id parameter, but adds no additional semantic context (e.g., date formats, filter combinations) beyond what the schema already provides, warranting the baseline score.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 4/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description clearly states the verb (Get), resource (planned course records), and scope (of a single course). However, while the word 'all' implicitly distinguishes it from the sibling 'get_planned_courses_by_id_and_course_id', it does not explicitly clarify when to use this versus that specific-record alternative.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    The description provides no explicit guidance on when to use this tool versus alternatives (like the sibling get_planned_courses_by_id_and_course_id), nor does it mention prerequisites such as needing a valid course_id from get_courses. Usage must be inferred entirely from the parameter names and tool name.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    No annotations are provided, so the description must carry the full burden of behavioral disclosure. While 'Get' implies a read-only operation, the description does not specify error handling (e.g., 404 behavior), required permissions, or rate limiting. It fails to disclose whether the operation is safe or idempotent beyond the implicit verb choice.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 4/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description consists of a single, efficient nine-word sentence that immediately states the operation and resource type. There is no redundant or wasted language, though the brevity comes at the cost of omitting helpful contextual details.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 3/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    For a two-parameter retrieval tool without output schema or annotations, the description meets minimum viability by identifying the core operation and resource. However, it lacks critical contextual details regarding sibling differentiation and behavioral specifics (error states, permissions) that would be necessary for robust agent operation given the presence of similar tools.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    The input schema features 100% description coverage for both parameters (`course_id` and `id`), with the schema itself documenting that one is the parent resource ID and the other the specific record ID. The description adds no additional parameter semantics, but the schema adequately covers requirements, keeping this at baseline.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 4/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description uses the specific verb 'Get' and identifies the resource as a 'planned course record'. It clarifies scope by stating 'single course', implying a singular retrieval operation that distinguishes it from potential list operations like `get_planned_courses_by_course_id`. However, it does not explicitly name sibling alternatives or clarify when to use this specific endpoint versus the list variant.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    The description provides no guidance on when to select this tool over similar retrieval options such as `get_planned_courses_by_course_id`. It lacks prerequisites, error conditions, or contextual triggers for usage. No alternatives or exclusions are mentioned.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    No annotations are provided, so the description carries the full burden of behavioral disclosure. It fails to mention this is a read-only operation, does not describe the pagination behavior (despite cursor/per_page parameters), and omits error handling details like what happens if the planning event ID doesn't exist.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 4/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description is a single, efficient sentence with the verb front-loaded. It contains no redundant or wasteful text, though it borders on being too minimal for the complexity of a paginated sub-resource retrieval.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 3/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given the 100% schema coverage and simple parameter structure, the description is minimally adequate. However, for a tool with no output schema and no annotations, it could improve by clarifying that this returns a paginated list of materials associated with the specified event.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With 100% schema description coverage, the schema adequately documents all three parameters (id, cursor, per_page). The description adds no additional parameter context, meeting the baseline expectation when the schema is self-documenting.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 4/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description clearly states the verb (Get), resource (planning materials), and scope (of a planning event). It effectively distinguishes this tool from siblings like 'get_materials' (general materials) and 'get_planning_event' (the event itself) by specifying the nested relationship.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    The description provides no guidance on when to use this tool versus alternatives such as 'get_materials', nor does it mention prerequisites like needing a valid planning event ID or when pagination is necessary.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
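    The "use X instead of Y when Z" pattern is cheap to apply in a description string. The wording below is a hypothetical rewrite for the tool under review, not the server's actual text:

    ```python
    # Hedged example of explicit usage guidance inside a tool description.
    description = (
        "Get the planning materials of a planning event. "
        "Use this instead of get_materials when you only want materials "
        "linked to one planning event; requires a valid planning event ID."
    )
    ```

    One sentence of guidance like this is usually enough to disambiguate sibling tools.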

  • Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With no annotations provided, the description carries the full burden of behavioral disclosure. It only indicates a write operation via 'Adds' but fails to specify side effects (e.g., whether existing elements are replaced or appended), return values, permission requirements, or rate limits.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 5/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description is a single, efficient sentence with no redundant words. It is appropriately front-loaded with the action verb and conveys the core purpose without waste.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given the complex polymorphic input schema with nested object structures and two distinct element types, the description is inadequate. It lacks explanation of the element types, nesting behavior, and output expectations (no output schema provided), leaving significant gaps for an agent attempting to construct valid inputs.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With 50% schema coverage, the description partially compensates by referencing 'set of elements', which maps to the elements array parameter. However, it does not explain the polymorphic nature of elements (CourseElement vs BlockElement), the nested structure of BlockElement, or the relationship between course_id and planned_course_id.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
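    A polymorphic elements payload of the kind described above might look like the sketch below. The discriminator and field names are assumptions for illustration; only the two element kinds (CourseElement vs BlockElement) and the nesting of BlockElement come from the review.

    ```python
    # Illustrative sketch of a polymorphic 'elements' array: a flat course
    # element alongside a block element that nests further elements.
    course_element = {"type": "course", "course_id": 42}
    block_element = {
        "type": "block",
        "name": "Semester 1",
        "elements": [
            {"type": "course", "course_id": 42},
            {"type": "course", "planned_course_id": 7},
        ],
    }
    payload = {"elements": [course_element, block_element]}
    ```

    Spelling out even one example like this in the description would close most of the gaps the review identifies.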

    Purpose 4/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description clearly states the action (Adds) and target (set of elements to a program edition). It implies batch operations via 'set of elements', distinguishing it from singular 'get' or 'create' siblings. However, it does not explicitly clarify the distinction between adding elements to an edition versus creating program element definitions (create_program_element).

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    The description provides no guidance on when to use this tool versus alternatives like create_program_element or get_elements_of_program_edition. It omits prerequisites (e.g., requiring an existing program edition) and does not indicate idempotency or error conditions.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    No annotations are provided, so the description must carry full behavioral disclosure. While 'Get all' implies a list operation, the description fails to mention pagination behavior (despite cursor/per_page parameters), read-only safety, or error handling (e.g., invalid planned_course_id). This leaves significant behavioral gaps.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 4/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description is a single, efficient sentence with no redundancy. However, given the complete absence of annotations and output schema, it may be overly terse—one additional sentence explaining pagination or return behavior would improve utility without sacrificing clarity.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 3/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    With no output schema and no annotations, the description is minimally viable for a 3-parameter tool. It identifies the resource type returned (teacher enrollments) but omits list behavior details, pagination guidance, or data scope that would help an agent handle the response correctly.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    The input schema has 100% description coverage ('ID of the parent resource', 'Cursor for fetching the next page', etc.), so the baseline is 3. The description mentions 'given planned course' which aligns with the required parameter, but adds no additional semantics about valid ID formats or pagination usage beyond what the schema already provides.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 4/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description uses a specific verb ('Get') with clear resource ('teacher enrollments') and scope ('by planned course'). However, it does not explicitly distinguish from sibling tools like 'get_enrollments' (likely for students) or 'get_teacher' (profiles vs enrollments), which would help the agent select correctly among the 100+ available tools.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    The description provides no guidance on when to use this tool versus alternatives. Given siblings like 'create_teacher_enrollment_by_planned_course_id' and 'get_enrollments', explicit guidance on read vs write scenarios or teacher vs student data would be necessary for a higher score.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With no annotations provided, the description carries the full burden of behavioral disclosure but fails to specify read-only safety, error behavior (what happens if the ID doesn't exist), return format, or whether this triggers any side effects like logging.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 4/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description is extremely brief at four words with no redundancy. While efficient, it borders on underspecification: every word earns its place, but critical information is omitted, and that omission is what rules out a fifth point.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given the absence of annotations and output schema, the description should disclose return structure or error behavior. As a retrieval tool for a specific resource, it fails to indicate whether it returns the full webhook object, a subset, or null/404 on missing IDs.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    The input schema has 100% description coverage ('ID of the webhook to retrieve'), establishing baseline 3. The description 'Get a registered webhook' implies the ID parameter identifies the webhook to retrieve, but adds no additional semantic context about ID provenance or valid ranges beyond the schema's 'exclusiveMinimum: 0'.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
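    The `exclusiveMinimum: 0` constraint cited above is a standard JSON Schema keyword: the value must be strictly greater than zero. A minimal sketch of what that constraint enforces (the validator function is an illustration, not part of the server):

    ```python
    # The id parameter's schema, per the review: a positive integer.
    id_schema = {
        "type": "integer",
        "exclusiveMinimum": 0,
        "description": "ID of the webhook to retrieve",
    }

    def validate_webhook_id(value) -> bool:
        """Return True when value satisfies the schema constraint."""
        return isinstance(value, int) and value > id_schema["exclusiveMinimum"]
    ```

    Note that `exclusiveMinimum` rejects 0 itself, unlike `minimum`.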

    Purpose 4/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description uses a specific verb ('Get') and resource ('webhook'), and the singular form distinguishes it from sibling 'get_webhooks' (plural) while the verb distinguishes it from 'create_webhook', 'update_webhook', and 'delete_webhook'. However, it lacks explicit scope clarification between singular and plural variants.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance provided on when to use this singular retrieval tool versus 'get_webhooks' (list), nor any mention of prerequisites (e.g., needing the webhook ID from a previous list operation) or error conditions.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    No annotations are provided, so the description carries the full burden of behavioral disclosure. While 'Get' implies read-only access, the description fails to disclose pagination behavior (despite cursor/per_page parameters), what data structure is returned (no output schema exists), or whether this endpoint has rate limits.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 5/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description is extremely compact at four words with zero redundancy. It is front-loaded with the action verb and contains no filler text; every word serves the core purpose statement.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    For a tool with 5 parameters and no output schema or annotations, the description is insufficient. It fails to explain the return value format, what defines a 'failed' notification state, or how pagination interacts with the time-range filters.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With 100% schema description coverage, the structured fields already document all 5 parameters adequately. The description adds no additional semantic context (e.g., date format expectations for 'start'/'end'), meeting the baseline expectation for well-documented schemas.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 4/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description uses a clear verb ('Get') and specific resource ('failed webhook notifications'), distinguishing it from sibling tools like 'get_webhook' or 'get_webhooks' by targeting notification delivery failures rather than configuration. However, it lacks domain context explaining what constitutes a 'failed' notification.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    The description provides no guidance on when to use this tool versus alternatives, nor does it mention prerequisites (e.g., needing a valid webhook_id from 'get_webhooks' or 'create_webhook'). There are no exclusions or workflow hints.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With no annotations provided, the description carries the full burden of behavioral disclosure but fails to clarify that results are paginated (despite cursor/per_page parameters), what webhook data is returned, or any rate limiting. The claim 'Get all' is ambiguous regarding pagination behavior.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 4/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The four-word description is appropriately front-loaded with no redundant phrases. However, given the complete absence of annotations and output schema, extreme brevity becomes a liability rather than a virtue.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    For a paginated listing tool with no output schema and no annotations, the description is insufficient. It fails to mention pagination behavior, default page sizes, or the structure of returned webhook objects, leaving critical operational context undocumented.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    Input schema has 100% description coverage for both parameters (cursor and per_page), so the baseline score applies. The description adds no parameter-specific context, but the schema adequately documents the pagination controls.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 4/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description uses the specific resource 'webhooks' and scope modifier 'all' to distinguish from the singular sibling 'get_webhook'. While 'Get' is slightly generic, the phrase clearly identifies this as a list operation for the webhook resource.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance provided on when to use this plural/list endpoint versus the singular 'get_webhook', nor when to use pagination parameters versus fetching all results. The description offers no contextual cues for tool selection.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 3/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    Annotations already establish this is read-only, idempotent, and non-destructive. The description adds 'all' to indicate bulk retrieval behavior, but fails to disclose that the operation is paginated (despite having cursor/per_page parameters) or what happens if the custom field doesn't exist.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
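    The annotations referred to above are MCP behavioral hints attached to a tool alongside its description. A sketch of what a read-only list tool would declare (values illustrative, matching the safety properties the review reports):

    ```python
    # MCP-style tool annotations: structured safety hints an agent can read
    # without parsing the description.
    annotations = {
        "readOnlyHint": True,      # the call does not mutate server state
        "idempotentHint": True,    # repeated calls have the same effect
        "destructiveHint": False,  # nothing is deleted or overwritten
    }
    ```

    With these present, the description only needs to cover what annotations cannot express, such as pagination and error behavior.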

    Conciseness 5/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    Extremely concise at 7 words. The single sentence efficiently communicates the core operation without redundancy or filler content. Every word serves a purpose, and the description is appropriately front-loaded.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 3/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given the presence of annotations covering safety properties and 100% schema coverage, the description is minimally adequate. However, for a 4-parameter tool with pagination support and no output schema, it should mention the pagination behavior and distinguish itself from the singular 'get_option' sibling.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    Schema description coverage is 100%, establishing a baseline of 3. The description adds no parameter-specific context, and the schema descriptions for required parameters 'object_type' and 'field_slug' are confusingly identical ('ID of the parent resource'), which the description does not clarify or disambiguate.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 3/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description states the basic operation ('Get') and resource ('options of a custom field') but fails to differentiate from the sibling tool 'get_option_of_custom_field' (singular). While 'all' implies a list operation, it doesn't clarify when to use this paginated list endpoint versus retrieving a single specific option.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance provided on when to use this tool versus alternatives like 'get_option_of_custom_field' (singular) or mutating siblings ('add_option_to_custom_field', 'update_option_of_custom_field'). No mention of prerequisites or required context for the 'object_type' and 'field_slug' parameters.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 3/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    The description confirms the dual action of creating and sending (not just drafting), which complements the annotations (readOnlyHint: false). However, it does not elaborate on the idempotentHint: true annotation (what makes resending safe), nor does it mention delivery guarantees, rate limits, or failure behaviors that would help an agent understand operational constraints.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 5/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description is a single, efficient sentence of nine words with no filler. It immediately communicates the core action without preamble, making it appropriately front-loaded for quick agent comprehension.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 3/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given the simple 4-parameter schema with complete field documentation and no output schema, the description is minimally adequate. However, for an email-sending operation, it lacks context about return values (confirmation IDs, success indicators) or side effects (notification triggers, audit logging), which would be valuable given the operation's importance.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With 100% schema description coverage, the baseline score applies. The schema comprehensively documents all four parameters including template syntax (curly braces) and JSON escaping requirements for the body field. The description adds no parameter-specific guidance, so it meets but does not exceed the baseline expectation.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
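    The curly-brace template syntax and JSON escaping mentioned above can be sketched as follows. The field names and placeholder names are illustrative assumptions; only the templating and escaping behavior come from the review.

    ```python
    import json

    # Hypothetical email-message parameters with curly-brace placeholders.
    params = {
        "user_id": 17,
        "subject": "Welcome, {first_name}",
        "body": 'Hi {first_name},\nYour course "{course_name}" starts soon.',
    }

    # Curly-brace placeholders behave like str.format fields.
    rendered = params["body"].format(first_name="Ada", course_name="Algebra I")

    # Inside a JSON request, quotes and newlines in the body must be escaped;
    # json.dumps shows the wire form.
    escaped = json.dumps(params["body"])
    ```

    Documenting both the placeholder syntax and the escaping rule in the schema, as this server apparently does, spares the description from restating them.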

    Purpose 4/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description clearly states the tool creates and sends an email message to a user, using specific verbs ('Create and send') and identifying the resource (email message). It aligns with the tool name's implication of targeting a specific user, though it could explicitly mention the user_id parameter to fully distinguish it from potential bulk email tools.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 1/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    The description provides no guidance on when to use this tool versus alternatives, prerequisites (e.g., user existence verification), or when not to use it. It does not mention if this is for transactional emails only or marketing emails, nor does it reference the sibling tools that might handle email-related tasks differently.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    No annotations are provided, so the description carries full disclosure burden. It fails to mention mutation effects, idempotency, error conditions (e.g., duplicate enrollment), or whether the operation is reversible. The word 'Enroll' implies creation but lacks safety context.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 4/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The single-sentence description is efficiently structured and front-loaded with the core action. However, it borders on under-specification given the lack of annotations and presence of sibling alternatives, warranting a slight deduction from perfect conciseness.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    As a mutation tool with no output schema, no annotations, and overlapping sibling functionality (create_teacher_enrollment), the description is incomplete. It omits return value details, error scenarios, and the specific workflow context distinguishing this endpoint from alternatives.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    Schema description coverage is 100%, establishing a baseline of 3. The description maps 'teacher' to teacher_id and 'planned course' to planned_course_id, but adds no syntax details, validation rules, or clarification about the optional teacher_role_id parameter.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 4/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description clearly states the action (enroll), subject (teacher), and target (planned course). However, it does not differentiate from the sibling tool 'create_teacher_enrollment', leaving ambiguity about which creation method to prefer.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 3/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    The description implies usage context (when you need to enroll a teacher to a specific planned course), but provides no explicit guidance on prerequisites (e.g., teacher must exist) or when to use this specific 'by_planned_course_id' variant versus the generic 'create_teacher_enrollment'.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With no annotations provided, the description carries the full burden of behavioral disclosure but only states the basic action without clarifying side effects, reversibility, or impact on existing teacher enrollments. It does not indicate whether this operation affects historical data or merely prevents future assignments.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 5/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The four-word description 'Mark teacher as inactive' is appropriately front-loaded with no redundant or wasteful language. Every word directly contributes to understanding the tool's core function.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 3/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given the tool's simplicity (single parameter, no output schema) and the presence of a clear sibling inverse (`activate_teacher`), the description adequately conveys the basic operation but omits important mutation context such as reversibility confirmation or cascade effects. It meets minimum viability but leaves gaps for an agent determining operational safety.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    The input schema has 100% description coverage for its single `id` parameter, documenting it as 'ID of the teacher'. The description adds no additional semantic context about the parameter, meeting the baseline expectation when schema documentation is complete.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 4/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Mark teacher as inactive' clearly identifies the verb (mark), resource (teacher), and end-state (inactive). While it implicitly contrasts with the sibling tool `activate_teacher` by specifying the inactive state, it does not explicitly differentiate when to use deactivation versus other lifecycle operations.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    The description provides no guidance on when to use this tool versus alternatives, prerequisites such as teacher existence, or potential conflicts with active enrollments. It fails to mention the sibling `activate_teacher` as the reversal mechanism.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With no annotations provided, the description carries the full burden of behavioral disclosure. While the word 'Deletes' implies a destructive operation, the description lacks critical details: whether the deletion is permanent or reversible, if it triggers notifications, required permissions, or side effects on the program enrollment status. For a destructive operation, this omission is significant.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 5/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description is a single, efficient sentence of six words with no redundant information. It is front-loaded with the action verb and immediately identifies the target resource, making it easy to parse quickly.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 3/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given the tool's simplicity (single integer parameter, no output schema, clear CRUD operation), the description provides the minimum viable context. However, it omits behavioral implications typical for deletion operations (e.g., idempotency, error cases if no certificate exists) that would be necessary for robust agent operation without trial and error.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    The input schema has 100% description coverage ('ID of the program enrollment'), so the schema fully documents the parameter. The description adds no additional semantic context about the parameter (e.g., that it represents the enrollment from which to remove the certificate), meeting the baseline expectation when schema coverage is complete.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
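
    To illustrate what "100% schema description coverage" means in these reviews, here is a small sketch; the schema content is a hypothetical reconstruction based on the quoted parameter description, not the server's actual definition:

```python
# Hypothetical JSON-Schema-style input schema: every property carries
# a description, which is the coverage condition the baseline score rewards.
input_schema = {
    "type": "object",
    "properties": {
        "id": {"type": "integer", "description": "ID of the program enrollment"},
    },
    "required": ["id"],
}

def description_coverage(schema: dict) -> float:
    """Fraction of schema properties that carry a non-empty description."""
    props = schema.get("properties", {})
    if not props:
        return 1.0
    documented = sum(1 for p in props.values() if p.get("description"))
    return documented / len(props)
```

    With full coverage the tool description only needs to add intent (why the parameter matters), not structure.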

    Purpose 4/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description uses a specific verb ('Deletes') and clearly identifies the resource ('certificate from a program enrollment'). It implicitly distinguishes from sibling tools like 'award_certificate_to_program_enrollment' and 'cancel_program_enrollment' by explicitly mentioning the certificate-specific nature of the operation, though it does not explicitly name alternatives.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    The description provides no guidance on when to use this tool versus alternatives such as 'award_certificate_to_program_enrollment' (the inverse operation) or 'cancel_program_enrollment' (which affects the enrollment itself). It also does not mention prerequisites, such as whether the enrollment must exist or currently have a certificate attached.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    No annotations are provided, so the description carries the full burden of behavioral disclosure. While 'Remove' implies deletion, the description does not state whether this action is reversible, if notifications are sent to the teacher, or what happens if the teacher is required for the event. It omits critical safety context for a destructive operation.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 5/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description is a single, efficient sentence with no redundant words. It immediately states the action and target, making it appropriately front-loaded for quick comprehension.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 3/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given the single-parameter schema and lack of output schema, the description adequately covers the basic operation. However, for a destructive tool with no annotations, it is incomplete—it should mention irreversibility, potential side effects (e.g., triggering rescheduling), or the domain-specific constraint that this specifically removes teacher attendees.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With 100% schema description coverage ('ID of the planning attendee to delete'), the baseline score applies. The description implies the attendee is a teacher, adding slight semantic context, but does not elaborate on parameter format, valid ranges, or how to obtain the ID. It meets the baseline expectation without adding significant param-specific guidance.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 4/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description uses a specific verb ('Remove') and identifies the resource ('teacher'/'planning event'), clarifying that this targets an attendee association rather than the event itself (distinguishing it from siblings like delete_planning_event). However, it does not explicitly differentiate from delete_meeting or clarify that the attendee must be a teacher.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    The description provides no guidance on when to use this tool versus alternatives like delete_planning_event (which removes the entire event) or cancel_planned_course. It also fails to specify prerequisites, such as needing the specific planning attendee ID rather than the teacher ID or event ID.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 3/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    Annotations already declare readOnlyHint=true and idempotentHint=true, covering safety and idempotency. The description adds the qualifier 'awarded', indicating the tool retrieves issued certificates rather than templates, which provides useful context given the sibling 'award_certificate_to_program_enrollment'. However, it omits details about error handling (e.g., 404 if ID not found) or return structure.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 4/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description is extremely brief (four words) and front-loaded, containing no redundant or wasted language. However, its extreme brevity results in under-specification rather than efficient information density.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given the absence of an output schema, the description should ideally explain what certificate data is returned or clarify the singular vs. plural retrieval distinction. As written, it provides the bare minimum for a simple getter but leaves critical contextual gaps for effective tool invocation.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With 100% schema description coverage ('ID of the certificate to retrieve'), the baseline score applies. The description adds no supplementary parameter semantics (e.g., that the ID refers to an awarded certificate ID specifically), but the schema documentation is sufficient.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 4/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description provides a clear verb ('Get') and resource ('awarded certificate'), establishing the basic operation. However, it fails to explicitly distinguish from the sibling tool 'get_certificates' (plural), leaving ambiguity about when to use the single-ID retrieval versus the list operation.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance is provided on when to use this tool versus alternatives like 'get_certificates' or 'award_certificate_to_program_enrollment'. There are no stated prerequisites, conditions, or exclusion criteria to guide the agent's selection.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
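
    As a hedged illustration of the "use X instead of Y when Z" guidance these reviews call for, the singular/plural sibling pair could be described along these lines; the wording below is invented for the example, not the server's:

```python
# Hypothetical rewritten descriptions that name the sibling tool and
# state the selection condition explicitly.
descriptions = {
    "get_certificate": (
        "Get a single awarded certificate by its ID. Use get_certificates "
        "instead when you do not know the ID and need to list or search."
    ),
    "get_certificates": (
        "Get all awarded certificates as a paginated list. Use "
        "get_certificate instead when you already have a specific ID."
    ),
}

def has_selection_guidance(desc: str, sibling: str) -> bool:
    """A description guides tool selection if it names its sibling
    and tells the agent when to prefer the alternative."""
    return sibling in desc and "instead" in desc.lower()
```

    Both rewritten descriptions pass this toy check; the original four-word description would fail it.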

  • Behavior 3/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    While annotations declare read-only and non-destructive traits, the description adds the domain-specific qualifier 'awarded' (implying issued certificates rather than templates). However, it omits critical behavioral context about pagination despite having cursor/pagination parameters.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 4/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description is extremely concise (four words) with no filler content. While efficient, it sacrifices necessary context about pagination behavior that would have been valuable to front-load.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given the lack of output schema and the presence of pagination parameters, the description is incomplete. It fails to indicate that results are paginated or describe what constitutes an 'awarded certificate' in the return data, leaving operational gaps.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    The input schema has 100% description coverage for both parameters (cursor and per_page), so the description does not need to compensate. It neither repeats nor extends the schema information, meeting the baseline expectation when structured data is complete.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 4/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description states the specific action ('Get') and resource ('awarded certificates'), distinguishing it from the singular 'get_certificate' sibling by using the plural form and implying bulk retrieval. However, it could be improved by explicitly stating this is a list operation.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    The description provides no guidance on when to use this tool versus the singular 'get_certificate' or the 'award_certificate_to_program_enrollment' sibling. It fails to mention pagination requirements or filtering capabilities.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
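
    The cursor/per_page pagination that several of these reviews say goes unmentioned typically works as a traversal loop. The sketch below assumes a response shape (`items`, `next_cursor`) that the descriptions never confirm; `fetch_page` is a stand-in for the actual tool call:

```python
# Sketch of cursor-based pagination: keep requesting pages until the
# server stops returning a continuation cursor. Response keys are assumed.
def fetch_all(fetch_page, per_page=50):
    items, cursor = [], None
    while True:
        page = fetch_page(cursor=cursor, per_page=per_page)
        items.extend(page["items"])
        cursor = page.get("next_cursor")
        if cursor is None:          # no further page: traversal complete
            return items
```

    This is exactly the mechanic an agent cannot infer from a description that merely says "all" records.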

  • Behavior 3/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    Annotations already establish this is read-only, idempotent, and non-destructive. The description adds minimal behavioral context beyond stating the scope ('all' records), but does not mention pagination mechanics, rate limits, or what constitutes a 'course location record'.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 4/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description is extremely concise at five words. While efficient, it borders on underspecified given the lack of output schema and pagination details. It is appropriately front-loaded but could benefit from one additional sentence on pagination.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Without an output schema, the description should explain what a 'course location record' contains or what the return structure looks like. It also omits pagination behavior, which is critical for a 'list all' style endpoint. The description is too minimal for the tool's complexity.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With 100% schema description coverage, the parameters (cursor, per_page) are fully self-documenting. The description adds no supplemental parameter information, but given the complete schema coverage, this meets the baseline expectation.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 4/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description uses a clear verb ('Get') and resource ('course location records'), distinguishing it from the singular 'get_course_location' sibling via the plural form and word 'all'. However, it loses a point for not clarifying whether this retrieves every record in one call or requires pagination to get the full set.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    The description provides no guidance on when to use this tool versus 'get_course_location' (singular) or other location-related tools like 'get_meeting_locations'. It also fails to mention that pagination is required to retrieve the complete dataset.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 3/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    Annotations already declare readOnlyHint=true and idempotentHint=true, so the safety profile is covered. The description adds the 'all' scope indicator, suggesting an unfiltered list operation, but provides no additional context on pagination behavior, rate limits, or error conditions.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 4/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The single sentence is front-loaded with the action verb and contains no filler words. While extremely brief, every word earns its place in describing the core operation.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given the complexity of this API (100+ sibling tools) and lack of output schema, the description is inadequate. It fails to define what constitutes a 'credit category', how it relates to 'credits', or what data structure is returned.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With 100% schema description coverage (cursor and per_page are well-documented in the schema), the baseline score applies. The description adds no parameter-specific semantics, but none are needed given the comprehensive schema.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 4/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description uses a clear verb+resource pattern ('Get all credit category records') that states the tool's basic function. However, it fails to differentiate from siblings like 'get_credits' or 'get_categories', leaving ambiguity about whether this returns categories of credits, credits themselves, or general categories.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance is provided on when to use this tool versus alternatives like 'get_credits' or 'get_categories'. It does not mention pagination requirements or when to prefer this over filtered retrieval methods.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    While annotations correctly declare readOnlyHint=true and idempotentHint=true, the description adds no behavioral context beyond this. It fails to disclose pagination behavior, whether 'all' includes expired/inactive codes, or return value structure.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 5/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description is maximally concise at four words with zero redundancy. It leads with the action verb and immediately identifies the resource, placing critical information first with no filler content.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 3/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    For a simple list retrieval tool with good annotations and optional pagination parameters, the description is minimally adequate. However, it lacks guidance on handling large result sets, pagination traversal, or output format, leaving operational gaps.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With 100% schema description coverage (cursor and per_page are fully documented in the schema), the description does not need to elaborate on parameters. It neither adds nor detracts from the schema definitions, warranting the baseline score.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 4/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description uses a specific verb ('Get') and resource ('discount codes') and includes scope ('all'). However, it does not differentiate from potential sibling tools (e.g., if a singular 'get_discount_code' existed) or clarify filtering limitations as seen in higher-quality descriptions.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    The description provides no guidance on when to use this tool versus alternatives, nor does it mention pagination patterns despite the presence of cursor/per_page parameters. It states only what the tool does, not when to invoke it.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 3/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    The description aligns with annotations (readOnlyHint=true, idempotentHint=true) by using the verb 'Get', but adds no additional behavioral context such as error handling when the ID is not found, return format, or caching behavior. With annotations covering the safety profile, the description meets the minimum threshold.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 4/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The four-word description is maximally concise with no filler, though it borders on underspecification. Every word earns its place, but the extreme brevity sacrifices helpful context.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 3/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    For a single-parameter retrieval tool with complete annotations and no output schema, the description is minimally adequate. However, it lacks guidance on the relationship to sibling tools and does not describe the expected return structure or error states.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    The input schema has 100% description coverage for its single 'id' parameter. The description does not add semantic details about the parameter (e.g., examples, format), so it earns the baseline score for high schema coverage.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 4/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description uses a clear verb ('Get') and resource ('catalog variant record'), making the basic purpose understandable. However, it fails to distinguish from the sibling tool 'get_catalog_variants' (plural), which could help the agent select the correct tool for single-record vs. list retrieval.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance is provided on when to use this tool versus alternatives like 'get_catalog_variants', 'get_catalog_product', or 'update_catalog_variant'. The agent must infer from the parameter schema alone that this retrieves a single record by ID.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 3/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    Annotations already declare readOnlyHint=true and idempotentHint=true, covering the safety profile. The description adds minimal behavioral context—it claims 'all' records but doesn't disclose that the response is paginated (requiring cursor iteration for complete dataset) or describe response structure.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 4/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    A single sentence with no redundant words. However, its extreme brevity leaves gaps regarding pagination behavior and the plural-vs-singular distinction that could be resolved with one additional clause without sacrificing clarity.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 3/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    The description is adequate for a simple list endpoint with good schema coverage, but incomplete given the lack of an output schema and the presence of closely named siblings. It is missing the clarification that this returns a paginated collection rather than literally 'all' records at once.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With 100% schema description coverage ('Cursor for fetching the next page', 'Number of results per page'), the description doesn't need to duplicate parameter documentation. It meets the baseline expectation for high-coverage schemas without adding supplementary context.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 4/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description uses the specific verb 'Get' with the resource 'meeting location records', clearly indicating a retrieval operation. However, it does not explicitly distinguish this list endpoint from the sibling 'get_meeting_location' (singular), which could cause confusion given the similar naming.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance is provided on when to use this tool versus alternatives like 'get_meeting_location', nor does it explain when pagination is necessary. The description mentions 'all' but doesn't clarify that results are actually paginated.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 3/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    The description aligns with annotations (destructiveHint=true, readOnlyHint=false) by using the word 'Remove', but adds no behavioral context beyond what annotations provide. It doesn't mention idempotency (covered by annotation), irreversibility, or whether the user is notified.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 5/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    Extremely efficient at six words with zero redundancy. The single sentence is front-loaded with the action and target, making it immediately scannable despite its brevity.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 3/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    For a destructive two-parameter operation with 100% schema coverage and explicit annotations, the description is minimally adequate. However, it lacks context about what 'authentication' specifically refers to (API key, OAuth, password) and immediate side effects, which would be valuable given the destructive nature.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With 100% schema description coverage ('ID of the authentication to delete', 'ID of the parent resource'), the description meets the baseline. It adds no additional semantic information about the parameters beyond what the schema already provides.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 4/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description clearly states the verb (Remove) and resource (authentication) with scope (from a user). While it doesn't explicitly differentiate from sibling delete operations like delete_affiliation, the specific resource 'authentication' provides sufficient distinction in context.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    The description provides no guidance on when to use this tool versus alternatives (e.g., deactivate vs delete), nor does it mention prerequisites like needing the authentication ID from get_authentications_by_user_id or potential side effects like invalidating active sessions.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 4/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    Annotations declare read-only/idempotent safety, so the description properly focuses on adding the crucial output format detail (base64 encoding) that annotations lack. However, it fails to explain why pagination parameters (cursor, per_page) exist for a singular 'PDF' retrieval.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 5/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    A single efficient sentence with zero waste. It is front-loaded with the action verb and immediately specifies the encoding format (base64) and resource type (PDF), which are critical for correct invocation.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 3/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Without an output schema, the description adequately specifies the return format (base64). However, the presence of pagination parameters alongside the singular 'PDF' description creates an unresolved ambiguity that should have been addressed for a complete definition.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    Schema coverage is 100%, establishing the baseline of 3. The description doesn't add parameter-specific semantics, but the schema adequately documents the 'id' as referring to a parent resource. The pagination parameters remain semantically confusing given the singular tool purpose.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 5/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    Specific verb 'Get' + resource 'invoice PDF' + format 'base64 encoded' clearly defines the scope. Distinguishes from sibling 'get_invoice' (which returns invoice metadata) by specifying PDF file content retrieval.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 3/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    Implies a usage context (when you need the actual file content rather than metadata), but lacks explicit when-to-use guidance comparing it to 'get_invoice' and doesn't mention prerequisite steps like obtaining the invoice ID.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 4/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    Adds crucial behavioral details beyond annotations: explains the state machine transition (concept→open), the side effect of invoice number assignment, and confirms idempotent behavior ('does nothing' if already processed). Does not cover auth requirements or rate limits, but covers the essential mutation behavior clearly.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 5/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    Three tightly constructed sentences with zero waste: action (sentence 1), consequence/value (sentence 2), guard condition/idempotency (sentence 3). Information is front-loaded and logically sequenced.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 5/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    For a single-parameter state transition tool with no output schema, the description is comprehensive. It covers the trigger condition, the transformation, the side effects, and the safety/idempotency behavior. No critical gaps remain given the tool's simplicity.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With 100% schema description coverage ('ID of the invoice'), the schema fully documents the parameter. The description implies the parameter by referencing 'the invoice' but does not add syntax, format, or semantic details beyond what the schema already provides. Baseline 3 is appropriate when schema carries the full load.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 5/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description explicitly states the core action ('Changes the state from concept to open'), the resource (invoice), and the side effect ('assign the actual invoice number'). It clearly distinguishes from siblings like create_invoice or get_invoice by specifying this is a state transition workflow tool.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 4/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    Provides explicit when-not guidance ('If the current state is not concept, this endpoint does nothing'), establishing the prerequisite state for usage. However, it does not explicitly name sibling alternatives like create_invoice for creating the initial concept invoice.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

GitHub Badge

Glama performs regular codebase and documentation scans to:

  • Confirm that the MCP server is working as expected.
  • Confirm that there are no obvious security issues.
  • Evaluate tool definition quality.

Our badge communicates server capabilities, safety, and installation instructions.

Card Badge

eduframe-mcp MCP server

Copy to your README.md:

Score Badge

eduframe-mcp MCP server

Copy to your README.md:

How to claim the server?

If you are the author of the server, you simply need to authenticate using GitHub.

However, if the MCP server belongs to an organization, you need to first add glama.json to the root of your repository.

{
  "$schema": "https://glama.ai/mcp/schemas/server.json",
  "maintainers": [
    "your-github-username"
  ]
}

Then, authenticate using GitHub.

Browse examples.

How to make a release?

A "release" on Glama is not the same as a GitHub release. To create a Glama release:

  1. Claim the server if you haven't already.
  2. Go to the Dockerfile admin page, configure the build spec, and click Deploy.
  3. Once the build test succeeds, click Make Release, enter a version, and publish.

This process allows Glama to run security checks on your server and enables users to deploy it.

How to add a LICENSE?

Please follow the instructions in the GitHub documentation.

Once GitHub recognizes the license, the system will automatically detect it within a few hours.

If the license does not appear on the server after some time, you can manually trigger a new scan using the MCP server admin interface.

How to sync the server with GitHub?

Servers are automatically synced at least once per day, but you can also sync manually at any time to instantly update the server profile.

To manually sync the server, click the "Sync Server" button in the MCP server admin interface.

How is the quality score calculated?

The overall quality score combines two components: Tool Definition Quality (70%) and Server Coherence (30%).

Tool Definition Quality measures how well each tool describes itself to AI agents. Every tool is scored 1–5 across six dimensions: Purpose Clarity (25%), Usage Guidelines (20%), Behavioral Transparency (20%), Parameter Semantics (15%), Conciseness & Structure (10%), and Contextual Completeness (10%). The server-level definition quality score is calculated as 60% mean TDQS + 40% minimum TDQS, so a single poorly described tool pulls the score down.

Server Coherence evaluates how well the tools work together as a set, scoring four dimensions equally: Disambiguation (can agents tell tools apart?), Naming Consistency, Tool Count Appropriateness, and Completeness (are there gaps in the tool surface?).

Tiers are derived from the overall score: A (≥3.5), B (≥3.0), C (≥2.0), D (≥1.0), F (<1.0). B and above is considered passing.
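The weighting above can be sketched as a short calculation. This is an illustrative reconstruction of the arithmetic described in this section, not Glama's actual implementation; the function and dimension names are hypothetical.

```python
def tool_definition_quality(dims):
    """Weighted 1-5 score across the six description dimensions."""
    weights = {
        "purpose": 0.25, "usage": 0.20, "behavior": 0.20,
        "parameters": 0.15, "conciseness": 0.10, "completeness": 0.10,
    }
    return sum(dims[name] * w for name, w in weights.items())

def overall_score(per_tool_tdqs, coherence):
    """Blend 70% definition quality with 30% server coherence.

    Definition quality is 60% mean + 40% minimum of the per-tool
    scores, so one poorly described tool drags the result down.
    """
    mean_tdqs = sum(per_tool_tdqs) / len(per_tool_tdqs)
    min_tdqs = min(per_tool_tdqs)
    definition_quality = 0.6 * mean_tdqs + 0.4 * min_tdqs
    return 0.7 * definition_quality + 0.3 * coherence

def tier(score):
    """Map the overall score to a letter tier; B and above passes."""
    for threshold, letter in [(3.5, "A"), (3.0, "B"), (2.0, "C"), (1.0, "D")]:
        if score >= threshold:
            return letter
    return "F"
```

For example, per-tool scores averaging 2.4 with a minimum of 1.6 (the figures reported for this server) combined with a coherence of 3.5 would give 0.7 × (0.6 × 2.4 + 0.4 × 1.6) + 0.3 × 3.5 ≈ 2.51, tier C. The exact rounding and any additional adjustments Glama applies may differ.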

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/martijnpieters/eduframe-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.