ShotPulled

Ownership verified

Server Details

Connect your espresso routine to your AI assistant. This MCP server allows LLMs to act as a personal coffee coach by accessing your shot logs, grinder settings, and active bean inventory. Use it to automatically calculate extraction adjustments, track remaining coffee supply, and lock in winning recipes directly from your chat session.

Status: Healthy
Last Tested: 2026-07-26 02:39
Transport: Streamable HTTP
URL
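Over Streamable HTTP, an MCP client talks to the server using JSON-RPC 2.0 messages. A tool is invoked with the `tools/call` method, whose params carry the tool `name` and an `arguments` object. The minimal Python sketch below only builds such a request body. It does not send anything, the server URL is not reproduced on this page, and `make_tool_call` is a hypothetical helper name. A real client would also complete the MCP `initialize` handshake before calling tools.

```python
import json
from typing import Any


def make_tool_call(request_id: int, tool_name: str, arguments: dict[str, Any]) -> str:
    """Build a JSON-RPC 2.0 `tools/call` request body (hypothetical helper name)."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# Example: ask for the current dialing state (get_dial_state takes no parameters).
print(make_tool_call(1, "get_dial_state", {}))
```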

Glama MCP Gateway

Connect through Glama MCP Gateway for full control over tool access and complete visibility into every call.

MCP client → Glama → MCP server

Full call logging

Every tool call is logged with complete inputs and outputs, so you can debug issues and audit what your agents are doing.

Tool access control

Enable or disable individual tools per connector, so you decide what your agents can and cannot do.

Managed credentials

Glama handles OAuth flows, token storage, and automatic rotation, so credentials never expire on your clients.

Usage analytics

See which tools your agents call, how often, and when, so you can understand usage patterns and catch anomalies.

100% free. Your data is private.

Tool Definition Quality

A (3.5/5.0)

Tool Descriptions: B

Average 3.5/5 across 30 of 30 tools scored. Lowest: 2.4/5.

Server Coherence: A

Disambiguation: 5/5

Each tool has a clearly distinct purpose. For example, diagnose_preview and diagnose_shot are distinguished by sandbox vs. real execution. Overlap between log_recommendation and suggest_next_step is minimal, as one records a choice and the other fetches a suggestion.

Naming Consistency: 5/5

All tools use snake_case, and nearly all follow a verb_noun pattern, such as log_shot, delete_shot, update_bean, and register_machine. The few noun-first names, such as grinder_math and kb_changelog, remain easy to read.

Tool Count: 4/5

30 tools is on the higher end but justified by the comprehensive coverage of coffee brewing management: equipment registration, shot logging, diagnosis, recommendations, and knowledge base. Each tool serves a necessary function, though the set could be slightly streamlined.

Completeness: 4/5

The tool surface covers essential CRUD operations for beans, grinders, machines, waters, programs, and shots, plus diagnostics and recommendations. It lacks explicit delete operations for equipment (only archive) and a get_shot tool for retrieving a single shot, but these gaps are minor.

Available Tools

30 tools

compute_age (C)

Read-only

Compute coffee age in days off roast and resting/staling verdict.

Parameters

Name	Required	Description	Default
`bean_id`	Yes
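The "days off roast" part of compute_age reduces to a date difference. The sketch below assumes the bean record carries a calendar roast date, and `days_off_roast` is a hypothetical name. The server's resting/staling verdict thresholds are not documented here, so the sketch does not attempt them.

```python
from datetime import date


def days_off_roast(roast_date: date, on: date) -> int:
    """Whole calendar days between roasting and `on` (hypothetical helper).

    Raises ValueError if `on` is earlier than the roast date.
    """
    days = (on - roast_date).days
    if days < 0:
        raise ValueError("reference date is before the roast date")
    return days


print(days_off_roast(date(2026, 7, 1), date(2026, 7, 26)))  # 25
```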

Tool Definition Quality

C (2.7/5.0)

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, so the description doesn't need to repeat that. However, it fails to disclose behavioral traits like output format (e.g., days as integer, verdict as string), error handling for invalid IDs, or reliance on bean roast dates. The description adds minimal value beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence with no extraneous words. Every word contributes meaning, making it highly efficient for a simple tool.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Despite being a simple tool with one parameter and no output schema, the description omits crucial details: it doesn't specify the return structure (e.g., is the verdict a string or boolean?), any assumptions about the bean's registration, or how to interpret results. This leaves ambiguity for an AI agent.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 0% description coverage, and the description does not mention the 'bean_id' parameter at all. It leaves the agent to infer what bean_id refers to and how to obtain it, which is insufficient for correct invocation.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description 'Compute coffee age in days off roast and resting/staling verdict' clearly states the tool's function with a specific verb and resource. Its focus on age computation implicitly sets it apart from siblings like 'get_stats' or 'list_beans', but it does not differentiate itself explicitly.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. It does not mention prerequisites, such as needing a valid bean_id from 'list_beans', nor does it exclude scenarios where this tool is inappropriate.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

delete_shot (A)

Destructive, Idempotent

Delete a logged shot from history. Restores the bean's remaining weight (which is derived from logged doses). Hard delete — there is no undo. To fix a mistake on an otherwise-valid shot, prefer update_shot over delete-and-relog.

Parameters

Name	Required	Description	Default
`shot_id`	Yes	ID of the shot to delete
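Remaining weight is derived from logged doses rather than stored, so deleting a shot "restores" weight simply by dropping that dose from the sum. Below is a minimal sketch of that model, with hypothetical function names and shot IDs and doses in grams.

```python
def remaining_weight_g(bag_weight_g: float, doses_by_shot: dict[str, float]) -> float:
    """Remaining coffee = starting bag weight minus every logged dose (hypothetical model)."""
    return bag_weight_g - sum(doses_by_shot.values())


def delete_logged_shot(doses_by_shot: dict[str, float], shot_id: str) -> None:
    """Drop a shot's dose; pop() with a default makes deleting twice harmless."""
    doses_by_shot.pop(shot_id, None)


shots = {"s1": 18.0, "s2": 18.5, "s3": 18.0}
print(remaining_weight_g(250.0, shots))  # 195.5
delete_logged_shot(shots, "s2")
print(remaining_weight_g(250.0, shots))  # 214.0
```

Keying doses by shot ID also makes a repeated delete a no-op, which is consistent with the tool's idempotentHint.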

Tool Definition Quality

A (4.5/5.0)

Behavior: 4/5

Annotations provide destructiveHint: true and idempotentHint: true. Description adds behavioral context: restores bean's remaining weight and states hard delete with no undo. This adds value beyond annotations.

Conciseness: 5/5

Four short sentences, front-loaded with purpose, with no wasted words. Efficient and clear.

Completeness: 5/5

Simple tool with one parameter, no output schema. Description covers purpose, side effect (weight restoration), and alternative tool. Complete for this complexity.

Parameters: 3/5

Only one parameter (shot_id) with 100% schema coverage. Description does not add details beyond the schema, so baseline score of 3 is appropriate.

Purpose: 5/5

Description clearly states 'Delete a logged shot from history' and distinguishes itself from update_shot by recommending that tool for fixing mistakes. It adds context about restoring bean weight.

Usage Guidelines: 5/5

Explicitly states when to use (deleting a shot) and when not to ('To fix a mistake on an otherwise-valid shot, prefer update_shot'). Also warns that 'there is no undo'.

diagnose_preview (A)

Read-only

Sandbox-only diagnosis, writes nothing (no verdict, no recommendation-trail entry). Two modes: pass shot_id to dry-run a LOGGED shot (optionally overriding its sensory_tags — the "what would this read as?" preview; the shot's own bean and its age at pulled_at are used), or pass the full metric set (bean_id, grinder_id, machine_id, grind_label, dose_g, yield_g, time_s, source) for a hypothetical shot. Identical output shape to diagnose_shot, including bean_context.

Parameters

Name	Required	Description
`dose_g`	No	Dose in grams
`source`	No	Grinder position source — affects G1 warning (raw mode only)
`time_s`	No	Extraction time in seconds
`bean_id`	No	Coffee bean ID (must belong to this account)
`shot_id`	No	Preview a logged shot by ID (dry-run; ignores the metric params below)
`yield_g`	No	Yield in grams
`grinder_id`	No	Grinder ID (must belong to this account)
`machine_id`	No	Machine ID (must belong to this account)
`grind_label`	No	Grinder setting label, e.g. "1.1.3"
`sensory_tags`	No	Sensory observations, e.g. ["sour","bitter"]. With shot_id: replaces the stored tags for this preview (omit to use stored; [] = none).
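The two calling modes can be made explicit on the client side. This sketch rests on stated assumptions. `preview_arguments` is a hypothetical helper. It treats all eight metric fields as required in raw mode, as the description's "full metric set" suggests. It also rejects metric fields passed alongside `shot_id` instead of letting the server silently ignore them.

```python
from typing import Any, Optional

# The "full metric set" the description lists for previewing a hypothetical shot.
RAW_METRICS = ("bean_id", "grinder_id", "machine_id", "grind_label",
               "dose_g", "yield_g", "time_s", "source")


def preview_arguments(shot_id: Any = None,
                      sensory_tags: Optional[list[str]] = None,
                      **metrics: Any) -> dict[str, Any]:
    """Build diagnose_preview arguments for exactly one of its two modes."""
    args: dict[str, Any] = {}
    if shot_id is not None:
        # Logged-shot mode: the server would ignore metric params, so refuse them here.
        if metrics:
            raise ValueError(f"metric params ignored with shot_id: {sorted(metrics)}")
        args["shot_id"] = shot_id
    else:
        # Raw mode: every metric must be present, and nothing unexpected.
        missing = [name for name in RAW_METRICS if name not in metrics]
        unknown = sorted(set(metrics) - set(RAW_METRICS))
        if missing or unknown:
            raise ValueError(f"missing={missing} unknown={unknown}")
        args.update(metrics)
    if sensory_tags is not None:
        # An empty list is meaningful: with shot_id it previews the shot with no tags.
        args["sensory_tags"] = list(sensory_tags)
    return args


# "What would this logged shot read as if it had tasted sour?"
print(preview_arguments(shot_id=42, sensory_tags=["sour"]))
```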

Tool Definition Quality

A (4.2/5.0)

Behavior: 4/5

Annotations already declare readOnlyHint=true and openWorldHint=false. The description adds valuable context: it states that the tool 'writes nothing (no verdict, no recommendation-trail entry)', which transparently explains the sandbox behavior beyond what the annotations convey.

Conciseness: 5/5

The description fits two modes, the no-write guarantee, and the relationship to the sibling tool into three sentences. Every sentence adds value without redundancy.

Completeness: 4/5

There is no output schema, but the description states that the output shape is identical to diagnose_shot's. For a sandbox tool with 10 parameters fully described in the schema, this is mostly adequate, though it assumes the agent already knows the sibling's output.

Parameters: 3/5

Schema description coverage is 100%, so the schema itself provides adequate meaning for all 10 parameters. The description does not add any additional semantic information about the parameters beyond what is already in the schema.

Purpose: 5/5

Description clearly states it is a 'sandbox-only diagnosis' that writes nothing, neither a verdict nor a recommendation-trail entry. It specifies 'identical output shape to diagnose_shot', which distinguishes it from the sibling tool 'diagnose_shot', which likely records those results.

Usage Guidelines: 4/5

The description explicitly says 'sandbox-only', implying it is for testing or hypothetical scenarios. By referencing 'identical output shape to diagnose_shot', it contrasts with the production sibling, providing context for when to use this tool instead of diagnose_shot.

diagnose_shot (A)

Read-only

Evaluate shot metrics to provide ranked hypotheses (C6) and warn on flip-flops (G2) or fatigue (G6). Resolves the SHOT's own bean — age computed at the shot's pulled_at — never the active profile, and echoes it as bean_context, so diagnosing an older or differently-filed shot is always safe. The engine reads metrics and sensory tags — NOT free-text notes — so make sure taste feedback is recorded as sensory_tags on the shot (via log_shot or update_shot) before diagnosing; otherwise an in-range shot that tastes bad will come back "balanced".

Parameters

Name	Required	Description	Default
`shot_id`	No	Optional shot ID; defaults to last pulled shot.
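The ordering the description asks for (record taste as sensory_tags first, then diagnose) can be sketched as two `tools/call` requests. The assumptions are all hypothetical: update_shot accepts `shot_id` and `sensory_tags` (its schema is not shown on this page), the helper name is invented, and the caller sends the second request only after the first succeeds.

```python
from typing import Any


def tag_then_diagnose(shot_id: Any, sensory_tags: list[str],
                      first_id: int = 1) -> list[dict[str, Any]]:
    """Return, in send order, the JSON-RPC requests to tag a shot and then diagnose it."""
    if not sensory_tags:
        # Free-text notes are not read by the engine, so tags are the only taste signal.
        raise ValueError("give at least one sensory tag, e.g. ['sour']")
    update = {"name": "update_shot",  # assumed argument names
              "arguments": {"shot_id": shot_id, "sensory_tags": list(sensory_tags)}}
    diagnose = {"name": "diagnose_shot", "arguments": {"shot_id": shot_id}}
    return [{"jsonrpc": "2.0", "id": first_id + i, "method": "tools/call", "params": p}
            for i, p in enumerate((update, diagnose))]


for request in tag_then_diagnose(7, ["bitter"]):
    print(request["id"], request["params"]["name"])
```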

Tool Definition Quality

A (4.4/5.0)

Behavior: 4/5

Annotations already declare readOnlyHint=true, so the description adds value by clarifying that it reads metrics and sensory tags, and warns about the consequence of missing sensory tags (balanced result for bad taste). No contradiction.

Conciseness: 5/5

Three sentences with zero waste, front-loaded with purpose and followed by essential usage guidance. Every sentence earns its place.

Completeness: 4/5

Given no output schema, the description covers prerequisites, behavior, and a warning. It could mention return format or common hypotheses, but is sufficiently complete for a diagnostic tool.

Parameters: 3/5

Schema coverage is 100% with clear description for shot_id (optional, defaults to last pulled shot). The tool description does not need to add more parameter detail; baseline 3 is appropriate.

Purpose: 5/5

The description clearly states the tool evaluates shot metrics to provide ranked hypotheses and warnings, using specific references to C6, G2, G6, which distinguishes it from siblings like diagnose_preview.

Usage Guidelines: 5/5

The description explicitly warns that the engine reads metrics and sensory tags, not free-text notes, and advises recording taste feedback as sensory_tags before diagnosing. This provides clear when-to-use and prerequisites.

get_dial_state (A)

Read-only

Retrieve current dialing state, active context, recent shots, recommendations, and locked recipes.

Parameters

Name	Required	Description	Default
No parameters

Tool Definition Quality

A (4.0/5.0)

Behavior: 3/5

Annotations already indicate readOnlyHint=true and openWorldHint=false. Description adds no additional behavioral context beyond listing the components returned. It does not contradict annotations.

Conciseness: 5/5

Single sentence of 12 words, front-loaded with the verb 'Retrieve'. Every word is informative, no filler.

Completeness: 4/5

Given no output schema and zero parameters, the description covers the purpose and output components adequately. Could optionally note that no arguments are needed, but that is implicit. Still sufficient.

Parameters: 4/5

No parameters; schema coverage is 100%. Description adds context by enumerating the state components, which is helpful for an agent to understand what the tool returns.

Purpose: 5/5

Clear verb 'Retrieve' and specific resource 'current dialing state' plus listed associated data. Distinct from sibling tools like list_shots or list_recipes by combining multiple pieces of information into one call.

Usage Guidelines: 3/5

Description implies use for obtaining a comprehensive snapshot of dial state, but does not explicitly state when to use this over alternatives like list_shots or get_stats. No exclusion criteria provided.

get_kb_version (A)

Read-only

Retrieve the current version of the knowledge base.

Parameters

Name	Required	Description	Default
No parameters

Tool Definition Quality

A (3.7/5.0)

Behavior: 3/5

Annotations already indicate readOnlyHint=true, so the description is consistent but adds no additional behavioral context beyond stating the action. No contradiction.

Conciseness: 5/5

Description is a single, clear sentence with no unnecessary words. Perfectly concise.

Completeness: 3/5

While the description covers the basic purpose, it does not explain what format the version takes or any other return value details. Without an output schema, the description could be more complete.

Parameters: 4/5

No parameters exist, so the schema coverage is 100%. Baseline for 0 parameters is 4; description does not need to add parameter detail.

Purpose: 5/5

Description clearly states that the tool retrieves the current version of the knowledge base, using a specific verb and resource. It is distinct from its siblings because no other tool retrieves the version.

Usage Guidelines: 2/5

No guidance on when to use this tool versus alternatives. While the tool is simple, the description does not provide any context for its appropriate usage.

get_rule (A)

Read-only

Retrieve the detailed text of a specific rule from the knowledge base by its rule ID.

Parameters

Name	Required	Description	Default
`rule_id`	Yes	The ID of the rule to fetch, e.g. CAT.DARK

Tool Definition Quality

A (3.9/5.0)

Behavior: 3/5

Annotations already declare readOnlyHint=true and openWorldHint=false. The description adds that it retrieves 'detailed text', which is consistent but does not disclose any additional behavioral traits beyond what annotations provide.

Conciseness: 5/5

The description is a single sentence with no wasted words, efficiently conveying the tool's purpose.

Completeness: 5/5

Given the tool's simplicity (single parameter, no output schema), the description is complete. It specifies what is retrieved and how.

Parameters: 3/5

Schema coverage is 100% and the parameter description 'The ID of the rule to fetch, e.g. CAT.DARK' already provides clear meaning. The description's mention of 'rule ID' adds no value beyond the schema.

Purpose: 5/5

The description clearly states the verb 'retrieve', the resource 'detailed text of a specific rule', and the method 'by its rule ID'. The word 'rule' distinguishes it from sibling tools like get_dial_state or get_kb_version.

Usage Guidelines: 3/5

The description implies usage when you have a rule ID, but does not explicitly state when to use this tool versus alternatives or provide exclusions. It provides no guidance on when not to use it.

get_stats (B)

Read-only

Get compact flat usage stats for a grinder, machine, or bean.

Parameters

Name	Required	Description	Default
`id`	Yes	The entity ID
`scope`	Yes	The stats scope
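Because `scope` has no enum in this listing, an agent has to infer its values from the one-line description. Below is a small client-side guard. It assumes the accepted strings are exactly "grinder", "machine", and "bean", which follows the description's wording and is not a confirmed enum. `stats_arguments` is a hypothetical helper.

```python
from typing import Any

# Assumed from the description's wording; the real schema may use different strings.
STATS_SCOPES = ("grinder", "machine", "bean")


def stats_arguments(scope: str, entity_id: Any) -> dict[str, Any]:
    """Build get_stats arguments, rejecting scopes the description does not mention."""
    if scope not in STATS_SCOPES:
        raise ValueError(f"scope must be one of {STATS_SCOPES}, got {scope!r}")
    # `id` should refer to an entity of the chosen kind, e.g. a bean ID for scope "bean".
    return {"id": entity_id, "scope": scope}


print(stats_arguments("grinder", 3))
```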

Tool Definition Quality

B (3.4/5.0)

Behavior: 3/5

Annotations already declare readOnlyHint=true, so safety is known. The description adds 'compact flat' but no further behavioral details like output structure or limitations. Minimal addition beyond annotations.

Conciseness: 5/5

Single sentence of 11 words, front-loaded with the core purpose. No superfluous text, highly efficient.

Completeness: 4/5

For a simple two-parameter tool with annotations, the description covers the purpose and scope. Slightly ambiguous about what 'compact flat usage stats' means, but overall adequate given no output schema.

Parameters: 3/5

Schema coverage is 100% with descriptions for both 'id' and 'scope'. The description merely restates the scope but adds no new semantic information beyond the schema, achieving the baseline.

Purpose: 4/5

The description clearly states the verb 'get' and the resource 'compact flat usage stats' for specific entities (grinder, machine, bean). It is specific but could better differentiate from sibling tools like 'list_grinders' that list entities rather than stats.

Usage Guidelines: 2/5

No guidance on when to use this tool versus alternatives such as 'list_*' tools or 'get_dial_state'. The description does not specify prerequisites or situations where this tool is appropriate.

grinder_math (C)

Read-only

Determine steps/clicks difference and directions between setting labels (C3).

Parameters

Name	Required	Description	Default
`to_label`	Yes
`from_label`	Yes
`grinder_id`	Yes
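The label grammar is not documented, but the schema elsewhere gives "1.1.3" as an example grind label. One plausible model reads a dotted label as a mixed-radix number with a fixed count per position. This is only an assumption. Under that model, the signed difference of two step indices yields both the click count and the direction. The helper names and the per-position counts below are hypothetical, and whether "up" means finer or coarser depends on the grinder.

```python
def label_to_steps(label: str, radices: list[int]) -> int:
    """Convert a dotted label like "1.1.3" to an absolute step index (mixed radix)."""
    parts = [int(p) for p in label.split(".")]
    if len(parts) != len(radices):
        raise ValueError(f"expected {len(radices)} positions in {label!r}")
    steps = 0
    for part, radix in zip(parts, radices):
        if not 0 <= part < radix:
            raise ValueError(f"position value {part} out of range 0..{radix - 1}")
        steps = steps * radix + part
    return steps


def grinder_diff(from_label: str, to_label: str, radices: list[int]) -> tuple[int, str]:
    """Return (number of steps, direction) to move from one label to another."""
    delta = label_to_steps(to_label, radices) - label_to_steps(from_label, radices)
    direction = "up" if delta > 0 else "down" if delta < 0 else "none"
    return abs(delta), direction


# Hypothetical grinder: 3 rotations, 6 numbers per rotation, 6 clicks per number.
print(grinder_diff("1.1.3", "1.2.0", [3, 6, 6]))  # (3, 'up')
```

Presumably the real per-position counts come from the registered grinder, which would explain why the tool takes a `grinder_id`.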

Tool Definition Quality

C (2.9/5.0)

Behavior: 3/5

Annotations already declare readOnlyHint=true. The description adds that it computes differences and directions, which is consistent. However, it does not disclose any additional behavioral traits beyond what annotations provide.

Conciseness: 4/5

The description is a single sentence with no fluff. It is as short as possible, though it sacrifices necessary information for brevity.

Completeness: 2/5

Given no output schema, 3 required parameters, and no enums, the description should provide more context. It does not explain what the output looks like, what 'C3' means, or how to use the result. The tool is not fully documented for an autonomous agent.

Parameters: 2/5

With 0% schema description coverage, the description must compensate but only vaguely hints at 'from_label' and 'to_label'. It does not explain parameter types, valid values, or the role of 'grinder_id'. This is insufficient for an agent to invoke correctly.

Purpose: 4/5

The description clearly states that the tool determines the step/click difference and direction between setting labels, giving a specific verb and resource. The 'C3' reference is ambiguous but does not undermine overall clarity. As the only 'math' tool, it stands apart from its siblings.

Usage Guidelines: 2/5

No explicit guidance on when to use this tool versus alternatives. Among many sibling tools, there is no mention of use cases or exclusions. The description does not help an agent decide when this tool is appropriate.

kb_changelog (A)

Read-only

Retrieve the changelog showing updates and version differences for the knowledge base.

Parameters

Name	Required	Description	Default
No parameters

Tool Definition Quality

A (4.1/5.0)

Behavior: 3/5

Annotations already indicate readOnlyHint=true and openWorldHint=false, so safety is clear. The description adds that it shows 'updates and version differences', which is additional behavioral context, but does not elaborate on format, pagination, or limits. Some value added beyond annotations.

Conciseness: 5/5

The description is a single, clear sentence that is front-loaded and contains no extraneous words. Every part is useful.

Completeness: 4/5

For a simple no-parameter retrieval tool with no output schema, the description is mostly complete. It explains what is retrieved (changelog with updates and version differences). However, it could briefly mention the return format (e.g., list of changes) to be fully self-contained, especially given many sibling tools.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The tool has no parameters, so the schema coverage is 100% by default. The description provides meaning about what the changelog contains ('updates and version differences'), adding value beyond the empty schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Retrieve') and the resource ('changelog... for the knowledge base'), providing a specific verb and resource. It distinguishes itself from siblings like 'get_kb_version', which likely retrieves the current version rather than the history.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage (when you want changelog details) but does not provide explicit guidance on when to use this tool versus alternatives, nor any exclusions or prerequisites.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_beansA

Read-only

List coffee beans registered for the account, with status filter.

Parameters

Name	Required	Description	Default
`status`	No	Filter by status: current, archived, or all. Defaults to current.
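On the client side, the single optional `status` parameter can be validated before the call. This is a minimal sketch that assumes the three values in the parameter row above are the complete set. `status_arguments` is a hypothetical helper, not part of the server.

```python
from typing import Optional

# Values accepted by the `status` filter, per the parameter description.
ALLOWED_STATUSES = {"current", "archived", "all"}


def status_arguments(status: Optional[str] = None) -> dict[str, str]:
    """Build `arguments` for list_beans; omitting status leaves the server default ('current')."""
    if status is None:
        return {}  # let the server-side default apply
    if status not in ALLOWED_STATUSES:
        raise ValueError(f"status must be one of {sorted(ALLOWED_STATUSES)}, got {status!r}")
    return {"status": status}


print(status_arguments())            # {}
print(status_arguments("archived"))  # {'status': 'archived'}
```

Sending `{}` instead of `{"status": "current"}` keeps the client from hard-coding a default that the server already defines.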

Tool Definition Quality

A3.8/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true and openWorldHint=false. The description adds the ability to filter by status, which is useful but does not disclose other behavioral aspects like pagination, ordering, or error conditions. It does not contradict annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, front-loaded sentence with no wasted words. It is concise and directly conveys the purpose and key feature (status filter).

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple listing tool with one optional parameter and no output schema, the description is nearly complete. It does not restate that status defaults to 'current', but the schema already documents that default. No critical information is missing.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% coverage, with a single parameter 'status' that includes an enum and a documented default. Apart from mentioning a 'status filter', the description adds little beyond what the schema already provides, so it earns the baseline score.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'list' and the resource 'coffee beans' with a specific scope 'registered for the account' and a filter option. It effectively distinguishes from sibling tools like list_shots or list_grinders which target different resources.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies the tool is used for listing beans with a status filter, but provides no explicit guidance on when to use it versus alternatives (e.g., search or get endpoints). No mention of prerequisites or exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_grindersA

Read-only

List grinders registered for the account, with status filter.

Parameters

Name	Required	Description	Default
`status`	No	Filter by status: current, archived, or all. Defaults to current.

Tool Definition Quality

A3.9/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and openWorldHint=false. The description adds minimal behavioral context beyond the purpose, consistent with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence, front-loaded with the action, no redundancy or unnecessary words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple tool with one optional enum parameter and no output schema, the description covers the necessary purpose and parameter adequately.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with the status parameter fully described. The description mentions the filter but adds no new semantics beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool lists grinders and supports a status filter, distinguishing it from sibling tools like register_grinder or set_equipment_archived.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage through the name and filter but lacks explicit guidance on when to use this tool over alternatives or when to avoid it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
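One way to act on this feedback is to name sibling tools directly in the description. The sketch below is a hypothetical rewrite of the `list_grinders` definition. It uses the MCP tool-definition fields (`name`, `description`, `inputSchema`, `annotations`). The enum and default come from the parameter table, and the annotations match those reported above. The usage sentence is illustrative wording, not the server's actual text.

```python
import json

# Hypothetical rewrite of the list_grinders definition. The usage guidance in the
# description is a suggestion; the schema values mirror the documented parameter.
LIST_GRINDERS_TOOL = {
    "name": "list_grinders",
    "description": (
        "List grinders registered for the account, with status filter. "
        "Use it to find a grinder_id before passing one to log_shot as an override; "
        "use register_grinder instead to add a new grinder."
    ),
    "inputSchema": {
        "type": "object",
        "properties": {
            "status": {
                "type": "string",
                "enum": ["current", "archived", "all"],
                "default": "current",
                "description": "Filter by status: current, archived, or all.",
            }
        },
    },
    "annotations": {"readOnlyHint": True, "openWorldHint": False},
}

print(json.dumps(LIST_GRINDERS_TOOL, indent=2))
```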

list_machinesA

Read-only

List machines registered for the account, with status filter.

Parameters

Name	Required	Description	Default
`status`	No	Filter by status: current, archived, or all. Defaults to current.

Tool Definition Quality

A3.5/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true. The description adds that it lists machines with a status filter, but does not disclose pagination, limits, sorting, or any side effects. With annotations covering safety, the description provides minimal additional behavioral context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, clear sentence. It is front-loaded with the primary action and resource. No unnecessary words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The tool is simple with one optional parameter and no output schema. While the description covers the basic functionality, it lacks details about return format, pagination, or possible error conditions. Given the low complexity, it is somewhat complete but could be improved.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the baseline is 3. The description mentions 'with status filter,' which matches the schema's 'status' parameter, but adds no new semantics beyond what the schema already provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action (List), resource (machines), and scope (registered for the account). It distinguishes from sibling tools like list_beans and list_grinders by naming the resource explicitly.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives, no when-not-to-use conditions, and no prerequisites mentioned. The description simply states what it does without contextual usage advice.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_recipesC

Read-only

List recipes (all or filtered by equipment and status).

Parameters

Name	Required	Description
`status`	No	Filter by status: current, archived, or all. Defaults to current.
`bean_id`	No
`grinder_id`	No
`machine_id`	No
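Because three of the four filters are undocumented, a client can send only the filters it actually sets and leave the rest to server defaults. This is a minimal sketch that assumes the IDs are integers. The tool definition does not say how multiple filters combine (AND or OR), and `list_recipes_arguments` is a hypothetical helper.

```python
from typing import Optional


def list_recipes_arguments(status: Optional[str] = None,
                           bean_id: Optional[int] = None,
                           grinder_id: Optional[int] = None,
                           machine_id: Optional[int] = None) -> dict:
    """Build list_recipes arguments, sending only the filters that were actually set."""
    candidates = {
        "status": status,
        "bean_id": bean_id,
        "grinder_id": grinder_id,
        "machine_id": machine_id,
    }
    # Dropping None values keeps server defaults (e.g. status=current) in effect.
    return {key: value for key, value in candidates.items() if value is not None}


print(list_recipes_arguments(bean_id=12, machine_id=3))
```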

Tool Definition Quality

C2.7/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, so the description adds minimal behavioral context. It does not mention default behavior (e.g., status defaults to 'current'), pagination, or ordering.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, front-loaded sentence with no filler. It communicates the core purpose efficiently, though it could include a bit more detail without losing conciseness.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description lacks information about return format, pagination, and defaults for equipment filters. Given the absence of an output schema and low schema coverage, the description is insufficient for complete understanding.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description groups bean_id, grinder_id, and machine_id as 'equipment' but does not differentiate them or explain their use. With 25% schema coverage, the description should provide more detail for the undocumented parameters.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description specifies the verb 'list' and the resource 'recipes', and indicates filtering by equipment and status. It clearly states the tool's function, though it could be more specific about which equipment parameters (bean_id, grinder_id, machine_id) are involved.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus alternatives like list_beans or list_shots. Given multiple sibling list tools, explicit context for when to choose list_recipes would be helpful.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_shotsA

Read-only

List shot history with support for pagination and filtering. Every shot carries a derived taste_pending flag (1 = logged with no sensory tags, no rating, and no tasted flip — the tasting is still owed and can be backfilled with update_shot).
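The `taste_pending` rule can be expressed as a small predicate. This sketch assumes a shot record with `sensory_tags`, `rating`, and `tasted` fields named after the `log_shot` parameters and a numeric rating. The `Shot` dataclass is hypothetical, and the server's own derivation may differ.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Shot:
    # Hypothetical minimal shot record; field names mirror log_shot parameters.
    sensory_tags: list[str] = field(default_factory=list)
    rating: Optional[float] = None
    tasted: int = 0


def taste_pending(shot: Shot) -> int:
    """Return 1 when the shot has no sensory tags, no rating, and no tasted flag; else 0."""
    has_taste_record = bool(shot.sensory_tags) or shot.rating is not None or shot.tasted == 1
    return 0 if has_taste_record else 1


print(taste_pending(Shot()))                       # 1: tasting still owed
print(taste_pending(Shot(tasted=1)))               # 0: clean tasting recorded
print(taste_pending(Shot(sensory_tags=["sour"])))  # 0: defect tag recorded
```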

Parameters

Name	Required	Description
`limit`	No	Number of records to return (1-500)
`offset`	No	Offset for pagination
`filters`	No
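With `limit` capped at 500, fetching a full history means advancing `offset` page by page. This is a minimal sketch that assumes each call returns a plain list of shots in a stable order and that a page shorter than `limit` marks the end. `fetch_page` stands in for a real `list_shots` call and is hypothetical.

```python
from typing import Any, Callable, Iterator

MAX_LIMIT = 500  # upper bound stated for `limit` (1-500)


def iter_shots(fetch_page: Callable[[int, int], list[dict[str, Any]]],
               page_size: int = 100) -> Iterator[dict[str, Any]]:
    """Yield every shot by advancing `offset` until a short page comes back."""
    if not 1 <= page_size <= MAX_LIMIT:
        raise ValueError("page_size must be between 1 and 500")
    offset = 0
    while True:
        page = fetch_page(page_size, offset)
        yield from page
        if len(page) < page_size:
            break  # a short (or empty) page means nothing is left
        offset += page_size


# Stand-in for list_shots: 250 fake shots served in slices.
FAKE_SHOTS = [{"shot_id": i} for i in range(250)]


def fake_fetch(limit: int, offset: int) -> list[dict[str, Any]]:
    return FAKE_SHOTS[offset:offset + limit]


all_shots = list(iter_shots(fake_fetch, page_size=100))
print(len(all_shots))  # 250
```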

Tool Definition Quality

A3.6/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations (readOnlyHint=true) already declare read-only behavior. Description adds minimal context beyond listing, lacking details on sorting, default ordering, or return format. Does not contradict annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, front-loaded with the main action; the second explains the derived taste_pending flag. No redundant information; every word earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Despite lacking output schema, description does not explain return format, sorting behavior, or how multiple filters combine. For a tool with nested filters and pagination, more behavioral context is needed for complete understanding.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is high (67%), with descriptions for limit, offset, and filter sub-fields. Description merely repeats 'pagination and filtering' without adding new meaning or clarifying parameter relationships.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states 'List shot history', specifying the verb and resource. Among sibling tools, it distinguishes itself from list tools for other resources (list_beans, list_grinders) and from shot-specific create/delete tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Description implies usage for listing shots with pagination/filtering but provides no explicit guidance on when to use vs. alternatives (e.g., other list tools) or when not to use it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_watersB

Read-only

List registered waters for the account.

Parameters

Name	Required	Description	Default
`status`	No	Filter by status: current (unarchived), archived, or all. Defaults to current.

Tool Definition Quality

B3.2/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true, so the description doesn't need to restate that. It adds no behavioral details beyond listing waters, but does not contradict annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single concise sentence that efficiently conveys the purpose. It is front-loaded and avoids unnecessary words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple list tool with one optional parameter and read-only annotations, the description is minimally adequate. However, it does not explain what a registered 'water' represents or what the tool returns, which matters given the absence of an output schema.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema fully describes the single parameter with enum and default. The description adds no additional parameter information, so baseline score of 3 applies.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action (list) and the resource (registered waters) for the account. It communicates the tool's purpose effectively but does not distinguish itself from sibling list tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives like list_beans or list_shots. The description lacks explicit usage context or exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

lock_recipeB

Idempotent

Lock a successful shot as the reference dialing recipe for this equipment profile.

Parameters

Name	Required	Description	Default
`drink_intent`	Yes
`from_shot_id`	Yes

Tool Definition Quality

B3.4/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate idempotentHint=true and destructiveHint=false. The description adds that it 'locks a successful shot' implying a state change, but does not explain what locking entails for future shots or if it overrides previous references. With annotations already covering idempotency and non-destructiveness, the description provides modest additional context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence with a clear verb-object structure. No wasted words. Front-loaded with the core action.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple mutation tool with 2 required params and no output schema, the description gives a basic purpose. However, it omits what happens to previous reference recipes, return behavior, or required conditions (e.g., shot must be successful). Adequate but not comprehensive.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 0% with no parameter descriptions. The description mentions 'successful shot' hinting at from_shot_id, but does not explain drink_intent (enum straight/milk) at all. The agent must infer its purpose from context.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
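Since neither parameter is described in the schema, a client can at least validate `drink_intent` before locking. This sketch assumes the enum values are `straight` and `milk`, as noted in the assessment above, and that `from_shot_id` is an integer. The shot ID 42 is illustrative. Because the tool is marked idempotent, repeating the same call should be safe.

```python
# drink_intent values as reported in the assessment (straight/milk); treat as an assumption.
DRINK_INTENTS = {"straight", "milk"}


def lock_recipe_arguments(from_shot_id: int, drink_intent: str) -> dict:
    """Build lock_recipe arguments after checking the drink_intent enum."""
    if drink_intent not in DRINK_INTENTS:
        raise ValueError(f"drink_intent must be one of {sorted(DRINK_INTENTS)}")
    return {"from_shot_id": from_shot_id, "drink_intent": drink_intent}


print(lock_recipe_arguments(42, "milk"))
```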

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'lock' and the resource 'a successful shot as the reference dialing recipe for this equipment profile', distinguishing it from siblings like get_dial_state or suggest_next_step.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives. The description does not mention prerequisites, such as having a successful shot, or when locking is appropriate.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

log_recommendationC

Log a recommended dialing change to allow contradiction/oscillation checking (G2).

Parameters

Name	Required	Description	Default
`lever`	Yes
`direction`	Yes
`rationale`	Yes
`confidence`	Yes
`cited_rules`	No
`based_on_shot`	No
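The description does not explain what "contradiction/oscillation checking (G2)" actually checks. One plausible reading is to flag a new recommendation that reverses the most recent one on the same lever. The sketch below illustrates that interpretation only. The lever and direction values (`grind`, `finer`, `coarser`, `up`, `down`) are hypothetical because the real enums are not documented here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Recommendation:
    lever: str      # e.g. "grind" (hypothetical value)
    direction: str  # e.g. "finer" or "coarser" (hypothetical values)


# Hypothetical pairs of opposite directions.
OPPOSITES = {("finer", "coarser"), ("coarser", "finer"), ("up", "down"), ("down", "up")}


def reverses_previous(history: list[Recommendation], new: Recommendation) -> bool:
    """True if `new` undoes the most recent recommendation on the same lever."""
    last_same_lever: Optional[Recommendation] = next(
        (rec for rec in reversed(history) if rec.lever == new.lever), None)
    if last_same_lever is None:
        return False
    return (last_same_lever.direction, new.direction) in OPPOSITES


history = [Recommendation("grind", "finer"), Recommendation("dose", "up")]
print(reverses_previous(history, Recommendation("grind", "coarser")))  # True
print(reverses_previous(history, Recommendation("grind", "finer")))    # False
```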

Tool Definition Quality

C2.4/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description only says 'log', which implies a write operation, but does not disclose any behavioral traits such as whether it modifies state, requires authentication, or has side effects. Annotations are all 'false' and provide no safety hints, so the description carries the full burden but fails to reveal important behaviors.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence that is concise and front-loaded with the action and resource. However, it could be more informative without sacrificing brevity, and the cryptic 'G2' detracts from clarity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness1/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has 6 parameters, no output schema, and minimal annotations, the description is woefully incomplete. It does not explain what 'contradiction/oscillation checking' is, how the recommendation is formed, or what the agent should expect after logging. The agent lacks context to use the tool correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema has 6 parameters (4 required, 3 with enums) but the description provides zero information about them. It does not explain what 'lever', 'direction', 'rationale', 'confidence', etc., mean or how they relate to the recommended dialing change. Schema description coverage is 0%, so the description adds no value.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly identifies the action ('log') and the resource ('recommended dialing change'), and specifies the purpose ('contradiction/oscillation checking (G2)'). However, the reference to 'G2' is cryptic and not explained, and there is no differentiation from similar tools like 'log_shot' or 'suggest_next_step', though the unique name helps.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. It does not state prerequisites, when not to use it, or what other tools might be more appropriate. The agent must infer usage solely from the tool name and purpose.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

log_shotA

Log an espresso or alternative shot/brew attempt. Uses the active context (bean/grinder/machine/program) unless overridden via bean_id/grinder_id/machine_id/program_id — ALWAYS pass bean_id explicitly when the user names a specific coffee, so the shot cannot land on the wrong bag. Backdate with pulled_at when the shot happened earlier. When the user reports taste (bitter, sour, harsh, hollow…), include matching sensory_tags — the diagnosis engine reads tags and metrics, not free-text flavor_notes. When the user tasted the shot and reports it as GOOD/clean (no defects), pass tasted=1 with no sensory_tags — otherwise a tag-less, rating-less shot is filed taste-pending as if never tasted.
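The guidance above maps directly onto how a client assembles `arguments`. The following is a minimal sketch under stated assumptions. `log_shot_arguments` is a hypothetical helper, the IDs are integers, and the `finish_action` value is a placeholder because the accepted values are not listed. Sensory tag strings such as `sour` are also assumed to match the server's tag vocabulary.

```python
from typing import Optional


def log_shot_arguments(dose_g: float, yield_g: float, time_s: float,
                       finish_action: str,
                       bean_id: Optional[int] = None,
                       sensory_tags: Optional[list[str]] = None,
                       tasted_clean: bool = False) -> dict:
    """Build log_shot arguments following the guidance in the tool description."""
    args = {"dose_g": dose_g, "yield_g": yield_g, "time_s": time_s,
            "finish_action": finish_action}
    if bean_id is not None:
        # Pass bean_id whenever the user names a specific coffee.
        args["bean_id"] = bean_id
    if sensory_tags:
        # Taste reports go in sensory_tags; free-text flavor_notes are not read by diagnosis.
        args["sensory_tags"] = sensory_tags
    elif tasted_clean:
        # Good, defect-free shot: tasted=1 with no tags, so it is not filed taste-pending.
        args["tasted"] = 1
    return args


args = log_shot_arguments(
    dose_g=18.0, yield_g=36.0, time_s=28,
    finish_action="manual_stop",  # hypothetical value; accepted values are not listed
    bean_id=7,                    # hypothetical id of the bag the user named
    tasted_clean=True,
)
print(args)
```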

Parameters

Name	Required	Description
`tds`	No	Total Dissolved Solids percentage (e.g. 9.1 or 1.35)
`dose_g`	Yes
`rating`	No
`tasted`	No	1 = the shot was tasted at log time. The explicit way to record a CLEAN tasting: sensory tags all describe defects, so a good shot has none — without this flag (or a rating) it would be filed taste-pending. Tags or a rating also mark a shot as tasted; 0 (default) = taste later.
`time_s`	Yes
`bean_id`	No	Override: log against this bean instead of the active one (null = active)
`verdict`	No
`yield_g`	Yes
`water_id`	No	Optional reference to a water formulation
`best_brew`	No	1 if marked as best brew, 0 otherwise
`favourite`	No	1 if favourite, 0 otherwise
`pulled_at`	No	When the shot was actually pulled (ISO 8601 or "YYYY-MM-DD HH:MM:SS"), for backdated entries. Defaults to now (null = now). When backdating, pass the returned shot_id to diagnose_shot explicitly — the default diagnosis target is the chronologically latest shot.
`beverage_g`	No	Yield mass excluding cup/vessel weight in grams
`grinder_id`	No	Override: grinder used, if not the active one (null = active)
`machine_id`	No	Override: machine used, if not the active one (null = active)
`program_id`	No	Override: machine program used, if not the active one (null = active)
`temp_taste`	No
`vessel_name`	No	Cup/vessel name
`bloom_time_s`	No	Pre-wetting or bloom duration in seconds
`drink_intent`	No
`flavor_notes`	No
`flow_profile`	No	Opaque flow profile data (JSON/TEXT)
`method_tools`	No	JSON representation of method tools used (e.g., paper filters, screen)
`observations`	No
`sensory_tags`	No
`finish_action`	Yes
`temperature_c`	No	Brew temperature in Celsius
`data_confidence`	No	measured = weighed/timed live; recalled = from memory; estimated = a best guess (e.g. missed the timer — "~25s")
`vessel_weight_g`	No	Cup/vessel weight in grams
`pressure_profile`	No	Pressure profile description or values
`first_drip_time_s`	No	Time to first drip in seconds
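The `tds`, `beverage_g`, and `vessel_weight_g` fields support the usual extraction-yield calculation, EY% = beverage mass × TDS% / dose. This is a minimal sketch that assumes the scale reading includes the vessel, so `beverage_g` is the reading minus `vessel_weight_g`. The server is not documented to compute extraction yield itself, and all numbers are illustrative. The sketch also formats a backdated `pulled_at` in the documented `YYYY-MM-DD HH:MM:SS` form.

```python
from datetime import datetime


def beverage_mass(scale_reading_g: float, vessel_weight_g: float) -> float:
    """beverage_g: liquid mass with the cup/vessel weight subtracted."""
    return scale_reading_g - vessel_weight_g


def extraction_yield_pct(beverage_g: float, tds_pct: float, dose_g: float) -> float:
    """Standard extraction-yield formula: EY% = beverage mass x TDS% / dose."""
    return beverage_g * tds_pct / dose_g


beverage_g = beverage_mass(scale_reading_g=386.0, vessel_weight_g=350.0)  # 36.0 g
ey = extraction_yield_pct(beverage_g, tds_pct=9.1, dose_g=18.0)
print(round(ey, 1))  # 18.2

# pulled_at for a backdated entry, in the documented "YYYY-MM-DD HH:MM:SS" form.
pulled_at = datetime(2026, 7, 25, 8, 15, 0).strftime("%Y-%m-%d %H:%M:%S")
print(pulled_at)  # 2026-07-25 08:15:00
```

When backdating, the `pulled_at` description says to pass the returned shot_id to diagnose_shot explicitly, since the default target is the chronologically latest shot.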

Tool Definition Quality

A4.4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate it's a write operation (readOnlyHint=false), but the description adds behavioral context: active context usage, the critical need for explicit bean_id to avoid logging to wrong bag, and that sensory_tags feed the diagnosis engine. Does not mention side effects or prerequisites, but adds value beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single paragraph, front-loaded with the main action. Each sentence adds value with no fluff. It is concise yet comprehensive for the core behavior.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (31 params, 4 required, nested objects, no output schema), the description covers essential behavioral aspects and usage context. It omits return values and error information but gives the agent enough guidance to use the tool correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is about 65% (20 of 31 parameters documented), so the baseline is 3. The description compensates by reinforcing the importance of bean_id and by clarifying that the diagnosis engine uses sensory_tags, whereas flavor_notes are only free text. It adds practical usage advice for key parameters.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Log an espresso or alternative shot/brew attempt' using a specific verb and resource, and it distinguishes from sibling tools like update_shot by being the creation action.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit guidance: use active context unless overridden, ALWAYS pass bean_id explicitly, backdate with pulled_at, include sensory_tags for taste reports. Missing explicit alternatives to other tools like log_recommendation, but clearly indicates when overrides are needed.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

register_coffee (C)

Parameters

Name	Required	Description
`ean`	No
`url`	No
`cost`	No
`name`	Yes
`state`	No
`origin`	No
`rating`	No
`co2e_kg`	No
`origins`	No	List of bean origin details
`qr_code`	No
`roaster`	No
`storage`	No
`variety`	No
`archived`	No	1 if archived, 0 otherwise
`bean_mix`	No	E.g. blend details
`buy_date`	No	Date in YYYY-MM-DD format
`currency`	No
`finished`	No	1 if finished, 0 otherwise
`aromatics`	No
`bag_notes`	No
`favourite`	No	1 if favourite, 0 otherwise
`frozen_at`	No	Datetime in YYYY-MM-DD HH:MM:SS format
`roast_date`	Yes	Date in YYYY-MM-DD format
`attachments`	No	JSON or comma-separated list of attachments
`frozen_note`	No
`opened_date`	No
`roast_level`	Yes
`roast_range`	No
`unfrozen_at`	No	Datetime in YYYY-MM-DD HH:MM:SS format
`bag_weight_g`	No	Bag weight in grams
`best_by_date`	No	Date in YYYY-MM-DD format
`process_type`	Yes
`roast_custom`	No
`decaffeinated`	No	1 if decaf, 0 if regular
`dial_category`	No	Dialing behavior class: classic = medium/traditional espresso roasts (chocolate/nut, balanced); dark = genuinely dark/roasty; the light categories cover Nordic and ultra-light styles.
`roasting_type`	No
`cupping_points`	No
`frozen_storage`	No
`rest_window_days_max`	No
`rest_window_days_min`	No

Tool Definition Quality

C (2.4/5.0)

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate a write operation (readOnlyHint: false) but not destructive. The description does not disclose behavioral traits beyond 'register', such as whether duplicates are checked, required fields, or side effects. It adds minimal value beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is very concise (one sentence) and front-loaded, but it is too brief to be fully useful. While conciseness is valued, the minimal content reduces its effectiveness.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 1/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the high complexity (40 parameters, no output schema), the description is grossly incomplete. It lacks information about return values, ordering of required fields, and overall behavior when registering a bean.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description provides no information about any of the 40 parameters. With schema description coverage at only 35%, the description fails to compensate by explaining key parameters like 'name', 'roast_date', 'roast_level', or 'process_type'.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('register') and the object ('a new bag of coffee beans'), distinguishing it from sibling tools like register_grinder or register_machine. However, it could be more specific about what registering entails, such as 'in the inventory'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

There is no guidance on when to use this tool versus alternatives like update_bean or list_beans. The description does not mention prerequisites, use cases, or when not to use it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

register_grinder (B)

Parameters

Name	Required	Description
`name`	Yes	The name/model of the grinder, e.g. Kinu M47
`notes`	No	Additional notes
`photo`	No	Photo path or URL
`archived`	No	1 if archived, 0 if active
`burr_type`	Yes
`max_value`	No	Optional travel max bounds
`min_value`	No	Optional travel min bounds
`motor_type`	Yes
`nominal_step`	Yes	Smallest adjustment increment (e.g. 1 click or 0.1 collar units)
`setting_scheme`	Yes	Collar mark style: single clicks or compound (rotation.number.clicks)
`components_spec`	No	JSON mapping string representing compound adjustment math
`microns_per_step`	No	Optional mechanical burr travel microns per nominal step
`direction_convention`	Yes
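The schema accepts a compound setting_scheme (rotation.number.clicks), a nominal_step, an optional microns_per_step and a components_spec described only as a "JSON mapping string". It does not say how these combine. The sketch below shows one plausible reading: a label such as "1.1.2" is flattened into a count of nominal steps. The spec keys numbers_per_rotation and clicks_per_number are hypothetical, because the real components_spec format is undocumented.

```python
import json


def label_to_steps(label: str, components_spec: str) -> int:
    """Convert a compound label such as "1.1.2" into a count of nominal steps.

    Assumes a hypothetical components_spec like
    {"numbers_per_rotation": 10, "clicks_per_number": 3}.
    """
    spec = json.loads(components_spec)
    per_rotation = spec["numbers_per_rotation"]
    per_number = spec["clicks_per_number"]
    rotation, number, clicks = (int(part) for part in label.split("."))
    if not 0 <= number < per_rotation or not 0 <= clicks < per_number:
        raise ValueError(f"label {label!r} is out of range for this collar")
    # One rotation spans per_rotation numbers; one number spans per_number clicks.
    return (rotation * per_rotation + number) * per_number + clicks


def steps_to_microns(steps: int, microns_per_step: float) -> float:
    """Approximate burr travel using the optional microns_per_step field."""
    return steps * microns_per_step


spec = '{"numbers_per_rotation": 10, "clicks_per_number": 3}'
steps = label_to_steps("1.1.2", spec)
print(steps, steps_to_microns(steps, 10.0))  # 35 350.0
```

Flattening to integer steps keeps arithmetic such as "two clicks finer" simple, whatever the collar's display format.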

Tool Definition Quality

B (3.1/5.0)

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations are neutral, but the description fails to disclose side effects, uniqueness constraints, or required permissions typical of a creation tool.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

A single sentence, concise and front-loaded with the action, though it could include key details without becoming bloated.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

There is no output schema, and the description omits the expected return value, error conditions, and which of the 13 parameters are critical, leaving the agent underinformed.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema covers most parameters with descriptions (77% coverage), and the description adds little beyond 'with its dial settings scheme'; baseline of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool registers a grinder with its dial settings scheme, distinguishing it from listing or math tools among siblings.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool versus alternatives like list_grinders or update functions; no mention of prerequisites or exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

register_machine (C)

Parameters

Name	Required	Description
`name`	Yes	Machine name/model, e.g. Dedica EC685
`notes`	No
`photo`	No	Photo path or URL
`tools`	No	JSON or comma-separated tools, e.g., paper filters, flow control, metal mesh
`archived`	No	1 if archived, 0 if active
`prep_type`	No	Preparation equipment type, e.g., espresso_machine, v60_dripper, French_press
`prep_style`	No	Preparation style category, e.g., espresso, filter, immersion
`boiler_type`	No
`control_type`	Yes
`connected_device`	No	Metadata mapping to smart hardware APIs
`basket_size_grams`	No
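The tools field above accepts "JSON or comma-separated" input. The attachments field of register_coffee does the same. Either way, a client has to normalize both forms to one list. This is a minimal sketch. It assumes the JSON form is an array of strings, which the schema does not specify.

```python
import json


def parse_tools(raw: str) -> list:
    """Accept either a JSON array or a comma-separated string."""
    text = raw.strip()
    if text.startswith("["):
        items = json.loads(text)  # e.g. '["paper filters", "flow control"]'
    else:
        items = text.split(",")   # e.g. 'paper filters, flow control'
    # Trim whitespace and drop empty entries so both forms normalize identically.
    return [str(item).strip() for item in items if str(item).strip()]


print(parse_tools('["paper filters", "flow control"]'))
print(parse_tools("paper filters, flow control, metal mesh"))
```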

Tool Definition Quality

C (2.6/5.0)

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description only says 'Register', implying a write operation, but adds no behavioral details beyond the annotations. Annotations indicate it is not read-only, not idempotent, and not destructive, but the description does not elaborate, e.g., whether it creates a new record, updates an existing one, or requires authentication.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single short sentence, which is concise but at the expense of essential details. It front-loads the purpose but omits usage, parameters, or behavior, making it insufficiently informative.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (11 parameters, no output schema), the description is incomplete. It does not explain the required fields (name, control_type), the meaning of 'alternative preparation machine', or any side effects. The agent lacks necessary context for correct invocation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With schema description coverage at 64%, the description adds no explanation of the 11 parameters. Agents must rely solely on the schema, which lacks descriptions for several parameters (e.g., notes, boiler_type, control_type, basket_size_grams). The description should have clarified critical parameters like prep_type or control_type.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool registers a machine and specifies 'espresso or alternative preparation machine', distinguishing it from other registration tools like register_grinder or register_coffee. However, it could be more specific about the types of machines covered.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus alternatives such as register_grinder or register_coffee. The description does not mention any prerequisites, contraindications, or preferred scenarios.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

register_program (A)

Register a programmed shot button on a machine. A program cuts the shot off either by weight (program_type "volumetric", set target_volume_g) or by time (program_type "timed", set target_time_s). One machine can have a mix of both.

Parameters

Name	Required	Description
`machine_id`	Yes
`description`	No
`program_type`	No	Cutoff method: "volumetric" (weight) or "timed" (duration). Defaults to volumetric.
`target_time_s`	No	Target shot duration cutoff in seconds (required for timed programs)
`volume_source`	No	Confidence in the stored cutoff value, for either type
`program_number`	Yes	Program number / button index (e.g. 1 or 2)
`target_volume_g`	No	Target yield cutoff in grams (required for volumetric programs)
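The description and table above define a small contract. program_type defaults to "volumetric". A volumetric program needs target_volume_g, and a timed program needs target_time_s. The sketch below shows what a client-side check of that contract could look like. Whether the server itself rejects a program that is missing its cutoff is not documented.

```python
def normalize_program(args: dict) -> dict:
    """Apply the documented default and check the cutoff field the program type needs."""
    program = dict(args)
    program_type = program.get("program_type") or "volumetric"  # documented default
    program["program_type"] = program_type
    if program_type == "volumetric":
        if program.get("target_volume_g") is None:
            raise ValueError("volumetric programs need target_volume_g (grams)")
    elif program_type == "timed":
        if program.get("target_time_s") is None:
            raise ValueError("timed programs need target_time_s (seconds)")
    else:
        raise ValueError(f"unknown program_type: {program_type!r}")
    return program


# Two buttons on one machine, one of each type (machine_id 1 is illustrative).
button_1 = normalize_program({"machine_id": 1, "program_number": 1, "target_volume_g": 36})
button_2 = normalize_program({"machine_id": 1, "program_number": 2,
                              "program_type": "timed", "target_time_s": 25})
print(button_1["program_type"], button_2["program_type"])  # volumetric timed
```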

Tool Definition Quality

A (4.2/5.0)

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate the tool is not readOnly and not destructive. The description adds context about program types and parameter dependencies but does not disclose behavior like whether registering an existing program_number overwrites or errors. More detail on side effects would improve transparency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three sentences, front-loading the purpose and the key distinction between program types. Every sentence is concise and meaningful, with no waste.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description explains the core logic of program types and their parameters, which is sufficient given the schema covers individual parameters. It could mention that program_number is a button index, but the schema already provides that. No output schema exists, so return values are not required.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 71%, and the description adds value by clarifying the default program_type ('volumetric') and the conditional requirement of target_volume_g or target_time_s. Parameters like volume_source are not elaborated, but the schema already describes them.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool registers a programmed shot button on a machine, distinguishing between volumetric and timed program types. The verb 'register' is specific, and the mention of mixing both types on one machine differentiates from sibling tools like delete_shot or log_shot.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly explains when to use the tool and which parameters to set based on program_type (target_volume_g or target_time_s). However, it does not mention prerequisites (e.g., machine must exist) or explicitly state when not to use it, though the context is clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

register_water (B)

Parameters

Name	Required	Description
`gh`	No	General hardness (GH) in ppm
`kh`	No	Carbonate hardness (KH) in ppm
`tds`	No	TDS in ppm
`name`	Yes	Water formulation name, e.g. Lotus Light & Bright, Third Wave Water
`type`	No	Type, e.g., mineralized, tap, distilled, RO
`notes`	No	Additional notes
`sodium`	No	Sodium concentration in ppm
`calcium`	No	Calcium concentration in ppm
`magnesium`	No	Magnesium concentration in ppm
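The schema takes gh alongside calcium and magnesium, all in ppm, but never says how they relate. Under one common convention, the standard molar-mass conversion lets a client flag inconsistent entries. This sketch assumes that convention: gh is expressed as ppm CaCO3, while calcium and magnesium are ppm of the ions themselves. The schema does not confirm this, and the 10% tolerance is an arbitrary choice.

```python
# Molar masses in g/mol.
CACO3 = 100.09
CALCIUM = 40.08
MAGNESIUM = 24.31


def gh_as_caco3(calcium_ppm: float, magnesium_ppm: float) -> float:
    """General hardness as ppm CaCO3 from Ca2+ and Mg2+ concentrations (assumed units)."""
    return calcium_ppm * CACO3 / CALCIUM + magnesium_ppm * CACO3 / MAGNESIUM


def gh_consistent(gh: float, calcium_ppm: float, magnesium_ppm: float,
                  tolerance: float = 0.1) -> bool:
    """True if the stored gh is within a relative tolerance of the computed value."""
    expected = gh_as_caco3(calcium_ppm, magnesium_ppm)
    return abs(gh - expected) <= tolerance * max(expected, 1.0)


print(round(gh_as_caco3(10.0, 0.0), 2))  # 24.97
```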

Tool Definition Quality

B (3.1/5.0)

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations show non-readonly and non-destructive, but the description adds no behavioral details (e.g., whether duplicate names are allowed, if it overwrites, or what happens on success). No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence, no wasted words. Could be slightly expanded without becoming verbose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With 9 parameters and no output schema, the description is minimal. Does not explain return value, validation rules, or behavior for duplicates. Incomplete for a registration tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so baseline is 3. The description adds nothing beyond the schema; it does not elaborate on parameter meaning or usage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action 'register' and the resource 'water formulation', distinguishing it from siblings like register_coffee or register_grinder. It covers both custom and commercial types.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives (e.g., list_waters to view existing waters). No exclusion criteria or context provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

set_active (C)

Idempotent

Set active dial profile context.

Parameters

Name	Required	Description
`bean_id`	No	null = leave unchanged
`grinder_id`	No	null = leave unchanged
`machine_id`	No	null = leave unchanged
`program_id`	No	null = leave unchanged
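The only documented semantics of set_active are the per-parameter notes above: null means "leave unchanged". The sketch below models that as a partial merge into the current context. It is an in-memory illustration, not the server's code.

```python
ACTIVE_KEYS = ("bean_id", "grinder_id", "machine_id", "program_id")


def apply_set_active(current: dict, **updates) -> dict:
    """Merge a set_active call into the current context; None means "leave unchanged"."""
    unknown = set(updates) - set(ACTIVE_KEYS)
    if unknown:
        raise TypeError(f"unexpected keys: {sorted(unknown)}")
    merged = dict(current)
    for key in ACTIVE_KEYS:
        value = updates.get(key)
        if value is not None:
            merged[key] = value
    return merged


state = {"bean_id": 1, "grinder_id": 2, "machine_id": 3, "program_id": None}
state = apply_set_active(state, bean_id=5, grinder_id=None)
print(state)  # {'bean_id': 5, 'grinder_id': 2, 'machine_id': 3, 'program_id': None}
```

Two consequences follow under this reading. First, repeating a call leaves the state unchanged, which matches the idempotent annotation. Second, null cannot be used to clear a slot, so the description would need to say how, or whether, a context field can be unset.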

Tool Definition Quality

C (2.4/5.0)

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate idempotency (idempotentHint=true) and non-destructiveness (destructiveHint=false). Description verb 'Set' aligns with readOnlyHint=false but adds no new behavioral context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence is concise but under-specified for a tool with 4 parameters and many siblings. Sacrifices completeness for brevity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 4 optional parameters, no output schema, and a vague description, the documentation is incomplete. The agent cannot determine how to invoke the tool correctly or what 'dial profile context' entails.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0% and description does not explain any of the 4 parameters (bean_id, grinder_id, machine_id, program_id). Their purpose and usage are completely unspecified.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 3/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description states verb 'Set' and resource 'active dial profile context,' but the term is vague and doesn't distinguish from sibling tools like 'set_equipment_archived' or 'set_grinder_position.'

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives. No exclusions, prerequisites, or examples provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

set_equipment_archived (B)

Idempotent

Archive or restore a grinder, machine, or water.

Parameters

Name	Required	Description
`id`	Yes	The ID of the equipment
`kind`	Yes	The kind of equipment to archive/restore
`archived`	Yes	1 to archive, 0 to restore
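The idempotent annotation follows from the parameter contract above: the call assigns the archived flag rather than toggling it. The sketch uses an in-memory store keyed by (kind, id) as a stand-in for the server's database. The lowercase kind values "grinder", "machine" and "water" are assumed from the description; the schema does not list them.

```python
VALID_KINDS = ("grinder", "machine", "water")  # assumed enum values


def set_equipment_archived(store: dict, kind: str, equipment_id: int, archived: int) -> dict:
    """Set the archived flag on one record in a hypothetical in-memory store."""
    if kind not in VALID_KINDS:
        raise ValueError(f"kind must be one of {VALID_KINDS}")
    if archived not in (0, 1):
        raise ValueError("archived must be 1 (archive) or 0 (restore)")
    record = store[(kind, equipment_id)]  # KeyError if the equipment does not exist
    record["archived"] = archived  # assigning, not toggling, makes repeats harmless
    return record


store = {("grinder", 4): {"name": "Kinu M47", "archived": 0}}
set_equipment_archived(store, "grinder", 4, 1)
set_equipment_archived(store, "grinder", 4, 1)  # repeat: same result
print(store[("grinder", 4)])  # {'name': 'Kinu M47', 'archived': 1}
```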

Tool Definition Quality

B (3.4/5.0)

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description discloses the archive/restore behavior, and annotations already provide idempotentHint=true. It adds minimal new information beyond annotations, so a 3 is appropriate.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, clear sentence with no wasted words. It is appropriately sized for a simple tool.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple tool with 3 parameters and no output schema, the description adequately covers the action and object types. It does not explain return values, but that is acceptable given the tool's simplicity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the schema documents all parameters fully. The description adds no meaning beyond the schema, so the baseline of 3 is correct.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action (archive or restore) and the resource types (grinder, machine, or water). However, it could be more precise by explicitly stating it sets the archived flag. It distinguishes from sibling tools like set_active or set_grinder_position.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool, when not, or what alternatives exist. Given the sibling tools include list_grinders and register_grinder, such context would be helpful.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

set_grinder_position (A)

Idempotent

Set the current grinder collar position. source="measured" resets verification freshness.

Parameters

Name	Required	Description
`source`	Yes	Whether setting is verified ("measured") or guess ("recalled"/"assumed")
`grinder_id`	Yes
`setting_label`	Yes	Verbatim display label, e.g. "1.1.2"
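The review notes that "verification freshness" is never explained. One way to read `source="measured" resets verification freshness` is to keep a timestamp that only measured updates refresh. The sketch below follows that reading. The verified_at field and the 7-day max_age are hypothetical. Recalled and assumed updates change the label but leave the verification time untouched.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

VALID_SOURCES = ("measured", "recalled", "assumed")


@dataclass
class GrinderPosition:
    setting_label: str
    source: str
    verified_at: Optional[datetime] = None  # hypothetical: time of last "measured" update


def set_grinder_position(pos: GrinderPosition, setting_label: str, source: str,
                         now: datetime) -> GrinderPosition:
    if source not in VALID_SOURCES:
        raise ValueError(f"source must be one of {VALID_SOURCES}")
    pos.setting_label = setting_label
    pos.source = source
    if source == "measured":
        pos.verified_at = now  # only a measured reading resets freshness
    return pos


def is_fresh(pos: GrinderPosition, now: datetime,
             max_age: timedelta = timedelta(days=7)) -> bool:
    """Hypothetical rule: fresh if verified within max_age (7 days is arbitrary)."""
    return pos.verified_at is not None and now - pos.verified_at <= max_age


t0 = datetime(2026, 7, 1, 9, 0)
pos = set_grinder_position(GrinderPosition("1.1.0", "assumed"), "1.1.2", "measured", t0)
print(is_fresh(pos, t0 + timedelta(days=3)))  # True
```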

Tool Definition Quality

A (3.9/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide idempotentHint=true and destructiveHint=false; the description adds context about source='measured' resetting verification freshness, which is useful behavioral detail beyond what annotations provide.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, front-loaded with main purpose, no redundancy. Every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple set operation with no output schema, the description covers core behavior but omits return values, prerequisites (e.g., grinder existence), and explanation of 'verification freshness'. Adequate but leaves some gaps for an agent.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 67%: two of the three parameters have schema descriptions. Beyond stating the role of source='measured', the description adds nothing for grinder_id, which lacks a schema description. Adequate, but it does not fully compensate for the gap.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool sets the grinder collar position, adding a specific behavioral note about source='measured' resetting verification freshness. It distinguishes from siblings as no other tool sets this position.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for setting grinder position but provides no explicit guidance on when to use vs alternatives or prerequisites. Siblings include get_dial_state for reading, but no exclusion criteria are given.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

suggest_next_step (B)

Read-only

Retrieve the single next experiment step from the reasoning engine.

Parameters

Name	Required	Description	Default
`bean_id`	Yes
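Like every tool on this server, suggest_next_step is invoked through an MCP `tools/call` request, which is a JSON-RPC 2.0 message. The sketch below builds that request body. bean_id is the tool's only parameter and is undocumented. Treating it as the integer ID of a bag created with register_coffee is an assumption. Because the server uses the Streamable HTTP transport, a client would POST this body to the server URL after completing the MCP initialize handshake.

```python
import json


def tools_call_request(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request (JSON-RPC 2.0)."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })


print(tools_call_request(1, "suggest_next_step", {"bean_id": 12}))
```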

Tool Definition Quality

B (3.3/5.0)

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and openWorldHint=false. Description aligns with 'Retrieve' but adds no behavioral details beyond what annotations provide. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

A single, concise sentence that front-loads the purpose. No extraneous words or redundant information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the single parameter with no documentation and lack of output schema, the description should provide more context on parameter usage, return behavior, and when to call this function. It fails to inform the agent adequately.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema has 0% coverage of parameter descriptions. The description does not explain what 'bean_id' means or its role, leaving the agent to infer from the name alone. This is insufficient for unambiguous invocation.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the verb 'Retrieve', the specific resource 'single next experiment step', and the source 'reasoning engine'. It distinguishes itself from sibling tools like diagnose_shot or get_stats by focusing on experiment step retrieval.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus siblings. The description does not indicate prerequisites, scenarios, or exclusions. A single sentence with no usage context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

update_bean (C)

Idempotent

Update coffee bag fields with structured reasoning (G5, C10).

Parameters

Name	Required	Description
`fields`	Yes
`reason`	Yes	A clear justification of why this category/roast level is being updated.
`bean_id`	Yes
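update_bean takes a nested fields object plus a mandatory reason, but neither is documented beyond the one-line schema note. The sketch below assembles the arguments and refuses empty changes, unknown keys and blank reasons. The UPDATABLE set is hypothetical: it is a subset of column names from the register_coffee table, and which fields the server actually accepts is not documented.

```python
# Hypothetical subset of register_coffee column names; the real updatable set is undocumented.
UPDATABLE = {"name", "roast_level", "roast_date", "process_type", "dial_category",
             "rating", "favourite", "finished", "archived", "bag_notes"}


def build_update_bean_args(bean_id: int, fields: dict, reason: str) -> dict:
    """Assemble update_bean arguments, refusing empty changes, unknown keys or blank reasons."""
    if not fields:
        raise ValueError("fields must contain at least one change")
    unknown = set(fields) - UPDATABLE
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    if not reason or not reason.strip():
        raise ValueError("reason must explain why the change is being made")
    return {"bean_id": bean_id, "fields": dict(fields), "reason": reason.strip()}


args = build_update_bean_args(
    12, {"dial_category": "dark"},
    "Shots taste roasty at every grind setting; the roast dials like a dark roast.",
)
print(args["fields"])  # {'dial_category': 'dark'}
```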

Tool Definition Quality

C (2.8/5.0)

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate idempotentHint=true, readOnlyHint=false, destructiveHint=false. The description adds no further behavioral context beyond 'update', which is consistent. No contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence, efficient but lacking structure. The parenthetical '(G5, C10)' is not self-explanatory, which reduces clarity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With a complex nested 'fields' parameter and no output schema, the description fails to explain the update effect, default values, or return value. Incomplete for safe usage.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description does not explain any parameters. Schema coverage is 33%, meaning only 1 of 3 top-level parameters has a schema description, and the tool description adds no value for 'bean_id', 'fields', or 'reason'.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it updates coffee bag fields, distinguishing it from create tools like register_coffee. The parenthetical '(G5, C10)' is cryptic but does not obscure the overall purpose.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

There is no guidance on when to use this tool rather than alternatives such as update_shot or register_coffee, and no exclusions or prerequisites are mentioned.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

update_shot (A)

Idempotent


Correct fields on an already-logged shot in place — no need to delete and re-log. Use for fixing a wrong dose/yield/time or grind label, re-filing a shot onto the right bean (bean_id), backfilling rating/tasting notes, or fixing the timestamp (pulled_at). Changing grind_label re-derives the numeric grind position from the shot's grinder; changing yield/time/dose/tds keeps flow rate and extraction yield consistent automatically.
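The description does not spell out how flow rate and extraction yield are computed. The conventional espresso definitions are flow rate = yield ÷ time (g/s) and extraction yield (%) = yield × TDS% ÷ dose. The Python sketch below assumes those formulas and uses invented field names (`dose_g`, `yield_g`, `time_s`, `tds_pct`). It illustrates one way to keep derived values consistent after a partial update, not ShotPulled's actual implementation. Grind-label re-derivation is left out because the grinder mapping is not documented.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Shot:
    dose_g: float                    # dry coffee in, grams
    yield_g: float                   # beverage out, grams
    time_s: float                    # shot time, seconds
    tds_pct: Optional[float] = None  # refractometer reading, percent

    @property
    def flow_rate_g_per_s(self) -> float:
        # Average flow over the whole shot.
        return self.yield_g / self.time_s

    @property
    def extraction_yield_pct(self) -> Optional[float]:
        # EY% = beverage mass x TDS% / dose; undefined without a TDS reading.
        if self.tds_pct is None:
            return None
        return self.yield_g * self.tds_pct / self.dose_g


def apply_update(shot: Shot, fields: dict) -> Shot:
    """Return a copy with only the given fields changed (like update_shot's `fields`)."""
    return replace(shot, **fields)


shot = Shot(dose_g=18.0, yield_g=36.0, time_s=30.0, tds_pct=9.0)
corrected = apply_update(shot, {"yield_g": 40.0})  # fix a mis-typed yield
print(shot.extraction_yield_pct, corrected.extraction_yield_pct)  # 18.0 20.0
print(round(corrected.flow_rate_g_per_s, 2))                       # 1.33
```

Deriving the metrics on read, rather than storing them, means they cannot drift from the inputs; a server that stores them would instead have to recompute them on every write. `dataclasses.replace` also raises `TypeError` for an unknown field name, which loosely mirrors rejecting fields the schema does not define.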

Parameters (JSON Schema)

Name	Required	Description	Default
`fields`	Yes	Only the fields to change
`shot_id`	Yes	ID of the shot to correct
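The two parameters map directly onto the `arguments` of an MCP `tools/call` request. The Python sketch below builds only the JSON-RPC 2.0 message body. It assumes a session has already been initialized over the Streamable HTTP transport (not shown), and the shot ID and grind label values are invented for illustration.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC request IDs must be unique within a session


def build_tool_call(name: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(message)


# Hypothetical values: the shot ID format and the grind label are made up.
request = build_tool_call(
    "update_shot",
    {
        "shot_id": "shot_123",
        # Only the fields being corrected; the rest of the shot is left alone.
        "fields": {"grind_label": "12", "pulled_at": "2026-07-25T08:15:00Z"},
    },
)
print(request)
```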

Tool Definition Quality

A (4.5/5.0)

Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate idempotentHint=true and destructiveHint=false. The description adds behavioral details beyond annotations: changing grind_label re-derives numeric grind position, and changing yield/time/dose/tds keeps flow rate and extraction yield consistent automatically. No contradictions.
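Annotations are only hints, so a client has to fall back on the spec defaults whenever one is omitted. The sketch below merges declared annotations over the defaults given for `ToolAnnotations` in the MCP specification (readOnlyHint false, destructiveHint true, idempotentHint false, openWorldHint true). It then applies a hypothetical retry heuristic. The `update_shot` entry is an abbreviated tools/list record built from the parameter table and the two hints cited above. `safe_to_retry` and the unannotated tool name are illustrative and not part of any API.

```python
# Defaults defined for ToolAnnotations in the MCP specification.
ANNOTATION_DEFAULTS = {
    "readOnlyHint": False,
    "destructiveHint": True,
    "idempotentHint": False,
    "openWorldHint": True,
}


def effective_annotations(tool: dict) -> dict:
    """Merge a tool's declared annotations over the spec defaults."""
    return {**ANNOTATION_DEFAULTS, **tool.get("annotations", {})}


def safe_to_retry(tool: dict) -> bool:
    """Hypothetical heuristic: a repeated call is harmless if the tool is read-only,
    or idempotent and non-destructive. Hints are not guarantees."""
    hints = effective_annotations(tool)
    if hints["readOnlyHint"]:
        return True
    return hints["idempotentHint"] and not hints["destructiveHint"]


# Abbreviated, hypothetical tools/list entry for update_shot.
update_shot = {
    "name": "update_shot",
    "inputSchema": {
        "type": "object",
        "properties": {
            "shot_id": {"description": "ID of the shot to correct"},
            "fields": {"type": "object", "description": "Only the fields to change"},
        },
        "required": ["shot_id", "fields"],
    },
    "annotations": {"idempotentHint": True, "destructiveHint": False},
}

print(safe_to_retry(update_shot))                    # True
print(safe_to_retry({"name": "unannotated_tool"}))   # False: defaults apply
```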

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three sentences long, starting with the core purpose, then listing use cases, then explaining side effects. Every sentence adds value with no redundancy or fluff.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description covers purpose and behavior well but says nothing about the return value. Since there is no output schema, the agent would benefit from knowing what the tool returns (e.g., the updated shot object). Otherwise the description is complete for its use case.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions for all parameters. The description adds meaning by categorizing use cases (fixing dose, re-filing, backfilling) and explaining the automatic effects of modifying specific fields like grind_label and yield/time/dose/tds, which are not in the schema descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description opens with 'Correct fields on an already-logged shot in place', which names a specific verb ('correct', i.e. update) and resource ('shot'). It also sets the tool apart from the delete-and-re-log alternative offered by the sibling tools delete_shot and log_shot, so the purpose is specific and unambiguous.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description lists explicit use cases: fixing a wrong dose/yield/time or grind label, re-filing a shot onto the right bean, backfilling rating/tasting notes, and fixing the timestamp. It directly states 'no need to delete and re-log', indicating when not to use the alternatives. It also explains the automatic re-derivation and consistency behaviors, providing clear guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

Claim this connector by publishing a /.well-known/glama.json file on your server's domain with the following structure:

{
  "$schema": "https://glama.ai/mcp/schemas/connector.json",
  "maintainers": [{ "email": "your-email@example.com" }]
}

The email address must match the email associated with your Glama account. Once published, Glama will automatically detect and verify the file within a few minutes.
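If the server's domain serves static files from a directory, the claim file can be generated with a few lines of Python. This sketch assumes such a web root (the `public` directory in the usage line is a placeholder) and only checks that the email contains an `@`. Glama's own verification still decides whether the address matches your account.

```python
import json
from pathlib import Path


def write_glama_claim(web_root: Path, email: str) -> Path:
    """Write .well-known/glama.json under a static web root and return its path."""
    if "@" not in email:
        raise ValueError(f"not an email address: {email!r}")
    manifest = {
        "$schema": "https://glama.ai/mcp/schemas/connector.json",
        "maintainers": [{"email": email}],
    }
    target = web_root / ".well-known" / "glama.json"
    target.parent.mkdir(parents=True, exist_ok=True)  # create .well-known if missing
    target.write_text(json.dumps(manifest, indent=2) + "\n", encoding="utf-8")
    return target


# Usage (placeholder web root and address):
# write_glama_claim(Path("public"), "your-email@example.com")
```

After deployment, the file has to be reachable at `/.well-known/glama.json` on the server's domain, as described above.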
