Glama
Ownership verified

Server Details

AI-first app deployment. Unlike Lovable or Figma Make, webslop.ai lets you or your AI of choice set up Node.js apps or static sites in seconds. It is designed to be the perfect place to deploy websites and apps to the rest of the world fast, and it has a generous free tier.

Status
Healthy
Last Tested
Transport
Streamable HTTP
URL

Glama MCP Gateway

Connect through Glama MCP Gateway for full control over tool access and complete visibility into every call.

MCP client → Glama → MCP server

Full call logging

Every tool call is logged with complete inputs and outputs, so you can debug issues and audit what your agents are doing.

Tool access control

Enable or disable individual tools per connector, so you decide what your agents can and cannot do.

Managed credentials

Glama handles OAuth flows, token storage, and automatic rotation, so credentials never expire on your clients.

Usage analytics

See which tools your agents call, how often, and when, so you can understand usage patterns and catch anomalies.

100% free. Your data is private.

Tool Definition Quality

Score is being calculated. Check back soon.

Available Tools

37 tools
check_login (Grade: A)

Check if the device login code has been approved. Call this after the user enters the code on the website. Returns the login status.

Parameters (JSON Schema)
code (required): The 6-digit device code from login

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It discloses that the tool returns login status, which is useful behavioral context. However, it lacks details on error handling, rate limits, authentication needs, or what specific status values mean. For a tool with no annotations, this is adequate but has gaps in behavioral disclosure.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences, front-loaded with the core purpose, followed by usage timing and return value. Every sentence earns its place by adding value: the first states what it does, the second when to use it and what it returns. No wasted words, appropriately sized for the tool's complexity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's low complexity (1 parameter, no output schema, no annotations), the description is reasonably complete. It covers purpose, usage timing, and return value. However, without annotations or output schema, it could benefit from more detail on status values or error cases, but it's sufficient for basic use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% description coverage, with the 'code' parameter documented as 'The 6-digit device code from login.' The description adds no additional parameter semantics beyond what the schema provides. According to the scoring rules, with high schema coverage (>80%) the baseline is 3 even when the description adds no parameter info.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Check if the device login code has been approved' specifies the verb (check) and resource (login status). It distinguishes from siblings like 'login' (which likely initiates login) by focusing on verification after code entry. However, it doesn't explicitly contrast with all siblings, so it's not a perfect 5.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for when to use it: 'Call this after the user enters the code on the website.' This gives a specific timing guideline. It doesn't explicitly state when not to use it or name alternatives, but the context is sufficient for most cases, missing only exclusions or direct sibling comparisons.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
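As a concrete illustration, an MCP `tools/call` request invoking this tool might look like the sketch below. The JSON-RPC envelope follows the MCP specification's `tools/call` method; only the tool name and the required `code` parameter come from the listing above, and the request id and code value are made up.

```python
import json

# Hypothetical JSON-RPC 2.0 request an MCP client would send to invoke
# check_login. Only the tool name and the "code" parameter come from
# the tool listing; the id and the code value are illustrative.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "check_login",
        "arguments": {"code": "123456"},  # the 6-digit device code
    },
}

print(json.dumps(request))
```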

create_app (Grade: B)

Create a new Bonto app. It starts running immediately and is accessible at {subdomain}.bonto.run.

Parameters (JSON Schema)
name (required): Display name for the app
subdomain (required): Unique subdomain (3-30 chars, lowercase alphanumeric + hyphens, cannot start/end with hyphen)

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It adds useful context beyond the schema: the app starts running immediately and is accessible at {subdomain}.bonto.run. However, it doesn't cover important behavioral aspects like permissions needed, whether creation is reversible, error conditions, or what happens if the subdomain is taken. For a creation tool with zero annotation coverage, this leaves significant gaps.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that states the core action and two key behavioral outcomes. Every word earns its place - there's no redundancy or unnecessary elaboration. It's appropriately sized for a creation tool with two parameters.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given this is a creation tool with no annotations and no output schema, the description should do more to compensate. While it covers the basic action and some behavioral outcomes (immediate start, accessibility), it doesn't address what the tool returns, error conditions, or important constraints. The 100% schema coverage helps, but for a mutation tool, more contextual information would be beneficial.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, with both parameters ('name' and 'subdomain') well-documented in the schema. The description doesn't add any parameter-specific information beyond what's in the schema. According to guidelines, when schema coverage is high (>80%), the baseline is 3 even with no param info in the description, which applies here.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Create a new Bonto app') and the resource ('Bonto app'), making the purpose immediately understandable. It distinguishes from siblings like 'update_app' or 'list_apps' by specifying creation. However, it doesn't explicitly differentiate from 'get_app' or 'delete_app' beyond the verb, which keeps it from a perfect score.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives like 'update_app' or 'list_apps'. It mentions the app starts running immediately and is accessible at a subdomain, but this is behavioral information rather than usage context. There are no explicit when/when-not instructions or prerequisites stated.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
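The subdomain constraint in the schema (3-30 chars, lowercase alphanumeric plus hyphens, no leading or trailing hyphen) can be expressed as a single regex. This client-side pre-check is a sketch derived from the documented rule, not part of the server's published API:

```python
import re

# Mirrors the documented subdomain rule: 3-30 characters, lowercase
# alphanumeric plus hyphens, and the first/last character must not be
# a hyphen. 1 + {1,28} + 1 characters gives the 3-30 length range.
SUBDOMAIN_RE = re.compile(r"[a-z0-9][a-z0-9-]{1,28}[a-z0-9]")

def is_valid_subdomain(s: str) -> bool:
    return SUBDOMAIN_RE.fullmatch(s) is not None
```

Validating locally avoids a round trip for a request the server would reject anyway.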

create_directory (Grade: A)

Create a directory in an app. Parent directories are created automatically.

Parameters (JSON Schema)
app (required): App ID (UUID) or subdomain
path (required): Directory path to create (e.g. "src/components")

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden of behavioral disclosure. It adds value by revealing that 'Parent directories are created automatically,' which is a key behavioral trait not inferable from the schema alone. However, it lacks details on permissions required, error conditions (e.g., invalid paths), or what happens if the directory already exists. For a mutation tool with zero annotation coverage, this leaves gaps in transparency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences, front-loaded with the core purpose and followed by a critical behavioral detail. Every sentence earns its place by adding value: the first defines the tool's action, and the second clarifies automatic parent creation. There is no wasted text, making it highly efficient and well-structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (a mutation operation with 2 parameters), lack of annotations, and no output schema, the description is moderately complete. It covers the basic purpose and a key behavioral trait but omits details like return values, error handling, or security implications. For a create operation in a system with many sibling tools, more context would be beneficial to ensure safe and correct usage.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, with clear descriptions for both parameters ('app' as 'App ID (UUID) or subdomain' and 'path' as 'Directory path to create'). The description adds no additional parameter semantics beyond what the schema provides, such as format examples or constraints. With high schema coverage, the baseline score of 3 is appropriate, as the description doesn't compensate but also doesn't detract.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Create a directory') and the target ('in an app'), which is specific and unambiguous. It distinguishes this from sibling tools like 'create_app' (which creates an app rather than a directory) and 'list_files' (which lists rather than creates). However, it doesn't explicitly differentiate from 'write_file' (which might create files) or 'upload_file' (which uploads files), so it's not fully sibling-differentiated.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage context by stating 'Parent directories are created automatically,' which suggests this tool handles nested directory creation without requiring separate calls. However, it doesn't provide explicit guidance on when to use this versus alternatives like 'write_file' (for files) or 'create_app' (for apps), nor does it mention prerequisites (e.g., needing an existing app). The guidance is useful but incomplete.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
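The "parent directories are created automatically" behavior matches `mkdir -p` semantics. A local Python analogy (an illustration of the semantics, not the server's implementation, and whether the server treats an already-existing directory as an error is not documented):

```python
import pathlib
import tempfile

# Local analogy for create_directory: asking for "src/components"
# also creates "src" if it does not exist yet (mkdir -p semantics).
root = pathlib.Path(tempfile.mkdtemp())
target = root / "src" / "components"
# exist_ok=True here is a local choice; the server's behavior on an
# existing directory is not documented in the tool description.
target.mkdir(parents=True, exist_ok=True)
```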

delete_app (Grade: A)

DESTRUCTIVE: Permanently delete an app, its Docker service, volume, and all data including version history. This cannot be undone. You MUST confirm with the user before calling this tool.

Parameters (JSON Schema)
app (required): App ID (UUID) or subdomain

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden and excels by disclosing critical behavioral traits: it's destructive ('DESTRUCTIVE: Permanently delete'), irreversible ('This cannot be undone'), and has a prerequisite action ('You MUST confirm with the user'). This covers safety, permanence, and user interaction requirements.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is front-loaded with the key warning ('DESTRUCTIVE'), followed by specifics of what gets deleted, and ends with the mandatory confirmation requirement. Every sentence earns its place with no wasted words, making it highly efficient and well-structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a destructive tool with no annotations or output schema, the description is complete: it explains the action, scope, irreversibility, and user confirmation requirement. Given the high-stakes nature, it provides all necessary context for safe and correct usage without needing to detail return values.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% description coverage for its single parameter ('app'), so the baseline is 3. The description adds value by implicitly clarifying that 'app' refers to a deletable entity with associated resources (Docker service, volume, data), enhancing understanding beyond the schema's technical definition.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('permanently delete') and resource ('an app, its Docker service, volume, and all data including version history'), distinguishing it from sibling tools like 'remove_file' or 'restore_version' which handle different resources or less destructive actions.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

It provides explicit guidance on when to use this tool ('You MUST confirm with the user before calling this tool') and implies alternatives through sibling tools like 'stop_app' or 'restore_version' for less destructive actions, though it doesn't name specific alternatives directly.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
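The "you MUST confirm with the user" requirement is a calling convention for the agent, not something the schema can enforce. A client-side guard might look like the sketch below; the `confirmed` flag, the exception, and the `call_tool` callable are all hypothetical:

```python
def guarded_delete_app(app: str, confirmed: bool, call_tool):
    """Refuse to invoke the destructive delete_app tool unless the
    caller has explicitly recorded user confirmation.

    call_tool is a hypothetical callable standing in for the real
    MCP tools/call; nothing here comes from the server's API.
    """
    if not confirmed:
        raise PermissionError("delete_app requires explicit user confirmation")
    return call_tool("delete_app", {"app": app})
```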

delete_file (Grade: A)

DESTRUCTIVE: Delete a file or directory in an app. Directories are deleted recursively. Cannot delete package.json or node_modules. You MUST confirm with the user before deleting directories.

Parameters (JSON Schema)
app (required): App ID (UUID) or subdomain
path (required): File or directory path to delete

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries full burden and does well by disclosing destructive nature, recursive deletion behavior, restrictions on specific files/directories, and user confirmation requirement. It doesn't mention error conditions or response format, keeping it from a perfect score.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences with zero waste: first states purpose and behavior, second adds restrictions, third provides critical usage guideline. Each sentence earns its place and is front-loaded with the most important information ('DESTRUCTIVE').

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a destructive tool with no annotations and no output schema, the description provides strong behavioral context and usage constraints. It could mention what happens on success/failure or return values, but covers the essential safety aspects well given the complexity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents both parameters adequately. The description doesn't add meaning beyond what the schema provides about 'app' and 'path' parameters, maintaining the baseline score.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'Delete' and the resource 'a file or directory in an app', specifying it's destructive and includes recursive deletion for directories. It distinguishes from siblings like 'delete_app' by focusing on files/directories within apps.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states when NOT to use this tool ('Cannot delete package.json or node_modules') and provides a mandatory prerequisite ('You MUST confirm with the user before deleting directories'). It also implies alternatives like 'delete_app' for app-level deletion.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
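The "cannot delete package.json or node_modules" restriction can be mirrored client-side to avoid a round trip that is guaranteed to fail. The sketch below is a conservative reading of the documented rule, not the server's actual validation; whether the restriction also covers nested paths like `packages/foo/package.json` is not documented, so this check treats any matching path component as protected:

```python
import pathlib

# Names the listing says delete_file refuses to remove. Treating any
# path component with these names as protected is an assumption; the
# server's exact matching rule is not documented.
PROTECTED = {"package.json", "node_modules"}

def is_deletable(path: str) -> bool:
    parts = pathlib.PurePosixPath(path).parts
    return not any(part in PROTECTED for part in parts)
```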

get_app (Grade: B)

Get detailed information about a specific app including its Docker service status.

Parameters (JSON Schema)
app (required): App ID (UUID) or subdomain

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries full burden. It mentions 'Docker service status' as part of the returned information, which adds some behavioral context beyond a simple read. However, it lacks details on permissions needed, error conditions, rate limits, or what 'detailed information' entails beyond Docker status, making it insufficient for a mutation-free but context-rich tool.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that front-loads the core purpose ('Get detailed information about a specific app') and adds a specific detail ('including its Docker service status'). There is no wasted verbiage, and it's appropriately sized for a simple lookup tool.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given one parameter with full schema coverage and no output schema, the description is minimally adequate. It specifies the tool returns 'detailed information' including 'Docker service status', which helps set expectations. However, without annotations or output schema, it lacks details on response format, error handling, or completeness of information, leaving gaps for a tool in a complex app management context.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, with the single parameter 'app' documented as 'App ID (UUID) or subdomain'. The description adds no additional parameter semantics beyond what the schema provides, such as examples or constraints on the 'app' value. Baseline 3 is appropriate since the schema fully covers the parameter.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'Get' and resource 'detailed information about a specific app', including the specific detail 'Docker service status'. It distinguishes from siblings like 'get_app_status' by specifying it provides detailed information, though it doesn't explicitly contrast with 'list_apps' or 'get_app_metrics'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives like 'get_app_status' (which might provide only status), 'get_app_metrics' (which might provide metrics), or 'list_apps' (which lists multiple apps). There's no mention of prerequisites or exclusions, leaving usage context implied at best.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
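Since the `app` parameter accepts either a UUID or a subdomain in the same field, a client can tell the two forms apart before building a call. This helper is purely a client-side convenience sketch; the server accepts either form directly:

```python
import uuid

def classify_app_ref(value: str) -> str:
    """Return "uuid" if value parses as a UUID, else "subdomain".

    A local convenience only: the get_app tool takes either form in
    its single "app" parameter, so no conversion is required.
    """
    try:
        uuid.UUID(value)
        return "uuid"
    except ValueError:
        return "subdomain"
```

A 32-hex-digit string without dashes also parses as a UUID, but it cannot collide with a subdomain, since subdomains are capped at 30 characters.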

get_app_metrics (Grade: B)

Get CPU, memory, and network stats for a running app container.

Parameters (JSON Schema)
app (required): App ID (UUID) or subdomain

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It states what data is retrieved but doesn't mention permissions needed, rate limits, whether it's real-time or historical, or the format of returned stats. For a metrics tool with zero annotation coverage, this is a significant gap in transparency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that front-loads the core purpose ('Get CPU, memory, and network stats') and specifies the target ('for a running app container'). There is no wasted verbiage, making it highly concise and well-structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no annotations and no output schema, the description is incomplete for a tool that retrieves metrics. It lacks details on return format (e.g., numeric values, timestamps), error conditions, or behavioral aspects like polling intervals. For a monitoring tool with rich data output, this leaves critical gaps in understanding.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, with the single parameter 'app' documented as 'App ID (UUID) or subdomain.' The description adds no additional parameter details beyond what the schema provides, such as examples or constraints, so it meets the baseline for high schema coverage without compensating further.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('Get') and resources ('CPU, memory, and network stats for a running app container'), distinguishing it from siblings like get_app (general app info) or get_logs (log data). It precisely identifies what metrics are retrieved and the target resource.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage by specifying 'for a running app container,' suggesting it's for monitoring active apps, but doesn't explicitly state when to use this vs. alternatives like get_app_status or get_logs. No exclusions or prerequisites are mentioned, leaving some ambiguity about context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_app_status (Grade: A)

Get the real-time Docker service status for an app: replica count, whether it is running, and health check status.

Parameters (JSON Schema)
app (required): App ID (UUID) or subdomain

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries full burden. It discloses the tool's behavior as a read operation that returns real-time status, but doesn't mention potential limitations like rate limits, authentication requirements, or error conditions. The description doesn't contradict annotations (none exist), but provides only basic behavioral context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence efficiently conveys purpose, resource, and specific return values with zero wasted words. The description is appropriately sized and front-loaded with essential information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a read-only status check tool with 100% schema coverage but no output schema or annotations, the description provides adequate context about what information is returned. However, it lacks details about response format, error handling, or operational constraints that would be helpful for an agent.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100% with the single parameter 'app' fully documented. The description doesn't add any parameter-specific information beyond what's in the schema, maintaining the baseline score of 3 for high schema coverage situations.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Get'), resource ('real-time Docker service status for an app'), and specific output details ('replica count, whether it is running, and health check status'). It distinguishes from siblings like get_app (general app info) or get_app_metrics (performance metrics) by focusing specifically on operational status.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage when needing operational status information, but doesn't explicitly state when to use this tool versus alternatives like get_app (for general app details) or get_app_metrics (for performance data). No guidance on prerequisites or when-not-to-use scenarios is provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_install_status (Grade: A)

Check the status of an ongoing npm install operation. Returns "idle" (no install running), "installing" (in progress), "done" (success), or "error" (failed with error message). Poll this after calling install_packages or remove_package.

Parameters (JSON Schema)
app (required): App ID (UUID) or subdomain
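The four documented states ("idle", "installing", "done", "error") suggest a simple polling loop after calling install_packages or remove_package. In the sketch below, `fetch_status` is a hypothetical stand-in for the real MCP call, and the bounded retry count is a local choice since the description documents no polling interval or timeout:

```python
TERMINAL = {"done", "error"}  # states documented as final

def poll_install(fetch_status, max_polls: int = 10) -> str:
    """Poll a get_install_status-style source until a terminal state.

    fetch_status is a hypothetical callable returning one of the four
    documented states; "idle" and "installing" keep polling, while
    "done" and "error" stop the loop.
    """
    status = "idle"
    for _ in range(max_polls):
        status = fetch_status()
        if status in TERMINAL:
            break
    return status

# Simulated status sequence an install might produce:
sequence = iter(["installing", "installing", "done"])
final = poll_install(lambda: next(sequence))
```

In a real client the loop would also sleep between polls; that detail is omitted here.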
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It effectively describes the tool's behavior by listing all possible return states ('idle', 'installing', 'done', 'error') and explaining what each means, including error handling. It doesn't mention rate limits or authentication needs, but covers the core operational behavior well.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is efficiently structured in two sentences: the first explains the tool's purpose and return values, the second provides usage guidance. Every phrase adds value with zero wasted words, making it easy to parse quickly.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple status-checking tool with one parameter and no output schema, the description is nearly complete. It explains what the tool does, when to use it, and what to expect in return. The only minor gap is lack of explicit mention of polling frequency or timeout behavior, but it's sufficient for the tool's complexity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% description coverage, with the single parameter 'app' documented as 'App ID (UUID) or subdomain'. The description adds no additional parameter information beyond what the schema provides, so it meets the baseline for high schema coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('Check the status') and resource ('ongoing npm install operation'), distinguishing it from sibling tools like get_app_status or get_logs. It explicitly identifies what it monitors versus other status-checking tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit guidance on when to use this tool ('Poll this after calling install_packages or remove_package'), including specific alternative triggers. It clearly defines the operational context without ambiguity.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
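The polling workflow this description prescribes can be sketched as follows. The `call_tool` function below is a hypothetical stand-in for a real MCP client call; only the tool name, the `app` parameter, and the four status strings come from the description above.

```python
import time

# Canned responses simulating the documented status sequence; a real
# client would send get_install_status calls over the server's
# Streamable HTTP transport instead.
_statuses = iter(["installing", "installing", "done"])

def call_tool(name, arguments):
    """Hypothetical MCP client call (stubbed for illustration)."""
    assert name == "get_install_status"
    return {"status": next(_statuses)}

def wait_for_install(app, poll_interval=0.0, max_polls=30):
    """Poll until the install leaves the 'installing' state."""
    for _ in range(max_polls):
        status = call_tool("get_install_status", {"app": app})["status"]
        if status in ("idle", "done", "error"):
            return status
        time.sleep(poll_interval)
    raise TimeoutError("npm install did not finish within the polling budget")

result = wait_for_install("my-app")
```

An agent would run this right after install_packages or remove_package, treating "error" as a cue to surface the error message and "idle" as "nothing was running".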

get_logs (Grade: B)

Get console output (stdout + stderr) for an app. Useful for debugging runtime errors.

Parameters (JSON Schema)

- app (required): App ID (UUID) or subdomain
- head (optional): Return only the first N lines of the fetched logs
- tail (optional): Number of recent log lines to return (default: 100, max: 1000)
- since (optional): Only return logs after this timestamp (ISO 8601 format)
- end_line (optional): End at this line number (1-indexed, inclusive). Use with start_line for a range.
- start_line (optional): Start from this line number (1-indexed). Use with end_line for a range.

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden of behavioral disclosure. It mentions the tool is 'useful for debugging runtime errors,' which hints at read-only behavior, but doesn't explicitly state whether it's safe, requires permissions, has rate limits, or what the output format looks like (e.g., text, JSON). This leaves significant gaps for a tool with multiple parameters.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is highly concise and front-loaded: two sentences that directly state the purpose and usage context without any wasted words. Every sentence earns its place by providing essential information efficiently.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's moderate complexity (6 parameters, no output schema, no annotations), the description is minimally adequate. It covers the basic purpose and usage hint but lacks details on behavioral traits, output format, and differentiation from siblings. With no output schema, the agent must infer return values, making the description incomplete for full context.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents all six parameters thoroughly. The description doesn't add any parameter-specific details beyond what's in the schema, such as explaining interactions between parameters (e.g., 'head' vs. 'tail'). Baseline 3 is appropriate when the schema does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Get console output (stdout + stderr) for an app.' It specifies the resource (app logs) and the action (get). However, it doesn't explicitly differentiate from sibling tools like 'get_app_status' or 'get_app_metrics' that also retrieve app-related information, which prevents a perfect score.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides implied usage guidance with 'Useful for debugging runtime errors,' suggesting when this tool is appropriate. However, it lacks explicit guidance on when to use this tool versus alternatives like 'get_app_status' for health checks or 'get_app_metrics' for performance data, and doesn't mention prerequisites or exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
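The parameter set implies two retrieval modes: a recency window (tail, since) and an absolute line range (start_line, end_line). The sketch below is a hypothetical illustration of those semantics exactly as the schema states them (1-indexed inclusive range; tail defaulting to 100 and capped at 1000); the real server's behavior, including how the modes interact, is not documented.

```python
# Fake log buffer standing in for an app's stdout + stderr.
FAKE_LOGS = [f"line {i}" for i in range(1, 201)]

def get_logs(app, tail=100, start_line=None, end_line=None):
    """Hypothetical sketch of get_logs' documented parameter semantics."""
    lines = FAKE_LOGS
    if start_line is not None or end_line is not None:
        # start_line/end_line are 1-indexed and inclusive per the schema.
        start = start_line or 1
        end = end_line or len(lines)
        return lines[start - 1:end]
    # tail: default 100, capped at 1000 per the schema.
    return lines[-min(tail, 1000):]

recent = get_logs("my-app", tail=5)
window = get_logs("my-app", start_line=10, end_line=12)
```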

get_version_file_content (Grade: A)

Read the content of a specific file at a past version of an app. Use list_versions to get a commit hash and get_version_files to browse available files.

Parameters (JSON Schema)

- app (required): App ID (UUID) or subdomain
- hash (required): Commit hash (full or abbreviated, from bonto_list_versions)
- path (required): File path within the commit (e.g. "index.js", "src/app.ts")

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries full burden for behavioral disclosure. It indicates this is a read operation ('Read the content') but doesn't mention permissions, rate limits, error conditions, or return format. The workflow guidance is helpful but doesn't fully compensate for missing behavioral details.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is perfectly concise with two sentences that each serve distinct purposes: first states the core functionality, second provides essential workflow guidance. No wasted words, front-loaded with the primary action.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a read operation with 3 parameters and no output schema, the description provides good workflow context but lacks details about return values, error handling, or behavioral constraints. With no annotations and no output schema, more completeness would be beneficial for agent understanding.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents all three parameters thoroughly. The description adds no parameter-specific information beyond what's in the schema. The baseline score of 3 reflects adequate but not enhanced parameter documentation.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('Read the content') and resource ('a specific file at a past version of an app'), distinguishing it from sibling tools like read_file (current version) or get_version_files (list files at version). It provides a complete, unambiguous purpose statement.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly provides when-to-use guidance by naming two prerequisite tools (list_versions to get commit hash, get_version_files to browse files). This clearly establishes the workflow context and distinguishes it from alternatives like read_file for current versions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
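The three-step workflow the description names (list_versions for a hash, get_version_files to browse, then this tool for content) can be sketched end to end. Everything inside `call_tool` is a hypothetical stub; only the tool names and parameters come from the descriptions above.

```python
# Canned version history standing in for a real app's Git data.
_VERSIONS = [{"hash": "a1b2c3d", "message": "initial commit"}]
_FILES = {"a1b2c3d": {"index.js": "console.log('hello');"}}

def call_tool(name, arguments):
    """Hypothetical MCP client call (stubbed for illustration)."""
    if name == "list_versions":
        return _VERSIONS
    if name == "get_version_files":
        return sorted(_FILES[arguments["hash"]])
    if name == "get_version_file_content":
        return _FILES[arguments["hash"]][arguments["path"]]
    raise ValueError(f"unknown tool: {name}")

app = "my-app"
commit = call_tool("list_versions", {"app": app})[0]["hash"]
paths = call_tool("get_version_files", {"app": app, "hash": commit})
content = call_tool("get_version_file_content",
                    {"app": app, "hash": commit, "path": paths[0]})
```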

get_version_files (Grade: A)

List the files and directories at a specific version (commit) of an app. Use list_versions to get a valid commit hash.

Parameters (JSON Schema)

- app (required): App ID (UUID) or subdomain
- hash (required): Commit hash (full or abbreviated, from bonto_list_versions)
- path (optional): Directory path within the commit to list (default: root)

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries full burden. It discloses the tool's read-only nature (listing) and the prerequisite of obtaining a commit hash from another tool, but lacks details on output format, pagination, error conditions, or rate limits. Adequate but with clear gaps.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences with zero waste: first states purpose, second provides essential usage guidance. Front-loaded and appropriately sized for the tool's complexity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

No annotations or output schema exist, and the description covers purpose and basic usage but lacks behavioral details like output format or error handling. For a read-only tool with 3 parameters, it's minimally adequate but incomplete for full agent understanding.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema fully documents all parameters. The description adds no additional parameter semantics beyond implying the 'hash' parameter comes from 'list_versions', which is already hinted in the schema. Baseline 3 is appropriate when schema does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb ('List') and resource ('files and directories') with specific scope ('at a specific version (commit) of an app'). It distinguishes from sibling tools like 'list_files' (current files) and 'get_version_file_content' (file content at version).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states when to use this tool ('at a specific version') and provides a clear alternative ('Use list_versions to get a valid commit hash'), distinguishing it from 'list_files' which presumably lists current files. No misleading guidance is present.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

git_info (Grade: A)

Get the Git clone URL and authentication instructions for an app. Use this to help users set up local development with Git push-to-deploy.

Parameters (JSON Schema)

- app (required): App ID (UUID) or subdomain

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden. It discloses the tool's behavior as a read operation ('Get'), which is useful, but lacks details on permissions, rate limits, or error handling. It adds some context about the output's purpose (setup instructions), but more behavioral traits would improve transparency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is front-loaded with the core purpose in the first sentence and adds usage context in the second. Both sentences earn their place by providing essential information without redundancy, making it highly concise and well-structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's low complexity (1 parameter, 100% schema coverage, no output schema), the description is reasonably complete. It covers purpose and usage well, but without annotations or output schema, it could benefit from more details on return values or error cases to be fully comprehensive.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema description coverage is 100%, so the schema already documents the single parameter 'app' as 'App ID (UUID) or subdomain'. The description does not add any parameter-specific information beyond this, resulting in a baseline score of 3.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with specific verbs ('Get the Git clone URL and authentication instructions') and resource ('for an app'), and distinguishes it from siblings like 'git_pull', 'git_push', and 'set_git_remote' by focusing on setup information rather than operations.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

It explicitly states when to use this tool: 'to help users set up local development with Git push-to-deploy.' This provides clear context and distinguishes it from alternatives like 'get_app' (general app info) or 'set_git_remote' (configuring remotes).

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

git_pull (Grade: A)

Pull the latest changes from the upstream Git remote into the app. Requires a remote to be configured first via set_git_remote.

Parameters (JSON Schema)

- app (required): App ID (UUID) or subdomain

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It discloses the prerequisite (remote configuration) and implies a mutation (pulling changes), which is helpful. However, it lacks details on behavioral traits like what happens on conflicts, whether it's idempotent, or if it requires specific permissions. This is adequate but has clear gaps for a mutation tool.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences, front-loaded with the core action and followed by a prerequisite. Every sentence earns its place: the first defines the purpose, and the second provides critical usage context. There is zero waste or redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (a Git operation with potential side effects), no annotations, and no output schema, the description is minimally adequate. It covers the purpose and prerequisite but lacks details on behavior, error handling, or return values. This leaves gaps for an agent to use it correctly in all scenarios.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% description coverage, with the 'app' parameter documented as 'App ID (UUID) or subdomain'. The description adds no additional parameter semantics beyond what the schema provides. According to the rules, with high schema coverage (>80%), the baseline is 3 even with no param info in the description.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Pull the latest changes') and target ('from the upstream Git remote into the app'), which is specific and actionable. It distinguishes from sibling tools like git_push (which pushes changes) and git_info (which provides information). However, it doesn't explicitly mention what 'pull' entails (fetching and merging), which prevents a perfect score.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for when to use this tool: after a remote is configured via set_git_remote. This is explicit guidance on prerequisites. However, it doesn't mention when not to use it (e.g., if there are uncommitted changes) or alternatives like git_info for checking remote status, so it falls short of a 5.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

git_push (Grade: B)

Push the app's commits to the upstream Git remote. Requires a remote with a Personal Access Token configured.

Parameters (JSON Schema)

- app (required): App ID (UUID) or subdomain

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It adds value by specifying the requirement for a configured remote and Personal Access Token, which are crucial for authentication. However, it lacks details on potential side effects (e.g., overwriting remote changes), error handling, or rate limits, leaving gaps in behavioral context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is appropriately sized with two sentences that are front-loaded and efficient, stating the action and a key requirement without unnecessary details. Every sentence earns its place by providing essential information, though it could be slightly more structured for clarity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (a mutation operation with no annotations and no output schema), the description is somewhat complete but has gaps. It covers the purpose and a prerequisite, but lacks details on return values, error cases, or interactions with sibling tools, making it adequate but not fully comprehensive for safe and effective use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema description coverage is 100%, with the single parameter 'app' documented as 'App ID (UUID) or subdomain'. The description doesn't add any additional meaning beyond this, such as examples or constraints, so it meets the baseline for adequate but not enhanced parameter semantics.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb ('Push') and resource ('app's commits to the upstream Git remote'), making the purpose specific and understandable. However, it doesn't explicitly differentiate from sibling tools like 'git_pull' or 'set_git_remote', which would require mentioning contrasting actions or scopes.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage by stating 'Requires a remote with a Personal Access Token configured', which provides some context about prerequisites. However, it doesn't explicitly say when to use this tool versus alternatives like 'git_pull' for fetching changes or 'set_git_remote' for configuring remotes, leaving the guidance incomplete.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
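The git_pull and git_push cards both hinge on prerequisites: a remote configured via set_git_remote, and for pushes additionally a Personal Access Token. A hypothetical routing helper makes that dependency explicit; the state keys `remote` and `pat` are illustrative, not part of any documented schema.

```python
def next_git_step(state, want="push"):
    """Pick the next tool call given a (hypothetical) app git state.

    Both git_pull and git_push require a remote configured via
    set_git_remote; git_push additionally needs a Personal Access Token.
    """
    if not state.get("remote"):
        return "set_git_remote"
    if want == "push" and not state.get("pat"):
        return "set_git_remote"  # reconfigure the remote with a PAT
    return "git_push" if want == "push" else "git_pull"

step_for_fresh_app = next_git_step({}, want="pull")
step_for_push = next_git_step({"remote": "https://example.com/repo.git",
                               "pat": "token"})
```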

install_packages (Grade: A)

Install one or more npm packages in an app. Updates package.json and runs npm install inside the container. Use get_install_status to poll for completion.

Parameters (JSON Schema)

- app (required): App ID (UUID) or subdomain
- packages (required): Array of packages to install

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries full burden and discloses key behavioral traits: it updates package.json, runs npm install inside a container, and operates asynchronously (requires polling). It doesn't mention side effects like app restarts or error handling, but covers core behavior well.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences with zero waste. First sentence states purpose and scope, second provides critical usage guidance. Every element earns its place and is front-loaded appropriately.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a mutation tool with no annotations and no output schema, the description provides good context about the asynchronous nature and file modifications. It could mention potential side effects (e.g., app downtime) or error scenarios, but covers the essential workflow adequately.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, providing complete parameter documentation. The description adds no additional parameter semantics beyond what's in the schema, so baseline 3 is appropriate as the schema does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('Install'), target resource ('npm packages in an app'), and scope ('one or more'). It distinguishes from siblings like 'list_packages' (read-only) and 'remove_package' (uninstall).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states when to use an alternative tool: 'Use get_install_status to poll for completion.' This provides clear guidance on workflow sequencing and distinguishes from synchronous operations.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_apps (Grade: B)

List all your Bonto apps with their current status, subdomain, and resource limits.

Parameters (JSON Schema)

No parameters

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries full burden but only states what data is returned (status, subdomain, resource limits). It misses behavioral details such as pagination, sorting, rate limits, authentication requirements, or error handling. For a list operation with zero annotation coverage, this leaves significant gaps in understanding how the tool behaves.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that front-loads the core action ('List all your Bonto apps') and specifies key return attributes. Every word contributes to understanding the tool's purpose without any fluff or redundancy, making it optimally concise.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (listing multiple apps with attributes) and lack of annotations or output schema, the description is incomplete. It mentions return attributes but doesn't cover format, ordering, or potential limitations like result truncation. For a list tool with no structured output info, more context on behavior and results is needed.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The tool takes no parameters, so there is nothing for the schema or the description to document. The description appropriately avoids redundant parameter information, earning a baseline score of 4 for not padding an empty schema with unnecessary detail.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb ('List') and resource ('Bonto apps') with specific attributes (status, subdomain, resource limits). It distinguishes from siblings like 'get_app' (single app) and 'get_app_status' (status only) by emphasizing comprehensive listing. However, it doesn't explicitly mention how it differs from all siblings, keeping it at 4 instead of 5.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives like 'get_app' for a single app or 'get_app_status' for status checks. It lacks context about prerequisites (e.g., authentication state) or exclusions, offering only a basic functional statement without usage scenarios.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
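The gap this card flags (no guidance on list_apps versus alternatives) is a choice an agent must make anyway. A minimal, hypothetical routing rule, assuming get_app exists as described in the sibling reviews:

```python
def pick_app_tool(app=None):
    """Route to get_app when a specific app is named, else list_apps.

    Illustrative only: the descriptions themselves do not state this rule.
    """
    if app:
        return ("get_app", {"app": app})
    return ("list_apps", {})

overview = pick_app_tool()
single = pick_app_tool("my-app")
```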

list_files (Grade: C)

List the contents of a directory in an app. Returns file names, types (file/directory), and sizes.

Parameters (JSON Schema)

- app (required): App ID (UUID) or subdomain
- path (optional): Directory path to list (default: /)

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries full burden. It mentions the return data (file names, types, sizes), which is helpful, but omits critical behavioral details like pagination, error handling, permissions required, or rate limits. For a read operation with no annotations, this leaves significant gaps.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that front-loads the core action and return values. Every word earns its place with no redundancy or fluff, making it highly concise and well-structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no annotations and no output schema, the description is incomplete for a tool with potential complexity (e.g., directory listing may involve permissions, pagination). It covers basic purpose and returns but misses behavioral context and output details, leaving the agent under-informed.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema fully documents both parameters ('app' and 'path'). The description adds no additional parameter semantics beyond what's in the schema, such as format examples or constraints. Baseline 3 is appropriate as the schema handles the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb ('List') and resource ('contents of a directory in an app'), specifying what the tool does. It distinguishes from siblings like 'search_files' (which likely filters) and 'read_file' (which reads content), but doesn't explicitly contrast them. The purpose is specific and actionable.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus alternatives like 'search_files' or 'list_apps'. The description implies usage for directory listing, but lacks explicit context, prerequisites, or exclusions. It's a basic statement without operational guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_packages (grade A)

List the npm packages currently installed in an app (from package.json) along with the configured Node.js version.

Parameters (JSON Schema):
- app (required): App ID (UUID) or subdomain

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden of behavioral disclosure. It describes what the tool does (listing packages and Node.js version) but lacks details on permissions required, rate limits, whether it's a read-only operation (implied but not stated), error handling, or output format. For a tool with no annotations, this leaves significant gaps in understanding its behavior.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, well-structured sentence that efficiently conveys the tool's purpose and scope without unnecessary words. It is front-loaded with the core action ('List the npm packages') and includes essential details ('currently installed in an app', 'from package.json', 'along with the configured Node.js version') in a clear and concise manner.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's moderate complexity (listing installed packages and Node.js version), no annotations, and no output schema, the description is incomplete. It covers the purpose and parameter alignment but lacks details on behavioral aspects like permissions, output format, or error conditions. For a tool with no structured data beyond the input schema, more context would be beneficial for an AI agent to use it effectively.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% description coverage, with the single parameter 'app' documented as 'App ID (UUID) or subdomain'. The description adds context by specifying that it lists packages for 'an app', aligning with the parameter but not providing additional semantics beyond what the schema already covers. Since there is only one parameter and high schema coverage, a baseline of 4 is appropriate as the description doesn't need to compensate for gaps.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('List'), the resource ('npm packages currently installed in an app'), and includes additional context ('from package.json' and 'along with the configured Node.js version'). It distinguishes itself from siblings like 'search_packages' (which likely searches rather than lists installed packages) and 'list_apps' (which lists apps rather than packages).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage by specifying it lists packages 'currently installed in an app', suggesting it should be used when you need to see installed packages for a specific app. However, it does not explicitly state when to use this tool versus alternatives like 'search_packages' or 'get_version_file_content' (which might retrieve package.json directly), nor does it provide exclusions or prerequisites.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_versions (grade A)

List the commit history (version snapshots) for an app. Returns commits in reverse chronological order with their hash, date, and message. Use the hash with other version tools to inspect or restore past states.
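A sketch of consuming a list_versions result. The field names (`hash`, `date`, `message`) and the exact response shape are assumptions based on the description; only the reverse-chronological ordering is actually stated:

```python
# Hypothetical list_versions result: commits newest-first, each with
# hash, date, and message (field names are assumptions).
commits = [
    {"hash": "a1b2c3d", "date": "2024-06-02T12:00:00Z", "message": "fix header"},
    {"hash": "9f8e7d6", "date": "2024-06-01T09:30:00Z", "message": "initial deploy"},
]

# Newest-first ordering means index 0 is the latest snapshot; its hash
# is what the other version tools (e.g. restore_version) take as input.
latest_hash = commits[0]["hash"]
```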

Parameters (JSON Schema):
- app (required): App ID (UUID) or subdomain
- limit (optional): Maximum number of versions to return (default: 50, max: 500)

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries full burden. It discloses the return order ('reverse chronological order') and output structure ('hash, date, and message'), which is valuable behavioral context. However, it doesn't mention pagination behavior, rate limits, authentication requirements, or error conditions that would be important for a list operation.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two well-structured sentences with zero waste. The first sentence states purpose and output format, the second provides usage guidance. Every word earns its place, and the most important information (what it does) comes first.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a read-only list operation with no annotations and no output schema, the description provides good coverage of purpose, output format, and usage context. However, it could benefit from mentioning pagination behavior (especially with the limit parameter) or authentication requirements given the sibling tools include login operations.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already fully documents both parameters. The description doesn't add any parameter-specific information beyond what's in the schema. The baseline of 3 is appropriate when the schema does all the parameter documentation work.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('List the commit history'), resource ('for an app'), and output format ('Returns commits in reverse chronological order with their hash, date, and message'). It distinguishes from siblings like 'get_version_file_content' or 'restore_version' by focusing on listing version metadata rather than inspecting or restoring specific versions.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for when to use this tool ('to inspect or restore past states') and references sibling tools ('Use the hash with other version tools'), but doesn't explicitly state when NOT to use it or name specific alternatives. It implies usage for getting version overviews rather than detailed file content.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

login (grade A)

Generate a 6-digit device code for authentication. The user must enter this code on the Bonto website while logged in with their SideQuest account.
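The two-step flow with the sibling check_login tool can be sketched as below. `call_tool` is a stand-in for a real MCP client, and the response field names (`code`, `status`) and values are assumptions, not documented behavior:

```python
def call_tool(name, arguments=None):
    """Stand-in for a real MCP client; returns canned responses so the
    flow below is runnable. Response field names are assumptions."""
    canned = {
        "login": {"code": "123456"},
        "check_login": {"status": "approved"},
    }
    return canned[name]

# 1. login generates the device code; the user enters it on the Bonto
#    website while signed in with their SideQuest account.
code = call_tool("login")["code"]

# 2. check_login (a sibling tool) then reports whether the code was
#    approved.
status = call_tool("check_login", {"code": code})["status"]
```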

Parameters (JSON Schema):

No parameters

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It discloses that the tool generates a code for authentication and requires user action on a website, which is useful behavioral context. However, it lacks details on rate limits, error conditions, or what happens after code entry, leaving gaps for a mutation-related tool.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences with zero waste: the first states the purpose, and the second explains the required user action. It is front-loaded with the core functionality and efficiently conveys necessary information without redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (authentication-related with no output schema), the description is moderately complete. It explains what the tool does and the user workflow but lacks details on output format, success/failure states, or integration with other tools like 'check_login'. With no annotations, it should provide more behavioral context for a mutation tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 0 parameters with 100% coverage, so no parameter documentation is needed. The description appropriately does not discuss parameters, focusing instead on the tool's purpose and usage. A baseline of 4 applies since, with no parameters to document, the description clearly explains the tool's function.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Generate a 6-digit device code for authentication.' It specifies the exact action (generate), resource (device code), and format (6-digit), distinguishing it from sibling tools like 'check_login' or 'set_token' which handle different authentication aspects.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context on when to use this tool: for authentication requiring a device code entry on the Bonto website with a SideQuest account. However, it does not explicitly state when not to use it or name alternatives (e.g., 'check_login' for verifying login status), which prevents a perfect score.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

read_file (grade A)

Read the contents of a file in an app. Maximum file size: 1MB. Binary files are not supported. By default, reads the entire file starting from the beginning. You can optionally specify a line offset and limit (especially handy for long files), but it's recommended to read the whole file by not providing these parameters. Results are returned using cat -n format, with line numbers starting at 1.
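The promised cat -n output format can be sketched as follows; the exact number-column width the server uses is an assumption (GNU cat uses 6 characters):

```python
def cat_n(text: str) -> str:
    """Mimic the documented result format: cat -n style, with line
    numbers starting at 1. Column width of 6 is a guess."""
    return "\n".join(
        f"{i:>6}\t{line}" for i, line in enumerate(text.splitlines(), start=1)
    )

numbered = cat_n("const x = 1;\nconsole.log(x);")
```

An agent parsing the result can strip the number and tab prefix from each line to recover the raw file content.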

Parameters (JSON Schema):
- app (required): App ID (UUID) or subdomain
- path (required): File path to read (e.g. "index.js", "src/app.ts")
- limit (optional): Number of lines to read. Only provide if the file is too large to read at once.
- offset (optional): Line number to start reading from (1-indexed). Only provide if the file is too large to read at once.

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden and does an excellent job disclosing key behavioral traits: file size limit (1MB), binary file restriction, default behavior (reads entire file from beginning), and output format (cat -n with 1-indexed line numbers). It doesn't mention error conditions or performance characteristics, but covers most essential operational details.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is perfectly structured and concise: four sentences that each earn their place: purpose statement, constraints, parameter guidance, and output format. No wasted words, front-loaded with the core functionality, and efficiently covers all necessary information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a read operation with no annotations and no output schema, the description provides excellent coverage of constraints, behavior, and output format. It doesn't explicitly describe the return structure beyond format, but given this is a read_file tool with clear constraints and format specification, it's nearly complete. The absence of error case documentation is the only minor gap.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents all parameters thoroughly. The description adds some context about the optional parameters' purpose ('especially handy for long files') and usage recommendation, but doesn't provide additional semantic meaning beyond what's in the schema descriptions. This meets the baseline for high schema coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('Read the contents of a file') and resource ('in an app'), distinguishing it from sibling tools like write_file, upload_file, and delete_file. It provides a complete picture of what the tool does beyond just the name.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context about when to use optional parameters ('especially handy for long files') and recommends default usage ('recommended to read the whole file by not providing these parameters'). However, it doesn't explicitly state when NOT to use this tool versus alternatives like get_version_file_content or search_files.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

remove_git_remote (grade C)

Remove the upstream Git remote from an app.

Parameters (JSON Schema):
- app (required): App ID (UUID) or subdomain

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It states the action ('Remove') but does not cover critical aspects like whether this is destructive, requires specific permissions, affects app functionality, or has side effects. This is a significant gap for a mutation tool with zero annotation coverage.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that directly states the tool's purpose without unnecessary words. It is front-loaded and wastes no space, making it highly concise and well-structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity as a mutation operation with no annotations and no output schema, the description is insufficient. It lacks details on behavioral traits, error conditions, or what happens after removal, leaving the agent with incomplete information for safe and effective use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% description coverage, with the 'app' parameter documented as 'App ID (UUID) or subdomain'. The description does not add any extra meaning beyond this, such as examples or constraints, so it meets the baseline of 3 where the schema does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb ('Remove') and resource ('upstream Git remote from an app'), making the purpose specific and understandable. However, it does not explicitly differentiate from sibling tools like 'set_git_remote' or 'git_info', which would require a more detailed comparison to achieve a score of 5.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives, such as 'set_git_remote' for modifying remotes or other Git-related tools. It lacks context on prerequisites, exclusions, or typical scenarios, leaving usage unclear beyond the basic action.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

remove_package (grade A)

Remove an npm package from an app. Updates package.json and runs npm install to clean up node_modules. Use get_install_status to poll for completion.
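The polling step the description prescribes can be sketched as below. `get_install_status` here is a runnable stand-in for the real tool, and its `status` field name and values are assumptions:

```python
def get_install_status(app):
    """Stand-in for the real get_install_status tool so the loop runs;
    reports "done" on the third poll. Field names are assumptions."""
    get_install_status.calls += 1
    return {"status": "done" if get_install_status.calls >= 3 else "installing"}

get_install_status.calls = 0

# remove_package updates package.json and kicks off npm install, so
# poll get_install_status until it reports completion.
while get_install_status("my-site")["status"] != "done":
    pass  # a real client would sleep between polls
```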

Parameters (JSON Schema):
- app (required): App ID (UUID) or subdomain
- name (required): Package name to remove (e.g. "lodash")

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It effectively describes the tool's behavior: it removes a package, updates package.json, runs npm install to clean node_modules, and requires polling for completion. This covers key operational aspects, though it doesn't mention error conditions or permission requirements.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is highly concise and front-loaded: the first sentence states the core action, the second explains side effects, and the third provides usage guidance. Every sentence earns its place with no wasted words, making it easy for an agent to parse quickly.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a mutation tool with no annotations and no output schema, the description does well by explaining the tool's effects (updates package.json, runs npm install) and completion polling. However, it lacks details on return values, error handling, or prerequisites (e.g., app must be running), leaving minor gaps in completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents both parameters ('app' as App ID/subdomain, 'name' as package name). The description doesn't add any parameter-specific details beyond what the schema provides, such as format examples for 'app' or validation rules for 'name'. Baseline 3 is appropriate when schema does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('Remove an npm package from an app') and identifies the resource ('npm package', 'app'). It distinguishes from sibling tools like 'install_packages' (adds packages) and 'list_packages' (lists packages), making the purpose unambiguous.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit guidance on when to use this tool ('Use get_install_status to poll for completion'), which helps the agent understand post-execution workflow. However, it doesn't specify when NOT to use it or mention alternatives like 'update_app' for broader changes, leaving some contextual gaps.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

rename_file (grade A)

Rename or move a file or directory in an app. Cannot rename protected paths (package.json, node_modules, .git).
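A client can pre-check the protected-path constraint before calling the tool. The description names only the three paths; whether paths inside those directories are also blocked is not stated, so this sketch conservatively treats them as protected too:

```python
# Protected paths per the description: package.json, node_modules, .git.
PROTECTED = {"package.json", "node_modules", ".git"}

def is_protected(path: str) -> bool:
    """True if the path is (or sits under) a protected entry.
    Treating nested paths as protected is an assumption."""
    first_segment = path.lstrip("/").split("/", 1)[0]
    return first_segment in PROTECTED
```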

Parameters (JSON Schema):
- app (required): App ID (UUID) or subdomain
- new_path (required): New file or directory path (e.g. "src/new-name.js")
- old_path (required): Current file or directory path (e.g. "src/old-name.js")

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It adds useful context about protected paths (e.g., 'package.json'), which is a key behavioral trait not in the schema. However, it lacks details on permissions, error handling, or whether the operation is atomic/reversible, leaving gaps for a mutation tool.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is front-loaded with the core action ('Rename or move a file or directory in an app') and efficiently adds a critical constraint in a second sentence. Every sentence earns its place with no wasted words, making it highly concise and well-structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (mutation with no annotations and no output schema), the description is incomplete. It covers the basic action and a key constraint but lacks details on return values, error cases, or side effects. For a tool that modifies file systems, more behavioral context would be beneficial to ensure safe usage.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema description coverage is 100%, so the schema already documents all parameters ('app', 'old_path', 'new_path') with clear descriptions. The description does not add meaning beyond this, such as path format examples or validation rules, but the baseline is 3 since the schema provides adequate parameter information.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb ('rename or move'), resource ('a file or directory'), and scope ('in an app'), making the purpose specific and actionable. It distinguishes from siblings like 'delete_file' (deletion) and 'write_file' (content modification) by focusing on path changes.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for usage by specifying constraints ('Cannot rename protected paths'), which helps determine when not to use it. However, it does not explicitly mention alternatives (e.g., 'create_directory' for moving files to new directories) or when to prefer this over other tools like 'upload_file' for similar tasks.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

restart_app (grade A)

Restart a running app. By default uses a soft restart that preserves terminal sessions (restarts the app process, not the container). Set hard=true for a full container restart when needed (e.g. after env var changes).
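The soft/hard choice described above can be sketched as a small argument builder; the helper name is hypothetical, but the `app` and `hard` keys match the schema:

```python
def restart_arguments(app: str, env_changed: bool) -> dict:
    """Build arguments for restart_app. hard defaults to false (soft
    restart, preserves terminal sessions); set it only when a full
    container restart is needed, e.g. after env var changes."""
    args = {"app": app}
    if env_changed:
        args["hard"] = True  # kills terminal sessions
    return args
```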

Parameters (JSON Schema):
- app (required): App ID (UUID) or subdomain
- hard (optional): Force a full container restart (kills terminal sessions). Default: false (soft restart).

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It effectively explains the default behavior (soft restart preserving terminal sessions) and the alternative (hard restart killing terminal sessions), covering key operational impacts. However, it does not mention potential side effects like downtime duration or error conditions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is efficiently structured in two sentences: the first states the core purpose, and the second explains the parameter option with a practical example. Every sentence adds value without redundancy, making it front-loaded and appropriately sized for the tool's complexity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's moderate complexity (2 parameters, no output schema, no annotations), the description is largely complete for guiding usage. It covers the main functionality and parameter implications, but it lacks details on return values or error handling, which would be beneficial for an agent invoking the tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema description coverage is 100%, so the schema already documents both parameters thoroughly. The description adds minimal value beyond the schema by briefly mentioning the hard parameter's purpose, but it does not provide additional semantic context or usage examples that aren't already in the schema descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('Restart a running app') and distinguishes between soft and hard restart modes. It directly addresses what the tool does with precise terminology ('restarts the app process, not the container'), making its purpose unambiguous and distinct from sibling tools like 'start_app' or 'stop_app'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context on when to use the hard restart option ('e.g. after env var changes'), but it does not explicitly state when to use this tool versus alternatives like 'stop_app' followed by 'start_app'. It offers guidance on mode selection but lacks sibling tool differentiation.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

restore_version (Grade A)

DESTRUCTIVE: Restore an app to a previous version using git reset --hard. This permanently overwrites all current files with the state from the specified commit — any changes made after that commit will be lost and CANNOT be recovered. You MUST confirm with the user before calling this tool. Use list_versions to show the user available versions first.
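The confirm-before-restore contract in this description maps naturally onto a client-side guard. A minimal sketch, assuming a {tool, arguments} payload shape for the eventual call; the plan_restore helper is hypothetical and not part of the server:

```python
def plan_restore(app: str, commit_hash: str, user_confirmed: bool) -> dict:
    """Build a restore_version payload, refusing to proceed without the
    explicit user confirmation the description demands (git reset --hard
    cannot be undone)."""
    if not user_confirmed:
        raise PermissionError(
            "restore_version is destructive; confirm with the user first"
        )
    # Argument names follow the parameter table: app (UUID or subdomain), hash.
    return {"tool": "restore_version", "arguments": {"app": app, "hash": commit_hash}}
```

A client would call list_versions first, present the choices to the user, and only construct this payload once a commit has been explicitly approved.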

Parameters (JSON Schema)

- app (required): App ID (UUID) or subdomain
- hash (required): Commit hash to restore to (from bonto_list_versions)

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description fully discloses critical behavioral traits: it explicitly labels the operation as 'DESTRUCTIVE', details the irreversible consequences ('permanently overwrites... changes... will be lost and CANNOT be recovered'), and specifies user confirmation requirements, covering safety and procedural aspects comprehensively.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is front-loaded with the most critical information ('DESTRUCTIVE: Restore...'), uses concise sentences with zero waste, and structures guidance logically from warning to prerequisites, making it highly efficient and easy to parse.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the high-risk nature of the tool, no annotations, and no output schema, the description is complete: it covers purpose, destructive behavior, user confirmation, and references to sibling tools, providing all necessary context for safe and correct invocation without redundancy.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the baseline is 3. The description adds minimal parameter semantics beyond the schema—it mentions 'specified commit' and references 'list_versions' for context, but does not elaborate on parameter usage or constraints, relying on the schema for detailed documentation.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('restore an app to a previous version') and mechanism ('using git reset --hard'), distinguishing it from siblings like 'update_app' or 'get_version_files' by focusing on destructive rollback rather than incremental updates or read-only operations.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

It provides explicit guidance on when to use ('You MUST confirm with the user before calling this tool') and references an alternative ('Use list_versions to show the user available versions first'), clearly establishing prerequisites and sequencing without ambiguity.

search_files (Grade A)

Search for text across all files in an app. Returns matching lines grouped by file with line numbers. Skips node_modules, .git, and binary files. Max 500 results by default.

Supports grep-like options: context lines (-A/-B/-C), file glob filtering (e.g. "*.ts", "src/**/*.ts"), and output modes (content, files_with_matches, count).
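To make the three output modes concrete, here is a rough local emulation over an in-memory file map. The real tool's matching rules (regex flavor, result limits, exact grouping) may differ, so treat this only as an illustration of the mode semantics:

```python
import re

def search(files, query, output_mode="content", case_sensitive=False):
    """Emulate the three outputMode values over a {path: text} mapping.
    Matching here is a plain substring search (case-insensitive by
    default, mirroring the schema's caseSensitive default of false)."""
    flags = 0 if case_sensitive else re.IGNORECASE
    pattern = re.compile(re.escape(query), flags)
    hits = {}  # path -> list of (line_number, line)
    for path, text in files.items():
        matched = [(n, line) for n, line in enumerate(text.splitlines(), 1)
                   if pattern.search(line)]
        if matched:
            hits[path] = matched
    if output_mode == "files_with_matches":
        return sorted(hits)                          # file paths only
    if output_mode == "count":
        return {path: len(m) for path, m in hits.items()}  # matches per file
    return hits  # "content": matching lines grouped by file with line numbers
```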

Parameters (JSON Schema)

- app (required): App ID (UUID) or subdomain
- query (required): Text or regex pattern to search for
- glob: File pattern filter, e.g. "*.ts", "src/**/*.ts", "**/*.{ts,js}". Matches against relative file paths.
- regex: Treat query as a regular expression (default: false)
- caseSensitive: Case-sensitive search (default: false)
- context: Lines of context before AND after each match (like grep -C). Max 10.
- contextBefore: Lines of context before each match (like grep -B). Max 10.
- contextAfter: Lines of context after each match (like grep -A). Max 10.
- maxResults: Maximum number of matches to return (default: 500)
- outputMode: Output mode: "content" (default, returns matching lines), "files_with_matches" (returns file paths only), "count" (returns match count per file).

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure and does so effectively. It describes important behavioral traits: what gets skipped (node_modules, .git, binary files), default result limits (max 500), and output format (matching lines grouped by file with line numbers). It also mentions grep-like capabilities and output modes.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is perfectly structured and concise: two paragraphs with zero wasted words. The first sentence establishes core functionality, the second adds important behavioral constraints, and the third describes advanced capabilities. Every sentence earns its place by adding distinct value.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a search tool with 10 parameters and no output schema, the description provides excellent context about behavior, limitations, and capabilities. It doesn't describe the exact return format structure (though it mentions grouping), and with no output schema, this creates a minor gap. However, it covers most essential context given the tool's complexity.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 100% schema description coverage, the schema already documents all 10 parameters thoroughly. The description adds some context about grep-like options and output modes, but doesn't provide significant additional parameter semantics beyond what's already in the schema descriptions. This meets the baseline expectation when schema coverage is complete.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('Search for text across all files in an app') and resource ('files'), distinguishing it from siblings like 'list_files' (which lists files without searching) or 'read_file' (which reads a single file). It provides concrete details about what gets searched and what gets skipped.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage context by mentioning it searches 'across all files in an app' and skips certain directories/file types, but doesn't explicitly state when to use this tool versus alternatives like 'search_packages' or 'get_logs'. It provides good operational context but lacks explicit comparison to sibling tools.

search_packages (Grade A)

Search the npm registry for packages. Returns up to 10 results with name, description, and latest version.
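The npm registry exposes a public search endpoint (GET https://registry.npmjs.org/-/v1/search) whose shape matches this tool; whether search_packages uses that exact endpoint is an assumption. A sketch of the request it would correspond to:

```python
from urllib.parse import urlencode

def npm_search_url(query: str, size: int = 10) -> str:
    """Build a request URL for the npm registry's public search endpoint.
    The size=10 default mirrors the tool's "up to 10 results" behavior;
    the server's actual implementation is not known."""
    return "https://registry.npmjs.org/-/v1/search?" + urlencode(
        {"text": query, "size": size}
    )
```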

Parameters (JSON Schema)

- query (required): Package name or keywords to search for

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It states the search returns up to 10 results with specific fields, which is useful context. However, it doesn't mention rate limits, authentication needs, error conditions, or pagination behavior, leaving gaps for a tool that interacts with an external registry.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, well-structured sentence that efficiently conveys the tool's purpose, action, and output. Every word earns its place with no redundancy or unnecessary elaboration, making it easy to parse and understand quickly.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a search tool with no annotations and no output schema, the description provides basic functionality but lacks details on authentication, error handling, or result formatting beyond the mentioned fields. It's minimally adequate given the simple input schema, but could benefit from more context about registry interaction and limitations.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, with the single parameter 'query' documented as 'Package name or keywords to search for'. The description doesn't add any additional parameter semantics beyond what the schema provides, so the baseline score of 3 is appropriate given the schema handles the documentation adequately.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('Search'), target resource ('the npm registry for packages'), and scope ('Returns up to 10 results with name, description, and latest version'). It distinguishes itself from sibling tools like 'list_packages' by specifying it searches a registry rather than listing installed packages.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for finding npm packages by name or keywords, but provides no explicit guidance on when to use this versus alternatives like 'list_packages' or 'install_packages'. It doesn't mention prerequisites, exclusions, or comparative scenarios with sibling tools.

set_git_remote (Grade B)

Set or update the upstream Git remote (GitHub/GitLab) for an app. Supports public repos (no token) and private repos (with a Personal Access Token).
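One common convention for private HTTPS remotes is embedding the Personal Access Token in the URL (GitHub, for example, accepts https://<token>@github.com/user/repo.git). How this server actually stores the token is unknown; the helper below is purely illustrative:

```python
from urllib.parse import urlsplit, urlunsplit

def authenticated_remote(url: str, token: str = "") -> str:
    """Embed a PAT into an HTTPS remote URL for private-repo access.
    Public repos need no token, matching the tool's optional parameter.
    Illustrative only: the server may store credentials separately."""
    if not token:
        return url
    parts = urlsplit(url)
    if parts.scheme != "https":
        raise ValueError("set_git_remote expects an HTTPS URL")
    return urlunsplit(parts._replace(netloc=f"{token}@{parts.netloc}"))
```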

Parameters (JSON Schema)

- app (required): App ID (UUID) or subdomain
- url (required): HTTPS URL of the remote repository (e.g. https://github.com/user/repo.git)
- token: Personal Access Token for private repos and push access (optional)
- branch: Branch name to use for pull/push (e.g. "main", "master"). If omitted, auto-detects from remote.

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries full burden. It mentions authentication needs (token for private repos) which is valuable, but doesn't disclose other behavioral traits: whether this overwrites existing remotes, what happens on failure, if it validates the URL format, or any rate limits. For a mutation tool with zero annotation coverage, this leaves significant gaps.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is brief and efficient, with zero waste. It front-loads the core purpose and immediately adds key operational context (public/private repo support). Every word earns its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a mutation tool with 4 parameters, no annotations, and no output schema, the description is minimally adequate. It covers the basic purpose and auth context but lacks details on behavior, error handling, or return values. Given the complexity and missing structured data, it should do more to be fully complete.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents all 4 parameters thoroughly. The description adds marginal value by implying 'token' is optional and contextualizing its use for private repos, but doesn't provide additional syntax, format, or constraint details beyond what the schema specifies. Baseline 3 is appropriate when schema does the heavy lifting.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Set or update') and resource ('upstream Git remote for an app'), specifying the target systems (GitHub/GitLab). It distinguishes from sibling 'remove_git_remote' by being the complementary set/update operation, though it doesn't explicitly contrast with other Git tools like 'git_pull' or 'git_push'.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides implied usage context by mentioning when to use a token (private repos) vs. not (public repos), but doesn't explicitly state when to choose this tool over alternatives like 'git_info' or 'remove_git_remote', nor does it mention prerequisites (e.g., app must exist). The guidance is helpful but incomplete.

set_node_version (Grade A)

Change the Node.js runtime version for an app (18, 20, or 22). Updates package.json engines.node and restarts the container with the new Node.js image.
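The package.json side of this change can be sketched directly from the description. The exact string the server writes to engines.node (e.g. "20" versus ">=20") is an assumption, and the container restart is out of scope here:

```python
def set_engine_version(package_json: dict, version: int) -> dict:
    """Apply the documented package.json change for set_node_version:
    only major versions 18, 20, and 22 are accepted, and engines.node
    is updated. Returns a new dict rather than mutating the input."""
    if version not in (18, 20, 22):
        raise ValueError("supported Node.js major versions are 18, 20, 22")
    updated = dict(package_json)
    engines = dict(updated.get("engines", {}))
    engines["node"] = str(version)  # assumed format; could also be ">=20"
    updated["engines"] = engines
    return updated
```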

Parameters (JSON Schema)

- app (required): App ID (UUID) or subdomain
- version (required): Node.js major version to use

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries full burden and does well by disclosing key behavioral traits: it updates package.json engines.node and restarts the container with the new Node.js image. This reveals the mutation nature and side effects beyond just version setting.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two concise sentences with zero waste. The first sentence states the core purpose with key constraints, and the second sentence explains the implementation details and side effects. Every word earns its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a mutation tool with no annotations and no output schema, the description provides good context about what changes (package.json) and what happens (container restart). It could mention potential downtime or error conditions, but covers the essential behavioral aspects well.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, providing good documentation for both parameters. The description adds context about the version being a 'major version' and mentions the specific values, but doesn't significantly enhance the schema's parameter semantics beyond what's already documented.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('Change the Node.js runtime version'), the target resource ('for an app'), and the allowed values ('18, 20, or 22'). It distinguishes from siblings like 'update_app' or 'restart_app' by focusing specifically on Node.js version management.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage when needing to change Node.js versions for an app, but doesn't explicitly state when to use this vs alternatives like 'update_app' or 'restart_app'. No guidance on prerequisites or exclusions is provided.

set_token (Grade A)

Restore an authenticated session using a previously saved JWT token. Call this at the start of a new session before any other tools, using a token saved from a prior check_login call. If the token is invalid, fall back to login.
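The restore-then-fallback flow the description prescribes looks like this in client pseudologic; set_token and begin_login below are stand-ins for the real tool calls, and the broad exception catch is a simplification:

```python
def bootstrap_session(saved_token, set_token, begin_login):
    """Try to restore a session from a saved JWT first; fall back to
    the interactive login flow if the token is missing or rejected.
    The callables stand in for the corresponding MCP tool calls."""
    if saved_token:
        try:
            return set_token(saved_token)
        except Exception:
            pass  # invalid or expired token: fall through to login
    return begin_login()
```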

Parameters (JSON Schema)

- token (required): The JWT token saved from a previous bonto_check_login response

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It effectively describes the tool's purpose (session restoration), prerequisites (token from check_login), and fallback behavior. However, it doesn't mention potential side effects like session expiration or error handling details beyond the fallback.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is front-loaded with the core purpose in the first sentence, followed by usage guidelines and fallback behavior. Every sentence adds value with no redundant information, making it highly efficient and well-structured.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a single-parameter tool with no annotations or output schema, the description provides strong contextual completeness by explaining the tool's role in authentication flow, prerequisites, and fallback. A minor gap is the lack of detail on what happens after successful restoration (e.g., session duration).

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema description coverage is 100%, so the baseline is 3. The description adds meaningful context by specifying that the token should be 'saved from a previous bonto_check_login response', which clarifies the parameter's origin and format beyond the schema's generic description.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('Restore an authenticated session') and the resource involved ('using a previously saved JWT token'). It distinguishes from sibling tools like 'login' by focusing on session restoration rather than initial authentication.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit usage instructions: 'Call this at the start of a new session before any other tools' and specifies the source of the token ('saved from a prior check_login call'). It also mentions fallback behavior ('If the token is invalid, fall back to login'), giving clear alternatives.

start_app (Grade A)

Start a stopped or sleeping app by scaling its Docker service to 1 replica.

Parameters (JSON Schema)

- app (required): App ID (UUID) or subdomain

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries full burden. It mentions the action and mechanism but lacks details on permissions required, side effects (e.g., if it affects other services), rate limits, or error conditions. For a mutation tool with zero annotation coverage, this is a significant gap in behavioral disclosure.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that front-loads the purpose ('Start a stopped or sleeping app') and includes the mechanism. There is no wasted text, making it highly concise and well-structured.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (a mutation with no annotations and no output schema), the description is adequate but incomplete. It covers the action and target but lacks details on outcomes, errors, or dependencies, which are important for an agent to use it correctly in context.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema description coverage is 100%, with the single parameter 'app' documented as 'App ID (UUID) or subdomain'. The description does not add meaning beyond this, such as examples or constraints, so it meets the baseline for high schema coverage without extra value.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Start') and target ('a stopped or sleeping app'), specifying the mechanism ('by scaling its Docker service to 1 replica'). It distinguishes from siblings like 'restart_app' (which implies restarting a running app) and 'stop_app' (the opposite action).

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implicitly indicates when to use it ('a stopped or sleeping app'), providing clear context. However, it does not explicitly state when not to use it (e.g., for a running app) or name alternatives like 'restart_app', which slightly limits guidance.

stop_app (Grade A)

Stop a running app by scaling its Docker service to 0 replicas. The app will become inaccessible to visitors until started again.
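Taken together, the start_app and stop_app descriptions frame app lifecycle as Docker service scaling (roughly docker service scale <svc>=N). The mapping is simply:

```python
def desired_replicas(action: str) -> int:
    """Map the documented lifecycle tools to their replica counts:
    start_app scales the Docker service to 1, stop_app to 0. Other
    actions (e.g. restart) are handled by different tools."""
    replicas = {"start_app": 1, "stop_app": 0}
    if action not in replicas:
        raise ValueError(f"unknown lifecycle action: {action}")
    return replicas[action]
```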

Parameters (JSON Schema)

- app (required): App ID (UUID) or subdomain

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries full burden. It discloses the key behavioral consequence ('The app will become inaccessible to visitors until started again'), which is crucial for a destructive operation. However, it doesn't mention permission requirements, rate limits, or whether the stop is reversible (though implied by 'until started again').

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two concise sentences with zero waste. The first sentence states the action and mechanism, the second explains the consequence. Every word earns its place, and the information is front-loaded appropriately.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a destructive operation with no annotations and no output schema, the description provides adequate context about what the tool does and its immediate effect. However, it doesn't cover error conditions, confirmation requirements, or what happens to background processes, leaving some gaps for a mutation tool.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 100% schema description coverage and only 1 parameter, the schema already fully documents the 'app' parameter. The description adds no additional parameter semantics beyond what's in the schema, but with minimal parameters and complete schema coverage, this meets baseline expectations.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('Stop a running app') and mechanism ('by scaling its Docker service to 0 replicas'), distinguishing it from siblings like restart_app or delete_app. It precisely identifies the resource being acted upon and the technical implementation.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for when to use this tool ('Stop a running app') and implicitly contrasts with start_app by mentioning the app becomes inaccessible until started again. However, it doesn't explicitly name alternatives or provide when-not-to-use guidance beyond the obvious.

update_app (Grade C)

Update an app's settings: name, subdomain, environment variables, resource limits, access control, runtime configuration, and more.
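Note that env_vars, workspace_mounts, and sso_auth_allowed_users are JSON strings, not native objects, so a client must serialize them before calling the tool. A small hypothetical helper makes that explicit; the argument assembly is the only point being illustrated:

```python
import json

def build_update_args(app, env=None, mounts=None, **settings):
    """Assemble an update_app argument dict. Per the parameter table,
    env_vars must be a JSON string (and it replaces ALL existing env
    vars) and workspace_mounts a JSON array string, so both are
    serialized rather than passed as native objects."""
    args = {"app": app, **settings}
    if env is not None:
        args["env_vars"] = json.dumps(env)             # replaces all existing vars
    if mounts is not None:
        args["workspace_mounts"] = json.dumps(mounts)  # '[]' clears all mounts
    return args
```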

Parameters (JSON Schema) — only app is required; all other parameters are optional.

- app (required): App ID (UUID) or subdomain
- name: New display name (1-100 chars)
- env_vars: Environment variables as a JSON string of key-value pairs (replaces all existing env vars)
- always_on: Keep the app running 24/7 without auto-sleep. Requires a billing tier that supports always-on apps.
- cpu_limit: CPU limit (0.5 to 4 cores)
- subdomain: Rename the app subdomain (WARNING: changes the app URL and recreates the Docker service — use with care)
- memory_limit: Memory limit: "512M", "768M", or "1G"
- start_script: Which package.json script to run (e.g. "dev", "serve"). Defaults to "start". Set to empty string to reset to default.
- watch_ignore: Comma-separated paths to ignore for file watching (e.g. "public,dist,output"). Always ignores node_modules.
- sso_auth_mode: "any" = any Bonto user can access; "list" = only users in sso_auth_allowed_users.
- allow_remixing: Allow other Bonto users to create a copy (remix) of this app.
- git_auto_commit: Enable/disable automatic version history snapshots when files change. Default: true.
- sso_auth_enabled: Enable/disable Bonto SSO login requirement. When enabled, visitors must sign in with a Bonto account.
- watch_extensions: Comma-separated file extensions that trigger nodemon restart (e.g. "js,ts"). Leave empty for the default (js,ts,json,html,css).
- workspace_mounts: JSON array of app IDs to mount as read-write volumes at /app/workspace/[slug]/. E.g. '["uuid1","uuid2"]'. Pass '[]' to remove all mounts.
- http_auth_enabled: Enable or disable HTTP Basic Auth protection. Set to false to remove password protection without changing stored credentials.
- http_auth_password: Password for HTTP Basic Auth (plaintext, will be hashed). Min 8 chars. Set to enable Basic Auth on this app.
- http_auth_username: Username for HTTP Basic Auth. Set together with http_auth_password to enable password protection.
- sso_auth_allowed_users: JSON array of Bonto user email addresses allowed when sso_auth_mode is "list". E.g. '["alice@example.com","bob@example.com"]'.
- healthcheck_timeout_secs: Timeout (seconds) for each healthcheck request. Default: 10.
- healthcheck_interval_secs: How often (seconds) to check if the app is alive. Default: 30.
- healthcheck_start_period_secs: Grace period (seconds) before the first healthcheck runs after container start. Default: 30.
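Several of these parameters carry JSON encoded as a string rather than a nested object (env_vars, workspace_mounts, sso_auth_allowed_users), which is easy to get wrong on a first call. A minimal sketch of building a valid argument payload for update_app — the field names come from the table above, while the app subdomain and values are illustrative assumptions:

```python
import json

# Arguments for a hypothetical update_app call. Note that env_vars and
# workspace_mounts are JSON *strings*, not nested JSON objects/arrays.
args = {
    "app": "my-blog",                    # app ID (UUID) or subdomain (assumed)
    "name": "My Blog",                   # 1-100 chars
    "env_vars": json.dumps({"NODE_ENV": "production", "PORT": "3000"}),
    "cpu_limit": 1.0,                    # within the 0.5 to 4 core range
    "memory_limit": "768M",              # one of "512M", "768M", "1G"
    "workspace_mounts": json.dumps([]),  # '[]' removes all mounts
}

# env_vars must survive a JSON round-trip as a flat string-to-string map,
# since the tool replaces all existing env vars with this set
decoded = json.loads(args["env_vars"])
assert decoded == {"NODE_ENV": "production", "PORT": "3000"}
```

Because env_vars replaces the whole set, a client that wants to change one variable should first read the current values, merge, and then send the full map back.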
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden of behavioral disclosure. It states the tool updates settings but doesn't describe critical behaviors: whether changes are immediate or require a restart, if updates are atomic or partial, what permissions are needed, potential side effects (e.g., subdomain changes recreating Docker services as hinted in schema), or error handling. The description is minimal and lacks operational context for a mutation tool.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that front-loads the core action ('Update an app's settings') and provides a representative list of fields. There's no wasted text, and it's appropriately sized for a tool with a well-documented schema. However, it could be slightly more structured by grouping related settings or adding brief context.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (22 parameters, mutation tool) and lack of annotations or output schema, the description is incomplete. It doesn't address behavioral aspects like how updates are applied, what the response looks like, or error conditions. For a tool with many parameters and no structured safety hints, more descriptive context is needed to guide effective use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents all 22 parameters thoroughly. The description adds marginal value by listing example fields (name, subdomain, environment variables, etc.), but this doesn't provide additional semantics beyond what's in the schema (e.g., it doesn't explain interactions between parameters or default behaviors). Baseline 3 is appropriate as the schema does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb ('Update') and resource ('app's settings'), and lists specific fields that can be modified (name, subdomain, environment variables, etc.). It distinguishes from siblings like 'create_app' (creation vs. update) and 'restart_app' (runtime vs. configuration), though it doesn't explicitly mention these distinctions. The purpose is specific but could be more precise about what 'update' entails operationally.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. It doesn't mention prerequisites (e.g., needing an existing app), exclusions (e.g., what can't be updated), or comparisons to siblings like 'get_app' (for viewing settings) or 'restart_app' (for applying changes). Usage is implied by the verb 'update,' but no explicit context or decision criteria are provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

upload_file (Grade: A)

Get a direct upload URL for uploading a binary or text file to an app. Returns a one-time URL valid for 5 minutes — use the Bash tool to POST the file with curl (no base64, raw binary). Works for images, fonts, archives, or any file type.

Parameters (JSON Schema)

- app (required): App ID (UUID) or subdomain
- path (required): Destination file path in the app (e.g. "public/logo.png", "assets/font.woff2")
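The description implies a two-step flow: call upload_file to get a one-time URL, then POST the raw bytes to it within 5 minutes. A sketch of the second step, assuming the tool returns a field like upload_url (the response shape is an assumption, since no output schema is published):

```python
import urllib.request

def build_upload_request(upload_url: str, file_path: str) -> urllib.request.Request:
    """Build a POST of the file's raw bytes (no base64), per the tool description."""
    with open(file_path, "rb") as f:
        data = f.read()
    return urllib.request.Request(
        upload_url,
        data=data,
        method="POST",
        headers={"Content-Type": "application/octet-stream"},
    )

# upload_url below stands in for the one-time URL returned by upload_file;
# it expires after 5 minutes, so send the request promptly:
#   urllib.request.urlopen(build_upload_request(upload_url, "logo.png"))
```

The same step could be done with `curl --data-binary @logo.png <upload_url>` from a shell, which is what the description suggests via the Bash tool.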
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It effectively describes key traits: it returns a one-time URL valid for 5 minutes, requires a separate curl command for the upload, and works for various file types. However, it lacks details on permissions, rate limits, or error handling, which are important for a mutation tool.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is appropriately sized and front-loaded, with the first sentence stating the core purpose. Each subsequent sentence adds valuable information (URL validity, usage instructions, file type support) without waste, making it efficient and easy to parse.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (mutation with no annotations and no output schema), the description is somewhat complete but has gaps. It explains the upload process and URL behavior but lacks details on response format, error cases, or integration with sibling tools, which could hinder an agent's ability to use it correctly in all contexts.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents both parameters ('app' and 'path') adequately. The description does not add any additional meaning or examples beyond what the schema provides, such as clarifying the 'path' parameter's role in the upload process, but it meets the baseline for high schema coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with specific verbs ('Get a direct upload URL for uploading a binary or text file to an app') and distinguishes it from sibling tools like 'write_file' by focusing on URL generation rather than direct file writing. It explicitly mentions the resource (app) and scope (any file type).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context on when to use this tool (uploading files via a one-time URL) and mentions using the Bash tool with curl for the actual upload. However, it does not say when NOT to use it, nor does it name alternatives such as the sibling 'write_file' tool, which writes file content directly.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

write_file (Grade: A)

Create or update a file in an app. Parent directories are created automatically. Maximum content size: 1MB. The app will auto-restart when files change.

Parameters (JSON Schema)

- app (required): App ID (UUID) or subdomain
- path (required): File path to write (e.g. "index.js", "src/app.ts")
- content (required): File content to write
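Since content is capped at 1MB, a client might validate the payload size before calling write_file instead of discovering the limit from an error. A sketch under two assumptions: content is measured as UTF-8 bytes, and 1MB means 1024 * 1024 bytes (neither is confirmed by the description):

```python
MAX_CONTENT_BYTES = 1024 * 1024  # assumed interpretation of write_file's "1MB" cap

def check_write_args(app: str, path: str, content: str) -> dict:
    """Build write_file arguments, rejecting content over the assumed 1MB limit."""
    size = len(content.encode("utf-8"))
    if size > MAX_CONTENT_BYTES:
        raise ValueError(f"content is {size} bytes; write_file accepts at most 1MB")
    return {"app": app, "path": path, "content": content}

# "my-blog" is a hypothetical subdomain for illustration
args = check_write_args("my-blog", "src/app.ts", "export const x = 1;\n")
```

For payloads over the limit (or binary assets), the sibling upload_file tool's one-time URL flow is the natural fallback.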
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries full burden and discloses key behavioral traits: it creates/updates files, auto-creates parent directories, has a 1MB content limit, and triggers app auto-restart. This covers mutation effects, constraints, and side-effects, though it lacks details on error handling or permissions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is front-loaded with the core purpose, followed by important behavioral details in three concise sentences. Each sentence adds value (auto-directory creation, size limit, restart effect) with zero waste or redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a mutation tool with no annotations and no output schema, the description is fairly complete: it explains the operation, key behaviors, and constraints. However, it doesn't cover error cases (e.g., invalid app ID, path issues) or response format, leaving some gaps in context.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema fully documents parameters (app, path, content). The description adds no additional parameter semantics beyond what's in the schema, such as format examples or constraints beyond the 1MB limit mentioned generally. Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'Create or update' and the resource 'a file in an app', distinguishing it from sibling tools like 'upload_file' (which might handle external uploads) and 'read_file' (which is read-only). It specifies the exact operation without ambiguity.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for file creation/updates in apps, with context like automatic directory creation and app restart on changes. However, it doesn't explicitly state when to use this vs. alternatives like 'upload_file' or 'create_directory', nor does it mention prerequisites (e.g., app must exist).

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
