Glama

Server Details

Agent-native web hosting — deploy sites, manage DNS, register domains, scale infrastructure

Status: Healthy
Last Tested:
Transport: Streamable HTTP
URL:
Repository: alainsvrd/borealhost-mcp
GitHub Stars: 0

Glama MCP Gateway

Connect through Glama MCP Gateway for full control over tool access and complete visibility into every call.

MCP client -> Glama -> MCP server

Full call logging

Every tool call is logged with complete inputs and outputs, so you can debug issues and audit what your agents are doing.

Tool access control

Enable or disable individual tools per connector, so you decide what your agents can and cannot do.

Managed credentials

Glama handles OAuth flows, token storage, and automatic rotation, so credentials never expire on your clients.

Usage analytics

See which tools your agents call, how often, and when, so you can understand usage patterns and catch anomalies.

100% free. Your data is private.
Tool Descriptions: A

Average 4.5/5 across 82 of 82 tools scored. Lowest: 3.6/5.

Server Coherence: A

Disambiguation: 4/5

Most tools have distinct purposes targeting specific resources and actions, such as add_cron vs. add_ssh_key or list_snapshots vs. get_snapshot_usage. However, some overlap exists, like manage_dns and add_domain_dns/delete_domain_dns, which could cause confusion about which to use for DNS operations.

Naming Consistency: 5/5

Tool names follow a highly consistent verb_noun pattern throughout, such as create_backup, list_domains, update_account, and delete_file. There are no deviations in naming conventions, making the set predictable and readable.

Tool Count: 2/5

With 82 tools, the count is excessive for a hosting server, making it overwhelming and difficult for agents to navigate. While the domain is broad, many tools could be consolidated or omitted without losing functionality, indicating poor scoping.

Completeness: 5/5

The tool set provides comprehensive coverage for hosting management, including site provisioning, DNS, backups, snapshots, file operations, database management, billing, and WordPress-specific functions. There are no obvious gaps, and CRUD/lifecycle operations are well-supported across all domains.

Available Tools

95 tools
add_cron: A

Add a cron job to a site.

Requires: API key with write scope.

Args:
    slug: Site identifier
    schedule: Cron schedule (e.g. "*/5 * * * *", "0 2 * * *")
    command: Command to execute

Returns: {"added": true, "result": {...}}
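
As a rough illustration of how a client might prepare a call to this tool, the sketch below checks that the schedule has five whitespace-separated fields and wraps the arguments in an MCP `tools/call` request. The helper name `build_add_cron_request`, the slug `my-site`, and the sample command are hypothetical, and the five-field check is an assumption; the server's own validation rules are not documented here.

```python
import json


def build_add_cron_request(slug, schedule, command, request_id=1):
    """Build a JSON-RPC tools/call request for add_cron (hypothetical helper)."""
    # Assumption: schedules use the classic five-field cron layout,
    # as in the documented examples "*/5 * * * *" and "0 2 * * *".
    fields = schedule.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 cron fields, got {len(fields)}")
    if not command.strip():
        raise ValueError("command must not be empty")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "add_cron",
            "arguments": {"slug": slug, "schedule": schedule, "command": command},
        },
    }


if __name__ == "__main__":
    # "my-site" and the command are placeholders, not values from the listing.
    request = build_add_cron_request("my-site", "*/5 * * * *", "echo hello")
    print(json.dumps(request, indent=2))
```

Checking the schedule before sending avoids spending a write-scoped call on an obviously malformed expression.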

Parameters

Name      Required  Description  Default
slug      Yes       -            -
command   Yes       -            -
schedule  Yes       -            -

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It mentions the authentication requirement (API key with write scope), which is valuable context. However, it doesn't address other important behavioral aspects like rate limits, error conditions, whether the operation is idempotent, or what happens if a cron job with the same schedule already exists.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is efficiently structured with clear sections (purpose, requirements, arguments, returns). Each sentence serves a distinct purpose with zero wasted words. The information is front-loaded with the core purpose, followed by essential details in a logical flow.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a mutation tool with no annotations and no output schema, the description provides adequate but incomplete coverage. It explains the purpose, requirements, parameters, and return structure, but lacks details about error handling, side effects, or the specific content of the result object. Given the complexity of adding a cron job (which involves scheduling and execution), more behavioral context would be beneficial.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage, the description must compensate for the lack of parameter documentation in the schema. It provides clear explanations for all three parameters: 'slug' as 'Site identifier', 'schedule' with cron format examples, and 'command' as 'Command to execute'. This adds substantial meaning beyond what the bare schema provides, though it could benefit from more detail about slug format constraints.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('Add a cron job') and the target resource ('to a site'), distinguishing it from sibling tools like 'list_cron' or 'delete_cron'. It uses precise terminology that leaves no ambiguity about the tool's function.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly states a prerequisite ('Requires: API key with write scope'), providing clear context for when this tool can be used. However, it doesn't specify when to use this tool versus alternatives like 'list_cron' or 'delete_cron', nor does it mention any exclusions or specific scenarios where it shouldn't be used.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

add_domain_dns: A

Add a DNS record to a domain.

Requires: API key with write scope.

Args:
    domain_name: Full domain name (e.g. "example.com")
    record_type: "A", "AAAA", "CNAME", "MX", "TXT", or "SRV"
    value: Record value (e.g. "1.2.3.4" for A, "mail.example.com" for MX)
    subdomain: Subdomain part (e.g. "www", "mail"). Empty for apex domain.
    ttl: Time to live in seconds (default: 3600)
    priority: MX priority (required for MX records)

Returns: {"success": true, "record": {"id": "...", "type": "A", "subdomain": "www", "value": "1.2.3.4", "ttl": 3600}}

Errors:
    VALIDATION_ERROR: Missing value, invalid record type
    NOT_FOUND: Domain not found
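
The parameter rules above (allowed record types, the 3600-second default TTL, and the MX priority requirement) could be checked on the client before the call is made. The following is a minimal sketch under that assumption; `build_dns_record_args` is a hypothetical helper, not part of the server, and the server may enforce additional rules.

```python
ALLOWED_RECORD_TYPES = {"A", "AAAA", "CNAME", "MX", "TXT", "SRV"}


def build_dns_record_args(domain_name, record_type, value,
                          subdomain="", ttl=3600, priority=None):
    """Assemble add_domain_dns arguments (hypothetical client-side helper)."""
    if record_type not in ALLOWED_RECORD_TYPES:
        raise ValueError(f"invalid record type: {record_type!r}")
    if not value:
        raise ValueError("value is required")
    if record_type == "MX" and priority is None:
        raise ValueError("priority is required for MX records")
    args = {
        "domain_name": domain_name,
        "record_type": record_type,
        "value": value,
        "subdomain": subdomain,  # empty string targets the apex domain
        "ttl": ttl,              # 3600 is the documented default
    }
    if priority is not None:
        args["priority"] = priority
    return args


if __name__ == "__main__":
    print(build_dns_record_args("example.com", "A", "1.2.3.4", subdomain="www"))
    print(build_dns_record_args("example.com", "MX", "mail.example.com", priority=10))
```

The two client-side failures mirror the documented VALIDATION_ERROR cases; NOT_FOUND can only be detected by the server.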

Parameters

Name         Required  Description  Default
ttl          No        -            -
value        Yes       -            -
priority     No        -            -
subdomain    No        -            -
domain_name  Yes       -            -
record_type  Yes       -            -

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It effectively describes the write operation ('Add'), authentication requirements ('API key with write scope'), and includes error cases (VALIDATION_ERROR, NOT_FOUND) and return format. However, it doesn't mention rate limits, idempotency, or side effects beyond the immediate addition.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with clear sections (description, requirements, args, returns, errors) and uses bullet points effectively. While comprehensive, it could be slightly more concise by combining some explanations, but every sentence earns its place by adding necessary information.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity of a DNS record creation tool with 6 parameters, 0% schema coverage, no annotations, and no output schema, the description provides complete context. It covers purpose, prerequisites, all parameters with semantics, return format, and error cases—everything needed for correct tool invocation.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage, the description fully compensates by providing detailed parameter explanations. It documents all 6 parameters with examples, clarifies defaults (ttl: 3600), specifies required conditions (priority required for MX), and explains domain/subdomain relationships. This adds substantial value beyond the bare schema.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('Add a DNS record') and resource ('to a domain'), distinguishing it from sibling tools like 'delete_domain_dns' or 'list_domain_dns'. It provides a complete verb+resource+scope statement in the first sentence.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly states when to use this tool ('Add a DNS record to a domain') and provides prerequisites ('Requires: API key with write scope'), but doesn't explicitly mention when NOT to use it or name specific alternatives from the sibling list (e.g., 'delete_domain_dns' for removal).

add_firewall_rule: A

Add an IP firewall rule (allow or deny) and reload Nginx.

Supports IPv4, IPv6, and CIDR notation. Max 100 rules per site. If a rule already exists for the IP, the action is updated.

Requires: API key with write scope.

Args:
    slug: Site identifier
    ip: IP address or CIDR (e.g. "1.2.3.4", "10.0.0.0/8", "2001:db8::/32")
    action: "deny" (block) or "allow" (whitelist). Default: "deny"

Returns: {"added": true, "ip": "1.2.3.4", "action": "deny"}
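
Because each call reloads Nginx, it may be worth rejecting malformed input locally first. The sketch below uses Python's standard `ipaddress` module to accept the same kinds of values the description lists (single IPv4/IPv6 addresses and CIDR blocks). `build_firewall_rule_args` and the slug `my-site` are hypothetical, and the server's parser may be stricter or looser than this check.

```python
import ipaddress


def build_firewall_rule_args(slug, ip, action="deny"):
    """Validate and assemble add_firewall_rule arguments (hypothetical helper)."""
    if action not in ("deny", "allow"):
        raise ValueError(f"action must be 'deny' or 'allow', got {action!r}")
    # ip_network accepts single addresses and CIDR blocks, IPv4 or IPv6.
    # strict=False tolerates host bits being set, e.g. "10.1.2.3/8".
    # It raises ValueError for anything that is not an address or network.
    ipaddress.ip_network(ip, strict=False)
    return {"slug": slug, "ip": ip, "action": action}


if __name__ == "__main__":
    for candidate in ("1.2.3.4", "10.0.0.0/8", "2001:db8::/32"):
        print(build_firewall_rule_args("my-site", candidate))
```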

Parameters

Name    Required  Description  Default
ip      Yes       -            -
slug    Yes       -            -
action  No        -            deny

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden and does well by disclosing key behaviors: it reloads Nginx after adding the rule, supports IPv4/IPv6/CIDR, has a max of 100 rules per site, updates existing rules instead of failing, and requires an API key with write scope. It doesn't mention rate limits or error conditions, but covers most critical operational aspects.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is efficiently structured with a clear purpose statement first, followed by behavioral details, prerequisites, and parameter explanations. Every sentence earns its place—no fluff. The use of sections (Args, Returns) enhances readability without unnecessary verbosity.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a mutation tool with no annotations and no output schema, the description is remarkably complete. It covers purpose, behavior, prerequisites, parameters with examples, and even includes a return value example. Given the complexity (firewall rule management with Nginx reload), it provides all necessary context for safe and effective use.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage, the description fully compensates by providing detailed parameter semantics. It explains each parameter: 'slug' as site identifier, 'ip' with examples of formats, and 'action' with allowed values and default. This adds significant meaning beyond the bare schema, making parameters completely understandable.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('Add an IP firewall rule'), the resource ('Nginx'), and the scope ('allow or deny'). It distinguishes itself from sibling tools like 'remove_firewall_rule' by specifying it's for adding/updating rules, not removing them. The purpose is explicit and well-defined.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for when to use this tool: for adding or updating firewall rules with IP/CIDR inputs. It mentions the sibling 'list_firewall_rules' implicitly by noting the max 100 rules per site, suggesting checking existing rules first. However, it doesn't explicitly state when NOT to use it or name alternatives like 'remove_firewall_rule' for deletion scenarios.

add_ssh_key: A

Inject your SSH public key into a site's container for direct SSH access.

The key is appended to /home/admin/.ssh/authorized_keys. Only available for VPS/dedicated plans.

Requires: API key with write scope.

Args:
    slug: Site identifier
    public_key: SSH public key string. Supported types: ssh-ed25519, ssh-rsa, ecdsa-sha2-nistp256/384/521

Returns: {"success": true, "message": "SSH key added", "ssh_command": "ssh admin@184.107.x.x"}

Errors:
    VALIDATION_ERROR: Invalid or unsupported key format
    FORBIDDEN: Plan does not support SSH
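
A client could screen keys for the supported types before calling the tool. The sketch below assumes the usual OpenSSH single-line layout (`<type> <base64-data> [comment]`) and only checks the type name and that the data field decodes as base64; `check_public_key` is a hypothetical helper, and the server's validation is likely more thorough.

```python
import base64
import binascii

# Key types named in the tool description, with the ecdsa shorthand expanded.
SUPPORTED_KEY_TYPES = {
    "ssh-ed25519",
    "ssh-rsa",
    "ecdsa-sha2-nistp256",
    "ecdsa-sha2-nistp384",
    "ecdsa-sha2-nistp521",
}


def check_public_key(public_key):
    """Return the key type if the string looks like a supported public key."""
    parts = public_key.split()
    if len(parts) < 2:
        raise ValueError("expected '<type> <base64-data> [comment]'")
    key_type, blob = parts[0], parts[1]
    if key_type not in SUPPORTED_KEY_TYPES:
        raise ValueError(f"unsupported key type: {key_type!r}")
    try:
        # validate=True rejects characters outside the base64 alphabet.
        base64.b64decode(blob, validate=True)
    except binascii.Error as exc:
        raise ValueError("key data is not valid base64") from exc
    return key_type
```

A key that passes this check can still be rejected with FORBIDDEN, since plan eligibility is only known to the server.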

Parameters

Name        Required  Description  Default
slug        Yes       -            -
public_key  Yes       -            -

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries full burden and does well by disclosing: the mutation nature ('inject'), specific file location ('/home/admin/.ssh/authorized_keys'), plan restrictions, authentication requirements, and error conditions. It doesn't mention rate limits or idempotency, but covers most critical behavioral aspects.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Well-structured with purpose first, then implementation details, requirements, parameters, returns, and errors. Every sentence earns its place, though the parameter documentation could be slightly more integrated with the main flow rather than in a separate 'Args' section.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a 2-parameter mutation tool with no annotations or output schema, the description provides complete context: purpose, usage constraints, authentication needs, parameter details, return format, and error conditions. It leaves no significant gaps for agent understanding.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description must compensate fully. It provides detailed semantics for both parameters: 'slug' as 'Site identifier' and 'public_key' with specific format requirements and supported key types. This adds substantial value beyond the bare schema.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('inject your SSH public key'), target resource ('site's container'), and purpose ('for direct SSH access'). It distinguishes from sibling tools like 'get_ssh_info' by focusing on adding rather than retrieving SSH information.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states when to use ('Only available for VPS/dedicated plans') and prerequisites ('Requires: API key with write scope'). It also implicitly distinguishes from alternatives by specifying the exact behavior (appending to authorized_keys), though it doesn't name specific sibling alternatives.

cache_flush: A

Flush all caches (Redis + WP object cache).

Requires: API key with write scope.

Args:
    slug: Site identifier

Returns: {"flushed": true}

Parameters

Name  Required  Description  Default
slug  Yes       -            -

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries full burden and does well by disclosing critical behavioral traits: it's a destructive operation ('Flush all caches'), requires specific permissions ('API key with write scope'), and shows the expected return format. It doesn't mention rate limits or error conditions, but covers the essential safety profile.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is perfectly structured with clear sections (action, requirements, args, returns) in just four lines. Every sentence earns its place with zero wasted words, making it easy to scan and understand quickly.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a destructive operation with no annotations and no output schema, the description does well by covering purpose, prerequisites, parameter meaning, and return format. It could be more complete by mentioning potential side effects or error cases, but provides sufficient context for safe invocation.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage for the single parameter, the description compensates by explaining that 'slug' is a 'Site identifier', adding meaningful context beyond the bare schema. This clarifies what the parameter represents, though it could provide more detail about format or examples.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('Flush all caches') and resources involved ('Redis + WP object cache'), distinguishing it from sibling tools like cache_status (monitoring) and cache_toggle (enabling/disabling). It uses precise technical terminology that unambiguously defines the tool's function.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit prerequisites ('Requires: API key with write scope') which gives clear context for when this tool can be used. However, it doesn't specify when to use this versus alternatives like cache_toggle or differentiate from other cache-related operations, missing explicit sibling differentiation.

cache_status: A

Get cache status (Redis, WP object cache, hit rates).

Requires: API key with read scope.

Args:
    slug: Site identifier

Returns: {"redis_running": true, "object_cache_enabled": true, "hit_rate": 0.95, "memory_used_mb": 12}
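
To show how an agent might act on this result, the sketch below turns the documented fields into a list of warnings. `summarize_cache_status` and the 0.8 hit-rate threshold are assumptions made for illustration; the listing does not say what counts as a healthy hit rate.

```python
def summarize_cache_status(status, min_hit_rate=0.8):
    """Return human-readable warnings for a cache_status result (hypothetical helper)."""
    warnings = []
    if not status.get("redis_running", False):
        warnings.append("Redis is not running")
    if not status.get("object_cache_enabled", False):
        warnings.append("WP object cache is disabled")
    hit_rate = status.get("hit_rate")
    # The threshold is an arbitrary illustration, not a documented value.
    if hit_rate is not None and hit_rate < min_hit_rate:
        warnings.append(f"hit rate {hit_rate:.0%} is below {min_hit_rate:.0%}")
    return warnings


if __name__ == "__main__":
    # The example payload from the tool description produces no warnings.
    documented = {"redis_running": True, "object_cache_enabled": True,
                  "hit_rate": 0.95, "memory_used_mb": 12}
    print(summarize_cache_status(documented))
```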

Parameters

Name  Required  Description  Default
slug  Yes       -            -

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It discloses the required API key scope ('read scope'), which is useful for authentication needs. However, it doesn't mention other behavioral traits like rate limits, error handling, or whether the operation is read-only (implied by 'Get' but not explicit). The description adds some value but lacks comprehensive behavioral context.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured and front-loaded: it starts with the core purpose, lists requirements, and details arguments and returns in a clear format. Every sentence adds value without redundancy, making it efficient and easy to parse for an AI agent.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's low complexity (1 parameter, no output schema, no annotations), the description is reasonably complete. It covers purpose, prerequisites, parameters, and return values with examples. However, it lacks explicit confirmation of read-only behavior and doesn't address potential errors or edge cases, which could be beneficial for full completeness.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 0% description coverage, with one parameter 'slug' documented only as a string. The description adds meaning by specifying 'slug: Site identifier,' clarifying its purpose. However, it doesn't provide details like format or examples. Since schema coverage is low, the description compensates partially but not fully, meeting the baseline for minimal parameter info.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Get cache status (Redis, WP object cache, hit rates).' It specifies the verb ('Get') and resources ('cache status'), including specific components like Redis and WP object cache. However, it doesn't explicitly differentiate from sibling tools like 'cache_flush' or 'cache_toggle' beyond the general 'get' vs. 'flush/toggle' distinction implied by the names.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for usage with 'Requires: API key with read scope,' indicating prerequisites. It doesn't explicitly state when to use this tool versus alternatives like 'get_site_status' or 'get_metrics,' but the specificity of 'cache status' implies it's for monitoring caching systems, which is adequate guidance without being exhaustive.

cache_toggle: A

Enable or disable the WordPress object cache.

Requires: API key with write scope.

Args:
    slug: Site identifier
    enable: true to enable, false to disable

Returns: {"enabled": true}

Parameters

Name    Required  Description  Default
slug    Yes       -            -
enable  Yes       -            -

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries full burden and does well by disclosing the mutation nature ('Enable or disable'), API key requirements with write scope, and the expected return format. It doesn't mention rate limits, side effects on site performance, or error conditions, but covers core behavioral aspects adequately.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is perfectly structured with a purpose statement, prerequisites, parameter explanations, and return format—all in four concise lines. Every sentence earns its place with no wasted words, and information is front-loaded appropriately.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a mutation tool with no annotations and no output schema, the description provides good coverage: purpose, prerequisites, parameter meanings, and return format. It could mention potential side effects or error cases, but given the tool's relative simplicity, it's largely complete.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description must fully compensate. It explicitly documents both parameters with clear semantics: 'slug: Site identifier' and 'enable: true to enable, false to disable', adding essential meaning beyond the bare schema titles 'Slug' and 'Enable'.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('Enable or disable') and the target resource ('WordPress object cache'), distinguishing it from sibling tools like cache_flush and cache_status. It provides a complete, unambiguous purpose statement.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description includes a 'Requires' section specifying API key prerequisites, which provides clear context for when to use the tool. However, it does not explicitly differentiate when to use this tool versus alternatives like cache_flush or provide when-not-to-use guidance.

cancel_scheduled_snapshot: A

Cancel a scheduled snapshot.

Requires: API key with write scope.

Args:
    slug: Site identifier
    schedule_id: UUID of the scheduled snapshot to cancel

Returns: {"success": true, "message": "Scheduled snapshot cancelled"}

Errors:
    NOT_FOUND: Schedule not found or already executed
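
Since `schedule_id` is described as a UUID, a client can catch typos before making the call. A minimal sketch using Python's standard `uuid` module follows; `build_cancel_args` is a hypothetical helper, and normalizing the ID to its canonical lowercase form is an assumption about what the server accepts.

```python
import uuid


def build_cancel_args(slug, schedule_id):
    """Validate schedule_id as a UUID and assemble the arguments (hypothetical helper)."""
    try:
        parsed = uuid.UUID(schedule_id)
    except ValueError as exc:
        raise ValueError(f"schedule_id is not a valid UUID: {schedule_id!r}") from exc
    # str() yields the canonical lowercase, hyphenated form.
    return {"slug": slug, "schedule_id": str(parsed)}


if __name__ == "__main__":
    # A freshly generated UUID stands in for a real schedule ID.
    print(build_cancel_args("my-site", str(uuid.uuid4())))
```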

Parameters

Name         Required  Description  Default
slug         Yes       -            -
schedule_id  Yes       -            -

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It effectively discloses key behavioral traits: it's a mutation (implied by 'Cancel'), requires specific permissions ('API key with write scope'), and includes error handling ('NOT_FOUND: Schedule not found or already executed'). It lacks details on rate limits or side effects, but covers core operational aspects.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is appropriately sized and front-loaded, starting with the core purpose, followed by requirements, arguments, returns, and errors in a structured format. Every section adds value without redundancy, making it easy to parse.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (mutation with permissions and error cases), no annotations, and no output schema, the description is complete enough. It covers purpose, prerequisites, parameters, return values, and errors, providing all necessary context for an agent to invoke the tool correctly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description must compensate. It adds meaning by explaining 'slug' as 'Site identifier' and 'schedule_id' as 'UUID of the scheduled snapshot to cancel', which clarifies the purpose of each parameter beyond the schema's generic titles. It does not cover format details (e.g., UUID structure), but provides essential context.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('Cancel') and resource ('a scheduled snapshot'), distinguishing it from sibling tools like 'schedule_snapshot', 'create_snapshot', 'delete_snapshot', and 'rollback_snapshot' by focusing on cancellation of scheduled snapshots specifically.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description states a prerequisite ('Requires: API key with write scope'), which tells the agent what it needs before calling the tool. However, it does not explicitly state when not to use this tool or name alternatives (e.g., 'delete_snapshot' for unscheduled snapshots).

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

claim_api_key (grade A)

Claim an API key using a claim token from the container.

After calling request_api_key(), read the claim token from ~/.borealhost/.claim_token on your container and pass it here.

The token is single-use — once claimed, it cannot be used again. The API key is automatically activated for this MCP session.

Args: claim_token: The claim token string read from the container file

Returns: {"api_key": "bh_...", "key_prefix": "bh_...", "site_slug": "my-site", "scopes": ["read", "write"], "message": "API key created and activated..."}

Errors: VALIDATION_ERROR: Invalid, expired, or already-claimed token
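
The claim step above can be sketched as follows. This is a minimal illustration that assumes the token file holds only the token text, possibly followed by a newline; the description does not specify the file format. `read_claim_token` and `build_claim_arguments` are hypothetical helper names, and the actual `claim_api_key` call is left to the MCP client.

```python
from pathlib import Path

# Path taken from the tool description; the file lives on the container.
DEFAULT_TOKEN_PATH = Path.home() / ".borealhost" / ".claim_token"


def read_claim_token(path: Path = DEFAULT_TOKEN_PATH) -> str:
    """Read the claim token, assuming the file contains only the token text."""
    token = path.read_text(encoding="utf-8").strip()
    if not token:
        raise ValueError(f"claim token file {path} is empty")
    return token


def build_claim_arguments(token: str) -> dict:
    """Build the arguments object for a claim_api_key call (hypothetical helper)."""
    return {"claim_token": token}
```

Because the token is single-use, a caller would typically read it once and pass it straight to `claim_api_key` rather than caching it.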

Parameters
Name | Required | Description | Default
claim_token | Yes | |

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It effectively describes key traits: the token is single-use (a successful call consumes it, so a repeat call with the same token fails), the API key is automatically activated for the session (clarifying the outcome), and it includes error handling details (VALIDATION_ERROR cases). However, it lacks information on rate limits, authentication needs, or side effects beyond token consumption.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured and front-loaded with the core purpose, followed by usage steps, behavioral notes, and structured sections for Args, Returns, and Errors. Each sentence adds value—no wasted words—and the formatting enhances readability without verbosity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (involves token handling and activation), no annotations, and no output schema, the description is highly complete. It covers the purpose, usage workflow, parameter semantics, return values (including example structure), and error cases, providing all necessary context for an agent to invoke it correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 0% description coverage (only titles), but the description compensates fully by explaining the 'claim_token' parameter: it's a string read from a specific container file, derived from 'request_api_key()', and single-use. This adds crucial context beyond the schema, making the parameter's purpose and source clear.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('Claim an API key') and resource ('using a claim token from the container'), distinguishing it from siblings like 'request_api_key' (which presumably generates the token) and 'set_api_key' (which might set an existing key). It explicitly mentions the token source and the outcome of activation.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit guidance on when to use this tool: after calling 'request_api_key()' and reading the token from a specific file path (~/.borealhost/.claim_token). It names the prerequisite tool ('request_api_key') and states a constraint (the token is single-use, so it cannot be reused). This clearly differentiates it from sibling tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

cloudflare_proxy_status (grade A)

Get Cloudflare proxy (CDN) status for a site.

Shows whether traffic is routed through Cloudflare's CDN (orange cloud) or goes direct to origin (grey cloud / DNS-only).

Requires: API key with read scope.

Args: slug: Site identifier

Returns: {"domain": "my-site.borealhost.ai", "has_record": true, "proxied": true, "ip": "1.2.3.4"}
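
A small sketch of how a caller might interpret the response shown above. The field names come from the example return value; treating `has_record: false` as "no DNS record exists for the site" is an assumption, and `describe_proxy_status` is a hypothetical helper name.

```python
def describe_proxy_status(status: dict) -> str:
    """Map a cloudflare_proxy_status response to a short human-readable state."""
    if not status.get("has_record"):
        # Assumed meaning: there is no DNS record to proxy.
        return "no DNS record"
    if status.get("proxied"):
        return "proxied (orange cloud)"
    return "DNS-only (grey cloud)"
```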

Parameters
Name | Required | Description | Default
slug | Yes | |

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It effectively describes the tool's purpose, output format, and authentication requirements ('Requires: API key with read scope'), though it lacks details on rate limits, error handling, or side effects.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured and front-loaded with the core purpose, followed by usage explanation, requirements, and input/output details. Every sentence adds value without redundancy, making it efficient and easy to parse.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's moderate complexity (single parameter, no output schema, no annotations), the description is largely complete: it covers purpose, usage, authentication, parameter meaning, and return format. However, it lacks explicit error cases or operational constraints like rate limits.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 0% description coverage for its single parameter 'slug'. The description compensates by explaining that 'slug' is a 'Site identifier', adding meaningful context beyond the schema's bare 'Slug' title, though it could provide more detail on format or examples.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('Get Cloudflare proxy (CDN) status'), identifies the resource ('for a site'), and distinguishes it from siblings by focusing on Cloudflare proxy status rather than other Cloudflare or site-related operations like 'cloudflare_analytics' or 'get_site_status'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for when to use this tool ('Shows whether traffic is routed through Cloudflare's CDN... or goes direct to origin') and mentions prerequisites ('Requires: API key with read scope'), but does not explicitly state when not to use it or name alternatives among siblings.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

cloudflare_purge_cache (grade A)

Purge Cloudflare CDN cache for a site.

Without urls: purges all cached content for the site's subdomain. With urls: purges only the specified URLs (max 30 per call).

Requires: API key with write scope.

Args:
  slug: Site identifier
  urls: Optional list of specific URLs to purge (e.g. ["https://my-site.borealhost.ai/style.css"])

Returns: {"purged": true, "scope": "host", "domain": "my-site.borealhost.ai"}
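
Since selective purges are capped at 30 URLs per call, a longer list has to be split across several calls. The sketch below shows one way to build the per-call arguments; `purge_batches` is a hypothetical helper, and it treats `None` (full purge) and an empty list (nothing to do) differently so that an empty list never triggers a site-wide purge by accident.

```python
from typing import Iterator, List, Optional

MAX_URLS_PER_CALL = 30  # limit stated in the tool description


def purge_batches(slug: str, urls: Optional[List[str]] = None) -> Iterator[dict]:
    """Yield one cloudflare_purge_cache arguments object per call."""
    if urls is None:
        # Omitting urls purges all cached content for the site's subdomain.
        yield {"slug": slug}
        return
    # Slice the list into consecutive batches of at most 30 URLs.
    for start in range(0, len(urls), MAX_URLS_PER_CALL):
        yield {"slug": slug, "urls": urls[start:start + MAX_URLS_PER_CALL]}
```

Each yielded dictionary would be passed as the arguments of one `cloudflare_purge_cache` call.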

Parameters
Name | Required | Description | Default
slug | Yes | |
urls | No | |

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure and does so effectively. It explains the two operational modes (full purge vs. selective purge), mentions the 30-URL limit per call, and specifies the required API key with write scope. This covers key behavioral aspects like scope, limits, and authentication requirements.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is efficiently structured with clear sections (overview, behavior modes, requirements, args, returns) and every sentence earns its place. It's front-loaded with the core purpose, followed by essential details without redundancy or unnecessary elaboration.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a mutation tool with no annotations and no output schema, the description provides excellent completeness. It covers purpose, behavior modes, authentication requirements, parameter semantics, and even includes a return value example. This gives the agent sufficient context to understand and use the tool correctly despite the lack of structured metadata.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage, the description fully compensates by providing clear semantic meaning for both parameters. It explains that 'slug' is a site identifier and 'urls' is an optional list of specific URLs to purge, including a concrete example. This adds substantial value beyond the bare schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('Purge Cloudflare CDN cache') and resource ('for a site'), distinguishing it from sibling tools like 'cache_flush' or 'cache_status' by specifying Cloudflare CDN context. It provides a precise verb+resource combination that leaves no ambiguity about the tool's function.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context about when to use the tool with or without URLs, explaining the scope difference between purging all cached content vs. specific URLs. However, it doesn't explicitly mention when NOT to use it or name specific alternatives among the sibling tools, though the distinction from other cache-related tools is implied.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

cloudflare_set_proxy (grade A)

Enable or disable Cloudflare CDN proxy for a site.

When enabled (orange cloud): traffic goes through Cloudflare's CDN, gets caching, DDoS protection, and SSL termination at the edge. When disabled (grey cloud): traffic goes directly to origin server.

Requires: API key with write scope.

Args:
  slug: Site identifier
  proxied: true to enable CDN proxy, false to disable

Returns: {"domain": "my-site.borealhost.ai", "proxied": true, "ip": "1.2.3.4"}

Parameters
Name | Required | Description | Default
slug | Yes | |
proxied | Yes | |

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries full burden and does well by disclosing key behavioral traits: it explains the operational consequences of both proxy states (traffic routing, caching, DDoS protection, SSL termination), mentions authentication requirements ('API key with write scope'), and describes the return format. It doesn't mention rate limits or error behaviors, but covers the essential mutation effects.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is efficiently structured with a clear purpose statement, usage explanation, requirements, parameter details, and return example—all in well-organized sections. Every sentence adds value with zero wasted text, and key information is front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a mutation tool with no annotations and no output schema, the description does an excellent job covering purpose, behavior, parameters, and even provides a return example. The main gap is lack of explicit error handling or edge case information, but given the tool's straightforward nature, it's nearly complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage (schema only has titles 'Slug' and 'Proxied'), the description fully compensates by explaining both parameters: 'slug' as 'Site identifier' and 'proxied' as 'true to enable CDN proxy, false to disable'. This adds crucial meaning beyond the bare schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('Enable or disable Cloudflare CDN proxy for a site') and distinguishes it from sibling tools like cloudflare_proxy_status (which likely checks status) and cloudflare_purge_cache (which manages cache). It specifies the exact resource affected (Cloudflare CDN proxy configuration for a site).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for when to use this tool (to enable or disable the CDN proxy) and explains the effects of both states (orange vs grey cloud). However, it doesn't explicitly state when NOT to use it or name specific alternatives among the sibling tools, though the distinction is implied by the tool's unique purpose.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

complete_checkout (grade A)

Complete checkout with payment and start site provisioning.

The checkout must be in "ready" status.

Two payment methods:

  • "stripe_checkout" (default): Returns a short, chat-safe payment URL. Present payment_url to the human — NOT stripe_checkout_url. The raw Stripe URL has a required #fragment that chat UIs routinely strip when rendering markdown links, which causes Stripe to show "page not found". payment_url is a short BorealHost redirect that preserves the fragment via HTTP 302. Then poll get_checkout_status() until status becomes "completed". The API key appears in the first poll after payment (shown once, then cleared).

  • "stripe_payment_method": Charges a Stripe PaymentMethod directly. Requires payment_method_id. On success, returns the API key immediately.

Args:
  checkout_id: Checkout session ID
  payment_method: "stripe_checkout" or "stripe_payment_method"
  payment_method_id: Stripe PaymentMethod ID (pm_...). Required only for "stripe_payment_method".

Returns (stripe_checkout): {"id": "uuid", "status": "awaiting_payment", "payment_url": "https://borealhost.ai/pay//?s=", "stripe_checkout_url": "https://checkout.stripe.com/c/pay/...", "message": "Present payment_url to the human..."}

Returns (stripe_payment_method): {"id": "uuid", "status": "completed", "api_key": "bh_...", "api_key_message": "Store this API key securely...", "subscription_id": "uuid", "provisioning_job_id": "uuid"}

Errors:
  VALIDATION_ERROR: Missing payment_method_id for stripe_payment_method
  FORBIDDEN: Checkout not in "ready" status
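
The "stripe_checkout" flow described above amounts to a polling loop: present `payment_url`, then call `get_checkout_status()` until the status is "completed". A minimal sketch follows. The `get_status` callable stands in for the real tool call, the polling interval and attempt limit are assumptions, and the `api_key` field name in the status response is assumed to match the one shown for "stripe_payment_method".

```python
import time
from typing import Callable, Optional


def wait_for_checkout(
    get_status: Callable[[], dict],
    interval_seconds: float = 5.0,  # assumed polling interval
    max_polls: int = 120,           # assumed upper bound on attempts
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Poll until status is "completed" and return the API key if present.

    The key is shown once and then cleared, so it is captured from the
    first completed response rather than fetched again later.
    """
    for _ in range(max_polls):
        response = get_status()
        if response.get("status") == "completed":
            return response.get("api_key")
        sleep(interval_seconds)
    raise TimeoutError("checkout did not complete within the polling budget")
```

A `None` return would mean the checkout completed but the key was no longer in the response, for example because an earlier poll already consumed it.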

Parameters
Name | Required | Description | Default
checkout_id | Yes | |
payment_method | No | | stripe_checkout
payment_method_id | No | |

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It effectively describes key behaviors: the tool triggers payment and provisioning, has status prerequisites ('ready' status), handles two distinct payment flows with different outcomes (URL vs. immediate API key), includes post-payment steps (polling), and mentions error conditions (VALIDATION_ERROR, FORBIDDEN). It lacks explicit rate limit or authentication details, but covers most critical operational aspects.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with clear sections (overview, prerequisites, payment methods, args, returns, errors) and front-loaded key information. Most sentences earn their place by providing essential operational details. It could be slightly more concise in the returns sections, but overall it's efficient and avoids redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (payment processing, provisioning, multiple flows) and lack of annotations or output schema, the description is highly complete. It covers purpose, prerequisites, parameter semantics, detailed return values for both payment methods, error conditions, and integration steps (e.g., polling another tool). This provides all necessary context for an agent to invoke it correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description must fully compensate. It adds substantial meaning beyond the schema: it explains that 'checkout_id' is a session ID, defines the two possible values for 'payment_method' and their implications, specifies that 'payment_method_id' is a Stripe PaymentMethod ID (pm_...) and is required only for 'stripe_payment_method', and clarifies the default behavior. This provides complete parameter context missing from the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with specific verbs ('complete checkout with payment and start site provisioning'), identifies the resource ('checkout'), and distinguishes it from siblings like 'create_checkout' (which likely creates a checkout) and 'update_checkout' (which likely updates it). It goes beyond the tool name to explain the outcome.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit guidance on when to use this tool: 'The checkout must be in "ready" status.' It also details two payment method alternatives ('stripe_checkout' vs. 'stripe_payment_method') with clear usage instructions for each, including prerequisites like 'payment_method_id' for the latter and post-invocation steps like polling 'get_checkout_status()' for the former.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

create_alert_rule (grade A)

Create an alert rule to monitor CPU, memory, or disk usage.

When the metric crosses the threshold, a notification is sent via email and/or webhook. Max 10 rules per site.

Requires: API key with write scope.

Args:
  slug: Site identifier
  metric: "cpu", "memory", or "disk" (percentage-based)
  threshold: Threshold value 0-100 (e.g. 90 for 90%)
  operator: "gt" (greater than) or "lt" (less than). Default: "gt"
  severity: "warning" or "critical". Default: "warning"
  cooldown_minutes: Min minutes between repeated alerts. Default: 30
  notify_email: Send email notification. Default: true
  notify_webhook: Optional webhook URL for POST notifications

Returns: {"id": "uuid", "metric": "disk", "threshold": 90, ...}
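
To illustrate how `threshold`, `operator`, and `cooldown_minutes` interact, here is a sketch of the rule semantics as the description states them. It is not BorealHost's implementation: the strict comparisons and the way the cooldown is tracked are assumptions, and `AlertRule` and `should_notify` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlertRule:
    metric: str                 # "cpu", "memory", or "disk"
    threshold: float            # percentage, 0-100
    operator: str = "gt"        # "gt" or "lt"
    cooldown_minutes: int = 30  # minimum minutes between repeated alerts


def should_notify(rule: AlertRule, value: float,
                  minutes_since_last_alert: Optional[float]) -> bool:
    """Return True if the value crosses the threshold and the cooldown has elapsed."""
    if rule.operator == "gt":
        crossed = value > rule.threshold   # assumed strict comparison
    elif rule.operator == "lt":
        crossed = value < rule.threshold   # assumed strict comparison
    else:
        raise ValueError(f"unknown operator: {rule.operator!r}")
    if not crossed:
        return False
    # None means no alert has fired yet for this rule.
    return (minutes_since_last_alert is None
            or minutes_since_last_alert >= rule.cooldown_minutes)
```

With the defaults, a disk rule at threshold 90 would notify at 95% usage, then stay quiet for the next 30 minutes even if usage remains above the threshold.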

Parameters
Name | Required | Description | Default
slug | Yes | |
metric | Yes | |
operator | No | | gt
severity | No | | warning
threshold | Yes | |
notify_email | No | |
notify_webhook | No | |
cooldown_minutes | No | |

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Since no annotations are provided, the description carries the full burden of behavioral disclosure. It effectively describes key traits: it's a write operation (implied by 'Create'), has a per-site quota ('Max 10 rules per site'), requires specific permissions ('API key with write scope'), and details notification mechanisms (email/webhook). This covers critical behavioral aspects without contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured and front-loaded, starting with the core purpose, followed by behavioral details, prerequisites, and a clear breakdown of parameters and returns. Each sentence adds essential information without redundancy, making it efficient and easy to parse for an AI agent.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity of an 8-parameter write tool with no annotations or output schema, the description is highly complete. It covers purpose, usage context, behavioral traits, parameter semantics, and even includes a return example. This provides all necessary context for the agent to invoke the tool correctly, compensating for gaps in structured data.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage, the description compensates fully by explaining all 8 parameters in the 'Args' section. It clarifies meanings (e.g., 'metric' options, 'threshold' range, default values like 'gt' for 'operator'), adding significant value beyond the bare schema titles. This ensures parameters are well-understood despite the schema's lack of descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Create an alert rule') and the resource ('to monitor CPU, memory, or disk usage'), making the purpose specific and actionable. It distinguishes this tool from siblings like 'list_alert_rules' or 'delete_alert_rule' by focusing on creation rather than listing or deletion.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for when to use this tool (e.g., for monitoring specific metrics and sending notifications) and mentions prerequisites ('Requires: API key with write scope'). However, it does not explicitly state when not to use it or name alternatives, such as using 'list_alert_rules' to check existing rules first.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

create_api_key (grade A)

Create a new API key with specified scopes.

Cannot create keys with higher scopes than the current key. Site-scoped keys restrict access to a single site.

Requires: API key with write scope.

Args:
  name: Human-readable name for the key (1-100 chars)
  scopes: Comma-separated scopes. Options: "read", "read,write", "read,write,admin". Default: "read"
  site_slug: Optional: restrict the key to a single site. Omit for account-wide access.

Returns: {"api_key": "bh_...", "key_id": "uuid", "prefix": "bh_...", "name": "My Key", "scopes": ["read", "write"], "message": "Store this API key securely — it will not be shown again."}

Errors:
  VALIDATION_ERROR: Invalid name, scopes, or max 25 active keys
  FORBIDDEN: Cannot create keys with higher scopes than current key
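
A client can check the scope rules locally before calling the tool. The sketch below assumes "higher scopes" means requesting any scope the current key does not hold; the server remains the authority, and `parse_scopes` and `can_create_key` are hypothetical helper names.

```python
from typing import List

# The three scope strings listed in the tool description.
VALID_SCOPE_STRINGS = ("read", "read,write", "read,write,admin")


def parse_scopes(scopes: str) -> List[str]:
    """Split a comma-separated scope string after checking it is a listed option."""
    if scopes not in VALID_SCOPE_STRINGS:
        raise ValueError(f"unsupported scopes: {scopes!r}")
    return scopes.split(",")


def can_create_key(current_scopes: List[str], requested: str = "read") -> bool:
    """Return True if the current key may create a key with the requested scopes."""
    if "write" not in current_scopes:
        # The description requires an API key with write scope.
        return False
    # Assumed rule: every requested scope must already be held by the current key.
    return set(parse_scopes(requested)) <= set(current_scopes)
```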

Parameters
Name | Required | Description | Default
name | Yes | |
scopes | No | | read
site_slug | No | |

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden and does so well. It discloses key behavioral traits: permission requirements ('Requires: API key with write scope'), scope limitations ('Cannot create keys with higher scopes than the current key'), site restrictions, and error conditions (VALIDATION_ERROR, FORBIDDEN). It also notes the one-time nature of the API key display. No contradictions exist.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with clear sections (purpose, constraints, requirements, args, returns, errors) and front-loaded key information. It's appropriately sized, though slightly verbose; every sentence adds value, such as the error details and return message.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (a mutation tool with permissions and constraints), no annotations, and no output schema, the description is highly complete. It covers purpose, usage, parameters, return values, and errors comprehensively, leaving no gaps for the agent to infer behavior.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description must compensate fully, which it does. It adds detailed meaning for all three parameters: 'name' (human-readable, 1-100 chars), 'scopes' (comma-separated options with default), and 'site_slug' (optional for site restriction). This goes well beyond the minimal schema titles.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with a specific verb ('Create') and resource ('new API key'), and distinguishes it from siblings like 'list_api_keys', 'claim_api_key', 'revoke_api_key', and 'set_api_key' by focusing on creation with scopes and site restrictions.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description states when the tool applies (creating a key with specified scopes, optionally restricted to a single site), the constraint that blocks it (the new key cannot carry higher scopes than the current key), and the prerequisite ('Requires: API key with write scope'). It does not name specific alternatives among sibling tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

create_b2_snapshot (grade A)

Create a B2 cloud-backed snapshot (zero local disk, async).

Streams container data directly to Backblaze B2 via restic. No local disk impact — billed separately at cost+5%. Runs in background — returns immediately with status "creating". Poll list_snapshots() to check when status becomes "completed". Only available for VPS plans.

Requires: API key with write scope.

Args:
  slug: Site identifier
  description: Optional description (max 200 chars)

Returns: {"id": "uuid", "name": "...", "status": "creating", "storage_type": "b2", "message": "B2 cloud snapshot started. Poll list_snapshots()..."}

Errors: VALIDATION_ERROR: Not a VPS plan or max snapshots reached
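
The create-then-poll pattern described above might be driven from a client roughly as follows. This is a minimal sketch, not BorealHost's client code: `call_tool` is a hypothetical callable standing in for whatever MCP client is in use, and the sketch assumes that list_snapshots() accepts a `slug` argument and returns a list of objects with "id" and "status" fields. The "failed" status is borrowed from the create_snapshot description and is assumed to apply here too.

```python
import time
from typing import Any, Callable, Dict, List

# Hypothetical signature: call_tool(tool_name, arguments) -> decoded JSON result.
ToolCaller = Callable[[str, Dict[str, Any]], Any]

# "completed" comes from the description; "failed" is an assumption.
TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_snapshot(
    call_tool: ToolCaller,
    slug: str,
    snapshot_id: str,
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll list_snapshots until the given snapshot reaches a terminal status."""
    waited = 0.0
    while True:
        # Assumption: the result is a list of dicts with "id" and "status" keys.
        snapshots: List[Dict[str, Any]] = call_tool("list_snapshots", {"slug": slug})
        for snap in snapshots:
            if snap.get("id") == snapshot_id and snap.get("status") in TERMINAL_STATUSES:
                return snap
        if waited >= timeout_s:
            raise TimeoutError(f"snapshot {snapshot_id} not finished after {timeout_s}s")
        sleep(interval_s)
        waited += interval_s
```

A caller would first invoke create_b2_snapshot, take the "id" from its result, and pass it to `wait_for_snapshot`. Injecting `sleep` keeps the loop testable without real delays.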

Parameters

Name | Required | Description | Default
slug | Yes | |
description | No | |

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries full burden and excels by disclosing key behavioral traits: it's async ('Runs in background — returns immediately'), has billing implications ('billed separately at cost+5%'), requires polling ('Poll list_snapshots()'), and includes error conditions ('VALIDATION_ERROR: Not a VPS plan or max snapshots reached'). It also specifies the return format and storage type.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is efficiently structured with clear sections (overview, requirements, args, returns, errors) and every sentence adds value. It's front-loaded with the core functionality and avoids redundancy while maintaining comprehensive coverage.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity of an async cloud snapshot tool with no annotations and no output schema, the description provides complete context: purpose, usage constraints, behavioral details, parameter semantics, return format, and error conditions. It leaves no gaps for the agent to understand and invoke the tool correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage, the description fully compensates by explaining both parameters: 'slug' as 'Site identifier' and 'description' as 'Optional description (max 200 chars)'. It adds crucial semantic information beyond the bare schema, including constraints and optionality.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('Create a B2 cloud-backed snapshot') and distinguishes it from sibling tools like 'create_snapshot' by specifying it's zero local disk, async, and uses Backblaze B2 via restic. It explicitly differentiates from other snapshot-related tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit guidance on when to use this tool ('Only available for VPS plans') and, by implication, when not to (non-VPS plans). It also mentions prerequisites ('Requires: API key with write scope') and points to 'list_snapshots()' as the follow-up call for checking completion status.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

create_backup (A)

Create a manual backup (runs asynchronously).

The backup starts in the background. Poll list_backups() to check status.

Requires: API key with write scope.

Args: slug: Site identifier

Returns: {"id": "uuid", "status": "pending", "message": "Backup started. Poll list_backups() to check status."}

Parameters

Name | Required | Description | Default
slug | Yes | |

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It effectively describes key behavioral traits: the asynchronous execution ('runs asynchronously', 'starts in the background'), the need for polling to check status, and authentication requirements ('API key with write scope'). It doesn't mention rate limits or error conditions, but covers the essential operational behavior.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is perfectly structured and front-loaded: the first sentence states the core purpose, followed by operational details, prerequisites, parameter explanation, and return format. Every sentence earns its place with zero waste, making it highly efficient for an agent to parse.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a mutation tool with no annotations and no output schema, the description provides excellent coverage: purpose, asynchronous behavior, polling instructions, authentication requirements, parameter meaning, and return format. The only minor gap is lack of error condition documentation, but it's otherwise highly complete given the context.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage for the single parameter, the description adds significant value by explaining that 'slug' is a 'Site identifier'. This provides essential semantic context that the schema's bare 'Slug' title doesn't convey. The description doesn't provide format examples or constraints, but gives meaningful interpretation.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('Create a manual backup') and resource ('backup'), distinguishing it from siblings like 'create_snapshot' or 'restore_backup' by specifying it's a manual backup process. It provides explicit differentiation through its asynchronous nature and polling requirement.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit guidance on when to use this tool: for creating manual backups that run asynchronously. It also specifies the follow-up action ('Poll list_backups() to check status') for monitoring completion, and mentions prerequisites ('Requires: API key with write scope'), giving comprehensive usage context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

create_checkout (A)

Start a new checkout session to purchase a hosting plan.

No authentication needed. After creating, call update_checkout to set buyer info, then complete_checkout to pay.

Args: sku: Plan SKU in format bh_{plan_slug}_{monthly|annual}. Examples: "bh_site_starter_monthly", "bh_site_pro_annual", "bh_site_managed_monthly", "bh_site_business_annual". Call list_plans() to discover all available plan slugs.

Returns: {"id": "uuid", "sku": "bh_site_starter_monthly", "plan_slug": "site_starter", "billing_period": "monthly", "status": "not_ready", "buyer_email": "", "requested_slug": "", "created_at": "iso8601", "checkout_secret": "base64-token"}

Errors:
    VALIDATION_ERROR: Invalid SKU format or unknown plan
    RATE_LIMITED: Max 10 checkouts per IP per hour
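
Because an invalid SKU both fails with VALIDATION_ERROR and counts against the hourly rate limit, a client may want to check the format locally before calling create_checkout. The sketch below illustrates the documented `bh_{plan_slug}_{monthly|annual}` pattern. The helper names are hypothetical, and the allowed characters in a plan slug (lowercase letters, digits, underscores) are an assumption inferred from the examples; list_plans() remains the authority on which plans exist.

```python
import re
from typing import Tuple

# Assumption: plan slugs consist of lowercase letters, digits, and underscores.
_SKU_RE = re.compile(r"bh_(?P<plan>[a-z0-9_]+)_(?P<period>monthly|annual)")


def parse_sku(sku: str) -> Tuple[str, str]:
    """Split a SKU into (plan_slug, billing_period) or raise ValueError."""
    match = _SKU_RE.fullmatch(sku)
    if match is None:
        raise ValueError(f"not a valid SKU: {sku!r}")
    return match.group("plan"), match.group("period")


def build_sku(plan_slug: str, billing_period: str) -> str:
    """Compose a SKU and check it against the documented format."""
    sku = f"bh_{plan_slug}_{billing_period}"
    parse_sku(sku)  # raises ValueError if the result is malformed
    return sku
```

A well-formed SKU can still be rejected by the server if the plan does not exist, so this check only catches formatting mistakes.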

Parameters

Name | Required | Description | Default
sku | Yes | |

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries full burden and delivers well: it discloses that no authentication is needed, outlines the multi-step workflow, specifies rate limits (10 checkouts per IP per hour), and describes error conditions. It doesn't mention whether the operation is idempotent or has side effects beyond creating a session.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is efficiently structured with clear sections (purpose, authentication, workflow, args, returns, errors). Every sentence earns its place—no wasted words. The front-loaded purpose statement immediately communicates the tool's function.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool with no annotations, no output schema, and minimal schema coverage, the description provides complete context: purpose, usage sequence, parameter details with examples, return value structure, and error conditions. It fully compensates for the lack of structured metadata.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema has 0% description coverage for its single parameter, but the description fully compensates: it explains the SKU parameter's purpose, format (bh_{plan_slug}_{monthly|annual}), provides concrete examples, and references list_plans() for discovery. This adds substantial meaning beyond the bare schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('Start a new checkout session') and resource ('to purchase a hosting plan'), distinguishing it from siblings like update_checkout and complete_checkout. It explicitly defines the tool's scope as initiating a purchase flow.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description states when to use this tool (to start a checkout), notes that no authentication is needed, and names the tools to call next (update_checkout, then complete_checkout). It also references list_plans() for discovering available plans.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

create_directory (A)

Create a directory in a site's container.

Creates parent directories if they don't exist.

Requires: API key with write scope.

Args:
    slug: Site identifier
    path: Relative path of the directory to create

Returns: {"success": true, "path": "uploads/2024", "message": "Directory created"}

Errors:
    NOT_FOUND: Unknown slug
    FORBIDDEN: Protected system path
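
Since `path` is documented as relative, a client could normalize it and reject obviously unsuitable values before making the call. The following is a sketch of such a client-side guard only. The function name is hypothetical, and the specific rules (no absolute paths, no `..` segments, no empty path) are assumptions about what "relative" should mean here; the server's own checks, including which paths count as protected, are not documented on this page.

```python
from pathlib import PurePosixPath


def clean_relative_path(path: str) -> str:
    """Normalize a relative POSIX path, rejecting absolute or escaping paths."""
    candidate = PurePosixPath(path)
    if candidate.is_absolute():
        raise ValueError(f"path must be relative: {path!r}")
    if ".." in candidate.parts:
        raise ValueError(f"path must not contain '..': {path!r}")
    if not candidate.parts:
        raise ValueError("path must not be empty")
    # PurePosixPath drops redundant separators and "." segments.
    return str(candidate)
```

The cleaned value would then be passed as the `path` argument of create_directory.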

Parameters

Name | Required | Description | Default
path | Yes | |
slug | Yes | |

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries full burden and does well by disclosing key behavioral traits: it creates parent directories automatically, requires specific authentication (API key with write scope), and lists potential errors (NOT_FOUND, FORBIDDEN). However, it doesn't mention rate limits, idempotency, or whether the operation is reversible.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is efficiently structured with clear sections: purpose statement, behavioral note, requirement, parameters, return value, and errors. Each sentence earns its place with no redundant information, making it easy to scan and understand.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a mutation tool with no annotations and no output schema, the description provides good coverage: purpose, behavior, auth requirements, parameters, return format, and error cases. It's nearly complete but could benefit from mentioning whether the operation is idempotent or what happens if the directory already exists.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage for the 2 parameters, the description compensates by explaining both parameters: 'slug' as 'Site identifier' and 'path' as 'Relative path of the directory to create'. This adds crucial meaning beyond the bare schema, though it could provide examples or format details.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('Create a directory') and target resource ('in a site's container'), with the additional detail about creating parent directories. It distinguishes itself from sibling tools like 'delete_file' or 'list_files' by focusing on directory creation rather than file operations or listing.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage when needing to create directories, but provides no explicit guidance on when to use this tool versus alternatives like 'upload_file' or 'write_file' for file operations. It mentions a prerequisite ('Requires: API key with write scope') which gives some context, but lacks comparison with sibling tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

create_ftp_account (A)

Create an FTP account on a site.

The user is chrooted to the specified directory. Password must be at least 8 characters. Username must be lowercase alphanumeric.

Requires: API key with write scope.

Args:
    slug: Site identifier
    username: FTP username (lowercase, max 32 chars)
    password: Password (min 8 chars)
    home_dir: Chroot directory (default: /var/www/html)

Returns: {"success": true, "username": "ftpuser", "home_dir": "/var/www/html"}
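
The description lists no error codes, so checking the stated constraints before the call may save a round trip. The sketch below encodes only what the description says: a lowercase alphanumeric username of at most 32 characters and a password of at least 8 characters. The function name is hypothetical, and reading "alphanumeric" as ASCII `a-z` and `0-9` is an assumption.

```python
import re
from typing import List

# Assumption: "lowercase alphanumeric" means ASCII a-z and 0-9 only.
_USERNAME_RE = re.compile(r"[a-z0-9]{1,32}")
MIN_PASSWORD_LEN = 8


def ftp_account_problems(username: str, password: str) -> List[str]:
    """Return a list of constraint violations; an empty list means the input looks valid."""
    problems: List[str] = []
    if not _USERNAME_RE.fullmatch(username):
        problems.append("username must be 1-32 lowercase alphanumeric characters")
    if len(password) < MIN_PASSWORD_LEN:
        problems.append(f"password must be at least {MIN_PASSWORD_LEN} characters")
    return problems
```

Returning all problems at once, rather than raising on the first, lets an agent correct every field in a single retry.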

Parameters

Name | Required | Description | Default
slug | Yes | |
home_dir | No | | /var/www/html
password | Yes | |
username | Yes | |

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It effectively describes key traits: the user is chrooted to a directory, password and username constraints, and authentication requirements ('API key with write scope'). However, it lacks details on error handling, rate limits, or idempotency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured and front-loaded with the main purpose, followed by detailed constraints and parameters in a clear format. Every sentence adds value without redundancy, making it efficient and easy to parse.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no annotations and no output schema, the description does a good job covering inputs, constraints, and authentication needs. It includes a return value example, which helps compensate for the lack of output schema. However, it could be more complete by addressing potential errors or side effects.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description must compensate fully. It adds significant meaning beyond the schema by explaining each parameter's purpose, constraints (e.g., 'username must be lowercase alphanumeric', 'password must be at least 8 characters'), and default values ('home_dir' defaults to '/var/www/html'), which are not in the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Create an FTP account') and the resource ('on a site'), making the purpose specific and unambiguous. It distinguishes itself from sibling tools like 'remove_ftp_account' by focusing on creation rather than deletion.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for when to use this tool: when needing to create an FTP account on a site with specific requirements. It mentions prerequisites ('Requires: API key with write scope') but does not explicitly state when not to use it or name alternatives among siblings, such as 'list_ftp_accounts' for viewing existing accounts.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

create_snapshot (A)

Create a local container snapshot (async).

Runs in background — returns immediately with status "creating". Poll list_snapshots() to check when status becomes "completed" or "failed".

Available for VPS, dedicated, and cloud plans (any plan with max_snapshots > 0). Local snapshots are stored on the host disk and count against disk quota.

Requires: API key with write scope.

Args:
    slug: Site identifier
    description: Optional description (max 200 chars)

Returns: {"id": "uuid", "name": "snap-...", "status": "creating", "storage_type": "local", "message": "Snapshot started. Poll list_snapshots() to check status."}

Errors: VALIDATION_ERROR: Max snapshots reached or insufficient disk quota

Parameters

Name | Required | Description | Default
slug | Yes | |
description | No | |

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries full burden and excels: it discloses async behavior with polling requirement, storage type ('local'), impact on disk quota, required permissions ('API key with write scope'), and error conditions like quota limits. This goes beyond basic functionality to cover operational traits.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Well-structured and front-loaded with key info (action, async nature). Each sentence adds value: polling instructions, eligibility, storage details, auth requirements, args, returns, and errors. No wasted words, making it efficient for an agent to parse.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a mutation tool with no annotations and no output schema, the description is highly complete: it covers purpose, usage, behavior, parameters, return values (including example output), and error cases. This provides all necessary context for safe and effective tool invocation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description must compensate fully. It adds crucial semantics: 'slug' is explained as 'Site identifier', and 'description' is marked as optional with a length constraint ('max 200 chars'). This provides clear meaning beyond the bare schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('Create a local container snapshot'), resource ('snapshot'), and operational mode ('async'), distinguishing it from sibling tools like create_b2_snapshot (which implies a different storage type) and schedule_snapshot (which implies scheduling rather than immediate creation).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicit guidance is provided: use when 'max_snapshots > 0' for VPS/dedicated/cloud plans, and not to use if quota limits are reached (implied via errors). It also specifies when to use list_snapshots() for polling instead of expecting immediate completion, and mentions API key requirements, though it doesn't name specific alternatives among siblings.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

database_search_replace (A)

Search and replace in WordPress database (e.g. URL migration).

Handles serialized data safely. Use dry_run=true first to preview changes.

Requires: API key with write scope.

Args:
    slug: Site identifier
    old: String to search for (e.g. "http://old-domain.com")
    new: Replacement string (e.g. "https://new-domain.com")
    dry_run: Preview only without making changes (default: true)

Returns: {"replacements": 42, "tables_affected": 5, "dry_run": true}
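
The recommended preview-first workflow could be wrapped in a small helper like the one sketched here. `call_tool` is a hypothetical stand-in for an MCP client call, and the `max_replacements` safety threshold is an illustrative addition rather than part of the tool. The sketch relies only on the "replacements" field shown in the documented return value.

```python
from typing import Any, Callable, Dict, Optional

# Hypothetical signature: call_tool(tool_name, arguments) -> decoded JSON result.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def search_replace_with_preview(
    call_tool: ToolCaller,
    slug: str,
    old: str,
    new: str,
    max_replacements: Optional[int] = None,
) -> Dict[str, Any]:
    """Run a dry run first, then apply the change only if the preview looks sane."""
    args = {"slug": slug, "old": old, "new": new}
    preview = call_tool("database_search_replace", {**args, "dry_run": True})
    count = preview["replacements"]
    if count == 0:
        # Nothing would change, so skip the write call entirely.
        return preview
    if max_replacements is not None and count > max_replacements:
        raise RuntimeError(
            f"dry run reported {count} replacements, above the limit of {max_replacements}"
        )
    return call_tool("database_search_replace", {**args, "dry_run": False})
```

The threshold gives an agent a way to stop and ask for confirmation when a search string matches far more rows than expected.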

Parameters

Name | Required | Description | Default
new | Yes | |
old | Yes | |
slug | Yes | |
dry_run | No | |

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It does well by mentioning safety ('Handles serialized data safely'), authentication requirements ('API key with write scope'), and the dry-run capability. However, it doesn't mention rate limits, error conditions, or what happens when the tool fails mid-execution.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is perfectly structured and concise. It starts with the core purpose, adds important behavioral notes, provides usage guidance, then details parameters and returns. Every sentence earns its place with zero wasted words, and the information is front-loaded appropriately.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a mutation tool with no annotations and no output schema, the description does an excellent job covering purpose, usage, parameters, and authentication. The gaps are the lack of an explicit output schema, which the 'Returns:' example partly offsets, and the missing detail on error handling, which matters given the complexity of database search-and-replace operations.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage, the description fully compensates by explaining all 4 parameters in detail. Each parameter gets clear semantic meaning: 'slug: Site identifier', 'old: String to search for (e.g. "http://old-domain.com")', 'new: Replacement string', and 'dry_run: Preview only without making changes (default: true)' with practical examples.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with specific verbs ('Search and replace in WordPress database') and resource ('WordPress database'), distinguishing it from siblings like 'optimize_database' or 'execute_query'. It provides concrete examples ('e.g. URL migration') that clarify the use case.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly provides usage guidance: 'Use dry_run=true first to preview changes' gives a clear when-to-use recommendation, and 'Requires: API key with write scope' specifies prerequisites. It distinguishes this tool from read-only siblings by emphasizing the write scope requirement.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

decommission (A)

Delete a site and schedule resource cleanup (7-day grace period).

WARNING: This is destructive. The site will be inaccessible immediately but data is retained for 7 days before permanent deletion.

Best practice: create a snapshot before decommissioning.

Requires: API key with admin scope.

Args: slug: Site identifier

Returns: {"success": true, "message": "Site scheduled for deletion", "grace_period_days": 7}

Errors: NOT_FOUND: Unknown slug

Parameters

Name | Required | Description | Default
slug | Yes | |

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries full burden and does so comprehensively. It discloses critical behavioral traits: destructive nature, immediate inaccessibility, 7-day grace period before permanent deletion, required admin scope, and error conditions. This provides the agent with complete understanding of the tool's behavior and consequences.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured and front-loaded with the core purpose, followed by warnings, best practices, requirements, and technical details. Every sentence earns its place by providing critical information without redundancy. The formatting with clear sections (Args, Returns, Errors) enhances readability.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a destructive operation with no annotations and no output schema, the description provides complete context: purpose, behavioral consequences, prerequisites, parameter meaning, return value format, and error conditions. This gives the agent everything needed to understand when and how to use this tool safely and effectively.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage and only one parameter, the description adds essential meaning by explaining 'slug' as 'Site identifier'. While it doesn't provide format examples or validation rules, this basic semantic clarification compensates adequately for the schema's lack of documentation.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with specific verbs ('Delete a site and schedule resource cleanup') and distinguishes it from siblings by focusing on site deletion rather than other operations like snapshot management or file deletion. It goes beyond just restating the name by explaining the two-phase deletion process.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicit guidance is provided on when to use this tool: for deleting sites with a 7-day grace period. It also offers best practice advice ('create a snapshot before decommissioning') and explicitly states prerequisites ('Requires: API key with admin scope'), helping the agent understand usage context and requirements.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

delete_account (A)

Permanently anonymize the account. Cancels subscriptions, deactivates keys.

WARNING: This is irreversible. The account will be soft-deleted and all personal data anonymized. All sites will be decommissioned.

Requires: API key with admin scope.

Returns: {"success": true, "message": "Account anonymized"}

Parameters

Name | Required | Description | Default

No parameters

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Since no annotations are provided, the description carries the full burden of behavioral disclosure. It effectively describes key behavioral traits: the irreversible nature ('WARNING: This is irreversible'), the soft-delete and anonymization process, decommissioning of sites, and the required admin scope. This covers safety, permissions, and side effects comprehensively.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured and front-loaded with the core action, followed by warnings, requirements, and return value. Each sentence earns its place by providing critical information without fluff, making it efficient and easy to parse.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the high-stakes nature of this tool (destructive, irreversible), no annotations, and no output schema, the description is complete. It covers purpose, usage guidelines, behavioral transparency (including warnings and requirements), and even specifies the return format, which compensates for the lack of output schema.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema defines no parameters, so schema coverage is trivially 100% and no parameter documentation is needed. The description appropriately does not discuss parameters, focusing instead on behavioral aspects. A baseline of 4 is applied since no parameters exist, and the description adds value elsewhere without redundancy.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('permanently anonymize the account') and resource ('the account'), distinguishing it from siblings like 'update_account' or 'decommission'. It goes beyond just restating the name by detailing the effects (cancels subscriptions, deactivates keys).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states when to use this tool: for permanent anonymization of accounts. It also provides prerequisites ('Requires: API key with admin scope') and warnings about irreversibility, which helps differentiate it from less destructive alternatives like 'update_account' or 'decommission' in the sibling list.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

delete_alert_rule (A)

Delete an alert rule.

Requires: API key with write scope.

Args:
  slug: Site identifier
  rule_id: UUID of the alert rule to delete

Returns: {"deleted": true, "id": "uuid"}
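Since `rule_id` is documented as a UUID, a caller could reject malformed identifiers before sending the request. The sketch below is one way to do that with Python's standard `uuid` module. The helper name `build_delete_alert_rule_args` and the slug `my-site` are illustrative, and whether the server expects the lowercase canonical form is an assumption.

```python
import uuid


def build_delete_alert_rule_args(slug: str, rule_id: str) -> dict:
    """Build the arguments object for delete_alert_rule (hypothetical helper)."""
    # uuid.UUID raises ValueError when the string is not a well-formed UUID.
    parsed = uuid.UUID(rule_id)
    # str() yields the lowercase, hyphenated form; assumed acceptable to the server.
    return {"slug": slug, "rule_id": str(parsed)}


args = build_delete_alert_rule_args("my-site", "123E4567-E89B-12D3-A456-426614174000")
print(args)
```

Validating on the client avoids spending a tool call on an identifier that cannot match any rule.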

Parameters

Name | Required | Description | Default
slug | Yes | - | -
rule_id | Yes | - | -

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden. It discloses the required API key scope, which is useful for authentication needs, but lacks details on potential side effects (e.g., irreversible deletion), error handling, or rate limits. It adds some behavioral context but could be more comprehensive.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured and front-loaded with the purpose, followed by prerequisites and parameter details. Every sentence adds value without redundancy, making it efficient and easy to parse.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (destructive operation with 2 parameters, no annotations, no output schema), the description is mostly complete: it covers purpose, prerequisites, parameters, and return value. However, it lacks explicit confirmation that deletion is irreversible and does not detail error responses or confirm the tool's idempotency.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema description coverage is 0%, so the description must compensate. It explains both parameters: 'slug' as 'Site identifier' and 'rule_id' as 'UUID of the alert rule to delete', adding clear meaning beyond the bare schema. However, it does not specify format details (e.g., UUID version) or examples.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('Delete') and resource ('an alert rule'), distinguishing it from sibling tools like 'create_alert_rule' and 'list_alert_rules'. It directly answers what the tool does without being tautological.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for when to use it by specifying prerequisites ('Requires: API key with write scope'), but does not explicitly mention when not to use it or name alternatives like 'list_alert_rules' for checking existing rules before deletion.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

delete_cron (A)

Delete a cron job by line number.

Get line numbers from list_cron().

Requires: API key with write scope.

Args:
  slug: Site identifier
  line_number: Line number of the cron entry to delete

Returns: {"deleted": true}
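The list-then-delete flow described above can be sketched as two MCP `tools/call` requests in JSON-RPC 2.0 form. This is a minimal illustration rather than the server's client library: the helper `tools_call` is hypothetical, the slug `my-site` and line number `3` are made up, and it assumes `list_cron` accepts a `slug` argument, which this page does not show.

```python
import json
from itertools import count

# Monotonic JSON-RPC request ids for this sketch.
_request_ids = count(1)


def tools_call(name: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request (hypothetical helper)."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


# Step 1: fetch the current entries to learn their line numbers.
list_request = tools_call("list_cron", {"slug": "my-site"})

# Step 2: delete one entry, using a line number taken from the list_cron result.
delete_request = tools_call("delete_cron", {"slug": "my-site", "line_number": 3})
print(delete_request)
```

If remaining entries are renumbered after a deletion (not stated in the description), it would be safer to call `list_cron` again before each further `delete_cron`.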

Parameters

Name | Required | Description | Default
slug | Yes | - | -
line_number | Yes | - | -

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries full burden and does well by disclosing the destructive nature ('Delete'), authentication requirements ('API key with write scope'), and expected return format ('{"deleted": true}'). It doesn't mention rate limits or error behaviors, but covers key operational aspects.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Well-structured with clear sections: purpose, prerequisites, parameters, and return value. Every sentence adds value with no redundancy. The information is front-loaded and efficiently presented.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a destructive tool with 2 parameters, 0% schema coverage, and no output schema, the description provides complete context: purpose, usage guidance, authentication needs, parameter meanings, and return format. It adequately compensates for the lack of structured annotations and schema descriptions.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description must compensate. It explains both parameters: 'slug' as 'Site identifier' and 'line_number' as 'Line number of the cron entry to delete' with context about obtaining line numbers from 'list_cron()'. This adds meaningful semantics beyond the bare schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('Delete a cron job by line number'), identifies the resource ('cron job'), and distinguishes from the sibling 'list_cron' by explaining the relationship. It goes beyond a tautology by specifying the deletion mechanism.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly states the prerequisite step ('Get line numbers from list_cron()') and the required credentials ('Requires: API key with write scope'). It clearly differentiates the tool from 'list_cron' and implicitly from other cron-related tools by focusing on deletion.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

delete_domain_dns (A)

Delete a DNS record from a domain.

Requires: API key with write scope.

Args:
  domain_name: Full domain name (e.g. "example.com")
  record_id: ID of the DNS record to delete (from list_domain_dns)

Returns: {"success": true, "message": "DNS record deleted"}

Errors: NOT_FOUND: Domain or record not found

Parameters

Name | Required | Description | Default
record_id | Yes | - | -
domain_name | Yes | - | -

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries full burden and adds valuable behavioral context: it discloses the required API key with write scope, specifies the return format and success message, and lists error conditions (NOT_FOUND). However, it lacks details on rate limits, idempotency, or confirmation prompts.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Well-structured with clear sections (description, requirements, args, returns, errors), front-loaded purpose, and no wasted sentences. Each part earns its place by providing necessary information efficiently.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a destructive tool with no annotations and no output schema, the description is quite complete: covers purpose, prerequisites, parameters, return values, and errors. Minor gaps include lack of confirmation behavior or side effects, but it provides sufficient context for safe use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description must compensate fully. It clearly explains both parameters: 'domain_name' as the full domain name with an example, and 'record_id' with its source ('from list_domain_dns'), adding essential meaning beyond the bare schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('Delete a DNS record from a domain'), identifies the resource ('DNS record'), and distinguishes it from siblings like 'add_domain_dns' and 'list_domain_dns' by specifying the destructive operation.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

It provides clear context for when to use this tool ('Delete a DNS record from a domain') and references a sibling tool ('list_domain_dns') for obtaining the record_id, but does not explicitly state when NOT to use it or name alternatives for similar operations.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

delete_file (A)

Delete a file or directory from a site's container.

Directories are deleted recursively. Protected system paths (e.g. /etc, /usr) cannot be deleted.

Requires: API key with write scope.

Args:
  slug: Site identifier
  path: Relative path to delete

Returns: {"success": true, "path": "...", "message": "Deleted"}

Errors:
  NOT_FOUND: Path doesn't exist
  FORBIDDEN: Protected system path
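Because directories are deleted recursively and `path` is documented as relative, a caller may want a client-side guard before invoking `delete_file`. The sketch below is an assumption about what a cautious client could check, not the server's own validation: `check_delete_path` is a hypothetical helper, and the rules it enforces (no absolute paths, no `..` escapes, no bare root) go beyond anything the description states.

```python
import posixpath


def check_delete_path(path: str) -> str:
    """Normalize a relative path and reject risky targets (hypothetical client-side guard)."""
    if path.startswith("/"):
        raise ValueError("path must be relative")
    # normpath collapses "a/./b", "a//b" and "a/../b"; an empty string becomes ".".
    normalized = posixpath.normpath(path)
    if normalized in (".", "..") or normalized.startswith("../"):
        # "." would target the whole root recursively; ".." would escape it.
        raise ValueError("path must name something inside the site root")
    return normalized


print(check_delete_path("wp-content/cache/"))
```

The server still decides what is actually protected and reports `FORBIDDEN` accordingly; the guard only filters out requests that are very unlikely to be intended.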

Parameters

Name | Required | Description | Default
path | Yes | - | -
slug | Yes | - | -

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden and does well by disclosing key behavioral traits: it's a destructive operation (implied by 'Delete'), directories are deleted recursively, protected system paths cannot be deleted, authentication requirements (API key with write scope), and error conditions (NOT_FOUND, FORBIDDEN). It doesn't cover rate limits or exact response format details, but provides substantial context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is efficiently structured with clear sections (purpose, behavior, requirements, args, returns, errors), each sentence adds value, and it's front-loaded with the core purpose. No wasted words or redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a destructive tool with 2 parameters, 0% schema coverage, no annotations, and no output schema, the description does a strong job: it explains the action, constraints, auth needs, parameters, return values, and errors. The main gap is lack of output schema, but the description compensates by documenting the return structure. Slightly more detail on path format or examples would make it complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description must compensate. It explicitly documents both parameters ('slug: Site identifier', 'path: Relative path to delete'), adding clear meaning beyond the bare schema. However, it doesn't provide examples or format details (e.g., path syntax), leaving some ambiguity.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('Delete a file or directory'), the resource ('from a site's container'), and distinguishes from siblings like 'list_files' or 'read_file' by specifying the destructive nature. It goes beyond just restating the name by explaining scope and behavior.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for when to use this tool (to delete files/directories from a site container) and mentions prerequisites ('Requires: API key with write scope'). However, it doesn't explicitly state when NOT to use it or name specific alternatives among siblings, though the context implies alternatives like 'list_files' for viewing or 'upload_file' for adding.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

delete_snapshot (A)

Delete a snapshot (local or B2).

Requires: API key with write scope.

Args:
  slug: Site identifier
  snapshot_id: UUID of the snapshot to delete

Returns: {"success": true, "message": "Snapshot deleted"}

Errors: NOT_FOUND: Snapshot not found

Parameters

Name | Required | Description | Default
slug | Yes | - | -
snapshot_id | Yes | - | -

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries full burden. It discloses key behavioral traits: the destructive nature ('Delete'), authentication requirements ('API key with write scope'), and error conditions ('NOT_FOUND: Snapshot not found'). It also hints at the return format, though not fully detailed.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with sections (Requires, Args, Returns, Errors), making it easy to scan. It's front-loaded with the core action. Some redundancy exists (e.g., 'Delete' in description and tool name), but overall it's efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a destructive tool with no annotations and no output schema, the description is quite complete. It covers purpose, prerequisites, parameters, return values, and errors. Minor gaps include lack of details on side effects (e.g., irreversible deletion) or rate limits.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description must compensate. It adds meaning beyond the schema by explaining 'slug' as 'Site identifier' and 'snapshot_id' as 'UUID of the snapshot to delete', which clarifies their roles and formats effectively.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Delete') and resource ('a snapshot'), specifying it can be 'local or B2'. It distinguishes from siblings like 'create_snapshot' and 'rollback_snapshot' by focusing on deletion.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

It provides explicit prerequisites ('Requires: API key with write scope') and context for when to use (deleting snapshots). However, it doesn't specify when NOT to use or mention alternatives like 'rollback_snapshot' for reverting instead of deleting.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

deploy (A)

Trigger a deployment for a site.

Requires: API key with write scope. This may take up to 60 seconds.

Args: slug: Site identifier

Returns: {"success": true, "message": "Deployment triggered"}

Errors: NOT_FOUND: Unknown slug

Parameters

Name | Required | Description | Default
slug | Yes | - | -

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden and does well by disclosing authentication requirements ('API key with write scope'), timing behavior ('may take up to 60 seconds'), and error conditions ('NOT_FOUND: Unknown slug'). It doesn't mention rate limits or idempotency, but covers key operational aspects.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is perfectly structured and front-loaded: purpose first, then requirements, timing, parameters, returns, and errors. Every sentence earns its place with zero wasted words, making it easy to scan and understand quickly.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a mutation tool with no annotations or output schema, the description is quite complete—covering purpose, auth, timing, parameters, returns, and errors. It could be slightly more comprehensive by mentioning side effects or idempotency, but it provides sufficient context for effective use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage for the single parameter 'slug', the description compensates by explaining it as 'Site identifier' in the Args section. This adds essential meaning beyond the bare schema, though it could provide more context about slug format or examples.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('Trigger a deployment') and resource ('for a site'), distinguishing it from siblings like 'get_site_status' (read-only) or 'create_snapshot' (different operation). It precisely communicates the tool's function without ambiguity.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for when to use it ('Requires: API key with write scope') and timing expectations ('may take up to 60 seconds'), but doesn't explicitly mention when NOT to use it or name specific alternatives among the many sibling tools (e.g., 'rollback_snapshot' for undoing deployments).

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

domain_detail (A)

Get full domain details including DNS and infrastructure status.

Requires: API key with read scope.

Args: domain_name: Full domain name (e.g. "example.com")

Returns: {"domain": "example.com", "status": "active", "expires_at": "iso8601", "auto_renew": true, "nameservers": ["ns1.borealhost.ai", "ns2.borealhost.ai"], "dns_records": [...], "linked_site": "my-site"}

Errors: NOT_FOUND: Domain not owned by this account
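As an illustration of consuming this response, the sketch below computes how many days remain before a domain expires and flags domains that are close to expiry without auto-renew. The sample payload mirrors the Returns shape above, but its concrete values are invented, the 60-day threshold is arbitrary, and it assumes `expires_at` is an ISO 8601 timestamp that includes a UTC offset.

```python
from datetime import datetime, timezone


def days_until_expiry(detail: dict, now: datetime) -> int:
    """Whole days between `now` and the domain's expiry (hypothetical helper)."""
    # Assumes an offset-aware timestamp such as "2030-01-31T00:00:00+00:00".
    expires_at = datetime.fromisoformat(detail["expires_at"])
    return (expires_at - now).days


# Invented sample shaped like the documented return value.
detail = {
    "domain": "example.com",
    "status": "active",
    "expires_at": "2030-01-31T00:00:00+00:00",
    "auto_renew": False,
    "nameservers": ["ns1.borealhost.ai", "ns2.borealhost.ai"],
    "dns_records": [],
    "linked_site": "my-site",
}

now = datetime(2030, 1, 1, tzinfo=timezone.utc)
remaining = days_until_expiry(detail, now)
# Flag domains that expire soon and will not renew on their own.
needs_attention = remaining < 60 and not detail["auto_renew"]
print(remaining, needs_attention)
```

If the server returns a timestamp without an offset, the subtraction against an offset-aware `now` would raise `TypeError`, so the parsing step would need adjusting.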

Parameters

Name | Required | Description | Default
domain_name | Yes | - | -

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It effectively discloses key behavioral traits: it's a read operation (implied by 'Get'), requires specific authentication ('API key with read scope'), and includes error handling ('NOT_FOUND: Domain not owned by this account'). It adds context on what data is returned (DNS, infrastructure status) and ownership constraints, though it could mention rate limits or performance characteristics.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is appropriately sized and front-loaded: the first sentence states the core purpose, followed by prerequisites, args, returns, and errors in a structured format. Every section adds value without redundancy, making it easy to scan and understand.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (detailed retrieval with authentication and error handling), no annotations, no output schema, and low schema coverage, the description is mostly complete. It covers purpose, prerequisites, parameters, return values, and errors. However, it lacks details on output structure (e.g., nested objects in 'dns_records') or additional behavioral aspects like pagination or caching.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 1 parameter with 0% description coverage, so the description must compensate. It adds meaning by explaining the parameter ('domain_name: Full domain name') and providing an example ('e.g. "example.com"'), clarifying the expected format. However, it does not detail constraints like domain format validation or length limits.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with a specific verb ('Get') and resource ('full domain details'), including scope ('including DNS and infrastructure status'). It distinguishes itself from siblings like 'list_domains' (which likely lists domains) and 'search_domain' (which might search for domains) by focusing on detailed retrieval for a specific domain.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for when to use this tool: to retrieve comprehensive details for a specific domain. It mentions prerequisites ('Requires: API key with read scope'), which helps guide usage. However, it does not explicitly state when not to use it or name alternatives (e.g., 'list_domains' for a list or 'search_domain' for searching), missing full differentiation.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

domain_settings (A)

Update domain settings (auto-renew, WHOIS privacy, registrar lock).

Only provided (non-None) fields are updated.

Requires: API key with write scope.

Args:
  domain_name: Full domain name (e.g. "example.com")
  auto_renew: Enable/disable automatic renewal
  whois_privacy: Enable/disable WHOIS privacy protection
  locked: Enable/disable registrar lock (prevents unauthorized transfers)

Returns: {"success": true, "domain": "example.com", "auto_renew": true, "whois_privacy": true, "locked": true}

Errors: NOT_FOUND: Domain not found or not owned by account
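The partial-update rule ("Only provided (non-None) fields are updated") suggests that a client should leave out any setting it does not intend to change. A minimal sketch of building such an arguments object follows; the helper name `build_domain_settings_args` is hypothetical, and omitting a key is assumed to be equivalent to passing None.

```python
from typing import Optional


def build_domain_settings_args(
    domain_name: str,
    auto_renew: Optional[bool] = None,
    whois_privacy: Optional[bool] = None,
    locked: Optional[bool] = None,
) -> dict:
    """Build domain_settings arguments, omitting settings left as None (hypothetical helper)."""
    optional = {
        "auto_renew": auto_renew,
        "whois_privacy": whois_privacy,
        "locked": locked,
    }
    args = {"domain_name": domain_name}
    # Compare against None explicitly so that False (disable) is still sent.
    args.update({key: value for key, value in optional.items() if value is not None})
    return args


print(build_domain_settings_args("example.com", auto_renew=False, locked=True))
```

Filtering on `is not None` rather than truthiness matters here: a plain truthiness check would silently drop `auto_renew=False` and leave renewal enabled.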

Parameters

Name | Required | Description | Default
locked | No | - | -
auto_renew | No | - | -
domain_name | Yes | - | -
whois_privacy | No | - | -

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It does well by specifying the mutation nature ('Update'), the partial update behavior ('Only provided (non-None) fields are updated'), authentication requirements ('API key with write scope'), and error conditions ('NOT_FOUND: Domain not found or not owned by account'). It doesn't mention rate limits or other constraints, but covers the essential behavioral aspects for a mutation tool.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is efficiently structured with clear sections: purpose statement, behavioral note, requirements, parameters, return values, and errors. Every sentence earns its place by providing essential information. The front-loaded purpose statement immediately communicates the tool's function.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a mutation tool with 4 parameters, 0% schema coverage, no annotations, and no output schema, the description provides comprehensive coverage. It explains what the tool does, when to use it, behavioral characteristics, all parameter meanings, return format, and error conditions. This is complete enough for an agent to understand and invoke the tool correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage, the description fully compensates by providing clear documentation for all 4 parameters. Each parameter is listed in the Args section with meaningful explanations: 'domain_name' specifies format requirements, and the three boolean parameters clearly explain what they control. This adds substantial value beyond the bare schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('Update') and resource ('domain settings'), listing the three specific settings that can be modified (auto-renew, WHOIS privacy, registrar lock). This distinguishes it from sibling tools like 'domain_detail' (which likely reads settings) and 'register_domain' (which creates new domains).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for when to use this tool: when needing to modify domain settings. It explicitly states the prerequisite 'Requires: API key with write scope.' However, it doesn't explicitly mention when NOT to use it or name specific alternatives among the sibling tools, though the context implies it's for updates rather than reads or creation.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

execute_query (A)

Execute a SQL query on a site's database.

Supports SELECT, INSERT, UPDATE, DELETE, and DDL statements. Results are limited to 1000 rows for SELECT queries.

Requires: API key with write scope.

Args:
  slug: Site identifier
  database: Database name
  query: SQL query string

Returns: {"columns": ["id", "title"], "rows": [[1, "Hello"], ...], "affected_rows": 0, "query_time_ms": 12}
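A minimal sketch of consuming the return shape above: the helper pairs each row with the column names so results can be read by field. The name `rows_to_dicts` is hypothetical and not part of the server; the sample payload simply mirrors the documented example.

```python
from typing import Any


def rows_to_dicts(result: dict[str, Any]) -> list[dict[str, Any]]:
    """Pair each row with the column names from an execute_query result."""
    columns = result.get("columns", [])
    return [dict(zip(columns, row)) for row in result.get("rows", [])]


# Sample payload shaped like the documented return value (values are illustrative).
sample = {
    "columns": ["id", "title"],
    "rows": [[1, "Hello"]],
    "affected_rows": 0,
    "query_time_ms": 12,
}
records = rows_to_dicts(sample)
```

Because SELECT results are limited to 1000 rows, a caller that receives exactly 1000 records may want to assume the result could be truncated and narrow the query.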

Parameters
Name | Required | Description | Default
slug | Yes | (none) | (none)
query | Yes | (none) | (none)
database | Yes | (none) | (none)

Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It effectively describes key traits: the tool supports various SQL operations, imposes a row limit on SELECT queries, requires an API key with write scope, and returns structured results including columns, rows, affected rows, and query time. This covers critical aspects like permissions, limitations, and output format, though it could add more on error handling or transaction behavior.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured and front-loaded, with the core purpose stated first, followed by key details (supported statements, limits, requirements), and ending with parameter and return value explanations. Every sentence adds value without redundancy, making it efficient and easy to parse.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (executing arbitrary SQL queries) and the lack of annotations and output schema, the description is largely complete. It covers purpose, usage constraints, parameters, and return format. However, it could be more comprehensive by addressing potential errors, security implications, or transactional behavior, which are relevant for a tool with such broad capabilities.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description adds significant meaning beyond the input schema, which has 0% description coverage. It clearly explains each parameter: 'slug' as the site identifier, 'database' as the database name, and 'query' as the SQL query string. This compensates fully for the schema's lack of descriptions, providing essential context for proper usage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Execute a SQL query on a site's database.' It specifies the verb ('Execute'), resource ('SQL query'), and target ('site's database'), making it distinct from sibling tools like 'database_search_replace' or 'optimize_database' which have different functions.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for usage: it lists supported SQL statement types (SELECT, INSERT, UPDATE, DELETE, DDL) and notes the 1000-row limit for SELECT queries. However, it does not explicitly state when to use this tool versus alternatives like 'database_search_replace' or 'optimize_database', nor does it mention prerequisites beyond the API key requirement.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_app_status (A)

Get app installation status and log.

Poll this after install_app() to track progress.

Requires: API key with read scope.

Args:
  slug: Site identifier
  app_id: App ID from install_app() response

Returns: {"id": "uuid", "app_name": "forge", "status": "running"|"installing"|"failed", "install_log": "..."}

Statuses: "installing", "running", "stopped", "failed", "uninstalled"
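The polling pattern described above could be sketched as follows. This is an illustration under stated assumptions, not the server's client: `wait_for_install`, the 5-second interval, and the 60-poll cap are hypothetical, and the sketch assumes every status other than "installing" ends the wait. The `fetch_status` callable stands in for whatever invokes get_app_status.

```python
import time
from typing import Callable


def wait_for_install(
    fetch_status: Callable[[], dict],
    interval_s: float = 5.0,  # assumed interval; the description does not specify one
    max_polls: int = 60,      # assumed cap so the loop cannot run forever
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the status is no longer "installing", or give up after max_polls."""
    for _ in range(max_polls):
        response = fetch_status()
        if response.get("status") != "installing":
            return response
        sleep(interval_s)
    raise TimeoutError(f"app still installing after {max_polls} polls")


# Stubbed responses stand in for real get_app_status calls.
_responses = iter([
    {"status": "installing"},
    {"status": "installing"},
    {"status": "running", "app_name": "forge"},
])
final = wait_for_install(lambda: next(_responses), sleep=lambda _s: None)
```

Injecting `sleep` keeps the loop testable without real delays; a caller would normally leave the default.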

Parameters
Name | Required | Description | Default
slug | Yes | (none) | (none)
app_id | Yes | (none) | (none)

Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries full burden and does well: it discloses authentication requirements ('API key with read scope'), polling behavior, and the tool's read-only nature (implied by 'Get' and polling context). It could mention rate limits or error handling but covers key behavioral aspects.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Front-loaded with purpose, followed by usage, requirements, parameters, and returns in a logical structure. Every sentence adds value with zero waste, efficiently covering multiple dimensions in minimal space.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a 2-parameter tool with no annotations or output schema, the description is complete: it explains purpose, usage, auth, parameters, return structure, and status values. No gaps remain for the agent to understand and invoke this tool correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description must compensate fully. It clearly explains both parameters: 'slug' as 'Site identifier' and 'app_id' as 'App ID from install_app() response', adding crucial context not in the schema's generic titles ('Slug', 'App Id').

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with specific verbs ('Get app installation status and log') and identifies the resource ('app'). It distinguishes from siblings like 'install_app' (which it references) and 'get_site_status' (different resource).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states when to use ('Poll this after install_app() to track progress') and provides prerequisites ('Requires: API key with read scope'). It clearly differentiates from the installation tool and implies this is for monitoring rather than action.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_billing_portal (A)

Get a Stripe billing portal URL for managing payment methods and invoices.

Returns a URL (not a redirect) that the human can open in a browser.

Requires: API key with read scope.

Args: flow: Optional. Set to "payment_method_update" to go directly to the payment method update page.

Returns: {"url": "https://billing.stripe.com/p/session/..."}

Parameters
Name | Required | Description | Default
flow | No | (none) | (none)

Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden and does well by disclosing key behavioral traits: it returns a URL (not a redirect), requires an API key with read scope, and specifies the human-facing nature. It could improve by mentioning rate limits or error conditions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is appropriately sized and front-loaded, with each sentence earning its place: purpose statement, behavioral detail, requirement, parameter explanation, and return value. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's low complexity (1 optional parameter) and no output schema, the description is nearly complete, covering purpose, behavior, requirements, parameters, and returns. It could slightly improve by mentioning authentication details beyond 'API key with read scope'.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description must compensate. It adds meaningful semantics for the single parameter 'flow', explaining its optional nature and providing an example value ('payment_method_update') with its effect, which goes beyond the bare schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with a specific verb ('Get') and resource ('Stripe billing portal URL'), and distinguishes it from siblings by specifying it returns a URL for managing payment methods and invoices, which no other sibling tool addresses.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for when to use this tool (to get a billing portal URL for payment/invoice management) and mentions an API key requirement, but does not explicitly state when not to use it or name alternatives among siblings.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_checkout_status (A)

Poll a checkout session for status updates.

Call this after complete_checkout to track payment and provisioning.

Polling strategy:

  • First 60 seconds: every 5 seconds

  • After 60 seconds: every 15 seconds

  • Stop after 10 minutes if not completed

Checkout statuses (in order):

  • "not_ready": Missing required fields (slug)

  • "ready": All fields set, awaiting payment

  • "awaiting_payment": Stripe checkout page opened, waiting for human

  • "in_progress": Payment received, site being provisioned

  • "completed": Site ready — API key included (shown once, then cleared)

  • "canceled": Checkout was abandoned

  • "failed": Payment or provisioning failed

Terminal statuses: "completed", "canceled", "failed".

Args: checkout_id: Checkout session ID

Returns (when completed): {"id": "uuid", "status": "completed", "api_key": "bh_...", "api_key_message": "Store this API key securely...", "subscription_id": "uuid", "completed_at": "iso8601"}

Note: The api_key field appears ONCE in the first poll after completion, then is permanently cleared. Store it immediately.
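The polling strategy and terminal statuses above can be sketched as a small schedule generator plus a loop. The names `poll_delays` and `poll_checkout` are hypothetical, and `fetch` stands in for whatever invokes get_checkout_status; the sketch assumes the first poll happens immediately and the waits follow it.

```python
from typing import Callable, Iterator, Optional

TERMINAL_STATUSES = {"completed", "canceled", "failed"}


def poll_delays(
    fast_window_s: int = 60,
    fast_interval_s: int = 5,
    slow_interval_s: int = 15,
    deadline_s: int = 600,
) -> Iterator[int]:
    """Yield the wait in seconds after each poll: 5 s for the first minute, then 15 s, up to 10 minutes."""
    elapsed = 0
    while elapsed < deadline_s:
        delay = fast_interval_s if elapsed < fast_window_s else slow_interval_s
        yield delay
        elapsed += delay


def poll_checkout(
    fetch: Callable[[], dict],
    sleep: Callable[[float], None],
) -> Optional[dict]:
    """Poll until a terminal status arrives; return None if the 10-minute budget runs out."""
    for delay in poll_delays():
        response = fetch()
        if response.get("status") in TERMINAL_STATUSES:
            # On "completed", api_key is only present in this response: persist it right away.
            return response
        sleep(delay)
    return None
```

With the default arguments the schedule yields twelve 5-second waits followed by thirty-six 15-second waits, for 600 seconds in total.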

Parameters
Name | Required | Description | Default
checkout_id | Yes | (none) | (none)

Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It effectively describes key behaviors: polling strategy (timing and stop conditions), status lifecycle (including terminal statuses), and critical handling of the api_key (appears once then cleared). However, it doesn't mention error handling or rate limits, leaving some gaps.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured and front-loaded, starting with the core purpose and usage, then detailing polling strategy, statuses, and returns. Every sentence adds value—no fluff—and information is organized logically for quick comprehension, making it highly efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (polling with status transitions) and lack of annotations or output schema, the description is complete. It covers purpose, usage, behavior, parameters, and return values in detail, providing all necessary context for an agent to invoke it correctly without relying on external documentation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 0% description coverage, but the description compensates fully by explaining the single parameter: 'checkout_id: Checkout session ID.' It adds essential context beyond the schema's basic type, clarifying what the parameter represents and its role in the polling process.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description explicitly states the tool's purpose: 'Poll a checkout session for status updates.' It specifies the verb ('poll') and resource ('checkout session'), and distinguishes it from siblings by referencing 'complete_checkout' as a prerequisite, making its role clear and specific.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit usage guidelines: 'Call this after complete_checkout to track payment and provisioning.' It clearly indicates when to use this tool (after a specific sibling tool) and implies when not to use it (e.g., not for initial checkout creation), offering direct context for selection.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_database_info (A)

Get WordPress database information (size, tables, row counts).

Requires: API key with read scope. WordPress sites only.

Args: slug: Site identifier

Returns: {"database": "wp_mysite", "size_mb": 45.2, "tables": 12, "total_rows": 15432}

Parameters
Name | Required | Description | Default
slug | Yes | (none) | (none)

Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It effectively communicates that this is a read-only operation (implied by 'Get'), specifies authentication requirements ('API key with read scope'), and restricts usage to 'WordPress sites only'. It also provides a concrete example of the return format, though it lacks details on error handling or rate limits.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured and front-loaded with the core purpose, followed by prerequisites, parameter details, and a return example. Each section is concise and directly relevant, with no wasted sentences or redundant information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's moderate complexity (single parameter, no output schema, no annotations), the description is largely complete. It covers purpose, prerequisites, parameter semantics, and return format. However, it could improve by mentioning potential errors or limitations, such as handling invalid slugs or site accessibility issues.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema description coverage is 0%, so the description must fully compensate. It clearly explains the single parameter 'slug' as 'Site identifier' in the Args section, adding essential meaning beyond the schema's generic 'Slug' title. This is sufficient given only one parameter exists.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('Get WordPress database information') and enumerates the exact resources returned ('size, tables, row counts'). It distinguishes itself from sibling tools like 'list_databases' or 'list_tables' by focusing on detailed metrics for a specific site rather than listing multiple databases or tables.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly states prerequisites ('Requires: API key with read scope. WordPress sites only.'), providing clear context for when to use this tool. However, it does not mention alternatives or when not to use it compared to siblings like 'get_site_status' or 'database_search_replace', which could offer overlapping functionality.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_logs (A)

Retrieve container logs (error, access, or PHP).

Requires: API key with read scope.

Args:
  slug: Site identifier
  log_type: "error" (Nginx/Apache errors), "access" (HTTP request log), or "php" (PHP-FPM errors, WordPress sites only)
  lines: Number of lines to retrieve (1–500, default: 100)
  search: Optional keyword filter — only lines containing this string

Returns: {"log_type": "error", "lines": ["2024-01-15 ... error ...", ...], "count": 42, "truncated": false}

Errors:
  NOT_FOUND: Unknown slug
  VALIDATION_ERROR: Invalid log_type or lines out of range
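To avoid a VALIDATION_ERROR round trip, a caller could check the arguments locally before invoking the tool. The sketch below assumes the documented constraints (three log types, 1 to 500 lines, defaults of "error" and 100); `build_get_logs_args` is a hypothetical helper, not part of the server.

```python
from typing import Optional

LOG_TYPES = {"error", "access", "php"}


def build_get_logs_args(
    slug: str,
    log_type: str = "error",
    lines: int = 100,
    search: Optional[str] = None,
) -> dict:
    """Validate get_logs arguments against the documented limits and return the payload."""
    if log_type not in LOG_TYPES:
        raise ValueError(f"log_type must be one of {sorted(LOG_TYPES)}, got {log_type!r}")
    if not 1 <= lines <= 500:
        raise ValueError(f"lines must be between 1 and 500, got {lines}")
    args = {"slug": slug, "log_type": log_type, "lines": lines}
    if search is not None:
        # Only include the optional filter when the caller actually set it.
        args["search"] = search
    return args


payload = build_get_logs_args("my-site", log_type="access", lines=200, search="404")
```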

Parameters
Name | Required | Description | Default
slug | Yes | (none) | (none)
lines | No | (none) | (none)
search | No | (none) | (none)
log_type | No | (none) | error

Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden and does so effectively by disclosing key behavioral traits: it requires specific authentication ('API key with read scope'), describes error conditions ('NOT_FOUND', 'VALIDATION_ERROR'), and outlines the return format. It does not mention rate limits or destructive effects, but covers essential operational context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured and appropriately sized, with each section ('Requires:', 'Args:', 'Returns:', 'Errors:') adding value without redundancy. Sentences are front-loaded with key information, and there is no wasted text, making it efficient and easy to parse.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (4 parameters, no annotations, no output schema), the description is complete enough. It covers authentication needs, parameter details, return values with an example, and error cases, providing all necessary context for an agent to invoke the tool correctly without relying on structured fields.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description adds significant meaning beyond the input schema, which has 0% description coverage. It explains each parameter's purpose (e.g., 'slug: Site identifier', 'log_type' options with details, 'lines' range and default, 'search' as a keyword filter), compensating fully for the schema's lack of documentation and providing clear semantics for all four parameters.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with specific verbs ('Retrieve container logs') and resources ('error, access, or PHP'), distinguishing it from sibling tools like 'read_file' or 'get_metrics' by focusing on container logs. It precisely identifies the types of logs available, making its scope unambiguous.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for usage by specifying prerequisites ('Requires: API key with read scope') and detailing the log_type options, which helps in selecting this tool. However, it does not explicitly mention when not to use it or name alternatives among siblings, such as 'get_app_status' for other site data.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_metrics (A)

Get traffic and performance metrics for a site.

Requires: API key with read scope.

Args:
  slug: Site identifier
  days: Number of days of history (1–90, default: 7)

Returns: {"requests": [...], "bandwidth": [...], "errors": [...], "period": {"start": "iso8601", "end": "iso8601"}}

Errors:
  NOT_FOUND: Unknown slug
  VALIDATION_ERROR: days out of range

Parameters
Name | Required | Description | Default
days | No | (none) | (none)
slug | Yes | (none) | (none)

Behavior 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries full burden and excels by disclosing key behavioral traits: authentication requirements ('API key with read scope'), error conditions ('NOT_FOUND', 'VALIDATION_ERROR'), and the structure of return data with examples, which helps the agent understand operational constraints.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with clear sections (purpose, requirements, args, returns, errors), front-loaded key information, and every sentence adds value without redundancy, making it highly efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no annotations and no output schema, the description provides complete context: it covers purpose, prerequisites, parameters, return format with examples, and error handling, which is sufficient for a read-only tool with two parameters.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description must compensate, and it does so effectively by explaining both parameters: 'slug' as 'Site identifier' and 'days' with its range and default value, adding crucial meaning beyond the bare schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with a specific verb ('Get') and resource ('traffic and performance metrics for a site'), distinguishing it from sibling tools like 'get_logs' or 'get_site_status' that focus on different data types.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

It provides clear context for when to use this tool (to retrieve metrics for a site) and includes prerequisites ('Requires: API key with read scope'), but does not explicitly mention when not to use it or name alternatives among siblings.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_resource_snapshot (A)

Get current resource usage (CPU, memory, disk, load average).

Requires: API key with read scope.

Args: slug: Site identifier

Returns: {"cpu_percent": 12.5, "memory_mb": 384, "memory_total_mb": 512, "disk_used_gb": 3.2, "disk_total_gb": 10, "load_1m": 0.5, "load_5m": 0.3, "load_15m": 0.2}
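As a rough illustration of how an agent might interpret this payload, the sketch below derives memory and disk utilization percentages from the documented fields. The helper name `utilization` is hypothetical, and the snapshot values are copied from the example return above.

```python
def utilization(snapshot: dict) -> dict:
    """Derive percentage utilization from a get_resource_snapshot payload."""
    return {
        "memory_percent": round(100 * snapshot["memory_mb"] / snapshot["memory_total_mb"], 1),
        "disk_percent": round(100 * snapshot["disk_used_gb"] / snapshot["disk_total_gb"], 1),
    }


# Values taken from the documented example return.
snapshot = {
    "cpu_percent": 12.5,
    "memory_mb": 384,
    "memory_total_mb": 512,
    "disk_used_gb": 3.2,
    "disk_total_gb": 10,
    "load_1m": 0.5,
    "load_5m": 0.3,
    "load_15m": 0.2,
}
usage = utilization(snapshot)
```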

Parameters
Name | Required | Description | Default
slug | Yes | (none) | (none)

Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries full burden and does well by disclosing authentication requirements ('API key with read scope'). It also implies read-only behavior through 'Get' and shows the return format, though it doesn't mention rate limits, permissions beyond scope, or error conditions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is efficiently structured with purpose statement, requirements, parameter documentation, and return example in four clear sections. Every sentence earns its place with no wasted words, and key information is front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a single-parameter read operation with no output schema, the description is complete: it explains purpose, requirements, parameter meaning, and provides detailed return format with example values. No annotations exist, so the description fully covers behavioral aspects needed for correct invocation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description must compensate. It explicitly documents the single parameter 'slug' as 'Site identifier' and provides the complete return structure with all fields and example values, adding significant meaning beyond the bare schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('Get current resource usage') and resources involved (CPU, memory, disk, load average). It distinguishes from siblings by focusing on real-time system metrics rather than other operations like backups, deployments, or database management.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly states 'Requires: API key with read scope' which provides clear context about prerequisites. However, it doesn't specify when to use this tool versus alternatives like 'get_metrics' or 'get_site_status' that might overlap in functionality.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_site_status (grade A)

Get detailed status of a hosted site including resources, domains, and modules.

Requires: API key with read scope.

Args: slug: Site identifier (the slug chosen during checkout)

Returns: {"slug": "my-site", "plan": "site_starter", "status": "active", "domains": ["my-site.borealhost.ai"], "modules": {...}, "resources": {"memory_mb": 512, "cpu_cores": 1, "disk_gb": 10}, "created_at": "iso8601"}

Errors: NOT_FOUND: Unknown slug or not owned by this account
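
As a rough illustration of how an agent might invoke this tool, the sketch below builds the JSON-RPC 2.0 request that MCP uses for its `tools/call` method. The helper name `build_tool_call` and the request id are illustrative assumptions; transport, authentication, and the exact error envelope for NOT_FOUND are not shown because this page does not describe them.

```python
import json


def build_tool_call(name: str, arguments: dict, request_id: int = 1) -> dict:
    """Build a JSON-RPC 2.0 request for the MCP `tools/call` method.

    `build_tool_call` is a hypothetical helper, not part of any SDK.
    """
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# The slug value is the example from the tool description.
request = build_tool_call("get_site_status", {"slug": "my-site"})
print(json.dumps(request, indent=2))
```

The same envelope should work for the other single-parameter tools on this page by swapping the tool name.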

Parameters (JSON Schema)
Name | Required | Description | Default
slug | Yes | - | -

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It effectively describes the tool as a read operation (implied by 'Get'), specifies authentication requirements ('API key with read scope'), and outlines error conditions ('NOT_FOUND: Unknown slug or not owned by this account'), adding valuable context beyond basic functionality.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured and front-loaded with the core purpose, followed by sections for requirements, arguments, returns, and errors. Each sentence earns its place by providing critical information without redundancy, making it efficient and easy to parse.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's moderate complexity (1 parameter, no output schema, no annotations), the description is complete. It covers purpose, prerequisites, parameter details, return value example, and error handling, providing all necessary context for an agent to invoke the tool correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema description coverage is 0%, so the description must compensate. It clearly explains the single parameter 'slug' as 'Site identifier (the slug chosen during checkout)', adding essential meaning not present in the schema. This fully addresses the parameter semantics gap.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with a specific verb ('Get') and resource ('detailed status of a hosted site'), listing key components like resources, domains, and modules. It distinguishes from siblings like 'get_app_status' by focusing on site-level status rather than app-level.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for usage with the 'Requires: API key with read scope' statement, indicating prerequisites. However, it does not explicitly state when to use this tool versus alternatives like 'get_app_status' or 'domain_detail', nor does it mention exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_snapshot_usage (grade A)

Get snapshot disk usage and quota info for a site.

Requires: API key with read scope.

Args: slug: Site identifier

Returns: {"disk_quota_gb": 200, "max_snapshots": 5, "snapshot_count": 2, "local_snapshot_bytes": 1234, "b2_snapshot_bytes": 5678, "can_create": true}
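
A minimal sketch of how a caller might interpret this response before attempting a new snapshot. It assumes the two byte counts describe separate storage locations and can be summed, which the description does not confirm; `summarize_snapshot_usage` is a hypothetical helper name.

```python
def summarize_snapshot_usage(usage: dict) -> dict:
    """Condense the documented get_snapshot_usage response.

    Assumption: local and B2 byte counts are additive. The server's
    `can_create` flag stays the authority on whether a snapshot is allowed.
    """
    total_bytes = usage["local_snapshot_bytes"] + usage["b2_snapshot_bytes"]
    slots_left = max(usage["max_snapshots"] - usage["snapshot_count"], 0)
    return {
        "total_snapshot_bytes": total_bytes,
        "slots_left": slots_left,
        "can_create": bool(usage["can_create"]),
    }


# Example values copied from the Returns line above.
example = {
    "disk_quota_gb": 200,
    "max_snapshots": 5,
    "snapshot_count": 2,
    "local_snapshot_bytes": 1234,
    "b2_snapshot_bytes": 5678,
    "can_create": True,
}
summary = summarize_snapshot_usage(example)
print(summary)
```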

Parameters (JSON Schema)
Name | Required | Description | Default
slug | Yes | - | -

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It effectively describes the tool as a read operation ('Get'), specifies authentication requirements ('API key with read scope'), and outlines the return format with example data. It does not mention rate limits or error handling, but covers key behavioral aspects adequately for a read-only tool.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured and front-loaded, starting with the core purpose, followed by requirements, arguments, and returns in clear sections. Every sentence adds value without redundancy, making it efficient and easy to parse for an AI agent.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's low complexity (1 parameter, no output schema, no annotations), the description is largely complete. It covers purpose, authentication, parameters, and return values with an example. However, it lacks details on error cases or edge behaviors, which could enhance completeness for a production tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 0% description coverage, but the description compensates by explaining the single parameter 'slug' as 'Site identifier'. This adds meaningful context beyond the schema's basic type information. Since there is only one parameter, the description provides sufficient semantic clarity, though it could elaborate on format or constraints.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with specific verbs ('Get snapshot disk usage and quota info') and identifies the target resource ('for a site'). It distinguishes itself from siblings like 'list_snapshots' (which likely lists snapshots) or 'get_site_status' (which provides general status) by focusing specifically on usage metrics.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for when to use this tool ('Get snapshot disk usage and quota info for a site') and includes a prerequisite ('Requires: API key with read scope'). However, it does not explicitly state when not to use it or name specific alternatives among the siblings, such as 'get_site_status' for broader site information.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_ssh_info (grade A)

Get SSH connection info for a VPS/dedicated site.

Only available for VPS/dedicated plans (not shared hosting).

Requires: API key with read scope.

Args: slug: Site identifier

Returns: {"host": "184.107.x.x", "port": 22, "username": "admin", "ssh_command": "ssh admin@184.107.x.x"}

Errors:
  NOT_FOUND: Unknown slug
  FORBIDDEN: Plan does not support SSH (shared plans)
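
One way a client could turn the documented response into an argument list for launching `ssh`, rather than passing the `ssh_command` string through a shell. This is a sketch only: `ssh_argv` is a hypothetical helper, and the host below is a placeholder from the documentation address range, not a real BorealHost address.

```python
def ssh_argv(info: dict) -> list[str]:
    """Build an argv list from the host, port, and username fields.

    Passing a list to subprocess avoids shell quoting issues.
    """
    return ["ssh", "-p", str(info["port"]), f'{info["username"]}@{info["host"]}']


# Placeholder values shaped like the documented response.
info = {"host": "203.0.113.10", "port": 22, "username": "admin"}
argv = ssh_argv(info)
print(argv)
```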

Parameters (JSON Schema)
Name | Required | Description | Default
slug | Yes | - | -

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It effectively discloses key behavioral traits: the tool is read-only (implied by 'Get'), requires specific permissions ('API key with read scope'), and has plan-based access restrictions. It also outlines error conditions ('NOT_FOUND', 'FORBIDDEN'), adding valuable context beyond basic functionality.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Well-structured and front-loaded with the core purpose, followed by prerequisites, arguments, returns, and errors in clear sections. Every sentence adds value without redundancy, making it efficient and easy to parse.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's low complexity (1 parameter, no output schema, no annotations), the description is complete. It covers purpose, usage constraints, parameters, return values with an example, and error cases, providing all necessary context for an agent to invoke it correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description must compensate. It adds meaningful semantics by explaining that 'slug' is a 'Site identifier', which clarifies the parameter's purpose beyond the schema's generic 'Slug' title. However, it doesn't detail format or examples for the slug.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with a specific verb ('Get') and resource ('SSH connection info for a VPS/dedicated site'). It distinguishes itself from siblings by focusing on SSH information retrieval, unlike tools for backups, snapshots, or domain management.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states when to use ('Only available for VPS/dedicated plans') and when not to use ('not shared hosting'). It also specifies prerequisites ('Requires: API key with read scope'), providing clear context for tool selection.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_stack_info (grade A)

Get detailed system stack information (OS, PHP, DB, web server versions).

Requires: API key with read scope.

Args: slug: Site identifier

Returns: {"os": "Debian 12", "kernel": "6.1.0", "php": "8.3.4", "mysql": "10.11.6-MariaDB", "nginx": "1.24.0", "wordpress": "6.5"}
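
The version fields come back as strings, some with suffixes such as "-MariaDB". A small sketch of how a caller might compare them numerically follows; `version_tuple` is a hypothetical helper and it is only meant for fields that start with a dotted number (so not the `os` field).

```python
import re


def version_tuple(text: str) -> tuple[int, ...]:
    """Parse the leading dotted number of a version string into a tuple."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no leading version number in {text!r}")
    return tuple(int(part) for part in match.group().split("."))


# Example values copied from the Returns line above.
stack = {"php": "8.3.4", "mysql": "10.11.6-MariaDB", "nginx": "1.24.0", "wordpress": "6.5"}
php_ok = version_tuple(stack["php"]) >= (8, 1)
print(version_tuple(stack["mysql"]), php_ok)
```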

Parameters (JSON Schema)
Name | Required | Description | Default
slug | Yes | - | -

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It effectively describes the tool's behavior by specifying the authentication requirement ('API key with read scope') and providing a concrete example of the return value, which helps the agent understand what data to expect. However, it lacks details on potential errors, rate limits, or side effects.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured and front-loaded with the core purpose, followed by requirements, arguments, and returns in a clear format. Every sentence adds value without redundancy, making it efficient and easy to parse.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's moderate complexity (1 parameter, no output schema, no annotations), the description is mostly complete. It covers the purpose, authentication, parameter semantics, and provides an example return value. However, it could improve by mentioning potential error cases or limitations, such as what happens if the slug is invalid.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema description coverage is 0%, so the description must compensate. It adds meaning by explaining that the 'slug' parameter is a 'Site identifier', which clarifies its purpose beyond the schema's generic 'Slug' title. Since there is only one parameter, this additional context is sufficient to guide usage effectively.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'Get' and the resource 'detailed system stack information', specifying the exact components (OS, PHP, DB, web server versions). It distinguishes itself from sibling tools like get_database_info or get_site_status by focusing specifically on system stack details.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides some usage context by stating 'Requires: API key with read scope', which implies when authentication is needed. However, it does not explicitly guide when to use this tool versus alternatives like get_site_status or get_metrics, nor does it mention any exclusions or prerequisites beyond the API key.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

install_app (grade A)

Install an app template on a VPS/Cloud site.

Starts a background installation. Poll get_app_status() for progress.

Requires: API key with write scope. VPS or Cloud plan only.

Args:
  slug: Site identifier
  template: App template slug. Available: django, laravel, nextjs, nodejs, nuxtjs, rails, static, forge
  app_name: Short name for the app (2-50 chars, lowercase alphanumeric + hyphens). Used as subdomain: {app_name}.{site_domain}
  db_type: Database type. "none", "mysql", or "postgresql" (depends on template)
  domain: Custom domain override (default: {app_name}.{site_domain})
  display_name: Human-friendly name (default: derived from app_name)

Returns: {"id": "uuid", "app_name": "forge", "status": "installing", "message": "Installation started. Poll for progress."}

Errors:
  FORBIDDEN: Plan does not support apps (shared plans)
  VALIDATION_ERROR: Invalid template, app_name, or duplicate name
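
The argument constraints above can be checked client-side before calling the tool, which may save a round trip that would end in VALIDATION_ERROR. The sketch below is an assumption-laden reading of the description: the regular expression treats "2-50 chars, lowercase alphanumeric + hyphens" literally and does not guess at extra rules (such as leading hyphens or template-specific database restrictions) that the server may enforce. `validate_install_args` is a hypothetical helper.

```python
import re

# Values listed in the tool description.
TEMPLATES = {"django", "laravel", "nextjs", "nodejs", "nuxtjs", "rails", "static", "forge"}
DB_TYPES = {"none", "mysql", "postgresql"}
# Literal reading of the documented app_name rule; the server may be stricter.
APP_NAME_RE = re.compile(r"[a-z0-9-]{2,50}")


def validate_install_args(template: str, app_name: str, db_type: str = "none") -> list[str]:
    """Return a list of problems; an empty list means the arguments look valid."""
    problems = []
    if template not in TEMPLATES:
        problems.append(f"unknown template: {template!r}")
    if not APP_NAME_RE.fullmatch(app_name):
        problems.append("app_name must be 2-50 chars of lowercase letters, digits, or hyphens")
    if db_type not in DB_TYPES:
        problems.append(f"unknown db_type: {db_type!r}")
    return problems


print(validate_install_args("forge", "forge"))
print(validate_install_args("wordpress", "A", "sqlite"))
```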

Parameters (JSON Schema)
Name | Required | Description | Default
slug | Yes | - | -
domain | No | - | -
db_type | No | - | none
app_name | Yes | - | -
template | Yes | - | -
display_name | No | - | -

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries full burden and does well by disclosing: background/asynchronous nature, authentication requirements (write scope), plan restrictions (VPS/Cloud only), error conditions, and return format. It doesn't mention rate limits or idempotency, but covers most critical behavioral aspects.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Well-structured with clear sections (purpose, usage, args, returns, errors). Some redundancy exists (the {app_name}.{site_domain} pattern appears under both app_name and domain), but overall efficient with each sentence adding value. Could be slightly more front-loaded with the core purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a complex mutation tool with 6 parameters, 0% schema coverage, and no output schema, the description provides everything needed: clear purpose, usage guidelines, parameter details, return format, error conditions, and integration instructions (polling). No significant gaps remain.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, but the description provides comprehensive parameter documentation: explains each parameter's purpose, provides template enumeration, character limits for app_name, default values, domain substitution pattern, and database type dependencies. This fully compensates for the schema's lack of descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('Install an app template'), target resource ('on a VPS/Cloud site'), and distinguishes from siblings like 'deploy' or 'toggle_module' by focusing on template-based app installation. It provides a complete verb+resource+scope statement.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states when to use ('Starts a background installation. Poll get_app_status() for progress'), prerequisites ('Requires: API key with write scope. VPS or Cloud plan only'), and exclusions ('FORBIDDEN: Plan does not support apps (shared plans)'). It clearly distinguishes this from monitoring tools like get_app_status.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
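
The install_app description tells callers to poll get_app_status() after starting an installation. A minimal polling sketch follows. It assumes that any status other than "installing" is terminal, which this page does not confirm, and it takes the status fetcher as a callable so no particular client library is implied. `wait_for_install` and its parameters are hypothetical.

```python
import time
from typing import Callable


def wait_for_install(
    fetch_status: Callable[[], dict],
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
) -> dict:
    """Poll until the reported status is no longer "installing".

    Assumption: "installing" is the only non-terminal status.
    Raises TimeoutError if the deadline passes first.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch_status()
        if status.get("status") != "installing":
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("installation did not finish before the timeout")
        time.sleep(interval_s)


# Stand-in for real get_app_status() calls: two pending polls, then done.
responses = iter([{"status": "installing"}, {"status": "installing"}, {"status": "running"}])
final = wait_for_install(lambda: next(responses), interval_s=0.0)
print(final)
```

Injecting the fetcher keeps the loop testable without network access; a real caller would wrap its get_app_status invocation in the lambda.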

list_alert_rules (grade A)

List user-configurable alert rules for a site.

Requires: API key with read scope.

Args: slug: Site identifier

Returns: [{"id": "uuid", "metric": "disk", "operator": "gt", "threshold": 90, "severity": "warning", "enabled": true, "cooldown_minutes": 30, "notify_email": true}]
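
To show how the rule fields fit together, the sketch below evaluates a rule against a metric reading. Only the "gt" operator appears in the description; the other operator names in the table are assumptions added for illustration, and `rule_fires` is a hypothetical helper. Cooldown and notification handling are left out.

```python
import operator

# "gt" comes from the documented example; the rest are assumed names.
OPERATORS = {
    "gt": operator.gt,
    "lt": operator.lt,
    "gte": operator.ge,
    "lte": operator.le,
}


def rule_fires(rule: dict, value: float) -> bool:
    """Return True if an enabled rule's condition holds for `value`."""
    if not rule.get("enabled", False):
        return False
    compare = OPERATORS[rule["operator"]]
    return compare(value, rule["threshold"])


# Rule shaped like the documented example.
rule = {"metric": "disk", "operator": "gt", "threshold": 90, "severity": "warning", "enabled": True}
print(rule_fires(rule, 95), rule_fires(rule, 90))
```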

Parameters (JSON Schema)
Name | Required | Description | Default
slug | Yes | - | -

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden of behavioral disclosure. It adds value by specifying the authentication requirement ('API key with read scope') and the return format with example data. However, it lacks details on potential behaviors like pagination, error handling, rate limits, or whether the operation is read-only (implied by 'list' but not explicitly stated).

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is appropriately sized and front-loaded: the first sentence states the purpose clearly, followed by prerequisite and parameter/return details in a structured format. Every sentence earns its place by adding necessary information. It could be slightly more concise by integrating the 'Args' and 'Returns' sections more fluidly, but overall it is efficient and well-organized.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (a simple read operation with one parameter), no annotations, no output schema, and low schema coverage, the description is fairly complete. It covers purpose, authentication, parameter semantics, and return format with an example. However, it could improve by addressing potential behavioral aspects like pagination or error cases, but for a basic list tool, it provides enough context for effective use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema description coverage is 0%, so the description must compensate. It adds meaning by explaining the 'slug' parameter as 'Site identifier', which clarifies its purpose beyond the schema's generic 'Slug' title. Since there is only one parameter, this is sufficient to understand its role. However, it does not provide additional details like format examples or constraints, which could be helpful.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'List user-configurable alert rules for a site.' It specifies the verb ('list'), resource ('alert rules'), and scope ('for a site'), which is specific and actionable. However, it does not explicitly differentiate from sibling tools like 'create_alert_rule' or 'delete_alert_rule', though the distinction is implied by the verb 'list' versus 'create'/'delete'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for when to use the tool: it lists alert rules for a site, and it includes a prerequisite ('Requires: API key with read scope.'). This gives guidance on authentication needs. However, it does not explicitly state when not to use it or mention alternatives, such as using 'create_alert_rule' for adding rules instead of listing them, though the sibling tool names make this somewhat obvious.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_api_keys (grade A)

List all API keys for the account.

Shows key metadata (name, prefix, scopes, last used) but never the full key value.

Requires: API key with read scope.

Returns: [{"id": "uuid", "name": "My Key", "prefix": "bh_a2...", "scopes": ["read", "write"], "is_active": true, "created_at": "iso8601", "last_used_at": "iso8601"|null, "site_slug": null|"my-site"}]
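
A short sketch of how the returned metadata might be used in a key audit: finding active keys that carry a given scope and keys that have never been used. The helper names and the sample timestamps are illustrative assumptions; the field names follow the Returns line above.

```python
def keys_with_scope(keys: list[dict], scope: str) -> list[dict]:
    """Active keys whose scopes include `scope`."""
    return [k for k in keys if k["is_active"] and scope in k["scopes"]]


def never_used(keys: list[dict]) -> list[dict]:
    """Keys whose last_used_at is null in the response."""
    return [k for k in keys if k["last_used_at"] is None]


# Hypothetical sample data in the documented shape.
keys = [
    {"id": "1", "name": "CI", "prefix": "bh_a2...", "scopes": ["read", "write"],
     "is_active": True, "created_at": "2024-01-01T00:00:00+00:00",
     "last_used_at": None, "site_slug": None},
    {"id": "2", "name": "Dashboard", "prefix": "bh_b7...", "scopes": ["read"],
     "is_active": True, "created_at": "2024-01-05T00:00:00+00:00",
     "last_used_at": "2024-02-01T00:00:00+00:00", "site_slug": "my-site"},
]
print([k["name"] for k in keys_with_scope(keys, "write")])
print([k["name"] for k in never_used(keys)])
```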

Parameters (JSON Schema)

No parameters

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It effectively adds context beyond basic functionality: it specifies what metadata is shown (e.g., 'name, prefix, scopes, last used') and explicitly states what is not shown ('never the full key value'), clarifies authentication requirements ('Requires: API key with read scope'), and details the return format with a clear example, covering output behavior comprehensively.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is appropriately sized and front-loaded: the first sentence states the core purpose, followed by key details in bullet-like sections ('Shows...', 'Requires:', 'Returns:'). Every sentence earns its place by providing essential information without waste, making it easy to scan and understand.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (a read operation with no inputs but specific output and requirements), the description is complete. It covers purpose, behavioral traits (e.g., what data is included/excluded), prerequisites, and a detailed return example. Since there is no output schema, the description effectively fills that gap, making it sufficient for an agent to invoke the tool correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 0 parameters with 100% description coverage, so the schema fully documents the lack of inputs. The description does not add parameter information, which is unnecessary here. Since there are no parameters, a baseline score of 4 is appropriate, as the description compensates by focusing on output and requirements without redundancy.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'List all API keys for the account.' It specifies the verb ('List'), resource ('API keys'), and scope ('for the account'), distinguishing it from sibling tools like 'create_api_key' or 'revoke_api_key' that perform different operations on the same resource.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for usage with 'Requires: API key with read scope,' indicating prerequisites. However, it does not explicitly state when to use this tool versus alternatives (e.g., 'claim_api_key' or 'set_api_key'), nor does it mention exclusions or comparisons to other list tools like 'list_apps' or 'list_domains'.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_apps (grade A)

List installed apps on a site.

Requires: API key with read scope.

Args: slug: Site identifier

Returns: {"apps": [{"id": "uuid", "app_name": "forge", "template_slug": "forge", "status": "running", "domain": "forge.mysite.borealhost.ai"}]}

Parameters (JSON Schema)
Name | Required | Description | Default
slug | Yes | - | -

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden of behavioral disclosure. It mentions the API key requirement, which is useful context for authentication. However, it does not disclose other behavioral traits such as rate limits, pagination, error handling, or whether the operation is read-only or has side effects. The description adds some value but leaves gaps in behavioral transparency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with clear sections: purpose, requirements, arguments, and returns. It uses bullet points and JSON formatting efficiently. However, the 'Returns' section could be more concise by summarizing instead of showing a full JSON example, and some sentences could be tightened for brevity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's low complexity (1 parameter, no output schema, no annotations), the description is fairly complete. It covers the purpose, authentication requirement, parameter meaning, and return format. However, it lacks details on error cases, pagination, or how to handle multiple sites, which could be relevant for a list operation. The absence of an output schema means the description must explain returns, which it does adequately.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema description coverage is 0%, so the description must compensate. It provides a clear explanation of the 'slug' parameter as 'Site identifier,' which adds meaningful semantics beyond the schema's basic 'Slug' title. Since there is only one parameter, this is sufficient to earn a high score, though it could include more details like format or examples.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'List installed apps on a site.' It specifies the verb ('List') and resource ('installed apps on a site'), making it easy to understand what the tool does. However, it does not explicitly differentiate from sibling tools like 'get_app_status' or 'install_app', which prevents a perfect score.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description includes a 'Requires' section stating 'API key with read scope,' which provides some context for when to use the tool. However, it lacks explicit guidance on when to use this tool versus alternatives like 'get_app_status' or 'install_app', and does not mention any exclusions or prerequisites beyond the API key. This implies usage but does not fully clarify alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_backups (A)

List all backups for a site (automatic and manual).

Requires: API key with read scope.

Args: slug: Site identifier

Returns: [{"id": "uuid", "backup_type": "auto"|"manual", "status": "completed", "size_bytes": 1234, "size_display": "1.2 Mo", "timestamp": "iso8601", "notes": "..."}]
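
As a rough illustration of consuming this return shape, the sketch below picks the most recent completed backup from a list shaped like the example above. The sample payload, the "failed" status value, and the helper name `latest_completed` are hypothetical; only the field names come from the description, and the timestamps are assumed to be ISO 8601 strings with a UTC offset.

```python
from datetime import datetime
from typing import Optional

# Hypothetical payload shaped like the documented return value.
# Only "completed" is a documented status; "failed" is an assumption.
sample_backups = [
    {"id": "a1", "backup_type": "auto", "status": "completed",
     "size_bytes": 1234, "size_display": "1.2 Mo",
     "timestamp": "2024-05-01T03:00:00+00:00", "notes": ""},
    {"id": "b2", "backup_type": "manual", "status": "completed",
     "size_bytes": 2048, "size_display": "2.0 Mo",
     "timestamp": "2024-05-03T12:30:00+00:00", "notes": "before upgrade"},
    {"id": "c3", "backup_type": "auto", "status": "failed",
     "size_bytes": 0, "size_display": "0 Mo",
     "timestamp": "2024-05-04T03:00:00+00:00", "notes": ""},
]


def latest_completed(backups: list, backup_type: Optional[str] = None) -> Optional[dict]:
    """Return the newest backup with status "completed", optionally filtered by type."""
    candidates = [
        b for b in backups
        if b["status"] == "completed"
        and (backup_type is None or b["backup_type"] == backup_type)
    ]
    if not candidates:
        return None
    # Compare parsed datetimes rather than raw strings so offsets are respected.
    return max(candidates, key=lambda b: datetime.fromisoformat(b["timestamp"]))
```

Because the description does not mention pagination or a maximum number of results, a caller probably should not assume the list is exhaustive for sites with many backups.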

Parameters
| Name | Required | Description | Default |
| --- | --- | --- | --- |
| slug | Yes | | |

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden. It discloses authentication requirements ('API key with read scope') and the scope of data returned ('all backups for a site'), but doesn't mention rate limits, pagination behavior, or error conditions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is efficiently structured with purpose statement, requirements, and clear sections for Args and Returns. Every sentence adds value with no wasted words, and key information is front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a read-only listing tool with no output schema, the description provides purpose, authentication requirements, parameter explanation, and detailed return format. It could improve by mentioning any limitations (e.g., maximum backups returned) but is largely complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, but the description compensates by explaining the single parameter 'slug' as 'Site identifier', adding meaningful context beyond the schema's generic 'Slug' title. This is sufficient for the single parameter.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb ('List') and resource ('all backups for a site'), specifying both automatic and manual backups. It distinguishes the tool from siblings like 'list_snapshots' and 'restore_backup' by focusing specifically on backups.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context ('for a site') and mentions prerequisites ('Requires: API key with read scope'), but doesn't explicitly state when to use this tool versus alternatives like 'list_snapshots' or 'create_backup'.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_cron (A)

List cron jobs on a site.

Requires: API key with read scope.

Args: slug: Site identifier

Returns: {"jobs": [{"line": 1, "schedule": "*/5 * * * *", "command": "/usr/bin/php /var/www/html/wp-cron.php"}, ...]}
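
A minimal sketch of how an agent-side helper might read this response, assuming each `schedule` is a standard five-field cron expression as in the example (shortcuts such as `@daily` are not handled). The helper names `split_schedule` and `find_jobs` are hypothetical and not part of the server.

```python
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")

# Response shaped like the documented example (single job shown).
sample_response = {
    "jobs": [
        {"line": 1, "schedule": "*/5 * * * *",
         "command": "/usr/bin/php /var/www/html/wp-cron.php"},
    ]
}


def split_schedule(schedule: str) -> dict:
    """Map a five-field cron expression onto named fields."""
    parts = schedule.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(f"expected 5 cron fields, got {len(parts)}: {schedule!r}")
    return dict(zip(CRON_FIELDS, parts))


def find_jobs(response: dict, needle: str) -> list:
    """Return the line numbers of jobs whose command contains `needle`."""
    return [job["line"] for job in response["jobs"] if needle in job["command"]]
```

For the documented example, `split_schedule` reports `*/5` in the minute field and `*` everywhere else, which reads as "every five minutes".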

Parameters
| Name | Required | Description | Default |
| --- | --- | --- | --- |
| slug | Yes | | |

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It effectively describes the tool as a read operation ('List'), specifies authentication requirements ('API key with read scope'), and provides a clear example of the return format, covering key behavioral aspects without contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured and front-loaded, starting with the core purpose, followed by requirements, arguments, and returns in a clear, bullet-like format. Every sentence adds value without redundancy, making it efficient and easy to parse.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's low complexity (one parameter, no output schema, no annotations), the description is largely complete. It covers purpose, prerequisites, parameters, and return format. However, it lacks details on error handling or pagination, which could be relevant for a list operation, preventing a perfect score.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 0% description coverage, so the description must compensate. It adds meaning by explaining that 'slug' is a 'Site identifier,' which clarifies the parameter's purpose beyond the schema's generic 'Slug' title. Since there's only one parameter, this is sufficient for a high score, though it doesn't detail format or constraints.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with a specific verb ('List') and resource ('cron jobs on a site'), making it immediately understandable. However, it doesn't explicitly differentiate from sibling tools like 'get_site_status' or other list_* tools, which prevents a perfect score.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides some usage guidance by stating 'Requires: API key with read scope,' which indicates prerequisites. However, it doesn't specify when to use this tool versus alternatives (e.g., other list_* tools or site status tools) or any exclusions, leaving the context implied rather than explicit.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_databases (A)

List all databases on a site's container.

Requires: API key with read scope.

Args: slug: Site identifier

Returns: {"databases": ["wordpress", "app_db", ...]}

Parameters
| Name | Required | Description | Default |
| --- | --- | --- | --- |
| slug | Yes | | |

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It effectively adds context by specifying authentication requirements ('API key with read scope') and the return format (a JSON object with a 'databases' array). It does not cover potential limitations like rate limits or pagination, but provides essential operational details.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is appropriately sized and front-loaded, starting with the core purpose, followed by requirements, arguments, and returns in a structured format. Every sentence earns its place with no redundant information, making it efficient and easy to parse.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's low complexity (one parameter, no output schema, no annotations), the description is largely complete. It covers purpose, authentication, parameters, and return values. A minor gap is the lack of explicit mention of error cases or behavioral constraints like pagination, but it provides enough context for effective use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description must compensate. It adds meaning by explaining that 'slug' is a 'Site identifier,' which clarifies the parameter's purpose beyond the schema's generic 'Slug' title. With only one parameter, this is sufficient to achieve a high score, though it could include format examples.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('List all databases') and resource ('on a site's container'), distinguishing it from sibling tools like 'get_database_info' or 'list_tables' which have different scopes. It precisely defines what the tool does without being vague or tautological.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly states 'Requires: API key with read scope,' providing clear context for when to use this tool. However, it does not mention when not to use it or name specific alternatives among sibling tools, such as 'get_database_info' for detailed information on a single database.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_domain_dns (A)

List all DNS records for a domain.

Returns DNS records at the domain level (independent of site-level manage_dns). Use this for domains that may not be linked to a site.

Requires: API key with read scope.

Args: domain_name: Full domain name (e.g. "example.com")

Returns: [{"id": "record-id", "type": "A", "subdomain": "www", "value": "1.2.3.4", "ttl": 3600}]

Errors: NOT_FOUND: Domain not found or not owned by account
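
The record shape above can be post-processed along these lines. This is only a sketch: the helper names are hypothetical, and treating an empty string or "@" in `subdomain` as the domain apex is an assumption that the description does not confirm.

```python
from collections import defaultdict

# First record is the documented example; the second is a made-up addition
# that exercises the assumed apex convention.
sample_records = [
    {"id": "record-id", "type": "A", "subdomain": "www", "value": "1.2.3.4", "ttl": 3600},
    {"id": "record-id-2", "type": "A", "subdomain": "@", "value": "1.2.3.4", "ttl": 3600},
]


def record_fqdn(record: dict, domain_name: str) -> str:
    """Join a record's subdomain with the domain name passed to the tool."""
    sub = record.get("subdomain", "")
    if sub in ("", "@"):  # assumed apex markers, not documented
        return domain_name
    return f"{sub}.{domain_name}"


def group_by_type(records: list) -> dict:
    """Group records by their DNS type, e.g. {"A": [...], "MX": [...]}."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["type"]].append(record)
    return dict(grouped)
```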

Parameters
| Name | Required | Description | Default |
| --- | --- | --- | --- |
| domain_name | Yes | | |

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It effectively describes authentication requirements ('API key with read scope'), error conditions ('NOT_FOUND: Domain not found or not owned by account'), and the return format. However, it doesn't mention rate limits, pagination, or other operational constraints.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is efficiently structured with clear sections (purpose, usage, requirements, args, returns, errors). Every sentence adds value—no redundant information—and key details are front-loaded, making it easy to parse quickly.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's low complexity (1 parameter, no output schema, no annotations), the description is complete. It covers purpose, usage, authentication, parameters, return values, and errors, providing all necessary context for an agent to invoke the tool correctly without relying on structured fields.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description adds significant meaning beyond the input schema, which has 0% description coverage. It explains the 'domain_name' parameter with an example ('e.g. "example.com"') and clarifies it must be a 'Full domain name'. This compensates well for the schema's lack of documentation.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with a specific verb ('List') and resource ('DNS records for a domain'). It distinguishes the tool from its siblings by explicitly contrasting it with 'manage_dns', which handles site-level DNS management, making the scope unambiguous.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit guidance on when to use this tool ('for domains that may not be linked to a site') and when to use an alternative ('independent of site-level manage_dns'). It also includes prerequisites ('Requires: API key with read scope'), offering comprehensive usage context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_domains (A)

List all domains owned by the authenticated user.

Requires: API key with read scope.

Returns: [{"domain": "example.com", "status": "active", "expires_at": "iso8601", "auto_renew": true, "linked_site": "my-site"}]
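
One plausible use of this list is spotting domains that will lapse soon. The sketch below assumes `expires_at` is an ISO 8601 string with a UTC offset; the sample data and the helper name `expiring_without_renewal` are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical payload using the documented field names.
sample_domains = [
    {"domain": "example.com", "status": "active",
     "expires_at": "2024-06-10T00:00:00+00:00", "auto_renew": False,
     "linked_site": "my-site"},
    {"domain": "example.org", "status": "active",
     "expires_at": "2024-06-05T00:00:00+00:00", "auto_renew": True,
     "linked_site": "my-site"},
    {"domain": "example.net", "status": "active",
     "expires_at": "2025-01-01T00:00:00+00:00", "auto_renew": False,
     "linked_site": "my-site"},
]


def expiring_without_renewal(domains: list, now: datetime, within_days: int = 30) -> list:
    """Names of domains that expire within `within_days` of `now` and will not auto-renew."""
    cutoff = now + timedelta(days=within_days)
    return [
        d["domain"] for d in domains
        if not d["auto_renew"]
        and datetime.fromisoformat(d["expires_at"]) <= cutoff
    ]


# `now` is passed in explicitly so the check is reproducible.
reference_time = datetime(2024, 6, 1, tzinfo=timezone.utc)
at_risk = expiring_without_renewal(sample_domains, reference_time)
```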

Parameters

No parameters

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It discloses key behavioral traits: it requires authentication ('API key with read scope') and describes the return format with an example. It does not mention rate limits, pagination, or error handling, but for a read-only list tool with zero annotations, this is reasonably comprehensive, though not exhaustive.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is front-loaded with the core purpose in the first sentence, followed by prerequisites and return details. Every sentence adds value: the purpose, authentication requirement, and example output structure. There is no wasted text, making it highly efficient and well-structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity (0 parameters, no output schema, no annotations), the description is largely complete: it covers purpose, authentication, and return format. However, it lacks details on potential behavioral aspects like pagination or error cases, which could be relevant for a list operation. With no output schema, the example return format is helpful but not a full specification.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The tool has 0 parameters, with 100% schema description coverage (empty schema). The description does not need to add parameter semantics, so a baseline of 4 is appropriate. It correctly omits parameter details, focusing on other aspects like authentication and output.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb ('List') and resource ('all domains owned by the authenticated user'), making the purpose specific and unambiguous. It distinguishes this tool from siblings like 'domain_detail' (which likely provides details for a single domain) and 'search_domain' (which likely filters domains), establishing clear differentiation.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly states when to use this tool ('List all domains') and includes a prerequisite ('Requires: API key with read scope'), providing clear context for usage. However, it does not explicitly mention when not to use it or name specific alternatives (e.g., 'search_domain' for filtered queries), which prevents a score of 5.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_files (A)

List files and directories in a site's container.

Path scoping depends on the plan:

  • Shared plans: rooted at wp-content/ (WordPress content directory)

  • VPS/dedicated plans: full filesystem access

Requires: API key with read scope.

Args:
  slug: Site identifier
  path: Relative path to list (empty for root of accessible area)

Returns: {"path": "/", "entries": [{"name": "index.php", "type": "file", "size": 1234, "modified": "iso8601"}, {"name": "uploads", "type": "directory", "modified": "iso8601"}]}

Errors: NOT_FOUND: Unknown slug or path doesn't exist
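
To make the return shape concrete, here is a small sketch that summarizes a listing. The entries mirror the documented example with placeholder timestamps filled in; `summarize_listing` is a hypothetical helper, not something the server provides.

```python
# Listing shaped like the documented example (timestamps are placeholders).
sample_listing = {
    "path": "/",
    "entries": [
        {"name": "index.php", "type": "file", "size": 1234,
         "modified": "2024-05-01T10:00:00+00:00"},
        {"name": "uploads", "type": "directory",
         "modified": "2024-05-02T08:15:00+00:00"},
    ],
}


def summarize_listing(listing: dict) -> dict:
    """Split entries into files and directories and total the file sizes."""
    entries = listing["entries"]
    files = [e["name"] for e in entries if e["type"] == "file"]
    directories = [e["name"] for e in entries if e["type"] == "directory"]
    # Only files are summed; .get guards against a file entry that omits "size".
    total_bytes = sum(e.get("size", 0) for e in entries if e["type"] == "file")
    return {"files": files, "directories": directories, "total_bytes": total_bytes}
```

Note that on shared plans the root path `/` in a response would correspond to wp-content/ rather than the filesystem root, per the scoping rules above.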

Parameters
| Name | Required | Description | Default |
| --- | --- | --- | --- |
| path | No | | |
| slug | Yes | | |

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries full burden. It discloses behavioral traits: path scoping differences by plan type, authentication requirements (API key with read scope), and error conditions (NOT_FOUND for unknown slug or path). However, it doesn't mention rate limits, pagination, or sorting behavior, leaving some gaps.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Well-structured with clear sections: purpose, plan-specific scoping, requirements, args, returns, and errors. Each sentence earns its place—no fluff. Front-loaded with the core purpose, followed by essential details in logical order.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no annotations and no output schema, the description provides comprehensive context: purpose, usage guidelines, parameter semantics, authentication needs, plan-specific behavior, return format example, and error conditions. This is complete enough for a read operation tool with 2 parameters.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description must compensate. It fully explains both parameters: 'slug' as 'Site identifier' and 'path' as 'Relative path to list (empty for root of accessible area)'. This adds crucial meaning beyond the bare schema, clarifying usage and defaults.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'List' and the resources 'files and directories in a site's container'. It distinguishes the tool from siblings like read_file (reads content) and delete_file (deletes) by focusing on directory listing. The specific scope ('in a site's container') adds precision.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states when to use: 'List files and directories in a site's container'. Provides context on path scoping differences between shared vs. VPS/dedicated plans, and mentions prerequisites: 'Requires: API key with read scope.' This gives clear guidance on applicability and requirements.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_firewall_rules (A)

List IP allow/deny firewall rules for a site.

Rules are implemented as Nginx allow/deny directives per container.

Requires: API key with read scope.

Args: slug: Site identifier

Returns: {"rules": [{"ip": "1.2.3.4", "action": "deny"}, {"ip": "10.0.0.0/8", "action": "allow"}]}
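
Since the rules map to Nginx allow/deny directives, which Nginx checks in order until the first match, a client could predict the outcome for an address roughly as follows. This is a sketch under two assumptions that the description does not state: that the returned list preserves evaluation order, and that an address matching no rule is allowed. The function name `evaluate` is hypothetical.

```python
import ipaddress

# Rules copied from the documented example.
sample_rules = [
    {"ip": "1.2.3.4", "action": "deny"},
    {"ip": "10.0.0.0/8", "action": "allow"},
]


def evaluate(ip: str, rules: list, default: str = "allow") -> str:
    """Return the action of the first rule that matches `ip`, else `default`."""
    addr = ipaddress.ip_address(ip)
    for rule in rules:
        # A bare address such as "1.2.3.4" parses as a single-host network.
        network = ipaddress.ip_network(rule["ip"], strict=False)
        if addr.version == network.version and addr in network:
            return rule["action"]
    return default
```

Under these assumptions, 1.2.3.4 is denied, anything in 10.0.0.0/8 is allowed by the second rule, and other addresses fall through to the default.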

Parameters
| Name | Required | Description | Default |
| --- | --- | --- | --- |
| slug | Yes | | |

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It effectively adds context beyond basic functionality by specifying authentication requirements ('API key with read scope') and implementation details ('Nginx allow/deny directives per container'). However, it lacks information on rate limits, error handling, or pagination, which could be relevant for a list operation.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is appropriately sized and front-loaded, starting with the core purpose, followed by implementation details, prerequisites, and input/output examples. Each sentence adds value without redundancy, and the structure is logical and easy to parse, making it highly efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (a read operation with authentication), no annotations, and no output schema, the description is largely complete. It covers purpose, implementation, prerequisites, parameters, and return format with an example. However, it could improve by mentioning potential limitations like pagination or error cases, slightly reducing completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description adds significant meaning beyond the input schema, which has 0% description coverage. It explains that 'slug' is a 'Site identifier', clarifying the parameter's purpose and usage. This compensates fully for the schema's lack of documentation, making the parameter semantics clear and actionable.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('List IP allow/deny firewall rules') and resource ('for a site'), distinguishing it from sibling tools like 'add_firewall_rule' and 'remove_firewall_rule'. It provides precise scope by mentioning the implementation details ('Nginx allow/deny directives per container'), making the purpose unambiguous.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage context by specifying 'Requires: API key with read scope', which indicates prerequisites. However, it does not explicitly state when to use this tool versus alternatives like 'add_firewall_rule' or 'remove_firewall_rule', nor does it provide exclusions or comparative guidance, leaving some ambiguity.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_ftp_accounts (A)

List FTP accounts on a site.

Requires: API key with read scope.

Args: slug: Site identifier

Returns: {"accounts": [{"username": "ftpuser", "home": "/var/www/html", "uid": 1001}]}

Parameters
| Name | Required | Description | Default |
| --- | --- | --- | --- |
| slug | Yes | | |

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It discloses authentication requirements ('API key with read scope') and the return format, which adds value. However, it does not mention potential behavioral traits like rate limits, pagination, or error conditions, leaving gaps for a read operation.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with clear sections (purpose, requirements, args, returns) and uses bullet-like formatting. It is front-loaded with the core purpose and avoids unnecessary verbosity, though the 'Args' and 'Returns' labels could be more integrated into the flow.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's low complexity (one parameter, no output schema, no annotations), the description is reasonably complete. It covers purpose, prerequisites, parameter semantics, and return format. However, it lacks details on error handling or example usage, which could enhance completeness for an agent.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage and only one parameter, the description compensates by explaining 'slug' as 'Site identifier' in the Args section. This adds meaning beyond the schema's minimal title 'Slug', making the parameter's purpose clear and adequate for the single input.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'List FTP accounts on a site.' It specifies the verb ('List') and resource ('FTP accounts'), but does not differentiate from sibling tools like 'remove_ftp_account' or 'create_ftp_account' beyond the action itself, which is implied but not explicitly stated.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description includes a 'Requires' clause indicating API key prerequisites, which provides some context for when to use it. However, it lacks explicit guidance on when to choose this tool over alternatives like 'list_domains' or 'list_files', or any exclusions for when not to use it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_modules (A)

List AI modules and their enabled/disabled state for a site.

Also returns the list of modules available for the site's plan.

Requires: API key with read scope.

Args: slug: Site identifier

Returns: {"modules": {"chatbot": true, "seo": false, "translation": false, "content": false}, "available": ["chatbot", "seo", "translation", "content"]}

Errors: NOT_FOUND: Unknown slug
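
The two parts of this response can be combined to see which modules are on and which could still be switched on. The sketch below uses the documented example payload; the helper names are hypothetical, and treating a module missing from `modules` as disabled is an assumption.

```python
# Response copied from the documented example.
sample_modules = {
    "modules": {"chatbot": True, "seo": False, "translation": False, "content": False},
    "available": ["chatbot", "seo", "translation", "content"],
}


def enabled_modules(response: dict) -> list:
    """Names of modules currently enabled for the site, sorted alphabetically."""
    return sorted(name for name, on in response["modules"].items() if on)


def can_enable(response: dict) -> list:
    """Modules the plan offers that are not currently enabled, sorted alphabetically."""
    state = response["modules"]
    # A module absent from "modules" is treated as disabled (assumption).
    return sorted(name for name in response["available"] if not state.get(name, False))
```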

Parameters
| Name | Required | Description | Default |
| --- | --- | --- | --- |
| slug | Yes | | |

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It effectively describes key behaviors: it's a read operation (implied by 'List'), requires specific authentication ('API key with read scope'), and handles errors ('NOT_FOUND: Unknown slug'). It also specifies the return structure with examples. However, it doesn't mention rate limits, pagination, or other operational constraints, leaving some gaps.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is appropriately sized and front-loaded: the first sentence states the core purpose, followed by additional details in a structured format (Args, Returns, Errors). Every sentence adds value, with no wasted words, making it easy to scan and understand quickly.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's low complexity (1 parameter, no output schema, no annotations), the description is complete. It covers purpose, authentication, parameters, return values with examples, and error handling. This provides all necessary context for an AI agent to use the tool effectively without needing additional structured data.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 0% description coverage, so the description must compensate. It adds meaning by explaining that 'slug' is a 'Site identifier', which clarifies the parameter's purpose beyond the schema's generic 'Slug' title. Since there's only one parameter, this is sufficient to achieve a high score, though it doesn't provide format examples or constraints.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'List AI modules and their enabled/disabled state for a site. Also returns the list of modules available for the site's plan.' It specifies both the verb ('List') and resources ('AI modules', 'enabled/disabled state', 'modules available for the site's plan'), distinguishing it from sibling tools like 'toggle_module' which modifies modules rather than listing them.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for when to use this tool: to retrieve module information for a specific site. It mentions a prerequisite ('Requires: API key with read scope') but does not explicitly state when not to use it or name alternatives among siblings (e.g., 'toggle_module' for modifying modules). This gives good guidance but lacks explicit exclusions or comparisons.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_php_versions (Grade: A)

List available PHP versions and the currently active one.

Requires: API key with read scope.

Args: slug: Site identifier

Returns: {"versions": [{"version": "8.1", "active": false}, {"version": "8.2", "active": false}, {"version": "8.3", "active": true}]}
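A minimal sketch of reading this response, assuming the payload has exactly the shape shown in the Returns example. The helper name `active_php_version` is hypothetical; it returns the version flagged as active, or `None` when no entry is flagged.

```python
from typing import Optional


def active_php_version(payload: dict) -> Optional[str]:
    """Return the version flagged active, or None if no entry is flagged."""
    for entry in payload.get("versions", []):
        if entry.get("active"):
            return entry.get("version")
    return None


# Sample payload copied from the Returns example above.
sample = {
    "versions": [
        {"version": "8.1", "active": False},
        {"version": "8.2", "active": False},
        {"version": "8.3", "active": True},
    ]
}
current = active_php_version(sample)
print(current)
```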

Parameters

Name | Required | Description | Default
slug | Yes | - | -

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden. It discloses authentication requirements ('API key with read scope') and hints at read-only behavior by using 'List,' but lacks details on rate limits, error handling, or whether it's a safe operation. It adds some context but misses key behavioral traits for a tool with no annotation support.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is front-loaded with the core purpose, followed by prerequisites, parameters, and a clear return example. Every sentence adds value without redundancy, making it efficient and well-structured for quick understanding.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's low complexity (one parameter, no output schema, no annotations), the description is mostly complete. It covers purpose, prerequisites, parameters, and return format with an example. However, it lacks details on error cases or behavioral limits, which would enhance completeness for a tool with no annotation support.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description must compensate. It explains the 'slug' parameter as 'Site identifier,' adding meaning beyond the schema's generic 'Slug' title. However, it does not provide examples or constraints for the slug format, leaving some ambiguity. With only one parameter, this is sufficient for a high score.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('List available PHP versions and the currently active one') and identifies the resource (PHP versions for a site). It distinguishes from sibling tools like 'switch_php' which changes the active version, making the purpose unambiguous and well-differentiated.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context by specifying 'Requires: API key with read scope,' which indicates prerequisites. However, it does not explicitly mention when to use this tool versus alternatives like 'get_site_status' or 'get_stack_info,' which might also provide PHP version information, leaving some ambiguity about tool selection.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_plans (Grade: A)

List available hosting plans with pricing and resources.

No authentication needed.

Args:
  track: Filter by plan track. Valid values: "single_site", "agency". Leave empty to list all tracks.
  include_deprecated: Include deprecated plans (default: false)

Returns: [{"slug": "site_starter", "name": "Starter", "track": "single_site", "hosting_type": "shared", "price": {"monthly": 5, "annual": 2, "currency": "CAD"}, "resources": null, "features": {"max_sites": 1, "ai_modules": [...], "ai_agents": [], "free_domain_annual": false}}, ...]
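The sketch below shows one way a client might assemble an MCP `tools/call` request for this tool while checking `track` against the two documented values. The helper name `build_list_plans_request` is hypothetical. Omitting `track` from the arguments when no filter is wanted is an assumption about how "leave empty" is interpreted by the server.

```python
import json
from typing import Optional

# Valid values as listed in the tool description.
VALID_TRACKS = {"single_site", "agency"}


def build_list_plans_request(
    track: Optional[str] = None,
    include_deprecated: bool = False,
    request_id: int = 1,
) -> dict:
    """Build a JSON-RPC 2.0 tools/call request for list_plans."""
    if track is not None and track not in VALID_TRACKS:
        raise ValueError(f"track must be one of {sorted(VALID_TRACKS)}")
    arguments = {"include_deprecated": include_deprecated}
    if track:
        # Assumption: the key is simply omitted to list all tracks.
        arguments["track"] = track
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "list_plans", "arguments": arguments},
    }


request = build_list_plans_request(track="agency")
print(json.dumps(request, indent=2))
```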

Parameters

Name | Required | Description | Default
track | No | - | -
include_deprecated | No | - | -

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It explicitly states 'No authentication needed' which is valuable behavioral information. It also shows the return format with detailed examples, though it doesn't mention rate limits, pagination, or error conditions. The description adds significant value beyond what's in the schema.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with clear sections (purpose, authentication note, args, returns) and every sentence earns its place. It's appropriately sized for a tool with 2 parameters and detailed return values, with no wasted words or redundant information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's moderate complexity (2 parameters, no output schema, no annotations), the description is quite complete. It covers authentication requirements, parameter usage, and provides a detailed return example. The main gap is lack of guidance on when to use this versus other listing tools, but otherwise it provides good context for agent usage.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage, the description fully compensates by providing detailed parameter semantics. It explains both parameters clearly: 'track' with valid values and behavior when empty, and 'include_deprecated' with its default value. The description adds substantial meaning beyond the bare schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with specific verb ('List') and resource ('available hosting plans'), and distinguishes it from siblings by specifying it returns pricing and resource information. It's not a tautology of the name and provides concrete details about what information is returned.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. While it mentions filtering parameters, it doesn't explain when this tool is appropriate compared to other listing tools like list_apps, list_domains, or list_subscriptions. There's no context about prerequisites or typical use cases.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_plugins (Grade: A)

List installed WordPress plugins with status.

Requires: API key with read scope. WordPress sites only.

Args: slug: Site identifier

Returns: {"plugins": [{"name": "akismet", "status": "active", "version": "5.3", "update_available": false}, ...]}
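One plausible use of this response is to find plugins with pending updates. The sketch below assumes the payload shape from the Returns example. The helper name `plugins_needing_update` and the second sample entry (`example-plugin`) are hypothetical and exist only to exercise the filter.

```python
def plugins_needing_update(payload: dict) -> list:
    """Return the names of plugins whose update_available flag is true."""
    return [
        plugin["name"]
        for plugin in payload.get("plugins", [])
        if plugin.get("update_available")
    ]


sample = {
    "plugins": [
        # First entry copied from the Returns example above.
        {"name": "akismet", "status": "active", "version": "5.3", "update_available": False},
        # Hypothetical entry, not taken from the tool description.
        {"name": "example-plugin", "status": "inactive", "version": "1.0", "update_available": True},
    ]
}
outdated = plugins_needing_update(sample)
print(outdated)
```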

Parameters

Name | Required | Description | Default
slug | Yes | - | -

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Since no annotations are provided, the description carries the full burden of behavioral disclosure. It adds valuable context beyond the basic purpose: it specifies prerequisites ('API key with read scope'), platform constraints ('WordPress sites only'), and includes a detailed example of the return format. This covers key behavioral aspects like authentication needs and output structure, though it could mention rate limits or pagination.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured and front-loaded, with the core purpose in the first sentence, followed by prerequisites and a clear example of the return value. Every sentence adds value without redundancy, making it efficient and easy to parse.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's low complexity (one parameter, no output schema, no annotations), the description is largely complete. It covers purpose, prerequisites, parameter semantics, and return format. The only minor gap is that it does not mention potential errors or edge cases, but overall it provides sufficient context for effective use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 0% description coverage, with one parameter 'slug' undocumented. The description compensates by adding meaning: it clarifies that 'slug' is a 'Site identifier,' which provides essential context not in the schema. However, it does not specify the format or constraints of the slug, leaving some ambiguity.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'List installed WordPress plugins with status.' It specifies the verb ('List'), resource ('installed WordPress plugins'), and scope ('with status'), distinguishing it from siblings like list_themes or list_modules. This is specific and unambiguous.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for when to use this tool: 'WordPress sites only' and 'Requires: API key with read scope.' However, it does not explicitly differentiate from potential alternatives (e.g., manage_plugin for plugin management) or state when not to use it, so it falls short of a perfect score.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_snapshots (Grade: A)

List all snapshots and scheduled snapshots for a site.

Requires: API key with read scope.

Args: slug: Site identifier

Returns: {"snapshots": [{"id": "uuid", "name": "snap-...", "status": "completed", "storage_type": "local"|"b2", "size_bytes": 1234, "size_display": "1.2 Mo", "created_at": "iso8601"}], "scheduled": [...]}
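As a hedged example of working with this response, the sketch below totals `size_bytes` for completed snapshots, grouped by `storage_type`. The helper name `completed_bytes_by_storage` is hypothetical, the sample entries are invented for illustration, and the `"pending"` status is an assumption (the description only shows `"completed"`).

```python
from collections import defaultdict


def completed_bytes_by_storage(payload: dict) -> dict:
    """Sum size_bytes of completed snapshots per storage_type."""
    totals = defaultdict(int)
    for snap in payload.get("snapshots", []):
        if snap.get("status") == "completed":
            totals[snap.get("storage_type", "unknown")] += snap.get("size_bytes", 0)
    return dict(totals)


# Invented sample data following the field names in the Returns example.
sample = {
    "snapshots": [
        {"id": "a", "name": "snap-1", "status": "completed", "storage_type": "local", "size_bytes": 1234},
        {"id": "b", "name": "snap-2", "status": "completed", "storage_type": "b2", "size_bytes": 4000},
        # "pending" is an assumed status value, used only to show the filter.
        {"id": "c", "name": "snap-3", "status": "pending", "storage_type": "local", "size_bytes": 0},
    ],
    "scheduled": [],
}
usage = completed_bytes_by_storage(sample)
print(usage)
```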

Parameters

Name | Required | Description | Default
slug | Yes | - | -

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It adds value by specifying the authentication requirement ('API key with read scope') and detailing the return structure, which helps the agent understand the output format. However, it lacks information on potential side effects, rate limits, error handling, or pagination behavior, leaving gaps in behavioral context for a tool that lists data.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is appropriately sized and front-loaded, starting with the tool's purpose, followed by requirements, arguments, and returns in a structured format. Every sentence adds value, such as the authentication note and return details, with no wasted words. It could be slightly more concise by integrating the 'Args' and 'Returns' sections more seamlessly, but overall it is efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's low complexity (1 parameter, no output schema, no annotations), the description is fairly complete. It covers the purpose, authentication requirement, parameter semantics, and return structure, which is sufficient for an agent to use the tool correctly. However, it lacks details on behavioral aspects like error cases or limitations, which slightly reduces completeness for a listing operation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 1 parameter with 0% description coverage, so the description must compensate. It adds meaning by explaining 'slug' as 'Site identifier,' which clarifies the parameter's purpose beyond the schema's minimal title 'Slug.' Since there is only one parameter and the description provides this semantic detail, it effectively compensates for the low schema coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'List all snapshots and scheduled snapshots for a site.' It specifies the verb ('List') and resource ('snapshots and scheduled snapshots'), and distinguishes it from sibling tools like 'create_snapshot' or 'delete_snapshot'. However, it does not explicitly differentiate from similar listing tools like 'list_backups' or 'list_files', which slightly limits its clarity.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides some usage context by stating 'Requires: API key with read scope,' which implies this tool is for read-only operations and requires authentication. However, it does not explicitly say when to use this tool versus alternatives (e.g., 'list_backups' or 'get_snapshot_usage'), nor does it mention any exclusions or prerequisites beyond the API key. The guidance is implied but not comprehensive.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_subscriptions (Grade: A)

List all subscriptions with plan details, pricing, status, and site slug.

Requires: API key with read scope.

Returns: [{"id": "uuid", "plan_slug": "site_starter", "plan_name": "Starter", "status": "active", "billing_period": "monthly", "price": {"amount": 500, "currency": "cad"}, "site_slug": "my-site", "created_at": "iso8601"}]
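The sketch below shows one way to turn this list into a lookup from site slug to plan slug for active subscriptions. The helper name `active_plan_by_site` is hypothetical. The second sample record, including its `"canceled"` status and `"site_pro"` plan slug, is invented for illustration and is not documented by the tool.

```python
def active_plan_by_site(subscriptions: list) -> dict:
    """Map site_slug to plan_slug for subscriptions whose status is active."""
    return {
        sub["site_slug"]: sub["plan_slug"]
        for sub in subscriptions
        if sub.get("status") == "active" and sub.get("site_slug")
    }


sample = [
    # First record follows the Returns example above.
    {"id": "uuid", "plan_slug": "site_starter", "plan_name": "Starter", "status": "active",
     "billing_period": "monthly", "price": {"amount": 500, "currency": "cad"},
     "site_slug": "my-site", "created_at": "iso8601"},
    # Hypothetical record; the status and plan slug are assumptions.
    {"id": "uuid-2", "plan_slug": "site_pro", "plan_name": "Pro", "status": "canceled",
     "billing_period": "annual", "price": {"amount": 1000, "currency": "cad"},
     "site_slug": "old-site", "created_at": "iso8601"},
]
plans = active_plan_by_site(sample)
print(plans)
```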

Parameters

No parameters

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Since no annotations are provided, the description carries the full burden. It discloses important behavioral traits: the requirement of an API key with read scope (addressing auth needs) and the return format with example data. It does not mention rate limits or pagination, but it provides sufficient context for a read operation.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is front-loaded with the main purpose, followed by prerequisites and return format in a structured manner. Every sentence adds value without redundancy, making it efficient and easy to parse.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (simple read operation with no parameters) and lack of annotations and output schema, the description is mostly complete. It covers purpose, prerequisites, and return format. However, it could mention if there are limitations like pagination or sorting, but it's adequate for the context.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 0 parameters with 100% coverage, so no parameter documentation is needed. The description does not add parameter details, which is appropriate, but it could have noted the lack of parameters explicitly. Baseline is 4 for 0 parameters, as it doesn't need to compensate for gaps.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with a specific verb ('List') and resource ('all subscriptions'), and it distinguishes itself from siblings by specifying it returns plan details, pricing, status, and site slug, unlike other list tools such as list_apps or list_domains.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit prerequisites ('Requires: API key with read scope'), which gives some context for when to use it. However, it does not differentiate when to use this tool versus alternatives like list_plans or other list tools, leaving usage implied rather than explicitly guided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_tables (Grade: A)

List tables in a database.

Requires: API key with read scope.

Args:
  slug: Site identifier
  database: Database name

Returns: {"tables": [{"name": "wp_posts", "rows": 1234, "size_mb": 5.2}, ...]}
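A small sketch of ranking tables by size from this response, assuming the `size_mb` field shown in the Returns example. The helper name `largest_tables` is hypothetical, and the `wp_options` and `wp_postmeta` rows are invented sample data.

```python
def largest_tables(payload: dict, limit: int = 3) -> list:
    """Return up to `limit` table names, largest size_mb first."""
    tables = sorted(
        payload.get("tables", []),
        key=lambda table: table.get("size_mb", 0),
        reverse=True,
    )
    return [table["name"] for table in tables[:limit]]


sample = {
    "tables": [
        # First row copied from the Returns example above.
        {"name": "wp_posts", "rows": 1234, "size_mb": 5.2},
        # Invented rows, added only to make the ordering visible.
        {"name": "wp_options", "rows": 200, "size_mb": 0.4},
        {"name": "wp_postmeta", "rows": 5000, "size_mb": 12.1},
    ]
}
top_two = largest_tables(sample, limit=2)
print(top_two)
```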

Parameters

Name | Required | Description | Default
slug | Yes | - | -
database | Yes | - | -

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It discloses the requirement for an API key with read scope, which adds useful context about authentication needs. However, it lacks details on behavioral traits such as rate limits, pagination, error handling, or what happens if the database doesn't exist. The description adds some value but is incomplete for a tool with no annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured and front-loaded with the core purpose, followed by requirements, arguments, and return format in a clear, bullet-like style. Each sentence earns its place by providing essential information without redundancy, making it efficient and easy to parse.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's moderate complexity (2 parameters, no output schema, no annotations), the description is fairly complete: it states the purpose, requirements, parameters, and return format. However, it lacks details on error cases, limitations, or behavioral aspects like pagination, which could be important for a list operation. With no output schema, the return example is helpful but not exhaustive.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description explicitly lists and describes the two parameters ('slug' as 'Site identifier' and 'database' as 'Database name'), adding meaning beyond the input schema, which has 0% description coverage. This compensates well for the schema's lack of details, though it doesn't cover all potential nuances (e.g., format constraints or examples). With 2 parameters and good compensation, a score of 4 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'List tables in a database.' It specifies the verb ('List') and resource ('tables in a database'), making the action explicit. However, it does not differentiate from sibling tools like 'list_databases' or 'list_files' beyond the resource type, which is why it doesn't earn a 5.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description includes 'Requires: API key with read scope,' which implies a prerequisite context for usage. However, it does not explicitly state when to use this tool versus alternatives (e.g., 'list_databases' for databases or 'execute_query' for querying tables), nor does it provide exclusions or comparisons. This makes the guidance implied rather than explicit.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_themes (Grade: A)

List installed WordPress themes with status.

Requires: API key with read scope. WordPress sites only.

Args: slug: Site identifier

Returns: {"themes": [{"name": "twentytwentyfour", "status": "active", "version": "1.0", "update_available": false}, ...]}

Parameters

Name | Required | Description | Default
slug | Yes | - | -

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It successfully communicates that this is a read operation (implied by 'List'), specifies authentication requirements ('API key with read scope'), and indicates platform constraints ('WordPress sites only'). It also shows the return format, though not in a formal output schema. It doesn't mention rate limits or pagination behavior.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is efficiently structured with clear sections: purpose statement, requirements, arguments, and return format. Every sentence earns its place, and information is front-loaded with the core purpose stated first.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a read operation with no annotations and no output schema, the description provides good coverage: purpose, requirements, parameter explanation, and example return format. It could potentially include more about error cases or pagination, but given the tool's relative simplicity, it's reasonably complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage for the single parameter 'slug', the description compensates by providing the parameter name in the Args section and clarifying it as 'Site identifier'. This adds meaningful context beyond the bare schema, though it could be more detailed about the slug format or examples.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('List installed WordPress themes') and resource ('WordPress themes'), including the scope ('with status'). It distinguishes from sibling tools like 'list_plugins' or 'manage_theme' by focusing on listing themes with their status information.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context about when to use this tool ('WordPress sites only') and prerequisites ('Requires: API key with read scope'). However, it doesn't explicitly mention when NOT to use it or name specific alternatives among the sibling tools for similar operations.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

manage_dns (Grade: A)

Create or delete DNS records for a site.

Requires: API key with write scope.

Args:
  slug: Site identifier
  action: "create" or "delete"
  record_type: "A", "AAAA", "CNAME", "MX", "TXT", or "SRV"
  subdomain: Subdomain part (e.g. "www", "mail"). Leave empty for the apex/root domain.
  value: Record value. Required for "create". Examples: A: "1.2.3.4", CNAME: "example.com", MX: "mail.example.com", TXT: "v=spf1 include:_spf.google.com ~all"
  ttl: Time to live in seconds (default: 3600)

Returns: {"success": true, "record": {"type": "A", "subdomain": "www", "value": "1.2.3.4", "ttl": 3600}}

Errors:
  VALIDATION_ERROR: Missing value for create, invalid record type
  NOT_FOUND: Unknown slug
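The documented rules (two actions, six record types, `value` required for "create", default TTL of 3600) can be checked on the client before the call is sent. The sketch below is one possible pre-flight check, not the server's own validation. The helper name `build_manage_dns_arguments` is hypothetical, and raising `ValueError` locally is an assumption that mirrors the cases the server reports as VALIDATION_ERROR.

```python
from typing import Optional

# Allowed values as listed in the tool description.
VALID_ACTIONS = {"create", "delete"}
VALID_RECORD_TYPES = {"A", "AAAA", "CNAME", "MX", "TXT", "SRV"}


def build_manage_dns_arguments(
    slug: str,
    action: str,
    record_type: str,
    subdomain: str = "",
    value: Optional[str] = None,
    ttl: int = 3600,
) -> dict:
    """Validate inputs locally and return the arguments for manage_dns."""
    if action not in VALID_ACTIONS:
        raise ValueError(f"action must be one of {sorted(VALID_ACTIONS)}")
    if record_type not in VALID_RECORD_TYPES:
        raise ValueError(f"record_type must be one of {sorted(VALID_RECORD_TYPES)}")
    if action == "create" and not value:
        raise ValueError('value is required when action is "create"')
    arguments = {
        "slug": slug,
        "action": action,
        "record_type": record_type,
        "subdomain": subdomain,  # empty string targets the apex/root domain
        "ttl": ttl,
    }
    if value is not None:
        arguments["value"] = value
    return arguments


args = build_manage_dns_arguments("my-site", "create", "A", subdomain="www", value="1.2.3.4")
print(args)
```

Checking locally saves a round trip for mistakes the description already rules out; the server remains the authority on anything not listed there, such as value formats per record type.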

Parameters

Name | Required | Description | Default
ttl | No | - | -
slug | Yes | - | -
value | No | - | -
action | Yes | - | -
subdomain | No | - | -
record_type | Yes | - | -

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries full burden and does well: it discloses the write operation nature ('create or delete'), authentication requirements ('API key with write scope'), and includes error cases ('VALIDATION_ERROR', 'NOT_FOUND'). It also provides return value examples. It doesn't mention rate limits or idempotency behavior, but covers core operational aspects.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured, with clear sections (description, requirements, args, returns, errors). Every sentence earns its place, though the value examples are somewhat dense. It could be slightly more front-loaded with the core purpose, but it is efficient overall.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a mutation tool with 6 parameters, 0% schema coverage, no annotations, and no output schema, the description provides excellent completeness: it covers purpose, prerequisites, all parameters with examples, return format, and error cases. The agent has everything needed to correctly invoke this tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage, the description fully compensates by providing detailed parameter documentation: it explains all 6 parameters with clear semantics, examples, defaults, and conditional requirements ('Required for "create"'). The value parameter includes specific examples for different record types, adding significant value beyond the bare schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Create or delete DNS records for a site' - a specific verb (create/delete) + resource (DNS records) + scope (for a site). This distinguishes it from sibling tools like 'list_domains' or 'search_domain' which are read-only operations.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The 'Requires: API key with write scope' provides clear context about prerequisites. However, it doesn't explicitly state when to use this tool versus alternatives like 'domain_detail' or 'list_domains', nor does it provide exclusion guidance for when not to use it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

manage_plugin (A)

Install, activate, deactivate, or delete a WordPress plugin.

Requires: API key with write scope.

Args:
  slug: Site identifier
  action: "install", "activate", "deactivate", or "delete"
  plugin: Plugin slug (e.g. "akismet", "jetpack", "woocommerce")

Returns: {"action": "install", "plugin": "jetpack", "result": {...}}

Parameters
Name | Required | Description | Default
slug | Yes | - | -
action | Yes | - | -
plugin | Yes | - | -
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It mentions the required API key scope (write), which is valuable context for permissions. However, it doesn't describe other behavioral traits like whether actions are reversible, potential side effects (e.g., site downtime during plugin activation), rate limits, or error handling. The description provides basic operational context but lacks depth for a mutation tool.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is efficiently structured with purpose first, then requirements, parameters, and return format. Every sentence earns its place: the first states the core functionality, the second specifies prerequisites, and the remaining sections document inputs and outputs without redundancy. It's appropriately sized for a tool with three parameters.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (plugin management with multiple actions), no annotations, 0% schema coverage, and no output schema, the description does a good job covering essentials. It explains what the tool does, prerequisites, all parameters, and provides a return example. However, for a mutation tool with significant potential impact, it could better describe behavioral aspects like confirmation prompts, idempotency, or error scenarios.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description must compensate. It clearly explains all three parameters: 'slug' as site identifier, 'action' with its four possible values, and 'plugin' with examples. This adds significant meaning beyond the bare schema. The only minor gap is that 'slug' could benefit from more specific formatting examples, but overall parameter semantics are well-covered.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with specific verbs (install, activate, deactivate, delete) and resource (WordPress plugin). It distinguishes itself from sibling tools like 'list_plugins' or 'manage_theme' by focusing on plugin management operations rather than listing or theme management.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context with the 'Requires: API key with write scope' statement, indicating prerequisites for usage. However, it doesn't explicitly state when to use this tool versus alternatives (e.g., when to use 'manage_plugin' vs 'install_app' or other WordPress-specific tools), nor does it mention exclusions or specific scenarios where this tool should not be used.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

manage_theme (A)

Install, activate, or delete a WordPress theme.

Requires: API key with write scope.

Args:
  slug: Site identifier
  action: "install", "activate", or "delete"
  theme: Theme slug (e.g. "twentytwentyfour", "astra")

Returns: {"action": "install", "theme": "astra", "result": {...}}

Parameters
Name | Required | Description | Default
slug | Yes | - | -
theme | Yes | - | -
action | Yes | - | -
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden. It discloses the required API scope (write) and lists possible actions, but lacks details on side effects (e.g., whether deleting a theme is irreversible), error conditions, or rate limits. It adds some behavioral context but is incomplete for a mutation tool.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured and front-loaded with the core purpose, followed by requirements, args, and returns. Every sentence adds value with no wasted words, making it efficient and easy to parse.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a mutation tool with no annotations and no output schema, the description is moderately complete. It covers purpose, prerequisites, parameters, and a return example, but lacks details on error handling, side effects, or output structure, which are important for safe usage.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description must compensate. It explains all three parameters: 'slug' as 'Site identifier', 'action' with its enum values, and 'theme' with examples. This adds significant meaning beyond the bare schema, though it could clarify parameter formats further.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific actions (install, activate, delete) and the resource (WordPress theme). It distinguishes itself from sibling tools like 'list_themes' by focusing on management operations rather than listing.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly states the prerequisite 'Requires: API key with write scope,' which provides clear context for when to use this tool. However, it does not mention when not to use it or name specific alternatives among siblings.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

optimize_database (A)

Optimize WordPress database tables (reduces bloat).

Requires: API key with write scope.

Args: slug: Site identifier

Returns: {"optimized": true, "tables_optimized": 12}

Parameters
Name | Required | Description | Default
slug | Yes | - | -
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden. It discloses that the tool 'optimizes' (implying a write/mutation operation) and requires 'API key with write scope' (indicating authentication needs). It also hints at behavioral outcomes ('reduces bloat') and provides a return example. However, it lacks details on rate limits, side effects, or error handling.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured and front-loaded with the core purpose, followed by prerequisites, arguments, and returns. Each section is concise and adds value without redundancy. The use of clear headings ('Requires:', 'Args:', 'Returns:') enhances readability.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (a mutation operation with no annotations and no output schema), the description is reasonably complete. It covers purpose, prerequisites, parameter semantics, and return values. However, it could benefit from more behavioral context (e.g., idempotency, side effects) to fully compensate for the lack of structured data.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description must compensate. It adds meaning by explaining that 'slug' is a 'Site identifier', which clarifies the parameter beyond the schema's generic 'Slug' title. Since there is only one parameter, this is sufficient to achieve a high score, though it doesn't cover format or constraints.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('optimize WordPress database tables') and resource ('database tables'), with the parenthetical '(reduces bloat)' providing additional context about the effect. It distinguishes itself from siblings like 'database_search_replace' or 'list_tables' by focusing on performance optimization rather than data manipulation or listing.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The 'Requires: API key with write scope' provides clear context about prerequisites, but it does not explicitly state when to use this tool versus alternatives (e.g., compared to 'cache_flush' or other performance-related tools). The description implies usage for database maintenance but lacks explicit when-not or alternative guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

read_file (A)

Read the contents of a file from a site's container.

Max file size: 512KB. Binary files are rejected — use the site's file manager or SSH for binary files.

Requires: API key with read scope.

Args:
  slug: Site identifier
  path: Relative path to the file

Returns: {"path": "wp-config.php", "content": "<?php ...", "size": 1234, "encoding": "utf-8"}

Errors:
  NOT_FOUND: File doesn't exist
  VALIDATION_ERROR: File is binary or exceeds 512KB
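The two VALIDATION_ERROR conditions can be approximated locally when an agent already holds the bytes of a file (for example from a backup) and wants to predict whether read_file would return it. This is a rough sketch under stated assumptions: "512KB" is taken to mean 512 * 1024 bytes, and "binary" is approximated as "contains a NUL byte or is not valid UTF-8". The server's actual detection rule is not documented here, and looks_readable is a hypothetical helper name.

```python
MAX_BYTES = 512 * 1024  # assumption: "512KB" means 512 * 1024 bytes


def looks_readable(data: bytes) -> bool:
    """Return True if data fits the size limit and decodes as UTF-8 text."""
    if len(data) > MAX_BYTES:
        return False
    if b"\x00" in data:  # NUL bytes are a common sign of binary content
        return False
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


text_ok = looks_readable(b"<?php echo 'hello';")
png_ok = looks_readable(b"\x89PNG\r\n\x1a\n\x00\x00")
```

Files that fail this check are candidates for the file manager or SSH route that the description recommends for binary content.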

Parameters
Name | Required | Description | Default
path | Yes | - | -
slug | Yes | - | -
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It successfully describes key behavioral traits: file size limits, binary file rejection, authentication requirements, and error conditions. However, it doesn't mention rate limits, pagination, or caching behavior, which keeps it from a perfect score.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is efficiently structured with clear sections (purpose, constraints, requirements, args, returns, errors). Every sentence earns its place by providing essential information without redundancy. The most critical information (what the tool does) appears first.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a read operation with no annotations and no output schema, the description provides comprehensive context: purpose, constraints, requirements, parameter semantics, return format example, and error conditions. It gives the agent everything needed to correctly invoke this tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage, the description fully compensates by providing clear parameter explanations in the 'Args' section. It defines 'slug' as 'Site identifier' and 'path' as 'Relative path to the file', adding essential semantic meaning beyond the bare schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('Read the contents of a file') and resource ('from a site's container'), distinguishing it from sibling tools like list_files (which lists files) or write_file (which writes files). It provides a complete, unambiguous purpose statement.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly states when NOT to use this tool ('Binary files are rejected — use the site's file manager or SSH for binary files') and provides clear constraints ('Max file size: 512KB'). It also mentions prerequisites ('Requires: API key with read scope'), giving comprehensive usage guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

register (A)

Register a new agent account and get an API key.

No authentication needed. The returned API key grants read+write access to all BorealHost API endpoints. Store it securely — it cannot be retrieved again.

The key is automatically activated for this session — all subsequent tool calls will use it. No extra configuration needed.

If no email is provided, a synthetic agent identity is created (agent-{uuid}@api.borealhost.ai). If an email is provided, it links to an existing or new human account.

Args:
  name: Human-readable name for this API key (default: "Agent Key")
  email: Optional email to link to a human account

Returns: {"api_key": "bh_...", "key_id": "uuid", "prefix": "bh_...", "scopes": ["read", "write"], "account_id": "uuid", "message": "Store this API key securely..."}

Errors:
  RATE_LIMITED: Max 5 registrations per IP per hour
  VALIDATION_ERROR: Invalid email format
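Because the key cannot be retrieved again, a client will usually want to persist it the moment the response arrives. The sketch below shows one possible approach, assuming a POSIX system and a response dict in the documented shape. The helper name save_api_key, the file name, and the placeholder values are all hypothetical.

```python
import json
import os
import tempfile


def save_api_key(response: dict, path: str) -> None:
    """Write the one-time API key to a file only the current user can read."""
    record = {"api_key": response["api_key"], "key_id": response["key_id"]}
    # O_EXCL refuses to overwrite an existing file; 0o600 limits access
    # to the owner on POSIX systems.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(record, fh)


# Placeholder response in the documented shape; the values are made up.
response = {
    "api_key": "bh_example",
    "key_id": "00000000-0000-0000-0000-000000000000",
    "prefix": "bh_",
    "scopes": ["read", "write"],
}
key_path = os.path.join(tempfile.mkdtemp(), "borealhost_key.json")
save_api_key(response, key_path)
```

A secrets manager or OS keychain would be a stronger choice where one is available; the file-based version is only the smallest self-contained illustration.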

Parameters
Name | Required | Description | Default
name | No | - | Agent Key
email | No | - | -
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden and excels. It discloses critical behavioral traits: authentication requirements ('No authentication needed'), security implications ('Store it securely — it cannot be retrieved again'), session behavior ('automatically activated for this session'), rate limits ('Max 5 registrations per IP per hour'), and account linking logic. This goes well beyond basic functionality.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured and front-loaded: the first sentence states the core purpose, followed by key behavioral details, then parameter and return explanations. Every sentence adds value—no fluff or repetition. The use of sections (Args, Returns, Errors) enhances readability without verbosity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (account creation with security implications), no annotations, and no output schema, the description is remarkably complete. It covers purpose, usage, behavior, parameters, return values (including full JSON structure), and error conditions. This provides all necessary context for an agent to invoke it correctly and handle outcomes.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description must compensate fully. It does: the 'Args' section explains both parameters with meaningful context—'name' as a human-readable label with default, and 'email' as optional for linking to human accounts, including behavior when omitted. This adds substantial semantics beyond the bare schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Register a new agent account and get an API key.' It specifies the verb ('register'), resource ('agent account'), and outcome ('get an API key'), distinguishing it from siblings like 'request_api_key' or 'claim_api_key' which imply different workflows.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for usage: 'No authentication needed' and 'If no email is provided... If an email is provided...' It implicitly distinguishes from tools like 'set_api_key' or 'rotate_key' by focusing on initial registration. However, it lacks explicit when-not-to-use guidance or named alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

register_domain (A)

Register a new domain with WHOIS contact info and Stripe billing.

The domain cost is charged to the user's active subscription. Free domain if plan includes free_domain_annual + annual billing + first domain.

Requires: API key with write scope.

Args:
  domain: Full domain name (e.g. "example.ca", "mybusiness.com")
  first_name: Registrant first name
  last_name: Registrant last name
  email: Registrant email address
  phone: Phone number in E.164 format: "+1.5145551234"
  address1: Street address (e.g. "123 Rue Principale")
  city: City (e.g. "Montreal")
  state: Province/state code (e.g. "QC", "ON", "BC")
  postal_code: Postal/ZIP code (e.g. "H2X 1Y4")
  country: ISO 3166-1 alpha-2 country code (default: "CA")
  period: Registration period in years (1–10, default: 1)
  ca_legal_type: Required for .ca domains. CIRA legal types: "CCO" (Canadian citizen), "RES" (permanent resident), "CCT" (corporation), "GOV" (government), "EDU" (education), "ASS" (association), "HOP" (hospital), "PRT" (partnership), "TDM" (trademark), "TRD" (trade union), "PLT" (political party), "LAM" (library/archive/museum), "MAJ" (Her Majesty), "INB" (Indian band), "ABO" (Aboriginal peoples), "LGR" (legal representative)

Returns: {"domain": "example.ca", "status": "registered", "expires_at": "iso8601", "message": "Domain registered successfully"}

Errors:
  VALIDATION_ERROR: Missing required fields, invalid phone format, missing ca_legal_type for .ca domains
  NOT_FOUND: Domain not available (already registered by someone else)
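Since a successful call is billed, it can be worth checking the documented VALIDATION_ERROR conditions before invoking the tool. The sketch below is an illustration only. The helper name check_registration and the sample contact are hypothetical, and the phone pattern is an assumption inferred from the single example "+1.5145551234" (a plus sign, country code, dot, then digits, which resembles the dotted form used for registry contacts rather than plain E.164 digits). The server's real validation may differ.

```python
import re

# Fields marked as required in the parameter list above.
REQUIRED = ["domain", "first_name", "last_name", "email", "phone",
            "address1", "city", "state", "postal_code"]
CA_LEGAL_TYPES = {"CCO", "RES", "CCT", "GOV", "EDU", "ASS", "HOP", "PRT",
                  "TDM", "TRD", "PLT", "LAM", "MAJ", "INB", "ABO", "LGR"}
# Assumed shape, inferred from the example "+1.5145551234".
PHONE_RE = re.compile(r"^\+\d{1,3}\.\d{4,14}$")


def check_registration(args: dict) -> list:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = [f"missing {name}" for name in REQUIRED if not args.get(name)]
    phone = args.get("phone", "")
    if phone and not PHONE_RE.match(phone):
        problems.append("phone must look like +1.5145551234")
    if not 1 <= args.get("period", 1) <= 10:
        problems.append("period must be between 1 and 10 years")
    if args.get("domain", "").lower().endswith(".ca"):
        legal = args.get("ca_legal_type")
        if not legal:
            problems.append("ca_legal_type is required for .ca domains")
        elif legal not in CA_LEGAL_TYPES:
            problems.append(f"unknown ca_legal_type: {legal}")
    return problems


# Made-up registrant, reusing the sample values from the description.
example = {
    "domain": "example.ca", "first_name": "Alex", "last_name": "Tremblay",
    "email": "alex@example.com", "phone": "+1.5145551234",
    "address1": "123 Rue Principale", "city": "Montreal", "state": "QC",
    "postal_code": "H2X 1Y4",
}
problems = check_registration(example)
```

Here the only reported problem is the missing ca_legal_type, because the domain ends in .ca. Availability of the domain still has to be confirmed by the server, which reports NOT_FOUND when the name is already taken.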

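Parameters
Name | Required | Description | Default
city | Yes | - | -
email | Yes | - | -
phone | Yes | - | -
state | Yes | - | -
domain | Yes | - | -
period | No | - | -
country | No | - | CA
address1 | Yes | - | -
last_name | Yes | - | -
first_name | Yes | - | -
postal_code | Yes | - | -
ca_legal_type | No | - | -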
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden and does well by disclosing key behavioral traits: it's a mutation tool (implied by 'Register'), involves billing ('charged to the user's active subscription'), has authentication requirements ('API key with write scope'), and outlines error conditions like 'VALIDATION_ERROR' and 'NOT_FOUND'. It could improve by mentioning rate limits or idempotency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is appropriately sized and front-loaded with the core purpose and key guidelines. It uses clear sections (Args, Returns, Errors) for structure, though some sentences in the parameter details could be slightly more concise. Overall, it avoids waste and is well-organized.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (12 parameters, mutation tool, billing integration) and lack of annotations or output schema, the description is highly complete. It covers purpose, usage, parameters, return values, and errors comprehensively, providing all necessary context for an AI agent to invoke the tool correctly without relying on structured fields.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description must compensate fully. It adds extensive meaning beyond the schema by explaining each parameter's purpose, format (e.g., 'Phone number in E.164 format'), defaults ('default: "CA"'), constraints ('1–10, default: 1'), and special cases ('Required for .ca domains' with detailed legal types). This effectively documents all 12 parameters.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('Register a new domain'), the resources involved ('with WHOIS contact info and Stripe billing'), and distinguishes it from siblings like 'domain_detail' or 'search_domain' by emphasizing creation rather than querying. It's not a tautology of the name.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context on when to use it (e.g., 'Free domain if plan includes free_domain_annual + annual billing + first domain') and prerequisites ('Requires: API key with write scope'), but does not explicitly mention when not to use it or name alternatives among siblings like 'manage_dns' for DNS-related tasks.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

remove_firewall_rule (A)

Remove an IP firewall rule and reload Nginx.

Requires: API key with write scope.

Args:
  slug: Site identifier
  ip: IP address or CIDR to remove (must match exactly)

Returns: {"removed": true, "ip": "1.2.3.4"}
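Because the ip argument must match the stored rule exactly, an equivalent but differently written value (for example "1.2.3.4/32" when the rule was stored as "1.2.3.4") may not be accepted. One way to handle this is to look up the stored spelling first. The sketch below assumes the existing rules are available as a plain list of strings, for instance gathered from list_firewall_rules. That return shape is an assumption, and find_stored_rule is a hypothetical helper.

```python
import ipaddress
from typing import Optional


def find_stored_rule(wanted: str, stored_rules: list) -> Optional[str]:
    """Return the stored rule string that denotes the same network as wanted."""
    # strict=False tolerates host bits being set, e.g. "10.0.0.5/24".
    target = ipaddress.ip_network(wanted, strict=False)
    for rule in stored_rules:
        if ipaddress.ip_network(rule, strict=False) == target:
            return rule  # pass this exact string as the ip argument
    return None


# Made-up rule list for illustration.
stored = ["10.0.0.0/24", "1.2.3.4"]
match = find_stored_rule("1.2.3.4/32", stored)
```

If no stored rule denotes the same network, the helper returns None and the removal call can be skipped.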

Parameters
Name | Required | Description | Default
ip | Yes | - | -
slug | Yes | - | -
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It effectively describes the action (removal and Nginx reload), specifies permission requirements ('API key with write scope'), and provides the return format. However, it lacks details on potential side effects, error conditions, or rate limits, which would be helpful for a destructive operation.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured and front-loaded with the core purpose, followed by requirements, parameters, and return value. Each sentence adds value without redundancy, making it efficient and easy to parse. No wasted words or unnecessary details.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (a destructive operation with 2 parameters, no annotations, and no output schema), the description is mostly complete. It covers purpose, requirements, parameters, and return format. However, it could improve by mentioning potential errors or confirming the rule exists before removal, but it's sufficient for basic use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema description coverage is 0%, so the description must fully compensate. It clearly explains both parameters: 'slug' as 'Site identifier' and 'ip' as 'IP address or CIDR to remove (must match exactly)', adding crucial semantic context beyond the bare schema. This is essential for correct tool invocation.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('Remove an IP firewall rule and reload Nginx'), identifies the resource ('IP firewall rule'), and distinguishes it from sibling tools like 'add_firewall_rule' and 'list_firewall_rules' by specifying the removal operation. It goes beyond just restating the tool name.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for when to use this tool ('Remove an IP firewall rule') and mentions prerequisites ('Requires: API key with write scope'), but does not explicitly state when not to use it or name alternatives (e.g., 'add_firewall_rule' or 'list_firewall_rules'). It implies usage for removal operations without contrasting with other firewall-related tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

remove_ftp_account (A)

Remove an FTP account from a site.

Requires: API key with write scope.

Args:
  slug: Site identifier
  username: FTP username to remove

Returns: {"removed": true, "username": "ftpuser"}

Parameters
Name | Required | Description | Default
slug | Yes | - | -
username | Yes | - | -
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden. It discloses the authentication requirement (API key with write scope) and shows the return format, which is valuable. However, it doesn't mention potential side effects (e.g., whether removal is permanent, if it affects site functionality, or error conditions), leaving some behavioral gaps.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with clear sections (purpose, requirements, args, returns), uses bullet-like formatting, and every sentence adds value. It's front-loaded with the core action and efficiently communicates necessary information without redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a 2-parameter destructive tool with no annotations and no output schema, the description does a good job: it states the action, prerequisites, parameter meanings, and return example. It could be more complete by mentioning irreversible nature or error cases, but given the simplicity of the operation, it's largely adequate.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description must compensate. It clearly explains both parameters: 'slug' as 'Site identifier' and 'username' as 'FTP username to remove'. This adds meaningful context beyond the bare schema, though it could provide more detail about format constraints (e.g., slug format, username rules).

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Remove') and resource ('FTP account from a site'), making the purpose immediately understandable. However, it doesn't explicitly differentiate from sibling tools like 'delete_account' or 'delete_file', which could have overlapping contexts.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear prerequisites ('Requires: API key with write scope'), which helps determine when this tool can be used. It doesn't explicitly mention when NOT to use it or name specific alternatives among siblings, but the context is sufficiently clear for a deletion operation.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

request_api_key (Grade: A)

Request an API key for a site you are running on (challenge-response).

This starts a two-step verification flow:

  1. A claim token is written to your container at ~/.borealhost/.claim_token (mode 600, owner admin — only readable if you're on the container)

  2. Read that file and call claim_api_key(token) within 1 hour

This proves you have access to the container without storing any secrets on disk permanently. The claim token is single-use and ephemeral.

No authentication needed — the proof is reading the file from the container.

Args: site_slug: The site identifier (your BorealHost site slug)

Returns: {"status": "pending", "site_slug": "my-site", "expires_in_seconds": 3600, "claim_path": "~/.borealhost/.claim_token", "instructions": "Read the claim token and call claim_api_key()..."}

Errors:
    VALIDATION_ERROR: Unknown site slug or no active subscription
    RATE_LIMITED: Too many pending claim tokens
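
The two-step flow above can be sketched roughly as follows. This is a minimal illustration, not BorealHost's client code: `call_tool` is a hypothetical stand-in for whatever MCP client is in use, and the argument name `token` for `claim_api_key` is an assumption, since the description only shows `claim_api_key(token)`.

```python
import os
from pathlib import Path
from typing import Any, Callable, Dict

# Hypothetical: call_tool(name, arguments) -> dict stands in for whatever
# MCP client is in use. It is not part of the documented BorealHost tools.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def read_claim_token(claim_path: str) -> str:
    """Read the single-use claim token from the container's filesystem."""
    path = Path(os.path.expanduser(claim_path))
    token = path.read_text(encoding="utf-8").strip()
    if not token:
        raise ValueError(f"claim token file {path} is empty")
    return token


def obtain_api_key(call_tool: ToolCaller, site_slug: str) -> Dict[str, Any]:
    """Step 1: request a claim token. Step 2: exchange it for an API key."""
    pending = call_tool("request_api_key", {"site_slug": site_slug})
    if pending.get("status") != "pending":
        raise RuntimeError(f"unexpected response: {pending!r}")
    token = read_claim_token(pending["claim_path"])
    # Assumption: claim_api_key takes the token under the name "token".
    return call_tool("claim_api_key", {"token": token})
```

Because the token expires after one hour and is single-use, a caller would presumably run both steps back to back rather than caching the token.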

ParametersJSON Schema
Name       Required  Description  Default
site_slug  Yes       -            -

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden and excels. It discloses key behavioral traits: the two-step flow, file creation details (path, permissions, owner), time constraints ('within 1 hour'), security rationale ('proves you have access... without storing secrets permanently'), and error conditions (VALIDATION_ERROR, RATE_LIMITED). This provides comprehensive operational context beyond basic functionality.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is efficiently structured and front-loaded: the first sentence states the core purpose, followed by a numbered list for the flow, key behavioral notes, and structured sections for args, returns, and errors. Every sentence adds value—no fluff or repetition—making it easy to scan and understand.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (multi-step flow, security implications) and lack of annotations/output schema, the description is remarkably complete. It covers purpose, usage steps, behavioral details, parameter meaning, return structure, and error cases. No output schema exists, but the description fully documents the return format and instructions, leaving no gaps for the agent.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema description coverage is 0%, but the description compensates well. It explains the single parameter 'site_slug' as 'The site identifier (your BorealHost site slug)', adding contextual meaning. However, it doesn't provide examples or format details (e.g., slug patterns), leaving minor gaps. Since there's only one parameter, the baseline is high, and the added semantics are helpful.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Request an API key for a site you are running on (challenge-response).' It specifies the verb ('request'), resource ('API key'), and context ('site you are running on'), distinguishing it from siblings like 'set_api_key' or 'rotate_key' which manage existing keys rather than initiating a request flow.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit usage guidance: it outlines a two-step verification flow, specifies prerequisites ('No authentication needed — the proof is reading the file from the container'), and indicates when to use it (to obtain an API key via challenge-response). It implicitly distinguishes from alternatives like 'set_api_key' by focusing on initial key acquisition rather than configuration.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

restore_backup (Grade: A)

Restore a site from a backup.

WARNING: This is destructive. The current state of the site will be replaced. Runs asynchronously — may take several minutes.

Requires: API key with admin scope.

Args:
    slug: Site identifier
    backup_id: UUID of the backup to restore from

Returns: {"success": true, "message": "Restore started..."}

Errors: VALIDATION_ERROR: Backup not found or not in completed state
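
Since the restore is destructive, a caller may want a local guard before invoking it. The sketch below is one possible wrapper under stated assumptions: `call_tool` is a hypothetical MCP client callable, and the `confirm` flag is an illustrative safeguard that is not part of the tool itself.

```python
import uuid
from typing import Any, Callable, Dict

# Hypothetical MCP client callable: call_tool(name, arguments) -> dict.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def restore_site(
    call_tool: ToolCaller, slug: str, backup_id: str, confirm: bool = False
) -> Dict[str, Any]:
    """Call restore_backup only after an explicit opt-in and a UUID check."""
    if not confirm:
        raise PermissionError(
            "restore_backup replaces the current site state; pass confirm=True"
        )
    try:
        parsed = uuid.UUID(backup_id)  # rejects strings that are not UUIDs
    except ValueError as exc:
        raise ValueError(f"backup_id is not a UUID: {backup_id!r}") from exc
    # The restore runs asynchronously, so the response only says it started.
    return call_tool("restore_backup", {"slug": slug, "backup_id": str(parsed)})
```

Validating the UUID locally avoids a round trip for obviously malformed input; it cannot tell whether the backup exists or is in a completed state, which the server still reports as VALIDATION_ERROR.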

ParametersJSON Schema
Name       Required  Description  Default
slug       Yes       -            -
backup_id  Yes       -            -

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden and excels by disclosing critical behavioral traits: it's destructive (replaces current site state), asynchronous (may take several minutes), requires admin scope, and includes error conditions (VALIDATION_ERROR). This goes beyond basic functionality to cover safety, timing, permissions, and failure modes.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured and front-loaded with the core purpose, followed by warnings, requirements, and technical details. Every sentence earns its place: the first states the action, the warning highlights risks, the async note sets expectations, the requirement specifies prerequisites, and the Args/Returns/Errors sections provide essential documentation without redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (destructive, async, admin-scoped) and lack of annotations or output schema, the description is highly complete. It covers purpose, behavioral traits, parameters, return values, and errors, providing all necessary context for safe and effective use without relying on structured fields.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description must compensate. It adds meaningful semantics: 'slug' is explained as 'Site identifier' and 'backup_id' as 'UUID of the backup to restore from', clarifying their roles beyond the schema's generic titles. However, it doesn't detail format constraints (e.g., slug pattern or UUID version), leaving minor gaps.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('Restore a site from a backup'), identifies the resource ('site'), and distinguishes it from siblings like 'create_backup', 'list_backups', or 'rollback_snapshot' by focusing on restoration from existing backups.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

It provides explicit usage guidance with 'WARNING: This is destructive' and 'Requires: API key with admin scope', indicating when to use it (for restoration with proper permissions) and implicitly when not to use it (e.g., for non-destructive operations like listing backups). It also distinguishes from alternatives by specifying it restores from backups, unlike 'deploy' or 'rollback_snapshot'.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

revoke_api_key (Grade: A)

Revoke (deactivate) an API key. The key stops working immediately.

Requires: API key with write scope.

Args: key_id: UUID of the key to revoke (from list_api_keys or whoami)

Returns: {"success": true, "message": "API key revoked"}

Errors: NOT_FOUND: Key not found or already revoked

ParametersJSON Schema
Name    Required  Description  Default
key_id  Yes       -            -

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden and does well by disclosing key behavioral traits: it's a destructive operation ('stops working immediately'), has authentication requirements ('API key with write scope'), and includes error handling ('NOT_FOUND: Key not found or already revoked'). It could improve by mentioning rate limits or idempotency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is appropriately sized and front-loaded, starting with the core action, followed by requirements, arguments, returns, and errors in a structured format. Every sentence earns its place without redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (destructive operation with one parameter), no annotations, and no output schema, the description is complete enough. It covers purpose, usage, parameters, return values, and errors, providing all necessary context for safe invocation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description adds significant meaning beyond the input schema, which has 0% description coverage. It explains that 'key_id' is a 'UUID of the key to revoke' and specifies sources ('from list_api_keys or whoami'), fully compensating for the schema's lack of documentation.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with a specific verb ('revoke/deactivate') and resource ('API key'), and distinguishes it from siblings like 'create_api_key', 'list_api_keys', and 'rotate_key' by specifying it makes a key stop working immediately.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for when to use this tool ('Requires: API key with write scope') and implies usage by referencing 'key_id' from 'list_api_keys or whoami', but does not explicitly state when not to use it or name alternatives like 'rotate_key' for key management.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

rollback_snapshot (Grade: A)

Rollback a site to a previous snapshot.

WARNING: This is destructive. The current state of the container will be replaced with the snapshot contents.

Requires: API key with admin scope.

Args:
    slug: Site identifier
    snapshot_id: UUID of the snapshot to rollback to

Returns: {"success": true, "message": "Rolled back to snapshot ..."}

Errors: NOT_FOUND: Snapshot not found or not in completed state

ParametersJSON Schema
Name         Required  Description  Default
slug         Yes       -            -
snapshot_id  Yes       -            -

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries full burden and excels by disclosing critical behavioral traits: it explicitly warns of destructiveness ('The current state of the container will be replaced'), specifies authentication requirements ('API key with admin scope'), and outlines error conditions ('NOT_FOUND: Snapshot not found or not in completed state'). This goes beyond basic function to inform risks and prerequisites.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured and front-loaded with the core purpose, followed by warnings, requirements, and technical details. Every sentence earns its place: the first states the action, the warning highlights risk, the requirement specifies auth, and the sections on args, returns, and errors provide essential usage info without redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (destructive operation with 2 params, no annotations, no output schema), the description is complete. It covers purpose, behavioral risks, auth needs, parameter meanings, return format, and error cases, providing all necessary context for safe and correct invocation without relying on structured fields.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description must compensate. It adds meaning by explaining 'slug' as 'Site identifier' and 'snapshot_id' as 'UUID of the snapshot to rollback to', providing context not in the schema's bare titles. However, it doesn't detail format constraints (e.g., UUID format), leaving minor gaps.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('Rollback a site to a previous snapshot') and identifies the resource ('site'). It distinguishes from siblings like 'create_snapshot', 'delete_snapshot', and 'restore_backup' by focusing on reverting to a snapshot rather than creating, removing, or restoring from a backup.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit guidance with 'WARNING: This is destructive' and 'Requires: API key with admin scope', indicating when to use (for reverting sites with proper permissions) and when not to use (if preservation of current state is needed). It implicitly contrasts with non-destructive siblings like 'list_snapshots' or 'get_site_status'.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

rotate_key (Grade: A)

Atomically rotate an API key. Old key is immediately invalidated.

Creates a new key with the same name, scopes, and rate limits. The new key is returned once — store it immediately.

Requires: API key with write scope.

Args: key_id: UUID of the API key to rotate (get from whoami())

Returns: {"api_key": "bh_...", "key_id": "uuid", "prefix": "bh_...", "scopes": ["read", "write"], "message": "Key rotated. Store securely."}

Note: The old key stops working immediately. Update BOREALHOST_API_KEY right away.
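
A minimal sketch of acting on that note: rotate the key and overwrite `BOREALHOST_API_KEY` in the same step, so that no later call uses the invalidated key. `call_tool` is a hypothetical MCP client callable, and updating `os.environ` only affects the current process; persisting the key elsewhere is left out.

```python
import os
from typing import Any, Callable, Dict, MutableMapping

# Hypothetical MCP client callable: call_tool(name, arguments) -> dict.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def rotate_and_update_env(
    call_tool: ToolCaller,
    key_id: str,
    env: MutableMapping[str, str] = os.environ,
) -> str:
    """Rotate the key and store the new secret before doing anything else."""
    result = call_tool("rotate_key", {"key_id": key_id})
    # The new key is returned only once, so capture it immediately.
    env["BOREALHOST_API_KEY"] = result["api_key"]
    # Return the key_id from the response rather than the secret itself.
    return result["key_id"]
```

Passing `env` explicitly makes the function testable with a plain dict and keeps the secret out of return values and logs.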

ParametersJSON Schema
Name    Required  Description  Default
key_id  Yes       -            -

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description fully discloses key behaviors: atomic operation, immediate invalidation of the old key, creation of a new key with the same name, scopes, and rate limits, one-time return of the new key, and the write-scope requirement. It also advises storing the new key immediately and updating BOREALHOST_API_KEY, adding valuable context beyond basic function.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured and front-loaded with core action, followed by details, args, returns, and notes. Every sentence adds value—no redundancy or fluff—making it efficient and easy to parse for an agent.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a mutation tool with no annotations or output schema, the description is complete: it covers purpose, usage, behavior, parameters, return format, and critical warnings. Given the complexity (key rotation with security implications), it provides all necessary context for safe and correct invocation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description compensates fully. It explains 'key_id' as 'UUID of the API key to rotate' and provides usage guidance 'get from whoami()', adding crucial meaning not in the schema. This clarifies parameter purpose and sourcing.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('atomically rotate an API key'), the resource ('API key'), and the immediate effect ('old key is immediately invalidated'). It distinguishes from siblings like 'set_api_key' or 'request_api_key' by focusing on rotation rather than creation or assignment.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicit guidance is provided: 'Requires: API key with write scope' specifies prerequisites, and 'Note: The old key stops working immediately. Update BOREALHOST_API_KEY right away' indicates critical timing and actions. It implicitly contrasts with 'set_api_key' by handling key replacement rather than setting.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

run_malware_scan (Grade: A)

Run a ClamAV malware scan on a site's container.

Scans the web root (or specified path) for malware, viruses, and trojans. ClamAV is installed automatically if not present. Excludes node_modules, vendor, .git, and cache directories.

May take up to 5 minutes for large sites.

Requires: API key with write scope.

Args:
    slug: Site identifier
    path: Directory to scan (default: /var/www/html)

Returns: {"infected_files": [{"path": "/var/www/html/shell.php", "threat": "Php.Malware.Agent"}], "scanned_count": 1234, "infected_count": 1, "scan_time_s": 45.2}
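
As an illustration of consuming that return value, the sketch below turns a scan result into a short report. The field names come from the example above; the function name and the report wording are made up for this example.

```python
from typing import Any, Dict, List

# Sample result copied from the Returns example in the tool description.
SAMPLE_RESULT: Dict[str, Any] = {
    "infected_files": [
        {"path": "/var/www/html/shell.php", "threat": "Php.Malware.Agent"}
    ],
    "scanned_count": 1234,
    "infected_count": 1,
    "scan_time_s": 45.2,
}


def summarize_scan(result: Dict[str, Any]) -> List[str]:
    """Build one header line plus one line per infected file."""
    infected = result.get("infected_files", [])
    scanned = result.get("scanned_count", 0)
    seconds = result.get("scan_time_s", 0)
    lines = [f"Scanned {scanned} files in {seconds}s, {len(infected)} infected"]
    for entry in infected:
        lines.append(f"  {entry['path']}: {entry['threat']}")
    return lines


for line in summarize_scan(SAMPLE_RESULT):
    print(line)
```

Counting `infected_files` directly, instead of trusting `infected_count`, keeps the report consistent with the entries it lists.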

ParametersJSON Schema
Name  Required  Description  Default
path  No        -            -
slug  Yes       -            -

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries full burden and does well by disclosing important behavioral traits: automatic dependency installation ('ClamAV is installed automatically if not present'), exclusions ('Excludes node_modules, vendor, .git, and cache directories'), performance expectations ('May take up to 5 minutes for large sites'), and authentication requirements ('Requires: API key with write scope'). It doesn't mention error handling or rate limits, keeping it from a perfect score.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is perfectly structured and front-loaded: first sentence states the core purpose, subsequent sentences add crucial details in logical order (what it scans, dependencies, exclusions, timing, requirements), then clearly documents parameters and return values. Every sentence earns its place with no wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a 2-parameter tool with no annotations and no output schema, the description provides complete context: clear purpose, usage guidance, behavioral transparency, parameter explanations, AND a detailed return value example. The output example specifically documents the response structure, compensating for the missing output schema.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage for 2 parameters, the description compensates well by explaining both parameters: 'slug: Site identifier' and 'path: Directory to scan (default: /var/www/html)'. It provides the default value and clarifies what 'slug' represents. The only minor gap is not specifying path format constraints.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('Run a ClamAV malware scan') on a specific resource ('on a site's container'), with details about what it scans for ('malware, viruses, and trojans') and where ('web root or specified path'). It distinguishes itself from sibling tools by focusing on security scanning rather than management, deployment, or monitoring functions.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for when to use this tool ('Scans the web root (or specified path) for malware') and mentions prerequisites ('Requires: API key with write scope'). However, it doesn't explicitly state when NOT to use it or name specific alternative tools among the siblings for different scanning needs.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

scale (Grade: A)

Change a site's hosting plan (upgrade or downgrade).

Requires: API key with admin scope. Best practice: create a snapshot before downgrading.

Args:
    slug: Site identifier
    new_plan: Target plan slug (e.g. "site_pro", "site_managed"). Call list_plans() to see available plans.

Returns: {"success": true, "old_plan": "site_starter", "new_plan": "site_pro", "message": "Plan changed successfully"}

Errors:
    NOT_FOUND: Unknown slug
    VALIDATION_ERROR: Invalid plan slug or same plan
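
The snapshot-before-downgrade advice could be wrapped as below. This is a sketch under assumptions: `call_tool` is a hypothetical MCP client callable, and `create_snapshot` is assumed to accept just the site slug, which this page does not confirm.

```python
from typing import Any, Callable, Dict

# Hypothetical MCP client callable: call_tool(name, arguments) -> dict.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def change_plan(
    call_tool: ToolCaller, slug: str, new_plan: str, snapshot_first: bool = True
) -> Dict[str, Any]:
    """Optionally snapshot the site, then change its hosting plan."""
    if snapshot_first:
        # Assumption: create_snapshot takes only the site slug.
        call_tool("create_snapshot", {"slug": slug})
    return call_tool("scale", {"slug": slug, "new_plan": new_plan})
```

The wrapper snapshots by default because it cannot tell an upgrade from a downgrade without knowing the plan ordering; a caller who knows the change is an upgrade can pass `snapshot_first=False`.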

ParametersJSON Schema
Name      Required  Description  Default
slug      Yes       -            -
new_plan  Yes       -            -

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries full burden and does well: it discloses authentication requirements ('API key with admin scope'), operational risks ('create a snapshot before downgrading'), and error conditions (NOT_FOUND, VALIDATION_ERROR). It doesn't mention rate limits or idempotency, leaving minor gaps.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with clear sections (purpose, requirements, args, returns, errors), front-loaded with the core action. Every sentence adds value—no fluff or repetition—making it efficient and easy to parse.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a mutation tool with 2 parameters, 0% schema coverage, and no output schema, the description is highly complete: it covers purpose, prerequisites, parameters with examples, return format, and error cases. This provides all necessary context for safe and effective use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description must compensate. It explains both parameters: 'slug' as 'Site identifier' and 'new_plan' with examples ('site_pro', 'site_managed') and guidance to use list_plans(). This adds significant meaning beyond the bare schema, though it could clarify slug format.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('Change a site's hosting plan') and resource ('site'), distinguishing it from siblings like 'list_plans' or 'get_site_status'. It explicitly mentions both upgrade and downgrade operations, providing complete purpose clarity.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit guidance: 'Requires: API key with admin scope' specifies prerequisites, and 'Best practice: create a snapshot before downgrading' offers operational advice. It also points to a companion tool ('Call list_plans() to see available plans') for discovering valid plan slugs.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

schedule_snapshot (Grade: A)

Schedule a snapshot for future execution.

Requires: API key with write scope. Max 3 pending schedules per site.

Args:
    slug: Site identifier
    scheduled_at: ISO 8601 datetime (must be in the future)
    description: Optional description (max 200 chars)

Returns: {"id": "uuid", "scheduled_at": "iso8601", "status": "scheduled"}

Errors: VALIDATION_ERROR: Invalid datetime, not in future, or too many pending
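
The argument constraints above can be checked client-side before the call. The helper below is an illustrative sketch: the function name is made up, and it assumes the server accepts the offset form that Python's `datetime.isoformat()` produces (for example `+00:00`), which the description does not spell out.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional

MAX_DESCRIPTION_CHARS = 200  # limit stated in the tool description


def build_schedule_args(
    slug: str,
    when: datetime,
    description: Optional[str] = None,
    now: Optional[datetime] = None,
) -> Dict[str, Any]:
    """Validate inputs locally and build the schedule_snapshot arguments."""
    if when.tzinfo is None:
        raise ValueError("scheduled time must be timezone-aware")
    current = now if now is not None else datetime.now(timezone.utc)
    if when <= current:
        raise ValueError("scheduled time must be in the future")
    if description is not None and len(description) > MAX_DESCRIPTION_CHARS:
        raise ValueError("description exceeds 200 characters")
    args: Dict[str, Any] = {"slug": slug, "scheduled_at": when.isoformat()}
    if description is not None:
        args["description"] = description
    return args
```

The limit of three pending schedules per site cannot be checked this way, so a VALIDATION_ERROR from the server remains possible even when these checks pass.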

ParametersJSON Schema
Name          Required  Description  Default
slug          Yes       -            -
description   No        -            -
scheduled_at  Yes       -            -

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden and effectively discloses key behavioral traits: it specifies authentication requirements ('API key with write scope'), a per-site quota ('Max 3 pending schedules per site'), and error conditions (e.g., 'VALIDATION_ERROR'). However, it lacks details on idempotency, retry behavior, or side effects, leaving some gaps.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with clear sections (purpose, requirements, args, returns, errors), uses bullet-like formatting for readability, and every sentence adds value without redundancy. It is front-loaded with the core purpose and efficiently conveys necessary details.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a mutation tool with no annotations and no output schema, the description is highly complete: it covers purpose, prerequisites, constraints, parameters, return values, and error cases. This provides sufficient context for an AI agent to understand and invoke the tool correctly, despite the lack of structured metadata.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Given 0% schema description coverage, the description fully compensates by detailing all three parameters: 'slug' as 'Site identifier', 'scheduled_at' with format and constraints ('ISO 8601 datetime (must be in the future)'), and 'description' with optionality and length limit ('Optional description (max 200 chars)'). This adds essential meaning beyond the bare schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('Schedule a snapshot') and its timing ('for future execution'), distinguishing it from sibling tools like 'create_snapshot' (immediate creation) and 'cancel_scheduled_snapshot' (cancellation). It precisely communicates the tool's function without redundancy.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly states prerequisites ('Requires: API key with write scope') and constraints ('Max 3 pending schedules per site'), and it implicitly distinguishes usage from alternatives by focusing on future scheduling versus immediate actions like 'create_snapshot'. This provides clear context for when to use this tool.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

search_domain (A)

Check domain availability and get pricing.

Requires: API key with read scope.

Args: domain: Full domain name (e.g. "example.com", "mybiz.ca")

Returns: {"domain": "example.com", "available": true, "price": {"amount": 15.99, "currency": "CAD", "period": "1 year"}, "premium": false}

Note: .ca domains require ca_legal_type when registering.
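
The return shape above is easy to turn into a short human-readable line. A minimal sketch, assuming the result is already parsed into a Python dict with exactly the fields shown in the example; the function name `summarize_availability` is hypothetical and the wording of the summary is arbitrary.

```python
def summarize_availability(result):
    """Turn a search_domain result dict into a one-line summary."""
    domain = result["domain"]
    if not result.get("available"):
        return f"{domain} is not available"
    price = result["price"]
    line = (
        f"{domain} is available: "
        f"{price['amount']:.2f} {price['currency']} per {price['period']}"
    )
    if result.get("premium"):
        line += " (premium)"
    if domain.lower().endswith(".ca"):
        # Mirrors the note above: .ca registrations need ca_legal_type.
        line += "; ca_legal_type is required at registration"
    return line
```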

Parameters
Name    Required  Description  Default
domain  Yes       -            -
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden and does well: it discloses authentication requirements ('API key with read scope'), implies a read-only operation (checking availability), and shows the pricing fields in its return example. However, it lacks details on rate limits, error conditions, or whether the check is real-time or cached.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is efficiently structured with clear sections (purpose, requirements, args, returns, note), uses bullet-like formatting, and every sentence adds value. No redundant information or fluff is present.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool with 1 parameter, no annotations, and no output schema, the description is quite complete: it covers purpose, prerequisites, parameter meaning, return format example, and a domain-specific note. It could improve by explicitly stating it's read-only or adding error handling info, but it's largely sufficient.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, but the description compensates by explaining the 'domain' parameter with a clear definition ('Full domain name') and examples ('example.com', 'mybiz.ca'). It doesn't cover format constraints like TLD validity, but adds meaningful context beyond the bare schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with specific verbs ('Check domain availability and get pricing') and identifies the resource ('domain'). It distinguishes itself from sibling tools like 'list_domains' (which likely lists owned domains) and 'register_domain' (which performs registration).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit usage guidance: 'Requires: API key with read scope' specifies prerequisites, and 'Note: .ca domains require ca_legal_type when registering' indicates a constraint for future actions. It implicitly distinguishes itself from 'register_domain' by focusing on checking availability rather than registration.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

set_api_key (A)

Set your BorealHost API key for this session.

Call this if you already have an API key (from a previous registration, checkout completion, or the BorealHost panel). All subsequent tool calls will use this key for authentication.

No need to call this after register() — the key is set automatically.

Args: api_key: Your BorealHost API key (format: bh_<48 hex chars>)

Returns: {"success": true, "message": "API key set for this session", "key_prefix": "bh_..."}
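
The documented key format (bh_ followed by 48 hex characters) can be checked locally before the key is sent. A minimal sketch: the helper name `looks_like_api_key` is hypothetical, and accepting both upper- and lower-case hex digits is an assumption, since the description does not say which case the service uses.

```python
import re

# "bh_" followed by exactly 48 hexadecimal characters.
# Assumption: both upper- and lower-case hex digits are accepted.
API_KEY_PATTERN = re.compile(r"bh_[0-9a-fA-F]{48}")


def looks_like_api_key(value):
    """Return True if value matches the documented BorealHost key format."""
    return API_KEY_PATTERN.fullmatch(value) is not None
```

A match only means the string is well-formed; whether the key is valid is still decided by the server.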

Parameters
Name     Required  Description  Default
api_key  Yes       -            -
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It discloses key behavioral traits: it sets the key for the session, affects authentication for subsequent tool calls, and includes a return format example. However, it lacks details on error handling or session duration, leaving some gaps.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is appropriately sized and front-loaded: the first sentence states the purpose, followed by usage context, prerequisites, and parameter details. Every sentence adds value with zero waste, making it easy to scan and understand.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's low complexity (1 parameter, no output schema, no annotations), the description is complete. It covers purpose, usage, parameters, and return values, providing all necessary context for an AI agent to invoke it correctly without relying on structured fields.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description adds significant meaning beyond the input schema, which has 0% coverage. It explains the parameter 'api_key' with details on its source and format ('Your BorealHost API key (format: bh_<48 hex chars>)'), fully compensating for the schema's lack of documentation.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Set your BorealHost API key for this session.' It specifies the verb ('Set') and resource ('BorealHost API key'), and distinguishes it from siblings like 'register' by noting it's for existing keys, not registration.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit guidance on when to use this tool: 'Call this if you already have an API key (from a previous registration, checkout completion, or the BorealHost panel).' It also specifies when not to use it: 'No need to call this after register() — the key is set automatically,' and implies alternatives like 'register' for new keys.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ssl_info (A)

Get SSL certificate information for a site.

Returns certificate details, expiry date, and issuer.

Requires: API key with read scope.

Args: slug: Site identifier

Returns: {"issuer": "Let's Encrypt", "domain": "example.com", "expires_at": "iso8601", "days_remaining": 60, "force_https": true}

Parameters
Name  Required  Description  Default
slug  Yes       -            -
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden and adds valuable behavioral context: it discloses the required API key with read scope (auth needs), describes the return format (certificate details, expiry date, issuer), and includes an example output. However, it lacks details on rate limits, error handling, or whether the operation is read-only.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is appropriately sized and front-loaded: it starts with the core purpose, followed by return details, requirements, and parameter explanation in a structured format. Every sentence adds value without waste.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's moderate complexity (single parameter, no output schema, no annotations), the description is mostly complete: it covers purpose, returns, requirements, and parameter semantics. However, it could improve by mentioning potential errors or constraints like rate limits.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema description coverage is 0%, so the description must compensate. It adds meaning by explaining that 'slug' is a 'Site identifier,' which clarifies the parameter's purpose beyond the schema's generic 'Slug' title. Since there is only one parameter, this is sufficient for good understanding.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with a specific verb ('Get') and resource ('SSL certificate information for a site'), and it distinguishes itself from siblings by focusing on SSL certificate details rather than other site operations like domain management or backups.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage context by specifying 'Requires: API key with read scope,' which suggests when authentication is needed, but it does not explicitly state when to use this tool versus alternatives (e.g., compared to 'domain_detail' or 'get_site_status') or provide exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ssl_renew (A)

Force SSL certificate renewal via certbot.

Requires: API key with write scope.

Args: slug: Site identifier

Returns: {"renewed": true, "expires_at": "iso8601"}
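
Because ssl_renew forces a renewal, a caller may want to check the certificate with ssl_info first and renew only when it is close to expiry. A minimal sketch, assuming the ssl_info result is a dict with the `days_remaining` field shown in its return example; the helper name `should_renew` and the 30-day threshold are illustrative choices, not BorealHost defaults.

```python
def should_renew(ssl_info_result, threshold_days=30):
    """Decide whether to call ssl_renew, based on an ssl_info result.

    threshold_days is an arbitrary illustrative cutoff.
    """
    return ssl_info_result["days_remaining"] <= threshold_days
```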

Parameters
Name  Required  Description  Default
slug  Yes       -            -
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It effectively communicates that this is a write operation ('Force SSL certificate renewal'), specifies authentication requirements ('Requires: API key with write scope'), and describes the return format. It doesn't mention potential side effects, rate limits, or error conditions, but provides solid foundational information.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is perfectly structured and front-loaded with the core purpose in the first sentence. Each subsequent section ('Requires:', 'Args:', 'Returns:') adds essential information without redundancy. Every sentence earns its place, making it highly efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (a write operation with authentication requirements), no annotations, and no output schema, the description does well by explaining the action, prerequisites, parameter meaning, and return format. It could be more complete by mentioning potential errors or confirming it's idempotent, but covers the essentials effectively.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the schema provides no parameter documentation. The description adds the meaning of the 'slug' parameter ('Site identifier'), which is valuable context. However, it gives no format, constraints, or example for the slug, so it only partly compensates for the lack of schema documentation, meeting the baseline expectation.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('Force SSL certificate renewal') and the method ('via certbot'), distinguishing it from sibling tools like 'ssl_info' which likely provides information rather than performing an action. It specifies the exact resource being acted upon (SSL certificates) with a clear verb.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for when to use this tool by stating 'Force SSL certificate renewal', implying it should be used when certificates need renewal. However, it doesn't explicitly mention when NOT to use it or name specific alternatives among the sibling tools (like 'ssl_info' for checking status).

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

switch_php (A)

Switch the active PHP version for a site.

Requires: API key with write scope.

Args:
    slug: Site identifier
    version: Target PHP version (e.g. "8.3", "8.2", "8.1")

Returns: {"version": "8.3", "result": {...}}
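
The description gives version examples but no list of supported values, so a client can only check the shape of the string locally. A minimal sketch: the helper name `build_switch_php_args` is hypothetical, the "major.minor" pattern is inferred from the examples above, and `available` stands for a list of versions the caller obtained elsewhere (for example from the sibling tool list_php_versions, whose return shape is not documented here).

```python
import re

# Assumption: versions are "major.minor" strings, as in "8.3".
VERSION_PATTERN = re.compile(r"\d+\.\d+")


def build_switch_php_args(slug, version, available=None):
    """Build arguments for switch_php after basic local checks."""
    if not VERSION_PATTERN.fullmatch(version):
        raise ValueError(f"expected a 'major.minor' version string, got {version!r}")
    if available is not None and version not in available:
        raise ValueError(f"PHP {version} is not in the available list")
    return {"slug": slug, "version": version}
```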

Parameters
Name     Required  Description  Default
slug     Yes       -            -
version  Yes       -            -
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It states the required API key scope, which is useful for authentication needs. However, it does not mention potential side effects (e.g., site downtime, compatibility issues), rate limits, or error handling, leaving gaps in behavioral context for a mutation tool.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is front-loaded with the core purpose, followed by prerequisites, parameters, and return format in a structured, bullet-like format. Every sentence earns its place with no redundant information, making it highly efficient and easy to parse.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (a mutation with 2 parameters) and lack of annotations or output schema, the description does well by covering purpose, prerequisites, parameters, and return format. However, it could improve by detailing error cases or confirming the mutation's effect (e.g., 'Changes take effect immediately'), making it slightly incomplete for full agent guidance.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema description coverage is 0%, so the description must compensate. It explicitly documents both parameters ('slug' and 'version') with clear meanings and provides an example for 'version' ('e.g., "8.3", "8.2", "8.1"'). This adds significant value beyond the bare schema, though it could specify format constraints (e.g., slug pattern) more precisely.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('Switch'), target resource ('active PHP version'), and scope ('for a site'), distinguishing it from sibling tools like 'list_php_versions' or 'get_site_status'. It provides a complete verb+resource+scope statement that is unambiguous.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description includes explicit prerequisites ('Requires: API key with write scope'), which provides clear context for when to use this tool. However, it does not mention when not to use it or name specific alternatives (e.g., 'list_php_versions' to check available versions first), so it falls short of a perfect score.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

toggle_module (A)

Enable or disable an AI module on a site.

The module must be in the plan's available module list.

Requires: API key with write scope.

Args:
    slug: Site identifier
    module_name: Module to toggle. Available modules: "chatbot" (AI chat widget), "seo" (SEO optimization), "translation" (content translation), "content" (AI content generation)

Returns: {"module": "chatbot", "enabled": true, "message": "Module enabled"}

Errors:
    NOT_FOUND: Unknown slug or module not in plan
    VALIDATION_ERROR: Invalid module name
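
The four module names listed above can be checked locally to avoid a VALIDATION_ERROR round trip. A minimal sketch with a hypothetical helper name, `build_toggle_args`; whether the module is included in the site's plan cannot be checked this way and still surfaces as NOT_FOUND from the server.

```python
# Module names and labels copied from the tool description above.
KNOWN_MODULES = {
    "chatbot": "AI chat widget",
    "seo": "SEO optimization",
    "translation": "content translation",
    "content": "AI content generation",
}


def build_toggle_args(slug, module_name):
    """Build arguments for toggle_module, rejecting unknown module names."""
    if module_name not in KNOWN_MODULES:
        valid = ", ".join(sorted(KNOWN_MODULES))
        raise ValueError(f"unknown module {module_name!r}; expected one of: {valid}")
    return {"slug": slug, "module_name": module_name}
```

Since the tool takes no on/off argument, the resulting state should be read from the `enabled` field of the response rather than assumed.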

Parameters
Name         Required  Description  Default
slug         Yes       -            -
module_name  Yes       -            -
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It effectively describes prerequisites (API key with write scope), constraints (module must be in plan's list), and error conditions (NOT_FOUND, VALIDATION_ERROR). It also specifies the return format, though it lacks details on side effects or rate limits. This is strong for a tool with no annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with sections (purpose, prerequisites, args, returns, errors) and uses bullet points for clarity. However, the 'Args' section could be more integrated into the flow, and some redundancy exists (e.g., repeating 'module' in returns). Overall, it's efficient but not perfectly streamlined.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (a mutation with prerequisites and constraints), no annotations, no output schema, and 2 parameters, the description is complete. It covers purpose, usage conditions, parameters with examples, return format, and error cases, providing all necessary context for an agent to invoke it correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description must compensate. It fully documents both parameters: 'slug' as 'Site identifier' and 'module_name' with a detailed list of available modules and their purposes (e.g., 'chatbot' for AI chat widget). This adds significant meaning beyond the bare schema, making parameter usage clear.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Enable or disable an AI module on a site.' It specifies the verb ('enable or disable'), resource ('AI module'), and target ('on a site'), distinguishing it from sibling tools like 'list_modules' (which only reads) or 'install_app' (which installs applications rather than toggling modules).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for usage: 'The module must be in the plan's available module list' and 'Requires: API key with write scope.' However, it does not explicitly state when not to use this tool or name alternatives (e.g., 'list_modules' to check availability first), which prevents a score of 5.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

update_account (A)

Update account profile fields (email, language, name).

Requires: API key with write scope. Only provided (non-empty) fields are updated.

Args:
    email: New email address
    language: Language preference, "fr" (French) or "en" (English)
    first_name: First name
    last_name: Last name

Returns: {"success": true, "account": {"email": "...", "language": "fr", "first_name": "...", "last_name": "..."}}

Errors: VALIDATION_ERROR: Invalid email format or language code
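
Since only provided (non-empty) fields are updated, a client can build the argument dict by dropping empty values. A minimal sketch with a hypothetical helper name, `build_account_update`; it checks the language code against the two documented values and leaves email validation to the server.

```python
ALLOWED_LANGUAGES = {"fr", "en"}  # the two codes named in the description


def build_account_update(email="", language="", first_name="", last_name=""):
    """Return only the non-empty fields, as arguments for update_account."""
    if language and language not in ALLOWED_LANGUAGES:
        raise ValueError(f"language must be 'fr' or 'en', got {language!r}")
    fields = {
        "email": email,
        "language": language,
        "first_name": first_name,
        "last_name": last_name,
    }
    # Empty strings are dropped, so those fields are left unchanged.
    return {name: value for name, value in fields.items() if value}
```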

Parameters
Name        Required  Description  Default
email       No        -            -
language    No        -            -
last_name   No        -            -
first_name  No        -            -
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden and does well by disclosing key behavioral traits: it requires an API key with write scope, applies partial updates (only non-empty fields), and documents an error condition (VALIDATION_ERROR). It also shows the return structure by example, though without full detail. No contradictions exist.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is appropriately sized and front-loaded, starting with the core purpose, followed by requirements, update behavior, and structured sections for Args, Returns, and Errors. Every sentence earns its place with no wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's moderate complexity (4 parameters, mutation operation, no annotations, no output schema), the description is quite complete. It covers purpose, prerequisites, update behavior, parameters, return example, and errors. A slight gap exists in not fully detailing the return structure or all possible errors, but it's largely adequate.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description must compensate fully. It does so by listing all 4 parameters (email, language, first_name, last_name) with clear semantics: email format, language options ('fr' or 'en'), and name fields. This adds significant meaning beyond the bare schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool updates account profile fields (email, language, name), providing a specific verb and resource. It distinguishes itself from siblings like delete_account or whoami by focusing on modification rather than deletion or read-only operations. However, it doesn't explicitly differentiate from potential update-related siblings like update_checkout.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context with 'Requires: API key with write scope' and 'Only provided (non-empty) fields are updated,' which helps determine when to use it. It doesn't explicitly mention alternatives or exclusions, but the context is sufficient for basic usage decisions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

update_checkout (A)

Set buyer email and desired site slug on a checkout session.

The checkout must be in "not_ready" status. Setting requested_slug transitions status to "ready" (required before completing).

Args:
    checkout_id: Checkout session ID from create_checkout
    buyer_email: Optional email; if omitted, a synthetic agent identity (agent-{uuid}@api.borealhost.ai) is created at completion
    requested_slug: Desired site identifier. Must be 3-50 chars, lowercase alphanumeric + hyphens, cannot start/end with hyphen. Must be globally unique.

Returns: {"id": "uuid", "sku": "...", "plan_slug": "...", "billing_period": "monthly", "status": "ready", "buyer_email": "...", "requested_slug": "my-site", "created_at": "iso8601"}

Errors:
    VALIDATION_ERROR: Invalid slug format or slug already taken
    FORBIDDEN: Missing checkout_secret
    NOT_FOUND: Unknown checkout_id
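
The slug format rules for requested_slug translate directly into a regular expression. A minimal sketch with a hypothetical helper name, `is_valid_slug_format`; it covers only the stated format rules (consecutive hyphens are allowed here because the description does not forbid them) and cannot check global uniqueness, which only the server knows.

```python
import re

# 3-50 chars, lowercase alphanumeric plus hyphens, no leading/trailing hyphen:
# one alphanumeric, then 1-48 alphanumerics or hyphens, then one alphanumeric.
SLUG_PATTERN = re.compile(r"[a-z0-9][a-z0-9-]{1,48}[a-z0-9]")


def is_valid_slug_format(slug):
    """Return True if slug satisfies the documented format rules."""
    return SLUG_PATTERN.fullmatch(slug) is not None
```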

Parameters
Name            Required  Description  Default
buyer_email     No        -            -
checkout_id     Yes       -            -
requested_slug  No        -            -
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It effectively describes key behaviors: the tool mutates a checkout session (implied by 'Set'), requires a specific status ('not_ready'), triggers a status transition to 'ready', and documents the VALIDATION_ERROR, FORBIDDEN, and NOT_FOUND error cases. It lacks details on rate limits or idempotency, but covers core operational context well.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured and front-loaded with the core purpose, followed by prerequisites, parameter details, return values, and errors. It is appropriately sized with no redundant sentences, though the error list, while informative, is slightly verbose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (mutation with status transitions), no annotations, 0% schema coverage, and no output schema, the description is highly complete. It covers purpose, usage context, parameter semantics, return format, and error cases, providing all necessary context for an agent to use the tool correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description must compensate. It adds significant semantic value beyond the schema: it explains that 'checkout_id' comes from 'create_checkout', details optional 'buyer_email' behavior (synthetic identity if omitted), and specifies format and uniqueness rules for 'requested_slug'. This fully documents all three parameters.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Set buyer email and desired site slug on a checkout session.' It specifies the verb ('Set') and resources ('buyer email' and 'desired site slug'), and distinguishes it from sibling tools like 'create_checkout' (creation) and 'complete_checkout' (finalization).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit usage guidelines: 'The checkout must be in "not_ready" status. Setting requested_slug transitions status to "ready" (required before completing).' It indicates when to use this tool (to transition from 'not_ready' to 'ready') and implies it's a prerequisite for 'complete_checkout'.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

upload_file (A)

Upload a base64-encoded file to a site's container.

Use this for binary files (images, archives, fonts, etc.). For text files, prefer write_file().

Requires: API key with write scope.

Args:
  slug: Site identifier
  path: Relative path including filename (e.g. "images/logo.png")
  content_b64: Base64-encoded file content

Returns: {"success": true, "path": "images/logo.png", "size": 45678}

Errors:
  VALIDATION_ERROR: Invalid base64 encoding
  FORBIDDEN: Protected system path
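
Because content_b64 must be valid base64, it can help to build the argument object from raw bytes rather than by hand. The sketch below is illustrative only: build_upload_args is a hypothetical helper, and the leading-slash check is an assumption about what "relative path" means here. The three keys match the parameter names listed for this tool.

```python
import base64


def build_upload_args(slug: str, path: str, data: bytes) -> dict:
    """Build the arguments for upload_file from raw bytes.

    Hypothetical helper: the argument names come from the tool
    description; rejecting a leading "/" is an assumed interpretation
    of "relative path".
    """
    if path.startswith("/"):
        raise ValueError("path must be relative")
    return {
        "slug": slug,
        "path": path,
        # b64encode returns bytes; decode to an ASCII string for JSON.
        "content_b64": base64.b64encode(data).decode("ascii"),
    }
```

How the resulting dictionary is sent depends on the MCP client in use.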

Parameters
Name         Required  Description  Default
path         Yes
slug         Yes
content_b64  Yes

Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It effectively describes key behaviors: the mutation nature ('Upload'), authentication requirements ('API key with write scope'), error conditions ('VALIDATION_ERROR', 'FORBIDDEN'), and return format. However, it doesn't mention rate limits, idempotency, or side effects on existing files.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is efficiently structured with a purpose statement, usage guidelines, requirements, parameter explanations, return example, and error conditions—all in compact, well-organized sections. Every sentence adds value with zero waste.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a mutation tool with no annotations and no output schema, the description provides comprehensive context: clear purpose, usage guidelines, authentication requirements, parameter semantics, return format example, and error conditions. This adequately compensates for the lack of structured metadata.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage, the description fully compensates by providing clear semantic explanations for all three parameters: 'slug: Site identifier', 'path: Relative path including filename', and 'content_b64: Base64-encoded file content'. It includes examples and format guidance beyond what the bare schema provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('Upload a base64-encoded file') and target resource ('to a site's container'), with explicit distinction from the sibling tool write_file() for text files. This provides a precise verb+resource pairing and sibling differentiation.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly states when to use this tool ('Use this for binary files') and when to prefer an alternative ('For text files, prefer write_file()'). It also includes prerequisites ('Requires: API key with write scope'), providing comprehensive usage guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

whoami (A)

Check the current API key's account info, scopes, and site count.

Requires: BOREALHOST_API_KEY env var (read scope).

Returns: {"user": {"id": "uuid", "email": "...", "date_joined": "iso8601"}, "api_key": {"id": "uuid", "name": "...", "prefix": "bh_...", "scopes": ["read", "write"], "created_at": "iso8601"}, "account": {"sites": 2, "active_subscriptions": 1}}

Errors: UNAUTHORIZED: Missing or invalid API key
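
One plausible use of this tool is to confirm that the key carries the scope a later call needs (for example write scope before upload_file). A minimal sketch, assuming the response has the shape shown in Returns; has_scope and the sample values are hypothetical.

```python
def has_scope(whoami_result: dict, scope: str) -> bool:
    """Return True if the whoami response lists the given scope.

    Hypothetical helper; assumes the "api_key" -> "scopes" layout
    shown in the tool description.
    """
    return scope in whoami_result.get("api_key", {}).get("scopes", [])


# Sample response following the documented shape (values are made up).
sample = {
    "user": {"id": "uuid", "email": "...", "date_joined": "iso8601"},
    "api_key": {"id": "uuid", "name": "...", "prefix": "bh_...",
                "scopes": ["read", "write"], "created_at": "iso8601"},
    "account": {"sites": 2, "active_subscriptions": 1},
}

can_write = has_scope(sample, "write")
```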

Parameters
Name  Required  Description  Default
No parameters

Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure and does so effectively. It discloses authentication requirements (API key with read scope), describes the return format in detail with an example structure, and lists error conditions (UNAUTHORIZED). It doesn't mention rate limits or caching behavior, but it covers the essential operational context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is perfectly structured and front-loaded: purpose first, then requirements, return format, and errors. Every sentence adds essential information with zero waste. The bullet-point format for Returns and Errors enhances readability without unnecessary verbosity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a zero-parameter authentication/status tool with no annotations and no output schema, the description provides complete context. It explains what the tool does, prerequisites, detailed return format with example structure, and error conditions. No output schema exists, so the description appropriately documents the return values.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The tool has 0 parameters with 100% schema description coverage. The description appropriately doesn't discuss parameters since none exist. It could theoretically mention that no inputs are required, but this is adequately covered by the schema. The baseline for 0 parameters is 4.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('Check') and the exact resources being retrieved ('current API key's account info, scopes, and site count'). It distinguishes itself from sibling tools by focusing on authentication/account status rather than site operations, backups, or deployments.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly states when to use this tool: to check the current API key's information. It provides prerequisites ('Requires: BOREALHOST_API_KEY env var (read scope)') and distinguishes it from alternatives by its unique purpose of returning authentication/account metadata rather than performing operations on sites or resources.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

wp_check_updates (A)

Check for available WordPress core, plugin, and theme updates.

Requires: API key with read scope.

Args: slug: Site identifier

Returns: {"core": {"current": "6.5", "update": "6.6"}, "plugins": [{"name": "...", "current": "1.0", "new": "1.1"}], "themes": [...]}
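
The Returns shape can be flattened into a short list of pending updates. This is a sketch under stated assumptions: pending_updates is a hypothetical helper, and the handling of a missing or unchanged core "update" value is a guess, since the description does not say what is returned when nothing is pending. The themes entries are elided in the description, so they are left out.

```python
def pending_updates(result: dict) -> list:
    """Summarize core and plugin updates from a wp_check_updates result.

    Hypothetical helper. Themes are skipped because their entry
    format is not shown in the tool description.
    """
    lines = []
    core = result.get("core") or {}
    new_core = core.get("update")
    # Assumption: no core update means "update" is absent or unchanged.
    if new_core and new_core != core.get("current"):
        lines.append(f"core: {core.get('current')} -> {new_core}")
    for plugin in result.get("plugins", []):
        lines.append(
            f"plugin {plugin['name']}: {plugin['current']} -> {plugin['new']}"
        )
    return lines
```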

Parameters
Name  Required  Description  Default
slug  Yes

Behavior 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It discloses authentication requirements ('API key with read scope') and implies a read-only operation by stating 'Check for available updates,' but lacks details on rate limits, error handling, or what happens if no updates are found. It adds some behavioral context but is incomplete.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is appropriately sized and front-loaded: the first sentence states the purpose, followed by prerequisites, arguments, and return structure. Each section is brief and informative, with no wasted sentences, making it easy to scan and understand.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's moderate complexity (checking updates for multiple components), no annotations, and no output schema, the description is fairly complete. It covers purpose, prerequisites, parameters, and return values with examples. However, it lacks details on error cases or behavioral nuances, leaving minor gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema description coverage is 0%, so the description must compensate. It adds meaning by explaining that 'slug' is a 'Site identifier,' which clarifies the parameter beyond the schema's generic 'Slug' title. With only one parameter, this provides adequate semantic context, though it could detail format or examples.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Check for available WordPress core, plugin, and theme updates.' It specifies the exact resources (core, plugins, themes) and the action (check for updates), distinguishing it from siblings like 'wp_update_all' which performs updates rather than checking.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for usage with 'Requires: API key with read scope,' indicating prerequisites. However, it does not explicitly state when to use this tool versus alternatives like 'wp_update_all' or other status-checking tools, nor does it mention exclusions or specific scenarios.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

wp_update_all (A)

Update WordPress core, all plugins, and all themes.

Runs all updates in sequence. May take up to 2 minutes.

Requires: API key with write scope.

Args: slug: Site identifier

Returns: {"core": {...}, "plugins": [...], "themes": [...]}
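
Since the call may take up to 2 minutes, a client with a short default timeout could give up before the updates finish. The sketch below shows one way to guard against that. It assumes a caller-supplied call_tool function with the hypothetical signature (name, arguments, timeout_seconds) -> dict; no particular MCP client library is implied, and the 180-second default is an arbitrary margin.

```python
from typing import Callable

# Hypothetical transport signature: (tool name, arguments, timeout) -> result.
ToolCaller = Callable[[str, dict, float], dict]


def update_all(call_tool: ToolCaller, slug: str,
               timeout_seconds: float = 180.0) -> dict:
    """Invoke wp_update_all with a timeout above the documented 2 minutes.

    Hypothetical wrapper; only the tool name and the slug argument
    come from the tool description.
    """
    if timeout_seconds < 120:
        raise ValueError(
            "wp_update_all may take up to 2 minutes; "
            "use a timeout of at least 120 seconds"
        )
    return call_tool("wp_update_all", {"slug": slug}, timeout_seconds)
```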

Parameters
Name  Required  Description  Default
slug  Yes

Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden and adds valuable behavioral context: it discloses the execution time ('May take up to 2 minutes'), authentication requirements ('Requires: API key with write scope'), and the return format. It doesn't mention potential side effects like site downtime or error handling, but covers key operational aspects.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured and front-loaded with the core purpose, followed by behavioral details, requirements, and input/output information. Every sentence earns its place with no wasted words, and it uses clear sections for readability.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (mass updates with potential site impact), no annotations, and no output schema, the description does well by covering purpose, behavior, auth, parameters, and return format. It could improve by mentioning prerequisites (e.g., site must be online) or error cases, but it's largely complete for agent use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema description coverage is 0%, so the description must compensate. It explains the single parameter 'slug' as 'Site identifier', adding essential meaning beyond the schema's bare 'Slug' title. However, it doesn't provide examples or format details (e.g., URL slug vs. internal ID), leaving some ambiguity.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('Update WordPress core, all plugins, and all themes') and the resource ('WordPress site'), distinguishing it from siblings like 'wp_check_updates' (which only checks) and other non-WordPress tools. It precisely defines the scope of updates performed.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for when to use this tool ('Runs all updates in sequence'), but does not explicitly mention when not to use it or name alternatives. It implies usage for comprehensive updates rather than selective ones, though it doesn't reference sibling tools like 'manage_plugin' or 'manage_theme' for individual updates.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

write_file (A)

Write or overwrite a text file in a site's container.

Creates parent directories if they don't exist.

Requires: API key with write scope.

Args:
  slug: Site identifier
  path: Relative path to the file
  content: File content as a UTF-8 string

Returns: {"success": true, "path": "...", "size": 1234}

Errors:
  NOT_FOUND: Unknown slug
  FORBIDDEN: Protected system path
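
The split between write_file (UTF-8 text) and upload_file (base64-encoded binary) can be decided from the bytes themselves. The following is a rough sketch, not the server's own logic: choose_file_tool is a hypothetical helper that treats anything decodable as UTF-8 as text, which can misclassify a binary file that happens to be valid UTF-8.

```python
import base64


def choose_file_tool(slug: str, path: str, data: bytes):
    """Pick write_file for UTF-8 text, otherwise upload_file.

    Hypothetical helper; returns (tool_name, arguments) using the
    parameter names from the two tool descriptions.
    """
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8: send as base64 through upload_file.
        encoded = base64.b64encode(data).decode("ascii")
        return "upload_file", {"slug": slug, "path": path,
                               "content_b64": encoded}
    return "write_file", {"slug": slug, "path": path, "content": text}
```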

Parameters
Name     Required  Description  Default
path     Yes
slug     Yes
content  Yes

Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It successfully describes key behaviors: creates parent directories automatically, requires specific API permissions (write scope), and includes error conditions (NOT_FOUND, FORBIDDEN). It also shows the return structure. The main gap is lack of information about rate limits, idempotency, or concurrency considerations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is efficiently structured with clear sections (purpose, behavior, requirements, args, returns, errors) and every sentence adds value. It's front-loaded with the core functionality, followed by important behavioral details, and uses bullet-like formatting for parameters and errors without unnecessary verbosity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a mutation tool with no annotations and no output schema, the description provides substantial context: purpose, behavior, authentication requirements, parameters, return format, and error conditions. The main gap is lack of information about what happens with concurrent writes or whether the operation is atomic. However, given the tool's complexity and lack of structured metadata, it's quite comprehensive.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage, the description fully compensates by providing clear semantic explanations for all three parameters: 'slug' as site identifier, 'path' as relative file path, and 'content' as UTF-8 string. The description adds essential meaning beyond what the bare schema provides, including encoding information and path relativity.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('Write or overwrite a text file'), identifies the target resource ('in a site's container'), and distinguishes it from sibling tools like 'read_file' (for reading) and 'delete_file' (for deletion). It goes beyond just restating the name by specifying the file type and location context.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for when to use this tool (writing text files to site containers) and implicitly distinguishes it from alternatives like 'upload_file' (likely for binary/upload operations) and 'read_file' (for reading). However, it doesn't explicitly state when NOT to use it or name specific alternative tools for different scenarios.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
