shiply

by now.shiply

Server Details

Publish any app in one call: SQL database, functions, email, and a custom domain. Flat price.

Status: Healthy
Last Tested: 2026-08-02 22:37
Transport: Streamable HTTP
URL

Glama MCP Gateway

Connect through Glama MCP Gateway for full control over tool access and complete visibility into every call.

Full call logging

Every tool call is logged with complete inputs and outputs, so you can debug issues and audit what your agents are doing.

Tool access control

Enable or disable individual tools per connector, so you decide what your agents can and cannot do.

Managed credentials

Glama handles OAuth flows, token storage, and automatic rotation, so credentials never expire on your clients.

Usage analytics

See which tools your agents call, how often, and when, so you can understand usage patterns and catch anomalies.

100% free. Your data is private.

Tool Definition Quality

B (3.4/5.0)

Tool Descriptions: A

Average 4.2/5 across 103 of 103 tools scored. Lowest: 3.2/5.

Server Coherence: C

Disambiguation: 2/5

Many tools have overlapping purposes (e.g., add_domain, add_subdomain, add_custom_domain, add_sending_domain) or similar names like check_domain and check_custom_domain, which can confuse an agent. While descriptions help, the sheer number of closely related tools makes it difficult to consistently select the correct one.

Naming Consistency: 3/5

Most tools follow a verb_noun pattern, but there are exceptions like 'read_thread' (instead of get_thread) and 'mark_thread_read' (verb_noun_verb). Prefixes like 'data_' are used inconsistently across tools, but overall the naming is still readable.

Tool Count: 1/5

With 103 tools, the server is extremely bloated. Even for a comprehensive platform, this count overwhelms any agent and makes the tool surface difficult to navigate. Many tools could be merged or are too granular.

Completeness: 4/5

The toolset covers a wide range of operations across sites, domains, email, marketplace, projects, contracts, and functions. While some updates (e.g., general contract editing) are missing, the surface is mostly comprehensive for the domain.

Available Tools

114 tools

add_custom_domain: Register a custom domain (A)

Idempotent

Register a registrable domain (e.g. example.com) the user owns and detect its DNS provider. Returns the provider and whether one-click connect is available. Then attach sites with add_subdomain.

Parameters

Name	Required	Description	Default
`domain`	Yes	registrable domain you own, e.g. example.com

Output Schema

Parameters

Name	Required	Description
`result`	Yes
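
The register-then-attach flow described above could be driven over MCP roughly as follows. This is a minimal sketch that only builds the JSON-RPC 2.0 `tools/call` request bodies; the helper name `tool_call`, the slug `my-site`, and the subdomain `www` are illustrative assumptions, and actually sending the requests to the server's Streamable HTTP endpoint is left out.

```python
import json
from itertools import count

_ids = count(1)  # monotonically increasing JSON-RPC request ids


def tool_call(name: str, arguments: dict) -> dict:
    """Build an MCP `tools/call` JSON-RPC 2.0 request for one tool."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Step 1: register the registrable domain the user owns.
register = tool_call("add_custom_domain", {"domain": "example.com"})

# Step 2: attach a site to a subdomain of it ("my-site" and "www" are made up).
attach = tool_call(
    "add_subdomain",
    {"slug": "my-site", "domain": "example.com", "subdomain": "www"},
)

print(json.dumps(register, indent=2))
print(json.dumps(attach, indent=2))
```

Because the tool is annotated as idempotent, repeating the first request for the same domain should be harmless, which makes simple retry logic reasonable here.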

Tool Definition Quality

A (4.3/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds context beyond annotations: 'detect its DNS provider', 'one-click connect', and guidance to use add_subdomain next. Annotations indicate idempotent and open world, which are consistent with registration behavior.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences, front-loaded with the action and return value, no unnecessary words. Efficient and clear.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a single-parameter tool with output schema and rich annotations, the description fully covers what the tool does, what it returns, and the next step. No gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so baseline 3 is appropriate. The description repeats schema info ('registrable domain you own') but doesn't add significant new semantics.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'Register', the resource 'custom domain', and specifies it detects DNS provider and returns provider/one-click connect availability. It distinguishes from siblings like add_subdomain by noting the subsequent step.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies when to use (registering a user-owned domain) and mentions the next step (add_subdomain). It provides context but doesn't explicitly state when not to use or list alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

add_domain: Connect a custom domain (A)

Idempotent

Serve a site on a domain the user owns. Returns the CNAME to add (hostname → cname.shiply.now); the certificate issues automatically once DNS resolves. Poll with check_domain.

Parameters

Name	Required	Description	Default
`slug`	Yes	owned site slug to serve there
`hostname`	Yes	full hostname to serve, e.g. www.example.com

Output Schema

Parameters

Name	Required	Description
`result`	Yes
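
A sketch of the "add the CNAME, then poll" step. The record shape and the `poll_until` helper are assumptions made for illustration; the actual output of add_domain and check_domain is not shown on this page, so the readiness check is passed in as a callable rather than parsed from a real response.

```python
import time
from typing import Callable


def cname_record(hostname: str) -> dict:
    """DNS record the user adds at their provider (target taken from the description)."""
    return {"type": "CNAME", "name": hostname, "value": "cname.shiply.now"}


def poll_until(
    is_ready: Callable[[], bool],
    attempts: int = 10,
    delay_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call is_ready() up to `attempts` times, sleeping between tries."""
    for attempt in range(attempts):
        if is_ready():
            return True
        if attempt < attempts - 1:
            sleep(delay_s)  # no sleep after the final attempt
    return False


# Stand-in for a real check_domain call: reports ready on the third poll.
answers = iter([False, False, True])
ready = poll_until(lambda: next(answers), sleep=lambda _seconds: None)

print(cname_record("www.example.com"))
print("ready:", ready)
```

Bounding the number of attempts matters because DNS propagation time is outside the caller's control; an agent should report "still pending" rather than loop forever.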

Tool Definition Quality

A (4.2/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate idempotent and open-world. The description adds valuable behavioral details: returns CNAME, certificate issues automatically after DNS resolves, and suggests polling with check_domain. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three concise sentences, front-loaded with purpose, with no superfluous words and a clear structure.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given low complexity and rich schema/annotations, the description covers the tool's operation, return value, and next steps. It could mention prerequisites (e.g., domain ownership) but that is already implied.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% and both parameters are well-described. The description adds only minor context (hostname example, slug reference), so it meets the baseline but does not exceed it.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool serves a site on a user-owned domain, with a specific verb and resource. It distinguishes from siblings like add_custom_domain by focusing on serving a site and returning the CNAME.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context on when to use (to serve a site on a domain) and what to do after (poll with check_domain). It does not explicitly list alternatives or when not to use, but it is adequate.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

add_sending_domain: Add a sending domain (A)

Idempotent

Register a domain the user owns for outbound demand-test sends. Returns DNS records (SPF, DKIM, MX) to add at the DNS provider. After DNS propagates, call verify_sending_domain. Cannot be a shiply.now subdomain.

Parameters

Name	Required	Description	Default
`domain`	Yes	e.g. mail.yourbrand.com

Output Schema

Parameters

Name	Required	Description
`result`	Yes
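
The constraint "Cannot be a shiply.now subdomain" can be checked client-side before calling the tool. The function below is a hypothetical pre-flight check, not the server's validation; treating the bare `shiply.now` name as disallowed too is an assumption.

```python
def is_allowed_sending_domain(domain: str) -> bool:
    """Return False for empty input and for shiply.now or any of its subdomains."""
    name = domain.strip().lower().rstrip(".")  # normalise case and a trailing dot
    if not name:
        return False
    # Assumption: the apex shiply.now is rejected along with its subdomains.
    return name != "shiply.now" and not name.endswith(".shiply.now")


for candidate in ("mail.yourbrand.com", "demo.shiply.now", "notshiply.now"):
    print(candidate, is_allowed_sending_domain(candidate))
```

Matching on the `.shiply.now` suffix (with the leading dot) avoids rejecting unrelated names that merely end in the same characters.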

Tool Definition Quality

A (4.3/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The annotations declare openWorldHint=true and idempotentHint=true, and the description adds meaningful behavioral context: the tool returns DNS records (SPF, DKIM, MX) and requires subsequent verification. It does not contradict any annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise, with four well-structured sentences. The first sentence immediately states the primary purpose, and the subsequent sentences provide necessary details without redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the simple single-parameter tool with an output schema (implied), the description covers all essential aspects: purpose, required resource, return type, follow-up action, and a constraint. It is fully adequate for the tool's complexity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% description coverage for its only parameter 'domain', with an example ('e.g. mail.yourbrand.com'). The description does not add additional semantics beyond the schema, so a baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Register'), the resource ('a domain the user owns'), and the purpose ('for outbound demand-test sends'). It also specifies that DNS records will be returned and names the next step (call verify_sending_domain). This effectively distinguishes it from sibling tools like add_custom_domain or add_domain.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear usage context: it registers a domain for sending and requires DNS propagation and verification. It also explicitly states a constraint ('Cannot be a shiply.now subdomain'). However, it does not mention when not to use this tool or direct alternatives, so a slight deduction.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

add_subdomain: Point a subdomain at a site (A)

Idempotent

Serve an owned site at <subdomain>.<domain> (use subdomain "@" or "" for the apex; apex needs a provider with CNAME flattening/ALIAS, e.g. Cloudflare). Auto-registers the parent domain. Returns the CNAME record to add (host -> cname.shiply.now); the certificate issues automatically once DNS resolves. Poll with check_domain.

Parameters

Name	Required	Description
`slug`	Yes	owned site slug to serve there
`domain`	Yes	registrable parent domain, e.g. example.com
`subdomain`	Yes	subdomain label, or "@"/"" for the apex

Output Schema

Parameters

Name	Required	Description
`result`	Yes
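
The apex convention (`"@"` or `""`) is easy to get wrong, so here is a small sketch of how the `subdomain` and `domain` parameters presumably combine into the hostname that ends up being served. The function name `full_hostname` and the normalisation steps are assumptions, not the server's implementation.

```python
def full_hostname(subdomain: str, domain: str) -> str:
    """Join a subdomain label and a registrable domain; "@" or "" means the apex."""
    label = subdomain.strip().lower()
    base = domain.strip().lower().rstrip(".")
    if label in ("", "@"):
        return base  # apex: needs CNAME flattening/ALIAS at the DNS provider
    return f"{label}.{base}"


print(full_hostname("www", "example.com"))
print(full_hostname("@", "example.com"))
print(full_hostname("", "example.com"))
```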

Tool Definition Quality

A (4.3/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations provide openWorldHint and idempotentHint, indicating that the tool interacts with external systems and that repeated calls are safe. The description adds behavioral details: it auto-registers the parent domain, returns the CNAME record, and notes that the certificate issues automatically after DNS resolves. This provides useful context beyond annotations, though it could state whether any step is destructive.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is four sentences, covering the main function, apex handling, return value, and follow-up action. It is efficiently written, front-loaded with the core purpose, and easy to parse.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (3 required params, no nested objects), the description is complete. It explains the return value (CNAME record) and the subsequent step (poll with check_domain). The presence of an output schema (not shown) further reduces the need to describe return format. No gaps identified.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with clear descriptions for each parameter. The description reiterates the schema's definitions (e.g., 'owned site slug', 'registrable parent domain') but does not add new meaning or format details. Baseline 3 is appropriate as schema already documents the parameters effectively.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Serve an owned site at <subdomain>.<domain>') and specifies the resource (subdomain on a domain). It distinguishes from siblings like add_domain by focusing on subdomain mapping and includes special handling for apex. The verb 'point' and resource 'subdomain' are specific and unambiguous.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear usage context: use when you want to serve a site at a subdomain. It mentions apex requirements and advises polling with check_domain. However, it does not explicitly state when not to use this tool or list alternatives (e.g., add_custom_domain), so it's slightly lacking in exclusion guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

add_suppression: Suppress an email or domain (A)

Idempotent

Add an address (or whole domain) to the user's suppression list. Future confirmations and broadcasts will skip it across every test. Use kind='email' for a single address, kind='domain' for everyone @example.com.

Parameters

Name	Required	Description
`kind`	Yes	'email' for one address, 'domain' for everyone @example.com
`notes`	No	optional reason / note for the suppression
`value`	Yes	e.g. 'spammy@example.com' or 'competitor.com'

Output Schema

Parameters

Name	Required	Description
`result`	Yes
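
To make the `kind='email'` versus `kind='domain'` distinction concrete, the sketch below shows one plausible way a suppression list could be applied to a recipient. The `Suppression` class and `is_suppressed` function are hypothetical, and case-insensitive, exact-domain matching is an assumption; the server's real matching rules are not documented on this page.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Suppression:
    kind: str   # 'email' for one address, 'domain' for everyone at that domain
    value: str  # e.g. 'spammy@example.com' or 'competitor.com'


def is_suppressed(address: str, rules: List[Suppression]) -> bool:
    """Return True if the address matches any email or domain rule."""
    addr = address.strip().lower()
    domain = addr.rpartition("@")[2]  # text after the last "@"
    for rule in rules:
        value = rule.value.strip().lower()
        if rule.kind == "email" and addr == value:
            return True
        # Assumption: only an exact domain match counts, not subdomains.
        if rule.kind == "domain" and domain == value:
            return True
    return False


rules = [
    Suppression("email", "spammy@example.com"),
    Suppression("domain", "competitor.com"),
]
for recipient in ("spammy@example.com", "bob@competitor.com", "alice@example.com"):
    print(recipient, is_suppressed(recipient, rules))
```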

Tool Definition Quality

A (4.2/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate idempotentHint=true, and the description adds behavioral context by stating that suppressed addresses are skipped across every test for future confirmations and broadcasts. This goes beyond the annotation without contradicting it.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences convey the purpose, effect, and parameter usage with no wasted words. The description is front-loaded with the action and immediately informative.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the presence of an output schema (not shown) and good annotations, the description covers the tool's effect adequately. It mentions cross-test impact, but could briefly mention return value format; still, it is largely complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the baseline is 3. The description reinforces the schema examples but does not add new parameter details beyond what is already in the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description explicitly states the tool adds an address or domain to the suppression list, with clear distinction from sibling tools like remove_suppression. The effect on future confirmations and broadcasts is specified, making the purpose unmistakable.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit guidance on when to use kind='email' vs kind='domain', covering the two main use cases. It does not explicitly state when not to use the tool, but the context is sufficient for an AI agent to decide appropriately.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

archive_project: Archive a project (A)

Idempotent

Move a project to status='archived'. Hidden from the default dashboard list. Optional reason is shown on the project page. Restore later with restore_project.

Parameters

Name	Required	Description	Default
`id`	Yes	project id to archive
`reason`	No	optional reason shown on the project page

Output Schema

Parameters

Name	Required	Description
`result`	Yes

Tool Definition Quality

A (4.2/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Beyond the idempotentHint annotation, the description explains the specific behavioral effect (status change, hidden from dashboard) and the optional reason display. It does not contradict annotations and provides useful context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four short sentences, zero waste: action stated first, then effect, then the optional reason, then the restoration option. Front-loaded with the core purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity (2 params, output schema exists, idempotentHint) and sibling context, the description covers the key points: what it does, behavioral effect, and how to revert. It does not explain the output format, but that is handled by the output schema.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the baseline is 3. The description merely reiterates the parameter purpose ('optional reason is shown on the project page') without adding new meaning.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description uses a specific verb ('Move') and resource ('project'), explicitly stating the status change to 'archived' and the effect of hiding from dashboard. It distinguishes from sibling tools like 'archive_thread' and mentions the counterpart 'restore_project'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description indicates when to use the tool (to hide a project from default list without deletion) and provides an alternative for reversal ('Restore later with restore_project'), though it lacks explicit exclusions or when not to use.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

archive_thread: Archive a thread (A)

Idempotent

Soft-archive the thread (sets archivedAt). Hidden from the default inbox view; surface again with list_inbox filter=archived. Reverse with unarchive_thread.

Parameters

Name	Required	Description	Default
`threadId`	Yes	thread id from list_inbox to archive

Output Schema

Parameters

Name	Required	Description
`result`	Yes
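
A toy in-memory model of what "soft-archive (sets archivedAt)" combined with the idempotent annotation could mean in practice. Everything here is illustrative: the `Thread` class, the `view` argument standing in for `filter=archived`, and the choice to keep the first timestamp on repeat calls are assumptions, not the server's code.

```python
from datetime import datetime, timezone
from typing import List, Optional


class Thread:
    def __init__(self, thread_id: str) -> None:
        self.thread_id = thread_id
        self.archived_at: Optional[datetime] = None  # None means "in the inbox"


def archive_thread(thread: Thread) -> Thread:
    """Soft-archive: set archived_at once; repeat calls change nothing."""
    if thread.archived_at is None:
        thread.archived_at = datetime.now(timezone.utc)
    return thread


def unarchive_thread(thread: Thread) -> Thread:
    """Reverse the archive by clearing the timestamp."""
    thread.archived_at = None
    return thread


def list_inbox(threads: List[Thread], view: str = "default") -> List[Thread]:
    """Default view hides archived threads; view='archived' shows only them."""
    want_archived = view == "archived"
    return [t for t in threads if (t.archived_at is not None) == want_archived]


threads = [Thread("t1"), Thread("t2")]
archive_thread(threads[0])
first_stamp = threads[0].archived_at
archive_thread(threads[0])  # second call is a no-op in this model

print([t.thread_id for t in list_inbox(threads)])
print([t.thread_id for t in list_inbox(threads, view="archived")])
```

Nothing is deleted in this model, which is the property that makes the operation safe to reverse with unarchive_thread.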

Tool Definition Quality

A (4.5/5.0)

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds key behavioral details beyond the idempotentHint annotation: it is a 'soft-archive' (mutates a field), it hides the thread from default view but makes it accessible via a filter, and it is reversible. This fully informs the agent of the tool's behavior.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise, consisting of three short sentences that convey all necessary information without any redundant or extraneous text.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the simple nature of the tool (one parameter, straightforward effect), the description completely and adequately explains the tool's purpose, behavior, and context. No additional information is needed.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, and the parameter 'threadId' is described in the schema as 'thread id from list_inbox to archive'. The description adds no additional semantic information beyond what the schema provides, so baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'soft-archive' and the resource 'thread', specifying the effect of setting 'archivedAt'. It distinguishes itself from related tools like 'unarchive_thread' and 'list_inbox'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explains when to use the tool (to hide from default inbox) and how to reverse it with 'unarchive_thread'. It also notes that the thread can be surfaced again with a filter. However, it does not explicitly state when not to use it or provide alternatives beyond the reverse operation.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

attach_database: Attach a database to a site (A)

Idempotent

Bind an existing database to one owned site's Worker env (the binding name chosen at create_database time). Takes effect on the site's next function/publish deploy.

Parameters

Name	Required	Description	Default
`id`	Yes	database id (from list_databases)
`siteSlug`	Yes	owned site slug

Output Schema

Parameters

Name	Required	Description
`result`	Yes
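
The phrase "takes effect on the site's next function/publish deploy" implies a gap between attaching and the binding being visible to running code. The toy model below illustrates that gap; the `Site` class, its `pending`/`live` sets, and the binding name `DB` are invented for the example and say nothing about how the platform stores bindings.

```python
from typing import Set


class Site:
    def __init__(self, slug: str) -> None:
        self.slug = slug
        self.pending: Set[str] = set()  # attached, not yet deployed
        self.live: Set[str] = set()     # visible to the running Worker

    def attach(self, binding: str) -> None:
        """Record a binding; using a set makes repeat attaches a no-op."""
        self.pending.add(binding)

    def deploy(self) -> None:
        """A deploy promotes every pending binding to the live environment."""
        self.live |= self.pending
        self.pending.clear()


site = Site("my-site")
site.attach("DB")
site.attach("DB")  # repeat attach leaves the same state
print("before deploy:", sorted(site.live))
site.deploy()
print("after deploy:", sorted(site.live))
```

For an agent, the practical consequence is ordering: attach first, then trigger a deploy, and only then expect function code to see the binding.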

Tool Definition Quality

A (4.3/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate idempotentHint=true. The description adds behavioral context beyond annotations by explaining that the binding takes effect on the next deploy, implying a non-immediate effect. This provides useful transparency without contradicting annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences long, front-loads the action, and contains no unnecessary words. Every sentence adds value, making it highly concise and well-structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple attachment tool with two required parameters, high schema coverage, annotations, and an output schema, the description is fully complete. It explains the action, timing, and prerequisite without needing to elaborate on return values.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema covers 100% of parameters with clear descriptions (database id from list_databases, owned site slug). The description adds context about the binding name set at creation, but this does not significantly enhance understanding of the parameters themselves. Baseline of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('bind an existing database to one owned site's Worker env') and identifies the specific resources (database, site). It distinguishes from sibling tools like create_database and delete_database by focusing on binding an existing database rather than creating or deleting.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context by stating that the binding takes effect on the next deploy and mentions the prerequisite that the binding name was set at create_database time. However, it does not explicitly list alternatives or when not to use this tool, though the context is sufficiently specific.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

attach_variable: Attach a variable to a site (A)

Idempotent

Expose one saved variable to ONE owned site's Worker env (plain_text binding). Opt-in per site — unattached variables are never injected, because the Worker runs the site's own code. Takes effect on the site's next function deploy.

Parameters

Name	Required	Description	Default
`name`	Yes	variable name to attach, e.g. SUPABASE_URL
`slug`	Yes	owned site slug

Output Schema

Parameters

Name	Required	Description
`result`	Yes
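
The opt-in rule ("unattached variables are never injected") amounts to filtering the account's saved variables by a per-site allow list. The following sketch shows that filtering under assumed data shapes; `worker_env`, the `attachments` mapping, and the sample values are hypothetical.

```python
from typing import Dict, Set


def worker_env(
    saved: Dict[str, str],
    attachments: Dict[str, Set[str]],
    slug: str,
) -> Dict[str, str]:
    """Return only the saved variables explicitly attached to this site."""
    attached = attachments.get(slug, set())  # sites with no attachments get nothing
    return {name: value for name, value in saved.items() if name in attached}


saved = {"SUPABASE_URL": "https://db.example.test", "API_KEY": "not-a-real-key"}
attachments = {"blog": {"SUPABASE_URL"}}

print(worker_env(saved, attachments, "blog"))
print(worker_env(saved, attachments, "shop"))
```

Defaulting to an empty environment is the safer design here, since each site's Worker runs that site's own code and should see only what was deliberately shared with it.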

Tool Definition Quality

A (4.2/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Adds behavioral details beyond annotations: binding type (plain_text), opt-in per site, effect on next function deploy. Consistent with idempotentHint=true. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three concise sentences with no fluff: core action, security context, and deployment timing. Every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Complete for a simple tool with two params and an output schema. Explains binding type, effect timing, and opt-in nature. Does not explain the implications of the 'Worker env', but the output schema likely covers return values.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema covers both parameters with descriptions (name, slug) at 100% coverage. Description reinforces 'saved variable' and 'owned site' but adds minimal new semantics beyond schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states that the tool exposes a saved variable to one owned site's Worker environment as a plain_text binding. It specifies the resource (variable) and target (site), and distinguishes the tool from siblings like delete_variable, set_variable, or attach_database.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides clear context: use when you need a variable accessible to a site's Worker, and notes unattached variables are never injected. Does not explicitly name alternatives or exclusions, but gives enough guidance for correct use.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

check_custom_domain: Check a custom domain (grade A)

Read-only

Check whether a custom domain's subdomains are live: re-polls Cloudflare cert status + probes DNS/TLS/HTTPS on each subdomain. Poll this after connect_provider / add_subdomain to confirm the domain is serving. Returns per-subdomain status + tls + http + ready.

Parameters

Name	Required	Description	Default
`domain`	Yes	registered custom domain to check, e.g. example.com

Output Schema

Name	Required	Description
`result`	Yes
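
The polling guidance above can be sketched as a small loop. This is a minimal sketch, not the server's client: `call_tool` is a hypothetical callable standing in for whatever MCP client you use, and the `subdomains` list with a per-entry `ready` flag is an assumed result shape inferred from "per-subdomain status + tls + http + ready". The attempt count and delay are arbitrary.

```python
import time
from typing import Any, Callable, Dict


def wait_until_serving(
    call_tool: Callable[[str, Dict[str, Any]], Dict[str, Any]],
    domain: str,
    attempts: int = 10,
    delay_seconds: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll check_custom_domain until every subdomain reports ready."""
    for attempt in range(attempts):
        result = call_tool("check_custom_domain", {"domain": domain})
        # ASSUMPTION: the result looks like {"subdomains": [{"ready": bool, ...}, ...]}.
        subdomains = result.get("subdomains", [])
        if subdomains and all(entry.get("ready") for entry in subdomains):
            return True
        # Wait between polls, but not after the final attempt.
        if attempt < attempts - 1:
            sleep(delay_seconds)
    return False
```

Injecting `sleep` keeps the loop testable without real waiting; an empty subdomain list is treated as "not serving yet" rather than as success.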

Tool Definition Quality

A 4.7/5.0

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Description adds significant behavioral context beyond annotations: re-polls cert, probes DNS/TLS/HTTPS, and returns per-subdomain status. No contradiction with read-only and open-world hints.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences with no filler. Front-loaded with the main action, followed by when to poll and what is returned. Every word earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given one required parameter, annotation hints, and existence of output schema (not shown), the description covers what it does and what it returns. No gaps for typical usage.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Single parameter with schema providing description and example. The description repeats the parameter purpose but adds no extra semantics beyond the schema; however, the example is helpful. With 100% schema coverage, baseline is 3, so 4 for the example.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states that the tool checks whether a custom domain's subdomains are live, with specific actions (re-polls Cloudflare cert, probes DNS/TLS/HTTPS). It distinguishes itself from siblings like check_domain by mentioning that it is used after connect_provider/add_subdomain.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly says to poll after connect_provider or add_subdomain to confirm domain is serving. While it doesn't list when not to use or alternatives, the context is clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

check_domain: Check a custom domain (grade A)

Read-only

Refresh certificate status + live TLS/HTTPS probe for a connected domain (by id from list_domains).

Parameters

Name	Required	Description	Default
`id`	Yes	connected domain id from list_domains

Output Schema

Name	Required	Description
`result`	Yes

Tool Definition Quality

A 4.3/5.0

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint and openWorldHint; the description adds context that it performs a live probe, which aligns with openWorldHint and provides transparency beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence, front-loaded with key action and resource, no redundant words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With full schema coverage, annotations, and an output schema, the description fully explains the tool's behavior and parameter source; no gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%; the description merely restates the schema's parameter description ('connected domain id from list_domains'), adding no new meaning.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description specifies the action ('refresh certificate status + live TLS/HTTPS probe') and the resource ('connected domain'), and distinguishes the tool from siblings by requiring the domain id from list_domains.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description clearly implies the prerequisite of having a domain id from list_domains, but does not explicitly state when to use this tool over alternatives like check_custom_domain.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

connect_provider: Connect the domain's DNS provider (grade A)

Idempotent

Start one-click DNS connect for a registered custom domain. For Cloudflare-hosted domains this returns an authorize URL — SHOW THE USER the url as a clickable link; after they authorize, records are written automatically. For other providers, add the records manually.

Parameters

Name	Required	Description	Default
`domain`	Yes	registered custom domain to connect, e.g. example.com

Output Schema

Name	Required	Description
`result`	Yes

Tool Definition Quality

A 4.3/5.0

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Beyond annotations (openWorldHint, idempotentHint), the description discloses key behaviors: Cloudflare returns an authorize URL and auto-writes records after authorization; other providers require manual record addition. This adds value without contradicting annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences, front-loaded purpose, no wasted words. Every sentence is essential: one for the action, one for the Cloudflare authorize flow, and one for other providers.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (two scenarios), one parameter, and the presence of an output schema, the description is complete. It covers the key behavioral difference and guides the agent on handling Cloudflare vs. other providers.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% for the single parameter 'domain'. The description adds minimal extra meaning beyond the schema's 'registered custom domain to connect'—it provides an example but no new semantic detail.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Start one-click DNS connect') and the resource ('registered custom domain'), with specific verb and resource. It distinguishes between Cloudflare and other providers, setting it apart from sibling tools like add_domain or check_domain.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit usage guidance: for Cloudflare hosts, show the authorize URL; for others, add records manually. It does not explicitly state when not to use it, but the context is clear. Sibling tools imply prerequisites like adding a domain first.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

contract_amend: Create an amendment to a signed contract (grade A)

Create an amendment to a SIGNED parent contract. Scope delta required; fee delta and target date optional. Returns the draft amendment for editing before send — call contract_send with the returned amendment id to fire it. Cannot amend an amendment (amend the parent instead).

Parameters

Name	Required	Description
`scopeDelta`	Yes	What's changing — visible to customer.
`feeDeltaCents`	No	Optional fee adjustment in cents. Can be negative for descope.
`parentContractId`	Yes	id of the SIGNED parent contract to amend
`targetCompletionDate`	No	Optional ISO date (YYYY-MM-DD) for revised completion.

Output Schema

Name	Required	Description
`result`	Yes
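
The parameter rules above (scope delta and parent id required, fee delta and target date optional) can be checked client-side before calling the tool. The following is a minimal sketch under stated assumptions: the function name is hypothetical, `feeDeltaCents` is assumed to be an integer, and the server's own validation may differ.

```python
import re
from datetime import date
from typing import Any, Dict, Optional

ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def build_amend_arguments(
    parent_contract_id: str,
    scope_delta: str,
    fee_delta_cents: Optional[int] = None,
    target_completion_date: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble contract_amend arguments, omitting unset optional fields."""
    if not parent_contract_id:
        raise ValueError("parentContractId is required")
    if not scope_delta.strip():
        raise ValueError("scopeDelta is required")
    args: Dict[str, Any] = {
        "parentContractId": parent_contract_id,
        "scopeDelta": scope_delta,
    }
    if fee_delta_cents is not None:
        # ASSUMPTION: cents are whole numbers; negative values mean a descope.
        if isinstance(fee_delta_cents, bool) or not isinstance(fee_delta_cents, int):
            raise ValueError("feeDeltaCents must be an integer number of cents")
        args["feeDeltaCents"] = fee_delta_cents
    if target_completion_date is not None:
        if not ISO_DATE.fullmatch(target_completion_date):
            raise ValueError("targetCompletionDate must be YYYY-MM-DD")
        date.fromisoformat(target_completion_date)  # raises ValueError for impossible dates
        args["targetCompletionDate"] = target_completion_date
    return args
```

The returned amendment is still a draft; nothing reaches the customer until contract_send is called with the amendment id.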

Tool Definition Quality

A 4.9/5.0

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses that the tool creates a draft amendment (non-destructive creation) and requires a signed parent contract. The idempotentHint=false annotation is honored; the description adds context about the two-step process and state requirement.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four concise sentences with no fluff. Front-loaded with core purpose, followed by parameter requirements, workflow, and constraint. Every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the presence of an output schema and full parameter descriptions, the description adequately covers the creation workflow, state prerequisites, and integration with contract_send. No gaps for agent decision-making.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema already covers all parameters (100% coverage), but the description adds workflow context (e.g., 'scope delta required; fee delta and target date optional'). The 'negative for descope' constraint appears in the schema rather than the description. Minor extra value beyond schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'Create' and the resource 'amendment to a SIGNED parent contract'. It distinguishes the tool from siblings like contract_draft and contract_send by specifying the workflow and constraints (e.g., cannot amend an amendment).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit when-to-use (create amendment to signed parent) and when-not (cannot amend an amendment). Also guides next steps ('call contract_send with the returned amendment id to fire it') and clarifies required/optional parameters.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

contract_draft: Draft a contract from a brief_ready project (grade A)

Idempotent

Draft a contract from a brief_ready project. Auto-fills 8 fields from the AI brief, Stripe Connect default currency, and dev profile. Returns the contract row with status='draft' so the dev can review fields before sending. After this, edit fields via PATCH /api/v1/contracts/{id} (no MCP edit tool yet), then call contract_send to fire it.

Parameters

Name	Required	Description	Default
`projectId`	Yes	Project to draft a contract for

Output Schema

Name	Required	Description
`result`	Yes

Tool Definition Quality

A 4.3/5.0

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description discloses that it auto-fills 8 fields, uses default currency and dev profile, and returns a draft status. This adds behavioral context beyond the idempotentHint annotation. It does not contradict annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is four sentences, front-loads the primary purpose, and then concisely adds auto-fill, return value, and workflow context. Every sentence provides value with no redundancy or filler.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the single parameter, annotations, and presence of an output schema, the description fully explains the tool's behavior, return value (contract row with status='draft'), and subsequent steps (edit and send). No gaps remain.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The sole parameter 'projectId' has a schema description ('Project to draft a contract for') that already clarifies its purpose. The description does not add additional semantics beyond what the schema provides, meeting the baseline for 100% coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb ('Draft a contract') and the resource ('a brief_ready project'). It explains the auto-fill behavior and return value, and contrasts with sibling tools like contract_amend and contract_send, providing clear differentiation.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description outlines the workflow: draft from brief_ready project, then edit via PATCH API (noting no MCP edit tool yet), then call contract_send. This provides context for when to use and what to do next, though it does not explicitly state when not to use.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

contract_pdf: Get the signed contract PDF (base64) (grade A)

Read-only

Get the signed contract PDF as a base64-encoded download. PDF includes the contract, signature certificate, and any signed amendments. Returns { filename, contentType, base64 }. Errors with conflict:contract_not_signed if the parent contract has not been signed yet.

Parameters

Name	Required	Description	Default
`contractId`	Yes	signed contract id (parent or amendment) to render as PDF

Output Schema

Name	Required	Description
`result`	Yes
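
Since the tool returns `{ filename, contentType, base64 }`, a caller has to decode the payload before saving or forwarding it. The sketch below shows one way to do that; the function name is hypothetical, and the `%PDF-` prefix check is a generic sanity check on PDF files rather than anything the server documents.

```python
import base64
import binascii
from pathlib import PurePosixPath
from typing import Any, Dict, Tuple


def decode_contract_pdf(result: Dict[str, Any]) -> Tuple[str, bytes]:
    """Return (safe filename, raw PDF bytes) from a contract_pdf result."""
    try:
        data = base64.b64decode(result["base64"], validate=True)
    except binascii.Error as exc:
        raise ValueError("result['base64'] is not valid base64") from exc
    # PDF files begin with the "%PDF-" header.
    if not data.startswith(b"%PDF-"):
        raise ValueError("decoded payload does not look like a PDF")
    # Keep only the final path component so the name cannot point outside a target folder.
    filename = PurePosixPath(result["filename"]).name
    return filename, data
```

A caller could then write the bytes with `pathlib.Path(filename).write_bytes(data)`. A conflict:contract_not_signed error would surface before this step, so it is not handled here.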

Tool Definition Quality

A 4.0/5.0

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate read-only. The description adds detail about the PDF content (contract, signature, amendments) and the return format, as well as the specific error case. This goes beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four sentences, no wasted words. Each sentence adds value: purpose, content details, return format, error condition. Front-loaded with main action.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With an output schema present, the description sufficiently covers behavior, return structure, and a key error. Minor omission of other potential errors (e.g., invalid ID), but adequate for a simple tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% and the description does not add significant new semantics beyond the schema's description of contractId as 'signed contract id'. Baseline 3 applies.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states that the tool retrieves a signed contract PDF as base64, specifying the resource (signed contract PDF) and verb (get). It distinguishes the tool from siblings like contract_status or contract_draft by focusing on PDF download.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for signed contracts via the error condition, but does not explicitly state when to use over alternatives like contract_status. No direct comparison with siblings is provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

contract_send: Send a draft contract to the customer (grade A)

Send a draft contract to the customer. Validates all 8 fields are non-empty, computes content_hash, flips project to contract_sent, fires the customer email. Same handler works for amendment drafts — sending an amendment does not move project state.

Parameters

Name	Required	Description	Default
`contractId`	Yes	draft contract id to send (from contract_draft)

Output Schema

Name	Required	Description
`result`	Yes
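
Because sending fires the customer email and flips project state, it may help to gate the call behind an explicit review of the draft. The sketch below chains contract_draft and contract_send; `call_tool` and `review` are hypothetical callables, and reading the contract id from an `id` key on the returned row is an assumption.

```python
from typing import Any, Callable, Dict, Optional

ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def draft_and_send(
    call_tool: ToolCaller,
    project_id: str,
    review: Callable[[Dict[str, Any]], bool],
) -> Optional[Dict[str, Any]]:
    """Draft a contract, let a reviewer approve it, then send it."""
    draft = call_tool("contract_draft", {"projectId": project_id})
    # ASSUMPTION: the returned contract row exposes its id under "id".
    contract_id = draft["id"]
    # Field edits, if any, happen out of band (the description mentions a PATCH endpoint).
    if not review(draft):
        return None  # nothing is sent, so no email goes out
    return call_tool("contract_send", {"contractId": contract_id})
```

Returning `None` when the review fails keeps the side-effecting call strictly opt-in.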

Tool Definition Quality

A 4.7/5.0

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses validation, content_hash computation, state change to contract_sent, and email firing. Also clarifies amendment behavior. No annotation contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences, front-loaded with purpose. Every sentence provides essential information without redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given one parameter, output schema, and annotations, the description covers validation, state changes, email, and amendment handling comprehensively.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Parameter contractId is well described in schema. Description adds context about field validation, enhancing understanding beyond schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states 'Send a draft contract to the customer.' Differentiates from siblings by noting amendment drafts do not move project state, distinguishing from contract_amend and contract_draft.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides context on when to use: validates all 8 fields are non-empty, and works for both regular and amendment drafts. Lacks explicit when-not alternatives, but implicit guidance is sufficient.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

contract_status: Read contract state + amendments (grade A)

Read-only

Read the current state of a contract: status, sent_at, viewed_at, signed_at, signer info, content_hash, plus any amendments. Use this to check whether a customer has signed yet. Returns { contract, amendments } — the contract row matches GET /api/v1/contracts/{id}.

Parameters

Name	Required	Description	Default
`contractId`	Yes	contract id to read state for

Output Schema

Name	Required	Description
`result`	Yes
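
To check whether a customer has signed, a caller can reduce the `{ contract, amendments }` result to a few flags. This is a minimal sketch: the function name is hypothetical, and it assumes `signed_at` and `viewed_at` are null or absent until the corresponding event has happened.

```python
from typing import Any, Dict


def signing_summary(result: Dict[str, Any]) -> Dict[str, Any]:
    """Summarise a contract_status result for a quick signed/unsigned check."""
    contract = result["contract"]
    amendments = result.get("amendments") or []
    signed_at = contract.get("signed_at")
    return {
        "status": contract.get("status"),
        # ASSUMPTION: a missing or null timestamp means the event has not occurred.
        "signed": signed_at is not None,
        "signed_at": signed_at,
        "viewed": contract.get("viewed_at") is not None,
        "amendment_count": len(amendments),
    }
```

Relying on the timestamp rather than on specific `status` strings avoids guessing the set of status values, which the description does not enumerate.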

Tool Definition Quality

A 4.3/5.0

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations declare readOnlyHint=true, and the description reinforces a read-only behavior ('Read the current state...'). It further discloses the precise shape of the return value ('Returns { contract, amendments }'), adding value beyond the annotation.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three sentences long, front-loaded with purpose and key fields, followed by a usage hint and return structure. Every sentence serves a clear function; no extraneous words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given a single-parameter read-only tool with an existing output schema (not shown but referenced), the description is complete. It tells the agent exactly what data is returned and when to use it, leaving no ambiguity for a simple read operation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

There is only one parameter with 100% schema coverage. The description restates the parameter's function ('contract id to read state for'), which aligns with the schema's description. No additional semantic nuance is added.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states a specific verb ('Read') and resource ('current state of a contract') and enumerates key fields (status, timestamps, signer info, content_hash, amendments). It distinguishes itself from sibling tools like contract_amend, contract_draft, contract_send.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly states a use case ('Use this to check whether a customer has signed yet'), providing clear context for when to apply the tool. It does not formally outline when not to use it, but the purpose is unambiguous among siblings.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

create_database: Create a database (grade A)

Idempotent

Provision a SQL database — D1 (default, free) or Neon Postgres (--postgres, developer plan). Optionally attach it to an owned site's Worker env in the same call (siteSlug); otherwise attach it later with attach_database.

Parameters

Name	Required	Description
`name`	Yes	database name, a-z 0-9 -, 2-63 chars
`binding`	No	Worker binding name, UPPER_SNAKE (defaults to a name derived from `name`)
`provider`	No	defaults to d1
`siteSlug`	No	owned site slug to attach immediately

Output Schema

Name	Required	Description
`result`	Yes
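
The name constraint in the schema (a-z 0-9 -, 2-63 chars) is easy to check before the call. The sketch below does that and assembles the arguments; the function name is hypothetical, the UPPER_SNAKE pattern is an assumed reading of the `binding` description, and `provider` is passed through unvalidated because the source lists no values beyond the d1 default.

```python
import re
from typing import Any, Dict, Optional

NAME_PATTERN = re.compile(r"[a-z0-9-]{2,63}")
# ASSUMPTION: UPPER_SNAKE means uppercase letters, digits, and underscores, starting with a letter.
BINDING_PATTERN = re.compile(r"[A-Z][A-Z0-9_]*")


def build_create_database_arguments(
    name: str,
    binding: Optional[str] = None,
    provider: Optional[str] = None,
    site_slug: Optional[str] = None,
) -> Dict[str, Any]:
    """Validate the database name and assemble create_database arguments."""
    if not NAME_PATTERN.fullmatch(name):
        raise ValueError("name must be 2-63 chars of a-z, 0-9, or '-'")
    args: Dict[str, Any] = {"name": name}
    if binding is not None:
        if not BINDING_PATTERN.fullmatch(binding):
            raise ValueError("binding must be UPPER_SNAKE")
        args["binding"] = binding
    if provider is not None:
        args["provider"] = provider  # omitted means the server default (d1)
    if site_slug is not None:
        args["siteSlug"] = site_slug  # attach to this owned site in the same call
    return args
```

Leaving `siteSlug` out provisions the database without attaching it, in which case attach_database can be used later.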

Tool Definition Quality

A 4.4/5.0

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide idempotentHint=true. The description adds value by disclosing provider plans (free D1, developer Neon), default behavior, and the attachment integration. No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences, front-loading the main action and provider options, then adding the optional attachment detail. Every word is necessary and well-structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the 4 parameters, 1 required, and an output schema, the description covers the core workflow. It could mention the binding parameter's role more explicitly, but overall it is sufficient for an agent to select and use the tool correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so each parameter is documented. The description adds integration context (siteSlug for immediate attachment, provider selection via --postgres) beyond the schema. This enriches understanding without repeating schema details.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool provisions a SQL database with specific providers (D1 or Neon) and optionally attaches it to a Worker environment. It differentiates from sibling tools like attach_database by indicating that attachment can be done in the same call or later.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear when-to-use guidance: it shows the default provider (D1) and alternative (Neon with --postgres), and explains how to attach immediately via siteSlug or later via attach_database. It does not explicitly state when not to use it, but the optionality is clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

create_drive: Create a drive (grade A)

Create a private cloud Drive (plan-limited). Pass client to file it under a client.

Parameters

Name	Required	Description	Default
`name`	Yes	display name for the new drive
`client`	No	optional: file this under a client (the publish/site is grouped in their customer view)

Output Schema

Name	Required	Description
`result`	Yes

Tool Definition Quality

A 4.0/5.0

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description discloses the creation operation and mentions plan limitations, adding context beyond the idempotentHint annotation. However, it does not detail the behavioral impact (e.g., immediate availability, response structure) or error scenarios.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise with two short sentences that convey the essential information without any filler or repetition.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the presence of an output schema and the simplicity of the tool, the description is fairly complete. It covers the creation action, resource type, plan restriction, and optional client association. Missing details like error handling are minor for this tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema already describes both parameters with 100% coverage. The description adds minor value by mentioning the client parameter's purpose ('file it under a client'), but does not provide new information beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's action 'Create a private cloud Drive' and specifies the resource type and a constraint ('plan-limited'). It distinguishes from sibling drive tools like list_drives and drive_delete_file by focusing on creation.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides guidance on the optional client parameter ('Pass client to file it under a client'), which helps the agent understand when to use that parameter. However, it does not explicitly state when to use this tool versus alternatives or list exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

create_listing: List one of my sites for sale. Grade: A

Publish (or upsert) a marketplace listing for an owned site. Requires Stripe Connect set up (status='ready' — see get_connect_status). priceCents = whole-dollar between 100 and 999900. termsMode='standard' uses shiply's template; 'custom' requires termsCustom ≥50 chars. jurisdiction is required (e.g. 'California, USA').

ParametersJSON Schema

Name	Required	Description
`pitch`	No	short sales pitch, ≤280 chars
`status`	No	publish state (default draft)
`siteSlug`	Yes	slug of the owned site to list
`termsMode`	Yes	'standard' uses shiply's template; 'custom' requires termsCustom
`priceCents`	Yes	whole-dollar price in cents, 100–999900
`termsCustom`	No	custom terms text, ≥50 chars, required when termsMode='custom'
`jurisdiction`	Yes	governing jurisdiction, e.g. 'California, USA'
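
The constraints above can be checked on the client side before calling the tool. The following is a minimal sketch, not shiply's own validation: the function name `validate_listing_args` is hypothetical, and reading "whole-dollar" as "a multiple of 100 cents" is an assumption.

```python
from typing import Any, Dict, List


def validate_listing_args(args: Dict[str, Any]) -> List[str]:
    """Return the problems found in create_listing arguments (empty list if none)."""
    errors: List[str] = []

    # Required fields according to the parameter table.
    for field in ("siteSlug", "termsMode", "priceCents", "jurisdiction"):
        if field not in args or args[field] in (None, ""):
            errors.append(f"missing required field: {field}")

    price = args.get("priceCents")
    if price is not None:
        # Assumption: "whole-dollar" means the cent amount is a multiple of 100.
        if not isinstance(price, int) or isinstance(price, bool):
            errors.append("priceCents must be an integer")
        elif not 100 <= price <= 999_900:
            errors.append("priceCents must be between 100 and 999900")
        elif price % 100 != 0:
            errors.append("priceCents must be a whole-dollar amount")

    mode = args.get("termsMode")
    if mode not in (None, "", "standard", "custom"):
        errors.append("termsMode must be 'standard' or 'custom'")
    if mode == "custom" and len(args.get("termsCustom") or "") < 50:
        errors.append("termsCustom must be at least 50 characters when termsMode='custom'")

    if len(args.get("pitch") or "") > 280:
        errors.append("pitch must be at most 280 characters")

    return errors
```

The server remains the source of truth. A check like this only saves a round trip when an agent assembles arguments that are obviously out of range.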

Output Schema

ParametersJSON Schema

Name	Required	Description
`result`	Yes

Tool Definition Quality

A4.2/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds behavioral context beyond annotations (only idempotentHint: false) by stating the action is an upsert and listing acceptance criteria (e.g., price range, termsMode constraints). However, it does not disclose side effects like what happens on success/failure, whether it overwrites existing data, or any irreversible consequences.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is five short sentences, front-loaded with the main action, followed by a prerequisite and parameter details. Every sentence is essential, and there is no fluff or repetition.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the presence of an output schema (not shown), the description does not need to explain return values. It covers prerequisites, constraints, and parameter details. However, it could be more complete by explaining the upsert behavior (e.g., how it identifies existing listings) and potential side effects.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the baseline is 3. The description reinforces the parameter interactions: termsMode='custom' requires termsCustom ≥50 chars, and jurisdiction is illustrated with an example. These constraints also appear in the schema, so the description mainly consolidates them in one place.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's action: 'Publish (or upsert) a marketplace listing for an owned site.' It uses a specific verb-resource combination and distinguishes from siblings like update_listing and delete_listing through the 'upsert' hint.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description specifies a prerequisite: 'Requires Stripe Connect set up (status='ready' — see get_connect_status).' This provides clear context for when to use the tool. However, it lacks explicit guidance on when not to use it or how it differs from update_listing, such as whether to use create_listing for new listings and update_listing for existing ones.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

create_project: Create a client-intake project. Grade: A

Spin up a new customer-intake project on the dev's account. Returns the project row plus intakeUrl — the public link the customer fills out (10-step wizard). If customerEmail is provided, shiply also emails them the intake invite. Use originatedFromSiteId to link a project to an existing site (e.g. 'redesign this site').

ParametersJSON Schema

Name	Required	Description
`label`	Yes	project name shown in the dev dashboard
`customerName`	No	the customer's name
`customerEmail`	No	customer email; if set, shiply emails them the intake invite
`originatedFromSiteId`	No	link this project to an existing site (e.g. a redesign)
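
As an illustration of how the optional parameters shape the call, here is a sketch that builds an MCP `tools/call` request for create_project. The helper name `build_create_project_call` is hypothetical, and treating `originatedFromSiteId` as a string is an assumption; the listing does not state its type.

```python
import json
from typing import Any, Dict, Optional


def build_create_project_call(
    label: str,
    customer_name: Optional[str] = None,
    customer_email: Optional[str] = None,
    originated_from_site_id: Optional[str] = None,  # type assumed, not documented
    request_id: int = 1,
) -> Dict[str, Any]:
    """Build a JSON-RPC tools/call request, omitting optional fields that are unset."""
    arguments: Dict[str, Any] = {"label": label}
    optional = {
        "customerName": customer_name,
        "customerEmail": customer_email,
        "originatedFromSiteId": originated_from_site_id,
    }
    # Unset fields are left out entirely; customerEmail in particular
    # causes an intake invite to be emailed, so it should never be sent by accident.
    arguments.update({k: v for k, v in optional.items() if v is not None})
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "create_project", "arguments": arguments},
    }


request = build_create_project_call("Acme redesign", customer_name="Dana")
print(json.dumps(request, indent=2))
```

Leaving `customerEmail` out of the arguments keeps the call free of the email side effect, and the returned intakeUrl can then be shared manually.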

Output Schema

ParametersJSON Schema

Name	Required	Description
`result`	Yes

Tool Definition Quality

A4.4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses side effects (email sending) and return values (project row, intakeUrl), though no explicit mention of idempotency or failure modes. No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four sentences with front-loaded main action, no redundant information, every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers main behavior and return values for a creation tool with an output schema; could mention required parameter 'label' but schema handles that.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Adds meaningful context beyond schema descriptions by explaining the overall flow and purpose of parameters, such as linking to a site for redesign.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states the verb 'spin up' and resource 'customer-intake project', distinguishing it from siblings like list_projects, get_project, archive_project, etc.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides context for when to use optional parameters (customerEmail triggers email, originatedFromSiteId links to site), but lacks explicit exclusion of alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

create_test: Create an email demand test. Grade: A

Provision a demand test in one call: deploys a landing page with a native email-capture form and creates a confirmed-subscriber segment. Returns testId + live siteUrl. Share the siteUrl to collect signups; each signup gets a double-opt-in confirmation. Read progress with get_test_status.

ParametersJSON Schema

Name	Required	Description
`cta`	No	call-to-action button label
`sub`	No	subheadline / supporting line
`idea`	Yes	the product/idea name
`price`	No	price to display, e.g. "$29/mo"
`headline`	Yes	landing-page headline

Output Schema

ParametersJSON Schema

Name	Required	Description
`result`	Yes

Tool Definition Quality

A4.3/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Beyond the idempotentHint=false annotation, the description discloses the full workflow: deployment, subscriber segment creation, double-opt-in confirmation for signups, and return of testId and siteUrl. No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise: four sentences that front-load the primary action and follow with essential details. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the presence of an output schema and the tool's complexity, the description covers all necessary context: what it does, what it returns, and how to use the results. It is self-contained.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so baseline is 3. The description does not add extra semantics to the parameters beyond what the schema already provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it provisions a demand test, deploys a landing page with a native email-capture form, creates a confirmed-subscriber segment, and returns testId and siteUrl. It is distinct from sibling tools like list_tests and get_test_status.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explains the use case (one-call provisioning of a demand test) and suggests sharing the siteUrl and monitoring progress with get_test_status. It does not explicitly mention when not to use or alternatives, but the context is clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

data_export_collection: Export records (capped). Grade: A

Read-only

Return up to limit records (default 1000, max 5000) from a collection — for snapshotting into agent context. For larger sets use the CLI: shiply data export <slug> <collection>.

ParametersJSON Schema

Name	Required	Description
`slug`	Yes	owned site slug
`limit`	No	max records to return (default 1000, max 5000)
`collection`	Yes	collection name to export
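
The choice between this tool and the CLI can be made from the record count that data_list_collections reports. The sketch below assumes a hypothetical helper `plan_export` and a hypothetical return shape; only the limits (default 1000, max 5000) and the CLI command come from the description.

```python
from typing import Any, Dict

MAX_LIMIT = 5000  # hard cap stated in the tool description


def plan_export(slug: str, collection: str, expected_records: int) -> Dict[str, Any]:
    """Pick the MCP tool for sets within the cap, or the CLI command for larger ones."""
    if expected_records > MAX_LIMIT:
        # Larger sets: the description points to the CLI instead.
        return {"via": "cli", "command": f"shiply data export {slug} {collection}"}
    # Request only what is needed, but never less than one record.
    limit = max(expected_records, 1)
    return {
        "via": "mcp",
        "tool": "data_export_collection",
        "arguments": {"slug": slug, "collection": collection, "limit": limit},
    }
```

Passing an explicit `limit` keeps the snapshot no larger than necessary, which matters when the result lands in agent context.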

Output Schema

ParametersJSON Schema

Name	Required	Description
`result`	Yes

Tool Definition Quality

A4.7/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Consistent with readOnlyHint annotation. Adds concrete behavioral details: limit capping (default 1000, max 5000) and the fact that records are returned up to the limit. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, front-loaded with the main action and limits. No wasted words; the alternative CLI command is a concise addition.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Output schema exists, so return values need not be described. Tool is simple (export records with limit); description covers purpose, limits, and alternatives, making it fully adequate for an agent to select and invoke correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, baseline 3. Description restates the limit parameter's default and max but adds no new semantic meaning beyond what the schema already provides. Does not elaborate on slug or collection beyond their names.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states verb (return/export), resource (records from a collection), and purpose (snapshotting into agent context). Distinguishes from siblings like data_query by explicitly noting the CLI alternative for larger sets.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states when to use (small exports for agent context) and when not to (larger sets → CLI). Also provides default and max limit values, giving clear usage boundaries.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

data_insert: Insert a record into a collection. Grade: A

Insert one record into a collection. Goes through the same public visitor endpoint a browser would use — manifest access.insert decides whether it is allowed. Use to seed waitlist data, test forms end-to-end, etc.

ParametersJSON Schema

Name	Required	Description
`slug`	Yes	owned site slug
`record`	Yes	the record fields to insert as key/value pairs
`collection`	Yes	collection name to insert into

Output Schema

ParametersJSON Schema

Name	Required	Description
`result`	Yes

Tool Definition Quality

A4/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds behavioral context by noting the endpoint is a public visitor endpoint and that 'manifest access.insert' determines permission. However, it does not discuss idempotency (annotations already note idempotentHint=false) or error handling. Overall, it provides some value beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is highly concise at three sentences. The first defines the core function, the second explains the access path, and the third gives use cases. No unnecessary words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (3 params, nested object, output schema), the description is fairly complete. It covers purpose, endpoint, and use cases. While it omits details like return format, the presence of an output schema mitigates this. Some minor gaps remain (e.g., behavior on duplicate), but overall reasonable.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the baseline is 3. The description does not add significant meaning beyond what the schema already provides for parameters (slug, collection, record). It mentions 'record' but without extra detail.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Insert one record into a collection.' It uses a specific verb ('Insert') and resource ('record'), and includes example use cases ('seed waitlist data, test forms end-to-end'), effectively distinguishing it from sibling tools like data_export_collection and data_query.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit usage context ('Use to seed waitlist data, test forms end-to-end') and implies when to use this tool (for inserts). It does not explicitly state when not to use it, but the sibling tools cover reads and exports, making the guidelines adequate.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

data_list_collections: List Site Data collections. Grade: A

Read-only

List collections declared in an owned site's .shiply/data.json with current record counts. Empty list means the site has no manifest yet — scaffold one with shiply data init.

ParametersJSON Schema

Name	Required	Description	Default
`slug`	Yes	owned site slug

Output Schema

ParametersJSON Schema

Name	Required	Description
`result`	Yes

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint: true, and the description aligns with a read operation. It adds valuable context: the tool returns current record counts, and an empty list means no manifest exists. It also suggests a CLI command for scaffolding. No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences: the first directly states the purpose, and the second adds actionable context for an edge case. No unnecessary words; information is front-loaded and efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the presence of an output schema (not shown), the description adequately covers the tool's behavior: it lists collections with counts and explains empty results. It might be enhanced by noting any ownership constraints or limits, but it is sufficient for a simple read tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% description coverage with a single 'slug' parameter described as 'owned site slug.' The description does not add further semantic detail beyond what the schema provides, meeting the baseline expectation.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'List' and the specific resource 'collections declared in an owned site's .shiply/data.json with current record counts.' It distinguishes itself from siblings like data_insert and data_query by focusing on listing collection metadata. The mention of an empty list hinting at scaffolding further clarifies the tool's purpose.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for when to use this tool (to list collections) and interprets an empty result as a signal to scaffold a manifest. However, it does not explicitly state when not to use it or mention alternative tools for data manipulation, though this is implied by the name and sibling list.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

data_query: Query records from a collection. Grade: A

Read-only

Page records from an owned site's collection, newest-first. limit ≤ 200 (default 50). cursor from a previous response's nextCursor.

ParametersJSON Schema

Name	Required	Description
`slug`	Yes	owned site slug
`limit`	No	max records to return, ≤200 (default 50)
`cursor`	No	nextCursor from a previous response's page
`collection`	Yes	collection name from data_list_collections
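
Cursor-based paging as described above can be sketched as a generator. This is an illustration under assumptions: `call_tool` stands in for whatever MCP client function is in use, and the response shape (`records` list plus optional `nextCursor`) is assumed, since the output schema only exposes an opaque `result`. The `fake_call_tool` stub exists solely to make the example runnable.

```python
from typing import Any, Callable, Dict, Iterator, Optional

ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def iter_records(
    call_tool: ToolCaller, slug: str, collection: str, page_size: int = 50
) -> Iterator[Dict[str, Any]]:
    """Yield every record in a collection by following nextCursor page by page."""
    if not 1 <= page_size <= 200:
        raise ValueError("limit must be between 1 and 200")
    cursor: Optional[str] = None
    while True:
        arguments: Dict[str, Any] = {"slug": slug, "collection": collection, "limit": page_size}
        if cursor is not None:
            arguments["cursor"] = cursor
        page = call_tool("data_query", arguments)
        # Assumed response shape: {"records": [...], "nextCursor": "..."}.
        yield from page.get("records", [])
        cursor = page.get("nextCursor")
        if not cursor:
            break


def fake_call_tool(name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    """In-memory stand-in for an MCP client, serving five records newest-first."""
    data = [{"id": i} for i in range(5, 0, -1)]
    start = int(arguments.get("cursor", 0))
    end = start + arguments["limit"]
    page: Dict[str, Any] = {"records": data[start:end]}
    if end < len(data):
        page["nextCursor"] = str(end)
    return page


ids = [r["id"] for r in iter_records(fake_call_tool, "my-site", "waitlist", page_size=2)]
print(ids)
```

For a one-off snapshot of a whole collection, data_export_collection is likely the simpler choice; paging suits cases where the agent may stop early.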

Output Schema

ParametersJSON Schema

Name	Required	Description
`result`	Yes

Tool Definition Quality

A4.5/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true. Description adds behavioral context: returns records newest-first and paginated. Could disclose rate limits or auth needs, but ordering and pagination are valuable additions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three short sentences, no extraneous words. Every sentence provides essential information about pagination, ordering, and constraints. Highly efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 4 parameters, annotations, and output schema existence, the description covers pagination mechanism, parameter constraints, and ordering. No major gaps for a paginated read tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, baseline 3. Description restates the limit cap (200) and default (50) and the cursor source (a previous response's nextCursor), and adds that the collection belongs to an owned site and is returned newest-first.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the verb 'page records', the resource 'owned site's collection', and ordering 'newest-first'. It distinguishes from sibling tools like data_insert and data_list_collections by specifying pagination and ordering.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides concrete usage constraints: limit ≤200 (default 50), cursor from previous nextCursor. However, it does not explicitly compare to alternatives like data_export_collection for full exports.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

delete_database: Delete a database. Grade: A

Destructive

PERMANENTLY delete a database and all its data. Irreversible — confirm with the user first.

ParametersJSON Schema

Name	Required	Description	Default
`id`	Yes	database id (from list_databases)
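
Because the description asks for user confirmation before deletion, an agent wrapper might gate the call as sketched below. Both `call_tool` (an MCP client function) and `confirm` (a callback that asks the user and returns a boolean) are hypothetical stand-ins, as is the return shape of the wrapper.

```python
from typing import Any, Callable, Dict

ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def delete_database_guarded(
    call_tool: ToolCaller,
    confirm: Callable[[str], bool],
    database_id: str,
) -> Dict[str, Any]:
    """Call delete_database only after the user explicitly confirms."""
    prompt = (
        f"Permanently delete database {database_id} and all its data? "
        "This cannot be undone."
    )
    if not confirm(prompt):
        # No tool call is made when the user declines.
        return {"deleted": False, "reason": "user declined"}
    result = call_tool("delete_database", {"id": database_id})
    return {"deleted": True, "result": result}
```

The same gate would apply to other irreversible tools on this server that carry the "confirm with the user first" instruction.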

Output Schema

ParametersJSON Schema

Name	Required	Description
`result`	Yes

Tool Definition Quality

A4.3/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Description adds 'PERMANENTLY' and 'Irreversible' beyond the destructiveHint annotation, providing clarity on the irreversible nature. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, front-loaded with the critical warning. No extraneous information. Every word earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a one-parameter destructive tool with an output schema and annotation, the description sufficiently covers the action, its permanence, and usage caution. Complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% and already describes the 'id' parameter as 'database id (from list_databases)'. Description does not add further meaning beyond that.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states 'PERMANENTLY delete a database and all its data' with specific verb 'delete' and resource 'database'. Distinguishes from siblings like delete_listing, delete_site, etc., by specifying database.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly warns that the action is irreversible and requires user confirmation, guiding the agent on when to use. Does not explicitly mention alternatives, but no alternative exists for deleting a database.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

delete_listing: Unpublish a listing (set to draft). Grade: A

Destructive

Take a listing off the public marketplace by moving it to status='draft'. Marketplace v1 keeps the row so analytics + future re-listing work — there's no hard delete. Sold listings can't be modified. Use to stop accepting offers without losing pricing history.

ParametersJSON Schema

Name	Required	Description	Default
`siteSlug`	Yes	slug of the listed site to unpublish

Output Schema

ParametersJSON Schema

Name	Required	Description
`result`	Yes

Tool Definition Quality

A4.7/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Beyond the destructiveHint annotation, description reveals the soft-delete nature (keeps row, enables re-listing), the status change to draft, and the restriction on sold listings, adding valuable behavioral context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four concise sentences front-load the purpose, include key details, and contain no fluff.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With a single param fully described, annotations providing destructiveness context, and the description covering behavior, usage, and constraints, the tool is fully specified for correct invocation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Only parameter siteSlug is well-described in schema (100% coverage). Description does not add further meaning to the parameter, which is acceptable given the baseline of 3 for high schema coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the verb 'take a listing off the public marketplace' and the resource 'listing', distinguishes from hard delete, and notes sold listings can't be modified, differentiating it from siblings like update_listing.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states when to use: 'Use to stop accepting offers without losing pricing history' and notes constraints (sold listings can't be modified), providing clear guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

delete_site: Delete a site (A)

Destructive

PERMANENTLY delete a site and all stored files. Irreversible — confirm with the user first.

ParametersJSON Schema

Name	Required	Description	Default
`slug`	Yes	site slug to delete
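
Because the deletion is irreversible, a client may want to gate the call behind an explicit confirmation. The following is a minimal sketch, assuming the standard MCP `tools/call` JSON-RPC envelope. The helper name `build_delete_site_call` and its `confirmed` flag are hypothetical client-side constructs, not part of the shiply tool itself.

```python
import json


def build_delete_site_call(slug: str, confirmed: bool, request_id: int = 1) -> str:
    """Return a JSON-RPC tools/call payload for delete_site, or raise if unconfirmed."""
    if not confirmed:
        # Hypothetical client-side guard: the tool's schema has no confirm flag,
        # so the check has to happen before the request is built.
        raise PermissionError(f"refusing to delete {slug!r} without user confirmation")
    payload = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "delete_site", "arguments": {"slug": slug}},
    }
    return json.dumps(payload)
```

Keeping the guard in the payload builder means no code path can produce a delete request without the confirmation having been recorded first.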

Output Schema

ParametersJSON Schema

Name	Required	Description
`result`	Yes

Tool Definition Quality

A4.3/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds behavioral context beyond the destructiveHint annotation by emphasizing permanence, irreversibility, and inclusion of all stored files. It also recommends user confirmation, which annotations alone do not provide.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two short sentences that immediately convey the critical information (permanence, resource, scope) with no redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple delete tool with one parameter and an output schema, the description fully explains the irreversible consequences and what is deleted (site and files), meeting completeness needs.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The parameter 'slug' is fully described in the schema ('site slug to delete'); the description adds no extra meaning beyond confirming it identifies the site. With 100% schema coverage, this is adequate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description explicitly states 'PERMANENTLY delete a site and all stored files,' using specific verb and resource, and distinguishes from sibling delete tools (e.g., delete_database) by specifying 'site' and 'all stored files'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description advises 'confirm with the user first,' implying caution and appropriate context for irreversible actions. It does not explicitly compare to alternatives but provides clear guidance on when to use.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

delete_variable: Delete a variable (B)

Destructive

Remove one variable by name.

ParametersJSON Schema

Name	Required	Description	Default
`name`	Yes	variable name to delete

Output Schema

ParametersJSON Schema

Name	Required	Description
`result`	Yes

Tool Definition Quality

B3.2/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds no behavioral context beyond the existing 'destructiveHint' annotation. It does not disclose what happens upon deletion (e.g., permanent removal, cascading effects) or any permissions required. With the annotation present, the description fails to provide additional useful context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, concise sentence that immediately conveys the purpose. There is no unnecessary content, and the key action is front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple destructive tool with one parameter and an output schema, the description covers the essential purpose. However, a note about permanence or confirmation behavior would improve completeness, given the destructive nature.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description mentions 'by name', which matches the schema's required 'name' parameter, but adds no extra semantics such as naming conventions or constraints beyond the schema description.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description 'Remove one variable by name.' clearly states the action and resource. It is specific with the verb 'Remove' and the resource 'variable', and it indicates the key parameter 'by name'. However, it does not explicitly distinguish from sibling tools like 'detach_variable' or 'set_variable', which handle related operations.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No usage guidance is provided. The description does not indicate when to use this tool over alternatives such as 'detach_variable' or 'set_variable', nor does it mention prerequisites or side effects like irreversibility.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

deploy_function: Deploy a Worker function to a site (A)

Idempotent

Deploy a Worker function to a site. The function runs on every request to .shiply.now and can receive webhooks, run on cron triggers, and access bindings (D1, secrets, env vars). Requires Developer plan. Use when the user wants webhook receivers, cron jobs, or a backend for their site.

ParametersJSON Schema

Name	Required	Description
`lang`	No	source language (default js)
`slug`	Yes	site slug to deploy the function to
`crons`	No	cron triggers to register, ≤20
`source`	Yes	the Worker source code
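
The parameter table above can be turned into a small argument builder. This is a sketch under stated assumptions: the `crons` value is taken to be a list of cron expression strings, and the sample Worker source uses module syntax, neither of which the listing confirms. The function name `deploy_function_args` is hypothetical.

```python
from typing import Optional, Sequence

MAX_CRONS = 20  # from the schema: "cron triggers to register, ≤20"

# Assumed Worker source format (module syntax); the listing does not show one.
WORKER_SOURCE = 'export default { async fetch(request) { return new Response("ok"); } };'


def deploy_function_args(
    slug: str,
    source: str,
    crons: Optional[Sequence[str]] = None,
    lang: Optional[str] = None,
) -> dict:
    """Build the arguments object for a deploy_function tool call."""
    if crons is not None and len(crons) > MAX_CRONS:
        raise ValueError(f"at most {MAX_CRONS} cron triggers allowed, got {len(crons)}")
    args = {"slug": slug, "source": source}
    if crons:
        args["crons"] = list(crons)
    if lang is not None:
        # Omitting lang leaves the server-side default (js) in effect.
        args["lang"] = lang
    return args
```

Checking the cron limit on the client avoids a round trip for a request the schema would reject anyway.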

Output Schema

ParametersJSON Schema

Name	Required	Description
`result`	Yes

Tool Definition Quality

A4/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description discloses that the function runs on every request, supports webhooks and crons, and requires a Developer plan. Annotations only provide idempotentHint; the description adds value but lacks details on conflict behavior (e.g., overwriting existing functions) or any side effects beyond deployment.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is very concise: three sentences plus a usage line, all front-loaded. Every word adds value with no repetition or fluff.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity, the description covers the main purpose, capabilities, and usage context. It does not mention prerequisites like the site needing to exist (though implied by the required slug). An output schema exists, so return value details are not needed. Overall adequate.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so each parameter is already described. The description adds overarching context (function capabilities) but does not provide additional meaning to individual parameters beyond the schema. Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it deploys a Worker function to a site, specifying the target URL pattern and capabilities like webhooks, cron triggers, and bindings. It distinguishes from siblings like publish_site (static sites) and set_cron (managing crons on existing functions).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly says 'Use when the user wants webhook receivers, cron jobs, or a backend for their site.' and mentions 'Requires Developer plan.' This provides clear context for when to use, though it does not explicitly mention when not to use or alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

detach_variable: Detach a variable from a site (A)

Destructive

Stop exposing a variable to a site's Worker env. Takes effect on the site's next function deploy.

ParametersJSON Schema

Name	Required	Description	Default
`name`	Yes	variable name to detach
`slug`	Yes	owned site slug
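
Since the detach only takes effect on the site's next function deploy, a client that wants the change applied right away would follow it with a deploy. The sketch below builds that two-call sequence using the standard MCP `tools/call` envelope. It assumes that a `deploy_function` call counts as the "next function deploy", and the helper name `detach_then_redeploy` is hypothetical.

```python
import json
from itertools import count


def detach_then_redeploy(slug: str, name: str, source: str) -> list:
    """Return the ordered tools/call payloads: detach first, then redeploy."""
    ids = count(1)  # JSON-RPC request ids, one per call

    def call(tool: str, arguments: dict) -> dict:
        return {
            "jsonrpc": "2.0",
            "id": next(ids),
            "method": "tools/call",
            "params": {"name": tool, "arguments": arguments},
        }

    return [
        call("detach_variable", {"slug": slug, "name": name}),
        # Assumption: redeploying the function is what picks up the detached binding.
        call("deploy_function", {"slug": slug, "source": source}),
    ]
```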

Output Schema

ParametersJSON Schema

Name	Required	Description
`result`	Yes

Tool Definition Quality

A4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations note destructiveHint=true; description adds crucial timing detail 'Takes effect on the site's next function deploy' beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two concise sentences with no wasted words. Action verb first, then effect timing.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple two-parameter tool with output schema and annotations, description covers the key behavioral aspect (detach and deploy timing). Nearly complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100% and both parameters are well-described. Description adds no extra meaning beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description uses specific verb 'stop exposing' and resource 'variable to a site's Worker env', clearly distinguishing from sibling tools like attach_variable and delete_variable.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidelines for when to use this tool versus alternatives like delete_variable or set_variable. Usage is implied but not clarified.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

drive_delete_file: Delete a drive file (A)

Destructive

Delete a file from a Drive.

ParametersJSON Schema

Name	Required	Description	Default
`path`	Yes	path of the file to delete
`driveId`	Yes	drive id (drv_…) or "default"

Output Schema

ParametersJSON Schema

Name	Required	Description
`result`	Yes

Tool Definition Quality

A3.5/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The annotation already marks the tool as destructive (destructiveHint: true). The description adds minimal behavioral context beyond that; it does not mention irreversibility or recovery options.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, concise sentence with no unnecessary words or details.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the simple 2-parameter structure, the description is adequate but could be more complete by noting that the operation is permanent or specifying any required permissions.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% and both parameters are described in the schema. The description adds no additional meaning beyond what the schema provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Delete') and the resource ('a file from a Drive'). It is distinct from sibling tools like drive_list_files and drive_put_file.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool, any prerequisites (e.g., file existence), or alternatives for similar operations. The description only states the action.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

drive_list_files: List drive files (A)

Read-only

List files in a Drive (driveId = drv_…, or "default"). Optional prefix filter.

ParametersJSON Schema

Name	Required	Description	Default
`prefix`	No	only list files under this path prefix
`driveId`	Yes	drive id (drv_…) or "default"
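
The listing does not say how the prefix filter matches. One plausible reading is a plain string-prefix match on the file path, sketched below. This is an assumption for illustration only, and the function name `filter_by_prefix` is hypothetical.

```python
from typing import Iterable, List, Optional


def filter_by_prefix(paths: Iterable[str], prefix: Optional[str] = None) -> List[str]:
    """Keep only paths that start with prefix; no prefix means no filtering."""
    if not prefix:
        return list(paths)
    # Assumed semantics: literal string prefix, not a glob or directory match.
    return [p for p in paths if p.startswith(prefix)]
```

Under this reading, a prefix of `notes/` would match `notes/context.md` but a prefix of `notes` would also match a file named `notes-old.md`, so a trailing slash matters.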

Output Schema

ParametersJSON Schema

Name	Required	Description
`result`	Yes

Tool Definition Quality

A3.8/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description does not contradict the readOnlyHint annotation, but it adds minimal behavioral context beyond the annotation. It does not mention pagination, ordering, or error handling, which are typical for list operations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is one sentence plus a short fragment (about 13 words in total), efficiently conveying the core purpose with no extraneous information. Front-loaded and to the point.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the presence of an output schema and the simple nature of a list operation, the description is mostly complete. It could clarify the 'prefix filter' behavior, but overall it adequately covers the tool's functionality.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% and the description largely restates the schema's parameter descriptions. No additional parameter semantics are provided, so it meets the baseline for high coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('List files'), the resource ('in a Drive'), and specifies the driveId format with an optional prefix filter. This distinguishes it from sibling tools like drive_delete_file or drive_put_file.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool versus alternatives like list_project_files. The usage is implied by context but not spelled out. A brief statement about when not to use it would improve clarity.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

drive_put_file: Write a drive file (A)

Idempotent

Write a file into a Drive (driveId = drv_… or "default"). content is utf8 or base64. Use for agent memory, notes, context, assets.

ParametersJSON Schema

Name	Required	Description
`path`	Yes	destination path inside the drive, e.g. notes/context.md
`content`	Yes	file contents (utf8 text, or base64 when encoding=base64)
`driveId`	Yes	drive id (drv_…) or "default"
`encoding`	No	default utf8; base64 for binary
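
The interaction between `content` and `encoding` can be sketched as a small argument builder: text is sent as-is with the default utf8 encoding, and binary data is base64-encoded with `encoding` set explicitly. The function name `drive_put_file_args` is hypothetical, and choosing the encoding from the Python type is a design choice of this sketch, not something the tool prescribes.

```python
import base64
from typing import Union


def drive_put_file_args(drive_id: str, path: str, content: Union[str, bytes]) -> dict:
    """Build the arguments object for a drive_put_file tool call."""
    args = {"driveId": drive_id, "path": path}
    if isinstance(content, bytes):
        # Binary payloads must be base64 text, flagged via encoding=base64.
        args["content"] = base64.b64encode(content).decode("ascii")
        args["encoding"] = "base64"
    else:
        # encoding is omitted so the documented default (utf8) applies.
        args["content"] = content
    return args
```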

Output Schema

ParametersJSON Schema

Name	Required	Description
`result`	Yes

Tool Definition Quality

A4/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare idempotentHint=true, indicating safe repeated calls. The description adds encoding details (utf8/base64) and drive ID format but does not specify whether the tool overwrites existing files or creates new ones. This missing information is relevant for understanding behavior.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three short sentences, front-loading the purpose and key constraints. Every word is relevant, with no redundancy or fluff.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given that an output schema exists and annotations provide safety context, the description covers core purpose and use cases. However, it omits overwrite behavior, which is a minor gap for a write tool with idempotent hint. Still, it is largely complete for its complexity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% parameter description coverage, so the schema already explains each parameter. The description reiterates content encoding but adds no meaningful new semantic information beyond what is in the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool writes a file into a Drive, specifies the driveId format, and lists use cases like memory, notes, context, and assets. This distinguishes it well from siblings like drive_list_files and drive_delete_file.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

It provides explicit use cases (agent memory, notes, context, assets) but does not offer when-not-to-use guidance or name alternative tools for similar tasks. The context is clear but lacks exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

duplicate_site: Duplicate a site (A)

Server-side copy of an owned site under a new slug — instantly live. Copies files + title; does NOT copy access settings, domains, or data. Great for iterating on variants.

ParametersJSON Schema

Name	Required	Description	Default
`slug`	Yes	site slug to copy
`title`	No	display title for the new copy (defaults to source title)

Output Schema

ParametersJSON Schema

Name	Required	Description
`result`	Yes

Tool Definition Quality

A4.3/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations only provide idempotentHint=false. Description adds behavioral details: server-side copy, instantly live, and explicit list of copied vs. non-copied attributes. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences: the first states core functionality, the second lists what is and is not copied, and the third gives the use case. No redundancy, front-loaded, efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given simple 2-parameter tool with output schema, description covers what, how, and why. Addresses scope (copied vs. non-copied) and typical use case, making it complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

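Schema coverage is 100% with both parameters described. The description says the copy lands under a new slug, while the `slug` parameter itself identifies the source site to copy and `title` is optional, defaulting to the source title per the schema. The description adds limited new meaning beyond the schema.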

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Specific verb 'duplicate' and resource 'site' with clear scope: server-side copy under new slug, instantly live. Distinguishes from siblings by listing what is copied (files+title) and what is not (access, domains, data).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states when to use ('Great for iterating on variants') and implicitly when not (if you need to copy access settings, domains, or data). Could be improved by naming alternative tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

export_account: Export account data (A)

Read-only

Return a JSON bundle of the user's profile, sites, Site Data, drives, and metadata (secrets excluded). Data portability.

ParametersJSON Schema

Name	Required	Description	Default
No parameters

Output Schema

ParametersJSON Schema

Name	Required	Description
`result`	Yes

Tool Definition Quality

A4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations declare readOnlyHint=true, and the description adds behavioral context: what data is included (profile, sites, Site Data, drives, metadata) and specifically excluded (secrets). This goes beyond the annotation to inform the agent of output content and scope.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two concise sentences: first sentence packs the essential information, second sentence is a simple tag. No wasted words, and key details are front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given zero parameters and an output schema, the description adequately covers output contents and exclusion of secrets. Could mention potential size or synchronous nature, but current detail is sufficient for a straightforward export.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

No parameters defined in the input schema, so description does not need to add parameter info. Baseline score of 4 applies as there is nothing missing.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description uses a specific verb 'Return' and clearly defines the resource as a JSON bundle of user account data, listing exact contents. It distinguishes from siblings by focusing on full account export versus individual data retrieval.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool versus alternatives like data_export_collection or get_account_status. The phrase 'Data portability' implies a use case but does not establish clear conditions or exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

feature_site: Feature a site on Explore (A)

Idempotent

Toggle whether an owned, public site appears in the public shiply Explore gallery (https://shiply.now/explore). Only public-access sites are eligible.

ParametersJSON Schema

Name	Required	Description	Default
`show`	Yes	true to feature on Explore, false to remove
`slug`	Yes	owned, public site slug

Output Schema

ParametersJSON Schema

Name	Required	Description
`result`	Yes

Tool Definition Quality

A3.8/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide idempotentHint. The description uses 'Toggle' which could be misinterpreted as a flip rather than the setter indicated by the input schema (show boolean). It adds the eligibility condition but does not clarify error behavior or side effects beyond what schema suggests.
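
The difference between a setter and a flip can be made concrete with a small local model. This sketch does not call shiply; it only illustrates why a `show` boolean is compatible with the idempotent annotation while a true toggle would not be. Both function names are hypothetical.

```python
def set_featured(featured: dict, slug: str, show: bool) -> dict:
    """Setter semantics: the result depends only on the requested value."""
    updated = dict(featured)
    updated[slug] = show
    return updated


def flip_featured(featured: dict, slug: str) -> dict:
    """Flip semantics: the result depends on the current state."""
    updated = dict(featured)
    updated[slug] = not featured.get(slug, False)
    return updated
```

Repeating `set_featured(..., show=True)` leaves the site featured, whereas repeating `flip_featured` undoes the previous call, which is why a retried request is only safe under setter semantics.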

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences cover purpose and constraint with no redundancy. Front-loaded with the action and resource. Every word earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Output schema exists, so return values are covered. The description covers purpose, eligibility, and context (gallery URL). Missing details about error conditions or side effects like notifications, but acceptable for a simple boolean-setter tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so baseline 3. The description adds context about the gallery and eligibility but does not enhance understanding of the parameters beyond their schema definitions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool toggles whether a site appears in the public Explore gallery, specifying the verb (toggle) and resource (site on Explore), with a link and eligibility condition (public-access only). It is easily distinguishable from sibling tools like promote_site or publish_site.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implicitly indicates when to use (when you own a public site and want to show/hide it on Explore) but does not explicitly compare with alternatives or state preconditions like site ownership. It mentions eligibility but lacks when-not-to-use guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

forward_thread: Forward an inbox message to a new recipient (grade A)

Forward a message from a thread (default: the most recent message) to a NEW recipient with an optional intro note. Body is composed as note + standard '---------- Forwarded message ----------' quote of the original. Goes from the thread's existing shiply alias so replies still route through the inbox. Subject defaults to 'Fwd: <original>'.

ParametersJSON Schema

Name	Required	Description
`to`	Yes	new recipient email address
`note`	No	intro note prepended above the forwarded quote
`subject`	No	subject; defaults to 'Fwd: <original>'
`threadId`	Yes	thread id from list_inbox to forward from
`messageId`	No	specific message to forward (default: most recent)

Output Schema

ParametersJSON Schema

Name	Required	Description
`result`	Yes
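
The composition rule in the description (optional note, then the forwarded-message marker, then the original) can be sketched in a few lines of Python. This is only an illustration: the function name `compose_forward`, its parameters, and the blank-line spacing between parts are assumptions, and only the marker string and the 'Fwd: <original>' default come from the tool description.

```python
from typing import Optional, Tuple

# Marker string quoted in the tool description.
FORWARD_MARKER = "---------- Forwarded message ----------"


def compose_forward(
    original_subject: str,
    original_body: str,
    note: Optional[str] = None,
    subject: Optional[str] = None,
) -> Tuple[str, str]:
    """Return (subject, body) for a forwarded message.

    Hypothetical helper: the real server may lay out the quote differently.
    """
    # Subject falls back to 'Fwd: <original>' when none is supplied.
    final_subject = subject if subject is not None else f"Fwd: {original_subject}"

    # The optional intro note goes above the forwarded quote.
    parts = [note] if note else []
    parts.append(FORWARD_MARKER)
    parts.append(original_body)
    return final_subject, "\n\n".join(parts)


subject, body = compose_forward("Invoice 42", "Please find it attached.", note="FYI")
print(subject)
print(body)
```

Omitting `note` yields a body that starts directly with the marker, which matches the "optional intro note" wording.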

Tool Definition Quality

A 4.6/5.0

Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description details body composition, alias usage, subject default, and optional note, adding significant behavioral context beyond the simple annotations (openWorldHint, idempotentHint). No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four concise sentences front-load key information without waste. Each sentence serves a purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the output schema exists and parameters are well-described, the description covers forwarding behavior, body format, alias routing, and defaults completely.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

All 5 parameters have schema descriptions (100% coverage), and the tool description adds rich context like default message selection and subject prefix.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'forward', the resource 'a message from a thread', and specifies default behavior (most recent message). It distinctly separates from siblings like send_email or reply_to_thread.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies when to use (forward to a new recipient, using the thread's alias) but doesn't explicitly contrast with alternative tools like send_email or reply_to_thread.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_account_status: Account plan + capability matrix (grade A)

Read-only

Get the signed-in account's plan, capabilities, and upgrade URL. Call this FIRST when figuring out what features you have access to — it tells you exactly what's available and what's blocked. The upgrade_url is human-clickable; show it in chat when a feature requires a higher plan. Returns plan id + name + subscription status, hard limits (sites, databases, custom domains, drives), and a capability matrix listing every gated feature (workers_lite, databases_neon_postgres, custom_domains, etc.) with whether you have access and the minimum plan needed.

ParametersJSON Schema

Name	Required	Description	Default
No parameters

Output Schema

ParametersJSON Schema

Name	Required	Description
`result`	Yes
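
To make the "call this FIRST" advice concrete, here is a minimal Python sketch of how an agent might read the capability matrix before attempting a gated feature. The payload shape is an assumption: only `upgrade_url` and the feature names (`workers_lite`, `custom_domains`) appear in the description, while the `capabilities`, `has_access`, and `min_plan` keys and the sample values are hypothetical.

```python
from typing import Any, Dict

# Hypothetical payload shape; only `upgrade_url` and the feature names
# (workers_lite, custom_domains) appear in the tool description.
SAMPLE_STATUS: Dict[str, Any] = {
    "plan": {"id": "free", "name": "Free", "status": "active"},
    "upgrade_url": "https://example.com/upgrade",
    "capabilities": {
        "workers_lite": {"has_access": True, "min_plan": "free"},
        "custom_domains": {"has_access": False, "min_plan": "pro"},
    },
}


def check_capability(status: Dict[str, Any], feature: str) -> str:
    """Return a short, chat-ready message about one gated feature."""
    entry = status.get("capabilities", {}).get(feature)
    if entry is None:
        return f"{feature}: not listed in the capability matrix"
    if entry.get("has_access"):
        return f"{feature}: available on the current plan"
    # Blocked: surface the minimum plan and the human-clickable upgrade link.
    return (
        f"{feature}: requires the {entry.get('min_plan')} plan; "
        f"upgrade at {status.get('upgrade_url')}"
    )


print(check_capability(SAMPLE_STATUS, "workers_lite"))
print(check_capability(SAMPLE_STATUS, "custom_domains"))
```

Returning a message rather than raising keeps the upgrade link available to show in chat, as the description suggests.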

Tool Definition Quality

A 4.8/5.0

Behavior 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description details the return values: plan id, name, subscription status, hard limits, and a capability matrix. It also notes that upgrade_url is human-clickable. Annotations already mark it as readOnlyHint=true, and the description adds valuable behavioral context without contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single paragraph that front-loads the purpose in the first sentence. It is informative but could be slightly more compact; however, it earns its place with clear, direct statements.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no parameters and the presence of an output schema (implied by the detailed return description), the description is fully complete. It covers what the tool does, when to use it, and what it returns, leaving no gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

There are zero parameters, so the description does not need to explain them. Per the calibration, 0 params yields a baseline of 4.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Get the signed-in account's plan, capabilities, and upgrade URL.' It uses a specific verb ('get') and resource ('account status'), and distinguishes from sibling tools by focusing on account-level information rather than individual resources.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly advises to 'Call this FIRST when figuring out what features you have access to' and explains how to use the upgrade_url ('show it in chat when a feature requires a higher plan'). This provides clear when-to-use guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_analytics: Site analytics (grade B)

Read-only

Daily page views per site for the last 30 days.

ParametersJSON Schema

Name	Required	Description	Default
No parameters

Output Schema

ParametersJSON Schema

Name	Required	Description
`result`	Yes

Tool Definition Quality

B 3.4/5.0

Behavior 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The annotation readOnlyHint=true already signals a safe read operation. The description adds that it returns daily page views for the last 30 days, which is useful but does not disclose other behavioral traits such as whether results are paginated or if historical data is mutable. The annotation carries the transparency burden, so a score of 3 is appropriate.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single clear sentence, front-loaded with key information. Every word is necessary and nothing is wasted.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description explains what data is returned but leaves one ambiguity: does it cover all of the account's sites or a specific one? Because the tool takes no parameters, all sites is the likely reading, but the description does not say so. Given the tool's simplicity and the presence of an output schema, the description is adequate but could be more explicit.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With zero parameters and 100% schema coverage, the description does not need to provide parameter details. The baseline for no-parameter tools is 4, and the description adds no confusion.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool returns 'Daily page views per site for the last 30 days,' identifying the resource (analytics) and scope (last 30 days). However, it does not explicitly use a verb like 'retrieve' or 'get,' and it does not differentiate from siblings, though no close sibling exists.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus alternatives. There is no mention of limitations or prerequisites, leaving the agent to infer usage context on its own.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_connect_status: Stripe Connect onboarding status (grade A)

Read-only

Return the seller's Stripe Connect state: not_started | in_progress | pending_verification | ready | disabled. When status != 'ready' the user can't list sites. Includes a one-shot onboardingUrl (if not_started or in_progress) and dashboardUrl (if ready). Refreshes from Stripe on every call.

ParametersJSON Schema

Name	Required	Description	Default
No parameters

Output Schema

ParametersJSON Schema

Name	Required	Description
`result`	Yes
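
The five states and the two URLs named in the description suggest a simple dispatch on the client side. The following Python sketch shows one way an agent might turn the result into a next step. The `status` key name and the message wording are assumptions; the state names, `onboardingUrl`, and `dashboardUrl` are taken from the description.

```python
from typing import Any, Dict


def next_step(result: Dict[str, Any]) -> str:
    """Map a get_connect_status result to a human-readable next step.

    Assumes the state lives under a `status` key (not confirmed by the listing).
    """
    status = result.get("status")
    if status == "ready":
        # Only 'ready' sellers can list sites; a dashboard link is included.
        return f"Ready to list sites. Dashboard: {result.get('dashboardUrl')}"
    if status in ("not_started", "in_progress"):
        # The onboarding link is described as one-shot, so fetch a fresh
        # status right before showing it instead of caching it.
        return f"Finish onboarding first: {result.get('onboardingUrl')}"
    if status == "pending_verification":
        return "Waiting on Stripe verification; listing is blocked until status is 'ready'."
    if status == "disabled":
        return "Stripe Connect is disabled; listing is blocked."
    return f"Unrecognized status: {status!r}"


print(next_step({"status": "in_progress", "onboardingUrl": "https://example.com/onboard"}))
print(next_step({"status": "ready", "dashboardUrl": "https://example.com/dashboard"}))
```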

Tool Definition Quality

A 4.7/5.0

Behavior 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses that it refreshes from Stripe on every call (potential side effect despite readOnlyHint). No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four concise sentences, front-loaded with core purpose, no wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Explains all necessary context: status implications, included URLs, and refresh behavior. Output schema exists, so return fields need not be detailed.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

No parameters exist; baseline 4 applies. Description adds no parameter info, but none needed.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description explicitly states return value (Stripe Connect state with enumerated statuses) and resource. It clearly distinguishes from sibling tools by focusing on Stripe Connect onboarding, which no sibling covers.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Describes when to use: to check if user can list sites (status == 'ready' required). No explicit alternatives or when-not-to, but the context is clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_function: Get the deployed function for a site (grade A)

Read-only

Return the deployed function source + metadata for a site, or null if no function deployed.

ParametersJSON Schema

Name	Required	Description	Default
`slug`	Yes	site slug whose function to fetch

Output Schema

ParametersJSON Schema

Name	Required	Description
`result`	Yes

Tool Definition Quality

A 4.1/5.0

Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate readOnlyHint=true, confirming it's a safe read operation. The description adds the behavior of returning null if no function is deployed, which is useful context beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, clear sentence with no wasted words. Every part is necessary.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the presence of an output schema and only one simple parameter, the description is complete. It explains what is returned and the null case.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The single parameter 'slug' is fully described in the schema (100% coverage). The description does not add additional meaning beyond the schema's description.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action: returning the deployed function source and metadata for a site, including the null case. It distinguishes from siblings like 'deploy_function' and 'get_function_logs'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for retrieving function details but does not explicitly state when to use it over alternatives like 'get_function_logs'.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_function_logs: Read recent function logs (grade A)

Read-only

Read recent runtime logs for a site's per-site Worker (console output, request summaries, exceptions) from Cloudflare Workers Observability — 7-day retention, newest first. Use this to debug a deployed function: each event has timestamp, level, message, outcome, statusCode, requestId, and CPU/wall time. Also returns a CF dashboard deep-link.

ParametersJSON Schema

Name	Required	Description
`slug`	Yes	site slug whose function logs to read
`limit`	No	max events to return (default 50)
`since`	No	look back this many minutes (default 60, max 10080 = 7 days)

Output Schema

ParametersJSON Schema

Name	Required	Description
`result`	Yes
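
The `since` parameter is expressed in minutes, with a documented maximum of 10080 (7 days × 24 hours × 60 minutes). As a rough illustration, the Python sketch below builds an argument object from a look-back given in hours and caps it at that maximum. The helper name `build_log_args` is hypothetical, and clamping on the client side is an assumption: the listing does not say whether the server rejects or truncates out-of-range values.

```python
from typing import Dict, Union

# 7 days of retention, expressed in minutes: 7 * 24 * 60 = 10080.
MAX_SINCE_MINUTES = 7 * 24 * 60
DEFAULT_LIMIT = 50  # documented default for `limit`


def build_log_args(
    slug: str, hours_back: float = 1.0, limit: int = DEFAULT_LIMIT
) -> Dict[str, Union[str, int]]:
    """Build get_function_logs arguments from a look-back window in hours."""
    minutes = int(round(hours_back * 60))
    # Keep the window within 1 minute .. 7 days (client-side clamp, assumed).
    minutes = max(1, min(minutes, MAX_SINCE_MINUTES))
    return {"slug": slug, "since": minutes, "limit": max(1, limit)}


print(build_log_args("my-site"))                      # last hour, default limit
print(build_log_args("my-site", hours_back=24))       # last day
print(build_log_args("my-site", hours_back=24 * 30))  # capped at 7 days
```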

Tool Definition Quality

A 4.5/5.0

Behavior 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description discloses key behavioral traits: 7-day retention, newest-first ordering, and the exact fields returned (timestamp, level, message, outcome, statusCode, requestId, CPU/wall time). It also mentions a Cloudflare dashboard deep-link. These details add value beyond the readOnlyHint annotation.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a compact three sentences, with no wasted words. The first covers the main purpose and constraints, the second gives the debugging use case and the event fields, and the third notes the dashboard deep-link. It is appropriately front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (3 params, 1 required, output schema present), the description is complete. It explains what the tool does, when to use it, what the logs contain, and retention/ordering. No gaps remain for the agent.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% description coverage, so the baseline is 3. The description does not add parameter-specific meaning beyond the schema; it focuses on output details. While this is acceptable, there is no extra semantic enrichment for parameters like slug, limit, or since.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool reads recent runtime logs for a site's per-site Worker, specifying the resource (function logs), action (read), and additional details (console output, request summaries, exceptions). It distinguishes itself from sibling tools like get_function or deploy_function by focusing on log retrieval.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly says 'Use this to debug a deployed function,' giving clear context. However, it does not provide negative guidance or mention when not to use this tool versus alternatives, though sibling tool names imply differentiation.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_project: Get one project (full row) (grade A)

Read-only

Read one of the dev's projects by id — includes label, status, customer details, intake responses, AI brief, drive folder. Use before update_brief / regenerate_brief.

ParametersJSON Schema

Name	Required	Description	Default
`id`	Yes	project id

Output Schema

ParametersJSON Schema

Name	Required	Description
`result`	Yes

Tool Definition Quality

A 4.1/5.0

Behavior 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already set readOnlyHint=true, so the read-only behavior is clear. The description adds that it includes specific fields, which is useful but does not disclose additional behavioral traits beyond what annotations provide.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences: the first explains the purpose and included data, the second provides usage guidance. Highly concise without superfluous words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has one parameter, read-only annotations, an output schema, and the description covers purpose and usage, it is complete for an agent to select and invoke correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema already fully describes the single parameter (id) with format and pattern. The description does not add further semantics, so it meets the baseline for 100% schema coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool reads a project by ID and lists the included fields (label, status, customer details, etc.), distinguishing it from sibling tools like list_projects or update_brief.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly says 'Use before update_brief / regenerate_brief', providing clear context for when to use this tool. It does not explicitly mention when not to use it, but the guidance is sufficient.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_site: Site detail (grade B)

Read-only

Site settings + version history for one of my sites.

ParametersJSON Schema

Name	Required	Description	Default
`slug`	Yes	site slug, e.g. my-site

Output Schema

ParametersJSON Schema

Name	Required	Description
`result`	Yes

Tool Definition Quality

B 3.4/5.0

Behavior 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The annotation declares readOnlyHint=true, and the description confirms a read operation (retrieving settings and version history). No additional behavioral traits (e.g., rate limits, specific permissions) are disclosed beyond what annotations provide. The description does not contradict annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence that is front-loaded with the core purpose ('Site settings + version history'). Every word is necessary, and there is no redundancy. It is highly efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's low complexity (one parameter, read-only, output schema exists), the description covers the essential purpose. It does not mention authentication or scoping, but these are typically assumed. The description is sufficiently complete for this level of complexity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% coverage (slug is described). The description does not add any extra meaning or constraints for the parameter beyond the schema's 'site slug, e.g. my-site'. Baseline score is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it returns site settings and version history for one site. It implicitly uses the 'get' verb, and the resource 'site' is unambiguous. It does not differentiate itself from sibling tools like get_site_access or list_versions, though the combination of settings + version history provides enough specificity.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

There is no guidance on when to use this tool versus alternatives. For example, it does not mention that list_sites should be used to enumerate sites, or that get_site_access is for access control. The description is purely declarative without usage context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_site_access: Read site access control (grade A)

Read-only

Read an owned site's current access policy (mode: public/password/restricted, allowedEmails, allowedDomains, hasPassword). Read-only counterpart to set_site_access — check before changing it.

ParametersJSON Schema

Name	Required	Description	Default
`slug`	Yes	owned site slug

Output Schema

ParametersJSON Schema

Name	Required	Description
`result`	Yes
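
The "check before changing it" advice amounts to reading the current policy and comparing it with the desired one. A minimal Python sketch of that comparison follows. The helper `policy_diff` is hypothetical and says nothing about how set_site_access accepts its arguments; only the field names `mode`, `allowedEmails`, `allowedDomains`, and `hasPassword` come from the description.

```python
from typing import Any, Dict

# Policy fields named in the tool description that a caller might change.
COMPARED_FIELDS = ("mode", "allowedEmails", "allowedDomains")


def policy_diff(current: Dict[str, Any], desired: Dict[str, Any]) -> Dict[str, Any]:
    """Return only the desired fields that differ from the current policy."""
    changed: Dict[str, Any] = {}
    for key in COMPARED_FIELDS:
        if key not in desired:
            continue  # caller did not ask to change this field
        old, new = current.get(key), desired[key]
        if isinstance(new, list):
            # Treat allow-lists as unordered sets of strings.
            if sorted(old or []) != sorted(new):
                changed[key] = new
        elif old != new:
            changed[key] = new
    return changed


current = {"mode": "public", "allowedEmails": [], "allowedDomains": [], "hasPassword": False}
desired = {"mode": "restricted", "allowedEmails": ["a@example.com"]}
print(policy_diff(current, desired))
```

An empty result means the site already has the desired policy, so no write call is needed.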

Tool Definition Quality

A 4.5/5.0

Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true. Description adds context as counterpart to set_site_access and lists returned fields, but no additional behavioral details beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences: the first carries a parenthesized list of returned fields, and the second names the set_site_access counterpart. Concise, front-loaded, no wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given output schema exists (implied), description lists all returned fields clearly. Simple read tool with clear purpose and usage guidance; complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with slug described as 'owned site slug'. Description repeats 'owned site' but adds no new meaning beyond what schema provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Explicitly states reading an owned site's access policy with specific fields (mode, allowedEmails, etc.). Distinguishes from sibling set_site_access.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly says to call this before changing access (check before setting) and identifies itself as the read-only counterpart to set_site_access, which tells the agent when to use it rather than the alternative.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_test_status: Demand test status (the verdict) (grade A)

Read-only

ONE consolidated object: page funnel (views, signups, confirmed, conversion) ⊕ email events (delivered/opened/clicked/bounced) ⊕ a computed verdict. The single place to check progress — never query email separately.

ParametersJSON Schema

Name	Required	Description	Default
`testId`	Yes	demand test id from create_test / list_tests

Output Schema

ParametersJSON Schema

Name	Required	Description
`result`	Yes

Tool Definition Quality

A 4.3/5.0

Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, so the agent knows it is safe. The description adds behavioral context by specifying the consolidated nature and that it includes a computed verdict, which goes beyond the annotations. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences: the first lists the components, the second gives a usage directive. It is front-loaded with the essential information and contains no fluff. Every sentence serves a purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has an output schema, the description does not need to explain return values. It adequately describes the three components (page funnel, email events, verdict). The tool is simple with one parameter, and all necessary context is provided.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% as the only parameter 'testId' is well described in the schema. The description does not add extra meaning about the parameter beyond what the schema provides, so baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states it returns a consolidated object with page funnel metrics, email events, and a computed verdict. The verb 'get' is implied, and the resource is test status. It distinguishes from siblings by emphasizing it is the single source for progress, contrasting with querying email separately.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear guidance: use this tool as 'The single place to check progress' and advises against querying email separately. It does not explicitly mention when not to use it, but the context is clear enough for an agent to prioritize this over individual email queries.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_complaints: List complaints + bounces (A)

Read-only

Inspect

Return threads tagged as complaints (spam reports) OR bounces (recipient rejected). Read these before any further sending.

ParametersJSON Schema

Name	Required	Description	Default
No parameters

Output Schema

ParametersJSON Schema

Name	Required	Description
`result`	Yes

Tool Definition Quality

A4.3/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description is consistent with the readOnlyHint annotation and adds no contradiction. It discloses the tool is for reading, but does not provide additional behavioral details beyond what annotations already cover.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two concise sentences with no unnecessary words, effectively conveying the tool's purpose and usage context.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has no parameters and an output schema is present, the description provides sufficient context. It explains the type of threads returned and when to use the tool, making it complete for this simple tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With zero parameters and 100% schema description coverage, the description adds no additional parameter meaning, which is acceptable. The baseline of 4 applies due to no parameters.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool returns threads tagged as complaints or bounces, with a specific verb 'return' and resource 'threads'. The title reinforces this, making the purpose unambiguous.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description advises reading these threads before any further sending, which gives clear context for when to use the tool. However, it does not say when not to use it or suggest alternative tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_crons: List cron triggers for a site (A)

Read-only

Inspect

List cron triggers for a site's deployed function. Each cron is (path, schedule, lastRunAt).

ParametersJSON Schema

Name	Required	Description	Default
`slug`	Yes	site slug whose cron triggers to list

Output Schema

ParametersJSON Schema

Name	Required	Description
`result`	Yes

Tool Definition Quality

A3.6/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, so the tool is known to be read-only. The description adds the output format (path, schedule, lastRunAt), which is useful but not critical beyond annotations. No behavioral contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two succinct sentences: first sentence states the action, second sentence describes the output. No unnecessary words, achieving high conciseness.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the simple structure (1 param, output schema exists), the description sufficiently explains what the tool does and what each entry contains. It implicitly assumes a deployed function exists, which could be stated outright, but overall it is complete enough.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, with the parameter 'slug' described as 'site slug whose cron triggers to list'. The tool description adds no new semantic meaning beyond the schema, fitting the baseline of 3.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool lists cron triggers for a site's deployed function, specifying the verb 'list' and the resource ('cron triggers for a site'). It also describes the output format (path, schedule, lastRunAt), distinguishing it from sibling tools like set_cron or remove_cron.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus alternatives like set_cron or remove_cron, or other list tools. The description is purely descriptive of the action without contextual usage advice.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_custom_domains: List custom domains (A)

Read-only

Inspect

List registered custom domains grouped with their subdomains, each subdomain's site and status, and the detected provider.

ParametersJSON Schema

Name	Required	Description	Default
No parameters

Output Schema

ParametersJSON Schema

Name	Required	Description
`result`	Yes

Tool Definition Quality

A3.8/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, so the read-only nature is covered. The description adds behavioral context about the returned data structure (grouped subdomains with site, status, provider), which is useful beyond the annotation. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence that is front-loaded and contains no redundant information. Every word contributes to defining the tool's output.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no parameters and a read-only annotation, the description adequately covers the key output aspects. It does not mention pagination or ordering, but those are typically absent when no parameters are provided. An output schema exists to further clarify.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

No parameters in the schema, so the description cannot add parameter-level details. Baseline is 4 because with 0 parameters, the tool is self-explanatory.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description uses the specific verb 'List' and resource 'custom domains' with details on grouping by subdomains, each subdomain's site and status, and detected provider. This clearly distinguishes it from sibling tools like list_domains, which likely returns a simpler list.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance provided on when to use this tool versus alternatives such as list_domains or check_custom_domain. The description does not mention context or exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_databases: List my databases (A)

Read-only

Inspect

List the SQL databases (D1 or Neon Postgres) on my account, including which owned site (if any) each is attached to. Call this BEFORE db_query/db_schema-style work to discover a databaseId — those live on a per-database MCP server reached via GET /api/v1/databases/{id} (see llms.txt), which this id feeds.

ParametersJSON Schema

Name	Required	Description	Default
No parameters

Output Schema

ParametersJSON Schema

Name	Required	Description
`result`	Yes

Tool Definition Quality

A4.9/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Beyond the readOnlyHint annotation, the description discloses that the output includes an id used to reach a per-database MCP server, and mentions the GET endpoint for further operations, adding valuable behavioral context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two concise sentences: the first provides the core function, the second explains usage context. Every sentence adds value with no redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the presence of an output schema, the description sufficiently covers what the tool does, when to use it, and the key output element (databaseId). No gaps remain.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With zero parameters and 100% schema coverage, the description has no need to explain parameters. The baseline score of 4 applies as no additional parameter details are required.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool lists SQL databases (D1 or Neon Postgres) including attached site, and explicitly distinguishes it from sibling tools like create_database or delete_database by positioning it as a prerequisite for db_query/db_schema operations.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

It directly advises to call this tool BEFORE db_query/db_schema work to obtain a databaseId, and explains that those tools live on a separate per-database MCP server, providing clear workflow context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_domains: List custom domains (B)

Read-only

Inspect

List connected custom domains with status.

ParametersJSON Schema

Name	Required	Description	Default
No parameters

Output Schema

ParametersJSON Schema

Name	Required	Description
`result`	Yes

Tool Definition Quality

B3.4/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, so the description's mention of 'connected' and 'status' adds mild context. However, it does not explain what 'connected' means or provide other behavioral traits beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single concise sentence with no wasted words. It is appropriately sized for a simple list tool.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With an output schema present and no parameters, the description is fairly complete but lacks explanation of 'connected' status and does not mention any ordering or filtering. Minimal but not fully informative.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The tool has no parameters, so baseline is 4. The description adds no parameter info, but it is not needed.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool lists connected custom domains with status. However, the presence of a sibling 'list_custom_domains' creates ambiguity about the difference, lowering the score.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool vs alternatives like 'list_custom_domains' or when not to use it. The description lacks context for appropriate use.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_drives: List drives (A)

Read-only

Inspect

List the user's private cloud Drives (id, name). Optional clientId filters to one client.

ParametersJSON Schema

Name	Required	Description	Default
`clientId`	No	only drives filed under this client

Output Schema

ParametersJSON Schema

Name	Required	Description
`result`	Yes

Tool Definition Quality

A4.0/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate readOnlyHint=true, so the description adds value by specifying returned fields and optional filtering. This goes beyond the annotation without contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two short sentences: the first front-loads the main purpose and the second adds the optional filter. Every word contributes; no redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the output schema exists and the tool is simple (1 optional param), the description sufficiently covers purpose and filter. It could mention pagination or limits but is generally complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema covers the clientId parameter completely (100% coverage). The description merely restates the filtering behavior without adding new semantic detail beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'list' and the resource 'the user's private cloud Drives', specifying returned fields (id, name). This distinguishes it from sibling tools like create_drive or drive_delete_file.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description mentions the optional clientId filter but does not explicitly state when to use this tool versus other list tools (e.g., list_databases, list_projects). Usage context is implied but not elaborated.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_inbox: List inbox threads (A)

Read-only

Inspect

List the user's email inbox threads (outbound demand-test sends + inbound replies / unsubscribes / complaints / bounces). Filter by tag.

ParametersJSON Schema

Name	Required	Description
`limit`	No	max threads to return, ≤200
`filter`	No	default all
`offset`	No	number of threads to skip (pagination)
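
The `limit` and `offset` parameters point to offset-based pagination. A minimal sketch of walking the whole inbox follows. It assumes a hypothetical `call_tool(name, arguments)` helper provided by your MCP client, and it assumes the decoded result is a list of threads; the output schema is not shown here, so both are illustrative rather than the server's documented behavior.

```python
from typing import Any, Callable, Iterator

# Hypothetical helper type: invokes an MCP tool by name and returns its
# decoded result. It is an assumption, not part of shiply or any MCP SDK.
CallTool = Callable[[str, dict[str, Any]], list[dict[str, Any]]]

MAX_LIMIT = 200  # the schema caps `limit` at 200


def iter_inbox_threads(call_tool: CallTool, page_size: int = 100) -> Iterator[dict[str, Any]]:
    """Yield inbox threads page by page until a short page signals the end."""
    page_size = max(1, min(page_size, MAX_LIMIT))  # clamp to the documented cap
    offset = 0
    while True:
        page = call_tool("list_inbox", {"limit": page_size, "offset": offset})
        yield from page
        if len(page) < page_size:
            break  # fewer threads than requested: nothing left to fetch
        offset += len(page)
```

Stopping on a short page avoids needing a total count, at the cost of one extra call when the inbox size is an exact multiple of the page size.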

Output Schema

ParametersJSON Schema

Name	Required	Description
`result`	Yes

Tool Definition Quality

A3.9/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, so the description's statement of listing is consistent. Beyond that, the description does not disclose additional behavioral traits (e.g., pagination behavior, rate limits) but adds context on thread types. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two short, front-loaded sentences: the first conveys the essential purpose and the second names the filter capability. Every word adds value with no redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the existence of an output schema, the description does not need to explain return values. It sufficiently covers the tool's scope, filter, and context, making it complete for a list tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents all parameters. The description mentions 'Filter by tag' but the schema's filter parameter with enum values already covers that. No additional meaning is added beyond what the schema provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states that the tool lists the user's email inbox threads, specifying the types of threads (outbound demand-test sends, inbound replies, unsubscribes, complaints, bounces). This specific verb-resource-scope combination distinguishes it from siblings like list_site_inbox.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies the tool is for the user's own email inbox but does not explicitly state when to use it versus alternatives such as list_site_inbox or list_test_inbox_addresses. It gives no exclusion criteria or guidance on when not to use it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_listings: List my marketplace listings (A)

Read-only

Inspect

Return every marketplace listing the seller owns (any status: draft, live, paused, sold). Includes site slug and current price. Use to see what's for sale across the account.

ParametersJSON Schema

Name	Required	Description	Default
No parameters

Output Schema

ParametersJSON Schema

Name	Required	Description
`result`	Yes

Tool Definition Quality

A4.4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds context beyond the readOnlyHint annotation by specifying that it returns listings of any status and includes site slug and price. It does not contradict annotations and provides a useful behavioral summary.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three sentences, front-loaded with the main action and details. Every sentence adds value, with no redundancy or unnecessary words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no parameters and presence of an output schema, the description is reasonably complete. It explains what the tool returns and its purpose. Minor omission: no mention of pagination or ordering, but likely not needed for a simple list-all tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With zero parameters and 100% schema description coverage, the baseline score of 4 applies. The description adds no parameter information, but none is needed.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool returns every marketplace listing the seller owns, enumerating statuses (draft, live, paused, sold) and specifying included fields (site slug, current price). It effectively distinguishes the tool from siblings like create_listing or list_my_orders by focusing on marketplace listings rather than other entities.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description says 'Use to see what's for sale across the account,' providing clear context for when to use the tool. While it does not explicitly exclude alternatives or state when not to use it, the purpose is sufficiently clear given sibling tool names.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_mailbox_contacts: List a mailbox's contacts (A)

Read-only

Inspect

List captured contacts for a (site, collection) mailbox, optionally filtered by status (signed_up/confirmed/unsubscribed). Returns email, status, confirmedAt, createdAt, and captured fields.

ParametersJSON Schema

Name	Required	Description
`slug`	Yes	site slug the mailbox belongs to
`status`	No	filter contacts by status
`collection`	Yes	mailbox collection name
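
Over MCP, a call to this tool is a JSON-RPC `tools/call` request. The sketch below builds that payload and checks `status` against the three values the description names. The slug and collection in the usage example are made up, and rejecting other status values on the client side is an assumption about what the server accepts.

```python
from typing import Any, Optional

# Status values named in the tool description.
VALID_STATUSES = {"signed_up", "confirmed", "unsubscribed"}


def build_list_contacts_request(
    slug: str,
    collection: str,
    status: Optional[str] = None,
    request_id: int = 1,
) -> dict[str, Any]:
    """Build a JSON-RPC 2.0 `tools/call` payload for list_mailbox_contacts."""
    if status is not None and status not in VALID_STATUSES:
        raise ValueError(f"status must be one of {sorted(VALID_STATUSES)}")
    arguments: dict[str, Any] = {"slug": slug, "collection": collection}
    if status is not None:
        arguments["status"] = status  # optional filter, omitted when unset
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "list_mailbox_contacts", "arguments": arguments},
    }
```

For example, `build_list_contacts_request("my-site", "newsletter", status="confirmed")` yields a payload whose `arguments` carry all three fields, while leaving `status` unset requests contacts in every status.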

Output Schema

ParametersJSON Schema

Name	Required	Description
`result`	Yes

Tool Definition Quality

A3.6/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true, so the description is consistent. It adds the return fields (email, status, etc.), but does not disclose any other behavioral traits like pagination, rate limits, or ordering. With annotations covering the read-only nature, a 3 is appropriate.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two concise sentences that front-load the purpose and then list the returned fields. Every word is informative without being verbose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given that an output schema exists, the description does not need to explain return values, but it does (email, status, etc.). It lacks mention of pagination, sorting, or error conditions, but the presence of output schema and 100% parameter coverage makes it fairly complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so parameters are already well-documented. The description lists the status enum values and mentions optional filtering, but does not add meaning beyond what the schema already provides for slug and collection.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('list'), the resource ('captured contacts for a (site, collection) mailbox'), and an optional filter ('by status'). It is specific enough to distinguish from sibling list tools like list_inbox or list_suppressions.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus alternatives, nor any conditions or prerequisites. The description only mentions optional filtering, but no context for when this tool is appropriate.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_my_orders: List orders where I am the buyer (A)

Read-only

Inspect

Return every marketplace order the user PURCHASED. Most recent first. Shows the acquired site, paid amount, and whether the order is still inside the 30-day refund window. Use to recap what the user owns by purchase.

ParametersJSON Schema

Name	Required	Description	Default
`limit`	No	max orders to return (default 100, max 500)

Output Schema

ParametersJSON Schema

Name	Required	Description
`result`	Yes

Tool Definition Quality

A4.1/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true, so the description adds value by noting the refund window and default sorting. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four short sentences, front-loaded with the main action, no fluff.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple list tool with one optional parameter and an output schema, the description fully covers behavior, return fields, and purpose.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% for the single parameter 'limit', and the description does not add extra semantic information beyond what the schema provides; baseline score applies.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states it returns marketplace orders purchased by the user, ordered most recent first, and lists specific fields (site, amount, refund window). Distinguishes from siblings like list_my_sales by specifying 'buyer' role in title.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides a use case ('recap what the user owns by purchase') but does not mention when not to use it or alternatives like list_my_sales for the seller side.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_my_sales: List orders where I am the seller (Grade A)

Read-only

Return every marketplace order for sites the user sold (incl. pending, paid, refunded, failed, disputed). Most recent first. Use to surface revenue + which orders are still inside the 30-day refund window (refundExpiresAt > now).

ParametersJSON Schema

Name	Required	Description	Default
`limit`	No	max orders to return (default 100, max 500)
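
As a rough sketch of how a client might call this tool and apply the refund-window check, the snippet below builds an MCP `tools/call` request and filters orders by `refundExpiresAt > now`. The tool name, the `limit` bounds, and the `refundExpiresAt` field come from the listing above. The helper names, the client-side clamping, and the result shape (a list of dicts with an ISO 8601 `refundExpiresAt` string) are assumptions for illustration only.

```python
from datetime import datetime, timezone


def build_list_my_sales_request(limit=100, request_id=1):
    """Build a JSON-RPC `tools/call` payload for list_my_sales (hypothetical helper)."""
    # The listing documents default 100 and max 500; clamping on the client is an assumption.
    limit = max(1, min(int(limit), 500))
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "list_my_sales", "arguments": {"limit": limit}},
    }


def orders_in_refund_window(orders, now=None):
    """Keep orders whose refundExpiresAt is still in the future (assumed ISO 8601 strings)."""
    now = now or datetime.now(timezone.utc)
    return [o for o in orders if datetime.fromisoformat(o["refundExpiresAt"]) > now]
```

Passing `now` explicitly keeps the filter deterministic, which is convenient when replaying logged tool results.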

Output Schema

ParametersJSON Schema

Name	Required	Description
`result`	Yes

Tool Definition Quality

A4.3/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate readOnlyHint=true, and description aligns with read-only behavior. It adds details beyond annotations: orders returned regardless of status, sorted most recent first, and the refundExpiresAt field for refund window checks.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three short sentences, no wasted words. Front-loaded with core purpose and scope, then ordering, then usage guidance.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given one parameter, a readOnlyHint annotation, and an output schema, the description fully covers purpose, ordering, which order statuses are included, and the practical use case (refund window).

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema covers the limit parameter with 100% description coverage (default and max). The tool description adds no new meaning beyond what's in the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly specifies the tool returns marketplace orders where the user is the seller, including all statuses (pending, paid, refunded, failed, disputed). It distinguishes itself from sibling tools like 'list_my_orders' by focusing on seller orders.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides context for when to use: 'Use to surface revenue + which orders are still inside the 30-day refund window.' However, it does not explicitly state when not to use or mention alternative tools like 'list_my_orders'.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_project_files: List files uploaded to a project (Grade A)

Read-only

Return the customer-uploaded files for one project (path, size, contentType, createdAt). Empty when no drive folder exists yet (no uploads). Use to inspect what intake assets the customer attached.

ParametersJSON Schema

Name	Required	Description	Default
`id`	Yes	project id whose uploaded files to list

Output Schema

ParametersJSON Schema

Name	Required	Description
`result`	Yes

Tool Definition Quality

A4.3/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true. Description adds that it returns a list of files with specific fields and that it returns empty when no drive folder exists. No contradictions. Provides additional context beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences with no wasted words. Front-loaded with core purpose and return fields, then edge case, then use case. Efficient and clear.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple list tool with one parameter, readOnly annotation, and an output schema, the description covers purpose, return fields, edge case, and use case. Complete for the complexity level.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% and parameter description in schema is adequate. Description does not add extra meaning beyond the schema's 'project id whose uploaded files to list'. Baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description states specific verb 'Return', resource 'customer-uploaded files for one project', and lists returned fields. Distinct from siblings like list_projects or drive_list_files by focusing on project-level uploaded files and mentions 'intake assets'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states use case: 'Use to inspect what intake assets the customer attached.' Also notes edge case: 'Empty when no drive folder exists yet.' Lacks explicit alternatives but context from siblings and description is sufficient.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_projects: List my client-intake projects (Grade A)

Read-only

List the dev's customer-intake projects (newest first). Optional filters: status (draft|intake_open|brief_ready|brief_failed|archived), q (case-insensitive match on label or customer email), limit (default 100). Use to triage what's in flight before opening a specific project.

ParametersJSON Schema

Name	Required	Description	Default
`q`	No	case-insensitive match on label or customer email
`limit`	No	max projects to return (default 100)
`status`	No	filter by project status
`clientId`	No	only projects filed under this client
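
The optional filters above can be assembled as in the following sketch. The status values and parameter names are taken from the listing; the helper itself, its client-side validation, and the choice to omit unset filters rather than send nulls are assumptions, not documented server behavior.

```python
VALID_STATUSES = {"draft", "intake_open", "brief_ready", "brief_failed", "archived"}


def list_projects_arguments(status=None, q=None, limit=None, client_id=None):
    """Assemble list_projects arguments, leaving out any filter that was not supplied."""
    if status is not None and status not in VALID_STATUSES:
        raise ValueError(f"unknown status: {status!r}")
    candidates = {"status": status, "q": q, "limit": limit, "clientId": client_id}
    # Dropping unset keys lets the server apply its own defaults (e.g. limit 100).
    return {key: value for key, value in candidates.items() if value is not None}
```

For example, `list_projects_arguments(status="intake_open", q="acme")` yields only the two supplied filters.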

Output Schema

ParametersJSON Schema

Name	Required	Description
`result`	Yes

Tool Definition Quality

A4.3/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations provide readOnlyHint: true, so the description adds value by specifying ordering (newest first) and the scope (dev's projects). While the safety profile is already known, the description clarifies behavior beyond what annotations convey.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three sentences with no wasted words. It front-loads the core action, then lists the optional filters, then closes with usage guidance. Every part earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the presence of an output schema and annotations covering safety, the description sufficiently explains what the tool does, its optional filters, and its typical use case. No critical information is missing for a list endpoint.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so each parameter already has a description. The tool description restates the optional filters and the default for limit, which reads well but adds little meaning beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it lists the dev's customer-intake projects, newest first, which is a specific verb and resource. It distinguishes from siblings like 'get_project' and other 'list_*' tools by explicitly mentioning 'customer-intake' and providing usage context.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description says 'Use to triage what's in flight before opening a specific project,' giving clear when-to-use context. It does not explicitly mention when not to use or list alternatives, but the guidance is sufficient and implies that this is for overview purposes.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_secrets: List secret names for a site (Grade A)

Read-only

List secret names (values not returned) for a site's deployed function.

ParametersJSON Schema

Name	Required	Description	Default
`slug`	Yes	site slug whose secret names to list

Output Schema

ParametersJSON Schema

Name	Required	Description
`result`	Yes

Tool Definition Quality

A4.1/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The annotation indicates readOnlyHint=true, so the description's statement that values are not returned adds useful behavioral context. No contradictions; the description adds value beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence, front-loaded with key information, no wasted words. Highly concise.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description is sufficient for a simple list operation with a single parameter. Output schema exists, so return values are not required in the description.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, and the description adds minimal extra meaning beyond the schema's description of the 'slug' parameter. Baseline score applies.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool lists secret names (not values) for a site's deployed function. The verb 'list' and resource 'secret names' are specific and distinguish it from other secret-related tools like set_secret or remove_secret.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool vs. alternatives. The context of listing vs. setting/removing secrets is implied, but there is no direct statement of usage context or exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_sending_domains: List my sending domains (Grade A)

Read-only

Return every BYO sending domain the user has added (id, domain, fromAddress, status: pending|verified|failed, DNS records). Use to inspect verification state or find the id of a domain to verify/remove.

ParametersJSON Schema

Name	Required	Description	Default
No parameters
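
To illustrate the "find the id of a domain to verify/remove" use case, here is a minimal sketch that scans the returned records. It assumes the tool's `result` is a list of dicts carrying the `id`, `domain`, and `status` fields named in the description; the actual output schema is not shown on this page, and both helper names are hypothetical.

```python
def find_domain_id(domains, domain_name):
    """Return the id of the record whose domain matches (case-insensitive), or None."""
    wanted = domain_name.strip().lower()
    for record in domains:
        if record.get("domain", "").lower() == wanted:
            return record.get("id")
    return None


def ids_with_status(domains, status):
    """Return ids of records in the given status: pending, verified, or failed."""
    return [record["id"] for record in domains if record.get("status") == status]
```

An agent could pass the id found this way to verify_sending_domain or remove_sending_domain.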

Output Schema

ParametersJSON Schema

Name	Required	Description
`result`	Yes

Tool Definition Quality

A4.5/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations declare readOnlyHint=true, confirming it is a safe read operation. The description adds value by detailing the returned fields (status values, DNS records) beyond the annotation, but does not describe pagination or rate limits.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, front-loaded with the action and fields, no wasted words. Every sentence earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no parameters and presence of an output schema (implied by context), the description fully explains what the tool returns and its purpose. No gaps are evident.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

There are zero parameters, and schema description coverage is 100%. The description does not need to add parameter info, and it appropriately omits any.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it returns every BYO sending domain with specific fields (id, domain, fromAddress, status, DNS records). This distinguishes it from sibling tools like add_sending_domain, verify_sending_domain, and remove_sending_domain.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly says 'Use to inspect verification state or find the id of a domain to verify/remove,' providing clear usage guidance. It does not include when-not or alternative tools, but the context is sufficient.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_site_inbox: List a site's email inbox threads (Grade B)

Read-only

Read the email threads (received, sent, web captures) for the agent's sites. Optionally scoped to one site by slug.

ParametersJSON Schema

Name	Required	Description	Default
`slug`	No	limit to one site by slug; omit for all sites

Output Schema

ParametersJSON Schema

Name	Required	Description
`result`	Yes

Tool Definition Quality

B3.2/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, confirming it is a read operation. The description adds 'Read the email threads' which aligns but offers no further behavioral details (e.g., error states, pagination, or rate limits).

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, front-loaded with the core action and scope. Every word adds value; no redundancy or unnecessary detail.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a read tool with one optional parameter and an output schema, the description covers the essential scope (agent's sites, thread types). It could benefit from mentioning ordering or result limits, but is otherwise sufficient.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, and the parameter 'slug' is already documented. The description mentions 'optionally scoped to one site by slug,' which reinforces but does not significantly expand beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it reads email threads for the agent's sites and can be scoped by slug. The title specifies 'site's email inbox threads,' effectively distinguishing it from the sibling 'list_inbox' tool, though it does not explicitly contrast them.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives like 'list_inbox'. It does not specify prerequisites or exclusions, leaving the agent to infer usage without explicit direction.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_sites: List my sites (Grade A)

Read-only

List the sites owned by this API key. Optional clientId filters to one client.

ParametersJSON Schema

Name	Required	Description	Default
`clientId`	No	only sites filed under this client

Output Schema

ParametersJSON Schema

Name	Required	Description
`result`	Yes

Tool Definition Quality

A3.6/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, so the description adds minimal extra behavioral context. It notes ownership and filtering scope but omits details like pagination or limits.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences front-load the core purpose and optional filter. Every word earns its place with no redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the simple listing operation and presence of an output schema, the description adequately covers ownership and optional filter. Minor omission: no mention of pagination or result structure, but output schema likely fills that gap.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema description for clientId is similar to the description (both mention filtering). Since schema coverage is 100%, the description adds no new meaning beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description explicitly states the tool lists sites and specifies ownership context ('owned by this API key'), distinguishing it from other list tools like list_custom_domains or list_site_inbox.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus alternatives such as get_site for details or other list tools. The description only states what it does without exclusion criteria.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_site_variables: List variables attached to a site (Grade A)

Read-only

Names of the variables attached to an owned site's Worker env (values never shown here).

ParametersJSON Schema

Name	Required	Description	Default
`slug`	Yes	owned site slug

Output Schema

ParametersJSON Schema

Name	Required	Description
`result`	Yes

Tool Definition Quality

A4.1/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint=true, and the description reinforces that values are never shown, adding context about Worker env. This is good but doesn't go beyond what annotations hint at.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence, 15 words, perfectly concise. No wasted words. Front-loaded with the resource being listed.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the low complexity (1 parameter, simple list operation), and presence of output schema and annotations, the description is complete. It covers the essential purpose and limitation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The only parameter 'slug' is described in the schema as 'owned site slug'. The description adds no additional meaning beyond that. With 100% schema coverage, score is at baseline 3.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool lists variable names for a site's Worker environment, with the explicit note that values are never shown. This distinguishes it from sibling tools like list_variables or attach_variable. It uses a specific verb and resource.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives. While the purpose is clear, there is no mention of when not to use or when to use sibling tools like list_variables or set_variable.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_suppressions: List suppressed emails / domains (Grade A)

Read-only

Return the user's suppression list — addresses (and full domains) that shiply skips when sending. Bounces, complaints, manual user adds, and AI-detected unsubscribes all land here.

ParametersJSON Schema

Name	Required	Description	Default
No parameters
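
Because the list can hold both individual addresses and full domains, a pre-send check has to test both. The sketch below shows one way to do that, assuming the `result` can be reduced to a flat list of strings (either `user@domain` or a bare domain); that shape and the helper name are assumptions, since the output schema is not shown here.

```python
def is_suppressed(recipient, suppressions):
    """True if the recipient's address or its whole domain appears in the suppression list."""
    address = recipient.strip().lower()
    # rpartition keeps everything after the last '@' as the domain.
    domain = address.rpartition("@")[2]
    entries = {entry.strip().lower() for entry in suppressions}
    return address in entries or domain in entries
```

Normalizing case on both sides avoids missing a match when the stored entry and the recipient differ only in capitalization.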

Output Schema

ParametersJSON Schema

Name	Required	Description
`result`	Yes

Tool Definition Quality

A4.5/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true. The description adds value by specifying that the list includes addresses and full domains, and categorizes the types of suppressions (bounces, complaints, manual adds, AI-detected unsubscribes). No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, both highly informative. No filler words. The first sentence states the core function; the second adds valuable detail about the sources of suppressions. Front-loaded and efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no parameters and the presence of an output schema, the description sufficiently covers what the tool does and what kind of data it returns. It explains the purpose and scope of the suppression list completely for an agent.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has zero parameters, so schema coverage is 100%. The description does not need to explain parameters. It adds context about the content of the returned list, which is appropriate for a parameterless tool.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description uses a specific verb 'Return' and resource 'suppression list', clearly stating it lists emails and domains that are skipped. It enumerates the sources of suppressions (bounces, complaints, etc.), distinguishing it from siblings like add_suppression and remove_suppression.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies when to use it (to view suppressed addresses) but does not explicitly exclude alternatives or mention when not to use it. Given sibling tools for adding/removing suppressions, context makes usage clear, but lacks explicit guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_test_inbox_addresses: List per-test inbox addresses (Grade A)

Read-only

Return every active demand test's reply / inbound address (<slug>@<sitesDomain>). Use this when you need to TELL someone where to email, e.g. drafting a reply or sharing a test's contact address. Mail sent to these aliases lands in /dashboard/inbox tied to the test.

ParametersJSON Schema

Name	Required	Description	Default
No parameters
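
The review of this tool gives the address format as `<slug>@<sitesDomain>`. A small sketch of composing and parsing such an address follows; the helper names are hypothetical and `sites.example` in the usage note is a placeholder, not the real sites domain.

```python
def inbox_address(slug, sites_domain):
    """Compose a test's inbound address from its slug and the sites domain."""
    if not slug or "@" in slug:
        raise ValueError("slug must be non-empty and must not contain '@'")
    return f"{slug}@{sites_domain}"


def slug_of(address):
    """Recover the test slug (the local part) from an inbound address."""
    local, sep, domain = address.partition("@")
    if not (local and sep and domain):
        raise ValueError(f"not an address: {address!r}")
    return local
```

With the placeholder domain, `inbox_address("launch-test", "sites.example")` gives `launch-test@sites.example`, and `slug_of` maps it back to `launch-test`.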

Output Schema

ParametersJSON Schema

Name	Required	Description
`result`	Yes

Tool Definition Quality

A4.5/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations declare readOnlyHint=true. The description adds context that mail sent to these aliases lands in /dashboard/inbox tied to the test, which is useful beyond the annotation. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences that are direct and efficient. Every word adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With no parameters and an output schema present, the description fully covers purpose, usage, and behavioral context for this simple list tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has no parameters (0 params), so baseline is 4. The description doesn't need to add param info.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool returns addresses for active demand tests, specifying the format (<slug>@<sitesDomain>). It distinguishes from sibling tools like list_tests and list_inbox by focusing on providing contact addresses, not listing tests or inbox contents.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

It explicitly says when to use: 'when you need to TELL someone where to email' with examples (drafting a reply or sharing a test's contact address). While it doesn't mention when not to use or alternatives, the guidance is clear and actionable.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_testsList demand testsA

Read-only

List your demand tests with signups + confirmed counts.

ParametersJSON Schema

Name	Required	Description	Default
No parameters

Output Schema

ParametersJSON Schema

Name	Required	Description
`result`	Yes

Tool Definition Quality

A4/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, indicating a safe read operation. The description adds no further behavioral context (e.g., authentication requirements, rate limits).

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence, no wasted words. Clearly conveyed.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Output schema exists, so return format is covered. Description is sufficient for a simple list tool with zero parameters.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

No parameters defined; schema coverage is 100%. Description adds semantic context by indicating the list is scoped to 'your' tests (i.e., authenticated user).

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description specifies the resource (demand tests) and what it returns (signups + confirmed counts), making the purpose clear and distinct from sibling tools like list_projects or list_domains.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool versus alternatives. Usage is implied as listing tests, but no exclusions or alternatives are mentioned.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_unsubscribesList unsubscribesA

Read-only

Shortcut for list_inbox with filter=unsubscribes — shows every thread tagged as an opt-out request.

ParametersJSON Schema

Name	Required	Description	Default
`limit`	No	max threads to return, ≤200
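
Because the description presents this tool as a shortcut for list_inbox with filter=unsubscribes, the two requests below should be interchangeable. This is a minimal Python sketch, assuming the server is reached with standard MCP JSON-RPC `tools/call` requests. The helper names are hypothetical, and the `filter` argument name on list_inbox is inferred from the description's wording rather than from its schema.

```python
import json

MAX_LIMIT = 200  # the parameter table caps `limit` at 200


def tool_call(request_id, name, arguments):
    """Build a JSON-RPC 2.0 `tools/call` request body for an MCP server."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


def list_unsubscribes_request(request_id, limit=None):
    """Shortcut form; `limit` is optional and clamped to the documented cap."""
    arguments = {}
    if limit is not None:
        arguments["limit"] = min(limit, MAX_LIMIT)
    return tool_call(request_id, "list_unsubscribes", arguments)


def list_inbox_request(request_id):
    """Long form; the `filter` argument name is an assumption (see lead-in)."""
    return tool_call(request_id, "list_inbox", {"filter": "unsubscribes"})


print(json.dumps(list_unsubscribes_request(1, limit=500)))
print(json.dumps(list_inbox_request(2)))
```

Clamping on the client side avoids a round trip that the server would presumably reject or truncate; how the server actually treats an oversized `limit` is not stated on this page.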

Output Schema

ParametersJSON Schema

Name	Required	Description
`result`	Yes

Tool Definition Quality

A4.1/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, and the description adds that it shows threads tagged as opt-out requests. This confirms read-only behavior but doesn't add significant depth beyond that.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, well-structured sentence that conveys the purpose clearly with no wasted words. It is appropriately concise.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity (one parameter, output schema present), the description is complete. It explains the filter and relationship to list_inbox, meeting all needs.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Only one parameter (limit) with full schema documentation (min/max, type). The description does not add any additional meaning beyond what the schema provides, so a baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it is a shortcut for list_inbox with a specific filter, showing threads tagged as opt-out requests. This distinguishes it from the similar sibling tool list_inbox.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly positions it as a shortcut for list_inbox with a specific filter, providing clear context for when to use it. While it doesn't explicitly state when not to use it, the sibling comparison implies alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_variablesList variablesA

Read-only

List the encrypted variables. Values are masked unless reveal=true.

ParametersJSON Schema

Name	Required	Description	Default
`reveal`	No	return plaintext values instead of masked

Output Schema

ParametersJSON Schema

Name	Required	Description
`result`	Yes

Tool Definition Quality

A3.8/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds behavioral context beyond the readOnlyHint annotation by explaining that values are masked by default and can be revealed with the reveal parameter. This clarifies the tool's behavior without contradicting annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is exceptionally concise with two short sentences that convey the essential information. It is front-loaded and contains no unnecessary wording.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple read-only list tool with one optional boolean parameter and an output schema, the description is largely complete. It explains the core functionality and the masking behavior, though it could mention that it lists all variables rather than a subset.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% and the parameter description in the schema already explains the reveal parameter. The description adds no new semantic information beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'list' and the resource 'encrypted variables', specifying both the action and the object. It distinguishes itself from sibling tools like list_secrets and list_site_variables by emphasizing encryption.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus alternatives (e.g., list_secrets, list_site_variables). The description lacks explicit context for selection.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_versionsList a site's deploysA

Idempotent

List a site's finalized deploys newest-first (id, createdAt, isLive, fileCount, bytes), capped at 20. Pair with rollback_site: pick a version id from here and re-point the site to it.

ParametersJSON Schema

Name	Required	Description	Default
`slug`	Yes	site slug to list deploys for
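
The pairing with rollback_site can be sketched as follows. This Python example assumes the list_versions result is a newest-first list of objects carrying the fields named in the description (id, createdAt, isLive, fileCount, bytes). The sample data is made up, and the rollback_site argument names (`slug`, `versionId`) are hypothetical because that tool's schema is not shown here.

```python
def pick_rollback_target(versions):
    """Return the id of the newest deploy that is not currently live, or None."""
    for version in versions:  # list_versions returns deploys newest-first
        if not version["isLive"]:
            return version["id"]
    return None


# Hypothetical list_versions result, shaped after the fields in the description.
versions = [
    {"id": "v3", "createdAt": "2026-08-02T10:00:00Z", "isLive": True, "fileCount": 12, "bytes": 48213},
    {"id": "v2", "createdAt": "2026-08-01T09:30:00Z", "isLive": False, "fileCount": 12, "bytes": 47990},
    {"id": "v1", "createdAt": "2026-07-30T16:45:00Z", "isLive": False, "fileCount": 11, "bytes": 45102},
]

target = pick_rollback_target(versions)

# Argument names for rollback_site are assumed, not taken from its schema.
rollback_args = {"slug": "my-site", "versionId": target}
print(rollback_args)
```

Since the list is capped at 20 entries, a deploy older than the twentieth would not be reachable through this selection.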

Output Schema

ParametersJSON Schema

Name	Required	Description
`result`	Yes

Tool Definition Quality

A4.5/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Adds specificity beyond annotations: says deploys are 'finalized', sorted newest-first, and capped at 20. Annotations already declare idempotentHint, so the description provides additional behavioral detail.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, zero waste. First sentence packs action, resource, ordering, fields, and limit. Second sentence links to a sibling. Information-dense and well-structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the simplicity (1 param, output schema present), the description covers purpose, behavior, limitation, and usage context. No gaps remain for an agent to select and invoke this tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema has 100% coverage for the single parameter 'slug' with its own description. The tool description doesn't add new parameter-level info beyond what the schema already provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states 'List a site's finalized deploys' with ordering (newest-first) and a 20-item cap. Differentiates from sibling list tools by explicitly pairing with rollback_site.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly instructs to 'Pair with rollback_site: pick a version id from here and re-point the site to it.' Provides a clear use case and direct tie to a sibling tool.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

mark_thread_readMark a thread readA

Idempotent

Zero the unread counter for a thread. Useful after the agent has read but not acted. read_thread already calls this implicitly; use this explicitly when you want to clear unread without re-fetching the thread body.

ParametersJSON Schema

Name	Required	Description	Default
`threadId`	Yes	thread id from list_inbox to mark read
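
The choice between this tool and read_thread comes down to whether the thread body is needed. A small Python sketch of that decision, with a hypothetical helper name and a made-up thread id:

```python
def next_thread_call(thread_id, need_body):
    """Choose the tool call for a thread.

    read_thread returns the messages and clears the unread counter implicitly;
    mark_thread_read only clears the counter and skips fetching the body.
    """
    name = "read_thread" if need_body else "mark_thread_read"
    return {"name": name, "arguments": {"threadId": thread_id}}


# "thr_123" is a placeholder; real ids come from list_inbox.
print(next_thread_call("thr_123", need_body=False))
print(next_thread_call("thr_123", need_body=True))
```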

Output Schema

ParametersJSON Schema

Name	Required	Description
`result`	Yes

Tool Definition Quality

A4.5/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description explains that this tool clears the unread counter without re-fetching the thread body, which adds behavioral context beyond the idempotentHint annotation. It doesn't contradict any annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences long, with no redundant information. It is front-loaded with the core action and efficiently distinguishes usage.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple tool with one parameter, the description is complete. It covers the action, usage guidance, and relationship to a sibling tool. Because an output schema is present (as indicated in the context signals), return values need no further explanation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% and the parameter 'threadId' has a clear description. The description does not add additional meaning beyond the schema, so baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description uses a specific verb-resource pair ('Zero the unread counter for a thread') and explicitly distinguishes from the sibling tool 'read_thread', which already performs this action implicitly. This makes the purpose very clear.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description states when to use this tool ('after the agent has read but not acted') and when not to (use 'read_thread' instead if you want to also fetch thread body). It provides explicit alternatives and context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

promote_sitePromote a preview to a production siteA

Idempotent

Copy the EXACT live bytes of one owned site (srcSlug — your preview) into another owned site (destSlug — your production site / custom domain), no rebuild. Dest keeps its slug, domains, and access settings; only the served bytes change.

ParametersJSON Schema

Name	Required	Description	Default
`srcSlug`	Yes
`destSlug`	Yes
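
Since the schema leaves both parameters undescribed, a short Python sketch of the argument object may help. The slugs are made up, and the same-slug check is a client-side sanity check added for illustration, not documented server behavior.

```python
def promote_site_args(src_slug, dest_slug):
    """Build promote_site arguments: copy the live bytes of src into dest."""
    if src_slug == dest_slug:
        # Assumed guard: promoting a site onto itself would be a no-op at best.
        raise ValueError("srcSlug and destSlug must name different sites")
    return {"srcSlug": src_slug, "destSlug": dest_slug}


# Hypothetical slugs: a preview site promoted onto the production site.
print(promote_site_args("acme-preview", "acme"))
```

Only the served bytes of the destination change; its slug, domains, and access settings stay as they were, so nothing else needs to be passed.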

Output Schema

ParametersJSON Schema

Name	Required	Description
`result`	Yes

Tool Definition Quality

A4.6/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations provide idempotentHint. Description adds behavioral detail: 'no rebuild', 'dest keeps its slug, domains, and access settings; only the served bytes change.' This goes beyond schema and annotations, though auth needs or rate limits are omitted.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is two sentences, front-loaded with the key action and details, with zero waste. Every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity (2 params, no nested objects, presence of output schema), the description covers all essential aspects: what is copied, what remains unchanged, and the no-rebuild property. No gaps remain.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema has 0% description coverage, but description clarifies srcSlug as 'your preview' and destSlug as 'your production site / custom domain', adding semantic meaning to the otherwise bare 'string' type.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the action (copy exact live bytes), the resources (owned sites: srcSlug preview and destSlug production), and the outcome (no rebuild). It distinguishes from siblings like publish_site by emphasizing exact byte copy without rebuild.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Description implies usage for promoting a preview to production without rebuild. It does not explicitly state when not to use or mention alternatives (e.g., publish_site), but the context of 'preview to production' is clear enough for an agent.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

publish_from_drivePublish a drive as a siteA

Idempotent

Snapshot a Drive (or a prefix of it) into a new live site at <slug>.shiply.now. Files copied server-side.

ParametersJSON Schema

Name	Required	Description
`title`	No	display title for the new site
`prefix`	No	only snapshot files under this path prefix
`driveId`	Yes	drive id (drv_…) or "default"

Output Schema

ParametersJSON Schema

Name	Required	Description
`result`	Yes

Tool Definition Quality

A3.6/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds behavioral context beyond the idempotentHint annotation, notably that files are 'copied server-side'. This provides useful insight into the operation's nature.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise at two sentences, with no wasted words. It front-loads the key action and constraints.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the presence of an output schema and moderate parameter complexity, the description sufficiently covers the tool's main action. However, it omits details about the return value or the meaning of <slug>, which are left to the output schema.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the baseline is 3. The description does not add extra meaning to the parameters beyond what the schema already provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states that the tool snapshots a Drive to create a live site, using a specific verb and resource. However, it does not differentiate itself from sibling tools like publish_site or promote_site, which may have similar purposes.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus alternatives such as publish_site or pull_site. The description lacks context for appropriate usage scenarios.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

publish_sitePublish a siteA

Idempotent

Publish files to the web → live URL at <slug>.shiply.now. UPDATING: never create a new site for changes; re-call with claimToken (anonymous sites) or slug (sites you own with a Bearer key) and the SAME URL gets the new version. Unchanged files are hash-skipped server-side, so re-publishing (including retrying a failed publish) is cheap; always update the same site rather than creating a new one. Works WITHOUT auth (anonymous: 24h lifetime, returns claimToken/claimUrl; SAVE THEM). With a Bearer shp_ key sites are permanent. ≤50 files / 2 MB inline; bigger: REST flow per https://shiply.now/llms.txt. index.html serves at /. spaMode for client-side routing.

ParametersJSON Schema

Name	Required	Description
`slug`	No	UPDATE an existing site you own (requires Bearer key)
`files`	Yes	site files; index.html required for a homepage
`title`	No	display title for the site
`client`	No	optional: file this under a client (the publish/site is grouped in their customer view)
`spaMode`	No	serve index.html for unknown paths (client-side routing)
`claimToken`	No	UPDATE an existing anonymous site (from the original publish result)
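
The update rule and the inline size limits can be sketched together in Python. Several details here are assumptions: the shape of `files` (a list of path/content objects), reading "2 MB" as 2 MiB, and treating `slug` and `claimToken` as mutually exclusive. The helper names and the claim token value are hypothetical.

```python
MAX_INLINE_FILES = 50
MAX_INLINE_BYTES = 2 * 1024 * 1024  # "2 MB" read as 2 MiB; an assumption


def fits_inline(files):
    """files: mapping of path -> text content (shape assumed for illustration)."""
    total = sum(len(content.encode("utf-8")) for content in files.values())
    return len(files) <= MAX_INLINE_FILES and total <= MAX_INLINE_BYTES


def publish_args(files, slug=None, claim_token=None, spa_mode=False):
    """Build publish_site arguments; pass slug or claim_token to update in place."""
    if slug and claim_token:
        raise ValueError("use slug (owned site) or claimToken (anonymous site), not both")
    if not fits_inline(files):
        raise ValueError("too large for inline publish; use the REST flow instead")
    args = {"files": [{"path": path, "content": content} for path, content in files.items()]}
    if slug:
        args["slug"] = slug
    if claim_token:
        args["claimToken"] = claim_token
    if spa_mode:
        args["spaMode"] = True
    return args


site = {"index.html": "<h1>Hello</h1>"}

# First anonymous publish: no slug, no claimToken.
first = publish_args(site)

# Later change: reuse the saved claimToken so the same URL gets the new version.
site["index.html"] = "<h1>Hello again</h1>"
update = publish_args(site, claim_token="claim-token-from-first-publish")
print(sorted(update))
```

Routing every change through the same helper with the saved token or slug is what keeps an agent from creating a fresh site per edit, which the description warns against.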

Output Schema

ParametersJSON Schema

Name	Required	Description
`result`	Yes

Tool Definition Quality

A4.8/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations declare idempotentHint=true, and the description reinforces this by explaining that re-publishing is cheap due to hash-skipping. It also discloses site lifetime (24h anonymous, permanent with a Bearer key) and features such as index.html serving at / and spaMode. However, it does not mention rate limits or destructive side effects, even though an update overwrites the served files.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is front-loaded with core purpose, then organizes key behaviors in logical order: update behavior, auth modes, size limits, routing mode. Every sentence adds value without repetition, making it dense yet clear.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (6 params, nested objects, output schema exists), the description covers all essential aspects: URL format, update semantics, auth requirements, file limits, and special routing. It references external documentation for large files, ensuring completeness without overloading.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, but the description adds significant context: slug requires a Bearer key to update an owned site, claimToken comes from a prior anonymous publish, files must include index.html for a homepage, and spaMode enables client-side routing. This substantially aids correct parameter usage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states that the tool publishes files to a live URL at <slug>.shiply.now and explains both creation and update scenarios. It distinguishes between anonymous and authenticated publishing, and although it doesn't explicitly compare itself to siblings, the verb 'publish' and the instruction 'never create a new site for changes' differentiate it from tools like delete_site or duplicate_site.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit when-to-use guidance: publish files to a live URL, update an existing site by re-calling with claimToken or slug. Also includes when-not-to-use: 'never create a new site for changes', and size limits: ≤50 files / 2 MB inline, bigger use REST flow. Auth modes are clearly explained: anonymous vs Bearer key.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

pull_sitePull a site's current filesA

Idempotent

Download the current files of a site you own (or created via a platform connection) so you can edit and republish to the same slug with publish_site. Static sites return editable source; framework/SSR sites return the built bundle (.shiply/bundle/*), not original source.

ParametersJSON Schema

Name	Required	Description	Default
`slug`	Yes	site slug to pull files for
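
Whether pulled files are safe to edit by hand depends on which kind of site they came from. A minimal Python sketch of that check, assuming the result can be reduced to a list of file paths (the result shape is not shown on this page) and using made-up file names:

```python
BUNDLE_PREFIX = ".shiply/bundle/"  # path prefix named in the tool description


def is_built_bundle(paths):
    """True when any pulled path sits under the built-bundle prefix."""
    return any(path.startswith(BUNDLE_PREFIX) for path in paths)


# Hypothetical pulled file lists, for illustration only.
static_site = ["index.html", "css/site.css"]
ssr_site = [".shiply/bundle/server.js", ".shiply/bundle/assets/app.css"]

for name, paths in (("static", static_site), ("ssr", ssr_site)):
    kind = "built bundle, not original source" if is_built_bundle(paths) else "editable source"
    print(f"{name}: {kind}")
```

An agent that detects a built bundle would likely want to ask for the original source rather than patch compiled output before republishing with publish_site.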

Output Schema

ParametersJSON Schema

Name	Required	Description
`result`	Yes

Tool Definition Quality

A4.4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds detail beyond the idempotentHint annotation by specifying the return format (editable source vs. built bundle) and noting the tool works for sites owned or created via platform connections. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences long, front-loaded with the action, and every part adds value. There is no redundancy or unnecessary information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity (one parameter, output schema exists), the description covers the main purpose and distinguishes behavior by site type. It could mention error conditions or file structure specifics, but it is largely complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The tool has one parameter 'slug' with 100% schema coverage. The description adds context that the site must be owned or created via a platform connection, which is not in the schema description, thus adding meaningful guidance.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description uses a specific verb ('Download') and resource ('a site's current files'), clearly stating the action. It also distinguishes behavior between static sites (editable source) and framework/SSR sites (built bundle), differentiating from any sibling tools that involve site management.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides context for when to use: after pulling, you can edit and republish with publish_site. It implicitly guides usage but does not explicitly state when not to use or list alternatives. Given no direct sibling competitor, this is sufficient.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

read_threadRead an inbox threadA

Read-only

Return the thread metadata + all messages in chronological order. Use list_inbox first to get a threadId.

ParametersJSON Schema

Name	Required	Description	Default
`threadId`	Yes	thread id from list_inbox

Output Schema

ParametersJSON Schema

Name	Required	Description
`result`	Yes

Tool Definition Quality

A4.3/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotation readOnlyHint=true already signals safe read. Description adds that it returns metadata and messages in chronological order, which aligns with read-only behavior. No contradictions; missing details like pagination or rate limits, but acceptable given annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two concise sentences: first states what the tool returns, second provides usage guidance. No extraneous words; every sentence is purposeful.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Although an output schema exists (not shown), the description still covers the key return items (metadata and messages in order). With a single required parameter and a clear prerequisite, this is fully sufficient for an agent to use the tool correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with threadId description 'thread id from list_inbox'. Description repeats this same phrase, adding no new semantic value. Baseline 3 is appropriate since schema already documents the parameter well.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states 'Return the thread metadata + all messages in chronological order', which is a specific verb+resource. It distinguishes from siblings like list_inbox (which retrieves thread IDs) and other thread actions (archive, mark_read, etc.).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly advises to 'Use list_inbox first to get a threadId', providing clear prerequisite context. Does not include when-not-to-use or alternatives, but for a read tool with a single input, this is sufficient.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

refund_order: Refund one of my sales (A)

Destructive

Issue a full refund on a paid order the user sold. Must be inside the 30-day refund window (server enforces). Triggers a Stripe refund; the webhook flips the order to 'refunded' and reverts site ownership to the seller. Idempotent on already-refunded orders.

Parameters

Name	Required	Description	Default
`reason`	No	optional refund reason
`orderId`	Yes	id of the paid order to refund (from list_my_sales)
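
A minimal sketch of how a client might prepare this call. It builds a standard MCP `tools/call` JSON-RPC request and adds an optional client-side check of the 30-day window. The helper names, the `paid_at` timestamp, and the example order id are assumptions for illustration; the server remains the authority on whether a refund is allowed.

```python
import json
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, Optional

# 30 days comes from the tool description; the server enforces the real limit.
REFUND_WINDOW = timedelta(days=30)


def within_refund_window(paid_at: datetime, now: datetime) -> bool:
    """Client-side pre-check only (assumed helper); the server decides."""
    return now - paid_at <= REFUND_WINDOW


def build_refund_request(order_id: str, reason: Optional[str] = None,
                         request_id: int = 1) -> Dict[str, Any]:
    """Build an MCP tools/call request for refund_order."""
    arguments: Dict[str, Any] = {"orderId": order_id}
    if reason:  # `reason` is optional, so leave it out when empty
        arguments["reason"] = reason
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "refund_order", "arguments": arguments},
    }


if __name__ == "__main__":
    # Hypothetical dates and order id, for illustration only.
    paid = datetime(2026, 7, 20, tzinfo=timezone.utc)
    now = datetime(2026, 8, 2, tzinfo=timezone.utc)
    if within_refund_window(paid, now):
        print(json.dumps(build_refund_request("order_123", "customer request")))
```

Because the tool is described as idempotent on already-refunded orders, retrying the same request after a timeout should be safe under that description.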

Output Schema

Name	Required	Description
`result`	Yes

Tool Definition Quality

A4.4/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Beyond the 'destructiveHint' annotation, the description discloses triggering a Stripe refund, webhook flipping order status, reverting site ownership, and idempotency on already-refunded orders. These details significantly enhance understanding of side effects.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four front-loaded sentences with zero redundancy. Each adds unique value: action, condition, side effects, and idempotency. Efficient and clear.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers essential aspects: full refund, window, side effects, idempotency. With output schema present, return details are unnecessary. Lacks explicit mention of synchronicity or error cases, but overall sufficient.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with clear parameter descriptions. The description does not add new parameter-specific information beyond what the schema provides, so baseline score applies.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states that it issues a full refund on a paid order the user sold, specifying the resource and action distinctly. It stands out among siblings as the only refund tool, leaving no ambiguity.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description mentions the 30-day refund window and idempotency, giving context for when it's valid. However, it does not explicitly state when to use versus alternatives or provide exclusions, though no direct competitors exist.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

regenerate_brief: Re-run AI brief generation (A)

Re-run the MiniMax/Anthropic brief generator from the project's current intake_responses and persist the result. Flips status to brief_ready on success or brief_failed on error. Use after the customer edits answers post-submit, or when the first AI attempt failed.

Parameters

Name	Required	Description	Default
`id`	Yes	project id to re-run the brief for
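
The two documented triggers (edited answers, or a failed first attempt) can be sketched as a small decision helper. The status strings come from the description above; the function names and the example project id are assumptions, not part of the server's API.

```python
from typing import Any, Dict

# Status values named in the tool description.
BRIEF_READY = "brief_ready"
BRIEF_FAILED = "brief_failed"


def should_regenerate(status: str, answers_edited_after_submit: bool) -> bool:
    """Mirror the two documented triggers: a failed attempt or edited answers."""
    return status == BRIEF_FAILED or answers_edited_after_submit


def regenerate_brief_call(project_id: str) -> Dict[str, Any]:
    """Tool name plus arguments, ready to pass to an MCP client's tools/call."""
    return {"name": "regenerate_brief", "arguments": {"id": project_id}}


if __name__ == "__main__":
    # Hypothetical project id, for illustration only.
    if should_regenerate(BRIEF_FAILED, answers_edited_after_submit=False):
        print(regenerate_brief_call("proj_42"))
```

Since the annotations mark the tool as non-idempotent, a guard like this helps avoid re-running the generator when nothing has changed.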

Output Schema

Name	Required	Description
`result`	Yes

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses side effects: flips status to brief_ready or brief_failed. Annotations already indicate non-idempotent and open-world; description adds specific behavioral context without contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three succinct sentences: the first states the action, the second the status change, and the third gives usage guidance. No wasted words, front-loaded with key information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple tool with one parameter and known side effects, the description covers the main purpose and usage. With an expected output schema (not shown), return values are not required. Could mention input constraints like required permissions, but overall adequate.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The single parameter 'id' is already well-described in the input schema (100% coverage). The description adds no further semantic meaning beyond what the schema provides, so baseline score applies.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states the tool regenerates an AI brief from intake_responses and persists the result. Distinguishes from sibling tools like update_brief and create_project by specifying the AI-driven regeneration from existing responses.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly says when to use: after customer edits answers or when first AI attempt failed. Provides clear context for usage, though it does not explicitly mention alternatives from the sibling list.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

remove_cron: Remove a cron trigger (A)

Destructive

Remove a cron trigger from a site's deployed function.

Parameters

Name	Required	Description	Default
`path`	Yes	URL path of the cron to remove
`slug`	Yes	site slug whose cron to remove
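
The description does not say what happens when the cron does not exist, so a cautious client could list first and remove only on a match. This is a sketch under stated assumptions: `call_tool` is a hypothetical callable wrapping an MCP client, and both the `list_crons` argument (`slug`) and its result shape (a list of objects with a `path` field) are guesses that this listing does not confirm.

```python
from typing import Any, Callable, Dict

# Hypothetical signature: call_tool(tool_name, arguments) -> parsed result.
CallTool = Callable[[str, Dict[str, Any]], Any]


def remove_cron_if_present(call_tool: CallTool, slug: str, path: str) -> bool:
    """Remove the cron at `path` only if list_crons reports it.

    Returns True when remove_cron was called, False when no match was found.
    """
    # ASSUMPTION: list_crons takes {"slug": ...} and returns [{"path": ...}, ...].
    crons = call_tool("list_crons", {"slug": slug})
    if not any(cron.get("path") == path for cron in crons):
        return False
    call_tool("remove_cron", {"slug": slug, "path": path})
    return True
```

Passing the client in as a callable keeps the helper testable with a stub and independent of any particular MCP SDK.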

Output Schema

Name	Required	Description
`result`	Yes

Tool Definition Quality

A3.5/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations declare destructiveHint=true, which the description does not contradict. The description adds that removal is from a 'site's deployed function,' but does not disclose any other behavioral traits such as reversibility or permission requirements. With annotations carrying the safety signal, the description provides minimal added transparency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence, no fluff, front-loaded with verb and resource. Perfectly concise for its purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description is adequate for a simple removal tool, but it lacks mention of prerequisites (e.g., existence of cron), side effects, or success/error behavior. Since an output schema exists, return values are covered, but operational context is thin.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema has 100% description coverage for both parameters, so the description offers no additional meaning beyond what is already in the schema. Baseline score of 3 applies.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action (remove), the resource (cron trigger), and the context (from a site's deployed function). It is specific and distinguishes from sibling tools like set_cron and list_crons.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives like set_cron or list_crons. No prerequisites or conditions are mentioned, leaving the agent without decision support.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

remove_custom_domain: Remove a custom domain (A)

Destructive

Remove a registered custom domain and all its subdomains; they stop serving immediately.

Parameters

Name	Required	Description	Default
`domain`	Yes	registered custom domain to remove, e.g. example.com

Output Schema

Name	Required	Description
`result`	Yes

Tool Definition Quality

A4/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds 'they stop serving immediately' beyond the destructiveHint annotation, but does not disclose other behavioral traits such as irreversibility, required permissions, or any side effects. The annotation already marks it as destructive, so the added value is moderate.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, concise sentence that front-loads the action and consequence. Every word contributes to understanding, with no redundancy or unnecessary elaboration.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity (1 parameter, annotations for destructiveness, output schema present), the description covers the key points: removal and immediate effect. It could mention prerequisites (domain must be registered) or confirm the scope, but it is sufficient for the tool's complexity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema already covers the 'domain' parameter fully (100% coverage). However, the description adds the semantic detail that the removal includes 'all its subdomains', which is not specified in the schema, providing additional context beyond the parameter definitions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Remove') and the resource ('a registered custom domain and all its subdomains'), with a specific consequence ('they stop serving immediately'). It distinguishes from siblings like 'remove_domain' and 'check_custom_domain' by specifying 'custom domain' and including subdomains.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool versus alternatives (e.g., 'remove_domain', 'add_custom_domain'). The usage is implied by the name and description, but there are no when-to-use or when-not-to-use instructions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

remove_domain: Disconnect a custom domain (A)

Destructive

Remove a connected domain (by id). It stops serving immediately.

Parameters

Name	Required	Description	Default
`id`	Yes	connected domain id from list_domains
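
The parameter schemas give one way to tell the two domain-removal tools apart: `remove_domain` takes an `id` from `list_domains`, while `remove_custom_domain` takes a registered domain name and also removes its subdomains. The selector below is a sketch built only on that difference; the rule itself is an assumption, since the listing does not state when to prefer one tool over the other.

```python
from typing import Dict, Optional, Tuple


def pick_domain_removal(domain_id: Optional[str] = None,
                        domain_name: Optional[str] = None) -> Tuple[str, Dict[str, str]]:
    """Choose a removal tool from the identifier the caller holds.

    Hypothetical helper: returns (tool_name, arguments).
    """
    if (domain_id is None) == (domain_name is None):
        raise ValueError("pass exactly one of domain_id or domain_name")
    if domain_id is not None:
        # Connected domain, identified by the id from list_domains.
        return "remove_domain", {"id": domain_id}
    # Registered custom domain by name; its subdomains are removed too.
    return "remove_custom_domain", {"domain": domain_name}


if __name__ == "__main__":
    print(pick_domain_removal(domain_id="dom_1"))  # hypothetical id
    print(pick_domain_removal(domain_name="example.com"))
```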

Output Schema

Name	Required	Description
`result`	Yes

Tool Definition Quality

A3.8/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds behavioral context beyond the destructiveHint annotation by stating that the domain 'stops serving immediately,' which conveys urgency and immediate impact. This is a useful disclosure not present in structured fields. No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise—two short sentences with no filler. The key information (action, identification method, immediate effect) is front-loaded. Every sentence serves a purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a single-parameter removal tool with an output schema, the description is mostly complete. It states the effect and the required identifier. However, it could be improved by including usage context (e.g., 'Use after listing domains with list_domains') or noting prerequisites. The absence of sibling differentiation slightly reduces completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema covers the single parameter 'id' with a description ('connected domain id from list_domains'). The tool description adds no further meaning beyond what the schema already provides. Given 100% schema coverage, baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Remove'), the resource ('a connected domain'), and the method ('by id'). It also adds immediate consequence ('stops serving immediately'). This distinguishes it from sibling tools like remove_sending_domain or remove_custom_domain, which operate on different domain types.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description does not provide guidance on when to use this tool versus alternatives (e.g., remove_custom_domain, remove_sending_domain). It lacks explicit context for when a connected domain should be removed compared to other domain removal tools. The agent is left to infer based on the 'connected domain' phrasing.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

remove_function: Remove the deployed function from a site (A)

Destructive

Remove the deployed Worker function from a site (and all its routes, secrets, and cron triggers). Site falls back to static-only serving. Irreversible.

Parameters

Name	Required	Description	Default
`slug`	Yes	site slug whose function to remove
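
Because the removal is described as irreversible, an agent harness may want an explicit confirmation step before the request is built. The guard below is a sketch, not part of the server: the `confirmed` flag and the helper name are assumptions, and the set lists only the tools this listing labels as irreversible.

```python
from typing import Any, Dict

# Tools that this listing describes as irreversible.
IRREVERSIBLE_TOOLS = {"remove_function", "remove_sending_domain"}


def build_tool_call(name: str, arguments: Dict[str, Any],
                    confirmed: bool = False, request_id: int = 1) -> Dict[str, Any]:
    """Build an MCP tools/call request, refusing irreversible tools
    unless the caller has explicitly confirmed (hypothetical safeguard)."""
    if name in IRREVERSIBLE_TOOLS and not confirmed:
        raise PermissionError(f"{name} is irreversible; pass confirmed=True to proceed")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


if __name__ == "__main__":
    # Hypothetical site slug, for illustration only.
    print(build_tool_call("remove_function", {"slug": "my-site"}, confirmed=True))
```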

Output Schema

Name	Required	Description
`result`	Yes

Tool Definition Quality

A4.5/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description goes beyond the destructiveHint annotation by detailing exactly what is destroyed (routes, secrets, cron triggers) and the consequence (site falls back to static-only). This provides comprehensive behavioral transparency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise, consisting of three short sentences that front-load the main action and then state the consequences. Every sentence adds value without redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity (1 parameter, output schema present, strong annotations), the description is complete. It covers the action, scope, and outcomes, leaving no ambiguity for the agent.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The single parameter 'slug' is fully described in the schema (100% coverage). The description does not add additional meaning beyond the schema, so baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool removes a deployed Worker function from a site, listing specific components that are removed (routes, secrets, cron triggers) and the resulting fallback to static-only serving. This distinguishes it from sibling tools like deploy_function.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for when to use the tool (to remove a function) and mentions irreversibility. However, it does not explicitly mention when not to use it or suggest alternatives like deploy_function for adding a function.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

remove_secret: Remove a secret from a site (A)

Destructive

Remove a secret from a site's deployed function. The binding disappears on next request.

Parameters

Name	Required	Description	Default
`name`	Yes	secret name to remove
`slug`	Yes	site slug whose function holds the secret

Output Schema

Name	Required	Description
`result`	Yes

Tool Definition Quality

A4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations include destructiveHint=true, so agent knows it's destructive. The description adds useful timing context (binding disappears on next request), going beyond annotations. No further behavioral details.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two short sentences, front-loaded with action and effect. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With an output schema present, return values are covered. The description is brief but adequate for a simple tool. Minor gap: no mention of handling missing secrets.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the description does not need to add parameter information. It adds no additional meaning beyond what the schema already describes.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action (remove), the resource (secret from a site's deployed function), and the effect (binding disappears on next request). It distinguishes from sibling tools like 'set_secret' and 'list_secrets'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit when-to-use or when-not-to-use guidance is provided. The phrase 'disappears on next request' implies a delay, but the description names no alternatives or contexts where this tool is appropriate.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

remove_sending_domain: Remove a sending domain (A)

Destructive

Delete a BYO sending domain. Any demand tests bound to it fall back to the managed shiply sender. Also GCs the underlying Resend domain. Irreversible — re-adding requires re-verifying DNS.

Parameters

Name	Required	Description	Default
`id`	Yes	sending domain id from list_sending_domains

Output Schema

Name	Required	Description
`result`	Yes

Tool Definition Quality

A4.3/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Beyond the destructiveHint annotation, the description adds that the operation is irreversible and that re-adding requires re-verifying DNS, providing useful behavioral context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four concise sentences: action, fallback for bound demand tests, cleanup of the underlying Resend domain, and irreversibility. No fluff, every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a delete operation with one parameter and existing output schema, the description covers what is deleted, effects on dependents, and irreversibility—fully sufficient for agent use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description does not add meaning to the 'id' parameter beyond what is already in the schema (which has 100% coverage). A score of 3 is appropriate as the schema handles parameter documentation.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description explicitly states 'Delete a BYO sending domain' with a clear verb and resource, differentiating it from sibling tools like add_sending_domain or check_sending_domain.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

It explains consequences (fallback to managed sender, GC of underlying Resend domain) and notes irreversibility, but does not explicitly state when not to use or list alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

remove_suppression: Remove from suppression list (A)

Destructive

Delete one suppression by id. Use list_suppressions first to find the id.

Parameters

Name	Required	Description	Default
`suppressionId`	Yes	suppression id from list_suppressions
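
The list-then-delete workflow can be sketched as a lookup followed by the call. The entry shape used here (objects with `id` and `email` fields) is an assumption about what `list_suppressions` returns; this listing does not show that output, so adjust the field names to the real result.

```python
from typing import Any, Dict, Iterable, Optional


def find_suppression_id(suppressions: Iterable[Dict[str, Any]], email: str) -> Optional[str]:
    """Return the id of the entry matching `email`, or None.

    ASSUMPTION: each entry looks like {"id": "...", "email": "..."}.
    """
    wanted = email.strip().lower()
    for entry in suppressions:
        if str(entry.get("email", "")).strip().lower() == wanted:
            return entry.get("id")
    return None


def remove_suppression_call(suppression_id: str) -> Dict[str, Any]:
    """Tool name plus arguments for an MCP tools/call request."""
    return {"name": "remove_suppression", "arguments": {"suppressionId": suppression_id}}


if __name__ == "__main__":
    # Hypothetical list_suppressions result, for illustration only.
    listed = [{"id": "sup_1", "email": "a@example.com"},
              {"id": "sup_2", "email": "b@example.com"}]
    found = find_suppression_id(listed, "B@example.com")
    if found is not None:
        print(remove_suppression_call(found))
```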

Output Schema

Name	Required	Description
`result`	Yes

Tool Definition Quality

A4/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide destructiveHint=true, so the description's 'Delete' is consistent. No additional behavioral details beyond what annotations cover.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, no wasted words. The action and prerequisite are front-loaded and clear.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple deletion with one parameter and output schema present, the description is sufficient. Could mention return value but output schema handles that.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% and includes parameter description. The description repeats the need to list first but adds no new semantic meaning beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Delete one suppression by id', specifying the verb and resource. It distinguishes from siblings like add_suppression and list_suppressions by focusing on deletion.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

It advises 'Use list_suppressions first to find the id', providing clear prerequisite context. No explicit exclusions or alternatives, but the sibling tools offer the counterpart.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

reply_to_thread: Reply to an inbox thread (A)

Send an email reply on an existing thread. Goes FROM the original recipient address (the @shiply.now alias the sender used) and TO the original sender. Threaded via RFC 5322 In-Reply-To so Gmail/Outlook group it with the original. Subject defaults to 'Re: <original>' when omitted. Cap at 20,000 chars. Use after read_thread to make sure you're replying to the right conversation.

Parameters

Name	Required	Description
`body`	Yes	reply body, ≤20,000 chars
`subject`	No	subject; defaults to 'Re: <original>'
`threadId`	Yes	thread id from list_inbox to reply on
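
To illustrate the threading behavior described above, the sketch below builds a reply with Python's standard `email` package: From and To are swapped, the subject defaults to 'Re: <original>', and `In-Reply-To` points at the original Message-ID. It mirrors the described behavior only and is not the server's implementation; the `build_reply` helper, the added `References` header, and the addresses are assumptions for illustration.

```python
from email.message import EmailMessage
from typing import Optional

MAX_BODY_CHARS = 20_000  # limit stated in the tool description


def build_reply(original: EmailMessage, body: str,
                subject: Optional[str] = None) -> EmailMessage:
    """Build a reply that mail clients can group with `original`."""
    if len(body) > MAX_BODY_CHARS:
        raise ValueError(f"body exceeds {MAX_BODY_CHARS} characters")
    message_id = original["Message-ID"]
    if message_id is None:
        raise ValueError("original message has no Message-ID to reply to")

    reply = EmailMessage()
    reply["From"] = str(original["To"])    # the alias the sender wrote to
    reply["To"] = str(original["From"])    # back to the original sender
    reply["Subject"] = subject or f"Re: {original['Subject']}"
    reply["In-Reply-To"] = str(message_id)  # RFC 5322 threading header
    reply["References"] = str(message_id)   # commonly set alongside it
    reply.set_content(body)
    return reply


if __name__ == "__main__":
    # Hypothetical inbound message, for illustration only.
    inbound = EmailMessage()
    inbound["From"] = "customer@example.com"
    inbound["To"] = "hello@shiply.now"
    inbound["Subject"] = "Pricing question"
    inbound["Message-ID"] = "<msg-1@example.com>"
    print(build_reply(inbound, "Thanks for reaching out."))
```

When calling the tool itself, only `threadId`, `body`, and the optional `subject` are supplied; the headers are handled server-side according to the description.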

Output Schema

Name	Required	Description
`result`	Yes

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description discloses key behaviors: FROM/TO direction, threading via RFC 5322, default subject, and character limit. Annotations only indicate openWorldHint=true and idempotentHint=false, so the description adds significant context. It does not mention error handling or rate limits, but covers the core behavior well.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Six sentences, each adding essential information: main action, FROM/TO behavior, threading, default subject, character limit, and usage hint. No wasted words, front-loaded with the primary purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the 3 parameters, 100% schema coverage, and presence of an output schema, the description covers all needed aspects. It explains the email flow, threading mechanism, default behavior, constraints, and a workflow hint. Nothing essential is missing.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so baseline is 3. The description adds value by explaining the subject default ('Re: <original>'), the body character limit (already in schema but reinforced), and that threadId should come from list_inbox. This goes beyond the schema descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Send an email reply on an existing thread' and specifies the FROM/TO behavior and threading. It distinguishes the tool's purpose from siblings like forward_thread or send_email by focusing on replying, but does not explicitly contrast with alternatives.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description advises 'Use after read_thread to make sure you're replying to the right conversation,' providing clear context for when to use the tool. It implies the threadId comes from list_inbox, but does not explicitly state when not to use it or list alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

resend_confirmation: Resend a confirmation email (A)

Re-send the double-opt-in confirmation to a signup that has not confirmed yet.

ParametersJSON Schema

Name	Required	Description	Default
`email`	Yes	the unconfirmed signup email to re-send to
`testId`	Yes	demand test id the signup belongs to

Output Schema

ParametersJSON Schema

Name	Required	Description
`result`	Yes

Tool Definition Quality

A3.8/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate non-idempotent and open-world behavior. The description adds the condition (unconfirmed signup) but does not disclose further traits like potential errors, rate limits, or consequences of re-sending. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, front-loaded sentence with no unnecessary words. Every part earns its place: verb, resource, condition.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has only two parameters, an output schema, and annotations, the description sufficiently covers the core purpose and condition. It could mention return value or errors but is complete for a simple tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% and each parameter is well-described in the schema. The tool description does not add additional meaning or usage context beyond the schema, so baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action (re-send), the resource (double-opt-in confirmation email), and the condition (to unconfirmed signup). It effectively distinguishes itself from siblings like 'send_email' and 'resend_intake_invite' through specificity.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for unconfirmed signups but does not explicitly state when to use this tool versus alternatives, nor does it provide any when-not-to-use guidance. Context is present but not fully elaborated.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

resend_intake_invite: Re-send the customer intake invite email (A)

Re-fire the 'your developer sent you a project intake' email to the project's customer. Throws invalid_request if the project has no customerEmail (the dev needs to set one via the dashboard first — customer email isn't agent-patchable). Use when the customer says they didn't receive the link.

ParametersJSON Schema

Name	Required	Description	Default
`id`	Yes	project id to re-send the intake invite for

Output Schema

ParametersJSON Schema

Name	Required	Description
`result`	Yes

Tool Definition Quality

A4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations provide idempotentHint=false and openWorldHint=true. Description adds context: error on missing customerEmail and that customer email isn't agent-patchable. No contradiction; adds value beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences: action description, error condition, and usage guideline. No wasted words, front-loaded with key information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple one-parameter tool with output schema, the description is sufficient. Covers main behavior and error scenario. Not missing critical information.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with a clear parameter description. The tool description does not add additional meaning beyond the schema, meeting the baseline expectation.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action: re-fire the intake invite email to the project's customer. It specifies the error condition (no customerEmail). However, it does not explicitly differentiate from the sibling 'resend_confirmation', which could cause ambiguity.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicit usage guideline: 'Use when the customer says they didn't receive the link.' Also explains precondition (customerEmail must be set via dashboard). Lacks explicit when-not-to-use but is clear for a simple tool.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

restore_project: Restore an archived project (A)

Idempotent

Move an archived project back to status='draft' so it reappears in the active list. Idempotent on non-archived projects? No — server rejects the transition unless the project is currently archived.

ParametersJSON Schema

Name	Required	Description	Default
`id`	Yes	archived project id to restore

Output Schema

ParametersJSON Schema

Name	Required	Description
`result`	Yes

Tool Definition Quality

A4.5/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The annotation provides `idempotentHint: true`, and the description adds nuance by noting it is not idempotent on non-archived projects (server rejects). It also states the status change to 'draft', enhancing transparency without contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two concise sentences front-load the action and directly address idempotency. No wasted words; every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple tool with one parameter and an existing output schema, the description covers purpose, usage conditions, idempotency nuance, and error behavior. It is fully contextualized.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The single parameter `id` has 100% schema coverage with description 'archived project id to restore'. The tool description does not add extra meaning beyond the schema, so baseline score of 3 applies.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool moves an archived project back to status='draft', reappearing in the active list. The verb 'restore' and resource 'project' are specific, and it distinguishes itself from the sibling `archive_project`.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly explains when to use (when project is archived) and when not to use (non-archived projects are rejected). It also clarifies idempotent behavior, providing clear guidance on usage.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

rollback_site: Roll back a site (A)

Idempotent

Re-point a site to any finalized version (rollback or roll-forward). Get version ids from get_site. Serving updates immediately.

ParametersJSON Schema

Name	Required	Description	Default
`slug`	Yes	site slug to re-point
`versionId`	Yes	finalized version id from get_site
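
As a rough illustration, the call can be expressed as an MCP `tools/call` JSON-RPC request. The helper name `rollback_request` and the example slug and version id are placeholders, not values from this server; a real version id comes from get_site.

```python
import json


def rollback_request(slug: str, version_id: str, request_id: int = 1) -> str:
    """Serialize an MCP tools/call request for rollback_site (hypothetical helper)."""
    payload = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "rollback_site",
            "arguments": {"slug": slug, "versionId": version_id},
        },
    }
    return json.dumps(payload)


# Placeholder values for illustration only.
print(rollback_request("my-site", "ver_123"))
```

Because the tool is marked idempotent, re-sending the same request after a timeout should leave the site pointing at the same version.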

Output Schema

ParametersJSON Schema

Name	Required	Description
`result`	Yes

Tool Definition Quality

A4.4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds that updates are immediate and that the tool can roll forward, beyond the idempotentHint annotation. It does not contradict annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three concise sentences with the main action upfront. Every sentence adds value with no redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description is complete for this tool given the existing schema, output schema, and annotations. It covers input, source of version IDs, and effect.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, and the description adds meaningful context for versionId by specifying it comes from get_site. The slug parameter's description in schema is adequate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool 'Re-point a site to any finalized version (rollback or roll-forward)', which is specific and distinguishes it from sibling tools like promote_site or feature_site.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description advises to 'Get version ids from get_site', guiding the user on prerequisite actions. However, it does not explicitly state when not to use this tool or mention alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

send_broadcast: Broadcast to confirmed subscribers (A)

Send a campaign to this test's confirmed (double-opt-in) subscribers — the "we're live" email. An unsubscribe link is added automatically. Fails if there are no confirmed subscribers yet.

ParametersJSON Schema

Name	Required	Description
`html`	Yes	email HTML body (unsubscribe link added automatically)
`text`	No	plain-text fallback body
`testId`	Yes	demand test id whose confirmed subscribers to email
`subject`	Yes	email subject line

Output Schema

ParametersJSON Schema

Name	Required	Description
`result`	Yes

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate openWorldHint (side effects) and non-idempotent, but the description adds that an unsubscribe link is automatically added and it fails if there are no confirmed subscribers. These are beyond what annotations provide, though it could mention rate limiting or confirmation delivery.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three concise sentences: the first defines purpose and recipients, and the next two add critical behavioral details. No unnecessary words or repetition. Front-loaded and efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given an output schema exists, the description covers core behavior (send to confirmed subscribers, auto-unsubscribe, failure condition). It could mention that it sends to all confirmed subscribers of the test, but overall it's complete for a campaign tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 100% schema description coverage, the baseline is 3. The description adds value by linking parameters to the campaign context (e.g., html body with auto-unsubscribe). It reinforces that subject and html are for the email body and that testId identifies the test, but doesn't add syntax beyond schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it sends a campaign to confirmed (double-opt-in) subscribers of a test, using the phrase 'we're live' email. This distinguishes it from sibling tools like send_email and send_mailbox_broadcast by specifying the recipient audience and automatic unsubscribe addition.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides context that the tool is for sending to confirmed subscribers and fails if none exist, but it does not explicitly compare alternatives or state when not to use it. No mention of alternatives or when-to-use vs. other send tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

send_email: Send an email from a site (A)

Send an email from <slug>.shiply.now's managed sender. The site can send transactional/notification email with no SMTP setup. Rate-limited and spam-checked; replies route back to the site inbox.

ParametersJSON Schema

Name	Required	Description
`to`	Yes	recipient email address
`html`	Yes	email HTML body
`slug`	Yes	site slug to send from (<slug>.shiply.now sender)
`text`	No	plain-text fallback body
`subject`	Yes	email subject line
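
A small sketch of preparing the arguments for send_email, assuming the client wants a basic sanity check on the recipient before spending a rate-limited call. The helper name `build_send_email_args` is hypothetical, and stripping a display name down to the bare address is a choice made here, not documented server behavior.

```python
from email.utils import parseaddr
from typing import Optional


def build_send_email_args(slug: str, to: str, subject: str, html: str,
                          text: Optional[str] = None) -> dict:
    """Build the arguments object for send_email (hypothetical helper)."""
    # parseaddr splits "Name <addr>" into (name, addr); keep only the address.
    _, addr = parseaddr(to)
    if "@" not in addr:
        raise ValueError(f"not a usable recipient address: {to!r}")
    for name, value in (("slug", slug), ("subject", subject), ("html", html)):
        if not value:
            raise ValueError(f"{name} is required")
    args = {"slug": slug, "to": addr, "subject": subject, "html": html}
    # text is the optional plain-text fallback; send it only when provided.
    if text is not None:
        args["text"] = text
    return args
```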

Output Schema

ParametersJSON Schema

Name	Required	Description
`result`	Yes

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Adds behavioral details beyond annotations: rate-limited and spam-checked, plus reply routing. Annotations only provide openWorldHint and idempotentHint. The description adds operational constraints and side effects.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three clear, front-loaded sentences. The first states the core action; the next two add key context. No redundant information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (5 params, 4 required), the description covers purpose, usage context, and behavioral traits adequately. Output schema exists, so return value explanation is not required.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so baseline is 3. Description adds marginal value by connecting slug to the sender domain and mentioning reply behavior, but doesn't enhance parameter understanding beyond schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action (Send) and resource (email from a site's managed sender), specifying transactional/notification email. It differentiates from sibling tools like send_broadcast by emphasizing transactional nature.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides context: no SMTP setup needed, rate-limited, spam-checked, and replies route to site inbox. While it doesn't explicitly state when not to use, it implies suitability for individual transactional emails, distinguishing from broadcast siblings.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

send_mailbox_broadcast: Broadcast to a mailbox's confirmed audience (A)

Send a one-shot broadcast to the confirmed (double-opt-in) subscribers of a (site, collection) mailbox. Spam-checked; unsubscribe footer auto-added. Fails if no confirmed subscribers exist yet.

ParametersJSON Schema

Name	Required	Description
`html`	Yes	email HTML body (unsubscribe footer added automatically)
`slug`	Yes	site slug the mailbox belongs to
`text`	No	plain-text fallback body
`subject`	Yes	email subject line
`collection`	Yes	mailbox collection whose confirmed audience to email

Output Schema

ParametersJSON Schema

Name	Required	Description
`result`	Yes

Tool Definition Quality

A3.8/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Beyond annotations (openWorldHint, idempotentHint), the description reveals that the broadcast is 'one-shot', spam-checked, automatically adds an unsubscribe footer, and fails if no subscribers exist. This adds valuable behavioral context about safety and preconditions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three concise sentences: the first covers purpose and target, and the next two cover key constraints and traits. No unnecessary words, and critical information is front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description covers core functionality, failure condition, and important traits (spam check, auto footer). With a full input schema and an output schema present, the description is sufficiently complete for the tool's complexity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so baseline is 3. The description reiterates the role of 'slug' and 'collection' as defining the mailbox but adds no new parameter meaning beyond what the schema provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'Send' and the specific resource: 'broadcast to the confirmed (double-opt-in) subscribers of a (site, collection) mailbox.' It distinguishes itself from generic broadcast tools by specifying the audience type and constraints.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description mentions that it fails if no confirmed subscribers exist, but does not provide guidance on when to use this tool versus alternatives like 'send_broadcast' or 'send_email'. No explicit context for choosing this tool is given.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

set_cron: Set or update a cron trigger (A)

Idempotent

Set or update a cron trigger on a site's deployed function. Schedule is crontab syntax (UTC). Path is the URL the cron handler should fire on (for the worker's scheduled() handler context).

ParametersJSON Schema

Name	Required	Description
`path`	Yes	URL path the cron handler fires on
`slug`	Yes	site slug whose function gets the cron
`schedule`	Yes	crontab schedule in UTC, e.g. "0 * * * *"
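
A minimal sketch of a client-side check before calling set_cron, assuming the schedule is a standard five-field crontab expression like the `"0 * * * *"` example in the schema. The helper name `build_set_cron_args` is hypothetical, and the server may accept or reject forms this check does not model.

```python
def build_set_cron_args(slug: str, path: str, schedule: str) -> dict:
    """Build the arguments object for set_cron (hypothetical helper)."""
    fields = schedule.split()
    # Assumed layout: minute, hour, day-of-month, month, day-of-week (UTC).
    if len(fields) != 5:
        raise ValueError(f"expected 5 crontab fields, got {len(fields)}: {schedule!r}")
    if not slug or not path:
        raise ValueError("slug and path are required")
    # Normalize whitespace so repeated calls send an identical schedule string.
    return {"slug": slug, "path": path, "schedule": " ".join(fields)}
```

For example, `build_set_cron_args("my-site", "/cron/hourly", "0 * * * *")` describes a trigger at minute 0 of every hour, UTC; the slug and path shown are placeholders.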

Output Schema

ParametersJSON Schema

Name	Required	Description
`result`	Yes

Tool Definition Quality

A4/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations provide idempotentHint=true, but description adds behavioral context about crontab syntax and path for the worker's scheduled handler. However, it omits details about update behavior, validation, or error conditions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three concise sentences with a front-loaded purpose. Every sentence adds value without redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the presence of an output schema, the description does not need to explain return values. It provides sufficient context for the simple set/update operation, though more details on side effects could be added.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with parameter descriptions. Description adds extra context beyond schema, clarifying that the path is for the worker's scheduled() handler and that schedule uses UTC crontab syntax.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states verb 'Set or update' and resource 'cron trigger on a site's deployed function'. It specifies the schedule format and path purpose, distinguishing from sibling tools like list_crons and remove_cron.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Description implicitly indicates usage for creating/updating crons, but does not explicitly state when to use versus alternatives like remove_cron or list_crons. No when-not guidance is provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

set_handle: Set a vanity handle (A)

Idempotent

Give ONE site a vanity URL: rename it to <handle>.shiply.now (3-30 chars, a-z 0-9 -). The old address 301-redirects for 30 days. (For a portfolio page listing all your sites, use set_profile instead; that lives at shiply.now/@.)

ParametersJSON Schema

Name	Required	Description	Default
`slug`	Yes	current site slug
`handle`	Yes	new vanity handle, 3-30 chars a-z 0-9 -
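
The handle constraint (3-30 chars, a-z 0-9 -) can be checked locally before calling the tool. This is a sketch of that rule only; the names `HANDLE_RE` and `is_valid_handle` are hypothetical, and the server may apply further rules (reserved names, hyphen placement) that are not stated here.

```python
import re

# 3 to 30 characters, each a lowercase letter, digit, or hyphen.
HANDLE_RE = re.compile(r"[a-z0-9-]{3,30}")


def is_valid_handle(handle: str) -> bool:
    """Return True if the handle satisfies the documented length and charset."""
    return HANDLE_RE.fullmatch(handle) is not None
```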

Output Schema

ParametersJSON Schema

Name	Required	Description
`result`	Yes

Tool Definition Quality

A4.5/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate idempotentHint=true. The description adds the behavioral detail that the old address 301-redirects for 30 days, which is valuable beyond the annotation.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences, front-loaded with the main action, no unnecessary words. Every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the simple tool with 2 params, idempotent annotation, and presence of output schema, the description fully covers purpose, behavior, and alternatives without gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so baseline is 3. The description does not add new parameter-specific details beyond what the schema provides (slug and handle constraints are already in schema). The redirect behavior is context, not parameter semantics.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it gives a site a vanity URL, specifying the resulting domain pattern (<handle>.shiply.now) and distinguishing from the sibling tool set_profile for portfolio pages.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states 'Give ONE site' and provides an alternative (set_profile) for different use cases, making it clear when to use this tool.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

set_link: Mount a site at a path (A)

Idempotent

Path-mounting: serve an owned target site at a path on an owned host site (host/docs -> target). location is a path like "docs", or "__root__" for the host root. Both sites must be owned by you. Pass remove=true to unmount.

ParametersJSON Schema

Name	Required	Description
`remove`	No	unmount instead of mounting
`hostSlug`	Yes	slug of the owned host site
`location`	Yes	path to mount at, e.g. "docs", or "__root__" for the host root
`targetSlug`	No	slug of the owned site to serve there (required unless remove=true)
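
The interaction between `remove` and `targetSlug` is the one conditional rule in this schema, so a short sketch may help. The helper name `build_set_link_args` and the `ROOT_LOCATION` constant are hypothetical; the field names and the `"__root__"` value come from the parameter table above.

```python
from typing import Optional

ROOT_LOCATION = "__root__"  # schema value for mounting at the host root


def build_set_link_args(host_slug: str, location: str,
                        target_slug: Optional[str] = None,
                        remove: bool = False) -> dict:
    """Build the arguments object for set_link (hypothetical helper)."""
    if not host_slug or not location:
        raise ValueError("hostSlug and location are required")
    args = {"hostSlug": host_slug, "location": location}
    if remove:
        # Unmounting needs only the host and the location.
        args["remove"] = True
        return args
    if not target_slug:
        raise ValueError("targetSlug is required unless remove=True")
    args["targetSlug"] = target_slug
    return args
```

With placeholder slugs, `build_set_link_args("host", "docs", "docs-site")` mounts a site at host/docs, and `build_set_link_args("host", ROOT_LOCATION, remove=True)` unmounts whatever is served at the host root.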

Output Schema

ParametersJSON Schema

Name	Required	Description
`result`	Yes

Tool Definition Quality

A4.4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations declare idempotentHint=true, and the description adds behavior: ownership requirement, path format ('docs' or '__root__'), and the ability to unmount. No contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four short sentences, front-loaded with the core concept. Every sentence adds necessary information without redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description covers the primary use cases (mount/unmount) and prerequisites. An output schema exists, so return values need not be explained. It is complete for a tool of this complexity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema has 100% coverage with descriptions, but the description adds value by explaining the purpose of location (path examples) and the conditional requirement for targetSlug. This enhances understanding beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly defines the tool's purpose: path-mounting a target site on a host site. It uses specific verbs ('serve', 'mount') and explains the resource relationship (host/docs -> target). It also distinguishes from siblings by its unique function.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description states the context for use ('Both sites must be owned by you') and explains the remove flag for unmounting. It does not explicitly list alternatives, but the context is clear enough for most use cases.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

set_mailbox: Configure a site collection's email behavior (grade A)

Idempotent

Turn a Site Data collection into a mailbox: double opt-in, owner notifications, sending domain, branding. Call once per (site, collection) to configure how captured leads are handled.

Parameters

Name	Required	Description
`slug`	Yes	site slug the collection belongs to
`branding`	No	branding applied to mailbox emails
`notifyTo`	No	address to send owner notifications to
`collection`	Yes	Site Data collection to turn into a mailbox
`doubleOptIn`	No	require email confirmation before a contact is active
`notifyOwner`	No	email the owner on each new capture
`sendingDomainId`	No	BYO sending domain id to send from

Output Schema

Parameters

Name	Required	Description
`result`	Yes

Tool Definition Quality

A3.6/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description mentions key behaviors (double opt-in, owner notifications, sending domain, branding) and the 'call once' guideline aligns with the idempotentHint annotation. It does not contradict annotations but adds limited depth beyond what annotations already provide.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences with no wasted words. It front-loads capabilities and usage instruction efficiently. Every sentence earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (7 params, nested object, output schema), the description provides necessary context about what it does and when to call it. It is complete enough, though it could mention effects on existing configurations.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100% with each parameter described. The description lists features that map to parameters but does not add new semantics or constraints beyond the schema. Baseline 3 applies.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool turns a Site Data collection into a mailbox and lists key capabilities (double opt-in, notifications, etc.). It is specific about the resource and action, but does not explicitly differentiate from sibling tools like send_mailbox_broadcast or list_mailbox_contacts.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description advises calling once per (site, collection), giving a usage hint. However, it does not specify when not to use this tool or mention alternatives. The context is implied but not explicit.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

set_primary_subdomain: Pick the canonical URL for a custom domain (grade A)

Idempotent

Mark a hostname as the primary (canonical) URL for its site. Sibling hostnames (apex + www both pointed at the same site) start 301-redirecting to it, preserving path + query. The host-side fix for the duplicate-content SEO problem. The first subdomain you add for a site is primary by default; call this only when you need to switch.

Parameters

Name	Required	Description
`domain`	Yes	registered custom domain, e.g. example.com
`hostname`	Yes	full hostname to make primary, e.g. www.example.com
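
The redirect behavior described above (sibling hostnames answer with a 301 to the primary, keeping path and query) can be sketched as follows. This illustrates the described behavior only and is not shiply's implementation; the function name `canonical_redirect` and the example hostnames are assumptions.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_redirect(url, primary_host, sibling_hosts):
    """Return (301, location) if url is on a non-primary sibling host, else None."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host == primary_host or host not in sibling_hosts:
        return None  # already canonical, or not one of this site's hostnames
    # Swap only the host; path and query are carried over unchanged.
    location = urlunsplit((parts.scheme, primary_host, parts.path, parts.query, ""))
    return 301, location


hosts = {"example.com", "www.example.com"}
print(canonical_redirect("https://example.com/pricing?ref=ad", "www.example.com", hosts))
```

A request that already targets the primary hostname returns None here, which corresponds to serving the page normally.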

Output Schema

Parameters

Name	Required	Description
`result`	Yes

Tool Definition Quality

A4.7/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description fully discloses behavioral effects beyond the idempotentHint annotation: it marks a hostname as primary and triggers 301 redirects from sibling hostnames, preserving path and query. It also mentions that the first subdomain is primary by default, adding context. No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise with four sentences: the first states the action, the second explains the result, the third names the SEO problem it solves, and the fourth gives usage context. Every sentence adds value, there is no redundancy, and it is front-loaded with the main purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple tool with good schema and annotations, the description provides all necessary context: what it does, when to use, side effects, and default behavior. It is complete enough for an agent to select and invoke correctly, even without seeing the output schema.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with clear descriptions for both parameters (domain and hostname). The description does not add significant extra meaning beyond the schema, merely using the terms 'hostname' and 'domain' in context. Baseline score of 3 is appropriate as the schema already does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Mark a hostname as the primary (canonical) URL for its site.' It uses a specific verb-resource combination and explains the effect (sibling hostnames start 301-redirecting). It also distinguishes from sibling tools by contextualizing the SEO duplicate-content problem, setting it apart from add/check domain tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit usage guidance: 'call this only when you need to switch.' It also explains when not to use it (the first subdomain is primary by default), and gives context about the problem it solves (SEO duplicate content). This helps the agent decide when to invoke this tool over others.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

set_profile: Set up a public profile (grade A)

Idempotent

Create or update the user's public PORTFOLIO page at shiply.now/@<handle> (handle 3-30 chars a-z0-9-), a landing page listing the user's sites. This is NOT a site's address: to give one site a vanity URL like <handle>.shiply.now, use set_handle instead. enable shows the profile; autoAdd auto-lists new sites. Use after publishing to give the user a shareable portfolio.

Parameters

Name	Required	Description
`enable`	No	show (true) or hide (false) the public profile
`handle`	No	public portfolio handle, 3-30 chars a-z 0-9 -; portfolio lives at shiply.now/@<handle> (a page listing your sites — not a site URL; for a site vanity URL use set_handle)
`autoAdd`	No	auto-list newly published sites on the profile
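
The handle constraint (3-30 chars from a-z, 0-9 and "-") and the difference between the two address patterns can be sketched as below. The helper names are hypothetical, and the regex is one reading of the stated rule: the source does not say whether leading or trailing hyphens are rejected, so this sketch allows them.

```python
import re

# 3-30 characters drawn from a-z, 0-9 and "-", as stated in the schema description.
HANDLE_RE = re.compile(r"[a-z0-9-]{3,30}")


def is_valid_handle(handle):
    """True if handle satisfies the documented length and character rule."""
    return HANDLE_RE.fullmatch(handle) is not None


def portfolio_address(handle):
    """Portfolio page set by set_profile: shiply.now/@<handle>."""
    if not is_valid_handle(handle):
        raise ValueError(f"invalid handle: {handle!r}")
    return f"shiply.now/@{handle}"


def site_vanity_host(handle):
    """Per-site vanity host set by set_handle: <handle>.shiply.now."""
    return f"{handle}.shiply.now"


print(portfolio_address("ada-dev"))
print(site_vanity_host("ada-dev"))
```

The two functions make the distinction concrete: the portfolio is a path under shiply.now, while a site vanity URL is a subdomain.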

Output Schema

Parameters

Name	Required	Description
`result`	Yes

Tool Definition Quality

A4.4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate idempotent. Description explains the behavior of boolean parameters (enable shows/hides, autoAdd auto-lists new sites). No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Concise and well-structured. Four short sentences, front-loaded with purpose and ending with a usage hint. No superfluous text.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 100% schema coverage, no required params, and presence of output schema, the description covers purpose, usage, and parameter effects adequately.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, and description adds context beyond schema, e.g., explaining the handle format and the portfolio URL, and the meaning of enable and autoAdd.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states it creates or updates the user's public portfolio page at shiply.now/@<handle>, distinguishing it from set_handle which gives a site a vanity URL.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides context: 'use after publishing to give the user a shareable portfolio'. Distinguishes from set_handle and explains the effects of enable and autoAdd.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

set_secret: Set a Worker secret on a site (grade A)

Idempotent

Set a CF Worker secret on a site's deployed function. Value is encrypted-at-rest and accessible as env.<NAME> inside the worker. Use for Stripe keys, Resend API keys, etc.

Parameters

Name	Required	Description
`name`	Yes	secret name, UPPER_SNAKE_CASE; available as env.<NAME>
`slug`	Yes	site slug whose function gets the secret
`value`	Yes	secret value (encrypted at rest), ≤8 KiB
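
The two documented constraints (UPPER_SNAKE_CASE name, value of at most 8 KiB) can be checked client-side before calling the tool. This is a sketch under assumptions: the regex is one common reading of UPPER_SNAKE_CASE, the limit is assumed to be measured in UTF-8 bytes, and `set_secret_args` is a hypothetical helper.

```python
import re

MAX_VALUE_BYTES = 8 * 1024  # 8 KiB, assumed to be counted in UTF-8 bytes

# Assumed reading of UPPER_SNAKE_CASE: starts with a letter, single underscores between parts.
NAME_RE = re.compile(r"[A-Z][A-Z0-9]*(?:_[A-Z0-9]+)*")


def set_secret_args(slug, name, value):
    """Validate and build the arguments object for a set_secret call."""
    if not NAME_RE.fullmatch(name):
        raise ValueError(f"name must be UPPER_SNAKE_CASE: {name!r}")
    if len(value.encode("utf-8")) > MAX_VALUE_BYTES:
        raise ValueError("value exceeds 8 KiB")
    return {"slug": slug, "name": name, "value": value}


# Placeholder value; inside the worker this would be read as env.STRIPE_SECRET_KEY.
args = set_secret_args("my-site", "STRIPE_SECRET_KEY", "sk_test_placeholder")
print(sorted(args))
```

The same name and size rules are documented for set_variable, so the checks would carry over there without the slug.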

Output Schema

Parameters

Name	Required	Description
`result`	Yes

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Beyond annotations (idempotentHint), the description discloses that the value is encrypted-at-rest and accessible as env.<NAME> inside the worker. This adds meaningful behavioral context about runtime behavior and security properties beyond what annotations provide.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences: the first states the core action, the second adds encryption and access details, and the third gives usage examples. No unnecessary words. Efficiently front-loaded with essential information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a 3-parameter tool with high schema coverage, the description covers purpose, behavior, and usage context. It could mention overwrite behavior, but given idempotentHint, this omission is minor. Output schema exists and need not be described.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

While schema coverage is 100%, the description enriches parameter understanding by explaining that the value becomes an environment variable (env.<NAME>) and gives usage examples. This adds value beyond the schema's technical constraints.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the tool sets a Cloudflare Worker secret on a site's deployed function, with specific examples (Stripe keys, Resend API keys). It distinguishes from siblings like set_variable by emphasizing the encrypted-at-rest property and environment variable accessibility.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides examples of when to use the tool (for sensitive keys), but does not explicitly state when not to use it or mention alternatives like set_variable for non-secret settings. The guidance is implicit rather than explicit.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

set_site_access: Set site access control (grade A)

Idempotent

Protect an owned site (paid plans). mode 'public' (anyone), 'password' (supply password), or 'restricted' (supply allowedEmails and/or allowedDomains — only those can request a login code). Changing any setting signs out existing visitors.

Parameters

Name	Required	Description
`mode`	Yes	public = anyone; password = supply password; restricted = supply allowedEmails/allowedDomains
`slug`	Yes	owned site slug to protect
`password`	No	required when mode='password'
`allowedEmails`	No	allowlisted emails when mode='restricted'
`allowedDomains`	No	allowlisted email domains when mode='restricted'
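
How mode determines which other parameters are needed can be sketched as a validating argument builder. This is an illustration only: `set_site_access_args` is a hypothetical helper, and the allowlists are assumed to be lists of strings, which the listing above does not state.

```python
MODES = ("public", "password", "restricted")


def set_site_access_args(slug, mode, password=None, allowed_emails=None, allowed_domains=None):
    """Build the arguments object for set_site_access, enforcing the per-mode rules."""
    if mode not in MODES:
        raise ValueError(f"mode must be one of {MODES}")
    args = {"slug": slug, "mode": mode}
    if mode == "password":
        if not password:
            raise ValueError("password is required when mode='password'")
        args["password"] = password
    elif mode == "restricted":
        # The description asks for allowedEmails and/or allowedDomains.
        if not (allowed_emails or allowed_domains):
            raise ValueError("restricted mode needs allowedEmails and/or allowedDomains")
        if allowed_emails:
            args["allowedEmails"] = list(allowed_emails)
        if allowed_domains:
            args["allowedDomains"] = list(allowed_domains)
    return args


print(set_site_access_args("my-site", "restricted", allowed_domains=["example.com"]))
```

Because changing any setting signs out existing visitors, a caller may want to compare against the current settings before sending an update.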

Output Schema

Parameters

Name	Required	Description
`result`	Yes

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations include idempotentHint=true. The description adds significant behavioral context: 'Changing any setting signs out existing visitors.' This is beyond what annotations provide. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences, front-loaded with key information. Every sentence adds value: the first states purpose and constraints, the second explains the modes, and the third discloses a side effect. No waste.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description covers the main behaviors (modes, sign-out) and constraints (owned site, paid plans). It lacks details on allowedEmails/allowedDomains interaction, but the schema covers those. For a 5-parameter tool, it is largely complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%. The description adds minimal extra meaning beyond the schema, e.g., clarifying the mode options and the sign-out effect. Baseline is 3 due to high coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Protect an owned site (paid plans).' It explains the three modes (public, password, restricted) and their effects. It differentiates from sibling tools like get_site_access (read-only) by focusing on setting access control.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage context with 'Protect an owned site (paid plans).' It does not explicitly mention when not to use it or alternative tools, but the context is clear given no other tool sets site access. A minor gap is the lack of explicit prerequisites beyond ownership and plan.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

set_variable: Save an encrypted variable (grade A)

Idempotent

Upsert a key/value in the user's encrypted variable store (UPPER_SNAKE name, ≤8 KiB value). Use for API keys the user's sites/agents need, e.g. SUPABASE_URL. NOTE: saving alone does NOT expose it to any site Worker's env — attach it to a specific site with attach_variable (takes effect on that site's next function deploy).

Parameters

Name	Required	Description
`name`	Yes	variable name, UPPER_SNAKE_CASE, e.g. SUPABASE_URL
`value`	Yes	value to store (encrypted, ≤8 KiB)

Output Schema

Parameters

Name	Required	Description
`result`	Yes

Tool Definition Quality

A4.7/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations provide idempotentHint=true, indicating safe retries. Description adds behavioral context: encrypted storage, size limit, and that saving alone does not expose to environment. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, front-loaded with the tool's purpose and key constraints, followed by a crucial note about exposure. Every sentence adds value with no redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity (2 required params, 100% schema coverage, good annotations, output schema exists), the description is fully adequate. It covers purpose, constraints, and usage context without gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema has 100% coverage with descriptions for both parameters. The description restates the naming convention (UPPER_SNAKE_CASE), the size limit (≤8 KiB), and the SUPABASE_URL example, all of which already appear in the schema; what it adds is the relationship to attach_variable, which the schema does not capture.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the tool upserts a key/value in an encrypted variable store, specifies naming convention (UPPER_SNAKE_CASE) and size limit (≤8 KiB). It distinguishes from sibling attach_variable by noting that saving alone does not expose to site workers.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Description explicitly says when to use the tool (for API keys like SUPABASE_URL) and what it does not do (saving alone does not expose the value to any site Worker's env). It also names attach_variable as the follow-up step for attaching the variable to a specific site, providing clear guidance on tool selection.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

site_status: SSL + readiness check (grade A)

Read-only

Check any shiply slug or custom hostname: TLS certificate (issuer, days left) + HTTPS probe. ready=true means live.

Parameters

Name	Required	Description
`target`	Yes	slug (my-site) or hostname (www.example.com)
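
The certificate part of such a check (issuer and days left) can be approximated with Python's standard library. This is a rough sketch of the idea, not the tool's implementation: how site_status computes its result is not documented here, the helper names are hypothetical, and the HTTPS probe is omitted.

```python
import socket
import ssl
import time


def fetch_cert(hostname, port=443, timeout=5.0):
    """Open a TLS connection and return the peer certificate as a dict."""
    ctx = ssl.create_default_context()
    with socket.create_connection((hostname, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=hostname) as tls:
            return tls.getpeercert()


def issuer_org(cert):
    """Flatten the issuer RDN tuples from getpeercert() and pick the organization."""
    fields = dict(pair for rdn in cert.get("issuer", ()) for pair in rdn)
    return fields.get("organizationName")


def days_left(not_after, now=None):
    """Whole days until the certificate's notAfter timestamp (negative if expired)."""
    expires = ssl.cert_time_to_seconds(not_after)
    now = time.time() if now is None else now
    return int((expires - now) // 86400)


# Example with a hand-written certificate dict, so no network access is needed.
sample = {
    "issuer": ((("countryName", "US"),), (("organizationName", "Example CA"),)),
    "notAfter": "Jan  5 09:34:43 2018 GMT",
}
print(issuer_org(sample), days_left(sample["notAfter"]))
```

For a live host, `fetch_cert("www.example.com")` would supply the dict that the two helpers read.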

Output Schema

Parameters

Name	Required	Description
`result`	Yes

Tool Definition Quality

A3.7/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true and openWorldHint=true. The description adds value by specifying the output details (issuer, days left) and defining what ready=true means, which is beyond the annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise (under 20 words) and front-loaded with the action and target. Every word is necessary and contributes to clarity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple tool with one parameter and an output schema available, the description adequately covers what the tool does and what the results mean. It is complete enough for an AI agent to use correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, and the description repeats the schema's explanation of the target parameter (slug or hostname) without adding significant new meaning. Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it checks TLS certificate (issuer, days left) and HTTPS probe, with a clear meaning for ready=true. However, it does not explicitly distinguish itself from sibling tools like check_custom_domain or check_domain, which may perform similar checks.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool versus sibling tools like check_custom_domain or check_domain. The description does not mention prerequisites, exclusions, or context for selection.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

summarize_thread: AI-summarize an inbox thread (grade A)

Read-only

One-paragraph summary of the thread, oriented to the most actionable signal (interest, complaint, question, unsubscribe).

Parameters

Name	Required	Description
`threadId`	Yes	thread id from list_inbox

Output Schema

Parameters

Name	Required	Description
`result`	Yes

Tool Definition Quality

A4.1/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint and openWorldHint. The description adds behavioral context by specifying the summary's focus on actionable signals, which is valuable information beyond the annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence of 15 words, front-loaded with 'One-paragraph summary', and contains no filler or redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With a simple one-parameter input, annotations covering safety, and an output schema (not shown but present), the description sufficiently explains the tool's output and purpose without needing to detail return values.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The sole parameter threadId is described in the schema as 'thread id from list_inbox' (100% coverage). The description adds no further semantic detail, so baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'summarize' and the resource 'inbox thread', and specifies the output is a one-paragraph summary oriented to actionable signals, distinguishing it from raw content tools like read_thread.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for getting a quick overview (one-paragraph, actionable signals) but does not explicitly state when not to use it or mention alternatives like read_thread for raw details.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

sync_dns: Sync DNS records (grade A)

Idempotent

For a connected custom domain, (re)write the CNAME records for all its subdomains automatically and report what changed. For unconnected domains, returns the records to add manually.

Parameters

Name	Required	Description
`domain`	Yes	registered custom domain to sync, e.g. example.com

Output Schema

Parameters

Name	Required	Description
`result`	Yes

Tool Definition Quality

A4.4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Beyond annotations (idempotentHint, openWorldHint), the description reveals that syncing modifies DNS records automatically and reports changes, and that for unconnected domains it behaves as a read-only guide. This adds useful behavioral context without contradicting annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two concise sentences. The first covers the primary automated action, the second covers the alternative manual case. No extraneous words, every sentence earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's moderate complexity, the description adequately explains both operational modes. With an output schema present, it does not need to detail return format. The two sentences cover the essential scenarios.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The only parameter 'domain' is sufficiently described in the schema. The description adds value by explaining how the domain's connection status affects behavior, providing additional semantic context beyond the basic schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the tool syncs DNS records for connected custom domains by automatically rewriting CNAME records and reporting changes, while for unconnected domains it returns manual instructions. This specific verb+resource+scope distinguishes it from siblings like add_custom_domain or check_custom_domain.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explains two distinct use cases (connected vs unconnected domains) and implies when to use the tool for automated sync vs manual guidance. While it doesn't explicitly list alternatives, the context is sufficient for an agent to decide.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

unarchive_thread: Unarchive a thread (A)

Idempotent

Restore an archived thread (clears archivedAt). It reappears in the default inbox list. Pair with list_inbox filter=archived to find archived threadIds first.

Parameters

Name	Required	Description	Default
`threadId`	Yes	archived thread id to restore

Output Schema

Name	Required	Description
`result`	Yes
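
The pairing suggested in the description (list archived threads, then restore them) could be sketched as a sequence of MCP `tools/call` requests. The `filter` argument name is inferred from the phrase "list_inbox filter=archived", the thread ids are made-up placeholders, and the shape of the `list_inbox` result is not documented here, so extracting ids from it is left out.

```python
def tool_call(name: str, arguments: dict, request_id: int) -> dict:
    """Build an MCP JSON-RPC 2.0 `tools/call` request."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


def unarchive_requests(thread_ids: list, first_id: int = 2) -> list:
    """One unarchive_thread request per archived thread id."""
    return [
        tool_call("unarchive_thread", {"threadId": thread_id}, first_id + offset)
        for offset, thread_id in enumerate(thread_ids)
    ]


# Step 1: ask for archived threads (argument name is an assumption).
find_archived = tool_call("list_inbox", {"filter": "archived"}, 1)

# Step 2: restore the ids found in step 1 (placeholder ids shown here).
restores = unarchive_requests(["thread_abc", "thread_def"])
for request in restores:
    print(request["params"])
```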

Tool Definition Quality

A4.3/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide idempotentHint=true. Description adds meaning by stating effect on archivedAt field and visibility change, which is useful beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three highly concise sentences with all essential information front-loaded. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a single-parameter tool with output schema and annotations, the description fully covers purpose, usage hint, and behavioral effect. Complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with description for 'threadId' as 'archived thread id to restore'. Description adds no extra meaning beyond the schema, so baseline 3.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states the verb 'restore' and resource 'archived thread', with specific effect 'clears archivedAt' and reappearance in inbox. Distinguishes from sibling 'archive_thread' and 'list_inbox'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly suggests pairing with 'list_inbox filter=archived' to find thread IDs first, providing clear usage context. Lacks explicit when-not-to-use but sufficient for the simple tool.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

update_brief: Patch the project brief (A)

Idempotent

Overwrite the project's working brief (jsonb). Hard-capped at 500 KB. Use to revise the AI-generated brief by hand; the original AI output is preserved separately in briefAiOriginal so you can always compare. Does not change status.

Parameters

Name	Required	Description	Default
`id`	Yes	project id
`brief`	Yes	the full replacement brief object (jsonb, ≤500 KB)

Output Schema

Name	Required	Description
`result`	Yes
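
Since the brief is hard-capped at 500 KB, a client may want to check the size before sending a full replacement. The sketch below assumes "500 KB" means 500 * 1024 bytes of compact UTF-8 JSON; the server may measure differently. The brief fields shown are invented for illustration.

```python
import json

# Assumption: the documented "500 KB" cap is 500 * 1024 bytes.
MAX_BRIEF_BYTES = 500 * 1024


def brief_size_bytes(brief: dict) -> int:
    """Size of the brief when serialized as compact UTF-8 JSON."""
    text = json.dumps(brief, separators=(",", ":"), ensure_ascii=False)
    return len(text.encode("utf-8"))


def check_brief(brief: dict) -> None:
    """Raise ValueError if the serialized brief exceeds the assumed cap."""
    size = brief_size_bytes(brief)
    if size > MAX_BRIEF_BYTES:
        raise ValueError(f"brief is {size} bytes; limit is {MAX_BRIEF_BYTES}")


# Hypothetical brief content; the real schema is not documented here.
small = {"title": "Landing page", "sections": ["hero", "pricing"]}
check_brief(small)  # passes silently

too_big = {"notes": "x" * (MAX_BRIEF_BYTES + 1)}
try:
    check_brief(too_big)
except ValueError as err:
    print(err)
```

Because the tool overwrites the whole brief, the object passed in `brief` should be the complete document, not a partial patch.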

Tool Definition Quality

A4.3/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds behavioral details beyond the idempotentHint annotation: a hard cap at 500 KB, preservation of original AI output, and no status change. These are useful for the agent. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four sentences, no fluff. Front-loaded with the core action, followed by constraints and side effects. Every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple update tool with annotations and output schema, the description covers constraints, usage context, and side effects completely. No gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, and the schema itself describes both parameters well. The description adds a size limit (500 KB) and notes it's a full replacement, but this is minor. Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'Overwrite' and the resource 'project's working brief', and distinguishes from AI regeneration by stating it is a manual revision. It also notes that the original AI output is preserved, adding clarity.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description says 'Use to revise the AI-generated brief by hand', providing a clear when-to-use. It implies not to use when AI regeneration is desired, but does not explicitly name the alternative (regenerate_brief). Still, it offers good guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

update_listing: Update one of my listings (A)

Idempotent

Patch a listing by siteSlug — change price, pitch, terms, jurisdiction, or status (draft|live|paused). Sold listings cannot be edited. Status transitions enforced server-side. Use to pause sales or drop the price.

Parameters

Name	Required	Description
`pitch`	No	new sales pitch (null to clear)
`status`	No	new listing status
`siteSlug`	Yes	slug of the listed site to patch
`termsMode`	No	'standard' template or 'custom' terms
`priceCents`	No	new whole-dollar price in cents, 100–999900
`termsCustom`	No	new custom terms text (null to clear)
`jurisdiction`	No	governing jurisdiction, e.g. 'California, USA'

Output Schema

Name	Required	Description
`result`	Yes
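
A sketch of client-side validation for a listing patch, based on the documented constraints: `priceCents` between 100 and 999900, and `status` one of draft, live, or paused. Reading "whole-dollar price in cents" as "a multiple of 100" is an interpretation, and the helper name is hypothetical. The server still enforces status transitions, so passing these checks does not guarantee acceptance.

```python
from typing import Optional

ALLOWED_STATUSES = {"draft", "live", "paused"}
MIN_PRICE_CENTS = 100
MAX_PRICE_CENTS = 999900


def build_listing_patch(
    site_slug: str,
    price_cents: Optional[int] = None,
    status: Optional[str] = None,
) -> dict:
    """Return update_listing arguments containing only the fields to change."""
    args = {"siteSlug": site_slug}
    if price_cents is not None:
        if not MIN_PRICE_CENTS <= price_cents <= MAX_PRICE_CENTS:
            raise ValueError("priceCents must be between 100 and 999900")
        # Assumption: "whole-dollar" means no fractional dollars.
        if price_cents % 100 != 0:
            raise ValueError("priceCents must be a whole-dollar amount")
        args["priceCents"] = price_cents
    if status is not None:
        if status not in ALLOWED_STATUSES:
            raise ValueError(f"status must be one of {sorted(ALLOWED_STATUSES)}")
        args["status"] = status
    return args


# Drop the price to $49 and pause sales in one call.
print(build_listing_patch("my-site", price_cents=4900, status="paused"))
```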

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses that sold listings are immutable and that status transitions are enforced server-side. Annotations only provide idempotentHint, so the description adds valuable behavioral context beyond that.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four sentences, front-loaded with the primary action and followed by key constraints and example uses. No extraneous words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers the main functionality, constraints, and examples. An output schema exists to describe return values, and the description does not need to repeat that. Could mention idempotency, but annotations already cover that.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% (all parameters have descriptions). The description summarizes fields but adds no new per-parameter semantics beyond what the schema already provides, meeting the baseline.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description explicitly states the tool patches a listing by siteSlug and lists the modifiable fields (price, pitch, terms, jurisdiction, status). This clearly distinguishes it from create_listing and delete_listing among siblings.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides concrete use cases ('Use to pause sales or drop the price') and a constraint ('Sold listings cannot be edited'). However, it does not explicitly contrast with alternative tools like create_listing.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

verify_claim: Verify a Shiply claim pairing code (A)

Confirm a pairing code shown in the user's browser at https://shiply.now/claim/?pair=1. Use ONLY in the agent session that originally published the site — it reads the claimToken from .shiply.json in the current working directory and proves to Shiply that this agent session is authorised to claim the site. After verification the user is auto-redirected to /welcome and the site binds to their account.

Parameters

Name	Required	Description	Default
`code`	Yes	the SHIPLY-XXXXXXXX code in the user's browser

Output Schema

Name	Required	Description
`result`	Yes
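
Before calling this tool, an agent could run two local pre-checks: that the code looks like `SHIPLY-XXXXXXXX`, and that `.shiply.json` in the working directory carries a `claimToken`. This is only a sketch. The assumption that each X is an uppercase letter or digit is a guess from the placeholder, and both helper names are hypothetical.

```python
import json
import re
from pathlib import Path

# Assumption: the eight X's stand for uppercase letters or digits.
PAIRING_CODE = re.compile(r"SHIPLY-[A-Z0-9]{8}")


def looks_like_pairing_code(code: str) -> bool:
    """True if the code matches the assumed SHIPLY-XXXXXXXX pattern."""
    return PAIRING_CODE.fullmatch(code.strip()) is not None


def has_claim_token(directory: str = ".") -> bool:
    """True if .shiply.json exists in `directory` with a non-empty claimToken."""
    path = Path(directory) / ".shiply.json"
    if not path.is_file():
        return False
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and bool(data.get("claimToken"))


print(looks_like_pairing_code("SHIPLY-AB12CD34"))  # True
print(looks_like_pairing_code("shiply-1234"))      # False
```

If `has_claim_token()` returns False, the session is probably not the one that published the site, which is the case the description warns against.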

Tool Definition Quality

A4.1/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description goes beyond the idempotentHint: false annotation by explaining that it reads a local file, proves authorization, and triggers a redirect. It doesn't cover error handling or rate limits, but for a simple verification step, it's informative.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a few sentences, front-loaded with the primary action, and includes only necessary details. It could be slightly more compact but is clear and well-structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given a single parameter, annotations, and the presence of an output schema, the description covers the what, when, how, and outcome. It doesn't discuss error scenarios, but for a simple tool this is sufficient.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% and the schema already describes the code pattern. The description adds minimal extra context ('code in the user's browser'), but the baseline of 3 is appropriate since schema does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description starts with 'Confirm a pairing code' which is a specific verb and resource. It provides the exact URL pattern and distinguishes this verification from sibling tools like verify_sending_domain or verify_site by tying it to the claim workflow.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly states 'Use ONLY in the agent session that originally published the site' and explains the prerequisite of .shiply.json in the current directory. It also notes the outcome (auto-redirect to /welcome). No explicit alternatives are given, but the context is narrow enough that this is clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

verify_sending_domain: Re-check DNS for a sending domain (A)

Idempotent

Trigger Resend to re-check the domain's DNS records, persist the new status. Call after adding the DNS records returned by add_sending_domain. Status flips to 'verified' once SPF + DKIM + MX all check out.

Parameters

Name	Required	Description	Default
`id`	Yes	sending domain id from list_sending_domains

Output Schema

Name	Required	Description
`result`	Yes
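
DNS changes can take a while to propagate, so a client may need to re-check several times. The sketch below polls until the status becomes 'verified'. It assumes the tool result exposes a `status` key, which this page does not confirm, and `recheck` stands in for whatever function actually calls verify_sending_domain.

```python
import time
from typing import Callable


def wait_until_verified(
    recheck: Callable[[], dict],
    attempts: int = 5,
    delay_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call `recheck` up to `attempts` times; True once status is 'verified'."""
    for attempt in range(attempts):
        result = recheck()
        # Assumption: the result is a dict with a `status` field.
        if result.get("status") == "verified":
            return True
        if attempt < attempts - 1:
            sleep(delay_seconds)
    return False


# Stand-in for real tool calls: pending twice, then verified.
responses = iter([{"status": "pending"}, {"status": "pending"}, {"status": "verified"}])
verified = wait_until_verified(
    lambda: next(responses), delay_seconds=0, sleep=lambda seconds: None
)
print(verified)  # True
```

The tool is idempotent, so repeated re-checks should not cause harm beyond the extra calls.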

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Adds detail beyond annotations: describes persistence, status flip to 'verified' upon SPF+DKIM+MX success. Consistent with idempotentHint and openWorldHint.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three efficient sentences: action+effect, usage guidance, and success condition. No redundant information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers trigger, precondition, and verification criteria. With output schema present, return values need no elaboration. Lacks error handling, but sufficient for the tool's scope.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema already describes the single 'id' parameter with 100% coverage. Description adds no new parameter semantics but provides usage context.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('re-check domain DNS records'), the resource ('sending domain'), and the effect ('persist new status'). It also ties to sibling 'add_sending_domain' for context.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly instructs 'Call after adding the DNS records returned by add_sending_domain', providing a clear precondition. However, it does not mention when not to use or alternative tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

verify_site: Verify a live deploy (A)

Read-only

Edge-check a shiply slug or custom hostname and return a structured readiness report: status (LIVE/PENDING), SSL details (valid, issuer, daysLeft), HTTP probe, and a presigned thumbnail URL when available. Use this after publishing to confirm the site is reachable.

Parameters

Name	Required	Description	Default
`target`	Yes	slug (my-site) or hostname (www.example.com)

Output Schema

Name	Required	Description
`result`	Yes
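
One way an agent might interpret the readiness report is sketched below. The field names `status`, `valid`, `issuer`, and `daysLeft` come from the description, but nesting the SSL fields under an `ssl` key is an assumption, as is the seven-day renewal threshold. The sample report is invented.

```python
def is_ready(report: dict, min_days_left: int = 7) -> bool:
    """True if the site is LIVE with a valid certificate that is not about to expire."""
    # Assumption: SSL details are nested under an "ssl" key.
    ssl = report.get("ssl") or {}
    return (
        report.get("status") == "LIVE"
        and ssl.get("valid") is True
        and ssl.get("daysLeft", 0) >= min_days_left
    )


# Hypothetical reports for illustration only.
live_report = {
    "status": "LIVE",
    "ssl": {"valid": True, "issuer": "Example CA", "daysLeft": 60},
}
pending_report = {"status": "PENDING", "ssl": None}

print(is_ready(live_report))     # True
print(is_ready(pending_report))  # False
```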

Tool Definition Quality

A4.4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description discloses that it is an edge-check returning a readiness report, which adds behavioral context beyond the readOnlyHint annotation. It does not mention auth requirements or rate limits, but the disclosure is adequate.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is very concise—two sentences that front-load the main function and output, followed by usage context. Every sentence earns its place with no redundant information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the output schema and annotations, the description covers the essential behavioral and usage details. It could briefly mention error handling or what happens if the site isn't live, but it is still complete enough for this tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The parameter 'target' is described in the schema (slug or hostname, with the examples 'my-site' and 'www.example.com') and reinforced in the description, which names both accepted forms: a shiply slug or a custom hostname.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: edge-check a slug or hostname and return a structured readiness report. It lists output fields (status, SSL details, HTTP probe, thumbnail URL) and distinguishes from siblings like check_custom_domain or get_site by emphasizing post-publishing verification.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly says 'Use this after publishing to confirm the site is reachable,' which provides clear when-to-use guidance. It does not explicitly list when not to use, but the context is sufficient for an AI agent.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

whoami: Account overview (A)

Read-only

Who am I? Returns the signed-in account: email, @handle, plan + limits, counts of sites/domains/drives, and connected DNS providers. Call this first to orient before managing sites or domains.

Parameters

No parameters

Output Schema

Name	Required	Description
`result`	Yes

Tool Definition Quality

A5/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description discloses the output (account details) and aligns with the readOnlyHint annotation, confirming no destructive behavior. It adds valuable context about what the agent can expect, beyond the annotation's safety flag.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

A short opening question followed by two concise sentences: the first lists the return values, the second provides usage advice. No extraneous information; every sentence serves a purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a zero-parameter tool with annotations and an output schema, the description fully covers what the tool does and why it should be used. It includes all relevant details without omission.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With zero parameters and 100% schema coverage, the description adds substantial meaning by listing the specific fields returned (email, handle, plan) and their context, which the empty schema does not provide.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states a specific verb ('Returns') and clearly identifies the resource (signed-in account) and the detailed fields returned (email, handle, plan, limits, counts, DNS providers). It distinguishes itself from sibling tools as an orientation tool, making its purpose unique.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly advises to 'Call this first to orient before managing sites or domains,' providing clear when-to-use guidance. While it does not mention alternatives, this is appropriate as the tool's purpose is unique and introductory.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

Claim this connector by publishing a /.well-known/glama.json file on your server's domain with the following structure:

{
  "$schema": "https://glama.ai/mcp/schemas/connector.json",
  "maintainers": [{ "email": "your-email@example.com" }]
}

The email address must match the email associated with your Glama account. Once published, Glama will automatically detect and verify the file within a few minutes.
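
As a sketch, the claim file could be generated with a few lines of Python. The function name is hypothetical, and the email shown is the placeholder from the structure above. Serving the output at `/.well-known/glama.json` is left to your hosting setup.

```python
import json

SCHEMA_URL = "https://glama.ai/mcp/schemas/connector.json"


def build_glama_claim(email: str) -> str:
    """Return the glama.json body listing one maintainer email."""
    document = {
        "$schema": SCHEMA_URL,
        "maintainers": [{"email": email}],
    }
    return json.dumps(document, indent=2)


# Replace the placeholder with the email on your Glama account.
print(build_glama_claim("your-email@example.com"))
```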
