Glama

shiply

Server Details

Instant web hosting for AI agents. Publish a live site in one call, no account needed.

Status: Healthy
Last Tested:
Transport: Streamable HTTP
URL:

Glama MCP Gateway

Connect through Glama MCP Gateway for full control over tool access and complete visibility into every call.

MCP client → Glama → MCP server

Full call logging

Every tool call is logged with complete inputs and outputs, so you can debug issues and audit what your agents are doing.

Tool access control

Enable or disable individual tools per connector, so you decide what your agents can and cannot do.

Managed credentials

Glama handles OAuth flows, token storage, and automatic rotation, so credentials never expire on your clients.

Usage analytics

See which tools your agents call, how often, and when, so you can understand usage patterns and catch anomalies.

100% free. Your data is private.
Tool Descriptions: A

Average 4.2/5 across 103 of 103 tools scored. Lowest: 3.2/5.

Server Coherence: C
Disambiguation: 2/5

Many tools have overlapping purposes (e.g., add_domain, add_subdomain, add_custom_domain, add_sending_domain) or similar names like check_domain and check_custom_domain, which can confuse an agent. While descriptions help, the sheer number of closely related tools makes it difficult to consistently select the correct one.

Naming Consistency: 3/5

Most tools follow a verb_noun pattern, but there are exceptions like 'read_thread' (instead of get_thread) and 'mark_thread_read' (verb_noun_verb). Prefixes like 'data_' are used inconsistently with other tools, but overall the naming is still readable.

Tool Count: 1/5

With 103 tools, the server is extremely bloated. Even for a comprehensive platform, this count overwhelms any agent and makes the tool surface difficult to navigate. Many tools could be merged or are too granular.

Completeness: 4/5

The toolset covers a wide range of operations across sites, domains, email, marketplace, projects, contracts, and functions. While some updates (e.g., general contract editing) are missing, the surface is mostly comprehensive for the domain.

Available Tools

104 tools
add_custom_domain: Register a custom domain (grade A)
Idempotent

Register a registrable domain (e.g. example.com) the user owns and detect its DNS provider. Returns the provider and whether one-click connect is available. Then attach sites with add_subdomain.

Parameters (JSON Schema)
Name | Required | Description | Default
domain | Yes | registrable domain you own, e.g. example.com |
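
As a hedged sketch, assuming a client that speaks the Model Context Protocol's JSON-RPC 2.0 `tools/call` method, a call to add_custom_domain could be serialized as below. The helper `build_tool_call` is hypothetical, and because this page does not list the server URL, the example only builds the request rather than sending it over Streamable HTTP.

```python
import json


def build_tool_call(name: str, arguments: dict, request_id: int = 1) -> str:
    """Serialize an MCP JSON-RPC 2.0 tools/call request (hypothetical helper)."""
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(request)


# Register a domain the user owns; "example.com" is the schema's own example.
payload = build_tool_call("add_custom_domain", {"domain": "example.com"})
print(payload)
```

The same helper works for every tool on this page; only `name` and `arguments` change.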

Output Schema
Name | Required | Description
result | Yes |

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Beyond annotations (idempotent, openWorld), the description discloses the two-step process (register + detect) and return information (provider, one-click availability). No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three efficient sentences: the first defines the core action, the second states what is returned, and the third gives an actionable follow-up. No redundant words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given a simple tool with one parameter, good annotations, and an output schema, the description adequately covers purpose, outcome, and next step. Lacks detail on error handling (e.g., domain already registered), but acceptable with openWorldHint.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The single parameter 'domain' is fully described in the schema (100% coverage). The description reinforces the schema but adds no new semantic nuance beyond ownership, which is already in the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool registers a registrable domain and detects its DNS provider, distinguishing it from siblings like add_domain and add_subdomain.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description gives next-step guidance ('Then attach sites with add_subdomain') and implies a prerequisite (the user must own the domain). However, it doesn't explicitly differentiate this tool from check_custom_domain or state when not to use it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

add_domain: Connect a custom domain (grade A)
Idempotent

Serve a site on a domain the user owns. Returns the CNAME to add (hostname → cname.shiply.now); the certificate issues automatically once DNS resolves. Poll with check_domain.

Parameters (JSON Schema)
Name | Required | Description | Default
slug | Yes | owned site slug to serve there |
hostname | Yes | full hostname to serve, e.g. www.example.com |
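
The add_domain flow is asynchronous: add the returned CNAME, then poll check_domain until the certificate is live. A minimal polling sketch follows. The `check` callable stands in for an MCP call to check_domain, and the `ready` key is a hypothetical field name, since this page does not publish the output schema's fields.

```python
import time
from typing import Callable, Dict


def poll_until_ready(check: Callable[[], Dict], interval: float = 5.0,
                     timeout: float = 600.0,
                     sleep: Callable[[float], None] = time.sleep) -> Dict:
    """Call `check` until it reports ready; raise TimeoutError after `timeout`.

    Elapsed time is approximated by summing sleep intervals, not wall-clock time.
    """
    waited = 0.0
    while True:
        result = check()
        if result.get("ready"):  # hypothetical field name
            return result
        if waited >= timeout:
            raise TimeoutError(f"domain not ready after ~{waited:.0f}s")
        sleep(interval)
        waited += interval


# Stand-in for check_domain that reports ready on the third call.
responses = iter([{"ready": False}, {"ready": False}, {"ready": True}])
result = poll_until_ready(lambda: next(responses), sleep=lambda s: None)
print(result)  # {'ready': True}
```

Injecting `sleep` keeps the loop testable without real waiting; a production client would likely also handle call errors and back off between attempts.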

Output Schema
Name | Required | Description
result | Yes |

Behavior: 4/5

Annotations indicate idempotent and open-world behavior. The description adds critical behavioral details: the tool returns a CNAME record, and certificate issuance is automatic once DNS resolves. This goes beyond the annotations without contradiction.

Conciseness: 5/5

The description is three concise sentences. The first delivers the core purpose and action; the second and third add essential return and follow-up details. No unnecessary words.

Completeness: 5/5

With an output schema present, the description provides the necessary behavioral context: the return value format, asynchronous certificate issuance, and a polling recommendation. This covers the user's full workflow.

Parameters: 3/5

Schema documentation covers both required parameters (slug and hostname) with descriptions. The tool description does not add new meaning beyond that, so a score of 3 is appropriate given high schema coverage.

Purpose: 5/5

The description clearly states the tool serves a site on a user-owned domain and explicitly describes the return value and follow-up action. It distinguishes itself from sibling tools like add_custom_domain by focusing on serving a site on a full domain.

Usage Guidelines: 4/5

The description gives a clear follow-up action ('Poll with check_domain'), indicating an async operation. While it doesn't explicitly list when not to use it, the context implies usage for primary domain connection, and the sibling list shows alternatives for subdomains and sending domains.

add_sending_domain: Add a sending domain (grade A)
Idempotent

Register a domain the user owns for outbound demand-test sends. Returns DNS records (SPF, DKIM, MX) to add at the DNS provider. After DNS propagates, call verify_sending_domain. Cannot be a shiply.now subdomain.

Parameters (JSON Schema)
Name | Required | Description | Default
domain | Yes | e.g. mail.yourbrand.com |
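
The description's one hard constraint, that the sending domain cannot be a shiply.now subdomain, is easy to check client-side before calling the tool. This is a sketch under stated assumptions: `validate_sending_domain` is a hypothetical helper, and rejecting the bare `shiply.now` name as well goes beyond the description's wording.

```python
def validate_sending_domain(domain: str) -> str:
    """Normalize a sending domain and reject shiply.now hosts before calling the tool."""
    normalized = domain.strip().rstrip(".").lower()
    # The description forbids shiply.now subdomains; also rejecting the bare
    # apex "shiply.now" is an assumption of this sketch.
    if normalized == "shiply.now" or normalized.endswith(".shiply.now"):
        raise ValueError("sending domain cannot be a shiply.now subdomain")
    if "." not in normalized:
        raise ValueError("expected a domain such as mail.yourbrand.com")
    return normalized


print(validate_sending_domain("Mail.YourBrand.com."))  # mail.yourbrand.com
```

Matching on the ".shiply.now" suffix, rather than a bare "shiply.now" substring, avoids rejecting unrelated names such as evilshiply.now.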

Output Schema
Name | Required | Description
result | Yes |

Behavior: 4/5

The description adds context beyond annotations: it discloses that DNS records are returned and requires DNS propagation before verification. No contradictions with annotations.

Conciseness: 5/5

The description is concise: four short sentences, including a constraint, front-loaded with the main purpose and with no unnecessary words.

Completeness: 4/5

Given the single parameter and the presence of an output schema, the description adequately explains the tool's function and next steps, though it could mention the tool's idempotency.

Parameters: 3/5

Schema coverage is 100%, and the description adds no parameter meaning beyond what the schema already provides (the schema itself supplies the example mail.yourbrand.com).

Purpose: 5/5

The description clearly states the tool registers a domain for outbound demand-test sends and returns DNS records, distinguishing it from sibling tools like add_custom_domain or add_domain that handle other scopes.

Usage Guidelines: 3/5

The description provides a follow-up action (call verify_sending_domain) and a constraint (no shiply.now subdomain), but does not explicitly compare with sibling tools or specify when not to use this tool.

add_subdomain: Point a subdomain at a site (grade A)
Idempotent

Serve an owned site at <subdomain>.<domain> (use subdomain "@" or "" for the apex — apex needs a provider with CNAME flattening/ALIAS, e.g. Cloudflare). Auto-registers the parent domain. Returns the CNAME record to add (host -> cname.shiply.now); the certificate issues automatically once DNS resolves. Poll with check_domain.

Parameters (JSON Schema)
Name | Required | Description | Default
slug | Yes | owned site slug to serve there |
domain | Yes | registrable parent domain, e.g. example.com |
subdomain | Yes | subdomain label, or "@"/"" for the apex |
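
The subdomain and domain parameters combine into the hostname that add_subdomain serves, with "@" or "" selecting the apex. The sketch below is a hypothetical helper, not server code, that derives the CNAME record the description says to add (host pointing at cname.shiply.now). DNS providers differ on whether their host field expects the full hostname or only the label, so treat the first element as illustrative.

```python
from typing import Tuple

CNAME_TARGET = "cname.shiply.now"  # target named in the tool description


def cname_record(domain: str, subdomain: str) -> Tuple[str, str, str]:
    """Return (hostname, record type, target) for add_subdomain-style inputs."""
    parent = domain.strip().rstrip(".").lower()
    label = subdomain.strip().lower()
    # "@" or "" means the apex; per the description, the apex needs a DNS
    # provider with CNAME flattening/ALIAS (e.g. Cloudflare).
    host = parent if label in ("", "@") else f"{label}.{parent}"
    return (host, "CNAME", CNAME_TARGET)


print(cname_record("example.com", "www"))  # ('www.example.com', 'CNAME', 'cname.shiply.now')
print(cname_record("example.com", "@"))    # ('example.com', 'CNAME', 'cname.shiply.now')
```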

Output Schema
Name | Required | Description
result | Yes |

Behavior: 4/5

Discloses auto-registration of parent domain, return of CNAME record, and automatic certificate issuance. Annotations already indicate idempotent and open world hints; description adds useful context without contradiction.

Conciseness: 5/5

Four sentences, front-loaded with the main purpose, no unnecessary words. Efficient and clear.

Completeness: 5/5

Covers all aspects: purpose, usage, return value, and the follow-up step (polling). Complete for the tool's complexity given the input and output schemas.

Parameters: 5/5

Adds valuable context beyond schema: explains subdomain '@' or '' for apex, clarifies domain as registrable parent, and states slug as owned site slug. Complements 100% schema coverage.

Purpose: 5/5

The description clearly states the tool serves an owned site at a subdomain, specifies apex handling, and differentiates from siblings like 'add_custom_domain' and 'check_domain'.

Usage Guidelines: 5/5

Explicitly says when to use the tool (to point a subdomain at a site), flags when the apex case needs extra support (a provider with CNAME flattening/ALIAS), and suggests polling with 'check_domain' for status.

add_suppression: Suppress an email or domain (grade A)
Idempotent

Add an address (or whole domain) to the user's suppression list. Future confirmations and broadcasts will skip it across every test. Use kind='email' for a single address, kind='domain' for everyone @example.com.

Parameters (JSON Schema)
Name | Required | Description | Default
kind | Yes | 'email' for one address, 'domain' for everyone @example.com |
notes | No | optional reason / note for the suppression |
value | Yes | e.g. 'spammy@example.com' or 'competitor.com' |
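
To make the kind parameter concrete, here is a small local model of how a suppression list might be matched before a send. The matching rules are assumptions, not documented server behavior: values are compared case-insensitively, and a 'domain' entry matches only the exact part after '@'. This page does not say whether subdomains such as mail.competitor.com are also covered.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Suppression:
    kind: str   # 'email' or 'domain', as in the tool's schema
    value: str  # e.g. 'spammy@example.com' or 'competitor.com'


def is_suppressed(address: str, suppressions: List[Suppression]) -> bool:
    """Assumed matching: exact address for 'email', exact domain after '@' for 'domain'."""
    addr = address.strip().lower()
    addr_domain = addr.rpartition("@")[2]
    for s in suppressions:
        value = s.value.strip().lower()
        if s.kind == "email" and addr == value:
            return True
        if s.kind == "domain" and addr_domain == value:
            return True
    return False


rules = [Suppression("email", "spammy@example.com"),
         Suppression("domain", "competitor.com")]
print(is_suppressed("Spammy@example.com", rules))  # True
print(is_suppressed("ceo@competitor.com", rules))  # True
print(is_suppressed("friend@example.com", rules))  # False
```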

Output Schema
Name | Required | Description
result | Yes |

Behavior: 4/5

Annotations already provide idempotentHint=true, so the description does not need to restate that. The description adds valuable behavioral context: 'Future confirmations and broadcasts will skip it across every test,' which explains the effect beyond the basic add action.

Conciseness: 5/5

The description is extremely concise: three sentences with no wasted words. Every sentence provides essential information.

Completeness: 4/5

Given the presence of an output schema and annotations, the description provides sufficient context for an agent to understand the tool's purpose and behavior. It could mention idempotency explicitly, but the annotation covers that.

Parameters: 3/5

Schema coverage is 100%, so the description adds only marginal value. It reinforces the kind parameter with examples (kind='email' for a single address, kind='domain' for everyone @example.com). This is helpful but not essential given the schema descriptions.

Purpose: 5/5

The description clearly states the action (add an address or whole domain), the resource (the user's suppression list), and the effect (future confirmations and broadcasts will skip it). It also differentiates between 'email' and 'domain' suppression, and its 'Add' verb separates it from any removal counterpart such as 'remove_suppression'.

Usage Guidelines: 4/5

The description provides clear usage guidance by explaining when to use each kind parameter. It does not explicitly state when not to use the tool or mention alternatives, but the context is clear enough for an AI agent to decide.

archive_project: Archive a project (grade A)
Idempotent

Move a project to status='archived'. Hidden from the default dashboard list. Optional reason is shown on the project page. Restore later with restore_project.

Parameters (JSON Schema)
Name | Required | Description | Default
id | Yes | project id to archive |
reason | No | optional reason shown on the project page |
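
The archive/restore pair behaves like a reversible status flag. The toy in-memory model below illustrates that, and why the idempotentHint annotation holds: archiving twice leaves the same state. The initial 'active' status, and the assumption that restore_project resets to it, are not documented here; the page only names the 'archived' status.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class Project:
    id: str
    status: str = "active"               # assumed initial status
    archive_reason: Optional[str] = None


def archive(p: Project, reason: Optional[str] = None) -> Project:
    """Toy archive_project: set status='archived' and keep the optional reason."""
    return replace(p, status="archived", archive_reason=reason)


def restore(p: Project) -> Project:
    """Toy restore_project: assumes the project returns to 'active'."""
    return replace(p, status="active", archive_reason=None)


def dashboard(projects: List[Project]) -> List[str]:
    """Default dashboard list: archived projects are hidden."""
    return [p.id for p in projects if p.status != "archived"]


once = archive(Project("proj_1"), "superseded")
twice = archive(once, "superseded")
print(once == twice)                         # True
print(dashboard([once, Project("proj_2")]))  # ['proj_2']
print(restore(once).status)                  # active
```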

Output Schema
Name | Required | Description
result | Yes |

Behavior: 4/5

The description adds behavioral context beyond the idempotentHint annotation: it explains that the project becomes hidden from the default dashboard list and that the optional reason is displayed on the project page. No contradiction with annotations.

Conciseness: 5/5

The description is extremely concise: four short sentences with no unnecessary words. It front-loads the main action and efficiently covers the consequences and how to reverse them.

Completeness: 4/5

Because an output schema is present, the description need not cover return values, and it sufficiently explains the tool's effect. It could mention permissions, but for a simple archive action, this is adequate.

Parameters: 4/5

The schema already describes both parameters (id and reason) with 100% coverage. The description reinforces that the optional reason is 'shown on the project page', which clarifies the reason's purpose, although the schema's reason field states the same.

Purpose: 5/5

The description clearly states the action (move a project to status='archived'), the resource (project), and the effect (hidden from the dashboard, reason displayed). It also distinguishes itself from the sibling 'restore_project' by naming it as the way to restore later.

Usage Guidelines: 4/5

The description indicates when to use this tool (to archive a project) and implicitly when not to (if restoration is intended, use 'restore_project'). It also notes the optional reason. No explicit exclusion of other alternatives, but sufficient for this context.

archive_thread: Archive a thread (grade A)
Idempotent

Soft-archive the thread (sets archivedAt). Hidden from the default inbox view; surface again with list_inbox filter=archived. Reverse with unarchive_thread.

Parameters (JSON Schema)
Name | Required | Description | Default
threadId | Yes | thread id from list_inbox to archive |
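
Soft-archiving means the thread is kept and only flagged through archivedAt, so the default and archived inbox views are two filters over the same data. The sketch below models that locally. `Thread`, `inbox`, and the view names are hypothetical, and keeping the first timestamp on a repeated archive is an assumption chosen to mirror the idempotentHint annotation.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class Thread:
    id: str
    archived_at: Optional[datetime] = None  # mirrors the archivedAt field


def archive(thread: Thread) -> None:
    # Soft archive: only set the timestamp; a repeat call is a no-op (assumption).
    if thread.archived_at is None:
        thread.archived_at = datetime.now(timezone.utc)


def unarchive(thread: Thread) -> None:
    thread.archived_at = None


def inbox(threads: List[Thread], view: str = "default") -> List[str]:
    """'default' hides archived threads; 'archived' shows only them."""
    if view == "archived":
        return [t.id for t in threads if t.archived_at is not None]
    return [t.id for t in threads if t.archived_at is None]


threads = [Thread("t1"), Thread("t2")]
archive(threads[0])
print(inbox(threads))              # ['t2']
print(inbox(threads, "archived"))  # ['t1']
unarchive(threads[0])
print(inbox(threads))              # ['t1', 't2']
```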

Output Schema
Name | Required | Description
result | Yes |

Behavior: 5/5

The description discloses that it's a soft-archive, sets archivedAt, and is reversible. It also explains the visibility impact. The idempotentHint annotation is consistent. Behavior is well explained.

Conciseness: 5/5

The description is three short sentences, front-loaded with the main action, and every word adds value. No redundancy.

Completeness: 5/5

For a simple tool with one parameter and an output schema, the description covers purpose, effect, reversal, and filtering. It is complete and self-contained.

Parameters: 4/5

Schema coverage is 100% with a clear description for threadId that notes the ID comes from list_inbox; the description reinforces this link by pointing to list_inbox for resurfacing archived threads.

Purpose: 5/5

The description clearly states the tool archives a thread by setting archivedAt, and explains the effect on visibility. It distinguishes from sibling tools like unarchive_thread and list_inbox.

Usage Guidelines: 4/5

The description mentions when to use (to hide from default inbox) and how to surface again (list_inbox filter) and reverse (unarchive_thread). It could be more explicit about when not to use, but provides adequate guidance.

check_custom_domain: Check a custom domain (grade A)
Read-only

Check whether a custom domain's subdomains are live: re-polls Cloudflare cert status + probes DNS/TLS/HTTPS on each subdomain. Poll this after connect_provider / add_subdomain to confirm the domain is serving. Returns per-subdomain status + tls + http + ready.

Parameters (JSON Schema)
Name | Required | Description | Default
domain | Yes | registered custom domain to check, e.g. example.com |
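
Because check_custom_domain reports status per subdomain, a caller polling it usually needs a single "is everything serving yet?" answer. A sketch of that reduction follows. The result shape, a `subdomains` list whose entries carry `host`, `status`, `tls`, `http`, and `ready`, is a hypothetical rendering of the description's "per-subdomain status + tls + http + ready"; the real output schema fields are not shown on this page.

```python
from typing import Dict, List, Tuple


def summarize(result: Dict) -> Tuple[bool, List[str]]:
    """Return (all_ready, hosts_not_ready) for a check_custom_domain-style result."""
    rows = result.get("subdomains", [])  # hypothetical key
    pending = [row.get("host", "?") for row in rows if not row.get("ready")]
    # With no subdomains at all, nothing is serving yet, so report not ready.
    return (bool(rows) and not pending, pending)


sample = {"subdomains": [
    {"host": "www.example.com", "status": "active", "tls": True, "http": 200, "ready": True},
    {"host": "example.com", "status": "pending", "tls": False, "http": None, "ready": False},
]}
print(summarize(sample))  # (False, ['example.com'])
```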

Output Schema
Name | Required | Description
result | Yes |

Behavior: 4/5

Annotations indicate read-only and open-world. Description adds that it re-polls and probes, consistent with annotations. Provides behavioral detail beyond just 'check'.

Conciseness: 5/5

Three sentences, no wasted words. The first defines the action, the second gives usage guidance, and the third states the return value. Excellent structure.

Completeness: 5/5

Given the simple nature and presence of output schema, the description covers purpose, usage, and output completely.

Parameters: 4/5

Schema covers parameter 100% with description. Description adds context about subdomains and the checking action, enhancing meaning beyond schema.

Purpose: 5/5

The description clearly states it checks if a custom domain's subdomains are live, with specifics on re-polling Cloudflare cert status and probing DNS/TLS/HTTPS. This distinguishes it from siblings like check_domain.

Usage Guidelines: 4/5

Explicitly says to poll this after connect_provider/add_subdomain to confirm serving, providing clear context. Does not include when not to use, but is otherwise good.

check_domain: Check a custom domain (grade A)
Read-only

Refresh certificate status + live TLS/HTTPS probe for a connected domain (by id from list_domains).

Parameters (JSON Schema)
Name | Required | Description | Default
id | Yes | connected domain id from list_domains |
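
check_domain and check_custom_domain share the title "Check a custom domain", which is the kind of overlap the Disambiguation score penalizes. One way an agent could choose between them is by the identifier it holds: check_domain takes a connected domain id from list_domains, while check_custom_domain takes a registered domain name. The routing function below is a hypothetical client-side sketch of that rule, not server behavior, and `dom_123` is a made-up id.

```python
from typing import Dict, Optional, Tuple


def choose_check_tool(domain_id: Optional[str] = None,
                      domain: Optional[str] = None) -> Tuple[str, Dict[str, str]]:
    """Pick the check tool (and its arguments) from the identifier at hand."""
    if domain_id:
        # check_domain: connected domain id, as returned by list_domains.
        return ("check_domain", {"id": domain_id})
    if domain:
        # check_custom_domain: registered custom domain name, e.g. example.com.
        return ("check_custom_domain", {"domain": domain})
    raise ValueError("need a domain id from list_domains or a registered domain name")


print(choose_check_tool(domain_id="dom_123"))   # ('check_domain', {'id': 'dom_123'})
print(choose_check_tool(domain="example.com"))  # ('check_custom_domain', {'domain': 'example.com'})
```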

Output Schema
Name | Required | Description
result | Yes |

Behavior: 3/5

Annotations already mark the tool as readOnlyHint and openWorldHint. The description adds 'live probe' context, indicating it is real-time and potentially not instantaneous. No additional behavioral traits like rate limits or auth needs are disclosed.

Conciseness: 5/5

The description is a single, efficient sentence that front-loads the core purpose without any wasted words. It is well-structured for quick comprehension.

Completeness: 4/5

Given that the tool has a single parameter with full schema coverage and an output schema exists, the description adequately covers the tool's function and input source. Minor gaps remain in behavioral details, but overall it is sufficient.

Parameters: 3/5

Schema description coverage is 100%, and the tool description essentially repeats the parameter info ('by id from list_domains'). No extra semantics, constraints, or format details are added beyond the schema.

Purpose: 4/5

The description clearly states the tool refreshes certificate status and performs a live TLS/HTTPS probe on a connected domain. It specifies the resource and action, but does not differentiate from the sibling tool 'check_custom_domain', which may cause confusion.

Usage Guidelines: 3/5

The description mentions that the domain ID comes from 'list_domains', providing a prerequisite. However, no guidance is given on when to use this tool versus alternatives like 'check_custom_domain', nor any exclusions.

connect_provider: Connect the domain's DNS provider (grade A)
Idempotent

Start one-click DNS connect for a registered custom domain. For Cloudflare-hosted domains this returns an authorize URL — SHOW THE USER the url as a clickable link; after they authorize, records are written automatically. For other providers, add the records manually.

Parameters (JSON Schema)
Name | Required | Description | Default
domain | Yes | registered custom domain to connect, e.g. example.com |
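
connect_provider has two outcomes: an authorize URL that must be shown to the user as a clickable link (Cloudflare), or records the user adds by hand (other providers). The sketch below turns either outcome into a user-facing message. The keys `authorizeUrl` and `records`, and each record's `type`/`name`/`value` fields, are hypothetical because this page does not list the output schema's fields. The Markdown link assumes the client renders Markdown.

```python
from typing import Dict


def render_connect_result(result: Dict) -> str:
    """Format a connect_provider-style result for the user (hypothetical keys)."""
    url = result.get("authorizeUrl")
    if url:
        # Cloudflare path: show the URL as a clickable (Markdown) link.
        return f"[Authorize DNS changes]({url})"
    lines = ["Add these records at your DNS provider:"]
    for record in result.get("records", []):
        lines.append(f"{record['type']} {record['name']} -> {record['value']}")
    return "\n".join(lines)


print(render_connect_result({"authorizeUrl": "https://example.com/authorize"}))
print(render_connect_result(
    {"records": [{"type": "CNAME", "name": "www", "value": "cname.shiply.now"}]}))
```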

Output Schema
Name | Required | Description
result | Yes |

Behavior: 4/5

Annotations declare idempotentHint=true and openWorldHint=true, and the description does not contradict them. It adds valuable behavioral details: for Cloudflare, returns an authorize URL that must be shown to the user; for others, manual records. No mention of permissions or side effects, but sufficient given the annotations.

Conciseness: 5/5

Three sentences, front-loaded with the primary action, no unnecessary words. The structure efficiently conveys the core purpose and the critical Cloudflare-versus-other-providers distinction.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity (1 parameter, output schema exists, annotations provide safety), the description covers the essential points. It could elaborate on what 'one-click DNS connect' entails or post-authorization steps, but overall it is adequate.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with a clear description for the only parameter 'domain'. The description adds the qualifier 'registered custom domain', reinforcing the schema's intent but not adding substantial new meaning beyond the example. Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Start one-click DNS connect for a registered custom domain.' It specifies the action and distinguishes between Cloudflare and other providers, making it distinct from sibling tools like add_custom_domain or check_custom_domain.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides context on when to use (for registered custom domains) and differentiates behavior for Cloudflare vs other providers. However, it does not explicitly state when not to use or mention alternatives, leaving some room for ambiguity.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

contract_amend: Create an amendment to a signed contract (grade A)

Create an amendment to a SIGNED parent contract. Scope delta required; fee delta and target date optional. Returns the draft amendment for editing before send — call contract_send with the returned amendment id to fire it. Cannot amend an amendment (amend the parent instead).
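The documented constraints on contract_amend (signed parent, no amending an amendment, required scope delta, optional and possibly negative fee delta, YYYY-MM-DD date) can be checked before the call. A minimal sketch; the `parent` record and its `status` and `parentContractId` keys are hypothetical stand-ins for however the client tracks contracts.

```python
import re
from datetime import date
from typing import Any, Optional


def amend_arguments(
    parent: dict[str, Any],
    scope_delta: str,
    fee_delta_cents: Optional[int] = None,
    target_completion_date: Optional[str] = None,
) -> dict[str, Any]:
    """Check the documented preconditions and build contract_amend arguments."""
    # Hypothetical local record of the parent: {"id", "status", "parentContractId"}.
    if parent.get("status") != "signed":
        raise ValueError("parent contract must be SIGNED")
    if parent.get("parentContractId"):
        raise ValueError("cannot amend an amendment; amend the parent instead")
    if not scope_delta.strip():
        raise ValueError("scopeDelta is required")

    args: dict[str, Any] = {"parentContractId": parent["id"], "scopeDelta": scope_delta}
    if fee_delta_cents is not None:
        args["feeDeltaCents"] = int(fee_delta_cents)  # may be negative for a descope
    if target_completion_date is not None:
        # Require the exact YYYY-MM-DD shape, then confirm it is a real calendar date.
        if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", target_completion_date):
            raise ValueError("targetCompletionDate must be YYYY-MM-DD")
        date.fromisoformat(target_completion_date)
        args["targetCompletionDate"] = target_completion_date
    return args
```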

Parameters (JSON Schema)
Name | Required | Description | Default
scopeDelta | Yes | What's changing — visible to customer. |
feeDeltaCents | No | Optional fee adjustment in cents. Can be negative for descope. |
parentContractId | Yes | id of the SIGNED parent contract to amend |
targetCompletionDate | No | Optional ISO date (YYYY-MM-DD) for revised completion. |

Output Schema

Parameters (JSON Schema)
Name | Required | Description
result | Yes |

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Description clarifies that tool returns a draft for editing before sending, implying mutation without immediate finalization. Consistent with idempotentHint=false. No destructive behavior mentioned, but creation is not destructive.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four concise sentences: the first two define the purpose and field requirements, the last two describe the workflow and a constraint. No redundant information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 4 parameters, no enums, and presence of output schema (not shown but noted), description covers purpose, usage, constraints, and return value concept. Complete for agent to use correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the baseline is 3. The description adds context: scopeDelta is visible to the customer, feeDeltaCents can be negative for a descope, parentContractId must reference a signed contract, and targetCompletionDate is an ISO date. It also summarizes which fields are required and which are optional.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states that the tool creates an amendment to a signed parent contract and lists the required and optional fields. It distinguishes itself from siblings like contract_send and contract_draft by spelling out the workflow.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

It explicitly states that the parent contract must be signed, that an amendment cannot itself be amended, and that the follow-up step is contract_send. This gives a clear workflow and clear constraints.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

contract_draft: Draft a contract from a brief_ready project (grade A)
Idempotent

Draft a contract from a brief_ready project. Auto-fills 8 fields from the AI brief, Stripe Connect default currency, and dev profile. Returns the contract row with status='draft' so the dev can review fields before sending. After this, edit fields via PATCH /api/v1/contracts/{id} (no MCP edit tool yet), then call contract_send to fire it.
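Since there is no MCP edit tool yet, editing a draft means calling the REST endpoint named in the description. A sketch that builds (without sending) the PATCH request with Python's standard `urllib.request`; the base URL, the bearer-token header, and the `title` field are assumptions, as only the path `/api/v1/contracts/{id}` comes from the description.

```python
import json
import urllib.request
from typing import Any

# Hypothetical base URL: the tool description gives only the path.
BASE_URL = "https://api.example.com"


def build_contract_patch(contract_id: str, fields: dict[str, Any], token: str) -> urllib.request.Request:
    """Build (but do not send) a PATCH request that edits a draft contract."""
    body = json.dumps(fields).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/contracts/{contract_id}",
        data=body,
        method="PATCH",
        headers={
            "Content-Type": "application/json",
            # Assumed bearer auth; check shiply's own docs for the real scheme.
            "Authorization": f"Bearer {token}",
        },
    )


# "title" is a hypothetical field name used only for illustration.
req = build_contract_patch("c_123", {"title": "Website redesign"}, "sk_test")
# Sending it would be: urllib.request.urlopen(req)
```

After the edit, the same contract id goes to contract_send.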

Parameters (JSON Schema)
Name | Required | Description | Default
projectId | Yes | Project to draft a contract for |

Output Schema

Parameters (JSON Schema)
Name | Required | Description
result | Yes |

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses the auto-fill behavior, the returned status='draft', and the need for subsequent editing. Annotations show idempotentHint=true, and the description adds context about the review step. It could also mention idempotency or the failure mode when the project is not brief_ready.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four concise sentences covering purpose, auto-fill behavior, return value, and follow-up workflow. No fluff; key information is front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given one required parameter, idempotentHint annotation, and output schema (though not shown), the description sufficiently covers creation, return value, and next steps. No missing critical details.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Single parameter projectId is fully described in schema. Description adds the crucial constraint 'brief_ready project', providing meaning beyond schema. Could clarify what 'brief_ready' means.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Draft a contract'), resource ('contract'), and source ('brief_ready project'). It distinguishes from sibling tools like contract_amend and contract_send by specifying the draft stage.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit workflow: draft → edit via PATCH (no MCP tool yet) → send via contract_send. Guides when to use (after brief_ready) and what to do after, with alternatives for editing and sending.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

contract_pdf: Get the signed contract PDF (base64) (grade A)
Read-only

Get the signed contract PDF as a base64-encoded download. PDF includes the contract, signature certificate, and any signed amendments. Returns { filename, contentType, base64 }. Errors with conflict:contract_not_signed if the parent contract has not been signed yet.
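The `{ filename, contentType, base64 }` result needs decoding before it is a usable file. A minimal sketch using the standard `base64` module; it assumes `contentType` is `application/pdf` and trims `filename` to its last path component so the value cannot steer a write outside the target directory.

```python
import base64
from pathlib import PurePath


def decode_contract_pdf(result: dict[str, str]) -> tuple[str, bytes]:
    """Decode a contract_pdf result of the form {filename, contentType, base64}."""
    # Assumption: this tool always reports contentType "application/pdf".
    if result.get("contentType") != "application/pdf":
        raise ValueError(f"unexpected content type: {result.get('contentType')!r}")
    data = base64.b64decode(result["base64"], validate=True)
    if not data.startswith(b"%PDF-"):
        # PDF files begin with the "%PDF-" header.
        raise ValueError("payload does not start with a PDF header")
    # Keep only the last path component of the suggested filename.
    safe_name = PurePath(result["filename"]).name
    return safe_name, data
```

Writing the file is then `Path(out_dir, name).write_bytes(data)`.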

Parameters (JSON Schema)
Name | Required | Description | Default
contractId | Yes | signed contract id (parent or amendment) to render as PDF |

Output Schema

Parameters (JSON Schema)
Name | Required | Description
result | Yes |

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate readOnlyHint=true. Description adds return format and error detail, providing behavioral context beyond annotations. No contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four short sentences, efficient and front-loaded. Every sentence adds value, with no redundant information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Complete for a simple PDF retrieval tool: includes return structure, content details, and error case. Output schema exists but description already suffices.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description covers 100% of parameters and already explains the contractId field. Description does not add new semantic information beyond what schema provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states it retrieves the signed contract PDF as base64. Specifies included content (contract, signature certificate, amendments). Distinguishes from siblings like contract_draft and contract_send.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states when to use and describes error condition ('conflict:contract_not_signed' if not signed). No explicit alternatives, but context makes purpose obvious.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

contract_send: Send a draft contract to the customer (grade A)

Send a draft contract to the customer. Validates all 8 fields are non-empty, computes content_hash, flips project to contract_sent, fires the customer email. Same handler works for amendment drafts — sending an amendment does not move project state.
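To make the documented side effects concrete, here is an in-memory model of what contract_send is described as doing. It is illustrative only: the dict shapes, the `sent` status value, and the SHA-256-over-canonical-JSON stand-in for `content_hash` are assumptions, not shiply's implementation, and the real tool validates its own 8 specific fields.

```python
import hashlib
import json
from typing import Any


def send_contract(contract: dict[str, Any], project: dict[str, Any]) -> dict[str, Any]:
    """Illustrative in-memory model of the steps contract_send is documented to perform."""
    if contract["status"] != "draft":
        raise ValueError("only draft contracts can be sent")
    # Every field must be non-empty (the real tool checks its 8 contract fields).
    empty = [k for k, v in contract["fields"].items() if v is None or not str(v).strip()]
    if empty:
        raise ValueError("empty fields: " + ", ".join(sorted(empty)))

    # Stand-in for content_hash: SHA-256 over canonical JSON of the fields.
    canonical = json.dumps(contract["fields"], sort_keys=True, separators=(",", ":"))
    contract["content_hash"] = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    contract["status"] = "sent"  # assumed status value

    # Only a parent contract moves the project; amendments leave it alone.
    if not contract.get("parentContractId"):
        project["status"] = "contract_sent"
    # A real implementation would fire the customer email here.
    return contract
```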

Parameters (JSON Schema)
Name | Required | Description | Default
contractId | Yes | draft contract id to send (from contract_draft) |

Output Schema

Parameters (JSON Schema)
Name | Required | Description
result | Yes |

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description discloses side effects: validation, content_hash computation, project state flip (or not for amendments), and email sending. This adds value beyond annotations (openWorldHint, idempotentHint false) by specifying exact state changes. No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is very concise: three sentences with no redundancy. It front-loads the primary action and then lists key behaviors. Every sentence adds value, making it efficient for an AI agent to parse.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's low complexity (one required param, output schema exists), the description covers purpose, side effects, and amendment behavior. It lacks explicit mention of required draft state, but output schema likely provides return details. Overall, it is sufficiently complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The tool has only one parameter (contractId) with schema description already clear. The tool description does not add any additional meaning beyond what the schema provides (e.g., format, source). Since schema coverage is 100%, baseline assignment of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action: 'Send a draft contract to the customer.' It details the specific steps (validates fields, computes hash, flips project state, fires email) and distinguishes handling of amendments, differentiating it from sibling tools like contract_draft and contract_amend.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explains that the same handler works for amendment drafts and notes that sending an amendment does not move project state, providing context for when to use. However, it does not explicitly state prerequisites (e.g., contract must be a draft) or when to avoid using this tool in favor of alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

contract_status: Read contract state + amendments (grade A)
Read-only

Read the current state of a contract: status, sent_at, viewed_at, signed_at, signer info, content_hash, plus any amendments. Use this to check whether a customer has signed yet. Returns { contract, amendments } — the contract row matches GET /api/v1/contracts/{id}.
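Because contract_status is read-only, checking for a signature amounts to a polling loop. A sketch with the tool call injected as `fetch_status` (a hypothetical wrapper around contract_status that returns `{ contract, amendments }`); treating a non-empty `signed_at` as "signed" is an assumption based on the field list above.

```python
import time
from typing import Any, Callable


def wait_until_signed(
    fetch_status: Callable[[str], dict[str, Any]],
    contract_id: str,
    interval_s: float = 30.0,
    timeout_s: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict[str, Any]:
    """Poll contract_status until signed_at is set or the timeout expires."""
    deadline = clock() + timeout_s
    while True:
        state = fetch_status(contract_id)
        if state["contract"].get("signed_at"):
            return state
        if clock() >= deadline:
            raise TimeoutError(f"contract {contract_id} not signed after {timeout_s}s")
        sleep(interval_s)
```

Injecting `sleep` and `clock` keeps the loop testable and lets an agent pick a gentler interval than a tight retry.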

Parameters (JSON Schema)
Name | Required | Description | Default
contractId | Yes | contract id to read state for |

Output Schema

Parameters (JSON Schema)
Name | Required | Description
result | Yes |

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint: true. The description adds valuable behavioral context by listing the returned fields and stating the return shape is { contract, amendments }, referencing the API endpoint. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three sentences with no extraneous words. It front-loads the purpose, immediately provides a usage cue, and closes with the return shape, making it efficient and well-structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the presence of an output schema and the tool's simplicity (one parameter), the description fully covers purpose, usage, returned fields, and response shape. No gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema covers the single parameter contractId with 100% description coverage. The description does not add additional meaning beyond the schema, so a baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description explicitly states it reads the current state of a contract and lists key fields (status, timestamps, signer info, content_hash, amendments). It clearly distinguishes from sibling tools like contract_amend, contract_draft, contract_pdf, and contract_send, which perform other operations.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides a concrete use case: 'Use this to check whether a customer has signed yet.' While it does not explicitly list when not to use it or mention alternatives, the context of siblings makes the usage clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

create_drive: Create a drive (grade B)

Create a private cloud Drive (plan-limited).

Parameters (JSON Schema)
Name | Required | Description | Default
name | Yes | display name for the new drive |

Output Schema

Parameters (JSON Schema)
Name | Required | Description
result | Yes |

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With only idempotentHint=false in annotations, the description carries the burden. It mentions 'plan-limited' but omits details like duplicate name behavior, failure conditions, or side effects. Minimal disclosure for a write operation.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence with no wasted words. Essential information is presented upfront and compactly.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The tool is simple with one parameter and an output schema. The description covers the core purpose and a constraint. Could mention plan limit behavior explicitly, but overall sufficient.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% for the single 'name' parameter, which already describes 'display name for the new drive'. The description adds nothing beyond that, so baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'Create' and the resource 'private cloud Drive', and adds a constraint 'plan-limited'. It distinguishes from sibling tools like list_drives, but could be more specific about what a Drive is.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives like list_drives or other create tools. The description does not mention prerequisites or scenarios where creation is appropriate.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

create_listing: List one of my sites for sale (grade A)

Publish (or upsert) a marketplace listing for an owned site. Requires Stripe Connect set up (status='ready' — see get_connect_status). priceCents = whole-dollar between 100 and 999900. termsMode='standard' uses shiply's template; 'custom' requires termsCustom ≥50 chars. jurisdiction is required (e.g. 'California, USA').
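Most of the create_listing rules can be checked client-side before the call; only the Stripe Connect `status='ready'` prerequisite needs a get_connect_status lookup. A sketch that mirrors the documented rules; reading "whole-dollar" as "a multiple of 100 cents" is an interpretation, and the helper name is hypothetical.

```python
from typing import Optional


def validate_listing(
    site_slug: str,
    price_cents: int,
    terms_mode: str,
    jurisdiction: str,
    terms_custom: Optional[str] = None,
    pitch: Optional[str] = None,
) -> list[str]:
    """Return the rule violations for create_listing arguments (empty list if valid)."""
    errors: list[str] = []
    if not site_slug:
        errors.append("siteSlug is required")
    if not (100 <= price_cents <= 999_900) or price_cents % 100 != 0:
        # Whole dollars only: $1.00 up to $9,999.00.
        errors.append("priceCents must be a whole-dollar amount between 100 and 999900")
    if terms_mode not in ("standard", "custom"):
        errors.append("termsMode must be 'standard' or 'custom'")
    elif terms_mode == "custom" and len(terms_custom or "") < 50:
        errors.append("termsCustom must be at least 50 characters when termsMode='custom'")
    if not jurisdiction.strip():
        errors.append("jurisdiction is required")
    if pitch is not None and len(pitch) > 280:
        errors.append("pitch must be 280 characters or fewer")
    return errors
```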

Parameters (JSON Schema)
Name | Required | Description | Default
pitch | No | short sales pitch, ≤280 chars |
status | No | publish state (default draft) |
siteSlug | Yes | slug of the owned site to list |
termsMode | Yes | 'standard' uses shiply's template; 'custom' requires termsCustom |
priceCents | Yes | whole-dollar price in cents, 100–999900 |
termsCustom | No | custom terms text, ≥50 chars, required when termsMode='custom' |
jurisdiction | Yes | governing jurisdiction, e.g. 'California, USA' |

Output Schema

Parameters (JSON Schema)
Name | Required | Description
result | Yes |

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations provide only idempotentHint=false. The description adds context about Stripe Connect requirement, price range, and termsMode rules, which are behavioral traits beyond the schema. However, it does not disclose potential side effects (e.g., whether an existing listing is overwritten), auth details, or rate limits. The description adds some value but is not fully transparent.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise, consisting of five short sentences that front-load the purpose. Every sentence adds value: the first states the action, the second gives a critical prerequisite, and the last three enumerate key constraints. No unnecessary words or repetition.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has 7 parameters (4 required), 100% schema coverage, and an output schema, the description covers the essential behavioral context: prerequisite, price constraints, terms mode rules, and jurisdiction. It does not explain the upsert behavior in detail (e.g., what happens if the listing already exists), but overall it is sufficiently complete for an agent to use the tool correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the baseline is 3. The description adds extra context for priceCents (whole-dollar range), termsMode (custom requires termsCustom ≥50 chars), and jurisdiction (example provided). This clarifies the meaning beyond the schema, especially for the conditional requirement of termsCustom. No value is added for siteSlug or pitch, but overall it enhances understanding.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool publishes or upserts a marketplace listing for an owned site. The verb 'Publish (or upsert)' is specific and the resource 'marketplace listing' is well-defined. However, it does not explicitly differentiate from the sibling 'update_listing' tool, which could cause confusion about when to use each.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description mentions a prerequisite (Stripe Connect set up) and provides constraints on parameters, but it does not specify when to use this tool versus alternatives like 'update_listing' or when not to use it. Usage guidance is implied but not explicit.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

create_project: Create a client-intake project (grade A)

Spin up a new customer-intake project on the dev's account. Returns the project row plus intakeUrl — the public link the customer fills out (10-step wizard). If customerEmail is provided, shiply also emails them the intake invite. Use originatedFromSiteId to link a project to an existing site (e.g. 'redesign this site').
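The optional create_project fields change behavior: supplying `customerEmail` triggers the invite email. A small sketch that drops unset optional arguments so nothing is sent by accident; the helper name is hypothetical, and only the argument names come from the schema.

```python
from typing import Any, Optional


def project_arguments(
    label: str,
    customer_name: Optional[str] = None,
    customer_email: Optional[str] = None,
    originated_from_site_id: Optional[str] = None,
) -> dict[str, Any]:
    """Build create_project arguments, leaving out optional fields that are unset."""
    if not label.strip():
        raise ValueError("label is required")
    optional = {
        "customerName": customer_name,
        "customerEmail": customer_email,  # if set, shiply emails the intake invite
        "originatedFromSiteId": originated_from_site_id,  # e.g. for a redesign
    }
    return {"label": label, **{k: v for k, v in optional.items() if v is not None}}
```

Leaving `customerEmail` out is how you would create the project and share the returned `intakeUrl` yourself.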

Parameters (JSON Schema)
Name | Required | Description | Default
label | Yes | project name shown in the dev dashboard |
customerName | No | the customer's name |
customerEmail | No | customer email; if set, shiply emails them the intake invite |
originatedFromSiteId | No | link this project to an existing site (e.g. a redesign) |

Output Schema

Parameters (JSON Schema)
Name | Required | Description
result | Yes |

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description discloses that the tool creates a project, returns specific data including intakeUrl, and optionally sends an email. This adds behavioral context beyond the annotations (idempotentHint=false). It is clear and does not contradict annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is four sentences, front-loaded with the core purpose. Every sentence adds necessary detail without redundancy. Very concise and well-structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the parameter count (4) and output schema existence, the description covers essential behavior: creation, return value (project row + intakeUrl), optional email, and linking to site. It does not elaborate on error handling or rate limits, but is sufficiently complete for the tool's complexity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema description coverage is 100%, so baseline is 3. The description adds minor value by repeating schema descriptions and mentioning the 10-step wizard, but does not significantly enhance understanding beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool creates a customer-intake project on the dev's account. It specifies the resource (project) and the action (spin up), and mentions returning the project row and intakeUrl, differentiating it from related tools like archive_project or list_projects.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explains when to use the tool: to create a new customer-intake project. It also provides context about optional behaviors (emailing invite if customerEmail is provided, linking to an existing site via originatedFromSiteId). However, it does not explicitly exclude scenarios or mention alternatives among siblings.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

create_test: Create an email demand test (grade A)

Provision a demand test in one call: deploys a landing page with a native email-capture form and creates a confirmed-subscriber segment. Returns testId + live siteUrl. Share the siteUrl to collect signups; each signup gets a double-opt-in confirmation. Read progress with get_test_status.
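The demand signal that matters is the confirmed segment, not raw signups. A toy in-memory model of double opt-in that illustrates the general technique rather than shiply's implementation; the token format, storage, and the absence of real email delivery are all assumptions.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class DemandTest:
    """Toy double opt-in model: a signup joins the segment only after confirmation."""

    pending: dict[str, str] = field(default_factory=dict)  # token -> email
    confirmed: set[str] = field(default_factory=set)  # the confirmed-subscriber segment

    def sign_up(self, email: str) -> str:
        """Record a signup and return the token a confirmation email would carry."""
        token = secrets.token_urlsafe(16)
        self.pending[token] = email.strip().lower()
        return token

    def confirm(self, token: str) -> bool:
        """Move a pending signup into the segment; unknown or reused tokens are ignored."""
        email = self.pending.pop(token, None)
        if email is None:
            return False
        self.confirmed.add(email)
        return True
```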

Parameters (JSON Schema)
Name | Required | Description | Default
cta | No | call-to-action button label |
sub | No | subheadline / supporting line |
idea | Yes | the product/idea name |
price | No | price to display, e.g. "$29/mo" |
headline | Yes | landing-page headline |

Output Schema

Parameters (JSON Schema)
Name | Required | Description
result | Yes |

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Beyond the idempotentHint=false annotation, the description reveals that the tool creates resources (landing page, segment), returns testId and siteUrl, triggers double-opt-in confirmations, and suggests reading progress with get_test_status. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four sentences with no redundancy. The first states the core action, the second gives the outputs, the third explains how signups are collected and confirmed, and the fourth points to the follow-up tool. Every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity of provisioning a demand test with multiple components, the description covers purpose, outputs, and next steps. It does not mention permissions or limitations, but output schema exists and follow-up tool is referenced.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with each parameter having a description. The tool description does not add additional meaning or examples beyond the schema, so baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description provides a specific verb ('Provision') and resource ('demand test'), clearly stating that it deploys a landing page with an email-capture form and creates a confirmed-subscriber segment. Its focus on demand tests distinguishes it from siblings like create_listing or create_project.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies use for setting up a demand test with landing page and email capture. It does not explicitly state when not to use or provide alternatives, but the context of sibling tools helps differentiate.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

data_export_collection: Export records (capped)
Grade A · Read-only

Return up to limit records (default 1000, max 5000) from a collection — for snapshotting into agent context. For larger sets use the CLI: shiply data export <slug> <collection>.
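The cap suggests a simple routing rule. A request for at most 5000 records fits in one data_export_collection call, and anything larger should go to the CLI command quoted in the description. A minimal sketch, assuming the caller knows roughly how many records it needs. export_plan, its return shape, and the example slug and collection are hypothetical.

```python
from typing import Dict, Optional

MAX_LIMIT = 5000  # hard cap from the tool description (the default is 1000)


def export_plan(slug: str, collection: str, wanted: Optional[int] = None) -> Dict[str, object]:
    """Route an export either to data_export_collection or to the shiply CLI."""
    args: Dict[str, object] = {"slug": slug, "collection": collection}
    if wanted is None:
        # Omit limit so the server applies its default of 1000 records.
        return {"via": "tool", "arguments": args}
    if wanted <= 0:
        raise ValueError("wanted must be a positive record count")
    if wanted <= MAX_LIMIT:
        args["limit"] = wanted
        return {"via": "tool", "arguments": args}
    # Too many for one capped call: use the CLI command the description recommends.
    return {"via": "cli", "command": f"shiply data export {slug} {collection}"}


print(export_plan("my-site", "waitlist", 200))    # hypothetical slug and collection
print(export_plan("my-site", "waitlist", 12000))
```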

Parameters
Name | Required | Description
slug | Yes | owned site slug
limit | No | max records to return (default 1000, max 5000)
collection | Yes | collection name to export

Output Schema
Name | Required
result | Yes
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true; the description adds the specific limit with its default and maximum values. It does not describe permissions, side effects, or the return format beyond what the output schema likely covers.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, no wasted words. The first sentence front-loads the core functionality and constraints; the second provides an alternative. Highly efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity (3 params with full schema descriptions, output schema present) and the read-only nature, the description is nearly complete. It could mention pagination or sorting behavior, but the main use case is well-covered.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with clear descriptions for slug, limit, and collection. The description reinforces the limit's default and max but adds no new semantic information beyond what the schema already provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states that the tool returns up to a specified limit of records from a collection, explicitly for snapshotting into agent context. It uses specific verbs ('Return' in the description, 'Export' in the title) and distinguishes the tool from the CLI alternative recommended for larger sets.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

It provides guidance on when to use this tool (for small exports into agent context) and when to use the CLI (for larger sets). However, it does not explicitly differentiate from sibling tools like data_query, which could be used for more complex queries.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

data_insert: Insert a record into a collection
Grade A

Insert one record into a collection. Goes through the same public visitor endpoint a browser would use — manifest access.insert decides whether it is allowed. Use to seed waitlist data, test forms end-to-end, etc.
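data_insert takes one record per call and is annotated non-idempotent. Seeding a collection therefore means one call per record, and a blind retry can store a duplicate. The sketch below assumes a hypothetical call_tool(name, arguments) function standing in for whatever MCP client the agent uses. Only the argument keys (slug, collection, record) come from the parameter table.

```python
from typing import Any, Callable, Dict, List

# Hypothetical client signature: call_tool(tool_name, arguments) -> result.
ToolCaller = Callable[[str, Dict[str, Any]], Any]


def seed_collection(
    call_tool: ToolCaller,
    slug: str,
    collection: str,
    records: List[Dict[str, Any]],
) -> List[Any]:
    """Insert each record with its own data_insert call and collect the results."""
    results = []
    for record in records:
        # Each call hits the public visitor endpoint, so the manifest's
        # access.insert rule decides whether it is allowed. The tool is not
        # idempotent, so retrying a single call may store a duplicate row.
        results.append(
            call_tool("data_insert", {"slug": slug, "collection": collection, "record": record})
        )
    return results


# Usage with a stand-in client that just records the calls it receives.
sent = []


def fake_call_tool(name: str, arguments: Dict[str, Any]) -> Dict[str, bool]:
    sent.append((name, arguments))
    return {"ok": True}


seed_collection(fake_call_tool, "my-site", "waitlist",
                [{"email": "a@example.com"}, {"email": "b@example.com"}])
```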

Parameters
Name | Required | Description
slug | Yes | owned site slug
record | Yes | the record fields to insert as key/value pairs
collection | Yes | collection name to insert into

Output Schema
Name | Required
result | Yes
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Beyond the idempotentHint=false annotation, the description adds behavioral context: it uses the public visitor endpoint, implying potential access checks and side effects like manifest evaluation. No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise: three sentences that front-load the core action and add essential context (access control, example uses) without any fluff.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the simplicity of the tool (insert one record) and the presence of an output schema, the description provides sufficient context: purpose, access mechanism, and example use cases. No critical gaps identified.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema already provides descriptions for all parameters (100% coverage). The description does not add further details about parameter semantics, so it meets the baseline but does not exceed it.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description explicitly states 'Insert one record into a collection' and provides example use cases like seeding waitlist data and testing forms, clearly distinguishing it from sibling tools like data_query or data_export_collection.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explains that it goes through the same public visitor endpoint as a browser and that manifest access.insert controls permission, giving context on when it can be used. It also suggests specific use cases, though it lacks explicit when-not-to-use guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

data_list_collections: List Site Data collections
Grade A · Read-only

List collections declared in an owned site's .shiply/data.json with current record counts. Empty list means the site has no manifest yet — scaffold one with shiply data init.
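A sketch of acting on this result, assuming (hypothetically) that each entry is an object with name and count keys. The output schema only declares a single result field, so check the real shape before relying on these keys. The empty-list branch follows the description directly.

```python
from typing import Any, Dict, List


def summarize_collections(collections: List[Dict[str, Any]]) -> str:
    """Turn a data_list_collections result into a short status message.

    Assumes each entry looks like {"name": ..., "count": ...}. That shape is
    hypothetical, since the output schema only declares a `result` field.
    """
    if not collections:
        # Empty list: the site has no .shiply/data.json manifest yet.
        return "No collections yet: scaffold a manifest with `shiply data init`."
    return "\n".join(f"{c['name']}: {c['count']} records" for c in collections)


print(summarize_collections([]))
print(summarize_collections([{"name": "waitlist", "count": 42}]))
```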

Parameters
Name | Required | Description
slug | Yes | owned site slug

Output Schema
Name | Required
result | Yes
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true. The description adds valuable behavioral context: it returns record counts and explains behavior for missing manifests ('Empty list means... scaffold one'), though it doesn't detail authorization beyond 'owned site'.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two concise sentences, no fluff. First sentence front-loads the primary purpose; second adds edge-case guidance. Every sentence earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With an output schema present, the description need not detail return format. It covers the empty-list case and suggests scaffolding. Could mention prerequisites (site ownership) but overall adequate for a simple listing tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% for the only parameter 'slug'. The description does not add additional semantics beyond the schema's 'owned site slug'. Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states that the tool lists collections with current record counts, using the specific verb 'List' and the resource 'collections'. Its listing nature distinguishes it from sibling tools like data_insert or data_query.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides context on when the result is empty and suggests a follow-up action, but it does not explicitly state when to use this tool versus alternatives such as data_export_collection or data_query.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

data_query: Query records from a collection
Grade A · Read-only

Page records from an owned site's collection, newest-first. limit ≤ 200 (default 50). cursor from a previous response's nextCursor.
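The cursor contract described here is the usual cursor-pagination loop. Request a page, emit its records, and repeat with nextCursor until none is returned. A minimal sketch, with these assumptions:

- call_tool(name, arguments) is a hypothetical client.
- Each page's records sit under a records key.
- nextCursor is absent or null on the last page.

Only nextCursor itself is named by the tool. A stand-in server is included so the loop can be exercised locally.

```python
from typing import Any, Callable, Dict, Iterator, Optional

ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def iter_records(
    call_tool: ToolCaller, slug: str, collection: str, page_size: int = 50
) -> Iterator[Dict[str, Any]]:
    """Yield every record in a collection, newest first, one data_query page at a time."""
    if not 1 <= page_size <= 200:
        raise ValueError("page_size must be between 1 and 200")
    cursor: Optional[str] = None
    while True:
        args: Dict[str, Any] = {"slug": slug, "collection": collection, "limit": page_size}
        if cursor is not None:
            args["cursor"] = cursor  # resume where the previous page stopped
        page = call_tool("data_query", args)
        yield from page.get("records", [])  # the "records" key is an assumption
        cursor = page.get("nextCursor")
        if not cursor:  # no cursor means this was the last page
            return


# Stand-in server: 120 records with descending ids, i.e. newest-first.
_DATA = [{"id": i} for i in range(120, 0, -1)]
calls = 0


def fake_call_tool(name: str, args: Dict[str, Any]) -> Dict[str, Any]:
    global calls
    calls += 1
    start = int(args.get("cursor", "0"))
    end = start + args["limit"]
    page: Dict[str, Any] = {"records": _DATA[start:end]}
    if end < len(_DATA):
        page["nextCursor"] = str(end)
    return page


records = list(iter_records(fake_call_tool, "my-site", "waitlist"))
```

Because the helper is a generator, an agent that only needs the newest few hundred records can stop iterating early without fetching the remaining pages.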

Parameters
Name | Required | Description
slug | Yes | owned site slug
limit | No | max records to return, ≤200 (default 50)
cursor | No | nextCursor from a previous response's page
collection | Yes | collection name from data_list_collections

Output Schema
Name | Required
result | Yes
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Adds behavioral details beyond readOnlyHint: newest-first ordering, pagination with cursor, and limit constraints. No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three terse sentences with no wasted words. Essential information is front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers all necessary aspects: what it does, how to paginate, limits, and ordering. Output schema exists, so return values are handled.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Adds meaning beyond the schema by explaining 'newest-first' ordering and cursor usage, though the schema already describes each parameter.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Page records from an owned site's collection, newest-first', a specific verb+resource pattern that distinguishes it from sibling tools like data_insert and data_export_collection.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Implies usage via pagination details (limit, cursor) and ordering, but does not explicitly state when to use vs. alternatives. However, context makes it clear it's for querying.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

delete_listing: Unpublish a listing (set to draft)
Grade A · Destructive

Take a listing off the public marketplace by moving it to status='draft'. Marketplace v1 keeps the row so analytics + future re-listing work — there's no hard delete. Sold listings can't be modified. Use to stop accepting offers without losing pricing history.

Parameters
Name | Required | Description
siteSlug | Yes | slug of the listed site to unpublish

Output Schema
Name | Required
result | Yes
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate destructiveHint=true, but the description adds valuable nuance: it's not a hard delete; the row and pricing history are preserved. This clarifies the actual impact and sets accurate expectations beyond the annotation.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is four sentences with no wasted words. Each adds distinct value: the action, the behavioral nuance (no hard delete), a precondition (sold listings can't be modified), and a use case. It is front-loaded with the key action and immediately clarifies important distinctions.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description covers purpose, behavior, and usage context. Given the presence of an output schema, return values are covered. Minor gap: it doesn't mention prerequisites like ownership of the listing, but this is non-critical and the overall completeness is high for a simple tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema has 100% coverage for the single parameter 'siteSlug', which is already well-described. The description does not add additional detail about the parameter, but given full schema coverage, the baseline of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's function: moving a listing to 'draft' status to unpublish it. It distinguishes from hard deletion by explaining that the row is kept for analytics and re-listing. This avoids confusion with sibling tools like 'delete_site' or 'delete_variable' that perform actual deletion.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly says to use this tool 'to stop accepting offers without losing pricing history.' It also notes that 'sold listings can't be modified,' implying a precondition. While it doesn't mention alternatives like 'update_listing' for other changes, the guidance is clear and sufficient for the intended use case.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

delete_site: Delete a site
Grade A · Destructive

PERMANENTLY delete a site and all stored files. Irreversible — confirm with the user first.
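One way to enforce "confirm with the user first" is to put the confirmation in the calling code rather than relying on the agent to remember it. The sketch below only calls delete_site after the user retypes the slug exactly. The retype-to-confirm pattern, the ask_user callback, and call_tool are illustrative assumptions, not something the server requires.

```python
from typing import Any, Callable, Dict

# Hypothetical client signature: call_tool(tool_name, arguments) -> result.
ToolCaller = Callable[[str, Dict[str, Any]], Any]


def delete_site_with_confirmation(
    call_tool: ToolCaller, slug: str, ask_user: Callable[[str], str]
) -> bool:
    """Call delete_site only if the user retypes the slug exactly.

    Returns True if the delete was requested, False if the user declined.
    """
    answer = ask_user(
        f"This PERMANENTLY deletes site '{slug}' and all stored files. "
        "Type the slug to confirm: "
    )
    if answer.strip() != slug:
        return False  # anything other than an exact match counts as "no"
    call_tool("delete_site", {"slug": slug})
    return True


# Usage with stand-ins: the first user answers vaguely, the second confirms.
sent = []


def record_call(name: str, arguments: Dict[str, Any]) -> None:
    sent.append((name, arguments))


declined = delete_site_with_confirmation(record_call, "my-site", lambda prompt: "yes")
confirmed = delete_site_with_confirmation(record_call, "my-site", lambda prompt: "my-site")
```

Treating anything other than an exact match as a refusal keeps an ambiguous reply such as "yes" from triggering an irreversible delete.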

Parameters
Name | Required | Description
slug | Yes | site slug to delete

Output Schema
Name | Required
result | Yes
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

While annotations already flag destructiveHint as true, the description adds crucial context: the deletion is permanent, includes all stored files, and is irreversible. This reveals behavioral details beyond the annotation flag.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is exceptionally concise with two sentences. It front-loads the most critical information (permanent deletion) and wastes no words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple tool with one parameter and existing output schema, the description covers all essential aspects: the action, its consequences, and a usage caveat. No gaps remain.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% and the only parameter 'slug' has a clear description. The tool description adds nothing beyond the schema, so baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states that the tool will 'PERMANENTLY delete a site and all stored files', using strong language to signal severity and irreversibility. Specifying scope and permanence distinguishes it from sibling tools like archive_project and other delete operations.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides a usage guideline to 'confirm with the user first' due to irreversibility, but lacks explicit when-not-to-use or alternative mentions (e.g., archiving instead). The guideline is clear but minimal.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

delete_variable: Delete a variable
Grade B · Destructive

Remove one variable by name.

Parameters
Name | Required | Description
name | Yes | variable name to delete

Output Schema
Name | Required
result | Yes
Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The annotation already declares destructiveHint=true. The description only repeats the destructive action ('Remove') without adding additional behavioral context such as side effects or error handling.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single concise sentence that gets the point across without unnecessary verbosity. For a simple tool, this is efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

While the tool is simple and has an output schema, the description does not address possible error cases (e.g., variable not found). It provides basic information but lacks completeness for edge cases.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%: the 'name' parameter is described as 'variable name to delete', and the tool description adds little beyond that. Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states that it removes a variable by name, matching the tool's name and title. This distinguishes it from sibling tools like set_variable and list_variables.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description does not provide explicit guidance on when to use this tool vs alternatives. However, the context of deleting a variable is clear from the name and description.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

deploy_function: Deploy a Worker function to a site
Grade A · Idempotent

Deploy a Worker function to a site. The function runs on every request to <slug>.shiply.now and can receive webhooks, run on cron triggers, and access bindings (D1, secrets, env vars). Requires Developer plan. Use when the user wants webhook receivers, cron jobs, or a backend for their site.

Parameters
Name | Required | Description
lang | No | source language (default js)
slug | Yes | site slug to deploy the function to
crons | No | cron triggers to register, ≤20
source | Yes | the Worker source code

Output Schema
Name | Required
result | Yes
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Describes runtime behavior (every request, webhooks, cron, bindings) but does not clarify overwrite semantics, e.g. that deploying updates an existing function. The idempotentHint annotation suggests repeat deploys are safe, but the description does not spell this out.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four sentences, front-loaded with purpose, with no redundant information. Efficient and clear.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given that an output schema exists, the description sufficiently covers behavior, use cases, and requirements. It lacks explicit mention of return values or error conditions, but these are likely covered by the output schema.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

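The schema covers all parameters fully. The description adds some context for slug (the function runs on <slug>.shiply.now) and for crons (what cron triggers are used for). The ≤20 cap on crons comes from the schema, not the description. Neither addition significantly enhances understanding beyond the schema.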

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states 'Deploy a Worker function to a site' with a specific verb and resource, which distinguishes it from sibling tools like remove_function, get_function, and set_cron.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit use cases ('webhook receivers, cron jobs, or a backend') and a requirement ('Developer plan'). Lacks explicit alternatives or when-not-to-use scenarios.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
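A sketch of assembling deploy_function arguments within the schema's limits: at most 20 cron triggers, and lang omitted so the documented default (js) applies. The Worker source assumes Cloudflare-style module syntax with fetch and scheduled handlers. The mention of D1 bindings and cron triggers suggests this, but the description does not confirm it. build_deploy_args, the slug, and the cron expression are hypothetical.

```python
from typing import Any, Dict, List, Optional

MAX_CRONS = 20  # cap from the schema: "cron triggers to register, ≤20"

# Assumed Cloudflare-style module Worker with an HTTP handler and a cron handler.
WORKER_SOURCE = """\
export default {
  async fetch(request, env) {
    return new Response("ok");
  },
  async scheduled(event, env, ctx) {
    // cron-triggered work goes here
  },
};
"""


def build_deploy_args(
    slug: str,
    source: str,
    crons: Optional[List[str]] = None,
    lang: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble deploy_function arguments, enforcing the documented cron cap."""
    args: Dict[str, Any] = {"slug": slug, "source": source}
    if crons:
        if len(crons) > MAX_CRONS:
            raise ValueError(f"at most {MAX_CRONS} cron triggers, got {len(crons)}")
        args["crons"] = list(crons)
    if lang is not None:
        args["lang"] = lang  # otherwise omitted, so the server default (js) applies
    return args


# Hypothetical slug and a standard five-field cron expression (every 15 minutes).
args = build_deploy_args("my-site", WORKER_SOURCE, crons=["*/15 * * * *"])
```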

drive_delete_file: Delete a drive file
Grade A · Destructive

Delete a file from a Drive.

Parameters
Name | Required | Description
path | Yes | path of the file to delete
driveId | Yes | drive id (drv_…) or "default"

Output Schema
Name | Required
result | Yes
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The destructiveHint annotation already flags the operation as destructive. The description adds no further behavioral context (e.g., permanence, trash vs. permanent delete, auth requirements). With annotations present, this is minimally adequate.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, concise sentence with no wasted words. It is appropriately front-loaded and efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity (2 params, with annotations and output schema), the description covers the basic function but lacks details on outcomes or error states. It is complete enough for a straightforward tool but could be richer.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the description adds no additional meaning beyond the schema. Baseline score of 3 applies.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description 'Delete a file from a Drive.' clearly states the action and resource, distinguishing it from sibling delete tools like delete_listing or delete_site by specifying 'Drive' and 'file'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives. It does not explain prerequisites, consequences, or differentiate from other delete tools (e.g., if a drive deletion tool exists elsewhere).

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

drive_list_files: List drive files
Grade A · Read-only

List files in a Drive (driveId = drv_…, or "default"). Optional prefix filter.

Parameters
Name | Required | Description
prefix | No | only list files under this path prefix
driveId | Yes | drive id (drv_…) or "default"

Output Schema
Name | Required
result | Yes
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds minimal behavioral context beyond the annotations (readOnlyHint: true). It mentions listing with optional filter but does not disclose pagination, sorting, or other expected behaviors.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two short sentences, front-loaded and efficient. Every word contributes meaning without redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the availability of an output schema and the tool's simplicity (2 parameters), the description adequately covers the core functionality. However, it omits potential details like pagination or error cases, which could be helpful for a complete understanding.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description adds some value by restating parameter formats (driveId as drv_… or 'default') and prefix filter as optional. However, since the schema already covers these with 100% description coverage, the addition is marginal, meeting the baseline of 3.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool lists files in a Drive, specifying the driveId format and optional prefix filter. This distinguishes it from sibling tools like drive_delete_file and drive_put_file.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description does not explicitly provide when-to-use or when-not-to-use guidance compared to alternatives. The purpose is clear but lacks contextual usage advice.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

drive_put_file: Write a drive file (grade A)
Idempotent

Write a file into a Drive (driveId = drv_… or "default"). content is utf8 or base64. Use for agent memory, notes, context, assets.

Parameters (JSON Schema)
path (required): destination path inside the drive, e.g. notes/context.md
content (required): file contents (utf8 text, or base64 when encoding=base64)
driveId (required): drive id (drv_…) or "default"
encoding (optional): default utf8; base64 for binary

Output Schema
result (required)

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate idempotentHint=true, which the description does not contradict. However, the description does not clarify overwrite behavior (e.g., whether writing to an existing path replaces the file or appends) or mention required permissions, leaving some behavioral ambiguity beyond what annotations provide.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three succinct sentences: the first defines the core action, the second the encoding, and the third the intended uses. Every sentence earns its place with no redundancy.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the output schema and full parameter coverage, the description is sufficient for a 4-parameter tool. It covers encoding, driveId format, and use cases, leaving only minor gaps such as overwrite behavior.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the schema already documents all parameters. The description adds the 'utf8 or base64' clarification and driveId format, but these are also present in the schema descriptions, providing minimal additional value.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'Write' and the resource 'file into a Drive', and differentiates from sibling tools like drive_list_files and drive_delete_file by specifying the write action and use cases.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description lists clear use cases ('agent memory, notes, context, assets') and explains the encoding and driveId format. It does not say when not to use this tool or when to prefer an alternative, though the write-versus-read/list distinction is clear from context.
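
For illustration, a minimal Python sketch of how an MCP client might assemble a drive_put_file call, assuming the standard MCP JSON-RPC `tools/call` envelope. The helper name `drive_put_file_request` and the example paths are hypothetical; text is sent as utf8 (the default) and raw bytes are base64-encoded with `encoding` set to "base64", as the parameter table describes. The transport (this server uses Streamable HTTP) is left out.

```python
import base64
import json
from typing import Union


def drive_put_file_request(request_id: int, drive_id: str, path: str,
                           data: Union[str, bytes]) -> dict:
    """Build a JSON-RPC 2.0 `tools/call` request for drive_put_file (hypothetical helper)."""
    arguments = {"driveId": drive_id, "path": path}
    if isinstance(data, str):
        # utf8 is the documented default, so `encoding` is left out.
        arguments["content"] = data
    else:
        # Binary payloads are base64-encoded and flagged explicitly.
        arguments["content"] = base64.b64encode(data).decode("ascii")
        arguments["encoding"] = "base64"
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "drive_put_file", "arguments": arguments},
    }


note = drive_put_file_request(1, "default", "notes/context.md", "# Context\n")
blob = drive_put_file_request(2, "default", "assets/icon.bin", b"\x00\x01\xff")
print(json.dumps(blob["params"]["arguments"]))
```

Because the tool is annotated idempotent, resending the same request should leave the drive in the same state; whether a write to an existing path replaces or appends remains undocumented.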

duplicate_site: Duplicate a site (grade A)

Server-side copy of an owned site under a new slug — instantly live. Copies files + title; does NOT copy access settings, domains, or data. Great for iterating on variants.

Parameters (JSON Schema)
slug (required): site slug to copy
title (optional): display title for the new copy (defaults to source title)

Output Schema
result (required)

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description makes clear that each call creates a new, separate copy (consistent with idempotentHint=false) and explicitly states what is not copied (access settings, domains, data). It could state ownership requirements, although 'owned site' sufficiently implies them.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three short sentences that front-load the key action and result; every word adds value, with no wasted content.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the high schema coverage and existence of an output schema, the description is complete enough. It explains the scope (what is copied/not copied) and use case, though it could mention potential limits or prerequisites (e.g., site ownership).

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, and the description adds little beyond the parameter names and types. Its mention of 'title' mirrors the schema; it refers to a 'new slug' but does not say how that slug is chosen, since the schema has no parameter for it. A baseline score of 3 is appropriate.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it performs a server-side copy of a site under a new slug, specifies what is copied (files + title) and what is not (access, domains, data), and suggests use for iterating on variants, distinguishing it from sibling tools like create_site or delete_site.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The phrase 'Great for iterating on variants' implies when to use it. It provides clear context but does not explicitly list alternatives or conditions when not to use it, which is acceptable given the clear purpose.
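
A minimal sketch of building duplicate_site arguments, assuming the caller wants the documented default whenever no title is given: `title` is sent only when set, so the server falls back to the source title. The helper name is hypothetical, and the `NOT_COPIED` reminder list is taken directly from the tool description.

```python
from typing import Optional

# Per the tool description, these are NOT carried over to the copy.
NOT_COPIED = ("access settings", "domains", "data")


def duplicate_site_arguments(slug: str, title: Optional[str] = None) -> dict:
    """Arguments for duplicate_site (hypothetical helper).

    `title` is omitted unless given, leaving the server to reuse the source title.
    """
    args = {"slug": slug}
    if title is not None:
        args["title"] = title
    return args


print(duplicate_site_arguments("landing"))
print(duplicate_site_arguments("landing", "Landing, variant B"))
print("Re-apply on the copy if needed:", ", ".join(NOT_COPIED))
```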

export_account: Export account data (grade A)
Read-only

Return a JSON bundle of the user's profile, sites, Site Data, drives, and metadata (secrets excluded). Data portability.

Parameters (JSON Schema)
No parameters

Output Schema
result (required)

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations declare readOnlyHint=true, and the description adds valuable context: the tool returns a JSON bundle of multiple components, excludes secrets, and serves data portability. No contradictions.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences with no redundancy. The first lists contents, the second conveys purpose. Every word earns its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no parameters and the presence of an output schema, the description sufficiently covers what the tool does and excludes. However, it omits potential limitations (e.g., size caps, performance).

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has zero parameters and 100% coverage, so the description does not need to add parameter details. Baseline 4 for no parameters is appropriate.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description uses a specific verb ('Return') and resource ('JSON bundle of the user's profile, sites, Site Data, drives, and metadata'), clearly distinguishing it from sibling tools that handle individual entities. It also notes explicit exclusions ('secrets excluded').

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies use for full account data export via 'Data portability' but does not explicitly state when to use this tool over alternatives (e.g., list_sites, list_drives) or provide exclusions.

feature_site: Feature a site on Explore (grade A)
Idempotent

Toggle whether an owned, public site appears in the public shiply Explore gallery (https://shiply.now/explore). Only public-access sites are eligible.

Parameters (JSON Schema)
show (required): true to feature on Explore, false to remove
slug (required): owned, public site slug

Output Schema
result (required)

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare idempotentHint=true; beyond that, the description adds only the eligibility constraint and no further behavioral traits. It does not contradict the annotations.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences with no filler, front-loading the action and resource. Every word adds value.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the simple toggle action, high schema coverage, and existence of an output schema, the description is complete. It covers eligibility, parameters, and purpose adequately.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, with parameter descriptions. The description adds the toggle context, but does not significantly enhance understanding beyond the schema.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool toggles a site's visibility on the Explore gallery, specifying the resource (owned public site) and context. It distinguishes from sibling tools like list_sites, verify_site, and delete_site.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description includes an eligibility condition ('only public-access sites are eligible') and implies when to use (to feature/unfeature). However, it does not explicitly state when not to use or provide alternatives.
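
A minimal sketch of a client-side precheck before calling feature_site, mirroring the documented rule that only owned, public sites are eligible. The helper name and its `owned`/`access` inputs are hypothetical stand-ins for whatever site metadata an agent already holds; they are not part of the tool's schema.

```python
def feature_site_arguments(slug: str, owned: bool, access: str, show: bool) -> dict:
    """Build feature_site arguments after a local eligibility check (hypothetical helper).

    `owned` and `access` are assumed inputs, not fields defined by the tool.
    """
    if not owned or access != "public":
        raise ValueError(f"{slug!r} must be an owned, public site to toggle on Explore")
    return {"slug": slug, "show": show}


print(feature_site_arguments("portfolio", owned=True, access="public", show=True))
try:
    feature_site_arguments("drafts", owned=True, access="private", show=True)
except ValueError as err:
    print("skipped:", err)
```

Because `show` is an explicit boolean rather than a flip, and the tool is annotated idempotent, repeating a call with the same value should be harmless.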

forward_thread: Forward an inbox message to a new recipient (grade A)

Forward a message from a thread (default: the most recent message) to a NEW recipient with an optional intro note. Body is composed as note + standard '---------- Forwarded message ----------' quote of the original. Goes from the thread's existing shiply alias so replies still route through the inbox. Subject defaults to 'Fwd: <original>'.

Parameters (JSON Schema)
to (required): new recipient email address
note (optional): intro note prepended above the forwarded quote
subject (optional): subject; defaults to 'Fwd: <original>'
threadId (required): thread id from list_inbox to forward from
messageId (optional): specific message to forward (default: most recent)

Output Schema
result (required)

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description explains body composition (note + forward quote), sender alias behavior, and the subject default, adding context beyond the annotations with no contradiction. It could state the main side effect more directly: the call sends an email to an external recipient.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four concise sentences that front-load the core purpose and then provide essential details. No wasted words.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 5 parameters, 2 required, full schema coverage, and existing output schema, the description covers all critical behavior (body construction, alias, defaults). Complete for the tool's complexity.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, and the description adds meaning by clarifying how 'note' is used, default behavior for 'messageId', and default subject. It enriches parameter understanding beyond schema.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The title and description clearly state the tool forwards a message from a thread to a new recipient, with specific defaults (most recent message). It distinguishes from siblings like reply_to_thread and send_email by clarifying the forward action and behavior.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description indicates when to use (forward to new recipient with optional note) and implies not for replying directly, but does not explicitly list alternatives or when-not usage. Still, the context is clear.
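
The documented body layout (intro note, then the standard forwarded-message marker, then the original) and the subject default can be sketched as follows. The exact whitespace is an assumption: the server may add original headers or blank lines that the description does not mention, and `compose_forward` is a hypothetical name.

```python
from typing import Optional, Tuple

FORWARD_MARKER = "---------- Forwarded message ----------"


def compose_forward(original_subject: str, original_body: str,
                    note: Optional[str] = None,
                    subject: Optional[str] = None) -> Tuple[str, str]:
    """Return (subject, body) for a forward, mirroring the documented defaults."""
    lines = []
    if note:
        lines.extend([note, ""])  # intro note, then a blank separator line
    lines.extend([FORWARD_MARKER, original_body])
    final_subject = subject if subject is not None else f"Fwd: {original_subject}"
    return final_subject, "\n".join(lines)


subj, body = compose_forward("Invoice 42", "Please pay by Friday.", note="FYI, see below.")
print(subj)
print(body)
```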

get_account_status: Account plan + capability matrix (grade A)
Read-only

Get the signed-in account's plan, capabilities, and upgrade URL. Call this FIRST when figuring out what features you have access to — it tells you exactly what's available and what's blocked. The upgrade_url is human-clickable; show it in chat when a feature requires a higher plan. Returns plan id + name + subscription status, hard limits (sites, databases, custom domains, drives), and a capability matrix listing every gated feature (workers_lite, databases_neon_postgres, custom_domains, etc.) with whether you have access and the minimum plan needed.

Parameters (JSON Schema)
No parameters

Output Schema
result (required)

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations declare readOnlyHint=true, and description adds valuable context: returns upgrade URL (human-clickable), hard limits, and capability matrix with access status. No contradiction; description enriches understanding of what the read operation provides.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four sentences, front-loaded with the action and usage advice. No filler words. Efficiently covers purpose, return structure, and how to act on the output.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no parameters and presence of output schema, the description thoroughly explains return fields, examples, and usage context (e.g., show upgrade URL in chat). Covers all necessary details for an agent to invoke correctly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

No parameters exist; schema description coverage is 100% by default. Description focuses on output structure, which is appropriate. Baseline 4 for zero-parameter tools.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states it retrieves signed-in account's plan, capabilities, and upgrade URL with specific output fields. Verb 'Get' and resource 'plan + capability matrix' are precise. Distinguishes from sibling tools by positioning as a first-step diagnostic.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly advises calling FIRST when checking feature access. Gives specific guidance on upgrade URL usage. Lacks explicit when-not-to-use, but context implies it's for read-only checks before other actions.
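
A minimal sketch of how an agent might act on the capability matrix: check a gated feature and, if it is blocked, produce a message that surfaces the upgrade URL in chat as the description advises. The response keys (`capabilities`, `has_access`, `min_plan`, `upgrade_url`), the plan names, and the URL in the sample are hypothetical placeholders; only the feature names come from the description.

```python
from typing import Optional


def upgrade_hint(status: dict, feature: str) -> Optional[str]:
    """Return None if `feature` is available, else a chat-ready upgrade message.

    Assumed shape: {"upgrade_url": str, "capabilities": {name: {"has_access": bool,
    "min_plan": str}}}. These keys are hypothetical; check the real output schema.
    """
    cap = status["capabilities"].get(feature)
    if cap is None:
        return f"{feature!r} is not listed in the capability matrix."
    if cap["has_access"]:
        return None
    return (f"{feature} needs the {cap['min_plan']} plan or higher. "
            f"Upgrade: {status['upgrade_url']}")


sample = {
    "upgrade_url": "https://example.com/upgrade",  # placeholder, not the real URL
    "capabilities": {
        "workers_lite": {"has_access": True, "min_plan": "free"},
        "custom_domains": {"has_access": False, "min_plan": "pro"},
    },
}
print(upgrade_hint(sample, "workers_lite"))
print(upgrade_hint(sample, "custom_domains"))
```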

get_analytics: Site analytics (grade A)
Read-only

Daily page views per site for the last 30 days.

Parameters (JSON Schema)
No parameters

Output Schema
result (required)

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, so the tool is safe to call. The description adds that it returns daily page views for the last 30 days, but does not elaborate on data format or aggregation details. With annotations covering safety, the description adds moderate behavioral context.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence, no wasted words. Perfectly concise and front-loaded with the key information.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description covers the core purpose. An output schema exists, but it lists only a single `result` field, so the return format is not visible here. For a simple, parameterless read-only tool, the description is nearly complete. It could clarify scope (all sites, or only the current user's sites?), but it is adequate.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

No parameters exist, and schema coverage is 100% (vacuously). Baseline for 0 parameters is 4; the description does not need to add parameter info and does not detract.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description specifies exactly what the tool does: retrieve daily page views per site for the last 30 days. The verb 'get' and resource 'analytics' are clear, and it distinguishes itself from sibling tools like 'get_site' or 'list_sites' by focusing on analytics data.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for retrieving recent site analytics but provides no explicit guidance on when to use this tool vs. alternatives (e.g., which sibling tools might also return site data). No 'when not to use' or alternative suggestions.
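
Since the tool returns daily page views per site over 30 days, a typical follow-up is collapsing the daily series into per-site totals. The sketch below assumes each row looks like `{"slug", "date", "views"}`; that row shape and the sample data are made up for illustration and are not taken from the tool's output schema.

```python
from collections import defaultdict
from typing import Dict, Iterable


def totals_per_site(rows: Iterable[dict]) -> Dict[str, int]:
    """Sum daily page views into a 30-day total per site.

    Assumes rows like {"slug": str, "date": "YYYY-MM-DD", "views": int};
    this shape is hypothetical.
    """
    totals: Dict[str, int] = defaultdict(int)
    for row in rows:
        totals[row["slug"]] += row["views"]
    return dict(totals)


rows = [  # made-up sample data
    {"slug": "blog", "date": "2024-05-01", "views": 12},
    {"slug": "blog", "date": "2024-05-02", "views": 8},
    {"slug": "shop", "date": "2024-05-01", "views": 5},
]
totals = totals_per_site(rows)
print(max(totals, key=totals.get), totals)
```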

get_connect_status: Stripe Connect onboarding status (grade A)
Read-only

Return the seller's Stripe Connect state: not_started | in_progress | pending_verification | ready | disabled. When status != 'ready' the user can't list sites. Includes a one-shot onboardingUrl (if not_started or in_progress) and dashboardUrl (if ready). Refreshes from Stripe on every call.

Parameters (JSON Schema)
No parameters

Output Schema
result (required)

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate readOnlyHint=true, consistent with read nature. Description adds that it refreshes from Stripe on every call, which is important behavioral info beyond the annotation.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

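Four short sentences: the first lists the possible states, the second explains what a non-ready status blocks, the third covers the included URLs, and the fourth notes the refresh behavior. Extremely concise, with no wasted words.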

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Although an output schema exists, the description also covers the statuses, their implications, and the included URLs. It could mention URL expiration, but overall it is complete for a simple status check.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

No parameters in schema (0 params), so baseline is 4. Description adds meaning by explaining what the tool returns (status enum and URLs), which is useful beyond the schema.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it returns the seller's Stripe Connect state with explicit list of statuses. It distinguishes from sibling tools by focusing on Stripe Connect onboarding status.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides context on when to use: checking if user can list sites. Includes one-shot URLs for onboarding/dashboard. Does not explicitly mention when not to use or alternatives, but the purpose is well-scoped.
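
A minimal sketch of turning a get_connect_status result into a next step for the user, following the documented states and URL rules (onboardingUrl for not_started or in_progress, dashboardUrl for ready). The key names follow the tool description, but the flat dict shape and the helper name are assumptions.

```python
STATUSES = ("not_started", "in_progress", "pending_verification", "ready", "disabled")


def connect_next_step(result: dict) -> str:
    """Turn a get_connect_status result into a short instruction (hypothetical helper)."""
    status = result["status"]
    if status not in STATUSES:
        raise ValueError(f"unexpected status {status!r}")
    if status == "ready":
        return f"Ready to list sites. Dashboard: {result['dashboardUrl']}"
    if status in ("not_started", "in_progress"):
        # "One-shot" presumably means single-use: call the tool again
        # for a fresh link instead of reusing an old one.
        return f"Finish Stripe onboarding: {result['onboardingUrl']}"
    # pending_verification or disabled: no URL is documented for these states.
    return f"Sites cannot be listed yet (status: {status})."


print(connect_next_step({"status": "pending_verification"}))
```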

get_function: Get the deployed function for a site (grade A)
Read-only

Return the deployed function source + metadata for a site, or null if no function deployed.

Parameters (JSON Schema)
slug (required): site slug whose function to fetch

Output Schema
result (required)

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, consistent with 'Return'. Description adds that it returns source + metadata and null if none, providing useful behavioral detail without contradictions.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence, front-loaded with verb and resource, no wasted words. Efficient and clear.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple read-only retrieval tool with 1 param and output schema, the description is complete: it states what is returned and the null case. No gaps.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema has 1 parameter with description; schema coverage is 100%. The description adds no additional meaning beyond what the schema provides, so baseline 3 is appropriate.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'Return', the resource 'deployed function', the scope 'for a site', and the null case, distinguishing it from siblings like deploy_function and get_function_logs.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool vs alternatives among many siblings. The description implies retrieval of existing functions but lacks context on prerequisites or typical scenarios.

get_function_logs: Get a dashboard link for function logs (grade A)
Read-only

Get a deep-link to Cloudflare's dashboard for live function logs (tail, invocation history). Real logs UI is deferred — this returns the CF dashboard URL.

Parameters (JSON Schema)
slug (required): site slug whose function logs to link to

Output Schema
result (required)

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Description adds context beyond annotations (readOnlyHint=true) by specifying that the tool returns a URL and that the real logs UI is deferred. No contradictions.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two short sentences, front-loaded with purpose, no unnecessary words.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Complete for a simple tool with one parameter and an output schema: it covers purpose and return type (a dashboard URL), and the schema documents the parameter. Minor omissions: URL format and auth requirements.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with a clear description for the single parameter 'slug'. Description does not add additional meaning beyond the schema.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states it returns a deep-link to Cloudflare's dashboard for function logs. Uses specific verb+resource ('Get a deep-link...'), distinguishing it from sibling tools like get_function.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Implied usage: use to get a dashboard link for function logs. No explicit when/when-not guidance or alternatives mentioned.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_project: Get one project (full row), grade A
Read-only

Read one of the dev's projects by id — includes label, status, customer details, intake responses, AI brief, drive folder. Use before update_brief / regenerate_brief.

Parameters (JSON Schema)
Name | Required | Description | Default
id | Yes | project id |

Output Schema

Name | Required | Description
result | Yes |

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true, so the description's main contribution is detailing the returned fields (label, status, etc.). This adds value beyond the annotation without contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences front-loaded with the core action and key details. Every sentence earns its place with no filler.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With an output schema present, the description adequately covers what the tool returns and its typical use. It lacks explicit error-handling notes (e.g., what happens if the id is not found), though the output schema may cover that.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with a clear 'project id' description for the id parameter. The description adds nothing new about the parameter beyond the schema, so baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Read one of the dev's projects by id' and lists what it includes (label, status, customer details, etc.), distinguishing it from siblings like list_projects. It explicitly positions itself as a pre-read for update_brief/regenerate_brief.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly advises 'Use before update_brief / regenerate_brief,' giving clear context. It doesn't explicitly state when not to use, but the read-only nature and sibling names (e.g., list_projects) imply appropriate usage.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
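
The following sketch shows roughly how a client could issue the call recommended above, reading the project before update_brief / regenerate_brief. It serializes an MCP `tools/call` request in the JSON-RPC 2.0 envelope. The project id `proj_123` is a placeholder. Transport details (Streamable HTTP, session headers, auth) are deliberately left out.

```python
import json
from typing import Any


def tools_call_request(request_id: int, name: str, arguments: dict[str, Any]) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(message)


# Read the full project row first; "proj_123" is a made-up id.
print(tools_call_request(1, "get_project", {"id": "proj_123"}))
```

The same helper works for any tool on this page. Only `name` and the `arguments` object change.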

get_site: Site detail, grade A
Read-only

Site settings + version history for one of my sites.

Parameters (JSON Schema)
Name | Required | Description | Default
slug | Yes | site slug, e.g. my-site |

Output Schema

Name | Required | Description
result | Yes |

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations declare readOnlyHint=true, so the read-only nature is already known. The description adds that it returns 'settings + version history', providing context beyond the annotation. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single short sentence that is front-loaded and contains no unnecessary words. Every part is informative.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With only one parameter and an output schema, the description adequately covers the tool's purpose. It could explain what 'version history' entails, but given the output schema, it is sufficient.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% coverage with a description for the slug parameter. The tool description does not add new parameter-level information beyond what the schema already provides, so baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool returns 'Site settings + version history for one of my sites.' Together with the verb 'get' in the tool name, the resource 'site' is specific. This distinguishes it from sibling tools like list_sites (which lists all sites) and other site-specific tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies it's for retrieving details of a single site, but does not explicitly state when to use it versus alternatives (e.g., list_sites for listing, duplicate_site for copying). No exclusions or prerequisites are mentioned.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_test_status: Demand test status (the verdict), grade A
Read-only

ONE consolidated object: page funnel (views, signups, confirmed, conversion) ⊕ email events (delivered/opened/clicked/bounced) ⊕ a computed verdict. The single place to check progress — never query email separately.

Parameters (JSON Schema)
Name | Required | Description | Default
testId | Yes | demand test id from create_test / list_tests |

Output Schema

Name | Required | Description
result | Yes |

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, so the description need not reiterate safety. It adds transparency by detailing what the result includes (page funnel, email events, verdict), and it mentions no side effects.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, front-loaded with key information. Every word adds value; no repetition or fluff.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a single-parameter, read-only tool with output schema, the description sufficiently covers purpose and content. No missing guidance for an agent to invoke it correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The only parameter, testId, is fully described in the input schema, which already notes that the id comes from create_test / list_tests. The tool description adds nothing significant beyond that.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states it returns a consolidated object with page funnel, email events, and a verdict. Distinguishes itself as the single place for test progress, contrasting with querying email separately.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly says to use this for test progress and never query email separately, implying when to use and when not to. Could name a specific alternative sibling, but the guidance is clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_complaints: List complaints + bounces, grade A
Read-only

Return threads tagged as complaints (spam reports) OR bounces (recipient rejected). Read these before any further sending.

Parameters (JSON Schema)

No parameters

Output Schema

Name | Required | Description
result | Yes |

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations provide readOnlyHint=true. Description adds value by specifying the type of data returned (complaints and bounces), which goes beyond the annotation's safety indication.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences with no waste. First sentence defines purpose; second provides usage guidance. Information is front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no parameters and the presence of an output schema, the description covers what the tool does and when to use it. That completes the picture for agent decision-making.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

No parameters exist, so the description has nothing to add. Baseline is 4; no deficiency.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states it returns threads tagged as complaints or bounces, using specific verb 'Return' and resource 'threads tagged as complaints or bounces'. It distinguishes from sibling list tools by specifying the content and adding usage advice.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly advises to read these before any further sending, providing clear context for when to use. Does not explicitly exclude alternatives but the advice is targeted.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
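
The advice to read complaints and bounces before any further sending could be applied as a pre-send filter. This is a minimal sketch under stated assumptions. The list_complaints result is taken to be a list of thread objects. The `email` and `tag` keys are hypothetical field names: the output schema only declares `result`, so they are not confirmed.

```python
from typing import Iterable


def filter_recipients(recipients: Iterable[str], flagged_threads: list) -> list:
    """Drop any recipient who appears in a complaint or bounce thread.

    `flagged_threads` stands in for the list_complaints result; the
    "email" key on each thread is a hypothetical field name.
    """
    # Case-insensitive comparison of whole addresses (a simplification).
    blocked = {t["email"].strip().lower() for t in flagged_threads if "email" in t}
    return [r for r in recipients if r.strip().lower() not in blocked]


# Invented sample data; "tag" is likewise a hypothetical field.
flagged = [
    {"email": "bounced@example.com", "tag": "bounce"},
    {"email": "Complainer@Example.com", "tag": "complaint"},
]
print(filter_recipients(
    ["ok@example.com", "bounced@example.com", "complainer@example.com"], flagged
))
```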

list_crons: List cron triggers for a site, grade A
Read-only

List cron triggers for a site's deployed function. Each cron is (path, schedule, lastRunAt).

Parameters (JSON Schema)
Name | Required | Description | Default
slug | Yes | site slug whose cron triggers to list |

Output Schema

Name | Required | Description
result | Yes |

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint=true, so the description adds value by detailing the output format (each cron includes path, schedule, lastRunAt). No additional behavioral traits like rate limits or permissions are disclosed.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, no wasted words. Front-loaded with the action and resource, followed by output format. Highly concise and well-structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple list tool with one required parameter, the description covers the purpose and output structure. It could mention that the slug must refer to a site with a deployed function, but it is adequately complete given that an output schema exists.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema already documents the single parameter 'slug' with a clear description. The tool description does not add extra meaning beyond what the schema provides, so baseline score applies.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states the tool lists cron triggers for a site's deployed function. Distinguishes from sibling tools like set_cron and remove_cron by specifying the list action and the resource (cron triggers).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Implies usage by describing the resource (cron triggers for a site's deployed function), but does not explicitly state when to use this tool over alternatives like set_cron or remove_cron, nor does it mention prerequisites or exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
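
Each cron is described as a (path, schedule, lastRunAt) tuple. A client might therefore map the result into typed records, for example to spot triggers that have never fired. The sketch rests on three assumptions. First, the result is a list of objects keyed exactly `path`, `schedule`, and `lastRunAt`. Second, `lastRunAt` is null for a trigger that has not run. Third, `schedule` is a cron expression. The sample data is invented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CronTrigger:
    path: str                   # path the trigger targets (assumed meaning)
    schedule: str               # assumed to be a cron expression
    last_run_at: Optional[str]  # assumed ISO-8601 string, None if never run


def parse_crons(result: list) -> list:
    """Map raw list_crons entries (assumed shape) to CronTrigger records."""
    return [CronTrigger(c["path"], c["schedule"], c.get("lastRunAt")) for c in result]


def never_ran(crons: list) -> list:
    """Return the paths of triggers with no recorded run."""
    return [c.path for c in crons if c.last_run_at is None]


sample = [
    {"path": "/api/digest", "schedule": "0 8 * * *", "lastRunAt": "2024-05-01T08:00:00Z"},
    {"path": "/api/cleanup", "schedule": "*/15 * * * *", "lastRunAt": None},
]
print(never_ran(parse_crons(sample)))
```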

list_custom_domains: List custom domains, grade A
Read-only

List registered custom domains grouped with their subdomains, each subdomain's site and status, and the detected provider.

Parameters (JSON Schema)

No parameters

Output Schema

Name | Required | Description
result | Yes |

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, so the description doesn't need to repeat that. It adds value by detailing the grouped output structure (subdomains, site, status, provider), which helps the agent understand what to expect.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence that front-loads the action and resource. Every part is necessary and nothing is wasted.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description covers what the tool returns, which is sufficient for a simple listing tool with no parameters. It doesn't mention pagination or ordering, but that's likely acceptable. An output schema exists, but it only declares a single top-level result field, so its structure is not spelled out.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

There are no parameters, so schema coverage is trivially 100%. The description doesn't need to add parameter info. Baseline for zero parameters is 4.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool lists registered custom domains grouped with subdomains, including each subdomain's site, status, and provider. It uses a specific verb and resource, and distinguishes from sibling tools like list_domains or add_custom_domain.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for listing custom domains but provides no explicit guidance on when to use this tool versus alternatives like list_domains. No when-not or filtering context is given.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_domains: List custom domains, grade B
Read-only

List connected custom domains with status.

Parameters (JSON Schema)

No parameters

Output Schema

Name | Required | Description
result | Yes |

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true. Description adds minimal context ('with status') but does not disclose any other behavioral traits such as authentication requirements or data freshness.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single, front-loaded sentence with no redundancy. Efficient use of prose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity (no params, annotations present, output schema exists), the description is adequate. However, the presence of a highly similar sibling tool is not addressed.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

No parameters exist, and schema description coverage is 100%. Baseline for 0 params is 4; description adds no further semantics, which is acceptable.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it lists connected custom domains with status. However, there is a sibling tool 'list_custom_domains' with no differentiation, causing potential ambiguity.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives like 'list_custom_domains' or other list tools. Lacks explicit context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_drives: List drives, grade A
Read-only

List the user's private cloud Drives (id, name).

Parameters (JSON Schema)

No parameters

Output Schema

Name | Required | Description
result | Yes |

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, so the description's addition of return fields (id, name) provides useful behavioral context beyond safety. No pagination or rate limit issues are noted, but for a zero-parameter list, this is sufficient.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise: one short sentence that front-loads the core purpose with no extraneous words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity (no parameters, output schema exists), the description fully covers the necessary context by stating what is returned. It is complete for the agent to invoke correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

There are no parameters, so the baseline is 4. The description adds no parameter info, but that is expected since none exist.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action 'list' and the resource 'user's private cloud Drives' with return fields 'id, name'. It effectively distinguishes from sibling tools like create_drive or drive_list_files.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage (when you need to enumerate drives) but provides no explicit guidance on when not to use it or alternatives. For a simple list tool, this is adequate but not exemplary.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_inbox: List inbox threads, grade A
Read-only

List the user's email inbox threads (outbound demand-test sends + inbound replies / unsubscribes / complaints / bounces). Filter by tag.

Parameters (JSON Schema)
Name | Required | Description | Default
limit | No | max threads to return, ≤200 |
filter | No | default all |
offset | No | number of threads to skip (pagination) |

Output Schema

Name | Required | Description
result | Yes |

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description is consistent with the readOnlyHint annotation, but it adds no additional behavioral details beyond listing and filtering. No contradictions are present.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence that front-loads the verb and resource, with no extraneous words. Every word contributes to clarity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With an output schema present, the description is adequate for a list tool. It could mention pagination behavior or default ordering, but the parameters cover limits and offsets.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents parameters. The description adds 'Filter by tag', which is slightly misleading because the filter enum uses predefined categories, not tags. However, it adds minimal extra meaning.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'List' and the resource 'email inbox threads', and specifies the types of content included (outbound demand-test sends, inbound replies, unsubscribes, complaints, bounces), which distinguishes it from sibling tools like list_site_inbox or list_complaints.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for viewing the inbox but does not provide explicit guidance on when to use this tool versus alternatives (e.g., list_complaints, list_unsubscribes) or mention any prerequisites or exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
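
The `limit` (at most 200) and `offset` parameters suggest an offset-based pagination loop over the inbox. The sketch below assumes a hypothetical `fetch_page(limit, offset)` callable that wraps a list_inbox call and returns that page's threads as a list. It treats a short page as the end of the data. The real result may expose its own paging metadata, which would be a better stopping signal.

```python
from typing import Callable, Iterator

MAX_LIMIT = 200  # list_inbox caps `limit` at 200


def iter_inbox(
    fetch_page: Callable[[int, int], list],
    page_size: int = MAX_LIMIT,
) -> Iterator[dict]:
    """Yield every inbox thread by walking list_inbox with limit/offset.

    `fetch_page(limit, offset)` is a hypothetical wrapper around a
    list_inbox tools/call that returns one page of threads as a list.
    """
    limit = max(1, min(page_size, MAX_LIMIT))  # clamp to the documented cap
    offset = 0
    while True:
        page = fetch_page(limit, offset)
        yield from page
        if len(page) < limit:  # a short (or empty) page ends the walk
            return
        offset += len(page)


# In-memory stand-in for the server, holding 450 threads.
ALL_THREADS = [{"id": i} for i in range(450)]


def fake_fetch(limit: int, offset: int) -> list:
    return ALL_THREADS[offset:offset + limit]


print(sum(1 for _ in iter_inbox(fake_fetch)))  # 450
```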

list_listings: List my marketplace listings, grade A
Read-only

Return every marketplace listing the seller owns (any status: draft, live, paused, sold). Includes site slug and current price. Use to see what's for sale across the account.

Parameters (JSON Schema)

No parameters

Output Schema

Name | Required | Description
result | Yes |

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds value beyond the readOnlyHint annotation by specifying the returned data includes 'site slug and current price' and all statuses. It does not contradict annotations and provides sufficient behavioral context for a simple read operation.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences with no wasted words. The first sentence states the core functionality, and the second sentence adds guidance. It is front-loaded and perfectly concise for a zero-parameter tool.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has no parameters and an output schema exists, the description fully covers what the agent needs to know: what it returns (all listings with details) and why to use it. No gaps remain.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has no parameters, so schema description coverage is trivially 100%. The description adds no parameter information because none is needed. With 0 parameters, the baseline expectation is 4, and the description meets it.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description explicitly states it returns every marketplace listing the seller owns across all statuses (draft, live, paused, sold) and includes site slug and current price. This clearly distinguishes it from sibling tools like create_listing, delete_listing, or update_listing, which have different purposes.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description advises using the tool 'to see what's for sale across the account,' providing clear context. While it doesn't explicitly state when not to use it or mention alternatives, the tool's scope is well-defined and easily distinguishable from siblings.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_mailbox_contacts: List a mailbox's contacts, grade A
Read-only

List captured contacts for a (site, collection) mailbox, optionally filtered by status (signed_up/confirmed/unsubscribed). Returns email, status, confirmedAt, createdAt, and captured fields.

Parameters (JSON Schema)
Name | Required | Description | Default
slug | Yes | site slug the mailbox belongs to |
status | No | filter contacts by status |
collection | Yes | mailbox collection name |

Output Schema

Name | Required | Description
result | Yes |

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true, so the description's mention of returned fields (email, status, confirmedAt, createdAt, captured) adds minor context but does not disclose any significant behavioral traits beyond what annotations provide.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two well-structured sentences that convey purpose, optional filtering, and returned fields efficiently without superfluous content.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the presence of an output schema, the description adequately covers core details. However, it lacks information on pagination or ordering, which might be expected for a list tool, but is otherwise complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, and the description adds no parameter meaning beyond repeating the status enum and field names. The baseline score of 3 is appropriate because no additional semantics are provided.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb (list), resource (contacts for a mailbox), and scope (site, collection), with optional filtering. It distinguishes from sibling tools like list_inbox and list_suppressions by specifying contacts and filter options.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage through the context of listing mailbox contacts, but does not explicitly state when to use this tool vs alternatives like list_inbox or list_unsubscribes. No exclusions or conditional guidance is provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
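As a sketch of how an MCP client might invoke list_mailbox_contacts, the snippet below builds a JSON-RPC 2.0 tools/call request. The envelope (method "tools/call" with params.name and params.arguments) follows the MCP specification; the helper name, the request id, and the example slug and collection values are hypothetical, and the status check only mirrors the three values listed in the tool description.

```python
import json

# The three status values named in the tool description.
CONTACT_STATUSES = {"signed_up", "confirmed", "unsubscribed"}


def build_list_contacts_call(request_id, slug, collection, status=None):
    """Build a JSON-RPC 2.0 tools/call request for list_mailbox_contacts.

    Hypothetical helper: slug and collection are required, status is optional.
    """
    arguments = {"slug": slug, "collection": collection}
    if status is not None:
        if status not in CONTACT_STATUSES:
            raise ValueError(f"unknown contact status: {status!r}")
        arguments["status"] = status  # omit the key entirely to list every contact
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "list_mailbox_contacts", "arguments": arguments},
    }


# Example: only confirmed contacts in a hypothetical "waitlist" collection.
print(json.dumps(build_list_contacts_call(1, "my-site", "waitlist", status="confirmed")))
```

Validating the status locally is a design choice: it surfaces a typo before the round trip, at the cost of duplicating the enum the server already enforces.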

list_my_orders: List orders where I am the buyer
Grade: A
Read-only

Return every marketplace order the user PURCHASED. Most recent first. Shows the acquired site, paid amount, and whether the order is still inside the 30-day refund window. Use to recap what the user owns by purchase.

Parameters (JSON Schema)
Name | Required | Description | Default
limit | No | max orders to return (default 100, max 500) |

Output Schema

Parameters (JSON Schema)
Name | Required | Description
result | Yes |

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint: true, so the description's statement of sort order ('Most recent first'), the returned fields (site, amount, refund-window status), and the refund window's scope add meaningful behavioral context beyond the annotation. It does not mention pagination or a cursor; the limit parameter only caps how many orders are returned.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is four short sentences, each serving a purpose: the first states the function, the second gives the ordering, the third lists the fields shown, and the fourth gives the use case. There are no wasted words, and the essential information is front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given that there is an output schema and only one optional parameter, the description adequately covers the tool's behavior. It explains sorting, key fields, and the refund window concept. Minor gap: it does not explain what happens when there are no orders or when an error occurs, but it is sufficient for typical use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage for the single parameter 'limit' is 100%, but the description adds default and maximum values (default 100, max 500) not present in the schema. This provides practical guidance for the agent beyond the schema definition.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description explicitly states 'Return every marketplace order the user PURCHASED.' It uses a specific verb ('return') and clearly identifies the resource (marketplace orders) and scope (orders the user purchased). This distinguishes it from the sibling tool 'list_my_sales', which lists orders for sites the user sold.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description ends with 'Use to recap what the user owns by purchase,' providing a clear usage context. However, it does not explicitly state when not to use it or mention alternatives like list_my_sales. The context of siblings partially compensates.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
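The tool already reports whether each order is inside the 30-day refund window, so an agent does not need to compute it. For illustration only, the check reduces to a date comparison like the one below. It assumes the window runs 30 days from a payment timestamp (the name paid_at is hypothetical) and that timestamps are timezone-aware UTC datetimes.

```python
from datetime import datetime, timedelta, timezone

# Window length taken from the tool description; measuring it from the
# payment time is an assumption.
REFUND_WINDOW = timedelta(days=30)


def inside_refund_window(paid_at, now=None):
    """Return True while the refund window that opened at paid_at is still open."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now < paid_at + REFUND_WINDOW


paid_at = datetime(2024, 1, 1, tzinfo=timezone.utc)  # hypothetical order
print(inside_refund_window(paid_at, now=datetime(2024, 1, 30, tzinfo=timezone.utc)))  # True
print(inside_refund_window(paid_at, now=datetime(2024, 1, 31, tzinfo=timezone.utc)))  # False
```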

list_my_sales: List orders where I am the seller
Grade: A
Read-only

Return every marketplace order for sites the user sold (incl. pending, paid, refunded, failed, disputed). Most recent first. Use to surface revenue + which orders are still inside the 30-day refund window (refundExpiresAt > now).

Parameters (JSON Schema)
Name | Required | Description | Default
limit | No | max orders to return (default 100, max 500) |

Output Schema

Parameters (JSON Schema)
Name | Required | Description
result | Yes |

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true. The description adds valuable details such as inclusion of statuses (pending, paid, refunded, failed, disputed), ordering (most recent first), and a specific field (refundExpiresAt) for the refund window. No contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise: three short sentences that front-load the key information without any extraneous content.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the simplicity (one optional parameter, output schema exists), the description fully covers the tool's behavior, return content, ordering, and a practical use case.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% for the single parameter 'limit', which is adequately described in the schema. The description does not add further parameter-specific details beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it returns marketplace orders for sites the user sold, including various statuses, sorted most recent first. However, it does not explicitly distinguish from the sibling tool 'list_my_orders', which may cause confusion.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides a specific use case ('surface revenue + which orders are still inside the 30-day refund window'), but lacks guidance on when not to use or alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
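The description's refund rule (refundExpiresAt > now) can be applied to the returned orders to surface revenue and refund exposure. The sketch below is a minimal illustration under stated assumptions: only orders with status "paid" count toward revenue and refund exposure, and the id and amountCents field names are hypothetical; only status and refundExpiresAt come from the description.

```python
from datetime import datetime, timezone


def parse_ts(value):
    # datetime.fromisoformat() accepts a trailing "Z" only from Python 3.11,
    # so normalise it to an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def summarize_sales(orders, now):
    """Return (revenue from paid orders, ids of paid orders still refundable)."""
    paid = [o for o in orders if o["status"] == "paid"]
    revenue = sum(o["amountCents"] for o in paid)  # hypothetical field name
    refundable = [o["id"] for o in paid if parse_ts(o["refundExpiresAt"]) > now]
    return revenue, refundable


orders = [  # hypothetical list_my_sales result
    {"id": "o1", "status": "paid", "amountCents": 5000, "refundExpiresAt": "2024-07-01T00:00:00Z"},
    {"id": "o2", "status": "paid", "amountCents": 2500, "refundExpiresAt": "2024-05-01T00:00:00Z"},
    {"id": "o3", "status": "refunded", "amountCents": 9900, "refundExpiresAt": "2024-07-01T00:00:00Z"},
]
print(summarize_sales(orders, now=datetime(2024, 6, 1, tzinfo=timezone.utc)))  # (7500, ['o1'])
```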

list_project_files: List files uploaded to a project
Grade: A
Read-only

Return the customer-uploaded files for one project (path, size, contentType, createdAt). Empty when no drive folder exists yet (no uploads). Use to inspect what intake assets the customer attached.

Parameters (JSON Schema)
Name | Required | Description | Default
id | Yes | project id whose uploaded files to list |

Output Schema

Parameters (JSON Schema)
Name | Required | Description
result | Yes |

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds valuable behavioral detail beyond the readOnlyHint annotation: it specifies the empty return case when no drive folder exists. This is useful context that the annotation alone does not provide.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences with no wasted words: the first states the functionality and returned fields, the second covers the empty case, and the third gives usage guidance. Information is front-loaded and efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple read tool with one parameter, existing output schema, and readOnly annotation, the description covers all essential aspects: return fields, empty case, and usage purpose. No gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description adds minimal meaning beyond the input schema, which already fully describes the 'id' parameter as 'project id whose uploaded files to list.' With 100% schema coverage, the baseline is 3, and the description does not significantly enhance parameter understanding.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it returns customer-uploaded files for a project with specific fields (path, size, contentType, createdAt). This distinguishes it from sibling tools like drive_list_files (which likely lists all files in a drive) and list_projects (which lists projects).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear usage context: 'Use to inspect what intake assets the customer attached.' This tells the agent when to use it, though it does not explicitly mention when not to use it or list alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
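Because an empty result simply means the customer has not uploaded anything yet, a caller can treat it as a normal state rather than an error. The sketch below summarises a result by content type. It uses the path, size, and contentType fields named in the description, but the result's shape as a plain list of dicts and the unit of size (bytes) are assumptions.

```python
from collections import Counter


def summarize_uploads(files):
    """Summarise list_project_files output; an empty list means no uploads yet."""
    if not files:
        return "no uploads yet"
    total_bytes = sum(f["size"] for f in files)  # assumes size is in bytes
    by_type = Counter(f["contentType"] for f in files)
    kinds = ", ".join(f"{count} {ctype}" for ctype, count in sorted(by_type.items()))
    return f"{len(files)} files, {total_bytes} bytes ({kinds})"


print(summarize_uploads([]))  # no uploads yet
print(summarize_uploads([  # hypothetical intake assets
    {"path": "logo.png", "size": 1000, "contentType": "image/png"},
    {"path": "brief.pdf", "size": 2000, "contentType": "application/pdf"},
    {"path": "hero.png", "size": 500, "contentType": "image/png"},
]))  # 3 files, 3500 bytes (1 application/pdf, 2 image/png)
```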

list_projects: List my client-intake projects
Grade: A
Read-only

List the dev's customer-intake projects (newest first). Optional filters: status (draft|intake_open|brief_ready|brief_failed|archived), q (case-insensitive match on label or customer email), limit (default 100). Use to triage what's in flight before opening a specific project.

Parameters (JSON Schema)
Name | Required | Description | Default
q | No | case-insensitive match on label or customer email |
limit | No | max projects to return (default 100) |
status | No | filter by project status |

Output Schema

Parameters (JSON Schema)
Name | Required | Description
result | Yes |

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds behavioral context beyond the readOnlyHint annotation by stating the result ordering (newest first) and the available filters. There is no contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three concise sentences with no wasted words: the first states the purpose and ordering, the second lists the optional filters, and the third provides usage guidance. Efficient and front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity, the presence of an output schema, and no required parameters, the description is complete. It covers purpose, ordering, filters, and a practical use case.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 100% schema coverage, the description still adds value by summarizing the optional filters and noting the default limit (100). It reinforces parameter meaning without relying solely on the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states 'List the dev's customer-intake projects (newest first)' with a specific verb and resource. It distinguishes from sibling tools like 'get_project' by focusing on listing multiple projects owned by the developer.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description advises using this 'to triage what's in flight before opening a specific project', providing clear context. It does not explicitly exclude alternative tools or mention when not to use it, but the use case is well-defined.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
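The server applies these filters itself; the sketch below mirrors the documented semantics client-side only to make them concrete: an optional status from the listed enum, q as a case-insensitive match on label or customer email, newest first, and a default limit of 100. Treating q as a substring match is an assumption, as are the customerEmail and createdAt field names and the use of same-format ISO-8601 UTC strings (which sort chronologically as text).

```python
# Status values listed in the tool description.
PROJECT_STATUSES = {"draft", "intake_open", "brief_ready", "brief_failed", "archived"}


def filter_projects(projects, status=None, q=None, limit=100):
    """Client-side mirror of list_projects' documented filters (illustrative only)."""
    if status is not None and status not in PROJECT_STATUSES:
        raise ValueError(f"unknown project status: {status!r}")
    needle = q.casefold() if q else None
    hits = []
    for p in projects:
        if status is not None and p["status"] != status:
            continue
        # q: case-insensitive substring match on the label or the customer email.
        if needle and not (needle in p["label"].casefold()
                           or needle in p["customerEmail"].casefold()):
            continue
        hits.append(p)
    hits.sort(key=lambda p: p["createdAt"], reverse=True)  # newest first
    return hits[:limit]


projects = [  # hypothetical projects
    {"label": "Acme site", "customerEmail": "jo@acme.com", "status": "intake_open", "createdAt": "2024-03-01T10:00:00Z"},
    {"label": "Bakery", "customerEmail": "sam@bread.io", "status": "brief_ready", "createdAt": "2024-04-01T10:00:00Z"},
    {"label": "ACME blog", "customerEmail": "ann@example.com", "status": "intake_open", "createdAt": "2024-05-01T10:00:00Z"},
]
print([p["label"] for p in filter_projects(projects, q="acme")])  # ['ACME blog', 'Acme site']
```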

list_secrets: List secret names for a site
Grade: A
Read-only

List secret names (values not returned) for a site's deployed function.

Parameters (JSON Schema)
Name | Required | Description | Default
slug | Yes | site slug whose secret names to list |

Output Schema

Parameters (JSON Schema)
Name | Required | Description
result | Yes |

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true; description adds that values are not returned, enhancing transparency without contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence, no extraneous information; efficiently conveys key facts.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Simple tool with one param and output schema present; description adequately covers context for selection and use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% and already describes 'slug'; description adds no new parameter meaning beyond schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly specifies verb 'list', resource 'secret names', and notes 'values not returned'. Distinguished from set_secret and remove_secret among siblings.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Description implicitly indicates usage for listing names only, but no explicit guidance on when to use vs alternatives like set_secret.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
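Because list_secrets returns names and never values, the most an agent can check is presence. A minimal sketch, assuming the result can be reduced to a list of name strings; the secret names shown are hypothetical.

```python
def missing_secrets(listed_names, required_names):
    """Return required secret names absent from list_secrets output, sorted."""
    present = set(listed_names)
    return sorted(name for name in required_names if name not in present)


# Hypothetical names: what the site reports vs. what its function expects.
print(missing_secrets(["STRIPE_KEY", "DB_URL"], ["DB_URL", "OPENAI_API_KEY"]))  # ['OPENAI_API_KEY']
```

A missing name would then be supplied with the sibling tool set_secret.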

list_sending_domains: List my sending domains
Grade: A
Read-only

Return every BYO sending domain the user has added (id, domain, fromAddress, status: pending|verified|failed, DNS records). Use to inspect verification state or find the id of a domain to verify/remove.

Parameters (JSON Schema)

No parameters

Output Schema

Parameters (JSON Schema)
Name | Required | Description
result | Yes |

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, indicating this is a safe read operation. The description adds detail about what fields are returned and the status values but does not disclose additional behavioral traits such as rate limits or data freshness. The score is appropriate as the description adds value but does not go beyond what annotations imply.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences that front-load the main purpose and include key details (fields, statuses, usage). It is concise and informative, and each sentence earns its place without redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given that the tool has no parameters, has annotations, and has an output schema, the description is complete. It covers what the tool does, what it returns, and why to use it, making it fully adequate for this simple list operation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The tool has zero parameters, and schema description coverage is 100%. The description does not need to add parameter semantics since there are none. Baseline for zero parameters is 4.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool returns every BYO sending domain with specific fields (id, domain, fromAddress, status, DNS records) and provides status values. It distinguishes itself from sibling tools like verify_sending_domain and remove_sending_domain by focusing on inspection and ID retrieval.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description includes explicit usage guidance: 'Use to inspect verification state or find the id of a domain to verify/remove.' This tells the agent when to use this tool (inspection) and what to do with the output (find id for subsequent operations). It does not explicitly mention when not to use it, but the context is clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
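The description's suggested use, finding the id of a domain to verify or remove, can be sketched as two lookups over the result. The id, domain, fromAddress, and status fields and the pending, verified, and failed values come from the description; the list-of-dicts shape and the example ids are hypothetical.

```python
def domain_id(domains, name):
    """Return the id of the sending domain matching name (case-insensitive), or None."""
    for d in domains:
        if d["domain"].lower() == name.lower():
            return d["id"]
    return None


def unverified(domains):
    """Domains still pending or failed, i.e. candidates for verify_sending_domain."""
    return [d["domain"] for d in domains if d["status"] in ("pending", "failed")]


domains = [  # hypothetical list_sending_domains result
    {"id": "sd_1", "domain": "mail.example.com", "fromAddress": "hi@mail.example.com", "status": "verified"},
    {"id": "sd_2", "domain": "news.example.org", "fromAddress": "news@news.example.org", "status": "pending"},
]
print(domain_id(domains, "News.Example.org"))  # sd_2
print(unverified(domains))  # ['news.example.org']
```

The returned id would then be passed to the sibling tool verify_sending_domain or remove_sending_domain.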

list_site_inbox: List a site's email inbox threads
Grade: A
Read-only

Read the email threads (received, sent, web captures) for the agent's sites. Optionally scoped to one site by slug.

Parameters (JSON Schema)
Name | Required | Description | Default
slug | No | limit to one site by slug; omit for all sites |

Output Schema

Parameters (JSON Schema)
Name | Required | Description
result | Yes |

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds context beyond the 'readOnlyHint' annotation by specifying the type of data accessed (email threads including received, sent, web captures). This informs the agent about the scope of data, which is valuable despite the annotation already indicating read-only behavior.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences, front-loaded with the action and scope, and efficiently covers the optional parameter in the second sentence. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has an output schema, the description does not need to explain return values. It fully specifies the data source ('email threads for the agent's sites') and optional filtering, making it complete for a read-only list operation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema covers the parameter 'slug' with a clear description, and the tool description reiterates the optional scoping. With 100% schema coverage, the description adds no new parameter semantics, achieving the baseline of 3.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description uses a specific verb ('Read') and resource ('email threads'), and clarifies the scope ('for the agent's sites') and types of threads (received, sent, web captures). It effectively distinguishes this tool from siblings like 'list_inbox' by specifying the context of sites.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description clearly states the tool's purpose and optional scoping by slug, but does not provide explicit guidance on when to use this tool versus alternatives (e.g., 'list_inbox' or 'read_thread'). Usage context is implied but not contrasted with sibling tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
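The schema says to omit slug to read every site's threads, so a client can drop the key entirely instead of sending null (whether the server would also accept null is not stated). When a call spans all sites, grouping by site keeps results readable. A minimal sketch; the siteSlug and subject thread fields are hypothetical.

```python
from collections import defaultdict


def inbox_arguments(slug=None):
    """Arguments for list_site_inbox: pass slug for one site, omit it for all sites."""
    return {} if slug is None else {"slug": slug}


def subjects_by_site(threads):
    """Group thread subjects by site when the call spanned every site."""
    grouped = defaultdict(list)
    for thread in threads:
        grouped[thread["siteSlug"]].append(thread["subject"])  # hypothetical fields
    return dict(grouped)


print(inbox_arguments())           # {}
print(inbox_arguments("my-site"))  # {'slug': 'my-site'}
print(subjects_by_site([  # hypothetical threads
    {"siteSlug": "my-site", "subject": "Question about pricing"},
    {"siteSlug": "other-site", "subject": "Signup confirmation"},
    {"siteSlug": "my-site", "subject": "Re: Question about pricing"},
]))
```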

list_sites: List my sites
Grade: A
Read-only

List the sites owned by this API key.

Parameters (JSON Schema)

No parameters

Output Schema

Parameters (JSON Schema)
Name | Required | Description
result | Yes |

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true. The description adds no further behavioral details such as pagination, rate limits, or limitations beyond listing.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

A single, front-loaded sentence with no waste. Every word is necessary.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the simplicity (no parameters, output schema exists), the description sufficiently covers the tool's purpose and scope.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

No parameters exist in the schema, and the description correctly omits parameter details, aligning with the baseline for 0 parameters.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action (list), resource (sites), and scope (owned by this API key), distinguishing it from sibling tools like get_site or delete_site.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool versus alternatives like list_projects or list_drives, nor when not to use it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_suppressions: List suppressed emails / domains
Grade: A
Read-only

Return the user's suppression list — addresses (and full domains) that shiply skips when sending. Bounces, complaints, manual user adds, and AI-detected unsubscribes all land here.

Parameters (JSON Schema)

No parameters

Output Schema

Parameters (JSON Schema)
Name | Required | Description
result | Yes |

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations declare readOnlyHint=true, and the description adds behavioral context: 'shiply skips when sending' indicates the list affects sending behavior. It also enumerates sources of suppressions (bounces, complaints, etc.), which is useful beyond the annotation. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences delivering high-density information: first sentence states action and scope, second sentence details content. No filler, front-loaded with purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The tool takes no parameters, and an output schema documents the return structure. The description covers the content of the suppression list comprehensively, including the types of entries. This is complete for a read-only list tool with a well-defined schema.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has no parameters (schema description coverage 100%), so the description carries no param burden. Baseline for 0 parameters is 4, and the description does not need to add param-specific meaning.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description states 'Return the user's suppression list' with specific verb 'Return' and resource 'suppression list'. It details the types of suppressions (bounces, complaints, manual adds, AI-detected unsubscribes), making the purpose clear and distinguishable from siblings like 'list_unsubscribes' (which focuses only on unsubscribes) and 'add_suppression' or 'remove_suppression'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for viewing all suppressions that affect sending, but lacks explicit guidance on when to use this tool versus alternatives (e.g., 'list_unsubscribes' or 'list_complaints'). No exclusions or prerequisites are stated, relying on the agent to infer context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
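The description says the list holds both individual addresses and full domains that shiply skips when sending. A minimal sketch of how a caller might test a recipient against such a list, assuming entries are plain strings ("user@host" for an address, "host" for a whole domain); the entry format and the example entries are hypothetical, and subdomains are deliberately not matched against a parent-domain entry.

```python
def is_suppressed(address, suppressions):
    """True if the address, or its whole domain, is on the suppression list."""
    address = address.strip().lower()
    domain = address.rpartition("@")[2]
    entries = {entry.strip().lower() for entry in suppressions}
    return address in entries or domain in entries


suppressions = ["bounced@example.com", "spam.test"]  # hypothetical entries
print(is_suppressed("Bounced@Example.com", suppressions))  # True (exact address)
print(is_suppressed("anyone@spam.test", suppressions))     # True (whole domain)
print(is_suppressed("ok@example.com", suppressions))       # False
```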

list_test_inbox_addresses: List per-test inbox addresses
Grade: A
Read-only

Return every active demand test's reply / inbound address (<slug>@<sitesDomain>). Use this when you need to TELL someone where to email — e.g. drafting a reply or sharing a test's contact address. Mail sent to these aliases lands in /dashboard/inbox tied to the test.

Parameters (JSON Schema)

No parameters

Output Schema

Parameters (JSON Schema)
Name | Required | Description
result | Yes |

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations declare readOnlyHint=true. The description adds behavioral context beyond that: it notes that mail sent to these aliases lands in /dashboard/inbox tied to the test. This gives the agent useful information about the side effects of sending mail, which is not covered by annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three sentences. The first states the action and the address format, the second gives usage guidance, and the third adds a behavioral note. There are no redundant words; every sentence earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

An output schema exists, so the return format is already defined. The description covers purpose, usage, output format, and behavioral context. For a zero-parameter tool with good schema coverage, this is complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

There are zero parameters, so schema coverage is 100%. The description adds value by explaining the output format (<slug>@<sitesDomain>) and the use case, which is sufficient for a parameterless tool.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description explicitly states the tool returns every active demand test's reply/inbound address with format (<slug>@<sitesDomain>). It clearly distinguishes from sibling tools like list_inbox or list_mailbox_contacts by focusing on per-test inbox addresses.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description says 'Use this when you need to TELL someone where to email — e.g. drafting a reply or sharing a test's contact address.' This provides explicit when-to-use guidance. It could be improved by mentioning when not to use it (e.g., for reading emails), but the context is clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
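For list_test_inbox_addresses, the <slug>@<sitesDomain> address format noted in the evaluation above means an inbound reply can be tied back to its test by splitting the address at the "@". The sketch below assumes the local part is exactly the test slug; it uses a made-up domain (example.test) because the real sitesDomain value is not given here, and inbox_address and slug_from_address are hypothetical helpers, not part of the shiply API.

```python
def inbox_address(slug: str, sites_domain: str) -> str:
    """Compose a per-test address in the <slug>@<sitesDomain> form."""
    if not slug or "@" in slug:
        raise ValueError("slug must be non-empty and must not contain '@'")
    return f"{slug}@{sites_domain}"


def slug_from_address(address: str) -> str:
    """Return the test slug (the local part) of a <slug>@<sitesDomain> address."""
    # rpartition splits at the last '@'; sep is empty when there is no '@' at all.
    local, sep, domain = address.strip().rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not a <slug>@<domain> address: {address!r}")
    return local


# Hypothetical slug and domain; the real sitesDomain is whatever the server reports.
addr = inbox_address("waitlist-test", "example.test")
print(addr)                     # waitlist-test@example.test
print(slug_from_address(addr))  # waitlist-test
```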

list_tests: List demand tests (grade A)
Read-only

List your demand tests with signups + confirmed counts.

Parameters (JSON Schema)

No parameters

Output Schema (JSON Schema)

result (required)

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, so the description's 'List' is consistent but adds no new behavioral insight beyond the annotation safety profile.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single concise sentence, front-loaded with action and resource, no waste.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no inputs and presence of an output schema, the description sufficiently tells the agent what to expect: it lists tests with counts. No missing context.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With zero parameters, the description adds value by specifying that the output includes 'signups + confirmed counts', which the schema does not describe. Schema coverage is trivially 100% (baseline 3), but the description provides useful semantics beyond it.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The title and description clearly state the tool lists demand tests with signup and confirmed counts. It distinguishes from sibling tools like 'list_test_inbox_addresses' and 'get_test_status'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives. The description purely states function without context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_unsubscribes: List unsubscribes (grade A)
Read-only

Shortcut for list_inbox with filter=unsubscribes — shows every thread tagged as an opt-out request.

Parameters (JSON Schema)

limit (optional): max threads to return, ≤200

Output Schema (JSON Schema)

result (required)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true. Description adds that it shows threads tagged as opt-out requests and is a shortcut, implying a call to list_inbox. No contradictions, and additional context is helpful.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single concise sentence that front-loads the key information. No unnecessary words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity (one optional param, output schema present, good annotations), the description fully covers its behavior and relationship to list_inbox. No gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% for the limit parameter, with description in schema. The tool description does not add further meaning beyond what the schema provides, so baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: it's a shortcut for list_inbox with filter=unsubscribes, showing threads tagged as opt-out requests. This directly distinguishes it from the sibling list_inbox.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage: use this instead of list_inbox when you want only opt-out requests. It names the alternative (list_inbox) but doesn't explicitly say when not to use it or other constraints.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
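Because list_unsubscribes is described as a shortcut for list_inbox with filter=unsubscribes, the two MCP tools/call requests built below should be interchangeable. This is a sketch: the JSON-RPC envelope follows the MCP tools/call method, but the argument names filter and limit for list_inbox are inferred only from the description text, and the lower bound of 1 for limit is an assumption (the schema states only ≤200).

```python
from typing import Any, Optional

MAX_LIMIT = 200  # upper bound stated in the schema ("≤200")


def tools_call(request_id: int, name: str, arguments: dict[str, Any]) -> dict[str, Any]:
    """Build an MCP JSON-RPC 2.0 `tools/call` request."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


def unsubscribes_request(
    request_id: int, limit: Optional[int] = None, use_shortcut: bool = True
) -> dict[str, Any]:
    """Request opt-out threads, via the shortcut or via the underlying list_inbox."""
    args: dict[str, Any] = {}
    if limit is not None:
        # Assumption: limit must be at least 1; only the ≤200 ceiling is documented.
        if not 1 <= limit <= MAX_LIMIT:
            raise ValueError(f"limit must be between 1 and {MAX_LIMIT}")
        args["limit"] = limit
    if use_shortcut:
        return tools_call(request_id, "list_unsubscribes", args)
    # Assumption: list_inbox accepts `filter` (and `limit`) under these names.
    return tools_call(request_id, "list_inbox", {"filter": "unsubscribes", **args})
```

The shortcut form is the one to prefer when an agent only needs opt-outs, since it removes the chance of mistyping the filter value.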

list_variables: List variables (grade A)
Read-only

List the encrypted variables. Values are masked unless reveal=true.

Parameters (JSON Schema)

reveal (optional): return plaintext values instead of masked

Output Schema (JSON Schema)

result (required)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, indicating a read operation. The description adds transparency about the masking behavior ('Values are masked unless reveal=true'), which is not covered by annotations. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that conveys the tool's purpose and key behavior without extraneous words. It is front-loaded with the verb and resource.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity and the presence of an output schema, the description sufficiently covers the core behavior. However, it omits details about pagination, ordering, or filtering that might be relevant for a list operation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% (the 'reveal' parameter is fully described in the schema). The description reinforces the parameter's purpose by stating the default masking behavior, but adds minimal new meaning beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('List') and the resource ('encrypted variables'), and distinguishes behavior based on the 'reveal' parameter. It differentiates from sibling tools like 'list_secrets' and 'delete_variable' by specifying encryption and masking.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for listing encrypted variables, but does not explicitly state when to use this tool versus alternatives like 'list_secrets' or 'set_variable'. No guidance on exclusions or prerequisites is provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
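For list_variables, reveal=true returns plaintext values, so an agent that logs tool results may want to redact them before writing anything out. A minimal sketch: the result shape used here (a list of records with name and value keys) is a hypothetical stand-in, because the output schema above exposes only a single result field, and the MASK string is a local placeholder rather than the server's own masking format.

```python
from typing import Any

MASK = "********"  # local placeholder; the server's masking format is not documented here


def list_variables_args(reveal: bool = False) -> dict[str, Any]:
    """Arguments for list_variables; omitting `reveal` keeps values masked server-side."""
    return {"reveal": True} if reveal else {}


def redact_for_logging(variables: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Copy a (hypothetical) list of {name, value} records with every value hidden."""
    return [{**var, "value": MASK} if "value" in var else dict(var) for var in variables]


revealed = [{"name": "API_KEY", "value": "s3cr3t"}]  # made-up example data
print(redact_for_logging(revealed))  # [{'name': 'API_KEY', 'value': '********'}]
```

Copying each record (rather than editing it in place) keeps the revealed values available to the caller while only the redacted copy reaches the log.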

mark_thread_read: Mark a thread read (grade A)
Idempotent

Zero the unread counter for a thread. Useful after the agent has read but not acted. read_thread already calls this implicitly; use this explicitly when you want to clear unread without re-fetching the thread body.

Parameters (JSON Schema)

threadId (required): thread id from list_inbox to mark read

Output Schema (JSON Schema)

result (required)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations include idempotentHint: true. The description adds that it zeros the unread counter and that read_thread already calls this implicitly, clarifying it is a lightweight alternative. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three sentences long, front-loads the action, and contains no unnecessary words. Every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool is simple (1 parameter, idempotent, with output schema), the description covers purpose, usage, parameter origin, and behavioral context. It is fully adequate for the tool's complexity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema covers 100% of parameters, and its threadId description already names list_inbox as the source of the id. The tool description adds that read_thread marks threads read implicitly, which helps the agent decide whether this call is needed at all.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool zeros the unread counter for a thread. It distinguishes itself from the sibling tool 'read_thread' by noting that 'read_thread already calls this implicitly', making the purpose distinct and clear.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit guidance: use when the agent has read but not acted, and when you want to clear unread without re-fetching the thread body. It also tells when not to use (if read_thread is already used).

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
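The mark_thread_read guidance reduces to a small decision: fetch the thread with read_thread when the agent needs the messages (which, per the description, also clears the unread counter), and call mark_thread_read only to clear the counter without re-fetching the body. A sketch of that choice; plan_thread_call is a hypothetical helper that returns the tool name and arguments to send, and the thread id in the test is made up.

```python
from typing import Any


def plan_thread_call(thread_id: str, need_body: bool) -> tuple[str, dict[str, Any]]:
    """Pick read_thread when messages are needed, otherwise mark_thread_read."""
    if not thread_id:
        raise ValueError("threadId is required (take it from list_inbox)")
    if need_body:
        # read_thread already marks the thread read, so no second call is needed.
        return "read_thread", {"threadId": thread_id}
    # Only clear the unread counter; the tool is idempotent, so repeats are harmless.
    return "mark_thread_read", {"threadId": thread_id}
```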

promote_site: Promote a preview to a production site (grade A)
Idempotent

Copy the EXACT live bytes of one owned site (srcSlug — your preview) into another owned site (destSlug — your production site / custom domain), no rebuild. Dest keeps its slug, domains, and access settings; only the served bytes change.

Parameters (JSON Schema)

srcSlug (required)
destSlug (required)

Output Schema (JSON Schema)

result (required)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds behavioral context beyond the idempotentHint annotation by explaining that only served bytes change, dest settings are preserved, and no rebuild occurs. This aligns with idempotency and makes clear that the destination's metadata is left untouched.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences that pack all necessary information without redundancy. It is front-loaded with the key action and scoping, making it efficient for an agent to parse.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the presence of an output schema, the description adequately covers the operation's behavior, parameters, and constraints. It misses potential details like size limits or immediacy, but overall is sufficient for selecting and invoking the tool correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage, the description fully explains both parameters: srcSlug is the preview site, destSlug is the production site/custom domain, and both must be owned. This provides clear meaning beyond the schema field names.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool copies exact bytes from one owned site to another, distinguishing it as a promotion from preview to production without rebuild. It explicitly differentiates from siblings like publish_site and publish_from_drive by specifying 'no rebuild' and preserving destination settings.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for use: promoting a preview to a production site while keeping slug, domains, and access. It implies the user must own both sites. However, it does not explicitly mention when not to use it or list alternatives among the many sibling tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
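Because promote_site is marked idempotent and only replaces the served bytes of destSlug, retrying it after a transient failure should not change the outcome. A sketch of a bounded retry wrapper around a caller-supplied function; call_tool, the TransientError type, the backoff policy, and the local check that srcSlug and destSlug differ are all assumptions for illustration, not part of the shiply API.

```python
import time
from typing import Any, Callable


class TransientError(Exception):
    """Hypothetical error type for failures worth retrying (e.g. timeouts)."""


def promote_with_retry(
    call_tool: Callable[[str, dict[str, Any]], Any],
    src_slug: str,
    dest_slug: str,
    attempts: int = 3,
    base_delay: float = 0.5,
) -> Any:
    """Call promote_site, retrying transient failures with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    # Assumption: promoting a site onto itself is never useful, so reject it locally.
    if src_slug == dest_slug:
        raise ValueError("srcSlug and destSlug must name different sites")
    args = {"srcSlug": src_slug, "destSlug": dest_slug}
    for attempt in range(attempts):
        try:
            return call_tool("promote_site", args)
        except TransientError:
            if attempt == attempts - 1:
                raise
            # Retrying is safe only because the tool is declared idempotent.
            time.sleep(base_delay * 2 ** attempt)
```

The same wrapper would be unsafe for a non-idempotent tool such as regenerate_brief, where a retry could re-run the generator.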

publish_from_drive: Publish a drive as a site (grade A)
Idempotent

Snapshot a Drive (or a prefix of it) into a new live site at .shiply.now. Files copied server-side.

Parameters (JSON Schema)

title (optional): display title for the new site
prefix (optional): only snapshot files under this path prefix
driveId (required): drive id (drv_…) or "default"

Output Schema (JSON Schema)

result (required)

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations include idempotentHint: true, and the description adds that files are 'copied server-side', which clarifies the operation. However, it does not disclose whether the drive is modified, what happens if the slug already exists, or any permission requirements. The description adds some value but lacks full behavioral context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two short, front-loaded sentences with no fluff. Every word contributes to the purpose. Ideal conciseness.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With an output schema existing, return values are covered. However, the description omits how the slug is determined (it is not a parameter), prerequisites (e.g., does the drive need to exist?), and the nature of the 'new live site' (e.g., is it public by default?). These gaps reduce completeness for a tool that creates a complex resource.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with parameter descriptions. The tool description adds value by explaining the overall context (site created at a specific subdomain) and reinforcing the prefix parameter's role in limiting the snapshot scope. This exceeds the baseline for high coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the action: 'Snapshot a Drive (or a prefix of it) into a new live site'. The verb 'snapshot' and the specific resource (Drive) make the purpose unambiguous, and it distinguishes itself from sibling 'publish_site' by specifying the source is a drive snapshot.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies the tool should be used when you want to create a live site from a drive snapshot, but it does not provide explicit guidance on when not to use it or mention alternative tools like 'publish_site'. Usage context is only implied.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
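The publish_from_drive schema accepts a driveId of the form drv_… or the literal "default", plus an optional path prefix and title. A sketch that builds those arguments and rejects obviously malformed ids before calling the tool; publish_from_drive_args is a hypothetical helper, and trimming a leading "/" from the prefix is an assumption, since how the server matches prefixes is not documented.

```python
from typing import Any, Optional


def publish_from_drive_args(
    drive_id: str, prefix: Optional[str] = None, title: Optional[str] = None
) -> dict[str, Any]:
    """Build publish_from_drive arguments, omitting unset optional fields."""
    # The schema documents ids as "drv_…" or the literal "default".
    is_drv = drive_id.startswith("drv_") and len(drive_id) > len("drv_")
    if drive_id != "default" and not is_drv:
        raise ValueError(f'driveId must be "default" or "drv_<id>": {drive_id!r}')
    args: dict[str, Any] = {"driveId": drive_id}
    if prefix:
        # Assumption: the server expects a relative path, so drop a leading "/".
        normalized = prefix.lstrip("/")
        if normalized:
            args["prefix"] = normalized
    if title:
        args["title"] = title
    return args
```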

publish_site: Publish a site (grade A)
Idempotent

Publish files to the web → live URL at .shiply.now. UPDATING: never create a new site for changes — re-call with claimToken (anonymous sites) or slug (sites you own with a Bearer key) and the SAME URL gets the new version (unchanged files are hash-skipped). Works WITHOUT auth (anonymous: 24h lifetime, returns claimToken/claimUrl — SAVE THEM). With a Bearer shp_ key sites are permanent. ≤50 files / 2 MB inline; bigger: REST flow per https://shiply.now/llms.txt. index.html serves at /. spaMode for client-side routing.

Parameters (JSON Schema)

slug (optional): UPDATE an existing site you own (requires Bearer key)
files (required): site files; index.html required for a homepage
title (optional): display title for the site
spaMode (optional): serve index.html for unknown paths (client-side routing)
claimToken (optional): UPDATE an existing anonymous site (from the original publish result)

Output Schema (JSON Schema)

result (required)

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Disclosures beyond idempotentHint include authentication requirements, site lifetime (24h vs permanent), hash-skipping for unchanged files, and necessity to save claimToken/claimUrl. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Dense but well-structured: starts with purpose, uses uppercase 'UPDATING' to highlight key behavior, then covers auth, limits, and spaMode. Every sentence adds value, no waste.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers auth modes, update mechanism, limits, and special features. Lacks explicit output details, but an output schema is present to cover them. Slightly incomplete on error handling, but sufficient for the agent to decide how to call the tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions. The tool description adds context beyond schema, explaining the roles of claimToken and slug in updates, and the effect of spaMode. Adds meaningful semantics.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states the action 'Publish files to the web' and identifies the result (live URL). Distinguishes between creating new sites and updating existing ones, differentiating from sibling tools like delete_site or get_site.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit guidance on when to use anonymous vs authenticated, how to update (re-call with claimToken or slug), limits (≤50 files/2 MB), and alternative for larger files (REST flow). Also mentions spaMode for client-side routing.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
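The publish_site description packs three rules into one paragraph: updates reuse claimToken (anonymous sites) or slug (owned sites with a Bearer key) instead of creating a new site; inline publishes are capped at 50 files / 2 MB; larger sites go through the REST flow. A sketch of that decision. Several details are assumptions: the file shape ({"path", "content"}) stands in for the schema's "site files"; 2 MB is read as 2 × 1024 × 1024 bytes of UTF-8 content; and rejecting slug and claimToken together is a local rule, not documented behavior.

```python
from typing import Any, Optional

MAX_INLINE_FILES = 50
MAX_INLINE_BYTES = 2 * 1024 * 1024  # assumption: "2 MB" read as binary megabytes


def publish_site_args(
    files: list[dict[str, str]],
    *,
    slug: Optional[str] = None,
    claim_token: Optional[str] = None,
    title: Optional[str] = None,
    spa_mode: bool = False,
) -> dict[str, Any]:
    """Build publish_site arguments, or raise if the REST flow is needed instead."""
    # Assumption: an update targets exactly one site, so both identifiers is an error.
    if slug and claim_token:
        raise ValueError("pass slug (owned site) or claimToken (anonymous site), not both")
    total = sum(len(f["content"].encode("utf-8")) for f in files)
    if len(files) > MAX_INLINE_FILES or total > MAX_INLINE_BYTES:
        # Too big for one inline call; use the REST flow in https://shiply.now/llms.txt.
        raise ValueError(f"{len(files)} files / {total} bytes exceeds inline limits")
    args: dict[str, Any] = {"files": files}
    if slug:
        args["slug"] = slug  # update an owned site in place (requires a Bearer key)
    elif claim_token:
        args["claimToken"] = claim_token  # update an anonymous site in place
    if title:
        args["title"] = title
    if spa_mode:
        args["spaMode"] = True  # serve index.html for unknown paths
    return args
```

Passing the same claimToken or slug on every later call is what keeps the URL stable; omitting both creates a brand-new site, which the description explicitly warns against for updates.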

read_thread: Read an inbox thread (grade A)
Read-only

Return the thread metadata + all messages in chronological order. Use list_inbox first to get a threadId.

Parameters (JSON Schema)

threadId (required): thread id from list_inbox

Output Schema (JSON Schema)

result (required)

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, so the description does not need to restate that. The description adds no additional behavioral context (e.g., pagination, limits, or non-modification). It is adequate but does not exceed what annotations provide.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences: first describes output, second gives prerequisite. No wasted words, front-loaded with key information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the presence of an output schema and annotations, the description covers the essential points: what is returned and how to get the input. Could mention error handling or validation, but overall sufficient.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% and the parameter description already states 'thread id from list_inbox'. The tool description reinforces this with 'Use list_inbox first to get a threadId', adding slight marginal value. Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it returns thread metadata and messages in chronological order. It is specific but does not explicitly differentiate from sibling tools like summarize_thread or forward_thread, though its purpose is distinct enough.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit instruction to use list_inbox first to get a threadId, establishing a clear prerequisite. However, it does not specify when to use this tool over alternatives or when not to use it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
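The read_thread prerequisite means the call order is list_inbox first, then read_thread with one of the returned thread ids. A sketch of that two-step flow against a caller-supplied call_tool function; read_first_thread is a hypothetical helper, and the list_inbox result shape (a threads list whose items carry an id) is invented for illustration, since the output schema above exposes only result.

```python
from typing import Any, Callable, Optional

CallTool = Callable[[str, dict[str, Any]], dict[str, Any]]


def read_first_thread(call_tool: CallTool) -> Optional[dict[str, Any]]:
    """List the inbox, then fetch the first listed thread with read_thread."""
    inbox = call_tool("list_inbox", {})
    threads = inbox.get("threads", [])  # hypothetical result shape
    if not threads:
        return None  # nothing to read
    return call_tool("read_thread", {"threadId": threads[0]["id"]})
```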

refund_order: Refund one of my sales (grade A)
Destructive

Issue a full refund on a paid order the user sold. Must be inside the 30-day refund window (server enforces). Triggers a Stripe refund; the webhook flips the order to 'refunded' and reverts site ownership to the seller. Idempotent on already-refunded orders.

Parameters (JSON Schema)

reason (optional): optional refund reason
orderId (required): id of the paid order to refund (from list_my_sales)

Output Schema (JSON Schema)

result (required)

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Beyond annotations (destructiveHint=true, openWorldHint=true), the description details that a Stripe refund is triggered and the order status flips to 'refunded' with ownership reverted. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four concise sentences that front-load the main action, then cover the constraint, the side effects, and idempotency. No unnecessary words; every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (financial mutation, webhook, idempotency) and presence of output schema, the description fully covers what the agent needs to know: purpose, constraints, side effects, and idempotency.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description does not add extra parameter details beyond the schema, which already documents orderId and reason sufficiently.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool issues a full refund on a paid order sold by the user, with a specific verb and resource. It distinguishes from siblings like 'delete_listing' which deletes a listing, not refunds.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description mentions the 30-day refund window constraint and idempotency on already-refunded orders, which guides usage. No explicit when-not-to-use or alternatives, but the context is clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
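The server enforces the 30-day refund_order window, but an agent can skip calls that are certain to fail by checking the window first. A sketch, assuming the window is measured as 30 × 24 hours from the time the order was paid (the exact reference timestamp is not stated) and that paid_at is a timezone-aware datetime the agent already has, for example from list_my_sales; refund_args is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Optional

REFUND_WINDOW = timedelta(days=30)


def refund_args(
    order_id: str,
    paid_at: datetime,
    reason: Optional[str] = None,
    now: Optional[datetime] = None,
) -> dict[str, Any]:
    """Build refund_order arguments if the order still looks refundable."""
    if paid_at.tzinfo is None:
        raise ValueError("paid_at must be timezone-aware")
    now = now or datetime.now(timezone.utc)
    # Assumption: window counted from payment time; the server remains the authority.
    if now - paid_at > REFUND_WINDOW:
        raise ValueError(f"order {order_id} is outside the 30-day refund window")
    args: dict[str, Any] = {"orderId": order_id}
    if reason:
        args["reason"] = reason
    return args
```

A local pre-check like this only avoids wasted calls; since the tool is destructive and triggers a Stripe refund, the server's own window check is still the one that counts.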

regenerate_brief: Re-run AI brief generation (grade A)

Re-run the MiniMax/Anthropic brief generator from the project's current intake_responses and persist the result. Flips status to brief_ready on success or brief_failed on error. Use after the customer edits answers post-submit, or when the first AI attempt failed.

Parameters (JSON Schema)

id (required): project id to re-run the brief for

Output Schema (JSON Schema)

result (required)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses that the tool triggers a state change (status flip to brief_ready or brief_failed) and uses AI models (MiniMax/Anthropic). Annotations already indicate non-idempotent and non-readonly, so the description adds value by specifying the effect and the source of data. No contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences, each adding value. The first states the action and resource, the second the status behavior, and the third the usage context. No redundant information; front-loaded with the key verb and noun.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the output schema exists, no need to describe return values. The description covers purpose, usage, and key behavioral changes. It could be enhanced by mentioning permission requirements or potential side effects, but it's sufficient for an AI agent to use correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema already describes the single 'id' parameter as 'project id to re-run the brief for' with format details. The description does not add extra semantic meaning beyond confirming the parameter is a project id. Schema coverage is 100%, so baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly specifies the action 'Re-run the MiniMax/Anthropic brief generator' and the resource 'from the project's current intake_responses', distinguishing it from siblings like 'update_brief' which is a general update. It also mentions the persistence and status flipping, making the purpose unambiguous.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states when to use: 'Use after the customer edits answers post-submit, or when the first AI attempt failed.' This provides clear context. While it doesn't list when not to use, it covers the main scenarios adequately.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

remove_cron: Remove a cron trigger (grade A)
Destructive

Remove a cron trigger from a site's deployed function.
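
Every tool on this page is invoked the same way over MCP: a JSON-RPC 2.0 `tools/call` request carrying the tool name and an `arguments` object that matches the input schema. A minimal sketch for remove_cron, assuming a client that sends raw JSON-RPC messages; the slug and path values are placeholders.

```python
import json


def build_tool_call(request_id: int, name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(message)


if __name__ == "__main__":
    # Placeholder values; both slug and path are required by remove_cron's schema.
    print(build_tool_call(1, "remove_cron", {"slug": "my-site", "path": "/cron/nightly"}))
```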

Parameters (JSON Schema)
Name | Required | Description | Default
path | Yes | URL path of the cron to remove | (none)
slug | Yes | site slug whose cron to remove | (none)

Output Schema

Parameters (JSON Schema)
Name | Required | Description
result | Yes | (no description)

Behavior 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already set destructiveHint=true, so the description's 'remove' aligns but adds no new behavioral context (e.g., side effects, permissions). With annotations, a 3 is appropriate.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence, no redundancy, front-loaded with key info. Every word earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple removal tool, the description is adequate given the schema and output schema exist. Could mention irreversibility or effects on the function, but not necessary for clear operation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema covers 100% of parameter descriptions ('slug' and 'path' are described). The description does not add meaning beyond what the schema already provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('remove') and the resource ('cron trigger') with context ('from a site's deployed function'). It distinguishes the tool from siblings like set_cron (add/update) and list_crons (list).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use vs alternatives, but the tool name and sibling list make it obvious that this is for removal only. Implicitly clear but lacks explicit conditions or prerequisites.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

remove_custom_domain: Remove a custom domain (grade A)
Destructive

Remove a registered custom domain and all its subdomains; they stop serving immediately.
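
Before removing a custom domain, an agent may want to preview which hostnames will stop serving. The sketch below is a hypothetical client-side check, not shiply's implementation: it treats the apex plus every hostname ending in `.` followed by the apex as affected, matching on label boundaries so that `notexample.com` is not mistaken for a subdomain of `example.com`.

```python
def affected_hostnames(domain: str, hostnames: list[str]) -> list[str]:
    """Return the hostnames that stop serving when `domain` is removed:
    the apex itself plus any subdomain of it."""
    apex = domain.lower().rstrip(".")
    affected = []
    for host in hostnames:
        h = host.lower().rstrip(".")
        # Require a "." before the apex so "notexample.com" does not match "example.com".
        if h == apex or h.endswith("." + apex):
            affected.append(host)
    return affected


if __name__ == "__main__":
    hosts = ["example.com", "www.example.com", "notexample.com", "example.org"]
    print(affected_hostnames("example.com", hosts))  # ['example.com', 'www.example.com']
```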

Parameters (JSON Schema)
Name | Required | Description | Default
domain | Yes | registered custom domain to remove, e.g. example.com | (none)

Output Schema

Parameters (JSON Schema)
Name | Required | Description
result | Yes | (no description)

Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Beyond the destructiveHint annotation, the description adds that removal is immediate and affects all subdomains, providing valuable behavioral context without contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence that conveys the essential information without unnecessary words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the presence of an output schema and annotations, the description sufficiently explains the tool's destructive and cascading behavior, though more detail about the removal process could be added.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% and the schema description already explains the parameter. The tool description does not add extra meaning beyond what the schema provides, so baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description specifies the verb 'Remove' and the resource 'registered custom domain and all its subdomains', clearly distinguishing the tool from siblings like 'remove_domain' or 'remove_sending_domain'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description states when to use the tool (to remove a custom domain and its subdomains) but does not provide explicit exclusions or alternatives, though the context of sibling tools implies differentiation.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

remove_domain: Disconnect a custom domain (grade A)
Destructive

Remove a connected domain (by id). It stops serving immediately.
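
The id comes from list_domains, so a removal is naturally a two-step lookup. A minimal sketch of the lookup step, assuming (hypothetically) that each list_domains entry is an object with `id` and `hostname` keys; the real response shape is not documented on this page. Because removal takes effect immediately, the helper refuses to guess when zero or several entries match.

```python
def find_domain_id(domains: list[dict], hostname: str) -> str:
    """Return the connected-domain id for `hostname` from a list_domains result.

    Hypothetical response shape: each entry is a dict with "id" and "hostname" keys.
    """
    wanted = hostname.lower().rstrip(".")
    matches = [d["id"] for d in domains if d.get("hostname", "").lower().rstrip(".") == wanted]
    if len(matches) != 1:
        # Removal is destructive and immediate, so refuse to guess.
        raise LookupError(f"expected exactly one domain named {hostname!r}, found {len(matches)}")
    return matches[0]


if __name__ == "__main__":
    domains = [{"id": "dom_1", "hostname": "shop.example.com"}, {"id": "dom_2", "hostname": "blog.example.com"}]
    print(find_domain_id(domains, "blog.example.com"))  # dom_2
```

The returned id is then passed as the `id` argument of remove_domain.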

Parameters (JSON Schema)
Name | Required | Description | Default
id | Yes | connected domain id from list_domains | (none)

Output Schema

Parameters (JSON Schema)
Name | Required | Description
result | Yes | (no description)

Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds valuable behavioral context beyond the annotation: it states the domain 'stops serving immediately', which is a key side effect. The annotation destructiveHint=true confirms destruction, and the description clarifies the immediacy. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise: two sentences, no fluff. It front-loads the action and adds a critical behavioral detail. Every word earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity (1 parameter, destructive, immediate effect), the description covers all essential behavioral and usage aspects. No output schema interpretation needed. It is fully complete for the context.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with a clear description for the required 'id' parameter: 'connected domain id from list_domains'. The description adds no further parameter details, but the schema already provides sufficient meaning. Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Remove a connected domain (by id)') and the resource (connected domain). It distinguishes the tool from siblings like add_custom_domain and list_custom_domains, though a similar sibling, remove_custom_domain, also exists. The title 'Disconnect a custom domain' aligns with the description.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides minimal usage guidance: it specifies the domain is removed by id from list_domains. However, it fails to differentiate when to use remove_domain versus the similar sibling remove_custom_domain, nor does it mention prerequisites or alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

remove_function: Remove the deployed function from a site (grade A)
Destructive

Remove the deployed Worker function from a site (and all its routes, secrets, and cron triggers). Site falls back to static-only serving. Irreversible.
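
The cascade described above can be modelled as dropping the whole function record at once. This is a hypothetical client-side model (the `Site` and `SiteFunction` classes are illustrative, not shiply types) showing that routes, secrets, and cron triggers are removed together and the site reverts to static-only serving.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SiteFunction:
    """Hypothetical model of a site's deployed Worker function."""
    routes: list[str] = field(default_factory=list)
    secrets: dict[str, str] = field(default_factory=dict)
    crons: list[str] = field(default_factory=list)


@dataclass
class Site:
    slug: str
    function: Optional[SiteFunction] = None

    @property
    def serving_mode(self) -> str:
        return "static-only" if self.function is None else "static+function"


def apply_remove_function(site: Site) -> Site:
    """Mirror the documented effect: the function and all of its routes,
    secrets, and cron triggers disappear together. Nothing is kept for an
    undo, matching the 'Irreversible' note."""
    return Site(slug=site.slug, function=None)


if __name__ == "__main__":
    site = Site("my-site", SiteFunction(["/api/*"], {"API_KEY": "placeholder"}, ["/cron/nightly"]))
    print(apply_remove_function(site).serving_mode)  # static-only
```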

Parameters (JSON Schema)
Name | Required | Description | Default
slug | Yes | site slug whose function to remove | (none)

Output Schema

Parameters (JSON Schema)
Name | Required | Description
result | Yes | (no description)

Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare destructiveHint=true, and the description adds value by explicitly stating 'Irreversible' and detailing what is removed (routes, secrets, cron triggers), as well as the fallback behavior, providing useful context beyond the annotation.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise: three short sentences, front-loaded with the action, with no unnecessary words. Every sentence contributes meaning.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the destructive nature and the presence of an output schema (not shown), the description fully covers the tool's effect and consequences, including irreversible changes and fallback behavior, making it complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With schema description coverage at 100% and only one parameter (slug) clearly described in the schema, the description does not add additional meaning beyond what the schema provides, resulting in a baseline score of 3.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'Remove', the resource 'deployed Worker function', and the scope 'from a site', including all associated elements (routes, secrets, cron triggers). It differentiates the tool from siblings like deploy_function and remove_secret.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage context by noting the site falls back to static-only serving, but it does not explicitly state when to use this tool versus alternatives like remove_secret or set_cron, nor does it provide any when-not-to-use guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

remove_secret: Remove a secret from a site (grade A)
Destructive

Remove a secret from a site's deployed function. The binding disappears on next request.

Parameters (JSON Schema)
Name | Required | Description | Default
name | Yes | secret name to remove | (none)
slug | Yes | site slug whose function holds the secret | (none)

Output Schema

Parameters (JSON Schema)
Name | Required | Description
result | Yes | (no description)

Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already mark destructiveHint=true. The description adds behavioral detail: the binding disappears on next request, which clarifies the timing of the effect beyond simple destruction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two concise sentences: first states the action, second clarifies the effect. No irrelevant information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple removal with two parameters and an output schema, the description is sufficient. It covers the action, scope, and timing, though it does not mention permissions or reversibility.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so parameters are documented in the schema. The description adds no additional meaning beyond what is already in the input schema properties.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action (remove) and the resource (secret from a site's deployed function), distinguishing it from siblings like set_secret and list_secrets.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool versus alternatives like set_secret or list_secrets. Usage is implied but not clarified with exclusions or prerequisites.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

remove_sending_domain: Remove a sending domain (grade A)
Destructive

Delete a BYO sending domain. Any demand tests bound to it fall back to the managed shiply sender. Also GCs the underlying Resend domain. Irreversible — re-adding requires re-verifying DNS.
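
The fallback rule in the description can be stated as a small function. A hypothetical sketch: the demand-test binding is modelled as an optional sending-domain id, and `MANAGED_SENDER` is a placeholder label for the managed shiply sender, not a real identifier.

```python
from typing import Optional

# Placeholder label for the managed shiply sender (hypothetical, not a real identifier).
MANAGED_SENDER = "managed-shiply-sender"


def effective_sender(bound_domain_id: Optional[str], existing_domain_ids: set[str]) -> str:
    """Sender a demand test uses: its BYO sending domain while that domain exists,
    otherwise the managed shiply sender (the documented fallback after removal)."""
    if bound_domain_id is not None and bound_domain_id in existing_domain_ids:
        return bound_domain_id
    return MANAGED_SENDER


if __name__ == "__main__":
    domains = {"sd_1"}
    print(effective_sender("sd_1", domains))  # sd_1
    domains.discard("sd_1")  # simulate remove_sending_domain
    print(effective_sender("sd_1", domains))  # managed-shiply-sender
```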

Parameters (JSON Schema)
Name | Required | Description | Default
id | Yes | sending domain id from list_sending_domains | (none)

Output Schema

Parameters (JSON Schema)
Name | Required | Description
result | Yes | (no description)

Behavior 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description goes beyond the destructiveHint annotation by explaining specific consequences: bound demand tests fall back to managed sender, underlying Resend domain is garbage collected, and the operation is irreversible. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four concise sentences: the first states the core action, the second and third explain side effects, and the fourth emphasizes irreversibility. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has one parameter and an output schema, the description covers the destructive nature, side effects, and consequences. It is complete for a simple delete tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The single parameter 'id' is fully documented in the schema (100% coverage), whose description notes that the id comes from list_sending_domains. The description adds no additional meaning beyond what the schema provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Delete') and the resource ('BYO sending domain'). It distinguishes the tool from siblings like add_sending_domain and verify_sending_domain by focusing on deletion and its side effects.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description notes that the action is irreversible and that re-adding requires re-verifying DNS, which guides appropriate usage. It does not explicitly list alternative tools but the context is clear for a destructive action.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

remove_suppression: Remove from suppression list (grade A)
Destructive

Delete one suppression by id. Use list_suppressions first to find the id.

Parameters (JSON Schema)
Name | Required | Description | Default
suppressionId | Yes | suppression id from list_suppressions | (none)

Output Schema

Parameters (JSON Schema)
Name | Required | Description
result | Yes | (no description)

Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations include destructiveHint: true, which the description confirms. It adds the context of needing to find the id via list_suppressions. No other behavioral traits are disclosed, but the destructive nature is covered.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, no wasted words, front-loaded with essential action. Efficient and to the point.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple tool with one parameter and an existing output schema, the description is complete. It explains the prerequisite and the action, leaving no gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, and the description adds value by explaining that the suppressionId comes from list_suppressions, clarifying the parameter's origin beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The name and title clearly indicate removing a suppression, and the description explicitly states 'Delete one suppression by id.' It distinguishes the tool from siblings like add_suppression and list_suppressions.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description says 'Use list_suppressions first to find the id,' providing clear guidance on a prerequisite. It lacks explicit when-not-to-use guidance or alternatives, but is sufficient for a simple tool.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

reply_to_thread: Reply to an inbox thread (grade A)

Send an email reply on an existing thread. Goes FROM the original recipient address (the @shiply.now alias the sender used) and TO the original sender. Threaded via RFC 5322 In-Reply-To so Gmail/Outlook group it with the original. Subject defaults to 'Re: ' when omitted. Cap at 20,000 chars. Use after read_thread to make sure you're replying to the right conversation.
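
The server builds the reply itself, but the documented rules are easy to pre-check on the client. A minimal sketch: `MAX_BODY_CHARS` and the 'Re: ' default come from the description, and the header helper shows standard RFC 5322 threading (In-Reply-To carries the parent's Message-ID; References appends it to the parent's References). Whether shiply also sets References is an assumption, since the description mentions only In-Reply-To. Lengths are counted in Python code points, which may differ from how the server counts characters.

```python
from typing import Optional

MAX_BODY_CHARS = 20_000  # cap from the reply_to_thread description


def check_body(body: str) -> str:
    """Reject bodies over the documented cap before calling the tool.
    len() counts Unicode code points; the server's counting rule is not documented."""
    if len(body) > MAX_BODY_CHARS:
        raise ValueError(f"reply body is {len(body)} chars; the limit is {MAX_BODY_CHARS}")
    return body


def reply_subject(original_subject: str, subject: Optional[str] = None) -> str:
    """An explicit subject wins; otherwise use the documented 'Re: <original>' default."""
    return subject if subject is not None else f"Re: {original_subject}"


def threading_headers(parent_message_id: str, parent_references: str = "") -> dict[str, str]:
    """RFC 5322 section 3.6.4 threading fields for a reply: In-Reply-To is the
    parent's Message-ID; References is the parent's References (if any)
    followed by the parent's Message-ID."""
    references = f"{parent_references} {parent_message_id}".strip()
    return {"In-Reply-To": parent_message_id, "References": references}


if __name__ == "__main__":
    print(reply_subject("Pricing question"))  # Re: Pricing question
    print(threading_headers("<b@mail.example>", "<a@mail.example>"))
```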

Parameters (JSON Schema)
Name | Required | Description | Default
body | Yes | reply body, ≤20,000 chars | (none)
subject | No | subject; defaults to 'Re: <original>' | (none)
threadId | Yes | thread id from list_inbox to reply on | (none)

Output Schema

Parameters (JSON Schema)
Name | Required | Description
result | Yes | (no description)

Behavior 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations provide openWorldHint and idempotentHint. The description adds significant behavioral details: FROM/TO routing, RFC 5322 threading, subject default, and 20,000 char limit. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is six sentences, each adding distinct value: purpose, FROM/TO routing, threading, subject default, char limit, and usage advice. No fluff or redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With 3 parameters, an output schema, and annotations present, the description covers behavior, preconditions, constraints, and usage sequence. It could mention error handling or response format, but the output schema likely covers the latter. Complete enough for an agent.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so all parameters are documented. The description reiterates the char limit and subject default but does not add new parameter-specific meaning beyond what the schema provides. Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Send an email reply on an existing thread', which is a specific verb+resource. It distinguishes the tool from siblings like send_email (new email) and forward_thread (forwarding) by detailing FROM/TO and threading behavior.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly says 'Use after read_thread to make sure you're replying to the right conversation', providing a clear precondition. It implies when to use (replying to an existing thread) but doesn't explicitly state when not to use, which is acceptable given context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

resend_confirmation: Resend a confirmation email (grade A)

Re-send the double-opt-in confirmation to a signup that has not confirmed yet.

Parameters (JSON Schema)
Name | Required | Description | Default
email | Yes | the unconfirmed signup email to re-send to | (none)
testId | Yes | demand test id the signup belongs to | (none)

Output Schema

Parameters (JSON Schema)
Name | Required | Description
result | Yes | (no description)

Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Description adds context that the tool sends an email (side effect) and applies only to unconfirmed signups. Annotations indicate non-idempotent and open-world, which aligns. No contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

A single sentence of 12 words with no superfluous information. Efficiently communicates purpose and condition.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple email resend with two parameters and an output schema, the description covers the essential precondition. Could mention possible error states (e.g., already confirmed) but not critical.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema already describes both parameters with 100% coverage. The tool description does not add additional meaning beyond the schema descriptions, so baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description uses the specific verb 're-send' and resource 'confirmation email', with the clear condition 'to a signup that has not confirmed yet'. This distinguishes it from siblings like 'resend_intake_invite'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Description explicitly states the tool applies only to unconfirmed signups, giving clear precondition. However, it does not mention alternative tools or what to do if confirmation already occurred.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

resend_intake_invite: Re-send the customer intake invite email (grade A)

Re-fire the 'your developer sent you a project intake' email to the project's customer. Throws invalid_request if the project has no customerEmail (the dev needs to set one via the dashboard first — customer email isn't agent-patchable). Use when the customer says they didn't receive the link.
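
Since the invalid_request failure is fully predictable, an agent can check the precondition before calling. A hypothetical sketch, assuming the project record exposes the address under the `customerEmail` key named in the description; how the record is fetched is out of scope here, and the message text is illustrative.

```python
from typing import Optional


def intake_resend_blocker(project: dict) -> Optional[str]:
    """Return why resend_intake_invite would fail for this project, or None if it can proceed.

    Hypothetical record shape: the customer address lives under "customerEmail".
    """
    if not project.get("customerEmail"):
        # Agents cannot patch customerEmail; the developer must set it in the dashboard.
        return "invalid_request: set customerEmail in the dashboard first"
    return None


if __name__ == "__main__":
    print(intake_resend_blocker({"id": "p1", "customerEmail": "client@example.com"}))  # None
```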

Parameters (JSON Schema)
Name | Required | Description | Default
id | Yes | project id to re-send the intake invite for | (none)

Output Schema

Parameters (JSON Schema)
Name | Required | Description
result | Yes | (no description)

Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses that it throws invalid_request if no customerEmail and that customer email isn't agent-patchable. No annotation contradiction; adds context beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences, zero waste, all essential info included.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Tool is simple (one param), output schema exists, description covers purpose, usage, limitation. Complete for an agent to invoke correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with description for 'id'. Tool description does not add extra meaning beyond schema; baseline 3.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states it re-fires the intake invite email to the customer. The verb 're-fire' and specific email name differentiate it from siblings; no sibling similarly re-sends invites.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly tells when to use ('when the customer says they didn't receive the link') and mentions a prerequisite (customerEmail must be set). Could note alternatives but none obvious.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

restore_project: Restore an archived project (grade A)
Idempotent

Move an archived project back to status='draft' so it reappears in the active list. Idempotent on non-archived projects? No — server rejects the transition unless the project is currently archived.

Parameters (JSON Schema)
Name | Required | Description
id | Yes | archived project id to restore
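The transition rule can be modeled locally as follows. This is a hypothetical illustration of the documented behavior, not the server's code. It also makes one consequence visible: once a project is restored it is in 'draft', so under this model a repeated restore call is rejected rather than silently succeeding, despite the Idempotent badge.

```python
class TransitionRejected(Exception):
    """Raised when the requested status change is not allowed."""


def restore(status: str) -> str:
    """Return the new status for a restore request, per the documented rule.

    Only an 'archived' project may be restored; it moves to 'draft'.
    Any other starting status is rejected, mirroring the server behavior.
    """
    if status != "archived":
        raise TransitionRejected(f"cannot restore a project whose status is {status!r}")
    return "draft"


if __name__ == "__main__":
    status = restore("archived")
    print(status)  # draft
    try:
        restore(status)  # a second call now starts from 'draft'
    except TransitionRejected as err:
        print(err)
```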

Output Schema

Parameters (JSON Schema)
Name | Required | Description
result | Yes | (no description)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations provide idempotentHint true; the description adds nuance by clarifying idempotency only applies when project is archived, and notes the rejection behavior. No contradiction.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, front-loaded with action and effect, no unnecessary words. Every sentence adds value.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple tool with one required parameter and an output schema, the description fully explains behavior, constraints, and idempotency nuance.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema covers 100% of the single parameter with a clear description. The tool description adds no extra meaning beyond what the schema provides.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'Move' and resource 'archived project', specifying the effect: status becomes 'draft' and reappears in active list. It distinguishes from archive_project sibling.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies when to use (for archived projects) and notes the server rejects non-archived, but does not explicitly list alternatives or when not to use. Still clear enough.

rollback_site: Roll back a site (grade A)
Idempotent

Re-point a site to any finalized version (rollback or roll-forward). Get version ids from get_site. Serving updates immediately.

Parameters (JSON Schema)
Name | Required | Description
slug | Yes | site slug to re-point
versionId | Yes | finalized version id from get_site
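A sketch of choosing a rollback target, assuming get_site returns a list of versions that can be ordered by creation time and flagged as finalized. The `Version` fields (`id`, `finalized`, `created_at`) and both helpers are hypothetical. Only the `slug` and `versionId` argument names come from the schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Version:
    # Hypothetical shape of a version entry as returned by get_site.
    id: str
    finalized: bool
    created_at: int  # hypothetical ordering key, e.g. epoch seconds


def previous_finalized(versions: list, current_id: str) -> Optional[str]:
    """Return the id of the finalized version created just before `current_id`.

    Returns None when `current_id` is the oldest finalized version.
    """
    finals = sorted((v for v in versions if v.finalized), key=lambda v: v.created_at)
    ids = [v.id for v in finals]
    if current_id not in ids:
        raise ValueError(f"{current_id!r} is not a finalized version")
    index = ids.index(current_id)
    return ids[index - 1] if index > 0 else None


def rollback_arguments(slug: str, version_id: str) -> dict:
    # Argument names match the rollback_site schema.
    return {"slug": slug, "versionId": version_id}
```

Because the tool accepts any finalized version, the same arguments also work for a roll-forward: pass a newer finalized id instead of an older one.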

Output Schema

Parameters (JSON Schema)
Name | Required | Description
result | Yes | (no description)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds behavioral context beyond the idempotentHint annotation: 'Serving updates immediately' indicates real-time effect. It also clarifies that the tool can roll forward or backward. No contradictions with annotations.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise at three short sentences, front-loading the purpose and then providing a usage hint and behavioral note. Every sentence earns its place with no redundancy.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description fully covers what the tool does, how to use it (prerequisite get_site), and the immediate effect (updates instantly). With an output schema present, return values need not be described. The tool is simple and the description is complete.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with clear parameter descriptions. The description adds value by telling the agent where to get versionId ('from get_site'), which is a practical usage tip beyond the schema.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action: 'Re-point a site to any finalized version (rollback or roll-forward).' It uses a specific verb ('re-point') and resource ('site'), and distinguishes this tool from siblings like publish_site or duplicate_site by focusing on version manipulation.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides a clear usage hint: 'Get version ids from get_site.' This helps the agent know where to obtain the versionId parameter. However, it does not explicitly state when not to use this tool or compare it with alternatives like publish_site, which would be needed for a perfect score.

send_broadcast: Broadcast to confirmed subscribers (grade A)

Send a campaign to this test's confirmed (double-opt-in) subscribers — the "we're live" email. An unsubscribe link is added automatically. Fails if there are no confirmed subscribers yet.

Parameters (JSON Schema)
Name | Required | Description
html | Yes | email HTML body (unsubscribe link added automatically)
text | No | plain-text fallback body
testId | Yes | demand test id whose confirmed subscribers to email
subject | Yes | email subject line
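A sketch of assembling the arguments, including a plain-text fallback derived from the HTML when none is supplied. Deriving `text` this way is this example's choice, not something the tool is documented to do, and the helper names are hypothetical. No unsubscribe link is added here because the description says the server adds it automatically.

```python
from html.parser import HTMLParser
from typing import Optional


class _TextExtractor(HTMLParser):
    """Collect the text nodes of an HTML fragment, ignoring tags.

    Note: text inside <script> or <style> elements would also be collected.
    """

    def __init__(self) -> None:
        super().__init__()
        self.parts: list = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def html_to_text(html: str) -> str:
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse all whitespace runs (including newlines) to single spaces.
    return " ".join(" ".join(parser.parts).split())


def broadcast_arguments(test_id: str, subject: str, html: str, text: Optional[str] = None) -> dict:
    """Build send_broadcast arguments; derive a plain-text fallback if none is given."""
    return {
        "testId": test_id,
        "subject": subject,
        "html": html,
        "text": text if text is not None else html_to_text(html),
    }
```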

Output Schema

Parameters (JSON Schema)
Name | Required | Description
result | Yes | (no description)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations are minimal (only openWorldHint true, idempotent false). The description adds value by disclosing the automatic addition of an unsubscribe link and the failure condition when no subscribers exist. No contradiction with annotations.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences, efficient and front-loaded with the core purpose. No superfluous content.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (sending a broadcast campaign) and the presence of an output schema, the description adequately covers the key context: target audience, automatic unsubscribe, and failure mode. It could mention that the call is not idempotent, but the annotations already state that.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema descriptions cover all 4 parameters with high quality (100% coverage). The description does not add further parameter-level detail beyond the schema, so it meets the baseline but does not exceed.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states the action (send campaign) and the target (confirmed subscribers of a test). Differentiates from sibling tools like send_email or send_mailbox_broadcast by specifying the context of a test's confirmed subscribers and the 'we're live' email.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides a condition for use: fails if no confirmed subscribers. However, it does not explicitly contrast with alternatives or state when not to use, leaving some room for ambiguity.

send_email: Send an email from a site (grade A)

Send an email from <slug>.shiply.now's managed sender. The site can send transactional/notification email — no SMTP setup. Rate-limited and spam-checked; replies route back to the site inbox.

Parameters (JSON Schema)
Name | Required | Description
to | Yes | recipient email address
html | Yes | email HTML body
slug | Yes | site slug to send from (<slug>.shiply.now sender)
text | No | plain-text fallback body
subject | Yes | email subject line
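Because send_email is rate-limited and is not annotated as idempotent, a blind retry could deliver the same email twice. A cautious client might retry only when the server reports that nothing was sent, waiting a capped, exponentially growing delay between attempts. The sketch below assumes a hypothetical `rate_limited` error code; the description does not say how a rate-limit rejection is reported, and the function names are hypothetical too.

```python
from typing import List


def backoff_delays(attempts: int, base: float = 1.0, factor: float = 2.0, cap: float = 30.0) -> List[float]:
    """Capped exponential backoff: base, base*factor, base*factor**2, ..., never above cap."""
    return [min(cap, base * factor ** i) for i in range(attempts)]


def should_retry(error_code: str) -> bool:
    # send_email is not idempotent, so only retry when the server reports
    # that nothing was sent. "rate_limited" is a hypothetical error code.
    return error_code == "rate_limited"
```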

Output Schema

Parameters (JSON Schema)
Name | Required | Description
result | Yes | (no description)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations provide openWorldHint=true and idempotentHint=false, but the description adds valuable context: rate-limited, spam-checked, and replies route to site inbox. This goes beyond annotation constraints.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences with no wasted words. The description is front-loaded with the action and sender, then states the constraints efficiently.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description covers the sender, type of email, limitations, and reply routing. With an output schema present, it doesn't need to explain return values. Could mention attachment support or size limits, but overall sufficient.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description ties the slug parameter to the sender address format ('<slug>.shiply.now sender'), but adds no other insight beyond the schema descriptions.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'send' and the resource 'email from a managed sender'. It distinguishes from sibling broadcast tools by mentioning 'transactional/notification email', but does not explicitly name alternatives like send_broadcast.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies use for transactional/notification email and mentions rate-limiting and spam-checking. However, it gives no explicit guidance on when to use this tool instead of alternatives (e.g., send_broadcast), and no when-not-to-use contrast.

send_mailbox_broadcast: Broadcast to a mailbox's confirmed audience (grade A)

Send a one-shot broadcast to the confirmed (double-opt-in) subscribers of a (site, collection) mailbox. Spam-checked; unsubscribe footer auto-added. Fails if no confirmed subscribers exist yet.

Parameters (JSON Schema)
Name | Required | Description
html | Yes | email HTML body (unsubscribe footer added automatically)
slug | Yes | site slug the mailbox belongs to
text | No | plain-text fallback body
subject | Yes | email subject line
collection | Yes | mailbox collection whose confirmed audience to email
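send_broadcast and send_mailbox_broadcast differ mainly in how the audience is addressed: a demand `testId` versus a `(slug, collection)` mailbox. A small dispatcher can therefore pick the right tool from the arguments at hand. The function name `broadcast_call` is hypothetical; the tool and argument names come from the two schemas.

```python
from typing import Optional, Tuple


def broadcast_call(
    subject: str,
    html: str,
    *,
    test_id: Optional[str] = None,
    slug: Optional[str] = None,
    collection: Optional[str] = None,
    text: Optional[str] = None,
) -> Tuple[str, dict]:
    """Pick send_broadcast (demand test) or send_mailbox_broadcast (site mailbox)."""
    mailbox = slug is not None or collection is not None
    args = {"subject": subject, "html": html}
    if text is not None:
        args["text"] = text
    if test_id is not None:
        if mailbox:
            raise ValueError("pass either test_id or (slug, collection), not both")
        return "send_broadcast", {**args, "testId": test_id}
    if not (slug and collection):
        raise ValueError("pass test_id, or both slug and collection")
    return "send_mailbox_broadcast", {**args, "slug": slug, "collection": collection}
```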

Output Schema

Parameters (JSON Schema)
Name | Required | Description
result | Yes | (no description)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses behaviors beyond annotations: 'Spam-checked; unsubscribe footer auto-added' and 'one-shot'. No contradiction with annotations (openWorldHint, idempotentHint). Adds useful context about side effects.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three concise sentences, front-loaded with purpose. No wasted words; each sentence adds distinct value (purpose, behavior, failure condition).

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers purpose, key behavioral traits, and failure case. Output schema exists to explain returns. Lacks mention of permissions or rate limits, but context signals don't show high complexity requiring more.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, with each parameter described. Description adds no extra param info beyond schema. Baseline 3 is appropriate as the schema already defines semantics.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states the verb 'send', resource 'broadcast to confirmed subscribers', and scope '(site, collection) mailbox'. Distinguishes from siblings like send_broadcast and send_email by specifying it targets confirmed double-opt-in subscribers of a specific mailbox.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides a clear condition for when the tool will fail ('Fails if no confirmed subscribers exist yet'), guiding the agent on prerequisites. However, lacks explicit comparison with sibling tools for alternative selection.

set_cron: Set or update a cron trigger (grade A)
Idempotent

Set or update a cron trigger on a site's deployed function. Schedule is crontab syntax (UTC). Path is the URL the cron handler should fire on (for the worker's scheduled() handler context).

Parameters (JSON Schema)
Name | Required | Description
path | Yes | URL path the cron handler fires on
slug | Yes | site slug whose function gets the cron
schedule | Yes | crontab schedule in UTC, e.g. "0 * * * *"
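To sanity-check a schedule before calling set_cron, the sketch below computes the next UTC minute at which a five-field crontab expression fires. It supports only a common subset (`*`, numbers, `A-B` ranges, `/STEP`, and comma lists) and follows the classic crontab rule that a restricted day-of-month and day-of-week match if either one matches. The function names are hypothetical, and the scheduler behind this tool may accept syntax this sketch rejects.

```python
from datetime import datetime, timedelta, timezone

# (min, max) for minute, hour, day-of-month, month, day-of-week (0 and 7 = Sunday).
FIELD_RANGES = [(0, 59), (0, 23), (1, 31), (1, 12), (0, 7)]


def parse_field(field: str, lo: int, hi: int) -> set:
    """Expand one cron field: '*', 'N', or 'A-B', optional '/STEP', comma-separated."""
    values = set()
    for part in field.split(","):
        base, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if step < 1:
            raise ValueError(f"bad step in {field!r}")
        if base == "*":
            start, end = lo, hi
        elif "-" in base:
            start, end = (int(x) for x in base.split("-", 1))
        elif step_text:
            raise ValueError(f"a step needs '*' or a range in {field!r}")
        else:
            start = end = int(base)
        if not lo <= start <= end <= hi:
            raise ValueError(f"{field!r} is outside {lo}-{hi}")
        values.update(range(start, end + 1, step))
    return values


def next_fire(schedule: str, after: datetime) -> datetime:
    """Return the first UTC minute strictly after `after` that matches `schedule`."""
    if after.tzinfo is None:
        raise ValueError("`after` must be timezone-aware")
    fields = schedule.split()
    if len(fields) != 5:
        raise ValueError("expected 5 fields: minute hour day-of-month month day-of-week")
    minutes, hours, doms, months, dows = (
        parse_field(f, lo, hi) for f, (lo, hi) in zip(fields, FIELD_RANGES)
    )
    if 7 in dows:
        dows.add(0)  # 7 is an alias for Sunday
    # Classic cron rule: if both day fields are restricted, either may match.
    either_day = not fields[2].startswith("*") and not fields[4].startswith("*")
    t = after.astimezone(timezone.utc).replace(second=0, microsecond=0) + timedelta(minutes=1)
    for _ in range(5 * 366 * 24 * 60):  # search at most about five years ahead
        dom_ok = t.day in doms
        dow_ok = (t.weekday() + 1) % 7 in dows  # Python: Monday=0; cron: Sunday=0
        day_ok = (dom_ok or dow_ok) if either_day else (dom_ok and dow_ok)
        if t.minute in minutes and t.hour in hours and t.month in months and day_ok:
            return t
        t += timedelta(minutes=1)
    raise ValueError(f"{schedule!r} never fires")
```

With the schema's example schedule, `next_fire("0 * * * *", after)` returns the top of the next UTC hour.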

Output Schema

Parameters (JSON Schema)
Name | Required | Description
result | Yes | (no description)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Idempotent annotation is reinforced by description (set or update). Adds context about cron behavior (fires on URL path for scheduled() handler). No contradictions. Describes effects beyond annotations.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences, front-loaded with the action, then parameter details. No redundant phrases. Every sentence is informative and well-structured.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Adequate for a simple setter tool. Does not explain output schema or error cases, but given low complexity and good annotations, it is sufficiently complete for an agent to understand usage.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions. Description adds minor elaboration (e.g., 'for worker's scheduled() handler context') but largely restates schema. Baseline 3 with marginal added value.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states the action (set/update), resource (cron trigger on site's deployed function), and specific details (crontab syntax, UTC, path for handler). Distinguishes from siblings like remove_cron and list_crons.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Implicitly clear that this is for setting/updating crons, but does not explicitly mention when to use alternative tools (remove_cron, list_crons). Lacks exclusions but provides sufficient context for use.

set_handle: Set a vanity handle (grade A)
Idempotent

Rename a site to <handle>.shiply.now (3-30 chars, a-z 0-9 -). The old address 301-redirects for 30 days.

Parameters (JSON Schema)
Name | Required | Description
slug | Yes | current site slug
handle | Yes | new vanity handle, 3-30 chars a-z 0-9 -
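A client-side pre-check for the documented handle rule, plus the date after which the old address stops redirecting. The regex encodes only what the description states, and the 30-day window is assumed to start at the moment of the rename. The function names are hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone

HANDLE_RE = re.compile(r"[a-z0-9-]{3,30}")
REDIRECT_WINDOW = timedelta(days=30)  # the old address 301-redirects for 30 days


def is_valid_handle(handle: str) -> bool:
    # Only the documented rule: 3-30 characters from a-z, 0-9 and '-'.
    # The server may be stricter (e.g. about leading or trailing hyphens).
    return HANDLE_RE.fullmatch(handle) is not None


def redirect_expires(renamed_at: datetime) -> datetime:
    """When the old address stops redirecting, assuming the window starts at the rename."""
    return renamed_at + REDIRECT_WINDOW
```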

Output Schema

Parameters (JSON Schema)
Name | Required | Description
result | Yes | (no description)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate idempotentHint=true. The description adds that the old address 301-redirects for 30 days, which is valuable behavior beyond idempotency. No contradiction with annotations.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, information-dense sentence. It front-loads the action ('Rename a site to...') and includes all necessary details without wasted words.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple rename operation, the description covers the effect, behavior (redirect), and constraints. Combined with idempotent annotation and output schema (present but not shown), it is fully sufficient.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema descriptions cover both parameters fully (100% coverage). The description adds the domain suffix pattern (<handle>.shiply.now) and reiterates constraints, providing extra context beyond the schema.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool renames a site to a vanity handle under shiply.now, with specific character constraints. It distinguishes from sibling tools like add_custom_domain or set_link by focusing on the built-in subdomain.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies when to use (to set a vanity handle) but does not explicitly mention when not to use or provide alternatives among siblings. Usage context is clear but lacks exclusions.

set_mailbox: Configure a site collection's email behavior (grade A)
Idempotent

Turn a Site Data collection into a mailbox: double opt-in, owner notifications, sending domain, branding. Call once per (site, collection) to configure how captured leads are handled.

Parameters (JSON Schema)
Name | Required | Description
slug | Yes | site slug the collection belongs to
branding | No | branding applied to mailbox emails
notifyTo | No | address to send owner notifications to
collection | Yes | Site Data collection to turn into a mailbox
doubleOptIn | No | require email confirmation before a contact is active
notifyOwner | No | email the owner on each new capture
sendingDomainId | No | BYO sending domain id to send from
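A sketch of mapping Python-style keyword arguments onto the set_mailbox schema names, omitting settings the caller did not supply. The description does not say whether an omitted setting keeps its previous value or resets to a server default, so treat that as an assumption to verify. `branding` is passed through untouched because its shape is not documented here, and the function name is hypothetical.

```python
from typing import Optional


def mailbox_arguments(
    slug: str,
    collection: str,
    *,
    double_opt_in: Optional[bool] = None,
    notify_owner: Optional[bool] = None,
    notify_to: Optional[str] = None,
    sending_domain_id: Optional[str] = None,
    branding: Optional[dict] = None,
) -> dict:
    """Map keyword arguments to set_mailbox schema names; settings left as None are omitted."""
    optional = {
        "doubleOptIn": double_opt_in,
        "notifyOwner": notify_owner,
        "notifyTo": notify_to,
        "sendingDomainId": sending_domain_id,
        "branding": branding,  # shape not documented here; passed through as-is
    }
    args = {"slug": slug, "collection": collection}
    # Keep explicit False values; drop only settings that were not supplied.
    args.update({key: value for key, value in optional.items() if value is not None})
    return args
```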

Output Schema

Parameters (JSON Schema)
Name | Required | Description
result | Yes | (no description)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations declare idempotentHint: true, and the description reinforces safety with 'Call once per...'. It adds context on what gets configured (opt-in, notifications, domain, branding) and how captured leads are handled, but doesn't detail immediate effects or auth requirements.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences deliver the purpose and usage instruction. Every word is meaningful, and the key action is front-loaded.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (7 params, nested object, output schema exists), the description covers the essential purpose and usage. With output schema present, it does not need to explain return values. It is complete enough for an agent to understand when and how to invoke it.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description groups parameters into features (double opt-in, owner notifications, etc.) but adds no additional meaning beyond what the schema already provides.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Turn a Site Data collection into a mailbox' and lists key features like double opt-in, owner notifications, sending domain, and branding. It distinguishes from sibling tools like send_mailbox_broadcast by focusing on configuration, not sending.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Guidance is provided: 'Call once per (site, collection) to configure how captured leads are handled.' This implies the tool is for setup, but no explicit alternatives or when-not-to-use are mentioned.

set_primary_subdomain: Pick the canonical URL for a custom domain (grade A)
Idempotent

Mark a hostname as the primary (canonical) URL for its site. Sibling hostnames (apex + www both pointed at the same site) start 301-redirecting to it, preserving path + query. The host-side fix for the duplicate-content SEO problem. The first subdomain you add for a site is primary by default; call this only when you need to switch.

Parameters (JSON Schema)
Name | Required | Description
domain | Yes | registered custom domain, e.g. example.com
hostname | Yes | full hostname to make primary, e.g. www.example.com
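The documented redirect can be illustrated with the standard library's `urllib.parse`. A request on a sibling hostname gets a `Location` on the primary hostname with the same path and query. This is a local illustration of the effect, not the server's implementation. The helper names are hypothetical, and this sketch drops any explicit port as well as the fragment. `belongs_to` is a client-side sanity check that `hostname` sits under `domain` before calling the tool.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def belongs_to(hostname: str, domain: str) -> bool:
    """True if `hostname` is `domain` itself or one of its subdomains."""
    host, dom = hostname.lower().rstrip("."), domain.lower().rstrip(".")
    return host == dom or host.endswith("." + dom)


def canonical_redirect(url: str, primary: str) -> Optional[str]:
    """Location for a 301 from a sibling host to `primary`, or None if already canonical.

    Path and query are preserved; the fragment is never sent to a server, so it is dropped.
    """
    parts = urlsplit(url)
    if parts.hostname == primary.lower():
        return None
    return urlunsplit((parts.scheme, primary, parts.path, parts.query, ""))
```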

Output Schema

Parameters (JSON Schema)
Name | Required | Description
result | Yes | (no description)

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Description discloses that sibling hostnames start 301-redirecting to the primary, preserving path and query. This adds behavioral detail beyond the idempotentHint annotation. No contradictions with annotations.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise, with only a few sentences. The key action is front-loaded, and every sentence adds value without redundancy.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity, the description covers the purpose, use case, default behavior, and effects. It is complete for an agent to select and invoke correctly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so parameters are well-documented in the schema. The description does not add significant semantic information beyond the schema, but it does reinforce the context (e.g., 'registered custom domain', 'full hostname').

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action: 'Mark a hostname as the primary (canonical) URL for its site.' It uses specific verbs and resource, and distinguishes from siblings by explaining the SEO duplicate-content problem and when to use this tool versus the default behavior.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states when to call this tool: 'call this only when you need to switch.' It explains the default (first subdomain is primary) and the effect (301 redirects with path+query preservation), giving clear context for use.

set_profile: Set up a public profile (grade A)
Idempotent

Create or update the user's public portfolio at <handle>.shiply.now (handle 3-30 chars a-z0-9-). enable shows the profile; autoAdd auto-lists new sites. Use after publishing to give the user a shareable portfolio.

Parameters (JSON Schema)
Name | Required | Description
enable | No | show (true) or hide (false) the public profile
handle | No | public handle, 3-30 chars a-z 0-9 -; profile lives at <handle>.shiply.now
autoAdd | No | auto-list newly published sites on the profile
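A local model of the create-or-update behavior: fields passed in a call overwrite the stored profile and fields left out are kept, so repeating an identical call leaves the profile unchanged, which matches the Idempotent badge. That the server preserves omitted fields is an assumption, as is the `https://` scheme in `profile_url`; the helper names are hypothetical.

```python
from typing import Optional


def apply_profile_update(
    current: Optional[dict],
    *,
    handle: Optional[str] = None,
    enable: Optional[bool] = None,
    auto_add: Optional[bool] = None,
) -> dict:
    """Local model of a create-or-update call: provided fields overwrite, others are kept.

    `current` is None when no profile exists yet (the "create" case).
    """
    updated = dict(current or {})  # copy, so the caller's dict is not mutated
    for key, value in (("handle", handle), ("enable", enable), ("autoAdd", auto_add)):
        if value is not None:
            updated[key] = value
    return updated


def profile_url(handle: str) -> str:
    # The profile lives at <handle>.shiply.now; the https scheme is an assumption.
    return f"https://{handle}.shiply.now"
```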

Output Schema

Parameters (JSON Schema)
Name | Required | Description
result | Yes | (no description)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations provide idempotentHint=true. Description adds that the tool does upsert ('create or update') and explains URL structure, adding behavioral context beyond annotations.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three concise sentences: action+constraint, parameter explanations, usage timing. No redundant information; every sentence adds value.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers all key aspects: parameter meanings, handle format, URL structure, and use case. Output schema exists, so return values not required in description. Could add more about idempotency implications, but currently sufficient.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage 100% with descriptions. Description adds behavioral meaning: 'enable shows the profile; autoAdd auto-lists new sites,' which goes beyond the schema's simple type description.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states 'Create or update the user's public portfolio' with specific resource (public profile/portfolio) and verb (create/update). Distinguishes from siblings like set_handle by focusing on portfolio rather than just handle.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit timing: 'Use after publishing to give the user a shareable portfolio.' This guides when to use the tool. Does not explicitly mention alternatives or when not to use, but context is clear.

set_secret: Set a Worker secret on a site (grade A)
Idempotent

Set a CF Worker secret on a site's deployed function. The value is encrypted at rest and accessible as env.<NAME> inside the worker. Use for Stripe keys, Resend API keys, etc.

Parameters
Name | Required | Description
name | Yes | secret name, UPPER_SNAKE_CASE; available as env.<NAME>
slug | Yes | site slug whose function gets the secret
value | Yes | secret value (encrypted at rest), ≤8 KiB
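
The two documented constraints (UPPER_SNAKE_CASE names, values of at most 8 KiB) can be enforced before the call. A minimal sketch: the exact UPPER_SNAKE_CASE regex and measuring the limit in UTF-8 bytes are assumptions, and `set_secret_arguments` is a hypothetical helper, not part of Shiply.

```python
import re

# Assumed UPPER_SNAKE_CASE: starts with a letter, uppercase letters/digits, single "_" separators.
SECRET_NAME = re.compile(r"[A-Z][A-Z0-9]*(?:_[A-Z0-9]+)*")
# "≤8 KiB" from the schema; counting UTF-8 bytes is an assumption about how it is measured.
MAX_VALUE_BYTES = 8 * 1024


def set_secret_arguments(slug: str, name: str, value: str) -> dict:
    """Validate and assemble the arguments object for a set_secret call (hypothetical helper)."""
    if not SECRET_NAME.fullmatch(name):
        raise ValueError(f"secret name {name!r} is not UPPER_SNAKE_CASE")
    if len(value.encode("utf-8")) > MAX_VALUE_BYTES:
        raise ValueError("secret value is larger than 8 KiB")
    return {"slug": slug, "name": name, "value": value}


# The site's deployed Worker would then read this value as env.STRIPE_SECRET_KEY.
args = set_secret_arguments("my-site", "STRIPE_SECRET_KEY", "placeholder-value")
print(sorted(args))
```

Because the tool is marked idempotent, sending the same arguments twice should leave the secret in the same state.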

Output Schema

Name | Required | Description
result | Yes |

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds value over the idempotent hint by explaining encryption-at-rest and accessibility as env.<NAME>. It does not mention side effects like overwriting, but the annotation covers idempotency.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three sentences with no wasted words. It front-loads the core action and then adds security and usage context efficiently.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description covers purpose, security, and usage. It does not mention output or error conditions, but an output schema exists. The idempotent hint implies overwriting behavior, so the description is fairly complete.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so baseline is 3. The description adds some context (e.g., 'encrypted at rest' for value) but largely aligns with the already detailed schema descriptions.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action (set a Worker secret) and the resource (secret on a site's deployed function). It distinguishes this tool from siblings like set_variable by emphasizing security and encryption.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides usage examples (Stripe keys, Resend API keys) and explains how the secret is used (env.<NAME>). However, it does not explicitly mention when not to use this tool or compare it to alternatives like set_variable, missing some guidance.

set_site_access: Set site access control (grade A)
Idempotent

Protect an owned site (paid plans). mode 'public' (anyone), 'password' (supply password), or 'restricted' (supply allowedEmails and/or allowedDomains — only those can request a login code). Changing any setting signs out existing visitors.

Parameters
Name | Required | Description
mode | Yes | public = anyone; password = supply password; restricted = supply allowedEmails/allowedDomains
slug | Yes | owned site slug to protect
password | No | required when mode='password'
allowedEmails | No | allowlisted emails when mode='restricted'
allowedDomains | No | allowlisted email domains when mode='restricted'
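
The conditional requirements per mode are easy to get wrong, so a client may want to check them before calling. A minimal sketch with a hypothetical helper: leaving out fields that do not apply to the chosen mode is an assumption (the server may also accept or ignore them), and the server remains the authority on paid-plan checks.

```python
from typing import Optional, Sequence

MODES = ("public", "password", "restricted")


def site_access_arguments(slug: str, mode: str, password: Optional[str] = None,
                          allowed_emails: Optional[Sequence[str]] = None,
                          allowed_domains: Optional[Sequence[str]] = None) -> dict:
    """Assemble set_site_access arguments, enforcing the per-mode requirements locally."""
    if mode not in MODES:
        raise ValueError(f"mode must be one of {MODES}, got {mode!r}")
    args: dict = {"slug": slug, "mode": mode}
    if mode == "password":
        if not password:
            raise ValueError("mode='password' requires a password")
        args["password"] = password
    elif mode == "restricted":
        # At least one allowlist must be present ("allowedEmails and/or allowedDomains").
        if not allowed_emails and not allowed_domains:
            raise ValueError("mode='restricted' requires allowedEmails and/or allowedDomains")
        if allowed_emails:
            args["allowedEmails"] = list(allowed_emails)
        if allowed_domains:
            args["allowedDomains"] = list(allowed_domains)
    return args


print(site_access_arguments("client-demo", "restricted", allowed_domains=["example.com"]))
```

Since every applied change signs out existing visitors, catching an incomplete request locally avoids a failed call in the middle of an access change.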

Output Schema

Name | Required | Description
result | Yes |

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations provide idempotentHint=true. Description adds critical side effect: 'Changing any setting signs out existing visitors', which goes beyond annotations.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences, no fluff. The first two cover purpose and modes; the last covers the behavioral side effect. Highly efficient.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given complexity (5 parameters, 3 modes, conditionals), description covers all essential aspects: purpose, mode usage, side effect. Output schema exists so return values are covered.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage 100% already documents each parameter. Description reinforces conditional requirements (e.g., password required for mode='password'), adding clarity.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description uses specific verb 'Protect' and resource 'owned site', clearly explains three modes with distinct purposes. Distinguishes from sibling tools like set_link or set_mailbox.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states requirements (paid plans) and when to use each mode ('public', 'password', 'restricted'). Does not explicitly state when not to use, but context is clear.

set_variable: Save an encrypted variable (grade A)
Idempotent

Upsert a key/value in the user's encrypted variable store (UPPER_SNAKE name, ≤8 KiB value). Use for API keys the user's sites/agents need, e.g. SUPABASE_URL.

Parameters
Name | Required | Description
name | Yes | variable name, UPPER_SNAKE_CASE, e.g. SUPABASE_URL
value | Yes | value to store (encrypted, ≤8 KiB)
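
set_variable and set_secret both store encrypted key/value pairs, so an agent needs a rule for choosing between them. One reading of the two descriptions, sketched below as a hypothetical helper: a value needed by one site's deployed Worker goes to set_secret (exposed as env.<NAME>), anything else goes to the account-level variable store. This routing rule is an assumption, not documented Shiply guidance.

```python
from typing import Optional


def store_key_call(name: str, value: str, site_slug: Optional[str] = None) -> dict:
    """Choose between set_secret and set_variable (hypothetical routing helper).

    Assumption: a site_slug means the value is for that site's deployed Worker
    (set_secret); otherwise it belongs in the user's encrypted variable store.
    """
    if site_slug is not None:
        return {"name": "set_secret",
                "arguments": {"slug": site_slug, "name": name, "value": value}}
    return {"name": "set_variable", "arguments": {"name": name, "value": value}}


print(store_key_call("SUPABASE_URL", "https://example.supabase.co")["name"])  # set_variable
print(store_key_call("STRIPE_SECRET_KEY", "placeholder", site_slug="shop")["name"])  # set_secret
```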

Output Schema

Name | Required | Description
result | Yes |

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds behavioral context beyond annotations: encrypted storage, naming convention, and size limit. The idempotentHint annotation is consistent with 'upsert'. No contradictions.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise—two sentences covering all essential info. No wasted words, front-loaded with key details.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the presence of an output schema, the description does not need to explain return values. It fully covers purpose, constraints, and usage context. No gaps.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% and the description adds meaning beyond schema fields: naming convention (UPPER_SNAKE_CASE), example (SUPABASE_URL), and encryption/size details for value. This provides valuable context for the agent.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'upsert' and the resource 'encrypted variable store'. It includes naming convention (UPPER_SNAKE), size limit (≤8 KiB), and an example (SUPABASE_URL). It distinguishes from siblings like delete_variable and list_variables by specifying it's for storing encrypted key/value pairs.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly recommends use for 'API keys the user's sites/agents need', providing clear context. However, it does not contrast with the sibling tool set_secret (which likely has a similar purpose), so differentiation is not fully explicit.

site_status: SSL + readiness check (grade A)
Read-only

Check any shiply slug or custom hostname: TLS certificate (issuer, days left) + HTTPS probe. ready=true means live.

Parameters
Name | Required | Description
target | Yes | slug (my-site) or hostname (www.example.com)
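
Right after publishing or connecting a domain, the certificate may not be issued yet, so a caller typically polls until ready=true. A minimal sketch: `call_tool` is a hypothetical hook supplied by whatever MCP client is in use and is assumed to return the tool result as a dict; only the documented `ready` flag is relied on, and the attempt count and delay are arbitrary.

```python
import time
from typing import Callable, Dict


def wait_until_ready(call_tool: Callable[[str, Dict], Dict], target: str,
                     attempts: int = 10, delay_s: float = 5.0,
                     sleep: Callable[[float], None] = time.sleep) -> Dict:
    """Poll site_status until its result reports ready=true, or give up after `attempts` checks."""
    for attempt in range(attempts):
        result = call_tool("site_status", {"target": target})
        if result.get("ready") is True:
            return result  # per the tool description, ready=true means the site is live
        if attempt < attempts - 1:
            sleep(delay_s)  # wait before the next probe; no sleep after the final one
    raise TimeoutError(f"{target} was not ready after {attempts} checks")
```

Injecting `sleep` keeps the loop testable without real delays; a production client might also back off exponentially.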

Output Schema

Name | Required | Description
result | Yes |

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and openWorldHint=true, indicating safe, non-destructive operation. Description adds behavioral detail: it performs a TLS check and HTTPS probe, and defines the 'ready=true' live condition. No contradictions.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two concise sentences with zero wasted words. Front-loaded with action and resource, immediately followed by specific checks and condition. Every word earns its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given readOnlyHint, openWorldHint, and an existing output schema, the description covers main outputs (TLS issuer, days left, HTTPS probe, ready flag). Lacks mention of error conditions or performance, but sufficient for a simple check tool.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Parameter 'target' is fully described in the schema (100% coverage) with examples: 'slug (my-site) or hostname (www.example.com)'. The description echoes this but adds no new semantics beyond the schema, meeting the baseline.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description uses specific verb 'Check' with clear resource ('shiply slug or custom hostname') and details output: TLS certificate (issuer, days left) + HTTPS probe. Distinguishes from sibling tools like 'check_domain' by specifying exact checks and the 'ready=true' flag.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Description implies usage for checking SSL readiness but does not provide explicit when-to-use or when-not-to-use guidance, nor compares to alternatives like 'check_custom_domain' or 'check_domain'. Only states what it does.

summarize_thread: AI-summarize an inbox thread (grade A)
Read-only

One-paragraph summary of the thread, oriented to the most actionable signal (interest, complaint, question, unsubscribe).

Parameters
Name | Required | Description
threadId | Yes | thread id from list_inbox

Output Schema

Name | Required | Description
result | Yes |

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint and openWorldHint, so the description need not repeat that. It adds value by describing the output orientation (actionable signal), which is beyond what annotations provide.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

A single, well-structured sentence that is front-loaded and contains no unnecessary words. Every part of the description serves a purpose.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the simple tool with one parameter, full schema coverage, and an output schema, the description is mostly complete. However, it could explicitly note that the summary is AI-generated, which is implied but not stated.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema already covers the single parameter threadId with a description linking it to list_inbox, achieving 100% coverage. The description adds no additional parameter details, so score is at baseline.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it produces a one-paragraph summary of a thread oriented to actionable signals. It uses specific verbs and resources, distinguishing it from siblings like read_thread which returns full content.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool versus alternatives such as read_thread or archive_thread. The context implies usage for quick actionable insights, but lacks explicit when-to-use or when-not-to-use instructions.

sync_dns: Sync DNS records (grade A)
Idempotent

For a connected custom domain, (re)write the CNAME records for all its subdomains automatically and report what changed. For unconnected domains, returns the records to add manually.

Parameters
Name | Required | Description
domain | Yes | registered custom domain to sync, e.g. example.com
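
For an unconnected domain, sync_dns returns records for the user to add manually. The exact shape of that result is not documented here, so the sketch below assumes a list of `{"name": ..., "target": ...}` entries with relative subdomain names and renders them as zone-file lines; the helper name and the 3600-second TTL are arbitrary choices. It rejects a CNAME at the zone apex, which DNS does not allow because the apex must also hold SOA and NS records.

```python
from typing import Dict, List


def cname_zone_lines(domain: str, records: List[Dict[str, str]], ttl: int = 3600) -> List[str]:
    """Render CNAME records as BIND-style zone-file lines (hypothetical helper).

    Assumed record shape: {"name": "www", "target": "target.example.net"}.
    """
    domain = domain.rstrip(".")
    lines = []
    for rec in records:
        name = rec["name"]
        if name in ("@", "", domain):
            raise ValueError("a CNAME cannot sit at the zone apex; use a subdomain such as www")
        fqdn = f"{name}.{domain}."
        target = rec["target"].rstrip(".") + "."  # fully qualified, with a trailing dot
        lines.append(f"{fqdn}\t{ttl}\tIN\tCNAME\t{target}")
    return lines


for line in cname_zone_lines("example.com", [{"name": "www", "target": "target.example.net"}]):
    print(line)
```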

Output Schema

Name | Required | Description
result | Yes |

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate idempotent and open world. Description adds that it rewrites CNAME records and reports changes, which aligns well. It doesn't contradict annotations and provides useful behavioral context beyond them.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, front-loaded with main action, no extraneous information. Every sentence adds value.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple tool with one parameter and existing output schema, the description sufficiently covers both scenarios and the action taken. No major gaps.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema covers the single parameter with description. The tool description adds context (e.g., 'registered custom domain' and 'e.g. example.com') that enhances meaning beyond the schema's description.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it syncs DNS (CNAME records) for custom domains, distinguishing two cases. It differentiates from sibling tools like check_custom_domain or add_custom_domain by specifying the action of rewriting records.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear guidance on when to use based on domain connection status. It implies the tool is appropriate for DNS setup, though it doesn't explicitly mention alternatives or when not to use.

unarchive_thread: Unarchive a thread (grade A)
Idempotent

Restore an archived thread (clears archivedAt). It reappears in the default inbox list. Pair with list_inbox filter=archived to find archived threadIds first.

Parameters
Name | Required | Description
threadId | Yes | archived thread id to restore
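
The suggested workflow (list_inbox with filter=archived, then unarchive_thread) can be sketched as below. `call_tool` is a hypothetical client hook, and the `{"threads": [{"id": ...}]}` shape of the list_inbox result is an assumption, not the documented output schema.

```python
from typing import Callable, Dict, List


def restore_archived(call_tool: Callable[[str, Dict], Dict],
                     should_restore: Callable[[Dict], bool]) -> List[str]:
    """List archived threads, then unarchive the ones selected by `should_restore`."""
    listing = call_tool("list_inbox", {"filter": "archived"})
    restored = []
    for thread in listing.get("threads", []):  # assumed result shape
        if should_restore(thread):
            call_tool("unarchive_thread", {"threadId": thread["id"]})
            restored.append(thread["id"])
    return restored
```

Because unarchive_thread is marked idempotent, re-running this after a partial failure should not cause harm.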

Output Schema

Name | Required | Description
result | Yes |

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Describes the behavioral effect (clears archivedAt, reappears in default inbox) beyond the idempotentHint annotation. While it doesn't detail error handling or prerequisites, the description adds useful context about the restoration process.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two concise sentences front-load the action and effect, followed by usage guidance. No wasted words; every sentence serves a purpose.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple restoration tool with one required parameter and output schema, the description covers purpose, usage, and behavioral effect comprehensively. It is fully adequate for an AI agent to select and invoke the tool correctly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% and the parameter description is clear, but the tool description adds value by explaining how to obtain the threadId using list_inbox, providing context beyond the schema.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool restores an archived thread by clearing archivedAt, and distinguishes it from sibling tools like archive_thread by specifying the opposite action. It also mentions reappearing in the default inbox, which adds clarity.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly provides when-to-use guidance by instructing to pair with list_inbox filter=archived to find the threadId first, implying this tool should be used after retrieving archived threads, and provides a workflow integration hint.

update_brief: Patch the project brief (grade A)
Idempotent

Overwrite the project's working brief (jsonb). Hard-capped at 500 KB. Use to revise the AI-generated brief by hand; the original AI output is preserved separately in briefAiOriginal so you can always compare. Does not change status.

Parameters
Name | Required | Description
id | Yes | project id
brief | Yes | the full replacement brief object (jsonb, ≤500 KB)
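
Because `brief` replaces the whole object, changing one field means starting from the current brief (this section does not say which tool returns it) and sending the merged result. A minimal sketch with a hypothetical helper: reading "500 KB" as 500,000 bytes of compact JSON is an assumption (the stricter of the two common readings), and only top-level keys are merged.

```python
import copy
import json
from typing import Any, Dict

# "Hard-capped at 500 KB": treating KB as 1,000 bytes is an assumption.
MAX_BRIEF_BYTES = 500 * 1000


def update_brief_arguments(project_id: str, current_brief: Dict[str, Any],
                           changes: Dict[str, Any]) -> Dict[str, Any]:
    """Merge top-level `changes` into a copy of the brief and return update_brief arguments."""
    brief = copy.deepcopy(current_brief)  # never mutate the caller's copy
    brief.update(changes)
    size = len(json.dumps(brief, separators=(",", ":")).encode("utf-8"))
    if size > MAX_BRIEF_BYTES:
        raise ValueError(f"brief is {size} bytes; the cap is {MAX_BRIEF_BYTES}")
    return {"id": project_id, "brief": brief}
```

Sending only `changes` instead would overwrite the brief with a partial object, since the tool does a full replacement; the AI original stays available in briefAiOriginal either way.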

Output Schema

Name | Required | Description
result | Yes |

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Adds behavioral traits beyond annotations: hard cap at 500 KB and does not change status. Annotations only indicate idempotentHint=true, so description provides useful additional context.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four short sentences with no wasted words. Front-loaded with action and constraints.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers purpose, constraints, side effects, and references preserved original. For a simple patch tool, it is fully informative.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. Description reinforces the size constraint and explains the brief parameter is a full replacement, adding marginal value.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Overwrite the project's working brief') and identifies the resource ('brief'). It distinguishes from sibling 'regenerate_brief' by emphasizing manual revision.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Description provides clear context: 'Use to revise the AI-generated brief by hand' and notes that the original is preserved. It lacks explicit mention of when not to use, but context is sufficient.

update_listing: Update one of my listings (grade A)
Idempotent

Patch a listing by siteSlug — change price, pitch, terms, jurisdiction, or status (draft|live|paused). Sold listings cannot be edited. Status transitions enforced server-side. Use to pause sales or drop the price.

Parameters
Name | Required | Description
pitch | No | new sales pitch (null to clear)
status | No | new listing status
siteSlug | Yes | slug of the listed site to patch
termsMode | No | 'standard' template or 'custom' terms
priceCents | No | new whole-dollar price in cents, 100–999900
termsCustom | No | new custom terms text (null to clear)
jurisdiction | No | governing jurisdiction, e.g. 'California, USA'
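
Two details in this schema are easy to miss: priceCents must be a whole-dollar amount (a multiple of 100 between 100 and 999900), and null clears pitch rather than leaving it unchanged. A minimal sketch with a hypothetical helper covering price, status, and pitch; a sentinel separates "not provided" from `None`. Status transitions and the sold-listing rule are still enforced by the server.

```python
from typing import Any, Dict, Optional

STATUSES = {"draft", "live", "paused"}
_UNSET: Any = object()  # distinguishes "leave unchanged" from None ("clear the field")


def listing_patch(site_slug: str, *, price_dollars: Optional[int] = None,
                  status: Optional[str] = None, pitch: Any = _UNSET) -> Dict[str, Any]:
    """Build update_listing arguments containing only the fields being changed."""
    args: Dict[str, Any] = {"siteSlug": site_slug}
    if price_dollars is not None:
        if isinstance(price_dollars, bool) or not isinstance(price_dollars, int):
            raise TypeError("price_dollars must be a whole number of dollars")
        cents = price_dollars * 100  # whole dollars, so always a multiple of 100 cents
        if not 100 <= cents <= 999_900:
            raise ValueError("price must be between $1 and $9,999")
        args["priceCents"] = cents
    if status is not None:
        if status not in STATUSES:
            raise ValueError(f"status must be one of {sorted(STATUSES)}")
        args["status"] = status
    if pitch is not _UNSET:
        args["pitch"] = pitch  # None is sent as null, which clears the pitch
    return args


print(listing_patch("my-site", price_dollars=49))  # {'siteSlug': 'my-site', 'priceCents': 4900}
```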

Output Schema

Name | Required | Description
result | Yes |

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Beyond the idempotentHint annotation, the description discloses critical behavioral traits: sold listings are immutable and status transitions are server-side enforced. This adds value and does not contradict annotations.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four succinct sentences with no wasted words. The most important info (verb, resource, fields) is front-loaded, followed by constraints and usage examples.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a 7-parameter tool with output schema, the description covers primary use, constraints, and state dependency. It could mention the output schema or idempotent behavior, but remains mostly complete.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the description adds limited new meaning. It groups fields ('price, pitch, terms...') and mentions the sold constraint, but doesn't enrich parameter understanding beyond the schema.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description specifies a clear verb ('Patch') and resource ('listing by siteSlug'), explicitly lists modifiable fields, and distinguishes from sibling tools like create_listing and delete_listing. It also provides example use cases ('pause sales or drop the price').

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description states when to use (to change specific fields) and includes a key constraint ('Sold listings cannot be edited'). It lacks explicit exclusion of other scenarios but provides sufficient context for typical use.

verify_claim: Verify a Shiply claim pairing code (grade A)

Confirm a pairing code shown in the user's browser at https://shiply.now/claim/?pair=1. Use ONLY in the agent session that originally published the site — it reads the claimToken from .shiply.json in the current working directory and proves to Shiply that this agent session is authorised to claim the site. After verification the user is auto-redirected to /welcome and the site binds to their account.

Parameters
Name | Required | Description
code | Yes | the SHIPLY-XXXXXXXX code in the user's browser
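
Since verify_claim only works from the publishing session's working directory, a client can run cheap pre-flight checks first. A minimal sketch: limiting the eight code characters to A-Z and 0-9 is an assumption (the real schema pattern is not shown in this section), `verify_claim_arguments` is hypothetical, and the check only confirms that .shiply.json holds a top-level claimToken, without reading or sending the token itself.

```python
import json
import re
from pathlib import Path

# "SHIPLY-" plus 8 characters; the A-Z0-9 alphabet is an assumption.
CLAIM_CODE = re.compile(r"SHIPLY-[A-Z0-9]{8}")


def verify_claim_arguments(code: str, workdir: Path = Path(".")) -> dict:
    """Pre-flight checks before calling verify_claim from the publishing session."""
    code = code.strip()
    if not CLAIM_CODE.fullmatch(code):
        raise ValueError(f"{code!r} does not look like SHIPLY-XXXXXXXX")
    manifest = workdir / ".shiply.json"
    if not manifest.is_file():
        raise FileNotFoundError(f"{manifest} not found; run from the directory the site was published from")
    data = json.loads(manifest.read_text(encoding="utf-8"))
    if not isinstance(data, dict) or not data.get("claimToken"):
        raise ValueError(".shiply.json has no claimToken")
    return {"code": code}
```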

Output Schema

Name | Required | Description
result | Yes |

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description discloses key behavioral traits beyond the idempotentHint annotation: it reads a local file, performs a verification, and auto-redirects the user, binding the site to the account. This fully informs the agent of side effects.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three sentences with no waste. It front-loads the core action, then provides usage constraints and outcome, all in a clear, scannable structure.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the single parameter, high schema coverage, and presence of an output schema, the description covers all necessary aspects: purpose, prerequisites, expected user action, and resultant behavior. It leaves no ambiguity for a capable agent.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The parameter 'code' is fully described in the schema (pattern and description). The tool description adds context about its origin (user's browser) and role in the flow, which enriches understanding slightly beyond the schema.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description specifies a concrete verb ('Confirm') and a specific resource (pairing code in user's browser), and distinguishes itself from siblings like 'verify_sending_domain' and 'verify_site' by detailing its unique purpose and context.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description gives explicit usage guidance: 'Use ONLY in the agent session that originally published the site.' It explains the prerequisite (.shiply.json) and the outcome, but does not explicitly state when not to use it or compare with alternatives.

verify_sending_domain: Re-check DNS for a sending domain (grade A)
Idempotent

Trigger Resend to re-check the domain's DNS records, persist the new status. Call after adding the DNS records returned by add_sending_domain. Status flips to 'verified' once SPF + DKIM + MX all check out.

Parameters (JSON Schema)
Name | Required | Description | Default
id | Yes | sending domain id from list_sending_domains | -
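Because verify_sending_domain is idempotent, a client can safely repeat it while DNS changes propagate. Below is a minimal polling sketch. It assumes a hypothetical `call_tool(name, arguments)` callable that returns the tool result as a dict with a `status` field; the output schema on this page only declares `result`. The attempt count and backoff delays are arbitrary choices, not Shiply or Resend guidance.

```python
import time
from typing import Callable


def wait_for_verified(
    call_tool: Callable[[str, dict], dict],
    domain_id: str,
    attempts: int = 6,
    first_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Re-check DNS until the sending domain reports 'verified' or attempts run out."""
    delay = first_delay
    for attempt in range(attempts):
        result = call_tool("verify_sending_domain", {"id": domain_id})
        # Assumed result shape: {"status": "verified" | "pending" | ...}
        if result.get("status") == "verified":
            return True
        if attempt < attempts - 1:
            sleep(delay)
            delay = min(delay * 2, 600.0)  # exponential backoff, capped at 10 minutes
    return False
```

Injecting `sleep` keeps the loop testable without real waiting. A `False` return means SPF, DKIM, or MX had still not checked out after the last attempt.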

Output Schema (JSON Schema)
Name | Required | Description
result | Yes | -

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses that it triggers a re-check and persists status, beyond the annotations (openWorldHint, idempotentHint). No contradiction with annotations. Adds context about the specific records checked.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences, no wasted words, front-loaded with key information.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Complete for a simple, idempotent tool with one parameter. Covers usage context, action, and outcome conditions.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema describes the single parameter 'id' with a source hint. The description adds no further detail; baseline 3 for 100% schema coverage.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('re-check the domain's DNS records') and resource ('sending domain'), with a specific verb ('Trigger Resend') and outcome ('persist the new status'). It distinguishes from siblings by referencing the prerequisite call to add_sending_domain.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states when to use: 'Call after adding the DNS records returned by add_sending_domain.' Also explains the verification condition (SPF + DKIM + MX check out). Does not explicitly exclude alternatives, but the context is clear.

verify_site: Verify a live deploy (grade A)
Read-only

Edge-check a shiply slug or custom hostname and return a structured readiness report: status (LIVE/PENDING), SSL details (valid, issuer, daysLeft), HTTP probe, and a presigned thumbnail URL when available. Use this after publishing to confirm the site is reachable.

Parameters (JSON Schema)
Name | Required | Description | Default
target | Yes | slug (my-site) or hostname (www.example.com) | -
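verify_site accepts either a slug (`my-site`) or a hostname (`www.example.com`) as `target`. The sketch below shows one way a client might classify the target and act on the report. It assumes slugs never contain dots and that the report is a dict with `status` plus an `ssl` object holding `valid` and `daysLeft`, mirroring the fields named in the description. Both the heuristic and the field layout are assumptions, and the 14-day certificate threshold is arbitrary.

```python
import re

# Assumption: slugs are lowercase letters, digits and inner hyphens, with no dots.
SLUG = re.compile(r"[a-z0-9](?:[a-z0-9-]*[a-z0-9])?")


def target_kind(target: str) -> str:
    """Classify a verify_site target as 'slug' or 'hostname' (heuristic)."""
    t = target.strip().lower()
    if "." in t:
        return "hostname"
    if SLUG.fullmatch(t):
        return "slug"
    raise ValueError(f"not a slug or hostname: {target!r}")


def site_ready(report: dict, min_ssl_days: int = 14) -> bool:
    """True if the (assumed) report is LIVE with a valid certificate not close to expiry."""
    ssl = report.get("ssl") or {}
    return (
        report.get("status") == "LIVE"
        and bool(ssl.get("valid"))
        and (ssl.get("daysLeft") or 0) >= min_ssl_days
    )
```

A PENDING status right after publishing is expected; a client would typically retry later rather than treat it as a failure.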

Output Schema (JSON Schema)
Name | Required | Description
result | Yes | -

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true and openWorldHint=true. Description adds specific behavioral details about the report contents (SSL details, HTTP probe, thumbnail URL), which goes beyond annotations. No mention of rate limits or errors, but the added context is valuable.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two efficient sentences: first defines the tool's action and outputs, second provides usage timing. No filler, front-loaded, every sentence earns its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given a simple parameter set, strong annotations, and existence of output schema, the description is largely complete. Could be improved by mentioning error behavior (e.g., invalid target) or distinguishing from similar tools, but adequate for the tool's simplicity.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The sole parameter 'target' is fully described in the input schema with format examples. The tool description merely rephrases the same concept (slug or hostname) without adding new information. With 100% schema coverage, baseline 3 is appropriate.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states the tool is for edge-checking a slug or hostname and returning a readiness report. Lists specific outputs (status, SSL, HTTP, thumbnail). However, does not explicitly differentiate from sibling tools like check_custom_domain or site_status, which may have overlapping purposes.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly says 'Use this after publishing to confirm the site is reachable,' providing clear usage context. Does not mention when not to use or alternatives, but the guidance is sufficient for the intended use case.

whoami: Account overview (grade A)
Read-only

Who am I? Returns the signed-in account: email, @handle, plan + limits, counts of sites/domains/drives, and connected DNS providers. Call this first to orient before managing sites or domains.

Parameters (JSON Schema)

No parameters
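Because whoami takes no parameters, invoking it is just an MCP `tools/call` request with an empty `arguments` object. The sketch below builds that JSON-RPC 2.0 envelope following the MCP specification. The helper name `tools_call` is hypothetical, and the Streamable HTTP transport details (endpoint, session headers, streamed responses) are deliberately left out.

```python
import json
from typing import Optional


def tools_call(name: str, arguments: Optional[dict] = None, request_id: int = 1) -> str:
    """Serialize an MCP tools/call JSON-RPC 2.0 request as a JSON string."""
    payload = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments or {}},
    }
    return json.dumps(payload)


# Orientation call: whoami takes no parameters, so "arguments" is an empty object.
print(tools_call("whoami"))
# {"jsonrpc": "2.0", "id": 1, "method": "tools/call", "params": {"name": "whoami", "arguments": {}}}
```

The same helper covers the other tools on this page, e.g. `tools_call("verify_site", {"target": "my-site"}, 2)`.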

Output Schema (JSON Schema)
Name | Required | Description
result | Yes | -

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description lists the specific returned fields (email, handle, plan, limits, counts, DNS providers), which goes beyond the readOnlyHint annotation. No contradiction.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three short sentences, front-loaded with a clear question, no wasted words. Perfectly concise.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The tool takes no parameters, has an output schema, and the description covers the return content. Fully complete for its purpose.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

No parameters exist in the input schema and schema coverage is 100%, so the description does not need to add parameter information. Baseline for 0 params is 4.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool returns account information and positions it as an initial orientation step before managing sites or domains, distinguishing it from the many sibling action tools.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states 'Call this first to orient' providing clear usage context. Does not list when-not or alternatives, but the context is sufficient.
