Glama

Server Details

MCP Server for Bitrise, enabling app management, build operations, artifact management, and more.

Status
Healthy
Last Tested
Transport
Streamable HTTP
URL
Repository
bitrise-io/bitrise-mcp
GitHub Stars
37
Server Listing
Bitrise MCP Server

Glama MCP Gateway

Connect through Glama MCP Gateway for full control over tool access and complete visibility into every call.

MCP client → Glama → MCP server

Full call logging

Every tool call is logged with complete inputs and outputs, so you can debug issues and audit what your agents are doing.

Tool access control

Enable or disable individual tools per connector, so you decide what your agents can and cannot do.

Managed credentials

Glama handles OAuth flows, token storage, and automatic rotation, so credentials never expire on your clients.

Usage analytics

See which tools your agents call, how often, and when, so you can understand usage patterns and catch anomalies.

100% free. Your data is private.
Tool Descriptions: C

Average 3.2/5 across 67 of 67 tools scored. Lowest: 2.2/5.

Server Coherence: B
Disambiguation: 2/5

Many tools have overlapping or ambiguous purposes, such as multiple build-related tools (abort_build, abort_pipeline, rebuild_pipeline, trigger_bitrise_build) that could be confused, and several artifact/installable artifact tools with similar names and functions. While descriptions provide some clarity, the sheer number and redundancy make it difficult for an agent to reliably choose the correct tool without deep domain knowledge.

Naming Consistency: 4/5

Tool names predominantly follow a consistent verb_noun pattern (e.g., list_apps, get_app, create_connected_app, update_artifact), with clear and predictable naming. Minor deviations exist (e.g., 'me' instead of 'get_user_info'), but overall the naming is highly consistent and readable across the set.

Tool Count: 2/5

With 67 tools, the count is excessive for a CI/CD and release management server, leading to cognitive overload and potential confusion. A well-scoped server should typically have 3-15 tools; this feels bloated and could be streamlined by grouping related functionalities or removing redundancies.

Completeness: 5/5

The tool set provides comprehensive coverage for Bitrise's domain, including app management, build pipelines, artifacts, release management, tester groups, webhooks, caching, and workspace administration. It supports full CRUD operations for key resources and handles complex workflows like app registration and artifact processing, leaving no obvious gaps for agent operations.

Available Tools

67 tools
abort_build: C
Destructive, Idempotent

Abort a specific build.

Parameters (JSON Schema)
app_slug (required): Identifier of the Bitrise app
build_slug (required): Identifier of the build
abort_reason (optional): Reason for aborting the build
abort_with_success (optional): If set to true, the aborted build will be marked as successful
skip_notifications (optional): If set to true, skip sending notifications
skip_git_status_report (optional): If set to true, skip sending git status report
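Assembled into a tool-call payload, the schema above looks like the following sketch. The slug values are placeholders, and the commented `session.call_tool` line assumes a generic MCP client session rather than anything documented by this listing:

```python
# Hypothetical arguments for abort_build; the slugs are placeholders.
# app_slug and build_slug are the only required fields.
abort_args = {
    "app_slug": "example-app-slug",      # identifier of the Bitrise app
    "build_slug": "example-build-slug",  # identifier of the build to abort
    "abort_reason": "superseded by a newer commit",
    "abort_with_success": False,  # keep the build marked as aborted, not successful
    "skip_notifications": True,   # suppress abort notifications
}

# Required fields per the schema above.
assert {"app_slug", "build_slug"} <= abort_args.keys()

# With an MCP client session, the call would look roughly like:
# result = await session.call_tool("abort_build", arguments=abort_args)
```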
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations declare destructiveHint=true and idempotentHint=true, but the description adds zero behavioral context about side effects, what state the build enters, or the impact of skip_notifications/skip_git_status_report parameters.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Extremely concise with zero waste—every word in 'Abort a specific build' earns its place. No filler or redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a 6-parameter destructive mutation tool with notification side effects, the single-sentence description is insufficient. It omits critical context about the abort behavior and consequences despite having no output schema to provide that detail.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, providing full parameter documentation. The description adds no parameter semantics beyond the schema, warranting the baseline score.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 3/5

Does the description clearly state what the tool does and how it differs from similar tools?

States the verb (Abort) and resource (build), distinguishing from sibling `abort_pipeline` by resource type. However, it is extremely minimal (4 words) and lacks the richness seen in higher-scoring definitions.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides no guidance on when to invoke vs. alternatives, prerequisites (e.g., build must be running), or when to prefer `abort_pipeline` over this tool.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

abort_pipeline: C
Destructive, Idempotent

Abort a pipeline.

Parameters (JSON Schema)
app_slug (required): Identifier of the Bitrise app
pipeline_id (required): Identifier of the pipeline
abort_reason (optional): Reason for aborting the pipeline
abort_with_success (optional): If set to true, the aborted pipeline will be marked as successful
skip_notifications (optional): If set to true, skip sending notifications
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations declare destructiveHint=true and idempotentHint=true, but the description adds no context beyond this. It fails to explain what gets destroyed (running workflows vs builds), what idempotency means here (safe to retry), or the effect of the 'abort_with_success' flag on final pipeline state.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The three-word description is maximally concise, but given the tool's complexity (destructive mutation with behavioral flags), it is inappropriately sized. The single sentence exists but doesn't earn its place by conveying sufficient information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a destructive operation with specific behavioral flags (abort_with_success, skip_notifications) and idempotent behavior, the description is incomplete. It omits the implications of these settings and provides no coverage of side effects or return behavior despite the lack of output schema.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 100% schema description coverage, the schema adequately documents all 5 parameters. The description mentions no parameters, but baseline 3 is appropriate when schema carries the full burden.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 2/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description 'Abort a pipeline' is essentially a tautology of the tool name 'abort_pipeline', converting snake_case to sentence case. It states the verb and resource but fails to distinguish from sibling 'abort_build' or clarify the specific scope (CI/CD pipeline vs other types).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance provided on when to use this tool versus alternatives like 'abort_build' or 'rebuild_pipeline'. No mention of prerequisites or conditions (e.g., pipeline must be running).

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

add_member_to_group: C
Destructive

Add a member to a group.

Parameters (JSON Schema)
user_slug (required): Slug of the user
group_slug (required): Slug of the group
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate destructiveHint=true and idempotentHint=false, suggesting irreversible side effects and non-safe retries, but the description adds no context explaining what exactly is destroyed or how non-idempotent behavior manifests (e.g., fails if member exists?). No explanation of openWorldHint implications.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 2/5

Is the description appropriately sized, front-loaded, and free of redundancy?

While brief (5 words), the extreme brevity constitutes under-specification rather than efficient communication. The single sentence merely echoes the tool name without front-loading additional value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a destructive mutation tool with non-idempotent behavior, the description lacks critical context: error conditions (e.g., duplicate membership handling), side effects, or group type specificity. No output schema exists to compensate.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 100% schema description coverage ('Slug of the user', 'Slug of the group'), the schema carries the semantic load. The description adds no parameter details but meets baseline expectations given complete schema documentation.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 2/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description 'Add a member to a group' is a tautology that merely restates the tool name. It fails to specify what kind of group (workspace, tester, or other) and does not distinguish from siblings like 'invite_member_to_workspace' or 'add_testers_to_tester_group'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides no guidance on when to use this tool versus alternatives like 'invite_member_to_workspace' or 'add_testers_to_tester_group'. Does not mention prerequisites, permission requirements, or success conditions for the destructive mutation.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

add_testers_to_tester_group: A

Adds testers to a tester group of a connected app.

Parameters (JSON Schema)
id (required): The uuidV4 identifier of the tester group to which testers will be added.
user_slugs (required): The list of users, identified by slugs, that will be added to the tester group.
connected_app_id (required): The uuidV4 identifier of the related Release Management connected app.
Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations cover the basic safety profile (readOnlyHint=false, destructiveHint=false), so the description is not required to disclose mutability. However, the description adds only the 'connected app' context and fails to disclose idempotency behavior (relevant given idempotentHint=false), side effects (relevant given openWorldHint=true), or failure modes (e.g., duplicate user handling).

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

A single, front-loaded sentence with zero redundancy. Every word earns its place: the action (Adds), targets (testers, tester group), and scope (connected app) are conveyed with maximum density.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the straightforward 3-parameter mutation, comprehensive annotations, and fully documented schema, the description is sufficient for an agent to understand the tool's role. It appropriately omits output details (no output schema exists) but could be improved with notes on prerequisites or partial failure handling.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 100% schema coverage, the baseline is 3. The description adds value by mapping abstract parameter names to domain concepts: 'testers' clarifies 'user_slugs', 'tester group' clarifies 'id', and 'connected app' clarifies 'connected_app_id'. It effectively translates the schema into the operation's business logic.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description provides a specific verb ('Adds'), resource ('testers'), and target container ('tester group' of a 'connected app'), clearly mapping to the parameter structure. However, it does not explicitly distinguish this append operation from sibling tools like 'create_tester_group' (group creation) or 'update_tester_group' (which could imply replacement), though the term 'Adds' does imply an append semantic.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided regarding when to use this tool versus alternatives, prerequisites (e.g., that the tester group must already exist), or whether callers should first verify tester existence using 'get_potential_testers'. The description is purely functional.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

create_connected_app: B

Add a new Release Management connected app to Bitrise.

Parameters (JSON Schema)
id (optional): A uuidV4 identifier for your new connected app. If it is not given, one will be generated.
platform (required): The mobile platform for the connected app. Available values are 'ios' and 'android'.
project_id (optional): Specifies which Bitrise Project the connected app will be associated with. If this field is not given, a new project will be created alongside the connected app.
store_app_id (required): The app store identifier for the connected app. For the 'ios' platform it is the bundle ID from App Store Connect; for the 'android' platform it is the package name.
store_app_name (optional): If you have no active app store API keys added on Bitrise, you can add your app manually by giving the app's name while indicating a manual connection.
workspace_slug (required): Identifier of the Bitrise workspace for the Release Management connected app. This field is mandatory.
manual_connection (optional): If set to true, indicates a manual connection (bypassing store API keys) and requires giving 'store_app_name' as well.
store_credential_id (optional): If you have credentials added on Bitrise, you can select one for your app. For the 'ios' platform it is an Apple API credential ID; for the 'android' platform it is a Google Service credential ID.
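The schema encodes a cross-parameter rule (manual_connection=true requires store_app_name) that an agent can check client-side before calling the tool. The helper below is an illustrative sketch, not part of the Bitrise API; the function name and payload values are invented:

```python
# Client-side pre-check for create_connected_app arguments, mirroring the
# constraints stated in the schema above. Illustrative only.
def validate_connected_app_args(args: dict) -> list[str]:
    errors = []
    for field in ("platform", "store_app_id", "workspace_slug"):
        if not args.get(field):
            errors.append(f"missing required field: {field}")
    if args.get("platform") and args["platform"] not in ("ios", "android"):
        errors.append("platform must be 'ios' or 'android'")
    if args.get("manual_connection") and not args.get("store_app_name"):
        errors.append("manual_connection=True requires store_app_name")
    return errors

args = {
    "platform": "android",
    "store_app_id": "com.example.app",   # package name on Android
    "workspace_slug": "example-workspace",
    "manual_connection": True,           # bypasses store API keys...
}
# ...but store_app_name was never set, so the check flags it:
print(validate_connected_app_args(args))
# -> ['manual_connection=True requires store_app_name']
```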
Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations cover write access (readOnlyHint: false) and non-destructive nature; description adds 'Release Management' domain context but omits behavioral details like side effects, idempotency implications (idempotentHint: false), or what 'connected' means in this ecosystem.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single efficient sentence with zero redundancy. However, extreme brevity (9 words) for an 8-parameter tool with complex conditional logic may be insufficient; could benefit from front-loading key usage constraints.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Minimum viable for a creation tool with rich schema and annotations. Omits output expectations (no output schema exists) and lacks guidance for the mutually exclusive credential patterns (manual vs. API key) described in the parameters.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema has 100% description coverage, documenting all 8 parameters including conditional requirements (manual_connection requiring store_app_name). Tool description adds no parameter semantics, meeting baseline expectation when schema coverage is high.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

Uses specific verb 'Add' and resource 'Release Management connected app', clearly identifying the operation. Mentioning 'Release Management' distinguishes from generic 'register_app' sibling, though explicit contrast is absent.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides no guidance on when to use this tool versus 'register_app' or other app creation alternatives. Does not mention prerequisites (e.g., workspace existence) or when to use manual_connection versus store credentials.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

create_outgoing_webhook: C

Create an outgoing webhook for an app.

Parameters (JSON Schema)
url (required): URL of the webhook
events (required): List of events to trigger the webhook
secret (optional): Secret for webhook signature verification
headers (optional): Headers to be sent with the webhook
app_slug (required): Identifier of the Bitrise app
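For illustration, a registration payload for this tool might look like the following. The URL, event names, and header values are invented placeholders; as the review below notes, the listing does not document which event strings are valid:

```python
# Illustrative create_outgoing_webhook arguments; the event names are
# assumptions, not values documented by this listing.
webhook_args = {
    "app_slug": "example-app-slug",
    "url": "https://example.com/bitrise-hook",        # endpoint to deliver events to
    "events": ["build/triggered", "build/finished"],  # placeholder event names
    "secret": "s3cret-used-for-signature-checks",     # lets the receiver verify payloads
    "headers": {"X-Deploy-Env": "staging"},           # extra headers sent with each delivery
}

# Required fields per the schema above: url, events, app_slug.
assert all(k in webhook_args for k in ("url", "events", "app_slug"))
```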
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate this is a non-idempotent write operation that reaches external URLs (openWorldHint: true), but the description adds no behavioral context about delivery guarantees, retry logic, signature verification using the secret parameter, or the format of payloads sent to the URL.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence with no redundant words. While terse to the point of under-specification, the sentence structure is efficient and front-loaded with the verb.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a configuration tool with 5 parameters including nested objects (headers) and no output schema, the description is too minimal. It omits critical context such as what event values are valid, webhook payload structure, and the external nature of the integration despite annotations covering basic safety properties.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 100% schema description coverage, the baseline is 3. The description mentions 'for an app' which loosely maps to app_slug, but adds no semantic detail about valid event types, the optional nature of secret/Headers, or URL requirements beyond what the schema explicitly defines.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 3/5

Does the description clearly state what the tool does and how it differs from similar tools?

States the basic action (create) and resource (outgoing webhook) plus scope (for an app). However, it fails to distinguish from the sibling 'register_webhook' tool or clarify what distinguishes 'outgoing' webhooks from other webhook types in this platform.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides no guidance on when to use this tool versus 'register_webhook' or prerequisites such as requiring an existing app. No mention of when 'update_outgoing_webhook' should be used instead.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

create_tester_group: A

Creates a tester group for a Release Management connected app. Tester groups can be used to distribute installable artifacts to testers automatically. When a new installable artifact is available, the tester groups can either automatically or manually be notified via email. The notification email will contain a link to the installable artifact page for the artifact within Bitrise Release Management. A Release Management connected app can have multiple tester groups. Project team members of the connected app can be selected to be testers and added to the tester group. This endpoint has an elevated access level requirement. Only the owner of the related Bitrise Workspace, a workspace manager or the related project's admin can manage tester groups.

Parameters (JSON Schema)
name (required): The name for the new tester group. Must be unique in the scope of the connected app.
auto_notify (optional): If set to true it indicates that the tester group will receive notifications automatically.
connected_app_id (required): The uuidV4 identifier of the related Release Management connected app.
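A minimal call sketch under stated assumptions: the connected_app_id below is a made-up value in uuidV4 format (the format the schema requires), and the group name is a placeholder:

```python
import uuid

# Illustrative create_tester_group arguments. The connected_app_id is a
# placeholder in uuidV4 format; name must be unique within the connected app.
tester_group_args = {
    "connected_app_id": "123e4567-e89b-42d3-a456-426614174000",
    "name": "qa-internal",
    "auto_notify": True,  # notify the group automatically when a new artifact lands
}

# Sanity-check that the identifier really parses as a version-4 UUID.
parsed = uuid.UUID(tester_group_args["connected_app_id"])
assert parsed.version == 4
```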
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate this is a write operation (readOnlyHint: false) but the description adds significant behavioral context not in annotations: notification email content ('link to the installable artifact page'), automatic vs. manual notification triggers, and the constraint that apps can have multiple groups. No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Eight sentences all earn their place: purpose, use case, notification behavior, email content, cardinality constraints, membership workflow, and elevated permissions. Information is front-loaded with the core purpose, followed by operational details and access constraints.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool with 3 simple parameters and no output schema, the description is comprehensive. It covers the complete lifecycle context (creation, notification triggers, membership, permissions), making it clear what the tool creates and how it fits into the broader tester management workflow.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 100% schema coverage, the baseline is 3. The description adds semantic value beyond the schema by explaining what 'auto_notify' means in practice (receiving notifications automatically when artifacts are available) and contextualizing 'connected_app_id' within the Release Management workflow.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description opens with a specific verb and resource ('Creates a tester group for a Release Management connected app') that clearly defines the operation. It implicitly distinguishes from sibling tools like 'update_tester_group' or 'list_tester_groups' by focusing on the creation workflow and initial setup.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states permission requirements ('Only the owner... workspace manager or... project's admin can manage tester groups'), providing crucial access control guidance. While it doesn't explicitly name sibling alternatives, it explains the workflow relationship to 'add_testers_to_tester_group' by describing how project team members become testers.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

create_workspace_group: C

Create a new group in a workspace.

Parameters (JSON Schema)
group_name (required): Name of the group
workspace_slug (required): Slug of the Bitrise workspace
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With annotations declaring idempotentHint=false and openWorldHint=true, the description carries minimal additional context. It fails to explain mutation side effects, failure modes if group name exists, or what 'openWorld' effects manifest when creating the group.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence with no redundancy or filler. Appropriately front-loaded with the action verb, though extreme brevity limits informational value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple two-parameter creation tool with complete schema coverage and safety annotations present, the description is minimally sufficient. Missing return value documentation (no output schema exists) and semantic context about group purpose (access control vs other types).

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so parameters are fully documented in the schema. The description adds no additional parameter semantics, but the baseline score of 3 applies since the schema carries the burden.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

States specific verb 'Create' and resource 'group' with scope 'in a workspace'. However, does not distinguish from sibling tool create_tester_group, which also creates a group entity potentially within workspace contexts.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides no guidance on when to use this tool versus alternatives like create_tester_group or get_workspace_groups (to check for existing groups). No prerequisites or exclusions mentioned.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

delete_all_cache_items: A
Destructive, Idempotent

Delete all key-value cache items belonging to an app.

Parameters (JSON Schema)
app_slug (required): Identifier of the Bitrise app
Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description aligns with annotations (destructiveHint=true matches 'Delete') and adds domain context ('key-value cache', 'belonging to an app'). However, it does not expand on behavioral traits beyond what annotations provide, such as explaining the idempotent nature or what happens when no items exist.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, front-loaded sentence of nine words with zero waste. Every word contributes: 'Delete' (action), 'all' (scope), 'key-value cache items' (resource), 'belonging to an app' (parameter context).

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the single parameter, high schema coverage, and clear annotations indicating destructive/idempotent behavior, the description is nearly complete. Minor gap: no mention of return values or confirmation behavior, though acceptable given no output schema exists.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 100% schema description coverage for 'app_slug', the baseline is 3. The description adds semantic value by clarifying the relationship between the parameter and the resource ('belonging to an app'), but does not add syntax details or validation rules beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description uses a specific verb ('Delete') with clear resource ('key-value cache items') and scope ('all', implied by tool name and 'belonging to an app'). It effectively distinguishes from sibling tool 'delete_cache_item' (singular) by emphasizing the bulk 'all' operation.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

While the description implies this is for bulk deletion via the word 'all' and the plural 'items', it does not explicitly state when to use this versus the singular 'delete_cache_item' alternative, nor does it mention prerequisites or warnings about data loss despite the destructive nature.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

delete_app (A)
Destructive, Idempotent

Delete an app from Bitrise. When deleting apps belonging to multiple workspaces always confirm that which workspaces' apps the user wants to delete.

Parameters (JSON Schema)
Name | Required | Description | Default
app_slug | Yes | Identifier of the Bitrise app | -
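For context, a sketch of how an MCP client would invoke this tool. The JSON-RPC `tools/call` envelope follows the MCP specification; the `app_slug` value is purely illustrative, reusing the example slug shown elsewhere in this listing.

```python
import json

def build_tool_call(name, arguments, request_id=1):
    # JSON-RPC envelope MCP clients use to invoke a server tool.
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }

# Illustrative slug value (the example identifier from this listing).
payload = build_tool_call("delete_app", {"app_slug": "d8db74e2675d54c4"})
print(json.dumps(payload, indent=2))
```

Given the destructive annotation, a careful agent would complete the workspace confirmation the description requires before sending such a request.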
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already disclose destructiveness, idempotency, and side effects. The description adds valuable business-logic context about workspace confirmation requirements that is not captured in structured fields, helping agents understand the operational risk of accidental cross-workspace deletion.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences total. The first states purpose; the second states the critical workspace confirmation rule. No redundancy, appropriately front-loaded, and every sentence provides actionable information for tool selection or invocation.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the single-parameter input, 100% schema coverage, destructive/idempotent annotations, and lack of output schema, the description provides sufficient context. The workspace confirmation caveat addresses the primary complexity of this operation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with the app_slug parameter fully described as 'Identifier of the Bitrise app.' The description does not augment parameter semantics, but with complete schema documentation, baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description opens with the specific phrase 'Delete an app from Bitrise,' clearly stating the verb (delete), resource (app), and system (Bitrise). This effectively distinguishes it from sibling tools like delete_artifact, delete_cache_item, and delete_outgoing_webhook.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The second sentence provides specific guidance for multi-workspace scenarios: 'When deleting apps belonging to multiple workspaces always confirm that which workspaces' apps the user wants to delete.' This establishes a guardrail for when to seek confirmation, though it does not explicitly name alternative tools like get_app for verification.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

delete_artifact (C)
Destructive, Idempotent

Delete a build artifact.

Parameters (JSON Schema)
Name | Required | Description | Default
app_slug | Yes | Identifier of the Bitrise app | -
build_slug | Yes | Identifier of the build | -
artifact_slug | Yes | Identifier of the artifact | -
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare destructive, idempotent, and non-readonly status. Description adds no behavioral details beyond confirming the delete action, such as permanence, side effects, cascading deletions, or required permissions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Extremely concise at four words with zero redundancy. However, for a destructive operation requiring three specific identifiers, this brevity may be insufficient to convey necessary caution or context.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Sufficient for a simple delete operation given rich annotations and complete parameter schema, though lacks guidance on success/failure outcomes and prerequisites expected for destructive operations.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 100% schema description coverage for the three slug parameters, the description meets baseline expectations without needing to repeat parameter details. No additional semantic context (e.g., slug format) is provided.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clear verb (delete) and resource (build artifact) that distinguishes from siblings like delete_app or delete_cache_item by specifying the domain. However, lacks elaboration on what constitutes a 'build artifact' (file, log, report, etc.).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides no guidance on when to use this tool versus alternatives like list_artifacts or get_artifact, nor prerequisites (e.g., whether artifact must be in a specific state) or warnings about irreversible deletion.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

delete_cache_item (B)
Destructive, Idempotent

Delete a key-value cache item.

Parameters (JSON Schema)
Name | Required | Description | Default
app_slug | Yes | Identifier of the Bitrise app | -
cache_item_id | Yes | Key of the cache item | -
Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description confirms the domain (key-value cache) but adds minimal behavioral context beyond the annotations. While annotations already disclose destructiveness (destructiveHint=true) and idempotency (idempotentHint=true), the description omits details about permanence, recovery options, or side effects that would help an agent understand the impact of invocation.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence with no redundant words. However, given the presence of a destructive bulk-delete sibling and the destructive nature of the operation, it is arguably too terse—lacking necessary qualifying context (e.g., 'single', 'specific') that would distinguish it without adding verbosity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a destructive operation with two well-documented parameters and clear annotations, the description meets minimum viability but leaves gaps. It does not address the tool's relationship to 'delete_all_cache_items', describe the lack of output schema implications, or clarify irreversibility—gaps that should be addressed given the destructiveHint.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 100% input schema coverage, the schema already documents that 'cache_item_id' is the 'Key of the cache item' and 'app_slug' identifies the Bitrise app. The description uses 'key-value' terminology consistent with the schema but adds no additional syntax guidance, validation rules, or format examples beyond what the structured schema provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb (Delete) and resource (key-value cache item), providing specific domain context. However, it fails to explicitly distinguish this single-item deletion from the sibling 'delete_all_cache_items' tool, relying solely on the tool name's singular form to convey scope.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus the bulk 'delete_all_cache_items' alternative, nor does it mention prerequisites such as needing the app identifier or key format requirements. It merely states the action without contextual selection criteria.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

delete_outgoing_webhook (B)
Destructive, Idempotent

Delete the outgoing webhook of an app.

Parameters (JSON Schema)
Name | Required | Description | Default
app_slug | Yes | Identifier of the Bitrise app | -
webhook_slug | Yes | Identifier of the webhook | -
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations fully cover safety profile (destructiveHint, idempotentHint, readOnlyHint). Description adds minimal behavioral context beyond confirming the action type—omits permanence implications, side effects on triggered builds, or openWorldHint implications.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single 7-word sentence with zero redundancy. Appropriate length for a straightforward 2-parameter deletion operation; no filler words or unnecessary preamble.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Adequate for simple destructive operation with complete schema and rich annotations. Lacks explicit confirmation of idempotent behavior (covered by annotation) and return value discussion, but sufficient given no output schema exists.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with clear descriptions for app_slug and webhook_slug. Tool description mentions 'of an app' implying app_slug relationship but adds no format constraints, validation rules, or slug semantics beyond schema baseline.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description uses specific verb 'Delete' and resource 'outgoing webhook', clearly distinguishing from siblings create_outgoing_webhook, update_outgoing_webhook, and list_outgoing_webhooks. However, lacks scope details like 'permanently removes' or identification method.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use delete versus update_outgoing_webhook (e.g., for temporary disablement), no mention of prerequisites like obtaining slugs from list_outgoing_webhooks, and no warnings about impact on active builds.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

finish_bitrise_app (A)
Destructive

Finish the setup of a Bitrise app. If this is successful, a build can be triggered via trigger_bitrise_build. If you have access to the repository, decide the project type, the stack ID, and the config to use, based on https://stacks.bitrise.io/, and the config should be also based on the project type.

Parameters (JSON Schema)
Name | Required | Description | Default
config | No | The configuration preset to use for the app. | other-config
app_slug | Yes | The slug of the Bitrise app to finish setup for. | -
stack_id | Yes | The stack ID to use for the app. | linux-docker-android-22.04
project_type | Yes | The type of project | other
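The description asks the agent to decide `config`, `stack_id`, and `project_type` together, with the config following from the project type. A minimal sketch of that selection logic, assuming a hypothetical project-type-to-config mapping (the non-default preset names below are illustrative, not taken from Bitrise documentation):

```python
# Assumed mapping for illustration; only "other-config" is confirmed
# by the schema default shown in the parameter table above.
PROJECT_TYPE_TO_CONFIG = {
    "android": "default-android-config",
    "ios": "default-ios-config",
    "other": "other-config",
}

def choose_finish_args(app_slug, project_type, stack_id):
    # Derive the config preset from the detected project type,
    # falling back to the schema default.
    config = PROJECT_TYPE_TO_CONFIG.get(project_type, "other-config")
    return {
        "app_slug": app_slug,
        "project_type": project_type,
        "stack_id": stack_id,
        "config": config,
    }

args = choose_finish_args(
    "d8db74e2675d54c4", "android", "linux-docker-android-22.04"
)
print(args["config"])  # -> default-android-config
```

The stack ID here is the schema default; per the description, a real agent would pick it from https://stacks.bitrise.io/ based on the repository contents.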
Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations declare destructive=true and readOnly=false, indicating state mutation. Description adds valuable external reference (https://stacks.bitrise.io/) for parameter selection and implies repository access requirements, but does not detail the destructive impact or failure modes beyond the annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences, each earning its place: purpose, workflow integration (next step), and parameter selection logic. Front-loaded with the primary action and no redundant repetition of schema details or annotations.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a destructive mutation tool with no output schema, description adequately covers purpose, prerequisite state (needs access to repository), decision criteria for parameters, and next steps. Could benefit from noting idempotency constraints or failure behavior, but sufficient for invocation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% (baseline 3). Description adds meaningful selection logic: parameters should be decided based on the external stacks URL and 'config should be also based on the project type', providing semantic guidance for choosing enum values.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

States specific verb 'Finish' and resource 'setup of a Bitrise app'. Clearly distinguishes from register_app (which creates) and explicitly references sibling trigger_bitrise_build as the next step in the workflow, establishing its position in the lifecycle.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit usage context by stating 'If this is successful, a build can be triggered via trigger_bitrise_build', establishing the sequence. Also adds conditional guidance 'If you have access to the repository...'. Lacks explicit warnings about when NOT to use (e.g., if already configured).

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

generate_installable_artifact_upload_url (A)
Read-only

Generates a signed upload url valid for 1 hour for an installable artifact to be uploaded to Bitrise Release Management. The response will contain an url that can be used to upload an artifact to Bitrise Release Management using a simple curl request with the file data that should be uploaded. The necessary headers and http method will also be in the response. This artifact will need to be processed after upload to be usable. The status of processing can be checked by making another request to a different url giving back the processed status of an installable artifact.

Parameters (JSON Schema)
Name | Required | Description | Default
branch | No | Optionally you can add the name of the CI branch the installable artifact has been generated on. | -
workflow | No | Optionally you can add the name of the CI workflow this installable artifact has been generated by. | -
file_name | Yes | The name of the installable artifact file (with extension) to be uploaded to Bitrise. This field is mandatory. | -
file_size_bytes | Yes | The byte size of the installable artifact file to be uploaded. | -
connected_app_id | Yes | Identifier of the Release Management connected app for the installable artifact. This field is mandatory. | -
with_public_page | No | Optionally, you can enable public install page for your artifact. This can only be enabled by Bitrise Project Admins, Bitrise Project Owners and Bitrise Workspace Admins. Changing this value without proper permissions will result in an error. The default value is false. | -
installable_artifact_id | Yes | An uuidv4 identifier generated on the client side for the installable artifact. This field is mandatory. | -
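The upload step the description leaves to "a simple curl request" can be sketched as follows. The response field names (`url`, `method`, `headers`) are assumptions for illustration; the actual response shape is not documented in this listing.

```python
import uuid

# Hypothetical response from generate_installable_artifact_upload_url;
# these field names are illustrative, not the documented schema.
upload_response = {
    "url": "https://storage.example.com/upload?signature=abc123",
    "method": "PUT",
    "headers": {"Content-Type": "application/octet-stream"},
}

def build_curl_command(resp, file_path):
    # Assemble the curl invocation from the URL, method, and headers
    # that the description says the response will contain.
    header_flags = " ".join(
        f"-H '{k}: {v}'" for k, v in sorted(resp["headers"].items())
    )
    return (
        f"curl -X {resp['method']} {header_flags} "
        f"--data-binary @{file_path} '{resp['url']}'"
    )

# The schema requires a client-generated uuidv4 for installable_artifact_id.
artifact_id = str(uuid.uuid4())
cmd = build_curl_command(upload_response, "app-release.apk")
print(cmd)
```

After the upload, the artifact still needs server-side processing; per the description, its status is checked via a separate request rather than this tool.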
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Beyond annotations declaring readOnly/openWorld, the description adds critical behavioral context: the 1-hour expiration, the exact response contents (URL, headers, HTTP method), and the post-upload processing requirement. It notes that the artifact requires separate processing before being usable, which is not inferable from the annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four sentences logically ordered: action definition, immediate response usage, response details, and post-conditions/next steps. Slightly inefficient phrasing in sentence 3 ('The necessary headers and http method will also be in the response') could be combined with sentence 2, but overall well-structured and appropriately sized at ~80 words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the lack of output schema, the description adequately compensates by detailing the response structure (URL, headers, method) and explains the complete artifact lifecycle including the processing phase. Only minor gaps: explicit error conditions and the specific sibling tool name for status checking would make it complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 100% schema description coverage, the schema already fully documents all 7 parameters including UUIDv4 format for the artifact ID and optional CI metadata. The description adds workflow context (using curl with file data) but does not significantly augment individual parameter semantics beyond the schema definitions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description explicitly states the tool 'Generates a signed upload url' (specific verb + resource) for an installable artifact, notes the 1-hour validity constraint, and distinguishes this generation step from the sibling status-checking tool (implied by mentioning 'another request to a different url' for checking processing status).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description outlines a clear workflow sequence: generate URL → upload via curl → processing occurs → check status via different request. However, it fails to explicitly name the sibling tool 'get_installable_artifact_upload_and_proc_status' as the specific tool for checking status, which would strengthen selection guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_app (B)
Read-only, Idempotent

Get the details of a specific app.

Parameters (JSON Schema)
Name | Required | Description | Default
app_slug | Yes | Identifier of the Bitrise app | -
Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description aligns with annotations (readOnlyHint, idempotentHint) and implies safe read behavior via 'Get'. However, it adds minimal behavioral context beyond annotations—no mention of error behavior when the app_slug is invalid, rate limits, or cache considerations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single efficient sentence with no redundant words. While extremely minimal, it is appropriately front-loaded and avoids tautology.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity (single string parameter, no nested objects) and comprehensive annotations, the description is minimally viable. However, lacking an output schema, it should ideally clarify what 'details' are returned to distinguish from siblings like 'get_bitrise_yml'.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 100% schema description coverage, the schema fully documents the 'app_slug' parameter as 'Identifier of the Bitrise app'. The description mentions 'specific app' which loosely maps to this parameter but adds no syntax details, format constraints, or examples beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description uses a clear verb ('Get') and resource ('details of a specific app'), distinguishing it from 'list_apps' by specifying 'specific'. However, it does not explicitly differentiate from sibling getters like 'get_bitrise_yml' or clarify what constitutes 'details' (metadata vs configuration).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives like 'list_apps' (for searching) or 'get_bitrise_yml' (for configuration). It also omits prerequisites such as needing the app_slug identifier beforehand.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_artifact (B)
Read-only, Idempotent

Get a specific build artifact.

Parameters (JSON Schema)
Name | Required | Description | Default
app_slug | Yes | Identifier of the Bitrise app | -
build_slug | Yes | Identifier of the build | -
artifact_slug | Yes | Identifier of the artifact | -
Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

While annotations cover safety (readOnly, idempotent, non-destructive) and external access, the description adds minimal context beyond the name. It confirms the scope is 'specific' (single item) but fails to disclose return format, authentication requirements, or rate limits. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The single sentence contains no waste or redundancy. However, it borders on underspecification rather than optimal conciseness, as it front-loads no qualifying details about behavior, output, or constraints.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a 3-parameter read operation with full schema coverage and strong annotations, the description is minimally viable. However, it lacks mention of the output structure, what constitutes an 'artifact' in this domain, and relationships to the build/app hierarchy that would complete the agent's understanding.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 100% schema description coverage, the baseline is 3. The description does not add parameter-specific guidance, syntax examples, or relationships between the slugs (e.g., that artifact_slug is scoped to a build), but the schema carries this load adequately.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description uses a specific verb 'Get' and identifies the resource as a 'build artifact'. The word 'specific' implies singular retrieval, distinguishing it from 'list_artifacts', though it does not explicitly name siblings or clarify what 'get' entails (metadata vs download).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives like 'list_artifacts' (when IDs are unknown) or 'delete_artifact'. There are no stated prerequisites, conditions, or exclusions to aid selection.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_bitrise_yml (B)
Read-only, Idempotent

Get the current Bitrise YML config file of a specified Bitrise app.

Parameters (JSON Schema)
Name | Required | Description | Default
app_slug | Yes | Identifier of the Bitrise app (e.g., "d8db74e2675d54c4" or "8eb495d0-f653-4eed-910b-8d6b56cc0ec7") | -
Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations cover safety profile (readOnly, idempotent, non-destructive). Description adds 'current' implying the latest active configuration versus historical versions. However, it fails to explain the openWorldHint annotation or disclose return format (raw YAML string vs parsed object), error conditions, or whether this retrieves from Bitrise servers or external Git repositories.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single efficient sentence with no redundancy. Information density is reasonable, though front-loading could be improved by leading with the distinction from build-specific config retrieval. No wasteful text.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With 100% schema coverage and comprehensive hint annotations (readOnly, idempotent, destructive), the description adequately covers the retrieval action. However, missing explicit sibling differentiation and return value description (no output schema exists to compensate) leaves gaps for an agent determining which config retrieval tool to use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with detailed app_slug description including format examples. Description references 'specified Bitrise app' which aligns with the parameter but adds no semantic nuance (e.g., whether slug refers to the app's unique ID or repository name) beyond what schema already provides. Baseline 3 appropriate for high schema coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

States specific verb (Get) and resource (Bitrise YML config file) clearly. Specifies 'current' and 'of a specified Bitrise app', which implicitly distinguishes from sibling get_build_bitrise_yml by focusing on app-level rather than build-level config. However, it doesn't explicitly clarify the distinction from get_build_bitrise_yml or when to prefer this over retrieving build-specific configs.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides no guidance on when to use this tool versus siblings like get_build_bitrise_yml (which retrieves build-specific configs) or update_bitrise_yml. No mention of prerequisites or typical workflows (e.g., checking config before triggering builds).

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_buildB
Read-onlyIdempotent
Inspect

Get a specific build of a given app.

ParametersJSON Schema
NameRequiredDescriptionDefault
verboseNoInclude all build details. Default: false
app_slugYesIdentifier of the Bitrise app
build_slugYesIdentifier of the build
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations cover safety profile (readOnlyHint, destructiveHint, idempotentHint), so description doesn't need to carry that burden. Description aligns with read-only nature ('Get'). However, it adds no context about what 'get' returns (build metadata, status, configuration) or behavior on missing builds (404 vs null).

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence, front-loaded with action verb. Zero redundancy or waste. Appropriately sized for a simple two-parameter retrieval operation.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Adequate for a simple read operation with complete schema coverage and safety annotations. Missing only error handling details (e.g., what happens if build_slug doesn't exist). Given the lack of output schema, description could briefly indicate what build information is retrieved.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, establishing baseline 3. The description frames the relationship ('of a given app') mapping to app_slug, but doesn't add syntax details, format constraints, or examples beyond what the schema already provides ('Identifier of the Bitrise app').

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clear verb ('Get') + resource ('specific build') + scope ('of a given app'). Accurately describes retrieving a singleton resource. Lacks explicit differentiation from siblings like 'list_builds' or 'get_build_log', though 'specific' implies single-item retrieval.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this vs. 'list_builds' (when user has ID vs. needs to search) or vs. 'get_build_log'/'get_build_steps' (when they need the full build object vs. sub-resources). No prerequisites or error conditions mentioned.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_build_bitrise_ymlC
Read-onlyIdempotent
Inspect

Get the bitrise.yml of a build.

ParametersJSON Schema
NameRequiredDescriptionDefault
app_slugYesIdentifier of the Bitrise app
build_slugYesIdentifier of the build
Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare read-only, idempotent, and non-destructive nature. Description adds minimal behavioral context beyond this, failing to mention that it retrieves a historical snapshot of the configuration or what format the response takes (YAML string vs object).

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Extremely terse at six words with no filler. However, given the existence of similarly named sibling tools, the description could benefit from one additional clause clarifying the temporal/build-specific nature without harming conciseness.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Adequate for a simple retrieval operation with strong annotations, but lacks mention of return format (YAML content vs JSON object) and misses opportunity to clarify the build-specific snapshot behavior given no output schema is provided.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with clear descriptions for both app_slug and build_slug. Description adds no additional parameter semantics, meeting baseline expectations when schema documentation is comprehensive.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

States specific verb (Get) and resource (bitrise.yml) with scope (of a build), distinguishing it from sibling get_bitrise_yml implicitly by referencing a build-specific artifact. However, lacks explicit differentiation explaining why one would use this over the app-level configuration retrieval.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides no guidance on when to use this tool versus get_bitrise_yml or other build inspection tools. Does not clarify that this retrieves the configuration snapshot used for a specific historical build rather than the current active configuration.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_build_logC
Read-onlyIdempotent
Inspect

Get the build log of a specified build of a Bitrise app.

ParametersJSON Schema
NameRequiredDescriptionDefault
limitNoThe number of lines to read. Defaults to 2000. Set to a high value to read the entire log.
offsetNoThe line number to start reading from. Defaults to 0. Set -1 to read from the end of the log. Failures are usually at the end of the log.
app_slugYesIdentifier of the Bitrise app (e.g., "d8db74e2675d54c4" or "8eb495d0-f653-4eed-910b-8d6b56cc0ec7")
step_uuidNoUUID of the step to get the log for. If not provided, the full build log is returned. Always provide this value whenever possible to avoid large log responses and running out of the LLM context window.
build_slugYesIdentifier of the Bitrise build

Output Schema

ParametersJSON Schema
NameRequiredDescription
log_linesYesThe requested lines of the build log.
next_offsetNoThe offset to use to read the next portion of the log, if any.
total_linesYesThe total number of lines in the build log.
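The limit/offset parameters and the next_offset output field together describe a pagination loop. A sketch of how an agent might page through a large log, assuming the schema above; `call_get_build_log` is a local stand-in for the real MCP tool call, backed by an in-memory fake log.

```python
# Fake 5000-line log standing in for a real Bitrise build log.
FAKE_LOG = [f"line {i}" for i in range(5000)]

def call_get_build_log(app_slug, build_slug, offset=0, limit=2000):
    """Simulate the get_build_log tool against the in-memory log.

    Mirrors the documented behavior: offset=-1 reads from the end,
    where failures usually are; next_offset is None when exhausted.
    """
    if offset == -1:
        offset = max(0, len(FAKE_LOG) - limit)
    chunk = FAKE_LOG[offset:offset + limit]
    end = offset + len(chunk)
    return {
        "log_lines": chunk,
        "total_lines": len(FAKE_LOG),
        "next_offset": end if end < len(FAKE_LOG) else None,
    }

def read_full_log(app_slug, build_slug):
    """Follow next_offset until the whole log has been read."""
    lines, offset = [], 0
    while offset is not None:
        page = call_get_build_log(app_slug, build_slug, offset=offset)
        lines.extend(page["log_lines"])
        offset = page["next_offset"]
    return lines
```

In practice an agent would rarely want the full log: per the step_uuid guidance, scoping to a step, or reading the tail with offset=-1, keeps responses inside the context window.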
Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations declare read-only, idempotent, and non-destructive properties. The description adds no behavioral context beyond this: it fails to mention that logs can be large (implied only in the step_uuid parameter description), streaming behavior, or that failures are typically at the end of logs (noted in the offset parameter description but not the main description).

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence of 12 words is efficient and front-loaded with the action verb. However, extreme brevity leaves gaps in behavioral disclosure that slightly undermines its value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the presence of output schema, rich annotations, and 100% parameter coverage, the minimal description is just sufficient. However, for a pagination-enabled tool fetching potentially massive logs, omitting any mention of size risks or pagination strategy leaves it barely adequate.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with detailed parameter descriptions. The tool description itself does not mention parameters, serving only as a high-level summary. With schema coverage high, baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clear verb ('Get') and resource ('build log'), with scope ('of a specified build of a Bitrise app'). However, it does not explicitly differentiate from sibling 'get_build' which retrieves build metadata rather than logs.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this versus 'get_build_steps' or 'get_build'. The description omits when pagination (limit/offset) is necessary or that logs can be extremely large, though the parameter descriptions partially cover this.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_build_stepsB
Read-onlyIdempotent
Inspect

Get step statuses of a specific build of a given app.

ParametersJSON Schema
NameRequiredDescriptionDefault
verboseNoInclude all build details. Default: false
app_slugYesIdentifier of the Bitrise app
build_slugYesIdentifier of the build
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations comprehensively cover safety (readOnlyHint, destructiveHint, idempotentHint), so the description doesn't need to repeat this. However, it adds no behavioral context beyond the basic action—no mention of what happens if slugs are invalid, what status values mean, or rate limits. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single short sentence with the action verb front-loaded ('Get'). Zero redundancy: every word serves the purpose of identifying the operation and scope.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Adequate for a simple two-parameter retrieval tool with strong annotations covering behavioral traits. However, lacking an output schema, it could benefit from describing what step statuses entail (e.g., execution states, durations) to distinguish from other build inspection tools.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with clear descriptions ('Identifier of the Bitrise app', 'Identifier of the build'). The description maps 'given app' and 'specific build' to these parameters but adds no additional semantic context, syntax details, or examples beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description uses a specific verb ('Get') and identifies the resource ('step statuses of a specific build'), distinguishing it from siblings like get_build_log or get_build by focusing on 'steps'. However, it assumes familiarity with what constitutes a 'step' in Bitrise CI/CD context without elaborating.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus similar siblings (e.g., get_build for general metadata vs step statuses) or prerequisites (e.g., needing a build_slug from list_builds).

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_cache_item_download_urlC
Read-onlyIdempotent
Inspect

Get the download URL for a cache item.

ParametersJSON Schema
NameRequiredDescriptionDefault
app_slugYesIdentifier of the Bitrise app
cache_item_idYesKey of the cache item
Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations cover safety profile (readOnly, non-destructive, idempotent) but the description adds no operational context beyond what is already implied by the tool name. Critical behavioral traits like URL expiration/temporariness are omitted.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Extremely concise single-sentence description front-loaded with a clear verb. While efficient with zero redundancy, the brevity verges on under-specification given that key operational details are missing.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With rich annotations covering safety hints and 100% schema coverage, the description meets minimum viability. However, for a download URL tool, the omission of URL temporariness or workflow prerequisites (e.g., needing to call list_cache_items first) leaves functional gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema has 100% description coverage for both parameters. The description adds no semantic clarifications beyond the schema, but the structured documentation sufficiently covers parameter meanings.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

Uses specific verb 'Get' and clearly identifies the resource as a 'download URL for a cache item'. Distinguishes from siblings like list_cache_items (metadata enumeration) and delete_cache_item (removal), though it could clarify the URL's temporary nature.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides no guidance on when to use this tool versus alternatives, nor does it mention the prerequisite workflow (typically calling list_cache_items first to obtain the cache_item_id). No explicit 'when not to use' conditions or named alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_connected_appB
Read-onlyIdempotent
Inspect

Gives back a Release Management connected app for the authenticated account.

ParametersJSON Schema
NameRequiredDescriptionDefault
idYesIdentifier of the Release Management connected app
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations cover safety profile (readOnly, destructive, idempotent, openWorld). Description adds valuable domain context ('Release Management') and authorization scope ('authenticated account'). Does not describe error behavior (e.g., ID not found) or return structure, though no output schema exists to fill that gap.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence, no wasted words. Front-loaded with action and resource type. 'Gives back' is slightly colloquial compared to the standard 'Retrieves' or 'Returns', and a 'by ID' qualifier that would clarify the lookup pattern is missing.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Adequate for a simple retrieval operation with 1 required parameter and full schema coverage. Annotations handle safety disclosure. Lacks error handling description (404 scenario) but sufficient given low complexity and absence of output schema requirements.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with 'id' parameter fully documented as 'Identifier of the Release Management connected app'. Description adds no parameter details, but baseline is 3 for high schema coverage per rubric. Could have clarified ID format or source.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clear verb ('Gives back' understood as retrieves) and specific resource ('Release Management connected app'). Mentions 'authenticated account' providing scope context. Slightly informal phrasing and lacks explicit differentiation from list_connected_apps sibling (e.g., doesn't mention 'by ID'), preventing a 5.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this single-item retrieval versus list_connected_apps, nor prerequisites like needing the app ID. 'For the authenticated account' provides minimal context but no decision-making framework.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_installable_artifact_upload_and_proc_statusA
Read-onlyIdempotent
Inspect

Gets the processing and upload status of an installable artifact. An artifact will need to be processed after upload to be usable. This endpoint helps understanding when an uploaded installable artifacts becomes usable for later purposes.

ParametersJSON Schema
NameRequiredDescriptionDefault
connected_app_idYesIdentifier of the Release Management connected app for the installable artifact. This field is mandatory.
installable_artifact_idYesThe uuidv4 identifier for the installable artifact. This field is mandatory.
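The lifecycle the description implies (upload, then processing, then usable) suggests a polling loop. A sketch under assumptions: `call_status` is a stand-in for the real MCP tool call, and the status values ("processing", "ready") are invented for illustration, since the tool publishes no output schema.

```python
import itertools

# Simulated server responses: two polls still processing, then ready.
_STATUSES = itertools.chain(["processing", "processing"],
                            itertools.repeat("ready"))

def call_status(connected_app_id, installable_artifact_id):
    """Stand-in for the real tool call; the response shape is assumed."""
    return {"status": next(_STATUSES)}

def wait_until_usable(connected_app_id, artifact_id, max_polls=10):
    """Poll until the artifact reports ready, up to max_polls attempts.

    A real client would sleep between polls; omitted here for brevity.
    """
    for _ in range(max_polls):
        if call_status(connected_app_id, artifact_id)["status"] == "ready":
            return True
    return False
```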
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare the operation is read-only and idempotent. The description adds valuable domain-specific behavioral context: that artifacts require processing after upload to become usable, and this endpoint tracks that transitional state. This lifecycle information is not present in the annotations and helps the agent understand polling scenarios.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three efficiently structured sentences with no redundancy: sentence 1 states the action, sentence 2 explains the domain rule (processing required), and sentence 3 explains the use case (determining usability). Information is front-loaded and every sentence earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the moderate complexity and the absence of an output schema, the description adequately explains the conceptual states (uploaded vs. processed vs. usable). However, it misses the opportunity to enumerate specific status values (e.g., 'pending', 'processing', 'ready', 'failed') that would help the agent interpret responses without an output schema.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 100% schema description coverage, the structured fields already fully document both parameters (UUIDv4 format for artifact ID, connected app identification). The description mentions 'installable artifact' and 'upload' but does not add specific semantic guidance beyond what the schema already provides, warranting the baseline score for high-coverage schemas.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description uses a specific verb ('Gets') and clearly identifies the resource (processing and upload status). It distinguishes from siblings like 'generate_installable_artifact_upload_url' (which creates upload URLs) and 'list_installable_artifacts' (which lists artifacts) by focusing specifically on status retrieval for a single artifact.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear contextual guidance by explaining the artifact lifecycle (upload → processing → usable) and stating this helps understand 'when an uploaded installable artifacts becomes usable.' This implicitly defines when to use the tool (after upload, before usage), though it doesn't explicitly name alternative tools for the other lifecycle stages.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_pipelineC
Read-onlyIdempotent
Inspect

Get a pipeline of a given app.

ParametersJSON Schema
NameRequiredDescriptionDefault
verboseNoInclude all pipeline details. Default: false
app_slugYesIdentifier of the Bitrise app
pipeline_idYesIdentifier of the pipeline
Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations cover safety profile (readOnly/idempotent), but the description adds no behavioral context beyond the tautology. It does not disclose what happens if the pipeline_id is invalid, rate limiting concerns, or cache behavior despite openWorldHint=true.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Extremely brief (single sentence), but suffers from under-specification rather than efficient information density. The sentence is tautological and wastes the opportunity to front-load distinguishing characteristics.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With no output schema provided, the description fails to characterize the returned pipeline object (structure, fields, relationships). While annotations cover the safety profile, the agent lacks context to understand what data it will receive or how to handle the results.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with clear parameter descriptions ('Identifier of the Bitrise app', 'Identifier of the pipeline'), establishing baseline 3. The description references 'given app' loosely mapping to app_slug but adds no semantic value regarding formats, validation rules, or relationships between parameters.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose2/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description 'Get a pipeline of a given app' largely restates the tool name (tautology) and fails to distinguish from sibling tool list_pipelines (which also gets pipelines). It does not explain what constitutes a pipeline in this CI/CD context or the scope of data retrieved.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance provided on when to use this specific retrieval endpoint versus list_pipelines, nor any prerequisites (e.g., requiring app_slug from a prior list_apps call). No mention of error conditions or when the tool should be avoided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_potential_testersA
Read-onlyIdempotent
Inspect

Gets a list of potential testers whom can be added as testers to a specific tester group. The list consists of Bitrise users having access to the related Release Management connected app.

ParametersJSON Schema
NameRequiredDescriptionDefault
idYesThe uuidV4 identifier of the tester group. This field is mandatory.
pageNoSpecifies which page should be returned from the whole result set in a paginated scenario. Default value is 1.
searchNoSearches for potential testers based on email or username using a case-insensitive approach.
items_per_pageNoSpecifies the maximum number of potential testers to return having access to a specific connected app. Default value is 10.
connected_app_idYesThe uuidV4 identifier of the app the tester group is connected to. This field is mandatory.
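Unlike get_build_log's offset-based paging, this tool is page-number based (page / items_per_page). A sketch of collecting every match for a search term; `call_get_potential_testers` is a stand-in for the real MCP tool call, and the response shape (a "testers" list) is an assumption, since no output schema is shown.

```python
# 23 fake users standing in for the server-side result set.
FAKE_TESTERS = [{"username": f"user{i}", "email": f"user{i}@example.com"}
                for i in range(23)]

def call_get_potential_testers(connected_app_id, group_id,
                               page=1, items_per_page=10, search=""):
    """Simulate the tool: case-insensitive search, 1-based page numbers."""
    matches = [t for t in FAKE_TESTERS
               if search.lower() in t["email"].lower()]
    start = (page - 1) * items_per_page
    return {"testers": matches[start:start + items_per_page]}

def all_potential_testers(connected_app_id, group_id, search=""):
    """Walk pages until an empty page signals the end of the result set."""
    results, page = [], 1
    while True:
        batch = call_get_potential_testers(
            connected_app_id, group_id, page=page, search=search)["testers"]
        if not batch:
            break
        results.extend(batch)
        page += 1
    return results
```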
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations cover safety profile (readOnly, non-destructive). Description adds valuable context about the population (Bitrise users having access to the connected app). Missing details on pagination behavior or return structure, but this is partially mitigated by schema annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two efficient sentences. First establishes purpose, second clarifies data source. No redundancy or filler. 'Gets' is front-loaded and every sentence earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Adequate for a list tool with full schema coverage and rich annotations. Covers the Release Management domain context (connected apps). Could improve by acknowledging pagination or relationship to the add_testers workflow, but sufficient for correct invocation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 100% schema description coverage, baseline is 3. Description reinforces the domain of 'connected_app_id' and 'id' by mentioning 'tester group' and 'connected app' but doesn't add syntax, validation rules, or format details beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clear verb-resource combination ('Gets a list of potential testers') and specific scope (users for a tester group). The phrase 'whom can be added as testers' implicitly distinguishes this from get_testers (likely current members), though it doesn't explicitly name the sibling tool.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides implied usage context by stating these users 'can be added as testers' and requires the connected app, hinting at the workflow. However, it lacks explicit guidance on when to use this vs get_testers or that it should be paired with add_testers_to_tester_group.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_tester_group (Grade: C)
Read-only, Idempotent

Gives back the details of the selected tester group.

Parameters (JSON Schema)

Name | Required | Description | Default
id | Yes | The uuidV4 identifier of the tester group. This field is mandatory. | -
connected_app_id | Yes | The uuidV4 identifier of the app the tester group is connected to. This field is mandatory. | -
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

While annotations already declare readOnlyHint=true and destructiveHint=false, the description adds no behavioral context beyond what annotations and the name imply. It does not describe error behavior (e.g., what happens if the ID is not found), what specific 'details' are returned, or the relationship between the tester group and the connected_app_id parameter.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence and appropriately brief, but 'gives back' is an informal word choice and 'selected' creates mild ambiguity. It is front-loaded but could be more precise with standard terminology like 'Returns' or 'Retrieves'.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given that no output schema exists, the description inadequately covers what data is returned—merely stating 'details' is insufficient. It also fails to clarify that this is a lookup operation requiring a valid UUID pair, or what constitutes a valid 'tester group' in this context.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 100% schema description coverage, the schema already fully documents both the 'id' and 'connected_app_id' parameters as UUIDv4 strings. The description adds no prose explanation of these parameters, but given the high schema coverage, it meets the baseline expectation without needing to compensate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 3/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states it retrieves ('gives back') details for a tester group, which is clear enough to identify the tool's function. However, 'gives back' is imprecise, and 'selected' is ambiguous—it implies the group is pre-selected rather than explaining that the tool performs a lookup by ID. It does not distinguish from list_tester_groups by explaining this is a single-item fetch.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus list_tester_groups (which returns multiple groups) or create_tester_group/update_tester_group. It does not mention that the IDs required as parameters must be obtained elsewhere, nor does it indicate prerequisites for invocation.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_testers (Grade: B)
Read-only, Idempotent

Gives back a list of testers that has been associated with a tester group related to a specific connected app.

Parameters (JSON Schema)

Name | Required | Description | Default
page | No | Specifies which page should be returned from the whole result set in a paginated scenario. Default value is 1. | -
items_per_page | No | Specifies the maximum number of testers to be returned that have been added to a tester group related to the specific connected app. Default value is 10. | -
tester_group_id | No | The uuidV4 identifier of a tester group. If given, only testers within this specific tester group will be returned. | -
connected_app_id | Yes | The uuidV4 identifier of the app the tester group is connected to. This field is mandatory. | -
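The optional filter behavior described in the schema can be illustrated with two hypothetical payloads: omitting tester_group_id lists all testers for the connected app, while supplying it narrows results to one group (IDs are placeholders):

```python
# Hypothetical get_testers payloads; only connected_app_id is required.
all_testers_args = {
    "connected_app_id": "c3a1f0de-5b2e-4f6a-9d7c-0a1b2c3d4e5f",
}
one_group_args = {
    **all_testers_args,
    # Adding the optional filter restricts results to a single group.
    "tester_group_id": "9e8d7c6b-5a4f-4e3d-8c2b-1a0f9e8d7c6b",
}
```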
Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnly/idempotent/destructive status. Description adds valuable context about the hierarchical relationship (testers→groups→apps) but omits behavioral details like pagination format or empty result handling.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence contains core information. Minor wordiness ('Gives back' vs 'Returns', 'that has been' vs 'associated') prevents a 5.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Lacks output schema; description mentions returning 'a list' but doesn't specify what tester objects contain. Optional parameter behavior (omitting tester_group_id to get all app testers) is implied but not explicit.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, establishing baseline 3. Description reinforces parameter purpose by mapping 'tester group' and 'connected app' to their respective IDs but does not add syntax/format details beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clear verb+resource: retrieves a list of testers. Scope is specific (associated with tester groups for a connected app) and distinguishes from siblings like get_tester_group (returns group metadata) and list_tester_groups (returns groups, not testers).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use vs alternatives (e.g., get_potential_testers), when to apply the optional tester_group_id filter, or how to handle pagination across multiple results.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_workspace (Grade: B)
Read-only, Idempotent

Get details for one workspace

Parameters (JSON Schema)

Name | Required | Description | Default
workspace_slug | Yes | Slug of the Bitrise workspace | -
Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The term 'Get' aligns with readOnlyHint=true and destructiveHint=false annotations, confirming read-only behavior. However, the description adds no additional behavioral context such as error handling for missing workspaces, rate limits, or data freshness expectations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Extremely concise at 5 words with no filler. Front-loaded with verb. However, brevity borders on under-specification given the lack of output schema to clarify what 'details' means.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given comprehensive annotations covering safety profile and 100% input schema coverage, the description is minimally viable. However, absence of output schema coupled with vague 'details' reference leaves return value semantics unexplained.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 100% schema description coverage ('Slug of the Bitrise workspace'), the baseline is 3. The description adds no supplementary parameter guidance beyond the schema, but none is needed given complete schema documentation.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

Uses specific verb 'Get' with resource 'workspace' and scope 'one', implicitly distinguishing from sibling list_workspaces. However, lacks specificity on what 'details' entails and does not explicitly contrast with related workspace operations like get_workspace_groups.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides no guidance on when to use this tool versus list_workspaces or other workspace-related operations. No mention of prerequisites or typical use cases.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_workspace_groups (Grade: B)
Read-only, Idempotent

Get the groups in a workspace

Parameters (JSON Schema)

Name | Required | Description | Default
workspace_slug | Yes | Slug of the Bitrise workspace | -
Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations declare readOnlyHint=true, destructiveHint=false, and idempotentHint=true, establishing the safety profile. The description adds no behavioral context beyond this (e.g., return format, pagination), meeting the lower bar set by comprehensive annotations without contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence of six words is appropriately sized and front-loaded. While extremely brief, it efficiently conveys the core purpose without redundancy or wasted text.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple read operation with one parameter and comprehensive annotations, the description provides minimum viable context. However, lacks explanation of what constitutes a 'group' or return value details, which would be helpful given the absence of an output schema.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema has 100% description coverage with 'workspace_slug' defined as 'Slug of the Bitrise workspace'. The description implicitly references the workspace scope but adds no syntax, format details, or semantics beyond the schema, warranting the baseline score.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

Uses specific verb 'Get' with clear resource 'groups in a workspace', identifying the scope and target. However, does not explicitly differentiate from similar sibling tools like get_workspace_members or create_workspace_group.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides no guidance on when to use this tool versus alternatives such as get_workspace_members or create_workspace_group. No mention of prerequisites or conditions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_workspace_members (Grade: C)
Read-only, Idempotent

Get the members of a workspace

Parameters (JSON Schema)

Name | Required | Description | Default
workspace_slug | Yes | Slug of the Bitrise workspace | -
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations cover read-only/idempotent/destructive traits, but description adds no behavioral context regarding return format, pagination, or whether pending invitations are included.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Extremely concise at six words with no redundancy, though minimalism sacrifices important contextual details.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Sufficient for a simple single-parameter read operation, but lacks output description (no output schema exists) and omits Bitrise context (only mentioned in parameter schema).

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with workspace_slug described as 'Slug of the Bitrise workspace'. Description adds no additional parameter context, meeting baseline for high-coverage schemas.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 3/5

Does the description clearly state what the tool does and how it differs from similar tools?

States clear verb (Get) and resource (members of workspace) but lacks specificity about what constitutes a member and doesn't distinguish from sibling tools like get_workspace_groups or invite_member_to_workspace.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides no guidance on when to use this tool versus alternatives such as invite_member_to_workspace, or prerequisites like workspace existence.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

invite_member_to_workspace (Grade: B)

Invite new Bitrise users to a workspace.

Parameters (JSON Schema)

Name | Required | Description | Default
email | Yes | Email address of the user | -
workspace_slug | Yes | Slug of the Bitrise workspace | -
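A hypothetical payload for this tool, with a naive pre-flight check of the two required fields (the address and slug are placeholders, and the check is illustrative, not part of the Bitrise API):

```python
# Hypothetical invite_member_to_workspace arguments.
args = {
    "email": "new.user@example.com",
    "workspace_slug": "example-workspace",
}

def looks_valid(payload):
    # Minimal sanity check before invoking the tool: both required fields
    # present, and the email at least contains an '@'.
    return bool(payload.get("workspace_slug")) and "@" in payload.get("email", "")
```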
Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations cover safety profile (readOnly=false, destructive=false, openWorld=true). The description adds minimal context beyond annotations—'invite' implies email delivery but doesn't confirm external side effects, error cases (duplicate invites), or permission requirements. No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence of seven words, front-loaded with the action verb. No redundant or wasted text. However, extreme brevity leaves room for ambiguity about behavioral details (email delivery, idempotency) that could fit in one additional clause.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With simple 2-parameter schema (fully described), output schema absent, and annotations covering mutation/external traits, the description meets minimum viability. Missing: explicit confirmation that emails are sent (openWorld implication), error scenarios, and required caller permissions.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with complete descriptions for 'email' and 'workspace_slug'. With high schema coverage, baseline is 3. Description lists no additional constraints (email format validation, workspace slug format) or examples beyond what schema provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

States specific action verb 'Invite' and resource 'workspace'. Distinguishes from sibling 'add_member_to_group' by specifying 'new Bitrise users' (external onboarding) vs existing member management. However, does not explicitly name sibling alternatives or clarify the distinction.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use versus alternatives (add_member_to_group), prerequisites (admin permissions), or conditions (user already invited). The word 'new' implies external users, but lacks explicit when/when-not rules.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_apps (Grade: B)
Read-only, Idempotent

List all apps for the currently authenticated user account

Parameters (JSON Schema)

Name | Required | Description | Default
next | No | Slug of the first app in the response | -
limit | No | Max number of elements per page (default: 50) | -
title | No | Filter apps by title | -
sort_by | No | Order of the apps. If set, you should accept the response as sorted. | last_build_at
project_type | No | Filter apps by project type (e.g., 'ios', 'android') | -
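The cursor-based pagination the review mentions can be sketched as follows. Here fetch_page stands in for a real MCP tool call, and the canned pages mimic responses where 'next' holds the slug of the first app on the following page; the app slugs are placeholders:

```python
# Illustrative cursor pagination over list_apps (not the real API client).
PAGES = {
    None: {"data": ["app-a", "app-b"], "next": "app-c"},
    "app-c": {"data": ["app-c", "app-d"], "next": None},
}

def fetch_page(cursor):
    # Real code would pass {"limit": 50, "next": cursor} to list_apps.
    return PAGES[cursor]

def list_all_apps():
    apps, cursor = [], None
    while True:
        page = fetch_page(cursor)
        apps.extend(page["data"])
        cursor = page["next"]
        if cursor is None:
            return apps
```

Following the 'next' cursor until it is absent yields the full result set across pages.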
Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Adds auth scope context ('currently authenticated user account') not present in annotations, but fails to disclose pagination behavior (cursor-based 'next' token), default limit behavior, or sorting implications despite these being significant behavioral traits.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence of 9 words is appropriately front-loaded with the action verb. However, extreme brevity sacrifices completeness—no room for pagination hints or filtering capabilities despite the rich parameter set.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Adequate for a basic list operation with good safety annotations, but underserves the tool's complexity. With 5 parameters supporting filtering, pagination, and sorting, the description should acknowledge these capabilities rather than implying a simple unfiltered dump.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 100% schema description coverage, the schema adequately documents all 5 parameters. The description mentions no parameters, relying entirely on structured schema documentation, meeting the baseline for high-coverage schemas.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

States specific verb (List) and resource (apps) with clear scope (currently authenticated user account). However, it fails to explicitly distinguish from sibling 'get_app' which retrieves a single app versus listing multiple.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides no guidance on when to use this versus 'get_app' or other list operations. No mention of prerequisites or when filtering via 'title' or 'project_type' parameters is appropriate.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_artifacts (Grade: C)
Read-only, Idempotent

Get a list of all build artifacts.

Parameters (JSON Schema)

Name | Required | Description | Default
next | No | Slug of the first artifact in the response | -
limit | No | Max number of elements per page (default: 50) | -
app_slug | Yes | Identifier of the Bitrise app | -
build_slug | Yes | Identifier of the build | -
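A hypothetical argument payload reflecting the prerequisite chain the review notes: both slugs would normally come from earlier list_apps and list_builds calls, and the values here are placeholders:

```python
# Hypothetical list_artifacts arguments; both slugs are required.
args = {
    "app_slug": "example-app-slug",      # from a prior list_apps result
    "build_slug": "example-build-slug",  # from a prior list_builds result
    "limit": 50,                         # optional, default 50
}

# Artifacts are scoped to one build of one app, so both slugs must be set.
assert {"app_slug", "build_slug"} <= args.keys()
```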
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations cover read-only/idempotent/destructive traits, but the description adds no behavioral context about pagination (despite next/limit parameters), response format, or scope limitations. It should clarify results are scoped to the specified build.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single short sentence avoids bloat, but constitutes under-specification rather than effective conciseness. It wastes the opportunity to front-load critical differentiators (e.g., per-build scope, pagination).

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Inadequate for a paginated list operation requiring parent resource context. Lacks explanation of pagination mechanics, the relationship between app_slug/build_slug parameters and the returned artifacts, and how this differs from artifact deletion or update workflows.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema has 100% coverage with all 4 parameters documented. The description adds no parameter context (e.g., that 'next' handles pagination cursors, or that results require providing valid app/build slugs), maintaining baseline score.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clear verb+resource (Get/list + build artifacts) and specifies 'all' suggesting comprehensive listing. However, it fails to distinguish from sibling tool 'list_installable_artifacts' or clarify that this retrieves artifacts for a specific build (not globally).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use versus alternatives (e.g., get_artifact for single artifact retrieval), nor prerequisites that app_slug and build_slug must be obtained from prior API calls (e.g., list_builds).

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_available_stacks (Grade: A)
Read-only, Idempotent

List available stacks with their machine configurations and version information. When a workspace_slug is provided, returns stacks available for that workspace including any custom stacks. When omitted, returns globally available stacks.

Parameters (JSON Schema)

Name | Required | Description | Default
workspace_slug | No | Slug of the Bitrise workspace. When provided, lists stacks available for that workspace (including custom stacks). When omitted, lists globally available stacks. | -
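The workspace-versus-global branching praised below can be shown with the two call shapes (the slug is a placeholder, and the scope helper is illustrative):

```python
# The two hypothetical call shapes for list_available_stacks.
global_args = {}  # omit workspace_slug: globally available stacks
workspace_args = {"workspace_slug": "example-workspace"}  # adds custom stacks

def scope(payload):
    # Mirrors the documented branching: workspace-scoped vs. global listing.
    return "workspace" if "workspace_slug" in payload else "global"
```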
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations declare readOnly/idempotent safety. Description adds valuable behavioral context: explains return data includes 'machine configurations and version information' and clarifies the workspace/global scoping logic not evident in annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two efficient sentences with zero waste. Front-loaded with action and return value, followed by conditional parameter logic. Every word earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Appropriately complete for a list tool with good annotations. Explains the optional parameter's effect, mentions return data characteristics, and covers the workspace/global dichotomy. No output schema exists, but the description adequately covers what the tool returns.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so schema carries the descriptive load. The description reinforces the parameter logic (workspace vs. global) but does not add syntax details, validation rules, or examples beyond what the schema provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Excellent specificity: verb 'List' + resource 'available stacks' + return data 'machine configurations and version information'. The workspace vs. global scoping clearly distinguishes this from generic list operations.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Clearly explains conditional behavior: when to provide workspace_slug (for workspace-specific and custom stacks) vs. when to omit (for globally available stacks). Lacks explicit naming of sibling alternatives, but the parameter guidance is clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_branches (Grade: A)
Read-only, Idempotent

List the branches with existing builds of an app's repository.

ParametersJSON Schema
NameRequiredDescriptionDefault
app_slugYesIdentifier of the Bitrise app
Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already cover readOnly/destructive/idempotent behavior. The description adds valuable behavioral context: the filter condition 'with existing builds' clarifies this doesn't return all branches, only those with associated builds. Does not disclose pagination behavior or empty results handling.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
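For reference, the structured annotations this review keeps citing travel alongside the tool definition itself. A minimal sketch of how list_branches might be declared, assuming the MCP spec's standard annotation field names (the exact declaration is illustrative, not taken from this server's source):

```python
# Hypothetical MCP tool definition showing where the readOnly/destructive/
# idempotent hints live relative to the description text under review.
tool = {
    "name": "list_branches",
    "description": "List the branches with existing builds of an app's repository.",
    "annotations": {
        "readOnlyHint": True,      # does not modify state
        "destructiveHint": False,  # cannot delete or overwrite data
        "idempotentHint": True,    # repeated calls return the same result
    },
    "inputSchema": {
        "type": "object",
        "properties": {
            "app_slug": {
                "type": "string",
                "description": "Identifier of the Bitrise app",
            }
        },
        "required": ["app_slug"],
    },
}
```

Because the annotations already carry the safety profile, the description's job is the behavioral context the hints cannot express (filters, pagination, empty-result handling).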

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence, front-loaded with verb 'List'. Zero wasted words. The qualifying phrase 'with existing builds' is essential for scoping and earns its place. Appropriately sized for tool complexity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 100% schema coverage, strong annotations, and only 1 parameter, the description adequately covers the tool's purpose. Without output schema, a brief indication of return format (branch names vs. objects) would be helpful but not required for this simple list operation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with 'app_slug' described as 'Identifier of the Bitrise app'. Description references 'app's repository' which loosely maps to the parameter but adds no syntax details, constraints, or examples beyond the schema definition. Baseline 3 appropriate when schema does heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Specific verb 'List' paired with clear resource 'branches with existing builds' and scope 'of an app's repository'. The phrase 'with existing builds' effectively distinguishes this from sibling list_builds (which returns build objects) and list_apps (which returns apps).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The phrase 'with existing builds' implies this tool filters to branches that have build history, suggesting when to use it (when you need to find buildable branches vs. all branches). However, no explicit when/when-not guidance or comparison to alternatives like list_builds is provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_build_distribution_versions (C)
Read-only, Idempotent

Lists Build Distribution versions. Release Management offers a convenient, secure solution to distribute the builds of your mobile apps to testers without having to engage with either TestFlight or Google Play. Once you have installable artifacts, Bitrise can generate both private and public install links that testers or other stakeholders can use to install the app on real devices via over-the-air installation. Build distribution allows you to define tester groups that can receive notifications about installable artifacts. The email takes the notified testers to the test build page, from where they can install the app on their own device. Build distribution versions are the app versions available for testers.

Parameters (JSON Schema)
- page (optional): Specifies which page should be returned from the whole result set in a paginated scenario. Default value is 1.
- items_per_page (optional): Specifies the maximum number of build distribution versions returned per page. Default value is 10.
- connected_app_id (required): The uuidV4 identifier of the app the build distribution is connected to. This field is mandatory.
Behavior 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, destructiveHint=false, and idempotentHint=true, so safety characteristics are covered. The description adds domain context defining what versions are, but does not disclose API-specific behavioral traits like error conditions (e.g., empty result sets), rate limits, or pagination behavior beyond the schema defaults.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 2/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is front-loaded with the key action but then expands into four sentences of generic domain explanation about Build Distribution concepts (TestFlight alternatives, OTA installation, tester notifications) that apply to multiple tools in this domain. Only the first and last sentences are specific to this listing endpoint; the middle content is noise for tool selection.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With no output schema provided, the description compensates minimally by defining what versions are, but fails to describe the response structure, fields returned, or pagination handling. Given the 100% input schema coverage, it meets minimum viability but leaves significant gaps for an agent expecting to process the output.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema has 100% coverage with detailed descriptions for all three parameters (page, items_per_page, connected_app_id). The description text adds no parameter-specific guidance, but the schema completeness merits the baseline score of 3.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description opens with 'Lists Build Distribution versions' and defines them as 'the app versions available for testers,' providing a specific verb and resource. However, it fails to explicitly distinguish this from the sibling tool `list_build_distribution_version_test_builds` (which lists builds for a specific version), potentially causing confusion between listing versions versus listing test builds for a version.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides extensive domain background about what Build Distribution is and how it works (OTA installation, tester groups, emails), but offers no guidance on when to invoke this tool versus alternatives, prerequisites (e.g., needing a `connected_app_id` from elsewhere), or workflow sequencing.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
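The page/items_per_page pair implies plain page-number pagination. A hedged sketch of the loop an agent might run, assuming a generic `call_tool` helper and a `data` array in the response (neither is documented by an output schema):

```python
# Illustrative page-based pagination over list_build_distribution_versions.
# `call_tool` and the response's `data` field are assumptions for the sketch.
def list_all_versions(call_tool, connected_app_id, items_per_page=10):
    versions, page = [], 1
    while True:
        resp = call_tool("list_build_distribution_versions", {
            "connected_app_id": connected_app_id,  # uuidV4, mandatory
            "page": page,
            "items_per_page": items_per_page,
        })
        batch = resp.get("data", [])
        versions.extend(batch)
        if len(batch) < items_per_page:  # short page signals the last page
            break
        page += 1
    return versions
```

A description mentioning this loop (or at least the response shape) would close the gap the Completeness and Usage Guidelines scores identify.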

list_build_distribution_version_test_builds (B)
Read-only, Idempotent

Gives back a list of test builds for the given build distribution version.

Parameters (JSON Schema)
- page (optional): Specifies which page should be returned from the whole result set in a paginated scenario. Default value is 1.
- version (required): The version of the build distribution. This field is mandatory.
- items_per_page (optional): Specifies the maximum number of test builds to return for a build distribution version per page. Default value is 10.
- connected_app_id (required): The uuidV4 identifier of the app the build distribution is connected to. This field is mandatory.
Behavior 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, idempotentHint=true, and destructiveHint=false, covering safety and mutation profiles. The description adds the scoping constraint that results are filtered to a specific build distribution version, but omits behavioral details like pagination limits or expiration policies despite the presence of pagination parameters.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence with no redundant words. It is front-loaded with the core action and resource. Minor deduction for the slightly informal 'Gives back' instead of a precise verb like 'Returns' or 'Lists'.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the presence of pagination parameters and openWorldHint annotations, the description is minimally adequate but incomplete. It fails to mention pagination behavior or the fact that results may change based on external state (open world), which would help an agent understand result variability.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 100% schema description coverage, the schema fully documents all four parameters including pagination controls. The description implies the 'version' parameter through the phrase 'for the given build distribution version', but adds no syntax guidance or semantic details beyond what the schema already provides, meeting the baseline for high-coverage schemas.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description identifies the resource ('test builds') and scope ('for the given build distribution version'), distinguishing it from the sibling tool 'list_build_distribution_versions'. However, 'Gives back' is an awkward verb choice compared to the standard 'Lists' or 'Returns', slightly reducing clarity.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives, nor does it mention prerequisites (e.g., that a user likely needs to obtain build distribution versions from 'list_build_distribution_versions' before using this tool).

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_builds (B)
Read-only, Idempotent

List all the builds of a specified Bitrise app or all accessible builds.

Parameters (JSON Schema)
- next (optional): Slug of the first build in the response
- limit (optional): Max number of elements per page (default: 50)
- branch (optional): Filter builds by branch
- status (optional): Filter builds by status (0: not finished, 1: successful, 2: failed, 3: aborted, 4: in-progress)
- sort_by (optional): Order of builds: created_at (default), running_first
- verbose (optional): Include all build details. Default: false
- app_slug (optional): Identifier of the Bitrise app
- workflow (optional): Filter builds by workflow
Behavior 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations declare readOnlyHint=true, destructiveHint=false, and idempotentHint=true, covering safety profile. The description adds minimal behavioral context beyond this, failing to clarify pagination behavior despite presence of 'next'/'limit' parameters, or what 'accessible' means regarding permissions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The single-sentence description is appropriately concise at 13 words with no redundancy. However, it lacks structural elements like prerequisite hints ('Requires app_slug for specific app builds') that could improve scannability without sacrificing brevity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 7 parameters with full schema coverage and read-only annotations, the description suffices for basic invocation. However, with no output schema provided, the description should ideally clarify return structure or pagination semantics—the gap is noticeable for a listing operation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 100% schema description coverage, the baseline score is 3. The description provides minimal additional semantic value—only loosely mapping 'specified Bitrise app' to the app_slug parameter. It does not explain parameter interactions (e.g., that 'next' requires previous response data) or validation constraints.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description uses a specific verb 'List' with clear resource 'builds' and defines scope ('specified Bitrise app or all accessible builds'). However, it doesn't explicitly distinguish from sibling 'get_build' which retrieves a single build by slug.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies the optional nature of app_slug by contrasting 'specified...app' with 'all accessible builds', giving implicit guidance on filtering. However, it lacks explicit when-to-use guidance versus 'get_build' or other build-related tools, and doesn't explain pagination workflows.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
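The pagination workflow the review says is missing from the description is cursor-style: per the schema, `next` is the slug of the first build on the following page, taken from the previous response. A sketch under those assumptions (`call_tool` and the `data`/`paging.next` response fields are illustrative, not documented by an output schema):

```python
# Illustrative slug-cursor pagination over list_builds.
# The response shape (`data`, `paging.next`) is an assumption for the sketch.
def iter_builds(call_tool, app_slug=None, **filters):
    cursor = None
    while True:
        args = {"limit": 50, **filters}   # e.g. branch=..., status=1, workflow=...
        if app_slug:
            args["app_slug"] = app_slug   # omit to list all accessible builds
        if cursor:
            args["next"] = cursor         # cursor from the previous response
        resp = call_tool("list_builds", args)
        yield from resp.get("data", [])
        cursor = resp.get("paging", {}).get("next")
        if not cursor:                    # no cursor => last page
            break
```

Spelling out that `next` must come from a prior response would earn the Parameters dimension more than its baseline score.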

list_build_workflows (C)
Read-only, Idempotent

List the workflows of an app.

Parameters (JSON Schema)
- app_slug (required): Identifier of the Bitrise app
Behavior 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

While annotations correctly identify the operation as read-only, idempotent, and non-destructive, the description adds no behavioral context such as what data the workflows contain, whether the list is paginated, caching behavior, or what the response structure looks like given the absence of an output schema.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The six-word single sentence contains no redundant information and efficiently states the core action. While appropriately front-loaded, its extreme brevity borders on under-specification given the lack of output schema documentation.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Considering the low complexity (single required parameter) and rich annotations covering safety properties, the description minimally suffices for invocation. However, the absence of both an output schema and any description of return values or workflow structure leaves a significant gap in contextual completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 100% schema description coverage for the app_slug parameter ('Identifier of the Bitrise app'), the description meets the baseline by implicitly referencing 'an app' which maps to the required parameter. It adds no additional semantic details about the parameter format or constraints beyond what the schema provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description 'List the workflows of an app' provides a clear verb and resource, identifying that it retrieves workflow configurations. However, it does not differentiate from sibling tools like list_builds or list_pipelines, nor does it clarify what distinguishes a 'workflow' from a 'build' or 'pipeline' in this context.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives such as get_build or list_pipelines. It omits prerequisites, failure conditions, or specific scenarios where this listing would be preferred over other retrieval tools in the sibling set.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_cache_items (B)
Read-only, Idempotent

List the key-value cache items belonging to an app.

Parameters (JSON Schema)
- next (optional): Getting cache items created before the given parameter (RFC3339 time format)
- limit (optional): Max number of elements per page (default: 100)
- app_slug (required): Identifier of the Bitrise app
Behavior 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnly/idempotent/destructive hints, so description doesn't need to cover safety. It adds 'key-value' specificity which is helpful context. However, it fails to disclose the pagination behavior implied by limit/next parameters or what the return structure contains (keys versus full objects).

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence, appropriately brief. However, given the pagination complexity (RFC3339 cursor-based 'next' parameter), one sentence is insufficient to convey necessary usage context, making it under-structured rather than optimally concise.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With no output schema, the description should indicate what gets returned (cache keys? metadata?). While 100% parameter coverage helps, the lack of pagination guidance and return value description leaves gaps for a tool requiring cursor-based navigation through potentially large datasets.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema fully documents all three parameters. The description only reinforces the 'app' requirement via 'belonging to an app', adding no additional syntax guidance, format details, or pagination strategy beyond what the schema provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clear verb 'List' with specific resource 'key-value cache items' and scope 'belonging to an app'. However, it does not explicitly distinguish from sibling 'get_cache_item_download_url' (which retrieves a specific item's download URL) or clarify whether this returns keys, metadata, or full values.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides no guidance on when to use this versus 'delete_cache_item' or 'get_cache_item_download_url'. Does not explain pagination workflow (that 'next' parameter should be populated from previous response cursors) or prerequisites.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
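Unlike the slug cursors used elsewhere in this server, list_cache_items uses a time cursor: per the schema, `next` is an RFC3339 timestamp and each call returns items created before it. One plausible loop, assuming a `created_at` field on each returned item and a generic `call_tool` helper (both are assumptions, since no output schema exists):

```python
# Illustrative RFC3339 time-cursor pagination over list_cache_items.
# Items are fetched newest-first; the oldest item's created_at seeds the next call.
def iter_cache_items(call_tool, app_slug, limit=100):
    cursor = None
    while True:
        args = {"app_slug": app_slug, "limit": limit}
        if cursor:
            args["next"] = cursor             # RFC3339, e.g. "2024-05-01T12:00:00Z"
        resp = call_tool("list_cache_items", args)
        items = resp.get("data", [])
        yield from items
        if len(items) < limit:                # short page => done
            break
        cursor = items[-1]["created_at"]      # assumed field name
```

This is exactly the kind of pagination workflow the review says the one-sentence description leaves the agent to infer.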

list_connected_apps (A)
Read-only, Idempotent

List Release Management connected apps available for the authenticated account within a workspace.

Parameters (JSON Schema)
- page (optional): Specifies which page should be returned from the whole result set in a paginated scenario. Default value is 1.
- search (optional): Search by bundle ID (for ios), package name (for android), or app title (for both platforms). The filter is case-sensitive.
- platform (optional): Filters for a specific mobile platform for the list of connected apps. Available values are: 'ios' and 'android'.
- project_id (optional): Specifies which Bitrise Project you want to get associated connected apps for
- items_per_page (optional): Specifies the maximum number of connected apps returned per page. Default value is 10.
- workspace_slug (required): Identifier of the Bitrise workspace for the Release Management connected apps. This field is mandatory.
Behavior 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations declare readOnlyHint=true and idempotentHint=true, covering safety profile. The description adds domain context ('Release Management') and auth scope ('authenticated account') but omits behavioral details like pagination limits, default sorting, or total count availability that would help the agent handle the paginated results.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence of 13 words with zero redundancy. Front-loaded with verb and resource, followed by scope and auth context. Every word earns its place with no filler.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 100% schema coverage and comprehensive annotations covering safety/idempotency, the description successfully conveys the high-level purpose without needing to document return values (no output schema). Minor gap: could acknowledge the filtering/pagination capabilities briefly since those are key to effective usage.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 100% schema description coverage, the schema carries the full documentation burden. The description reinforces the workspace context but does not add syntax details, examples, or cross-parameter relationships (e.g., how search interacts with platform) beyond what the schema provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description uses specific verb 'List' with resource 'Release Management connected apps' and scope 'within a workspace'. It implicitly distinguishes from sibling 'get_connected_app' by using plural form and 'List' vs 'Get', but does not explicitly clarify when to use this versus the singular fetch operation.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description mentions 'within a workspace' providing context for the required workspace_slug parameter, but provides no explicit when-to-use guidance, filtering recommendations, or comparisons to siblings like 'get_connected_app' or 'create_connected_app'.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_group_roles (C)
Read-only, Idempotent

List group roles for an app

Parameters (JSON Schema)
- app_slug (required): Identifier of the Bitrise app
- role_name (required): Name of the role
Behavior 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnly/idempotent/destructive status, but the description fails to add context about what data structure is returned, pagination behavior, or how the 'role_name' filter constrains results. The openWorldHint annotation suggests external resource access, but the description doesn't explain what external data might be retrieved.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Extremely brief at six words with no redundant text. However, the brevity sacrifices necessary contextual detail about the query pattern and return values.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Incomplete given the tool's likely complexity. With no output schema provided, the description omits crucial information: the return format (groups? users? permissions?), the semantic meaning of querying by both app and role_name, and how results relate to group membership management.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, with 'Identifier of the Bitrise app' and 'Name of the role' adequately documenting the parameters. The description doesn't add syntax details, format constraints, or explain why role_name is required for a 'list' operation (implying filtering), warranting the baseline score for high schema coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 3/5

Does the description clearly state what the tool does and how it differs from similar tools?

States the basic action (List) and resource (group roles) but remains ambiguous about the relationship between inputs and outputs. The required 'role_name' parameter suggests this filters or queries specific role assignments rather than enumerating all roles, but the description doesn't clarify this scope limitation.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides no guidance on when to use this versus mutation alternatives like 'replace_group_roles' or when querying group membership via sibling tools. No prerequisites or exclusion criteria are mentioned.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_installable_artifactsB
Read-onlyIdempotent
Inspect

List Release Management installable artifacts of a connected app available for the authenticated account.

ParametersJSON Schema
NameRequiredDescriptionDefault
pageNoSpecifies which page should be returned from the whole result set in a paginated scenario. Default value is 1.
branchNoFilters for the Bitrise CI branch of the installable artifact on which it has been generated on.
searchNoSearch by version, filename or build number (Bitrise CI). The filter is case-sensitive.
sourceNoFilters for the source of installable artifacts to be returned. Available values are 'api' and 'ci'.
versionNoFilters for the version this installable artifact was created for. This field is required if the distribution_ready filter is set to true.
platformNoFilters for a specific mobile platform for the list of installable artifacts. Available values are: 'ios' and 'android'.
workflowNoFilters for the Bitrise CI workflow of the installable artifact it has been generated by.
after_dateNoA date in ISO 8601 string format specifying the start of the interval when the installable artifact to be returned was created or uploaded. This value will be defaulted to 1 month ago if distribution_ready filter is not set or set to false.
before_dateNoA date in ISO 8601 string format specifying the end of the interval when the installable artifact to be returned was created or uploaded. This value will be defaulted to the current time if distribution_ready filter is not set or set to false.
store_signedNoFilters for store ready installable artifacts. This means signed .aab and .ipa (with distribution type app-store) installable artifacts.
artifact_typeNoFilters for a specific artifact type or file extension for the list of installable artifacts. Available values are: 'aab' and 'apk' for android artifacts and 'ipa' for ios artifacts.
items_per_pageNoSpecifies the maximum number of installable artifacts to be returned per page. Default value is 10.
connected_app_idYesIdentifier of the Release Management connected app for the installable artifacts. This field is mandatory.
distribution_readyNoFilters for distribution ready installable artifacts. This means .apk and .ipa (with distribution type ad-hoc, development, or enterprise) installable artifacts.
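The parameter interactions in the table above (version being mandatory when distribution_ready is true, and the one-month default date window otherwise) can be enforced client-side before calling the tool. A minimal sketch; the helper name and keyword handling are illustrative, only the parameter names come from the schema:

```python
from datetime import datetime, timedelta, timezone

def installable_artifact_filters(connected_app_id, *, distribution_ready=False,
                                 version=None, after_date=None, before_date=None,
                                 **extra):
    # Enforce the documented interaction: version is mandatory when
    # distribution_ready is true.
    if distribution_ready and version is None:
        raise ValueError("version is required when distribution_ready is true")
    params = {"connected_app_id": connected_app_id, **extra}
    if distribution_ready:
        params["distribution_ready"] = True
        params["version"] = version
    else:
        # Mirror the stated server-side defaults: one month ago .. now (ISO 8601).
        now = datetime.now(timezone.utc)
        params["after_date"] = after_date or (now - timedelta(days=30)).isoformat()
        params["before_date"] = before_date or now.isoformat()
    return params
```

Catching the missing-version case locally saves a round trip that would otherwise fail server-side.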
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate read-only, non-destructive, idempotent, and open-world behavior. The description adds value by specifying 'authenticated account' (auth context) and 'Release Management' (domain), but lacks details on rate limits, pagination behavior, or filter interactions (e.g., version being required when distribution_ready is true).

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficiently constructed sentence that front-loads the verb ('List') and uses precise domain terminology without redundancy or filler.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the high schema coverage (100%) and presence of annotations, the description adequately orients the agent to the tool's purpose. However, for a complex tool with 14 filtering parameters and no output schema, it lacks guidance on the filtering capabilities and typical usage patterns, leaving the agent to discover this through the schema alone.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 100% schema description coverage, the schema already documents all 14 parameters thoroughly. The description adds high-level context ('Release Management installable artifacts', 'connected app') that gives semantic meaning to parameters like 'connected_app_id' and 'artifact_type', but does not explain individual parameters or their relationships beyond this.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly identifies the action ('List'), resource ('Release Management installable artifacts'), and scope ('of a connected app available for the authenticated account'). However, it does not explicitly differentiate from the sibling tool 'list_artifacts', relying only on implicit domain-specific qualifiers ('Release Management', 'installable').

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives like 'list_artifacts' or 'get_artifact', nor does it mention prerequisites (e.g., needing a valid connected_app_id) or when to avoid using it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_outgoing_webhooksB
Read-onlyIdempotent
Inspect

List the outgoing webhooks of an app.

ParametersJSON Schema
NameRequiredDescriptionDefault
nextNoSlug of the first outgoing webhook in the response
limitNoMax number of elements per page (default: 50)
app_slugYesIdentifier of the Bitrise app
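The next/limit parameters imply slug-cursor pagination. A sketch of how an agent might page through every webhook; the call_tool helper and the {"data": [...], "paging": {"next": ...}} response shape are assumptions for illustration, not part of the tool's documented contract:

```python
def list_all_outgoing_webhooks(call_tool, app_slug, limit=50):
    # Pass each response's cursor back as `next` until the server stops
    # returning one, accumulating every page's items.
    webhooks, cursor = [], None
    while True:
        params = {"app_slug": app_slug, "limit": limit}
        if cursor:
            params["next"] = cursor
        page = call_tool("list_outgoing_webhooks", params)
        webhooks.extend(page.get("data", []))
        cursor = page.get("paging", {}).get("next")
        if not cursor:
            return webhooks
```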
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Description aligns with annotations (readOnly/idempotent) by using 'List' and does not contradict them. However, it adds minimal behavioral context beyond annotations (e.g., silent on pagination behavior despite next/limit parameters).

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single efficient sentence with action front-loaded. Extremely terse to the point of under-specification, but contains no wasted words or structural issues.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Adequate for a simple list operation given rich annotations and complete schema coverage, but omits pagination behavior and return value description despite lacking output schema guidance.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% so baseline applies. Description mentions 'of an app' which links to the app_slug parameter, but adds no semantic detail beyond schema descriptions for pagination params (next/limit).

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

Uses specific verb 'List' with clear resource 'outgoing webhooks of an app', distinguishing it from sibling create/update/delete operations. Lacks platform context (Bitrise) and scope details (e.g., whether it lists all or paginated results).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides no guidance on when to use versus siblings (create_outgoing_webhook, update_outgoing_webhook) or prerequisites like app registration or permission requirements.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_pipelinesB
Read-onlyIdempotent
Inspect

List all pipelines and standalone builds of an app.

ParametersJSON Schema
NameRequiredDescriptionDefault
afterNoList pipelines/standalone builds run after a given date (RFC3339 time format)
limitNoMax number of elements per page (default: 10)
beforeNoList pipelines/standalone builds run before a given date (RFC3339 time format)
branchNoFilter by the branch which was built
statusNoFilter by the status of the pipeline/standalone build
verboseNoInclude all pipeline details. Default: false
app_slugYesIdentifier of the Bitrise app
pipelineNoFilter by the name of the pipeline
workflowNoFilter by the name of the workflow used for the pipeline/standalone build
build_numberNoFilter by the pipeline/standalone build number
commit_messageNoFilter by the commit message of the pipeline/standalone build
trigger_event_typeNoFilter by the event that triggered the pipeline/standalone build
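The before/after parameters take RFC3339 timestamps per the schema. A small sketch of assembling a filter dict for a recent time window; the helper name and defaults are illustrative, only the parameter names come from the table above:

```python
from datetime import datetime, timedelta, timezone

def pipeline_filters(app_slug, *, days=7, branch=None, status=None):
    # before/after must be RFC3339; omit optional filters when unset so the
    # server applies no filtering on those fields.
    now = datetime.now(timezone.utc)
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    params = {
        "app_slug": app_slug,
        "after": (now - timedelta(days=days)).strftime(fmt),
        "before": now.strftime(fmt),
    }
    if branch:
        params["branch"] = branch
    if status:
        params["status"] = status
    return params
```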
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations cover safety profile (readOnlyHint, destructiveHint, idempotentHint) adequately. Description adds scope ('all') but omits pagination behavior despite the presence of a `limit` parameter, and doesn't mention return format or filtering capabilities.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence with no redundant words. However, given the complexity (11 parameters, filtering capabilities), it may be overly terse rather than appropriately concise—omitting crucial behavioral context like pagination.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Minimal but viable for a listing tool. Acknowledges both pipelines and standalone builds (important domain distinction). However, with 11 parameters including date ranges and status filters, the description should explicitly mention that results are filterable and paginated. No output schema exists to compensate.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, establishing a baseline of 3. The description mentions 'standalone builds' which provides minor context for the `app_slug` parameter, but adds no syntax details, examples, or constraints beyond what the schema already provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

States specific verb (List) and resource (pipelines and standalone builds) clearly. However, it fails to differentiate from sibling tool `list_builds` or clarify the relationship between pipelines and standalone builds, which could confuse selection.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides no guidance on when to use this versus `get_pipeline` (single retrieval) or `list_builds` (overlapping domain), or when to apply specific filters. No prerequisites or contextual triggers are mentioned.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_tester_groupsA
Read-onlyIdempotent
Inspect

Gives back a list of tester groups related to a specific Release Management connected app.

ParametersJSON Schema
NameRequiredDescriptionDefault
pageNoSpecifies which page should be returned from the whole result set in a paginated scenario. Default value is 1.
items_per_pageNoSpecifies the maximum number of tester groups to return related to a specific connected app. Default value is 10.
connected_app_idYesThe uuidV4 identifier of the app the tester group is connected to. This field is mandatory.
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, destructiveHint=false, and idempotentHint=true. The description adds valuable domain context ('Release Management') and clarifies the data relationship (groups related to a specific app). It does not contradict annotations, though it could enhance transparency by noting pagination behavior or empty result handling.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single front-loaded sentence with no redundant words. Every clause earns its place by specifying action, resource, and domain constraint. Minor deduction for 'Gives back' being slightly informal; 'Returns' would be more standard.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Appropriately complete given the simple read-only list operation. The 100% schema coverage handles parameter details, while annotations cover safety traits. The description captures the essential business context (Release Management domain) without needing to elaborate return values.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 100% schema description coverage, the baseline is 3. The description adds semantic value by contextualizing the connected_app_id parameter as referring to a 'Release Management connected app,' helping the agent understand the domain relationship beyond the raw UUID parameter description.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly identifies the action (returns a list) and resource (tester groups), with specific domain context ('Release Management connected app'). It distinguishes this from sibling operations like create_tester_group or get_tester_group through the plural 'list' framing and app-specific scoping. Deducting one point for 'Gives back' being less precise than 'Returns' or 'Lists'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides implied usage context by requiring a 'specific Release Management connected app,' which hints at the prerequisite (connected_app_id). However, it lacks explicit guidance on when to use this versus get_tester_group (single retrieval) or how it relates to the broader tester management workflow.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_workspacesB
Read-onlyIdempotent
Inspect

List the workspaces the user has access to

ParametersJSON Schema
NameRequiredDescriptionDefault

No parameters

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint and destructiveHint. The description adds useful context about permission boundaries ('user has access to') not captured in annotations. However, it lacks disclosure about pagination behavior, response structure, or rate limiting despite having no output schema.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence of eight words with no redundancy. Front-loaded with the action verb. Every word earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

While the purpose is clear and annotations cover safety properties, the description lacks output structure details needed given the absence of an output schema. No mention of pagination or workspace object fields returned. Adequate for a simple list operation but incomplete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema contains zero parameters. Per scoring rules, zero parameters establishes a baseline of 4. The description appropriately reflects this simplicity without adding unnecessary parameter documentation.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clear verb ('List') and resource ('workspaces') with specific scope ('user has access to'). Distinguishes from get_workspace by implying a collection return filtered by user permissions, though it doesn't explicitly contrast with siblings.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this versus get_workspace or other workspace-related tools. Missing indication that this is typically a prerequisite for calling workspace-specific operations.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

meA
Read-onlyIdempotent
Inspect

Get user info for the currently authenticated user account

ParametersJSON Schema
NameRequiredDescriptionDefault

No parameters

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations cover safety profile (readOnly, idempotent, non-destructive), lowering the bar. The description adds valuable scoping context ('currently authenticated') not in annotations, but omits return format details or data fields included in 'user info'.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single efficient sentence with zero redundancy. The qualifier 'currently authenticated' is front-loaded and essential for distinguishing the resource scope.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Adequate for a zero-parameter read-only operation with good annotation coverage. Would benefit from mentioning what user attributes are returned (email, ID, etc.), but sufficient for tool selection given the simple 'me' pattern.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Zero parameters present (empty object schema), which establishes baseline 4. No parameter documentation needed or expected, and the description correctly implies no filtering arguments are accepted.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Specific verb 'Get' and explicit resource 'currently authenticated user account' clearly define the scope. The 'currently authenticated' qualifier effectively distinguishes this from sibling tools like get_workspace_members or get_testers that require IDs for other users.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides implied usage through the 'currently authenticated' qualifier (use when you need the current user's identity), but lacks explicit when/when-not guidance or named alternatives for fetching other users' data.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

notify_tester_groupCInspect

Notifies a tester group about a new test build.

ParametersJSON Schema
NameRequiredDescriptionDefault
idYesThe uuidV4 identifier of the tester group whose members will be notified about the test build.
test_build_idYesThe unique identifier of the test build that will be sent in the notification to the tester group.
connected_app_idYesThe uuidV4 identifier of the related Release Management connected app.
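Because the tool is not idempotent, calling it twice notifies testers twice. One mitigation is a caller-side deduplication guard; a sketch, assuming a hypothetical call_tool helper and a `sent` set the caller persists across retries:

```python
def notify_once(call_tool, sent, connected_app_id, group_id, test_build_id):
    # Skip the call if this (group, build) pair was already notified,
    # guarding against duplicate notifications on retry.
    key = (group_id, test_build_id)
    if key in sent:
        return False
    call_tool("notify_tester_group", {
        "connected_app_id": connected_app_id,
        "id": group_id,
        "test_build_id": test_build_id,
    })
    sent.add(key)
    return True
```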
Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

While annotations indicate this is a non-readonly, non-idempotent, open-world operation, the description adds no context about what 'notify' entails (email, push, in-app), nor does it warn about the implications of idempotentHint=false (duplicate notifications if called twice).

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The single-sentence description is efficiently worded with no redundancy. However, for a mutation tool with external side effects (openWorldHint=true), it may be overly terse, lacking the behavioral caveats that would justify a second sentence.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the rich schema coverage and safety annotations, the description meets minimum viability by identifying the core action. However, it lacks expected context for a notification tool: notification mechanism, idempotency warnings, and success/failure indicators.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 100% schema coverage, the baseline is 3. The description references 'tester group' and 'test build', mapping to the 'id' and 'test_build_id' parameters, but adds no semantic relationships (e.g., that the build must belong to the connected app) or format constraints beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description provides a specific verb ('Notifies') and clear resources ('tester group', 'test build'), establishing what the tool does. However, it does not explicitly differentiate from siblings like 'add_testers_to_tester_group' (which manages membership rather than sending notifications).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus alternatives, prerequisites (e.g., whether the test build must exist first), or sequencing with related operations. The description states only the action, not the conditions for its use.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

rebuild_pipelineC
Destructive
Inspect

Rebuild a pipeline.

ParametersJSON Schema
NameRequiredDescriptionDefault
partialNoWhether to rebuild only unsuccessful workflows and their dependents
app_slugYesIdentifier of the Bitrise app
pipeline_idYesIdentifier of the pipeline
triggered_byNoWho triggered the rebuild
Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate this is destructive and non-idempotent, but the description adds no behavioral context beyond the tautology. It does not explain what gets destroyed, whether this creates a new pipeline instance, or the side effects of the operation.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness2/5

Is the description appropriately sized, front-loaded, and free of redundancy?

While brief and front-loaded, the single sentence fails to earn its place—it conveys no information beyond the tool name itself, representing under-specification rather than efficient communication.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a destructive operation with no output schema, the description is inadequate. It lacks explanation of the rebuild lifecycle, interaction with existing pipeline runs, or the implications of the partial flag in the CI/CD workflow.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema has 100% description coverage, establishing a baseline of 3. The description adds no additional parameter semantics (e.g., explaining the relationship between partial and pipeline_id, or valid formats for triggered_by).

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose2/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description 'Rebuild a pipeline' is a tautology that restates the tool name 'rebuild_pipeline'. While the verb and resource are identifiable, it fails to distinguish from sibling tools like 'trigger_bitrise_build' or explain what rebuilding means in the Bitrise context.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides no guidance on when to use this tool versus alternatives (e.g., trigger_bitrise_build), prerequisites, or when to use the 'partial' parameter versus a full rebuild.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

register_appAInspect

Add a new app to Bitrise. After this app should be finished on order to be registered completely on Bitrise (via the finish_bitrise_app tool). Before doing this step, try understanding the repository details from the repository URL. This is a two-step process. First, you register the app with the Bitrise API, and then you finish the setup. The first step creates a new app in Bitrise, and the second step configures it with the necessary settings. If the user has multiple workspaces, always prompt the user to choose which one you should use. Don't prompt the user for finishing the app, just do it automatically.

ParametersJSON Schema
NameRequiredDescriptionDefault
titleNoThe title of the application (if not specified, will use the git repository's name)
providerNoThe git provider of the repository. Default: github
repo_urlYesRepository URL
is_publicYesWhether the app's builds visibility is "public"
organization_slugYesThe organization (aka workspace) the app to add to
default_branch_nameNoThe default branch of the repository. Verify this branch exists in the remote repository. Default: master
manual_approval_enabledNoToggles whether manual approval should be enabled for the app's builds
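The two-step flow the description mandates (register, then finish automatically without prompting) can be orchestrated in one helper. A sketch; the call_tool helper and the `app_slug` field in the response are assumptions about shapes the listing leaves unspecified:

```python
def register_and_finish(call_tool, repo_url, organization_slug, is_public=False,
                        **finish_config):
    # Step 1: register_app creates the app in Bitrise.
    created = call_tool("register_app", {
        "repo_url": repo_url,
        "organization_slug": organization_slug,
        "is_public": is_public,
    })
    app_slug = created["app_slug"]
    # Step 2: finish setup automatically, per the description's directive
    # not to prompt the user for this step.
    call_tool("finish_bitrise_app", {"app_slug": app_slug, **finish_config})
    return app_slug
```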
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate write operation (readOnlyHint:false) and non-destructive nature. Description adds valuable behavioral context that registration is incomplete after this call and requires automatic follow-up with finish_bitrise_app to configure settings. Explains the partial state creation behavior.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Contains redundant exposition of the two-step process ('First, you register... and then you finish' followed by 'The first step creates... and the second step configures'). Opening sentence is strong but contains grammatical errors ('After this app should be finished on order'). Every sentence provides value but could be tightened.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Thoroughly addresses the multi-step workflow complexity, prerequisite steps, automatic orchestration requirements with finish_bitrise_app, and workspace selection logic. Complete for a task-oriented tool with clear annotations and no output schema to document.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema has 100% coverage with all 7 parameters fully documented. Description mentions repository URL and workspace selection in workflow context, adding usage context for required fields, but given complete schema coverage, baseline 3 is appropriate as description doesn't add detailed format/syntax beyond schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states 'Add a new app to Bitrise' with specific verb and resource. Explicitly distinguishes from sibling finish_bitrise_app by positioning this as 'step 1' of a two-step registration process, clarifying this only creates the app while the sibling configures it.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit workflow guidance: prerequisite to understand repository details from URL first, instruction to prompt user for workspace selection if multiple exist, and clear directive to automatically invoke finish_bitrise_app without prompting. Mentions alternative/follow-up tool by name.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

register_ssh_keyC
Destructive
Inspect

Add an SSH-key to a specific app.

Parameters (JSON Schema)

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| app_slug | Yes | Identifier of the Bitrise app | |
| auth_ssh_public_key | Yes | Public SSH key | |
| auth_ssh_private_key | Yes | Private SSH key | |
| is_register_key_into_provider_service | No | Register the key in the provider service | |
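A hedged sketch of how a client might assemble these arguments, reading the key pair from disk; the helper name and file paths are illustrative assumptions, not part of the Bitrise API:

```python
from pathlib import Path

def build_register_ssh_key_args(app_slug, public_key_path, private_key_path,
                                register_with_provider=False):
    """Assemble the tool-call arguments. Both halves of the key pair are
    sent in the request, so treat the private key as a sensitive
    credential (avoid logging the returned dict)."""
    return {
        "app_slug": app_slug,
        "auth_ssh_public_key": Path(public_key_path).read_text().strip(),
        "auth_ssh_private_key": Path(private_key_path).read_text().strip(),
        "is_register_key_into_provider_service": register_with_provider,
    }
```

This also makes concrete the security concern raised below: the private key travels in the argument payload itself.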
Behavior 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description states 'Add' but fails to explain the destructive nature flagged in annotations (does it overwrite existing keys?), the idempotency risks, or the openWorld implications of registering with external provider services (implied by the boolean parameter). No security warnings about handling private keys are provided despite the sensitive credential type.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Extremely concise single sentence with no redundancy. However, for a destructive operation handling sensitive credentials (SSH private keys), the brevity may be excessive as it omits critical safety context that should be front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Inadequate for a credential-management tool with destructive annotations and no output schema. Missing critical context: side effects on existing keys, security handling of private keys in transit/storage, relationship to the provider service registration parameter, and expected outcomes.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 100% schema description coverage, the schema fully documents all 4 parameters including the optional provider registration flag. The description adds no additional parameter guidance (e.g., format expectations, security concerns for private_key), warranting the baseline score for high-coverage schemas.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clear verb (Add) and resource (SSH-key) with scope (to a specific app). However, it lacks detail on what the SSH key is used for (e.g., Git authentication vs deployment) and doesn't explicitly distinguish from potential sibling operations like updating/removing keys.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance provided on when to use this tool versus alternatives, prerequisites (e.g., app exists, permissions required), or specific use cases. The description assumes the user already knows why they need to register an SSH key.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

register_webhook (B)

Register an incoming webhook for a specific application.

Parameters (JSON Schema)

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| app_slug | Yes | Identifier of the Bitrise app | |

Behavior 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

While annotations indicate openWorldHint=true and idempotentHint=false, the description fails to disclose what registration entails (e.g., whether it generates a URL/secret, if it can be reversed, or that repeated calls create duplicate registrations given idempotentHint=false). No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence, front-loaded with the action verb, zero redundancy. Appropriate length for the parameter complexity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Minimal but marginally sufficient for a single-parameter tool. However, given the openWorldHint and lack of output schema, the description should explain what the registration produces (e.g., a webhook URL) or side effects.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 100% schema description coverage, the schema fully documents the app_slug parameter. The description implicitly maps to this parameter via 'for a specific application' but adds no semantic details about accepted formats or validation rules beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

States specific verb (Register) and resource (webhook) and adds 'incoming' to distinguish from sibling tools like create_outgoing_webhook. However, it doesn't clarify the functional difference between 'register' and 'create' operations or when to prefer this over outgoing webhooks.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides no guidance on when to use this tool versus alternatives (e.g., create_outgoing_webhook), prerequisites for registration, or expected workflow. The phrase 'for a specific application' merely restates the app_slug parameter requirement.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

replace_group_roles (C)
Destructive, Idempotent

Replace group roles for an app.

Parameters (JSON Schema)

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| app_slug | Yes | Identifier of the Bitrise app | |
| role_name | Yes | Name of the role | |
| group_slugs | Yes | List of group slugs | |
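The replace-versus-add ambiguity flagged in the scores below can be made concrete with a toy model. Whether Bitrise replaces all group assignments for the given role or merges them is not stated in the description; this sketch assumes wholesale replacement per role:

```python
def replace_group_roles(app_roles, role_name, group_slugs):
    """Overwrite the group list for one role; previous assignments for
    that role are discarded, which is the destructive part. Other roles
    are untouched, and repeating the call yields the same state
    (idempotent)."""
    new_roles = dict(app_roles)
    new_roles[role_name] = list(group_slugs)
    return new_roles
```

Under this reading, any group previously holding the role but absent from `group_slugs` silently loses access.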
Behavior 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

While annotations indicate destructive, idempotent, and open-world behavior, the description adds no explanatory context about what data is destroyed, whether the operation is reversible, or how idempotency manifests (e.g., safe retries). The word "replace" minimally aligns with but does not enrich the annotation hints.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The five-word description contains no redundancy or wasted language, presenting the core operation upfront. However, extreme brevity sacrifices necessary detail for a destructive permission operation.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a destructive, idempotent operation affecting access control, the description inadequately clarifies the replacement scope (specific group-role assignments vs. wholesale replacement), failure modes, or expected outcomes, leaving critical gaps despite rich annotations.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 100% schema description coverage defining app_slug, role_name, and group_slugs, the description provides no additional parameter semantics (e.g., format constraints, valid role values, or group slug requirements), earning the baseline score.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 3/5

Does the description clearly state what the tool does and how it differs from similar tools?

States the verb "replace" and resource "group roles" with context "for an app", but lacks specificity regarding replacement scope (whether it replaces all existing app group roles or only modifies assignments for specified groups) and fails to distinguish from sibling read operations like list_group_roles.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides no guidance on when to use this tool versus alternatives (e.g., add_member_to_group or create_workspace_group), omits prerequisites such as existing group requirements, and fails to mention the destructive nature implied by annotations.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

set_installable_artifact_public_install_page (B)
Destructive

Changes whether public install page should be available for the installable artifact or not.

Parameters (JSON Schema)

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| connected_app_id | Yes | Identifier of the Release Management connected app for the installable artifact. This field is mandatory. | |
| with_public_page | Yes | Boolean flag for enabling/disabling public install page for the installable artifact. This field is mandatory. | |
| installable_artifact_id | Yes | The uuidv4 identifier for the installable artifact. This field is mandatory. | |

Behavior 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare it is a destructive, non-idempotent write operation. The description adds minimal behavioral context beyond this, failing to clarify what 'public install page' means (a shareable URL?), what happens to existing public links when disabled, or why the operation is considered non-idempotent.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence, front-loaded with the verb. Length is appropriate, though 'whether...or not' is redundant. No significant structural waste, but could be more direct (e.g., 'Enables or disables the public install page...').

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the destructive hint and lack of idempotency, the description should explain the implications of disabling the page (e.g., broken links) and what constitutes a public install page conceptually. As a side-effect-laden mutation with no output schema, it under-delivers on behavioral explanation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 100% schema description coverage, the schema fully documents each parameter. The description provides high-level semantic framing ('availability') for the boolean flag, meeting the baseline for well-documented schemas without adding parameter-specific syntax or format details.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly identifies the target resource ('public install page') and the operation ('Changes... availability'), distinguishing it from sibling artifact tools like generate_installable_artifact_upload_url. However, the verb 'Changes' is generic and the phrasing 'whether...or not' is slightly passive, missing an opportunity to explicitly state this toggles/enables the page.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance provided on when to use this tool versus alternatives (e.g., artifact deletion vs. disabling public access), nor any mention of prerequisites like requiring specific permissions on the connected app or the installable artifact to already exist.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

step_inputs (B)
Read-only, Idempotent

List inputs of a step with their defaults, allowed values etc.

Parameters (JSON Schema)

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| step_ref | Yes | Step reference formatted as `step_lib_source::step_id@version`. `step_id` and an exact `version` are required, `step_lib_source` is only necessary for custom step sources. | |
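The `step_ref` format can be illustrated with a small parser, built only from the format string in the parameter description above (the helper itself is not a Bitrise utility):

```python
def parse_step_ref(step_ref):
    """Split `step_lib_source::step_id@version` into its three parts.
    The `step_lib_source` prefix is optional (only needed for custom
    step sources); an exact version is required."""
    source = None
    rest = step_ref
    if "::" in rest:
        source, rest = rest.split("::", 1)
    if "@" not in rest:
        raise ValueError("step_ref must pin an exact version with @")
    step_id, version = rest.rsplit("@", 1)
    return source, step_id, version
```

For example, `script@1.1.5` parses to a default-source reference, while `git::script@1.1.5` carries a custom source prefix.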
Behavior 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations clearly mark this as read-only, idempotent, and non-destructive. The description adds value by disclosing what data is returned (defaults, allowed values) beyond the annotations, but lacks details on error handling, rate limits, or output format since no output schema exists.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single efficient sentence front-loaded with the action verb. However, 'etc.' introduces vagueness, and the brevity may be insufficient given the complex step_ref format requirements and lack of output schema that could use elaboration.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Adequate for a single-parameter read operation. The description mentions key output fields (defaults, allowed values) compensating somewhat for the missing output schema, though it could specify the return structure type (array/object) or error cases for invalid step references.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 100% schema description coverage, the schema adequately documents the complex step_ref format. The description does not mention parameters, which is acceptable given the schema's completeness, meeting the baseline score for high-coverage schemas.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

Uses specific verb 'List' + resource 'inputs of a step' and details what information is returned ('defaults, allowed values'). However, it does not explicitly distinguish from sibling tools like step_search (which finds steps) vs this tool (which inspects a specific step's inputs).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides no guidance on when to use this tool versus alternatives like step_search or get_build_steps. No mention of prerequisites or when not to use it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

trigger_bitrise_build (B)

Trigger a new build/pipeline for a specified Bitrise app

Parameters (JSON Schema)

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| branch | No | The branch to build | main |
| app_slug | Yes | Identifier of the Bitrise app (e.g., "d8db74e2675d54c4" or "8eb495d0-f653-4eed-910b-8d6b56cc0ec7") | |
| commit_hash | No | The commit hash for the build | |
| pipeline_id | No | The pipeline to build | |
| workflow_id | No | The workflow to build | |
| environments | No | Custom environment variables for the build. | |
| commit_message | No | The commit message for the build | |
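A hedged sketch of a client-side argument builder for this tool. Treating `pipeline_id` and `workflow_id` as mutually exclusive is an assumption — the description does not say — and the builder itself is illustrative, not part of the server:

```python
def build_trigger_args(app_slug, branch="main", workflow_id=None,
                       pipeline_id=None, commit_hash=None,
                       commit_message=None):
    """Collect trigger_bitrise_build arguments, omitting unset optional
    fields. Assumes at most one of workflow_id/pipeline_id is meaningful."""
    if workflow_id and pipeline_id:
        raise ValueError("pass either workflow_id or pipeline_id, not both")
    args = {"app_slug": app_slug, "branch": branch}
    for key, value in [("workflow_id", workflow_id),
                       ("pipeline_id", pipeline_id),
                       ("commit_hash", commit_hash),
                       ("commit_message", commit_message)]:
        if value is not None:
            args[key] = value
    return args
```

Encoding the exclusivity check client-side guards against the ambiguity the scores below call out.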
Behavior 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations cover the mutation nature (readOnlyHint=false) and idempotency (idempotentHint=false). Description adds 'new' confirming non-idempotent behavior, but lacks details on return values, side effects, or rate limits despite having no output schema.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Extremely concise at nine words, but the brevity reflects under-specification rather than efficient density. The description is front-loaded with the action verb, though it lacks complementary context sentences.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Insufficient for a mutation tool with 7 parameters and no output schema. Should explain what constitutes successful triggering, expected latency, or interaction with the 'openWorld' external effects hinted by annotations.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 100% schema description coverage, the schema carries full semantic weight. Description mentions no specific parameters or their relationships (e.g., pipeline_id vs workflow_id mutual exclusivity), earning baseline score.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description uses specific verb 'Trigger' with clear resource 'build/pipeline' and target 'Bitrise app'. However, it does not distinguish from sibling tool 'rebuild_pipeline', which could confuse selection.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance provided on when to use this versus 'rebuild_pipeline' or 'abort_build', nor prerequisites like authentication or required app configuration. Agent must infer usage solely from the name.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

update_app (B)
Destructive, Idempotent

Update an app. Only app_slug is required, add only fields you wish to update

Parameters (JSON Schema)

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| title | No | The new title of the application | |
| app_slug | Yes | Identifier of the Bitrise app | |
| default_branch | No | The new default branch for the application | |
| repository_url | No | The new repository URL for the application | |
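The partial-update semantics ("add only fields you wish to update") can be mirrored client-side by dropping untouched fields entirely. The parameter names are from the table above; the builder is an illustrative sketch:

```python
def build_update_app_args(app_slug, title=None, default_branch=None,
                          repository_url=None):
    """Build an update_app payload containing app_slug plus only the
    fields being changed; None means 'leave this field alone'."""
    args = {"app_slug": app_slug}
    updates = {"title": title,
               "default_branch": default_branch,
               "repository_url": repository_url}
    args.update({k: v for k, v in updates.items() if v is not None})
    return args
```

Omitting a field rather than sending an empty value is what makes the partial-update pattern safe.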
Behavior 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds valuable partial-update semantics ('add only fields you wish to update') which aligns with the idempotentHint annotation. However, it fails to disclose the destructiveHint=true behavior or warn what happens to unspecified fields (they are preserved vs deleted).

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences with no redundancy. Front-loaded with the action verb. Could be improved by specifying 'Bitrise app' instead of 'an app' but otherwise appropriately sized.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Adequate for a 4-parameter update operation with good annotations, but lacks description of return values or success behavior (no output schema exists). Given the destructive annotation, more warnings about data loss would be appropriate.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 100% schema coverage, the baseline is 3. The description explicitly states the required nature of app_slug and emphasizes the optional partial-update pattern for other fields, adding usable guidance beyond the structured schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 3/5

Does the description clearly state what the tool does and how it differs from similar tools?

'Update an app' uses a clear verb but the resource is vague given sibling tools like update_connected_app and update_artifact exist. The description does not clarify this operates on Bitrise apps specifically (only the schema's app_slug parameter description reveals this).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides operational guidance ('Only app_slug is required, add only fields you wish to update') indicating partial update semantics, but lacks strategic guidance on when to choose this over update_connected_app or other alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

update_artifact (C)
Destructive, Idempotent

Update a build artifact.

Parameters (JSON Schema)

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| app_slug | Yes | Identifier of the Bitrise app | |
| build_slug | Yes | Identifier of the build | |
| artifact_slug | Yes | Identifier of the artifact | |
| is_public_page_enabled | Yes | Enable public page for the artifact | |

Behavior 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

While annotations indicate destructiveHint=true and idempotentHint=true, the description adds no explanatory context for these behaviors. It does not clarify what is destroyed (privacy/access control changes when toggling public visibility), what idempotency means in this context, or implications of the openWorldHint.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely brief (four words), which prevents verbosity, but errs toward under-specification rather than efficient precision. It front-loads nothing actionable since it merely restates the tool name.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a destructive mutation operation requiring three different identifier slugs to modify a boolean visibility setting, the description is incomplete. It fails to explain the public page functionality, the destructive nature of the operation, or how to obtain prerequisite identifiers.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 100% schema description coverage, the structured schema already documents all four parameters adequately. The description adds no semantic value beyond the schema (e.g., explaining the hierarchical relationship between app/build/artifact slugs), warranting the baseline score for complete schemas.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 2/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description 'Update a build artifact' is a tautology that restates the tool name without adding specificity. It fails to distinguish this tool from sibling operations like delete_artifact, get_artifact, or list_artifacts, nor does it indicate that this specifically modifies public page visibility settings.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives (e.g., delete_artifact for removal, get_artifact for retrieval), nor does it mention prerequisites such as obtaining the required slugs from other API calls.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

update_bitrise_yml (A)
Destructive, Idempotent

Update the Bitrise YML config stored on Bitrise. This has no effect if it is stored in the repository.

Parameters (JSON Schema)

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| app_slug | Yes | Identifier of the Bitrise app (e.g., "d8db74e2675d54c4" or "8eb495d0-f653-4eed-910b-8d6b56cc0ec7") | |
| bitrise_yml_as_json | Yes | The new Bitrise YML config file content to be updated. It must be a string. Important: these configs are large files, so get these by running: `cat <filepath>` via the Bash tool. | |
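The `bitrise_yml_as_json` parameter expects the whole config as one string; the parameter description suggests `cat <filepath>` because the files are large. A Python equivalent of that read step (the helper name is illustrative):

```python
from pathlib import Path

def load_bitrise_yml(filepath):
    """Read the whole config file into a single string, ready to pass
    as the bitrise_yml_as_json argument (the API requires a string,
    not a parsed structure)."""
    return Path(filepath).read_text()
```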
Behavior 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations cover idempotency and destructiveness, but the description adds the crucial repository-storage behavioral caveat. Could improve by disclosing that this performs a full config replacement (not a patch) and whether validation is automatically performed or required beforehand.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Exactly two sentences with zero redundancy. Information is front-loaded with the action, followed immediately by the operational constraint. Every word earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the rich annotations (destructive, idempotent) and 100% schema coverage, the description provides the essential operational context (repository limitation). Minor gap: does not mention the validation workflow suggested by the existence of validate_bitrise_yml in siblings.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with detailed parameter descriptions (including the Bash tool hint for large files). The main description mentions 'Bitrise YML config' aligning with the bitrise_yml_as_json parameter but adds no semantic detail beyond the schema definitions, warranting the baseline score.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clear verb 'Update' with specific resource 'Bitrise YML config' and scope 'stored on Bitrise'. The repository caveat distinguishes it from generic file updates and aligns with the sibling get_bitrise_yml (which likely reads from same source).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides critical negative constraint alerting that the tool has 'no effect if it is stored in the repository'. However, lacks positive guidance on prerequisites (e.g., retrieving current config via get_bitrise_yml) or recommended workflow (e.g., validating first with validate_bitrise_yml given the destructive nature).

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

update_connected_app (C)
Destructive

Updates a connected app.

Parameters (JSON Schema)

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| store_app_id | No | The store identifier for your app. You can change the previously set store_app_id to match the one in the App Store or Google Play depending on the app platform. This is especially useful if you want to connect your app with the store as the system will validate the given store_app_id against the Store. In case of iOS platform it is the bundle id. In case of Android platform it is the package name. | |
| connect_to_store | No | If true, will check connected app validity against the Apple App Store or Google Play Store (dependent on the platform of your connected app). This means, that the already set or just given store_app_id will be validated against the Store, using the already set or just given store credential id. | |
| connected_app_id | Yes | The uuidV4 identifier for your connected app. | |
| store_credential_id | No | If you have credentials added on Bitrise, you can decide to select one for your app. In case of ios platform it will be an Apple API credential id. In case of android platform it will be a Google Service credential id. | |
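Per the parameter descriptions above, `store_app_id` means a bundle id on iOS and a package name on Android. A rough client-side sanity check could catch obviously malformed values before the remote store validation runs; the pattern here is a heuristic assumption, not a Bitrise rule:

```python
import re

def looks_like_store_app_id(platform, store_app_id):
    """Heuristic check: both iOS bundle ids and Android package names
    are dot-separated identifiers such as "com.example.app"."""
    if platform not in ("ios", "android"):
        raise ValueError("platform must be 'ios' or 'android'")
    pattern = r"^[A-Za-z][A-Za-z0-9]*(\.[A-Za-z][A-Za-z0-9_-]*)+$"
    return re.match(pattern, store_app_id) is not None
```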
Behavior 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations cover destructive/non-idempotent/openWorld traits, but the description adds no behavioral context about external store validation against Apple/Google servers, credential security implications, or the irreversibility of changes.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness2/5

Is the description appropriately sized, front-loaded, and free of redundancy?

At four words, the description is terse to the point of underspecification. The single sentence fails to front-load any useful behavioral cues beyond the tool name itself.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity of App Store/Google Play integration, credential management, and destructive mutations, the description is woefully inadequate. It omits critical context about platform-specific behaviors (iOS bundle ID vs Android package name) that appear only in parameter descriptions.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema has 100% description coverage, establishing baseline 3. The description mentions no parameters, but the schema adequately documents the interdependent relationship between store_app_id, store_credential_id, and connect_to_store.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
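
The interdependency noted above (connect_to_store validating the already set or just given store_app_id against the already set or just given credential) could be handled by a payload builder along these lines; the field names come from the parameter table, while the helper itself is a hypothetical sketch:

```python
# Hypothetical payload builder for update_connected_app. Optional
# fields are included only when given; connect_to_store may still
# succeed without them, falling back to values already set on Bitrise.

def connected_app_update(connected_app_id, store_app_id=None,
                         store_credential_id=None, connect_to_store=None):
    payload = {"connected_app_id": connected_app_id}
    if store_app_id is not None:
        # iOS: bundle id; Android: package name.
        payload["store_app_id"] = store_app_id
    if store_credential_id is not None:
        payload["store_credential_id"] = store_credential_id
    if connect_to_store is not None:
        # True triggers validation against the App Store / Google Play.
        payload["connect_to_store"] = connect_to_store
    return payload
```
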

Purpose2/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description 'Updates a connected app' restates the tool name (tautology) with minimal elaboration. It fails to distinguish from sibling tool create_connected_app or explain what 'connected' means in this Bitrise context.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance provided on when to use update versus create_connected_app, prerequisites for store validation, or warnings about the destructive nature of changing store credentials.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

update_outgoing_webhookA
DestructiveIdempotent
Inspect

Update an outgoing webhook for an app. Even if you do not want to change one of the parameters, you still have to provide that parameter as well: simply use its existing value.

Parameters (JSON Schema)

- url (optional): URL of the webhook
- events (optional): List of events to trigger the webhook
- headers (optional): Headers to be sent with the webhook
- app_slug (required): Identifier of the Bitrise app
- webhook_slug (required): Identifier of the webhook
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate destructiveHint: true and idempotentHint: true. The description adds critical behavioral context beyond annotations by explaining the API follows replacement semantics (requiring all parameters even when unchanged), clarifying this is not a partial update/PATCH operation.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences total. The first states purpose immediately. The second delivers a critical usage constraint without redundancy. Every sentence earns its place with zero waste; appropriately sized for the complexity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given this is a state-mutating operation (destructiveHint: true) with openWorldHint: true, the description covers the essential replacement constraint. While it could mention error cases (e.g., webhook not found), it provides sufficient context for agent selection and correct invocation parameters.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 100% schema description coverage, baseline is 3. The description elevates this by adding crucial semantic context: it clarifies that optional-looking parameters (url, events, headers) must actually be provided with existing values when unchanged, interpreting the schema as a full replacement operation rather than partial update.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
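
The full-replacement semantics described above can be made concrete with a small merge helper: fetch the current webhook (a sibling listing tool on the server would presumably supply it), copy every field, and overlay only the changes. The helper below is a hypothetical sketch, not part of the server:

```python
# Hedged sketch of update_outgoing_webhook's replacement semantics:
# unchanged fields must be resent, so they are copied from the
# current webhook before overlaying the caller's changes.

def merge_webhook_update(current, changes):
    payload = {"url": current["url"],
               "events": current["events"],
               "headers": current.get("headers", {})}
    payload.update(changes)  # only these fields actually change
    return payload
```
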

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description opens with 'Update an outgoing webhook for an app' providing a specific verb (update), resource (outgoing webhook), and scope (for an app). It clearly distinguishes from siblings like create_outgoing_webhook and delete_outgoing_webhook through the specific verb choice.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The second sentence provides explicit guidance on how to invoke the tool: 'Even if you do not want to change one of the parameters, you still have to provide that parameter as well.' While it doesn't explicitly name alternatives (create vs update), it provides crucial operational context for correct usage.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

update_tester_groupB
Destructive
Inspect

Updates the given tester group. The name and the auto notification setting can be updated optionally.

Parameters (JSON Schema)

- id (required): The uuidV4 identifier of the tester group to which testers will be added.
- name (optional): The new name for the tester group. Must be unique in the scope of the related connected app.
- auto_notify (optional): If set to true it indicates the tester group will receive email notifications automatically from now on about new installable builds.
- connected_app_id (required): The uuidV4 identifier of the related Release Management connected app.
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations declare readOnly/destructive hints. Description adds that fields can be updated 'optionally', indicating partial updates are supported (relevant to idempotentHint=false). Does not address openWorldHint=true implications (external email delivery) or retry safety.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences with zero waste. First establishes the operation, second clarifies optional field updates. Efficient front-loading with no redundant information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Adequate for a simple 4-parameter mutation tool with complete schema coverage and safety annotations. Covers partial update capability but omits disclosure of external side effects (email notifications) and error conditions.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema has 100% coverage. Description bridges terminology by mapping 'auto notification setting' to the 'auto_notify' parameter and reinforcing optionality, adding semantic value beyond the schema alone.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clear verb 'Updates' with resource 'tester group'. Specifies updatable fields (name, auto notification setting). Implicitly distinguishes from sibling 'add_testers_to_tester_group' by focusing on metadata rather than membership, though explicit differentiation is absent.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this versus 'create_tester_group' or 'add_testers_to_tester_group'. No mention of prerequisites (e.g., group must exist) or mutation consequences.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

validate_bitrise_ymlA
Read-onlyIdempotent
Inspect

Validate a Bitrise YML config file. Use this tool to verify any changes made in bitrise.yml.

Parameters (JSON Schema)

- app_slug (optional): Slug of a Bitrise app (as returned by the list_apps tool). Specifying this value allows for validating the YML against workspace-specific settings like available stacks, machine types, license pools etc.
- bitrise_yml (required): The Bitrise YML config file content to be validated. It must be a string. Important: these configs are large files, so get these by running: cat <filepath> via the Bash tool.
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description confirms the read-only validation nature consistent with annotations (readOnlyHint=true, destructiveHint=false) and adds workflow context (verifying changes). However, it adds minimal technical behavioral detail beyond annotations, such as what validation encompasses (syntax, schema, workspace settings) or error behavior.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, zero waste. The first states purpose, the second states usage context. Information is front-loaded with no redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the rich schema (100% coverage, helpful param descriptions) and complete annotations covering safety profiles, the description provides sufficient context. It appropriately focuses on purpose and use case without needing to explain return values (no output schema) or safety constraints already covered by annotations.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 100% schema description coverage, the baseline score is 3. The main description does not add parameter-specific details, but the schema comprehensively documents both parameters (including the important hint about using Bash/cat for large files in bitrise_yml description).

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
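
The schema's hint about passing the file content as a string could look like the following hedged sketch, where `call_tool` is a hypothetical client helper and only the tool and argument names match the server:

```python
# Hypothetical sketch: read bitrise.yml from disk (the equivalent of
# the schema's `cat <filepath>` hint) and pass the content as a string.

from pathlib import Path

def validate_config(call_tool, path="bitrise.yml", app_slug=None):
    args = {"bitrise_yml": Path(path).read_text()}
    if app_slug is not None:
        # Optional: validates against workspace-specific settings
        # such as available stacks and machine types.
        args["app_slug"] = app_slug
    return call_tool("validate_bitrise_yml", **args)
```
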

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action (validate) and resource (Bitrise YML config file). It effectively distinguishes from siblings 'get_bitrise_yml' (retrieval) and 'update_bitrise_yml' (application) by specifying this is for verification of changes rather than retrieval or modification.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The second sentence provides clear contextual guidance: 'Use this tool to verify any changes made in bitrise.yml,' establishing exactly when to invoke it (after modifications). However, it lacks explicit 'when-not' guidance or named alternatives (e.g., distinguishing from 'update_bitrise_yml' which applies changes).

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
