rubrkit

by com.rubrkit

Server Details

MCP-native AI evaluation: rubric audits, eval suites, and proof reports for AI/LLM output.

Status: Healthy
Last Tested: 2026-07-23 01:43
Transport: Streamable HTTP
Glama MCP Gateway

Connect through Glama MCP Gateway for full control over tool access and complete visibility into every call.

Full call logging

Every tool call is logged with complete inputs and outputs, so you can debug issues and audit what your agents are doing.

Tool access control

Enable or disable individual tools per connector, so you decide what your agents can and cannot do.

Managed credentials

Glama handles OAuth flows, token storage, and automatic rotation, so credentials never expire on your clients.

Usage analytics

See which tools your agents call, how often, and when, so you can understand usage patterns and catch anomalies.

100% free. Your data is private.

Tool Definition Quality

Grade B, 3.3/5.0

Tool Descriptions: C

Average 3.2/5 across 30 of 30 tools scored. Lowest: 2.4/5.

Server Coherence: A

Disambiguation: 4/5

Most tools target distinct resource-action pairs, and descriptions clarify differences like soft vs hard delete or list bundles vs list versions. However, some names are similar (e.g., delete_artifact_bundle vs hard_delete_artifact_bundle), which could cause minor confusion.

Naming Consistency: 5/5

All tools follow a uniform rubrkit_verb_noun pattern with imperative verbs and snake_case, making them predictable and easy to navigate. No mixed conventions or inconsistencies.

Tool Count: 4/5

With 30 tools, the server is comprehensive but slightly heavy for a single-purpose MCP. The tools cover multiple sub-domains (bundles, files, audits, evals, conversions, docs), which justifies the count, but consolidation could reduce complexity.

Completeness: 4/5

The set covers CRUD for artifact bundles, file operations, audit/eval lifecycle, and API documentation. Minor gaps include missing bundle metadata update and file deletion, but core workflows are supported.

Available Tools

41 tools

rubrkit_add_golden_case: Add golden case (Grade C)

Add a golden case (a real edge-case input with a confirmed-correct output) to an artifact bundle via POST /api/v1/artifact-bundles/{artifactBundleId}/golden-cases. Requires artifact_bundles:write for API keys.

Parameters

Name	Required	Description	Default
`tags`	No
`input`	Yes
`origin`	No
`weight`	No
`criteria`	No
`originRef`	No
`expectedOutput`	No
`artifactBundleId`	Yes
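
As a rough illustration of how a client might invoke this tool, the sketch below wraps the arguments in an MCP JSON-RPC 2.0 `tools/call` request and checks the two parameters the table marks as required. The helper name `build_add_golden_case_call`, the placeholder ID, and the value types (strings for `input` and `expectedOutput`, a list of strings for `tags`) are assumptions; the listing does not document parameter types.

```python
import json

# Required parameters, taken from the parameter table above.
REQUIRED = ("input", "artifactBundleId")


def build_add_golden_case_call(arguments: dict, request_id: int = 1) -> dict:
    """Wrap arguments in an MCP JSON-RPC 2.0 `tools/call` request (hypothetical helper)."""
    missing = [name for name in REQUIRED if name not in arguments]
    if missing:
        raise ValueError(f"missing required parameter(s): {', '.join(missing)}")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "rubrkit_add_golden_case", "arguments": arguments},
    }


example = build_add_golden_case_call({
    "artifactBundleId": "00000000-0000-0000-0000-000000000000",  # placeholder ID
    "input": "Refund request with no order number",              # assumed to be a string
    "expectedOutput": "Ask the customer for the order number",   # optional; assumed string
    "tags": ["edge-case"],                                       # optional; assumed list of strings
})
print(json.dumps(example, indent=2))
```

Validating required parameters on the client side is only a convenience here; the server remains the authority on what it accepts.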

Tool Definition Quality

Grade C, 2.6/5.0

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries full behavioral disclosure burden. It states the tool adds a golden case (a create operation) but does not disclose side effects (e.g., idempotency, overwrite behavior, limits on golden case count, or impact on existing cases). This is insufficient for an agent to understand behavioral traits.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise (two sentences) and front-loaded with the core purpose. It includes the HTTP method and endpoint, as well as the permission requirement. However, it could be slightly improved by stating explicitly that 'expectedOutput' is optional: the phrase "confirmed-correct output" implies it, but the schema does not require it.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has 8 parameters, nested objects, and no output schema or annotations, the description lacks essential context about return values, error handling, and parameter usage. For example, it doesn't explain how 'criteria' or 'originRef' are used. This forces the agent to rely on external knowledge.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, meaning no parameter descriptions in the schema. The description adds no details about the 8 parameters (e.g., what 'input', 'expectedOutput', 'tags', 'origin' mean or how they affect the golden case). This forces the agent to guess parameter semantics.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action (add a golden case) and the resource (artifact bundle), including the HTTP endpoint. However, it does not explicitly differentiate from sibling tools like rubrkit_retire_golden_case, leaving room for confusion about when to use this specific tool.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description only mentions the required permission (artifact_bundles:write) but provides no guidance on when to use this tool versus alternatives (e.g., rubrkit_retire_golden_case) or when not to use it. Context signals show no usage advice.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

rubrkit_apply_audit_rewrite: Apply audit rewrite (Grade A)

Apply a completed audit's AI rewrite to the bundle files (a new version per changed file). Requires audits:run for API keys.

Parameters

Name	Required	Description	Default
`auditRunId`	Yes
`artifactBundleId`	Yes

Tool Definition Quality

Grade A, 3.5/5.0

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the burden of behavioral disclosure. It mentions that new versions are created per changed file, which is a key behavioral trait. However, it does not disclose potential destructiveness, error conditions, or side effects beyond version creation.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is one sentence plus a requirement, with no wasted words. The main action is front-loaded and the prerequisite follows cleanly.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool that modifies bundle files, the description is too minimal. It lacks details on side effects, error handling, output (beyond 'new version'), and whether existing versions are affected. Given no output schema and no annotations, more context is needed for completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, yet the description adds no meaning to the two UUID parameters ('auditRunId', 'artifactBundleId') beyond their names. The schema's property names are somewhat self-explanatory, but the description should clarify what each ID refers to.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Apply a completed audit's AI rewrite'), the resource ('bundle files'), and the outcome ('a new version per changed file'). It is specific and distinguishes from sibling tools like rubrkit_start_audit or rubrkit_read_audit.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description includes a prerequisite ('Requires audits:run for API keys') but lacks guidance on when to use this tool vs alternatives, or any when-not-to-use conditions. It gives minimal context about its appropriate usage scenario.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

rubrkit_convert_to_rubr_flow: Convert to rubr_flow (Grade C)

Start an async rubr_flow conversion job. Requires rubr_flow:convert for API keys.

Parameters

Name	Required	Description	Default
`modelTier`	No
`outputPath`	No
`targetFileId`	No
`reasoningEffort`	No
`artifactBundleId`	Yes
`targetVersionNumber`	No
`artifactBundleVersionNumber`	No

Tool Definition Quality

Grade C, 2.4/5.0

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description must fully disclose behavior. It states the job is async but does not explain what triggers the conversion, how to track progress, or any side effects. The permission requirement is noted, but details on rate limits, cancellation, or data retention are absent.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is short (two sentences) but omits essential information. It is not verbose, but brevity comes at the cost of completeness. The front-loading is fine.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 1/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the 7 parameters with no descriptions, no output schema, and no annotations, the description is severely incomplete. It does not cover return values, job lifecycle, or how to map parameters to inputs. A complex async job requires far more context.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, and the description does not explain any of the 7 parameters (e.g., modelTier, outputPath, reasoningEffort). The agent must rely solely on the schema, which lacks descriptions, so it cannot tell how to fill in the parameters correctly.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'Start' and the resource 'rubr_flow conversion job'. It also mentions the required permission, distinguishing it from read/list operations like rubrkit_list_rubr_flow_conversions and rubrkit_read_rubr_flow_conversion, though not explicitly.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool vs alternatives such as rubrkit_poll_job for checking job status or rubrkit_list_rubr_flow_conversions for listing existing conversions. No 'when-not' or alternative tools mentioned.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
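
The start-then-poll flow that the review finds missing from the description might look roughly like the sketch below. It is only a sketch under stated assumptions: the `jobId` and `status` response fields, the `completed`/`failed` status values, and the `jobId` argument to rubrkit_poll_job are all hypothetical, since the listing does not document the job response or the poll tool's parameters. `call_tool` stands in for whatever function your MCP client uses to invoke a tool.

```python
import time
from typing import Any, Callable, Dict

# Hypothetical: the listing does not document job status values.
TERMINAL_STATES = {"completed", "failed"}

ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def convert_and_wait(
    call_tool: ToolCaller,
    artifact_bundle_id: str,
    poll_interval: float = 2.0,
    max_polls: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Start a conversion job, then poll until it reaches a terminal state."""
    job = call_tool("rubrkit_convert_to_rubr_flow", {"artifactBundleId": artifact_bundle_id})
    polls = 0
    while job.get("status") not in TERMINAL_STATES:
        if polls >= max_polls:
            raise TimeoutError(f"job still not finished after {max_polls} polls")
        sleep(poll_interval)
        # Assumed argument name `jobId`; check the real schema of rubrkit_poll_job.
        job = call_tool("rubrkit_poll_job", {"jobId": job["jobId"]})
        polls += 1
    return job


# Stub standing in for a real MCP client, so the sketch runs offline.
def _stub_call_tool(name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    _stub_call_tool.count += 1
    status = "completed" if _stub_call_tool.count >= 3 else "running"
    return {"jobId": "job-123", "status": status}


_stub_call_tool.count = 0
print(convert_and_wait(_stub_call_tool, "bundle-placeholder", sleep=lambda _: None))
```

Bounding the number of polls keeps an agent from waiting indefinitely on a job that never finishes.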

rubrkit_create_artifact_bundle: Create artifact bundle (Grade C)

Create an artifact bundle through /api/v1/artifact-bundles. Requires artifact_bundles:write for API keys.

Parameters

Name	Required	Description	Default
`name`	Yes
`settings`	No
`description`	No
`customRubric`	No

Tool Definition Quality

Grade C, 2.9/5.0

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description mentions the required permission, which is a behavioral trait. However, no annotations exist, and the description does not disclose other important behaviors such as idempotency, effects of creation, or error conditions. It lacks sufficient detail for a mutation tool.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 2/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is very short (two sentences), but this conciseness comes at the expense of missing critical information about parameters and behavior. It is under-specified for the tool's complexity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 1/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has four parameters, no output schema, and no annotations, the description is extremely incomplete. It fails to provide context about the return value, parameter meanings, or usage examples, making it inadequate for an agent to use correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, meaning the description must explain parameters. It does not explain any of the four parameters (name, settings, description, customRubric) beyond what is in the schema. The agent has no guidance on how to use them.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action 'Create an artifact bundle' and specifies the API endpoint and required permission. This verb-resource combination is specific and distinguishes from sibling tools like delete, list, and read.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides the required permission 'artifact_bundles:write', which helps agents know when they can use the tool. However, it does not explicitly state when not to use it or suggest alternatives, though the sibling tools have distinct purposes.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

rubrkit_create_drift_monitor: Create drift monitor (Grade B)

Create a drift monitor pinned to a file version via POST /api/v1/artifact-bundles/{artifactBundleId}/drift-monitors. Requires artifact_bundles:write for API keys.

Parameters

Name	Required	Description	Default
`cadence`	No
`repeats`	No
`pinnedVersionId`	Yes
`artifactBundleId`	Yes
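
Several tool descriptions on this page name the REST route they wrap, with `{artifactBundleId}` as a path placeholder. The sketch below shows one way to expand those templates safely. The method and path pairs are copied from the descriptions on this page; the `resolve` helper and the `ENDPOINTS` table are hypothetical, and the host or base URL is not given in the listing, so only the path is produced.

```python
from typing import Tuple
from urllib.parse import quote

# Method and path templates as quoted in the tool descriptions on this page.
ENDPOINTS = {
    "rubrkit_add_golden_case": ("POST", "/api/v1/artifact-bundles/{artifactBundleId}/golden-cases"),
    "rubrkit_create_drift_monitor": ("POST", "/api/v1/artifact-bundles/{artifactBundleId}/drift-monitors"),
    "rubrkit_delete_artifact_bundle": ("DELETE", "/api/v1/artifact-bundles/{artifactBundleId}"),
    "rubrkit_hard_delete_artifact_bundle": ("DELETE", "/api/v1/artifact-bundles/{artifactBundleId}/permanent"),
}


def resolve(tool_name: str, artifact_bundle_id: str) -> Tuple[str, str]:
    """Return (HTTP method, path) for a tool, percent-encoding the bundle ID."""
    method, template = ENDPOINTS[tool_name]
    # safe="" also encodes "/", so an odd ID cannot add extra path segments.
    encoded_id = quote(artifact_bundle_id, safe="")
    return method, template.format(artifactBundleId=encoded_id)


print(resolve("rubrkit_create_drift_monitor", "bundle-placeholder"))
```

Percent-encoding the identifier is a defensive choice; if bundle IDs are always UUIDs it changes nothing.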

Tool Definition Quality

Grade B, 3.0/5.0

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It only states the HTTP method and required auth, lacking details on side effects, rate limits, or error behavior.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two concise sentences: one for the core action and one for authentication. It is front-loaded and contains no unnecessary words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 1/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool with four parameters and no output schema, the description is severely incomplete. It fails to explain parameters, return values, or usage context.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description adds no information about the four parameters (cadence, repeats, pinnedVersionId, artifactBundleId), despite schema description coverage being 0%.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool creates a drift monitor pinned to a file version via a specific HTTP POST endpoint. It distinguishes the tool from siblings like list or set status tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description mentions required permissions (artifact_bundles:write) but does not provide explicit guidance on when to use this tool versus alternatives or any exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

rubrkit_delete_artifact_bundle: Delete artifact bundle (Grade A)

Archive an artifact bundle (soft delete) through DELETE /api/v1/artifact-bundles/{artifactBundleId}. The bundle moves to the archived state and can be restored or permanently deleted later. Requires artifact_bundles:write for API keys.

Parameters

Name	Required	Description	Default
`artifactBundleId`	Yes

Tool Definition Quality

Grade A, 4.7/5.0

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses soft delete (archiving) and the ability to restore or permanently delete later, but could elaborate on whether the bundle is removed from listing or other side effects.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, front-loaded with purpose, no unnecessary words, efficient and clear.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple single-parameter tool with no output schema, the description covers purpose, behavior, and auth requirements completely.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage, the description adds context by mentioning the endpoint path that uses the parameter, clarifying its role as the bundle identifier.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states 'Archive an artifact bundle (soft delete)' with the HTTP method and endpoint, distinguishing it from the sibling hard delete tool.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly describes the soft delete behavior and state transition, and specifies required auth scope 'artifact_bundles:write', with implied alternative of hard delete from sibling name.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

rubrkit_export_proof_report: Export proof report (Grade C)

Start an async proof report job with export storage enabled by default. Requires evals:run for API keys.

Parameters

Name	Required	Description	Default
`evalRunId`	No
`auditRunId`	No
`sourceFileId`	No
`includeExport`	No
`rubricVersion`	No
`candidateFileId`	No
`artifactBundleId`	Yes
`sourceFileVersionId`	No
`sourceVersionNumber`	No
`candidateFileVersionId`	No
`candidateVersionNumber`	No

Tool Definition Quality

Grade C, 2.6/5.0

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description bears full responsibility. It states the job is async and export storage is enabled by default, but omits critical behavioral details like side effects, how to check job status, or whether the operation is idempotent. This is insufficient for an async job tool with 11 parameters.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences with no filler, front-loading the key action and requirement. It would be more useful if it folded in brief parameter guidance, but as written it earns a 4 for efficiency.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given high complexity (11 params, async job, no output schema), the description is far from complete. It does not explain the job output, how to poll for results, or the purpose of the many parameters. The flow implied by sibling tools (e.g., rubrkit_poll_job) is not mentioned.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, and the description does not explain any of the 11 parameters. The only hint is 'export storage enabled by default', which implicitly refers to includeExport; there are no details on evalRunId, auditRunId, sourceFileId, etc. The agent has no guidance on how to fill these parameters.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool starts an async proof report job with export storage enabled by default. It uses a specific verb ('start') and identifies the resource ('proof report job'), which distinguishes it from sibling read tools like rubrkit_read_proof_report.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description mentions a prerequisite ('Requires evals:run for API keys') but provides no guidance on when to use this tool vs alternatives such as rubrkit_start_audit or rubrkit_run_evals. No explicit when-not-to-use or alternative tool names are given.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

rubrkit_hard_delete_artifact_bundle: Permanently delete artifact bundle (Grade A)

Permanently delete an archived artifact bundle and all of its files, versions, and run history through DELETE /api/v1/artifact-bundles/{artifactBundleId}/permanent. This is irreversible and only works on bundles that are already archived. Requires artifact_bundles:write for API keys.

Parameters

Name	Required	Description	Default
`artifactBundleId`	Yes
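
The ordering constraint between the two delete tools (archive first, then permanently delete) can be pictured as a small state machine. This is an illustrative client-side model only, not the server's implementation. The `archived` state and the restore path come from the soft-delete description, and `active` appears as the default status filter in the bundle list tool; the `DELETED` state name and the decision to reject a second soft delete on an already archived bundle are assumptions.

```python
from enum import Enum


class BundleState(Enum):
    ACTIVE = "active"
    ARCHIVED = "archived"
    DELETED = "deleted"  # hypothetical name for "no longer exists"


def soft_delete(state: BundleState) -> BundleState:
    """Model rubrkit_delete_artifact_bundle: active -> archived."""
    if state is not BundleState.ACTIVE:
        # Assumption: the real server's behavior on repeat archiving is undocumented.
        raise ValueError(f"cannot archive a bundle in state {state.value!r}")
    return BundleState.ARCHIVED


def restore(state: BundleState) -> BundleState:
    """Model restoring an archived bundle: archived -> active."""
    if state is not BundleState.ARCHIVED:
        raise ValueError(f"cannot restore a bundle in state {state.value!r}")
    return BundleState.ACTIVE


def hard_delete(state: BundleState) -> BundleState:
    """Model rubrkit_hard_delete_artifact_bundle: archived -> deleted (irreversible)."""
    if state is not BundleState.ARCHIVED:
        raise ValueError("only archived bundles can be permanently deleted")
    return BundleState.DELETED


state = soft_delete(BundleState.ACTIVE)
state = hard_delete(state)
print(state)
```

An agent that tracks this state locally can avoid calling the hard-delete tool on a bundle that has not been archived yet.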

Tool Definition Quality

Grade A, 4.4/5.0

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description fully covers behavioral aspects: irreversibility, the deletion of all associated files/versions/run history, the requirement that bundle be archived, and needed API key permission. This provides good transparency for a destructive action.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences, each adding essential information: action scope, irreversibility/condition, and permission requirement. No extraneous words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple single-parameter tool with no output schema or annotations, the description covers all necessary aspects: purpose, prerequisites, permission, and endpoint. It does not describe return value, but that is acceptable given the tool's nature.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description explains that the single parameter (artifactBundleId) is the bundle ID used in the DELETE endpoint URL, and it clarifies that the bundle must be archived. This adds meaning beyond the schema's basic type and format.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states it permanently deletes an archived artifact bundle including all files, versions, and run history. It distinguishes itself from the soft delete sibling tool (rubrkit_delete_artifact_bundle) by emphasizing irreversibility and the 'archived' condition.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Description specifies that the tool only works on archived bundles and requires write permissions. It implies when to use this hard delete over the soft delete variant, though it does not explicitly name the alternative.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

rubrkit_list_artifact_bundles: List artifact bundles (Grade C)

List artifact bundles through /api/v1/artifact-bundles. Requires artifact_bundles:read for API keys.

Parameters

Name	Required	Description	Default
`limit`	No
`status`	No		active

Tool Definition Quality

Grade C, 2.7/5.0

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries full burden. It discloses the required permission (artifact_bundles:read for API keys), which is helpful for authorization. However, it lacks details on pagination, rate limits, or side effects. The endpoint URL is provided, adding some transparency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise with two sentences. The first sentence states the purpose, and the second adds a permission note. It is front-loaded and efficient, though it could include brief parameter explanations without losing conciseness.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema and two parameters with no description, the description is incomplete. It does not explain what the list returns, how pagination works, or filter options (status enum). Critical context for a list operation is missing.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, and the description does not describe either of the two parameters (limit, status) or their effects. It provides no additional meaning beyond the schema, failing to compensate for the lack of parameter descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'List artifact bundles' which is a specific verb and resource, and it provides the API endpoint. However, it does not explicitly differentiate itself from other list tools among siblings, though the resource name is unique.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description mentions a permission requirement but provides no guidance on when to use this tool versus alternatives like rubrkit_read_artifact_bundle or other list tools. No when-not or alternative recommendations are given.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

rubrkit_list_artifact_bundle_versions: List artifact bundle versions (Grade C)

List bundle-level version events. Requires artifact_bundles:read for API keys.

ParametersJSON Schema

Name	Required	Description	Default
`limit`	No
`artifactBundleId`	Yes

Tool Definition Quality

C2.9/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Description adds the permission requirement, which is helpful given no annotations. However, it does not disclose whether the operation is read-only, idempotent, or other behavioral traits like pagination defaults.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, no redundancy. Purpose and permission are stated upfront. Concise and well-structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema and many sibling tools, the description lacks context about return values, how version events differ from file version events, and pagination behavior. Incomplete for a 2-parameter tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 0% and the description does not explain any parameters. The meaning of artifactBundleId and limit is left entirely to the schema, which lacks descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states it lists 'bundle-level version events', distinguishing it from sibling tools that list bundles or files. However, 'version events' could be more explicit about what is returned.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives like list_artifact_bundles or list_file_versions. Only a permission requirement is mentioned, but not usage context or exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

rubrkit_list_artifact_files: List artifact files (grade B)

List files in an artifact bundle. Requires files:read for API keys.

ParametersJSON Schema

Name	Required	Description	Default
`limit`	No
`artifactBundleId`	Yes
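
As a rough illustration of how an MCP client might invoke this tool, the sketch below builds a JSON-RPC 2.0 `tools/call` request carrying the two parameters from the table. The helper names, the bundle ID value, and the `limit` of 20 are placeholders chosen for illustration; the listing does not document a default or maximum for `limit`, nor the format of `artifactBundleId`.

```python
import json
from typing import Any, Optional


def build_tool_call(name: str, arguments: dict[str, Any], request_id: int = 1) -> dict[str, Any]:
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


def list_artifact_files_request(artifact_bundle_id: str, limit: Optional[int] = None) -> dict[str, Any]:
    # artifactBundleId is required; limit is optional and left out when not set.
    arguments: dict[str, Any] = {"artifactBundleId": artifact_bundle_id}
    if limit is not None:
        arguments["limit"] = limit
    return build_tool_call("rubrkit_list_artifact_files", arguments)


# Placeholder bundle ID, purely illustrative.
request = list_artifact_files_request("bundle-123", limit=20)
print(json.dumps(request, indent=2))
```

Leaving `limit` out of the arguments when it is unset lets the server apply whatever default it has, which matters here because that default is not documented.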

Tool Definition Quality

B3/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries full burden for behavioral disclosure. It only indicates a read operation implicitly via 'List' and mentions auth requirements, but fails to disclose pagination behavior, default limits, what fields are returned, or any side effects. For a read operation, this is insufficient.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise with two sentences, no redundant phrasing. However, it lacks structure (e.g., bullet points or separate sections) and could benefit from a note about pagination or the limit parameter. Still, every word serves a purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema, no annotations, and a simple list operation, the description is incomplete. It omits details about the response format, pagination (even though a limit parameter exists), ordering, and whether the tool returns file metadata or paths. A more complete description would help the agent understand what to expect.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 2 parameters (artifactBundleId, limit) with 0% schema description coverage, yet the description does not explain either parameter. It does not clarify that artifactBundleId is a UUID or what the limit parameter controls, leaving the agent without additional context beyond the schema types.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description explicitly states 'List files in an artifact bundle,' which clearly identifies the verb (list) and resource (files in a bundle). It is distinct from sibling tools like rubrkit_list_artifact_bundles and rubrkit_list_artifact_bundle_versions, which serve different purposes. The added auth note further clarifies scope.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description only mentions authorization requirements ('Requires files:read for API keys') but does not provide guidance on when to use this tool versus alternatives, nor does it explain when not to use it. The context is minimal; an agent would need to infer usage from the name and sibling tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

rubrkit_list_audits: List audits (grade B)

List persisted audit history. Requires artifact_bundles:read for API keys.

ParametersJSON Schema

Name	Required	Description	Default
`limit`	No
`artifactBundleId`	Yes

Tool Definition Quality

B3/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description must disclose behavior. It mentions listing and permission requirements but does not clarify return format, pagination behavior (though 'limit' parameter suggests it), or whether results are ordered. This is insufficient for a read operation with no annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Extremely concise: two short sentences with no superfluous words. Every sentence is informational.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Lacks essential information given no output schema. Does not describe the return type (e.g., list of audit objects), pagination details, or any ordering. The minimal description leaves room for confusion about the tool's output.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, yet the description does not explain the 'artifactBundleId' or 'limit' parameters. While parameter names are somewhat self-explanatory, the description adds no extra context about their purpose or constraints (e.g., what artifactBundleId represents).

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states it lists persisted audit history, distinguishing from 'read_audit' which retrieves a single audit. However, it does not explicitly differentiate from other list operations like 'list_evals' or 'list_artifact_bundles'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Includes a permission requirement ('Requires artifact_bundles:read for API keys'), which is a helpful prerequisite. Does not provide guidance on when to use this tool versus alternatives such as 'read_audit' or other list tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

rubrkit_list_drift_monitors: List drift monitors (grade A)

List the drift monitors for an artifact bundle via /api/v1/artifact-bundles/{artifactBundleId}/drift-monitors. Requires artifact_bundles:read (or artifacts:pull) for API keys.

ParametersJSON Schema

Name	Required	Description	Default
`artifactBundleId`	Yes
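
The description quotes a REST path template, so a small sketch of filling it in may help. The template string comes from the description above; the host `https://rubrkit.example`, the bundle ID, and the function name are hypothetical, and the real API may impose ID formats that this code does not check.

```python
from urllib.parse import quote

# Path template quoted in the tool description.
DRIFT_MONITORS_PATH = "/api/v1/artifact-bundles/{artifactBundleId}/drift-monitors"


def drift_monitors_url(base_url: str, artifact_bundle_id: str) -> str:
    """Fill the path template, percent-encoding the bundle ID."""
    if not artifact_bundle_id:
        raise ValueError("artifactBundleId is required")
    # safe="" also encodes "/" so the ID cannot add extra path segments.
    encoded_id = quote(artifact_bundle_id, safe="")
    return base_url.rstrip("/") + DRIFT_MONITORS_PATH.format(artifactBundleId=encoded_id)


# Hypothetical host and bundle ID, for illustration only.
print(drift_monitors_url("https://rubrkit.example", "bundle 123"))
# https://rubrkit.example/api/v1/artifact-bundles/bundle%20123/drift-monitors
```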

Tool Definition Quality

A3.7/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description carries full burden. It discloses required permissions (artifact_bundles:read or artifacts:pull), but does not mention pagination, rate limits, or other behavioral traits common for listing endpoints.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two concise sentences: one stating the purpose with endpoint, one stating required permissions. No wasted words, front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple list tool with one parameter and no output schema, the description covers purpose and auth. However, it could mention that the return is a list of drift monitors or provide slightly more detail on the response.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0% (no parameter descriptions in schema). The description only implicitly references artifactBundleId via 'for an artifact bundle', adding little meaning beyond the schema. It does not explain the parameter format or purpose.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'list' and the resource 'drift monitors for an artifact bundle', including the API endpoint. This distinguishes it from sibling tools like rubrkit_create_drift_monitor and rubrkit_set_drift_monitor_status.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage when needing to view existing drift monitors for a specific bundle, but lacks explicit guidance on when not to use it or which alternatives exist. The included permission requirement provides some context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

rubrkit_list_evals: List evals (grade B)

List eval history. Requires artifact_bundles:read for API keys.

ParametersJSON Schema

Name	Required	Description	Default
`limit`	No
`artifactBundleId`	Yes

Tool Definition Quality

B3.1/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description discloses the auth requirement, adding behavioral context beyond schema fields. However, it does not explicitly state that the tool is read-only or idempotent, and no annotations exist to fill this gap.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise with only two short sentences, no redundancy, and all information provided is directly useful.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given two parameters, no output schema, and no annotations, the description is incomplete. It lacks explanations of output format, pagination, or how the 'limit' parameter works, leaving significant gaps for correct usage.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With schema description coverage at 0%, the description fails to explain the parameters 'artifactBundleId' and 'limit.' It adds no meaning beyond the schema's basic type definitions, making it unhelpful for parameter usage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states that the tool lists eval history, a specific verb and resource. It is distinct from siblings like 'rubrkit_read_eval' (singular) and 'rubrkit_run_evals' (run), though it does not explicitly differentiate itself from them.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides a prerequisite ('Requires artifact_bundles:read for API keys'), which helps in understanding authentication, but it offers no guidance on when to use this list tool versus alternatives like 'rubrkit_read_eval' or 'rubrkit_run_evals.'

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

rubrkit_list_file_versions: List file versions (grade B)

List immutable versions for a file. Requires files:read for API keys.

ParametersJSON Schema

Name	Required	Description	Default
`limit`	No
`fileId`	Yes
`artifactBundleId`	Yes

Tool Definition Quality

B3.1/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided, so description must convey behaviors. It states the read-only nature ('list') and auth requirement, but does not mention pagination limits (only implicit in schema), ordering, or any additional traits. Adequate for a simple tool but lacks depth.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two concise sentences: first states purpose, second adds auth requirement. No fluff, front-loaded with core info.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema and 0% parameter coverage, the description lacks detail on return format, version ordering, or whether it lists all versions. Incomplete for an agent to use effectively without additional context.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, yet the description adds no parameter details beyond the schema. It does not explain artifactBundleId, fileId, or limit in context, leaving the agent to infer from names alone.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states 'List immutable versions for a file', specifying the action (list) and resource (file versions). Distinguishes from sibling tools focused on artifact bundles, like rubrkit_list_artifact_bundle_versions, by targeting file-specific versions.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Only mentions a prerequisite (requires files:read for API keys) but provides no guidance on when to use this tool versus alternatives, such as rubrkit_list_artifact_bundle_versions or rubrkit_read_artifact_file.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

rubrkit_list_golden_cases: List golden cases (grade C)

List the golden cases for an artifact bundle via /api/v1/artifact-bundles/{artifactBundleId}/golden-cases. Requires artifact_bundles:read (or artifacts:pull) for API keys.

ParametersJSON Schema

Name	Required	Description	Default
`status`	No
`artifactBundleId`	Yes

Tool Definition Quality

C2.9/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided. Description does not disclose behavior such as pagination, result limits, default ordering, or error handling. Only states the endpoint and permission requirement.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is short (one sentence plus permission note) and front-loaded with purpose. Every sentence earns its place, but the description is underspecified and omits necessary content.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

No output schema, so description should explain return values or pagination. It does not. Parameter semantics incomplete. For a list operation, missing details on result format and constraints.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema has 2 parameters with 0% description coverage. Description does not explain meaning of 'status' enum values (active, retired, null) or the 'artifactBundleId' format. Schema itself provides some structure but description adds no value.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
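
To make "description coverage" concrete, here is a minimal sketch that treats it as the fraction of top-level schema properties carrying a non-empty `description`. That definition is an assumption; the listing does not say how the figure is computed. The property types and the added description strings are also hypothetical; only the parameter names and the status values (active, retired, null) come from the review above.

```python
from typing import Any


def description_coverage(schema: dict[str, Any]) -> float:
    """Fraction of top-level properties that carry a non-empty description."""
    properties = schema.get("properties", {})
    if not properties:
        return 1.0  # nothing to describe
    described = sum(1 for spec in properties.values() if spec.get("description", "").strip())
    return described / len(properties)


# Schema as the review characterizes it: two parameters, no descriptions.
bare_schema = {
    "type": "object",
    "properties": {
        "artifactBundleId": {"type": "string"},
        "status": {"enum": ["active", "retired", None]},
    },
    "required": ["artifactBundleId"],
}

# Same schema with hypothetical descriptions added.
described_schema = {
    "type": "object",
    "properties": {
        "artifactBundleId": {
            "type": "string",
            "description": "ID of the artifact bundle whose golden cases are listed.",
        },
        "status": {
            "enum": ["active", "retired", None],
            "description": "Optional status filter (active, retired, or null).",
        },
    },
    "required": ["artifactBundleId"],
}

print(description_coverage(bare_schema))       # 0.0
print(description_coverage(described_schema))  # 1.0
```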

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states 'List the golden cases for an artifact bundle' with the specific API endpoint. Verb is 'list' and resource is 'golden cases for artifact bundle', distinguishing it from sibling list tools for other resources.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives like list_artifact_bundles or other list tools. Permission note ('Requires artifact_bundles:read') is present but does not inform usage context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

rubrkit_list_job_events: List job events (grade A)

Read append-only job progress events. Requires jobs:read for API keys.

ParametersJSON Schema

Name	Required	Description	Default
`jobId`	Yes
`limit`	No

Tool Definition Quality

A3.5/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It discloses the read-only nature and append-only behavior of events, and mentions auth requirements. However, it does not cover error handling or pagination details.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is very concise: one sentence plus an auth note, with no extraneous information. It is well-structured and easy to parse.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With 2 parameters, no output schema, and a list operation, the description is incomplete. It lacks details about what is returned, pagination behavior, and event contents, leaving significant gaps for the agent.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, and the description does not mention the parameters (jobId, limit) at all, leaving the agent to infer their purpose solely from names and schema constraints.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'Read' and resource 'append-only job progress events', distinguishing it from sibling list tools that list other entities like artifact bundles or audits.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description includes an authorization requirement ('Requires jobs:read for API keys'), which provides context for when the tool can be used, but lacks guidance on when to choose this over alternative list tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

rubrkit_list_rubr_flow_conversions: List rubr_flow conversions (grade B)

List conversion history. Requires artifact_bundles:read for API keys.

ParametersJSON Schema

Name	Required	Description	Default
`limit`	No
`artifactBundleId`	Yes

Tool Definition Quality

B3.1/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It discloses the permission requirement (read) but does not explicitly state the operation is read-only, nor does it describe pagination, ordering, or output format. Some value is added by the permission hint.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise with two short sentences that convey core purpose and a permission requirement. It is well-structured and front-loaded without any wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has two parameters (one required) and no output schema, the description is too minimal. It lacks parameter explanations, return value description, and details on pagination or effective usage.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description must explain parameters. It does not mention artifactBundleId or limit, leaving their semantics entirely to the schema. The description adds no value for parameter understanding.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'List conversion history', directly aligning with the tool name and title. It also adds a permission requirement, providing useful context for usage.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description only mentions a permission requirement but offers no guidance on when to use this tool versus siblings like read_rubr_flow_conversion or convert_to_rubr_flow. No context on filtering or alternatives is given.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

rubrkit_poll_job: Poll job progress (grade B)

Read async job progress. Requires jobs:read for API keys.

ParametersJSON Schema

Name	Required	Description	Default
`jobId`	Yes
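
Because the tool reads async job progress, a client will typically call it in a loop. The sketch below shows one way to do that under stated assumptions: the response is taken to be a dict with a `status` field, and the terminal values `succeeded`, `failed`, and `cancelled` are guesses, since the listing documents neither. The `poll` callable stands in for a real `rubrkit_poll_job` call.

```python
import time
from typing import Any, Callable

# Assumed terminal states; the listing does not document the job status values.
TERMINAL_STATUSES = {"succeeded", "failed", "cancelled"}


def wait_for_job(
    poll: Callable[[str], dict[str, Any]],
    job_id: str,
    interval_seconds: float = 2.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Call `poll(job_id)` until the (assumed) status field is terminal."""
    for attempt in range(max_attempts):
        result = poll(job_id)
        if result.get("status") in TERMINAL_STATUSES:
            return result
        if attempt < max_attempts - 1:
            sleep(interval_seconds)
    raise TimeoutError(f"job {job_id} did not finish after {max_attempts} polls")


# Stand-in for a real rubrkit_poll_job call: "running" twice, then "succeeded".
responses = iter([{"status": "running"}, {"status": "running"}, {"status": "succeeded"}])
final = wait_for_job(lambda _job_id: next(responses), "job-123", sleep=lambda _s: None)
print(final["status"])  # succeeded
```

Passing `poll` and `sleep` in as callables keeps the loop independent of any particular MCP client and makes it testable without real delays.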

Tool Definition Quality

B3.4/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description bears full burden. It discloses the read nature and permission requirement, but fails to describe the return format (e.g., what 'progress' means, status fields, polling behavior), leaving behavior partially ambiguous.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences with clear separation of purpose and auth requirement—efficient and front-loaded, with no wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Adequate for a simple tool with one parameter, but lacks description of the return value or behavior beyond 'progress', which is important since no output schema exists.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description adds no meaning beyond the input schema, which has 0% description coverage. Although 'jobId' is self-explanatory, the description does not elaborate on its format, constraints, or how to obtain the ID.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'Read' and the resource 'async job progress', which is specific and distinguishes it from sibling tools like rubrkit_list_job_events that list events rather than progress.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No usage guidelines are provided; the description only mentions a permission requirement ('Requires jobs:read for API keys') but does not specify when to use this tool over alternatives or any prerequisites beyond auth.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

rubrkit_read_api_doc: Read Rubrkit API doc (grade A)

Read a docs/api Markdown guide or OpenAPI source by slug, file name, or docs/api path.

ParametersJSON Schema

Name	Required	Description	Default
`slugOrPath`	Yes

Tool Definition Quality

A3.8/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It states a non-destructive read operation but doesn't disclose any potential behaviors such as error cases, permissions, or return format.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is one concise sentence that front-loads the verb and resource. It could be slightly more structured but is efficient and clear.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple read tool with one parameter and no output schema or annotations, the description provides the core purpose. However, it lacks details on return format or any constraints, which would add completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The single parameter 'slugOrPath' has minimal schema definition (just type and minLength). The description adds significant meaning by explaining it can be a slug, file name, or docs/api path, which compensates for the low schema coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly specifies the verb 'Read' and the resource 'docs/api guide or OpenAPI source', along with how to locate it (by slug, file name, or path). This distinguishes it effectively from sibling tools like search_api_docs or read_artifact_bundle.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for reading specific API docs but doesn't explicitly state when to use this tool over alternatives or provide any context about prerequisites or when not to use it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

rubrkit_read_artifact_bundle: Read artifact bundle (grade A)

Read artifact bundle details, files, and file tree. Requires artifact_bundles:read for API keys.

Parameters

Name	Required	Description	Default
`artifactBundleId`	Yes

Tool Definition Quality

A3.6/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It clearly states it is a read operation ('Read'), which implies no side effects, and specifies what is returned (details, files, file tree). While it could explicitly note idempotency, the description adequately conveys the safe read-only nature.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences with no superfluous information. The first sentence states the tool's function, and the second adds a permission requirement. Every word contributes value, making it highly concise.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the lack of output schema, the description provides a high-level overview ('details, files, and file tree') but does not specify what details include. It is adequate for a simple read tool but could be more complete by listing typical fields or explaining the output structure.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The single parameter 'artifactBundleId' is not explained in the description beyond its presence in the schema. With 0% schema description coverage, the description should add context (e.g., where to find the ID, format expectations), but it does not. The parameter name is self-explanatory, but there is no guidance on how to obtain the ID.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Read artifact bundle details, files, and file tree,' specifying the verb (read), resource (artifact bundle), and the sub-resources (details, files, file tree). This distinguishes it from sibling tools like rubrkit_read_artifact_file and rubrkit_list_artifact_bundles, which have different scopes.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description only mentions a permission requirement ('Requires artifact_bundles:read for API keys') but does not provide explicit guidance on when to use this tool versus alternatives (e.g., rubrkit_list_artifact_bundles for listing all bundles). No when-to-use or when-not-to-use information is given.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

rubrkit_read_artifact_file: Read artifact file (grade C)

Read the latest file metadata and content. Requires files:read for API keys.

Parameters

Name	Required	Description	Default
`fileId`	Yes
`artifactBundleId`	Yes

Tool Definition Quality

C2.4/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description must convey behavioral traits. It states 'Read' (non-destructive) and the required permission, but does not disclose whether it returns multiple versions, pagination, error handling, or if the 'latest' file is always the current version. The description is vague about the operation's exact behavior.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two short sentences, which is concise and front-loaded. However, it sacrifices essential details, making it under-specified. It earns no extra credit for brevity because it omits critical information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness1/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With two required parameters, no output schema, and no annotations, the description is very incomplete. It lacks information about return values, error conditions, parameter relationships, and how this tool fits among many sibling read tools. The agent would struggle to use this tool correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, meaning the parameters have no descriptions in the schema. The tool description does not add any meaning to 'fileId' or 'artifactBundleId', nor explain their relationship. The agent must infer their purpose from names alone, which is insufficient.
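
To make the coverage figure concrete, here is a minimal sketch of how schema description coverage could be computed, assuming it means the share of input properties that carry a non-empty `description`. The scoring tool's real formula is not shown on this page. The function name, the `string` types, and the description texts in `documented` are all hypothetical.

```python
def description_coverage(input_schema):
    """Fraction of schema properties that have a non-empty description."""
    props = input_schema.get("properties", {})
    if not props:
        return 1.0  # nothing to document
    described = sum(
        1 for spec in props.values()
        if str(spec.get("description", "")).strip()
    )
    return described / len(props)


# Schema shape as listed above: two required IDs, no descriptions.
bare = {
    "type": "object",
    "properties": {
        "fileId": {"type": "string"},
        "artifactBundleId": {"type": "string"},
    },
    "required": ["fileId", "artifactBundleId"],
}

# Hypothetical descriptions showing how the gap could be closed.
documented = {
    "type": "object",
    "properties": {
        "fileId": {
            "type": "string",
            "description": "ID of the file to read within the bundle.",
        },
        "artifactBundleId": {
            "type": "string",
            "description": "ID of the artifact bundle that owns the file.",
        },
    },
    "required": ["fileId", "artifactBundleId"],
}

print(description_coverage(bare))        # 0.0
print(description_coverage(documented))  # 1.0
```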

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states 'Read the latest file metadata and content,' which clearly identifies the action (read) and resource (artifact file). It distinguishes from sibling tools like 'rubrkit_list_artifact_files' (which lists files) and 'rubrkit_read_artifact_bundle' (which reads a bundle). However, it does not explicitly mention the required IDs (artifactBundleId, fileId) in the description, relying on the schema.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description only mentions a prerequisite ('Requires files:read for API keys') but provides no guidance on when to use this tool versus alternatives like 'rubrkit_list_artifact_files' or 'rubrkit_read_artifact_bundle'. It does not explain the use case or conditions for invocation.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

rubrkit_read_audit: Read audit (grade C)

Read a single audit run. Requires artifact_bundles:read for API keys.

Parameters

Name	Required	Description	Default
`auditRunId`	Yes
`artifactBundleId`	Yes

Tool Definition Quality

C2.6/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description should disclose behavioral traits. It only mentions 'Read' and a permission requirement, but does not confirm safety (e.g., idempotency, no side effects), response format, or any constraints beyond the permission.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise with two sentences, the first clearly stating purpose. However, it is under-informative for a tool with no output schema and no parameter descriptions, making it less effective than it could be.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the simplicity of the tool (2 required params, no output schema), the description fails to provide essential context such as typical use cases, return value hints, or how the audit run is identified. It is minimally complete at best.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description must add meaning. It does not explain what 'artifactBundleId' or 'auditRunId' represent, nor how they relate to the audit run. Parameter names provide limited hints, but the description adds no value beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'Read' and the resource 'a single audit run', distinguishing it from sibling tools like rubrkit_list_audits which list multiple runs. However, it does not explicitly name alternative tools or specify when to use this over others.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. It only states a prerequisite (permission requirement) but does not explain context or exclusion criteria.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

rubrkit_read_drift_observations: Read drift observations (grade A)

List the recorded observations for a drift monitor via /api/v1/drift-monitors/{driftMonitorId}/observations. Requires artifact_bundles:read (or artifacts:pull) for API keys.

Parameters

Name	Required	Description	Default
`driftMonitorId`	Yes
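
The endpoint template in the description can be filled in as sketched below. The path comes from the description above; the helper name `observations_path` and the sample ID `"monitor-123"` are hypothetical, and the host to prepend is not given in the listing.

```python
from urllib.parse import quote


def observations_path(drift_monitor_id):
    """Fill the {driftMonitorId} placeholder, percent-encoding the ID."""
    encoded = quote(drift_monitor_id, safe="")
    return f"/api/v1/drift-monitors/{encoded}/observations"


# "monitor-123" is a placeholder, not a real monitor ID.
print(observations_path("monitor-123"))
```

Encoding the ID with `safe=""` keeps a stray slash in the value from changing the path structure.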

Tool Definition Quality

A3.6/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries full burden. It mentions required permissions but does not disclose behavioral traits such as idempotence, pagination, rate limits, or error handling. The read-only nature is implied but not explicit.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is very concise with two sentences, each adding essential information: the operation and the required permissions. No redundant content.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema and minimal description, it lacks details on response format, potential empty lists, or filtering. It is adequate for a simple read operation but incomplete for full understanding.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, yet the description adds no details about the driftMonitorId parameter beyond the API path. It does not explain how to obtain it or its meaning, relying on the parameter name.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool lists observations for a drift monitor, specifying the API endpoint. It distinguishes from siblings like rubrkit_list_drift_monitors which lists monitors themselves.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage context: to fetch observations for a specific drift monitor. It does not explicitly state when not to use or alternatives, but the purpose is clear enough for an agent to infer.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

rubrkit_read_eval: Read eval (grade B)

Read one eval run, including the stats statistical verdict on advanced-mode runs (empty for standard runs). Requires artifact_bundles:read for API keys.

Parameters

Name	Required	Description	Default
`evalRunId`	Yes
`artifactBundleId`	Yes

Tool Definition Quality

B3.3/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries full burden. It states 'Read one eval run' implying non-destructive behavior and mentions an auth requirement. However, it does not disclose error handling, idempotency, or side effects beyond presumption of read-only.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is very concise (two sentences) but sacrifices completeness by omitting parameter and output details. While efficient, it does not earn its brevity given the need for more information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Despite low complexity, the description fails to cover parameter meanings (schema coverage 0%) and does not hint at return values (no output schema). This leaves significant gaps for a tool that requires two IDs.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 0% coverage for parameter descriptions, and the description does not explain evalRunId or artifactBundleId. With no additional context, the agent must infer their meaning from names and UUID formats, which is insufficient.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Read one eval run,' specifying a single resource and a read operation. The name and title align, and the tool is distinct from siblings like 'rubrkit_list_evals' and 'rubrkit_run_evals'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Covers when to use (reading one eval run) and a key prerequisite (requires artifact_bundles:read). Does not explicitly exclude alternatives or list when-not-to-use, but the context from sibling tools implies differentiation.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

rubrkit_read_input_drift: Read input drift (grade A)

Compare recent production inputs against a prior reference window (population stability index) for an artifact bundle via GET /api/v1/artifact-bundles/{artifactBundleId}/drift/input. Requires artifact_bundles:read (or artifacts:pull) for API keys.

Parameters

Name	Required	Description	Default
`binCount`	No
`windowDays`	No
`artifactBundleId`	Yes
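
Since the description names the population stability index without explaining it, the sketch below shows the textbook PSI calculation over equal-width bins. It is not Rubrkit's implementation: treating `binCount` as the number of histogram bins, binning a numeric feature of each input (for example its length), and the `eps` floor for empty bins are all assumptions.

```python
import math


def population_stability_index(reference, recent, bin_count=10, eps=1e-6):
    """Textbook PSI between two non-empty numeric samples.

    Bins are equal-width over the reference range; values outside that
    range are clamped into the first or last bin.
    """
    lo = min(reference)
    hi = max(reference)
    width = (hi - lo) / bin_count or 1.0  # avoid zero width for constant data

    def proportions(values):
        counts = [0] * bin_count
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bin_count - 1)
            counts[idx] += 1
        # Floor each proportion at eps so the logarithm stays defined.
        return [max(c / len(values), eps) for c in counts]

    ref_p = proportions(reference)
    rec_p = proportions(recent)
    return sum((r - e) * math.log(r / e) for e, r in zip(ref_p, rec_p))


reference = [0.1 * i for i in range(100)]
shifted = [v + 5 for v in reference]
print(population_stability_index(reference, reference))  # identical samples
print(population_stability_index(reference, shifted))    # shifted sample
```

Identical samples give a PSI of zero, and the value grows as the recent distribution moves away from the reference window.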

Tool Definition Quality

A3.5/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries full burden. It mentions GET and read permissions, indicating read-only behavior, but does not detail side effects, error conditions, or output format. Adequate but not comprehensive.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is appropriately short: two sentences covering purpose, endpoint, and auth. No unnecessary words, front-loaded with the core action.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema, no annotations, and three unelaborated parameters, the description is incomplete. It lacks details on what the tool returns, how to interpret results, and the meaning of binCount and windowDays for effective use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema coverage, the description should explain parameters. It only indirectly addresses artifactBundleId and windowDays via context, but binCount is entirely unexplained. This is a significant gap.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's function: comparing recent production inputs against a reference window using population stability index. It specifies the HTTP endpoint and authentication requirements, and the name plus description effectively differentiate it from sibling drift-related tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no explicit guidance on when to use this tool versus alternatives like 'run_drift_check'. It only implies usage by stating what the tool does, but lacks context for selection.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

rubrkit_read_openapi_contract: Read Rubrkit OpenAPI contract (grade A)

Read the canonical docs/api/openapi.yaml contract.

Parameters

Name	Required	Description	Default
No parameters

Tool Definition Quality

A3.5/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, and the description only says 'Read' without disclosing behavioral details such as output format, caching, authentication, or side effects. Even for a read operation, this is minimal transparency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Extremely concise, single sentence with no wasted words. Verb and resource are front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given zero parameters and no output schema, the description is minimal but adequate for a simple read. However, it could mention what the tool returns or how it relates to sibling read tools.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

No parameters exist, so baseline is 4. The description adds nothing beyond the schema, but no additional parameter information is needed.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the action ('Read') and the specific resource ('canonical docs/api/openapi.yaml contract'), which distinguishes it from sibling tools like rubrkit_read_api_doc.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives (e.g., rubrkit_read_api_doc, rubrkit_search_api_docs). The agent is left to infer usage context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

rubrkit_read_proof_report: Read proof report (grade C)

Read one proof report. Requires artifact_bundles:read for API keys.

Parameters

Name	Required	Description	Default
`proofReportId`	Yes
`artifactBundleId`	Yes

Tool Definition Quality

C2.9/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided, so description carries full burden. It states 'Read' (non-destructive operation) and a permission requirement. However, it does not disclose error behavior (e.g., missing report), idempotency, or any side effects. Adequate but minimal.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Extremely concise: two short sentences, front-loaded with the core purpose. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema and 0% parameter description coverage, the description is too brief. It does not describe the return value or the significance of the UUIDs, leaving the agent with insufficient context to use the tool correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%. The description does not explain the purpose or relationship of 'proofReportId' and 'artifactBundleId'. With two UUID parameters and no schema documentation, the description fails to clarify their roles.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states 'Read one proof report' – specific verb and resource. However, among siblings there are many 'read' tools (e.g., read_artifact_bundle, read_audit), and the description does not differentiate what a proof report is versus those, limiting sibling distinction.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Mentions a prerequisite ('Requires artifact_bundles:read for API keys') but provides no guidance on when to use this tool versus alternatives, no when-not-to-use, and no context about the relationship to sibling tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

rubrkit_read_rubr_flow_conversion: Read rubr_flow conversion (grade C)

Read one conversion report. Requires artifact_bundles:read for API keys.

Parameters

Name	Required	Description	Default
`conversionId`	Yes
`artifactBundleId`	Yes

Tool Definition Quality

C2.8/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries full burden for behavioral disclosure. It only states the tool reads a report and mentions an auth requirement, omitting any details about potential errors (e.g., if the conversion doesn't exist), rate limits, idempotency, or return behavior. This is insufficient for a critical read operation.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise (two short sentences), which is positive, but it lacks any structure describing parameters, behavior, or output. It sacrifices necessary detail for brevity, resulting in an incomplete specification.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness1/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has two parameters with no schema descriptions, no annotations, and no output schema, the description fails to provide adequate context. It does not explain the relationship between the parameters, the structure of the conversion report, or any usage tips, making it insufficient for an agent to use correctly without prior knowledge.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has two parameters (conversionId, artifactBundleId) with 0% description coverage, and the description adds no explanation about what these parameters represent or how they map to the report. The agent must infer their meaning from context, which is risky and incomplete.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Read') and the resource ('one conversion report'), which precisely identifies the tool's function. It effectively distinguishes itself from the sibling tool rubrkit_list_rubr_flow_conversions by specifying the singular nature of the read operation.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description mentions a prerequisite ('Requires artifact_bundles:read for API keys'), which provides some usage context. However, it does not provide guidance on when to use this tool versus alternatives like list conversions, or any conditions under which the tool should be avoided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

rubrkit_read_validity_drift: Read validity drift (grade B)

Correlate rubric scores against recorded outcome metrics for an artifact bundle to check whether the rubric is still predictive via GET /api/v1/artifact-bundles/{artifactBundleId}/drift/validity. Requires artifact_bundles:read (or artifacts:pull) for API keys.

Parameters

Name	Required	Description	Default
`artifactBundleId`	Yes
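
The description does not say which statistic the endpoint uses to correlate rubric scores with outcomes. As one plausible reading, the sketch below computes a Pearson correlation coefficient; the choice of Pearson, the function name, and the sample numbers are assumptions for illustration only.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Made-up rubric scores and outcome metrics, one pair per evaluated output.
rubric_scores = [1, 2, 3, 4]
outcome_metrics = [2, 4, 6, 8]
print(pearson(rubric_scores, outcome_metrics))
```

Under this reading, a coefficient that falls toward zero over time would suggest the rubric is losing predictive value.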

Tool Definition Quality

B3.2/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description must disclose behavioral traits fully. It specifies the HTTP method (GET) and required permissions, but omits side effects, error handling, pagination, or result format. The lack of output schema description further limits transparency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences: the first conveys the core action and endpoint, the second covers authorization. No redundant information, front-loaded with the primary purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Missing output specification. The description implies a boolean check but does not confirm the return type (e.g., score, percentage, boolean). Error conditions or resource existence checks are absent. Given no output schema, more detail would be needed for complete contextual understanding.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The sole parameter artifactBundleId is referenced in the endpoint path and is self-explanatory as an artifact bundle ID. However, schema coverage is 0% and the description adds no format details, examples, or validation rules to compensate. Baseline score is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool correlates rubric scores against outcome metrics to check predictive validity, with a specific HTTP GET endpoint. This distinguishes it from siblings like rubrkit_read_drift_observations or rubrkit_read_input_drift by focusing on validity drift.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives such as rubrkit_read_drift_observations or rubrkit_run_drift_check. Beyond the required permission, the description gives no prerequisites or conditions for use, leaving the agent without selection criteria.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

rubrkit_record_outcome: Record outcome (grade A)

Record a production outcome event (input feature, metric, and/or label) for an artifact bundle via POST /api/v1/artifact-bundles/{artifactBundleId}/outcomes. Requires artifact_bundles:write for API keys.

ParametersJSON Schema

Name	Required	Description	Default
`inputHash`	No
`occurredAt`	No
`inputFeature`	No
`outcomeLabel`	No
`outcomeMetric`	No
`artifactBundleId`	Yes
`artifactVersionId`	No
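
As a rough illustration of the endpoint named in the description, the sketch below builds (but does not send) the POST request. The host, the Bearer auth scheme, and the value types of the body fields are assumptions; only the path and the parameter names come from the listing.

```python
import json
import urllib.request
from datetime import datetime, timezone
from urllib.parse import quote

BASE_URL = "https://rubrkit.example"  # hypothetical host, not from the listing


def build_outcome_request(api_key, artifact_bundle_id, **fields):
    """Build (but do not send) the POST request for one outcome event."""
    # Drop optional fields that were not supplied
    body = {name: value for name, value in fields.items() if value is not None}
    url = (
        f"{BASE_URL}/api/v1/artifact-bundles/"
        f"{quote(artifact_bundle_id, safe='')}/outcomes"
    )
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # auth scheme is assumed
        },
    )


req = build_outcome_request(
    "test-key",
    "bundle_123",
    outcomeMetric=0.82,           # value type is assumed
    outcomeLabel="accepted",      # value type is assumed
    occurredAt=datetime(2026, 1, 1, tzinfo=timezone.utc).isoformat(),
    inputHash=None,               # omitted from the body
)
print(req.get_method(), req.full_url)
```

The key needs the artifact_bundles:write scope, per the description. Whether the server accepts an ISO 8601 timestamp for occurredAt is not stated in the listing.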

Tool Definition Quality

A3.6/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries full burden. It discloses the authentication requirement and HTTP method, but lacks details on side effects, error states, idempotency, or what happens if the artifact bundle doesn't exist.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences with no fluff, but it could be more structured (e.g., listing parameters or providing examples). It is front-loaded with the action.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 7 parameters, no output schema, and no annotations, the description is incomplete. It does not explain the return value, error conditions, or the purpose of several parameters. It is adequate for a basic understanding but lacks depth for safe invocation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 0%, so the description must compensate. It only hints at 'input feature, metric, and/or label', which correspond to three parameters, and artifactBundleId appears only in the endpoint path. It does not explain inputHash, occurredAt, or artifactVersionId, leaving 3 of 7 parameters entirely undocumented.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'Record', the resource 'production outcome event for an artifact bundle', and includes the API endpoint. It differentiates from sibling tools like rubrkit_create_artifact_bundle or rubrkit_add_golden_case by specifying the exact action and resource.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly mentions the required permission 'artifact_bundles:write' and implies recording for artifact bundles. However, it does not provide explicit when-to-use vs alternatives or when not to use, though the context is clear from the sibling names.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

rubrkit_restore_artifact_bundle_version: Restore artifact bundle version (grade C)

Restore files from a prior artifact-bundle version by creating new file versions. Requires artifact_bundles:write and files:write for API keys.

ParametersJSON Schema

Name	Required	Description	Default
`message`	No
`versionNumber`	Yes
`artifactBundleId`	Yes

Tool Definition Quality

C2.9/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description carries full burden. It discloses that the tool creates new file versions (write operation) and requires specific permissions. However, it does not state whether existing files are overwritten, if the operation is reversible, or any potential side effects. Partial transparency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences with no wasted words. It front-loads the primary action, then states the required permissions. Very efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema and 3 undocumented parameters, the description lacks important context. It does not explain what happens to current files, whether all files are restored, or the nature of the response. For a tool with significant side effects, this is insufficient.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description should explain parameters. It does not describe any of the three parameters (message, versionNumber, artifactBundleId) individually. No meaning is added beyond the schema's basic field definitions, leaving the agent to guess their purpose.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it restores files from a prior artifact-bundle version by creating new file versions. The verb 'restore' and resource 'artifact bundle version' are specific. However, it does not explicitly distinguish from the sibling tool 'rubrkit_restore_file_version', which restores a single file version.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description mentions required permissions (artifact_bundles:write and files:write), which is a prerequisite. But it provides no guidance on when to use this tool versus alternatives like 'rubrkit_restore_file_version' or other sibling tools. No context on scenarios or exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

rubrkit_restore_file_version: Restore file version (grade C)

Restore a prior file version by creating a new version. Requires files:write for API keys.

ParametersJSON Schema

Name	Required	Description	Default
`fileId`	Yes
`message`	No
`versionNumber`	Yes
`artifactBundleId`	Yes
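
"Restore a prior file version by creating a new version" suggests an append-only history. The sketch below is one plausible reading of that behavior, using an in-memory stand-in; the class and its methods are hypothetical, and the listing does not confirm that earlier versions are left untouched.

```python
from dataclasses import dataclass, field


@dataclass
class FileHistory:
    """In-memory stand-in for a file's version list (illustrative only)."""

    versions: list = field(default_factory=list)  # index 0 holds version 1

    def write(self, content, message=""):
        self.versions.append({"content": content, "message": message})
        return len(self.versions)  # the new version number

    def restore(self, version_number, message=""):
        if not 1 <= version_number <= len(self.versions):
            raise ValueError(f"no such version: {version_number}")
        old = self.versions[version_number - 1]
        # Restoring appends a copy; earlier versions are never rewritten
        return self.write(old["content"], message or f"Restore v{version_number}")


history = FileHistory()
history.write("draft")         # becomes version 1
history.write("revised")       # becomes version 2
restored = history.restore(1)  # appends a copy of version 1 as a new version
print(restored, len(history.versions))
```

Under this reading, a restore is itself reversible: the version that was current before the restore is still in the history and can be restored in turn.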

Tool Definition Quality

C2.9/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are present, so the description must carry the full burden. It mentions 'by creating a new version' and the auth scope, but does not disclose side effects, error conditions, or what happens to the current version.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two concise sentences with no unnecessary words. First sentence states the purpose, second adds a critical auth requirement.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 4 parameters, no output schema, and no annotations, the description is insufficient. It omits parameter purposes, return values, and usage context.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0% and the description does not explain any of the 4 parameters (fileId, message, versionNumber, artifactBundleId). With zero compensation, the description adds no value beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Restore a prior file version by creating a new version', specifying the verb and resource. It distinguishes from sibling tools like rubrkit_restore_artifact_bundle_version by focusing on file versions, not bundle versions.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives. It only mentions an auth requirement but does not provide context for selection among siblings or exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

rubrkit_retire_golden_case: Retire golden case (grade A)

Retire a golden case via DELETE /api/v1/golden-cases/{goldenCaseId}. Requires artifact_bundles:write for API keys.

ParametersJSON Schema

Name	Required	Description	Default
`goldenCaseId`	Yes

Tool Definition Quality

A3.5/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided, so description carries full burden. Mentions it's a DELETE request (destructive) and requires a permission, but lacks details on whether it's a soft or hard delete, reversibility, or side effects.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Efficient two-sentence description with no fluff. Includes endpoint and permission info.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given simplicity (1 param, no output schema), covers basic use but omits response or idempotency information. Missing details on what happens after retirement.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%. The description implies the goldenCaseId is the path parameter but does not explain its purpose beyond the endpoint. Minimal but sufficient.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states the action ('Retire'), the resource ('golden case'), and the HTTP method (DELETE). Distinguishes from siblings that add or list golden cases.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides permission requirement but does not explicitly state when to use vs alternatives or when not to use. Implicitly clear from sibling names but not explicit.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

rubrkit_run_drift_check: Run drift check now (grade A)

Queue a drift check immediately instead of waiting for the monitor's schedule, via POST /api/v1/drift-monitors/{driftMonitorId}/check. The first check on a new monitor calibrates its baseline. Costs one audit per repeat. Requires artifact_bundles:write for API keys.

ParametersJSON Schema

Name	Required	Description	Default
`driftMonitorId`	Yes

Tool Definition Quality

A4.1/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses that first check calibrates baseline, costs one audit per repeat, and required auth scope. No annotations exist, so description carries full burden; covers key behaviors but could elaborate on async nature.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four sentences, front-loaded with purpose. Each sentence adds distinct value: action and endpoint, side effects, cost, permissions. No fluff.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Lacks explanation of what happens after queuing (e.g., response, how to poll). Does not connect to matching sibling tool (poll_job) for status checking. Missing output schema further reduces completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Only parameter is driftMonitorId, mentioned in endpoint path. With 0% schema coverage, description provides minimal additional meaning (implicit from endpoint). Does not explicitly explain how to obtain the ID.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states the tool queues a drift check immediately, distinguishes from waiting for schedule. Specific verb+resource with endpoint.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Describes when to use (immediate check) and important caveats (baseline calibration, cost, permissions). Does not explicitly mention when not to use or directly name sibling alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

rubrkit_run_evals: Run evals (grade C)

Start an async eval run. Requires evals:run for API keys. Set mode: "advanced" with baselineVersionId and candidateVersionId to run a statistically-backed A/B (repeats defaults to 3, max 5).

ParametersJSON Schema

Name	Required	Description	Default
`mode`	No
`repeats`	No
`maxCases`	No
`testCases`	No
`auditRunId`	No
`sourceFileId`	No
`rubricVersion`	No
`candidateFileId`	No
`artifactBundleId`	Yes
`baselineVersionId`	No
`candidateVersionId`	No
`sourceFileVersionId`	No
`sourceVersionNumber`	No
`candidateFileVersionId`	No
`candidateVersionNumber`	No
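
The constraints stated in the description (advanced mode needs both version IDs; repeats defaults to 3 with a maximum of 5) can be checked client-side before calling the tool. This is a minimal sketch: the helper name is hypothetical, and the lower bound of 1 for repeats is an assumption.

```python
DEFAULT_REPEATS = 3  # stated in the tool description
MAX_REPEATS = 5      # stated in the tool description


def build_ab_eval_args(artifact_bundle_id, baseline_version_id,
                       candidate_version_id, repeats=DEFAULT_REPEATS):
    """Assemble arguments for an advanced-mode A/B eval run."""
    if not baseline_version_id or not candidate_version_id:
        raise ValueError(
            "advanced mode needs baselineVersionId and candidateVersionId"
        )
    if not 1 <= repeats <= MAX_REPEATS:  # the lower bound of 1 is assumed
        raise ValueError(f"repeats must be between 1 and {MAX_REPEATS}")
    return {
        "artifactBundleId": artifact_bundle_id,
        "mode": "advanced",
        "baselineVersionId": baseline_version_id,
        "candidateVersionId": candidate_version_id,
        "repeats": repeats,
    }


args = build_ab_eval_args("bundle_123", "version_a", "version_b")
print(args["repeats"])
```

The remaining parameters (maxCases, testCases, and the file and version selectors) are left out because the listing does not document how they interact with advanced mode.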

Tool Definition Quality

C2.6/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description correctly identifies the operation as async, a key behavioral trait. However, with no annotations provided, it fails to disclose other important aspects like mutation side effects, return value (job ID or status), or error handling. The description does not indicate whether it is safe or destructive, leaving an agent uninformed.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is short (three sentences) and front-loaded with the purpose. However, given the tool's complexity (15 parameters, no output schema), it sacrifices necessary detail for brevity. It is concise, but at the cost of completeness.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness1/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description fails to cover essential context for an agent: parameter semantics, return value, polling mechanism for async completion, dependencies, or relationship to other tools like poll_job. With high complexity and no output schema, this description is critically incomplete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Although the schema has 0% description coverage, the description touches on only four of the 15 input parameters (mode, repeats, baselineVersionId, candidateVersionId). It does not explain what artifactBundleId, maxCases, testCases, or any of the remaining parameters do. For those, the agent must rely solely on parameter names and types, which are insufficient without descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'Start' and the resource 'async eval run'. It distinguishes from siblings like list_evals and read_eval by indicating a new run is initiated. However, it does not elaborate on the async nature or how it differs from other mutation tools like start_audit.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description mentions a required permission ('Requires evals:run for API keys'), which gives usage context. But it lacks explicit guidance on when to use this tool versus alternatives such as list_evals or read_eval. No when-not-to-use or alternative suggestions are provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

rubrkit_search_api_docs: Search Rubrkit API docs (grade B)

Search the canonical docs/api Markdown guides and OpenAPI operation summaries.

ParametersJSON Schema

Name	Required	Description	Default
`limit`	No
`query`	Yes
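
For reference, an MCP client invokes a tool by sending a JSON-RPC 2.0 `tools/call` request. The sketch below builds such a message for this tool; the query text is illustrative, and treating `limit` as an integer is an assumption, since the schema carries no descriptions.

```python
import json


def tools_call(request_id, tool_name, arguments):
    """Build a JSON-RPC 2.0 'tools/call' request as used by MCP."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


message = tools_call(
    1,
    "rubrkit_search_api_docs",
    {"query": "drift monitor", "limit": 5},  # limit as an integer is assumed
)
payload = json.dumps(message)
print(payload)
```

The same envelope applies to every tool on this page; only `name` and `arguments` change.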

Tool Definition Quality

B3.1/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description carries full burden but only specifies what is searched, not how (e.g., full-text vs keyword, ranking, pagination, or result format). Critical behavioral details are missing.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that front-loads the core purpose. Every word earns its place with no extraneous content.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given two parameters, no annotations, and no output schema, the description is incomplete. It omits search behavior, result format, and parameter details, leaving significant gaps for effective use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description does not elaborate on the parameters. With 0% schema description coverage, it should clarify query semantics (e.g., free text, phrase) and limit behavior. Parameter names alone are insufficient.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's function: searching through 'canonical docs/api Markdown guides and OpenAPI operation summaries.' This specific verb-resource combination distinguishes it from sibling tools like rubrkit_read_api_doc, which retrieves a single document.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. It does not mention when-not-to-use or suggest related tools like rubrkit_read_api_doc for specific document access.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

rubrkit_set_drift_monitor_status: Set drift monitor status (grade A)

Pause or resume a drift monitor via PATCH /api/v1/drift-monitors/{driftMonitorId}. Requires artifact_bundles:write for API keys.

ParametersJSON Schema

Name	Required	Description	Default
`status`	Yes
`driftMonitorId`	Yes

Tool Definition Quality

A3.6/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the burden. It mentions the required permission ('artifact_bundles:write') and the HTTP method (PATCH), which are useful. However, it does not disclose side effects, error conditions, or what happens upon success.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences with no fluff. The first sentence gives the action and endpoint; the second states the auth requirement. Every word earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers the basic purpose and auth but omits return behavior, error handling, and the effect of pausing vs. resuming. With no annotations or output schema, more context would be beneficial.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, and the description does not explain the parameters beyond the action. The driftMonitorId and status roles are only inferable from the endpoint and the action, providing minimal added value over the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Pause or resume') and the resource ('drift monitor'), and includes the HTTP method and endpoint for specificity. It distinguishes from sibling tools by focusing on status change.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies when to use (when needing to pause or resume a drift monitor) but provides no explicit guidance on when not to use or alternatives among sibling tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

rubrkit_start_audit: Start audit (grade A)

Start an async audit job. Audit results are cached per account by a content hash of their inputs; pass force: true to bypass the cache and run a fresh audit (the same as the CLI's --no-cache flag). Requires audits:run for API keys.

ParametersJSON Schema

Name	Required	Description	Default
`force`	No
`modelTier`	No
`rubricKey`	No
`artifactType`	No
`targetFileId`	No
`reasoningEffort`	No
`artifactBundleId`	Yes
`customRequirements`	No
`targetVersionNumber`	No
`artifactBundleVersionNumber`	No
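
The caching behavior described above (results cached per account by a content hash of the inputs, with force: true bypassing the cache) can be sketched as follows. SHA-256 over canonical JSON, the class name, and the choice to store a forced result back into the cache are all assumptions; the listing does not specify how the hash is computed.

```python
import hashlib
import json


class AuditCache:
    """Per-account result cache keyed by a hash of the audit inputs."""

    def __init__(self):
        self._results = {}  # (account_id, content_hash) -> result

    @staticmethod
    def content_hash(inputs):
        # Canonical JSON so that key order does not change the hash (assumed)
        canonical = json.dumps(inputs, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def run(self, account_id, inputs, audit_fn, force=False):
        """Return (result, cache_hit). force=True always runs audit_fn."""
        key = (account_id, self.content_hash(inputs))
        if not force and key in self._results:
            return self._results[key], True
        result = audit_fn(inputs)
        self._results[key] = result  # storing forced results is an assumption
        return result, False


calls = []


def fake_audit(inputs):
    """Stand-in for a real audit; counts how many times it actually runs."""
    calls.append(inputs)
    return {"run": len(calls)}


cache = AuditCache()
inputs = {"artifactBundleId": "bundle_123", "rubricKey": "clarity"}
first = cache.run("acct_1", inputs, fake_audit)
second = cache.run("acct_1", inputs, fake_audit)
forced = cache.run("acct_1", inputs, fake_audit, force=True)
print(first, second, forced)
```

In this sketch the second call is served from the cache without running the audit, while the forced call runs it again.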

Tool Definition Quality

A3.7/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Describes async nature, caching, force bypass, and auth requirements. Lacks details on result retrieval or monitoring, but covers key behavioral aspects given no annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences, front-loaded with main action, efficient explanation of caching and auth. No unnecessary words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a complex tool with 10 parameters, no output schema, and no annotations, the description is insufficient. It explains only one parameter and omits how to handle results, how to monitor progress, and what the tool returns.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Only 'force' parameter is explained in description; the other 9 parameters (including required artifactBundleId) are left to the schema, which has 0% description coverage. Description adds minimal semantic value beyond schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states 'Start an async audit job', specifying verb and resource. It distinguishes from sibling tools like rubrkit_list_audits or rubrkit_read_audit by focusing on starting a new audit.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides guidance on caching behavior and use of force parameter, along with required permission (audits:run). Does not explicitly state when not to use or compare to siblings, but context is clear enough.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

rubrkit_update_artifact_file: Update artifact file (grade B)

Write a new editable version for a file. Requires files:write for API keys.

ParametersJSON Schema

Name	Required	Description	Default
`fileId`	Yes
`content`	Yes
`message`	No
`mediaType`	No
`artifactBundleId`	Yes

Tool Definition Quality

B3.1/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description bears full responsibility for behavioral disclosure. It states the tool creates a new version (mutation), but does not clarify whether this overwrites the existing file, how versioning works, or any side effects. The description lacks details on what happens to old versions, idempotency, or concurrency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description consists of two short sentences: the first states the action, the second states the permission. It is highly concise, front-loaded with the core functionality, and contains no redundant information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (5 parameters, no output schema, no annotations), the description is overly brief. It omits crucial information such as return values, error conditions, file existence requirements, version management, and parameter constraints. The description leaves significant gaps for effective agent usage.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 0%, meaning no parameter descriptions are provided in the schema. The description does not explain any of the five parameters (fileId, content, message, mediaType, artifactBundleId), nor does it clarify their roles beyond the bare names. Thus, it adds minimal value for understanding parameters.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's action: 'Write a new editable version for a file.' The verb 'write' and the specific object 'editable version for a file' make the purpose unambiguous. It distinguishes itself from siblings like 'read_artifact_file' (read) and 'upload_artifact_files' (initial upload).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description mentions the permission requirement 'Requires files:write for API keys,' which is a prerequisite. However, it provides no guidance on when to use this tool versus alternatives (e.g., upload_artifact_files, restore_file_version), nor does it specify when not to use it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
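
To make the five parameter names concrete, here is a minimal sketch of what a call to this tool might look like as an MCP `tools/call` request. The JSON-RPC envelope follows the MCP specification; every argument value (the IDs, the content, the media type) is a made-up placeholder, and the assumption that all five parameters are plain strings is not confirmed by the listing, which gives names and required flags only.

```python
import json


def build_tool_call(request_id, tool_name, arguments):
    """Wrap tool arguments in an MCP JSON-RPC 2.0 `tools/call` request."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Placeholder values: the listing documents parameter names only,
# so the string types and example IDs below are assumptions.
update_args = {
    "artifactBundleId": "bundle-123",   # required
    "fileId": "file-456",               # required
    "content": "# Rubric\n\nUpdated scoring criteria.\n",  # required
    "message": "Tighten scoring criteria",  # optional
    "mediaType": "text/markdown",           # optional
}

request = build_tool_call(1, "rubrkit_update_artifact_file", update_args)
print(json.dumps(request, indent=2))
```

Since the description does not say what the call returns or how older versions are treated, an agent would still need to inspect the response before relying on the new version.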

rubrkit_upload_artifact_files: Upload artifact files (B)

Upload one or more text files into an artifact bundle. Requires files:write for API keys.

Parameters

Name	Required	Description	Default
`files`	Yes
`artifactBundleId`	Yes

Tool Definition Quality

B (3.1/5.0)

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description bears the full burden. It mentions 'text files', but the schema allows any string content. It does not disclose what happens if the bundle does not exist, whether existing files are overwritten, or whether file size limits apply. The permission note is useful but insufficient for complete transparency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence plus a permission note, which is concise. However, it is too brief given the parameter complexity (2 required params, 5 optional sub-properties). It could include more detail without being verbose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The tool has no annotations, no output schema, and 0% parameter description coverage. Its parameters are nested and complex, yet the description provides minimal context. An agent would struggle to understand return values, error conditions, or file constraints.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, and the description adds no meaning to the parameters: artifactBundleId, and files, whose entries carry path, content, message, mediaType, and artifactType. The agent gets no hints about how to populate these fields correctly.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action 'Upload one or more text files into an artifact bundle' and mentions the required permission 'files:write for API keys'. It distinguishes itself from sibling tools such as read, update, delete, and list by specifying the unique action of uploading files.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides a clear precondition: the tool requires the files:write permission. It implies when to use the tool (to add files to a bundle) but does not explicitly state when not to use it or suggest alternatives, such as the update tool for existing files.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
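
The nested `files` parameter is the part an agent is most likely to get wrong, so the sketch below shows one plausible way to assemble the arguments. It uses only the field names reported in the review (path, content, mediaType); `build_upload_arguments` is a hypothetical helper, the bundle ID and file contents are placeholders, and which sub-properties are actually required is an assumption, since the schema carries no descriptions. `message` and `artifactType` are left out because their accepted values are not documented.

```python
def build_upload_arguments(artifact_bundle_id, texts_by_path, media_type=None):
    """Build arguments for rubrkit_upload_artifact_files (hypothetical helper).

    texts_by_path maps a file path to its text content. Entries are
    emitted in sorted path order so the result is deterministic.
    """
    files = []
    for path, content in sorted(texts_by_path.items()):
        entry = {"path": path, "content": content}
        if media_type is not None:
            # Assumption: mediaType is an optional per-file string.
            entry["mediaType"] = media_type
        files.append(entry)
    return {"artifactBundleId": artifact_bundle_id, "files": files}


# Placeholder bundle ID and file contents.
upload_args = build_upload_arguments(
    "bundle-123",
    {
        "rubrics/tone.md": "# Tone rubric\n",
        "rubrics/accuracy.md": "# Accuracy rubric\n",
    },
    media_type="text/markdown",
)
```

The resulting dictionary would go in the `arguments` field of an MCP `tools/call` request whose `name` is `rubrkit_upload_artifact_files`.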

Claim this connector by publishing a /.well-known/glama.json file on your server's domain with the following structure:

{
  "$schema": "https://glama.ai/mcp/schemas/connector.json",
  "maintainers": [{ "email": "your-email@example.com" }]
}

The email address must match the email associated with your Glama account. Once published, Glama will automatically detect and verify the file within a few minutes.
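
Before publishing, the file can be checked locally. The following is a minimal sketch, not Glama's actual verification logic: `check_glama_json` is a hypothetical helper, the exact-match email comparison and the strict `$schema` check are assumptions, and the example address is the placeholder from the structure above.

```python
import json

EXPECTED_SCHEMA = "https://glama.ai/mcp/schemas/connector.json"


def check_glama_json(text, account_email):
    """Return a list of problems found in a glama.json document (empty if none)."""
    try:
        doc = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(doc, dict):
        return ["top-level value must be an object"]

    problems = []
    # Assumption: the $schema value should match the one shown above.
    if doc.get("$schema") != EXPECTED_SCHEMA:
        problems.append("unexpected or missing $schema")

    maintainers = doc.get("maintainers")
    if not isinstance(maintainers, list) or not maintainers:
        problems.append("maintainers must be a non-empty list")
        return problems

    emails = [m.get("email") for m in maintainers if isinstance(m, dict)]
    # Assumption: the comparison is an exact string match.
    if account_email not in emails:
        problems.append("no maintainer email matches the account email")
    return problems


sample = """{
  "$schema": "https://glama.ai/mcp/schemas/connector.json",
  "maintainers": [{ "email": "your-email@example.com" }]
}"""
print(check_glama_json(sample, "your-email@example.com"))
```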
