Ownership verified

Server Details

Drive OctoPerf load testing from any AI agent — import, edit, validate, run scenarios, read metrics. Hosted remote server, OAuth 2.1 (DCR + PKCE), no API key.

Status
Healthy
Last Tested
Transport
Streamable HTTP
URL

Glama MCP Gateway

Connect through Glama MCP Gateway for full control over tool access and complete visibility into every call.

Full call logging

Every tool call is logged with complete inputs and outputs, so you can debug issues and audit what your agents are doing.

Tool access control

Enable or disable individual tools per connector, so you decide what your agents can and cannot do.

Managed credentials

Glama handles OAuth flows, token storage, and automatic rotation, so credentials never expire on your clients.

Usage analytics

See which tools your agents call, how often, and when, so you can understand usage patterns and catch anomalies.

100% free. Your data is private.
Tool Descriptions: A

Average 4.4/5 across 96 of 96 tools scored. Lowest: 3.7/5.

Server Coherence: A

Disambiguation: 4/5

Most tools have clearly distinct purposes, aided by detailed descriptions. However, the presence of both update_ and patch_ variants for the same entity (e.g., update_scenario vs patch_scenario) and similar validation tools (get_validation_failure_detail vs fetch_validation_http_body) could cause confusion without careful reading. Overall, disambiguation is good but not perfect.

Naming Consistency: 5/5

All tool names follow a consistent verb_noun pattern in snake_case, making them predictable. Even compound verbs like 'sanity_check' and long phrases like 'add_correlation_framework_to_project' maintain the pattern. The naming is highly consistent.

Tool Count: 2/5

With 96 tools, the server vastly exceeds the typical well-scoped range of 3-15 tools. While each tool may be justified by the complexity of the performance testing platform, the high count makes the tool set overwhelming and reduces coherence for an agent.

Completeness: 4/5

The tool surface is comprehensive, covering creation, reading, updating, deletion, validation, scheduling, import/export, and analysis for most entities. Minor gaps exist, such as no direct list of all bench results (only via reports) and no bench result deletion, but these are workable.

Available Tools

96 tools
add_correlation_framework_to_project: Add correlation framework to project (grade A)

[Design] Add the rules of a correlation framework preset (SAML, OAuth, .NET, Java, Token, AzureAD, or a custom one) to an OctoPerf design project's rule library. Bulk-creates every rule of the framework into the project, skipping rules that are already present (structural dedupe ignoring id/userId, mirrors the OctoPerf UI behaviour). Returns the rules that were actually created. The rules are not yet wired into any Virtual User — call apply_correlations_to_virtual_user next on each affected VU to materialise extractors and injections in its action tree.

Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| projectId | Yes | OctoPerf project id where the framework's rules will be created. | |
| frameworkId | Yes | OctoPerf correlation framework id (see `list_correlation_frameworks`). | |
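
The "structural dedupe ignoring id/userId" described above can be pictured with a minimal sketch. It is an illustration under assumptions, not the server's implementation: the rule fields `variableName` and `regex` are hypothetical, and only top-level `id` and `userId` keys are ignored.

```python
import json
from typing import Any, Dict, List

# Fields ignored when comparing rules, per the tool description.
IGNORED_KEYS = {"id", "userId"}


def structural_key(rule: Dict[str, Any]) -> str:
    """Serialize a rule without its identity fields so equal content compares equal."""
    stripped = {k: v for k, v in rule.items() if k not in IGNORED_KEYS}
    return json.dumps(stripped, sort_keys=True)


def rules_to_create(existing: List[Dict[str, Any]],
                    framework: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return the framework rules whose content is not already in the project."""
    seen = {structural_key(rule) for rule in existing}
    created = []
    for rule in framework:
        key = structural_key(rule)
        if key not in seen:
            seen.add(key)
            created.append(rule)
    return created


# Hypothetical rule payloads; the real rule schema is not shown on this page.
project_rules = [
    {"id": "r1", "userId": "u1", "variableName": "SAMLResponse", "regex": "a(.+)b"},
]
framework_rules = [
    {"id": "f1", "userId": "u9", "variableName": "SAMLResponse", "regex": "a(.+)b"},
    {"id": "f2", "userId": "u9", "variableName": "RelayState", "regex": "c(.+)d"},
]

created = rules_to_create(project_rules, framework_rules)
print([rule["variableName"] for rule in created])  # ['RelayState']
```

Only the second framework rule is reported as created, matching the description's statement that the tool returns the rules that were actually created.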

Output Schema

| Name | Required | Description |
| --- | --- | --- |
| rules | No | |

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description goes beyond annotations by detailing bulk-creation, structural deduplication (skipping existing rules), return of only created rules, and the fact that rules are not wired into Virtual Users. This adds valuable behavioral context that annotations do not cover.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single dense paragraph with no redundant information. It front-loads the key action and follows with important details (dedup, return, next step). Every sentence is essential.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (bulk-creation, dedup, follow-up action) and the presence of an output schema, the description fully covers the behavior, return value, and next steps. It is complete for an agent to use correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Both parameters are fully described in the schema, so baseline is 3. The description adds value by specifying that frameworkId refers to frameworks from 'list_correlation_frameworks', and context about presets. This enriches parameter understanding beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific verb 'Add' and resource 'correlation framework to project', lists the presets, and distinguishes it from sibling tools like 'apply_correlations_to_virtual_user' and 'create_correlation_rule' by explaining the bulk creation and dedup behavior. It fully clarifies what the tool does.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for when to use this tool (to add a framework preset) and includes a direct instruction to call 'apply_correlations_to_virtual_user' next. However, it does not explicitly mention when not to use it or list alternatives, though the sibling context implies alternatives exist.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

apply_correlations_to_virtual_user: Apply correlations to virtual user (grade A, Destructive)

[Design] Submit an async correlation task: the OctoPerf backend re-walks the Virtual User's action tree, runs every active correlation rule on the recorded responses and rewires matching actions to use the extracted ${variable}. This is the step that makes a freshly-created correlation rule actually effective. The tool returns the taskId immediately — poll get_task_result with that taskId every few seconds until the task settles before deciding what to do next.

Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| virtualUserId | Yes | OctoPerf Virtual User id to recompute with the project's correlation rules. | |
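
The submit-then-poll pattern described above might look like the following sketch. The shape of the `get_task_result` payload is not documented on this page, so the `state` field, its values, and the `fake_get_task_result` stand-in are hypothetical; the "settled" test is passed in as a callable for that reason.

```python
import time
from typing import Any, Callable, Dict


def poll_task(
    get_task_result: Callable[[str], Dict[str, Any]],
    task_id: str,
    is_settled: Callable[[Dict[str, Any]], bool],
    interval_s: float = 3.0,
    timeout_s: float = 300.0,
) -> Dict[str, Any]:
    """Call get_task_result(task_id) every interval_s seconds until it settles."""
    deadline = time.monotonic() + timeout_s
    while True:
        result = get_task_result(task_id)
        if is_settled(result):
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} did not settle within {timeout_s}s")
        time.sleep(interval_s)


# Stand-in for the real tool call: reports RUNNING twice, then DONE.
# The "state" field and its values are hypothetical, not the real payload.
_fake_states = iter(["RUNNING", "RUNNING", "DONE"])


def fake_get_task_result(task_id: str) -> Dict[str, Any]:
    return {"taskId": task_id, "state": next(_fake_states)}


final = poll_task(
    fake_get_task_result,
    "task-123",
    is_settled=lambda r: r["state"] != "RUNNING",
    interval_s=0.0,  # no waiting in this demo; the description suggests a few seconds
)
print(final)  # {'taskId': 'task-123', 'state': 'DONE'}
```

A timeout is included so an agent does not wait forever on a task that never settles; the 300-second default is an arbitrary choice for the sketch.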

Output Schema

| Name | Required | Description |
| --- | --- | --- |
| taskId | No | |
| virtualUserId | No | |

Behavior: 5/5

Discloses async nature, immediate taskId return, need to poll, and that it modifies the virtual user (destructive). Annotations already mark destructiveHint=true, but description adds valuable behavioral context.

Conciseness: 5/5

Three sentences, well-structured, front-loaded with purpose, no unnecessary words.

Completeness: 5/5

Given async complexity and output schema presence, description covers necessary actions (submitting and polling) and expected side effects.

Parameters: 3/5

Only one parameter with full schema coverage. Description adds minimal extra meaning beyond schema, but baseline 3 is appropriate.

Purpose: 5/5

The description clearly states the tool submits an async correlation task that re-walks the virtual user's action tree and applies correlation rules, which distinguishes it from creating rules.

Usage Guidelines: 4/5

Gives clear context that it's for making correlation rules effective after creation, and instructs to poll get_task_result. Does not explicitly mention when not to use.

backup_virtual_user: Backup virtual user (grade A)

[Design] Create a safety backup of an OctoPerf Virtual User before a hard-to-reverse change (auto-correlation, large action-tree edits). Duplicates the VU in the same project (the full action tree is copied; recorded request/response bodies are not) and tags the copy `backup` (plus an optional `label` tag) so it is easy to find and restore. Non-destructive: the original VU is untouched. OctoPerf has no built-in VU versioning, so run this BEFORE applying correlation rules or editing the tree. Returns the backup copy's id, name, tags, timestamps and a `url` deep-link to it in the OctoPerf web UI.

Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| label | No | Optional reason/marker added as an extra tag on the backup (e.g. "pre-correlation"). Leave null for a generic `backup` tag only. | |
| virtualUserId | Yes | OctoPerf Virtual User id to back up. The copy is created in the same project. | |

Output Schema

| Name | Required | Description |
| --- | --- | --- |
| id | No | |
| url | No | |
| name | No | |
| tags | No | |
| created | No | Timestamp as epoch milliseconds. |
| description | No | |
| lastModified | No | Timestamp as epoch milliseconds. |

Behavior: 5/5

The description details non-destructive behavior (original untouched), what is copied (full action tree) and not copied (recorded request/response bodies), tagging with 'backup' and optional label, and the return values. This goes beyond annotations, which only provide readOnlyHint and destructiveHint flags. No contradictions.

Conciseness: 5/5

The description is a well-structured single paragraph: purpose first, then details of what is copied, tagging, non-destructive nature, usage timing, and return values. Every sentence is informative with no wasted words.

Completeness: 5/5

Given the tool has 2 parameters, an output schema, and the description explains return values (id, name, tags, timestamps, url), the description is complete for an agent to understand how to use and what to expect.

Parameters: 4/5

Schema coverage is 100%, but the description adds context: label is an optional reason/marker, virtualUserId creates the copy in the same project. This adds meaning beyond the schema descriptions.

Purpose: 5/5

The description clearly states the tool creates a backup of an OctoPerf Virtual User before hard-to-reverse changes, specifying the verb 'backup' and resource 'Virtual User'. It distinguishes from siblings like delete_virtual_user or update_virtual_user by focusing on duplication and tagging.

Usage Guidelines: 4/5

The description explicitly says to run this BEFORE applying correlation rules or editing the tree, and notes that OctoPerf has no built-in versioning, providing a clear use case. It does not explicitly state when not to use, but the guidance is strong.

create_constant_variable: Create constant variable (grade A)

[Design] Create a Constant variable in an OctoPerf design project. The same value is returned every time the variable is read at runtime. Useful for environment-specific tokens, hosts, URLs and any other static parameterization input. Returns the created variable's metadata.

Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| name | Yes | Variable name. Referenced from Virtual User actions via ${name}. | |
| value | Yes | Constant value the variable returns at runtime. | |
| projectId | Yes | OctoPerf project id where the variable will be created. | |
| description | No | Human-readable description. Defaults to empty. | |
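
As a rough illustration of how a `${name}` reference resolves to a constant, the sketch below uses Python's `string.Template`, which happens to share the `${...}` placeholder syntax. This only mimics the idea; OctoPerf's own substitution engine is not shown on this page, and the variable names and values are made up.

```python
from string import Template

# Hypothetical constant variables, as they might be defined per environment.
constants = {"host": "staging.example.com", "apiVersion": "v2"}

# Every read of ${host} or ${apiVersion} yields the same value.
url = Template("https://${host}/api/${apiVersion}/login").substitute(constants)
print(url)  # https://staging.example.com/api/v2/login
```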

Output Schema

| Name | Required | Description |
| --- | --- | --- |
| id | No | |
| url | No | |
| name | No | |
| type | No | |
| usage | No | |
| value | No | |
| description | No | |

Behavior: 4/5

Annotations already indicate non-destructive creation. Description adds that the same value is always returned, and returns metadata. No contradictions.

Conciseness: 5/5

Four sentences, front-loaded with purpose, then behavior, use cases, and the return value. No filler; each sentence adds unique information.

Completeness: 4/5

Given low complexity and existing output schema, description covers creation behavior and return value. Lacks explicit prerequisite mention but schema requires projectId.

Parameters: 3/5

Schema covers all parameters with descriptions. Description only repeats the naming convention (${name}) already in schema. Adds minimal extra value beyond schema.

Purpose: 5/5

Clearly states verb 'Create' and resource 'Constant variable', distinguishes from sibling variable types (counter, csv, random, secret) by specifying constant behavior. Provides concrete use cases (tokens, hosts, URLs).

Usage Guidelines: 3/5

Describes typical use cases (static parameters) but does not explicitly mention when to avoid or compare with other variable types. Agent must infer from sibling context.

create_correlation_rule: Create correlation rule (grade A)

[Design] Create a regex-based correlation rule in an OctoPerf design project. The rule captures one value from the response (BODY or HEADERS) via the regex's first capture group and re-injects it into subsequent requests at every configured injection target (PATH / BODY / HEADER / QUERY_PARAM / POST_PARAM, with PART_OF_VALUE semantics — substring replacement). Use this tool as part of an auto-correlation workflow: after a failed validate_virtual_user, inspect the recorded vs validation request/response via get_validation_failure_detail / fetch_validation_http_body, spot the dynamic value, create a rule capturing it, then call apply_correlations_to_virtual_user to recompute the VU. Returns the created rule's metadata.

Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| regex | Yes | Regex with at least one capture group `(...)`. The captured value is bound to ${variableName} and reinjected at every configured target. | |
| projectId | Yes | OctoPerf project id where the rule will be created. | |
| matchGroup | No | Regex capture group index (1-based). -1 = all occurrences (foreach), 0 = random. Defaults to 1. | |
| extractFrom | No | Response section the regex runs against: BODY or HEADERS. Defaults to BODY. | |
| defaultValue | No | Fallback value used when the regex finds no match. Defaults to empty string. | |
| variableName | Yes | Variable name the rule extracts. Referenced from VU actions as ${variableName} after apply_correlations_to_virtual_user runs. | |
| injectionTargets | Yes | Where to re-inject the captured value. Each entry maps to one of the OctoPerf injection rule types (PART_OF_VALUE semantics; no specific header/param name filter). | |
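
The capture-then-reinject idea behind a rule can be sketched as below. This is a simplified model under assumptions: it uses Python's `re` module rather than whatever regex engine OctoPerf runs, the helper names are invented, and the recorded response and request are made-up sample data. PART_OF_VALUE is modelled as plain substring replacement, as the description states.

```python
import re


def extract_first_group(regex: str, response_text: str, default_value: str = "") -> str:
    """Return the first capture group of the first match, or the fallback value."""
    match = re.search(regex, response_text)
    return match.group(1) if match else default_value


def inject_part_of_value(request_text: str, recorded_value: str, variable_name: str) -> str:
    """Replace every occurrence of the recorded value with a ${variable} reference."""
    if not recorded_value:
        return request_text  # nothing captured, leave the request untouched
    return request_text.replace(recorded_value, "${" + variable_name + "}")


# Made-up recorded traffic containing a dynamic CSRF token.
recorded_response = '<input type="hidden" name="csrf" value="abc123">'
recorded_request = "user=bob&csrf=abc123"

token = extract_first_group(r'name="csrf" value="([^"]+)"', recorded_response)
rewired = inject_part_of_value(recorded_request, token, "csrfToken")
print(token)    # abc123
print(rewired)  # user=bob&csrf=${csrfToken}
```

The empty-value guard matters because replacing an empty string would insert the variable reference between every character of the request.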

Output Schema

| Name | Required | Description |
| --- | --- | --- |
| id | No | |
| url | No | |
| enabled | No | |
| pattern | No | |
| variableName | No | |
| extractorType | No | |
| injectionTargets | No | |

Behavior: 4/5

Annotations already indicate the tool is not read-only (readOnlyHint=false) and not destructive (destructiveHint=false), so the description is not burdened with those disclosures. It adds value by explaining the injection semantics (PART_OF_VALUE, substring replacement) and the binding of captured values to a variable. No contradictions with annotations are present.

Conciseness: 4/5

The description is a single, well-organized paragraph that front-loads the core purpose and then provides workflow context. It is appropriately sized for the tool's complexity (7 parameters) and does not include fluff. Minor suggestion: could be slightly more concise by trimming some workflow repetition, but overall efficient.

Completeness: 5/5

Given the tool's complexity and the existence of an output schema, the description covers all necessary aspects: purpose, workflow integration, behavioral details, and parameter semantics. It explains when and how to use the tool in a realistic scenario (auto-correlation after validation failure), making it complete for an AI agent.

Parameters: 4/5

The input schema has 100% description coverage, so the description's parameter details are supplementary. The description reinforces the regex's first capture group, the variable binding, and the injection target semantics (PART_OF_VALUE), adding context beyond the schema. This enhances understanding, but the schema already covers the basics.

Purpose: 5/5

The description clearly states the tool creates a regex-based correlation rule in an OctoPerf design project, specifying that it captures a value from the response body or headers via a regex capture group and reinjects it into subsequent requests. It distinguishes itself from sibling tools like delete_correlation_rule or list_correlation_rules by detailing its role in an auto-correlation workflow, making its purpose unambiguous.

Usage Guidelines: 4/5

The description explicitly places the tool within an auto-correlation workflow, instructing to use it after a failed `validate_virtual_user` and before `apply_correlations_to_virtual_user`. It mentions inspecting failure details and spotting dynamic values, providing clear context. However, it does not explicitly state when not to use this tool or list alternatives beyond the workflow steps, though the intent is well conveyed.

create_counter_variable: Create counter variable (grade A)

[Design] Create a Counter variable in an OctoPerf design project. Each read returns the next value in [start, end] stepped by increment. Useful for generating unique IDs across iterations or VUs. Returns the created variable's metadata.

Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| end | No | Last value (inclusive). Defaults to Long.MAX_VALUE. | |
| name | Yes | Variable name. Referenced from Virtual User actions via ${name}. | |
| start | No | First value (inclusive). Defaults to 1. | |
| format | No | Printf-style format applied to each value (e.g. "user-%04d"). Defaults to empty (raw number). | |
| perUser | No | If true, every VU has its own counter; otherwise the counter is shared across the whole load test. Defaults to false (shared). | |
| increment | No | Step between two consecutive values. Defaults to 1. | |
| projectId | Yes | OctoPerf project id where the variable will be created. | |
| description | No | Human-readable description. Defaults to empty. | |
| resetEachIteration | No | If true, reset to `start` at the beginning of each VU iteration. Defaults to false. | |
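
The sequence a counter produces from `start`, `end`, `increment` and `format` can be previewed with a small sketch. It assumes a positive increment and simply stops after `end`; what the real counter does once it passes `end` is not stated on this page. The function name is invented for the illustration.

```python
from typing import Iterator


def counter_values(start: int = 1, end: int = 10, increment: int = 1,
                   fmt: str = "") -> Iterator[str]:
    """Yield each value in [start, end], stepped by increment, optionally formatted."""
    value = start
    while value <= end:
        # An empty format means the raw number; otherwise apply printf-style formatting.
        yield (fmt % value) if fmt else str(value)
        value += increment


print(list(counter_values(start=1, end=5, increment=2, fmt="user-%04d")))
# ['user-0001', 'user-0003', 'user-0005']
```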

Output Schema

| Name | Required | Description |
| --- | --- | --- |
| id | No | |
| url | No | |
| name | No | |
| type | No | |
| usage | No | |
| value | No | |
| description | No | |

Behavior: 4/5

The description explains the read behavior and return value. Annotations already indicate non-destructive, readOnlyHint=false. No contradictions. Adds context about counter state management during test execution.

Conciseness: 5/5

Four concise, front-loaded sentences with no waste. The first states the purpose, the second explains the behavior, the third gives the use case, and the fourth names the return value.

Completeness: 4/5

Given 9 parameters and an output schema, the description is complete enough. It covers the core behavior and purpose. While it omits details like per-user, reset, format, those are in the schema. Slightly more differentiation from other variable types would improve completeness.

Parameters: 3/5

Schema coverage is 100%, so each parameter has a description in the input schema. The tool description ties parameters into a coherent narrative (e.g., range [start, end] and increment) but does not add substantial new meaning beyond the schema.

Purpose: 5/5

The description clearly states the tool creates a counter variable, specifies its behavior (each read returns next value in [start, end] stepped by increment), and provides a use case (generating unique IDs). It distinguishes itself from sibling variable creation tools by name and behavior.

Usage Guidelines: 4/5

The description mentions the use case (unique IDs across iterations or VUs), giving some guidance on when to use. However, it does not explicitly compare to alternatives like constant or random variables, which would help agents choose correctly.

create_csv_variable: Create CSV variable (grade A)

[Design] Create a CSV variable in an OctoPerf design project. Pulls values row by row from a CSV file previously uploaded via upload_project_file. Each column listed in `names` becomes its own variable, referenced as ${columnName} at runtime; the wrapper `name` is just a label and is NOT used in the substitution. Returns the created variable's metadata.

Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| name | Yes | Variable label (display-only). Substitution at runtime uses the column names, not this; see the `names` parameter. | |
| names | Yes | Column names, one entry per column of the CSV. | |
| scope | No | PRIVATE (each VU walks its own rows) or SHARED (rows split across VUs). Defaults to SHARED. | |
| offset | No | 0-based offset of the first value row to read. Defaults to 0. | |
| shuffle | No | Shuffle the rows in memory before reading. Defaults to false. | |
| encoding | No | File encoding (e.g. UTF-8). Defaults to empty (auto-detect). | |
| filename | Yes | Name of an existing CSV file in the project (see `list_project_files`). | |
| delimiter | No | Column separator. Defaults to comma ",". | |
| projectId | Yes | OctoPerf project id where the variable will be created. | |
| description | No | Human-readable description. Defaults to empty. | |
| recycleOnEOF | No | Loop back to the first row when EOF is reached. Defaults to true. | |
| allowQuotedData | No | Honor quote-enclosed cells (handles embedded delimiters and newlines). Defaults to true. | |
| ignoreFirstLine | No | Skip the first line of the CSV (header). Defaults to false. | |
| stopThreadOnEOF | No | Stop the VU thread on EOF (overrides recycleOnEOF). Defaults to false. | |
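
The point that column names, not the wrapper `name`, drive substitution can be illustrated with a local sketch. It models only `names`, `delimiter`, `ignoreFirstLine` and `offset`; applying the header skip before the offset is an assumption, the helper name and CSV content are invented, and `string.Template` merely stands in for OctoPerf's `${...}` substitution.

```python
import csv
import io
from string import Template
from typing import Dict, List


def read_csv_variable(text: str, names: List[str], delimiter: str = ",",
                      ignore_first_line: bool = False,
                      offset: int = 0) -> List[Dict[str, str]]:
    """Map each CSV row to {columnName: value} using the configured column names."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if ignore_first_line:
        rows = rows[1:]  # drop the header line
    return [dict(zip(names, row)) for row in rows[offset:]]


# Made-up file content, as if uploaded beforehand via upload_project_file.
csv_text = "login,password\nalice,s3cret\nbob,hunter2\n"

rows = read_csv_variable(csv_text, names=["login", "password"], ignore_first_line=True)
body = Template("login=${login}&password=${password}").substitute(rows[0])
print(rows)  # [{'login': 'alice', 'password': 's3cret'}, {'login': 'bob', 'password': 'hunter2'}]
print(body)  # login=alice&password=s3cret
```

Note that the template references `${login}` and `${password}` directly; no wrapper name appears anywhere in the request.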

Output Schema

| Name | Required | Description |
| --- | --- | --- |
| id | No | |
| url | No | |
| name | No | |
| type | No | |
| usage | No | |
| value | No | |
| description | No | |

Behavior: 4/5

The description adds behavioral context beyond annotations by explaining the row-by-row pulling and the substitution mechanism where column names become runtime variables. It does not disclose error conditions or file-not-found behavior, but annotations already indicate non-destructive and non-read-only nature.

Conciseness: 5/5

The description is four sentences long, front-loaded with the core purpose, and every sentence adds value. No redundant or wasteful wording.

Completeness: 4/5

Given the rich schema descriptions (100% coverage) and the presence of an output schema, the description adequately explains the core concept and key differentiation (column name substitution). It links to related tools (upload_project_file) but could briefly mention scope or encoding options for completeness.

Parameters: 3/5

Schema description coverage is 100%, so baseline is 3. The description reinforces the parameter semantics by explaining the display-only nature of 'name' and the substitution of column names, but this adds only marginal value over the already detailed schema descriptions.

Purpose: 5/5

The description clearly states the tool creates a CSV variable in OctoPerf, specifying the source from a CSV file and explaining column substitution. It distinguishes from sibling variable creation tools by focusing on CSV-based data parameterization and clarifying the role of the 'name' parameter.

Usage Guidelines: 3/5

The description provides a prerequisite (CSV file must be uploaded via upload_project_file) but does not explicitly state when to use this tool versus alternatives like constant or random variable creation. Usage guidance is implicit rather than explicit.

create_project: Create project (grade A)

[Project] Create a new OctoPerf DESIGN project in a workspace. Projects group Virtual Users, scenarios, variables, correlation rules and files together. Returns the created project's id, workspaceId, name, description and a url deep-link to the project page in the OctoPerf web UI.

Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| name | Yes | Project name as it appears in the OctoPerf web UI. | |
| description | No | Optional human-readable description. | |
| workspaceId | Yes | OctoPerf workspace id the new project will belong to. | |

Output Schema

Parameters
Name | Required | Description
id | No |
url | No |
name | No |
description | No |
workspaceId | No |
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate mutation (readOnlyHint=false) and non-destructive (destructiveHint=false). Description adds that it returns a deep-link URL, enriching behavioral context beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences: the first states the purpose, the second explains what a project groups, and the third lists the returned fields. No extraneous words, front-loaded, every sentence earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given an output schema exists, the description fully covers creation scope, return structure (id, workspaceId, name, description, url), and project grouping, making it complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions for all parameters: the schema notes that name appears as in the UI, that description is optional, and that workspaceId is the OctoPerf workspace id. The description adds the workspace scope and the returned fields, providing some extra clarity.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description specifies the verb 'Create', the resource 'OctoPerf DESIGN project', and the scope 'in a workspace'. It clearly distinguishes from sibling tools like 'update_project' by focusing on creation.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Usage is implied by the creation verb, and sibling tools indicate alternatives (e.g., update). However, no explicit 'when not to use' or direct comparison to alternatives is provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

create_random_variable: Create random variable (Grade A)

[Design] Create a Random variable in an OctoPerf design project. Each read returns a uniformly-random integer in [minValue, maxValue], optionally formatted via a printf-style outputFormat (e.g. "%05d"). Returns the created variable's metadata.

Parameters
Name | Required | Description | Default
name | Yes | Variable name. Referenced from Virtual User actions via ${name}. |
maxValue | Yes | Upper bound (inclusive). |
minValue | Yes | Lower bound (inclusive). |
projectId | Yes | OctoPerf project id where the variable will be created. |
description | No | Human-readable description. Defaults to empty. |
outputFormat | No | Printf-style format applied to each value (e.g. "%05d"). Defaults to empty (raw number). |
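
The read behavior described above can be approximated locally. The following is a minimal Python sketch, not OctoPerf's implementation: `read_random_variable` is a hypothetical name, and it assumes that Python's `%` formatting behaves like the backend's printf-style formatting for simple integer patterns such as "%05d".

```python
import random


def read_random_variable(min_value, max_value, output_format="", rng=random):
    """Simulate one read: a uniform integer in [min_value, max_value], optionally formatted."""
    value = rng.randint(min_value, max_value)  # both bounds are inclusive
    # An empty format means the raw number is returned as-is.
    return output_format % value if output_format else str(value)


rng = random.Random(0)  # seeded only to make the demonstration repeatable
print(read_random_variable(1, 99, "%05d", rng))
print(read_random_variable(7, 7))
```

With minValue equal to maxValue the variable degenerates to a constant, which is a convenient way to check a format string.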

Output Schema

Parameters
Name | Required | Description
id | No |
url | No |
name | No |
type | No |
usage | No |
value | No |
description | No |
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Description explains that each read yields a uniformly-random integer, optionally formatted, and returns metadata. Annotations already indicate non-read-only, and description adds meaningful behavioral context beyond that.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences, no wasted words. The first states the purpose, the second covers read behavior and formatting, and the third states the return value. Front-loaded and concise.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the presence of an output schema and 6 parameters, the description covers creation, read behavior, formatting, and return. It could mention that an existing project is required, but the projectId parameter implies that.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, but description adds extra context such as the printf-style format example and the fact that name is referenced via ${name}. This adds value beyond the schema descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states it creates a random variable in OctoPerf, with specific detail about uniform integer distribution and optional formatting. This distinguishes it from sibling tools like create_constant_variable.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies use when a random integer is needed, but it does not explicitly compare this tool to other variable types (constant, counter, CSV, secret) or specify when to choose it over them.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

create_scenario_ramp_up: Create ramp-up scenario (Grade A)

[Runtime] Create a new OctoPerf Scenario with a RAMP-UP load shape: every UserProfile linearly ramps from 0 to users virtual users over rampUpSec, then stays at users for holdForSec. rampUpSec=0 collapses to an instant constant load. Each Virtual User in virtualUserIds becomes one UserProfile bound to the same providerId and one of the locations (round-robin VU[i] -> locations[i % locations.size()]). Engine defaulted from each VU's type (JMETER → JmeterUserProfileEngine, WEB_DRIVER → SeleniumUserProfileEngine, PLAYWRIGHT → PlaywrightUserProfileEngine). For richer load shapes use create_scenario_ramp_up_down (ramp + plateau + ramp-down) or create_scenario_stairs (ascending stairs). Returns the new scenario id and a url deep-link.

Parameters
Name | Required | Description | Default
name | Yes | Scenario name (human-readable). |
users | Yes | Target concurrent users at plateau, applied to EACH Virtual User profile (total = users × virtualUserIds.size()). |
locations | Yes | Region / location ids on the provider (each must be valid). UserProfiles are assigned round-robin: VU[i] -> locations[i % locations.size()]. |
projectId | Yes | OctoPerf project id where the scenario will be created. |
rampUpSec | Yes | Ramp-up duration in seconds. Use 0 for an instant constant load. |
holdForSec | Yes | Plateau duration in seconds (how long the load stays at `users`). |
providerId | Yes | Docker provider id (see `list_docker_providers_by_workspace`). |
description | No | Optional description. Defaults to empty. |
virtualUserIds | Yes | Ids of the Virtual Users to bind. One UserProfile is created per VU. |
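
To make the load shape and the round-robin rule concrete, here is a small Python sketch of the arithmetic stated above. The function names and the sample ids are hypothetical, and returning 0 after the plateau ends is an assumption about what happens once the test is over.

```python
def ramp_up_users(t_sec, users, ramp_up_sec, hold_for_sec):
    """Concurrent users of ONE UserProfile at time t_sec: linear ramp, then plateau."""
    if t_sec < 0 or t_sec > ramp_up_sec + hold_for_sec:
        return 0.0  # assumption: no load outside the scenario window
    if ramp_up_sec == 0 or t_sec >= ramp_up_sec:
        return float(users)  # rampUpSec=0 collapses to an instant constant load
    return users * t_sec / ramp_up_sec


def assign_locations(virtual_user_ids, locations):
    """Round-robin: VU[i] -> locations[i % len(locations)]."""
    return {vu: locations[i % len(locations)] for i, vu in enumerate(virtual_user_ids)}


def total_users(users, virtual_user_ids):
    """Plateau total across all profiles: users x number of Virtual Users."""
    return users * len(virtual_user_ids)


vus = ["vu-a", "vu-b", "vu-c"]  # hypothetical Virtual User ids
print(ramp_up_users(30, users=100, ramp_up_sec=60, hold_for_sec=300))
print(assign_locations(vus, ["loc-1", "loc-2"]))
print(total_users(100, vus))
```

Because `users` applies to each profile, three Virtual Users at `users=100` produce a combined plateau of 300 concurrent users.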

Output Schema

Parameters
Name | Required | Description
id | No |
url | No |
name | No |
tags | No |
projectId | No |
description | No |
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations mark the tool as non-destructive and not read-only, which matches the creation action. The description adds context on engine selection and return values. There are no contradictions, but idempotency and overwriting are not mentioned.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is a single focused paragraph with no wasted words. Front-loaded with purpose, then details, then alternatives. Efficient and well-structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 9 parameters, high complexity, and presence of output schema, the description covers all key aspects: ramp-up shape, round-robin, engine mapping, and return value. No missing critical information.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, but description adds significant meaning: explains round-robin, total users calculation, and engine defaulting per VU type. Adds value beyond schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states it creates a scenario with a RAMP-UP load shape. Verb 'create', resource 'Scenario', and specific load shape are explicit. Mentions sibling tools for richer shapes.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly tells when to use this tool vs alternatives (ramp_up_down, stairs). Describes edge case rampUpSec=0. Explains round-robin assignment and engine defaulting.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

create_scenario_ramp_up_down: Create ramp-up-down scenario (Grade A)

[Runtime] Create a new OctoPerf Scenario with a RAMP-UP-then-RAMP-DOWN load shape: every UserProfile ramps from 0 to users over rampUpSec, holds at users for holdForSec, then ramps back down to 0 over rampDownSec. Useful for soak tests with a controlled wind-down. Same VU / provider / round-robin semantics as create_scenario_ramp_up. Returns the new scenario id and a url deep-link.

Parameters
Name | Required | Description | Default
name | Yes | Scenario name (human-readable). |
users | Yes | Target concurrent users at plateau, applied to EACH Virtual User profile. |
locations | Yes | Region / location ids on the provider, round-robin per VU. |
projectId | Yes | OctoPerf project id where the scenario will be created. |
rampUpSec | Yes | Ramp-up duration in seconds. |
holdForSec | Yes | Plateau duration in seconds. |
providerId | Yes | Docker provider id (see `list_docker_providers_by_workspace`). |
description | No | Optional description. Defaults to empty. |
rampDownSec | Yes | Ramp-down duration in seconds (linear from `users` back to 0). |
virtualUserIds | Yes | Ids of the Virtual Users to bind. One UserProfile is created per VU. |
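
The three phases (ramp up, hold, ramp down) can be written as a piecewise function. This Python sketch is only an illustration of the shape described above; `ramp_up_down_users` is a hypothetical name, and a linear ramp-up is assumed to mirror the linear ramp-down stated in the parameter table.

```python
def ramp_up_down_users(t_sec, users, ramp_up_sec, hold_for_sec, ramp_down_sec):
    """Concurrent users of ONE UserProfile at time t_sec for a ramp-up-down shape."""
    hold_end = ramp_up_sec + hold_for_sec
    total = hold_end + ramp_down_sec
    if t_sec < 0 or t_sec > total:
        return 0.0  # outside the scenario window
    if t_sec < ramp_up_sec:
        return users * t_sec / ramp_up_sec  # ramp from 0 up to users
    if t_sec <= hold_end:
        return float(users)  # plateau
    return users * (total - t_sec) / ramp_down_sec  # ramp from users back to 0


for t in (30, 200, 420, 480):
    print(t, ramp_up_down_users(t, users=100, ramp_up_sec=60, hold_for_sec=300, ramp_down_sec=120))
```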

Output Schema

Parameters
Name | Required | Description
id | No |
url | No |
name | No |
tags | No |
projectId | No |
description | No |
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate this is not read-only (readOnlyHint=false). The description adds behavioral details like the load shape, per-VU semantics, and return value (id and url deep-link), which goes beyond the annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is four sentences, each serving a distinct purpose: defining the load shape, giving a use case, pointing to the shared semantics of create_scenario_ramp_up, and stating the return value. No redundant information, well front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With an output schema present, the description sufficiently explains the return value. It covers the load shape, parameter semantics, and cross-references a sibling tool. A minor gap could be explicit mention of the output format, but it's adequate.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. However, the description adds meaningful context for key parameters (e.g., 'users' applies per VU, 'virtualUserIds' creates one profile per VU) and references sibling semantics, providing extra clarity.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool creates a scenario with a specific ramp-up-then-ramp-down load shape. It distinguishes itself from the sibling 'create_scenario_ramp_up' by explicitly mentioning the ramp-down phase, making the purpose unambiguous.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

It provides a clear use case ('soak tests with a controlled wind-down') and references an alternative tool for similar semantics. While it does not explicitly state when not to use, the context is sufficient for selection.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

create_scenario_stairs: Create stairs scenario (Grade A)

[Runtime] Create a new OctoPerf Scenario with an ASCENDING STAIRS load shape: every UserProfile ramps from 0 to users in stepCount discrete steps over rampUpSec total (each step adds users/stepCount users and waits rampUpSec/stepCount before the next), then plateaus at users for holdForSec. Useful for capacity-finding tests where you want to observe the SUT's behaviour at each step. Same VU / provider / round-robin semantics as create_scenario_ramp_up. Returns the new scenario id and a url deep-link.

Parameters
Name | Required | Description | Default
name | Yes | Scenario name (human-readable). |
users | Yes | Target concurrent users at plateau, applied to EACH Virtual User profile. |
locations | Yes | Region / location ids on the provider, round-robin per VU. |
projectId | Yes | OctoPerf project id where the scenario will be created. |
rampUpSec | Yes | Total ramp duration in seconds (split equally across `stepCount` steps). |
stepCount | Yes | Number of discrete steps between 0 and `users`. Must be >= 1. |
holdForSec | Yes | Plateau duration in seconds (how long the load stays at `users` after the last step). |
providerId | Yes | Docker provider id (see `list_docker_providers_by_workspace`). |
description | No | Optional description. Defaults to empty. |
virtualUserIds | Yes | Ids of the Virtual Users to bind. One UserProfile is created per VU. |
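
The step arithmetic can be sketched as follows. This is one possible reading of the description, not a confirmed model of the backend: it assumes the first step starts at t=0, so each step adds users/stepCount users and then waits rampUpSec/stepCount seconds. `stairs_users` is a hypothetical name.

```python
def stairs_users(t_sec, users, step_count, ramp_up_sec, hold_for_sec):
    """Concurrent users of ONE UserProfile at time t_sec for an ascending-stairs shape."""
    if step_count < 1:
        raise ValueError("stepCount must be >= 1")
    if t_sec < 0 or t_sec > ramp_up_sec + hold_for_sec:
        return 0.0  # outside the scenario window
    if ramp_up_sec == 0 or t_sec >= ramp_up_sec:
        return float(users)  # plateau after the last step
    step_duration = ramp_up_sec / step_count
    # Assumption: step 1 starts at t=0, step 2 at step_duration, and so on.
    steps_started = min(int(t_sec // step_duration) + 1, step_count)
    return users * steps_started / step_count


for t in (0, 45, 119, 120):
    print(t, stairs_users(t, users=100, step_count=4, ramp_up_sec=120, hold_for_sec=300))
```

With 100 users, 4 steps and a 120-second ramp, each step adds 25 users and lasts 30 seconds under this reading.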

Output Schema

Parameters
Name | Required | Description
id | No |
url | No |
name | No |
tags | No |
projectId | No |
description | No |
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations are minimal (non-readOnly, non-destructive); the description adds behavioral details: the ramp-in-steps algorithm, plateau, same semantics as ramp-up, and return of id and URL. It transparently covers the creation behavior.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is brief (several sentences) and front-loaded with the main purpose. It efficiently covers the load shape, use case, sibling reference, and return value without excess.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the 10 parameters, 100% schema coverage, and the presence of an output schema, the description adequately covers the tool's purpose, load shape mechanics, and return. It does not mention error conditions or prerequisites, but the schema already documents required parameters.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%; the description adds significant meaning by explaining how 'users', 'rampUpSec', 'stepCount', and 'holdForSec' interact (e.g., each step adds users/stepCount users). It also clarifies 'users' applies per VU and locations are round-robin.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'Create', the resource 'Scenario', and the specific load shape 'ASCENDING STAIRS'. It distinguishes from siblings by naming the shape and referencing 'create_scenario_ramp_up' for semantic comparison.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides a clear use case ('capacity-finding tests to observe SUT behavior at each step') and references a sibling for semantics, but does not explicitly list when not to use or alternative tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

create_secret_variable: Create secret variable (Grade A)

[Design] Create a Secret variable in an OctoPerf design project. The plaintext value is sent over the authenticated HTTPS channel and persisted ENCRYPTED at rest; list_variables returns the ciphertext only, never the plaintext. Use for passwords, API tokens or other sensitive parameterization inputs. Returns the created variable's listing.

Parameters
Name | Required | Description | Default
name | Yes | Variable name. Referenced from Virtual User actions via ${name}. |
value | Yes | Plaintext value. The backend encrypts at rest; `list_variables` returns ciphertext. |
projectId | Yes | OctoPerf project id where the variable will be created. |
description | No | Human-readable description. Defaults to empty. |

Output Schema

Parameters
Name | Required | Description
id | No |
url | No |
name | No |
type | No |
usage | No |
value | No |
description | No |
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description discloses that the value is encrypted at rest and that list_variables returns only ciphertext. Annotations are present (readOnlyHint=false, destructiveHint=false) and consistent; the description adds valuable behavioral context beyond the annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four concise, front-loaded sentences. Each sentence serves a distinct purpose: purpose, security/retrieval behavior, usage guidance, and return value. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the output schema is present (context signals), the description need not detail the return value beyond 'Returns the created variable's listing.' It adequately covers the creation process and security. Minor gaps: no error handling or idempotency mention, but overall complete for a creation tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% (all 4 parameters described). The description adds meaning for 'name' (referenced as ${name}) and 'value' (plaintext, encrypted at rest), enhancing the schema's descriptions. Sibling differentiation is implicit.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb ('Create') and resource ('Secret variable'), and specifies it is within an OctoPerf design project. It distinguishes from sibling variable creation tools (e.g., create_constant_variable) by focusing on sensitive parameterization inputs.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly states the tool is for sensitive inputs like passwords and API tokens, giving clear context for when to use it. However, it does not explicitly mention when not to use it or provide direct alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

create_trend_report_by_creation_date: Create trend report by creation date (Grade A)

[Analysis] Create a TREND bench report seeded from a reference benchResult plus a TrendReportCreationDateSelector — the backend pulls other benchResults from the same project created within the given date range. Pass at least one of fromMs / toMs (epoch milliseconds, UTC). Open-ended ranges are supported: omit fromMs for "up to toMs", omit toMs for "from fromMs onwards". Returns the new report's id, name, description, benchResultIds, tags, lastModified and a url deep-link to the analysis page.

Parameters
Name | Required | Description | Default
toMs | No | Upper bound (epoch milliseconds, UTC). Omit for open-ended "from fromMs onwards". |
fromMs | No | Lower bound (epoch milliseconds, UTC). Omit for open-ended "up to toMs". |
shownResults | Yes | Maximum number of trend points the backend renders. |
referenceBenchResultId | Yes | OctoPerf benchResult id that anchors the trend. |
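
Since both bounds are epoch milliseconds in UTC and at least one must be present, a client may want a small helper to build the arguments. The sketch below is a hedged example: `creation_date_arguments` and `to_epoch_ms` are hypothetical helpers, and the benchResult id is a placeholder.

```python
from datetime import datetime, timezone


def to_epoch_ms(dt):
    """Convert a timezone-aware datetime to epoch milliseconds."""
    if dt.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return int(dt.timestamp() * 1000)


def creation_date_arguments(reference_bench_result_id, shown_results, from_dt=None, to_dt=None):
    """Build tool arguments, omitting whichever bound is left open-ended."""
    if from_dt is None and to_dt is None:
        raise ValueError("pass at least one of from_dt / to_dt")
    args = {
        "referenceBenchResultId": reference_bench_result_id,
        "shownResults": shown_results,
    }
    if from_dt is not None:
        args["fromMs"] = to_epoch_ms(from_dt)
    if to_dt is not None:
        args["toMs"] = to_epoch_ms(to_dt)
    return args


# Open-ended range: everything created from 2024-01-01 (UTC) onwards.
print(creation_date_arguments("BENCH_RESULT_ID_PLACEHOLDER", 10,
                              from_dt=datetime(2024, 1, 1, tzinfo=timezone.utc)))
```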

Output Schema

Parameters
Name | Required | Description
id | No |
url | No |
name | No |
tags | No |
description | No |
lastModified | No | Timestamp as epoch milliseconds.
benchResultIds | No |
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate a non-destructive write. The description adds details about backend behavior (pulling other benchResults from the same project) and the output fields, which is valuable beyond the annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Well-structured with each sentence providing unique information. No redundancy, adequate length for the complexity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers mechanism, parameter constraints, and return fields. It does not cover error handling or prerequisites, but it is sufficient for a complex tool that has an output schema.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description adds usage context for parameters (open-ended ranges, at least one constraint), which enhances understanding beyond schema descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it creates a TREND bench report using a creation date selector, differentiating from siblings like create_trend_report_by_name and create_trend_report_by_tags. The verb 'Create' is specific and the resource is well-defined.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit guidance on parameter usage (at least one of fromMs/toMs, open-ended ranges) and return value. Sibling names imply alternatives, but no exclusionary language.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

create_trend_report_by_name: Create trend report by name (Grade A)

[Analysis] Create a TREND bench report seeded from a reference benchResult plus a TrendReportNameSelector — the backend pulls other benchResults from the same project whose scenario name matches the search string under the chosen searchType (EQUALS / EQUALS_IGNORECASE / CONTAINS / CONTAINS_IGNORECASE / STARTS_WITH / ENDS_WITH). Returns the new report's id, name, description, benchResultIds, tags, lastModified and a url deep-link to the analysis page.

Parameters
Name | Required | Description | Default
search | Yes | Substring or pattern to match against other runs' scenario names. |
searchType | Yes | How to compare `search` against scenario names: EQUALS, EQUALS_IGNORECASE, CONTAINS, CONTAINS_IGNORECASE, STARTS_WITH or ENDS_WITH. |
shownResults | Yes | Maximum number of trend points the backend renders. |
referenceBenchResultId | Yes | OctoPerf benchResult id that anchors the trend. |
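
The six searchType values suggest the following matching rules. This Python sketch infers the semantics from the enum names alone, so it is an assumption rather than the backend's actual comparison logic; `scenario_name_matches` is a hypothetical helper.

```python
# Assumed semantics, derived only from the enum names.
MATCHERS = {
    "EQUALS": lambda name, search: name == search,
    "EQUALS_IGNORECASE": lambda name, search: name.lower() == search.lower(),
    "CONTAINS": lambda name, search: search in name,
    "CONTAINS_IGNORECASE": lambda name, search: search.lower() in name.lower(),
    "STARTS_WITH": lambda name, search: name.startswith(search),
    "ENDS_WITH": lambda name, search: name.endswith(search),
}


def scenario_name_matches(name, search, search_type):
    """Return True when a scenario name matches `search` under `search_type`."""
    try:
        matcher = MATCHERS[search_type]
    except KeyError:
        raise ValueError(f"unknown searchType: {search_type}") from None
    return matcher(name, search)


print(scenario_name_matches("Nightly checkout", "checkout", "CONTAINS"))
print(scenario_name_matches("Nightly checkout", "nightly", "STARTS_WITH"))
```

Note that only EQUALS and CONTAINS have case-insensitive variants in the listed enum, so a case-insensitive prefix match would need CONTAINS_IGNORECASE or a normalized naming scheme.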

Output Schema

Parameters
Name | Required | Description
id | No |
url | No |
name | No |
tags | No |
description | No |
lastModified | No | Timestamp as epoch milliseconds.
benchResultIds | No |
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations (readOnlyHint=false, destructiveHint=false) confirm it's a non-destructive write. The description adds details on the backend behavior (pulling benchResults from the same project with name matching) and the return fields (id, name, etc., including a deep-link URL). No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single dense paragraph that efficiently conveys the purpose and mechanism. It front-loads the action and then details the matching logic and return fields. Minor improvement could be structuring return fields more concisely, but no wasteful sentences.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (4 required params, with an output schema present), the description adequately explains the algorithm and return fields. It does not mention prerequisites (e.g., existence of reference benchResult) or error handling, but for an AI agent the core information is sufficient.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions for all 4 parameters. The description adds value by explaining how 'search' and 'searchType' work together (e.g., the enumeration of search types and their matching behavior), which goes beyond the schema's individual descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'create', the resource 'TREND bench report', and the specific mechanism: seeded from a reference benchResult and matching scenario names via search and searchType. It distinguishes from sibling tools (create_trend_report_by_creation_date, create_trend_report_by_tags) by specifying name-based matching.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for name-based trend reports but does not explicitly state when to use this versus alternatives. It provides context on the matching behavior but lacks direct guidance on when not to use it or mention of sibling tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

create_trend_report_by_tags: Create trend report by tags (Grade A)

[Analysis] Create a TREND bench report seeded from a reference benchResult plus a TrendReportTagsSelector — the backend pulls other benchResults from the same project that carry ALL the given tags and plots them on the trend axis. Mirrors the OctoPerf UI's trend creation flow (POST default-report then PUT with a TrendReportConfig). Returns the new report's id, name, description, benchResultIds (just the reference at create-time; the rest is materialised by the backend on each read), tags, lastModified and a url deep-link to the analysis page.

Parameters
Name | Required | Description | Default
tags | Yes | Tag set the other benchResults must carry (ALL tags must match: intersection, not union). |
shownResults | Yes | Maximum number of trend points the backend renders (reference + up to N-1 selector matches; older matches drop off first). |
referenceBenchResultId | Yes | OctoPerf benchResult id that anchors the trend. |
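
The selection rule (ALL tags must match, reference plus up to N-1 matches, older matches dropped first) can be sketched in Python as below. The record fields `id`, `tags` and `created` are hypothetical stand-ins for whatever the backend stores, and `select_trend_results` is an illustrative helper, not an OctoPerf API.

```python
def select_trend_results(reference, candidates, tags, shown_results):
    """Pick the trend points: the reference plus the newest matches carrying ALL tags."""
    required = set(tags)
    matches = [
        c for c in candidates
        if c["id"] != reference["id"] and required <= set(c["tags"])  # subset test = ALL tags
    ]
    matches.sort(key=lambda c: c["created"], reverse=True)  # newest first
    kept = matches[: max(shown_results - 1, 0)]  # older matches drop off first
    return [reference] + kept


reference = {"id": "ref", "tags": ["nightly", "checkout"], "created": 400}
candidates = [
    {"id": "a", "tags": ["nightly", "checkout"], "created": 100},
    {"id": "b", "tags": ["nightly"], "created": 200},  # missing "checkout": excluded
    {"id": "c", "tags": ["nightly", "checkout", "eu"], "created": 300},
    {"id": "d", "tags": ["checkout", "nightly"], "created": 50},
]
selected = select_trend_results(reference, candidates, ["nightly", "checkout"], shown_results=3)
print([r["id"] for r in selected])
```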

Output Schema

Parameters
Name | Required | Description
id | No |
url | No |
name | No |
tags | No |
description | No |
lastModified | No | Timestamp as epoch milliseconds.
benchResultIds | No |
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description goes well beyond annotations by detailing the backend behavior: fetching other benchResults carrying ALL tags, plotting them, and materializing the rest on each read. It also lists return fields including a deep-link URL, providing rich behavioral context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single paragraph of three sentences, each earning its place. It could be slightly more structured (e.g., bullet points for return fields), but it is efficient and not verbose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the low parameter count (3) and the presence of an output schema, the description covers the creation flow, backend behavior, and return values completely. No gaps identified for the complexity level.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Although schema coverage is 100%, the description adds significant meaning: for 'tags' it clarifies intersection semantics ('ALL tags must match'), and for 'shownResults' it explains the maximum count and drop-off behavior ('older matches drop off first'). This adds value beyond schema descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description uses specific verb 'Create' and resource 'TREND bench report', clearly distinguishing from sibling tools like 'create_trend_report_by_creation_date' that create reports by different criteria. It explains the seeding mechanism with reference benchResult and tags.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for when to use: to create a trend report from tags, mirroring the OctoPerf UI flow. It does not explicitly exclude alternatives or state when not to use, but the context of sibling tools implies differentiation.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

delete_bench_reportDelete bench reportA
Destructive

[Analysis] Delete an OctoPerf bench report by id. DESTRUCTIVE — drops the report (items, configs, comments) but keeps the underlying benchResults intact and reachable by their own ids. Other reports referencing the same benchResults still work. Use only after the user has confirmed the deletion.
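The "use only after the user has confirmed" rule can be enforced on the client side before any request is built. The sketch below assumes the standard MCP JSON-RPC `tools/call` envelope; the helper name `build_delete_report_call` and the sample report id are hypothetical.

```python
import json


def build_delete_report_call(report_id, user_confirmed, request_id=1):
    """Build a tools/call request for delete_bench_report, gated on confirmation."""
    if not user_confirmed:
        # Refuse to even construct the request without explicit user consent.
        raise PermissionError("deletion not confirmed by the user")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "delete_bench_report",
            "arguments": {"reportId": report_id},
        },
    }


request = build_delete_report_call("report-123", user_confirmed=True)
print(json.dumps(request, indent=2))
```

Raising before the payload exists keeps the destructive call out of any retry or logging path that might otherwise replay it.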

Parameters
Name | Required | Description | Default
reportId | Yes | OctoPerf bench report id to delete. | -
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description goes beyond the `destructiveHint: true` annotation by detailing that the report and its sub-items are deleted while benchResults remain intact and other reports still work. This adds valuable behavioral context without contradicting annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is four sentences long, front-loaded with the action, and every sentence adds value: core purpose, destructive behavior, impact on other reports, and usage prerequisite. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a delete operation without an output schema, the description fully explains what is deleted, what persists, and how other entities are unaffected. It provides sufficient context for an agent to invoke the tool correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Only one parameter (`reportId`) exists, with 100% schema description coverage. The phrase 'by id' in the description confirms the parameter's role but adds no meaning beyond the schema's own description. Baseline score is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description starts with 'Delete an OctoPerf bench report by id.' It clearly specifies the verb (delete), resource (bench report), and identifier (id). It distinguishes from sibling deletion tools like delete_correlation_rule or delete_scenario by focusing on bench reports.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly warns 'DESTRUCTIVE' and states 'Use only after the user has confirmed the deletion.' It explains what is dropped vs kept, but does not explicitly state when not to use it or list alternatives. However, the context provides enough guidance for appropriate use.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

delete_correlation_ruleDelete correlation ruleA
Destructive

[Design] DELETES a correlation rule from an OctoPerf design project. DESTRUCTIVE — Virtual User actions that were correlated via this rule will keep their already-extracted variables, but new runs of apply_correlations_to_virtual_user will no longer reinject the value. Returns the deleted correlationRuleId for confirmation.

Parameters
Name | Required | Description | Default
correlationRuleId | Yes | OctoPerf correlation rule id to delete. | -
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Goes beyond annotation (destructiveHint: true) by detailing the exact consequences: existing variables are preserved but future applications of apply_correlations_to_virtual_user will not reinject. Also mentions return value.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three concise sentences that front-load the action, spell out the consequences, and state the return value. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Sufficiently complete for a single-parameter deletion tool with no output schema. Explains the effect and return value.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The single parameter correlationRuleId is already well-described in the schema. The description adds no extra detail beyond what the schema provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states 'DELETES a correlation rule from an OctoPerf design project' with specific verb and resource. Distinguishes from sibling tools like create_correlation_rule and list_correlation_rules.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides context on when to use by explaining the destructive effect on virtual user correlations. However, lacks explicit mention of when not to use or alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

delete_http_serverDelete HTTP serverA
Destructive

[Design] DELETES an HTTP server from an OctoPerf design project. DESTRUCTIVE — every HTTP request action of any Virtual User that references this server will be left dangling. The tool first calls the usage endpoint to fetch the IDs of the impacted Virtual Users, then issues the DELETE, and returns the impacted VU IDs.
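The ordering matters here: the usage lookup has to run before the DELETE, otherwise the list of impacted Virtual Users may no longer be available. A minimal sketch of that sequence follows, using an in-memory `FakeProject` in place of the real OctoPerf endpoints; the class, its methods, and the sample ids are all hypothetical.

```python
class FakeProject:
    """In-memory stand-in for a design project; not the OctoPerf API."""

    def __init__(self, servers, vu_server_refs):
        self.servers = set(servers)
        self.vu_server_refs = vu_server_refs  # VU id -> set of referenced server ids

    def usage(self, server_id):
        # Ids of the Virtual Users that reference the given server.
        return sorted(vu for vu, refs in self.vu_server_refs.items() if server_id in refs)

    def delete(self, server_id):
        self.servers.remove(server_id)


def delete_http_server(project, server_id):
    impacted = project.usage(server_id)  # step 1: collect impacted VUs first
    project.delete(server_id)            # step 2: issue the delete
    return {"impactedVirtualUserIds": impacted}  # step 3: report what now dangles


project = FakeProject(
    servers={"srv-1", "srv-2"},
    vu_server_refs={"vu-a": {"srv-1"}, "vu-b": {"srv-1", "srv-2"}, "vu-c": {"srv-2"}},
)
result = delete_http_server(project, "srv-1")
print(result)
```

Note that the fake leaves the references in `vu_server_refs` untouched after the delete, which mirrors the "left dangling" wording of the description.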

Parameters
Name | Required | Description | Default
serverId | Yes | OctoPerf HTTP server id to delete. | -

Output Schema

Parameters
Name | Required | Description
impactedVirtualUserIds | No | -
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description goes beyond annotations by detailing the internal process: the tool first fetches impacted VU IDs via a usage endpoint, then deletes, and returns those IDs. It also explicitly states the dangling effect on HTTP request actions, providing rich behavioral context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise, with three well-structured sentences that front-load the core action. It provides all necessary information without superfluous text.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the low complexity (single required parameter, clear annotations, and output schema present), the description fully covers what the tool does, its side effects, and what it returns. No gaps identified.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with a clear description for serverId. The tool description does not add additional parameter meaning beyond what the schema provides, so baseline score of 3 applies.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly identifies the tool as deleting an HTTP server from a design project, with specific verb and resource. It distinguishes from siblings like delete_unused_http_servers by implying it affects used servers, and mentions the destructive impact.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

While the description explains the destructive consequences, it does not provide explicit guidance on when to use this tool versus alternatives like delete_unused_http_servers. Usage is implied but not directly compared or advised.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

delete_project_fileDelete project fileA
Destructive

[Design] DELETES a file from an OctoPerf design project. DESTRUCTIVE — any Virtual User action that reads from this file (CSV iterators, file body sources) will be left dangling. Returns the deleted filename for confirmation.

Parameters
Name | Required | Description | Default
filename | Yes | Name of the file to delete (as returned by `list_project_files`). | -
projectId | Yes | OctoPerf project id the file belongs to. | -
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Beyond the annotation 'destructiveHint: true', the description adds critical behavioral context: it explains that deleting a file will leave Virtual User actions (CSV iterators, file body sources) that read from that file dangling, and that it returns the deleted filename for confirmation. This fully informs the agent of consequences.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three sentences, front-loaded with the core action, followed by warnings and return value. Every sentence is necessary and contributes to understanding without redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple delete tool with two required parameters and no output schema, the description is complete. It covers purpose, destructive impact, and return value. Given the context of sibling tools, it sufficiently differentiates itself.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with clear parameter descriptions. The tool description does not add additional meaning beyond what the schema already provides for 'filename' and 'projectId'. Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it deletes a file from an OctoPerf design project, using the specific verb 'DELETES' and resource 'file from an OctoPerf design project'. It distinguishes itself from other sibling delete tools by specifying the resource type, and includes the return value.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage (when a file needs deletion) but does not provide explicit guidance on when to use this tool versus alternatives, nor does it mention prerequisites or when not to use it. It only states the destructive nature without comparative context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

delete_scenarioDelete scenarioA
Destructive

[Runtime] Delete an OctoPerf Scenario by id. DESTRUCTIVE — drops the entire scenario configuration (all userProfiles, load shapes, engine settings). Past bench runs and reports stay accessible by their own ids but the scenario itself is gone; any scheduled job pointing to it will stop firing. Use only after the user has confirmed the deletion.

Parameters
Name | Required | Description | Default
scenarioId | Yes | OctoPerf scenario id to delete. | -
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already mark destructiveHint=true, but the description goes far beyond by detailing what exactly is destroyed (userProfiles, load shapes, engine settings), what remains (past bench runs/reports), and side effects (scheduled jobs stop). This fully discloses the impact.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise and well-structured: action, destructiveness, detailed consequences, and usage instruction, all in four sentences with no wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity (one param, destructive), the description covers all essential context: what it deletes, what persists, side effects on schedules, and confirmation requirement. No output schema needed.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% and the description's mention of 'by id' does not add significant meaning beyond the schema's 'OctoPerf scenario id to delete'. The parameter semantics are adequately covered by the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Delete an OctoPerf Scenario by id', specifying the verb and resource. It distinguishes from sibling delete tools by emphasizing it drops the entire scenario configuration, not just a part.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly says 'Use only after the user has confirmed the deletion', providing a key usage guideline. Also notes that past bench runs remain and scheduled jobs stop, helping the agent decide when to use this tool. However, it doesn't explicitly compare with sibling delete tools like delete_bench_report.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

delete_scheduled_jobDelete scheduled jobA
Destructive

[Scheduler] DESTRUCTIVE — permanently delete a ScheduledJob. Past runs the job already fired stay in the bench history; only the schedule entry is dropped. For a reversible pause prefer disable_scheduled_job. Use only after the user has confirmed.
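The choice between the two scheduler tools can be reduced to a small decision helper. This is a sketch of the guidance above, not part of the server; the function name and its boolean parameters are hypothetical.

```python
def pick_scheduled_job_tool(wants_reversible_pause, user_confirmed):
    """Return the tool name to call, following the guidance in the description."""
    if wants_reversible_pause:
        # A pause can be undone later, so no destructive call is needed.
        return "disable_scheduled_job"
    if not user_confirmed:
        # Permanent deletion requires explicit confirmation first.
        raise PermissionError("permanent deletion requires user confirmation")
    return "delete_scheduled_job"


pause_tool = pick_scheduled_job_tool(wants_reversible_pause=True, user_confirmed=False)
delete_tool = pick_scheduled_job_tool(wants_reversible_pause=False, user_confirmed=True)
print(pause_tool, delete_tool)
```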

Parameters
Name | Required | Description | Default
jobId | Yes | OctoPerf scheduled job id to delete. | -
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations provide destructiveHint=true, but the description adds valuable context: past runs remain in bench history, and only the schedule entry is dropped. This enhances transparency beyond annotations, though it could also state whether the deletion takes effect immediately.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four concise sentences, front-loaded with the 'DESTRUCTIVE' label and key action, no unnecessary words. Well structured for quick comprehension.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With a single parameter, no output schema, and strong annotations, the description fully covers the tool's behavior, side effects (past runs preserved), and usage context. Nothing missing.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the description doesn't need to add parameter info. The schema already describes jobId adequately. The description does not elaborate further, so baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'permanently delete a ScheduledJob'. It uses specific verb ('delete') and resource ('ScheduledJob'), and distinguishes itself from the sibling 'disable_scheduled_job' by noting it is a permanent deletion.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit usage guidance: 'For a reversible pause prefer disable_scheduled_job' and 'Use only after the user has confirmed', giving both when to use and when not to use, with a named alternative.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

delete_unused_http_serversDelete unused HTTP serversA
Destructive

[Design] DELETES every HTTP server of an OctoPerf design project that no Virtual User references. SAFE — only unreferenced servers are removed; servers in use by at least one VU action are kept untouched. Returns the baseUrl of each deleted server so the user can audit the cleanup.
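The "unused" condition amounts to a set difference between the project's servers and the servers referenced by at least one Virtual User. The sketch below illustrates that filter on plain dictionaries; the data shapes, the function name, and the example URLs are assumptions, and only the `serverBaseUrls` key comes from the output schema.

```python
def find_unused_server_base_urls(servers, vu_server_refs):
    """servers: server id -> baseUrl; vu_server_refs: VU id -> set of server ids."""
    # Every server id that at least one Virtual User action references.
    referenced = set().union(*vu_server_refs.values())
    # Keep only servers nobody references; servers in use are left untouched.
    return [base_url for server_id, base_url in servers.items() if server_id not in referenced]


servers = {
    "srv-1": "https://api.example.com",
    "srv-2": "https://cdn.example.com",
    "srv-3": "https://old.example.com",
}
vu_server_refs = {"vu-a": {"srv-1"}, "vu-b": {"srv-1", "srv-2"}}
cleanup = {"serverBaseUrls": find_unused_server_base_urls(servers, vu_server_refs)}
print(cleanup)
```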

Parameters
Name | Required | Description | Default
projectId | Yes | OctoPerf project id whose unused HTTP servers to delete. | -

Output Schema

Parameters
Name | Required | Description
serverBaseUrls | No | -
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Adds safety guarantee beyond the destructiveHint annotation, noting only unreferenced servers are removed. Also discloses return of baseUrl for audit. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three concise, front-loaded sentences: the first states action and scope, the second adds the safety guarantee, and the third describes the return value. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers purpose, safety, and return value adequately for a simple tool. Could mention error cases or behavior when no unused servers exist, but the output schema likely covers return structure.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% for the single parameter projectId, with a clear description in the schema. The tool description does not add extra meaning for this parameter, so baseline score of 3 applies.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states the tool deletes unused HTTP servers in an OctoPerf project, with specific verb 'delete' and resource 'unused HTTP servers'. Distinguishes from siblings like delete_http_server by emphasizing the 'unused' condition.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides context that it is safe and only removes unreferenced servers, but does not explicitly state when to use this tool versus alternatives like delete_http_server for force deletion. The return value for audit is mentioned.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

delete_variableDelete variableA
Destructive

[Design] DELETES a variable from an OctoPerf design project. DESTRUCTIVE — any Virtual User action that references this variable (via ${variableName}) will be left dangling. Returns the deleted variableId for confirmation.
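Before deleting, an agent may want to know which actions would be left dangling. A rough way to check is to scan action text for the `${variableName}` placeholder. The sketch below assumes the variable's name is already known (the tool itself takes a variableId) and that actions are available as plain strings; both are simplifications for illustration.

```python
import re


def find_dangling_references(variable_name, actions):
    """Return ids of actions whose text references ${variable_name}."""
    # Escape the name so characters such as dots are matched literally.
    pattern = re.compile(r"\$\{" + re.escape(variable_name) + r"\}")
    return [action_id for action_id, text in actions.items() if pattern.search(text)]


actions = {
    "login": "POST /login user=${username}",
    "search": "GET /search?q=${query}",
    "profile": "GET /users/${username}/profile",
}
dangling = find_dangling_references("username", actions)
print(dangling)
```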

Parameters
Name | Required | Description | Default
variableId | Yes | OctoPerf variable id to delete. | -
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Beyond the destructiveHint annotation, the description adds critical context: dangling references to the variable and return of the deleted variableId for confirmation. This helps the agent understand consequences.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three concise sentences, front-loaded with a tag and clear action, no redundancy. Every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple delete operation with one parameter and no output schema, the description covers the effect on referenced actions and the return value. Lacks error handling but is adequate.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema already describes the single parameter variableId with 100% coverage. The description does not add further detail about the parameter beyond what the schema provides, so baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'DELETES a variable from an OctoPerf design project', using specific verb and resource. It distinguishes from sibling tools like delete_correlation_rule or delete_scenario by explicitly mentioning 'variable'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives (e.g., create_variable, update_variable). No context on prerequisites or conditions for deletion.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

delete_virtual_userDelete virtual userA
Destructive

[Design] Delete an OctoPerf Virtual User by id. DESTRUCTIVE — drops the entire action tree, all attached extractors, assertions and recorded responses. Validation history is preserved separately by benchResultId but the VU itself is gone. Use only after the user has confirmed the deletion.

Parameters
Name | Required | Description | Default
virtualUserId | Yes | OctoPerf Virtual User id to delete. | -
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description discloses specific behavioral details beyond annotations: 'drops the entire action tree, all attached extractors, assertions and recorded responses. Validation history is preserved separately...' This adds significant context about what is destroyed and what is preserved, exceeding the destructiveHint annotation.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is highly concise: four sentences, front-loaded with the purpose, then relevant behavioral and usage details. No unnecessary information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple destructive tool with one required parameter and no output schema, the description is complete. It covers what happens, what is preserved, and when to use it.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% coverage with a clear description for virtualUserId. The tool description mentions 'by id' but does not add meaningful semantics beyond the schema's description. Baseline is 3.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Delete an OctoPerf Virtual User by id.' It specifies the action (delete) and resource (virtual user by id), distinguishing it from other delete tools and virtual user manipulation tools among siblings.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly says 'Use only after the user has confirmed the deletion,' providing a clear usage guideline. It also implies the tool's destructive nature, but does not directly compare it to alternatives such as backing up or editing the Virtual User.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

describe_virtual_user: Describe virtual user (grade A)
Read-only

[Design] Project an existing OctoPerf Virtual User into its compact listing form: id, name, description, tags, timestamps and a url deep-link to the Virtual User page in the OctoPerf web UI. Use it after a presigned import (e.g. import_har_virtual_user, upload_jmx_virtual_user) returned a raw VU id and you need the UI link without re-listing the whole project, or whenever you have a VU id but want the lightweight projection rather than the heavy action tree returned by get_virtual_user.
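
As a rough illustration of the compact listing, the sketch below parses a result shaped like the tool's output schema and converts the epoch-millisecond timestamps to UTC datetimes. Only the field names come from the schema; the sample values (id, URL, name, tags) are invented for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class VirtualUserListing:
    """Mirror of the documented output fields; every field is optional."""
    id: Optional[str] = None
    url: Optional[str] = None
    name: Optional[str] = None
    description: Optional[str] = None
    tags: list = field(default_factory=list)
    created: Optional[int] = None        # epoch milliseconds
    lastModified: Optional[int] = None   # epoch milliseconds


def epoch_ms_to_utc(ms: int) -> datetime:
    """Convert an epoch-millisecond timestamp to an aware UTC datetime."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


# Hypothetical payload: values are made up, keys follow the output schema.
sample = {
    "id": "vu-123",
    "url": "https://example.invalid/vu/vu-123",
    "name": "Checkout flow",
    "tags": ["smoke"],
    "created": 1700000000000,
    "lastModified": 1700000360000,
}

listing = VirtualUserListing(**sample)
created_at = epoch_ms_to_utc(listing.created)
```

Keeping the timestamps as integers in the dataclass and converting on demand avoids losing the original wire values.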

Parameters
Name | Required | Description | Default
virtualUserId | Yes | OctoPerf Virtual User id to project into a VirtualUserListing. |

Output Schema
Name | Required | Description
id | No |
url | No |
name | No |
tags | No |
created | No | Timestamp as epoch milliseconds.
description | No |
lastModified | No | Timestamp as epoch milliseconds.

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and destructiveHint=false. The description adds that the output includes a deep-link URL and specific fields, providing useful context beyond annotations without contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences: first defines purpose and output, second gives usage guidance. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple projection tool with one parameter and an output schema, the description fully covers what the tool does, when to use it, and what it returns. Siblings list confirms unique role.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema covers the single parameter fully. The description mentions 'raw VU id' from import tools, adding slight context but not essential beyond schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states the tool projects a Virtual User into a compact listing with specific fields, and distinguishes itself from get_virtual_user by offering a lightweight alternative.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly instructs when to use: after an import that returns a VU id when needing a UI link, or when wanting a lightweight projection instead of the full action tree. Also names sibling tool get_virtual_user as alternative.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

disable_scheduled_job: Disable scheduled job (grade A)

[Scheduler] Disable a ScheduledJob without deleting it — pauses fire-time execution. The job stays in the project's scheduler, can be re-armed later via enable_scheduled_job. Use this to safely stop a recurring cron without losing the configuration. Returns the updated ScheduledJobListing with enabled=false and nextRun cleared.
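
For orientation, an MCP client invokes a tool like this through a JSON-RPC `tools/call` request. The sketch below only builds that envelope and checks a listing for the state described above. The job id and the sample result are hypothetical, and treating a cleared nextRun as null or absent is an assumption about the result shape.

```python
import json


def build_tool_call(request_id: int, name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


def is_paused(listing: dict) -> bool:
    """True when the listing shows the job disabled with no next run.

    Assumption: a cleared nextRun is either absent or null in the result.
    """
    return listing.get("enabled") is False and listing.get("nextRun") is None


# Hypothetical job id and made-up result, for illustration only.
request = build_tool_call(1, "disable_scheduled_job", {"jobId": "job-42"})
sample_result = {"id": "job-42", "enabled": False, "nextRun": None}
```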

Parameters
Name | Required | Description | Default
jobId | Yes | OctoPerf scheduled job id to disable. |

Output Schema
Name | Required | Description
id | No |
url | No |
name | No |
enabled | No |
nextRun | No | Timestamp as epoch milliseconds.
scenarioId | No |
triggerDescription | No |

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=false and destructiveHint=false. The description adds that the job stays in the scheduler and can be re-armed, and it specifies the return value (enabled=false, nextRun cleared). This provides good behavioral context beyond the annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is four short sentences, front-loads the key action, and includes usage context and return behavior with no unnecessary words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple tool with one parameter and an output schema, the description adequately covers purpose, usage, and return behavior. No additional information is needed.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema provides 100% coverage for the single parameter jobId with a clear description. The tool description does not add additional meaning beyond what the schema already provides, so a baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action (Disable), the resource (ScheduledJob), and the scope (without deleting). It distinguishes itself from sibling tools like delete_scheduled_job and enable_scheduled_job by explicitly stating that it pauses execution without deletion.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly advises when to use the tool: 'Use this to safely stop a recurring cron without losing the configuration.' It also names enable_scheduled_job as the counterpart for re-arming, providing clear guidance on when to use this tool versus its siblings.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

download_bench_result_file: Download bench result file (grade A)
Read-only

[Analysis] Mint a presigned URL to download one file attached to an OctoPerf benchResult (trace.zip, screenshots, HAR archives, …). Returns a GET URL the LLM or the user can fetch directly, valid for ~5 minutes. The single-use token is consumed on the first GET, so fetch it once.
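
Because the URL is both short-lived and single-use, a client may want to guard against a second GET or a late one. The sketch below is one way to do that, assuming the caller supplies its own HTTP function. The class name, the `http_get` callable and the sample URL are hypothetical; only the single-use and expiry behavior comes from the description above.

```python
import time
from typing import Callable, Optional


class SingleUseDownload:
    """Guard a presigned, single-use URL so it is fetched at most once."""

    def __init__(self, url: str, expires_at_ms: int):
        self.url = url
        self.expires_at_ms = expires_at_ms  # epoch milliseconds, as in expiresAt
        self._used = False

    def is_expired(self, now_ms: Optional[int] = None) -> bool:
        """Compare the current time (or an injected one) with the expiry."""
        now_ms = int(time.time() * 1000) if now_ms is None else now_ms
        return now_ms >= self.expires_at_ms

    def fetch(self, http_get: Callable[[str], bytes],
              now_ms: Optional[int] = None) -> bytes:
        """Fetch the file once; refuse a second GET or an expired URL."""
        if self._used:
            raise RuntimeError("token already consumed; mint a new URL")
        if self.is_expired(now_ms):
            raise RuntimeError("presigned URL expired; mint a new URL")
        self._used = True  # mark before the GET: the first GET consumes the token
        return http_get(self.url)


# Stand-in for a real HTTP client so the sketch runs offline.
def fake_get(url: str) -> bytes:
    return b"zip-bytes"


download = SingleUseDownload("https://example.invalid/file?token=abc", expires_at_ms=2_000)
payload = download.fetch(fake_get, now_ms=1_000)
```

Marking the token as used before issuing the request is deliberate: if the GET fails midway, the token may already be consumed, so the safe recovery is to mint a fresh URL.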

Parameters
Name | Required | Description | Default
filename | Yes | Filename as returned by `list_bench_result_files`. |
benchResultId | Yes | OctoPerf benchResult id the file belongs to. |

Output Schema
Name | Required | Description
url | No |
method | No |
expiresAt | No | Timestamp as epoch milliseconds.
instructions | No |

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and destructiveHint=false, so the agent knows it is a safe read operation. The description adds value by explaining the presigned URL nature, 5-minute validity, and single-use token behavior, which are not in annotations. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three sentences long, front-loaded with the core action of minting a presigned URL. No extraneous information; every sentence adds value (purpose, usage, and behavioral notes). It is well-structured and efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description, combined with the schema (which references list_bench_result_files for filename), provides sufficient context for a file download tool. It explains the URL generation, validity, and single-use token. Returns are covered by the output schema, so no need to detail them further.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, with descriptions for both parameters (filename and benchResultId). The description mentions example file types but does not add new semantic meaning beyond the schema. Baseline of 3 is appropriate as schema does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool mints a presigned URL to download one file attached to an OctoPerf benchResult, listing example file types like trace.zip and screenshots. It distinguishes from sibling tools like download_project_file by specifying the resource is a benchResult file. The verb 'Mint a presigned URL' is specific and actionable.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies the tool is for downloading bench result files but does not explicitly state when to use it over alternatives like download_project_file. It provides usage notes (validity period, single-use token) but lacks guidance on when not to use it or prerequisites.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

download_project_file: Download project file (grade A)
Read-only

[Design] Mint a presigned URL to download a file (typically a CSV used for Virtual User parameterization) from an OctoPerf design project. Returns a GET URL the LLM or the user can fetch directly (bypassing the MCP server for the bytes), valid for ~5 minutes. The single-use token is consumed on the first GET, so fetch it once.

Parameters
Name | Required | Description | Default
filename | Yes | Name of the file to download (as returned by `list_project_files`). |
projectId | Yes | OctoPerf project id the file belongs to. |

Output Schema
Name | Required | Description
url | No |
method | No |
expiresAt | No | Timestamp as epoch milliseconds.
instructions | No |

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Beyond annotations (readOnlyHint=true, destructiveHint=false), the description adds behavioral context: the operation mints a presigned URL (non-destructive read), is single-use, and time-limited. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences: first states core purpose, second explains usage of the result, third gives a critical warning. No wasted words; all information is front-loaded and essential.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With an output schema and annotations covering safety, the description sufficiently covers all necessary aspects: what it does, how to use the result, and important constraints. Complete for a tool of this complexity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions for both parameters. The schema itself links filename to the sibling tool ('as returned by list_project_files'), and the tool description adds that the file is typically a CSV for parameterization, providing richer context beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description explicitly states that the tool mints 'a presigned URL to download a file', specifies the resource ('OctoPerf design project'), and differentiates itself from siblings like 'download_bench_result_file' and 'read_project_file_lines' by mentioning 'typically a CSV used for Virtual User parameterization'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explains how to use the returned URL ('fetch directly, bypassing the MCP server'), its validity ('valid for ~5 minutes'), and a critical constraint ('single-use token is consumed on the first GET'). It does not explicitly state when not to use, but provides sufficient guidance for correct invocation.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

enable_scheduled_job: Enable scheduled job (grade A)

[Scheduler] Enable a paused ScheduledJob — the job will fire again at its next trigger time. Re-arms a destructive action (every cron fire consumes credits); confirm with the user before invoking. Returns the updated ScheduledJobListing.
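
The confirm-before-invoking guidance can be enforced in client code rather than left to the agent's discretion. The sketch below is one possible gate, assuming the client exposes a `confirm` prompt and a generic `call_tool` function; both callables and the wording of the question are hypothetical.

```python
from typing import Callable, Optional


def enable_with_confirmation(
    job_id: str,
    confirm: Callable[[str], bool],
    call_tool: Callable[[str, dict], dict],
) -> Optional[dict]:
    """Invoke enable_scheduled_job only after an explicit yes from the user."""
    question = (
        f"Re-enable scheduled job {job_id}? "
        "Every scheduled run it fires will consume credits."
    )
    if not confirm(question):
        return None  # user declined: the job stays paused
    return call_tool("enable_scheduled_job", {"jobId": job_id})
```

Returning None on refusal keeps the declined case distinguishable from a successful call, which returns the updated listing.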

Parameters
Name | Required | Description | Default
jobId | Yes | OctoPerf scheduled job id to enable. |

Output Schema
Name | Required | Description
id | No |
url | No |
name | No |
enabled | No |
nextRun | No | Timestamp as epoch milliseconds.
scenarioId | No |
triggerDescription | No |

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate readOnlyHint=false and destructiveHint=false. Description adds that enabling re-arms a destructive action and consumes credits, which is valuable behavioral context beyond annotations. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences, front-loaded with core purpose, no wasted words. Efficient and clear.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given a single parameter and the return value noted in the description ('Returns the updated ScheduledJobListing'), the description is complete. No critical information is missing for a simple enable action.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The only parameter, 'jobId', has 100% schema description coverage. The description adds no meaning beyond the schema's 'OctoPerf scheduled job id to enable'. A baseline of 3 is appropriate as the schema does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'Enable', the resource 'paused ScheduledJob', and the effect 'the job will fire again at its next trigger time'. It implicitly distinguishes itself from the sibling 'disable_scheduled_job'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states when to use (enable a paused job) and includes a warning about credit consumption and need for user confirmation. Does not mention alternatives explicitly, but context is clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

export_bench_report_pdf: Export bench report PDF (grade A)
Read-only

[Analysis] Submit an async task to render an OctoPerf benchReport as a PDF (headless Playwright print). Reuses the report's persisted ExportReportConfig (orientation, cover page, page format, table row counts, scale) — same settings the UI applies when the user clicks Export. The PDF is attached to the report's first benchResult under a sanitized filename derived from the report name. The tool returns the taskId immediately — poll get_task_result until it settles, then locate the generated PDF with list_bench_result_files and mint a presigned URL with download_bench_result_file.
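
The submit, poll, list, download sequence described above can be sketched as a single helper. This is only an outline under several assumptions: `call_tool` is a hypothetical client function, the argument name `taskId` for get_task_result and `benchResultId` for list_bench_result_files are guesses, the file listing is assumed to be a plain list of filenames, and the caller must supply `is_settled` because the shape of a task result is not documented here.

```python
import time
from typing import Any, Callable


def export_report_pdf(
    call_tool: Callable[[str, dict], Any],
    report_id: str,
    is_settled: Callable[[dict], bool],
    max_polls: int = 30,
    delay_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Submit the export, poll until settled, then mint a download URL."""
    task = call_tool("export_bench_report_pdf", {"benchReportId": report_id})

    for _ in range(max_polls):
        # Assumption: get_task_result accepts the task id under "taskId".
        result = call_tool("get_task_result", {"taskId": task["taskId"]})
        if is_settled(result):
            break
        sleep(delay_s)
    else:
        raise TimeoutError("PDF export did not settle in time")

    # The PDF is attached to the report's first benchResult.
    report = call_tool("get_bench_report", {"reportId": report_id})
    bench_result_id = report["benchResultIds"][0]

    # Assumption: the listing is a plain list of filenames.
    files = call_tool("list_bench_result_files", {"benchResultId": bench_result_id})
    pdfs = [name for name in files if name.lower().endswith(".pdf")]
    if not pdfs:
        raise FileNotFoundError("no PDF attached to the first benchResult")

    return call_tool(
        "download_bench_result_file",
        {"benchResultId": bench_result_id, "filename": pdfs[0]},
    )
```

The helper does not distinguish a task that settled successfully from one that settled with an error, and it picks the first PDF it finds. A real client would likely inspect the task result and match the filename against the report name.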

Parameters
Name | Required | Description | Default
benchReportId | Yes | OctoPerf benchReport id to render as a PDF. |

Output Schema
Name | Required | Description
taskId | No |
benchReportId | No |

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and destructiveHint=false. The description adds valuable behavioral context: it is asynchronous (returns taskId immediately), reuses persisted settings, and derives a sanitized filename. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single dense paragraph containing all key information: async nature, reused config, return value, and follow-up steps. It is efficient with no redundant sentences, though light structuring (e.g., bullet points) could improve readability.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description covers the asynchronous workflow, configuration reuse, file naming, and required follow-up actions. It does not mention error handling or rate limits, but given the presence of an output schema (which likely covers return values) and the tool's moderate complexity, it is fairly complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with a clear description for the single parameter 'benchReportId' ('OctoPerf benchReport id to render as a PDF'). The tool description mentions the report's ExportReportConfig but does not add additional semantics or constraints for the parameter beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Submit an async task to render an OctoPerf benchReport as a PDF (headless Playwright print).' The verb 'submit' and the resource 'benchReport as a PDF' are specific, and the asynchronous mechanism distinguishes it from sibling tools that return data directly or perform other operations.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides a clear step-by-step workflow: returns taskId → poll get_task_result → list_bench_result_files → download_bench_result_file. It explains the tool reuses existing ExportReportConfig, so no extra configuration needed. However, it does not explicitly state when not to use this tool or mention alternative tools for similar purposes.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

fetch_bench_error_http: Fetch bench error HTTP (grade A)
Read-only

[Analysis] Download the HTTP request AND response that produced one specific BenchError — given the benchResultId, the actionId of the failing sampler, and the error's timestamp (epoch-ms). Use after get_report_errors returned the per-sample error list so the LLM can drill into the actual payload that failed (full headers, body, status code, timings). Returns (request, response) together in a single call. The actionId is base64-url-encoded on the wire; pass it verbatim as it appears on the BenchError — the tool handles the encoding.
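
A small sketch of how a client might build the arguments from one BenchError, passing actionId through untouched. The second helper only illustrates what URL-safe base64 looks like; the tool performs the encoding itself, and whether the server keeps or strips the `=` padding is not stated here. The function names are hypothetical.

```python
import base64


def error_http_arguments(bench_result_id: str, bench_error: dict) -> dict:
    """Build the tool arguments from one BenchError, passing actionId verbatim."""
    return {
        "benchResultId": bench_result_id,
        "actionId": bench_error["actionId"],          # do not pre-encode
        "timestamp": int(bench_error["timestamp"]),   # epoch milliseconds
    }


def base64url(text: str) -> str:
    """Illustrative only: URL-safe base64 of a UTF-8 string."""
    return base64.urlsafe_b64encode(text.encode("utf-8")).decode("ascii")
```

URL-safe base64 replaces `+` and `/` with `-` and `_`, which is why an id encoded this way can travel in a URL path without further escaping.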

Parameters
Name | Required | Description | Default
actionId | Yes | Action id of the failing sampler (verbatim from `BenchError.actionId`). |
timestamp | Yes | Failed sample timestamp (epoch milliseconds) from `BenchError.timestamp`. |
benchResultId | Yes | OctoPerf benchResult id the error belongs to. |

Output Schema
Name | Required | Description
request | No |
response | No |

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and destructiveHint=false, establishing safe read. Description adds that it returns (request, response) together and that actionId is base64-url-encoded but the tool handles encoding, which is useful behavioral context beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured, front-loading key info with '[Analysis]'. Every sentence contributes value, though it could be slightly more concise. Still effectively communicates without unnecessary fluff.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the existence of an output schema, the description appropriately omits return value details. It covers prerequisites, encoding, and the nature of the response, making it fully complete for a focused drill-down tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. Description adds meaning by specifying actionId is verbatim from BenchError.actionId and timestamp is in epoch milliseconds from BenchError.timestamp, plus clarifying encoding handling, which provides added value over the schema alone.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'Download' and the resource 'HTTP request AND response' for one specific BenchError. It distinguishes itself from siblings like 'get_report_errors' by specifying it retrieves the full payload of a single error, not a list.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states 'Use after get_report_errors returned the per-sample error list' and mentions the goal of drilling into actual payload. Does not exclude cases or mention alternatives, but provides clear context for when to invoke this tool.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

fetch_validation_http_body: Fetch validation HTTP body (grade A)
Read-only

[Design] Fetch ONE of the four HTTP entities involved in a failed OctoPerf Virtual User validation run — UNTRUNCATED (full headers + body + timing). Use when the summary returned by get_validation_failure_detail is truncated past 8 KB and you need the complete payload. Pick kind ∈ {RECORDED_REQUEST, RECORDED_RESPONSE, VALIDATION_REQUEST, VALIDATION_RESPONSE}. Only the matching field (request for *_REQUEST, response for *_RESPONSE) is populated in the result; the other is empty.
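
The rule that only the matching field is populated can be captured in a few lines. The sketch below models the four kinds as an enum and picks the field to read from the kind's suffix. The helper names are hypothetical, and representing the empty field as null or absent is an assumption.

```python
from enum import Enum
from typing import Optional


class Kind(str, Enum):
    RECORDED_REQUEST = "RECORDED_REQUEST"
    RECORDED_RESPONSE = "RECORDED_RESPONSE"
    VALIDATION_REQUEST = "VALIDATION_REQUEST"
    VALIDATION_RESPONSE = "VALIDATION_RESPONSE"


def populated_field(kind: Kind) -> str:
    """Name of the result field that is filled in for a given kind."""
    return "request" if kind.value.endswith("_REQUEST") else "response"


def extract_entity(result: dict) -> Optional[dict]:
    """Return the one populated entity from a tool result."""
    return result.get(populated_field(Kind(result["kind"])))
```

Constructing `Kind` from the returned string also rejects an unexpected value early with a ValueError, instead of silently reading the wrong field.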

Parameters
Name | Required | Description | Default
kind | Yes | Which entity to fetch: RECORDED_REQUEST / RECORDED_RESPONSE / VALIDATION_REQUEST / VALIDATION_RESPONSE. |
actionId | Yes | Action id returned by `get_virtual_user_validation_index`. |
projectId | Yes | OctoPerf project id the Virtual User belongs to. |
timestamp | Yes | Failed run timestamp (epoch ms) returned by `get_virtual_user_validation_index`. Ignored for RECORDED_* kinds since the baseline is the same across runs. |
virtualUserId | Yes | OctoPerf Virtual User id. |

Output Schema
Name | Required | Description
kind | No |
request | No |
response | No |

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, so the agent knows it's safe. The description goes beyond the annotations by detailing the untruncated nature of the payload, the conditional population of fields, and the timing info. The further detail that timestamp is ignored for RECORDED_* kinds is carried by the schema's parameter description rather than by the tool description.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, well-structured paragraph. It front-loads the core action ('Fetch ONE of the four HTTP entities... UNTRUNCATED') and then adds usage context and parameter guidance. Every sentence adds value, no repetition or fluff.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given that the tool has 5 required parameters and an output schema, and is a read-only fetch operation, the description adequately covers its purpose, usage conditions, parameter selection, and response shape (only one of two fields populated). No obvious gaps remain.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100% (all 5 parameters have descriptions). The main description does not add significantly new parameter-level semantics beyond what the schema already provides (e.g., the `kind` enum values and `timestamp` behavior are both already in the schema). The description's added value is contextual (when to use), not parameter-specific, so baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with a specific verb ('Fetch'), specific resource ('ONE of the four HTTP entities involved in a failed OctoPerf Virtual User validation run'), and specific behavior ('UNTRUNCATED'). It distinguishes itself from the sibling tool `get_validation_failure_detail` by noting the truncation threshold, making it easy for an AI agent to select the correct tool.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly states when to use this tool ('when the summary returned by `get_validation_failure_detail` is truncated past 8 KB and you need the complete payload'), provides an alternative (`get_validation_failure_detail`), and guides on parameter selection (`Pick kind ∈ {RECORDED_REQUEST, RECORDED_RESPONSE, VALIDATION_REQUEST, VALIDATION_RESPONSE}`). This is complete usage guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_bench_report: Get bench report (grade A)
Read-only

[Analysis] Download an OctoPerf bench report in full — id, projectId, name, description, benchResultIds, tags, timestamps, the polymorphic items list (every widget of the report: SummaryReportItem, StatisticTableReportItem, StatisticTreeReportItem, TopReportItem, LineChartReportItem, PercentilesChartReportItem, BarChartReportItem, PieChartReportItem, StackedChartReportItem, AreaRangeChartReportItem, ErrorsReportItem, TextReportItem, SynopsisReportItem, SLAThresholdReportItem, InsightsReportItem, …) and the configs set (per-benchResult colour/threshold settings). Each item carries its @type discriminator + the metric ids / filters / dimensions / thresholds the LLM needs to interpret the widget. For a one-line overview prefer list_bench_reports_by_project. Required upstream of every get_report_*_values tool — pick the itemId from the returned items list.
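
To pick an itemId for a downstream get_report_*_values call, a client can group the returned items by their @type discriminator. The sketch below assumes each item exposes its identifier under an `id` key, which is not confirmed by the description; the sample report is invented.

```python
from typing import Dict, List


def items_by_type(report: dict) -> Dict[str, List[dict]]:
    """Group a report's items by their @type discriminator."""
    grouped: Dict[str, List[dict]] = {}
    for item in report.get("items", []):
        grouped.setdefault(item.get("@type", "unknown"), []).append(item)
    return grouped


def item_ids(report: dict, type_name: str) -> List[str]:
    """Ids of all items of one type (assumes each item has an `id` key)."""
    return [item["id"] for item in items_by_type(report).get(type_name, [])]


# Made-up report: only the @type names come from the description above.
sample_report = {
    "id": "rep-1",
    "items": [
        {"@type": "LineChartReportItem", "id": "item-a"},
        {"@type": "ErrorsReportItem", "id": "item-b"},
        {"@type": "LineChartReportItem", "id": "item-c"},
    ],
}
line_chart_ids = item_ids(sample_report, "LineChartReportItem")
```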

ParametersJSON Schema
Name | Required | Description | Default
reportId | Yes | OctoPerf bench report id whose content to fetch. |

Output Schema

ParametersJSON Schema
Name | Required | Description
id | No |
name | No |
tags | No |
items | No |
userId | No |
configs | No |
created | No | Timestamp as epoch milliseconds.
projectId | No |
description | No |
lastModified | No | Timestamp as epoch milliseconds.
benchResultIds | No |
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint true and destructiveHint false, so the description doesn't need to repeat safety traits. However, it adds behavioral context by detailing the return payload (polymorphic items, configs, discriminator), helping the agent understand the data structure and how to interpret widgets.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is a single paragraph that efficiently packs essential information with no wasted words. It front-loads the main purpose and quickly dives into details. Could be slightly more structured (e.g., bullet list for items), but overall concise for the complexity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description is comprehensive even though an output schema exists. It details the polymorphic items list, configs, and '@type' discriminator, providing the context the agent needs to interpret the returned data. It also explains the tool's role as a prerequisite for the get_report_*_values tools, ensuring the agent understands the workflow.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema provides full coverage for the only parameter, 'reportId', with the description 'OctoPerf bench report id whose content to fetch.' The tool description adds no meaning beyond this, so the baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description starts with 'Download an OctoPerf bench report in full', clearly stating the action and resource. It details the complex structure, including polymorphic items and configs, and distinguishes itself from the sibling 'list_bench_reports_by_project' by pointing to that tool for a one-line overview rather than full details.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly guides when to use this tool vs alternatives: 'For a one-line overview prefer list_bench_reports_by_project'. Also states it is 'Required upstream of every get_report_*_values tool', establishing a prerequisite relationship and informing the agent of the correct workflow.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_bench_resultGet bench resultA
Read-only
Inspect

[Runtime] Get an OctoPerf BenchResult in full — id, userId, batchId, scenarioId, designProjectId, resultProjectId, mode (STANDARD / JIRA / MAVEN), regions, sampling interval, state (DockerBatchState — CREATED / PENDING / SCALING / PREPARING / INITIALIZING / RUNNING / FINISHED / ABORTED / ERROR; FINISHED/ABORTED/ERROR are terminal) and timestamps. This is the canonical 'result status' lookup. For progress [0.0, 1.0] of a running test prefer get_bench_status. For aggregated metrics fetch the report with get_bench_report and dispatch to the matching get_report_*_values tool.
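The state list above lends itself to a small terminal-state check. The sketch below is an illustration only: the state names and the terminal subset come from the description, while the helper `is_terminal` and the assumption that the payload exposes the state as a plain string under `state` are hypothetical.

```python
# State names exactly as listed in the tool description.
ALL_STATES = (
    "CREATED", "PENDING", "SCALING", "PREPARING", "INITIALIZING",
    "RUNNING", "FINISHED", "ABORTED", "ERROR",
)
# The description marks these three as terminal.
TERMINAL_STATES = frozenset({"FINISHED", "ABORTED", "ERROR"})


def is_terminal(bench_result):
    """Return True when the bench result will no longer change state."""
    state = bench_result["state"]  # assumption: state is a plain string
    if state not in ALL_STATES:
        raise ValueError(f"unknown state: {state!r}")
    return state in TERMINAL_STATES


print(is_terminal({"state": "RUNNING"}))
print(is_terminal({"state": "ABORTED"}))
```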

ParametersJSON Schema
Name | Required | Description | Default
benchResultId | Yes | OctoPerf benchResult id to read. |

Output Schema

ParametersJSON Schema
Name | Required | Description
id | No |
mode | No |
state | No |
userId | No |
batchId | No |
created | No | Timestamp as epoch milliseconds.
regions | No |
sampling | No |
scenarioId | No |
lastModified | No | Timestamp as epoch milliseconds.
designProjectId | No |
resultProjectId | No |
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and destructiveHint=false. The description adds valuable behavioral context by listing the fields returned, including state and timestamps, and indicating terminal states. No contradictions or missing safety information.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is succinct, with four sentences that each serve a purpose: the first states the core functionality, the second emphasizes its role, and the last two point to alternatives. Every sentence adds value without redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the single parameter and the presence of an output schema, the description covers all necessary information: what it does, what it returns, and when to use alternatives. It is complete for an agent to make an informed selection.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

There is only one parameter, `benchResultId`, and the input schema already includes a description for it. With 100% schema coverage, the description does not add further parameter details, meeting the baseline expectation.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly specifies the verb 'Get' and the resource 'OctoPerf BenchResult' with a detailed list of returned fields. It distinguishes itself from sibling tools like `get_bench_status` and `get_bench_report`, ensuring the agent understands its unique role.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly states when to use this tool, calling it the canonical 'result status' lookup. It also directs to alternative tools for progress (`get_bench_status`) and aggregated metrics (`get_bench_report`), giving clear when-to and when-not-to instructions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_bench_statusGet bench statusA
Read-only
Inspect

[Runtime] Get the progress of a running OctoPerf load test, as a double in [0.0, 1.0]. 1.0 means the test has finished.
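A client could poll this value until it reaches 1.0. The sketch below assumes a hypothetical `read_progress` callable that stands in for one get_bench_status call and returns the `progress` field. The poll count is bounded because this page does not say whether progress reaches 1.0 when a run ends in ABORTED or ERROR.

```python
import time


def wait_until_finished(read_progress, poll_seconds=5.0, max_polls=120,
                        sleep=time.sleep):
    """Poll progress until it reaches 1.0; return False if polls run out."""
    for _ in range(max_polls):
        progress = read_progress()
        if not 0.0 <= progress <= 1.0:
            raise ValueError(f"progress out of range: {progress}")
        if progress >= 1.0:
            return True  # 1.0 means the test has finished
        sleep(poll_seconds)
    return False


# Usage with a fake reader, so the example runs without any server.
fake_values = iter([0.25, 0.5, 1.0])
done = wait_until_finished(lambda: next(fake_values), sleep=lambda s: None)
print(done)
```

Passing `sleep` as a parameter is a design choice for testability; a real caller would keep the default.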

ParametersJSON Schema
Name | Required | Description | Default
benchResultId | Yes | OctoPerf benchResult id whose progress to read. |

Output Schema

ParametersJSON Schema
Name | Required | Description
progress | No |
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Describes return format (double) and termination condition (1.0 means finished), adding context beyond the readOnlyHint annotation.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Extremely concise: two sentences, the first carrying the key info and the second clarifying what 1.0 means, front-loaded with '[Runtime]'.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers purpose, parameter, and return value meaning. Lacks error handling or prerequisites, but sufficient for a simple read operation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema already documents the single parameter fully; the description adds no new meaning.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states the tool retrieves progress of a running OctoPerf load test as a double in [0.0, 1.0], distinguishing it from other get_* siblings.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The '[Runtime]' prefix implies use during a running test, but no explicit when-to-use or alternatives are provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_report_area_range_valuesGet report area range valuesA
Read-only
Inspect

[Analysis] Fetch the comparison points of an AreaRangeChartReportItem widget — a single metric measured on the current bench (curve) against a reference baseline (reference), plus the rmse (root mean square deviation) between them. Returns an AreaRangeResult with curve: List<GraphPoint>, reference: List<GraphPoint>, rmse: Double. The most LLM-useful chart type for regression detection (low rmse = close to baseline, high rmse = deviation). Reject non-AreaRange items.
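To make the rmse figure concrete, the sketch below recomputes a root mean square deviation from two point lists. It is an assumption-laden illustration: the page does not say how OctoPerf aligns the two curves, so points are paired here by identical `x`, and the dictionary keys `x` and `y` and the helper name `rmse` are hypothetical.

```python
import math


def rmse(curve, reference):
    """Root mean square deviation over points that share the same x."""
    reference_by_x = {point["x"]: point["y"] for point in reference}
    diffs = [
        point["y"] - reference_by_x[point["x"]]
        for point in curve
        if point["x"] in reference_by_x
    ]
    if not diffs:
        raise ValueError("curve and reference share no x values")
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


# Invented points: the second sample deviates from the baseline by 2.
curve = [{"x": 0, "y": 1.0}, {"x": 1000, "y": 2.0}]
reference = [{"x": 0, "y": 1.0}, {"x": 1000, "y": 4.0}]
print(rmse(curve, reference))
```

A value near zero means the current run tracks the baseline closely; larger values indicate deviation, as the description notes.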

ParametersJSON Schema
Name | Required | Description | Default
itemId | Yes | Id of the AreaRangeChartReportItem within the report. |
reportId | Yes | OctoPerf bench report id holding the item. |

Output Schema

ParametersJSON Schema
Name | Required | Description
rmse | No |
curve | No |
reference | No |
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and destructiveHint=false. The description adds context about the returned data structure (curve, reference, rmse) and the rejection behavior for non-AreaRange items, going beyond the annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is four sentences long, front-loaded with '[Analysis]', and every sentence adds value (purpose, return structure, usage context, rejection rule). No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the presence of an output schema (implied by described return structure), annotations covering safety, and the tool's focused purpose, the description provides complete context including the rejection behavior and the utility for regression detection.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% description coverage for both parameters (reportId, itemId). The description adds no parameter details beyond what the schema provides, so the baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb ('Fetch') and resource ('comparison points of an AreaRangeChartReportItem widget'), and explicitly names the chart type, distinguishing it from sibling report tools that fetch other chart types.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly says 'Reject non-AreaRange items', which is a clear when-not-to-use instruction. It also highlights the tool's utility for regression detection, but does not directly compare to other get_report_* siblings.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_report_errorsGet report errorsA
Read-only
Inspect

[Analysis] Download every BenchError captured for an ErrorsReportItem widget — one entry per failed sample, each with benchResultId, regionId, injectorId, virtualUserId, actionId, timestamp (epoch-ms), errorMessage, connectTime / latency / elapsedTime and the list of BenchAssertionResult evaluated on the sample. Scoped by the item's filters (regionId / virtualUserId / actionId / …) and benchResultId; fromMs/toMs are optional epoch-millis bounds, omit to scan the whole test. Reject non-Errors items. Heavy on a high-error run — paginate the LLM's analysis if the list is long.
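Because the error list can be long, a caller may want to condense it and process it in chunks. The sketch below is one possible approach under stated assumptions: it relies only on the `errorMessage` field named in the description, and the helpers `summarize_errors` and `pages` as well as the sample entries are hypothetical.

```python
from collections import Counter


def summarize_errors(errors, top=3):
    """Count errors per message and return the most frequent ones."""
    return Counter(e["errorMessage"] for e in errors).most_common(top)


def pages(errors, size):
    """Yield the error list in fixed-size chunks for step-by-step analysis."""
    for start in range(0, len(errors), size):
        yield errors[start:start + size]


# Invented entries carrying a subset of the fields listed above.
sample_errors = [
    {"errorMessage": "timeout", "actionId": "a1", "timestamp": 1000},
    {"errorMessage": "HTTP 500", "actionId": "a2", "timestamp": 2000},
    {"errorMessage": "timeout", "actionId": "a1", "timestamp": 3000},
]

print(summarize_errors(sample_errors))
for chunk in pages(sample_errors, 2):
    print(len(chunk))
```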

ParametersJSON Schema
Name | Required | Description | Default
toMs | No | Upper bound (epoch milliseconds). Omit to scan until test end. |
fromMs | No | Lower bound (epoch milliseconds). Omit to scan from test start. |
itemId | Yes | Id of the ErrorsReportItem within the report. |
reportId | Yes | OctoPerf bench report id holding the item. |

Output Schema

ParametersJSON Schema
Name | Required | Description
errors | No |
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate readOnlyHint=true; description adds behavioral traits like 'Reject non-Errors items' and heavy load guidance. Does not cover authorization or error handling, but sufficient given annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single paragraph, front-loaded with purpose. Lists many fields that could be in output schema, but each sentence adds information. Slightly verbose but effective.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers purpose, scoping, parameters, behavior for non-Errors items, and performance advice. With output schema present, the field listing is redundant but not detrimental. Very complete for the tool's complexity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, but description adds value by explaining scoping by item filters (regionId, etc.) and benchResultId, and clarifies fromMs/toMs meaning. Improves understanding beyond raw schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool downloads every BenchError for an ErrorsReportItem widget, specifying the verb (download) and resource (error entries per failed sample). It distinguishes from sibling report tools by focusing on error items.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides clear context: scoped by the item's filters and benchResultId, optional time bounds, rejection of non-Errors items, and a heavy-load warning. It lacks an explicit when-not-to-use statement and named alternatives, though both are implied by context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_report_insightsGet report insightsA
Read-only
Inspect

[Analysis] Evaluate an InsightsReportItem widget — runs OctoPerf's built-in heuristics (apdex, error rate, response-time drift, …) against the item's benchResultId and thresholds and returns the firing insights. Returns ReportInsights whose insights are ordered by level then id; each Insight carries an InsightId, an InsightLevel (severity), the computed value, plus a more widget (and an optional inspect widget) the LLM can render or drill into via the matching get_report_*_values tool. fromMs/toMs are optional epoch-millis bounds; omit to evaluate over the whole test. Reject non-Insights items.
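The optional time bounds can be handled by leaving the keys out when no bound is wanted. The sketch below builds an argument dictionary that way. The parameter names come from the schema on this page, while the helper `report_item_args` and the ordering check are assumptions added for the example.

```python
def report_item_args(report_id, item_id, from_ms=None, to_ms=None):
    """Build tool arguments, omitting fromMs/toMs when they are not set."""
    if from_ms is not None and to_ms is not None and from_ms > to_ms:
        # Assumption: an inverted window is a caller mistake worth catching.
        raise ValueError("fromMs must not be greater than toMs")
    args = {"reportId": report_id, "itemId": item_id}
    if from_ms is not None:
        args["fromMs"] = from_ms
    if to_ms is not None:
        args["toMs"] = to_ms
    return args


# Whole test: no bounds are sent at all.
print(report_item_args("report-1", "item-a"))
# Bounded window in epoch milliseconds (values invented).
print(report_item_args("report-1", "item-a", from_ms=1000, to_ms=2000))
```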

ParametersJSON Schema
Name | Required | Description | Default
toMs | No | Upper bound (epoch milliseconds). Omit to scan until test end. |
fromMs | No | Lower bound (epoch milliseconds). Omit to scan from test start. |
itemId | Yes | Id of the InsightsReportItem within the report. |
reportId | Yes | OctoPerf bench report id holding the item. |

Output Schema

ParametersJSON Schema
Name | Required | Description
insights | No |
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and destructiveHint=false; description adds significant behavioral context (heuristics run, output structure with ordered insights, insight components, drill-down capability). No contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single, well-structured paragraph front-loaded with purpose, then details output and usage, with a clear reject instruction. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given annotations and output schema existence, the description covers purpose, usage, behavior, and parameter semantics well. Minor missing info on error handling, but overall highly informative for this report tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so description adds only marginal value (e.g., reiterating optionality of fromMs/toMs). Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb (evaluate), resource (InsightsReportItem widget), and outcome (returns firing insights). It distinguishes from sibling tools like other get_report_* tools by specifying it is specifically for InsightsReportItem and mentions drill-down via matching tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicit guidance: 'Reject non-Insights items' clarifies when not to use. Also points to drill-down alternatives ('drill into via the matching get_report_*_values tool'). Optional parameters explained.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_report_line_chart_valuesGet report line chart valuesA
Read-only
Inspect

[Analysis] Fetch the time-series points of a LineChartReportItem or PercentilesChartReportItem widget — one series per metric/dimension configured on the item. Returns LineChartValues whose series each hold a list of GraphPoint ((x=epoch-ms, y=value)). Time-series can be hundreds of points; prefer get_report_table_values for action-level summaries when you don't need the temporal shape. fromMs/toMs are optional epoch-millis bounds; omit to scan the whole test. Reject other chart types (use get_report_stacked_chart_values or get_report_area_range_values).
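Since a series can hold hundreds of points, a caller may prefer to condense it before reasoning over it. The sketch below reduces one list of points to a few statistics. It assumes each GraphPoint arrives as a dictionary with `x` (epoch-ms) and `y` keys, and the helper `summarize_points` is hypothetical.

```python
def summarize_points(points):
    """Condense a list of {x, y} points into count, range, and mean."""
    if not points:
        return None
    ys = [point["y"] for point in points]
    xs = [point["x"] for point in points]
    return {
        "count": len(ys),
        "min": min(ys),
        "max": max(ys),
        "mean": sum(ys) / len(ys),
        "from_ms": min(xs),
        "to_ms": max(xs),
    }


# Invented points: two samples one second apart.
points = [{"x": 1000, "y": 1.0}, {"x": 2000, "y": 3.0}]
print(summarize_points(points))
```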

ParametersJSON Schema
Name | Required | Description | Default
toMs | No | Upper bound (epoch milliseconds). Defaults to now. |
fromMs | No | Lower bound (epoch milliseconds). Defaults to 0 (test start). |
itemId | Yes | Id of the LineChartReportItem or PercentilesChartReportItem within the report. |
reportId | Yes | OctoPerf bench report id holding the item. |

Output Schema

ParametersJSON Schema
Name | Required | Description
series | No |
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint and destructiveHint, so the description adds value by describing the return format (series of GraphPoint with x/y) and noting that time-series can be hundreds of points, which implies potential data volume.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single concise paragraph, front-loaded with the core purpose, followed by usage guidance and parameter notes. Every sentence adds value with no wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the output schema exists and siblings are listed, the description covers purpose, usage, parameter hints, and return format. It lacks error/edge case details but is sufficient for a read-only tool with well-documented schema.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description adds minimal extra: it restates optionality and bounds for fromMs/toMs but does not add significant new meaning beyond the schema descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description uses a specific verb ('Fetch') and resource ('time-series points of a LineChartReportItem or PercentilesChartReportItem widget'), and distinguishes from siblings by specifying the chart types and data format.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states when to use alternatives ('prefer get_report_table_values for action-level summaries when you don't need the temporal shape') and which tools to use for other chart types ('Reject other chart types (use get_report_stacked_chart_values or get_report_area_range_values)').

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_report_pie_valuesGet report pie valuesA
Read-only
Inspect

[Analysis] Fetch the slice values of a PieChartReportItem widget — a distribution over a categorical dimension (e.g. errors per type, response codes per bucket). Returns PieValues whose series (typically one per benchResult compared in the report) each hold a list of PieSlice (label + count). Reject non-Pie items.
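Slice counts are often easier to read as shares of the total. The sketch below converts one list of slices into fractions. It assumes each PieSlice is a dictionary with `label` and `count` keys, matching the "label + count" wording above, and the helper `slice_shares` is hypothetical.

```python
def slice_shares(slices):
    """Return each slice's fraction of the total count, keyed by label."""
    total = sum(s["count"] for s in slices)
    if total == 0:
        # Avoid dividing by zero when every slice is empty.
        return {s["label"]: 0.0 for s in slices}
    return {s["label"]: s["count"] / total for s in slices}


# Invented distribution of response codes.
slices = [{"label": "200", "count": 3}, {"label": "500", "count": 1}]
print(slice_shares(slices))
```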

ParametersJSON Schema
Name | Required | Description | Default
itemId | Yes | Id of the PieChartReportItem within the report. |
reportId | Yes | OctoPerf bench report id holding the item. |

Output Schema

ParametersJSON Schema
Name | Required | Description
series | No |
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate read-only and non-destructive behavior. The description adds context about returning PieValues with series and PieSlice, and the rejection of non-Pie items, which is additional behavioral transparency beyond the annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three concise sentences with no fluff. The first states the purpose with examples, the second describes the output, and the third gives the rejection condition. Every sentence earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple read tool with full annotations and schema, the description is complete. It explains what it does, what it returns, and when it fails. No additional details are needed.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with clear descriptions for both parameters. The description does not add extra semantics to the parameters, so it meets the baseline for high coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool fetches slice values of a PieChartReportItem widget, giving examples and specifying the output structure. It distinguishes from siblings by noting it rejects non-Pie items, making it specific to pie chart data.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage by stating the tool fetches pie chart data and rejects non-Pie items, but it does not explicitly compare with alternative report tools like get_report_line_chart_values or mention when not to use it beyond non-Pie items.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_report_stacked_chart_valuesGet report stacked chart valuesA
Read-only
Inspect

[Analysis] Fetch the stacked time-series points of a StackedChartReportItem widget — one categorical dimension stacked over time (e.g. hits/s split by response status, throughput split by region). Returns StackedChartValues whose points each have x=epoch-ms and a values map of seriesLabel → value. Reject non-Stacked items.
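The point layout described above (one `x` plus a map of series label to value) can be pivoted into one list per series. The sketch below shows one way to do it. The keys `x` and `values` follow the description, while the helper `pivot_points` and the sample data are assumptions.

```python
def pivot_points(points):
    """Turn stacked points into {series_label: [(x, value), ...]} by time."""
    series = {}
    for point in sorted(points, key=lambda p: p["x"]):
        for label, value in point["values"].items():
            series.setdefault(label, []).append((point["x"], value))
    return series


# Invented points, deliberately out of order to show the sort.
stacked = [
    {"x": 2000, "values": {"2xx": 8, "5xx": 2}},
    {"x": 1000, "values": {"2xx": 10}},
]
print(pivot_points(stacked))
```

A label that is absent from a point is simply skipped for that timestamp; the sketch does not fill gaps with zeros.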

ParametersJSON Schema
Name | Required | Description | Default
itemId | Yes | Id of the StackedChartReportItem within the report. |
reportId | Yes | OctoPerf bench report id holding the item. |

Output Schema

ParametersJSON Schema
Name | Required | Description
points | No |
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already confirm readOnlyHint=true and destructiveHint=false. The description adds behavioral details: it returns StackedChartValues with point structure and rejects non-Stacked items (an error condition). This is valuable context beyond the annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise: three sentences with no filler. It is front-loaded with '[Analysis]' to quickly signal the domain. Every sentence adds essential information about purpose, examples, return type, and constraints.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given that an output schema exists, the description adequately explains the return format (points with x and values map) and constraints (reject non-Stacked). It does not miss any critical information needed for correct invocation and interpretation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so both required parameters (reportId, itemId) are described in the input schema. The description does not add any additional parameter semantics or usage details beyond what the schema provides, so the baseline score of 3 applies.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool fetches stacked chart values for StackedChartReportItem widgets, with examples like hits/s split by response status. It explicitly distinguishes itself by rejecting non-Stacked items, making the purpose unmistakable among the many get_report_* sibling tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage context (only for stacked chart items) and warns against non-Stacked items. It does not explicitly name alternative tools for other chart types, but the sibling list includes many get_report_* variants, making the differentiation clear enough for an agent.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_report_summary_valuesGet report summary valuesA
Read-only
Inspect

[Analysis] Fetch the aggregated metric values of a SummaryReportItem or BarChartReportItem widget — both are MultiMetricReportItem shapes (a list of ReportItemMetric, each with a MetricId + filters + benchResultId). Returns SummaryValues whose values are aligned with the item's metrics order (zip by index against item.metrics[i].id to label). fromMs/toMs are optional epoch-millis bounds; omit to scan the whole test. Reject other item types — use get_report_table_values for tables, get_report_top_values for top-N, etc.
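
The index alignment described above can be sketched in Python. This is a minimal illustration, not OctoPerf client code: the `label_summary_values` helper, the dict shapes, and the sample metric ids are assumptions. Only the `values` field and the `item.metrics[i].id` alignment come from the description.

```python
from typing import Any


def label_summary_values(item: dict[str, Any], summary: dict[str, Any]) -> list[tuple[str, Any]]:
    """Pair each returned value with the id of the metric at the same index."""
    metric_ids = [metric["id"] for metric in item["metrics"]]
    values = summary["values"]
    if len(metric_ids) != len(values):
        # Alignment is positional, so a length mismatch cannot be labelled safely.
        raise ValueError("values and item.metrics differ in length")
    return list(zip(metric_ids, values))


# Hypothetical payloads; the metric ids and numbers are invented for illustration.
item = {"metrics": [{"id": "AVG_RESPONSE_TIME"}, {"id": "ERROR_RATE"}]}
summary = {"values": [182.5, 0.02]}
labelled = label_summary_values(item, summary)
```

A list of pairs is returned rather than a dict so that two metrics sharing the same id (for example with different filters) do not overwrite each other.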

Parameters
Name | Required | Description | Default
toMs | No | Upper bound (epoch milliseconds). Omit to scan until test end. | -
fromMs | No | Lower bound (epoch milliseconds). Omit to scan from test start. | -
itemId | Yes | Id of the SummaryReportItem or BarChartReportItem within the report. | -
reportId | Yes | OctoPerf bench report id holding the item. | -

Output Schema

Parameters
Name | Required | Description
values | No | -
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The annotations already declare readOnlyHint=true. The description adds context on the return structure (SummaryValues aligned with the metrics order; zip by index against item.metrics[i].id) and on the optional time bounds. It does not contradict the annotations. It could mention rate limits or auth requirements, but the omission is minor.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

A single paragraph, front-loaded with the purpose. It is clear and concise, though it could be slightly more structured (e.g., bullet points). No sentence is wasted.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Complete given the tool's complexity: the description explains parameter usage, item type constraints, and the output structure (values aligned with the item's metrics). An output schema exists, and the description adds complementary context. No gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the baseline is 3. The description adds meaning: it explains that fromMs/toMs are optional and what omitting them does, and that itemId must reference one of the specific item types. This exceeds the baseline.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states that the tool fetches aggregated metric values for a SummaryReportItem or BarChartReportItem (both MultiMetricReportItem shapes). It explicitly distinguishes itself from sibling tools by rejecting other item types and pointing to alternatives such as get_report_table_values and get_report_top_values.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description gives explicit when-to-use guidance: the tool applies only to summary and bar chart items. It says when to omit fromMs/toMs (to scan the whole test) and names clear alternatives for other item types, so the usage guidance is complete.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_report_table_valuesGet report table valuesA
Read-only
Inspect

[Analysis] Fetch the aggregated metric values for a StatisticTableReportItem widget — the flat table rows you see in the OctoPerf web UI. Returns TableValues whose entries each bind an actionId (or aggregated container id like <id>.resources) to its values (one value per configured metric column: avgRT, p95RT, errorRate, throughput, hits, …, in the same order as the source widget's metrics). For per-VU breakdown (hybrid scenarios with multiple UserProfiles) use get_report_tree_values against the equivalent StatisticTreeReportItem.
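
As a rough sketch of how the positional values could be turned into labelled rows, the Python below zips each entry's values against the widget's metric columns. The helper name, the entry keys (`actionId`, `values`), and the sample data are assumptions made for illustration; the description only states that entries bind an actionId to values in the widget's metric order.

```python
from typing import Any


def label_table_rows(columns: list[str], table: dict[str, Any]) -> dict[str, dict[str, Any]]:
    """Map each row id to a {column name: value} dict, using positional alignment."""
    rows: dict[str, dict[str, Any]] = {}
    for entry in table["entries"]:
        rows[entry["actionId"]] = dict(zip(columns, entry["values"]))
    return rows


# Hypothetical widget columns and payload; ids and numbers are invented.
columns = ["avgRT", "errorRate"]
table = {
    "entries": [
        {"actionId": "a1", "values": [120.0, 0.0]},
        {"actionId": "a1.resources", "values": [45.0, 0.01]},
    ]
}
rows = label_table_rows(columns, table)
```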

Parameters
Name | Required | Description | Default
itemId | Yes | Id of the StatisticTableReportItem within the report. | -
reportId | Yes | OctoPerf bench report id holding the item. | -

Output Schema

Parameters
Name | Required | Description
entries | No | -
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate read-only and non-destructive behavior. The description adds value by detailing the return structure (TableValues with entries binding actionId to values in metric order) and the relationship to the source widget's metrics.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences efficiently convey the purpose, the return structure, and the alternative tool. The description is front-loaded with the core action and resource, with no superfluous words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a read-only tool with output schema present, the description covers purpose, usage guidance, and behavioral details adequately. It is complete for agent decision-making.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with both parameters described. The description does not add significant additional meaning beyond the schema, maintaining the baseline score of 3.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool fetches aggregated metric values for a StatisticTableReportItem widget, specifying the exact resource and action. It distinguishes from sibling get_report_tree_values by noting that tool is for per-VU breakdown.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states when to use this tool (flat table rows) and when to use the alternative get_report_tree_values for per-VU breakdown, providing clear usage context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_report_textual_monitorsGet report textual monitorsA
Read-only
Inspect

[Analysis] Fetch every TextualCounterValue captured under a TextualMonitorReportItem widget — string-valued samples emitted by custom monitors (versions of upstream services, feature flag state at fire time, free-form tags, etc.). Returns TextualMonitorValues with one monitors entry per sample, each with monitorConnectionId, counter name, value, and timestamp. Reject other item types.
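
One way an agent might condense these samples is to keep the latest value per counter. The sketch below assumes each sample is a dict with the keys `monitorConnectionId`, `name`, `value`, and `timestamp`; the exact key for the counter name is not given in the description, so `name` is a guess, and the sample data is invented.

```python
from typing import Any


def latest_value_per_counter(payload: dict[str, Any]) -> dict[tuple[str, str], str]:
    """Keep the most recent value for each (monitorConnectionId, counter name) pair."""
    latest: dict[tuple[str, str], dict[str, Any]] = {}
    for sample in payload["monitors"]:
        key = (sample["monitorConnectionId"], sample["name"])  # "name" is an assumed key
        if key not in latest or sample["timestamp"] > latest[key]["timestamp"]:
            latest[key] = sample
    return {key: sample["value"] for key, sample in latest.items()}


# Hypothetical payload: one counter reported twice, another once.
payload = {
    "monitors": [
        {"monitorConnectionId": "m1", "name": "api.version", "value": "1.4.0", "timestamp": 1000},
        {"monitorConnectionId": "m1", "name": "api.version", "value": "1.4.1", "timestamp": 2000},
        {"monitorConnectionId": "m1", "name": "flag.checkout", "value": "on", "timestamp": 1500},
    ]
}
latest = latest_value_per_counter(payload)
```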

Parameters
Name | Required | Description | Default
itemId | Yes | Id of the TextualMonitorReportItem within the report. | -
reportId | Yes | OctoPerf bench report id holding the item. | -

Output Schema

Parameters
Name | Required | Description
monitors | No | -
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate read-only and non-destructive behavior. The description adds details about fetching data under a specific widget and rejecting other types, enhancing transparency beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The three-sentence description is concise and front-loaded. Every sentence provides value: the first states the purpose, the second describes the returned data, and the third clarifies the rejection behavior.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the presence of an output schema and annotations covering safety, the description sufficiently explains the tool's behavior and return values for a read-only data retrieval tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the baseline is 3. The description does not add significant new parameter semantics; it mentions `TextualMonitorReportItem` which relates to `itemId`, but the schema already describes both parameters adequately.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool fetches `TextualCounterValue` under a `TextualMonitorReportItem` widget, specifies the return fields, and mentions rejection of other item types. It distinguishes itself from sibling report tools by focusing on this specific data type.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explains the tool's purpose well, implying when to use it (for textual monitor data). It lacks explicit exclusions or alternatives, but the sibling list provides differentiation.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_report_threshold_alarmsGet report threshold alarmsA
Read-only
Inspect

[Analysis] Fetch every ThresholdAlarm raised against a ThresholdAlarmReportItem widget — ThresholdAlarms with one alarms entry per breach, each with id, monitorConnectionId, severity (INFO / WARN / ERROR / FATAL), thresholdName, thresholdValue, observed value, and timestamp. Use when a ThresholdAlarmReportItem fired during a run (typically surfaced in get_report_insights under the THRESHOLD_ALARM id) and you want to enumerate the actual breaches. fromMs/toMs are optional epoch-millis bounds; omit to scan the whole test.
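
Once the breaches are enumerated, a caller will often want only the serious ones. The Python sketch below filters by severity. It assumes that the order in which the description lists the levels (INFO, WARN, ERROR, FATAL) is ascending severity, and that each alarm is a dict with a `severity` key; the helper name and sample alarms are invented.

```python
from typing import Any

# Assumed ascending order, taken from the order the levels are listed in the description.
SEVERITY_ORDER = ["INFO", "WARN", "ERROR", "FATAL"]


def alarms_at_or_above(payload: dict[str, Any], minimum: str = "ERROR") -> list[dict[str, Any]]:
    """Return the alarms whose severity is at least `minimum`."""
    floor = SEVERITY_ORDER.index(minimum)
    return [
        alarm
        for alarm in payload["alarms"]
        if SEVERITY_ORDER.index(alarm["severity"]) >= floor
    ]


# Hypothetical payload with three breaches.
payload = {
    "alarms": [
        {"id": "al-1", "severity": "INFO", "thresholdName": "cpu"},
        {"id": "al-2", "severity": "ERROR", "thresholdName": "cpu"},
        {"id": "al-3", "severity": "FATAL", "thresholdName": "memory"},
    ]
}
serious = alarms_at_or_above(payload)
```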

Parameters
Name | Required | Description | Default
toMs | No | Upper bound (epoch ms). Omit to scan until test end. | -
fromMs | No | Lower bound (epoch ms). Omit to scan from test start. | -
itemId | Yes | Id of the ThresholdAlarmReportItem within the report. | -
reportId | Yes | OctoPerf bench report id holding the item. | -

Output Schema

Parameters
Name | Required | Description
alarms | No | -
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and destructiveHint=false. The description adds useful behavioral details: it fetches every alarm, lists return fields, and notes optional epoch-millis bounds. No contradiction. Slightly lacks mention of pagination or limits, but for a read-only tool this is sufficient.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences: the first states the purpose and return fields, the second gives usage context, and the third covers the optional time bounds. Front-loaded with key info, no fluff or redundancy. Every sentence earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the presence of an output schema (per context signals), the description provides sufficient context: it details return fields, optional parameters, and the trigger condition (alarm from get_report_insights). Covers all necessary aspects for an agent to select and invoke the tool correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions for all parameters. The description adds value by explaining that fromMs/toMs are optional epoch-millis bounds and that omitting them scans the whole test, which is beyond the schema's literal 'Upper bound' and 'Lower bound' phrasing.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool fetches ThresholdAlarms for a ThresholdAlarmReportItem, listing specific fields (id, monitorConnectionId, severity, etc.). It distinguishes from siblings by referencing get_report_insights which surfaces the alarm id, and uses the verb 'fetch' with the resource 'ThresholdAlarm'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicit context: use when a ThresholdAlarmReportItem fired during a run, typically surfaced in get_report_insights under the THRESHOLD_ALARM id. Also explains optional bounds (fromMs/toMs) can be omitted. Lacks explicit when-not-to-use or alternative tools, but context is strong.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_report_top_valuesGet report top valuesA
Read-only
Inspect

[Analysis] Fetch the values of a TopReportItem widget — the top-N actions by some criterion (slowest, most error-prone, highest throughput, …) over a time window. Returns a TopResult with top (action id → score) and curves (action id → time-series leading to the score). fromMs/toMs are optional epoch-millis bounds; omit to scan the whole test. Reject non-Top items.
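
A short Python sketch of reading the `top` map follows. It assumes `top` arrives as a plain mapping from action id to a numeric score and that a higher score ranks first, which may not hold for every criterion; the helper name and sample scores are invented.

```python
from typing import Any


def rank_top_actions(top_result: dict[str, Any], limit: int = 3) -> list[tuple[str, float]]:
    """Sort the action id -> score map by score, highest first, and keep `limit` rows."""
    ranked = sorted(top_result["top"].items(), key=lambda pair: pair[1], reverse=True)
    return ranked[:limit]


# Hypothetical TopResult; curves are left empty because they are not needed for ranking.
top_result = {"top": {"a1": 950.0, "a2": 1200.0, "a3": 310.0}, "curves": {}}
worst_two = rank_top_actions(top_result, limit=2)
```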

Parameters
Name | Required | Description | Default
toMs | No | Upper bound (epoch milliseconds). Defaults to now. | -
fromMs | No | Lower bound (epoch milliseconds). Defaults to 0 (test start). | -
itemId | Yes | Id of the TopReportItem within the report. | -
reportId | Yes | OctoPerf bench report id holding the item. | -

Output Schema

Parameters
Name | Required | Description
top | No | -
curves | No | -
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate read-only behavior. The description adds the return structure (TopResult with top and curves) and the validation behavior (reject non-Top items). This is sufficient context, though it could mention error handling for missing reports.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise: four short sentences with a front-loaded verb. No redundant words. Every sentence adds essential information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the parameter count, required fields, and existence of an output schema, the description completely covers what the tool does, its return structure, and its validation. It is sufficient for correct selection and invocation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, but the description adds meaning beyond schema: it clarifies that fromMs/toMs are optional and that omitting them scans the whole test. It also reinforces the itemId constraint. This adds value.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb ('Fetch'), the resource ('TopReportItem widget'), and the purpose (top-N actions by criterion). It distinguishes from sibling tools by specifying the widget type, which is unique among get_report_* tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly says 'Reject non-Top items,' guiding the agent to use this tool only for TopReportItem widgets. It also explains the optional time bounds. However, it does not name the sibling tools to use for other item types.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_report_tree_valuesGet report tree valuesA
Read-only
Inspect

[Analysis] Fetch the aggregated metric values for a StatisticTreeReportItem widget — the per-VU-then-per-action tree rows you see in the OctoPerf web UI. Returns TreeValues whose entries each bind a (virtualUserId, actionId) pair to its values (one value per configured metric column: avgRT, p95RT, errorRate, throughput, hits, …, in the same order as the source widget's metrics). Key tool for hybrid scenarios with multiple UserProfiles (e.g. JMeter + Playwright) — group the rows by virtualUserId to split per-VU stats. For the flat (no-VU) variant use get_report_table_values against the equivalent StatisticTableReportItem.
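
The grouping step mentioned above might look like the following in Python. The entry keys (`virtualUserId`, `actionId`, `values`) mirror the names used in the description, but the exact payload shape is an assumption, and the ids and numbers are invented.

```python
from collections import defaultdict
from typing import Any


def group_by_virtual_user(tree: dict[str, Any]) -> dict[str, dict[str, list[Any]]]:
    """Split tree entries into one {actionId: values} map per virtualUserId."""
    grouped: dict[str, dict[str, list[Any]]] = defaultdict(dict)
    for entry in tree["entries"]:
        grouped[entry["virtualUserId"]][entry["actionId"]] = entry["values"]
    return dict(grouped)


# Hypothetical hybrid scenario with two virtual users.
tree = {
    "entries": [
        {"virtualUserId": "vu-jmeter", "actionId": "a1", "values": [120.0, 0.0]},
        {"virtualUserId": "vu-playwright", "actionId": "b1", "values": [900.0, 0.05]},
        {"virtualUserId": "vu-jmeter", "actionId": "a2", "values": [80.0, 0.0]},
    ]
}
per_vu = group_by_virtual_user(tree)
```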

Parameters
Name | Required | Description | Default
itemId | Yes | Id of the StatisticTreeReportItem within the report. | -
reportId | Yes | OctoPerf bench report id holding the item. | -

Output Schema

Parameters
Name | Required | Description
entries | No | -
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and destructiveHint=false. The description adds context by labeling it an analysis tool and detailing the return structure (TreeValues with entries binding virtualUserId, actionId, and values). No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single paragraph but well-structured, starting with the purpose and then adding details. It is informative without being overly long, though minor trimming could improve conciseness.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the presence of an output schema, the description explains the return value structure (TreeValues, entries, binding of virtualUserId/actionId to values) and notes the order of metric columns. This, combined with schema and annotations, makes the tool definition complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema has 100% description coverage for both parameters (itemId, reportId) with adequate schema descriptions. The tool description does not add further parameter semantics beyond what the schema provides, so baseline score of 3 applies.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool fetches aggregated metric values for a StatisticTreeReportItem widget, specifying the tree structure (per-VU-per-action). It distinguishes from the sibling tool get_report_table_values, which serves the flat variant.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly identifies this as a key tool for hybrid scenarios with multiple UserProfiles and advises grouping rows by virtualUserId. It also mentions the alternative tool get_report_table_values for flat (no-VU) cases, providing clear when-to-use guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_scenarioGet scenarioA
Read-only
Inspect

[Runtime] Get an OctoPerf Scenario in full — id, projectId, name, description, tags, timestamps, mode (STANDARD / JIRA / MAVEN) and the list of userProfiles that bind each VirtualUser to a load generator. Each UserProfile carries its virtualUserId, providerId, location, memory settings, optional iterations cap, a polymorphic load (ConstantLoad / RampLoad / PeakLoad / …) and a polymorphic engine (JmeterUserProfileEngine / PlaywrightUserProfileEngine / SeleniumUserProfileEngine) each with their own discriminator and engine-specific settings. Use this when you actually need to reason about the scenario's load shape or build a JSON Patch for patch_scenario; for a one-line overview prefer list_scenarios_by_project.
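
For the JSON Patch use case, the sketch below builds RFC 6902 `replace` operations with RFC 6901 pointer escaping. The helper names are invented, and whether patch_scenario expects exactly this document shape is an assumption; the `name` and `tags` fields are taken from the output schema, while the values are made up.

```python
from typing import Any


def json_pointer(*segments: object) -> str:
    """Build an RFC 6901 JSON Pointer, escaping '~' as '~0' and '/' as '~1'."""
    return "".join(
        "/" + str(segment).replace("~", "~0").replace("/", "~1") for segment in segments
    )


def replace_op(value: Any, *segments: object) -> dict[str, Any]:
    """Build one RFC 6902 'replace' operation targeting the given path segments."""
    return {"op": "replace", "path": json_pointer(*segments), "value": value}


# Hypothetical edit: rename the scenario and reset its tags.
patch = [
    replace_op("Checkout peak test", "name"),
    replace_op(["nightly"], "tags"),
]
```

Reading the scenario first matters because a `replace` operation fails when the target path does not exist, so paths should be derived from the fetched document rather than guessed.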

Parameters
Name | Required | Description | Default
scenarioId | Yes | OctoPerf scenario id whose configuration to fetch. | -

Output Schema

Parameters
Name | Required | Description
id | No | -
mode | No | -
name | No | -
tags | No | -
userId | No | -
created | No | Timestamp as epoch milliseconds.
projectId | No | -
description | No | -
lastModified | No | Timestamp as epoch milliseconds.
userProfiles | No | -
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, so the description reinforces read-only behavior. It adds substantial value by detailing the complex return structure (polymorphic load and engine types) and the fact that it returns the full configuration, which is beyond what annotations provide.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single paragraph but efficiently front-loaded with the main action ('Get an OctoPerf Scenario in full') and then systematically lists the returned fields. No extraneous words, perfectly sized for the complexity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the presence of an output schema, the description is still remarkably complete, covering all major nested structures and polymorphic types. It even hints at discriminators, which is more than sufficient for an agent to understand the tool's output.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 100% schema coverage and only one parameter, the description does not add new meaning to scenarioId beyond what the schema already says. The baseline of 3 is appropriate because the description focuses on the output rather than parameter details.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description explicitly states it retrieves a full OctoPerf Scenario with all key fields (id, projectId, name, tags, mode, userProfiles with nested details). It also distinguishes itself from list_scenarios_by_project, clearly defining its unique purpose.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit guidance: use for reasoning about load shape or building JSON patches for patch_scenario, and prefer list_scenarios_by_project for a one-line overview. This contrasts with siblings and gives clear when-to-use and when-not-to-use.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_scenario_matching_plansGet scenario matching plansA
Read-only
Inspect

[Runtime] PRE-FLIGHT check: list the subscriptions / plans on the account that can actually host the given scenario. A plan appearing in the result fits the scenario's caps (concurrent users, real-browser users, profile count, duration) and the scenario is launchable as-is. For each plan returns plan name, remaining test count, expiration date, concurrency caps (JMeter + real-browser), max duration, and parallelRunsSupported — how many simultaneous instances of THIS scenario the plan could host (typically 1; only relevant when maxTestsPerRun > 1). Empty list = no plan covers the scenario; the user must trim the load profile (drop UserProfiles, lower plateau VUs), buy capacity, or renew — call list_active_subscriptions to see which cap is binding. Call this BEFORE run_scenario to avoid burning credits on a run the plan can't sustain — run_scenario will error out at startup if it can't be sized, but the cause is opaque without this preflight.
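
The pre-flight ordering can be sketched as a small guard in Python. `call_tool` is a hypothetical stand-in for whatever MCP client invokes the server's tools, and the argument shapes passed to run_scenario and list_active_subscriptions are assumptions; only the tool names and the `plans` output field come from this page.

```python
from typing import Any, Callable

# Hypothetical client signature: (tool name, arguments) -> result dict.
ToolCaller = Callable[[str, dict[str, Any]], dict[str, Any]]


def run_if_plan_fits(scenario_id: str, call_tool: ToolCaller) -> dict[str, Any]:
    """Launch the scenario only when at least one plan can host it."""
    matching = call_tool("get_scenario_matching_plans", {"scenarioId": scenario_id})
    plans = matching.get("plans") or []
    if not plans:
        # No plan covers the scenario: inspect subscriptions instead of launching.
        subscriptions = call_tool("list_active_subscriptions", {})
        return {"launched": False, "subscriptions": subscriptions}
    return {"launched": True, "run": call_tool("run_scenario", {"scenarioId": scenario_id})}


# Fake client that records calls and reports no matching plan.
calls: list[str] = []


def fake_call_tool(name: str, arguments: dict[str, Any]) -> dict[str, Any]:
    calls.append(name)
    if name == "get_scenario_matching_plans":
        return {"plans": []}
    return {}


outcome = run_if_plan_fits("s-1", fake_call_tool)
```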

Parameters
Name | Required | Description | Default
scenarioId | Yes | OctoPerf scenario id to check against the account's subscriptions. | -

Output Schema

Parameters
Name | Required | Description
plans | No | -
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations show readOnlyHint=true and destructiveHint=false, and the description confirms it is a read-only pre-flight check. It adds behavioral traits like checking caps and returning plan details, and explains that run_scenario will error opaquely without this preflight. No contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is dense and informative, with key purpose front-loaded. Every sentence adds value, though it could be slightly shorter. It is well-structured for an agent to quickly grasp the tool's role.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity and the existence of an output schema, the description is complete. It explains the return values, when to use, consequences of not using it, and references sibling tools, providing all necessary context.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Only one parameter scenarioId with schema description 'OctoPerf scenario id to check against the account's subscriptions.' Schema coverage is 100%, so baseline 3. The description reinforces the same meaning but adds little beyond what the schema already provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it is a pre-flight check to list subscriptions/plans that can host a given scenario, specifying the constraints (concurrent users, real-browser users, profile count, duration) and what is returned. It distinguishes itself from siblings like run_scenario and list_active_subscriptions.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states to call this BEFORE run_scenario to avoid burning credits, and provides guidance on what to do if the result is empty (trim load, buy capacity, renew) and suggests calling list_active_subscriptions to see which cap is binding.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_task_resultGet task resultA
Read-only
Inspect

[Tasks] Poll the lifecycle of an async OctoPerf task by id. Returns status=PENDING while the task is still running (poll again after 2-3 seconds), status=SUCCESS once it has settled successfully, or status=FAILED with the backend stack trace in message if it has failed. Use this after any tool that submits an async task (e.g. apply_correlations_to_virtual_user).
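
The polling loop described above could be sketched in Python as follows. `call_tool` is a hypothetical stand-in for an MCP client, and the poll cap and the injectable `sleep` are choices made for this example; the status values and the 2-3 second interval come from the description.

```python
import time
from typing import Any, Callable

# Hypothetical client signature: (tool name, arguments) -> result dict.
ToolCaller = Callable[[str, dict[str, Any]], dict[str, Any]]


def wait_for_task(
    task_id: str,
    call_tool: ToolCaller,
    interval_s: float = 2.5,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Poll get_task_result until the task settles, or give up after max_polls."""
    for _ in range(max_polls):
        result = call_tool("get_task_result", {"taskId": task_id})
        if result["status"] == "SUCCESS":
            return result
        if result["status"] == "FAILED":
            # On failure the message field carries the backend stack trace.
            raise RuntimeError(result.get("message") or "task failed")
        sleep(interval_s)  # PENDING (or any other status): wait and poll again
    raise TimeoutError(f"task {task_id} still pending after {max_polls} polls")


# Stand-in client that replays canned responses; sleeping is disabled for the demo.
_responses = iter(
    [{"status": "PENDING"}, {"status": "PENDING"}, {"status": "SUCCESS", "taskId": "t-1"}]
)
final = wait_for_task("t-1", lambda name, args: next(_responses), sleep=lambda _s: None)
```

Capping the number of polls keeps an agent from waiting forever on a task that never settles.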

Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| taskId | Yes | OctoPerf task id returned by an async tool such as `apply_correlations_to_virtual_user`. | |

Output Schema

| Name | Required | Description |
| --- | --- | --- |
| status | No | |
| taskId | No | |
| message | No | |
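
The polling behavior described above can be sketched as follows. This is a minimal illustration, not the server's client code: `call_tool` is a hypothetical callable standing in for whatever MCP client invokes a tool by name and returns its structured result as a dict, and `max_polls` is an arbitrary safety cap that the tool description does not mention.

```python
import time
from typing import Any, Callable, Dict

# Hypothetical signature: (tool_name, arguments) -> structured result dict.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def wait_for_task(
    call_tool: ToolCaller,
    task_id: str,
    poll_seconds: float = 2.5,  # the description suggests 2-3 seconds
    max_polls: int = 120,       # assumption: arbitrary upper bound
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll get_task_result until the task leaves PENDING."""
    for _ in range(max_polls):
        result = call_tool("get_task_result", {"taskId": task_id})
        status = result.get("status")
        if status == "SUCCESS":
            return result
        if status == "FAILED":
            # On failure, `message` carries the backend stack trace.
            raise RuntimeError(f"task {task_id} failed: {result.get('message')}")
        if status != "PENDING":
            raise ValueError(f"unexpected status {status!r} for task {task_id}")
        sleep(poll_seconds)
    raise TimeoutError(f"task {task_id} still PENDING after {max_polls} polls")
```

Passing `sleep` as a parameter keeps the loop easy to exercise without real waiting.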

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and destructiveHint=false. The description adds detailed behavioral information about the three possible statuses (PENDING, SUCCESS, FAILED) and what the message field contains. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise, with the first sentence immediately stating the purpose. It then provides necessary details on status and usage in a well-structured paragraph without unnecessary words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the output schema exists, the description covers the return values (status, message) and the polling logic. It fully addresses the context needed for an agent to use this tool correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% coverage with a description for taskId. The description adds context about where the taskId comes from (returned by async tools) and the polling behavior, which enhances understanding beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Poll the lifecycle of an async OctoPerf task by id' with specific verbs and resource. It distinguishes itself from sibling tools by focusing on polling async tasks, which is unique among the listed sibling tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly says 'Use this after any tool that submits an async task' and gives an example ('apply_correlations_to_virtual_user'). Also advises polling interval of 2-3 seconds when status is PENDING.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_validation_failure_detail: Get validation failure detail
Grade: A
Read-only

[Design] Fetch the four HTTP entities involved in one failed validation run of an OctoPerf Virtual User action: the recorded request/response (the baseline captured at VU creation) and the validation request/response (what was sent and received during the latest check). Use the actionId and timestamp returned by get_virtual_user_validation_index. Response bodies are truncated past 8 KB — call fetch_validation_http_body to get the full untruncated entity. Also returns virtualUserUrl, a deep-link opening the Virtual User page on the failed action in the OctoPerf web UI.

Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| actionId | Yes | Action id returned by `get_virtual_user_validation_index`. | |
| projectId | Yes | OctoPerf project id the Virtual User belongs to. | |
| timestamp | Yes | Failed run timestamp (epoch ms) returned by `get_virtual_user_validation_index`. | |
| virtualUserId | Yes | OctoPerf Virtual User id. | |

Output Schema

| Name | Required | Description |
| --- | --- | --- |
| virtualUserUrl | No | |
| recordedRequest | No | |
| recordedResponse | No | |
| validationRequest | No | |
| validationResponse | No | |

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations declare readOnlyHint=true and destructiveHint=false, so the read-only nature is known. The description adds behavioral context by stating response bodies are truncated past 8 KB and that it returns a deep-link URL (virtualUserUrl), which is not obvious from annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four sentences with no wasted words. The first defines the core purpose and outputs; the rest name the source of the inputs, flag truncation with a related tool, and describe the deep-link. Front-loaded with key information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has an output schema (not shown) and four required parameters, the description covers what is returned (four entities, virtualUserUrl) and key behavioral details. It could potentially mention response format, but the output schema likely handles that, so adequate.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so all parameters are documented. The description adds value by explaining that actionId and timestamp come from get_virtual_user_validation_index, enriching the semantic meaning beyond the schema's generic descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it fetches the four HTTP entities involved in a failed validation run, specifying recorded and validation request/response. It references the source of actionId and timestamp and differentiates from siblings like fetch_validation_http_body by noting truncation.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explains when to use (after a failed validation run, using actionId/timestamp from get_virtual_user_validation_index) and mentions an alternative for full bodies (fetch_validation_http_body). It does not explicitly state when not to use, but the context is clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_virtual_user: Get virtual user
Grade: A
Read-only

[Design] Get an OctoPerf Virtual User in full — id, projectId, name, description, type (JMETER / WEB_DRIVER / PLAYWRIGHT / FRAGMENTS), tags, timestamps and the recursive polymorphic children action tree. Each tree node carries its backend discriminator (HttpRequestAction, ContainerAction, IfContainerAction, RegexpVariableExtractor, …) plus every native field of that subtype: HTTP method / path / headers / query parameters / postData for samplers, regex / matchGroup / template for extractors, etc. Heavy by design — use it when you actually need to reason about the VU's content or build a JSON Patch for patch_virtual_user; for a 10-line overview prefer list_virtual_users.

Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| virtualUserId | Yes | OctoPerf Virtual User id whose action tree to fetch. | |

Output Schema

| Name | Required | Description |
| --- | --- | --- |
| id | No | |
| name | No | |
| tags | No | |
| type | No | |
| userId | No | |
| created | No | Timestamp as epoch milliseconds. |
| children | No | |
| projectId | No | |
| description | No | |
| lastModified | No | Timestamp as epoch milliseconds. |
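
Since the response is a recursive tree, an agent or script will usually flatten it before reasoning about individual actions. The sketch below shows one way to do that. It assumes every nested node keeps its sub-actions in a `children` list like the top-level object does, and the discriminator key name (`"@type"` here) is purely hypothetical: the listing names the discriminator values (HttpRequestAction, ContainerAction, ...) but not the JSON key that carries them.

```python
from typing import Any, Dict, Iterator, List


def iter_actions(node: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield every action under `node`, depth-first, parents before children."""
    # Assumption: nested nodes expose sub-actions under "children" as well.
    for child in node.get("children") or []:
        yield child
        yield from iter_actions(child)


def actions_of_type(
    virtual_user: Dict[str, Any],
    type_name: str,
    discriminator_key: str = "@type",  # hypothetical key name
) -> List[Dict[str, Any]]:
    """Collect all actions whose discriminator equals `type_name`."""
    return [
        action
        for action in iter_actions(virtual_user)
        if action.get(discriminator_key) == type_name
    ]
```

For example, `actions_of_type(vu, "HttpRequestAction")` would return the HTTP samplers in document order, a convenient starting point when preparing a JSON Patch for patch_virtual_user.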

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and destructiveHint=false, so the safety profile is clear. The description adds valuable context beyond annotations: it warns that the response is 'heavy by design', implying potential performance/resource impact, and describes the recursive nature of the children tree. This enriches the agent's understanding of the tool's behavior.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single paragraph that front-loads the purpose ('Get an OctoPerf Virtual User in full') then details. It includes necessary detail about the response structure and usage context. While informative, it is relatively dense and could be slightly more streamlined, but it remains effective.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity of the tool (polymorphic children, multiple subtypes), the description covers the response structure comprehensively. It also provides usage guidance and an alternative. An output schema exists, so return value documentation is handled externally. The description leaves no significant gaps for an agent to misuse the tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100% (one parameter with clear description). The tool description does not add significantly beyond the schema; it merely restates that the parameter is a virtual user ID whose action tree is fetched. Baseline 3 is appropriate since the schema already handles documentation.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'Get' and the resource 'OctoPerf Virtual User in full', detailing the included fields (id, projectId, name, type, etc.) and the recursive children tree. It explicitly distinguishes from 'list_virtual_users' by noting that this tool is for full details while list provides a 10-line overview.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit guidance on when to use this tool ('when you actually need to reason about the VU's content or build a JSON Patch for patch_virtual_user') and when not to ('for a 10-line overview prefer list_virtual_users'). Names an alternative sibling tool.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_virtual_user_validation: Get virtual user validation
Grade: A
Read-only

[Runtime] Get the latest functional validation for an OctoPerf Virtual User. Returns the benchResultId, its state, and a finished flag (true once the bench reaches FINISHED, ABORTED or ERROR). Empty if no validation has ever been run for this VU.

Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| projectId | Yes | OctoPerf project id the Virtual User belongs to. | |
| virtualUserId | Yes | OctoPerf Virtual User id whose validation to poll. | |

Output Schema

| Name | Required | Description |
| --- | --- | --- |
| status | No | |

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and destructiveHint=false, so the description adds value by disclosing the return fields (benchResultId, state, finished flag) and the behavior that it returns empty if no validation has been run. This goes beyond the annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise: three sentences that cover purpose, return content, and edge case. No unnecessary words, and the key information is front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With an output schema present, the description provides sufficient context: it names the essential return fields and explains the 'finished' flag condition. It also notes the 'empty' case and the '[Runtime]' context. Minor omission: the possible state values are not enumerated beyond the finished flag, but the output schema likely covers that.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema coverage is 100% with descriptions for both parameters. The description adds no additional semantic meaning beyond what the schema already provides, so it meets the baseline of 3.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description starts with a clear verb 'Get' and specifies the resource 'latest functional validation for an OctoPerf Virtual User'. It distinguishes itself from sibling tools like 'validate_virtual_user' (which triggers validation) and 'get_validation_failure_detail' (details of failures) by focusing on polling the latest validation result.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description does not provide explicit guidance on when to use this tool versus alternatives. While the '[Runtime]' prefix hints at polling usage, there is no mention of prerequisites, typical workflow (e.g., after calling validate_virtual_user), or when to prefer this over similar tools like 'get_virtual_user_validation_index'.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_virtual_user_validation_index: Get virtual user validation index
Grade: A
Read-only

[Design] List every HTTP request action of an OctoPerf Virtual User that has at least one validation run. Each entry has the actionId, success/total counts, the timestamps of the successful runs and the timestamps of the failing runs. Returns an empty list when the VU has never been validated. Pair an actionId with a failed timestamp and get_validation_failure_detail to triage a regression, or with a successful timestamp and fetch_validation_http_body (kind=VALIDATION_RESPONSE) to read the body captured on a passing run.

Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| projectId | Yes | OctoPerf project id the Virtual User belongs to. | |
| virtualUserId | Yes | OctoPerf Virtual User id whose validation index to inspect. | |

Output Schema

| Name | Required | Description |
| --- | --- | --- |
| outcomes | No | |
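
The pairing step in the description (an actionId plus a failed timestamp, handed to get_validation_failure_detail) might look like the sketch below. The four argument names come from the get_validation_failure_detail parameter table. The shape of each `outcomes` entry is not published in this listing, so the `failedTimestamps` key is a hypothetical placeholder, and picking the most recent failure per action is a choice made for illustration only.

```python
from typing import Any, Dict, List


def failure_detail_requests(
    index: Dict[str, Any],
    project_id: str,
    virtual_user_id: str,
    failed_key: str = "failedTimestamps",  # hypothetical field name
) -> List[Dict[str, Any]]:
    """Build one get_validation_failure_detail argument set per failing action."""
    requests: List[Dict[str, Any]] = []
    for outcome in index.get("outcomes") or []:
        failed = outcome.get(failed_key) or []
        if not failed:
            continue  # this action has no failing run to triage
        requests.append(
            {
                "projectId": project_id,
                "virtualUserId": virtual_user_id,
                "actionId": outcome["actionId"],
                # Most recent failing run, as epoch milliseconds.
                "timestamp": max(failed),
            }
        )
    return requests
```

An empty `outcomes` list (a VU that has never been validated) simply produces no requests.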

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate readOnlyHint=true and destructiveHint=false. The description adds that it returns an empty list when the VU has never been validated and explains the data structure, providing useful behavioral context beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is structured and front-loaded with the main purpose. It is slightly lengthy but every sentence adds value, explaining use cases and pairing with other tools. Minor redundancy could be trimmed.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the output schema exists, the description fully explains the return values (actionId, success/total, timestamps) and the empty list case. It also provides context for how to use the results with other tools, making it complete for this tool's complexity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions for both parameters. The description does not add significant meaning beyond what the schema already provides; it only mentions the parameters in context of ownership and purpose. Baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it lists HTTP request actions with validation runs, including specific fields. It distinguishes from siblings by mentioning pairing with get_validation_failure_detail and fetch_validation_http_body.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explains when to use (to get validation index) and what to do with results (pair with other tools). It also notes the empty list case. However, it does not explicitly say when not to use this tool over alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

import_har_virtual_user: Import HAR virtual user
Grade: A

[Design] Mint a presigned URL to import a HAR (HTTP Archive) capture into an OctoPerf project as a new Virtual User. Returns the URL to POST the HAR file to directly (bypassing the MCP server for the bytes), valid for ~5 minutes. The single-use token is consumed on the first POST. The direct POST returns a raw VirtualUser JSON object with no UI deep-link — chain into describe_virtual_user with the returned id to obtain the compact listing and the url to the Virtual User page in the OctoPerf web UI.

Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| projectId | Yes | OctoPerf project id where the HAR will be imported. | |
| resources | No | HTML resources handling: KEEP_ALL or REMOVE. | |
| thinktime | No | Thinktime computation: THINKTIMES or DELAYS. | |
| adBlocking | No | Ad blocking: ENABLED or DISABLED. | |
| containers | No | Container algorithm: PAGE_REF or THINKTIME. | |
| virtualUserId | No | Existing virtual user id to merge the import into; null creates a new VU. | |
| maxThinktimeMs | No | Maximum thinktime cap in milliseconds. | |

Output Schema

| Name | Required | Description |
| --- | --- | --- |
| url | No | |
| method | No | |
| expiresAt | No | Timestamp as epoch milliseconds. |
| contentType | No | |
| instructions | No | |
| fileFieldName | No | |
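
A rough sketch of the direct upload step follows, using only the Python standard library. It assumes the presigned endpoint expects a `multipart/form-data` body with the file under the field named by `fileFieldName`; the listing does not confirm this, so the `instructions` field returned by the tool should be treated as authoritative. The function name and the `application/octet-stream` part type are illustrative choices. The request is built but not sent, because the single-use token is consumed on the first POST.

```python
import time
import urllib.request
import uuid
from typing import Any, Dict, Optional


def build_upload_request(
    ticket: Dict[str, Any],
    file_name: str,
    file_bytes: bytes,
    now_ms: Optional[int] = None,
) -> urllib.request.Request:
    """Turn the tool's output (url, method, expiresAt, fileFieldName) into a request."""
    now_ms = int(time.time() * 1000) if now_ms is None else now_ms
    # expiresAt is documented as epoch milliseconds.
    if ticket["expiresAt"] <= now_ms:
        raise ValueError("presigned URL has expired; mint a new one")

    boundary = uuid.uuid4().hex
    field = ticket["fileFieldName"]
    # Assumption: a single file part in a multipart/form-data body.
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{file_name}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("ascii")

    return urllib.request.Request(
        ticket["url"],
        data=head + file_bytes + tail,
        method=ticket.get("method", "POST"),
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )
```

Sending it with `urllib.request.urlopen(request)` should return the raw VirtualUser JSON; its id can then be passed to describe_virtual_user, as the description recommends.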

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description reveals key behaviors beyond annotations: the presigned URL bypasses the MCP server for file transfer, the token is single-use and expires in ~5 minutes, and the direct POST returns a raw JSON. This adds valuable transparency about the tool's side effects and lifecycle.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is four sentences long, front-loaded with the primary purpose, and immediately followed by critical behavioral details. Every sentence adds value with no redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the output schema exists and the tool is a preparatory step, the description covers the essential workflow: mint URL, POST file, chain to describe. It could mention error conditions or prerequisites, but overall it is sufficiently complete for an agent to use correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the description does not need to explain parameters. It adds no additional meaning beyond the schema, but the baseline for high coverage is 3. The description focuses on usage context rather than individual parameters.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description explicitly states the action: 'Mint a presigned URL to import a HAR capture into an OctoPerf project as a new Virtual User.' It clearly identifies the resource (HAR capture) and the outcome (new Virtual User), distinguishing it from sibling import tools for other formats.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explains when to use: for importing HAR files, and provides details on the presigned URL mechanism, token expiration (~5 minutes), and single-use consumption. It also suggests chaining into 'describe_virtual_user' to get the UI deep-link, offering clear next steps. It does not explicitly mention alternatives or when not to use, but the context is sufficient.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

import_playwright_virtual_user: Import Playwright virtual user
Grade: A

[Design] Mint a presigned URL to import a single Playwright test file (typically a .spec.ts) into an OctoPerf project as a new Virtual User. Returns the URL to POST the file to directly (bypassing the MCP server for the bytes), valid for ~5 minutes. The single-use token is consumed on the first POST. The direct POST returns a raw VirtualUser JSON object with no UI deep-link — chain into describe_virtual_user with the returned id to obtain the compact listing and the url to the Virtual User page in the OctoPerf web UI. Additional Playwright source files (helpers, fixtures, package.json) can be added afterwards with upload_project_file or patch_virtual_user tool calls.

Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| projectId | Yes | OctoPerf project id where the Playwright spec will be imported. | |

Output Schema

| Name | Required | Description |
| --- | --- | --- |
| url | No | |
| method | No | |
| expiresAt | No | Timestamp as epoch milliseconds. |
| contentType | No | |
| instructions | No | |
| fileFieldName | No | |

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds context beyond annotations: presigned URL, 5-minute validity, single-use token consumed on first POST, and that the direct POST bypasses the MCP server for bytes. Annotations already indicate not read-only (creation) and not destructive.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single paragraph with clear, front-loaded purpose. Every sentence adds value (token details, chaining, additional files). Could be slightly more concise but is well-structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given one simple required parameter and the presence of an output schema, the description is complete. It explains the output (raw VirtualUser JSON) and suggests next steps. No missing critical context.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Only one parameter (projectId) with 100% schema description coverage. The description does not add additional meaning beyond what the schema already provides. Baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool mints a presigned URL to import a single Playwright test file into an OctoPerf project as a new Virtual User. It specifies the file type (.spec.ts) and distinguishes from sibling import tools like import_har_virtual_user or import_postman_virtual_user by name.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explains the tool's purpose and provides post-usage guidance: chain into describe_virtual_user for the UI link, and use upload_project_file or patch_virtual_user for additional files. It doesn't explicitly exclude other import types, but sibling tool names imply the scope.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

import_postman_virtual_user: Import Postman virtual user
Grade: A

[Design] Mint a presigned URL to import a Postman collection (v2.1 JSON) into an OctoPerf project as a new Virtual User. Returns the URL to POST the collection to directly (bypassing the MCP server for the bytes), valid for ~5 minutes. The single-use token is consumed on the first POST. The direct POST returns a raw VirtualUser JSON object with no UI deep-link — chain into describe_virtual_user with the returned id to obtain the compact listing and the url to the Virtual User page in the OctoPerf web UI.

Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| projectId | Yes | OctoPerf project id where the collection will be imported. | |
| adBlocking | No | Ad blocking: ENABLED or DISABLED. Server defaults to ENABLED. | |

Output Schema

| Name | Required | Description |
| --- | --- | --- |
| url | No | |
| method | No | |
| expiresAt | No | Timestamp as epoch milliseconds. |
| contentType | No | |
| instructions | No | |
| fileFieldName | No | |

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Fully explains the indirect upload mechanism (bypassing MCP server for bytes), token lifetime, single-use constraint, and raw response lacking UI deep-link. Annotations already indicate non-read-only, but description adds crucial behavioral details beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is a single dense paragraph that front-loads the purpose and then explains the flow. Slightly verbose but all sentences add value. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's moderate complexity, the description covers the main purpose, input, indirect file upload, token validity, post-upload handling, and chaining instructions. Output schema exists, so return values are documented elsewhere.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema covers both parameters with descriptions; schema_description_coverage is 100%. Description does not add extra meaning beyond what schema already provides, so baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states it mints a presigned URL to import a Postman collection as a new Virtual User, specifying the collection format (v2.1 JSON) and output. Differentiates from sibling import tools (e.g., import_har_virtual_user) by the source format.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit post-import guidance: chain into describe_virtual_user with the returned id. Mentions token is single-use and valid for 5 minutes. Does not explicitly state when to use this vs alternatives, but the sibling list implies context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

import_urls_virtual_user: Import URLs virtual user
Grade: A

[Design] Create a Virtual User from a list of URLs (each entry is a {method, url} pair — url is a plain string, e.g. "https://example.com/path"). Optionally crawls each URL to discover linked resources. Returns the VU's id, name, description, tags, timestamps and a url deep-link to the Virtual User page in the OctoPerf web UI.

Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| type | No | URL crawling strategy: CRAWL (follow linked resources) or DO_NOT_CRAWL. | |
| urls | Yes | URL list. Each entry: `method` (e.g. "GET") and `url` (full URL string, e.g. "https://example.com/path"). | |
| projectId | Yes | OctoPerf project id where the URL list will be imported. | |

Output Schema

| Name | Required | Description |
| --- | --- | --- |
| id | No | |
| url | No | |
| name | No | |
| tags | No | |
| created | No | Timestamp as epoch milliseconds. |
| description | No | |
| lastModified | No | Timestamp as epoch milliseconds. |
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Describes optional crawling behavior and return fields (id, name, etc.), adding value beyond annotations which only indicate non-read-only and non-destructive. Could be more explicit that the call mutates the project by creating a new Virtual User.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three efficient sentences: purpose, parameter details, and return value. Front-loaded with key verb and resource, no wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers purpose, input, optional behavior, and output. Output schema exists for return details. Could mention prerequisites like valid project ID, but schema already requires it.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with clear descriptions; the tool description restates parameter format in natural language but does not add significant new meaning beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states 'Create a Virtual User from a list of URLs', specifying the exact resource and input format. Distinguishes from sibling import tools that use different formats (HAR, Playwright, etc.).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Implicitly indicates usage when you have a list of URLs, but does not explicitly state when not to use or mention alternatives. The context from sibling tool names provides differentiation.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

import_webdriver_virtual_user | Import WebDriver virtual user | A

[Design] Create a WebDriver (browser-driven) Virtual User from a list of URLs. Each entry is a {method, url} pair — url is a plain string, e.g. "https://example.com/path". Each URL becomes a navigation step. Returns the VU's id, name, description, tags, timestamps and a url deep-link to the Virtual User page in the OctoPerf web UI.

Parameters
Name | Required | Description | Default
urls | Yes | URL list. Each entry: `method` (typically "GET") and `url` (full URL string, e.g. "https://example.com/path"). | -
projectId | Yes | OctoPerf project id where the WebDriver VU will be created. | -

Output Schema
Name | Required | Description
id | No | -
url | No | -
name | No | -
tags | No | -
created | No | Timestamp as epoch milliseconds.
description | No | -
lastModified | No | Timestamp as epoch milliseconds.
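Because `created` and `lastModified` are given as epoch milliseconds, a client usually has to convert them before displaying or comparing them. The following is a minimal sketch of that conversion; the sample `vu` dictionary and its values are made up for illustration.

```python
from datetime import datetime, timezone


def epoch_ms_to_utc(ms):
    """Convert an epoch-milliseconds timestamp to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


# Hypothetical tool output, reduced to the fields used here.
vu = {"id": "vu-1", "created": 1700000000000, "lastModified": 1700000060000}

created = epoch_ms_to_utc(vu["created"])
modified = epoch_ms_to_utc(vu["lastModified"])
seconds_between = (modified - created).total_seconds()
```
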
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description discloses that each URL becomes a navigation step and returns the VU's id, name, description, tags, timestamps, and a deep link. Annotations declare readOnlyHint=false and destructiveHint=false, and the description aligns with creation behavior, adding useful context beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description consists of four short, information-dense sentences with no redundant words. It front-loads the purpose and efficiently covers input and output structure.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given two required parameters, no nested objects, and presence of an output schema, the description adequately covers what the tool does, input format, and return value. It is complete for a creation tool with clear schema definitions.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, but the description adds value by explaining that each URL in the array becomes a navigation step and that the `url` is a plain string (e.g., 'https://example.com/path'). This contextualizes the parameters beyond the schema's technical definitions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool creates a WebDriver Virtual User from a list of URLs, distinguishing it from sibling tools like import_har_virtual_user which import from HAR files. The verb 'Create' and resource 'WebDriver Virtual User' are specific.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explains the input format but does not provide guidance on when to use this tool versus alternatives like import_playwright_virtual_user or import_har_virtual_user. It lacks explicit exclusions or context for selection.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_active_subscriptions | List active subscriptions | A
Read-only

[Runtime] List every USABLE subscription on the account — those whose status is active, trialing or cancel_at_period_end (the rest — past_due, canceled, unpaid, incomplete, … — are filtered out). Each entry flattens the plan caps the LLM needs to reason about a scenario fit: maxConcurrentUsers, maxRealBrowserUsers, maxProfilesPerScenario, maxTestDurationSec, remainingTests, expiresOn. Use this when get_scenario_matching_plans came back empty: the scenario hits one of these caps (typically maxRealBrowserUsers=0 on basic plans rejects any Playwright UserProfile, or maxProfilesPerScenario=1 rejects multi-VU hybrid scenarios). Returns an empty list when the user has no usable subscription at all.
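To make the "scenario fit" reasoning concrete, here is a small sketch that compares a scenario's needs against the flattened caps of one subscription. The cap names come from the description above; the `needs` dictionary, its keys, and the sample plan values are hypothetical, and the sketch assumes every cap is a plain upper bound.

```python
def violated_caps(subscription, needs):
    """Return the names of the plan caps that the scenario would exceed."""
    checks = [
        ("maxConcurrentUsers", needs["concurrentUsers"]),
        ("maxRealBrowserUsers", needs["realBrowserUsers"]),
        ("maxProfilesPerScenario", needs["profiles"]),
        ("maxTestDurationSec", needs["durationSec"]),
    ]
    violated = [cap for cap, wanted in checks if wanted > subscription[cap]]
    # Assumption: a subscription with no remaining tests cannot run anything.
    if subscription["remainingTests"] <= 0:
        violated.append("remainingTests")
    return violated


# Made-up plan resembling the "basic plan" case mentioned in the description.
basic_plan = {
    "maxConcurrentUsers": 50,
    "maxRealBrowserUsers": 0,
    "maxProfilesPerScenario": 1,
    "maxTestDurationSec": 3600,
    "remainingTests": 5,
}
# Hypothetical hybrid scenario: two profiles, two of the users in real browsers.
needs = {"concurrentUsers": 20, "realBrowserUsers": 2, "profiles": 2, "durationSec": 600}

blocked_by = violated_caps(basic_plan, needs)
```

Here `blocked_by` names the two caps the description calls out as typical causes of an empty `get_scenario_matching_plans` result.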

Parameters
No parameters

Output Schema
Name | Required | Description
subscriptions | No | -
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and destructiveHint=false. Description adds valuable behavioral details: flattens plan caps, specific fields included, and returns empty list for no usable subscription. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single paragraph, front-loaded with purpose. Dense but not overly long. The '[Runtime]' prefix is a minor stylistic quirk, but the description is efficient overall.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no input parameters and existence of output schema, description covers output fields, use case, and edge case (empty list). Complete for agent understanding.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

No parameters (0), schema coverage 100%. Description does not need to add parameter info. Baseline for 0 params is 4, and the description provides extensive context about output fields, which adds value.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states it lists 'USABLE subscription' with specific statuses (active, trialing, cancel_at_period_end), distinguishing it from sibling tools like get_scenario_matching_plans. Verb 'list' and resource 'subscriptions' are explicit.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly says 'Use this when get_scenario_matching_plans came back empty' and explains the reasoning (hitting caps). Also notes that it returns an empty list when the user has no usable subscription, providing clear when-to-use guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_bench_docker_logs | List bench Docker logs | A
Read-only

[Runtime] Snapshot of the Docker container logs produced by an OctoPerf load test — the same lines the web UI streams on the running-bench logs panel. Use this to diagnose launch failures (image pull errors, provider quotas, missing project files, agent boot crashes) when get_bench_result reports state=ERROR or the run is stuck in PREPARING / INITIALIZING. Returns one entry per log line with date, level (INFO / WARN / ERROR), and message. The server resolves the BenchResult's batchId internally — pass the benchResultId, not the batchId. No incremental cursor; the upstream API always returns the full log set, sort/filter client-side.
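Since the full log set is always returned, narrowing it down is the caller's job. The sketch below shows one way to filter and sort entries client-side. The `date`, `level` and `message` field names follow the description; the sample entries are invented, and `date` is assumed to be a sortable value (its actual format is not stated in the listing).

```python
SEVERITY = {"INFO": 0, "WARN": 1, "ERROR": 2}


def problem_lines(logs, min_level="WARN"):
    """Keep entries at or above min_level, ordered by their date field."""
    threshold = SEVERITY[min_level]
    kept = [e for e in logs if SEVERITY.get(e["level"], 0) >= threshold]
    return sorted(kept, key=lambda e: e["date"])


# Invented log entries; `date` is assumed to be numeric and sortable.
logs = [
    {"date": 3, "level": "ERROR", "message": "image pull failed"},
    {"date": 1, "level": "INFO", "message": "starting"},
    {"date": 2, "level": "WARN", "message": "quota nearly reached"},
]

messages = [e["message"] for e in problem_lines(logs)]
```
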

Parameters
Name | Required | Description | Default
benchResultId | Yes | OctoPerf benchResult id whose Docker launch logs to read. | -

Output Schema
Name | Required | Description
logs | No | -
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and destructiveHint=false. The description adds valuable behavioral details beyond annotations: it states it returns a snapshot, one entry per log line with specific fields, that the server resolves batchId internally, and importantly, 'No incremental cursor; the upstream API always returns the full log set, sort/filter client-side.' This disclosure of full log set behavior is crucial for the agent.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, well-organized paragraph of five sentences. Every sentence adds value: purpose, usage context, return format, parameter clarification, and behavioral caveat. No redundancy or fluff. It is front-loaded with the core purpose and efficiently packs all necessary information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple read-only tool with one parameter and an output schema present, the description is complete. It covers purpose, usage context, parameter semantics, return format (date, level, message), and behavioral notes (no pagination). The output schema may provide additional structure, but the description does not need to repeat it. It adequately informs the agent for correct selection and invocation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with a description for benchResultId. The description adds critical semantic context: 'pass the benchResultId, not the batchId' and 'The server resolves the BenchResult's batchId internally.' This clarifies the exact parameter meaning and avoids a common mistake of passing the wrong ID, surpassing what the schema alone provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states that the tool lists Docker container logs for an OctoPerf load test bench result. It specifies the action (list/snapshot) and resource (Docker container logs), and distinguishes from siblings like list_bench_load_generators and list_bench_result_files by focusing on Docker logs for diagnostics.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit when-to-use guidance: 'Use this to diagnose launch failures when get_bench_result reports state=ERROR or the run is stuck in PREPARING/INITIALIZING.' It also clarifies which ID to pass (benchResultId, not batchId). However, it does not explicitly mention when not to use it or differentiate from other log-related tools like read_bench_result_file_lines, but the context is clear enough.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_bench_load_generators | List bench load generators | A
Read-only

[Runtime] List the BenchLoadGenerator agents that participated in a bench run — one entry per LG container with id, regionId, hostname (the JMeter / Chromium agent), virtualUserCount it carried, testStarted timestamp, and (when finished) testEnded. This is the data source behind both the LoadGeneratorsChartReportItem and LoadGeneratorsTreeReportItem widgets — a tree groups by region, a chart plots VU count per LG over time, but both pull from the same BenchLoadGenerator list returned here. Use when diagnosing LG-side issues (a hybrid scenario should show one Playwright LG + one JMeter LG per region; missing LGs hint at a provisioning failure surfaced earlier in list_bench_docker_logs).
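The per-region grouping that the tree widget is described as showing can be approximated client-side. This is only a sketch: `regionId` and `virtualUserCount` are field names from the description, while the sample generators and region ids are invented.

```python
from collections import defaultdict


def summarize_by_region(generators):
    """Count load generators and sum the virtual users they carried, per region."""
    summary = defaultdict(lambda: {"generators": 0, "virtualUsers": 0})
    for lg in generators:
        region = summary[lg["regionId"]]
        region["generators"] += 1
        region["virtualUsers"] += lg["virtualUserCount"]
    return dict(summary)


# Invented entries shaped like the fields named in the description.
generators = [
    {"id": "lg-1", "regionId": "region-a", "virtualUserCount": 50},
    {"id": "lg-2", "regionId": "region-a", "virtualUserCount": 10},
    {"id": "lg-3", "regionId": "region-b", "virtualUserCount": 25},
]

by_region = summarize_by_region(generators)
```

A region with fewer generators than expected in this summary is the kind of signal the description suggests cross-checking against `list_bench_docker_logs`.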

Parameters
Name | Required | Description | Default
benchResultId | Yes | OctoPerf benchResult id whose load generators to list. | -

Output Schema
Name | Required | Description
generators | No | -
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate read-only and non-destructive behavior. The description adds value by explaining the data fields and that it serves as the data source for two report widgets, providing deeper behavioral context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured: it starts with the action and key details, then lists returned fields, then provides usage guidance. Every sentence contributes value without redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity (one parameter, output schema exists), the description covers all needed information: what it does, what it returns, and when to use it. It also clarifies the relationship to report widgets.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% and the description does not add new information about the benchResultId parameter beyond what the schema provides. The description reinforces the parameter's purpose but adds no further semantics.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description explicitly states the tool lists 'BenchLoadGenerator' agents for a bench run, specifying fields like id, regionId, hostname, virtualUserCount, and timestamps. It differentiates from sibling tools by referencing list_bench_docker_logs for provisioning failures.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description advises using this tool when diagnosing LG-side issues and provides an example scenario (hybrid scenario expecting one Playwright and one JMeter LG per region). It does not explicitly state when not to use, but the context is clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_bench_reports_by_project | List bench reports by project | A
Read-only

[Analysis] List every OctoPerf bench report that belongs to a project, including reports tied to bench results that are no longer kept. Returns BenchReportListings whose reports each carry id, name, description, benchResultIds (the runs the report analyses), lastModified, tags and a url deep-link to the analysis page. Use this to discover what's available before pulling the full content of a specific report with get_bench_report.
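As one possible use of the listing before calling `get_bench_report`, the sketch below finds the reports that analyse a given run, newest first. The `benchResultIds` and `lastModified` field names come from the description; the sample reports and ids are invented.

```python
def reports_for_run(reports, bench_result_id):
    """Return reports whose benchResultIds include the run, most recently modified first."""
    matching = [r for r in reports if bench_result_id in r.get("benchResultIds", [])]
    return sorted(matching, key=lambda r: r["lastModified"], reverse=True)


# Invented listing entries, reduced to the fields used here.
reports = [
    {"id": "rep-1", "benchResultIds": ["run-1"], "lastModified": 100},
    {"id": "rep-2", "benchResultIds": ["run-1", "run-2"], "lastModified": 300},
    {"id": "rep-3", "benchResultIds": ["run-3"], "lastModified": 200},
]

candidate_ids = [r["id"] for r in reports_for_run(reports, "run-1")]
```
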

Parameters
Name | Required | Description | Default
projectId | Yes | OctoPerf project id whose bench reports to list. | -

Output Schema
Name | Required | Description
reports | No | -
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate read-only and non-destructive behavior. The description adds further transparency by noting it includes reports tied to deleted bench results, and describes the return structure.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences with no fluff, front-loaded with purpose and behavior, followed by return type and usage guidance.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the output schema exists, the description provides adequate detail about return fields (id, name, description, etc.) for a list operation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so baseline is 3. The description does not add extra meaning beyond the schema's clear description of the projectId parameter.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool lists bench reports by project, including reports tied to deleted results. It uses specific verbs and resources, and distinguishes from the sibling 'get_bench_report' tool.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states 'Use this to discover what's available before pulling the full content of a specific report with get_bench_report', providing clear when-to-use guidance and naming the sibling tool to call for full report content.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_bench_result_files | List bench result files | A
Read-only

[Analysis] List every file stored against an OctoPerf benchResult — JMeter logs (jmeter.log, jmeter-server.log, gzipped variants), JTL result files, screenshots, HAR captures, Playwright traces, debug attachments, any artefact uploaded by the load generators. Returns BenchResultFiles whose files (sorted by filename) each carry filename, size (bytes) and lastModified (epoch ms). Works for both real bench runs (from run_scenario) and Virtual User validation runs (from validate_virtual_user) — both flows produce a benchResultId that backs the same storage. Files are typically available once the run reaches FINISHED / ABORTED. Use read_bench_result_file_lines next to pull text content.
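Before calling `read_bench_result_file_lines`, a client may want to narrow the listing to files that are likely to be plain text. The sketch below does that by file extension. The `filename` and `size` fields come from the description; the extension list, the sample files, and the decision to skip gzipped variants are assumptions of this example.

```python
TEXT_SUFFIXES = (".log", ".jtl")  # assumption: these extensions hold plain text


def text_candidates(files):
    """Return filenames that look like plain-text artefacts, in listing order."""
    return [f["filename"] for f in files if f["filename"].endswith(TEXT_SUFFIXES)]


def total_bytes(files):
    """Sum the size field (bytes) over all listed files."""
    return sum(f["size"] for f in files)


# Invented listing entries shaped like the fields named in the description.
files = [
    {"filename": "jmeter.log", "size": 2048, "lastModified": 1700000000000},
    {"filename": "results.jtl", "size": 4096, "lastModified": 1700000000000},
    {"filename": "screenshot-1.png", "size": 1024, "lastModified": 1700000000000},
]

readable = text_candidates(files)
stored = total_bytes(files)
```
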

Parameters
Name | Required | Description | Default
benchResultId | Yes | OctoPerf benchResult id whose files to list. | -

Output Schema
Name | Required | Description
files | No | -
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations (readOnlyHint, destructiveHint) are consistent. Description adds behavioral details: sorting by filename, field details (size in bytes, lastModified epoch ms), and dual-flow support. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is moderately long but front-loaded with purpose; every sentence adds value. Slight redundancy in listing file types but justified by completeness.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given a single parameter and an existing output schema, the description adequately covers return fields and usage context. Could mention pagination or limits, but that is not critical for a simple list.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Single parameter benchResultId with 100% schema coverage. Description adds context that the ID can come from run_scenario or validate_virtual_user, enhancing meaning beyond schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description specifies verb 'list', resource 'bench result files', and enumerates file types (JMeter logs, JTL, screenshots, etc.). Distinguishes from siblings by noting it works for both real bench runs and Virtual User validation runs, which is unique among sibling tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

States when files are typically available (FINISHED/ABORTED) and suggests the next tool to call (read_bench_result_file_lines). Does not explicitly say when not to use it, but the context implies it is for listing after a bench result exists.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_correlation_frameworks | List correlation frameworks | A
Read-only

[Design] List the correlation frameworks available on the OctoPerf instance — including built-in presets (SAML, OAuth, .NET, Java, Token, AzureAD, …) and any custom framework defined on the instance. Each framework groups a set of correlation rules that can be applied to a project in one call via add_correlation_framework_to_project. Returns each framework's id, name, isDefault flag, and ruleCount.

Parameters
No parameters

Output Schema
Name | Required | Description
frameworks | No | -
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and destructiveHint=false. The description adds that it returns id, name, isDefault flag, and ruleCount, and mentions scope (instance). It does not contradict annotations and provides additional return value context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise, covering the action, scope, examples, and return fields in three sentences with no unnecessary words. It is well structured and front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the lack of parameters and presence of an output schema (not shown), the description is reasonably complete. It explains the tool's purpose, the items it returns, and its relationship to a sibling tool. Missing potential details like permissions or prerequisites, but these are minor for a read-only tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

There are no parameters, so the baseline is 4. The description adds value by explaining what the tool returns and the purpose of the frameworks, which is beyond the empty schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool lists correlation frameworks, distinguishing between built-in presets and custom ones. It also differentiates from sibling tools by mentioning that frameworks can be applied via add_correlation_framework_to_project, and explicitly lists return fields.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies when to use this tool: to see available frameworks before applying them via add_correlation_framework_to_project. It also implicitly contrasts with list_correlation_rules, which lists individual rules. However, it does not explicitly state when not to use it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_correlation_rules | List correlation rules | A
Read-only

[Design] List the correlation rules defined in an OctoPerf design project. Each rule extracts a value from a response (BODY or HEADERS) and re-injects it into subsequent requests, letting load tests handle CSRF tokens, session ids and other server-issued values. Returns each rule's id, variableName, extractor type, type-specific pattern (regex / jsonPath / xpath / …), injection targets summary, enabled flag, and a url deep-link to the rule edit page.
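For readers unfamiliar with correlation, the idea of extracting a server-issued value and re-injecting it can be shown in a few lines. This is a generic illustration with a regex extractor, not OctoPerf's implementation; the `${name}` placeholder syntax, the function names, and the sample HTML are assumptions made for the example.

```python
import re


def extract(body, pattern):
    """Return the first capture group of pattern in body, or None if absent."""
    match = re.search(pattern, body)
    return match.group(1) if match else None


def inject(template, variables):
    """Replace ${name} placeholders (an assumed syntax) with extracted values."""
    result = template
    for name, value in variables.items():
        result = result.replace("${" + name + "}", value)
    return result


# Step 1: a response carries a server-issued CSRF token.
login_page = '<input type="hidden" name="csrf" value="abc123">'
token = extract(login_page, r'name="csrf" value="([^"]+)"')

# Step 2: the token is re-injected into the next request's body.
next_body = inject("user=alice&csrf=${csrfToken}", {"csrfToken": token})
```

Without this step a replayed request would send the token recorded at design time, which the server would normally reject.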

Parameters
Name | Required | Description | Default
projectId | Yes | OctoPerf project id whose correlation rules to list. | -

Output Schema
Name | Required | Description
rules | No | -
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and destructiveHint=false. The description adds value by specifying the exact fields returned (id, variableName, extractor type, etc.), providing behavioral context beyond the annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise and front-loaded with purpose. It could be slightly shorter but effectively communicates the tool's function and returns.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the output schema exists (though not shown), the description covers what the tool does and the return fields. For a list operation with one parameter, it is fairly complete, though more details about pagination or ordering could help.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents the projectId parameter. The description adds no further parameter-level details, but the tool's purpose is clear. The baseline of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it lists correlation rules in an OctoPerf design project and explains what correlation rules are. It differentiates from sibling tools like create_correlation_rule and delete_correlation_rule by focusing on listing.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies when to use the tool (to view existing rules) but does not explicitly state when not to use it or mention alternatives. It gives no guidance on when to choose this over list_correlation_frameworks or other list tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_docker_providers_by_workspace
List Docker providers by workspace
Grade: A
Read-only

[Docker] List the Docker providers a workspace can use to run OctoPerf load generators. Returns each provider's id, name, type (PUBLIC / PRIVATE), available regions, enabled flag, and a url deep-link to the provider page in the OctoPerf web UI. Pick one to feed into validate_virtual_user (providerId + region).

Parameters
Name | Required | Description | Default
workspaceId | Yes | OctoPerf workspace id whose providers to list. | -

Output Schema
Name | Required | Description
providers | No | -

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and destructiveHint=false. The description adds value by detailing the return fields (including a deep-link URL) and the tool's role in a workflow, enhancing transparency beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is very concise: one sentence for purpose, one for return fields, one for usage. It is front-loaded with the [Docker] tag. Every sentence adds value, with no waste.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has an output schema and only one parameter, the description is complete. It explains what the tool returns, why to use it (for validate_virtual_user), and includes a deep-link. No gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline 3. The description does not add extra meaning to the single parameter workspaceId beyond what the schema provides. No additional guidance on format or constraints.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states that the tool lists Docker providers for a workspace, specifies the returned fields (id, name, type, regions, enabled, url), and distinguishes the tool from siblings like list_public_docker_providers. It is specific, with the verb 'list' and the resource 'Docker providers by workspace'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly says to pick a provider to feed into validate_virtual_user, providing clear when-to-use guidance. It does not mention when not to use or compare to similar tools, but the context is sufficient.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_http_servers_by_project
List HTTP servers by project
Grade: A
Read-only

[Design] List the HTTP servers (baseUrl + timeouts + IP spoofing flag) defined in an OctoPerf design project. Authorization credentials are NOT exposed. Each server is referenced by the HTTP request actions of the project's Virtual Users. Each entry also includes a url deep-link to the server edit page in the OctoPerf web UI.

Parameters
Name | Required | Description | Default
projectId | Yes | OctoPerf project id whose HTTP servers to list. | -

Output Schema
Name | Required | Description
servers | No | -

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already mark readOnlyHint=true and destructiveHint=false, so the description adds value by specifying that authorization credentials are not exposed and that each entry includes a deep-link URL. This provides behavioral context beyond the annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise: four short sentences, with the purpose and key fields front-loaded. Every sentence adds value, with no unnecessary words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity (one parameter, output schema exists), the description is complete. It covers what is listed, what is omitted, and additional return value details like deep-links. The output schema handles the rest.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% coverage with a clear description for projectId. The description does not add further semantic details about the parameter, so baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states that it lists HTTP servers with specific fields (baseUrl, timeouts, IP spoofing flag) from a design project, and notes that authorization credentials are not exposed. The purpose is unambiguous and sets the tool apart from siblings like list_http_server_usages.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides context (design project, credentials not exposed) but does not explicitly state when to use this tool over alternatives. No direct comparison with sibling list tools is provided, so the agent must infer usage from the purpose.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_http_server_usages
List HTTP server usages
Grade: A
Read-only

[Design] List the Virtual Users whose HTTP request actions reference a given HTTP server. Read-only. Use before delete_http_server or update_http_server to surface the blast radius of a change. Returns HttpServerUsages whose virtualUsers each carry id, name, description, tags, timestamps, and a url deep-link to its design page.
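The pre-change check described above could be sketched as follows. This is a minimal illustration, not the server's implementation: `call_tool` is a hypothetical helper that invokes an MCP tool and returns its parsed result as a dict, and the stubbed response is invented sample data. Only the tool name, the `serverId` parameter, and the `virtualUsers` field come from the listing.

```python
from typing import Any, Callable, Dict, List

# Hypothetical signature: (tool name, arguments) -> parsed tool result.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def impacted_virtual_users(call_tool: ToolCaller, server_id: str) -> List[str]:
    """Return the names of Virtual Users that reference the given HTTP server."""
    usages = call_tool("list_http_server_usages", {"serverId": server_id})
    return [vu.get("name", vu.get("id", "?")) for vu in usages.get("virtualUsers", [])]


def safe_to_delete(call_tool: ToolCaller, server_id: str) -> bool:
    """True only when no Virtual User still references the server."""
    return not impacted_virtual_users(call_tool, server_id)


# Stub standing in for a real MCP client (invented sample data).
def fake_call_tool(name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    if arguments["serverId"] == "srv-1":
        return {"virtualUsers": [{"id": "vu-1", "name": "Checkout"}]}
    return {"virtualUsers": []}
```

An agent would call `safe_to_delete` before delete_http_server and surface the returned names to the user when the result is False.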

Parameters
Name | Required | Description | Default
serverId | Yes | OctoPerf HTTP server id whose impacted Virtual Users to list. | -

Output Schema
Name | Required | Description
virtualUsers | No | -

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and destructiveHint=false. Description adds that it returns HttpServerUsages with specific fields (id, name, description, tags, timestamps, url deep-link), enhancing transparency beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four short sentences, front-loaded with the [Design] tag and purpose, followed by usage guidance and return detail. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given a simple single-parameter tool with safety annotations and an output schema, the description fully covers purpose, usage, return structure, and safety. No gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema covers 100% of parameter description. Description does not add new semantics beyond schema, but the parameter is self-explanatory. Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states that it lists Virtual Users referencing a given HTTP server. Distinguishes the tool from siblings like list_http_servers_by_project and delete_http_server by specifying its scope (Virtual Users that reference a server) and its use case.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly advises using this before delete_http_server or update_http_server to assess impact. Provides clear context on when to invoke.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_project_files
List project files
Grade: A
Read-only

[Design] List the files attached to an OctoPerf design project (typically CSV files used for Virtual User parameterization). Returns each file's name, size in bytes, last-modified timestamp, and a url deep-link to the project's Files page in the OctoPerf web UI.

Parameters
Name | Required | Description | Default
projectId | Yes | OctoPerf project id whose files to list. | -

Output Schema
Name | Required | Description
files | No | -

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true and destructiveHint=false. The description adds value by detailing return fields (name, size, timestamp, url) and typical file type, which is beyond what annotations provide.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise—two sentences with no wasted words. The purpose is front-loaded in the first sentence, and the second efficiently lists return fields.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity (one required parameter, no nested objects), annotations covering safety, and an existing output schema (implied by context signals), the description provides all necessary context for correct use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with a single 'projectId' parameter well-described in the schema. The description does not add further parameter semantics, which is acceptable given the baseline of 3 for high coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool lists files attached to an OctoPerf design project, specifying typical use (CSV files for parameterization). This distinguishes it from sibling tools like 'delete_project_file' or 'download_project_file'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for listing project files but does not explicitly compare with alternatives or provide when-not-to-use guidance. Sibling tools exist for deleting, downloading, or reading file lines, but no direct contrast is made.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_projects_by_workspace
List projects by workspace
Grade: A
Read-only

[Project] List OctoPerf design projects that belong to a workspace. Returns each project's id, workspaceId, name, description, and a url deep-link to the project page in the OctoPerf web UI.
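For illustration, a call to this tool over MCP is a JSON-RPC 2.0 `tools/call` request whose `params` carry the tool name and its arguments. The sketch below only builds that message; the request id and the workspace id `ws-123` are made-up values, and transport, session setup, and authentication are left out.

```python
import json
from typing import Any, Dict


def build_tool_call(request_id: int, tool: str, arguments: Dict[str, Any]) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    return json.dumps(
        {
            "jsonrpc": "2.0",
            "id": request_id,
            "method": "tools/call",
            "params": {"name": tool, "arguments": arguments},
        }
    )


# "ws-123" is a placeholder workspace id, not a real one.
message = build_tool_call(1, "list_projects_by_workspace", {"workspaceId": "ws-123"})
```

The same envelope applies to every tool on this page; only `name` and `arguments` change.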

Parameters
Name | Required | Description | Default
workspaceId | Yes | OctoPerf workspace id whose projects to list. | -

Output Schema
Name | Required | Description
projects | No | -

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint: true and destructiveHint: false, so the agent knows it is a safe read operation. The description adds value by specifying the return format including a deep-link URL, which goes beyond the annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences, no unnecessary words, front-loaded with the verb and resource, and every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With an output schema present (indicated by 'has output schema: true'), the description explains return values adequately. It covers the purpose, input, and output, leaving no significant gaps for a simple listing tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100% (the only parameter, workspaceId, is described). The description does not add extra meaning beyond the schema, so the baseline of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description uses a specific verb ('list') and resource ('projects by workspace'), and explicitly states the returned fields (id, workspaceId, name, description, url), clearly distinguishing it from sibling tools like create_project, update_project, or delete_project_file.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description clearly indicates the tool lists projects for a given workspace, providing sufficient context for use. However, it does not explicitly state when not to use it or offer alternatives, though the sibling tool list implies other tools for specific actions like creation or update.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_public_docker_providers
List public Docker providers
Grade: A
Read-only

[Docker] List the public OctoPerf Cloud Docker providers shared across all workspaces. Use these when list_docker_providers_by_workspace returns an empty list. Returns each provider's id, name, type (always PUBLIC), available regions, and enabled flag. The url deep-link is empty because public providers are not bound to a specific workspace. Pick one to feed into validate_virtual_user (providerId + region).
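The fallback described above (workspace providers first, public providers when that list is empty) could be sketched like this. `call_tool` is a hypothetical helper that invokes an MCP tool and returns its parsed result; the field names `regions` (a list of strings) and `enabled` (a boolean) are assumptions about the output shape, since the listing names the fields but does not show the schema.

```python
from typing import Any, Callable, Dict, Optional, Tuple

# Hypothetical signature: (tool name, arguments) -> parsed tool result.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def pick_provider(call_tool: ToolCaller, workspace_id: str) -> Optional[Tuple[str, str]]:
    """Return (providerId, region) from workspace providers, else from public ones."""
    providers = call_tool(
        "list_docker_providers_by_workspace", {"workspaceId": workspace_id}
    ).get("providers", [])
    if not providers:
        # Fall back to the shared public providers, as the description advises.
        providers = call_tool("list_public_docker_providers", {}).get("providers", [])
    for provider in providers:
        regions = provider.get("regions", [])  # assumed field name and shape
        if provider.get("enabled") and regions:
            return provider["id"], regions[0]
    return None


# Stub standing in for a real MCP client (invented sample data).
def fake_call_tool(name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    if name == "list_docker_providers_by_workspace":
        return {"providers": []}
    return {
        "providers": [
            {"id": "pub-1", "type": "PUBLIC", "regions": ["eu-west-1"], "enabled": True}
        ]
    }
```

The returned pair maps onto the providerId and region inputs of validate_virtual_user; picking the first region is an arbitrary choice for the sketch.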

Parameters
No parameters

Output Schema
Name | Required | Description
providers | No | -

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and destructiveHint=false. The description adds valuable behavioral context: it describes the return fields (id, name, type always PUBLIC, regions, enabled flag) and explains why url is empty.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Concise yet comprehensive. Front-loaded with purpose. Every sentence adds value: usage guidance, return format, the empty url field, and downstream usage. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Fully contextualized for a zero-parameter list tool. Covers when to use, what is returned, how to use with another tool, and addresses edge case (empty url). Output schema likely documents return types, so description is complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

No parameters exist, so schema coverage is 100% and baseline is 4. Description provides context about output fields (id, name, type, regions, enabled, url) beyond schema, but no param info needed.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states the verb 'List', the resource 'public OctoPerf Cloud Docker providers', and the scope 'shared across all workspaces'. Distinguishes from sibling 'list_docker_providers_by_workspace' by explicitly referencing it.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Specifies exactly when to use this tool: 'Use these when list_docker_providers_by_workspace returns an empty list.' Also directs how to use the results: 'Pick one to feed into validate_virtual_user (providerId + region).'

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_scenarios_by_project
List scenarios by project
Grade: A
Read-only

[Runtime] List OctoPerf scenarios (i.e. runtime load-test definitions) that belong to a project. Returns each scenario's id, projectId, name, description, tags and a url deep-link to the scenario page in the OctoPerf web UI.

Parameters
Name | Required | Description | Default
projectId | Yes | OctoPerf project id whose scenarios to list. | -

Output Schema
Name | Required | Description
scenarios | No | -

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true and destructiveHint=false, so the tool's safety is clear. The description adds that it returns a deep-link URL but doesn't disclose other behavioral traits like pagination or performance implications.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise (one sentence plus a list of returned fields) and front-loaded with '[Runtime]'. It effectively communicates the tool's function without unnecessary text.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity (single required parameter, read-only), the description provides sufficient context, including the return structure. It could state more explicitly that the output is a list, but the purpose is clear.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% and the schema itself describes the parameter adequately. The description does not add additional meaning beyond what the schema provides, so baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool lists OctoPerf scenarios belonging to a project, specifying the returned fields (id, projectId, name, description, tags, url). It distinguishes itself from sibling tools that list other entities (e.g., virtual users, projects).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no explicit guidance on when to use this tool versus alternatives. While the purpose is clear, it lacks context like when not to use it or prerequisites (e.g., needing a valid projectId).

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_scheduled_jobs_by_project
List scheduled jobs by project
Grade: A
Read-only

[Scheduler] List every ScheduledJob belonging to an OctoPerf project. Each entry flattens the polymorphic trigger as triggerDescription (cron expression like 0 22 * * * for cron jobs, ISO-8601 datetime for one-shot jobs) and includes the job's enabled flag and nextRun epoch-ms. Use this after schedule_scenario_once / schedule_scenario_cron to recap active schedules, or before enable_scheduled_job / disable_scheduled_job / delete_scheduled_job to look up the right jobId.
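A consumer of this output has to tell the two `triggerDescription` forms apart and convert `nextRun` from epoch milliseconds. The sketch below shows one way to do that; the classification heuristic (try ISO-8601 first, then count whitespace-separated cron fields) is an assumption for illustration, not how the server itself distinguishes triggers.

```python
from datetime import datetime, timezone


def describe_trigger(trigger_description: str) -> str:
    """Classify a flattened trigger as 'one-shot', 'cron', or 'unknown'."""
    try:
        # Older Python versions reject a trailing "Z", so normalise it first.
        datetime.fromisoformat(trigger_description.replace("Z", "+00:00"))
        return "one-shot"
    except ValueError:
        pass
    # Heuristic: cron expressions have 5 fields (6 with a seconds field).
    if len(trigger_description.split()) in (5, 6):
        return "cron"
    return "unknown"


def next_run_utc(next_run_ms: int) -> datetime:
    """Convert a nextRun value in epoch milliseconds to an aware UTC datetime."""
    return datetime.fromtimestamp(next_run_ms / 1000, tz=timezone.utc)
```

For example, `describe_trigger("0 22 * * *")` yields `"cron"`, while an ISO-8601 string such as `2025-01-15T22:00:00Z` yields `"one-shot"`.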

Parameters
Name | Required | Description | Default
projectId | Yes | OctoPerf project id whose scheduled jobs to list. | -

Output Schema
Name | Required | Description
jobs | No | -

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and destructiveHint=false, so the agent knows it's a safe read operation. The description adds that it flattens the polymorphic trigger into triggerDescription and includes enabled/nextRun, which are useful behavioral details beyond the annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three sentences: the first states the core functionality, the second highlights the output fields, and the third provides usage context. It is front-loaded and efficient, and every sentence adds value with no fluff.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the simple input (one parameter with 100% schema coverage) and presence of an output schema, the description is complete. It mentions the key output fields (triggerDescription, enabled, nextRun) and provides usage context that directly guides the agent. No gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% coverage for the single parameter projectId, which is already well-described in the schema. The description does not add additional semantics beyond what the schema provides, so the baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool lists all ScheduledJobs for a project, with a [Scheduler] prefix that distinguishes it from other tools. The verb 'list' and resource 'ScheduledJob by project' are specific. Sibling tools include other scheduler operations like enable/disable, so this is distinct.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly recommends using this tool after schedule operations and before enable/disable/delete operations to get the jobId. It does not explicitly state when not to use it, but the usage context is very clear and provides actionable guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_variables
List variables
Grade: A
Read-only

[Design] List the variables (parameterization inputs for Virtual Users) defined in an OctoPerf design project. Returns each variable's id, name, type (ConstantVariable / CounterVariable / RandomVariable / CSVVariable / SecretVariable / ListVariable), description, value summary, usage syntaxes (dollar-brace placeholders) and a url deep-link to the variable edit page. SecretVariable values are surfaced encrypted (as the backend returns them).
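As a small illustration of the dollar-brace usage syntax, the sketch below extracts variable references from a request template and reports any that are not defined in the project. It assumes the plain `${name}` placeholder form and a `name` key on each listed variable; other syntaxes the tool may return (for example, CSV column references) are not handled.

```python
import re
from typing import Dict, List

# Assumed placeholder form: ${variableName}
PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def referenced_variables(text: str) -> List[str]:
    """Return variable names referenced as ${name}, in order of first use."""
    seen: List[str] = []
    for name in PLACEHOLDER.findall(text):
        if name not in seen:
            seen.append(name)
    return seen


def undefined_variables(text: str, variables: List[Dict[str, str]]) -> List[str]:
    """Return referenced names missing from the list_variables result."""
    defined = {variable["name"] for variable in variables}
    return [name for name in referenced_variables(text) if name not in defined]
```

Run against a URL template and the `variables` array returned by this tool, `undefined_variables` flags placeholders that would not resolve at runtime.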

Parameters
Name | Required | Description | Default
projectId | Yes | OctoPerf project id whose variables to list. | -

Output Schema
Name | Required | Description
variables | No | -

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, so the description adds value by noting that SecretVariable values are surfaced encrypted and that a deep-link URL is returned. No behavioral contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured and efficiently conveys purpose, return fields, and a behavioral note. It is concise enough while covering important details.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the output schema exists (as indicated by context signals), the description explains the return values sufficiently. For a list tool with good annotations, the description is complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the schema documents the single parameter. The description does not add additional parameter semantics, but appropriately avoids redundancy. Baseline 3 is correct.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool lists variables in an OctoPerf design project and details the returned fields (id, name, type, etc.). It distinguishes itself from sibling tools like create_variable or delete_variable.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description starts with '[Design]', indicating that it applies to design projects, but does not explicitly state when not to use the tool or suggest alternatives. However, the context of sibling tools makes usage boundaries clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_virtual_users
List virtual users
Grade: A
Read-only

[Design] List the Virtual Users that belong to an OctoPerf design project. Returns VirtualUserListings whose virtualUsers each carry id, name, description, tags, timestamps, and a url deep-link to the Virtual User page in the OctoPerf web UI.

Parameters
Name | Required | Description | Default
projectId | Yes | OctoPerf project id whose virtual users to list. | -

Output Schema
Name | Required | Description
virtualUsers | No | -

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true and destructiveHint=false. The description adds context about the return structure (VirtualUserListings with specific fields and a deep-link URL), which goes beyond the annotations. It does not mention pagination, but given the simple one-parameter tool, this is acceptable.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two concise sentences with no unnecessary words. The first sentence states the action, the second describes the output. It is front-loaded and efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the simple tool with one parameter, annotations covering safety, and an output schema (though not shown), the description is complete. It explains the return structure and the scope. No gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100% and already explains the projectId parameter. The tool description does not add any additional semantic information beyond what the schema provides. Baseline is 3.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action (List), the resource (Virtual Users), and the scope (belong to an OctoPerf design project). It also specifies the return structure. It is distinct from sibling tools like list_scenarios_by_project or list_variables.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for listing virtual users in a project but does not explicitly state when to use this tool versus alternatives (e.g., get_virtual_user for a single user). No when-not-to-use guidance is provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_workspaces: List workspaces (A)
Read-only

[Workspace] List OctoPerf workspaces the current user is a member of. Returns each workspace's id, name, description, and a url deep-link to the workspace page in the OctoPerf web UI.

Parameters

No parameters

Output Schema

| Name | Required | Description |
| --- | --- | --- |
| workspaces | No | |

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true. The description adds value by specifying the exact returned fields and the deep-link URL, providing context beyond the annotation.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two concise sentences, front-loaded with the tool's purpose, no wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity (no parameters, read-only, explicit output fields), the description is complete and sufficient for an AI agent to understand its functionality.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

There are zero parameters, so the description does not need to explain parameter semantics. Baseline of 4 is appropriate as there is nothing to describe.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool lists workspaces the user is a member of and specifies the return fields (id, name, description, url). It distinguishes itself from siblings, which cover projects, virtual users, etc.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool versus alternatives, but the purpose is implied by its name and simplicity. It is the only tool for listing workspaces among many sibling tools, but explicit usage context is missing.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

patch_bench_report: Patch bench report (A)
Destructive

[Analysis] Apply an RFC 6902 JSON Patch to an OctoPerf BenchReport's full entity (metadata + the polymorphic items list + the configs set + benchResultIds). The patch is a JSON string holding an array of operations [{"op":"add|remove|replace|move|copy|test", "path":"/json/pointer", "value":<json>}, ...]. Paths use RFC 6901 JSON Pointer (/items/0/name, /configs, /tags, …). Polymorphic widgets (BenchReportItem subtypes: SummaryReportItem, StatisticTableReportItem, TopReportItem, LineChartReportItem, PieChartReportItem, StackedChartReportItem, AreaRangeChartReportItem, ErrorsReportItem, InsightsReportItem, …) must keep their @type discriminator; consult octoperf://schema/bench-report for the per-subtype required fields (also served as JSON at /mcp/public/schema/bench-report.json on this server's origin for clients that can't read MCP resources). The server fetches the current report, applies the patch on its JSON representation, re-deserializes via Jackson (round-trip validation — invalid schema is rejected) and persists. Returns the updated BenchReport. Use get_bench_report first to read the current shape and compute precise paths.
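As a rough illustration of the `@type` rule above, the sketch below builds a patch locally and flags any operation that writes a whole report item without a discriminator. The helper name `missing_type_ops`, the item fields, and the literal `"SummaryReportItem"` discriminator value are assumptions made for the example; the authoritative field list is the `octoperf://schema/bench-report` resource. The `/items/-` path is the standard RFC 6902 way to append to an array.

```python
import json


def missing_type_ops(ops):
    """Return add/replace ops that write a whole /items/<n> entry without "@type"."""
    flagged = []
    for op in ops:
        if op["op"] not in ("add", "replace"):
            continue
        parts = op["path"].split("/")  # "/items/0" -> ["", "items", "0"]
        writes_whole_item = len(parts) == 3 and parts[1] == "items"
        value = op.get("value")
        has_type = isinstance(value, dict) and "@type" in value
        if writes_whole_item and not has_type:
            flagged.append(op)
    return flagged


# Hypothetical operations: field names other than "@type" and "name" are not
# taken from the real schema.
ops = [
    {"op": "replace", "path": "/items/0/name", "value": "Response times"},
    {"op": "add", "path": "/items/-",
     "value": {"@type": "SummaryReportItem", "name": "Summary"}},
]
bad_ops = [{"op": "add", "path": "/items/-", "value": {"name": "No discriminator"}}]

# The tool expects the patch as a JSON string, not as a list.
patch = json.dumps(ops)
```

A check like this only catches the most obvious mistake before the call; the server-side round-trip validation described above remains the real gate.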

Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| patch | Yes | RFC 6902 JSON Patch document as a JSON string: an array of `{op, path, value}` operations applied in order to the current BenchReport entity. | |
| reportId | Yes | OctoPerf bench report id to patch. | |

Output Schema

| Name | Required | Description |
| --- | --- | --- |
| id | No | |
| name | No | |
| tags | No | |
| items | No | |
| userId | No | |
| configs | No | |
| created | No | Timestamp as epoch milliseconds. |
| projectId | No | |
| description | No | |
| lastModified | No | Timestamp as epoch milliseconds. |
| benchResultIds | No | |

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description discloses the patching behavior, the round-trip validation, and the rejection of invalid schemas. It adds context beyond the annotations (destructiveHint=true) by explaining how the server fetches, patches, validates, and persists the report. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured and front-loaded with the purpose. It includes necessary details like schema resource for clients without MCP resource support. While somewhat lengthy, every sentence serves a purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a complex tool involving polymorphic types and JSON Patch, the description is thorough. It references the schema resource, explains round-trip validation, and instructs on preparatory steps. Output schema exists, so return values are covered.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description adds significant value by detailing the patch format (JSON string of operations), providing example paths, and explaining the polymorphic subtypes, which goes beyond the schema's minimal description.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool applies an RFC 6902 JSON Patch to a BenchReport's full entity, specifying the resource (BenchReport) and scope (metadata, items, configs, benchResultIds). It distinguishes from siblings like update_bench_report by using a specific patching approach.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

It provides clear guidance to first use get_bench_report to compute paths and mentions the schema resource for polymorphic types. However, it doesn't explicitly contrast with update_bench_report or state when not to use this tool.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

patch_scenario: Patch scenario (A)
Destructive

[Runtime] Apply an RFC 6902 JSON Patch to an OctoPerf Scenario's full entity (metadata + the userProfiles list with their polymorphic load shapes and engine settings). The patch is a JSON string holding an array of operations [{"op":"add|remove|replace|move|copy|test", "path":"/json/pointer", "value":<json>}, ...]. Paths use RFC 6901 JSON Pointer (/userProfiles/0/load/users, /name, …). Polymorphic nodes (UserProfileLoad, UserProfileEngine, …) must keep their @type discriminator; consult octoperf://schema/scenario for the per-subtype required fields (also served as JSON at /mcp/public/schema/scenario.json on this server's origin for clients that can't read MCP resources). The server fetches the current scenario, applies the patch on its JSON representation, re-deserializes via Jackson (round-trip validation — invalid schema is rejected) and persists. Returns the updated Scenario. Use get_scenario first to read the current shape and compute precise paths.
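A minimal sketch of how the `patch` argument might be assembled for the `/userProfiles/0/load/users` path mentioned above. It pairs a `test` operation with a `replace`, so that under RFC 6902 semantics the patch fails as a whole if the value read earlier with get_scenario has changed in the meantime. The helper name `build_users_patch`, the user counts, and the placeholder scenario id are assumptions for illustration; whether the server applies the patch atomically is also assumed from the RFC rather than confirmed by the description.

```python
import json


def build_users_patch(profile_index, expected_users, new_users):
    """Build a JSON Patch string that guards and then replaces a profile's user count."""
    path = f"/userProfiles/{profile_index}/load/users"
    ops = [
        # "test" makes the patch fail if the current value is not the expected one.
        {"op": "test", "path": path, "value": expected_users},
        {"op": "replace", "path": path, "value": new_users},
    ]
    return json.dumps(ops)  # the tool takes the patch as a JSON string


# Hypothetical tool arguments; the scenario id is a placeholder.
arguments = {
    "scenarioId": "SCENARIO_ID_PLACEHOLDER",
    "patch": build_users_patch(profile_index=0, expected_users=50, new_users=100),
}
```

Operations are applied in array order, which is why the guard comes first.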

Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| patch | Yes | RFC 6902 JSON Patch document as a JSON string: an array of `{op, path, value}` operations applied in order to the current Scenario entity. | |
| scenarioId | Yes | OctoPerf scenario id to patch. | |

Output Schema

| Name | Required | Description |
| --- | --- | --- |
| id | No | |
| mode | No | |
| name | No | |
| tags | No | |
| userId | No | |
| created | No | Timestamp as epoch milliseconds. |
| projectId | No | |
| description | No | |
| lastModified | No | Timestamp as epoch milliseconds. |
| userProfiles | No | |

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Description discloses the full behavior: server fetches current scenario, applies patch, performs round-trip validation via Jackson, rejects invalid schema, and persists. Annotations already indicate destructiveness, and description adds detail on the update mechanism and validation step.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is informative and front-loaded with purpose but is relatively long due to technical details. It contains all necessary information without redundancy, though some users might find it dense. Still, it earns its length.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (polymorphic types, validation, output schema), the description is thorough. It covers return value (updated Scenario), validation behavior, and references additional resources for schema details. No gaps identified.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Despite 100% schema description coverage, the description adds significant value beyond the schema. It explains patch format with example operations, mentions path syntax, and references external schema for polymorphic subtypes. This aids the agent in constructing correct patch documents.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool applies an RFC 6902 JSON Patch to an OctoPerf Scenario's full entity, including metadata and polymorphic userProfiles. It distinguishes itself from sibling tools like update_scenario by specifying the patch-based approach.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly advises using get_scenario first to read the current shape and compute precise paths. It provides context on path syntax (RFC 6901 JSON Pointer) and polymorphic node requirements but does not list when to avoid using this tool.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

patch_virtual_user: Patch virtual user (A)
Destructive

[Design] Apply an RFC 6902 JSON Patch to an OctoPerf Virtual User's full entity (metadata + recursive children action tree). The patch is a JSON string holding an array of operations [{"op":"add|remove|replace|move|copy|test", "path":"/json/pointer", "value":<json>}, ...]. Paths use RFC 6901 JSON Pointer (/children/0/url, /name, …). Polymorphic nodes (actions, extractors, assertions, …) must keep their @type discriminator; consult octoperf://schema/vu for the per-subtype required fields (also served as JSON at /mcp/public/schema/vu.json on this server's origin for clients that can't read MCP resources). The server fetches the current VU, applies the patch on its JSON representation, re-deserializes via Jackson (round-trip validation — invalid schema is rejected) and persists. Returns the updated VirtualUser. Use get_virtual_user first to read the current tree and compute precise paths.
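Computing precise paths in a recursive `children` tree is the error-prone part, so here is a small sketch of one way to derive RFC 6901 JSON Pointers from a Virtual User that was read beforehand. The sample tree, the `@type` values (`HttpAction`, `ContainerAction`), and the helper names are invented for the example and are not taken from the `octoperf://schema/vu` resource.

```python
def escape_token(token):
    """Escape one reference token per RFC 6901: "~" becomes "~0", then "/" becomes "~1"."""
    return token.replace("~", "~0").replace("/", "~1")


def find_pointers(node, predicate, pointer=""):
    """Yield the JSON Pointer of every dict in the tree for which predicate(dict) is true."""
    if isinstance(node, dict):
        if predicate(node):
            yield pointer
        for key, value in node.items():
            yield from find_pointers(value, predicate, f"{pointer}/{escape_token(key)}")
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_pointers(item, predicate, f"{pointer}/{index}")


# Hypothetical Virtual User tree; only "name", "children" and "url" appear in the
# tool description, the type names are placeholders.
vu = {
    "name": "Checkout",
    "children": [
        {"@type": "HttpAction", "url": "https://example.com/login"},
        {"@type": "ContainerAction", "children": [
            {"@type": "HttpAction", "url": "https://example.com/cart"},
        ]},
    ],
}

paths = list(find_pointers(vu, lambda n: n.get("url", "").endswith("/cart")))
# A replace op on the URL of the matched node.
op = {"op": "replace", "path": paths[0] + "/url", "value": "https://example.com/basket"}
```

Because array indices shift after an `add` or `remove`, pointers computed this way are only valid against the tree they were computed from.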

Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| patch | Yes | RFC 6902 JSON Patch document as a JSON string: an array of `{op, path, value}` operations applied in order to the current VirtualUser entity. | |
| virtualUserId | Yes | OctoPerf Virtual User id to patch. | |

Output Schema

| Name | Required | Description |
| --- | --- | --- |
| id | No | |
| name | No | |
| tags | No | |
| type | No | |
| userId | No | |
| created | No | Timestamp as epoch milliseconds. |
| children | No | |
| projectId | No | |
| description | No | |
| lastModified | No | Timestamp as epoch milliseconds. |

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description details the server behavior: fetching current VU, applying patch, round-trip validation via Jackson, and persistence. It also warns about maintaining @type discriminator. Annotations already indicate destructiveHint=true, but the description enriches beyond that.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single paragraph that efficiently conveys all necessary information. Each sentence is valuable, though it could be slightly more structured with bullet points. Still, it is concise and well-organized.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the presence of an output schema and high schema coverage, the description is complete: it covers purpose, operation, prerequisites (get_virtual_user), validation, and return value. No critical gaps remain.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, but the description adds significant meaning: explains patch format (RFC 6902), path syntax (RFC 6901 JSON Pointer), and references to schema for polymorphic nodes. This goes well beyond the schema descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool applies an RFC 6902 JSON Patch to an OctoPerf Virtual User. It specifies the verb 'apply', resource 'Virtual User', and the method 'JSON Patch', distinguishing it from siblings like update_virtual_user (full replace) and delete_virtual_user.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly advises using get_virtual_user first to compute precise paths, providing clear guidance on when to use this tool. It does not explicitly state when not to use it, but the context is strong.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

read_bench_result_file_lines: Read bench result file lines (A)
Read-only

[Analysis] Read a contiguous range of lines from a file attached to an OctoPerf benchResult — jmeter.log, jmeter-server.log, JTL traces, attachments, … Works for both real bench runs and Virtual User validation runs. Line numbers are 0-based, fromLine is inclusive and toLine is exclusive. Defaults read the first 100 lines. Gzipped files are transparently uncompressed server-side. Binary artefacts (zip, png screenshots) return garbage — only call on text files (filenames ending in .log, .jtl, .txt, .csv, .har, .json, or their .gz variants).
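The range rules above (0-based, `fromLine` inclusive, `toLine` exclusive, 100-line default) match Python's half-open slicing, which the following sketch uses to show which lines a given call would cover. The helper `resolve_window` and its local validation are illustrative assumptions; how the server reacts to an inverted or negative range is not stated in the description.

```python
def resolve_window(from_line=None, to_line=None):
    """Resolve the documented defaults into a concrete half-open [start, end) range."""
    start = 0 if from_line is None else from_line
    end = start + 100 if to_line is None else to_line
    if start < 0 or end < start:
        # Local sanity check only; server-side behaviour here is unknown.
        raise ValueError("expected 0 <= fromLine <= toLine")
    return start, end


# Stand-in for a 250-line log file.
log = [f"line {i}" for i in range(250)]

start, end = resolve_window(from_line=100)
window = log[start:end]  # same semantics as a Python slice: end is excluded
```

With these rules, the next window always starts at the previous `toLine`, with no overlap and no gap.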

Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| toLine | No | First line to exclude (0-based). Defaults to fromLine + 100. | |
| filename | Yes | Filename as returned by `list_bench_result_files`. | |
| fromLine | No | First line to read (0-based, inclusive). Defaults to 0. | |
| benchResultId | Yes | OctoPerf benchResult id the file belongs to. | |

Output Schema

| Name | Required | Description |
| --- | --- | --- |
| lines | No | |

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses transparent decompression of gzipped files, 0-based line numbering, defaults, and binary file warning, going beyond annotations which only indicate readOnly and non-destructive.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Concise paragraph of six sentences, each providing essential information without fluff, front-loaded with purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers all relevant aspects: file types, line ranges, defaults, decompression, and binary warning, making it fully informative for a simple read operation with an output schema.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, and description adds meaning: inclusive/exclusive ranges, defaults, and that filename comes from list_bench_result_files.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it reads a contiguous range of lines from a file attached to an OctoPerf benchResult, specifying file types and differentiating from sibling tools like read_project_file_lines.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides context on when to use (bench results, validation runs) and warns about binary files, but lacks explicit comparison with siblings or when-not-to-use scenarios.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

read_project_file_lines: Read project file lines (A)
Read-only

[Design] Read a contiguous range of lines from a file in an OctoPerf design project — useful to inspect CSVs without downloading them entirely. Line numbers are 0-based, fromLine is inclusive and toLine is exclusive. Defaults read the first 100 lines.
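To inspect a CSV larger than one window, a client could page through it in consecutive half-open ranges. The sketch below assumes a hypothetical `read_lines(from_line, to_line)` callable that wraps the tool call and returns a list of lines, and it assumes that a page shorter than requested means the end of the file was reached; neither detail is confirmed by the description.

```python
def iter_file_lines(read_lines, page_size=100):
    """Yield all lines by requesting consecutive [start, start + page_size) windows."""
    start = 0
    while True:
        page = read_lines(start, start + page_size)
        yield from page
        if len(page) < page_size:  # assumed end-of-file signal
            return
        start += page_size


# In-memory stand-in for the remote file, so the sketch runs without a server.
fake_csv = [f"user{i},secret{i}" for i in range(230)]
calls = []


def fake_read_lines(from_line, to_line):
    calls.append((from_line, to_line))
    return fake_csv[from_line:to_line]


all_lines = list(iter_file_lines(fake_read_lines))
```

For a quick look at the header and first rows, a single call with the defaults is enough; paging like this only makes sense when the whole file matters.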

Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| toLine | No | First line to exclude (0-based). Defaults to fromLine + 100. | |
| filename | Yes | Name of the file to read (as returned by `list_project_files`). | |
| fromLine | No | First line to read (0-based, inclusive). Defaults to 0. | |
| projectId | Yes | OctoPerf project id the file belongs to. | |

Output Schema

| Name | Required | Description |
| --- | --- | --- |
| lines | No | |

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations declare readOnlyHint=true and destructiveHint=false, and the description adds details about 0-based indexing, inclusive/exclusive bounds, and default behavior, which is consistent and transparent.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three efficient sentences: first states purpose, second explains line numbering, third clarifies defaults. No wasted words and good structure.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the presence of an output schema and annotations, the description covers essential aspects. It could mention error handling or limits, but for a simple read tool it is complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, but the description adds clarity on line number semantics (0-based, inclusive/exclusive) and defaults, which goes beyond the schema descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Read a contiguous range of lines'), the resource ('file in an OctoPerf design project'), and the use case ('inspect CSVs without downloading'), distinguishing it from siblings like download_project_file.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides a clear use case ('inspect CSVs without downloading'), implying when to use it. It does not explicitly state when not to use it or list alternatives, but the context is sufficient.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

run_scenario: Run scenario (A)
Destructive

[Runtime] STARTS a new OctoPerf load test by running the given scenario. This consumes subscription credits and spins up load generators. Returns the benchReportId, the benchResultIds that can be polled with get_bench_status, and a url deep-link to the live bench report page in the OctoPerf web UI. Pre-flight with get_scenario_matching_plans first: an empty result means no plan can host the scenario; a non-empty result confirms the run will start as configured.
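The pre-flight advice above can be sketched as a guard that only starts the run when at least one plan matches. Everything outside the two tool names is an assumption: `call_tool(name, arguments)` stands for whatever MCP client function is in use, get_scenario_matching_plans is assumed to take a `scenarioId` argument and return a list, and the fake responses exist only so the sketch runs offline.

```python
def run_if_hostable(call_tool, scenario_id):
    """Start the scenario only if the pre-flight check reports a matching plan."""
    plans = call_tool("get_scenario_matching_plans", {"scenarioId": scenario_id})
    if not plans:
        # Empty result: no plan can host the scenario, so no credits are spent.
        return None
    return call_tool("run_scenario", {"scenarioId": scenario_id})


def make_fake_client(matching_plans):
    """Build an offline stand-in for an MCP client that records the tools called."""
    called = []

    def call_tool(name, arguments):
        called.append(name)
        if name == "get_scenario_matching_plans":
            return matching_plans
        # Hypothetical run_scenario response with placeholder ids.
        return {"benchReportId": "REPORT_ID", "benchResultIds": ["RESULT_ID"]}

    return call_tool, called


ok_client, ok_calls = make_fake_client(["some-plan"])
started = run_if_hostable(ok_client, "SCENARIO_ID")

empty_client, empty_calls = make_fake_client([])
skipped = run_if_hostable(empty_client, "SCENARIO_ID")
```

Keeping the check and the run in one function makes it harder for an agent to call the credit-consuming tool without the pre-flight step.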

Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| scenarioId | Yes | OctoPerf scenario id to execute. | |

Output Schema

| Name | Required | Description |
| --- | --- | --- |
| url | No | |
| benchReportId | No | |
| benchResultIds | No | |

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate destructiveHint=true and readOnlyHint=false. The description adds valuable context by mentioning credit consumption, load generator spin-up, and return values (benchReportId, benchResultIds, url). This goes beyond what annotations provide, though it does not mention rate limits or cancellation behavior.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four sentences: the first two cover purpose and effects, the third lists the return values, and the last provides a critical pre-flight tip. No wasted words, information is front-loaded, and the structure is easy to parse.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity involving credit consumption and load generators, the description adequately covers what to expect (return values, pre-flight). The presence of an output schema reduces the need to detail return format further. However, it doesn't mention potential errors or required permissions, leaving minor gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% and the single parameter 'scenarioId' is well-described in the schema. The description does not add additional parameter-level details beyond what the schema provides, so baseline score is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it starts a load test by running a scenario, including the verb 'starts', the resource 'scenario', and specific effects like consuming credits and spinning up load generators. It distinguishes from sibling tools like stop_bench_result and delete_scenario by making the action explicit.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly instructs to pre-flight with get_scenario_matching_plans to check if a plan can host the scenario, and explains what different results mean. This provides clear when-to-use and alternative actions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

sanity_check_virtual_user: Sanity check virtual user (A)
Read-only

[Design] Run the OctoPerf design sanity check on a Virtual User. Returns the list of issues found, each with a type and a message. An empty list means the VU passed all checks.

Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| projectId | Yes | OctoPerf project id the Virtual User belongs to. | |
| virtualUserId | Yes | OctoPerf Virtual User id to check. | |

Output Schema

| Name | Required | Description |
| --- | --- | --- |
| errors | No | |

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds value beyond the annotations (readOnlyHint=true, destructiveHint=false) by explaining the output format (list of issues with 'type' and 'message') and what an empty list means (all checks passed). This clarifies the behavior without repeating the annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise: three sentences that immediately convey the tool's purpose, its output, and the meaning of an empty result. It is front-loaded with the action and resource, and every sentence provides necessary information without fluff.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity (two required string params, read-only operation, output schema exists), the description fully covers the key aspects: what it does, what it returns, and the meaning of an empty result. No gaps are present.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema already describes both parameters (projectId and virtualUserId) with 100% coverage. The description does not add any additional meaning or context about these parameters, so it provides no extra value beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states that the tool performs a design sanity check on a Virtual User, using a specific verb ('Run') and resource ('Virtual User'). It is distinguished from siblings like 'validate_virtual_user' by the '[Design]' prefix and its focus on design-time checks.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description does not explicitly state when to use this tool versus alternatives like 'validate_virtual_user' or 'get_virtual_user'. It implies usage for checking a virtual user's design, but lacks guidance on when not to use it or what distinguishes it from similar tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

schedule_scenario_cron: Schedule scenario (cron) [Grade A]

[Scheduler] Schedule an OctoPerf Scenario to run on a recurring CRON expression. Creates a ScheduledJob with a CronScheduledTrigger and an ExecuteScenario task. Every fire consumes credits — a daily cron drains the subscription daily until disabled or deleted. expression is a Unix-style 5-field cron expression (minute hour day-of-month month day-of-week — NO seconds field, NOT Quartz format) evaluated in UTC by the server. Examples: 0 22 * * * = every day at 22:00 UTC (= midnight Paris in CEST / 23:00 in CET); 0 9 * * 1-5 = every weekday at 09:00 UTC; 30 14 1 * * = 1st of every month at 14:30 UTC. Convert the user's local time to UTC explicitly, accounting for DST when relevant. The job is enabled=true by default — pass enabled=false to register a paused job. Returns the new ScheduledJob id, scenarioId, name, triggerDescription (the cron expression), enabled flag, nextRun (when known), and a url deep-link to the project scheduler page.
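As a rough illustration of the local-to-UTC conversion the description asks for, the sketch below derives a daily 5-field expression from a local wall-clock time. The helper `daily_cron_utc` and its reference-date argument are hypothetical and not part of the OctoPerf server; fixed UTC offsets stand in for Paris summer and winter time, and only daily schedules are handled (day-of-week or day-of-month fields would also need shifting when the conversion crosses midnight).

```python
from datetime import date, datetime, time, timedelta, timezone, tzinfo


def daily_cron_utc(local_time: time, tz: tzinfo, on: date) -> str:
    """Return a daily 5-field cron expression (minute hour * * *) in UTC.

    Hypothetical helper, not part of the OctoPerf server. `on` is the
    reference date used to resolve the UTC offset of `tz`.
    """
    local_dt = datetime.combine(on, local_time, tzinfo=tz)
    utc_dt = local_dt.astimezone(timezone.utc)
    return f"{utc_dt.minute} {utc_dt.hour} * * *"


# Fixed offsets stand in for Paris summer (CEST) and winter (CET) time.
CEST = timezone(timedelta(hours=2))
CET = timezone(timedelta(hours=1))

summer = daily_cron_utc(time(0, 0), CEST, date(2026, 7, 1))
winter = daily_cron_utc(time(0, 0), CET, date(2026, 1, 15))
print(summer)  # 0 22 * * *
print(winter)  # 0 23 * * *
```

Passing `zoneinfo.ZoneInfo("Europe/Paris")` as `tz` lets the standard library pick the offset for the reference date. Because the server evaluates the expression in UTC, a job created in summer fires one hour earlier in local time after the switch to winter time.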

Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| name | No | Human-readable job name. Defaults to empty. | |
| enabled | No | Whether the job is enabled. Defaults to true. | |
| expression | Yes | Unix-style 5-field cron expression evaluated in UTC, e.g. `0 22 * * *` for every day at 22:00 UTC (= midnight Paris CEST). | |
| scenarioId | Yes | OctoPerf scenario id to schedule. | |

Output Schema

| Name | Required | Description |
| --- | --- | --- |
| id | No | |
| url | No | |
| name | No | |
| enabled | No | |
| nextRun | No | Timestamp as epoch milliseconds. |
| scenarioId | No | |
| triggerDescription | No | |

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses creation of ScheduledJob, credit consumption, and default enabled state beyond annotations. No contradictions with annotations (readOnlyHint=false, destructiveHint=false).

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Well-organized: main purpose first, then details, examples, defaults. Slightly verbose but each sentence adds information. Could trim some repetition.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers input parameters, behavior (credit consumption, time zone), and output fields (id, scenarioId, etc.), which the output schema also lists. Complete for a scheduling tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema covers all parameters; the description adds value with the cron format explanation, worked examples, and the enabled=true default.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states the verb (schedule), resource (OctoPerf scenario), and mechanism (recurring CRON expression). Distinguishes from siblings like schedule_scenario_once and scenario creation tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides key context: credit consumption per fire, time zone handling, default enabled behavior. Lacks explicit comparison with one-time scheduling but context implies proper use.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

schedule_scenario_once: Schedule scenario once [Grade A]

[Scheduler] Schedule an OctoPerf Scenario to run ONCE at a specific date/time. Creates a ScheduledJob with a SimpleScheduledTrigger and an ExecuteScenario task. The scheduled run will consume credits at fire time (same cost as a manual run_scenario). runAt is an ISO-8601 datetime (e.g. 2026-06-15T03:00:00Z); UTC unless an offset is supplied. The job is enabled=true by default — pass enabled=false to register a paused job. Returns the new ScheduledJob id, scenarioId, name, triggerDescription (the ISO datetime), enabled flag, nextRun, and a url deep-link to the project scheduler page.
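A minimal sketch of preparing `runAt` and reading back `nextRun`, assuming the client works with timezone-aware Python datetimes. The helper names and the `SCENARIO_ID` placeholder are hypothetical; only the parameter names (`scenarioId`, `runAt`, `enabled`) and the epoch-milliseconds format of `nextRun` come from the listing.

```python
from datetime import datetime, timedelta, timezone


def to_run_at(moment: datetime) -> str:
    """Format an aware datetime as an ISO-8601 UTC string ending in 'Z'."""
    if moment.tzinfo is None:
        raise ValueError("naive datetime: attach a timezone first")
    utc = moment.astimezone(timezone.utc).replace(microsecond=0)
    return utc.strftime("%Y-%m-%dT%H:%M:%SZ")


def from_epoch_millis(millis: int) -> datetime:
    """Convert an epoch-milliseconds timestamp (as in nextRun) to UTC."""
    return datetime.fromtimestamp(millis / 1000, tz=timezone.utc)


# 05:00 at UTC+2 is 03:00 UTC.
local = datetime(2026, 6, 15, 5, 0, tzinfo=timezone(timedelta(hours=2)))
arguments = {
    "scenarioId": "SCENARIO_ID",  # placeholder, not a real id
    "runAt": to_run_at(local),
    "enabled": False,  # register the job paused
}
print(arguments["runAt"])  # 2026-06-15T03:00:00Z
```

Normalizing to UTC before the call sidesteps any ambiguity about offsets, since the listing states that `runAt` is read as UTC unless an offset is supplied.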

Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| name | No | Human-readable job name. Defaults to empty. | |
| runAt | Yes | ISO-8601 datetime when the scenario fires (e.g. 2026-06-15T03:00:00Z). UTC unless an offset is supplied. | |
| enabled | No | Whether the job is enabled. Defaults to true. | |
| scenarioId | Yes | OctoPerf scenario id to schedule. | |

Output Schema

| Name | Required | Description |
| --- | --- | --- |
| id | No | |
| url | No | |
| name | No | |
| enabled | No | |
| nextRun | No | Timestamp as epoch milliseconds. |
| scenarioId | No | |
| triggerDescription | No | |

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations show readOnlyHint=false, destructiveHint=false, openWorldHint=true. The description adds important behavioral context: credits consumed at fire time, default enabled=true, and return fields including id, scenarioId, name, triggerDescription, enabled, nextRun, and url. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single well-structured paragraph that front-loads the purpose, then provides trigger, credit, format, defaults, and return details. It is dense but clear, with no wasted words. Bullets might help slightly, but the paragraph works as is.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 4 parameters, 100% schema coverage, presence of output schema (described in return fields), and annotations, the description is fully complete. It covers scheduling behavior, credit cost, defaults, format, and return values, leaving no critical gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema covers all 4 parameters with descriptions (100% coverage). The description adds value by explaining the `runAt` format and UTC default and the `enabled` default. It also clarifies the ISO-8601 requirement beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states that the tool will 'Schedule an OctoPerf Scenario to run ONCE at a specific date/time.' It uses a specific verb and resource, and the distinction from sibling tools like 'schedule_scenario_cron' (recurring) and 'run_scenario' (immediate) is obvious.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explains when to use this tool (one-time scheduling), provides details about trigger type, credit consumption, default enabled state, and ISO-8601 format. It does not explicitly list when-not-to-use or alternative tools, but the naming and sibling list make it sufficiently clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

stop_bench_result: Stop bench result [Grade A, Destructive]

[Runtime] Stop an OctoPerf bench result that is still running (state in CREATED / PENDING / SCALING / PREPARING / INITIALIZING / RUNNING). DESTRUCTIVE — terminates the load test immediately on the load generators and transitions the state to ABORTED. Already-finished runs (FINISHED / ABORTED / ERROR) are no-ops. Use only after the user has confirmed the abort. Returns the BenchResult listing with the post-stop state.
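The state rules above can be sketched as a small client-side guard. The function `should_stop` and the way user confirmation is passed in are hypothetical; the state names are copied from the description.

```python
# States copied from the tool description.
ACTIVE_STATES = frozenset(
    {"CREATED", "PENDING", "SCALING", "PREPARING", "INITIALIZING", "RUNNING"}
)
FINISHED_STATES = frozenset({"FINISHED", "ABORTED", "ERROR"})


def should_stop(state: str, user_confirmed: bool) -> bool:
    """Decide whether calling stop_bench_result is worthwhile.

    Hypothetical guard: finished runs are no-ops, and active runs are
    only stopped once the user has confirmed the abort.
    """
    if state in FINISHED_STATES:
        return False
    if state not in ACTIVE_STATES:
        raise ValueError(f"unknown bench result state: {state}")
    return user_confirmed


print(should_stop("RUNNING", user_confirmed=True))   # True
print(should_stop("RUNNING", user_confirmed=False))  # False
print(should_stop("FINISHED", user_confirmed=True))  # False
```

Raising on an unrecognized state is a design choice of this sketch, so that a state not mentioned in the description is surfaced rather than silently aborted.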

Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| benchResultId | Yes | OctoPerf benchResult id to stop. | |

Output Schema

| Name | Required | Description |
| --- | --- | --- |
| id | No | |
| state | No | |
| created | No | Timestamp as epoch milliseconds. |
| regions | No | |
| sampling | No | |
| scenarioId | No | |
| lastModified | No | Timestamp as epoch milliseconds. |

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already mark destructiveHint=true, but the description adds detailed behavioral traits: terminates on load generators, transitions to ABORTED, and lists the exact states it affects. This goes beyond annotation info.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Concise, well-structured, and front-loaded. Every sentence adds value without redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity, annotations, and output schema presence, the description fully covers behavior, state transitions, and return value. No gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% (single parameter documented). The description repeats 'OctoPerf benchResult id' without adding new semantics or constraints. Baseline 3 applies.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly specifies the tool stops a running bench result, lists the applicable states (CREATED, PENDING, etc.), and distinguishes behavior for already-finished runs. It uses specific verbs and resources.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states 'Use only after the user has confirmed the abort' and notes that finished runs are no-ops. While sibling tools are numerous, none are direct alternatives, so no comparison needed. Clear context for usage.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

update_bench_report: Update bench report [Grade A, Destructive]

[Analysis] Update an OctoPerf BenchReport's editable metadata: name, description and tags. Partial — any parameter left null keeps its existing value. The items list (polymorphic widgets), configs set and benchResultIds are NOT changed by this tool; use patch_bench_report to restructure the report. Returns the updated report's id, name, description, benchResultIds, tags, lastModified and a url deep-link to the analysis page.
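One consequence of these semantics is that `tags` replaces the whole set, so adding a single tag means sending the existing tags plus the new one. A minimal sketch of that, with hypothetical helper names and placeholder values (`REPORT_ID`, the tag strings); fields left as `None` serialize to JSON null, which the description says keeps the existing value.

```python
import json
from typing import Iterable, List, Optional


def merged_tags(existing: Iterable[str], extra: Iterable[str]) -> List[str]:
    """Union of current and new tags, order preserved, no duplicates."""
    result: List[str] = []
    for tag in list(existing) + list(extra):
        if tag not in result:
            result.append(tag)
    return result


def update_arguments(report_id: str,
                     name: Optional[str] = None,
                     description: Optional[str] = None,
                     tags: Optional[List[str]] = None) -> dict:
    """Arguments for update_bench_report; None serializes to JSON null."""
    return {"reportId": report_id, "name": name,
            "description": description, "tags": tags}


# Placeholder id and tags, for illustration only.
current_tags = ["nightly", "checkout"]
args = update_arguments("REPORT_ID", tags=merged_tags(current_tags, ["release"]))
print(json.dumps(args))
```

The current tags would normally come from a prior read of the report; how that read is performed is outside this sketch.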

Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| name | No | New report name. Leave null to keep the existing name. | |
| tags | No | New tag set (replaces the existing set). Leave null to keep the existing tags. | |
| reportId | Yes | OctoPerf bench report id to update. | |
| description | No | New description. Leave null to keep the existing description. | |

Output Schema

| Name | Required | Description |
| --- | --- | --- |
| id | No | |
| url | No | |
| name | No | |
| tags | No | |
| description | No | |
| lastModified | No | Timestamp as epoch milliseconds. |
| benchResultIds | No | |

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate destructiveHint=true, but the description adds detail: it only updates metadata, does not change other fields, and the return includes id, name, description, benchResultIds, tags, lastModified, and a deep-link URL. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four efficient sentences: the first states purpose and fields, the second explains partial-update semantics, the third lists exclusions and points to patch_bench_report, and the fourth gives the return values. No unnecessary words, front-loaded with the most critical information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's moderate complexity (4 params, partial update, return values), the description covers all necessary aspects: purpose, partial update behavior, what is not changed, and return fields. The output schema exists but the description still lists return values, making it complete for agent decision-making.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, with each parameter already documented (e.g., 'Leave null to keep the existing name'). The description reinforces the partial update concept but does not add new parameter-specific meaning beyond the schema. However, it provides helpful context summarized in one place, justifying a 4 rather than baseline 3.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states a specific verb ('Update'), resource ('OctoPerf BenchReport'), and scope ('editable metadata: name, description and tags'). It explicitly distinguishes from sibling tool 'patch_bench_report', which is used for restructuring. This provides clear differentiation.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explains partial update behavior ('any parameter left null keeps its existing value') and explicitly states when NOT to use this tool (for items, configs, benchResultIds), directing to 'patch_bench_report' instead. This gives clear guidance on appropriate usage.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

update_http_server: Update HTTP server [Grade A, Destructive]

[Design] MODIFIES an HTTP server's configuration in an OctoPerf design project. DESTRUCTIVE — the change applies to every HTTP request action of every Virtual User that references this server. Any field left null is kept unchanged. ipSpoofing is intentionally not exposed here; manage it via the OctoPerf GUI. Returns the updated server with authorizations stripped.
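Since `hostname` is expected without the scheme and `protocol` is HTTP or HTTPS, a client that starts from a full URL has to split it first. The sketch below does this with the standard library; the helper `server_fields_from_url` is hypothetical, and filling in port 80 or 443 when the URL has no explicit port is an assumption of this sketch rather than documented server behavior.

```python
from urllib.parse import urlsplit


def server_fields_from_url(url: str) -> dict:
    """Split a base URL into protocol, hostname and port arguments.

    Hypothetical helper for update_http_server. Defaulting the port to
    80/443 is this sketch's choice, not documented server behavior.
    """
    parts = urlsplit(url)
    protocol = parts.scheme.upper()
    if protocol not in ("HTTP", "HTTPS"):
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    if not parts.hostname:
        raise ValueError("URL has no hostname")
    port = parts.port or (443 if protocol == "HTTPS" else 80)
    return {"protocol": protocol, "hostname": parts.hostname, "port": port}


fields = server_fields_from_url("https://api.example.com:8443/v1")
print(fields)
# {'protocol': 'HTTPS', 'hostname': 'api.example.com', 'port': 8443}
```

The resulting fields would be merged with `projectId` and `serverId` before the call. Because the change applies to every request that references the server, it is worth showing the fields to the user before sending them.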

Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| port | No | New TCP port. Null to keep the existing value. | |
| hostname | No | New hostname (without the scheme). Null to keep the existing value. | |
| protocol | No | New protocol (HTTP or HTTPS). Null to keep the existing value. | |
| serverId | Yes | OctoPerf HTTP server id to update. | |
| projectId | Yes | OctoPerf project id the HTTP server belongs to. | |
| connectTimeout | No | New connect timeout in milliseconds. Null to keep the existing value. | |
| responseTimeout | No | New response timeout in milliseconds. Null to keep the existing value. | |

Output Schema

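| Name | Required | Description |
| --- | --- | --- |
| id | No | |
| port | No | |
| userId | No | |
| hostname | No | |
| protocol | No | |
| projectId | No | |
| ipSpoofing | No | |
| authorizations | No | |
| connectTimeout | No | |
| responseTimeout | No | |

Behavior: 5/5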

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare destructiveHint=true, but the description adds critical behavioral details: the change applies to every HTTP request action of every Virtual User referencing the server, null fields are kept unchanged, ipSpoofing is intentionally excluded, and the response has authorizations stripped. These significantly expand transparency beyond the annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is five sentences with zero wasted words. It front-loads the verb and destructive label, then efficiently covers the global effect, null behavior, exclusion note, and return value. Every sentence earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (7 params, mutation with output schema), the description covers the operation, destructive effect, null behavior, a missing feature (ipSpoofing), and return details. It lacks information on potential errors, rate limits, or prerequisites (e.g., server existence), but these are less critical for a well-documented mutation tool with a schema and annotations.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema has 100% description coverage, with each parameter describing its purpose and the null-keep behavior. The description restates this null-keep behavior globally but adds no new per-parameter semantics. Thus, it meets the baseline without adding extra value beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states that it modifies an HTTP server's configuration in an OctoPerf design project. It uses a specific verb ('MODIFIES') and resource ('HTTP server'), and the destructive nature distinguishes it from read-only or creation tools. Among siblings like delete_http_server, this is the only update tool for HTTP servers, providing clear purpose differentiation.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly notes that `ipSpoofing` is intentionally not exposed and should be managed via the OctoPerf GUI, which is clear when-not-to-use guidance. However, it does not compare itself with sibling tools like delete_http_server (when to update versus delete), nor does it state prerequisites such as the server already existing, although that is implicitly understood.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

update_project: Update project [Grade A, Destructive]

[Project] Update an OctoPerf design project's editable metadata: name, description and tags. Partial — any parameter left null keeps its existing value. Workspace membership is not changed by this tool (a dedicated move endpoint handles that). Returns the updated project's id, workspaceId, name, description and a url deep-link to the project page in the OctoPerf web UI.

Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| name | No | New project name. Leave null to keep the existing name. | |
| tags | No | New tag set (replaces the existing set). Leave null to keep the existing tags. | |
| projectId | Yes | OctoPerf project id to update. | |
| description | No | New description. Leave null to keep the existing description. | |

Output Schema

| Name | Required | Description |
| --- | --- | --- |
| id | No | |
| url | No | |
| name | No | |
| description | No | |
| workspaceId | No | |

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate destructiveHint=true, and the description confirms modification of metadata. It adds context: partial update semantics, return fields including a url deep-link. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four sentences, each with clear purpose: purpose, partial update, what it doesn't do, what it returns. No unnecessary words. Front-loaded with the main action.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given full schema coverage and an output schema, the description covers purpose, behavior (partial update), exclusions (workspace change), and return fields. Complete for a project update tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, but the description adds crucial information: that null parameters retain existing values. This clarifies the behavior beyond the schema's parameter descriptions, which only state 'Leave null to keep existing' for each field. Also lists return fields, adding context.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description starts with a specific verb+resource ('Update an OctoPerf design project's editable metadata') and lists the editable fields (name, description, tags). It distinguishes itself from siblings by explicitly stating that workspace membership is handled by a dedicated move endpoint, clarifying scope.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states partial update behavior ('any parameter left null keeps its existing value') and what the tool does not do ('Workspace membership is not changed...'), guiding appropriate use. Could mention when to prefer this over other update tools (e.g., patch_scenario), but overall clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

update_scenario: Update scenario [Grade A, Destructive]

[Runtime] Update an OctoPerf Scenario's editable metadata: name, description and tags. Partial — any parameter left null keeps its existing value. The userProfiles list (load shapes, engine settings, VU bindings) is NOT changed by this tool; use patch_scenario to edit the test configuration. Returns the updated scenario's id, projectId, name, description, and a url deep-link to the scenario page in the OctoPerf web UI.

Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| name | No | New scenario name. Leave null to keep the existing name. | |
| tags | No | New tag set (replaces the existing set). Leave null to keep the existing tags. | |
| scenarioId | Yes | OctoPerf scenario id to update. | |
| description | No | New description. Leave null to keep the existing description. | |

Output Schema

| Name | Required | Description |
| --- | --- | --- |
| id | No | |
| url | No | |
| name | No | |
| tags | No | |
| projectId | No | |
| description | No | |

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate destructive nature (destructiveHint: true). Description adds value by detailing partial update semantics and clarifying that userProfiles are untouched. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four sentences, each purposeful: purpose, partial update, disambiguation, return values. No redundant wording.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With annotations and output schema present, description covers all necessary aspects: what is updated, partial update, what is excluded (userProfiles), and return value structure. Complete for this tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema covers all 4 parameters with descriptions (100% coverage). Description reiterates partial update behavior but adds minimal new information beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states verb (Update), resource (OctoPerf Scenario), and specific editable fields (name, description, tags). Distinguishes from sibling patch_scenario by explicitly noting that userProfiles are not changed.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states what the tool does and does not do, and directs to patch_scenario for test configuration. Also clarifies partial update behavior (null keeps existing).

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

update_virtual_user: Update virtual user [Grade A, Destructive]

[Design] Update an OctoPerf Virtual User's editable metadata: name, description and tags. Partial — any parameter left null keeps its existing value. The action tree (children) is NOT changed by this tool; use patch_virtual_user to edit the tree. Returns the updated VU's id, name, description, tags, timestamps and a url deep-link to the Virtual User page in the OctoPerf web UI.

Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| name | No | New VU name. Leave null to keep the existing name. | |
| tags | No | New tag set (replaces the existing set). Leave null to keep the existing tags. | |
| description | No | New description. Leave null to keep the existing description. | |
| virtualUserId | Yes | OctoPerf Virtual User id to update. | |

Output Schema

| Name | Required | Description |
| --- | --- | --- |
| id | No | |
| url | No | |
| name | No | |
| tags | No | |
| created | No | Timestamp as epoch milliseconds. |
| description | No | |
| lastModified | No | Timestamp as epoch milliseconds. |

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate destructiveHint=true (mutation), but the description adds context: it's a partial update, no changes to the action tree, and returns a deep-link URL. This goes beyond the annotation's binary signal.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four concise sentences with no fluff. The key action, partial behavior, and sibling distinction are all front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With an output schema present, the description adequately covers purpose, usage, partial update details, and return fields including timestamps and URL. No gaps given the tool's complexity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema covers 100% of parameters with descriptions. The description reinforces the null-means-keep pattern and clarifies the scope of changes (metadata only), adding value beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description explicitly states that it updates 'editable metadata: name, description and tags' for a Virtual User, using a specific verb ('Update') and resource. It distinguishes itself from the sibling patch_virtual_user by noting that the action tree is not changed, clearly differentiating the two purposes.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit guidance: partial updates (null keeps existing value) and directs to use patch_virtual_user for editing the action tree. This gives clear when-to-use and when-not-to-use instructions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
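
The null-means-keep pattern described above can be sketched in Python. This is only an illustration of the semantics, not the server's implementation: `apply_partial_update` is a hypothetical helper, and the sample field values are made up.

```python
from typing import Any, Dict, Optional


def apply_partial_update(
    current: Dict[str, Any], patch: Dict[str, Optional[Any]]
) -> Dict[str, Any]:
    """Return a copy of `current` where only non-None patch fields are replaced."""
    merged = dict(current)  # shallow copy, so the original record is left untouched
    for field, value in patch.items():
        if value is not None:  # None (null) means "keep the existing value"
            merged[field] = value
    return merged


# Hypothetical Virtual User metadata; only the description is changed.
vu = {"name": "Checkout", "description": "Old text", "tags": ["smoke"]}
updated = apply_partial_update(
    vu, {"name": None, "description": "New text", "tags": None}
)
```

One consequence of this convention is that a field cannot be cleared by sending null; whether the real tool offers another way to clear a field is not stated in the listing.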

upload_jmx_virtual_user: Upload JMX virtual user (A)

[Design] Mint a presigned URL to import a JMeter JMX file into an OctoPerf project. Returns the URL to POST the JMX file to directly (bypassing the MCP server for the bytes), valid for ~5 minutes. The single-use token is consumed on the first POST. The direct POST returns a raw Virtual User Action array with no UI deep-links — chain into describe_virtual_user with each entry's id to obtain the compact listing and the url to the Virtual User page in the OctoPerf web UI.

Parameters
Name | Required | Description | Default
projectId | Yes | OctoPerf project id where the JMX will be imported. |
resources | No | HTML resources handling: KEEP_ALL or REMOVE. Defaults to REMOVE. |
adBlocking | No | Ad blocking: ENABLED or DISABLED. Defaults to DISABLED. |

Output Schema

Name | Required | Description
url | No |
method | No |
expiresAt | No | Timestamp as epoch milliseconds.
contentType | No |
instructions | No |
fileFieldName | No |
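
The direct POST step of the presigned-URL flow can be sketched in Python. This is a minimal illustration, assuming the upload is a multipart/form-data POST (suggested by the fileFieldName and contentType output fields, but not confirmed by the listing). `build_multipart` is a hypothetical helper, and the default part content type is an assumption.

```python
import uuid
from typing import Dict, Tuple


def build_multipart(
    field_name: str, filename: str, payload: bytes, boundary: str = ""
) -> Tuple[bytes, Dict[str, str]]:
    """Build a single-file multipart/form-data body and its request headers."""
    boundary = boundary or uuid.uuid4().hex  # random boundary unless one is given
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"  # assumed part content type
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    body = head + payload + tail
    headers = {
        "Content-Type": f"multipart/form-data; boundary={boundary}",
        "Content-Length": str(len(body)),
    }
    return body, headers
```

Given a ticket returned by the tool, the body could then be sent once, for example with `urllib.request.Request(ticket["url"], data=body, headers=headers, method=ticket["method"])`, passing `ticket["fileFieldName"]` as the field name. Because the token is consumed on the first POST, a failed upload likely requires minting a fresh URL rather than retrying the same one.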

Behavior 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations provide readOnlyHint=false and destructiveHint=false, but the description adds crucial behavioral context: the tool returns a presigned URL rather than uploading directly, the URL is temporary (~5 minutes), the token is single-use, and the direct POST response is a raw Virtual User Action array with no UI deep-links. This exceeds what annotations alone convey.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is four sentences, front-loaded with the key action and mechanism. Every sentence adds value without redundancy. It is appropriately concise and well-structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (multi-step presigned URL upload), the description is complete: it explains the entire flow, the response format, the constraints (time and token limits), and the recommended follow-up tool (describe_virtual_user). The output schema exists, though only expiresAt carries a field description; the tool description covers what to expect.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with all parameters described in the schema. The description does not add significant new meaning beyond the schema parameter descriptions (e.g., projectId, resources, adBlocking are already explained). Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Mint a presigned URL to import a JMeter JMX file into an OctoPerf project.' It distinguishes from sibling import tools like import_har_virtual_user, import_playwright_virtual_user, etc., and from other upload tools like upload_project_file.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit usage guidance: it explains the presigned URL flow, that the direct POST bypasses the MCP server, the 5-minute validity, the single-use token, and the follow-up action to call describe_virtual_user. However, it does not explicitly state when not to use this tool or compare it to alternatives like upload_project_file or other import methods.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

upload_project_file: Upload project file (A)

[Design] Mint a presigned URL to upload a file (typically a CSV used for Virtual User parameterization) to an OctoPerf design project. Returns the URL to POST the file to directly (bypassing the MCP server for the bytes), valid for ~5 minutes. The single-use token is consumed on the first POST. Overwrites any existing file with the same name.

Parameters
Name | Required | Description | Default
projectId | Yes | OctoPerf project id where the file will be uploaded. |

Output Schema

Name | Required | Description
url | No |
method | No |
expiresAt | No | Timestamp as epoch milliseconds.
contentType | No |
instructions | No |
fileFieldName | No |
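
Since the URL is valid for only about five minutes and the token is single-use, a client may want to check the ticket before posting. The sketch below is one way to do that in Python, assuming expiresAt is compared against the local clock. `ticket_is_usable` and the safety margin are hypothetical, not part of the server's API.

```python
import time
from typing import Any, Dict, Optional


def ticket_is_usable(
    ticket: Dict[str, Any],
    already_posted: bool,
    now_ms: Optional[int] = None,
    safety_margin_ms: int = 10_000,
) -> bool:
    """Return True if the presigned URL can still be used for its one POST."""
    if already_posted:
        return False  # the single-use token is consumed on the first POST
    if now_ms is None:
        now_ms = int(time.time() * 1000)  # expiresAt is in epoch milliseconds
    # Keep a small margin so the upload does not start right before expiry.
    return now_ms + safety_margin_ms < int(ticket["expiresAt"])
```

When the check fails, the safe fallback is to call the tool again for a new URL. Clock skew between client and server is not handled here.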

Behavior 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Beyond annotations (readOnlyHint=false, destructiveHint=false), the description adds critical behavioral details: the token is single-use and consumed on the first POST, the upload overwrites any existing file with the same name, and the direct POST bypasses the MCP server. No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise (four short sentences) and well-structured. The first sentence establishes purpose and typical use; the remaining three detail the mechanism, constraints, and overwrite behavior. Every sentence adds value without redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity and the presence of an output schema (so return values need no description), the description covers purpose, mechanism, constraints, and side effects. It is sufficiently complete for correct selection and invocation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% for the single parameter projectId, with a clear schema description. The tool description minimally adds context ('design project'), but does not provide further semantic or format details beyond what the schema already conveys.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Mint a presigned URL to upload a file') and the resource (file to OctoPerf design project). It distinguishes from sibling tools like download_project_file and delete_project_file by specifying the upload mechanism and typical CSV use case.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explains the use case (uploading CSV for Virtual User parameterization) and the behavior (bypass MCP server, 5-minute validity, single-use token). It does not explicitly mention alternatives or when not to use, but the context of sibling tools implies distinctions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

validate_virtual_user: Validate virtual user (A)

[Runtime] STARTS a functional validation of an OctoPerf Virtual User. Spins up a single load generator on the given Docker provider/region, runs the VU once (or a few iterations), and returns a BenchValidationStatus (benchResultId, initial state, finished flag — true once the run reached FINISHED / ABORTED / ERROR). Poll get_virtual_user_validation (or get_bench_status for progress) to watch completion. Unlike run_scenario, validation does not consume any credits.

Parameters
Name | Required | Description | Default
location | Yes | Provider region/location name (e.g. `eu-west-1`). |
projectId | Yes | OctoPerf project id the Virtual User belongs to. |
iterations | No | Number of iterations to run. Defaults to 1, capped server-side (5 on SaaS); values above the cap are rejected. |
providerId | Yes | Docker provider id (use `list_docker_providers_by_workspace` to discover). |
virtualUserId | Yes | OctoPerf Virtual User id to validate. |

Output Schema

Name | Required | Description
state | No |
finished | No |
benchResultId | No |
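
The start-then-poll pattern can be sketched in Python. This is a hedged illustration only: `wait_for_validation` is a hypothetical helper, and `fetch_status` stands for whatever callable wraps get_virtual_user_validation in a given client (its arguments are not shown in the listing, so none are assumed here). The only fields relied on are the `state` and `finished` entries of the output schema.

```python
import time
from typing import Any, Callable, Dict


def wait_for_validation(
    fetch_status: Callable[[], Dict[str, Any]],
    interval_s: float = 5.0,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll until the status reports finished=True, then return that status."""
    for _ in range(max_polls):
        status = fetch_status()
        # finished is true once the run reached FINISHED / ABORTED / ERROR
        if status.get("finished"):
            return status
        sleep(interval_s)
    raise TimeoutError(f"validation still running after {max_polls} polls")
```

The caller should still inspect the returned `state`, since `finished` is also true for ABORTED and ERROR. The polling interval and poll limit are arbitrary defaults chosen for the sketch.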

Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description discloses key behaviors: it spins up a load generator, runs the VU once or a few iterations, and returns a validation status. It also notes that, unlike run_scenario, validation does not consume credits, which adds value beyond the annotations (readOnlyHint=false, destructiveHint=false). No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured and front-loaded with the main action. Each sentence adds useful information, though it could be slightly more concise. No unnecessary words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (polling needed, output schema with BenchValidationStatus), the description covers the essential points: how to start, what results to expect, and how to monitor progress. It doesn't fully detail the output schema but references key fields, which is sufficient.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 100% schema coverage, the description adds significant value: it explains that iterations default to 1 and are capped server-side (5 on SaaS), and how to discover providerId using 'list_docker_providers_by_workspace'. This goes beyond the schema definitions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it 'STARTS a functional validation of an OctoPerf Virtual User' with a specific verb and resource. It distinguishes itself from the sibling tool 'run_scenario' by noting that validation does not consume credits, helping an agent choose correctly.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explains the typical use case (functional validation before full scenario run) and provides explicit guidance to poll 'get_virtual_user_validation' or 'get_bench_status' for progress. However, it does not explicitly state when not to use it or list all alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

Resources