Glama

check_success

Evaluate a campaign's success contract by comparing the latest metric value to the threshold, returning whether the contract is met along with iterations used.

Instructions

Evaluate whether a campaign has met its typed success contract.

A campaign's success contract (success_metric, benchmark_command, scope, max_iterations) is set at creation time by templates like autoresearch. This tool gives the planner / coordinator a single scalar decision:

  • met=True — contract satisfied; coordinator should mark the campaign done.

  • met=False — keep iterating, or escalate if budget exhausted.

  • iterations_used — count of DONE atomic steps on this campaign so far. Useful for comparing against max_iterations.

  • metric_value — the most recently recorded metric from campaign_notes tagged metric:<success_metric> (best-effort parse; see below).

How the metric is discovered. The runner / worker agents emit a note of the form::

add_note(campaign_id, f"metric={value}", tags=["metric:<name>"])

on every benchmark run. check_success finds the most recent such note, extracts the float after =, and compares it against success_metric_threshold if one was set via steer_campaign(strategy=...). In v0.3 only the most recent value is surfaced; the planner decides whether it's "good enough". A future revision may wire a numeric threshold column.
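The discovery step described above can be sketched roughly as follows. This is an illustration only, not the server's actual code: the note representation (dicts with `text` and `tags` keys, ordered oldest to newest), the helper name `latest_metric`, and the choice to return None when the newest tagged note cannot be parsed are all assumptions made for the example.

```python
import re

# Hypothetical pattern: a float (optionally signed, optionally with exponent)
# following "metric=" in the note text.
_METRIC_RE = re.compile(r"metric=([-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?)")


def latest_metric(notes, success_metric):
    """Return the float from the newest note tagged metric:<success_metric>.

    `notes` is assumed to be a list of dicts ordered oldest to newest.
    Returns None if no tagged note exists or the newest one does not parse
    (a best-effort parse, as the description puts it).
    """
    tag = f"metric:{success_metric}"
    for note in reversed(notes):  # walk newest first
        if tag not in note.get("tags", []):
            continue
        match = _METRIC_RE.search(note.get("text", ""))
        return float(match.group(1)) if match else None
    return None


notes = [
    {"text": "metric=0.81", "tags": ["metric:accuracy"]},
    {"text": "unrelated note", "tags": []},
    {"text": "metric=0.93", "tags": ["metric:accuracy"]},
]
print(latest_metric(notes, "accuracy"))
```

Only the newest tagged note is consulted, which mirrors the v0.3 behaviour of surfacing the most recent value rather than a best-so-far value.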

Args: id: Campaign UUID.

Returns: {met, metric_value, iterations_used, max_iterations, success_metric, notes_checked, reason}

``reason`` is a short string explaining the decision, useful both for humans looking at logs and for the planner's own chain-of-thought.

A campaign without a success contract returns {met: False, reason: "no_success_metric_configured"}.
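For readers consuming the result in typed code, the return value could be modelled along these lines. The field names come from the Returns line above; the field types, the class name `CheckSuccessResult`, and the helper `no_contract_result` are assumptions, since the page does not show the output schema.

```python
from typing import Optional, TypedDict


class CheckSuccessResult(TypedDict, total=False):
    # Field names are from the tool description; the types are guesses.
    met: bool
    metric_value: Optional[float]
    iterations_used: int
    max_iterations: Optional[int]
    success_metric: Optional[str]
    notes_checked: int
    reason: str


def no_contract_result() -> CheckSuccessResult:
    """Shape documented for a campaign without a success contract."""
    return {"met": False, "reason": "no_success_metric_configured"}


print(no_contract_result())
```

`total=False` is used so that the two-key response for a campaign without a contract still fits the same type.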

Next: if met is True, call cancel_campaign(id) or steer_campaign(id, "wrap up"). If False and iterations_used >= max_iterations, escalate via the notification channel.
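The follow-up logic above might look like this on the coordinator side. It is a sketch under stated assumptions: `result` is the dict returned by check_success, and the action labels `"finish"`, `"escalate"`, and `"continue"` are hypothetical names that the caller would map onto cancel_campaign / steer_campaign, the notification channel, or another iteration.

```python
def next_action(result):
    """Map a check_success result onto a coordinator decision."""
    if result.get("met"):
        # Contract satisfied: wrap up the campaign.
        return "finish"
    used = result.get("iterations_used")
    limit = result.get("max_iterations")
    if used is not None and limit is not None and used >= limit:
        # Budget exhausted without meeting the contract.
        return "escalate"
    return "continue"


print(next_action({"met": False, "iterations_used": 3, "max_iterations": 10}))
```

A result that lacks iteration counts, such as the one for a campaign without a success contract, falls through to `"continue"` here; a real coordinator may prefer to inspect `reason` first.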

Input Schema

Name    Required    Description    Default
id      Yes         -              -

Output Schema

Name    Required    Description    Default

No arguments

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description fully discloses the tool's behavior: it scans notes for the metric, extracts the value, compares against the threshold, and notes limitations (only most-recent value, future revisions). This exceeds typical transparency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with sections, bullet points, and clear examples. Every sentence adds value, explaining the metric discovery, return fields, and next steps. It is comprehensive without being overly verbose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity (single parameter) and presence of an output schema (though not shown), the description fully covers the return values, decision logic, and context for the coordinator. It leaves no gaps for a standard use case.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has zero description coverage for the single parameter 'id'. The description adds 'Campaign UUID', which is minimally sufficient but clear. It compensates for the schema's lack of documentation.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states that the tool evaluates whether a campaign has met its success contract, with a specific verb 'check' and resource 'success contract'. It distinguishes itself from sibling tools like get_campaign or steer_campaign by focusing on the scalar decision for the coordinator.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit guidance on when to use the tool (to check contract satisfaction before marking done or iterating) and what to do next (call cancel_campaign or steer_campaign if met, escalate if iterations exhausted). It also explains the context for the planner.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/retospect/sortie-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server