Glama

iterate

Run an agent repeatedly, using evaluator feedback from each attempt to correct outputs until a success threshold is reached or time/budget constraints trigger a halt.

Instructions

Iteratively refine an agent's output until an evaluator is satisfied.

Runs the agent up to max_iterations times. Each iteration has the prior attempts' outputs, scores, and issues injected into its prompt, so the agent can correct the specific shortcomings the evaluator called out on the last pass. Halts when any of these becomes true:

  • Evaluator score >= success_threshold (success).

  • patience iterations pass with no improvement to the best score.

  • Total agent cost exceeds max_budget (best-so-far).

  • Wall-time exceeds max_wall_time seconds (best-so-far).

  • max_iterations reached (best-so-far).
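The halting rules above can be sketched as a small loop. This is a minimal illustration under stated assumptions, not the server's implementation: `iterate_sketch`, the `propose` and `evaluate` callables, the order in which the conditions are checked, the 1-based `best_iteration`, and the `halted_because` strings are all hypothetical.

```python
import time
from typing import Callable, List, Optional, Tuple


def iterate_sketch(
    propose: Callable[[List[dict]], Tuple[str, float]],   # hypothetical: prior attempts -> (output, cost_usd)
    evaluate: Callable[[str], Tuple[float, List[str]]],   # hypothetical: output -> (score, issues)
    max_iterations: int = 10,
    success_threshold: float = 0.9,
    patience: int = 3,
    max_budget: Optional[float] = None,
    max_wall_time: Optional[float] = None,
) -> dict:
    """Illustrative re-creation of the documented halting rules."""
    started = time.monotonic()
    attempts: List[dict] = []
    best_iteration, best_score = 0, float("-inf")
    stale, total_cost = 0, 0.0
    halted_because = "max_iterations"  # label is an assumption

    for iteration in range(1, max_iterations + 1):
        # The proposer sees every prior attempt (output, score, issues).
        output, cost = propose(attempts)
        total_cost += cost  # only proposer cost counts toward the budget
        score, issues = evaluate(output)
        attempts.append(
            {"output": output, "score": score, "issues": issues, "cost": cost}
        )

        # Track the best score and how long it has gone without improving.
        if score > best_score:
            best_iteration, best_score, stale = iteration, score, 0
        else:
            stale += 1

        if score >= success_threshold:
            halted_because = "success"
            break
        if stale >= patience:
            halted_because = "patience"
            break
        if max_budget is not None and total_cost > max_budget:
            halted_because = "max_budget"
            break
        if max_wall_time is not None and time.monotonic() - started > max_wall_time:
            halted_because = "max_wall_time"
            break

    return {
        "iterations": len(attempts),
        "halted_because": halted_because,
        "best_iteration": best_iteration,
        "best_score": best_score,
        "total_cost": total_cost,
        "attempts": attempts,
    }
```

Every exit path except success returns the best attempt seen so far, which is why the sketch tracks `best_iteration` separately from the last iteration run.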

Evaluator forms (pass exactly one of target_type or evaluator):

  • target_type="some-type" — shorthand for evaluator="validate:some-type". LLM validator against a registered type. Best for text artifacts whose correctness is a matter of shape and content.

  • evaluator="validate:<type>" — explicit form of the above.

  • evaluator="exec:<shell cmd>" — ground-truth executor. The agent's output is written to a tempfile and the command runs with $ARTIFACT set to the path (and {artifact} substituted in the template). Exit 0 scores 1.0; non-zero scores 0.0 with stderr parsed into issues. Use for artifacts with compile/build/test semantics (docker build, pytest, etc.) — this is the only way to get ground-truth feedback.

  • evaluator="score:<criterion>" — ad-hoc LLM rubric scoring via haiku.
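To make the evaluator forms concrete, here is a rough sketch of how a directive could be resolved and how the exec: form could be scored. It assumes a POSIX shell; `resolve_evaluator` and `run_exec_evaluator` are hypothetical helper names, and splitting stderr into one issue per line is an assumption about what "parsed into issues" means.

```python
import os
import shlex
import subprocess
import tempfile
from typing import List, Optional, Tuple

KNOWN_KINDS = {"validate", "exec", "score"}


def resolve_evaluator(
    target_type: Optional[str] = None, evaluator: Optional[str] = None
) -> Tuple[str, str]:
    """Return (kind, payload); target_type is shorthand for validate:<type>."""
    if evaluator is None:
        if target_type is None:
            raise ValueError("pass target_type or evaluator")
        evaluator = f"validate:{target_type}"
    # Split on the first colon only, so shell commands may contain colons.
    kind, sep, payload = evaluator.partition(":")
    if not sep or kind not in KNOWN_KINDS or not payload:
        raise ValueError(f"unrecognised evaluator directive: {evaluator!r}")
    return kind, payload


def run_exec_evaluator(command_template: str, artifact_text: str) -> Tuple[float, List[str]]:
    """Write the artifact to a tempfile, run the command, score by exit code."""
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as handle:
        handle.write(artifact_text)
        path = handle.name
    try:
        # Support both the {artifact} placeholder and the $ARTIFACT variable.
        command = command_template.replace("{artifact}", shlex.quote(path))
        proc = subprocess.run(
            command,
            shell=True,
            env={**os.environ, "ARTIFACT": path},
            capture_output=True,
            text=True,
        )
        if proc.returncode == 0:
            return 1.0, []
        # Assumption: each non-empty stderr line becomes one issue.
        issues = [line for line in proc.stderr.splitlines() if line.strip()]
        return 0.0, issues
    finally:
        os.unlink(path)
```

For example, `run_exec_evaluator('grep -q hello "$ARTIFACT"', "hello world\n")` scores 1.0 because grep exits 0, while a command that exits non-zero scores 0.0 and returns its stderr lines as issues.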

Args:

  • prompt: Base task description sent to the agent on iteration 1; on subsequent iterations it is augmented with a "Prior attempts" section summarising previous outputs, scores, and issues.

  • target_type: Shorthand for evaluator="validate:<target_type>". Also injects the type as output_type context on the agent prompt.

  • evaluator: Full evaluator directive (validate:, exec:, or score:). Takes precedence over target_type if both given.

  • max_iterations: Hard cap on iteration count (default: 10).

  • success_threshold: Score at or above which we declare success (default: 0.9). Must be in [0.0, 1.0].

  • patience: Halt if patience consecutive iterations fail to improve on the best score so far (default: 3).

  • max_budget: Optional USD cap on total agent (proposer) cost. Evaluator cost is not counted. Halts best-so-far on breach.

  • max_wall_time: Optional wall-time cap in seconds. Halts best-so-far on breach.

  • sandbox: Named sandbox spec or inline JSON for the agent.

  • model: Agent (proposer) model (default: sonnet).

  • timeout: Per-iteration agent timeout in seconds.

  • mcps: JSON array of MCP server names attached to the agent.
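As an illustration of how these arguments fit together, the sketch below assembles and checks an argument dictionary before a call. `build_iterate_args` is a hypothetical client-side helper, the example prompt, command, and server name are made up, and encoding mcps as a JSON string (rather than a native array) is an assumption about the wire format.

```python
import json
from typing import List, Optional


def build_iterate_args(
    prompt: str,
    *,
    target_type: Optional[str] = None,
    evaluator: Optional[str] = None,
    max_iterations: int = 10,
    success_threshold: float = 0.9,
    patience: int = 3,
    max_budget: Optional[float] = None,
    max_wall_time: Optional[float] = None,
    mcps: Optional[List[str]] = None,
) -> dict:
    """Validate the documented constraints and drop unset optional fields."""
    if target_type is None and evaluator is None:
        raise ValueError("pass target_type or evaluator")
    if not 0.0 <= success_threshold <= 1.0:
        raise ValueError("success_threshold must be in [0.0, 1.0]")
    if max_iterations < 1 or patience < 1:
        raise ValueError("max_iterations and patience must be at least 1")

    args = {
        "prompt": prompt,
        "max_iterations": max_iterations,
        "success_threshold": success_threshold,
        "patience": patience,
    }
    optional = {
        "target_type": target_type,
        "evaluator": evaluator,
        "max_budget": max_budget,      # USD, proposer cost only
        "max_wall_time": max_wall_time,  # seconds
    }
    args.update({key: value for key, value in optional.items() if value is not None})
    if mcps is not None:
        # Assumption: the server expects a JSON-encoded array of names.
        args["mcps"] = json.dumps(mcps)
    return args


# Hypothetical usage: ground-truth evaluation of a generated Dockerfile.
example = build_iterate_args(
    "Write a Dockerfile for a small Flask app",
    evaluator="exec:docker build -f $ARTIFACT .",
    max_budget=2.0,
    mcps=["filesystem"],
)
```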

Returns: JSON with run_id, iterations, halted_because, best_iteration, best_ref (validated-stamped), best_score, total_cost, and the full attempts trace with per-iteration ref / score / issues / cost.
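A caller might condense the returned JSON as sketched below. The top-level field names follow the list above; the `attempts` key name, the exact value types, and the `summarize_result` helper are assumptions, so treat this as a starting point rather than a definitive parser.

```python
import json


def summarize_result(raw: str, success_threshold: float = 0.9) -> dict:
    """Condense an iterate result into the fields a caller usually needs."""
    result = json.loads(raw)
    # Locate the best attempt by its ref rather than by index, since the
    # indexing convention of best_iteration is not documented here.
    best_attempt = next(
        (
            attempt
            for attempt in result.get("attempts", [])  # key name is an assumption
            if attempt.get("ref") == result.get("best_ref")
        ),
        None,
    )
    return {
        "run_id": result["run_id"],
        "halted_because": result["halted_because"],
        "met_threshold": result["best_score"] >= success_threshold,
        "best_ref": result["best_ref"],
        "open_issues": best_attempt["issues"] if best_attempt else [],
        "total_cost": result["total_cost"],
    }
```

Comparing best_score against the threshold, instead of matching on a particular halted_because string, keeps the check independent of how the server labels each halt reason.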

Input Schema

Name               Required  Description  Default
prompt             Yes
target_type        No
evaluator          No
max_iterations     No
success_threshold  No
patience           No
max_budget         No
max_wall_time      No
sandbox            No
model              No                     sonnet
timeout            No
mcps               No

Output Schema

Name    Required  Description  Default
result  Yes
Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries full burden. It thoroughly covers the iterative behavior, halting conditions (success threshold, patience, budget, wall time, max iterations), evaluator forms, and how prior attempts are injected. This is comprehensive and transparent.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with sections for general behavior, evaluator forms, and arguments. It is front-loaded with the main purpose. While somewhat lengthy, it earns its sentences given the tool's complexity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Although the tool has 12 parameters (only 1 of them required), the description covers every aspect: behavior, halting conditions, evaluator variants, parameter details, and return value structure (an output schema exists). It is complete and leaves no major gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, meaning the schema lacks parameter descriptions. The description compensates fully with a detailed 'Args' section that explains each parameter, including defaults and behavior. This adds significant meaning beyond the bare schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Iteratively refine an agent's output until an evaluator is satisfied.' It explains the iterative process and distinguishes it from siblings like chain or map by emphasizing evaluator-based refinement.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit guidance on when to use the tool (e.g., for iterative refinement with an evaluator) and describes evaluator forms and halting conditions. However, it does not explicitly state when not to use it or provide alternative tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/stiege/swarm-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.