beam
Run parallel candidate agents, score them with a cheap evaluator, and select the top result. Optimizes for best output under budget constraints.
Instructions
Sample N candidates in parallel, score each, commit to the top-1.
The simplest search combinator: proposes width candidates via par,
scores each with a cheap haiku evaluator, and returns the highest-scoring
ref. Losing candidates are preserved on the winner's search.alternatives
field (unless keep_losers is false). This is self-consistency /
majority-vote with arbitrary scoring — the same shape as the governor
beam search, but applied to arbitrary agent output.
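A minimal sketch of the propose/score/commit loop follows. It assumes a thread pool stands in for par, and that propose and score are hypothetical callables for the candidate agent and the haiku evaluator. Candidate.alternatives plays the role of search.alternatives. None of these names are the tool's actual API.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Candidate:
    ref: str                      # hypothetical handle to one candidate's output
    output: str
    score: float = 0.0
    alternatives: List["Candidate"] = field(default_factory=list)  # stands in for search.alternatives


def beam(
    prompt: str,
    propose: Callable[[str, int], Candidate],  # hypothetical: runs one candidate agent
    score: Callable[[Candidate], float],       # hypothetical: cheap evaluator, returns [0, 1]
    width: int = 3,
    keep_losers: bool = True,
    max_concurrency: Optional[int] = None,
) -> Candidate:
    # Propose `width` candidates in parallel, at most max_concurrency at a time.
    with ThreadPoolExecutor(max_workers=max_concurrency or width) as pool:
        candidates = list(pool.map(lambda i: propose(prompt, i), range(width)))

    # Score every candidate with the cheap evaluator (sequentially, for simplicity).
    for cand in candidates:
        cand.score = score(cand)

    # Commit to the top-1; max() returns the first candidate on ties.
    winner = max(candidates, key=lambda c: c.score)
    if keep_losers:
        winner.alternatives = [c for c in candidates if c is not winner]
    return winner


# Usage with a stand-in proposer and scorer.
def fake_propose(prompt: str, i: int) -> Candidate:
    return Candidate(ref=f"cand-{i}", output=prompt + "!" * i)


winner = beam("hi", fake_propose, lambda c: len(c.output) / 10)
print(winner.ref, [a.ref for a in winner.alternatives])  # cand-2 ['cand-0', 'cand-1']
```

Selecting by max score rather than by counting identical answers is what lets the same shape cover both majority-vote and rubric scoring.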
Evaluator forms:
score:<criterion>: direct haiku call; returns a float in [0, 1] plus a reason string. Use for rubric-style scoring.
validate:<type>: runs the type validator; VALID=1.0, PARTIAL=0.5, INVALID=0.0. Use when the acceptance criterion is a registered type.
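The two evaluator forms can be sketched as a dispatch on the directive prefix. judge (the haiku call) and the validators registry are hypothetical stand-ins. Only the score:/validate: prefixes and the VALID/PARTIAL/INVALID score mapping come from the description above.

```python
from typing import Callable, Dict, Tuple

# Validator verdicts mapped to scores, as described for validate:<type>.
VERDICT_SCORES = {"VALID": 1.0, "PARTIAL": 0.5, "INVALID": 0.0}


def make_evaluator(
    directive: str,
    judge: Callable[[str, str], Tuple[float, str]],  # hypothetical haiku call: (criterion, output) -> (score, reason)
    validators: Dict[str, Callable[[str], str]],     # hypothetical registry: type name -> verdict
) -> Callable[[str], Tuple[float, str]]:
    kind, sep, arg = directive.partition(":")
    if not sep or not arg:
        raise ValueError(f"evaluator must look like score:<criterion> or validate:<type>, got {directive!r}")
    if kind == "score":
        def evaluate(output: str) -> Tuple[float, str]:
            # Rubric-style: the judge returns a float in [0, 1] plus a reason string.
            return judge(arg, output)
        return evaluate
    if kind == "validate":
        if arg not in validators:
            raise ValueError(f"unknown type {arg!r}")
        def evaluate(output: str) -> Tuple[float, str]:
            # Map the validator's verdict onto the fixed score scale.
            verdict = validators[arg](output)
            return VERDICT_SCORES[verdict], verdict
        return evaluate
    raise ValueError(f"evaluator must start with score: or validate:, got {directive!r}")


# Usage with stand-in judge and validator.
validators = {"json": lambda s: "VALID" if s.startswith("{") else "INVALID"}
judge = lambda criterion, out: (0.8, f"meets {criterion}")
print(make_evaluator("validate:json", judge, validators)("{}"))   # (1.0, 'VALID')
print(make_evaluator("score:clarity", judge, validators)("text"))  # (0.8, 'meets clarity')
```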
Budget semantics: budget is a hard cap on total proposer spend. If it is
exceeded, the winner's search stamp records prune_reason="budget exhausted",
but the best result so far is still returned (best-effort rather than abort).
In phase 1, evaluator cost is not counted against the budget, because
evaluators are already constrained to haiku.
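One way to read the best-so-far rule is sketched below. It assumes finished candidates are consumed in completion order and that any not yet consumed when the cap is crossed are dropped. Finished and best_so_far are hypothetical names. Evaluator cost is carried on each record but deliberately left out of the budget sum.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class Finished:
    ref: str
    score: float
    proposer_cost: float   # USD; counts toward the budget
    evaluator_cost: float  # USD; excluded from the budget in phase 1


def best_so_far(
    finished: Iterable[Finished],  # candidates in completion order (assumption)
    budget: Optional[float],
) -> Tuple[Optional[Finished], float, Optional[str]]:
    best: Optional[Finished] = None
    spent = 0.0
    for cand in finished:
        spent += cand.proposer_cost  # evaluator_cost is deliberately not added
        if best is None or cand.score > best.score:
            best = cand
        if budget is not None and spent > budget:
            # Best-effort: stop consuming, but return the best seen so far.
            return best, spent, "budget exhausted"
    return best, spent, None


runs = [
    Finished("a", 0.4, 0.30, 0.01),
    Finished("b", 0.7, 0.30, 0.01),
    Finished("c", 0.9, 0.30, 0.01),
]
best, spent, reason = best_so_far(runs, budget=0.50)
print(best.ref, round(spent, 2), reason)  # b 0.6 budget exhausted
```

In the usage, candidate c would have scored highest but is never considered, because the cap is crossed after b; the call still returns a winner instead of raising.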
Anti-pattern: the Tree Search paper flags evaluator-as-expensive-as-proposer as a non-starter. This combinator hardcodes haiku for scoring — if you need a stronger evaluator, lift that logic into a governor instead.
Args:
prompt: Task prompt sent to every candidate agent.
width: Number of parallel candidates (default: 3).
evaluator: Scoring directive. Must start with score: or validate:.
sandbox: Named sandbox spec or inline JSON for candidate agents.
model: Candidate agent model (default: sonnet — the proposer).
timeout: Per-candidate timeout in seconds.
mcps: JSON array of MCP server names attached to candidates.
keep_losers: Preserve losing candidates on the winner's search stamp
(default: true; useful for inspection and future step-lookahead).
budget: Total USD cap on proposer cost. Best-so-far semantics on breach.
max_concurrency: Upper bound on concurrent candidate agents.
Returns:
JSON with run_id, winner ref (search-stamped), scores, and
total_cost. If all candidates scored 0, error is populated.
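A sketch of assembling the returned JSON follows. It assumes the key names winner and error (the description names these fields but not their exact keys) and a random hex run_id; build_result is hypothetical.

```python
import json
import uuid
from typing import Dict, Optional


def build_result(winner_ref: Optional[str], scores: Dict[str, float], total_cost: float) -> str:
    result = {
        "run_id": uuid.uuid4().hex,  # hypothetical id scheme
        "winner": winner_ref,        # search-stamped winner ref
        "scores": scores,            # candidate ref -> score
        "total_cost": total_cost,
        "error": None,
    }
    # Populate error only when every candidate scored 0.
    if scores and all(s == 0 for s in scores.values()):
        result["error"] = "all candidates scored 0"
    return json.dumps(result)


out = json.loads(build_result("cand-1", {"cand-0": 0.2, "cand-1": 0.9}, 0.42))
print(out["winner"], out["error"])  # cand-1 None
```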
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
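| prompt | Yes | Task prompt sent to every candidate agent. | |
| width | No | Number of parallel candidates. | 3 |
| evaluator | No | Scoring directive; must start with score: or validate:. | score:overall quality, rigour, and correctness |
| sandbox | No | Named sandbox spec or inline JSON for candidate agents. | |
| model | No | Candidate (proposer) agent model. | sonnet |
| timeout | No | Per-candidate timeout in seconds. | |
| mcps | No | JSON array of MCP server names attached to candidates. | |
| keep_losers | No | Preserve losing candidates on the winner's search stamp. | true |
| budget | No | Total USD cap on proposer cost; best-so-far semantics on breach. | |
| max_concurrency | No | Upper bound on concurrent candidate agents. | |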
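The inputs above can be normalized into typed arguments with the documented defaults (width 3, model sonnet, keep_losers true, the score: default evaluator). BeamArgs, parse_args, the width check, and the "github" server name are assumptions for illustration only.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional

DEFAULT_EVALUATOR = "score:overall quality, rigour, and correctness"


@dataclass
class BeamArgs:
    prompt: str
    width: int = 3
    evaluator: str = DEFAULT_EVALUATOR
    sandbox: Optional[str] = None       # named spec or inline JSON, left as a string
    model: str = "sonnet"
    timeout: Optional[float] = None     # seconds, per candidate
    mcps: List[str] = field(default_factory=list)
    keep_losers: bool = True
    budget: Optional[float] = None      # USD cap on proposer cost
    max_concurrency: Optional[int] = None


def parse_args(raw: dict) -> BeamArgs:
    args = dict(raw)
    # mcps arrives as a JSON array string, e.g. '["github"]'.
    if isinstance(args.get("mcps"), str):
        args["mcps"] = json.loads(args["mcps"])
    parsed = BeamArgs(**args)  # unknown keys raise TypeError
    if not parsed.evaluator.startswith(("score:", "validate:")):
        raise ValueError("evaluator must start with score: or validate:")
    if parsed.width < 1:  # assumed sanity check
        raise ValueError("width must be at least 1")
    return parsed


a = parse_args({"prompt": "Summarize the spec", "mcps": '["github"]', "budget": 0.5})
print(a.width, a.model, a.mcps, a.keep_losers)  # 3 sonnet ['github'] True
```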
Output Schema
| Name | Required | Description | Default |
|---|---|---|---|
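| result | Yes | JSON with run_id, winner ref (search-stamped), scores, and total_cost; error is populated if all candidates scored 0. | |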