Glama

Server Configuration

Describes the environment variables required to run the server.

Name | Required | Description | Default

No arguments

Capabilities

Features and capabilities supported by this server

Capability | Details
tools | {"listChanged": false}
prompts | {"listChanged": false}
resources | {"subscribe": false, "listChanged": false}
experimental | {}

Tools

Functions exposed to the LLM to take actions

Name | Description
run

Run a single Claude agent in a Docker container. Returns the agent's text output and metadata.

Args:

  • prompt: The task prompt for the agent.

  • sandbox: Named sandbox spec (from ~/.claude/sandboxes/) or inline JSON. Overrides below are merged on top.

  • network: Whether the container has network access (default: true, needed for API calls).

  • tools: Comma-separated list of allowed Claude tools (default: Read,Write,Glob,Grep,Bash).

  • mounts: JSON array of mount specs: [{"host_path": "...", "container_path": "...", "readonly": true}].

  • model: Claude model to use (default: sonnet). Options: haiku, sonnet, opus.

  • timeout: Max execution time in seconds (default: 120).

  • system_prompt: System prompt injected via --system-prompt (role, persona, instructions).

  • claude_md: Project instructions written to workspace CLAUDE.md.

  • output_schema: JSON schema string for structured output (--json-schema).

  • mcps: JSON array of MCP server names to attach: ["database-mcp", "whatsapp"].

  • effort: Effort level: low, medium, high, max.

  • max_budget: Explicit USD budget cap.

  • env_vars: JSON object of environment variables: {"KEY": "value"}.

  • input_files: JSON object of files to inject: {"/path": "content"}.

  • memory: Docker memory limit (e.g. "2g").

  • cpus: Docker CPU limit (e.g. 2.0).

  • gpu: Pass --gpus all to Docker for GPU access (default: false). Acquires the "gpu" resource pool (capacity 1).

  • resources: JSON array of named resource pools to acquire before execution (e.g. '["gpu", "database"]'). Agents wait for all resources. Configure capacity via SWARM_RESOURCE_= env vars.

  • input_type: Natural language type describing what the agent receives (e.g. "research notes", "[code-review]").

  • output_type: Natural language type describing what the agent must produce (e.g. "[mcp-server] with [test-suite]").
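As a rough illustration of how the JSON-typed arguments of run fit together, the Python sketch below assembles an argument payload. It assumes that structured fields (mounts, env_vars, mcps) are passed as JSON-encoded strings, as their descriptions suggest. The helper name build_run_args and the example paths are hypothetical, and how the payload is actually sent depends on your MCP client.

```python
import json


def build_run_args(prompt, mounts=None, env_vars=None, mcps=None, **overrides):
    """Assemble an argument dict for the `run` tool (hypothetical helper).

    Structured fields are serialised to JSON strings, on the assumption
    that the server expects them in that form.
    """
    args = {"prompt": prompt}
    if mounts is not None:
        args["mounts"] = json.dumps(mounts)
    if env_vars is not None:
        args["env_vars"] = json.dumps(env_vars)
    if mcps is not None:
        args["mcps"] = json.dumps(mcps)
    # Scalar overrides (model, timeout, tools, ...) pass through unchanged.
    args.update(overrides)
    return args


run_args = build_run_args(
    "Summarise the README in /workspace/docs",
    # Example paths only; substitute directories that exist on your host.
    mounts=[{"host_path": "/tmp/docs", "container_path": "/workspace/docs", "readonly": True}],
    env_vars={"LOG_LEVEL": "debug"},
    model="haiku",
    timeout=300,
)
print(json.dumps(run_args, indent=2))
```

Fields left unset are omitted from the payload, so the server-side defaults listed in the argument descriptions would apply.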

par

Run multiple Claude agents in parallel. Each task can have its own config.

Args: tasks: JSON array of task objects. Each supports all sandbox fields (prompt, model, tools, sandbox, system_prompt, claude_md, output_schema, mcps, effort, etc.). max_concurrency: Max agents running simultaneously (default: 5).

map

Apply a prompt template to each input in parallel. Use {input} as the placeholder.

Args:

  • prompt_template: Prompt template with {input} placeholder(s).

  • inputs: JSON array of input strings: ["input1", "input2", ...].

  • sandbox: Named sandbox spec or inline JSON.

  • network: Whether containers have network access (default: true, needed for API calls).

  • tools: Comma-separated list of allowed Claude tools.

  • model: Claude model to use (default: sonnet).

  • timeout: Max execution time per agent in seconds (default: 120).

  • max_concurrency: Max agents running simultaneously (default: 5).

  • system_prompt: System prompt injected via --system-prompt.

  • claude_md: Project instructions written to workspace CLAUDE.md.

  • output_schema: JSON schema string for structured output.

  • mcps: JSON array of MCP server names to attach.

  • effort: Effort level: low, medium, high, max.
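To make the {input} placeholder concrete, here is a minimal local sketch of how one template could expand into one prompt per input. The server's actual substitution mechanism is not documented on this page; plain string replacement is an assumption, and expand_template is a hypothetical name.

```python
def expand_template(prompt_template, inputs):
    """Return one prompt per input by substituting every {input} placeholder."""
    # str.replace is used instead of str.format so that other braces in the
    # template (for example JSON snippets) are left untouched.
    return [prompt_template.replace("{input}", item) for item in inputs]


prompts = expand_template(
    "Review {input} and list the three riskiest functions in {input}.",
    ["auth.py", "billing.py"],
)
for p in prompts:
    print(p)
```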

chain

Run agents sequentially as a pipeline. Each stage receives the prior stage's output as context.

Args: stages: JSON array of stage objects. Each supports all sandbox fields (prompt, model, tools, sandbox, system_prompt, etc.).
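The pipeline behaviour of chain can be pictured with a small local simulation. This is only a sketch: the way the server injects the prior stage's output is not specified here, so appending it to the prompt is an assumption, and fake_agent stands in for a real agent call.

```python
def run_chain(stages, agent):
    """Run stages in order, feeding each stage the previous stage's output."""
    context = ""
    for stage in stages:
        prompt = stage["prompt"]
        if context:
            # Assumption: prior output is appended to the prompt as context.
            prompt = f"{prompt}\n\nPrior stage output:\n{context}"
        context = agent(prompt)
    return context


def fake_agent(prompt):
    # Stand-in for a real agent call: reports how many lines it was given.
    return f"saw {len(prompt.splitlines())} line(s)"


final = run_chain(
    [{"prompt": "Draft an outline"}, {"prompt": "Expand the outline"}],
    fake_agent,
)
print(final)
```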

reduce

Synthesise multiple results into one. Accepts plain strings or structured AgentResult objects (auto-extracts .text fields), so you can pipe par/map output directly without manual unwrapping.

Args:

  • results: JSON array of either plain strings ["text1", "text2"] or AgentResult objects [{"text": "...", ...}].

  • synthesis_prompt: Instructions for how to synthesise the results.

  • sandbox: Named sandbox spec or inline JSON.

  • network: Whether the container has network access (default: true, needed for API calls).

  • tools: Comma-separated list of allowed Claude tools.

  • model: Claude model to use (default: sonnet).

  • timeout: Max execution time in seconds (default: 120).

  • mcps: JSON array of MCP server names to attach to the reducer agent.

  • system_prompt: System prompt for the reducer agent.
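The two accepted shapes of results can be normalised along the following lines. This is a sketch of the auto-extraction idea, not the server's code; extract_texts is a hypothetical name, and the "cost" key in the example object is made up for illustration.

```python
import json


def extract_texts(results_json):
    """Normalise a `results` array into a list of plain strings."""
    texts = []
    for item in json.loads(results_json):
        if isinstance(item, str):
            texts.append(item)  # already plain text
        elif isinstance(item, dict) and "text" in item:
            texts.append(item["text"])  # AgentResult-like object
        else:
            raise ValueError(f"unsupported result entry: {item!r}")
    return texts


# "cost" is an illustrative extra field, not a documented one.
mixed = json.dumps(["first finding", {"text": "second finding", "cost": 0.01}])
texts = extract_texts(mixed)
print(texts)
```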

map_reduce

Map a prompt over inputs in parallel, then reduce results into one — all in a single call. Fan-out then synthesise: map produces N results, reduce consumes them, no manual plumbing.

Args:

  • prompt_template: Prompt template with {input} placeholder(s).

  • inputs: JSON array of input strings: ["input1", "input2", ...].

  • synthesis_prompt: Instructions for how to synthesise the map results.

  • sandbox: Named sandbox spec or inline JSON (used for map agents).

  • network: Whether containers have network access (default: true, needed for API calls).

  • tools: Comma-separated list of allowed Claude tools for map agents.

  • model: Claude model for map agents (default: sonnet).

  • reduce_model: Claude model for the reduce agent (default: same as model).

  • timeout: Max execution time per agent in seconds (default: 120).

  • max_concurrency: Max map agents running simultaneously (default: 5).

  • system_prompt: System prompt for map agents.

  • reduce_system_prompt: System prompt for the reduce agent.

  • output_schema: JSON schema for structured reduce output.

  • mcps: JSON array of MCP server names for map agents.

  • effort: Effort level for map agents.

unwrap

Unwrap an agent result ref — writes the full text to a file and returns the path.

All combinators return refs (metadata without text). Use unwrap to extract the text when you need it. The text is written to a .md file alongside the result, so you can Read() it, Grep it, or pass it to other tools without bloating the MCP protocol.

Args: ref: A ref string like "run_id/agent_id", or a JSON object with a "ref" field.
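Since refs may arrive either as a "run_id/agent_id" string or as a JSON object with a "ref" field, a caller-side helper can normalise both forms before use. The sketch below is an assumption about how one might do that; normalise_ref is a hypothetical name, and splitting on the first slash is a guess at the ref grammar.

```python
import json


def normalise_ref(ref):
    """Return (run_id, agent_id) from a ref string or an object with a "ref" field."""
    if isinstance(ref, str) and ref.lstrip().startswith("{"):
        ref = json.loads(ref)  # JSON-encoded object form
    if isinstance(ref, dict):
        ref = ref["ref"]
    # Assumption: the first slash separates run_id from agent_id.
    run_id, sep, agent_id = ref.partition("/")
    if not sep or not run_id or not agent_id:
        raise ValueError(f"expected 'run_id/agent_id', got {ref!r}")
    return run_id, agent_id


print(normalise_ref("abc123/agent-0"))
print(normalise_ref({"ref": "abc123/agent-1"}))
print(normalise_ref('{"ref": "abc123/agent-2"}'))
```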

inspect

Inspect an agent's full execution state — partial output, stream log, files produced.

Use after a timeout, crash, or unexpected result to understand what happened. Writes a human-readable debug report to output_dir/inspect.md.

Args: ref: A ref string like "run_id/agent_id".

filter

Filter refs by type validation — keep only results that match the declared type.

Runs validate on each ref in parallel. Returns only refs with VALID verdict. This is the type-gated composition primitive: ensures only correct results flow downstream.

Args: refs: JSON array of ref objects: [{"ref": "run_id/agent_id"}, ...]. declared_type: Type name or description to validate against. model: Model for the validator agents (default: sonnet). timeout: Timeout per validation (default: 120).

race

Run multiple approaches in parallel, return the first to succeed.

All tasks start simultaneously. As soon as one completes without error, its ref is returned. Remaining tasks are abandoned (their containers are killed). Use for speculative execution or when multiple strategies might work.

Args: tasks: JSON array of task objects (same format as par). max_concurrency: Max agents running simultaneously (default: 5).
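The first-success semantics of race resemble a familiar asyncio pattern, sketched below with coroutines standing in for containerised agents. This is an analogy under stated assumptions, not the server's implementation: task cancellation stands in for killing containers, and the strategy coroutine is invented for the demonstration.

```python
import asyncio


async def race(coros):
    """Return the first successful result and cancel the remaining tasks."""
    pending = {asyncio.create_task(c) for c in coros}
    try:
        while pending:
            done, pending = await asyncio.wait(pending, return_when=asyncio.FIRST_COMPLETED)
            # Inspect every finished task so failures are acknowledged, not lost.
            successes = [t for t in done if t.exception() is None]
            if successes:
                return successes[0].result()
        raise RuntimeError("every task failed")
    finally:
        for task in pending:
            task.cancel()  # stands in for killing the losing containers


async def strategy(name, delay, fail=False):
    # Hypothetical approach that takes `delay` seconds and may fail.
    await asyncio.sleep(delay)
    if fail:
        raise RuntimeError(f"{name} failed")
    return name


winner = asyncio.run(
    race([
        strategy("fast-but-broken", 0.01, fail=True),
        strategy("steady", 0.05),
        strategy("slow", 0.5),
    ])
)
print(winner)
```

A task that finishes first but fails does not win; the loop keeps waiting until some task completes without error.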

retry

Run a single agent with automatic retries on failure.

If declared_type is set, retries until the output validates as that type (not just until exit code 0). Each attempt receives the prior error as context.

Args:

  • prompt: The task prompt.

  • max_attempts: Maximum number of attempts (default: 3).

  • sandbox: Named sandbox spec or inline JSON.

  • model: Claude model (default: sonnet).

  • timeout: Timeout per attempt (default: 120).

  • declared_type: If set, validates output and retries if not VALID.

  • mcps: JSON array of MCP server names to attach.
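The retry-with-error-context idea can be sketched locally as follows. The exact wording the server uses to pass the prior error into the next attempt is not documented here, so the "Previous attempt failed with" phrasing is an assumption, and flaky_agent is a stand-in for a real agent.

```python
def retry(prompt, agent, max_attempts=3):
    """Call `agent` until it succeeds, passing the prior error as context."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        attempt_prompt = prompt
        if last_error is not None:
            # Assumption: the prior error is appended to the prompt.
            attempt_prompt = f"{prompt}\n\nPrevious attempt failed with: {last_error}"
        try:
            return attempt, agent(attempt_prompt)
        except Exception as exc:
            last_error = str(exc)
    raise RuntimeError(f"gave up after {max_attempts} attempts: {last_error}")


def flaky_agent(prompt):
    # Stand-in agent: fails until it is shown the previous error.
    if "Previous attempt failed" not in prompt:
        raise ValueError("missing closing brace")
    return "fixed output"


attempts_used, output = retry("Emit valid JSON", flaky_agent)
print(attempts_used, output)
```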

beam

Sample N candidates in parallel, score each, commit to the top-1.

The simplest search combinator: proposes width candidates via par, scores each with a cheap haiku evaluator, and returns the highest-scoring ref. Losing candidates are preserved on the winner's search.alternatives field (unless keep_losers is false). This is self-consistency / majority-vote with arbitrary scoring — the same shape as the governor beam search, but applied to arbitrary agent output.

Evaluator forms:

  • score:<criterion> — direct haiku call, returns a float in [0, 1] plus a reason string. Use for rubric-style scoring.

  • validate:<type> — runs the type validator; VALID=1.0, PARTIAL=0.5, INVALID=0.0. Use when the acceptance criterion is a registered type.

Budget semantics: a hard cap on total proposer spend. If exceeded, the winner's search stamp records prune_reason="budget exhausted" but the result is still returned — best-effort rather than abort. Evaluator cost is not counted against budget for phase 1; evaluators are already constrained to haiku.

Anti-pattern: the Tree Search paper flags evaluator-as-expensive-as-proposer as a non-starter. This combinator hardcodes haiku for scoring — if you need a stronger evaluator, lift that logic into a governor instead.

Args:

  • prompt: Task prompt sent to every candidate agent.

  • width: Number of parallel candidates (default: 3).

  • evaluator: Scoring directive. Must start with score: or validate:.

  • sandbox: Named sandbox spec or inline JSON for candidate agents.

  • model: Candidate agent model (default: sonnet, the proposer).

  • timeout: Per-candidate timeout in seconds.

  • mcps: JSON array of MCP server names attached to candidates.

  • keep_losers: Preserve losing candidates on winner search stamp (default: true, useful for inspection and future step-lookahead).

  • budget: Total USD cap on proposer cost. Best-so-far semantics on breach.

  • max_concurrency: Upper bound on concurrent candidate agents.

Returns: JSON with run_id, winner ref (search-stamped), scores, and total_cost. If all candidates scored 0, error is populated.
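The validate: scoring rule and top-1 selection described for beam can be sketched as below. The verdict-to-score mapping comes from the description above; the flat result layout, the tie-breaking by original order, and the name pick_winner are assumptions made for illustration.

```python
VERDICT_SCORES = {"VALID": 1.0, "PARTIAL": 0.5, "INVALID": 0.0}


def pick_winner(candidates, keep_losers=True):
    """Choose the top-scoring candidate from (ref, verdict) pairs."""
    scored = [{"ref": ref, "score": VERDICT_SCORES[verdict]} for ref, verdict in candidates]
    # sorted() is stable, so ties keep their original order (an assumption).
    ranked = sorted(scored, key=lambda c: c["score"], reverse=True)
    winner = dict(ranked[0])
    if keep_losers:
        winner["alternatives"] = ranked[1:]
    if winner["score"] == 0:
        winner["error"] = "all candidates scored 0"
    return winner


result = pick_winner([("r1/a0", "PARTIAL"), ("r1/a1", "VALID"), ("r1/a2", "INVALID")])
print(result["ref"], [c["ref"] for c in result["alternatives"]])
```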

iterate

Iteratively refine an agent's output until an evaluator is satisfied.

Runs the agent up to max_iterations times. Each iteration sees the prior attempts' outputs, scores, and issues injected into its prompt, so the agent can correct the specific shortcomings the evaluator called out on the last pass. Halts when any of these become true:

  • Evaluator score >= success_threshold (success).

  • patience iterations pass with no improvement to the best score.

  • Total agent cost exceeds max_budget (best-so-far).

  • Wall-time exceeds max_wall_time seconds (best-so-far).

  • max_iterations reached (best-so-far).

Evaluator forms (pass exactly one of target_type or evaluator):

  • target_type="some-type" — shorthand for evaluator="validate:some-type". LLM validator against a registered type. Best for text artifacts whose correctness is a matter of shape and content.

  • evaluator="validate:<type>" — explicit form of the above.

  • evaluator="exec:<shell cmd>" — ground-truth executor. The agent's output is written to a tempfile and the command runs with $ARTIFACT set to the path (and {artifact} substituted in the template). Exit 0 scores 1.0; non-zero scores 0.0 with stderr parsed into issues. Use for artifacts with compile/build/test semantics (docker build, pytest, etc.) — this is the only way to get ground-truth feedback.

  • evaluator="score:<criterion>" — ad-hoc LLM rubric scoring via haiku.

Args:

  • prompt: Base task description sent to the agent on iteration 1; on subsequent iterations it is augmented with a "Prior attempts" section summarising previous outputs, scores, and issues.

  • target_type: Shorthand for evaluator="validate:<target_type>". Also injects the type as output_type context on the agent prompt.

  • evaluator: Full evaluator directive (validate:, exec:, or score:). Takes precedence over target_type if both given.

  • max_iterations: Hard cap on iteration count (default: 10).

  • success_threshold: Score at or above which we declare success (default: 0.9). Must be in [0.0, 1.0].

  • patience: Halt if patience consecutive iterations fail to improve on the best score so far (default: 3).

  • max_budget: Optional USD cap on total agent (proposer) cost. Evaluator cost is not counted. Halts best-so-far on breach.

  • max_wall_time: Optional wall-time cap in seconds. Halts best-so-far on breach.

  • sandbox: Named sandbox spec or inline JSON for the agent.

  • model: Agent (proposer) model (default: sonnet).

  • timeout: Per-iteration agent timeout in seconds.

  • mcps: JSON array of MCP server names attached to the agent.

Returns: JSON with run_id, iterations, halted_because, best_iteration, best_ref (validated-stamped), best_score, total_cost, and the full attempts trace with per-iteration ref / score / issues / cost.
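The halting rules of iterate can be traced with a small simulation in which a scoring callback replaces the agent and evaluator. This sketch covers only the success, patience, and max_iterations conditions (budget and wall-time caps are omitted), and the halted_because strings are illustrative labels rather than the values the server emits.

```python
def iterate(evaluate, max_iterations=10, success_threshold=0.9, patience=3):
    """Simulate the refinement loop; `evaluate(i)` returns iteration i's score."""
    best_score, best_iteration, stale = float("-inf"), 0, 0
    for i in range(1, max_iterations + 1):
        score = evaluate(i)
        if score > best_score:
            best_score, best_iteration, stale = score, i, 0
        else:
            stale += 1  # no improvement on the best score so far
        if score >= success_threshold:
            reason = "success"
        elif stale >= patience:
            reason = "patience"
        elif i == max_iterations:
            reason = "max_iterations"
        else:
            continue
        return {
            "halted_because": reason,  # label strings are assumptions
            "iterations": i,
            "best_iteration": best_iteration,
            "best_score": best_score,
        }


scores = [0.4, 0.6, 0.6, 0.5, 0.55]
plateau = iterate(lambda i: scores[i - 1])
quick = iterate(lambda i: i / 4)
print(plateau)
print(quick)
```

In the first run the best score is reached on iteration 2 and three further iterations fail to beat it, so the loop stops on patience and reports the best-so-far result.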

score_list

Score and rank a list of existing refs against an evaluator.

This is the missing piece for "generate N candidates, then pick the best" pipelines where the N candidates already exist as refs from an upstream combinator (map, par, etc.) — beam is the wrong shape because it fans out width-N proposers of the same prompt, whereas score_list takes N different outputs and ranks them.

The key infrastructure point: refs are resolved to their artifact text server-side, so callers do not need to pipe full artifact text through tool parameters. This clears the ~4KB param-size wall you would otherwise hit scoring 6+ medium-length artifacts through map.

Evaluator forms are the same as iterate / beam:

  • validate:<type> — LLM validator against a registered type

  • score:<criterion> — ad-hoc haiku rubric

  • exec:<shell-cmd> — ground-truth shell command (exit code scoring)

Each ref is tagged with a search stamp carrying its score and beam_rank. Refs outside the top-k are marked pruned=True with a reason pointing at the beam cut. top_k=0 means return all without pruning.

Args: refs: JSON array of refs — either ["run_id/agent_id", ...] or [{"ref": "run_id/agent_id"}, ...]. Both forms are accepted. evaluator: Scoring directive (validate:<type>, score:<criterion>, or exec:<cmd>). top_k: How many top-scoring refs to surface as winners. 0 means rank-only, no pruning stamp applied. max_concurrency: Upper bound on parallel evaluator calls (default: 5).

Returns: JSON with run_id, evaluator, total, top_k, winners (list of winning ref strings), and ranked (the full per-ref trace: rank, ref, score, verdict, issues, reason).
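The ranking and pruning behaviour of score_list, including the top_k=0 case, can be sketched as follows. The field names mirror those mentioned above, but the 1-based beam_rank and the function name rank_refs are assumptions.

```python
def rank_refs(scored, top_k=0):
    """Rank (ref, score) pairs and stamp each with beam_rank and pruned."""
    ordered = sorted(scored, key=lambda pair: pair[1], reverse=True)
    ranked = []
    for rank, (ref, score) in enumerate(ordered, start=1):
        # top_k=0 means rank-only: nothing is pruned.
        pruned = top_k > 0 and rank > top_k
        ranked.append({"ref": ref, "score": score, "beam_rank": rank, "pruned": pruned})
    winners = [r["ref"] for r in ranked if not r["pruned"]]
    return {"total": len(ranked), "top_k": top_k, "winners": winners, "ranked": ranked}


scored = [("r/a", 0.2), ("r/b", 0.9), ("r/c", 0.5)]
top_two = rank_refs(scored, top_k=2)
everything = rank_refs(scored, top_k=0)
print(top_two["winners"], everything["winners"])
```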

hunt

Run multiple (fetch → map → score) hunt strategies in parallel and merge.

Each strategy is an independent (fetch, plan, score) triple with its own seed prompt or explicit seeds, plan template, and rubric. Strategies run concurrently; a failure in one does not halt the others. All scored refs across all strategies are merged into a single globally-ranked leaderboard so you can see which strategy produced the highest-yield finding.

This is the compound combinator that makes "run all the hunt framings at once" tractable — rather than sequentially trying one hunt shape at a time, you parallelise across framings (problem-driven, gap-in-field, broken-claims, tool-landscape, connection) and let the scoring sort them.

Strategy dict fields:

  • name (required) — short label used for tagging and per-strategy reporting in the leaderboard.

  • seeds (optional) — pre-supplied list of seed strings (each becomes a map input). Exactly one of seeds or fetcher_prompt must be set.

  • fetcher_prompt (optional) — prompt sent to a single run call that must emit a JSON array of seed strings (or a JSON object with a seeds key). The agent runs with network enabled.

  • planner_template (required) — prompt template for the map step. Use {input} as the placeholder for each seed.

  • rubric (required) — evaluator directive for the score step. Any form accepted by _evaluate_node: validate:<type>, score:<criterion>, or exec:<cmd>.

  • top_k (optional, default 3) — how many winners this strategy contributes to the unified leaderboard.

  • model_planner (optional, default "haiku") — model for the map step.

  • model_fetcher (optional, default "haiku") — model for the fetch step.

  • timeout (optional) — per-agent timeout in seconds.

Args: strategies: JSON array of strategy dicts (or a Python list). max_parallel_strategies: Upper bound on strategies running at once (default: 5). Each strategy internally parallelises its map step. top_k_global: Size of the unified leaderboard across all strategies (default: 10).

Returns: JSON with run_id, strategies (per-strategy summary including name, error-if-any, total_cost, winner_ref, best_score), and leaderboard (globally-ranked list tagged by strategy). The full per-ref scoring trace lives at strategy_details[i].ranked.
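The strategy dict rules above (required fields, exactly one of seeds or fetcher_prompt, the {input} placeholder, and the rubric prefixes) lend themselves to a pre-flight check before calling hunt. The checker below is a hypothetical client-side helper, and the example strategies are invented; the server may validate differently.

```python
REQUIRED_FIELDS = ("name", "planner_template", "rubric")
RUBRIC_PREFIXES = ("validate:", "score:", "exec:")


def check_strategy(strategy):
    """Return a list of problems found in a hunt strategy dict (empty if none)."""
    problems = [f"missing required field: {f}" for f in REQUIRED_FIELDS if not strategy.get(f)]
    if ("seeds" in strategy) == ("fetcher_prompt" in strategy):
        problems.append("exactly one of seeds or fetcher_prompt must be set")
    template = strategy.get("planner_template", "")
    if template and "{input}" not in template:
        problems.append("planner_template has no {input} placeholder")
    rubric = strategy.get("rubric", "")
    if rubric and not rubric.startswith(RUBRIC_PREFIXES):
        problems.append("rubric must start with validate:, score:, or exec:")
    return problems


good = {
    "name": "broken-claims",
    "seeds": ["claim A", "claim B"],
    "planner_template": "Investigate the claim: {input}",
    "rubric": "score:novelty and verifiability",
    "top_k": 3,
}
bad = {
    "name": "gap-in-field",
    "seeds": ["topic"],
    "fetcher_prompt": "List open problems",
    "planner_template": "Plan a survey",
    "rubric": "rank them",
}
print(check_strategy(good))
print(check_strategy(bad))
```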

guard

Check a monadic condition on a ref. Returns the ref if the guard passes, error if not.

Use to enforce constraints before passing refs to downstream combinators.

Args: ref: A ref string or JSON object. check: The guard to check — one of: "validated", "budget", "classification", "encrypted", "exists". value: Required for some checks — e.g. the type name for "validated", the classification level for "classification".

classify

Set the classification level on a ref. Controls which MCPs can access the data.

Use for data sensitivity enforcement — e.g. mark original legal documents as 'confidential' (no WhatsApp MCP), mark synthetic outputs as 'public'.

Args: ref: A ref string or JSON object. level: Classification level: public, internal, confidential, restricted. allowed_mcps: JSON array of MCP names allowed to access this ref. denied_mcps: JSON array of MCP names denied access.

encrypt

Encrypt a ref's text payload. Returns the ref with a key_id — only callers with the key can decrypt.

The ref metadata (provenance, classification, etc.) stays visible; only the text content is encrypted. Pass the key_id to specific agents or features that should be able to read the content.

Args: ref: A ref string like "run_id/agent_id", or a JSON object with a "ref" field.

decrypt

Decrypt an encrypted ref's text. Writes the plaintext to output.md and returns the path.

You need the key_id that was returned when the ref was encrypted.

Args: ref: A ref string like "run_id/agent_id", or a JSON object with a "ref" field. key_id: The key ID returned by the encrypt tool.

save_governor_spec

Register an LLM-governed governor for use in pipeline control flow.

Governors are evaluated at trigger points (on_fail, on_success) to decide the continuation. The spec is a natural language description of what the governor should decide. The governor LLM returns one of:

  • next — proceed normally

  • jump(target) — jump to a named step

  • halt — stop the pipeline cleanly

  • broken(reason) — stop and write broken status (visible via pipeline_status)

  • patch_pipeline — deep-merge patch the pipeline definition and continue

Each continuation also carries a free-form context dict that accumulates across the pipeline and is written to /shared/governor-context.json, plus a confidence score in [0.0, 1.0].

Reference a governor in a pipeline step:

"on_fail": {"governor": "Failure"}
"on_success": {"governor": "Validation"}

Args: name: Unique governor name used to reference it from pipeline steps. spec: Natural language description telling the LLM what to decide. description: One-line summary shown in list_governor_specs. model: Claude model for evaluation (default: haiku). beam_width: Self-consistency beam width. When >1 the harness samples the governor N times in parallel and commits to the confidence-weighted majority decision. Losing candidates are preserved on the winner's alternatives field for inspection. Default 1 (no beam).
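One plausible reading of "confidence-weighted majority" for beam_width > 1 is to sum the confidence of each sampled decision and commit to the largest total. The sketch below implements that reading only; the harness's actual aggregation rule is not specified on this page, and the sample dict layout is an assumption.

```python
from collections import defaultdict


def weighted_majority(samples):
    """Pick the decision with the highest summed confidence."""
    totals = defaultdict(float)
    for sample in samples:
        totals[sample["decision"]] += sample["confidence"]
    decision = max(totals, key=totals.get)
    # Samples that voted for another decision are kept for inspection.
    alternatives = [s for s in samples if s["decision"] != decision]
    return {"decision": decision, "alternatives": alternatives}


samples = [
    {"decision": "halt", "confidence": 0.9},
    {"decision": "next", "confidence": 0.5},
    {"decision": "next", "confidence": 0.6},
]
outcome = weighted_majority(samples)
print(outcome["decision"])
```

Here two moderately confident "next" votes outweigh a single highly confident "halt" vote, which is the kind of case where weighting and simple top-confidence selection differ.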

list_governor_specs

List all registered LLM-governed governors.

Returns each governor's name, description, model, and a preview of its spec.

pipeline

Launch a pipeline in the background and return immediately.

The pipeline runs asynchronously in a daemon thread. Use pipeline_status(run_id) to poll progress, and pipeline_kill(run_id) to stop it.

The definition is a JSON object or a pipeline name (loaded from registered project pipelines/ directories or ~/.claude/pipelines/).

Pipeline format:

{
  "name": "optional-name",
  "sandbox": "optional-default-sandbox",
  "steps": [
    {"id": "step-0", "prompt": "...", "model": "sonnet", "sandbox": "...", ...},
    {"id": "test", "prompt": "Run tests", "tools": "Bash", "on_fail": "fix"},
    {"id": "fix", "prompt": "Fix failing tests", "tools": "Read,Edit,Bash", "condition": "prev.error", "next": "test", "max_retries": 3}
  ]
}

Step fields: prompt (required), plus any sandbox fields (model, tools, system_prompt, etc.).

Control flow:

  • on_fail: step id to jump to on error, or {"governor": "name"} for LLM-governed recovery (see save_governor_spec).

  • on_success: {"governor": "name"} for LLM-governed continuation.

  • next: step to jump to after success.

  • condition: "prev.error" means the step only runs if the previous step failed.

  • max_retries.

  • retry_if: {target_step: keyword}, which jumps to target_step if the output contains keyword.

Any unhandled failure terminates the pipeline with status="broken".

Args: definition: Pipeline name (loaded from ~/.claude/pipelines/.json) or inline JSON definition. resume: Resume a previous run. Format: "run_id" or "run_id/step_id". Reuses the shared directory from the previous run. If step_id is given, skips to that step. If only run_id, resumes from the step that failed.
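Before launching an inline definition, it can help to confirm that every on_fail and next target names an existing step. The check below is a hypothetical client-side sketch built on the pipeline format shown above; it assumes every step carries an "id", and the "build" step is invented for the example.

```python
import json


def dangling_targets(definition):
    """Return (step_id, field, target) triples whose target step does not exist."""
    step_ids = {step.get("id") for step in definition["steps"]}
    missing = []
    for step in definition["steps"]:
        for field in ("on_fail", "next"):
            target = step.get(field)
            # on_fail may also be {"governor": "name"}; only strings are step ids.
            if isinstance(target, str) and target not in step_ids:
                missing.append((step.get("id"), field, target))
    return missing


pipeline_def = {
    "name": "test-and-fix",
    "steps": [
        {"id": "build", "prompt": "Implement the feature", "model": "sonnet"},
        {"id": "test", "prompt": "Run tests", "tools": "Bash", "on_fail": "fix"},
        {"id": "fix", "prompt": "Fix failing tests", "tools": "Read,Edit,Bash",
         "condition": "prev.error", "next": "test", "max_retries": 3},
    ],
}
print(dangling_targets(pipeline_def))
definition_arg = json.dumps(pipeline_def)  # inline JSON form of `definition`
```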

pipeline_status

Return the current status of a running or completed pipeline.

Reads /tmp/swarm-mcp/<run_id>/pipeline-status.json and returns its contents. The status file is written after each step completes.

Args: run_id: The pipeline run ID returned by the pipeline() tool.
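Because the status lives in a plain JSON file, it can also be read directly from the host. The sketch below follows the path layout given above and is demonstrated against a temporary directory instead of a live run; read_status is a hypothetical helper, and the file contents written in the demo are illustrative rather than the real schema.

```python
import json
import tempfile
from pathlib import Path


def read_status(run_id, base_dir="/tmp/swarm-mcp"):
    """Load pipeline-status.json for a run, or return None if not written yet."""
    status_file = Path(base_dir) / run_id / "pipeline-status.json"
    if not status_file.exists():
        return None
    return json.loads(status_file.read_text())


# Demonstration against a temporary directory rather than a live run.
with tempfile.TemporaryDirectory() as tmp:
    run_dir = Path(tmp) / "demo-run"
    run_dir.mkdir()
    # Illustrative contents only; the real file's schema is not documented here.
    (run_dir / "pipeline-status.json").write_text(json.dumps({"status": "running"}))
    found = read_status("demo-run", base_dir=tmp)
    absent = read_status("no-such-run", base_dir=tmp)

print(found, absent)
```

Since the file is written after each step completes, a None result early in a run may simply mean that the first step has not finished yet.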

pipeline_artifacts

List artifacts produced by a pipeline run.

Without step_id: lists the /shared/ directory contents (inter-step files) plus a summary of each step's output directory.

With step_id: lists that specific step's output directory in detail, including file sizes. Use unwrap(ref) or Read() to view file contents.

Args: run_id: The pipeline run ID. step_id: Optional step ID to inspect. If omitted, lists shared/ and all steps.

pipeline_kill

Kill a running pipeline and all its Docker containers.

Sets the pipeline's stop event (so the loop exits cleanly after the current step) and immediately kills all Docker containers associated with the run.

Args: run_id: The pipeline run ID to kill.

list_pipelines

List recent pipeline runs and their current status.

Scans /tmp/swarm-mcp/ for pipeline-status.json files and returns a summary of all known runs, sorted by last_updated descending. Also annotates which runs have live threads in the current process.

save_sandbox_spec

Save a reusable sandbox spec to ~/.claude/sandboxes/.json.

Args: name: Name for the sandbox spec (e.g. "web-researcher", "code-reviewer"). spec: JSON object with sandbox fields: model, tools, mcps, system_prompt, claude_md, output_schema, effort, max_budget, mounts, workdir, input_files, network, memory, cpus, timeout, env_vars.

list_sandbox_specs

List all saved sandbox specs from ~/.claude/sandboxes/.

wrap

Wrap a file or directory into the swarm ref system.

This is how you bring external objects INTO the monadic context. The wrapped file gets a ref that can be passed to any combinator.

Args: path: Absolute path to a file or directory on the host.

wrap_project

Register a project directory's pipelines, sandboxes, and types with the swarm.

Looks for pipelines/, sandboxes/, types/ subdirectories and adds them to the search paths. After wrapping, named resources from the project are discoverable by all swarm tools (pipeline, run, validate, etc.).

Args: project_dir: Absolute path to a project root containing pipelines/, sandboxes/, and/or types/ directories.

list_type_registry

List all registered types from ~/.claude/types/.

get_type_definition

Get a type definition by name, optionally resolving [references] to other types.

Args: name: Type name (e.g. "mcp-server", "tarball", "code-review"). resolve_refs: Whether to inline [referenced] types (default: true).

validate

Validate an artifact against a declared type. Runs a type-checker agent that inspects the artifact and reports VALID/PARTIAL/INVALID with per-criterion results.

Use this after a pipeline step to verify the output matches expectations. If validation fails, you know which agent to blame and can retry.

Args: artifact: Description of what to validate — e.g. the agent's output text, a file path, or a ref {"ref": "run_id/agent_id"}. declared_type: The type to validate against — either a type name (e.g. "mcp-server") or inline natural language description. sandbox: Named sandbox spec or inline JSON for the validator agent. model: Model for the validator (default: sonnet — needs to be good at analysis). timeout: Timeout for the validation agent.

Prompts

Interactive templates invoked by user choice

Name | Description

No prompts

Resources

Contextual data attached and managed by the client

Name | Description

No resources

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/stiege/swarm-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.