FleetQ


Experiment Manage Tool

experiment_manage

Destructive

Create, start, pause, resume, retry, or kill experiments. Validate state transitions and view costs, steps, or share public tokens.

Instructions

Experiments are the platform's core unit of work. Each experiment runs a workflow (DAG) through a 20-state machine, whose main path is: Draft → Scoring → Planning → Building → AwaitingApproval → Approved → Executing → CollectingMetrics → Evaluating → (Iterating | Completed). Lifecycle transitions are validated by ExperimentTransitionMap; use valid_transitions to discover which transitions are currently allowed.
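
The main path named above can be sketched as a small lookup table. This is an illustration only: the real ExperimentTransitionMap is not shown on this page, the description mentions 20 states while naming only eleven, and `HAPPY_PATH`, `next_states`, and `is_allowed` are hypothetical names. Pause, kill, and failure transitions are left out because the description does not say from which states they are allowed.

```python
# Hypothetical sketch of the main-path transitions named in the description.
# The real ExperimentTransitionMap has more states and edges than shown here.
HAPPY_PATH = {
    "Draft": ["Scoring"],
    "Scoring": ["Planning"],
    "Planning": ["Building"],
    "Building": ["AwaitingApproval"],
    "AwaitingApproval": ["Approved"],
    "Approved": ["Executing"],
    "Executing": ["CollectingMetrics"],
    "CollectingMetrics": ["Evaluating"],
    "Evaluating": ["Iterating", "Completed"],
}


def next_states(state: str) -> list[str]:
    """Return the main-path successors of `state` (empty if none are listed)."""
    return HAPPY_PATH.get(state, [])


def is_allowed(current: str, target: str) -> bool:
    """Check a transition against this sketch table only."""
    return target in next_states(current)
```

A local table like this can go stale, so an agent would normally call valid_transitions for the authoritative answer and use a sketch like this only for reasoning about the overall flow.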

Actions:

list (read) — optional: status, workflow_id, limit.
get (read) — experiment_id.
create (write) — name, hypothesis; optional workflow_id (else uses default workflow).
start (write) — experiment_id. Transitions Draft → Scoring; reserves budget.
pause / resume (write) — experiment_id. Pause holds at the current stage.
retry (write) — experiment_id. Re-runs the failed stage.
retry_from_step (write) — experiment_id, step_id. Graph-aware BFS reset of step + downstream.
kill (DESTRUCTIVE) — experiment_id. Terminal; cannot resume.
valid_transitions (read) — experiment_id. Allowed next states for current state.
cost / steps / share (read) — experiment_id. Cost breakdown / step list / public share token.
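
The action list above can be turned into a small client-side check that an argument set carries the parameters each action needs. This is a sketch under assumptions: `REQUIRED` and `build_arguments` are hypothetical names, and for `create` it uses the schema's `title` parameter, although the action list calls the fields name and hypothesis and this page does not reconcile the two.

```python
# Required parameters per action, as read from the action list above.
# Assumption: `create` follows the input schema (`title` required), not the
# name/hypothesis wording used in the action list.
REQUIRED = {
    "list": [],
    "get": ["experiment_id"],
    "create": ["title"],
    "start": ["experiment_id"],
    "pause": ["experiment_id"],
    "resume": ["experiment_id"],
    "retry": ["experiment_id"],
    "retry_from_step": ["experiment_id", "step_id"],
    "kill": ["experiment_id"],
    "valid_transitions": ["experiment_id"],
    "cost": ["experiment_id"],
    "steps": ["experiment_id"],
    "share": ["experiment_id"],
}


def build_arguments(action: str, **params) -> dict:
    """Return the tool arguments for `action`, or raise if a parameter is missing."""
    if action not in REQUIRED:
        raise ValueError(f"unknown action: {action}")
    missing = [p for p in REQUIRED[action] if params.get(p) is None]
    if missing:
        raise ValueError(f"{action} requires: {', '.join(missing)}")
    return {"action": action, **params}
```

For example, `build_arguments("retry_from_step", experiment_id="...", step_id="...")` yields a dictionary ready to pass as the tool's arguments, while `build_arguments("kill")` raises before any destructive call is attempted.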

Input Schema


Name	Required	Description	Default
`action`	Yes	Action to perform: list, get, create, start, pause, resume, retry, retry_from_step, kill, valid_transitions, cost, steps, share
`deadline_ms`	No	Optional: max wall-clock time (ms) the tool may spend. If exceeded during the call, returns a DEADLINE_EXCEEDED error. Minimum 100 ms. Leave unset for no deadline.
`status`	No	Filter by status: draft, scoring, planning, building, executing, completed, killed, paused, etc.
`limit`	No	Max results to return (default 10, max 100)
`experiment_id`	Yes	The experiment UUID
`title`	Yes	Experiment title
`thesis`	No	Experiment thesis/hypothesis
`track`	No	Experiment track: growth, retention, revenue, engagement (default: growth)	growth
`budget_cap_credits`	No	Budget cap in credits (default: 10000)
`step_id`	Yes	The playbook step UUID to retry from
`reason`	No	Reason for killing the experiment
`show_costs`	No	Whether to show cost data in the public view (for update action)
`show_stages`	No	Whether to show pipeline stages in the public view (for update action)
`show_outputs`	No	Whether to show stage outputs in the public view (for update action)
`expires_at`	No	ISO8601 expiry datetime after which the share link is invalid. Pass null to remove expiry. (for update action)
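
As a rough illustration of how these parameters travel over the wire, the sketch below wraps an argument set in an MCP `tools/call` JSON-RPC request and applies the two numeric bounds stated in the table (`limit` at most 100, `deadline_ms` at least 100). The helper name `tool_call_request` is hypothetical, and the transport and authentication the server expects are not covered by this page.

```python
import json


def tool_call_request(arguments: dict, request_id: int = 1) -> str:
    """Serialize a `tools/call` request for experiment_manage (sketch only)."""
    limit = arguments.get("limit")
    if limit is not None and limit > 100:
        raise ValueError("limit must not exceed 100")
    deadline = arguments.get("deadline_ms")
    if deadline is not None and deadline < 100:
        raise ValueError("deadline_ms must be at least 100")
    return json.dumps(
        {
            "jsonrpc": "2.0",
            "id": request_id,
            "method": "tools/call",
            "params": {"name": "experiment_manage", "arguments": arguments},
        }
    )


# Example: list up to 20 draft experiments with a 5-second deadline.
payload = tool_call_request({"action": "list", "status": "draft", "limit": 20, "deadline_ms": 5000})
```

Checking the bounds locally is a convenience; the server remains the authority on what it accepts.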

Tool Definition Quality

A 3.7/5.0

Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds significant behavioral context beyond the annotations: it details the state machine, lifecycle transitions, and notes that `kill` is destructive and terminal. It also mentions budget reservation on `start` and graph-aware BFS reset for `retry_from_step`. While annotations already indicate `destructiveHint: true`, the description enriches the behavioral model.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with a clear opening statement about experiments and a bullet list of actions. It is informative without being overly verbose, though the state machine detail could be considered slightly heavy for a function description. Overall, it is concise and front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (15 parameters, 4 required, no output schema), the description fails to explain return values or error states for actions. While it covers actions and states, it does not describe expected outputs (e.g., what `list` returns, cost breakdown format). This is a notable gap for a tool with no output schema.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 100% schema description coverage, the schema already documents all parameters. The description provides a high-level mapping of actions to required parameters (e.g., `list` uses optional `status`, `workflow_id`, `limit`), but does not add deeper semantic meaning beyond what the schema offers. Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description explicitly states that the tool manages experiments, which are the core unit of work. It lists all actions (list, get, create, start, etc.) and explains the 20-state machine, making the tool's purpose very clear and distinct from sibling tools focused on other domains.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage by listing actions and their effects, but it does not explicitly state when to use this tool versus alternatives (e.g., other manage tools). There is guidance on using `valid_transitions` to check allowed state transitions, but no comparative guidance or exclusion criteria.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/escapeboy/agent-fleet-o'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.