Glama

tm_start_run

Destructive

Start a traffic run for a profile, optionally waiting for completion and verdict. Use fail_on_verdict to gate on specified verdicts like FAIL or WARN.

Instructions

Start a traffic run for the given profile.

Default behavior (wait=False) is fire-and-return: POST to /api/v1/profiles/{id}/start, return the run id + initial status, exit. Useful for "kick this off and tell me when it's done" UI flows.
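
A minimal sketch of the fire-and-return path, under stated assumptions. `post_json(path)` is a hypothetical callable that sends the POST and returns the decoded JSON body. The `status` key in the /start response is also an assumption. Only the `runId` name appears elsewhere in this description.

.. code-block:: python

    from typing import Any, Callable, Dict

    # Hypothetical transport: POSTs to a URL path and returns the decoded JSON body.
    PostJson = Callable[[str], Dict[str, Any]]


    def start_run_fire_and_return(profile_id: int, post_json: PostJson) -> Dict[str, Any]:
        """Kick off a run and return immediately (wait=False)."""
        # POST /api/v1/profiles/{id}/start; the "status" key is assumed.
        body = post_json(f"/api/v1/profiles/{profile_id}/start")
        return {
            "run_id": str(body["runId"]),  # in-memory run id
            "profile_id": profile_id,
            "status": body["status"],      # initial status, typically "RUNNING"
            "waited": False,
        }


    if __name__ == "__main__":
        # Stub transport standing in for a real HTTP client.
        def fake_post(path: str) -> Dict[str, Any]:
            return {"runId": "abc123", "status": "RUNNING"}

        print(start_run_fire_and_return(7, fake_post))

The transport is injected so the sketch can be exercised without a server. A real client would also need to handle HTTP errors, which this sketch omits.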

Set wait=True to drive the CI-gate shape (mirrors the tm runs start --wait CLI flow):

  1. Snapshot the profile's current top history id (so we can later distinguish "my run's row landed" from "a previous run's row is still there").

  2. Start the run.

  3. Poll /api/v1/profiles/{id} every poll_interval_seconds until status leaves {"RUNNING", "PAUSED"}.

  4. Fetch the post-run history row with bounded exponential backoff up to verdict_timeout_seconds — closes three race windows (terminal-status vs row-insert vs verdict worker; see _fetch_post_run_history for the full rationale).
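
Steps 3 and 4 reduce to two loops. The sketch below is illustrative only. `get_status` and `fetch_row` are hypothetical callables standing in for the profile and history requests. The backoff constants (`initial_delay`, `max_delay`) are assumptions. The sketch does not reproduce the race-window logic in `_fetch_post_run_history`. The description does not say how `wait_timeout_seconds` is applied, so the polling deadline here is a separate, hypothetical parameter. Clock and sleep are injectable so the loops can be tested without real waiting.

.. code-block:: python

    import time
    from typing import Any, Callable, Dict, Optional

    ACTIVE_STATUSES = {"RUNNING", "PAUSED"}


    class ToolError(Exception):
        """Local stand-in for the tool's error type."""


    def poll_until_terminal(
        get_status: Callable[[], str],
        poll_interval_seconds: float,
        deadline_seconds: float,  # hypothetical bound on the polling loop
        sleep: Callable[[float], None] = time.sleep,
        clock: Callable[[], float] = time.monotonic,
    ) -> str:
        """Step 3: poll until the profile status leaves RUNNING/PAUSED."""
        start = clock()
        while True:
            status = get_status()
            if status not in ACTIVE_STATUSES:
                return status
            if clock() - start >= deadline_seconds:
                raise ToolError(f"run still {status} after {deadline_seconds}s")
            sleep(poll_interval_seconds)


    def fetch_with_backoff(
        fetch_row: Callable[[], Optional[Dict[str, Any]]],
        verdict_timeout_seconds: float,
        initial_delay: float = 0.25,  # hypothetical starting delay
        max_delay: float = 4.0,       # hypothetical per-attempt cap
        sleep: Callable[[float], None] = time.sleep,
        clock: Callable[[], float] = time.monotonic,
    ) -> Dict[str, Any]:
        """Step 4: retry until the row has a verdict, doubling the delay each time."""
        start = clock()
        delay = initial_delay
        row: Optional[Dict[str, Any]] = None
        while True:
            row = fetch_row()
            if row is not None and row.get("verdict") is not None:
                return {**row, "verdict_pending": False}
            elapsed = clock() - start
            if elapsed >= verdict_timeout_seconds:
                break
            # Never sleep past the remaining verdict budget.
            sleep(min(delay, max_delay, verdict_timeout_seconds - elapsed))
            delay = min(delay * 2, max_delay)
        if row is None:
            # Assumption: a row that never appears is treated as an error.
            raise ToolError("post-run history row never appeared")
        # Row landed but the verdict never populated in time.
        return {**row, "verdict_pending": True}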

fail_on_verdict triggers the gate evaluation. Accepts a list of verdict tokens — any of "FAIL", "WARN", "NO_BASELINE". PASS is deliberately not accepted. Requires wait=True (the gate needs the verdict, which requires waiting). Unknown tokens raise ToolError before any HTTP call — a typo like ["FAILL"] would otherwise produce a silent false-pass.
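
The up-front validation might look like the following sketch. `ToolError` is a local stand-in for the tool's real error type. Tokens are assumed to match case-sensitively, and the order of the two checks is also an assumption.

.. code-block:: python

    from typing import Iterable, List, Optional

    # Tokens the gate accepts; PASS is deliberately excluded.
    ALLOWED_GATE_VERDICTS = frozenset({"FAIL", "WARN", "NO_BASELINE"})


    class ToolError(Exception):
        """Local stand-in for the tool's error type."""


    def validate_fail_on_verdict(
        fail_on_verdict: Optional[Iterable[str]], wait: bool
    ) -> Optional[List[str]]:
        """Validate the gate list before any HTTP call is made."""
        if fail_on_verdict is None:
            return None
        tokens = list(fail_on_verdict)
        unknown = [t for t in tokens if t not in ALLOWED_GATE_VERDICTS]
        if unknown:
            # A typo such as "FAILL" would never match: a silent false-pass.
            raise ToolError(
                f"unknown verdict token(s) {unknown}; "
                f"allowed: {sorted(ALLOWED_GATE_VERDICTS)}"
            )
        if not wait:
            # The gate needs the verdict, and the verdict needs wait=True.
            raise ToolError("fail_on_verdict requires wait=True")
        return tokens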

Return shape:

.. code-block:: python

    # Without wait:
    {
        "run_id": <str>,          # in-memory run id
        "profile_id": <int>,
        "status": <str>,          # initial, typically "RUNNING"
        "waited": False,
    }

    # With wait:
    {
        "run_id": <int>,            # history row id (persisted; use with tm_get_run)
        "started_run_id": <str>,    # in-memory runId from /start (informational)
        "profile_id": <int>,
        "status": <str>,            # terminal, e.g. "COMPLETED" or "IDLE"
        "verdict": <str|None>,      # e.g. "FAIL", "PASS", "NO_BASELINE"
        "verdict_reasons": <str|None>,
        "fail_on_match": <bool|None>,  # True when verdict ∈ fail_on_verdict, OR
                                       # when gate is set but verdict is None
                                       # (fail-closed). None when fail_on_verdict
                                       # was not provided.
        "metrics": {
            "totalRequests": <int>,
            "totalErrors": <int>,
            "avgRps": <float>,
            "peakRps": <float>,
            "successRate": <float>,
            "latencyQuantiles": {<quantile>: <ms>, ...},
        },
        "verdict_pending": <bool>,  # True iff row appeared but verdict never
                                    # populated within verdict_timeout_seconds
        "waited": True,
    }
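
The `fail_on_match` rules in the return shape can be expressed as a small function. This is a sketch. `ci_exit_code` is a hypothetical helper that shows how a CI wrapper might turn the result into a process exit code.

.. code-block:: python

    from typing import Any, Dict, Optional, Sequence


    def compute_fail_on_match(
        verdict: Optional[str], fail_on_verdict: Optional[Sequence[str]]
    ) -> Optional[bool]:
        """Mirror the documented fail_on_match semantics."""
        if fail_on_verdict is None:
            return None   # no gate requested
        if verdict is None:
            return True   # gate set but verdict missing: fail closed
        return verdict in fail_on_verdict


    def ci_exit_code(result: Dict[str, Any]) -> int:
        """Hypothetical CI convention: exit 1 when the gate matched, else 0."""
        return 1 if result.get("fail_on_match") else 0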

Run-correlation defense: exact runId match. The server persists the in-memory runId on every RunHistory row (see the RunHistory.runId field). The wait flow compares the row's runId with the one returned by /start. On a mismatch, it fails closed with ToolError. This is the definitive correlation. It has no time-drift ambiguity and no race window in which two starts a few seconds apart yield a false negative.

Additional defenses kept as belt-and-suspenders:

  • triggered_by filter on history fetches: both the pre-start snapshot and the post-run fetch filter triggered_by="api". Narrows the candidate set to rows we could plausibly have produced; protects the anchor logic from cross-channel collisions.

  • Time-drift fallback (transitional): if a row's runId is null (legacy data persisted before the column was added), the wait flow falls back to the 30s forward / 5s backward drift check. This fallback can be removed once all production rows carry the field.
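
The step-1 snapshot anchor, combined with the `triggered_by` filter, can be sketched as follows. The sketch assumes that each history row is a dict with an integer `id` that increases monotonically and a `triggered_by` key. Both key names and the ordering guarantee are assumptions.

.. code-block:: python

    from typing import Any, Dict, List, Optional

    Row = Dict[str, Any]


    def top_api_row_id(rows: List[Row]) -> Optional[int]:
        """Pre-start snapshot: newest history id among API-triggered rows."""
        api_ids = [r["id"] for r in rows if r.get("triggered_by") == "api"]
        return max(api_ids, default=None)


    def new_api_row(rows: List[Row], snapshot_id: Optional[int]) -> Optional[Row]:
        """Post-run: newest API-triggered row whose id is above the snapshot."""
        candidates = [
            r for r in rows
            if r.get("triggered_by") == "api"
            and (snapshot_id is None or r["id"] > snapshot_id)
        ]
        return max(candidates, key=lambda r: r["id"], default=None)

If a row is still below the snapshot, it belongs to a previous run, so `new_api_row` returns None until this run's row lands.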

Fail closed when ambiguous. Row with neither a runId nor a parseable startedAt → ToolError. Better to surface "can't verify identity" than silently return a possibly-wrong verdict.
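
The correlation rules can be combined into a sketch of the identity check, under several assumptions. `startedAt` is an ISO 8601 string, and a value without an offset is treated as UTC. `started_at` is a timezone-aware timestamp recorded at /start. The "30s forward / 5s backward" window means the row may start up to 30 s after or 5 s before that timestamp. `ToolError` is a local stand-in.

.. code-block:: python

    from datetime import datetime, timedelta, timezone
    from typing import Any, Dict, Optional


    class ToolError(Exception):
        """Local stand-in for the tool's error type."""


    # Assumed reading of the "30s forward / 5s backward" drift window.
    FORWARD_DRIFT = timedelta(seconds=30)
    BACKWARD_DRIFT = timedelta(seconds=5)


    def parse_started_at(value: Any) -> Optional[datetime]:
        """Parse an ISO 8601 startedAt; return None if absent or unparseable."""
        if not isinstance(value, str):
            return None
        try:
            parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
        except ValueError:
            return None
        # Assumption: timestamps without an offset are UTC.
        if parsed.tzinfo is None:
            parsed = parsed.replace(tzinfo=timezone.utc)
        return parsed


    def verify_row_identity(row: Dict[str, Any], started_run_id: str,
                            started_at: datetime) -> None:
        """Raise ToolError unless the history row can be tied to our /start call."""
        row_run_id = row.get("runId")
        if row_run_id is not None:
            # Definitive path: exact runId match, fail closed on mismatch.
            if row_run_id != started_run_id:
                raise ToolError(f"runId mismatch: {row_run_id!r} != {started_run_id!r}")
            return
        # Transitional path for legacy rows persisted before runId existed.
        row_started = parse_started_at(row.get("startedAt"))
        if row_started is None:
            raise ToolError("row has neither a runId nor a parseable startedAt")
        drift = row_started - started_at
        if not (-BACKWARD_DRIFT <= drift <= FORWARD_DRIFT):
            raise ToolError(f"startedAt drift {drift} outside the allowed window")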

Input Schema

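=======================  ========  ===========  =======
Name                     Required  Description  Default
=======================  ========  ===========  =======
profile_id               Yes
wait                     No
fail_on_verdict          No
wait_timeout_seconds     No
verdict_timeout_seconds  No
poll_interval_seconds    No
=======================  ========  ===========  =======
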
Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description goes far beyond the annotation hints (destructive, not idempotent) by detailing the two execution modes, polling behavior, race window handling, run-correlation defenses, and fail-closed logic. This provides a transparent view of internal workings and side effects.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is very detailed, with multiple paragraphs and a code block. While it is well structured into sections, its length may overwhelm an AI agent looking for quick clarity. Tighter phrasing could improve conciseness without losing critical detail.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description covers all aspects: purpose, modes, return shapes (both modes), parameter behaviors, error handling, and run-correlation defenses. There is no output schema, so the detailed return shape documentation is essential and fully provided. Edge cases like unknown tokens and run-id mismatch are addressed.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage, the description compensates by explaining most parameters. It covers wait (fire-and-return vs blocking), fail_on_verdict (a list of verdict tokens that requires wait and raises an error on unknown tokens), and poll_interval_seconds and verdict_timeout_seconds (used in the wait flow). However, wait_timeout_seconds is not explained, which leaves a minor gap.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool starts a traffic run for a given profile. It distinguishes two modes (fire-and-return vs wait) and explains the resource and action. This differentiates it from sibling tools like tm_stop_run or tm_get_run by focusing on initiation.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly explains when to use each mode: fire-and-return for UI flows and wait for CI-gate shapes. It also details when fail_on_verdict is applicable and that it requires wait=True. However, it does not explicitly compare with sibling tools like tm_stop_run or tm_pause_run, though the context is clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.


MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/trafficmorph-gif/tm-mcp'
