tm_start_run
Start a traffic run for a profile, optionally waiting for completion and verdict. Use fail_on_verdict to gate on specified verdicts like FAIL or WARN.
Instructions
Start a traffic run for the given profile.
Default behavior (wait=False) is fire-and-return: POST to
/api/v1/profiles/{id}/start, return the run id + initial
status, exit. Useful for "kick this off and tell me when it's
done" UI flows.
Set wait=True to drive the CI-gate shape (mirrors the
tm runs start --wait CLI flow):
1. Snapshot the profile's current top history id (so we can later distinguish "my run's row landed" from "a previous run's row is still there").
2. Start the run.
3. Poll /api/v1/profiles/{id} every poll_interval_seconds until status leaves {"RUNNING", "PAUSED"}.
4. Fetch the post-run history row with bounded exponential backoff up to verdict_timeout_seconds. This closes three race windows (terminal-status vs row-insert vs verdict worker; see _fetch_post_run_history for the full rationale).
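The polling and backoff steps of the wait flow can be sketched roughly as below. This is a minimal illustration, not the tool's actual implementation: `wait_for_terminal_status`, `backoff_delays`, the injected `get_status` callable, and the backoff constants (0.5 s initial delay, factor 2, 8 s cap) are all assumptions made for the example, as is the way `wait_timeout_seconds` bounds the polling loop.

```python
import time
from typing import Callable, Iterator

# Statuses that mean "the run is still in flight" (taken from the text above).
ACTIVE_STATUSES = {"RUNNING", "PAUSED"}


def wait_for_terminal_status(
    get_status: Callable[[], str],
    poll_interval_seconds: float,
    wait_timeout_seconds: float,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll get_status() until it leaves ACTIVE_STATUSES or the timeout expires.

    get_status is a hypothetical callable standing in for
    GET /api/v1/profiles/{id}; it returns the profile's status string.
    """
    deadline = clock() + wait_timeout_seconds
    while True:
        status = get_status()
        if status not in ACTIVE_STATUSES:
            return status
        if clock() >= deadline:
            raise TimeoutError(
                f"profile still {status} after {wait_timeout_seconds}s"
            )
        sleep(poll_interval_seconds)


def backoff_delays(
    total_seconds: float,
    initial: float = 0.5,   # assumed starting delay
    factor: float = 2.0,    # assumed growth factor
    cap: float = 8.0,       # assumed per-step ceiling
) -> Iterator[float]:
    """Yield growing delays whose sum never exceeds total_seconds."""
    elapsed, delay = 0.0, initial
    while elapsed < total_seconds:
        step = min(delay, cap, total_seconds - elapsed)
        yield step
        elapsed += step
        delay *= factor
```

A caller would sleep for each value from `backoff_delays(verdict_timeout_seconds)` between history fetches and stop as soon as the row carries a verdict. Injecting `sleep` and `clock` keeps the loop testable without real waiting.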
fail_on_verdict triggers the gate evaluation. Accepts a
list of verdict tokens — any of "FAIL", "WARN",
"NO_BASELINE". PASS is deliberately not accepted. Requires
wait=True (the gate needs the verdict, which requires
waiting). Unknown tokens raise ToolError before any HTTP call —
a typo like ["FAILL"] would otherwise produce a silent
false-pass.
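The validation and gate rules can be illustrated with a small sketch. The function names, the local `ToolError` stand-in, and the case-sensitive token comparison are assumptions for the example; only the accepted token set, the wait=True requirement, and the fail-closed rule for a missing verdict come from this page.

```python
from typing import FrozenSet, Optional, Sequence

# Tokens the gate accepts; "PASS" is deliberately absent.
ALLOWED_GATE_VERDICTS = frozenset({"FAIL", "WARN", "NO_BASELINE"})


class ToolError(Exception):
    """Local stand-in for the tool framework's error type (hypothetical)."""


def validate_fail_on_verdict(
    fail_on_verdict: Optional[Sequence[str]], wait: bool
) -> Optional[FrozenSet[str]]:
    """Check the gate arguments up front, before any HTTP call is made."""
    if fail_on_verdict is None:
        return None
    if not wait:
        raise ToolError("fail_on_verdict requires wait=True")
    unknown = sorted(set(fail_on_verdict) - ALLOWED_GATE_VERDICTS)
    if unknown:
        # Rejecting typos such as "FAILL" avoids a gate that can never match.
        raise ToolError(f"unknown verdict token(s): {unknown}")
    return frozenset(fail_on_verdict)


def evaluate_gate(
    gate: Optional[FrozenSet[str]], verdict: Optional[str]
) -> Optional[bool]:
    """Compute fail_on_match: None without a gate, True when the gate trips."""
    if gate is None:
        return None
    if verdict is None:
        return True  # fail closed: a gate was requested but no verdict arrived
    return verdict in gate
```

For example, `evaluate_gate(validate_fail_on_verdict(["FAIL"], wait=True), "PASS")` returns `False`, while the same gate with a verdict of `None` returns `True`.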
Return shape:
.. code-block:: python

   # Without wait:
   {
       "run_id": <str>,                 # in-memory run id
       "profile_id": <int>,
       "status": <str>,                 # initial; typically "RUNNING"
       "waited": False,
   }

   # With wait:
   {
       "run_id": <int>,                 # history row id (persisted; use with tm_get_run)
       "started_run_id": <str>,         # in-memory runId from /start (informational)
       "profile_id": <int>,
       "status": <str>,                 # terminal, e.g. "COMPLETED" or "IDLE"
       "verdict": <str|None>,           # e.g. "FAIL", "PASS", "NO_BASELINE"
       "verdict_reasons": <str|None>,
       "fail_on_match": <bool|None>,    # True when verdict ∈ fail_on_verdict, OR
                                        # when gate is set but verdict is None
                                        # (fail-closed). None when fail_on_verdict
                                        # was not provided.
       "metrics": {
           "totalRequests": <int>,
           "totalErrors": <int>,
           "avgRps": <float>,
           "peakRps": <float>,
           "successRate": <float>,
           "latencyQuantiles": {<quantile>: <ms>, ...},
       },
       "verdict_pending": <bool>,       # True iff row appeared but verdict never
                                        # populated within verdict_timeout_seconds
       "waited": True,
   }

Run-correlation defense. Exact runId match. The server
persists the in-memory runId on every RunHistory row (see
RunHistory.runId field). The wait flow compares the row's
runId to the one returned by /start; mismatch → fail
closed with ToolError. This is the definitive correlation —
no time-drift ambiguity, no race window where two starts
within seconds of each other false-negative.
Additional defenses kept as belt-and-suspenders:
- triggered_by filter on history fetches: both the pre-start snapshot and the post-run fetch filter triggered_by="api". Narrows the candidate set to rows we could plausibly have produced; protects the anchor logic from cross-channel collisions.
- Time-drift fallback (transitional): if a row's runId is null (legacy data persisted before the column was added), falls back to the 30s forward / 5s backward drift check. Removable once production rows all carry the field.
- Fail closed when ambiguous. Row with neither a runId nor a parseable startedAt → ToolError. Better to surface "can't verify identity" than silently return a possibly-wrong verdict.
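The correlation rules above might look roughly like this in code. It is a sketch under stated assumptions: `row_matches_run` and the local `ToolError` are hypothetical names, startedAt is assumed to be an ISO 8601 string with a UTC offset, and the drift window is read as "the row may start up to 30 s after, or up to 5 s before, the time /start was called". What the real tool does with a legacy row that falls outside the window is not stated here, so the sketch simply returns False.

```python
from datetime import datetime, timedelta, timezone

FORWARD_DRIFT = timedelta(seconds=30)   # row may start this long after /start
BACKWARD_DRIFT = timedelta(seconds=5)   # or this long before it


class ToolError(Exception):
    """Local stand-in for the tool framework's error type (hypothetical)."""


def row_matches_run(row: dict, started_run_id: str, started_at: datetime) -> bool:
    """Decide whether a history row belongs to the run we started.

    Exact runId comparison is authoritative. Rows without a runId fall back
    to the time-drift window. Anything unverifiable raises ToolError.
    """
    row_run_id = row.get("runId")
    if row_run_id is not None:
        if row_run_id != started_run_id:
            raise ToolError(
                f"history row runId {row_run_id!r} != started {started_run_id!r}"
            )
        return True

    # Legacy row: no runId persisted, so use startedAt as a weaker signal.
    try:
        row_started = datetime.fromisoformat(row.get("startedAt"))
        delta = row_started - started_at
    except (TypeError, ValueError):
        # Missing, malformed, or timezone-incompatible timestamp: fail closed.
        raise ToolError("can't verify identity of history row") from None
    return -BACKWARD_DRIFT <= delta <= FORWARD_DRIFT
```

Raising on a runId mismatch, rather than returning False, mirrors the fail-closed stance described above: a wrong row is treated as an error, not as "no result yet".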
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| profile_id | Yes | ||
| wait | No | ||
| fail_on_verdict | No | ||
| wait_timeout_seconds | No | ||
| verdict_timeout_seconds | No | ||
| poll_interval_seconds | No | ||