post_test_result
Store the scored result of a test case run after evaluating the model response. Provide a quality score from 0 to 100 with reasoning to track regression status.
Instructions
Store the scored result of one test case run.
Call this after you run a test case against the model and evaluate the response. The stored result becomes visible in the UI and is read by get_regression_status.
Score 0–100 using this scale:
- 90–100: Correct, complete, and well-structured; exceeds target.
- 70–89: Correct and complete, with minor gaps or style issues.
- 50–69: Partially correct; key points are present but important details are missing.
- 30–49: Mostly wrong; one or two relevant points but fundamentally off.
- 0–29: Completely wrong, off-topic, or refused.
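The scale can be sketched as a small lookup, which may help keep scoring consistent across runs. The function name `score_band` and the short band labels are assumptions made for illustration; they are not part of the tool.

```python
def score_band(score: float) -> str:
    """Map a 0-100 quality score to its band on the scale above.

    Hypothetical helper: the name and the returned labels are
    illustrative only and are not defined by post_test_result.
    """
    if not 0 <= score <= 100:
        raise ValueError(f"score must be between 0 and 100, got {score}")
    if score >= 90:
        return "exceeds target"
    if score >= 70:
        return "correct and complete"
    if score >= 50:
        return "partially correct"
    if score >= 30:
        return "mostly wrong"
    return "completely wrong"
```

A caller could compute the band first and then use it to open the `reasoning` text, so the score and its justification stay aligned.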
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| workspaceId | Yes | | |
| testCaseId | Yes | ID from get_workspace_state testCases | |
| response | Yes | The full model response | |
| score | Yes | Quality score 0–100 | |
| reasoning | Yes | Why this score — what worked, what failed | |
| model | Yes | Model used, e.g. claude-haiku-4-5-20251001 | |
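As a minimal sketch of assembling the arguments listed in the table, the helper below checks that all six required fields are present and that the score is in range before the call is made. The helper name `build_post_test_result_args`, the placeholder IDs, and the sample response text are all hypothetical; how the arguments are actually sent depends on the client in use.

```python
import json

# Field names taken from the Input Schema table; all are marked required.
REQUIRED_FIELDS = (
    "workspaceId",
    "testCaseId",
    "response",
    "score",
    "reasoning",
    "model",
)


def build_post_test_result_args(**fields):
    """Validate and return the arguments for a post_test_result call.

    Hypothetical helper for illustration; it only builds a dict and
    does not contact any server.
    """
    missing = [name for name in REQUIRED_FIELDS if name not in fields]
    if missing:
        raise ValueError(f"missing required fields: {', '.join(missing)}")
    if not 0 <= fields["score"] <= 100:
        raise ValueError("score must be between 0 and 100")
    return {name: fields[name] for name in REQUIRED_FIELDS}


# Placeholder values: real IDs come from get_workspace_state.
args = build_post_test_result_args(
    workspaceId="example-workspace-id",
    testCaseId="example-test-case-id",
    response="The capital of France is Paris.",
    score=82,
    reasoning="Correct and complete; answer lacks the requested citation.",
    model="claude-haiku-4-5-20251001",
)
print(json.dumps(args, indent=2))
```

Validating locally is a design choice rather than a requirement: it surfaces a missing field or out-of-range score before the result is stored.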