evaluate_job
Perform an asynchronous LLM evaluation of a completed job using a custom rubric, scoring the job's input and output against each criterion and producing a numeric score and a justification for each one.
Instructions
Trigger an LLM-as-judge evaluation of a completed job against a rubric.
The evaluation runs asynchronously: the judge LLM scores the job's input/output against each criterion in the rubric and produces a numeric score with a textual justification. Results are retrievable via get_job_evaluations.
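Because evaluate_job returns before scoring finishes, a client typically triggers it and then polls get_job_evaluations. The sketch below is a minimal illustration under stated assumptions: `call_tool(name, arguments)` stands in for whatever client you use to invoke tools, get_job_evaluations is assumed to accept a `job_id` argument, and `evaluate_and_wait` and `is_complete` are hypothetical names. The completion check is left to the caller because this section does not document the result format.

```python
import time
from typing import Any, Callable


def evaluate_and_wait(
    call_tool: Callable[[str, dict[str, Any]], Any],
    job_id: str,
    rubric_id: str,
    is_complete: Callable[[Any], bool],
    interval_s: float = 2.0,
    timeout_s: float = 120.0,
) -> Any:
    """Trigger evaluate_job, then poll get_job_evaluations until is_complete(result) is true.

    Hypothetical helper: call_tool is an assumed client callable, not part of the documented API.
    """
    # Start the asynchronous evaluation; this call returns before the judge LLM has scored anything.
    call_tool("evaluate_job", {"job_id": job_id, "rubric_id": rubric_id})

    deadline = time.monotonic() + timeout_s
    while True:
        # ASSUMPTION: get_job_evaluations is keyed by job_id.
        result = call_tool("get_job_evaluations", {"job_id": job_id})
        if is_complete(result):
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"evaluation of job {job_id} did not complete within {timeout_s}s")
        # Back off between polls to avoid hammering the server.
        time.sleep(interval_s)
```

A fixed polling interval keeps the sketch simple; a real client might use exponential backoff, and `is_complete` could, for example, check that every rubric criterion has a score.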
This is a write operation and is recorded in the audit log.
Args:
- job_id: UUID of the job to evaluate (must be in "success" or "failed" state).
- rubric_id: UUID of the rubric to apply (from list_eval_rubrics).
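Checking the documented preconditions on the client side can give a clearer error than a rejected call. The following is a minimal sketch, assuming the caller already knows the job's state (for example, from a separate status lookup not covered here); `build_evaluate_job_args` and `EVALUABLE_STATES` are hypothetical names.

```python
import uuid

# States in which evaluate_job accepts a job, per the tool description.
EVALUABLE_STATES = {"success", "failed"}


def build_evaluate_job_args(job_id: str, rubric_id: str, job_state: str) -> dict[str, str]:
    """Validate inputs locally and return the arguments object for evaluate_job (hypothetical helper)."""
    for name, value in (("job_id", job_id), ("rubric_id", rubric_id)):
        try:
            # uuid.UUID raises ValueError for strings that are not valid UUIDs.
            uuid.UUID(value)
        except ValueError as exc:
            raise ValueError(f"{name} must be a UUID, got {value!r}") from exc
    if job_state not in EVALUABLE_STATES:
        raise ValueError(f"job must be in 'success' or 'failed' state, got {job_state!r}")
    return {"job_id": job_id, "rubric_id": rubric_id}
```

The server remains the authority on these rules; a local check like this only shortens the feedback loop.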
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
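| job_id | Yes | UUID of the job to evaluate; the job must be in "success" or "failed" state. | |
| rubric_id | Yes | UUID of the rubric to apply (from list_eval_rubrics). | |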
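If this tool is exposed through a Model Context Protocol (MCP) server (an assumption; this page does not say so explicitly), a client invokes it with a JSON-RPC 2.0 `tools/call` request whose `arguments` object matches the input schema above. A minimal sketch of building that message; `evaluate_job_request` is a hypothetical helper name.

```python
import json
from typing import Any


def evaluate_job_request(request_id: int, job_id: str, rubric_id: str) -> str:
    """Serialize an MCP tools/call request for evaluate_job (hypothetical helper)."""
    message: dict[str, Any] = {
        "jsonrpc": "2.0",        # JSON-RPC protocol version used by MCP
        "id": request_id,        # lets the client match the response to this request
        "method": "tools/call",  # MCP method for invoking a tool
        "params": {
            "name": "evaluate_job",
            # Keys match the input schema; both are required and have no default.
            "arguments": {"job_id": job_id, "rubric_id": rubric_id},
        },
    }
    return json.dumps(message)
```

The response to this request only confirms that the evaluation was started; the scores themselves come from get_job_evaluations.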