eval_custom_rubric
Score an LLM output against your own list of yes/no quality checks to evaluate compliance with custom criteria.
Instructions
Score an output against your own list of yes/no quality checks.
Each criterion is a [question, expect_yes] pair. The judge answers each question yes or no, and the score is the fraction of criteria whose answer matches expect_yes. This suits compliance-style rubrics, where each aspect should be auditable on its own.
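The scoring rule can be sketched in a few lines of Python. This is a minimal sketch, not the tool's implementation: it assumes the judge's yes/no verdicts are already available as booleans, leaves out the LLM judge call entirely, and uses `rubric_score` as a hypothetical helper name.

```python
from collections.abc import Sequence


def rubric_score(criteria: Sequence[Sequence], answers: Sequence[bool]) -> float:
    """Return the fraction of criteria whose judge answer matches expect_yes.

    criteria: [question, expect_yes] pairs, as in the tool's criteria argument.
    answers:  the judge's verdict per question, True for "yes", False for "no".
    """
    if len(criteria) != len(answers):
        raise ValueError("need exactly one judge answer per criterion")
    if not criteria:
        # Assumption: how the real tool handles an empty rubric is not documented.
        raise ValueError("criteria must not be empty")
    matches = sum(
        1
        for (_question, expect_yes), answered_yes in zip(criteria, answers)
        if answered_yes == expect_yes
    )
    return matches / len(criteria)


criteria = [
    ["Does it cite a source?", True],
    ["Does it speculate beyond the source?", False],
]
# Judge answered "yes" to both: the first matches expect_yes, the second does not.
print(rubric_score(criteria, [True, True]))  # 0.5
```

Because each criterion is scored independently, a low score can be traced back to the specific questions whose answers did not match.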
Args:
input: The prompt the LLM was responding to.
output: The LLM-generated response.
criteria: A list of [question_str, expect_yes_bool] pairs.
Example: [["Does it cite a source?", true], ["Does it speculate beyond the source?", false]].
name: Optional label for the rubric; it appears in the result dict's evaluator field (default custom_rubric).
context: Optional context string for the judge to consider (e.g. retrieved RAG context or a source document).
judge_model: The QAG judge, given as a provider:model string (default anthropic:claude-haiku-4-5).
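A sketch of assembling an arguments object that matches the Args above, assuming the tool accepts a JSON object keyed by argument name. `build_rubric_args` is a hypothetical helper; its shape checks mirror the documented [question_str, expect_yes_bool] format and the provider:model convention, while omitting context when it is unset is an assumption.

```python
import json
from typing import Any, Optional


def build_rubric_args(
    input_: str,
    output: str,
    criteria: list,
    name: str = "custom_rubric",
    context: Optional[str] = None,
    judge_model: str = "anthropic:claude-haiku-4-5",
) -> dict[str, Any]:
    """Hypothetical helper: validate and assemble eval_custom_rubric arguments."""
    for i, pair in enumerate(criteria):
        # Each criterion must be a [question_str, expect_yes_bool] pair.
        if (
            not isinstance(pair, (list, tuple))
            or len(pair) != 2
            or not isinstance(pair[0], str)
            or not isinstance(pair[1], bool)
        ):
            raise ValueError(f"criteria[{i}] must be [question_str, expect_yes_bool]")
    if ":" not in judge_model:
        raise ValueError("judge_model should look like 'provider:model'")
    args: dict[str, Any] = {
        "input": input_,  # trailing underscore avoids shadowing the builtin input()
        "output": output,
        "criteria": [list(pair) for pair in criteria],
        "name": name,
        "judge_model": judge_model,
    }
    if context is not None:  # assumption: optional context is simply omitted
        args["context"] = context
    return args


args = build_rubric_args(
    input_="Summarize the attached report.",
    output="According to the report, revenue rose last quarter.",
    criteria=[["Does it cite a source?", True],
              ["Does it speculate beyond the source?", False]],
    context="Retrieved report text goes here.",
)
print(json.dumps(args, indent=2))
```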
Returns:
{"score": 0.0-1.0, "passed": bool, "reason": str, "threshold": float, "evaluator": <name>}.
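A sketch of consuming the returned dict. `summarize_result` is a hypothetical helper, and the sample values are illustrative, not real tool output. It reads passed as reported rather than recomputing it, because the exact comparison against threshold is not documented here.

```python
from typing import Any


def summarize_result(result: dict[str, Any]) -> str:
    """Check the documented result keys and render a one-line summary."""
    expected = {"score", "passed", "reason", "threshold", "evaluator"}
    missing = expected - set(result)
    if missing:
        raise KeyError(f"result is missing keys: {sorted(missing)}")
    score = float(result["score"])
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {score}")
    verdict = "PASS" if result["passed"] else "FAIL"
    return (f"[{result['evaluator']}] {verdict} "
            f"score={score:.2f} threshold={float(result['threshold']):.2f}: "
            f"{result['reason']}")


# Illustrative values only; not real tool output.
sample = {"score": 0.5, "passed": False, "reason": "1 of 2 criteria met",
          "threshold": 0.8, "evaluator": "custom_rubric"}
print(summarize_result(sample))
# [custom_rubric] FAIL score=0.50 threshold=0.80: 1 of 2 criteria met
```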
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
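| input | Yes | The prompt the LLM was responding to. | |
| output | Yes | The LLM-generated response. | |
| criteria | Yes | List of [question_str, expect_yes_bool] pairs. | |
| name | No | Label for the rubric; reported in the result's evaluator field. | custom_rubric |
| context | No | Context for the judge (e.g. retrieved RAG context, source document). | |
| judge_model | No | provider:model string for the QAG judge. | anthropic:claude-haiku-4-5 |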
Output Schema
| Name | Required | Description | Default |
|---|---|---|---|
| No arguments | | | |