eval_relevance
Evaluates whether an LLM response addresses the user's question by generating and grading questions about relevance, returning a score and a pass/fail verdict.
Instructions
Check whether an LLM output actually addresses the user's question.
Graded with QAG (question-answer generation): the judge generates yes/no questions about whether the output answers the input, stays on topic, and contains relevant content.
Args:
input: The user's question.
output: The LLM's response.
judge_model: The QAG judge, given as provider:model.
Returns:
{"score": 0.0-1.0, "passed": bool, "reason": str, "threshold": float}.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| input | Yes | The user's question. | |
| output | Yes | The LLM's response. | |
| judge_model | No | Provider:model for the QAG judge. | anthropic:claude-haiku-4-5 |
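
As a rough sketch of the input schema, the helper below assembles an arguments payload with the two required fields and the optional `judge_model`. The helper name `build_arguments` and its validation rules are hypothetical; only the three field names and the default value come from the schema. How the payload is sent to the tool depends on the client and is not shown.

```python
from typing import Optional

# Default taken from the input schema table.
DEFAULT_JUDGE_MODEL = "anthropic:claude-haiku-4-5"


def build_arguments(
    user_input: str,
    llm_output: str,
    judge_model: Optional[str] = None,
) -> dict[str, str]:
    """Build an arguments payload for eval_relevance (hypothetical helper).

    judge_model is omitted when not given, so the tool's own default applies.
    """
    if not user_input or not llm_output:
        raise ValueError("both input and output are required")
    arguments = {"input": user_input, "output": llm_output}
    if judge_model is not None:
        # The schema describes the value as provider:model, so expect a colon.
        if ":" not in judge_model:
            raise ValueError("judge_model must look like provider:model")
        arguments["judge_model"] = judge_model
    return arguments


args = build_arguments(
    "What is the capital of France?",
    "The capital of France is Paris.",
)
```

Leaving `judge_model` out keeps the payload minimal and avoids hard-coding a model name that the tool may change later.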
Output Schema
| Name | Required | Description | Default |
|---|---|---|---|
| No arguments | | | |