eval_answer_accuracy
Evaluate answer accuracy by checking semantic equivalence to the ground truth with question-answer generation (QAG), which avoids the brittleness of strict string matching.
Instructions
Evaluate whether an answer is semantically equivalent to the ground truth.
QAG-graded: the tool generates yes/no questions about whether the actual answer matches the meaning of the expected answer. This is useful when string matching is too strict, e.g. for correct answers that are paraphrased.
Args:
  expected_answer: Ground-truth answer.
  actual_answer: The LLM's answer.
  judge_model: Judge to use for QAG grading, given as a `provider:model` string.
Returns:
{"score": 0.0-1.0, "passed": bool, "reason": str}.
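The QAG grading described above can be illustrated with a minimal sketch. It is not this tool's implementation: the fixed question templates in `make_questions`, the `judge` callable (standing in for the judge model), and the pass `threshold` of 0.5 are all hypothetical assumptions. The score is the fraction of yes/no questions the judge answers "yes", packed into the documented return shape.

```python
from typing import Callable

# Hypothetical judge: receives a yes/no question, returns True for "yes".
# In the real tool this role is played by the model named in judge_model.
Judge = Callable[[str], bool]


def make_questions(expected_answer: str, actual_answer: str) -> list[str]:
    # Hypothetical fixed templates; the real tool generates its own questions.
    return [
        f"Does '{actual_answer}' convey the same meaning as '{expected_answer}'?",
        f"Is '{actual_answer}' free of claims that contradict '{expected_answer}'?",
    ]


def qag_score(expected_answer: str, actual_answer: str,
              judge: Judge, threshold: float = 0.5) -> dict:
    questions = make_questions(expected_answer, actual_answer)
    verdicts = [judge(q) for q in questions]
    yes = sum(verdicts)  # number of "yes" verdicts
    score = yes / len(questions) if questions else 0.0
    return {
        "score": score,                  # fraction of "yes" answers, 0.0-1.0
        "passed": score >= threshold,    # assumed pass threshold
        "reason": f"{yes}/{len(questions)} QAG questions answered yes",
    }


# A stub judge that always says "yes" yields a perfect score.
print(qag_score("Paris", "The capital of France is Paris.", lambda q: True))
```

Averaging binary verdicts is what keeps the score bounded in 0.0-1.0 and makes the `reason` string easy to explain.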
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| expected_answer | Yes | Ground-truth answer. | |
| actual_answer | Yes | The LLM's answer. | |
| judge_model | No | Judge for QAG grading, as `provider:model`. | anthropic:claude-haiku-4-5 |
Output Schema
| Name | Required | Description | Default |
|---|---|---|---|
| No arguments | | | |