eval_hallucination
Compare an AI output against a reference context to detect fabricated information, returning a score and pass/fail result.
Instructions
Detect fabricated information not present in the context.
Score 1.0 = no hallucination. Score 0.0 = significant hallucination.
Args:
    output: The LLM output to check.
    context: The ground-truth context the output should be grounded in.
    judge_model: Provider:model for the QAG judge.
Returns:
{"score": 0.0-1.0, "passed": bool, "reason": str, "threshold": float}.
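A caller may want to confirm that a result has the documented shape before acting on it. The following is a minimal sketch, not part of the tool itself: `HallucinationResult` and `check_result` are hypothetical names, and the sample values are invented for illustration. It checks only what the Returns line states (the four keys, their types, and the 0.0-1.0 score range) and assumes nothing about how `passed` is derived from `score` and `threshold`.

```python
from typing import TypedDict


class HallucinationResult(TypedDict):
    """Shape of the documented return value (hypothetical type name)."""
    score: float      # 1.0 = no hallucination, 0.0 = significant hallucination
    passed: bool
    reason: str
    threshold: float


def _is_number(value: object) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def check_result(result: dict) -> HallucinationResult:
    """Validate keys, types, and score range of a result dict."""
    for key in ("score", "passed", "reason", "threshold"):
        if key not in result:
            raise KeyError(f"missing key: {key}")
    if not _is_number(result["score"]) or not _is_number(result["threshold"]):
        raise TypeError("score and threshold must be numbers")
    if not isinstance(result["passed"], bool):
        raise TypeError("passed must be a bool")
    if not isinstance(result["reason"], str):
        raise TypeError("reason must be a str")
    if not 0.0 <= result["score"] <= 1.0:
        raise ValueError("score must lie in [0.0, 1.0]")
    return HallucinationResult(
        score=float(result["score"]),
        passed=result["passed"],
        reason=result["reason"],
        threshold=float(result["threshold"]),
    )


# Illustrative values only; not real tool output.
sample = {
    "score": 0.25,
    "passed": False,
    "reason": "Output mentions a date that is absent from the context.",
    "threshold": 0.5,
}
checked = check_result(sample)
```

Validating the shape at the boundary keeps downstream code from silently treating a malformed response as a pass.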
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
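| output | Yes | The LLM output to check. | |
| context | Yes | The ground-truth context the output should be grounded in. | |
| judge_model | No | Provider:model for the QAG judge. | anthropic:claude-haiku-4-5 |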
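The input schema above can be assembled into an arguments payload as in the sketch below. This is an illustration under stated assumptions: `build_arguments` and `DEFAULT_JUDGE_MODEL` are hypothetical names, the check that `judge_model` contains a non-empty provider and model separated by a colon is inferred from the "Provider:model" description, and how the payload is actually sent to the tool depends on the client and is not shown.

```python
# Default taken from the input schema table.
DEFAULT_JUDGE_MODEL = "anthropic:claude-haiku-4-5"


def build_arguments(
    output: str,
    context: str,
    judge_model: str = DEFAULT_JUDGE_MODEL,
) -> dict:
    """Build the arguments dict for eval_hallucination (hypothetical helper)."""
    if not isinstance(output, str) or not isinstance(context, str):
        raise TypeError("output and context must be strings")
    # Assumption: judge_model has the form "<provider>:<model>".
    provider, sep, model = judge_model.partition(":")
    if not (provider and sep and model):
        raise ValueError("judge_model must look like 'provider:model'")
    return {"output": output, "context": context, "judge_model": judge_model}


# Illustrative input: the output adds a founding date the context never states.
args = build_arguments(
    output="The library was founded in 1850 and holds rare maps.",
    context="The library holds a collection of rare maps.",
)
```

Since `judge_model` is optional, omitting it from the call falls back to the documented default.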
Output Schema
| Name | Required | Description | Default |
|---|---|---|---|
| No arguments | | | |