eval_faithfulness
Check if an LLM's answer is grounded in the provided context by verifying factual claims against retrieved documents.
Instructions
Evaluate whether an LLM output is grounded in the retrieved context.
Uses multivon-eval's QAG-graded Faithfulness evaluator, which extracts factual claims from the output and verifies each one against the context. The score is the fraction of claims that the context supports.
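The scoring rule described here can be illustrated with a minimal sketch. It is not multivon-eval's implementation: `ClaimVerdict`, `faithfulness_score`, the 0.7 threshold, the `>=` pass rule, and the handling of an answer with no claims are all assumptions made for illustration. Only the "fraction of claims supported" arithmetic and the result keys come from the description.

```python
from dataclasses import dataclass


@dataclass
class ClaimVerdict:
    """One extracted claim and whether the context supports it (hypothetical type)."""
    claim: str
    supported: bool


def faithfulness_score(verdicts: list[ClaimVerdict], threshold: float = 0.7) -> dict:
    """Return the fraction of supported claims in the documented result shape."""
    if not verdicts:
        # Assumption: an answer with no factual claims has nothing to contradict.
        score = 1.0
    else:
        score = sum(v.supported for v in verdicts) / len(verdicts)

    unsupported = [v.claim for v in verdicts if not v.supported]
    reason = (
        "All claims are supported by the context."
        if not unsupported
        else "Unsupported claims: " + "; ".join(unsupported)
    )
    return {
        "score": score,
        "passed": score >= threshold,  # assumed comparison rule
        "reason": reason,
        "threshold": threshold,  # 0.7 is a placeholder, not the tool's default
    }


# Three of four claims are supported by the context.
example = [
    ClaimVerdict("The bridge opened in 1937.", True),
    ClaimVerdict("It spans the Golden Gate strait.", True),
    ClaimVerdict("It was painted blue.", False),
    ClaimVerdict("It is a suspension bridge.", True),
]
print(faithfulness_score(example))
```

In a real run the judge model produces the per-claim verdicts; the sketch only shows how such verdicts would reduce to a single score and pass flag.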
Use this when a RAG pipeline has returned an answer and you want to check that the LLM did not invent facts absent from the retrieved documents.
Args:
input: The user's question.
context: The retrieved context the LLM was given.
output: The LLM's answer being evaluated.
judge_model: Provider:model for the QAG judge. Default "anthropic:claude-haiku-4-5" (cheap + calibrated).
Returns:
{"score": 0.0-1.0, "passed": bool, "reason": str, "threshold": float}.
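As a rough sketch of how a caller might assemble the arguments and sanity-check the returned dictionary, the helpers below follow the Args and Returns listed above. `build_arguments` and `check_result` are hypothetical names, and treating every argument as a non-empty string is an assumption, since the tool does not document its argument types. How the tool is actually invoked depends on the client and is not shown.

```python
DEFAULT_JUDGE_MODEL = "anthropic:claude-haiku-4-5"  # default stated in the Args


def build_arguments(question: str, context: str, answer: str,
                    judge_model: str = DEFAULT_JUDGE_MODEL) -> dict:
    """Assemble the tool arguments under their documented names."""
    args = {
        "input": question,
        "context": context,
        "output": answer,
        "judge_model": judge_model,
    }
    for name, value in args.items():
        # Assumption: all arguments are non-empty strings.
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"{name} must be a non-empty string")
    if ":" not in judge_model:
        raise ValueError("judge_model must look like 'provider:model'")
    return args


def check_result(result: dict) -> bool:
    """Check that a result has the documented keys and value ranges."""
    expected = {"score", "passed", "reason", "threshold"}
    missing = expected - result.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if not 0.0 <= result["score"] <= 1.0:
        raise ValueError("score must be between 0.0 and 1.0")
    if not isinstance(result["passed"], bool):
        raise ValueError("passed must be a bool")
    return True


arguments = build_arguments(
    question="When did the bridge open?",
    context="The bridge opened to traffic in 1937.",
    answer="It opened in 1937.",
)
print(arguments)
```

Checking the result shape on the caller's side keeps a malformed judge response from being read as a passing score.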
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
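| input | Yes | The user's question. | |
| context | Yes | The retrieved context the LLM was given. | |
| output | Yes | The LLM's answer being evaluated. | |
| judge_model | No | Provider:model for the QAG judge. | anthropic:claude-haiku-4-5 |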
Output Schema
| Name | Required | Description | Default |
|---|---|---|---|
| No arguments | | | |