eval_faithfulness
Evaluate whether an LLM output is factually grounded in the provided context by extracting and verifying each claim. Returns a score and pass/fail indicator.
Instructions
Evaluate whether an LLM output is grounded in the retrieved context.
Uses multivon-eval's QAG-graded Faithfulness evaluator. Extracts factual claims from the output and verifies each one against the context. Score is the fraction of claims supported.
Use this when a RAG pipeline returned an answer and you want to check the LLM didn't invent facts not present in retrieved documents.
Args:
input: The user's question.
context: The retrieved context the LLM was given.
output: The LLM's answer being evaluated.
judge_model: The QAG judge model, written as provider:model.
Defaults to "anthropic:claude-haiku-4-5" (inexpensive and well calibrated).
Returns:
{"score": 0.0-1.0, "passed": bool, "reason": str, "threshold": float}.
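The scoring rule described above (score is the fraction of claims supported) can be illustrated with a small sketch. This is not multivon-eval's implementation: `summarize_verdicts` and `FaithfulnessResult` are hypothetical names. The 0.5 threshold, the `score >= threshold` pass rule, and the handling of an output with no claims are also assumptions, since the description above does not state them.

```python
from typing import TypedDict


class FaithfulnessResult(TypedDict):
    """Mirrors the documented return keys: score, passed, reason, threshold."""
    score: float
    passed: bool
    reason: str
    threshold: float


def summarize_verdicts(verdicts: list[bool], threshold: float = 0.5) -> FaithfulnessResult:
    """Hypothetical helper: turn per-claim verdicts into the documented result shape.

    Each entry in `verdicts` is True when the judge found the claim supported
    by the context. The default threshold of 0.5 is an assumption.
    """
    total = len(verdicts)
    supported = sum(verdicts)
    # Assumption: an output with no extractable claims has nothing unsupported.
    score = supported / total if total else 1.0
    return {
        "score": score,
        "passed": score >= threshold,  # assumed pass rule
        "reason": f"{supported} of {total} claims supported by the context",
        "threshold": threshold,
    }


result = summarize_verdicts([True, True, False, True])
print(result["score"], result["passed"])  # prints: 0.75 True
```

With four extracted claims and one unsupported, the score is 3/4 = 0.75, which passes under the assumed threshold.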
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
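| input | Yes | The user's question. | |
| context | Yes | The retrieved context the LLM was given. | |
| output | Yes | The LLM's answer being evaluated. | |
| judge_model | No | Provider:model for the QAG judge. | anthropic:claude-haiku-4-5 |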
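As a rough illustration of the input schema, the sketch below assembles an arguments payload with the three required fields and the documented default for `judge_model`. The helper `build_arguments` is hypothetical, and both the required-field check and the `provider:model` format check are assumptions about what a caller might validate before invoking the tool; they are not the tool's own validation.

```python
REQUIRED_FIELDS = ("input", "context", "output")
DEFAULT_JUDGE_MODEL = "anthropic:claude-haiku-4-5"  # default from the schema table


def build_arguments(
    input: str,
    context: str,
    output: str,
    judge_model: str = DEFAULT_JUDGE_MODEL,
) -> dict[str, str]:
    """Hypothetical helper: build the eval_faithfulness arguments payload."""
    args = {
        "input": input,
        "context": context,
        "output": output,
        "judge_model": judge_model,
    }
    # Assumed client-side check: required fields must be non-empty.
    missing = [name for name in REQUIRED_FIELDS if not args[name].strip()]
    if missing:
        raise ValueError(f"missing required fields: {', '.join(missing)}")
    # Assumed client-side check: judge_model looks like "provider:model".
    provider, sep, model = judge_model.partition(":")
    if not (provider and sep and model):
        raise ValueError("judge_model must be written as provider:model")
    return args


arguments = build_arguments(
    input="When was the bridge opened?",
    context="The bridge opened to traffic in 1932.",
    output="The bridge opened in 1932.",
)
print(arguments["judge_model"])  # prints: anthropic:claude-haiku-4-5
```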
Output Schema
| Name | Required | Description | Default |
|---|---|---|---|
| No arguments | | | |