eval_context_recall
Measures whether the retrieved context contains enough information to answer a question. This helps diagnose retriever misses versus generator errors in QA systems.
Instructions
Measure whether the retrieved context contains enough information to answer the question.
High recall means the retriever found the information needed to derive the expected answer. The judge asks whether the expected answer could plausibly be reconstructed from the retrieved context alone.
Use this when you have a labelled QA dataset and want to diagnose whether failures are retriever misses vs. generator errors.
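The diagnosis described above can be sketched as a small decision rule. This is a hypothetical helper, not part of the tool: it assumes you already have the `passed` flag from this evaluator and a separate answer-correctness verdict for each labelled example.

```python
from collections import Counter
from typing import List, Literal, Tuple

Diagnosis = Literal["ok", "retriever_miss", "generator_error"]


def diagnose(recall_passed: bool, answer_correct: bool) -> Diagnosis:
    """Classify one QA example (hypothetical helper).

    `answer_correct` is assumed to come from a separate correctness check;
    eval_context_recall does not produce it.
    """
    if answer_correct:
        return "ok"
    if not recall_passed:
        # The needed information never reached the generator.
        return "retriever_miss"
    # The context was sufficient, yet the final answer was still wrong.
    return "generator_error"


def tally(rows: List[Tuple[bool, bool]]) -> Counter:
    """Count diagnoses over (recall_passed, answer_correct) pairs."""
    return Counter(diagnose(recall, correct) for recall, correct in rows)
```

A high `retriever_miss` count points at chunking, embedding, or top-k settings; a high `generator_error` count points at the prompt or the generation model.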
Args:
- `input`: The user's question.
- `context`: The retrieved context chunks (a list of strings or a single string).
- `expected_answer`: The ground-truth answer the context should support.
- `judge_model`: The judge model for QAG scoring, given as `provider:model`.
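Because `context` may be either a list or a single string, a caller or judge wrapper has to normalize it before building a prompt. The sketch below shows one way to do that; the stripping of blank chunks and the `[n]` numbering are assumptions for illustration, not documented behavior of this tool.

```python
from typing import List, Union


def normalize_context(context: Union[str, List[str]]) -> List[str]:
    """Return `context` as a list of non-empty, whitespace-trimmed chunks."""
    chunks = [context] if isinstance(context, str) else list(context)
    return [chunk.strip() for chunk in chunks if chunk.strip()]


def format_context(chunks: List[str]) -> str:
    """Number each chunk (hypothetical format) so a judge could cite it."""
    return "\n\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
```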
Returns:
`{"score": 0.0-1.0, "passed": bool, "reason": str, "threshold": float, "evaluator": "context_recall"}`
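The result shape above can be illustrated with a sketch that aggregates judge verdicts into it. It assumes a QAG-style decomposition in which the expected answer is split into statements and the judge marks each one as supported or not by the context; the scoring rule (fraction supported), the `>=` comparison, and the 0.5 default threshold are all assumptions, not the tool's documented internals.

```python
from typing import List, TypedDict


class RecallResult(TypedDict):
    score: float
    passed: bool
    reason: str
    threshold: float
    evaluator: str


def build_result(verdicts: List[bool], threshold: float = 0.5) -> RecallResult:
    """Turn per-statement verdicts into the documented result dict.

    Each verdict is True when one statement of expected_answer is judged
    reconstructible from the retrieved context (assumed decomposition).
    """
    supported = sum(verdicts)
    score = supported / len(verdicts) if verdicts else 0.0
    return {
        "score": score,
        "passed": score >= threshold,  # assumed: pass at or above threshold
        "reason": f"{supported}/{len(verdicts)} expected-answer statements supported by context",
        "threshold": threshold,
        "evaluator": "context_recall",
    }
```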
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
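| input | Yes | The user's question. | |
| context | Yes | The retrieved context chunks (list or single string). | |
| expected_answer | Yes | The ground-truth answer the context should support. | |
| judge_model | No | The judge model for QAG scoring, as `provider:model`. | `anthropic:claude-haiku-4-5` |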
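A caller can check arguments against the input schema before invoking the tool. The sketch below is hypothetical client-side code: it enforces the three required fields, fills in the documented default for `judge_model`, and validates the `provider:model` format by splitting on the first colon.

```python
from typing import Any, Dict, Tuple

DEFAULT_JUDGE_MODEL = "anthropic:claude-haiku-4-5"  # default listed in the input schema
REQUIRED = ("input", "context", "expected_answer")


def parse_judge_model(spec: str) -> Tuple[str, str]:
    """Split 'provider:model' on the first colon; reject malformed specs."""
    provider, sep, model = spec.partition(":")
    if not sep or not provider or not model:
        raise ValueError(f"expected 'provider:model', got {spec!r}")
    return provider, model


def build_arguments(**kwargs: Any) -> Dict[str, Any]:
    """Assemble a tool-call payload (hypothetical helper)."""
    missing = [name for name in REQUIRED if name not in kwargs]
    if missing:
        raise ValueError(f"missing required arguments: {missing}")
    args = dict(kwargs)
    args.setdefault("judge_model", DEFAULT_JUDGE_MODEL)
    parse_judge_model(args["judge_model"])  # fail early on a malformed spec
    return args
```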
Output Schema
No output schema is declared. The tool returns the dictionary described under Returns.