eval_context_precision
Measures the fraction of retrieved RAG context chunks that are relevant to the question, providing a precision score to diagnose retriever noise and quality.
Instructions
Measure whether retrieved RAG context chunks are relevant to the question.
High precision means the retriever returned mostly on-topic chunks with little noise. For each chunk (up to 8), the judge asks "is this chunk relevant?"; the score is the fraction of evaluated chunks marked relevant.
Use this to diagnose retriever quality: if precision is low, the likely culprits are the embedding model, the chunk size, or the reranker.
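The scoring rule described above can be sketched in a few lines. This is only an illustration, not the tool's actual implementation: `context_precision`, `keyword_judge`, and `MAX_CHUNKS` are hypothetical names, the keyword matcher is a toy stand-in for the LLM judge, and both the "keep the first 8 chunks" truncation and the 0.0 score for empty context are assumptions.

```python
from typing import Callable, List, Union

MAX_CHUNKS = 8  # the description says the judge looks at up to 8 chunks


def context_precision(
    question: str,
    context: Union[str, List[str]],
    is_relevant: Callable[[str, str], bool],
) -> float:
    """Return the fraction of evaluated chunks that the judge marks relevant."""
    # A single string is evaluated as one chunk, as the Args section states.
    chunks = [context] if isinstance(context, str) else list(context)
    # Assumption: when more than 8 chunks arrive, only the first 8 are judged.
    chunks = chunks[:MAX_CHUNKS]
    if not chunks:
        return 0.0  # assumption: empty context scores 0.0
    relevant = sum(1 for chunk in chunks if is_relevant(question, chunk))
    return relevant / len(chunks)


def keyword_judge(question: str, chunk: str) -> bool:
    """Toy stand-in for the LLM judge: relevant if a long question word appears in the chunk."""
    words = {w.strip("?.,!").lower() for w in question.split()}
    words = {w for w in words if len(w) > 4}
    return any(w in chunk.lower() for w in words)


question = "How does photosynthesis work?"
chunks = [
    "Photosynthesis converts light into chemical energy.",
    "The stock market closed higher today.",
    "Chlorophyll drives photosynthesis in leaves.",
    "Recipe for banana bread.",
]
score = context_precision(question, chunks, keyword_judge)
print(score)  # 0.5: two of the four chunks are judged relevant
```

Passing the judge in as a callable keeps the arithmetic separate from the model call, so the same function works with a real judge or a stub in tests.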
Args:
    input: The user's question.
    context: Either a list of retrieved chunks, or a single string with the full retrieved context (evaluated as one chunk).
    judge_model: Provider:model for the QAG judge.
Returns:
{"score": 0.0-1.0, "passed": bool, "reason": str, "threshold": float, "evaluator": "context_precision"}.
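As a hedged sketch of how a caller might consume this return value, the snippet below checks that the documented keys are present and formats a one-line summary. `summarize_result` and `EXPECTED_KEYS` are hypothetical helpers, and the example dictionary (including the 0.7 threshold) uses made-up illustrative values, not real tool output.

```python
from typing import Any, Dict

# Keys listed in the Returns description above.
EXPECTED_KEYS = {"score", "passed", "reason", "threshold", "evaluator"}


def summarize_result(result: Dict[str, Any]) -> str:
    """Validate the result shape and build a short human-readable summary."""
    missing = EXPECTED_KEYS - result.keys()
    if missing:
        raise ValueError(f"result is missing keys: {sorted(missing)}")
    status = "PASS" if result["passed"] else "FAIL"
    return (
        f"[{status}] {result['evaluator']}: "
        f"score={result['score']:.2f} (threshold {result['threshold']:.2f}); "
        f"{result['reason']}"
    )


# Illustrative values only; the real threshold and reason text come from the tool.
example = {
    "score": 0.5,
    "passed": False,
    "reason": "2 of 4 chunks judged relevant",
    "threshold": 0.7,
    "evaluator": "context_precision",
}
summary = summarize_result(example)
print(summary)
```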
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
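| input | Yes | The user's question. | |
| context | Yes | Either a list of retrieved chunks, or a single string with the full retrieved context (evaluated as one chunk). | |
| judge_model | No | Provider:model for the QAG judge. | anthropic:claude-haiku-4-5 |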
Output Schema
| Name | Required | Description | Default |
|---|---|---|---|
| No arguments | | | |