eval_vqa_faithfulness
Verify whether an LLM's answer about an image is factually accurate by extracting and checking each claim against what's visible. Ideal for visual question answering and image captioning tasks.
Instructions
Check whether an LLM answer about an image is grounded in what's visible.
This evaluator measures image-grounded faithfulness. The vision judge extracts up to 3 factual claims from the answer, then verifies each one against the image. Score = fraction of extracted claims that are judged accurate.
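The scoring rule described above can be sketched in a few lines. This is an illustration only, not the evaluator's actual implementation: `faithfulness_score` and `MAX_CLAIMS` are hypothetical names, and because the description does not say how an answer with no extractable claims is scored, the sketch raises an error in that case rather than guessing.

```python
from typing import Sequence

MAX_CLAIMS = 3  # the judge extracts up to 3 claims per answer


def faithfulness_score(verdicts: Sequence[bool]) -> float:
    """Return the fraction of claims judged accurate.

    `verdicts` holds one boolean per extracted claim
    (True means the claim is supported by the image).
    """
    if len(verdicts) > MAX_CLAIMS:
        raise ValueError(f"expected at most {MAX_CLAIMS} claims, got {len(verdicts)}")
    if not verdicts:
        # Assumption: the zero-claim case is unspecified, so refuse to score it.
        raise ValueError("no claims to score")
    return sum(verdicts) / len(verdicts)


# Two of three claims verified against the image.
print(round(faithfulness_score([True, True, False]), 3))  # 0.667
```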
Use this for visual QA, image captioning, chart/diagram reading, and any LLM output that purports to describe an image.
Image input (provide exactly one):
image: a local path, http(s) URL, or full data URI.
image_base64: raw base64 (no data: prefix); pair with mime_type (default "image/png").
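As a rough sketch of preparing the image arguments on the caller's side, the helpers below build the image_base64 and mime_type pair from raw bytes and check that exactly one image field is set. `encode_image_bytes` and `check_image_args` are hypothetical helper names, not part of the tool; only the argument names come from the description above.

```python
import base64
import mimetypes


def encode_image_bytes(data: bytes, filename: str = "image.png") -> dict:
    """Build the image_base64 / mime_type pair from raw image bytes."""
    mime_type, _ = mimetypes.guess_type(filename)
    return {
        # Raw base64 only: no "data:...;base64," prefix.
        "image_base64": base64.b64encode(data).decode("ascii"),
        # Fall back to the documented default when the type cannot be guessed.
        "mime_type": mime_type or "image/png",
    }


def check_image_args(args: dict) -> None:
    """Raise unless exactly one of image / image_base64 is provided."""
    provided = [key for key in ("image", "image_base64") if args.get(key)]
    if len(provided) != 1:
        raise ValueError(
            f"provide exactly one of image or image_base64, got {provided or 'none'}"
        )


args = encode_image_bytes(b"\xff\xd8\xff", "photo.jpg")
check_image_args(args)
print(args["mime_type"])
```

Passing a path, URL, or data URI through `image` needs no encoding step; the base64 route is mainly useful when the image exists only in memory.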
Args:
input: The question or prompt the LLM was answering.
output: The LLM-generated answer to verify against the image.
image: Path / URL / data URI for the image.
image_base64: Alternative — raw base64 image bytes.
mime_type: MIME type when using image_base64. Default "image/png". Other common values: "image/jpeg", "image/webp".
judge_model: Vision judge in provider:model form. Must be vision-capable. Default "google:gemini-2.5-flash" (cheap). Other vision-capable options: "openai:gpt-4o-mini" or "anthropic:claude-sonnet-4-6" (not Haiku; Haiku 4-5 is not vision-capable).
Returns:
{"score": 0.0-1.0, "passed": bool, "reason": str, "threshold": float, "evaluator": "vqa_faithfulness"}.
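A caller might consume the returned dictionary along these lines. The field names follow the Returns description; `VqaFaithfulnessResult`, `summarize`, and the sample values are hypothetical and shown only for illustration, not real tool output.

```python
from typing import TypedDict


class VqaFaithfulnessResult(TypedDict):
    score: float      # 0.0-1.0, fraction of claims judged accurate
    passed: bool
    reason: str
    threshold: float
    evaluator: str    # "vqa_faithfulness"


def summarize(result: VqaFaithfulnessResult) -> str:
    """Format a result as a single log-friendly line."""
    status = "PASS" if result["passed"] else "FAIL"
    return (
        f'{result["evaluator"]}: {status} '
        f'(score {result["score"]:.2f}, threshold {result["threshold"]:.2f}) '
        f'- {result["reason"]}'
    )


# Illustrative values only.
sample: VqaFaithfulnessResult = {
    "score": 0.67,
    "passed": True,
    "reason": "2 of 3 claims supported",
    "threshold": 0.5,
    "evaluator": "vqa_faithfulness",
}
print(summarize(sample))
```

The sketch reads `passed` directly instead of recomputing it from `score` and `threshold`, since the description does not state the exact comparison the evaluator applies.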
Input Schema
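| Name | Required | Description | Default |
|---|---|---|---|
| input | Yes | The question or prompt the LLM was answering. | |
| output | Yes | The LLM-generated answer to verify against the image. | |
| image | No | Path, URL, or data URI for the image. | |
| image_base64 | No | Raw base64 image bytes (alternative to image). | |
| mime_type | No | MIME type when using image_base64. | image/png |
| judge_model | No | Provider:model for the vision judge; must be vision-capable. | google:gemini-2.5-flash |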
Output Schema
| Name | Required | Description | Default |
|---|---|---|---|
| No arguments | | | |