eval_generate_cases
Generates synthetic evaluation cases from source text to build an initial eval suite. Creates question-answer-context triples from documents, FAQs, or knowledge bases, eliminating the cold-start problem.
Instructions
Generate synthetic eval cases from a source text.
Calls multivon-eval's synthetic generator to produce n eval cases from raw text (docs, an FAQ, a knowledge base). Each case has an input (for the default "qa" task, a question), an expected_output (the ground-truth answer), and a context (the source excerpt the answer is grounded in). This removes the cold-start problem when building a new eval suite from scratch.
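Each case is supposed to be grounded in its context, so a quick sanity check after generation can catch malformed or drifted cases. The sketch below assumes the dict keys listed under Returns and that `context` is a plain string copied verbatim from the source text. If the generator paraphrases or returns another type, the verbatim check would need loosening. `check_case` is a hypothetical helper, not part of multivon-eval.

```python
from typing import Any, Dict, List

REQUIRED_KEYS = {"input", "expected_output", "context", "metadata"}


def check_case(case: Dict[str, Any], source_text: str) -> List[str]:
    """Return a list of problems found in one generated case (empty if none).

    Hypothetical helper; assumes `context` is a verbatim excerpt string.
    """
    problems: List[str] = []
    missing = REQUIRED_KEYS - set(case)
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    # The question and its ground-truth answer should both be non-empty strings.
    for key in ("input", "expected_output"):
        value = case.get(key)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"empty or non-string {key!r}")
    # Assumption: the grounding excerpt is copied verbatim from the source.
    context = case.get("context")
    if not isinstance(context, str) or not context.strip():
        problems.append("empty or non-string 'context'")
    elif context.strip() not in source_text:
        problems.append("context not found verbatim in source text")
    return problems
```

Cases that come back with a non-empty problem list can be dropped or regenerated before they enter the suite.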
Requires an API key for the judge_model provider to be set in the environment, because the underlying judge model is what proposes the question/answer pairs.
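Generation fails without credentials, so it can help to check for the key before calling the tool. The variable names below (`ANTHROPIC_API_KEY`, `OPENAI_API_KEY`) are the ones the official Anthropic and OpenAI Python SDKs read by default. Whether multivon-eval reads exactly these names is an assumption, and `missing_key_for` is a hypothetical helper.

```python
import os
from typing import Mapping, Optional

# Assumed mapping from the provider prefix of a judge_model string to the
# environment variable that provider's SDK reads by default.
PROVIDER_ENV_VARS = {
    "anthropic": "ANTHROPIC_API_KEY",
    "openai": "OPENAI_API_KEY",
}


def missing_key_for(
    judge_model: str, environ: Optional[Mapping[str, str]] = None
) -> Optional[str]:
    """Return the name of the missing env var, or None if the key is set."""
    env = os.environ if environ is None else environ
    provider, sep, _model = judge_model.partition(":")
    if not sep:
        raise ValueError(f"expected 'provider:model', got {judge_model!r}")
    var = PROVIDER_ENV_VARS.get(provider)
    if var is None:
        raise ValueError(f"no known API-key variable for provider {provider!r}")
    # An empty string counts as missing.
    return None if env.get(var) else var
```

For example, `missing_key_for("anthropic:claude-haiku-4-5")` returns `"ANTHROPIC_API_KEY"` when that variable is unset or empty, which a script can turn into a clear error message before any generation call is made.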
Args:
from_text: Source text to generate cases from (e.g. an FAQ, a docs chunk, or a knowledge-base article).
n: Number of cases to generate. Default 10.
task: One of "qa" (question/answer pairs; the default), "summarization" (text plus an expected summary), or "hallucination" (a faithful answer with expected_output = "faithful", for hallucination benchmarks).
judge_model: Provider:model string for the model that generates the cases. The generator calls this judge under the hood; it does NOT need to match the judge you later use to evaluate the cases. Default "anthropic:claude-haiku-4-5".
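The argument rules above can be captured in a small builder that applies the documented defaults and rejects obvious mistakes before the tool is called. The defaults and the three task names come from this page. The other checks (a positive `n`, a non-empty `provider:model` shape for `judge_model`) are assumptions about sensible input, not validation the tool is known to perform. `build_args` is hypothetical.

```python
from typing import Any, Dict

VALID_TASKS = ("qa", "summarization", "hallucination")


def build_args(
    from_text: str,
    n: int = 10,
    task: str = "qa",
    judge_model: str = "anthropic:claude-haiku-4-5",
) -> Dict[str, Any]:
    """Assemble eval_generate_cases arguments, applying the documented defaults."""
    if not from_text.strip():
        raise ValueError("from_text must contain some text")
    if n < 1:
        raise ValueError("n must be at least 1")
    if task not in VALID_TASKS:
        raise ValueError(f"task must be one of {VALID_TASKS}, got {task!r}")
    # judge_model is a "provider:model" string; both halves must be non-empty.
    provider, _, model = judge_model.partition(":")
    if not provider or not model:
        raise ValueError(f"judge_model must look like 'provider:model', got {judge_model!r}")
    return {"from_text": from_text, "n": n, "task": task, "judge_model": judge_model}
```

Keeping the generation judge separate from the evaluation judge, as the page allows, means a cheap model can draft cases while a stronger one scores them later.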
Returns:
A list of dicts with the keys "input", "expected_output", "context", and "metadata", ready to pass to EvalCase(**d) or to persist as a JSONL eval dataset.
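A minimal sketch of both uses, assuming the returned dicts are JSON-serializable. Because multivon-eval's `EvalCase` signature is not documented on this page, the block defines a hypothetical stand-in dataclass with the same four field names so that `EvalCase(**d)` can be shown end to end; swap in the real class in practice. `save_jsonl` and `load_jsonl` are likewise illustrative helpers.

```python
import json
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Dict, Iterable, List, Union


@dataclass
class EvalCase:
    """Hypothetical stand-in for multivon-eval's EvalCase (same field names)."""
    input: str
    expected_output: str
    context: Any = None
    metadata: Dict[str, Any] = field(default_factory=dict)


def save_jsonl(cases: Iterable[Dict[str, Any]], path: Union[str, Path]) -> None:
    """Write one JSON object per line."""
    with Path(path).open("w", encoding="utf-8") as f:
        for case in cases:
            f.write(json.dumps(case, ensure_ascii=False) + "\n")


def load_jsonl(path: Union[str, Path]) -> List[Dict[str, Any]]:
    """Read a JSONL file back into a list of dicts, skipping blank lines."""
    with Path(path).open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Usage: `save_jsonl(cases, "evals.jsonl")`, then later `[EvalCase(**d) for d in load_jsonl("evals.jsonl")]`. One object per line keeps the dataset diff-friendly and lets new cases be appended without rewriting the file.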
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
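| from_text | Yes | Source text to generate cases from | |
| n | No | Number of cases to generate | 10 |
| task | No | "qa", "summarization", or "hallucination" | qa |
| judge_model | No | Provider:model string of the model that generates the cases | anthropic:claude-haiku-4-5 |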
Output Schema
| Name | Required | Description | Default |
|---|---|---|---|
| result | Yes | List of generated case dicts (see Returns) | |