evaluate_answers
Score LLM/RAG-generated answers against a golden dataset using TF-IDF cosine similarity. Works without any LLM call or API key.
Instructions
Score actual LLM/RAG-generated answers against the golden dataset using TF-IDF cosine similarity. No LLM call or API key is needed.
Supply actual_answers in the same order as the entries in the target version. Omit version to evaluate against the current committed version.
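The scoring described above can be sketched in plain Python. This is a minimal illustration of TF-IDF cosine similarity, not the tool's actual implementation: the function names (`tokenize`, `tfidf_vectors`, `cosine`, `score_answers`), the tokenizer, and the smoothed IDF formula are all assumptions made for the example. It pairs answers with golden entries by position, which is why the order of actual_answers matters.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase alphanumeric runs.
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    # Build one sparse TF-IDF vector (a dict) per document.
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))
    # Smoothed IDF (an assumption; the tool's weighting is not documented here).
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in doc_freq.items()}
    return [
        {t: tf * idf[t] for t, tf in Counter(tokens).items()}
        for tokens in tokenized
    ]


def cosine(a, b):
    # Cosine similarity of two sparse vectors; 0.0 if either is empty.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def score_answers(expected_answers, actual_answers):
    # Answers are matched to golden entries by position, so order matters.
    if len(expected_answers) != len(actual_answers):
        raise ValueError("actual_answers must match the dataset entry count")
    n = len(expected_answers)
    vectors = tfidf_vectors(list(expected_answers) + list(actual_answers))
    similarities = [cosine(vectors[i], vectors[n + i]) for i in range(n)]
    average = sum(similarities) / n if n else 0.0
    return similarities, average


if __name__ == "__main__":
    golden = ["Paris is the capital of France.", "Water boils at 100 degrees Celsius."]
    actual = ["The capital of France is Paris.", "Bananas grow on trees."]
    sims, avg = score_answers(golden, actual)
    print(sims, avg)
```

Because TF-IDF compares word overlap rather than meaning, a correct answer phrased with different vocabulary can score low, so the similarity is best read as a cheap lexical signal.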
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| input | Yes | | |
Output Schema
| Name | Required | Description | Default |
|---|---|---|---|
| dataset_name | Yes | | |
| version | Yes | | |
| total_entries | Yes | | |
| avg_semantic_similarity | Yes | | |
| passed | Yes | | |
| results | Yes | | |