evaluate_rag_end_to_end
Run a complete RAG pipeline that retrieves chunks, generates answers, and scores them on context relevance and citation faithfulness. Returns per-query and aggregate metrics.
Instructions
Run the full RAG pipeline: retrieve chunks, generate answers using the retrieved chunks as context, and score the answers with the context_relevance and citation_faithfulness judges. Returns retrieval metrics, generation metrics, and judge scores per query, plus an aggregate.
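The retrieve, generate, score, and aggregate flow described above can be sketched as follows. This is a minimal illustration, not the tool's implementation: the function names, the token-overlap retriever, the stub generator, and both stand-in judges are hypothetical placeholders, and the result layout (`per_query` and `aggregate` keys) is an assumption.

```python
from statistics import mean


def retrieve(query, corpus, k):
    # Hypothetical toy retriever: rank chunks by shared lowercase tokens.
    q_tokens = set(query.lower().split())
    ranked = sorted(
        corpus,
        key=lambda chunk: len(q_tokens & set(chunk.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def generate(query, chunks):
    # Stand-in for a model call: return the top-ranked chunk as the answer.
    return chunks[0] if chunks else ""


def judge_context_relevance(query, chunks):
    # Stand-in judge: fraction of retrieved chunks sharing a token with the query.
    if not chunks:
        return 0.0
    q_tokens = set(query.lower().split())
    hits = sum(1 for chunk in chunks if q_tokens & set(chunk.lower().split()))
    return hits / len(chunks)


def judge_citation_faithfulness(answer, chunks):
    # Stand-in judge: 1.0 if the answer text appears in a retrieved chunk.
    return 1.0 if answer and any(answer in chunk for chunk in chunks) else 0.0


def evaluate(queries, corpus, k=2):
    per_query = []
    for query in queries:
        chunks = retrieve(query, corpus, k)
        answer = generate(query, chunks)
        per_query.append({
            "query": query,
            "answer": answer,
            "context_relevance": judge_context_relevance(query, chunks),
            "citation_faithfulness": judge_citation_faithfulness(answer, chunks),
        })
    metrics = ("context_relevance", "citation_faithfulness")
    # Aggregate as a simple mean per metric; empty input yields no aggregate.
    aggregate = {m: mean(r[m] for r in per_query) for m in metrics} if per_query else {}
    return {"per_query": per_query, "aggregate": aggregate}
```

Calling `evaluate(["capital of france"], corpus, k=2)` on a small list of text chunks returns one record per query alongside the mean of each judge score.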
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| dataset_path | Yes | ||
| corpus_path | Yes | ||
| models | Yes | Models to evaluate. | |
| k | No | ||
| adapter | No | | bm25 |
| judge | No | ||
| output_dir | No | | |
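As a rough illustration of how a caller might assemble arguments for this schema, the sketch below checks the three required fields and fills in `bm25` for `adapter`. The helper `build_arguments`, the file paths, the model name, the value types (for example, `models` as a list of strings), and the reading of `bm25` as the default are all assumptions, not part of the documented interface.

```python
# Required parameter names, taken from the Input Schema table.
REQUIRED = ("dataset_path", "corpus_path", "models")

# Assumption: "bm25" is the default value for adapter.
DEFAULTS = {"adapter": "bm25"}


def build_arguments(**kwargs):
    # Hypothetical helper: reject calls that omit a required parameter.
    missing = [name for name in REQUIRED if name not in kwargs]
    if missing:
        raise ValueError(f"missing required arguments: {', '.join(missing)}")
    # Caller-supplied values override the defaults.
    return {**DEFAULTS, **kwargs}


# Hypothetical paths and model name, for illustration only.
arguments = build_arguments(
    dataset_path="data/questions.jsonl",
    corpus_path="data/corpus.jsonl",
    models=["model-a"],
    k=5,
)
```

Validating on the caller side surfaces a missing required field before any retrieval or generation work begins.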