evaluate_retrieval
Run retrieval metrics like recall@k, precision@k, MRR, and nDCG@k on labelled datasets with a configurable adapter. Returns per-query and aggregate scores plus latency percentiles.
Instructions
Run retrieval metrics (recall@k, precision@k, MRR, nDCG@k) against a labelled dataset using a configurable retrieval adapter. Returns per-query metrics, dataset-level aggregates, and p50/p95 retrieval latency.
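For reference, the four metrics named above can be sketched for a single query as follows. This is a minimal illustration using the standard textbook definitions with binary relevance, not necessarily how this tool computes them. The function names are hypothetical, and whether the tool truncates MRR at k is not stated here, so the sketch scans the full ranked list.

```python
import math
from typing import Sequence, Set


def recall_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant chunks that appear in the top k results."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def precision_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top k slots filled by a relevant chunk."""
    if k <= 0:
        return 0.0
    return sum(1 for chunk_id in retrieved[:k] if chunk_id in relevant) / k


def reciprocal_rank(retrieved: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant chunk, or 0.0 if none is retrieved.

    MRR is the mean of this value over all queries in the dataset.
    """
    for rank, chunk_id in enumerate(retrieved, start=1):
        if chunk_id in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """nDCG@k with binary gains (assumes retrieved ids are unique)."""
    dcg = sum(
        1.0 / math.log2(position + 2)
        for position, chunk_id in enumerate(retrieved[:k])
        if chunk_id in relevant
    )
    # Ideal ranking: all relevant chunks placed first, capped at k slots.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(position + 2) for position in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0
```

Dataset-level aggregates would then typically be the mean of each per-query score across the dataset.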
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| dataset_path | Yes | Path to JSONL dataset with relevant_chunk_ids on each entry. | |
| corpus_path | Yes | Path to JSONL corpus file. | |
| k | No | Top-k cutoff for all metrics. | 5 |
| adapter | No | Retrieval adapter to use. | bm25 |
| output_dir | No | Directory to save results. | |
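Before running an evaluation, it can help to check that every dataset entry carries `relevant_chunk_ids`. The sketch below is one way to do that. The function name `parse_dataset_lines` and the `query` field in the sample are hypothetical, since only `relevant_chunk_ids` is documented above.

```python
import json
from typing import Any, Dict, Iterable, List


def parse_dataset_lines(lines: Iterable[str]) -> List[Dict[str, Any]]:
    """Parse JSONL text lines and verify each entry has relevant_chunk_ids."""
    entries = []
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        entry = json.loads(line)
        if not isinstance(entry.get("relevant_chunk_ids"), list):
            raise ValueError(f"line {line_no}: relevant_chunk_ids must be a list")
        entries.append(entry)
    return entries


# Hypothetical sample; the "query" field name is an assumption.
sample = [
    '{"query": "what is bm25?", "relevant_chunk_ids": ["c1", "c7"]}',
    '',
    '{"query": "define ndcg", "relevant_chunk_ids": ["c3"]}',
]
entries = parse_dataset_lines(sample)
```

To check a file on disk, pass an open file handle, for example `parse_dataset_lines(open(dataset_path, encoding="utf-8"))`.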