evaluate_retrieval
Measure retrieval system accuracy by comparing search results against expected documents using MRR@5 and Recall@5 metrics.
Instructions
Evaluate retrieval quality with test queries.
Args:
test_cases: JSON string of test cases. Format: [{"query": "search term", "expected_filepath": "path/to/doc.md"}, ...]
Returns:
JSON string with MRR@5, Recall@5, and per-query results
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| test_cases | Yes | JSON string of test cases. Format: `[{"query": "search term", "expected_filepath": "path/to/doc.md"}, ...]` | |
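
As a minimal sketch, a caller could assemble the `test_cases` string like this; the queries and file paths are hypothetical stand-ins for entries from your own corpus:

```python
import json

# Hypothetical test cases; each query should retrieve the named file.
test_cases = json.dumps([
    {"query": "how to configure retry backoff", "expected_filepath": "docs/retries.md"},
    {"query": "rotate authentication tokens", "expected_filepath": "docs/auth.md"},
])

# `test_cases` is the JSON string passed as the tool's single argument.
print(test_cases)
```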
Implementation Reference
- mcp_server/server.py:1173-1212 (handler)

The `evaluate_retrieval` method on the `KnowledgeOrchestrator` class calculates MRR@5 and Recall@5 and provides a per-query analysis for the given test cases.
```python
def evaluate_retrieval(self, test_cases: List[Dict[str, str]]) -> Dict[str, Any]:
    """Evaluate retrieval quality with test queries. Returns MRR@5, Recall@5, Precision@5."""
    per_query = []
    mrr_sum = 0.0
    recall_sum = 0.0
    k = 5
    for tc in test_cases:
        query = tc.get("query", "")
        expected = tc.get("expected_filepath", "")
        results = self.query(query, max_results=k)
        found_rank = None
        for i, r in enumerate(results):
            if expected in r.get("source", ""):
                found_rank = i + 1
                break
        rr = 1.0 / found_rank if found_rank else 0.0
        recall = 1.0 if found_rank else 0.0
        mrr_sum += rr
        recall_sum += recall
        per_query.append({
            "query": query,
            "expected": expected,
            "found_at_rank": found_rank,
            "reciprocal_rank": round(rr, 4),
            "top_result": results[0]["source"] if results else "none",
        })
    n = len(test_cases) if test_cases else 1
    return {
        "total_queries": len(test_cases),
        "mrr_at_5": round(mrr_sum / n, 4),
        "recall_at_5": round(recall_sum / n, 4),
        "per_query": per_query,
    }
```
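For reference, with $r_i$ the 1-based rank at which the expected document for query $i$ first appears in the top $k = 5$ results (queries with no hit contribute 0 to both sums), the handler's metrics are:

$$
\mathrm{MRR@5} = \frac{1}{N}\sum_{i=1}^{N}\frac{1}{r_i}\,\mathbb{1}[r_i \le 5],
\qquad
\mathrm{Recall@5} = \frac{1}{N}\sum_{i=1}^{N}\mathbb{1}[r_i \le 5]
$$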
- mcp_server/server.py:1563-1586 (registration)

Registration of the `evaluate_retrieval` MCP tool, which parses the JSON test cases and invokes `KnowledgeOrchestrator.evaluate_retrieval`.

```python
@mcp.tool()
def evaluate_retrieval(test_cases: str) -> str:
    """
    Evaluate retrieval quality with test queries.

    Args:
        test_cases: JSON string of test cases.
            Format: [{"query": "search term", "expected_filepath": "path/to/doc.md"}, ...]

    Returns:
        JSON string with MRR@5, Recall@5, and per-query results
    """
    try:
        cases = json.loads(test_cases) if isinstance(test_cases, str) else test_cases
    except json.JSONDecodeError:
        return json.dumps({"status": "error", "message": "Invalid JSON for test_cases"})

    if not isinstance(cases, list) or not cases:
        return json.dumps({"status": "error", "message": "test_cases must be a non-empty JSON array"})

    orchestrator = get_orchestrator()
    results = orchestrator.evaluate_retrieval(cases)
    return json.dumps({"status": "success", **results}, indent=2)
```
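
The tool returns a JSON string shaped like the handler's return value plus the `status` key, so a caller can parse it directly. A hedged sketch of consuming such a result; the numbers and paths below are illustrative, not real output:

```python
import json

# Illustrative response; field names follow the handler's return dict,
# values are made up for the example (hits at rank 1 and rank 2 -> MRR 0.75).
response = """{
  "status": "success",
  "total_queries": 2,
  "mrr_at_5": 0.75,
  "recall_at_5": 1.0,
  "per_query": [
    {"query": "how to configure retry backoff", "expected": "docs/retries.md",
     "found_at_rank": 1, "reciprocal_rank": 1.0, "top_result": "docs/retries.md"},
    {"query": "rotate authentication tokens", "expected": "docs/auth.md",
     "found_at_rank": 2, "reciprocal_rank": 0.5, "top_result": "docs/tokens.md"}
  ]
}"""

report = json.loads(response)
print(report["mrr_at_5"], report["recall_at_5"])
for row in report["per_query"]:
    print(row["query"], "->", row["found_at_rank"])
```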