
evaluate_retrieval

Measure retrieval system accuracy by comparing search results against expected documents using MRR@5 and Recall@5 metrics.

Instructions

Evaluate retrieval quality with test queries.

Args:
    test_cases: JSON string of test cases. Format: [{"query": "search term", "expected_filepath": "path/to/doc.md"}, ...]

Returns:
    JSON string with MRR@5, Recall@5, and per-query results
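
For example, if the expected document is ranked 1st for one query and 3rd for another (both within the top 5), MRR@5 = (1/1 + 1/3) / 2 ≈ 0.6667 and Recall@5 = 2/2 = 1.0; a query whose expected document falls outside the top 5 contributes 0 to both averages.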

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| `test_cases` | Yes | JSON string of test cases: `[{"query": "search term", "expected_filepath": "path/to/doc.md"}, ...]` | — |
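
For example, a valid `test_cases` value is a JSON array serialized to a string (the queries and paths here are illustrative):

    import json

    test_cases = json.dumps([
        {"query": "how to configure chunk size", "expected_filepath": "docs/configuration.md"},
        {"query": "supported embedding models", "expected_filepath": "docs/embeddings.md"},
    ])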

Implementation Reference

  • The `evaluate_retrieval` method of the `KnowledgeOrchestrator` class computes MRR@5 and Recall@5 and returns a per-query analysis for the given test cases.
    def evaluate_retrieval(self, test_cases: List[Dict[str, str]]) -> Dict[str, Any]:
        """Evaluate retrieval quality with test queries. Returns MRR@5, Recall@5, and per-query results."""
        per_query = []
        mrr_sum = 0.0
        recall_sum = 0.0
        k = 5

        for tc in test_cases:
            query = tc.get("query", "")
            expected = tc.get("expected_filepath", "")

            # Retrieve the top-k results for this query.
            results = self.query(query, max_results=k)

            # Find the 1-indexed rank at which the expected document appears, if at all.
            found_rank = None
            for i, r in enumerate(results):
                if expected in r.get("source", ""):
                    found_rank = i + 1
                    break

            # Reciprocal rank is 1/rank when found within the top k, else 0;
            # recall@k is binary per query (found or not found).
            rr = 1.0 / found_rank if found_rank else 0.0
            recall = 1.0 if found_rank else 0.0

            mrr_sum += rr
            recall_sum += recall

            per_query.append({
                "query": query,
                "expected": expected,
                "found_at_rank": found_rank,
                "reciprocal_rank": round(rr, 4),
                "top_result": results[0]["source"] if results else "none",
            })

        # Guard against division by zero for an empty test set.
        n = len(test_cases) if test_cases else 1
        return {
            "total_queries": len(test_cases),
            "mrr_at_5": round(mrr_sum / n, 4),
            "recall_at_5": round(recall_sum / n, 4),
            "per_query": per_query,
        }
  • Registration of the `evaluate_retrieval` MCP tool, which parses the JSON test cases and invokes `KnowledgeOrchestrator.evaluate_retrieval` (a usage sketch follows the listings).
    @mcp.tool()
    def evaluate_retrieval(test_cases: str) -> str:
        """
        Evaluate retrieval quality with test queries.
    
        Args:
            test_cases: JSON string of test cases. Format: [{"query": "search term", "expected_filepath": "path/to/doc.md"}, ...]
    
        Returns:
            JSON string with MRR@5, Recall@5, and per-query results
        """
        try:
            cases = json.loads(test_cases) if isinstance(test_cases, str) else test_cases
        except json.JSONDecodeError:
            return json.dumps({"status": "error", "message": "Invalid JSON for test_cases"})
    
        if not isinstance(cases, list) or not cases:
            return json.dumps({"status": "error", "message": "test_cases must be a non-empty JSON array"})
    
        orchestrator = get_orchestrator()
        results = orchestrator.evaluate_retrieval(cases)
    
        return json.dumps({"status": "success", **results}, indent=2)
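
A minimal usage sketch, assuming the server module is importable and the referenced documents are already indexed; the query strings and file paths below are illustrative, not part of the actual server:

    import json

    # Illustrative test set; the paths assume these documents exist in the index.
    cases = [
        {"query": "vector store persistence", "expected_filepath": "docs/storage.md"},
        {"query": "how embeddings are generated", "expected_filepath": "docs/embeddings.md"},
    ]

    # Call the orchestrator method directly, using the same accessor the tool uses.
    orchestrator = get_orchestrator()
    report = orchestrator.evaluate_retrieval(cases)

    print(report["mrr_at_5"], report["recall_at_5"])
    for q in report["per_query"]:
        print(f'{q["query"]!r} -> rank {q["found_at_rank"]}')

    # Through the MCP tool, the same test set is passed as a JSON string,
    # and the response merges {"status": "success"} with the metrics dict:
    payload = json.dumps(cases)  # -> evaluate_retrieval(test_cases=payload)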
