# Batch Evaluations

## Dataframe Evaluation Methods

* `evaluate_dataframe` for synchronous dataframe evaluations
* `async_evaluate_dataframe`, an asynchronous version that runs evaluations concurrently for higher throughput, with configurable concurrency

Both methods run multiple evaluators over a pandas dataframe. The output is an augmented dataframe with added columns:

1. `{score_name}_score` contains the JSON-serialized score (or `None` if the evaluation failed).
2. `{evaluator_name}_execution_details` contains information about the execution status, duration, and any exceptions that occurred.

#### Notes

* Bind `input_mappings` to your evaluators beforehand so they match your dataframe columns.
* Failed evaluations: if an evaluation fails, the failure details are recorded in the `execution_details` column and the score is `None`.

#### Examples

1. Evaluator that returns more than one score:

```python
import pandas as pd

from phoenix.evals import evaluate_dataframe
from phoenix.evals.metrics import PrecisionRecallFScore

precision_recall_fscore = PrecisionRecallFScore(positive_label="Yes")

df = pd.DataFrame(
    {
        "output": [["Yes", "Yes", "No"], ["Yes", "No", "No"]],
        "expected": [["Yes", "No", "No"], ["Yes", "No", "No"]],
    }
)

result = evaluate_dataframe(df, [precision_recall_fscore])
result.head()
```

2. Running multiple evaluators, one bound with an `input_mapping`:

```python
import pandas as pd

from phoenix.evals import bind_evaluator, evaluate_dataframe
from phoenix.evals.llm import LLM
from phoenix.evals.metrics import HallucinationEvaluator, exact_match

df = pd.DataFrame(
    {
        # exact_match columns
        "output": ["Yes", "Yes", "No"],
        "expected": ["Yes", "No", "No"],
        # hallucination columns (need mapping)
        "context": ["This is a test", "This is another test", "This is a third test"],
        "query": [
            "What is the name of this test?",
            "What is the name of this test?",
            "What is the name of this test?",
        ],
        "response": ["First test", "Another test", "Third test"],
    }
)

llm = LLM(provider="openai", model="gpt-4o")
hallucination_evaluator = bind_evaluator(
    HallucinationEvaluator(llm=llm),
    {"input": "query", "output": "response"},
)

result = evaluate_dataframe(df, [exact_match, hallucination_evaluator])
result.head()
```

3. Asynchronous evaluation (note that `await` must be called from an async context, such as a notebook cell or an `async` function):

```python
import pandas as pd

from phoenix.evals import async_evaluate_dataframe
from phoenix.evals.llm import LLM
from phoenix.evals.metrics import HallucinationEvaluator

df = pd.DataFrame(
    {
        "context": ["This is a test", "This is another test", "This is a third test"],
        "input": [
            "What is the name of this test?",
            "What is the name of this test?",
            "What is the name of this test?",
        ],
        "output": ["First test", "Another test", "Third test"],
    }
)

llm = LLM(provider="openai", model="gpt-4o")
hallucination_evaluator = HallucinationEvaluator(llm=llm)

result = await async_evaluate_dataframe(df, [hallucination_evaluator], concurrency=5)
result.head()
```

See [Using Evals with Phoenix](using-evals-with-phoenix.md) to learn how to run evals on project traces and upload them to Phoenix.
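As a quick illustration of consuming the augmented dataframe, the sketch below parses the serialized score column and checks the execution details for failures, continuing from the `result` of example 2 above. The column names `exact_match_score` and `exact_match_execution_details` are assumptions derived from the documented `{score_name}_score` / `{evaluator_name}_execution_details` pattern, and the parsed JSON payload's fields depend on the evaluator, so treat this as illustrative rather than a guaranteed schema.

```python
import json

# Continuing from example 2 above; column names are assumed from the
# {score_name}_score / {evaluator_name}_execution_details pattern.
for idx, row in result.iterrows():
    raw_score = row.get("exact_match_score")
    details = row.get("exact_match_execution_details")

    # Per the docs, the score column is None when the evaluation failed;
    # the execution details column records what went wrong.
    if raw_score is None:
        print(f"row {idx}: evaluation failed -> {details}")
        continue

    # The score is JSON-serialized; its exact fields depend on the evaluator.
    parsed = json.loads(raw_score)
    print(f"row {idx}: score={parsed.get('score')} details={details}")
```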
