---
description: Available in Phoenix 11.4+
---
# 07.09.2025: Baseline for Experiment Comparisons 🔁
{% embed url="https://storage.googleapis.com/arize-phoenix-assets/assets/videos/experiment-baseline-comparison.mp4" %}
You can now set a **baseline run** when comparing multiple experiments. This is especially useful when one run represents a known-good output (e.g. a previous model version or a CI-approved run), and you want to evaluate changes relative to it.
For example, in an evaluation like `accuracy`, you can see exactly where the value flipped from `correct → incorrect` or `incorrect → correct` between your baseline and the run being compared, which helps you quickly spot regressions or improvements.
This feature makes it easier to isolate the impact of changes like a new prompt, model, or dataset.
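To make the idea of a "flip" concrete, here is a minimal Python sketch of the comparison described above. It does not use the Phoenix API. The function name `find_flips`, the example IDs, and the dictionary layout are all hypothetical, chosen only to illustrate how labels from a baseline run could be compared against another run.

```python
from typing import Dict, List


def find_flips(
    baseline: Dict[str, str], comparison: Dict[str, str]
) -> Dict[str, List[str]]:
    """Group example IDs by how their label changed relative to the baseline.

    Hypothetical helper for illustration only; not part of Phoenix.
    """
    flips: Dict[str, List[str]] = {"regressions": [], "improvements": []}
    for example_id, base_label in baseline.items():
        new_label = comparison.get(example_id)
        # Skip examples missing from the comparison run or with unchanged labels.
        if new_label is None or new_label == base_label:
            continue
        if base_label == "correct" and new_label == "incorrect":
            flips["regressions"].append(example_id)
        elif base_label == "incorrect" and new_label == "correct":
            flips["improvements"].append(example_id)
    return flips


# Made-up labels for an `accuracy` evaluation, keyed by example ID.
baseline_run = {"ex-1": "correct", "ex-2": "incorrect", "ex-3": "correct"}
candidate_run = {"ex-1": "incorrect", "ex-2": "correct", "ex-3": "correct"}

flips = find_flips(baseline_run, candidate_run)
print(flips)
```

In this sketch, `ex-1` counts as a regression and `ex-2` as an improvement, while `ex-3` is ignored because its label did not change relative to the baseline.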
{% embed url="https://github.com/Arize-ai/phoenix/pull/8461" %}