---
title: "What is Arize Phoenix?"
description: AI Observability and Evaluation
---
Phoenix helps you understand and improve AI applications by giving you a workflow for debugging and iteration. You can send detailed logging information, known as traces, from your app to see exactly what happened during a run; score outputs with evaluations to catch failures and regressions; iterate on your prompts using real production examples; and optimize your app with experiments that compare changes on the same inputs. Together, these tools help you move from inspecting individual runs to improving quality with evidence.
Phoenix is developed by [Arize AI](https://www.arize.com) and the open-source community. It is built on top of OpenTelemetry and powered by [OpenInference](https://github.com/Arize-ai/openinference) instrumentation. See [Integrations](/docs/phoenix/integrations) for details.
## Features
<Tabs>
<Tab title="Tracing" icon="telescope">
<Frame caption="Tracing in Phoenix">
<video src="https://storage.googleapis.com/arize-phoenix-assets/assets/gifs/tracing.mp4" width="100%" height="100%" style={{ display: 'block', objectFit: 'fill', backgroundColor: 'transparent' }} controls autoPlay muted loop />
</Frame>
[Tracing](/docs/phoenix/tracing/llm-traces) lets you see what happened during a single run of your AI application, step by step. A trace captures model calls, retrieval, tool use, and custom logic so you can debug behavior and understand where time is spent.
Phoenix accepts traces over OpenTelemetry (OTLP) and provides [auto-instrumentation](/docs/phoenix/integrations) for popular frameworks (LlamaIndex, LangChain, DSPy, Mastra, Vercel AI SDK), providers (OpenAI, Bedrock, Anthropic), and languages (Python, TypeScript, Java).
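A minimal Python setup looks roughly like the sketch below. It assumes the `arize-phoenix-otel` and `openinference-instrumentation-openai` packages are installed and that a Phoenix instance is reachable at the default collector endpoint; option names may vary slightly by version.

```python
# Connect this application to Phoenix and turn on OpenAI auto-instrumentation.
# Assumes PHOENIX_COLLECTOR_ENDPOINT (or the local default) points at your Phoenix instance.
from phoenix.otel import register
from openai import OpenAI

tracer_provider = register(
    project_name="my-llm-app",  # traces appear under this project in the UI
    auto_instrument=True,       # picks up any installed OpenInference instrumentors
)

# Every call made with this client is now captured as a span in Phoenix.
client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello, Phoenix!"}],
)
print(response.choices[0].message.content)
```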
</Tab>
<Tab title="Evaluation" icon="clipboard-check">
<Frame caption="Evals in the Phoenix UI">
<video src="https://storage.googleapis.com/arize-phoenix-assets/assets/gifs/evals.mp4" width="100%" height="100%" style={{ display: 'block', objectFit: 'fill', backgroundColor: 'transparent' }} controls autoPlay muted loop />
</Frame>
[Evaluations](/docs/phoenix/evaluation/llm-evals) help you measure the output quality of your application. You can score traces and spans with LLM-based evaluators, code-based checks, or human labels to track performance and catch failures consistently; a minimal code sketch follows the list below.
* [LLM-based evaluations](/docs/phoenix/evaluation/running-pre-tested-evals) — Run pre-built or custom evaluators on your data
* [Evaluator integrations](/docs/phoenix/tracing/how-to-tracing/feedback-and-annotations/evaluating-phoenix-traces) — Use Phoenix evals, or bring your own from [Ragas](https://docs.ragas.io/), [DeepEval](https://github.com/confident-ai/deepeval), or [Cleanlab](https://cleanlab.ai/)
* [Human annotations](/docs/phoenix/tracing/llm-traces/how-to-annotate-traces) — Attach ground truth labels directly in the UI
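As a rough sketch of the LLM-based path, the example below scores retrieved documents for relevance with a pre-built evaluator from `arize-phoenix-evals`. The column names (`input`, `reference`) follow the relevance template's default variables; an OpenAI API key is assumed to be set, and argument names may differ slightly by version.

```python
# Score retrieved documents for relevance with a pre-built LLM evaluator.
import pandas as pd
from phoenix.evals import (
    OpenAIModel,
    RAG_RELEVANCY_PROMPT_TEMPLATE,
    RAG_RELEVANCY_PROMPT_RAILS_MAP,
    llm_classify,
)

# Each row pairs a question ("input") with a retrieved document ("reference").
df = pd.DataFrame(
    {
        "input": ["How do I reset my password?"],
        "reference": ["To reset your password, open Settings > Security > Reset."],
    }
)

relevance_evals = llm_classify(
    df,
    model=OpenAIModel(model="gpt-4o-mini"),
    template=RAG_RELEVANCY_PROMPT_TEMPLATE,
    rails=list(RAG_RELEVANCY_PROMPT_RAILS_MAP.values()),  # allowed output labels
    provide_explanation=True,  # adds an "explanation" column alongside each label
)
print(relevance_evals[["label", "explanation"]])
```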
</Tab>
<Tab title="Prompt Engineering" icon="wand-magic-sparkles">
<Frame caption="Phoenix Prompt Playground">
<video src="https://storage.googleapis.com/arize-phoenix-assets/assets/gifs/prompt_playground.mp4" width="100%" height="100%" style={{ display: 'block', objectFit: 'fill', backgroundColor: 'transparent' }} controls autoPlay muted loop />
</Frame>
Phoenix helps you [iterate on prompts](/docs/phoenix/prompt-engineering/overview-prompts) using real examples from your application. You can version prompts, test prompt variants across datasets, and replay calls to see how changes affect outputs before rolling them out; a code sketch follows the list below.
* [Prompt Management](/docs/phoenix/prompt-engineering/overview-prompts/prompt-management) — Version, store, and deploy prompts
* [Prompt Playground](/docs/phoenix/prompt-engineering/overview-prompts/prompt-playground) — Experiment with prompts and models side-by-side
* [Span Replay](/docs/phoenix/prompt-engineering/overview-prompts/span-replay) — Debug by replaying LLM calls with different inputs
* [Prompts in Code](/docs/phoenix/prompt-engineering/overview-prompts/prompts-in-code) — Sync prompts across environments via SDK
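A sketch of the prompts-in-code workflow, assuming a prompt has already been saved in Phoenix under the hypothetical name `support-triage` and that the `arize-phoenix-client` package is installed (method names may differ by client version):

```python
# Pull a versioned prompt from Phoenix and use it to call the model.
from openai import OpenAI
from phoenix.client import Client

# "support-triage" is a placeholder for a prompt you have saved in Phoenix.
prompt = Client().prompts.get(prompt_identifier="support-triage")

# Fill in the template variables; the result carries the invocation parameters
# (model, messages, etc.) for the provider SDK the prompt was authored against.
kwargs = prompt.format(variables={"ticket": "My order arrived damaged."})

response = OpenAI().chat.completions.create(**kwargs)
print(response.choices[0].message.content)
```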
</Tab>
<Tab title="Datasets & Experiments" icon="flask">
<Frame caption="Experiments in Phoenix">
<video src="https://storage.googleapis.com/arize-phoenix-assets/assets/gifs/experiments.mp4" width="100%" height="100%" style={{ display: 'block', objectFit: 'fill', backgroundColor: 'transparent' }} controls autoPlay muted loop />
</Frame>
[Datasets & Experiments](/docs/phoenix/datasets-and-experiments/overview-datasets) help you test changes systematically using the same inputs. You can group traces into datasets, rerun them through different versions of your application, and compare evaluation results to confirm whether a change actually improved performance.
* [Run Experiments](/docs/phoenix/datasets-and-experiments/how-to-experiments/run-experiments) — Compare different versions of your application
* [Create Datasets](/docs/phoenix/datasets-and-experiments/how-to-datasets) — Collect traces or upload from code/CSV
* [Test at Scale](/docs/phoenix/prompt-engineering/overview-prompts/prompt-playground) — Run datasets through Playground or export for fine-tuning
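The sketch below shows the general shape of an experiment, assuming a running Phoenix instance and the `arize-phoenix` package; `answer_question` and `contains_keyword` are hypothetical stand-ins for your own task and evaluator.

```python
# Upload a small dataset, run a task over every example, and score the outputs.
import pandas as pd
import phoenix as px
from phoenix.experiments import run_experiment

dataset = px.Client().upload_dataset(
    dataset_name="qa-smoke-test",
    dataframe=pd.DataFrame(
        {
            "question": ["What is Phoenix?"],
            "expected": ["An open-source AI observability and evaluation platform."],
        }
    ),
    input_keys=["question"],
    output_keys=["expected"],
)

def answer_question(input):
    # Call your application here; Phoenix records the output for each example.
    return "Phoenix is an open-source AI observability platform."

def contains_keyword(output):
    # A trivial code-based evaluator: did the answer mention "observability"?
    return "observability" in output.lower()

# Results, per-example outputs, and evaluator scores appear in the Phoenix UI.
run_experiment(dataset, answer_question, evaluators=[contains_keyword])
```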
</Tab>
</Tabs>
## Quick Starts
Running Phoenix for the first time? Select a quick start below.
<CardGroup cols={2}>
<Card title="Send Traces From Your App" icon="telescope" href="/docs/phoenix/get-started/get-started-tracing">
See what's happening inside your LLM application with distributed tracing
</Card>
<Card title="Measure Performance with Evaluations" icon="clipboard-check" href="/docs/phoenix/get-started/get-started-evaluations">
Measure quality with LLM-as-a-judge and custom evaluators
</Card>
<Card title="Iterate on Your Prompts" icon="wand-magic-sparkles" href="/docs/phoenix/get-started/get-started-prompt-playground">
Experiment with prompts, compare models, and version your work
</Card>
<Card title="Optimize Your App with Experiments" icon="flask" href="/docs/phoenix/get-started/get-started-datasets-and-experiments">
Test your application systematically and track performance over time
</Card>
</CardGroup>
## Next Steps
The best next step is to start using Phoenix: pick a quickstart above to send data in, then build from there. See the [Quickstart Overview](https://arize.com/docs/phoenix/get-started) for more detail on what each guide builds.
## Other Resources
<CardGroup cols={2}>
<Card title="Integrations" icon="puzzle-piece" href="/docs/phoenix/integrations">
Add instrumentation for OpenAI, LangChain, LlamaIndex, and more
</Card>
<Card title="Self-Host" icon="server" href="/docs/phoenix/self-hosting/environments">
Deploy Phoenix on Docker, Kubernetes, or your cloud of choice
</Card>
<Card title="Cookbooks" icon="book-open" href="/docs/phoenix/cookbook">
Example notebooks for tracing, evals, RAG analysis, and more
</Card>
<Card title="Community" icon="users" href="https://join.slack.com/t/arize-ai/shared_invite/zt-3lqwr2oc3-7rhdyYEh82zJL_UhPKrb0A">
Join the Phoenix Slack to ask questions and connect with developers
</Card>
</CardGroup>