Skip to main content
Glama

@arizeai/phoenix-mcp

Official
by Arize-ai
chatbot_with_human_feedback.ipynb14.6 kB
{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "<center>\n", " <p style=\"text-align:center\">\n", " <img alt=\"phoenix logo\" src=\"https://raw.githubusercontent.com/Arize-ai/phoenix-assets/9e6101d95936f4bd4d390efc9ce646dc6937fb2d/images/socal/github-large-banner-phoenix.jpg\" width=\"1000\"/>\n", " <br>\n", " <br>\n", " <a href=\"https://arize.com/docs/phoenix/\">Docs</a>\n", " |\n", " <a href=\"https://github.com/Arize-ai/phoenix\">GitHub</a>\n", " |\n", " <a href=\"https://arize-ai.slack.com/join/shared_invite/zt-2w57bhem8-hq24MB6u7yE_ZF_ilOYSBw#/shared-invite/email\">Community</a>\n", " </p>\n", "</center>\n", "<h1 align=\"center\">Instrumenting a chatbot with human feedback</h1>\n", "\n", "Phoenix provides endpoints to associate user-provided feedback directly with OpenInference spans as annotations.\n", "\n", "In this tutorial, we will create a manually-instrument chatbot with user-triggered \"👍\" and \"👎\" feedback buttons. We will have those buttons trigger a callback that sends the user feedback to Phoenix and is viewable alongside the span. Automating associating feedback with spans is a powerful way to quickly focus on traces of your application that are not behaving as expected." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pip install -q arize-phoenix-otel \"arize-phoenix-client>=1.5.0\" gradio" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os\n", "from getpass import getpass\n", "from typing import Any, Dict\n", "from uuid import uuid4\n", "\n", "import httpx\n", "\n", "from phoenix.client import Client\n", "from phoenix.otel import register" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "if not (openai_api_key := os.getenv(\"OPENAI_API_KEY\")):\n", " openai_api_key = getpass(\"🔑 Enter your OpenAI API key: \")\n", "\n", "if not (phoenix_api_key := os.getenv(\"PHOENIX_API_KEY\")):\n", " phoenix_api_key = getpass(\"🔑 Enter your Phoenix API key: \")\n", "\n", "os.environ[\"PHOENIX_CLIENT_HEADERS\"] = f\"api_key={phoenix_api_key}\"\n", "os.environ[\"PHOENIX_COLLECTOR_ENDPOINT\"] = \"https://app.phoenix.arize.com\"\n", "os.environ[\"PHOENIX_PROJECT_NAME\"] = \"Chatbot with Annotations\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Define endpoints and configure OpenTelemetry tracing" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "tracer_provider = register()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "FEEDBACK_ENDPOINT = f\"{os.environ['PHOENIX_COLLECTOR_ENDPOINT']}/span_annotations\"\n", "OPENAI_API_URL = \"https://api.openai.com/v1/chat/completions\"\n", "tracer = tracer_provider.get_tracer(__name__)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Define and instrument chat service backend\n", "\n", "Here we define two functions:\n", "\n", "`generate_response` is a function that contains the chatbot logic for responding to a user query. `generate_response` is manually instrumented using the `OpenInference` semantic conventions. More information on how to manually instrument an application can be found [here](https://arize.com/docs/phoenix/tracing/how-to-tracing/manual-instrumentation). `generate_response` also returns the OpenTelemetry spanID, a hex-encoded string that is used to associate feedback with a specific trace.\n", "\n", "`send_feedback` is a function that sends user feedback to Phoenix via the `span_annotations` REST route." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "client = Client()\n", "http_client = httpx.Client()\n", "\n", "\n", "def generate_response(\n", " input_text: str, model: str = \"gpt-3.5-turbo\", temperature: float = 0.1\n", ") -> Dict[str, Any]:\n", " user_message = {\"role\": \"user\", \"content\": input_text, \"uuid\": str(uuid4())}\n", " invocation_parameters = {\"temperature\": temperature}\n", " payload = {\n", " \"model\": model,\n", " **invocation_parameters,\n", " \"messages\": [user_message],\n", " }\n", " headers = {\n", " \"Content-Type\": \"application/json\",\n", " \"Authorization\": f\"Bearer {openai_api_key}\",\n", " }\n", " with tracer.start_as_current_span(\"llm_span\", openinference_span_kind=\"llm\") as span:\n", " span.set_input(user_message)\n", "\n", " # get the active hex-encoded spanID\n", " span_id = span.get_span_context().span_id.to_bytes(8, \"big\").hex()\n", " print(span_id)\n", "\n", " response = http_client.post(OPENAI_API_URL, headers=headers, json=payload)\n", "\n", " if not (200 <= response.status_code < 300):\n", " raise Exception(f\"Failed to call OpenAI API: {response.text}\")\n", " response_json = response.json()\n", "\n", " span.set_output(response_json)\n", "\n", " return response_json, span_id\n", "\n", "\n", "def send_feedback(span_id: str, feedback: int, user_id: str) -> None:\n", " label = \"👍\" if feedback == 1 else \"👎\"\n", " client.annotations.add_span_annotation(\n", " span_id=span_id,\n", " annotation_name=\"user_feedback\",\n", " label=label,\n", " score=feedback,\n", " metadata={\"example_key\": \"123\"},\n", " identifier=user_id,\n", " )\n", " print(f\"Feedback sent for span_id {span_id}: {label}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Define an LLM evaluator to run on incorrect responses" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def run_llm_eval(span_id: str, input_text: str, assistant_content: str):\n", " \"\"\"\n", " Evaluates the quality of an LLM response by asking another LLM to classify its correctness.\n", "\n", " Args:\n", " span_id: The ID of the span to evaluate\n", " input_text: The original unchanged user query\n", " assistant_content: The assistant's response to evaluate\n", " \"\"\"\n", " # Create a prompt for the evaluation model\n", " eval_prompt = f\"\"\"\n", " You are an expert evaluator of AI assistant responses. Please evaluate the following:\n", "\n", " User Query: {input_text}\n", "\n", " Assistant Response: {assistant_content}\n", "\n", " Is this response correct, helpful, and appropriate for the user query?\n", " Provide a brief analysis and then classify as either \"CORRECT\" or \"INCORRECT\".\n", "\n", " Format your response as follows:\n", " Analysis: [Your analysis here]\n", " Classification: [CORRECT or INCORRECT]\n", " \"\"\"\n", "\n", " # Call the evaluation model using the OpenAI API\n", "\n", " headers = {\n", " \"Content-Type\": \"application/json\",\n", " \"Authorization\": f\"Bearer {openai_api_key}\",\n", " }\n", "\n", " payload = {\n", " \"model\": \"gpt-4o\", # Using a smaller model for evaluation\n", " \"messages\": [{\"role\": \"user\", \"content\": eval_prompt}],\n", " }\n", "\n", " # Increased timeout to prevent ReadTimeout errors\n", " eval_response = http_client.post(OPENAI_API_URL, headers=headers, json=payload, timeout=60.0)\n", " eval_response = eval_response.json()\n", " print(eval_response)\n", " eval_content = eval_response[\"choices\"][0][\"message\"][\"content\"]\n", "\n", " # Store the evaluation as an annotation\n", " client.annotations.add_span_annotation(\n", " span_id=span_id,\n", " annotation_name=\"correctness\",\n", " annotator_kind=\"LLM\",\n", " label=\"INCORRECT\" if \"Classification: INCORRECT\" in eval_content else \"CORRECT\",\n", " score=1 if \"Classification: INCORRECT\" in eval_content else 0,\n", " explanation=eval_content,\n", " )\n", "\n", " print(f\"LLM Evaluation for span_id {span_id}:\")\n", " print(eval_content)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Create Chat Widget\n", "\n", "We create a simple chat application using IPython widgets. Alongside the chatbot responses we provide feedback buttons that a user can click to provide feedback. These can be seen inside the Phoenix UI!" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def create_gradio_chat():\n", " import gradio as gr\n", "\n", " def chat_response(message, history, user_id):\n", " # Send the message to the OpenAI API and get the response\n", " response_data, span_id = generate_response(message)\n", " assistant_content = response_data[\"choices\"][0][\"message\"][\"content\"]\n", "\n", " # Store the span_id for feedback\n", " return assistant_content, span_id\n", "\n", " def submit_feedback(feedback_type, span_id, message, response, user_id):\n", " if feedback_type == \"positive\":\n", " send_feedback(span_id, 1, user_id)\n", " return \"Thanks for your positive feedback! We'll use it to improve our assistant.\"\n", " else: # negative feedback\n", " send_feedback(span_id, 0, user_id)\n", " run_llm_eval(span_id, message, response)\n", " return \"Thanks for your feedback. We'll work on improving this type of response.\"\n", "\n", " with gr.Blocks() as demo:\n", " gr.HTML(\"<h3>Encyclopedia Chatbot</h3>\")\n", " gr.HTML(\n", " \"<p>Welcome to the Encyclopedia Chatbot. Ask any question about the world, and provide feedback to help us improve!</p>\"\n", " )\n", "\n", " user_id = gr.Dropdown(\n", " choices=[\"user1\", \"user2\", \"user3\", \"user4\", \"user5\"], value=\"user1\", label=\"User ID\"\n", " )\n", "\n", " chatbot = gr.Chatbot(height=400)\n", " msg = gr.Textbox(placeholder=\"Type your message here...\")\n", "\n", " # Hidden state to store the current span_id\n", " current_span_id = gr.State(\"\")\n", " feedback_message = gr.Markdown(\"\")\n", "\n", " def respond(message, chat_history, user_id):\n", " # Get bot response\n", " bot_response, span_id = chat_response(message, chat_history, user_id)\n", "\n", " # Update chat history\n", " chat_history.append((message, bot_response))\n", "\n", " return \"\", chat_history, span_id\n", "\n", " # Send button\n", " msg.submit(respond, [msg, chatbot, user_id], [msg, chatbot, current_span_id])\n", "\n", " with gr.Row():\n", " thumbs_up = gr.Button(\"👍\", scale=1)\n", " thumbs_down = gr.Button(\"👎\", scale=1)\n", "\n", " # Feedback handlers\n", " def handle_positive_feedback(span_id, chat_history, user_id):\n", " if not chat_history:\n", " return \"No message to provide feedback on.\"\n", "\n", " last_user_msg, last_bot_msg = chat_history[-1]\n", " return submit_feedback(\"positive\", span_id, last_user_msg, last_bot_msg, user_id)\n", "\n", " def handle_negative_feedback(span_id, chat_history, user_id):\n", " if not chat_history:\n", " return \"No message to provide feedback on.\"\n", "\n", " last_user_msg, last_bot_msg = chat_history[-1]\n", " return submit_feedback(\"negative\", span_id, last_user_msg, last_bot_msg, user_id)\n", "\n", " thumbs_up.click(\n", " handle_positive_feedback, [current_span_id, chatbot, user_id], feedback_message\n", " )\n", "\n", " thumbs_down.click(\n", " handle_negative_feedback, [current_span_id, chatbot, user_id], feedback_message\n", " )\n", "\n", " return demo\n", "\n", "\n", "# Create and display the Gradio interface\n", "demo = create_gradio_chat()\n", "demo.launch(inline=True, share=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Analyze feedback using the Phoenix Client\n", "\n", "We can use the Phoenix client to pull the annotated spans. By combining `get_spans_dataframe`\n", "and `get_span_annotations_dataframe` we can create a dataframe of all annotations alongside\n", "span data for analysis!" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "spans_df = client.spans.get_spans_dataframe(project_identifier=os.environ[\"PHOENIX_PROJECT_NAME\"])\n", "annotations_df = client.spans.get_span_annotations_dataframe(\n", " spans_dataframe=spans_df, project_identifier=os.environ[\"PHOENIX_PROJECT_NAME\"]\n", ")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "annotations_df.join(spans_df, how=\"inner\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "client.spans.get_span_annotations(\n", " span_ids=spans_df.index, project_identifier=os.environ[\"PHOENIX_PROJECT_NAME\"]\n", ")" ] } ], "metadata": { "language_info": { "name": "python" } }, "nbformat": 4, "nbformat_minor": 2 }

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/Arize-ai/phoenix'

If you have feedback or need assistance with the MCP directory API, please join our Discord server