{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "dJeZx71wW7kc"
},
"source": [
"<center>\n",
" <p style=\"text-align:center\">\n",
" <img alt=\"phoenix logo\" src=\"https://storage.googleapis.com/arize-assets/phoenix/assets/phoenix-logo-light.svg\" width=\"200\"/>\n",
" <br>\n",
" <a href=\"https://arize.com/docs/phoenix/\">Docs</a>\n",
" |\n",
" <a href=\"https://github.com/Arize-ai/phoenix\">GitHub</a>\n",
" |\n",
" <a href=\"https://arize-ai.slack.com/join/shared_invite/zt-2w57bhem8-hq24MB6u7yE_ZF_ilOYSBw#/shared-invite/email\">Community</a>\n",
" </p>\n",
"</center>\n",
"<h1 align=\"center\">User Frustration Evals</h1>\n",
"\n",
"Arize provides tooling to evaluate LLM applications, including tools to determine if a user became frustrated during a conversation with an AI assistant.\n",
"\n",
"The purpose of this notebook is:\n",
"\n",
"- to evaluate the performance of LLM-assisted user frustration detection\n",
"- to provide an experimental framework for users to iterate and improve on the default classification template.\n",
"\n",
"## Install Dependencies and Import Libraries"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"id": "bo5abN-5W7kf"
},
"outputs": [],
"source": [
"#####################\n",
"## N_EVAL_SAMPLE_SIZE\n",
"#####################\n",
"# Eval sample size determines the run time\n",
"# 100 samples: GPT-4 ~80 sec / GPT-3.5 ~40 sec\n",
"# 1,000 samples: GPT-4 ~15-17 min / GPT-3.5 ~6-7 min (depending on retries)\n",
"# 10,000 samples: GPT-4 ~170 min / GPT-3.5 ~70 min\n",
"N_EVAL_SAMPLE_SIZE = 100"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "0lCBpUmpW7kg",
"outputId": "cf2d2195-13ee-4128-e439-c4556780cf53"
},
"outputs": [],
"source": [
"!pip install -qqq \"arize-phoenix\" \"openai>=1\" ipython matplotlib pycm scikit-learn tiktoken openinference-instrumentation-openai 'httpx<0.28'"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"id": "K-pQ_E9cW7kg"
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/Users/jasonlopatecki/vs_projects/haystack-venv/lib/python3.11/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
" from .autonotebook import tqdm as notebook_tqdm\n"
]
}
],
"source": [
"import os\n",
"from getpass import getpass\n",
"\n",
"import matplotlib.pyplot as plt\n",
"import openai\n",
"import pandas as pd\n",
"from pycm import ConfusionMatrix\n",
"from sklearn.metrics import classification_report\n",
"\n",
"from phoenix.evals import (\n",
" USER_FRUSTRATION_PROMPT_RAILS_MAP,\n",
" USER_FRUSTRATION_PROMPT_TEMPLATE,\n",
" OpenAIModel,\n",
" llm_classify,\n",
")\n",
"\n",
"pd.set_option(\"display.max_colwidth\", None)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"OrderedDict([(True, 'frustrated'), (False, 'ok')])"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"USER_FRUSTRATION_PROMPT_RAILS_MAP"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "IXpQwB9oRNTw"
},
"source": [
"## Download Benchmark Dataset"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 397
},
"id": "e1VVyWz0RNuO",
"outputId": "5b0fcb0d-5701-4c95-a67f-9c3c3c17fa3d"
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>conversation</th>\n",
" <th>frustrated</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>CUSTOMER: Hi, I seem to be having trouble with my internet service. It's been cutting in and out all morning.\\n\\nASSISTANT: I'm sorry to hear that you're experiencing this issue. Can you tell me more about the problem?\\n\\nCUSTOMER: It keeps disconnecting, despite resetting the modem multiple times. I've been trying to work online and it's becoming increasingly frustrating.\\n\\nASSISTANT: I appreciate your patience. Have you tried performing a speed test on your connection?\\n\\nCUSTOMER: Yes, I have. The results were all over the place, there's clearly instability. Can I speak to a live customer service rep about this?\\n\\nASSISTANT: Of course, let me connect you to our live support team who will be able to assist you further.</td>\n",
" <td>True</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>CUSTOMER: Hi, I am experiencing some issues with the software I recently subscribed to. \\n\\nASSISTANT: I'm sorry to hear that you are having trouble. Can you describe the problem you're facing?\\n\\nCUSTOMER: Well, I am unable to sign in. The system does not recognize my email or password.\\n\\nASSISTANT: That's odd. I'm sure we can figure this out. Can you please confirm your email? \\n\\nCUSTOMER: Look, I don't think you get it. I did not receive the confirmation email you promised at registration. That is the issue!\\n\\nASSISTANT: I apologize for this oversight. I'll ensure the email is sent immediately. Please, check your inbox shortly.</td>\n",
" <td>True</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>CUSTOMER: Hi, I'm having issues with my insurance policy claim.\\n\\nASSISTANT: I'm sorry to hear you're having trouble. Can you elaborate on the issue?\\n\\nCUSTOMER: Well, I submitted my claim two weeks ago and haven't received any updates yet.\\n\\nASSISTANT: I understand your concern. Let me check the status of your claim.\\n\\nCUSTOMER: I have been waiting for a while, can I speak with a live customer service representative instead?\\n\\nASSISTANT: Absolutely, allow me to connect you to a live agent who can assist you further.</td>\n",
" <td>True</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>CUSTOMER: Hi, I'm having problems with your online booking service.\\n\\nASSISTANT: I'm sorry to hear that you are having issues. Can you elaborate on what's happening?\\n\\nCUSTOMER: Well, the system is not letting me sign in to complete a reservation.\\n\\nASSISTANT: I see, have you tried using the 'Forgot Password' option to reset your password?\\n\\nCUSTOMER: I've done that twice now and I'm still unable to sign in.\\n\\nASSISTANT: My apologies for the inconvenience. Would you like me to connect you with a live customer service representative?\\n</td>\n",
" <td>True</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>CUSTOMER: Hi, my new Fitness Band isn't tracking my heart rate correctly. \\n\\nASSISTANT: Can you elaborate on the problem that is occurring with the heart rate tracker? \\n\\nCUSTOMER: Yes, the numbers are fluctuating way too much. It got up to 190 bpm during a light jog.\\n\\nASSISTANT: Have you tried resetting the band? \\n\\nCUSTOMER: Yes, several times actually. Nothing seems to resolve this. Can I please speak directly with someone who can handle this?\\n\\nASSISTANT: Of course, I'm sorry for your inconvenience. Let me connect you with our technical support team.\\n</td>\n",
" <td>True</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" conversation \\\n",
"0 CUSTOMER: Hi, I seem to be having trouble with my internet service. It's been cutting in and out all morning.\\n\\nASSISTANT: I'm sorry to hear that you're experiencing this issue. Can you tell me more about the problem?\\n\\nCUSTOMER: It keeps disconnecting, despite resetting the modem multiple times. I've been trying to work online and it's becoming increasingly frustrating.\\n\\nASSISTANT: I appreciate your patience. Have you tried performing a speed test on your connection?\\n\\nCUSTOMER: Yes, I have. The results were all over the place, there's clearly instability. Can I speak to a live customer service rep about this?\\n\\nASSISTANT: Of course, let me connect you to our live support team who will be able to assist you further. \n",
"1 CUSTOMER: Hi, I am experiencing some issues with the software I recently subscribed to. \\n\\nASSISTANT: I'm sorry to hear that you are having trouble. Can you describe the problem you're facing?\\n\\nCUSTOMER: Well, I am unable to sign in. The system does not recognize my email or password.\\n\\nASSISTANT: That's odd. I'm sure we can figure this out. Can you please confirm your email? \\n\\nCUSTOMER: Look, I don't think you get it. I did not receive the confirmation email you promised at registration. That is the issue!\\n\\nASSISTANT: I apologize for this oversight. I'll ensure the email is sent immediately. Please, check your inbox shortly. \n",
"2 CUSTOMER: Hi, I'm having issues with my insurance policy claim.\\n\\nASSISTANT: I'm sorry to hear you're having trouble. Can you elaborate on the issue?\\n\\nCUSTOMER: Well, I submitted my claim two weeks ago and haven't received any updates yet.\\n\\nASSISTANT: I understand your concern. Let me check the status of your claim.\\n\\nCUSTOMER: I have been waiting for a while, can I speak with a live customer service representative instead?\\n\\nASSISTANT: Absolutely, allow me to connect you to a live agent who can assist you further. \n",
"3 CUSTOMER: Hi, I'm having problems with your online booking service.\\n\\nASSISTANT: I'm sorry to hear that you are having issues. Can you elaborate on what's happening?\\n\\nCUSTOMER: Well, the system is not letting me sign in to complete a reservation.\\n\\nASSISTANT: I see, have you tried using the 'Forgot Password' option to reset your password?\\n\\nCUSTOMER: I've done that twice now and I'm still unable to sign in.\\n\\nASSISTANT: My apologies for the inconvenience. Would you like me to connect you with a live customer service representative?\\n \n",
"4 CUSTOMER: Hi, my new Fitness Band isn't tracking my heart rate correctly. \\n\\nASSISTANT: Can you elaborate on the problem that is occurring with the heart rate tracker? \\n\\nCUSTOMER: Yes, the numbers are fluctuating way too much. It got up to 190 bpm during a light jog.\\n\\nASSISTANT: Have you tried resetting the band? \\n\\nCUSTOMER: Yes, several times actually. Nothing seems to resolve this. Can I please speak directly with someone who can handle this?\\n\\nASSISTANT: Of course, I'm sorry for your inconvenience. Let me connect you with our technical support team.\\n \n",
"\n",
" frustrated \n",
"0 True \n",
"1 True \n",
"2 True \n",
"3 True \n",
"4 True "
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df = pd.read_parquet(\n",
" \"https://storage.googleapis.com/arize-assets/phoenix/evals/user_frustration-classification/example-user-frustration-dataset.parquet\"\n",
")\n",
"df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "CygC8n-XXOr3"
},
"source": [
"## Phoenix UI for Evals Debugging\n",
"Click the link below to view the evals in the Phoenix UI. Phoenix runs locally on the Colab server and collects OpenAI calls as the evals library makes them."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 71
},
"id": "ab44gf2hXM1S",
"outputId": "a2c967f9-7392-47be-b684-752f85ba9d63"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"🌍 To view the Phoenix app in your browser, visit http://localhost:6006/\n",
"📺 To view the Phoenix app in a notebook, run `px.active_session().view()`\n",
"📖 For more information on how to use Phoenix, check out https://arize.com/docs/phoenix\n"
]
}
],
"source": [
"from openinference.instrumentation.openai import OpenAIInstrumentor\n",
"\n",
"import phoenix as px\n",
"from phoenix.otel import register\n",
"\n",
"(session := px.launch_app()).view()\n",
"tracer_provider = register()\n",
"OpenAIInstrumentor(tracer_provider=tracer_provider).instrument()"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "ThbviZTrW7ki"
},
"source": [
"## Display User Frustration Classification Template\n",
"\n",
"View the default template used to classify user frustration. You can tweak this template and evaluate its performance relative to the default.\n"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "xHIYx0FEW7ki",
"outputId": "f1f7a055-18d1-46ac-9225-db1b4bfeacc0"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
" You are given a conversation where between a user and an assistant.\n",
" Here is the conversation:\n",
" [BEGIN DATA]\n",
" *****************\n",
" Conversation:\n",
" {conversation}\n",
" *****************\n",
" [END DATA]\n",
"\n",
" Examine the conversation and determine whether or not the user got frustrated from the experience.\n",
" Frustration can range from midly frustrated to extremely frustrated. If the user seemed frustrated\n",
" at the beginning of the conversation but seemed satisfied at the end, they should not be deemed\n",
" as frustrated. Focus on how the user left the conversation.\n",
"\n",
" Your response must be a single word, either \"frustrated\" or \"ok\", and should not\n",
" contain any text or characters aside from that word. \"frustrated\" means the user was left\n",
" frustrated as a result of the conversation. \"ok\" means that the user did not get frustrated\n",
" from the conversation.\n",
"\n"
]
}
],
"source": [
"print(USER_FRUSTRATION_PROMPT_TEMPLATE)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "TkqPkZuAW7kj"
},
"source": [
"The template variables are:\n",
"\n",
"- **conversation:** the chat conversation between a user and an assistant."
]
},
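{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a rough illustration of how a template variable is filled, the sketch below substitutes one conversation into a `{conversation}` placeholder with plain string formatting. This is not Phoenix's implementation: `TEMPLATE` is a hypothetical, shortened stand-in for the real template, and `build_prompt` is a hypothetical helper name.\n",
"\n",
"```python\n",
"# Hypothetical, shortened stand-in for USER_FRUSTRATION_PROMPT_TEMPLATE.\n",
"# Only the {conversation} placeholder is taken from the notebook.\n",
"TEMPLATE = '''Conversation:\n",
"{conversation}\n",
"Answer with a single word: frustrated or ok.'''\n",
"\n",
"\n",
"def build_prompt(conversation):\n",
"    # Substitute the conversation text into the {conversation} placeholder.\n",
"    return TEMPLATE.format(conversation=conversation)\n",
"\n",
"\n",
"prompt = build_prompt('CUSTOMER: My order is late. ASSISTANT: Sorry, let me check.')\n",
"print(prompt)\n",
"```\n",
"\n",
"In the notebook itself, the value for `{conversation}` comes from the `conversation` column of the dataframe passed to `llm_classify`, one prompt per row."
]
},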
{
"cell_type": "markdown",
"metadata": {
"id": "VaJES2g2W7kj"
},
"source": [
"## Configure the LLM\n",
"\n",
"Configure your OpenAI API key."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "bXpXaQ4BW7kj",
"outputId": "3a2c12a2-17da-4d90-9e9a-0de7dc1260e9"
},
"outputs": [],
"source": [
"if not (openai_api_key := os.getenv(\"OPENAI_API_KEY\")):\n",
" openai_api_key = getpass(\"🔑 Enter your OpenAI API key: \")\n",
"openai.api_key = openai_api_key\n",
"os.environ[\"OPENAI_API_KEY\"] = openai_api_key"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "FH-pm8_gW7kj"
},
"source": [
"## Benchmark Dataset Sample\n",
"Sample size determines run time. It's recommended to start with a small sample (e.g., 100 data points) and iterate from there."
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "zR4NXdeGW7kk",
"outputId": "74be4ab6-dd7b-493d-b47d-85be5e42ac3c"
},
"outputs": [
{
"data": {
"text/plain": [
"(100, 2)"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_sample = df.sample(n=N_EVAL_SAMPLE_SIZE).reset_index(drop=True)\n",
"df_sample.shape"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "zZBUZXK8W7kk"
},
"source": [
"## LLM Evals: User Frustration Classifications GPT-4\n",
"Run user frustration classifications against a subset of the data.\n",
"Instantiate the LLM and set parameters."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "i3xNu2J4W7kk",
"outputId": "a2970668-aa39-4c88-a74a-453225dad198"
},
"outputs": [],
"source": [
"model = OpenAIModel(\n",
" model=\"gpt-4\",\n",
" temperature=0.0,\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 35
},
"id": "MbG8V-fhW7kk",
"outputId": "7fb2fd0b-fc1a-4e3d-c2d6-13cc2de41c35"
},
"outputs": [
{
"data": {
"text/plain": [
"\"Hello! I'm working perfectly. How can I assist you today?\""
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"model(\"Hello world, this is a test if you are working?\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "pj4yLqm5W7kk"
},
"source": [
"## Run User Frustration Classifications\n",
"\n",
"Run user frustration classifications against a subset of the data.\n",
"\n",
"When verbose mode is enabled (`verbose=True`), `llm_classify` prints rate-limit handling and railing details\n",
"(railing means cleaning up the raw text output so it matches one of a fixed set of values).\n",
"\n",
"For example, rails turn a messy text output like \"frustrated...\" into \"frustrated\".\n"
]
},
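{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the idea of railing concrete, here is a minimal sketch of snapping a raw model output to one of the allowed labels. It is an assumption-laden illustration, not the logic `llm_classify` actually uses: `snap_to_rails` is a hypothetical helper, and the `NOT_PARSABLE` fallback string is chosen here only for the example.\n",
"\n",
"```python\n",
"RAILS = ['frustrated', 'ok']\n",
"\n",
"\n",
"def snap_to_rails(raw_output, rails, default='NOT_PARSABLE'):\n",
"    # Lowercase and keep only letters and spaces, so 'Frustrated...' becomes 'frustrated'.\n",
"    kept = [ch for ch in raw_output.lower() if ch.isalpha() or ch.isspace()]\n",
"    cleaned = ''.join(kept).strip()\n",
"    # Accept the cleaned text only if it is exactly one of the rails.\n",
"    return cleaned if cleaned in rails else default\n",
"\n",
"\n",
"print(snap_to_rails('frustrated...', RAILS))\n",
"print(snap_to_rails('OK,,,', RAILS))\n",
"print(snap_to_rails('maybe?', RAILS))\n",
"```\n",
"\n",
"Requiring an exact match after cleaning is a deliberately strict choice for this sketch: an answer that fits no rail is flagged with the fallback value rather than guessed."
]
},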
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"id": "TAVaQvVJQAbE"
},
"outputs": [],
"source": [
"import nest_asyncio\n",
"\n",
"nest_asyncio.apply()"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"OrderedDict([(True, 'frustrated'), (False, 'ok')])"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"USER_FRUSTRATION_PROMPT_RAILS_MAP"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 1000,
"referenced_widgets": [
"90aa2c05581b437481faab5c6d0f4ec2",
"789d884dbb9547b2871a7f4e0b3d4c82",
"d37a3ba29c264c50a4263407e2b175ff",
"81b61a82d6ad4b3d852941fe1b62e704",
"a9b25926eaba49a294f11d5959ff1126",
"58a5bc20d66c47d393fdbdc7c3fc1c34",
"821bcdb367604e998d80f0ed92aa4c44",
"bb24856156f04436b17b3d102d3c2056",
"72375a9bd56448b38134735ee2507e9a",
"76b4ee1c0ec342d580a8e0dc8f3bdad2",
"62db587137984e47825f6d8d18deac9a"
]
},
"id": "944__QIAW7kl",
"outputId": "f3d8b8a2-7072-4ea2-ea22-f8aede1f8042"
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"llm_classify | | 0/100 (0.0%) | ⏳ 00:00<? | ?it/s"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"llm_classify |██████████| 100/100 (100.0%) | ⏳ 00:03<00:00 | 28.89it/s\n"
]
}
],
"source": [
"# Rails constrain the output to the specific values defined by the template.\n",
"# They strip extra text such as \",,,\" or \"...\"\n",
"# and ensure the binary value expected from the template is returned.\n",
"rails = list(USER_FRUSTRATION_PROMPT_RAILS_MAP.values())\n",
"\n",
"frustration_classifications = llm_classify(\n",
" dataframe=df_sample,\n",
" template=USER_FRUSTRATION_PROMPT_TEMPLATE,\n",
" model=model,\n",
" rails=rails,\n",
" concurrency=20,\n",
")[\"label\"].tolist()"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "BSaAQlz1W7kl"
},
"source": [
"## Evaluate Classifications\n",
"\n",
"Evaluate the predictions against human-labeled ground-truth user frustration labels."
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 651
},
"id": "jB6y9j4BW7kl",
"outputId": "3e09f1b0-1268-4121-9894-2a3ed2e859e1"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" precision recall f1-score support\n",
"\n",
" frustrated 1.00 0.80 0.89 49\n",
" ok 0.84 1.00 0.91 51\n",
"\n",
" accuracy 0.90 100\n",
" macro avg 0.92 0.90 0.90 100\n",
"weighted avg 0.92 0.90 0.90 100\n",
"\n"
]
},
{
"data": {
"text/plain": [
"<Axes: title={'center': 'Confusion Matrix (Normalized)'}, xlabel='Predicted Classes', ylabel='Actual Classes'>"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "",
"text/plain": [
"<Figure size 640x480 with 2 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"true_labels = df_sample[\"frustrated\"].map(USER_FRUSTRATION_PROMPT_RAILS_MAP).tolist()\n",
"\n",
"print(classification_report(true_labels, frustration_classifications, labels=rails))\n",
"confusion_matrix = ConfusionMatrix(\n",
" actual_vector=true_labels, predict_vector=frustration_classifications, classes=rails\n",
")\n",
"confusion_matrix.plot(\n",
" cmap=plt.colormaps[\"Blues\"],\n",
" number_label=True,\n",
" normalized=True,\n",
")"
]
},
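{
"cell_type": "markdown",
"metadata": {},
"source": [
"For readers who want to see what the per-class numbers in the report mean, the sketch below computes precision, recall, and F1 for one class from two label lists using the standard definitions. The helper name `precision_recall_f1` and the toy labels are made up for illustration and are not part of the benchmark data.\n",
"\n",
"```python\n",
"def precision_recall_f1(true_labels, predicted_labels, positive):\n",
"    pairs = list(zip(true_labels, predicted_labels))\n",
"    # Count true positives, false positives, and false negatives for the chosen class.\n",
"    tp = sum(1 for t, p in pairs if t == positive and p == positive)\n",
"    fp = sum(1 for t, p in pairs if t != positive and p == positive)\n",
"    fn = sum(1 for t, p in pairs if t == positive and p != positive)\n",
"    precision = tp / (tp + fp) if tp + fp else 0.0\n",
"    recall = tp / (tp + fn) if tp + fn else 0.0\n",
"    total = precision + recall\n",
"    f1 = 2 * precision * recall / total if total else 0.0\n",
"    return precision, recall, f1\n",
"\n",
"\n",
"# Toy labels, made up for illustration only.\n",
"toy_true = ['frustrated', 'frustrated', 'frustrated', 'frustrated', 'ok', 'ok']\n",
"toy_pred = ['frustrated', 'frustrated', 'frustrated', 'ok', 'ok', 'ok']\n",
"precision, recall, f1 = precision_recall_f1(toy_true, toy_pred, 'frustrated')\n",
"print(precision, recall, round(f1, 3))\n",
"```\n",
"\n",
"The toy data mirrors the shape of the GPT-4 report above: a precision of 1.00 with a recall of 0.80 for `frustrated` means every conversation predicted as frustrated was labeled frustrated, while about a fifth of the labeled-frustrated conversations were predicted `ok`."
]
},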
{
"cell_type": "markdown",
"metadata": {
"id": "WizGtDCbW7kl"
},
"source": [
"## LLM Evals: User Frustration Classifications GPT-3.5\n",
"Run user frustration classifications against a subset of the data using GPT-3.5."
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"id": "Sftw-qP3W7kl"
},
"outputs": [],
"source": [
"model = OpenAIModel(model=\"gpt-3.5-turbo\", temperature=0.0, request_timeout=20)"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 49,
"referenced_widgets": [
"522e8eb0b35847f2bd02c2cc1b5991e7",
"e9081b98ab0444b28ec184856072a1ef",
"071b56bd8fb34b368602d29bc1707cdd",
"547dd6eb2ad14f01a8703116656d4301",
"be387a3166664a14bf9300a425d833be",
"b13aa9b245284d8e8d5bc54e7c78de5a",
"1ff8614880b34909b8a782693d389e2c",
"741e9f62186445b58e135b3400fd3e57",
"36ca785ab8b84f3983194627dfebee9e",
"8c8f5900feaf4d17bf178e465c764060",
"58345bcb04ce4aed8dc6700160133cbc"
]
},
"id": "lDPyaPbDW7kl",
"outputId": "7fac6234-d231-43e3-9172-41c99bee97e3"
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"llm_classify |██████████| 100/100 (100.0%) | ⏳ 00:19<00:00 | 5.13it/s\n"
]
}
],
"source": [
"rails = list(USER_FRUSTRATION_PROMPT_RAILS_MAP.values())\n",
"\n",
"frustration_classifications = llm_classify(\n",
" dataframe=df_sample,\n",
" template=USER_FRUSTRATION_PROMPT_TEMPLATE,\n",
" model=model,\n",
" rails=rails,\n",
" verbose=False,\n",
")[\"label\"].tolist()"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 651
},
"id": "Ize44U4EW7km",
"outputId": "326119b8-9c82-43b6-9be1-a125b180e77b"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" precision recall f1-score support\n",
"\n",
" frustrated 0.98 0.96 0.97 49\n",
" ok 0.96 0.98 0.97 51\n",
"\n",
" accuracy 0.97 100\n",
" macro avg 0.97 0.97 0.97 100\n",
"weighted avg 0.97 0.97 0.97 100\n",
"\n"
]
},
{
"data": {
"text/plain": [
"<Axes: title={'center': 'Confusion Matrix (Normalized)'}, xlabel='Predicted Classes', ylabel='Actual Classes'>"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "",
"text/plain": [
"<Figure size 640x480 with 2 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"true_labels = df_sample[\"frustrated\"].map(USER_FRUSTRATION_PROMPT_RAILS_MAP).tolist()\n",
"\n",
"print(classification_report(true_labels, frustration_classifications, labels=rails))\n",
"confusion_matrix = ConfusionMatrix(\n",
" actual_vector=true_labels, predict_vector=frustration_classifications, classes=rails\n",
")\n",
"confusion_matrix.plot(\n",
" cmap=plt.colormaps[\"Blues\"],\n",
" number_label=True,\n",
" normalized=True,\n",
")"
]
}
],
"metadata": {
"colab": {
"provenance": []
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.6"
},
"widgets": {
"application/vnd.jupyter.widget-state+json": {
"071b56bd8fb34b368602d29bc1707cdd": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.5.0",
"model_name": "FloatProgressModel",
"state": {
"_dom_classes": [],
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "FloatProgressModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/controls",
"_view_module_version": "1.5.0",
"_view_name": "ProgressView",
"bar_style": "success",
"description": "",
"description_tooltip": null,
"layout": "IPY_MODEL_741e9f62186445b58e135b3400fd3e57",
"max": 100,
"min": 0,
"orientation": "horizontal",
"style": "IPY_MODEL_36ca785ab8b84f3983194627dfebee9e",
"value": 100
}
},
"1ff8614880b34909b8a782693d389e2c": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.5.0",
"model_name": "DescriptionStyleModel",
"state": {
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "DescriptionStyleModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "StyleView",
"description_width": ""
}
},
"36ca785ab8b84f3983194627dfebee9e": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.5.0",
"model_name": "ProgressStyleModel",
"state": {
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "ProgressStyleModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "StyleView",
"bar_color": null,
"description_width": ""
}
},
"522e8eb0b35847f2bd02c2cc1b5991e7": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.5.0",
"model_name": "HBoxModel",
"state": {
"_dom_classes": [],
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "HBoxModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/controls",
"_view_module_version": "1.5.0",
"_view_name": "HBoxView",
"box_style": "",
"children": [
"IPY_MODEL_e9081b98ab0444b28ec184856072a1ef",
"IPY_MODEL_071b56bd8fb34b368602d29bc1707cdd",
"IPY_MODEL_547dd6eb2ad14f01a8703116656d4301"
],
"layout": "IPY_MODEL_be387a3166664a14bf9300a425d833be"
}
},
"547dd6eb2ad14f01a8703116656d4301": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.5.0",
"model_name": "HTMLModel",
"state": {
"_dom_classes": [],
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "HTMLModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/controls",
"_view_module_version": "1.5.0",
"_view_name": "HTMLView",
"description": "",
"description_tooltip": null,
"layout": "IPY_MODEL_8c8f5900feaf4d17bf178e465c764060",
"placeholder": "",
"style": "IPY_MODEL_58345bcb04ce4aed8dc6700160133cbc",
"value": " 100/100 (100.0%) | ⏳ 01:04<00:00 | 1.72it/s"
}
},
"58345bcb04ce4aed8dc6700160133cbc": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.5.0",
"model_name": "DescriptionStyleModel",
"state": {
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "DescriptionStyleModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "StyleView",
"description_width": ""
}
},
"58a5bc20d66c47d393fdbdc7c3fc1c34": {
"model_module": "@jupyter-widgets/base",
"model_module_version": "1.2.0",
"model_name": "LayoutModel",
"state": {
"_model_module": "@jupyter-widgets/base",
"_model_module_version": "1.2.0",
"_model_name": "LayoutModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "LayoutView",
"align_content": null,
"align_items": null,
"align_self": null,
"border": null,
"bottom": null,
"display": null,
"flex": null,
"flex_flow": null,
"grid_area": null,
"grid_auto_columns": null,
"grid_auto_flow": null,
"grid_auto_rows": null,
"grid_column": null,
"grid_gap": null,
"grid_row": null,
"grid_template_areas": null,
"grid_template_columns": null,
"grid_template_rows": null,
"height": null,
"justify_content": null,
"justify_items": null,
"left": null,
"margin": null,
"max_height": null,
"max_width": null,
"min_height": null,
"min_width": null,
"object_fit": null,
"object_position": null,
"order": null,
"overflow": null,
"overflow_x": null,
"overflow_y": null,
"padding": null,
"right": null,
"top": null,
"visibility": null,
"width": null
}
},
"62db587137984e47825f6d8d18deac9a": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.5.0",
"model_name": "DescriptionStyleModel",
"state": {
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "DescriptionStyleModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "StyleView",
"description_width": ""
}
},
"72375a9bd56448b38134735ee2507e9a": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.5.0",
"model_name": "ProgressStyleModel",
"state": {
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "ProgressStyleModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "StyleView",
"bar_color": null,
"description_width": ""
}
},
"741e9f62186445b58e135b3400fd3e57": {
"model_module": "@jupyter-widgets/base",
"model_module_version": "1.2.0",
"model_name": "LayoutModel",
"state": {
"_model_module": "@jupyter-widgets/base",
"_model_module_version": "1.2.0",
"_model_name": "LayoutModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "LayoutView",
"align_content": null,
"align_items": null,
"align_self": null,
"border": null,
"bottom": null,
"display": null,
"flex": null,
"flex_flow": null,
"grid_area": null,
"grid_auto_columns": null,
"grid_auto_flow": null,
"grid_auto_rows": null,
"grid_column": null,
"grid_gap": null,
"grid_row": null,
"grid_template_areas": null,
"grid_template_columns": null,
"grid_template_rows": null,
"height": null,
"justify_content": null,
"justify_items": null,
"left": null,
"margin": null,
"max_height": null,
"max_width": null,
"min_height": null,
"min_width": null,
"object_fit": null,
"object_position": null,
"order": null,
"overflow": null,
"overflow_x": null,
"overflow_y": null,
"padding": null,
"right": null,
"top": null,
"visibility": null,
"width": null
}
},
"76b4ee1c0ec342d580a8e0dc8f3bdad2": {
"model_module": "@jupyter-widgets/base",
"model_module_version": "1.2.0",
"model_name": "LayoutModel",
"state": {
"_model_module": "@jupyter-widgets/base",
"_model_module_version": "1.2.0",
"_model_name": "LayoutModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "LayoutView",
"align_content": null,
"align_items": null,
"align_self": null,
"border": null,
"bottom": null,
"display": null,
"flex": null,
"flex_flow": null,
"grid_area": null,
"grid_auto_columns": null,
"grid_auto_flow": null,
"grid_auto_rows": null,
"grid_column": null,
"grid_gap": null,
"grid_row": null,
"grid_template_areas": null,
"grid_template_columns": null,
"grid_template_rows": null,
"height": null,
"justify_content": null,
"justify_items": null,
"left": null,
"margin": null,
"max_height": null,
"max_width": null,
"min_height": null,
"min_width": null,
"object_fit": null,
"object_position": null,
"order": null,
"overflow": null,
"overflow_x": null,
"overflow_y": null,
"padding": null,
"right": null,
"top": null,
"visibility": null,
"width": null
}
},
"789d884dbb9547b2871a7f4e0b3d4c82": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.5.0",
"model_name": "HTMLModel",
"state": {
"_dom_classes": [],
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "HTMLModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/controls",
"_view_module_version": "1.5.0",
"_view_name": "HTMLView",
"description": "",
"description_tooltip": null,
"layout": "IPY_MODEL_58a5bc20d66c47d393fdbdc7c3fc1c34",
"placeholder": "",
"style": "IPY_MODEL_821bcdb367604e998d80f0ed92aa4c44",
"value": "llm_classify "
}
},
"81b61a82d6ad4b3d852941fe1b62e704": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.5.0",
"model_name": "HTMLModel",
"state": {
"_dom_classes": [],
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "HTMLModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/controls",
"_view_module_version": "1.5.0",
"_view_name": "HTMLView",
"description": "",
"description_tooltip": null,
"layout": "IPY_MODEL_76b4ee1c0ec342d580a8e0dc8f3bdad2",
"placeholder": "",
"style": "IPY_MODEL_62db587137984e47825f6d8d18deac9a",
"value": " 100/100 (100.0%) | ⏳ 00:20<00:00 | 20.76it/s"
}
},
"821bcdb367604e998d80f0ed92aa4c44": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.5.0",
"model_name": "DescriptionStyleModel",
"state": {
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "DescriptionStyleModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "StyleView",
"description_width": ""
}
},
"8c8f5900feaf4d17bf178e465c764060": {
"model_module": "@jupyter-widgets/base",
"model_module_version": "1.2.0",
"model_name": "LayoutModel",
"state": {
"_model_module": "@jupyter-widgets/base",
"_model_module_version": "1.2.0",
"_model_name": "LayoutModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "LayoutView",
"align_content": null,
"align_items": null,
"align_self": null,
"border": null,
"bottom": null,
"display": null,
"flex": null,
"flex_flow": null,
"grid_area": null,
"grid_auto_columns": null,
"grid_auto_flow": null,
"grid_auto_rows": null,
"grid_column": null,
"grid_gap": null,
"grid_row": null,
"grid_template_areas": null,
"grid_template_columns": null,
"grid_template_rows": null,
"height": null,
"justify_content": null,
"justify_items": null,
"left": null,
"margin": null,
"max_height": null,
"max_width": null,
"min_height": null,
"min_width": null,
"object_fit": null,
"object_position": null,
"order": null,
"overflow": null,
"overflow_x": null,
"overflow_y": null,
"padding": null,
"right": null,
"top": null,
"visibility": null,
"width": null
}
},
"90aa2c05581b437481faab5c6d0f4ec2": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.5.0",
"model_name": "HBoxModel",
"state": {
"_dom_classes": [],
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "HBoxModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/controls",
"_view_module_version": "1.5.0",
"_view_name": "HBoxView",
"box_style": "",
"children": [
"IPY_MODEL_789d884dbb9547b2871a7f4e0b3d4c82",
"IPY_MODEL_d37a3ba29c264c50a4263407e2b175ff",
"IPY_MODEL_81b61a82d6ad4b3d852941fe1b62e704"
],
"layout": "IPY_MODEL_a9b25926eaba49a294f11d5959ff1126"
}
},
"a9b25926eaba49a294f11d5959ff1126": {
"model_module": "@jupyter-widgets/base",
"model_module_version": "1.2.0",
"model_name": "LayoutModel",
"state": {
"_model_module": "@jupyter-widgets/base",
"_model_module_version": "1.2.0",
"_model_name": "LayoutModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "LayoutView",
"align_content": null,
"align_items": null,
"align_self": null,
"border": null,
"bottom": null,
"display": null,
"flex": null,
"flex_flow": null,
"grid_area": null,
"grid_auto_columns": null,
"grid_auto_flow": null,
"grid_auto_rows": null,
"grid_column": null,
"grid_gap": null,
"grid_row": null,
"grid_template_areas": null,
"grid_template_columns": null,
"grid_template_rows": null,
"height": null,
"justify_content": null,
"justify_items": null,
"left": null,
"margin": null,
"max_height": null,
"max_width": null,
"min_height": null,
"min_width": null,
"object_fit": null,
"object_position": null,
"order": null,
"overflow": null,
"overflow_x": null,
"overflow_y": null,
"padding": null,
"right": null,
"top": null,
"visibility": null,
"width": null
}
},
"b13aa9b245284d8e8d5bc54e7c78de5a": {
"model_module": "@jupyter-widgets/base",
"model_module_version": "1.2.0",
"model_name": "LayoutModel",
"state": {
"_model_module": "@jupyter-widgets/base",
"_model_module_version": "1.2.0",
"_model_name": "LayoutModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "LayoutView",
"align_content": null,
"align_items": null,
"align_self": null,
"border": null,
"bottom": null,
"display": null,
"flex": null,
"flex_flow": null,
"grid_area": null,
"grid_auto_columns": null,
"grid_auto_flow": null,
"grid_auto_rows": null,
"grid_column": null,
"grid_gap": null,
"grid_row": null,
"grid_template_areas": null,
"grid_template_columns": null,
"grid_template_rows": null,
"height": null,
"justify_content": null,
"justify_items": null,
"left": null,
"margin": null,
"max_height": null,
"max_width": null,
"min_height": null,
"min_width": null,
"object_fit": null,
"object_position": null,
"order": null,
"overflow": null,
"overflow_x": null,
"overflow_y": null,
"padding": null,
"right": null,
"top": null,
"visibility": null,
"width": null
}
},
"bb24856156f04436b17b3d102d3c2056": {
"model_module": "@jupyter-widgets/base",
"model_module_version": "1.2.0",
"model_name": "LayoutModel",
"state": {
"_model_module": "@jupyter-widgets/base",
"_model_module_version": "1.2.0",
"_model_name": "LayoutModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "LayoutView",
"align_content": null,
"align_items": null,
"align_self": null,
"border": null,
"bottom": null,
"display": null,
"flex": null,
"flex_flow": null,
"grid_area": null,
"grid_auto_columns": null,
"grid_auto_flow": null,
"grid_auto_rows": null,
"grid_column": null,
"grid_gap": null,
"grid_row": null,
"grid_template_areas": null,
"grid_template_columns": null,
"grid_template_rows": null,
"height": null,
"justify_content": null,
"justify_items": null,
"left": null,
"margin": null,
"max_height": null,
"max_width": null,
"min_height": null,
"min_width": null,
"object_fit": null,
"object_position": null,
"order": null,
"overflow": null,
"overflow_x": null,
"overflow_y": null,
"padding": null,
"right": null,
"top": null,
"visibility": null,
"width": null
}
},
"be387a3166664a14bf9300a425d833be": {
"model_module": "@jupyter-widgets/base",
"model_module_version": "1.2.0",
"model_name": "LayoutModel",
"state": {
"_model_module": "@jupyter-widgets/base",
"_model_module_version": "1.2.0",
"_model_name": "LayoutModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "LayoutView",
"align_content": null,
"align_items": null,
"align_self": null,
"border": null,
"bottom": null,
"display": null,
"flex": null,
"flex_flow": null,
"grid_area": null,
"grid_auto_columns": null,
"grid_auto_flow": null,
"grid_auto_rows": null,
"grid_column": null,
"grid_gap": null,
"grid_row": null,
"grid_template_areas": null,
"grid_template_columns": null,
"grid_template_rows": null,
"height": null,
"justify_content": null,
"justify_items": null,
"left": null,
"margin": null,
"max_height": null,
"max_width": null,
"min_height": null,
"min_width": null,
"object_fit": null,
"object_position": null,
"order": null,
"overflow": null,
"overflow_x": null,
"overflow_y": null,
"padding": null,
"right": null,
"top": null,
"visibility": null,
"width": null
}
},
"d37a3ba29c264c50a4263407e2b175ff": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.5.0",
"model_name": "FloatProgressModel",
"state": {
"_dom_classes": [],
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "FloatProgressModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/controls",
"_view_module_version": "1.5.0",
"_view_name": "ProgressView",
"bar_style": "",
"description": "",
"description_tooltip": null,
"layout": "IPY_MODEL_bb24856156f04436b17b3d102d3c2056",
"max": 100,
"min": 0,
"orientation": "horizontal",
"style": "IPY_MODEL_72375a9bd56448b38134735ee2507e9a",
"value": 100
}
},
"e9081b98ab0444b28ec184856072a1ef": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.5.0",
"model_name": "HTMLModel",
"state": {
"_dom_classes": [],
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "HTMLModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/controls",
"_view_module_version": "1.5.0",
"_view_name": "HTMLView",
"description": "",
"description_tooltip": null,
"layout": "IPY_MODEL_b13aa9b245284d8e8d5bc54e7c78de5a",
"placeholder": "",
"style": "IPY_MODEL_1ff8614880b34909b8a782693d389e2c",
"value": "llm_classify "
}
}
}
}
},
"nbformat": 4,
"nbformat_minor": 0
}