openai_tracing_tutorial.ipynb•15.8 kB
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<center>\n",
" <p style=\"text-align:center\">\n",
" <img alt=\"phoenix logo\" src=\"https://storage.googleapis.com/arize-phoenix-assets/assets/phoenix-logo-light.svg\" width=\"200\"/>\n",
" <br>\n",
" <a href=\"https://arize.com/docs/phoenix/\">Docs</a>\n",
" |\n",
" <a href=\"https://github.com/Arize-ai/phoenix\">GitHub</a>\n",
" |\n",
" <a href=\"https://arize-ai.slack.com/join/shared_invite/zt-2w57bhem8-hq24MB6u7yE_ZF_ilOYSBw#/shared-invite/email\">Community</a>\n",
" </p>\n",
"</center>\n",
"<h1 align=\"center\">Tracing and Evaluating a Structured Data Extraction Service</h1>\n",
"\n",
"In this tutorial, you will:\n",
"\n",
"- Use OpenAI's [tool calling](https://platform.openai.com/docs/assistants/tools/function-calling) to perform structured data extraction: the task of transforming unstructured input (e.g., user requests in natural language) into structured format (e.g., tabular format),\n",
"- Instrument your OpenAI client to record trace data in [OpenInference tracing](https://github.com/Arize-ai/open-inference-spec/blob/main/trace/spec/traces.md) format,\n",
"- Inspect the traces and spans of your application to visualize your trace data,\n",
"- Export your trace data to run an evaluation on the quality of your structured extractions.\n",
"\n",
"## Background\n",
"\n",
"One powerful feature of the OpenAI chat completions API is tool calling, wherein a user describes the signature and arguments of one or more functions to the OpenAI API via a JSON Schema and natural language descriptions, and the LLM decides when to call each function and provides argument values depending on the context of the conversation. In addition to its primary purpose of integrating function inputs and outputs into a sequence of chat messages, function calling is also useful for structured data extraction, since you can specify a \"function\" that describes the desired format of your structured output. Structured data extraction is useful for a variety of purposes, including ETL or as input to another machine learning model such as a recommender system.\n",
"\n",
"While it's possible to produce structured output without using function calling via careful prompting, function calling is more reliable at producing output that conforms to a particular format. For more details on OpenAI's function calling API, see the [OpenAI documentation](https://platform.openai.com/docs).\n",
"\n",
"Let's get started!\n",
"\n",
"ℹ️ This notebook requires an OpenAI API key."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Install Dependencies and Import Libraries\n",
"\n",
"Install dependencies."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!pip install \"openai>=1.0.0\" arize-phoenix arize-phoenix-client jsonschema openinference-instrumentation-openai 'httpx<0.28'"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Import libraries."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import json\n",
"import os\n",
"from getpass import getpass\n",
"from typing import Any, Dict, Literal, TypedDict\n",
"\n",
"import jsonschema\n",
"import pandas as pd\n",
"from openai import OpenAI\n",
"\n",
"pd.set_option(\"display.max_colwidth\", None)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Configure Your OpenAI API Key and Instantiate Your OpenAI Client\n",
"\n",
"Set your OpenAI API key if it is not already set as an environment variable."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"if not (openai_api_key := os.getenv(\"OPENAI_API_KEY\")):\n",
" openai_api_key = getpass(\"🔑 Enter your OpenAI API key: \")\n",
"client = OpenAI(api_key=openai_api_key)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. Instrument Your OpenAI Client\n",
"\n",
"Instrument your OpenAI client with a tracer that emits telemetry data in OpenInference format. [OpenInference](https://arize-ai.github.io/open-inference-spec/trace/) is an open standard for capturing and storing LLM application traces that enables LLM applications to seamlessly integrate with LLM observability solutions such as Phoenix."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from openinference.instrumentation.openai import OpenAIInstrumentor\n",
"\n",
"from phoenix.otel import register\n",
"\n",
"tracer_provider = register()\n",
"OpenAIInstrumentor().instrument(tracer_provider=tracer_provider)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4. Run Phoenix in the Background\n",
"\n",
"Launch Phoenix as a background session to collect the trace data emitted by your instrumented OpenAI client."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import phoenix as px\n",
"\n",
"(session := px.launch_app()).view()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 5. Extract Your Structured Data\n",
"\n",
"We'll extract structured data from the following list of ten travel requests."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"travel_requests = [\n",
" \"Can you recommend a luxury hotel in Tokyo with a view of Mount Fuji for a romantic honeymoon?\",\n",
" \"I'm looking for a mid-range hotel in London with easy access to public transportation for a solo backpacking trip. Any suggestions?\",\n",
" \"I need a budget-friendly hotel in San Francisco close to the Golden Gate Bridge for a family vacation. What do you recommend?\",\n",
" \"Can you help me find a boutique hotel in New York City with a rooftop bar for a cultural exploration trip?\",\n",
" \"I'm planning a business trip to Tokyo and I need a hotel near the financial district. What options are available?\",\n",
" \"I'm traveling to London for a solo vacation and I want to stay in a trendy neighborhood with great shopping and dining options. Any recommendations for hotels?\",\n",
" \"I'm searching for a luxury beachfront resort in San Francisco for a relaxing family vacation. Can you suggest any options?\",\n",
" \"I need a mid-range hotel in New York City with a fitness center and conference facilities for a business trip. Any suggestions?\",\n",
" \"I'm looking for a budget-friendly hotel in Tokyo with easy access to public transportation for a backpacking trip. What do you recommend?\",\n",
" \"I'm planning a honeymoon in London and I want a luxurious hotel with a spa and romantic atmosphere. Can you suggest some options?\",\n",
"]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The OpenAI API uses [JSON Schema](https://json-schema.org/) and natural language descriptions to specify the signature of a function to call. In this case, we'll describe a function to record the following attributes of the unstructured text input:\n",
"\n",
"- **location:** The desired destination,\n",
"- **budget_level:** A categorical budget preference,\n",
"- **purpose:** The purpose of the trip.\n",
"\n",
"The use of JSON Schema enables us to define the type of each field in the output and even enumerate valid values in the case of categorical outputs. OpenAI function calling can thus be used for tasks that might previously have been performed by named-entity recognition (NER) and/ or classification models."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"parameters_schema = {\n",
" \"type\": \"object\",\n",
" \"properties\": {\n",
" \"location\": {\n",
" \"type\": \"string\",\n",
" \"description\": 'The desired destination location. Use city, state, and country format when possible. If no destination is provided, return \"unstated\".',\n",
" },\n",
" \"budget_level\": {\n",
" \"type\": \"string\",\n",
" \"enum\": [\"low\", \"medium\", \"high\", \"not_stated\"],\n",
" \"description\": 'The desired budget level. If no budget level is provided, return \"not_stated\".',\n",
" },\n",
" \"purpose\": {\n",
" \"type\": \"string\",\n",
" \"enum\": [\"business\", \"pleasure\", \"other\", \"non_stated\"],\n",
" \"description\": 'The purpose of the trip. If no purpose is provided, return \"not_stated\".',\n",
" },\n",
" },\n",
" \"required\": [\"location\", \"budget_level\", \"purpose\"],\n",
"}\n",
"tool_schema = {\n",
" \"type\": \"function\",\n",
" \"function\": {\n",
" \"name\": \"record_travel_request_attributes\",\n",
" \"description\": \"Records the attributes of a travel request\",\n",
" \"parameters\": parameters_schema,\n",
" },\n",
"}\n",
"system_message = (\n",
" \"You are an assistant that parses and records the attributes of a user's travel request.\"\n",
")\n",
"\n",
"\n",
"def extract_raw_travel_request_attributes_string(\n",
" travel_request: str,\n",
" tool_schema: Dict[str, Any],\n",
" system_message: str,\n",
" client: OpenAI,\n",
" model: str = \"gpt-4o\",\n",
") -> str:\n",
" chat_completion = client.chat.completions.create(\n",
" model=model,\n",
" messages=[\n",
" {\"role\": \"system\", \"content\": system_message},\n",
" {\"role\": \"user\", \"content\": travel_request},\n",
" ],\n",
" tools=[tool_schema],\n",
" # By default, the LLM will choose whether or not to call a function given the conversation context.\n",
" # The line below forces the LLM to call the function so that the output conforms to the schema.\n",
" tool_choice={\"type\": \"function\", \"function\": {\"name\": tool_schema[\"function\"][\"name\"]}},\n",
" )\n",
" tool_call = chat_completion.choices[0].message.tool_calls[0]\n",
" return tool_call.function.arguments"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Run the extractions."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"raw_travel_attributes_column = []\n",
"for travel_request in travel_requests:\n",
" print(\"Travel request:\")\n",
" print(\"==============\")\n",
" print(travel_request)\n",
" print()\n",
" raw_travel_attributes = extract_raw_travel_request_attributes_string(\n",
" travel_request, tool_schema, system_message, client\n",
" )\n",
" raw_travel_attributes_column.append(raw_travel_attributes)\n",
" print(\"Raw Travel Attributes:\")\n",
" print(\"=====================\")\n",
" print(raw_travel_attributes)\n",
" print()\n",
" print()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Your trace data should appear in real-time in the Phoenix UI."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(f\"🔥🐦 Open the Phoenix UI if you haven't already: {session.url}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"While function calling improves the reliability of your LLM output, it does not guarantee that the output conforms to the input JSON schema, or even that the output is JSON-parseable. It is thus recommended to validate the output before feeding it into any downstream processes."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"class TravelAttributes(TypedDict):\n",
" location: str\n",
" budget_level: Literal[\"low\", \"medium\", \"high\", \"not_stated\"]\n",
" purpose: Literal[\"business\", \"pleasure\", \"other\", \"non_stated\"]\n",
"\n",
"\n",
"def parse_travel_attributes(\n",
" travel_attributes_string: str, schema: Dict[str, Any]\n",
") -> TravelAttributes:\n",
" \"\"\"\n",
" Parses a raw travel attributes string into a structured dictionary of travel attributes.\n",
"\n",
" Parameters:\n",
" - travel_attributes_string (str): A raw travel attributes string.\n",
" - schema (Dict[str, Any]): A JSON Schema to validate.\n",
"\n",
" Returns:\n",
" - TravelAttributes: Parsed travel attributes.\n",
" \"\"\"\n",
" travel_attributes = json.loads(travel_attributes_string)\n",
" jsonschema.validate(instance=travel_attributes, schema=schema)\n",
" return travel_attributes\n",
"\n",
"\n",
"records = []\n",
"for travel_request, raw_travel_attributes in zip(travel_requests, raw_travel_attributes_column):\n",
" is_json_parseable = True\n",
" conforms_to_json_schema = True\n",
" travel_attributes = None\n",
" try:\n",
" travel_attributes = parse_travel_attributes(raw_travel_attributes, parameters_schema)\n",
" except json.JSONDecodeError:\n",
" is_json_parseable = False\n",
" conforms_to_json_schema = False\n",
" except jsonschema.ValidationError:\n",
" conforms_to_json_schema = False\n",
" records.append(\n",
" {\n",
" \"travel_request\": travel_request,\n",
" \"raw_travel_attributes\": raw_travel_attributes,\n",
" \"is_json_parseable\": is_json_parseable,\n",
" \"conforms_to_json_schema\": conforms_to_json_schema,\n",
" **(travel_attributes or {}),\n",
" }\n",
" )\n",
"\n",
"output_df = pd.DataFrame(records)\n",
"output_df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 6. Export and Evaluate Your Trace Data\n",
"\n",
"Your OpenInference trace data is collected by Phoenix and can be exported to a pandas dataframe for further analysis and evaluation."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from phoenix.client import Client\n",
"\n",
"trace_df = Client().spans.get_spans_dataframe()\n",
"trace_df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 7. Recap\n",
"\n",
"Congrats! In this tutorial, you:\n",
"\n",
"- Built a service to perform structured data extraction on unstructured text using OpenAI function calling\n",
"- Instrumented your service with an OpenInference tracer\n",
"- Examined your telemetry data in Phoenix\n",
"\n",
"Check back soon for tips on evaluating the performance of your service using LLM evals."
]
}
],
"metadata": {
"language_info": {
"name": "python"
}
},
"nbformat": 4,
"nbformat_minor": 2
}