{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<center>\n",
" <p style=\"text-align:center\">\n",
" <img alt=\"phoenix logo\" src=\"https://raw.githubusercontent.com/Arize-ai/phoenix-assets/9e6101d95936f4bd4d390efc9ce646dc6937fb2d/images/socal/github-large-banner-phoenix.jpg\" width=\"1000\"/>\n",
" <br>\n",
" <br>\n",
" <a href=\"https://arize.com/docs/phoenix/\">Docs</a>\n",
" |\n",
" <a href=\"https://github.com/Arize-ai/phoenix\">GitHub</a>\n",
" |\n",
" <a href=\"https://arize-ai.slack.com/join/shared_invite/zt-2w57bhem8-hq24MB6u7yE_ZF_ilOYSBw#/shared-invite/email\">Community</a>\n",
" </p>\n",
"</center>\n",
"<h1 align=\"center\">Quickstart: Datasets and Experiments in Deno</h1>\n",
"\n",
"Phoenix helps you run experiments over your AI and LLM applications to evaluate and iteratively improve their performance. This quickstart shows you how to get up and running quickly with the JavaScript SDK in a Deno environment."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup\n",
"\n",
"Let's start by importing the necessary packages."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import { createClient } from \"npm:@arizeai/phoenix-client@latest\";\n",
"import { runExperiment, asEvaluator, evaluateExperiment } from \"npm:@arizeai/phoenix-client@latest/experiments\";\n",
"import { createDataset } from \"npm:@arizeai/phoenix-client@latest/datasets\";\n",
"import { OpenAI } from \"npm:openai\";"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Set up your OpenAI API key."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"const openaiApiKey = prompt(\"Enter your OpenAI API key:\");\n",
"\n",
"if (!openaiApiKey) {\n",
" console.error('Please enter your OpenAI API key to continue');\n",
" Deno.exit(1);\n",
"}\n",
"\n",
"const openai = new OpenAI({\n",
" apiKey: openaiApiKey,\n",
"});"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"> **Note:** The code below only initializes the Phoenix client. You must have the Phoenix server running separately.\n",
"> See the [Docker deployment guide](https://arize.com/docs/phoenix/self-hosting/deployment-options/docker#docker) for information on how to start the Phoenix server."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"const client = createClient();\n",
"console.log('Phoenix client initialized. Access Phoenix UI at http://localhost:6006');"
]
},
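{
"cell_type": "markdown",
"metadata": {},
"source": [
"By default the client connects to a Phoenix server at http://localhost:6006. If your server runs elsewhere or requires authentication, you can pass connection options explicitly. The cell below is a minimal sketch: the baseUrl and the commented-out Authorization header are placeholders for your own deployment, and the full set of options is documented in the @arizeai/phoenix-client README."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"// Optional: configure the connection explicitly instead of relying on defaults.\n",
"// The baseUrl and Authorization header are placeholders for your own deployment.\n",
"const configuredClient = createClient({\n",
"  options: {\n",
"    baseUrl: \"http://localhost:6006\",\n",
"    // headers: { Authorization: \"Bearer <your-phoenix-api-key>\" },\n",
"  },\n",
"});\n",
"console.log(\"Configured Phoenix client created (not used in the rest of this quickstart).\");"
]
},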
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Creating a Dataset\n",
"\n",
"Let's create examples for our dataset."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"console.log('Creating dataset examples...');\n",
"\n",
"// Create examples directly as an array\n",
"const { datasetId } = await createDataset({\n",
" name: `quickstart-dataset-${Date.now()}`,\n",
" description: \"Dataset for quickstart example\",\n",
" examples: [\n",
" {\n",
" id: `example-1`,\n",
" updatedAt: new Date(),\n",
" input: { question: \"What is Paul Graham known for?\" },\n",
" output: { answer: \"Co-founding Y Combinator and writing on startups and techology.\" },\n",
" metadata: { topic: \"tech\" }\n",
" },\n",
" {\n",
" id: `example-2`,\n",
" updatedAt: new Date(),\n",
" input: { question: \"What companies did Elon Musk found?\" },\n",
" output: { answer: \"Tesla, SpaceX, Neuralink, The Boring Company, and co-founded PayPal.\" },\n",
" metadata: { topic: \"entrepreneurs\" }\n",
" },\n",
" {\n",
" id: `example-3`,\n",
" updatedAt: new Date(),\n",
" input: { question: \"What is Moore's Law?\" },\n",
" output: { answer: \"The observation that the number of transistors in a dense integrated circuit doubles about every two years.\" },\n",
" metadata: { topic: \"computing\" }\n",
" }\n",
"]})\n",
"\n",
"console.log(`examples created`);"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Defining the Task\n",
"\n",
"Define the task function that will be evaluated."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import { type RunExperimentParams } from \"npm:@arizeai/phoenix-client/experiments\";\n",
"\n",
"const taskPromptTemplate = \"Answer in a few words: {question}\";\n",
"\n",
"const task: RunExperimentParams[\"task\"] = async (example) => {\n",
" const question = example.input.question || \"No question provided\";\n",
" const messageContent = taskPromptTemplate.replace('{question}', question);\n",
" \n",
" const response = await openai.chat.completions.create({\n",
" model: \"gpt-4o\",\n",
" messages: [{ role: \"user\", content: messageContent }]\n",
" });\n",
" \n",
" return response.choices[0]?.message?.content || \"\";\n",
"};"
]
},
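{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before running the full experiment, you can sanity-check the task on a single hand-built example. The object below is a minimal sketch: only the input.question field is filled in, since that is the only field the task reads."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"// Quick sanity check: call the task directly with a minimal example object.\n",
"// Only input.question is provided, because that is the only field the task uses.\n",
"const sampleAnswer = await task({ input: { question: \"What is Moore's Law?\" } } as any);\n",
"console.log(sampleAnswer);"
]
},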
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Creating Evaluators\n",
"\n",
"Let's define evaluators for our experiment."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"// 1. Code-based evaluator that checks if response contains specific keywords\n",
"const containsKeyword = asEvaluator({\n",
" name: \"contains_keyword\",\n",
" kind: \"CODE\",\n",
" evaluate: async ({ output }) => {\n",
" const keywords = [\"Y Combinator\", \"YC\"];\n",
" const outputStr = String(output).toLowerCase();\n",
" const contains = keywords.some(keyword => \n",
" outputStr.toLowerCase().includes(keyword.toLowerCase())\n",
" );\n",
" \n",
" return {\n",
" score: contains ? 1.0 : 0.0,\n",
" label: contains ? \"contains_keyword\" : \"missing_keyword\",\n",
" metadata: { keywords },\n",
" explanation: contains ? \n",
" `Output contains one of the keywords: ${keywords.join(\", \")}` : \n",
" `Output does not contain any of the keywords: ${keywords.join(\", \")}`\n",
" };\n",
" }\n",
"});\n",
"\n",
"console.log(\"Created 'contains_keyword' evaluator\");"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import { createClassificationEvaluator } from \"npm:@arizeai/phoenix-evals@latest\";\n",
"import { openai } from \"npm:@ai-sdk/openai@latest\";\n",
"\n",
"// 2. LLM-based evaluator for conciseness\n",
"const concisenessEval = createClassificationEvaluator({\n",
" model: openai(\"gpt-4o\"),\n",
" promptTemplate: \"Rate the following test as concise or verbose {{output}}\",\n",
" choices: { concise: 1, verbose: 0 }\n",
"});\n",
"\n",
"const conciseness = asEvaluator({\n",
" name: \"conciseness\",\n",
" kind: \"LLM\", \n",
" evaluate: async ({ output }) => {\n",
" const res = await concisenessEval.evaluate({ output });\n",
" \n",
" return {\n",
" ...res,\n",
" metadata: {},\n",
" };\n",
" }\n",
"});\n",
"\n",
"console.log(\"Created 'conciseness' evaluator\");"
]
},
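{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can also call the underlying classification evaluator directly on a sample string to see the result object it produces. This makes a single LLM call; the exact shape of the returned object is determined by phoenix-evals."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"// Invoke the conciseness classifier directly on a sample output string.\n",
"// This makes one LLM call and prints the raw result object from phoenix-evals.\n",
"const sampleConciseness = await concisenessEval.evaluate({ output: \"Co-founding Y Combinator.\" });\n",
"console.log(sampleConciseness);"
]
},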
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"// 3. Custom Jaccard similarity evaluator\n",
"const jaccardSimilarity = asEvaluator({\n",
" name: \"jaccard_similarity\",\n",
" kind: \"CODE\",\n",
" evaluate: async ({ output, expected }) => {\n",
" const actualWords = new Set(String(output).toLowerCase().split(\" \"));\n",
" const expectedAnswer = expected?.answer || \"\";\n",
" const expectedWords = new Set(expectedAnswer.toLowerCase().split(\" \"));\n",
" \n",
" const wordsInCommon = new Set(\n",
" [...actualWords].filter(word => expectedWords.has(word))\n",
" );\n",
" \n",
" const allWords = new Set([...actualWords, ...expectedWords]);\n",
" const score = wordsInCommon.size / allWords.size;\n",
" \n",
" return {\n",
" score,\n",
" label: score > 0.5 ? \"similar\" : \"dissimilar\",\n",
" metadata: { \n",
" actualWordsCount: actualWords.size,\n",
" expectedWordsCount: expectedWords.size,\n",
" commonWordsCount: wordsInCommon.size,\n",
" allWordsCount: allWords.size\n",
" },\n",
" explanation: `Jaccard similarity: ${score}`\n",
" };\n",
" }\n",
"});\n",
"\n",
"console.log(\"Created 'jaccard_similarity' evaluator\");"
]
},
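{
"cell_type": "markdown",
"metadata": {},
"source": [
"Jaccard similarity is the size of the intersection of the two word sets divided by the size of their union, so identical answers score 1 and answers with no words in common score 0. The self-contained check below mirrors the evaluator's logic on two toy strings."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"// Worked example of the Jaccard computation used by the evaluator above.\n",
"const wordsA = new Set(\"tesla spacex and neuralink\".split(\" \"));\n",
"const wordsB = new Set(\"tesla spacex neuralink and the boring company\".split(\" \"));\n",
"const commonCount = [...wordsA].filter((w) => wordsB.has(w)).length;\n",
"const unionCount = new Set([...wordsA, ...wordsB]).size;\n",
"console.log(\"Jaccard similarity:\", commonCount / unionCount); // 4 / 7 ≈ 0.57"
]
},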
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import { createClassificationEvaluator } from \"npm:@arizeai/phoenix-evals@latest\";\n",
"import { openai } from \"npm:@ai-sdk/openai@latest\";\n",
"\n",
"const accuracyEval = createClassificationEvaluator({\n",
" model: openai(\"gpt-4o\"),\n",
" promptTemplate: \"Given the QUESTION and REFERENCE_ANSWER, determine whether the ANSWER is accurate. <question>{{question}}</question><reference>{{referenceAnswer}}</reference><answer>{{referenceAnswer}}</answer>\"\n",
"});\n",
"\n",
"// 4. LLM-based accuracy evaluator\n",
"const accuracy = asEvaluator({\n",
" name: \"accuracy\",\n",
" kind: \"LLM\",\n",
" evaluate: async ({ input, output, expected }) => {\n",
" // Safely access question and answer with type assertions and fallbacks\n",
" const question = input.question || \"No question provided\";\n",
" const referenceAnswer = expected?.answer || \"No reference answer provided\";\n",
" \n",
" const res = await accuracyEval.evaluate({ question, referenceAnswer, answer: String(output) })\n",
" return {\n",
" ...res,\n",
" metadata: {},\n",
" };\n",
" }\n",
"});\n",
"\n",
"console.log(\"Created 'accuracy' evaluator\");"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Running the Experiment\n",
"\n",
"Now let's run the experiment with our defined evaluators."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"console.log('Running initial experiment...');\n",
"\n",
"// Pass dataset directly as the array of examples\n",
"const experiment = await runExperiment({\n",
" client,\n",
" experimentName: \"simple-experiment\",\n",
" dataset: {\n",
" datasetId,\n",
" },\n",
" task,\n",
" evaluators: [jaccardSimilarity, accuracy],\n",
" logger: console,\n",
"});\n",
"\n",
"console.log('Initial experiment completed with ID:', experiment.id);\n",
"\n",
"await Deno.jupyter.md`\n",
"### Initial experiment results\n",
"\n",
"Experiment ID: \\`${experiment.id}\\`\n",
"`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Running Additional Evaluators\n",
"\n",
"Let's run more evaluators on the same experiment."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"console.log('Running additional evaluators...');\n",
"\n",
"// Use evaluateExperiment to add evaluators to an existing experiment\n",
"const evaluation = await evaluateExperiment({\n",
" client,\n",
" experiment, // Use the existing experiment object\n",
" evaluators: [containsKeyword, conciseness],\n",
" logger: console\n",
"});\n",
"\n",
"console.log('Additional evaluations completed');\n",
"console.log('Evaluation runs:', evaluation.runs.length);\n",
"\n",
"await Deno.jupyter.md`\n",
"### Additional Evaluation Results\n",
"\n",
"The experiment has been evaluated with additional evaluators. View the complete results in the Phoenix UI:\n",
"\n",
"http://localhost:6006\n",
"\n",
"Experiment ID: \\`${experiment.id}\\`\n",
"`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Conclusion\n",
"\n",
"You've successfully:\n",
"1. Created a dataset with examples\n",
"2. Defined a task using the OpenAI API\n",
"3. Created multiple evaluators using both code-based and LLM-based approaches\n",
"4. Run an experiment and evaluated the results\n",
"5. Added additional evaluators to the experiment\n",
"\n",
"You can now explore the results in the Phoenix UI and iterate on your experiments to improve performance."
]
}
],
"metadata": {
"language_info": {
"name": "typescript"
}
},
"nbformat": 4,
"nbformat_minor": 2
}