We provide all the information about MCP servers via our MCP API.
curl -X GET 'https://glama.ai/api/mcp/v1/servers/crewAIInc/crewAI-examples'
If you have feedback or need assistance with the MCP directory API, please join our Discord server.
coding_assistant_eval.ipynb•94.8 KiB
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Set environment variables"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import getpass\n",
"import os\n",
"\n",
"os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"Enter your OpenAI API key: \")\n",
"# Apply a patch to allow nested asyncio loops in Jupyter\n",
"import nest_asyncio\n",
"nest_asyncio.apply()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# LangGraph"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Install dependencies"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Requirement already satisfied: langchain_community in /Users/joaomoura/.pyenv/versions/3.11.7/lib/python3.11/site-packages (0.3.12)\n",
"Requirement already satisfied: langchain-openai in /Users/joaomoura/.pyenv/versions/3.11.7/lib/python3.11/site-packages (0.2.12)\n",
"Requirement already satisfied: langchain-anthropic in /Users/joaomoura/.pyenv/versions/3.11.7/lib/python3.11/site-packages (0.3.0)\n",
"Requirement already satisfied: langchain in /Users/joaomoura/.pyenv/versions/3.11.7/lib/python3.11/site-packages (0.3.12)\n",
"Requirement already satisfied: langgraph in /Users/joaomoura/.pyenv/versions/3.11.7/lib/python3.11/site-packages (0.2.59)\n",
"Requirement already satisfied: bs4 in /Users/joaomoura/.pyenv/versions/3.11.7/lib/python3.11/site-packages (0.0.2)\n",
"Requirement already satisfied: langchain_core in /Users/joaomoura/.pyenv/versions/3.11.7/lib/python3.11/site-packages (0.3.25)\n",
"Requirement already satisfied: PyYAML>=5.3 in /Users/joaomoura/.pyenv/versions/3.11.7/lib/python3.11/site-packages (from langchain_community) (6.0.1)\n",
"Requirement already satisfied: SQLAlchemy<3,>=1.4 in /Users/joaomoura/.pyenv/versions/3.11.7/lib/python3.11/site-packages (from langchain_community) (2.0.35)\n",
"Requirement already satisfied: aiohttp<4.0.0,>=3.8.3 in /Users/joaomoura/.pyenv/versions/3.11.7/lib/python3.11/site-packages (from langchain_community) (3.9.3)\n",
"Requirement already satisfied: dataclasses-json<0.7,>=0.5.7 in /Users/joaomoura/.pyenv/versions/3.11.7/lib/python3.11/site-packages (from langchain_community) (0.6.4)\n",
"Requirement already satisfied: httpx-sse<0.5.0,>=0.4.0 in /Users/joaomoura/.pyenv/versions/3.11.7/lib/python3.11/site-packages (from langchain_community) (0.4.0)\n",
"Requirement already satisfied: langsmith<0.3,>=0.1.125 in /Users/joaomoura/.pyenv/versions/3.11.7/lib/python3.11/site-packages (from langchain_community) (0.1.144)\n",
"Requirement already satisfied: numpy<2,>=1.22.4 in /Users/joaomoura/.pyenv/versions/3.11.7/lib/python3.11/site-packages (from langchain_community) (1.26.4)\n",
"Requirement already satisfied: pydantic-settings<3.0.0,>=2.4.0 in /Users/joaomoura/.pyenv/versions/3.11.7/lib/python3.11/site-packages (from langchain_community) (2.6.0)\n",
"Requirement already satisfied: requests<3,>=2 in /Users/joaomoura/.pyenv/versions/3.11.7/lib/python3.11/site-packages (from langchain_community) (2.32.3)\n",
"Requirement already satisfied: tenacity!=8.4.0,<10,>=8.1.0 in /Users/joaomoura/.pyenv/versions/3.11.7/lib/python3.11/site-packages (from langchain_community) (8.2.3)\n",
"Requirement already satisfied: openai<2.0.0,>=1.55.3 in /Users/joaomoura/.pyenv/versions/3.11.7/lib/python3.11/site-packages (from langchain-openai) (1.57.4)\n",
"Requirement already satisfied: tiktoken<1,>=0.7 in /Users/joaomoura/.pyenv/versions/3.11.7/lib/python3.11/site-packages (from langchain-openai) (0.7.0)\n",
"Requirement already satisfied: anthropic<1,>=0.39.0 in /Users/joaomoura/.pyenv/versions/3.11.7/lib/python3.11/site-packages (from langchain-anthropic) (0.40.0)\n",
"Requirement already satisfied: defusedxml<0.8.0,>=0.7.1 in /Users/joaomoura/.pyenv/versions/3.11.7/lib/python3.11/site-packages (from langchain-anthropic) (0.7.1)\n",
"Requirement already satisfied: pydantic<3.0.0,>=2.7.4 in /Users/joaomoura/.pyenv/versions/3.11.7/lib/python3.11/site-packages (from langchain-anthropic) (2.7.4)\n",
"Requirement already satisfied: langchain-text-splitters<0.4.0,>=0.3.3 in /Users/joaomoura/.pyenv/versions/3.11.7/lib/python3.11/site-packages (from langchain) (0.3.3)\n",
"Requirement already satisfied: langgraph-checkpoint<3.0.0,>=2.0.4 in /Users/joaomoura/.pyenv/versions/3.11.7/lib/python3.11/site-packages (from langgraph) (2.0.9)\n",
"Requirement already satisfied: langgraph-sdk<0.2.0,>=0.1.42 in /Users/joaomoura/.pyenv/versions/3.11.7/lib/python3.11/site-packages (from langgraph) (0.1.45)\n",
"Requirement already satisfied: beautifulsoup4 in /Users/joaomoura/.pyenv/versions/3.11.7/lib/python3.11/site-packages (from bs4) (4.12.3)\n",
"Requirement already satisfied: jsonpatch<2.0,>=1.33 in /Users/joaomoura/.pyenv/versions/3.11.7/lib/python3.11/site-packages (from langchain_core) (1.33)\n",
"Requirement already satisfied: packaging<25,>=23.2 in /Users/joaomoura/.pyenv/versions/3.11.7/lib/python3.11/site-packages (from langchain_core) (23.2)\n",
"Requirement already satisfied: typing-extensions>=4.7 in /Users/joaomoura/.pyenv/versions/3.11.7/lib/python3.11/site-packages (from langchain_core) (4.12.2)\n",
"Requirement already satisfied: aiosignal>=1.1.2 in /Users/joaomoura/.pyenv/versions/3.11.7/lib/python3.11/site-packages (from aiohttp<4.0.0,>=3.8.3->langchain_community) (1.3.1)\n",
"Requirement already satisfied: attrs>=17.3.0 in /Users/joaomoura/.pyenv/versions/3.11.7/lib/python3.11/site-packages (from aiohttp<4.0.0,>=3.8.3->langchain_community) (23.2.0)\n",
"Requirement already satisfied: frozenlist>=1.1.1 in /Users/joaomoura/.pyenv/versions/3.11.7/lib/python3.11/site-packages (from aiohttp<4.0.0,>=3.8.3->langchain_community) (1.4.1)\n",
"Requirement already satisfied: multidict<7.0,>=4.5 in /Users/joaomoura/.pyenv/versions/3.11.7/lib/python3.11/site-packages (from aiohttp<4.0.0,>=3.8.3->langchain_community) (6.0.5)\n",
"Requirement already satisfied: yarl<2.0,>=1.0 in /Users/joaomoura/.pyenv/versions/3.11.7/lib/python3.11/site-packages (from aiohttp<4.0.0,>=3.8.3->langchain_community) (1.9.4)\n",
"Requirement already satisfied: anyio<5,>=3.5.0 in /Users/joaomoura/.pyenv/versions/3.11.7/lib/python3.11/site-packages (from anthropic<1,>=0.39.0->langchain-anthropic) (3.7.1)\n",
"Requirement already satisfied: distro<2,>=1.7.0 in /Users/joaomoura/.pyenv/versions/3.11.7/lib/python3.11/site-packages (from anthropic<1,>=0.39.0->langchain-anthropic) (1.9.0)\n",
"Requirement already satisfied: httpx<1,>=0.23.0 in /Users/joaomoura/.pyenv/versions/3.11.7/lib/python3.11/site-packages (from anthropic<1,>=0.39.0->langchain-anthropic) (0.27.0)\n",
"Requirement already satisfied: jiter<1,>=0.4.0 in /Users/joaomoura/.pyenv/versions/3.11.7/lib/python3.11/site-packages (from anthropic<1,>=0.39.0->langchain-anthropic) (0.4.2)\n",
"Requirement already satisfied: sniffio in /Users/joaomoura/.pyenv/versions/3.11.7/lib/python3.11/site-packages (from anthropic<1,>=0.39.0->langchain-anthropic) (1.3.0)\n",
"Requirement already satisfied: marshmallow<4.0.0,>=3.18.0 in /Users/joaomoura/.pyenv/versions/3.11.7/lib/python3.11/site-packages (from dataclasses-json<0.7,>=0.5.7->langchain_community) (3.20.2)\n",
"Requirement already satisfied: typing-inspect<1,>=0.4.0 in /Users/joaomoura/.pyenv/versions/3.11.7/lib/python3.11/site-packages (from dataclasses-json<0.7,>=0.5.7->langchain_community) (0.9.0)\n",
"Requirement already satisfied: jsonpointer>=1.9 in /Users/joaomoura/.pyenv/versions/3.11.7/lib/python3.11/site-packages (from jsonpatch<2.0,>=1.33->langchain_core) (2.4)\n",
"Requirement already satisfied: msgpack<2.0.0,>=1.1.0 in /Users/joaomoura/.pyenv/versions/3.11.7/lib/python3.11/site-packages (from langgraph-checkpoint<3.0.0,>=2.0.4->langgraph) (1.1.0)\n",
"Requirement already satisfied: orjson>=3.10.1 in /Users/joaomoura/.pyenv/versions/3.11.7/lib/python3.11/site-packages (from langgraph-sdk<0.2.0,>=0.1.42->langgraph) (3.10.5)\n",
"Requirement already satisfied: requests-toolbelt<2.0.0,>=1.0.0 in /Users/joaomoura/.pyenv/versions/3.11.7/lib/python3.11/site-packages (from langsmith<0.3,>=0.1.125->langchain_community) (1.0.0)\n",
"Requirement already satisfied: tqdm>4 in /Users/joaomoura/.pyenv/versions/3.11.7/lib/python3.11/site-packages (from openai<2.0.0,>=1.55.3->langchain-openai) (4.66.1)\n",
"Requirement already satisfied: annotated-types>=0.4.0 in /Users/joaomoura/.pyenv/versions/3.11.7/lib/python3.11/site-packages (from pydantic<3.0.0,>=2.7.4->langchain-anthropic) (0.6.0)\n",
"Requirement already satisfied: pydantic-core==2.18.4 in /Users/joaomoura/.pyenv/versions/3.11.7/lib/python3.11/site-packages (from pydantic<3.0.0,>=2.7.4->langchain-anthropic) (2.18.4)\n",
"Requirement already satisfied: python-dotenv>=0.21.0 in /Users/joaomoura/.pyenv/versions/3.11.7/lib/python3.11/site-packages (from pydantic-settings<3.0.0,>=2.4.0->langchain_community) (1.0.0)\n",
"Requirement already satisfied: charset-normalizer<4,>=2 in /Users/joaomoura/.pyenv/versions/3.11.7/lib/python3.11/site-packages (from requests<3,>=2->langchain_community) (3.3.2)\n",
"Requirement already satisfied: idna<4,>=2.5 in /Users/joaomoura/.pyenv/versions/3.11.7/lib/python3.11/site-packages (from requests<3,>=2->langchain_community) (3.10)\n",
"Requirement already satisfied: urllib3<3,>=1.21.1 in /Users/joaomoura/.pyenv/versions/3.11.7/lib/python3.11/site-packages (from requests<3,>=2->langchain_community) (2.2.3)\n",
"Requirement already satisfied: certifi>=2017.4.17 in /Users/joaomoura/.pyenv/versions/3.11.7/lib/python3.11/site-packages (from requests<3,>=2->langchain_community) (2024.8.30)\n",
"Requirement already satisfied: regex>=2022.1.18 in /Users/joaomoura/.pyenv/versions/3.11.7/lib/python3.11/site-packages (from tiktoken<1,>=0.7->langchain-openai) (2024.9.11)\n",
"Requirement already satisfied: soupsieve>1.2 in /Users/joaomoura/.pyenv/versions/3.11.7/lib/python3.11/site-packages (from beautifulsoup4->bs4) (2.5)\n",
"Requirement already satisfied: httpcore==1.* in /Users/joaomoura/.pyenv/versions/3.11.7/lib/python3.11/site-packages (from httpx<1,>=0.23.0->anthropic<1,>=0.39.0->langchain-anthropic) (1.0.4)\n",
"Requirement already satisfied: h11<0.15,>=0.13 in /Users/joaomoura/.pyenv/versions/3.11.7/lib/python3.11/site-packages (from httpcore==1.*->httpx<1,>=0.23.0->anthropic<1,>=0.39.0->langchain-anthropic) (0.14.0)\n",
"Requirement already satisfied: mypy-extensions>=0.3.0 in /Users/joaomoura/.pyenv/versions/3.11.7/lib/python3.11/site-packages (from typing-inspect<1,>=0.4.0->dataclasses-json<0.7,>=0.5.7->langchain_community) (1.0.0)\n",
"\n",
"\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m24.0\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m24.3.1\u001b[0m\n",
"\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpip install --upgrade pip\u001b[0m\n"
]
}
],
"source": [
"! pip install -U langchain_community langchain-openai langchain-anthropic langchain langgraph bs4 langchain_core"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Load Docs"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"from bs4 import BeautifulSoup as Soup\n",
"from langchain_community.document_loaders.recursive_url_loader import RecursiveUrlLoader\n",
"\n",
"# LCEL docs\n",
"url = \"https://python.langchain.com/docs/how_to/sequence/#related\"\n",
"loader = RecursiveUrlLoader(\n",
" url=url, max_depth=20, extractor=lambda x: Soup(x, \"html.parser\").text\n",
")\n",
"docs = loader.load()\n",
"\n",
"# Sort the list based on the URLs and get the text\n",
"d_sorted = sorted(docs, key=lambda x: x.metadata[\"source\"])\n",
"d_reversed = list(reversed(d_sorted))\n",
"concatenated_content = \"\\n\\n\\n --- \\n\\n\\n\".join(\n",
" [doc.page_content for doc in d_reversed]\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create CodeChain"
]
},
{
"cell_type": "code",
"execution_count": 63,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"code(prefix=\"To build a Retrieval Augmented Generation (RAG) chain in LCEL, you need to combine a retriever with a language model to generate responses based on retrieved documents. The process involves creating a retriever to fetch relevant documents, a prompt template to format the input for the language model, and then chaining these components together. The retriever fetches documents based on a query, and the language model generates a response using the retrieved documents as context. Here's how you can set up a simple RAG chain using LangChain.\", imports='from langchain_core.retrievers import SimpleRetriever\\nfrom langchain_core.prompts import ChatPromptTemplate\\nfrom langchain_openai import ChatOpenAI\\nfrom langchain_core.output_parsers import StrOutputParser', code='# Initialize the retriever\\nretriever = SimpleRetriever()\\n\\n# Define a prompt template that includes the retrieved documents\\nprompt = ChatPromptTemplate.from_template(\"Using the following documents: {documents}, answer the question: {question}\")\\n\\n# Initialize the language model\\nmodel = ChatOpenAI(model=\"gpt-4o-mini\")\\n\\n# Chain the retriever, prompt, and model together\\nrag_chain = retriever | prompt | model | StrOutputParser()\\n\\n# Invoke the chain with a query\\nresponse = rag_chain.invoke({\"question\": \"What is the capital of France?\"})\\nprint(response)')"
]
},
"execution_count": 63,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain_core.prompts import ChatPromptTemplate\n",
"from langchain_openai import ChatOpenAI\n",
"from pydantic import BaseModel, Field\n",
"\n",
"### OpenAI\n",
"\n",
"# Grader prompt\n",
"code_gen_prompt = ChatPromptTemplate.from_messages(\n",
" [\n",
" (\n",
" \"system\",\n",
" \"\"\"You are a coding assistant with expertise in LCEL, LangChain expression language. \\n\n",
" Here is a full set of LCEL documentation: \\n ------- \\n {context} \\n ------- \\n Answer the user\n",
" question based on the above provided documentation. Ensure any code you provide can be executed \\n\n",
" with all required imports and variables defined. Structure your answer with a description of the code solution. \\n\n",
" Then list the imports. And finally list the functioning code block. Here is the user question:\"\"\",\n",
" ),\n",
" (\"placeholder\", \"{messages}\"),\n",
" ]\n",
")\n",
"\n",
"\n",
"# Data model\n",
"class code(BaseModel):\n",
" \"\"\"Schema for code solutions to questions about LCEL.\"\"\"\n",
"\n",
" prefix: str = Field(description=\"Description of the problem and approach\")\n",
" imports: str = Field(description=\"Code block import statements\")\n",
" code: str = Field(description=\"Code block not including import statements\")\n",
"\n",
"\n",
"expt_llm = \"gpt-4o\"\n",
"llm = ChatOpenAI(temperature=0, model=expt_llm)\n",
"code_gen_chain_oai = code_gen_prompt | llm.with_structured_output(code)\n",
"question = \"How do I build a RAG chain in LCEL?\"\n",
"solution = code_gen_chain_oai.invoke(\n",
" {\"context\": concatenated_content, \"messages\": [(\"user\", question)]}\n",
")\n",
"solution"
]
},
{
"cell_type": "code",
"execution_count": 69,
"metadata": {},
"outputs": [],
"source": [
"from langchain_anthropic import ChatAnthropic\n",
"from langchain_core.prompts import ChatPromptTemplate\n",
"\n",
"### Anthropic\n",
"\n",
"# Prompt to enforce tool use\n",
"code_gen_prompt_claude = ChatPromptTemplate.from_messages(\n",
" [\n",
" (\n",
" \"system\",\n",
" \"\"\"<instructions> You are a coding assistant with expertise in LCEL, LangChain expression language. \\n\n",
" Here is the LCEL documentation: \\n ------- \\n {context} \\n ------- \\n Answer the user question based on the \\n\n",
" above provided documentation. Ensure any code you provide can be executed with all required imports and variables \\n\n",
" defined. Structure your answer: 1) a prefix describing the code solution, 2) the imports, 3) the functioning code block. \\n\n",
" Invoke the code tool to structure the output correctly. </instructions> \\n Here is the user question:\"\"\",\n",
" ),\n",
" (\"placeholder\", \"{messages}\"),\n",
" ]\n",
")\n",
"\n",
"\n",
"# LLM\n",
"expt_llm = \"claude-3-opus-20240229\"\n",
"llm = ChatAnthropic(\n",
" model=expt_llm,\n",
" default_headers={\"anthropic-beta\": \"tools-2024-04-04\"},\n",
")\n",
"\n",
"structured_llm_claude = llm.with_structured_output(code, include_raw=True)\n",
"\n",
"\n",
"# Optional: Check for errors in case tool use is flaky\n",
"def check_claude_output(tool_output):\n",
" \"\"\"Check for parse error or failure to call the tool\"\"\"\n",
"\n",
" # Error with parsing\n",
" if tool_output[\"parsing_error\"]:\n",
" # Report back output and parsing errors\n",
" print(\"Parsing error!\")\n",
" raw_output = str(tool_output[\"raw\"].content)\n",
" error = tool_output[\"parsing_error\"]\n",
" raise ValueError(\n",
" f\"Error parsing your output! Be sure to invoke the tool. Output: {raw_output}. \\n Parse error: {error}\"\n",
" )\n",
"\n",
" # Tool was not invoked\n",
" elif not tool_output[\"parsed\"]:\n",
" print(\"Failed to invoke tool!\")\n",
" raise ValueError(\n",
" \"You did not use the provided tool! Be sure to invoke the tool to structure the output.\"\n",
" )\n",
" return tool_output\n",
"\n",
"\n",
"# Chain with output check\n",
"code_chain_claude_raw = (\n",
" code_gen_prompt_claude | structured_llm_claude | check_claude_output\n",
")\n",
"\n",
"\n",
"def insert_errors(inputs):\n",
" \"\"\"Insert errors for tool parsing in the messages\"\"\"\n",
"\n",
" # Get errors\n",
" error = inputs[\"error\"]\n",
" messages = inputs[\"messages\"]\n",
" messages += [\n",
" (\n",
" \"assistant\",\n",
" f\"Retry. You are required to fix the parsing errors: {error} \\n\\n You must invoke the provided tool.\",\n",
" )\n",
" ]\n",
" return {\n",
" \"messages\": messages,\n",
" \"context\": inputs[\"context\"],\n",
" }\n",
"\n",
"\n",
"# This will be run as a fallback chain\n",
"fallback_chain = insert_errors | code_chain_claude_raw\n",
"N = 3 # Max re-tries\n",
"code_gen_chain_re_try = code_chain_claude_raw.with_fallbacks(\n",
" fallbacks=[fallback_chain] * N, exception_key=\"error\"\n",
")\n",
"\n",
"\n",
"def parse_output(solution):\n",
" \"\"\"When we add 'include_raw=True' to structured output,\n",
" it will return a dict w 'raw', 'parsed', 'parsing_error'.\"\"\"\n",
"\n",
" return solution[\"parsed\"]\n",
"\n",
"\n",
"# Optional: With re-try to correct for failure to invoke tool\n",
"code_gen_chain = code_gen_chain_re_try | parse_output\n",
"\n",
"# No re-try\n",
"code_gen_chain = code_gen_prompt_claude | structured_llm_claude | parse_output\n",
"\n",
"code_gen_chain = code_gen_chain_oai"
]
},
{
"cell_type": "code",
"execution_count": 70,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"code(prefix=\"To build a Retrieval Augmented Generation (RAG) chain in LCEL, you need to combine a retriever with a language model to generate responses based on retrieved documents. The process involves setting up a retriever to fetch relevant documents and then using a language model to generate a response based on those documents. Here's a step-by-step guide to building a RAG chain:\", imports='from langchain_core.retrievers import VectorStoreRetriever\\nfrom langchain_core.prompts import ChatPromptTemplate\\nfrom langchain_openai import ChatOpenAI\\nfrom langchain_core.output_parsers import StrOutputParser\\nfrom langchain_core.runnables import RunnableParallel', code='# Step 1: Set up the retriever\\nretriever = VectorStoreRetriever(vector_store=my_vector_store, search_type=\\'similarity\\', k=5)\\n\\n# Step 2: Define the prompt template\\nprompt_template = ChatPromptTemplate.from_template(\"Based on the following documents, answer the question: {question}\\\\nDocuments: {documents}\")\\n\\n# Step 3: Initialize the language model\\nmodel = ChatOpenAI(model=\"gpt-4o-mini\")\\n\\n# Step 4: Chain the components together\\nrag_chain = (retriever | (lambda docs: {\\'documents\\': docs}) | prompt_template | model | StrOutputParser())\\n\\n# Step 5: Invoke the chain with a question\\nresponse = rag_chain.invoke({\\'question\\': \\'What are the benefits of using LCEL?\\'})\\nprint(response)')"
]
},
"execution_count": 70,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Test\n",
"question = \"How do I build a RAG chain in LCEL?\"\n",
"solution = code_gen_chain.invoke(\n",
" {\"context\": concatenated_content, \"messages\": [(\"user\", question)]}\n",
")\n",
"solution"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create LangGraph State"
]
},
{
"cell_type": "code",
"execution_count": 71,
"metadata": {},
"outputs": [],
"source": [
"from typing import List\n",
"from typing_extensions import TypedDict\n",
"\n",
"\n",
"class GraphState(TypedDict):\n",
" \"\"\"\n",
" Represents the state of our graph.\n",
"\n",
" Attributes:\n",
" error : Binary flag for control flow to indicate whether test error was tripped\n",
" messages : With user question, error messages, reasoning\n",
" generation : Code solution\n",
" iterations : Number of tries\n",
" \"\"\"\n",
"\n",
" error: str\n",
" messages: List\n",
" generation: str\n",
" iterations: int"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create LangGraph Graph"
]
},
{
"cell_type": "code",
"execution_count": 72,
"metadata": {},
"outputs": [],
"source": [
"### Parameter\n",
"\n",
"# Max tries\n",
"max_iterations = 3\n",
"# Reflect\n",
"# flag = 'reflect'\n",
"flag = \"do not reflect\"\n",
"\n",
"### Nodes\n",
"\n",
"\n",
"def generate(state: GraphState):\n",
" \"\"\"\n",
" Generate a code solution\n",
"\n",
" Args:\n",
" state (dict): The current graph state\n",
"\n",
" Returns:\n",
" state (dict): New key added to state, generation\n",
" \"\"\"\n",
"\n",
" print(\"---GENERATING CODE SOLUTION---\")\n",
"\n",
" # State\n",
" messages = state[\"messages\"]\n",
" iterations = state[\"iterations\"]\n",
" error = state[\"error\"]\n",
"\n",
" # We have been routed back to generation with an error\n",
" if error == \"yes\":\n",
" messages += [\n",
" (\n",
" \"user\",\n",
" \"Now, try again. Invoke the code tool to structure the output with a prefix, imports, and code block:\",\n",
" )\n",
" ]\n",
"\n",
" # Solution\n",
" code_solution = code_gen_chain.invoke(\n",
" {\"context\": concatenated_content, \"messages\": messages}\n",
" )\n",
" messages += [\n",
" (\n",
" \"assistant\",\n",
" f\"{code_solution.prefix} \\n Imports: {code_solution.imports} \\n Code: {code_solution.code}\",\n",
" )\n",
" ]\n",
"\n",
" # Increment\n",
" iterations = iterations + 1\n",
" return {\"generation\": code_solution, \"messages\": messages, \"iterations\": iterations}\n",
"\n",
"\n",
"def code_check(state: GraphState):\n",
" \"\"\"\n",
" Check code\n",
"\n",
" Args:\n",
" state (dict): The current graph state\n",
"\n",
" Returns:\n",
" state (dict): New key added to state, error\n",
" \"\"\"\n",
"\n",
" print(\"---CHECKING CODE---\")\n",
"\n",
" # State\n",
" messages = state[\"messages\"]\n",
" code_solution = state[\"generation\"]\n",
" iterations = state[\"iterations\"]\n",
"\n",
" # Get solution components\n",
" imports = code_solution.imports\n",
" code = code_solution.code\n",
"\n",
" # Check imports\n",
" try:\n",
" exec(imports)\n",
" except Exception as e:\n",
" print(\"---CODE IMPORT CHECK: FAILED---\")\n",
" error_message = [(\"user\", f\"Your solution failed the import test: {e}\")]\n",
" messages += error_message\n",
" return {\n",
" \"generation\": code_solution,\n",
" \"messages\": messages,\n",
" \"iterations\": iterations,\n",
" \"error\": \"yes\",\n",
" }\n",
"\n",
" # Check execution\n",
" try:\n",
" exec(imports + \"\\n\" + code)\n",
" except Exception as e:\n",
" print(\"---CODE BLOCK CHECK: FAILED---\")\n",
" error_message = [(\"user\", f\"Your solution failed the code execution test: {e}\")]\n",
" messages += error_message\n",
" return {\n",
" \"generation\": code_solution,\n",
" \"messages\": messages,\n",
" \"iterations\": iterations,\n",
" \"error\": \"yes\",\n",
" }\n",
"\n",
" # No errors\n",
" print(\"---NO CODE TEST FAILURES---\")\n",
" return {\n",
" \"generation\": code_solution,\n",
" \"messages\": messages,\n",
" \"iterations\": iterations,\n",
" \"error\": \"no\",\n",
" }\n",
"\n",
"\n",
"def reflect(state: GraphState):\n",
"    \"\"\"\n",
"    Reflect on errors\n",
"\n",
"    Args:\n",
"        state (dict): The current graph state\n",
"\n",
"    Returns:\n",
"        state (dict): New key added to state, generation\n",
"    \"\"\"\n",
"\n",
"    # Distinct banner so reflect steps are distinguishable from generate steps in run logs\n",
"    print(\"---REFLECTING ON ERRORS---\")\n",
"\n",
"    # State\n",
"    messages = state[\"messages\"]\n",
"    iterations = state[\"iterations\"]\n",
"    code_solution = state[\"generation\"]\n",
"\n",
"    # Prompt reflection\n",
"\n",
"    # Add reflection\n",
"    reflections = code_gen_chain.invoke(\n",
"        {\"context\": concatenated_content, \"messages\": messages}\n",
"    )\n",
"    messages += [(\"assistant\", f\"Here are reflections on the error: {reflections}\")]\n",
"    return {\"generation\": code_solution, \"messages\": messages, \"iterations\": iterations}\n",
"\n",
"\n",
"### Edges\n",
"\n",
"\n",
"def decide_to_finish(state: GraphState):\n",
" \"\"\"\n",
" Determines whether to finish.\n",
"\n",
" Args:\n",
" state (dict): The current graph state\n",
"\n",
" Returns:\n",
" str: Next node to call\n",
" \"\"\"\n",
" error = state[\"error\"]\n",
" iterations = state[\"iterations\"]\n",
"\n",
" if error == \"no\" or iterations == max_iterations:\n",
" print(\"---DECISION: FINISH---\")\n",
" return \"end\"\n",
" else:\n",
" print(\"---DECISION: RE-TRY SOLUTION---\")\n",
" if flag == \"reflect\":\n",
" return \"reflect\"\n",
" else:\n",
" return \"generate\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create LangGraph Nodes Mapping"
]
},
{
"cell_type": "code",
"execution_count": 73,
"metadata": {},
"outputs": [],
"source": [
"from langgraph.graph import END, StateGraph, START\n",
"\n",
"workflow = StateGraph(GraphState)\n",
"\n",
"# Define the nodes\n",
"workflow.add_node(\"generate\", generate) # generation solution\n",
"workflow.add_node(\"check_code\", code_check) # check code\n",
"workflow.add_node(\"reflect\", reflect) # reflect\n",
"\n",
"# Build graph\n",
"workflow.add_edge(START, \"generate\")\n",
"workflow.add_edge(\"generate\", \"check_code\")\n",
"workflow.add_conditional_edges(\n",
" \"check_code\",\n",
" decide_to_finish,\n",
" {\n",
" \"end\": END,\n",
" \"reflect\": \"reflect\",\n",
" \"generate\": \"generate\",\n",
" },\n",
")\n",
"workflow.add_edge(\"reflect\", \"generate\")\n",
"app = workflow.compile()"
]
},
{
"cell_type": "code",
"execution_count": 74,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"---GENERATING CODE SOLUTION---\n",
"---CHECKING CODE---\n",
"Why do bears have sticky fur?\n",
"\n",
"Because they always use bear-ly any shampoo!\n",
"---NO CODE TEST FAILURES---\n",
"---DECISION: FINISH---\n"
]
}
],
"source": [
"question = \"How can I directly pass a string to a runnable and use it to construct the input needed for my prompt?\"\n",
"solution = app.invoke({\"messages\": [(\"user\", question)], \"iterations\": 0, \"error\": \"\"})"
]
},
{
"cell_type": "code",
"execution_count": 75,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"code(prefix=\"To directly pass a string to a runnable and use it to construct the input needed for your prompt, you can use a lambda function to transform the string into the required input format. In this example, we'll pass a string representing a topic to a lambda function, which will convert it into a dictionary format expected by the prompt template. This dictionary is then passed through the chain of runnables, which includes a prompt template, a chat model, and an output parser.\", imports='from langchain_openai import ChatOpenAI\\nfrom langchain_core.output_parsers import StrOutputParser\\nfrom langchain_core.prompts import ChatPromptTemplate', code='# Initialize the chat model\\nmodel = ChatOpenAI(model=\"gpt-4o-mini\")\\n\\n# Create a prompt template\\nprompt = ChatPromptTemplate.from_template(\"tell me a joke about {topic}\")\\n\\n# Chain the prompt, model, and output parser together\\nchain = prompt | model | StrOutputParser()\\n\\n# Use a lambda function to convert a string into the required input format\\ncomposed_chain_with_lambda = (\\n (lambda topic: {\"topic\": topic}) # Convert string to dict\\n | chain\\n)\\n\\n# Invoke the chain with a string input\\nresult = composed_chain_with_lambda.invoke(\"bears\")\\nprint(result)')"
]
},
"execution_count": 75,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"solution[\"generation\"]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# CrewAI"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Creating Crew"
]
},
{
"cell_type": "code",
"execution_count": 82,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"2024-12-16 11:17:32,412 - 8071896128 - __init__.py-__init__:521 - WARNING: Overriding of current TracerProvider is not allowed\n"
]
}
],
"source": [
"# Importing Crew related components\n",
"from crewai import Agent, Task, Crew\n",
"\n",
"# Importing CrewAI Tools\n",
"from crewai_tools import WebsiteSearchTool\n",
"\n",
"# Importing Pydantic\n",
"from pydantic import BaseModel, Field\n",
"\n",
"class CodeSolution(BaseModel):\n",
" prefix: str = Field(description=\"Description of the problem and approach\")\n",
" imports: str = Field(description=\"Code block import statements\")\n",
" code: str = Field(description=\"Code block not including import statements\")\n",
"\n",
"# Create the coding assistant agent\n",
"coding_assistant = Agent(\n",
" role='Coding Assistant',\n",
" goal='Provide accurate and executable code solutions using LCEL',\n",
" backstory=\"\"\"You are a coding assistant with expertise in LCEL, LangChain expression language. \\n\n",
" Here is the LCEL documentation: \\n ------- \\n {context} \\n ------- \\n\n",
" Answer the user question based on the \\n\n",
" above provided documentation. Ensure any code you provide can be executed with all required imports and variables \\n\n",
" defined.\"\"\",\n",
" verbose=False,\n",
" llm='gpt-4o'\n",
")\n",
"\n",
"# Create task for code generation\n",
"code_generation_task = Task(\n",
" description=\"\"\"Answer the user question based on the above provided documentation. Ensure any code you provide can be executed\n",
" with all required imports and variables defined. Structure your answer:\n",
" 1) a prefix describing the code solution\n",
" 2) the imports\n",
" 3) the functioning code block\n",
"\n",
" Your coding task:\n",
" {question}\n",
" \"\"\",\n",
" expected_output=\"Code solution with prefix description, imports, and executable code block\",\n",
" agent=coding_assistant,\n",
" output_pydantic=CodeSolution\n",
")\n",
"\n",
"# Create the crew\n",
"code_crew = Crew(\n",
" agents=[coding_assistant],\n",
" tasks=[code_generation_task],\n",
" verbose=False\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 83,
"metadata": {},
"outputs": [],
"source": [
"# code_crew.train(\n",
"# n_iterations=2,\n",
"# filename=\"code_crew.pkl\",\n",
"# inputs={\n",
"# \"question\": 'How do I build a RAG chain in LCEL?',\n",
"# \"context\": str(concatenated_content)\n",
"# }\n",
"# )"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create Flow State"
]
},
{
"cell_type": "code",
"execution_count": 84,
"metadata": {},
"outputs": [],
"source": [
"from typing import List\n",
"\n",
"class CodeGenState(BaseModel):\n",
" \"\"\"\n",
" State for the code generation flow\n",
" \"\"\"\n",
" error: str = \"\"\n",
" question: str = \"\"\n",
" messages: List = []\n",
" generation: str = \"\"\n",
" iterations: int = 0\n",
" max_iterations: int = 3"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Creating the Code Flow"
]
},
{
"cell_type": "code",
"execution_count": 88,
"metadata": {},
"outputs": [],
"source": [
"# Importing CrewAI Flow related components\n",
"from crewai.flow.flow import Flow, listen, start, router\n",
"\n",
"class CodeGenFlow(Flow[CodeGenState]):\n",
" def check_code(self):\n",
" print(\"---CHECKING CODE---\")\n",
"\n",
" code_solution = self.state.generation\n",
" imports = code_solution.imports\n",
" code = code_solution.code\n",
"\n",
" try:\n",
" exec(imports)\n",
" except Exception as e:\n",
" print(\"---CODE IMPORT CHECK: FAILED---\")\n",
" self.state.error = str(e)\n",
" return \"code_failed\"\n",
"\n",
" try:\n",
" exec(imports + \"\\n\" + code)\n",
" except Exception as e:\n",
" print(\"---CODE BLOCK CHECK: FAILED---\")\n",
" self.state.error = str(e)\n",
" return \"code_failed\"\n",
"\n",
" print(\"---NO CODE TEST FAILURES---\")\n",
" return \"success\"\n",
"\n",
" def fix_code(self):\n",
" if self.state.error != \"\":\n",
" print(\"---FIXING CODE---\")\n",
" # Create task for fixing code\n",
" code_fix_task = Task(\n",
" description=\"\"\"You are a coding assistant with expertise in LCEL, LangChain expression language.\n",
" Here is a full set of LCEL documentation:\n",
" -------\n",
" {context}\n",
" -------\n",
"\n",
" The previous code attempt failed with the following error:\n",
" {error}\n",
"\n",
" Your coding task:\n",
" {question}\n",
"\n",
" Previous code attempt:\n",
" {explanation}\n",
" {imports}\n",
" {code}\n",
"\n",
" Answer with a description of the code solution, followed by the imports, and finally the functioning code block.\n",
" Ensure all imports are correct and the code is executable.\"\"\",\n",
" expected_output= \"A working code solution to the problem\",\n",
" agent=coding_assistant,\n",
" output_pydantic=CodeSolution\n",
" )\n",
"\n",
" # Create crew for fixing code\n",
" fix_crew = Crew(\n",
" agents=[coding_assistant],\n",
" tasks=[code_fix_task]\n",
" )\n",
"\n",
" # Execute fix\n",
" result = fix_crew.kickoff(\n",
" inputs={\n",
" \"error\": self.state.error,\n",
" \"question\": self.state.question,\n",
" \"explanation\": self.state.generation.prefix,\n",
" \"imports\": self.state.generation.imports,\n",
" \"code\": self.state.generation.code,\n",
" \"context\": concatenated_content\n",
" }\n",
" )\n",
" self.state.generation = result.pydantic\n",
" self.state.error = \"\"\n",
"\n",
" @start()\n",
" def generate_code(self):\n",
" print(\"---GENERATING CODE SOLUTION---\")\n",
" result = code_crew.kickoff(\n",
" inputs={\n",
" \"question\": self.state.question,\n",
" \"context\": concatenated_content\n",
" }\n",
" )\n",
" self.state.generation = result.pydantic\n",
" self.state.error = \"\"\n",
"\n",
" @router(generate_code)\n",
" def run_check(self):\n",
" result = self.check_code()\n",
" if result != \"success\":\n",
" return \"fix_code\"\n",
"\n",
" @listen('fix_code')\n",
" def run_fix(self):\n",
" self.fix_code()\n",
"\n",
" @router(run_fix)\n",
" def re_run_check(self):\n",
" result = self.check_code()\n",
" if result != \"success\":\n",
" return \"refix_code\"\n",
"\n",
" @listen('refix_code')\n",
" def re_run_fix(self):\n",
" self.fix_code()\n",
"\n",
" @listen(re_run_fix)\n",
" def re_re_run_check(self):\n",
" self.check_code()\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 89,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"---GENERATING CODE SOLUTION---\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"2024-12-16 11:42:24,147 - 8071896128 - __init__.py-__init__:521 - WARNING: Overriding of current TracerProvider is not allowed\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"---CHECKING CODE---\n",
"---CODE BLOCK CHECK: FAILED---\n",
"---FIXING CODE---\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"2024-12-16 11:42:38,562 - 8071896128 - __init__.py-__init__:521 - WARNING: Overriding of current TracerProvider is not allowed\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"---CHECKING CODE---\n",
"---CODE BLOCK CHECK: FAILED---\n",
"---FIXING CODE---\n",
"---CHECKING CODE---\n",
"Retrieval Augmented Generation (RAG) is a machine learning approach that combines retrieval-based methods with generative models to enhance the quality and relevance of generated text. In RAG, a system first retrieves relevant documents or information from a knowledge base based on the input query. This retrieved information is then used to inform and guide the generation of a response or text output. \n",
"\n",
"The process typically involves two main components: a retriever, which searches for and selects relevant documents, and a generator, which creates text based on both the input query and the retrieved documents. This approach allows the model to produce more accurate and contextually appropriate responses by leveraging external knowledge, making it particularly useful for tasks where factual accuracy and detail are important, such as question answering or content generation.\n",
"---NO CODE TEST FAILURES---\n"
]
},
{
"data": {
"text/plain": [
"CodeSolution(prefix=\"To build a Retrieval Augmented Generation (RAG) chain in LCEL, you need to use existing runnable components like a valid retriever, a language model, and an output parser. Here, I'll correct the example to ensure that we have working components by simulating a simple retriever and question source within the code.\", imports='from langchain_openai import ChatOpenAI\\nfrom langchain_core.output_parsers import StrOutputParser\\nfrom langchain_core.prompts import ChatPromptTemplate\\nfrom langchain_core.runnables import RunnableParallel\\nimport getpass\\nimport os', code=\"class MockRetriever:\\n def __call__(self, args):\\n # Dummy implementation, replace with your actual retriever logic\\n return 'This is your retrieved context.'\\n\\nclass MockQuestionSource:\\n def __call__(self, args):\\n # Dummy implementation, use your actual question source logic\\n return args['question']\\n\\n# Assuming this block handles API key setup\\nif not os.environ.get('OPENAI_API_KEY'):\\n os.environ['OPENAI_API_KEY'] = getpass.getpass('Enter API key for OpenAI: ')\\n\\nretriever = MockRetriever() # Initialize your retriever here\\nmodel = ChatOpenAI(model='gpt-4o-mini')\\nprompt = ChatPromptTemplate.from_template('Given this context: {context}, answer this question: {question}')\\n\\nrag_chain = RunnableParallel({\\n 'context': retriever, # Run the retriever to get context\\n 'question': MockQuestionSource() # Replace with your question source\\n})\\nrag_chain = rag_chain.pipe(prompt).pipe(model).pipe(StrOutputParser())\\n\\nresult = rag_chain.invoke({'question': 'What is Retrieval Augmented Generation?'})\\nprint(result)\")"
]
},
"execution_count": 89,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"code_flow = CodeGenFlow()\n",
"code_flow.kickoff(inputs={\"question\": 'How do I build a RAG chain in LCEL?'})\n",
"code_flow.state.generation"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Evaluations\n",
"\n",
"We will check for imports, code execution and overall compare with the correct solution."
]
},
{
"cell_type": "code",
"execution_count": 90,
"metadata": {},
"outputs": [],
"source": [
"def check_import(solution) -> dict:\n",
" imports = solution.imports\n",
" try:\n",
" exec(imports)\n",
" return {\"key\": \"import_check\", \"score\": 1}\n",
" except Exception:\n",
" return {\"key\": \"import_check\", \"score\": 0}\n",
"\n",
"\n",
"def check_execution(solution) -> dict:\n",
" imports = solution.imports\n",
" code = solution.code\n",
" try:\n",
" exec(imports + \"\\n\" + code)\n",
" return {\"key\": \"code_execution_check\", \"score\": 1}\n",
" except Exception:\n",
" return {\"key\": \"code_execution_check\", \"score\": 0}"
]
},
{
"cell_type": "code",
"execution_count": 91,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"---GENERATING CODE SOLUTION---\n",
"---CHECKING CODE---\n",
"content='Why do bears have hairy coats?\\n\\nBecause they look silly in sweaters!' additional_kwargs={'refusal': None} response_metadata={'token_usage': {'completion_tokens': 14, 'prompt_tokens': 13, 'total_tokens': 27, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_6fc10e10eb', 'finish_reason': 'stop', 'logprobs': None} id='run-7ec4c116-9ed0-4780-be4c-7658ae35bf85-0' usage_metadata={'input_tokens': 13, 'output_tokens': 14, 'total_tokens': 27, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}}\n",
"---NO CODE TEST FAILURES---\n",
"---DECISION: FINISH---\n",
"content='Why do bears have hairy coats?\\n\\nBecause they look silly in sweaters!' additional_kwargs={'refusal': None} response_metadata={'token_usage': {'completion_tokens': 14, 'prompt_tokens': 13, 'total_tokens': 27, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_6fc10e10eb', 'finish_reason': 'stop', 'logprobs': None} id='run-fc294d0e-c002-4642-b719-33222acac988-0' usage_metadata={'input_tokens': 13, 'output_tokens': 14, 'total_tokens': 27, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}}\n",
"---GENERATING CODE SOLUTION---\n",
"---CHECKING CODE---\n",
"---CODE BLOCK CHECK: FAILED---\n",
"---DECISION: RE-TRY SOLUTION---\n",
"---GENERATING CODE SOLUTION---\n",
"---CHECKING CODE---\n",
"---CODE BLOCK CHECK: FAILED---\n",
"---DECISION: RE-TRY SOLUTION---\n",
"---GENERATING CODE SOLUTION---\n",
"---CHECKING CODE---\n",
"---CODE BLOCK CHECK: FAILED---\n",
"---DECISION: FINISH---\n",
"---GENERATING CODE SOLUTION---\n",
"---CHECKING CODE---\n",
"<class 'langchain_core.utils.pydantic.PromptInput'>\n",
"---NO CODE TEST FAILURES---\n",
"---DECISION: FINISH---\n",
"<class 'langchain_core.utils.pydantic.PromptInput'>\n",
"---GENERATING CODE SOLUTION---\n",
"---CHECKING CODE---\n",
"The translation of \"Where did Harrison work?\" to Italian is \"Dove lavorava Harrison?\"\n",
"---NO CODE TEST FAILURES---\n",
"---DECISION: FINISH---\n",
"The translation of \"Where did Harrison work?\" in Italian is \"Dove lavorava Harrison?\"\n",
"---GENERATING CODE SOLUTION---\n",
"---CHECKING CODE---\n",
"---CODE BLOCK CHECK: FAILED---\n",
"---DECISION: RE-TRY SOLUTION---\n",
"---GENERATING CODE SOLUTION---\n",
"---CHECKING CODE---\n",
"---CODE BLOCK CHECK: FAILED---\n",
"---DECISION: RE-TRY SOLUTION---\n",
"---GENERATING CODE SOLUTION---\n",
"---CHECKING CODE---\n",
"---CODE BLOCK CHECK: FAILED---\n",
"---DECISION: FINISH---\n",
"---GENERATING CODE SOLUTION---\n",
"---CHECKING CODE---\n",
"Itemgetter: What is the capital of France?\n",
"Lambda: What is the capital of France?\n",
"Get method: What is the capital of France?\n",
"---NO CODE TEST FAILURES---\n",
"---DECISION: FINISH---\n",
"Itemgetter: What is the capital of France?\n",
"Lambda: What is the capital of France?\n",
"Get method: What is the capital of France?\n",
"---GENERATING CODE SOLUTION---\n",
"---CHECKING CODE---\n",
"---CODE BLOCK CHECK: FAILED---\n",
"---DECISION: RE-TRY SOLUTION---\n",
"---GENERATING CODE SOLUTION---\n",
"---CHECKING CODE---\n",
"---CODE BLOCK CHECK: FAILED---\n",
"---DECISION: RE-TRY SOLUTION---\n",
"---GENERATING CODE SOLUTION---\n",
"---CHECKING CODE---\n",
"Processed question: {'question': 'how do I use Anthropic?'}\n",
"---NO CODE TEST FAILURES---\n",
"---DECISION: FINISH---\n",
"Processed question: {'question': 'how do I use Anthropic?'}\n",
"---GENERATING CODE SOLUTION---\n",
"---CHECKING CODE---\n",
"---CODE BLOCK CHECK: FAILED---\n",
"---DECISION: RE-TRY SOLUTION---\n",
"---GENERATING CODE SOLUTION---\n",
"---CHECKING CODE---\n",
"{'a': 1, 'b': 2, 'c': 3}\n",
"---NO CODE TEST FAILURES---\n",
"---DECISION: FINISH---\n",
"{'a': 1, 'b': 2, 'c': 3}\n",
"---GENERATING CODE SOLUTION---\n",
"---CHECKING CODE---\n",
"Why do bears have hairy coats?\n",
"\n",
"Because they look silly in sweaters!\n",
"---NO CODE TEST FAILURES---\n",
"---DECISION: FINISH---\n",
"Why do bears have hairy coats?\n",
"\n",
"Because they look silly in sweaters!\n",
"---GENERATING CODE SOLUTION---\n",
"---CHECKING CODE---\n",
"---CODE BLOCK CHECK: FAILED---\n",
"---DECISION: RE-TRY SOLUTION---\n",
"---GENERATING CODE SOLUTION---\n",
"---CHECKING CODE---\n",
"---CODE BLOCK CHECK: FAILED---\n",
"---DECISION: RE-TRY SOLUTION---\n",
"---GENERATING CODE SOLUTION---\n",
"---CHECKING CODE---\n",
"---CODE BLOCK CHECK: FAILED---\n",
"---DECISION: FINISH---\n",
"---GENERATING CODE SOLUTION---\n",
"---CHECKING CODE---\n",
"---CODE IMPORT CHECK: FAILED---\n",
"---DECISION: RE-TRY SOLUTION---\n",
"---GENERATING CODE SOLUTION---\n",
"---CHECKING CODE---\n",
"---CODE IMPORT CHECK: FAILED---\n",
"---DECISION: RE-TRY SOLUTION---\n",
"---GENERATING CODE SOLUTION---\n",
"---CHECKING CODE---\n",
"---CODE BLOCK CHECK: FAILED---\n",
"---DECISION: FINISH---\n",
"---GENERATING CODE SOLUTION---\n",
"---CHECKING CODE---\n",
"---CODE BLOCK CHECK: FAILED---\n",
"---DECISION: RE-TRY SOLUTION---\n",
"---GENERATING CODE SOLUTION---\n",
"---CHECKING CODE---\n",
"---CODE BLOCK CHECK: FAILED---\n",
"---DECISION: RE-TRY SOLUTION---\n",
"---GENERATING CODE SOLUTION---\n",
"---CHECKING CODE---\n",
"{'output': {'num': 1, 'num2': 2}}\n",
"---NO CODE TEST FAILURES---\n",
"---DECISION: FINISH---\n",
"{'output': {'num': 1, 'num2': 2}}\n",
"---GENERATING CODE SOLUTION---\n",
"---CHECKING CODE---\n",
"Why do bears have hairy coats?\n",
"\n",
"Because they look silly in sweaters!\n",
"---NO CODE TEST FAILURES---\n",
"---DECISION: FINISH---\n",
"Why do bears have hairy coats?\n",
"\n",
"Because they look silly in sweaters!\n",
"---GENERATING CODE SOLUTION---\n",
"---CHECKING CODE---\n",
"That's a cute joke! It plays on the word \"root,\" which connects to beets being root vegetables, and \"rootin'-tootin'\" adds a fun, playful twist. Whether it's funny or not can depend on the audience, but it definitely has a lighthearted charm!\n",
"---NO CODE TEST FAILURES---\n",
"---DECISION: FINISH---\n",
"That's a cute joke! It has a clever play on words with \"root\" and the idea of a relationship under pressure. Humor can be quite subjective, but many people would likely find it amusing, especially if they enjoy puns. Keep sharing the veggie humor!\n",
"---GENERATING CODE SOLUTION---\n",
"---CHECKING CODE---\n",
"{'joke': 'Why do bears have sticky fur?\\n\\nBecause they always use honeycombs!', 'fact': 'Bears are known for their excellent sense of smell, which is estimated to be about seven times stronger than that of dogs. This keen sense helps them locate food from great distances, including fruits, nuts, and even carrion.'}\n",
"---NO CODE TEST FAILURES---\n",
"---DECISION: FINISH---\n",
"{'joke': 'Why do bears have hairy coats?\\n\\nBecause they look silly in sweaters!', 'fact': 'Bears are known for their impressive sense of smell, which is estimated to be seven times stronger than that of a bloodhound. This keen sense helps them locate food from great distances, making it essential for their survival in the wild.'}\n",
"---GENERATING CODE SOLUTION---\n",
"---CHECKING CODE---\n",
"Why do bears have hairy coats?\n",
"\n",
"Because they look silly in sweaters!\n",
"---NO CODE TEST FAILURES---\n",
"---DECISION: FINISH---\n",
"What do you call a bear with no teeth? \n",
"\n",
"A gummy bear!\n",
"---GENERATING CODE SOLUTION---\n",
"---CHECKING CODE---\n",
"---CODE BLOCK CHECK: FAILED---\n",
"---DECISION: RE-TRY SOLUTION---\n",
"---GENERATING CODE SOLUTION---\n",
"---CHECKING CODE---\n",
"---CODE BLOCK CHECK: FAILED---\n",
"---DECISION: RE-TRY SOLUTION---\n",
"---GENERATING CODE SOLUTION---\n",
"---CHECKING CODE---\n",
"---CODE BLOCK CHECK: FAILED---\n",
"---DECISION: FINISH---\n",
"---GENERATING CODE SOLUTION---\n",
"---CHECKING CODE---\n",
"---CODE IMPORT CHECK: FAILED---\n",
"---DECISION: RE-TRY SOLUTION---\n",
"---GENERATING CODE SOLUTION---\n",
"---CHECKING CODE---\n",
"---CODE IMPORT CHECK: FAILED---\n",
"---DECISION: RE-TRY SOLUTION---\n",
"---GENERATING CODE SOLUTION---\n",
"---CHECKING CODE---\n",
"---CODE IMPORT CHECK: FAILED---\n",
"---DECISION: FINISH---\n",
"---GENERATING CODE SOLUTION---\n",
"---CHECKING CODE---\n",
"---CODE IMPORT CHECK: FAILED---\n",
"---DECISION: RE-TRY SOLUTION---\n",
"---GENERATING CODE SOLUTION---\n",
"---CHECKING CODE---\n",
"---CODE BLOCK CHECK: FAILED---\n",
"---DECISION: RE-TRY SOLUTION---\n",
"---GENERATING CODE SOLUTION---\n",
"---CHECKING CODE---\n",
"---CODE BLOCK CHECK: FAILED---\n",
"---DECISION: FINISH---\n",
"---GENERATING CODE SOLUTION---\n",
"---CHECKING CODE---\n",
"---CODE BLOCK CHECK: FAILED---\n",
"---DECISION: RE-TRY SOLUTION---\n",
"---GENERATING CODE SOLUTION---\n",
"---CHECKING CODE---\n",
"{\"joke\":\"That's a cute joke! It has a playful twist and a bit of absurdity, which can definitely make people smile. Humor often depends on the audience, but it's light-hearted and suitable for all ages. If your goal is to get a chuckle, it likely will!\",\"analysis\":\"That's a cute joke! It has a playful twist and a bit of absurdity, which can definitely make people smile. Humor often depends on the audience, but it's light-hearted and suitable for all ages. If your goal is to get a chuckle, it likely will!\"}\n",
"---NO CODE TEST FAILURES---\n",
"---DECISION: FINISH---\n",
"{\"joke\":\"That's a cute joke! It's light-hearted and has a playful twist. Humor is often subjective, but many people enjoy simple, punny jokes like this one. If it brings a smile, then it's definitely funny!\",\"analysis\":\"That's a cute joke! It's light-hearted and has a playful twist. Humor is often subjective, but many people enjoy simple, punny jokes like this one. If it brings a smile, then it's definitely funny!\"}\n",
"\n",
"Evaluation Results:\n",
" question import_check \\\n",
"0 How can I use a prompt and model to create a c... 1 \n",
"1 How can I add memory to an arbitrary chain usi... 1 \n",
"2 I've defined a LCEL runnable chain = prompt | ... 1 \n",
"3 I have a LCEL runnable, chain, and am passing ... 1 \n",
"4 I am passing text key 'foo' to my prompt and w... 1 \n",
"5 My LCEL map contains the key 'question'. What ... 1 \n",
"6 I'm invoking a LCEL chain with a map that cont... 1 \n",
"7 I’m passing {'a':1} and want to create an out... 1 \n",
"8 How can I make the output of my LCEL chain a s... 1 \n",
"9 How can I apply a custom function to one of th... 1 \n",
"10 With a RAG chain in LCEL, why are documents re... 1 \n",
"11 I am passing a map with {'num': 1} to a LCEL c... 1 \n",
"12 How can I configure the temperature of an LLM ... 1 \n",
"13 How can we apply a function call to an LLM in ... 1 \n",
"14 How can I run two LCEL chains in parallel and ... 1 \n",
"15 How can I directly pass a string to a runnable... 1 \n",
"16 How can I use a custom function to route betwe... 1 \n",
"17 How do I set up a retrieval-augmented generati... 0 \n",
"18 How can I create a LCEL chain that queries a S... 1 \n",
"19 How to structure output of an LCEL chain as a ... 1 \n",
"\n",
" execution_check \n",
"0 1 \n",
"1 0 \n",
"2 1 \n",
"3 1 \n",
"4 0 \n",
"5 1 \n",
"6 1 \n",
"7 1 \n",
"8 1 \n",
"9 0 \n",
"10 0 \n",
"11 1 \n",
"12 1 \n",
"13 1 \n",
"14 1 \n",
"15 1 \n",
"16 0 \n",
"17 0 \n",
"18 0 \n",
"19 1 \n"
]
}
],
"source": [
"import pandas as pd\n",
"\n",
"# Load the evaluation data\n",
"df = pd.read_csv(\"eval.csv\")\n",
"\n",
"# Store evaluation results\n",
"results = []\n",
"\n",
"for _, row in df.iterrows():\n",
" question = row[\"question\"]\n",
" # Run the workflow for each question\n",
" solution = app.invoke({\"messages\": [(\"user\", question)], \"iterations\": 0, \"error\": \"\"})\n",
"\n",
" # Run evaluations\n",
" import_check = check_import(solution[\"generation\"])\n",
" execution_check = check_execution(solution[\"generation\"])\n",
"\n",
" # Store results\n",
" result = {\n",
" \"question\": question,\n",
" \"import_check\": import_check[\"score\"],\n",
" \"execution_check\": execution_check[\"score\"]\n",
" }\n",
" results.append(result)\n",
"\n",
"# Convert results to dataframe\n",
"lg_df = pd.DataFrame(results)\n",
"print(\"\\nEvaluation Results:\")\n",
"print(lg_df)\n"
]
},
{
"cell_type": "code",
"execution_count": 92,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>question</th>\n",
" <th>import_check</th>\n",
" <th>execution_check</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>How can I use a prompt and model to create a c...</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>How can I add memory to an arbitrary chain usi...</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>I've defined a LCEL runnable chain = prompt | ...</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>I have a LCEL runnable, chain, and am passing ...</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>I am passing text key 'foo' to my prompt and w...</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>My LCEL map contains the key 'question'. What ...</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>I'm invoking a LCEL chain with a map that cont...</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>I’m passing {'a':1} and want to create an out...</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>How can I make the output of my LCEL chain a s...</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>How can I apply a custom function to one of th...</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>With a RAG chain in LCEL, why are documents re...</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>11</th>\n",
" <td>I am passing a map with {'num': 1} to a LCEL c...</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>12</th>\n",
" <td>How can I configure the temperature of an LLM ...</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13</th>\n",
" <td>How can we apply a function call to an LLM in ...</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14</th>\n",
" <td>How can I run two LCEL chains in parallel and ...</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>15</th>\n",
" <td>How can I directly pass a string to a runnable...</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>16</th>\n",
" <td>How can I use a custom function to route betwe...</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>17</th>\n",
" <td>How do I set up a retrieval-augmented generati...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>18</th>\n",
" <td>How can I create a LCEL chain that queries a S...</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>19</th>\n",
" <td>How to structure output of an LCEL chain as a ...</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" question import_check \\\n",
"0 How can I use a prompt and model to create a c... 1 \n",
"1 How can I add memory to an arbitrary chain usi... 1 \n",
"2 I've defined a LCEL runnable chain = prompt | ... 1 \n",
"3 I have a LCEL runnable, chain, and am passing ... 1 \n",
"4 I am passing text key 'foo' to my prompt and w... 1 \n",
"5 My LCEL map contains the key 'question'. What ... 1 \n",
"6 I'm invoking a LCEL chain with a map that cont... 1 \n",
"7 I’m passing {'a':1} and want to create an out... 1 \n",
"8 How can I make the output of my LCEL chain a s... 1 \n",
"9 How can I apply a custom function to one of th... 1 \n",
"10 With a RAG chain in LCEL, why are documents re... 1 \n",
"11 I am passing a map with {'num': 1} to a LCEL c... 1 \n",
"12 How can I configure the temperature of an LLM ... 1 \n",
"13 How can we apply a function call to an LLM in ... 1 \n",
"14 How can I run two LCEL chains in parallel and ... 1 \n",
"15 How can I directly pass a string to a runnable... 1 \n",
"16 How can I use a custom function to route betwe... 1 \n",
"17 How do I set up a retrieval-augmented generati... 0 \n",
"18 How can I create a LCEL chain that queries a S... 1 \n",
"19 How to structure output of an LCEL chain as a ... 1 \n",
"\n",
" execution_check \n",
"0 1 \n",
"1 0 \n",
"2 1 \n",
"3 1 \n",
"4 0 \n",
"5 1 \n",
"6 1 \n",
"7 1 \n",
"8 1 \n",
"9 0 \n",
"10 0 \n",
"11 1 \n",
"12 1 \n",
"13 1 \n",
"14 1 \n",
"15 1 \n",
"16 0 \n",
"17 0 \n",
"18 0 \n",
"19 1 "
]
},
"execution_count": 92,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"lg_df"
]
},
{
"cell_type": "code",
"execution_count": 93,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"---GENERATING CODE SOLUTION---\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"---CHECKING CODE---\n",
"content=\"I'm sorry, but I don't have real-time weather information. To get the current weather in New York, I recommend checking a reliable weather website or app for the most accurate and up-to-date information.\" additional_kwargs={'refusal': None} response_metadata={'token_usage': {'completion_tokens': 40, 'prompt_tokens': 17, 'total_tokens': 57, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_39a40c96a0', 'finish_reason': 'stop', 'logprobs': None} id='run-9bb0dfdc-5d7e-4378-aeb6-5edb6ac60af2-0' usage_metadata={'input_tokens': 17, 'output_tokens': 40, 'total_tokens': 57, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}}\n",
"---NO CODE TEST FAILURES---\n",
"content=\"I'm sorry, but I can't provide real-time weather updates. However, you can easily check the current weather in New York through a weather website or app. If you need general information about the climate in New York during this time of year, feel free to ask!\" additional_kwargs={'refusal': None} response_metadata={'token_usage': {'completion_tokens': 53, 'prompt_tokens': 17, 'total_tokens': 70, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_6fc10e10eb', 'finish_reason': 'stop', 'logprobs': None} id='run-e956b80e-fbd6-49b7-baf5-22950ecddfa4-0' usage_metadata={'input_tokens': 17, 'output_tokens': 53, 'total_tokens': 70, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}}\n",
"---GENERATING CODE SOLUTION---\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"2024-12-16 11:48:37,955 - 8071896128 - __init__.py-__init__:521 - WARNING: Overriding of current TracerProvider is not allowed\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"---CHECKING CODE---\n",
"---CODE IMPORT CHECK: FAILED---\n",
"---FIXING CODE---\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"2024-12-16 11:48:46,033 - 8071896128 - __init__.py-__init__:521 - WARNING: Overriding of current TracerProvider is not allowed\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"---CHECKING CODE---\n",
"---CODE BLOCK CHECK: FAILED---\n",
"---FIXING CODE---\n",
"---CHECKING CODE---\n",
"---CODE BLOCK CHECK: FAILED---\n",
"---GENERATING CODE SOLUTION---\n",
"---CHECKING CODE---\n",
"<class 'langchain_core.utils.pydantic.PromptInput'>\n",
"---NO CODE TEST FAILURES---\n",
"<class 'langchain_core.utils.pydantic.PromptInput'>\n",
"---GENERATING CODE SOLUTION---\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"2024-12-16 11:49:10,929 - 8071896128 - __init__.py-__init__:521 - WARNING: Overriding of current TracerProvider is not allowed\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"---CHECKING CODE---\n",
"---CODE BLOCK CHECK: FAILED---\n",
"---FIXING CODE---\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"2024-12-16 11:49:16,374 - 8071896128 - __init__.py-__init__:521 - WARNING: Overriding of current TracerProvider is not allowed\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"---CHECKING CODE---\n",
"---CODE BLOCK CHECK: FAILED---\n",
"---FIXING CODE---\n",
"---CHECKING CODE---\n",
"---CODE BLOCK CHECK: FAILED---\n",
"---GENERATING CODE SOLUTION---\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"2024-12-16 11:49:28,753 - 8071896128 - __init__.py-__init__:521 - WARNING: Overriding of current TracerProvider is not allowed\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"---CHECKING CODE---\n",
"---CODE BLOCK CHECK: FAILED---\n",
"---FIXING CODE---\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"2024-12-16 11:49:33,880 - 8071896128 - __init__.py-__init__:521 - WARNING: Overriding of current TracerProvider is not allowed\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"---CHECKING CODE---\n",
"---CODE BLOCK CHECK: FAILED---\n",
"---FIXING CODE---\n",
"---CHECKING CODE---\n",
"---CODE BLOCK CHECK: FAILED---\n",
"---GENERATING CODE SOLUTION---\n",
"---CHECKING CODE---\n",
"Using itemgetter: What is LCEL?\n",
"Using lambda: What is LCEL?\n",
"Using dict.get: What is LCEL?\n",
"---NO CODE TEST FAILURES---\n",
"Using itemgetter: What is LCEL?\n",
"Using lambda: What is LCEL?\n",
"Using dict.get: What is LCEL?\n",
"---GENERATING CODE SOLUTION---\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"2024-12-16 11:49:47,030 - 8071896128 - __init__.py-__init__:521 - WARNING: Overriding of current TracerProvider is not allowed\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"---CHECKING CODE---\n",
"---CODE BLOCK CHECK: FAILED---\n",
"---FIXING CODE---\n",
"---CHECKING CODE---\n",
"Processed: how do I use Anthropic?\n",
"---NO CODE TEST FAILURES---\n",
"Processed: how do I use Anthropic?\n",
"---GENERATING CODE SOLUTION---\n",
"---CHECKING CODE---\n",
"{'a': 1, 'b': 2, 'c': 3}\n",
"---NO CODE TEST FAILURES---\n",
"{'a': 1, 'b': 2, 'c': 3}\n",
"---GENERATING CODE SOLUTION---\n",
"---CHECKING CODE---\n",
"Why do bears have hairy coats?\n",
"\n",
"Because they look silly in sweaters!\n",
"---NO CODE TEST FAILURES---\n",
"Why do bears have hairy coats?\n",
"\n",
"Because they look silly in sweaters!\n",
"---GENERATING CODE SOLUTION---\n",
"---CHECKING CODE---\n",
"Yes, that's a classic! It's a pun that plays on the double meaning of \"peeling\" and \"feeling.\" It's light-hearted and can definitely get a chuckle, especially if someone enjoys wordplay. 🍌 Do you have any other jokes you'd like to share?\n",
"---NO CODE TEST FAILURES---\n",
"Yes, that's a classic and lighthearted joke! The play on words with \"peeling\" and \"feeling\" makes it amusing, especially for kids. It's a fun way to incorporate a pun into a simple setup. Plus, the banana emoji adds a nice touch!\n",
"---GENERATING CODE SOLUTION---\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"2024-12-16 11:50:27,066 - 8071896128 - __init__.py-__init__:521 - WARNING: Overriding of current TracerProvider is not allowed\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"---CHECKING CODE---\n",
"---CODE IMPORT CHECK: FAILED---\n",
"---FIXING CODE---\n",
"---CHECKING CODE---\n",
"content='Harrison worked at Initech before moving to Initrode.' additional_kwargs={'refusal': None} response_metadata={'token_usage': {'completion_tokens': 12, 'prompt_tokens': 34, 'total_tokens': 46, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_6fc10e10eb', 'finish_reason': 'stop', 'logprobs': None} id='run-4b30ea59-66cf-4ee2-ae47-eee1db927ba4-0' usage_metadata={'input_tokens': 34, 'output_tokens': 12, 'total_tokens': 46, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}}\n",
"---NO CODE TEST FAILURES---\n",
"content='Harrison worked at Initech before moving to Initrode.' additional_kwargs={'refusal': None} response_metadata={'token_usage': {'completion_tokens': 12, 'prompt_tokens': 34, 'total_tokens': 46, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_6fc10e10eb', 'finish_reason': 'stop', 'logprobs': None} id='run-f11c9940-58a8-460d-827a-0e0e4955cc74-0' usage_metadata={'input_tokens': 34, 'output_tokens': 12, 'total_tokens': 46, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}}\n",
"---GENERATING CODE SOLUTION---\n",
"---CHECKING CODE---\n",
"{'output': {'num': 1, 'num2': 2}}\n",
"---NO CODE TEST FAILURES---\n",
"{'output': {'num': 1, 'num2': 2}}\n",
"---GENERATING CODE SOLUTION---\n",
"---CHECKING CODE---\n",
"Why did the sun go to school?\n",
"\n",
"To get a little brighter!\n",
"---NO CODE TEST FAILURES---\n",
"Why did the sun go to school? \n",
"\n",
"Because it wanted to get a little brighter!\n",
"---GENERATING CODE SOLUTION---\n",
"---CHECKING CODE---\n",
"That's a clever pun! It plays on the double meaning of \"beet\" and the phrase \"beating around the bush.\" If you enjoy wordplay and puns, it's definitely a funny joke! Humor is subjective, so it might get a chuckle from some and a groan from others, but that's part of the fun with jokes like this.\n",
"---NO CODE TEST FAILURES---\n",
"That's a cute pun! The play on words with \"root\" relationships is clever and fits well with the vegetable theme. Humor is subjective, but many people who enjoy wordplay and puns might find it funny. If you're sharing it with others who appreciate light-hearted jokes, it could definitely get a chuckle!\n",
"---GENERATING CODE SOLUTION---\n",
"---CHECKING CODE---\n",
"{'joke_result': 'Why did the computer go to therapy?\\n\\nBecause it had too many bytes from its past!', 'fact_result': 'One interesting fact about computers is that the first programmable computer, the Z3, was built by German engineer Konrad Zuse in 1941. The Z3 was a pioneering machine that used electromechanical relays and was capable of performing complex calculations, laying the groundwork for modern computing technology.'}\n",
"---NO CODE TEST FAILURES---\n",
"{'joke_result': 'Why did the computer go to therapy?\\n\\nBecause it had too many bytes of emotional baggage!', 'fact_result': \"One interesting fact about computers is that the first program ever written specifically for a computer was created by Ada Lovelace in the mid-1800s. She developed an algorithm for Charles Babbage's early mechanical general-purpose computer, the Analytical Engine. This makes her one of the first computer programmers in history!\"}\n",
"---GENERATING CODE SOLUTION---\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"2024-12-16 11:51:26,267 - 8071896128 - __init__.py-__init__:521 - WARNING: Overriding of current TracerProvider is not allowed\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"---CHECKING CODE---\n",
"---CODE BLOCK CHECK: FAILED---\n",
"---FIXING CODE---\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"2024-12-16 11:51:34,908 - 8071896128 - __init__.py-__init__:521 - WARNING: Overriding of current TracerProvider is not allowed\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"---CHECKING CODE---\n",
"---CODE BLOCK CHECK: FAILED---\n",
"---FIXING CODE---\n",
"---CHECKING CODE---\n",
"---CODE BLOCK CHECK: FAILED---\n",
"---GENERATING CODE SOLUTION---\n",
"---CHECKING CODE---\n",
"That's a cute joke! The pun with \"bear-y\" is clever and adds a playful twist. While humor is subjective, many people enjoy light-hearted animal puns like this one. If you’re looking to make someone smile, it’s a good choice!\n",
"---NO CODE TEST FAILURES---\n",
"That's a cute joke! The play on the idea of bears wearing sweaters adds a humorous visual element. Humor is subjective, so while some might find it funny, others may not. It has a lighthearted, whimsical charm!\n",
"---GENERATING CODE SOLUTION---\n",
"---CHECKING CODE---\n",
"{'question': 'How can I use a custom function to route between 2 chains in LCEL?', 'import_check': 1, 'execution_check': 1}\n",
"---NO CODE TEST FAILURES---\n",
"{'question': 'How can I use a custom function to route between 2 chains in LCEL?', 'import_check': 1, 'execution_check': 1}\n",
"---GENERATING CODE SOLUTION---\n",
"---CHECKING CODE---\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"2024-12-16 11:52:17,166 - 8071896128 - __init__.py-__init__:521 - WARNING: Overriding of current TracerProvider is not allowed\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"---CODE BLOCK CHECK: FAILED---\n",
"---FIXING CODE---\n",
"---CHECKING CODE---\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"2024-12-16 11:52:30,100 - 8071896128 - __init__.py-__init__:521 - WARNING: Overriding of current TracerProvider is not allowed\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"---CODE BLOCK CHECK: FAILED---\n",
"---FIXING CODE---\n",
"---CHECKING CODE---\n",
"---CODE BLOCK CHECK: FAILED---\n",
"---GENERATING CODE SOLUTION---\n",
"---CHECKING CODE---\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"2024-12-16 11:53:04,414 - 8071896128 - __init__.py-__init__:521 - WARNING: Overriding of current TracerProvider is not allowed\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"---CODE BLOCK CHECK: FAILED---\n",
"---FIXING CODE---\n",
"---CHECKING CODE---\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"2024-12-16 11:53:12,588 - 8071896128 - __init__.py-__init__:521 - WARNING: Overriding of current TracerProvider is not allowed\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"---CODE BLOCK CHECK: FAILED---\n",
"---FIXING CODE---\n",
"---CHECKING CODE---\n",
"---CODE BLOCK CHECK: FAILED---\n",
"\n",
"Evaluation Results:\n",
" question import_check \\\n",
"0 How can I use a prompt and model to create a c... 1 \n",
"1 How can I add memory to an arbitrary chain usi... 1 \n",
"2 I've defined a LCEL runnable chain = prompt | ... 1 \n",
"3 I have a LCEL runnable, chain, and am passing ... 1 \n",
"4 I am passing text key 'foo' to my prompt and w... 1 \n",
"5 My LCEL map contains the key 'question'. What ... 1 \n",
"6 I'm invoking a LCEL chain with a map that cont... 1 \n",
"7 I’m passing {'a':1} and want to create an out... 1 \n",
"8 How can I make the output of my LCEL chain a s... 1 \n",
"9 How can I apply a custom function to one of th... 1 \n",
"10 With a RAG chain in LCEL, why are documents re... 1 \n",
"11 I am passing a map with {'num': 1} to a LCEL c... 1 \n",
"12 How can I configure the temperature of an LLM ... 1 \n",
"13 How can we apply a function call to an LLM in ... 1 \n",
"14 How can I run two LCEL chains in parallel and ... 1 \n",
"15 How can I directly pass a string to a runnable... 1 \n",
"16 How can I use a custom function to route betwe... 1 \n",
"17 How do I set up a retrieval-augmented generati... 1 \n",
"18 How can I create a LCEL chain that queries a S... 1 \n",
"19 How to structure output of an LCEL chain as a ... 1 \n",
"\n",
" execution_check \n",
"0 1 \n",
"1 0 \n",
"2 1 \n",
"3 0 \n",
"4 0 \n",
"5 1 \n",
"6 1 \n",
"7 1 \n",
"8 1 \n",
"9 1 \n",
"10 1 \n",
"11 1 \n",
"12 1 \n",
"13 1 \n",
"14 1 \n",
"15 0 \n",
"16 1 \n",
"17 1 \n",
"18 0 \n",
"19 0 \n"
]
}
],
"source": [
"# Evaluate the CrewAI flow against every question in the eval set.\n",
"eval_df = pd.read_csv(\"eval.csv\")\n",
"\n",
"def _evaluate(question):\n",
"    \"\"\"Run the code-gen flow for one question and score the result.\n",
"\n",
"    Returns a record with the question plus the 0/1 scores from the\n",
"    import and execution checks on the generated code.\n",
"    \"\"\"\n",
"    flow = CodeGenFlow()\n",
"    flow.kickoff(inputs={\"question\": question})\n",
"    generation = flow.state.generation\n",
"    return {\n",
"        \"question\": question,\n",
"        \"import_check\": check_import(generation)[\"score\"],\n",
"        \"execution_check\": check_execution(generation)[\"score\"]\n",
"    }\n",
"\n",
"# One scored record per question, collected straight into a dataframe.\n",
"ca_df = pd.DataFrame([_evaluate(q) for q in eval_df[\"question\"]])\n",
"print(\"\\nEvaluation Results:\")\n",
"print(ca_df)"
]
},
{
"cell_type": "code",
"execution_count": 94,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>question</th>\n",
" <th>import_check</th>\n",
" <th>execution_check</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>How can I use a prompt and model to create a c...</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>How can I add memory to an arbitrary chain usi...</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>I've defined a LCEL runnable chain = prompt | ...</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>I have a LCEL runnable, chain, and am passing ...</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>I am passing text key 'foo' to my prompt and w...</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>My LCEL map contains the key 'question'. What ...</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>I'm invoking a LCEL chain with a map that cont...</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>I’m passing {'a':1} and want to create an out...</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>How can I make the output of my LCEL chain a s...</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>How can I apply a custom function to one of th...</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>With a RAG chain in LCEL, why are documents re...</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>11</th>\n",
" <td>I am passing a map with {'num': 1} to a LCEL c...</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>12</th>\n",
" <td>How can I configure the temperature of an LLM ...</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13</th>\n",
" <td>How can we apply a function call to an LLM in ...</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14</th>\n",
" <td>How can I run two LCEL chains in parallel and ...</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>15</th>\n",
" <td>How can I directly pass a string to a runnable...</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>16</th>\n",
" <td>How can I use a custom function to route betwe...</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>17</th>\n",
" <td>How do I set up a retrieval-augmented generati...</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>18</th>\n",
" <td>How can I create a LCEL chain that queries a S...</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>19</th>\n",
" <td>How to structure output of an LCEL chain as a ...</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" question import_check \\\n",
"0 How can I use a prompt and model to create a c... 1 \n",
"1 How can I add memory to an arbitrary chain usi... 1 \n",
"2 I've defined a LCEL runnable chain = prompt | ... 1 \n",
"3 I have a LCEL runnable, chain, and am passing ... 1 \n",
"4 I am passing text key 'foo' to my prompt and w... 1 \n",
"5 My LCEL map contains the key 'question'. What ... 1 \n",
"6 I'm invoking a LCEL chain with a map that cont... 1 \n",
"7 I’m passing {'a':1} and want to create an out... 1 \n",
"8 How can I make the output of my LCEL chain a s... 1 \n",
"9 How can I apply a custom function to one of th... 1 \n",
"10 With a RAG chain in LCEL, why are documents re... 1 \n",
"11 I am passing a map with {'num': 1} to a LCEL c... 1 \n",
"12 How can I configure the temperature of an LLM ... 1 \n",
"13 How can we apply a function call to an LLM in ... 1 \n",
"14 How can I run two LCEL chains in parallel and ... 1 \n",
"15 How can I directly pass a string to a runnable... 1 \n",
"16 How can I use a custom function to route betwe... 1 \n",
"17 How do I set up a retrieval-augmented generati... 1 \n",
"18 How can I create a LCEL chain that queries a S... 1 \n",
"19 How to structure output of an LCEL chain as a ... 1 \n",
"\n",
" execution_check \n",
"0 1 \n",
"1 0 \n",
"2 1 \n",
"3 0 \n",
"4 0 \n",
"5 1 \n",
"6 1 \n",
"7 1 \n",
"8 1 \n",
"9 1 \n",
"10 1 \n",
"11 1 \n",
"12 1 \n",
"13 1 \n",
"14 1 \n",
"15 0 \n",
"16 1 \n",
"17 1 \n",
"18 0 \n",
"19 0 "
]
},
"execution_count": 94,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ca_df"
]
},
{
"cell_type": "code",
"execution_count": 95,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Comparison of Evaluation Results (%):\n",
" Metric LangGraph CrewAI\n",
"0 Import Check Pass Rate 95.0 100.0\n",
"1 Execution Check Pass Rate 65.0 70.0\n"
]
}
],
"source": [
"# Compare evaluation metrics between LangGraph and CrewAI approaches\n",
"def _pass_rates(results_df):\n",
"    \"\"\"Mean pass rate (%) for the import and execution checks, 2 dp.\"\"\"\n",
"    return [\n",
"        round(results_df['import_check'].mean() * 100, 2),\n",
"        round(results_df['execution_check'].mean() * 100, 2)\n",
"    ]\n",
"\n",
"# lg_df / ca_df are the per-framework result frames built in earlier cells.\n",
"comparison_df = pd.DataFrame({\n",
"    'Metric': ['Import Check Pass Rate', 'Execution Check Pass Rate'],\n",
"    'LangGraph': _pass_rates(lg_df),\n",
"    'CrewAI': _pass_rates(ca_df)\n",
"})\n",
"\n",
"print(\"\\nComparison of Evaluation Results (%):\")\n",
"print(comparison_df)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.7"
}
},
"nbformat": 4,
"nbformat_minor": 2
}