
Implementing Tool Functionality in Conversational AI


Tags: AI, assistant tools, OpenAI, TypeScript, PoC, open source

As part of building Glama, I am trying to build a deeper understanding of the concepts behind existing services, such as OpenAI's assistant tools. So I decided to write a small PoC that attempts to replicate the functionality.

What are assistant tools?

This blog post captures the concepts behind tools well. In short, tools are a way to define a set of functions that the model can call in response to user queries. Furthermore, the model can call multiple functions in sequence to answer complex queries: it can deduce the correct order of function calls to complete a task, eliminating the need for complex routing logic.

Practical examples of tools include:

  • Fetching information from external sources (e.g. fetching the current weather in a given location)
  • Calculating complex mathematical expressions (e.g. calculating the total cost of a shopping cart)
  • Performing actions on the user's behalf (e.g. sending an email)
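
For reference, this is roughly how a tool is described in OpenAI's own chat completions API, which is the format this PoC replicates:

{
  "type": "function",
  "function": {
    "name": "getCurrentWeather",
    "description": "Get the current weather in a given location.",
    "parameters": {
      "type": "object",
      "properties": {
        "location": { "type": "string" }
      },
      "required": ["location"]
    }
  }
}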

However, not all models support tools. I wanted to write my own routing implementation so that I could enable access to the tools for all models.

Implementing tools

I started by writing a simple test case that describes the happy path.

import { routeMessage } from './routeMessage';
import { expect, it } from 'vitest';
import { z } from 'zod';

it('uses tools if relevant tools are available', async () => {
  const plan = await routeMessage({
    messages: [
      {
        content: 'What is 2+2?',
        role: 'user' as const,
      },
    ],
    tools: [
      {
        description: 'Adds two numbers.',
        execute: async ({ a, b }) => {
          return {
            sum: a + b,
          };
        },
        name: 'addNumbers',
        parameters: z.object({
          a: z.number(),
          b: z.number(),
        }),
        response: z.object({
          sum: z.number(),
        }),
      },
    ],
  });

  expect(plan).toEqual({
    actions: [
      {
        name: 'addNumbers',
        parameters: {
          a: 2,
          b: 2,
        },
      },
    ],
  });
});

The expectation is that the routeMessage function will understand that the user is asking for the sum of two numbers, and will produce a plan instructing us to call the addNumbers tool to get the result.

In order to do this, we need to:

  • Describe the tools available to the model
  • Provide the model with the conversation history

Describing tools

Tool descriptions need to be expressed in a way that the model can understand. I simply defaulted to using JSON.

The only complexity here is that I've used zod to describe the expected parameter and response schemas of the tools. So the first thing we need to do is convert the zod schemas to JSON schema.

import { zodToJsonSchema } from 'zod-to-json-schema';
import { type AnyZodObject, z } from 'zod';

type Tool<P extends AnyZodObject, R extends AnyZodObject> = {
  description: string;
  execute: (parameters?: z.infer<P>) => Promise<z.infer<R>>;
  name: string;
  parameters?: P;
  response: R;
};

const describeTool = (tool: Tool<AnyZodObject, AnyZodObject>) => {
  return {
    description: tool.description,
    name: tool.name,
    parameters: tool.parameters ? zodToJsonSchema(tool.parameters) : null,
  };
};

describeTool is a helper function that we will use to serialize tools to JSON.
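
For the addNumbers tool from the test above, describeTool yields something like the following (the exact JSON schema output depends on the zod-to-json-schema version):

{
  "description": "Adds two numbers.",
  "name": "addNumbers",
  "parameters": {
    "type": "object",
    "properties": {
      "a": { "type": "number" },
      "b": { "type": "number" }
    },
    "required": ["a", "b"],
    "additionalProperties": false
  }
}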

Now that we have a way to describe tools, we need to describe the conversation history.

Describing conversation history

I am using the ai library's message format because most readers are familiar with it. However, this implementation does not depend on any specific framework.

import { type CoreMessage } from 'ai';

export const routeMessage = async ({
  messages,
  tools,
}: {
  messages: CoreMessage[];
  tools: ReadonlyArray<Tool<AnyZodObject, AnyZodObject>>;
}) => {
  // ...
};

I like the ai library message format because it captures every important aspect of the conversation, including the role of each participant, the content of the message, and the tools used. We need the conversation history to include the tool invocations so that the model has context about which tools have already been used in the conversation.
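
For example, after the addNumbers tool has been used once, the history would look like this (these are the same tool-call and tool-result message shapes we will record later):

const messages: CoreMessage[] = [
  {
    content: 'What is 2+2?',
    role: 'user',
  },
  {
    // The assistant's decision to call the tool...
    content: [
      {
        args: { a: 2, b: 2 },
        toolCallId: 'call-1',
        toolName: 'addNumbers',
        type: 'tool-call',
      },
    ],
    role: 'assistant',
  },
  {
    // ...and the result of that call.
    content: [
      {
        result: { sum: 4 },
        toolCallId: 'call-1',
        toolName: 'addNumbers',
        type: 'tool-result',
      },
    ],
    role: 'tool',
  },
];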

Writing the prompt

In the end, the entire routing logic is expressed as a prompt.

I've experimented with different prompts, and this is what I landed on.

You are an assistant with access to the following tools:

[tools]

Tools are described in the following format:

* "description" describes what the tool does.
* "name" is the name of the tool.
* "parameters" is the JSON schema of the tool parameters (or null if the tool does not have parameters).

You are also given the following conversation history:

[messages]

The conversation history is a list of messages exchanged between the user and the assistant. It may also describe previous actions taken by the assistant.

Based on the conversation history, and the tools you have access to, propose a plan for how to answer the user's question.

The response should be a JSON object with an "actions" property, which is an array of tools to use. Each tool is represented as an object with the following properties:

* "name": the name of the tool to use.
* "parameters": the parameters to pass to the tool (or null if the tool does not have parameters).

The same tool can be used multiple times in the plan.

If the conversation does not necessitate the use of tools, respond with an empty action plan, e.g.,

{
  "actions": []
}

Implementing the routing logic

Now that we have all the pieces in place, we can put them together.

quickPrompt is a simple utility function that I use to execute prompts with an expected response schema.
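
quickPrompt itself is not part of this post, but a minimal sketch of it could be built on the ai SDK's generateObject (the 'openai@gpt-4o-mini' model naming is a convention internal to this codebase):

import { generateObject } from 'ai';
import { openai } from '@ai-sdk/openai';
import { type AnyZodObject, type z } from 'zod';

// A minimal sketch of `quickPrompt`; the actual implementation may differ.
// `name` identifies the prompt, e.g. for logging (assumption).
export const quickPrompt = async <T extends AnyZodObject>({
  model,
  name,
  prompt,
  zodSchema,
}: {
  model: string;
  name: string;
  prompt: string;
  zodSchema: T;
}): Promise<z.infer<T>> => {
  // 'openai@gpt-4o-mini' -> 'gpt-4o-mini'
  const [, modelId = model] = model.split('@');

  const { object } = await generateObject({
    model: openai(modelId),
    prompt,
    schema: zodSchema,
  });

  return object;
};

With that in place, here is the complete implementation: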

import { quickPrompt } from './quickPrompt';
import { type CoreMessage } from 'ai';
import multiline from 'multiline-ts';
import { type AnyZodObject, z } from 'zod';
import { zodToJsonSchema } from 'zod-to-json-schema';

// The model-proposed tool parameters are expected to be a flat record of
// primitive values.
const SerializableZodSchema = z.record(
  z.union([z.string(), z.number(), z.boolean()]),
);

type Tool<P extends AnyZodObject, R extends AnyZodObject> = {
  description: string;
  execute: (parameters?: z.infer<P>) => Promise<z.infer<R>>;
  name: string;
  parameters?: P;
  response: R;
};

const describeTool = (tool: Tool<AnyZodObject, AnyZodObject>) => {
  return {
    description: tool.description,
    name: tool.name,
    parameters: tool.parameters ? zodToJsonSchema(tool.parameters) : null,
  };
};

export const routeMessage = async ({
  messages,
  tools,
}: {
  messages: CoreMessage[];
  tools: ReadonlyArray<Tool<AnyZodObject, AnyZodObject>>;
}) => {
  return await quickPrompt({
    model: 'openai@gpt-4o-mini',
    name: 'routeMessage',
    prompt: multiline`
      You are an assistant with access to the following tools:

      ${JSON.stringify(
        tools.map((tool) => {
          return describeTool(tool);
        })
      )}

      Tools are described in the following format:

      * "description" describes what the tool does.
      * "name" is the name of the tool.
      * "parameters" is the JSON schema of the tool parameters (or null if the tool does not have parameters).
    
      You are also given the following conversation history:

      ${JSON.stringify(messages)}

      The conversation history is a list of messages exchanged between the user and the assistant. It may also describe previous actions taken by the assistant.
      
      Based on the conversation history, and the tools you have access to, propose a plan for how to answer the user's question.

      The response should be a JSON object with an "actions" property, which is an array of tools to use. Each tool is represented as an object with the following properties:

      * "name": the name of the tool to use.
      * "parameters": the parameters to pass to the tool (or null if the tool does not have parameters).
      
      The same tool can be used multiple times in the plan.

      If the conversation does not necessitate the use of tools, respond with an empty action plan, e.g.,

      {
        "actions": []
      }
    `,
    zodSchema: z.object({
      actions: z.array(
        z.object({
          name: z.string(),
          parameters: SerializableZodSchema.nullable(),
        }),
      ),
    }),
  });
};

This function captures the essence of the routing logic. It takes the conversation history and the tools available to the model, and returns a plan for how to answer the user's question.

Evaluating the routing logic

The idea is that every time the user asks a question, we should use routeMessage to determine if the question requires the use of tools, and if so, which tools to use.

Inside Glama, I am using routeMessage in the following way:

import { randomUUID } from 'node:crypto';

// Cap the number of routing cycles to avoid an infinite loop.
// (The exact limit is illustrative.)
const MAX_TOOL_CYCLES = 5;

let toolCycle = 0;

// We may have multiple cycles of invocations.
// See the explanation after this code example.
while (toolCycle < MAX_TOOL_CYCLES) {
  toolCycle += 1;

  const plan = await routeMessage({
    messages,
    tools,
  });

  // If `routeMessage` returns an empty plan, it means that the conversation does not require the use of tools.
  if (plan.actions.length === 0) {
    break;
  }

  // For each tool we use, we need to record the invocation and the result in the conversation history.
  for (const action of plan.actions) {
    const toolCallId = randomUUID();

    messages.push({
      content: [
        {
          args: action.parameters,
          toolCallId,
          toolName: action.name,
          type: 'tool-call' as const,
        },
      ],
      role: 'assistant',
    });

    const result = await invokeUserTool(action.name, action.parameters);

    messages.push({
      content: [
        {
          result,
          toolCallId,
          toolName: action.name,
          type: 'tool-result' as const,
        },
      ],
      role: 'tool',
    });
  }
}

// Now pass the messages history to the LLM which will use the recorded tool calls to generate a response.
await streamAssistantResponse({
  messages,
  signal: abortController.signal,
  visitor: context.visitor,
});
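
The invokeUserTool function is also not shown above; a minimal sketch of it looks the tool up by name, validates the proposed parameters, and executes it (it closes over the same `tools` array passed to routeMessage):

const invokeUserTool = async (name: string, parameters: unknown) => {
  const tool = tools.find((candidate) => candidate.name === name);

  if (!tool) {
    throw new Error(`Unknown tool: ${name}`);
  }

  // Validate the model-proposed parameters before executing the tool.
  const validatedParameters = tool.parameters
    ? tool.parameters.parse(parameters)
    : undefined;

  return await tool.execute(validatedParameters);
};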

The loop above is mostly self-explanatory. The only tricky part is that we need to invoke routeMessage in a loop because the model may need to use multiple tools to answer the user's question. For example, if the user asks 'What's the weather in New York?', the model may first use a tool to geocode the location and then use another tool to fetch the current weather at the resulting coordinates.
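
In terms of plans, that weather question would play out across multiple cycles (the tool names here are hypothetical):

// Cycle 1: the model only knows the location name.
{ "actions": [{ "name": "geocodeLocation", "parameters": { "location": "New York" } }] }

// Cycle 2: the geocoding result is now part of the conversation history.
{ "actions": [{ "name": "getCurrentWeather", "parameters": { "latitude": 40.71, "longitude": -74.01 } }] }

// Cycle 3: no further tools are needed, so the loop exits.
{ "actions": [] }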

And that is the entirety of the routing logic.

Benefits of self-implemented routing logic

In the end, I prefer to implement the routing logic myself because:

  • It allows me to use tools with models that do not natively support tools.
  • I can expand the logic for how the tools are resolved. For example, I may want to load only a subset of tools based on the user's prompt, or prioritize tools by frequency of use.
  • There is no ambiguity about the cost of using tools. To this day, I have no clue what OpenAI's pricing for using tools is.