# OpenAI Compatibility
Hayhooks provides OpenAI-compatible endpoints for Haystack pipelines and agents, enabling integration with OpenAI-compatible tools and frameworks.
!!! tip "Open WebUI Integration"
    Looking to integrate with Open WebUI? Check out the complete [Open WebUI Integration](openwebui-integration.md) guide for detailed setup instructions, event handling, and advanced features.
## Overview
Hayhooks can automatically generate OpenAI-compatible endpoints if you implement the `run_chat_completion` or `run_chat_completion_async` method in your pipeline wrapper. This makes Hayhooks compatible with any OpenAI-compatible client or tool, including chat interfaces, agent frameworks, and custom applications.
## Key Features
- **Automatic Endpoint Generation**: OpenAI-compatible endpoints are created automatically
- **Streaming Support**: Real-time streaming responses for chat interfaces
- **Async Support**: High-performance async chat completion
- **Multiple Integration Options**: Works with various OpenAI-compatible clients
- **Open WebUI Ready**: Full support for [Open WebUI](openwebui-integration.md) with events and tool call interception
## Implementation
### Basic Chat Completion
```python
from pathlib import Path
from typing import Generator

from haystack import Pipeline
from hayhooks import get_last_user_message, BasePipelineWrapper, log


class PipelineWrapper(BasePipelineWrapper):
    def setup(self) -> None:
        # Initialize the pipeline from its YAML definition
        pipeline_yaml = (Path(__file__).parent / "pipeline.yml").read_text()
        self.pipeline = Pipeline.loads(pipeline_yaml)

    def run_chat_completion(self, model: str, messages: list[dict], body: dict) -> str | Generator:
        log.trace("Running pipeline with model: {}, messages: {}, body: {}", model, messages, body)

        question = get_last_user_message(messages)
        log.trace("Question: {}", question)

        # Pipeline run, returns a string
        result = self.pipeline.run({"prompt": {"query": question}})
        return result["llm"]["replies"][0]
```
### Async Chat Completion with Streaming
```python
from pathlib import Path
from collections.abc import AsyncGenerator

from haystack import AsyncPipeline
from hayhooks import async_streaming_generator, get_last_user_message, BasePipelineWrapper, log


class PipelineWrapper(BasePipelineWrapper):
    def setup(self) -> None:
        # Initialize the async pipeline from its YAML definition
        pipeline_yaml = (Path(__file__).parent / "pipeline.yml").read_text()
        self.pipeline = AsyncPipeline.loads(pipeline_yaml)

    async def run_chat_completion_async(self, model: str, messages: list[dict], body: dict) -> AsyncGenerator:
        log.trace("Running pipeline with model: {}, messages: {}, body: {}", model, messages, body)

        question = get_last_user_message(messages)
        log.trace("Question: {}", question)

        # Async streaming pipeline run
        return async_streaming_generator(
            pipeline=self.pipeline,
            pipeline_run_args={"prompt": {"query": question}},
        )
```
## Method Signatures
### run_chat_completion(...)
```python
def run_chat_completion(self, model: str, messages: list[dict], body: dict) -> str | Generator:
    """
    Run the pipeline for OpenAI-compatible chat completion.

    Args:
        model: The pipeline name
        messages: List of messages in OpenAI format
        body: Full request body with additional parameters

    Returns:
        str: Non-streaming response
        Generator: Streaming response generator
    """
```
### run_chat_completion_async(...)
```python
async def run_chat_completion_async(self, model: str, messages: list[dict], body: dict) -> str | AsyncGenerator:
    """
    Async version of run_chat_completion.

    Args:
        model: The pipeline name
        messages: List of messages in OpenAI format
        body: Full request body with additional parameters

    Returns:
        str: Non-streaming response
        AsyncGenerator: Streaming response generator
    """
```
## Generated Endpoints
When you implement chat completion methods, Hayhooks automatically creates:
### Chat Endpoints
- `/{pipeline_name}/chat` - Direct chat endpoint for a specific pipeline
- `/chat/completions` - OpenAI-compatible endpoint (routes to the model specified in the request)
- `/v1/chat/completions` - OpenAI API v1 compatible endpoint
All endpoints support the standard OpenAI chat completion request format:
```json
{
  "model": "pipeline_name",
  "messages": [
    {"role": "user", "content": "Your message"}
  ],
  "stream": false
}
```
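For example, you can send this request with the `openai` Python client. This is a minimal sketch: the pipeline name `chat_with_website` and the URL are placeholders, and the API key can be any string since Hayhooks doesn't require authentication by default.

```python
from openai import OpenAI

# Point the client at the Hayhooks server (default host and port assumed)
client = OpenAI(base_url="http://localhost:1416/v1", api_key="not-used")

response = client.chat.completions.create(
    model="chat_with_website",  # name of a deployed pipeline
    messages=[{"role": "user", "content": "Your message"}],
    stream=False,
)
print(response.choices[0].message.content)
```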
### Available Models
Use the `/v1/models` endpoint to list all deployed pipelines that support chat completion:
```bash
curl http://localhost:1416/v1/models
```
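Or, with the `openai` Python client (a minimal sketch; the URL assumes the default Hayhooks host and port):

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1416/v1", api_key="not-used")

# Each deployed pipeline that supports chat completion is listed as a model
for model in client.models.list():
    print(model.id)
```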
## Streaming Responses
### Streaming Generator
```python
from typing import Generator

from hayhooks import get_last_user_message, streaming_generator


def run_chat_completion(self, model: str, messages: list[dict], body: dict) -> Generator:
    question = get_last_user_message(messages)

    return streaming_generator(
        pipeline=self.pipeline,
        pipeline_run_args={"prompt": {"query": question}},
    )
```
### Async Streaming Generator
```python
from collections.abc import AsyncGenerator

from hayhooks import async_streaming_generator, get_last_user_message


async def run_chat_completion_async(self, model: str, messages: list[dict], body: dict) -> AsyncGenerator:
    question = get_last_user_message(messages)

    return async_streaming_generator(
        pipeline=self.pipeline,
        pipeline_run_args={"prompt": {"query": question}},
    )
```
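On the client side, streaming responses are delivered as standard OpenAI-style chunks, so any OpenAI-compatible client can consume them. A minimal sketch with the `openai` package (the pipeline name and URL are placeholders):

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1416/v1", api_key="not-used")

stream = client.chat.completions.create(
    model="chat_with_website",  # name of a deployed pipeline
    messages=[{"role": "user", "content": "Your message"}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```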
## Using Hayhooks with Haystack's OpenAIChatGenerator
Hayhooks' OpenAI-compatible endpoints can be used as a backend for Haystack's `OpenAIChatGenerator`, enabling you to create pipelines that consume other Hayhooks-deployed pipelines:
```python
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.utils import Secret
from haystack.dataclasses import ChatMessage

# Connect to a Hayhooks-deployed pipeline
client = OpenAIChatGenerator(
    model="chat_with_website",  # Your deployed pipeline name
    api_key=Secret.from_token("not-used"),  # Hayhooks doesn't require authentication by default
    api_base_url="http://localhost:1416/v1/",
    streaming_callback=lambda chunk: print(chunk.content, end=""),
)

# Use it like any OpenAI client
result = client.run([ChatMessage.from_user("What is Haystack?")])
print(result["replies"][0].text)
```
This enables powerful use cases:
- **Pipeline Composition**: Chain multiple Hayhooks pipelines together
- **Testing**: Test your pipelines using Haystack's testing tools
- **Hybrid Deployments**: Mix local and remote pipeline execution
!!! warning "Limitations"
    If you customize your pipeline wrapper to emit [Open WebUI Events](openwebui-integration.md#open-webui-events), it may break out-of-the-box compatibility with Haystack's `OpenAIChatGenerator`.
## Examples
### Sync Chat Pipeline (Non-Streaming)
```python
from haystack import Pipeline
from hayhooks import BasePipelineWrapper, get_last_user_message


class SyncChatWrapper(BasePipelineWrapper):
    def setup(self) -> None:
        from haystack.components.builders import ChatPromptBuilder
        from haystack.components.generators.chat import OpenAIChatGenerator
        from haystack.dataclasses import ChatMessage

        template = [ChatMessage.from_user("Answer: {{query}}")]
        chat_prompt_builder = ChatPromptBuilder(template=template)
        llm = OpenAIChatGenerator(model="gpt-4o-mini")

        self.pipeline = Pipeline()
        self.pipeline.add_component("chat_prompt_builder", chat_prompt_builder)
        self.pipeline.add_component("llm", llm)
        self.pipeline.connect("chat_prompt_builder.prompt", "llm.messages")

    def run_chat_completion(self, model: str, messages: list[dict], body: dict) -> str:
        question = get_last_user_message(messages)
        result = self.pipeline.run({"chat_prompt_builder": {"query": question}})
        return result["llm"]["replies"][0].text
```
### Async Streaming Pipeline
```python
from collections.abc import AsyncGenerator

from haystack import Pipeline
from hayhooks import BasePipelineWrapper, async_streaming_generator, get_last_user_message


class AsyncStreamingWrapper(BasePipelineWrapper):
    def setup(self) -> None:
        from haystack.components.builders import ChatPromptBuilder
        from haystack.components.generators.chat import OpenAIChatGenerator
        from haystack.dataclasses import ChatMessage

        template = [ChatMessage.from_user("Answer: {{query}}")]
        chat_prompt_builder = ChatPromptBuilder(template=template)
        llm = OpenAIChatGenerator(model="gpt-4o")

        self.pipeline = Pipeline()
        self.pipeline.add_component("chat_prompt_builder", chat_prompt_builder)
        self.pipeline.add_component("llm", llm)
        self.pipeline.connect("chat_prompt_builder.prompt", "llm.messages")

    async def run_chat_completion_async(self, model: str, messages: list[dict], body: dict) -> AsyncGenerator:
        question = get_last_user_message(messages)
        return async_streaming_generator(
            pipeline=self.pipeline,
            pipeline_run_args={"chat_prompt_builder": {"query": question}},
        )
```
## Request Parameters
The OpenAI-compatible endpoints accept the standard OpenAI request parameters, which are available to your wrapper through the `body` argument:
```python
def run_chat_completion(self, model: str, messages: list[dict], body: dict) -> str:
    question = get_last_user_message(messages)

    # Access additional parameters from the request body
    temperature = body.get("temperature", 0.7)
    max_tokens = body.get("max_tokens", 150)
    stream = body.get("stream", False)  # whether the client requested streaming

    # Use them in your pipeline
    result = self.pipeline.run({
        "prompt": {"query": question},
        "llm": {
            "generation_kwargs": {
                "temperature": temperature,
                "max_tokens": max_tokens,
            }
        },
    })
    return result["llm"]["replies"][0]
```
**Common parameters include:**
- `temperature`: Controls randomness (0.0 to 2.0)
- `max_tokens`: Maximum number of tokens to generate
- `stream`: Enable streaming responses
- `stop`: Stop sequences
- `top_p`: Nucleus sampling parameter
See the [OpenAI API reference](https://platform.openai.com/docs/api-reference/chat/create) for the complete list of parameters.
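If you want a single wrapper to honor the client's `stream` flag, you can branch on it and return either a plain string or a streaming generator, since both return types are supported (see the method signatures above). A minimal sketch, assuming the same `prompt`/`llm` pipeline used in the earlier examples:

```python
from typing import Generator

from hayhooks import get_last_user_message, streaming_generator


def run_chat_completion(self, model: str, messages: list[dict], body: dict) -> str | Generator:
    question = get_last_user_message(messages)
    run_args = {"prompt": {"query": question}}

    # If the client asked for streaming, return a generator and let Hayhooks
    # stream the chunks; otherwise run the pipeline and return the full reply.
    if body.get("stream", False):
        return streaming_generator(pipeline=self.pipeline, pipeline_run_args=run_args)

    result = self.pipeline.run(run_args)
    return result["llm"]["replies"][0]
```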
## Next Steps
- [Open WebUI Integration](openwebui-integration.md) - Use Hayhooks with Open WebUI chat interface
- [Examples](../examples/overview.md) - Working examples and use cases
- [File Upload Support](file-upload-support.md) - Handle file uploads in pipelines