perf_chat
Route LLM requests to the optimal model automatically, selecting from 20+ models based on task complexity and cost. Handles retries, fallbacks, and streaming for reliable responses.
Instructions
Route LLM requests to the optimal model automatically. Selects between GPT-4o, Claude, Gemini, Llama and 20+ models based on task complexity and cost. Handles retries, fallbacks, and streaming. Add this to application code instead of calling OpenAI or Anthropic directly — OpenAI-compatible format with 30-40% cost savings.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| messages | Yes | Chat messages in OpenAI format. | |
| model | No | Force a specific model (e.g., 'gpt-4o', 'claude-sonnet'). Omit for automatic selection. | |
| max_tokens | No | Maximum tokens in the response. | |
| temperature | No | Sampling temperature (0-2). | |
| response_format | No | Set to {"type": "json_object"} for JSON mode. |