# llm_stream

Stream AI responses in real time for long-running tasks, displaying partial output immediately while routing each request to the appropriate model from 20+ providers.

## Instructions
Stream an LLM response for long-running tasks — shows output as it arrives.
Uses the same routing logic as llm_route but streams chunks instead of
waiting for the full response. Ideal for long-form generation, research
summaries, or any task where seeing partial output early is valuable.
Args:
- `prompt`: The task or question to stream.
- `task_type`: Task type hint, one of "query", "research", "generate", "analyze", "code".
- `model`: Optional model override (e.g. "openai/gpt-4o", "gemini/gemini-2.5-flash").
- `system_prompt`: Optional system instructions.
- `temperature`: Sampling temperature (0.0-2.0).
- `max_tokens`: Maximum output tokens.

## Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| prompt | Yes | The task or question to stream. | |
| task_type | No | Task type hint: "query", "research", "generate", "analyze", "code". | query |
| model | No | Optional model override (e.g. "openai/gpt-4o", "gemini/gemini-2.5-flash"). | |
| system_prompt | No | Optional system instructions. | |
| temperature | No | Sampling temperature (0.0-2.0). | |
| max_tokens | No | Maximum output tokens. | |
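As a sketch of how the schema above constrains a call: only `prompt` is required, and `task_type` falls back to `"query"`. The `build_arguments` helper below is hypothetical, shown only to illustrate which fields are required and which are optional with defaults.

```python
def build_arguments(prompt: str, **options) -> dict:
    """Assemble an arguments dict for an llm_stream call (illustrative only).

    Only `prompt` is required; `task_type` defaults to "query" per the schema.
    """
    if not prompt:
        raise ValueError("prompt is required")
    args = {"prompt": prompt, "task_type": options.get("task_type", "query")}
    # Optional fields are included only when the caller provides them.
    for key in ("model", "system_prompt", "temperature", "max_tokens"):
        if options.get(key) is not None:
            args[key] = options[key]
    return args
```

For example, `build_arguments("Summarize the report", model="openai/gpt-4o")` produces a dict with the default `task_type` of `"query"` plus the model override.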
## Output Schema

| Name | Required | Description | Default |
|---|---|---|---|
| result | Yes | | |
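The streaming behavior described under Instructions, yielding chunks as they arrive instead of waiting for the full response, can be sketched as follows. The `stream_llm` generator and its chunk size are illustrative stand-ins for the real routed provider call, not the tool's actual implementation.

```python
from typing import Iterator


def stream_llm(prompt: str, chunk_size: int = 8) -> Iterator[str]:
    """Illustrative stand-in for a provider's streaming API:
    yields the response in small chunks rather than one final string."""
    response = f"Echo: {prompt}"  # a real call would hit the routed model
    for i in range(0, len(response), chunk_size):
        yield response[i:i + chunk_size]


def consume(prompt: str) -> str:
    """Display partial output as soon as it arrives, then return the full text."""
    parts = []
    for chunk in stream_llm(prompt):
        print(chunk, end="", flush=True)  # partial output is visible immediately
        parts.append(chunk)
    print()
    return "".join(parts)
```

The key design point is that the consumer can act on each chunk (here, printing it) long before the full `result` is assembled, which is what makes streaming preferable for long-form generation.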