llm_stream
Stream LLM responses in real time for long-running tasks such as research summaries and content generation. Partial output arrives early while the model continues generating.
Instructions
Stream an LLM response for long-running tasks, showing output as it arrives.
Uses the same routing logic as llm_route but streams chunks instead of waiting for the full response. Ideal for long-form generation, research summaries, or any task where seeing partial output early is valuable.
Args:
  prompt: The task or question to stream.
  task_type: Task type hint ("query", "research", "generate", "analyze", "code").
  model: Optional model override (e.g. "openai/gpt-4o", "gemini/gemini-2.5-flash").
  system_prompt: Optional system instructions.
  temperature: Sampling temperature (0.0-2.0).
  max_tokens: Maximum output tokens.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
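| prompt | Yes | The task or question to stream. | |
| task_type | No | Task type hint: "query", "research", "generate", "analyze", or "code". | query |
| model | No | Optional model override (e.g. "openai/gpt-4o", "gemini/gemini-2.5-flash"). | |
| system_prompt | No | Optional system instructions. | |
| temperature | No | Sampling temperature (0.0-2.0). | |
| max_tokens | No | Maximum output tokens. | |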
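As an illustration of the input schema, the sketch below assembles and checks an arguments payload before it is sent to llm_stream. The helper name `build_llm_stream_args` is hypothetical and not part of the tool. Rejecting unlisted `task_type` values and non-positive `max_tokens` are assumptions made for this example; the source only documents the listed hints and the 0.0-2.0 temperature range.

```python
import json

# Task type hints listed in the tool description.
TASK_TYPES = {"query", "research", "generate", "analyze", "code"}


def build_llm_stream_args(prompt, task_type="query", model=None,
                          system_prompt=None, temperature=None,
                          max_tokens=None):
    """Hypothetical helper: validate inputs and return an arguments dict."""
    if not prompt:
        raise ValueError("prompt is required")
    # Assumption: treat the documented hints as the only accepted values.
    if task_type not in TASK_TYPES:
        raise ValueError(f"unknown task_type: {task_type!r}")
    if temperature is not None and not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature must be between 0.0 and 2.0")
    # Assumption: a token limit only makes sense if it is positive.
    if max_tokens is not None and max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    args = {
        "prompt": prompt,
        "task_type": task_type,
        "model": model,
        "system_prompt": system_prompt,
        "temperature": temperature,
        "max_tokens": max_tokens,
    }
    # Leave out optional fields that were not set.
    return {key: value for key, value in args.items() if value is not None}


example_args = build_llm_stream_args(
    "Summarize recent work on retrieval-augmented generation.",
    task_type="research",
    model="gemini/gemini-2.5-flash",
    temperature=0.3,
)
print(json.dumps(example_args, indent=2))
```

Omitting unset optional fields leaves the choice of model and sampling settings to the tool's own routing and defaults.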
Output Schema
| Name | Required | Description | Default |
|---|---|---|---|
| result | Yes | | |
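The source does not specify how chunks are delivered to the client or how they relate to the final `result` field. As a minimal sketch, assuming chunks arrive as an ordered sequence of text fragments that concatenate into the full response, a consumer could display each fragment immediately and still keep the complete text. Both `fake_stream` and `collect_stream` are hypothetical names used only for this illustration.

```python
from typing import Callable, Iterable, Iterator


def fake_stream() -> Iterator[str]:
    """Stand-in for a streamed response: yields text chunks in order."""
    yield from ["Streaming ", "shows ", "partial ", "output."]


def collect_stream(chunks: Iterable[str],
                   on_chunk: Callable[[str], None]) -> str:
    """Pass each chunk to on_chunk as it arrives, then return the full text."""
    parts = []
    for chunk in chunks:
        on_chunk(chunk)       # e.g. print or append to a UI element
        parts.append(chunk)   # keep the chunk for the final result
    return "".join(parts)


seen = []
full_text = collect_stream(fake_stream(), seen.append)
```

Handling each chunk inside the loop is what makes partial output visible early; the joined string is only available once the stream ends.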