ollama_chat
Send multi-turn chat requests to Ollama models for conversational interactions requiring message history, such as follow-up questions or multi-step reasoning with context.
Instructions
Send a multi-turn chat completion request to an Ollama model. Use this tool for conversational interactions where message history matters, for example follow-up questions, multi-step reasoning, or dialogue with context. Do not use it for single-prompt completions without history; use ollama_generate instead to avoid the overhead of the messages array.

Prerequisites: The 'model' must already be installed locally. Call ollama_list_models to verify availability; use ollama_pull_model to download a missing model.

Behavior: Read-only (no state changes on the server) but not idempotent: each call generates a new response even with identical inputs. No authentication required. No rate limits. Network-dependent; response time varies from seconds to minutes depending on model size and prompt length. Safe to retry on timeout. On a model-not-found error, the tool returns an error object without throwing.
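The request this tool issues can be sketched as follows. This is a minimal, hypothetical helper (the function name `build_chat_request` is illustrative, not part of the tool) that assembles a body in the shape Ollama's standard `/api/chat` endpoint expects; it builds the payload offline rather than sending it:

```python
import json

def build_chat_request(model, messages, temperature=None, max_tokens=None, system=None):
    """Assemble a request body in the shape of Ollama's /api/chat endpoint."""
    if system is not None:
        # The 'system' shortcut is prepended as the first message and
        # takes precedence over any system-role message in 'messages'.
        messages = [{"role": "system", "content": system}] + [
            m for m in messages if m["role"] != "system"
        ]
    body = {"model": model, "messages": messages, "stream": False}
    options = {}
    if temperature is not None:
        options["temperature"] = temperature
    if max_tokens is not None:
        # 'max_tokens' maps to Ollama's internal 'num_predict' option.
        options["num_predict"] = max_tokens
    if options:
        body["options"] = options
    return body

request = build_chat_request(
    "llama3.1:8b",
    [{"role": "user", "content": "What is the capital of France?"}],
    temperature=0.2,
    max_tokens=256,
)
print(json.dumps(request, indent=2))
```

The `stream: false` field is set because this tool returns a single complete response rather than a token stream.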
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| model | Yes | Exact Ollama model identifier. Must match a 'name' value from ollama_list_models output (e.g., 'llama3.1:8b', 'qwen2.5:7b'). Cloud-hosted models use a '-cloud' suffix (e.g., 'deepseek-v3:671b-cloud'). If unsure which models are available, call ollama_list_models first. | |
| messages | Yes | Ordered conversation history sent to the model. Place system instructions first (role 'system'), then alternate user/assistant turns. The model sees all messages in order. If you only need a system prompt with one user message, consider using the 'system' parameter instead of a system-role message. | |
| temperature | No | Sampling temperature controlling output randomness. 0.0 = deterministic (always pick the most likely token), 2.0 = maximum creativity. Default is model-dependent, typically ~0.7. Use low values (0.0–0.3) for factual tasks, higher (0.7–1.0) for creative tasks. | |
| max_tokens | No | Maximum number of tokens to generate in the response. Maps to Ollama's internal 'num_predict' parameter. Use -1 for unlimited generation (model stops at its natural end token). Default is model-dependent, typically ~2048. | |
| system | No | System prompt prepended before the messages array. Use this as a shortcut to set model behavior without adding a system-role message to the 'messages' array. If both this field and a system-role message are provided, this field takes precedence. | |
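A typical 'messages' array for a follow-up question looks like the sketch below (the conversation content is invented for illustration). The final user turn only makes sense because the model sees the earlier turns in order:

```python
# Ordered history: system instructions first, then alternating user/assistant turns.
history = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "Who wrote Dune?"},
    {"role": "assistant", "content": "Frank Herbert."},
    # Follow-up: "it" is resolved only via the preceding turns.
    {"role": "user", "content": "When was it published?"},
]
print(len(history), history[-1]["content"])
```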
Output Schema
| Name | Required | Description | Default |
|---|---|---|---|
| message | No | The assistant's response message. | |
| model | Yes | The model that generated the response. | |
| total_duration | No | Total time in nanoseconds including load and inference. | |
| eval_count | No | Number of tokens generated in the response. | |
| error | No | Error message if the request failed (e.g., model not found). Only present on failure. | |
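Because failures surface as an 'error' field rather than a thrown exception, callers should branch on that field before reading 'message'. A minimal handling sketch, using invented sample results (the helper name `handle_chat_result` is illustrative):

```python
def handle_chat_result(result):
    """Return (reply_text, error). Exactly one of the two is None."""
    # On failure (e.g., model not found) the tool returns an error object
    # instead of throwing, so check for 'error' first.
    if "error" in result:
        return None, result["error"]
    return result["message"]["content"], None

ok = {
    "model": "llama3.1:8b",
    "message": {"role": "assistant", "content": "Paris."},
    "eval_count": 3,
    "total_duration": 1_200_000_000,  # nanoseconds
}
failed = {"model": "nope:1b", "error": "model 'nope:1b' not found"}

reply, err = handle_chat_result(ok)
bad_reply, bad_err = handle_chat_result(failed)

# total_duration is in nanoseconds; divide by 1e9 for seconds.
seconds = ok["total_duration"] / 1e9
```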