DeepSeek Chat Completion
deepseek_chatChat with DeepSeek V4 models (flash for speed, pro for capability) offering 1M context, multi-turn sessions, function calling, thinking mode, JSON output, and multimodal input.
Instructions
Chat with DeepSeek V4 models. deepseek-v4-flash (fast, economical) and deepseek-v4-pro (most capable), both 1M context with optional chain-of-thought thinking mode. deepseek-chat and deepseek-reasoner are accepted as backward-compatible aliases (resolve to v4-flash). Features: multi-turn sessions (session_id), function calling (tools parameter), thinking mode, JSON output mode, multimodal input (when enabled), automatic cost tracking, and model fallback with circuit breaker resilience.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| messages | Yes | Array of conversation messages. Each message has role (system/user/assistant/tool) and content (string or array of content parts for multimodal). Tool messages require tool_call_id. | |
| model | No | Model to use. deepseek-v4-flash (default, fast/economical) or deepseek-v4-pro (most capable), both 1M context, up to 384K output. Non-thinking by default for speed; pass thinking:{type:"enabled"} to reason. Aliases: deepseek-chat -> v4-flash non-thinking, deepseek-reasoner -> v4-flash thinking. | deepseek-v4-flash |
| temperature | No | Sampling temperature (0-2). Higher = more random. Default: 1.0. Ignored when thinking mode is enabled. | |
| max_tokens | No | Maximum tokens to generate. V4 models support up to 384000 output tokens. | |
| stream | No | Enable streaming mode. Returns full response after streaming completes. | |
| tools | No | Array of tool definitions for function calling. Each tool has type "function" and a function object with name, description, and parameters (JSON Schema). | |
| tool_choice | No | Controls which tool the model calls. "auto" (default), "none", "required", or {type:"function",function:{name:"..."}} | |
| thinking | No | Toggle chain-of-thought thinking mode. Use {type: "enabled"} to reason, {type: "disabled"} for a fast direct answer (the default here). When enabled, temperature/top_p are ignored. | |
| reasoning_effort | No | Reasoning effort while thinking mode is active: "high" (default) or "max". Only applies when thinking is enabled. | |
| json_mode | No | Enable JSON output mode. The model will output valid JSON. Include the word "json" in your prompt for best results. Supported by both models. | |
| session_id | No | Session ID for multi-turn conversations. When provided, previous messages from this session are prepended to the current messages. If the session does not exist, it is created automatically. Omit for stateless single-turn requests. |
Output Schema
| Name | Required | Description | Default |
|---|---|---|---|
| content | Yes | ||
| reasoning_content | No | ||
| model | Yes | ||
| usage | Yes | ||
| finish_reason | Yes | ||
| tool_calls | No | ||
| cost_usd | No | ||
| routed_from | No | ||
| session_id | No |