ask
Send a prompt to one AI backend and return the response. Smart routing selects the optimal backend based on task complexity, or specify a model directly.
Instructions
Send one prompt to one AI backend and return the response. Setting model to 'auto' lets SAB's router pick the best backend based on task complexity and current health; passing a specific model name forces that provider.

Use this for direct LLM queries that don't fit a more specialized tool. For multi-backend consensus on the same prompt, use council. For agentic multi-step work with a defined role, use spawn_subagent. For LLM-driven file generation or editing, use generate_file / modify_file so the file content stays out of Claude's context window.

Read-only: makes one HTTP call to the chosen backend.

Returns: {success, model, requested_backend, actual_backend, prompt (truncated preview), response (the LLM output), backend_used, fallback_chain, response_time, cache_status, thinking_enabled, max_tokens, was_truncated, smart_routing_applied, routing, processing_time}.
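
As a rough illustration of how a caller might read the returned object, the sketch below summarizes a result dict. The field names come from the Returns list above; the `summarize_ask_result` helper, the sample values, and the field types are assumptions made for illustration, not part of the tool.

```python
from typing import Any


def summarize_ask_result(result: dict[str, Any]) -> str:
    """Build a one-line summary of an `ask` result (hypothetical helper)."""
    if not result.get("success"):
        return "ask failed"
    parts = [f"answered by {result.get('actual_backend', 'unknown')}"]
    # Assumption: a requested/actual mismatch means routing or fallback
    # chose a different backend than the one asked for.
    if result.get("requested_backend") != result.get("actual_backend"):
        parts.append(f"requested {result.get('requested_backend')}")
    # Assumption: was_truncated is a boolean flag.
    if result.get("was_truncated"):
        parts.append("output truncated")
    return "; ".join(parts)


# Sample values are invented for illustration only.
sample = {
    "success": True,
    "model": "auto",
    "requested_backend": "gemini",
    "actual_backend": "groq",
    "response": "Hello!",
    "was_truncated": False,
}
print(summarize_ask_result(sample))
```

Checking was_truncated before using response is one way to decide whether to retry with a larger max_tokens or with enable_chunking.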
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| model | Yes | AI backend to query: auto (smart routing selects optimal backend), local (autodiscover vLLM/llama.cpp/LM Studio), gemini (Gemini Enhanced, 32K tokens), nvidia_deepseek (NVIDIA DeepSeek with streaming + reasoning, 8K tokens), nvidia_qwen (NVIDIA Qwen3 Coder 480B, 32K tokens), openai (OpenAI GPT-5.2, 128K context, premium reasoning), groq (Llama 3.3 70B, ultra-fast 500+ t/s). The friendly aliases `deepseek` and `qwen3` are also accepted (mapped to nvidia_deepseek / nvidia_qwen), matching the other tools. | |
| prompt | Yes | Your question or prompt (Unity/complex generations automatically get high token limits) | |
| thinking | No | Enable thinking mode for DeepSeek (shows reasoning) | |
| max_tokens | No | Maximum response length (auto-calculated if not specified: Unity=16K, Complex=8K, Simple=2K) | |
| enable_chunking | No | Enable automatic request chunking for extremely large generations (fallback if truncated) | |
| force_backend | No | Force specific backend (bypasses smart routing) - use backend keys like "local", "gemini", "nvidia_deepseek", "nvidia_qwen", "openai", "groq" | |
| model_profile | No | Router mode model profile for local backend. Available profiles: coding-reap25b (complex refactoring, ~25s), coding-seed-coder (standard coding, ~8s), coding-qwen-7b (fast coding, ~10s), agents-qwen3-14b (multi-agent, ~10s), agents-seed-coder (high throughput, ~8s), fast-deepseek-lite (quick analysis, ~8s), fast-qwen14b (fast coding, ~12s) | |
| auto_profile | No | Enable automatic profile selection based on task type detection. When true, auto-selects coding-seed-coder for coding tasks if no explicit model_profile is set. | |
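
The schema above can be sketched as a small client-side check that assembles an arguments object before the tool is called. The parameter names and backend keys are taken from the table; the `build_ask_arguments` helper is hypothetical, and how the resulting dict is sent to the server depends on the client and is not shown.

```python
from typing import Any

# Backend keys and friendly aliases as listed in the schema table.
BACKENDS = {"auto", "local", "gemini", "nvidia_deepseek", "nvidia_qwen", "openai", "groq"}
ALIASES = {"deepseek": "nvidia_deepseek", "qwen3": "nvidia_qwen"}
OPTIONAL_PARAMS = {
    "thinking", "max_tokens", "enable_chunking",
    "force_backend", "model_profile", "auto_profile",
}


def build_ask_arguments(model: str, prompt: str, **optional: Any) -> dict[str, Any]:
    """Assemble an arguments dict for `ask` (hypothetical client-side helper)."""
    if model not in BACKENDS and model not in ALIASES:
        raise ValueError(f"unknown model: {model!r}")
    if not prompt:
        raise ValueError("prompt is required")
    unknown = set(optional) - OPTIONAL_PARAMS
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    # Aliases are passed through unchanged; per the table, the tool maps them.
    return {"model": model, "prompt": prompt, **optional}


args = build_ask_arguments("auto", "Explain tail recursion in two sentences.", max_tokens=2000)
print(args)
```

Validating on the client side is optional; it only catches misspelled keys early instead of relying on the server's error response.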