ask
Send prompts to any of several AI backends. Smart routing selects the best model for the task's complexity, with automatic fallback and dynamic token scaling.
Instructions
MULTI-AI Direct Query - ask any backend, with smart fallback chains. Features automatic Unity detection, dynamic token scaling, and response headers that track which backend answered.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| model | Yes | AI backend to query: auto (smart routing selects optimal backend), local (autodiscover vLLM/llama.cpp/LM Studio), gemini (Gemini Enhanced, 32K tokens), nvidia_deepseek (NVIDIA DeepSeek with streaming + reasoning, 8K tokens), nvidia_qwen (NVIDIA Qwen3 Coder 480B, 32K tokens), openai (OpenAI GPT-5.2, 128K context, premium reasoning), groq (Llama 3.3 70B, ultra-fast 500+ t/s) | |
| prompt | Yes | Your question or prompt (Unity/complex generations automatically get high token limits) | |
| thinking | No | Enable thinking mode for DeepSeek (shows reasoning) | |
| max_tokens | No | Maximum response length (auto-calculated if not specified: Unity=16K, Complex=8K, Simple=2K) | |
| enable_chunking | No | Enable automatic request chunking for extremely large generations (fallback if truncated) | |
| force_backend | No | Force specific backend (bypasses smart routing) - use backend keys like "local", "gemini", "nvidia_deepseek", "nvidia_qwen", "openai", "groq" | |
| model_profile | No | Router mode model profile for local backend. Available profiles: coding-reap25b (complex refactoring, ~25s), coding-seed-coder (standard coding, ~8s), coding-qwen-7b (fast coding, ~10s), agents-qwen3-14b (multi-agent, ~10s), agents-seed-coder (high throughput, ~8s), fast-deepseek-lite (quick analysis, ~8s), fast-qwen14b (fast coding, ~12s) | |
| auto_profile | No | Enable automatic profile selection based on task type detection. When true, auto-selects coding-seed-coder for coding tasks if no explicit model_profile is set. | |
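The schema above can be exercised with argument sets like the following. These are illustrative request payloads only; the field names come from the Input Schema, but the prompt text and token values are made up.

```python
# Simple query: smart routing picks the backend, max_tokens is auto-calculated.
simple_query = {
    "model": "auto",
    "prompt": "Summarize the tradeoffs between vLLM and llama.cpp.",
}

# Forced query: bypass smart routing, pin a local profile, and set an
# explicit token limit (which skips the auto-calculation tiers).
forced_query = {
    "model": "local",
    "prompt": "Refactor this service class into smaller units.",
    "force_backend": "local",
    "model_profile": "coding-seed-coder",
    "max_tokens": 8_000,
    "thinking": False,
}
```

Note that `model` is the only backend field required; `force_backend` is only needed when you want to override the router's choice.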