
ask

Query multiple AI backends with smart routing and automatic fallback chains. Features dynamic token scaling, Unity detection, and response tracking across local, Gemini, DeepSeek, and Qwen models.

Instructions

🤖 MULTI-AI Direct Query - Ask any backend with BLAZING FAST smart fallback chains! Features automatic Unity detection, dynamic token scaling, and response headers with backend tracking.
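
The fallback behavior itself is not spelled out on this page. A minimal sketch of what a fallback chain over the router's makeRequest (shown in the Implementation Reference below) could look like; the function name, chain contents, and loop structure are assumptions, not the server's actual code:

    // Hypothetical sketch: try each backend in order until one succeeds.
    async function askWithFallback(router, prompt, chain, options) {
      let lastError;
      for (const backend of chain) {
        try {
          return await router.makeRequest(prompt, backend, options);
        } catch (error) {
          lastError = error; // this backend failed; try the next one in the chain
        }
      }
      throw lastError; // every backend in the chain failed
    }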

Input Schema

Name | Required | Description | Default
enable_chunking | No | Enable automatic request chunking for extremely large generations (fallback if truncated) | false
force_backend | No | Force a specific backend (bypasses smart routing); use backend keys like "local", "gemini", "nvidia_deepseek", "nvidia_qwen" | -
max_tokens | No | Maximum response length; auto-calculated if not specified (Unity=16K, Complex=8K, Simple=2K; see the sketch after this table) | -
model | Yes | AI backend to query: local (Qwen2.5-Coder-7B-Instruct-FP8-Dynamic, 128K+ tokens), gemini (Gemini Enhanced, 32K tokens), deepseek3.1 (NVIDIA DeepSeek V3.1, 8K tokens), qwen3 (NVIDIA Qwen3 Coder 480B, 32K tokens) | -
prompt | Yes | Your question or prompt (Unity/complex generations automatically get high token limits) | -
thinking | No | Enable thinking mode for DeepSeek (shows reasoning) | true
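
The max_tokens auto-calculation is not shown in the code references below. A minimal sketch of what the Unity=16K / Complex=8K / Simple=2K scaling could look like; the function name, keyword heuristic, and length threshold are assumptions, not the server's actual logic:

    // Hypothetical sketch of dynamic token scaling based on the prompt.
    function autoMaxTokens(prompt) {
      if (/unity|monobehaviour|gameobject/i.test(prompt)) return 16384; // Unity generations get 16K
      if (prompt.length > 800) return 8192;                             // complex prompts get 8K
      return 2048;                                                      // simple questions get 2K
    }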

Input Schema (JSON Schema)

{ "properties": { "enable_chunking": { "default": false, "description": "Enable automatic request chunking for extremely large generations (fallback if truncated)", "type": "boolean" }, "force_backend": { "description": "Force specific backend (bypasses smart routing) - use backend keys like \"local\", \"gemini\", \"nvidia_deepseek\", \"nvidia_qwen\"", "type": "string" }, "max_tokens": { "description": "Maximum response length (auto-calculated if not specified: Unity=16K, Complex=8K, Simple=2K)", "type": "number" }, "model": { "description": "AI backend to query: local (Qwen2.5-Coder-7B-Instruct-FP8-Dynamic, 128K+ tokens), gemini (Gemini Enhanced, 32K tokens), deepseek3.1 (NVIDIA DeepSeek V3.1, 8K tokens), qwen3 (NVIDIA Qwen3 Coder 480B, 32K tokens)", "enum": [ "local", "gemini", "deepseek3.1", "qwen3" ], "type": "string" }, "prompt": { "description": "Your question or prompt (Unity/complex generations automatically get high token limits)", "type": "string" }, "thinking": { "default": true, "description": "Enable thinking mode for DeepSeek (shows reasoning)", "type": "boolean" } }, "required": [ "model", "prompt" ], "type": "object" }

Implementation Reference

  • Executes the 'ask' tool by mapping the model parameter to a backend, sending the prompt via the MultiAIRouter, recording usage analytics, and handling errors with fallback logging.
    async handleAsk(args) {
      const { model, prompt, thinking = false, force_backend = false } = args;
      const startTime = Date.now();
      try {
        const backend = model === 'deepseek3.1' ? 'nvidia_deepseek'
                      : model === 'qwen3' ? 'nvidia_qwen'
                      : model;
        const result = await this.router.makeRequest(prompt, backend, { thinking });
        await this.usageAnalytics.recordInvocation({
          tool_name: 'ask',
          backend_used: backend,
          processing_time_ms: Date.now() - startTime,
          success: true
        });
        return result;
      } catch (error) {
        await this.usageAnalytics.recordInvocation({
          tool_name: 'ask',
          backend_used: model,
          processing_time_ms: Date.now() - startTime,
          success: false,
          error
        });
        throw error;
      }
    }
  • Core registration of the 'ask' tool definition in the coreToolDefinitions array, specifying name, description, handler, and schema. This object is later added to this.coreTools map.
    {
      name: 'ask',
      description: '🤖 MULTI-AI Direct Query - Ask any backend with BLAZING FAST smart fallback chains! Features automatic Unity detection, dynamic token scaling, and response headers with backend tracking.',
      handler: 'handleAsk',
      schema: {
        type: 'object',
        properties: {
          model: {
            type: 'string',
            enum: ['local', 'gemini', 'deepseek3.1', 'qwen3'],
            description: 'AI backend to query: local (Qwen2.5-Coder-7B-Instruct, 128K+ tokens), gemini (Gemini Enhanced, 32K tokens), deepseek3.1 (NVIDIA DeepSeek V3.1, 8K tokens), qwen3 (NVIDIA Qwen3 Coder 480B, 32K tokens)'
          },
          prompt: {
            type: 'string',
            description: 'Your question or prompt (Unity/complex generations automatically get high token limits)'
          },
          thinking: {
            type: 'boolean',
            default: false,
            description: 'Enable reasoning mode for DeepSeek V3.1 (opt-in)'
          },
          force_backend: {
            type: 'boolean',
            default: false,
            description: 'Force use of specified backend even if unhealthy (bypass smart fallback)'
          }
        },
        required: ['model', 'prompt']
      }
    },
  • Input schema for the 'ask' tool defining properties like model (enum), prompt (required string), thinking (boolean), and force_backend (boolean).
    schema: {
      type: 'object',
      properties: {
        model: {
          type: 'string',
          enum: ['local', 'gemini', 'deepseek3.1', 'qwen3'],
          description: 'AI backend to query: local (Qwen2.5-Coder-7B-Instruct, 128K+ tokens), gemini (Gemini Enhanced, 32K tokens), deepseek3.1 (NVIDIA DeepSeek V3.1, 8K tokens), qwen3 (NVIDIA Qwen3 Coder 480B, 32K tokens)'
        },
        prompt: {
          type: 'string',
          description: 'Your question or prompt (Unity/complex generations automatically get high token limits)'
        },
        thinking: {
          type: 'boolean',
          default: false,
          description: 'Enable reasoning mode for DeepSeek V3.1 (opt-in)'
        },
        force_backend: {
          type: 'boolean',
          default: false,
          description: 'Force use of specified backend even if unhealthy (bypass smart fallback)'
        }
      },
      required: ['model', 'prompt']
    }
  • Alias definitions that map 'MKG_generate' and 'deepseek_generate' to the core 'ask' tool, enabling backwards compatibility and alternative names for the tool.
    // MKG aliases
    { alias: 'MKG_analyze', coreTool: 'review' },
    { alias: 'MKG_generate', coreTool: 'ask' },
    { alias: 'MKG_review', coreTool: 'review' },
    { alias: 'MKG_edit', coreTool: 'edit_file' },
    { alias: 'MKG_health', coreTool: 'health' },
    // DeepSeek aliases
    { alias: 'deepseek_analyze', coreTool: 'review' },
    { alias: 'deepseek_generate', coreTool: 'ask' },
    { alias: 'deepseek_review', coreTool: 'review' },
    { alias: 'deepseek_edit', coreTool: 'edit_file' },
    { alias: 'deepseek_health', coreTool: 'health' }
    ];
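
A minimal sketch of how such alias entries could be resolved at dispatch time, so that 'MKG_generate' and 'deepseek_generate' reach the core 'ask' tool; the aliasDefinitions and coreTools names are taken from the descriptions above, while resolveTool and the Map-based lookup are assumptions for illustration:

    // Build a lookup from alias name to core tool name.
    const aliasMap = new Map(aliasDefinitions.map(({ alias, coreTool }) => [alias, coreTool]));

    // Resolve an incoming tool name to its core tool definition (or undefined if unknown).
    function resolveTool(coreTools, requestedName) {
      const coreName = aliasMap.get(requestedName) ?? requestedName;
      return coreTools.get(coreName);
    }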

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/Platano78/Smart-AI-Bridge'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.