Platano78

Smart-AI-Bridge

ask

Send a prompt to one AI backend and return the response. Smart routing selects the optimal backend based on task complexity, or specify a model directly.

Instructions

Send one prompt to one AI backend and return the response. Setting model to 'auto' lets the Smart-AI-Bridge (SAB) router pick the best backend by task complexity and current health; passing a specific model name forces that provider.

Use this for direct LLM queries that don't fit a more specialized tool. For multi-backend consensus on the same prompt, use council. For agentic multi-step work with a defined role, use spawn_subagent. For LLM-driven file generation or editing, use generate_file / modify_file so the file content stays out of Claude's context window.

Read-only: makes one HTTP call to the chosen backend.

Returns: {success, model, requested_backend, actual_backend, prompt (truncated preview), response (the LLM output), backend_used, fallback_chain, response_time, cache_status, thinking_enabled, max_tokens, was_truncated, smart_routing_applied, routing, processing_time}.
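As a rough illustration of the call described above, the sketch below builds an MCP `tools/call` request for `ask` and summarizes a result. The helper names `build_ask_request` and `summarize_result` are hypothetical, the sample result is hand-written rather than real server output, and treating a mismatch between `requested_backend` and `actual_backend` as a fallback is an assumption about those fields, not documented behavior.

```python
import json


def build_ask_request(prompt, model="auto", request_id=1, **options):
    """Build a JSON-RPC 2.0 `tools/call` request for the `ask` tool."""
    arguments = {"model": model, "prompt": prompt}
    # Optional parameters (e.g. max_tokens, thinking) are passed through unchanged.
    arguments.update(options)
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "ask", "arguments": arguments},
    }


def summarize_result(result):
    """Condense an already-parsed `ask` result dict into a few routing facts."""
    return {
        "ok": bool(result.get("success")),
        "backend": result.get("actual_backend"),
        # Assumption: differing requested/actual backends indicate a fallback.
        "fell_back": result.get("requested_backend") != result.get("actual_backend"),
        "truncated": bool(result.get("was_truncated")),
    }


request = build_ask_request("Explain Python's GIL in two sentences.", max_tokens=512)
print(json.dumps(request, indent=2))

# Hand-written sample for illustration only; not captured from a real server.
sample = {
    "success": True,
    "requested_backend": "gemini",
    "actual_backend": "groq",
    "was_truncated": False,
}
print(summarize_result(sample))
```

How the result dict is wrapped in the MCP response depends on the client library, so `summarize_result` assumes it has already been extracted and parsed.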

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| model | Yes | AI backend to query: auto (smart routing selects optimal backend), local (autodiscover vLLM/llama.cpp/LM Studio), gemini (Gemini Enhanced, 32K tokens), nvidia_deepseek (NVIDIA DeepSeek with streaming + reasoning, 8K tokens), nvidia_qwen (NVIDIA Qwen3 Coder 480B, 32K tokens), openai (OpenAI GPT-5.2, 128K context, premium reasoning), groq (Llama 3.3 70B, ultra-fast 500+ t/s). The friendly aliases `deepseek` and `qwen3` are also accepted (mapped to nvidia_deepseek / nvidia_qwen), matching the other tools. | |
| prompt | Yes | Your question or prompt (Unity/complex generations automatically get high token limits) | |
| thinking | No | Enable thinking mode for DeepSeek (shows reasoning) | |
| max_tokens | No | Maximum response length (auto-calculated if not specified: Unity=16K, Complex=8K, Simple=2K) | |
| enable_chunking | No | Enable automatic request chunking for extremely large generations (fallback if truncated) | |
| force_backend | No | Force specific backend (bypasses smart routing) - use backend keys like "local", "gemini", "nvidia_deepseek", "nvidia_qwen", "openai", "groq" | |
| model_profile | No | Router mode model profile for local backend. Available profiles: coding-reap25b (complex refactoring, ~25s), coding-seed-coder (standard coding, ~8s), coding-qwen-7b (fast coding, ~10s), agents-qwen3-14b (multi-agent, ~10s), agents-seed-coder (high throughput, ~8s), fast-deepseek-lite (quick analysis, ~8s), fast-qwen14b (fast coding, ~12s) | |
| auto_profile | No | Enable automatic profile selection based on task type detection. When true, auto-selects coding-seed-coder for coding tasks if no explicit model_profile is set. | |
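The parameter rules above can be mirrored client-side before a call is sent. The following is a minimal sketch, not the server's implementation: `normalize_ask_args` and its `task_kind` argument are hypothetical (the server detects the task type itself), and reading "16K / 8K / 2K" as multiples of 1024 is an assumption.

```python
# Backend keys and aliases as listed in the schema description.
VALID_MODELS = {
    "auto", "local", "gemini", "nvidia_deepseek", "nvidia_qwen", "openai", "groq",
}
ALIASES = {"deepseek": "nvidia_deepseek", "qwen3": "nvidia_qwen"}

# Assumption: "K" means 1024 tokens; the schema only says 16K / 8K / 2K.
DEFAULT_MAX_TOKENS = {"unity": 16 * 1024, "complex": 8 * 1024, "simple": 2 * 1024}


def normalize_ask_args(model, prompt, task_kind="simple", max_tokens=None):
    """Resolve aliases, validate required fields, and fill in max_tokens."""
    model = ALIASES.get(model, model)
    if model not in VALID_MODELS:
        raise ValueError(f"unknown model: {model!r}")
    if not prompt or not prompt.strip():
        raise ValueError("prompt is required")
    if max_tokens is None:
        # Hypothetical client-side stand-in for the server's auto-calculation.
        max_tokens = DEFAULT_MAX_TOKENS[task_kind]
    return {"model": model, "prompt": prompt, "max_tokens": max_tokens}


args = normalize_ask_args("deepseek", "Summarize this stack trace.")
print(args)
```

Leaving `max_tokens` unset in the real call hands this decision to the server, so the table lookup here is only useful for estimating the budget in advance.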
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description carries full burden. It declares 'Read-only: makes one HTTP call to the chosen backend' and lists the return object fields. Could add details on error handling or timeouts, but the core behavioral trait is disclosed.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is front-loaded with purpose and usage, then distinctions, then read-only note, then return. Each sentence adds value, though it is relatively long for a simple tool. Could be slightly more concise but still well-structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 8 parameters, 100% schema coverage, no output schema, and no annotations, the description covers usage, behavior, parameters, alternatives, and return format. Lacks details on error states or performance guarantees, but overall sufficiently complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, but the description significantly enhances parameter understanding: explains enum values for model, auto-calculation of max_tokens, role of force_backend, model_profile options, and auto_profile behavior. Goes well beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Send one prompt to one AI backend and return the response', providing a specific verb and resource. It distinguishes itself from siblings by naming alternatives: council, spawn_subagent, generate_file, modify_file.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states when to use: 'for direct LLM queries that don't fit a more specialized tool.' Provides clear alternatives for multi-backend consensus (council), agentic work (spawn_subagent), and file operations (generate_file/modify_file).

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/Platano78/Smart-AI-Bridge'
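For callers who prefer a script to curl, the same request can be sketched in Python with the standard library. The `server_url` and `fetch_server` helpers are hypothetical, the owner/name URL pattern is inferred from the single URL shown above, and a JSON response body is assumed rather than confirmed.

```python
import json
import urllib.request
from urllib.parse import quote

API_BASE = "https://glama.ai/api/mcp/v1/servers"


def server_url(owner, name):
    """Build the directory URL for one server (pattern inferred from the curl example)."""
    return f"{API_BASE}/{quote(owner, safe='')}/{quote(name, safe='')}"


def fetch_server(owner, name, timeout=10.0):
    """GET the server record; assumes the endpoint returns a JSON document."""
    request = urllib.request.Request(
        server_url(owner, name),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Only the URL is built here; call fetch_server(...) to perform the request.
print(server_url("Platano78", "Smart-AI-Bridge"))
```

Calling `fetch_server("Platano78", "Smart-AI-Bridge")` performs the network request and raises `urllib.error.URLError` (or its subclass `HTTPError`) on failure.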

If you have feedback or need assistance with the MCP directory API, please join our Discord server.