# Smart AI Bridge v2.0.0 - Configuration Guide
## Backend Configuration
### Single Source of Truth: `src/config/backends.json`
All backend configuration lives in `src/config/backends.json`. This file is loaded by the `BackendRegistry` at startup and defines every backend, its adapter type, priority, and connection details.
```json
{
  "version": "2.0.0",
  "description": "Smart AI Bridge v2.0.0 Backend Configuration",
  "backends": {
    "local": {
      "type": "local",
      "enabled": true,
      "priority": 1,
      "description": "Local inference (dynamic model discovery)",
      "capabilities": "dynamic",
      "context_limit": 65536,
      "strengths": "Large context, free inference",
      "config": {
        "url": "http://127.0.0.1:8081/v1/chat/completions",
        "model": "dynamic",
        "maxTokens": 65536,
        "timeout": 120000
      }
    },
    "nvidia_deepseek": {
      "type": "nvidia_deepseek",
      "enabled": true,
      "priority": 2,
      "description": "NVIDIA DeepSeek (reasoning, 8K tokens)",
      "capabilities": ["deep_reasoning", "security_focus"],
      "context_limit": 8192,
      "strengths": "Complex reasoning, security analysis",
      "config": {
        "maxTokens": 8192,
        "timeout": 60000,
        "url": "https://integrate.api.nvidia.com/v1/chat/completions",
        "model": "deepseek-ai/deepseek-v3.2"
      }
    },
    "nvidia_qwen": {
      "type": "nvidia_qwen",
      "enabled": true,
      "priority": 3,
      "description": "NVIDIA Qwen (coding, 32K tokens)",
      "capabilities": ["code_specialized", "deep_reasoning"],
      "context_limit": 32768,
      "config": {
        "maxTokens": 32768,
        "timeout": 60000
      }
    },
    "gemini": {
      "type": "gemini",
      "enabled": true,
      "priority": 4,
      "description": "Google Gemini (fast, 32K tokens)",
      "config": {
        "maxTokens": 32768,
        "timeout": 60000
      }
    },
    "openai_chatgpt": {
      "type": "openai",
      "enabled": true,
      "priority": 5,
      "description": "OpenAI GPT-5.2 (premium reasoning, 128K context)",
      "config": {
        "model": "gpt-5.2",
        "maxTokens": 128000,
        "timeout": 120000
      }
    },
    "groq_llama": {
      "type": "groq",
      "enabled": true,
      "priority": 6,
      "description": "Groq Llama 3.3 70B (ultra-fast 500+ t/s)",
      "config": {
        "model": "llama-3.3-70b-versatile",
        "maxTokens": 32768,
        "timeout": 30000
      }
    }
  },
  "fallbackPolicy": {
    "maxRetries": 3,
    "retryDelayMs": 1000,
    "circuitBreakerThreshold": 5,
    "circuitBreakerResetMs": 30000
  },
  "routing": {
    "defaultBackend": "local",
    "complexityThresholds": {
      "simple": 0.3,
      "medium": 0.6,
      "complex": 0.8
    }
  }
}
```
### Backend Configuration Fields
| Field | Type | Description |
|-------|------|-------------|
| `type` | string | Adapter type: `local`, `nvidia_deepseek`, `nvidia_qwen`, `gemini`, `openai`, `groq` |
| `enabled` | boolean | Whether the backend is active |
| `priority` | number | Fallback chain order (lower = higher priority) |
| `description` | string | Human-readable description |
| `capabilities` | string/array | Backend capabilities for routing decisions |
| `context_limit` | number | Maximum context window in tokens |
| `strengths` | string | What the backend excels at |
| `config.url` | string | API endpoint URL |
| `config.model` | string | Model identifier |
| `config.maxTokens` | number | Maximum response tokens |
| `config.timeout` | number | Request timeout in milliseconds |
| `config.apiKey` | string | API key (or `$ENV_VAR_NAME` to read from environment) |
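Before editing the file by hand, it can help to see these constraints as code. The following sketch validates a single backend entry against the table above; the function name and error messages are hypothetical, not the actual `BackendRegistry` checks:

```javascript
// Illustrative sanity check for one backend entry (hypothetical helper,
// not the actual BackendRegistry validation).
function validateBackendEntry(name, entry) {
  const KNOWN_TYPES = ['local', 'nvidia_deepseek', 'nvidia_qwen', 'gemini', 'openai', 'groq'];
  if (!KNOWN_TYPES.includes(entry.type)) {
    throw new Error(`${name}: unknown adapter type "${entry.type}"`);
  }
  if (typeof entry.enabled !== 'boolean') {
    throw new Error(`${name}: "enabled" must be a boolean`);
  }
  if (!Number.isInteger(entry.priority) || entry.priority < 1) {
    throw new Error(`${name}: "priority" must be a positive integer`);
  }
  for (const field of ['maxTokens', 'timeout']) {
    const v = entry.config?.[field];
    if (v !== undefined && !Number.isInteger(v)) {
      throw new Error(`${name}: "config.${field}" must be an integer`);
    }
  }
}
```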
### Custom Backends
Additional backends can be defined in `data/backends-custom.json`. Entries there are merged over the main config, so they can add new backends or override existing ones:
```json
{
  "backends": {
    "my_custom_backend": {
      "type": "openai",
      "enabled": true,
      "priority": 7,
      "description": "My custom OpenAI-compatible endpoint",
      "config": {
        "url": "https://my-api.example.com/v1/chat/completions",
        "apiKey": "$MY_CUSTOM_API_KEY",
        "model": "my-model",
        "maxTokens": 16384,
        "timeout": 60000
      }
    }
  }
}
```
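Conceptually, the custom file is merged over the main config by backend name: a new key adds a backend, an existing key overrides it. A minimal sketch of that behavior, assuming a shallow per-name merge (the real merge may be deeper); run it as an `.mjs` script:

```javascript
// Minimal sketch: later files win per backend name. Assumes a shallow merge;
// the real BackendRegistry merge may be deeper.
import { readFileSync } from 'node:fs';

const main = JSON.parse(readFileSync('src/config/backends.json', 'utf8'));
const custom = JSON.parse(readFileSync('data/backends-custom.json', 'utf8'));

const backends = { ...main.backends, ...custom.backends };
console.log(Object.keys(backends)); // now includes "my_custom_backend"
```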
## Environment Variables
### API Keys
These are the primary environment variables consumed by the backend adapters:
```bash
# NVIDIA API (used by nvidia_deepseek and nvidia_qwen backends)
NVIDIA_API_KEY=your-nvidia-api-key
# OpenAI API
OPENAI_API_KEY=your-openai-api-key
# Google Gemini API
GEMINI_API_KEY=your-gemini-api-key
# Groq API
GROQ_API_KEY=your-groq-api-key
```
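A quick pre-flight check can confirm these variables are actually visible to the server process. The script below is illustrative, not part of the project:

```javascript
// Illustrative pre-flight check: warn about missing API keys before startup.
const REQUIRED_KEYS = ['NVIDIA_API_KEY', 'OPENAI_API_KEY', 'GEMINI_API_KEY', 'GROQ_API_KEY'];
for (const key of REQUIRED_KEYS) {
  if (!process.env[key]) {
    console.error(`[warn] ${key} is not set; the matching backend cannot authenticate`);
  }
}
```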
### Server Configuration
```bash
# Node environment
NODE_ENV=production # production | development
# MCP logging (all output goes to stderr)
MCP_LOG_LEVEL=info # silent | error | warn | info | debug
```
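Since the server speaks MCP over stdio, stdout carries JSON-RPC frames and diagnostics must go to stderr, which is why all log output lands there. A sketch of how `MCP_LOG_LEVEL` could gate that output (illustrative, not the project's actual logger):

```javascript
// Illustrative level-gated logger writing to stderr; stdout stays clean
// for MCP JSON-RPC frames. Not the project's actual logger.
const LEVELS = ['silent', 'error', 'warn', 'info', 'debug'];
const threshold = LEVELS.indexOf(process.env.MCP_LOG_LEVEL ?? 'info');

function log(level, ...args) {
  const idx = LEVELS.indexOf(level);
  if (idx > 0 && idx <= threshold) console.error(`[${level}]`, ...args);
}

log('info', 'server starting');   // printed at info and debug levels
log('debug', 'verbose details');  // printed only at debug
```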
### Local Model Configuration
```bash
# Local model endpoint (if not using default from backends.json)
LOCAL_MODEL_ENDPOINT=http://localhost:8081/v1
# Local model port (for auto-discovery)
MKG_SERVER_PORT=8081
```
### API Key References in backends.json
Backend configurations can reference environment variables using the `$` prefix:
```json
{
  "config": {
    "apiKey": "$NVIDIA_API_KEY"
  }
}
```
The `BackendRegistry` resolves `$NVIDIA_API_KEY` to the value of `process.env.NVIDIA_API_KEY` at startup.
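The substitution itself is simple; a rough sketch of the behavior described above:

```javascript
// Rough sketch of the $ENV_VAR resolution described above.
function resolveApiKey(value) {
  return value?.startsWith('$') ? process.env[value.slice(1)] : value;
}

resolveApiKey('$NVIDIA_API_KEY'); // -> value of process.env.NVIDIA_API_KEY
resolveApiKey('sk-literal-key');  // -> returned unchanged
```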
## Fallback Policy
The fallback policy in `backends.json` controls retry behavior:
```json
{
  "fallbackPolicy": {
    "maxRetries": 3,
    "retryDelayMs": 1000,
    "circuitBreakerThreshold": 5,
    "circuitBreakerResetMs": 30000
  }
}
```
| Field | Description |
|-------|-------------|
| `maxRetries` | Retry attempts per backend before falling back to the next backend in the chain |
| `retryDelayMs` | Delay between retries (milliseconds) |
| `circuitBreakerThreshold` | Consecutive failures before circuit opens |
| `circuitBreakerResetMs` | Time before circuit breaker resets (milliseconds) |
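To make the interplay of these four values concrete, here is a compact sketch of a retry loop with a per-backend circuit breaker. The names (`callWithFallback`, `tryBackend`) are hypothetical and the real implementation may differ:

```javascript
// Hypothetical sketch of the fallback policy semantics: retry each backend
// up to maxRetries and skip any backend whose circuit is currently open.
const policy = {
  maxRetries: 3,
  retryDelayMs: 1000,
  circuitBreakerThreshold: 5,
  circuitBreakerResetMs: 30000
};
const breakers = new Map(); // backend name -> { failures, openedAt }

async function callWithFallback(backends, request, tryBackend) {
  for (const name of backends) {
    const b = breakers.get(name) ?? { failures: 0, openedAt: 0 };
    if (b.failures >= policy.circuitBreakerThreshold &&
        Date.now() - b.openedAt < policy.circuitBreakerResetMs) {
      continue; // circuit open: skip this backend until the reset window elapses
    }
    for (let attempt = 0; attempt < policy.maxRetries; attempt++) {
      try {
        const result = await tryBackend(name, request);
        breakers.set(name, { failures: 0, openedAt: 0 }); // success resets the breaker
        return result;
      } catch {
        b.failures += 1;
        b.openedAt = Date.now();
        breakers.set(name, b);
        await new Promise(r => setTimeout(r, policy.retryDelayMs));
      }
    }
  }
  throw new Error('all backends exhausted');
}
```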
## Routing Configuration
```json
{
  "routing": {
    "defaultBackend": "local",
    "complexityThresholds": {
      "simple": 0.3,
      "medium": 0.6,
      "complex": 0.8
    }
  }
}
```
The `MultiAIRouter` uses these thresholds when applying rule-based routing (Tier 3). Complex tasks are routed to higher-capability backends such as `nvidia_qwen`, while simple tasks stay on the default backend.
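A sketch of that rule, assuming a complexity `score` in [0, 1] from an upstream estimator; the specific backend chosen for the medium band is an assumption:

```javascript
// Illustrative Tier 3 rule: map a complexity score onto the thresholds from
// backends.json. The backend chosen for the medium band is an assumption.
function pickBackend(score, routing) {
  const t = routing.complexityThresholds;
  if (score >= t.complex) return 'nvidia_qwen';    // highest-capability coding backend
  if (score >= t.medium) return 'nvidia_deepseek'; // assumed mid-tier choice
  return routing.defaultBackend;                   // simple tasks stay local
}

const routing = {
  defaultBackend: 'local',
  complexityThresholds: { simple: 0.3, medium: 0.6, complex: 0.8 }
};
pickBackend(0.9, routing); // -> 'nvidia_qwen'
```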
## Claude Code MCP Configuration
### Basic Configuration
```json
{
  "mcpServers": {
    "smart-ai-bridge": {
      "command": "node",
      "args": ["src/server.js"],
      "cwd": "/path/to/smart-ai-bridge",
      "env": {
        "NVIDIA_API_KEY": "your-nvidia-api-key",
        "OPENAI_API_KEY": "your-openai-api-key",
        "GEMINI_API_KEY": "your-gemini-api-key",
        "GROQ_API_KEY": "your-groq-api-key"
      }
    }
  }
}
```
### Full Production Configuration
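This variant avoids embedding literal keys: the `${VAR}` references are intended to be expanded from the parent environment when Claude Code launches the server.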
```json
{
  "mcpServers": {
    "smart-ai-bridge": {
      "command": "node",
      "args": ["src/server.js"],
      "cwd": "/path/to/smart-ai-bridge",
      "env": {
        "NODE_ENV": "production",
        "MCP_LOG_LEVEL": "warn",
        "NVIDIA_API_KEY": "${NVIDIA_API_KEY}",
        "OPENAI_API_KEY": "${OPENAI_API_KEY}",
        "GEMINI_API_KEY": "${GEMINI_API_KEY}",
        "GROQ_API_KEY": "${GROQ_API_KEY}"
      }
    }
  }
}
```
### Development Configuration
```json
{
  "mcpServers": {
    "smart-ai-bridge-dev": {
      "command": "node",
      "args": ["src/server.js"],
      "cwd": "/path/to/smart-ai-bridge",
      "env": {
        "NODE_ENV": "development",
        "MCP_LOG_LEVEL": "debug"
      }
    }
  }
}
```
## Local Model Setup
### vLLM via Docker
```yaml
services:
  qwen3-coder:
    image: vllm/vllm-openai:latest
    container_name: qwen3-coder
    ports:
      - "8081:8000"
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
    command: [
      "--model", "Qwen/Qwen3-Coder-30B-A3B-Instruct-FP8",
      "--host", "0.0.0.0",
      "--port", "8000",
      "--max-model-len", "32768",
      "--gpu-memory-utilization", "0.85",
      "--trust-remote-code"
    ]
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    restart: unless-stopped
```
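Once the container is up, the OpenAI-compatible endpoint can be probed from Node (vLLM serves `/v1/models`; the host port matches the compose mapping above). Requires Node 18+ for global `fetch`; save as an `.mjs` script:

```javascript
// Quick probe of the vLLM endpoint on host port 8081 (mapped above).
const res = await fetch('http://127.0.0.1:8081/v1/models');
const body = await res.json();
// e.g. 200 [ 'Qwen/Qwen3-Coder-30B-A3B-Instruct-FP8' ]
console.log(res.status, body.data?.map(m => m.id));
```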
### LM Studio
1. Load a model in LM Studio
2. Start the local server (default port 1234)
3. Update `backends.json` local config URL:
```json
{
  "local": {
    "config": {
      "url": "http://localhost:1234/v1/chat/completions"
    }
  }
}
```
### Ollama
```bash
ollama serve
ollama run qwen2.5-coder:14b
```
Update the local backend URL to `http://localhost:11434/v1/chat/completions` and set `config.model` to the Ollama model tag (here `qwen2.5-coder:14b`).
## Cross-Platform Notes
### Windows (WSL2)
If your local model server runs on the Windows host while Smart AI Bridge runs in WSL2, use the WSL2 gateway IP (this address varies per machine; inside WSL2, `ip route show default` prints it):
```json
{
  "local": {
    "config": {
      "url": "http://172.23.16.1:8081/v1/chat/completions"
    }
  }
}
```
Ensure the model server binds to `0.0.0.0` (not `127.0.0.1`) to accept WSL2 connections.
### Linux / macOS
Standard localhost works:
```json
{
  "local": {
    "config": {
      "url": "http://127.0.0.1:8081/v1/chat/completions"
    }
  }
}
```
## Subagent Backend Overrides
Override which backend handles specific subagent roles:
```bash
# Per-role overrides
SUBAGENT_BACKEND_CODE_REVIEWER=nvidia_deepseek
SUBAGENT_BACKEND_SECURITY_AUDITOR=nvidia_deepseek
SUBAGENT_BACKEND_PLANNER=nvidia_qwen
SUBAGENT_BACKEND_TEST_GENERATOR=nvidia_deepseek
SUBAGENT_BACKEND_DOCUMENTATION_WRITER=gemini
SUBAGENT_BACKEND_TDD_DECOMPOSER=nvidia_qwen
SUBAGENT_BACKEND_TDD_TEST_WRITER=nvidia_deepseek
SUBAGENT_BACKEND_TDD_IMPLEMENTER=nvidia_qwen
SUBAGENT_BACKEND_TDD_QUALITY_REVIEWER=nvidia_deepseek
# Or set a global default
SUBAGENT_DEFAULT_BACKEND=nvidia_qwen
```
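The precedence one would expect is role-specific variable first, then the global default, then the router's own choice; a sketch of that lookup (the exact resolution logic is an assumption):

```javascript
// Assumed precedence: role-specific override > global default > router choice.
function resolveSubagentBackend(role, routedChoice) {
  return process.env[`SUBAGENT_BACKEND_${role.toUpperCase()}`]
      ?? process.env.SUBAGENT_DEFAULT_BACKEND
      ?? routedChoice;
}

// With the overrides above set: -> 'nvidia_deepseek'
resolveSubagentBackend('code_reviewer', 'local');
```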
## Disabling Backends
To disable a backend without removing it from configuration, set `enabled: false`:
```json
{
  "openai_chatgpt": {
    "type": "openai",
    "enabled": false,
    "priority": 5,
    "config": {}
  }
}
```
Disabled backends are excluded from the fallback chain and will not receive requests.
## Configuration Validation
After modifying `backends.json`, verify the server starts correctly:
```bash
node src/server.js 2>&1 | head -5
# Expected output:
# Smart AI Bridge v2.0.0 starting...
# [BackendRegistry] Initialized 6 backends from backends.json
# [Router] MultiAIRouter initialized
# Smart AI Bridge v2.0.0 connected via stdio
# Tools: 20 | Backends: 6
```
Then use `check_backend_health` to verify each backend:
```
@check_backend_health({ "backend": "local", "force": true })
@check_backend_health({ "backend": "nvidia_deepseek", "force": true })
```