Smart-AI-Bridge

TROUBLESHOOTING-GUIDE.md•17.6 KiB

# Smart AI Bridge v2.0.0 - Troubleshooting Guide ## Overview This guide covers common issues when running Smart AI Bridge v2.0.0, the modular MCP server with 20 tools, 6 backends, and the intelligence layer. The server entry point is `src/server.js` and it communicates via stdio transport. --- ## Server Startup Issues ### Server Won't Start **Symptoms:** ``` SyntaxError: Cannot use import statement outside a module Error [ERR_MODULE_NOT_FOUND]: Cannot find module ``` **Diagnosis and Solutions:** 1. **Check Node.js version** -- v2.0.0 requires Node.js 18.0.0 or later (ESM support). ```bash node --version # Must be >= 18.0.0 ``` 2. **Verify package.json has `"type": "module"`** -- The project uses ESM imports throughout. If this field is missing, Node will treat .js files as CommonJS and fail on `import` statements. 3. **Run npm install** -- Missing dependencies cause module resolution failures. ```bash cd /home/platano/project/smart-ai-bridge npm install ``` 4. **Check the entry point path** -- The v2.0.0 entry point is `src/server.js`, not the legacy monolithic file. ```bash # Correct node src/server.js # Also correct (uses package.json "start" script) npm start ``` 5. **Verify all source files exist** -- The modular architecture requires ~53 files in src/. If files are missing (incomplete clone or checkout), the server will fail on import. ```bash # Quick check: count source files find src -name '*.js' | wc -l # Expected: approximately 53 ``` ### Server Starts But No Tools Available **Symptoms:** ``` [SAB] ListTools: 0 tools available ``` **Solutions:** 1. **Check tool-definitions.js** -- The file `src/tools/tool-definitions.js` exports `CORE_TOOL_DEFINITIONS`. If this file has a syntax error or missing export, no tools will register. 2. **Check handler registration** -- The HandlerFactory at `src/handlers/index.js` must map every handler name referenced in tool definitions. A mismatch between `def.handler` and the factory's registry causes "Unknown tool" errors at call time, but tools will still list. 3. **Check stderr output** -- The server logs to stderr. Look for initialization errors: ```bash node src/server.js 2>server.log cat server.log ``` --- ## Backend Connection Failures ### Local LLM Not Detected **Symptoms:** ``` Error: connect ECONNREFUSED 127.0.0.1:8081 Error: Local backend health check failed ``` The local backend expects an OpenAI-compatible API at `http://127.0.0.1:8081/v1/chat/completions` (configured in `src/config/backends.json`). **Solutions:** 1. **Verify your local LLM server is running** -- Smart AI Bridge supports any OpenAI-compatible server: - **LM Studio**: Start the server, check the port in LM Studio settings - **llama.cpp server**: `./llama-server -m model.gguf --port 8081` - **vLLM**: `python -m vllm.entrypoints.openai.api_server --port 8081` - **Ollama**: `ollama serve` (default port 11434, update backends.json accordingly) 2. **Check the port** -- The default in backends.json is 8081. If your LLM server runs on a different port, update the config: ```json // src/config/backends.json -> backends.local.config.url "url": "http://127.0.0.1:YOUR_PORT/v1/chat/completions" ``` 3. **Test the endpoint directly:** ```bash curl http://127.0.0.1:8081/v1/models ``` 4. **WSL networking** -- If running the LLM server on Windows and Smart AI Bridge in WSL, use the Windows host IP instead of 127.0.0.1: ```bash # Find your Windows host IP cat /etc/resolv.conf | grep nameserver # Update backends.json with that IP ``` 5. **Model discovery** -- The local backend uses dynamic model discovery (`src/utils/model-discovery.js`). If the `/v1/models` endpoint is not available on your LLM server, the backend may fail even if `/v1/chat/completions` works. Check that your server exposes the models endpoint. ### NVIDIA API Errors **Symptoms:** ``` Error: 401 Unauthorized Error: 429 Too Many Requests Error: Model not found: deepseek-ai/deepseek-v3.2 ``` **Solutions:** 1. **Check API key** -- The NVIDIA backends require `NVIDIA_API_KEY` in your environment: ```bash echo $NVIDIA_API_KEY # Should output your key. If empty: export NVIDIA_API_KEY="nvapi-xxxxx" ``` 2. **Rate limits** -- NVIDIA's free tier has strict rate limits. If you hit 429 errors, the circuit breaker will trip (threshold: 5 failures, reset: 30 seconds per backends.json). Wait for the reset or reduce request frequency. 3. **Model availability** -- Model names change. If `deepseek-ai/deepseek-v3.2` or the Qwen model is unavailable, check the NVIDIA API catalog for current model IDs and update backends.json: ```bash curl -H "Authorization: Bearer $NVIDIA_API_KEY" \ https://integrate.api.nvidia.com/v1/models ``` 4. **Token limits** -- nvidia_deepseek is configured for 8192 max tokens, nvidia_qwen for 32768. Requests exceeding these limits will be rejected by the API. The router should handle this, but forced backend selection can bypass routing limits. ### OpenAI Connection Issues **Symptoms:** ``` Error: 401 Invalid API key Error: Connection refused to api.openai.com ``` **Solutions:** 1. **Check API key:** ```bash echo $OPENAI_API_KEY # Must be set. If empty: export OPENAI_API_KEY="sk-xxxxx" ``` 2. **Verify the model exists** -- backends.json specifies `gpt-5.2`. If this model is not available on your OpenAI plan, update the model field in backends.json. 3. **Network access** -- Ensure your environment can reach `api.openai.com`: ```bash curl https://api.openai.com/v1/models -H "Authorization: Bearer $OPENAI_API_KEY" ``` 4. **Billing** -- OpenAI API requires an active billing account. A valid API key with no credits will return 429 or 402 errors. ### Groq Connection Issues **Symptoms:** ``` Error: 401 Unauthorized (Groq) Error: Model not available on Groq ``` **Solutions:** 1. **Check API key:** ```bash echo $GROQ_API_KEY # Must be set. If empty: export GROQ_API_KEY="gsk_xxxxx" ``` 2. **Model availability** -- backends.json specifies `llama-3.3-70b-versatile`. Groq periodically rotates available models. Check https://console.groq.com/docs/models for current options and update backends.json if needed. 3. **Rate limits** -- Groq's free tier has aggressive rate limits (requests per minute and tokens per minute). The circuit breaker will trip after 5 consecutive failures. ### Gemini Rate Limiting **Symptoms:** ``` Error: 429 Resource exhausted Error: Gemini rate limit exceeded ``` **Solutions:** 1. **Check API key:** ```bash echo $GEMINI_API_KEY # or echo $GOOGLE_API_KEY ``` 2. **Built-in rate limiter** -- Smart AI Bridge includes a Gemini-specific rate limiter at `src/utils/gemini-rate-limiter.js`. If you are hitting limits despite this, the configured rate may be too aggressive for your API tier. The rate limiter parameters can be adjusted in the Gemini adapter. 3. **Quota** -- The Gemini free tier has daily request limits. Check your Google AI Studio dashboard for current quota usage. --- ## Tool Errors ### "Unknown tool" Error **Symptoms:** ``` McpError: Unknown tool: my_tool. Available: ask, review, analyze_file, ... ``` **Solutions:** 1. **Check tool name** -- v2.0.0 has 20 tools. The complete list: - `ask`, `review`, `analyze_file`, `modify_file`, `batch_modify` - `explore`, `generate_file`, `refactor`, `write_files_atomic`, `validate_changes` - `backup_restore`, `batch_analyze`, `check_backend_health`, `read` - `dual_iterate`, `council`, `spawn_subagent`, `parallel_agents` - `health`, `clear_all_caches` 2. **Tool names are exact** -- No aliases, no prefixes. Use exactly the names listed above. ### Handler Not Found **Symptoms:** ``` Error: No handler registered for: someHandler ``` This means the tool definition references a handler name that the HandlerFactory does not recognize. **Solutions:** 1. Check `src/tools/tool-definitions.js` for the tool's `handler` field. 2. Check `src/handlers/index.js` to confirm that handler name is registered in the factory. 3. If you added a custom tool, ensure both the tool definition and handler registration are in sync. ### Backend Unavailable for Tool **Symptoms:** ``` Error: No available backend for this request Error: All backends failed ``` **Solutions:** 1. **Check backend health:** ``` Use the check_backend_health tool to see which backends are online. ``` 2. **Circuit breaker tripped** -- If a backend has failed 5 consecutive times (configurable in backends.json `fallbackPolicy.circuitBreakerThreshold`), it enters an open state for 30 seconds (`circuitBreakerResetMs`). Wait for the reset or fix the underlying backend issue. 3. **All cloud backends down** -- If your local LLM is not running and all API keys are missing/invalid, no backends will be available. Ensure at least one backend is properly configured. 4. **Forced backend unavailable** -- If a request forces a specific backend (e.g., `backend: "groq"`) and that backend is down, routing cannot fall back. Remove the forced backend or fix the backend. --- ## Module and Import Errors ### ESM Import Failures **Symptoms:** ``` Error [ERR_MODULE_NOT_FOUND]: Cannot find module '/path/to/src/handlers/index' SyntaxError: The requested module does not provide an export named 'X' ``` **Solutions:** 1. **File extensions required** -- ESM requires explicit `.js` extensions in import paths. All internal imports should use full paths like `./handlers/index.js`, not `./handlers/index`. 2. **Check for missing files** -- If a git operation or file move left a file missing, imports will fail. Verify the file exists at the expected path. 3. **Circular imports** -- The modular architecture has many cross-references. If you see "Cannot access 'X' before initialization", there may be a circular import. Check the import chain between the affected modules. ### Missing Dependencies **Symptoms:** ``` Error: Cannot find package '@modelcontextprotocol/sdk' Error: Cannot find package 'openai' ``` **Solutions:** ```bash cd /home/platano/project/smart-ai-bridge npm install ``` Key dependencies and their purposes: - `@modelcontextprotocol/sdk` -- MCP protocol server and transport - `openai` -- OpenAI and Groq API client (Groq uses OpenAI-compatible SDK) - `express` -- Dashboard web server - `ws` -- WebSocket support - `diff` -- Diff generation for modify/refactor operations - `glob` -- File pattern matching for explore/batch tools - `string-similarity` -- Fuzzy matching in routing and tool resolution - `tiktoken` -- Token counting for context management - `acorn` -- JavaScript AST parsing for code analysis --- ## Dashboard Issues ### Dashboard Not Loading **Symptoms:** ``` Error: listen EADDRINUSE :::3000 Error: Cannot find module 'express' ``` **Solutions:** 1. **Port conflict** -- The dashboard (src/dashboard/dashboard-server.js) binds to a port (default 3000). If another process is using that port: ```bash # Find what's using the port lsof -i :3000 # Kill it or change the dashboard port ``` 2. **Express not installed:** ```bash npm install # express is in package.json dependencies ``` 3. **Dashboard is separate from MCP server** -- The MCP server runs on stdio. The dashboard is an optional Express HTTP server. They are independent processes. Starting the MCP server via `npm start` does not automatically start the dashboard. --- ## Configuration Issues ### backends.json Format Errors **Symptoms:** ``` SyntaxError: Unexpected token in JSON Error: Invalid backend configuration ``` **Solutions:** 1. **Validate JSON** -- backends.json must be valid JSON. Common mistakes: trailing commas, unquoted keys, comments (JSON does not support comments). ```bash node -e "JSON.parse(require('fs').readFileSync('src/config/backends.json'))" ``` 2. **Required fields per backend:** ```json { "type": "string", // Backend type identifier "enabled": true, // Must be true to be used "priority": 1, // Lower = higher priority "config": { "timeout": 60000 // Milliseconds } } ``` 3. **Adding a custom backend** -- Add a new entry to `backends.backends` in backends.json, create a corresponding adapter class extending BackendAdapter (`src/backends/backend-adapter.js`), and register it in the BackendRegistry. ### Fallback Policy Tuning The `fallbackPolicy` section in backends.json controls retry and circuit breaker behavior: ```json { "fallbackPolicy": { "maxRetries": 3, // Retries per backend before moving to next "retryDelayMs": 1000, // Delay between retries "circuitBreakerThreshold": 5, // Failures before circuit opens "circuitBreakerResetMs": 30000 // Time before circuit half-opens } } ``` If you experience cascading failures (one backend down causing slow responses as all retries exhaust), reduce `maxRetries` or `retryDelayMs`. If backends are flapping (briefly failing then recovering), increase `circuitBreakerThreshold`. --- ## Performance Issues ### Slow Tool Responses **Symptoms:** - Tool calls taking >30 seconds - Timeouts on specific backends **Solutions:** 1. **Check which backend is being used** -- Use `check_backend_health` to see latency per backend. The router may be sending requests to a slow or overloaded backend. 2. **Timeout tuning** -- Each backend has a configurable timeout in backends.json: ```json "config": { "timeout": 60000 // 60 seconds for local } ``` Reduce timeouts for backends that should be fast (groq: 30s is already set) and increase for backends that handle large contexts (local: 120s, openai: 120s). 3. **Local LLM performance** -- If the local backend is slow, check GPU utilization. Large models on insufficient VRAM will offload to CPU and become very slow. ```bash nvidia-smi # Check GPU memory usage and utilization ``` 4. **Circuit breaker stuck open** -- If a backend's circuit breaker is open, requests skip it entirely. This can concentrate load on remaining backends. Check health status and wait for reset. ### High Memory Usage **Solutions:** 1. **Learning engine accumulation** -- The in-memory learning engine, pattern RAG store, and compound learning system accumulate data over time. Restart the server to clear in-memory state. 2. **Conversation threading** -- Long sessions accumulate thread context. The `clear_all_caches` tool can help reset in-memory state. 3. **Background analysis queue** -- The background analysis queue (`src/intelligence/background-analysis-queue.js`) may accumulate pending work. Monitor queue depth and consider rate-limiting background analysis. --- ## Routing Issues ### Wrong Backend Selected **Symptoms:** - Simple requests going to expensive cloud backends - Complex requests routed to local when it cannot handle them **Solutions:** 1. **Understand the 4-tier routing:** - **Forced** -- If the request specifies `backend: "name"`, that backend is used directly - **Learning** -- The learning engine suggests a backend based on past success patterns - **Rules** -- Complexity thresholds and capability matching determine the backend - **Fallback** -- The default backend (local) is used as last resort 2. **Check complexity thresholds** in backends.json: ```json "routing": { "complexityThresholds": { "simple": 0.3, "medium": 0.6, "complex": 0.8 } } ``` 3. **Force a specific backend** -- For testing, pass `backend: "groq"` (or any backend name) in the tool arguments to bypass routing. 4. **Learning engine interference** -- If the learning engine has learned incorrect patterns (e.g., always preferring a specific backend), restart the server to clear learned data. Persistent learning (planned future feature) will need explicit reset commands. --- ## Environment Variable Reference All API keys the server may need, depending on which backends are enabled: ```bash # NVIDIA backends (nvidia_deepseek, nvidia_qwen) export NVIDIA_API_KEY="nvapi-xxxxx" # OpenAI backend export OPENAI_API_KEY="sk-xxxxx" # Groq backend export GROQ_API_KEY="gsk_xxxxx" # Gemini backend export GEMINI_API_KEY="xxxxx" # or export GOOGLE_API_KEY="xxxxx" ``` The local backend requires no API key -- it connects to a local HTTP endpoint. If a backend's API key is missing, that backend will fail on first use and the circuit breaker will eventually open. The router will then skip it and use other available backends. --- ## Diagnostic Checklist When something is not working, run through this checklist: 1. **Node.js version** -- `node --version` must be >= 18.0.0 2. **Dependencies installed** -- `npm install` in the project root 3. **Entry point** -- `node src/server.js` (not legacy file) 4. **Environment variables** -- Set API keys for the backends you want to use 5. **Local LLM** -- If using local backend, verify the server is running and accessible 6. **stderr output** -- Run `node src/server.js 2>debug.log` and check debug.log for errors 7. **Backend health** -- Use the `check_backend_health` tool to see per-backend status 8. **Config syntax** -- Validate `src/config/backends.json` is valid JSON 9. **File completeness** -- Verify all src/ files are present (`find src -name '*.js' | wc -l`) 10. **Port conflicts** -- If using dashboard, check port availability with `lsof -i :3000` --- ## Getting Help - Check stderr output -- the server logs all tool calls, backend selections, and errors to stderr - Use `check_backend_health` tool for real-time backend status - Use `health` tool for overall system status - Review `src/config/backends.json` for all configurable parameters - Check the project README and CHANGELOG for version-specific notes --- *Last Updated: February 2026* *System Version: v2.0.0*

Loading blob content...

Latest Blog Posts

Redis vs ioredis vs valkey-glide
By punkpeye on January 26, 2026.
benchmark
Redis
valkey
Quickstart: Publish an MCP Server to the MCP Registry
By punkpeye on January 24, 2026.
mcp
official reference mirror
Official MCP Registry Server.json Requirements
By punkpeye on January 24, 2026.
mcp
official reference mirror

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/Platano78/Smart-AI-Bridge'

If you have feedback or need assistance with the MCP directory API, please join our Discord server

TROUBLESHOOTING-GUIDE.md•17.6 KiB