# Oxide Configuration Guide
Complete reference for configuring the Oxide LLM Orchestrator.
## Table of Contents
1. [Configuration Files](#configuration-files)
2. [default.yaml Reference](#defaultyaml-reference)
3. [models.yaml Reference](#modelsyaml-reference)
4. [Service Configuration](#service-configuration)
5. [Routing Rules](#routing-rules)
6. [Advanced Configuration](#advanced-configuration)
7. [Configuration Validation](#configuration-validation)
8. [Hot Reload](#hot-reload)
---
## Configuration Files
Oxide uses two main YAML configuration files:
| File | Purpose | Location |
|------|---------|----------|
| **default.yaml** | Main configuration for services, routing, and system settings | `config/default.yaml` |
| **models.yaml** | Model capability profiles and optimization guidance | `config/models.yaml` |
### File Locations
```bash
oxide/
├── config/
│   ├── default.yaml    # Main configuration
│   └── models.yaml     # Model profiles
```
---
## default.yaml Reference
### Services Section
Configure LLM services (CLI tools and HTTP APIs).
#### CLI Service Configuration
```yaml
services:
  gemini:
    type: cli                     # Service type: 'cli' or 'http'
    executable: gemini            # Command to execute (must be in PATH)
    enabled: true                 # Enable/disable service
    max_context_tokens: 2000000   # Maximum context window
    capabilities:                 # Service capabilities (for routing)
      - codebase_analysis
      - architecture_design
      - multi_file_context
```
**Required fields for CLI services:**
- `type: cli`
- `executable` - Command name (e.g., `gemini`, `qwen`)
- `enabled` - Boolean flag
**Optional fields:**
- `max_context_tokens` - Maximum tokens for context window
- `capabilities` - List of task types this service excels at
#### HTTP Service Configuration (Ollama)
```yaml
services:
  ollama_local:
    type: http
    base_url: "http://localhost:11434"
    api_type: ollama              # 'ollama' or 'openai_compatible'
    enabled: true

    # Model configuration
    default_model: "qwen2.5-coder:7b"
    models:
      - "qwen2.5-coder:7b"
      - "qwen2.5-coder:14b"
      - "qwen3-coder:latest"

    # Auto-start and recovery
    auto_start: true              # Auto-start service if not running
    auto_detect_model: true       # Auto-detect model if default fails
    max_retries: 2                # Retry attempts on failure
    retry_delay: 2                # Seconds between retries

    capabilities:
      - quick_query
      - code_generation
      - documentation
```
**Required fields for HTTP services:**
- `type: http`
- `base_url` - Full URL including port
- `api_type` - Either `ollama` or `openai_compatible`
- `enabled` - Boolean flag
**Optional fields:**
- `default_model` - Model to use (null for auto-detection)
- `models` - List of available models
- `auto_start` - Auto-start service (only for local Ollama)
- `auto_detect_model` - Fallback to any available model
- `max_retries` - Retry attempts (default: 2)
- `retry_delay` - Delay between retries in seconds (default: 2)
- `preferred_models` - List of model names in priority order (for LM Studio)
- `capabilities` - List of task types
#### HTTP Service Configuration (LM Studio)
```yaml
services:
  lmstudio:
    type: http
    base_url: "http://192.168.1.33:1234/v1"   # LM Studio OpenAI-compatible endpoint
    api_type: openai_compatible
    enabled: true

    # Model detection (CRITICAL for LM Studio)
    default_model: null           # Always null - LM Studio requires auto-detection
    auto_start: false             # Cannot auto-start remote LM Studio
    auto_detect_model: true       # REQUIRED: Must be true
    max_retries: 2
    retry_delay: 2

    # Preferred models (in priority order)
    preferred_models:
      - "qwen"
      - "coder"
      - "codellama"
      - "deepseek"
```
**LM Studio Specific Notes:**
- **ALWAYS** set `auto_detect_model: true` - LM Studio doesn't persist model names
- Use `preferred_models` to prioritize which model to select
- The adapter will query `/v1/models` endpoint to find loaded model
- Best match from `preferred_models` will be selected
**How Preferred Models Work** (a sketch follows this list):
1. **Exact Match**: If a preferred model name exactly matches an available model, it's selected immediately
2. **Partial Match**: If no exact match, searches for models containing the preferred name (case-insensitive)
- Example: `"qwen"` matches `"qwen2.5-coder-7b"`
- Example: `"coder"` matches `"qwen2.5-coder-7b"` or `"deepseek-coder-6.7b"`
3. **Priority Order**: Checks preferred models in the order specified
4. **Fallback**: If no preferred models match, selects the first available model
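The selection order above can be pictured as a small pure function. A minimal Python sketch (illustrative only; `pick_model` is not Oxide's internal API):

```python
def pick_model(available: list[str], preferred: list[str]) -> str | None:
    """Pick a model in the documented order: exact match, then
    case-insensitive partial match, then the first available model."""
    if not available:
        return None
    for want in preferred:                       # priority order
        if want in available:                    # 1. exact match
            return want
    for want in preferred:                       # 2. partial match
        for model in available:
            if want.lower() in model.lower():
                return model
    return available[0]                          # 4. fallback

# "qwen" partially matches "qwen2.5-coder-7b"
print(pick_model(["deepseek-coder-6.7b", "qwen2.5-coder-7b"], ["qwen", "coder"]))
```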
**Common Error Messages:**
*Connection Refused:*
```
Cannot connect to lmstudio at http://localhost:1234
Please check:
1. Is LM Studio running?
2. Is the server started in LM Studio (Local Server tab)?
3. Is the base_url correct in your configuration?
4. Is the port accessible? (Network/firewall issues)
```
*Model Not Found:*
```
Model 'qwen2.5-coder-7b' not found in lmstudio.
Please ensure the model is loaded in LM Studio.
Available models can be checked via the web UI.
```
*Server Error (500):*
```
lmstudio internal error. This may indicate the model crashed or ran out of memory.
Try restarting LM Studio or loading a smaller model.
```
**Troubleshooting:**
- Verify LM Studio is running and server is started (green indicator in Local Server tab)
- Check the base_url includes `/v1` prefix for OpenAI-compatible API
- Ensure at least one model is loaded in LM Studio
- For memory issues, try a smaller/quantized model (7B instead of 13B, Q4 instead of Q8)
- Check firewall settings if connecting to remote LM Studio instance
### Routing Rules Section
Define how tasks are routed to services based on task type.
```yaml
routing_rules:
  # Task type as key
  code_review:
    primary: qwen                  # Primary service to use
    fallback:                      # Fallback services (in order)
      - ollama_local
      - ollama_remote
    timeout_seconds: 60            # Task timeout (optional)

  codebase_analysis:
    primary: gemini
    fallback:
      - qwen
      - ollama_local
    parallel_threshold_files: 20   # Use parallel execution if >20 files
    timeout_seconds: 180
```
**Fields:**
- `primary` (required) - Primary service name (must exist in `services`)
- `fallback` (optional) - List of fallback service names
- `timeout_seconds` (optional) - Override default timeout for this task type
- `parallel_threshold_files` (optional) - File-count threshold above which parallel execution is used
**Available task types:**
- `codebase_analysis` - Large codebase analysis
- `code_review` - Code quality review
- `code_generation` - Writing new code
- `quick_query` - Fast question answering
- `architecture_design` - System design
- `debugging` - Finding and fixing bugs
- `documentation` - Writing docs
- `refactoring` - Code refactoring
### Execution Settings
```yaml
execution:
  max_parallel_workers: 4   # Maximum parallel task workers
  timeout_seconds: 120      # Default timeout for all tasks
  streaming: true           # Enable streaming responses
  retry_on_failure: true    # Retry failed tasks
  max_retries: 5            # Maximum retry attempts
```
### Logging Configuration
```yaml
logging:
  level: INFO            # Log level: DEBUG, INFO, WARNING, ERROR
  file: /tmp/oxide.log   # Log file path (null to disable)
  console: true          # Enable console logging
```
### Memory Configuration
```yaml
memory:
  enabled: true                        # Enable context memory
  storage_path: ~/.oxide/memory.json   # Storage location
  max_conversations: 1000              # Max conversations to store
  max_age_days: 30                     # Auto-prune older than this
  max_messages_per_conversation: 100   # Limit per conversation
  auto_prune_enabled: true             # Enable auto-pruning
  similarity_threshold: 0.5            # Minimum similarity for retrieval
```
**Memory features:**
- Persistent conversation history
- Context retrieval based on similarity (sketched below)
- Automatic pruning of old conversations
- Metadata storage for tracking
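As a rough illustration of the `similarity_threshold` knob, retrieval behaves like the filter below. This is a toy sketch with a plain cosine score; field names like `vec` are invented for the example and are not Oxide's storage schema:

```python
def cosine(a: list[float], b: list[float]) -> float:
    """Plain cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query_vec, conversations, threshold=0.5):
    """Keep conversations scoring at or above similarity_threshold, best first."""
    scored = [(cosine(query_vec, c["vec"]), c) for c in conversations]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for score, c in scored if score >= threshold]
```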
### Cluster Configuration
```yaml
cluster:
  enabled: false           # Enable multi-machine coordination
  broadcast_port: 8888     # UDP port for node discovery
  api_port: 8000           # HTTP API port
  discovery_interval: 30   # Seconds between discovery broadcasts
  load_balancing: true     # Enable load-based routing
```
**Cluster features:**
- Auto-discovery via UDP broadcast (sketched below)
- Load-based task distribution
- Remote task execution
- Health monitoring
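Conceptually, each node announces itself on `broadcast_port` every `discovery_interval` seconds. A minimal sketch of that pattern (the JSON payload here is invented for illustration and is not Oxide's wire format):

```python
import json
import socket
import time

BROADCAST_PORT = 8888      # broadcast_port from the config above
DISCOVERY_INTERVAL = 30    # discovery_interval from the config above

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)

while True:
    # Announce this node to the LAN; payload fields are illustrative only
    payload = json.dumps({"node": socket.gethostname(), "api_port": 8000})
    sock.sendto(payload.encode(), ("255.255.255.255", BROADCAST_PORT))
    time.sleep(DISCOVERY_INTERVAL)
```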
---
## models.yaml Reference
Model capability profiles for optimization guidance.
```yaml
model_profiles:
  gemini:
    name: "Google Gemini"
    strengths:
      - "Large context window (2M+ tokens)"
      - "Multi-file analysis capability"
      - "Excellent architectural understanding"
    weaknesses:
      - "Higher latency compared to local models"
      - "Network dependency"
    optimal_for:
      - "Analyzing entire codebases (100+ files)"
      - "Understanding system architecture"
      - "Large-scale refactoring planning"
```
**Fields:**
- `name` - Human-readable model name
- `strengths` - List of model advantages
- `weaknesses` - List of model limitations
- `optimal_for` - Best use cases
**Purpose:**
- Documentation and guidance
- Not used by routing system (use `routing_rules` instead)
- Helps users understand when to use each model
---
## Service Configuration
### Service Types
#### 1. CLI Services
CLI services execute command-line tools like `gemini` or `qwen`.
**Configuration:**
```yaml
service_name:
  type: cli
  executable: command_name
  enabled: true
```
**Requirements:**
- Command must be in PATH
- Must support streaming output
- Should accept prompt as argument
**Examples:**
- Gemini CLI (`gemini -p "prompt"`)
- Qwen CLI (`qwen -p "prompt"`); a minimal invocation sketch follows
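Invoking a CLI service boils down to running the executable with the prompt. A hedged sketch (the exact flags Oxide passes may differ; `-p` mirrors the examples above):

```python
import subprocess

def run_cli(executable: str, prompt: str, timeout: int = 120) -> str:
    """Run a CLI service such as `gemini -p "prompt"` and return its output."""
    result = subprocess.run(
        [executable, "-p", prompt],
        capture_output=True, text=True, timeout=timeout, check=True,
    )
    return result.stdout

# print(run_cli("gemini", "Explain this error message"))
```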
#### 2. HTTP Services (Ollama)
Ollama provides a local HTTP API for LLM inference.
**Configuration:**
```yaml
ollama_local:
  type: http
  base_url: "http://localhost:11434"
  api_type: ollama
  default_model: "qwen2.5-coder:7b"
  auto_start: true
  auto_detect_model: true
```
**Auto-start feature:**
- Detects if Ollama is running
- Attempts to start Ollama if not running
- Platform-specific start commands:
  - macOS: `open -a Ollama` or `ollama serve`
  - Linux: `systemctl start ollama` or `ollama serve`
  - Windows: starts `Ollama.exe`
**Auto-detect model:**
- If `default_model` fails, tries other models in `models` list
- If no models specified, uses first available model
- Queries `/api/tags` to get available models (see the sketch below)
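For reference, querying `/api/tags` takes only a few lines of standard-library Python. This mirrors Ollama's documented tags endpoint, not Oxide's adapter code (error handling trimmed):

```python
import json
from urllib.request import urlopen

def ollama_models(base_url: str = "http://localhost:11434") -> list[str]:
    """List the models an Ollama server reports via /api/tags."""
    with urlopen(f"{base_url}/api/tags") as resp:
        data = json.load(resp)
    return [m["name"] for m in data.get("models", [])]

# e.g. ["qwen2.5-coder:7b", "qwen2.5-coder:14b"]
```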
#### 3. HTTP Services (LM Studio)
LM Studio provides OpenAI-compatible API.
**Configuration:**
```yaml
lmstudio:
  type: http
  base_url: "http://192.168.1.33:1234/v1"
  api_type: openai_compatible
  default_model: null       # Always null
  auto_detect_model: true   # Always true
  preferred_models:
    - "qwen"
    - "coder"
```
**Important:**
- LM Studio doesn't persist model names
- Must use auto-detection every time
- Queries the `/v1/models` endpoint (sketched below)
- Selects best match from `preferred_models`
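The equivalent query against LM Studio's OpenAI-compatible endpoint; model IDs come back under `data`, per the standard OpenAI list format:

```python
import json
from urllib.request import urlopen

def lmstudio_models(base_url: str = "http://192.168.1.33:1234/v1") -> list[str]:
    """List the models LM Studio reports via /v1/models."""
    with urlopen(f"{base_url}/models") as resp:
        data = json.load(resp)
    return [m["id"] for m in data.get("data", [])]
```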
---
## Routing Rules
### How Routing Works
1. **Task Classification** - Oxide classifies the task type
2. **Primary Service** - Tries primary service first
3. **Fallback Chain** - If primary fails, tries fallbacks in order (sketched below)
4. **Retry Logic** - Retries failed requests (configurable)
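Putting those steps together, dispatch amounts to walking the chain until one service succeeds. A hedged sketch that mirrors the rule shape above but is not Oxide's dispatcher; `service.execute` is an assumed interface:

```python
def dispatch(task, rule: dict, services: dict):
    """Try the primary service, then each fallback in order."""
    chain = [rule["primary"]] + rule.get("fallback", [])
    errors = {}
    for name in chain:
        service = services.get(name)
        if service is None or not service.enabled:
            continue
        try:
            # service.execute is an assumed interface for this sketch
            return service.execute(task, timeout=rule.get("timeout_seconds", 120))
        except Exception as exc:   # real code would also retry per max_retries
            errors[name] = exc
    raise RuntimeError(f"All services failed for task: {errors}")
```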
### Creating Routing Rules
```yaml
routing_rules:
  task_type:
    primary: service_name
    fallback:
      - fallback1
      - fallback2
    timeout_seconds: 60
```
**Best Practices:**
- Put fastest/cheapest service as primary for quick tasks
- Put most capable service as primary for complex tasks
- Order fallbacks by capability and availability
- Set appropriate timeouts for each task type
### Example Configurations
#### Cost-Optimized (Prefer Local)
```yaml
code_review:
  primary: ollama_local   # Free and fast
  fallback:
    - qwen                # CLI fallback
    - gemini              # Cloud fallback
```
#### Quality-Optimized (Prefer Cloud)
```yaml
code_review:
  primary: gemini    # Best quality
  fallback:
    - qwen           # Good fallback
    - ollama_local   # Last resort
```
#### Speed-Optimized (Local Only)
```yaml
quick_query:
  primary: ollama_local   # Fastest
  fallback:
    - lmstudio            # Another local option
```
---
## Advanced Configuration
### Environment-Specific Configs
Create different configs for different environments:
```bash
config/
├── default.yaml # Development config
├── production.yaml # Production config
└── local.yaml # Local-only config
```
Load specific config:
```python
from oxide.config.loader import load_config
from pathlib import Path
config = load_config(Path("config/production.yaml"))
```
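A common follow-on pattern is choosing the file from an environment variable; `OXIDE_ENV` below is a made-up convention for this example, not something Oxide defines:

```python
import os
from pathlib import Path

from oxide.config.loader import load_config

# OXIDE_ENV is hypothetical; default to the development config
env = os.environ.get("OXIDE_ENV", "default")
config = load_config(Path(f"config/{env}.yaml"))
```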
### Service Priority by Capability
Match services to task capabilities:
```yaml
services:
  gemini:
    capabilities:
      - codebase_analysis
      - architecture_design
  qwen:
    capabilities:
      - code_review
      - code_generation
      - debugging

routing_rules:
  # Route to service with matching capability
  codebase_analysis:
    primary: gemini   # Has codebase_analysis capability
  code_review:
    primary: qwen     # Has code_review capability
```
### Timeout Configuration
Set timeouts at multiple levels:
```yaml
# Global default timeout
execution:
  timeout_seconds: 120

# Task-specific timeout
routing_rules:
  quick_query:
    timeout_seconds: 10    # Override for fast tasks
  codebase_analysis:
    timeout_seconds: 300   # Override for slow tasks
```
**Priority (highest to lowest):**
1. Task-specific timeout in routing rule
2. Global execution timeout
3. Hard-coded default (120s); see the resolution sketch below
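That precedence reduces to a short lookup; a sketch with illustrative names:

```python
DEFAULT_TIMEOUT = 120  # hard-coded fallback

def resolve_timeout(rule: dict, execution: dict) -> int:
    """Task-specific timeout wins, then the global execution timeout,
    then the hard-coded default."""
    return rule.get("timeout_seconds") or execution.get("timeout_seconds") or DEFAULT_TIMEOUT

# resolve_timeout({"timeout_seconds": 10}, {"timeout_seconds": 120})  -> 10
# resolve_timeout({}, {"timeout_seconds": 120})                       -> 120
```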
---
## Configuration Validation
### Automatic Validation
Oxide validates configuration on load using Pydantic schemas.
**Validation checks:**
- ✅ At least one service is enabled
- ✅ CLI services have `executable` defined
- ✅ HTTP services have `base_url` defined
- ✅ Routing rules reference existing services
- ✅ YAML syntax is correct
- ✅ Required fields are present
### Validation Errors
**Example error:**
```
ConfigError: Routing rule for 'code_review' references unknown primary service: nonexistent_service
```
**How to fix:**
1. Check spelling of service name
2. Ensure service is defined in `services` section
3. Check that service name matches exactly (case-sensitive)
### Manual Validation
Validate config without starting Oxide:
```python
from oxide.config.loader import load_config
from pathlib import Path
try:
    config = load_config(Path("config/default.yaml"))
    print("✅ Configuration is valid!")
except Exception as e:
    print(f"❌ Configuration error: {e}")
```
---
## Hot Reload
### Enabling Hot Reload
Hot reload allows configuration changes without restarting Oxide.
**Configuration:**
```yaml
# Add to default.yaml (future feature)
hot_reload:
  enabled: true
  watch_interval: 5   # Check for changes every 5 seconds
  auto_reload: true   # Automatically reload on change
```
### What Gets Reloaded
**Hot reloadable:**
- ✅ Service enable/disable
- ✅ Routing rules
- ✅ Timeouts
- ✅ Logging settings
- ✅ Memory settings
**Not hot reloadable (requires restart):**
- ❌ Service type changes (CLI ↔ HTTP)
- ❌ Service URLs/executables
- ❌ API types
- ❌ Cluster settings
### Manual Reload
Trigger reload via API:
```bash
# Reload configuration
curl -X POST http://localhost:8000/api/config/reload
# Response:
{
  "status": "reloaded",
  "changes": {
    "services_changed": ["ollama_local"],
    "routing_rules_changed": ["code_review"]
  }
}
```
### Web UI Reload
Access configuration panel in web UI:
1. Navigate to **Settings** → **Configuration**
2. Click **Reload Configuration**
3. Review changes before applying
---
## Common Configuration Patterns
### Pattern 1: Local Development
Fast, free, private local inference:
```yaml
services:
  ollama_local:
    enabled: true
    auto_start: true
  gemini:
    enabled: false   # Disable cloud for privacy
  qwen:
    enabled: false

routing_rules:
  code_review:
    primary: ollama_local
  code_generation:
    primary: ollama_local
```
### Pattern 2: Cloud-First
Best quality, regardless of cost:
```yaml
services:
  gemini:
    enabled: true
  qwen:
    enabled: true
  ollama_local:
    enabled: false   # Disable local

routing_rules:
  code_review:
    primary: gemini
    fallback: [qwen]
```
### Pattern 3: Hybrid (Recommended)
Balance speed, quality, and cost:
```yaml
services:
  gemini: { enabled: true }
  qwen: { enabled: true }
  ollama_local: { enabled: true, auto_start: true }

routing_rules:
  # Fast tasks → local
  quick_query:
    primary: ollama_local
    fallback: [qwen]

  # Quality tasks → cloud
  code_review:
    primary: qwen
    fallback: [ollama_local, gemini]

  # Large tasks → high-context
  codebase_analysis:
    primary: gemini
    fallback: [qwen, ollama_local]
```
### Pattern 4: Multi-Machine Cluster
Distributed processing across LAN:
```yaml
# Machine 1 (main)
cluster:
  enabled: true
  api_port: 8000

services:
  ollama_local: { enabled: true }
  gemini: { enabled: true }
```
```yaml
# Machine 2 (worker)
cluster:
  enabled: true
  api_port: 8001

services:
  ollama_remote: { enabled: true, base_url: "http://192.168.1.46:11434" }
```
---
## Troubleshooting
### Service Won't Start
**Problem:** Service shows as unhealthy
**Solutions:**
1. Check if executable is in PATH (for CLI services)
2. Check if URL is reachable (for HTTP services)
3. Check logs: `tail -f /tmp/oxide.log`
4. Test manually: `gemini -p "test"` or `curl http://localhost:11434/api/tags`
### Routing Not Working
**Problem:** Tasks always go to fallback
**Solutions:**
1. Check primary service is enabled
2. Check primary service is healthy
3. Review routing rules match task type
4. Check service capabilities
### Configuration Not Loading
**Problem:** Changes don't take effect
**Solutions:**
1. Restart Oxide (if hot reload not enabled)
2. Check YAML syntax: `yamllint config/default.yaml`
3. Check validation errors in logs
4. Use config API to view current config
---
## Configuration API
### View Current Configuration
```bash
# Get full configuration
curl http://localhost:8000/api/config

# Response:
{
  "services": { ... },
  "routing_rules": { ... },
  "execution": { ... }
}
```
### Update Configuration
```bash
# Update a specific service
curl -X PATCH http://localhost:8000/api/config/services/ollama_local \
  -H "Content-Type: application/json" \
  -d '{"enabled": false}'

# Update a routing rule
curl -X PATCH http://localhost:8000/api/config/routing_rules/code_review \
  -H "Content-Type: application/json" \
  -d '{"primary": "gemini"}'
```
### Validate Configuration
```bash
# Validate configuration changes
curl -X POST http://localhost:8000/api/config/validate \
  -H "Content-Type: application/json" \
  -d '{"services": { ... }}'

# Response:
{
  "valid": true,
  "errors": []
}
```
---
## Best Practices
### 1. Start Simple
Begin with minimal config and add complexity as needed:
```yaml
services:
  ollama_local:
    type: http
    base_url: "http://localhost:11434"
    api_type: ollama
    enabled: true

routing_rules:
  code_review:
    primary: ollama_local
```
### 2. Use Auto-Detection
Let Oxide detect models automatically:
```yaml
ollama_local:
  default_model: null       # Let Oxide choose
  auto_detect_model: true
```
### 3. Set Appropriate Timeouts
Match timeouts to task complexity:
- Quick queries: 10-30s
- Code generation: 30-60s
- Code review: 60-120s
- Codebase analysis: 120-300s
### 4. Enable Retries
Always enable retries for production:
```yaml
execution:
  retry_on_failure: true
  max_retries: 3
```
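Behaviorally, those settings correspond to a fixed-delay retry loop like this sketch (not Oxide's implementation; `retry_delay` mirrors the service-level setting described earlier):

```python
import time

def with_retries(fn, max_retries: int = 3, retry_delay: float = 2.0):
    """Call fn, retrying up to max_retries times with a fixed delay."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_retries:
                raise
            time.sleep(retry_delay)
```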
### 5. Monitor and Adjust
Use metrics to optimize configuration:
- Track success rates per service
- Measure response times
- Monitor costs (if using cloud services)
- Adjust routing based on performance
---
## Summary
### Quick Reference
| Config File | Purpose |
|-------------|---------|
| `default.yaml` | Main configuration |
| `models.yaml` | Model capability profiles |
### Key Sections
| Section | Purpose |
|---------|---------|
| `services` | Define LLM services |
| `routing_rules` | Task routing logic |
| `execution` | Execution settings |
| `logging` | Logging configuration |
| `memory` | Context memory settings |
| `cluster` | Multi-machine setup |
### Important Concepts
- **Service Types**: CLI or HTTP
- **Auto-start**: Ollama can start automatically
- **Auto-detect**: Models can be detected automatically
- **Routing**: Primary service with fallback chain
- **Hot Reload**: Configuration changes without restart (future)
- **Validation**: Automatic validation on load
---
## Next Steps
1. **Review examples** in this guide
2. **Customize** `config/default.yaml` for your needs
3. **Test** services with health checks
4. **Monitor** performance and adjust routing
5. **Enable hot reload** (once available) for dynamic adjustments
For more help, see:
- [API Documentation](./api.md)
- [Service Health Monitoring](./monitoring.md)
- [Troubleshooting Guide](./troubleshooting.md)