πŸ”¬ Oxide - Intelligent LLM Orchestrator


Intelligent routing and orchestration for distributed AI resources

Oxide is a comprehensive platform for managing and orchestrating multiple Large Language Model (LLM) services. It intelligently routes tasks to the most appropriate LLM based on task characteristics, provides a web dashboard for monitoring and management, and integrates seamlessly with Claude Code via Model Context Protocol (MCP).

✨ Features

🎯 Intelligent Task Routing

  • Automatic Service Selection: Analyzes task type, complexity, and file count to choose the optimal LLM

  • Custom Routing Rules: Configure permanent task-to-service assignments via Web UI

  • Fallback Support: Automatic failover to alternative services if primary is unavailable

  • Parallel Execution: Distribute large codebase analysis across multiple LLMs

  • Manual Override: Select specific services for individual tasks

πŸš€ Local LLM Management (NEW!)

  • Auto-Start Ollama: Automatically starts Ollama if not running (macOS, Linux, Windows)

  • Auto-Detect Models: Discovers available models without manual configuration

  • Smart Model Selection: Chooses best model based on preferences and availability

  • Auto-Recovery: Retries with service restart on temporary failures

  • Zero-Config LM Studio: Works with LM Studio without model name configuration

🌐 Web Dashboard

  • Real-time Monitoring: Live metrics for CPU, memory, task execution, and service health

  • Task Executor: Execute tasks directly from the browser with service selection

  • Task Assignment Manager: Configure which LLM handles specific task types

  • Task History: Complete history of all executed tasks with results and metrics

  • WebSocket Support: Real-time updates for task progress and system events

  • Service Management: Monitor and test all configured LLM services

πŸ”Œ MCP Integration

  • Claude Code Integration: Use Oxide directly within Claude Code

  • Three MCP Tools:

    • route_task - Execute tasks with intelligent routing

    • analyze_parallel - Parallel codebase analysis

    • list_services - Check service health and availability

  • Persistent Task Storage: All tasks saved to ~/.oxide/tasks.json

  • Auto-start Web UI: Optional automatic Web UI launch with MCP server

πŸ›‘οΈ Process Management

  • Automatic Cleanup: All spawned processes (Web UI, Gemini, Qwen, etc.) cleaned up on exit

  • Signal Handlers: Graceful shutdown on SIGTERM/SIGINT

  • Process Registry: Tracks all child processes for guaranteed cleanup

  • No Orphaned Processes: Ensures clean system state even on forced termination

πŸ“Š Supported LLM Services

  • Google Gemini (CLI) - 2M+ token context window, ideal for large codebases

  • Qwen (CLI) - Optimized for code generation and review

  • Ollama (HTTP) - Local and remote instances

  • Extensible: Easy to add new LLM adapters

πŸš€ Quick Start

Prerequisites

  • Python 3.11+

  • uv package manager

  • Node.js 18+ (for Web UI)

  • Gemini CLI (optional)

  • Qwen CLI (optional)

  • Ollama (optional)

Installation

# Clone the repository
cd /Users/yayoboy/Documents/GitHub/oxide

# Install dependencies
uv sync

# Build the Web UI
cd src/oxide/web/frontend
npm install
npm run build
cd ../../..

# Verify installation
uv run oxide-mcp --help

Configuration

Edit config/default.yaml:

services:
  gemini:
    enabled: true
    type: cli
    executable: gemini
  qwen:
    enabled: true
    type: cli
    executable: qwen
  ollama_local:
    enabled: true
    type: http
    base_url: http://localhost:11434
    model: qwen2.5-coder:7b
    default_model: qwen2.5-coder:7b
  ollama_remote:
    enabled: false
    type: http
    base_url: http://192.168.1.46:11434
    model: qwen2.5-coder:7b

routing_rules:
  prefer_local: true
  fallback_enabled: true

execution:
  timeout_seconds: 120
  max_retries: 2
  retry_on_failure: true
  max_parallel_workers: 3

logging:
  level: INFO
  console: true
  file: oxide.log

πŸ“– Usage

Option 1: MCP with Claude Code

  1. Configure Claude Code

Add to ~/.claude/settings.json:

{ "mcpServers": { "oxide": { "command": "uv", "args": ["--directory", "/Users/yayoboy/Documents/GitHub/oxide", "run", "oxide-mcp"], "env": { "OXIDE_AUTO_START_WEB": "true" } } } }

Setting OXIDE_AUTO_START_WEB=true automatically starts the Web UI at http://localhost:8000

  2. Use in Claude Code

Claude will automatically use Oxide MCP tools:

You: "Analyze this codebase for architecture patterns" Claude: Uses Oxide to route to Gemini (large context) You: "Review this function for bugs" Claude: Uses Oxide to route to Qwen (code specialist) You: "What is 2+2?" Claude: Uses Oxide to route to Ollama Local (quick query)

Option 2: Web Dashboard

  1. Start the Web UI

# Option A: Use the startup script
./scripts/start_web_ui.sh

# Option B: Manual start
python -m uvicorn oxide.web.backend.main:app --host 0.0.0.0 --port 8000

# Option C: Auto-start with MCP (set OXIDE_AUTO_START_WEB=true)
uv run oxide-mcp

  2. Access the Dashboard

Open http://localhost:8000 in your browser

Option 3: Python API

from oxide.core.orchestrator import Orchestrator
from oxide.config.loader import load_config

# Initialize
config = load_config()
orchestrator = Orchestrator(config)

# Execute a task with intelligent routing
async for chunk in orchestrator.execute_task(
    prompt="Explain quantum computing",
    files=None,
    preferences=None  # Let Oxide choose
):
    print(chunk, end="")

# Execute with manual service selection
async for chunk in orchestrator.execute_task(
    prompt="Review this code",
    files=["src/main.py"],
    preferences={"preferred_service": "qwen"}
):
    print(chunk, end="")

πŸ—οΈ Architecture

System Overview

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                     Oxide Orchestrator                      β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
β”‚  β”‚  Classifier  │──▢│    Router    │──▢│   Adapters   β”‚ β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
β”‚   Task Analysis       Route Decision       LLM Execution    β”‚
β”‚                                                             β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
β”‚  β”‚        Process Manager - Lifecycle Management         β”‚ β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
β”‚  β”‚          Task Storage - Persistent History            β”‚ β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
β”‚  β”‚          Routing Rules - Custom Assignments           β”‚ β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
        β”‚                  β”‚                  β”‚
        β–Ό                  β–Ό                  β–Ό
  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
  β”‚    MCP    β”‚    β”‚  Web UI   β”‚    β”‚  Python  β”‚
  β”‚  Server   β”‚    β”‚  Backend  β”‚    β”‚   API    β”‚
  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Key Components

1. Task Classifier (src/oxide/core/classifier.py)

Analyzes tasks to determine (a minimal sketch follows the task-type list):

  • Task type (coding, review, codebase_analysis, etc.)

  • Complexity score based on keywords and patterns

  • File count and total size

  • Whether parallel execution is beneficial

Task Types:

  • coding - Code generation

  • code_review - Code review

  • bug_search - Bug analysis

  • refactoring - Code refactoring

  • documentation - Writing docs

  • codebase_analysis - Large codebase analysis

  • quick_query - Simple questions

  • general - General purpose
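
A minimal sketch of how keyword-based classification like this can work. The keyword sets and thresholds here are hypothetical; the real rules in classifier.py are more elaborate:

# Illustrative sketch only -- not the actual classifier.py implementation.
KEYWORDS = {
    "code_review": ("review", "critique"),
    "bug_search": ("bug", "error", "crash"),
    "documentation": ("document", "docstring", "readme"),
    "coding": ("implement", "write a function", "create"),
}

def classify(prompt: str, files: list[str] | None = None) -> str:
    files = files or []
    if len(files) > 20:                  # many files -> whole-codebase work
        return "codebase_analysis"
    lowered = prompt.lower()
    for task_type, words in KEYWORDS.items():
        if any(w in lowered for w in words):
            return task_type
    # short prompts with no matched keywords are treated as quick queries
    return "quick_query" if len(prompt) < 80 else "general"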

2. Task Router (src/oxide/core/router.py)

Routes tasks based on (see the sketch after this list):

  • Task classification results

  • Custom routing rules (user-defined permanent assignments)

  • Service health status and availability

  • Fallback preferences and retry logic
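
A minimal sketch of the priority order described above, with hypothetical names; the actual logic lives in router.py:

# Illustrative sketch: custom rules first, then healthy defaults, then fallback.
def route(task_type: str, custom_rules: dict[str, str],
          healthy: set[str], defaults: dict[str, list[str]]) -> str:
    # 1. A user-defined rule wins outright (if its service is healthy)
    rule = custom_rules.get(task_type)
    if rule and rule in healthy:
        return rule
    # 2. Otherwise walk the default preference list for this task type
    for service in defaults.get(task_type, []):
        if service in healthy:
            return service
    # 3. Fallback: any healthy service at all
    if healthy:
        return next(iter(healthy))
    raise RuntimeError("no healthy services available")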

3. Adapters (src/oxide/adapters/)

Unified interface for different LLM types:

  • CLI Adapters (cli_adapter.py):

    • Gemini (gemini.py) - Subprocess execution, 2M+ context

    • Qwen (qwen.py) - Code specialist

    • Automatic process tracking and cleanup

  • HTTP Adapters (ollama_http.py):

    • Ollama Local/Remote - REST API communication

    • Streaming support

    • Health checks

All adapters implement:

  • execute() - Task execution with streaming

  • health_check() - Service availability check

  • get_service_info() - Service metadata

4. Task Storage (src/oxide/utils/task_storage.py)

Persistent task history management (a sketch follows the feature list):

  • Storage: ~/.oxide/tasks.json

  • Thread-safe: Concurrent read/write support

  • Tracked data:

    • Task ID, status, timestamps

    • Prompt, files, preferences

    • Service used, task type

    • Result, error, duration

  • Features:

    • List/filter tasks by status

    • Get statistics (by service, by type, by status)

    • Clear tasks (all or by status)
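
A minimal sketch of the thread-safe JSON persistence pattern, for illustration only; the actual implementation is in task_storage.py:

# Illustrative sketch of lock-protected read-modify-write on tasks.json.
import json
import threading
from pathlib import Path

class TaskStore:
    def __init__(self, path: Path = Path.home() / ".oxide" / "tasks.json"):
        self._path = path
        self._lock = threading.Lock()
        self._path.parent.mkdir(parents=True, exist_ok=True)

    def save(self, task_id: str, record: dict) -> None:
        with self._lock:                 # serialize concurrent writers
            tasks = self._load()
            tasks[task_id] = record
            self._path.write_text(json.dumps(tasks, indent=2))

    def _load(self) -> dict:
        if self._path.exists():
            return json.loads(self._path.read_text())
        return {}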

5. Process Manager (src/oxide/utils/process_manager.py)

Lifecycle management for all spawned processes (a sketch follows this list):

  • Tracks: Web UI server, CLI processes (Gemini, Qwen)

  • Signal handlers: SIGTERM, SIGINT, SIGHUP

  • Cleanup: Automatic on exit (graceful β†’ force kill)

  • Safety: Prevents orphaned processes

  • atexit hook: Final cleanup guarantee
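
A minimal sketch of the registry-plus-handlers pattern, with hypothetical helper names; the real logic lives in process_manager.py:

# Illustrative sketch: registry + signal handlers + atexit guarantee.
import atexit
import signal
import subprocess

_children: list[subprocess.Popen] = []   # process registry

def register(proc: subprocess.Popen) -> None:
    """Track a spawned child so it is cleaned up on exit."""
    _children.append(proc)

def _cleanup(*_args) -> None:
    """Terminate gracefully, then force-kill stragglers."""
    for proc in _children:
        if proc.poll() is None:          # still running
            proc.terminate()             # graceful shutdown first
    for proc in _children:
        try:
            proc.wait(timeout=5)
        except subprocess.TimeoutExpired:
            proc.kill()                  # force kill as last resort

atexit.register(_cleanup)                           # final guarantee
signal.signal(signal.SIGTERM, lambda s, f: _cleanup())
signal.signal(signal.SIGINT, lambda s, f: _cleanup())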

6. Routing Rules Manager (src/oxide/utils/routing_rules.py)

User-defined task-to-service assignments:

  • Storage: ~/.oxide/routing_rules.json

  • Format: {"task_type": "service_name"}

  • Example:

    { "coding": "qwen", "code_review": "gemini", "bug_search": "qwen", "quick_query": "ollama_local" }
  • Priority: Custom rules override intelligent routing

🎨 Web UI Features

Dashboard Sections

1. System Metrics (Real-time)

  • Services: Total, enabled, healthy, unhealthy

  • Tasks: Running, completed, failed, queued

  • System: CPU %, Memory % and usage

  • WebSocket: Active connections

  • Auto-refresh every 2 seconds

2. Task Executor πŸš€

Execute tasks directly from the browser:

  • Prompt input: Multi-line text area

  • Service selection:

    • πŸ€– Auto (Intelligent Routing) - Let Oxide choose

    • Manual - Select specific service (gemini, qwen, ollama, etc.)

  • Real-time streaming: See results as they appear

  • Error handling: Clear error messages

  • Integration: Tasks appear immediately in history

3. LLM Services

Service cards showing:

  • Status: βœ… Healthy / ⚠️ Unavailable / ❌ Disabled

  • Type: CLI or HTTP

  • Description: Service capabilities

  • Details: Base URL (HTTP), executable (CLI)

  • Context: Max tokens (Gemini: 2M+)

4. Task Assignment Manager βš™οΈ ⭐ NEW

Configure permanent task-to-service assignments:

Interface:

  • Add Rule Form:

    • Dropdown: Select task type (coding, review, etc.)

    • Dropdown: Select service (qwen, gemini, ollama)

    • Button: Add Rule

  • Active Rules Table:

    • Task Type | Assigned Service | Description | Actions

    • Delete individual rules

    • Clear all rules

Available Task Types:

  • coding β†’ Code Generation β†’ Recommended: qwen, gemini

  • code_review β†’ Code Review β†’ Recommended: qwen, gemini

  • bug_search β†’ Bug Search β†’ Recommended: qwen, gemini

  • refactoring β†’ Code Refactoring β†’ Recommended: qwen, gemini

  • documentation β†’ Documentation β†’ Recommended: gemini, qwen

  • codebase_analysis β†’ Large Codebase β†’ Recommended: gemini

  • quick_query β†’ Simple Questions β†’ Recommended: ollama_local

  • general β†’ General Purpose β†’ Recommended: ollama_local, qwen

Example Configuration:

coding      β†’ qwen    (All code generation to qwen)
code_review β†’ gemini  (All reviews to gemini)
bug_search  β†’ qwen    (Bug analysis to qwen)
quick_query β†’ ollama  (Fast queries to local ollama)

When a task matches a rule, it's always routed to the assigned service, bypassing intelligent routing.

5. Task History πŸ“

Complete history of all executed tasks:

  • From all sources: MCP, Web UI, Python API

  • Auto-refresh: Every 3 seconds

  • Display:

    • Status badge (completed, running, failed, queued)

    • Timestamp, duration

    • Prompt preview (first 150 chars)

    • Service used, task type

    • File count

    • Error messages (if failed)

    • Result preview (first 200 chars)

  • Limit: Latest 10 tasks by default

6. Live Updates πŸ””

WebSocket event stream:

  • Real-time task progress

  • Service status changes

  • System events

πŸ“‘ API Reference

REST API

Base URL: http://localhost:8000/api

Tasks Endpoints

Execute Task

POST /api/tasks/execute
Content-Type: application/json

{
  "prompt": "Your query here",
  "files": ["path/to/file.py"],
  "preferences": {
    "preferred_service": "qwen"
  }
}

Response: {"task_id": "...", "status": "queued", "message": "..."}
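
For example, calling this endpoint from Python (assuming the backend is running on the default port):

import requests

resp = requests.post(
    "http://localhost:8000/api/tasks/execute",
    json={
        "prompt": "Review this code",
        "files": ["src/main.py"],
        "preferences": {"preferred_service": "qwen"},
    },
)
task_id = resp.json()["task_id"]

# Fetch the task record (the result appears once execution completes)
print(requests.get(f"http://localhost:8000/api/tasks/{task_id}").json())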

List Tasks

GET /api/tasks/?limit=10&status=completed

Response:
{
  "tasks": [...],
  "total": 42,
  "filtered": 10
}

Get Task

GET /api/tasks/{task_id}

Response:
{
  "id": "...",
  "status": "completed",
  "prompt": "...",
  "result": "...",
  "duration": 5.23,
  ...
}

Delete Task

DELETE /api/tasks/{task_id}

Clear Tasks

POST /api/tasks/clear?status=completed

Services Endpoints

List Services

GET /api/services/

Response:
{
  "services": {
    "gemini": {"enabled": true, "healthy": true, ...},
    ...
  },
  "total": 4,
  "enabled": 3
}

Get Service

GET /api/services/{service_name}

Health Check

POST /api/services/{service_name}/health

Test Service

POST /api/services/{service_name}/test?test_prompt=Hello

Routing Rules Endpoints ⭐ NEW

List All Rules

GET /api/routing/rules

Response:
{
  "rules": [
    {"task_type": "coding", "service": "qwen"},
    ...
  ],
  "stats": {
    "total_rules": 3,
    "rules_by_service": {"qwen": 2, "gemini": 1},
    "task_types": ["coding", "code_review", "bug_search"]
  }
}

Get Rule

GET /api/routing/rules/{task_type}

Create/Update Rule

POST /api/routing/rules
Content-Type: application/json

{
  "task_type": "coding",
  "service": "qwen"
}

Response:
{
  "message": "Routing rule updated",
  "rule": {"task_type": "coding", "service": "qwen"}
}

Update Rule

PUT /api/routing/rules/{task_type}
Content-Type: application/json

{
  "task_type": "coding",
  "service": "gemini"
}

Delete Rule

DELETE /api/routing/rules/{task_type}

Clear All Rules

POST /api/routing/rules/clear

Get Available Task Types

GET /api/routing/task-types

Response:
{
  "task_types": [
    {
      "name": "coding",
      "label": "Code Generation",
      "description": "Writing new code, implementing features",
      "recommended_services": ["qwen", "gemini"]
    },
    ...
  ]
}

Monitoring Endpoints

Get Metrics

GET /api/monitoring/metrics

Response:
{
  "services": {"total": 4, "enabled": 3, "healthy": 2, ...},
  "tasks": {"total": 10, "running": 0, "completed": 8, ...},
  "system": {"cpu_percent": 25.3, "memory_percent": 45.7, ...},
  "websocket": {"connections": 1},
  "timestamp": 1234567890.123
}

Get Stats

GET /api/monitoring/stats

Response:
{
  "total_tasks": 42,
  "avg_duration": 5.67,
  "success_rate": 95.24,
  "tasks_by_status": {"completed": 40, "failed": 2}
}

Health Check

GET /api/monitoring/health

Response:
{
  "status": "healthy",
  "healthy": true,
  "issues": [],
  "cpu_percent": 25.3,
  "memory_percent": 45.7
}

WebSocket API

Connect to ws://localhost:8000/ws for real-time updates.

Message Types:

  1. task_start

{ "type": "task_start", "task_id": "...", "task_type": "coding", "service": "qwen" }
  1. task_progress (streaming)

{ "type": "task_progress", "task_id": "...", "chunk": "Here is the code..." }
  1. task_complete

{ "type": "task_complete", "task_id": "...", "success": true, "duration": 5.23 }

Client Usage:

const ws = new WebSocket('ws://localhost:8000/ws');

ws.onmessage = (event) => {
  const data = JSON.parse(event.data);
  if (data.type === 'task_progress') {
    console.log(data.chunk);
  }
};

// Keep-alive ping
setInterval(() => ws.send('ping'), 30000);

πŸ”§ Development

Project Structure

oxide/
β”œβ”€β”€ config/
β”‚   └── default.yaml              # Main configuration
β”œβ”€β”€ src/oxide/
β”‚   β”œβ”€β”€ core/
β”‚   β”‚   β”œβ”€β”€ classifier.py         # Task classification
β”‚   β”‚   β”œβ”€β”€ router.py             # Routing logic
β”‚   β”‚   └── orchestrator.py       # Main orchestrator
β”‚   β”œβ”€β”€ adapters/
β”‚   β”‚   β”œβ”€β”€ base.py               # Base adapter interface
β”‚   β”‚   β”œβ”€β”€ cli_adapter.py        # CLI adapter base
β”‚   β”‚   β”œβ”€β”€ gemini.py             # Gemini adapter
β”‚   β”‚   β”œβ”€β”€ qwen.py               # Qwen adapter
β”‚   β”‚   └── ollama_http.py        # Ollama HTTP adapter
β”‚   β”œβ”€β”€ execution/
β”‚   β”‚   └── parallel.py           # Parallel execution engine
β”‚   β”œβ”€β”€ utils/
β”‚   β”‚   β”œβ”€β”€ task_storage.py       # Task persistence
β”‚   β”‚   β”œβ”€β”€ routing_rules.py      # Routing rules storage
β”‚   β”‚   β”œβ”€β”€ process_manager.py    # Process lifecycle
β”‚   β”‚   β”œβ”€β”€ logging.py            # Logging utilities
β”‚   β”‚   └── exceptions.py         # Custom exceptions
β”‚   β”œβ”€β”€ mcp/
β”‚   β”‚   β”œβ”€β”€ server.py             # MCP server (FastMCP)
β”‚   β”‚   └── tools.py              # MCP tool definitions
β”‚   └── web/
β”‚       β”œβ”€β”€ backend/
β”‚       β”‚   β”œβ”€β”€ main.py           # FastAPI application
β”‚       β”‚   β”œβ”€β”€ websocket.py      # WebSocket manager
β”‚       β”‚   └── routes/
β”‚       β”‚       β”œβ”€β”€ tasks.py      # Task endpoints
β”‚       β”‚       β”œβ”€β”€ services.py   # Service endpoints
β”‚       β”‚       β”œβ”€β”€ routing.py    # Routing rules endpoints
β”‚       β”‚       └── monitoring.py # Monitoring endpoints
β”‚       └── frontend/             # React SPA
β”‚           β”œβ”€β”€ src/
β”‚           β”‚   β”œβ”€β”€ components/
β”‚           β”‚   β”‚   β”œβ”€β”€ TaskExecutor.jsx
β”‚           β”‚   β”‚   β”œβ”€β”€ TaskAssignmentManager.jsx
β”‚           β”‚   β”‚   β”œβ”€β”€ TaskHistory.jsx
β”‚           β”‚   β”‚   β”œβ”€β”€ ServiceCard.jsx
β”‚           β”‚   β”‚   └── MetricsDashboard.jsx
β”‚           β”‚   β”œβ”€β”€ hooks/
β”‚           β”‚   β”‚   β”œβ”€β”€ useServices.js
β”‚           β”‚   β”‚   β”œβ”€β”€ useMetrics.js
β”‚           β”‚   β”‚   └── useWebSocket.js
β”‚           β”‚   β”œβ”€β”€ api/
β”‚           β”‚   β”‚   └── client.js
β”‚           β”‚   └── App.jsx
β”‚           β”œβ”€β”€ package.json
β”‚           └── vite.config.js
β”œβ”€β”€ tests/
β”‚   β”œβ”€β”€ test_process_cleanup.py
β”‚   └── test_task_history_integration.py
└── scripts/
    └── start_web_ui.sh

Running Tests

# Process cleanup tests
python3 tests/test_process_cleanup.py

# Task history integration tests
python3 tests/test_task_history_integration.py

# All tests pass
# βœ“ Sync process cleanup
# βœ“ Async process cleanup
# βœ“ Multiple process cleanup
# βœ“ Signal handler cleanup
# βœ“ Task storage integration

Adding a New LLM Adapter

  1. Create adapter class

# src/oxide/adapters/my_llm.py
from typing import AsyncIterator, List, Optional

from .base import BaseAdapter


class MyLLMAdapter(BaseAdapter):
    def __init__(self, config: dict):
        super().__init__("my_llm", config)
        self.api_key = config.get("api_key")
        # Initialize your client...

    async def execute(
        self,
        prompt: str,
        files: Optional[List[str]] = None,
        timeout: Optional[int] = None,
        **kwargs
    ) -> AsyncIterator[str]:
        """Execute task and stream results."""
        # Your implementation
        yield "Response chunk"

    async def health_check(self) -> bool:
        """Check if service is available."""
        # Your health check logic
        return True

    def get_service_info(self) -> dict:
        """Return service metadata."""
        info = super().get_service_info()
        info.update({
            "description": "My LLM Service",
            "max_tokens": 100000
        })
        return info

  2. Register in configuration

# config/default.yaml
services:
  my_llm:
    enabled: true
    type: http  # or 'cli'
    base_url: http://localhost:8080
    model: my-model
    api_key: ${MY_LLM_API_KEY}  # From environment

  3. Update orchestrator

# src/oxide/core/orchestrator.py
def _create_adapter(self, service_name, config):
    service_type = config.get("type")
    if service_type == "cli":
        if "my_llm" in service_name:
            from ..adapters.my_llm import MyLLMAdapter
            return MyLLMAdapter(config)
        # ... other CLI adapters
    elif service_type == "http":
        if "my_llm" in service_name:
            from ..adapters.my_llm import MyLLMAdapter
            return MyLLMAdapter(config)
        # ... other HTTP adapters

  4. Test your adapter

import asyncio

from oxide.core.orchestrator import Orchestrator
from oxide.config.loader import load_config


async def test():
    config = load_config()
    orchestrator = Orchestrator(config)
    async for chunk in orchestrator.execute_task(
        prompt="Test query",
        preferences={"preferred_service": "my_llm"}
    ):
        print(chunk, end="")

asyncio.run(test())

πŸ“Š Storage Files

Oxide creates the following files in ~/.oxide/:

  • tasks.json - Task execution history (all tasks from all sources)

  • routing_rules.json - Custom routing rules (task type β†’ service)

  • oxide.log - Application logs (if file logging enabled)

Example tasks.json

{ "task-uuid-1": { "id": "task-uuid-1", "status": "completed", "prompt": "What is quantum computing?", "files": [], "service": "ollama_local", "task_type": "quick_query", "result": "Quantum computing is...", "error": null, "created_at": 1234567890.123, "started_at": 1234567890.456, "completed_at": 1234567895.789, "duration": 5.333 } }

Example routing_rules.json

{ "coding": "qwen", "code_review": "gemini", "bug_search": "qwen", "quick_query": "ollama_local" }

🎯 Local LLM Management

Auto-Start Ollama

Oxide can automatically start Ollama if it's not running:

# config/default.yaml
services:
  ollama_local:
    type: http
    base_url: "http://localhost:11434"
    api_type: ollama
    enabled: true
    auto_start: true         # πŸ”₯ Auto-start if not running
    auto_detect_model: true  # πŸ”₯ Auto-detect best model
    max_retries: 2           # Retry on failures
    retry_delay: 2           # Seconds between retries

What happens (a sketch follows the steps):

  1. First task execution checks if Ollama is running

  2. If not, automatically starts Ollama via:

    • macOS: Opens Ollama.app or runs ollama serve

    • Linux: Uses systemd or runs ollama serve

    • Windows: Runs ollama serve as detached process

  3. Waits up to 30s for Ollama to be ready

  4. Proceeds with task execution
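
A minimal sketch of the check-start-wait flow, for illustration; the real per-platform logic (Ollama.app, systemd, detached process) lives in the service manager:

# Illustrative sketch: probe, start if needed, then wait up to 30 s.
import subprocess
import time

import requests

def ensure_ollama(base_url: str = "http://localhost:11434",
                  timeout: float = 30.0) -> bool:
    try:
        requests.get(base_url, timeout=2)        # already running?
        return True
    except requests.ConnectionError:
        subprocess.Popen(["ollama", "serve"],    # start in the background
                         stdout=subprocess.DEVNULL,
                         stderr=subprocess.DEVNULL)
    deadline = time.monotonic() + timeout        # wait until ready
    while time.monotonic() < deadline:
        try:
            requests.get(base_url, timeout=2)
            return True
        except requests.ConnectionError:
            time.sleep(1)
    return False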

Auto-Detect Models

No need to configure model names manually:

lmstudio:
  type: http
  base_url: "http://192.168.1.33:1234/v1"
  api_type: openai_compatible
  enabled: true
  default_model: null       # πŸ”₯ Will auto-detect
  auto_detect_model: true
  preferred_models:         # Priority order
    - "qwen"                # Matches: qwen/qwen2.5-coder-14b
    - "coder"               # Matches: mistralai/codestral-22b
    - "deepseek"            # Matches: deepseek/deepseek-r1

Smart Selection Algorithm (sketched below):

  1. Fetches available models from service

  2. Tries exact match with preferred models

  3. Tries partial match (e.g., "qwen" matches "qwen2.5-coder:7b")

  4. Falls back to first available model
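
A minimal sketch of this matching order, for illustration only:

# Illustrative sketch: exact match, then partial match, then first available.
def pick_model(available: list[str], preferred: list[str]) -> str | None:
    for want in preferred:
        if want in available:                    # 2. exact match
            return want
    for want in preferred:
        for model in available:                  # 3. partial match
            if want in model:
                return model
    return available[0] if available else None   # 4. first available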

Service Health Monitoring

from oxide.utils.service_manager import get_service_manager

service_manager = get_service_manager()

# Comprehensive health check with auto-recovery
health = await service_manager.ensure_service_healthy(
    service_name="ollama_local",
    base_url="http://localhost:11434",
    api_type="ollama",
    auto_start=True,          # Try to start if down
    auto_detect_model=True    # Detect available models
)

print(f"Healthy: {health['healthy']}")
print(f"Models: {health['models']}")
print(f"Recommended: {health['recommended_model']}")

Background Health Monitoring

# Start monitoring (checks every 60s, auto-recovers on failure)
await service_manager.start_health_monitoring(
    service_name="ollama_local",
    base_url="http://localhost:11434",
    interval=60,
    auto_recovery=True
)

🎯 Usage Examples

Example 1: Simple Query (Auto-Start Enabled)

# Ollama will auto-start if not running!
async for chunk in orchestrator.execute_task("What is 2+2?"):
    print(chunk, end="")

# What happens:
# 1. Checks if Ollama is running β†’ not running
# 2. Auto-starts Ollama (takes ~5s)
# 3. Auto-detects model: qwen2.5-coder:7b
# 4. Executes task
# 5. Returns: "4"

Example 2: Code Review with Manual Selection

async for chunk in orchestrator.execute_task(
    prompt="Review this code for bugs",
    files=["src/auth.py"],
    preferences={"preferred_service": "gemini"}
):
    print(chunk, end="")

# Forces routing to: gemini
# Gets large context window for thorough review

Example 3: Large Codebase Analysis

# Parallel analysis
from oxide.execution.parallel import ParallelExecutor

executor = ParallelExecutor(max_workers=3)

result = await executor.execute_parallel(
    prompt="Analyze architecture patterns",
    files=["src/**/*.py"],  # 50+ files
    services=["gemini", "qwen", "ollama_local"],
    strategy="split"
)

print(f"Completed in {result.total_duration_seconds}s")
print(result.aggregated_text)

Example 4: Using Routing Rules

# Set up rules via API
import requests

requests.post("http://localhost:8000/api/routing/rules", json={
    "task_type": "coding",
    "service": "qwen"
})

# Now all coding tasks go to qwen automatically
async for chunk in orchestrator.execute_task("Write a Python function to sort a list"):
    print(chunk, end="")
# Routes to: qwen (custom rule)

🀝 Contributing

Contributions are welcome! Please:

  1. Fork the repository

  2. Create a feature branch (git checkout -b feature/amazing-feature)

  3. Make your changes

  4. Add tests if applicable

  5. Update documentation

  6. Commit your changes (git commit -m 'Add amazing feature')

  7. Push to the branch (git push origin feature/amazing-feature)

  8. Open a Pull Request

Development Setup

# Clone your fork
git clone https://github.com/yourusername/oxide.git
cd oxide

# Install dev dependencies
uv sync

# Install frontend dependencies
cd src/oxide/web/frontend
npm install
cd ../../..

# Run tests
python3 tests/test_process_cleanup.py
python3 tests/test_task_history_integration.py

# Start development servers
python -m uvicorn oxide.web.backend.main:app --reload &
cd src/oxide/web/frontend && npm run dev

πŸ“ License

MIT License - Copyright (c) 2025 yayoboy

See LICENSE file for details.

πŸ‘₯ Authors

  • yayoboy

πŸ™ Acknowledgments

  • Built with FastAPI - Modern Python web framework

  • React dashboard using Vite - Lightning-fast frontend tooling

  • MCP integration via Model Context Protocol

  • Process management inspired by supervisor and systemd patterns

  • WebSocket support via FastAPI WebSockets

  • Task classification inspired by semantic analysis techniques

πŸ“§ Support

For issues, questions, or suggestions, please open an issue on the GitHub repository.

πŸ—ΊοΈ Roadmap

v0.2.0 (Planned)

  • SQLite database for task storage

  • Advanced metrics and analytics

  • Cost tracking per service

  • Rate limiting and quotas

  • Multi-user support

  • Docker deployment

v0.3.0 (Future)

  • Plugin system for custom adapters

  • Workflow automation (task chains)

  • A/B testing framework

  • Performance benchmarking suite

  • Auto-scaling for parallel execution

πŸ“Š Project Status

Version: 0.1.0
Status: βœ… Production Ready - MVP Complete!

Completed Features

  • Project structure and dependencies

  • Configuration system

  • Task classifier

  • Task router with fallbacks

  • Adapter implementations (Gemini, Qwen, Ollama)

  • MCP server integration

  • Web UI dashboard (React + FastAPI)

  • Real-time monitoring and WebSocket

  • Task executor in Web UI

  • Task assignment manager (routing rules UI)

  • Persistent task storage

  • Process lifecycle management

  • Test suite (process cleanup, task storage)

  • Comprehensive documentation

In Progress

  • Production deployment guides

  • Docker containerization

  • Extended test coverage


Built with ❀️ for intelligent LLM orchestration

Last Updated: December 2025
