
🔬 Oxide - Intelligent LLM Orchestrator


Intelligent routing and orchestration for distributed AI resources

Oxide is a comprehensive platform for managing and orchestrating multiple Large Language Model (LLM) services. It intelligently routes tasks to the most appropriate LLM based on task characteristics, provides a web dashboard for monitoring and management, and integrates seamlessly with Claude Code via Model Context Protocol (MCP).

✨ Features

🎯 Intelligent Task Routing

  • Automatic Service Selection: Analyzes task type, complexity, and file count to choose the optimal LLM

  • Custom Routing Rules: Configure permanent task-to-service assignments via Web UI

  • Fallback Support: Automatic failover to alternative services if primary is unavailable

  • Parallel Execution: Distribute large codebase analysis across multiple LLMs

  • Manual Override: Select specific services for individual tasks

🚀 Local LLM Management (NEW!)

  • Auto-Start Ollama: Automatically starts Ollama if not running (macOS, Linux, Windows)

  • Auto-Detect Models: Discovers available models without manual configuration

  • Smart Model Selection: Chooses best model based on preferences and availability

  • Auto-Recovery: Retries with service restart on temporary failures

  • Zero-Config LM Studio: Works with LM Studio without model name configuration

🌐 Web Dashboard

  • Real-time Monitoring: Live metrics for CPU, memory, task execution, and service health

  • Task Executor: Execute tasks directly from the browser with service selection

  • Task Assignment Manager: Configure which LLM handles specific task types

  • Task History: Complete history of all executed tasks with results and metrics

  • WebSocket Support: Real-time updates for task progress and system events

  • Service Management: Monitor and test all configured LLM services

🔌 MCP Integration

  • Claude Code Integration: Use Oxide directly within Claude Code

  • Three MCP Tools:

    • route_task - Execute tasks with intelligent routing

    • analyze_parallel - Parallel codebase analysis

    • list_services - Check service health and availability

  • Persistent Task Storage: All tasks saved to ~/.oxide/tasks.json

  • Auto-start Web UI: Optional automatic Web UI launch with MCP server

🛡️ Process Management

  • Automatic Cleanup: All spawned processes (Web UI, Gemini, Qwen, etc.) cleaned up on exit

  • Signal Handlers: Graceful shutdown on SIGTERM/SIGINT

  • Process Registry: Tracks all child processes for guaranteed cleanup

  • No Orphaned Processes: Ensures clean system state even on forced termination

📊 Supported LLM Services

  • Google Gemini (CLI) - 2M+ token context window, ideal for large codebases

  • Qwen (CLI) - Optimized for code generation and review

  • Ollama (HTTP) - Local and remote instances

  • Extensible: Easy to add new LLM adapters

🚀 Quick Start

Prerequisites

  • Python 3.11+

  • uv package manager

  • Node.js 18+ (for Web UI)

  • Gemini CLI (optional)

  • Qwen CLI (optional)

  • Ollama (optional)

Installation

# Clone the repository and enter it
git clone https://github.com/yayoboy/oxide.git
cd oxide

# Install dependencies
uv sync

# Build the Web UI
cd src/oxide/web/frontend
npm install
npm run build
cd ../../..

# Verify installation
uv run oxide-mcp --help

Configuration

Edit config/default.yaml:

services:
  gemini:
    enabled: true
    type: cli
    executable: gemini

  qwen:
    enabled: true
    type: cli
    executable: qwen

  ollama_local:
    enabled: true
    type: http
    base_url: http://localhost:11434
    model: qwen2.5-coder:7b
    default_model: qwen2.5-coder:7b

  ollama_remote:
    enabled: false
    type: http
    base_url: http://192.168.1.46:11434
    model: qwen2.5-coder:7b

routing_rules:
  prefer_local: true
  fallback_enabled: true

execution:
  timeout_seconds: 120
  max_retries: 2
  retry_on_failure: true
  max_parallel_workers: 3

logging:
  level: INFO
  console: true
  file: oxide.log

📖 Usage

Option 1: MCP with Claude Code

  1. Configure Claude Code

Add to ~/.claude/settings.json:

{
  "mcpServers": {
    "oxide": {
      "command": "uv",
      "args": ["--directory", "/Users/yayoboy/Documents/GitHub/oxide", "run", "oxide-mcp"],
      "env": {
        "OXIDE_AUTO_START_WEB": "true"
      }
    }
  }
}

Setting OXIDE_AUTO_START_WEB=true automatically starts the Web UI at http://localhost:8000

  2. Use in Claude Code

Claude will automatically use Oxide MCP tools:

You: "Analyze this codebase for architecture patterns"
Claude: Uses Oxide to route to Gemini (large context)

You: "Review this function for bugs"
Claude: Uses Oxide to route to Qwen (code specialist)

You: "What is 2+2?"
Claude: Uses Oxide to route to Ollama Local (quick query)

Option 2: Web Dashboard

  1. Start the Web UI

# Option A: Use the startup script
./scripts/start_web_ui.sh

# Option B: Manual start
python -m uvicorn oxide.web.backend.main:app --host 0.0.0.0 --port 8000

# Option C: Auto-start with MCP (set OXIDE_AUTO_START_WEB=true)
uv run oxide-mcp

  2. Access the Dashboard

Open http://localhost:8000 in your browser

Option 3: Python API

from oxide.core.orchestrator import Orchestrator
from oxide.config.loader import load_config

# Initialize
config = load_config()
orchestrator = Orchestrator(config)

# Execute a task with intelligent routing
async for chunk in orchestrator.execute_task(
    prompt="Explain quantum computing",
    files=None,
    preferences=None  # Let Oxide choose
):
    print(chunk, end="")

# Execute with manual service selection
async for chunk in orchestrator.execute_task(
    prompt="Review this code",
    files=["src/main.py"],
    preferences={"preferred_service": "qwen"}
):
    print(chunk, end="")

πŸ—οΈ Architecture

System Overview

┌──────────────────────────────────────────────────────────────┐
│                    Oxide Orchestrator                        │
├──────────────────────────────────────────────────────────────┤
│                                                              │
│  ┌──────────────┐   ┌──────────────┐   ┌──────────────┐      │
│  │  Classifier  │──▶│    Router    │──▶│   Adapters   │      │
│  └──────────────┘   └──────────────┘   └──────────────┘      │
│         │                   │                  │             │
│    Task Analysis      Route Decision     LLM Execution       │
│                                                              │
│  ┌────────────────────────────────────────────────────────┐  │
│  │  Process Manager - Lifecycle Management                │  │
│  └────────────────────────────────────────────────────────┘  │
│                                                              │
│  ┌────────────────────────────────────────────────────────┐  │
│  │  Task Storage - Persistent History                     │  │
│  └────────────────────────────────────────────────────────┘  │
│                                                              │
│  ┌────────────────────────────────────────────────────────┐  │
│  │  Routing Rules - Custom Assignments                    │  │
│  └────────────────────────────────────────────────────────┘  │
│                                                              │
└──────────────────────────────────────────────────────────────┘
           │                  │                   │
           ▼                  ▼                   ▼
    ┌───────────┐      ┌───────────┐      ┌──────────┐
    │  MCP      │      │  Web UI   │      │ Python   │
    │  Server   │      │  Backend  │      │ API      │
    └───────────┘      └───────────┘      └──────────┘

Key Components

1. Task Classifier (src/oxide/core/classifier.py)

Analyzes tasks to determine:

  • Task type (coding, review, codebase_analysis, etc.)

  • Complexity score based on keywords and patterns

  • File count and total size

  • Whether parallel execution is beneficial

Task Types:

  • coding - Code generation

  • code_review - Code review

  • bug_search - Bug analysis

  • refactoring - Code refactoring

  • documentation - Writing docs

  • codebase_analysis - Large codebase analysis

  • quick_query - Simple questions

  • general - General purpose
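The classification step can be pictured as a simple keyword scorer. The sketch below is illustrative only; the keyword lists and thresholds are assumptions, and the real classifier in src/oxide/core/classifier.py may weigh additional signals such as file sizes and patterns:

```python
from typing import List, Optional

# Hypothetical keyword lists; the actual classifier's vocabulary may differ.
KEYWORDS = {
    "coding": ["implement", "write", "create a", "function", "feature"],
    "code_review": ["review", "critique", "feedback"],
    "bug_search": ["bug", "error", "crash", "broken"],
    "refactoring": ["refactor", "clean up", "restructure"],
    "documentation": ["document", "docstring", "readme"],
    "quick_query": ["what is", "how many", "define"],
}

def classify(prompt: str, files: Optional[List[str]] = None) -> str:
    """Return the task type whose keywords best match the prompt."""
    text = prompt.lower()
    files = files or []
    if len(files) > 10:  # many files suggest whole-codebase analysis
        return "codebase_analysis"
    scores = {task: sum(kw in text for kw in kws) for task, kws in KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "general"
```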

2. Task Router (src/oxide/core/router.py)

Routes tasks based on:

  • Task classification results

  • Custom routing rules (user-defined permanent assignments)

  • Service health status and availability

  • Fallback preferences and retry logic
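The routing priority described above (custom rules first, then classification-based preferences, then fallback) can be sketched as follows. The default preference table is an assumption for illustration; the real logic lives in src/oxide/core/router.py:

```python
from typing import Optional

# Assumed per-task-type preferences, for illustration only.
DEFAULTS = {
    "coding": ["qwen", "gemini"],
    "codebase_analysis": ["gemini"],
    "quick_query": ["ollama_local"],
}

def route(task_type: str, custom_rules: dict, healthy: set) -> Optional[str]:
    """Return the first healthy service, honoring custom rules first."""
    # 1. A user-defined rule overrides intelligent routing outright...
    rule = custom_rules.get(task_type)
    if rule and rule in healthy:
        return rule
    # 2. ...otherwise walk the recommended services for this task type...
    for service in DEFAULTS.get(task_type, []):
        if service in healthy:
            return service
    # 3. ...and finally fall back to any healthy service at all.
    return next(iter(sorted(healthy)), None)
```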

3. Adapters (src/oxide/adapters/)

Unified interface for different LLM types:

  • CLI Adapters (cli_adapter.py):

    • Gemini (gemini.py) - Subprocess execution, 2M+ context

    • Qwen (qwen.py) - Code specialist

    • Automatic process tracking and cleanup

  • HTTP Adapters (ollama_http.py):

    • Ollama Local/Remote - REST API communication

    • Streaming support

    • Health checks

All adapters implement:

  • execute() - Task execution with streaming

  • health_check() - Service availability check

  • get_service_info() - Service metadata

4. Task Storage (src/oxide/utils/task_storage.py)

Persistent task history management:

  • Storage: ~/.oxide/tasks.json

  • Thread-safe: Concurrent read/write support

  • Tracked data:

    • Task ID, status, timestamps

    • Prompt, files, preferences

    • Service used, task type

    • Result, error, duration

  • Features:

    • List/filter tasks by status

    • Get statistics (by service, by type, by status)

    • Clear tasks (all or by status)
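The statistics feature can be sketched as a pure aggregation over the tasks.json record shape shown later in this README (this is an illustrative stand-in, not the actual task_storage.py code):

```python
from collections import Counter

def task_stats(tasks: dict) -> dict:
    """Aggregate a tasks.json-style mapping by status, service, and type."""
    records = list(tasks.values())
    durations = [t["duration"] for t in records if t.get("duration")]
    return {
        "total": len(records),
        "by_status": dict(Counter(t["status"] for t in records)),
        "by_service": dict(Counter(t["service"] for t in records if t.get("service"))),
        "by_type": dict(Counter(t["task_type"] for t in records if t.get("task_type"))),
        "avg_duration": sum(durations) / len(durations) if durations else 0.0,
    }

sample = {
    "t1": {"status": "completed", "service": "qwen", "task_type": "coding", "duration": 4.0},
    "t2": {"status": "failed", "service": "gemini", "task_type": "code_review", "duration": 6.0},
}
stats = task_stats(sample)
```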

5. Process Manager (src/oxide/utils/process_manager.py)

Lifecycle management for all spawned processes:

  • Tracks: Web UI server, CLI processes (Gemini, Qwen)

  • Signal handlers: SIGTERM, SIGINT, SIGHUP

  • Cleanup: Automatic on exit (graceful → force kill)

  • Safety: Prevents orphaned processes

  • atexit hook: Final cleanup guarantee

6. Routing Rules Manager (src/oxide/utils/routing_rules.py)

User-defined task-to-service assignments:

  • Storage: ~/.oxide/routing_rules.json

  • Format: {"task_type": "service_name"}

  • Example:

    {
      "coding": "qwen",
      "code_review": "gemini",
      "bug_search": "qwen",
      "quick_query": "ollama_local"
    }
  • Priority: Custom rules override intelligent routing

🎨 Web UI Features

Dashboard Sections

1. System Metrics (Real-time)

  • Services: Total, enabled, healthy, unhealthy

  • Tasks: Running, completed, failed, queued

  • System: CPU %, Memory % and usage

  • WebSocket: Active connections

  • Auto-refresh every 2 seconds

2. Task Executor 🚀

Execute tasks directly from the browser:

  • Prompt input: Multi-line text area

  • Service selection:

    • 🤖 Auto (Intelligent Routing) - Let Oxide choose

    • Manual - Select specific service (gemini, qwen, ollama, etc.)

  • Real-time streaming: See results as they appear

  • Error handling: Clear error messages

  • Integration: Tasks appear immediately in history

3. LLM Services

Service cards showing:

  • Status: ✅ Healthy / ⚠️ Unavailable / ❌ Disabled

  • Type: CLI or HTTP

  • Description: Service capabilities

  • Details: Base URL (HTTP), executable (CLI)

  • Context: Max tokens (Gemini: 2M+)

4. Task Assignment Manager ⚙️ ⭐ NEW

Configure permanent task-to-service assignments:

Interface:

  • Add Rule Form:

    • Dropdown: Select task type (coding, review, etc.)

    • Dropdown: Select service (qwen, gemini, ollama)

    • Button: Add Rule

  • Active Rules Table:

    • Task Type | Assigned Service | Description | Actions

    • Delete individual rules

    • Clear all rules

Available Task Types:

  • coding → Code Generation → Recommended: qwen, gemini

  • code_review → Code Review → Recommended: qwen, gemini

  • bug_search → Bug Search → Recommended: qwen, gemini

  • refactoring → Code Refactoring → Recommended: qwen, gemini

  • documentation → Documentation → Recommended: gemini, qwen

  • codebase_analysis → Large Codebase → Recommended: gemini

  • quick_query → Simple Questions → Recommended: ollama_local

  • general → General Purpose → Recommended: ollama_local, qwen

Example Configuration:

coding → qwen           (All code generation to qwen)
code_review → gemini    (All reviews to gemini)
bug_search → qwen       (Bug analysis to qwen)
quick_query → ollama    (Fast queries to local ollama)

When a task matches a rule, it's always routed to the assigned service, bypassing intelligent routing.

5. Task History 📝

Complete history of all executed tasks:

  • From all sources: MCP, Web UI, Python API

  • Auto-refresh: Every 3 seconds

  • Display:

    • Status badge (completed, running, failed, queued)

    • Timestamp, duration

    • Prompt preview (first 150 chars)

    • Service used, task type

    • File count

    • Error messages (if failed)

    • Result preview (first 200 chars)

  • Limit: Latest 10 tasks by default

6. Live Updates 🔔

WebSocket event stream:

  • Real-time task progress

  • Service status changes

  • System events

📡 API Reference

REST API

Base URL: http://localhost:8000/api

Tasks Endpoints

Execute Task

POST /api/tasks/execute
Content-Type: application/json

{
  "prompt": "Your query here",
  "files": ["path/to/file.py"],
  "preferences": {
    "preferred_service": "qwen"
  }
}

Response: {"task_id": "...", "status": "queued", "message": "..."}
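A minimal Python client for this endpoint might look like the following. The payload shape is taken from the request above; the helper names are illustrative, and the call assumes an Oxide Web UI running locally:

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/api"   # assumes a locally running Oxide Web UI

def build_payload(prompt, files=None, preferred_service=None):
    """Assemble the body expected by POST /api/tasks/execute."""
    payload = {"prompt": prompt, "files": files or [], "preferences": {}}
    if preferred_service:
        payload["preferences"]["preferred_service"] = preferred_service
    return payload

def execute_task(prompt, **kwargs):
    """Submit a task; returns the queued-task response as a dict."""
    request = urllib.request.Request(
        f"{BASE_URL}/tasks/execute",
        data=json.dumps(build_payload(prompt, **kwargs)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)   # {"task_id": ..., "status": "queued", ...}
```

The returned task_id can then be polled via GET /api/tasks/{task_id}.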

List Tasks

GET /api/tasks/?limit=10&status=completed

Response: {
  "tasks": [...],
  "total": 42,
  "filtered": 10
}

Get Task

GET /api/tasks/{task_id}

Response: {
  "id": "...",
  "status": "completed",
  "prompt": "...",
  "result": "...",
  "duration": 5.23,
  ...
}

Delete Task

DELETE /api/tasks/{task_id}

Clear Tasks

POST /api/tasks/clear?status=completed

Services Endpoints

List Services

GET /api/services/

Response: {
  "services": {
    "gemini": {"enabled": true, "healthy": true, ...},
    ...
  },
  "total": 4,
  "enabled": 3
}

Get Service

GET /api/services/{service_name}

Health Check

POST /api/services/{service_name}/health

Test Service

POST /api/services/{service_name}/test?test_prompt=Hello

Routing Rules Endpoints ⭐ NEW

List All Rules

GET /api/routing/rules

Response: {
  "rules": [
    {"task_type": "coding", "service": "qwen"},
    ...
  ],
  "stats": {
    "total_rules": 3,
    "rules_by_service": {"qwen": 2, "gemini": 1},
    "task_types": ["coding", "code_review", "bug_search"]
  }
}

Get Rule

GET /api/routing/rules/{task_type}

Create/Update Rule

POST /api/routing/rules
Content-Type: application/json

{
  "task_type": "coding",
  "service": "qwen"
}

Response: {
  "message": "Routing rule updated",
  "rule": {"task_type": "coding", "service": "qwen"}
}

Update Rule

PUT /api/routing/rules/{task_type}
Content-Type: application/json

{
  "task_type": "coding",
  "service": "gemini"
}

Delete Rule

DELETE /api/routing/rules/{task_type}

Clear All Rules

POST /api/routing/rules/clear

Get Available Task Types

GET /api/routing/task-types

Response: {
  "task_types": [
    {
      "name": "coding",
      "label": "Code Generation",
      "description": "Writing new code, implementing features",
      "recommended_services": ["qwen", "gemini"]
    },
    ...
  ]
}

Monitoring Endpoints

Get Metrics

GET /api/monitoring/metrics

Response: {
  "services": {"total": 4, "enabled": 3, "healthy": 2, ...},
  "tasks": {"total": 10, "running": 0, "completed": 8, ...},
  "system": {"cpu_percent": 25.3, "memory_percent": 45.7, ...},
  "websocket": {"connections": 1},
  "timestamp": 1234567890.123
}

Get Stats

GET /api/monitoring/stats

Response: {
  "total_tasks": 42,
  "avg_duration": 5.67,
  "success_rate": 95.24,
  "tasks_by_status": {"completed": 40, "failed": 2}
}

Health Check

GET /api/monitoring/health

Response: {
  "status": "healthy",
  "healthy": true,
  "issues": [],
  "cpu_percent": 25.3,
  "memory_percent": 45.7
}

WebSocket API

Connect to ws://localhost:8000/ws for real-time updates.

Message Types:

  1. task_start

{
  "type": "task_start",
  "task_id": "...",
  "task_type": "coding",
  "service": "qwen"
}

  2. task_progress (streaming)

{
  "type": "task_progress",
  "task_id": "...",
  "chunk": "Here is the code..."
}

  3. task_complete

{
  "type": "task_complete",
  "task_id": "...",
  "success": true,
  "duration": 5.23
}

Client Usage:

const ws = new WebSocket('ws://localhost:8000/ws');

ws.onmessage = (event) => {
  const data = JSON.parse(event.data);
  if (data.type === 'task_progress') {
    console.log(data.chunk);
  }
};

// Keep-alive ping
setInterval(() => ws.send('ping'), 30000);

🔧 Development

Project Structure

oxide/
├── config/
│   └── default.yaml                # Main configuration
├── src/oxide/
│   ├── core/
│   │   ├── classifier.py           # Task classification
│   │   ├── router.py               # Routing logic
│   │   └── orchestrator.py         # Main orchestrator
│   ├── adapters/
│   │   ├── base.py                 # Base adapter interface
│   │   ├── cli_adapter.py          # CLI adapter base
│   │   ├── gemini.py               # Gemini adapter
│   │   ├── qwen.py                 # Qwen adapter
│   │   └── ollama_http.py          # Ollama HTTP adapter
│   ├── execution/
│   │   └── parallel.py             # Parallel execution engine
│   ├── utils/
│   │   ├── task_storage.py         # Task persistence
│   │   ├── routing_rules.py        # Routing rules storage
│   │   ├── process_manager.py      # Process lifecycle
│   │   ├── logging.py              # Logging utilities
│   │   └── exceptions.py           # Custom exceptions
│   ├── mcp/
│   │   ├── server.py               # MCP server (FastMCP)
│   │   └── tools.py                # MCP tool definitions
│   └── web/
│       ├── backend/
│       │   ├── main.py             # FastAPI application
│       │   ├── websocket.py        # WebSocket manager
│       │   └── routes/
│       │       ├── tasks.py        # Task endpoints
│       │       ├── services.py     # Service endpoints
│       │       ├── routing.py      # Routing rules endpoints
│       │       └── monitoring.py   # Monitoring endpoints
│       └── frontend/               # React SPA
│           ├── src/
│           │   ├── components/
│           │   │   ├── TaskExecutor.jsx
│           │   │   ├── TaskAssignmentManager.jsx
│           │   │   ├── TaskHistory.jsx
│           │   │   ├── ServiceCard.jsx
│           │   │   └── MetricsDashboard.jsx
│           │   ├── hooks/
│           │   │   ├── useServices.js
│           │   │   ├── useMetrics.js
│           │   │   └── useWebSocket.js
│           │   ├── api/
│           │   │   └── client.js
│           │   └── App.jsx
│           ├── package.json
│           └── vite.config.js
├── tests/
│   ├── test_process_cleanup.py
│   └── test_task_history_integration.py
└── scripts/
    └── start_web_ui.sh

Running Tests

# Process cleanup tests
python3 tests/test_process_cleanup.py

# Task history integration tests
python3 tests/test_task_history_integration.py

# All tests pass
# ✓ Sync process cleanup
# ✓ Async process cleanup
# ✓ Multiple process cleanup
# ✓ Signal handler cleanup
# ✓ Task storage integration

Adding a New LLM Adapter

  1. Create adapter class

# src/oxide/adapters/my_llm.py
from .base import BaseAdapter
from typing import AsyncIterator, List, Optional

class MyLLMAdapter(BaseAdapter):
    def __init__(self, config: dict):
        super().__init__("my_llm", config)
        self.api_key = config.get("api_key")
        # Initialize your client...

    async def execute(
        self,
        prompt: str,
        files: Optional[List[str]] = None,
        timeout: Optional[int] = None,
        **kwargs
    ) -> AsyncIterator[str]:
        """Execute task and stream results."""
        # Your implementation
        yield "Response chunk"

    async def health_check(self) -> bool:
        """Check if service is available."""
        # Your health check logic
        return True

    def get_service_info(self) -> dict:
        """Return service metadata."""
        info = super().get_service_info()
        info.update({
            "description": "My LLM Service",
            "max_tokens": 100000
        })
        return info

  2. Register in configuration

# config/default.yaml
services:
  my_llm:
    enabled: true
    type: http  # or 'cli'
    base_url: http://localhost:8080
    model: my-model
    api_key: ${MY_LLM_API_KEY}  # From environment

  3. Update orchestrator

# src/oxide/core/orchestrator.py
def _create_adapter(self, service_name, config):
    service_type = config.get("type")

    if service_type == "cli":
        if "my_llm" in service_name:
            from ..adapters.my_llm import MyLLMAdapter
            return MyLLMAdapter(config)
        # ... other CLI adapters

    elif service_type == "http":
        if "my_llm" in service_name:
            from ..adapters.my_llm import MyLLMAdapter
            return MyLLMAdapter(config)
        # ... other HTTP adapters

  4. Test your adapter

import asyncio
from oxide.core.orchestrator import Orchestrator
from oxide.config.loader import load_config

async def test():
    config = load_config()
    orchestrator = Orchestrator(config)

    async for chunk in orchestrator.execute_task(
        prompt="Test query",
        preferences={"preferred_service": "my_llm"}
    ):
        print(chunk, end="")

asyncio.run(test())

📊 Storage Files

Oxide creates the following files in ~/.oxide/:

  • tasks.json - Task execution history (all tasks from all sources)

  • routing_rules.json - Custom routing rules (task type → service)

  • oxide.log - Application logs (if file logging enabled)

Example tasks.json:

{
  "task-uuid-1": {
    "id": "task-uuid-1",
    "status": "completed",
    "prompt": "What is quantum computing?",
    "files": [],
    "service": "ollama_local",
    "task_type": "quick_query",
    "result": "Quantum computing is...",
    "error": null,
    "created_at": 1234567890.123,
    "started_at": 1234567890.456,
    "completed_at": 1234567895.789,
    "duration": 5.333
  }
}

Example routing_rules.json:

{
  "coding": "qwen",
  "code_review": "gemini",
  "bug_search": "qwen",
  "quick_query": "ollama_local"
}

🎯 Local LLM Management

Auto-Start Ollama

Oxide can automatically start Ollama if it's not running:

# config/default.yaml
services:
  ollama_local:
    type: http
    base_url: "http://localhost:11434"
    api_type: ollama
    enabled: true
    auto_start: true              # 🔥 Auto-start if not running
    auto_detect_model: true       # 🔥 Auto-detect best model
    max_retries: 2                # Retry on failures
    retry_delay: 2                # Seconds between retries

What happens:

  1. First task execution checks if Ollama is running

  2. If not, automatically starts Ollama via:

    • macOS: Opens Ollama.app or runs ollama serve

    • Linux: Uses systemd or runs ollama serve

    • Windows: Runs ollama serve as detached process

  3. Waits up to 30s for Ollama to be ready

  4. Proceeds with task execution
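The readiness wait in step 3 amounts to polling the service until a probe succeeds. A sketch with an injectable probe (the probe itself would typically be a GET against Ollama's /api/tags; the function name here is illustrative):

```python
import time
from typing import Callable

def wait_until_ready(probe: Callable[[], bool], timeout: float = 30.0,
                     interval: float = 0.5) -> bool:
    """Poll `probe` until it returns True or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if probe():          # e.g. an HTTP GET against /api/tags succeeding
            return True
        time.sleep(interval)
    return False
```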

Auto-Detect Models

No need to configure model names manually:

lmstudio:
  type: http
  base_url: "http://192.168.1.33:1234/v1"
  api_type: openai_compatible
  enabled: true
  default_model: null           # 🔥 Will auto-detect
  auto_detect_model: true
  preferred_models:             # Priority order
    - "qwen"                    # Matches: qwen/qwen2.5-coder-14b
    - "coder"                   # Matches: mistralai/codestral-22b
    - "deepseek"                # Matches: deepseek/deepseek-r1

Smart Selection Algorithm:

  1. Fetches available models from service

  2. Tries exact match with preferred models

  3. Tries partial match (e.g., "qwen" matches "qwen2.5-coder:7b")

  4. Falls back to first available model
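The four selection steps above can be sketched as a small pure function (illustrative only; the actual implementation may normalize names differently):

```python
from typing import List, Optional

def pick_model(available: List[str], preferred: List[str]) -> Optional[str]:
    """Apply the steps above: exact match, then partial match, then first available."""
    if not available:                        # 1. nothing fetched from the service
        return None
    for name in preferred:                   # 2. exact match
        if name in available:
            return name
    for name in preferred:                   # 3. partial match ("qwen" -> "qwen2.5-coder:7b")
        for model in available:
            if name in model:
                return model
    return available[0]                      # 4. fallback to first available
```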

Service Health Monitoring

from oxide.utils.service_manager import get_service_manager

service_manager = get_service_manager()

# Comprehensive health check with auto-recovery
health = await service_manager.ensure_service_healthy(
    service_name="ollama_local",
    base_url="http://localhost:11434",
    api_type="ollama",
    auto_start=True,           # Try to start if down
    auto_detect_model=True     # Detect available models
)

print(f"Healthy: {health['healthy']}")
print(f"Models: {health['models']}")
print(f"Recommended: {health['recommended_model']}")

Background Health Monitoring

# Start monitoring (checks every 60s, auto-recovers on failure)
await service_manager.start_health_monitoring(
    service_name="ollama_local",
    base_url="http://localhost:11434",
    interval=60,
    auto_recovery=True
)

🎯 Usage Examples

Example 1: Simple Query (Auto-Start Enabled)

# Ollama will auto-start if not running!
async for chunk in orchestrator.execute_task("What is 2+2?"):
    print(chunk, end="")

# What happens:
# 1. Checks if Ollama is running → not running
# 2. Auto-starts Ollama (takes ~5s)
# 3. Auto-detects model: qwen2.5-coder:7b
# 4. Executes task
# 5. Returns: "4"

Example 2: Code Review with Manual Selection

async for chunk in orchestrator.execute_task(
    prompt="Review this code for bugs",
    files=["src/auth.py"],
    preferences={"preferred_service": "gemini"}
):
    print(chunk, end="")

# Forces routing to: gemini
# Gets large context window for thorough review

Example 3: Large Codebase Analysis

# Parallel analysis
from oxide.execution.parallel import ParallelExecutor

executor = ParallelExecutor(max_workers=3)

result = await executor.execute_parallel(
    prompt="Analyze architecture patterns",
    files=["src/**/*.py"],  # 50+ files
    services=["gemini", "qwen", "ollama_local"],
    strategy="split"
)

print(f"Completed in {result.total_duration_seconds}s")
print(result.aggregated_text)

Example 4: Using Routing Rules

# Set up rules via API
import requests

requests.post("http://localhost:8000/api/routing/rules", json={
    "task_type": "coding",
    "service": "qwen"
})

# Now all coding tasks go to qwen automatically
async for chunk in orchestrator.execute_task("Write a Python function to sort a list"):
    print(chunk, end="")

# Routes to: qwen (custom rule)

🤝 Contributing

Contributions are welcome! Please:

  1. Fork the repository

  2. Create a feature branch (git checkout -b feature/amazing-feature)

  3. Make your changes

  4. Add tests if applicable

  5. Update documentation

  6. Commit your changes (git commit -m 'Add amazing feature')

  7. Push to the branch (git push origin feature/amazing-feature)

  8. Open a Pull Request

Development Setup

# Clone your fork
git clone https://github.com/yourusername/oxide.git
cd oxide

# Install dev dependencies
uv sync

# Install frontend dependencies
cd src/oxide/web/frontend
npm install
cd ../../..

# Run tests
python3 tests/test_process_cleanup.py
python3 tests/test_task_history_integration.py

# Start development servers
python -m uvicorn oxide.web.backend.main:app --reload &
cd src/oxide/web/frontend && npm run dev

πŸ“ License

MIT License - Copyright (c) 2025 yayoboy

See LICENSE file for details.

👥 Authors

πŸ™ Acknowledgments

  • Built with FastAPI - Modern Python web framework

  • React dashboard using Vite - Lightning-fast frontend tooling

  • MCP integration via Model Context Protocol

  • Process management inspired by supervisor and systemd patterns

  • WebSocket support via FastAPI WebSockets

  • Task classification inspired by semantic analysis techniques

📧 Support

For issues, questions, or suggestions:

🗺️ Roadmap

v0.2.0 (Planned)

  • SQLite database for task storage

  • Advanced metrics and analytics

  • Cost tracking per service

  • Rate limiting and quotas

  • Multi-user support

  • Docker deployment

v0.3.0 (Future)

  • Plugin system for custom adapters

  • Workflow automation (task chains)

  • A/B testing framework

  • Performance benchmarking suite

  • Auto-scaling for parallel execution

📊 Project Status

Version: 0.1.0 · Status: ✅ Production Ready - MVP Complete!

Completed Features

  • Project structure and dependencies

  • Configuration system

  • Task classifier

  • Task router with fallbacks

  • Adapter implementations (Gemini, Qwen, Ollama)

  • MCP server integration

  • Web UI dashboard (React + FastAPI)

  • Real-time monitoring and WebSocket

  • Task executor in Web UI

  • Task assignment manager (routing rules UI)

  • Persistent task storage

  • Process lifecycle management

  • Test suite (process cleanup, task storage)

  • Comprehensive documentation

In Progress

  • Production deployment guides

  • Docker containerization

  • Extended test coverage


Built with ❤️ for intelligent LLM orchestration

Last Updated: December 2025
