# ADR-006: Agent-Aware Model Routing
## Status
Proposed
## Context
Recent HuggingFace results show that **agent-trained models** significantly outperform generic instruction-tuned models on tool-use tasks:
| Model | Training | Terminal-Bench | SWE-Bench |
|-------|----------|----------------|-----------|
| Qwen3-8B (base) | Instruction | 0.0 | 0.7% |
| OpenThinker-Agent-v1 | Execution traces | **4.9** | **15.7%** |
| DeepSWE-32B | RL on SWE tasks | - | **42.2%** |
| DeepCoder-14B | RL on code traces | - | 60.6% LiveCodeBench |
**Key insight**: Models trained on execution traces (terminal commands, tool outputs, diffs) have dramatically better tool compliance and task completion rates than generic instruction-tuned models.
Delia's current architecture routes agentic tasks to the `coder` tier, which uses generic code models. This is suboptimal: code models are tuned to write code, not to drive tool-calling loops, so agent workflows inherit their weak tool compliance.
## Decision
Add two new model tiers optimized for agent workflows:
### 1. `agentic` Tier
- **Purpose**: Tool-calling loops, terminal operations, file manipulation
- **Ideal Models**: OpenThinker-Agent-v1, agent-fine-tuned variants
- **Trigger**: `OrchestrationMode.AGENTIC` detection
- **Key Property**: High tool compliance, deterministic outputs
### 2. `swe` Tier
- **Purpose**: Multi-file refactoring, repo-scale reasoning, architecture changes
- **Ideal Models**: DeepSWE-Preview, SWE-bench optimized models
- **Trigger**: Complex codebase operations detected
- **Key Property**: Long-context reasoning, diff generation
## Architecture Changes
### 1. Config Layer (`config.py`)
```python
# New tier definitions (after line 181)
model_agentic: ModelConfig = field(
    default_factory=lambda: ModelConfig(
        name="agentic",
        default_model="auto",
        vram_gb=float(os.getenv("DELIA_MODEL_AGENTIC_VRAM", "-1")),
        context_tokens=-1,
        num_ctx=-1,
        max_input_kb=int(os.getenv("DELIA_MODEL_AGENTIC_INPUT_KB", "64")),
    )
)
model_swe: ModelConfig = field(
    default_factory=lambda: ModelConfig(
        name="swe",
        default_model="auto",
        vram_gb=float(os.getenv("DELIA_MODEL_SWE_VRAM", "-1")),
        context_tokens=-1,
        num_ctx=-1,
        max_input_kb=int(os.getenv("DELIA_MODEL_SWE_INPUT_KB", "200")),  # large for repo context
    )
)

# New task sets (after line 205)
agentic_tasks: frozenset[str] = field(
    default_factory=lambda: frozenset({"agent", "tool", "execute", "terminal"})
)
swe_tasks: frozenset[str] = field(
    default_factory=lambda: frozenset({"refactor", "migrate", "architect", "redesign"})
)
```
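Both tiers ship with `auto`/`-1` sentinels so auto-detection can fill them in, while the `DELIA_MODEL_*` environment variables allow manual overrides. A minimal sketch of the override path, assuming the enclosing dataclass is named `DeliaConfig` (hypothetical name):
```python
import os

# Hypothetical override: cap the agentic tier's VRAM budget before config load.
os.environ["DELIA_MODEL_AGENTIC_VRAM"] = "12"

cfg = DeliaConfig()  # assumed dataclass containing the fields defined above
assert cfg.model_agentic.vram_gb == 12.0
assert cfg.model_swe.max_input_kb == 200  # env default when no override is set
```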
### 2. Model Detection (`model_detection.py`)
```python
# Updated TIER_KEYWORDS (line 48)
TIER_KEYWORDS = {
    "dispatcher": ["functiongemma"],
    "thinking": ["think", "reason", "r1", "o1", "deepseek-r"],
    "agentic": ["agent", "openthinker", "tool-use", "function-call"],  # NEW
    "swe": ["swe", "deepswe", "software-engineer"],  # NEW
    "coder": ["code", "coder", "codestral", "starcoder", "qwen2.5-coder", "deepcoder"],
    "moe": ["30b", "32b", "70b", "72b", "moe", "mixtral", "qwen3:30"],
    "quick": ["7b", "8b", "3b", "4b", "1b", "small", "mini", "tiny", "14b"],
}

# Updated assign_models_to_tiers (line 127)
tiers: dict[str, list[str]] = {
    "quick": [],
    "coder": [],
    "moe": [],
    "thinking": [],
    "dispatcher": [],
    "agentic": [],  # NEW
    "swe": [],  # NEW
}
```
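For illustration, keyword matching against these tiers might look like the sketch below. This is a simplified stand-in for the real `assign_models_to_tiers`; first-match-wins priority via dict insertion order is an assumption:
```python
def detect_tier(model_name: str) -> str:
    """Simplified sketch: return the first tier whose keyword appears in the name."""
    name = model_name.lower()
    for tier, keywords in TIER_KEYWORDS.items():  # assumes dict order encodes priority
        if any(keyword in name for keyword in keywords):
            return tier
    return "quick"  # default tier when nothing matches

assert detect_tier("deepswe-preview:32b") == "swe"  # "swe" wins before "32b" -> moe
assert detect_tier("qwen2.5-coder:14b") == "coder"  # "coder" wins before "14b" -> quick
```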
### 3. Intent Detection (`intent.py`)
```python
# New SWE patterns (after line 87)
SWE_PATTERNS: ClassVar[list[IntentPattern]] = [
    IntentPattern(
        re.compile(
            r"\b(refactor|redesign|migrate|overhaul|rewrite)\s+(the\s+)?"
            r"(entire|whole|full|complete)?\s*(codebase|project|system|repo)\b",
            re.I,
        ),
        orchestration_mode=OrchestrationMode.AGENTIC,
        task_type="swe",
        confidence_boost=0.6,
        reasoning="repo-scale operation detected",
    ),
    IntentPattern(
        re.compile(r"\b(multi.?file|across files|all files|every file|codebase.?wide)\b", re.I),
        task_type="swe",
        confidence_boost=0.5,
        reasoning="multi-file operation",
    ),
    IntentPattern(
        re.compile(r"\b(architecture|system design|component diagram|module structure)\b", re.I),
        task_type="swe",
        confidence_boost=0.45,
        reasoning="architectural task",
    ),
]

# Enhanced AGENTIC_PATTERNS: append this agent-specific trigger to the existing list
IntentPattern(
    re.compile(r"\b(use tools?|call tools?|with tools?|tool.?use|function.?call)\b", re.I),
    orchestration_mode=OrchestrationMode.AGENTIC,
    task_type="agentic",
    confidence_boost=0.55,
    reasoning="explicit tool use requested",
),
```
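As a sanity check, the first SWE regex should fire on repo-scale phrasing but not on a single-function request:
```python
import re

swe_re = re.compile(
    r"\b(refactor|redesign|migrate|overhaul|rewrite)\s+(the\s+)?"
    r"(entire|whole|full|complete)?\s*(codebase|project|system|repo)\b",
    re.I,
)

assert swe_re.search("Please refactor the entire codebase to use async IO")
assert swe_re.search("migrate the project to Python 3.12")
assert not swe_re.search("refactor this helper function")
```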
### 4. Routing Logic (`routing.py`)
```python
# New task mappings (after line 322)
_AGENTIC_TASKS = frozenset({"agent", "tool", "execute", "terminal", "agentic"})
_SWE_TASKS = frozenset({"refactor", "migrate", "architect", "redesign", "swe"})

def _task_to_tier(task_type: str) -> str:
    """Map task type to model tier for economic lookups."""
    if task_type in _MOE_TASKS:
        return "moe"
    if task_type in _AGENTIC_TASKS:
        return "agentic"  # NEW
    if task_type in _SWE_TASKS:
        return "swe"  # NEW
    if task_type in _CODER_TASKS:
        return "coder"
    return "quick"

# In ModelRouter.select_model() - add after line 803:
# Priority 2.6: Agentic tasks use agent-trained models
if task_type in self.config.agentic_tasks or task_type == "agentic":
    model_agentic = get_model("agentic")
    if model_agentic != "current":
        log.info("model_selected", source="agentic_task", task=task_type, tier="agentic")
        return model_agentic
    # Fall back to coder if no agentic model is configured
    log.info("model_selected", source="agentic_fallback", task=task_type, tier="coder")
    return model_coder

# Priority 2.7: SWE tasks use SWE-optimized models
if task_type in self.config.swe_tasks or task_type == "swe":
    model_swe = get_model("swe")
    if model_swe != "current":
        log.info("model_selected", source="swe_task", task=task_type, tier="swe")
        return model_swe
    # Fall back to moe for complex reasoning
    log.info("model_selected", source="swe_fallback", task=task_type, tier="moe")
    return model_moe
```
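Under these mappings, tier resolution behaves as follows (assuming the example task types do not overlap the existing `_MOE_TASKS`/`_CODER_TASKS` sets):
```python
assert _task_to_tier("terminal") == "agentic"
assert _task_to_tier("refactor") == "swe"
assert _task_to_tier("summarize") == "quick"  # unmapped task types fall through to quick
```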
### 5. Orchestration Executor (`executor.py`)
```python
# In _execute_agentic(), modify line 174:
# OLD: selected_model = model_override or await select_model(task_type="review", ...)
# NEW: route tool-style intents to the agentic tier; other task types
# (including "swe") already carry the right tier name, so pass them through.
task_tier = "agentic" if intent.task_type in ("agentic", "agent", "tool") else intent.task_type
selected_model = model_override or await select_model(
    task_type=task_tier,
    content_size=len(message),
    content=message,
)
```
### 6. Settings Schema (`settings.json.example`)
```json
{
  "backends": [
    {
      "id": "ollama-local",
      "models": {
        "quick": "qwen3:8b",
        "coder": ["deepcoder:14b", "qwen2.5-coder:14b"],
        "moe": ["qwen3:32b", "openthinker:32b"],
        "thinking": "openthinker:7b",
        "agentic": "openthinker:7b",
        "swe": "deepswe-preview:32b",
        "dispatcher": "functiongemma:270m"
      }
    }
  ]
}
```
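Note that tier values accept either a single model name or a list of candidates, as the `coder` and `moe` entries show; presumably a list is an ordered preference, with the first available model used.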
## Routing Flow Diagram
```
               User Request
                     │
                     ▼
┌─────────────────────────────────────────┐
│ Intent Detection                        │
│ (regex → semantic → LLM classifier)     │
└─────────────────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────┐
│ Orchestration Mode Selection            │
│                                         │
│ AGENTIC detected?                       │
│  ├─ SWE patterns?  → task_type="swe"    │
│  └─ Tool patterns? → task_type="agentic"│
│                                         │
│ Other modes: VOTING, DEEP_THINKING...   │
└─────────────────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────┐
│ Model Selection                         │
│                                         │
│ task_type == "agentic"                  │
│   → Use agentic tier (OpenThinker)      │
│                                         │
│ task_type == "swe"                      │
│   → Use swe tier (DeepSWE)              │
│                                         │
│ Fallbacks:                              │
│   agentic → coder → quick               │
│   swe → moe → coder                     │
└─────────────────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────┐
│ Agent Loop Execution                    │
│                                         │
│ SWE tier gets:                          │
│   - allow_write=True                    │
│   - allow_exec=True (with gate)         │
│   - Larger max_iterations (15)          │
│                                         │
│ Agentic tier gets:                      │
│   - Standard tool access                │
│   - Native tool calling if supported    │
└─────────────────────────────────────────┘
```
## Fallback Strategy
When specialized tiers aren't configured:
| Missing Tier | Fallback Chain |
|--------------|----------------|
| `agentic` | `coder` → `quick` |
| `swe` | `moe` → `coder` |
| `thinking` | `moe` → `coder` |
| `coder` | `quick` |
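These chains can be expressed as a single ordered lookup. A minimal sketch, assuming `get_model(tier)` returns the sentinel `"current"` for unconfigured tiers (as in the routing snippet above):
```python
FALLBACK_CHAINS: dict[str, list[str]] = {
    "agentic": ["agentic", "coder", "quick"],
    "swe": ["swe", "moe", "coder"],
    "thinking": ["thinking", "moe", "coder"],
    "coder": ["coder", "quick"],
}

def resolve_model(tier: str) -> str:
    """Return the first configured model along the tier's fallback chain."""
    for candidate in FALLBACK_CHAINS.get(tier, [tier]):
        model = get_model(candidate)
        if model != "current":  # "current" marks an unconfigured tier
            return model
    return get_model("quick")  # last resort: the always-present quick tier
```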
## Quality Tracking
The existing melon/affinity system automatically learns which models excel at agentic tasks:
```python
# In BackendScorer.score() - affinity boost applies per task_type
affinity = tracker.get_affinity(backend.id, "agentic") # Tracks agentic performance
boost_multiplier *= (1.0 + (affinity - 0.5) * 0.4) # ±20% boost
```
Over time, models that succeed at tool-use tasks accumulate higher affinity scores for agentic routing.
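Concretely, plugging numbers into the formula above:
```python
affinity = 0.8                         # learned from successful agentic-task outcomes
boost = 1.0 + (affinity - 0.5) * 0.4   # 1.12 -> +12% score boost
floor = 1.0 + (0.0 - 0.5) * 0.4        # 0.80 -> -20% at the lowest affinity
```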
## Consequences
### Positive
- Agent loops use purpose-trained models with far better tool compliance (e.g., Terminal-Bench 0.0 → 4.9, SWE-Bench 0.7% → 15.7% in the table above)
- SWE tasks get repo-scale reasoning capability
- Existing fallback system prevents failures if tiers not configured
- Quality system auto-learns optimal routing
### Negative
- Two additional models to manage/pull
- Slight increase in config complexity
- Requires users to understand tier purpose
### Neutral
- Backward compatible - existing configs work unchanged
- No breaking changes to MCP tool interface
## Implementation Order
1. `model_detection.py` - Add tier keywords
2. `config.py` - Add tier definitions and task sets
3. `routing.py` - Add tier routing logic
4. `intent.py` - Add SWE patterns
5. `executor.py` - Wire up tier selection in agentic mode
6. `settings.json.example` - Update example config
7. Pull models: `ollama pull deepcoder:14b && ollama pull openthinker:7b` (`ollama pull` takes one model per invocation)
## References
- [DeepCoder-14B](https://huggingface.co/agentica-org/DeepCoder-14B-Preview) - 60.6% LiveCodeBench
- [OpenThinker-Agent-v1](https://huggingface.co/open-thoughts/OpenThinker-Agent-v1) - SOTA 8B agent
- [DeepSWE-Preview](https://huggingface.co/agentica-org/DeepSWE-Preview) - 42.2% SWE-Bench