# LLM Provider Configuration Guide

**Version:** 1.0.0
**Last Updated:** 2025-11-10

This guide explains how to configure which LLM models are used by Mimir's multi-agent orchestration system.

---

## Table of Contents

1. [Overview](#overview)
2. [Default Configuration (GPT-4.1)](#default-configuration-gpt-41)
3. [Switching to Premium Models](#switching-to-premium-models)
4. [Using Local Ollama](#using-local-ollama)
5. [Per-Agent Model Configuration](#per-agent-model-configuration)
6. [Configuration Methods](#configuration-methods)
7. [Troubleshooting](#troubleshooting)

---

## Overview

Mimir's orchestration pipeline uses different LLM models for different agent roles:

| Agent Role | Default Model | Purpose |
|------------|---------------|---------|
| **Ecko** (Prompt Architect) | gpt-4.1 | Optimize user prompts |
| **PM** (Project Manager) | gpt-4.1 | Research & planning |
| **Worker** (Task Executor) | gpt-4.1 | Execute individual tasks |
| **QC** (Quality Control) | gpt-4.1 | Verify worker output |

**Why GPT-4.1 by default?**

- ✅ Avoids premium request usage (if you have Copilot Pro)
- ✅ Fast response times
- ✅ Good quality for most tasks
- ✅ Lower cost

---

## Default Configuration (GPT-4.1)

### What You Get Out of the Box

When you run `docker compose up`, Mimir uses **GPT-4.1** for all agents via the GitHub Copilot API. **No configuration needed!** This is the recommended setup for most users.

### Available Models via Copilot API

Check which models are available:

```bash
# View available models
curl http://localhost:4141/v1/models | jq '.data[].id'
```

Typical output:

```
gpt-4.1
gpt-4o
gpt-4o-mini
o1-preview
o1-mini
claude-3.5-sonnet
```

> 💡 **Note**: Available models depend on your GitHub Copilot subscription (Individual, Business, or Enterprise).
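To sanity-check a model before assigning it to an agent, you can call the proxy directly. A minimal sketch, assuming the copilot-api proxy exposes the standard OpenAI-compatible `/v1/chat/completions` route alongside the `/v1/models` route shown above:

```python
# Send one test message to a candidate model via the copilot-api proxy.
# Assumes the proxy is OpenAI-compatible, consistent with /v1/models above.
import requests

resp = requests.post(
    "http://localhost:4141/v1/chat/completions",
    json={
        "model": "gpt-4.1",  # swap in any model id returned by /v1/models
        "messages": [{"role": "user", "content": "Reply with the single word: OK"}],
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```

If this returns an error for a given model id, the orchestrator's agents will hit the same error.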
---

## Switching to Premium Models

### Option 1: Via Open-WebUI (Easiest)

1. Open Open-WebUI: http://localhost:3000
2. Click the **Settings** icon (⚙️) in the top right
3. Go to **Admin Panel** → **Settings** → **Pipelines**
4. Find **Mimir Multi-Agent Orchestrator**
5. Click **Edit** (pencil icon)
6. Modify the **Valves** (configuration), e.g. to move every agent to `gpt-4o`:

   ```json
   {
     "PM_MODEL": "gpt-4o",
     "WORKER_MODEL": "gpt-4o",
     "QC_MODEL": "gpt-4o"
   }
   ```

7. Click **Save**
8. Test with a new chat

### Option 2: Edit Python Pipeline Directly

Edit `pipelines/mimir_orchestrator.py`:

```python
# pipelines/mimir_orchestrator.py
PM_MODEL: str = Field(
    default="gpt-4o",  # Changed from gpt-4.1
    description="Model to use for PM agent (planning)."
)
WORKER_MODEL: str = Field(
    default="gpt-4o",  # Changed from gpt-4.1
    description="Model to use for worker agents (task execution)."
)
QC_MODEL: str = Field(
    default="gpt-4o",  # Changed from gpt-4.1
    description="Model to use for QC agents (verification)."
)
```

Then restart (or rebuild):

```bash
# Restart the Open-WebUI container with the updated pipeline
docker compose restart open-webui

# Or rebuild from scratch
docker compose down
docker compose up -d --build open-webui
```

### Premium Model Options

| Model | Speed | Quality | Cost | Best For |
|-------|-------|---------|------|----------|
| **gpt-4.1** | ⚡⚡⚡ Fast | ✅ Good | 💰 Low | General tasks, default |
| **gpt-4o** | ⚡⚡ Medium | ✅✅ Better | 💰💰 Medium | Complex reasoning |
| **gpt-4o-mini** | ⚡⚡⚡ Fast | ✅ Good | 💰 Low | Simple tasks |
| **o1-preview** | ⚡ Slow | ✅✅✅ Best | 💰💰💰 High | Hard problems, deep reasoning |
| **o1-mini** | ⚡⚡ Medium | ✅✅ Better | 💰💰 Medium | Moderate reasoning |
| **claude-3.5-sonnet** | ⚡⚡ Medium | ✅✅ Better | 💰💰 Medium | Code generation |

> ⚠️ **Warning**: Premium models (gpt-4o, o1-*) count against your Copilot Pro usage limits. Use sparingly or upgrade your plan.

---

## Using Local Ollama

### Why Use Ollama?

- ✅ Fully offline (no internet required)
- ✅ No usage limits
- ✅ Free (after hardware investment)
- ⚠️ Requires a GPU for good performance
- ⚠️ Lower quality than GPT-4

### Step 1: Enable Ollama Service

Uncomment the Ollama service in `docker-compose.yml`:

```yaml
# docker-compose.yml
ollama:
  build:
    context: ./docker/ollama
    dockerfile: Dockerfile
    args:
      - EMBEDDING_MODEL=${MIMIR_EMBEDDINGS_MODEL:-mxbai-embed-large}
    tags:
      - mimir-ollama:${VERSION:-1.0.0}
      - mimir-ollama:latest
  image: mimir-ollama:${VERSION:-1.0.0}
  container_name: ollama_server
  ports:
    - "11434:11434"  # Ollama API
  volumes:
    - ./data/ollama:/root/.ollama  # Persist models
  environment:
    - OLLAMA_HOST=0.0.0.0:11434
    - OLLAMA_ORIGINS=*
  restart: unless-stopped
  healthcheck:
    test: ["CMD", "ollama", "list"]
    interval: 10s
    timeout: 5s
    retries: 5
    start_period: 30s
  networks:
    - mcp_network
  # Uncomment if you have GPU support (NVIDIA)
  # deploy:
  #   resources:
  #     reservations:
  #       devices:
  #         - driver: nvidia
  #           count: 1
  #           capabilities: [gpu]
```

### Step 2: Start Ollama

```bash
# Stop current services
docker compose down

# Start with Ollama
docker compose up -d

# Wait for Ollama to start (30-60 seconds)
docker compose logs -f ollama
```

### Step 3: Pull Models

```bash
# Pull a model (inside container)
docker exec -it ollama_server ollama pull llama3.1:8b

# Or pull from host (if Ollama CLI installed)
ollama pull llama3.1:8b

# Verify models are available
docker exec -it ollama_server ollama list
```
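To confirm from code (rather than the CLI) that a model was pulled, you can query Ollama's standard `/api/tags` endpoint, which lists locally available models. A minimal sketch:

```python
# List the models Ollama has available locally via its /api/tags endpoint.
import requests

tags = requests.get("http://localhost:11434/api/tags", timeout=10).json()
print([model["name"] for model in tags.get("models", [])])
# Expect output along the lines of: ['llama3.1:8b', 'mxbai-embed-large:latest']
```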
### Step 4: Configure Pipeline to Use Ollama

Edit `pipelines/mimir_orchestrator.py`:

```python
# pipelines/mimir_orchestrator.py

# Change LLM API URL to Ollama
LLM_API_URL: str = Field(
    default="http://ollama:11434/v1",  # Changed from copilot-api:4141
    description="LLM API base URL",
)

# Models must match what you pulled
PM_MODEL: str = Field(
    default="llama3.1:8b",  # Changed from gpt-4.1
    description="Model to use for PM agent"
)
WORKER_MODEL: str = Field(
    default="llama3.1:8b",
    description="Model to use for worker agents"
)
QC_MODEL: str = Field(
    default="llama3.1:8b",
    description="Model to use for QC agents"
)
```

### Step 5: Restart and Test

```bash
# Restart Open-WebUI to pick up changes
docker compose restart open-webui

# Test in Open-WebUI
# http://localhost:3000
```

### Recommended Ollama Models

| Model | Size | RAM Needed | Quality | Speed | Best For |
|-------|------|------------|---------|-------|----------|
| **llama3.1:8b** | 4.7GB | 8GB | ✅✅ Good | ⚡⚡ Fast | General tasks |
| **llama3.1:70b** | 40GB | 64GB | ✅✅✅ Excellent | ⚡ Slow | Complex reasoning |
| **qwen2.5:7b** | 4.7GB | 8GB | ✅✅ Good | ⚡⚡ Fast | Code generation |
| **codellama:13b** | 7.4GB | 16GB | ✅✅ Good | ⚡⚡ Medium | Code-specific tasks |
| **mistral:7b** | 4.1GB | 8GB | ✅ Decent | ⚡⚡⚡ Fast | Simple tasks |

> 💡 **Tip**: Start with `llama3.1:8b` for a good balance of quality and speed.

---

## Per-Agent Model Configuration

You can use different models for different agent roles:

### Example: Optimize for Cost and Quality

```python
# pipelines/mimir_orchestrator.py

# Fast, cheap model for PM (planning is quick)
PM_MODEL: str = Field(
    default="gpt-4o-mini",
    description="Fast planning"
)

# High-quality model for Workers (critical execution)
WORKER_MODEL: str = Field(
    default="gpt-4o",
    description="High-quality execution"
)

# Accurate model for QC (verification needs accuracy)
QC_MODEL: str = Field(
    default="gpt-4.1",
    description="Thorough verification"
)
```

### Example: All Local (Ollama)

```python
PM_MODEL: str = Field(default="llama3.1:8b")
WORKER_MODEL: str = Field(default="qwen2.5:7b")  # Better at code
QC_MODEL: str = Field(default="llama3.1:8b")
```

### Example: Hybrid (Copilot + Ollama)

```python
# Use Copilot for PM and QC (strategic)
COPILOT_BASE_URL: str = Field(default="http://copilot-api:4141/v1")
PM_MODEL: str = Field(default="gpt-4.1")
QC_MODEL: str = Field(default="gpt-4.1")

# Use Ollama for Workers (execution, many calls)
# NOTE: This requires custom logic to switch base URLs per agent
# Not currently supported out of the box
WORKER_MODEL: str = Field(default="llama3.1:8b")
```

> ⚠️ **Limitation**: Currently, all agents must use the same API endpoint (either Copilot or Ollama). Hybrid setups require code modifications, roughly as sketched below.
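For reference, a hybrid setup would need per-role routing along these lines. This is a hypothetical sketch, not code from the pipeline: `AGENT_ENDPOINTS` and `call_llm` are illustrative names, and it assumes both endpoints speak the OpenAI-compatible chat completions API.

```python
# Hypothetical per-role routing for a hybrid Copilot + Ollama setup.
# Not part of mimir_orchestrator.py; shown only to illustrate the kind
# of modification a hybrid configuration would require.
import requests

AGENT_ENDPOINTS = {
    "pm":     ("http://copilot-api:4141/v1", "gpt-4.1"),
    "qc":     ("http://copilot-api:4141/v1", "gpt-4.1"),
    "worker": ("http://ollama:11434/v1", "llama3.1:8b"),
}

def call_llm(role: str, messages: list) -> str:
    """Send a chat request to the endpoint/model configured for this agent role."""
    base_url, model = AGENT_ENDPOINTS[role]
    resp = requests.post(
        f"{base_url}/chat/completions",
        json={"model": model, "messages": messages},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]
```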
---

## Configuration Methods

### Method 1: Open-WebUI Valves (Recommended)

**Pros:**
- ✅ No code changes
- ✅ Changes take effect immediately
- ✅ Per-user configuration
- ✅ Easy to test different models

**Cons:**
- ❌ Settings lost if the container is recreated
- ❌ Must configure via UI

**How to:**
1. Open-WebUI → Settings → Admin Panel → Pipelines
2. Edit **Mimir Multi-Agent Orchestrator**
3. Modify **Valves** JSON
4. Save

### Method 2: Edit Python Source (Persistent)

**Pros:**
- ✅ Changes persist across container recreations
- ✅ Version controlled (git)
- ✅ Applies to all users

**Cons:**
- ❌ Requires container rebuild
- ❌ Requires editing code

**How to:**
1. Edit `pipelines/mimir_orchestrator.py`
2. Modify the `Valves` class defaults
3. Restart: `docker compose restart open-webui`

### Method 3: Environment Variables (Advanced)

**Pros:**
- ✅ No code changes
- ✅ Easy to change via `.env`
- ✅ Supports different configs per environment

**Cons:**
- ❌ Requires adding environment variable support to the code
- ❌ Not currently implemented

**Future feature** - would allow:

```bash
# In .env
MIMIR_PM_MODEL=gpt-4.1
MIMIR_WORKER_MODEL=gpt-4.1
MIMIR_QC_MODEL=gpt-4.1
```
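If this feature were implemented, the valve defaults could read from the environment. A minimal sketch, assuming the pipeline's Pydantic `Valves` class; the `MIMIR_*` variable names mirror the `.env` example above and are not yet recognized by the code:

```python
# Hypothetical env-var support for the Valves class (not yet implemented).
# Each default falls back to gpt-4.1 when its MIMIR_* variable is unset.
import os
from pydantic import BaseModel, Field

class Valves(BaseModel):
    PM_MODEL: str = Field(
        default=os.getenv("MIMIR_PM_MODEL", "gpt-4.1"),
        description="Model to use for PM agent (planning).",
    )
    WORKER_MODEL: str = Field(
        default=os.getenv("MIMIR_WORKER_MODEL", "gpt-4.1"),
        description="Model to use for worker agents (task execution).",
    )
    QC_MODEL: str = Field(
        default=os.getenv("MIMIR_QC_MODEL", "gpt-4.1"),
        description="Model to use for QC agents (verification).",
    )
```

Note that `os.getenv` is evaluated when the module loads, so the container would still need a restart to pick up `.env` changes.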
---

## Troubleshooting

### Problem: Model not found

**Symptoms:**
```
Error: Model 'gpt-5' not found
```

**Solution:**
```bash
# Check available models
curl http://localhost:4141/v1/models | jq '.data[].id'

# Use a model from the list
```

### Problem: Ollama models not loading

**Symptoms:**
```
Error: Failed to connect to Ollama
```

**Solution:**
```bash
# Check Ollama is running
docker compose ps ollama

# Check Ollama logs
docker compose logs ollama

# Pull model if missing
docker exec -it ollama_server ollama pull llama3.1:8b

# Verify model is available
docker exec -it ollama_server ollama list
```

### Problem: Premium model usage limits hit

**Symptoms:**
```
Error: Rate limit exceeded
```

**Solution:**
1. Switch back to `gpt-4.1` (non-premium)
2. Or upgrade your GitHub Copilot plan
3. Or use local Ollama

### Problem: Changes not taking effect

**Symptoms:** Model still using old configuration

**Solution:**
```bash
# If using Open-WebUI Valves: refresh the page

# If using Python source: restart the container
docker compose restart open-webui

# Nuclear option: full rebuild
docker compose down
docker compose up -d --build
```

### Problem: Slow performance with Ollama

**Symptoms:** Responses take 30+ seconds

**Solution:**

1. **Check GPU**: Ollama needs a GPU for good performance

   ```bash
   # Check if the GPU is visible inside the container
   docker exec -it ollama_server nvidia-smi
   ```

2. **Use a smaller model**: Switch from 70b → 8b

   ```python
   WORKER_MODEL: str = Field(default="llama3.1:8b")  # Not 70b
   ```

3. **Increase resources**: Docker Desktop → Settings → Resources
   - RAM: 16GB minimum
   - CPUs: 4+ cores

---

## Best Practices

### 1. Start with Defaults

Use `gpt-4.1` for everything until you identify bottlenecks.

### 2. Profile Your Workload

Track which agents consume the most tokens:

- **PM**: Usually 1-2K tokens (planning)
- **Workers**: Usually 2-5K tokens each (execution)
- **QC**: Usually 1-2K tokens (verification)

### 3. Optimize Strategically

- **High-volume agents** (Workers) → Use cheaper models
- **Critical agents** (QC) → Use better models
- **Fast agents** (PM) → Use faster models

### 4. Monitor Usage

Check your Copilot usage:

```bash
# Via copilot-api
curl http://localhost:4141/usage

# Or visit the usage dashboard
open "https://ericc-ch.github.io/copilot-api?endpoint=http://localhost:4141/usage"
```

### 5. Test Before Committing

Test model changes with simple tasks before running complex orchestrations.

---

## Summary

| Scenario | Recommended Configuration |
|----------|---------------------------|
| **Default (most users)** | All agents: `gpt-4.1` (Copilot) |
| **Premium quality** | All agents: `gpt-4o` (Copilot) |
| **Cost-optimized** | PM/QC: `gpt-4.1`, Workers: `gpt-4o-mini` |
| **Fully offline** | All agents: `llama3.1:8b` (Ollama) |
| **Best quality** | All agents: `o1-preview` (Copilot, expensive) |
| **Code-focused** | Workers: `qwen2.5:7b` (Ollama) or `claude-3.5-sonnet` (Copilot) |

---

## Related Documentation

- **[QUICKSTART.md](../getting-started/QUICKSTART.md)** - Initial setup
- **[AGENTS.md](../AGENTS.md)** - Multi-agent workflows
- **[copilot-api README](https://github.com/ericc-ch/copilot-api)** - Copilot API documentation
- **[Ollama Models](https://ollama.com/library)** - Available Ollama models

---

**Need Help?**

- 🐛 [Report Issues](https://github.com/orneryd/Mimir/issues)
- 💬 [Discussions](https://github.com/orneryd/Mimir/discussions)
