# LLM Provider Configuration Guide

**Version:** 1.0.0
**Last Updated:** 2025-11-10

This guide explains how to configure which LLM models are used by Mimir's multi-agent orchestration system.

---

## Table of Contents

1. [Overview](#overview)
2. [Default Configuration (GPT-4.1)](#default-configuration-gpt-41)
3. [Switching to Premium Models](#switching-to-premium-models)
4. [Using Local Ollama](#using-local-ollama)
5. [Per-Agent Model Configuration](#per-agent-model-configuration)
6. [Configuration Methods](#configuration-methods)
7. [Troubleshooting](#troubleshooting)

---

## Overview

Mimir's orchestration pipeline uses different LLM models for different agent roles:

| Agent Role | Default Model | Purpose |
|------------|---------------|---------|
| **Ecko** (Prompt Architect) | gpt-4.1 | Optimize user prompts |
| **PM** (Project Manager) | gpt-4.1 | Research & planning |
| **Worker** (Task Executor) | gpt-4.1 | Execute individual tasks |
| **QC** (Quality Control) | gpt-4.1 | Verify worker output |

**Why GPT-4.1 by default?**

- ✅ Avoids premium request usage (if you have Copilot Pro)
- ✅ Fast response times
- ✅ Good quality for most tasks
- ✅ Lower cost

---

## Default Configuration (GPT-4.1)

### What You Get Out of the Box

When you run `docker compose up`, Mimir uses **GPT-4.1** for all agents via the GitHub Copilot API. **No configuration needed!** This is the recommended setup for most users.

### Available Models via Copilot API

Check which models are available:

```bash
# View available models
curl http://localhost:4141/v1/models | jq '.data[].id'
```

Typical output:

```
gpt-4.1
gpt-4o
gpt-4o-mini
o1-preview
o1-mini
claude-3.5-sonnet
```

> 💡 **Note**: Available models depend on your GitHub Copilot subscription (Individual, Business, or Enterprise).
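To sanity-check a model before assigning it to an agent, you can call the proxy directly. A minimal sketch, assuming the copilot-api proxy exposes the standard OpenAI-compatible `/v1/chat/completions` route alongside the `/v1/models` route shown above:

```python
# Send one test message to a candidate model via the copilot-api proxy.
# Assumes the proxy is OpenAI-compatible, consistent with /v1/models above.
import requests

resp = requests.post(
    "http://localhost:4141/v1/chat/completions",
    json={
        "model": "gpt-4.1",  # swap in any model id returned by /v1/models
        "messages": [{"role": "user", "content": "Reply with the single word: OK"}],
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```

If this returns an error for a given model id, the orchestrator's agents will hit the same error.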
---

## Switching to Premium Models

### Option 1: Via Open-WebUI (Easiest)

1. Open Open-WebUI: http://localhost:3000
2. Click the **Settings** icon (⚙️) in the top right
3. Go to **Admin Panel** → **Settings** → **Pipelines**
4. Find **Mimir Multi-Agent Orchestrator**
5. Click **Edit** (pencil icon)
6. Modify the **Valves** (configuration), e.g. to move every agent to `gpt-4o`:

   ```json
   {
     "PM_MODEL": "gpt-4o",
     "WORKER_MODEL": "gpt-4o",
     "QC_MODEL": "gpt-4o"
   }
   ```

7. Click **Save**
8. Test with a new chat

### Option 2: Edit Python Pipeline Directly

Edit `pipelines/mimir_orchestrator.py`:

```python
# pipelines/mimir_orchestrator.py
PM_MODEL: str = Field(
    default="gpt-4o",  # Changed from gpt-4.1
    description="Model to use for PM agent (planning)."
)
WORKER_MODEL: str = Field(
    default="gpt-4o",  # Changed from gpt-4.1
    description="Model to use for worker agents (task execution)."
)
QC_MODEL: str = Field(
    default="gpt-4o",  # Changed from gpt-4.1
    description="Model to use for QC agents (verification)."
)
```

Then restart (or rebuild):

```bash
# Restart the Open-WebUI container with the updated pipeline
docker compose restart open-webui

# Or rebuild from scratch
docker compose down
docker compose up -d --build open-webui
```

### Premium Model Options

| Model | Speed | Quality | Cost | Best For |
|-------|-------|---------|------|----------|
| **gpt-4.1** | ⚡⚡⚡ Fast | ✅ Good | 💰 Low | General tasks, default |
| **gpt-4o** | ⚡⚡ Medium | ✅✅ Better | 💰💰 Medium | Complex reasoning |
| **gpt-4o-mini** | ⚡⚡⚡ Fast | ✅ Good | 💰 Low | Simple tasks |
| **o1-preview** | ⚡ Slow | ✅✅✅ Best | 💰💰💰 High | Hard problems, deep reasoning |
| **o1-mini** | ⚡⚡ Medium | ✅✅ Better | 💰💰 Medium | Moderate reasoning |
| **claude-3.5-sonnet** | ⚡⚡ Medium | ✅✅ Better | 💰💰 Medium | Code generation |

> ⚠️ **Warning**: Premium models (gpt-4o, o1-*) count against your Copilot Pro usage limits. Use sparingly or upgrade your plan.

---

## Using Local Ollama

### Why Use Ollama?

- ✅ Fully offline (no internet required)
- ✅ No usage limits
- ✅ Free (after hardware investment)
- ⚠️ Requires a GPU for good performance
- ⚠️ Lower quality than GPT-4

### Step 1: Enable Ollama Service

Uncomment the Ollama service in `docker-compose.yml`:

```yaml
# docker-compose.yml
ollama:
  build:
    context: ./docker/ollama
    dockerfile: Dockerfile
    args:
      - EMBEDDING_MODEL=${MIMIR_EMBEDDINGS_MODEL:-mxbai-embed-large}
    tags:
      - mimir-ollama:${VERSION:-1.0.0}
      - mimir-ollama:latest
  image: mimir-ollama:${VERSION:-1.0.0}
  container_name: ollama_server
  ports:
    - "11434:11434"  # Ollama API
  volumes:
    - ./data/ollama:/root/.ollama  # Persist models
  environment:
    - OLLAMA_HOST=0.0.0.0:11434
    - OLLAMA_ORIGINS=*
  restart: unless-stopped
  healthcheck:
    test: ["CMD", "ollama", "list"]
    interval: 10s
    timeout: 5s
    retries: 5
    start_period: 30s
  networks:
    - mcp_network
  # Uncomment if you have GPU support (NVIDIA)
  # deploy:
  #   resources:
  #     reservations:
  #       devices:
  #         - driver: nvidia
  #           count: 1
  #           capabilities: [gpu]
```

### Step 2: Start Ollama

```bash
# Stop current services
docker compose down

# Start with Ollama
docker compose up -d

# Wait for Ollama to start (30-60 seconds)
docker compose logs -f ollama
```

### Step 3: Pull Models

```bash
# Pull a model (inside container)
docker exec -it ollama_server ollama pull llama3.1:8b

# Or pull from host (if Ollama CLI installed)
ollama pull llama3.1:8b

# Verify models are available
docker exec -it ollama_server ollama list
```
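To confirm from code (rather than the CLI) that a model was pulled, you can query Ollama's standard `/api/tags` endpoint, which lists locally available models. A minimal sketch:

```python
# List the models Ollama has available locally via its /api/tags endpoint.
import requests

tags = requests.get("http://localhost:11434/api/tags", timeout=10).json()
print([model["name"] for model in tags.get("models", [])])
# Expect output along the lines of: ['llama3.1:8b', 'mxbai-embed-large:latest']
```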
### Step 4: Configure Pipeline to Use Ollama

Edit `pipelines/mimir_orchestrator.py`:

```python
# pipelines/mimir_orchestrator.py

# Change LLM API URL to Ollama
LLM_API_URL: str = Field(
    default="http://ollama:11434/v1",  # Changed from copilot-api:4141
    description="LLM API base URL",
)

# Models must match what you pulled
PM_MODEL: str = Field(
    default="llama3.1:8b",  # Changed from gpt-4.1
    description="Model to use for PM agent"
)
WORKER_MODEL: str = Field(
    default="llama3.1:8b",
    description="Model to use for worker agents"
)
QC_MODEL: str = Field(
    default="llama3.1:8b",
    description="Model to use for QC agents"
)
```

### Step 5: Restart and Test

```bash
# Restart Open-WebUI to pick up changes
docker compose restart open-webui

# Test in Open-WebUI
# http://localhost:3000
```

### Recommended Ollama Models

| Model | Size | RAM Needed | Quality | Speed | Best For |
|-------|------|------------|---------|-------|----------|
| **llama3.1:8b** | 4.7GB | 8GB | ✅✅ Good | ⚡⚡ Fast | General tasks |
| **llama3.1:70b** | 40GB | 64GB | ✅✅✅ Excellent | ⚡ Slow | Complex reasoning |
| **qwen2.5:7b** | 4.7GB | 8GB | ✅✅ Good | ⚡⚡ Fast | Code generation |
| **codellama:13b** | 7.4GB | 16GB | ✅✅ Good | ⚡⚡ Medium | Code-specific tasks |
| **mistral:7b** | 4.1GB | 8GB | ✅ Decent | ⚡⚡⚡ Fast | Simple tasks |

> 💡 **Tip**: Start with `llama3.1:8b` for a good balance of quality and speed.

---

## Per-Agent Model Configuration

You can use different models for different agent roles:

### Example: Optimize for Cost and Quality

```python
# pipelines/mimir_orchestrator.py

# Fast, cheap model for PM (planning is quick)
PM_MODEL: str = Field(
    default="gpt-4o-mini",
    description="Fast planning"
)

# High-quality model for Workers (critical execution)
WORKER_MODEL: str = Field(
    default="gpt-4o",
    description="High-quality execution"
)

# Accurate model for QC (verification needs accuracy)
QC_MODEL: str = Field(
    default="gpt-4.1",
    description="Thorough verification"
)
```

### Example: All Local (Ollama)

```python
PM_MODEL: str = Field(default="llama3.1:8b")
WORKER_MODEL: str = Field(default="qwen2.5:7b")  # Better at code
QC_MODEL: str = Field(default="llama3.1:8b")
```

### Example: Hybrid (Copilot + Ollama)

```python
# Use Copilot for PM and QC (strategic)
COPILOT_BASE_URL: str = Field(default="http://copilot-api:4141/v1")
PM_MODEL: str = Field(default="gpt-4.1")
QC_MODEL: str = Field(default="gpt-4.1")

# Use Ollama for Workers (execution, many calls)
# NOTE: This requires custom logic to switch base URLs per agent
# Not currently supported out of the box
WORKER_MODEL: str = Field(default="llama3.1:8b")
```

> ⚠️ **Limitation**: Currently, all agents must use the same API endpoint (either Copilot or Ollama). Hybrid setups require code modifications, roughly as sketched below.
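For reference, a hybrid setup would need per-role routing along these lines. This is a hypothetical sketch, not code from the pipeline: `AGENT_ENDPOINTS` and `call_llm` are illustrative names, and it assumes both endpoints speak the OpenAI-compatible chat completions API.

```python
# Hypothetical per-role routing for a hybrid Copilot + Ollama setup.
# Not part of mimir_orchestrator.py; shown only to illustrate the kind
# of modification a hybrid configuration would require.
import requests

AGENT_ENDPOINTS = {
    "pm":     ("http://copilot-api:4141/v1", "gpt-4.1"),
    "qc":     ("http://copilot-api:4141/v1", "gpt-4.1"),
    "worker": ("http://ollama:11434/v1", "llama3.1:8b"),
}

def call_llm(role: str, messages: list) -> str:
    """Send a chat request to the endpoint/model configured for this agent role."""
    base_url, model = AGENT_ENDPOINTS[role]
    resp = requests.post(
        f"{base_url}/chat/completions",
        json={"model": model, "messages": messages},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]
```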
---

## Configuration Methods

### Method 1: Open-WebUI Valves (Recommended)

**Pros:**
- ✅ No code changes
- ✅ Changes take effect immediately
- ✅ Per-user configuration
- ✅ Easy to test different models

**Cons:**
- ❌ Settings lost if the container is recreated
- ❌ Must configure via UI

**How to:**
1. Open-WebUI → Settings → Admin Panel → Pipelines
2. Edit **Mimir Multi-Agent Orchestrator**
3. Modify **Valves** JSON
4. Save

### Method 2: Edit Python Source (Persistent)

**Pros:**
- ✅ Changes persist across container recreations
- ✅ Version controlled (git)
- ✅ Applies to all users

**Cons:**
- ❌ Requires container rebuild
- ❌ Requires editing code

**How to:**
1. Edit `pipelines/mimir_orchestrator.py`
2. Modify the `Valves` class defaults
3. Restart: `docker compose restart open-webui`

### Method 3: Environment Variables (Advanced)

**Pros:**
- ✅ No code changes
- ✅ Easy to change via `.env`
- ✅ Supports different configs per environment

**Cons:**
- ❌ Requires adding environment variable support to the code
- ❌ Not currently implemented

**Future feature** - would allow:

```bash
# In .env
MIMIR_PM_MODEL=gpt-4.1
MIMIR_WORKER_MODEL=gpt-4.1
MIMIR_QC_MODEL=gpt-4.1
```
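If this feature were implemented, the valve defaults could read from the environment. A minimal sketch, assuming the pipeline's Pydantic `Valves` class; the `MIMIR_*` variable names mirror the `.env` example above and are not yet recognized by the code:

```python
# Hypothetical env-var support for the Valves class (not yet implemented).
# Each default falls back to gpt-4.1 when its MIMIR_* variable is unset.
import os
from pydantic import BaseModel, Field

class Valves(BaseModel):
    PM_MODEL: str = Field(
        default=os.getenv("MIMIR_PM_MODEL", "gpt-4.1"),
        description="Model to use for PM agent (planning).",
    )
    WORKER_MODEL: str = Field(
        default=os.getenv("MIMIR_WORKER_MODEL", "gpt-4.1"),
        description="Model to use for worker agents (task execution).",
    )
    QC_MODEL: str = Field(
        default=os.getenv("MIMIR_QC_MODEL", "gpt-4.1"),
        description="Model to use for QC agents (verification).",
    )
```

Note that `os.getenv` is evaluated when the module loads, so the container would still need a restart to pick up `.env` changes.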
---

## Troubleshooting

### Problem: Model not found

**Symptoms:**
```
Error: Model 'gpt-5' not found
```

**Solution:**
```bash
# Check available models
curl http://localhost:4141/v1/models | jq '.data[].id'

# Use a model from the list
```

### Problem: Ollama models not loading

**Symptoms:**
```
Error: Failed to connect to Ollama
```

**Solution:**
```bash
# Check Ollama is running
docker compose ps ollama

# Check Ollama logs
docker compose logs ollama

# Pull model if missing
docker exec -it ollama_server ollama pull llama3.1:8b

# Verify model is available
docker exec -it ollama_server ollama list
```

### Problem: Premium model usage limits hit

**Symptoms:**
```
Error: Rate limit exceeded
```

**Solution:**
1. Switch back to `gpt-4.1` (non-premium)
2. Or upgrade your GitHub Copilot plan
3. Or use local Ollama

### Problem: Changes not taking effect

**Symptoms:** Model still using old configuration

**Solution:**
```bash
# If using Open-WebUI Valves: refresh the page

# If using Python source: restart the container
docker compose restart open-webui

# Nuclear option: full rebuild
docker compose down
docker compose up -d --build
```

### Problem: Slow performance with Ollama

**Symptoms:** Responses take 30+ seconds

**Solution:**

1. **Check GPU**: Ollama needs a GPU for good performance

   ```bash
   # Check if the GPU is visible inside the container
   docker exec -it ollama_server nvidia-smi
   ```

2. **Use a smaller model**: Switch from 70b → 8b

   ```python
   WORKER_MODEL: str = Field(default="llama3.1:8b")  # Not 70b
   ```

3. **Increase resources**: Docker Desktop → Settings → Resources
   - RAM: 16GB minimum
   - CPUs: 4+ cores

---

## Best Practices

### 1. Start with Defaults

Use `gpt-4.1` for everything until you identify bottlenecks.

### 2. Profile Your Workload

Track which agents consume the most tokens:

- **PM**: Usually 1-2K tokens (planning)
- **Workers**: Usually 2-5K tokens each (execution)
- **QC**: Usually 1-2K tokens (verification)

### 3. Optimize Strategically

- **High-volume agents** (Workers) → Use cheaper models
- **Critical agents** (QC) → Use better models
- **Fast agents** (PM) → Use faster models

### 4. Monitor Usage

Check your Copilot usage:

```bash
# Via copilot-api
curl http://localhost:4141/usage

# Or visit the usage dashboard
open "https://ericc-ch.github.io/copilot-api?endpoint=http://localhost:4141/usage"
```

### 5. Test Before Committing

Test model changes with simple tasks before running complex orchestrations.

---

## Summary

| Scenario | Recommended Configuration |
|----------|---------------------------|
| **Default (most users)** | All agents: `gpt-4.1` (Copilot) |
| **Premium quality** | All agents: `gpt-4o` (Copilot) |
| **Cost-optimized** | PM/QC: `gpt-4.1`, Workers: `gpt-4o-mini` |
| **Fully offline** | All agents: `llama3.1:8b` (Ollama) |
| **Best quality** | All agents: `o1-preview` (Copilot, expensive) |
| **Code-focused** | Workers: `qwen2.5:7b` (Ollama) or `claude-3.5-sonnet` (Copilot) |

---

## Related Documentation

- **[QUICKSTART.md](../getting-started/QUICKSTART.md)** - Initial setup
- **[AGENTS.md](../AGENTS.md)** - Multi-agent workflows
- **[copilot-api README](https://github.com/ericc-ch/copilot-api)** - Copilot API documentation
- **[Ollama Models](https://ollama.com/library)** - Available Ollama models

---

**Need Help?**

- 🐛 [Report Issues](https://github.com/orneryd/Mimir/issues)
- 💬 [Discussions](https://github.com/orneryd/Mimir/discussions)
