# LLM Configuration Guide

## Overview

Mimir uses **Ollama with `gpt-oss` as the default model** for all agent activity (PM, Worker, and QC agents). This provides a good balance of quality and speed with a fully open-source stack.

## Current Configuration

### Default Setup (`.mimir/llm-config.json`)

```json
{
  "defaultProvider": "ollama",
  "providers": {
    "ollama": {
      "defaultModel": "gpt-oss",
      ...
    }
  },
  "agentDefaults": {
    "pm": { "provider": "ollama", "model": "gpt-oss" },
    "worker": { "provider": "ollama", "model": "gpt-oss" },
    "qc": { "provider": "ollama", "model": "gpt-oss" }
  },
  "features": {
    "pmModelSuggestions": false
  }
}
```

**Key Points:**

- ✅ **All agents use `gpt-oss` by default** (13B params, 32K context window)
- ✅ **Ollama runs locally** at `http://localhost:11434`
- ✅ **No API keys required** - fully self-hosted
- ✅ **Backup models available**: `tinyllama`, `qwen2.5-coder:1.5b-base`, `deepseek-coder:6.7b`
- 🎯 **Optional PM model suggestions** - PM can recommend task-specific models (disabled by default)

## PM Model Suggestions Feature

### Overview

When enabled, the PM agent can recommend specific models for individual tasks based on their requirements. This allows intelligent model selection while preserving cost control.

**Default: DISABLED** - All agents use configured defaults unless explicitly enabled.

### How It Works

1. **PM Receives Model List**: The PM agent sees available models with descriptions when planning.
2. **PM Suggests Model**: The PM can add a `recommendedModel` field to task definitions:

   ```json
   {
     "id": "task-1",
     "title": "Optimize database queries",
     "recommendedModel": "ollama/deepseek-coder:6.7b"
   }
   ```

3. **Task Executor Parses Suggestion**: The suggestion is always parsed, validated, and logged for debugging.
4. **Conditional Usage**: The suggestion is only used if `features.pmModelSuggestions = true`.
5. **Fallback to Defaults**: Invalid suggestions fall back to the agent type's defaults.

### Enabling PM Model Suggestions

Edit `.mimir/llm-config.json`:

```json
{
  "features": {
    "pmModelSuggestions": true
  }
}
```

### Model Format

The PM can suggest models in two formats (see the parsing sketch after this list):

1. **Full Format**: `provider/model`
   - Example: `ollama/deepseek-coder:6.7b`
   - Specifies both provider and model
2. **Short Format**: `model`
   - Example: `gpt-oss`
   - Uses the default provider (ollama)
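The parse-and-fallback flow described in steps 3-5 above might look roughly like this minimal TypeScript sketch. The function names, the `ModelRef` shape, and the hard-coded model list are illustrative assumptions, not Mimir's actual internals:

```typescript
// Hypothetical sketch of suggestion parsing and fallback; names are
// illustrative, not Mimir's actual API.

interface ModelRef {
  provider: string;
  model: string;
}

const DEFAULT_PROVIDER = "ollama";

// Available models are loaded from config at startup (no runtime discovery).
const availableModels = new Set([
  "ollama/gpt-oss",
  "ollama/tinyllama",
  "ollama/qwen2.5-coder:1.5b-base",
  "ollama/deepseek-coder:6.7b",
]);

function parseModelSuggestion(raw: string): ModelRef {
  // Full format: "provider/model"; short format: "model" (default provider).
  const slash = raw.indexOf("/");
  return slash === -1
    ? { provider: DEFAULT_PROVIDER, model: raw }
    : { provider: raw.slice(0, slash), model: raw.slice(slash + 1) };
}

function resolveTaskModel(
  suggestion: string | undefined,
  agentDefault: ModelRef,
  pmModelSuggestionsEnabled: boolean,
): ModelRef {
  if (!suggestion) return agentDefault;

  const parsed = parseModelSuggestion(suggestion);
  const key = `${parsed.provider}/${parsed.model}`;

  // Always parse and log, even when the feature flag is off.
  console.debug(`PM suggested model: ${key}`);

  // Only honor the suggestion when the feature is enabled AND the model is known.
  if (pmModelSuggestionsEnabled && availableModels.has(key)) {
    return parsed;
  }
  return agentDefault; // Fallback safety
}
```

With the feature flag off, a function like `resolveTaskModel` would still log the suggestion but always return the agent default, matching the behavior described above.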
### Example Use Cases

**Code Generation Tasks:**

```json
{
  "id": "implement-api",
  "title": "Implement REST API endpoints",
  "recommendedModel": "ollama/deepseek-coder:6.7b"
}
```

**Complex Planning Tasks:**

```json
{
  "id": "system-architecture",
  "title": "Design system architecture",
  "recommendedModel": "ollama/gpt-oss"
}
```

**Quick Validation Tasks:**

```json
{
  "id": "lint-check",
  "title": "Run linter and fix issues",
  "recommendedModel": "ollama/tinyllama"
}
```

### Benefits

- ✅ **Task-Specific Optimization**: Use specialized models for specific tasks
- ✅ **Cost Control**: Disabled by default, preventing unintended model switching
- ✅ **Backward Compatible**: Existing configs work without changes
- ✅ **Fallback Safety**: Invalid suggestions default to configured models
- ✅ **Debugging**: All suggestions are logged even when the feature is disabled

### Limitations

- ⚠️ **PM Must Know Models**: The PM only sees the models listed in its prompt
- ⚠️ **Validation Required**: Invalid models fall back to defaults
- ⚠️ **No Runtime Discovery**: Available models are loaded from config at startup
- ⚠️ **Feature Flag Required**: Must be explicitly enabled in config

## Agent-Specific Configuration

### PM Agent (Planning/Decomposition)

- **Model**: `gpt-oss` (Ollama)
- **Rationale**: Strong reasoning for task decomposition and planning
- **Context Window**: 32,768 tokens
- **Temperature**: 0.0 (deterministic)

### Worker Agent (Task Execution)

- **Model**: `gpt-oss` (Ollama)
- **Rationale**: Consistent quality for code generation and implementation
- **Context Window**: 32,768 tokens
- **Temperature**: 0.0 (deterministic)
- **PM Override**: If `pmModelSuggestions` is enabled, uses the task-specific model

### QC Agent (Verification)

- **Model**: `gpt-oss` (Ollama)
- **Rationale**: Strict validation with maximum consistency
- **Context Window**: 32,768 tokens
- **Temperature**: 0.0 (deterministic)

## Alternative Models

### Available in Your Ollama Instance

```bash
$ ollama list
NAME                      SIZE    CONTEXT
gpt-oss:latest            13 GB   32K   ✅ DEFAULT
tinyllama:latest          637 MB  8K    (testing only)
qwen2.5-coder:1.5b-base   986 MB  32K   (fast coding)
deepseek-coder:6.7b       3.8 GB  16K   (coding specialist)
phi:latest                1.6 GB  2K    (not recommended)
```

### To Switch Models

Edit `.mimir/llm-config.json` (here the worker uses a faster model for simple tasks):

```json
{
  "agentDefaults": {
    "pm": { "model": "gpt-oss" },
    "worker": { "model": "qwen2.5-coder:1.5b-base" },
    "qc": { "model": "gpt-oss" }
  }
}
```

## GitHub Copilot Fallback (Optional)

If you have GitHub Copilot Pro and want to use cloud models:

```json
{
  "defaultProvider": "copilot",
  "providers": {
    "copilot": {
      "baseUrl": "http://localhost:4141/v1",
      "defaultModel": "gpt-4o",
      ...
    }
  }
}
```

**Requirements:**

- Active GitHub Copilot subscription
- Copilot proxy running at port 4141

## RAG/Vector Embeddings (Future)

**⚠️ NOT YET IMPLEMENTED**

When the RAG/vector embedding system is added, it will use **separate configuration** from the agent LLMs:

```json
{
  "vectorEmbeddings": {
    "provider": "ollama",
    "model": "nomic-embed-text",
    "dimensions": 768,
    "baseUrl": "http://localhost:11434"
  }
}
```

**Rationale for Separation:**

- **Agent LLMs**: Need reasoning, code generation, and planning (larger models)
- **Embedding Models**: Need semantic similarity only (smaller, specialized models)
- **Different performance profiles**: Embeddings are fast/cheap; LLM inference is expensive
- **Independent scaling**: Each can run on different hardware/services
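Since this system is not yet implemented, the following is only a sketch of what an embedding call against the config above could look like, assuming Ollama's `/api/embeddings` HTTP endpoint; the `embed` function and config type are hypothetical:

```typescript
// Illustrative only: Mimir's embedding layer does not exist yet.
// Sketches a call to Ollama's embeddings endpoint using the
// vectorEmbeddings config shown above.

interface VectorEmbeddingsConfig {
  provider: string;
  model: string;
  dimensions: number;
  baseUrl: string;
}

async function embed(
  text: string,
  config: VectorEmbeddingsConfig,
): Promise<number[]> {
  // Ollama's embeddings API: POST /api/embeddings { model, prompt }
  const res = await fetch(`${config.baseUrl}/api/embeddings`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model: config.model, prompt: text }),
  });
  if (!res.ok) {
    throw new Error(`Embedding request failed: ${res.status}`);
  }
  const { embedding } = (await res.json()) as { embedding: number[] };
  return embedding; // e.g. 768 dimensions for nomic-embed-text
}
```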
## Configuration File Locations

1. **Project-specific**: `.mimir/llm-config.json` (git-ignored, recommended)
2. **Environment variable**: `MIMIR_LLM_CONFIG=/path/to/config.json`
3. **Fallback**: Hardcoded defaults in `src/config/LLMConfigLoader.ts`
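Assuming the list above reflects precedence order (project file first, then the environment variable, then hardcoded defaults), the resolution logic might look like this sketch; `loadLLMConfig` and the defaults object are illustrative, not the actual `LLMConfigLoader` implementation:

```typescript
// Hypothetical sketch of the lookup order above.
import { existsSync, readFileSync } from "node:fs";

const HARDCODED_DEFAULTS = {
  defaultProvider: "ollama",
  agentDefaults: {
    pm: { provider: "ollama", model: "gpt-oss" },
    worker: { provider: "ollama", model: "gpt-oss" },
    qc: { provider: "ollama", model: "gpt-oss" },
  },
};

function loadLLMConfig(): unknown {
  // 1. Project-specific file takes precedence.
  if (existsSync(".mimir/llm-config.json")) {
    return JSON.parse(readFileSync(".mimir/llm-config.json", "utf8"));
  }
  // 2. Then an explicit path from the environment.
  const envPath = process.env.MIMIR_LLM_CONFIG;
  if (envPath && existsSync(envPath)) {
    return JSON.parse(readFileSync(envPath, "utf8"));
  }
  // 3. Finally, hardcoded defaults.
  return HARDCODED_DEFAULTS;
}
```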
## Verifying Configuration

```bash
# Check loaded config
npm run test testing/config/llm-config-loader.test.ts

# View current models
ollama list

# Test specific model
ollama run gpt-oss "Hello, world!"
```

## Performance Notes

### `gpt-oss` (13B params)

- **Inference Speed**: ~2-5 tokens/sec (CPU), ~20-50 tokens/sec (GPU)
- **Memory**: ~13 GB RAM (full model)
- **Context Window**: 32K tokens
- **Quality**: Good for most tasks

### When to Switch Models

**Use smaller models** (`qwen2.5-coder:1.5b-base`, `tinyllama`) if:

- ❌ You are running on resource-constrained hardware
- ❌ You need faster iteration during development
- ❌ Tasks are simple (e.g., basic refactoring)

**Use larger models** (`gpt-oss`, `deepseek-coder:6.7b`) if:

- ✅ Complex reasoning is required
- ✅ Quality is critical (production deploys)
- ✅ You have GPU acceleration available

## Troubleshooting

### Model Not Found

```bash
# Pull the model
ollama pull gpt-oss

# Verify it's available
ollama list | grep gpt-oss
```

### Ollama Not Running

```bash
# Start Ollama
ollama serve

# Or check if it's already running
curl http://localhost:11434/api/tags
```

### Out of Memory

Use a smaller model. Edit `.mimir/llm-config.json`:

```json
"defaultModel": "tinyllama"
```

(`tinyllama` is only 637 MB.)

---

**Last Updated**: 2025-10-18
**Version**: 1.0.0
**Default Model**: `gpt-oss` (Ollama)