M.I.M.I.R - Multi-agent Intelligent Memory & Insight Repository

by orneryd
LLM_STRATEGY.md
# LLM Configuration Strategy

## Current Configuration (Updated: October 18, 2025)

### Agent LLMs: Copilot GPT-4.1 (Cloud)

All agents now use **GPT-4.1 via GitHub Copilot** for superior reasoning and task execution:

- **PM Agent**: GPT-4.1 for research, planning, and task decomposition
- **Worker Agent**: GPT-4.1 for high-quality task execution
- **QC Agent**: GPT-4.1 for strict validation and consistency

**Why GPT-4.1:**

- ✅ Superior reasoning capabilities
- ✅ Excellent tool calling and function execution
- ✅ 128K context window
- ✅ Fast response times (cloud infrastructure)
- ✅ Consistent, reliable output

### Embeddings: Ollama nomic-embed-text (Local)

Vector embeddings for RAG search and file indexing use **local Ollama models**:

- **Model**: `nomic-embed-text` (137M params, 768 dimensions)
- **Use Cases**:
  - File content indexing
  - Semantic search
  - Document similarity
  - RAG retrieval

**Why Local Embeddings:**

- ✅ **Privacy**: File content stays local
- ✅ **Cost**: No API costs for embeddings
- ✅ **Speed**: Local inference is fast for small embedding models
- ✅ **Offline**: Works without internet connection

## Configuration File

**Location**: `.mimir/llm-config.json`

```json
{
  "defaultProvider": "copilot",
  "agentDefaults": {
    "pm": { "provider": "copilot", "model": "gpt-4.1" },
    "worker": { "provider": "copilot", "model": "gpt-4.1" },
    "qc": { "provider": "copilot", "model": "gpt-4.1" },
    "embeddings": { "provider": "ollama", "model": "nomic-embed-text" }
  }
}
```

## Future RAG Implementation

When implementing vector embeddings and RAG search:

1. **Use `agentDefaults.embeddings`** configuration
2. **Model**: `nomic-embed-text` via Ollama
3. **Pull model**: `ollama pull nomic-embed-text`
4. **Integration**: File indexing system will use local embeddings

## Switching Back to Ollama (If Needed)

To switch agents back to local Ollama models:

```json
{
  "defaultProvider": "ollama",
  "agentDefaults": {
    "pm": { "provider": "ollama", "model": "qwen3:8b" },
    "worker": { "provider": "ollama", "model": "qwen2.5-coder:1.5b-base" },
    "qc": { "provider": "ollama", "model": "qwen3:8b" }
  }
}
```

**Note**: Requires 16GB RAM allocated to Docker for the qwen3:8b model.

## Prerequisites

### For Copilot (Current Setup)

1. **GitHub Copilot Subscription**: Active subscription required
2. **GitHub CLI Auth**: `gh auth login`
3. **Copilot API Proxy**: Running on `http://localhost:4141/v1`

### For Ollama Embeddings (Next Project)

1. **Docker Allocated RAM**: 16GB recommended
2. **Pull Embedding Model**: `docker exec ollama_server ollama pull nomic-embed-text`
3. **Verify**: `docker exec ollama_server ollama list`

## Testing Configuration

```bash
# Test with current config (Copilot)
npm run chain "test simple task"

# Check config is loaded correctly
grep -A 10 "agentDefaults" .mimir/llm-config.json
```
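
Beyond the basic commands above, it can help to confirm that both providers are reachable before running the agent chain. The following is a minimal sketch, not part of the project tooling; it assumes the Copilot proxy exposes an OpenAI-compatible `/v1/models` endpoint and that the `ollama_server` container is running:

```bash
# Sanity-check the hybrid setup (illustrative; the /v1/models path is an assumption)
gh auth status                                                   # GitHub CLI / Copilot authentication
curl -s http://localhost:4141/v1/models | head -c 200            # Copilot API proxy responds
docker exec ollama_server ollama list | grep nomic-embed-text    # embedding model is pulled
```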
## Benefits of Hybrid Approach

| Component | Provider | Benefit |
|-----------|----------|---------|
| **Agent Reasoning** | Copilot GPT-4.1 | Best-in-class performance, fast cloud inference |
| **Vector Embeddings** | Ollama Local | Privacy, no API costs, offline capability |

This hybrid approach provides:

- 🎯 **Best Performance**: Cloud LLMs for complex reasoning
- 🔒 **Privacy**: Local embeddings keep file content private
- 💰 **Cost Efficiency**: Embeddings are free (local); pay only for agent inference
- 🚀 **Speed**: Fast cloud LLMs + fast local embeddings

---

**Next Steps:**

1. ✅ Configuration updated to use Copilot GPT-4.1
2. 🔄 Test agent chain with new configuration
3. 📋 Next project: Implement RAG with local embeddings (`nomic-embed-text`), see the sketch below
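
As a rough illustration of that last step, the sketch below requests a single embedding from the local model via Ollama's embeddings API. It assumes the `ollama_server` container publishes Ollama's default port 11434 on the host; the sample text is illustrative only:

```bash
# Ask local Ollama for an embedding of one file chunk (illustrative)
curl -s http://localhost:11434/api/embeddings \
  -H 'Content-Type: application/json' \
  -d '{"model": "nomic-embed-text", "prompt": "function add(a, b) { return a + b; }"}'
# Returns {"embedding": [...]}, a 768-dimension vector ready to store in the index
```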
