# llama.cpp Docker Images

This directory contains Dockerfiles and scripts to build llama.cpp server for **multiple platforms and use cases**.

## Available Images

### 1. ARM64 - Embeddings (Apple Silicon)

- **Image:** `timothyswt/llama-cpp-server-arm64:latest`
- **Use Case:** Text embeddings for semantic search (Mac M1/M2/M3)
- **Models:** nomic-embed-text (768d), mxbai-embed-large (1024d)
- **Files:** `Dockerfile`, `Dockerfile.mxbai`

### 2. AMD64 - Embeddings (Windows/Linux x86_64)

- **Image:** `timothyswt/llama-cpp-server-amd64-mxbai:latest`
- **Use Case:** Text embeddings for semantic search (Windows/Linux PC)
- **Models:** mxbai-embed-large (1024d)
- **Files:** `Dockerfile.mxbai` (build with `--platform linux/amd64`)

### 3. AMD64 - Vision Models (Windows/Linux x86_64) ⭐ NEW

- **Image:** `timothyswt/llama-cpp-server-amd64-vision:latest`
- **Use Case:** Image understanding and multimodal AI (Windows/Linux PC)
- **Models:** Llama 3.2 Vision (11B/90B), Qwen2.5-VL (2B/7B)
- **Files:** `Dockerfile.amd64-vision`
- **Docs:** `BUILD_AMD64_VISION.md`
- **Build Script:** `scripts/build-llama-cpp-vision-amd64.ps1`

## Why Custom Builds?

The official `ghcr.io/ggml-org/llama.cpp:server` image claims multi-arch support but actually only provides AMD64. These custom builds provide:

- **Native ARM64** for Apple Silicon Macs (no emulation overhead)
- **Native AMD64** with vision capabilities for Windows/Linux
- **Embedded models** - no external downloads required

---

## Quick Start - Embeddings (ARM64)

### Using Pre-built Image

```bash
docker pull timothyswt/llama-cpp-server-arm64:latest
docker run -p 11434:8080 -v ./models:/models timothyswt/llama-cpp-server-arm64:latest
```

### Building Locally

```bash
# Build the image
./scripts/build-llama-cpp.sh

# Or manually:
docker build -t timothyswt/llama-cpp-server-arm64:latest -f docker/llama-cpp/Dockerfile .
```

## Features

- **Native ARM64**: Built specifically for Apple Silicon
- **OpenAI-Compatible API**: Drop-in replacement for Ollama
- **Embeddings Support**: Mean pooling for semantic search
- **Lightweight**: ~200MB runtime image
- **No GPU Required**: Optimized for CPU inference

## Usage with Mimir

The `docker-compose.yml` automatically uses this image for the `llama-server` service:

```yaml
llama-server:
  image: timothyswt/llama-cpp-server-arm64:latest
  ports:
    - "11434:8080"
  volumes:
    - ollama_models:/models
```

## API Endpoints

- **Health**: `GET /health`
- **Embeddings**: `POST /v1/embeddings`
- **Models**: `GET /v1/models`

Compatible with Ollama API format.
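To sanity-check the embeddings endpoint, a minimal request looks like the sketch below. It assumes the host port mapping shown above (11434), a model served under the `nomic-embed-text` alias, and that the server was started with embeddings enabled; adjust these to match your configuration.

```bash
# Minimal embeddings request against the OpenAI-compatible endpoint.
# Assumes host port 11434 and a model aliased as "nomic-embed-text".
curl http://localhost:11434/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "nomic-embed-text",
    "input": "The quick brown fox jumps over the lazy dog"
  }'
```

The response should contain a `data[0].embedding` array whose length matches the model's dimensionality (768 for nomic-embed-text).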
## Model Management

### Download Models

Models can be in GGUF format (same as Ollama). Place them in the `/models` directory:

```bash
# If you have Ollama models already:
cp -r ~/.ollama/models ./data/ollama/models

# Or download directly:
curl -L https://huggingface.co/nomic-ai/nomic-embed-text-v1.5-GGUF/resolve/main/nomic-embed-text-v1.5.Q8_0.gguf \
  -o ./data/ollama/models/nomic-embed-text.gguf
```

### Specify Model in docker-compose.yml

Uncomment and set the model path in the `command` section:

```yaml
command:
  - "--model"
  - "/models/nomic-embed-text.gguf"
  - "--alias"
  - "nomic-embed-text"
```

## Performance

ARM64 build performance:

- **Embeddings**: ~50-100ms per request (768-dim)
- **Memory**: ~200MB base + model size
- **CPU**: Utilizes all cores efficiently

## Publishing Updates

```bash
# Build and push new version
./scripts/build-llama-cpp.sh 1.0.1

# Or manually:
docker push timothyswt/llama-cpp-server-arm64:1.0.1
docker push timothyswt/llama-cpp-server-arm64:latest
```

## Troubleshooting

### Image Not Found

```bash
docker pull timothyswt/llama-cpp-server-arm64:latest
```

### Health Check Fails

Wait 30 seconds for server startup, or check logs:

```bash
docker logs llama_server
```

### Model Not Loading

Verify the model path and format (must be GGUF):

```bash
docker exec llama_server ls -la /models
```

## Architecture

```
┌─────────────────────────────────┐
│    Docker Container (ARM64)     │
│                                 │
│  ┌──────────────────────────┐   │
│  │   llama-server binary    │   │
│  │  (compiled from source)  │   │
│  └──────────────────────────┘   │
│              ↓                  │
│  ┌──────────────────────────┐   │
│  │  OpenAI-compatible API   │   │
│  │       Port: 8080         │   │
│  └──────────────────────────┘   │
│              ↓                  │
│  ┌──────────────────────────┐   │
│  │    /models (volume)      │   │
│  │    GGUF model files      │   │
│  └──────────────────────────┘   │
└─────────────────────────────────┘
              ↓
     Port 11434 (external)
              ↓
         Mimir Server
```

## Quick Reference - Vision Models (AMD64)

### Build Vision Image for Windows

```powershell
# Build with Llama 3.2 Vision models (11B + 90B)
.\scripts\build-llama-cpp-vision-amd64.ps1 -ModelType llama32

# Or build with Qwen2.5-VL models (2B + 7B)
.\scripts\build-llama-cpp-vision-amd64.ps1 -ModelType qwen25
```

### Enable Vision in docker-compose.amd64.yml

```yaml
# Uncomment one or both vision services
llama-server-vision-2b:  # Port 8081 - Faster, lower memory
llama-server-vision-7b:  # Port 8082 - Higher quality
```

### Test Vision API

```powershell
curl http://localhost:8081/v1/chat/completions `
  -H "Content-Type: application/json" `
  -d '{
    "model": "vision-2b",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "What is in this image?"},
        {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,..."}}
      ]
    }]
  }'
```

See `BUILD_AMD64_VISION.md` for complete documentation.

## References

- [llama.cpp GitHub](https://github.com/ggerganov/llama.cpp)
- [GGUF Format](https://github.com/ggerganov/ggml/blob/master/docs/gguf.md)
- [OpenAI Embeddings API](https://platform.openai.com/docs/api-reference/embeddings)
- [Llama 3.2 Vision](https://github.com/meta-llama/llama-models/blob/main/models/llama3_2/MODEL_CARD_VISION.md)
- [Qwen2.5-VL](https://github.com/QwenLM/Qwen2.5-VL)
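The PowerShell vision test above elides the base64 image payload. As a rough bash companion (a sketch, not part of the original build docs), the snippet below shows one way to construct it; the `photo.jpg` filename is hypothetical, and the port and `vision-2b` alias simply follow the Quick Reference example.

```bash
# Hypothetical input file - replace with your own image.
# GNU coreutils syntax; on macOS use: base64 -i photo.jpg | tr -d '\n'
IMG_B64=$(base64 -w0 photo.jpg)

# POST the image as an OpenAI-style data URL to the 2B vision service.
curl http://localhost:8081/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d @- <<EOF
{
  "model": "vision-2b",
  "messages": [{
    "role": "user",
    "content": [
      {"type": "text", "text": "What is in this image?"},
      {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,${IMG_B64}"}}
    ]
  }]
}
EOF
```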
