# Qwen2.5-VL Vision-Language Model Setup

## 🎯 Overview

This document describes the setup and configuration for using **Qwen2.5-VL** vision-language models with Mimir for image understanding and description generation.

**Note:** We're using Qwen2.5-VL instead of Qwen3-VL due to better llama.cpp compatibility and stable GGUF support.

## 🏗️ Architecture

```
Image File
  → llama.cpp (qwen2.5-vl)
  → Text Description
  → nomic-embed-text-v1.5 (embed description)
  → Neo4j (vector + description)
```

**Why this approach?**

- ✅ Multimodal GGUF embedding models are rare/unavailable
- ✅ VLM descriptions are human-readable and debuggable
- ✅ Works with existing text embedding infrastructure
- ✅ Provides semantic image search capabilities

## 📦 Available Models

| Model | GGUF Size | Context | RAM Required | Speed | Quality |
|-------|-----------|---------|--------------|-------|---------|
| **qwen2.5-vl-2b** | ~1.5 GB | 32K tokens | ~2 GB | ~60 tok/s | Good |
| **qwen2.5-vl-7b** | ~4.8 GB | 128K tokens | ~6 GB | ~35 tok/s | **Excellent** ⭐ |
| **qwen2.5-vl-72b** | ~45 GB | 128K tokens | ~48 GB | ~8 tok/s | Best |

*Speeds are approximate on Apple Silicon M-series.*

## 🔧 Configuration

### Environment Variables (Maxed Out Settings)

```bash
# Image Embeddings Control
MIMIR_EMBEDDINGS_IMAGES=true                          # Enable image indexing
MIMIR_EMBEDDINGS_IMAGES_DESCRIBE_MODE=true            # Use VLM description method

# VL Provider Configuration
MIMIR_EMBEDDINGS_VL_PROVIDER=llama.cpp                # Provider type
MIMIR_EMBEDDINGS_VL_API=http://llama-vl-server:8080   # VL server endpoint
MIMIR_EMBEDDINGS_VL_API_PATH=/v1/chat/completions     # OpenAI-compatible endpoint
MIMIR_EMBEDDINGS_VL_API_KEY=dummy-key                 # Not required for local
MIMIR_EMBEDDINGS_VL_MODEL=qwen2.5-vl                  # Model name

# Context & Generation Settings (MAXED OUT)
MIMIR_EMBEDDINGS_VL_CONTEXT_SIZE=131072               # 128K tokens (7b/72b)
MIMIR_EMBEDDINGS_VL_MAX_TOKENS=2048                   # Max description length
MIMIR_EMBEDDINGS_VL_TEMPERATURE=0.7                   # Balanced creativity
MIMIR_EMBEDDINGS_VL_DIMENSIONS=768                    # Falls back to text dims
```

### Docker Compose Service

```yaml
llama-vl-server:
  image: timothyswt/llama-cpp-server-arm64-qwen-vl:8b  # or :4b
  container_name: llama_vl_server
  ports:
    - "8081:8080"  # Different port to avoid conflict
  environment:
    # Runtime overrides (model is baked into image)
    - LLAMA_ARG_CTX_SIZE=131072   # 128K tokens (8b), use 32768 for 4b
    - LLAMA_ARG_N_PARALLEL=4
    - LLAMA_ARG_THREADS=-1        # Use all available threads
    - LLAMA_ARG_HOST=0.0.0.0
    - LLAMA_ARG_PORT=8080
    # Vision-specific settings
    - LLAMA_ARG_TEMPERATURE=0.7
    - LLAMA_ARG_TOP_K=20
    - LLAMA_ARG_TOP_P=0.95
  restart: unless-stopped
  healthcheck:
    test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
    interval: 30s
    timeout: 10s
    retries: 3
    start_period: 60s  # VL models take longer to load
  networks:
    - mcp_network
```

## 🚀 Building Docker Images

### Prerequisites

1. **Pull models from Ollama** (already done):

   ```bash
   ollama pull qwen3-vl:4b
   ollama pull qwen3-vl:8b
   ```

2. **Extract GGUF files** (already done):

   ```bash
   # Models are copied to docker/llama-cpp/models/
   ls -lh docker/llama-cpp/models/
   # qwen3-vl-4b.gguf (3.3 GB)
   # qwen3-vl-8b.gguf (6.1 GB)
   ```

### Build Commands

```bash
# Build 4b image (faster, lighter)
./scripts/build-llama-cpp-qwen-vl.sh 4b

# Build 8b image (recommended, best balance)
./scripts/build-llama-cpp-qwen-vl.sh 8b

# Build 32b image (requires downloading model first)
./scripts/build-llama-cpp-qwen-vl.sh 32b
```

### Build Process

Each build:

1. ✅ Clones **latest** llama.cpp (main branch) for qwen3-vl support
2. ✅ Compiles llama.cpp server with multimodal support
3. ✅ Copies **only** the specified model (keeps image size down)
4. ✅ Tests locally on port 8080
5. ✅ Prompts for Docker Hub push

## 🧪 Testing

### Health Check

```bash
curl http://localhost:8081/health
```

### Image Description Test

```bash
# Convert image to base64
IMAGE_BASE64=$(base64 -i /path/to/image.png | tr -d '\n')

# Test vision capabilities
curl -X POST http://localhost:8081/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3-vl",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "Describe this image in detail."},
        {"type": "image_url", "image_url": {"url": "data:image/png;base64,'$IMAGE_BASE64'"}}
      ]
    }],
    "max_tokens": 2048,
    "temperature": 0.7
  }'
```
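The server replies in the standard OpenAI chat-completion format, so the generated description sits at `choices[0].message.content`. As a small convenience sketch (assuming `jq` is installed; it is not required by Mimir), the same request can capture just the description text:

```bash
# Reuse the request above, but keep only the generated description.
# llama.cpp's /v1/chat/completions returns the OpenAI response shape,
# so the text lives at .choices[0].message.content.
DESCRIPTION=$(curl -s -X POST http://localhost:8081/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3-vl",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "Describe this image in detail."},
        {"type": "image_url", "image_url": {"url": "data:image/png;base64,'$IMAGE_BASE64'"}}
      ]
    }],
    "max_tokens": 2048
  }' | jq -r '.choices[0].message.content')

echo "$DESCRIPTION"
```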
## 📊 Performance Tuning

### Context Size by Model

```bash
# 4b model
LLAMA_ARG_CTX_SIZE=32768    # 32K tokens

# 8b model (recommended)
LLAMA_ARG_CTX_SIZE=131072   # 128K tokens

# 32b model
LLAMA_ARG_CTX_SIZE=131072   # 128K tokens
```

### Parallelism

```bash
LLAMA_ARG_N_PARALLEL=4   # Process 4 requests simultaneously
LLAMA_ARG_THREADS=-1     # Use all available CPU threads
```

### Generation Quality

```bash
LLAMA_ARG_TEMPERATURE=0.7   # Balanced (0.0 = deterministic, 1.0 = creative)
LLAMA_ARG_TOP_K=20          # Consider top 20 tokens
LLAMA_ARG_TOP_P=0.95        # Nucleus sampling threshold
```

## 🏷️ Docker Image Tags

```
timothyswt/llama-cpp-server-arm64-qwen-vl:4b
timothyswt/llama-cpp-server-arm64-qwen-vl:4b-latest
timothyswt/llama-cpp-server-arm64-qwen-vl:8b
timothyswt/llama-cpp-server-arm64-qwen-vl:8b-latest
timothyswt/llama-cpp-server-arm64-qwen-vl:latest   (→ 8b)
```

## 🐛 Troubleshooting

### Error: "key not found in model: qwen3vl.rope.dimension_sections"

**Cause:** Old llama.cpp version doesn't support qwen3-vl models.

**Solution:** Rebuild with latest llama.cpp:

```bash
# Dockerfile now uses latest main branch
./scripts/build-llama-cpp-qwen-vl.sh 4b
```

### Container Keeps Restarting

**Check logs:**

```bash
docker logs llama_vl_server --tail 50
```

**Common causes:**

- Model file not found/corrupted
- Insufficient memory (need ~4 GB for 4b, ~8 GB for 8b)
- llama.cpp version incompatibility

### Slow Inference

**Solutions:**

- Use the 4b model instead of 8b
- Reduce `LLAMA_ARG_CTX_SIZE`
- Reduce `LLAMA_ARG_N_PARALLEL`
- Enable GPU support (if available)

## 📐 Image Processing Strategy

### Automatic Downscaling (No Chunking Required)

Qwen2.5-VL has built-in dynamic resolution handling.

**Model Limits:**

- `image_max_pixels`: 3,211,264 (~1792×1792 pixels, 3.2 MP)
- `image_min_pixels`: 6,272 (~79×79 pixels)
- `patch_size`: 14×14 pixels
- `image_size`: 560 pixels

**Supported Image Sizes:**

- ✅ 1920×1080 (Full HD) = 2.07 MP → **No resize needed**
- ✅ 1792×1792 (Square) = 3.21 MP → **No resize needed**
- ⚠️ 2560×1440 (2K) = 3.69 MP → **Auto-resized**
- ⚠️ 3840×2160 (4K) = 8.29 MP → **Auto-resized**

**Processing Pipeline:**

```
1. Check image dimensions
2. If > 3.2 MP: Resize to fit (preserve aspect ratio)
3. Convert to Base64 Data URL
4. Send to Qwen2.5-VL
5. Receive text description
6. Embed description with metadata
7. Store in Neo4j
```

**Why No Chunking:**

- ✅ **Dynamic Resolution ViT**: Auto-segments into 14×14 patches
- ✅ **MRoPE**: Preserves spatial relationships across entire image
- ✅ **Single API call**: Faster, simpler, more reliable
- ✅ **Semantic search**: Needs "gist", not pixel-perfect detail
- ✅ **Negligible resize time**: ~50-300ms vs ~12-35s VL processing

**Configuration:**

```bash
MIMIR_IMAGE_MAX_PIXELS=3211264    # Qwen2.5-VL limit
MIMIR_IMAGE_TARGET_SIZE=1536      # Conservative resize target
MIMIR_IMAGE_RESIZE_QUALITY=90     # JPEG quality after resize
```
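Mimir performs this check and resize internally before the VL call, so no manual preprocessing is needed. Purely as an illustration of what steps 1-3 of the pipeline do (ImageMagick is not a Mimir dependency, and the file paths here are placeholders), the same logic looks roughly like this, reusing the pixel budget and JPEG quality from the configuration above:

```bash
# Illustration only: mirrors Mimir's internal downscaling using ImageMagick.
MAX_PIXELS=3211264              # matches MIMIR_IMAGE_MAX_PIXELS
IMG=/path/to/image.png          # placeholder input

# 1. Check image dimensions
read -r W H < <(identify -format "%w %h" "$IMG")

# 2. If over the pixel budget, resize to fit (the "@" geometry flag preserves aspect ratio)
if [ $((W * H)) -gt "$MAX_PIXELS" ]; then
  convert "$IMG" -resize "${MAX_PIXELS}@" -quality 90 /tmp/resized.jpg
  IMG=/tmp/resized.jpg          # use image/jpeg in the data URL for this file
fi

# 3. Convert to Base64 for the data URL sent to the VL server
IMAGE_BASE64=$(base64 -i "$IMG" | tr -d '\n')
```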
## 🔐 Security

- Models run locally (no external API calls)
- No API key required
- Images are processed locally
- Descriptions stored in Neo4j

## 📚 Related Documentation

- [Metadata Enriched Embeddings](../guides/METADATA_ENRICHED_EMBEDDINGS.md)
- [Qwen3-VL README](../../docker/llama-cpp/README-QWEN-VL.md)
- [Environment Variables](../../env.example)

## 🔗 External Resources

- [Qwen3-VL Model Card](https://huggingface.co/Qwen/Qwen3-VL)
- [llama.cpp GitHub](https://github.com/ggerganov/llama.cpp)
- [GGUF Format Spec](https://github.com/ggerganov/ggml/blob/master/docs/gguf.md)

## 📝 License

Qwen3-VL models are licensed under **Apache 2.0**, compatible with MIT projects.
