# Vision-Language Pipeline Proposal for NornicDB

**Status:** PROPOSAL
**Version:** 1.0.0
**Date:** December 2024
**Author:** Architecture Review

---

## Executive Summary

This proposal outlines the addition of a **Vision-Language (VL) model** as a third model slot in NornicDB's model stack, enabling automatic image understanding and semantic search across image content.

### What We're Building

Adding a VL model creates a powerful image understanding pipeline:

1. Detect nodes with an `:Image` label or image properties
2. Scale images to ≤3.2MP (as Mimir does for multimodal input)
3. Run the scaled image through a VL model (Qwen2.5-VL-2B) to get a text description
4. Combine the description with the node's properties
5. Generate a text embedding using the existing BGE-M3 model
6. Store the embedding for semantic search

---

## 1. Architecture Overview

### Current Model Stack (2 slots)

```
┌──────────────┐  ┌──────────────┐
│  Embedding   │  │  Reasoning   │
│    Model     │  │     SLM      │
│  (BGE-M3)    │  │  (Heimdall)  │
└──────────────┘  └──────────────┘
```

### Proposed Model Stack (3 slots)

```
┌─────────────────────────────────────────────────────────────────┐
│                      NornicDB Model Stack                       │
├─────────────────────────────────────────────────────────────────┤
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────────┐      │
│  │  Embedding   │  │  Reasoning   │  │ Vision-Language  │      │
│  │    Model     │  │     SLM      │  │      Model       │      │
│  │  (BGE-M3)    │  │  (Heimdall)  │  │  (Qwen2.5-VL)    │      │
│  │  1024 dims   │  │   0.5B-3B    │  │      2B-7B       │      │
│  └──────┬───────┘  └──────┬───────┘  └────────┬─────────┘      │
│         │                 │                   │                │
│         └─────────────────┴───────────────────┘                │
│                           │                                    │
│                  ┌────────▼────────┐                           │
│                  │  Model Manager  │                           │
│                  │  (3 slots now)  │                           │
│                  └─────────────────┘                           │
└─────────────────────────────────────────────────────────────────┘
```

---

## 2. Image Processing Flow

```
CREATE (n:Image {data: $base64, filename: 'photo.jpg'})
        │
        ▼
Node Detector
  └─ Is label :Image? Or has image_data/image_url property?
  └─ YES → Route to Vision Pipeline
        │
        ▼
Image Preprocessor
  └─ Decode base64 or fetch URL
  └─ Scale to ≤3.2MP (preserve aspect ratio)
  └─ Convert to RGB if needed
        │
        ▼
VL Model (Qwen2.5-VL-2B)
  └─ Input: scaled image + prompt
  └─ Output: text description
     "A sunset over mountains with orange and purple clouds..."
        │
        ▼
Text Combiner
  └─ description + node.filename + node.tags + node.caption
  └─ Result: "Image: sunset over mountains... filename: photo..."
        │
        ▼
Text Embedder (BGE-M3)
  └─ Generate 1024-dim embedding from combined text
        │
        ▼
Store: node._embedding = [0.123, 0.456, ...]
Store: node._vl_description = "A sunset over mountains..."
```
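The ≤3.2MP budget is enforced with a uniform scale factor of `sqrt(maxPixels / currentPixels)`, which preserves aspect ratio. A minimal sketch of that arithmetic (illustration only; the full implementation is in Section 7):

```go
package main

import (
	"fmt"
	"math"
)

// targetDims returns the dimensions an image would be scaled to so that
// width*height <= maxPixels, preserving aspect ratio.
func targetDims(w, h, maxPixels int) (int, int) {
	if w*h <= maxPixels {
		return w, h // already within budget, no scaling
	}
	f := math.Sqrt(float64(maxPixels) / float64(w*h))
	return int(float64(w) * f), int(float64(h) * f)
}

func main() {
	// A 48MP photo (8000x6000) scaled to the 3.2MP default budget.
	w, h := targetDims(8000, 6000, 3200000)
	fmt.Println(w, h) // 2065 1549 (~3.2MP)
}
```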
---

## 3. Configuration

### Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `NORNICDB_VISION_ENABLED` | `false` | Enable vision pipeline |
| `NORNICDB_VISION_MODEL` | `qwen2.5-vl-2b-instruct` | VL model to use |
| `NORNICDB_VISION_GPU_LAYERS` | `-1` | GPU layers (-1 = auto) |
| `NORNICDB_VISION_MAX_PIXELS` | `3200000` | Max pixels before scaling (3.2MP) |
| `NORNICDB_VISION_PROMPT` | (see below) | Custom prompt for the VL model |

### Default Vision Prompt

```
Describe this image in detail, including objects, colors, composition, and any text visible.
```

### Go Configuration Types

```go
// pkg/config/features.go

type FeatureFlags struct {
	// ... existing fields ...

	// Vision-Language Model
	VisionEnabled   bool   `json:"vision_enabled" env:"NORNICDB_VISION_ENABLED"`
	VisionModel     string `json:"vision_model" env:"NORNICDB_VISION_MODEL"`
	VisionGPULayers int    `json:"vision_gpu_layers" env:"NORNICDB_VISION_GPU_LAYERS"`
	VisionMaxPixels int    `json:"vision_max_pixels" env:"NORNICDB_VISION_MAX_PIXELS"` // Default: 3200000 (3.2MP)
	VisionPrompt    string `json:"vision_prompt" env:"NORNICDB_VISION_PROMPT"`         // Custom prompt for VL
}

// Defaults
const (
	DefaultVisionModel     = "qwen2.5-vl-2b-instruct"
	DefaultVisionMaxPixels = 3200000 // 3.2MP
	DefaultVisionPrompt    = "Describe this image in detail, including objects, colors, composition, and any text visible."
)
```

---

## 4. Node Detection Strategy

### Detection Logic

Nodes are processed by the vision pipeline if they match ANY of these criteria (a usage sketch follows the implementation below):

1. **Labels**: `:Image`, `:Photo`, `:Picture`
2. **Properties**: `image_data`, `image_url`, `base64`
3. **Filename extension**: `.jpg`, `.jpeg`, `.png`, `.gif`, `.webp`, `.bmp`

### Implementation

```go
// pkg/vision/detector.go
package vision

import (
	"path/filepath"
	"strings"

	"github.com/orneryd/nornicdb/pkg/storage"
)

// IsImageNode checks if a node should be processed by the vision pipeline.
// Checks both labels and properties.
func IsImageNode(node *storage.Node) bool {
	// Check labels
	for _, label := range node.Labels {
		if label == "Image" || label == "Photo" || label == "Picture" {
			return true
		}
	}

	// Check for image data properties
	if _, hasData := node.Properties["image_data"]; hasData {
		return true
	}
	if _, hasURL := node.Properties["image_url"]; hasURL {
		return true
	}
	if _, hasBase64 := node.Properties["base64"]; hasBase64 {
		return true
	}

	// Check for common image extensions in filename
	if filename, ok := node.Properties["filename"].(string); ok {
		ext := strings.ToLower(filepath.Ext(filename))
		switch ext {
		case ".jpg", ".jpeg", ".png", ".gif", ".webp", ".bmp":
			return true
		}
	}

	return false
}

// ImageNodeLabels returns all labels that trigger vision processing.
func ImageNodeLabels() []string {
	return []string{"Image", "Photo", "Picture"}
}

// ImageNodeProperties returns all property names that trigger vision processing.
func ImageNodeProperties() []string {
	return []string{"image_data", "image_url", "base64"}
}

// ImageExtensions returns all file extensions that trigger vision processing.
func ImageExtensions() []string {
	return []string{".jpg", ".jpeg", ".png", ".gif", ".webp", ".bmp"}
}
```
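A quick illustration of the detection rules as an example test — a hedged sketch, assuming `storage.Node` exposes `Labels []string` and `Properties map[string]interface{}` as the detector code above implies:

```go
package vision_test

import (
	"fmt"

	"github.com/orneryd/nornicdb/pkg/storage"
	"github.com/orneryd/nornicdb/pkg/vision"
)

func ExampleIsImageNode() {
	// Matches via label.
	byLabel := &storage.Node{Labels: []string{"Image"}}

	// Matches via filename extension alone (case-insensitive).
	byExt := &storage.Node{Properties: map[string]interface{}{"filename": "scan.PNG"}}

	// A plain text node does not match.
	text := &storage.Node{Properties: map[string]interface{}{"content": "hello"}}

	fmt.Println(vision.IsImageNode(byLabel), vision.IsImageNode(byExt), vision.IsImageNode(text))
	// Output: true true false
}
```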
---

## 5. Types and Interfaces

```go
// pkg/vision/types.go
package vision

import (
	"context"
	"time"
)

// Config for the vision pipeline.
type Config struct {
	// Enabled activates the vision pipeline
	Enabled bool
	// Model is the VL model name (without .gguf extension)
	Model string
	// ModelsDir is the directory containing GGUF models.
	// Uses NORNICDB_MODELS_DIR (same as embedder and Heimdall).
	ModelsDir string
	// GPULayers controls GPU offloading (-1 = auto)
	GPULayers int
	// MaxPixels is the maximum pixels before scaling (default: 3.2MP)
	MaxPixels int
	// Prompt is sent with the image to the VL model
	Prompt string
}

// DefaultConfig returns sensible defaults for the vision pipeline.
func DefaultConfig() Config {
	return Config{
		Enabled:   false,
		Model:     "qwen2.5-vl-2b-instruct",
		ModelsDir: "",      // Use NORNICDB_MODELS_DIR
		GPULayers: -1,      // Auto
		MaxPixels: 3200000, // 3.2MP
		Prompt:    "Describe this image in detail, including objects, colors, composition, and any text visible.",
	}
}

// ImageInput represents an image to be processed.
type ImageInput struct {
	// Data is the raw image bytes (decoded from base64 or fetched from URL)
	Data []byte
	// MimeType identifies the image format ("image/jpeg", "image/png", etc.)
	MimeType string
	// Width is the original image width in pixels
	Width int
	// Height is the original image height in pixels
	Height int
	// Source describes where the image came from (for logging)
	Source string // "base64", "url", "file"
}

// VisionResult contains the VL model output.
type VisionResult struct {
	// Description is the generated text describing the image
	Description string
	// Duration is the processing time
	Duration time.Duration
	// Scaled indicates whether the image was scaled down
	Scaled bool
	// FinalWidth is the width after scaling (same as original if not scaled)
	FinalWidth int
	// FinalHeight is the height after scaling (same as original if not scaled)
	FinalHeight int
	// OriginalWidth is the original image width
	OriginalWidth int
	// OriginalHeight is the original image height
	OriginalHeight int
}

// VisionGenerator is the interface for VL models.
type VisionGenerator interface {
	// DescribeImage generates a text description of an image.
	// The prompt guides what aspects of the image to describe.
	DescribeImage(ctx context.Context, img *ImageInput, prompt string) (*VisionResult, error)

	// ModelInfo returns information about the loaded model.
	ModelInfo() ModelInfo

	// Close releases model resources.
	Close() error
}

// ModelInfo contains metadata about the loaded VL model.
type ModelInfo struct {
	Name      string
	Path      string
	SizeBytes int64
	GPULayers int
	LoadedAt  time.Time
}

// ImageProcessor handles image scaling and format conversion.
type ImageProcessor interface {
	// Scale resizes an image to fit within maxPixels while preserving aspect ratio.
	Scale(img *ImageInput, maxPixels int) (*ImageInput, error)

	// Decode parses image bytes and returns dimensions.
	Decode(data []byte) (*ImageInput, error)

	// SupportedFormats returns the list of supported MIME types.
	SupportedFormats() []string
}
```
---

## 6. Integration with Embedding Pipeline

```go
// pkg/embed/embedder.go - Modified to support vision
package embed

import (
	"context"
	"encoding/base64"
	"fmt"
	"strings"

	"github.com/orneryd/nornicdb/pkg/storage"
	"github.com/orneryd/nornicdb/pkg/vision"
)

// Embedder generates embeddings for nodes.
type Embedder struct {
	// ... existing fields ...

	// Vision support
	visionEnabled bool
	visionConfig  vision.Config
	visionGen     vision.VisionGenerator
	imgProcessor  vision.ImageProcessor
}

// GenerateNodeEmbedding creates an embedding for a node.
// Automatically detects image nodes and routes them through the vision pipeline.
func (e *Embedder) GenerateNodeEmbedding(ctx context.Context, node *storage.Node) ([]float32, error) {
	// Check if this is an image node
	if vision.IsImageNode(node) && e.visionEnabled {
		return e.generateImageEmbedding(ctx, node)
	}

	// Standard text embedding
	return e.generateTextEmbedding(ctx, node)
}

// generateImageEmbedding processes an image node through the vision pipeline.
func (e *Embedder) generateImageEmbedding(ctx context.Context, node *storage.Node) ([]float32, error) {
	// 1. Extract image data from node
	imgData, mimeType, err := e.extractImageData(node)
	if err != nil {
		return nil, fmt.Errorf("failed to extract image: %w", err)
	}

	// 2. Decode to get dimensions, keeping the magic-byte MIME type
	img, err := e.imgProcessor.Decode(imgData)
	if err != nil {
		return nil, fmt.Errorf("failed to decode image: %w", err)
	}
	img.MimeType = mimeType

	// 3. Scale image if needed
	if img.Width*img.Height > e.visionConfig.MaxPixels {
		img, err = e.imgProcessor.Scale(img, e.visionConfig.MaxPixels)
		if err != nil {
			return nil, fmt.Errorf("failed to scale image: %w", err)
		}
	}

	// 4. Get description from VL model
	result, err := e.visionGen.DescribeImage(ctx, img, e.visionConfig.Prompt)
	if err != nil {
		return nil, fmt.Errorf("vision model failed: %w", err)
	}

	// 5. Combine description with node properties
	combinedText := e.combineImageContext(result.Description, node)

	// 6. Store description on node for reference
	node.Properties["_vl_description"] = result.Description
	node.Properties["_vl_processed"] = true

	// 7. Generate text embedding from combined context
	return e.generateTextEmbeddingFromString(ctx, combinedText)
}

// extractImageData gets image bytes from a node's properties.
func (e *Embedder) extractImageData(node *storage.Node) ([]byte, string, error) {
	// Try base64 encoded data
	if data, ok := node.Properties["image_data"].(string); ok {
		decoded, err := base64.StdEncoding.DecodeString(data)
		if err != nil {
			return nil, "", fmt.Errorf("invalid base64: %w", err)
		}
		return decoded, detectMimeType(decoded), nil
	}

	// Try base64 property
	if data, ok := node.Properties["base64"].(string); ok {
		decoded, err := base64.StdEncoding.DecodeString(data)
		if err != nil {
			return nil, "", fmt.Errorf("invalid base64: %w", err)
		}
		return decoded, detectMimeType(decoded), nil
	}

	// Try URL
	if url, ok := node.Properties["image_url"].(string); ok {
		// Fetch from URL (with timeout); see the sketch below
		data, mimeType, err := e.fetchImageFromURL(url)
		if err != nil {
			return nil, "", fmt.Errorf("failed to fetch image: %w", err)
		}
		return data, mimeType, nil
	}

	return nil, "", fmt.Errorf("no image data found in node properties")
}
```
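`fetchImageFromURL` is referenced above but not defined in this proposal. A minimal sketch of what it might look like, applying the timeout and size limits from the Security Considerations section (the 10-second timeout and 10MB cap are illustrative choices; requires `io`, `net/http`, and `time` in addition to the imports shown above):

```go
// fetchImageFromURL downloads an image with a timeout and a size cap.
// Sketch only: host allow-listing from the Security Considerations
// section is omitted for brevity.
func (e *Embedder) fetchImageFromURL(url string) ([]byte, string, error) {
	client := &http.Client{Timeout: 10 * time.Second}

	resp, err := client.Get(url)
	if err != nil {
		return nil, "", fmt.Errorf("fetch failed: %w", err)
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		return nil, "", fmt.Errorf("unexpected status: %s", resp.Status)
	}

	// Cap the read to avoid large-image DoS.
	const maxBytes = 10 << 20 // 10MB, illustrative
	data, err := io.ReadAll(io.LimitReader(resp.Body, maxBytes+1))
	if err != nil {
		return nil, "", fmt.Errorf("read failed: %w", err)
	}
	if len(data) > maxBytes {
		return nil, "", fmt.Errorf("image exceeds %d byte limit", maxBytes)
	}

	return data, detectMimeType(data), nil
}
```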
```go
// combineImageContext merges the VL description with node properties.
func (e *Embedder) combineImageContext(description string, node *storage.Node) string {
	var parts []string

	// Add VL description first (most important)
	parts = append(parts, "Image description: "+description)

	// Add filename if present
	if filename, ok := node.Properties["filename"].(string); ok {
		parts = append(parts, "Filename: "+filename)
	}

	// Add user-provided caption if present
	if caption, ok := node.Properties["caption"].(string); ok {
		parts = append(parts, "Caption: "+caption)
	}

	// Add alt text if present
	if alt, ok := node.Properties["alt"].(string); ok {
		parts = append(parts, "Alt text: "+alt)
	}

	// Add tags if present
	if tags, ok := node.Properties["tags"].([]interface{}); ok {
		tagStrs := make([]string, len(tags))
		for i, t := range tags {
			tagStrs[i] = fmt.Sprint(t)
		}
		parts = append(parts, "Tags: "+strings.Join(tagStrs, ", "))
	}

	// Add title if present
	if title, ok := node.Properties["title"].(string); ok {
		parts = append(parts, "Title: "+title)
	}

	return strings.Join(parts, "\n")
}

// detectMimeType identifies the image format from magic bytes.
func detectMimeType(data []byte) string {
	if len(data) < 12 {
		return "application/octet-stream"
	}

	// Check magic bytes
	switch {
	case data[0] == 0xFF && data[1] == 0xD8:
		return "image/jpeg"
	case data[0] == 0x89 && data[1] == 0x50 && data[2] == 0x4E && data[3] == 0x47:
		return "image/png"
	case data[0] == 0x47 && data[1] == 0x49 && data[2] == 0x46:
		return "image/gif"
	case data[0] == 0x52 && data[1] == 0x49 && data[2] == 0x46 && data[3] == 0x46 &&
		data[8] == 0x57 && data[9] == 0x45 && data[10] == 0x42 && data[11] == 0x50:
		// "RIFF" alone also matches WAV/AVI; require "WEBP" at offset 8
		return "image/webp"
	case data[0] == 0x42 && data[1] == 0x4D:
		return "image/bmp"
	default:
		return "application/octet-stream"
	}
}
```

---

## 7. Image Scaling Implementation

```go
// pkg/vision/scaler.go
package vision

import (
	"bytes"
	"fmt"
	"image"
	"image/jpeg"
	"image/png"
	"math"

	// For additional format support
	_ "image/gif"

	_ "golang.org/x/image/webp"
)

// StandardImageProcessor implements ImageProcessor using Go's image package.
type StandardImageProcessor struct{}

// NewImageProcessor creates a new image processor.
func NewImageProcessor() *StandardImageProcessor {
	return &StandardImageProcessor{}
}

// Decode parses image bytes and returns an ImageInput with dimensions.
func (p *StandardImageProcessor) Decode(data []byte) (*ImageInput, error) {
	reader := bytes.NewReader(data)
	cfg, format, err := image.DecodeConfig(reader)
	if err != nil {
		return nil, fmt.Errorf("failed to decode image config: %w", err)
	}

	return &ImageInput{
		Data:     data,
		MimeType: "image/" + format,
		Width:    cfg.Width,
		Height:   cfg.Height,
		Source:   "decoded",
	}, nil
}
```
```go
// Scale resizes an image to fit within maxPixels while preserving aspect ratio.
func (p *StandardImageProcessor) Scale(img *ImageInput, maxPixels int) (*ImageInput, error) {
	currentPixels := img.Width * img.Height
	if currentPixels <= maxPixels {
		// No scaling needed
		return img, nil
	}

	// Calculate scale factor
	scaleFactor := math.Sqrt(float64(maxPixels) / float64(currentPixels))
	newWidth := int(float64(img.Width) * scaleFactor)
	newHeight := int(float64(img.Height) * scaleFactor)

	// Decode original image
	reader := bytes.NewReader(img.Data)
	original, format, err := image.Decode(reader)
	if err != nil {
		return nil, fmt.Errorf("failed to decode image: %w", err)
	}

	// Simple nearest-neighbor sampling
	// (could use more sophisticated resampling; see the sketch below)
	scaled := image.NewRGBA(image.Rect(0, 0, newWidth, newHeight))
	for y := 0; y < newHeight; y++ {
		for x := 0; x < newWidth; x++ {
			srcX := int(float64(x) / scaleFactor)
			srcY := int(float64(y) / scaleFactor)
			scaled.Set(x, y, original.At(srcX, srcY))
		}
	}

	// Encode back to bytes
	var buf bytes.Buffer
	switch format {
	case "jpeg":
		err = jpeg.Encode(&buf, scaled, &jpeg.Options{Quality: 85})
	case "png":
		err = png.Encode(&buf, scaled)
	default:
		// Default to JPEG for other formats
		err = jpeg.Encode(&buf, scaled, &jpeg.Options{Quality: 85})
	}
	if err != nil {
		return nil, fmt.Errorf("failed to encode scaled image: %w", err)
	}

	return &ImageInput{
		Data:     buf.Bytes(),
		MimeType: img.MimeType,
		Width:    newWidth,
		Height:   newHeight,
		Source:   "scaled",
	}, nil
}

// SupportedFormats returns the list of supported MIME types.
func (p *StandardImageProcessor) SupportedFormats() []string {
	return []string{
		"image/jpeg",
		"image/png",
		"image/gif",
		"image/webp",
		"image/bmp",
	}
}
```
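The roadmap's Phase 3 calls for higher-quality resampling options. A hedged sketch of an alternative loop body using `golang.org/x/image/draw` (Catmull-Rom is the highest-quality interpolator that package ships; this is one possible upgrade, not the proposal's chosen implementation):

```go
import (
	"image"

	xdraw "golang.org/x/image/draw"
)

// scaleHighQuality resamples src to newWidth x newHeight using Catmull-Rom
// interpolation, which avoids the aliasing of nearest-neighbor sampling.
func scaleHighQuality(src image.Image, newWidth, newHeight int) *image.RGBA {
	dst := image.NewRGBA(image.Rect(0, 0, newWidth, newHeight))
	xdraw.CatmullRom.Scale(dst, dst.Bounds(), src, src.Bounds(), xdraw.Over, nil)
	return dst
}
```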
---

## 8. VL Model Integration with llama.cpp

```go
// pkg/vision/llama_vision.go
package vision

import (
	"context"
	"fmt"
	"os"
	"path/filepath"
	"time"

	"github.com/orneryd/nornicdb/pkg/localllm"
)

// LlamaVisionGenerator implements VisionGenerator using llama.cpp.
type LlamaVisionGenerator struct {
	model     *localllm.Model
	modelInfo ModelInfo
	config    Config
}

// NewLlamaVisionGenerator creates a new VL generator.
func NewLlamaVisionGenerator(cfg Config) (*LlamaVisionGenerator, error) {
	// Find model file
	modelPath := filepath.Join(cfg.ModelsDir, cfg.Model+".gguf")
	if _, err := os.Stat(modelPath); os.IsNotExist(err) {
		return nil, fmt.Errorf("vision model not found: %s", modelPath)
	}

	// Load model via llama.cpp.
	// Note: this requires llama.cpp with vision support (LLaVA architecture).
	model, err := localllm.LoadModel(modelPath, localllm.ModelOptions{
		GPULayers: cfg.GPULayers,
		Threads:   4,
		Vision:    true, // Enable vision mode
	})
	if err != nil {
		return nil, fmt.Errorf("failed to load vision model: %w", err)
	}

	fileInfo, _ := os.Stat(modelPath)

	return &LlamaVisionGenerator{
		model:  model,
		config: cfg,
		modelInfo: ModelInfo{
			Name:      cfg.Model,
			Path:      modelPath,
			SizeBytes: fileInfo.Size(),
			GPULayers: cfg.GPULayers,
			LoadedAt:  time.Now(),
		},
	}, nil
}

// DescribeImage generates a text description of an image.
func (g *LlamaVisionGenerator) DescribeImage(ctx context.Context, img *ImageInput, prompt string) (*VisionResult, error) {
	start := time.Now()

	// Format prompt for vision model.
	// Most VL models expect: <image>\n{prompt}
	fullPrompt := fmt.Sprintf("<image>\n%s", prompt)

	// Run inference with image
	response, err := g.model.GenerateWithImage(ctx, fullPrompt, img.Data, localllm.GenerateOptions{
		MaxTokens:   512,
		Temperature: 0.1,
		StopTokens:  []string{"<|endoftext|>", "<|im_end|>"},
	})
	if err != nil {
		return nil, fmt.Errorf("vision inference failed: %w", err)
	}

	return &VisionResult{
		Description:    response,
		Duration:       time.Since(start),
		Scaled:         img.Source == "scaled",
		FinalWidth:     img.Width,
		FinalHeight:    img.Height,
		OriginalWidth:  img.Width, // Would need to track the original dimensions separately
		OriginalHeight: img.Height,
	}, nil
}

// ModelInfo returns information about the loaded model.
func (g *LlamaVisionGenerator) ModelInfo() ModelInfo {
	return g.modelInfo
}

// Close releases model resources.
func (g *LlamaVisionGenerator) Close() error {
	if g.model != nil {
		return g.model.Close()
	}
	return nil
}
```

---

## 9. Docker Configuration

### New Build Target

```dockerfile
# docker/Dockerfile.arm64-metal-bge-heimdall-vision
FROM timothyswt/nornicdb-arm64-metal-bge-heimdall:latest

# Add vision model
# Qwen2.5-VL-2B (q4_k_m) is ~2GB
COPY models/qwen2.5-vl-2b-instruct.gguf /app/models/

# Enable vision by default
ENV NORNICDB_VISION_ENABLED=true
ENV NORNICDB_VISION_MODEL=qwen2.5-vl-2b-instruct
ENV NORNICDB_VISION_MAX_PIXELS=3200000

# Total image size: ~3.1GB (1.1GB base + ~2GB VL model)
```

### Docker Compose

```yaml
# docker-compose.vision.yml
version: '3.8'
services:
  nornicdb-vision:
    image: timothyswt/nornicdb-arm64-metal-bge-heimdall-vision:latest
    ports:
      - "7474:7474"
      - "7687:7687"
    volumes:
      - nornicdb-data:/data
      - ./custom-models:/app/models  # For BYOM
    environment:
      NORNICDB_HEIMDALL_ENABLED: "true"
      NORNICDB_VISION_ENABLED: "true"
      NORNICDB_VISION_MODEL: "qwen2.5-vl-2b-instruct"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]

volumes:
  nornicdb-data:
```
---

## 10. Usage Examples

### Creating Image Nodes

```cypher
// Create an image node with base64 data
CREATE (img:Image {
  filename: 'vacation.jpg',
  image_data: $base64Data,
  caption: 'Beach sunset in Hawaii',
  tags: ['vacation', 'beach', 'sunset']
})

// The vision pipeline automatically:
// 1. Detects the :Image label
// 2. Scales the image to ≤3.2MP if needed
// 3. Generates a VL description: "A stunning sunset over a tropical beach..."
// 4. Combines the description with node properties
// 5. Generates a text embedding
// 6. Stores _vl_description and _embedding on the node
```

### Creating Image Nodes from URLs

```cypher
// Create an image node from URL
CREATE (img:Image {
  image_url: 'https://example.com/photo.jpg',
  title: 'Product Photo',
  alt: 'Red sneakers on white background'
})
```

### Semantic Search on Images

```cypher
// Find images similar to a text query
// First, embed the query text
CALL db.index.vector.queryNodes('images', 10, $queryEmbedding)
YIELD node, score
RETURN node.filename, node._vl_description, score
ORDER BY score DESC
```

### Querying VL Descriptions

```cypher
// Find images by their generated descriptions
MATCH (img:Image)
WHERE img._vl_description CONTAINS 'sunset'
RETURN img.filename, img._vl_description
```

### Mixed Content Search

```cypher
// Search across images and text content together
CALL db.index.vector.queryNodes('content', 20, $queryEmbedding)
YIELD node, score
RETURN
  CASE WHEN 'Image' IN labels(node) THEN 'IMAGE' ELSE 'TEXT' END AS type,
  node.filename,
  node.content,
  node._vl_description,
  score
ORDER BY score DESC
```

---

## 11. Model Recommendations

### Recommended VL Models

| Model | Size | Quality | Speed | Use Case |
|-------|------|---------|-------|----------|
| `qwen2.5-vl-2b-instruct` | ~2 GB | Good | Fast | **Recommended** - balanced |
| `qwen2.5-vl-7b-instruct` | ~7 GB | Better | Slower | Higher quality descriptions |
| `llava-v1.6-mistral-7b` | ~7 GB | Good | Medium | Alternative option |
| `moondream2` | ~1.5 GB | Basic | Fast | Lightweight option |
| `bakllava-1` | ~4 GB | Good | Medium | Good balance |

### Download Commands

```bash
# Qwen2.5-VL-2B (Recommended)
curl -L -o models/qwen2.5-vl-2b-instruct.gguf \
  "https://huggingface.co/Qwen/Qwen2.5-VL-2B-Instruct-GGUF/resolve/main/qwen2.5-vl-2b-instruct-q4_k_m.gguf"

# Qwen2.5-VL-7B (Higher quality)
curl -L -o models/qwen2.5-vl-7b-instruct.gguf \
  "https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct-GGUF/resolve/main/qwen2.5-vl-7b-instruct-q4_k_m.gguf"

# MoonDream2 (Lightweight)
curl -L -o models/moondream2.gguf \
  "https://huggingface.co/vikhyatk/moondream2/resolve/main/moondream2-gguf/moondream2-q4_k_m.gguf"
```

### Quantization Options

| Quantization | Quality | Size | Speed | Notes |
|--------------|---------|------|-------|-------|
| `q4_k_m` | Good | ~40% | Fast | **Recommended** |
| `q5_k_m` | Better | ~50% | Medium | |
| `q8_0` | Best | ~80% | Slower | |
| `f16` | Original | 100% | Slowest | |

---

## 12. BYOM (Bring Your Own Model)

### Custom Model Setup

```bash
# 1. Download or train your VL model in GGUF format
# 2. Place it in the models directory
cp my-custom-vl-model.gguf /path/to/models/

# 3. Configure NornicDB
export NORNICDB_VISION_MODEL=my-custom-vl-model

# 4. Optionally customize the prompt
export NORNICDB_VISION_PROMPT="Describe this image focusing on: objects, text, colors, and mood."
```

### Docker with Custom Model

```bash
docker run -d \
  -p 7474:7474 \
  -p 7687:7687 \
  -v nornicdb-data:/data \
  -v /path/to/models:/app/models \
  -e NORNICDB_VISION_ENABLED=true \
  -e NORNICDB_VISION_MODEL=my-custom-vl-model \
  timothyswt/nornicdb-arm64-metal-bge-heimdall
```
---

## 13. Performance Considerations

### Memory Requirements

| Model | VRAM (GPU) | RAM (CPU fallback) |
|-------|------------|--------------------|
| qwen2.5-vl-2b | ~3 GB | ~4 GB |
| qwen2.5-vl-7b | ~8 GB | ~10 GB |
| moondream2 | ~2 GB | ~3 GB |

### Processing Time

| Image Size | Scale Time | VL Inference | Embedding | Total |
|------------|------------|--------------|-----------|-------|
| 1MP | 0ms | ~500ms | ~50ms | ~550ms |
| 3.2MP | 0ms | ~600ms | ~50ms | ~650ms |
| 12MP | ~100ms | ~600ms | ~50ms | ~750ms |
| 48MP | ~200ms | ~600ms | ~50ms | ~850ms |

### Optimization Tips

1. **Use GPU acceleration**: Set `NORNICDB_VISION_GPU_LAYERS=-1` for auto
2. **Batch processing**: Process multiple images in parallel (see the sketch after this list)
3. **Pre-scale images**: If you control the input, scale before storing
4. **Use smaller models**: moondream2 is ~3x faster than qwen2.5-vl-7b
5. **Cache descriptions**: `_vl_description` is stored, so no re-processing is needed
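A minimal sketch of tip 2, fanning image nodes out to a fixed pool of workers. `Embedder.GenerateNodeEmbedding` is from Section 6; the worker count is an illustrative choice, and a single shared VL model may serialize inference internally, so measure before assuming a linear speedup (imports: `context`, `sync`, plus the `embed` and `storage` packages):

```go
// embedImagesConcurrently embeds a batch of image nodes using a small
// worker pool. Errors are collected rather than aborting the batch.
func embedImagesConcurrently(ctx context.Context, e *embed.Embedder, nodes []*storage.Node) []error {
	const workers = 4 // illustrative; tune to available VRAM/CPU
	jobs := make(chan *storage.Node)
	errs := make([]error, 0, len(nodes))

	var mu sync.Mutex
	var wg sync.WaitGroup

	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for node := range jobs {
				if _, err := e.GenerateNodeEmbedding(ctx, node); err != nil {
					mu.Lock()
					errs = append(errs, err)
					mu.Unlock()
				}
			}
		}()
	}

	for _, n := range nodes {
		jobs <- n
	}
	close(jobs)
	wg.Wait()
	return errs
}
```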
---

## 14. Multi-Model Memory Management Strategy

### The Problem

Running 3 models simultaneously is memory-intensive:

| Model | VRAM | RAM (CPU) |
|-------|------|-----------|
| BGE-M3 (Embedding) | ~1 GB | ~1.5 GB |
| Qwen2.5-0.5B (Heimdall) | ~1 GB | ~1.5 GB |
| Qwen2.5-VL-2B (Vision) | ~3 GB | ~4 GB |
| **Total (all loaded)** | **~5 GB** | **~7 GB** |

With larger models:

| Model | VRAM | RAM (CPU) |
|-------|------|-----------|
| BGE-M3 (Embedding) | ~1 GB | ~1.5 GB |
| Qwen2.5-3B (Heimdall) | ~4 GB | ~5 GB |
| Qwen2.5-VL-7B (Vision) | ~8 GB | ~10 GB |
| **Total (all loaded)** | **~13 GB** | **~16.5 GB** |

Most systems can't afford to keep all models loaded simultaneously.

### Solution: Adaptive Model Lifecycle Manager

```
┌─────────────────────────────────────────────────────────────────┐
│                    Model Lifecycle Manager                      │
├─────────────────────────────────────────────────────────────────┤
│  ┌─────────────────────────────────────────────────────────┐   │
│  │              Memory Budget Controller                   │   │
│  │        └─ Max VRAM: 8GB   └─ Max RAM: 12GB              │   │
│  └─────────────────────────────────────────────────────────┘   │
│                            │                                    │
│         ┌──────────────────┼──────────────────┐                 │
│         ▼                  ▼                  ▼                 │
│  ┌────────────────┐ ┌────────────────┐ ┌────────────────┐      │
│  │   Embedding    │ │    Heimdall    │ │     Vision     │      │
│  │  (Priority 1)  │ │  (Priority 2)  │ │  (Priority 3)  │      │
│  │   ALWAYS HOT   │ │   WARM/COLD    │ │  COLD/UNLOAD   │      │
│  └────────────────┘ └────────────────┘ └────────────────┘      │
│                                                                 │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │                  LRU Eviction Queue                     │   │
│  │      [Vision: 5min idle] → [Heimdall: 2min idle]        │   │
│  └─────────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────────┘
```

### Model Priority Levels

| Priority | Model | Behavior | Rationale |
|----------|-------|----------|-----------|
| **1 (Highest)** | Embedding (BGE-M3) | Always loaded | Used for every node creation, query, search |
| **2 (Medium)** | Heimdall SLM | Load on demand, keep warm | Used for chat, less frequent than embeddings |
| **3 (Lowest)** | Vision VL | Load on demand, unload quickly | Only for image nodes, most memory-intensive |

### Configuration

```go
// pkg/models/lifecycle.go

type LifecycleConfig struct {
	// Memory budgets
	MaxVRAM int64 `env:"NORNICDB_MAX_VRAM"` // Max GPU memory (bytes), 0 = unlimited
	MaxRAM  int64 `env:"NORNICDB_MAX_RAM"`  // Max CPU memory (bytes), 0 = unlimited

	// Keep-alive durations (how long to keep a model loaded after last use)
	EmbeddingKeepAlive time.Duration `env:"NORNICDB_EMBEDDING_KEEPALIVE"` // Default: forever (0)
	HeimdallKeepAlive  time.Duration `env:"NORNICDB_HEIMDALL_KEEPALIVE"`  // Default: 5 minutes
	VisionKeepAlive    time.Duration `env:"NORNICDB_VISION_KEEPALIVE"`    // Default: 2 minutes

	// Preloading
	PreloadEmbedding bool `env:"NORNICDB_PRELOAD_EMBEDDING"` // Default: true
	PreloadHeimdall  bool `env:"NORNICDB_PRELOAD_HEIMDALL"`  // Default: false
	PreloadVision    bool `env:"NORNICDB_PRELOAD_VISION"`    // Default: false

	// Concurrent model limit (for memory-constrained systems)
	MaxConcurrentModels int `env:"NORNICDB_MAX_CONCURRENT_MODELS"` // Default: 3
}

// Defaults optimized for 8GB systems
func DefaultLifecycleConfig() LifecycleConfig {
	return LifecycleConfig{
		MaxVRAM:             0, // Unlimited (auto-detect)
		MaxRAM:              0, // Unlimited (auto-detect)
		EmbeddingKeepAlive:  0, // Never unload
		HeimdallKeepAlive:   5 * time.Minute,
		VisionKeepAlive:     2 * time.Minute,
		PreloadEmbedding:    true,
		PreloadHeimdall:     false,
		PreloadVision:       false,
		MaxConcurrentModels: 3,
	}
}
```

### Environment Variable Examples

```bash
# Memory-constrained system (8GB total)
export NORNICDB_MAX_VRAM=4294967296       # 4GB VRAM limit
export NORNICDB_MAX_RAM=6442450944        # 6GB RAM limit
export NORNICDB_HEIMDALL_KEEPALIVE=2m     # Unload Heimdall after 2 min idle
export NORNICDB_VISION_KEEPALIVE=30s      # Unload Vision after 30 sec
export NORNICDB_MAX_CONCURRENT_MODELS=2   # Only 2 models at once

# High-memory system (32GB+)
export NORNICDB_PRELOAD_HEIMDALL=true     # Keep Heimdall always loaded
export NORNICDB_PRELOAD_VISION=true       # Keep Vision always loaded
export NORNICDB_HEIMDALL_KEEPALIVE=0      # Never unload
export NORNICDB_VISION_KEEPALIVE=0        # Never unload

# Embedding-only mode (minimal memory)
export NORNICDB_HEIMDALL_ENABLED=false    # Disable Heimdall
export NORNICDB_VISION_ENABLED=false      # Disable Vision
# Only the embedding model is loaded (~1.5GB)
```

### Model States

```go
type ModelState string

const (
	ModelStateUnloaded ModelState = "unloaded" // Not in memory
	ModelStateLoading  ModelState = "loading"  // Currently loading
	ModelStateHot      ModelState = "hot"      // Loaded, recently used
	ModelStateWarm     ModelState = "warm"     // Loaded, idle but within keep-alive
	ModelStateCold     ModelState = "cold"     // Loaded, past keep-alive, candidate for eviction
	ModelStateEvicting ModelState = "evicting" // Being unloaded
)
```

### Lifecycle Manager Interface

```go
// pkg/models/manager.go

type ModelManager interface {
	// Acquire gets a model, loading it if necessary.
	// Blocks until the model is ready or the context is cancelled.
	Acquire(ctx context.Context, modelType ModelType) (Model, error)

	// Release signals that the caller is done with the model.
	// The model may be kept warm or scheduled for eviction.
	Release(modelType ModelType)

	// Preload loads a model without using it (for startup).
	Preload(ctx context.Context, modelType ModelType) error

	// Evict forces a model to unload immediately.
	Evict(modelType ModelType) error

	// Status returns the current state of all models.
	Status() map[ModelType]ModelStatus

	// MemoryUsage returns current memory consumption.
	MemoryUsage() MemoryStats
}

type ModelStatus struct {
	State      ModelState
	LoadedAt   time.Time
	LastUsedAt time.Time
	UseCount   int64
	MemoryVRAM int64
	MemoryRAM  int64
}

type MemoryStats struct {
	TotalVRAM     int64
	UsedVRAM      int64
	AvailableVRAM int64
	TotalRAM      int64
	UsedRAM       int64
	AvailableRAM  int64
	LoadedModels  []ModelType
}
```
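A minimal sketch of the acquire/release pattern from the vision pipeline's point of view — assuming a `models.ModelTypeVision` constant and that the returned `Model` can be asserted to the `vision.VisionGenerator` interface, neither of which is pinned down by this proposal:

```go
// describeWithManagedModel borrows the vision model from the manager for the
// duration of one inference, so the idle keep-alive timer starts as soon as
// the call finishes.
func describeWithManagedModel(ctx context.Context, mgr models.ModelManager,
	img *vision.ImageInput, prompt string) (*vision.VisionResult, error) {

	m, err := mgr.Acquire(ctx, models.ModelTypeVision) // loads on demand, may evict others
	if err != nil {
		return nil, fmt.Errorf("acquire vision model: %w", err)
	}
	defer mgr.Release(models.ModelTypeVision) // Hot -> Warm, keep-alive timer starts

	gen, ok := m.(vision.VisionGenerator) // assumed assertion; see lead-in
	if !ok {
		return nil, fmt.Errorf("acquired model is not a vision generator")
	}
	return gen.DescribeImage(ctx, img, prompt)
}
```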
### Eviction Algorithm

```go
// pkg/models/eviction.go
package models

// (Manager, EvictionCandidate, and the model-type constants are defined
// elsewhere in pkg/models.)
import (
	"fmt"
	"log"
	"sort"
)

// EvictIfNeeded checks the memory budget and evicts models if necessary.
// Uses priority-based LRU eviction.
func (m *Manager) EvictIfNeeded(requiredVRAM, requiredRAM int64) error {
	stats := m.MemoryUsage()

	// Check if we have enough memory
	vramNeeded := (stats.UsedVRAM + requiredVRAM) - m.config.MaxVRAM
	ramNeeded := (stats.UsedRAM + requiredRAM) - m.config.MaxRAM

	if vramNeeded <= 0 && ramNeeded <= 0 {
		return nil // No eviction needed
	}

	// Build eviction candidates (sorted by priority, then LRU)
	candidates := m.getEvictionCandidates()

	for _, candidate := range candidates {
		if vramNeeded <= 0 && ramNeeded <= 0 {
			break
		}

		// Don't evict the embedding model (priority 1)
		if candidate.Type == ModelTypeEmbedding {
			continue
		}

		// Don't evict models currently in use
		if candidate.InUse {
			continue
		}

		// Evict this model
		if err := m.evictModel(candidate.Type); err != nil {
			log.Printf("[ModelManager] Failed to evict %s: %v", candidate.Type, err)
			continue
		}

		vramNeeded -= candidate.MemoryVRAM
		ramNeeded -= candidate.MemoryRAM

		log.Printf("[ModelManager] Evicted %s to free memory (VRAM: %d MB, RAM: %d MB)",
			candidate.Type, candidate.MemoryVRAM/1024/1024, candidate.MemoryRAM/1024/1024)
	}

	if vramNeeded > 0 || ramNeeded > 0 {
		return fmt.Errorf("unable to free enough memory: need VRAM=%d MB, RAM=%d MB",
			vramNeeded/1024/1024, ramNeeded/1024/1024)
	}

	return nil
}

// getEvictionCandidates returns models sorted by eviction priority.
// Lower priority + older last use = evicted first.
func (m *Manager) getEvictionCandidates() []EvictionCandidate {
	var candidates []EvictionCandidate

	for modelType, status := range m.Status() {
		if status.State == ModelStateUnloaded {
			continue
		}

		candidates = append(candidates, EvictionCandidate{
			Type:       modelType,
			Priority:   m.getPriority(modelType),
			LastUsed:   status.LastUsedAt,
			InUse:      status.UseCount > 0,
			MemoryVRAM: status.MemoryVRAM,
			MemoryRAM:  status.MemoryRAM,
		})
	}

	// Sort: lower priority first, then older last-use first
	sort.Slice(candidates, func(i, j int) bool {
		if candidates[i].Priority != candidates[j].Priority {
			return candidates[i].Priority > candidates[j].Priority // Higher number = lower priority
		}
		return candidates[i].LastUsed.Before(candidates[j].LastUsed)
	})

	return candidates
}
```
### Keep-Alive Timer

```go
// pkg/models/keepalive.go
package models

import (
	"log"
	"time"
)

// startKeepAliveTimer starts a goroutine that monitors idle time.
func (m *Manager) startKeepAliveTimer(modelType ModelType) {
	keepAlive := m.getKeepAlive(modelType)
	if keepAlive == 0 {
		return // Never evict
	}

	go func() {
		ticker := time.NewTicker(keepAlive / 2)
		defer ticker.Stop()

		for {
			select {
			case <-ticker.C:
				status := m.getStatus(modelType)
				if status.State == ModelStateUnloaded {
					return
				}

				idleTime := time.Since(status.LastUsedAt)
				if idleTime > keepAlive && status.UseCount == 0 {
					log.Printf("[ModelManager] %s idle for %v, evicting", modelType, idleTime)
					m.evictModel(modelType)
					return
				}
			case <-m.ctx.Done():
				return
			}
		}
	}()
}
```

### Request Flow with Lifecycle Management

```
User creates :Image node
        │
        ▼
ModelManager.Acquire(Vision)
  └─ Vision model not loaded
  └─ Check memory budget: need 3GB VRAM
  └─ Current: Embedding(1GB) + Heimdall(1GB) = 2GB
  └─ Budget: 4GB → OK, load Vision
        │
        ▼
Load Vision model (~2 seconds)
  └─ GPU memory: 3GB allocated
  └─ State: Hot
        │
        ▼
Process image → Generate description → Generate embedding
        │
        ▼
ModelManager.Release(Vision)
  └─ State: Hot → Warm
  └─ Start keep-alive timer (2 minutes)
        │
        │  (2 minutes pass, no more images)
        ▼
Keep-alive timer fires
  └─ Vision idle for 2m
  └─ Evict Vision model
  └─ Free 3GB VRAM
  └─ State: Unloaded
```

### Memory Profiles

Pre-configured profiles for common system sizes (a sketch of the name-to-config mapping follows the YAML):

```yaml
# profiles.yaml
profiles:
  # 8GB system (e.g., MacBook Air M1)
  constrained:
    max_vram: 4GB
    max_ram: 6GB
    max_concurrent_models: 2
    embedding_keepalive: 0   # Always loaded
    heimdall_keepalive: 2m
    vision_keepalive: 30s
    preload_heimdall: false
    preload_vision: false

  # 16GB system (e.g., MacBook Pro M1)
  balanced:
    max_vram: 8GB
    max_ram: 12GB
    max_concurrent_models: 3
    embedding_keepalive: 0   # Always loaded
    heimdall_keepalive: 5m
    vision_keepalive: 2m
    preload_heimdall: true
    preload_vision: false

  # 32GB+ system (e.g., Mac Studio)
  performance:
    max_vram: 0              # Unlimited
    max_ram: 0               # Unlimited
    max_concurrent_models: 3
    embedding_keepalive: 0   # Always loaded
    heimdall_keepalive: 0    # Always loaded
    vision_keepalive: 0      # Always loaded
    preload_heimdall: true
    preload_vision: true

# Usage: NORNICDB_MEMORY_PROFILE=balanced
```
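A hedged sketch of how profile selection might translate to a `LifecycleConfig`. The proposal doesn't specify the loader, so this hard-codes the three profiles above rather than parsing profiles.yaml:

```go
// configForProfile maps NORNICDB_MEMORY_PROFILE values onto LifecycleConfig.
// Unknown names fall back to the defaults.
func configForProfile(name string) LifecycleConfig {
	cfg := DefaultLifecycleConfig()
	switch name {
	case "constrained": // 8GB systems
		cfg.MaxVRAM = 4 << 30 // 4GB
		cfg.MaxRAM = 6 << 30  // 6GB
		cfg.MaxConcurrentModels = 2
		cfg.HeimdallKeepAlive = 2 * time.Minute
		cfg.VisionKeepAlive = 30 * time.Second
	case "balanced": // 16GB systems
		cfg.MaxVRAM = 8 << 30  // 8GB
		cfg.MaxRAM = 12 << 30  // 12GB
		cfg.PreloadHeimdall = true
		cfg.HeimdallKeepAlive = 5 * time.Minute
		cfg.VisionKeepAlive = 2 * time.Minute
	case "performance": // 32GB+ systems
		cfg.MaxVRAM, cfg.MaxRAM = 0, 0 // unlimited
		cfg.HeimdallKeepAlive, cfg.VisionKeepAlive = 0, 0
		cfg.PreloadHeimdall, cfg.PreloadVision = true, true
	}
	return cfg
}
```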
### Monitoring & Metrics

```go
// Prometheus metrics
var (
	modelLoadTime = prometheus.NewHistogramVec(
		prometheus.HistogramOpts{
			Name:    "nornicdb_model_load_seconds",
			Help:    "Time to load models",
			Buckets: []float64{0.5, 1, 2, 5, 10, 30},
		},
		[]string{"model_type"},
	)

	modelMemoryUsage = prometheus.NewGaugeVec(
		prometheus.GaugeOpts{
			Name: "nornicdb_model_memory_bytes",
			Help: "Memory usage per model",
		},
		[]string{"model_type", "memory_type"}, // memory_type: vram, ram
	)

	modelEvictions = prometheus.NewCounterVec(
		prometheus.CounterOpts{
			Name: "nornicdb_model_evictions_total",
			Help: "Number of model evictions",
		},
		[]string{"model_type", "reason"}, // reason: idle, memory_pressure, manual
	)

	modelState = prometheus.NewGaugeVec(
		prometheus.GaugeOpts{
			Name: "nornicdb_model_state",
			Help: "Current model state (0=unloaded, 1=loading, 2=hot, 3=warm, 4=cold)",
		},
		[]string{"model_type"},
	)
)
```

### API for Model Management

```cypher
// Check model status
CALL db.models.status()
YIELD model, state, memoryMB, lastUsed, useCount

// Force load a model
CALL db.models.load('vision')
YIELD success, loadTimeMs

// Force evict a model
CALL db.models.evict('vision')
YIELD success, freedMemoryMB

// Get memory stats
CALL db.models.memory()
YIELD totalVRAM, usedVRAM, totalRAM, usedRAM, loadedModels
```

```bash
# HTTP API
GET /api/models/status
{
  "models": {
    "embedding": {"state": "hot", "memory_mb": 1024, "last_used": "2024-12-03T10:00:00Z"},
    "heimdall": {"state": "warm", "memory_mb": 1536, "last_used": "2024-12-03T09:55:00Z"},
    "vision": {"state": "unloaded", "memory_mb": 0, "last_used": null}
  },
  "memory": {
    "vram_used_mb": 2560,
    "vram_total_mb": 8192,
    "ram_used_mb": 3072,
    "ram_total_mb": 16384
  }
}

POST /api/models/evict
{"model": "vision"}

POST /api/models/load
{"model": "heimdall"}
```

### Startup Sequence

```
NornicDB Startup
        │
        ▼
1. Initialize ModelManager
   └─ Detect available VRAM/RAM
   └─ Apply memory profile (constrained/balanced/performance)
        │
        ▼
2. Preload Embedding Model (always)
   └─ Load BGE-M3
   └─ State: Hot
   └─ Log: "✅ Embedding model ready (1024 MB)"
        │
        ▼
3. Conditionally Preload Heimdall
   └─ If NORNICDB_PRELOAD_HEIMDALL=true
   └─ Load Qwen2.5-0.5B
   └─ Log: "✅ Heimdall AI Assistant ready (512 MB)"
        │
        ▼
4. Conditionally Preload Vision
   └─ If NORNICDB_PRELOAD_VISION=true
   └─ Check memory budget first
   └─ Load Qwen2.5-VL-2B
   └─ Log: "✅ Vision pipeline ready (2048 MB)"
        │
        ▼
5. Ready for requests
   └─ Log: "🚀 NornicDB ready (models: embedding, heimdall)"
   └─ Log: "   Vision: on-demand (2min keep-alive)"
```
---

## 15. LLM Robustness & Error Handling

### Problem: CGO Crashes

Running LLMs via `llama.cpp` CGO bindings can crash with:

- `SIGABRT` during token decoding (invalid tokens, context overflow)
- `SIGSEGV` from corrupted contexts or memory issues
- Silent failures from GPU memory exhaustion

### Solution: Validation Layer in C

We added comprehensive validation functions in the CGO code:

```c
// Safe decode with validation (in llama.go CGO block)
int safe_gen_decode(struct llama_context* ctx, struct llama_model* model,
                    int32_t* tokens, int n_tokens, int start_pos,
                    char* error_buf, int error_buf_size) {
    // 1. Null checks (context, model, tokens)
    // 2. Context size validation (tokens + pos < ctx_size)
    // 3. Token validation (0 <= token < vocab_size)
    // 4. Context health check (KV cache accessible)
    // 5. Protected decode with detailed error messages
}

int safe_gen_decode_token(struct llama_context* ctx, struct llama_model* model,
                          int32_t token, int pos,
                          char* error_buf, int error_buf_size) {
    // Same validations for single-token generation
}
```

### Error Codes

| Code | Meaning |
|------|---------|
| -100 | Context is NULL |
| -101 | Model is NULL |
| -102 | Tokens pointer is NULL |
| -103 | Invalid token count (n_tokens <= 0) |
| -104 | Context overflow (pos + n > ctx_size) |
| -105 | Invalid token (outside vocabulary) |
| -106 | Context health check failed |
| -107 | Batch allocation failed |

### Go-Side Error Handling

```go
// GenerateStream uses safe decode with detailed error reporting.
// (Abridged: tokenization, sampling setup, and callback plumbing elided.)
func (g *GenerationModel) GenerateStream(ctx context.Context, prompt string,
	params GenerateParams, callback func(token string) error) error {

	// Pre-flight health check
	if g.model == nil || g.ctx == nil {
		return fmt.Errorf("model or context is nil - model may have been closed")
	}

	// Safe decode with error buffer
	errorBuf := make([]byte, 512)
	result := C.safe_gen_decode(g.ctx, g.model, tokens, n, 0,
		(*C.char)(unsafe.Pointer(&errorBuf[0])), C.int(len(errorBuf)))
	if result != 0 {
		errMsg := C.GoString((*C.char)(unsafe.Pointer(&errorBuf[0])))
		return fmt.Errorf("prefill failed: %s (code=%d)", errMsg, result)
	}

	// Generation loop with per-token validation
	for i := 0; i < params.MaxTokens; i++ {
		token := C.sample_token(g.ctx, g.model, temp, top_p, top_k)

		result := C.safe_gen_decode_token(g.ctx, g.model, token, C.int(pos),
			(*C.char)(unsafe.Pointer(&errorBuf[0])), C.int(len(errorBuf)))
		if result != 0 {
			errMsg := C.GoString((*C.char)(unsafe.Pointer(&errorBuf[0])))
			return fmt.Errorf("decode failed at position %d: %s (code=%d)", pos, errMsg, result)
		}
	}
}
```

### Benefits

1. **Crash Prevention**: Invalid inputs are caught before they reach llama.cpp
2. **Detailed Errors**: Know exactly what failed and why
3. **Debug Info**: Error messages include actual values (token ID, vocab size, etc.)
4. **Graceful Degradation**: Errors return to Go instead of crashing the process

### Future Enhancements

- Signal handling (`setjmp`/`longjmp`) to catch and recover from crashes
- Memory pressure detection before allocation
- Automatic context reset on recoverable errors
- Prometheus metrics for decode failures by type
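One consequence of benefit 4: because decode failures surface as ordinary Go errors, callers can degrade instead of aborting. A hedged sketch of how the embedding pipeline from Section 6 might fall back to a text-only embedding when vision inference fails (the fallback policy and the `_vl_error` property are assumptions, not specified by this proposal):

```go
// generateImageEmbeddingWithFallback tries the vision pipeline and, on
// failure, falls back to embedding only the node's text properties so the
// node still becomes searchable.
func (e *Embedder) generateImageEmbeddingWithFallback(ctx context.Context, node *storage.Node) ([]float32, error) {
	emb, err := e.generateImageEmbedding(ctx, node)
	if err == nil {
		return emb, nil
	}

	// Record the failure for later reprocessing instead of aborting.
	node.Properties["_vl_error"] = err.Error() // hypothetical property
	node.Properties["_vl_processed"] = false
	return e.generateTextEmbedding(ctx, node)
}
```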
---

## 16. Implementation Roadmap

### Phase 1: Model Lifecycle Manager (Weeks 1-2)

- [ ] Create `pkg/models` package
- [ ] Implement ModelManager interface
- [ ] Add memory detection (VRAM/RAM)
- [ ] Implement acquire/release pattern
- [ ] Add keep-alive timers
- [ ] Implement priority-based LRU eviction
- [ ] Add memory profiles (constrained/balanced/performance)
- [ ] Prometheus metrics for model states
- [ ] Unit tests for the eviction algorithm

### Phase 2: Vision Foundation (Weeks 3-4)

- [ ] Create `pkg/vision` package
- [ ] Implement types and interfaces
- [ ] Add configuration support
- [ ] Implement node detection logic
- [ ] Integrate with ModelManager

### Phase 3: Image Processing (Weeks 5-6)

- [ ] Implement image decoder
- [ ] Implement image scaler (bilinear + Lanczos options)
- [ ] Add MIME type detection from magic bytes
- [ ] Handle URL fetching with timeouts
- [ ] Add image validation and security checks

### Phase 4: VL Integration (Weeks 7-8)

- [ ] Extend llama.cpp bindings for vision (LLaVA architecture)
- [ ] Implement LlamaVisionGenerator
- [ ] Test with Qwen2.5-VL, MoonDream, and LLaVA models
- [ ] Integrate with ModelManager for lifecycle
- [ ] GPU memory tracking

### Phase 5: Embedding Pipeline Integration (Weeks 9-10)

- [ ] Modify Embedder to detect image nodes
- [ ] Implement vision pipeline routing
- [ ] Implement context combination
- [ ] Store `_vl_description` and `_vl_processed` properties
- [ ] Add automatic embedding on node creation
- [ ] Batch processing support

### Phase 6: API & Monitoring (Weeks 11-12)

- [ ] Add Cypher procedures (db.vision.*, db.models.*)
- [ ] Add HTTP API endpoints
- [ ] Prometheus metrics for the vision pipeline
- [ ] Grafana dashboard templates
- [ ] Memory usage alerts

### Phase 7: Docker & Documentation (Weeks 13-14)

- [ ] Create Docker build target with VL model
- [ ] Download and test recommended models
- [ ] Write user documentation
- [ ] Add examples and tutorials
- [ ] Performance benchmarks
- [ ] Memory profile recommendations

---

## 17. Security Considerations

| Concern | Mitigation |
|---------|------------|
| Large image DoS | MaxPixels limit (3.2MP default) |
| Malformed images | Strict image validation before processing |
| URL fetching risks | Timeout limits, size limits, allowed hosts |
| Model prompt injection | Sanitize image metadata before combining |
| Resource exhaustion | Queue limits, concurrent processing limits |

---

## 18. Future Enhancements

### Short Term

- [ ] Support for image URLs with authentication
- [ ] Batch image processing API
- [ ] Custom prompts per node label

### Medium Term

- [ ] Multi-image nodes (galleries)
- [ ] Video frame extraction
- [ ] OCR-specific mode for documents
- [ ] Image similarity search (CLIP-style)

### Long Term

- [ ] Real-time image stream processing
- [ ] Image generation integration
- [ ] Visual question answering (VQA)
- [ ] Image-to-graph extraction (scene graphs)
---

## 19. API Reference

### Cypher Procedures (Proposed)

```cypher
// Process a single image node
CALL db.vision.process(nodeId)
YIELD description, embedding

// Batch process all Image nodes
CALL db.vision.processAll()
YIELD processed, errors

// Get vision pipeline status
CALL db.vision.status()
YIELD enabled, model, processed_count
```

### HTTP Endpoints (Proposed)

```bash
# Process an image directly
POST /api/vision/describe
Content-Type: multipart/form-data

image: <binary>
prompt: "Describe this image"

# Response
{
  "description": "A sunset over mountains...",
  "processing_time_ms": 523
}
```

---

**Document Version:** 1.0.0
**Last Updated:** December 2024
**Status:** PROPOSAL - Ready for implementation
