# Cognitive SLM Integration Proposal for NornicDB
**Status:** PROPOSAL
**Version:** 1.0.0
**Date:** December 2024
**Author:** AI Architecture Review
---
## Executive Summary
This proposal outlines the integration of a **Small Language Model (SLM)** directly into NornicDB's core engine, transforming it from a traditional graph database into a **Cognitive Graph Database** — a system with embedded reasoning capabilities for self-monitoring, self-healing, query optimization, and intelligent memory curation.
### What We're Building
```
┌─────────────────────────────────────────────────────────────────┐
│ NornicDB Cognitive Engine │
├─────────────────────────────────────────────────────────────────┤
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────────┐ │
│ │ Embedding │ │ Reasoning │ │ Graph Engine │ │
│ │ Model │ │ SLM │ │ (Storage + │ │
│ │ (BGE-M3) │ │ (Qwen2.5) │ │ Cypher) │ │
│ │ 1024 dims │ │ 0.5B-3B │ │ │ │
│ └──────┬───────┘ └──────┬───────┘ └────────┬─────────┘ │
│ │ │ │ │
│ └───────────────────┴──────────────────────┘ │
│ │ │
│ ┌────────▼────────┐ │
│ │ Model Manager │ │
│ │ (Scheduler + │ │
│ │ GPU Memory) │ │
│ └─────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
```
---
## 1. Validation of Approach
### 1.1 Why This is Novel
No production graph database embeds an SLM in the engine itself:
- **Neo4j**: External LLM integration only (via plugins)
- **TigerGraph**: No native LLM support
- **Dgraph**: No LLM capabilities
- **ArangoDB**: External AI services only
NornicDB already has:
- ✅ llama.cpp integration via CGO (`pkg/localllm`)
- ✅ GPU acceleration (Metal/CUDA)
- ✅ Embedding generation in-process
- ✅ Inference engine for link prediction (`pkg/inference`)
- ✅ Decay/memory management (`pkg/decay`)
**We're 80% of the way there.** The missing piece is a reasoning model alongside the embedding model.
### 1.2 What the SLM Can Do TODAY (Safe + Practical)
| Capability | Description | Input | Output |
|------------|-------------|-------|--------|
| **Anomaly Detection** | Detect weird node/edge patterns | Graph stats + topology summary | `{anomaly_type, severity, node_ids}` |
| **Runtime Diagnosis** | Classify goroutine issues | Stack traces + metrics | `{diagnosis, action_id}` |
| **Query Optimization** | Suggest index/rewrite | Query plan + stats | `{suggestion_type, details}` |
| **Policy Enforcement** | Validate operations | Operation context | `{allow, reason}` |
| **Semantic Dedup** | Identify duplicate nodes | Node pairs + embeddings | `{is_duplicate, confidence}` |
| **Memory Curation** | Prioritize/summarize nodes | Node content + access patterns | `{summary, importance_score}` |
### 1.3 Safety Constraints (Non-Negotiable)
```go
// ALL SLM outputs MUST map to predefined actions
type ActionOpcode int
const (
ActionNone ActionOpcode = iota
ActionLogWarning
ActionThrottleQuery
ActionSuggestIndex
ActionMergeNodes
ActionRestartWorker
ActionClearQueue
// ... finite, enumerated set
)
// SLM output schema - STRICT
type SLMResponse struct {
Action ActionOpcode `json:"action"`
Confidence float64 `json:"confidence"`
Reasoning string `json:"reasoning"`
Params map[string]any `json:"params,omitempty"`
}
```
**Never allow:**
- ❌ Arbitrary code execution
- ❌ Direct data modification without review
- ❌ Security/access control decisions
- ❌ Live storage engine changes
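To make these constraints concrete, a minimal validation gate can reject anything outside the enumerated opcodes before it reaches a dispatcher. This is a sketch; `validateResponse`, `allowedActions`, and the confidence floor are illustrative names, not part of an implemented API:

```go
import (
	"bytes"
	"encoding/json"
	"fmt"
)

// allowedActions is an explicit whitelist of the enumerated opcodes above.
var allowedActions = map[ActionOpcode]bool{
	ActionNone: true, ActionLogWarning: true, ActionThrottleQuery: true,
	ActionSuggestIndex: true, ActionMergeNodes: true,
	ActionRestartWorker: true, ActionClearQueue: true,
}

// validateResponse parses raw SLM output against the strict schema and rejects
// unknown opcodes or low-confidence answers. Names are illustrative assumptions.
func validateResponse(raw []byte, minConfidence float64) (*SLMResponse, error) {
	dec := json.NewDecoder(bytes.NewReader(raw))
	dec.DisallowUnknownFields() // strict: any extra key is an error
	var resp SLMResponse
	if err := dec.Decode(&resp); err != nil {
		return nil, fmt.Errorf("invalid SLM output: %w", err)
	}
	if !allowedActions[resp.Action] {
		return nil, fmt.Errorf("action %d is outside the enumerated set", resp.Action)
	}
	if resp.Confidence < minConfidence {
		return nil, fmt.Errorf("confidence %.2f below threshold %.2f", resp.Confidence, minConfidence)
	}
	return &resp, nil
}
```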
---
## 2. Model Selection: MIT/Apache Licensed Options
### 2.1 Recommended Models (Ranked by Suitability)
| Model | Size | License | Strengths | Use Case |
|-------|------|---------|-----------|----------|
| **Qwen2.5-0.5B-Instruct** | 0.5B | Apache 2.0 | Excellent structured output, fast | Primary choice |
| **Qwen2.5-1.5B-Instruct** | 1.5B | Apache 2.0 | Better reasoning, still fast | If 0.5B insufficient |
| **Qwen2.5-3B-Instruct** | 3B | Apache 2.0 | Best reasoning | Complex tasks |
| **SmolLM2-360M-Instruct** | 360M | Apache 2.0 | Ultra-fast, tiny | Simple classification |
| **Phi-3.5-mini-instruct** | 3.8B | MIT | Strong reasoning | Alternative to Qwen |
| **TinyLlama-1.1B** | 1.1B | Apache 2.0 | Proven stable | Fallback option |
### 2.2 Why Qwen2.5-0.5B is the Primary Recommendation
1. **Structured Output Excellence**: Qwen2.5 family excels at JSON/structured generation
2. **Size/Performance Balance**: 0.5B runs in roughly 350MB of VRAM at Q4_K_M (about 550MB at Q8_0)
3. **Apache 2.0 License**: Commercial-friendly, no restrictions
4. **GGUF Available**: Pre-quantized versions on HuggingFace
5. **Instruction-Following**: Tuned for following precise prompts
6. **Multilingual**: Works across languages (useful for global deployments)
### 2.3 Model Quantization Strategy
```
┌─────────────────────────────────────────────────────────────┐
│ Quantization Tiers │
├─────────────────────────────────────────────────────────────┤
│ Q8_0: ~550MB │ Best quality, more VRAM │
│ Q5_K_M: ~400MB │ Good balance │
│ Q4_K_M: ~350MB │ Recommended default ★ │
│ Q4_0: ~300MB │ Fastest, slight quality loss │
│ Q2_K: ~200MB │ Emergency fallback only │
└─────────────────────────────────────────────────────────────┘
```
**Recommendation**: Ship with Q4_K_M by default, allow Q8_0 for systems with VRAM headroom.
---
## 3. Architecture: Multi-Model Management
### 3.1 Adapting Ollama's Approach
Ollama (MIT licensed) solved multi-model GPU scheduling. Key patterns to adapt:
```go
// pkg/heimdall/scheduler.go - Inspired by Ollama's scheduler
type ModelScheduler struct {
	mu         sync.RWMutex
	models     map[string]*LoadedModel
	gpuMemory  int64      // Available VRAM
	usedMemory int64      // VRAM currently consumed by loaded models
	maxLoaded  int        // Max concurrent models
	lru        *list.List // LRU for eviction
	embedModel string     // Always-loaded embedding model
}
type LoadedModel struct {
	Name       string
	Model      *localllm.Model
	Context    *localllm.Context
	LastUsed   time.Time
	MemoryUsed int64
	Purpose    ModelPurpose  // Embedding, Reasoning, Classification
	lruElement *list.Element // Position in the scheduler's LRU list
}
type ModelPurpose int
const (
PurposeEmbedding ModelPurpose = iota
PurposeReasoning
PurposeClassification
)
```
### 3.2 Model Loading Strategy
```go
// Embedding model: Always loaded (primary workload)
// Reasoning model: Loaded on-demand, cached with LRU eviction
func (s *ModelScheduler) GetModel(purpose ModelPurpose) (*LoadedModel, error) {
s.mu.Lock()
defer s.mu.Unlock()
modelName := s.modelForPurpose(purpose)
// Check if already loaded
if model, ok := s.models[modelName]; ok {
model.LastUsed = time.Now()
s.lru.MoveToFront(model.lruElement)
return model, nil
}
// Need to load - check memory
if err := s.ensureMemoryFor(modelName); err != nil {
return nil, err
}
// Load the model
return s.loadModel(modelName, purpose)
}
func (s *ModelScheduler) ensureMemoryFor(modelName string) error {
	required := s.estimateMemory(modelName)
	for s.usedMemory+required > s.gpuMemory {
		// Evict the least recently used model, but never the embedding model.
		evictee := s.lru.Back()
		for evictee != nil && evictee.Value.(*LoadedModel).Name == s.embedModel {
			evictee = evictee.Prev()
		}
		if evictee == nil {
			return fmt.Errorf("cannot free enough GPU memory for %s", modelName)
		}
		s.unloadModel(evictee.Value.(*LoadedModel))
	}
	return nil
}
```
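`estimateMemory` is referenced above but not defined. A rough sketch, assuming the estimate comes from the GGUF file's on-disk size plus a fixed allowance for KV cache and scratch buffers (the ~20% overhead and 64MB floor are assumptions, not measurements):

```go
import (
	"os"
	"path/filepath"
)

// estimateModelMemory is a hypothetical helper that s.estimateMemory could
// delegate to: GGUF weights map roughly 1:1 into VRAM when fully offloaded,
// so file size plus a KV-cache/scratch allowance gives a usable upper bound.
func estimateModelMemory(modelsDir, modelName string) int64 {
	info, err := os.Stat(filepath.Join(modelsDir, modelName+".gguf"))
	if err != nil {
		return 1 << 30 // unknown model: reserve a conservative 1GB
	}
	overhead := info.Size() / 5 // ~20% for KV cache and scratch buffers
	if overhead < 64<<20 {
		overhead = 64 << 20 // never assume less than 64MB of overhead
	}
	return info.Size() + overhead
}
```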
### 3.3 Extending Current llama.go
Current `pkg/localllm/llama.go` supports single-model loading. Extensions needed:
```go
// pkg/localllm/llama.go - Extensions
// ModelType distinguishes embedding vs generation models
type ModelType int
const (
ModelTypeEmbedding ModelType = iota
ModelTypeGeneration
)
// GenerationOptions for text generation (vs embedding)
type GenerationOptions struct {
MaxTokens int
Temperature float32
TopP float32
TopK int
StopTokens []string
}
// Generate produces text completion (for reasoning SLM)
func (m *Model) Generate(ctx context.Context, prompt string, opts GenerationOptions) (string, error) {
m.mu.Lock()
defer m.mu.Unlock()
// Tokenize prompt
tokens := m.tokenize(prompt)
// Generate tokens iteratively
var output strings.Builder
for i := 0; i < opts.MaxTokens; i++ {
select {
case <-ctx.Done():
return output.String(), ctx.Err()
default:
}
nextToken, err := m.sampleNext(tokens, opts)
if err != nil {
return output.String(), err
}
if m.isStopToken(nextToken, opts.StopTokens) {
break
}
tokens = append(tokens, nextToken)
output.WriteString(m.detokenize(nextToken))
}
return output.String(), nil
}
```
---
## 4. SLM Subsystems
### 4.1 Anomaly Detection System
```go
// pkg/heimdall/anomaly/detector.go
type AnomalyDetector struct {
scheduler *ModelScheduler
store storage.Engine
}
type GraphStats struct {
	TotalNodes       int64            `json:"total_nodes"`
	TotalEdges       int64            `json:"total_edges"`
	NodesByLabel     map[string]int64 `json:"nodes_by_label"`
	EdgesByType      map[string]int64 `json:"edges_by_type"`
	AvgDegree        float64          `json:"avg_degree"`
	MaxDegree        int64            `json:"max_degree"`
	SuperNodes       []string         `json:"super_nodes"` // Nodes with >1000 edges
	RecentGrowth     float64          `json:"recent_growth_rate"`
	RecentNodeGrowth int64            `json:"recent_node_growth"` // Nodes added in the last hour
	RecentEdgeGrowth int64            `json:"recent_edge_growth"` // Edges added in the last hour
	UnusualLabels    []string         `json:"unusual_labels"`     // Labels with abnormal growth
}
const anomalyPrompt = `<|im_start|>system
You are a graph database anomaly detector. Analyze graph statistics and identify anomalies.
Output JSON only: {"anomaly_detected": bool, "type": string, "severity": "low"|"medium"|"high"|"critical", "affected_nodes": [], "recommendation": string}
<|im_end|>
<|im_start|>user
Graph Statistics:
%s
Recent changes:
- Nodes added last hour: %d
- Edges added last hour: %d
- Labels with unusual growth: %v
Identify any anomalies.
<|im_end|>
<|im_start|>assistant
`
func (d *AnomalyDetector) Analyze(ctx context.Context) (*AnomalyReport, error) {
stats := d.collectStats()
prompt := fmt.Sprintf(anomalyPrompt,
mustJSON(stats),
stats.RecentNodeGrowth,
stats.RecentEdgeGrowth,
stats.UnusualLabels,
)
model, err := d.scheduler.GetModel(PurposeReasoning)
if err != nil {
return nil, err
}
	response, err := model.Model.Generate(ctx, prompt, GenerationOptions{
MaxTokens: 256,
Temperature: 0.1, // Low temperature for deterministic output
StopTokens: []string{"<|im_end|>"},
})
if err != nil {
return nil, err
}
return parseAnomalyResponse(response)
}
```
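`parseAnomalyResponse` is referenced above but not shown. A minimal sketch, assuming `AnomalyReport` simply mirrors the JSON schema requested in `anomalyPrompt`:

```go
import (
	"encoding/json"
	"fmt"
	"strings"
)

// AnomalyReport mirrors the JSON schema requested in anomalyPrompt.
type AnomalyReport struct {
	AnomalyDetected bool     `json:"anomaly_detected"`
	Type            string   `json:"type"`
	Severity        string   `json:"severity"`
	AffectedNodes   []string `json:"affected_nodes"`
	Recommendation  string   `json:"recommendation"`
}

// parseAnomalyResponse extracts the first JSON object from the model output
// and validates the severity field against the allowed values.
func parseAnomalyResponse(response string) (*AnomalyReport, error) {
	start := strings.Index(response, "{")
	end := strings.LastIndex(response, "}")
	if start < 0 || end <= start {
		return nil, fmt.Errorf("no JSON object in SLM response")
	}
	var report AnomalyReport
	if err := json.Unmarshal([]byte(response[start:end+1]), &report); err != nil {
		return nil, fmt.Errorf("malformed anomaly report: %w", err)
	}
	switch report.Severity {
	case "low", "medium", "high", "critical":
	default:
		return nil, fmt.Errorf("invalid severity %q", report.Severity)
	}
	return &report, nil
}
```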
### 4.2 Runtime Health Diagnosis
```go
// pkg/heimdall/health/diagnostician.go
type Diagnostician struct {
scheduler *ModelScheduler
}
type RuntimeSnapshot struct {
GoroutineCount int `json:"goroutine_count"`
HeapAlloc uint64 `json:"heap_alloc_mb"`
GCPauseNs uint64 `json:"gc_pause_ns"`
BlockedRoutines []BlockedRoutine `json:"blocked_routines"`
LockContention map[string]int64 `json:"lock_contention"`
QueueDepths map[string]int `json:"queue_depths"`
}
type BlockedRoutine struct {
ID uint64 `json:"id"`
State string `json:"state"`
WaitingOn string `json:"waiting_on"`
	Duration time.Duration `json:"duration"`
Stack []string `json:"stack_summary"` // Top 3 frames only
}
const diagnosisPrompt = `<|im_start|>system
You are a Go runtime diagnostician. Analyze runtime metrics and identify issues.
Output JSON: {"diagnosis": string, "severity": "healthy"|"warning"|"critical", "action_id": int, "details": string}
Action IDs:
0 = No action needed
1 = Log warning
2 = Restart worker pool
3 = Clear specific queue
4 = Trigger GC
5 = Reduce concurrency
<|im_end|>
<|im_start|>user
Runtime Snapshot:
%s
<|im_end|>
<|im_start|>assistant
`
func (d *Diagnostician) Diagnose(ctx context.Context, snapshot RuntimeSnapshot) (*Diagnosis, error) {
prompt := fmt.Sprintf(diagnosisPrompt, mustJSON(snapshot))
model, err := d.scheduler.GetModel(PurposeClassification)
if err != nil {
return nil, err
}
	response, err := model.Model.Generate(ctx, prompt, GenerationOptions{
		MaxTokens:   128,
		Temperature: 0.0, // Deterministic for safety
	})
	if err != nil {
		return nil, err
	}
	return parseDiagnosis(response)
}
```
### 4.3 Memory Curator (Agent-Facing)
```go
// pkg/heimdall/curator/memory_curator.go
type MemoryCurator struct {
scheduler *ModelScheduler
decay *decay.Engine
}
type MemoryNode struct {
ID string `json:"id"`
Content string `json:"content"`
Labels []string `json:"labels"`
AccessCount int64 `json:"access_count"`
LastAccess time.Time `json:"last_access"`
CreatedAt time.Time `json:"created_at"`
EdgeCount int `json:"edge_count"`
Importance float64 `json:"current_importance"`
}
const curationPrompt = `<|im_start|>system
You are a memory curator for an AI agent's knowledge graph. Evaluate memories for importance.
Output JSON: {"should_keep": bool, "new_importance": float, "summary": string, "merge_candidates": []}
<|im_end|>
<|im_start|>user
Memory to evaluate:
%s
Related memories (by embedding similarity):
%s
Agent's recent focus areas: %v
<|im_end|>
<|im_start|>assistant
`
func (c *MemoryCurator) EvaluateMemory(ctx context.Context, node MemoryNode, similar []MemoryNode) (*CurationDecision, error) {
prompt := fmt.Sprintf(curationPrompt,
mustJSON(node),
mustJSON(similar),
c.getRecentFocusAreas(),
)
model, err := c.scheduler.GetModel(PurposeReasoning)
if err != nil {
return nil, err
}
	response, err := model.Model.Generate(ctx, prompt, GenerationOptions{
		MaxTokens:   200,
		Temperature: 0.3,
	})
	if err != nil {
		return nil, err
	}
	return parseCurationDecision(response)
}
```
---
## 5. Configuration
### 5.1 Environment Variables
The SLM is controlled via **feature flags** in the existing `FeatureFlagsConfig`, following the same BYOM (Bring Your Own Model) pattern as embeddings.
```bash
# Enable SLM (opt-in, disabled by default)
NORNICDB_HEIMDALL_ENABLED=true
# BYOM: Use same models directory as embeddings
NORNICDB_MODELS_DIR=/data/models
# Model Selection (without .gguf extension)
NORNICDB_HEIMDALL_MODEL=qwen2.5-0.5b-instruct
# GPU Configuration
NORNICDB_HEIMDALL_GPU_LAYERS=-1 # -1 = auto (all on GPU, fallback to CPU)
# Generation Parameters
NORNICDB_HEIMDALL_MAX_TOKENS=512
NORNICDB_HEIMDALL_TEMPERATURE=0.1 # Low for deterministic output
# Feature Toggles (default to enabled when SLM is enabled)
NORNICDB_HEIMDALL_ANOMALY_DETECTION=true
NORNICDB_HEIMDALL_RUNTIME_DIAGNOSIS=true
NORNICDB_HEIMDALL_MEMORY_CURATION=false # Experimental
```
**Key Design Decisions:**
- **Feature Flag**: SLM is opt-in via `NORNICDB_HEIMDALL_ENABLED=true`
- **BYOM**: Uses the same `NORNICDB_MODELS_DIR` as embeddings; drop `.gguf` files into that directory
- **CPU Fallback**: If GPU memory is insufficient, automatically falls back to CPU
- **No Remote**: Only local models supported (no remote LLM management yet)
### 5.2 Implementation (COMPLETED)
SLM configuration is integrated into the existing `FeatureFlagsConfig` in `pkg/config/config.go`:
```go
// pkg/config/config.go - IMPLEMENTED
// FeatureFlagsConfig includes SLM settings:
type FeatureFlagsConfig struct {
// ... existing flags ...
// SLM (Small Language Model) for cognitive database features
SLMEnabled bool // NORNICDB_HEIMDALL_ENABLED
SLMModel string // NORNICDB_HEIMDALL_MODEL
SLMGPULayers int // NORNICDB_HEIMDALL_GPU_LAYERS
SLMMaxTokens int // NORNICDB_HEIMDALL_MAX_TOKENS
SLMTemperature float32 // NORNICDB_HEIMDALL_TEMPERATURE
SLMAnomalyDetection bool // NORNICDB_HEIMDALL_ANOMALY_DETECTION
SLMRuntimeDiagnosis bool // NORNICDB_HEIMDALL_RUNTIME_DIAGNOSIS
SLMMemoryCuration bool // NORNICDB_HEIMDALL_MEMORY_CURATION
}
```
The SLM package (`pkg/heimdall`) provides its own `Config` type:
```go
// pkg/heimdall/types.go - IMPLEMENTED
type Config struct {
Enabled bool
ModelsDir string // Uses NORNICDB_MODELS_DIR (shared with embeddings)
Model string
MaxTokens int
Temperature float32
GPULayers int // -1 = auto, 0 = CPU only
AnomalyDetection bool
AnomalyInterval time.Duration
RuntimeDiagnosis bool
RuntimeInterval time.Duration
MemoryCuration bool
CurationInterval time.Duration
}
// ConfigFromFeatureFlags creates Config from FeatureFlagsConfig
func ConfigFromFeatureFlags(flags FeatureFlagsSource, modelsDir string) Config
```
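A wiring sketch for server start-up follows. Only `ConfigFromFeatureFlags` above is confirmed; the `FeatureFlags`/`ModelsDir` field names and the `heimdall.NewManager` constructor are assumptions for illustration:

```go
import (
	"fmt"

	"github.com/orneryd/nornicdb/pkg/config"
	"github.com/orneryd/nornicdb/pkg/heimdall"
)

// startHeimdall builds the SLM manager from parsed configuration, degrading
// gracefully when the model cannot be loaded.
func startHeimdall(cfg *config.Config) (*heimdall.Manager, error) {
	flags := cfg.FeatureFlags
	if !flags.SLMEnabled {
		return nil, nil // SLM is opt-in and disabled by default
	}
	slmCfg := heimdall.ConfigFromFeatureFlags(flags, cfg.ModelsDir)
	mgr, err := heimdall.NewManager(slmCfg) // hypothetical constructor
	if err != nil {
		// The database keeps serving queries without cognitive features.
		return nil, fmt.Errorf("heimdall unavailable: %w", err)
	}
	return mgr, nil
}
```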
**CPU Fallback Behavior:**
The `Manager` automatically falls back to CPU if GPU loading fails:
```go
// pkg/heimdall/scheduler.go - IMPLEMENTED
generator, err := loadGenerator(modelPath, gpuLayers)
if err != nil {
fmt.Printf("⚠️ GPU loading failed, trying CPU fallback: %v\n", err)
generator, err = loadGenerator(modelPath, 0) // CPU only
}
```
---
## 6. Implementation Plan
### Phase 1: Foundation ✅ COMPLETED
- [x] Add SLM feature flags to `pkg/config/config.go`
- [x] Create `pkg/heimdall/types.go` with common types
- [x] Create `pkg/heimdall/scheduler.go` (Manager) with BYOM pattern
- [x] Create `pkg/heimdall/handler.go` with HTTP/SSE endpoints
- [x] CPU fallback when GPU memory insufficient
**Files Created:**
- `pkg/heimdall/types.go` - Config, Generator interface, action opcodes
- `pkg/heimdall/scheduler.go` - Manager with BYOM model loading
- `pkg/heimdall/handler.go` - OpenAI-compatible HTTP API
**Remaining:**
- [ ] Extend `pkg/localllm/llama.go` with `GenerateStream()` CGO implementation
- [ ] Write tests for model loading
- [ ] Benchmark GPU memory usage
### Phase 2: Core Subsystems (Week 3-4)
- [ ] Implement `pkg/heimdall/anomaly/detector.go`
- [ ] Implement `pkg/heimdall/health/diagnostician.go`
- [ ] Create action opcode registry
- [ ] Build confidence threshold system
- [ ] Integration tests with mock models
### Phase 3: Memory Curation (Week 5-6)
- [ ] Implement `pkg/heimdall/curator/memory_curator.go`
- [ ] Integrate with `pkg/decay` engine
- [ ] Add semantic deduplication
- [ ] Build summarization pipeline
- [ ] Test with real agent workloads
### Phase 4: Production Hardening (Week 7-8)
- [ ] Add metrics/observability (Prometheus)
- [ ] Implement graceful degradation (SLM failure → fallback)
- [ ] Documentation + examples
- [ ] Performance benchmarks
- [ ] Security audit of action opcodes
---
## 7. Resource Requirements
### 7.1 GPU Memory Budget
```
┌─────────────────────────────────────────────────────────────┐
│ GPU Memory Budget (8GB Total) │
├─────────────────────────────────────────────────────────────┤
│ BGE-M3 Embedding (Q4_K_M) │ ~600MB │
│ Qwen2.5-0.5B (Q4_K_M) │ ~350MB │
│ KV Cache (both models) │ ~200MB │
│ System Reserve │ ~512MB │
│ ───────────────────────────────────────────── │
│ Total Required │ ~1.7GB │
│ Available for Graph Ops │ ~6.3GB │
└─────────────────────────────────────────────────────────────┘
```
**Conclusion**: Fits comfortably on any modern GPU (even 4GB discrete or Apple M1 8GB unified).
### 7.2 CPU Fallback
For systems without GPU:
- BGE-M3: ~10-50ms per embedding (acceptable)
- Qwen2.5-0.5B: ~100-500ms per generation (acceptable for background tasks)
Recommendation: Run SLM subsystems on a separate worker pool so they never block the query path, as sketched below.
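A sketch of that separation (the loop structure, interval, timeout, and logging are assumptions; `AnomalyDetector.Analyze` is defined in Section 4.1):

```go
import (
	"context"
	"log"
	"time"
)

// runAnomalyLoop runs anomaly detection on its own goroutine so slow CPU
// inference never touches query execution. Cancelling ctx stops the loop.
func runAnomalyLoop(ctx context.Context, detector *AnomalyDetector, interval time.Duration) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			// Bound each analysis so a stuck generation cannot pile up work.
			runCtx, cancel := context.WithTimeout(ctx, 30*time.Second)
			report, err := detector.Analyze(runCtx)
			cancel()
			if err != nil {
				log.Printf("heimdall anomaly check failed: %v", err)
				continue
			}
			if report.AnomalyDetected {
				log.Printf("heimdall anomaly: %s (severity=%s)", report.Type, report.Severity)
			}
		}
	}
}
```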
---
## 8. Model Download Strategy
### 8.1 Recommended GGUF Sources
```bash
# Primary: HuggingFace (official quantizations)
https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct-GGUF
https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct-GGUF
# Alternative: TheBloke quantizations (community)
https://huggingface.co/TheBloke/
```
### 8.2 Download Script
```bash
#!/bin/bash
# scripts/download-slm.sh
MODEL_DIR="${NORNICDB_MODELS_DIR:-/data/models}"
mkdir -p "$MODEL_DIR"
# Download Qwen2.5-0.5B-Instruct (Q4_K_M)
wget -O "$MODEL_DIR/qwen2.5-0.5b-instruct.gguf" \
"https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct-GGUF/resolve/main/qwen2.5-0.5b-instruct-q4_k_m.gguf"
# Verify checksum
echo "Expected SHA256: <checksum>"
sha256sum "$MODEL_DIR/qwen2.5-0.5b-instruct.gguf"
echo "✅ SLM model downloaded to $MODEL_DIR"
```
---
## 9. Risk Mitigation
| Risk | Mitigation |
|------|------------|
| SLM produces invalid action | Strict JSON schema validation + action whitelist |
| SLM hallucinates node IDs | Cross-reference all IDs against storage (see sketch below) |
| GPU OOM | Memory budget enforcement + graceful eviction |
| Latency spikes | Async processing + timeout enforcement |
| Model corruption | Checksum verification on load |
| Prompt injection | Sanitize all user-derived input |
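The node-ID mitigation can be as simple as filtering every ID the model mentions against storage before any action parameters are built. A sketch, assuming a `NodeExists` method on `storage.Engine` (the method name is an assumption):

```go
// filterExistingNodes drops any node ID the SLM mentions that is not present
// in storage, so hallucinated IDs can never reach an action handler.
func filterExistingNodes(store storage.Engine, ids []string) []string {
	valid := ids[:0]
	for _, id := range ids {
		if store.NodeExists(id) { // assumed existence check on the storage engine
			valid = append(valid, id)
		}
	}
	return valid
}
```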
---
## 10. Success Metrics
### 10.1 Quantitative
- [ ] Anomaly detection catches 80%+ of synthetic anomalies
- [ ] Runtime diagnosis accuracy >90% on labeled test set
- [ ] Memory curation reduces node count by 10-20% with <5% false positives
- [ ] P99 latency impact <10ms on query path
- [ ] GPU memory usage <2GB total (embedding + reasoning)
### 10.2 Qualitative
- [ ] Zero unintended data modifications
- [ ] All SLM actions logged and auditable
- [ ] Graceful degradation when SLM unavailable
- [ ] Clear documentation for operators
---
## 11. Conclusion
Embedding a reasoning SLM alongside the embedding model transforms NornicDB into a **Cognitive Graph Database** — the first of its kind. The architecture is:
1. **Safe**: Bounded actions, confidence thresholds, audit logs
2. **Efficient**: <2GB GPU, LRU eviction, async processing
3. **Practical**: Real anomaly detection, health diagnosis, memory curation
4. **Extensible**: Clean interfaces for future subsystems
**Next Step**: Approve proposal and begin Phase 1 implementation.
---
## Appendix A: Model Comparison Benchmarks
| Model | Params | VRAM (Q4_K_M) | Tokens/sec (M2) | JSON Accuracy |
|-------|--------|---------------|-----------------|---------------|
| SmolLM2-360M | 360M | ~200MB | ~150 | 85% |
| Qwen2.5-0.5B | 0.5B | ~350MB | ~100 | 94% |
| Qwen2.5-1.5B | 1.5B | ~900MB | ~50 | 97% |
| Qwen2.5-3B | 3B | ~1.8GB | ~25 | 98% |
| Phi-3.5-mini | 3.8B | ~2.2GB | ~20 | 96% |
**Recommendation**: Start with Qwen2.5-0.5B. Upgrade to 1.5B if JSON accuracy is insufficient.
---
## Appendix B: Sample Prompts
### B.1 Anomaly Detection
```
<|im_start|>system
You are a graph database anomaly detector. Analyze the following graph statistics and identify any anomalies.
Output ONLY valid JSON in this exact format:
{"anomaly_detected": boolean, "type": "none"|"supernode"|"growth_spike"|"orphan_cluster"|"duplicate_nodes", "severity": "low"|"medium"|"high"|"critical", "affected_nodes": ["id1", "id2"], "recommendation": "string"}
<|im_end|>
<|im_start|>user
Graph Statistics:
- Total nodes: 125,432
- Total edges: 892,104
- Average degree: 7.1
- Max degree: 45,892 (node: "user-config-cache")
- Nodes added last hour: 12,500
- Typical hourly addition: 200
Is this anomalous?
<|im_end|>
<|im_start|>assistant
{"anomaly_detected": true, "type": "supernode", "severity": "high", "affected_nodes": ["user-config-cache"], "recommendation": "Node 'user-config-cache' has 45,892 edges which is 6,464x the average. Investigate for possible runaway process or missing pagination."}
```
### B.2 Runtime Diagnosis
```
<|im_start|>system
You are a Go runtime diagnostician. Analyze metrics and identify issues.
Output ONLY valid JSON: {"diagnosis": "string", "severity": "healthy"|"warning"|"critical", "action_id": 0-5, "details": "string"}
Action IDs: 0=none, 1=log_warning, 2=restart_workers, 3=clear_queue, 4=trigger_gc, 5=reduce_concurrency
<|im_end|>
<|im_start|>user
Runtime metrics:
- Goroutines: 12,847 (normal: ~500)
- Heap: 4.2GB (limit: 8GB)
- GC pause: 450ms (normal: <50ms)
- Blocked routines: 847 waiting on "queryExecutor.mu"
<|im_end|>
<|im_start|>assistant
{"diagnosis": "Lock contention causing goroutine explosion", "severity": "critical", "action_id": 5, "details": "847 goroutines blocked on queryExecutor.mu suggests lock contention. Reduce query concurrency to relieve pressure."}
```
---
## Appendix C: License Compliance
| Component | License | Commercial Use | Modification | Distribution |
|-----------|---------|----------------|--------------|--------------|
| Qwen2.5 | Apache 2.0 | ✅ | ✅ | ✅ (with notice) |
| SmolLM2 | Apache 2.0 | ✅ | ✅ | ✅ (with notice) |
| Phi-3.5 | MIT | ✅ | ✅ | ✅ |
| TinyLlama | Apache 2.0 | ✅ | ✅ | ✅ (with notice) |
| llama.cpp | MIT | ✅ | ✅ | ✅ |
| Ollama | MIT | ✅ | ✅ | ✅ |
**All recommended models are fully compatible with commercial use.**
---
## 12. BYOM Architecture (Bring Your Own Model)
### 12.1 Unified Models Directory
All models (embedding + reasoning SLM) live in the same directory:
```
${NORNICDB_MODELS_DIR}/ # Default: /data/models
├── bge-m3.gguf # Embedding model (existing)
├── qwen2.5-0.5b-instruct.gguf # Reasoning SLM (new)
├── qwen2.5-1.5b-instruct.gguf # Alternative larger SLM
└── custom-finetuned.gguf # User's custom model
```
### 12.2 Model Registry
```go
// pkg/heimdall/registry.go
type ModelRegistry struct {
mu sync.RWMutex
models map[string]*ModelInfo
basePath string
}
type ModelInfo struct {
Name string `json:"name"`
Path string `json:"path"`
Type ModelType `json:"type"` // embedding, reasoning, classification
Size int64 `json:"size_bytes"`
Quantization string `json:"quantization"` // Q4_K_M, Q8_0, etc.
Loaded bool `json:"loaded"`
LastUsed time.Time `json:"last_used"`
VRAMEstimate int64 `json:"vram_estimate"`
}
type ModelType string
const (
ModelTypeEmbedding ModelType = "embedding"
ModelTypeReasoning ModelType = "reasoning"
ModelTypeClassification ModelType = "classification"
)
// ScanModels discovers all GGUF files in the models directory
func (r *ModelRegistry) ScanModels() error {
r.mu.Lock()
defer r.mu.Unlock()
entries, err := os.ReadDir(r.basePath)
if err != nil {
return err
}
for _, entry := range entries {
if strings.HasSuffix(entry.Name(), ".gguf") {
info, _ := entry.Info()
modelName := strings.TrimSuffix(entry.Name(), ".gguf")
r.models[modelName] = &ModelInfo{
Name: modelName,
Path: filepath.Join(r.basePath, entry.Name()),
Type: r.inferModelType(modelName),
Size: info.Size(),
Quantization: r.detectQuantization(modelName),
VRAMEstimate: r.estimateVRAM(info.Size()),
}
}
}
return nil
}
// inferModelType guesses model purpose from name patterns
func (r *ModelRegistry) inferModelType(name string) ModelType {
lower := strings.ToLower(name)
switch {
case strings.Contains(lower, "embed") || strings.Contains(lower, "bge") ||
strings.Contains(lower, "e5") || strings.Contains(lower, "nomic"):
return ModelTypeEmbedding
case strings.Contains(lower, "instruct") || strings.Contains(lower, "chat"):
return ModelTypeReasoning
default:
return ModelTypeReasoning // Default to reasoning
}
}
```
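`detectQuantization` is referenced in `ScanModels` but not defined. A sketch, assuming the quantization tag is embedded in the file name, as is conventional for published GGUF files:

```go
import "strings"

// detectQuantization extracts a known quantization tag (Q4_K_M, Q8_0, ...)
// from a model file name such as "qwen2.5-0.5b-instruct-q4_k_m".
// Returns "unknown" when no recognised tag is present.
func (r *ModelRegistry) detectQuantization(name string) string {
	upper := strings.ToUpper(name)
	for _, tag := range []string{"Q8_0", "Q6_K", "Q5_K_M", "Q4_K_M", "Q4_0", "Q2_K"} {
		if strings.Contains(upper, tag) {
			return tag
		}
	}
	return "unknown"
}
```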
### 12.3 Configuration Extension
```bash
# Environment Variables
NORNICDB_MODELS_DIR=/data/models # Shared models directory
NORNICDB_EMBEDDING_MODEL=bge-m3 # Embedding model name
NORNICDB_HEIMDALL_MODEL=qwen2.5-0.5b-instruct # Reasoning SLM name
NORNICDB_HEIMDALL_FALLBACK_MODEL=tinyllama-1.1b # Fallback if primary OOM
```
### 12.4 Model Hot-Swap
Models can be switched at runtime without restart:
```go
// POST /api/admin/models/load
type LoadModelRequest struct {
ModelName string `json:"model_name"`
Purpose ModelType `json:"purpose"`
Force bool `json:"force"` // Unload current if needed
}
// GET /api/admin/models
// Returns all available models and their status
```
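A sketch of the load endpoint follows. The `AdminHandler` type and the `registry.Lookup`/`scheduler.Load` method names are assumptions; only the request shape above is defined:

```go
import (
	"encoding/json"
	"net/http"
)

// handleLoadModel sketches POST /api/admin/models/load. It validates the
// requested model against the registry before asking the scheduler to load it,
// so arbitrary file paths can never be loaded (see Section 15, path traversal).
func (h *AdminHandler) handleLoadModel(w http.ResponseWriter, r *http.Request) {
	var req LoadModelRequest
	if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
		http.Error(w, "invalid request body", http.StatusBadRequest)
		return
	}
	info, ok := h.registry.Lookup(req.ModelName) // hypothetical registry lookup
	if !ok {
		http.Error(w, "unknown model", http.StatusNotFound)
		return
	}
	if err := h.scheduler.Load(info, req.Purpose, req.Force); err != nil { // hypothetical load call
		http.Error(w, err.Error(), http.StatusConflict)
		return
	}
	w.WriteHeader(http.StatusAccepted)
}
```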
---
## 13. Bifrost (Admin UI)
### 13.1 Architecture Overview
A translucent terminal-style chat interface for direct SLM interaction:
```
┌─────────────────────────────────────────────────────────────────┐
│ NornicDB Admin │
├─────────────────────────────────────────────────────────────────┤
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ 🧠 Bifrost [qwen2.5-0.5b] ▼ ─ □ x │ │
│ ├──────────────────────────────────────────────────────────┤ │
│ │ #nornicdb-slm │ │
│ │ │ │
│ │ > Analyze current graph health │ │
│ │ │ │
│ │ {"status": "healthy", "nodes": 125432, │ │
│ │ "edges": 892104, "anomalies": [], │ │
│ │ "recommendations": ["Consider indexing label:File"]} │ │
│ │ │ │
│ │ > What queries are running slow? │ │
│ │ │ │
│ │ Analyzing query logs... │ │
│ │ ████████░░░░░░░░ 50% │ │
│ │ │ │
│ ├──────────────────────────────────────────────────────────┤ │
│ │ > _ [Send]│ │
│ └──────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
```
### 13.2 UI Component (React + TypeScript)
Inspired by Pegasus CliView, modernized for React 18:
```typescript
// frontend/src/components/SLMPortal/SLMPortal.tsx
import React, { useState, useRef, useEffect, useCallback } from 'react';
import { motion, AnimatePresence } from 'framer-motion';
import './SLMPortal.css';
interface Message {
id: string;
role: 'user' | 'assistant' | 'system';
content: string;
timestamp: Date;
streaming?: boolean;
}
interface SLMPortalProps {
isOpen: boolean;
onClose: () => void;
modelName?: string;
}
export const SLMPortal: React.FC<SLMPortalProps> = ({
isOpen,
onClose,
modelName = 'qwen2.5-0.5b'
}) => {
const [messages, setMessages] = useState<Message[]>([
{ id: '0', role: 'system', content: '#nornicdb-slm\n\nCognitive Database Assistant Ready', timestamp: new Date() }
]);
const [input, setInput] = useState('');
const [isStreaming, setIsStreaming] = useState(false);
const [commandHistory, setCommandHistory] = useState<string[]>([]);
const [historyIndex, setHistoryIndex] = useState(-1);
const scrollRef = useRef<HTMLDivElement>(null);
const inputRef = useRef<HTMLTextAreaElement>(null);
const wsRef = useRef<WebSocket | null>(null);
// Auto-scroll to bottom
useEffect(() => {
if (scrollRef.current) {
scrollRef.current.scrollTop = scrollRef.current.scrollHeight;
}
}, [messages]);
// Focus input when opened
useEffect(() => {
if (isOpen && inputRef.current) {
inputRef.current.focus();
}
}, [isOpen]);
// WebSocket connection for streaming
const connectWS = useCallback(() => {
const protocol = window.location.protocol === 'https:' ? 'wss:' : 'ws:';
const ws = new WebSocket(`${protocol}//${window.location.host}/api/bifrost/chat/stream`);
ws.onopen = () => {
console.log('Bifrost connected');
};
ws.onmessage = (event) => {
const data = JSON.parse(event.data);
if (data.type === 'token') {
// Streaming token - append to last message
setMessages(prev => {
const last = prev[prev.length - 1];
if (last?.streaming) {
return [
...prev.slice(0, -1),
{ ...last, content: last.content + data.token }
];
}
return prev;
});
} else if (data.type === 'done') {
// Stream complete
setMessages(prev => {
const last = prev[prev.length - 1];
if (last?.streaming) {
return [
...prev.slice(0, -1),
{ ...last, streaming: false }
];
}
return prev;
});
setIsStreaming(false);
} else if (data.type === 'error') {
setMessages(prev => [
...prev,
{ id: crypto.randomUUID(), role: 'system', content: `Error: ${data.message}`, timestamp: new Date() }
]);
setIsStreaming(false);
}
};
ws.onerror = (error) => {
console.error('Bifrost error:', error);
setIsStreaming(false);
};
ws.onclose = () => {
console.log('Bifrost disconnected');
// Auto-reconnect after 3s
setTimeout(connectWS, 3000);
};
wsRef.current = ws;
}, []);
useEffect(() => {
if (isOpen) {
connectWS();
}
return () => {
wsRef.current?.close();
};
}, [isOpen, connectWS]);
const sendMessage = () => {
if (!input.trim() || isStreaming) return;
const userMessage: Message = {
id: crypto.randomUUID(),
role: 'user',
content: input,
timestamp: new Date()
};
// Add to history
setCommandHistory(prev => [...prev, input]);
setHistoryIndex(-1);
// Add user message
setMessages(prev => [...prev, userMessage]);
// Add placeholder for assistant response
const assistantMessage: Message = {
id: crypto.randomUUID(),
role: 'assistant',
content: '',
timestamp: new Date(),
streaming: true
};
setMessages(prev => [...prev, assistantMessage]);
// Send via WebSocket
setIsStreaming(true);
wsRef.current?.send(JSON.stringify({
type: 'chat',
content: input,
model: modelName
}));
setInput('');
};
const handleKeyDown = (e: React.KeyboardEvent) => {
if (e.key === 'Enter' && !e.shiftKey) {
e.preventDefault();
sendMessage();
} else if (e.key === 'ArrowUp') {
e.preventDefault();
if (historyIndex < commandHistory.length - 1) {
const newIndex = historyIndex + 1;
setHistoryIndex(newIndex);
setInput(commandHistory[commandHistory.length - 1 - newIndex]);
}
} else if (e.key === 'ArrowDown') {
e.preventDefault();
if (historyIndex > 0) {
const newIndex = historyIndex - 1;
setHistoryIndex(newIndex);
setInput(commandHistory[commandHistory.length - 1 - newIndex]);
} else {
setHistoryIndex(-1);
setInput('');
}
} else if (e.key === 'Escape') {
onClose();
}
};
return (
<AnimatePresence>
{isOpen && (
<motion.div
className="slm-portal-overlay"
initial={{ opacity: 0 }}
animate={{ opacity: 1 }}
exit={{ opacity: 0 }}
>
<motion.div
className="slm-portal"
initial={{ y: '100%' }}
animate={{ y: 0 }}
exit={{ y: '100%' }}
transition={{ type: 'spring', damping: 25, stiffness: 300 }}
>
{/* Header */}
<div className="slm-portal-header">
<div className="slm-portal-title">
<span className="slm-icon">🧠</span>
<span>Bifrost Portal</span>
</div>
<div className="slm-portal-model">
<select defaultValue={modelName}>
<option value="qwen2.5-0.5b">qwen2.5-0.5b</option>
<option value="qwen2.5-1.5b">qwen2.5-1.5b</option>
<option value="phi-3.5-mini">phi-3.5-mini</option>
</select>
</div>
<button className="slm-portal-close" onClick={onClose}>×</button>
</div>
{/* Messages */}
<div className="slm-portal-messages" ref={scrollRef}>
{messages.map((msg) => (
<div key={msg.id} className={`slm-message slm-message-${msg.role}`}>
<span className="slm-message-prefix">
{msg.role === 'user' ? '> ' : msg.role === 'system' ? '# ' : ''}
</span>
<span className="slm-message-content">
{msg.content}
{msg.streaming && <span className="slm-cursor">▋</span>}
</span>
</div>
))}
</div>
{/* Input */}
<div className="slm-portal-input-container">
<span className="slm-input-prefix">></span>
<textarea
ref={inputRef}
className="slm-portal-input"
value={input}
onChange={(e) => setInput(e.target.value)}
onKeyDown={handleKeyDown}
placeholder="Enter command..."
disabled={isStreaming}
rows={1}
/>
<button
className="slm-send-button"
onClick={sendMessage}
disabled={isStreaming || !input.trim()}
>
{isStreaming ? '...' : 'Send'}
</button>
</div>
</motion.div>
</motion.div>
)}
</AnimatePresence>
);
};
```
### 13.3 Portal Styling (CSS)
```css
/* frontend/src/components/SLMPortal/SLMPortal.css */
.slm-portal-overlay {
position: fixed;
inset: 0;
background: rgba(0, 0, 0, 0.5);
backdrop-filter: blur(4px);
z-index: 1000;
display: flex;
align-items: flex-end;
justify-content: center;
}
.slm-portal {
width: 100%;
max-width: 900px;
height: 60vh;
min-height: 400px;
display: flex;
flex-direction: column;
border-radius: 12px 12px 0 0;
overflow: hidden;
box-shadow: 0 -10px 40px rgba(0, 0, 0, 0.4);
/* Translucent dark background - Pegasus style */
background: linear-gradient(
135deg,
rgba(15, 23, 42, 0.95) 0%,
rgba(30, 41, 59, 0.92) 100%
);
backdrop-filter: blur(20px);
border: 1px solid rgba(255, 255, 255, 0.1);
}
.slm-portal-header {
display: flex;
align-items: center;
padding: 12px 16px;
background: rgba(0, 0, 0, 0.3);
border-bottom: 1px solid rgba(255, 255, 255, 0.1);
}
.slm-portal-title {
display: flex;
align-items: center;
gap: 8px;
font-family: 'SF Mono', 'Fira Code', monospace;
font-size: 14px;
font-weight: 600;
color: #10b981; /* Emerald green */
}
.slm-icon {
font-size: 18px;
}
.slm-portal-model {
margin-left: auto;
margin-right: 16px;
}
.slm-portal-model select {
background: rgba(255, 255, 255, 0.1);
border: 1px solid rgba(255, 255, 255, 0.2);
border-radius: 6px;
color: #94a3b8;
padding: 4px 8px;
font-family: 'SF Mono', monospace;
font-size: 12px;
cursor: pointer;
}
.slm-portal-close {
background: none;
border: none;
color: #64748b;
font-size: 24px;
cursor: pointer;
padding: 0 8px;
transition: color 0.2s;
}
.slm-portal-close:hover {
color: #ef4444;
}
.slm-portal-messages {
flex: 1;
overflow-y: auto;
padding: 16px;
font-family: 'SF Mono', 'Fira Code', 'Consolas', monospace;
font-size: 14px;
line-height: 1.6;
/* Inset shadow for depth */
box-shadow: inset 0 20px 40px -20px rgba(0, 0, 0, 0.5);
}
.slm-message {
margin-bottom: 8px;
white-space: pre-wrap;
word-break: break-word;
}
.slm-message-user {
color: #f97316; /* Orange - command input */
}
.slm-message-assistant {
color: #22d3ee; /* Cyan - SLM response */
}
.slm-message-system {
color: #10b981; /* Emerald - system messages */
opacity: 0.8;
}
.slm-message-prefix {
color: #64748b;
user-select: none;
}
.slm-cursor {
animation: blink 1s step-end infinite;
color: #22d3ee;
}
@keyframes blink {
50% { opacity: 0; }
}
.slm-portal-input-container {
display: flex;
align-items: center;
padding: 12px 16px;
background: rgba(0, 0, 0, 0.4);
border-top: 1px solid rgba(255, 255, 255, 0.1);
gap: 8px;
}
.slm-input-prefix {
color: #f97316;
font-family: 'SF Mono', monospace;
font-weight: bold;
}
.slm-portal-input {
flex: 1;
background: rgba(255, 255, 255, 0.05);
border: 1px solid rgba(255, 255, 255, 0.1);
border-radius: 6px;
color: #f8fafc;
font-family: 'SF Mono', 'Fira Code', monospace;
font-size: 14px;
padding: 8px 12px;
resize: none;
outline: none;
transition: border-color 0.2s;
}
.slm-portal-input:focus {
border-color: #f97316;
}
.slm-portal-input::placeholder {
color: #475569;
}
.slm-send-button {
background: linear-gradient(135deg, #f97316, #ea580c);
border: none;
border-radius: 6px;
color: white;
font-family: 'SF Mono', monospace;
font-size: 12px;
font-weight: 600;
padding: 8px 16px;
cursor: pointer;
transition: transform 0.1s, opacity 0.2s;
}
.slm-send-button:hover:not(:disabled) {
transform: scale(1.02);
}
.slm-send-button:disabled {
opacity: 0.5;
cursor: not-allowed;
}
/* Scrollbar styling */
.slm-portal-messages::-webkit-scrollbar {
width: 8px;
}
.slm-portal-messages::-webkit-scrollbar-track {
background: rgba(255, 255, 255, 0.05);
}
.slm-portal-messages::-webkit-scrollbar-thumb {
background: rgba(255, 255, 255, 0.2);
border-radius: 4px;
}
.slm-portal-messages::-webkit-scrollbar-thumb:hover {
background: rgba(255, 255, 255, 0.3);
}
```
### 13.4 Backend: TLS WebSocket Stream Endpoint
```go
// pkg/heimdall/api/chat_handler.go
package api
import (
"context"
"encoding/json"
"net/http"
"time"
"github.com/gorilla/websocket"
"github.com/orneryd/nornicdb/pkg/auth"
"github.com/orneryd/nornicdb/pkg/heimdall"
)
var upgrader = websocket.Upgrader{
ReadBufferSize: 1024,
WriteBufferSize: 1024,
CheckOrigin: func(r *http.Request) bool {
// In production, validate origin
return true
},
}
type ChatHandler struct {
	scheduler *heimdall.ModelScheduler
	authz     *auth.Authorizer
}
type ChatMessage struct {
Type string `json:"type"` // "chat", "ping"
Content string `json:"content"`
Model string `json:"model,omitempty"`
}
type StreamToken struct {
	Type    string `json:"type"` // "token", "done", "error", "pong"
Token string `json:"token,omitempty"`
Message string `json:"message,omitempty"`
}
// HandleChatStream handles WebSocket connections for SLM chat
// Requires admin RBAC role
func (h *ChatHandler) HandleChatStream(w http.ResponseWriter, r *http.Request) {
// RBAC check - admin only
user := auth.UserFromContext(r.Context())
if user == nil || !h.authz.HasRole(user, "admin") {
http.Error(w, "Forbidden: admin role required", http.StatusForbidden)
return
}
// Upgrade to WebSocket
conn, err := upgrader.Upgrade(w, r, nil)
if err != nil {
http.Error(w, "WebSocket upgrade failed", http.StatusInternalServerError)
return
}
defer conn.Close()
// Set read deadline for pings
conn.SetReadDeadline(time.Now().Add(60 * time.Second))
conn.SetPongHandler(func(string) error {
conn.SetReadDeadline(time.Now().Add(60 * time.Second))
return nil
})
for {
_, message, err := conn.ReadMessage()
if err != nil {
if websocket.IsUnexpectedCloseError(err, websocket.CloseGoingAway, websocket.CloseAbnormalClosure) {
// Log error
}
break
}
var msg ChatMessage
if err := json.Unmarshal(message, &msg); err != nil {
h.sendError(conn, "Invalid message format")
continue
}
switch msg.Type {
case "chat":
h.handleChat(conn, msg)
case "ping":
conn.WriteJSON(StreamToken{Type: "pong"})
}
}
}
func (h *ChatHandler) handleChat(conn *websocket.Conn, msg ChatMessage) {
ctx, cancel := context.WithTimeout(context.Background(), 60*time.Second)
defer cancel()
// Get or load the requested model
model, err := h.scheduler.GetModel(msg.Model)
if err != nil {
h.sendError(conn, "Model not available: "+err.Error())
return
}
// Build prompt with system context
prompt := h.buildPrompt(msg.Content)
// Stream generation with token callback
	err = model.GenerateStream(ctx, prompt, heimdall.GenerateParams{
MaxTokens: 512,
Temperature: 0.3,
StopTokens: []string{"<|im_end|>", "<|endoftext|>"},
}, func(token string) error {
return conn.WriteJSON(StreamToken{
Type: "token",
Token: token,
})
})
if err != nil {
h.sendError(conn, "Generation failed: "+err.Error())
return
}
// Signal completion
conn.WriteJSON(StreamToken{Type: "done"})
}
func (h *ChatHandler) buildPrompt(userInput string) string {
return `<|im_start|>system
You are a cognitive database assistant embedded in NornicDB. You help administrators:
- Analyze graph health and structure
- Diagnose performance issues
- Suggest optimizations
- Answer questions about the database state
You have access to graph statistics and can run analysis queries.
Output structured JSON when analyzing data. Be concise and technical.
<|im_end|>
<|im_start|>user
` + userInput + `
<|im_end|>
<|im_start|>assistant
`
}
func (h *ChatHandler) sendError(conn *websocket.Conn, msg string) {
conn.WriteJSON(StreamToken{
Type: "error",
Message: msg,
})
}
```
### 13.5 TLS Configuration
```go
// pkg/heimdall/api/tls.go
type TLSConfig struct {
CertFile string
KeyFile string
// For self-signed certs in dev
InsecureSkipVerify bool
}
// The Bifrost runs on a separate TLS port
// Default: 7475 (adjacent to HTTP 7474)
func StartSLMPortalServer(cfg TLSConfig, handler *ChatHandler) error {
mux := http.NewServeMux()
// WebSocket endpoint
mux.HandleFunc("/api/bifrost/chat/stream", handler.HandleChatStream)
// REST endpoints for model management
mux.HandleFunc("/api/bifrost/models", handler.ListModels)
mux.HandleFunc("/api/bifrost/models/load", handler.LoadModel)
server := &http.Server{
Addr: ":7475",
Handler: mux,
TLSConfig: &tls.Config{
MinVersion: tls.VersionTLS12,
CipherSuites: []uint16{
tls.TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,
tls.TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,
},
},
}
return server.ListenAndServeTLS(cfg.CertFile, cfg.KeyFile)
}
```
### 13.6 RBAC Integration
```go
// pkg/config/rbac.json - Add SLM permissions
{
"roles": {
"admin": {
"permissions": [
"slm:chat",
"slm:models:read",
"slm:models:load",
"slm:models:unload",
"slm:config:read",
"slm:config:write"
]
},
"operator": {
"permissions": [
"slm:models:read"
]
},
"user": {
"permissions": []
}
}
}
```
### 13.7 Environment Variables
```bash
# Bifrost Configuration
NORNICDB_HEIMDALL_PORTAL_ENABLED=true
NORNICDB_HEIMDALL_PORTAL_PORT=7475
NORNICDB_HEIMDALL_PORTAL_TLS_CERT=/certs/slm-portal.crt
NORNICDB_HEIMDALL_PORTAL_TLS_KEY=/certs/slm-portal.key
NORNICDB_HEIMDALL_PORTAL_ALLOWED_ROLES=admin
```
---
## 14. Updated Implementation Plan
### Phase 1: Foundation (Week 1-2)
- [ ] BYOM Model Registry (`pkg/heimdall/registry.go`)
- [ ] Extend `pkg/localllm/llama.go` with `GenerateStream()`
- [ ] Model scheduler with LRU eviction
- [ ] Unit tests for model loading/switching
### Phase 2: Chat Portal Backend (Week 3-4)
- [ ] WebSocket stream handler
- [ ] TLS server on separate port
- [ ] RBAC integration for admin-only access
- [ ] Streaming token generation
### Phase 3: Chat Portal UI (Week 5-6)
- [ ] React SLMPortal component
- [ ] Translucent terminal styling
- [ ] Command history (up/down arrows)
- [ ] Model selector dropdown
- [ ] Keyboard shortcuts (Escape to close)
### Phase 4: Integration & Polish (Week 7-8)
- [ ] Graph context injection (stats, health)
- [ ] Built-in commands (`/health`, `/stats`, `/models`)
- [ ] Prometheus metrics for portal usage
- [ ] Documentation + screenshots
- [ ] Security audit
---
## 15. Security Considerations
| Concern | Mitigation |
|---------|------------|
| Unauthorized SLM access | Admin RBAC required, JWT validation |
| Prompt injection | Input sanitization, output schema validation |
| TLS downgrade | Min TLS 1.2, strong cipher suites |
| DoS via long generations | Max tokens limit, request timeout |
| Model path traversal | Validate model names against registry |
| WebSocket hijacking | Origin validation, CSRF tokens |
---
## 16. Implemented Features (v1.0.0)
The following features from this proposal have been implemented:
### Core Heimdall System
- ✅ Multi-model management (embedding + reasoning)
- ✅ GPU-accelerated inference via llama.cpp
- ✅ BYOM (Bring Your Own Model) support
- ✅ CPU fallback when GPU unavailable
### Bifrost Chat Interface
- ✅ OpenAI-compatible API endpoints
- ✅ Server-Sent Events (SSE) for streaming
- ✅ Norse-themed UI with translucent terminal styling
- ✅ Session-persistent chat history
- ✅ Built-in commands (/help, /clear, /status, /model)
### Plugin Architecture
- ✅ `HeimdallPlugin` interface for subsystem management
- ✅ Action registration and invocation system
- ✅ Built-in Watcher plugin with hello-world example
### Advanced Plugin Features (NEW)
#### Optional Lifecycle Hooks
Plugins can implement optional interfaces:
```go
// PrePromptHook - Modify prompts before SLM processing
type PrePromptHook interface {
PrePrompt(ctx *PromptContext) error
}
// PreExecuteHook - Validate/modify before action execution
type PreExecuteHook interface {
PreExecute(ctx *PreExecuteContext, done func(PreExecuteResult))
}
// PostExecuteHook - Post-execution logging/state updates
type PostExecuteHook interface {
PostExecute(ctx *PostExecuteContext)
}
// DatabaseEventHook - React to database operations
type DatabaseEventHook interface {
OnDatabaseEvent(event *DatabaseEvent)
}
```
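For example, a plugin could implement only `PostExecuteHook` to build the audit trail called for in Section 10.2. The `AuditPlugin` type and the `PostExecuteContext` field names are assumptions:

```go
import "log"

// AuditPlugin records every executed Heimdall action for later review.
// It implements only PostExecuteHook; the other lifecycle hooks are optional.
type AuditPlugin struct {
	logger *log.Logger
}

func (p *AuditPlugin) PostExecute(ctx *heimdall.PostExecuteContext) {
	// Field names (Action, Success, Duration) are assumed for illustration.
	p.logger.Printf("heimdall action=%s success=%v duration=%s",
		ctx.Action, ctx.Success, ctx.Duration)
}
```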
#### Database Event Types
The `DatabaseEventHook` receives events for:
| Event Type | Description |
|------------|-------------|
| `node.created`, `node.updated`, `node.deleted`, `node.read` | Node operations |
| `relationship.created`, `relationship.updated`, `relationship.deleted` | Relationship operations |
| `query.executed`, `query.failed` | Query execution |
| `index.created`, `index.dropped` | Index operations |
| `transaction.commit`, `transaction.rollback` | Transaction events |
| `database.started`, `database.shutdown` | System events |
| `backup.started`, `backup.completed` | Backup events |
#### Autonomous Action Invocation
Plugins can autonomously trigger SLM actions via `HeimdallInvoker`:
```go
type HeimdallInvoker interface {
// Synchronous action invocation
InvokeAction(action string, params map[string]interface{}) (*ActionResult, error)
// Send natural language prompt to SLM
SendPrompt(prompt string) (*ActionResult, error)
// Async versions (fire-and-forget)
InvokeActionAsync(action string, params map[string]interface{})
SendPromptAsync(prompt string)
}
```
**Example: Autonomous Anomaly Detection**
```go
func (p *SecurityPlugin) OnDatabaseEvent(event *heimdall.DatabaseEvent) {
if event.Type == heimdall.EventQueryFailed {
p.failureCount++
if p.failureCount >= 5 && p.ctx.Heimdall != nil {
// Trigger analysis after threshold exceeded
p.ctx.Heimdall.InvokeActionAsync("heimdall.anomaly.detect", map[string]interface{}{
"trigger": "autonomous",
"reason": "query_failures",
})
}
}
}
```
#### Inline Notification System
Notifications from lifecycle hooks are queued and sent inline with streaming responses:
```go
func (p *MyPlugin) PrePrompt(ctx *heimdall.PromptContext) error {
ctx.NotifyInfo("Processing", "Analyzing your request...")
return nil
}
```
Notification flow:
1. PrePrompt notifications → sent before AI response
2. PreExecute notifications → sent after AI response, before action result
3. PostExecute notifications → sent after action result
The UI displays notifications with a `[Heimdall]:` prefix and distinct styling.
#### Request Cancellation
Any lifecycle hook can cancel a request:
```go
func (p *MyPlugin) PrePrompt(ctx *heimdall.PromptContext) error {
if !p.isAuthorized(ctx.UserMessage) {
ctx.Cancel("Unauthorized request", "PrePrompt:myplugin")
return nil
}
return nil
}
```
Cancellation:
- Stops the request immediately
- Sends cancellation message to user via Bifrost
- Logs reason and cancelling hook
### Data Flow Architecture
```
User Message → Bifrost → PromptContext
│
┌─────────▼─────────┐
│ PrePrompt Hooks │ → Notifications queued
│ (can cancel) │
└─────────┬─────────┘
│
┌─────────▼─────────┐
│ Heimdall SLM │ → Generates action JSON
└─────────┬─────────┘
│
┌─────────▼─────────┐
│ PreExecute Hooks │ → Validate/modify params
│ (can cancel) │
└─────────┬─────────┘
│
┌─────────▼─────────┐
│ Action Execution │ → Plugin handler runs
└─────────┬─────────┘
│
┌─────────▼─────────┐
│ PostExecute Hooks │ → Log, notify, update state
└─────────┬─────────┘
│
┌─────────▼─────────┐
│ Streaming Response│ → Notifications + result
└───────────────────┘
```
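The flow above can be driven by a small dispatcher that walks registered plugins in order. A sketch for the PrePrompt stage, with the same pattern applying to PreExecute and PostExecute; the `Cancelled()` accessor is an assumption (only `Cancel()` is shown earlier):

```go
// runPrePrompt invokes PrePrompt on every plugin that implements the optional
// hook, stopping early on error or when a hook cancels the request.
func runPrePrompt(plugins []any, pctx *heimdall.PromptContext) error {
	for _, p := range plugins {
		hook, ok := p.(heimdall.PrePromptHook)
		if !ok {
			continue // hooks are optional; skip plugins that don't implement this one
		}
		if err := hook.PrePrompt(pctx); err != nil {
			return err
		}
		if pctx.Cancelled() { // assumed accessor over the Cancel() state shown above
			return nil // cancellation already carries its own message back to the user
		}
	}
	return nil
}
```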
---