Skip to main content
Glama
orneryd

M.I.M.I.R - Multi-agent Intelligent Memory & Insight Repository

by orneryd
NORNICDB_NATIVE_EMBEDDINGS_ANALYSIS.mdβ€’17.7 kB
# Critical Analysis: NornicDB Native Embeddings Plan **Analyst:** Cascade AI **Date:** November 28, 2025 **Status:** ⚠️ **PLAN REQUIRES SIGNIFICANT REVISION** --- ## Executive Summary The plan in `NORNICDB_NATIVE_EMBEDDINGS_PLAN.md` has **fundamental gaps** that prevent implementation: | Issue | Severity | Impact | |-------|----------|--------| | NornicDB native embeddings **not yet implemented** | πŸ”΄ Critical | Plan assumes features that don't exist | | No database provider detection in codebase | πŸ”΄ Critical | Core foundation missing | | No strategy pattern exists | πŸ”΄ Critical | Would require extensive refactoring | | `src/services/` directory doesn't exist | 🟑 Medium | File paths don't match reality | | VisionLanguageService naming mismatch | 🟒 Low | Already exists as `VLService` | **Recommendation:** Either wait for NornicDB native embeddings OR implement a simpler "skip embeddings" approach. --- ## 1. Critical Finding: NornicDB Native Embeddings Are Not Ready ### From NornicDB Architecture Docs **File:** `nornicdb/docs/ARCHITECTURE.md` (lines 144-160) ``` NornicDB does NOT: - Generate embeddings (Mimir handles this via Ollama/OpenAI) - Read source files (Mimir handles file indexing) NornicDB DOES: - Receive pre-embedded nodes from Mimir - Store nodes and relationships - Provide vector similarity search using existing embeddings ``` ### From NornicDB Implementation Plans The plan references native embedding generation, but: - `LOCAL_GGUF_EMBEDDING_IMPLEMENTATION.md` is a **future RFC**, not implemented - `LOCAL_GGUF_EMBEDDING_FEASIBILITY.md` is a **feasibility study**, estimated 2-4 weeks for core + 2-3 months production-ready - No Go code exists in `nornicdb/pkg/embed/` for local GGUF execution - Current NornicDB only has Ollama/OpenAI embedder interfaces **Conclusion:** The plan assumes NornicDB can generate embeddings natively **RIGHT NOW**, but this capability doesn't exist yet. --- ## 2. Missing Foundation: No Provider Detection ### What the Plan Proposes ```typescript class DatabaseProviderDetector { async detectProvider(uri: string): Promise<DatabaseProvider> { // Method 1: Check health endpoint for NornicDB signature const healthCheck = await fetch(`${httpUrl}/health`); const headers = healthCheck.headers.get('x-database-engine'); if (headers?.includes('nornicdb')) return DatabaseProvider.NORNICDB; // Method 2: Query for NornicDB-specific procedures const result = await session.run('CALL dbms.procedures() ...'); // ... } } ``` ### What Actually Exists **Search Results:** Zero files contain: - `detectProvider` - `DatabaseProvider` enum - `database.*provider` pattern - `MIMIR_DATABASE_PROVIDER` environment variable **File:** `.env.default` - No database provider configuration exists **Conclusion:** The entire detection mechanism needs to be built from scratch. --- ## 3. No Strategy Pattern Exists ### What the Plan Proposes ```typescript interface EmbeddingStrategy { shouldGenerateEmbedding(contentType: string): boolean; generateEmbedding(content: string): Promise<number[] | null>; handleImageFile(imagePath: string): Promise<string>; } class NornicDBStrategy implements EmbeddingStrategy { /* ... */ } class Neo4jStrategy implements EmbeddingStrategy { /* ... */ } ``` **File to Create:** `src/services/EmbeddingStrategyFactory.ts` ### What Actually Exists **Search Results:** Zero files with "Strategy" pattern in `src/` **Current Architecture:** Direct service calls throughout codebase: ```typescript // GraphManager.ts (line 572, 786, 981) const result = await this.embeddingsService.generateEmbedding(textContent); // FileIndexer.ts (lines 503, 558) const chunkEmbeddings = await this.embeddingsService.generateChunkEmbeddings(enrichedContent); const embedding = await this.embeddingsService.generateEmbedding(enrichedContent); // nodes-api.ts (lines 641, 708) const chunks = await embeddingsService.generateChunkEmbeddings(textContent); const result = await embeddingsService.generateEmbedding(textContent); // UnifiedSearchService.ts (line 239) const queryEmbedding = await this.embeddingsService.generateEmbedding(query); // ConversationHistoryManager.ts (lines 226, 367) const embeddingResult = await this.embeddingsService.generateEmbedding(content); const queryEmbedding = await this.embeddingsService.generateEmbedding(query); ``` **Affected Files:** 10+ files call `embeddingsService` directly **Conclusion:** Implementing the strategy pattern requires refactoring **the entire embedding integration layer**. --- ## 4. File Structure Mismatches ### Plan's Proposed Structure ``` src/ β”œβ”€β”€ services/ β”‚ β”œβ”€β”€ EmbeddingStrategyFactory.ts ❌ Directory doesn't exist β”‚ └── VisionLanguageService.ts ❌ Wrong name β”œβ”€β”€ config/ β”‚ β”œβ”€β”€ database-provider.ts ❌ Doesn't exist β”‚ └── vl-provider.ts ❌ Doesn't exist ``` ### Actual Structure ``` src/ β”œβ”€β”€ indexing/ β”‚ β”œβ”€β”€ EmbeddingsService.ts βœ… Exists β”‚ β”œβ”€β”€ VLService.ts βœ… Exists (not VisionLanguageService) β”‚ └── FileIndexer.ts βœ… Exists β”œβ”€β”€ config/ β”‚ └── LLMConfigLoader.ts βœ… Exists (has VL config) β”œβ”€β”€ managers/ β”‚ β”œβ”€β”€ GraphManager.ts βœ… Exists β”‚ └── UnifiedSearchService.ts βœ… Exists ``` **Note:** `src/services/` directory does not exist. The plan needs updated file paths. --- ## 5. What Actually Works: VLService ### Good News: VL Service Already Exists **File:** `src/indexing/VLService.ts` (196 lines) Already implements: - βœ… OpenAI-compatible image description API - βœ… Data URL support for images - βœ… Configuration via LLMConfigLoader - βœ… Timeout handling (2 minutes default) - βœ… Error handling and connection testing **File:** `src/indexing/FileIndexer.ts` (lines 292-350) Already integrates: - βœ… Image detection and processing - βœ… VL service initialization - βœ… Two modes: describe (VL model) and direct multimodal embedding - βœ… ImageProcessor for resizing and data URL creation **File:** `src/config/LLMConfigLoader.ts` (lines 176-200) Already configures: - βœ… VL provider with fallback to general embeddings config - βœ… All VL environment variables - βœ… Model, API URL, context size, max tokens, temperature **Conclusion:** The VL integration is **ALREADY IMPLEMENTED**. The plan just needs to reference the existing `VLService` instead of creating a new "VisionLanguageService". --- ## 6. Real Integration Points ### Where Embedding Generation Happens | Location | Purpose | Lines | Complexity | |----------|---------|-------|------------| | **GraphManager.ts** | Node creation/updates with embeddings | 504-820 | High - core integration | | **FileIndexer.ts** | File indexing with chunk/full embeddings | 281-615 | High - chunking logic | | **nodes-api.ts** | REST API for node operations | 638-726 | Medium - API layer | | **UnifiedSearchService.ts** | Vector search query embeddings | 236-240 | Low - single call | | **ConversationHistoryManager.ts** | Chat message embeddings | 223-230, 364-370 | Medium - dual calls | | **DocumentParser.ts** | PDF/DOCX text extraction | 65-164 | Low - no embedding here | | **backfill-embeddings.ts** | Migration script | 89-252 | Low - one-off script | ### Key Observation The two **highest impact** locations are: 1. **GraphManager.ts** - Creates nodes with embeddings (called from everywhere) 2. **FileIndexer.ts** - Indexes files with embeddings (called from file watcher) **Strategy:** Focus detection and strategy pattern in these two files first. --- ## 7. Simpler Alternative Approach Instead of the complex strategy pattern, consider a **conditional flag approach**: ### Option A: Simple Detection with Skip Flag ```typescript // In GraphManager.ts constructor private skipEmbeddings: boolean = false; async initialize(): Promise<void> { // Detect database provider this.skipEmbeddings = await this.detectNornicDB(); if (this.skipEmbeddings) { console.log('πŸ”§ Detected NornicDB - embeddings will be handled by database'); this.embeddingsService = null; } else { console.log('πŸ”§ Detected Neo4j - Mimir will generate embeddings'); this.embeddingsService = new EmbeddingsService(); await this.embeddingsService.initialize(); } } private async detectNornicDB(): Promise<boolean> { // Simple detection: check for NornicDB-specific response const session = this.driver.session(); try { const result = await session.run('RETURN 1 as test'); const summary = result.summary; const serverAgent = summary.server?.agent || ''; return serverAgent.toLowerCase().includes('nornicdb'); } catch { return false; } finally { await session.close(); } } // Then in addNode/updateNode: if (!this.skipEmbeddings && this.embeddingsService?.isEnabled()) { // Generate embeddings } ``` **Pros:** - Much simpler implementation (1-2 days vs 19-28 hours claimed) - Minimal refactoring needed - Backward compatible - Easy to test **Cons:** - Not as elegant as strategy pattern - Conditional logic scattered across methods ### Option B: Wait for NornicDB Native Embeddings According to `LOCAL_GGUF_EMBEDDING_IMPLEMENTATION.md`: - Estimated effort: 2-4 weeks for core - Additional 2-3 months for production-ready - Not yet started **Recommendation:** Implement Option A (simple skip flag) now, then migrate to native embeddings when NornicDB implements them. --- ## 8. Updated Implementation Phases ### Phase 0: Foundation (NEW - Not in Original Plan) **Estimated:** 2-3 days - βœ… Create database provider detection utility - βœ… Add `MIMIR_DATABASE_PROVIDER` environment variable - βœ… Test detection against Neo4j and NornicDB - βœ… Document detection mechanism ### Phase 1: Detection & Configuration **Estimated:** 2-3 days (not 2-4 hours as claimed) - βœ… Implement provider detection in GraphManager - βœ… Add skip flag when NornicDB detected - βœ… Update initialization logging - βœ… Test with both databases ### Phase 2: VL Service Integration (ALREADY DONE) **Status:** βœ… **COMPLETE** - VLService.ts already exists and works - LLMConfigLoader already has VL config with fallback - FileIndexer already integrates VL service - **No work needed here** ### Phase 3: Conditional Embedding Generation **Estimated:** 3-5 days (not 4-6 hours as claimed) - Update GraphManager.addNode() to skip embeddings when flag set - Update GraphManager.updateNode() to skip embeddings when flag set - Update FileIndexer to skip embeddings when flag set - Update nodes-api.ts to skip embeddings when flag set - Ensure all 10+ integration points respect the flag ### Phase 4: Integration Testing **Estimated:** 4-6 days (not 4-6 hours as claimed) - Test with Neo4j + Mimir embeddings (existing behavior) - Test with NornicDB + skipped embeddings (new behavior) - Test VL service with both databases - Test file indexing with both databases - Test vector search with both databases - Test provider auto-detection edge cases - Test manual override via env var ### Phase 5: Documentation **Estimated:** 1-2 days - Update architecture docs - Update API docs - Add migration guide - Update environment variable docs - Add troubleshooting guide **Revised Total Estimate:** 12-19 days (not 19-28 hours) --- ## 9. Environment Variables: Actual vs Planned ### Planned Variables (Not in Codebase) ```bash MIMIR_DATABASE_PROVIDER=nornicdb # ❌ Doesn't exist MIMIR_NORNICDB_SKIP_EMBEDDINGS=true # ❌ Doesn't exist MIMIR_NORNICDB_VL_ONLY=true # ❌ Doesn't exist ``` ### Existing Variables (Already Work) ```bash # From .env.default NEO4J_URI=bolt://localhost:7687 # βœ… Exists NEO4J_USER=neo4j # βœ… Exists NEO4J_PASSWORD=password # βœ… Exists MIMIR_EMBEDDINGS_ENABLED=true # βœ… Exists # From LLMConfigLoader.ts MIMIR_EMBEDDINGS_PROVIDER=ollama # βœ… Exists MIMIR_EMBEDDINGS_MODEL=nomic-embed-text # βœ… Exists MIMIR_EMBEDDINGS_VL_PROVIDER=llama.cpp # βœ… Exists MIMIR_EMBEDDINGS_VL_API=http://... # βœ… Exists MIMIR_EMBEDDINGS_VL_MODEL=qwen2.5-vl # βœ… Exists ``` **Recommendation:** Add only `MIMIR_DATABASE_PROVIDER` for manual override. Auto-detect by default. --- ## 10. Risk Assessment ### High Risks 1. **NornicDB native embeddings may never be implemented** β†’ Use simpler skip approach 2. **Detection mechanism may be unreliable** β†’ Provide manual override 3. **Refactoring introduces bugs in existing Neo4j flow** β†’ Comprehensive testing required 4. **Performance impact of detection on startup** β†’ Cache result, don't re-detect ### Medium Risks 1. **VL service configuration complexity** β†’ Already handled by existing code 2. **Chunking behavior differs between providers** β†’ Document clearly 3. **Migration path unclear for existing users** β†’ Write migration guide ### Low Risks 1. **File path structure mismatch** β†’ Easy to fix in plan 2. **Naming inconsistency (VLService vs VisionLanguageService)** β†’ Already consistent --- ## 11. Revised Recommendations ### Immediate Actions (Week 1) 1. **Update the plan** to reflect: - NornicDB native embeddings don't exist yet - Actual file structure (`src/indexing/` not `src/services/`) - Real integration points (10+ files) - Realistic effort estimate (12-19 days) 2. **Implement simple detection**: - Add `detectNornicDB()` method to GraphManager - Add skip flag for embeddings - Test with both databases 3. **Document workaround**: - Users can manually disable embeddings with `MIMIR_EMBEDDINGS_ENABLED=false` when using NornicDB - Until native embeddings are ready ### Short-term (Weeks 2-3) 4. **Implement conditional skip logic**: - Update GraphManager methods - Update FileIndexer methods - Update API endpoints 5. **Integration testing**: - Test all code paths with both databases - Verify VL service works with both ### Long-term (Months) 6. **Wait for NornicDB native embeddings**: - Monitor `LOCAL_GGUF_EMBEDDING_IMPLEMENTATION.md` progress - When ready, migrate to native embeddings - Deprecate Mimir embedding generation for NornicDB --- ## 12. Valid Parts of the Plan These sections are **CORRECT** and should be kept: βœ… **Section 2: Vision Language Provider Configuration** - VL provider fallback to default LLM is already implemented in LLMConfigLoader - Configuration hierarchy matches existing code - Example configurations are accurate βœ… **Section 7: Configuration Changes - VL Provider Configuration** - Existing VL environment variables work as described - Priority order (explicit VL β†’ default LLM) is correct βœ… **Section 8: Backwards Compatibility Checklist - Neo4j Mode** - All Neo4j behavior is correctly preserved - Existing env vars respected βœ… **Section 9: Key Design Decisions - Decision 3: VL Service Reusability** - VLService is already reusable (see VLService.ts) - OpenAI-compatible API works as described --- ## 13. Invalid/Premature Parts of the Plan These sections need **MAJOR REVISION**: ❌ **Section 1: Database Provider Detection** - Implementation doesn't exist - Health endpoint detection untested - Cypher procedure detection may not work ❌ **Section 3: Embedding Strategy by Provider** - Strategy pattern doesn't exist - `EmbeddingStrategyFactory` needs to be created - All 10+ callsites need refactoring ❌ **Section 4: VisionLanguageService Refactor** - Already exists as VLService (different name) - Refactor not needed - Code is already unified ❌ **Section 5: File Changes Required** - File paths don't match reality - `src/services/` doesn't exist - Missing many actual integration points ❌ **Section 6: Implementation Phases** - Time estimates are 10x too optimistic - Missing Phase 0 (foundation work) - Phases 1-4 marked as complete but nothing implemented ❌ **Section 10: Code Reusability Matrix** - Assumes strategy pattern exists - "Reuse Score: 80%" is meaningless without implementation --- ## 14. Conclusion **The plan is not implementable in its current state.** ### What Must Happen First 1. **NornicDB native embeddings must be implemented** (2-4 weeks per their RFC) - OR accept that Mimir will skip embeddings when NornicDB is detected 2. **Foundation must be built**: - Database provider detection utility - Environment variable infrastructure - Testing framework for dual-database scenarios 3. **Real integration points must be identified**: - All 10+ files that call embeddingsService - Complex chunking logic in FileIndexer - Node creation/update flows in GraphManager ### Recommended Path Forward **Option A: Simple Skip Approach (Recommended)** - Implement detection + skip flag (2-3 days) - Update integration points (3-5 days) - Test thoroughly (4-6 days) - **Total: 2-3 weeks** **Option B: Wait for NornicDB Native** - Wait for `LOCAL_GGUF_EMBEDDING_IMPLEMENTATION.md` to complete (2-4 weeks + 2-3 months) - Then implement detection + delegation - **Total: 3-4 months** **Option C: Full Strategy Pattern (Not Recommended)** - Build entire abstraction layer - Refactor all callsites - Extensive testing - **Total: 4-6 weeks** --- **Status:** Ready for stakeholder decision on path forward. **Next Steps:** 1. Decide: Simple skip vs wait for native vs full strategy pattern 2. Update plan based on decision 3. Create Phase 0 foundation tasks 4. Begin implementation with realistic timeline --- *Analysis completed by Cascade AI - November 28, 2025*

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/orneryd/Mimir'

If you have feedback or need assistance with the MCP directory API, please join our Discord server