Skip to main content
Glama
ENHANCED_INDEX_ARCHITECTURE.md11.8 kB
# Enhanced Index Architecture Diagram ## Overview The Enhanced Capability Index is a complex system designed to provide semantic relationships, verb-based triggers, and extensible metadata for portfolio elements. This document diagrams the architecture to identify issues and plan integration. ## Component Hierarchy ``` ┌─────────────────────────────────────────────────────────────────┐ │ EnhancedIndexManager │ │ (Singleton) │ │ - Orchestrates entire indexing process │ │ - Manages file locking and caching │ │ - Controls index building and retrieval │ └─────────────────┬──────────────────────────┬────────────────────┘ │ │ ▼ ▼ ┌──────────────────────┐ ┌────────────────────┐ │ IndexConfigManager │ │ FileLock │ │ (Singleton) │ │ (Instance per │ │ │ │ index file) │ │ - Central config │ │ │ │ - Performance limits │ │ - Prevents races │ │ - NLP thresholds │ │ - Stale detection │ └──────────────────────┘ └────────────────────┘ │ ├──────────────┬────────────┬──────────────┐ ▼ ▼ ▼ ▼ ┌──────────────────┐ ┌──────────┐ ┌─────────────┐ ┌──────────────────┐ │ NLPScoringManager│ │ VerbTrigger│ │Relationship│ │PortfolioIndexMgr│ │ (Instance) │ │ Manager │ │ Manager │ │ (Singleton) │ │ │ │(Singleton) │ │(Singleton) │ │ │ │ - Jaccard calc │ │ │ │ │ │ - File scanning │ │ - Entropy calc │ │ - Verb │ │ - Pattern │ │ - Metadata parse │ │ - LRU cache │ │ mapping │ │ matching │ │ - Entry creation │ │ - Score caching │ │ - Category │ │ - Inverse │ └──────────┬───────┘ └──────────────────┘ │ detect │ │ rels │ │ └─────┬──────┘ └────────────┘ ▼ │ ┌──────────────────┐ ▼ │ PortfolioManager │ ⚠️ CIRCULAR! │ (Singleton) │ Calls getIndex() │ │ during build! │ - Directory scan │ │ - File validation│ └──────────────────┘ ``` ## Data Flow & Processing Pipeline ``` 1. INDEX BUILD REQUEST │ ▼ 2. FILE LOCK ACQUISITION (60s timeout) │ ▼ 3. PORTFOLIO SCANNING │ ├─> Read all .md files from: │ - ~/.dollhouse/portfolio/personas/ │ - ~/.dollhouse/portfolio/skills/ │ - ~/.dollhouse/portfolio/templates/ │ - ~/.dollhouse/portfolio/agents/ │ - ~/.dollhouse/portfolio/memories/ │ - ~/.dollhouse/portfolio/ensembles/ │ ▼ 4. METADATA EXTRACTION (Per File) │ ├─> SecureYamlParser.parse() │ ├─> Size validation (64KB YAML, 1MB content) │ ├─> YAML bomb detection │ ├─> Unicode normalization │ ├─> Pattern matching (CRITICAL ISSUE HERE!) │ └─> Field validation │ ▼ 5. ELEMENT DEFINITION BUILDING │ ├─> Core metadata (name, description, version) ├─> Search data (keywords, tags, triggers) ├─> Verb triggers extraction └─> Initial relationships │ ▼ 6. SEMANTIC RELATIONSHIP CALCULATION (BOTTLENECK!) │ ├─> Text preparation (combine fields) ├─> Entropy calculation (per element) ├─> Similarity matrix calculation: │ │ │ ├─> IF elements <= 20 THEN │ │ └─> Full matrix (all pairs) │ │ - O(n²) comparisons │ │ - With our fix: max 190 comparisons │ │ │ └─> IF elements > 20 THEN │ └─> Sampled relationships │ ├─> Keyword clustering (60% budget) │ └─> Cross-type sampling (40% budget) │ - With our fix: max 100 total │ ▼ 7. RELATIONSHIP DISCOVERY │ ├─> Pattern-based discovery (regex matching) ├─> Verb-based discovery (DISABLED - circular dep!) └─> Inverse relationship creation │ ▼ 8. INDEX PERSISTENCE │ └─> Save to ~/.dollhouse/portfolio/capability-index.yaml ``` ## Initialization Chain Issues ``` USER CALLS getIndex() │ ▼ EnhancedIndexManager.getInstance() │ ├─> Creates singleton if not exists │ ├─> new IndexConfigManager() │ ├─> new NLPScoringManager() │ ├─> VerbTriggerManager.getInstance() │ ├─> RelationshipManager.getInstance() │ └─> new FileLock() │ ▼ manager.getIndex() │ ├─> Check if index exists and is fresh │ └─> If stale/missing → buildIndex() │ ▼ buildIndex() │ ├─> Acquire file lock (ISSUE: Can timeout in tests) ├─> Get PortfolioIndexManager.getInstance() │ └─> Triggers full portfolio scan ├─> Process each element ├─> calculateSemanticRelationships() │ └─> ISSUE: Can run thousands of comparisons ├─> discoverRelationships() │ └─> ISSUE: VerbTriggerManager calls getIndex() = CIRCULAR! └─> Save index ``` ## Identified Issues & Bottlenecks ### 🔴 CRITICAL Issues 1. **Circular Dependency** ``` RelationshipManager.discoverVerbRelationships() → VerbTriggerManager.getVerbsForElement() → EnhancedIndexManager.getIndex() ← CIRCULAR! ``` **Status**: Fixed by disabling verb-based discovery 2. **NLP Scoring Explosion** - Original: Could make 50,000+ comparisons for 100 elements - Fixed: Limited to max 100 comparisons - **Remaining Issue**: Still makes many redundant calculations ### 🟡 MEDIUM Issues 3. **File Lock Conflicts** - Multiple processes can fight over lock - Tests create race conditions - Stale lock detection sometimes fails 4. **Security Validation False Positives** - Skills with "audit", "security", "scan" trigger alerts - Overly aggressive pattern matching - Blocks legitimate security testing tools ### 🟢 MINOR Issues 5. **Cache Efficiency** - LRU cache works but evicts too frequently - Cache keys use only first 50 chars (collision risk) - No persistent cache between runs ## Performance Analysis ### Current Timings (After Fixes) ``` Operation | Time | Details --------------------------|---------|------------------ Portfolio Scan | ~50ms | 50-100 files Metadata Extraction | ~200ms | Includes security validation Entropy Calculation | ~10ms | Per element NLP Scoring (per pair) | ~1ms | With cache hit Full Build (50 elements) | ~5000ms | With our limits Index Retrieval (cached) | ~10ms | From disk ``` ### Scaling Characteristics ``` Elements | Comparisons | Time | Strategy ---------|-------------|-------|---------- 10 | 45 | 1s | Full matrix 20 | 190 | 2s | Full matrix 50 | 100 | 3s | Sampled 100 | 100 | 5s | Sampled 500 | 100 | 8s | Sampled ``` ## Integration Points with Main App Currently **NOT INTEGRATED** - Enhanced Index is completely isolated! ``` src/index.ts (Main MCP Server) │ ├─> Uses: PortfolioManager directly ├─> Uses: PersonaManager directly ├─> Uses: SkillManager directly └─> Does NOT use: EnhancedIndexManager ❌ Where it SHOULD integrate: │ ├─> portfolio_search tool │ └─> Could use semantic relationships │ ├─> activate_element tool │ └─> Could suggest related elements │ ├─> get_active_elements tool │ └─> Could show relationships │ └─> New tools: ├─> find_similar_elements ├─> get_element_relationships └─> search_by_verb_trigger ``` ## Proposed Solutions ### Phase 1: Stabilize (Current Session) ✅ Limit comparisons aggressively ✅ Add timeout circuit breakers ✅ Disable circular dependencies ⬜ Fix security validation patterns ⬜ Improve test isolation ### Phase 2: Optimize ⬜ Implement progressive indexing (index on demand) ⬜ Add persistent cache between runs ⬜ Use worker threads for NLP calculations ⬜ Implement incremental updates (only reindex changed files) ### Phase 3: Integrate ⬜ Create new MCP tools for relationship queries ⬜ Add relationship info to existing tools ⬜ Enable verb-based triggers in main app ⬜ Add relationship-aware element suggestions ### Phase 4: Enhance ⬜ Add more relationship types ⬜ Implement element composition ⬜ Add dependency tracking ⬜ Enable cross-element validation ## Key Architectural Decisions Needed 1. **Should Enhanced Index be required or optional?** - Required: All users get relationships but pay performance cost - Optional: Faster startup but features may be missing - **Recommendation**: Optional with lazy loading 2. **How to handle verb triggers without circular deps?** - Option A: Two-phase building (relationships after index) - Option B: Pass index to verb manager instead of fetching - Option C: Separate verb index file - **Recommendation**: Option B 3. **What's the right comparison limit?** - Current: 100 (very conservative) - Original: 10,000+ (too high) - **Recommendation**: 500 with better sampling 4. **How to integrate with main app?** - Option A: Replace existing managers - Option B: Augment with optional features - Option C: Parallel system with migration path - **Recommendation**: Option B ## Next Steps 1. **Immediate** (This session): - Fix security validation patterns - Re-enable verb triggers safely - Increase comparison limit to 500 2. **Short term** (Next session): - Add integration points to main app - Create new relationship-aware tools - Implement incremental indexing 3. **Long term**: - Full production integration - Performance optimization - Feature enhancement --- *Created: September 24, 2025* *Status: Architecture documented, stabilization in progress*

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/DollhouseMCP/mcp-server'

If you have feedback or need assistance with the MCP directory API, please join our Discord server