# Enhanced Index Architecture Diagram
## Overview
The Enhanced Capability Index provides semantic relationships, verb-based triggers, and extensible metadata for portfolio elements. This document diagrams its architecture to identify issues and plan integration with the main app.
## Component Hierarchy
```
┌─────────────────────────────────────────────────────────────────┐
│                      EnhancedIndexManager                       │
│                           (Singleton)                           │
│  - Orchestrates entire indexing process                         │
│  - Manages file locking and caching                             │
│  - Controls index building and retrieval                        │
└─────────────────┬──────────────────────────┬────────────────────┘
                  │                          │
                  ▼                          ▼
       ┌──────────────────────┐    ┌────────────────────┐
       │ IndexConfigManager   │    │ FileLock           │
       │ (Singleton)          │    │ (Instance per      │
       │                      │    │  index file)       │
       │ - Central config     │    │                    │
       │ - Performance limits │    │ - Prevents races   │
       │ - NLP thresholds     │    │ - Stale detection  │
       └──────────────────────┘    └────────────────────┘
                  │
         ┌────────┴────────┬───────────────┬─────────────────┐
         ▼                 ▼               ▼                 ▼
┌──────────────────┐ ┌────────────┐ ┌─────────────┐ ┌──────────────────┐
│ NLPScoringManager│ │ VerbTrigger│ │Relationship │ │ PortfolioIndexMgr│
│ (Instance)       │ │ Manager    │ │ Manager     │ │ (Singleton)      │
│                  │ │(Singleton) │ │(Singleton)  │ │                  │
│ - Jaccard calc   │ │            │ │             │ │ - File scanning  │
│ - Entropy calc   │ │ - Verb     │ │ - Pattern   │ │ - Metadata parse │
│ - LRU cache      │ │   mapping  │ │   matching  │ │ - Entry creation │
│ - Score caching  │ │ - Category │ │ - Inverse   │ └──────────┬───────┘
└──────────────────┘ │   detect   │ │   rels      │            │
                     └─────┬──────┘ └─────────────┘            ▼
                           │                        ┌──────────────────┐
                           ▼                        │ PortfolioManager │
                     ⚠️ CIRCULAR!                   │ (Singleton)      │
                   Calls getIndex()                 │                  │
                     during build!                  │ - Directory scan │
                                                    │ - File validation│
                                                    └──────────────────┘
```
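The NLPScoringManager box lists Jaccard and entropy calculations. The sketch below shows what those two measures compute, assuming simple lowercase word tokenization. The function names `tokenize`, `jaccard`, and `shannonEntropy` are hypothetical and are not taken from the project's code.

```typescript
// Hypothetical helpers: names and tokenization are assumptions, not the project's API.

/** Split text into lowercase alphanumeric word tokens. */
function tokenize(text: string): string[] {
  return text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
}

/** Jaccard similarity: |A ∩ B| / |A ∪ B| over the two token sets. */
function jaccard(a: string, b: string): number {
  const setA = new Set(tokenize(a));
  const setB = new Set(tokenize(b));
  if (setA.size === 0 && setB.size === 0) return 0;
  let shared = 0;
  setA.forEach((token) => {
    if (setB.has(token)) shared++;
  });
  // |A ∪ B| = |A| + |B| - |A ∩ B|
  return shared / (setA.size + setB.size - shared);
}

/** Shannon entropy (bits per token) of the token frequency distribution. */
function shannonEntropy(text: string): number {
  const tokens = tokenize(text);
  if (tokens.length === 0) return 0;
  const counts = new Map<string, number>();
  tokens.forEach((t) => counts.set(t, (counts.get(t) ?? 0) + 1));
  let entropy = 0;
  counts.forEach((count) => {
    const p = count / tokens.length;
    entropy -= p * (Math.log(p) / Math.LN2);
  });
  return entropy;
}
```

Low entropy suggests repetitive text, and a Jaccard score near 1 suggests two elements share most of their vocabulary.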
## Data Flow & Processing Pipeline
```
1. INDEX BUILD REQUEST
   │
   ▼
2. FILE LOCK ACQUISITION (60s timeout)
   │
   ▼
3. PORTFOLIO SCANNING
   │
   ├─> Read all .md files from:
   │     - ~/.dollhouse/portfolio/personas/
   │     - ~/.dollhouse/portfolio/skills/
   │     - ~/.dollhouse/portfolio/templates/
   │     - ~/.dollhouse/portfolio/agents/
   │     - ~/.dollhouse/portfolio/memories/
   │     - ~/.dollhouse/portfolio/ensembles/
   │
   ▼
4. METADATA EXTRACTION (Per File)
   │
   ├─> SecureYamlParser.parse()
   │     ├─> Size validation (64KB YAML, 1MB content)
   │     ├─> YAML bomb detection
   │     ├─> Unicode normalization
   │     ├─> Pattern matching (CRITICAL ISSUE HERE!)
   │     └─> Field validation
   │
   ▼
5. ELEMENT DEFINITION BUILDING
   │
   ├─> Core metadata (name, description, version)
   ├─> Search data (keywords, tags, triggers)
   ├─> Verb triggers extraction
   ├─> Initial relationships
   │
   ▼
6. SEMANTIC RELATIONSHIP CALCULATION (BOTTLENECK!)
   │
   ├─> Text preparation (combine fields)
   ├─> Entropy calculation (per element)
   ├─> Similarity matrix calculation:
   │     │
   │     ├─> IF elements <= 20 THEN
   │     │     └─> Full matrix (all pairs)
   │     │           - O(n²) comparisons
   │     │           - With our fix: max 190 comparisons
   │     │
   │     └─> IF elements > 20 THEN
   │           └─> Sampled relationships
   │                 ├─> Keyword clustering (60% budget)
   │                 └─> Cross-type sampling (40% budget)
   │                     - With our fix: max 100 total
   │
   ▼
7. RELATIONSHIP DISCOVERY
   │
   ├─> Pattern-based discovery (regex matching)
   ├─> Verb-based discovery (DISABLED - circular dep!)
   ├─> Inverse relationship creation
   │
   ▼
8. INDEX PERSISTENCE
   │
   └─> Save to ~/.dollhouse/portfolio/capability-index.yaml
```
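The branch in step 6 can be sketched as a small planning function. The threshold of 20 elements, the cap of 100 comparisons, and the 60/40 split come from the diagram above. The name `planComparisons` and the shape of `ComparisonPlan` are hypothetical.

```typescript
// Hypothetical planner: the threshold, cap, and split mirror the numbers in the diagram.
interface ComparisonPlan {
  strategy: "full-matrix" | "sampled";
  total: number;
  keywordClusterBudget: number;
  crossTypeBudget: number;
}

function planComparisons(
  elementCount: number,
  fullMatrixLimit = 20,
  sampledCap = 100,
): ComparisonPlan {
  if (elementCount <= fullMatrixLimit) {
    // n elements produce n * (n - 1) / 2 unordered pairs.
    const total = (elementCount * (elementCount - 1)) / 2;
    return { strategy: "full-matrix", total, keywordClusterBudget: 0, crossTypeBudget: 0 };
  }
  // 60% of the budget goes to keyword clustering, the rest to cross-type sampling.
  const keywordClusterBudget = Math.floor((sampledCap * 60) / 100);
  return {
    strategy: "sampled",
    total: sampledCap,
    keywordClusterBudget,
    crossTypeBudget: sampledCap - keywordClusterBudget,
  };
}
```

With 20 elements this yields 20 × 19 / 2 = 190 pairs, which is where the "max 190 comparisons" figure comes from.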
## Initialization Chain Issues
```
USER CALLS getIndex()
   │
   ▼
EnhancedIndexManager.getInstance()
   │
   ├─> Creates singleton if not exists
   │     ├─> new IndexConfigManager()
   │     ├─> new NLPScoringManager()
   │     ├─> VerbTriggerManager.getInstance()
   │     ├─> RelationshipManager.getInstance()
   │     └─> new FileLock()
   │
   ▼
manager.getIndex()
   │
   ├─> Check if index exists and is fresh
   │     └─> If stale/missing → buildIndex()
   │
   ▼
buildIndex()
   │
   ├─> Acquire file lock (ISSUE: Can timeout in tests)
   ├─> Get PortfolioIndexManager.getInstance()
   │     └─> Triggers full portfolio scan
   ├─> Process each element
   ├─> calculateSemanticRelationships()
   │     └─> ISSUE: Can run thousands of comparisons
   ├─> discoverRelationships()
   │     └─> ISSUE: VerbTriggerManager calls getIndex() = CIRCULAR!
   └─> Save index
```
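The lock acquisition step is where tests can time out. One way to make that step testable is to keep the staleness decision as pure logic with an injected clock and liveness check, so tests never wait on real time. The sketch below assumes a lock record that stores the owner's process ID and acquisition time. `LockRecord`, `isLockStale`, and `tryAcquire` are hypothetical names, not the project's `FileLock` API, and the sketch leaves out the atomic file operations a real lock needs.

```typescript
// Hypothetical lock record and helpers: not the project's FileLock API.
interface LockRecord {
  ownerPid: number;
  acquiredAtMs: number;
}

type IsAlive = (pid: number) => boolean;

/** A lock is stale when it is older than maxAgeMs or its owner process is gone. */
function isLockStale(
  lock: LockRecord,
  nowMs: number,
  maxAgeMs: number,
  isAlive: IsAlive,
): boolean {
  const tooOld = nowMs - lock.acquiredAtMs > maxAgeMs;
  return tooOld || !isAlive(lock.ownerPid);
}

/** Returns the new lock record, or null when a live, fresh lock is already held. */
function tryAcquire(
  current: LockRecord | null,
  myPid: number,
  nowMs: number,
  maxAgeMs: number,
  isAlive: IsAlive,
): LockRecord | null {
  if (current !== null && !isLockStale(current, nowMs, maxAgeMs, isAlive)) {
    return null;
  }
  return { ownerPid: myPid, acquiredAtMs: nowMs };
}
```

A test can pass a fixed `nowMs` and a stub `isAlive`, which avoids both the 60s timeout and cross-test lock contention.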
## Identified Issues & Bottlenecks
### 🔴 CRITICAL Issues
1. **Circular Dependency**
```
RelationshipManager.discoverVerbRelationships()
  → VerbTriggerManager.getVerbsForElement()
  → EnhancedIndexManager.getIndex() ← CIRCULAR!
```
**Status**: Worked around by disabling verb-based discovery; the underlying dependency is still unresolved
2. **NLP Scoring Explosion**
- Original: Could make 50,000+ comparisons for 100 elements
- Fixed: Capped at 190 comparisons for the full matrix (up to 20 elements) and 100 for sampling (more than 20 elements)
- **Remaining Issue**: Still makes many redundant calculations
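For the circular dependency in item 1, one way to break the cycle without disabling verb discovery is to hand the partially built index to the verb lookup instead of letting it call `getIndex()`. This corresponds to Option B under Key Architectural Decisions. The sketch below uses simplified, hypothetical shapes (`ElementEntry`, `CapabilityIndex`, `VerbLookup`); the real manager classes and index schema will differ.

```typescript
// Hypothetical, simplified shapes: the real index and manager types differ.
interface ElementEntry {
  name: string;
  verbs: string[];
}

interface CapabilityIndex {
  elements: ElementEntry[];
}

class VerbLookup {
  // The index is a parameter, so this class never calls getIndex() itself.
  getVerbsForElement(index: CapabilityIndex, elementName: string): string[] {
    const entry = index.elements.find((e) => e.name === elementName);
    return entry ? entry.verbs : [];
  }
}

/** Group element names by shared verb, using only the index passed in. */
function discoverVerbRelationships(
  index: CapabilityIndex,
  lookup: VerbLookup,
): Map<string, string[]> {
  const elementsByVerb = new Map<string, string[]>();
  index.elements.forEach((element) => {
    lookup.getVerbsForElement(index, element.name).forEach((verb) => {
      const names = elementsByVerb.get(verb) ?? [];
      names.push(element.name);
      elementsByVerb.set(verb, names);
    });
  });
  return elementsByVerb;
}
```

Because the builder owns the index and passes it down, discovery can run during `buildIndex()` without re-entering it.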
### 🟡 MEDIUM Issues
3. **File Lock Conflicts**
- Multiple processes can fight over lock
- Tests create race conditions
- Stale lock detection sometimes fails
4. **Security Validation False Positives**
- Skills with "audit", "security", "scan" trigger alerts
- Overly aggressive pattern matching
- Blocks legitimate security testing tools
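One possible direction for the false positives in item 4 is to match concrete dangerous constructs rather than bare keywords, so that a skill merely describing "audit", "security", or "scan" work is not flagged. The patterns below are illustrative assumptions only; the validator's actual rules are not shown in this document, and `findSuspiciousContent` is a hypothetical name.

```typescript
// Illustrative patterns only: the real validator's rules are not shown in this document.
const SUSPICIOUS_PATTERNS: RegExp[] = [
  /\brm\s+-rf\s+\//i, // destructive shell command on an absolute path
  /\bcurl\b[^\n|]*\|\s*(ba)?sh\b/i, // download piped straight into a shell
  /\beval\s*\(/i, // dynamic code evaluation
];

/** Return the source of every pattern that matches the given text. */
function findSuspiciousContent(text: string): string[] {
  return SUSPICIOUS_PATTERNS.filter((pattern) => pattern.test(text)).map(
    (pattern) => pattern.source,
  );
}
```

Under these assumed rules, a description such as "Security audit skill that can scan logs" passes, while text containing a download piped into a shell is still reported.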
### 🟢 MINOR Issues
5. **Cache Efficiency**
- LRU cache works but evicts too frequently
- Cache keys use only first 50 chars (collision risk)
- No persistent cache between runs
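The collision risk from 50-character keys can be reduced by deriving the key from the whole text. The sketch below uses an FNV-1a style 32-bit hash over UTF-16 code units plus the text length, and sorts the two halves so that (a, b) and (b, a) share one cache entry. The names `textKey` and `pairKey` are hypothetical. A 32-bit hash can still collide, so a cryptographic hash would be the stronger choice if that matters.

```typescript
// Hypothetical cache-key helpers: not the project's current key scheme.

/** FNV-1a style 32-bit hash over the whole string, returned as 8 hex digits. */
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return ("0000000" + (hash >>> 0).toString(16)).slice(-8);
}

/** Key for one text: its length plus a hash of its full content. */
function textKey(text: string): string {
  return text.length + "-" + fnv1a(text);
}

/** Order-independent key for a pair, so each pair is scored at most once. */
function pairKey(a: string, b: string): string {
  return [textKey(a), textKey(b)].sort().join("|");
}
```

Sorting the halves also addresses part of the redundant-calculation problem, since a symmetric score is looked up under the same key regardless of argument order.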
## Performance Analysis
### Current Timings (After Fixes)
```
Operation | Time | Details
--------------------------|---------|------------------
Portfolio Scan | ~50ms | 50-100 files
Metadata Extraction | ~200ms | Includes security validation
Entropy Calculation | ~10ms | Per element
NLP Scoring (per pair) | ~1ms | With cache hit
Full Build (50 elements) | ~5000ms | With our limits
Index Retrieval (cached) | ~10ms | From disk
```
### Scaling Characteristics
```
Elements | Comparisons | Time | Strategy
---------|-------------|-------|----------
10 | 45 | 1s | Full matrix
20 | 190 | 2s | Full matrix
50 | 100 | 3s | Sampled
100 | 100 | 5s | Sampled
500 | 100 | 8s | Sampled
```
## Integration Points with Main App
Currently **NOT INTEGRATED** - Enhanced Index is completely isolated!
```
src/index.ts (Main MCP Server)
   │
   ├─> Uses: PortfolioManager directly
   ├─> Uses: PersonaManager directly
   ├─> Uses: SkillManager directly
   └─> Does NOT use: EnhancedIndexManager ❌

Where it SHOULD integrate:
   │
   ├─> portfolio_search tool
   │     └─> Could use semantic relationships
   │
   ├─> activate_element tool
   │     └─> Could suggest related elements
   │
   ├─> get_active_elements tool
   │     └─> Could show relationships
   │
   └─> New tools:
         ├─> find_similar_elements
         ├─> get_element_relationships
         └─> search_by_verb_trigger
```
## Proposed Solutions
### Phase 1: Stabilize (Current Session)
- ✅ Limit comparisons aggressively
- ✅ Add timeout circuit breakers
- ✅ Disable circular dependencies
- ⬜ Fix security validation patterns
- ⬜ Improve test isolation
### Phase 2: Optimize
- ⬜ Implement progressive indexing (index on demand)
- ⬜ Add persistent cache between runs
- ⬜ Use worker threads for NLP calculations
- ⬜ Implement incremental updates (only reindex changed files)
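Incremental updates depend on knowing which files changed since the last build. A minimal sketch, assuming the index stores a fingerprint per file (here a modification time in milliseconds, though a content hash would work the same way). `Fingerprints`, `IndexDelta`, and `diffFingerprints` are hypothetical names.

```typescript
// Hypothetical fingerprint map: file path -> modification time in ms (or a content hash).
type Fingerprints = Map<string, number>;

interface IndexDelta {
  changed: string[]; // new or modified files that need reindexing
  removed: string[]; // files whose entries should be dropped from the index
}

function diffFingerprints(previous: Fingerprints, current: Fingerprints): IndexDelta {
  const changed: string[] = [];
  const removed: string[] = [];
  current.forEach((mtime, path) => {
    // A missing previous entry (undefined) also counts as changed.
    if (previous.get(path) !== mtime) changed.push(path);
  });
  previous.forEach((_mtime, path) => {
    if (!current.has(path)) removed.push(path);
  });
  return { changed, removed };
}
```

Only the paths in `changed` would go through metadata extraction and scoring again; everything else keeps its existing entry.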
### Phase 3: Integrate
- ⬜ Create new MCP tools for relationship queries
- ⬜ Add relationship info to existing tools
- ⬜ Enable verb-based triggers in main app
- ⬜ Add relationship-aware element suggestions
### Phase 4: Enhance
- ⬜ Add more relationship types
- ⬜ Implement element composition
- ⬜ Add dependency tracking
- ⬜ Enable cross-element validation
## Key Architectural Decisions Needed
1. **Should Enhanced Index be required or optional?**
- Required: All users get relationships but pay performance cost
- Optional: Faster startup but features may be missing
- **Recommendation**: Optional with lazy loading
2. **How to handle verb triggers without circular deps?**
- Option A: Two-phase building (relationships after index)
- Option B: Pass index to verb manager instead of fetching
- Option C: Separate verb index file
- **Recommendation**: Option B
3. **What's the right comparison limit?**
- Current: 100 (very conservative)
- Original: 10,000+ (too high)
- **Recommendation**: 500 with better sampling
4. **How to integrate with main app?**
- Option A: Replace existing managers
- Option B: Augment with optional features
- Option C: Parallel system with migration path
- **Recommendation**: Option B
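The "optional with lazy loading" recommendation in decision 1 can be sketched as a wrapper that builds the index on first use and shares the in-flight promise between callers. `LazyIndex` is a hypothetical name, and the `build` callback stands in for whatever function actually produces the index.

```typescript
// Hypothetical wrapper: builds the index on first use and shares the in-flight promise.
class LazyIndex<T> {
  private pending: Promise<T> | null = null;
  private readonly build: () => Promise<T>;

  constructor(build: () => Promise<T>) {
    this.build = build;
  }

  /** Start the build on the first call; later calls reuse the same promise. */
  get(): Promise<T> {
    if (this.pending === null) {
      this.pending = this.build().catch((err) => {
        this.pending = null; // allow a retry after a failed build
        throw err;
      });
    }
    return this.pending;
  }
}
```

Startup pays nothing until a relationship-aware feature calls `get()`, and concurrent callers do not trigger duplicate builds.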
## Next Steps
1. **Immediate** (This session):
- Fix security validation patterns
- Re-enable verb triggers safely
- Increase comparison limit to 500
2. **Short term** (Next session):
- Add integration points to main app
- Create new relationship-aware tools
- Implement incremental indexing
3. **Long term**:
- Full production integration
- Performance optimization
- Feature enhancement
---
*Created: September 24, 2025*
*Status: Architecture documented, stabilization in progress*