# Enhanced Index Architecture Diagram

## Overview

The Enhanced Capability Index is a complex system designed to provide semantic relationships, verb-based triggers, and extensible metadata for portfolio elements. This document diagrams the architecture to identify issues and plan integration.

## Component Hierarchy

```
┌─────────────────────────────────────────────────────────────────────┐
│                        EnhancedIndexManager                         │
│                             (Singleton)                             │
│  - Orchestrates entire indexing process                             │
│  - Manages file locking and caching                                 │
│  - Controls index building and retrieval                            │
└───────────────┬──────────────────────────────────┬──────────────────┘
                │                                  │
                ▼                                  ▼
    ┌──────────────────────┐            ┌────────────────────┐
    │  IndexConfigManager  │            │      FileLock      │
    │     (Singleton)      │            │   (Instance per    │
    │                      │            │    index file)     │
    │ - Central config     │            │                    │
    │ - Performance limits │            │ - Prevents races   │
    │ - NLP thresholds     │            │ - Stale detection  │
    └──────────────────────┘            └────────────────────┘
         │
         ├─────────────────┬──────────────┬─────────────────┐
         ▼                 ▼              ▼                 ▼
┌──────────────────┐ ┌────────────┐ ┌────────────┐ ┌──────────────────┐
│ NLPScoringManager│ │VerbTrigger │ │Relationship│ │ PortfolioIndexMgr│
│    (Instance)    │ │  Manager   │ │  Manager   │ │   (Singleton)    │
│                  │ │(Singleton) │ │(Singleton) │ │                  │
│ - Jaccard calc   │ │            │ │            │ │ - File scanning  │
│ - Entropy calc   │ │ - Verb     │ │ - Pattern  │ │ - Metadata parse │
│ - LRU cache      │ │   mapping  │ │   matching │ │ - Entry creation │
│ - Score caching  │ │ - Category │ │ - Inverse  │ └────────┬─────────┘
└──────────────────┘ │   detect   │ │   rels     │          │
                     └─────┬──────┘ └────────────┘          ▼
                           │                       ┌──────────────────┐
                           ▼                       │ PortfolioManager │
                     ⚠️ CIRCULAR!                  │   (Singleton)    │
                     Calls getIndex()              │                  │
                     during build!                 │ - Directory scan │
                                                   │ - File validation│
                                                   └──────────────────┘
```

## Data Flow & Processing Pipeline

```
1. INDEX BUILD REQUEST
   │
   ▼
2. FILE LOCK ACQUISITION (60s timeout)
   │
   ▼
3. PORTFOLIO SCANNING
   │
   ├─> Read all .md files from:
   │   - ~/.dollhouse/portfolio/personas/
   │   - ~/.dollhouse/portfolio/skills/
   │   - ~/.dollhouse/portfolio/templates/
   │   - ~/.dollhouse/portfolio/agents/
   │   - ~/.dollhouse/portfolio/memories/
   │   - ~/.dollhouse/portfolio/ensembles/
   │
   ▼
4. METADATA EXTRACTION (Per File)
   │
   ├─> SecureYamlParser.parse()
   │   ├─> Size validation (64KB YAML, 1MB content)
   │   ├─> YAML bomb detection
   │   ├─> Unicode normalization
   │   ├─> Pattern matching (CRITICAL ISSUE HERE!)
   │   └─> Field validation
   │
   ▼
5. ELEMENT DEFINITION BUILDING
   │
   ├─> Core metadata (name, description, version)
   ├─> Search data (keywords, tags, triggers)
   ├─> Verb triggers extraction
   └─> Initial relationships
   │
   ▼
6. SEMANTIC RELATIONSHIP CALCULATION (BOTTLENECK!)
   │
   ├─> Text preparation (combine fields)
   ├─> Entropy calculation (per element)
   ├─> Similarity matrix calculation:
   │   │
   │   ├─> IF elements <= 20 THEN
   │   │   └─> Full matrix (all pairs)
   │   │       - O(n²) comparisons
   │   │       - With our fix: max 190 comparisons
   │   │
   │   └─> IF elements > 20 THEN
   │       └─> Sampled relationships
   │           ├─> Keyword clustering (60% budget)
   │           └─> Cross-type sampling (40% budget)
   │               - With our fix: max 100 total
   │
   ▼
7. RELATIONSHIP DISCOVERY
   │
   ├─> Pattern-based discovery (regex matching)
   ├─> Verb-based discovery (DISABLED - circular dep!)
   └─> Inverse relationship creation
   │
   ▼
8. INDEX PERSISTENCE
   │
   └─> Save to ~/.dollhouse/portfolio/capability-index.yaml
```

## Initialization Chain Issues

```
USER CALLS getIndex()
   │
   ▼
EnhancedIndexManager.getInstance()
   │
   ├─> Creates singleton if not exists
   │   ├─> new IndexConfigManager()
   │   ├─> new NLPScoringManager()
   │   ├─> VerbTriggerManager.getInstance()
   │   ├─> RelationshipManager.getInstance()
   │   └─> new FileLock()
   │
   ▼
manager.getIndex()
   │
   ├─> Check if index exists and is fresh
   │   └─> If stale/missing → buildIndex()
   │
   ▼
buildIndex()
   │
   ├─> Acquire file lock (ISSUE: Can timeout in tests)
   ├─> Get PortfolioIndexManager.getInstance()
   │   └─> Triggers full portfolio scan
   ├─> Process each element
   ├─> calculateSemanticRelationships()
   │   └─> ISSUE: Can run thousands of comparisons
   ├─> discoverRelationships()
   │   └─> ISSUE: VerbTriggerManager calls getIndex() = CIRCULAR!
   └─> Save index
```

## Identified Issues & Bottlenecks

### 🔴 CRITICAL Issues

1. **Circular Dependency**
   ```
   RelationshipManager.discoverVerbRelationships()
     → VerbTriggerManager.getVerbsForElement()
       → EnhancedIndexManager.getIndex() ← CIRCULAR!
   ```
   **Status**: Fixed by disabling verb-based discovery

2. **NLP Scoring Explosion**
   - Original: Could make 50,000+ comparisons for 100 elements
   - Fixed: Limited to max 100 comparisons
   - **Remaining Issue**: Still makes many redundant calculations

### 🟡 MEDIUM Issues

3. **File Lock Conflicts**
   - Multiple processes can fight over lock
   - Tests create race conditions
   - Stale lock detection sometimes fails

4. **Security Validation False Positives**
   - Skills with "audit", "security", "scan" trigger alerts
   - Overly aggressive pattern matching
   - Blocks legitimate security testing tools

### 🟢 MINOR Issues

5. **Cache Efficiency**
   - LRU cache works but evicts too frequently
   - Cache keys use only first 50 chars (collision risk)
   - No persistent cache between runs

## Performance Analysis

### Current Timings (After Fixes)

```
Operation                 | Time    | Details
--------------------------|---------|------------------------------
Portfolio Scan            | ~50ms   | 50-100 files
Metadata Extraction       | ~200ms  | Includes security validation
Entropy Calculation       | ~10ms   | Per element
NLP Scoring (per pair)    | ~1ms    | With cache hit
Full Build (50 elements)  | ~5000ms | With our limits
Index Retrieval (cached)  | ~10ms   | From disk
```

### Scaling Characteristics

```
Elements | Comparisons | Time  | Strategy
---------|-------------|-------|------------
10       | 45          | 1s    | Full matrix
20       | 190         | 2s    | Full matrix
50       | 100         | 3s    | Sampled
100      | 100         | 5s    | Sampled
500      | 100         | 8s    | Sampled
```

## Integration Points with Main App

Currently **NOT INTEGRATED** - Enhanced Index is completely isolated!
```
src/index.ts (Main MCP Server)
   │
   ├─> Uses: PortfolioManager directly
   ├─> Uses: PersonaManager directly
   ├─> Uses: SkillManager directly
   └─> Does NOT use: EnhancedIndexManager ❌

Where it SHOULD integrate:
   │
   ├─> portfolio_search tool
   │   └─> Could use semantic relationships
   │
   ├─> activate_element tool
   │   └─> Could suggest related elements
   │
   ├─> get_active_elements tool
   │   └─> Could show relationships
   │
   └─> New tools:
       ├─> find_similar_elements
       ├─> get_element_relationships
       └─> search_by_verb_trigger
```

## Proposed Solutions

### Phase 1: Stabilize (Current Session)

- ✅ Limit comparisons aggressively
- ✅ Add timeout circuit breakers
- ✅ Disable circular dependencies
- ⬜ Fix security validation patterns
- ⬜ Improve test isolation

### Phase 2: Optimize

- ⬜ Implement progressive indexing (index on demand)
- ⬜ Add persistent cache between runs
- ⬜ Use worker threads for NLP calculations
- ⬜ Implement incremental updates (only reindex changed files)

### Phase 3: Integrate

- ⬜ Create new MCP tools for relationship queries
- ⬜ Add relationship info to existing tools
- ⬜ Enable verb-based triggers in main app
- ⬜ Add relationship-aware element suggestions

### Phase 4: Enhance

- ⬜ Add more relationship types
- ⬜ Implement element composition
- ⬜ Add dependency tracking
- ⬜ Enable cross-element validation

## Key Architectural Decisions Needed

1. **Should Enhanced Index be required or optional?**
   - Required: All users get relationships but pay performance cost
   - Optional: Faster startup but features may be missing
   - **Recommendation**: Optional with lazy loading

2. **How to handle verb triggers without circular deps?**
   - Option A: Two-phase building (relationships after index)
   - Option B: Pass index to verb manager instead of fetching
   - Option C: Separate verb index file
   - **Recommendation**: Option B

3. **What's the right comparison limit?**
   - Current: 100 (very conservative)
   - Original: 10,000+ (too high)
   - **Recommendation**: 500 with better sampling

4. **How to integrate with main app?**
   - Option A: Replace existing managers
   - Option B: Augment with optional features
   - Option C: Parallel system with migration path
   - **Recommendation**: Option B

## Next Steps

1. **Immediate** (This session):
   - Fix security validation patterns
   - Re-enable verb triggers safely
   - Increase comparison limit to 500

2. **Short term** (Next session):
   - Add integration points to main app
   - Create new relationship-aware tools
   - Implement incremental indexing

3. **Long term**:
   - Full production integration
   - Performance optimization
   - Feature enhancement

---

*Created: September 24, 2025*
*Status: Architecture documented, stabilization in progress*
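
### Appendix: Sketch of Option B for Verb Triggers

The recommendation for decision 2 (pass the index to the verb manager instead of fetching it) might look roughly like the sketch below. The class names mirror the components in this document, but everything else is an assumption made for illustration: the `CapabilityIndex` and `IndexedElement` shapes, the method signatures, the free-standing `buildIndex` function, and the rule that two elements are related when they share a verb. The real classes are singletons with more responsibilities. The only point shown is that the in-progress index travels down as an argument, so nothing in the verb path needs to call `getIndex()` during a build.

```typescript
// Hypothetical, simplified shapes; the real index schema is not shown in this document.
interface IndexedElement {
  name: string;
  verbs: string[];
}

interface CapabilityIndex {
  elements: Map<string, IndexedElement>;
}

// Stand-in for VerbTriggerManager: it reads verbs from the index it is given
// and holds no reference to EnhancedIndexManager.
class VerbTriggerManager {
  getVerbsForElement(index: CapabilityIndex, elementName: string): string[] {
    return index.elements.get(elementName)?.verbs ?? [];
  }
}

// Stand-in for RelationshipManager: verb-based discovery receives the same
// index and forwards it, instead of triggering another index fetch.
class RelationshipManager {
  private readonly verbManager: VerbTriggerManager;

  constructor(verbManager: VerbTriggerManager) {
    this.verbManager = verbManager;
  }

  // Assumed rule for this sketch: two elements are related if they share a verb.
  discoverVerbRelationships(index: CapabilityIndex): Array<[string, string]> {
    const names = Array.from(index.elements.keys()).sort();
    const pairs: Array<[string, string]> = [];
    names.forEach((first, i) => {
      const firstVerbs = new Set(this.verbManager.getVerbsForElement(index, first));
      for (const second of names.slice(i + 1)) {
        const secondVerbs = this.verbManager.getVerbsForElement(index, second);
        if (secondVerbs.some((verb) => firstVerbs.has(verb))) {
          pairs.push([first, second]);
        }
      }
    });
    return pairs;
  }
}

// Stand-in for the build step: the builder owns the index under construction
// and hands it to relationship discovery explicitly.
function buildIndex(elements: IndexedElement[]): {
  index: CapabilityIndex;
  related: Array<[string, string]>;
} {
  const index: CapabilityIndex = { elements: new Map() };
  for (const element of elements) {
    index.elements.set(element.name, element);
  }
  const relationships = new RelationshipManager(new VerbTriggerManager());
  return { index, related: relationships.discoverVerbRelationships(index) };
}
```

Calling `buildIndex` with a handful of elements returns the index plus the verb-linked pairs in a single pass. Because the dependency now points one way only (builder to relationship manager to verb manager), verb-based discovery could in principle be re-enabled without reintroducing the cycle, although the all-pairs loop here would still need to respect whatever comparison limit is chosen.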
