DollhouseMCP

DollhouseMCP
docs
architecture

ENHANCED_INDEX_ARCHITECTURE.md•11.5 KiB

# Enhanced Index Architecture Diagram

## Overview
The Enhanced Capability Index is a complex system designed to provide semantic relationships, verb-based triggers, and extensible metadata for portfolio elements. This document diagrams the architecture to identify issues and plan integration.

## Component Hierarchy

```
┌─────────────────────────────────────────────────────────────────┐
│                     EnhancedIndexManager                         │
│                        (Singleton)                               │
│  - Orchestrates entire indexing process                         │
│  - Manages file locking and caching                             │
│  - Controls index building and retrieval                        │
└─────────────────┬──────────────────────────┬────────────────────┘
                  │                          │
                  ▼                          ▼
    ┌──────────────────────┐      ┌────────────────────┐
    │  IndexConfigManager  │      │     FileLock       │
    │    (Singleton)       │      │  (Instance per     │
    │                      │      │   index file)      │
    │ - Central config     │      │                    │
    │ - Performance limits │      │ - Prevents races   │
    │ - NLP thresholds    │      │ - Stale detection  │
    └──────────────────────┘      └────────────────────┘
                  │
                  ├──────────────┬────────────┬──────────────┐
                  ▼              ▼            ▼              ▼
    ┌──────────────────┐  ┌──────────┐  ┌─────────────┐  ┌──────────────────┐
    │ NLPScoringManager│  │ VerbTrigger│ │Relationship│  │PortfolioIndexMgr│
    │  (Instance)      │  │  Manager   │ │  Manager   │  │   (Singleton)    │
    │                  │  │(Singleton) │ │(Singleton) │  │                  │
    │ - Jaccard calc   │  │            │ │            │  │ - File scanning  │
    │ - Entropy calc   │  │ - Verb     │ │ - Pattern  │  │ - Metadata parse │
    │ - LRU cache     │  │   mapping  │ │   matching │  │ - Entry creation │
    │ - Score caching │  │ - Category │ │ - Inverse  │  └──────────┬───────┘
    └──────────────────┘  │   detect  │ │   rels     │             │
                          └─────┬──────┘ └────────────┘             ▼
                                │                          ┌──────────────────┐
                                ▼                          │ PortfolioManager │
                          ⚠️ CIRCULAR!                      │   (Singleton)    │
                          Calls getIndex()                 │                  │
                          during build!                    │ - Directory scan │
                                                          │ - File validation│
                                                          └──────────────────┘
```

## Data Flow & Processing Pipeline

```
1. INDEX BUILD REQUEST
   │
   ▼
2. FILE LOCK ACQUISITION (60s timeout)
   │
   ▼
3. PORTFOLIO SCANNING
   │
   ├─> Read all .md files from:
   │   - ~/.dollhouse/portfolio/personas/
   │   - ~/.dollhouse/portfolio/skills/
   │   - ~/.dollhouse/portfolio/templates/
   │   - ~/.dollhouse/portfolio/agents/
   │   - ~/.dollhouse/portfolio/memories/
   │   - ~/.dollhouse/portfolio/ensembles/
   │
   ▼
4. METADATA EXTRACTION (Per File)
   │
   ├─> SecureYamlParser.parse()
   │   ├─> Size validation (64KB YAML, 1MB content)
   │   ├─> YAML bomb detection
   │   ├─> Unicode normalization
   │   ├─> Pattern matching (CRITICAL ISSUE HERE!)
   │   └─> Field validation
   │
   ▼
5. ELEMENT DEFINITION BUILDING
   │
   ├─> Core metadata (name, description, version)
   ├─> Search data (keywords, tags, triggers)
   ├─> Verb triggers extraction
   └─> Initial relationships
   │
   ▼
6. SEMANTIC RELATIONSHIP CALCULATION (BOTTLENECK!)
   │
   ├─> Text preparation (combine fields)
   ├─> Entropy calculation (per element)
   ├─> Similarity matrix calculation:
   │   │
   │   ├─> IF elements <= 20 THEN
   │   │   └─> Full matrix (all pairs)
   │   │       - O(n²) comparisons
   │   │       - With our fix: max 190 comparisons
   │   │
   │   └─> IF elements > 20 THEN
   │       └─> Sampled relationships
   │           ├─> Keyword clustering (60% budget)
   │           └─> Cross-type sampling (40% budget)
   │               - With our fix: max 100 total
   │
   ▼
7. RELATIONSHIP DISCOVERY
   │
   ├─> Pattern-based discovery (regex matching)
   ├─> Verb-based discovery (DISABLED - circular dep!)
   └─> Inverse relationship creation
   │
   ▼
8. INDEX PERSISTENCE
   │
   └─> Save to ~/.dollhouse/portfolio/capability-index.yaml
```

## Initialization Chain Issues

```
USER CALLS getIndex()
    │
    ▼
EnhancedIndexManager.getInstance()
    │
    ├─> Creates singleton if not exists
    │   ├─> new IndexConfigManager()
    │   ├─> new NLPScoringManager()
    │   ├─> VerbTriggerManager.getInstance()
    │   ├─> RelationshipManager.getInstance()
    │   └─> new FileLock()
    │
    ▼
manager.getIndex()
    │
    ├─> Check if index exists and is fresh
    │   └─> If stale/missing → buildIndex()
    │
    ▼
buildIndex()
    │
    ├─> Acquire file lock (ISSUE: Can timeout in tests)
    ├─> Get PortfolioIndexManager.getInstance()
    │   └─> Triggers full portfolio scan
    ├─> Process each element
    ├─> calculateSemanticRelationships()
    │   └─> ISSUE: Can run thousands of comparisons
    ├─> discoverRelationships()
    │   └─> ISSUE: VerbTriggerManager calls getIndex() = CIRCULAR!
    └─> Save index
```

## Identified Issues & Bottlenecks

### 🔴 CRITICAL Issues

1. **Circular Dependency**
   ```
   RelationshipManager.discoverVerbRelationships()
       → VerbTriggerManager.getVerbsForElement()
           → EnhancedIndexManager.getIndex()  ← CIRCULAR!
   ```
   **Status**: Fixed by disabling verb-based discovery

2. **NLP Scoring Explosion**
   - Original: Could make 50,000+ comparisons for 100 elements
   - Fixed: Limited to max 100 comparisons
   - **Remaining Issue**: Still makes many redundant calculations

### 🟡 MEDIUM Issues

3. **File Lock Conflicts**
   - Multiple processes can fight over lock
   - Tests create race conditions
   - Stale lock detection sometimes fails

4. **Security Validation False Positives**
   - Skills with "audit", "security", "scan" trigger alerts
   - Overly aggressive pattern matching
   - Blocks legitimate security testing tools

### 🟢 MINOR Issues

5. **Cache Efficiency**
   - LRU cache works but evicts too frequently
   - Cache keys use only first 50 chars (collision risk)
   - No persistent cache between runs

## Performance Analysis

### Current Timings (After Fixes)
```
Operation                  | Time    | Details
--------------------------|---------|------------------
Portfolio Scan            | ~50ms   | 50-100 files
Metadata Extraction       | ~200ms  | Includes security validation
Entropy Calculation       | ~10ms   | Per element
NLP Scoring (per pair)    | ~1ms    | With cache hit
Full Build (50 elements)  | ~5000ms | With our limits
Index Retrieval (cached)  | ~10ms   | From disk
```

### Scaling Characteristics
```
Elements | Comparisons | Time  | Strategy
---------|-------------|-------|----------
10       | 45          | 1s    | Full matrix
20       | 190         | 2s    | Full matrix
50       | 100         | 3s    | Sampled
100      | 100         | 5s    | Sampled
500      | 100         | 8s    | Sampled
```

## Integration Points with Main App

Currently **NOT INTEGRATED** - Enhanced Index is completely isolated!

```
src/index.ts (Main MCP Server)
    │
    ├─> Uses: PortfolioManager directly
    ├─> Uses: PersonaManager directly
    ├─> Uses: SkillManager directly
    └─> Does NOT use: EnhancedIndexManager ❌

Where it SHOULD integrate:
    │
    ├─> portfolio_search tool
    │   └─> Could use semantic relationships
    │
    ├─> activate_element tool
    │   └─> Could suggest related elements
    │
    ├─> get_active_elements tool
    │   └─> Could show relationships
    │
    └─> New tools:
        ├─> find_similar_elements
        ├─> get_element_relationships
        └─> search_by_verb_trigger
```

## Proposed Solutions

### Phase 1: Stabilize (Current Session)
✅ Limit comparisons aggressively
✅ Add timeout circuit breakers
✅ Disable circular dependencies
⬜ Fix security validation patterns
⬜ Improve test isolation

### Phase 2: Optimize
⬜ Implement progressive indexing (index on demand)
⬜ Add persistent cache between runs
⬜ Use worker threads for NLP calculations
⬜ Implement incremental updates (only reindex changed files)

### Phase 3: Integrate
⬜ Create new MCP tools for relationship queries
⬜ Add relationship info to existing tools
⬜ Enable verb-based triggers in main app
⬜ Add relationship-aware element suggestions

### Phase 4: Enhance
⬜ Add more relationship types
⬜ Implement element composition
⬜ Add dependency tracking
⬜ Enable cross-element validation

## Key Architectural Decisions Needed

1. **Should Enhanced Index be required or optional?**
   - Required: All users get relationships but pay performance cost
   - Optional: Faster startup but features may be missing
   - **Recommendation**: Optional with lazy loading

2. **How to handle verb triggers without circular deps?**
   - Option A: Two-phase building (relationships after index)
   - Option B: Pass index to verb manager instead of fetching
   - Option C: Separate verb index file
   - **Recommendation**: Option B

3. **What's the right comparison limit?**
   - Current: 100 (very conservative)
   - Original: 10,000+ (too high)
   - **Recommendation**: 500 with better sampling

4. **How to integrate with main app?**
   - Option A: Replace existing managers
   - Option B: Augment with optional features
   - Option C: Parallel system with migration path
   - **Recommendation**: Option B

## Next Steps

1. **Immediate** (This session):
   - Fix security validation patterns
   - Re-enable verb triggers safely
   - Increase comparison limit to 500

2. **Short term** (Next session):
   - Add integration points to main app
   - Create new relationship-aware tools
   - Implement incremental indexing

3. **Long term**:
   - Full production integration
   - Performance optimization
   - Feature enhancement

---
*Created: September 24, 2025*
*Status: Architecture documented, stabilization in progress*

Loading blob content...

Latest Blog Posts

Redis vs ioredis vs valkey-glide
By punkpeye on January 26, 2026.
benchmark
Redis
valkey
Quickstart: Publish an MCP Server to the MCP Registry
By punkpeye on January 24, 2026.
mcp
official reference mirror
Official MCP Registry Server.json Requirements
By punkpeye on January 24, 2026.
mcp
official reference mirror

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/DollhouseMCP/DollhouseMCP'

If you have feedback or need assistance with the MCP directory API, please join our Discord server

ENHANCED_INDEX_ARCHITECTURE.md•11.5 KiB

# Enhanced Index Architecture Diagram

## Overview
The Enhanced Capability Index is a complex system designed to provide semantic relationships, verb-based triggers, and extensible metadata for portfolio elements. This document diagrams the architecture to identify issues and plan integration.

## Component Hierarchy

```
┌─────────────────────────────────────────────────────────────────┐
│                     EnhancedIndexManager                         │
│                        (Singleton)                               │
│  - Orchestrates entire indexing process                         │
│  - Manages file locking and caching                             │
│  - Controls index building and retrieval                        │
└─────────────────┬──────────────────────────┬────────────────────┘
                  │                          │
                  ▼                          ▼
    ┌──────────────────────┐      ┌────────────────────┐
    │  IndexConfigManager  │      │     FileLock       │
    │    (Singleton)       │      │  (Instance per     │
    │                      │      │   index file)      │
    │ - Central config     │      │                    │
    │ - Performance limits │      │ - Prevents races   │
    │ - NLP thresholds    │      │ - Stale detection  │
    └──────────────────────┘      └────────────────────┘
                  │
                  ├──────────────┬────────────┬──────────────┐
                  ▼              ▼            ▼              ▼
    ┌──────────────────┐  ┌──────────┐  ┌─────────────┐  ┌──────────────────┐
    │ NLPScoringManager│  │ VerbTrigger│ │Relationship│  │PortfolioIndexMgr│
    │  (Instance)      │  │  Manager   │ │  Manager   │  │   (Singleton)    │
    │                  │  │(Singleton) │ │(Singleton) │  │                  │
    │ - Jaccard calc   │  │            │ │            │  │ - File scanning  │
    │ - Entropy calc   │  │ - Verb     │ │ - Pattern  │  │ - Metadata parse │
    │ - LRU cache     │  │   mapping  │ │   matching │  │ - Entry creation │
    │ - Score caching │  │ - Category │ │ - Inverse  │  └──────────┬───────┘
    └──────────────────┘  │   detect  │ │   rels     │             │
                          └─────┬──────┘ └────────────┘             ▼
                                │                          ┌──────────────────┐
                                ▼                          │ PortfolioManager │
                          ⚠️ CIRCULAR!                      │   (Singleton)    │
                          Calls getIndex()                 │                  │
                          during build!                    │ - Directory scan │
                                                          │ - File validation│
                                                          └──────────────────┘
```

## Data Flow & Processing Pipeline

```
1. INDEX BUILD REQUEST
   │
   ▼
2. FILE LOCK ACQUISITION (60s timeout)
   │
   ▼
3. PORTFOLIO SCANNING
   │
   ├─> Read all .md files from:
   │   - ~/.dollhouse/portfolio/personas/
   │   - ~/.dollhouse/portfolio/skills/
   │   - ~/.dollhouse/portfolio/templates/
   │   - ~/.dollhouse/portfolio/agents/
   │   - ~/.dollhouse/portfolio/memories/
   │   - ~/.dollhouse/portfolio/ensembles/
   │
   ▼
4. METADATA EXTRACTION (Per File)
   │
   ├─> SecureYamlParser.parse()
   │   ├─> Size validation (64KB YAML, 1MB content)
   │   ├─> YAML bomb detection
   │   ├─> Unicode normalization
   │   ├─> Pattern matching (CRITICAL ISSUE HERE!)
   │   └─> Field validation
   │
   ▼
5. ELEMENT DEFINITION BUILDING
   │
   ├─> Core metadata (name, description, version)
   ├─> Search data (keywords, tags, triggers)
   ├─> Verb triggers extraction
   └─> Initial relationships
   │
   ▼
6. SEMANTIC RELATIONSHIP CALCULATION (BOTTLENECK!)
   │
   ├─> Text preparation (combine fields)
   ├─> Entropy calculation (per element)
   ├─> Similarity matrix calculation:
   │   │
   │   ├─> IF elements <= 20 THEN
   │   │   └─> Full matrix (all pairs)
   │   │       - O(n²) comparisons
   │   │       - With our fix: max 190 comparisons
   │   │
   │   └─> IF elements > 20 THEN
   │       └─> Sampled relationships
   │           ├─> Keyword clustering (60% budget)
   │           └─> Cross-type sampling (40% budget)
   │               - With our fix: max 100 total
   │
   ▼
7. RELATIONSHIP DISCOVERY
   │
   ├─> Pattern-based discovery (regex matching)
   ├─> Verb-based discovery (DISABLED - circular dep!)
   └─> Inverse relationship creation
   │
   ▼
8. INDEX PERSISTENCE
   │
   └─> Save to ~/.dollhouse/portfolio/capability-index.yaml
```

## Initialization Chain Issues

```
USER CALLS getIndex()
    │
    ▼
EnhancedIndexManager.getInstance()
    │
    ├─> Creates singleton if not exists
    │   ├─> new IndexConfigManager()
    │   ├─> new NLPScoringManager()
    │   ├─> VerbTriggerManager.getInstance()
    │   ├─> RelationshipManager.getInstance()
    │   └─> new FileLock()
    │
    ▼
manager.getIndex()
    │
    ├─> Check if index exists and is fresh
    │   └─> If stale/missing → buildIndex()
    │
    ▼
buildIndex()
    │
    ├─> Acquire file lock (ISSUE: Can timeout in tests)
    ├─> Get PortfolioIndexManager.getInstance()
    │   └─> Triggers full portfolio scan
    ├─> Process each element
    ├─> calculateSemanticRelationships()
    │   └─> ISSUE: Can run thousands of comparisons
    ├─> discoverRelationships()
    │   └─> ISSUE: VerbTriggerManager calls getIndex() = CIRCULAR!
    └─> Save index
```

## Identified Issues & Bottlenecks

### 🔴 CRITICAL Issues

1. **Circular Dependency**
   ```
   RelationshipManager.discoverVerbRelationships()
       → VerbTriggerManager.getVerbsForElement()
           → EnhancedIndexManager.getIndex()  ← CIRCULAR!
   ```
   **Status**: Fixed by disabling verb-based discovery

2. **NLP Scoring Explosion**
   - Original: Could make 50,000+ comparisons for 100 elements
   - Fixed: Limited to max 100 comparisons
   - **Remaining Issue**: Still makes many redundant calculations

### 🟡 MEDIUM Issues

3. **File Lock Conflicts**
   - Multiple processes can fight over lock
   - Tests create race conditions
   - Stale lock detection sometimes fails

4. **Security Validation False Positives**
   - Skills with "audit", "security", "scan" trigger alerts
   - Overly aggressive pattern matching
   - Blocks legitimate security testing tools

### 🟢 MINOR Issues

5. **Cache Efficiency**
   - LRU cache works but evicts too frequently
   - Cache keys use only first 50 chars (collision risk)
   - No persistent cache between runs

## Performance Analysis

### Current Timings (After Fixes)
```
Operation                  | Time    | Details
--------------------------|---------|------------------
Portfolio Scan            | ~50ms   | 50-100 files
Metadata Extraction       | ~200ms  | Includes security validation
Entropy Calculation       | ~10ms   | Per element
NLP Scoring (per pair)    | ~1ms    | With cache hit
Full Build (50 elements)  | ~5000ms | With our limits
Index Retrieval (cached)  | ~10ms   | From disk
```

### Scaling Characteristics
```
Elements | Comparisons | Time  | Strategy
---------|-------------|-------|----------
10       | 45          | 1s    | Full matrix
20       | 190         | 2s    | Full matrix
50       | 100         | 3s    | Sampled
100      | 100         | 5s    | Sampled
500      | 100         | 8s    | Sampled
```

## Integration Points with Main App

Currently **NOT INTEGRATED** - Enhanced Index is completely isolated!

```
src/index.ts (Main MCP Server)
    │
    ├─> Uses: PortfolioManager directly
    ├─> Uses: PersonaManager directly
    ├─> Uses: SkillManager directly
    └─> Does NOT use: EnhancedIndexManager ❌

Where it SHOULD integrate:
    │
    ├─> portfolio_search tool
    │   └─> Could use semantic relationships
    │
    ├─> activate_element tool
    │   └─> Could suggest related elements
    │
    ├─> get_active_elements tool
    │   └─> Could show relationships
    │
    └─> New tools:
        ├─> find_similar_elements
        ├─> get_element_relationships
        └─> search_by_verb_trigger
```

## Proposed Solutions

### Phase 1: Stabilize (Current Session)
✅ Limit comparisons aggressively
✅ Add timeout circuit breakers
✅ Disable circular dependencies
⬜ Fix security validation patterns
⬜ Improve test isolation

### Phase 2: Optimize
⬜ Implement progressive indexing (index on demand)
⬜ Add persistent cache between runs
⬜ Use worker threads for NLP calculations
⬜ Implement incremental updates (only reindex changed files)

### Phase 3: Integrate
⬜ Create new MCP tools for relationship queries
⬜ Add relationship info to existing tools
⬜ Enable verb-based triggers in main app
⬜ Add relationship-aware element suggestions

### Phase 4: Enhance
⬜ Add more relationship types
⬜ Implement element composition
⬜ Add dependency tracking
⬜ Enable cross-element validation

## Key Architectural Decisions Needed

1. **Should Enhanced Index be required or optional?**
   - Required: All users get relationships but pay performance cost
   - Optional: Faster startup but features may be missing
   - **Recommendation**: Optional with lazy loading

2. **How to handle verb triggers without circular deps?**
   - Option A: Two-phase building (relationships after index)
   - Option B: Pass index to verb manager instead of fetching
   - Option C: Separate verb index file
   - **Recommendation**: Option B

3. **What's the right comparison limit?**
   - Current: 100 (very conservative)
   - Original: 10,000+ (too high)
   - **Recommendation**: 500 with better sampling

4. **How to integrate with main app?**
   - Option A: Replace existing managers
   - Option B: Augment with optional features
   - Option C: Parallel system with migration path
   - **Recommendation**: Option B

## Next Steps

1. **Immediate** (This session):
   - Fix security validation patterns
   - Re-enable verb triggers safely
   - Increase comparison limit to 500

2. **Short term** (Next session):
   - Add integration points to main app
   - Create new relationship-aware tools
   - Implement incremental indexing

3. **Long term**:
   - Full production integration
   - Performance optimization
   - Feature enhancement

---
*Created: September 24, 2025*
*Status: Architecture documented, stabilization in progress*