# Logging Enhancement PRD
**folder-mcp Observability & Monitoring Improvement**
## Executive Summary
Following comprehensive analysis using the TMOAT (The Mother Of All Tests) methodology and deep code review, this PRD outlines critical improvements to the folder-mcp logging system. The current logging architecture generates significant noise while missing essential operational context, making debugging and system monitoring unnecessarily difficult.
**Primary Issues Identified:**
1. **Log Spam**: Repetitive, uninformative messages that overwhelm logs
2. **Missing Context**: Vague messages that don't provide actionable information
3. **Poor Signal-to-Noise Ratio**: Critical information buried in routine operations
4. **Inconsistent Categorization**: Wrong log levels for routine vs exceptional events
## Problem Analysis
### Critical Issue #1: Memory Monitoring Spam
**Current Behavior:**
```
2025-08-19T18:04:41.938Z WARN [ORCHESTRATOR] High memory usage detected | {"heapUsedMB":27,"heapUtilizationPercent":91,"recommendation":"Consider reducing concurrent operations or restarting daemon"}
2025-08-19T18:04:51.939Z WARN [ORCHESTRATOR] High memory usage detected | {"heapUsedMB":28,"heapUtilizationPercent":91,"recommendation":"Consider reducing concurrent operations or restarting daemon"}
```
**Problems:**
- Triggers on **27-28MB** usage, which is extremely low for a Node.js daemon
- **Every 10 seconds** without any context about what's normal
- **91% heap utilization** might be normal Node.js behavior with small heap
- Users can't distinguish between normal memory pressure and actual memory leaks
- Provides generic recommendation without system context
**Root Cause:** Hardcoded thresholds (85% utilization, 400MB absolute) without considering:
- System total memory
- Normal Node.js heap behavior
- Application baseline memory usage
- Memory growth trends vs absolute values
### Critical Issue #2: Uninformative Model Validation
**Current Behavior:**
```
2025-08-19T14:55:13.562Z INFO [folder-mcp] Handling model list request
2025-08-19T14:55:13.567Z INFO [folder-mcp] Handling model list request
2025-08-19T14:55:14.561Z INFO [folder-mcp] Handling model list request
```
**Problems:**
- Shows **WHAT** (model list access) but not **WHY** (validation context)
- No information about **WHO** requested it (client ID, validation trigger)
- No **OUTCOME** (validation passed/failed, models returned)
- Multiple identical messages with no differentiation
- INFO level for internal validation operations
**Root Cause Analysis:**
Location: `src/daemon/websocket/handlers/model-handlers.ts:36`
```typescript
this.logger.info('Handling model list request');
```
This logs the entry point but provides no context about:
- Request source (validation vs user request)
- Request parameters
- Processing outcome
- Performance metrics
### Additional Logging Issues
**3. Protocol Errors as Warnings**
- WebSocket protocol errors logged as WARN instead of DEBUG/INFO
- Missing request IDs make protocol debugging difficult
**4. Missing Operational Context**
- No startup performance metrics
- Folder indexing progress without context
- Missing transaction tracing for complex operations
**5. Inconsistent Log Levels**
- Routine operations at INFO that should be DEBUG
- System warnings for normal conditions
- Missing ERROR logs for actual failures
## Solution Architecture
### 1. Intelligent Memory Monitoring
**Replace Current:** Fixed threshold warnings
**With:** Adaptive, context-aware monitoring
```typescript
interface MemoryProfile {
baseline: number; // MB - established after 5 minutes
trend: 'stable' | 'growing' | 'oscillating';
growthRate: number; // MB/hour
lastGC: Date;
systemMemory: number; // Total system memory
}
interface MemoryAlert {
level: 'info' | 'warn' | 'critical';
trigger: 'absolute' | 'growth' | 'utilization' | 'leak';
context: string;
recommendation: string;
}
```
**New Memory Logging Strategy:**
- **Baseline establishment**: Silent monitoring for 5 minutes to establish normal usage
- **Growth-based alerts**: Warn on sustained growth (>10MB/hour), not snapshots
- **System-aware thresholds**: Scale warnings based on available system memory
- **Smart intervals**:
- Normal conditions: Log every 5 minutes at DEBUG level
- Growth detected: Every 30 seconds at INFO level
- Critical conditions: Every 10 seconds at WARN level
### 2. Context-Rich Request Logging
**Replace Current:** Generic operation logs
**With:** Request lifecycle tracing
```typescript
interface RequestContext {
requestId: string;
clientId: string;
operation: string;
trigger: 'user' | 'validation' | 'system';
startTime: Date;
parameters?: Record<string, any>;
}
interface OperationOutcome {
requestId: string;
success: boolean;
duration: number;
itemsProcessed?: number;
errorCode?: string;
cacheHit?: boolean;
}
```
**New Request Logging Pattern:**
```typescript
// Instead of: "Handling model list request"
logger.info('Model validation request', {
requestId: 'req_1234567890',
clientId: 'client-abc123',
trigger: 'folder_add_validation',
requestedModel: 'gpu:all-MiniLM-L6-v2'
});
// Include outcome:
logger.info('Model validation completed', {
requestId: 'req_1234567890',
success: true,
duration: 15,
modelAvailable: true,
cacheHit: false
});
```
### 3. Log Level Rationalization
**New Level Guidelines:**
| Level | Purpose | Examples |
|-------|---------|----------|
| **DEBUG** | Development/troubleshooting | Protocol handshakes, internal state transitions, cache hits |
| **INFO** | Normal operations that users care about | Folder added, indexing completed, client connections |
| **WARN** | Concerning but recoverable conditions | Model not cached, temporary file access issues, performance degradation |
| **ERROR** | Operation failures requiring attention | Database connection failed, invalid configuration, indexing failures |
| **FATAL** | System-level failures requiring restart | Unrecoverable errors, critical resource exhaustion |
### 4. Enhanced Operational Visibility
**Add Missing Context:**
**Folder Lifecycle Logging:**
```typescript
logger.info('Folder indexing started', {
path: '/path/to/folder',
estimatedFiles: 1247,
model: 'gpu:all-MiniLM-L6-v2',
indexingId: 'idx_789012345'
});
logger.info('Folder indexing progress', {
indexingId: 'idx_789012345',
progress: 65,
filesProcessed: 810,
filesRemaining: 437,
estimatedCompletion: '2025-08-19T18:15:30Z'
});
logger.info('Folder indexing completed', {
indexingId: 'idx_789012345',
totalFiles: 1247,
duration: 145,
averageProcessingTime: 0.116,
databaseSize: '15.2MB'
});
```
**Performance Telemetry:**
```typescript
logger.info('System performance snapshot', {
activeConnections: 3,
activeFolders: 5,
totalDocuments: 15420,
memoryBaseline: 32,
memoryTrend: 'stable',
avgQueryTime: 0.045
});
```
## Implementation Strategy
### Phase 1: Critical Fixes (Week 1)
**Priority 1: Memory Monitoring Intelligence**
- Replace hardcoded thresholds with adaptive monitoring
- Implement baseline establishment logic
- Add system memory context to warnings
- Reduce logging frequency for normal conditions
**Priority 2: Request Context Enhancement**
- Add request ID generation to all WebSocket messages
- Enhance model handler logging with full context
- Include operation outcomes in logs
### Phase 2: Operational Improvements (Week 2)
**Priority 3: Log Level Corrections**
- Audit all existing log statements for appropriate levels
- Move routine operations from INFO to DEBUG
- Ensure actual errors are logged at ERROR level
**Priority 4: Enhanced Visibility**
- Add folder lifecycle progress logging
- Implement performance telemetry snapshots
- Add startup/shutdown performance metrics
### Phase 3: Advanced Features (Week 3)
**Priority 5: Structured Logging Standards**
- Standardize log message formats across components
- Implement consistent metadata structures
- Add correlation ID tracking for complex operations
**Priority 6: Log Management Features**
- Add log rotation for file transports
- Implement log level runtime configuration
- Add performance impact monitoring for logging itself
## Success Metrics
### Quantitative Goals
- **Reduce log volume by 70%** while maintaining operational visibility
- **Eliminate false positive warnings** (target: <1% of current memory warnings)
- **Increase actionable log ratio** from ~15% to >80%
- **Reduce time to diagnosis** for common issues by 60%
### Qualitative Improvements
- **Clear signal-to-noise**: Users can quickly identify actual issues
- **Actionable information**: Each log entry provides context for next steps
- **Consistent formatting**: Predictable log patterns across all components
- **Progressive detail**: More information available at higher log levels
### Monitoring & Validation
**Log Quality Metrics:**
- False positive rate for warnings and errors
- Time-to-resolution for issues requiring log analysis
- Developer satisfaction with log clarity and usefulness
**Performance Impact:**
- Logging overhead measurement (<5% of total CPU)
- Memory usage of logging infrastructure (<2% of heap)
- Storage efficiency of log files
## Technical Implementation Details
### Memory Monitoring Algorithm
```typescript
class IntelligentMemoryMonitor {
private baseline: number | null = null;
private samples: MemorySample[] = [];
private establishmentStart = Date.now();
checkMemory(): MemoryAlert | null {
const current = this.getCurrentMemory();
this.samples.push(current);
// Baseline establishment period (5 minutes)
if (!this.baseline && Date.now() - this.establishmentStart < 5 * 60 * 1000) {
return null; // Silent monitoring
}
// Establish baseline from stable period
if (!this.baseline) {
this.baseline = this.calculateBaseline();
}
// Growth analysis
const growthRate = this.calculateGrowthRate();
const deviation = current.heapUsed - this.baseline;
// Alert logic
if (growthRate > 10 && deviation > 50) {
return {
level: 'warn',
trigger: 'growth',
context: `Memory growing at ${growthRate.toFixed(1)}MB/hr, ${deviation.toFixed(0)}MB above baseline`,
recommendation: 'Monitor for memory leak patterns'
};
}
return null; // No alert needed
}
}
```
### Enhanced Request Logging
```typescript
class RequestLogger {
private activeRequests = new Map<string, RequestContext>();
startRequest(operation: string, clientId: string, params?: any): string {
const requestId = `req_${Date.now()}_${Math.random().toString(36).substr(2, 9)}`;
const context: RequestContext = {
requestId,
clientId,
operation,
trigger: this.inferTrigger(operation, params),
startTime: new Date(),
parameters: params
};
this.activeRequests.set(requestId, context);
logger.info(`${operation} started`, {
requestId,
clientId,
trigger: context.trigger,
...params
});
return requestId;
}
completeRequest(requestId: string, outcome: Partial<OperationOutcome>): void {
const context = this.activeRequests.get(requestId);
if (!context) return;
const duration = Date.now() - context.startTime.getTime();
logger.info(`${context.operation} completed`, {
requestId,
success: outcome.success ?? true,
duration,
...outcome
});
this.activeRequests.delete(requestId);
}
}
```
## Migration Plan
### Backward Compatibility
- Maintain existing log format for external tools during transition
- Add feature flag for new vs legacy logging modes
- Provide migration guide for log parsing scripts
### Rollout Strategy
1. **Development Environment**: Full implementation and validation
2. **Staging Environment**: Performance impact testing
3. **Production**: Gradual rollout with monitoring
4. **Feedback Loop**: Adjust thresholds based on real-world usage
### Risk Mitigation
- **Performance Impact**: Benchmark logging overhead before rollout
- **Log Volume**: Monitor disk usage during transition
- **Debugging Capability**: Ensure new logs provide better debugging than old system
---
## Conclusion
The current logging system generates more noise than signal, making system monitoring and debugging unnecessarily difficult. This PRD provides a comprehensive solution that:
1. **Eliminates log spam** through intelligent monitoring and appropriate log levels
2. **Adds essential context** to make every log entry actionable
3. **Improves operational visibility** with structured, informative messages
4. **Maintains performance** while dramatically improving usefulness
Implementation of these changes will transform folder-mcp logs from a source of confusion into a powerful operational tool that enables rapid debugging, clear system understanding, and proactive monitoring.
**Expected Outcome**: Logs that tell the complete story of system behavior in a clear, concise manner that enables rapid problem resolution and confident system operation.