MONITORING.md•8.58 kB
# Monitoring and Logging Guide
CodeCompass MCP includes comprehensive monitoring and logging capabilities inspired by enterprise-grade systems and OpenRouter MCP patterns.
## Overview
The monitoring system provides:
- **Real-time Metrics**: Request counts, response times, error rates, memory usage
- **Performance Insights**: Slowest tools, error-prone operations, peak usage analysis
- **Structured Logging**: JSON and console formats with contextual information
- **Health Checks**: System status monitoring with configurable thresholds
- **Interactive Dashboard**: Real-time command-line monitoring interface
## Quick Start
### View Current Status
```bash
# Single health check
npm run monitor
# Interactive dashboard (refreshes every 5 seconds)
npm run monitor -- --watch
# Export metrics to JSON
npm run monitor -- --export > metrics.json
```
### Using the Health Check Tool
```bash
# Basic health check
curl -X POST http://localhost:3000/health \
-H "Content-Type: application/json" \
-d '{"name": "health_check"}'
# Comprehensive health check with insights
curl -X POST http://localhost:3000/health \
-H "Content-Type: application/json" \
-d '{
"name": "health_check",
"arguments": {
"checks": ["api-limits", "system-health", "monitoring", "configuration"],
"options": {
"include_metrics": true,
"include_insights": true,
"include_logs": true
}
}
}'
```
## Monitoring Features
### 1. System Metrics
The monitoring system tracks:
- **Memory Usage**: Heap usage, RSS, external memory
- **Request Metrics**: Total requests, error count, response times
- **Tool Usage**: Per-tool statistics and performance
- **Uptime**: Server uptime and availability
### 2. Performance Insights
Automatic analysis provides:
- **Slowest Tools**: Tools with highest average response times
- **Error-Prone Operations**: Tools with highest error rates
- **Peak Usage Hours**: Hourly request distribution
- **Recommendations**: Actionable performance improvements
### 3. Health Status
Multi-level health checks:
- **Healthy**: All systems operating normally
- **Degraded**: Some performance issues detected
- **Unhealthy**: Critical issues requiring attention
Health checks monitor:
- Memory usage (threshold: 80%)
- Error rate (threshold: 5%)
- Recent errors (threshold: 5 per 5 minutes)
- Average response time (threshold: 5 seconds)
## Logging System
### Log Levels
- **DEBUG**: Detailed diagnostic information
- **INFO**: General operational messages
- **WARN**: Warning conditions
- **ERROR**: Error conditions requiring attention
### Log Formats
#### Console Format (Development)
```
2023-12-08T10:30:45.123Z INFO [req-123] Request started: fetch_repository_data
2023-12-08T10:30:46.456Z INFO [req-123] Request completed: fetch_repository_data (1333ms)
```
#### JSON Format (Production)
```json
{
"timestamp": "2023-12-08T10:30:45.123Z",
"level": "INFO",
"message": "Request started: fetch_repository_data",
"context": {
"tool": "fetch_repository_data",
"requestId": "req-123"
},
"requestId": "req-123"
}
```
### Configuration
Set log level via environment variable:
```bash
export LOG_LEVEL=debug # debug, info, warn, error
export NODE_ENV=production # Enables JSON logging
```
## Monitoring Dashboard
The interactive dashboard provides real-time visibility into:
### System Overview
- Uptime and memory usage
- Request statistics and error rates
- Response time percentiles
### Tool Performance
- Usage statistics per tool
- Average response times
- Error rates by tool
### Recent Activity
- Last 10 requests with status
- Recent log entries
- Error details
### Performance Insights
- Slowest tools analysis
- Error-prone operations
- Peak usage patterns
- Automated recommendations
## Using the Monitoring API
### Programmatic Access
```javascript
import { monitoring } from './src/utils/monitoring.js';
// Get current metrics
const metrics = monitoring.getMetrics();
// Get health status
const health = monitoring.getHealthStatus();
// Get performance insights
const insights = monitoring.getPerformanceInsights();
// Export all data
const exportData = monitoring.exportMetrics();
```
### Request Tracking
```javascript
// Manual request tracking
const requestId = monitoring.generateRequestId();
const startTime = Date.now();
monitoring.startRequest('my_tool', requestId);
try {
// Your tool logic here
const result = await myTool();
monitoring.completeRequest('my_tool', startTime, true, undefined, requestId);
return result;
} catch (error) {
monitoring.completeRequest('my_tool', startTime, false, error.message, requestId);
throw error;
}
```
### Tool Monitoring Wrapper
```javascript
import { monitorTool } from './src/utils/monitoring.js';
// Wrap any function with automatic monitoring
const monitoredFunction = monitorTool('my_tool', async (param) => {
// Your tool logic
return result;
});
```
## Performance Optimization
### Monitoring Overhead
The monitoring system is designed for minimal overhead:
- Metrics collection: ~0.1ms per request
- Memory usage: ~1MB for 1000 requests
- CPU impact: <1% under normal load
### Buffer Management
- Request metrics: Limited to 1000 recent entries
- Log buffer: Limited to 1000 recent entries
- Automatic cleanup prevents memory leaks
## Docker Integration
Monitoring works seamlessly with Docker:
```bash
# View logs from container
./scripts/docker-logs.sh -f --timestamps
# Monitor container health
docker exec codecompass-mcp node scripts/monitor.js
# Export metrics from container
docker exec codecompass-mcp node scripts/monitor.js --export
```
## Production Deployment
### Environment Variables
```bash
# Enable JSON logging
NODE_ENV=production
# Set log level
LOG_LEVEL=info
# Optional: Enable file logging
LOG_FILE=/var/log/codecompass-mcp.log
```
### External Monitoring
The monitoring system integrates with external systems:
```bash
# Prometheus metrics endpoint (if implemented)
curl http://localhost:3000/metrics
# Health check endpoint
curl http://localhost:3000/health
# JSON metrics export
curl http://localhost:3000/metrics.json
```
## Troubleshooting
### Common Issues
1. **High Memory Usage**
- Check metrics buffer size
- Review tool usage patterns
- Consider request rate limiting
2. **Slow Response Times**
- Analyze slowest tools
- Implement caching where appropriate
- Use chunking for large responses
3. **High Error Rates**
- Review error logs
- Check API rate limits
- Verify configuration
### Debug Mode
Enable debug logging for detailed troubleshooting:
```bash
export LOG_LEVEL=debug
npm run dev
```
## Best Practices
### 1. Regular Monitoring
- Check dashboard during development
- Monitor health checks in production
- Set up alerts for critical thresholds
### 2. Performance Analysis
- Review weekly performance reports
- Identify trending issues
- Optimize based on insights
### 3. Log Management
- Rotate log files in production
- Use structured logging for analysis
- Set appropriate log levels
### 4. Capacity Planning
- Monitor resource usage trends
- Plan for peak usage periods
- Scale based on metrics
## Integration Examples
### CI/CD Health Checks
```yaml
# GitHub Actions example
- name: Health Check
run: |
npm run monitor -- --export > metrics.json
# Parse and validate metrics
node scripts/validate-health.js
```
### Load Testing
```bash
# Monitor during load testing
npm run monitor -- --watch &
# Run load tests
npm run load-test
```
### Custom Dashboards
The monitoring system can integrate with:
- Grafana for visualization
- Prometheus for metrics collection
- ELK stack for log analysis
- New Relic for APM
## API Reference
### Health Check Tool
```json
{
"name": "health_check",
"arguments": {
"checks": ["api-limits", "system-health", "monitoring", "configuration"],
"options": {
"include_metrics": true,
"include_insights": true,
"include_logs": true
}
}
}
```
### Monitoring Scripts
```bash
# Basic monitoring
node scripts/monitor.js
# Watch mode
node scripts/monitor.js --watch
# Export mode
node scripts/monitor.js --export
# Reset metrics
node scripts/monitor.js --reset
```
## Support
For monitoring-related issues:
1. Check the debug logs
2. Review the performance insights
3. Consult the troubleshooting guide
4. Report issues with metrics export
The monitoring system is designed to be self-diagnosing and provides actionable insights for performance optimization and issue resolution.