Structured Workflow Engine MCP Server

by mlaurel
monitor-system-health.md (5.88 kB)
# Monitor System Health

## Goal

Establish comprehensive monitoring and observability for applications and infrastructure to detect issues proactively and maintain system reliability.

## Context Required

- Running application or system
- System architecture and components

## Skip When

- Comprehensive monitoring already in place and functioning
- System too simple to warrant detailed monitoring
- Monitoring setup is handled by infrastructure team
- Only temporary or development system

## Complexity Assessment

- **Task Complexity**: High - requires monitoring expertise and system architecture understanding

## Task Understanding Assessment

If task unclear - ask clarifying questions with multiple choice options

## Monitoring Categories

### Infrastructure Monitoring

- **CPU utilization** - track processor usage across all systems
- **Memory usage** - monitor RAM consumption and availability
- **Disk space** - track storage usage and I/O performance
- **Network** - monitor bandwidth, latency, and connectivity
- **Container metrics** - CPU, memory, and network for containerized apps

### Application Monitoring

- **Response times** - track API and page load performance
- **Throughput** - monitor requests per second and concurrent users
- **Error rates** - track 4xx and 5xx errors, exceptions, and failures
- **Business metrics** - monitor key business KPIs and conversion rates
- **User experience** - track real user monitoring and synthetic checks

### Security Monitoring

- **Authentication failures** - monitor failed login attempts
- **Access patterns** - track unusual access patterns or privilege escalation
- **Security events** - monitor firewall blocks, intrusion attempts
- **Compliance metrics** - track adherence to security policies
- **Vulnerability scanning** - regular security assessment results

### Database Monitoring

- **Query performance** - track slow queries and execution times
- **Connection pool** - monitor database connections and pool usage
- **Replication lag** - track data synchronization across replicas
- **Storage usage** - monitor database size and growth trends
- **Lock contention** - identify blocking queries and deadlocks

## Alert Configuration

### Alert Severity Levels

- **Critical** - immediate response required (system down, data loss)
- **Warning** - requires attention within defined timeframe
- **Info** - informational alerts for awareness and trending

### Alert Thresholds

- Set thresholds based on baseline performance data
- Use multiple thresholds to avoid alert fatigue
- Configure escalation for unacknowledged critical alerts
- Implement intelligent alerting to reduce false positives

### Notification Channels

- **Immediate** - SMS, phone calls for critical alerts
- **Standard** - email, Slack for warnings and info alerts
- **Escalation** - manager notification for unresolved critical issues
- **Status pages** - public communication for customer-facing issues

## Dashboard Design

### Executive Dashboard

- High-level system health overview
- Key business metrics and SLAs
- Current incidents and their impact
- System availability and performance trends

### Operations Dashboard

- Detailed system metrics and performance
- Recent alerts and their status
- Infrastructure utilization and capacity
- Deployment and change tracking

### Development Dashboard

- Application performance metrics
- Error rates and recent errors
- Performance across different environments
- Code deployment and rollback tracking

### Business Dashboard

- User engagement and conversion metrics
- Revenue and transaction monitoring
- Customer experience indicators
- Business process performance

## Key Tasks

1. **Assess monitoring requirements** - identify critical metrics, SLAs, and monitoring scope
2. **Set up infrastructure monitoring** - monitor servers, containers, networks, and resources
3. **Implement application monitoring** - track application performance, errors, and usage
4. **Configure log aggregation** - centralize and analyze application and system logs
5. **Create dashboards and visualizations** - build monitoring dashboards for different audiences
6. **Set up alerting and notifications** - configure alerts for critical issues and thresholds
7. **Test monitoring and alerts** - validate monitoring setup and alert delivery
8. **Document monitoring procedures** - create runbooks and monitoring documentation

## Monitoring Tools and Technologies

### Open Source Options

- **Prometheus + Grafana** - metrics collection and visualization
- **ELK Stack** - Elasticsearch, Logstash, Kibana for log analysis
- **Jaeger** - distributed tracing for microservices
- **Nagios** - traditional infrastructure monitoring

### Cloud Provider Solutions

- **AWS CloudWatch** - comprehensive AWS service monitoring
- **Azure Monitor** - Microsoft Azure monitoring and analytics
- **Google Cloud Monitoring** - GCP infrastructure and application monitoring

### SaaS Solutions

- **Datadog** - full-stack monitoring and analytics
- **New Relic** - application performance monitoring
- **Splunk** - log analysis and security monitoring
- **PagerDuty** - incident management and alerting

## Success Criteria

- All critical system components monitored with appropriate metrics
- Dashboards provide clear visibility into system health
- Alerts trigger appropriately for actual issues without false positives
- Monitoring data helps with troubleshooting and capacity planning
- Team receives notifications through preferred channels
- Historical data available for trend analysis

## Key Outputs

- Comprehensive monitoring setup for all system components
- Real-time dashboards showing system health and performance
- Automated alerting for critical issues and threshold breaches
- Centralized log aggregation and analysis
- Monitoring documentation and runbooks
- Performance baselines and trend analysis
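
The Alert Configuration guidance (thresholds set from baseline data, three severity levels, and severity-specific notification channels) can be illustrated with a minimal Python sketch. Everything named here is hypothetical and not part of the workflow definition: the `thresholds_from_baseline`, `classify`, and `channels_for` helpers, the channel labels, and in particular the "mean plus a multiple of the standard deviation" rule, which is just one assumed way to derive thresholds from a baseline.

```python
from dataclasses import dataclass
from enum import Enum
from statistics import mean, stdev


class Severity(Enum):
    INFO = "info"
    WARNING = "warning"
    CRITICAL = "critical"


@dataclass(frozen=True)
class Thresholds:
    warning: float
    critical: float


def thresholds_from_baseline(samples, warning_sigmas=2.0, critical_sigmas=4.0):
    """Derive warning/critical thresholds from baseline samples.

    Assumption: thresholds sit a fixed number of sample standard
    deviations above the baseline mean. Real systems may prefer
    percentiles or seasonal baselines.
    """
    if len(samples) < 2:
        raise ValueError("need at least two baseline samples")
    mu = mean(samples)
    sigma = stdev(samples)
    return Thresholds(
        warning=mu + warning_sigmas * sigma,
        critical=mu + critical_sigmas * sigma,
    )


def classify(value, thresholds):
    """Map a metric value to a severity level (higher value = worse)."""
    if value >= thresholds.critical:
        return Severity.CRITICAL
    if value >= thresholds.warning:
        return Severity.WARNING
    return Severity.INFO


# Hypothetical channel labels mirroring the Notification Channels list:
# immediate channels for critical alerts, standard channels otherwise.
_CHANNELS = {
    Severity.CRITICAL: ("sms", "phone"),
    Severity.WARNING: ("email", "slack"),
    Severity.INFO: ("email", "slack"),
}


def channels_for(severity):
    """Return the notification channels assumed for a severity level."""
    return _CHANNELS[severity]


if __name__ == "__main__":
    # Example baseline: response times in milliseconds (made-up values).
    baseline_ms = [100, 110, 90, 105, 95]
    limits = thresholds_from_baseline(baseline_ms)
    for observed in (102, 120, 140):
        level = classify(observed, limits)
        print(observed, level.value, channels_for(level))
```

Using two thresholds per metric keeps minor deviations at the warning level, so only sustained or severe breaches reach the immediate channels. Escalation of unacknowledged critical alerts is left out of this sketch.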
