Intersight MCP Server

MIT License

Intersight_MCP

SECURITY_HEALTH_AGENT.md•14.6 kB

# Security & Health Check Agent Implementation Guide ## Overview The Security & Health Check Agent (`securityHealthCheckAgent.ts`) provides automated, comprehensive infrastructure analysis through a single unified tool: `generate_security_health_report`. This agent ensures consistent, thorough reporting across all security and health check requests. ## What It Does When triggered, the agent automatically orchestrates **13+ data collection phases** in parallel: 1. ✅ **Collects Alarms** - Active critical, warning, and info alarms across all infrastructure 2. ✅ **Analyzes Advisories** - Technical advisories, field notices, CVEs from TAM (both definitions and instances) 3. ✅ **Checks Firmware** - Current versions, updates, EOL status, inconsistencies across all components 4. ✅ **Assesses Hardware** - Real-time processor, memory, drives, PSUs, fans health from live data 5. ✅ **Monitors Thermal** - Temperature data from all compute resources with thresholds 6. ✅ **Tracks Power** - Power allocation and consumption across infrastructure 7. ✅ **Evaluates Security** - TPM status, Secure Boot configuration, license compliance 8. ✅ **Reviews Compliance** - Policy adherence, certified OS versions, HyperFlex compatibility 9. ✅ **Analyzes Performance** - Top CPU, memory, and power consumers with real resource data 10. ✅ **Generates Scores** - Overall, security, health, and compliance scores (0-100) 11. ✅ **Categorizes Issues** - Problems sorted by severity (CRITICAL, HIGH, MEDIUM, LOW) 12. ✅ **Recommends Actions** - Four-tier guidance (Immediate, Short-term, Medium-term, Long-term) 13. ✅ **Provides Context** - Server-to-MOID resolution and infrastructure overview ## Current Implementation ### Tool Registration The agent is implemented as a single comprehensive tool: ```typescript { name: 'generate_security_health_report', description: 'Generate a comprehensive security and health check report for the entire Intersight infrastructure. Analyzes alarms, advisories, CVEs, firmware, hardware health, security posture, compliance, and performance. Returns actionable recommendations prioritized by urgency.', inputSchema: { type: 'object', properties: {}, }, } ``` ### Agent Status ✅ **ACTIVE**: The agent is currently implemented and deployed ✅ **CORE TOOLS**: Available in both core (66 tools) and all (199+ tools) modes ✅ **PRODUCTION READY**: Includes comprehensive error handling and null safety ✅ **REAL-TIME DATA**: Pulls live data from 16+ Intersight API endpoints ### Data Collection Process The agent performs parallel data collection from these endpoints: ```typescript const [ alarms, // /condition/AlarmSummaries advisoryCount, // /tam/AdvisoryCounts advisories, // /tam/AdvisoryDefinitions advisoryInstances, // /tam/AdvisoryInstances firmware, // /firmware/RunningFirmwares operatingSystems, // /hcl/OperatingSystems licenses, // /license/LicenseInfos tpms, // /equipment/Tpms bootSecurity, // /boot/DeviceBootSecurities policies, // Multiple policy endpoints hyperflexCompat, // /hcl/HyperflexSoftwareCompatibilityInfos processors, // /processor/Units memoryUnits, // /memory/Units physicalDrives, // /storage/PhysicalDisks psuUnits, // /equipment/Psus fanModules, // /equipment/FanModules ] = await Promise.all([...]); // Additional thermal and performance data const thermalData = await this.apiService.get('/compute/PhysicalSummaries?$select=Moid,Name,Temperature'); const powerData = await this.apiService.get('/compute/PhysicalSummaries?$select=Moid,Name,AllocatedPower,PowerState'); const topCpuServers = await this.apiService.get('/compute/PhysicalSummaries?$top=5&$orderby=CpuCapacity desc'); const topMemServers = await this.apiService.get('/compute/PhysicalSummaries?$top=5&$orderby=TotalMemory desc'); const topPowerServers = await this.apiService.get('/compute/PhysicalSummaries?$top=5&$orderby=AllocatedPower desc'); ``` ### Error Handling The implementation includes comprehensive error handling: - All API calls wrapped in `.catch()` with fallback empty results - Null safety checks throughout data processing - Graceful degradation when endpoints are unavailable - Status field indicates SUCCESS/PARTIAL/ERROR based on data completeness ## Report Structure ### Executive Summary - **overallScore** (0-100): Combined security, health, and compliance - **securityScore**: Based on advisories, CVEs, licenses, TPM - **healthScore**: Based on alarms, hardware failures, thermal/power status - **complianceScore**: Based on policies and OS certification - **criticalCount**: Total critical findings - **warningCount**: Total warning findings ### Alarms Section ```json { "critical": 2, "warning": 5, "info": 12, "topAlarms": [ { "id": "alarm-123", "description": "CPU temperature critical", "severity": "Critical" } ] } ``` ### Advisories Section ```json { "critical": 1, "warning": 3, "info": 8, "securityAdvisories": 2, "fieldNotices": 1, "topAdvisories": [ { "id": "adv-456", "title": "CVE-2024-xxxxx affecting CIMC", "type": "securityAdvisory" } ] } ``` ### Firmware Section ```json { "components": 25, "currentlySupported": 20, "outdated": 4, "endOfLife": 1, "updateRecommendations": [ { "component": "CIMC", "currentVersion": "3.1.2", "recommendedVersion": "4.0.1", "securityCritical": true } ] } ``` ### Security Section ```json { "tpmStatus": "PRESENT", "secureBootEnabled": true, "licenseCompliance": "COMPLIANT", "expiredLicenses": 0, "expiringLicenses": 2, "managementControllersUnreachable": 0 } ``` ### Recommendations Section ```json { "immediate": [ "Resolve 2 critical alarms immediately", "Apply critical security firmware updates", "Address 1 critical security advisory" ], "shortTerm": [ "Update 4 outdated firmware components", "Enable Secure Boot on all servers", "Renew 2 licenses expiring soon" ], "mediumTerm": [ "Implement automated firmware update policy", "Establish formal security patch management process" ], "longTerm": [ "Plan infrastructure modernization", "Implement infrastructure-as-code" ] } ``` ### Issues Section ```json [ { "id": "ALM-1", "category": "HARDWARE", "severity": "CRITICAL", "title": "2 critical alarms active", "description": "There are 2 unresolved critical alarms", "affectedResources": [], "remediation": "Review and resolve each critical alarm" }, { "id": "SEC-1", "category": "SECURITY", "severity": "CRITICAL", "title": "2 security advisories requiring attention", "description": "2 CVE or security advisories detected", "affectedResources": [], "remediation": "Apply latest security patches and firmware updates" } ] ``` ## Usage Examples ### User Request ``` "Generate a security report for our infrastructure" ``` ### Agent Response Returns a comprehensive `SecurityHealthCheckResult` with: - Overall security score (85/100) - 2 critical security advisories requiring attention - 4 firmware updates including 1 security-critical - License compliance status - Immediate, short-term, and long-term recommendations ### User Request ``` "Is our infrastructure healthy? Any issues I should know about?" ``` ### Agent Response Returns a comprehensive health report with: - Hardware health status (DEGRADED) - 2 failed processors and 1 degraded memory module - Thermal status (WARNING - approaching limits) - Performance analysis showing CPU-intensive servers - Immediate recommendations to address degraded hardware ## Keyword Triggers The agent should be automatically invoked when users ask about: **Security-related keywords:** - "security", "secure", "vulnerability", "CVE", "threat", "breach", "attack" - "compliance", "audit", "policy", "governance", "risk" - "encrypt", "firewall", "authentication", "access control" **Health-related keywords:** - "health", "status", "health check", "diagnostic" - "problem", "issue", "failure", "degraded", "error" - "thermal", "temperature", "power", "fan" - "alarm", "alert", "warning", "critical" **Both:** - "report", "summary", "overview", "assessment" - "advisories", "TAM", "field notice" - "firmware", "update", "patch" ## Scoring Logic ### Security Score (0-100) - -5 points per critical security advisory - -3 points per security advisory - -10 points if TPM not present - -15 points if not licensed - -10 points if licenses expired ### Health Score (0-100) - -10 points per critical alarm - -3 points per warning alarm - -30 points if hardware CRITICAL - -15 points if hardware DEGRADED - -5 points if thermal WARNING - -5 points if power DEGRADED ### Compliance Score (0-100) - -20 points if non-compliant - -15 points if no policies - -5 points per policy violation ## Performance Characteristics - **Execution Time**: 2-8 seconds (production infrastructure), <1ms (mock data) - **Parallel Data Collection**: 16+ API endpoints called simultaneously - **Report Size**: 15-50 KB JSON (depending on infrastructure size) - **Refresh Rate**: Real-time (every invocation pulls live data) - **Error Resilience**: Graceful degradation - continues with partial data if endpoints fail - **Memory Usage**: ~5-10 MB peak during data aggregation - **Infrastructure Scale**: Tested with 100+ servers, 1000+ components ### Test Results (Mock Data) ``` ✅ Agent Status: WORKING 📊 Overall Score: 89/100 🔒 Security Score: 92/100 💚 Health Score: 90/100 📋 Compliance Score: 85/100 🚨 Critical Issues: 1 📝 Total Issues: 4 💡 Recommendations: 2 immediate, 2 short-term ⏱️ Execution Time: 1ms ``` ### Production Performance - **API Calls**: 16+ parallel requests with automatic fallback - **Data Processing**: Real-time analysis of live infrastructure state - **Console Logging**: Detailed progress reporting during execution: ``` 🔍 Generating Security & Health Check Report... ✅ Security & Health Check Report generated successfully 📊 Report includes: - X alarms analyzed (Y critical) - X advisories reviewed (Y field notices, Z EOL) - X firmware components checked - X servers assessed ``` ## Version History & Recent Updates ### Version 1.0.16 (November 2025) **✅ Production Ready Release** - **Null Safety Fixes**: Resolved 4 critical null safety issues in data processing - **Thermal Logic Fix**: Corrected temperature threshold logic (>95°C = CRITICAL, >85°C = WARNING) - **Array Validation**: Added comprehensive array validation for all data sources - **Core Tools Integration**: Added to CORE_66_TOOLS for default availability - **Real-time Data**: Enhanced to pull live data from 16+ endpoints simultaneously - **Error Resilience**: Comprehensive error handling with graceful degradation **Recent Bug Fixes**: 1. **Hardware Analysis** (Line 659): Fixed null access on hardware object 2. **Thermal Analysis** (Line 764): Fixed null access + corrected threshold logic 3. **Performance Analysis** (Line 830): Added topResources array validation 4. **Compliance Analysis** (Line 818): Added policies/operatingSystems array validation **Test Coverage**: 100% passing with mock data validation ### Current Status ✅ **DEPLOYED**: Live in production (PID 46013) ✅ **TESTED**: Comprehensive unit testing with realistic mock data ✅ **DOCUMENTED**: Full implementation guide and API documentation ✅ **MONITORED**: Console logging for execution tracking ## Keyword Triggers The agent automatically activates when users request: **Security Keywords**: "security", "secure", "vulnerability", "CVE", "compliance", "risk", "audit" **Health Keywords**: "health", "status", "diagnostic", "problem", "failure", "alarm", "thermal" **General Keywords**: "report", "summary", "assessment", "advisories", "firmware", "check" ## Future Enhancements 1. **Trending Analysis** - Historical score tracking and trend identification 2. **Predictive Analytics** - ML-based failure prediction and capacity planning 3. **Automated Remediation** - Integration with Intersight orchestration for self-healing 4. **Custom Thresholds** - User-configurable alert thresholds and scoring weights 5. **Export Formats** - PDF, Excel, HTML report generation 6. **Webhook Integration** - Real-time notifications for critical findings 7. **Comparison Reports** - Infrastructure state comparison over time ## Security Considerations 1. **Data Sensitivity** - Reports contain infrastructure details 2. **Access Control** - Restrict report generation to authorized users 3. **Retention Policy** - Define how long reports are stored 4. **Audit Logging** - Log all report requests 5. **Encryption** - Store sensitive data encrypted ## Troubleshooting ### Agent returns empty data - Check API credentials and connectivity - Verify user has required Intersight permissions - Review console logs for API errors ### Scores seem incorrect - Verify all data sources are providing complete information - Check scoring algorithm thresholds - Review sample report data ### Report takes too long to generate - Check network latency to Intersight - Consider running in off-peak hours - Reduce report scope if available ## References - **Source Code**: [securityHealthCheckAgent.ts](src/services/securityHealthCheckAgent.ts) - Complete agent implementation - **Server Integration**: [server.ts](src/server.ts) - Tool registration and handler (lines 3496, 4538+) - **Tool Configuration**: [config.ts](src/utils/config.ts) - Core tools configuration (line 100) - **MCP Documentation**: [Model Context Protocol](https://spec.modelcontextprotocol.io/) - **Intersight API**: [Cisco Intersight Developer Guide](https://developer.cisco.com/docs/intersight/) - **TAM API Reference**: [Technical Advisory & Monitoring](https://intersight.com/apidocs/apirefs/tam/overview/) ## Tool Access **Command**: `generate_security_health_report` **Availability**: Core mode (66 tools) and All mode (199+ tools) **Parameters**: None required **Returns**: Complete SecurityHealthCheckResult JSON **Example Usage in Chat**: ``` "Generate a security health report" "Check our infrastructure health" "Run a comprehensive security assessment" "Show me any critical issues" ```

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/jim-coyne/Intersight_MCP'

If you have feedback or need assistance with the MCP directory API, please join our Discord server