ARC Config MCP Server

by tsviz
ENHANCED_TROUBLESHOOTING.md • 11.1 kB
# Enhanced ARC MCP Server - Comprehensive Troubleshooting Guide

## 🎯 Overview

This enhanced version of the ARC MCP server includes comprehensive troubleshooting capabilities based on real-world experience with ARC installations and cleanup operations. The system can automatically detect, diagnose, and fix common issues without requiring manual command-line intervention.

## 🔧 Enhanced Troubleshooting Scenarios

### 1. Namespace Stuck Terminating

**Issue:** Namespace remains in "Terminating" state indefinitely

**Real-world Example:** `arc-systems` namespace stuck for 9+ hours due to finalizers

**Auto-Fix Capabilities:**
- ✅ Automatically detects stuck resources with finalizers
- ✅ Force removes finalizers from orphaned runner resources
- ✅ Uses `kubectl patch` to clear namespace finalizers
- ✅ Force finalizes namespace via API endpoint

**Manual Steps Automated:**
```bash
# These steps are now automated by the MCP server
kubectl patch namespace arc-systems -p '{"metadata":{"finalizers":null}}' --type=merge
kubectl get namespace arc-systems -o json | jq '.spec.finalizers = []' | kubectl replace --raw "/api/v1/namespaces/arc-systems/finalize" -f -
```

### 2. Image Pull Authentication Issues

**Issue:** `ImagePullBackOff` or `ErrImagePull` due to GitHub Container Registry authentication

**Real-world Example:** `ghcr.io/actions/actions-runner-controller:v0.27.6`: unauthorized

**Auto-Fix Capabilities:**
- ✅ Detects GHCR authentication failures
- ✅ Attempts alternative image repositories
- ✅ Uses specific stable image versions
- ✅ Configures image pull secrets if needed
- ✅ Falls back to DockerHub mirrors

**Prevention Strategies:**
- Uses proven image versions instead of `latest`
- Configures proper GitHub token permissions
- Implements repository failover mechanisms

### 3. Certificate Manager Issues

**Issue:** cert-manager pods not ready, blocking ARC installation

**Real-world Example:** CRDs not available, webhook not responsive

**Auto-Fix Capabilities:**
- ✅ Waits intelligently for cert-manager readiness
- ✅ Tests webhook connectivity with sample resources
- ✅ Validates CRDs are properly installed
- ✅ Provides fallback installation methods (Helm vs kubectl)

**Comprehensive Validation:**
- Checks namespace existence
- Validates all deployments are ready
- Tests CRD availability
- Verifies webhook responsiveness

### 4. Helm Installation Timeouts

**Issue:** Helm installations timeout due to resource constraints or image pulls

**Real-world Example:** Installation hangs for 10+ minutes waiting for pods

**Auto-Fix Capabilities:**
- ✅ Dynamically adjusts timeout values based on cluster size
- ✅ Monitors pod startup progress in real-time
- ✅ Detects and resolves resource constraint issues
- ✅ Provides intelligent retry mechanisms

**Smart Monitoring:**
- Real-time pod status updates
- Progress visualization during installation
- Intelligent failure detection and recovery

### 5. Pod Security Standards Violations

**Issue:** Pods rejected due to security policy violations

**Real-world Example:** `runAsNonRoot` conflicts, privilege escalation issues

**Auto-Fix Capabilities:**
- ✅ Automatically configures proper security contexts
- ✅ Adjusts namespace security policies when needed
- ✅ Uses privileged namespaces for compatibility
- ✅ Optimizes security settings for ARC requirements

### 6. Resource Finalizer Issues

**Issue:** Custom resources stuck due to finalizers preventing deletion

**Real-world Example:** `runner.actions.summerwind.dev` finalizers on 3 runner instances

**Auto-Fix Capabilities:**
- ✅ Detects all stuck resources with finalizers
- ✅ Force removes finalizers from specific resource types
- ✅ Handles AutoscalingRunnerSets, RunnerDeployments, and Runners
- ✅ Provides granular finalizer management

**Supported Resource Types:**
- `runners.actions.summerwind.dev`
- `autoscalingrunnersets.actions.summerwind.dev`
- `runnerdeployments.actions.summerwind.dev`
- `horizontalrunnerautoscalers.actions.summerwind.dev`

## 🚀 Enhanced Installation Process

The enhanced installation process includes six phases with comprehensive troubleshooting:

### Phase 1: Prerequisites Validation with Issue Detection
- ✅ Validates Kubernetes cluster connectivity
- ✅ Checks Helm availability and configuration
- ✅ Validates GitHub token permissions
- ✅ Detects existing installation conflicts
- ✅ Performs resource capacity analysis

### Phase 2: Environment Assessment with AI Optimization
- ✅ Analyzes cluster topology
- ✅ Generates optimal scaling configurations
- ✅ Assesses security posture
- ✅ Creates intelligent installation plan

### Phase 3: ARC Installation with Real-time Monitoring
- ✅ Creates namespace with security policies
- ✅ Installs cert-manager with comprehensive validation
- ✅ Deploys ARC controller with progress tracking
- ✅ Monitors pod startup and health

### Phase 4: Security Hardening with AI Configuration
- ✅ Applies enterprise-grade security policies
- ✅ Configures network policies
- ✅ Sets up proper RBAC
- ✅ Enables compliance monitoring

### Phase 5: Validation with Comprehensive Testing
- ✅ Validates all components are healthy
- ✅ Tests webhook connectivity
- ✅ Performs compliance scoring
- ✅ Generates security reports

### Phase 6: Runner Guidance with AI Recommendations
- ✅ Generates optimal runner configurations
- ✅ Provides testing workflows
- ✅ Creates next-step recommendations
- ✅ Enables conversational management

## 🧹 Enhanced Cleanup Process

The enhanced cleanup process includes six phases with force recovery capabilities:

### Phase 1: Enhanced Validation with Issue Detection
- ✅ Detects stuck resources and finalizers
- ✅ Identifies namespace terminating issues
- ✅ Analyzes resource dependencies
- ✅ Performs safety checks

### Phase 2: Comprehensive Troubleshooting
- ✅ Automatically applies fixes for known issues
- ✅ Removes finalizers from stuck resources
- ✅ Resolves namespace terminating problems
- ✅ Handles orphaned resources

### Phase 3: Forced Resource Cleanup
- ✅ Force removes runner resources with grace period bypass
- ✅ Uninstalls Helm releases with timeout handling
- ✅ Removes deployments and services
- ✅ Cleans up secrets and configurations

### Phase 4: Finalizer Removal
- ✅ Systematically removes finalizers from all ARC resources
- ✅ Handles multiple resource types
- ✅ Uses proper API calls for finalizer management
- ✅ Provides recovery tracking

### Phase 5: Namespace Force Deletion
- ✅ Attempts graceful namespace deletion first
- ✅ Force removes namespace finalizers if needed
- ✅ Uses API endpoint for finalizer management
- ✅ Waits intelligently for completion

### Phase 6: Final Verification
- ✅ Comprehensive verification of cleanup completeness
- ✅ Checks for remaining resources across all namespaces
- ✅ Validates Custom Resource Definitions
- ✅ Provides detailed cleanup report

## 📊 Real-time Progress Updates

All operations provide real-time progress updates in the VS Code chat:

```
## 🚀 ARC Installation Progress

Progress: 60% [████████████░░░░░░░░]

📋 Installation Phases:
✅ 🔍 Prerequisites
✅ 📊 Assessment
⚡ 🚀 Installation
⏸️ 🛡️ Security
⏸️ ✅ Validation
⏸️ 🏃‍♂️ Runners

🎯 Current Phase: Installing ARC controller with AI optimization...

⏱️ This process typically takes 2-5 minutes
🎪 Sit back and enjoy the show!
```

## 🧠 AI Insights

The system provides intelligent insights throughout the process:

- 🧠 Analyzing cluster state for safe cleanup operations
- 🧠 Environment validated - safe to proceed with cleanup
- 🧠 No runner resources found - skipping this phase
- 🧠 Evaluating namespace arc-systems for safe removal
- 🧠 Some components may require manual cleanup - see verification results

## 🛡️ Safety Features

### Default Safety Mode
- Cleanup functionality is disabled by default (`CLEANUP_ARC=false`)
- Requires explicit enablement to prevent accidental deletions
- Provides dry-run mode for validation

### Intelligent Recovery
- Automatic detection of common failure patterns
- Self-healing capabilities for known issues
- Comprehensive rollback strategies

### Comprehensive Logging
- Detailed operation logs with timestamps
- AI insights and recommendations
- Troubleshooting results and recovery actions

## 🎮 Usage Examples

### Enable Enhanced Installation

The enhanced installation is used automatically when calling the installation tool. It includes:

- Comprehensive troubleshooting
- Real-time progress updates
- Automatic issue resolution
- Intelligent recovery mechanisms

### Enable Enhanced Cleanup

```bash
# Set environment variable to enable cleanup
export CLEANUP_ARC=true

# Or update your MCP configuration
{
  "args": ["--rm", "-i", "-e", "CLEANUP_ARC=true", ...]
}
```

### Natural Language Commands

The system supports natural language for all operations:

- "Install ARC with troubleshooting"
- "Cleanup the stuck ARC installation"
- "Fix the namespace terminating issue"
- "Force remove all ARC components"

## 🔍 Troubleshooting Scenarios Covered

| Issue | Severity | Auto-Fix | Description |
|-------|----------|----------|-------------|
| Namespace Stuck Terminating | High | ✅ | Finalizers blocking namespace deletion |
| Image Pull Authentication | Critical | ✅ | GHCR authentication failures |
| Cert-Manager Not Ready | High | ✅ | CRDs or webhook issues |
| Helm Installation Timeout | Medium | ✅ | Resource constraints or image pulls |
| Pod Security Violations | Medium | ✅ | Security context misconfigurations |
| GitHub Token Issues | Critical | ✅ | Invalid or expired tokens |
| Resource Limit Issues | Medium | ✅ | Insufficient cluster resources |
| Network Policy Problems | Medium | ✅ | Connectivity blocked by policies |
| CRD Version Conflicts | High | ✅ | Custom Resource Definition issues |
| Webhook Configuration | High | ✅ | Admission controller problems |
| Runner Registration | Medium | ✅ | GitHub integration failures |

## 🎯 Benefits

1. **Zero Manual Intervention**: All common issues are detected and fixed automatically
2. **Real-world Experience**: Based on actual troubleshooting scenarios
3. **Comprehensive Coverage**: Handles installation, cleanup, and recovery
4. **Intelligent Recovery**: Self-healing capabilities for known issues
5. **Safety First**: Multiple safety mechanisms prevent accidental damage
6. **Visual Feedback**: Real-time progress updates and AI insights
7. **Natural Language**: Conversational interface for all operations

## 🔄 Continuous Improvement

The troubleshooting scenarios are continuously updated based on:

- Real-world deployment experiences
- Community feedback and issues
- New ARC versions and changes
- Kubernetes platform evolution

This ensures the MCP server stays current with the latest challenges and solutions in the ARC ecosystem.
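
## Appendix: Safety Guard Sketch

The default safety mode described under Safety Features (cleanup disabled unless `CLEANUP_ARC=true`, with a dry-run option) could be approximated in shell roughly as follows. This is a minimal sketch, not the server's actual implementation: the function names `cleanup_enabled`, `run_or_print`, and `force_finalize_namespace`, and the `DRY_RUN` variable, are hypothetical and chosen only for illustration. The `kubectl patch` invocation is the one shown in the namespace scenario above.

```bash
#!/usr/bin/env bash
# Sketch only: cleanup_enabled, run_or_print, force_finalize_namespace and the
# DRY_RUN variable are hypothetical names, not part of the MCP server.

# Succeeds only when cleanup was explicitly enabled (default is disabled).
cleanup_enabled() {
  [ "${CLEANUP_ARC:-false}" = "true" ]
}

# In dry-run mode (the default here), print the command instead of running it.
# Note: "$*" flattens the arguments, so the printed line loses shell quoting.
run_or_print() {
  if [ "${DRY_RUN:-true}" = "true" ]; then
    printf 'DRY-RUN: %s\n' "$*"
  else
    "$@"
  fi
}

# Clear the finalizers on a namespace, guarded by the checks above.
force_finalize_namespace() {
  local ns="${1:-}"
  if [ -z "$ns" ]; then
    echo "usage: force_finalize_namespace <namespace>" >&2
    return 2
  fi
  if ! cleanup_enabled; then
    echo "cleanup is disabled; set CLEANUP_ARC=true to enable it" >&2
    return 1
  fi
  run_or_print kubectl patch namespace "$ns" --type=merge \
    -p '{"metadata":{"finalizers":null}}'
}
```

With `CLEANUP_ARC=true` and `DRY_RUN` left at its default, calling `force_finalize_namespace arc-systems` only prints the `kubectl patch` command. Setting `DRY_RUN=false` would run it against the current kubectl context, so the guard keeps the destructive path opt-in twice over.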
