
🏥 K8s Doctor MCP

AI-powered Kubernetes cluster diagnostics and intelligent debugging recommendations


English | 한국어


Why K8s Doctor?

When a Kubernetes issue strikes, developers typically run through an endless loop of:

  • kubectl get pods

  • kubectl logs

  • kubectl describe

  • Frantically searching StackOverflow...

K8s Doctor changes the game. It's not just a kubectl wrapper - it's an AI-powered diagnostic tool that:

  • 🔍 Analyzes root causes - Goes beyond simple status checks

  • 🧠 Detects error patterns - Recognizes common issues (Connection Refused, OOM, DNS failures)

  • 💡 Provides actionable solutions - Gives you exact kubectl commands to fix problems

  • 📊 Exit code analysis - Explains what exit 137, 143, 1 actually mean

  • 🎯 Log pattern matching - Finds the signal in thousands of log lines

  • 🏥 Health scoring - Rates your pod/cluster health 0-100
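To make the exit code bullet concrete, here is a minimal sketch of how such a decoder could look. It relies only on the common convention that a container killed by a signal exits with 128 plus the signal number (137 = 128 + 9, 143 = 128 + 15). The `explainExitCode` helper and its wording are assumptions for illustration, not the tool's actual implementation.

```typescript
// Illustrative sketch only: `explainExitCode` is a hypothetical helper,
// not a function exported by k8s-doctor-mcp.
interface ExitCodeInfo {
  code: number;
  signal?: string; // set when the code follows the 128 + signal convention
  meaning: string;
}

// Signals most often seen behind container exit codes.
const SIGNAL_NAMES = new Map<number, string>([
  [9, "SIGKILL"],
  [15, "SIGTERM"],
]);

function explainExitCode(code: number): ExitCodeInfo {
  if (code === 0) {
    return { code, meaning: "Process exited successfully" };
  }
  if (code > 128) {
    const signalNumber = code - 128;
    const signal = SIGNAL_NAMES.get(signalNumber) ?? `signal ${signalNumber}`;
    let meaning = `Terminated by ${signal}`;
    if (signal === "SIGKILL") {
      meaning = "Killed forcibly (SIGKILL); often the OOM killer after a memory limit was exceeded";
    } else if (signal === "SIGTERM") {
      meaning = "Asked to terminate (SIGTERM); typical for a normal pod shutdown or eviction";
    }
    return { code, signal, meaning };
  }
  // Codes from 1 to 128 are chosen by the application itself.
  return { code, meaning: "Application error; check the container logs for the failing step" };
}

for (const code of [137, 143, 1]) {
  console.log(code, explainExitCode(code).meaning);
}
```

Exit 137 does not always mean OOM, so a real diagnosis would still confirm it against the container's last termination reason.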

Features

| Tool | Description |
|------|-------------|
| diagnose-pod | Comprehensive pod diagnostics - analyzes status, events, resources, and provides health score |
| debug-crashloop | CrashLoopBackOff specialist - decodes exit codes, analyzes logs, finds root cause |
| analyze-logs | Smart log analysis - detects error patterns, suggests fixes for common issues |
| check-resources | Resource usage - validates CPU/Memory limits, warns about OOM risks |
| full-diagnosis | Cluster health check - scans all nodes and pods for issues |
| check-events | Event analysis - filters and analyzes Warning events |
| list-namespaces | Namespace listing - quick overview of all namespaces |
| list-pods | Pod listing - shows problematic pods with status indicators |

Installation

npm install -g @zerry_jin/k8s-doctor-mcp

From source

    git clone https://github.com/ongjin/k8s-doctor-mcp.git
    cd k8s-doctor-mcp
    npm install && npm run build

Setup with Claude Code

    # After npm global install
    claude mcp add k8s-doctor -- k8s-doctor-mcp

    # Or from source build
    claude mcp add k8s-doctor -- node /path/to/k8s-doctor-mcp/dist/index.js

Quick Setup (Auto-approve Tools)

Tired of manually approving tool execution every time? Follow these steps to enable auto-approval.

🖥️ For Claude Desktop App Users

  1. Restart the Claude Desktop App.

  2. Ask your first question using k8s-doctor.

  3. When the permission dialog appears, check the box "Always allow requests from this server" and click Allow. (Future requests will execute automatically without prompts.)

⌨️ For Claude Code (CLI) Users

If you are using the claude terminal command, manage permissions via the interactive menu:

  1. Run claude in your terminal.

  2. Type /permissions in the prompt and press Enter.

  3. Select Global Permissions (or Project Permissions) > Allowed Tools.

  4. Enter mcp__k8s-doctor__* to allow all tools, or add specific tools individually.

💡 Tip: For most use cases, allowing diagnose-pod, debug-crashloop, and analyze-logs is sufficient. These three cover 90% of debugging scenarios.

Recommended configuration:

    # Balanced approach - allow main diagnostic tools
    claude config add allowedTools \
      "mcp__k8s-doctor__diagnose-pod" \
      "mcp__k8s-doctor__debug-crashloop" \
      "mcp__k8s-doctor__analyze-logs" \
      "mcp__k8s-doctor__full-diagnosis"

Prerequisites

  • kubectl configured and working (kubectl cluster-info should succeed)

  • kubeconfig file in default location (~/.kube/config) or KUBECONFIG env var set

  • Node.js 18 or higher

  • Access to a Kubernetes cluster (local like minikube/kind, or remote)
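The kubeconfig lookup described above can be sketched as a small pure function. This assumes the same convention kubectl follows: `KUBECONFIG` may hold several paths separated by the platform's path delimiter, and `~/.kube/config` is the fallback. The `resolveKubeconfigPaths` helper is hypothetical and only illustrates the lookup order; the tool's own client code may differ.

```typescript
// Hypothetical helper for illustration; not part of k8s-doctor-mcp.
// Takes the environment and home directory as arguments so it stays
// free of platform-specific imports.
function resolveKubeconfigPaths(
  env: Record<string, string | undefined>,
  homeDir: string,
  delimiter: string = ":", // use ";" on Windows
): string[] {
  const fromEnv = env["KUBECONFIG"];
  if (fromEnv !== undefined && fromEnv.trim() !== "") {
    // KUBECONFIG may list several files; keep them in order.
    return fromEnv
      .split(delimiter)
      .map((p) => p.trim())
      .filter((p) => p.length > 0);
  }
  // Default location when KUBECONFIG is unset or empty.
  return [`${homeDir}/.kube/config`];
}

console.log(resolveKubeconfigPaths({}, "/home/dev"));
console.log(
  resolveKubeconfigPaths({ KUBECONFIG: "/tmp/a.yaml:/tmp/b.yaml" }, "/home/dev"),
);
```

In a Node.js program you would typically pass `process.env` and `os.homedir()` as the first two arguments.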

Usage Examples

Example 1: Diagnose a CrashLooping Pod

You: "My pod 'api-server' in namespace 'production' is CrashLooping. What's wrong?"

Claude (using k8s-doctor):

🔍 CrashLoopBackOff Diagnosis

Exit Code: 137 (OOM Killed)

Root Cause: Container was killed due to Out Of Memory

Solution: Increase memory limit:

```yaml
resources:
  limits:
    memory: "512Mi" # Increase from current value
```

Relevant logs:

  • Line 1234: Error: JavaScript heap out of memory

  • Line 1256: FATAL ERROR: Reached heap limit

Example 2: Analyze Application Logs

You: "Analyze logs for pod 'backend-worker' and tell me what's failing"

Claude (using analyze-logs):

📝 Log Analysis

Detected Error Patterns:

🔴 Database Connection Error (15 occurrences)

Possible Causes:

  • DB service not ready

  • Wrong connection string

  • Authentication failed

Solutions:

  • Check DB pod status

  • Verify environment variables (ConfigMap/Secret)

  • Check service endpoints: kubectl get endpoints

🟡 Timeout (8 occurrences)

Likely cause: Response time too slow or network delay

Solution: Increase timeout values or optimize service performance
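A report like the one above could be produced by counting how many log lines match each known pattern. The sketch below is a simplified illustration of that idea. The pattern names come from this README, but the regular expressions, the `analyzeLogs` function, and the result shape are assumptions rather than the tool's real rules.

```typescript
// Illustrative sketch: pattern list and function names are hypothetical.
type Severity = "critical" | "high" | "medium";

interface ErrorPattern {
  name: string;
  severity: Severity;
  regex: RegExp; // no "g" flag, so .test() stays stateless
}

interface PatternHit {
  name: string;
  severity: Severity;
  occurrences: number;
  firstLine: number; // 1-based line number of the first match
}

const PATTERNS: ErrorPattern[] = [
  { name: "Connection Refused", severity: "critical", regex: /connection refused|ECONNREFUSED/i },
  { name: "Out of Memory", severity: "critical", regex: /out of memory|heap limit/i },
  { name: "Timeout", severity: "medium", regex: /timed? ?out|ETIMEDOUT/i },
];

function analyzeLogs(logText: string): PatternHit[] {
  const hits = new Map<string, PatternHit>();
  logText.split("\n").forEach((line, index) => {
    for (const pattern of PATTERNS) {
      if (!pattern.regex.test(line)) continue;
      const existing = hits.get(pattern.name);
      if (existing) {
        existing.occurrences += 1;
      } else {
        hits.set(pattern.name, {
          name: pattern.name,
          severity: pattern.severity,
          occurrences: 1,
          firstLine: index + 1,
        });
      }
    }
  });
  // Most frequent patterns first.
  return [...hits.values()].sort((a, b) => b.occurrences - a.occurrences);
}

const sample = [
  "server started",
  "Error: connect ECONNREFUSED 10.0.0.5:5432",
  "request timed out after 30s",
  "dial tcp: connection refused",
].join("\n");

console.log(analyzeLogs(sample));
```

Keeping the first matching line number makes it easy to point back into the log, as the "Relevant logs" output in Example 1 does.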

Example 3: Cluster Health Check

You: "Check overall cluster health"

Claude (using full-diagnosis):

🏥 Cluster Health Diagnosis

Overall Score: 72/100 💛

Nodes: 3/3 Ready ✅

Pods: 45/52 Running

  • CrashLoop: 2 🔥

  • Pending: 5 ⏳

Critical Issues:

  • 🔴 Pod "payment-service" CrashLooping (exit 1)

  • 🔴 Pod "worker-3" OOM Killed

Recommendations:

  • Fix 2 CrashLoop pods immediately

  • Check if pending pods lack resources
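This README does not document how the 0-100 score is computed, so the following is only a sketch of one plausible approach: a weighted ratio of ready nodes and running pods, minus penalties for problem pods. The weights, the penalty values, and the `healthScore` function are all assumptions, and the sketch does not try to reproduce the 72/100 shown above.

```typescript
// Hypothetical scoring sketch; weights and penalties are arbitrary
// illustration values, not taken from k8s-doctor-mcp.
interface ClusterSnapshot {
  nodesReady: number;
  nodesTotal: number;
  podsRunning: number;
  podsTotal: number;
  crashLooping: number;
  pending: number;
}

function healthScore(s: ClusterSnapshot): number {
  // A cluster with no nodes scores zero; a cluster with no pods is not penalized.
  const nodeRatio = s.nodesTotal === 0 ? 0 : s.nodesReady / s.nodesTotal;
  const podRatio = s.podsTotal === 0 ? 1 : s.podsRunning / s.podsTotal;

  const base = 100 * (0.4 * nodeRatio + 0.6 * podRatio);
  const penalty = 5 * s.crashLooping + 1 * s.pending;

  // Clamp to the 0-100 range and round to a whole number.
  return Math.max(0, Math.min(100, Math.round(base - penalty)));
}

console.log(
  healthScore({
    nodesReady: 3,
    nodesTotal: 3,
    podsRunning: 45,
    podsTotal: 52,
    crashLooping: 2,
    pending: 5,
  }),
);
```

Penalizing CrashLoop pods more heavily than Pending pods reflects the recommendation above to fix crash loops first.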

How It Works

  1. Connects to your cluster via kubeconfig (same as kubectl)

  2. Gathers comprehensive data - pod status, events, logs, resource usage

  3. Applies pattern matching - recognizes common error patterns from production experience

  4. Analyzes root causes - doesn't just show status, explains WHY it's failing

  5. Provides solutions - gives exact commands and YAML to fix issues

Error Patterns Detected

K8s Doctor recognizes these common patterns:

  • 🔴 Connection Refused - Service not ready, wrong port, network policy

  • 🔴 Database Connection Errors - DB auth, wrong connection strings

  • 🔴 Out of Memory - OOM kills, memory leaks, undersized limits

  • 🟠 File Not Found - ConfigMap not mounted, wrong paths

  • 🟠 Permission Denied - SecurityContext issues, fsGroup problems

  • 🟠 DNS Resolution Failed - CoreDNS issues, wrong service names

  • 🟡 Port Already in Use - Multiple processes on same port

  • 🟡 Timeout - Slow responses, network delays

  • 🟡 SSL/TLS Errors - Expired certs, missing CA bundles

Architecture

    k8s-doctor-mcp/
    ├── src/
    │   ├── index.ts                  # MCP server with all tools
    │   ├── types.ts                  # TypeScript type definitions
    │   ├── diagnostics/
    │   │   ├── pod-diagnostics.ts    # Pod health analysis
    │   │   └── cluster-health.ts     # Cluster-wide diagnostics
    │   ├── analyzers/
    │   │   └── log-analyzer.ts       # Smart log pattern matching
    │   └── utils/
    │       ├── k8s-client.ts         # Kubernetes API client
    │       └── formatters.ts         # Output formatting utilities
    └── package.json

Security Considerations

  • K8s Doctor uses read-only Kubernetes API calls (list, get, describe)

  • Requires same permissions as kubectl get/describe/logs

  • Never modifies cluster state

  • kubeconfig credentials stay local

  • No data sent to external servers

Troubleshooting

"kubeconfig not found"

```bash
# Verify kubectl works
kubectl cluster-info

# Check kubeconfig location
echo $KUBECONFIG

# Test with explicit path
export KUBECONFIG=~/.kube/config
```

"Permission denied"

    # Check your cluster permissions
    kubectl auth can-i get pods --all-namespaces

    # You need at least read access to:
    # - pods, events, namespaces, nodes

"Connection refused to cluster"

    # Verify cluster connectivity
    kubectl get nodes

    # For local clusters (minikube/kind)
    minikube status
    kind get clusters

Development

    # Clone and install
    git clone https://github.com/ongjin/k8s-doctor-mcp.git
    cd k8s-doctor-mcp
    npm install

    # Development mode
    npm run dev

    # Build
    npm run build

    # Test with Claude Code (quoted so paths containing spaces still work)
    npm run build
    claude mcp add k8s-doctor-dev -- node "$(pwd)/dist/index.js"

Contributing

Contributions welcome! Especially:

  • 🆕 New error pattern detections

  • 🌍 Internationalization (more languages)

  • 📊 Metrics integration (Prometheus, etc.)

  • 🧪 Test coverage

  • 📖 Documentation improvements

Roadmap

  • Metrics Server integration (real-time CPU/Memory usage)

  • Network policy diagnostics

  • Storage/PVC troubleshooting

  • Helm chart analysis

  • Multi-cluster support

  • Interactive debugging mode

  • Export reports (PDF, HTML)

License

MIT © zerry

Acknowledgments

Built with:

Star History

If this tool saves you debugging time, please ⭐ star the repo!

Author

zerry

  • GitHub: @zerry

  • Created for the DevOps community who are tired of kubectl hell 😅


Made with ❤️ for Kubernetes users drowning in logs
