Server Configuration
Describes the environment variables required to run the server.
| Name | Required | Description | Default |
|---|---|---|---|
| No environment variables required | | | |
Capabilities
Server capabilities have not been inspected yet.
Tools
Functions exposed to the LLM to take actions
| Name | Description |
|---|---|
| get_cluster_storage_report | Use this FIRST. Fast, high-level summary of storage usage for all nodes or a specific node. Checks quotas and reported usage from the Kubelet API.<br>If 'node' is provided, analyzes only that node; if 'node' is not provided, analyzes all worker nodes. See the usage sketches after this table. |
| inspect_node_storage_forensics | SLOW operation (10s+). Performs a deep forensic analysis of a node's storage.<br>Use this tool ONLY when: (1) a specific node is known to be problematic (e.g. a full disk), and (2) 'get_cluster_storage_report' does not reveal the root cause.<br>Runs a debug pod on the node to calculate: real disk usage (df -h), reclaimable space from UNUSED images, and growth of container writable layers (indicating log/file issues inside containers). |
| check_persistent_volume_capacity | Monitor Persistent Volume Claim (PVC) capacity usage across the cluster. Critical for preventing database crashes and data loss due to full disks. Checks PVCs (persistent storage), which are distinct from ephemeral storage.<br>Args: namespace (optional namespace to filter PVCs; if None, checks all namespaces), threshold (alert threshold percentage, default 85%; PVCs above this will be flagged).<br>Returns: formatted report of PVC usage with warnings for volumes exceeding the threshold. |
| get_cluster_resource_balance | Analyze cluster resource balance, focusing on gaps between resource requests and actual usage.<br>Why essential: diagnoses scheduling bottlenecks (explains why pods are Pending despite apparent capacity), detects resource fragmentation ("request vs usage" gaps), and reveals over-provisioned nodes for cost efficiency.<br>Returns: Markdown table showing CPU/memory requests vs usage per node. |
| detect_pod_restarts_anomalies | Identify unstable pods experiencing high restart rates.<br>Why: high restart rates signal code issues (OOM kills, panics, misconfigurations); proactive detection catches intermittent failures before they become incidents; results point directly to problematic workloads. See the restart-investigation sketch after this table.<br>Args: threshold (minimum number of restarts to flag, default 5), duration (time window to analyze, e.g. '1h', '24h', '10m').<br>Returns: Markdown report of unstable pods. |
| get_gpu_utilization | Monitor GPU usage and health across the cluster.<br>Why: GPUs are expensive, so low utilization indicates wasted money; idle GPUs can be identified and deallocated; high error rates indicate hardware issues. See the GPU sketch after this table.<br>Prerequisites: NVIDIA GPU Operator installed (exports DCGM metrics).<br>Returns: Markdown report of GPU utilization per node. |
| get_pod_logs | Retrieve container logs from a pod.<br>Args: namespace (pod namespace), pod_name (pod name), container (specific container name; if None, gets all containers), previous (get logs from the previous container instance), tail (number of recent lines to retrieve, default 100), since (time duration to retrieve logs from, e.g. "1h", "30m").<br>Returns: formatted logs with container separation. |
| get_pod_diagnostics | Comprehensive pod health analysis with events, status, and actionable recommendations.<br>Args: namespace (pod namespace), pod_name (pod name).<br>Returns: detailed diagnostic report with recommendations. |
| inspect_gpu_pod | Run 'nvidia-smi' inside a GPU-enabled pod to view real-time process and memory details.<br>Why: debug OOM (see exact memory usage per process), verify allocation (confirm the pod actually sees the GPU), and check processes (identify zombie processes or unexpected workloads).<br>Args: namespace (pod namespace), pod_name (pod name).<br>Returns: output of nvidia-smi from inside the pod. |
| check_gpu_health | Check for GPU hardware errors (XID) and throttling events across the cluster.<br>Why: XID errors often indicate physical GPU faults; thermal or power throttling explains why a model is slow.<br>Returns: Markdown report of GPU health issues. |
| get_vllm_metrics | Monitor vLLM inference server performance metrics by directly querying pods.<br>Why: track request latency and throughput, monitor queue size and running requests for capacity planning, track GPU cache usage, and detect performance degradation early.<br>Args: namespace (optional namespace filter), pod_filter (optional pod name filter; supports partial match).<br>Returns: Markdown report of vLLM metrics. |
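The tools above are ordinary MCP tools, so any MCP client can drive the storage triage flow the descriptions recommend: cluster-wide summary first, node forensics only when needed. Below is a minimal sketch using the MCP Python SDK (`mcp` package); the launch command (`python -m k8s_monitor_mcp`), the node name, and the `node` argument of the forensics tool are assumptions, not details documented here.

```python
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

# Placeholder launch command -- substitute however this server is actually started.
SERVER = StdioServerParameters(command="python", args=["-m", "k8s_monitor_mcp"])


async def triage_storage() -> None:
    async with stdio_client(SERVER) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # 1. Fast, cluster-wide summary -- use this FIRST.
            report = await session.call_tool("get_cluster_storage_report")
            print(report.content[0].text)  # tools return Markdown text

            # 2. PVC capacity check (persistent storage, distinct from ephemeral).
            pvcs = await session.call_tool(
                "check_persistent_volume_capacity", arguments={"threshold": 85}
            )
            print(pvcs.content[0].text)

            # 3. Only if a specific node looks problematic and the summary did not
            #    reveal the root cause: slow forensic analysis (10s+).
            #    The 'node' argument name is assumed; the node name is illustrative.
            forensics = await session.call_tool(
                "inspect_node_storage_forensics", arguments={"node": "worker-3"}
            )
            print(forensics.content[0].text)


asyncio.run(triage_storage())
```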
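The same pattern covers restart investigations: flag unstable pods first, then pull logs and diagnostics for a specific pod. This sketch assumes an already-initialized `session` from the connection sketch above; the namespace and pod name are hypothetical, while the argument names come from the tool descriptions in the table.

```python
from mcp import ClientSession


async def investigate_restarts(session: ClientSession) -> None:
    """Run inside the session context from the connection sketch above."""
    # Flag pods with 5+ restarts over the last 24 hours.
    anomalies = await session.call_tool(
        "detect_pod_restarts_anomalies",
        arguments={"threshold": 5, "duration": "24h"},
    )
    print(anomalies.content[0].text)

    # Drill into one flagged pod (namespace and pod name are hypothetical).
    logs = await session.call_tool(
        "get_pod_logs",
        arguments={
            "namespace": "payments",
            "pod_name": "api-7d9f8c-abcde",
            "previous": True,  # logs from the crashed container instance
            "tail": 200,
            "since": "1h",
        },
    )
    diagnostics = await session.call_tool(
        "get_pod_diagnostics",
        arguments={"namespace": "payments", "pod_name": "api-7d9f8c-abcde"},
    )
    print(logs.content[0].text)
    print(diagnostics.content[0].text)
```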
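For the GPU tools, the sketch below strings together the cluster-wide checks and the per-pod inspection. It again assumes an initialized `session`, plus the NVIDIA GPU Operator prerequisite noted in the table; the namespace, pod name, and pod filter are illustrative.

```python
from mcp import ClientSession


async def check_gpus(session: ClientSession) -> None:
    """Run inside the session context from the connection sketch above."""
    # Cluster-wide utilization and health (requires NVIDIA GPU Operator / DCGM metrics).
    utilization = await session.call_tool("get_gpu_utilization")
    health = await session.call_tool("check_gpu_health")
    print(utilization.content[0].text)
    print(health.content[0].text)

    # If a workload looks suspect, run nvidia-smi inside that pod
    # (namespace and pod name are illustrative).
    smi = await session.call_tool(
        "inspect_gpu_pod",
        arguments={"namespace": "ml-inference", "pod_name": "vllm-llama3-0"},
    )
    print(smi.content[0].text)

    # vLLM serving metrics, filtered to pods whose names contain "vllm".
    vllm = await session.call_tool(
        "get_vllm_metrics",
        arguments={"namespace": "ml-inference", "pod_filter": "vllm"},
    )
    print(vllm.content[0].text)
```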
Prompts
Interactive templates invoked by user choice
| Name | Description |
|---|---|
| No prompts | |
Resources
Contextual data attached and managed by the client
| Name | Description |
|---|---|
| No resources | |