Server Configuration
Describes the environment variables required to run the server.
| Name | Required | Description | Default |
|---|---|---|---|
| No environment variables required | | | |
Capabilities
Server capabilities have not been inspected yet.
Tools
Functions exposed to the LLM to take actions
| Name | Description |
|---|---|
| get_cluster_storage_report | Use this FIRST. Fast, high-level summary of storage usage for all nodes or a specific node. Checks quotas and reported usage from the Kubelet API.<br>If 'node' is provided, analyzes only that node; if 'node' is not provided, analyzes all worker nodes. See the usage sketches after this table. |
| inspect_node_storage_forensics | SLOW operation (10s+). Performs a deep forensic analysis of a node's storage.<br>Use this tool ONLY when: (1) a specific node is known to be problematic (e.g. a full disk), and (2) 'get_cluster_storage_report' does not reveal the root cause.<br>Runs a debug pod on the node to calculate: real disk usage (df -h), reclaimable space from UNUSED images, and growth of container writable layers (indicating log/file issues inside containers). |
| check_persistent_volume_capacity | Monitor Persistent Volume Claim (PVC) capacity usage across the cluster. Critical for preventing database crashes and data loss due to full disks. Checks PVCs (persistent storage), which are distinct from ephemeral storage.<br>Args: namespace (optional namespace to filter PVCs; if None, checks all namespaces), threshold (alert threshold percentage, default 85%; PVCs above this will be flagged).<br>Returns: formatted report of PVC usage with warnings for volumes exceeding the threshold. |
| get_cluster_resource_balance | Analyze cluster resource balance, focusing on gaps between resource requests and actual usage.<br>Why essential: diagnoses scheduling bottlenecks (explains why pods are Pending despite apparent capacity), detects resource fragmentation ("request vs usage" gaps), and reveals over-provisioned nodes for cost efficiency.<br>Returns: Markdown table showing CPU/memory requests vs usage per node. |
| detect_pod_restarts_anomalies | Identify unstable pods experiencing high restart rates.<br>Why: high restart rates signal code issues (OOM kills, panics, misconfigurations); proactive detection catches intermittent failures before they become incidents; results point directly to problematic workloads. See the restart-investigation sketch after this table.<br>Args: threshold (minimum number of restarts to flag, default 5), duration (time window to analyze, e.g. '1h', '24h', '10m').<br>Returns: Markdown report of unstable pods. |
| get_gpu_utilization | Monitor GPU usage and health across the cluster.<br>Why: GPUs are expensive, so low utilization indicates wasted money; idle GPUs can be identified and deallocated; high error rates indicate hardware issues. See the GPU sketch after this table.<br>Prerequisites: NVIDIA GPU Operator installed (exports DCGM metrics).<br>Returns: Markdown report of GPU utilization per node. |
| get_pod_logs | Retrieve container logs from a pod.<br>Args: namespace (pod namespace), pod_name (pod name), container (specific container name; if None, gets all containers), previous (get logs from the previous container instance), tail (number of recent lines to retrieve, default 100), since (time duration to retrieve logs from, e.g. "1h", "30m").<br>Returns: formatted logs with container separation. |
| get_pod_diagnostics | Comprehensive pod health analysis with events, status, and actionable recommendations.<br>Args: namespace (pod namespace), pod_name (pod name).<br>Returns: detailed diagnostic report with recommendations. |
| inspect_gpu_pod | Run 'nvidia-smi' inside a GPU-enabled pod to view real-time process and memory details.<br>Why: debug OOM (see exact memory usage per process), verify allocation (confirm the pod actually sees the GPU), and check processes (identify zombie processes or unexpected workloads).<br>Args: namespace (pod namespace), pod_name (pod name).<br>Returns: output of nvidia-smi from inside the pod. |
| check_gpu_health | Check for GPU hardware errors (XID) and throttling events across the cluster.<br>Why: XID errors often indicate physical GPU faults; thermal or power throttling explains why a model is slow.<br>Returns: Markdown report of GPU health issues. |
| get_vllm_metrics | Monitor vLLM inference server performance metrics by directly querying pods.<br>Why: track request latency and throughput, monitor queue size and running requests for capacity planning, track GPU cache usage, and detect performance degradation early.<br>Args: namespace (optional namespace filter), pod_filter (optional pod name filter; supports partial match).<br>Returns: Markdown report of vLLM metrics. |
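The tools above are ordinary MCP tools, so any MCP client can drive the storage triage flow the descriptions recommend: cluster-wide summary first, node forensics only when needed. Below is a minimal sketch using the MCP Python SDK (`mcp` package); the launch command (`python -m k8s_monitor_mcp`), the node name, and the `node` argument of the forensics tool are assumptions, not details documented here.

```python
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

# Placeholder launch command -- substitute however this server is actually started.
SERVER = StdioServerParameters(command="python", args=["-m", "k8s_monitor_mcp"])


async def triage_storage() -> None:
    async with stdio_client(SERVER) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # 1. Fast, cluster-wide summary -- use this FIRST.
            report = await session.call_tool("get_cluster_storage_report")
            print(report.content[0].text)  # tools return Markdown text

            # 2. PVC capacity check (persistent storage, distinct from ephemeral).
            pvcs = await session.call_tool(
                "check_persistent_volume_capacity", arguments={"threshold": 85}
            )
            print(pvcs.content[0].text)

            # 3. Only if a specific node looks problematic and the summary did not
            #    reveal the root cause: slow forensic analysis (10s+).
            #    The 'node' argument name is assumed; the node name is illustrative.
            forensics = await session.call_tool(
                "inspect_node_storage_forensics", arguments={"node": "worker-3"}
            )
            print(forensics.content[0].text)


asyncio.run(triage_storage())
```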
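The same pattern covers restart investigations: flag unstable pods first, then pull logs and diagnostics for a specific pod. This sketch assumes an already-initialized `session` from the connection sketch above; the namespace and pod name are hypothetical, while the argument names come from the tool descriptions in the table.

```python
from mcp import ClientSession


async def investigate_restarts(session: ClientSession) -> None:
    """Run inside the session context from the connection sketch above."""
    # Flag pods with 5+ restarts over the last 24 hours.
    anomalies = await session.call_tool(
        "detect_pod_restarts_anomalies",
        arguments={"threshold": 5, "duration": "24h"},
    )
    print(anomalies.content[0].text)

    # Drill into one flagged pod (namespace and pod name are hypothetical).
    logs = await session.call_tool(
        "get_pod_logs",
        arguments={
            "namespace": "payments",
            "pod_name": "api-7d9f8c-abcde",
            "previous": True,  # logs from the crashed container instance
            "tail": 200,
            "since": "1h",
        },
    )
    diagnostics = await session.call_tool(
        "get_pod_diagnostics",
        arguments={"namespace": "payments", "pod_name": "api-7d9f8c-abcde"},
    )
    print(logs.content[0].text)
    print(diagnostics.content[0].text)
```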
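For the GPU tools, the sketch below strings together the cluster-wide checks and the per-pod inspection. It again assumes an initialized `session`, plus the NVIDIA GPU Operator prerequisite noted in the table; the namespace, pod name, and pod filter are illustrative.

```python
from mcp import ClientSession


async def check_gpus(session: ClientSession) -> None:
    """Run inside the session context from the connection sketch above."""
    # Cluster-wide utilization and health (requires NVIDIA GPU Operator / DCGM metrics).
    utilization = await session.call_tool("get_gpu_utilization")
    health = await session.call_tool("check_gpu_health")
    print(utilization.content[0].text)
    print(health.content[0].text)

    # If a workload looks suspect, run nvidia-smi inside that pod
    # (namespace and pod name are illustrative).
    smi = await session.call_tool(
        "inspect_gpu_pod",
        arguments={"namespace": "ml-inference", "pod_name": "vllm-llama3-0"},
    )
    print(smi.content[0].text)

    # vLLM serving metrics, filtered to pods whose names contain "vllm".
    vllm = await session.call_tool(
        "get_vllm_metrics",
        arguments={"namespace": "ml-inference", "pod_filter": "vllm"},
    )
    print(vllm.content[0].text)
```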
Prompts
Interactive templates invoked by user choice
| Name | Description |
|---|---|
| No prompts | |
Resources
Contextual data attached and managed by the client
| Name | Description |
|---|---|
| No resources | |