Glama

issues

Read-only

Identify currently failing Kubernetes resources including crashing pods, missing references, scheduling blockers, and unhealthy CRDs. Returns ranked issues with severity for triage.

Instructions

Use when the agent's decision is 'what's broken right now?': LIVE OPERATIONAL STATE, not config posture. Returns a ranked list of currently failing resources:

failing Deployments/StatefulSets/CronJobs/HPAs/Nodes/Jobs/PVCs;
dangling-reference errors such as Pod→missing PVC/CM/Secret/SA, HPA→missing scaleTargetRef, Ingress→missing backend Service, RoleBinding→missing Role, webhook→missing Service;
pod startup blockers, i.e. why a Pod can't reach Running: unschedulable (arch/taint/resources/affinity), admission-rejected (quota/PodSecurity/webhook), or stuck post-bind (CNI/volume);
False .status.conditions on CRDs from Argo/Flux/Knative/Crossplane/cert-manager/KEDA.

Severity is normalized to critical/warning. This is one curated stream with no source filter; each row carries a source label (problem|missing_ref|scheduling|condition) that you can slice on via the CEL filter= if needed.

Some rows include diagnostic_context: deterministic facts such as explicit missing refs, selected backend issues, or workload rollups. Treat these as triage context, not proof of root cause. When recent_changes is present, consider it if the issue list does not explain the reported symptom; recent_changes_reason says why Radar attached it. recent_changes lists recent spec/config changes that may explain failures not yet visible as runtime issues, or that help distinguish creation-time baseline failures from the active incident.

For raw Kubernetes Warning events use get_events. For static best-practice / security-posture findings (runAsRoot, missing PDB, no probes, missing resource limits) use get_cluster_audit, a separate axis that must never be conflated with live issues (a healthy pod can have many audit findings; a crashing pod can have zero). Kyverno PolicyReport violations are in neither; they surface per-resource via get_resource's resourceContext policy rollup.

After identifying a suspect issue, call diagnose when the affected resource is a workload (Pod/Deployment/StatefulSet/DaemonSet) or GitOps reconciler (Application/Kustomization/HelmRelease). For other non-workload kinds, call get_resource. Use get_neighborhood when the failure likely crosses Services/workloads/Pods/dependencies. Use namespace for app-local triage; omit it when the root may be cluster-scoped or outside the app namespace.
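
The follow-up rule above can be sketched as a small routing helper. This is a minimal illustration, not part of Radar: the tool names (diagnose, get_resource, get_neighborhood) come from the instructions, while the function name, the `crosses_dependencies` flag, and the choice to check that flag first are assumptions made for the example.

```python
# Kinds the instructions route to `diagnose`: workloads and GitOps reconcilers.
DIAGNOSE_KINDS = {
    "Pod", "Deployment", "StatefulSet", "DaemonSet",   # workloads
    "Application", "Kustomization", "HelmRelease",     # GitOps reconcilers
}


def next_tool(kind: str, crosses_dependencies: bool = False) -> str:
    """Pick the follow-up tool for a suspect issue (hypothetical helper).

    `crosses_dependencies` stands in for the caller's judgment that the
    failure likely spans Services/workloads/Pods/dependencies; giving it
    priority over the kind check is an assumption of this sketch.
    """
    if crosses_dependencies:
        return "get_neighborhood"
    if kind in DIAGNOSE_KINDS:
        return "diagnose"
    # All other non-workload kinds fall through to get_resource.
    return "get_resource"
```

For example, `next_tool("HelmRelease")` selects diagnose, while `next_tool("Ingress")` falls through to get_resource.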

Input Schema

| Name | Required | Description | Default |
|---|---|---|---|
| namespace | No | filter to one namespace | |
| severity | No | comma-separated: critical,warning | |
| kind | No | comma-separated kind filter (e.g. Deployment,Pod) | |
| limit | No | max issues returned (default 200, max 1000) | |
| filter | No | optional CEL boolean expression run against each composed Issue; bindings and examples follow | |

filter bindings:

severity: critical|warning
category: e.g. crashloop, image_pull_failed, missing_config_ref, gitops_sync_failed
category_group: startup|runtime|scheduling|configuration|networking|storage|scaling|security|control_plane (runtime here is an issue taxonomy group, not issue_timing)
source: problem = built-in Radar detector; missing_ref = dangling by-name reference; scheduling = pod startup blocker; condition = False controller/CRD condition
kind, group, name, reason, message, cause, action, remediation_kind, remediation_target
ns: the namespace (use 'ns', not 'namespace', which is a CEL reserved word)
count (int): the affected-resource fan-out
grouping_scope: workload|service|node|…
restart_count (int)
last_terminated_reason
operation_retry_count (int): a GitOps controller's sync-operation retries, distinct from restart_count
stuck (bool): issue not expected to self-recover
issue_timing (string): timing evidence, not a root-cause verdict. 'started_at_resource_creation' = evidence places the failing state during resource creation or first reconciliation; 'started_after_resource_was_healthy' = evidence shows a meaningful healthy window before the failing condition appeared; absent = Radar has no clean signal, do NOT infer timing from age alone
issue_timing_basis (string): evidence used, one of 'condition' | 'owner_condition' | 'pod_creation' | 'deletion' | 'phase' | 'spec'
first_seen, last_seen: unix seconds; prefer first_seen for onset/age, since last_seen churns to compose-time

For cross-cluster scoping use clusters= (not a CEL predicate).

filter examples:

severity == "critical" && count > 5
category_group == "startup"
restart_count > 10
remediation_kind == "create-namespace"
stuck && operation_retry_count >= 5
issue_timing == "started_after_resource_was_healthy"
first_seen < timestamp("2026-05-01T00:00:00Z").getSeconds()
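
As a rough sketch of how a caller might assemble arguments for this tool, the helper below builds the argument dictionary from the schema fields above and rejects values the schema rules out. It is illustrative only: `build_issues_args`, the `cel_filter` parameter name, the lower bound of 1 on limit, and passing limit as an integer are assumptions of this example, and the namespace "payments" is hypothetical. The CEL expression is passed through as an opaque string and is not validated here.

```python
from typing import Optional, Sequence

ALLOWED_SEVERITIES = {"critical", "warning"}
MAX_LIMIT = 1000  # upper bound stated in the schema


def build_issues_args(
    namespace: Optional[str] = None,
    severity: Optional[Sequence[str]] = None,
    kind: Optional[Sequence[str]] = None,
    limit: Optional[int] = None,
    cel_filter: Optional[str] = None,
) -> dict:
    """Assemble arguments for the issues tool (hypothetical helper)."""
    args: dict = {}
    if namespace:
        args["namespace"] = namespace
    if severity:
        unknown = set(severity) - ALLOWED_SEVERITIES
        if unknown:
            raise ValueError(f"unknown severity values: {sorted(unknown)}")
        args["severity"] = ",".join(severity)  # schema expects comma-separated
    if kind:
        args["kind"] = ",".join(kind)          # e.g. "Deployment,Pod"
    if limit is not None:
        if not 1 <= limit <= MAX_LIMIT:
            raise ValueError(f"limit must be between 1 and {MAX_LIMIT}")
        args["limit"] = limit
    if cel_filter:
        args["filter"] = cel_filter            # CEL text, not validated here
    return args


# Example: stuck dangling-reference issues in one (hypothetical) namespace.
example = build_issues_args(
    namespace="payments",
    severity=["critical"],
    cel_filter='source == "missing_ref" && stuck',
)
```

Omitting namespace keeps the query cluster-wide, which matches the guidance to leave it out when the root cause may sit outside the app namespace.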


MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/skyhook-io/radar'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.