Server Configuration
Describes the environment variables used to configure the server; all of them are optional.
| Name | Required | Description | Default |
|---|---|---|---|
| LOG_LEVEL | No | Logging verbosity (DEBUG, INFO, WARNING, ERROR). | INFO |
| KUBECONFIG | No | Path to the kubeconfig file. Useful when working with multiple clusters or a non-default kubeconfig location. | ~/.kube/config |
| K8S_NAMESPACE | No | Alternative to KUBERNETES_NAMESPACE for setting the namespace. | |
| PROMETHEUS_URL | No | Prometheus server URL for metrics. Set it to use a custom Prometheus endpoint or a non-standard port; auto-detected if not provided. | |
| PYTHONUNBUFFERED | No | Disable Python output buffering. Setting it to '1' is recommended so that MCP clients see logs in real time. | |
| KUBERNETES_NAMESPACE | No | Namespace for K8s mode. Set when running the server inside a Kubernetes pod to enable HTTP streaming transport. | |
| MCP_SERVER_LOG_LEVEL | No | MCP framework log level for troubleshooting MCP protocol issues. | INFO |
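A minimal sketch of how a server could read these variables and apply the defaults listed in the table. The `ServerConfig` class and `load_config` function are hypothetical, as is the precedence of `KUBERNETES_NAMESPACE` over `K8S_NAMESPACE` when both are set. `PYTHONUNBUFFERED` is omitted because the Python interpreter reads it directly.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class ServerConfig:
    log_level: str
    kubeconfig: str
    namespace: Optional[str]
    prometheus_url: Optional[str]  # None means "auto-detect"
    mcp_server_log_level: str

    @property
    def http_transport(self) -> bool:
        # Assumption: a resolved namespace means "running in a pod", which the
        # table says enables the HTTP streaming transport.
        return self.namespace is not None


def load_config(env: Optional[Mapping[str, str]] = None) -> ServerConfig:
    """Read the documented variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return ServerConfig(
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
        kubeconfig=env.get("KUBECONFIG", os.path.expanduser("~/.kube/config")),
        # Assumption: KUBERNETES_NAMESPACE takes precedence over K8S_NAMESPACE.
        namespace=env.get("KUBERNETES_NAMESPACE") or env.get("K8S_NAMESPACE") or None,
        prometheus_url=env.get("PROMETHEUS_URL") or None,
        mcp_server_log_level=env.get("MCP_SERVER_LOG_LEVEL", "INFO").upper(),
    )


if __name__ == "__main__":
    print(load_config())
```

Passing an explicit mapping instead of `os.environ` keeps the function easy to test without mutating the process environment.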
Capabilities
Features and capabilities supported by this server
| Capability | Details |
|---|---|
| tools | `{"listChanged": false}` |
| prompts | `{"listChanged": false}` |
| resources | `{"subscribe": false, "listChanged": false}` |
| experimental | `{}` |
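For reference, the same capabilities can be written as the Python dictionary a server would place in the `capabilities` field of its MCP `initialize` result. With every `listChanged` flag set to `false`, clients should not expect `notifications/tools/list_changed` (or the prompts and resources equivalents), and `subscribe: false` means `resources/subscribe` is not offered. The helper `expects_list_changed` is hypothetical.

```python
import json

# Mirrors the Capabilities table; keys follow the MCP server capabilities shape.
SERVER_CAPABILITIES = {
    "tools": {"listChanged": False},
    "prompts": {"listChanged": False},
    "resources": {"subscribe": False, "listChanged": False},
    "experimental": {},
}


def expects_list_changed(capabilities: dict, feature: str) -> bool:
    """Return True if a client should listen for notifications/<feature>/list_changed."""
    return bool(capabilities.get(feature, {}).get("listChanged", False))


if __name__ == "__main__":
    # json.dumps renders Python False as JSON false, matching the table.
    print(json.dumps(SERVER_CAPABILITIES))
```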
Tools
Functions exposed to the LLM to take actions
| Name | Description |
|---|---|
| list_namespaces | List all namespaces in the Kubernetes cluster.<br>Returns:<br>List[str]: Alphabetically sorted namespace names. Empty list if access denied or cluster unreachable. |
| list_pipelineruns | List Tekton PipelineRuns in a namespace with status and timing details.<br>Args:<br>namespace: Kubernetes namespace to query.<br>Returns:<br>List[Dict]: PipelineRuns with keys: name, pipeline, status, started_at, completed_at, duration. Empty list if none found. [{"error": "msg"}] on failure. |
| list_taskruns | List Tekton TaskRuns in a namespace, optionally filtered by a specific PipelineRun.<br>Args:<br>namespace: Kubernetes namespace to query.<br>pipeline_run: Optional PipelineRun name to filter by.<br>Returns:<br>List[Dict]: TaskRuns with keys: name, task, pipeline_run, status, started_at, completed_at, duration. |
| list_pods_in_namespace | List all pods in a Kubernetes namespace with status and placement info.<br>Args:<br>namespace: Kubernetes namespace to query.<br>Returns:<br>List[Dict]: Pods with keys: name, status, ip, node_name, creation_timestamp, restart_count, container_states (list of waiting/terminated reasons). |
| get_kubernetes_resource | Retrieve details about a Kubernetes/Tekton resource.<br>Args:<br>resource_type: Resource type. Supported: pod, service, configmap, secret, pvc, namespace, node, serviceaccount, endpoints, event, persistentvolume, resourcequota, limitrange, deployment, replicaset, daemonset, statefulset, job, cronjob, ingress, storageclass, hpa (horizontalpodautoscaler), pipelinerun, taskrun, pipeline, task, clustertask, triggertemplate, triggerbinding, eventlistener, podmonitor, servicemonitor, prometheusrule, alertmanager.<br>name: Resource name.<br>namespace: Namespace (default: "default").<br>output_format: "summary", "detailed", or "yaml" (default: "summary").<br>Returns:<br>str: Formatted resource information. |
| get_pipelinerun_logs | Fetch logs from all pods in a Tekton PipelineRun with adaptive volume management. Prioritizes failed pods and manages token budgets automatically when no time or line filters are specified.<br>Args:<br>pipelinerun_name: PipelineRun name.<br>namespace: Kubernetes namespace.<br>clean_logs: Clean and format logs (default: True).<br>tail_lines: Lines from end (optional).<br>since_seconds: Logs newer than N seconds (optional).<br>since_time: Logs newer than RFC3339 timestamp (optional).<br>timestamps: Include timestamps (default: True).<br>previous: Get logs from previous container instance (default: False).<br>max_token_budget: Maximum tokens for output (default: 120000). Applies to both adaptive and manual modes.<br>Returns:<br>Dict[str, Any]: Pod names as keys, logs as values. Includes "_metadata" with processing info. |
| check_resource_constraints | Check for resource constraints in a namespace that may impact pipelines. Identifies: pending/unschedulable pods, OOMKilled containers, CrashLoopBackOff, ImagePullBackOff, high restart counts, and resource quota utilization.<br>Args:<br>namespace: Kubernetes namespace to inspect.<br>Returns:<br>Dict[str, Any]: Keys: status (Healthy/Warning/Critical/Error), summary, resource_quotas, pending_pods_due_to_resources, oom_killed_containers, container_issues, high_utilization_quotas, recommendations. |
| detect_anomalies | Detect anomalies in Tekton PipelineRuns/TaskRuns using z-score statistical analysis. Identifies unusually long execution times (threshold: 2.5 standard deviations from mean).<br>Args:<br>namespace: Kubernetes namespace to analyze.<br>limit: Max recent PipelineRuns to analyze (default: 50).<br>Returns:<br>Dict: Keys: pipeline_anomalies, task_anomalies (lists with anomaly details). |
| smart_get_namespace_events | Adaptive event analysis for a namespace with automatic volume management. When no constraints are specified, it automatically estimates volume, applies smart time windows, prioritizes errors and warnings, and samples within token limits.<br>Args:<br>namespace: Kubernetes namespace to analyze.<br>last_n_events: Exact event count (set only if the user specifies one).<br>time_period: Exact time window (set only if the user specifies one).<br>strategy: "auto" for adaptive behavior (default).<br>focus_areas: Areas to emphasize (default: ["errors", "warnings", "failures"]).<br>max_context_tokens: Max output tokens (default: 8000).<br>include_summary: Include summary and insights (default: True).<br>severity_filter: Filter by severity levels.<br>resource_filter: Filter by resource type.<br>Returns:<br>Dict: Events with adaptive filtering, insights, and recommendations. |
| analyze_logs | Analyze log text to extract error patterns and insights.<br>Args:<br>log_text: Log content string (single entry, multiple lines, or full log file).<br>Returns:<br>Dict[str, Any]: Keys: error_count, error_patterns, categorized_errors, summary. |
| analyze_failed_pipeline | Perform root cause analysis on a failed Tekton PipelineRun. Fetches pipeline/task details, analyzes logs for errors, and provides remediation recommendations.<br>Args:<br>namespace: Kubernetes namespace of the PipelineRun.<br>pipeline_run: Name of the failed PipelineRun.<br>Returns:<br>Dict[str, Any]: Keys: pipeline_name, pipeline_status, overall_message, failed_task_count, failed_tasks, probable_root_cause, recommended_actions. |
| list_recent_pipeline_runs | List recent Tekton PipelineRuns across all accessible namespaces, sorted by start time.<br>Args:<br>limit: Max PipelineRuns to retrieve (default: 10).<br>Returns:<br>Dict[str, List[Dict]]: Namespace to PipelineRun list. Each run has: namespace, name, start_time, status, pipeline, labels. |
| find_pipeline | Find Tekton pipelines matching a pattern across all accessible namespaces. Searches PipelineRuns/TaskRuns by name, labels, or annotations using cluster-wide queries.<br>Args:<br>pipeline_id_pattern: Pattern to match (partial name, label value, or substring).<br>include_taskruns: Include TaskRuns in search results (default: False for performance).<br>max_results: Maximum matching results to return per resource type (default: 100).<br>namespaces: Optional list of namespaces to search (default: all namespaces).<br>pipeline_runs_limit: Max PipelineRuns to fetch from API (default: 1000).<br>task_runs_limit: Max TaskRuns to fetch from API if include_taskruns=True (default: 500).<br>Returns:<br>Dict[str, Any]: Keys: pipeline_runs, task_runs, pipelines_as_code, all_namespaces_checked, diagnostic_info, substring_matches. |
| get_tekton_pipeline_runs_status | Get cluster-wide status summary of all Tekton PipelineRuns and TaskRuns. Shows running/succeeded/failed counts, recent failures, and long-running pipelines (>1 hour).<br>Args:<br>pipeline_runs_limit: Max PipelineRuns to fetch cluster-wide (default: 500).<br>task_runs_limit_per_namespace: Max TaskRuns to fetch per namespace (default: 100).<br>max_namespaces: Max namespaces to scan for TaskRuns (default: 20).<br>recent_failures_limit: Max recent failures to include in output (default: 10).<br>long_running_limit: Max long-running pipelines to include (default: 5).<br>Returns:<br>Dict[str, Any]: Keys: timestamp, sampling_info, pipeline_runs (total, by_status, recent_failures [top N], failures_by_namespace, long_running [top N]), task_runs (total, by_status, recent_failures [top N], failures_by_namespace), insights. |
| detect_log_anomalies | Detect anomalies in log data using error frequency, pattern repetition, and timestamp analysis.<br>Args:<br>logs: Raw log content (newline-separated entries).<br>baseline_patterns: Optional expected error patterns for comparison.<br>severity_threshold: "low" (most sensitive), "medium", or "high" (least sensitive).<br>Returns:<br>Dict[str, Any]: Keys: anomaly_detected (bool), anomaly_details, analysis_summary. |
| search_resources_by_labels | Search Kubernetes resources by labels across multiple resource types and namespaces.<br>Args:<br>resource_types: Types to search (e.g., ["pods", "services", "deployments"]).<br>label_selectors: List of criteria, each {"key": str, "value": str, "operator": str}, where operator is one of "equals", "exists", "not_equals", "in", or "not_in".<br>namespaces: Namespaces to search (default: all).<br>field_selectors: Additional field selectors.<br>limit_per_type: Max results per type (default: 100).<br>include_metadata_only: Return only metadata (default: False).<br>include_status: Include status info (default: True).<br>sort_by: "name", "namespace", "creation_time", or "labels" (default: "creation_time").<br>sort_order: "asc" or "desc" (default: "desc").<br>Returns:<br>Dict: Search results with resource details, analysis, and recommendations. |
| prometheus_query | Execute PromQL queries against Prometheus for cluster metrics. Supports instant and range queries with automatic endpoint discovery and authentication.<br>Args:<br>query: PromQL query string.<br>query_type: "instant" or "range" (default: "instant").<br>start_time: Start for range queries (ISO 8601 or Unix timestamp).<br>end_time: End for range queries (ISO 8601 or Unix timestamp).<br>step: Step interval for range queries (default: "300s").<br>cluster: Cluster domain override.<br>format: "json", "table", or "csv" (default: "json").<br>namespace_filter: Regex to filter by namespace.<br>limit: Max results to return.<br>timeout: Query timeout in seconds (default: 30).<br>Returns:<br>Dict: Query results, metadata, execution info, and analysis. |
| smart_summarize_pod_logs | Adaptive pod log analysis with automatic volume management and multi-pass processing. When no time constraints are specified, it automatically estimates volume and selects optimal time windows.<br>Args:<br>namespace: Kubernetes namespace.<br>pod_name: Pod name to analyze.<br>container_name: Specific container (if multiple).<br>summary_level: "brief", "detailed", or "comprehensive" (default: "detailed").<br>focus_areas: Analysis focus (default: ["errors", "warnings", "performance"]).<br>time_segments: Time-based segments to analyze (default: 5).<br>max_context_tokens: Max tokens for analysis (default: 10000).<br>since_seconds: Set only if the user specifies an exact number of seconds.<br>tail_lines: Set only if the user specifies an exact line count.<br>time_period: Set only if the user specifies a period (e.g., "1h", "30m").<br>start_time: Set only if the user specifies an exact start time.<br>end_time: Set only if the user specifies an exact end time.<br>Returns:<br>Dict[str, Any]: Log analysis with insights, patterns, and recommendations. |
| investigate_tls_certificate_issues | Investigate TLS/certificate issues across the cluster with targeted search and analysis. Searches system namespaces for TLS error patterns and correlates with certificate events.<br>Args:<br>search_pattern: TLS error pattern (default: "tls: bad certificate").<br>time_range: Search time range (default: "24h").<br>max_namespaces: Max namespaces to search (default: 20).<br>focus_on_system_namespaces: Prioritize system namespaces (default: True).<br>Returns:<br>Dict: TLS issues, affected pods, certificate problems, and remediation suggestions. |
| conservative_namespace_overview | Conservative namespace analysis optimized for large namespaces with strict token limits. Smart-samples critical pods (failed, high-restart, error states) for rapid issue detection.<br>Args:<br>namespace: Kubernetes namespace to analyze.<br>max_pods: Maximum pods to analyze (default: 10).<br>focus_areas: Areas to focus on (default: ["errors", "warnings"]).<br>sample_strategy: "smart" for intelligent sampling, "recent" for newest pods.<br>Returns:<br>Dict: Analysis results with pod health, issues detected, and recommendations. |
| adaptive_namespace_investigation | Adaptive namespace investigation with progressive analysis and token budget management. Best for medium namespaces (5-30 pods). Prioritizes failed/error pods and correlates events.<br>Args:<br>namespace: Kubernetes namespace to investigate.<br>investigation_query: What to investigate (default: "investigate all logs and events for potential issues").<br>max_pods: Maximum pods to analyze (default: 20).<br>focus_areas: Areas to focus on (default: ["errors", "warnings", "performance"]).<br>token_budget: Max tokens for investigation (default: 200000).<br>Returns:<br>Dict: Pod analysis, event correlation, findings, and recommendations. |
| get_etcd_logs | Retrieve etcd pod logs from Kubernetes/OpenShift with flexible time and line filtering. Auto-detects cluster type and uses appropriate namespace/label selectors.<br>Args:<br>tail_lines: Lines from end of logs (default: 200, None for all).<br>since_seconds: Logs newer than N seconds (overrides tail_lines).<br>since_time: Logs newer than RFC3339 timestamp (overrides since_seconds).<br>until_time: Logs older than RFC3339 timestamp (requires since_time or since_seconds).<br>follow: Stream logs in real-time (default: False).<br>timestamps: Include timestamps (default: True).<br>previous: Get logs from previous container instance (default: False).<br>clean_logs: Clean/format logs (default: True).<br>Returns:<br>Dict[str, str]: Pod names as keys, logs as values. |
| stream_analyze_pod_logs | Stream and analyze pod logs in chunks with progressive pattern detection. Processes logs in manageable chunks for memory efficiency and real-time insights.<br>Args:<br>namespace: Kubernetes namespace.<br>pod_name: Pod name to stream logs from.<br>container_name: Specific container (if multiple).<br>chunk_size: Lines per chunk (default: 5000).<br>analysis_mode: "errors_only", "errors_and_warnings" (default), "full_analysis", or "custom_patterns".<br>time_window: Time window for historical logs (e.g., "1h", "6h", "24h").<br>follow: Stream logs in real-time (default: False).<br>max_chunks: Max chunks to process (default: 50).<br>since_seconds: Logs from last N seconds.<br>tail_lines: Limit to last N lines.<br>time_period: Time period (e.g., "1h", "30m").<br>start_time: Start time (ISO format).<br>end_time: End time (ISO format).<br>max_context_tokens: Maximum tokens for output (default: 50000).<br>Returns:<br>Dict[str, Any]: Keys: chunks, overall_summary, trending_patterns, recommendations, metadata. |
| analyze_pod_logs_hybrid | Hybrid log analyzer with intelligent strategy selection and caching. Automatically selects best analysis approach based on context and urgency.<br>Args:<br>namespace: Kubernetes namespace.<br>pod_name: Pod name to analyze.<br>container_name: Specific container (if multiple).<br>strategy: "auto" (default), "smart_summary", "streaming", or "hybrid".<br>request_type: "investigation", "troubleshooting", or "monitoring".<br>urgency: "low", "medium" (default), "high", or "critical".<br>use_cache: Use intelligent caching (default: True).<br>custom_params: Custom parameters for strategies.<br>Returns:<br>Dict[str, Any]: Keys: strategy_used, analysis_results, supplementary_insights, performance_metrics, recommendations, cache_info. |
| progressive_event_analysis | Progressive event analysis with multiple detail levels and correlation detection.<br>Args:<br>namespace: Kubernetes namespace to analyze.<br>analysis_level: "overview", "detailed", "correlation", or "deep_dive" (default: "overview").<br>time_period: Time window (e.g., "2h", "4h", "1d").<br>event_filters: Filters like {"severity": ["CRITICAL"], "category": ["FAILURE"]}.<br>seed_event_id: Event ID for correlation analysis.<br>focus_areas: Areas to emphasize (default: ["errors", "warnings", "failures"]).<br>Returns:<br>Dict: Analysis results based on selected level. |
| advanced_event_analytics | Advanced ML-powered event analytics with log/metrics integration and runbook suggestions.<br>Args:<br>namespace: Kubernetes namespace to analyze.<br>time_period: Time window (e.g., "4h", "1d", "12h").<br>include_ml_patterns: Enable ML pattern detection (default: True).<br>include_log_correlation: Correlate with log data (default: True).<br>include_metrics_correlation: Correlate with metrics (default: True).<br>include_runbook_suggestions: Generate runbook suggestions (default: True).<br>analysis_depth: "basic", "comprehensive" (default), or "deep".<br>Returns:<br>Dict: Advanced analytics with ML insights, correlations, and runbook suggestions. |
| automated_triage_rca_report_generator | Generate an automated Root Cause Analysis (RCA) report for pipeline/pod failures. Performs log analysis, resource checks, and event correlation, and provides remediation suggestions.<br>Args:<br>failure_identifier: Pipeline run name, pod name, or failure event ID.<br>namespace: Optional namespace where the failure occurred. If not provided, searches across detected CI/CD namespaces.<br>investigation_depth: "quick", "standard" (default), or "deep".<br>include_related_failures: Analyze related recent failures (default: True).<br>time_window: Time window for related events (default: "2h").<br>generate_timeline: Generate event timeline (default: True).<br>include_remediation: Include remediation steps (default: True).<br>Returns:<br>Dict: RCA report with summary, timeline, root cause, diagnostics, and remediation. |
| check_cluster_certificate_health | Scan for expiring certificates across the cluster to prevent service disruptions. Scans TLS secrets and system certificates and provides renewal recommendations.<br>Args:<br>warning_threshold_days: Days before expiration for warning (default: 30).<br>critical_threshold_days: Days before expiration for critical alert (default: 7).<br>include_system_certs: Include system certificates (default: True).<br>include_user_certs: Include user certificates (default: True).<br>namespaces: Namespaces to scan (default: all accessible).<br>certificate_types: Types to check: "tls", "ca", "client", "server" (default: all).<br>Returns:<br>Dict: Certificate health with expiration timeline, recommendations, and security findings. |
| ci_cd_performance_baselining_tool | Establish performance baselines for pipelines and flag runs deviating from historical norms. Uses Prometheus metrics from Tekton controller for accurate historical performance data. Falls back to Kubernetes API if Prometheus is unavailable.<br>Args:<br>pipeline_names: Pipelines to analyze (default: all).<br>baseline_period: "7d", "30d" (default), or "90d".<br>deviation_threshold: Std deviations to trigger alerts (default: 2.0).<br>performance_metrics: Metrics: "duration", "cpu", "memory", "success_rate" (default: all).<br>update_frequency: "daily" (default) or "weekly".<br>include_task_level: Include task-level analysis (default: True).<br>Returns:<br>Dict: Baselines, recent runs analysis, trends, and optimization opportunities. |
| pipeline_tracer | Trace a logical operation (commit, PR, image) as it flows through pipelines. Correlates pipeline runs using labels, annotations, and artifact references.<br>Args:<br>trace_identifier: Commit SHA, PR number, image tag, or custom trace ID.<br>trace_type: "commit", "pr", "image", or "custom".<br>start_time: ISO 8601 start timestamp.<br>end_time: ISO 8601 end timestamp.<br>include_artifacts: Include artifact details (default: True).<br>trace_depth: "shallow" or "deep" (default: "deep").<br>namespaces: Specific namespaces to search (skips auto-detection).<br>max_namespaces: Maximum namespaces to search when auto-detecting (default: 50).<br>Returns:<br>Dict: Pipeline flow, artifacts, bottlenecks, and summary. |
| get_machine_config_pool_status | Monitor OpenShift Machine Config Pools for node configuration and update rollouts. Analyzes pool status, update progress, and configuration drift.<br>Args:<br>pool_names: Pools to monitor (default: all).<br>include_node_details: Include node status per pool (default: True).<br>show_config_diff: Show config differences during updates (default: False).<br>include_update_history: Include update history (default: True).<br>filter_updating: Only show updating pools (default: False).<br>Returns:<br>Dict: Keys: pools_overview, machine_config_pools, recent_config_changes, issues, update_recommendations. |
| get_openshift_cluster_operator_status | Check health and status of OpenShift cluster operators for platform functionality. Analyzes operator conditions, versions, and dependencies.<br>Args:<br>operator_names: Operators to check (default: all).<br>include_conditions: Include condition details (default: True).<br>show_version_info: Include version info (default: True).<br>filter_degraded: Only show operators with issues (default: False).<br>include_dependencies: Show operator dependencies (default: False).<br>Returns:<br>Dict: Keys: cluster_info, operator_status, health_summary, critical_issues, dependencies. |
| live_system_topology_mapper | Generate real-time dependency graph of Kubernetes/Tekton components and their interconnections. Maps Services, Deployments, Pipelines, PVCs, and their relationships via ownerReferences and selectors.<br>Args:<br>cluster_names: Clusters to map (default: all).<br>component_types: Filter by types (services, deployments, pipelines, pvcs, etc.). Note: secrets are NOT included by default.<br>namespace_filter: Regex pattern to filter namespaces.<br>depth_limit: Max dependency depth (default: 5).<br>include_metrics: Include resource metrics (default: False).<br>output_format: "json" (default), "graphviz", or "mermaid".<br>skip_on_permission_denied: Continue mapping other resources if permission denied (default: True).<br>Returns:<br>Dict: Topology graph with nodes, edges, summary, metadata, and permission report. |
| predictive_log_analyzer | Predict failures before critical outages occur, using ML analysis of historical log patterns. Uses anomaly detection algorithms to correlate log patterns with failure events.<br>Args:<br>prediction_window: Time window: "1h", "6h", "24h", or "7d" (default: "6h").<br>confidence_threshold: Minimum confidence for predictions, from 0.0 to 1.0 (default: 0.75).<br>log_sources: Sources to analyze: pods, services, nodes (default: all).<br>failure_types: Types to predict: pod_crash, resource_exhaustion, network_issues.<br>historical_data_range: Historical data period (default: "30d").<br>model_refresh_interval: Model retrain frequency (default: "24h").<br>namespaces: Specific namespaces to analyze (default: auto-detect active namespaces).<br>max_namespaces: Maximum namespaces to scan when auto-detecting (default: 20).<br>Returns:<br>Dict: Keys: predictions, model_performance, anomaly_scores, trend_analysis. |
| resource_bottleneck_forecaster | Forecast resource bottlenecks by analyzing utilization trends and predicting exhaustion points. Uses time-series analysis to predict CPU, memory, disk, and network capacity constraints.<br>Args:<br>forecast_horizon: Forecast window: "1h", "6h", "24h", "7d", or "30d" (default: "24h").<br>resource_types: Resources to analyze: cpu, memory, disk, network, pvc (default: all).<br>clusters: Specific clusters to analyze (default: all).<br>namespaces: Specific namespaces to focus on.<br>confidence_level: Statistical confidence, from 0.80 to 0.99 (default: 0.95).<br>trend_analysis_period: Historical period for trends (default: "7d").<br>alerting_threshold: Utilization level that triggers an alert, as a fraction (default: 0.80, i.e. 80%).<br>Returns:<br>Dict: Keys: forecasts, capacity_recommendations, cluster_overview, historical_accuracy. |
| semantic_log_search | Search logs using natural language queries with semantic understanding beyond keyword matching. Uses NLP for query interpretation, Kubernetes/Tekton entity recognition, and relevance ranking.<br>Args:<br>query: Natural language query describing what to search for.<br>time_range: Time range: "1h", "6h", "24h", or "7d" (default: "1h").<br>namespaces: Specific namespaces to search (default: auto-detect relevant namespaces).<br>severity_levels: Log severity levels to include.<br>max_results: Maximum results to return (default: 100).<br>context_lines: Surrounding lines per match (default: 3).<br>group_similar: Group similar log entries (default: True).<br>Returns:<br>Dict: Keys: query_interpretation, search_results, result_summary, suggestions. |
| what_if_scenario_simulator | Simulate the impact of configuration changes, with risk assessment, before applying them to the live system. Uses Monte Carlo simulation and load modeling based on historical data.<br>Args:<br>scenario_type: "resource_limits", "scaling", "configuration", or "deployment".<br>changes: Changes to simulate, with before/after values.<br>scope: Simulation scope: clusters, namespaces, components.<br>simulation_duration: "1h", "24h", or "7d" (default: "24h").<br>load_profile: Expected load: "current", "peak", or "custom" (default: "current").<br>risk_tolerance: "conservative", "moderate", or "aggressive" (default: "moderate").<br>Returns:<br>Dict: Keys: simulation_id, impact_analysis, risk_assessment, affected_components, recommendations. |
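Each tool in the Tools table is invoked through MCP's `tools/call` method. The sketch below builds such a JSON-RPC 2.0 request; the helper name `make_tool_call`, the request id, and the `"ci"` namespace are illustrative, and in practice an MCP client library would normally handle transport and framing.

```python
import json


def make_tool_call(request_id: int, name: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(message)


if __name__ == "__main__":
    # Ask list_pipelineruns for the runs in a hypothetical "ci" namespace.
    print(make_tool_call(1, "list_pipelineruns", {"namespace": "ci"}))
```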
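`list_pipelineruns` reports `name`, `pipeline`, `status`, `started_at`, `completed_at`, and `duration` for each run. One plausible way to derive those fields from a raw Tekton PipelineRun object is sketched below, reading `status.startTime`, `status.completionTime`, and the `Succeeded` condition. Mapping `Unknown` to "Running" and reporting `duration` in seconds are assumptions, not documented behavior of this server; `summarize_pipelinerun` is a hypothetical name.

```python
from datetime import datetime
from typing import Any, Dict, Optional


def _parse_ts(ts: Optional[str]) -> Optional[datetime]:
    # Tekton timestamps are RFC 3339 UTC, e.g. "2024-05-01T10:00:00Z";
    # replacing "Z" keeps fromisoformat working on Python < 3.11.
    return datetime.fromisoformat(ts.replace("Z", "+00:00")) if ts else None


def summarize_pipelinerun(run: Dict[str, Any]) -> Dict[str, Any]:
    """Map a raw PipelineRun dict to the keys listed for list_pipelineruns."""
    status = run.get("status", {})
    start = _parse_ts(status.get("startTime"))
    end = _parse_ts(status.get("completionTime"))
    succeeded = next(
        (c for c in status.get("conditions", []) if c.get("type") == "Succeeded"), {}
    )
    # Assumption: "Unknown" (or no condition yet) is reported as "Running".
    state = {"True": "Succeeded", "False": "Failed"}.get(succeeded.get("status"), "Running")
    return {
        "name": run["metadata"]["name"],
        "pipeline": run.get("spec", {}).get("pipelineRef", {}).get("name"),
        "status": state,
        "started_at": status.get("startTime"),
        "completed_at": status.get("completionTime"),
        # Duration in seconds, or None while the run has not finished.
        "duration": (end - start).total_seconds() if start and end else None,
    }
```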
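`detect_anomalies` flags runs whose execution time is more than 2.5 standard deviations from the mean. A minimal version of that z-score test is sketched below. Making it one-sided (only unusually long runs) and using the population standard deviation are assumptions; `find_duration_anomalies` is a hypothetical name.

```python
from statistics import mean, pstdev
from typing import Dict, List


def find_duration_anomalies(durations: Dict[str, float], threshold: float = 2.5) -> List[str]:
    """Return the runs whose duration z-score exceeds `threshold`.

    `durations` maps a run name to its execution time in seconds.
    """
    if len(durations) < 2:
        return []  # a single run has no spread to compare against
    values = list(durations.values())
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    if sigma == 0:
        return []  # all runs took the same time, so nothing stands out
    # One-sided test: only unusually long runs are flagged.
    return [name for name, d in durations.items() if (d - mu) / sigma > threshold]
```

One consequence worth knowing: by Samuelson's inequality no value can sit more than sqrt(n - 1) population standard deviations from the mean, so with fewer than 8 runs a 2.5 threshold can never trigger. The same reasoning applies to the 2.0 default of `ci_cd_performance_baselining_tool`, which needs at least 6 runs.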
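The `label_selectors` criteria of `search_resources_by_labels` map naturally onto Kubernetes label-selector syntax (`key=value`, `key!=value`, `key`, `key in (...)`, `key notin (...)`). The sketch below performs that translation; `to_label_selector` is hypothetical, and accepting either a list or a comma-separated string for `in`/`not_in` is an assumption, since the table only documents `"value": str`.

```python
from typing import Dict, List, Union


def _values(value: Union[str, List[str]]) -> List[str]:
    # Assumption: "in"/"not_in" take a list or a comma-separated string.
    if isinstance(value, str):
        return [v.strip() for v in value.split(",") if v.strip()]
    return list(value)


def to_label_selector(criteria: List[Dict]) -> str:
    """Translate criteria into a Kubernetes label-selector string (terms are ANDed)."""
    terms = []
    for c in criteria:
        key, op = c["key"], c.get("operator", "equals")
        if op == "equals":
            terms.append(f"{key}={c['value']}")
        elif op == "not_equals":
            terms.append(f"{key}!={c['value']}")
        elif op == "exists":
            terms.append(key)
        elif op == "in":
            terms.append(f"{key} in ({','.join(_values(c['value']))})")
        elif op == "not_in":
            terms.append(f"{key} notin ({','.join(_values(c['value']))})")
        else:
            raise ValueError(f"unsupported operator: {op}")
    return ",".join(terms)
```

The resulting string is the format accepted by the `label_selector` argument of list calls in the official Kubernetes Python client, such as `CoreV1Api.list_pod_for_all_namespaces`. Note that in Kubernetes semantics `!=` and `notin` also match resources that lack the key entirely.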
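Many tools accept durations as short strings such as `"30m"`, `"6h"`, `"1d"`, or `"7d"` (`time_period`, `time_range`, `prediction_window`), while others take `since_seconds` or a Prometheus `step` such as `"300s"`. A hypothetical helper that converts the simple single-unit form to seconds is sketched below. Compound values like `"1h30m"` are deliberately rejected, since the table does not say whether the server accepts them.

```python
import re

# Seconds per supported unit suffix.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_period(period: str) -> int:
    """Convert a single-unit period such as "30m", "6h", or "7d" to seconds."""
    match = re.fullmatch(r"\s*(\d+)\s*([smhd])\s*", period)
    if not match:
        raise ValueError(f"unrecognized period: {period!r}")
    amount, unit = match.groups()
    return int(amount) * _UNIT_SECONDS[unit]
```

For example, a user asking for "the last 30m" could be translated to `since_seconds=parse_period("30m")`, which is 1800.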
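`check_cluster_certificate_health` grades certificates against a 30-day warning threshold and a 7-day critical threshold by default. The bucketing can be sketched as follows. The bucket names are hypothetical, and in practice `not_after` would come from parsing the certificate itself (for example the `tls.crt` key of a `kubernetes.io/tls` secret); that step is omitted to keep the sketch dependency-free.

```python
from datetime import datetime, timedelta, timezone


def classify_expiry(
    not_after: datetime,
    now: datetime,
    warning_threshold_days: int = 30,
    critical_threshold_days: int = 7,
) -> str:
    """Bucket a certificate by the time left before its notAfter date."""
    remaining = not_after - now
    if remaining <= timedelta(0):
        return "expired"
    if remaining <= timedelta(days=critical_threshold_days):
        return "critical"
    if remaining <= timedelta(days=warning_threshold_days):
        return "warning"
    return "ok"


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(classify_expiry(now + timedelta(days=10), now))  # prints warning
```

Checking the critical threshold before the warning threshold matters: every certificate inside the 7-day window is also inside the 30-day window.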
Prompts
Interactive templates invoked by user choice
| Name | Description |
|---|---|
No prompts | |
Resources
Contextual data attached and managed by the client
| Name | Description |
|---|---|
No resources | |