# OpenTelemetry Observability
The kubernetes-mcp-server supports distributed tracing and metrics via OpenTelemetry (OTEL). Observability is **optional** and disabled by default.
## What Gets Traced
The server automatically traces all operations through middleware without requiring any code changes to individual tools:
1. **MCP Tool Calls** - Every tool invocation, with details:
- Tool name
- Success/failure status
- Duration
- Error details (when applicable)
2. **HTTP Requests** - All HTTP endpoints when running in HTTP mode:
- Request method and path
- Response status
- Client information
- Duration
**Note**: When running in STDIO mode, only MCP tool calls are traced since there is no HTTP server.
## Metrics
The server collects and exposes metrics through two mechanisms:
1. **Stats Endpoint** (`/stats`) - JSON endpoint for real-time statistics:
- Tool call counts by name
- Tool call errors
- HTTP request counts by method/path/status
- Server uptime
2. **OTLP Export** - When an endpoint is configured, metrics are also exported to your OTLP backend every 30 seconds.
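The 30-second cadence is not separately documented; if the server wires up the SDK's standard periodic metric reader, the spec-defined `OTEL_METRIC_EXPORT_INTERVAL` variable may let you change it. A minimal sketch, assuming those SDK defaults are in play:
```bash
# Hypothetical override: assumes the SDK's periodic metric reader is used
# and honors the spec-defined variable (value in milliseconds)
export OTEL_METRIC_EXPORT_INTERVAL=60000   # export every 60s instead of 30s
```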
## Quick Start
### 1. Run an OTLP Backend Locally
**Option A: Jaeger (traces only)**
```bash
docker run -d --name jaeger \
-e COLLECTOR_OTLP_ENABLED=true \
-p 16686:16686 \
-p 4317:4317 \
-p 4318:4318 \
docker.io/jaegertracing/all-in-one:latest
```
Access the Jaeger UI at http://localhost:16686
> **Note**: Jaeger only supports traces, not metrics. To disable metrics export and avoid warnings about `MetricsService` being unimplemented, set `OTEL_METRICS_EXPORTER=none`.
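For example:
```bash
# Jaeger exposes no OTLP metrics service, so turn metrics export off
export OTEL_METRICS_EXPORTER=none
```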
**Option B: Grafana LGTM Stack (traces + metrics + logs)**
For full observability with metrics support:
```bash
docker run -d --name lgtm \
-p 3000:3000 \
-p 4317:4317 \
-p 4318:4318 \
docker.io/grafana/otel-lgtm:latest
```
Access Grafana at http://localhost:3000 (default credentials: admin/admin)
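To confirm the stack is up before sending data, Grafana's health endpoint is a quick check:
```bash
# Returns a small JSON status document once Grafana is ready
curl -s http://localhost:3000/api/health
```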
### 2. Enable Tracing
```bash
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
# Run the server
npx -y kubernetes-mcp-server@latest
```
### 3. View Traces
Make some tool calls through your MCP client, then view traces in the Jaeger UI.
### Example Trace
When you call `resources_get` for a Pod, you'll see a trace like this in Jaeger:
```
Trace ID: abc123def456789
Duration: 145ms
└─ tools/call resources_get [145ms]
├─ mcp.method.name: tools/call
├─ gen_ai.tool.name: resources_get
├─ gen_ai.operation.name: execute_tool
├─ rpc.jsonrpc.version: 2.0
├─ network.transport: pipe
└─ Status: OK
```
If the tool call triggers an HTTP request (in HTTP mode), you'll also see:
```
Trace ID: abc123def456789
Duration: 150ms
├─ POST /message [150ms]
│  ├─ http.request.method: POST
│  ├─ url.path: /message
│  ├─ http.response.status_code: 200
│  ├─ client.address: 192.168.1.100
│  │
│  └─ tools/call resources_get [145ms]
│     ├─ mcp.method.name: tools/call
│     ├─ gen_ai.tool.name: resources_get
│     ├─ gen_ai.operation.name: execute_tool
│     ├─ rpc.jsonrpc.version: 2.0
│     ├─ network.transport: tcp
│     └─ Status: OK
```
## Configuration
OpenTelemetry can be configured via **TOML config file** or **environment variables**. Environment variables take precedence over TOML config values.
**Note**: Telemetry is automatically enabled when an endpoint is configured. Use `enabled = false` in TOML to explicitly disable it.
### Configuration Reference
| TOML Field | Environment Variable | Description |
|------------|---------------------|-------------|
| `enabled` | - | Explicit enable/disable (overrides all) |
| `endpoint` | `OTEL_EXPORTER_OTLP_ENDPOINT` | OTLP endpoint URL |
| `protocol` | `OTEL_EXPORTER_OTLP_PROTOCOL` | Protocol: `grpc` or `http/protobuf` |
| `traces_sampler` | `OTEL_TRACES_SAMPLER` | Sampling strategy |
| `traces_sampler_arg` | `OTEL_TRACES_SAMPLER_ARG` | Sampling ratio (0.0-1.0) |
### TOML Configuration
Add a `[telemetry]` section to your config file:
```toml
[telemetry]
# Optional: explicitly enable/disable (omit to auto-enable when endpoint is set)
enabled = true
endpoint = "http://localhost:4317"
# Protocol: "grpc" (default) or "http/protobuf"
protocol = "grpc"
# Trace sampling strategy
# Options: "always_on", "always_off", "traceidratio", "parentbased_always_on", "parentbased_always_off", "parentbased_traceidratio"
traces_sampler = "traceidratio"
# Sampling ratio for ratio-based samplers (0.0 to 1.0)
traces_sampler_arg = 0.1
```
#### TOML Examples
**Enable with endpoint:**
```toml
[telemetry]
endpoint = "http://localhost:4317"
```
**Production with sampling:**
```toml
[telemetry]
endpoint = "http://tempo-distributor:4317"
traces_sampler = "traceidratio"
traces_sampler_arg = 0.05 # 5% sampling
```
**Explicitly disable:**
```toml
[telemetry]
enabled = false
```
### Environment Variables
Environment variables take precedence over TOML config. This allows you to override config file settings at runtime.
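For example, assuming a TOML file that sets `endpoint = "http://localhost:4317"`, an exported variable wins for that run (the collector hostname below is illustrative):
```bash
# Overrides the endpoint from the TOML config for this process only
export OTEL_EXPORTER_OTLP_ENDPOINT=http://collector.observability.svc:4317
```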
#### Endpoint
```bash
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
```
**Note**: The server gracefully handles failures. If the endpoint is unreachable, the server logs a warning and continues without tracing.
#### Optional Variables
```bash
# Service name (defaults to "kubernetes-mcp-server")
export OTEL_SERVICE_NAME=kubernetes-mcp-server
# Service version (auto-detected from binary, rarely needs manual override)
export OTEL_SERVICE_VERSION=1.0.0
# Additional resource attributes (useful for multi-environment deployments)
export OTEL_RESOURCE_ATTRIBUTES="deployment.environment=production,team=platform"
```
#### Endpoint Protocols
The server supports both gRPC and HTTP/protobuf protocols:
```bash
# gRPC (default, port 4317)
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
# HTTP/protobuf (port 4318)
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
export OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf
# Secure endpoints (HTTPS/gRPC with TLS)
export OTEL_EXPORTER_OTLP_ENDPOINT=https://otlp-secure.example.com:4317
# Custom CA certificate (for self-signed certificates)
export OTEL_EXPORTER_OTLP_CERTIFICATE=/path/to/ca.crt
```
#### Sampling Configuration
By default, the server uses **`ParentBased(AlwaysSample)`** sampling:
- **Root spans** (no parent): Always sampled (100%)
- **Child spans**: Inherit parent's sampling decision
This is ideal for development but may generate high trace volumes in production.
#### Production Sampling
For production with high traffic, use ratio-based sampling:
```bash
# Sample 10% of traces
export OTEL_TRACES_SAMPLER=traceidratio
export OTEL_TRACES_SAMPLER_ARG=0.1
```
#### Available Samplers
- `always_on` - Sample everything (default for root spans)
- `always_off` - Disable tracing entirely
- `traceidratio` - Sample a percentage (requires `OTEL_TRACES_SAMPLER_ARG` between 0.0 and 1.0)
- `parentbased_always_on` - Respect parent span, default to always_on
- `parentbased_always_off` - Respect parent span, default to always_off
- `parentbased_traceidratio` - Respect parent span, default to ratio
#### Sampling Examples
```bash
# Development: Sample everything
export OTEL_TRACES_SAMPLER=always_on
# Production: 5% sampling (good for high-traffic services)
export OTEL_TRACES_SAMPLER=traceidratio
export OTEL_TRACES_SAMPLER_ARG=0.05
# Temporarily disable tracing
export OTEL_TRACES_SAMPLER=always_off
# Or just unset the endpoint
unset OTEL_EXPORTER_OTLP_ENDPOINT
```
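If upstream callers are already instrumented, the parent-based ratio sampler keeps distributed traces intact while still capping new root traces:
```bash
# Inherit the caller's sampling decision; sample 20% of new root traces
export OTEL_TRACES_SAMPLER=parentbased_traceidratio
export OTEL_TRACES_SAMPLER_ARG=0.2
```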
## Deployment Examples
### Claude Code (STDIO Mode)
Add the MCP server to your project's `.mcp.json` or global `~/.claude/settings.json`:
```json
{
"mcpServers": {
"kubernetes": {
"command": "npx",
"args": ["-y", "kubernetes-mcp-server@latest"],
"env": {
"OTEL_EXPORTER_OTLP_ENDPOINT": "http://localhost:4317",
"OTEL_TRACES_SAMPLER": "always_on"
}
}
}
}
```
**For Jaeger (traces only)**: Add `"OTEL_METRICS_EXPORTER": "none"` to disable metrics export.
**Note**: In STDIO mode, only MCP tool calls are traced (no HTTP request spans).
### Kubernetes Deployment (HTTP Mode)
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: kubernetes-mcp-server
spec:
template:
spec:
containers:
- name: kubernetes-mcp-server
image: quay.io/containers/kubernetes_mcp_server:latest
env:
# OTLP endpoint (required to enable tracing)
- name: OTEL_EXPORTER_OTLP_ENDPOINT
value: "http://tempo-distributor.observability:4317"
# Sampling (recommended for production)
- name: OTEL_TRACES_SAMPLER
value: "traceidratio"
- name: OTEL_TRACES_SAMPLER_ARG
value: "0.1" # 10% sampling
# Resource attributes (helps identify this deployment)
- name: OTEL_RESOURCE_ATTRIBUTES
value: "deployment.environment=production,k8s.cluster.name=prod-us-west-2"
# Kubernetes metadata (optional, helps correlate traces with K8s resources)
- name: KUBERNETES_POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: KUBERNETES_NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
- name: KUBERNETES_NODE_NAME
valueFrom:
fieldRef:
fieldPath: spec.nodeName
```
**Note**: The Kubernetes metadata environment variables are optional but recommended for production deployments. They help correlate traces with specific pods, namespaces, and nodes.
### Docker
```bash
docker run \
-e OTEL_EXPORTER_OTLP_ENDPOINT=http://host.docker.internal:4317 \
-e OTEL_TRACES_SAMPLER=always_on \
quay.io/containers/kubernetes_mcp_server:latest
```
## Trace Attributes
### MCP Tool Call Spans
Each tool call creates a span following MCP and OpenTelemetry semantic conventions:
**Span Name Format**: `{mcp.method.name} {target}` (e.g., "tools/call resources_get")
**Attributes**:
- `mcp.method.name` - MCP protocol method (e.g., "tools/call") **[Required]**
- `gen_ai.tool.name` - Name of the tool being called (e.g., "resources_get", "helm_install") **[Required for tool calls]**
- `gen_ai.operation.name` - Set to "execute_tool" for tool calls **[Recommended]**
- `rpc.jsonrpc.version` - JSON-RPC version (typically "2.0") **[Recommended]**
- `network.transport` - Transport protocol: "pipe" for STDIO, "tcp" for HTTP **[Recommended]**
- `error.type` - Error classification: "tool_error" for tool failures, "_OTHER" for other errors **[Conditional]**
### HTTP Request Spans
HTTP requests create spans following [OpenTelemetry HTTP semantic conventions](https://opentelemetry.io/docs/specs/semconv/http/http-spans/):
**Span Name Format**: `{METHOD} {path}` (e.g., "POST /message")
**Attributes**:
- `http.request.method` - Request method (GET, POST, etc.) **[Required]**
- `url.path` - URL path **[Required]**
- `url.scheme` - URL scheme (http or https) **[Required]**
- `server.address` - Server host **[Recommended]**
- `network.protocol.name` - Protocol name (http) **[Recommended]**
- `network.protocol.version` - Protocol version (HTTP/1.1, HTTP/2) **[Recommended]**
- `client.address` - Client IP address **[Recommended]**
- `http.route` - Normalized route pattern (when different from path) **[Conditional]**
- `user_agent.original` - User agent string (when present) **[Conditional]**
- `http.request.body.size` - Request body size (when present) **[Conditional]**
- `http.response.status_code` - Response status code **[Required]**
- `error.type` - HTTP status code for 4xx/5xx responses **[Conditional]**
**Note**: HTTP spans only appear when running in HTTP mode. STDIO mode (Claude Code) only creates MCP tool call spans. The `/healthz` endpoint is not traced to reduce noise.
## Stats Endpoint
When running in HTTP mode, the server exposes a `/stats` endpoint that returns real-time statistics as JSON:
```bash
curl http://localhost:8080/stats
```
Example response:
```json
{
"total_tool_calls": 42,
"tool_call_errors": 2,
"tool_calls_by_name": {
"resources_list": 15,
"pods_get": 12,
"helm_list": 10,
"resources_get": 5
},
"total_http_requests": 100,
"http_requests_by_path": {
"/mcp": 50,
"/sse": 30,
"/message": 20
},
"uptime_seconds": 3600.5
}
```
The stats endpoint is useful for:
- Health monitoring and alerting
- Quick debugging without a full observability stack
- Integration with simple monitoring systems
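For ad-hoc checks the JSON pairs well with `jq` (assuming it is installed); for example, computing the tool error rate from the fields shown above:
```bash
# Tool error rate = errors / total calls (field names from the example response)
curl -s http://localhost:8080/stats | jq '.tool_call_errors / .total_tool_calls'
```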
**Note**: The `/stats` endpoint is only available in HTTP mode. In STDIO mode, use OTLP export for metrics.
## Metrics Endpoint
When running in HTTP mode, the server exposes a `/metrics` endpoint for Prometheus scraping:
```bash
curl http://localhost:8080/metrics
```
This endpoint returns metrics in OpenMetrics/Prometheus text format, suitable for scraping by Prometheus or compatible systems.
### Available Metrics
| Metric | Type | Description |
|--------|------|-------------|
| `mcp_tool_calls_total` | Counter | Total MCP tool calls (labeled by `tool_name`) |
| `mcp_tool_errors_total` | Counter | Total MCP tool errors (labeled by `tool_name`) |
| `mcp_tool_duration_seconds` | Histogram | Tool call duration in seconds |
| `http_server_requests_total` | Counter | HTTP requests (labeled by `http_request_method`, `url_path`, `http_response_status_class`) |
| `mcp_server_info` | Gauge | Server info (labeled by `version`, `go_version`) |
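A quick way to confirm the counters are being exported (metric names from the table above):
```bash
# List the tool-call counters by name
curl -s http://localhost:8080/metrics | grep -E '^mcp_tool_(calls|errors)_total'
```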
### Prometheus Scrape Configuration
```yaml
scrape_configs:
- job_name: 'kubernetes-mcp-server'
static_configs:
- targets: ['localhost:8080']
metrics_path: /metrics
```
### Kubernetes ServiceMonitor
When deployed in Kubernetes with the Helm chart, enable the ServiceMonitor:
```yaml
metrics:
serviceMonitor:
enabled: true
interval: 30s
```
**Note**: The `/metrics` endpoint is only available in HTTP mode.
## Troubleshooting
### Tracing not working?
1. **Check endpoint is set**:
```bash
echo $OTEL_EXPORTER_OTLP_ENDPOINT
```
2. **Check server logs** (increase verbosity):
```bash
# Look for "OpenTelemetry tracing initialized successfully"
kubernetes-mcp-server -v 2
```
If tracing fails to initialize, you'll see:
```
Failed to create OTLP exporter, tracing disabled: <error details>
```
3. **Verify OTLP collector is reachable**:
```bash
# For gRPC endpoint (port 4317)
telnet localhost 4317
# For HTTP endpoint (port 4318)
curl http://localhost:4318/v1/traces
```
### No traces appearing in backend?
1. **Check sampling** - you might be sampling at 0% or using `always_off`:
```bash
echo $OTEL_TRACES_SAMPLER
echo $OTEL_TRACES_SAMPLER_ARG
```
2. **Verify service name**:
```bash
echo $OTEL_SERVICE_NAME
```
Search for this service name in your tracing UI (defaults to "kubernetes-mcp-server").
3. **Check backend configuration** - ensure your OTLP collector is forwarding to the right backend.
4. **Verify protocol compatibility**:
- If using HTTP-based backends, ensure you set `OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf`
- Check if you need port 4317 (gRPC) or 4318 (HTTP); a quick probe is sketched below
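A minimal reachability probe for the HTTP/protobuf listener; an empty POST to a live collector typically returns a 4xx status (often 415) rather than a connection error:
```bash
# Prints only the HTTP status code; connection failures print 000
curl -s -o /dev/null -w '%{http_code}\n' -X POST http://localhost:4318/v1/traces
```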
### TLS/Certificate Issues
If using HTTPS/secure endpoints:
1. **Certificate errors**:
```bash
# Provide custom CA certificate
export OTEL_EXPORTER_OTLP_CERTIFICATE=/path/to/ca.crt
```
2. **Self-signed certificates**:
```bash
# For testing only - not recommended for production
export OTEL_EXPORTER_OTLP_INSECURE=true
```
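To see which certificate chain the endpoint actually presents (assuming `openssl` is available; the hostname is the one from the example above):
```bash
# Show the server's certificate chain for the OTLP endpoint
openssl s_client -connect otlp-secure.example.com:4317 -showcerts </dev/null
```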
## Performance Impact
Tracing has minimal performance overhead:
- **Middleware tracing**: Typically 1-2ms per tool call
- **Network overhead**: Spans are batched and exported every 5 seconds (tuning sketch below)
- **Memory**: Approximately 1-5MB for span buffers
- **CPU**: Negligible (<1% for most workloads)
For production deployments with high traffic, use ratio-based sampling to reduce costs while maintaining observability.
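The 5-second batching cadence matches the OpenTelemetry SDK's default batch span processor. If that default is in play, the spec-defined variable below may adjust it; this is an assumption, not a documented option of this server:
```bash
# Hypothetical override: assumes the default BatchSpanProcessor honors
# the spec-defined variable (value in milliseconds)
export OTEL_BSP_SCHEDULE_DELAY=10000   # batch and export every 10s
```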
## Advanced Topics
### Resource Detection
The OpenTelemetry SDK automatically detects and adds resource attributes from the environment:
- **Host information**: hostname, OS, architecture
- **Process information**: PID, executable name
- **Container information**: container ID (when running in containers)
- **Kubernetes information**: pod name, namespace (when K8s env vars are present)
These are merged with any attributes you set via `OTEL_RESOURCE_ATTRIBUTES`.
### Distributed Tracing
When the kubernetes-mcp-server is part of a distributed system:
1. **Parent spans** are automatically detected and respected
2. **Trace context** is propagated via standard W3C Trace Context headers
3. **Sampling decisions** from parent spans are inherited (via ParentBased sampler)
This means traces can span multiple services seamlessly.
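For example, an already-instrumented upstream client talking to the server in HTTP mode could hand over its context via the standard `traceparent` header (the trace and span IDs below are illustrative, and the `/mcp` path is the one shown in the stats example):
```bash
# W3C Trace Context header: version-traceid-spanid-flags
curl -H 'traceparent: 00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01' \
  http://localhost:8080/mcp
```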
### Custom Resource Attributes
Add custom attributes to help identify and filter traces:
```bash
export OTEL_RESOURCE_ATTRIBUTES="deployment.environment=staging,team=platform,region=us-west-2,version=v1.2.3"
```
These attributes appear on **all spans** from this service instance and are useful for:
- Filtering traces by environment (prod vs staging)
- Analyzing performance by region or deployment
- Tracking issues to specific versions or teams