Prometheus MCP Server

by pab1it0
# Usage Guide

This guide explains how to use the Prometheus MCP Server with AI assistants like Claude.

## Available Tools

The Prometheus MCP Server provides several tools that AI assistants can use to interact with your Prometheus data:

### Query Tools

#### `execute_query`

Executes an instant PromQL query and returns the current value(s).

**Parameters:**

- `query`: PromQL query string (required)
- `time`: Optional RFC3339 or Unix timestamp (defaults to current time)

**Example Claude prompt:**

```
Use the execute_query tool to check the current value of the 'up' metric.
```

#### `execute_range_query`

Executes a PromQL range query to return values over a time period.

**Parameters:**

- `query`: PromQL query string (required)
- `start`: Start time as RFC3339 or Unix timestamp (required)
- `end`: End time as RFC3339 or Unix timestamp (required)
- `step`: Query resolution step width (e.g., '15s', '1m', '1h') (required)

**Example Claude prompt:**

```
Use the execute_range_query tool to show me the CPU usage over the last hour with 5-minute intervals. Use the query 'rate(node_cpu_seconds_total{mode="user"}[5m])'.
```

### Discovery Tools

#### `list_metrics`

Retrieves a list of all available metric names.

**Example Claude prompt:**

```
Use the list_metrics tool to show me all available metrics in my Prometheus server.
```

#### `get_metric_metadata`

Retrieves metadata about a specific metric.

**Parameters:**

- `metric`: The name of the metric (required)

**Example Claude prompt:**

```
Use the get_metric_metadata tool to get information about the 'http_requests_total' metric.
```

#### `get_targets`

Retrieves information about all Prometheus scrape targets.

**Example Claude prompt:**

```
Use the get_targets tool to check the health of all monitoring targets.
```

## Example Workflows

### Basic Monitoring Check

```
Can you check if all my monitored services are up? Also, show me the top 5 CPU-consuming pods if we're monitoring Kubernetes.
```

Claude might use:

1. `execute_query` with `up` to check service health
2. `execute_query` with a more complex PromQL query to find CPU usage

### Performance Analysis

```
Analyze the memory usage pattern of my application over the last 24 hours. Are there any concerning spikes?
```

Claude might use:

1. `execute_range_query` with appropriate time parameters
2. Analyze the data for patterns and anomalies

### Metric Exploration

```
I'm not sure what metrics are available. Can you help me discover metrics related to HTTP requests and then show me their current values?
```

Claude might use:

1. `list_metrics` to get all metrics
2. Filter for HTTP-related metrics
3. `get_metric_metadata` to understand what each metric represents
4. `execute_query` to fetch current values

## Tips for Effective Use

1. **Be specific about time ranges** when asking for historical data
2. **Specify step intervals** appropriate to your time range (e.g., use smaller steps for shorter periods)
3. **Use metric discovery tools** if you're unsure what metrics are available
4. **Start with simple queries** and gradually build more complex ones
5. **Ask for explanations** if you don't understand the returned data

## PromQL Query Examples

Here are some useful PromQL queries you can use with the tools:

### Basic Queries

- Check if targets are up: `up`
- HTTP request rate: `rate(http_requests_total[5m])`
- CPU usage: `sum(rate(node_cpu_seconds_total{mode!="idle"}[5m])) by (instance)`
- Memory usage: `node_memory_MemTotal_bytes - node_memory_MemFree_bytes - node_memory_Buffers_bytes - node_memory_Cached_bytes`

### Kubernetes-specific Queries

- Pod CPU usage: `sum(rate(container_cpu_usage_seconds_total{container!="POD",container!=""}[5m])) by (pod)`
- Pod memory usage: `sum(container_memory_working_set_bytes{container!="POD",container!=""}) by (pod)`
- Pod restart count: `kube_pod_container_status_restarts_total`

## Limitations

- The MCP server queries your live Prometheus instance, so it only has access to metrics retained in your Prometheus server's storage
- Complex PromQL queries might take longer to execute, especially over large time ranges
- Authentication is passed through from your environment variables, so ensure you're using credentials with appropriate access rights
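
## Sketch: Building `execute_range_query` Arguments

The `execute_range_query` tool described in this guide requires four parameters: `query`, `start`, `end`, and `step`. As an illustration of how a request such as "the last hour with 5-minute intervals" maps onto those parameters, here is a minimal Python sketch. The helper name `range_query_args` is hypothetical and not part of the server, and the sketch assumes the tool arguments can be represented as a plain dictionary of strings. It produces RFC3339 timestamps in UTC, one of the two timestamp formats the tool accepts.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional


def range_query_args(
    query: str,
    window: timedelta,
    step: str,
    now: Optional[datetime] = None,
) -> Dict[str, str]:
    """Build the four required execute_range_query parameters.

    Hypothetical helper for illustration only. `now` should be a
    timezone-aware datetime; it defaults to the current UTC time.
    """
    end = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    start = end - window
    rfc3339 = "%Y-%m-%dT%H:%M:%SZ"  # RFC3339 timestamp in UTC
    return {
        "query": query,
        "start": start.strftime(rfc3339),
        "end": end.strftime(rfc3339),
        "step": step,  # passed through unchanged, e.g. '15s', '1m', '1h'
    }


if __name__ == "__main__":
    # "CPU usage over the last hour with 5-minute intervals"
    args = range_query_args(
        'rate(node_cpu_seconds_total{mode="user"}[5m])',
        window=timedelta(hours=1),
        step="5m",
    )
    print(args)
```

Passing a fixed `now` makes the output reproducible, which can be useful when comparing the same window across several queries.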