ThoTischner

observability-mcp

Server Configuration

Describes the environment variables used to configure the server. All of them are optional.

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| LOKI_URL | No | URL of Loki server (e.g., http://localhost:3100). | |
| GRAFANA_TOKEN | No | Grafana Cloud API token for basic auth. | |
| PROMETHEUS_URL | No | URL of Prometheus server (e.g., http://localhost:9090). | |
| GRAFANA_LOKI_USER | No | Grafana Cloud Loki instance ID (numeric) for basic auth. | |
| GRAFANA_PROM_USER | No | Grafana Cloud Prometheus instance ID (numeric) for basic auth. | |
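
The Grafana Cloud variables pair a numeric instance ID with an API token for HTTP basic auth. The following is a minimal sketch of how a client could turn those variables into connection settings, assuming the instance ID acts as the username and the token as the password. The helper names `basic_auth_header` and `loki_settings` are hypothetical and do not come from the server's code.

```python
import base64
import os
from typing import Mapping, Optional


def basic_auth_header(user: str, token: str) -> dict:
    """Build an HTTP Basic Authorization header from a user and a token."""
    raw = f"{user}:{token}".encode("utf-8")
    return {"Authorization": "Basic " + base64.b64encode(raw).decode("ascii")}


def loki_settings(env: Optional[Mapping[str, str]] = None) -> Optional[dict]:
    """Read the Loki-related variables; return None when LOKI_URL is unset.

    Assumption: auth is only attached when both the instance ID and the
    token are present, so a local Loki without auth still works.
    """
    env = os.environ if env is None else env
    url = env.get("LOKI_URL")
    if not url:
        return None
    user = env.get("GRAFANA_LOKI_USER")
    token = env.get("GRAFANA_TOKEN")
    headers = basic_auth_header(user, token) if user and token else {}
    return {"url": url.rstrip("/"), "headers": headers}
```

The same pattern would apply to PROMETHEUS_URL with GRAFANA_PROM_USER.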

Capabilities

Features and capabilities supported by this server

| Capability | Details |
| --- | --- |
| tools | {"listChanged": true} |

Tools

Functions exposed to the LLM to take actions

list_sources

List the configured observability backends (Prometheus, Loki, and any connector) and whether each is currently reachable. When to use: call this first to learn which source names exist and are healthy before passing source to other tools, or to debug why a query returns no data. Behavior: read-only, no side effects. Returns one entry per source with its name, type, configured URL, signal types (metrics/logs), and a live up/down status. Never throws for an unreachable backend — the backend is reported as down instead. Related: use list_services to see what is monitored within these sources.
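
An MCP client invokes a tool by sending a JSON-RPC `tools/call` request. The sketch below builds such a request body for `list_sources`. That the tool accepts an empty argument object is an assumption, and the `tool_call` helper is hypothetical.

```python
import json
from itertools import count

# Monotonic request ids, as JSON-RPC expects a unique id per request.
_request_ids = count(1)


def tool_call(name, arguments=None):
    """Build the JSON-RPC 2.0 body for an MCP tools/call request."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments or {}},
    }


if __name__ == "__main__":
    # Assumption: list_sources takes no arguments.
    print(json.dumps(tool_call("list_sources"), indent=2))
```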

list_services

Discover the service names that can be queried, aggregated across every connected backend. When to use: call this before query_metrics, query_logs, or get_service_health to obtain the exact, case-sensitive service name those tools require. Behavior: read-only, no side effects. Returns one entry per service with the service name, the source(s) it was discovered in, and which signals are available for it (metrics, logs, or both). Related: list_sources for backend health; get_service_health for a per-service overview.

query_metrics

Fetch the raw time-series for ONE metric of ONE service over a look-back window, returned together with pre-computed summary statistics. When to use: when you need the actual numeric values or the trend of a known metric. For a 'is this service OK?' verdict use get_service_health; to find which services are misbehaving use detect_anomalies. Prerequisites: get the exact service name from list_services and choose a metric from the list at the end of this description. Behavior: read-only, no side effects. Returns an ordered array of {timestamp, value} points plus a summary {current, average, min, max, trend}. With groupBy set, returns one labelled series per distinct label value under groups instead of a single aggregated series. Units depend on the metric (e.g. CPU as %, latency as ms, rates as per-second). An unknown service/metric or an unreachable backend yields a structured explanatory error, never an exception. Available metrics: No metrics sources configured.
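
To illustrate the summary shape {current, average, min, max, trend}, here is a minimal sketch that derives it from a list of {timestamp, value} points. How the server actually computes trend is not documented here. Comparing the mean of the older half with the mean of the newer half, the `tolerance` parameter, and the labels "rising", "falling", and "stable" are all assumptions.

```python
from statistics import mean


def summarize(points, tolerance=0.05):
    """Summarize an ordered list of {"timestamp", "value"} points."""
    values = [p["value"] for p in points]
    if not values:
        return None
    summary = {
        "current": values[-1],
        "average": mean(values),
        "min": min(values),
        "max": max(values),
        "trend": "stable",
    }
    if len(values) < 2:
        return summary
    half = len(values) // 2
    older, newer = mean(values[:half]), mean(values[half:])
    # Relative change between the two halves; avoid dividing by zero.
    change = (newer - older) / (abs(older) or 1.0)
    if change > tolerance:
        summary["trend"] = "rising"
    elif change < -tolerance:
        summary["trend"] = "falling"
    return summary
```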

query_logs

Fetch recent log entries for ONE service over a look-back window, with a pre-computed summary (error/warning counts and the most frequent error patterns). When to use: to inspect what a service actually logged, or to investigate an error spike surfaced by detect_anomalies / get_service_health. For numeric metrics use query_metrics instead. Prerequisites: get the exact service name from list_services (the service must expose a logs signal). Behavior: read-only, no side effects. Returns the matching log entries (newest first, capped by limit) plus a summary with total/error/warn counts and top recurring error patterns. No matches yields an empty result with a zeroed summary; an unreachable backend yields a structured explanatory error, never an exception.
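
The log summary described above (total/error/warn counts plus the most frequent error patterns) could be computed along these lines. This is a sketch under assumptions: the entry fields `level` and `message`, the level names, and the digit-masking rule used to group similar messages are all hypothetical.

```python
import re
from collections import Counter


def normalize(message):
    """Collapse runs of digits so similar messages share one pattern."""
    return re.sub(r"\d+", "<n>", message)


def summarize_logs(entries, top_n=3):
    """Count entries by level and rank the recurring error patterns."""
    errors = [e for e in entries if e["level"] == "error"]
    warnings = [e for e in entries if e["level"] == "warn"]
    patterns = Counter(normalize(e["message"]) for e in errors)
    return {
        "total": len(entries),
        "errors": len(errors),
        "warnings": len(warnings),
        "top_error_patterns": patterns.most_common(top_n),
    }
```

With no entries the function returns zero counts and an empty pattern list, which mirrors the "empty result with a zeroed summary" behavior.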

get_service_health

Produce a single aggregated health verdict for ONE service by combining its metrics and logs. When to use: the fastest way to answer 'is this service healthy right now and why?'. Use query_metrics/query_logs to drill into the underlying numbers, or detect_anomalies to scan many services at once. Prerequisites: get the exact service name from list_services. Behavior: read-only, no side effects. Returns a weighted health score (0–100), a status of healthy | degraded | critical, the key contributing metrics, a log error summary, detected anomalies, and cross-signal correlations explaining the score. A service with no data yields an explanatory result rather than an exception.
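
A weighted health score with a healthy | degraded | critical status can be sketched as a weighted average of per-signal scores followed by thresholding. The component names, weights, and cut-offs (80 and 50) below are purely illustrative assumptions. The server's real weighting is not stated on this page.

```python
def health_score(components, weights):
    """Weighted average of per-signal scores, clamped to the 0-100 range."""
    total_weight = sum(weights[name] for name in components)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    score = sum(components[name] * weights[name] for name in components) / total_weight
    return max(0.0, min(100.0, score))


def status_for(score, degraded_below=80, critical_below=50):
    """Map a 0-100 score to a status label (thresholds are hypothetical)."""
    if score < critical_below:
        return "critical"
    if score < degraded_below:
        return "degraded"
    return "healthy"
```

For example, per-signal scores of 40, 80, and 100 with weights 5, 3, and 2 give (200 + 240 + 200) / 10 = 64, which falls in the degraded band under these assumed thresholds.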

detect_anomalies

Scan one or all monitored services for abnormal behavior and return the findings ranked by severity. When to use: the entry point for 'is anything wrong anywhere?' triage. Once a service is flagged, follow up with get_service_health for the verdict or query_metrics/query_logs for the raw evidence. Behavior: read-only, no side effects. Applies z-score analysis to metrics, detects log error-rate spikes, and correlates the two. Returns a list of anomalies, each with the affected service, metric/signal, severity, the deviation (e.g. σ and % change), and a short explanation. No anomalies yields an empty list, not an error. Related: get_service_health (single-service verdict), query_metrics (raw series behind a flagged metric).
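
As an illustration of z-score analysis, the sketch below compares the latest value of a metric with its recent history and reports the deviation in σ and in percent. The threshold of 3σ, the use of the population standard deviation, and the function name `zscore_anomaly` are assumptions, not details taken from the server.

```python
from statistics import mean, pstdev


def zscore_anomaly(history, latest, threshold=3.0):
    """Return deviation details if `latest` is an outlier, else None."""
    if len(history) < 2:
        return None  # not enough data to estimate spread
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        return None  # flat history: z-score is undefined
    z = (latest - mu) / sigma
    if abs(z) < threshold:
        return None
    percent = (latest - mu) / mu * 100 if mu else None
    return {"sigma": z, "percent_change": percent}
```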

Prompts

Interactive templates invoked by user choice

No prompts

Resources

Contextual data attached and managed by the client

No resources

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/ThoTischner/observability-mcp'
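
The same request can be issued from Python with the standard library. This sketch assumes the endpoint returns a JSON body, which the page does not state explicitly. The helper names are hypothetical.

```python
import json
import urllib.request

API_BASE = "https://glama.ai/api/mcp/v1/servers"


def server_request(owner, name):
    """Build the GET request for one server's directory entry."""
    url = f"{API_BASE}/{owner}/{name}"
    return urllib.request.Request(url, method="GET", headers={"Accept": "application/json"})


def fetch_server(owner, name, timeout=10.0):
    """Send the request and decode the response (assumed to be JSON)."""
    with urllib.request.urlopen(server_request(owner, name), timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    print(fetch_server("ThoTischner", "observability-mcp"))
```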

If you have feedback or need assistance with the MCP directory API, please join our Discord server.