observability-mcp
Server Configuration
Environment variables used to configure the server. None of them is marked as required.
| Name | Required | Description | Default |
|---|---|---|---|
| LOKI_URL | No | URL of Loki server (e.g., http://localhost:3100). | |
| GRAFANA_TOKEN | No | Grafana Cloud API token for basic auth. | |
| PROMETHEUS_URL | No | URL of Prometheus server (e.g., http://localhost:9090). | |
| GRAFANA_LOKI_USER | No | Grafana Cloud Loki instance ID (numeric) for basic auth. | |
| GRAFANA_PROM_USER | No | Grafana Cloud Prometheus instance ID (numeric) for basic auth. | |
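One way to read the table above: each backend is enabled by its URL variable, and Grafana Cloud basic auth pairs the per-backend instance ID with the shared token. The sketch below illustrates that reading only. It assumes the instance ID is the basic-auth username and `GRAFANA_TOKEN` the password, and that a backend without a URL is skipped; `load_sources` and `basic_auth_header` are hypothetical helper names, not part of the server.

```python
import base64
import os


def basic_auth_header(user, token):
    """Build an HTTP Basic Authorization header value from user and token."""
    raw = f"{user}:{token}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def load_sources(env=None):
    """Return {backend: {"url": ..., "headers": {...}}} for configured backends.

    Assumption (not confirmed by the source): a backend is enabled only when
    its *_URL variable is set, and auth is attached only when both the
    instance ID and GRAFANA_TOKEN are present.
    """
    env = os.environ if env is None else env
    token = env.get("GRAFANA_TOKEN")
    sources = {}
    for name, url_var, user_var in (
        ("prometheus", "PROMETHEUS_URL", "GRAFANA_PROM_USER"),
        ("loki", "LOKI_URL", "GRAFANA_LOKI_USER"),
    ):
        url = env.get(url_var)
        if not url:
            continue  # backend not configured
        headers = {}
        user = env.get(user_var)
        if user and token:
            # Assumed pairing: instance ID as username, token as password.
            headers["Authorization"] = basic_auth_header(user, token)
        sources[name] = {"url": url, "headers": headers}
    return sources
```

Calling `load_sources({"LOKI_URL": "http://localhost:3100"})` yields a single unauthenticated Loki entry, which matches a local setup where no Grafana Cloud credentials are needed.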
Capabilities
Features and capabilities supported by this server
| Capability | Details |
|---|---|
| tools | { "listChanged": true } |
| prompts | { "listChanged": true } |
| resources | { "listChanged": true } |
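In MCP, a `listChanged` capability means the server may send a notification when the corresponding list changes, and the client is expected to re-fetch that list. The sketch below shows how a client might map those notifications to the list request it should re-issue. The method names come from the MCP specification; `refresh_method_for` is a hypothetical helper, not part of this server.

```python
# Capabilities as advertised in the table above.
CAPABILITIES = {
    "tools": {"listChanged": True},
    "prompts": {"listChanged": True},
    "resources": {"listChanged": True},
}

# Notification method names defined by the MCP specification.
LIST_CHANGED_NOTIFICATIONS = {
    "tools": "notifications/tools/list_changed",
    "prompts": "notifications/prompts/list_changed",
    "resources": "notifications/resources/list_changed",
}

# List request a client would re-issue after each notification.
REFRESH_METHODS = {
    "tools": "tools/list",
    "prompts": "prompts/list",
    "resources": "resources/list",
}


def refresh_method_for(notification_method, capabilities=CAPABILITIES):
    """Return the list method to re-issue for a notification, or None."""
    for kind, method in LIST_CHANGED_NOTIFICATIONS.items():
        advertised = capabilities.get(kind, {}).get("listChanged", False)
        if method == notification_method and advertised:
            return REFRESH_METHODS[kind]
    return None
```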
Tools
Functions exposed to the LLM to take actions
| Name | Description |
|---|---|
| list_sources | List the configured observability backends (Prometheus, Loki, and any connector) and whether each is currently reachable. When to use: call this first to learn which source names exist and are healthy before passing
| list_services | Discover the service names that can be queried, aggregated across every connected backend. When to use: call this before
| query_metrics | Fetch the raw time-series for ONE metric of ONE service over a look-back window, returned together with pre-computed summary statistics. When to use: when you need the actual numeric values or the trend of a known metric. For a 'is this service OK?' verdict use
| query_logs | Fetch recent log entries for ONE service over a look-back window, with a pre-computed summary (error/warning counts and the most frequent error patterns). When to use: to inspect what a service actually logged, or to investigate an error spike surfaced by
| get_anomaly_history | Replay historical anomaly scores for a service from the TSDB the gateway writes to (omcp_anomaly_score series). When to use: post-mortem reconstruction, trend analysis on detector noise, or pulling context for the LLM when an incident is reviewed after the fact. Prerequisites: the operator must have OMCP_ANOMALY_HISTORY_REMOTE_WRITE configured AND a Prometheus source pointed at the same TSDB so the round-trip closes. Behavior: read-only. Returns the time-series of scores. Empty result means either no anomalies in the window or history is disabled. Related:
| generate_postmortem | Stitch the gateway's primitives (anomaly history, blast-radius, traces, log highlights) into a single markdown post-mortem report for one service over a given window. When to use: after an incident, when the operator or LLM wants 'one document the on-call can read in 60 seconds' instead of poking the individual tools. Prerequisites: anomaly history requires OMCP_ANOMALY_HISTORY_REMOTE_WRITE + a Prometheus source. Traces require Tempo / Jaeger. Blast-radius requires a topology provider. Behavior: read-only. Returns markdown by default; pass
| query_traces | Query distributed traces for a service over a given timeframe. Returns ranked trace summaries (duration, span count, error status) with a p50/p95 aggregate across the returned set. When to use: investigate tail-latency outliers, walk call chains across services for a specific time window, or pull traces related to an anomaly that the metric/log tools surfaced first. Prerequisites: get the exact service name from
| get_service_health | Produce a single aggregated health verdict for ONE service by combining its metrics and logs. When to use: the fastest way to answer 'is this service healthy right now and why?'. Use
| detect_anomalies | Scan one or all monitored services for abnormal behavior and return the findings ranked by severity. When to use: the entry point for 'is anything wrong anywhere?' triage. Once a service is flagged, follow up with
| get_topology | Return the infrastructure topology graph (Resources and Edges) from every topology-capable connector. When to use: when an agent needs to reason about which workload runs on which host, who owns whom, or which scope (namespace/project/folder) a resource belongs to. Pair with
| get_blast_radius | Given a resource, return who else fails if its underlying host(s) fail. When to use: cross-cutting RCA, when several services degrade together and you suspect a shared host. Works for any RUNS_ON relationship: pod→node, vm→hypervisor, container→host. Behavior: read-only, no side effects. Resolves
| enrich_ips | Resolve a batch of IPv4 or IPv6 addresses to geo (country/city), ASN/org, and a hosting/proxy flag. When to use: answering 'where are these visitors from?' or 'which of these IPs are bots / datacenter / VPN exit nodes?' over access logs, without an out-of-band geo-API call per IP. Both IPv4 and IPv6 clients are resolved, so don't pre-filter v6 out. Behavior: read-only. By default looks each IP up in a LOCAL offline dataset the operator configured (OMCP_IP_ENRICH_FILE) with NO external network call, which is safe in air-gapped deployments. Optionally, if the operator enabled OMCP_IP_ENRICH_RDAP, IPs the dataset doesn't cover fall back to an online RDAP query (country/org only) and the result carries via:'rdap'; the offline dataset is always preferred. Returns one row per input IP with found=true/false plus any known fields. If neither is configured it returns a clear notice explaining how to enable them. RDAP rate-limits: a row with found=false AND transient:true (error names the cause, e.g. 'rate_limited') is NOT a confirmed negative: the registry throttled or failed the lookup, so the IP may resolve on a later retry or in a smaller batch. Such rows are counted in summary.transient (separate from summary.unmatched) and a top-level
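The `enrich_ips` description distinguishes three outcomes per row: found, confirmed negative (found=false), and transient failure (found=false with transient:true), where only the last is worth retrying. The sketch below shows one way a caller might separate them. The `found`, `transient`, and `via` fields come from the description above; the `ip` and `error` keys, the sample rows, and `partition_rows` are assumptions for illustration, not the tool's confirmed output shape.

```python
def partition_rows(rows):
    """Split enrich_ips rows into resolved, unmatched, and transient buckets."""
    resolved, unmatched, transient = [], [], []
    for row in rows:
        if row.get("found"):
            resolved.append(row)
        elif row.get("transient"):
            # Lookup was throttled or failed: not a confirmed negative.
            transient.append(row)
        else:
            # found=false without the transient flag: confirmed negative.
            unmatched.append(row)
    return resolved, unmatched, transient


# Hypothetical sample rows using documentation-range addresses.
# The "ip" and "error" key names are assumed, not taken from the source.
sample = [
    {"ip": "192.0.2.10", "found": True, "via": "rdap"},
    {"ip": "198.51.100.7", "found": False},
    {"ip": "2001:db8::1", "found": False, "transient": True, "error": "rate_limited"},
]

resolved, unmatched, transient = partition_rows(sample)

# Only transient rows are candidates for a later retry or a smaller batch.
retry_batch = [row["ip"] for row in transient]
```

Keeping transient rows out of the unmatched bucket mirrors the separate `summary.transient` and `summary.unmatched` counters mentioned in the description.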
Prompts
Interactive templates invoked by user choice
| Name | Description |
|---|---|
| triage-incident | Guided incident triage for one service: health verdict, anomaly scan, blast radius, and the log slice that matters. |
| write-postmortem | Generate and refine a post-incident report for one service over a window. |
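Clients fetch a prompt with the MCP `prompts/get` method, passing the prompt name and its arguments in a JSON-RPC 2.0 request. The sketch below builds such a request for `triage-incident`. The argument name `service` and the service name `checkout` are assumptions, since the listing does not show the prompt's argument schema; `jsonrpc_request` and `get_prompt_request` are hypothetical helpers.

```python
import itertools
import json

# Monotonic request ids for this client session.
_ids = itertools.count(1)


def jsonrpc_request(method, params):
    """Return a JSON-RPC 2.0 request dict with a fresh integer id."""
    return {"jsonrpc": "2.0", "id": next(_ids), "method": method, "params": params}


def get_prompt_request(name, arguments):
    """Build an MCP prompts/get request for a named prompt."""
    return jsonrpc_request("prompts/get", {"name": name, "arguments": arguments})


# "service" is an assumed argument name; "checkout" is a made-up service.
request = get_prompt_request("triage-incident", {"service": "checkout"})

# Serialized form, ready to send over the client's transport.
wire = json.dumps(request)
```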
Resources
Contextual data attached and managed by the client
| Name | Description |
|---|---|
| agent-usage-guide | How to use this gateway effectively as an agent: the proven filter→aggregate→enrich triage recipe, signal-vs-silence behaviours, and the operator flags that unlock optional tools. |
MCP directory API
We provide all the information about MCP servers via our MCP API.
curl -X GET 'https://glama.ai/api/mcp/v1/servers/ThoTischner/observability-mcp'