
npm: byok-observability-mcp · License: MIT · Node ≥ 18 · alimuratkuslu/byok-observability-mcp MCP server


Bring Your Own Keys — credentials stay in env vars on your machine. No clone, no build, runs via npx.

Partial setup — configure only the backends you use. Tools for unconfigured backends are never exposed.




⚡ Quick Start

Run once, answer a few questions, get a ready-made .mcp.json:

npx byok-observability-mcp --init

The wizard will:

  • Let you pick which backends to configure

  • Ask for credentials per service

  • Test connectivity with your real endpoints before writing anything

  • Write .mcp.json to your project root or ~/.claude/ — your choice

Then just start Claude Code:

claude
TIP

That's it. No clone, no build, no env file. Works in under 60 seconds.


Option B — Manual .mcp.json

Create .mcp.json in your project root. Include only the backends you need.

{
  "mcpServers": {
    "observability-mcp": {
      "command": "npx",
      "args": ["-y", "byok-observability-mcp"],
      "env": {
        "GRAFANA_URL":    "https://grafana.mycompany.internal",
        "GRAFANA_TOKEN":  "glsa_...",
        "PROMETHEUS_URL": "https://prometheus.mycompany.internal",
        "KAFKA_UI_URL":   "https://kafka-ui.mycompany.internal",
        "DD_API_KEY":     "your-datadog-api-key",
        "DD_APP_KEY":     "your-datadog-app-key"
      }
    }
  }
}

Credentials in git? Use the ${VAR} approach instead — see Configuration → Method B.

Start Claude Code:

claude

Claude Code reads .mcp.json automatically. No claude mcp add, no build step.

Verify by asking Claude:

What observability tools do you have available?

🧩 Supported clients

| Client | Configuration |
| --- | --- |
| Claude Code | .mcp.json in project root (recommended) or claude mcp add CLI |
| OpenAI Codex CLI | .mcp.json in project root — same format as Claude Code |

Both clients read .mcp.json automatically. The Quick Start above works for either.

# Same .mcp.json as above works out of the box
codex

Or add via CLI:

codex mcp add --transport stdio observability-mcp -- npx -y byok-observability-mcp

🔧 Available tools

Always available. Checks connectivity across all configured backends.

| Tool | Description |
| --- | --- |
| obs_health_check | Unified health check — runs a parallel check on all backends and returns a status table |

Enabled when GRAFANA_URL + GRAFANA_TOKEN are set.

| Tool | Description |
| --- | --- |
| grafana_health | Check connectivity, version, and database status |
| grafana_list_datasources | List all datasources (name, type, UID) |
| grafana_query_metrics | Run a PromQL expression via a Grafana datasource |
| grafana_list_dashboards | Search and list dashboards by name or tag |
| grafana_get_dashboard | Get panels and metadata for a dashboard by UID |
| grafana_list_alerts | List active alerts from Alertmanager (firing/pending) |
| grafana_get_alert_rules | List all configured alert rules across all folders |

Enabled when PROMETHEUS_URL is set.

| Tool | Description |
| --- | --- |
| prometheus_health | Check connectivity |
| prometheus_query | Instant PromQL query — current value of a metric |
| prometheus_query_range | Range PromQL query — metric values over time |
| prometheus_list_metrics | List all available metric names |
| prometheus_metric_metadata | Get help text and type for a specific metric |

Enabled when KAFKA_UI_URL is set.

| Tool | Description |
| --- | --- |
| kafka_list_clusters | List configured Kafka clusters and their status |
| kafka_list_topics | List topics in a cluster |
| kafka_describe_topic | Get partition count, replication factor, and config |
| kafka_list_consumer_groups | List consumer groups and their state |
| kafka_consumer_group_lag | Get per-partition lag for a consumer group |
| kafka_broker_health | Broker count and disk usage per broker |

Enabled when both DD_API_KEY and DD_APP_KEY are set. Proxies the official Datadog MCP server.

Default toolsets: core, apm, alerting. Set DD_TOOLSETS=all to load everything.

| Toolset | Covers |
| --- | --- |
| core | Metrics, dashboards, monitors, infrastructure |
| apm | APM services, traces, service map |
| alerting | Monitors, downtimes, alerts |
| logs | Log search and analytics |
| incidents | Incident management |
| ddsql | SQL-style metric queries |
| security | Cloud security posture |
| synthetics | Synthetic test results |
| networks | Network performance monitoring |
| dbm | Database monitoring |
| software-delivery | CI/CD pipelines |
| llm-obs | LLM observability |
| cases | Case management |
| feature-flags | Feature flag tracking |


🔑 Getting credentials

  1. Open Grafana → Administration → Users and access → Service accounts

  2. Click Add service account → set Role to Viewer → Create

  3. On the service account page → Add service account token → Generate token

  4. Copy the token (starts with glsa_) — you won't see it again

GRAFANA_URL=https://grafana.mycompany.internal
GRAFANA_TOKEN=glsa_xxxxxxxxxxxxxxxx
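To sanity-check the token before wiring it into .mcp.json, you can call Grafana's standard HTTP API directly (a quick manual check, assuming curl and the two variables above are set in your shell):

```shell
# Returns a small JSON array of dashboards if the URL and token are valid;
# a 401 means the token is wrong or expired.
curl -fsS -H "Authorization: Bearer $GRAFANA_TOKEN" \
  "$GRAFANA_URL/api/search?limit=1"
```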

If your Grafana uses a self-signed certificate:

GRAFANA_VERIFY_SSL=false

If Prometheus has no authentication:

PROMETHEUS_URL=https://prometheus.mycompany.internal

If Prometheus uses basic auth:

PROMETHEUS_URL=https://prometheus.mycompany.internal
PROMETHEUS_USERNAME=your-username
PROMETHEUS_PASSWORD=your-password
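You can verify the credentials outside the MCP server with Prometheus's standard HTTP API (a hedged sanity check; assumes curl is available):

```shell
# Instant query for the built-in `up` metric; expect a JSON body
# containing "status":"success" when the URL and credentials are right.
curl -fsS -u "$PROMETHEUS_USERNAME:$PROMETHEUS_PASSWORD" \
  "$PROMETHEUS_URL/api/v1/query?query=up"
```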

If Kafka UI has no authentication:

KAFKA_UI_URL=https://kafka-ui.mycompany.internal

If Kafka UI requires a login:

KAFKA_UI_URL=https://kafka-ui.mycompany.internal
KAFKA_UI_USERNAME=admin
KAFKA_UI_PASSWORD=your-password

API key: Datadog → Organization Settings → API Keys → New Key

Application key: Datadog → Organization Settings → Application Keys → New Key

DD_SITE — match your Datadog login URL:

| Login URL | DD_SITE |
| --- | --- |
| app.datadoghq.com | datadoghq.com (default) |
| app.us3.datadoghq.com | us3.datadoghq.com |
| app.us5.datadoghq.com | us5.datadoghq.com |
| app.datadoghq.eu | datadoghq.eu |
| app.ap1.datadoghq.com | ap1.datadoghq.com |

DD_API_KEY=your-api-key
DD_APP_KEY=your-application-key
DD_SITE=datadoghq.com
DD_TOOLSETS=core,apm,alerting
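If you want to script the mapping, here is an illustrative helper — not part of the package, just the table above expressed as a shell function:

```shell
# Illustrative only: map the host you log in at to the matching DD_SITE value.
dd_site_for() {
  case "$1" in
    app.datadoghq.com)     echo "datadoghq.com" ;;
    app.us3.datadoghq.com) echo "us3.datadoghq.com" ;;
    app.us5.datadoghq.com) echo "us5.datadoghq.com" ;;
    app.datadoghq.eu)      echo "datadoghq.eu" ;;
    app.ap1.datadoghq.com) echo "ap1.datadoghq.com" ;;
    *)                     echo "datadoghq.com" ;;  # fall back to the default
  esac
}

dd_site_for app.us5.datadoghq.com   # → us5.datadoghq.com
```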

⚙️ Configuration

Method A — Values directly in .mcp.json (simplest)

Put credentials directly in .mcp.json. Works everywhere, no extra steps.

Add .mcp.json to your .gitignore if the repo is shared.

Method B — ${VAR} placeholders + .env (recommended for shared repos)

Use ${VAR} placeholders in .mcp.json and put real values in .env.

.mcp.json (safe to commit — contains no secrets):

{
  "mcpServers": {
    "observability-mcp": {
      "command": "npx",
      "args": ["-y", "byok-observability-mcp"],
      "env": {
        "GRAFANA_URL":    "${GRAFANA_URL}",
        "GRAFANA_TOKEN":  "${GRAFANA_TOKEN}",
        "PROMETHEUS_URL": "${PROMETHEUS_URL}",
        "KAFKA_UI_URL":   "${KAFKA_UI_URL}",
        "DD_API_KEY":     "${DD_API_KEY}",
        "DD_APP_KEY":     "${DD_APP_KEY}"
      }
    }
  }
}

.env (add to .gitignore):

GRAFANA_URL=https://grafana.mycompany.internal
GRAFANA_TOKEN=glsa_...

Start Claude with the env loaded:

set -a && source .env && set +a && claude
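Why the set -a dance? Sourcing alone only defines shell variables; set -a marks everything assigned while it is active for export, so child processes like claude can see the values. A minimal illustration using a throwaway file (the path and variable value here are examples):

```shell
# Write a throwaway env file (stand-in for your real .env)
printf 'GRAFANA_URL=https://grafana.example\n' > /tmp/obs-demo.env

# `set -a` auto-exports every variable assigned while it is on;
# `.` is the portable spelling of `source`.
set -a && . /tmp/obs-demo.env && set +a

# Child processes (like `claude`) now see the variable:
sh -c 'echo "$GRAFANA_URL"'   # → https://grafana.example
```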

A ready-made helper script is included:

./scripts/run-claude-with-env.sh

A template .mcp.json with all variables is available as .mcp.json.example.

Method C — Global config in ~/.claude.json

Add to ~/.claude.json:

{
  "mcpServers": {
    "observability-mcp": {
      "command": "npx",
      "args": ["-y", "byok-observability-mcp"],
      "env": {
        "GRAFANA_URL":   "https://grafana.mycompany.internal",
        "GRAFANA_TOKEN": "glsa_..."
      }
    }
  }
}

📋 Environment variables

| Variable | Backend | Required | Description |
| --- | --- | --- | --- |
| GRAFANA_URL | Grafana | ✅ | Base URL of your Grafana instance |
| GRAFANA_TOKEN | Grafana | ✅ | Service account token (Viewer role) |
| GRAFANA_VERIFY_SSL | Grafana | – | Set to false to skip TLS verification |
| PROMETHEUS_URL | Prometheus | ✅ | Base URL of your Prometheus instance |
| PROMETHEUS_USERNAME | Prometheus | – | Basic auth username |
| PROMETHEUS_PASSWORD | Prometheus | – | Basic auth password |
| KAFKA_UI_URL | Kafka UI | ✅ | Base URL of your Kafka UI instance |
| KAFKA_UI_USERNAME | Kafka UI | – | Login username |
| KAFKA_UI_PASSWORD | Kafka UI | – | Login password |
| DD_API_KEY | Datadog | ✅ | Datadog API key |
| DD_APP_KEY | Datadog | ✅ | Datadog Application key |
| DD_SITE | Datadog | – | Datadog site (default: datadoghq.com) |
| DD_TOOLSETS | Datadog | – | Tool groups to load (default: core,apm,alerting) |
| SLACK_WEBHOOK_URL | Reports | ✅* | Slack Incoming Webhook URL for scheduled reports |
| REPORT_BACKENDS | Reports | – | Comma-separated backends to include in reports (default: all configured) |

✅ = required to enable that backend's tools. * Required only for the --report command.


📊 Scheduled Reports

Send an automated observability digest to Slack on a schedule — no Claude or Codex instance needs to be running.

How it works

cron / launchd
     │  fires every N minutes
     ▼
npx byok-observability-mcp --report
     │
     │  reads env vars, connects directly to backends
     ▼
Grafana · Prometheus · Kafka UI
     │
     │  categorizes findings → P0 / P1 / P2 / P3
     ▼
Slack Incoming Webhook  →  #your-channel

The command collects data, categorizes every finding by severity, formats a Slack message, sends it, and exits. It is completely stateless.

Severity levels

| Level | Meaning | Examples |
| --- | --- | --- |
| 🔴 P0 — CRITICAL | Service down or unreachable | Grafana alert firing (critical), Kafka cluster offline, backend unreachable |
| 🟠 P1 — HIGH | Degraded, action needed soon | Grafana alert firing (non-critical), Kafka consumer lag > 10,000 |
| 🟡 P2 — MEDIUM | Warning, monitor closely | Grafana alert pending, Kafka consumer lag > 1,000 |
| 🟢 P3 — INFO | Informational, all normal | Healthy backends, silenced alerts |
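The lag thresholds above form a simple step function. As a hypothetical sketch of that rule (the real categorizer lives inside the package and also weighs alert states and backend reachability):

```shell
# Map a consumer-group lag value to the report severity used above.
severity_for_lag() {
  if   [ "$1" -gt 10000 ]; then echo "P1"   # high: action needed soon
  elif [ "$1" -gt 1000  ]; then echo "P2"   # medium: monitor closely
  else                          echo "P3"   # informational
  fi
}

severity_for_lag 25000   # → P1
```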

Setup

Step 1 — Get a Slack Incoming Webhook URL

  1. Go to api.slack.com/apps → Create New App → From scratch

  2. Incoming Webhooks → toggle on → Add New Webhook to Workspace

  3. Pick a channel → Allow → copy the Webhook URL

Step 2 — Set environment variables

export SLACK_WEBHOOK_URL=https://hooks.slack.com/services/XXX/YYY/ZZZ

# Optional: restrict which backends are included (default: all configured)
export REPORT_BACKENDS=grafana,prometheus,kafka

Step 3 — Run a one-off report to verify

npx byok-observability-mcp --report

You should see a message in your Slack channel within seconds.

Step 4 — Schedule with cron

Open your crontab:

crontab -e

Add a line. Examples:

# Every hour at minute 0
0 * * * * SLACK_WEBHOOK_URL=https://hooks.slack.com/... GRAFANA_URL=... GRAFANA_TOKEN=... npx byok-observability-mcp --report >> /tmp/obs-report.log 2>&1

# Every 30 minutes
*/30 * * * * SLACK_WEBHOOK_URL=https://hooks.slack.com/... npx byok-observability-mcp --report >> /tmp/obs-report.log 2>&1
TIP

Put all env vars in a .env file and source it inside the cron command to keep the crontab clean:

0 * * * * bash -c 'source /path/to/.env && npx byok-observability-mcp --report' >> /tmp/obs-report.log 2>&1

Alternative: macOS launchd (runs on login, survives reboots)

Create ~/Library/LaunchAgents/com.observability-mcp.report.plist:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key>
  <string>com.observability-mcp.report</string>
  <key>ProgramArguments</key>
  <array>
    <string>/usr/local/bin/npx</string>
    <string>byok-observability-mcp</string>
    <string>--report</string>
  </array>
  <key>EnvironmentVariables</key>
  <dict>
    <key>SLACK_WEBHOOK_URL</key>
    <string>https://hooks.slack.com/services/XXX/YYY/ZZZ</string>
    <key>GRAFANA_URL</key>
    <string>https://grafana.mycompany.internal</string>
    <key>GRAFANA_TOKEN</key>
    <string>glsa_...</string>
  </dict>
  <key>StartInterval</key>
  <integer>3600</integer>
  <key>StandardOutPath</key>
  <string>/tmp/obs-report.log</string>
  <key>StandardErrorPath</key>
  <string>/tmp/obs-report.log</string>
</dict>
</plist>

Load it:

launchctl load ~/Library/LaunchAgents/com.observability-mcp.report.plist

To stop: launchctl unload ~/Library/LaunchAgents/com.observability-mcp.report.plist


💬 Example prompts

Single-backend queries

| Backend | Try asking Claude... |
| --- | --- |
| Grafana | "List all datasources and tell me which ones are Prometheus type." |
| Grafana | "Search for dashboards related to 'kubernetes' — list names and UIDs." |
| Kafka UI | "List all Kafka clusters. Are there any with offline brokers?" |
| Kafka UI | "Describe the topic 'orders' in cluster 'production' — partitions and replication factor?" |
| Kafka UI | "Check consumer lag for group 'order-processor'. Which partitions have the highest lag?" |
| Datadog | "List all Datadog monitors currently in Alert state." |
| Datadog | "Show APM service performance for the past hour. Which services have the highest error rate?" |

🛠️ Incident Response (v0.2.0+)

| Goal | Try asking Claude... |
| --- | --- |
| Health | "Run a health check on all systems." |
| Alerts | "Are there any firing alerts in Grafana right now?" |
| Triage | "Show me the alert rules for the 'Production' folder." |

Cross-backend queries

Check the health of all configured observability backends and give me a summary.
I'm seeing high error rates. Check Prometheus for http_requests_total with status=500,
then look for related Datadog monitors that might be alerting.

🔒 Security

NOTE

All tools are read-only. No write operations are performed on any backend.

IMPORTANT

Credentials are read from environment variables and never logged or sent to Anthropic. Tokens are redacted in all error messages.

  • TLS certificate verification is enabled by default

  • The MCP process runs locally — your infrastructure URLs only reach Claude's context window if you type them into the chat

Least-privilege recommendations:

| Backend | Recommended role |
| --- | --- |
| Grafana | Service account with Viewer role |
| Prometheus | Network-level read-only access |
| Kafka UI | Read-only UI user |
| Datadog | API key + Application key with read scopes |


🛠 Development

git clone https://github.com/alimuratkuslu/byok-observability-mcp
cd byok-observability-mcp
npm install
npm run dev          # run with tsx (no build step)
npm run build        # compile to dist/
npm run typecheck    # TypeScript check without emitting

Tested versions

| Backend | Tested version |
| --- | --- |
| Grafana | v9.x, v10.x, v11.x |
| Prometheus | v2.x |
| Kafka UI | provectus/kafka-ui:v0.7.2 |


License

MIT
