thruk-mcp

by k9fr4n


Model Context Protocol (MCP) server for Thruk — the unified web frontend for Naemon, Nagios, Icinga and Shinken.

Expose Thruk's REST API to MCP-compatible clients (Claude Desktop, Dust, LibreChat, OpenWebUI...) so that an LLM can query hosts/services, schedule downtimes, acknowledge problems, force rechecks and more in natural language.

Features

  • Read: hosts, services, hostgroups, servicegroups, downtimes, comments, sites, aggregated stats, current problems

  • Write: schedule/delete downtimes, acknowledge & remove acks, force rechecks

  • Escape hatch: thruk_query tool to call any Thruk REST endpoint

  • Multi-backend support (Thruk federated sites): pass backends="prod,dr" to any tool

  • Two transports: stdio (default) or Streamable-HTTP (--listen <port>)

  • Async httpx client with proper error handling and TLS verification

  • Tested with pytest + respx, linted with ruff, packaged with hatchling

Quick start

1. Configure

cp .env.example .env
$EDITOR .env   # set THRUK_BASE_URL and THRUK_API_KEY

An API key can be created from the Thruk user profile page (requires api_keys_enabled in thruk_local.conf) or via the REST API itself.

2a. Run with Docker

docker compose up -d
# MCP Streamable-HTTP endpoint: http://localhost:8001/mcp

2b. Run locally

pip install thruk-mcp        # or: pipx install thruk-mcp

# stdio mode (for Claude Desktop, LibreChat, etc.)
thruk-mcp

# HTTP mode
thruk-mcp --listen 8001

For local development of the project itself, see CONTRIBUTING.md.

3. Wire it to an MCP client

Claude Desktop (~/.config/Claude/claude_desktop_config.json or macOS equivalent):

{
  "mcpServers": {
    "thruk": {
      "command": "thruk-mcp",
      "env": {
        "THRUK_BASE_URL": "https://monitor.example.com/thruk",
        "THRUK_API_KEY": "xxxxxxxx"
      }
    }
  }
}

4. Use with the Docker MCP Gateway

The image at ghcr.io/k9fr4n/thruk-mcp:latest defaults to stdio transport, so it can be spawned natively by the gateway.

Option A — Private local catalog

# 1. Create your private catalog
docker mcp catalog create thruk-private

# 2. Register this server (catalog/server.yaml ships with the repo)
docker mcp catalog add thruk-private thruk-mcp ./catalog/server.yaml

# 3. Configure credentials & enable
docker mcp secret set thruk-mcp.api_key=YOUR_KEY
docker mcp config write thruk-mcp.base_url=https://monitor.example.com/thruk
docker mcp server enable thruk-mcp

# 4. Run the gateway with your catalog
docker mcp gateway run --catalog thruk-private

Then point any MCP client (Claude Desktop, VS Code, Cursor, ...) at the gateway, as described in the Docker MCP Gateway documentation.

Option B — Submit upstream

catalog/server.yaml, catalog/tools.json and catalog/readme.md follow the docker/mcp-registry schema and can be submitted to the official Docker MCP Catalog via PR.

What's exposed

57 MCP Tools

Read (state): thruk_list_hosts, thruk_get_host, thruk_list_services, thruk_get_service, thruk_list_hostgroups, thruk_list_servicegroups, thruk_list_contacts, thruk_get_contact, thruk_problems, thruk_stats, thruk_totals (compact 16-field host+service totals, faster than thruk_stats), thruk_sites.

Read (history & comments): thruk_list_logs, thruk_list_alerts, thruk_list_notifications, thruk_notification_summary (notifications grouped by contact/host/service/state/command), thruk_recent_events, thruk_list_comments, thruk_list_downtimes, thruk_get_downtime.

Read (noise & flap analysis): thruk_top_noisy_hosts (hosts ranked by alert count over a window), thruk_top_noisy_services (services ranked by alert count), thruk_flap_summary (hosts/services ranked by state transition count).

Read (problem intelligence): thruk_oldest_problems (unhandled problems sorted by age, oldest first), thruk_unacked_critical (CRITICAL/DOWN not acknowledged for > N minutes), thruk_stale_acks (acknowledgements older than N days, i.e. forgotten problems), thruk_problem_counts (flat aggregate of unhealthy-state counts, filterable by hostgroup, custom vars or any structured filter; replaces the former thruk_problems_by_hostgroup), thruk_stale_checks (surfaces checks that stopped running: the dangerous "false green").

Read (analytics): thruk_alert_heatmap (alert counts bucketed by time, useful for spotting recurring patterns), thruk_notification_heatmap (notification counts bucketed by time, useful for spotting mail/paging storms), thruk_concurrent_failures (windows where multiple hosts failed simultaneously), thruk_recurring_problems (hosts/services generating repeated alerts over a window).

Read (availability / SLA): thruk_host_availability (uptime % for a single host: time_up_percent, time_down_percent, time_unreachable_percent and scheduled equivalents), thruk_service_availability (ok/warning/critical/unknown % for a single service), thruk_hostgroup_availability (availability for all hosts or services in a hostgroup, sorted worst-first; type = hosts | services | both), thruk_reliability_report (per host/service reliability metrics, namely MTTR, MTBF and incident counts, derived from the log over a window). The three availability tools accept since/until (Thruk relative or ISO) or a timeperiod shortcut (lastmonth, thismonth, last24hours, lastweek, …).

Read (performance data): thruk_get_perfdata (fetch and parse performance data for a single host or service), thruk_perfdata_snapshot (parsed perfdata for every service matching a filter, in one call), thruk_perfdata_near_threshold (metrics within within_percent % of breaching their warn/crit range; an early-warning signal before an alert fires).
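
To make the perfdata tools more concrete, here is a minimal sketch of parsing a Nagios-style performance data string (label=value[UOM];warn;crit;min;max) and flagging metrics that are close to a threshold. It is not the project's implementation: parse_perfdata and near_threshold are hypothetical helpers, only plain numeric upper thresholds are handled (range syntax such as 10:20 is ignored), and "within N %" is assumed to mean that the value has reached at least (100 - N) % of its first defined threshold without breaching it.

```python
import re
import shlex
from dataclasses import dataclass
from typing import List, Optional

# Numeric value followed by an optional unit of measurement, e.g. "76%" or "12.5ms".
_VALUE_RE = re.compile(r"^(-?\d+(?:\.\d+)?)(.*)$")


@dataclass
class Metric:
    label: str
    value: float
    unit: str
    warn: Optional[float]
    crit: Optional[float]


def _to_float(text: str) -> Optional[float]:
    """Return a float for a plain numeric threshold, else None (empty or range syntax)."""
    try:
        return float(text)
    except ValueError:
        return None


def parse_perfdata(perfdata: str) -> List[Metric]:
    """Parse 'label=value[UOM];warn;crit;min;max' items separated by whitespace."""
    metrics = []
    # shlex handles single-quoted labels that contain spaces, e.g. 'disk /'.
    for token in shlex.split(perfdata):
        label, sep, rest = token.rpartition("=")
        if not sep:
            continue
        fields = rest.split(";")
        match = _VALUE_RE.match(fields[0])
        if not match:
            continue  # non-numeric value, e.g. "U" for unknown
        warn = _to_float(fields[1]) if len(fields) > 1 else None
        crit = _to_float(fields[2]) if len(fields) > 2 else None
        metrics.append(Metric(label, float(match.group(1)), match.group(2), warn, crit))
    return metrics


def near_threshold(metrics: List[Metric], within_percent: float) -> List[str]:
    """Labels of metrics still below their first threshold but within within_percent % of it."""
    flagged = []
    for m in metrics:
        threshold = m.warn if m.warn is not None else m.crit
        if threshold is None or threshold <= 0:
            continue
        lower = threshold * (1 - within_percent / 100)
        if lower <= m.value < threshold:
            flagged.append(m.label)
    return flagged
```

With the string 'disk /'=76%;80;90;0;100 load1=0.50;4.00;8.00 and within_percent=10, only the disk metric is flagged, because 76 lies between 72 (90 % of the warn threshold) and 80.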

Write (downtime management): thruk_schedule_downtime (host/service), thruk_schedule_host_services_downtime (all services of a host), thruk_schedule_propagated_host_downtime (parent+children), thruk_schedule_hostgroup_downtime, thruk_schedule_servicegroup_downtime, thruk_delete_downtime, thruk_delete_active_downtimes, thruk_delete_downtimes_by_filter.

Write (problem handling): thruk_acknowledge, thruk_bulk_acknowledge (acknowledge multiple hosts/services in one call), thruk_remove_acknowledgement, thruk_recheck, thruk_add_comment, thruk_delete_comment, thruk_checks (enable/disable active checks for a host or service), thruk_notifications (enable/disable host or service notifications, with optional cascade to all services of a host).

Escape hatches: thruk_query (raw call to any REST endpoint), thruk_run_background_query (long-running endpoint via Thruk's ?background=1 mechanism with automatic job polling).

All list-style tools share a consistent limit / offset / sort / columns contract. By default they return a tight subset of columns (~10 fields per row) to keep LLM token consumption low. Pass columns="" to opt out and receive every column the Thruk row contains.
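
As an illustration of that contract, the sketch below shows one way the shared arguments could be turned into REST query parameters. build_list_params, DEFAULT_HOST_COLUMNS and the default limit of 100 are assumptions made for this example, not names or values taken from the project; only the columns semantics (omitted means the default subset, an empty string means every column) follows the description above.

```python
from typing import Dict, Optional, Sequence

# Hypothetical default subset; the real per-tool defaults are not listed in this README.
DEFAULT_HOST_COLUMNS = ("name", "state", "plugin_output", "last_check")


def build_list_params(
    limit: int = 100,  # assumed default, for illustration only
    offset: int = 0,
    sort: Optional[str] = None,
    columns: Optional[str] = None,
    default_columns: Sequence[str] = DEFAULT_HOST_COLUMNS,
) -> Dict[str, str]:
    """Translate the shared list-tool arguments into query parameters."""
    params = {"limit": str(limit), "offset": str(offset)}
    if sort:
        params["sort"] = sort
    if columns is None:
        # Caller did not choose: fall back to the tight default subset.
        params["columns"] = ",".join(default_columns)
    elif columns.strip():
        # Caller asked for specific columns.
        params["columns"] = columns
    # columns == "": send no columns parameter, so every column comes back.
    return params
```

Calling build_list_params(columns="") therefore yields only limit and offset, which is the opt-out case.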

5 MCP Resources

URI templates that MCP clients with a resource browser (Claude Desktop, VS Code, ...) can "open" like files:

| URI | Content |
| --- | --- |
| thruk://hosts/{name} | Full host JSON |
| thruk://services/{host}/{service} | Full service JSON |
| thruk://hostgroups/{name} | Host group config + members |
| thruk://problems | Current unhandled problems (hosts + services) |
| thruk://stats | Aggregated host/service stats (cached) |

3 MCP Prompts

Pre-canned workflows the user can invoke as a slash-command in the MCP client UI:

| Prompt | Arguments | Purpose |
| --- | --- | --- |
| investigate_alert | host, optional service | 7-step incident triage |
| schedule_maintenance | target, duration_minutes, kind | Safe downtime workflow with confirmation |
| diagnose_flapping | host, service | Root-cause a flapping service (uses thruk_flap_summary) |

Robustness

  • Connection retries: httpx.AsyncHTTPTransport(retries=3) retries failed connection attempts (DNS failures, connection refusals, TLS handshake errors).

  • HTTP retries with backoff — 5xx and 429 responses are retried up to 3 times with exponential backoff + jitter (cap 5 s).

  • Opt-in TTL cache — slow-moving endpoints (/sites, /processinfo, /hosts/stats, /services/stats, /contacts, /timeperiods, ...) are cached in-process for 15 s. Any tool can request caching via cache_ttl= on the underlying client. This absorbs the burst of identical calls an LLM agent typically issues across a multi-tool turn.

  • Pagination helper: ThrukClient.get_all() is an async generator that iterates pages of 500 rows up to a configurable hard limit (default 50 000), so internal callers can scan entire backends without manual offset math.

  • Long-running queries — the thruk_run_background_query tool wraps Thruk's ?background=1 flow and polls /thruk/jobs/<id>/output until the job completes (5 min default timeout).
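
The HTTP retry policy described in this list (5xx and 429, up to 3 retries, exponential backoff with jitter capped at 5 s) can be sketched as follows. This is an illustration under stated assumptions, not the project's code: request_with_retries, backoff_delay and is_retryable are hypothetical names, the 0.5 s base delay and the "full jitter" variant are guesses, and send stands in for whatever coroutine actually performs the HTTP call.

```python
import asyncio
import random
from typing import Awaitable, Callable, Tuple


def is_retryable(status: int) -> bool:
    """5xx and 429 responses are considered transient."""
    return status == 429 or 500 <= status <= 599


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 5.0) -> float:
    """Exponential backoff with full jitter: uniform in [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))


async def request_with_retries(
    send: Callable[[], Awaitable[Tuple[int, str]]],
    max_retries: int = 3,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> Tuple[int, str]:
    """Call send() and retry transient failures up to max_retries times."""
    attempt = 0
    while True:
        status, body = await send()
        if not is_retryable(status) or attempt >= max_retries:
            # Success, a non-transient error, or the retry budget is exhausted.
            return status, body
        await sleep(backoff_delay(attempt))
        attempt += 1
```

The sleep parameter is injectable only so the loop can be exercised in tests without waiting; with three retries a persistently failing endpoint is called four times in total.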

Environment variables

Connection

| Variable | Default | Description |
| --- | --- | --- |
| THRUK_BASE_URL | http://localhost/thruk | Thruk URL (no trailing slash) |
| THRUK_API_KEY | (required) | X-Thruk-Auth-Key header |
| THRUK_AUTH_USER | | Impersonation user (superuser key only) |
| THRUK_VERIFY_SSL | true | Set false for self-signed certs |
| THRUK_TIMEOUT | 30 | HTTP timeout in seconds |
| THRUK_DEFAULT_BACKENDS | | CSV of default backend names (federated Thruk) |
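
For readers who want to see how these connection variables fit together, here is a small sketch that reads them from the environment and derives the request headers. ThrukSettings and load_settings are hypothetical names, not the project's actual configuration code. The X-Thruk-Auth-User header is the one Thruk uses for impersonation with superuser keys; treating "0" and "no" as false in addition to "false" is an assumption of this example.

```python
import os
from dataclasses import dataclass
from typing import Dict, List, Mapping, Optional


@dataclass(frozen=True)
class ThrukSettings:
    base_url: str
    api_key: str
    auth_user: Optional[str]
    verify_ssl: bool
    timeout: float
    default_backends: List[str]

    def headers(self) -> Dict[str, str]:
        """HTTP headers that authenticate each REST call."""
        headers = {"X-Thruk-Auth-Key": self.api_key}
        if self.auth_user:
            # Impersonation is only honoured by Thruk for superuser keys.
            headers["X-Thruk-Auth-User"] = self.auth_user
        return headers


def load_settings(env: Mapping[str, str] = os.environ) -> ThrukSettings:
    """Build settings from environment variables, applying the documented defaults."""
    api_key = env.get("THRUK_API_KEY", "")
    if not api_key:
        raise ValueError("THRUK_API_KEY is required")
    backends = [
        b.strip() for b in env.get("THRUK_DEFAULT_BACKENDS", "").split(",") if b.strip()
    ]
    return ThrukSettings(
        # Strip a trailing slash so paths can be appended safely.
        base_url=env.get("THRUK_BASE_URL", "http://localhost/thruk").rstrip("/"),
        api_key=api_key,
        auth_user=env.get("THRUK_AUTH_USER") or None,
        verify_ssl=env.get("THRUK_VERIFY_SSL", "true").strip().lower()
        not in {"false", "0", "no"},
        timeout=float(env.get("THRUK_TIMEOUT", "30")),
        default_backends=backends,
    )
```

Passing a plain dict instead of os.environ makes the loader easy to test without touching the real environment.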

Security / multi-tenant (v0.6)

| Variable | Default | Description |
| --- | --- | --- |
| THRUK_READ_ONLY | false | Strip every write tool (ack, downtime, recheck, ...) |
| THRUK_ENABLED_TOOLS | | Allowlist of tool names. CSV with fnmatch wildcards. Empty = all |
| THRUK_AUDIT_LOG | true | Emit one JSON audit line on stderr per write tool invocation |
| THRUK_MAX_CONCURRENT | 0 | Cap of concurrent in-flight HTTP requests. 0 = unlimited |
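
The two filtering settings, THRUK_READ_ONLY and THRUK_ENABLED_TOOLS, can be pictured as a filter applied to the tool list before it is registered. The sketch below is a guess at the general shape, not the server's code: filter_tools is a hypothetical function, and WRITE_TOOL_PATTERNS is an incomplete, assumed list of write-tool patterns chosen for illustration. It uses fnmatchcase from the standard library so that matching stays case-sensitive on every platform.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List

# Assumed, partial list of write-tool patterns; the real server knows its own write tools.
WRITE_TOOL_PATTERNS = (
    "thruk_acknowledge",
    "thruk_schedule_*",
    "thruk_delete_*",
    "thruk_recheck",
)


def filter_tools(
    all_tools: Iterable[str], enabled_csv: str = "", read_only: bool = False
) -> List[str]:
    """Apply the allowlist first, then strip write tools when read-only is set."""
    patterns = [p.strip() for p in enabled_csv.split(",") if p.strip()]
    selected = []
    for name in all_tools:
        # An empty allowlist means "all tools".
        if patterns and not any(fnmatchcase(name, p) for p in patterns):
            continue
        if read_only and any(fnmatchcase(name, p) for p in WRITE_TOOL_PATTERNS):
            continue
        selected.append(name)
    return selected
```

Note that in this sketch read-only wins over the allowlist: a write tool named in THRUK_ENABLED_TOOLS is still dropped when THRUK_READ_ONLY is true.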

Security

  • Read-only mode — set THRUK_READ_ONLY=true to remove every write tool (thruk_acknowledge, thruk_schedule_*_downtime, thruk_recheck, thruk_delete_*, thruk_run_background_query) from the MCP server. The LLM literally cannot mutate monitoring state. Use this for general-purpose agents that should only observe.

  • Tool allowlist: THRUK_ENABLED_TOOLS=thruk_list_*,thruk_problems,thruk_stats restricts the exposed surface to the listed tools (fnmatch wildcards supported). Useful when fronting multiple LLM clients with the same gateway but different scopes.

  • Audit log — every write tool invocation emits one JSON line on thruk_mcp.audit (stderr by default):

    {"ts":"2026-05-17T22:00:00+00:00","tool":"thruk_acknowledge","user":"alice",
     "args":{"host":"srv01","comment":"investigating"},"target":"srv01","status":"ok"}

    Disable with THRUK_AUDIT_LOG=false. Sensitive keys (api_key, password, token) are redacted as *** before logging.

  • Rate limit: THRUK_MAX_CONCURRENT=8 caps in-flight HTTP requests with an asyncio.Semaphore. Combined with the v0.3 TTL cache, this protects the Thruk core from an LLM that loops on tools or chains them aggressively.
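
The concurrency cap can be illustrated with a small wrapper around asyncio.Semaphore. This is a sketch only: ConcurrencyLimiter and demo are hypothetical names, and the simulated request is a short sleep rather than a real HTTP call. A value of 0 disables the limit, mirroring the documented default.

```python
import asyncio
from contextlib import asynccontextmanager
from typing import AsyncIterator, Optional


class ConcurrencyLimiter:
    """Caps the number of requests in flight; max_concurrent=0 means unlimited."""

    def __init__(self, max_concurrent: int = 0) -> None:
        self._sem: Optional[asyncio.Semaphore] = (
            asyncio.Semaphore(max_concurrent) if max_concurrent > 0 else None
        )

    @asynccontextmanager
    async def slot(self) -> AsyncIterator[None]:
        if self._sem is None:
            yield  # unlimited: never wait
            return
        async with self._sem:
            yield  # holds one slot until the caller's block exits


async def demo(max_concurrent: int, n_tasks: int = 20) -> int:
    """Run n_tasks fake requests and return the peak number in flight."""
    limiter = ConcurrencyLimiter(max_concurrent)  # created inside the running loop
    in_flight = 0
    peak = 0

    async def fake_request() -> None:
        nonlocal in_flight, peak
        async with limiter.slot():
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # stands in for the HTTP round trip
            in_flight -= 1

    await asyncio.gather(*(fake_request() for _ in range(n_tasks)))
    return peak
```

Running asyncio.run(demo(8)) shows that no more than eight simulated requests overlap, while demo(0) lets all of them run at once.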

Development

pip install -e ".[dev]"
pre-commit install                              # one-time setup of git hooks

ruff check src tests && ruff format src tests   # lint + format
mypy src                                        # type-check
pytest -v --cov=thruk_mcp --cov-fail-under=80   # tests with coverage gate

Conventions:

  • Conventional Commits (feat:, fix:, chore:, docs:, refactor:, test:).

  • No direct push to main: branch → PR → squash merge.

  • Any new tool must come with a respx-mocked unit test in tests/test_tools.py and an entry in catalog/tools.json (Docker MCP Registry contract).

  • CI gate: ruff, ruff format --check, mypy, pytest with 80 % coverage minimum.

References

Project docs

  • CHANGELOG.md — what changed in each release.

  • UPGRADING.md — per-version migration notes.

  • SUPPORT.md — supported Python / Thruk / MCP-client versions, security policy, release cadence.

  • CONTRIBUTING.md — dev setup, PR conventions, tool / env-var contribution checklists.

License

MIT — see LICENSE.
