
Triage MCP Server

by fazalrshah

🩺 Triage: a self-healing ops MCP for any Dockerized service

Let an AI agent (or a human) check, diagnose, and recover a service without a host shell.

Most "give the agent ops powers" setups are bad: either you hand the model a raw shell (and it can then roam the whole box and conflate unrelated subsystems), or you wire up dashboards a model can't read. Triage is the third option:

A small MCP server that exposes a handful of health, diagnose, and recover tools. Each tool returns raw evidence AND a plain-English translation, along with a suggested action and whether the fix is safe to auto-apply. The agent acts only through these tools; it never touches the host directly.
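To make the dual output concrete, here is a minimal sketch of one diagnosed issue as a Python dataclass. The four fields mirror the description above (raw evidence, plain-English text, suggested action, auto-fix flag), but the exact schema Triage returns is an assumption, and process_down_issue is a hypothetical runbook rule, not part of the project.

```python
from dataclasses import asdict, dataclass
from typing import Any


@dataclass
class Issue:
    """One diagnosed problem, reported twice: once for the agent, once for the human."""

    raw: dict[str, Any]   # structured evidence the agent can branch on
    layman: str           # one plain-English sentence for the human
    action: str           # suggested next step (here, a tool call)
    can_auto_fix: bool    # True only if the fix is in the "auto-fix safe" class


def process_down_issue(label: str, proc: str, status: str) -> Issue:
    """Hypothetical runbook rule: an in-container process that is not running."""
    return Issue(
        raw={"process": proc, "status": status},
        layman=f"{label}'s {proc} process isn't running ({status}). I'll restart it.",
        action=f"triage_restart_process({proc!r})",
        can_auto_fix=True,  # restarting one process touches no data
    )


if __name__ == "__main__":
    # asdict() gives a plain dict, ready to serialize as a tool result.
    print(asdict(process_down_issue("My App", "backend", "stopped")))
```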

The policy that makes it safe

| Class | Tools | Behaviour |
|---|---|---|
| Auto-fix safe | triage_restart_process, triage_recover | An agent may run these on its own and report afterwards. Infra only; no data touched. |
| Ask before risky | triage_apply(confirm=true) | Anything that could lose data or change external state. Dry-run unless confirm=true. |
| Can't self-fix | (reported) | Diagnosed and handed to the human with exact steps, never faked. |
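The policy can be enforced in code rather than in a prompt. The sketch below assumes a hypothetical POLICY table and a triage_apply that runs TRIAGE_RISKY_CMD without a shell (shlex.split plus subprocess.run); it illustrates the dry-run-unless-confirm pattern and is not Triage's actual implementation.

```python
import shlex
import subprocess
from enum import Enum


class FixClass(Enum):
    AUTO_FIX_SAFE = "auto-fix safe"        # agent may act, then report
    ASK_BEFORE_RISKY = "ask before risky"  # dry-run unless confirmed


# Hypothetical mapping mirroring the policy table; anything unlisted is not auto-safe.
POLICY = {
    "triage_restart_process": FixClass.AUTO_FIX_SAFE,
    "triage_recover": FixClass.AUTO_FIX_SAFE,
    "triage_apply": FixClass.ASK_BEFORE_RISKY,
}


def may_run_unattended(tool: str) -> bool:
    """True only for tools explicitly classed as auto-fix safe."""
    return POLICY.get(tool) is FixClass.AUTO_FIX_SAFE


def triage_apply(risky_cmd: str, risky_desc: str, confirm: bool = False) -> dict:
    """Dry-run by default; executes the configured command only when confirm is True."""
    if not risky_cmd:
        return {"ran": False, "dry_run": False, "layman": "No risky recovery is configured."}
    if confirm is not True:  # only a literal True counts, not a truthy string
        return {
            "ran": False,
            "dry_run": True,
            "would_run": risky_cmd,
            "layman": f"This would {risky_desc}. Call again with confirm=true to go ahead.",
        }
    # No shell: the configured string is split into argv and executed directly.
    result = subprocess.run(shlex.split(risky_cmd), capture_output=True, text=True)
    return {
        "ran": True,
        "dry_run": False,
        "returncode": result.returncode,
        "stdout_tail": result.stdout[-2000:],  # keep only the last 2000 characters
        "stderr_tail": result.stderr[-2000:],
    }
```

Because the command is not passed through a shell, pipes or && in the configured command would not work in this sketch; whether that trade-off is acceptable depends on the recovery you configure.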

The dual raw + layman output is the differentiator: the agent gets structured data to act on, and the human gets a sentence they can actually understand ("Postiz's API engine isn't running: the known cold-boot hiccup. I'll restart it.").


Tools

| Tool | Kind | What it does |
|---|---|---|
| triage_health() | read | Containers + configured in-container processes + optional dependency ping. |
| triage_diagnose() | read | Health check matched to a runbook → issues with raw + plain-English + action + can_auto_fix. |
| triage_logs(lines) | read | Raw tail of the service log. |
| triage_restart_process(name) | safe | Restart one in-container process (pm2). |
| triage_recover() | safe | Recreate the service container from compose. No volumes or data touched. |
| triage_apply(confirm) | risky | Dry-run by default; runs the configured risky command only on confirm=true. |
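The split between triage_restart_process and triage_recover can be sketched as two command builders plus an escalation rule. The exact commands are assumptions: a pm2 restart through docker compose exec, and a recreate through docker compose up -d --force-recreate --no-deps, which replaces the container without deleting its volumes. All function names here are hypothetical.

```python
def restart_process_cmd(compose_file: str, service: str, proc: str) -> list[str]:
    """Assumed approach: ask pm2 inside the running container to restart one process."""
    return ["docker", "compose", "-f", compose_file, "exec", "-T", service,
            "pm2", "restart", proc]


def recover_cmd(compose_file: str, service: str) -> list[str]:
    """Recreate only this service's container; compose does not delete its volumes."""
    return ["docker", "compose", "-f", compose_file, "up", "-d",
            "--force-recreate", "--no-deps", service]


def choose_fix(state: str, health: str, proc_status: dict[str, str]) -> list[str]:
    """Escalation rule: a bad container needs a recreate; a dead process needs a restart."""
    if state != "running" or health == "unhealthy":
        return ["triage_recover"]
    # pm2 reports a healthy process as "online"; anything else gets restarted.
    return [f"triage_restart_process({name!r})"
            for name, status in sorted(proc_status.items()) if status != "online"]
```

Returning argv lists instead of running them keeps the sketch testable; a real tool would pass them to subprocess.run.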

Configure (zero code changes)

Everything is env-driven; point it at any compose-managed service:

TRIAGE_COMPOSE=/path/to/docker-compose.yaml   # compose file
TRIAGE_SERVICE=app                            # the main container/service name
TRIAGE_LABEL="My App"                         # friendly name used in messages
TRIAGE_PROCS=backend,worker                   # optional: in-container processes to watch
TRIAGE_PROC_MGR=pm2                           # "pm2" | "none"
TRIAGE_DB_PING="docker exec app-db pg_isready"  # optional: rc 0 = dependency healthy
TRIAGE_RISKY_CMD=""                           # optional: a guarded recovery (clear a queue, etc.)
TRIAGE_RISKY_DESC="clear the stuck job queue"
TRIAGE_PORT=9500

See .env.example.
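A minimal sketch of reading these settings into a typed config, plus the dependency ping where exit code 0 means healthy. Which variables are required, and the defaults chosen here (TRIAGE_PROC_MGR falling back to none, TRIAGE_LABEL to the service name), are assumptions rather than documented behaviour; load_config and dependency_healthy are hypothetical names.

```python
import os
import shlex
import subprocess
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class TriageConfig:
    compose: str           # TRIAGE_COMPOSE: path to the compose file
    service: str           # TRIAGE_SERVICE: main container/service name
    label: str             # TRIAGE_LABEL: friendly name used in messages
    procs: tuple[str, ...] # TRIAGE_PROCS: in-container processes to watch
    proc_mgr: str          # TRIAGE_PROC_MGR: "pm2" or "none"
    db_ping: str           # TRIAGE_DB_PING: command whose exit code 0 means healthy
    risky_cmd: str         # TRIAGE_RISKY_CMD: guarded recovery, empty means none
    risky_desc: str        # TRIAGE_RISKY_DESC
    port: int              # TRIAGE_PORT


def load_config(env: Optional[Mapping[str, str]] = None) -> TriageConfig:
    """Build the config from environment variables (defaults are assumptions)."""
    env = os.environ if env is None else env
    service = env["TRIAGE_SERVICE"]  # required in this sketch
    proc_mgr = env.get("TRIAGE_PROC_MGR", "none")
    if proc_mgr not in ("pm2", "none"):
        raise ValueError(f"TRIAGE_PROC_MGR must be 'pm2' or 'none', got {proc_mgr!r}")
    return TriageConfig(
        compose=env["TRIAGE_COMPOSE"],  # required in this sketch
        service=service,
        label=env.get("TRIAGE_LABEL", service),
        procs=tuple(p.strip() for p in env.get("TRIAGE_PROCS", "").split(",") if p.strip()),
        proc_mgr=proc_mgr,
        db_ping=env.get("TRIAGE_DB_PING", ""),
        risky_cmd=env.get("TRIAGE_RISKY_CMD", ""),
        risky_desc=env.get("TRIAGE_RISKY_DESC", "run the configured recovery"),
        port=int(env.get("TRIAGE_PORT", "9500")),
    )


def dependency_healthy(cfg: TriageConfig, timeout: float = 10.0) -> Optional[bool]:
    """Run TRIAGE_DB_PING: True on exit code 0, False otherwise, None if not configured."""
    if not cfg.db_ping:
        return None
    try:
        proc = subprocess.run(shlex.split(cfg.db_ping), capture_output=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired):
        return False  # a missing binary or a hung ping counts as unhealthy
    return proc.returncode == 0
```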

Run

pip install -r requirements.txt
python3 triage.py            # serves an MCP over streamable-http on TRIAGE_PORT

Register it with your agent runtime (any MCP client). For an always-on host service, use the included launchd template com.triage.ops.plist on macOS, or adapt it to a systemd unit on Linux.

Hard-won lessons baked in

  • Agents in a container can't see host processes. Give them status tools, not a shell. With shell access a model conflates unrelated subsystems and reports false negatives. Tools keep it honest.

  • Two reports, always. Structured raw data for the agent to branch on; a one-sentence layman summary for the human. A health check the human can't read is half a tool.

  • Encode the safe/risky boundary in the tool, not the prompt. "Don't clear the queue without asking" in a system prompt is a suggestion; a confirm=true-gated dry-run is a guarantee.

  • The output of docker compose ps --format json varies by Compose version (NDJSON in some, a single JSON array in others), so handle both.

  • Recover ≠ restart. A dead process needs a restart; an unhealthy container needs a recreate. Separate tools let the agent escalate correctly.
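Handling both docker compose ps --format json shapes can be as simple as checking the first non-blank character. A sketch, assuming the Service, State, and Health keys that Compose v2 emits (verify against your version); parse_compose_ps and summarize are hypothetical helpers.

```python
import json
from typing import Any


def parse_compose_ps(output: str) -> list[dict[str, Any]]:
    """Accept either a single JSON array or one JSON object per line (NDJSON)."""
    text = output.strip()
    if not text:
        return []  # no containers, or the service is not up at all
    if text.startswith("["):
        return [row for row in json.loads(text) if isinstance(row, dict)]
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def summarize(rows: list[dict[str, Any]], service: str) -> dict[str, str]:
    """Pick out one service's state and health; 'missing' if it is not listed."""
    for row in rows:
        if row.get("Service") == service:
            return {"state": row.get("State", ""), "health": row.get("Health", "")}
    return {"state": "missing", "health": ""}
```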

Built by

Built by KodeKing · author Fazal Shah. We build local, private, multi-agent AI systems for teams who can't send their data to the cloud. Issues and PRs welcome.

License

MIT; see LICENSE.
