
🛰️ AIOps MCP: Multi-Agent Incident Intelligence

Production incidents in 10 seconds, not 60 minutes. A drop-in MCP server and dashboard that turns any LLM client (Claude, Claude Code, ChatGPT, Cursor, Continue) into an autonomous incident-response copilot.



Why AIOps MCP?

Every production incident starts the same way: an engineer opens five tabs at 2 a.m. (CloudWatch, Grafana, GitLab, Confluence, the customer DB) and spends 40-60 minutes gathering context before they can even begin fixing the problem. For a P1, every one of those minutes costs $1,000-$10,000 in lost revenue.

We built AIOps MCP for engineers who are tired of being the human glue between observability tools. It treats incident investigation the way Slack treats messaging or k8s treats containers: as something the platform should handle, not something humans should do by hand. It is inspired by the way Resolve.ai and pager-replacement tooling are reshaping on-call, but built MCP-native so it speaks the same protocol every modern LLM client already speaks.

Under the hood: six specialized agents, an LLM-driven supervisor, an opinionated synthesis prompt, and a topology engine that knows what depends on what.


What You Get

| Capability | Description |
| --- | --- |
| 🤖 6 specialized agents | Log, Infra, Change, Docs, Impact, Audit; run in parallel, not in sequence |
| 🧠 MCP-native | Plug into Claude Desktop, Claude Code, Cursor, Continue, or any MCP client over stdio or HTTP |
| 🔌 Multi-LLM | Claude, GPT, Gemini, local models via OpenRouter: pick your brain, we coordinate |
| 📊 MCP Dashboard | Chat + live agent traces + topology + log viewer in one tab, like Claude.ai for incidents |
| 🕸️ App topology | Interactive service graph with blast-radius propagation for connected-impact analysis |
| 📎 Manual + auto logs | Paste, upload, or auto-pull from CloudWatch / Datadog / Splunk / Loki / Grafana |
| 🧾 Full audit trail | Every agent step, LLM prompt, and one-click action logged; compliance-ready |
| 🎫 Auto-Jira | Incident, RCA, evidence, action log: created and updated by the Audit Agent |
| 🚀 One-click actions | Rollback / restart / scale / flag-flip: vetted, parameterized, reversible |
| ⚙️ 8 env vars total | Production deployment with mocks-by-default: no creds, no problem |
| 🐳 Docker-ready | docker compose up and you have the full stack |
| 🔐 Zero-trust by default | Per-agent secrets, PII scrubbing on LLM prompts, immutable audit log |
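
To make "PII scrubbing on LLM prompts" concrete, here is a minimal sketch of a regex-based scrubber. It is not taken from the AIOps MCP source: the `scrub` function, the `PATTERNS` table, and the two patterns are assumptions for illustration, and a production scrubber would need to cover many more PII types.

```python
import re

# Hypothetical patterns for illustration only; not the project's actual rules.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "IPV4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def scrub(text: str) -> str:
    """Replace every match of each pattern with a <LABEL> placeholder."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"<{label}>", text)
    return text


print(scrub("login failed for jane@example.com from 10.0.4.17"))
# prints: login failed for <EMAIL> from <IPV4>
```

Running the scrubber on log lines before they are embedded in a prompt keeps the placeholder labels readable for the model while the raw values stay out of the LLM request.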


Two Installation Paths

| | MCP Plugin (recommended for LLM users) | Self-hosted CLI (for SREs/platform teams) |
| --- | --- | --- |
| Best for | Solo engineers wiring it into Claude Code / Claude Desktop / Cursor | Teams running AIOps MCP as shared infrastructure |
| Install | claude mcp add aiops -- aiops mcp-stdio | pip install -e . then aiops serve |
| Transport | stdio | HTTP + MCP-over-HTTP + dashboard at :7878 |
| Config | Single .env next to the aiops binary | .env + configs/topology.yaml + Docker |
| Dashboard | Optional (aiops dashboard) | Always on at http://host:7878 |
| Multi-user | Single user | RBAC via Cognito / Okta / OAuth2 |

Pick based on the team you're solving for. Both paths use the same agent engine.


Quick Start (60 seconds)

git clone https://github.com/<you>/aiops-mcp.git
cd aiops-mcp
cp .env.example .env          # leave it empty for full mock mode
pip install -e .
aiops serve                   # MCP + HTTP + dashboard on :7878

Open http://localhost:7878 and ask: "Why is checkout slow?"

Or just Docker

docker compose up

The Six Agents

Grouped by what they actually do in an incident:

Observe (data gatherers)

| Agent | Sources | What it answers |
| --- | --- | --- |
| 🪵 Log Agent | CloudWatch, Datadog, Splunk, ELK, Loki | "What errors fired in the last 30 min?" |
| 📊 Infra Agent | Grafana, Prometheus, Datadog Metrics, CloudWatch | "Is the DB at 98% connections? Is upstream healthy?" |
| 🚢 Change Agent | GitHub, GitLab, ArgoCD, Jenkins | "Who deployed what, when?" |

Reason (context + impact)

| Agent | Sources | What it answers |
| --- | --- | --- |
| 📚 Docs Agent | Bedrock KB / pgvector / Pinecone over runbooks, postmortems, ADRs | "Have we seen this before? What's the runbook?" |
| 💸 Impact Agent | DynamoDB, Snowflake, BigQuery, Mixpanel | "Who's affected? How much revenue is at risk?" |

Act (close the loop)

| Agent | Sources | What it answers |
| --- | --- | --- |
| 🧾 Audit Agent | Jira, ServiceNow, Linear | "Create the ticket, attach the RCA, link past incidents." |


MCP Tools Exposed

| Tool | Purpose |
| --- | --- |
| investigate_incident | Full multi-agent investigation; returns RCA + suggested actions |
| query_logs | Search logs in CloudWatch / Datadog / Splunk / Loki / ELK |
| query_metrics | PromQL / Grafana / Datadog Metrics query |
| attach_log | Manually attach a log blob (paste or upload) to an active investigation |
| get_topology | Return service dependency graph + health |
| correlate_impact | Given a service, list downstream impact + affected customers |
| recent_deploys | List deploys / merges in a window |
| find_runbook | RAG search over runbooks and past postmortems |
| create_jira_ticket | Create / update Jira with full RCA |
| execute_action | One-click remediation (rollback / restart / scale / flag-flip) |

Every tool is callable directly from your LLM client; no UI required.
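
For clients that talk to the server without an MCP SDK, a tool call is a JSON-RPC 2.0 request using MCP's standard `tools/call` method. The sketch below only builds the request body. The `build_tool_call` helper and the `query` argument name are assumptions, since the actual input schema of `investigate_incident` is not documented here; a client can ask the server for the real schema with `tools/list`.

```python
import json
from itertools import count

# JSON-RPC uses the id to match each response to its request.
_request_ids = count(1)


def build_tool_call(tool_name: str, arguments: dict) -> dict:
    """Build a JSON-RPC 2.0 request body for MCP's `tools/call` method."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


request = build_tool_call(
    "investigate_incident",
    {"query": "Why is checkout slow?"},  # hypothetical argument name
)
print(json.dumps(request, indent=2))
```

The same body shape works for every tool in the table; only `name` and `arguments` change.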


The MCP Dashboard

A single-tab web UI inspired by Resolve.ai and Claude.ai for incident response:

| Surface | What it does |
| --- | --- |
| 💬 Chat panel | Natural-language conversation with the orchestrator |
| 🧩 Agent trace | Live cards showing each agent's progress, findings, and citations |
| 🕸️ Topology graph | Interactive node graph; click a service to see blast radius |
| 📎 Log dropzone | Paste / upload / fetch logs with timestamp alignment |
| ⏱️ Incident timeline | Every step with timestamps, audit-ready |
| 🎯 Action panel | One-click rollback / scale / flag-flip with explicit confirmation |

Live demo (self-host): http://localhost:7878 after aiops serve.
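
Blast radius can be read as a graph traversal: start at the failing service and walk outward to everything that depends on it. The sketch below uses a breadth-first search over a made-up dependency map. It illustrates the idea and is not the topology engine's actual implementation; the service names and the `DEPENDENTS` structure are hypothetical.

```python
from collections import deque

# Hypothetical topology: each service maps to the services that depend on it.
DEPENDENTS = {
    "postgres": ["orders", "inventory"],
    "orders": ["checkout"],
    "inventory": ["checkout"],
    "checkout": ["web"],
    "web": [],
}


def blast_radius(graph: dict, failing: str) -> dict:
    """Return {service: hops} for every service downstream of `failing`."""
    hops = {failing: 0}
    queue = deque([failing])
    while queue:
        current = queue.popleft()
        for dependent in graph.get(current, []):
            if dependent not in hops:  # first visit is the shortest path in BFS
                hops[dependent] = hops[current] + 1
                queue.append(dependent)
    del hops[failing]  # the failing service is the origin, not part of the radius
    return hops


print(blast_radius(DEPENDENTS, "postgres"))
# prints: {'orders': 1, 'inventory': 1, 'checkout': 2, 'web': 3}
```

The hop count gives a natural ordering for the UI: services one hop away are the most likely to be affected first.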


Architecture

            ┌──────────────────────────────────────────────────────┐
            │  LLM CLIENT (Claude Code / Desktop / ChatGPT / ...)  │
            └────────────────────────┬─────────────────────────────┘
                                     │  MCP (stdio or HTTP)
                                     ▼
            ┌──────────────────────────────────────────────────────┐
            │              AIOps MCP SERVER  (:7878)               │
            │   ┌──────────────────────────────────────────────┐   │
            │   │            SUPERVISOR ORCHESTRATOR           │   │
            │   │   plans → fans out → synthesizes → audits    │   │
            │   └──┬─────────┬─────────┬────────┬────────┬─────┘   │
            │      ▼         ▼         ▼        ▼        ▼         │
            │   ┌─────┐ ┌──────┐ ┌──────┐ ┌──────┐ ┌──────┐        │
            │   │ LOG │ │INFRA │ │CHANGE│ │ DOCS │ │IMPACT│        │
            │   └──┬──┘ └──┬───┘ └──┬───┘ └──┬───┘ └──┬───┘        │
            │      │       │        │        │        │            │
            │      ▼       ▼        ▼        ▼        ▼            │
            │   ┌──────────────────────────────────────────┐       │
            │   │   ADAPTERS (mock-by-default, swappable)  │       │
            │   └──────────────────────────────────────────┘       │
            │      │       │        │        │        │            │
            │      ▼       ▼        ▼        ▼        ▼            │
            │   CloudWatch Grafana GitHub  Vector   Snowflake      │
            │   Datadog   Promet. GitLab  pgvector  BigQuery       │
            │   Splunk    Datadog ArgoCD  RunbookKB DynamoDB       │
            │                                                      │
            │                          ▼                           │
            │            ┌─────────────────────────┐               │
            │            │   SYNTHESIS ENGINE      │               │
            │            │   (Claude Opus 4.7)     │               │
            │            └────────────┬────────────┘               │
            │                         ▼                            │
            │            ┌─────────────────────────┐               │
            │            │   AUDIT AGENT → Jira    │               │
            │            └─────────────────────────┘               │
            └──────────────────────────────────────────────────────┘
                                     │
                                     ▼
                  ┌──────────────────────────────────┐
                  │     MCP DASHBOARD (web UI)       │
                  │   Chat · Trace · Topology · Logs │
                  └──────────────────────────────────┘

You pick the model; AIOps MCP handles coordination.
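
The "fans out" step in the diagram can be pictured as concurrent tasks gathered by the supervisor. The sketch below is an assumption about the general shape, not the code in server/orchestrator.py: `run_agent` is a stand-in that sleeps instead of querying a backend, and the per-agent timeout value is made up.

```python
import asyncio


async def run_agent(name: str, delay: float) -> dict:
    """Stand-in for one agent; the sleep simulates a slow backend query."""
    await asyncio.sleep(delay)
    return {"agent": name, "finding": f"{name} finished"}


async def investigate(timeout: float = 5.0) -> list:
    # Hypothetical per-agent latencies, in seconds.
    agents = {"log": 0.03, "infra": 0.02, "change": 0.01, "docs": 0.02, "impact": 0.01}
    tasks = [asyncio.wait_for(run_agent(name, delay), timeout) for name, delay in agents.items()]
    # return_exceptions=True keeps one failed or timed-out agent from sinking the rest.
    results = await asyncio.gather(*tasks, return_exceptions=True)
    return [r for r in results if not isinstance(r, Exception)]


findings = asyncio.run(investigate())
print([f["agent"] for f in findings])
# prints: ['log', 'infra', 'change', 'docs', 'impact']
```

Because the agents run concurrently, total wall time tracks the slowest agent rather than the sum of all five, and `asyncio.gather` returns results in the order the tasks were passed in, so the synthesis step sees a stable ordering.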


Configuration: ~8 env vars total

All config is via environment variables. Defaults work with mock data so you can run it instantly.

| Variable | Required | Purpose |
| --- | --- | --- |
| ANTHROPIC_API_KEY | for real LLM | Supervisor + Synthesis (Claude Opus 4.7) |
| AIOPS_PORT | no | HTTP / MCP port; default 7878 |
| AIOPS_DATA_DIR | no | SQLite, uploads, topology cache; default ./data |
| AIOPS_MOCK_MODE | no | Auto-on when no integrations set |
| DATADOG_API_KEY or SPLUNK_TOKEN+SPLUNK_HOST or AWS creds | optional | Pick the log source you have |
| GRAFANA_URL + GRAFANA_TOKEN | optional | Metrics |
| GITHUB_TOKEN or GITLAB_TOKEN | optional | Deploys |
| JIRA_HOST + JIRA_EMAIL + JIRA_TOKEN | optional | Audit ticketing |

That's it. See .env.example for the full annotated list.
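
The "auto-on when no integrations set" rule for AIOPS_MOCK_MODE can be sketched as follows. This is an illustration of the documented behavior, not the contents of server/config.py: the `load_config` function, the set of accepted truthy strings, and the choice to let an explicit AIOPS_MOCK_MODE override auto-detection are assumptions. AWS credentials are left out because the table does not name their variables.

```python
import os

# Env vars from the table above that point at a real integration.
INTEGRATION_VARS = [
    "DATADOG_API_KEY", "SPLUNK_TOKEN", "GRAFANA_URL",
    "GITHUB_TOKEN", "GITLAB_TOKEN", "JIRA_HOST",
]


def load_config(env=os.environ) -> dict:
    """Read settings from `env`, falling back to mock mode when nothing is wired up."""
    configured = [name for name in INTEGRATION_VARS if env.get(name)]
    explicit = env.get("AIOPS_MOCK_MODE")
    if explicit is not None:
        # Assumed truthy spellings; the real parser may accept others.
        mock_mode = explicit.strip().lower() in {"1", "true", "yes"}
    else:
        mock_mode = not configured
    return {
        "port": int(env.get("AIOPS_PORT", "7878")),
        "data_dir": env.get("AIOPS_DATA_DIR", "./data"),
        "mock_mode": mock_mode,
        "integrations": configured,
    }


print(load_config({}))
# prints: {'port': 7878, 'data_dir': './data', 'mock_mode': True, 'integrations': []}
```

Passing a plain dict instead of `os.environ` makes the fallback logic easy to unit-test without touching the real environment.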


Plug Into Any LLM Client

| Client | Setup | Config file |
| --- | --- | --- |
| Claude Desktop | Merge mcpServers block into claude_desktop_config.json | configs/claude-desktop.json |
| Claude Code | claude mcp add aiops -- aiops mcp-stdio | configs/claude-code.json |
| ChatGPT (custom GPT) | Point at http://your-host:7878/openapi.json | configs/chatgpt-openapi-stub.json |
| Cursor | Add to ~/.cursor/mcp.json (same format as Claude Desktop) | configs/claude-desktop.json |
| Continue.dev | Add to ~/.continue/config.json MCP section | configs/claude-desktop.json |
| Custom / any HTTP client | POST to :7878/mcp (JSON-RPC 2.0) | n/a |

Every tool the dashboard uses is also callable from the LLM client. The dashboard is just another MCP consumer.
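
Merging an mcpServers block into an existing Claude Desktop or Cursor config means adding one entry without disturbing the servers already there. The sketch below assumes the common `{"command": ..., "args": [...]}` entry shape and reuses the `aiops mcp-stdio` command shown above; the exact contents of configs/claude-desktop.json may differ, and the `filesystem` entry is a hypothetical pre-existing server.

```python
import json


def merge_mcp_server(config: dict, name: str, command: str, args: list) -> dict:
    """Return a copy of `config` with one entry added under "mcpServers"."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers[name] = {"command": command, "args": args}
    merged["mcpServers"] = servers
    return merged


# Hypothetical existing config with one server already registered.
existing = {"mcpServers": {"filesystem": {"command": "npx", "args": ["some-server"]}}}
updated = merge_mcp_server(existing, "aiops", "aiops", ["mcp-stdio"])
print(json.dumps(updated, indent=2))
```

The function copies rather than mutates, so the original config can be kept as a backup before the merged version is written to disk.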


With / Without AIOps MCP

| Capability | Without | With AIOps MCP |
| --- | --- | --- |
| Time to RCA | 40–60 min, 5 tabs | ~10 sec, one prompt |
| Investigation cost | 1 engineer-hour per P1 | 1 LLM call |
| Documentation | Manual Jira write-up after the fact | Auto-generated mid-incident |
| Knowledge retention | Lost when the senior leaves | Permanent in RAG corpus |
| On-call escalation reason | "I don't know who deployed what" | Change agent already answered |
| Impact estimation | Slack the BI team | Impact agent in 2 seconds |
| Action execution | SSH, kubectl, prayer | One-click, audited, reversible |
| Connected-impact view | Mental model in someone's head | Live topology graph |


Repository Layout

aiops-mcp/
├── README.md                 # this file
├── .env.example              # annotated env var template
├── pyproject.toml
├── requirements.txt
├── Dockerfile
├── docker-compose.yml
├── server/
│   ├── main.py               # CLI entry: aiops serve | mcp-stdio | dashboard
│   ├── mcp_server.py         # MCP protocol (stdio + HTTP)
│   ├── api.py                # FastAPI HTTP API + dashboard host
│   ├── orchestrator.py       # Supervisor: plans + fans out
│   ├── synthesis.py          # Final LLM correlation call
│   ├── topology.py           # Service graph + impact propagation
│   ├── config.py             # Env loading + mock fallback
│   └── agents/
│       ├── base.py
│       ├── log_agent.py
│       ├── infra_agent.py
│       ├── change_agent.py
│       ├── docs_agent.py
│       ├── impact_agent.py
│       └── audit_agent.py
├── dashboard/
│   └── index.html            # single-page UI (vanilla JS + vis-network)
├── configs/
│   ├── claude-desktop.json
│   ├── claude-code.json
│   ├── chatgpt-openapi-stub.json
│   └── topology.example.yaml
├── docs/
│   ├── INSTALLATION.md
│   ├── INTEGRATIONS.md
│   └── MCP-USAGE.md
└── tests/
    └── test_basic.py

Documentation

| When to read | Doc |
| --- | --- |
| First-time install on a new host | docs/INSTALLATION.md |
| Wiring into Claude / ChatGPT / Cursor / Continue / custom | docs/INTEGRATIONS.md |
| Building your own MCP client against this server | docs/MCP-USAGE.md |
| Architecture deep-dive (v1 + v2 roadmap) | docs/aiops-architecture.md |


License

MIT. See LICENSE. Use it, fork it, run it, ship it.


Support

  • 🐛 Issues / RFCs: GitHub Issues

  • 💬 Discussions: GitHub Discussions

  • 🏢 Enterprise support (multi-region, SLA, custom adapters): open an issue with the enterprise label

Built by people who've carried the pager.
