AIOps MCP
Allows querying recent deployments and changes from ArgoCD via the Change Agent.
Enables querying logs and metrics from Datadog through the Log and Infra Agents.
Allows querying recent deployments, merges, and changes from GitHub via the Change Agent.
Allows querying recent deployments, merges, and changes from GitLab via the Change Agent.
Enables querying metrics from Grafana via the Infra Agent.
Allows querying recent deployments from Jenkins via the Change Agent.
Enables creating and updating Jira tickets with full RCA and evidence via the Audit Agent.
Enables creating and updating Linear tickets with full RCA and evidence via the Audit Agent.
Enables querying impact data (affected customers, revenue) from Mixpanel via the Impact Agent.
Enables querying metrics from Prometheus via the Infra Agent.
Enables querying impact data (revenue, affected customers) from Snowflake via the Impact Agent.
Enables querying logs from Splunk via the Log Agent.
Click on "Install Server".
Wait a few minutes for the server to deploy. Once ready, it will show a "Started" state.
In the chat, type @ followed by the MCP server name and your instructions, e.g., "@AIOps MCP Why is checkout slow?"
That's it! The server will respond to your query, and you can continue using it as needed.
Here is a step-by-step guide with screenshots.
AIOps MCP: Multi-Agent Incident Intelligence
Production incidents in 10 seconds, not 60 minutes. A drop-in MCP server and dashboard that turns any LLM (Claude, Claude Code, ChatGPT, Cursor, Continue) into an autonomous incident-response copilot.
Why AIOps MCP?
Every production incident starts the same way: an engineer opens five tabs at 2 a.m. (CloudWatch, Grafana, GitLab, Confluence, the customer DB) and spends 40-60 minutes gathering context before they can even begin fixing the problem. For a P1, that hour can cost $1,000-$10,000 per minute in lost revenue.
We built AIOps MCP for engineers who are tired of being the human glue between observability tools. It treats incident investigation the way Slack treats messaging or k8s treats containers: as something the platform should handle, not something humans should do by hand. It is inspired by the way Resolve.ai and pager-replacement tooling are reshaping on-call, but built MCP-native so it speaks the same protocol every modern LLM client already speaks.
Under the hood: six specialized agents, an LLM-driven supervisor, an opinionated synthesis prompt, and a topology engine that knows what depends on what.
What You Get
Capability | Description |
6 specialized agents | Log, Infra, Change, Docs, Impact, Audit: run in parallel, not in sequence |
MCP-native | Plug into Claude Desktop, Claude Code, Cursor, Continue, or any MCP client over stdio or HTTP |
Multi-LLM | Claude, GPT, Gemini, local models via OpenRouter: pick your brain, we coordinate |
MCP Dashboard | Chat + live agent traces + topology + log viewer in one tab, like Claude.ai for incidents |
App topology | Interactive service graph with blast-radius propagation for connected-impact analysis |
Manual + auto logs | Paste, upload, or auto-pull from CloudWatch / Datadog / Splunk / Loki / Grafana |
Full audit trail | Every agent step, LLM prompt, and one-click action logged; compliance-ready |
Auto-Jira | Incident, RCA, evidence, action log: created and updated by the Audit Agent |
One-click actions | Rollback / restart / scale / flag-flip: vetted, parameterized, reversible |
8 env vars total | Production deployment with mocks by default: no creds, no problem |
Docker-ready | |
Zero-trust by default | Per-agent secrets, PII scrubbing on LLM prompts, immutable audit log |
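The "PII scrubbing on LLM prompts" row can be pictured with a minimal sketch. This is not the project's scrubber: the patterns, placeholders, and the `scrub` function name are all assumptions made for illustration, and a real deployment would need a much broader rule set.

```python
import re

# Hypothetical patterns for illustration only; not taken from the AIOps MCP source.
_PATTERNS = [
    (re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"), "<EMAIL>"),
    (re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"), "<IP>"),
]


def scrub(text: str) -> str:
    """Replace email addresses and IPv4-looking strings before text reaches an LLM."""
    for pattern, placeholder in _PATTERNS:
        text = pattern.sub(placeholder, text)
    return text


print(scrub("login failed for bob@example.com from 10.0.0.12"))
```

Running the scrubber on every log line before it is placed in a prompt keeps the redaction in one place, which is easier to audit than per-agent filtering.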
Two Installation Paths
 | MCP Plugin (recommended for LLM users) | Self-hosted CLI (for SREs/platform teams) |
Best for | Solo engineers wiring it into Claude Code / Claude Desktop / Cursor | Teams running AIOps MCP as shared infrastructure |
Install | | |
Transport | stdio | HTTP + MCP-over-HTTP + dashboard at |
Config | Single | |
Dashboard | Optional ( | Always on at |
Multi-user | Single user | RBAC via Cognito / Okta / OAuth2 |
Pick based on the team you're solving for. Both paths use the same agent engine.
Quick Start (60 seconds)
git clone https://github.com/<you>/aiops-mcp.git
cd aiops-mcp
cp .env.example .env # leave it empty for full mock mode
pip install -e .
aiops serve # MCP + HTTP + dashboard on :7878
Open http://localhost:7878 and ask: "Why is checkout slow?"
Or just Docker
docker compose up
The Six Agents
Grouped by what they actually do in an incident:
Observe (data gatherers)
Agent | Sources | What it answers |
Log Agent | CloudWatch, Datadog, Splunk, ELK, Loki | "What errors fired in the last 30 min?" |
Infra Agent | Grafana, Prometheus, Datadog Metrics, CloudWatch | "Is the DB at 98% connections? Is upstream healthy?" |
Change Agent | GitHub, GitLab, ArgoCD, Jenkins | "Who deployed what, when?" |
Reason (context + impact)
Agent | Sources | What it answers |
Docs Agent | Bedrock KB / pgvector / Pinecone over runbooks, postmortems, ADRs | "Have we seen this before? What's the runbook?" |
Impact Agent | DynamoDB, Snowflake, BigQuery, Mixpanel | "Who's affected? How much revenue is at risk?" |
Act (close the loop)
Agent | Sources | What it answers |
Audit Agent | Jira, ServiceNow, Linear | "Create the ticket, attach the RCA, link past incidents." |
MCP Tools Exposed
Tool | Purpose |
| Full multi-agent investigation: returns RCA + suggested actions |
| Search logs in CloudWatch / Datadog / Splunk / Loki / ELK |
| PromQL / Grafana / Datadog Metrics query |
| Manually attach a log blob (paste or upload) to an active investigation |
| Return service dependency graph + health |
| Given a service, list downstream impact + affected customers |
| List deploys / merges in a window |
| RAG search over runbooks and past postmortems |
| Create / update Jira with full RCA |
| One-click remediation (rollback / restart / scale / flag-flip) |
Every tool is callable directly from your LLM client β no UI required.
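For readers building their own client, the sketch below shows the shape of an MCP `tools/call` request, which is a JSON-RPC 2.0 message. The tool name `investigate_incident` and its `query` argument are hypothetical, since the tool names did not survive in the table above. A real session also needs the MCP `initialize` handshake before any tool call.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids must be unique per session


def tools_call_request(tool: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })


# "investigate_incident" is a hypothetical tool name, not taken from this README.
print(tools_call_request("investigate_incident", {"query": "Why is checkout slow?"}))
```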
The MCP Dashboard
A single-tab web UI inspired by Resolve.ai and Claude.ai for incident response:
Surface | What it does |
Chat panel | Natural-language conversation with the orchestrator |
Agent trace | Live cards showing each agent's progress, findings, and citations |
Topology graph | Interactive node graph; click a service to see blast radius |
Log dropzone | Paste / upload / fetch logs with timestamp alignment |
Incident timeline | Every step with timestamps, audit-ready |
Action panel | One-click rollback / scale / flag-flip with explicit confirmation |
Live demo (self-host): http://localhost:7878 after aiops serve.
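The blast-radius view in the topology graph can be approximated with a breadth-first walk over a dependency map. The sketch below is an illustration under assumptions, not the logic in `topology.py`: the service names and the `blast_radius` function are made up, and the map is assumed to list, for each service, the services that depend on it.

```python
from collections import deque


def blast_radius(dependents: dict[str, list[str]], start: str) -> list[str]:
    """Return services transitively affected by `start`, nearest first."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        service = queue.popleft()
        for downstream in dependents.get(service, []):
            if downstream not in seen:  # guards against cycles in the graph
                seen.add(downstream)
                order.append(downstream)
                queue.append(downstream)
    return order


# Hypothetical topology: each key maps to the services that depend on it.
topology = {
    "orders-db": ["orders-api"],
    "orders-api": ["checkout", "admin-ui"],
    "checkout": ["storefront"],
}
print(blast_radius(topology, "orders-db"))
# ['orders-api', 'checkout', 'admin-ui', 'storefront']
```

Breadth-first order lists directly dependent services before distant ones, which matches how an on-call engineer would usually triage impact.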
Architecture
┌──────────────────────────────────────────────────────┐
│  LLM CLIENT (Claude Code / Desktop / ChatGPT / ...)  │
└──────────────────────────┬───────────────────────────┘
                           │ MCP (stdio or HTTP)
                           ▼
┌──────────────────────────────────────────────────────┐
│               AIOps MCP SERVER (:7878)               │
│  ┌────────────────────────────────────────────────┐  │
│  │            SUPERVISOR ORCHESTRATOR             │  │
│  │    plans → fans out → synthesizes → audits     │  │
│  └───┬─────────┬─────────┬─────────┬─────────┬────┘  │
│      ▼         ▼         ▼         ▼         ▼       │
│   ┌──────┐  ┌──────┐  ┌──────┐  ┌──────┐  ┌──────┐   │
│   │ LOG  │  │INFRA │  │CHANGE│  │ DOCS │  │IMPACT│   │
│   └──┬───┘  └──┬───┘  └──┬───┘  └──┬───┘  └──┬───┘   │
│      │         │         │         │         │       │
│      ▼         ▼         ▼         ▼         ▼       │
│  ┌────────────────────────────────────────────────┐  │
│  │     ADAPTERS (mock-by-default, swappable)      │  │
│  └───┬─────────┬─────────┬─────────┬─────────┬────┘  │
│      │         │         │         │         │       │
│      ▼         ▼         ▼         ▼         ▼       │
│ CloudWatch Grafana    GitHub    Vector  Snowflake    │
│  Datadog  Prometheus  GitLab   pgvector  BigQuery    │
│   Splunk   Datadog    ArgoCD  RunbookKB  DynamoDB    │
│                                                      │
│                          ▼                           │
│             ┌─────────────────────────┐              │
│             │    SYNTHESIS ENGINE     │              │
│             │    (Claude Opus 4.7)    │              │
│             └────────────┬────────────┘              │
│                          ▼                           │
│             ┌─────────────────────────┐              │
│             │   AUDIT AGENT → Jira    │              │
│             └─────────────────────────┘              │
└──────────────────────────┬───────────────────────────┘
                           │
                           ▼
           ┌────────────────────────────────┐
           │     MCP DASHBOARD (web UI)     │
           │ Chat · Trace · Topology · Logs │
           └────────────────────────────────┘
You pick the model; AIOps MCP handles coordination.
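The "plans, fans out, synthesizes" flow can be sketched with `asyncio.gather`. This is a toy stand-in, not the code in `orchestrator.py`: `run_agent` and `investigate` are hypothetical names, the agents return canned findings, and the synthesis step (an LLM call in the real server) is replaced by a simple string join.

```python
import asyncio


async def run_agent(name: str, question: str) -> dict:
    """Stand-in for a real agent; returns a canned finding."""
    await asyncio.sleep(0)  # placeholder for I/O against a real data source
    return {"agent": name, "finding": f"{name} checked: {question}"}


async def investigate(question: str) -> dict:
    agents = ["log", "infra", "change", "docs", "impact"]
    # Fan out: run all five agents concurrently; collect failures instead of aborting.
    results = await asyncio.gather(
        *(run_agent(agent, question) for agent in agents),
        return_exceptions=True,
    )
    findings = [r for r in results if not isinstance(r, Exception)]
    # Synthesis would be an LLM call in the real server; here the findings are joined.
    summary = "; ".join(f["finding"] for f in findings)
    return {"question": question, "findings": findings, "summary": summary}


report = asyncio.run(investigate("Why is checkout slow?"))
print(report["summary"])
```

Passing `return_exceptions=True` means one failing data source degrades the report instead of cancelling the whole investigation.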
Configuration: ~8 env vars total
All config is via environment variables. Defaults work with mock data so you can run it instantly.
Variable | Required | Purpose |
| for real LLM | Supervisor + Synthesis (Claude Opus 4.7) |
| no | HTTP / MCP port; default |
| no | SQLite, uploads, topology cache; default |
| no | Auto-on when no integrations set |
| optional | Pick the log source you have |
| optional | Metrics |
| optional | Deploys |
| optional | Audit ticketing |
That's it. See .env.example for the full annotated list.
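One way to picture "env vars with a mock fallback" is the sketch below. The variable names did not survive in the table above, so every `EXAMPLE_*` name here is a placeholder rather than a real setting. The port default of 7878 is taken from the Quick Start. The real names are in `.env.example`.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    port: int
    llm_api_key: Optional[str]
    mock_mode: bool


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from the environment; all EXAMPLE_* names are placeholders."""
    env = os.environ if env is None else env
    integrations = [env.get(k) for k in ("EXAMPLE_LOG_SOURCE", "EXAMPLE_METRICS_URL")]
    explicit = env.get("EXAMPLE_MOCK_MODE")
    if explicit is not None:
        mock = explicit.lower() in ("1", "true", "yes")
    else:
        # Mirrors "auto-on when no integrations set" from the table above.
        mock = not any(integrations)
    return Settings(
        port=int(env.get("EXAMPLE_PORT", "7878")),
        llm_api_key=env.get("EXAMPLE_LLM_API_KEY") or None,
        mock_mode=mock,
    )


print(load_settings({}))
```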
Plug Into Any LLM Client
Client | Setup | Config file |
Claude Desktop | Merge | |
Claude Code | | |
ChatGPT (custom GPT) | Point at | |
Cursor | Add to | |
Continue.dev | Add to | |
Custom / any HTTP client | POST to | n/a |
Every tool the dashboard uses is also callable from the LLM client. The dashboard is just another MCP consumer.
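For the Claude Desktop row, "merge" most likely means adding an entry under the `mcpServers` key of the client's JSON config. The sketch below shows that merge on an in-memory dict. The `aiops mcp-stdio` command is an assumption inferred from the CLI entry points listed in the repository layout, and the shipped `configs/claude-desktop.json` should be treated as the authoritative version.

```python
import json


def merge_mcp_server(config: dict, name: str, command: str, args: list[str]) -> dict:
    """Return a copy of `config` with one entry added under "mcpServers"."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers[name] = {"command": command, "args": args}
    merged["mcpServers"] = servers
    return merged


# An existing config with an unrelated, made-up server entry.
existing = {"mcpServers": {"other-server": {"command": "other", "args": []}}}

# Assumed invocation: `aiops mcp-stdio` (not confirmed by this README).
updated = merge_mcp_server(existing, "aiops", "aiops", ["mcp-stdio"])
print(json.dumps(updated, indent=2))
```

Copying the dictionaries before editing leaves the original config untouched, so a failed write cannot lose existing server entries.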
With / Without AIOps MCP
Capability | Without | With AIOps MCP |
Time to RCA | 40-60 min, 5 tabs | ~10 sec, one prompt |
Investigation cost | 1 engineer-hour per P1 | 1 LLM call |
Documentation | Manual Jira write-up after the fact | Auto-generated mid-incident |
Knowledge retention | Lost when the senior leaves | Permanent in RAG corpus |
On-call escalation reason | "I don't know who deployed what" | Change agent already answered |
Impact estimation | Slack the BI team | Impact agent in 2 seconds |
Action execution | SSH, kubectl, prayer | One-click, audited, reversible |
Connected-impact view | Mental model in someone's head | Live topology graph |
Repository Layout
aiops-mcp/
├── README.md                 # this file
├── .env.example              # annotated env var template
├── pyproject.toml
├── requirements.txt
├── Dockerfile
├── docker-compose.yml
├── server/
│   ├── main.py               # CLI entry: aiops serve | mcp-stdio | dashboard
│   ├── mcp_server.py         # MCP protocol (stdio + HTTP)
│   ├── api.py                # FastAPI HTTP API + dashboard host
│   ├── orchestrator.py       # Supervisor: plans + fans out
│   ├── synthesis.py          # Final LLM correlation call
│   ├── topology.py           # Service graph + impact propagation
│   ├── config.py             # Env loading + mock fallback
│   └── agents/
│       ├── base.py
│       ├── log_agent.py
│       ├── infra_agent.py
│       ├── change_agent.py
│       ├── docs_agent.py
│       ├── impact_agent.py
│       └── audit_agent.py
├── dashboard/
│   └── index.html            # single-page UI (vanilla JS + vis-network)
├── configs/
│   ├── claude-desktop.json
│   ├── claude-code.json
│   ├── chatgpt-openapi-stub.json
│   └── topology.example.yaml
├── docs/
│   ├── INSTALLATION.md
│   ├── INTEGRATIONS.md
│   └── MCP-USAGE.md
└── tests/
    └── test_basic.py
Documentation
When to read | Doc |
First-time install on a new host | |
Wiring into Claude / ChatGPT / Cursor / Continue / custom | |
Building your own MCP client against this server | |
Architecture deep-dive (v1 + v2 roadmap) |
License
MIT. See LICENSE. Use it, fork it, run it, ship it.
Support
Issues / RFCs: GitHub Issues
Discussions: GitHub Discussions
Enterprise support (multi-region, SLA, custom adapters): open an issue with the enterprise label
Built by people who've carried the pager.