Kubernetes + Prometheus SRE MCP Server
Allows natural language Kubernetes operations including pod management, scaling, runbooks, and monitoring.
Kubernetes + Prometheus SRE MCP Server: natural language cluster ops, SLO monitoring, and PromQL queries via Claude
Natural language Kubernetes operations, powered by Model Context Protocol (MCP)
Built to scale from a single cluster to multi-cluster, multi-team enterprise environments.
What Is This?
An MCP (Model Context Protocol) server that exposes Kubernetes SRE operations as tools an AI assistant can call.
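To make "tools an AI assistant can call" concrete: in MCP, a tool invocation travels as a JSON-RPC 2.0 request with the method `tools/call`. The sketch below builds such a message for the `run_runbook` tool named in this README. The argument names (`runbook`, `namespace`) are assumptions for illustration; the server's actual input schema may differ.

```python
import json


def build_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)


# Hypothetical argument names; check the server's tool schema for the real ones.
request = build_tool_call(
    1, "run_runbook", {"runbook": "high_error_rate", "namespace": "production"}
)
print(request)
```

The MCP host (Claude Desktop here) normally builds and sends this message for you; the sketch only shows what crosses the stdio transport.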
You: "Run the high error rate runbook for the production namespace"
Claude: [calls run_runbook → executes org-approved diagnosis sequence]
Step 1: Checked deployments: nginx (3/3), api-service (1/3 ⚠️)
Step 2: Found pod api-service-7f9d: 47 restarts, OOMKilled
Step 3: Warning events: OOMKilled x3 in last 10 minutes
Recommendation: Increase memory limit to 512Mi + scale to 5 replicas

What's New in v2.0
| Feature | v1 | v2 |
| --- | --- | --- |
| Clusters supported | 1 (hardcoded) | Many (dynamic context switching) |
| Write operations | Unrestricted | Policy-checked with guardrails |
| Audit trail | None | Full structured JSON log |
| Incident diagnosis | Ad-hoc | Encoded runbooks (standardized) |
| Operational consistency | Per-engineer | Org-wide enforced |

Tools
Read
| Tool | Description |
| --- | --- |
| | All clusters in kubeconfig |
| get_pods | Pod status, restarts, container states |
| get_crashlooping_pods | CrashLoopBackOff pods across all namespaces |
| | Logs including previous crashed container |
| | Node readiness and pressure conditions |
| get_deployments | Desired vs ready vs available replicas |
| get_events | Warning events (key incident signal) |
| | All namespaces |
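As an illustration of what a CrashLoopBackOff check involves, here is a minimal sketch that filters pod objects shaped like Kubernetes API JSON. It is not the server's implementation: the function name `find_crashlooping` and the sample data are hypothetical, and only the field names (`containerStatuses`, `state.waiting.reason`, `restartCount`) come from the Kubernetes Pod status schema.

```python
from typing import Any


def find_crashlooping(pods: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return one summary per container currently waiting in CrashLoopBackOff."""
    hits = []
    for pod in pods:
        meta = pod.get("metadata", {})
        for status in pod.get("status", {}).get("containerStatuses") or []:
            waiting = (status.get("state") or {}).get("waiting") or {}
            if waiting.get("reason") == "CrashLoopBackOff":
                hits.append({
                    "namespace": meta.get("namespace"),
                    "pod": meta.get("name"),
                    "container": status.get("name"),
                    "restarts": status.get("restartCount", 0),
                })
    return hits


# Hypothetical sample data modeled on the example conversation in this README.
sample_pods = [
    {
        "metadata": {"name": "nginx-5c7d", "namespace": "production"},
        "status": {"containerStatuses": [
            {"name": "nginx", "restartCount": 0, "state": {"running": {}}},
        ]},
    },
    {
        "metadata": {"name": "api-service-7f9d", "namespace": "production"},
        "status": {"containerStatuses": [
            {"name": "api", "restartCount": 47,
             "state": {"waiting": {"reason": "CrashLoopBackOff"}}},
        ]},
    },
]

crashlooping = find_crashlooping(sample_pods)
print(crashlooping)
```

Working on plain dictionaries keeps the check testable without a live cluster; a real tool would feed it pod data fetched through the Kubernetes client.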
Write (Policy-checked + Audit-logged)
| Tool | Policy Enforced |
| --- | --- |
| | Max replicas · Blocked namespaces · Prod minimums |
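The three guardrails in the table can be sketched as a single check driven by the `POLICY_*` environment variables shown in the Policy Configuration section. This is an assumption-laden sketch, not the contents of policy.py: the class name `ScalePolicy`, the method names, the denial messages, and the fallback defaults (copied from the example values in this README) are all hypothetical.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


def _names(raw: str) -> frozenset:
    """Split a comma-separated namespace list, ignoring blanks."""
    return frozenset(part.strip() for part in raw.split(",") if part.strip())


@dataclass(frozen=True)
class ScalePolicy:
    max_replicas: int
    blocked_namespaces: frozenset
    prod_namespaces: frozenset
    prod_min_replicas: int

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "ScalePolicy":
        """Build a policy from POLICY_* variables (defaults are assumed)."""
        env = os.environ if env is None else env
        return cls(
            max_replicas=int(env.get("POLICY_MAX_REPLICAS", "30")),
            blocked_namespaces=_names(
                env.get("POLICY_SCALE_BLOCKED_NS", "kube-system,gatekeeper-system")
            ),
            prod_namespaces=_names(env.get("POLICY_PROD_NAMESPACES", "production,prod")),
            prod_min_replicas=int(env.get("POLICY_PROD_MIN_REPLICAS", "2")),
        )

    def check_scale(self, namespace: str, replicas: int) -> tuple[bool, str]:
        """Return (allowed, reason) for a requested replica count."""
        if namespace in self.blocked_namespaces:
            return False, f"scaling is blocked in namespace {namespace}"
        if replicas < 0:
            return False, "replica count cannot be negative"
        if replicas > self.max_replicas:
            return False, f"{replicas} exceeds the maximum of {self.max_replicas}"
        if namespace in self.prod_namespaces and replicas < self.prod_min_replicas:
            return False, f"{namespace} requires at least {self.prod_min_replicas} replicas"
        return True, "allowed"


policy = ScalePolicy.from_env({})  # empty mapping: fall back to the assumed defaults
print(policy.check_scale("production", 0))
print(policy.check_scale("staging", 0))
```

Returning a reason alongside the decision lets the caller show the denial to the user and record it in the audit trail in one step.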
SRE Runbooks
| Tool | Description |
| --- | --- |
| | Available runbooks with triggers |
| run_runbook | Execute org-standard diagnosis sequence |
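A runbook in this sense is an ordered list of read-only checks whose findings are collected into one report. The sketch below shows that shape using the step names from the high_error_rate runbook described in this README. The function `execute_runbook` and the stubbed step results are hypothetical; a real runbook would call the cluster instead of returning canned strings.

```python
from typing import Callable

# A step is a label plus a function that inspects one namespace.
Step = tuple[str, Callable[[str], str]]


def execute_runbook(steps: list[Step], namespace: str) -> list[str]:
    """Run every step in order and collect one report line per step."""
    report = []
    for number, (name, action) in enumerate(steps, start=1):
        try:
            finding = action(namespace)
        except Exception as exc:  # a failed check should not abort the diagnosis
            finding = f"failed: {exc}"
        report.append(f"Step {number}: {name}: {finding}")
    return report


# Stubbed checks standing in for real cluster queries (hypothetical results).
high_error_rate: list[Step] = [
    ("get_deployments", lambda ns: f"1 unhealthy deployment in {ns}"),
    ("get_pods", lambda ns: "api-service-7f9d has 47 restarts"),
    ("get_events", lambda ns: "OOMKilled x3 in last 10 minutes"),
    ("get_crashlooping_pods", lambda ns: "1 pod in CrashLoopBackOff"),
]

report = execute_runbook(high_error_rate, "production")
for line in report:
    print(line)
```

Keeping the sequence as data rather than free-form prompting is what makes the diagnosis repeatable across engineers.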
Governance
| Tool | Description |
| --- | --- |
| | All recent operations with timestamps |
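A structured audit trail is commonly written as one JSON object per line. The sketch below shows one way to record an operation with a UTC timestamp. It is an illustration only: the record fields and the tool name `scale_deployment` are assumptions, not the format that audit.py actually writes.

```python
import io
import json
from datetime import datetime, timezone
from typing import Any, TextIO


def audit_record(tool: str, arguments: dict[str, Any], allowed: bool, reason: str) -> dict:
    """Build one audit entry; the field names here are hypothetical."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "tool": tool,
        "arguments": arguments,
        "allowed": allowed,
        "reason": reason,
    }


def write_audit(stream: TextIO, record: dict) -> None:
    """Append the record as a single JSON line."""
    stream.write(json.dumps(record, sort_keys=True) + "\n")


# An in-memory stream stands in for the real log file.
log = io.StringIO()
write_audit(log, audit_record(
    "scale_deployment",  # hypothetical tool name
    {"namespace": "production", "deployment": "nginx", "replicas": 0},
    allowed=False,
    reason="production requires at least 2 replicas",
))
print(log.getvalue(), end="")
```

Logging denied operations as well as successful ones means the trail also shows what was attempted, which is usually the more interesting signal during a review.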

Architecture
Claude Desktop (MCP Host)
               │
               │  MCP Protocol (stdio / JSON-RPC)
               ▼
┌─────────────────────────────────────┐
│  SRE MCP Server v2                  │
│  server.py        entry point       │
│  cluster_manager  multi-cluster     │
│  policy.py        write guards      │
│  audit.py         JSON audit log    │
│  runbooks.py      SRE runbooks      │
└──────────────┬──────────────────────┘
               │  kubernetes Python SDK
               ▼
┌──────────────────────┐
│  Kubernetes Clusters │
│  (any kubeconfig     │
│   context)           │
└──────────────────────┘

Quick Start
git clone https://github.com/ManishMaurya22/sre-mcp-server
cd sre-mcp-server
python3.11 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Edit ~/Library/Application Support/Claude/claude_desktop_config.json:
{
  "mcpServers": {
    "sre-k8s": {
      "command": "/Users/<YOUR_USERNAME>/sre-mcp-server/venv/bin/python",
      "args": ["/Users/<YOUR_USERNAME>/sre-mcp-server/server.py"]
    }
  }
}

See docs/SETUP.md for full setup guide.
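If the config file already lists other MCP servers, hand-editing risks overwriting them. The sketch below merges the `sre-k8s` entry into an existing configuration without touching other entries. The helper `add_server_entry`, the `other-server` entry, and the `/Users/example` path are hypothetical; substitute your own clone location.

```python
import json
from pathlib import PurePosixPath


def add_server_entry(config: dict, repo_dir: PurePosixPath, name: str = "sre-k8s") -> dict:
    """Return a copy of config with the server registered under mcpServers."""
    updated = dict(config)
    servers = dict(updated.get("mcpServers", {}))
    servers[name] = {
        "command": str(repo_dir / "venv" / "bin" / "python"),
        "args": [str(repo_dir / "server.py")],
    }
    updated["mcpServers"] = servers
    return updated


# Hypothetical existing config and clone location.
existing = {"mcpServers": {"other-server": {"command": "node", "args": ["index.js"]}}}
repo = PurePosixPath("/Users/example/sre-mcp-server")

merged = add_server_entry(existing, repo)
print(json.dumps(merged, indent=2))
```

To apply it for real, load the file with `json.load`, pass the result through the helper, and write it back with `json.dump`; restart Claude Desktop afterwards so it picks up the change.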

Policy Configuration
export POLICY_MAX_REPLICAS=30
export POLICY_SCALE_BLOCKED_NS="kube-system,gatekeeper-system"
export POLICY_PROD_NAMESPACES="production,prod"
export POLICY_PROD_MIN_REPLICAS=2

You: "Scale nginx to 0 in production"
Claude: Policy Denied: scaling to 0 is not allowed in production (min: 2)
Operation audit-logged.

Encoded Runbooks
Available: high_error_rate · node_pressure · deployment_rollback
You: "Run the high_error_rate runbook for production"
Claude runs in order:
1. get_deployments: spot unhealthy deployments
2. get_pods: check restart counts
3. get_events: surface warning signals
4. get_crashlooping_pods: cluster-wide check
+ surfaces remediation hints

Structure
sre-mcp-server/
├── server.py               # Main MCP server
├── cluster_manager.py      # Multi-cluster context management
├── policy.py               # Write operation guardrails
├── audit.py                # Structured audit trail
├── runbooks.py             # Encoded SRE runbooks
├── requirements.txt
├── tools/k8s_tools.py
├── config/claude_desktop_config.example.json
├── docs/
│   ├── SETUP.md
│   └── INTERVIEW_GUIDE.md
└── .github/workflows/ci.yaml

Roadmap
Prometheus MCP: SLO burn rate queries
PagerDuty MCP: incident acknowledgement
ArgoCD MCP: GitOps sync and triggers
Central MCP Gateway: auth + multi-team routing

License
MIT. See LICENSE
Built by Manish Maurya, DevOps/SRE Leader | 16+ Years | Abu Dhabi, UAE
Website: https://manishmaurya22.github.io/