vibops-mcp
Officialvibops-mcp is a unified MCP server for managing GPU infrastructure across any cloud or on-prem provider, offering 70+ tools across six capability areas:
Observation & Monitoring
List clusters with GPU utilization summaries, kubectl contexts, gateways, secrets, providers, and pipelines
Get live Kubernetes deployment status, hourly GPU utilization time-series, and workload breakdowns
Monitor job metrics (throughput, latency P50/P95/P99), alerts, and MTTR
Estimate GPU spend based on configured cost rates
Actions & Operations
Scale Kubernetes deployments, deploy AI models (e.g., llama3:8b, mistral:7b), run Helm/kubectl/git commands
Store encrypted secrets, trigger automation pipelines
Submit, monitor, cancel, and retrieve output from Slurm HPC jobs
Manage container registries (Harbor, ECR, GAR): list repos/tags, check images, delete stale tags
Governance & Compliance
Detect and resolve GPU anomalies; track AI Act compliance controls (Art. 9, 12, 13–15, 17) and overall compliance score
Generate and retrieve SOC 2, GDPR, and HIPAA reports
Query and verify the cryptographic integrity of the immutable audit log (HMAC-SHA256 chain)
Manage org-wide policies, configure LDAP/Active Directory, and push audit events to SIEM systems (Splunk, Datadog)
Run LLM-as-judge evaluations on completed jobs against custom rubrics
Agent Infrastructure Control Plane
Create, rotate, and revoke agent machine identities
Set per-agent monthly inference budgets with soft/hard caps and automatic blocking (HTTP 429)
Define model access control rules using glob patterns to restrict agents to specific LLMs
Visualize the full agent dependency graph (agent→model, agent→connector, agent→sub-agent)
GPU FinOps & Cost Management
Track organization-wide GPU budget and consumption; get chargeback reports by tenant and agent
Analyze daily GPU spend trends (up to 90 days) with anomaly flagging
Identify idle/wasted GPU resources with remediation recommendations
Get per-agent LLM inference usage (tokens, GPU cost, request counts) by team or model
Configuration
Set cluster GPU cost rates, register/delete gateways
Allows CrewAI agents to interact with GPU infrastructure for deployment and monitoring.
Allows exporting audit events and GPU metrics to Datadog for monitoring and alerting.
Provides tools for cloning Git repositories as part of deployment workflows.
Provides tools for managing Helm releases on Kubernetes clusters, including upgrade and uninstall operations.
Provides LangChain-based AI agents with tools to observe, operate, and govern GPU clusters.
Enables n8n workflows to manage GPU infrastructure, including deployment, monitoring, and cost optimization.
Allows AI agents to route inference requests through the VibOps proxy to Ollama, with cost attribution and logging.
Supports OpenAI-compatible API for inference, capturing usage data for FinOps.
Allows exporting audit events and GPU metrics to Splunk for security information and event management.
vibops-mcp
The provider-agnostic MCP server for GPU infrastructure — one interface for any cloud, any cluster, any provider.
The problem
Large enterprises and CSPs managing GPU infrastructure deal with fragmentation — AWS, GCP, Azure, on-prem, neoclouds, each with their own API, dashboard, and cost model. Correlating utilisation, cost, workload type, and compliance posture across providers requires jumping between 5 tools.
Related MCP server: Bastion
The solution
vibops-mcp is a single MCP server that abstracts this complexity. One pip install, 70 tools, and your AI assistant can observe, operate, govern, and optimize your entire GPU fleet — regardless of where it runs.
Observe — GPU utilisation, workload breakdown, MTTR, cost estimates, live K8s deployments
Act — deploy models, scale deployments, run Helm/kubectl, trigger pipelines, submit Slurm jobs
Govern — anomaly detection, AI Act compliance, SOC 2/RGPD reports, immutable audit chain, policy management
FinOps — budget tracking, chargeback, spend trends, waste analysis
All operations go through your VibOps instance and are recorded in the audit log.
Installation
pip install git+https://github.com/VibOpsai/vibops-mcp.gitConfiguration
You need two environment variables:
Variable | Description |
| Base URL of your VibOps instance, e.g. |
| API token — create one in VibOps → Settings → API Tokens |
Claude Desktop
Add to ~/.config/claude/claude_desktop_config.json (macOS: ~/Library/Application Support/Claude/claude_desktop_config.json):
{
"mcpServers": {
"vibops": {
"command": "vibops-mcp",
"env": {
"VIBOPS_URL": "https://vibops.example.com",
"VIBOPS_TOKEN": "your-token-here"
}
}
}
}Cursor
Add to .cursor/mcp.json in your project root, or to the global config:
{
"mcpServers": {
"vibops": {
"command": "vibops-mcp",
"env": {
"VIBOPS_URL": "https://vibops.example.com",
"VIBOPS_TOKEN": "your-token-here"
}
}
}
}Claude Code (CLI)
claude mcp add vibops vibops-mcp \
-e VIBOPS_URL=https://vibops.example.com \
-e VIBOPS_TOKEN=your-token-hereAvailable tools
Observation (16 tools — read-only)
Tool | Description |
| List clusters and GPU utilisation |
| List available kubectl contexts |
| Live K8s deployment status for a cluster |
| Get configured GPU cost rate for a cluster |
| List recent jobs with optional filters |
| Get job details and result |
| Job success rate, latency P50/P95/P99, error breakdown |
| Hourly GPU utilisation time-series |
| Job count by workload type |
| Mean Time To Resolve GPU alerts |
| Estimated GPU spend |
| List registered gateways and status |
| List GPU alerts (open or resolved) |
| List secrets (names only, never values) |
| List configured AI/GPU cloud providers |
| List automation pipelines |
Actions (18 tools — write)
Tool | Description |
| Scale a K8s deployment replica count |
| Deploy an AI model onto a GPU cluster |
| Run helm upgrade --install |
| Uninstall a Helm release |
| Run an arbitrary kubectl command |
| Clone a git repository |
| Store an encrypted secret |
| Manually trigger an automation pipeline |
| Get Slurm cluster info and partition details |
| List Slurm jobs with optional filters |
| Get status of a specific Slurm job |
| Retrieve stdout/stderr of a completed Slurm job |
| Submit a new Slurm job |
| Cancel a running or pending Slurm job |
| List container registry repositories |
| List tags for a container image |
| Check image details (size, layers, created date) |
| Delete a stale image tag (requires confirmed=True) |
Configuration (3 tools)
Tool | Description |
| Set GPU cost rate for a cluster (admin only) |
| Register a new gateway (returns one-time token) |
| Revoke a gateway |
Agent Infrastructure Control Plane (12 tools)
The missing layer between your AI agents and your GPU fleet. Works with any framework (n8n, LangChain, CrewAI, Dify) — just point to the VibOps LLM Proxy.
Tool | Description |
FinOps per agent | |
| GPU cost per agent — tokens, requests, cost, GPU-hours. "Which agent costs the most?" |
| Drill-down on one agent — daily breakdown, model distribution, cost trend |
| Current budget + MTD spend for an agent |
| Set monthly spend limit — soft alert at 80%, hard block at 100% (HTTP 429) |
Model access control | |
| List model access rules — which agent can use which LLM |
| Create a rule: glob patterns, deny-first. "RH agents → Mistral only" |
Identity lifecycle | |
| List machine identities for agents |
| Create a new machine identity (key shown once) |
| Rotate the key for an existing identity |
| Revoke an identity immediately |
Dependency graph | |
| Full org-wide graph: agent→model, agent→connector, agent→sub-agent |
| Dependencies for one agent — impact analysis before migration |
Governance & Compliance (21 tools)
Tool | Description |
| List GPU anomalies with optional cluster/status filter |
| Get all currently open anomalies |
| Mark an anomaly as resolved |
| List AI Act compliance controls |
| Get the overall AI Act compliance score |
| Update status, notes, or evidence URL for a control |
| List generated compliance reports |
| Generate a SOC 2, RGPD, or HIPAA report asynchronously |
| Poll/retrieve a generated compliance report |
| Query the immutable audit log with filters |
| Verify HMAC-SHA256 integrity of the full audit chain |
| Get the current organisation policy |
| Replace the organisation policy (immediate effect) |
| List LLM-as-judge evaluation rubrics |
| Trigger LLM-as-judge evaluation for a job |
| Retrieve evaluation results for a job |
| Get LDAP / Active Directory configuration |
| Configure or enable/disable LDAP integration |
| Get SIEM push export configuration |
| Set Splunk/Datadog SIEM destination |
| Export audit events to configured SIEM |
GPU FinOps (4 tools)
Tool | Description |
| Get current GPU budget and consumed spend |
| Get chargeback breakdown by tenant for a given month |
| Get daily GPU spend trend (default: last 30 days) |
| Identify idle GPU resources and cost optimisation opportunities |
LLM Inference Proxy
VibOps includes a transparent OpenAI-compatible proxy (port 8004) that sits between your AI agents and LLM inference servers (vLLM, Ollama, TGI). Every inference request is logged with agent attribution for FinOps.
Your agents point to the proxy instead of the LLM directly:
# Before
OPENAI_BASE_URL=http://vllm:8000/v1
# After
OPENAI_BASE_URL=http://vibops-proxy:8004/v1Add a X-VibOps-Agent-Id header to attribute costs per agent:
curl -X POST http://vibops-proxy:8004/v1/chat/completions \
-H "X-VibOps-Agent-Id: pricing-agent-v2" \
-H "X-VibOps-Team: supply-chain" \
-d '{"model": "mistral:7b", "messages": [...]}'The proxy captures: agent ID, team, model, tokens, latency, GPU cost — visible in the console FinOps dashboard and queryable via get_agent_usage.
Example prompts
"What's our GPU utilisation trend over the last 7 days?"
"Show me the cost breakdown per cluster this week."
"Deploy llama3:8b on vibops-dev with 2 replicas."
"Which clusters have open critical GPU alerts?"
"Scale the inference deployment to 4 replicas on prod-cluster."
"What's our MTTR for critical alerts?"
"Are there any open GPU anomalies right now?"
"What's our AI Act compliance score and which controls are non-compliant?"
"Generate a SOC 2 report for Q1 2026."
"Verify the audit chain hasn't been tampered with."
"Show me the spend trend for the last 7 days and flag any waste."
"Create a machine identity for the pricing-agent with a 1-year expiry."
"Which agents depend on the claude-opus-4-6 model?"
"Which agent costs the most in GPU this month?"
"Show me the inference cost breakdown for the pricing agent."
"What's the GPU spend per team for the last 7 days?"Contributing
See CONTRIBUTING.md. All contributions require a DCO sign-off (git commit -s).
License
MIT — free to use, modify, and distribute. See LICENSE.
Maintenance
Resources
Unclaimed servers have limited discoverability.
Looking for Admin?
If you are the server author, to access and configure the admin panel.
Latest Blog Posts
MCP directory API
We provide all the information about MCP servers via our MCP API.
curl -X GET 'https://glama.ai/api/mcp/v1/servers/VibOpsai/vibops-mcp'
If you have feedback or need assistance with the MCP directory API, please join our Discord server