Provides tools for querying Kubernetes clusters to retrieve GPU node information, including node status, labels, and InfiniBand topology details for Azure HPC/AI environments
Azure HPC/AI MCP Server
A minimal Model Context Protocol (MCP) server tailored for Azure HPC/AI clusters. It provides tools that query Kubernetes for GPU node information using kubectl. The server is implemented with fastmcp and uses synchronous subprocess calls (no asyncio).
Tools
list_nodes: Lists nodes in the GPU pool with name, labels, and Ready/NotReady status.
get_node_topologies: Returns InfiniBand-related topology labels per node: agentpool, pkey, torset.
Both tools shell out to kubectl and return JSON-serializable Python structures (lists of dicts).
Run the server
Prerequisites:
Python 3.10+
kubectl configured to access your cluster
Installation
It’s recommended to use a virtual environment.
Create and activate a venv (Linux/macOS):
Install dependencies with pip:
Notes:
fastmcp is required to run the server and is installed via requirements.txt. Tests don’t need it (they stub it).
If fastmcp isn’t on PyPI for your environment, install it from its source per its documentation.
Run:
The server runs over stdio for MCP hosts. You can connect to it with an MCP-compatible client or call the tools locally with the helper script below.
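For orientation, a stdio server built on fastmcp might be wired up roughly as follows. This is a minimal sketch and not the contents of server.py: the server name and the `ping` placeholder tool are hypothetical, and it assumes a fastmcp release in which `FastMCP.tool()` returns a decorator and `run()` defaults to the stdio transport.

```python
def ping() -> dict:
    """Hypothetical placeholder tool; the real server exposes list_nodes and get_node_topologies."""
    return {"ok": True}


def build_server():
    # Imported lazily so this module can be loaded without fastmcp installed.
    from fastmcp import FastMCP

    mcp = FastMCP("azure-hpc-ai-sketch")  # server name is an assumption
    mcp.tool()(ping)  # register the plain synchronous function as an MCP tool
    return mcp


def main() -> None:
    # With no transport argument, run() is assumed to serve over stdio.
    build_server().run()
```

Calling `main()` from a script entry point would start the server and block while it waits for an MCP host on stdin/stdout.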
invoke_local helper
The invoke_local.py script lets you execute server tools in-process without an MCP host. It discovers exported tools from server.py, calls them synchronously, and prints pretty JSON.
Examples:
Implementation notes:
No asyncio is used; tool functions call subprocess.run directly and return plain Python data.
The script unwraps simple function tools or FastMCP FunctionTool-like wrappers and invokes them with kwargs from --params when provided.
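The unwrap-and-invoke behaviour described in these notes could look something like the sketch below. It is not the actual invoke_local.py: the `FakeFunctionTool` class, the `fn` attribute, the positional `tool` argument, and the `invoke` helper are all assumptions. Only the `--params` option comes from the description above.

```python
import argparse
import json


class FakeFunctionTool:
    """Hypothetical wrapper standing in for a FastMCP FunctionTool-like object."""

    def __init__(self, fn):
        self.fn = fn  # assumption: the wrapped callable is exposed as `fn`


def unwrap(tool):
    """Return the underlying callable for a wrapper or a plain function."""
    fn = getattr(tool, "fn", None)
    if callable(fn):
        return fn
    if callable(tool):
        return tool
    raise TypeError(f"cannot unwrap tool: {tool!r}")


def invoke(tools: dict, argv: list[str]) -> str:
    """Parse CLI-style arguments, call the chosen tool, and return pretty JSON."""
    parser = argparse.ArgumentParser(description="Call a tool in-process")
    parser.add_argument("tool", choices=sorted(tools))
    parser.add_argument("--params", default="{}", help="JSON object of kwargs")
    args = parser.parse_args(argv)
    kwargs = json.loads(args.params)  # e.g. '{"name": "gpu"}' -> {"name": "gpu"}
    result = unwrap(tools[args.tool])(**kwargs)
    return json.dumps(result, indent=2, sort_keys=True)
```

Because the tools are synchronous, the helper can call them directly and serialize whatever they return; no event loop is involved.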
Tests
The tests are written with pytest and exercise success and error paths without requiring a cluster.
Key points:
subprocess-based: Tests monkeypatch subprocess.run to simulate kubectl output and errors. There is no usage of asyncio in code or tests.
fastmcp-free: Tests inject a lightweight dummy FastMCP module so importing server.py does not require the real dependency.
Coverage: Both tools are validated for JSON parsing, Ready condition handling, missing labels, and kubectl failures.
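The key points above can be sketched as follows. To keep the example runnable on its own, it uses `unittest.mock.patch` in place of pytest's `monkeypatch` fixture and tests a hypothetical `get_topologies_sketch` function rather than the real server.py. The short label keys (`agentpool`, `pkey`, `torset`) and the choice to let `CalledProcessError` propagate are assumptions.

```python
import json
import subprocess
import sys
import types
from unittest import mock


class DummyFastMCP:
    """Lightweight stand-in so code importing fastmcp can load without it."""

    def __init__(self, *args, **kwargs):
        pass

    def tool(self, *args, **kwargs):
        return lambda fn: fn  # pass-through decorator


def get_topologies_sketch() -> list[dict]:
    """Hypothetical stand-in for a get_node_topologies-style tool."""
    proc = subprocess.run(
        ["kubectl", "get", "nodes", "-o", "json"],
        capture_output=True, text=True, check=True,
    )
    rows = []
    for item in json.loads(proc.stdout).get("items", []):
        labels = item.get("metadata", {}).get("labels", {})
        rows.append({
            "name": item["metadata"]["name"],
            # Label keys are assumed; missing labels are reported as None.
            "agentpool": labels.get("agentpool"),
            "pkey": labels.get("pkey"),
            "torset": labels.get("torset"),
        })
    return rows


def fake_run_ok(cmd, **kwargs):
    payload = {"items": [{"metadata": {"name": "gpu-0", "labels": {"agentpool": "gpu"}}}]}
    return subprocess.CompletedProcess(cmd, 0, stdout=json.dumps(payload), stderr="")


def fake_run_fail(cmd, **kwargs):
    raise subprocess.CalledProcessError(1, cmd, stderr="connection refused")


def test_success_and_missing_labels():
    with mock.patch("subprocess.run", fake_run_ok):
        assert get_topologies_sketch() == [
            {"name": "gpu-0", "agentpool": "gpu", "pkey": None, "torset": None}
        ]


def test_kubectl_failure():
    with mock.patch("subprocess.run", fake_run_fail):
        try:
            get_topologies_sketch()
        except subprocess.CalledProcessError as exc:
            assert exc.returncode == 1
        else:
            raise AssertionError("expected CalledProcessError")


def test_dummy_fastmcp_import():
    dummy = types.ModuleType("fastmcp")
    dummy.FastMCP = DummyFastMCP
    # Temporarily register the dummy so `import fastmcp` resolves to it.
    with mock.patch.dict(sys.modules, {"fastmcp": dummy}):
        import fastmcp
        decorated = fastmcp.FastMCP("test").tool()(len)
        assert decorated is len
```

Under pytest the same replacements would typically be made with `monkeypatch.setattr(subprocess, "run", fake_run_ok)` and `monkeypatch.setitem(sys.modules, "fastmcp", dummy)`.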
Run tests:
Troubleshooting
kubectl not found: Ensure kubectl is installed and on PATH for real runs. Tests do not require it.
No nodes returned: Confirm your label selectors match your cluster (tools currently expect GPU/IB labels used in Azure HPC/AI pools).
fastmcp import error: Install fastmcp for runtime; tests provide a dummy stub so you can run pytest without it.
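For the first item, a quick preflight check along these lines can confirm whether `kubectl` is reachable before starting the server. This is a minimal sketch; the `check_kubectl` helper is hypothetical and not part of the project.

```python
import shutil


def check_kubectl(which=shutil.which) -> str:
    """Return the path to kubectl, or raise with a hint if it is not on PATH."""
    path = which("kubectl")
    if path is None:
        raise FileNotFoundError("kubectl not found on PATH; install it or adjust PATH")
    return path
```

The `which` parameter exists only so the lookup can be replaced in tests; in normal use, call `check_kubectl()` with no arguments.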