llm-d-bench-mcp
Integrates with Kubernetes clusters to deploy, run, monitor, and manage benchmark workloads, using the user's kubeconfig for authentication.
llm-d-bench-mcp — the llm-d benchmarking MCP server
Give Claude Code the ability to benchmark llm-d from plain English. Point it at this
server and it can probe a cluster, propose a benchmark plan you approve, deploy an llm-d
stack, run the benchmark, and explain the results — inside the same security sandbox and
approval gates as the llm-d-benchmarking-agent
app.
Supported now: the claude-agent-sdk provider (no API key) wired into Claude Code (the CLI), the only path the installer sets up and verifies. The server speaks standard MCP, so other providers (anthropic, openai) and clients (Claude Desktop, Cursor, VS Code, OpenAI Codex CLI) are planned for a future release.
It is the agent's toolset re-exposed over the Model Context Protocol: 35 tools, 5 workflow prompts, and the agent's entire knowledge base (60+ playbooks & heuristics) as readable resources — so a generic agent behaves like a benchmarking expert, not a blank slate.
Transport is stdio / local single-user: the server runs on your machine against your kubeconfig, trusted like any local tool you launch. There is no network/remote mode (see Security & scope).
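As a rough illustration of what "stdio" means here: in the MCP stdio transport, the client launches the server as a subprocess and exchanges newline-delimited JSON-RPC 2.0 messages over stdin/stdout. The sketch below only frames an `initialize` request; the client name and version are placeholders, and the protocol version string is one published revision, not necessarily the one this server negotiates.

```python
import json


def frame_message(message: dict) -> bytes:
    """Serialize one JSON-RPC message as a single newline-terminated line.

    json.dumps escapes any newline inside string values, so the only raw
    newline in the output is the delimiter appended at the end.
    """
    text = json.dumps(message, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8") + b"\n"


# A client's first message. clientInfo values are placeholders (assumption).
initialize_request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "initialize",
    "params": {
        "protocolVersion": "2024-11-05",
        "capabilities": {},
        "clientInfo": {"name": "example-client", "version": "0.0.1"},
    },
}

framed = frame_message(initialize_request)
print(framed.decode("utf-8"), end="")
```

In practice Claude Code does this framing for you; the point is only that no port is opened and nothing leaves the local process pair.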
How it fits together
This repo is the thin MCP adapter (~500 lines: transport + approval/event adapters + the
knowledge-exposure surface). The engine — the 35 tools, the security allowlist, and the
knowledge/ playbooks — lives in the
llm-d-benchmarking-agent repo, which
the installer clones at its latest main and installs into the same virtualenv. The engine
must run from a real checkout (it reads knowledge/, the allowlist, and the read-only sibling
repos from disk at runtime), which is why it is not a pip dependency.
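To make the "real checkout" requirement concrete, here is a minimal sketch of a pre-flight check that a directory contains what the engine reads from disk. Only `knowledge/` is checked because it is the one path named above; the function name and the default list are assumptions for illustration, not part of the installer.

```python
import tempfile
from pathlib import Path


def missing_engine_paths(checkout: Path, required=("knowledge",)) -> list:
    """Return the required sub-directories that are absent from a checkout."""
    return [rel for rel in required if not (checkout / rel).is_dir()]


# Demo against a throwaway directory standing in for the engine checkout.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    print(missing_engine_paths(root))   # knowledge/ not created yet
    (root / "knowledge").mkdir()
    print(missing_engine_paths(root))   # nothing missing now
```

A pip-installed wheel would typically not ship these directories next to the package, which is the situation a check like this would catch.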
Install (one command)
The installer fetches the agent repo (the engine), clones the read-only sibling repos, builds a
virtualenv, installs the engine + this server into it, configures the claude-agent-sdk
provider, and registers the server with Claude Code (or prints the config to paste yourself):
bash <(curl -fsSL https://raw.githubusercontent.com/TalBenAmii/llm-d-bench-mcp/main/scripts/install.sh)
Prefer to clone first? The same script runs from inside a checkout:
git clone https://github.com/TalBenAmii/llm-d-bench-mcp.git
cd llm-d-bench-mcp
./scripts/install.sh
It is idempotent (safe to re-run). The claude-agent-sdk provider needs no API key: it
authenticates through your claude CLI login, so the only prerequisite is being logged in to
the claude CLI. The installer offers to install the CLI for you if it's missing.
What your agent gets
Tools (35)
Group | What your agent can do | Examples
Sense & ground (read-only, auto-run) | Inspect the environment, GPUs, catalog, docs, knowledge |
Plan before you spend | Map a use case to a validated plan; check it fits |
Deploy & run (approval-gated) | Set up repos, run the CLI, orchestrate Jobs & sweeps |
Make sense of results (read-only) | Parse reports, compare runs/harnesses, track trends |
Observe & manage | Readiness checks, live cluster metrics, run management |
Trust & reproduce | Provenance bundles, reproduce a run |
Numbers are only ever reported from a validated Benchmark Report v0.2 — never scraped from logs or invented.
Workflow prompts (5)
Entry points that drop your agent into the right playbook:
Prompt | Arguments | What it sets up
 | | The full interview → preconditions → plan → run → explain workflow
 | | Choosing a deploy path + accelerator guidance
 | | Parsing and explaining a benchmark report
 | | Designing a design-of-experiments sweep
 | | Iterative sweep rounds toward an SLO at best goodput
Resources & instructions
Every knowledge file is exposed as a doc://knowledge/<name> resource, so your agent can read
the same playbooks the standalone agent reasons over. The server also advertises a
role/workflow preamble in its MCP instructions ("probe first, ground in docs, propose a plan,
run only with approval") that capable clients fold into their system prompt.
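For readers curious how a client reaches these resources, the sketch below builds the JSON-RPC requests MCP defines for listing resources and reading one by URI. The `doc://knowledge/<name>` scheme is from the text above; the playbook name `example-playbook` is hypothetical, and real names should be taken from the `resources/list` response.

```python
import json


def knowledge_uri(name: str) -> str:
    """Build the resource URI for a knowledge file, following doc://knowledge/<name>."""
    return "doc://knowledge/" + name


def list_resources_request(request_id: int) -> dict:
    """JSON-RPC request asking the server which resources it exposes."""
    return {"jsonrpc": "2.0", "id": request_id, "method": "resources/list"}


def read_resource_request(request_id: int, name: str) -> dict:
    """JSON-RPC request asking the server for one knowledge file's contents."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "resources/read",
        "params": {"uri": knowledge_uri(name)},
    }


# "example-playbook" is a made-up name used only to show the request shape.
print(json.dumps(read_resource_request(2, "example-playbook"), indent=2))
```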
Manual config (Claude Code)
The installer does this for you; here's the block to wire it up by hand. The launch command is the console entry point created by installing this package — use its absolute path in the agent project's venv (the installer builds everything into that one venv):
claude mcp add llm-d-bench -s user -- /ABS/PATH/llm-d-benchmarking-agent-project/.venv/bin/llm-d-bench-mcp
# verify: claude mcp list (or /mcp inside a session)
An HF_TOKEN for gated models is optional; add it with -e HF_TOKEN=hf_xxx. The agent project's
.env already carries the LLM provider config and is loaded regardless of how the server is
launched. The module form works too once both packages are installed in the venv:
claude mcp add llm-d-bench -s user -- /ABS/PATH/.venv/bin/python -m llm_d_bench_mcp
Smoke-test it without a client using the official inspector:
npx @modelcontextprotocol/inspector /ABS/PATH/.venv/bin/llm-d-bench-mcp
Requirements & scope
Python ≥ 3.11 and git (the installer handles the venv via uv, or python3 -m venv).
LLM provider: claude-agent-sdk, with no API key, authenticated via your claude CLI login.
Client: Claude Code (the CLI).
No cluster needed for the advisory tools and knowledge resources. The deploy/run/orchestrate tools need a reachable Kubernetes cluster + kubeconfig (and HF_TOKEN for gated models).
The engine repo and its read-only siblings (llm-d, llm-d-benchmark, llm-d-skills) must be on disk; the installer clones them all automatically.
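If you want to check these requirements before running the installer, a small script along the following lines would do. It is a sketch, not the installer's own logic: the function name and the result keys are made up here, and it only checks that the tools are on PATH, not that you are logged in to the claude CLI.

```python
import shutil
import sys

REQUIRED_PYTHON = (3, 11)


def check_prerequisites(which=shutil.which, version=sys.version_info) -> dict:
    """Report which local requirements are satisfied.

    `which` and `version` are injectable so the check can be exercised
    without depending on the machine it runs on.
    """
    return {
        "python>=3.11": tuple(version[:2]) >= REQUIRED_PYTHON,
        "git": which("git") is not None,
        "claude": which("claude") is not None,
        # uv is optional: python3 -m venv is the stated alternative.
        "uv (optional)": which("uv") is not None,
    }


for requirement, ok in check_prerequisites().items():
    print(f"{'ok     ' if ok else 'missing'}  {requirement}")
```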
Security & scope
stdio / local single-user only. The server has no network listener and no per-caller auth; it acts with your own kubeconfig. This is acceptable only for local use — HTTP/remote/shared transport is deliberately deferred, and "who may connect, whose credentials, what blast radius" become blocking questions before any such mode.
Approval is re-homed to your client. Every tool call is gated by your MCP client's own tool-permission prompt; the richer SessionPlan approval uses MCP elicitation where the client supports it (with a graceful fallback otherwise). Nothing mutating runs without your say-so, and there is never a silent auto-approve.
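The "never a silent auto-approve" rule can be pictured with a tiny gate like the one below. This is an illustrative sketch only, not the server's code: the `ToolCall` type, the `run_gated` function, and both tool names are hypothetical, and the real approval flow goes through your MCP client's prompt or elicitation rather than a Python callback.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class ToolCall:
    name: str       # hypothetical tool name, for illustration only
    mutating: bool  # True if the call would change cluster or disk state


def run_gated(call: ToolCall,
              execute: Callable[[ToolCall], None],
              approve: Callable[[ToolCall], bool]) -> str:
    """Run read-only calls directly; run mutating calls only on explicit approval."""
    if call.mutating and approve(call) is not True:
        return "declined"  # anything other than an explicit True is a refusal
    execute(call)
    return "ran"


executed: List[str] = []


def record(call: ToolCall) -> None:
    executed.append(call.name)


def deny_all(call: ToolCall) -> bool:
    return False


print(run_gated(ToolCall("inspect_gpus", mutating=False), record, deny_all))
print(run_gated(ToolCall("deploy_stack", mutating=True), record, deny_all))
print(executed)
```

Treating every non-`True` answer as a refusal means a missing or malformed response fails closed, which matches the intent stated above.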
Design of record and rationale: DESIGN.md. The engine / full agent:
llm-d-benchmarking-agent.