Which integrations are available for this server?

Integrates with Kubernetes clusters to deploy, run, monitor, and manage benchmark workloads, using the user's kubeconfig for authentication.

How do I use llm-d-bench-mcp?

1. Click on "Install Server". 2. Wait a few minutes for the server to deploy. Once ready, it will show a "Started" state. 3. In the chat, type @ followed by the MCP server name and your instructions, e.g., "@llm-d-bench-mcp benchmark llama 3 on 2x a100" That's it! The server will respond to your query, and you can continue using it as needed. Here is a step-by-step guide with screenshots.

llm-d-bench-mcp

by TalBenAmii

Overview Schema Related Servers Score Discussions

Python

Local

llm-d-bench-mcp — the llm-d benchmarking MCP server

Give Claude Code the ability to benchmark llm-d from plain English. Point it at this server and it can probe a cluster, propose a benchmark plan you approve, deploy an llm-d stack, run the benchmark, and explain the results — inside the same security sandbox and approval gates as the llm-d-benchmarking-agent app.

Supported now: the claude-agent-sdk provider (no API key) wired into Claude Code (the CLI) — the only path the installer sets up and verifies. The server speaks standard MCP, so other providers (anthropic, openai) and clients (Claude Desktop, Cursor, VS Code, OpenAI Codex CLI) are planned for a future release.

It is the agent's toolset re-exposed over the Model Context Protocol: 35 tools, 5 workflow prompts, and the agent's entire knowledge base (60+ playbooks & heuristics) as readable resources — so a generic agent behaves like a benchmarking expert, not a blank slate.

Transport is stdio / local single-user: the server runs on your machine against your kubeconfig, trusted like any local tool you launch. There is no network/remote mode (see Security & scope).

How it fits together

This repo is the thin MCP adapter (~500 lines: transport + approval/event adapters + the knowledge-exposure surface). The engine — the 35 tools, the security allowlist, and the knowledge/ playbooks — lives in the llm-d-benchmarking-agent repo, which the installer clones at its latest main and installs into the same virtualenv. The engine must run from a real checkout (it reads knowledge/, the allowlist, and the read-only sibling repos from disk at runtime), which is why it is not a pip dependency.

Related MCP server: Ollama MCP Server

Install (one command)

The installer fetches the agent repo (the engine), clones the read-only sibling repos, builds a virtualenv, installs the engine + this server into it, configures the claude-agent-sdk provider, and registers the server with Claude Code (or prints the config to paste yourself):

bash <(curl -fsSL https://raw.githubusercontent.com/TalBenAmii/llm-d-bench-mcp/main/scripts/install.sh)

Prefer to clone first? The same script runs from inside a checkout:

git clone https://github.com/TalBenAmii/llm-d-bench-mcp.git
cd llm-d-bench-mcp
./scripts/install.sh

It is idempotent (safe to re-run). The claude-agent-sdk provider needs no API key — it authenticates through your claude CLI login, so the only prerequisite is being logged in to the claude CLI — the installer offers to install the CLI for you if it's missing.

What your agent gets

Tools (35)

Group	What your agent can do	Examples
Sense & ground (read-only, auto-run)	Inspect the environment, GPUs, catalog, docs, knowledge	`probe_environment`, `advise_accelerators`, `list_catalog`, `discover_stack`, `search_knowledge`, `read_knowledge`, `fetch_key_docs`, `read_repo_doc`
Plan before you spend	Map a use case to a validated plan; check it fits	`propose_session_plan`, `check_capacity`, `estimate_run_duration`, `write_and_validate_config`, `generate_doe_experiment`
Deploy & run (approval-gated)	Set up repos, run the CLI, orchestrate Jobs & sweeps	`ensure_repos`, `run_setup`, `execute_llmdbenchmark`, `orchestrate_benchmark_run`, `orchestrate_sweep`, `provision_hf_secret`
Make sense of results (read-only)	Parse reports, compare runs/harnesses, track trends	`locate_and_parse_report`, `analyze_results`, `compare_reports`, `compare_harness_runs`, `aggregate_runs`, `result_history`
Observe & manage	Readiness checks, live cluster metrics, run management	`check_endpoint_readiness`, `observe_run_metrics`, `manage_orchestrated_runs`, `cancel_run`
Trust & reproduce	Provenance bundles, reproduce a run	`export_run_bundle`, `reproduce_run`

Numbers are only ever reported from a validated Benchmark Report v0.2 — never scraped from logs or invented.

Workflow prompts (5)

Entry points that drop your agent into the right playbook:

Prompt	Arguments	What it sets up
`benchmark_this_model`	`model?`, `goal?`, `slo?`	The full interview → preconditions → plan → run → explain workflow
`pick_deploy_path`	`model?`, `accelerator?`	Choosing a deploy path + accelerator guidance
`interpret_this_report`	`report_path?`	Parsing and explaining a benchmark report
`design_a_sweep`	`objective?`	Designing a design-of-experiments sweep
`goal_seek_to_slo`	`slo`	Iterative sweep rounds toward an SLO at best goodput

Resources & instructions

Every knowledge file is exposed as a doc://knowledge/<name> resource, so your agent can read the same playbooks the standalone agent reasons over. The server also advertises a role/workflow preamble in its MCP instructions ("probe first, ground in docs, propose a plan, run only with approval") that capable clients fold into their system prompt.

Manual config (Claude Code)

The installer does this for you; here's the block to wire it up by hand. The launch command is the console entry point created by installing this package — use its absolute path in the agent project's venv (the installer builds everything into that one venv):

claude mcp add llm-d-bench -s user -- /ABS/PATH/llm-d-benchmarking-agent-project/.venv/bin/llm-d-bench-mcp
# verify:  claude mcp list   (or /mcp inside a session)

A gated-model HF_TOKEN is optional — add it with -e HF_TOKEN=hf_xxx; the agent project's .env already carries the LLM provider config and is loaded regardless of how the server is launched. The module form works too once both packages are installed in the venv:

claude mcp add llm-d-bench -s user -- /ABS/PATH/.venv/bin/python -m llm_d_bench_mcp

Smoke-test it without a client using the official inspector:

npx @modelcontextprotocol/inspector /ABS/PATH/.venv/bin/llm-d-bench-mcp

Requirements & scope

Python ≥ 3.11 and git (the installer handles the venv via uv, or python3 -m venv).
LLM provider: claude-agent-sdk — no API key, authenticated via your claude CLI login.
Client: Claude Code (the CLI).
No cluster needed for the advisory tools and knowledge resources. The deploy/run/orchestrate tools need a reachable Kubernetes cluster + kubeconfig (and HF_TOKEN for gated models).
The engine repo and its read-only siblings (llm-d, llm-d-benchmark, llm-d-skills) must be on disk — the installer clones them all automatically.

Security & scope

stdio / local single-user only. The server has no network listener and no per-caller auth; it acts with your own kubeconfig. This is acceptable only for local use — HTTP/remote/shared transport is deliberately deferred, and "who may connect, whose credentials, what blast radius" become blocking questions before any such mode.
Approval is re-homed to your client. Every tool call is gated by your MCP client's own tool-permission prompt; the richer SessionPlan approval uses MCP elicitation where the client supports it (with a graceful fallback otherwise). Nothing mutating runs without your say-so — never a silent auto-approve.

Design of record and rationale: DESIGN.md. The engine / full agent: llm-d-benchmarking-agent.

This server cannot be installed

license - permissive license

quality - not tested

maintenance

How are these scores calculated?

Maintenance

–Maintainers

–Response time

–Release cycle

–Releases (12mo)

Commit activity

Resources

GitHub Repository

Need Help?

Related Servers

Unclaimed servers have limited discoverability.

Looking for Admin?

If you are the server author, to access and configure the admin panel.

Latest Blog Posts

Your AI Chatbot Just Exposed Your CEO's Salary to an Intern
By Om-Shree-0709 on July 2, 2026.
Agent Identity
MCP Security
OAuth Delegation
Why MCP Servers Need Execution Sandboxing (And Why Your Current Stack Isn't Enough)
By Om-Shree-0709 on June 30, 2026.
Agentic Ai
Prompt Injection
WebAssembly
Lightport: Open-Sourcing Glama's AI Gateway
By punkpeye on April 27, 2026.
OpenAI
open source

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/TalBenAmii/llm-d-bench-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server