MCP vLLM Benchmarking Tool

This is proof of concept on how to use MCP to interactively benchmark vLLM.

We are not new to benchmarking, read our blog:

This is just an exploration of possibilities with MCP.

Usage

Clone the repository
Add it to your MCP servers:

{ "mcpServers": { "mcp-vllm": { "command": "uv", "args": [ "run", "/Path/TO/mcp-vllm-benchmarking-tool/server.py" ] } } }

Then you can prompt for example like this:

Do a vllm benchmark for this endpoint: http://10.0.101.39:8888 benchmark the following model: deepseek-ai/DeepSeek-R1-Distill-Llama-8B run the benchmark 3 times with each 32 num prompts, then compare the results, but ignore the first iteration as that is just a warmup.

Related MCP server: mcp-llm