1. Click on "Install Server".
2. Wait a few minutes for the server to deploy. Once ready, it will show a "Started" state.
3. In the chat, type @ followed by the MCP server name and your instructions, e.g., "@MCP vLLM Benchmarking Tool benchmark deepseek-ai/DeepSeek-R1-Distill-Llama-8B on http://10.0.101.39:8888 with 32 prompts, 3 runs, skip warmup".
That's it! The server will respond to your query, and you can continue using it as needed.
# MCP vLLM Benchmarking Tool
This is a proof of concept of how to use MCP to interactively benchmark vLLM.
We are not new to benchmarking; read our blog:
This is just an exploration of possibilities with MCP.
## Usage
Clone the repository:
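A minimal sketch of this step; the URL below is a placeholder, so substitute the actual clone URL of this repository:

```bash
# Clone the MCP vLLM benchmarking server (placeholder URL)
git clone https://github.com/<user>/mcp-vllm-benchmarking-tool.git
cd mcp-vllm-benchmarking-tool
```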
Add it to your MCP servers:
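The entry follows the standard MCP client configuration format; the command and path below are assumptions, so adapt them to however the server is launched in your checkout:

```json
{
  "mcpServers": {
    "mcp-vllm": {
      "command": "uv",
      "args": ["run", "/path/to/mcp-vllm-benchmarking-tool/server.py"]
    }
  }
}
```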
Then you can prompt for example like this:
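For instance, the same style of prompt shown in the install steps above:

```
Benchmark deepseek-ai/DeepSeek-R1-Distill-Llama-8B on http://10.0.101.39:8888 with 32 prompts, 3 runs, skip warmup.
```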
## Todo
Due to occasional random output from vLLM, the tool may report that it found invalid JSON. I have not looked into this yet.
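This is not the tool's actual parsing code, but a sketch of one way to tolerate such lines, assuming the benchmark reads newline-delimited JSON from the vLLM output:

```python
import json


def parse_json_lines(raw: str) -> list[dict]:
    """Parse newline-delimited JSON, skipping lines that fail to decode."""
    results = []
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            results.append(json.loads(line))
        except json.JSONDecodeError:
            # vLLM occasionally emits non-JSON noise; skip it instead of failing.
            continue
    return results
```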