# run_benchmark
Measure vLLM server performance by running benchmarks with configurable request rates, duration, and datasets to evaluate model efficiency.
## Instructions
Run a performance benchmark against the vLLM server using GuideLLM
## Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| target | No | Target URL (default: from settings) | |
| model | No | Model to benchmark | |
| rate | No | Request rate (requests/sec) or 'sweep' | sweep |
| max_requests | No | Maximum number of requests | |
| max_seconds | No | Maximum duration in seconds | 120 |
| data | No | Dataset ('emulated' or path) | emulated |
| output_path | No | Path to save results | |
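For illustration, a minimal argument payload might look like this. The field names come from the schema above; the specific values and the defaulting sketch are assumptions that mirror the handler's `dict.get` fallbacks, not project documentation:

```python
# Hypothetical argument payload for run_benchmark; field names come from
# the schema above, the values are made up for illustration.
arguments = {
    "rate": "10",       # fixed rate instead of the default "sweep"
    "max_seconds": 60,  # stop after 60 s (schema default is 120)
}

# The handler resolves defaults with dict.get, so omitted fields fall
# back to the values shown in the Default column.
rate = arguments.get("rate", "sweep")
max_seconds = arguments.get("max_seconds", 120)
data = arguments.get("data", "emulated")
```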
## Implementation Reference
- Main handler function that executes the benchmark. It checks for GuideLLM availability, builds the command with arguments (target, model, rate, max_requests, max_seconds, data, output_path), runs the benchmark as a subprocess, captures stdout/stderr, and formats the results as TextContent.
```python
async def run_benchmark(arguments: dict[str, Any]) -> list[TextContent]:
    """
    Run a benchmark against the vLLM server using GuideLLM.

    Args:
        arguments: Dictionary containing:
            - target: Target URL (default: from settings)
            - model: Model to benchmark (optional)
            - rate: Request rate (requests/sec) or "sweep" for rate sweep
            - max_requests: Maximum number of requests
            - max_seconds: Maximum duration in seconds
            - data: Dataset to use ("emulated" or path to dataset)
            - output_path: Path to save results (optional)

    Returns:
        List of TextContent with benchmark results.
    """
    # Check if guidellm is available
    if not shutil.which("guidellm"):
        return [
            TextContent(
                type="text",
                text="Error: GuideLLM is not installed. Install it with:\n"
                "```bash\n"
                "pip install guidellm\n"
                "```",
            )
        ]

    settings = get_settings()
    target = arguments.get("target", f"{settings.base_url}/v1")
    model = arguments.get("model", settings.model)
    rate = arguments.get("rate", "sweep")
    max_requests = arguments.get("max_requests")
    max_seconds = arguments.get("max_seconds", 120)
    data = arguments.get("data", "emulated")
    output_path = arguments.get("output_path")

    # Build command
    cmd = [
        "guidellm",
        "--target", target,
        "--rate", str(rate),
        "--max-seconds", str(max_seconds),
        "--data", data,
    ]
    if model:
        cmd.extend(["--model", model])
    if max_requests:
        cmd.extend(["--max-requests", str(max_requests)])
    if output_path:
        cmd.extend(["--output-path", output_path])

    # Run benchmark
    try:
        process = await asyncio.create_subprocess_exec(
            *cmd,
            stdout=asyncio.subprocess.PIPE,
            stderr=asyncio.subprocess.PIPE,
        )
        # Streaming output in real time would be ideal, but for now we wait
        stdout, stderr = await process.communicate()

        stdout_str = stdout.decode("utf-8")
        stderr_str = stderr.decode("utf-8")

        if process.returncode != 0:
            return [
                TextContent(
                    type="text",
                    text=f"Benchmark failed with exit code {process.returncode}:\n\n"
                    f"**stderr:**\n```\n{stderr_str}\n```\n\n"
                    f"**stdout:**\n```\n{stdout_str}\n```",
                )
            ]

        # Format results
        result = "## Benchmark Results\n\n"
        result += f"**Command:** `{' '.join(cmd)}`\n\n"
        result += "**Output:**\n```\n"
        result += stdout_str
        result += "\n```"

        if output_path:
            result += f"\n\nResults saved to: `{output_path}`"

        return [TextContent(type="text", text=result)]
    except Exception as e:
        return [
            TextContent(
                type="text",
                text=f"Error running benchmark: {str(e)}",
            )
        ]
```

- Tool schema defining the input parameters for run_benchmark: target URL, model, rate, max_requests, max_seconds, data, and output_path, with their types, descriptions, and defaults.
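To make the command construction concrete, the snippet below reproduces just the cmd-building logic from the handler for a sample set of arguments. The target URL and model name are illustrative placeholders, not project defaults:

```python
# Rebuild the guidellm command the handler would produce; this duplicates
# only the command-construction logic, not the subprocess execution.
arguments = {
    "target": "http://localhost:8000/v1",  # example URL, not a real default
    "model": "my-model",                   # hypothetical model name
    "rate": "sweep",
}

cmd = [
    "guidellm",
    "--target", arguments["target"],
    "--rate", str(arguments.get("rate", "sweep")),
    "--max-seconds", str(arguments.get("max_seconds", 120)),
    "--data", arguments.get("data", "emulated"),
]
# Optional flags are appended only when the argument is present.
if arguments.get("model"):
    cmd.extend(["--model", arguments["model"]])
if arguments.get("max_requests"):
    cmd.extend(["--max-requests", str(arguments["max_requests"])])
```

With these sample arguments, `--model` is appended but `--max-requests` is not, since the latter was never supplied.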
```python
Tool(
    name="run_benchmark",
    description="Run a performance benchmark against the vLLM server using GuideLLM",
    inputSchema={
        "type": "object",
        "properties": {
            "target": {
                "type": "string",
                "description": "Target URL (default: from settings)",
            },
            "model": {
                "type": "string",
                "description": "Model to benchmark",
            },
            "rate": {
                "type": "string",
                "description": "Request rate (requests/sec) or 'sweep'",
                "default": "sweep",
            },
            "max_requests": {
                "type": "integer",
                "description": "Maximum number of requests",
            },
            "max_seconds": {
                "type": "integer",
                "description": "Maximum duration in seconds",
                "default": 120,
            },
            "data": {
                "type": "string",
                "description": "Dataset ('emulated' or path)",
                "default": "emulated",
            },
            "output_path": {
                "type": "string",
                "description": "Path to save results",
            },
        },
    },
),
```

- `src/vllm_mcp_server/server.py:365-366` (registration): tool registration in the `call_tool` handler that routes `run_benchmark` calls to the handler function.
```python
elif name == "run_benchmark":
    return await run_benchmark(arguments)
```

- `src/vllm_mcp_server/tools/__init__.py:14` (registration): import statement that makes `run_benchmark` available from the tools module.
```python
from vllm_mcp_server.tools.benchmark import run_benchmark
```

- `src/vllm_mcp_server/tools/__init__.py:28` (registration): export of `run_benchmark` in the `__all__` list for module visibility.

```python
"run_benchmark",
```
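Putting the registration pieces together, here is a minimal, self-contained sketch of the dispatch pattern: a `call_tool`-style function matches the tool name and awaits the matching handler. The stub handler and its return value are assumptions for illustration; the real handler shells out to GuideLLM as shown above.

```python
import asyncio

# Stub standing in for the real handler, which runs guidellm as a subprocess.
async def run_benchmark(arguments: dict) -> list[str]:
    return [f"benchmark against {arguments.get('target', '<settings>')}"]

# Simplified dispatcher mirroring the elif-based routing in call_tool.
async def call_tool(name: str, arguments: dict) -> list[str]:
    if name == "run_benchmark":
        return await run_benchmark(arguments)
    raise ValueError(f"Unknown tool: {name}")

result = asyncio.run(call_tool("run_benchmark", {"target": "http://localhost:8000/v1"}))
```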