run_benchmark

Measure vLLM server performance by running GuideLLM benchmarks with configurable request rate, duration, and dataset to evaluate model serving efficiency.

Instructions

Run a performance benchmark against the vLLM server using GuideLLM

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| `target` | No | Target URL (default: from settings) | |
| `model` | No | Model to benchmark | |
| `rate` | No | Request rate (requests/sec) or `sweep` | `sweep` |
| `max_requests` | No | Maximum number of requests | |
| `max_seconds` | No | Maximum duration in seconds | `120` |
| `data` | No | Dataset (`emulated` or a dataset path) | `emulated` |
| `output_path` | No | Path to save results | |
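All fields are optional; the handler substitutes defaults for anything omitted. A minimal sketch of that resolution, with illustrative values:

```python
# Illustrative arguments payload; every field is optional.
arguments = {
    "rate": "10",       # requests/sec as a string, or "sweep" (the default)
    "max_seconds": 60,  # cap the run at one minute
}

# Default resolution as performed by the handler:
rate = arguments.get("rate", "sweep")
max_seconds = arguments.get("max_seconds", 120)
data = arguments.get("data", "emulated")
target = arguments.get("target")  # None here; the server falls back to its configured base URL + "/v1"

print(rate, max_seconds, data)  # → 10 60 emulated
```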

Implementation Reference

  • Main handler function that executes the benchmark. It checks for GuideLLM availability, builds the command with arguments (target, model, rate, max_requests, max_seconds, data, output_path), runs the benchmark as a subprocess, captures stdout/stderr, and formats the results as TextContent.
    import asyncio
    import shutil
    from typing import Any

    from mcp.types import TextContent

    # get_settings is a project-local helper; its exact module path is not shown in this excerpt.

    async def run_benchmark(arguments: dict[str, Any]) -> list[TextContent]:
        """
        Run a benchmark against the vLLM server using GuideLLM.
    
        Args:
            arguments: Dictionary containing:
                - target: Target URL (default: from settings)
                - model: Model to benchmark (optional)
                - rate: Request rate (requests/sec) or "sweep" for rate sweep
                - max_requests: Maximum number of requests
                - max_seconds: Maximum duration in seconds
                - data: Dataset to use ("emulated" or path to dataset)
                - output_path: Path to save results (optional)
    
        Returns:
            List of TextContent with benchmark results.
        """
        # Check if guidellm is available
        if not shutil.which("guidellm"):
            return [
                TextContent(
                    type="text",
                    text="Error: GuideLLM is not installed. Install it with:\n"
                         "```bash\n"
                         "pip install guidellm\n"
                         "```",
                )
            ]
    
        settings = get_settings()
    
        target = arguments.get("target", f"{settings.base_url}/v1")
        model = arguments.get("model", settings.model)
        rate = arguments.get("rate", "sweep")
        max_requests = arguments.get("max_requests")
        max_seconds = arguments.get("max_seconds", 120)
        data = arguments.get("data", "emulated")
        output_path = arguments.get("output_path")
    
        # Build command
        cmd = [
            "guidellm",
            "--target", target,
            "--rate", str(rate),
            "--max-seconds", str(max_seconds),
            "--data", data,
        ]
    
        if model:
            cmd.extend(["--model", model])
    
        if max_requests:
            cmd.extend(["--max-requests", str(max_requests)])
    
        if output_path:
            cmd.extend(["--output-path", output_path])
    
        # Run benchmark
        try:
            process = await asyncio.create_subprocess_exec(
                *cmd,
                stdout=asyncio.subprocess.PIPE,
                stderr=asyncio.subprocess.PIPE,
            )
    
        # Streaming output in real time would be ideal; for now, wait for completion
            stdout, stderr = await process.communicate()
    
            stdout_str = stdout.decode("utf-8")
            stderr_str = stderr.decode("utf-8")
    
            if process.returncode != 0:
                return [
                    TextContent(
                        type="text",
                        text=f"Benchmark failed with exit code {process.returncode}:\n\n"
                             f"**stderr:**\n```\n{stderr_str}\n```\n\n"
                             f"**stdout:**\n```\n{stdout_str}\n```",
                    )
                ]
    
            # Format results
            result = "## Benchmark Results\n\n"
            result += f"**Command:** `{' '.join(cmd)}`\n\n"
            result += "**Output:**\n```\n"
            result += stdout_str
            result += "\n```"
    
            if output_path:
                result += f"\n\nResults saved to: `{output_path}`"
    
            return [TextContent(type="text", text=result)]
    
        except Exception as e:
            return [
                TextContent(
                    type="text",
                    text=f"Error running benchmark: {str(e)}",
                )
            ]
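The command the handler assembles can be reproduced standalone. A sketch that mirrors the construction logic above, assuming a local server at `http://localhost:8000` (the real fallback target comes from settings):

```python
def build_benchmark_cmd(arguments: dict) -> list[str]:
    """Mirror run_benchmark's command construction (illustrative sketch)."""
    # Assumed fallback target; the real handler reads it from settings.
    target = arguments.get("target", "http://localhost:8000/v1")
    cmd = [
        "guidellm",
        "--target", target,
        "--rate", str(arguments.get("rate", "sweep")),
        "--max-seconds", str(arguments.get("max_seconds", 120)),
        "--data", arguments.get("data", "emulated"),
    ]
    if arguments.get("model"):
        cmd.extend(["--model", arguments["model"]])
    if arguments.get("max_requests"):
        cmd.extend(["--max-requests", str(arguments["max_requests"])])
    if arguments.get("output_path"):
        cmd.extend(["--output-path", arguments["output_path"]])
    return cmd

cmd = build_benchmark_cmd({"model": "llama-3", "rate": 5})
print(" ".join(cmd))
# → guidellm --target http://localhost:8000/v1 --rate 5 --max-seconds 120 --data emulated --model llama-3
```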
  • Tool schema for run_benchmark declaring its input parameters (target, model, rate, max_requests, max_seconds, data, output_path) with their types, descriptions, and defaults.
    Tool(
        name="run_benchmark",
        description="Run a performance benchmark against the vLLM server using GuideLLM",
        inputSchema={
            "type": "object",
            "properties": {
                "target": {
                    "type": "string",
                    "description": "Target URL (default: from settings)",
                },
                "model": {
                    "type": "string",
                    "description": "Model to benchmark",
                },
                "rate": {
                    "type": "string",
                    "description": "Request rate (requests/sec) or 'sweep'",
                    "default": "sweep",
                },
                "max_requests": {
                    "type": "integer",
                    "description": "Maximum number of requests",
                },
                "max_seconds": {
                    "type": "integer",
                    "description": "Maximum duration in seconds",
                    "default": 120,
                },
                "data": {
                    "type": "string",
                    "description": "Dataset ('emulated' or path)",
                    "default": "emulated",
                },
                "output_path": {
                    "type": "string",
                    "description": "Path to save results",
                },
            },
        },
    ),
  • Tool registration in the call_tool handler that routes 'run_benchmark' calls to the handler function.
    elif name == "run_benchmark":
        return await run_benchmark(arguments)
  • Import statement that makes run_benchmark available from the tools module.
    from vllm_mcp_server.tools.benchmark import run_benchmark
  • Export of run_benchmark in __all__ list for module visibility.
    "run_benchmark",
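Putting the registration pieces together, the dispatch pattern can be sketched end to end (the stand-in handler here is hypothetical; the real one shells out to GuideLLM):

```python
import asyncio

# Hypothetical stand-in for the real handler, which runs GuideLLM as a subprocess.
async def run_benchmark(arguments: dict) -> list[str]:
    return [f"benchmark requested with {arguments}"]

async def call_tool(name: str, arguments: dict):
    # Route tool calls by name, as in the server's call_tool handler.
    if name == "run_benchmark":
        return await run_benchmark(arguments)
    raise ValueError(f"Unknown tool: {name}")

result = asyncio.run(call_tool("run_benchmark", {"rate": "sweep"}))
print(result[0])  # → benchmark requested with {'rate': 'sweep'}
```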
