
llm_benchmark

Run benchmarks with multiple prompts to evaluate LLM performance metrics, including latency and generation throughput, for model comparison and testing.

Instructions

Runs a benchmark with multiple prompts to evaluate model performance.

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| baseURL | No | URL of the OpenAI-compatible server (e.g. http://localhost:1234/v1, http://localhost:11434/v1) | |
| apiKey | No | API key (required for OpenAI/Azure, optional for local servers) | |
| prompts | Yes | List of prompts for the benchmark | |
| model | No | Model ID | |
| maxTokens | No | Max tokens per response | 256 |
| temperature | No | Temperature | 0.7 |
| topP | No | Top P for nucleus sampling | |
| runs | No | Runs per prompt | 1 |
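
For illustration, a llm_benchmark call might pass arguments like the following. This is a hypothetical payload: the baseURL matches the local-server examples above, while the model ID, prompt texts, and run count are placeholders, not values defined by the schema.

    // Hypothetical llm_benchmark arguments (placeholder values).
    const exampleArgs = {
      baseURL: "http://localhost:1234/v1",
      prompts: [
        "Explain recursion in one paragraph.",
        "Summarize the benefits of unit testing.",
      ],
      model: "llama-3.1-8b-instruct",
      maxTokens: 256,
      temperature: 0.7,
      runs: 2,
    };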

Implementation Reference

  • Main handler function for the llm_benchmark tool. Validates input with BenchmarkSchema, creates an LLMClient instance, executes the benchmark via client.runBenchmark, and formats the results into a markdown report with summary stats and detailed per-prompt metrics.
    async llm_benchmark(args: z.infer<typeof BenchmarkSchema>) {
      const client = getClient(args);
      const { results, summary } = await client.runBenchmark(args.prompts, {
        model: args.model,
        maxTokens: args.maxTokens,
        temperature: args.temperature,
        runs: args.runs,
      });
      let output = `# 📊 Benchmark Results\n\n`;
      output += `## Resumen\n`;
      output += `- **Prompts totales:** ${summary.totalPrompts}\n`;
      output += `- **Latencia promedio:** ${summary.avgLatencyMs.toFixed(2)} ms\n`;
      output += `- **Tokens/segundo promedio:** ${summary.avgTokensPerSecond.toFixed(2)}\n`;
      output += `- **Total tokens generados:** ${summary.totalTokensGenerated}\n\n`;
      output += `## Resultados Detallados\n\n`;
      results.forEach((r, i) => {
        output += `### Prompt ${i + 1}\n`;
        output += `> ${r.prompt.substring(0, 100)}${r.prompt.length > 100 ? "..." : ""}\n\n`;
        output += `- Latencia: ${r.latencyMs} ms\n`;
        output += `- Tokens: ${r.completionTokens}\n`;
        output += `- Velocidad: ${r.tokensPerSecond.toFixed(2)} tok/s\n\n`;
      });
      return { content: [{ type: "text" as const, text: output }] };
    },
  • Zod schema defining the input parameters for the llm_benchmark tool, including the required prompts array and optional model, token limit, temperature, and run count; a validation sketch showing how the defaults apply follows this list.
    export const BenchmarkSchema = ConnectionConfigSchema.extend({
      prompts: z.array(z.string()).describe("Lista de prompts para el benchmark"),
      model: z.string().optional().describe("ID del modelo a usar"),
      maxTokens: z.number().optional().default(256).describe("Máximo de tokens por respuesta"),
      temperature: z.number().optional().default(0.7).describe("Temperatura"),
      topP: z.number().optional().describe("Top P para nucleus sampling"),
      runs: z.number().optional().default(1).describe("Número de ejecuciones por prompt"),
    });
  • src/tools.ts:116-136 (registration)
    MCP tool registration entry in the exported tools array, specifying the name, description, and inputSchema for llm_benchmark to be returned by ListToolsRequest.
    { name: "llm_benchmark", description: "Ejecuta un benchmark con múltiples prompts para evaluar rendimiento del modelo", inputSchema: { type: "object" as const, properties: { ...connectionProperties, prompts: { type: "array", items: { type: "string" }, description: "Lista de prompts para el benchmark", }, model: { type: "string", description: "ID del modelo" }, maxTokens: { type: "number", description: "Max tokens por respuesta (default: 256)" }, temperature: { type: "number", description: "Temperatura (default: 0.7)" }, topP: { type: "number", description: "Top P para nucleus sampling" }, runs: { type: "number", description: "Ejecuciones por prompt (default: 1)" }, }, required: ["prompts"], }, },
  • src/index.ts:64-65 (registration)
    Dispatch logic in the MCP CallToolRequest handler switch statement that routes execution to the llm_benchmark handler function.
    case "llm_benchmark": return await toolHandlers.llm_benchmark(args as any);
  • src/index.ts:42-44 (registration)
    MCP ListToolsRequest handler that returns the tools array containing the llm_benchmark tool definition; an end-to-end client call sketch follows this list.
    server.setRequestHandler(ListToolsRequestSchema, async () => {
      return { tools };
    });
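
As a minimal sketch of the validation step, the snippet below assumes BenchmarkSchema is exported as shown above, that ConnectionConfigSchema only contributes the optional baseURL and apiKey fields from the input table, and that the import path is hypothetical:

    import { BenchmarkSchema } from "./tools.js"; // hypothetical import path

    // Only the required field is supplied; Zod fills in the declared defaults.
    const parsed = BenchmarkSchema.parse({
      prompts: ["Explain recursion in one paragraph."],
    });
    // parsed.maxTokens === 256, parsed.temperature === 0.7, parsed.runs === 1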
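
And here is a sketch of exercising the registered tool end to end through the MCP TypeScript SDK client, assuming the server is launched over stdio; the command and arguments are placeholders for however this server is actually started:

    import { Client } from "@modelcontextprotocol/sdk/client/index.js";
    import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

    // Spawn the MCP server over stdio (command/args are placeholders).
    const transport = new StdioClientTransport({ command: "node", args: ["dist/index.js"] });
    const client = new Client({ name: "benchmark-demo", version: "1.0.0" });
    await client.connect(transport);

    // The ListToolsRequest handler above advertises llm_benchmark.
    const { tools } = await client.listTools();
    console.log(tools.map((t) => t.name));

    // The CallToolRequest switch dispatches this to toolHandlers.llm_benchmark.
    const result = await client.callTool({
      name: "llm_benchmark",
      arguments: {
        baseURL: "http://localhost:1234/v1",
        prompts: ["Explain recursion in one paragraph."],
        maxTokens: 128,
      },
    });
    console.log(result.content);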
