llm_benchmark
Runs a benchmark over multiple prompts to evaluate LLM performance metrics, including response latency and token throughput, for model comparison and testing.
Instructions
Runs a benchmark with multiple prompts to evaluate model performance.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| baseURL | No | URL of the OpenAI-compatible server (e.g. http://localhost:1234/v1, http://localhost:11434/v1) | |
| apiKey | No | API key (required for OpenAI/Azure, optional for local servers) | |
| prompts | Yes | List of prompts to run in the benchmark | |
| model | No | Model ID | |
| maxTokens | No | Max tokens per response | 256 |
| temperature | No | Sampling temperature | 0.7 |
| topP | No | Top-p value for nucleus sampling | |
| runs | No | Number of runs per prompt | 1 |
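For orientation, a call might pass arguments like the following. Every value here is illustrative: the endpoint matches one of the example URLs in the table above, and the model ID is hypothetical.

```typescript
// Illustrative llm_benchmark arguments; the endpoint and model ID are
// placeholders, not values the tool prescribes.
const benchmarkArgs = {
  baseURL: "http://localhost:11434/v1", // e.g. a local Ollama-compatible endpoint
  prompts: [
    "Summarize the TCP three-way handshake in two sentences.",
    "Write a haiku about garbage collection.",
  ],
  model: "llama3", // hypothetical model ID; depends on what the server serves
  maxTokens: 128,  // overrides the default of 256
  runs: 3,         // each prompt is executed three times
  // temperature defaults to 0.7; topP is omitted, so the server default applies
};
```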
Implementation Reference
- `src/tools.ts:306-332` (handler): Main handler for the `llm_benchmark` tool. It accepts arguments typed by `BenchmarkSchema`, obtains an `LLMClient` via `getClient`, runs the benchmark through `client.runBenchmark`, and formats the results into a markdown report with summary statistics and detailed per-prompt metrics. (A hedged sketch of how those metrics might be computed follows this list.)

  ```typescript
  async llm_benchmark(args: z.infer<typeof BenchmarkSchema>) {
    const client = getClient(args);
    // note: args.topP is accepted by the schema but not forwarded here
    const { results, summary } = await client.runBenchmark(args.prompts, {
      model: args.model,
      maxTokens: args.maxTokens,
      temperature: args.temperature,
      runs: args.runs,
    });

    let output = `# 📊 Benchmark Results\n\n`;
    output += `## Resumen\n`;
    output += `- **Prompts totales:** ${summary.totalPrompts}\n`;
    output += `- **Latencia promedio:** ${summary.avgLatencyMs.toFixed(2)} ms\n`;
    output += `- **Tokens/segundo promedio:** ${summary.avgTokensPerSecond.toFixed(2)}\n`;
    output += `- **Total tokens generados:** ${summary.totalTokensGenerated}\n\n`;
    output += `## Resultados Detallados\n\n`;

    results.forEach((r, i) => {
      output += `### Prompt ${i + 1}\n`;
      output += `> ${r.prompt.substring(0, 100)}${r.prompt.length > 100 ? "..." : ""}\n\n`;
      output += `- Latencia: ${r.latencyMs} ms\n`;
      output += `- Tokens: ${r.completionTokens}\n`;
      output += `- Velocidad: ${r.tokensPerSecond.toFixed(2)} tok/s\n\n`;
    });

    return { content: [{ type: "text" as const, text: output }] };
  },
  ```
- `src/tools.ts:31-38` (schema): Zod schema defining the input parameters for the `llm_benchmark` tool: a required `prompts` array plus optional model ID, max-token limit, temperature, top-p, and run count.

  ```typescript
  export const BenchmarkSchema = ConnectionConfigSchema.extend({
    prompts: z.array(z.string()).describe("Lista de prompts para el benchmark"),
    model: z.string().optional().describe("ID del modelo a usar"),
    maxTokens: z.number().optional().default(256).describe("Máximo de tokens por respuesta"),
    temperature: z.number().optional().default(0.7).describe("Temperatura"),
    topP: z.number().optional().describe("Top P para nucleus sampling"),
    runs: z.number().optional().default(1).describe("Número de ejecuciones por prompt"),
  });
  ```
- `src/tools.ts:116-136` (registration): Entry in the exported `tools` array declaring the name, description, and `inputSchema` that `ListToolsRequest` responses advertise for `llm_benchmark`.

  ```typescript
  {
    name: "llm_benchmark",
    description: "Ejecuta un benchmark con múltiples prompts para evaluar rendimiento del modelo",
    inputSchema: {
      type: "object" as const,
      properties: {
        ...connectionProperties,
        prompts: {
          type: "array",
          items: { type: "string" },
          description: "Lista de prompts para el benchmark",
        },
        model: { type: "string", description: "ID del modelo" },
        maxTokens: { type: "number", description: "Max tokens por respuesta (default: 256)" },
        temperature: { type: "number", description: "Temperatura (default: 0.7)" },
        topP: { type: "number", description: "Top P para nucleus sampling" },
        runs: { type: "number", description: "Ejecuciones por prompt (default: 1)" },
      },
      required: ["prompts"],
    },
  },
  ```
- `src/index.ts:64-65` (dispatch): Case in the `CallToolRequest` handler's switch statement that routes execution to the `llm_benchmark` handler.

  ```typescript
  case "llm_benchmark":
    return await toolHandlers.llm_benchmark(args as any);
  ```
- `src/index.ts:42-44` (registration): `ListToolsRequest` handler that returns the `tools` array containing the `llm_benchmark` definition. (An end-to-end invocation sketch follows this list.)

  ```typescript
  server.setRequestHandler(ListToolsRequestSchema, async () => {
    return { tools };
  });
  ```
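`LLMClient.runBenchmark` itself is not quoted above. As a minimal sketch, assuming the client wraps the official `openai` SDK and times each chat-completion request, the per-prompt fields the handler reads (`latencyMs`, `completionTokens`, `tokensPerSecond`) and the summary keys it prints could be derived as follows. Only those field names come from the handler; the function shapes and SDK usage are assumptions.

```typescript
import OpenAI from "openai";

// Hedged sketch: one way to produce the per-prompt metrics the handler reads.
async function benchmarkOne(client: OpenAI, model: string, prompt: string, maxTokens: number) {
  const start = Date.now();
  const res = await client.chat.completions.create({
    model,
    messages: [{ role: "user", content: prompt }],
    max_tokens: maxTokens,
  });
  const latencyMs = Date.now() - start;
  const completionTokens = res.usage?.completion_tokens ?? 0;
  return {
    prompt,
    latencyMs,
    completionTokens,
    // throughput over the whole request, in tokens per second
    tokensPerSecond: latencyMs > 0 ? completionTokens / (latencyMs / 1000) : 0,
  };
}

// Aggregation matching the summary fields the handler prints (assumed shape).
function summarize(results: Awaited<ReturnType<typeof benchmarkOne>>[]) {
  return {
    totalPrompts: results.length,
    avgLatencyMs: results.reduce((s, r) => s + r.latencyMs, 0) / results.length,
    avgTokensPerSecond: results.reduce((s, r) => s + r.tokensPerSecond, 0) / results.length,
    totalTokensGenerated: results.reduce((s, r) => s + r.completionTokens, 0),
  };
}
```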
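To see the tool end to end, a minimal MCP client over stdio might invoke it as below. This is a sketch under assumptions: the server's built entry point (`dist/index.js`) is a guess about this repo's build output, and the client side uses the `@modelcontextprotocol/sdk` client API rather than anything quoted above.

```typescript
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

// Assumed entry point; adjust command/args to the server's actual build output.
const transport = new StdioClientTransport({ command: "node", args: ["dist/index.js"] });
const client = new Client({ name: "benchmark-caller", version: "1.0.0" });
await client.connect(transport);

const result = await client.callTool({
  name: "llm_benchmark",
  arguments: {
    baseURL: "http://localhost:1234/v1", // e.g. a local LM Studio server
    prompts: ["Explain nucleus sampling in one paragraph."],
    runs: 2,
  },
});

// The handler returns a single text content item: the markdown report.
console.log(result.content);
```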