# llm_quality_report
Generate comprehensive quality reports for LLMs by evaluating benchmark performance, coherence, and capabilities through OpenAI-compatible APIs.
## Instructions
Generates a complete quality report for the model, including benchmark, coherence, and capability results.
## Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| baseURL | No | URL of the OpenAI-compatible server (e.g. http://localhost:1234/v1, http://localhost:11434/v1) | |
| apiKey | No | API key (required for OpenAI/Azure, optional for local servers) | |
| model | No | ID of the model to evaluate | |
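
For illustration, a minimal arguments object matching this schema might look like the sketch below. The `QualityReportArgs` interface name and the model ID are hypothetical; only the three fields come from the table above.

```typescript
// Hypothetical shape of the tool's arguments; all fields are optional.
interface QualityReportArgs {
  baseURL?: string; // OpenAI-compatible server URL
  apiKey?: string;  // only needed for OpenAI/Azure
  model?: string;   // ID of the model to evaluate
}

// Evaluating a model on a local server: no apiKey required.
const args: QualityReportArgs = {
  baseURL: "http://localhost:1234/v1",
  model: "my-local-model", // hypothetical model ID
};

console.log(Object.keys(args).join(",")); // baseURL,model
```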
## Implementation Reference
- src/tools.ts:430-486 (handler): The core handler for the `llm_quality_report` tool. It runs a benchmark, a coherence evaluation, and a capabilities test against the specified model, compiles the metrics, and returns a formatted Markdown report with an overall quality score.

  ```typescript
  async llm_quality_report(args: z.infer<typeof CapabilitiesSchema>) {
    const client = getClient(args);
    let output = `# 📋 Reporte de Calidad del Modelo\n\n`;
    output += `*Generando reporte completo...*\n\n`;

    // 1. Benchmark básico
    const benchmarkPrompts = [
      "Explica qué es la inteligencia artificial en una oración.",
      "¿Cuánto es 25 * 4?",
      "Traduce 'Hello World' al español.",
    ];
    const benchmark = await client.runBenchmark(benchmarkPrompts, {
      model: args.model,
      maxTokens: 100,
    });
    output += `## 📊 Benchmark de Rendimiento\n\n`;
    output += `- Latencia promedio: **${benchmark.summary.avgLatencyMs.toFixed(0)} ms**\n`;
    output += `- Velocidad: **${benchmark.summary.avgTokensPerSecond.toFixed(2)} tokens/s**\n`;
    output += `- Tokens generados: ${benchmark.summary.totalTokensGenerated}\n\n`;

    // 2. Coherencia
    const coherence = await client.evaluateCoherence(
      "¿Cuál es el sentido de la vida?",
      { model: args.model, runs: 3, temperature: 0.7 }
    );
    output += `## 🎯 Coherencia\n\n`;
    output += `- Consistencia: **${(coherence.consistency * 100).toFixed(1)}%**\n`;
    output += `- Longitud promedio de respuesta: ${coherence.avgLength.toFixed(0)} chars\n\n`;

    // 3. Capacidades
    const capabilities = await client.testCapabilities({ model: args.model });
    output += `## 🧠 Capacidades\n\n`;
    output += `| Área | Latencia | Velocidad |\n`;
    output += `|------|----------|----------|\n`;
    const areas = ["reasoning", "coding", "creative", "factual", "instruction"] as const;
    for (const area of areas) {
      const r = capabilities[area];
      output += `| ${area} | ${r.latencyMs}ms | ${r.tokensPerSecond.toFixed(1)} tok/s |\n`;
    }

    output += `\n## 📈 Puntuación General\n\n`;
    const avgSpeed = benchmark.summary.avgTokensPerSecond;
    const speedScore = Math.min(100, avgSpeed * 2);
    const coherenceScore = coherence.consistency * 100;
    const overallScore = (speedScore + coherenceScore) / 2;
    output += `- Velocidad: ${speedScore.toFixed(0)}/100\n`;
    output += `- Coherencia: ${coherenceScore.toFixed(0)}/100\n`;
    output += `- **Puntuación Total: ${overallScore.toFixed(0)}/100**\n`;

    return { content: [{ type: "text" as const, text: output }] };
  },
  ```
- src/tools.ts:182-194 (schema): MCP tool schema definition for `llm_quality_report`, including name, description, and input schema (model ID plus optional connection properties). Part of the exported `tools` array.

  ```typescript
  {
    name: "llm_quality_report",
    description:
      "Genera un reporte completo de calidad del modelo incluyendo benchmark, coherencia y capacidades",
    inputSchema: {
      type: "object" as const,
      properties: {
        ...connectionProperties,
        model: { type: "string", description: "ID del modelo a evaluar" },
      },
      required: [],
    },
  },
  ]; // closes the exported tools array
  ```
- src/index.ts:42-44 (registration): Registers the list-tools handler, which returns the array of tool definitions, including `llm_quality_report`.

  ```typescript
  server.setRequestHandler(ListToolsRequestSchema, async () => {
    return { tools };
  });
  ```
- src/index.ts:76-77 (registration): Dispatch case in the call-tool request handler that routes execution to the `llm_quality_report` handler.

  ```typescript
  case "llm_quality_report":
    return await toolHandlers.llm_quality_report(args as any);
  ```
- src/tools.ts:47-49 (schema): Zod schema used for type inference of the `llm_quality_report` handler arguments, matching the tool's input schema.

  ```typescript
  export const CapabilitiesSchema = ConnectionConfigSchema.extend({
    model: z.string().optional().describe("ID del modelo a usar"),
  });
  ```
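
The overall score computed at the end of the handler can be isolated into a small standalone sketch. The `overallScore` function name is an assumption, but the formula mirrors the code: speed is doubled and capped at 100, then averaged with the coherence percentage, so 50 tokens/s already earns a full speed score.

```typescript
// Sketch of the handler's scoring formula (function name is hypothetical).
function overallScore(avgTokensPerSecond: number, consistency: number): number {
  const speedScore = Math.min(100, avgTokensPerSecond * 2); // caps at 50 tok/s
  const coherenceScore = consistency * 100;
  return (speedScore + coherenceScore) / 2;
}

console.log(overallScore(30, 0.9)); // (60 + 90) / 2 = 75
console.log(overallScore(80, 0.5)); // speed caps at 100: (100 + 50) / 2 = 75
```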
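
The dispatch case in src/index.ts follows a simple name-to-handler routing pattern. A minimal sketch, with a synchronous stub handler standing in for the real async report generator:

```typescript
// Name-keyed handler map consulted by a small router. Handlers are kept
// synchronous for brevity (the real ones are async), and the body is a stub.
type ToolHandler = (args: unknown) => { text: string };

const toolHandlers: Record<string, ToolHandler> = {
  llm_quality_report: () => ({ text: "# 📋 Reporte de Calidad del Modelo" }),
};

function callTool(name: string, args: unknown): { text: string } {
  const handler = toolHandlers[name];
  if (!handler) throw new Error(`Unknown tool: ${name}`);
  return handler(args);
}

console.log(callTool("llm_quality_report", {}).text.startsWith("#")); // true
```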