llm_evaluate_coherence
Assesses how consistently an LLM responds by running the same prompt multiple times and reporting basic coherence metrics.
Instructions
Evaluates model coherence by running the same prompt multiple times.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| baseURL | No | URL of an OpenAI-compatible server (e.g. http://localhost:1234/v1, http://localhost:11434/v1) | |
| apiKey | No | API key (required for OpenAI/Azure, optional for local servers) | |
| prompt | Yes | Prompt to evaluate | |
| model | No | Model ID | |
| runs | No | Number of runs | 3 |
| temperature | No | Sampling temperature | 0.7 |
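
For orientation, here is a minimal, hypothetical set of call arguments; the endpoint, model ID, and prompt below are illustrative placeholders rather than required values:

```typescript
// Hypothetical arguments for a call to llm_evaluate_coherence.
// Only `prompt` is required; everything else falls back to the defaults above.
const args = {
  baseURL: "http://localhost:11434/v1", // any OpenAI-compatible server
  prompt: "Explain what a vector embedding is in one paragraph.",
  model: "llama3",                      // optional; placeholder model ID
  runs: 5,                              // defaults to 3 if omitted
  temperature: 0.7,                     // defaults to 0.7 if omitted
};
```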
Implementation Reference
- src/tools.ts:334-354 (handler): Main tool handler function that evaluates model coherence by calling LLMClient.evaluateCoherence and formatting the results into a markdown report.

  ```typescript
  async llm_evaluate_coherence(args: z.infer<typeof CoherenceSchema>) {
    const client = getClient(args);
    const result = await client.evaluateCoherence(args.prompt, {
      model: args.model,
      runs: args.runs,
      temperature: args.temperature,
    });

    let output = `# 🎯 Evaluación de Coherencia\n\n`;
    output += `**Prompt:** ${args.prompt}\n\n`;
    output += `**Métricas:**\n`;
    output += `- Consistencia: ${(result.consistency * 100).toFixed(1)}%\n`;
    output += `- Longitud promedio: ${result.avgLength.toFixed(0)} caracteres\n\n`;
    output += `**Respuestas:**\n\n`;

    result.responses.forEach((r, i) => {
      output += `---\n**Respuesta ${i + 1}:**\n${r}\n\n`;
    });

    return { content: [{ type: "text" as const, text: output }] };
  },
  ```
- src/tools.ts:40-45 (schema): Zod schema for input validation of the llm_evaluate_coherence tool parameters (a usage sketch follows after this list).

  ```typescript
  export const CoherenceSchema = ConnectionConfigSchema.extend({
    prompt: z.string().describe("Prompt para evaluar coherencia"),
    model: z.string().optional().describe("ID del modelo a usar"),
    runs: z.number().optional().default(3).describe("Número de ejecuciones"),
    temperature: z.number().optional().default(0.7).describe("Temperatura"),
  });
  ```
- src/tools.ts:137-151 (registration): Tool registration entry in the tools array, including name, description, and input schema for MCP list tools.

  ```typescript
  {
    name: "llm_evaluate_coherence",
    description: "Evalúa la coherencia del modelo ejecutando el mismo prompt múltiples veces",
    inputSchema: {
      type: "object" as const,
      properties: {
        ...connectionProperties,
        prompt: { type: "string", description: "Prompt para evaluar" },
        model: { type: "string", description: "ID del modelo" },
        runs: { type: "number", description: "Número de ejecuciones (default: 3)" },
        temperature: { type: "number", description: "Temperatura (default: 0.7)" },
      },
      required: ["prompt"],
    },
  },
  ```
- src/index.ts:67-68 (registration): Dispatch case in the main CallToolRequest handler that routes to the tool handler.

  ```typescript
  case "llm_evaluate_coherence":
    return await toolHandlers.llm_evaluate_coherence(args as any);
  ```
- src/llm-client.ts:265-303 (helper): Core helper method in LLMClient that runs the prompt multiple times, computes response consistency based on length variance, and returns the metrics (see the standalone metric sketch after this list).

  ```typescript
  async evaluateCoherence(
    prompt: string,
    options: {
      model?: string;
      runs?: number;
      temperature?: number;
    } = {}
  ): Promise<{
    responses: string[];
    consistency: number;
    avgLength: number;
  }> {
    const runs = options.runs || 3;
    const responses: string[] = [];

    for (let i = 0; i < runs; i++) {
      const result = await this.chat(prompt, {
        model: options.model,
        temperature: options.temperature ?? 0.7,
      });
      responses.push(result.response);
    }

    // Calcular similitud básica entre respuestas
    const avgLength =
      responses.reduce((sum, r) => sum + r.length, 0) / responses.length;

    // Calcular consistencia basada en longitud similar
    const lengthVariance =
      responses.reduce((sum, r) => {
        return sum + Math.pow(r.length - avgLength, 2);
      }, 0) / responses.length;

    const consistency = Math.max(0, 1 - (Math.sqrt(lengthVariance) / avgLength));

    return {
      responses,
      consistency,
      avgLength,
    };
  }
  ```
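
As noted in the schema entry above, the declared defaults can be observed by parsing a minimal input directly. This is a sketch, assuming CoherenceSchema is importable from src/tools.ts and that the connection fields (baseURL, apiKey) are optional, as the input table indicates:

```typescript
import { CoherenceSchema } from "./tools"; // path assumed relative to src/

// Only `prompt` is supplied; Zod fills in the declared defaults.
const parsed = CoherenceSchema.parse({
  prompt: "Summarize the difference between TCP and UDP.",
});

console.log(parsed.runs);        // 3
console.log(parsed.temperature); // 0.7
```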
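
The consistency score returned by evaluateCoherence is purely length-based: one minus the coefficient of variation of the response lengths, clamped at zero. A self-contained sketch of that computation, using made-up lengths for illustration:

```typescript
// Length-based consistency, mirroring evaluateCoherence:
// consistency = max(0, 1 - stddev(lengths) / mean(lengths)).
function lengthConsistency(lengths: number[]): { consistency: number; avgLength: number } {
  const avgLength = lengths.reduce((sum, n) => sum + n, 0) / lengths.length;
  const variance =
    lengths.reduce((sum, n) => sum + Math.pow(n - avgLength, 2), 0) / lengths.length;
  const consistency = Math.max(0, 1 - Math.sqrt(variance) / avgLength);
  return { consistency, avgLength };
}

// Made-up response lengths in characters: mean 500, stddev ≈ 40.8,
// so consistency ≈ 0.918 (reported as 91.8% in the tool output).
console.log(lengthConsistency([450, 500, 550]));
```

Note that the metric compares only response lengths, not content, so it is a rough stability signal rather than a measure of semantic agreement.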