# llm_test_capabilities
Test LLM capabilities across reasoning, coding, creativity, facts, and instruction-following to evaluate model performance and quality.
## Instructions

Tests the model's capabilities across different areas: reasoning, coding, creativity, facts, and instruction-following.
## Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| baseURL | No | URL of the OpenAI-compatible server (e.g. http://localhost:1234/v1, http://localhost:11434/v1) | |
| apiKey | No | API key (required for OpenAI/Azure, optional for local servers) | |
| model | No | Model ID | |
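For illustration, a typical argument object for a local OpenAI-compatible server might look like the following sketch (all values are hypothetical; every field is optional per the schema above):

```typescript
// Hypothetical arguments for llm_test_capabilities.
const args = {
  baseURL: "http://localhost:1234/v1", // e.g. an LM Studio or Ollama endpoint
  // apiKey omitted: local servers typically don't require one
  model: "qwen2.5-7b-instruct",        // hypothetical model ID
};
```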
## Implementation Reference
- **src/tools.ts:356-380 (handler)**: The primary handler function for the `llm_test_capabilities` tool. It invokes `LLMClient.testCapabilities` and formats the benchmark results into a structured Markdown report organized by capability category (reasoning, coding, creative, factual, instruction).

  ```typescript
  async llm_test_capabilities(args: z.infer<typeof CapabilitiesSchema>) {
    const client = getClient(args);
    const results = await client.testCapabilities({ model: args.model });

    let output = `# 🧠 Test de Capacidades del Modelo\n\n`;

    const categories = [
      { key: "reasoning", name: "Razonamiento", emoji: "🤔" },
      { key: "coding", name: "Programación", emoji: "💻" },
      { key: "creative", name: "Creatividad", emoji: "🎨" },
      { key: "factual", name: "Conocimiento Factual", emoji: "📚" },
      { key: "instruction", name: "Seguir Instrucciones", emoji: "📋" },
    ];

    for (const cat of categories) {
      const r = results[cat.key as keyof typeof results];
      output += `## ${cat.emoji} ${cat.name}\n\n`;
      output += `**Prompt:** ${r.prompt}\n\n`;
      output += `**Respuesta:**\n${r.response}\n\n`;
      output += `*Latencia: ${r.latencyMs}ms | Tokens/s: ${r.tokensPerSecond.toFixed(2)}*\n\n`;
      output += `---\n\n`;
    }

    return { content: [{ type: "text" as const, text: output }] };
  },
  ```
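  The handler relies on `CapabilitiesSchema` and `getClient`, both defined elsewhere in `src/tools.ts` and not shown here. A minimal sketch of what `CapabilitiesSchema` plausibly looks like, assuming zod and inferring the fields from the input schema table above:

  ```typescript
  import { z } from "zod";

  // Assumed shape, inferred from the documented input schema;
  // the real definition lives in src/tools.ts.
  const CapabilitiesSchema = z.object({
    baseURL: z.string().optional(),
    apiKey: z.string().optional(),
    model: z.string().optional(),
  });
  ```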
- **src/llm-client.ts:308-334 (helper)**: Helper method in `LLMClient` that executes the core capability tests using predefined prompts for reasoning, coding, creativity, factual knowledge, and instruction-following, returning a `BenchmarkResult` for each.

  ```typescript
  async testCapabilities(
    options: { model?: string } = {}
  ): Promise<{
    reasoning: BenchmarkResult;
    coding: BenchmarkResult;
    creative: BenchmarkResult;
    factual: BenchmarkResult;
    instruction: BenchmarkResult;
  }> {
    const tests = {
      reasoning:
        "Si todos los gatos tienen bigotes y Fluffy es un gato, ¿tiene Fluffy bigotes? Explica tu razonamiento paso a paso.",
      coding:
        "Escribe una función en Python que calcule el factorial de un número de forma recursiva.",
      creative: "Escribe un haiku sobre la inteligencia artificial.",
      factual:
        "¿Cuál es la capital de Francia y cuántos habitantes tiene aproximadamente?",
      instruction:
        "Lista 5 consejos para mejorar la productividad en el trabajo. Sé conciso.",
    };

    const results = {
      reasoning: await this.chat(tests.reasoning, options),
      coding: await this.chat(tests.coding, options),
      creative: await this.chat(tests.creative, options),
      factual: await this.chat(tests.factual, options),
      instruction: await this.chat(tests.instruction, options),
    };

    return results;
  }
  ```
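  `BenchmarkResult` is defined in `src/llm-client.ts` and not shown here; reconstructed from the fields the handler reads, it plausibly looks like this sketch:

  ```typescript
  // Assumed interface, inferred from the fields used by the handler
  // (prompt, response, latencyMs, tokensPerSecond); the real definition
  // lives in src/llm-client.ts.
  interface BenchmarkResult {
    prompt: string;
    response: string;
    latencyMs: number;
    tokensPerSecond: number;
  }
  ```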
- **src/tools.ts:152-163 (schema)**: MCP tool definition in the `tools` array, including name, description, and input schema for connection properties and an optional model ID.

  ```typescript
  {
    name: "llm_test_capabilities",
    description:
      "Prueba las capacidades del modelo en diferentes áreas: razonamiento, código, creatividad, hechos, instrucciones",
    inputSchema: {
      type: "object" as const,
      properties: {
        ...connectionProperties,
        model: { type: "string", description: "ID del modelo" },
      },
      required: [],
    },
  },
  ```
- **src/index.ts:41-44 (registration)**: MCP server registration for `ListToolsRequest`, returning the `tools` array that includes the `llm_test_capabilities` schema.

  ```typescript
  // Handler para listar herramientas
  server.setRequestHandler(ListToolsRequestSchema, async () => {
    return { tools };
  });
  ```
- **src/index.ts:70-71 (registration)**: Dispatch in the `CallToolRequest` handler's switch statement, routing calls to the `llm_test_capabilities` handler.

  ```typescript
  case "llm_test_capabilities":
    return await toolHandlers.llm_test_capabilities(args as any);
  ```
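End-to-end, an MCP client can invoke the tool once the server is running. A minimal sketch assuming the TypeScript MCP SDK and a stdio transport (the spawn command, entry point, and argument values are all hypothetical):

```typescript
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

// Spawn the MCP server over stdio; command and entry point are hypothetical.
const transport = new StdioClientTransport({
  command: "node",
  args: ["dist/index.js"],
});

const client = new Client(
  { name: "example-client", version: "1.0.0" },
  { capabilities: {} }
);
await client.connect(transport);

// All arguments are optional per the input schema above.
const result = await client.callTool({
  name: "llm_test_capabilities",
  arguments: {
    baseURL: "http://localhost:11434/v1", // hypothetical local endpoint
    model: "llama3.1",                    // hypothetical model ID
  },
});

// The handler returns a single text content item with the Markdown report.
console.log(result.content);
```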