llm_test_capabilities

Test LLM capabilities across reasoning, coding, creativity, facts, and instruction-following to evaluate model performance and quality.

Instructions

Tests the model's capabilities across different areas: reasoning, code, creativity, facts, instruction-following.

Input Schema

| Name | Required | Description | Default |
|---------|----------|-------------------------------------------------------------------------------------------|---------|
| baseURL | No | URL of the OpenAI-compatible server (e.g. http://localhost:1234/v1, http://localhost:11434/v1) | |
| apiKey | No | API key (required for OpenAI/Azure, optional for local servers) | |
| model | No | Model ID | |

Implementation Reference

  • The primary handler function for the 'llm_test_capabilities' tool. It invokes LLMClient.testCapabilities and formats the benchmark results into a structured Markdown report organized by capability categories (reasoning, coding, creative, factual, instruction).
    async llm_test_capabilities(args: z.infer<typeof CapabilitiesSchema>) {
      const client = getClient(args);
      const results = await client.testCapabilities({ model: args.model });
    
      let output = `# 🧠 Test de Capacidades del Modelo\n\n`;
    
      const categories = [
        { key: "reasoning", name: "Razonamiento", emoji: "🤔" },
        { key: "coding", name: "Programación", emoji: "💻" },
        { key: "creative", name: "Creatividad", emoji: "🎨" },
        { key: "factual", name: "Conocimiento Factual", emoji: "📚" },
        { key: "instruction", name: "Seguir Instrucciones", emoji: "📋" },
      ];
    
      for (const cat of categories) {
        const r = results[cat.key as keyof typeof results];
        output += `## ${cat.emoji} ${cat.name}\n\n`;
        output += `**Prompt:** ${r.prompt}\n\n`;
        output += `**Respuesta:**\n${r.response}\n\n`;
        output += `*Latencia: ${r.latencyMs}ms | Tokens/s: ${r.tokensPerSecond.toFixed(2)}*\n\n`;
        output += `---\n\n`;
      }
    
      return { content: [{ type: "text" as const, text: output }] };
    },
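The per-category formatting done inside the loop above can be exercised in isolation. The sketch below mirrors the loop body in a standalone helper; `renderSection` and the `BenchmarkResult` interface are reconstructions from the fields the handler reads, not part of the quoted source:

```typescript
// Shape implied by the handler's field accesses (r.prompt, r.response, etc.).
interface BenchmarkResult {
  prompt: string;
  response: string;
  latencyMs: number;
  tokensPerSecond: number;
}

// Hypothetical helper mirroring the handler's loop body for one category.
function renderSection(emoji: string, name: string, r: BenchmarkResult): string {
  return (
    `## ${emoji} ${name}\n\n` +
    `**Prompt:** ${r.prompt}\n\n` +
    `**Respuesta:**\n${r.response}\n\n` +
    `*Latencia: ${r.latencyMs}ms | Tokens/s: ${r.tokensPerSecond.toFixed(2)}*\n\n` +
    `---\n\n`
  );
}
```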
  • Helper method in LLMClient that executes the core capability tests using predefined prompts for reasoning, coding, creativity, factual knowledge, and instruction-following, returning BenchmarkResult objects for each.
    async testCapabilities(
      options: { model?: string } = {}
    ): Promise<{
      reasoning: BenchmarkResult;
      coding: BenchmarkResult;
      creative: BenchmarkResult;
      factual: BenchmarkResult;
      instruction: BenchmarkResult;
    }> {
      const tests = {
        reasoning: "Si todos los gatos tienen bigotes y Fluffy es un gato, ¿tiene Fluffy bigotes? Explica tu razonamiento paso a paso.",
        coding: "Escribe una función en Python que calcule el factorial de un número de forma recursiva.",
        creative: "Escribe un haiku sobre la inteligencia artificial.",
        factual: "¿Cuál es la capital de Francia y cuántos habitantes tiene aproximadamente?",
        instruction: "Lista 5 consejos para mejorar la productividad en el trabajo. Sé conciso.",
      };
    
      const results = {
        reasoning: await this.chat(tests.reasoning, options),
        coding: await this.chat(tests.coding, options),
        creative: await this.chat(tests.creative, options),
        factual: await this.chat(tests.factual, options),
        instruction: await this.chat(tests.instruction, options),
      };
    
      return results;
    }
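The five `this.chat` calls above run strictly one after another, so total latency is the sum of all five. If the backend tolerates concurrent requests, the same tests could run in parallel; this is a sketch under that assumption (the `ChatFn` type and `runTestsInParallel` helper are illustrative, not part of the source):

```typescript
interface BenchmarkResult {
  prompt: string;
  response: string;
  latencyMs: number;
  tokensPerSecond: number;
}

// Stand-in for the bound this.chat method (illustrative signature).
type ChatFn = (prompt: string) => Promise<BenchmarkResult>;

// Fires all capability prompts concurrently instead of sequentially.
async function runTestsInParallel(
  chat: ChatFn,
  tests: Record<string, string>
): Promise<Record<string, BenchmarkResult>> {
  const entries = await Promise.all(
    Object.entries(tests).map(
      async ([key, prompt]) => [key, await chat(prompt)] as const
    )
  );
  return Object.fromEntries(entries);
}
```

Note that parallel requests can skew per-request latency and tokens/s on a single-GPU local server, so sequential execution is arguably the safer default for benchmarking.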
  • MCP tool definition in the tools array, including name, description, and input schema for connection properties and optional model ID.
    {
      name: "llm_test_capabilities",
      description: "Prueba las capacidades del modelo en diferentes áreas: razonamiento, código, creatividad, hechos, instrucciones",
      inputSchema: {
        type: "object" as const,
        properties: {
          ...connectionProperties,
          model: { type: "string", description: "ID del modelo" },
        },
        required: [],
      },
    },
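A `tools/call` request matching this schema might look like the following; the field values (base URL, model ID) are illustrative examples, not defaults from the source:

```typescript
// Illustrative MCP tools/call payload for llm_test_capabilities.
// All argument values here are examples only.
const request = {
  method: "tools/call" as const,
  params: {
    name: "llm_test_capabilities",
    arguments: {
      baseURL: "http://localhost:11434/v1", // e.g. a local Ollama endpoint
      model: "llama3",                      // hypothetical model ID
    },
  },
};
```

Since every property is optional (`required: []`), the tool can also be called with empty arguments when the server has a usable default connection.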
  • src/index.ts:41-44 (registration)
    MCP server registration for ListToolsRequest, returning the tools array that includes the llm_test_capabilities tool schema.
// Handler to list tools
    server.setRequestHandler(ListToolsRequestSchema, async () => {
      return { tools };
    });
  • src/index.ts:70-71 (registration)
    Dispatch registration in the CallToolRequest handler switch statement, routing calls to the llm_test_capabilities tool handler.
    case "llm_test_capabilities":
      return await toolHandlers.llm_test_capabilities(args as any);