llm_chat

Send prompts to OpenAI-compatible LLM APIs and receive responses with performance metrics like latency and token rates for testing and benchmarking.

Instructions

Sends a prompt to the model and receives a response with performance metrics (latency, tokens/s)

Input Schema

| Name | Required | Description | Default |
|------|----------|-------------|---------|
| baseURL | No | URL of the OpenAI-compatible server (e.g., http://localhost:1234/v1, http://localhost:11434/v1) | |
| apiKey | No | API key (required for OpenAI/Azure, optional for local servers) | |
| prompt | Yes | The prompt to send to the model | |
| model | No | Model ID (optional) | |
| maxTokens | No | Maximum number of tokens to generate | 512 |
| temperature | No | Temperature (0-2) | 0.7 |
| topP | No | Top P for nucleus sampling (0-1) | |
| topK | No | Top K for sampling | |
| repeatPenalty | No | Repetition penalty | |
| presencePenalty | No | Presence penalty (-2 to 2) | |
| frequencyPenalty | No | Frequency penalty (-2 to 2) | |
| stop | No | Stop sequences | |
| systemPrompt | No | Optional system prompt | |
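
For illustration, a call against a local OpenAI-compatible server might pass arguments like the following. The URL, model ID, and prompt here are placeholders, not values prescribed by the tool:

    // Illustrative llm_chat arguments; only `prompt` is required.
    const exampleArgs = {
      baseURL: "http://localhost:11434/v1", // e.g. an Ollama endpoint (assumed)
      prompt: "Summarize the benefits of streaming APIs in two sentences.",
      model: "llama3", // hypothetical model ID; depends on what the server exposes
      maxTokens: 256,
      temperature: 0.2,
      stop: ["\n\n"],
      systemPrompt: "You are a concise technical writer.",
    };

Omitted fields fall back to the defaults listed above (maxTokens: 512, temperature: 0.7).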

Implementation Reference

  • Main handler function for the llm_chat tool. It validates input with ChatSchema, creates an LLM client, calls chat on the client, formats the benchmark result, and returns the response.
    async llm_chat(args: z.infer<typeof ChatSchema>) {
      const client = getClient(args);
      const result = await client.chat(args.prompt, {
        model: args.model,
        maxTokens: args.maxTokens,
        temperature: args.temperature,
        topP: args.topP,
        topK: args.topK,
        repeatPenalty: args.repeatPenalty,
        presencePenalty: args.presencePenalty,
        frequencyPenalty: args.frequencyPenalty,
        stop: args.stop,
        systemPrompt: args.systemPrompt,
      });
    
      return {
        content: [
          {
            type: "text" as const,
            text: formatBenchmarkResult(result),
          },
        ],
      };
    },
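  • Neither getClient nor the client's chat method is shown on this page. As a rough sketch of how such a wrapper could collect the timing metrics using the official openai npm package (the names, defaults, and fallbacks below are assumptions, not the project's code):
    import OpenAI from "openai";
    
    // Hypothetical sketch: time a chat completion against an
    // OpenAI-compatible endpoint and derive the benchmark fields
    // that formatBenchmarkResult (below) expects.
    async function chatWithMetrics(
      client: OpenAI,
      prompt: string,
      opts: { model?: string; maxTokens?: number; temperature?: number; systemPrompt?: string } = {},
    ) {
      const start = Date.now();
      const completion = await client.chat.completions.create({
        model: opts.model ?? "default", // placeholder; the real code likely resolves the loaded model
        max_tokens: opts.maxTokens,
        temperature: opts.temperature,
        messages: [
          ...(opts.systemPrompt
            ? [{ role: "system" as const, content: opts.systemPrompt }]
            : []),
          { role: "user" as const, content: prompt },
        ],
      });
      const latencyMs = Date.now() - start;
      const usage = completion.usage;
      const completionTokens = usage?.completion_tokens ?? 0;
      return {
        model: completion.model,
        response: completion.choices[0]?.message?.content ?? "",
        latencyMs,
        promptTokens: usage?.prompt_tokens ?? 0,
        completionTokens,
        totalTokens: usage?.total_tokens ?? 0,
        tokensPerSecond: latencyMs > 0 ? completionTokens / (latencyMs / 1000) : 0,
      };
    }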
  • Zod schema for validating inputs to the llm_chat handler.
    export const ChatSchema = ConnectionConfigSchema.extend({
      prompt: z.string().describe("The prompt to send to the model"),
      model: z.string().optional().describe("ID of the model to use (optional; defaults to the loaded model)"),
      maxTokens: z.number().optional().default(512).describe("Maximum number of tokens to generate"),
      temperature: z.number().optional().default(0.7).describe("Temperature (0-2)"),
      topP: z.number().optional().describe("Top P for nucleus sampling (0-1)"),
      topK: z.number().optional().describe("Top K for sampling"),
      repeatPenalty: z.number().optional().describe("Repetition penalty"),
      presencePenalty: z.number().optional().describe("Presence penalty (-2 to 2)"),
      frequencyPenalty: z.number().optional().describe("Frequency penalty (-2 to 2)"),
      stop: z.array(z.string()).optional().describe("Stop sequences"),
      systemPrompt: z.string().optional().describe("Optional system prompt"),
    });
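  • ConnectionConfigSchema is not reproduced on this page. Based on the baseURL and apiKey rows of the input table above, it plausibly looks like the following sketch (an inference, not the repo's source):
    import { z } from "zod";
    
    // Plausible shape inferred from the input schema table; the actual
    // schema may carry different constraints or extra fields.
    export const ConnectionConfigSchema = z.object({
      baseURL: z
        .string()
        .optional()
        .describe("URL of the OpenAI-compatible server (e.g., http://localhost:1234/v1)"),
      apiKey: z
        .string()
        .optional()
        .describe("API key (required for OpenAI/Azure, optional for local servers)"),
    });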
  • src/tools.ts:94-115 (registration)
    MCP tool registration entry for llm_chat, including name, description, and JSON input schema.
    {
      name: "llm_chat",
      description: "Sends a prompt to the model and receives a response with performance metrics (latency, tokens/s)",
      inputSchema: {
        type: "object" as const,
        properties: {
          ...connectionProperties,
          prompt: { type: "string", description: "The prompt to send to the model" },
          model: { type: "string", description: "Model ID (optional)" },
          maxTokens: { type: "number", description: "Maximum number of tokens to generate (default: 512)" },
          temperature: { type: "number", description: "Temperature 0-2 (default: 0.7)" },
          topP: { type: "number", description: "Top P for nucleus sampling (0-1)" },
          topK: { type: "number", description: "Top K for sampling" },
          repeatPenalty: { type: "number", description: "Repetition penalty" },
          presencePenalty: { type: "number", description: "Presence penalty (-2 to 2)" },
          frequencyPenalty: { type: "number", description: "Frequency penalty (-2 to 2)" },
          stop: { type: "array", items: { type: "string" }, description: "Stop sequences" },
          systemPrompt: { type: "string", description: "Optional system prompt" },
        },
        required: ["prompt"],
      },
    },
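  • connectionProperties (spread into the schema above) is likewise not shown on this page. From the baseURL and apiKey rows of the input table, it presumably resembles this sketch rather than being a verbatim quote from the repo:
    // Presumed shape of connectionProperties, inferred from the input table.
    const connectionProperties = {
      baseURL: {
        type: "string",
        description: "URL of the OpenAI-compatible server (e.g., http://localhost:1234/v1, http://localhost:11434/v1)",
      },
      apiKey: {
        type: "string",
        description: "API key (required for OpenAI/Azure, optional for local servers)",
      },
    };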
  • src/index.ts:42-44 (registration)
    MCP server handler for listing tools, which returns the tools array including llm_chat.
    server.setRequestHandler(ListToolsRequestSchema, async () => {
      return { tools };
    });
  • Dispatch in the main CallToolRequestSchema handler to invoke the llm_chat tool handler.
    case "llm_chat":
      return await toolHandlers.llm_chat(args as any);
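  • This case lives inside the server's CallToolRequestSchema handler, which is not shown in full here. A minimal sketch of the usual surrounding structure in @modelcontextprotocol/sdk servers (the default branch and error message are assumptions):
    server.setRequestHandler(CallToolRequestSchema, async (request) => {
      const { name, arguments: args } = request.params;
      switch (name) {
        case "llm_chat":
          return await toolHandlers.llm_chat(args as any);
        // ...cases for the sibling tools (llm_benchmark, etc.) would follow
        default:
          throw new Error(`Unknown tool: ${name}`);
      }
    });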
  • Helper function used by llm_chat to format the benchmark result into a markdown table.
    function formatBenchmarkResult(result: BenchmarkResult): string {
      return `## 💬 Model Response
    
    **Model:** ${result.model}
    
    **Response:**
    ${result.response}
    
    ---
    
    ### 📊 Metrics
    | Metric | Value |
    |--------|-------|
    | Latency | ${result.latencyMs} ms |
    | Prompt tokens | ${result.promptTokens} |
    | Response tokens | ${result.completionTokens} |
    | Total tokens | ${result.totalTokens} |
    | Speed | ${result.tokensPerSecond.toFixed(2)} tokens/s |
    `;
    }
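  • BenchmarkResult is not defined on this page, but its fields can be read off the template above; a matching interface would look roughly like this:
    interface BenchmarkResult {
      model: string;            // model ID reported by the server
      response: string;         // generated completion text
      latencyMs: number;        // wall-clock request time in milliseconds
      promptTokens: number;
      completionTokens: number;
      totalTokens: number;
      tokensPerSecond: number;  // e.g. completionTokens / (latencyMs / 1000)
    }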
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden of behavioral disclosure. While it mentions receiving performance metrics (latency, tokens/s), it doesn't describe other critical behaviors: whether this is a read-only or mutating operation, authentication requirements (beyond the apiKey parameter), rate limits, error handling, or what the response structure looks like. For a tool with 13 parameters and no annotations, this is a significant gap in transparency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that clearly states the core functionality and output. It's appropriately sized and front-loaded with the essential action and result, with zero wasted words or redundant information. Every part of the sentence earns its place by conveying purpose and output metrics.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (13 parameters, no annotations, no output schema, and multiple sibling tools), the description is incomplete. It adequately states the purpose but lacks usage guidelines and behavioral details (such as mutation status or error handling), and it doesn't address how it differs from its siblings. Without an output schema, it should ideally describe the response format beyond just mentioning metrics, but it doesn't. This leaves significant gaps for an AI agent trying to understand the tool fully.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description adds no parameter-specific information beyond what's already in the schema. With 100% schema description coverage, all 13 parameters are documented in the input schema (e.g., baseURL, apiKey, prompt, model, maxTokens, etc.). The description doesn't explain how parameters interact or provide additional context about their usage, so it meets the baseline of 3 where the schema does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: it sends a prompt to the model and receives a response with performance metrics (latency, tokens/s). This names the verb (send a prompt), the resource (the model), and the output (a response with metrics). However, it doesn't explicitly differentiate the tool from siblings like llm_benchmark or llm_test_capabilities that might also involve model interaction.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. With sibling tools like llm_benchmark, llm_compare_models, and llm_test_capabilities, there's no indication of whether this is for general chat, performance testing, or other specific contexts. The agent must infer usage from the name and description alone without explicit direction.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
