llm_chat

llm_chat

Send prompts to OpenAI-compatible LLM APIs and receive responses with performance metrics like latency and token rates for testing and benchmarking.

Instructions

Envía un prompt al modelo y recibe una respuesta con métricas de rendimiento (latencia, tokens/s)

Input Schema

TableJSON Schema

Name	Required	Description
`baseURL`	No	URL del servidor OpenAI-compatible (ej: http://localhost:1234/v1, http://localhost:11434/v1)
`apiKey`	No	API Key (requerida para OpenAI/Azure, opcional para servidores locales)
`prompt`	Yes	El prompt a enviar al modelo
`model`	No	ID del modelo (opcional)
`maxTokens`	No	Máximo de tokens a generar (default: 512)
`temperature`	No	Temperatura 0-2 (default: 0.7)
`topP`	No	Top P para nucleus sampling (0-1)
`topK`	No	Top K para sampling
`repeatPenalty`	No	Penalización por repetición
`presencePenalty`	No	Penalización por presencia (-2 a 2)
`frequencyPenalty`	No	Penalización por frecuencia (-2 a 2)
`stop`	No	Secuencias de parada
`systemPrompt`	No	Prompt de sistema opcional

Implementation Reference

src/tools.ts:281-304 (handler)
Main handler function for llm_chat tool. Validates input with ChatSchema, creates LLM client, calls chat on the client, formats the benchmark result, and returns the response.
async llm_chat(args: z.infer<typeof ChatSchema>) { const client = getClient(args); const result = await client.chat(args.prompt, { model: args.model, maxTokens: args.maxTokens, temperature: args.temperature, topP: args.topP, topK: args.topK, repeatPenalty: args.repeatPenalty, presencePenalty: args.presencePenalty, frequencyPenalty: args.frequencyPenalty, stop: args.stop, systemPrompt: args.systemPrompt, }); return { content: [ { type: "text" as const, text: formatBenchmarkResult(result), }, ], }; },
src/tools.ts:17-29 (schema)
Zod schema for validating inputs to the llm_chat handler.
export const ChatSchema = ConnectionConfigSchema.extend({ prompt: z.string().describe("El prompt a enviar al modelo"), model: z.string().optional().describe("ID del modelo a usar (opcional, usa el cargado por defecto)"), maxTokens: z.number().optional().default(512).describe("Máximo de tokens a generar"), temperature: z.number().optional().default(0.7).describe("Temperatura (0-2)"), topP: z.number().optional().describe("Top P para nucleus sampling (0-1)"), topK: z.number().optional().describe("Top K para sampling"), repeatPenalty: z.number().optional().describe("Penalización por repetición"), presencePenalty: z.number().optional().describe("Penalización por presencia (-2 a 2)"), frequencyPenalty: z.number().optional().describe("Penalización por frecuencia (-2 a 2)"), stop: z.array(z.string()).optional().describe("Secuencias de parada"), systemPrompt: z.string().optional().describe("Prompt de sistema opcional"), });
src/tools.ts:94-115 (registration)
MCP tool registration entry for llm_chat, including name, description, and JSON input schema.
{ name: "llm_chat", description: "Envía un prompt al modelo y recibe una respuesta con métricas de rendimiento (latencia, tokens/s)", inputSchema: { type: "object" as const, properties: { ...connectionProperties, prompt: { type: "string", description: "El prompt a enviar al modelo" }, model: { type: "string", description: "ID del modelo (opcional)" }, maxTokens: { type: "number", description: "Máximo de tokens a generar (default: 512)" }, temperature: { type: "number", description: "Temperatura 0-2 (default: 0.7)" }, topP: { type: "number", description: "Top P para nucleus sampling (0-1)" }, topK: { type: "number", description: "Top K para sampling" }, repeatPenalty: { type: "number", description: "Penalización por repetición" }, presencePenalty: { type: "number", description: "Penalización por presencia (-2 a 2)" }, frequencyPenalty: { type: "number", description: "Penalización por frecuencia (-2 a 2)" }, stop: { type: "array", items: { type: "string" }, description: "Secuencias de parada" }, systemPrompt: { type: "string", description: "Prompt de sistema opcional" }, }, required: ["prompt"], }, },
src/index.ts:42-44 (registration)
MCP server handler for listing tools, which returns the tools array including llm_chat.
server.setRequestHandler(ListToolsRequestSchema, async () => { return { tools }; });
src/index.ts:61-62 (handler)
Dispatch in the main CallToolRequestSchema handler to invoke the llm_chat tool handler.
case "llm_chat": return await toolHandlers.llm_chat(args as any);
src/tools.ts:490-508 (helper)
Helper function used by llm_chat to format the benchmark result into a markdown table.
function formatBenchmarkResult(result: BenchmarkResult): string { return `## 💬 Respuesta del Modelo **Modelo:** ${result.model} **Respuesta:** ${result.response} --- ### 📊 Métricas | Métrica | Valor | |---------|-------| | Latencia | ${result.latencyMs} ms | | Tokens prompt | ${result.promptTokens} | | Tokens respuesta | ${result.completionTokens} | | Total tokens | ${result.totalTokens} | | Velocidad | ${result.tokensPerSecond.toFixed(2)} tokens/s | `;

LLM MCP Bridge

Instructions

Input Schema

Implementation Reference

Other Tools

Latest Blog Posts

MCP directory API