chat

Run non-streaming chat completions with message history using local models. Returns the assistant's reply and timing.

Instructions

Run a chat completion against a local model with message history (non-streaming). Returns the assistant's reply plus timing.
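Each call is independent: the handler forwards the messages array as given and keeps no conversation state, so the caller carries the history. A minimal sketch of building the next turn, assuming the returned message field has the {role: "assistant", content} shape Ollama sends back; the helper name appendTurn is hypothetical.

```javascript
// Hypothetical helper: builds the `messages` array for the next chat call.
// `history` is the array sent last time, `reply` is the `message` field the
// chat tool returned (assumed to be {role: "assistant", content: string}),
// and `userText` is the next user turn.
function appendTurn(history, reply, userText) {
  const next = history.slice(); // copy so the caller's array is not mutated
  if (reply && typeof reply.content === 'string') {
    // Keep only role/content so each item matches the declared item shape.
    next.push({ role: 'assistant', content: reply.content });
  }
  next.push({ role: 'user', content: userText });
  return next;
}

const history = [
  { role: 'system', content: 'You are a terse assistant.' },
  { role: 'user', content: 'Name a prime number.' },
];
const reply = { role: 'assistant', content: '7' };
const messages = appendTurn(history, reply, 'Name a larger one.');
console.log(messages.length); // 4
```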

Input Schema


Name	Type	Required	Description
`model`	string	Yes	Model name.
`messages`	array	Yes	Chat history. Each item: {role: "system"|"user"|"assistant", content: string}. The handler rejects an empty array.
`options`	object	No	Ollama sampling/decoding options, forwarded to Ollama unchanged.
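For orientation, this is roughly what an MCP client sends to invoke the tool: a JSON-RPC tools/call request whose params carry the tool name and the arguments described above. A sketch only; the model name and option values are illustrative (temperature and num_predict are the Ollama options the generate tool's description also cites), and buildChatCall is a hypothetical helper.

```javascript
// Builds an MCP `tools/call` request for the chat tool.
// The model name and options values are illustrative; `options` is passed
// through to Ollama unchanged by the handler.
function buildChatCall(id, model, messages, options) {
  const args = { model, messages };
  if (options) args.options = options; // optional; omitted when not given
  return {
    jsonrpc: '2.0',
    id,
    method: 'tools/call',
    params: { name: 'chat', arguments: args },
  };
}

const req = buildChatCall(
  1,
  'llama3.1:8b',
  [
    { role: 'system', content: 'Answer in one sentence.' },
    { role: 'user', content: 'What does a quantized model trade off?' },
  ],
  { temperature: 0.2, num_predict: 64 },
);

// A stdio MCP transport sends one JSON message per line (no embedded newlines).
console.log(JSON.stringify(req));
```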

Implementation Reference

server.js:211-245 (handler)

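The 'chat' tool handler. It checks that model is a non-empty string and that messages is a non-empty array of objects with string role and content fields, sends a non-streaming POST /api/chat request to Ollama (forwarding options only when it is an object), and returns the assistant's reply together with token counts and timing metadata (done_reason, eval_count, prompt_eval_count, eval_duration_ms, total_duration_ms, tokens_per_second). The handler does not itself check role against the system/user/assistant enum; that restriction is declared only in the input schema.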

async function chat(args) {
  const badModel = requireString(args, 'model');
  if (badModel) return errorResult(badModel);
  if (!Array.isArray(args.messages) || !args.messages.length) {
    return errorResult('messages is required (non-empty array of {role, content} objects)');
  }
  for (const m of args.messages) {
    if (!m || typeof m !== 'object' || typeof m.role !== 'string' || typeof m.content !== 'string') {
      return errorResult('each message must be {role: "system"|"user"|"assistant", content: string}');
    }
  }

  const body = {
    model: args.model,
    messages: args.messages,
    stream: false,
  };
  if (args.options && typeof args.options === 'object') body.options = args.options;

  const r = await httpRequest('POST', '/api/chat', body);
  if (r.error) return errorResult(r.error);
  const d = r.data || {};
  return textResult({
    model: d.model || args.model,
    message: d.message || null,
    done_reason: d.done_reason || null,
    eval_count: d.eval_count || null,
    eval_duration_ms: d.eval_duration ? Math.round(d.eval_duration / 1e6) : null,
    prompt_eval_count: d.prompt_eval_count || null,
    total_duration_ms: d.total_duration ? Math.round(d.total_duration / 1e6) : null,
    tokens_per_second: d.eval_count && d.eval_duration
      ? Math.round((d.eval_count / (d.eval_duration / 1e9)) * 100) / 100
      : null,
  });
}
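Ollama reports durations in nanoseconds, which is why the handler divides by 1e6 for milliseconds and by 1e9 for seconds. The standalone function below repeats that arithmetic so it can be checked with a worked example; deriveTiming is a hypothetical name, and the input numbers are illustrative rather than measured.

```javascript
// Mirrors the arithmetic in the chat handler above. Ollama reports
// durations in nanoseconds: divide by 1e6 for milliseconds, 1e9 for seconds.
function deriveTiming(d) {
  return {
    eval_duration_ms: d.eval_duration ? Math.round(d.eval_duration / 1e6) : null,
    total_duration_ms: d.total_duration ? Math.round(d.total_duration / 1e6) : null,
    // Generated tokens per second, rounded to two decimal places.
    tokens_per_second: d.eval_count && d.eval_duration
      ? Math.round((d.eval_count / (d.eval_duration / 1e9)) * 100) / 100
      : null,
  };
}

// Illustrative numbers: 120 tokens generated in 2.4 s of eval time.
const t = deriveTiming({ eval_count: 120, eval_duration: 2.4e9, total_duration: 3.1e9 });
console.log(t); // eval_duration_ms 2400, total_duration_ms 3100, tokens_per_second 50
```

Because the handler uses truthiness checks (|| and ?:), a reported value of 0 comes back as null rather than 0.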

server.js:327-356 (schema)

The 'chat' entry in the TOOLS array. Its inputSchema requires 'model' (string) and 'messages' (an array of {role, content} objects, with role restricted to system, user, or assistant), accepts an optional free-form 'options' object for sampling/decoding settings, and rejects any other top-level property (additionalProperties: false).

{
  name: 'chat',
  description: 'Run a chat completion against a local model with message history (non-streaming). Returns the assistant\'s reply plus timing.',
  annotations: { title: 'Chat completion', readOnlyHint: false, destructiveHint: false, openWorldHint: true },
  inputSchema: {
    type: 'object',
    properties: {
      model: { type: 'string', description: 'Model name.' },
      messages: {
        type: 'array',
        description: 'Chat history. Each item: {role: "system"|"user"|"assistant", content: string}.',
        items: {
          type: 'object',
          properties: {
            role: { type: 'string', enum: ['system', 'user', 'assistant'] },
            content: { type: 'string' },
          },
          required: ['role', 'content'],
        },
      },
      options: {
        type: 'object',
        description: 'Ollama sampling/decoding options.',
        additionalProperties: true,
      },
    },
    required: ['model', 'messages'],
    additionalProperties: false,
  },
},
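Since the handler only checks that role is a string, a value such as "tool" is rejected only if something validates the arguments against this schema first. The hand-written check below mirrors the schema's rules (required keys, no extra top-level keys, the role enum, options as an object). It is a hypothetical helper, not part of the server. Note that the schema sets no minItems, so an empty messages array passes here even though the handler rejects it.

```javascript
// A hand-written check that mirrors the chat inputSchema: required keys,
// no extra top-level keys, and the role enum. Hypothetical helper; it is
// not the server's own validation.
const ALLOWED_KEYS = new Set(['model', 'messages', 'options']);
const ROLES = new Set(['system', 'user', 'assistant']);

function validateChatArgs(args) {
  if (args === null || typeof args !== 'object' || Array.isArray(args)) {
    return ['arguments must be an object'];
  }
  const errors = [];
  for (const key of Object.keys(args)) {
    // additionalProperties: false
    if (!ALLOWED_KEYS.has(key)) errors.push(`unexpected property: ${key}`);
  }
  if (typeof args.model !== 'string') errors.push('model must be a string');
  if (!Array.isArray(args.messages)) {
    errors.push('messages must be an array');
  } else {
    args.messages.forEach((m, i) => {
      if (!m || typeof m !== 'object') {
        errors.push(`messages[${i}] must be an object`);
        return;
      }
      if (!ROLES.has(m.role)) errors.push(`messages[${i}].role must be system, user, or assistant`);
      if (typeof m.content !== 'string') errors.push(`messages[${i}].content must be a string`);
    });
  }
  // JSON Schema "object" excludes arrays and null.
  if ('options' in args && (args.options === null || typeof args.options !== 'object' || Array.isArray(args.options))) {
    errors.push('options must be an object');
  }
  return errors;
}

console.log(validateChatArgs({ model: 'llama3.1:8b', messages: [{ role: 'tool', content: 'x' }] }));
// [ 'messages[0].role must be system, user, or assistant' ]
```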

server.js:385-394 (registration)

The HANDLERS map that registers the 'chat' function under the key 'chat', enabling dispatch from the JSON-RPC 'tools/call' handler.

const HANDLERS = {
  ollama_status: ollamaStatus,
  list_models: listModels,
  list_running: listRunning,
  show_model: showModel,
  generate: generate,
  chat: chat,
  pull_model: pullModel,
  delete_model: deleteModel,
};
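The tools/call dispatch code itself is not shown in this reference, so the following is only a sketch of how a name-keyed map like HANDLERS is typically used: look up params.name, then await the handler with params.arguments. The callTool function and the stub handler are hypothetical, and the result shape ({content: [{type: "text", text}], isError}) is assumed from the MCP tool-result format rather than taken from textResult/errorResult.

```javascript
// Hypothetical dispatcher: looks up params.name in a handler map and awaits
// the handler with params.arguments (defaulting to {}).
async function callTool(handlers, params) {
  // Own-property check so names like "toString" do not hit Object.prototype.
  const handler = Object.prototype.hasOwnProperty.call(handlers, params.name)
    ? handlers[params.name]
    : undefined;
  if (typeof handler !== 'function') {
    return { content: [{ type: 'text', text: `Unknown tool: ${params.name}` }], isError: true };
  }
  return handler(params.arguments || {});
}

// Stub handler standing in for the real chat function (no Ollama call).
const stubHandlers = {
  chat: async (args) => ({
    content: [{ type: 'text', text: JSON.stringify({ model: args.model, message: { role: 'assistant', content: 'hi' } }) }],
  }),
};

callTool(stubHandlers, { name: 'chat', arguments: { model: 'llama3.1:8b', messages: [] } })
  .then((res) => console.log(res.content[0].text));
```

The own-property lookup matters because a plain object literal inherits methods such as toString; without it, a request naming one of those would call an unrelated function instead of failing cleanly.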

server.js:275-383 (registration)

The TOOLS array containing all tool definitions exposed via 'tools/list'. The entry at zero-based index 5 (lines 327-356) defines the 'chat' tool with its name, description, annotations, and inputSchema.

const TOOLS = [
  {
    name: 'ollama_status',
    description: 'Health check: whether the Ollama server is reachable and its version. Use this as a precondition before other tools if you\'re unsure whether Ollama is running.',
    annotations: { title: 'Ollama server status', readOnlyHint: true, destructiveHint: false, openWorldHint: false },
    inputSchema: { type: 'object', properties: {}, additionalProperties: false },
  },
  {
    name: 'list_models',
    description: 'List locally-installed models: name, size in bytes, digest, modified timestamp, family (e.g. llama), parameter size (e.g. 8.0B), and quantization level (e.g. Q4_K_M).',
    annotations: { title: 'List installed models', readOnlyHint: true, destructiveHint: false, openWorldHint: false },
    inputSchema: { type: 'object', properties: {}, additionalProperties: false },
  },
  {
    name: 'list_running',
    description: 'List models currently loaded into VRAM with their size, VRAM footprint, and expiry timestamp. Empty list means Ollama is idle.',
    annotations: { title: 'List running models', readOnlyHint: true, destructiveHint: false, openWorldHint: false },
    inputSchema: { type: 'object', properties: {}, additionalProperties: false },
  },
  {
    name: 'show_model',
    description: 'Show detailed information for a specific model: modelfile excerpt, parameters, template, capabilities, architecture details, quantization level.',
    annotations: { title: 'Show model details', readOnlyHint: true, destructiveHint: false, openWorldHint: false },
    inputSchema: {
      type: 'object',
      properties: {
        name: { type: 'string', description: 'Model name (e.g. "llama3.1:8b" or "forge:b6c1").' },
      },
      required: ['name'],
      additionalProperties: false,
    },
  },
  {
    name: 'generate',
    description: 'Run a one-shot text completion against a local model (non-streaming). Returns the full response text plus timing and tokens/second.',
    annotations: { title: 'Generate text', readOnlyHint: false, destructiveHint: false, openWorldHint: true },
    inputSchema: {
      type: 'object',
      properties: {
        model: { type: 'string', description: 'Model name (e.g. "llama3.1:8b").' },
        prompt: { type: 'string', description: 'Prompt text.' },
        system: { type: 'string', description: 'Optional system prompt.' },
        options: {
          type: 'object',
          description: 'Ollama sampling/decoding options — e.g. {"temperature": 0.7, "num_predict": 100, "top_p": 0.9}.',
          additionalProperties: true,
        },
      },
      required: ['model', 'prompt'],
      additionalProperties: false,
    },
  },
  {
    name: 'chat',
    description: 'Run a chat completion against a local model with message history (non-streaming). Returns the assistant\'s reply plus timing.',
    annotations: { title: 'Chat completion', readOnlyHint: false, destructiveHint: false, openWorldHint: true },
    inputSchema: {
      type: 'object',
      properties: {
        model: { type: 'string', description: 'Model name.' },
        messages: {
          type: 'array',
          description: 'Chat history. Each item: {role: "system"|"user"|"assistant", content: string}.',
          items: {
            type: 'object',
            properties: {
              role: { type: 'string', enum: ['system', 'user', 'assistant'] },
              content: { type: 'string' },
            },
            required: ['role', 'content'],
          },
        },
        options: {
          type: 'object',
          description: 'Ollama sampling/decoding options.',
          additionalProperties: true,
        },
      },
      required: ['model', 'messages'],
      additionalProperties: false,
    },
  },
  {
    name: 'pull_model',
    description: 'Download a model from the Ollama registry. Blocks until complete — can take a long time for multi-GB models. For very large pulls, prefer `ollama pull` in a terminal where you can watch progress.',
    annotations: { title: 'Pull model', readOnlyHint: false, destructiveHint: false, openWorldHint: true },
    inputSchema: {
      type: 'object',
      properties: {
        name: { type: 'string', description: 'Model name to pull (e.g. "llama3.1:8b").' },
      },
      required: ['name'],
      additionalProperties: false,
    },
  },
  {
    name: 'delete_model',
    description: 'Delete a locally-installed model. Does not affect the remote registry copy. Free the disk space of a model you no longer need.',
    annotations: { title: 'Delete model', readOnlyHint: false, destructiveHint: true, openWorldHint: false },
    inputSchema: {
      type: 'object',
      properties: {
        name: { type: 'string', description: 'Model name to delete.' },
      },
      required: ['name'],
      additionalProperties: false,
    },
  },
];
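Tools are advertised through TOOLS but executed through HANDLERS, so the two must list the same names. A small consistency check makes that explicit; checkRegistration is a hypothetical helper, and the names below are copied from the TOOLS array and HANDLERS map in this reference, with handler bodies stubbed.

```javascript
// Hypothetical consistency check: every tool advertised in tools/list should
// have a handler, and every handler should be advertised. Returns the names
// that are missing on either side.
function checkRegistration(tools, handlers) {
  const listed = new Set(tools.map((t) => t.name));
  const handled = new Set(Object.keys(handlers));
  return {
    missingHandlers: [...listed].filter((n) => !handled.has(n)),
    unlisted: [...handled].filter((n) => !listed.has(n)),
  };
}

// Names copied from TOOLS and HANDLERS; only the keys matter here.
const names = ['ollama_status', 'list_models', 'list_running', 'show_model',
  'generate', 'chat', 'pull_model', 'delete_model'];
const tools = names.map((name) => ({ name }));
const handlers = Object.fromEntries(names.map((n) => [n, () => null]));

console.log(checkRegistration(tools, handlers)); // { missingHandlers: [], unlisted: [] }
console.log(names.indexOf('chat')); // 5
```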

server.js:109-114 (helper)

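The requireString helper, which the chat handler uses to check that the 'model' argument is a non-empty string. It rejects missing, non-string, and whitespace-only values, returning an error message string, or null when the value is valid.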

function requireString(args, field) {
  if (typeof args[field] !== 'string' || !args[field].trim()) {
    return `${field} is required (non-empty string)`;
  }
  return null;
}
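A few calls show the helper's return values. The function is copied unchanged from above; the example inputs are illustrative.

```javascript
// Copy of the requireString helper above, with example calls. It returns an
// error message string when the field is missing, not a string, or only
// whitespace, and null when the value is acceptable.
function requireString(args, field) {
  if (typeof args[field] !== 'string' || !args[field].trim()) {
    return `${field} is required (non-empty string)`;
  }
  return null;
}

console.log(requireString({ model: 'llama3.1:8b' }, 'model')); // null
console.log(requireString({ model: '   ' }, 'model'));         // model is required (non-empty string)
console.log(requireString({}, 'model'));                       // model is required (non-empty string)
console.log(requireString({ model: 42 }, 'model'));            // model is required (non-empty string)
```

The helper only trims for the emptiness test. A value such as " llama3.1:8b " passes, and the chat handler forwards args.model to Ollama as given, surrounding spaces included.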

claude-ollama-mcp
