
generate

Run a one-shot text completion on a local Ollama model. Returns full response text, timing, and tokens per second for non-streaming queries.

Instructions

Run a one-shot text completion against a local model (non-streaming). Returns the full response text plus timing and tokens/second.

Input Schema

Name     Required  Description                                                                                    Default
model    Yes       Model name (e.g. "llama3.1:8b").                                                               -
prompt   Yes       Prompt text.                                                                                   -
system   No        Optional system prompt.                                                                        -
options  No        Ollama sampling/decoding options, e.g. {"temperature": 0.7, "num_predict": 100, "top_p": 0.9}. -

Implementation Reference

  • The 'generate' tool handler function. Validates required args (model, prompt), builds request body with optional system/options, calls Ollama /api/generate endpoint, and returns formatted response with timing/token metrics.
    async function generate(args) {
      const badModel = requireString(args, 'model');
      if (badModel) return errorResult(badModel);
      const badPrompt = requireString(args, 'prompt');
      if (badPrompt) return errorResult(badPrompt);
    
      const body = {
        model: args.model,
        prompt: args.prompt,
        stream: false,
      };
      if (args.system && typeof args.system === 'string') body.system = args.system;
      if (args.options && typeof args.options === 'object') body.options = args.options;
    
      const r = await httpRequest('POST', '/api/generate', body);
      if (r.error) return errorResult(r.error);
      const d = r.data || {};
      return textResult({
        model: d.model || args.model,
        response: d.response || '',
        done_reason: d.done_reason || null,
        eval_count: d.eval_count || null,
        eval_duration_ms: d.eval_duration ? Math.round(d.eval_duration / 1e6) : null,
        prompt_eval_count: d.prompt_eval_count || null,
        total_duration_ms: d.total_duration ? Math.round(d.total_duration / 1e6) : null,
        tokens_per_second: d.eval_count && d.eval_duration
          ? Math.round((d.eval_count / (d.eval_duration / 1e9)) * 100) / 100
          : null,
      });
    }
  • Input schema registration for the 'generate' tool. Defines required params (model, prompt) and optional (system, options) with descriptions, used in the tools/list response for MCP discovery.
    {
      name: 'generate',
      description: 'Run a one-shot text completion against a local model (non-streaming). Returns the full response text plus timing and tokens/second.',
      annotations: { title: 'Generate text', readOnlyHint: false, destructiveHint: false, openWorldHint: true },
      inputSchema: {
        type: 'object',
        properties: {
          model: { type: 'string', description: 'Model name (e.g. "llama3.1:8b").' },
          prompt: { type: 'string', description: 'Prompt text.' },
          system: { type: 'string', description: 'Optional system prompt.' },
          options: {
            type: 'object',
            description: 'Ollama sampling/decoding options — e.g. {"temperature": 0.7, "num_predict": 100, "top_p": 0.9}.',
            additionalProperties: true,
          },
        },
        required: ['model', 'prompt'],
        additionalProperties: false,
      },
    },
  • server.js:385-394 (registration)
    HANDLERS map that registers the 'generate' function under the 'generate' key, used by the JSON-RPC dispatch to route tool calls.
    const HANDLERS = {
      ollama_status: ollamaStatus,
      list_models: listModels,
      list_running: listRunning,
      show_model: showModel,
      generate: generate,
      chat: chat,
      pull_model: pullModel,
      delete_model: deleteModel,
    };
  • httpRequest helper utility used by the generate handler to make HTTP POST calls to Ollama's /api/generate endpoint. Handles timeouts, errors, and JSON parsing.
    function httpRequest(method, path, body) {
      return new Promise((resolve) => {
        let url;
        try {
          url = new URL(path, OLLAMA_URL);
        } catch (e) {
          resolve({ error: `invalid URL: ${e.message}` });
          return;
        }
        const lib = url.protocol === 'https:' ? https : http;
        const opts = {
          method,
          hostname: url.hostname,
          port: url.port || (url.protocol === 'https:' ? 443 : 80),
          path: url.pathname + url.search,
          headers: { 'accept': 'application/json' },
        };
        let bodyBuf = null;
        if (body !== undefined) {
          bodyBuf = Buffer.from(JSON.stringify(body), 'utf8');
          opts.headers['content-type'] = 'application/json';
          opts.headers['content-length'] = bodyBuf.length;
        }
        const req = lib.request(opts, (res) => {
          let chunks = Buffer.alloc(0);
          res.on('data', (d) => { chunks = Buffer.concat([chunks, d]); });
          res.on('end', () => {
            const text = chunks.toString('utf8');
            if (res.statusCode >= 400) {
              resolve({ status: res.statusCode, error: `HTTP ${res.statusCode}: ${text.slice(0, 500)}` });
              return;
            }
            // Some endpoints return text/plain (e.g. GET /); try JSON first, fall back to text.
            try { resolve({ status: res.statusCode, data: JSON.parse(text) }); }
            catch (_) { resolve({ status: res.statusCode, data: null, text }); }
          });
        });
        req.setTimeout(REQUEST_TIMEOUT_MS, () => {
          req.destroy(new Error(`request timed out after ${REQUEST_TIMEOUT_MS}ms`));
        });
        req.on('error', (e) => {
          // Give a friendly connection-refused message.
          const msg = /ECONNREFUSED|ENOTFOUND/.test(e.code || e.message)
            ? `cannot reach Ollama at ${OLLAMA_URL} — is the server running? Start it with \`ollama serve\` or open the Ollama app.`
            : e.message;
          resolve({ error: msg });
        });
        if (bodyBuf) req.write(bodyBuf);
        req.end();
      });
    }
  • requireString helper used by generate to validate that 'model' and 'prompt' args are non-empty strings.
    function requireString(args, field) {
      if (typeof args[field] !== 'string' || !args[field].trim()) {
        return `${field} is required (non-empty string)`;
      }
      return null;
    }
Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The annotations already declare the safety profile (non-read-only, non-destructive, open-world). The description adds that the tool runs locally and returns timing data, but states no behavioral constraints beyond the annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences with front-loaded action and resource, no unnecessary details or fluff.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers purpose, return value, and modality (non-streaming). It does not explain the 'options' object, but the schema covers that; no output schema is declared, but the description compensates.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the baseline is 3. The description adds no parameter-level meaning beyond the schema; the output format is hinted at but is not parameter-specific.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states the verb ('run') and the resource ('one-shot text completion, non-streaming'), and distinguishes the tool from its sibling 'chat', which is streaming and multi-turn.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Implies a usage context (non-streaming, as opposed to streaming chat), but does not explicitly state when to avoid the tool or name alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
