Glama

generate

Run a one-shot text completion against a local Ollama model. Returns the full response, timing, and tokens per second.

Instructions

Run a one-shot text completion against a local model (non-streaming). Returns the full response text plus timing and tokens/second.
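The snippet below sketches how a client might invoke this tool. It builds the JSON-RPC 2.0 `tools/call` message defined by the Model Context Protocol. `buildGenerateCall` is a hypothetical helper written for illustration. The model name is assumed to be one already pulled locally, and transport details are omitted.

```javascript
// Hypothetical helper: builds the JSON-RPC 2.0 message an MCP client would send
// to invoke the `generate` tool through the standard `tools/call` method.
function buildGenerateCall(id, args) {
  return {
    jsonrpc: '2.0',
    id,
    method: 'tools/call',
    params: {
      name: 'generate', // tool name as registered by the server
      arguments: args,  // must satisfy the tool's input schema
    },
  };
}

// Example arguments; "llama3.1:8b" is assumed to be available locally.
const request = buildGenerateCall(1, {
  model: 'llama3.1:8b',
  prompt: 'Explain what a mutex is in one sentence.',
  system: 'You are a concise assistant.',
  options: { temperature: 0.7, num_predict: 100 },
});

console.log(JSON.stringify(request, null, 2));
```

Because the call is non-streaming, the client receives a single result containing the whole completion rather than incremental chunks.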

Input Schema

Name | Required | Description | Default
--- | --- | --- | ---
model | Yes | Model name (e.g. "llama3.1:8b"). | none
prompt | Yes | Prompt text. | none
system | No | Optional system prompt. | none
options | No | Ollama sampling/decoding options, e.g. {"temperature": 0.7, "num_predict": 100, "top_p": 0.9}. | none
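The schema's constraints can be expressed as a small hand-written check. `validateGenerateArgs` is a hypothetical helper for illustration only; a real server might use a JSON Schema validator instead.

```javascript
// Hypothetical validator mirroring the generate input schema:
// - `model` and `prompt` are required strings
// - `system` is an optional string, `options` an optional (non-array) object
// - no other top-level properties are allowed (additionalProperties: false)
const ALLOWED_KEYS = new Set(['model', 'prompt', 'system', 'options']);

function isPlainObject(value) {
  return value !== null && typeof value === 'object' && !Array.isArray(value);
}

function validateGenerateArgs(args) {
  if (!isPlainObject(args)) return ['arguments must be an object'];
  const errors = [];
  for (const key of ['model', 'prompt']) {
    if (typeof args[key] !== 'string') {
      errors.push(`'${key}' is required and must be a string`);
    }
  }
  if ('system' in args && typeof args.system !== 'string') {
    errors.push("'system' must be a string");
  }
  if ('options' in args && !isPlainObject(args.options)) {
    errors.push("'options' must be an object");
  }
  for (const key of Object.keys(args)) {
    if (!ALLOWED_KEYS.has(key)) errors.push(`unexpected property '${key}'`);
  }
  return errors;
}

console.log(validateGenerateArgs({ model: 'llama3.1:8b', prompt: 'Hi' })); // → []
// Sampling parameters belong under `options`, so a top-level `temperature`
// is rejected, alongside the missing `prompt`.
console.log(validateGenerateArgs({ model: 'llama3.1:8b', temperature: 0.2 }));
```

The `generate` handler itself is more lenient. It only enforces `model` and `prompt`, and it silently drops a non-string `system` or non-object `options` rather than rejecting them.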

Implementation Reference

  • The async function 'generate' executes the tool logic. It validates the required 'model' and 'prompt' string arguments, builds a request body (adding the optional 'system' and 'options' fields when present), sends a non-streaming POST to '/api/generate' via httpRequest, and returns a textResult containing the model's response along with token counts and timing metrics (eval_count, eval_duration_ms, tokens_per_second, etc.).
    async function generate(args) {
      const badModel = requireString(args, 'model');
      if (badModel) return errorResult(badModel);
      const badPrompt = requireString(args, 'prompt');
      if (badPrompt) return errorResult(badPrompt);
    
      const body = {
        model: args.model,
        prompt: args.prompt,
        stream: false,
      };
      if (args.system && typeof args.system === 'string') body.system = args.system;
      if (args.options && typeof args.options === 'object') body.options = args.options;
    
      const r = await httpRequest('POST', '/api/generate', body);
      if (r.error) return errorResult(r.error);
      const d = r.data || {};
      return textResult({
        model: d.model || args.model,
        response: d.response || '',
        done_reason: d.done_reason || null,
        eval_count: d.eval_count || null,
        eval_duration_ms: d.eval_duration ? Math.round(d.eval_duration / 1e6) : null,
        prompt_eval_count: d.prompt_eval_count || null,
        total_duration_ms: d.total_duration ? Math.round(d.total_duration / 1e6) : null,
        tokens_per_second: d.eval_count && d.eval_duration
          ? Math.round((d.eval_count / (d.eval_duration / 1e9)) * 100) / 100
          : null,
      });
    }
  • The input schema and description for the 'generate' tool. It defines two required properties, 'model' (string) and 'prompt' (string), and two optional ones, 'system' (string) and 'options' (an object of Ollama sampling parameters).
    {
      name: 'generate',
      description: 'Run a one-shot text completion against a local model (non-streaming). Returns the full response text plus timing and tokens/second.',
      annotations: { title: 'Generate text', readOnlyHint: false, destructiveHint: false, openWorldHint: true },
      inputSchema: {
        type: 'object',
        properties: {
          model: { type: 'string', description: 'Model name (e.g. "llama3.1:8b").' },
          prompt: { type: 'string', description: 'Prompt text.' },
          system: { type: 'string', description: 'Optional system prompt.' },
          options: {
            type: 'object',
            description: 'Ollama sampling/decoding options — e.g. {"temperature": 0.7, "num_predict": 100, "top_p": 0.9}.',
            additionalProperties: true,
          },
        },
        required: ['model', 'prompt'],
        additionalProperties: false,
      },
    },
  • server.js:385-394 (registration)
    The HANDLERS object maps each tool name to its handler function. It registers 'generate' (line 390) alongside the other tool handlers and is used by the JSON-RPC dispatch logic.
    const HANDLERS = {
      ollama_status: ollamaStatus,
      list_models: listModels,
      list_running: listRunning,
      show_model: showModel,
      generate: generate,
      chat: chat,
      pull_model: pullModel,
      delete_model: deleteModel,
    };
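The source says only that HANDLERS is used by the JSON-RPC dispatch logic. The sketch below is a minimal, hypothetical version of such a lookup, not the server's actual code. `handleToolsCall` and the stub handler are invented for illustration. The `{ content: [...] }` result shape and the -32602 error code for unknown tools follow MCP specification conventions, not anything confirmed in server.js.

```javascript
// Stub handler standing in for the real generate() in server.js (hypothetical).
const STUB_HANDLERS = {
  generate: async (args) => ({
    content: [{ type: 'text', text: `echo: ${args.prompt}` }],
  }),
};

// Hypothetical tools/call dispatcher: resolves params.name against a
// HANDLERS-style map and wraps the handler's result in a JSON-RPC response.
async function handleToolsCall(handlers, message) {
  const { id, params = {} } = message;
  // Own-property check so inherited keys such as 'toString' are not treated as tools.
  const known = Object.prototype.hasOwnProperty.call(handlers, params.name);
  if (!known) {
    // Unknown tool: reported as a JSON-RPC protocol error (-32602, Invalid params).
    return {
      jsonrpc: '2.0',
      id,
      error: { code: -32602, message: `Unknown tool: ${params.name}` },
    };
  }
  const result = await handlers[params.name](params.arguments || {});
  return { jsonrpc: '2.0', id, result };
}

handleToolsCall(STUB_HANDLERS, {
  jsonrpc: '2.0',
  id: 7,
  method: 'tools/call',
  params: { name: 'generate', arguments: { prompt: 'hi' } },
}).then((res) => console.log(JSON.stringify(res)));
```

A flat name-to-function map keeps dispatch to a single lookup, so adding a tool means adding one entry to the object.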
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate non-read-only, non-destructive, and open-world. The description adds behavioral details that go beyond the annotations: it is non-streaming and returns the full text plus timing and tokens/second. There are no contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
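The timing and tokens/second figures mentioned above are derived from Ollama's durations, which are reported in nanoseconds. The sketch below restates the handler's conversion as a standalone function. `summarizeTiming` is a hypothetical name, and the input values are illustrative, not measurements.

```javascript
// Mirrors the conversions in the generate handler:
//   milliseconds      = nanoseconds / 1e6 (rounded)
//   tokens per second = eval_count / (eval_duration / 1e9), rounded to 2 decimals
function summarizeTiming(d) {
  return {
    eval_duration_ms: d.eval_duration ? Math.round(d.eval_duration / 1e6) : null,
    total_duration_ms: d.total_duration ? Math.round(d.total_duration / 1e6) : null,
    tokens_per_second: d.eval_count && d.eval_duration
      ? Math.round((d.eval_count / (d.eval_duration / 1e9)) * 100) / 100
      : null,
  };
}

// Illustrative input: 100 tokens generated in 2.5 s, 3 s end to end.
// Result: eval_duration_ms 2500, total_duration_ms 3000, tokens_per_second 40.
console.log(summarizeTiming({ eval_count: 100, eval_duration: 2.5e9, total_duration: 3e9 }));
```

Because tokens_per_second is computed from eval_duration alone, it reflects generation speed only; prompt processing and model load time show up in total_duration_ms instead.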

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two short sentences front-load the core purpose (one-shot completion) and key behaviors (non-streaming; returns timing and tokens/second). Every word adds value, with no fluff.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With full schema coverage and annotations but no output schema, the description adequately covers the tool's purpose, behavior, and return values. No critical information is missing for an AI agent to use it correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, with descriptions for all 4 parameters. The description adds no parameter-specific information beyond what the schema provides, so the baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states that the tool runs a one-shot text completion (verb + resource) and specifies that it is non-streaming, distinguishing it from siblings such as 'chat', which may be streaming or multi-turn.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies use for one-shot, non-streaming completions, which gives clear context. However, it does not explicitly name alternatives or say when not to use the tool, so it lacks full exclusion guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.


MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/LukeLamb/claude-ollama-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.