
Ollama MCP Server

by NightTrek

chat_completion

Generate AI responses using local Ollama models through an OpenAI-compatible API for chat-based applications.

Instructions

OpenAI-compatible chat completion API

Input Schema

| Name        | Required | Description                            | Default |
| ----------- | -------- | -------------------------------------- | ------- |
| model       | Yes      | Name of the Ollama model to use        |         |
| messages    | Yes      | Array of messages in the conversation  |         |
| temperature | No       | Sampling temperature (0-2)             |         |
| timeout     | No       | Timeout in milliseconds                | 60000   |
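
As a hedged illustration (the model name and message contents are not from the source), a valid arguments object for this tool might look like:

    const args = {
      model: 'llama3.2',
      messages: [
        { role: 'system', content: 'You are a terse assistant.' },
        { role: 'user', content: 'Name three uses for a paperclip.' },
      ],
      temperature: 0.7, // optional, 0-2
      timeout: 120000,  // optional, milliseconds
    };

Given the handler shown under Implementation Reference below, the tool replies with a single text content block containing an OpenAI-style chat.completion JSON object (id, created, model, and a choices array whose first message carries the assistant's reply).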

Implementation Reference

  • The handler function that implements the core logic of the chat_completion tool: it converts the chat messages into a single prompt, calls the Ollama generate API, and returns a formatted OpenAI-compatible response.
    private async handleChatCompletion(args: any) {
      try {
        // Convert chat messages to a single prompt
        const prompt = args.messages
          .map((msg: any) => {
            switch (msg.role) {
              case 'system':
                return `System: ${msg.content}\n`;
              case 'user':
                return `User: ${msg.content}\n`;
              case 'assistant':
                return `Assistant: ${msg.content}\n`;
              default:
                return '';
            }
          })
          .join('');
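        // e.g. [{ role: 'system', content: 'Be terse.' }, { role: 'user', content: 'Hi' }]
        // flattens to "System: Be terse.\nUser: Hi\n"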
    
        // Make request to Ollama API with configurable timeout and raw mode
        const response = await axios.post<OllamaGenerateResponse>(
          `${OLLAMA_HOST}/api/generate`,
          {
            model: args.model,
            prompt,
            stream: false,
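            // Note: Ollama's /api/generate reads sampling settings such as
            // temperature from an `options` object; a top-level temperature
            // field may be ignored by the endpoint.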
            temperature: args.temperature,
            raw: true, // Raw mode bypasses Ollama's prompt templating for a more direct response
          },
          {
            timeout: args.timeout || DEFAULT_TIMEOUT,
          }
        );
    
        return {
          content: [
            {
              type: 'text',
              text: JSON.stringify({
                id: 'chatcmpl-' + Date.now(),
                object: 'chat.completion',
                created: Math.floor(Date.now() / 1000),
                model: args.model,
                choices: [
                  {
                    index: 0,
                    message: {
                      role: 'assistant',
                      content: response.data.response,
                    },
                    finish_reason: 'stop',
                  },
                ],
              }, null, 2),
            },
          ],
        };
      } catch (error) {
        if (axios.isAxiosError(error)) {
          throw new McpError(
            ErrorCode.InternalError,
            `Ollama API error: ${error.response?.data?.error || error.message}`
          );
        }
        throw new McpError(ErrorCode.InternalError, `Unexpected error: ${formatError(error)}`);
      }
    }
  • Input schema definition for the chat_completion tool, specifying parameters like model, messages, temperature, and timeout.
    inputSchema: {
      type: 'object',
      properties: {
        model: {
          type: 'string',
          description: 'Name of the Ollama model to use',
        },
        messages: {
          type: 'array',
          items: {
            type: 'object',
            properties: {
              role: {
                type: 'string',
                enum: ['system', 'user', 'assistant'],
              },
              content: {
                type: 'string',
              },
            },
            required: ['role', 'content'],
          },
          description: 'Array of messages in the conversation',
        },
        temperature: {
          type: 'number',
          description: 'Sampling temperature (0-2)',
          minimum: 0,
          maximum: 2,
        },
        timeout: {
          type: 'number',
          description: 'Timeout in milliseconds (default: 60000)',
          minimum: 1000,
        },
      },
      required: ['model', 'messages'],
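      // Unrecognized keys are rejected; only the four properties above are accepted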
      additionalProperties: false,
    },
  • src/index.ts:207-249 (registration)
    Registration of the chat_completion tool in the ListTools response, including name, description, and input schema.
    {
      name: 'chat_completion',
      description: 'OpenAI-compatible chat completion API',
      inputSchema: {
        type: 'object',
        properties: {
          model: {
            type: 'string',
            description: 'Name of the Ollama model to use',
          },
          messages: {
            type: 'array',
            items: {
              type: 'object',
              properties: {
                role: {
                  type: 'string',
                  enum: ['system', 'user', 'assistant'],
                },
                content: {
                  type: 'string',
                },
              },
              required: ['role', 'content'],
            },
            description: 'Array of messages in the conversation',
          },
          temperature: {
            type: 'number',
            description: 'Sampling temperature (0-2)',
            minimum: 0,
            maximum: 2,
          },
          timeout: {
            type: 'number',
            description: 'Timeout in milliseconds (default: 60000)',
            minimum: 1000,
          },
        },
        required: ['model', 'messages'],
        additionalProperties: false,
      },
    },
  • src/index.ts:274-275 (registration)
    Dispatch case in the CallToolRequest handler's switch statement that routes chat_completion calls to the handler function (a sketch of the surrounding wiring follows below).
    case 'chat_completion':
      return await this.handleChatCompletion(request.params.arguments);
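
For orientation, here is a hedged sketch of how the registration and dispatch above might be wired together, assuming the standard @modelcontextprotocol/sdk server API; the chatCompletionTool name is illustrative and does not appear in the source.

    import {
      CallToolRequestSchema,
      ListToolsRequestSchema,
      ErrorCode,
      McpError,
    } from '@modelcontextprotocol/sdk/types.js';

    // Inside the server class's setup code (sketch): expose the tool in
    // ListTools responses...
    this.server.setRequestHandler(ListToolsRequestSchema, async () => ({
      tools: [chatCompletionTool], // the registration object shown above
    }));

    // ...and route CallTool requests to handlers by tool name.
    this.server.setRequestHandler(CallToolRequestSchema, async (request) => {
      switch (request.params.name) {
        case 'chat_completion':
          return await this.handleChatCompletion(request.params.arguments);
        default:
          throw new McpError(
            ErrorCode.MethodNotFound,
            `Unknown tool: ${request.params.name}`
          );
      }
    });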


MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/NightTrek/Ollama-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.