
Ollama MCP Server

by NightTrek

chat_completion

Generate AI responses using local Ollama models through an OpenAI-compatible API for chat-based applications.

Instructions

OpenAI-compatible chat completion API

Input Schema

Name         Required  Description                              Default
model        Yes       Name of the Ollama model to use          —
messages     Yes       Array of messages in the conversation    —
temperature  No        Sampling temperature (0–2)               —
timeout      No        Timeout in milliseconds                  60000
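A minimal call might pass arguments shaped like the following sketch; the model name is illustrative and must match a model pulled into your local Ollama instance:

```typescript
// Hypothetical arguments for a chat_completion tool call.
// "llama3.2" is an example model name, not one the server requires.
const args = {
  model: "llama3.2",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "Why is the sky blue?" },
  ],
  temperature: 0.7, // optional, 0–2
  timeout: 60000,   // optional, milliseconds (the server default)
};
```

Only `model` and `messages` are required; omitting `timeout` falls back to the server's 60-second default.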

Implementation Reference

  • The handler function that implements the core logic of the chat_completion tool. It flattens the messages array into a single prompt string, calls the Ollama generate API, and returns the result wrapped in an OpenAI-compatible response.
    private async handleChatCompletion(args: any) {
      try {
        // Convert chat messages to a single prompt
        const prompt = args.messages
          .map((msg: any) => {
            switch (msg.role) {
              case 'system':
                return `System: ${msg.content}\n`;
              case 'user':
                return `User: ${msg.content}\n`;
              case 'assistant':
                return `Assistant: ${msg.content}\n`;
              default:
                return '';
            }
          })
          .join('');
    
        // Make request to Ollama API with configurable timeout and raw mode
        const response = await axios.post<OllamaGenerateResponse>(
          `${OLLAMA_HOST}/api/generate`,
          {
            model: args.model,
            prompt,
            stream: false,
            temperature: args.temperature,
            raw: true, // Add raw mode for more direct responses
          },
          {
            timeout: args.timeout || DEFAULT_TIMEOUT,
          }
        );
    
        return {
          content: [
            {
              type: 'text',
              text: JSON.stringify({
                id: 'chatcmpl-' + Date.now(),
                object: 'chat.completion',
                created: Math.floor(Date.now() / 1000),
                model: args.model,
                choices: [
                  {
                    index: 0,
                    message: {
                      role: 'assistant',
                      content: response.data.response,
                    },
                    finish_reason: 'stop',
                  },
                ],
              }, null, 2),
            },
          ],
        };
      } catch (error) {
        if (axios.isAxiosError(error)) {
          throw new McpError(
            ErrorCode.InternalError,
            `Ollama API error: ${error.response?.data?.error || error.message}`
          );
        }
        throw new McpError(ErrorCode.InternalError, `Unexpected error: ${formatError(error)}`);
      }
    }
  • Input schema definition for the chat_completion tool, specifying parameters like model, messages, temperature, and timeout.
    inputSchema: {
      type: 'object',
      properties: {
        model: {
          type: 'string',
          description: 'Name of the Ollama model to use',
        },
        messages: {
          type: 'array',
          items: {
            type: 'object',
            properties: {
              role: {
                type: 'string',
                enum: ['system', 'user', 'assistant'],
              },
              content: {
                type: 'string',
              },
            },
            required: ['role', 'content'],
          },
          description: 'Array of messages in the conversation',
        },
        temperature: {
          type: 'number',
          description: 'Sampling temperature (0-2)',
          minimum: 0,
          maximum: 2,
        },
        timeout: {
          type: 'number',
          description: 'Timeout in milliseconds (default: 60000)',
          minimum: 1000,
        },
      },
      required: ['model', 'messages'],
      additionalProperties: false,
    },
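The JSON schema above translates naturally into a TypeScript shape. This interface is an illustration, not a type exported by the server:

```typescript
// Illustrative TypeScript mirror of the chat_completion input schema.
interface ChatCompletionArgs {
  model: string;
  messages: Array<{
    role: "system" | "user" | "assistant";
    content: string;
  }>;
  temperature?: number; // 0–2
  timeout?: number;     // milliseconds, minimum 1000
}

// A minimal valid value: only model and messages are required.
const example: ChatCompletionArgs = {
  model: "llama3.2",
  messages: [{ role: "user", content: "hi" }],
};
```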
  • src/index.ts:207-249 (registration)
    Registration of the chat_completion tool in the ListTools response, including name, description, and input schema.
    {
      name: 'chat_completion',
      description: 'OpenAI-compatible chat completion API',
      inputSchema: {
        type: 'object',
        properties: {
          model: {
            type: 'string',
            description: 'Name of the Ollama model to use',
          },
          messages: {
            type: 'array',
            items: {
              type: 'object',
              properties: {
                role: {
                  type: 'string',
                  enum: ['system', 'user', 'assistant'],
                },
                content: {
                  type: 'string',
                },
              },
              required: ['role', 'content'],
            },
            description: 'Array of messages in the conversation',
          },
          temperature: {
            type: 'number',
            description: 'Sampling temperature (0-2)',
            minimum: 0,
            maximum: 2,
          },
          timeout: {
            type: 'number',
            description: 'Timeout in milliseconds (default: 60000)',
            minimum: 1000,
          },
        },
        required: ['model', 'messages'],
        additionalProperties: false,
      },
    },
  • src/index.ts:274-275 (registration)
    The case in the CallToolRequest handler's switch statement that routes chat_completion calls to the handler function.
    case 'chat_completion':
      return await this.handleChatCompletion(request.params.arguments);
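The routing pattern above can be sketched without the MCP plumbing. This synchronous stand-in (the real handler is async and lives on a server class) shows a tool name being mapped to its handler:

```typescript
// Synchronous sketch of the tool-dispatch pattern; the stub handler
// stands in for handleChatCompletion and is not the real implementation.
type ToolHandler = (args: Record<string, string>) => string;

const toolHandlers: Record<string, ToolHandler> = {
  chat_completion: (args) => `chat_completion called with model ${args.model}`,
};

function dispatchTool(name: string, args: Record<string, string>): string {
  const handler = toolHandlers[name];
  if (!handler) throw new Error(`Unknown tool: ${name}`);
  return handler(args);
}

const result = dispatchTool("chat_completion", { model: "llama3.2" });
// result === "chat_completion called with model llama3.2"
```

Unrecognized tool names throw, which mirrors how an MCP server rejects a CallToolRequest for a tool it never registered.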
