
Unsloth MCP Server

by OtotaO

load_model

Load a pretrained model with Unsloth optimizations for faster training and reduced memory usage, supporting quantization and gradient checkpointing.

Instructions

Load a pretrained model with Unsloth optimizations

Input Schema

| Name | Required | Description | Default |
| ---- | -------- | ----------- | ------- |
| model_name | Yes | Name of the model to load (e.g., "unsloth/Llama-3.2-1B") | |
| max_seq_length | No | Maximum sequence length for the model | 2048 |
| load_in_4bit | No | Whether to load the model in 4-bit quantization | true |
| use_gradient_checkpointing | No | Whether to use gradient checkpointing to save memory | true |
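
For example, a client might pass an arguments object like the one below. The model name and sequence length are illustrative; any omitted optional field falls back to the handler defaults shown above.

    // Illustrative arguments for a load_model tool call.
    const loadModelArgs = {
      model_name: 'unsloth/Llama-3.2-1B',
      max_seq_length: 4096, // override the 2048 default
      load_in_4bit: true,   // 4-bit quantization to cut memory use
    };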

Implementation Reference

  • The handler function for the 'load_model' tool. It destructures the input arguments, builds a Python script that calls Unsloth's FastLanguageModel.from_pretrained with the specified options, executes the script, parses the JSON output containing the model info, and returns a success response or throws an error.
    case 'load_model': {
      const { model_name, max_seq_length = 2048, load_in_4bit = true, use_gradient_checkpointing = true } = args as {
        model_name: string;
        max_seq_length?: number;
        load_in_4bit?: boolean;
        use_gradient_checkpointing?: boolean;
      };

      // The script body sits at the template literal's left margin so the
      // generated Python has no leading indentation, and JS booleans are
      // mapped to Python's True/False literals before interpolation.
      const script = `
    import json
    try:
        from unsloth import FastLanguageModel

        # Load the model
        model, tokenizer = FastLanguageModel.from_pretrained(
            model_name="${model_name}",
            max_seq_length=${max_seq_length},
            load_in_4bit=${load_in_4bit ? 'True' : 'False'},
            use_gradient_checkpointing=${use_gradient_checkpointing ? '"unsloth"' : 'False'}
        )

        # Collect basic model info to report back
        model_info = {
            "model_name": "${model_name}",
            "max_seq_length": ${max_seq_length},
            "load_in_4bit": ${load_in_4bit ? 'True' : 'False'},
            "use_gradient_checkpointing": ${use_gradient_checkpointing ? 'True' : 'False'},
            "vocab_size": tokenizer.vocab_size,
            "model_type": model.config.model_type,
            "success": True
        }

        print(json.dumps(model_info))
    except Exception as e:
        print(json.dumps({"error": str(e), "success": False}))
    `;

      const result = await this.executeUnslothScript(script);

      try {
        const modelInfo = JSON.parse(result);
        if (!modelInfo.success) {
          throw new Error(modelInfo.error);
        }

        return {
          content: [
            {
              type: 'text',
              text: `Successfully loaded model: ${model_name}\n\n${JSON.stringify(modelInfo, null, 2)}`,
            },
          ],
        };
      } catch (error: any) {
        // Reached when the script output is not valid JSON or reports success: False.
        throw new Error(`Error loading model: ${error.message}`);
      }
    }
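
    The executeUnslothScript helper is referenced but not shown on this page. A minimal sketch of what such a helper could look like, assuming it shells out to a local python3 interpreter and resolves with whatever the script prints; the implementation below is hypothetical, not the server's actual code:

    import { execFile } from 'node:child_process';
    import { promisify } from 'node:util';

    const execFileAsync = promisify(execFile);

    // Hypothetical sketch: run the generated script with `python3 -c`
    // and resolve with its stdout (the JSON payload printed above).
    async function executeUnslothScript(script: string): Promise<string> {
      const { stdout } = await execFileAsync('python3', ['-c', script], {
        maxBuffer: 10 * 1024 * 1024, // headroom in case the script logs extra output
      });
      return stdout.trim();
    }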
  • Input schema for the 'load_model' tool, defining the expected parameters: model_name (required string), optional max_seq_length (number), load_in_4bit (boolean), use_gradient_checkpointing (boolean).
    inputSchema: {
      type: 'object',
      properties: {
        model_name: {
          type: 'string',
          description: 'Name of the model to load (e.g., "unsloth/Llama-3.2-1B")',
        },
        max_seq_length: {
          type: 'number',
          description: 'Maximum sequence length for the model',
        },
        load_in_4bit: {
          type: 'boolean',
          description: 'Whether to load the model in 4-bit quantization',
        },
        use_gradient_checkpointing: {
          type: 'boolean',
          description: 'Whether to use gradient checkpointing to save memory',
        },
      },
      required: ['model_name'],
    },
  • src/index.ts:86-111 (registration)
    Registration of the 'load_model' tool in the ListTools response, including name, description, and inputSchema.
    {
      name: 'load_model',
      description: 'Load a pretrained model with Unsloth optimizations',
      inputSchema: {
        type: 'object',
        properties: {
          model_name: {
            type: 'string',
            description: 'Name of the model to load (e.g., "unsloth/Llama-3.2-1B")',
          },
          max_seq_length: {
            type: 'number',
            description: 'Maximum sequence length for the model',
          },
          load_in_4bit: {
            type: 'boolean',
            description: 'Whether to load the model in 4-bit quantization',
          },
          use_gradient_checkpointing: {
            type: 'boolean',
            description: 'Whether to use gradient checkpointing to save memory',
          },
        },
        required: ['model_name'],
      },
    },
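
    Putting it together, a client could call the tool roughly as follows. This is a sketch assuming the official @modelcontextprotocol/sdk TypeScript client; the server entry point path is illustrative.

    import { Client } from '@modelcontextprotocol/sdk/client/index.js';
    import { StdioClientTransport } from '@modelcontextprotocol/sdk/client/stdio.js';

    async function main() {
      // Spawn the server over stdio; adjust the path to the built entry point.
      const transport = new StdioClientTransport({
        command: 'node',
        args: ['build/index.js'],
      });
      const client = new Client({ name: 'example-client', version: '1.0.0' });
      await client.connect(transport);

      // Invoke the tool registered above; on success the response contains
      // a single text item with the serialized model info.
      const result = await client.callTool({
        name: 'load_model',
        arguments: { model_name: 'unsloth/Llama-3.2-1B' },
      });
      console.log(result.content);
    }

    main().catch(console.error);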

MCP directory API

We provide all the information about MCP servers via our MCP directory API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/OtotaO/unsloth-mcp-server'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.