
load_model

Load a pretrained model with Unsloth optimizations for faster training and reduced memory usage. Specify the model name, maximum sequence length, 4-bit quantization, and gradient checkpointing to control speed and memory use.

Instructions

Load a pretrained model with Unsloth optimizations

Input Schema

| Name                       | Required | Description                                               | Default |
| -------------------------- | -------- | --------------------------------------------------------- | ------- |
| load_in_4bit               | No       | Whether to load the model in 4-bit quantization           | true    |
| max_seq_length             | No       | Maximum sequence length for the model                     | 2048    |
| model_name                 | Yes      | Name of the model to load (e.g., "unsloth/Llama-3.2-1B")  | —       |
| use_gradient_checkpointing | No       | Whether to use gradient checkpointing to save memory      | true    |
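
For illustration, a call to this tool from a TypeScript MCP client might look like the following sketch, using the official @modelcontextprotocol/sdk package. The server launch command (node build/index.js) and the client name are assumptions, not taken from this page.

import { Client } from '@modelcontextprotocol/sdk/client/index.js';
import { StdioClientTransport } from '@modelcontextprotocol/sdk/client/stdio.js';

// Launch the server over stdio (the command and path are assumptions).
const transport = new StdioClientTransport({
  command: 'node',
  args: ['build/index.js'],
});

const client = new Client(
  { name: 'example-client', version: '0.1.0' },
  { capabilities: {} }
);
await client.connect(transport);

// Call load_model with arguments matching the schema above;
// only model_name is required.
const result = await client.callTool({
  name: 'load_model',
  arguments: {
    model_name: 'unsloth/Llama-3.2-1B',
    max_seq_length: 2048,
    load_in_4bit: true,
    use_gradient_checkpointing: true,
  },
});
console.log(result.content);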

Implementation Reference

  • Handler for the 'load_model' tool. Parses the arguments, builds and executes a Python script that loads the model with Unsloth using the given parameters, parses the JSON result, and returns a success message with model info or throws an error.
case 'load_model': {
  const {
    model_name,
    max_seq_length = 2048,
    load_in_4bit = true,
    use_gradient_checkpointing = true,
  } = args as {
    model_name: string;
    max_seq_length?: number;
    load_in_4bit?: boolean;
    use_gradient_checkpointing?: boolean;
  };
  // JS booleans must be rendered as Python's True/False wherever they
  // are interpolated into the script below.
  const script = `
import json
try:
    from unsloth import FastLanguageModel

    # Load the model
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name="${model_name}",
        max_seq_length=${max_seq_length},
        load_in_4bit=${load_in_4bit ? 'True' : 'False'},
        use_gradient_checkpointing=${use_gradient_checkpointing ? '"unsloth"' : 'False'}
    )

    # Get model info
    model_info = {
        "model_name": "${model_name}",
        "max_seq_length": ${max_seq_length},
        "load_in_4bit": ${load_in_4bit ? 'True' : 'False'},
        "use_gradient_checkpointing": ${use_gradient_checkpointing ? 'True' : 'False'},
        "vocab_size": tokenizer.vocab_size,
        "model_type": model.config.model_type,
        "success": True
    }
    print(json.dumps(model_info))
except Exception as e:
    print(json.dumps({"error": str(e), "success": False}))
`;
  const result = await this.executeUnslothScript(script);
  try {
    const modelInfo = JSON.parse(result);
    if (!modelInfo.success) {
      throw new Error(modelInfo.error);
    }
    return {
      content: [
        {
          type: 'text',
          text: `Successfully loaded model: ${model_name}\n\n${JSON.stringify(modelInfo, null, 2)}`,
        },
      ],
    };
  } catch (error: any) {
    throw new Error(`Error loading model: ${error.message}`);
  }
}
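
The handler above delegates to this.executeUnslothScript, whose implementation is not shown on this page. A minimal sketch of such a helper, assuming it runs the generated script with a local python3 interpreter and returns its stdout, might look like the following; this is an illustration, not the server's actual code.

import { execFile } from 'node:child_process';
import { promisify } from 'node:util';

const execFileAsync = promisify(execFile);

// Hypothetical stand-in for this.executeUnslothScript: run the generated
// Python with `python3 -c` and return what it printed to stdout. The
// handler JSON.parses this value, so the script must print only JSON.
async function executeUnslothScript(script: string): Promise<string> {
  const { stdout } = await execFileAsync('python3', ['-c', script], {
    maxBuffer: 10 * 1024 * 1024, // model loading can produce large output
  });
  return stdout.trim();
}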
  • Input schema for the 'load_model' tool defining parameters: model_name (required), max_seq_length, load_in_4bit, use_gradient_checkpointing.
inputSchema: {
  type: 'object',
  properties: {
    model_name: {
      type: 'string',
      description: 'Name of the model to load (e.g., "unsloth/Llama-3.2-1B")',
    },
    max_seq_length: {
      type: 'number',
      description: 'Maximum sequence length for the model',
    },
    load_in_4bit: {
      type: 'boolean',
      description: 'Whether to load the model in 4-bit quantization',
    },
    use_gradient_checkpointing: {
      type: 'boolean',
      description: 'Whether to use gradient checkpointing to save memory',
    },
  },
  required: ['model_name'],
},
  • src/index.ts:86-111 (registration)
    Registration of the 'load_model' tool in the tools list returned by ListToolsRequestHandler.
{
  name: 'load_model',
  description: 'Load a pretrained model with Unsloth optimizations',
  inputSchema: {
    type: 'object',
    properties: {
      model_name: {
        type: 'string',
        description: 'Name of the model to load (e.g., "unsloth/Llama-3.2-1B")',
      },
      max_seq_length: {
        type: 'number',
        description: 'Maximum sequence length for the model',
      },
      load_in_4bit: {
        type: 'boolean',
        description: 'Whether to load the model in 4-bit quantization',
      },
      use_gradient_checkpointing: {
        type: 'boolean',
        description: 'Whether to use gradient checkpointing to save memory',
      },
    },
    required: ['model_name'],
  },
},


MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/OtotaO/unsloth-mcp-server'
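
The endpoint can also be queried from code; here is a minimal TypeScript sketch. The response shape is not documented on this page, so the JSON is left untyped.

// Query the Glama MCP directory API for this server's metadata.
const res = await fetch(
  'https://glama.ai/api/mcp/v1/servers/OtotaO/unsloth-mcp-server'
);
if (!res.ok) throw new Error(`Request failed with status ${res.status}`);
const server = await res.json(); // response shape not documented here
console.log(server);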

If you have feedback or need assistance with the MCP directory API, please join our Discord server.