load_model
Load a pretrained model with Unsloth optimizations for faster training and reduced memory usage, supporting quantization and gradient checkpointing.
Instructions
Load a pretrained model with Unsloth optimizations
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| model_name | Yes | Name of the model to load (e.g., "unsloth/Llama-3.2-1B") | |
| max_seq_length | No | Maximum sequence length for the model | 2048 |
| load_in_4bit | No | Whether to load the model in 4-bit quantization | true |
| use_gradient_checkpointing | No | Whether to use gradient checkpointing to save memory | true |
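Since only model_name is required, a caller that omits the optional fields gets the handler's defaults. A minimal sketch of how those defaults are applied via destructuring (the `withDefaults` wrapper is illustrative, not part of the source):

```typescript
// Shape of the tool-call arguments, mirroring the input schema.
type LoadModelArgs = {
  model_name: string;
  max_seq_length?: number;
  load_in_4bit?: boolean;
  use_gradient_checkpointing?: boolean;
};

// Illustrative helper: apply the handler's defaults for omitted fields.
function withDefaults(args: LoadModelArgs) {
  const {
    model_name,
    max_seq_length = 2048,
    load_in_4bit = true,
    use_gradient_checkpointing = true,
  } = args;
  return { model_name, max_seq_length, load_in_4bit, use_gradient_checkpointing };
}

console.log(withDefaults({ model_name: "unsloth/Llama-3.2-1B" }));
```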
Implementation Reference
- src/index.ts:306-361 (handler) — The handler for the 'load_model' tool. It destructures the input arguments, builds a Python script that calls Unsloth's `FastLanguageModel.from_pretrained` with the requested options, executes the script, parses the JSON output containing model info, and returns a success response or throws an error.

```typescript
case 'load_model': {
  const {
    model_name,
    max_seq_length = 2048,
    load_in_4bit = true,
    use_gradient_checkpointing = true,
  } = args as {
    model_name: string;
    max_seq_length?: number;
    load_in_4bit?: boolean;
    use_gradient_checkpointing?: boolean;
  };

  // Booleans interpolated into Python source must be rendered as
  // Python literals (True/False), not JavaScript's true/false.
  const script = `
import json
try:
    from unsloth import FastLanguageModel

    # Load the model
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name="${model_name}",
        max_seq_length=${max_seq_length},
        load_in_4bit=${load_in_4bit ? 'True' : 'False'},
        use_gradient_checkpointing=${use_gradient_checkpointing ? '"unsloth"' : 'False'}
    )

    # Report model info back to the handler as JSON on stdout
    model_info = {
        "model_name": "${model_name}",
        "max_seq_length": ${max_seq_length},
        "load_in_4bit": ${load_in_4bit ? 'True' : 'False'},
        "use_gradient_checkpointing": ${use_gradient_checkpointing ? 'True' : 'False'},
        "vocab_size": tokenizer.vocab_size,
        "model_type": model.config.model_type,
        "success": True
    }
    print(json.dumps(model_info))
except Exception as e:
    print(json.dumps({"error": str(e), "success": False}))
`;

  const result = await this.executeUnslothScript(script);

  try {
    const modelInfo = JSON.parse(result);
    if (!modelInfo.success) {
      throw new Error(modelInfo.error);
    }
    return {
      content: [
        {
          type: 'text',
          text: `Successfully loaded model: ${model_name}\n\n${JSON.stringify(modelInfo, null, 2)}`,
        },
      ],
    };
  } catch (error: any) {
    throw new Error(`Error loading model: ${error.message}`);
  }
}
```
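One detail worth noting: because the handler interpolates JavaScript values directly into Python source, booleans must be rendered as Python literals (`True`/`False`) — a bare JS `true` would raise a `NameError` inside the generated script. A minimal sketch of that conversion (the `pyBool` helper is illustrative, not part of src/index.ts):

```typescript
// Illustrative helper: convert a JS boolean into a Python literal
// suitable for string interpolation into generated Python source.
function pyBool(b: boolean): string {
  return b ? "True" : "False";
}

const load_in_4bit = true;
const line = `load_in_4bit=${pyBool(load_in_4bit)},`;
console.log(line); // → load_in_4bit=True,
```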
- src/index.ts:89-110 (schema) — Input schema for the 'load_model' tool, defining the expected parameters: model_name (required string), plus optional max_seq_length (number), load_in_4bit (boolean), and use_gradient_checkpointing (boolean).

```typescript
inputSchema: {
  type: 'object',
  properties: {
    model_name: {
      type: 'string',
      description: 'Name of the model to load (e.g., "unsloth/Llama-3.2-1B")',
    },
    max_seq_length: {
      type: 'number',
      description: 'Maximum sequence length for the model',
    },
    load_in_4bit: {
      type: 'boolean',
      description: 'Whether to load the model in 4-bit quantization',
    },
    use_gradient_checkpointing: {
      type: 'boolean',
      description: 'Whether to use gradient checkpointing to save memory',
    },
  },
  required: ['model_name'],
},
```
- src/index.ts:86-111 (registration) — Registration of the 'load_model' tool in the ListTools response, including its name, description, and inputSchema.

```typescript
{
  name: 'load_model',
  description: 'Load a pretrained model with Unsloth optimizations',
  inputSchema: {
    type: 'object',
    properties: {
      model_name: {
        type: 'string',
        description: 'Name of the model to load (e.g., "unsloth/Llama-3.2-1B")',
      },
      max_seq_length: {
        type: 'number',
        description: 'Maximum sequence length for the model',
      },
      load_in_4bit: {
        type: 'boolean',
        description: 'Whether to load the model in 4-bit quantization',
      },
      use_gradient_checkpointing: {
        type: 'boolean',
        description: 'Whether to use gradient checkpointing to save memory',
      },
    },
    required: ['model_name'],
  },
},
```