load_model
Load a pretrained model with Unsloth optimizations for faster training and reduced memory usage. You can specify the model name, maximum sequence length, 4-bit quantization, and gradient checkpointing to control the memory/speed trade-off.
Instructions
Load a pretrained model with Unsloth optimizations
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| load_in_4bit | No | Whether to load the model in 4-bit quantization | `true` |
| max_seq_length | No | Maximum sequence length for the model | `2048` |
| model_name | Yes | Name of the model to load (e.g., `"unsloth/Llama-3.2-1B"`) | |
| use_gradient_checkpointing | No | Whether to use gradient checkpointing to save memory | `true` |
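The defaults in the table come from the handler's destructuring (src/index.ts:306-361). A minimal sketch of how omitted optional fields resolve; `applyDefaults` is an illustrative helper, not part of the server:

```typescript
// Hypothetical helper mirroring the handler's default values for
// omitted optional fields (defaults taken from src/index.ts:306-361).
interface LoadModelArgs {
  model_name: string;
  max_seq_length?: number;
  load_in_4bit?: boolean;
  use_gradient_checkpointing?: boolean;
}

function applyDefaults(args: LoadModelArgs): Required<LoadModelArgs> {
  const {
    model_name,
    max_seq_length = 2048,
    load_in_4bit = true,
    use_gradient_checkpointing = true,
  } = args;
  return { model_name, max_seq_length, load_in_4bit, use_gradient_checkpointing };
}

// Only model_name is required; everything else falls back to its default.
const resolved = applyDefaults({ model_name: 'unsloth/Llama-3.2-1B' });
console.log(JSON.stringify(resolved));
```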
Implementation Reference
- src/index.ts:306-361 (handler): Handler for the 'load_model' tool. It destructures the arguments (applying defaults), interpolates them into a Python script that loads the model via Unsloth, executes the script, parses the JSON result, and returns a success message with model info or throws an error.

```typescript
case 'load_model': {
  const {
    model_name,
    max_seq_length = 2048,
    load_in_4bit = true,
    use_gradient_checkpointing = true,
  } = args as {
    model_name: string;
    max_seq_length?: number;
    load_in_4bit?: boolean;
    use_gradient_checkpointing?: boolean;
  };

  // Booleans must be rendered as Python literals (True/False), not
  // JavaScript's true/false, everywhere they are interpolated.
  const script = `
import json
try:
    from unsloth import FastLanguageModel

    # Load the model
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name="${model_name}",
        max_seq_length=${max_seq_length},
        load_in_4bit=${load_in_4bit ? 'True' : 'False'},
        use_gradient_checkpointing=${use_gradient_checkpointing ? '"unsloth"' : 'False'}
    )

    # Get model info
    model_info = {
        "model_name": "${model_name}",
        "max_seq_length": ${max_seq_length},
        "load_in_4bit": ${load_in_4bit ? 'True' : 'False'},
        "use_gradient_checkpointing": ${use_gradient_checkpointing ? 'True' : 'False'},
        "vocab_size": tokenizer.vocab_size,
        "model_type": model.config.model_type,
        "success": True
    }
    print(json.dumps(model_info))
except Exception as e:
    print(json.dumps({"error": str(e), "success": False}))
`;

  const result = await this.executeUnslothScript(script);

  try {
    const modelInfo = JSON.parse(result);
    if (!modelInfo.success) {
      throw new Error(modelInfo.error);
    }
    return {
      content: [
        {
          type: 'text',
          text: `Successfully loaded model: ${model_name}\n\n${JSON.stringify(modelInfo, null, 2)}`,
        },
      ],
    };
  } catch (error: any) {
    throw new Error(`Error loading model: ${error.message}`);
  }
}
```
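The boolean-to-Python-literal mapping in the handler's template is easy to get wrong. A standalone sketch of how the interpolated values render; `pyBool` and `pyCheckpointing` are illustrative helpers, not functions in src/index.ts:

```typescript
// JavaScript booleans must become Python literals inside the generated
// script: true/false → True/False, and gradient checkpointing uses the
// string "unsloth" (Unsloth's optimized mode) when enabled.
function pyBool(v: boolean): string {
  return v ? 'True' : 'False';
}

function pyCheckpointing(v: boolean): string {
  return v ? '"unsloth"' : 'False';
}

const line =
  `load_in_4bit=${pyBool(true)}, use_gradient_checkpointing=${pyCheckpointing(true)}`;
console.log(line); // load_in_4bit=True, use_gradient_checkpointing="unsloth"
```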
- src/index.ts:89-110 (schema): Input schema for the 'load_model' tool defining its parameters: model_name (required), max_seq_length, load_in_4bit, use_gradient_checkpointing.

```typescript
inputSchema: {
  type: 'object',
  properties: {
    model_name: {
      type: 'string',
      description: 'Name of the model to load (e.g., "unsloth/Llama-3.2-1B")',
    },
    max_seq_length: {
      type: 'number',
      description: 'Maximum sequence length for the model',
    },
    load_in_4bit: {
      type: 'boolean',
      description: 'Whether to load the model in 4-bit quantization',
    },
    use_gradient_checkpointing: {
      type: 'boolean',
      description: 'Whether to use gradient checkpointing to save memory',
    },
  },
  required: ['model_name'],
},
```
- src/index.ts:86-111 (registration): Registration of the 'load_model' tool in the tools list returned by the ListToolsRequestHandler; the inputSchema is the same object shown for src/index.ts:89-110.

```typescript
{
  name: 'load_model',
  description: 'Load a pretrained model with Unsloth optimizations',
  inputSchema: {
    // ... same schema object as src/index.ts:89-110 above
  },
},
```
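For context, this is a sketch of the JSON-RPC `tools/call` payload an MCP client would send to invoke the tool; the request shape follows the MCP specification, and the `id` and argument values here are illustrative:

```typescript
// Example MCP tools/call request targeting this server's 'load_model'
// tool. Omitted optional arguments fall back to the handler's defaults.
const request = {
  jsonrpc: '2.0',
  id: 1,
  method: 'tools/call',
  params: {
    name: 'load_model',
    arguments: {
      model_name: 'unsloth/Llama-3.2-1B',
      max_seq_length: 2048,
      load_in_4bit: true,
    },
  },
};

console.log(JSON.stringify(request));
```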