retry_job
Resume a failed fine-tuning job from its last checkpoint to save GPU time. Each retry is billed separately. Present the cost estimate and obtain user approval before retrying.
Instructions
Retry a failed fine-tuning job from its last checkpoint. Creates a new job that resumes training where the failed one stopped, saving GPU time. Each retry is billed separately.
IMPORTANT: This tool fetches a cost estimate and includes it in the response. You MUST show the estimate to the user and get their explicit approval before treating the retry as confirmed. The retry is submitted automatically (the server validates the account balance), but you must still present the cost to the user.
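The approval step described above could be sketched as follows. This is a minimal illustration, not part of the tool: the helper name `present_and_confirm`, the `ask` callback, and the yes/no prompt wording are all assumptions, and the tool's response is treated as opaque text because its exact shape is not documented here.

```python
from typing import Callable


def present_and_confirm(tool_response_text: str, ask: Callable[[str], str]) -> bool:
    """Show the tool response (which includes the cost estimate) and record approval.

    Hypothetical helper: `ask` is any function that displays a prompt to the
    user and returns their reply, for example the built-in `input`.
    """
    prompt = (
        f"{tool_response_text}\n\n"
        "Each retry is billed separately. Approve this retry? (yes/no): "
    )
    answer = ask(prompt).strip().lower()
    # Only an explicit "yes" counts as approval; anything else is a refusal.
    return answer == "yes"
```

In an interactive session this might be called as `present_and_confirm(response_text, input)`. Accepting only a literal "yes" keeps an empty or ambiguous reply from being read as consent.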
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| job_id | Yes | ID of the failed job to retry | |
| github_token | No | GitHub Personal Access Token (required if original job used a private repo). Not stored; only sent to the training backend. | |
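As a rough sketch of how the arguments in the table might be assembled before calling the tool, assuming they are passed as a plain dictionary. The helper name `build_retry_job_arguments` is hypothetical, and the job ID in the usage note is a placeholder.

```python
from typing import Optional


def build_retry_job_arguments(job_id: str, github_token: Optional[str] = None) -> dict:
    """Assemble the arguments for retry_job (hypothetical helper).

    job_id is required; github_token is included only when provided, which
    the schema says is needed if the original job used a private repo.
    """
    if not job_id or not job_id.strip():
        raise ValueError("job_id is required")
    arguments = {"job_id": job_id.strip()}
    if github_token:
        # Omit the key entirely when there is no token, rather than sending an empty value.
        arguments["github_token"] = github_token
    return arguments
```

For example, `build_retry_job_arguments("job-123")` yields a dictionary containing only `job_id`, while passing a token adds the `github_token` key.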