retry_job
Resume a failed fine-tuning job from its last checkpoint to save GPU time. Creates a new job that continues training where the previous one stopped.
Instructions
Retry a failed fine-tuning job from its last checkpoint. Creates a new job that resumes training where the failed one stopped, saving GPU time. Each retry is billed separately.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| job_id | Yes | ID of the failed job to retry | |
| github_token | No | GitHub Personal Access Token (required if the original job used a private repo). Not stored; only sent to the training backend. | |
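As an illustration of the schema above, the following sketch builds the arguments for a `retry_job` call and wraps them in an MCP `tools/call` request. The helper `buildRetryJobArgs`, the job ID `job_123`, and the request `id` are hypothetical and used only for illustration; only the two field names and the required/optional split come from the table.

```typescript
// Shape of the tool arguments, taken from the input schema table.
interface RetryJobArgs {
  job_id: string;        // required
  github_token?: string; // optional; only needed for private repos
}

// Hypothetical helper (not part of the documented server): validates
// the required field and omits github_token when it is not supplied.
function buildRetryJobArgs(jobId: string, githubToken?: string): RetryJobArgs {
  if (jobId.trim() === "") {
    throw new Error("job_id is required");
  }
  const args: RetryJobArgs = { job_id: jobId };
  if (githubToken) args.github_token = githubToken;
  return args;
}

// A JSON-RPC 2.0 `tools/call` request as an MCP client would send it.
// "job_123" is a placeholder job ID.
const callRequest = {
  jsonrpc: "2.0",
  id: 1,
  method: "tools/call",
  params: { name: "retry_job", arguments: buildRetryJobArgs("job_123") },
};

console.log(JSON.stringify(callRequest.params.arguments));
```

Leaving `github_token` out entirely, rather than sending an empty string, keeps the request minimal for jobs that used a public repo.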
Implementation Reference
- src/client.ts:70-74 (handler): The core handler logic that sends the API request to retry a job.

      async retryJob(jobId: string, githubToken?: string): Promise<any> {
        const body: Record<string, string> = {};
        if (githubToken) body.github_token = githubToken;
        return this.request("POST", `/api/v1/jobs/${jobId}/retry`, Object.keys(body).length ? body : undefined);
      }

- src/mcp.ts:189-208 (registration): Tool registration in the MCP server, including its schema definition.

      name: "retry_job",
      description: "Retry a failed fine-tuning job from its last checkpoint. Creates a new job that resumes training where the failed one stopped, saving GPU time. Each retry is billed separately.",
      inputSchema: {
        type: "object" as const,
        properties: {
          job_id: {
            type: "string",
            description: "ID of the failed job to retry",
          },
          github_token: {
            type: "string",
            description: "GitHub Personal Access Token (required if original job used a private repo). Not stored — only sent to the training backend.",
          },
        },
        required: ["job_id"],
      },
    },
    {
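To make the handler's request shape concrete, here is a minimal standalone sketch of the same logic. It returns a description of the request instead of sending it, since the client's `request` method is not shown in the excerpt. `buildRetryRequest`, the `RetryRequest` type, and the sample values are hypothetical.

```typescript
// Hypothetical descriptor for the HTTP call the handler would make.
interface RetryRequest {
  method: "POST";
  path: string;
  body: Record<string, string> | undefined;
}

// Mirrors the handler: the body carries github_token only when a token
// is given, and is undefined (no request body) otherwise.
function buildRetryRequest(jobId: string, githubToken?: string): RetryRequest {
  const body: Record<string, string> = {};
  if (githubToken) body.github_token = githubToken;
  return {
    method: "POST",
    path: `/api/v1/jobs/${jobId}/retry`,
    body: Object.keys(body).length ? body : undefined,
  };
}

// "job_123" and "ghp_example" are placeholder values.
const withoutToken = buildRetryRequest("job_123");
const withToken = buildRetryRequest("job_123", "ghp_example");

console.log(withoutToken.path);              // /api/v1/jobs/job_123/retry
console.log(withoutToken.body);              // undefined
console.log(JSON.stringify(withToken.body)); // {"github_token":"ghp_example"}
```

Note that the job ID is interpolated into the path as-is, so this sketch, like the excerpted handler, assumes IDs contain no characters that need URL encoding.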