Tuning Engines

retry_job

Resume a failed fine-tuning job from its last checkpoint to save GPU time. Each retry is billed separately. Present cost estimate and obtain user approval before retry.

Instructions

Retry a failed fine-tuning job from its last checkpoint. Creates a new job that resumes training where the failed one stopped, saving GPU time. Each retry is billed separately.

IMPORTANT: This tool fetches a cost estimate and includes it in the response. You MUST show the estimate to the user and get their explicit approval before considering the retry confirmed. The retry is submitted automatically (the server validates balance), but always present the cost to the user.
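
The approval step described above could be sketched on the agent side as follows. This is a minimal illustration, not the server's actual contract: the `cost_estimate` field name and the `ask_user` callback are hypothetical, since the response shape is not documented on this page.

```python
from typing import Callable, Mapping


def present_retry_cost(
    response: Mapping[str, object],
    ask_user: Callable[[str], bool],
) -> bool:
    """Show the cost estimate from a retry_job response and return the user's decision.

    `cost_estimate` is a hypothetical field name used for illustration only.
    """
    estimate = response.get("cost_estimate")
    if estimate is None:
        # Without an estimate there is nothing to approve, so the retry
        # is treated as unconfirmed.
        return False
    prompt = (
        "Retrying this job is billed separately. "
        f"Estimated cost: {estimate}. Approve?"
    )
    # ask_user is whatever mechanism the agent uses to get an explicit yes/no.
    return bool(ask_user(prompt))
```

The function only decides whether the agent may treat the retry as confirmed; it does not submit anything itself.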

Input Schema

Name	Required	Description	Default
`job_id`	Yes	ID of the failed job to retry
`github_token`	No	GitHub Personal Access Token (required if original job used a private repo). Not stored — only sent to the training backend.
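
As a rough sketch, the two parameters above would travel in a standard MCP `tools/call` JSON-RPC request. The helper name `build_retry_request` is made up for this example, and any job ID passed to it is a placeholder.

```python
from typing import Optional


def build_retry_request(
    job_id: str,
    github_token: Optional[str] = None,
    request_id: int = 1,
) -> dict:
    """Build an MCP tools/call request body for the retry_job tool."""
    if not job_id:
        # job_id is the only required parameter in the schema above.
        raise ValueError("job_id is required")
    arguments = {"job_id": job_id}
    if github_token is not None:
        # Only needed when the original job used a private repository.
        arguments["github_token"] = github_token
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "retry_job", "arguments": arguments},
    }
```

Leaving `github_token` out of the arguments entirely, rather than sending an empty string, keeps the request aligned with the parameter being optional.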

Tool Definition Quality

Grade A (4.2/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses key behaviors: creates a new job, bills separately, fetches a cost estimate, and requires user approval. With no annotations provided, the description carries the full burden and covers the most critical behavioral aspects, though it does not say whether the original job remains unchanged.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two concise paragraphs: first paragraph states purpose and key benefit (saving GPU time); second paragraph has critical usage guidance. Every sentence adds value, no redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (billing, user approval, retry logic), the description covers the essential flow: retry mechanism, cost estimate, and required approval. No output schema is present, but the response structure (estimate inclusion) is mentioned. Lacks details on error handling or edge cases, but adequate for the main use case.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the baseline is 3. The description adds little parameter-specific meaning beyond what the schema provides (e.g., job_id and github_token are already well described in the schema). The user approval context is related but does not directly enhance parameter semantics.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
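
One plausible reading of "schema description coverage" is the share of input properties that carry a non-empty description. The sketch below computes it that way; the scorer's actual definition is not given on this page, and the `"type": "string"` entries in the example schema are assumptions.

```python
def description_coverage(schema: dict) -> float:
    """Return the fraction of schema properties with a non-empty description."""
    properties = schema.get("properties", {})
    if not properties:
        # Nothing to describe; treating this as full coverage is a
        # convention chosen for this sketch.
        return 1.0
    described = sum(
        1
        for spec in properties.values()
        if str(spec.get("description", "")).strip()
    )
    return described / len(properties)


# Reconstructed from the parameter table; the property types are assumed.
RETRY_JOB_SCHEMA = {
    "type": "object",
    "properties": {
        "job_id": {
            "type": "string",
            "description": "ID of the failed job to retry",
        },
        "github_token": {
            "type": "string",
            "description": "GitHub Personal Access Token "
            "(required if original job used a private repo).",
        },
    },
    "required": ["job_id"],
}
```

Under this definition both parameters are described, so `description_coverage(RETRY_JOB_SCHEMA)` returns `1.0`, which matches the 100% figure quoted in the review.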

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Retry a failed fine-tuning job from its last checkpoint,' specifying the action (retry), resource (failed fine-tuning job), and distinctive behavior (resume from checkpoint). It distinguishes itself from sibling tools like create_job or cancel_job.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly instructs the agent to show the cost estimate to the user and get explicit approval before proceeding, with 'MUST' emphasis. Provides clear when-to-use context, though it does not explicitly mention when not to use this tool (e.g., if the job is not failed).

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
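
As a sanity check on the overall 4.2/5.0, the six dimension scores can be averaged. The page does not state how the overall score is derived, so the unweighted mean below is only an assumption; the real scoring may weight dimensions differently.

```python
# Dimension scores as listed in this review.
DIMENSION_SCORES = {
    "Behavior": 4,
    "Conciseness": 5,
    "Completeness": 4,
    "Parameters": 3,
    "Purpose": 5,
    "Usage Guidelines": 4,
}


def unweighted_overall(scores: dict) -> float:
    """Average the dimension scores and round to one decimal place.

    An unweighted mean is an assumption of this sketch, not the
    documented scoring formula.
    """
    return round(sum(scores.values()) / len(scores), 1)
```

The scores sum to 25 across six dimensions, about 4.17, which rounds to the displayed 4.2.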

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/cerebrixos-org/tuning-engines-cli'
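
The same request could be made from Python with the standard library. This sketch assumes the endpoint returns a JSON body, which the page does not confirm; the helper names `server_url` and `fetch_server` are illustrative only.

```python
import json
import urllib.request

API_BASE = "https://glama.ai/api/mcp/v1/servers"


def server_url(namespace: str, slug: str) -> str:
    """Build the directory API URL for one MCP server."""
    return f"{API_BASE}/{namespace}/{slug}"


def fetch_server(namespace: str, slug: str, timeout: float = 10.0) -> dict:
    """GET the server record and parse it, assuming a JSON response."""
    request = urllib.request.Request(server_url(namespace, slug), method="GET")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_server("cerebrixos-org", "tuning-engines-cli")` targets the same URL as the curl command above.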

If you have feedback or need assistance with the MCP directory API, please join our Discord server.