create_job
Fine-tune a language model on a GitHub repository to create a custom model that learns code patterns, style, and conventions. Choose between code autocomplete training and bug-fix specialist training based on your needs.
Instructions
Fine-tune an LLM on a GitHub repository using Tuning Engines. This trains a custom model that learns from the code patterns, style, and conventions in the repo. Choose an agent to control the training approach:
AVAILABLE AGENTS:
- agent='code_repo' (Cody) — LoRA-based code fine-tuning using QLoRA (4-bit quantized LoRA) via the Axolotl framework. Trains on your repo's code patterns, naming conventions, and project structure to produce a fast, lightweight adapter. Best for: code autocomplete, inline suggestions, tab-complete, code style matching.
- agent='sera_code_repo' (SIERA) — Bug-fix specialist using the Open Coding Agents approach from AllenAI. Generates synthetic error-resolution training pairs from your repo, producing a model that understands your codebase's failure patterns and fix conventions. Best for: debugging, error resolution, patch generation, root cause analysis. Supports quality_tier='low' (faster) or quality_tier='high' (deeper analysis, more training data).
SUPPORTED BASE MODELS (by size):
- 3B: Qwen/Qwen2.5-Coder-3B-Instruct
- 7B: codellama/CodeLlama-7b-hf, deepseek-ai/deepseek-coder-7b-instruct-v1.5, Qwen/Qwen2.5-Coder-7B-Instruct
- 13-15B: codellama/CodeLlama-13b-Instruct-hf, bigcode/starcoder2-15b, Qwen/Qwen2.5-Coder-14B-Instruct
- 32-34B: deepseek-ai/deepseek-coder-33b-instruct, codellama/CodeLlama-34b-Instruct-hf, Qwen/Qwen2.5-Coder-32B-Instruct
- 70-72B: codellama/CodeLlama-70b-Instruct-hf, meta-llama/Llama-3.1-70B-Instruct, Qwen/Qwen2.5-72B-Instruct
TYPICAL WORKFLOW: estimate_job first to check cost, then create_job, then job_status to monitor progress.
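The workflow above can be sketched as the sequence below. This is a hypothetical illustration: the client class, its method names (`estimateJob`, `jobStatus`), and the response shapes are stand-ins for the real API, and the fake client simulates job completion in memory rather than calling Tuning Engines.

```typescript
// Hypothetical sketch of the estimate_job -> create_job -> job_status workflow.
// The client methods and response shapes are illustrative assumptions,
// not the actual Tuning Engines API.
interface Job {
  id: string;
  status: "queued" | "running" | "completed";
}

class FakeTuningClient {
  private polls = 0;

  async estimateJob(params: { base_model: string; repo_url: string }) {
    // Pretend the server returns a cost estimate for the proposed job.
    return { estimated_cost_usd: 4.2, params };
  }

  async createJob(params: {
    base_model: string;
    output_name: string;
    repo_url: string;
  }): Promise<Job> {
    return { id: "job-123", status: "queued" };
  }

  async jobStatus(id: string): Promise<Job> {
    // Simulate the job finishing after a couple of polls.
    this.polls += 1;
    return { id, status: this.polls < 2 ? "running" : "completed" };
  }
}

async function run(): Promise<string> {
  const client = new FakeTuningClient();

  // 1. Check the cost before committing to a training run.
  const estimate = await client.estimateJob({
    base_model: "Qwen/Qwen2.5-Coder-7B-Instruct",
    repo_url: "https://github.com/org/repo",
  });
  if (estimate.estimated_cost_usd > 100) throw new Error("too expensive");

  // 2. Create the fine-tuning job.
  const job = await client.createJob({
    base_model: "Qwen/Qwen2.5-Coder-7B-Instruct",
    output_name: "my-project-cody-7b",
    repo_url: "https://github.com/org/repo",
  });

  // 3. Poll until the job completes.
  let status = await client.jobStatus(job.id);
  while (status.status !== "completed") {
    status = await client.jobStatus(job.id);
  }
  return status.status;
}

run().then((s) => console.log(s)); // prints "completed"
```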
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| base_model | No | HuggingFace model ID to fine-tune (e.g. 'Qwen/Qwen2.5-Coder-7B-Instruct'). Required unless base_user_model_id is provided. Use list_supported_models to see all options. | |
| base_user_model_id | No | ID of a previously trained model to fine-tune further (iterative training). The base model is resolved automatically. Use list_models to find IDs. | |
| output_name | Yes | Name for the resulting fine-tuned model (e.g. 'my-project-cody-7b') | |
| repo_url | Yes | GitHub repository URL to train on (e.g. 'https://github.com/org/repo') | |
| branch | No | Git branch to use (default: main) | |
| num_epochs | No | Number of training epochs (more = better quality but higher cost) | |
| max_examples | No | Maximum training examples to extract from the repo (minimum: 2) | |
| agent | No | Training agent to use. 'code_repo' (Cody) = QLoRA-based fine-tuning for code autocomplete and inline suggestions. 'sera_code_repo' (SIERA) = bug-fix specialist using AllenAI's Open Coding Agents approach. Default: 'code_repo'. | |
| quality_tier | No | Quality tier (SIERA agent only). 'low' = faster, fewer synthetic pairs. 'high' = deeper analysis, more training data, better results. Default: 'low'. | |
| s3_output_bucket | No | S3 bucket to export the trained model to. If omitted, model is stored in Tuning Engines cloud storage. | |
| s3_access_key_id | No | AWS access key ID for S3 export | |
| s3_secret_access_key | No | AWS secret access key for S3 export | |
| s3_region | No | AWS region for S3 export (e.g. us-east-1) | |
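A concrete argument object for this schema might look like the sketch below. The field names follow the table above; the values are made up, and the `validate` helper is a hypothetical client-side mirror of the server's checks (the "either base_model or base_user_model_id" rule and the required fields).

```typescript
// Illustrative create_job arguments; field names follow the input schema,
// values are invented for the example.
const args = {
  base_model: "Qwen/Qwen2.5-Coder-7B-Instruct",
  output_name: "my-project-cody-7b",
  repo_url: "https://github.com/org/repo",
  branch: "main",
  num_epochs: 3,
  max_examples: 500,
  agent: "sera_code_repo",
  quality_tier: "high",
};

// Hypothetical client-side check mirroring the handler's validation:
// one of base_model / base_user_model_id is required, plus the two
// schema-required fields.
function validate(a: {
  base_model?: string;
  base_user_model_id?: string;
  output_name?: string;
  repo_url?: string;
}): string | null {
  if (!a.base_model && !a.base_user_model_id) {
    return "Error: either base_model or base_user_model_id is required";
  }
  if (!a.output_name || !a.repo_url) {
    return "Error: output_name and repo_url are required";
  }
  return null; // no problems found
}

console.log(validate(args)); // prints "null"
```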
Implementation Reference
- src/client.ts:43-60 (handler) — The actual API call to create a job:

```typescript
async createJob(params: {
  base_model?: string;
  output_name: string;
  repo_url?: string;
  branch?: string;
  github_token?: string;
  num_epochs?: number;
  max_examples?: number;
  base_user_model_id?: string;
  s3_output_bucket?: string;
  s3_access_key_id?: string;
  s3_secret_access_key?: string;
  s3_region?: string;
  agent?: string;
  quality_tier?: string;
}): Promise<any> {
  return this.request("POST", "/api/v1/jobs", params);
}
```

- src/mcp.ts:388-410 (registration) — Tool handler in the MCP server for "create_job":
```typescript
case "create_job":
  if (!args?.base_model && !args?.base_user_model_id) {
    return {
      content: [
        {
          type: "text",
          text: "Error: either base_model or base_user_model_id is required",
        },
      ],
      isError: true,
    };
  }
  result = await client.createJob({
    base_model: args?.base_model as string | undefined,
    base_user_model_id: args?.base_user_model_id as string | undefined,
    output_name: args!.output_name as string,
    repo_url: args?.repo_url as string | undefined,
    branch: args?.branch as string | undefined,
    num_epochs: args?.num_epochs as number | undefined,
    max_examples: args?.max_examples as number | undefined,
    s3_output_bucket: args?.s3_output_bucket as string | undefined,
    s3_access_key_id: args?.s3_access_key_id as string | undefined,
    s3_secret_access_key: args?.s3_secret_access_key as string | undefined,
    s3_region: args?.s3_region as string | undefined,
    agent: args?.agent as string | undefined,
    quality_tier: args?.quality_tier as string | undefined,
  });
  break;
```

- src/mcp.ts:74-162 (schema) — Tool registration and input schema for "create_job":
```typescript
name: "create_job",
description:
  "Fine-tune an LLM on a GitHub repository using Tuning Engines. " +
  "This trains a custom model that learns from the code patterns, style, and conventions in the repo. " +
  "Choose an agent to control the training approach:\n\n" +
  "AVAILABLE AGENTS:\n" +
  "- agent='code_repo' (Cody) — LoRA-based code fine-tuning using QLoRA (4-bit quantized LoRA) via the Axolotl framework. " +
  "Trains on your repo's code patterns, naming conventions, and project structure to produce a fast, lightweight adapter. " +
  "Best for: code autocomplete, inline suggestions, tab-complete, code style matching.\n" +
  "- agent='sera_code_repo' (SIERA) — Bug-fix specialist using the Open Coding Agents approach from AllenAI. " +
  "Generates synthetic error-resolution training pairs from your repo, producing a model that understands your " +
  "codebase's failure patterns and fix conventions. Best for: debugging, error resolution, patch generation, root cause analysis. " +
  "Supports quality_tier='low' (faster) or quality_tier='high' (deeper analysis, more training data).\n\n" +
  "SUPPORTED BASE MODELS (by size):\n" +
  "- 3B: Qwen/Qwen2.5-Coder-3B-Instruct\n" +
  "- 7B: codellama/CodeLlama-7b-hf, deepseek-ai/deepseek-coder-7b-instruct-v1.5, Qwen/Qwen2.5-Coder-7B-Instruct\n" +
  "- 13-15B: codellama/CodeLlama-13b-Instruct-hf, bigcode/starcoder2-15b, Qwen/Qwen2.5-Coder-14B-Instruct\n" +
  "- 32-34B: deepseek-ai/deepseek-coder-33b-instruct, codellama/CodeLlama-34b-Instruct-hf, Qwen/Qwen2.5-Coder-32B-Instruct\n" +
  "- 70-72B: codellama/CodeLlama-70b-Instruct-hf, meta-llama/Llama-3.1-70B-Instruct, Qwen/Qwen2.5-72B-Instruct\n\n" +
  "TYPICAL WORKFLOW: estimate_job first to check cost, then create_job, then job_status to monitor progress.",
inputSchema: {
  type: "object" as const,
  properties: {
    base_model: {
      type: "string",
      description:
        "HuggingFace model ID to fine-tune (e.g. 'Qwen/Qwen2.5-Coder-7B-Instruct'). " +
        "Required unless base_user_model_id is provided. Use list_supported_models to see all options.",
    },
    base_user_model_id: {
      type: "string",
      description:
        "ID of a previously trained model to fine-tune further (iterative training). " +
        "The base model is resolved automatically. Use list_models to find IDs.",
    },
    output_name: {
      type: "string",
      description: "Name for the resulting fine-tuned model (e.g. 'my-project-cody-7b')",
    },
    repo_url: {
      type: "string",
      description: "GitHub repository URL to train on (e.g. 'https://github.com/org/repo')",
    },
    branch: {
      type: "string",
      description: "Git branch to use (default: main)",
    },
    num_epochs: {
      type: "number",
      description: "Number of training epochs (more = better quality but higher cost)",
    },
    max_examples: {
      type: "number",
      description: "Maximum training examples to extract from the repo (minimum: 2)",
    },
    agent: {
      type: "string",
      enum: ["code_repo", "sera_code_repo"],
      description:
        "Training agent to use. 'code_repo' (Cody) = QLoRA-based fine-tuning for code autocomplete and inline suggestions. " +
        "'sera_code_repo' (SIERA) = bug-fix specialist using AllenAI's Open Coding Agents approach. " +
        "Default: 'code_repo'.",
    },
    quality_tier: {
      type: "string",
      enum: ["low", "high"],
      description:
        "Quality tier (SIERA agent only). 'low' = faster, fewer synthetic pairs. " +
        "'high' = deeper analysis, more training data, better results. Default: 'low'.",
    },
    s3_output_bucket: {
      type: "string",
      description:
        "S3 bucket to export the trained model to. If omitted, model is stored in Tuning Engines cloud storage.",
    },
    s3_access_key_id: {
      type: "string",
      description: "AWS access key ID for S3 export",
    },
    s3_secret_access_key: {
      type: "string",
      description: "AWS secret access key for S3 export",
    },
    s3_region: {
      type: "string",
      description: "AWS region for S3 export (e.g. us-east-1)",
    },
  },
  required: ["output_name", "repo_url"],
},
```
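The `createJob` handler delegates to a generic `request` method whose implementation is not shown in this reference. A minimal sketch of what such a helper might look like appears below; the base URL, the `Authorization` header name, and the error shape are all assumptions, not the documented API. Request construction is split into its own method so it can be inspected without touching the network.

```typescript
// Hypothetical sketch of the generic request helper that createJob calls.
// Base URL, auth header, and error handling are assumptions for illustration.
class TuningEnginesClient {
  constructor(private baseUrl: string, private apiKey: string) {}

  // Build the URL and fetch options for a call; separated from request()
  // so the construction logic is testable without network access.
  buildRequest(method: string, path: string, body?: unknown) {
    return {
      url: this.baseUrl + path,
      init: {
        method,
        headers: {
          "Content-Type": "application/json",
          // Bearer-token auth is an assumption, not the documented scheme.
          Authorization: `Bearer ${this.apiKey}`,
        },
        body: body === undefined ? undefined : JSON.stringify(body),
      },
    };
  }

  async request(method: string, path: string, body?: unknown): Promise<any> {
    const { url, init } = this.buildRequest(method, path, body);
    const res = await fetch(url, init);
    if (!res.ok) {
      throw new Error(`Tuning Engines API error: ${res.status}`);
    }
    return res.json();
  }
}
```

With this shape, `createJob` reduces to `this.request("POST", "/api/v1/jobs", params)`, as shown in the handler above.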