upload_caption
Upload SRT or WebVTT caption tracks to YouTube videos. Set language, track name, and format. Use draft mode to iterate without exposing captions to viewers.
Instructions
Upload a caption track (SRT or WebVTT) to a video. Each call creates a new track, so use a distinct `name` per language/track, or set `is_draft=true` while iterating.
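For reference, SRT text is a sequence of numbered cues, each with a `HH:MM:SS,mmm --> HH:MM:SS,mmm` time range, the cue text, and a blank line before the next cue. The sketch below builds such a string from cue objects. The `Cue` type and the `srtTimestamp` and `toSrt` helpers are hypothetical names used only for illustration; they are not part of this server.

```typescript
// Hypothetical cue shape for illustration; times are in milliseconds.
interface Cue {
  startMs: number;
  endMs: number;
  text: string;
}

// Format a millisecond offset as an SRT timestamp: HH:MM:SS,mmm
function srtTimestamp(ms: number): string {
  const pad = (n: number, width: number): string => String(n).padStart(width, "0");
  const hours = Math.floor(ms / 3600000);
  const minutes = Math.floor((ms % 3600000) / 60000);
  const seconds = Math.floor((ms % 60000) / 1000);
  const millis = ms % 1000;
  return `${pad(hours, 2)}:${pad(minutes, 2)}:${pad(seconds, 2)},${pad(millis, 3)}`;
}

// Join cues into SRT text: 1-based index, time range, cue text,
// with a blank line separating consecutive cues.
function toSrt(cues: Cue[]): string {
  return cues
    .map(
      (c, i) =>
        `${i + 1}\n${srtTimestamp(c.startMs)} --> ${srtTimestamp(c.endMs)}\n${c.text}\n`,
    )
    .join("\n");
}

const captionText = toSrt([
  { startMs: 0, endMs: 1500, text: "Hello." },
  { startMs: 1500, endMs: 4000, text: "Welcome to the channel." },
]);
```

A string like `captionText` could then be passed as `caption_text` with `format` left at `srt`.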
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| video_id | Yes | Video ID the caption belongs to. | |
| language | Yes | BCP-47 language code, e.g. 'en', 'en-US', 'es', 'ja'. Must match a language the video supports. | |
| name | No | Caption track name shown in the player's caption menu. Empty string for the default track. | "" |
| caption_text | Yes | Caption content as a string (SRT or WebVTT format). Source this from a file or the model's output. | |
| format | No | Content type of caption_text: 'srt' (SubRip, application/x-subrip) or 'vtt' (WebVTT, text/vtt). | srt |
| is_draft | No | Draft captions aren't visible to viewers. Useful while reviewing auto-translations. | false |
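The effect of the optional parameters can be sketched as a plain function that fills in whatever the caller omitted. This only illustrates the defaults declared by the schema (`name` empty, `format` `srt`, `is_draft` false); `UploadCaptionInput`, `UploadCaptionArgs`, and `withDefaults` are hypothetical names, and the server itself relies on its Zod schema rather than code like this.

```typescript
// Hypothetical types mirroring the input table; not part of the server source.
interface UploadCaptionInput {
  video_id: string;
  language: string;
  caption_text: string;
  name?: string;
  format?: "srt" | "vtt";
  is_draft?: boolean;
}

type UploadCaptionArgs = Required<UploadCaptionInput>;

// Fill in the schema's defaults for any optional field the caller left out.
function withDefaults(input: UploadCaptionInput): UploadCaptionArgs {
  return {
    ...input,
    name: input.name ?? "",
    format: input.format ?? "srt",
    is_draft: input.is_draft ?? false,
  };
}

const args = withDefaults({
  video_id: "VIDEO_ID", // placeholder, not a real video ID
  language: "en",
  caption_text: "1\n00:00:00,000 --> 00:00:01,500\nHello.\n",
});
```

With only the three required fields supplied, `args` describes a published SRT upload to the default (unnamed) track.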
Implementation Reference
- src/tools/captions.ts:79-117 (handler) The 'upload_caption' tool handler: registers an MCP tool that accepts video_id, language, name, caption_text, format, and is_draft. It converts the caption text to a Uint8Array and calls client.insertCaption, then returns a result summary.

```typescript
server.tool(
  "upload_caption",
  "Upload a caption track (SRT or WebVTT) to a video. Creates a new track — use a distinct `name` per language/track, or `is_draft=true` while iterating.",
  uploadCaptionSchema,
  async (args) => {
    // Map the format argument to the MIME type sent with the caption part.
    const contentType =
      args.format === "vtt" ? "text/vtt" : "application/x-subrip";
    // Encode the caption text as UTF-8 bytes for the upload body.
    const bytes = new Uint8Array(Buffer.from(args.caption_text, "utf-8"));
    const result = (await client.insertCaption({
      videoId: args.video_id,
      language: args.language,
      name: args.name,
      isDraft: args.is_draft,
      body: bytes,
      captionContentType: contentType,
    })) as {
      id?: string;
      snippet?: { status?: string };
    };
    // Summarize the upload; the draft note is dropped by filter(Boolean) when empty.
    return {
      content: [
        {
          type: "text" as const,
          text: [
            `Uploaded caption track: ${result.id ?? "(unknown id)"}`,
            ` video: ${args.video_id}`,
            ` language: ${args.language}`,
            ` name: "${args.name}"`,
            ` format: ${args.format}`,
            ` status: ${result.snippet?.status ?? "?"}`,
            args.is_draft ? " (draft — not visible to viewers)" : "",
          ]
            .filter(Boolean)
            .join("\n"),
        },
      ],
    };
  },
);
```

- src/tools/captions.ts:6-36 (schema) Zod schema for upload_caption parameters: video_id, language, name (default ''), caption_text (SRT/VTT string), format (srt|vtt, default srt), is_draft (boolean, default false).

```typescript
const uploadCaptionSchema = {
  video_id: z.string().describe("Video ID the caption belongs to."),
  language: z
    .string()
    .describe(
      "BCP-47 language code, e.g. 'en', 'en-US', 'es', 'ja'. Must match a language the video supports.",
    ),
  name: z
    .string()
    .default("")
    .describe(
      "Caption track name shown in the player's caption menu. Empty string for the default track.",
    ),
  caption_text: z
    .string()
    .describe(
      "Caption content as a string (SRT or WebVTT format). Source this from a file or the model's output.",
    ),
  format: z
    .enum(["srt", "vtt"])
    .default("srt")
    .describe(
      "Content type of caption_text: 'srt' (SubRip, application/x-subrip) or 'vtt' (WebVTT, text/vtt).",
    ),
  is_draft: z
    .boolean()
    .default(false)
    .describe(
      "Draft captions aren't visible to viewers. Useful while reviewing auto-translations.",
    ),
};
```

- src/server.ts:51-51 (registration) Registration of registerCaptionTools on the MCP server, which registers the 'upload_caption' tool (among others).

```typescript
registerCaptionTools(s, youtube);
```

- src/tools/captions.ts:46-135 (registration) The registerCaptionTools export function that registers the 'upload_caption' (lines 79-117), 'list_captions', and 'delete_caption' tools on the MCP server.

```typescript
export function registerCaptionTools(
  server: McpServer,
  client: YouTubeClient,
): void {
  server.tool(
    "list_captions",
    "List caption tracks on a video with their language, name, status, and whether they are drafts.",
    listCaptionsSchema,
    async (args) => {
      const res = await client.listCaptions(args.video_id);
      if (res.items.length === 0) {
        return {
          content: [
            {
              type: "text" as const,
              text: `Video ${args.video_id} has no caption tracks.`,
            },
          ],
        };
      }
      const lines = [
        `Found ${res.items.length} caption track(s):`,
        ...res.items.map((c) => {
          const s = c.snippet ?? {};
          const draft = s.isDraft ? " [draft]" : "";
          const kind = s.trackKind ? ` (${s.trackKind})` : "";
          return ` ${c.id} — ${s.language ?? "?"} "${s.name ?? ""}"${kind} [${s.status ?? "?"}]${draft}`;
        }),
      ];
      return { content: [{ type: "text" as const, text: lines.join("\n") }] };
    },
  );

  server.tool(
    "upload_caption",
    "Upload a caption track (SRT or WebVTT) to a video. Creates a new track — use a distinct `name` per language/track, or `is_draft=true` while iterating.",
    uploadCaptionSchema,
    async (args) => {
      const contentType =
        args.format === "vtt" ? "text/vtt" : "application/x-subrip";
      const bytes = new Uint8Array(Buffer.from(args.caption_text, "utf-8"));
      const result = (await client.insertCaption({
        videoId: args.video_id,
        language: args.language,
        name: args.name,
        isDraft: args.is_draft,
        body: bytes,
        captionContentType: contentType,
      })) as {
        id?: string;
        snippet?: { status?: string };
      };
      return {
        content: [
          {
            type: "text" as const,
            text: [
              `Uploaded caption track: ${result.id ?? "(unknown id)"}`,
              ` video: ${args.video_id}`,
              ` language: ${args.language}`,
              ` name: "${args.name}"`,
              ` format: ${args.format}`,
              ` status: ${result.snippet?.status ?? "?"}`,
              args.is_draft ? " (draft — not visible to viewers)" : "",
            ]
              .filter(Boolean)
              .join("\n"),
          },
        ],
      };
    },
  );

  server.tool(
    "delete_caption",
    "Delete a caption track by ID. Use list_captions to find the track ID first.",
    deleteCaptionSchema,
    async (args) => {
      await client.deleteCaption(args.caption_id);
      return {
        content: [
          {
            type: "text" as const,
            text: `Deleted caption track ${args.caption_id}.`,
          },
        ],
      };
    },
  );
}
```

- src/youtube/client.ts:230-274 (helper) YouTubeClient.insertCaption helper: builds a multipart/related request with JSON metadata and caption body bytes, POSTs to the YouTube API's captions endpoint.

```typescript
/** Upload a caption track for a video. Body is typically SRT or WebVTT text. */
async insertCaption(params: {
  videoId: string;
  language: string;
  name: string;
  isDraft: boolean;
  body: Uint8Array;
  captionContentType: string;
}): Promise<unknown> {
  const boundary = `youtube-mcp-${Date.now().toString(16)}`;
  // Part 1 of the multipart body: the caption snippet metadata as JSON.
  const metadata = JSON.stringify({
    snippet: {
      videoId: params.videoId,
      language: params.language,
      name: params.name,
      isDraft: params.isDraft,
    },
  });
  // Metadata part plus the headers of part 2; the caption bytes follow.
  const opening = Buffer.from(
    `--${boundary}\r\nContent-Type: application/json; charset=UTF-8\r\n\r\n${metadata}\r\n--${boundary}\r\nContent-Type: ${params.captionContentType}\r\n\r\n`,
    "utf-8",
  );
  const closing = Buffer.from(`\r\n--${boundary}--\r\n`, "utf-8");
  const body = Buffer.concat([opening, Buffer.from(params.body), closing]);

  const url = new URL(`${UPLOAD_API}/captions`);
  url.searchParams.set("part", "snippet");
  url.searchParams.set("uploadType", "multipart");

  const token = await this.ensureAccessToken();
  const res = await fetch(url.toString(), {
    method: "POST",
    headers: {
      Authorization: `Bearer ${token}`,
      "Content-Type": `multipart/related; boundary=${boundary}`,
      "Content-Length": String(body.length),
    },
    body,
  });
  // Surface the status code and response text when the API rejects the upload.
  if (!res.ok) {
    throw new Error(
      `YouTube caption insert failed: ${res.status} ${await res.text()}`,
    );
  }
  return res.json();
}
```
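
The body layout that `insertCaption` sends can be sketched as a pure function, which makes the multipart/related framing easier to inspect without making a network call. This sketch assumes a runtime that provides the global `TextEncoder` (Node.js or a browser). `buildMultipartBody` is a hypothetical helper, and the boundary, video ID, and caption text in the usage are placeholders.

```typescript
// Hypothetical helper: frame JSON metadata and caption bytes as multipart/related.
function buildMultipartBody(
  boundary: string,
  metadataJson: string,
  captionContentType: string,
  caption: Uint8Array,
): Uint8Array {
  const encoder = new TextEncoder();
  // Part 1 carries the JSON metadata; the headers of part 2 precede the raw caption bytes.
  const opening = encoder.encode(
    `--${boundary}\r\nContent-Type: application/json; charset=UTF-8\r\n\r\n${metadataJson}` +
      `\r\n--${boundary}\r\nContent-Type: ${captionContentType}\r\n\r\n`,
  );
  // The closing delimiter is the boundary followed by "--".
  const closing = encoder.encode(`\r\n--${boundary}--\r\n`);

  // Concatenate opening + caption + closing into one byte array.
  const body = new Uint8Array(opening.length + caption.length + closing.length);
  body.set(opening, 0);
  body.set(caption, opening.length);
  body.set(closing, opening.length + caption.length);
  return body;
}

const caption = new TextEncoder().encode(
  "WEBVTT\n\n00:00.000 --> 00:01.500\nHello.\n",
);
const body = buildMultipartBody(
  "example-boundary", // placeholder; the helper above derives one from Date.now()
  JSON.stringify({
    snippet: { videoId: "VIDEO_ID", language: "en", name: "", isDraft: true },
  }),
  "text/vtt",
  caption,
);
```

Keeping the caption as raw bytes, rather than interpolating it into the header string, means the payload passes through unchanged between the part headers and the closing delimiter.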