youtube-mcp

upload_caption

Add caption tracks to YouTube videos from SRT or WebVTT content. Set the language and track name, and optionally mark the track as a draft for review.

Instructions

Upload a caption track (SRT or WebVTT) to a video. Creates a new track — use a distinct name per language/track, or is_draft=true while iterating.
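The tool takes caption content as a plain string. As a minimal sketch, assuming cues are held in memory as millisecond ranges, the hypothetical helpers `srtTimestamp` and `cuesToSrt` below (not part of youtube-mcp) produce SubRip text in the usual layout: a 1-based index, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing line, the cue text, and a blank line between cues.

```typescript
// Hypothetical helper (not part of youtube-mcp): build SubRip (SRT) text
// suitable for the `caption_text` argument from in-memory cues.
interface Cue {
  startMs: number; // cue start, in milliseconds
  endMs: number; // cue end, in milliseconds
  text: string; // may contain "\n" for multi-line cues
}

// Format milliseconds as an SRT timestamp: HH:MM:SS,mmm (comma before millis).
function srtTimestamp(totalMs: number): string {
  const ms = totalMs % 1000;
  const totalSeconds = Math.floor(totalMs / 1000);
  const s = totalSeconds % 60;
  const m = Math.floor(totalSeconds / 60) % 60;
  const h = Math.floor(totalSeconds / 3600);
  const pad = (n: number, width: number) => String(n).padStart(width, "0");
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)},${pad(ms, 3)}`;
}

// Each SRT block is: index, timing line, text; blocks are separated by a blank line.
function cuesToSrt(cues: Cue[]): string {
  return cues
    .map(
      (cue, i) =>
        `${i + 1}\n${srtTimestamp(cue.startMs)} --> ${srtTimestamp(cue.endMs)}\n${cue.text}\n`,
    )
    .join("\n");
}
```

The returned string can be passed directly as `caption_text`, leaving `format` at its `srt` default.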

Input Schema


Name	Required	Description	Default
`video_id`	Yes	Video ID the caption belongs to.
`language`	Yes	BCP-47 language code, e.g. 'en', 'en-US', 'es', 'ja'. Must match a language the video supports.
`name`	No	Caption track name shown in the player's caption menu. Empty string for the default track.	""
`caption_text`	Yes	Caption content as a string (SRT or WebVTT format). Source this from a file or the model's output.
`format`	No	Content type of caption_text: 'srt' (SubRip, application/x-subrip) or 'vtt' (WebVTT, text/vtt).	srt
`is_draft`	No	Draft captions aren't visible to viewers. Useful while reviewing auto-translations.	false
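A client-side sketch of an argument object for this tool, assuming the caption file's extension decides `format`. `buildUploadCaptionArgs` and `formatFromFileName` are hypothetical names; the defaults they apply (`name` = `""`, `format` = `"srt"`, `is_draft` = `false`) mirror the Zod schema in the implementation reference.

```typescript
// Hypothetical client-side helper: assemble upload_caption arguments.
type CaptionFormat = "srt" | "vtt";

interface UploadCaptionArgs {
  video_id: string;
  language: string;
  name: string;
  caption_text: string;
  format: CaptionFormat;
  is_draft: boolean;
}

// Assumption: choose the format from the file extension; anything that is
// not ".vtt" is treated as SubRip, matching the tool's "srt" default.
function formatFromFileName(fileName: string): CaptionFormat {
  return fileName.toLowerCase().endsWith(".vtt") ? "vtt" : "srt";
}

function buildUploadCaptionArgs(opts: {
  videoId: string;
  language: string; // BCP-47 code, e.g. "en", "es", "ja"
  fileName: string;
  captionText: string;
  name?: string;
  isDraft?: boolean;
}): UploadCaptionArgs {
  return {
    video_id: opts.videoId,
    language: opts.language,
    name: opts.name ?? "", // schema default: empty string (default track)
    caption_text: opts.captionText,
    format: formatFromFileName(opts.fileName),
    is_draft: opts.isDraft ?? false, // schema default: published, not draft
  };
}
```

In Node.js, `captionText` would typically come from `fs.readFileSync(path, "utf-8")`.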

Implementation Reference

src/tools/captions.ts:83-116 (handler)

The handler function for the 'upload_caption' tool. It encodes the caption text as UTF-8 bytes, maps `format` to a MIME type (application/x-subrip for SRT, text/vtt for WebVTT), calls client.insertCaption, and returns a formatted success message.

```typescript
async (args) => {
  const contentType =
    args.format === "vtt" ? "text/vtt" : "application/x-subrip";
  const bytes = new Uint8Array(Buffer.from(args.caption_text, "utf-8"));
  const result = (await client.insertCaption({
    videoId: args.video_id,
    language: args.language,
    name: args.name,
    isDraft: args.is_draft,
    body: bytes,
    captionContentType: contentType,
  })) as {
    id?: string;
    snippet?: { status?: string };
  };
  return {
    content: [
      {
        type: "text" as const,
        text: [
          `Uploaded caption track: ${result.id ?? "(unknown id)"}`,
          `  video: ${args.video_id}`,
          `  language: ${args.language}`,
          `  name: "${args.name}"`,
          `  format: ${args.format}`,
          `  status: ${result.snippet?.status ?? "?"}`,
          args.is_draft ? "  (draft — not visible to viewers)" : "",
        ]
          .filter(Boolean)
          .join("\n"),
      },
    ],
  };
},
```
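The handler's content-type and encoding steps can be isolated from the YouTube client. This sketch assumes `TextEncoder` as a stand-in for `Buffer.from(text, "utf-8")` (both produce UTF-8 bytes); `captionContentType` and `captionBytes` are illustrative names, not functions exported by the server.

```typescript
type CaptionFormat = "srt" | "vtt";

// Same mapping as the handler: "vtt" -> text/vtt, anything else -> SubRip.
function captionContentType(format: CaptionFormat): string {
  return format === "vtt" ? "text/vtt" : "application/x-subrip";
}

// UTF-8 encode the caption text; non-ASCII characters take several bytes.
function captionBytes(captionText: string): Uint8Array {
  return new TextEncoder().encode(captionText);
}
```

Byte length, not character count, is what ends up in the multipart body, which matters for non-ASCII captions such as Japanese or accented Spanish.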

src/tools/captions.ts:6-36 (schema)

The Zod schema for 'upload_caption' inputs: video_id, language, name (default ''), caption_text, format (srt/vtt, default 'srt'), and is_draft (default false).

```typescript
import { z } from "zod";

const uploadCaptionSchema = {
  video_id: z.string().describe("Video ID the caption belongs to."),
  language: z
    .string()
    .describe(
      "BCP-47 language code, e.g. 'en', 'en-US', 'es', 'ja'. Must match a language the video supports.",
    ),
  name: z
    .string()
    .default("")
    .describe(
      "Caption track name shown in the player's caption menu. Empty string for the default track.",
    ),
  caption_text: z
    .string()
    .describe(
      "Caption content as a string (SRT or WebVTT format). Source this from a file or the model's output.",
    ),
  format: z
    .enum(["srt", "vtt"])
    .default("srt")
    .describe(
      "Content type of caption_text: 'srt' (SubRip, application/x-subrip) or 'vtt' (WebVTT, text/vtt).",
    ),
  is_draft: z
    .boolean()
    .default(false)
    .describe(
      "Draft captions aren't visible to viewers. Useful while reviewing auto-translations.",
    ),
};
```
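The schema accepts any string for `caption_text` and does not check that it matches `format`, so a mismatch would be sent with the wrong content type. A hypothetical pre-flight check (not part of youtube-mcp) could look like this; it assumes the WebVTT rule that a file begins with the `WEBVTT` signature (after an optional byte-order mark) and treats at least one comma-style timing line as evidence of SRT.

```typescript
type CaptionFormat = "srt" | "vtt";

// Hypothetical check: does caption_text plausibly match the declared format?
function matchesDeclaredFormat(captionText: string, format: CaptionFormat): boolean {
  // Ignore an optional UTF-8 byte-order mark at the start.
  const text = captionText.replace(/^\uFEFF/, "");
  if (format === "vtt") {
    // WebVTT: "WEBVTT" followed by a space, tab, line break, or end of input.
    return /^WEBVTT(?:[ \t]|\r?\n|$)/.test(text);
  }
  // SRT: at least one timing line with a comma before the milliseconds.
  return /\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}/.test(text);
}
```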

src/tools/captions.ts:79-117 (registration)

The registration call on server.tool(...) that binds the name 'upload_caption' to its description, schema, and handler.

```typescript
server.tool(
  "upload_caption",
  "Upload a caption track (SRT or WebVTT) to a video. Creates a new track — use a distinct `name` per language/track, or `is_draft=true` while iterating.",
  uploadCaptionSchema,
  async (args) => {
    // Handler body: identical to the src/tools/captions.ts:83-116 excerpt above.
  },
);
```
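Once registered, the tool is invoked through MCP's JSON-RPC `tools/call` method with the tool name and an `arguments` object. The sketch below only builds and serializes that message; `uploadCaptionRequest` is a hypothetical helper, and transport and response handling are out of scope. Omitted optional fields such as `name` and `format` should be filled from the Zod defaults when the server parses the arguments.

```typescript
// Shape of an MCP "tools/call" request (JSON-RPC 2.0).
interface ToolsCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper: wrap upload_caption arguments in a tools/call request.
function uploadCaptionRequest(id: number, args: Record<string, unknown>): ToolsCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "upload_caption", arguments: args },
  };
}

const callRequest = uploadCaptionRequest(1, {
  video_id: "VIDEO_ID", // placeholder video ID
  language: "en",
  caption_text: "1\n00:00:01,000 --> 00:00:03,000\nHello\n",
  is_draft: true, // name and format are left to the schema defaults
});
const wireMessage = JSON.stringify(callRequest);
```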

src/server.ts:51 (registration)

Where registerCaptionTools is called from the main server setup, passing the MCP server and YouTube client.
```
registerCaptionTools(s, youtube);
```

src/youtube/client.ts:231-273 (helper)

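The YouTube API helper that performs the multipart upload of the caption track (insertCaption). It builds a multipart/related body containing the JSON snippet metadata followed by the caption content, then POSTs it to the captions upload endpoint with part=snippet and uploadType=multipart.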

```typescript
async insertCaption(params: {
  videoId: string;
  language: string;
  name: string;
  isDraft: boolean;
  body: Uint8Array;
  captionContentType: string;
}): Promise<unknown> {
  const boundary = `youtube-mcp-${Date.now().toString(16)}`;
  const metadata = JSON.stringify({
    snippet: {
      videoId: params.videoId,
      language: params.language,
      name: params.name,
      isDraft: params.isDraft,
    },
  });
  // Part 1: JSON metadata; part 2 header for the raw caption bytes.
  const opening = Buffer.from(
    `--${boundary}\r\nContent-Type: application/json; charset=UTF-8\r\n\r\n${metadata}\r\n--${boundary}\r\nContent-Type: ${params.captionContentType}\r\n\r\n`,
    "utf-8",
  );
  const closing = Buffer.from(`\r\n--${boundary}--\r\n`, "utf-8");
  const body = Buffer.concat([opening, Buffer.from(params.body), closing]);

  // UPLOAD_API is a module-level constant defined elsewhere in client.ts (not shown).
  const url = new URL(`${UPLOAD_API}/captions`);
  url.searchParams.set("part", "snippet");
  url.searchParams.set("uploadType", "multipart");
  const token = await this.ensureAccessToken();
  const res = await fetch(url.toString(), {
    method: "POST",
    headers: {
      Authorization: `Bearer ${token}`,
      "Content-Type": `multipart/related; boundary=${boundary}`,
      "Content-Length": String(body.length),
    },
    body,
  });
  if (!res.ok) {
    throw new Error(
      `YouTube caption insert failed: ${res.status} ${await res.text()}`,
    );
  }
  return res.json();
}
```
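To see the exact bytes `insertCaption` sends, the body construction can be pulled out into a pure function. This is a sketch assuming `TextEncoder` and manual `Uint8Array` concatenation in place of `Buffer.concat`; `buildCaptionMultipart` is a hypothetical name, and the boundary is passed in so the output is deterministic.

```typescript
interface CaptionSnippet {
  videoId: string;
  language: string;
  name: string;
  isDraft: boolean;
}

// Hypothetical pure version of insertCaption's body construction.
function buildCaptionMultipart(
  snippet: CaptionSnippet,
  caption: Uint8Array,
  captionContentType: string,
  boundary: string,
): { contentType: string; body: Uint8Array } {
  const enc = new TextEncoder();
  const metadata = JSON.stringify({ snippet });
  // Part 1 (JSON metadata) plus the header of part 2 (caption bytes).
  const opening = enc.encode(
    `--${boundary}\r\nContent-Type: application/json; charset=UTF-8\r\n\r\n${metadata}\r\n` +
      `--${boundary}\r\nContent-Type: ${captionContentType}\r\n\r\n`,
  );
  // Closing delimiter ends the multipart body.
  const closing = enc.encode(`\r\n--${boundary}--\r\n`);
  const body = new Uint8Array(opening.length + caption.length + closing.length);
  body.set(opening, 0);
  body.set(caption, opening.length);
  body.set(closing, opening.length + caption.length);
  return { contentType: `multipart/related; boundary=${boundary}`, body };
}
```

The boundary must not occur inside the caption text; the original derives it from `Date.now()`, which makes a collision unlikely but does not rule it out.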

Tool Definition Quality

Score: A (3.8/5.0)

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It mentions 'creates a new track', implying mutation, but lacks details on idempotency, conflict behavior (e.g., whether it overwrites or fails when a track with the same name and language already exists), authentication needs, or rate limits. This is a significant gap for a creation tool.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two clear, front-loaded sentences with no redundant information. The first sentence states the primary purpose, and the second adds practical usage tips. Every word earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on the first attempt?

Given the sibling tools for deletion and listing, the description is fairly complete for a creation tool: it covers formats and draft usage. However, it says nothing about error handling, file size limits, or post-creation behavior.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so every parameter has a description. The tool description adds value beyond the schema with usage context for `name` and `is_draft`, but for the other parameters it only repeats schema information. The baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states that the tool uploads a caption track (SRT or WebVTT) to a video and that it creates a new track. With a specific verb and resource, this distinguishes it from siblings such as list_captions (listing) and delete_caption (deletion).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description gives actionable guidance: use a distinct `name` per language/track, or `is_draft=true` while iterating. However, it does not say when not to use the tool (e.g., that to update an existing track, an agent should consider deleting it first with the sibling delete_caption tool), so some exclusions are missing.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

```shell
curl -X GET 'https://glama.ai/api/mcp/v1/servers/miller-joe/youtube-mcp'
```
