Skip to main content
Glama

transcribe_video

Idempotent

Transcribe YouTube videos that lack captions. Submit a video URL to start an async AI Whisper transcription; get a task ID to check status and retrieve the result.

Instructions

Start an asynchronous AI ASR (Whisper) transcription of a YouTube video. Returns immediately with a task_id and estimated_wait_seconds; the actual transcription runs in the background. Poll status with get_asr_task. Use this when fetch_transcript returned NO_CAPTIONS or when the video has no captions. Costs 5 credits, debited only on successful completion.

Input Schema

TableJSON Schema
NameRequiredDescriptionDefault
video_urlYesYouTube URL (watch, youtu.be, shorts, or embed form). Full URL preferred.
langNoOptional language hint (ISO 639-1, e.g. 'en', 'zh'). Omit to auto-detect.

Implementation Reference

  • Schema definition for the transcribe_video tool, declaring input parameters (video_url required, lang optional) and metadata. This is a proxy server so the schema is declared inline.
    {
      name: "transcribe_video",
      description:
        "Start an asynchronous AI ASR (Whisper) transcription of a YouTube video. Returns immediately with a task_id and estimated_wait_seconds; the actual transcription runs in the background. Poll status with get_asr_task. Use this when fetch_transcript returned NO_CAPTIONS or when the video has no captions. Costs 5 credits, debited only on successful completion.",
      annotations: { title: "Transcribe YouTube Video (Async)", ...ANN.ASR_START },
      inputSchema: {
        type: "object",
        properties: {
          video_url: {
            type: "string",
            description:
              "YouTube URL (watch, youtu.be, shorts, or embed form). Full URL preferred.",
            minLength: 5,
          },
          lang: {
            type: "string",
            description:
              "Optional language hint (ISO 639-1, e.g. 'en', 'zh'). Omit to auto-detect.",
          },
        },
        required: ["video_url"],
      },
    },
  • src/index.js:73-406 (registration)
    The TOOLS array (lines 73-406) that registers transcribe_video along with all other tools. The array is returned by the ListToolsRequestSchema handler on line 448.
    const TOOLS = [
      {
        name: "search_youtube",
        description:
          "Search YouTube globally for videos, channels, or playlists on any topic. Returns up to 50 results with metadata. Use this for topic-based discovery when the user has not specified a channel — for searching within a known channel use search_channel_videos instead.",
        annotations: { title: "Search YouTube", ...ANN.YT_READ },
        inputSchema: {
          type: "object",
          properties: {
            q: {
              type: "string",
              description:
                "Search query (same syntax as YouTube's search bar, e.g. 'rust async tutorial', 'lex fridman dario amodei').",
              minLength: 1,
            },
            type: {
              type: "string",
              description:
                "Search type: 'video', 'channel', or 'playlist'. Default: 'video'.",
              enum: ["video", "channel", "playlist"],
            },
            limit: {
              type: "number",
              description: "Max results (1-50, default 20).",
              minimum: 1,
              maximum: 50,
            },
          },
          required: ["q"],
        },
      },
      {
        name: "fetch_video_info",
        description:
          "Fetch consolidated YouTube video metadata with numeric types — title, channel, duration, view count, publish date, thumbnail, description, captions availability. Does NOT include the transcript itself; call fetch_transcript or transcribe_video for that. Cheap, fast, free.",
        annotations: { title: "Fetch YouTube Video Info", ...ANN.YT_READ },
        inputSchema: {
          type: "object",
          properties: {
            video_id: {
              type: "string",
              description:
                "11-char YouTube video ID (e.g. 'dQw4w9WgXcQ') or full URL (watch, youtu.be, shorts, embed, live).",
              minLength: 5,
            },
          },
          required: ["video_id"],
        },
      },
      {
        name: "fetch_transcript",
        description:
          "Fetch the existing official transcript (subtitles/captions) of a YouTube video, with per-segment timestamps and language detected. Errors with NO_CAPTIONS if the video has no captions — fall back to transcribe_video in that case to generate one with AI ASR. This call is free.",
        annotations: { title: "Fetch YouTube Video Transcript", ...ANN.YT_READ },
        inputSchema: {
          type: "object",
          properties: {
            video_id: {
              type: "string",
              description: "YouTube video ID (e.g. 'dQw4w9WgXcQ') or full YouTube URL.",
              minLength: 5,
            },
            lang: {
              type: "string",
              description:
                "ISO 639-1 language code to select among multilingual captions (e.g. 'en', 'zh', 'ja'). Omit for the video's default language.",
            },
            save: {
              type: "boolean",
              description:
                "When true, also save the video to the user's Library in the same call. Bookmarks the meta row and flips has_asr when the transcript was produced by our ASR. Does NOT upload a summary — use save_to_library with kind='summary' or kind='both' for that.",
            },
          },
          required: ["video_id"],
        },
      },
      {
        name: "transcribe_video",
        description:
          "Start an asynchronous AI ASR (Whisper) transcription of a YouTube video. Returns immediately with a task_id and estimated_wait_seconds; the actual transcription runs in the background. Poll status with get_asr_task. Use this when fetch_transcript returned NO_CAPTIONS or when the video has no captions. Costs 5 credits, debited only on successful completion.",
        annotations: { title: "Transcribe YouTube Video (Async)", ...ANN.ASR_START },
        inputSchema: {
          type: "object",
          properties: {
            video_url: {
              type: "string",
              description:
                "YouTube URL (watch, youtu.be, shorts, or embed form). Full URL preferred.",
              minLength: 5,
            },
            lang: {
              type: "string",
              description:
                "Optional language hint (ISO 639-1, e.g. 'en', 'zh'). Omit to auto-detect.",
            },
          },
          required: ["video_url"],
        },
      },
      {
        name: "get_asr_task",
        description:
          "Poll the status of an ASR task created by transcribe_video. Returns one of `queued`, `downloading`, `transcribing`, `finalizing`, `done`, or `failed`. When status is `done`, includes the full transcript with timestamps. Recommended polling interval: 3-5 seconds. Free — does not consume credits.",
        annotations: { title: "Get ASR Task Status", ...ANN.ASR_POLL },
        inputSchema: {
          type: "object",
          properties: {
            task_id: {
              type: "string",
              description: "Task ID returned by transcribe_video.",
              minLength: 1,
            },
          },
          required: ["task_id"],
        },
      },
      {
        name: "resolve_channel",
        description:
          "Resolve a YouTube @handle, channel URL, video URL, or raw channel ID into canonical channel info (channel ID, name, handle, subscriber count, video count, avatar). Call this first when you only have a handle or URL but need a channel ID for the other channel-scoped tools.",
        annotations: { title: "Resolve YouTube Channel", ...ANN.YT_READ },
        inputSchema: {
          type: "object",
          properties: {
            input: {
              type: "string",
              description:
                "@handle (e.g. '@MrBeast'), channel URL, video URL, or UC... channel ID. All common forms are accepted.",
              minLength: 1,
            },
          },
          required: ["input"],
        },
      },
      {
        name: "list_channel_videos",
        description:
          "List all videos from a YouTube channel ordered by publish date (newest first), with pagination. Returns up to 30 per page plus a `continuation` token if more results exist. For just the most recent handful, prefer get_channel_latest_videos for simplicity.",
        annotations: { title: "List Channel Videos", ...ANN.YT_READ },
        inputSchema: {
          type: "object",
          properties: {
            channel: {
              type: "string",
              description:
                "@handle, channel URL, or UC... channel ID. Required for the first page; omit on subsequent pages and pass `continuation` instead.",
            },
            continuation: {
              type: "string",
              description:
                "Pagination token from a previous response's `continuation` field. Omit for the first page.",
            },
          },
        },
      },
      {
        name: "get_channel_latest_videos",
        description:
          "Get the most recent videos from a YouTube channel — convenience wrapper over list_channel_videos with no pagination. Best for 'what did this creator publish recently?' style queries.",
        annotations: { title: "Get Channel Latest Videos", ...ANN.YT_READ },
        inputSchema: {
          type: "object",
          properties: {
            channel: {
              type: "string",
              description: "@handle (e.g. '@mkbhd'), channel URL, or UC... channel ID.",
              minLength: 1,
            },
          },
          required: ["channel"],
        },
      },
      {
        name: "search_channel_videos",
        description:
          "Search for specific videos within a single YouTube channel. Restricts results to the given channel. Use after resolve_channel if starting from a handle. Useful for 'find Karpathy's video about backpropagation' style queries.",
        annotations: { title: "Search Channel Videos", ...ANN.YT_READ },
        inputSchema: {
          type: "object",
          properties: {
            channel: {
              type: "string",
              description: "@handle, channel URL, or UC... channel ID.",
              minLength: 1,
            },
            q: {
              type: "string",
              description: "Search query (matched against video title and description within the channel).",
              minLength: 1,
            },
            limit: {
              type: "number",
              description: "Max results (1-50, default 30).",
              minimum: 1,
              maximum: 50,
            },
          },
          required: ["channel", "q"],
        },
      },
      {
        name: "list_playlist_videos",
        description:
          "List videos in a YouTube playlist in order, with pagination. Returns video metadata and position within the playlist. Works for any public or unlisted playlist exposed by its URL/ID.",
        annotations: { title: "List Playlist Videos", ...ANN.YT_READ },
        inputSchema: {
          type: "object",
          properties: {
            playlist: {
              type: "string",
              description:
                "Playlist URL or ID (typically starts with 'PL', 'UU', 'LL', or 'FL'). Required for the first page.",
            },
            continuation: {
              type: "string",
              description: "Pagination token from a previous response's `continuation` field. Omit for the first page.",
            },
          },
        },
      },
      {
        name: "save_to_library",
        description:
          "Save a video to the authenticated user's Library. Three modes via `kind`: 'asr' bookmarks the video and flips has_asr (use after a successful transcribe_video → fetch_transcript flow); 'summary' uploads a summary blob; 'both' does both at once. Idempotent: saving the same video twice updates the existing entry.",
        annotations: { title: "Save to Library", ...ANN.LIB_WRITE },
        inputSchema: {
          type: "object",
          properties: {
            video_id: {
              type: "string",
              description: "YouTube video ID (11 chars).",
              minLength: 5,
            },
            kind: {
              type: "string",
              description:
                "'asr' (bookmark + flip has_asr), 'summary' (upload summary text), or 'both'.",
              enum: ["asr", "summary", "both"],
            },
            title: {
              type: "string",
              description: "Video title (for display in the user's Library list).",
            },
            author: {
              type: "string",
              description: "Channel / author name.",
            },
            thumbnail: {
              type: "string",
              description: "Thumbnail URL.",
            },
            video_url: {
              type: "string",
              description: "Full YouTube URL.",
            },
            language: {
              type: "string",
              description: "Video language code (ISO 639-1).",
            },
            text: {
              type: "string",
              description:
                "Summary text. REQUIRED when kind='summary' or kind='both'. Plain text or markdown — use the `format` param to declare which.",
            },
            locale: {
              type: "string",
              description:
                "Summary locale (e.g. 'en', 'zh'). Used with kind='summary' or kind='both'.",
            },
            format: {
              type: "string",
              description:
                "Summary format: 'markdown' (default) or 'text'. Use 'markdown' if your text contains **bold**, bullets, headings, or code fences so the web UI renders it; use 'text' for plain prose.",
              enum: ["markdown", "text"],
            },
            model: {
              type: "string",
              description: "Optional model identifier, e.g. 'claude-opus-4'.",
            },
          },
          required: ["video_id", "kind"],
        },
      },
      {
        name: "list_library",
        description:
          "List videos the user has saved to their Library (transcripts + summaries). Supports substring search on title/author, favorites filter, and pagination. Returns recently saved items first. Scoped to the calling user's data only.",
        annotations: { title: "List Saved Library", ...ANN.LIB_READ },
        inputSchema: {
          type: "object",
          properties: {
            favorite: {
              type: "boolean",
              description: "When true, return only items the user has favorited.",
            },
            q: {
              type: "string",
              description: "Substring match on title and author.",
            },
            limit: {
              type: "number",
              description: "Max items per page (1-100, default 20).",
              minimum: 1,
              maximum: 100,
            },
            offset: {
              type: "number",
              description: "Pagination offset (number of items to skip).",
              minimum: 0,
            },
          },
        },
      },
      {
        name: "get_library_item",
        description:
          "Read a saved Library item with its transcript and AI summary inline (when available). Use after list_library to fetch the full content the user saved. Free.",
        annotations: { title: "Get Library Item", ...ANN.LIB_READ },
        inputSchema: {
          type: "object",
          properties: {
            id: {
              type: "number",
              description: "Library item id (returned by list_library or save_to_library).",
            },
            locale: {
              type: "string",
              description: "Summary locale to fetch (e.g. 'en', 'zh'). Defaults to 'en'.",
            },
          },
          required: ["id"],
        },
      },
    ];
  • Generic handler for all tool calls including transcribe_video. It forwards the tool name and arguments to the upstream SubDownload MCP server via callUpstream.
    server.setRequestHandler(CallToolRequestSchema, async (request) => {
      try {
        return await callUpstream(
          request.params.name,
          request.params.arguments || {}
        );
      } catch (err) {
        return {
          content: [{ type: "text", text: err.message || String(err) }],
          isError: true,
        };
      }
    });
  • Helper function callUpstream that forwards all tool calls (including transcribe_video) to the upstream SubDownload MCP API via HTTP POST with authentication.
    async function callUpstream(name, args) {
      if (!API_KEY) {
        throw new Error(
          "SUBDOWNLOAD_API_KEY env var is not set. Get one at https://subdownload.com/account, then run with -e SUBDOWNLOAD_API_KEY=<your-key>."
        );
      }
      const res = await fetch(UPSTREAM_URL, {
        method: "POST",
        headers: {
          "Content-Type": "application/json",
          Accept: "application/json, text/event-stream",
          Authorization: `Bearer ${API_KEY}`,
        },
        body: JSON.stringify({
          jsonrpc: "2.0",
          id: Date.now(),
          method: "tools/call",
          params: { name, arguments: args },
        }),
      });
      const text = await res.text();
      let body;
      try {
        body = JSON.parse(text);
      } catch {
        throw new Error(
          `Upstream returned non-JSON response (HTTP ${res.status}): ${text.slice(0, 200)}`
        );
      }
      if (body.error) {
        throw new Error(body.error.message || JSON.stringify(body.error));
      }
      return body.result;
    }
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Adds async behavior, immediate return with task_id and estimated_wait_seconds, polling with get_asr_task, cost of 5 credits debited only on success. Annotations already indicate idempotent and non-destructive, but description complements with actionable details.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four sentences, front-loaded with purpose, then async behavior, usage context, and cost. No redundant words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given low parameter count, no output schema, and rich annotations, the description covers return values, polling, and cost, making it complete for the tool's complexity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema already covers both parameters with descriptions. Description adds context (URL forms for video_url, ISO 639-1 for lang) but does not significantly exceed schema coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states it starts an async AI ASR transcription of a YouTube video, with specific verb and resource. Distinguishes from siblings like fetch_transcript and get_asr_task.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly says when to use (when fetch_transcript returns NO_CAPTIONS or video has no captions). Implies not to use if captions exist, but does not explicitly state when not to use.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

Install Server

Other Tools

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/SubDownload/subdownload-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server