Skip to main content
Glama

transcribe_video

Idempotent

Transcribe YouTube videos without captions using AI speech recognition. Starts background processing and returns a task ID to check progress.

Instructions

Start an asynchronous AI ASR (Whisper) transcription of a YouTube video. Returns immediately with a task_id and estimated_wait_seconds; the actual transcription runs in the background. Poll status with get_asr_task. Use this when fetch_transcript returned NO_CAPTIONS or when the video has no captions. Costs 5 credits, debited only on successful completion.

Input Schema

TableJSON Schema
NameRequiredDescriptionDefault
video_urlYesYouTube URL (watch, youtu.be, shorts, or embed form). Full URL preferred.
langNoOptional language hint (ISO 639-1, e.g. 'en', 'zh'). Omit to auto-detect.

Implementation Reference

  • Schema definition for the 'transcribe_video' tool, accepting 'video_url' (required) and 'lang' (optional). Registered as an ASR_START annotation.
    {
      name: "transcribe_video",
      description:
        "Start an asynchronous AI ASR (Whisper) transcription of a YouTube video. Returns immediately with a task_id and estimated_wait_seconds; the actual transcription runs in the background. Poll status with get_asr_task. Use this when fetch_transcript returned NO_CAPTIONS or when the video has no captions. Costs 5 credits, debited only on successful completion.",
      annotations: { title: "Transcribe YouTube Video (Async)", ...ANN.ASR_START },
      inputSchema: {
        type: "object",
        properties: {
          video_url: {
            type: "string",
            description:
              "YouTube URL (watch, youtu.be, shorts, or embed form). Full URL preferred.",
            minLength: 5,
          },
          lang: {
            type: "string",
            description:
              "Optional language hint (ISO 639-1, e.g. 'en', 'zh'). Omit to auto-detect.",
          },
        },
        required: ["video_url"],
      },
    },
  • src/index.js:73-406 (registration)
    The TOOLS array containing all tool definitions; 'transcribe_video' is one of the registered tools.
    const TOOLS = [
      {
        name: "search_youtube",
        description:
          "Search YouTube globally for videos, channels, or playlists on any topic. Returns up to 50 results with metadata. Use this for topic-based discovery when the user has not specified a channel — for searching within a known channel use search_channel_videos instead.",
        annotations: { title: "Search YouTube", ...ANN.YT_READ },
        inputSchema: {
          type: "object",
          properties: {
            q: {
              type: "string",
              description:
                "Search query (same syntax as YouTube's search bar, e.g. 'rust async tutorial', 'lex fridman dario amodei').",
              minLength: 1,
            },
            type: {
              type: "string",
              description:
                "Search type: 'video', 'channel', or 'playlist'. Default: 'video'.",
              enum: ["video", "channel", "playlist"],
            },
            limit: {
              type: "number",
              description: "Max results (1-50, default 20).",
              minimum: 1,
              maximum: 50,
            },
          },
          required: ["q"],
        },
      },
      {
        name: "fetch_video_info",
        description:
          "Fetch consolidated YouTube video metadata with numeric types — title, channel, duration, view count, publish date, thumbnail, description, captions availability. Does NOT include the transcript itself; call fetch_transcript or transcribe_video for that. Cheap, fast, free.",
        annotations: { title: "Fetch YouTube Video Info", ...ANN.YT_READ },
        inputSchema: {
          type: "object",
          properties: {
            video_id: {
              type: "string",
              description:
                "11-char YouTube video ID (e.g. 'dQw4w9WgXcQ') or full URL (watch, youtu.be, shorts, embed, live).",
              minLength: 5,
            },
          },
          required: ["video_id"],
        },
      },
      {
        name: "fetch_transcript",
        description:
          "Fetch the existing official transcript (subtitles/captions) of a YouTube video, with per-segment timestamps and language detected. Errors with NO_CAPTIONS if the video has no captions — fall back to transcribe_video in that case to generate one with AI ASR. This call is free.",
        annotations: { title: "Fetch YouTube Video Transcript", ...ANN.YT_READ },
        inputSchema: {
          type: "object",
          properties: {
            video_id: {
              type: "string",
              description: "YouTube video ID (e.g. 'dQw4w9WgXcQ') or full YouTube URL.",
              minLength: 5,
            },
            lang: {
              type: "string",
              description:
                "ISO 639-1 language code to select among multilingual captions (e.g. 'en', 'zh', 'ja'). Omit for the video's default language.",
            },
            save: {
              type: "boolean",
              description:
                "When true, also save the video to the user's Library in the same call. Bookmarks the meta row and flips has_asr when the transcript was produced by our ASR. Does NOT upload a summary — use save_to_library with kind='summary' or kind='both' for that.",
            },
          },
          required: ["video_id"],
        },
      },
      {
        name: "transcribe_video",
        description:
          "Start an asynchronous AI ASR (Whisper) transcription of a YouTube video. Returns immediately with a task_id and estimated_wait_seconds; the actual transcription runs in the background. Poll status with get_asr_task. Use this when fetch_transcript returned NO_CAPTIONS or when the video has no captions. Costs 5 credits, debited only on successful completion.",
        annotations: { title: "Transcribe YouTube Video (Async)", ...ANN.ASR_START },
        inputSchema: {
          type: "object",
          properties: {
            video_url: {
              type: "string",
              description:
                "YouTube URL (watch, youtu.be, shorts, or embed form). Full URL preferred.",
              minLength: 5,
            },
            lang: {
              type: "string",
              description:
                "Optional language hint (ISO 639-1, e.g. 'en', 'zh'). Omit to auto-detect.",
            },
          },
          required: ["video_url"],
        },
      },
      {
        name: "get_asr_task",
        description:
          "Poll the status of an ASR task created by transcribe_video. Returns one of `queued`, `downloading`, `transcribing`, `finalizing`, `done`, or `failed`. When status is `done`, includes the full transcript with timestamps. Recommended polling interval: 3-5 seconds. Free — does not consume credits.",
        annotations: { title: "Get ASR Task Status", ...ANN.ASR_POLL },
        inputSchema: {
          type: "object",
          properties: {
            task_id: {
              type: "string",
              description: "Task ID returned by transcribe_video.",
              minLength: 1,
            },
          },
          required: ["task_id"],
        },
      },
      {
        name: "resolve_channel",
        description:
          "Resolve a YouTube @handle, channel URL, video URL, or raw channel ID into canonical channel info (channel ID, name, handle, subscriber count, video count, avatar). Call this first when you only have a handle or URL but need a channel ID for the other channel-scoped tools.",
        annotations: { title: "Resolve YouTube Channel", ...ANN.YT_READ },
        inputSchema: {
          type: "object",
          properties: {
            input: {
              type: "string",
              description:
                "@handle (e.g. '@MrBeast'), channel URL, video URL, or UC... channel ID. All common forms are accepted.",
              minLength: 1,
            },
          },
          required: ["input"],
        },
      },
      {
        name: "list_channel_videos",
        description:
          "List all videos from a YouTube channel ordered by publish date (newest first), with pagination. Returns up to 30 per page plus a `continuation` token if more results exist. For just the most recent handful, prefer get_channel_latest_videos for simplicity.",
        annotations: { title: "List Channel Videos", ...ANN.YT_READ },
        inputSchema: {
          type: "object",
          properties: {
            channel: {
              type: "string",
              description:
                "@handle, channel URL, or UC... channel ID. Required for the first page; omit on subsequent pages and pass `continuation` instead.",
            },
            continuation: {
              type: "string",
              description:
                "Pagination token from a previous response's `continuation` field. Omit for the first page.",
            },
          },
        },
      },
      {
        name: "get_channel_latest_videos",
        description:
          "Get the most recent videos from a YouTube channel — convenience wrapper over list_channel_videos with no pagination. Best for 'what did this creator publish recently?' style queries.",
        annotations: { title: "Get Channel Latest Videos", ...ANN.YT_READ },
        inputSchema: {
          type: "object",
          properties: {
            channel: {
              type: "string",
              description: "@handle (e.g. '@mkbhd'), channel URL, or UC... channel ID.",
              minLength: 1,
            },
          },
          required: ["channel"],
        },
      },
      {
        name: "search_channel_videos",
        description:
          "Search for specific videos within a single YouTube channel. Restricts results to the given channel. Use after resolve_channel if starting from a handle. Useful for 'find Karpathy's video about backpropagation' style queries.",
        annotations: { title: "Search Channel Videos", ...ANN.YT_READ },
        inputSchema: {
          type: "object",
          properties: {
            channel: {
              type: "string",
              description: "@handle, channel URL, or UC... channel ID.",
              minLength: 1,
            },
            q: {
              type: "string",
              description: "Search query (matched against video title and description within the channel).",
              minLength: 1,
            },
            limit: {
              type: "number",
              description: "Max results (1-50, default 30).",
              minimum: 1,
              maximum: 50,
            },
          },
          required: ["channel", "q"],
        },
      },
      {
        name: "list_playlist_videos",
        description:
          "List videos in a YouTube playlist in order, with pagination. Returns video metadata and position within the playlist. Works for any public or unlisted playlist exposed by its URL/ID.",
        annotations: { title: "List Playlist Videos", ...ANN.YT_READ },
        inputSchema: {
          type: "object",
          properties: {
            playlist: {
              type: "string",
              description:
                "Playlist URL or ID (typically starts with 'PL', 'UU', 'LL', or 'FL'). Required for the first page.",
            },
            continuation: {
              type: "string",
              description: "Pagination token from a previous response's `continuation` field. Omit for the first page.",
            },
          },
        },
      },
      {
        name: "save_to_library",
        description:
          "Save a video to the authenticated user's Library. Three modes via `kind`: 'asr' bookmarks the video and flips has_asr (use after a successful transcribe_video → fetch_transcript flow); 'summary' uploads a summary blob; 'both' does both at once. Idempotent: saving the same video twice updates the existing entry.",
        annotations: { title: "Save to Library", ...ANN.LIB_WRITE },
        inputSchema: {
          type: "object",
          properties: {
            video_id: {
              type: "string",
              description: "YouTube video ID (11 chars).",
              minLength: 5,
            },
            kind: {
              type: "string",
              description:
                "'asr' (bookmark + flip has_asr), 'summary' (upload summary text), or 'both'.",
              enum: ["asr", "summary", "both"],
            },
            title: {
              type: "string",
              description: "Video title (for display in the user's Library list).",
            },
            author: {
              type: "string",
              description: "Channel / author name.",
            },
            thumbnail: {
              type: "string",
              description: "Thumbnail URL.",
            },
            video_url: {
              type: "string",
              description: "Full YouTube URL.",
            },
            language: {
              type: "string",
              description: "Video language code (ISO 639-1).",
            },
            text: {
              type: "string",
              description:
                "Summary text. REQUIRED when kind='summary' or kind='both'. Plain text or markdown — use the `format` param to declare which.",
            },
            locale: {
              type: "string",
              description:
                "Summary locale (e.g. 'en', 'zh'). Used with kind='summary' or kind='both'.",
            },
            format: {
              type: "string",
              description:
                "Summary format: 'markdown' (default) or 'text'. Use 'markdown' if your text contains **bold**, bullets, headings, or code fences so the web UI renders it; use 'text' for plain prose.",
              enum: ["markdown", "text"],
            },
            model: {
              type: "string",
              description: "Optional model identifier, e.g. 'claude-opus-4'.",
            },
          },
          required: ["video_id", "kind"],
        },
      },
      {
        name: "list_library",
        description:
          "List videos the user has saved to their Library (transcripts + summaries). Supports substring search on title/author, favorites filter, and pagination. Returns recently saved items first. Scoped to the calling user's data only.",
        annotations: { title: "List Saved Library", ...ANN.LIB_READ },
        inputSchema: {
          type: "object",
          properties: {
            favorite: {
              type: "boolean",
              description: "When true, return only items the user has favorited.",
            },
            q: {
              type: "string",
              description: "Substring match on title and author.",
            },
            limit: {
              type: "number",
              description: "Max items per page (1-100, default 20).",
              minimum: 1,
              maximum: 100,
            },
            offset: {
              type: "number",
              description: "Pagination offset (number of items to skip).",
              minimum: 0,
            },
          },
        },
      },
      {
        name: "get_library_item",
        description:
          "Read a saved Library item with its transcript and AI summary inline (when available). Use after list_library to fetch the full content the user saved. Free.",
        annotations: { title: "Get Library Item", ...ANN.LIB_READ },
        inputSchema: {
          type: "object",
          properties: {
            id: {
              type: "number",
              description: "Library item id (returned by list_library or save_to_library).",
            },
            locale: {
              type: "string",
              description: "Summary locale to fetch (e.g. 'en', 'zh'). Defaults to 'en'.",
            },
          },
          required: ["id"],
        },
      },
    ];
  • The callUpstream function handles all tool invocations. When 'transcribe_video' is called, it proxies the request to the upstream SubDownload MCP API with a Bearer token.
    async function callUpstream(name, args) {
      if (!API_KEY) {
        throw new Error(
          "SUBDOWNLOAD_API_KEY env var is not set. Get one at https://subdownload.com/account, then run with -e SUBDOWNLOAD_API_KEY=<your-key>."
        );
      }
      const res = await fetch(UPSTREAM_URL, {
        method: "POST",
        headers: {
          "Content-Type": "application/json",
          Accept: "application/json, text/event-stream",
          Authorization: `Bearer ${API_KEY}`,
        },
        body: JSON.stringify({
          jsonrpc: "2.0",
          id: Date.now(),
          method: "tools/call",
          params: { name, arguments: args },
        }),
      });
      const text = await res.text();
      let body;
      try {
        body = JSON.parse(text);
      } catch {
        throw new Error(
          `Upstream returned non-JSON response (HTTP ${res.status}): ${text.slice(0, 200)}`
        );
      }
      if (body.error) {
        throw new Error(body.error.message || JSON.stringify(body.error));
      }
      return body.result;
    }
  • src/index.js:448-462 (registration)
    Request handlers that dispatch tool calls. ListToolsRequestSchema returns the TOOLS array (including transcribe_video), and CallToolRequestSchema forwards calls to callUpstream.
    server.setRequestHandler(ListToolsRequestSchema, async () => ({ tools: TOOLS }));
    
    server.setRequestHandler(CallToolRequestSchema, async (request) => {
      try {
        return await callUpstream(
          request.params.name,
          request.params.arguments || {}
        );
      } catch (err) {
        return {
          content: [{ type: "text", text: err.message || String(err) }],
          isError: true,
        };
      }
    });
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses async behavior (returns immediately with task_id and estimated_wait_seconds), background processing, need to poll with get_asr_task, and cost of 5 credits debited on success. These go beyond annotations which only provide hints.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences, each earning its place. First sentence states action and nature, second sentence tells what returns and follow-up, third sentence gives condition and cost. No waste, front-loaded with key information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers essential aspects: action, async nature, return values, next steps, usage condition, and cost. Missing some edge cases like error handling or timeouts, but overall sufficient for a tool with annotations and schema.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema has 100% coverage with detailed descriptions for both parameters. The tool description adds no new parameter meaning beyond what the schema already provides, so baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states 'Start an asynchronous AI ASR (Whisper) transcription of a YouTube video.' Specifies verb, resource, and async nature. Distinguishes from siblings like fetch_transcript and get_asr_task.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly says 'Use this when fetch_transcript returned NO_CAPTIONS or when the video has no captions.' Provides context on when to use versus alternatives, and instructs to poll with get_asr_task.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

Install Server

Other Tools

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/SubDownload/subdownload-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server