Glama

get_transcript

Retrieve a YouTube video transcript with customizable language, format, timestamps, and metadata.

Instructions

Fetch a YouTube video's transcript.

Args:
    url: YouTube video URL or video ID
    languages: Preferred languages in priority order (e.g. ["en", "de"]). Defaults to English.
    format: Output format, one of: text, json, pretty, webvtt, srt
    preserve_formatting: Keep HTML formatting tags in the transcript text
    include_timestamps: When True with format="text", prefix each line with [HH:MM:SS]. Ignored for json/srt/webvtt/pretty (those formats already include timestamps).
    include_metadata: When True (default), prepend a [METADATA] block (title, channel, published, duration, views, description) before the transcript. Pass False for transcript-only output.

Input Schema

Name                 Required  Description  Default
url                  Yes
languages            No
format               No                     text
preserve_formatting  No
include_timestamps   No
include_metadata     No
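
To make the schema concrete, here is a minimal sketch of how a client might serialize a call to this tool. It assumes the standard MCP JSON-RPC `tools/call` request shape; the helper name `build_tool_call` and the placeholder URL are illustrative and not part of the server.

```python
import json


def build_tool_call(request_id: int, arguments: dict) -> str:
    """Serialize an MCP tools/call request for get_transcript.

    Hypothetical helper: only the tool name and the argument names
    come from the listing above.
    """
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "get_transcript", "arguments": arguments},
    }
    return json.dumps(request)


if __name__ == "__main__":
    args = {
        "url": "https://www.youtube.com/watch?v=VIDEO_ID",  # placeholder, not a real video
        "languages": ["en", "de"],   # tried in priority order
        "format": "srt",             # one of: text, json, pretty, webvtt, srt
        "include_metadata": False,   # transcript-only output
    }
    print(build_tool_call(1, args))
```

Only `url` is required; omitted arguments fall back to the defaults in the handler signature.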

Output Schema

Name    Required  Description  Default
result  Yes

Implementation Reference

  • main.py:260-302 (handler)
    The main handler function for the 'get_transcript' MCP tool. It extracts the video ID, fetches the transcript via YouTubeTranscriptApi, optionally adds timestamps, formats the output (text/json/pretty/webvtt/srt), and optionally prepends a metadata block.
    @mcp.tool()
    def get_transcript(
        url: str,
        languages: list[str] | None = None,
        format: str = "text",
        preserve_formatting: bool = False,
        include_timestamps: bool = False,
        include_metadata: bool = True,
    ) -> str:
        """Fetch a YouTube video's transcript.
    
        Args:
            url: YouTube video URL or video ID
            languages: Preferred languages in priority order (e.g. ["en", "de"]). Defaults to English.
            format: Output format — one of: text, json, pretty, webvtt, srt
            preserve_formatting: Keep HTML formatting tags in the transcript text
            include_timestamps: When True with format="text", prefix each line with [HH:MM:SS]. Ignored for json/srt/webvtt/pretty (those formats already include timestamps).
            include_metadata: When True (default), prepend a [METADATA] block (title, channel, published, duration, views, description) before the transcript. Pass False for transcript-only output.
        """
        try:
            video_id = extract_video_id(url)
        except ValueError as e:
            return f"Error: {e}"
    
        langs = languages or ["en"]
        try:
            transcript = api.fetch(
                video_id,
                languages=langs,
                preserve_formatting=preserve_formatting,
            )
            if include_timestamps and format == "text":
                body = _format_transcript_with_timestamps(transcript)
            else:
                body = _format_transcript(transcript, format)
        except Exception as e:
            return _handle_transcript_error(e, video_id, langs)
    
        if not include_metadata:
            return body
    
        meta = _fetch_metadata(video_id)
        return f"{_format_metadata_block(meta)}\n\n[TRANSCRIPT]\n{body}"
  • main.py:260-260 (registration)
    The @mcp.tool() decorator registers 'get_transcript' as an MCP tool on the FastMCP server instance.
    @mcp.tool()
  • main.py:40-48 (helper)
    Helper function extract_video_id() used to parse a YouTube URL or bare video ID from the user-provided URL argument.
    def extract_video_id(url_or_id: str) -> str:
        """Extract a YouTube video ID from a URL or bare ID string."""
        url_or_id = url_or_id.strip()
        if BARE_ID_REGEX.match(url_or_id):
            return url_or_id
        match = VIDEO_ID_REGEX.search(url_or_id)
        if match:
            return match.group(1)
        raise ValueError(
  • main.py:66-71 (helper)
    Helper _format_transcript() used to format the transcript using the selected formatter (text, json, pretty, webvtt, srt).
    def _format_transcript(transcript, fmt: str) -> str:
        """Format a FetchedTranscript using the specified formatter."""
        formatter = FORMATTERS.get(fmt)
        if formatter is None:
            return f"Error: Unknown format '{fmt}'. Choose from: {', '.join(FORMATTERS)}"
        return formatter.format_transcript(transcript)
  • Helper _format_transcript_with_timestamps() used to render transcript as text with [HH:MM:SS] prefixes when include_timestamps=True.
    def _format_transcript_with_timestamps(transcript) -> str:
        """Render a FetchedTranscript as text with an [HH:MM:SS] prefix on each line."""
        return "\n".join(
            f"{_format_timestamp(snippet.start)} {snippet.text}" for snippet in transcript
        )
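
The last helper calls `_format_timestamp`, which this listing does not show. The following is a minimal sketch of what such a function could look like, assuming `snippet.start` is an offset in seconds and that fractional seconds are dropped. The `Snippet` class is a stand-in for the transcript snippet type, and both function names are illustrative, not the server's actual code.

```python
from dataclasses import dataclass


@dataclass
class Snippet:
    """Stand-in for a transcript snippet; only the fields used below."""
    text: str
    start: float  # offset from the start of the video, in seconds


def format_timestamp(seconds: float) -> str:
    """Render an offset in seconds as [HH:MM:SS] (assumed behavior)."""
    total = int(seconds)  # drop fractional seconds
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"[{hours:02d}:{minutes:02d}:{secs:02d}]"


def format_with_timestamps(snippets) -> str:
    """Join snippets one per line, each prefixed with its timestamp."""
    return "\n".join(f"{format_timestamp(s.start)} {s.text}" for s in snippets)


if __name__ == "__main__":
    demo = [Snippet("hello", 0.0), Snippet("world", 3725.4)]
    print(format_with_timestamps(demo))
```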
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It explains the behavior of all parameters, including interactions (e.g., 'include_timestamps' ignored for certain formats) and defaults. However, it does not disclose potential failure cases (e.g., invalid URL or unavailable transcript).

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
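
On the failure cases mentioned above: the handler shown in the Implementation Reference returns errors as ordinary strings (for example `Error: ...` when the video ID cannot be extracted) and separates metadata from the transcript with a `[TRANSCRIPT]` marker. A caller could therefore inspect the result roughly as follows. This is a sketch only; `parse_result` is a hypothetical client-side helper, and the output of `_handle_transcript_error` is not shown in the listing, so the prefix check only covers errors that happen to start with `Error:`.

```python
def parse_result(result: str) -> dict:
    """Split a get_transcript result string into its parts (illustrative)."""
    if result.startswith("Error:"):
        # Matches the f"Error: {e}" return in the handler
        return {"ok": False, "error": result[len("Error:"):].strip()}

    # Separator used by the handler when include_metadata=True
    marker = "\n\n[TRANSCRIPT]\n"
    metadata, found, transcript = result.partition(marker)
    if not found:
        # No marker: treat the whole string as the transcript body
        return {"ok": True, "metadata": None, "transcript": result}
    return {"ok": True, "metadata": metadata, "transcript": transcript}


if __name__ == "__main__":
    print(parse_result("Error: could not extract a video ID"))
    print(parse_result("[METADATA]\ntitle: Demo\n\n[TRANSCRIPT]\nhello world"))
```

Note that with `include_metadata=True`, an unknown-format message produced by `_format_transcript` would appear after the `[TRANSCRIPT]` marker rather than at the start of the string, so a prefix check alone would miss it.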

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with a clear opening sentence and a bullet-style Args list. It is informative without excessive verbosity, though the Args section could be slightly condensed.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the presence of an output schema (not shown), the description does not need to detail return values. It covers all parameters and their behavior comprehensively. Minor missing context includes error handling and prerequisites (e.g., valid URL).

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage, the description fully compensates by explaining all 6 parameters with defaults, accepted values (e.g., format options), and behavioral nuances (e.g., 'include_timestamps' interaction with format). This adds significant meaning beyond the schema's bare titles and defaults.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Fetch a YouTube video's transcript,' which is a specific verb and resource. It distinguishes itself from siblings like 'list_transcripts' (which lists available transcripts) and 'summarize_transcript' (which summarizes).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description does not explicitly state when to use this tool versus alternatives like 'list_transcripts' or 'summarize_transcript.' It provides parameter details but lacks context on exclusions or prerequisites.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/zlatkoc/youtube-summarize'
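
The same request can be sketched in Python with only the standard library. The URL pattern is taken from the curl command above; the function names, the `namespace`/`slug` parameter names, and the assumption that the endpoint returns a JSON body are illustrative and not confirmed by this page.

```python
import json
import urllib.request

API_BASE = "https://glama.ai/api/mcp/v1/servers"


def build_request(namespace: str, slug: str) -> urllib.request.Request:
    """Build the GET request for one server entry (no network access here)."""
    url = f"{API_BASE}/{namespace}/{slug}"
    return urllib.request.Request(url, method="GET")


def fetch_server(namespace: str, slug: str, timeout: float = 10.0):
    """Send the request and decode the body, assuming it is JSON."""
    with urllib.request.urlopen(build_request(namespace, slug), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Performs a real HTTP request when run as a script
    print(fetch_server("zlatkoc", "youtube-summarize"))
```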

If you have feedback or need assistance with the MCP directory API, please join our Discord server.