Skip to main content
Glama

get_transcript

Retrieve transcript text from any YouTube video. Supports auto-detection or selection of language, and optional timestamps. Get clean text for accessibility, analysis, or content creation.

Instructions

Get transcript text from a YouTube video.

Args: url: YouTube URL or video ID language: Language code (default: auto-detect) with_timestamps: Include timestamps in output

Returns: Dict containing video_id, language, and transcript text

Input Schema

TableJSON Schema
NameRequiredDescriptionDefault
urlYes
languageNoauto
with_timestampsNo

Output Schema

TableJSON Schema
NameRequiredDescriptionDefault

No arguments

Implementation Reference

  • The get_transcript function: extracts a YouTube video transcript using youtube-transcript-api. Accepts a URL/video ID, optional language (default auto), and with_timestamps flag. Returns a dict with video_id, language, and transcript text.
    def get_transcript(
        url: str,
        language: str = "auto",
        with_timestamps: bool = False,
    ) -> dict[str, Any]:
        """Get transcript text from a YouTube video.
    
        Args:
            url: YouTube URL or video ID
            language: Language code (default: auto-detect)
            with_timestamps: Include timestamps in output
    
        Returns:
            Dict containing video_id, language, and transcript text
        """
        video_id = extract_video_id(url)
        api = YouTubeTranscriptApi()
    
        # Get available transcripts
        transcript_list = api.list(video_id)
    
        # Determine language priority (English first, then Japanese for auto; otherwise specified + English)
        preferred_languages = ["en", "ja"] if language == "auto" else [language, "en"]
    
        # Find transcript
        transcript = None
        used_language = None
    
        # Try manual transcripts first
        for lang in preferred_languages:
            try:
                transcript = transcript_list.find_transcript([lang])
                used_language = lang
                break
            except NoTranscriptFound:
                continue
    
        # Fall back to any available transcript
        if transcript is None:
            for t in transcript_list:
                transcript = t
                used_language = t.language_code
                break
    
        if transcript is None:
            return {
                "video_id": video_id,
                "language": None,
                "transcript": None,
                "error": "No transcript found",
            }
    
        # Fetch and format transcript
        entries = transcript.fetch()
    
        if with_timestamps:
            lines = [f"{format_timestamp(entry.start)} {entry.text}" for entry in entries]
        else:
            lines = [entry.text for entry in entries]
    
        return {
            "video_id": video_id,
            "language": used_language,
            "transcript": "\n".join(lines),
        }
  • Function signature defines the input schema: url (str), language (str, default 'auto'), with_timestamps (bool, default False). Return type is dict[str, Any].
    def get_transcript(
        url: str,
        language: str = "auto",
        with_timestamps: bool = False,
    ) -> dict[str, Any]:
  • Tool registration: mcp.tool()(get_transcript) registers get_transcript as an MCP tool on the FastMCP 'youtube' server.
    mcp.tool()(get_transcript)
  • Helper function format_timestamp: converts seconds to [H:MM:SS] or [M:SS] format, used when with_timestamps is True.
    def format_timestamp(seconds: float) -> str:
        """Format seconds as [H:MM:SS] or [M:SS]."""
        hours = int(seconds // 3600)
        minutes = int((seconds % 3600) // 60)
        secs = int(seconds % 60)
        if hours > 0:
            return f"[{hours}:{minutes:02d}:{secs:02d}]"
        return f"[{minutes}:{secs:02d}]"
  • Helper function extract_video_id: parses various YouTube URL formats (standard, shortened, embed, shorts, live, mobile) to extract the 11-character video ID.
    def extract_video_id(url: str) -> str:
        """Extract video ID from various forms of YouTube URLs.
    
        Handles standard, shortened, embed, shorts, live URLs,
        as well as regional domains and mobile versions.
    
        Args:
            url: YouTube URL or video ID
    
        Returns:
            The extracted video ID
    
        Raises:
            ValueError: If the video ID cannot be extracted
        """
        if not url:
            raise ValueError("URL is required")
    
        # Handle case where input is just the 11-character video ID
        if re.match(r"^[a-zA-Z0-9_-]{11}$", url):
            return url
    
        # Prepend scheme if not present
        if not re.match(r"https?://", url):
            url = "https://" + url
    
        parsed = urlparse(url)
        hostname = parsed.hostname
    
        if not hostname:
            raise ValueError(f"Could not parse URL: {url}")
    
        # youtu.be shortlinks
        if hostname == "youtu.be":
            video_id = parsed.path.lstrip("/").split("/")[0].split("?")[0]
            if video_id:
                return video_id
            raise ValueError(f"Could not extract video ID from URL: {url}")
    
        # youtube.com variants
        youtube_pattern = r"^(?:www\.|m\.|music\.)?youtube\.com$"
        if not re.match(youtube_pattern, hostname):
            raise ValueError(f"Not a YouTube URL: {url}")
    
        # /watch?v=VIDEO_ID
        if parsed.path == "/watch":
            query_params = parse_qs(parsed.query)
            video_ids = query_params.get("v")
            if video_ids:
                return video_ids[0]
            raise ValueError(f"Could not extract video ID from URL: {url}")
    
        # /embed/VIDEO_ID, /shorts/VIDEO_ID, /live/VIDEO_ID, /v/VIDEO_ID
        path_parts = parsed.path.split("/")
        if len(path_parts) >= 3 and path_parts[1] in ("embed", "shorts", "live", "v"):
            video_id = path_parts[2].split("?")[0]
            if video_id:
                return video_id
    
        raise ValueError(f"Could not extract video ID from URL: {url}")
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, but the description explains the return structure (dict with video_id, language, transcript). It does not disclose potential errors, rate limits, or side effects, but for a simple read operation, basic behavioral context is covered.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise with a clear structure: a one-line purpose, structured Args, and Returns sections. No redundant information; every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has an output schema (implied), the description adequately explains the return format. It covers the main functionality and parameters, though it could detail error handling or edge cases for full completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, but the description adds clear meaning for all three parameters (url, language, with_timestamps) in the 'Args' section, including defaults and purpose. This compensates for the missing schema descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Get transcript text from a YouTube video,' specifying the verb and resource. It effectively distinguishes from the sibling tool 'get_video_info', which presumably handles video metadata.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives, nor does it mention prerequisites or scenarios where it should not be used. It only explains parameters without contextual usage advice.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

Install Server

Other Tools

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/kzmshx/youtube-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server