DownloadClosedCaptions
Extract closed captions from YouTube videos to enable content analysis, summarization, and accessibility applications.
Instructions
Download closed captions from a YouTube video.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| video_url | Yes | | |
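
As a rough illustration of the schema, a client call carries a single `video_url` string. The sketch below builds an MCP `tools/call` request in Python. It assumes the tool is registered under the name `DownloadClosedCaptions` shown in the title; the request `id` and the example URL are arbitrary.

```python
import json

# Assumption: the tool name matches the page title; the id is arbitrary.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "DownloadClosedCaptions",
        "arguments": {
            # The only field in the input schema, and it is required.
            "video_url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
        },
    },
}

# Serialize the request as it would be sent over the MCP transport.
payload = json.dumps(request)
print(payload)
```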
Implementation Reference
- src/mcp_youtube/tools.py:117-148 (handler) The main handler function for the DownloadClosedCaptions tool, decorated with @tool_runner.register. It parses the YouTube video URL, loads the transcript from the local cache or, on a cache miss, fetches it with youtube_transcript_api and writes it to the cache, then joins the caption text and returns it as TextContent.

  ```python
  @tool_runner.register
  async def download_closed_captions(
      args: DownloadClosedCaptions,
  ) -> t.Sequence[TextContent | ImageContent | EmbeddedResource]:
      transcripts_dir = xdg_cache_home() / "mcp-youtube" / "transcripts"
      transcripts_dir.mkdir(parents=True, exist_ok=True)

      video_id = _parse_youtube_url(args.video_url)
      if not video_id:
          raise ValueError(f"Unrecognized YouTube URL: {args.video_url}")

      if not transcripts_dir.joinpath(f"{video_id}.json").exists():
          transcript = YouTubeTranscriptApi.get_transcript(video_id)
          if not transcript or not isinstance(transcript, list):
              raise ValueError("No transcript found for the video.")
          json_data = json.dumps(transcript, indent=None)
          transcripts_dir.joinpath(f"{video_id}.json").write_text(json_data)
      else:
          json_data = transcripts_dir.joinpath(f"{video_id}.json").read_text()
          transcript = json.loads(json_data)

      content = " ".join([line["text"] for line in transcript])

      return [
          TextContent(
              type="text",
              text=content,
          ),
      ]
  ```
- src/mcp_youtube/tools.py:75-79 (schema) Pydantic model defining the input schema for the tool, including the required video_url parameter and tool description docstring.

  ```python
  class DownloadClosedCaptions(ToolArgs):
      """Download closed captions from YouTube video."""

      video_url: str
  ```
- src/mcp_youtube/tools.py:81-115 (helper) Helper function to parse YouTube URLs (both youtu.be and youtube.com/watch?v= formats) and extract the video ID, used in the handler.

  ```python
  def _parse_youtube_url(url: str) -> str | None:
      """
      Parse a YouTube URL and extract the video ID from the v= parameter.

      Args:
          url (str): YouTube URL in various formats

      Returns:
          str: Video ID if found, None otherwise

      Examples:
          >>> parse_youtube_url("https://www.youtube.com/watch?v=dQw4w9WgXcQ")
          'dQw4w9WgXcQ'
          >>> parse_youtube_url("https://youtu.be/dQw4w9WgXcQ")
          'dQw4w9WgXcQ'
          >>> parse_youtube_url("https://www.youtube.com/watch?v=dQw4w9WgXcQ&t=123")
          'dQw4w9WgXcQ'
      """
      # Handle youtu.be format
      if "youtu.be" in url:
          return url.split("/")[-1].split("?")[0]

      # Handle regular youtube.com format
      try:
          parsed_url = urlparse(url)
          if "youtube.com" in parsed_url.netloc:
              params = parse_qs(parsed_url.query)
              if "v" in params:
                  return params["v"][0]
      except:  # noqa: E722, S110
          pass

      return None
  ```
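
The handler's cache-then-fetch flow can be exercised without network access. What follows is a minimal sketch using only the standard library, not the project's code: `load_or_fetch_transcript`, its `fetch` parameter, `transcript_text`, and `fake_fetch` are hypothetical names introduced here, and the stub stands in for the `YouTubeTranscriptApi.get_transcript` call, assuming it returns a list of dicts that each carry a `text` key.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable

Transcript = list[dict]


def load_or_fetch_transcript(
    video_id: str,
    cache_dir: Path,
    fetch: Callable[[str], Transcript],
) -> Transcript:
    """Return the cached transcript for video_id, fetching and caching on a miss."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_file = cache_dir / f"{video_id}.json"
    if cache_file.exists():
        # Cache hit: reuse the stored JSON and skip the fetch entirely.
        return json.loads(cache_file.read_text())
    transcript = fetch(video_id)
    if not transcript or not isinstance(transcript, list):
        raise ValueError("No transcript found for the video.")
    cache_file.write_text(json.dumps(transcript))
    return transcript


def transcript_text(transcript: Transcript) -> str:
    """Join the caption segments into one string, as the handler does."""
    return " ".join(line["text"] for line in transcript)


calls: list[str] = []


def fake_fetch(video_id: str) -> Transcript:
    # Hypothetical stand-in for the real transcript API; records each call.
    calls.append(video_id)
    return [
        {"text": "hello", "start": 0.0, "duration": 1.0},
        {"text": "world", "start": 1.0, "duration": 1.0},
    ]


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "transcripts"
    first = load_or_fetch_transcript("dQw4w9WgXcQ", cache, fake_fetch)
    second = load_or_fetch_transcript("dQw4w9WgXcQ", cache, fake_fetch)

print(transcript_text(first))
print(len(calls))
```

Passing the fetch function as a parameter keeps the caching logic testable: the second call is served from the JSON file, so the stub runs only once.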