get_timed_transcript
Extract timestamped transcripts from YouTube videos to analyze content, create subtitles, or study video material with precise time references.
Instructions
Retrieves the transcript of a YouTube video with timestamps.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | The URL of the YouTube video | |
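| lang | No | The preferred transcript language code (e.g. `ja`); English is used as a fallback | en |
| next_cursor | No | Cursor to retrieve the next page of the transcript, taken from the `next_cursor` field of the previous response | |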
Implementation Reference
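- Main handler for the `get_timed_transcript` tool. It parses the video URL, fetches the transcript snippets through a helper, and, if a positive `response_limit` is configured, returns only as many snippets as fit within that limit, starting at `next_cursor`. The result is a `TimedTranscript` with the title, the snippets, and the cursor of the next page (`None` on the last page). Without a limit, the whole transcript is returned and `next_cursor` is ignored.

  ```python
  from itertools import islice

  from mcp.server.fastmcp import Context
  from mcp.server.session import ServerSession
  from pydantic import Field

  # `mcp` (the FastMCP server), `AppContext`, `response_limit`, and the helpers
  # shown below are defined elsewhere in the module.


  @mcp.tool()
  async def get_timed_transcript(
      ctx: Context[ServerSession, AppContext],
      url: str = Field(description="The URL of the YouTube video"),
      lang: str = Field(description="The preferred language for the transcript", default="en"),
      next_cursor: str | None = Field(description="Cursor to retrieve the next page of the transcript", default=None),
  ) -> TimedTranscript:
      """Retrieves the transcript of a YouTube video with timestamps."""
      title, snippets = _get_transcript_snippets(ctx.request_context.lifespan_context, _parse_video_id(url), lang)

      # No limit configured: return the whole transcript in one response.
      if response_limit is None or response_limit <= 0:
          return TimedTranscript(
              title=title, snippets=[TranscriptSnippet.from_fetched_transcript_snippet(s) for s in snippets]
          )

      # Otherwise fill the page until the serialized size would exceed response_limit.
      res = []
      size = len(title) + 1
      cursor = None
      for i, s in islice(enumerate(snippets), int(next_cursor or 0), None):
          snippet = TranscriptSnippet.from_fetched_transcript_snippet(s)
          if size + len(snippet) + 1 > response_limit:
              cursor = str(i)  # the next page resumes at this snippet
              break
          res.append(snippet)
          size += len(snippet) + 1  # keep a running total so the limit applies to the whole page
      return TimedTranscript(title=title, snippets=res, next_cursor=cursor)
  ```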
- Pydantic model defining the output schema of `get_timed_transcript`: the title, the list of timed snippets, and the pagination cursor.

  ```python
  from pydantic import BaseModel, Field

  # TranscriptSnippet (below) must be defined before this model in the module.


  class TimedTranscript(BaseModel):
      """Transcript of a YouTube video with timestamps."""

      title: str = Field(description="Title of the video")
      snippets: list[TranscriptSnippet] = Field(description="Transcript snippets of the video")
      next_cursor: str | None = Field(description="Cursor to retrieve the next page of the transcript", default=None)
  ```
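- Pydantic model for a single timed snippet, used in `TimedTranscript.snippets`. It includes a conversion from a `youtube_transcript_api` snippet, and its `len()` is the length of its JSON serialization, which the handler uses to enforce the response limit.

  ```python
  from __future__ import annotations

  from pydantic import BaseModel, Field
  from youtube_transcript_api import FetchedTranscriptSnippet


  class TranscriptSnippet(BaseModel):
      """Transcript snippet of a YouTube video."""

      text: str = Field(description="Text of the transcript snippet")
      start: float = Field(description="The timestamp at which this transcript snippet appears on screen in seconds.")
      duration: float = Field(description="How long the snippet stays on screen, in seconds.")

      def __len__(self) -> int:
          # Size of the snippet serialized as JSON; used to enforce response_limit.
          return len(self.model_dump_json())

      @classmethod
      def from_fetched_transcript_snippet(
          cls: type[TranscriptSnippet], snippet: FetchedTranscriptSnippet
      ) -> TranscriptSnippet:
          # Copy the three fields from a youtube_transcript_api snippet.
          return cls(text=snippet.text, start=snippet.start, duration=snippet.duration)
  ```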
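- Cached helper that fetches transcript snippets with `YouTubeTranscriptApi`, preferring the requested language and falling back to English. It also scrapes the video title from the YouTube watch page.

  ```python
  from functools import lru_cache

  from bs4 import BeautifulSoup
  from youtube_transcript_api import FetchedTranscriptSnippet

  # `AppContext` (defined elsewhere) provides `http_client` and `ytt_api`.


  @lru_cache
  def _get_transcript_snippets(ctx: AppContext, video_id: str, lang: str) -> tuple[str, list[FetchedTranscriptSnippet]]:
      # Prefer the requested language, falling back to English.
      if lang == "en":
          languages = ["en"]
      else:
          languages = [lang, "en"]

      # Scrape the video title from the watch page; default to "Transcript".
      page = ctx.http_client.get(
          f"https://www.youtube.com/watch?v={video_id}", headers={"Accept-Language": ",".join(languages)}
      )
      page.raise_for_status()
      soup = BeautifulSoup(page.text, "html.parser")
      title = soup.title.string if soup.title and soup.title.string else "Transcript"

      # Fetch the transcript itself in the first available preferred language.
      transcripts = ctx.ytt_api.fetch(video_id, languages=languages)
      return title, transcripts.snippets
  ```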
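- Helper that extracts the YouTube video ID from a URL. Two formats are supported: `youtu.be/<id>` short links and links carrying a `v` query parameter (`watch?v=<id>`); any other URL raises `ValueError`.

  ```python
  from urllib.parse import parse_qs, urlparse


  def _parse_video_id(url: str) -> str:
      parsed_url = urlparse(url)
      if parsed_url.hostname == "youtu.be":
          # Short links carry the ID in the path: https://youtu.be/<id>
          return parsed_url.path.lstrip("/")
      else:
          # All other URLs must carry the ID in the `v` query parameter.
          q = parse_qs(parsed_url.query).get("v")
          if q is None:
              raise ValueError(f"couldn't find a video ID from the provided URL: {url}.")
          return q[0]
  ```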
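The transcript helper's language fallback and caching can be illustrated without network access. In this sketch `preferred_languages` mirrors the helper's `if`/`else`, and `cached_lookup` is a hypothetical stand-in for the network calls, wrapped in `functools.lru_cache` the same way. Because `lru_cache` keys on every argument, the helper's `ctx` must be hashable, and repeated calls for the same video and language, such as requests for later pages of one transcript, are answered from memory.

```python
from functools import lru_cache

fetch_log: list[tuple[str, str]] = []  # records every call that misses the cache


def preferred_languages(lang: str) -> list[str]:
    # English alone, or the requested language followed by English as a fallback.
    return ["en"] if lang == "en" else [lang, "en"]


@lru_cache
def cached_lookup(video_id: str, lang: str) -> tuple[str, tuple[str, ...]]:
    """Hypothetical stand-in for _get_transcript_snippets; performs no network I/O."""
    fetch_log.append((video_id, lang))  # only reached on a cache miss
    languages = preferred_languages(lang)
    # The helper sends this list both as the Accept-Language header of the
    # watch page and as the `languages` argument of the transcript fetch.
    accept_language = ",".join(languages)
    return accept_language, tuple(languages)


print(preferred_languages("ja"))  # ['ja', 'en']
cached_lookup("abcdefghijk", "ja")
cached_lookup("abcdefghijk", "ja")  # served from the cache
print(len(fetch_log))  # 1
```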
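Which URL shapes `_parse_video_id` accepts follows from what the standard-library calls it relies on return. The URLs below are made-up examples; the Shorts URL is included to show a format the helper does not handle.

```python
from urllib.parse import parse_qs, urlparse

# watch?v= links: the ID comes from the `v` query parameter; other parameters are ignored.
watch = urlparse("https://www.youtube.com/watch?v=abcdefghijk&t=42s")
print(watch.hostname)  # www.youtube.com
print(parse_qs(watch.query))  # {'v': ['abcdefghijk'], 't': ['42s']}

# youtu.be links: the ID is the path without its leading slash; the query is ignored.
short = urlparse("https://youtu.be/abcdefghijk?t=42")
print(short.hostname, short.path.lstrip("/"))  # youtu.be abcdefghijk

# A Shorts URL is not on youtu.be and has no `v` parameter, so the helper
# would raise ValueError for it.
shorts = urlparse("https://www.youtube.com/shorts/abcdefghijk")
print(parse_qs(shorts.query).get("v"))  # None
```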