get_transcript

Extract YouTube video transcripts with timestamps in your preferred language. Use this tool to obtain captions for analysis, translation, or content creation.

Instructions

Get transcript for a YouTube video with timestamps

Input Schema

Name	Required	Description	Default
`videoId`	Yes	YouTube video ID or full YouTube URL
`lang`	No	Preferred language code for captions (default: en)	en
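
Because `videoId` accepts either a bare ID or a full URL, the server has to normalize the value before calling YouTube. The handler does this through `extractVideoId`, whose body is not shown on this page. The following is a minimal sketch of how such a helper might work, assuming 11-character IDs and a handful of common URL shapes. The patterns and the `VIDEO_ID_PATTERN` name are illustrative assumptions, not the project's actual code.

```typescript
// Hypothetical sketch: the real extractVideoId in src/index.ts is not shown here.
// Assumption: YouTube video IDs are 11 characters from [A-Za-z0-9_-].
const VIDEO_ID_PATTERN = /^[A-Za-z0-9_-]{11}$/;

// URL shapes this sketch recognizes (an assumed, non-exhaustive list).
const URL_PATTERNS: RegExp[] = [
  // https://www.youtube.com/watch?v=ID (v may follow other query parameters)
  /^https?:\/\/(?:www\.|m\.)?youtube\.com\/watch\?(?:.*&)?v=([A-Za-z0-9_-]{11})(?:[&#]|$)/,
  // https://youtu.be/ID
  /^https?:\/\/youtu\.be\/([A-Za-z0-9_-]{11})(?:[?#/]|$)/,
  // https://www.youtube.com/embed/ID and /shorts/ID
  /^https?:\/\/(?:www\.|m\.)?youtube\.com\/(?:embed|shorts)\/([A-Za-z0-9_-]{11})(?:[?#/]|$)/,
];

function extractVideoId(input: string): string | null {
  const trimmed = input.trim();

  // Case 1: the caller passed a bare ID.
  if (VIDEO_ID_PATTERN.test(trimmed)) {
    return trimmed;
  }

  // Case 2: try each known URL shape in turn.
  for (const pattern of URL_PATTERNS) {
    const match = trimmed.match(pattern);
    if (match) {
      return match[1];
    }
  }

  // Neither a bare ID nor a recognized URL.
  return null;
}
```

Returning `null` for unrecognized input matches how the handler treats a falsy result: it reports "Invalid YouTube URL or video ID".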

Implementation Reference

src/index.ts:261-338 (handler)
The main handler function for the 'get_transcript' tool. It validates input, extracts the YouTube video ID, fetches video details and caption tracks using YouTube's InnerTube API, selects the best caption track, downloads and parses the XML transcript, and formats the output with timestamps, video metadata, and the full transcript.

    private async handleGetTranscript(args: any): Promise<CallToolResult> {
      if (!this.isValidTranscriptArgs(args)) {
        throw new McpError(
          ErrorCode.InvalidParams,
          'Invalid transcript arguments. Required: videoId'
        );
      }

      try {
        // Extract video ID from URL or use directly
        const videoId = this.extractVideoId(args.videoId);
        if (!videoId) {
          throw new Error('Invalid YouTube URL or video ID');
        }

        // Get video info from InnerTube API
        const videoInfo = await this.getVideoInfo(videoId);

        if (!videoInfo.videoDetails) {
          throw new Error('Could not fetch video details');
        }

        const { title, author, lengthSeconds } = videoInfo.videoDetails;
        const duration = `${Math.floor(parseInt(lengthSeconds) / 60)}:${(parseInt(lengthSeconds) % 60).toString().padStart(2, '0')}`;

        // Extract captions
        const captionTracks = videoInfo.captions?.playerCaptionsTracklistRenderer?.captionTracks;
        if (!captionTracks || captionTracks.length === 0) {
          throw new Error('No captions available for this video');
        }

        // Select best caption
        const selectedCaption = this.selectBestCaption(captionTracks, args.lang);
        if (!selectedCaption) {
          throw new Error('No suitable captions found');
        }

        const captionType = selectedCaption.kind === 'asr' ? 'auto-generated' : 'manual';

        // Fetch transcript content
        const transcriptResponse = await this.axiosInstance.get(selectedCaption.baseUrl);
        const parsedTranscript = this.parseXMLTranscript(transcriptResponse.data);

        // Format response
        const formattedTranscript = `# ${title}

    **Author:** ${author}
    **Duration:** ${duration}
    **Captions:** ${selectedCaption.name?.simpleText || selectedCaption.languageCode} (${captionType})

    ## Transcript

    ${parsedTranscript.map(segment => `${segment.timestamp} ${segment.text}`).join('\n')}

    ---
    *Generated using DeepSRT MCP Server*`;

        return {
          content: [
            {
              type: 'text',
              text: formattedTranscript
            }
          ]
        };
      } catch (error) {
        return {
          content: [
            {
              type: 'text',
              text: `Error getting transcript: ${error instanceof Error ? error.message : String(error)}`
            }
          ],
          isError: true
        };
      }
    }

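
The handler delegates track selection to `selectBestCaption`, which is not reproduced on this page. As a rough illustration of what "best" could mean, the sketch below prefers a manual track in the requested language, then an auto-generated one, then falls back to English and finally to the first available track. This priority order and the `CaptionTrack` interface are assumptions inferred from the fields the handler reads (`languageCode`, `kind`, `baseUrl`, `name.simpleText`); the actual implementation may rank tracks differently.

```typescript
// Hypothetical sketch: the real selectBestCaption may use a different ranking.
// The interface lists only the fields the handler above is seen to read.
interface CaptionTrack {
  baseUrl: string;
  languageCode: string;
  kind?: string; // 'asr' marks auto-generated captions
  name?: { simpleText?: string };
}

function selectBestCaption(
  tracks: CaptionTrack[],
  preferredLang: string = 'en'
): CaptionTrack | null {
  if (tracks.length === 0) {
    return null;
  }

  const isManual = (track: CaptionTrack): boolean => track.kind !== 'asr';

  // Treat 'zh' as matching both 'zh' and regional variants such as 'zh-TW'.
  const matchesLang = (track: CaptionTrack, lang: string): boolean =>
    track.languageCode === lang || track.languageCode.startsWith(`${lang}-`);

  // Candidates in descending priority; undefined entries are skipped.
  const candidates: Array<CaptionTrack | undefined> = [
    tracks.find(t => matchesLang(t, preferredLang) && isManual(t)),
    tracks.find(t => matchesLang(t, preferredLang)),
    tracks.find(t => matchesLang(t, 'en') && isManual(t)),
    tracks.find(t => matchesLang(t, 'en')),
    tracks[0],
  ];

  return candidates.find(t => t !== undefined) ?? null;
}
```

With this ordering, a request for `lang: 'fr'` on a video that only has English tracks would still return a transcript rather than an error, which is one plausible reading of the "Preferred language" wording in the schema.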
src/index.ts:96-110 (schema)
Input schema definition for the 'get_transcript' tool, specifying the required `videoId` parameter and the optional `lang` parameter.

    inputSchema: {
      type: 'object',
      properties: {
        videoId: {
          type: 'string',
          description: 'YouTube video ID or full YouTube URL',
        },
        lang: {
          type: 'string',
          description: 'Preferred language code for captions (default: en)',
          default: 'en',
        },
      },
      required: ['videoId'],
    },

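
The handler checks incoming arguments with `isValidTranscriptArgs` before doing any work. That method is not shown here, so the following is only a sketch of a type guard that enforces the same constraints as the schema: `videoId` must be a string and `lang`, when present, must be a string. The `TranscriptArgs` interface and the non-empty check on `videoId` are assumptions added for illustration.

```typescript
// Hypothetical sketch of a type guard mirroring the input schema.
interface TranscriptArgs {
  videoId: string;
  lang?: string;
}

function isValidTranscriptArgs(args: unknown): args is TranscriptArgs {
  // Arguments must be a non-null object.
  if (typeof args !== 'object' || args === null) {
    return false;
  }
  const candidate = args as Record<string, unknown>;

  // videoId is required and must be a non-empty string (non-empty is an assumption).
  const videoId = candidate['videoId'];
  if (typeof videoId !== 'string' || videoId.length === 0) {
    return false;
  }

  // lang is optional, but must be a string when supplied.
  const lang = candidate['lang'];
  return lang === undefined || typeof lang === 'string';
}
```

Writing the check as a type guard lets TypeScript narrow `args` to `TranscriptArgs` after the call, which would remove the need for `any` in the handler signature.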
src/index.ts:116-127 (registration)
Registration of the CallToolRequestHandler that dispatches 'get_transcript' calls to the handleGetTranscript method.

    this.server.setRequestHandler(CallToolRequestSchema, async (request) => {
      if (request.params.name === 'get_summary') {
        return this.handleGetSummary(request.params.arguments);
      } else if (request.params.name === 'get_transcript') {
        return this.handleGetTranscript(request.params.arguments);
      } else {
        throw new McpError(
          ErrorCode.MethodNotFound,
          `Unknown tool: ${request.params.name}`
        );
      }
    });

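
The dispatcher branches on `request.params.name` and forwards `request.params.arguments`, so a client reaches this tool by sending an MCP `tools/call` request with those two fields. The sketch below builds such a payload as a plain object. The `ToolCallRequest` interface and the `buildGetTranscriptRequest` helper are hypothetical names used for illustration; in practice an MCP client library would normally construct and send this message.

```typescript
// Shape of a JSON-RPC 2.0 `tools/call` request as used by MCP.
// The interface and helper names are illustrative, not part of the server.
interface ToolCallRequest {
  jsonrpc: '2.0';
  id: number;
  method: 'tools/call';
  params: {
    name: string;
    arguments: Record<string, unknown>;
  };
}

function buildGetTranscriptRequest(
  id: number,
  videoId: string,
  lang?: string
): ToolCallRequest {
  const args: Record<string, unknown> = { videoId };

  // Omit lang entirely when not supplied so the schema default applies.
  if (lang !== undefined) {
    args['lang'] = lang;
  }

  return {
    jsonrpc: '2.0',
    id,
    method: 'tools/call',
    params: { name: 'get_transcript', arguments: args },
  };
}

// Example: request Japanese captions for a video.
const exampleRequest = buildGetTranscriptRequest(1, 'dQw4w9WgXcQ', 'ja');
console.log(JSON.stringify(exampleRequest, null, 2));
```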
src/index.ts:93-111 (registration)
Tool registration in the ListTools response, defining name, description, and input schema for 'get_transcript'.

    {
      name: 'get_transcript',
      description: 'Get transcript for a YouTube video with timestamps',
      inputSchema: {
        type: 'object',
        properties: {
          videoId: {
            type: 'string',
            description: 'YouTube video ID or full YouTube URL',
          },
          lang: {
            type: 'string',
            description: 'Preferred language code for captions (default: en)',
            default: 'en',
          },
        },
        required: ['videoId'],
      },
    },

src/index.ts:403-483 (helper)
Core helper function that parses YouTube's XML timedtext format into timestamped transcript segments, handling syllable reconstruction and HTML entity decoding.

    private parseXMLTranscript(xmlContent: string): Array<{timestamp: string, text: string}> {
      const result: Array<{timestamp: string, text: string}> = [];

      // Handle YouTube's timedtext format
      if (xmlContent.includes('<timedtext')) {
        // Extract the body content
        const bodyMatch = xmlContent.match(/<body>(.*?)<\/body>/s);
        if (!bodyMatch) return result;

        const bodyContent = bodyMatch[1];

        // Find all <p> tags with their content
        const pTagRegex = /<p[^>]*t="(\d+)"[^>]*>(.*?)<\/p>/gs;
        let match;

        while ((match = pTagRegex.exec(bodyContent)) !== null) {
          const startTime = parseInt(match[1]);
          const pContent = match[2];

          // Skip empty paragraphs or paragraphs with only whitespace/newlines
          if (!pContent.trim() || pContent.trim() === '') {
            continue;
          }

          // Extract text from <s> tags within this paragraph
          const sTagRegex = /<s[^>]*>(.*?)<\/s>/g;
          const syllables: string[] = [];
          let sMatch;

          while ((sMatch = sTagRegex.exec(pContent)) !== null) {
            let syllable = sMatch[1];
            // Decode HTML entities
            syllable = syllable
              .replace(/&amp;/g, '&')
              .replace(/&lt;/g, '<')
              .replace(/&gt;/g, '>')
              .replace(/&quot;/g, '"')
              .replace(/&#39;/g, "'")
              .replace(/&nbsp;/g, ' ');

            syllables.push(syllable);
          }

          // Reconstruct words from syllables
          if (syllables.length > 0) {
            const words: string[] = [];
            let currentWord = '';

            for (const syllable of syllables) {
              if (syllable.startsWith(' ')) {
                // This syllable starts a new word
                if (currentWord.trim()) {
                  words.push(currentWord.trim());
                }
                currentWord = syllable; // Keep the leading space for now
              } else {
                // This syllable continues the current word
                currentWord += syllable;
              }
            }

            // Don't forget the last word
            if (currentWord.trim()) {
              words.push(currentWord.trim());
            }

            // Join words with single spaces
            const fullText = words.join(' ').trim();

            // Skip music notation and empty segments
            if (fullText && !fullText.match(/^\[.*\]$/) && fullText !== '♪♪♪' && fullText.trim() !== '') {
              const timestamp = this.formatTimestamp(startTime);
              result.push({ timestamp, text: fullText });
            }
          }
        }
      }

      return result;
    }

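
The parser relies on two small pieces of logic that are worth seeing in isolation: converting the `t` attribute into a display timestamp via `formatTimestamp` (not shown on this page), and decoding HTML entities. The sketch below offers one possible version of each. It assumes `t` is expressed in milliseconds and that timestamps are rendered as `[mm:ss]` or `[hh:mm:ss]`; both are assumptions, and the real output format may differ. The `decodeEntities` helper is a hypothetical alternative that decodes in a single pass, so input such as `&amp;lt;` becomes the literal text `&lt;` instead of being decoded twice.

```typescript
// Hypothetical sketch: assumes the `t` attribute is in milliseconds and that
// timestamps are rendered in square brackets. The real format is not shown.
function formatTimestamp(milliseconds: number): string {
  const totalSeconds = Math.floor(milliseconds / 1000);
  const hours = Math.floor(totalSeconds / 3600);
  const minutes = Math.floor((totalSeconds % 3600) / 60);
  const seconds = totalSeconds % 60;
  const pad = (value: number): string => value.toString().padStart(2, '0');

  // Include the hours field only for segments past the one-hour mark.
  return hours > 0
    ? `[${pad(hours)}:${pad(minutes)}:${pad(seconds)}]`
    : `[${pad(minutes)}:${pad(seconds)}]`;
}

// Entities handled by this sketch (an assumed, minimal set).
const ENTITY_MAP: Record<string, string> = {
  '&amp;': '&',
  '&lt;': '<',
  '&gt;': '>',
  '&quot;': '"',
  '&#39;': "'",
  '&nbsp;': ' ',
};

// Decode every known entity in one left-to-right pass over the input.
// Text produced by a replacement is never rescanned, so nothing is decoded twice.
function decodeEntities(text: string): string {
  return text.replace(
    /&(?:amp|lt|gt|quot|#39|nbsp);/g,
    entity => ENTITY_MAP[entity] ?? entity
  );
}

console.log(formatTimestamp(65000), decodeEntities('Tom &amp; Jerry'));
```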
