mcp-server-youtube-transcript

by kimtaeyoon83

get_transcript

Extract transcripts from YouTube videos with optional language selection, timestamp inclusion, and ad filtering for content analysis and accessibility.

Instructions

Extract transcript from a YouTube video URL or ID. Automatically falls back to available languages if requested language is not available.

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| url | Yes | YouTube video URL or ID | |
| lang | No | Language code for transcript (e.g., 'ko', 'en'). Will fall back to available language if not found. | en |
| include_timestamps | No | Include timestamps in output (e.g., '[0:05] text'). Useful for referencing specific moments. Default: false | false |
| strip_ads | No | Filter out sponsored segments from transcript based on chapter markers (e.g., chapters marked as 'Werbung', 'Ad', 'Sponsor'). Default: true | true |
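To make the schema concrete, here is a minimal sketch of how a client might build a call to this tool. The JSON-RPC `tools/call` envelope follows the MCP specification; the `GetTranscriptArgs` interface, the `buildCallRequest` helper, and the `VIDEO_ID` placeholder are illustrative names that do not appear in the server's source.

```typescript
// Arguments accepted by get_transcript, mirroring the input schema above.
// (Hypothetical type name; the server itself only publishes the JSON schema.)
interface GetTranscriptArgs {
  url: string;                  // required: video URL or bare video ID
  lang?: string;                // server default: "en"
  include_timestamps?: boolean; // server default: false
  strip_ads?: boolean;          // server default: true
}

// Build a JSON-RPC 2.0 "tools/call" request for the get_transcript tool.
function buildCallRequest(id: number, args: GetTranscriptArgs) {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "get_transcript", arguments: args },
  };
}

const request = buildCallRequest(1, {
  url: "https://www.youtube.com/watch?v=VIDEO_ID", // placeholder, not a real video
  lang: "ko",
  include_timestamps: true,
  // strip_ads omitted on purpose: the server falls back to its default (true)
});

console.log(JSON.stringify(request, null, 2));
```

Omitted optional fields are simply left out of `arguments`; the handler applies the defaults through destructuring.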

Implementation Reference

  • src/index.ts:16-60 (registration)
    Defines the MCP Tool object for 'get_transcript' including name, description, inputSchema, outputSchema, and annotations. Used for both listing tools and validation.
    const TOOLS: Tool[] = [
      {
        name: "get_transcript",
        description: "Extract transcript from a YouTube video URL or ID. Automatically falls back to available languages if requested language is not available.",
        inputSchema: {
          type: "object",
          properties: {
            url: {
              type: "string",
              description: "YouTube video URL or ID"
            },
            lang: {
              type: "string",
              description: "Language code for transcript (e.g., 'ko', 'en'). Will fall back to available language if not found.",
              default: "en"
            },
            include_timestamps: {
              type: "boolean",
              description: "Include timestamps in output (e.g., '[0:05] text'). Useful for referencing specific moments. Default: false",
              default: false
            },
            strip_ads: {
              type: "boolean",
              description: "Filter out sponsored segments from transcript based on chapter markers (e.g., chapters marked as 'Werbung', 'Ad', 'Sponsor'). Default: true",
              default: true
            }
          },
          required: ["url"]
        },
        // OutputSchema describes structuredContent format for Claude Code
        outputSchema: {
          type: "object",
          properties: {
            meta: {
              type: "string",
              description: "Title | Author | Subs | Views | Date"
            },
            content: { type: "string" }
          },
          required: ["content"]
        },
        annotations: {
          title: "Get Transcript",
          readOnlyHint: true,
          openWorldHint: true,
        },
      },
    ];
  • Primary handler logic for executing the 'get_transcript' tool in response to CallToolRequest. Validates input, extracts video ID, fetches and processes transcript, adds informational notes about language fallback and ad stripping, returns structured MCP response.
    case "get_transcript": {
      const { url: input, lang = "en", include_timestamps = false, strip_ads = true } = args;

      if (!input || typeof input !== 'string') {
        throw new McpError(
          ErrorCode.InvalidParams,
          'URL parameter is required and must be a string'
        );
      }

      if (lang && typeof lang !== 'string') {
        throw new McpError(
          ErrorCode.InvalidParams,
          'Language code must be a string'
        );
      }

      try {
        const videoId = this.extractor.extractYoutubeId(input);
        console.log(`Processing transcript for video: ${videoId}, lang: ${lang}, timestamps: ${include_timestamps}, strip_ads: ${strip_ads}`);

        const result = await this.extractor.getTranscript(videoId, lang, include_timestamps, strip_ads);
        console.log(`Successfully extracted transcript (${result.text.length} chars, lang: ${result.actualLang}, ads stripped: ${result.adsStripped})`);

        // Build transcript with notes
        let transcript = result.text;

        // Add language fallback notice if different from requested
        if (result.actualLang !== lang) {
          transcript = `[Note: Requested language '${lang}' not available. Using '${result.actualLang}'. Available: ${result.availableLanguages.join(', ')}]\n\n${transcript}`;
        }

        // Add ad filtering notice based on what happened
        if (result.adsStripped > 0) {
          // Ads were filtered by chapter markers
          transcript = `[Note: ${result.adsStripped} sponsored segment lines filtered out based on chapter markers]\n\n${transcript}`;
        } else if (strip_ads && result.adChaptersFound === 0) {
          // No chapter markers found - add prompt hint as fallback
          transcript += '\n\n[Note: No chapter markers found. If summarizing, please exclude any sponsored segments or ads from the summary.]';
        }

        // Claude Code v2.0.21+ needs structuredContent for proper display
        return {
          content: [{ type: "text" as const, text: transcript }],
          structuredContent: {
            meta: `${result.metadata.title} | ${result.metadata.author} | ${result.metadata.subscriberCount} subs | ${result.metadata.viewCount} views | ${result.metadata.publishDate}`,
            content: transcript.replace(/[\r\n]+/g, ' ').replace(/\s+/g, ' ')
          }
        };
      } catch (error) {
        console.error('Transcript extraction failed:', error);

        if (error instanceof McpError) {
          throw error;
        }

        throw new McpError(
          ErrorCode.InternalError,
          `Failed to process transcript: ${(error as Error).message}`
        );
      }
    }
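The handler relies on `extractor.extractYoutubeId`, whose body is not shown on this page. The following is a hypothetical stand-in, not the project's implementation: it assumes video IDs are 11 characters drawn from `A-Z`, `a-z`, `0-9`, `_`, and `-`, and it handles only a few common URL shapes.

```typescript
// Hypothetical sketch of an ID extractor; the real extractYoutubeId may differ.
// Assumption: video IDs are exactly 11 characters from [A-Za-z0-9_-].
const VIDEO_ID_PATTERN = /^[A-Za-z0-9_-]{11}$/;

function extractYoutubeId(input: string): string {
  const trimmed = input.trim();

  // Case 1: the caller passed a bare video ID.
  if (VIDEO_ID_PATTERN.test(trimmed)) return trimmed;

  // Case 2: the caller passed a URL.
  let url: URL;
  try {
    url = new URL(trimmed);
  } catch {
    throw new Error(`Not a YouTube URL or video ID: ${input}`);
  }

  const host = url.hostname.replace(/^(www\.|m\.)/, "");
  let candidate: string | null = null;

  if (host === "youtu.be") {
    // Short links carry the ID as the first path segment.
    candidate = url.pathname.split("/")[1] ?? null;
  } else if (host === "youtube.com") {
    if (url.pathname === "/watch") {
      candidate = url.searchParams.get("v");
    } else {
      const match = url.pathname.match(/^\/(?:embed|shorts|live)\/([^/]+)/);
      candidate = match ? match[1] : null;
    }
  }

  if (candidate && VIDEO_ID_PATTERN.test(candidate)) return candidate;
  throw new Error(`Could not find a video ID in: ${input}`);
}

console.log(extractYoutubeId("https://www.youtube.com/watch?v=dQw4w9WgXcQ"));
```

Throwing on unrecognized input matches the handler's pattern of converting any failure into an `McpError` in its `catch` block.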
  • Helper method in YouTubeTranscriptExtractor class that fetches subtitles using getSubtitles, strips ad segments based on chapters if requested, formats the transcript text, and returns processed result with metadata.
    async getTranscript(videoId: string, lang: string, includeTimestamps: boolean, stripAds: boolean): Promise<{
      text: string;
      actualLang: string;
      availableLanguages: string[];
      adsStripped: number;
      adChaptersFound: number;
      metadata: {
        title: string;
        author: string;
        subscriberCount: string;
        viewCount: string;
        publishDate: string;
      };
    }> {
      try {
        const result = await getSubtitles({
          videoID: videoId,
          lang: lang,
          enableFallback: true,
        });

        let lines = result.lines;
        let adsStripped = 0;

        // Filter out lines that fall within ad chapters
        if (stripAds && result.adChapters.length > 0) {
          const originalCount = lines.length;
          lines = lines.filter(line => {
            const lineStartMs = line.start * 1000;
            // Check if this line falls within any ad chapter
            return !result.adChapters.some((ad: AdChapter) =>
              lineStartMs >= ad.startMs && lineStartMs < ad.endMs
            );
          });
          adsStripped = originalCount - lines.length;
          if (adsStripped > 0) {
            console.log(`[youtube-transcript] Filtered ${adsStripped} lines from ${result.adChapters.length} ad chapter(s): ${result.adChapters.map((a: AdChapter) => a.title).join(', ')}`);
          }
        }

        return {
          text: this.formatTranscript(lines, includeTimestamps),
          actualLang: result.actualLang,
          availableLanguages: result.availableLanguages.map((t: CaptionTrack) => t.languageCode),
          adsStripped,
          adChaptersFound: result.adChapters.length,
          metadata: result.metadata
        };
      } catch (error) {
        console.error('Failed to fetch transcript:', error);
        throw new McpError(
          ErrorCode.InternalError,
          `Failed to retrieve transcript: ${(error as Error).message}`
        );
      }
    }
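The ad filter above treats each ad chapter as a half-open interval `[startMs, endMs)` and drops any line whose start time falls inside one. The sketch below isolates that check with made-up sample data; the `stripAdLines` name and the minimal `TranscriptLine` and `AdChapter` shapes are assumptions based only on the fields the excerpt uses.

```typescript
// Minimal shapes inferred from the fields used in the excerpt above.
interface TranscriptLine { text: string; start: number; dur: number } // seconds
interface AdChapter { title: string; startMs: number; endMs: number } // milliseconds

// Keep only lines whose start time lies outside every ad chapter.
// The interval is half-open: a line starting exactly at endMs is kept.
function stripAdLines(lines: TranscriptLine[], adChapters: AdChapter[]): TranscriptLine[] {
  return lines.filter(line => {
    const lineStartMs = line.start * 1000;
    return !adChapters.some(ad => lineStartMs >= ad.startMs && lineStartMs < ad.endMs);
  });
}

// Illustrative data only.
const sampleLines: TranscriptLine[] = [
  { text: "Welcome back", start: 0, dur: 4 },
  { text: "This video is sponsored by", start: 30, dur: 5 },
  { text: "Use the code in the description", start: 59.5, dur: 3 },
  { text: "Now, the main topic", start: 60, dur: 4 },
];
const sampleAds: AdChapter[] = [{ title: "Sponsor", startMs: 30_000, endMs: 60_000 }];

const kept = stripAdLines(sampleLines, sampleAds);
console.log(kept.map(line => line.text)); // ["Welcome back", "Now, the main topic"]
```

Because only the start time is tested, a line that begins just before an ad chapter and runs into it is kept in full.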
  • Formats transcript lines into readable string, optionally with timestamps in [m:ss] or [h:mm:ss] format.
    private formatTranscript(transcript: TranscriptLine[], includeTimestamps: boolean): string {
      if (includeTimestamps) {
        return transcript
          .map(line => {
            const totalSeconds = Math.floor(line.start);
            const hours = Math.floor(totalSeconds / 3600);
            const mins = Math.floor((totalSeconds % 3600) / 60);
            const secs = totalSeconds % 60;
            // Use h:mm:ss for videos > 1 hour, mm:ss otherwise
            const timestamp = hours > 0
              ? `[${hours}:${mins.toString().padStart(2, '0')}:${secs.toString().padStart(2, '0')}]`
              : `[${mins}:${secs.toString().padStart(2, '0')}]`;
            return `${timestamp} ${line.text.trim()}`;
          })
          .filter(text => text.length > 0)
          .join('\n');
      }

      return transcript
        .map(line => line.text.trim())
        .filter(text => text.length > 0)
        .join(' ');
    }
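As a worked example of the timestamp arithmetic, the sketch below pulls the same calculation into a standalone function. The `formatTimestamp` name is hypothetical; the arithmetic mirrors the excerpt.

```typescript
// Convert a start offset in seconds to "[m:ss]" or, from one hour on, "[h:mm:ss]".
function formatTimestamp(startSeconds: number): string {
  const totalSeconds = Math.floor(startSeconds); // drop fractional seconds
  const hours = Math.floor(totalSeconds / 3600);
  const mins = Math.floor((totalSeconds % 3600) / 60);
  const secs = totalSeconds % 60;
  const ss = secs.toString().padStart(2, "0");

  return hours > 0
    ? `[${hours}:${mins.toString().padStart(2, "0")}:${ss}]`
    : `[${mins}:${ss}]`;
}

console.log(formatTimestamp(5.9));  // [0:05]
console.log(formatTimestamp(125));  // [2:05]
console.log(formatTimestamp(3725)); // [1:02:05]
```

For 3725 seconds: 3725 / 3600 gives 1 hour, the remainder 125 gives 2 minutes, and 125 % 60 leaves 5 seconds. Minutes are zero-padded only when an hour component is present.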
  • Core helper function that fetches the transcript from YouTube's internal /youtubei/v1/get_transcript API endpoint. It sends protobuf-encoded parameters, authenticates with visitorData, applies language fallback, and reads captions, chapters, and metadata from the page data.
    export async function getSubtitles(options: {
      videoID: string;
      lang?: string;
      enableFallback?: boolean;
    }): Promise<SubtitleResult> {
      const { videoID, lang = 'en', enableFallback = true } = options;

      // Validate video ID format
      if (!videoID || typeof videoID !== 'string') {
        throw new Error('Invalid video ID: must be a non-empty string');
      }

      // Get page data (visitor data needed for API authentication)
      const { visitorData, availableLanguages, adChapters, metadata } = await getPageData(videoID);

      // Determine which language to use
      let targetLang = lang;
      if (availableLanguages.length > 0) {
        const hasRequestedLang = availableLanguages.some(t => t.languageCode === lang);
        if (!hasRequestedLang && enableFallback) {
          // Try English first
          const hasEnglish = availableLanguages.some(t => t.languageCode === 'en');
          if (hasEnglish) {
            targetLang = 'en';
            console.error(`[youtube-fetcher] Language '${lang}' not available, falling back to 'en'`);
          } else {
            // Use first available
            targetLang = availableLanguages[0].languageCode;
            console.error(`[youtube-fetcher] Language '${lang}' not available, falling back to '${targetLang}'`);
          }
        } else if (!hasRequestedLang) {
          throw new Error(`Language '${lang}' not available. Available: ${availableLanguages.map(t => t.languageCode).join(', ')}`);
        }
      }

      // Build request payload using ANDROID client to avoid FAILED_PRECONDITION errors
      // The ANDROID client bypasses YouTube's A/B test for poToken enforcement
      const params = buildParams(videoID, targetLang);
      const payload = JSON.stringify({
        context: {
          client: {
            hl: targetLang,
            gl: 'US',
            clientName: 'ANDROID',
            clientVersion: ANDROID_CLIENT_VERSION,
            androidSdkVersion: 30,
            visitorData: visitorData
          }
        },
        params: params
      });

      // Make API request
      let response: string;
      try {
        response = await httpsRequest({
          hostname: 'www.youtube.com',
          path: '/youtubei/v1/get_transcript?prettyPrint=false',
          method: 'POST',
          headers: {
            'Content-Type': 'application/json',
            'Content-Length': Buffer.byteLength(payload),
            'User-Agent': ANDROID_USER_AGENT,
            'Origin': 'https://www.youtube.com'
          }
        }, payload);
      } catch (err) {
        throw new Error(`Failed to fetch transcript API: ${(err as Error).message}`);
      }

      // Parse response with error handling
      let json: any;
      try {
        json = JSON.parse(response);
      } catch (err) {
        throw new Error(`Failed to parse YouTube API response: ${(err as Error).message}. Response preview: ${response.substring(0, 200)}`);
      }

      // Check for API-level errors
      if (json.error) {
        const errorMsg = json.error.message || json.error.code || 'Unknown API error';
        throw new Error(`YouTube API error: ${errorMsg}`);
      }

      // Extract transcript segments - handle both WEB and ANDROID response formats
      const webSegments = json?.actions?.[0]?.updateEngagementPanelAction?.content
        ?.transcriptRenderer?.content?.transcriptSearchPanelRenderer?.body
        ?.transcriptSegmentListRenderer?.initialSegments;
      const androidSegments = json?.actions?.[0]?.elementsCommand?.transformEntityCommand
        ?.arguments?.transformTranscriptSegmentListArguments?.overwrite?.initialSegments;
      const segments = webSegments || androidSegments || [];

      if (segments.length === 0) {
        throw new Error('No transcript available for this video. The video may not have captions enabled.');
      }

      // Convert to TranscriptLine format
      const lines = segments
        .filter((seg: any) => seg?.transcriptSegmentRenderer) // Skip section headers
        .map((seg: any) => {
          const renderer = seg.transcriptSegmentRenderer;
          // Handle both WEB format (snippet.runs) and ANDROID format (snippet.elementsAttributedString)
          const webText = renderer?.snippet?.runs?.map((r: any) => r.text || '').join('');
          const androidText = renderer?.snippet?.elementsAttributedString?.content;
          const text = webText || androidText || '';
          const startMs = parseInt(renderer?.startMs || '0', 10);
          const endMs = parseInt(renderer?.endMs || '0', 10);
          return {
            text: text,
            start: startMs / 1000,
            dur: (endMs - startMs) / 1000
          };
        })
        .filter((line: TranscriptLine) => line.text.length > 0);

      return {
        lines,
        requestedLang: lang,
        actualLang: targetLang,
        availableLanguages,
        adChapters,
        metadata
      };
    }
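The language selection inside `getSubtitles` follows a fixed order: the requested language, then English, then the first listed track. The sketch below restates that order as a pure function for illustration; the `pickLanguage` name and the plain `string[]` input are assumptions, since the original works on caption track objects with a `languageCode` field.

```typescript
// Choose the transcript language using the same order as the excerpt above.
// `available` is a list of language codes taken from the video's caption tracks.
function pickLanguage(requested: string, available: string[], enableFallback = true): string {
  // No track list at all: keep the requested code and let the API call decide.
  if (available.length === 0) return requested;

  // Exact match wins.
  if (available.includes(requested)) return requested;

  // Without fallback, a missing language is an error.
  if (!enableFallback) {
    throw new Error(`Language '${requested}' not available. Available: ${available.join(", ")}`);
  }

  // Prefer English, otherwise take the first listed track.
  return available.includes("en") ? "en" : available[0];
}

console.log(pickLanguage("ko", ["de", "en"])); // en
console.log(pickLanguage("ko", ["de", "fr"])); // de
console.log(pickLanguage("ko", []));           // ko
```

The MCP handler always calls `getSubtitles` with `enableFallback: true`, so the error branch is reachable only by callers that use `getSubtitles` directly.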

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/kimtaeyoon83/mcp-server-youtube-transcript'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.