YouTube Transcript DL MCP Server

by jedarden

get_bulk_transcripts

Extract transcripts from multiple YouTube videos simultaneously in various languages and formats for batch processing.

Instructions

Extract transcripts from multiple YouTube videos

Input Schema

Name             Required  Description                              Default
videoIds         Yes       Array of YouTube video IDs or URLs
language         No        Language code (e.g., "en", "es", "fr")   en
outputFormat     No        Output format: text, json, or srt        json
includeMetadata  No        Include metadata in response             true
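
A caller may want to check arguments against this schema before invoking the tool. The following is a minimal sketch, not part of the server: `parseBulkArgs`, `BulkTranscriptArgs`, and `isOutputFormat` are hypothetical names, and the defaults are taken from the registered input schema shown in the Implementation Reference.

```typescript
// Hypothetical client-side mirror of the input schema; not part of the server code.
type OutputFormat = 'text' | 'json' | 'srt';

interface BulkTranscriptArgs {
  videoIds: string[];
  language: string;
  outputFormat: OutputFormat;
  includeMetadata: boolean;
}

const OUTPUT_FORMATS: readonly string[] = ['text', 'json', 'srt'];

function isOutputFormat(value: unknown): value is OutputFormat {
  return typeof value === 'string' && OUTPUT_FORMATS.indexOf(value) !== -1;
}

// Validates raw arguments and fills in the schema defaults (en, json, true).
function parseBulkArgs(raw: unknown): BulkTranscriptArgs {
  if (typeof raw !== 'object' || raw === null) {
    throw new Error('arguments must be an object');
  }
  const args = raw as Record<string, unknown>;

  const rawIds: unknown = args.videoIds;
  if (!Array.isArray(rawIds) || rawIds.length === 0) {
    throw new Error('videoIds array is required');
  }
  // Keep only non-empty strings; any dropped entry means the input was invalid.
  const videoIds = rawIds.filter(
    (id): id is string => typeof id === 'string' && id.length > 0
  );
  if (videoIds.length !== rawIds.length) {
    throw new Error('videoIds must contain only non-empty strings');
  }

  // Note: null is treated like a missing value here and falls back to the default.
  const language: unknown = args.language ?? 'en';
  const outputFormat: unknown = args.outputFormat ?? 'json';
  const includeMetadata: unknown = args.includeMetadata ?? true;

  if (typeof language !== 'string') {
    throw new Error('language must be a string');
  }
  if (!isOutputFormat(outputFormat)) {
    throw new Error('outputFormat must be one of: text, json, srt');
  }
  if (typeof includeMetadata !== 'boolean') {
    throw new Error('includeMetadata must be a boolean');
  }
  return { videoIds, language, outputFormat, includeMetadata };
}
```

Calling `parseBulkArgs({ videoIds: ['VIDEO_ID_1'] })` yields an object with `language: 'en'`, `outputFormat: 'json'`, and `includeMetadata: true`; an empty `videoIds` array throws.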

Implementation Reference

  • The main handler function for the 'get_bulk_transcripts' tool in the MCP server. It validates input, constructs the request object, delegates to the transcript service, and returns a formatted MCP response.
    private async handleGetBulkTranscripts(args: any) {
      const { videoIds, language = 'en', outputFormat = 'json', includeMetadata = true } = args;
      
      if (!videoIds || !Array.isArray(videoIds) || videoIds.length === 0) {
        throw new McpError(ErrorCode.InvalidParams, 'videoIds array is required');
      }
    
      const request = {
        videoIds,
        language,
        outputFormat,
        includeMetadata
      };
    
      const result = await this.transcriptService.getBulkTranscripts(request);
      
      return {
        content: [{
          type: 'text',
          text: JSON.stringify(result, null, 2)
        }]
      };
    }
  • Registration of the 'get_bulk_transcripts' tool in the server's list of available tools, defining its metadata and input schema for the MCP ListTools request.
    {
      name: 'get_bulk_transcripts',
      description: 'Extract transcripts from multiple YouTube videos',
      inputSchema: {
        type: 'object',
        properties: {
          videoIds: {
            type: 'array',
            items: { type: 'string' },
            description: 'Array of YouTube video IDs or URLs'
          },
          language: {
            type: 'string',
            description: 'Language code (e.g., "en", "es", "fr")',
            default: 'en'
          },
          outputFormat: {
            type: 'string',
            enum: ['text', 'json', 'srt'],
            description: 'Output format',
            default: 'json'
          },
          includeMetadata: {
            type: 'boolean',
            description: 'Include metadata in response',
            default: true
          }
        },
        required: ['videoIds']
      }
    },
  • TypeScript type definitions for the BulkTranscriptRequest (input) and BulkTranscriptResponse (output) used by the tool implementation.
    export interface BulkTranscriptRequest {
      videoIds: string[];
      outputFormat: 'text' | 'json' | 'srt';
      language?: string;
      includeMetadata?: boolean;
    }
    
    export interface BulkTranscriptResponse {
      results: TranscriptResponse[];
      errors: Array<{
        videoId: string;
        error: string;
      }>;
      summary: {
        total: number;
        successful: number;
        failed: number;
      };
    }
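  • Helper service method containing the core logic for fetching bulk transcripts. It invokes a Python script for extraction, processes the results, handles errors, and returns structured responses. As shown, only the video IDs and the language are passed to the script; outputFormat and includeMetadata are not used in this method.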
    public async getBulkTranscripts(
      request: BulkTranscriptRequest
    ): Promise<BulkTranscriptResponse> {
      try {
        this.logger.info(`Processing bulk request for ${request.videoIds.length} videos`);
    
        // Call Python script for bulk processing
        const videoIds = request.videoIds.map(id => this.extractVideoId(id)).join(',');
        const command = `python3 "${this.pythonScript}" bulk --video-ids "${videoIds}" --language "${request.language || 'en'}"`;
        
        const { stdout, stderr } = await execAsync(command);
    
        if (stderr) {
          this.logger.warn(`Python script warning: ${stderr}`);
        }
    
        const pythonResult: PythonBulkResult = JSON.parse(stdout);
    
        if (!pythonResult.success) {
          throw new Error('Bulk processing failed');
        }
    
        // Convert results to our format
        const results: TranscriptResponse[] = [];
        for (const result of pythonResult.results) {
          const transcript: TranscriptItem[] = result.transcript.map(item => ({
            text: item.text,
            start: item.start,
            duration: item.duration
          }));
    
          results.push({
            videoId: result.videoId,
            title: await this.getVideoTitle(result.videoId),
            language: result.language,
            transcript,
            metadata: {
              extractedAt: new Date().toISOString(),
              source: 'youtube-transcript-api',
              duration: result.metadata?.duration || transcript.reduce((acc, item) => acc + item.duration, 0)
            }
          });
        }
    
        return {
          results,
          errors: pythonResult.errors,
          summary: pythonResult.summary
        };
    
      } catch (error) {
        this.logger.error(`Failed to process bulk request:`, error);
        return {
          results: [],
          errors: request.videoIds.map(videoId => ({
            videoId: this.extractVideoId(videoId),
            error: error instanceof Error ? error.message : 'Unknown error'
          })),
          summary: {
            total: request.videoIds.length,
            successful: 0,
            failed: request.videoIds.length
          }
        };
      }
    }
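
The service method above builds a shell command by string interpolation, so a video ID or language value containing quotes or shell metacharacters would reach the shell. One common alternative is to pass arguments as an array to `execFile`, which does not spawn a shell. The following is a sketch under stated assumptions, not the project's code: `buildBulkArgv` and `runBulkScript` are hypothetical helpers, the 11-character ID pattern is an assumption about YouTube video IDs, and the `maxBuffer` value is an arbitrary choice. The `bulk`, `--video-ids`, and `--language` arguments are copied from the command shown above.

```typescript
import { execFile } from 'node:child_process';
import { promisify } from 'node:util';

const execFileAsync = promisify(execFile);

// Assumption: video IDs are 11 characters drawn from [A-Za-z0-9_-].
const VIDEO_ID_PATTERN = /^[A-Za-z0-9_-]{11}$/;

// Hypothetical helper: builds the argument vector for the Python script.
// Arguments are passed as separate array entries, so no shell quoting is involved.
function buildBulkArgv(
  scriptPath: string,
  videoIds: string[],
  language: string = 'en'
): string[] {
  const invalid = videoIds.filter(id => !VIDEO_ID_PATTERN.test(id));
  if (invalid.length > 0) {
    throw new Error(`Invalid video IDs: ${invalid.join(', ')}`);
  }
  return [scriptPath, 'bulk', '--video-ids', videoIds.join(','), '--language', language];
}

// Hypothetical helper: runs the script without a shell and returns its stdout.
async function runBulkScript(
  scriptPath: string,
  videoIds: string[],
  language: string = 'en'
): Promise<string> {
  const { stdout } = await execFileAsync(
    'python3',
    buildBulkArgv(scriptPath, videoIds, language),
    { maxBuffer: 64 * 1024 * 1024 } // arbitrary headroom for large transcripts
  );
  return stdout;
}
```

Validating IDs before building the argument vector also means a malformed ID is reported up front rather than surfacing as a script failure.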
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries full burden for behavioral disclosure but only states the basic operation. It doesn't mention whether this is a read-only operation, potential rate limits, authentication requirements, error handling (e.g., for invalid video IDs), or what happens when transcripts are unavailable. For a bulk operation tool with zero annotation coverage, this is a significant gap.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that gets straight to the point with zero wasted words. It's appropriately sized for a tool with clear parameters documented elsewhere and follows good front-loading principles.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a bulk operation tool with 4 parameters, no annotations, and no output schema, the description is incomplete. It doesn't explain the return format, error conditions, performance characteristics, or how results are structured for multiple videos. The agent would need to guess about important behavioral aspects.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
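
Although the description does not document the return format, the implementation reference shows the handler returning the `BulkTranscriptResponse` serialized as JSON in a text content item. The sketch below shows one way a client might read that payload; `describeOutcome` and the trimmed-down `TranscriptResult` type are hypothetical, and only fields that appear in the listed code are used.

```typescript
// Local copies of the shapes shown in the implementation reference.
// TranscriptResult is a hypothetical subset of the server's TranscriptResponse.
interface TranscriptItem { text: string; start: number; duration: number }
interface TranscriptResult { videoId: string; language: string; transcript: TranscriptItem[] }
interface BulkError { videoId: string; error: string }
interface BulkSummary { total: number; successful: number; failed: number }
interface BulkTranscriptResponse {
  results: TranscriptResult[];
  errors: BulkError[];
  summary: BulkSummary;
}

// Hypothetical helper: turns the JSON text returned by the tool into report lines.
function describeOutcome(responseText: string): string[] {
  const response = JSON.parse(responseText) as BulkTranscriptResponse;
  const lines = [
    `${response.summary.successful}/${response.summary.total} transcripts extracted`
  ];
  for (const result of response.results) {
    lines.push(`${result.videoId}: ${result.transcript.length} segments (${result.language})`);
  }
  // Failures are reported per video in `errors` rather than by throwing.
  for (const failure of response.errors) {
    lines.push(`${failure.videoId}: failed (${failure.error})`);
  }
  return lines;
}
```

Because failures arrive in `errors` alongside any successful `results`, a caller should inspect `summary.failed` instead of assuming that a returned response means every video succeeded.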

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description doesn't add any parameter-specific information beyond what's already in the schema (which has 100% coverage). It doesn't explain relationships between parameters, provide examples, or clarify semantics like what 'includeMetadata' actually includes. With high schema coverage, the baseline is 3, but no additional value is added.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Extract transcripts') and resource ('from multiple YouTube videos'), making the purpose immediately understandable. However, it doesn't explicitly differentiate from sibling tools like 'get_transcript' (single video) or 'get_playlist_transcripts' (playlist-based), which would be needed for a perfect score.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives like 'get_transcript' (single video) or 'get_playlist_transcripts' (playlist-based). It also doesn't mention prerequisites, rate limits, or error conditions, leaving the agent with insufficient context for optimal tool selection.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
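
As an illustration of that kind of guidance, a description along the following lines would name the sibling tools and the shape of the result. This is illustrative wording only, not the server's actual description: the constant name is hypothetical, the sibling tool names are the ones mentioned in the review above, and the result fields are those of the `BulkTranscriptResponse` interface.

```typescript
// Illustrative wording only; not the description registered by the server.
const bulkTranscriptsDescription: string = [
  'Extract transcripts from multiple YouTube videos in one call.',
  'Accepts an array of video IDs or URLs in videoIds.',
  'Returns results (one entry per extracted transcript),',
  'errors (videoId and error message for each failed video),',
  'and summary (total, successful, and failed counts).',
  'For a single video use get_transcript;',
  'for a whole playlist use get_playlist_transcripts.'
].join(' ');
```

The text stays short while covering purpose, input, output shape, and when to prefer a sibling tool.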

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/jedarden/yt-transcript-dl-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.