get-segmented-transcript

Split a YouTube video transcript into equal time segments for focused analysis. Extract markdown-formatted text with timestamps, enabling easy navigation and detailed study of specific parts of the content.

Instructions

Divide a video transcript into segments for easier analysis and navigation. This tool splits the video into equal time segments and extracts the transcript for each segment with proper timestamps. Ideal for analyzing the structure of longer videos or when you need to focus on specific parts of the content. Parameters: videoId (required) - The YouTube video ID; segmentCount (optional) - Number of segments to divide the video into (default: 4, max: 10). Returns a markdown-formatted text with each segment clearly labeled with time ranges and containing the relevant transcript text.
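
The equal-time split described above can be sketched as follows. This is a minimal illustration, assuming the total duration is already known in seconds; `segmentBoundaries` and `TimeRange` are hypothetical names, not part of the server's code.

```typescript
// Hypothetical helper: compute the [start, end) time ranges (in seconds)
// for splitting a video of `totalSeconds` into `segmentCount` equal parts.
interface TimeRange {
  startTime: number;
  endTime: number;
}

function segmentBoundaries(totalSeconds: number, segmentCount: number = 4): TimeRange[] {
  // Reject counts that would make the division below meaningless.
  if (!Number.isInteger(segmentCount) || segmentCount < 1) {
    throw new RangeError('segmentCount must be a positive integer');
  }
  const segmentDuration = totalSeconds / segmentCount;
  const ranges: TimeRange[] = [];
  for (let i = 0; i < segmentCount; i++) {
    ranges.push({ startTime: i * segmentDuration, endTime: (i + 1) * segmentDuration });
  }
  return ranges;
}
```

For a 120-second video and the default of 4 segments, each range spans 30 seconds.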

Input Schema

Name          Required  Description  Default
segmentCount  No
videoId       Yes

Implementation Reference

  • Core handler logic for segmenting YouTube video transcripts into equal time-based segments, formatting each with timestamps and transcript text, and returning a structured FormattedTranscript object including markdown output.
    async getSegmentedTranscript(
      videoId: string,
      segmentCount: number = 4
    ): Promise<FormattedTranscript> {
      try {
        // Get full transcript
        const transcriptData = await this.getTranscript(videoId);
    
        // Get video details for title and other metadata
        const videoData = await this.getVideoDetails(videoId);
        const video = videoData.items?.[0];
    
        if (!transcriptData.length) {
          throw new Error('No transcript available for this video');
        }
    
        // Calculate total duration
        const lastSegment = transcriptData[transcriptData.length - 1];
        const totalDuration = (lastSegment.offset + lastSegment.duration) / 1000; // in seconds
    
        // Calculate segment size
        const segmentDuration = totalDuration / segmentCount;
        const segments: {
          startTime: number;
          endTime: number;
          text: string;
          transcriptSegments: TranscriptSegment[];
        }[] = [];
    
        // Create segments
        for (let i = 0; i < segmentCount; i++) {
          const startTime = i * segmentDuration;
          const endTime = (i + 1) * segmentDuration;
    
          // Find all transcript entries that start within this time range.
          // Entries are assigned by start time only, so an entry that straddles
          // a boundary stays in the earlier segment.
          const segmentTranscript = transcriptData.filter(segment => {
            const segmentStartTime = segment.offset / 1000;
            return segmentStartTime >= startTime && segmentStartTime < endTime;
          });
    
          if (segmentTranscript.length > 0) {
            segments.push({
              startTime,
              endTime,
              text: segmentTranscript.map(s => s.text).join(' '),
              transcriptSegments: segmentTranscript
            });
          }
        }
    
        // Create formatted output
        const title = video?.snippet?.title || 'Video Transcript';
        let formattedText = `# Segmented Transcript: ${title}\n\n`;
    
        segments.forEach((segment, index) => {
          const startTimeFormatted = this.formatTimestamp(segment.startTime * 1000);
          const endTimeFormatted = this.formatTimestamp(segment.endTime * 1000);
    
          formattedText += `## Segment ${index + 1} [${startTimeFormatted} - ${endTimeFormatted}]\n\n`;
    
          // Add transcript for this segment
          formattedText += segment.transcriptSegments.map(s =>
            `[${this.formatTimestamp(s.offset)}] ${s.text}`
          ).join('\n');
    
          formattedText += '\n\n';
        });
    
        return {
          segments: transcriptData,
          totalSegments: transcriptData.length,
          duration: totalDuration,
          format: 'timestamped',
          text: formattedText,
          metadata: video ? [{
            id: video.id,
            title: video.snippet?.title,
            channelId: video.snippet?.channelId,
            channelTitle: video.snippet?.channelTitle,
            publishedAt: video.snippet?.publishedAt,
            duration: video.contentDetails?.duration,
            viewCount: video.statistics?.viewCount,
            likeCount: video.statistics?.likeCount
          }] : undefined
        };
      } catch (error) {
        console.error('Error creating segmented transcript:', error);
        throw error;
      }
    }
  • src/index.ts:759-789 (registration)
    MCP tool registration using server.tool(), including full description, Zod input schema, and wrapper async handler that parses parameters, calls YouTubeService.getSegmentedTranscript, and formats the response as MCP content.
    server.tool(
      'get-segmented-transcript',
      'Divide a video transcript into segments for easier analysis and navigation. This tool splits the video into equal time segments and extracts the transcript for each segment with proper timestamps. Ideal for analyzing the structure of longer videos or when you need to focus on specific parts of the content. Parameters: videoId (required) - The YouTube video ID; segmentCount (optional) - Number of segments to divide the video into (default: 4, max: 10). Returns a markdown-formatted text with each segment clearly labeled with time ranges and containing the relevant transcript text.',
      {
        videoId: z.string().min(1),
        segmentCount: z.string().optional()
      },
      async ({ videoId, segmentCount }) => {
        try {
          // Convert the string segmentCount to a number
          const segmentCountNum = segmentCount ? parseInt(segmentCount, 10) : 4;
    
          const segmentedTranscript = await youtubeService.getSegmentedTranscript(videoId, segmentCountNum);
    
          return {
            content: [{
              type: 'text',
              text: segmentedTranscript.text || 'Failed to create segmented transcript'
            }]
          };
        } catch (error) {
          return {
            content: [{
              type: 'text',
              text: `Error creating segmented transcript: ${error instanceof Error ? error.message : String(error)}`
            }],
            isError: true
          };
        }
      }
    );
  • Zod validation schema for tool parameters: videoId (required non-empty string), segmentCount (optional string).
    {
      videoId: z.string().min(1),
      segmentCount: z.string().optional()
    },
  • TypeScript interface defining the structure of the output FormattedTranscript used by the getSegmentedTranscript method.
    export interface FormattedTranscript {
      segments: TranscriptSegment[];
      totalSegments: number;
      duration: number; // Total duration in seconds
      format: string;
      text?: string; // Formatted text (for timestamped and merged formats)
      metadata?: Array<VideoMetadata | null>;
    }
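
The description advertises a default of 4 and a maximum of 10, while the schema shown above accepts any optional string and the wrapper passes the result of `parseInt` straight to the service. A minimal sketch of how those bounds could be enforced before the call; `parseSegmentCount`, its constants, and its fallback choices are assumptions, not the server's code.

```typescript
// Hypothetical helper (not in the source): turn the optional string
// parameter into a number, applying the documented default and maximum.
const DEFAULT_SEGMENT_COUNT = 4;
const MAX_SEGMENT_COUNT = 10;

function parseSegmentCount(raw?: string): number {
  if (raw === undefined || raw.trim() === '') return DEFAULT_SEGMENT_COUNT;
  const parsed = Number.parseInt(raw, 10);
  // Non-numeric input falls back to the default (an assumption of this sketch).
  if (Number.isNaN(parsed)) return DEFAULT_SEGMENT_COUNT;
  // Clamp into [1, MAX_SEGMENT_COUNT] so dividing by the count is always safe.
  return Math.min(Math.max(parsed, 1), MAX_SEGMENT_COUNT);
}
```

Clamping rather than rejecting keeps the tool usable when an agent passes an out-of-range value; returning a validation error instead would be an equally reasonable choice.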
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It effectively describes key behaviors: it splits transcripts into equal time segments, extracts text with timestamps, returns markdown-formatted output, and specifies defaults (segmentCount default: 4) and limits (max: 10). However, it does not cover potential errors (e.g., invalid videoId) or performance aspects like rate limits, leaving some gaps.
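
For reference, the error path that the description leaves out can be read off the wrapper's catch block shown earlier. The sketch below rebuilds that payload in isolation; `toErrorResult` and `ToolResult` are hypothetical names introduced only for illustration.

```typescript
// Shape of a text-only tool result, reduced to the fields the wrapper uses.
interface ToolResult {
  content: { type: 'text'; text: string }[];
  isError?: boolean;
}

// Hypothetical helper that builds the same error payload as the wrapper's
// catch block: a single text item plus the isError flag.
function toErrorResult(error: unknown): ToolResult {
  const message = error instanceof Error ? error.message : String(error);
  return {
    content: [{ type: 'text', text: `Error creating segmented transcript: ${message}` }],
    isError: true,
  };
}
```

A video without captions would therefore surface the handler's 'No transcript available for this video' message inside a text item rather than as a thrown exception.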

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is appropriately sized and front-loaded, with the core purpose in the first sentence, usage context in the second, and parameter details in the third. Every sentence adds value without redundancy, making it efficient and easy to parse for an AI agent.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (a processing tool with 2 parameters), no annotations, and no output schema, the description does well by covering purpose, usage, parameters, and output format. However, it lacks details on error handling or edge cases (e.g., what happens with very short videos), which would make it more complete for safe invocation.
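
The short-video case can be worked through against the handler's bucketing logic. The sketch below reproduces only that logic under simplifying assumptions; `Entry` and `bucketTranscript` are hypothetical names, and the sample transcript is invented for illustration.

```typescript
// Minimal stand-in for the transcript entry type; offset and duration are in
// milliseconds, matching how the handler divides them by 1000.
interface Entry {
  text: string;
  offset: number;
  duration: number;
}

// Mirrors the handler's bucketing: an entry belongs to the bucket that
// contains its start time, and buckets with no entries are skipped.
function bucketTranscript(entries: Entry[], segmentCount: number): Entry[][] {
  // The handler throws on an empty transcript; returning [] keeps the sketch simple.
  if (entries.length === 0) return [];
  const last = entries[entries.length - 1];
  const totalDuration = (last.offset + last.duration) / 1000;
  const segmentDuration = totalDuration / segmentCount;
  const buckets: Entry[][] = [];
  for (let i = 0; i < segmentCount; i++) {
    const startTime = i * segmentDuration;
    const endTime = (i + 1) * segmentDuration;
    const inRange = entries.filter(e => {
      const start = e.offset / 1000;
      return start >= startTime && start < endTime;
    });
    if (inRange.length > 0) buckets.push(inRange);
  }
  return buckets;
}

// A 10-second transcript with only two entries, split into 4 segments.
const shortVideo: Entry[] = [
  { text: 'hello', offset: 0, duration: 1000 },
  { text: 'bye', offset: 9000, duration: 1000 },
];
const buckets = bucketTranscript(shortVideo, 4);
console.log(buckets.length); // 2
```

Because empty buckets are dropped and headings are numbered by position in the resulting list, the output for this input would show two segments, not the four requested.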

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema description coverage is 0%, so the description must fully compensate. It does so by clearly explaining both parameters: 'videoId (required) - The YouTube video ID' and 'segmentCount (optional) - Number of segments to divide the video into (default: 4, max: 10)'. This adds essential meaning beyond the bare schema, including requirements, defaults, and constraints.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with specific verbs ('divide', 'split', 'extract') and resource ('video transcript'), distinguishing it from siblings like 'get-video-transcript' (which likely returns the full transcript) and 'get-key-moments' (which likely identifies highlights rather than equal segments). It explicitly mentions what the tool does beyond just retrieving a transcript.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for when to use this tool ('for easier analysis and navigation', 'ideal for analyzing the structure of longer videos or when you need to focus on specific parts'), but it does not explicitly mention when not to use it or name alternatives (e.g., 'get-video-transcript' for the full transcript). This gives good guidance but lacks explicit exclusions or sibling comparisons.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/coyaSONG/youtube-mcp-server'
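
The same request can be issued from TypeScript. This is a minimal sketch, assuming a runtime with a global `fetch` (such as Node 18 or later) and assuming the endpoint returns JSON; `serverUrl` and `fetchServerInfo` are hypothetical helper names, and the response shape is deliberately left untyped because it is not documented here.

```typescript
// Base URL taken from the curl example above.
const MCP_API_BASE = 'https://glama.ai/api/mcp/v1/servers';

// Build the server URL from an owner and repository name.
function serverUrl(owner: string, repo: string): string {
  return `${MCP_API_BASE}/${encodeURIComponent(owner)}/${encodeURIComponent(repo)}`;
}

// Fetch the server record. The body is parsed as JSON (an assumption) and
// returned as `unknown` rather than a guessed type.
async function fetchServerInfo(owner: string, repo: string): Promise<unknown> {
  const response = await fetch(serverUrl(owner, repo));
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return response.json();
}
```

Calling `fetchServerInfo('coyaSONG', 'youtube-mcp-server')` requests the same URL as the curl command.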

If you have feedback or need assistance with the MCP directory API, please join our Discord server.