get_summary

Generate summaries of YouTube videos in multiple languages and formats, extracting key information from video content for quick understanding.

Instructions

Get summary for a YouTube video

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| videoId | Yes | YouTube video ID | |
| lang | No | Target language (default: zh-tw) | zh-tw |
| mode | No | Summary mode (default: narrative) | narrative |
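
A minimal sketch of how a client might assemble a call to this tool, assuming the standard MCP `tools/call` JSON-RPC method. The helper `buildGetSummaryCall`, the placeholder video ID, and the `'en'` language value are illustrative assumptions; the schema only states that `lang` is a string and that `mode` is `narrative` or `bullet`.

```typescript
// Arguments accepted by get_summary, per the input schema above.
interface GetSummaryArgs {
  videoId: string;
  lang?: string;                  // server falls back to 'zh-tw' when omitted
  mode?: 'narrative' | 'bullet';  // server falls back to 'narrative' when omitted
}

// Shape of an MCP tools/call request (JSON-RPC 2.0).
interface ToolCallRequest {
  jsonrpc: '2.0';
  id: number;
  method: 'tools/call';
  params: { name: string; arguments: GetSummaryArgs };
}

// Hypothetical helper: wraps the arguments in a tools/call envelope.
function buildGetSummaryCall(id: number, args: GetSummaryArgs): ToolCallRequest {
  return {
    jsonrpc: '2.0',
    id,
    method: 'tools/call',
    params: { name: 'get_summary', arguments: args },
  };
}

// Example: request a bullet-point summary (placeholder video ID, assumed language code).
const request = buildGetSummaryCall(1, { videoId: 'VIDEO_ID', lang: 'en', mode: 'bullet' });
console.log(JSON.stringify(request, null, 2));
```

Only `videoId` is required; leaving out `lang` and `mode` lets the server apply its own defaults.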

Implementation Reference

  • Main execution handler for the 'get_summary' MCP tool. It validates the input, fetches YouTube video metadata and captions using the InnerTube API, selects an appropriate caption track, and calls the DeepSRT API for the video summary and the title translation. It then formats a Markdown response with the title, metadata, and summary.
      private async handleGetSummary(args: any): Promise<CallToolResult> {
        if (!this.isValidSummaryArgs(args)) {
          throw new McpError(
            ErrorCode.InvalidParams,
            'Invalid summary arguments. Required: videoId'
          );
        }
    
        try {
          const videoId = args.videoId;
          const lang = args.lang || 'zh-tw';
          const mode = args.mode || 'narrative';
    
          // Step 1: Get video info and captions from YouTube
          const videoInfo = await this.getVideoInfo(videoId);
          
          if (!videoInfo.videoDetails) {
            throw new Error('Could not fetch video details');
          }
    
          const { title, author, lengthSeconds } = videoInfo.videoDetails;
          
          // Step 2: Extract captions
          const captionTracks = videoInfo.captions?.playerCaptionsTracklistRenderer?.captionTracks;
          if (!captionTracks || captionTracks.length === 0) {
            throw new Error('No captions available for this video');
          }
    
          // Step 3: Select best caption
          const selectedCaption = this.selectBestCaption(captionTracks);
          if (!selectedCaption) {
            throw new Error('No suitable captions found');
          }
    
          // Step 4: Extract transcript argument from caption URL
          const transcriptArg = new URL(selectedCaption.baseUrl).search.slice(1);
    
          // Step 5: Call DeepSRT API for summarization
          const summaryParams = new URLSearchParams({
            v: videoId,
            action: 'summarize',
            lang: lang,
            mode: mode
          });
    
          const titleParams = new URLSearchParams({
            v: videoId,
            txt: title,
            action: 'translate',
            lang: lang,
            mode: mode
          });
    
          const [summaryResponse, titleResponse] = await Promise.all([
            this.axiosInstance.get(`https://worker.deepsrt.com/transcript2?${summaryParams}`, {
              headers: {
                'Accept': 'application/json',
                'X-Transcript-Arg': transcriptArg,
                'User-Agent': 'DeepSRT-CLI/1.5.4'
              }
            }),
            this.axiosInstance.get(`https://worker.deepsrt.com/transcript2?${titleParams}`, {
              headers: {
                'Accept': 'application/json',
                'X-Transcript-Arg': transcriptArg,
                'User-Agent': 'DeepSRT-CLI/1.5.4'
              }
            })
          ]);
    
          const summaryData = summaryResponse.data;
          const titleData = titleResponse.data;
    
          // Format response
          const translatedTitle = titleData.success ? 
            (titleData.result || titleData.translation || title) : title;
          
          const summaryText = summaryData.summary || summaryData.result || summaryData.content;
    
          if (!summaryText) {
            return {
              content: [
                {
                  type: 'text',
                  text: `Error getting summary: ${summaryData.error || 'No summary generated'}`
                }
              ],
              isError: true
            };
          }
    
          // Format summary with video information
          // Parse once with an explicit radix, then render as minutes:seconds
          const totalSeconds = parseInt(lengthSeconds, 10);
          const duration = Math.floor(totalSeconds / 60) + ':' +
                          (totalSeconds % 60).toString().padStart(2, '0');
    
          const formattedSummary = `# ${translatedTitle}
    
    **Author:** ${author}  
    **Duration:** ${duration}  
    **Language:** ${lang}  
    **Mode:** ${mode}
    
    ## Summary
    
    ${summaryText}
    
    ---
    *Generated using DeepSRT MCP Server*`;
    
          return {
            content: [
              {
                type: 'text',
                text: formattedSummary
              }
            ]
          };
    
        } catch (error) {
          return {
            content: [
              {
                type: 'text',
                text: `Error getting summary: ${error instanceof Error ? error.message : String(error)}`
              }
            ],
            isError: true
          };
        }
      }
  • JSON Schema defining the input parameters for the 'get_summary' tool: videoId (required string), lang (optional string, default 'zh-tw'), mode (optional enum ['narrative','bullet'], default 'narrative').
    inputSchema: {
      type: 'object',
      properties: {
        videoId: {
          type: 'string',
          description: 'YouTube video ID',
        },
        lang: {
          type: 'string',
          description: 'Target language (default: zh-tw)',
          default: 'zh-tw',
        },
        mode: {
          type: 'string',
          enum: ['narrative', 'bullet'],
          description: 'Summary mode (default: narrative)',
          default: 'narrative',
        },
      },
      required: ['videoId'],
    },
  • src/index.ts:68-92 (registration)
    Tool registration in the ListTools response, defining name 'get_summary', description, and inputSchema.
    {
      name: 'get_summary',
      description: 'Get summary for a YouTube video',
      inputSchema: {
        type: 'object',
        properties: {
          videoId: {
            type: 'string',
            description: 'YouTube video ID',
          },
          lang: {
            type: 'string',
            description: 'Target language (default: zh-tw)',
            default: 'zh-tw',
          },
          mode: {
            type: 'string',
            enum: ['narrative', 'bullet'],
            description: 'Summary mode (default: narrative)',
            default: 'narrative',
          },
        },
        required: ['videoId'],
      },
    },
  • src/index.ts:117-120 (registration)
    Dispatch logic in CallToolRequest handler that routes 'get_summary' calls to the handleGetSummary method.
    if (request.params.name === 'get_summary') {
      return this.handleGetSummary(request.params.arguments);
    } else if (request.params.name === 'get_transcript') {
      return this.handleGetTranscript(request.params.arguments);
  • Validation helper function for get_summary arguments, type-guarding the expected input shape.
    private isValidSummaryArgs(
      args: any
    ): args is { videoId: string; lang?: string; mode?: SummaryMode } {
      return (
        typeof args === 'object' &&
        args !== null &&
        typeof args.videoId === 'string' &&
        args.videoId.length > 0 &&
        (args.lang === undefined || typeof args.lang === 'string') &&
        (args.mode === undefined ||
          args.mode === 'narrative' ||
          args.mode === 'bullet')
      );
    }
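
Two small steps in the main handler are pure functions of their input and can be tried in isolation: extracting the query string that is sent as the `X-Transcript-Arg` header, and formatting `lengthSeconds` as minutes and seconds. The sketch below restates them as standalone helpers. The names `extractTranscriptArg` and `formatDuration` and the sample URL are illustrative assumptions, not part of the server's code.

```typescript
// Returns the query string of a caption track URL without the leading '?',
// mirroring step 4 of the handler.
function extractTranscriptArg(baseUrl: string): string {
  return new URL(baseUrl).search.slice(1);
}

// Formats a seconds count (passed as a string, as the handler's parseInt call
// suggests) as minutes:seconds. Hours are not split out, so a video of
// 3661 seconds renders as "61:01".
function formatDuration(lengthSeconds: string): string {
  const total = parseInt(lengthSeconds, 10);
  const minutes = Math.floor(total / 60);
  const seconds = (total % 60).toString().padStart(2, '0');
  return `${minutes}:${seconds}`;
}

// Illustrative caption URL; real baseUrl values come from the captionTracks entries.
console.log(extractTranscriptArg('https://example.com/api/timedtext?v=VIDEO_ID&lang=en'));
// prints "v=VIDEO_ID&lang=en"
console.log(formatDuration('125'));  // prints "2:05"
console.log(formatDuration('3661')); // prints "61:01"
```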
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It states what the tool does but doesn't mention any behavioral traits such as rate limits, authentication needs, error handling, or what the summary output looks like (e.g., format, length). This leaves significant gaps for a tool that likely interacts with external APIs.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
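
The handler in the Implementation Reference does show some of this behavior: invalid arguments raise an `McpError`, while failures after validation come back as an ordinary result with `isError: true` and the message in a text item. A caller-side sketch of reading such a result follows. It assumes only the fields the handler sets; `ToolResult` and `readSummary` are hypothetical names, not the full MCP `CallToolResult` type.

```typescript
// Minimal local model of the result returned by the handler (an assumption
// based on the fields it sets).
interface ToolResult {
  content: { type: string; text?: string }[];
  isError?: boolean;
}

// Joins the text parts of a result; throws when the tool reported an error.
function readSummary(result: ToolResult): string {
  const text = result.content
    .filter((item) => item.type === 'text' && typeof item.text === 'string')
    .map((item) => item.text as string)
    .join('\n');
  if (result.isError) {
    throw new Error(text || 'get_summary failed without a message');
  }
  return text;
}

// A successful result carries the Markdown summary as a single text item.
const ok: ToolResult = { content: [{ type: 'text', text: '# Title\n\n## Summary\n...' }] };
console.log(readSummary(ok));

// A failed result carries the error message and isError: true.
const failed: ToolResult = {
  content: [{ type: 'text', text: 'Error getting summary: No captions available for this video' }],
  isError: true,
};
try {
  readSummary(failed);
} catch (err) {
  console.error((err as Error).message);
}
```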

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that directly states the tool's purpose without any wasted words. It is appropriately sized and front-loaded, making it easy to parse quickly.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the lack of annotations and output schema, the description is incomplete. It doesn't address key contextual aspects such as the summary format, potential errors, or how the tool differs from its sibling 'get_transcript'. For a tool with external dependencies (the YouTube InnerTube API and the DeepSRT API), more information on behavior and constraints is needed.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents all parameters (videoId, lang, mode) with descriptions and defaults. The description adds no additional meaning beyond what the schema provides, such as explaining what 'narrative' vs 'bullet' modes entail or how the lang parameter affects the summary.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
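
The schema's defaults and the `mode` enum can be made explicit in types. A minimal sketch follows. `SummaryMode` is referenced by the validation helper but its definition is not shown on this page, so the union below is an assumption, and `withDefaults` is a hypothetical helper.

```typescript
// Assumed definition; the page only shows the two allowed values.
type SummaryMode = 'narrative' | 'bullet';

interface SummaryArgs {
  videoId: string;
  lang?: string;
  mode?: SummaryMode;
}

// Fills in the schema defaults, using ?? so only null/undefined trigger them.
function withDefaults(args: SummaryArgs): Required<SummaryArgs> {
  return {
    videoId: args.videoId,
    lang: args.lang ?? 'zh-tw',
    mode: args.mode ?? 'narrative',
  };
}

console.log(withDefaults({ videoId: 'VIDEO_ID' }));
console.log(withDefaults({ videoId: 'VIDEO_ID', mode: 'bullet' }));
```

The handler itself uses `||`, so an empty-string `lang` also falls back to `zh-tw` there; with `??` an empty string would be passed through unchanged. Which behavior is preferable depends on whether an empty value should count as "not provided".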

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Get summary') and resource ('for a YouTube video'), making the purpose immediately understandable. The tool can be told apart from its sibling 'get_transcript' by its focus on summaries rather than transcripts, though the description does not state this distinction explicitly.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus alternatives like 'get_transcript'. The description lacks context about prerequisites, limitations, or scenarios where this tool is preferred, leaving the agent with no usage direction beyond the basic purpose.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/DeepSRT/deepsrt-mcp'
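
The same request can be sketched in TypeScript. This assumes a runtime with a global `fetch` (for example Node 18 or later) and that the endpoint responds with JSON. The `{owner}/{name}` path pattern is inferred from the single URL above, and `buildServerUrl` and `fetchServerInfo` are hypothetical helper names.

```typescript
// Builds the directory API URL for a server, given its owner and name.
function buildServerUrl(owner: string, name: string): string {
  return `https://glama.ai/api/mcp/v1/servers/${encodeURIComponent(owner)}/${encodeURIComponent(name)}`;
}

// Fetches the server record. The response schema is not documented on this
// page, so the parsed JSON is returned as unknown.
async function fetchServerInfo(owner: string, name: string): Promise<unknown> {
  const response = await fetch(buildServerUrl(owner, name), {
    headers: { Accept: 'application/json' },
  });
  if (!response.ok) {
    throw new Error(`Directory API request failed with status ${response.status}`);
  }
  return response.json();
}

console.log(buildServerUrl('DeepSRT', 'deepsrt-mcp'));
// prints "https://glama.ai/api/mcp/v1/servers/DeepSRT/deepsrt-mcp"
```

Calling `fetchServerInfo('DeepSRT', 'deepsrt-mcp')` performs the network request; the caller should inspect the returned value before relying on any particular field.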

If you have feedback or need assistance with the MCP directory API, please join our Discord server.