Glama

summarize_youtube_video

Generate text summaries of YouTube videos by providing a URL, with options for short, medium, or long output lengths.

Instructions

Generates a summary of a given YouTube video URL using Gemini Vision API.

Input Schema

Name            | Required | Description                                                                | Default
youtube_url     | Yes      | -                                                                          | -
summary_length  | No       | Desired summary length: 'short', 'medium', or 'long' (default: 'medium'). | medium
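
For orientation, an MCP client invokes a tool with a JSON-RPC `tools/call` request whose `arguments` object matches the schema above. The following is a minimal sketch of building such a request; the helper name `buildSummarizeRequest`, the request `id`, and the placeholder video URL are illustrative assumptions, not part of the server.

```typescript
// Hypothetical helper: builds a JSON-RPC 2.0 `tools/call` request for this tool.
type SummaryLength = "short" | "medium" | "long";

interface SummarizeArguments {
  youtube_url: string;
  summary_length?: SummaryLength;
}

interface SummarizeRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: {
    name: "summarize_youtube_video";
    arguments: SummarizeArguments;
  };
}

function buildSummarizeRequest(
  id: number,
  youtubeUrl: string,
  summaryLength?: SummaryLength,
): SummarizeRequest {
  const args: SummarizeArguments = { youtube_url: youtubeUrl };
  // Leave summary_length out when it is not given; the handler falls back to 'medium'.
  if (summaryLength !== undefined) {
    args.summary_length = summaryLength;
  }
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "summarize_youtube_video", arguments: args },
  };
}

// Placeholder URL; substitute a real video ID.
const request = buildSummarizeRequest(1, "https://www.youtube.com/watch?v=VIDEO_ID", "short");
console.log(JSON.stringify(request, null, 2));
```

Omitting `summary_length` is equivalent to passing 'medium', since the handler applies that default itself.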

Implementation Reference

  • The main handler for the 'summarize_youtube_video' tool. It validates the input against the schema, builds a prompt from the requested summary length, calls the Gemini API through the callGeminiApi helper with the YouTube URL as the video input, and returns either the generated summary or an error response.
    case "summarize_youtube_video": {
      try {
        // Parse and validate arguments
        const args = SummarizeYoutubeVideoInputSchema.parse(request.params.arguments);
        const { youtube_url, summary_length } = args;
        const length = summary_length || 'medium'; // Use default if not provided

        console.error(`[INFO] Received request to summarize YouTube URL: ${youtube_url} (Length: ${length})`);

        // Construct the prompt for Gemini API
        const finalPrompt = `Please summarize this video. Aim for a ${length} length summary.`;

        // Call Gemini API using the helper function
        const summary = await callGeminiApi(finalPrompt, {
          mimeType: "video/youtube",
          fileUri: youtube_url,
        });

        console.error(`[INFO] Successfully generated summary.`);
        // Return success response
        return {
          content: [{ type: "text", text: summary }],
        };

      } catch (error: any) {
        console.error(`[ERROR] Failed during summarize_youtube_video tool execution:`, error);

        // Handle Zod validation errors
        if (error instanceof z.ZodError) {
          return {
            content: [{ type: "text", text: `Invalid input: ${JSON.stringify(error.errors)}` }],
            isError: true,
          };
        }

        // Handle generic errors
        let errorMessage = `Failed to generate summary for the video.`;
        if (error.message) {
          errorMessage += ` Details: ${error.message}`;
        }
        return {
          content: [{ type: "text", text: errorMessage }],
          isError: true,
        };
      }
    }
  • Zod input schema for the tool, defining a required YouTube URL and an optional summary-length enum.
    const SummaryLengthEnum = z.enum(['short', 'medium', 'long']).default('medium');
    const SummarizeYoutubeVideoInputSchema = z.object({
      youtube_url: z.string().url({ message: "Invalid YouTube URL provided." }),
      summary_length: SummaryLengthEnum.optional().describe("Desired summary length: 'short', 'medium', or 'long' (default: 'medium')."),
    });
  • src/index.ts:70-74 (registration)
    Tool registration in the ListTools handler, providing name, description, and JSON schema derived from Zod schema.
    {
      name: "summarize_youtube_video",
      description: "Generates a summary of a given YouTube video URL using Gemini Vision API.",
      inputSchema: zodToJsonSchema(SummarizeYoutubeVideoInputSchema),
    },
  • Shared helper function used by the tool handler to call the Gemini API with a prompt and YouTube video file data, handling errors and returning the generated text.
    async function callGeminiApi(prompt: string, fileData: { mimeType: string; fileUri: string }): Promise<string> {
      try {
        const result = await geminiModel.generateContent([
          prompt,
          { fileData },
        ]);
        const response = result.response;
        return response.text();
      } catch (error: any) {
        console.error(`[ERROR] Gemini API call failed:`, error);
        // Attempt to provide more specific error info based on message content
        // (Since GoogleGenerativeAIError type seems unavailable for direct check)
        if (error instanceof Error) {
          // Check for common messages indicating client-side issues (API key, quota, etc.)
          // This part might need refinement based on actual observed error messages.
          if (error.message.includes('API key') || error.message.includes('permission denied')) {
            throw new Error(`Authentication/Authorization Error with Gemini API: ${error.message}`);
          } else if (error.message.includes('quota')) {
            throw new Error(`Gemini API quota likely exceeded: ${error.message}`);
          } else if (error.message.toLowerCase().includes('invalid')) { // Generic check for invalid inputs
            throw new Error(`Invalid input likely provided to Gemini API: ${error.message}`);
          } else if (error.message.includes('500') || error.message.includes('server error') || error.message.includes('network issue')) {
            // Guessing based on common error patterns for server/network issues
            throw new Error(`Gemini API server error or network issue: ${error.message}`);
          }
          // Re-throw generic error if specific checks don't match
          throw new Error(`Gemini API Error: ${error.message}`);
        }
        // Not an Error instance: re-throw the original value unchanged
        throw error;
      }
    }
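
The substring checks in callGeminiApi can be isolated into a pure function, which makes them testable without calling the API. The sketch below is one way to do that, assuming the same message patterns as the helper above; the name `classifyGeminiError` and the category labels are hypothetical and do not appear in the server's source.

```typescript
// Hypothetical categories mirroring the branches in callGeminiApi.
type GeminiErrorKind =
  | "auth"
  | "quota"
  | "invalid_input"
  | "server_or_network"
  | "unknown";

// Classifies an error message using the same substring checks, in the same order.
function classifyGeminiError(message: string): GeminiErrorKind {
  if (message.includes("API key") || message.includes("permission denied")) {
    return "auth";
  }
  if (message.includes("quota")) {
    return "quota";
  }
  // Case-insensitive, as in the original check for invalid inputs.
  if (message.toLowerCase().includes("invalid")) {
    return "invalid_input";
  }
  if (
    message.includes("500") ||
    message.includes("server error") ||
    message.includes("network issue")
  ) {
    return "server_or_network";
  }
  return "unknown";
}

// Made-up messages for illustration; real Gemini error text may differ.
console.log(classifyGeminiError("API key not valid"));
console.log(classifyGeminiError("Request timed out"));
```

Order matters here: a message that mentions both an API key and the word "invalid" is classified as an authentication problem because that check runs first. Substring matching is also fragile, as the original comments acknowledge; for example, any message that happens to contain "500" lands in the server/network branch.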
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden of behavioral disclosure. It mentions the method ('using Gemini Vision API') but lacks details on rate limits, authentication needs, error handling, or output format. For a tool that likely involves API calls and video processing, this is a significant gap in transparency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
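
As an illustration of what fuller behavioral disclosure could look like, the sketch below pairs a longer description with MCP tool annotations. The `ToolDefinition` types are simplified local stand-ins, the annotation values are a judgment call rather than something the server declares, and the description wording is hypothetical, derived only from the handler code shown above.

```typescript
// Simplified local types; the hint names follow the MCP tool annotation fields.
interface ToolAnnotations {
  readOnlyHint?: boolean;
  destructiveHint?: boolean;
  idempotentHint?: boolean;
  openWorldHint?: boolean;
}

interface ToolDefinition {
  name: string;
  description: string;
  annotations?: ToolAnnotations;
}

const summarizeTool: ToolDefinition = {
  name: "summarize_youtube_video",
  // Hypothetical wording based on what the handler visibly does.
  description: [
    "Generates a text summary of a YouTube video by sending its URL to the Gemini API.",
    "Returns the summary as plain text.",
    "On failure (invalid input, authentication, quota, or upstream errors),",
    "returns an error message with isError set.",
  ].join(" "),
  annotations: {
    readOnlyHint: true,  // Assumption: the tool changes no state of its own.
    openWorldHint: true, // It calls an external service (the Gemini API).
  },
};

console.log(summarizeTool.description);
```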

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that front-loads the core purpose without unnecessary details. It's appropriately sized for the tool's complexity, with zero waste or redundancy, making it easy to understand at a glance.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (involving video processing and an external API), lack of annotations, no output schema, and incomplete parameter documentation, the description is insufficient. It doesn't cover behavioral aspects like performance, limitations, or what the summary output looks like, leaving significant gaps for effective tool use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 50% (only 'summary_length' has a description). The description adds no parameter semantics beyond the schema, as it doesn't explain the 'youtube_url' parameter or provide additional context for 'summary_length'. With partial schema coverage, the description doesn't compensate for the undocumented parameter, resulting in a baseline score.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
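
The 50% coverage figure can be reproduced with a small calculation. The sketch below assumes coverage means the fraction of schema properties with a non-empty description; how the score is actually computed is not stated here, and the schema objects are hand-written approximations rather than the output of zodToJsonSchema. The description given to `youtube_url` is hypothetical wording.

```typescript
interface PropertySchema {
  type: string;
  description?: string;
  format?: string;
  enum?: string[];
  default?: string;
}

interface ObjectSchema {
  type: "object";
  properties: Record<string, PropertySchema>;
  required: string[];
}

// Assumed metric: share of properties that carry a non-empty description.
function descriptionCoverage(schema: ObjectSchema): number {
  const props = Object.values(schema.properties);
  if (props.length === 0) return 1;
  const described = props.filter((p) => (p.description ?? "").trim().length > 0);
  return described.length / props.length;
}

// Approximation of the current schema: only summary_length is described.
const currentSchema: ObjectSchema = {
  type: "object",
  properties: {
    youtube_url: { type: "string", format: "uri" },
    summary_length: {
      type: "string",
      enum: ["short", "medium", "long"],
      default: "medium",
      description: "Desired summary length: 'short', 'medium', or 'long' (default: 'medium').",
    },
  },
  required: ["youtube_url"],
};

// Same schema with a (hypothetical) description added to youtube_url.
const documentedSchema: ObjectSchema = {
  ...currentSchema,
  properties: {
    ...currentSchema.properties,
    youtube_url: {
      type: "string",
      format: "uri",
      description: "Full URL of the YouTube video to summarize.",
    },
  },
};

console.log(descriptionCoverage(currentSchema));    // 0.5
console.log(descriptionCoverage(documentedSchema)); // 1
```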

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Generates a summary of a given YouTube video URL using Gemini Vision API.' It specifies the verb ('Generates a summary'), resource ('YouTube video URL'), and method ('using Gemini Vision API'). However, it doesn't explicitly differentiate from sibling tools like 'ask_about_youtube_video' or 'extract_key_moments', which might offer similar or overlapping functionality.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. It doesn't mention sibling tools or contexts where this tool is preferred, such as for quick overviews versus detailed analysis. Without such guidance, users might struggle to choose between this and tools like 'ask_about_youtube_video' or 'extract_key_moments'.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/minbang930/Youtube-Vision-MCP'
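
The same request can be made from TypeScript. This is a minimal sketch assuming a runtime with a global `fetch` (for example Node 18 or later) and assuming the endpoint returns JSON; the helper names `serverUrl` and `fetchServerInfo` are hypothetical, and the response shape is not documented here, so it is typed as `unknown`.

```typescript
const MCP_API_BASE = "https://glama.ai/api/mcp/v1";

// Builds the server endpoint URL used in the curl example.
function serverUrl(owner: string, repo: string): string {
  return `${MCP_API_BASE}/servers/${encodeURIComponent(owner)}/${encodeURIComponent(repo)}`;
}

// Performs the GET request; the response shape is an assumption, hence `unknown`.
async function fetchServerInfo(owner: string, repo: string): Promise<unknown> {
  const response = await fetch(serverUrl(owner, repo));
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return response.json();
}

console.log(serverUrl("minbang930", "Youtube-Vision-MCP"));
```

Calling `fetchServerInfo("minbang930", "Youtube-Vision-MCP")` issues the same GET request as the curl command and resolves with the parsed body.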

If you have feedback or need assistance with the MCP directory API, please join our Discord server.