Youtube Vision MCP

summarize_youtube_video

Generate text summaries of YouTube videos by providing a URL, with options for short, medium, or long output lengths.

Instructions

Generates a summary of a given YouTube video URL using Gemini Vision API.

Input Schema


Name	Required	Description	Default
`youtube_url`	Yes
`summary_length`	No	Desired summary length: 'short', 'medium', or 'long' (default: 'medium').	medium
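
As a rough illustration of how a client might supply these two parameters, the sketch below builds a JSON-RPC `tools/call` request body for this tool. The helper name `buildSummarizeRequest` and the placeholder video URL are assumptions made for the example; only the tool name and the two parameter names come from the schema above.

```typescript
// Allowed values for summary_length, as listed in the input schema.
type SummaryLength = "short" | "medium" | "long";

interface SummarizeArgs {
  youtube_url: string;            // required
  summary_length?: SummaryLength; // optional, treated as 'medium' when omitted
}

// Hypothetical helper (not part of the server): builds a JSON-RPC 2.0
// "tools/call" request body targeting the summarize_youtube_video tool.
function buildSummarizeRequest(id: number, args: SummarizeArgs) {
  return {
    jsonrpc: "2.0" as const,
    id,
    method: "tools/call" as const,
    params: { name: "summarize_youtube_video", arguments: args },
  };
}

const request = buildSummarizeRequest(1, {
  youtube_url: "https://www.youtube.com/watch?v=VIDEO_ID", // placeholder URL
  summary_length: "short",
});

console.log(JSON.stringify(request, null, 2));
```

Leaving out `summary_length` is valid; the handler shown below falls back to 'medium' in that case.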

Implementation Reference

src/index.ts:137-182 (handler)

The main handler for the 'summarize_youtube_video' tool. It validates the input against the schema, builds a prompt from the requested summary length, calls the Gemini API through the callGeminiApi helper with the YouTube URL as the video input, and returns the generated summary. Validation failures and other errors are returned as text content with isError set to true.

case "summarize_youtube_video": {
   try {
     // Parse and validate arguments
     const args = SummarizeYoutubeVideoInputSchema.parse(request.params.arguments);
     const { youtube_url, summary_length } = args;
     const length = summary_length || 'medium'; // Use default if not provided

     console.error(`[INFO] Received request to summarize YouTube URL: ${youtube_url} (Length: ${length})`);

     // Construct the prompt for Gemini API
     const finalPrompt = `Please summarize this video. Aim for a ${length} length summary.`;

     // Call Gemini API using the helper function
     const summary = await callGeminiApi(finalPrompt, {
       mimeType: "video/youtube",
       fileUri: youtube_url,
     });

     console.error(`[INFO] Successfully generated summary.`);
     // Return success response
     return {
       content: [{ type: "text", text: summary }],
     };

   } catch (error: any) {
     console.error(`[ERROR] Failed during summarize_youtube_video tool execution:`, error);

     // Handle Zod validation errors
     if (error instanceof z.ZodError) {
       return {
         content: [{ type: "text", text: `Invalid input: ${JSON.stringify(error.errors)}` }],
         isError: true,
       };
     }

     // Handle generic errors
     let errorMessage = `Failed to generate summary for the video.`;
     if (error.message) {
       errorMessage += ` Details: ${error.message}`;
     }
     return {
       content: [{ type: "text", text: errorMessage }],
       isError: true,
     };
   }
 }

src/index.ts:45-49 (schema)

Zod input schema for the tool, defining a required YouTube URL and an optional summary-length enum.

const SummaryLengthEnum = z.enum(['short', 'medium', 'long']).default('medium');
const SummarizeYoutubeVideoInputSchema = z.object({
  youtube_url: z.string().url({ message: "Invalid YouTube URL provided." }),
  summary_length: SummaryLengthEnum.optional().describe("Desired summary length: 'short', 'medium', or 'long' (default: 'medium')."),
});

src/index.ts:70-74 (registration)

Tool registration in the ListTools handler, providing name, description, and JSON schema derived from Zod schema.

{
  name: "summarize_youtube_video",
  description: "Generates a summary of a given YouTube video URL using Gemini Vision API.",
  inputSchema: zodToJsonSchema(SummarizeYoutubeVideoInputSchema),
},

src/index.ts:96-127 (helper)

Shared helper function used by the tool handler to call the Gemini API with a prompt and YouTube video file data, handling errors and returning the generated text.

async function callGeminiApi(prompt: string, fileData: { mimeType: string; fileUri: string }): Promise<string> {
  try {
    const result = await geminiModel.generateContent([
      prompt,
      { fileData },
    ]);
    const response = result.response;
    return response.text();
  } catch (error: any) {
    console.error(`[ERROR] Gemini API call failed:`, error);
    // Attempt to provide more specific error info based on message content
    // (Since GoogleGenerativeAIError type seems unavailable for direct check)
    if (error instanceof Error) {
      // Check for common messages indicating client-side issues (API key, quota, etc.)
      // This part might need refinement based on actual observed error messages.
      if (error.message.includes('API key') || error.message.includes('permission denied')) {
         throw new Error(`Authentication/Authorization Error with Gemini API: ${error.message}`);
      } else if (error.message.includes('quota')) {
         throw new Error(`Gemini API quota likely exceeded: ${error.message}`);
      } else if (error.message.toLowerCase().includes('invalid')) { // Generic check for invalid inputs
         throw new Error(`Invalid input likely provided to Gemini API: ${error.message}`);
      } else if (error.message.includes('500') || error.message.includes('server error') || error.message.includes('network issue')) {
         // Guessing based on common error patterns for server/network issues
         throw new Error(`Gemini API server error or network issue: ${error.message}`);
      }
      // Re-throw generic error if specific checks don't match
      throw new Error(`Gemini API Error: ${error.message}`);
    }
    // Re-throw if it's not an Error instance for some reason
    throw error; // Keep original error if not an Error instance
  }
}
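
The message-matching branches in the helper can be easier to reason about when pulled out as a pure function. The following is a minimal sketch of that idea, not code from the repository: the name `classifyGeminiError` and the `GeminiErrorKind` labels are hypothetical, while the substring checks and their order mirror the helper above.

```typescript
// Hypothetical labels for the branches in callGeminiApi's catch block.
type GeminiErrorKind =
  | "auth"
  | "quota"
  | "invalid_input"
  | "server_or_network"
  | "unknown";

// Mirrors the helper's checks in the same order; the first match wins.
function classifyGeminiError(message: string): GeminiErrorKind {
  if (message.includes("API key") || message.includes("permission denied")) {
    return "auth";
  }
  if (message.includes("quota")) {
    return "quota";
  }
  // Only this check is case-insensitive, as in the original helper.
  if (message.toLowerCase().includes("invalid")) {
    return "invalid_input";
  }
  if (
    message.includes("500") ||
    message.includes("server error") ||
    message.includes("network issue")
  ) {
    return "server_or_network";
  }
  return "unknown";
}

console.log(classifyGeminiError("API key not valid. Please pass a valid API key.")); // auth
console.log(classifyGeminiError("Invalid argument provided."));                      // invalid_input
console.log(classifyGeminiError("Quota exceeded"));                                  // unknown
```

The last call shows the caveat the helper's own comments hint at: because most checks are case-sensitive, a message that starts with "Quota" falls through to the generic branch.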

Tool Definition Quality

Grade C, 2.9/5.0

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden of behavioral disclosure. It mentions the method ('using Gemini Vision API') but lacks details on rate limits, authentication needs, error handling, or output format. For a tool that likely involves API calls and video processing, this is a significant gap in transparency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that front-loads the core purpose without unnecessary details. It's appropriately sized for the tool's complexity, with zero waste or redundancy, making it easy to understand at a glance.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (involving video processing and an external API), lack of annotations, no output schema, and incomplete parameter documentation, the description is insufficient. It doesn't cover behavioral aspects like performance, limitations, or what the summary output looks like, leaving significant gaps for effective tool use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 50% (only 'summary_length' has a description). The description adds no parameter semantics beyond the schema, as it doesn't explain the 'youtube_url' parameter or provide additional context for 'summary_length'. With partial schema coverage, the description doesn't compensate for the undocumented parameter, resulting in a baseline score.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
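
The 50% figure can be reproduced with a simple count of described properties. The sketch below assumes coverage means "properties with a non-empty description divided by all properties"; the exact formula used by the scoring system is not stated here, and the schema object is a hand-written approximation of what the Zod schema would convert to, not actual tool output.

```typescript
interface JsonSchemaProperty {
  type?: string;
  description?: string;
  [key: string]: unknown; // other JSON Schema keywords are ignored here
}

interface ObjectSchema {
  properties: Record<string, JsonSchemaProperty>;
}

// Fraction of properties that carry a non-empty description (assumed definition).
function descriptionCoverage(schema: ObjectSchema): number {
  const names = Object.keys(schema.properties);
  if (names.length === 0) return 1; // nothing to document
  const described = names.filter(
    (name) => (schema.properties[name].description ?? "").trim().length > 0
  ).length;
  return described / names.length;
}

// Approximation of the tool's input schema: only summary_length is described.
const inputSchema: ObjectSchema = {
  properties: {
    youtube_url: { type: "string", format: "uri" },
    summary_length: {
      type: "string",
      enum: ["short", "medium", "long"],
      default: "medium",
      description:
        "Desired summary length: 'short', 'medium', or 'long' (default: 'medium').",
    },
  },
};

console.log(descriptionCoverage(inputSchema)); // 0.5
```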

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Generates a summary of a given YouTube video URL using Gemini Vision API.' It specifies the verb ('Generates a summary'), resource ('YouTube video URL'), and method ('using Gemini Vision API'). However, it doesn't explicitly differentiate from sibling tools like 'ask_about_youtube_video' or 'extract_key_moments', which might offer similar or overlapping functionality.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. It doesn't mention sibling tools or contexts where this tool is preferred, such as for quick overviews versus detailed analysis. Without such guidance, users might struggle to choose between this and tools like 'ask_about_youtube_video' or 'extract_key_moments'.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
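
To make the gaps noted in this review concrete, here is one possible shape for a fuller description. It is purely illustrative and not the project's wording. The statements about output and error format follow the handler shown earlier; what the sibling tools do is an assumption based only on their names, and the API key requirement is inferred from the helper's error checks.

```typescript
// Hypothetical revised registration entry; inputSchema is omitted for brevity.
const revisedToolEntry = {
  name: "summarize_youtube_video",
  description: [
    "Generates a text summary of a YouTube video from its URL using the Gemini Vision API.",
    // Assumption: sibling tool behavior is inferred from the tool names.
    "Use this for an overall summary; use ask_about_youtube_video for specific questions",
    "and extract_key_moments for a list of notable moments.",
    // Assumption: a Gemini API key is needed, inferred from the helper's error handling.
    "Requires a configured Gemini API key and is subject to Gemini API quotas.",
    // From the handler: success returns text content, failures set isError.
    "Returns the summary as plain text; failures are returned as text with isError set to true.",
  ].join(" "),
};

console.log(revisedToolEntry.description);
```

A longer description like this trades some of the conciseness score for usage guidance and behavioral disclosure.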

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/minbang930/Youtube-Vision-MCP'
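
The same request can be made from TypeScript with the global `fetch` available in recent Node.js versions and browsers. This is a minimal sketch: the helper names are hypothetical, the owner/repository path pattern is inferred from the single URL above, and the response is typed as `unknown` because its shape is not documented here.

```typescript
const MCP_API_BASE = "https://glama.ai/api/mcp/v1/servers";

// Builds the server URL; the owner/repo path layout is an assumption
// generalized from the one example URL shown in the curl command.
function serverUrl(owner: string, repo: string): string {
  return `${MCP_API_BASE}/${encodeURIComponent(owner)}/${encodeURIComponent(repo)}`;
}

// Fetches the directory entry and returns the parsed JSON body.
async function fetchServerInfo(owner: string, repo: string): Promise<unknown> {
  const response = await fetch(serverUrl(owner, repo));
  if (!response.ok) {
    throw new Error(`MCP directory API returned HTTP ${response.status}`);
  }
  return response.json();
}

// Example call (performs a network request when uncommented):
// fetchServerInfo("minbang930", "Youtube-Vision-MCP").then(console.log);
```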

If you have feedback or need assistance with the MCP directory API, please join our Discord server.