Gemini MCP Server

Generate Video

generate_video

Create realistic 4-8 second videos from text prompts using AI. Supports an optional starting image and reference images for consistency, with synchronized audio.

Instructions

Generate videos using the Google Veo 3.1 AI model. Creates realistic 4-8 second videos from text prompts, with an optional first-frame image and reference images for character/style consistency. Supports native audio generation. Processing time: 2-5 minutes for 1080p videos. Returns the video file path, with an optional thumbnail and HTML preview player. ⚠️ IMPORTANT: Video generation is ASYNC and takes 2-5 minutes. The tool will poll for completion automatically.
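The description says only that the tool polls for completion automatically. It does not say how. A minimal sketch of a generic poll-until-done loop, assuming a hypothetical `check_status` callable and an arbitrary 10-second interval with a 10-minute cap (comfortably above the stated 2-5 minute processing time):

```python
import time
from typing import Callable, Optional


def poll_until_done(
    check_status: Callable[[], Optional[str]],
    interval_s: float = 10.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Call check_status() until it returns a result or the timeout expires.

    check_status is hypothetical: it returns None while the job is still
    running and a result (for example a video path) once it has finished.
    sleep and clock are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout_s
    while True:
        result = check_status()
        if result is not None:
            return result
        if clock() >= deadline:
            raise TimeoutError(f"video not ready after {timeout_s:.0f} s")
        sleep(interval_s)
```

Capping the total wait, rather than polling forever, keeps a stuck job from hanging the caller.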

Input Schema


Name	Required	Description	Default
`prompt`	Yes	Detailed description of the video to generate. Be specific about actions, camera movements, lighting, and style. Example: "A close-up shot of a futuristic coffee machine brewing a glowing blue espresso, with steam rising dramatically. Cinematic lighting, 4K quality."
`model`	No	Video generation model (default: veo-3.1-generate-preview)	veo-3.1-generate-preview
`aspectRatio`	No	Video aspect ratio: 16:9 (landscape) or 9:16 (portrait/vertical)	16:9
`resolution`	No	Video resolution. Higher resolutions take longer to generate and result in larger files.	1080p
`durationSeconds`	No	Video duration in seconds (4, 6, or 8 seconds)
`generateAudio`	No	Generate native synchronized audio effects and dialogue based on the prompt
`sampleCount`	No	Number of video samples to generate (1-4). Each sample is a separate generation.
`seed`	No	Optional seed for deterministic output. Use the same seed with the same prompt for consistent results.
`outputPath`	No	Optional custom output path for the video file (e.g., C:/videos/output.mp4). If not provided, saves to default output directory with timestamped filename.
`generateThumbnail`	No	Extract thumbnail from video (requires ffmpeg installed). Thumbnail is saved alongside video.
`generateHTMLPlayer`	No	Generate interactive HTML video player with preview and download options
`firstFrameImage`	No	Starting frame image for image-to-video generation. Provide via filePath (local file) or data+mimeType (base64). The video will animate from this image. Supports JPEG, PNG, WebP.
`referenceImages`	No	Up to 3 reference images for character/style consistency. Each needs a referenceType ("asset" or "style") and an image.
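The table states several value constraints: two aspect ratios, durations of 4, 6 or 8 seconds, 1-4 samples, at most 3 reference images with two reference types, and JPEG/PNG/WebP image inputs. A client-side pre-check of those constraints might look like the following. It is a hypothetical helper, not part of the server, and it assumes that each reference image stores its picture under an `image` key and that MIME types are spelled `image/jpeg`, `image/png` and `image/webp`. Resolution is not checked because the table does not list its allowed values.

```python
from typing import Any

ASPECT_RATIOS = {"16:9", "9:16"}
DURATIONS = {4, 6, 8}
REFERENCE_TYPES = {"asset", "style"}
IMAGE_MIME_TYPES = {"image/jpeg", "image/png", "image/webp"}  # assumed spellings


def check_image(image: Any, field: str) -> list[str]:
    """An image is given either as {"filePath": ...} or as {"data": ..., "mimeType": ...}."""
    if not isinstance(image, dict):
        return [f"{field} must be an object"]
    if "filePath" in image:
        return []
    if "data" in image and "mimeType" in image:
        if image["mimeType"] not in IMAGE_MIME_TYPES:
            return [f"{field}.mimeType must be JPEG, PNG or WebP"]
        return []
    return [f"{field} needs filePath or data+mimeType"]


def validate_generate_video_args(args: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the arguments look valid."""
    errors: list[str] = []
    if not args.get("prompt"):
        errors.append("prompt is required")
    if "aspectRatio" in args and args["aspectRatio"] not in ASPECT_RATIOS:
        errors.append("aspectRatio must be 16:9 or 9:16")
    if "durationSeconds" in args and args["durationSeconds"] not in DURATIONS:
        errors.append("durationSeconds must be 4, 6 or 8")
    if "sampleCount" in args:
        count = args["sampleCount"]
        if not (isinstance(count, int) and 1 <= count <= 4):
            errors.append("sampleCount must be between 1 and 4")
    if "firstFrameImage" in args:
        errors += check_image(args["firstFrameImage"], "firstFrameImage")
    refs = args.get("referenceImages", [])
    if len(refs) > 3:
        errors.append("at most 3 referenceImages are allowed")
    for i, ref in enumerate(refs):
        if not isinstance(ref, dict):
            errors.append(f"referenceImages[{i}] must be an object")
            continue
        if ref.get("referenceType") not in REFERENCE_TYPES:
            errors.append(f"referenceImages[{i}].referenceType must be asset or style")
        # The key name "image" is an assumption; the table only says "an image".
        errors += check_image(ref.get("image"), f"referenceImages[{i}].image")
    return errors
```

For example, `{"prompt": "...", "aspectRatio": "9:16", "durationSeconds": 8, "firstFrameImage": {"filePath": "frame.png"}}` passes, while `durationSeconds: 5` is reported before any 2-5 minute generation is started.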

Tool Definition Quality

Grade: A (4.3/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral transparency. It discloses the async processing, automatic polling, processing-time estimates, and outputs (file path, thumbnail, HTML player). Because there are no annotations, there is nothing for it to contradict.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is front-loaded with the core purpose and structured into two clear paragraphs. It is concise but includes the necessary warnings and details. Minor redundancy (e.g., stating the processing time twice) could be trimmed, but overall it is efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (13 parameters, nested objects, no output schema), the description adequately covers inputs, behavior, and the basic output format. However, it does not describe error handling, failure modes, or the return type, all of which would help with a complex async tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
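Since the tool has no output schema or documented failure modes, a client can still rely on the generic MCP envelope. The MCP specification defines the `tools/call` request shape and two error channels: a JSON-RPC `error` object and a result with `isError` set to true. A minimal sketch under that assumption; what the text items actually contain (for example, whether they include the video path) is not documented on this page, so the code only collects them.

```python
import json
from typing import Any


def build_tools_call(arguments: dict[str, Any], request_id: int = 1) -> str:
    """Serialize an MCP JSON-RPC tools/call request for generate_video."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "generate_video", "arguments": arguments},
    })


def read_tool_response(raw: str) -> list[str]:
    """Return the text items of a tools/call response, raising on either error channel."""
    message = json.loads(raw)
    if "error" in message:  # protocol-level JSON-RPC error
        raise RuntimeError(message["error"].get("message", "JSON-RPC error"))
    result = message["result"]
    texts = [item["text"] for item in result.get("content", []) if item.get("type") == "text"]
    if result.get("isError"):  # tool-level failure reported inside the result
        raise RuntimeError("; ".join(texts) or "tool reported an error")
    return texts
```

Treating `isError` as a failure matters here: a tool-level error (a rejected prompt, for instance) still arrives as a successful JSON-RPC response.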

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so parameters are well-documented. The description adds value beyond the schema by summarizing key behaviors (e.g., resolution impact on time, audio generation, thumbnail extraction) and providing a concrete prompt example, which aids agent decision-making.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states that the tool generates videos using the Google Veo 3.1 AI model, specifying duration (4-8 seconds), input types (text prompts with optional images), and outputs (file path, thumbnail, HTML player). It distinguishes itself from siblings like generate_image (image generation) and transcribe (audio processing) by focusing on video creation.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implicitly guides usage by noting the async nature and the 2-5 minute processing time, but it does not explicitly say when to choose this tool over alternatives such as generate_image (for static images) or when not to use it (e.g., for editing existing videos). The warning and the auto-polling note do help set expectations.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.


MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/Raindancer118/gemini-mcp'
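The same request can be made from Python's standard library. A minimal sketch; the curl command above does not show the response format, so decoding the body as JSON is an assumption.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://glama.ai/api/mcp/v1/servers"


def server_url(owner: str, name: str) -> str:
    """Build the directory API URL for one server, e.g. Raindancer118/gemini-mcp."""
    return f"{API_BASE}/{urllib.parse.quote(owner)}/{urllib.parse.quote(name)}"


def fetch_server_info(owner: str, name: str, timeout_s: float = 30.0) -> dict:
    """Issue a GET for the server record and decode it (assumed to be JSON)."""
    with urllib.request.urlopen(server_url(owner, name), timeout=timeout_s) as resp:
        return json.load(resp)
```

`fetch_server_info("Raindancer118", "gemini-mcp")` requests the same URL as the curl command and needs network access.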

If you have feedback or need assistance with the MCP directory API, please join our Discord server