Provides video generation capabilities using Google's Veo 3 API through the Gemini API, enabling text-to-video and image-to-video generation with realistic motion and audio
MCP Veo 3 Video Generation Server
A Model Context Protocol (MCP) server that provides video generation capabilities using Google's Veo 3 API through the Gemini API. Generate high-quality videos from text prompts or images with realistic motion and audio.
Features
🎬 Text-to-Video: Generate videos from descriptive text prompts
🖼️ Image-to-Video: Animate static images with motion prompts
🎵 Audio Generation: Native audio generation with Veo 3 models
🎨 Multiple Models: Support for Veo 3, Veo 3 Fast, and Veo 2
📐 Aspect Ratios: Widescreen (16:9) and portrait (9:16) support
❌ Negative Prompts: Specify what to avoid in generated videos
📁 File Management: List and manage generated videos
⚡ Async Processing: Non-blocking video generation with progress tracking
Supported Models
| Model | Description | Speed | Quality | Audio |
|---|---|---|---|---|
| Veo 3 | Latest Veo 3 with highest quality | Slower | Highest | ✅ |
| Veo 3 Fast | Optimized for speed and business use | Faster | High | ✅ |
| Veo 2 | Previous generation model | Medium | Good | ❌ |
📦 Installation Options
Option 1: Direct Usage (Recommended)
Option 2: Development Setup
1. Clone this repository:
git clone https://github.com/dayongd1/mcp-veo3
cd mcp-veo3
2. Install dependencies with uv:
uv sync
Or use the automated setup script:
python setup.py
3. Set up your API key:
Get your Gemini API key from Google AI Studio.
Create a .env file: cp env_example.txt .env
Edit .env and add your GEMINI_API_KEY.
Or set the environment variable instead: export GEMINI_API_KEY='your_key'
Configuration
Environment Variables
Create a .env file with the following variables:
MCP Client Configuration
Option 1: Using uvx (Recommended - after PyPI publication)
Option 2: Using uv run (Development)
Option 3: Direct Python
CLI Arguments:
--output-dir (required): Directory to save generated videos
--api-key (optional): Gemini API key (overrides the environment variable)
Available Tools
1. generate_video
Generate a video from a text prompt.
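Parameters:
prompt (required): Text description of the video
model (optional): Model to use (default: veo-3.0-generate-preview)
negative_prompt (optional): What to avoid in the video
aspect_ratio (optional): 16:9 or 9:16 (default: 16:9)
output_dir (optional): Directory to save videos (default: generated_videos)
Note: according to the v1.0.1 changelog, the server currently passes only model and prompt to the Veo API, so negative_prompt and aspect_ratio may have no effect.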
Example:
2. generate_video_from_image
Generate a video from a starting image and motion prompt.
Parameters:
prompt (required): Text description of the desired motion/action
image_path (required): Path to the starting image file
model (optional): Model to use (default: veo-3.0-generate-preview)
negative_prompt (optional): What to avoid in the video
aspect_ratio (optional): 16:9 or 9:16 (default: 16:9)
output_dir (optional): Directory to save videos (default: generated_videos)
Example:
3. list_generated_videos
List all generated videos in the output directory.
Parameters:
output_dir (optional): Directory to list videos from (default: generated_videos)
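A minimal sketch of how such a listing could work. It assumes videos are saved with common container suffixes (the suffix set and the returned fields are hypothetical choices) and treats a missing directory as "no videos yet" rather than an error.

```python
from pathlib import Path

# Assumption: generated videos use one of these container suffixes.
VIDEO_SUFFIXES = {".mp4", ".mov", ".webm"}


def list_generated_videos(output_dir: str = "generated_videos") -> list[dict]:
    """Return video files in output_dir, newest first."""
    directory = Path(output_dir).expanduser()
    if not directory.is_dir():
        return []  # design choice: a missing directory simply yields no videos
    videos = [
        p for p in directory.iterdir()
        if p.is_file() and p.suffix.lower() in VIDEO_SUFFIXES
    ]
    # Sort by modification time so the most recent generation comes first.
    videos.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return [
        {"name": p.name, "path": str(p), "size_bytes": p.stat().st_size}
        for p in videos
    ]
```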
4. get_video_info
Get detailed information about a video file.
Parameters:
video_path (required): Path to the video file
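A sketch of the kind of file metadata this tool could return using only the standard library. The field names are hypothetical; details such as duration or resolution would need a media probe (for example ffprobe) and are deliberately left out.

```python
from datetime import datetime, timezone
from pathlib import Path


def get_video_info(video_path: str) -> dict:
    """Return basic file metadata for a video (hypothetical field names)."""
    path = Path(video_path).expanduser()
    if not path.is_file():
        raise FileNotFoundError(f"Video not found: {path}")
    stat = path.stat()
    return {
        "name": path.name,
        "path": str(path.resolve()),
        "size_bytes": stat.st_size,
        "size_mb": round(stat.st_size / (1024 * 1024), 2),
        # Modification time as an ISO 8601 timestamp in UTC.
        "modified": datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc).isoformat(),
    }
```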
Usage Examples
Basic Text-to-Video Generation
Image-to-Video with Negative Prompt
Creative Animation
Prompt Writing Tips
Effective Prompts
Be specific: Include details about lighting, mood, camera angles
Describe motion: Specify the type of movement you want
Set the scene: Include environment and atmospheric details
Mention style: Cinematic, realistic, animated, etc.
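The tips above can be applied mechanically. This hypothetical helper assembles a prompt from style, camera, subject, motion, scene, and lighting fragments and skips any that are empty; the ordering is an editorial choice, not something Veo requires.

```python
def build_prompt(
    subject: str,
    motion: str,
    scene: str = "",
    style: str = "",
    camera: str = "",
    lighting: str = "",
) -> str:
    """Join non-empty prompt fragments with commas (hypothetical helper)."""
    # Subject and motion read as one clause, e.g. "a red fox trotting through snow".
    action = f"{subject.strip()} {motion.strip()}".strip()
    parts = [style, camera, action, scene, lighting]
    return ", ".join(p.strip() for p in parts if p.strip())
```

For example, build_prompt("a red fox", "trotting through fresh snow", scene="in a pine forest at dawn", style="Cinematic", camera="low tracking shot", lighting="soft golden light") returns "Cinematic, low tracking shot, a red fox trotting through fresh snow, in a pine forest at dawn, soft golden light".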
Example Prompts
Cinematic Realism:
Creative Animation:
Dialogue Scene:
Negative Prompts
Describe what you don't want to see:
❌ Don't use "no" or "don't": "no cars"
✅ Do describe the unwanted elements: "cars, vehicles, traffic"
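If negative prompts arrive phrased as instructions, a small normalizer can rewrite them into the recommended list of unwanted elements. This is a hypothetical pre-processing step, not part of the server; the negation phrases it strips are an assumption and easy to extend.

```python
import re

# Assumed instruction-style openers to strip; "don't show" is matched as a unit.
_NEGATION = re.compile(
    r"^(?:(?:don't|do not)(?:\s+(?:show|include))?|no|without|avoid)\s+",
    re.IGNORECASE,
)


def normalize_negative_prompt(text: str) -> str:
    """Rewrite 'no cars, don't show traffic' style input as 'cars, traffic'."""
    items: list[str] = []
    seen: set[str] = set()
    for chunk in re.split(r"[,;]", text):
        item = _NEGATION.sub("", chunk.strip()).strip()
        # Drop empty chunks and case-insensitive duplicates, keeping first order.
        if item and item.lower() not in seen:
            seen.add(item.lower())
            items.append(item)
    return ", ".join(items)
```

For example, normalize_negative_prompt("no cars, no vehicles, don't show traffic") returns "cars, vehicles, traffic".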
Limitations
Generation Time: 11 seconds to 6 minutes depending on complexity
Video Length: 8 seconds maximum
Resolution: 720p output
Storage: Videos are stored on Google's servers for 2 days only
Regional Restrictions: Person generation defaults to "dont_allow" in EU/UK/CH/MENA
Watermarking: All videos include SynthID watermarks
🚨 Troubleshooting
"API key not found"
"Output directory not accessible"
"Video generation timeout"
"Import errors"
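For the "Output directory not accessible" case, a quick check is to create the directory and confirm the process can write to it. A minimal sketch with a hypothetical helper name:

```python
import os
from pathlib import Path


def check_output_dir(output_dir: str) -> Path:
    """Create output_dir if needed and confirm the process can write to it."""
    path = Path(output_dir).expanduser()
    if path.exists() and not path.is_dir():
        raise NotADirectoryError(f"Output path is a file, not a directory: {path}")
    path.mkdir(parents=True, exist_ok=True)
    # os.access checks the current process's effective permissions.
    if not os.access(path, os.W_OK):
        raise PermissionError(f"Output directory not accessible: {path}")
    return path.resolve()
```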
Error Handling
The server handles common errors gracefully:
Invalid API Key: Clear error message with setup instructions
File Not Found: Validation for image paths in image-to-video
Generation Timeout: Configurable timeout with progress updates
Model Errors: Fallback error handling with detailed messages
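Because generation can take from 11 seconds to 6 minutes, a configurable timeout with progress updates amounts to a polling loop. The generic sketch below injects the completion check, clock, and sleep so it can be exercised without an API key. With the google-genai SDK, the check would typically re-fetch the long-running operation (the Gemini docs show operation = client.operations.get(operation) and then test operation.done); treat that wiring, the 10 s interval, and the 360 s default as assumptions.

```python
import time
from typing import Callable, Optional


def wait_for_operation(
    is_done: Callable[[], bool],
    timeout_s: float = 360.0,  # assumption: roughly the 6-minute upper bound
    interval_s: float = 10.0,
    on_progress: Optional[Callable[[float], None]] = None,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> float:
    """Poll is_done() until it returns True; return elapsed seconds or raise TimeoutError."""
    start = clock()
    while True:
        if is_done():
            return clock() - start
        elapsed = clock() - start
        if elapsed >= timeout_s:
            raise TimeoutError(f"Video generation did not finish within {timeout_s:.0f} s")
        if on_progress is not None:
            on_progress(elapsed)  # e.g. forward to the MCP client as a progress update
        # Never sleep past the deadline.
        sleep(min(interval_s, timeout_s - elapsed))
```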
Development
Running Tests
Code Formatting
Contributing
Fork the repository
Create a feature branch
Make your changes
Add tests if applicable
Submit a pull request
📚 Links
MCP Docs: https://modelcontextprotocol.io/
Veo 3 API: https://ai.google.dev/gemini-api/docs/video
License
This project is licensed under the MIT License - see the LICENSE file for details.
Support
Documentation: Google Veo 3 API Docs
API Key: Get your Gemini API key
Issues: Report bugs and feature requests in the GitHub issues
Changelog
v1.0.1
🔧 API Fix: Updated to match official Veo 3 API specification
Removed unsupported parameters: aspect_ratio, negative_prompt, person_generation
Simplified API calls: Now using only model and prompt parameters as per official docs
Fixed video generation errors: Resolved "unexpected keyword argument" issues
Updated documentation: Added notes about current API limitations
v1.0.0
Initial release
Support for Veo 3, Veo 3 Fast, and Veo 2 models
Text-to-video and image-to-video generation
FastMCP framework with progress tracking
Comprehensive error handling and logging
File management utilities
uv/uvx support for easy installation
Built with FastMCP | Python 3.10+ | MIT License