MCP Veo 3 Video Generation Server
A Model Context Protocol (MCP) server that provides video generation capabilities using Google's Veo 3 API through the Gemini API. Generate high-quality videos from text prompts or images with realistic motion and audio.
Features
- 🎬 Text-to-Video: Generate videos from descriptive text prompts
- 🖼️ Image-to-Video: Animate static images with motion prompts
- 🎵 Audio Generation: Native audio generation with Veo 3 models
- 🎨 Multiple Models: Support for Veo 3, Veo 3 Fast, and Veo 2
- 📐 Aspect Ratios: Widescreen (16:9) and portrait (9:16) support
- ❌ Negative Prompts: Specify what to avoid in generated videos
- 📁 File Management: List and manage generated videos
- ⚡ Async Processing: Non-blocking video generation with progress tracking
Supported Models
| Model | Description | Speed | Quality | Audio |
|---|---|---|---|---|
| veo-3.0-generate-preview | Latest Veo 3 with highest quality | Slower | Highest | ✅ |
| veo-3.0-fast-generate-preview | Optimized for speed and business use | Faster | High | ✅ |
| veo-2.0-generate-001 | Previous generation model | Medium | Good | ❌ |
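The table above can be mirrored in code when a client needs to pick a model programmatically. The following is a minimal sketch: the model IDs and audio flags come from the table, while the dictionary layout and the helper names `supports_audio` and `models_with_audio` are illustrative assumptions, not part of the server.

```python
# Model facts copied from the table above; the data layout is illustrative only.
SUPPORTED_MODELS = {
    "veo-3.0-generate-preview": {"speed": "slower", "quality": "highest", "audio": True},
    "veo-3.0-fast-generate-preview": {"speed": "faster", "quality": "high", "audio": True},
    "veo-2.0-generate-001": {"speed": "medium", "quality": "good", "audio": False},
}


def supports_audio(model: str) -> bool:
    """Return True if the model generates audio; reject unknown model IDs."""
    if model not in SUPPORTED_MODELS:
        raise ValueError(f"Unknown model: {model!r}")
    return SUPPORTED_MODELS[model]["audio"]


def models_with_audio() -> list[str]:
    """List the model IDs that generate audio, in table order."""
    return [name for name, info in SUPPORTED_MODELS.items() if info["audio"]]
```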
📦 Installation Options
Installation
Option 1: Direct Usage (Recommended)
Option 2: Development Setup
- Clone this directory:
- Install with uv, or use the automated setup:
- Set up API key:
  - Get your Gemini API key from Google AI Studio
  - Create .env file: cp env_example.txt .env
  - Edit .env and add your GEMINI_API_KEY
  - Or set environment variable: export GEMINI_API_KEY='your_key'
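To illustrate the two ways of supplying the key, here is a small standard-library sketch that looks for GEMINI_API_KEY in the environment first and then in a .env file. The function names `read_env_file` and `resolve_api_key` are hypothetical, and the lookup order is an assumption; the server itself may load the file differently (for example through a dotenv library).

```python
import os
from pathlib import Path


def read_env_file(path: str = ".env") -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def resolve_api_key(env_file: str = ".env") -> str:
    """Return the key from the environment, falling back to the .env file (assumed order)."""
    key = os.environ.get("GEMINI_API_KEY") or read_env_file(env_file).get("GEMINI_API_KEY")
    if not key:
        raise RuntimeError("GEMINI_API_KEY not found; set it in the environment or in .env")
    return key
```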
Configuration
Environment Variables
Create a .env file with the following variables:
MCP Client Configuration
Option 1: Using uvx (Recommended - after PyPI publication)
Option 2: Using uv run (Development)
Option 3: Direct Python
CLI Arguments:
- --output-dir (required): Directory to save generated videos
- --api-key (optional): Gemini API key (overrides environment variable)
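The precedence described for these two arguments can be sketched with argparse. This is an assumption about how such a command line might be parsed, not the server's actual code; `parse_args` is a hypothetical helper, and only the flag names and the "CLI overrides environment" rule come from the list above.

```python
import argparse
import os


def parse_args(argv=None) -> argparse.Namespace:
    """Parse the two documented flags; --api-key wins over GEMINI_API_KEY."""
    parser = argparse.ArgumentParser(description="MCP Veo 3 server (illustrative sketch)")
    parser.add_argument("--output-dir", required=True,
                        help="Directory to save generated videos")
    parser.add_argument("--api-key", default=None,
                        help="Gemini API key (overrides the environment variable)")
    args = parser.parse_args(argv)
    # Fall back to the environment only when the flag was not given.
    args.api_key = args.api_key or os.environ.get("GEMINI_API_KEY")
    return args
```

Called as `parse_args(["--output-dir", "videos"])`, the sketch returns a namespace whose `api_key` is whatever GEMINI_API_KEY holds, or `None` if it is unset.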
Available Tools
1. generate_video
Generate a video from a text prompt.
Parameters:
- prompt (required): Text description of the video
- model (optional): Model to use (default: veo-3.0-generate-preview)
- negative_prompt (optional): What to avoid in the video
- aspect_ratio (optional): 16:9 or 9:16 (default: 16:9)
- output_dir (optional): Directory to save videos (default: generated_videos)
Example:
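As a rough illustration of the parameters above, the sketch below assembles an argument dictionary for a generate_video call and applies the documented defaults. The helper `build_generate_video_args` and the sample prompt are hypothetical; how an MCP client actually submits the arguments depends on the client.

```python
ALLOWED_ASPECT_RATIOS = {"16:9", "9:16"}
OPTIONAL_KEYS = {"model", "negative_prompt", "aspect_ratio", "output_dir"}
# Defaults as documented in the parameter list above.
DEFAULTS = {
    "model": "veo-3.0-generate-preview",
    "aspect_ratio": "16:9",
    "output_dir": "generated_videos",
}


def build_generate_video_args(prompt: str, **overrides) -> dict:
    """Validate inputs and return the arguments for a generate_video call."""
    if not prompt or not prompt.strip():
        raise ValueError("prompt is required")
    unknown = set(overrides) - OPTIONAL_KEYS
    if unknown:
        raise ValueError(f"Unsupported parameters: {sorted(unknown)}")
    args = {"prompt": prompt, **DEFAULTS, **overrides}
    if args["aspect_ratio"] not in ALLOWED_ASPECT_RATIOS:
        raise ValueError("aspect_ratio must be 16:9 or 9:16")
    return args


# Hypothetical prompt, for illustration only.
example_args = build_generate_video_args(
    "A red kite drifting over a windy beach at sunset, cinematic",
    aspect_ratio="9:16",
)
```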
2. generate_video_from_image
Generate a video from a starting image and motion prompt.
Parameters:
- prompt (required): Text description of the desired motion/action
- image_path (required): Path to the starting image file
- model (optional): Model to use (default: veo-3.0-generate-preview)
- negative_prompt (optional): What to avoid in the video
- aspect_ratio (optional): 16:9 or 9:16 (default: 16:9)
- output_dir (optional): Directory to save videos (default: generated_videos)
Example:
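For image-to-video, the main extra step is confirming that the starting image exists before a request is made. The sketch below shows one way to do that check and build the arguments; `build_image_to_video_args` is a hypothetical helper, and the server's own validation may differ.

```python
from pathlib import Path


def build_image_to_video_args(prompt: str, image_path: str,
                              model: str = "veo-3.0-generate-preview") -> dict:
    """Check the image path and return arguments for generate_video_from_image."""
    if not prompt or not prompt.strip():
        raise ValueError("prompt is required")
    path = Path(image_path).expanduser()
    if not path.is_file():
        # Fail early with a clear message instead of sending a doomed request.
        raise FileNotFoundError(f"Image not found: {path}")
    return {"prompt": prompt, "image_path": str(path), "model": model}
```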
3. list_generated_videos
List all generated videos in the output directory.
Parameters:
- output_dir (optional): Directory to list videos from (default: generated_videos)
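A listing tool of this kind can be approximated with pathlib. In the sketch below the function name `list_videos` and the restriction to .mp4 files are assumptions; the README does not state which file extensions the server recognizes.

```python
from pathlib import Path

VIDEO_SUFFIXES = {".mp4"}  # assumption: the README does not list the extensions used


def list_videos(output_dir: str = "generated_videos") -> list[str]:
    """Return the sorted names of video files directly inside output_dir."""
    root = Path(output_dir)
    if not root.is_dir():
        return []
    return sorted(
        p.name for p in root.iterdir()
        if p.is_file() and p.suffix.lower() in VIDEO_SUFFIXES
    )
```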
4. get_video_info
Get detailed information about a video file.
Parameters:
- video_path (required): Path to the video file
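The README does not say which details get_video_info returns. As a sketch under that caveat, the hypothetical `video_info` helper below gathers the file-system facts that are always available: name, size, and modification time.

```python
from datetime import datetime, timezone
from pathlib import Path


def video_info(video_path: str) -> dict:
    """Return basic file metadata for a video; the fields chosen are an assumption."""
    path = Path(video_path)
    if not path.is_file():
        raise FileNotFoundError(f"Video not found: {path}")
    stat = path.stat()
    return {
        "name": path.name,
        "size_bytes": stat.st_size,
        # Modification time as an ISO 8601 string in UTC.
        "modified": datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc).isoformat(),
    }
```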
Usage Examples
Basic Text-to-Video Generation
Image-to-Video with Negative Prompt
Creative Animation
Prompt Writing Tips
Effective Prompts
- Be specific: Include details about lighting, mood, camera angles
- Describe motion: Specify the type of movement you want
- Set the scene: Include environment and atmospheric details
- Mention style: Cinematic, realistic, animated, etc.
Example Prompts
Cinematic Realism:
Creative Animation:
Dialogue Scene:
Negative Prompts
Describe what you don't want to see:
- ❌ Don't use "no" or "don't": "no cars"
- ✅ Do describe unwanted elements: "cars, vehicles, traffic"
Limitations
- Generation Time: 11 seconds to 6 minutes depending on complexity
- Video Length: 8 seconds maximum
- Resolution: 720p output
- Storage: Videos are stored on Google's servers for 2 days only
- Regional Restrictions: Person generation defaults to "dont_allow" in EU/UK/CH/MENA
- Watermarking: All videos include SynthID watermarks
🚨 Troubleshooting
"API key not found"
"Output directory not accessible"
"Video generation timeout"
"Import errors"
Error Handling
The server handles common errors gracefully:
- Invalid API Key: Clear error message with setup instructions
- File Not Found: Validation for image paths in image-to-video
- Generation Timeout: Configurable timeout with progress updates
- Model Errors: Fallback error handling with detailed messages
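The timeout-with-progress behavior can be pictured as a polling loop. The sketch below uses a stand-in `FakeOperation` class rather than the real Gemini API, so every name in it is hypothetical; only the idea of a configurable timeout with progress updates comes from the list above, and the 360-second default mirrors the 6-minute upper bound given under Limitations.

```python
import asyncio


class FakeOperation:
    """Stand-in for a long-running generation job; finishes after N refreshes."""

    def __init__(self, polls_needed: int):
        self.polls_left = polls_needed
        self.done = polls_needed <= 0

    def refresh(self) -> None:
        self.polls_left -= 1
        self.done = self.polls_left <= 0


async def wait_for_video(operation, timeout_s: float = 360.0,
                         poll_interval_s: float = 10.0, on_progress=None):
    """Poll until the operation is done, reporting elapsed time; raise on timeout."""
    loop = asyncio.get_running_loop()
    start = loop.time()
    while not operation.done:
        elapsed = loop.time() - start
        if elapsed >= timeout_s:
            raise TimeoutError(f"Video generation timed out after {timeout_s:.0f}s")
        if on_progress is not None:
            on_progress(elapsed)
        # Sleeping with asyncio keeps the event loop free for other requests.
        await asyncio.sleep(poll_interval_s)
        operation.refresh()
    return operation
```

A caller could run it with `asyncio.run(wait_for_video(FakeOperation(3), poll_interval_s=0.1, on_progress=print))`; in a real server the `refresh` step would be replaced by a status request to the API.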
Development
Running Tests
Code Formatting
Contributing
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
📚 Links
- PyPI: https://pypi.org/project/mcp-veo3/
- GitHub: https://github.com/dayongd1/mcp-veo3
- MCP Docs: https://modelcontextprotocol.io/
- Veo 3 API: https://ai.google.dev/gemini-api/docs/video
License
This project is licensed under the MIT License - see the LICENSE file for details.
Support
- Documentation: Google Veo 3 API Docs
- API Key: Get your Gemini API key
- Issues: Report bugs and feature requests in the GitHub issues
Changelog
v1.0.1
- 🔧 API Fix: Updated to match official Veo 3 API specification
- Removed unsupported parameters: aspect_ratio, negative_prompt, person_generation
- Simplified API calls: Now using only model and prompt parameters as per official docs
- Fixed video generation errors: Resolved "unexpected keyword argument" issues
- Updated documentation: Added notes about current API limitations
v1.0.0
- Initial release
- Support for Veo 3, Veo 3 Fast, and Veo 2 models
- Text-to-video and image-to-video generation
- FastMCP framework with progress tracking
- Comprehensive error handling and logging
- File management utilities
- uv/uvx support for easy installation
Built with FastMCP | Python 3.10+ | MIT License