MCP Veo 3 Video Generation Server
A Model Context Protocol (MCP) server that provides video generation capabilities using Google's Veo 3 API through the Gemini API. Generate high-quality videos from text prompts or images with realistic motion and audio.
Features
Text-to-Video: Generate videos from descriptive text prompts
Image-to-Video: Animate static images with motion prompts
Audio Generation: Native audio generation with Veo 3 models
Multiple Models: Support for Veo 3, Veo 3 Fast, and Veo 2
Aspect Ratios: Widescreen (16:9) and portrait (9:16) support
Negative Prompts: Specify what to avoid in generated videos
File Management: List and manage generated videos
Async Processing: Non-blocking video generation with progress tracking
Supported Models
| Model | Description | Speed | Quality | Audio |
| Veo 3 | Latest Veo 3 with highest quality | Slower | Highest | Yes |
| Veo 3 Fast | Optimized for speed and business use | Faster | High | Yes |
| Veo 2 | Previous generation model | Medium | Good | No |
Installation Options
Installation
Option 1: Direct Usage (Recommended)
Option 2: Development Setup
Clone this directory:
    git clone https://github.com/dayongd1/mcp-veo3
    cd mcp-veo3
Install with uv:
    uv sync
Or use the automated setup:
    python setup.py
Set up API key:
Get your Gemini API key from Google AI Studio
Create .env file:
    cp env_example.txt .env
Edit .env and add your GEMINI_API_KEY
Or set environment variable:
    export GEMINI_API_KEY='your_key'
Configuration
Environment Variables
Create a .env file with the following variables:
MCP Client Configuration
Option 1: Using uvx (Recommended - after PyPI publication)
Option 2: Using uv run (Development)
Option 3: Direct Python
CLI Arguments:
--output-dir (required): Directory to save generated videos
--api-key (optional): Gemini API key (overrides environment variable)
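The two CLI arguments and the precedence of `--api-key` over the environment variable can be sketched with `argparse`. This is a minimal illustration, not the server's actual entry point: the function names `parse_args` and `resolve_api_key` are hypothetical, and only the flag names and `GEMINI_API_KEY` come from this README.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse the two documented CLI arguments (illustrative sketch)."""
    parser = argparse.ArgumentParser(description="MCP Veo 3 server (sketch)")
    parser.add_argument("--output-dir", required=True,
                        help="Directory to save generated videos")
    parser.add_argument("--api-key", default=None,
                        help="Gemini API key (overrides GEMINI_API_KEY)")
    return parser.parse_args(argv)


def resolve_api_key(args, env=None):
    """Prefer the CLI key; otherwise fall back to GEMINI_API_KEY."""
    env = os.environ if env is None else env
    key = args.api_key or env.get("GEMINI_API_KEY")
    if not key:
        raise ValueError("API key not found: pass --api-key or set GEMINI_API_KEY")
    return key
```

Passing `env` explicitly keeps the fallback logic easy to exercise without touching the real process environment.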
Available Tools
1. generate_video
Generate a video from a text prompt.
Parameters:
prompt (required): Text description of the video
model (optional): Model to use (default: veo-3.0-generate-preview)
negative_prompt (optional): What to avoid in the video
aspect_ratio (optional): 16:9 or 9:16 (default: 16:9)
output_dir (optional): Directory to save videos (default: generated_videos)
Example:
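As a rough sketch, the arguments for a `generate_video` call could be assembled and checked like this before they are sent to the server. The helper name `build_generate_video_args` is hypothetical; the parameter names and defaults are the ones listed above.

```python
DEFAULT_MODEL = "veo-3.0-generate-preview"
VALID_ASPECT_RATIOS = {"16:9", "9:16"}


def build_generate_video_args(prompt, model=DEFAULT_MODEL, negative_prompt=None,
                              aspect_ratio="16:9", output_dir="generated_videos"):
    """Assemble and validate arguments for a generate_video tool call."""
    if not prompt or not prompt.strip():
        raise ValueError("prompt is required")
    if aspect_ratio not in VALID_ASPECT_RATIOS:
        raise ValueError(f"aspect_ratio must be one of {sorted(VALID_ASPECT_RATIOS)}")
    args = {
        "prompt": prompt.strip(),
        "model": model,
        "aspect_ratio": aspect_ratio,
        "output_dir": output_dir,
    }
    if negative_prompt:  # optional, so only include it when provided
        args["negative_prompt"] = negative_prompt
    return args
```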
2. generate_video_from_image
Generate a video from a starting image and motion prompt.
Parameters:
prompt (required): Text description of the desired motion/action
image_path (required): Path to the starting image file
model (optional): Model to use (default: veo-3.0-generate-preview)
negative_prompt (optional): What to avoid in the video
aspect_ratio (optional): 16:9 or 9:16 (default: 16:9)
output_dir (optional): Directory to save videos (default: generated_videos)
Example:
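Because `image_path` is required, a caller may want to confirm the file exists before requesting image-to-video generation. The following is a minimal sketch under that assumption; `validate_image_path` is a hypothetical helper, not part of the server's documented interface.

```python
from pathlib import Path


def validate_image_path(image_path):
    """Return the resolved path, or raise if it does not point to a file."""
    path = Path(image_path).expanduser()
    if not path.is_file():
        raise FileNotFoundError(f"Image not found: {path}")
    return path.resolve()
```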
3. list_generated_videos
List all generated videos in the output directory.
Parameters:
output_dir (optional): Directory to list videos from (default: generated_videos)
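A listing of this kind can be approximated with `pathlib`. This sketch assumes generated videos are saved with an `.mp4` extension, which this README does not state; the function name `list_videos` and the returned fields are also illustrative.

```python
from pathlib import Path


def list_videos(output_dir="generated_videos", suffix=".mp4"):
    """Return video files in output_dir as dicts, newest first."""
    directory = Path(output_dir)
    if not directory.is_dir():
        return []  # a missing directory simply means no videos yet
    files = [p for p in directory.iterdir()
             if p.is_file() and p.suffix.lower() == suffix]
    files.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return [{"name": p.name, "size_bytes": p.stat().st_size} for p in files]
```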
4. get_video_info
Get detailed information about a video file.
Parameters:
video_path (required): Path to the video file
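The exact fields returned by `get_video_info` are not listed here. As an assumption-laden sketch, basic file metadata can be read from the filesystem like this; the helper name `video_file_info` and the chosen fields are hypothetical.

```python
from datetime import datetime, timezone
from pathlib import Path


def video_file_info(video_path):
    """Return basic filesystem metadata for a video file."""
    path = Path(video_path)
    if not path.is_file():
        raise FileNotFoundError(f"Video not found: {path}")
    stat = path.stat()
    return {
        "name": path.name,
        "size_bytes": stat.st_size,
        "size_mb": round(stat.st_size / (1024 * 1024), 2),
        # Modification time as an ISO 8601 string in UTC
        "modified": datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc).isoformat(),
    }
```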
Usage Examples
Basic Text-to-Video Generation
Image-to-Video with Negative Prompt
Creative Animation
Prompt Writing Tips
Effective Prompts
Be specific: Include details about lighting, mood, camera angles
Describe motion: Specify the type of movement you want
Set the scene: Include environment and atmospheric details
Mention style: Cinematic, realistic, animated, etc.
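One way to apply these tips consistently is to build the prompt from separate parts. The helper below is only an illustration of that idea; `compose_prompt` and its parameter names are hypothetical and not part of the server.

```python
def compose_prompt(subject, motion=None, scene=None, style=None):
    """Join the non-empty prompt parts into one comma-separated description."""
    parts = [subject, motion, scene, style]
    return ", ".join(p.strip() for p in parts if p and p.strip())
```

For instance, `compose_prompt("a red fox", motion="running through snow", style="cinematic")` yields `"a red fox, running through snow, cinematic"`.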
Example Prompts
Cinematic Realism:
Creative Animation:
Dialogue Scene:
Negative Prompts
Describe what you don't want to see:
Don't use "no" or "don't": "no cars"
Do describe unwanted elements: "cars, vehicles, traffic"
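A small check can flag negative prompts that are phrased as instructions rather than descriptions. This is a sketch only: the word list is an illustrative assumption, and `find_negations` is a hypothetical helper.

```python
import re

# Words that signal an instruction rather than a description (illustrative list).
NEGATION_PATTERN = re.compile(r"\b(no|not|don't|dont|without|never)\b", re.IGNORECASE)


def find_negations(negative_prompt):
    """Return negation words found in a negative prompt, lowercased."""
    return [m.group(0).lower() for m in NEGATION_PATTERN.finditer(negative_prompt)]
```

An empty result suggests the prompt already describes the unwanted elements directly, as in "cars, vehicles, traffic".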
Limitations
Generation Time: 11 seconds to 6 minutes depending on complexity
Video Length: 8 seconds maximum
Resolution: 720p output
Storage: Videos are stored on Google's servers for 2 days only
Regional Restrictions: Person generation defaults to "dont_allow" in EU/UK/CH/MENA
Watermarking: All videos include SynthID watermarks
Troubleshooting
"API key not found"
"Output directory not accessible"
"Video generation timeout"
"Import errors"
Error Handling
The server handles common errors gracefully:
Invalid API Key: Clear error message with setup instructions
File Not Found: Validation for image paths in image-to-video
Generation Timeout: Configurable timeout with progress updates
Model Errors: Fallback error handling with detailed messages
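The timeout behavior can be pictured as a polling loop that gives up after a configurable limit and reports progress along the way. This is a generic sketch, not the server's implementation: `wait_for_completion` and its parameters are hypothetical, and the 360-second default is only an assumption based on the upper generation time quoted under Limitations.

```python
import time


def wait_for_completion(poll, timeout_s=360.0, interval_s=10.0,
                        on_progress=None, sleep=time.sleep, clock=time.monotonic):
    """Call poll() until it returns a non-None result or timeout_s elapses."""
    start = clock()
    while True:
        result = poll()
        if result is not None:
            return result
        elapsed = clock() - start
        if elapsed >= timeout_s:
            raise TimeoutError(f"Video generation timed out after {elapsed:.0f}s")
        if on_progress is not None:
            on_progress(elapsed)  # e.g. forward to an MCP progress notification
        sleep(interval_s)
```

In a real server, `poll` would wrap whatever call refreshes the long-running generation operation; injecting `sleep` and `clock` keeps the loop testable without waiting.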
Development
Running Tests
Code Formatting
Contributing
Fork the repository
Create a feature branch
Make your changes
Add tests if applicable
Submit a pull request
Links
MCP Docs: https://modelcontextprotocol.io/
Veo 3 API: https://ai.google.dev/gemini-api/docs/video
License
This project is licensed under the MIT License - see the LICENSE file for details.
Support
Documentation: Google Veo 3 API Docs
API Key: Get your Gemini API key
Issues: Report bugs and request features through GitHub issues
Changelog
v1.0.1
API Fix: Updated to match official Veo 3 API specification
Removed unsupported parameters: aspect_ratio, negative_prompt, person_generation
Simplified API calls: Now using only model and prompt parameters as per official docs
Fixed video generation errors: Resolved "unexpected keyword argument" issues
Updated documentation: Added notes about current API limitations
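The effect of this change can be illustrated by filtering a tool call's arguments down to the parameters that are still forwarded. This is a hedged sketch rather than the project's code; `filter_supported` is a hypothetical helper, and the supported set is taken from the notes above.

```python
# Per the v1.0.1 notes, only these parameters are passed to the API.
SUPPORTED_PARAMS = {"model", "prompt"}


def filter_supported(params):
    """Split params into (supported, dropped names) using SUPPORTED_PARAMS."""
    supported = {k: v for k, v in params.items() if k in SUPPORTED_PARAMS}
    dropped = sorted(k for k in params if k not in SUPPORTED_PARAMS)
    return supported, dropped
```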
v1.0.0
Initial release
Support for Veo 3, Veo 3 Fast, and Veo 2 models
Text-to-video and image-to-video generation
FastMCP framework with progress tracking
Comprehensive error handling and logging
File management utilities
uv/uvx support for easy installation
Built with FastMCP | Python 3.10+ | MIT License