Leverages the Google Files API to manage media uploads and storage, enabling token-efficient video generation by pre-uploading and reusing reference images and videos.
Integrates with Google's Veo 3.1 AI model to generate high-quality videos from text prompts and reference images, supporting features like frame interpolation and video extensions.
Facilitates the use of Google Cloud's Generative Language API and Vertex AI infrastructure for authenticated video generation, operation management, and cost-effective AI workflows.
🎬 Veo 3.1 MCP Server
Token-Efficient AI Video Generation with Google's Veo 3.1
🎯 What is This?
An MCP server for Google's Veo 3.1, the state-of-the-art AI video generation model. Generate stunning videos from text prompts or reference images, or interpolate between a first and a last frame.
Key Features
✅ Text-to-Video - Generate videos from descriptions
✅ Reference Images - Up to 3 images for style guidance
✅ Frame Interpolation - First + last frame → coherent video
✅ Video Extension - Extend Veo-generated videos
✅ Batch Generation - Generate multiple videos with concurrency control
✅ Cost Estimation - Know costs before generating
✅ Token-Efficient - Auto-uploads reference images to the Files API and passes a short file URI instead of base64 data (97%+ token savings!)
🚀 Quick Start
1. Installation
2. Get API Key
Go to Google AI Studio
Create API key
Enable Veo 3.1 in your project (billing required)
3. Configure
4. Add to Cursor
Add to ~/.cursor/mcp.json:
Restart Cursor. Done! ✅
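If you prefer to script the Cursor setup, the sketch below merges one server entry into mcp.json without touching servers you already have. It assumes Cursor's usual mcpServers / command / args / env layout; the server name, the node command, the script path, and the GEMINI_API_KEY variable name in the example call are hypothetical placeholders, not values taken from this project.

```python
import json
from pathlib import Path


def add_mcp_server(config_path, name, command, args, env):
    """Add or update one entry under "mcpServers", keeping every other entry intact."""
    path = Path(config_path).expanduser()
    # Start from the existing config if there is one, otherwise from an empty object.
    config = json.loads(path.read_text()) if path.exists() else {}
    servers = config.setdefault("mcpServers", {})
    servers[name] = {"command": command, "args": list(args), "env": dict(env)}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2) + "\n")
    return config


# Example call (all values hypothetical; adjust to your checkout and key variable):
# add_mcp_server("~/.cursor/mcp.json", "veo-3.1", "node",
#                ["/path/to/veo-mcp/dist/index.js"], {"GEMINI_API_KEY": "your-key"})
```

Merging rather than overwriting the file is the point of the helper: other MCP servers configured in the same file survive the edit.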
🛠️ Tools
1. start_video_generation - Generate Video
Basic text-to-video:
With reference images (token-efficient!):
First/last frame interpolation:
Parameters:
model - veo-3.1-generate-001 (quality) or veo-3.1-fast-generate-001 (speed)
durationSeconds - 4, 6, or 8
aspectRatio - 16:9 or 9:16
resolution - 720p or 1080p
generateAudio - Include synchronized audio (2x the per-second price on the quality model, 1.5x on the fast model; see Pricing)
seed - For reproducibility
sampleCount - Generate 1-4 videos
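A client-side pre-check can catch out-of-range values before a paid request is sent. This is a minimal sketch built only from the limits listed in this README; validate_request and the referenceImages key are hypothetical names, and the server's own validation may be stricter or differ in detail.

```python
# Limits taken from the parameter list and the Limits & Constraints table.
ALLOWED_MODELS = {"veo-3.1-generate-001", "veo-3.1-fast-generate-001"}
ALLOWED_DURATIONS = {4, 6, 8}
ALLOWED_ASPECT_RATIOS = {"16:9", "9:16"}
ALLOWED_RESOLUTIONS = {"720p", "1080p"}
MAX_REFERENCE_IMAGES = 3


def validate_request(req: dict) -> list[str]:
    """Return a list of problems; an empty list means the request is within the documented limits."""
    errors = []
    if not req.get("prompt"):
        errors.append("prompt is required")
    # Optional fields are only checked when present.
    if "model" in req and req["model"] not in ALLOWED_MODELS:
        errors.append(f"unknown model: {req['model']}")
    if "durationSeconds" in req and req["durationSeconds"] not in ALLOWED_DURATIONS:
        errors.append("durationSeconds must be 4, 6, or 8")
    if "aspectRatio" in req and req["aspectRatio"] not in ALLOWED_ASPECT_RATIOS:
        errors.append("aspectRatio must be 16:9 or 9:16")
    if "resolution" in req and req["resolution"] not in ALLOWED_RESOLUTIONS:
        errors.append("resolution must be 720p or 1080p")
    sample_count = req.get("sampleCount", 1)
    if not (isinstance(sample_count, int) and 1 <= sample_count <= 4):
        errors.append("sampleCount must be an integer from 1 to 4")
    refs = req.get("referenceImages", [])  # hypothetical key name
    if len(refs) > MAX_REFERENCE_IMAGES:
        errors.append("at most 3 reference images are allowed")
    if refs and req.get("aspectRatio") == "9:16":
        # Troubleshooting notes that 9:16 may not work with reference images.
        errors.append("9:16 may not be supported with reference images; use 16:9")
    return errors
```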
2. get_video_job - Check Status
Returns status and video URLs when complete.
3. upload_image - Pre-Upload References
Returns a fileUri valid for 48 hours. Reuse it across multiple generations!
4. extend_video - Extend Videos
5. start_batch_video_generation - Batch Generate
6. estimate_veo_cost - Cost Estimation
Returns estimated cost in USD.
💰 Pricing
| Model | Video only (USD/sec) | Video + audio (USD/sec) |
|---|---|---|
| veo-3.1-generate-001 (quality) | $0.20 | $0.40 |
| veo-3.1-fast-generate-001 (speed) | $0.10 | $0.15 |
Example Costs:
8s video (fast, no audio): $0.80
8s video (quality, with audio): $3.20
4s video (fast, no audio): $0.40
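The example costs are simply price per second times duration. A minimal sketch of that arithmetic using the table's rates; multiplying by sampleCount assumes each extra sample is billed as a separate video, which is an assumption rather than something this README states.

```python
# USD per second of generated video, copied from the Pricing table.
PRICE_PER_SECOND = {
    "veo-3.1-generate-001": {"video_only": 0.20, "with_audio": 0.40},
    "veo-3.1-fast-generate-001": {"video_only": 0.10, "with_audio": 0.15},
}


def estimate_cost(model, duration_seconds, generate_audio=False, sample_count=1):
    """Estimated USD cost; the sample_count scaling is an assumption (one charge per video)."""
    rate = PRICE_PER_SECOND[model]["with_audio" if generate_audio else "video_only"]
    return round(rate * duration_seconds * sample_count, 2)


print(estimate_cost("veo-3.1-fast-generate-001", 8))                  # 0.8
print(estimate_cost("veo-3.1-generate-001", 8, generate_audio=True))  # 3.2
print(estimate_cost("veo-3.1-fast-generate-001", 4))                  # 0.4
```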
📊 Limits & Constraints
| Parameter | Limit |
|---|---|
| Duration | 4, 6, or 8 seconds |
| Reference images | 0-3 images |
| Sample count | 1-4 videos |
| Resolutions | 720p, 1080p |
| Aspect ratios | 16:9, 9:16 |
| Rate limit | ~50 requests/min |
💡 Usage Examples
Simple Text-to-Video
With Style Reference
Frame Interpolation
Batch Generation
🔍 How Token Efficiency Works
❌ Naive Approach (Base64)
Cost: every call carries the full image as base64 text, so token usage grows with image size and is paid again on each reuse.
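To see why inlining is expensive: padded base64 needs 4 characters for every 3 bytes, so an image grows by about a third before it even reaches the tokenizer. The sketch compares character counts only; how many tokens those characters become depends on the model's tokenizer, and the 1 MB image size is a hypothetical example.

```python
def base64_length(num_bytes: int) -> int:
    """Characters needed to inline num_bytes of binary data as padded base64."""
    # Every 3 input bytes become 4 output characters; a partial final group is padded.
    return 4 * ((num_bytes + 2) // 3)


image_bytes = 1_000_000                    # hypothetical 1 MB reference image
inline_chars = base64_length(image_bytes)  # 1,333,336 characters on every call
uri_chars = len("files/abc123")            # 12 characters for a Files API reference
print(f"inline: {inline_chars:,} chars vs file URI: {uri_chars} chars")
```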
✅ Token-Efficient (This MCP)
What Happens:
Server downloads image (no tokens)
Computes SHA-256 hash
Checks cache (48h validity)
Uploads to Files API if needed (~1s)
Uses a short files/abc123 URI (~5 tokens)
Savings: 97%+ fewer tokens! 🎉
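The steps above amount to a content-addressed cache with a 48-hour lifetime. A minimal sketch follows; the upload callable stands in for the real Files API call and is hypothetical, and the sketch starts from bytes that have already been downloaded.

```python
import hashlib
import time

CACHE_TTL_SECONDS = 48 * 60 * 60  # Files API URIs are treated as valid for 48 hours


class FileUriCache:
    """Maps SHA-256(image bytes) to (file URI, upload time); re-uploads once an entry expires."""

    def __init__(self, upload, ttl=CACHE_TTL_SECONDS, clock=time.time):
        self._upload = upload  # hypothetical callable: bytes -> "files/..." URI
        self._ttl = ttl
        self._clock = clock
        self._entries = {}

    def get_uri(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[1] < self._ttl:
            return entry[0]  # cache hit: reuse the existing short URI
        uri = self._upload(data)  # miss or expired: upload once and remember it
        self._entries[key] = (uri, now)
        return uri
```

Because the key is a hash of the bytes, the same image fetched from two different URLs still maps to one upload. In practice a TTL slightly under 48 hours avoids handing out a URI that expires mid-generation.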
⏱️ Generation Times
| Configuration | Typical time |
|---|---|
| 4s, 720p, no audio | 30-60 sec |
| 8s, 1080p, no audio | 60-120 sec |
| 8s, 1080p, with audio | 90-150 sec |
| With references | +10-30 sec |
| Frame interpolation | +20-40 sec |
Note: Times vary based on prompt complexity and server load.
🎨 Best Practices
1. Start Small, Scale Up
2. Use Fast Model for Testing
Switch to the quality model for the final render.
3. Pre-Upload Frequently Used References
4. Leverage Batch for Variations
5. Monitor Costs
Always run estimate_veo_cost before launching large batches.
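Practices 1, 4, and 5 combine naturally: expand a small set of prompts and seeds into a batch, then price it before submitting. A minimal sketch assuming the rates from the Pricing table; plan_variations and check_budget are hypothetical helpers, not tools exposed by this server.

```python
from itertools import product

# USD per second from the Pricing table: (video only, video + audio).
PRICE_PER_SECOND = {
    "veo-3.1-generate-001": (0.20, 0.40),
    "veo-3.1-fast-generate-001": (0.10, 0.15),
}


def plan_variations(prompts, seeds, model="veo-3.1-fast-generate-001",
                    duration_seconds=4, generate_audio=False):
    """One job per (prompt, seed) pair; defaults follow 'start small, use the fast model'."""
    return [
        {"prompt": prompt, "seed": seed, "model": model,
         "durationSeconds": duration_seconds, "generateAudio": generate_audio}
        for prompt, seed in product(prompts, seeds)
    ]


def check_budget(jobs, budget_usd):
    """Return the estimated batch cost, or raise if it exceeds the budget."""
    cost = sum(
        PRICE_PER_SECOND[job["model"]][1 if job["generateAudio"] else 0]
        * job["durationSeconds"]
        for job in jobs
    )
    if cost > budget_usd:
        raise ValueError(f"batch would cost ${cost:.2f}, over the ${budget_usd:.2f} budget")
    return round(cost, 2)
```

Once a cheap variation looks right, rerun only that prompt and seed with the quality model.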
🎬 Async Operation Flow
Veo uses async long-running operations:
Tip: Avoid polling more often than once every 10 seconds.
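The polling side of that flow can be sketched as a loop that enforces the 10-second floor and gives up after a timeout. The get_status callable is a hypothetical stand-in for get_video_job, and the done flag is an assumed response field modeled on the done field of Google long-running operations; the 15-second default interval and 10-minute timeout are arbitrary choices.

```python
import time


def wait_for_job(get_status, job_id, interval=15.0, timeout=600.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll a long-running job until it reports done, never faster than every 10 seconds."""
    interval = max(interval, 10.0)  # enforce the minimum polling interval
    deadline = clock() + timeout
    while True:
        status = get_status(job_id)  # hypothetical: returns a dict with a "done" flag
        if status.get("done"):
            return status
        if clock() + interval > deadline:
            raise TimeoutError(f"job {job_id} not finished after {timeout:.0f}s")
        sleep(interval)
```

Injecting sleep and clock keeps the loop testable without real waiting.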
🆘 Troubleshooting
"API not enabled" (403)
Go to Google Cloud Console
Enable "Generative Language API"
Enable billing
Wait 5-10 minutes for propagation
"Rate limit exceeded"
Veo allows ~50 requests/min
Use the batch tool with concurrency: 3
Add delays between requests
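A sketch of those two mitigations with asyncio: a semaphore caps in-flight requests at 3, and each slot stays held for a pause after its request. With 3 slots each held at least 4 seconds, each slot starts at most one request per 4 seconds, so roughly 3 × 60 / 4 = 45 requests start per minute, under the ~50/min limit. start_job is a hypothetical async wrapper around the generation call.

```python
import asyncio


async def run_batch(jobs, start_job, concurrency=3, delay_seconds=4.0):
    """Run start_job over jobs with at most `concurrency` in flight; results keep input order."""
    semaphore = asyncio.Semaphore(concurrency)

    async def run_one(job):
        async with semaphore:
            result = await start_job(job)  # hypothetical async wrapper around the API call
            # Keep holding the slot so each slot starts at most one request per delay_seconds.
            await asyncio.sleep(delay_seconds)
            return result

    return await asyncio.gather(*(run_one(job) for job in jobs))


# Usage (hypothetical): results = asyncio.run(run_batch(jobs, start_job))
```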
"Invalid aspect ratio with references"
9:16 may not work with reference images
Use 16:9 for reference mode
Check Veo 3.1 docs for updates
"Video extension failed"
Only Veo-generated videos can be extended
Cannot extend arbitrary MP4s
Input must be from previous Veo job
Long generation times
1080p takes longer than 720p
Audio generation adds time
Reference images add processing
Frame interpolation is slowest
📚 Resources
🎯 Status: Production Ready ✅
✅ All 6 tools implemented
✅ Token-efficient file handling
✅ Async operation support
✅ Batch generation with concurrency control
✅ Cost estimation
✅ Comprehensive validation
✅ Error handling
✅ Full documentation
Ready to generate amazing videos! 🚀
Built with 🎬 for AI video generation