# 🎬 Veo 3.1 MCP Server
**Token-Efficient AI Video Generation with Google's Veo 3.1**
## 🎯 What is This?
An MCP server for **Google's Veo 3.1**, the state-of-the-art AI video generation model. Generate stunning videos from text prompts or reference images, or interpolate between a first and last frame.
### Key Features
- ✅ **Text-to-Video** - Generate videos from descriptions
- ✅ **Reference Images** - Up to 3 images for style guidance
- ✅ **Frame Interpolation** - First + last frame → coherent video
- ✅ **Video Extension** - Extend Veo-generated videos
- ✅ **Batch Generation** - Generate multiple videos with concurrency control
- ✅ **Cost Estimation** - Know costs before generating
- ✅ **Token-Efficient** - Auto-upload refs to Files API (97% token savings!)
---
## 🚀 Quick Start
### 1. Installation
```bash
cd veo-mcp
npm install
npm run build
```
### 2. Get API Key
1. Go to [Google AI Studio](https://aistudio.google.com/app/apikey)
2. Create API key
3. **Enable Veo 3.1** in your project (billing required)
### 3. Configure
```bash
cp environment.template .env
# Edit .env and set GEMINI_API_KEY to your key
```
### 4. Add to Cursor
Add to `~/.cursor/mcp.json`, replacing the path with the absolute path to your built `dist/index.js`:
```json
{
  "mcpServers": {
    "veo": {
      "command": "node",
      "args": ["C:\\path\\to\\veo-mcp\\dist\\index.js"],
      "env": {
        "GEMINI_API_KEY": "your_api_key_here"
      }
    }
  }
}
```
Restart Cursor. Done! ✅
---
## 🛠️ Tools
### 1. `start_video_generation` - Generate Video
**Basic text-to-video:**
```json
{
  "prompt": "A serene Zen garden at sunrise, cherry blossoms falling, cinematic"
}
```
**With reference images (token-efficient!):**
```json
{
  "prompt": "A futuristic cityscape at night, neon lights",
  "referenceImages": [{
    "source": "url",
    "url": "https://example.com/style.jpg"
  }],
  "durationSeconds": 8,
  "resolution": "1080p"
}
```
**First/last frame interpolation:**
```json
{
  "prompt": "Smooth transition between these scenes",
  "firstFrame": {
    "source": "file_path",
    "filePath": "C:\\first.jpg"
  },
  "lastFrame": {
    "source": "file_path",
    "filePath": "C:\\last.jpg"
  }
}
```
**Parameters:**
- `model` - `veo-3.1-generate-001` (quality) or `veo-3.1-fast-generate-001` (speed)
- `durationSeconds` - 4, 6, or 8
- `aspectRatio` - `16:9` or `9:16`
- `resolution` - `720p` or `1080p`
- `generateAudio` - Include synchronized audio (raises the per-second rate; see Pricing)
- `seed` - For reproducibility
- `sampleCount` - Generate 1-4 videos
### 2. `get_video_job` - Check Status
```json
{
  "operationName": "operations/xyz"
}
```
Returns status and video URLs when complete.
### 3. `upload_image` - Pre-Upload References
```json
{
  "source": "file_path",
  "filePath": "C:\\style-ref.jpg"
}
```
Returns `fileUri` valid for 48 hours. Reuse across multiple generations!
### 4. `extend_video` - Extend Videos
```json
{
  "videoFileUri": "files/abc123",
  "additionalSeconds": 7,
  "prompt": "Continue with the character walking into the sunset"
}
```
### 5. `start_batch_video_generation` - Batch Generate
```json
{
  "jobs": [
    {"key": "scene1", "request": {"prompt": "..."}},
    {"key": "scene2", "request": {"prompt": "..."}}
  ],
  "concurrency": 3
}
```
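The `concurrency` field caps how many jobs are in flight at once. The sketch below shows one common way to enforce such a cap; it assumes a hypothetical worker callback standing in for a single `start_video_generation` call and is not taken from this server's source.

```typescript
// Run `worker` over every item with at most `limit` calls in flight at once.
// Results are returned in input order, so they line up with the `jobs` array.
async function runWithConcurrency<T, R>(
  items: T[],
  limit: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be a positive integer");
  }
  const results = new Array<R>(items.length);
  let next = 0; // index of the next unclaimed item

  // Each lane keeps claiming the next index until the list is exhausted.
  // `next++` is safe: JavaScript runs one lane at a time between awaits.
  async function lane(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i]);
    }
  }

  const lanes = Array.from({ length: Math.min(limit, items.length) }, () => lane());
  await Promise.all(lanes);
  return results;
}

// Hypothetical job shape mirroring the JSON above.
interface BatchJob {
  key: string;
  request: { prompt: string };
}

// Usage with a stand-in worker that just returns a fake operation name.
const jobs: BatchJob[] = [
  { key: "scene1", request: { prompt: "..." } },
  { key: "scene2", request: { prompt: "..." } },
];
runWithConcurrency(jobs, 3, async (job) => `operations/${job.key}`).then((ops) => console.log(ops));
```

Because each lane pulls the next unclaimed job, one slow job does not hold up the rest. A rejected job rejects the whole `Promise.all`, so a fuller version would probably catch per-job errors and report them under the job's `key`.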
### 6. `estimate_veo_cost` - Cost Estimation
```json
{
  "model": "veo-3.1-fast-generate-001",
  "durationSeconds": 8,
  "sampleCount": 1,
  "generateAudio": false
}
```
Returns estimated cost in USD.
---
## 💰 Pricing
| Model | Video Only | Video + Audio |
|-------|------------|---------------|
| **veo-3.1-generate-001** (quality) | $0.20/sec | $0.40/sec |
| **veo-3.1-fast-generate-001** (speed) | $0.10/sec | $0.15/sec |
**Example Costs:**
- 8s video (fast, no audio): **$0.80**
- 8s video (quality, with audio): **$3.20**
- 4s video (fast, no audio): **$0.40**
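The example costs come from multiplying the per-second rate by the duration and by the number of videos. A sketch of that arithmetic, with rates copied from the table above; `estimateCostUsd` is an illustrative name rather than this server's API, and it assumes each extra sample is billed at the same per-second rate.

```typescript
// Per-second USD rates from the pricing table; check current Google pricing before relying on them.
type VeoModel = "veo-3.1-generate-001" | "veo-3.1-fast-generate-001";

const RATES: Record<VeoModel, { videoOnly: number; withAudio: number }> = {
  "veo-3.1-generate-001": { videoOnly: 0.2, withAudio: 0.4 },
  "veo-3.1-fast-generate-001": { videoOnly: 0.1, withAudio: 0.15 },
};

interface CostInput {
  model: VeoModel;
  durationSeconds: number;
  sampleCount?: number; // defaults to 1
  generateAudio?: boolean; // defaults to false
}

// cost = rate per second × seconds × number of videos, rounded to whole cents
function estimateCostUsd({ model, durationSeconds, sampleCount = 1, generateAudio = false }: CostInput): number {
  const rate = generateAudio ? RATES[model].withAudio : RATES[model].videoOnly;
  return Math.round(rate * durationSeconds * sampleCount * 100) / 100;
}

console.log(estimateCostUsd({ model: "veo-3.1-fast-generate-001", durationSeconds: 8 })); // 0.8
console.log(estimateCostUsd({ model: "veo-3.1-generate-001", durationSeconds: 8, generateAudio: true })); // 3.2
```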
---
## 📊 Limits & Constraints
| Parameter | Limit |
|-----------|-------|
| Duration | 4, 6, or 8 seconds |
| Reference images | 0-3 images |
| Sample count | 1-4 videos |
| Resolutions | 720p, 1080p |
| Aspect ratios | 16:9, 9:16 |
| Rate limit | ~50 requests/min |
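Most of these limits can be checked locally before a request is sent. A minimal validation sketch, assuming a hypothetical `VeoRequest` type whose field names follow the parameter list above; the server's own validation may differ, and the rate limit is not checked here.

```typescript
// Hypothetical request shape; only the fields constrained by the limits table.
interface VeoRequest {
  prompt: string;
  durationSeconds?: number;
  aspectRatio?: string;
  resolution?: string;
  sampleCount?: number;
  referenceImages?: unknown[];
}

const DURATIONS = new Set([4, 6, 8]);
const ASPECT_RATIOS = new Set(["16:9", "9:16"]);
const RESOLUTIONS = new Set(["720p", "1080p"]);

// Returns human-readable problems; an empty array means the request fits the table.
function validateRequest(req: VeoRequest): string[] {
  const errors: string[] = [];
  if (req.durationSeconds !== undefined && !DURATIONS.has(req.durationSeconds)) {
    errors.push("durationSeconds must be 4, 6, or 8");
  }
  if (req.aspectRatio !== undefined && !ASPECT_RATIOS.has(req.aspectRatio)) {
    errors.push("aspectRatio must be 16:9 or 9:16");
  }
  if (req.resolution !== undefined && !RESOLUTIONS.has(req.resolution)) {
    errors.push("resolution must be 720p or 1080p");
  }
  const n = req.sampleCount;
  if (n !== undefined && (!Number.isInteger(n) || n < 1 || n > 4)) {
    errors.push("sampleCount must be an integer from 1 to 4");
  }
  if ((req.referenceImages?.length ?? 0) > 3) {
    errors.push("at most 3 reference images are allowed");
  }
  return errors;
}

// Flags both the 5-second duration and the sample count of 10.
console.log(validateRequest({ prompt: "A forest at dawn", durationSeconds: 5, sampleCount: 10 }));
```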
---
## 💡 Usage Examples
### Simple Text-to-Video
```
Generate an 8-second video of a peaceful forest scene with morning mist
```
### With Style Reference
```
Create a video of a tech startup office, using this image for style: C:\ref.jpg
```
### Frame Interpolation
```
Generate a smooth transition between first.jpg and last.jpg, 8 seconds, cinematic camera movement
```
### Batch Generation
```
Generate 5 different video variations of a product showcase with different angles
```
---
## 🔍 How Token Efficiency Works
### ❌ Naive Approach (Base64)
```jsonc
{
  "referenceImages": [{
    "base64": "iVBORw0KGgo..." // 500KB → ~50,000 tokens!
  }]
}
```
**Cost:** Every call resends the full base64 payload, so token usage grows with image size.
### ✅ Token-Efficient (This MCP)
```jsonc
{
  "referenceImages": [{
    "source": "url",
    "url": "https://example.com/ref.jpg" // ~20 tokens
  }]
}
```
**What Happens:**
1. Server downloads image (no tokens)
2. Computes SHA-256 hash
3. Checks cache (48h validity)
4. Uploads to Files API if needed (~1s)
5. Uses short `files/abc123` URI (~5 tokens)
**Savings:** 97%+ fewer tokens! 🎉
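The steps above describe a content-addressed cache: identical image bytes hash to the same key, so they are uploaded once and the cached `files/...` URI is reused until it expires. A sketch of such a cache, assuming a hypothetical synchronous `upload` callback (a real Files API upload is asynchronous) and an injectable clock so expiry can be exercised:

```typescript
import { createHash } from "node:crypto";

const TTL_MS = 48 * 60 * 60 * 1000; // Files API uploads are kept for 48 hours

interface CacheEntry {
  fileUri: string;
  expiresAt: number; // epoch milliseconds
}

// Maps SHA-256(image bytes) -> Files API URI so identical images are uploaded once.
class UploadCache {
  private entries = new Map<string, CacheEntry>();

  // `upload` is a hypothetical stand-in for the real Files API upload.
  getOrUpload(bytes: Uint8Array, upload: (bytes: Uint8Array) => string, now: number = Date.now()): string {
    const hash = createHash("sha256").update(bytes).digest("hex");
    const hit = this.entries.get(hash);
    if (hit && hit.expiresAt > now) return hit.fileUri; // still valid: reuse the URI
    const fileUri = upload(bytes); // missing or expired: upload again
    this.entries.set(hash, { fileUri, expiresAt: now + TTL_MS });
    return fileUri;
  }
}

// Usage: the second call with the same bytes is a cache hit, so `uploads` stays at 1.
const cache = new UploadCache();
let uploads = 0;
const fakeUpload = () => `files/upload${++uploads}`;
const img = new TextEncoder().encode("same image bytes");
cache.getOrUpload(img, fakeUpload, 0);
cache.getOrUpload(img, fakeUpload, 1000);
```

Keying on the hash of the bytes rather than on the URL or file path means the same image reached through two different sources still hits the cache.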
---
## ⏱️ Generation Times
| Configuration | Typical Time |
|---------------|-------------|
| 4s, 720p, no audio | 30-60 sec |
| 8s, 1080p, no audio | 60-120 sec |
| 8s, 1080p, with audio | 90-150 sec |
| With references | +10-30 sec |
| Frame interpolation | +20-40 sec |
**Note:** Times vary based on prompt complexity and server load.
---
## 🎨 Best Practices
### 1. Start Small, Scale Up
```
Step 1: Generate 1 video at 720p
Step 2: If good, regenerate at 1080p
Step 3: Use batch for variations
```
### 2. Use Fast Model for Testing
```jsonc
{
  "model": "veo-3.1-fast-generate-001", // Testing
  "resolution": "720p"
}
```
Switch to quality model for final:
```jsonc
{
  "model": "veo-3.1-generate-001", // Final
  "resolution": "1080p"
}
```
### 3. Pre-Upload Frequently Used References
```jsonc
// Step 1: upload once with upload_image
{"source": "file_path", "filePath": "brand-style.jpg"}
// Returns: files/xyz123

// Step 2: reuse the URI in every start_video_generation call
{
  "referenceImages": [{"source": "file_uri", "fileUri": "files/xyz123"}]
}
```
### 4. Leverage Batch for Variations
```json
{
  "jobs": [
    {"key": "v1", "request": {"prompt": "Scene 1...", "seed": 1}},
    {"key": "v2", "request": {"prompt": "Scene 1...", "seed": 2}},
    {"key": "v3", "request": {"prompt": "Scene 1...", "seed": 3}}
  ]
}
```
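Writing seed variants by hand gets tedious beyond a few. A hypothetical helper (not part of this server) that builds the same `jobs` shape for any number of seeds:

```typescript
// Hypothetical job shape matching the batch example above.
interface SeededJob {
  key: string;
  request: { prompt: string; seed: number };
}

// One job per seed, keyed v1, v2, ... with consecutive seeds starting at `firstSeed`.
function seedVariations(prompt: string, count: number, firstSeed = 1): SeededJob[] {
  return Array.from({ length: count }, (_, i) => ({
    key: `v${i + 1}`,
    request: { prompt, seed: firstSeed + i },
  }));
}

// Prints a payload equivalent to the JSON above.
console.log(JSON.stringify({ jobs: seedVariations("Scene 1...", 3) }, null, 2));
```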
### 5. Monitor Costs
Always estimate before large batches:
```jsonc
// estimate_veo_cost
{
  "model": "veo-3.1-fast-generate-001",
  "durationSeconds": 8,
  "sampleCount": 4
}
// Returns: $3.20 estimate (8 s × 4 videos × $0.10/s)
```
---
## 🎬 Async Operation Flow
Veo uses async long-running operations:
```
1. start_video_generation
   ↓ Returns operationName immediately
2. get_video_job (poll every 10-30s)
   ↓ Returns {done: false, status: "RUNNING"}
3. get_video_job (after 60-120s)
   ↓ Returns {done: true, videos: [{videoUri: "..."}]}
4. Download video from videoUri
```
**Tip:** Avoid polling more often than every 10 seconds.
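The flow above is a standard poll-until-done loop. A sketch, assuming a hypothetical `getJob` callback that wraps one `get_video_job` call and returns the `done`/`status`/`videos` fields shown above; the 10-second floor mirrors the tip, and the 10-minute timeout is an arbitrary illustrative default.

```typescript
// Minimal shape of a get_video_job result, taken from the flow above.
interface JobStatus {
  done: boolean;
  status?: string;
  videos?: { videoUri: string }[];
}

interface PollOptions {
  intervalMs?: number; // wait between polls; never less than 10 s
  timeoutMs?: number; // give up once the total wait would exceed this
  sleep?: (ms: number) => Promise<void>; // injectable so tests need not really wait
}

const defaultSleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Polls `getJob` until it reports done, or throws on timeout.
// Elapsed time is the sum of sleeps; time spent inside `getJob` is not counted.
async function pollUntilDone(
  getJob: () => Promise<JobStatus>,
  { intervalMs = 15_000, timeoutMs = 10 * 60_000, sleep = defaultSleep }: PollOptions = {},
): Promise<JobStatus> {
  const wait = Math.max(intervalMs, 10_000);
  let waited = 0;
  for (;;) {
    const job = await getJob();
    if (job.done) return job;
    if (waited + wait > timeoutMs) {
      throw new Error(`operation still ${job.status ?? "running"} after ${waited} ms`);
    }
    await sleep(wait);
    waited += wait;
  }
}

// Usage (hypothetical client): const job = await pollUntilDone(() => client.getVideoJob("operations/xyz"));
```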
---
## 🆘 Troubleshooting
### "API not enabled" (403)
1. Go to Google Cloud Console
2. Enable "Generative Language API"
3. Enable billing
4. Wait 5-10 minutes for propagation
### "Rate limit exceeded"
- Veo allows ~50 requests/min
- Use batch tool with `concurrency: 3`
- Add delays between requests
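One way to add those delays is to space requests evenly across the minute. A sketch, assuming the ~50 requests/min figure above (actual quotas depend on your project); `MinIntervalLimiter` is a hypothetical name, and the caller is expected to sleep for the returned delay before sending.

```typescript
// Spaces calls evenly so they stay under a requests-per-minute budget.
class MinIntervalLimiter {
  private nextSlot = 0; // earliest time (ms) the next request may start
  private readonly gapMs: number;

  constructor(requestsPerMinute: number) {
    this.gapMs = 60_000 / requestsPerMinute;
  }

  // Reserves the next free slot and returns how many ms to wait before sending.
  reserve(now: number = Date.now()): number {
    const start = Math.max(now, this.nextSlot);
    this.nextSlot = start + this.gapMs;
    return start - now;
  }
}

const limiter = new MinIntervalLimiter(50); // one request every 1200 ms
// Before each request:
//   const waitMs = limiter.reserve();
//   await new Promise((resolve) => setTimeout(resolve, waitMs));
//   ...then call start_video_generation
```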
### "Invalid aspect ratio with references"
- 9:16 may not work with reference images
- Use 16:9 for reference mode
- Check Veo 3.1 docs for updates
### "Video extension failed"
- Only videos produced by a previous Veo job can be extended
- Arbitrary MP4s are rejected
### Long generation times
- 1080p takes longer than 720p
- Audio generation adds time
- Reference images add processing
- Frame interpolation is slowest
---
## 📚 Resources
- [Veo 3.1 Documentation](https://ai.google.dev/gemini-api/docs/video)
- [Vertex AI Pricing](https://cloud.google.com/vertex-ai/generative-ai/pricing)
- [Google AI Studio](https://aistudio.google.com/)
- [Files API Guide](https://ai.google.dev/gemini-api/docs/files)
---
## 🎯 Status: Production Ready ✅
- ✅ All 6 tools implemented
- ✅ Token-efficient file handling
- ✅ Async operation support
- ✅ Batch generation with concurrency control
- ✅ Cost estimation
- ✅ Comprehensive validation
- ✅ Error handling
- ✅ Full documentation
**Ready to generate amazing videos!** 🚀
---
**Built with 🎬 for AI video generation**