# video_to_audio
Extract or generate synchronized audio from video content using AI, allowing customization of sound effects, ambient noise, and atmospheric elements based on descriptive prompts.
## Instructions
Generate AI-powered audio from video content using MMAudio technology. Analyzes video frames and generates synchronized audio including sound effects, ambient noise, and atmospheric elements.
## Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| video_url | Yes | URL of the video file to generate audio for (supports mp4, webm, avi, and mov formats) | |
| prompt | Yes | Description of the audio to generate (e.g., "forest sounds with birds chirping", "urban traffic noise", "peaceful ocean waves") | |
| negative_prompt | No | Description of what to avoid in the generated audio | `""` |
| seed | No | Random seed for reproducible results | |
| num_steps | No | Number of inference steps, 1–50 (higher = better quality, slower) | `25` |
| duration | No | Duration of the generated audio in seconds, 1–30 | `8` |
| cfg_strength | No | Classifier-free guidance strength, 1–10 (higher = closer adherence to the prompt) | `4.5` |
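
A typical set of tool arguments might look like the following (the URL and prompt values are illustrative, not from the source):

```json
{
  "video_url": "https://example.com/clip.mp4",
  "prompt": "gentle rain on a tin roof",
  "negative_prompt": "music, speech",
  "num_steps": 25,
  "duration": 8,
  "cfg_strength": 4.5
}
```

Omitted optional fields fall back to the defaults shown in the table above.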
## Implementation Reference
- server/index.js:272-346 (handler) — The primary handler for the `video_to_audio` tool. It validates input with `VideoToAudioInputSchema`, sends a POST request to the MMAudio API (`/api/video-to-audio`), maps HTTP errors to MCP errors, parses and validates the response, and returns structured content with the audio details.

```javascript
async handleVideoToAudio(args) {
  this.ensureConfigured();
  try {
    const input = VideoToAudioInputSchema.parse(args);
    console.error(`[MMAudio] Starting video-to-audio generation for: ${input.video_url}`);

    const response = await fetch(`${this.config.baseUrl}/api/video-to-audio`, {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'Authorization': `Bearer ${this.config.apiKey}`,
        'User-Agent': 'MMAudio-MCP/1.0.0',
      },
      body: JSON.stringify(input),
      timeout: this.config.timeout,
    });

    if (!response.ok) {
      const errorText = await response.text();
      let errorMessage = `HTTP ${response.status}`;
      try {
        const errorData = JSON.parse(errorText);
        errorMessage = errorData.error || errorMessage;
      } catch {
        errorMessage = errorText || errorMessage;
      }
      if (response.status === 401) {
        throw new McpError(ErrorCode.InvalidRequest, 'Invalid API key. Please check your MMAudio API key.');
      } else if (response.status === 403) {
        throw new McpError(ErrorCode.InvalidRequest, 'Insufficient credits for video-to-audio generation.');
      } else if (response.status === 429) {
        throw new McpError(ErrorCode.InvalidRequest, 'Rate limit exceeded. Please try again later.');
      }
      throw new Error(errorMessage);
    }

    const result = await response.json();
    const validatedResult = VideoToAudioResponseSchema.parse(result);
    console.error(`[MMAudio] Video-to-audio generation completed successfully`);

    return {
      content: [
        {
          type: 'text',
          text: JSON.stringify({
            success: true,
            message: 'Audio generated successfully from video',
            result: {
              audio_url: validatedResult.video.url,
              content_type: validatedResult.video.content_type,
              file_name: validatedResult.video.file_name,
              file_size: validatedResult.video.file_size,
              duration: input.duration,
              prompt: input.prompt,
            }
          }, null, 2),
        },
      ],
    };
  } catch (error) {
    if (error instanceof z.ZodError) {
      throw new McpError(
        ErrorCode.InvalidParams,
        `Invalid input parameters: ${error.errors.map(e => `${e.path.join('.')}: ${e.message}`).join(', ')}`
      );
    }
    throw error;
  }
}
```
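
The status-code branching in the handler can be sketched in isolation. This is illustrative only: the real handler throws `McpError` from the MCP SDK, while this standalone function simply returns the message each branch would use.

```javascript
// Standalone sketch of the HTTP-status handling in handleVideoToAudio.
// `fallback` stands in for the message parsed from the error body
// (or `HTTP <status>` when no body is available).
function statusToErrorMessage(status, fallback) {
  switch (status) {
    case 401: return 'Invalid API key. Please check your MMAudio API key.';
    case 403: return 'Insufficient credits for video-to-audio generation.';
    case 429: return 'Rate limit exceeded. Please try again later.';
    default:  return fallback; // any other failure surfaces the API's own message
  }
}
```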
- server/index.js:37-45 (schema) — Zod input schema defining the parameters for the `video_to_audio` tool: `video_url` (required), `prompt` (required), and optional parameters for generation control.

```javascript
const VideoToAudioInputSchema = z.object({
  video_url: z.string().url('Invalid video URL'),
  prompt: z.string().min(1, 'Prompt is required'),
  negative_prompt: z.string().optional().default(''),
  seed: z.number().int().optional().nullable(),
  num_steps: z.number().int().min(1).max(50).default(25),
  duration: z.number().min(1).max(30).default(8),
  cfg_strength: z.number().min(1).max(10).default(4.5),
});
```
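
The same validation and defaulting rules can be sketched in plain JavaScript, without the zod dependency. This is a hypothetical helper for illustration; the server itself relies on zod's `parse`:

```javascript
// Plain-JavaScript sketch of the rules VideoToAudioInputSchema encodes:
// required video_url and prompt, defaults applied, numeric ranges enforced.
function validateVideoToAudioInput(args) {
  if (typeof args.video_url !== 'string' || !/^https?:\/\//.test(args.video_url)) {
    throw new Error('Invalid video URL');
  }
  if (typeof args.prompt !== 'string' || args.prompt.length < 1) {
    throw new Error('Prompt is required');
  }
  const input = {
    video_url: args.video_url,
    prompt: args.prompt,
    negative_prompt: args.negative_prompt ?? '',
    seed: args.seed ?? null,
    num_steps: args.num_steps ?? 25,
    duration: args.duration ?? 8,
    cfg_strength: args.cfg_strength ?? 4.5,
  };
  if (input.num_steps < 1 || input.num_steps > 50) throw new Error('num_steps out of range');
  if (input.duration < 1 || input.duration > 30) throw new Error('duration out of range');
  if (input.cfg_strength < 1 || input.cfg_strength > 10) throw new Error('cfg_strength out of range');
  return input;
}
```

Calling it with only the two required fields yields the documented defaults for the rest.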
- server/index.js:64-66 (schema) — Zod response schema for the tool's output, wrapping the shared `AudioResponseSchema` (lines 57-62) under a `video` key.

```javascript
const VideoToAudioResponseSchema = z.object({
  video: AudioResponseSchema,
});
```
- server/index.js:124-173 (registration) — Tool registration in the `ListToolsRequestSchema` handler, defining the name, description, and a JSON Schema that mirrors the Zod input schema.

```javascript
{
  name: 'video_to_audio',
  description: 'Generate AI-powered audio from video content using MMAudio technology. Analyzes video frames and generates synchronized audio including sound effects, ambient noise, and atmospheric elements.',
  inputSchema: {
    type: 'object',
    properties: {
      video_url: {
        type: 'string',
        format: 'uri',
        description: 'URL of the video file to generate audio for (supports mp4, webm, avi, mov formats)',
      },
      prompt: {
        type: 'string',
        description: 'Describe the audio you want to generate (e.g., "forest sounds with birds chirping", "urban traffic noise", "peaceful ocean waves")',
      },
      negative_prompt: {
        type: 'string',
        description: 'Describe what you want to avoid in the generated audio (optional)',
        default: '',
      },
      seed: {
        type: 'integer',
        description: 'Random seed for reproducible results (optional)',
        nullable: true,
      },
      num_steps: {
        type: 'integer',
        minimum: 1,
        maximum: 50,
        default: 25,
        description: 'Number of inference steps (higher = better quality, slower)',
      },
      duration: {
        type: 'number',
        minimum: 1,
        maximum: 30,
        default: 8,
        description: 'Duration of generated audio in seconds',
      },
      cfg_strength: {
        type: 'number',
        minimum: 1,
        maximum: 10,
        default: 4.5,
        description: 'Classifier-free guidance strength (higher = more adherence to prompt)',
      },
    },
    required: ['video_url', 'prompt'],
  },
},
```
- server/index.js:243-244 (registration) — Dispatch case in the `CallToolRequestSchema` handler that routes `video_to_audio` calls to the `handleVideoToAudio` method.

```javascript
case 'video_to_audio':
  return await this.handleVideoToAudio(args);
```
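
Over MCP, a client reaches this dispatch case with a `tools/call` request. A minimal JSON-RPC message might look like this (the URL and prompt are illustrative):

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "video_to_audio",
    "arguments": {
      "video_url": "https://example.com/clip.mp4",
      "prompt": "gentle rain on a tin roof"
    }
  }
}
```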