video_to_audio

Extract or generate synchronized audio from video content using AI, allowing customization of sound effects, ambient noise, and atmospheric elements based on descriptive prompts.

Instructions

Generate AI-powered audio from video content using MMAudio technology. Analyzes video frames and generates synchronized audio including sound effects, ambient noise, and atmospheric elements.

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| video_url | Yes | URL of the video file to generate audio for (supports mp4, webm, avi, mov formats) | |
| prompt | Yes | Describe the audio you want to generate (e.g., "forest sounds with birds chirping", "urban traffic noise", "peaceful ocean waves") | |
| negative_prompt | No | Describe what you want to avoid in the generated audio (optional) | "" |
| seed | No | Random seed for reproducible results (optional) | |
| num_steps | No | Number of inference steps (higher = better quality, slower) | 25 |
| duration | No | Duration of generated audio in seconds | 8 |
| cfg_strength | No | Classifier-free guidance strength (higher = more adherence to prompt) | 4.5 |
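
As a quick illustration, a client calling this tool might send arguments like the following; the URL and prompt values are placeholders:

    {
      "video_url": "https://example.com/clips/forest-walk.mp4",
      "prompt": "forest sounds with birds chirping",
      "negative_prompt": "music, human speech",
      "seed": 42,
      "num_steps": 25,
      "duration": 8,
      "cfg_strength": 4.5
    }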

Implementation Reference

  • The primary handler function for the 'video_to_audio' tool. It validates input with VideoToAudioInputSchema, sends a POST request to the MMAudio API (/api/video-to-audio), maps HTTP error statuses to MCP errors, parses and validates the response, and returns structured content describing the generated audio.
    async handleVideoToAudio(args) {
      this.ensureConfigured();
      try {
        const input = VideoToAudioInputSchema.parse(args);
        console.error(`[MMAudio] Starting video-to-audio generation for: ${input.video_url}`);

        const response = await fetch(`${this.config.baseUrl}/api/video-to-audio`, {
          method: 'POST',
          headers: {
            'Content-Type': 'application/json',
            'Authorization': `Bearer ${this.config.apiKey}`,
            'User-Agent': 'MMAudio-MCP/1.0.0',
          },
          body: JSON.stringify(input),
          timeout: this.config.timeout,
        });

        if (!response.ok) {
          const errorText = await response.text();
          let errorMessage = `HTTP ${response.status}`;
          try {
            const errorData = JSON.parse(errorText);
            errorMessage = errorData.error || errorMessage;
          } catch {
            errorMessage = errorText || errorMessage;
          }

          if (response.status === 401) {
            throw new McpError(ErrorCode.InvalidRequest, 'Invalid API key. Please check your MMAudio API key.');
          } else if (response.status === 403) {
            throw new McpError(ErrorCode.InvalidRequest, 'Insufficient credits for video-to-audio generation.');
          } else if (response.status === 429) {
            throw new McpError(ErrorCode.InvalidRequest, 'Rate limit exceeded. Please try again later.');
          }
          throw new Error(errorMessage);
        }

        const result = await response.json();
        const validatedResult = VideoToAudioResponseSchema.parse(result);
        console.error(`[MMAudio] Video-to-audio generation completed successfully`);

        return {
          content: [
            {
              type: 'text',
              text: JSON.stringify({
                success: true,
                message: 'Audio generated successfully from video',
                result: {
                  audio_url: validatedResult.video.url,
                  content_type: validatedResult.video.content_type,
                  file_name: validatedResult.video.file_name,
                  file_size: validatedResult.video.file_size,
                  duration: input.duration,
                  prompt: input.prompt,
                },
              }, null, 2),
            },
          ],
        };
      } catch (error) {
        if (error instanceof z.ZodError) {
          throw new McpError(
            ErrorCode.InvalidParams,
            `Invalid input parameters: ${error.errors.map(e => `${e.path.join('.')}: ${e.message}`).join(', ')}`
          );
        }
        throw error;
      }
    }
  • Zod input schema defining parameters for video_to_audio tool: video_url (required), prompt (required), and optional parameters for generation control.
    const VideoToAudioInputSchema = z.object({
      video_url: z.string().url('Invalid video URL'),
      prompt: z.string().min(1, 'Prompt is required'),
      negative_prompt: z.string().optional().default(''),
      seed: z.number().int().optional().nullable(),
      num_steps: z.number().int().min(1).max(50).default(25),
      duration: z.number().min(1).max(30).default(8),
      cfg_strength: z.number().min(1).max(10).default(4.5),
    });
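    // Illustrative sketch, not from the source: with zod's standard parse
    // semantics, omitted optional fields receive their defaults, and invalid
    // input throws a ZodError that the handler reports as InvalidParams.
    const exampleInput = VideoToAudioInputSchema.parse({
      video_url: 'https://example.com/clip.mp4',  // placeholder URL
      prompt: 'peaceful ocean waves',
    });
    // exampleInput.num_steps === 25, exampleInput.duration === 8,
    // exampleInput.cfg_strength === 4.5, exampleInput.negative_prompt === ''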
  • Zod response schema for video_to_audio tool output, referencing shared AudioResponseSchema (lines 57-62).
    const VideoToAudioResponseSchema = z.object({
      video: AudioResponseSchema,
    });
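    // Assumed shape of the shared AudioResponseSchema, inferred from the fields
    // the handler reads (url, content_type, file_name, file_size); the actual
    // definition at lines 57-62 of the source may differ.
    const AudioResponseSchema = z.object({
      url: z.string().url(),
      content_type: z.string(),
      file_name: z.string(),
      file_size: z.number(),
    });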
  • Tool registration in ListToolsRequestSchema handler, defining name, description, and JSON schema mirroring the Zod input schema.
    {
      name: 'video_to_audio',
      description: 'Generate AI-powered audio from video content using MMAudio technology. Analyzes video frames and generates synchronized audio including sound effects, ambient noise, and atmospheric elements.',
      inputSchema: {
        type: 'object',
        properties: {
          video_url: {
            type: 'string',
            format: 'uri',
            description: 'URL of the video file to generate audio for (supports mp4, webm, avi, mov formats)',
          },
          prompt: {
            type: 'string',
            description: 'Describe the audio you want to generate (e.g., "forest sounds with birds chirping", "urban traffic noise", "peaceful ocean waves")',
          },
          negative_prompt: {
            type: 'string',
            description: 'Describe what you want to avoid in the generated audio (optional)',
            default: '',
          },
          seed: {
            type: 'integer',
            description: 'Random seed for reproducible results (optional)',
            nullable: true,
          },
          num_steps: {
            type: 'integer',
            minimum: 1,
            maximum: 50,
            default: 25,
            description: 'Number of inference steps (higher = better quality, slower)',
          },
          duration: {
            type: 'number',
            minimum: 1,
            maximum: 30,
            default: 8,
            description: 'Duration of generated audio in seconds',
          },
          cfg_strength: {
            type: 'number',
            minimum: 1,
            maximum: 10,
            default: 4.5,
            description: 'Classifier-free guidance strength (higher = more adherence to prompt)',
          },
        },
        required: ['video_url', 'prompt'],
      },
    },
  • Dispatch case in CallToolRequestSchema handler that routes 'video_to_audio' calls to the handleVideoToAudio method.
    case 'video_to_audio': return await this.handleVideoToAudio(args);
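
For context, a minimal sketch of the surrounding CallToolRequestSchema handler, assuming the standard MCP SDK dispatch pattern (the default branch is illustrative):

    this.server.setRequestHandler(CallToolRequestSchema, async (request) => {
      const { name, arguments: args } = request.params;
      switch (name) {
        case 'video_to_audio':
          return await this.handleVideoToAudio(args);
        // ...cases for the server's other tools...
        default:
          throw new McpError(ErrorCode.MethodNotFound, `Unknown tool: ${name}`);
      }
    });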

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/mmaudio/mmaudio-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.