text_to_audio
Convert text descriptions into AI-generated audio content including sound effects, ambient sounds, music, and atmospheric soundscapes.
Instructions
Generate AI-powered audio content from text descriptions using MMAudio technology. Create sound effects, ambient audio, music, and atmospheric soundscapes from natural language descriptions.
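A typical invocation from an MCP client sends a `tools/call` request naming this tool. The sketch below is illustrative only: the argument names and ranges come from the input schema documented in the next section, while the prompt values are made up.

```javascript
// Illustrative tools/call payload (MCP JSON-RPC framing); argument names and
// ranges follow the input schema below, the values are examples.
const callToolRequest = {
  method: 'tools/call',
  params: {
    name: 'text_to_audio',
    arguments: {
      prompt: 'rain falling on leaves, distant thunder',
      duration: 10,                     // seconds, 1-30
      num_steps: 25,                    // inference steps, 1-50
      cfg_strength: 4.5,                // guidance strength, 1-10
      negative_prompt: 'music, speech', // optional
      seed: 42,                         // fixed seed for reproducible output
    },
  },
};
```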
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| prompt | Yes | Describe the audio you want to generate (e.g., "rain falling on leaves", "coffee shop ambiance", "futuristic sci-fi sounds") | |
| duration | No | Duration of generated audio in seconds (1-30) | 8 |
| num_steps | No | Number of inference steps, 1-50 (higher = better quality, slower) | 25 |
| cfg_strength | No | Classifier-free guidance strength, 1-10 (higher = closer adherence to the prompt) | 4.5 |
| negative_prompt | No | Describe what you want to avoid in the generated audio (optional) | "" (empty) |
| seed | No | Random seed for reproducible results | 0 |
Implementation Reference
- server/index.js:351-425 (handler): Executes the text_to_audio tool: validates input, calls the external MMAudio API, handles errors, validates the response, and returns the audio generation result.

```javascript
async handleTextToAudio(args) {
  this.ensureConfigured();

  try {
    const input = TextToAudioInputSchema.parse(args);

    console.error(`[MMAudio] Starting text-to-audio generation for prompt: "${input.prompt}"`);

    const response = await fetch(`${this.config.baseUrl}/api/text-to-audio`, {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'Authorization': `Bearer ${this.config.apiKey}`,
        'User-Agent': 'MMAudio-MCP/1.0.0',
      },
      body: JSON.stringify(input),
      timeout: this.config.timeout,
    });

    if (!response.ok) {
      const errorText = await response.text();
      let errorMessage = `HTTP ${response.status}`;

      try {
        const errorData = JSON.parse(errorText);
        errorMessage = errorData.error || errorMessage;
      } catch {
        errorMessage = errorText || errorMessage;
      }

      if (response.status === 401) {
        throw new McpError(ErrorCode.InvalidRequest, 'Invalid API key. Please check your MMAudio API key.');
      } else if (response.status === 403) {
        throw new McpError(ErrorCode.InvalidRequest, 'Insufficient credits for text-to-audio generation.');
      } else if (response.status === 429) {
        throw new McpError(ErrorCode.InvalidRequest, 'Rate limit exceeded. Please try again later.');
      }

      throw new Error(errorMessage);
    }

    const result = await response.json();
    const validatedResult = TextToAudioResponseSchema.parse(result);

    console.error(`[MMAudio] Text-to-audio generation completed successfully`);

    return {
      content: [
        {
          type: 'text',
          text: JSON.stringify({
            success: true,
            message: 'Audio generated successfully from text',
            result: {
              audio_url: validatedResult.audio.url,
              content_type: validatedResult.audio.content_type,
              file_name: validatedResult.audio.file_name,
              file_size: validatedResult.audio.file_size,
              duration: input.duration,
              prompt: input.prompt,
            }
          }, null, 2),
        },
      ],
    };
  } catch (error) {
    if (error instanceof z.ZodError) {
      throw new McpError(
        ErrorCode.InvalidParams,
        `Invalid input parameters: ${error.errors.map(e => `${e.path.join('.')}: ${e.message}`).join(', ')}`
      );
    }
    throw error;
  }
}
```
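On success the handler returns a single `text` content item whose `text` field is a JSON string. Parsed, it has roughly the shape below; the field names are taken from the handler above, while the values are placeholders.

```javascript
// Approximate shape of JSON.parse(result.content[0].text) on success.
// Field names come from the handler above; the values here are placeholders.
const exampleToolResult = {
  success: true,
  message: 'Audio generated successfully from text',
  result: {
    audio_url: 'https://cdn.example.com/outputs/audio_abc123.wav', // hosted file returned by the MMAudio API
    content_type: 'audio/wav',
    file_name: 'audio_abc123.wav',
    file_size: 1258291,               // bytes
    duration: 8,                      // echoes the requested duration
    prompt: 'rain falling on leaves', // echoes the original prompt
  },
};
```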
- server/index.js:47-54 (schema): Zod input schema for validating parameters of the text_to_audio tool.

```javascript
const TextToAudioInputSchema = z.object({
  prompt: z.string().min(1, 'Prompt is required'),
  duration: z.number().min(1).max(30).default(8),
  num_steps: z.number().int().min(1).max(50).default(25),
  cfg_strength: z.number().min(1).max(10).default(4.5),
  negative_prompt: z.string().optional().default(''),
  seed: z.number().int().optional().default(0),
});
```
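Given that schema, out-of-range or missing values are rejected before any API call is made; the handler converts the resulting `ZodError` into an `McpError` with code `InvalidParams`. A brief sketch of both paths:

```javascript
// Valid: only the required prompt is supplied, defaults fill in the rest.
TextToAudioInputSchema.parse({ prompt: 'ocean waves at night' });
// => { prompt: 'ocean waves at night', duration: 8, num_steps: 25,
//      cfg_strength: 4.5, negative_prompt: '', seed: 0 }

// Invalid: empty prompt and a duration above the 30-second maximum both fail.
// The thrown ZodError is mapped to McpError(ErrorCode.InvalidParams, ...) in
// the handler's catch block.
TextToAudioInputSchema.parse({ prompt: '', duration: 60 }); // throws ZodError
```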
- server/index.js:174-219 (registration): MCP tool registration in the ListTools response, including name, description, and input schema definition.

```javascript
{
  name: 'text_to_audio',
  description: 'Generate AI-powered audio content from text descriptions using MMAudio technology. Create sound effects, ambient audio, music, and atmospheric soundscapes from natural language descriptions.',
  inputSchema: {
    type: 'object',
    properties: {
      prompt: {
        type: 'string',
        description: 'Describe the audio you want to generate (e.g., "rain falling on leaves", "coffee shop ambiance", "futuristic sci-fi sounds")',
      },
      duration: {
        type: 'number',
        minimum: 1,
        maximum: 30,
        default: 8,
        description: 'Duration of generated audio in seconds',
      },
      num_steps: {
        type: 'integer',
        minimum: 1,
        maximum: 50,
        default: 25,
        description: 'Number of inference steps (higher = better quality, slower)',
      },
      cfg_strength: {
        type: 'number',
        minimum: 1,
        maximum: 10,
        default: 4.5,
        description: 'Classifier-free guidance strength (higher = more adherence to prompt)',
      },
      negative_prompt: {
        type: 'string',
        description: 'Describe what you want to avoid in the generated audio (optional)',
        default: '',
      },
      seed: {
        type: 'integer',
        default: 0,
        description: 'Random seed for reproducible results',
      },
    },
    required: ['prompt'],
  },
},
```
- server/index.js:68-70 (schema): Zod response schema for validating the API response of the text_to_audio tool (references the shared AudioResponseSchema).

```javascript
const TextToAudioResponseSchema = z.object({
  audio: AudioResponseSchema,
});
```
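`AudioResponseSchema` is shared across tools and defined elsewhere in `server/index.js`. Judging from the fields the handler reads (`url`, `content_type`, `file_name`, `file_size`), its shape is roughly the following; this is a reconstruction for orientation, not the actual definition.

```javascript
// Hedged reconstruction of the shared AudioResponseSchema, inferred from the
// fields the handler accesses on validatedResult.audio. The real definition
// lives elsewhere in server/index.js and may include additional fields.
const AudioResponseSchema = z.object({
  url: z.string(),
  content_type: z.string(),
  file_name: z.string(),
  file_size: z.number(),
});
```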
- server/index.js:245-246 (registration): Dispatch case in the CallToolRequest handler that routes text_to_audio calls to the handler function.

```javascript
case 'text_to_audio':
  return await this.handleTextToAudio(args);
```
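For context, a dispatch case like this typically sits inside the server's `CallToolRequestSchema` handler. The sketch below shows the usual wiring with the MCP SDK; only the `text_to_audio` case is taken from the source, the surrounding structure is an assumption.

```javascript
// Typical wiring around the dispatch case (assumed; only the text_to_audio
// case is taken from server/index.js:245-246). `this` refers to the server
// class that also defines handleTextToAudio.
import { CallToolRequestSchema, McpError, ErrorCode } from '@modelcontextprotocol/sdk/types.js';

this.server.setRequestHandler(CallToolRequestSchema, async (request) => {
  const { name, arguments: args } = request.params;

  switch (name) {
    case 'text_to_audio':
      return await this.handleTextToAudio(args);
    // ...cases for the server's other tools...
    default:
      throw new McpError(ErrorCode.MethodNotFound, `Unknown tool: ${name}`);
  }
});
```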