text_to_audio
Convert text to audio with customizable voice, speed, and emotion, saving the file to a specified directory. Integrates with the MiniMax API for high-quality speech synthesis.
Instructions
Convert text to audio with a given voice and save the output audio file to a given directory. If no directory is provided, the file is saved to the desktop. If no voice ID is provided, the default voice is used.
Note: This tool calls the MiniMax API and may incur costs. Use it only when the user explicitly requests it.
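As a rough illustration of how a client might invoke this tool, the sketch below builds a JSON-RPC 2.0 `tools/call` request of the kind used by the Model Context Protocol. It assumes the tool is exposed through an MCP server under the name `text_to_audio`. The helper name `build_text_to_audio_call` and the sample argument values are hypothetical, and the code only constructs the payload; it does not send anything or call the MiniMax API.

```python
import json


def build_text_to_audio_call(text, request_id=1, **arguments):
    """Build a JSON-RPC 2.0 `tools/call` request for the text_to_audio tool.

    Hypothetical helper for illustration; `text` is the only required argument.
    """
    if not isinstance(text, str) or not text:
        raise ValueError("text is required and must be a non-empty string")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "text_to_audio",
            # Optional arguments are passed through unchanged.
            "arguments": {"text": text, **arguments},
        },
    }


request = build_text_to_audio_call(
    "Hello, world.",
    voiceId="female-shaonv",
    outputDirectory="workspace",
)
print(json.dumps(request, indent=2))
```

Because the tool may incur costs, a client would normally build and send such a request only after the user has explicitly asked for audio output.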
Input Schema
Name | Required | Description | Default |
---|---|---|---|
bitrate | No | Bitrate (bps), values: [64000, 96000, 128000, 160000, 192000, 224000, 256000, 320000] | 128000 |
channel | No | Audio channels, values: [1, 2] | 1 |
emotion | No | Speech emotion, values: ["happy", "sad", "angry", "fearful", "disgusted", "surprised", "neutral"] | happy |
format | No | Audio format, values: ["pcm", "mp3", "flac", "wav"] | mp3 |
languageBoost | No | Enhance the ability to recognize specified languages and dialects. Supported values include: 'Chinese', 'Chinese,Yue', 'English', 'Arabic', 'Russian', 'Spanish', 'French', 'Portuguese', 'German', 'Turkish', 'Dutch', 'Ukrainian', 'Vietnamese', 'Indonesian', 'Japanese', 'Italian', 'Korean', 'Thai', 'Polish', 'Romanian', 'Greek', 'Czech', 'Finnish', 'Hindi', 'auto' | auto |
model | No | Model to use | speech-02-hd |
outputDirectory | No | The directory to save the output file. `outputDirectory` is relative to `MINIMAX_MCP_BASE_PATH` (or `basePath` in config). The final save path is `${basePath}/${outputDirectory}`. For example, if `MINIMAX_MCP_BASE_PATH=~/Desktop` and `outputDirectory=workspace`, the output will be saved to `~/Desktop/workspace/` | |
outputFile | No | Path to save the generated audio file; generated automatically if not provided | |
pitch | No | Speech pitch, from -12 to 12 | 0 |
sampleRate | No | Sample rate (Hz), values: [8000, 16000, 22050, 24000, 32000, 44100] | 32000 |
speed | No | Speech speed, from 0.5 to 2 | 1 |
subtitleEnable | No | Whether the subtitle service is enabled. The model must be 'speech-01-turbo' or 'speech-01-hd' | false |
text | Yes | Text to convert to audio | |
voiceId | No | Voice ID to use, e.g. "female-shaonv" | male-qn-qingse |
vol | No | Speech volume, from 0.1 to 10 | 1 |
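The `outputDirectory` rule in the table (`${basePath}/${outputDirectory}`) can be sketched as a simple path join. This is a minimal illustration, not the server's actual code: the helper `resolve_output_dir` is hypothetical, it does not expand `~`, and rejecting absolute values is an assumption made here because the description says the directory is relative to the base path.

```python
from pathlib import PurePosixPath


def resolve_output_dir(base_path: str, output_directory: str = "") -> str:
    """Join base_path and output_directory as `${basePath}/${outputDirectory}`.

    Hypothetical helper; the real server may normalize paths differently.
    """
    if output_directory.startswith("/"):
        # Assumption: only relative directories are accepted, since an
        # absolute value would otherwise replace the base path in the join.
        raise ValueError("outputDirectory must be relative to the base path")
    # An empty output_directory leaves the base path unchanged.
    return str(PurePosixPath(base_path) / output_directory)


print(resolve_output_dir("~/Desktop", "workspace"))  # ~/Desktop/workspace
```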
Input Schema (JSON Schema)
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "additionalProperties": false,
  "properties": {
    "bitrate": {
      "default": 128000,
      "description": "Bitrate (bps), values: [64000, 96000, 128000, 160000, 192000, 224000, 256000, 320000]",
      "type": "number"
    },
    "channel": {
      "default": 1,
      "description": "Audio channels, values: [1, 2]",
      "type": "number"
    },
    "emotion": {
      "default": "happy",
      "description": "Speech emotion, values: [\"happy\", \"sad\", \"angry\", \"fearful\", \"disgusted\", \"surprised\", \"neutral\"]",
      "type": "string"
    },
    "format": {
      "default": "mp3",
      "description": "Audio format, values: [\"pcm\", \"mp3\",\"flac\", \"wav\"]",
      "type": "string"
    },
    "languageBoost": {
      "default": "auto",
      "description": "Enhance the ability to recognize specified languages and dialects. Supported values include: 'Chinese', 'Chinese,Yue', 'English', 'Arabic', 'Russian', 'Spanish', 'French', 'Portuguese', 'German', 'Turkish', 'Dutch', 'Ukrainian', 'Vietnamese', 'Indonesian', 'Japanese', 'Italian', 'Korean', 'Thai', 'Polish', 'Romanian', 'Greek', 'Czech', 'Finnish', 'Hindi', 'auto', default is 'auto'",
      "type": "string"
    },
    "model": {
      "default": "speech-02-hd",
      "description": "Model to use",
      "type": "string"
    },
    "outputDirectory": {
      "description": "The directory to save the output file. `outputDirectory` is relative to `MINIMAX_MCP_BASE_PATH` (or `basePath` in config). The final save path is `${basePath}/${outputDirectory}`. For example, if `MINIMAX_MCP_BASE_PATH=~/Desktop` and `outputDirectory=workspace`, the output will be saved to `~/Desktop/workspace/`",
      "type": "string"
    },
    "outputFile": {
      "description": "Path to save the generated audio file, automatically generated if not provided",
      "type": "string"
    },
    "pitch": {
      "default": 0,
      "description": "Speech pitch",
      "maximum": 12,
      "minimum": -12,
      "type": "number"
    },
    "sampleRate": {
      "default": 32000,
      "description": "Sample rate (Hz), values: [8000, 16000, 22050, 24000, 32000, 44100]",
      "type": "number"
    },
    "speed": {
      "default": 1,
      "description": "Speech speed",
      "maximum": 2,
      "minimum": 0.5,
      "type": "number"
    },
    "subtitleEnable": {
      "default": false,
      "description": "The parameter controls whether the subtitle service is enabled. The model must be 'speech-01-turbo' or 'speech-01-hd'. If this parameter is not provided, the default value is false",
      "type": "boolean"
    },
    "text": {
      "description": "Text to convert to audio",
      "type": "string"
    },
    "voiceId": {
      "default": "male-qn-qingse",
      "description": "Voice ID to use, e.g. \"female-shaonv\"",
      "type": "string"
    },
    "vol": {
      "default": 1,
      "description": "Speech volume",
      "maximum": 10,
      "minimum": 0.1,
      "type": "number"
    }
  },
  "required": [
    "text"
  ],
  "type": "object"
}
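Note that the schema enforces numeric ranges (`minimum`/`maximum`) for `pitch`, `speed`, and `vol`, but lists the permitted values for `bitrate`, `channel`, `emotion`, `format`, and `sampleRate` only in their descriptions, not as `enum` constraints. A client that wants to catch bad values before making a billable call could check both itself. The following is a minimal sketch under that assumption; `validate_arguments` and the constant names are hypothetical, and it does not check `languageBoost`, `model`, or `voiceId` values.

```python
# Ranges taken from the schema's minimum/maximum keywords.
NUMERIC_RANGES = {"pitch": (-12, 12), "speed": (0.5, 2), "vol": (0.1, 10)}

# Value lists taken from the property descriptions (not enforced by the schema).
ALLOWED_VALUES = {
    "bitrate": {64000, 96000, 128000, 160000, 192000, 224000, 256000, 320000},
    "channel": {1, 2},
    "emotion": {"happy", "sad", "angry", "fearful", "disgusted", "surprised", "neutral"},
    "format": {"pcm", "mp3", "flac", "wav"},
    "sampleRate": {8000, 16000, 22050, 24000, 32000, 44100},
}

KNOWN_KEYS = set(NUMERIC_RANGES) | set(ALLOWED_VALUES) | {
    "languageBoost", "model", "outputDirectory", "outputFile",
    "subtitleEnable", "text", "voiceId",
}


def validate_arguments(args: dict) -> list:
    """Return a list of human-readable problems; an empty list means OK."""
    errors = []
    if not isinstance(args.get("text"), str):
        errors.append("text is required and must be a string")
    # The schema sets additionalProperties to false.
    for key in sorted(set(args) - KNOWN_KEYS):
        errors.append(f"unknown property: {key}")
    for key, (low, high) in NUMERIC_RANGES.items():
        if key in args:
            value = args[key]
            is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
            if not is_number or not (low <= value <= high):
                errors.append(f"{key} must be a number between {low} and {high}")
    for key, allowed in ALLOWED_VALUES.items():
        if key in args and args[key] not in allowed:
            errors.append(f"{key} has an unsupported value: {args[key]!r}")
    return errors


print(validate_arguments({"text": "Hello", "speed": 3, "format": "ogg"}))
```

Returning a list of problems rather than raising on the first one lets a caller report every issue at once before any request is sent.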