text_to_audio
Convert text to speech audio files using customizable voices, emotions, and audio settings, then save them to a specified directory.
Instructions
Convert text to audio with a given voice and save the output audio file to a given directory. Directory is optional, if not provided, the output file will be saved to $HOME/Desktop. Voice id is optional, if not provided, the default voice will be used.
COST WARNING: This tool makes an API call to Minimax which may incur costs. Only use when explicitly requested by the user.
Args:
text (str): The text to convert to speech.
voice_id (str, optional): The id of the voice to use. For example, "male-qn-qingse"/"audiobook_female_1"/"cute_boy"/"Charming_Lady"...
model (string, optional): The model to use.
speed (float, optional): Speed of the generated audio. Controls the speed of the generated speech. Values range from 0.5 to 2.0, with 1.0 being the default speed.
vol (float, optional): Volume of the generated audio. Controls the volume of the generated speech. Values range from 0 to 10, with 1 being the default volume.
pitch (int, optional): Pitch of the generated audio. Controls the speed of the generated speech. Values range from -12 to 12, with 0 being the default speed.
emotion (str, optional): Emotion of the generated audio. Controls the emotion of the generated speech. Values range ["happy", "sad", "angry", "fearful", "disgusted", "surprised", "neutral"], with "happy" being the default emotion.
sample_rate (int, optional): Sample rate of the generated audio. Controls the sample rate of the generated speech. Values range [8000,16000,22050,24000,32000,44100] with 32000 being the default sample rate.
bitrate (int, optional): Bitrate of the generated audio. Controls the bitrate of the generated speech. Values range [32000,64000,128000,256000] with 128000 being the default bitrate.
channel (int, optional): Channel of the generated audio. Controls the channel of the generated speech. Values range [1, 2] with 1 being the default channel.
format (str, optional): Format of the generated audio. Controls the format of the generated speech. Values range ["pcm", "mp3","flac"] with "mp3" being the default format.
language_boost (str, optional): Language boost of the generated audio. Controls the language boost of the generated speech. Values range ['Chinese', 'Chinese,Yue', 'English', 'Arabic', 'Russian', 'Spanish', 'French', 'Portuguese', 'German', 'Turkish', 'Dutch', 'Ukrainian', 'Vietnamese', 'Indonesian', 'Japanese', 'Italian', 'Korean', 'Thai', 'Polish', 'Romanian', 'Greek', 'Czech', 'Finnish', 'Hindi', 'auto'] with "auto" being the default language boost.
output_directory (str): The directory to save the audio to.
Returns:
Text content with the path to the output file and name of the voice used.
Input Schema
TableJSON Schema
| Name | Required | Description | Default |
|---|---|---|---|
| text | Yes | ||
| output_directory | No | ||
| voice_id | No | female-shaonv | |
| model | No | speech-02-hd | |
| speed | No | ||
| vol | No | ||
| pitch | No | ||
| emotion | No | happy | |
| sample_rate | No | ||
| bitrate | No | ||
| channel | No | ||
| format | No | mp3 | |
| language_boost | No | auto |