Mobvoi TTS MCP Server

by mobvoi
MIT License
  • Apple
  • Linux

Server Configuration

Describes the environment variables required to run the server.

Name | Required | Description | Default
APP_KEY | Yes | Your APP_KEY from the Mobvoi Sequence Monkey open platform | -
APP_SECRET | Yes | Your APP_SECRET from the Mobvoi Sequence Monkey open platform | -
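
Because both variables are required, it can help to check for them before launching the server. The following is a minimal sketch, not part of the Mobvoi package: the helper name `missing_credentials` is hypothetical, and the only details taken from the table above are the two variable names.

```python
import os

# Variable names taken from the configuration table above.
REQUIRED_VARS = ("APP_KEY", "APP_SECRET")


def missing_credentials(env=None):
    """Return the names of required variables that are unset or empty.

    `env` defaults to the current process environment; passing a plain
    dict makes the check easy to test.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_credentials()
    if missing:
        raise SystemExit("Missing environment variables: " + ", ".join(missing))
    print("APP_KEY and APP_SECRET are set.")
```

Running the script directly exits with an error message naming any variable that still needs to be exported.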

Schema

Prompts

Interactive templates invoked by user choice

Name | Description

No prompts

Resources

Contextual data attached and managed by the client

Name | Description

No resources

Tools

Functions exposed to the LLM to take actions

Name | Description
async_text_to_speech

Async version of text_to_speech. Converts text to speech with a given speaker and saves the output audio file to a given directory. The directory is optional; if it is not provided, the output file is saved to $HOME/Desktop. You can choose a speaker with the speaker parameter. If no speaker is provided, the default speaker (xiaoyi_meet) is used.

⚠️ COST WARNING: This tool makes an API call to the Mobvoi TTS service, which may incur costs. Only use it when explicitly requested by the user.

Args:
  • text (str): The text to convert to speech.
  • speaker (str): The speaker whose voice is used to synthesize the audio.
  • audio_type (str): The format of the synthesized audio. One of pcm, mp3, speex-wb-10, or wav.
  • speed (float): The speed of the synthesized audio. Values range from 0.5 to 2.0, with 1.0 being the default. Lower values create slower, more deliberate speech, while higher values produce faster-paced speech. Extreme values can degrade the quality of the generated speech. A range of 0.7 to 1.2 is also stated.
  • rate (int): The sampling rate of the synthesized audio. One of 8000, 16000, or 24000, with 24000 being the default.
  • volume (float): The volume of the synthesized audio. Values range from 0.1 to 1.0, with 1.0 being the default.
  • pitch (float): The pitch of the synthesized audio. Values range from -10 to 10, with 0 being the default. Values below 0 lower the pitch; values above 0 raise it.
  • streaming (bool): Whether to output in a streaming manner. The default is false.
  • output_directory (str): Directory where files should be saved. Defaults to $HOME/Desktop if not provided.

Returns: Text content with the path to the output file and the name of the speaker used.
async_voice_clone_url

Async version of voice_clone. Clones a voice from an audio file at a given URL. This tool returns a speaker ID that can be used in the text_to_speech tool.

⚠️ COST WARNING: This tool makes an API call to the Mobvoi TTS service, which may incur costs. Only use it when explicitly requested by the user.

Args:
  • wav_uri (str): The URL of the audio file to clone.
async_voice_clone_local

Async version of voice_clone. Clones a voice from a given local audio file. This tool returns a speaker ID that can be used in the text_to_speech tool.

⚠️ COST WARNING: This tool makes an API call to the Mobvoi TTS service, which may incur costs. Only use it when explicitly requested by the user.

Args:
  • audio_file_path (str): The path of the audio file to clone.
play_audio

Play an audio file. Supports WAV and MP3 formats.
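
MCP clients invoke these tools with a JSON-RPC 2.0 `tools/call` request, and most client libraries build that request for you. The sketch below only illustrates the shape of such a request and a client-side check of the argument ranges listed for async_text_to_speech. The helper names `check_tts_arguments` and `tool_call_request` are hypothetical, the server may validate differently, and since the description gives two ranges for speed, this check assumes the wider one (0.5 to 2.0).

```python
import json

# Allowed values copied from the async_text_to_speech description.
AUDIO_TYPES = {"pcm", "mp3", "speex-wb-10", "wav"}
SAMPLE_RATES = {8000, 16000, 24000}

# (name, low, high) for numeric arguments. Assumption: speed uses the
# wider documented range, 0.5 to 2.0.
NUMERIC_RANGES = (("speed", 0.5, 2.0), ("volume", 0.1, 1.0), ("pitch", -10, 10))


def check_tts_arguments(args):
    """Return args unchanged, or raise ValueError for an out-of-range value."""
    if not args.get("text"):
        raise ValueError("text is required")
    if "audio_type" in args and args["audio_type"] not in AUDIO_TYPES:
        raise ValueError("audio_type must be one of " + ", ".join(sorted(AUDIO_TYPES)))
    if "rate" in args and args["rate"] not in SAMPLE_RATES:
        raise ValueError("rate must be 8000, 16000, or 24000")
    for name, low, high in NUMERIC_RANGES:
        if name in args and not low <= args[name] <= high:
            raise ValueError(f"{name} must be between {low} and {high}")
    return args


def tool_call_request(request_id, tool_name, arguments):
    """Build an MCP 'tools/call' JSON-RPC 2.0 request as a dict."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


if __name__ == "__main__":
    arguments = check_tts_arguments(
        {"text": "Hello from Mobvoi", "speaker": "xiaoyi_meet", "audio_type": "mp3"}
    )
    request = tool_call_request(1, "async_text_to_speech", arguments)
    print(json.dumps(request, indent=2))
```

The same `tool_call_request` helper works for the other tools, for example with `"async_voice_clone_url"` and a `{"wav_uri": ...}` argument. Because the speech and cloning tools may incur costs, a client should send such requests only after the user has explicitly asked for them.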
