text_to_speech
Convert text to speech audio files using specified voices and models, saving results to your chosen directory for accessibility or content creation.
Instructions
[AllVoiceLab Tool] Generate speech from provided text.
This tool converts text to speech using the specified voice and model. The generated audio file is saved to the specified directory.
Args:
text: Target text for speech synthesis. Maximum 5,000 characters.
voice_id: Voice ID to use for synthesis. Required. Must be a valid (numeric) voice ID from the available voices (use the get_voices tool to retrieve one).
model_id: Model ID to use for synthesis. Required. Must be a valid model ID from the available models (use the get_models tool to retrieve one).
speed: Speech rate adjustment in the range [0.5, 1.5], where 0.5 is slowest and 1.5 is fastest. Defaults to 1.
output_dir: Output directory for the generated audio file. Defaults to the user's desktop.
Returns:
TextContent containing the path to the generated audio file, or an error message if validation or synthesis fails.
Limitations:
- Text must not exceed 5,000 characters
- Both voice_id and model_id must be valid and provided
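Because each call accepts at most 5,000 characters, a longer script has to be split and synthesized over several calls, each of which saves its own audio file. The sketch below is one way to do that on the caller's side, assuming sentence boundaries are good split points; `split_for_tts` is a hypothetical helper, the sentence regex is a simple heuristic, and a single sentence longer than the limit is hard-split at the character limit (possibly mid-word).

```python
import re

MAX_CHARS = 5000  # documented per-call text limit for text_to_speech


def split_for_tts(text: str, max_chars: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most max_chars characters (hypothetical helper).

    Prefers sentence boundaries (., !, ? followed by whitespace); a sentence
    longer than max_chars is hard-split at the character limit.
    """
    # Split after sentence-ending punctuation, keeping the punctuation.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        # Hard-split any sentence that alone exceeds the limit.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            # Current chunk is full; start a new one with this sentence.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be passed as `text` in its own text_to_speech call.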
Input Schema
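| Name | Required | Description | Default |
|---|---|---|---|
| text | Yes | Text to synthesize (maximum 5,000 characters) | |
| voice_id | Yes | Numeric voice ID from get_voices | |
| model_id | Yes | Model ID from get_models | |
| speed | No | Speech rate, 0.5 (slowest) to 1.5 (fastest) | 1 |
| output_dir | No | Directory for the generated audio file | User's desktop |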
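A caller can check arguments against these constraints before invoking the tool. The sketch below is a hypothetical client-side pre-check, not part of AllVoiceLab: the numeric voice_id rule comes from the handler in the implementation reference, while the speed range is documented but not enforced by that handler. `check_tts_arguments` and the model ID `"example-model"` in the usage line are illustrative names only.

```python
from typing import Optional


def check_tts_arguments(
    text: str,
    voice_id: str,
    model_id: str,
    speed: float = 1.0,
    output_dir: Optional[str] = None,
) -> dict:
    """Build an arguments dict for text_to_speech, raising ValueError on bad input."""
    if not text:
        raise ValueError("text is required")
    if len(text) > 5000:
        raise ValueError("text must not exceed 5,000 characters")
    # The handler rejects empty or non-numeric voice IDs.
    if not voice_id or not voice_id.isdigit():
        raise ValueError("voice_id is required and must be numeric")
    if not model_id:
        raise ValueError("model_id is required")
    # Documented range; the handler itself does not check it.
    if not 0.5 <= speed <= 1.5:
        raise ValueError("speed must be in [0.5, 1.5]")
    args = {"text": text, "voice_id": voice_id, "model_id": model_id, "speed": speed}
    # Omit output_dir so the server falls back to its default (the user's desktop).
    if output_dir is not None:
        args["output_dir"] = output_dir
    return args
```

For example, `check_tts_arguments("Hello", "123", "example-model")` returns the four required/defaulted fields and leaves `output_dir` out.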
Implementation Reference
- allvoicelab_mcp/tools/speech.py:10-102 (handler): Main handler function for the text_to_speech MCP tool. It validates the inputs (text length, voice_id, model_id), checks model_id against the available models, calls the AllVoiceLab client to generate speech, and returns either the file path or an error message. If the model list itself cannot be fetched, the model check is skipped and synthesis proceeds.

```python
import logging
from typing import Optional

from mcp.types import TextContent  # MCP Python SDK (assumed import path)

# get_client() returns the shared AllVoiceLab client; it is defined elsewhere in
# the allvoicelab_mcp package and its import is not shown in the source.


def text_to_speech(
    text: str,
    voice_id: str,
    model_id: str,
    speed: float = 1.0,  # float so fractional rates such as 0.75 are accepted
    output_dir: Optional[str] = None,
) -> TextContent:
    """
    Convert text to speech

    Args:
        text: Target text for speech synthesis. Maximum 5,000 characters.
        voice_id: Voice ID to use for synthesis. Required. Must be a valid voice ID from the available voices (use get_voices tool to retrieve).
        model_id: Model ID to use for synthesis. Required. Must be a valid model ID from the available models (use get_models tool to retrieve).
        speed: Speech rate adjustment, range [0.5, 1.5], where 0.5 is slowest and 1.5 is fastest. Default value is 1.
        output_dir: Output directory for the generated audio file. Default is user's desktop.

    Returns:
        TextContent: Contains the file path to the generated audio file.
    """
    all_voice_lab = get_client()
    output_dir = all_voice_lab.get_output_path(output_dir)
    logging.info(f"Tool called: text_to_speech, voice_id: {voice_id}, model_id: {model_id}, speed: {speed}")
    logging.info(f"Output directory: {output_dir}")

    # Validate parameters; every failure is returned as text rather than raised
    if not text:
        logging.warning("Text parameter is empty")
        return TextContent(type="text", text="text parameter cannot be empty")

    if len(text) > 5000:
        logging.warning(f"Text parameter exceeds maximum length: {len(text)} characters")
        return TextContent(type="text", text="text parameter cannot exceed 5,000 characters")

    if not voice_id:
        logging.warning("voice_id parameter is empty")
        return TextContent(type="text", text="voice_id parameter cannot be empty")

    # Validate voice_id is numeric
    if not voice_id.isdigit():
        logging.warning(f"Invalid voice_id format: {voice_id}, not a numeric value")
        return TextContent(type="text", text="voice_id parameter must be a numeric value")

    if not model_id:
        logging.warning("model_id parameter is empty")
        return TextContent(type="text", text="model_id parameter cannot be empty")

    # Validate model_id against available models
    try:
        logging.info(f"Validating model_id: {model_id}")
        model_resp = all_voice_lab.get_supported_voice_model()
        valid_model_ids = [model.model_id for model in model_resp.models]
        if model_id not in valid_model_ids:
            logging.warning(f"Invalid model_id: {model_id}, available models: {valid_model_ids}")
            return TextContent(type="text", text=f"Invalid model_id: {model_id}. Please use a valid model ID.")
        logging.info(f"Model ID validation successful: {model_id}")
    except Exception as e:
        logging.error(f"Failed to validate model_id: {e}")
        # Continue with the process even if validation fails
        # to maintain backward compatibility

    try:
        logging.info(f"Starting text-to-speech processing, text length: {len(text)} characters")
        file_path = all_voice_lab.text_to_speech(text, voice_id, model_id, output_dir, speed)
        logging.info(f"Text-to-speech successful, file saved at: {file_path}")
        return TextContent(type="text", text=f"Speech generation completed, file saved at: {file_path}\n")
    except Exception as e:
        logging.error(f"Text-to-speech failed: {e}")
        return TextContent(type="text", text="Synthesis failed, tool temporarily unavailable")
```
- allvoicelab_mcp/server.py:55-76 (registration): MCP tool registration for text_to_speech, binding the tool name and its full description (arguments, return value, limitations) to the handler function.

```python
# `mcp` is the MCP server instance created earlier in server.py (not shown here).
mcp.tool(
    name="text_to_speech",
    description="""[AllVoiceLab Tool] Generate speech from provided text.

    This tool converts text to speech using the specified voice and model. The generated audio file is saved to the specified directory.

    Args:
        text: Target text for speech synthesis. Maximum 5,000 characters.
        voice_id: Voice ID to use for synthesis. Required. Must be a valid voice ID from the available voices (use get_voices tool to retrieve).
        model_id: Model ID to use for synthesis. Required. Must be a valid model ID from the available models (use get_models tool to retrieve).
        speed: Speech rate adjustment, range [0.5, 1.5], where 0.5 is slowest and 1.5 is fastest. Default value is 1.
        output_dir: Output directory for the generated audio file. Default is user's desktop.

    Returns:
        TextContent containing file path to the generated audio file.

    Limitations:
        - Text must not exceed 5,000 characters
        - Both voice_id and model_id must be valid and provided
    """,
)(text_to_speech)
```
- client/all_voice_lab.py:265-330 (helper): Method of the AllVoiceLab client that sends the HTTP POST request to the text-to-speech API endpoint, streams the returned audio into a local file, and returns the file path. Called by the MCP handler.

```python
import logging
import random
import re
import time
from pathlib import Path

import requests


# Method of the AllVoiceLab client in client/all_voice_lab.py; self.api_domain and
# self._get_headers() are defined elsewhere in that class.
def text_to_speech(self, text: str, voice_id: str, model_id: str, output_dir: str, speed: float = 1.0) -> str:
    """
    Call API to convert text to speech and save as file

    Args:
        text: Text to convert
        voice_id: Voice ID (numeric string)
        model_id: Model ID
        output_dir: Output directory
        speed: Speech speed

    Returns:
        Saved audio file path
    """
    # Build request body
    request_body = {
        "text": text,
        "language_code": "auto",
        "voice_id": int(voice_id),
        "model_id": model_id,
        "voice_settings": {"speed": float(speed)},
    }

    # API endpoint
    url = f"{self.api_domain}/v1/text-to-speech/create"

    # Stream the response so large audio files are not held in memory;
    # the with-block releases the connection once the file is written
    with requests.post(url=url, json=request_body, headers=self._get_headers(), stream=True) as response:
        logging.info(f"text to speech response: {response.headers}")
        # Raise requests.HTTPError on 4xx/5xx status codes
        response.raise_for_status()

        # Try to get filename from response headers
        filename = None
        content_disposition = response.headers.get("Content-Disposition")
        if content_disposition:
            filename_match = re.search(r'filename="?([^";]+)"?', content_disposition)
            if filename_match:
                # Keep only the last path component so the file stays inside output_dir
                name = Path(filename_match.group(1)).name
                if name not in ("", ".."):
                    filename = name

        # If filename not obtained from response headers, generate a unique filename
        if not filename:
            timestamp = int(time.time())
            random_suffix = "".join(random.choices("abcdefghijklmnopqrstuvwxyz0123456789", k=6))
            filename = f"tts_{timestamp}_{random_suffix}.mp3"

        # Build complete file path
        output_path = Path(output_dir)
        output_path.mkdir(parents=True, exist_ok=True)
        file_path = output_path / filename

        # Save response content to file chunk by chunk
        with open(file_path, "wb") as f:
            for chunk in response.iter_content(chunk_size=8192):
                if chunk:
                    f.write(chunk)

    return str(file_path)
```
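The client helper does two things that are easy to test in isolation: it builds the JSON body for `/v1/text-to-speech/create`, and it picks an output filename from the `Content-Disposition` header or falls back to `tts_<timestamp>_<suffix>.mp3`. The sketch below factors those steps into two pure functions; `build_request_body` and `resolve_filename` are hypothetical names, and the two hardening tweaks (stopping the filename match at `;` and keeping only the last path component) are additions of this sketch, not confirmed behavior of the source.

```python
import random
import re
import string
import time
from pathlib import Path
from typing import Optional


def build_request_body(text: str, voice_id: str, model_id: str, speed: float = 1.0) -> dict:
    """Mirror the body the helper posts to {api_domain}/v1/text-to-speech/create."""
    return {
        "text": text,
        "language_code": "auto",
        "voice_id": int(voice_id),  # the helper sends voice_id as an integer
        "model_id": model_id,
        "voice_settings": {"speed": float(speed)},
    }


def resolve_filename(content_disposition: Optional[str]) -> str:
    """Pick an output filename the way the helper does, with two hardening tweaks."""
    if content_disposition:
        # Tweak 1: stop at ';' as well, so trailing header parameters are not captured.
        match = re.search(r'filename="?([^";]+)"?', content_disposition)
        if match:
            # Tweak 2: keep only the last path component so a header cannot
            # point outside the output directory.
            name = Path(match.group(1)).name
            if name not in ("", ".."):
                return name
    # Fallback used by the helper: tts_<unix timestamp>_<6 random chars>.mp3
    suffix = "".join(random.choices(string.ascii_lowercase + string.digits, k=6))
    return f"tts_{int(time.time())}_{suffix}.mp3"
```

For instance, `resolve_filename('attachment; filename="speech.mp3"')` yields `speech.mp3`, while a missing header produces a generated `tts_…mp3` name.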
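The handler returns a TextContent in every case, errors included, so a caller can only tell success from failure by the message text. A minimal sketch, assuming the success wording stays exactly as in the handler ("Speech generation completed, file saved at: …" followed by a newline); `extract_audio_path` is a hypothetical helper, and any wording change on the server would break it.

```python
from typing import Optional

# Success message prefix produced by the text_to_speech handler.
SUCCESS_PREFIX = "Speech generation completed, file saved at: "


def extract_audio_path(reply_text: str) -> Optional[str]:
    """Return the saved file path from a text_to_speech reply, or None if it is an error."""
    if reply_text.startswith(SUCCESS_PREFIX):
        # The handler ends the success message with a newline; drop it.
        return reply_text[len(SUCCESS_PREFIX):].rstrip("\n")
    return None
```

When the result is `None`, the reply text itself is the error message (for example "voice_id parameter must be a numeric value") and can be shown to the user as-is.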