
All Voice Lab MCP Server

Official
by allvoicelab

text_to_speech

Convert text to speech audio files using specified voices and models, saving results to your chosen directory for accessibility or content creation.

Instructions

[AllVoiceLab Tool] Generate speech from provided text.

This tool converts text to speech using the specified voice and model. The generated audio file is saved to the specified directory.

Args:
    text: Target text for speech synthesis. Maximum 5,000 characters.
    voice_id: Voice ID to use for synthesis. Required. Must be a valid voice ID from the available voices (use get_voices tool to retrieve).
    model_id: Model ID to use for synthesis. Required. Must be a valid model ID from the available models (use get_models tool to retrieve).
    speed: Speech rate adjustment, range [0.5, 1.5], where 0.5 is slowest and 1.5 is fastest. Default value is 1.
    output_dir: Output directory for the generated audio file. Default is user's desktop.
    
Returns:
    TextContent containing file path to the generated audio file.
    
Limitations:
    - Text must not exceed 5,000 characters
    - Both voice_id and model_id must be valid and provided
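The limits listed above can be checked on the caller's side before the tool is invoked. The following is a minimal sketch, not part of the server: the function name `check_tts_args` and its error strings are hypothetical, and the numeric check on voice_id mirrors what the handler in the implementation reference does rather than anything stated in the Args list.

```python
from typing import List

MAX_TEXT_CHARS = 5000      # documented maximum text length
SPEED_RANGE = (0.5, 1.5)   # documented speed range, inclusive


def check_tts_args(text: str, voice_id: str, model_id: str, speed: float = 1.0) -> List[str]:
    """Return a list of problems; an empty list means the arguments look acceptable."""
    problems = []
    if not text:
        problems.append("text is empty")
    elif len(text) > MAX_TEXT_CHARS:
        problems.append(f"text has {len(text)} characters, limit is {MAX_TEXT_CHARS}")
    if not voice_id:
        problems.append("voice_id is empty")
    elif not voice_id.isdigit():
        # Assumption taken from the handler source: voice IDs are numeric strings.
        problems.append("voice_id is not numeric")
    if not model_id:
        problems.append("model_id is empty")
    if not SPEED_RANGE[0] <= speed <= SPEED_RANGE[1]:
        problems.append(f"speed {speed} is outside {SPEED_RANGE}")
    return problems
```

This only checks the shape of the arguments. Whether a given voice_id or model_id actually exists still has to be confirmed with get_voices and get_models.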

Input Schema

Name        Required  Description  Default
text        Yes       -            -
voice_id    Yes       -            -
model_id    Yes       -            -
speed       No        -            -
output_dir  No        -            -
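As an illustration of how these parameters travel over the wire, here is a rough sketch of an MCP `tools/call` request for this tool. The helper name `build_tts_call` is hypothetical, and the voice and model IDs are placeholders; real values come from get_voices and get_models.

```python
import json
from typing import Optional


def build_tts_call(
    text: str,
    voice_id: str,
    model_id: str,
    speed: float = 1.0,
    output_dir: Optional[str] = None,
    request_id: int = 1,
) -> dict:
    """Build a JSON-RPC 2.0 request that invokes the text_to_speech tool."""
    arguments = {
        "text": text,
        "voice_id": voice_id,
        "model_id": model_id,
        "speed": speed,
    }
    # output_dir is optional; omit it so the server applies its own default.
    if output_dir is not None:
        arguments["output_dir"] = output_dir
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "text_to_speech", "arguments": arguments},
    }


# Placeholder IDs, for illustration only.
print(json.dumps(build_tts_call("Hello, world.", "123", "example-model"), indent=2))
```

In practice an MCP client library normally builds this envelope for you; the sketch only shows where each schema field ends up.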

Implementation Reference

  • Main handler function for the text_to_speech MCP tool. Performs input validation (text length, voice_id, model_id), validates model against available models, calls the AllVoiceLab client to generate speech, and returns the file path or error.
    def text_to_speech(
        text: str,
        voice_id: str,
        model_id: str,
        speed: float = 1.0,
        output_dir: str = None
    ) -> TextContent:
        """
        Convert text to speech
        
        Args:
            text: Target text for speech synthesis. Maximum 5,000 characters.
            voice_id: Voice ID to use for synthesis. Required. Must be a valid voice ID from the available voices (use get_voices tool to retrieve).
            model_id: Model ID to use for synthesis. Required. Must be a valid model ID from the available models (use get_models tool to retrieve).
            speed: Speech rate adjustment, range [0.5, 1.5], where 0.5 is slowest and 1.5 is fastest. Default value is 1.
            output_dir: Output directory for the generated audio file. Default is user's desktop.
            
        Returns:
            TextContent: Contains the file path to the generated audio file.
        """
        all_voice_lab = get_client()
        output_dir = all_voice_lab.get_output_path(output_dir)
        logging.info(f"Tool called: text_to_speech, voice_id: {voice_id}, model_id: {model_id}, speed: {speed}")
        logging.info(f"Output directory: {output_dir}")
    
        # Validate parameters
        if not text:
            logging.warning("Text parameter is empty")
            return TextContent(
                type="text",
                text="text parameter cannot be empty"
            )
        if len(text) > 5000:
            logging.warning(f"Text parameter exceeds maximum length: {len(text)} characters")
            return TextContent(
                type="text",
                text="text parameter cannot exceed 5,000 characters"
            )
        if not voice_id:
            logging.warning("voice_id parameter is empty")
            return TextContent(
                type="text",
                text="voice_id parameter cannot be empty"
            )
        # Validate voice_id is numeric
        if not voice_id.isdigit():
            logging.warning(f"Invalid voice_id format: {voice_id}, not a numeric value")
            return TextContent(
                type="text",
                text="voice_id parameter must be a numeric value"
            )
        if not model_id:
            logging.warning("model_id parameter is empty")
            return TextContent(
                type="text",
                text="model_id parameter cannot be empty"
            )
    
        
    
        # Validate model_id against available models
        try:
            logging.info(f"Validating model_id: {model_id}")
            model_resp = all_voice_lab.get_supported_voice_model()
            available_models = model_resp.models
            valid_model_ids = [model.model_id for model in available_models]
    
            if model_id not in valid_model_ids:
                logging.warning(f"Invalid model_id: {model_id}, available models: {valid_model_ids}")
                return TextContent(
                    type="text",
                    text=f"Invalid model_id: {model_id}. Please use a valid model ID."
                )
            logging.info(f"Model ID validation successful: {model_id}")
        except Exception as e:
            logging.error(f"Failed to validate model_id: {str(e)}")
            # Continue with the process even if validation fails
            # to maintain backward compatibility
    
        try:
            logging.info(f"Starting text-to-speech processing, text length: {len(text)} characters")
            file_path = all_voice_lab.text_to_speech(text, voice_id, model_id, output_dir, speed)
            logging.info(f"Text-to-speech successful, file saved at: {file_path}")
            return TextContent(
                type="text",
                text=f"Speech generation completed, file saved at: {file_path}\n"
            )
        except Exception as e:
            logging.error(f"Text-to-speech failed: {str(e)}")
            return TextContent(
                type="text",
                text="Synthesis failed, tool temporarily unavailable"
            )
  • MCP tool registration for text_to_speech, including name, detailed description with input schema and limitations, bound to the handler function.
    mcp.tool(
        name="text_to_speech",
        description="""[AllVoiceLab Tool] Generate speech from provided text.
        
        This tool converts text to speech using the specified voice and model. The generated audio file is saved to the specified directory.
        
        Args:
            text: Target text for speech synthesis. Maximum 5,000 characters.
            voice_id: Voice ID to use for synthesis. Required. Must be a valid voice ID from the available voices (use get_voices tool to retrieve).
            model_id: Model ID to use for synthesis. Required. Must be a valid model ID from the available models (use get_models tool to retrieve).
            speed: Speech rate adjustment, range [0.5, 1.5], where 0.5 is slowest and 1.5 is fastest. Default value is 1.
            output_dir: Output directory for the generated audio file. Default is user's desktop.
            
        Returns:
            TextContent containing file path to the generated audio file.
            
        Limitations:
            - Text must not exceed 5,000 characters
            - Both voice_id and model_id must be valid and provided
        """
    )(text_to_speech)
  • Helper function in AllVoiceLab client that performs the actual HTTP POST request to the text-to-speech API endpoint, saves the audio file locally, and returns the file path. Called by the MCP handler.
    def text_to_speech(self, text: str, voice_id: str, model_id: str, output_dir: str, speed: float = 1.0) -> str:
        """
        Call API to convert text to speech and save as file
    
        Args:
            text: Text to convert
            voice_id: Voice ID
            model_id: Model ID
            output_dir: Output directory
            speed: Speech speed
    
        Returns:
            Saved audio file path
        """
        # Build request body
        request_body = {
            "text": text,
            "language_code": "auto",
            "voice_id": int(voice_id),
            "model_id": model_id,
            "voice_settings": {
                "speed": float(speed)
            }
        }
    
        # API endpoint
        url = f"{self.api_domain}/v1/text-to-speech/create"
    
        # Send request and get response
        response = requests.post(
            url=url,
            json=request_body,
            headers=self._get_headers(),
            stream=True  # Use streaming for large files
        )
        logging.info(f"text to speech response: {response.headers}")
        # Check response status
        response.raise_for_status()
    
        # Try to get filename from response headers
        filename = None
        content_disposition = response.headers.get('Content-Disposition')
        if content_disposition:
            filename_match = re.search(r'filename="?([^"]+)"?', content_disposition)
            if filename_match:
                filename = filename_match.group(1)
    
        # If filename not obtained from response headers, generate a unique filename
        if not filename:
            timestamp = int(time.time())
            random_suffix = ''.join(random.choices('abcdefghijklmnopqrstuvwxyz0123456789', k=6))
            filename = f"tts_{timestamp}_{random_suffix}.mp3"
    
        # Build complete file path
        output_dir = Path(output_dir)
        output_dir.mkdir(parents=True, exist_ok=True)
        file_path = output_dir / filename
    
        # Save response content to file
        with open(file_path, 'wb') as f:
            for chunk in response.iter_content(chunk_size=8192):
                if chunk:
                    f.write(chunk)
    
        # Return file path
        return str(file_path)
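The client above takes the filename from the Content-Disposition header and joins it to the output directory. One possible hardening, sketched below under assumptions, is to stop the match at a semicolon and keep only the final path component, so a header value containing directory separators cannot point outside the output directory. The function name `safe_filename` is hypothetical, and the fallback uses `secrets.token_hex` instead of the client's `random.choices`.

```python
import re
import secrets
import time
from pathlib import PurePosixPath, PureWindowsPath
from typing import Optional


def safe_filename(content_disposition: Optional[str]) -> str:
    """Pick a local filename from a Content-Disposition header, or generate one."""
    name = ""
    if content_disposition:
        # Stop at a quote or semicolon so trailing parameters are not captured.
        match = re.search(r'filename="?([^";]+)"?', content_disposition)
        if match:
            raw = match.group(1).strip()
            # Keep only the last component, for both "/" and "\" separators.
            name = PureWindowsPath(PurePosixPath(raw).name).name
    if not name or name in {".", ".."}:
        # Fallback similar to the client's: tts_<timestamp>_<6 random chars>.mp3
        name = f"tts_{int(time.time())}_{secrets.token_hex(3)}.mp3"
    return name
```

The result can be joined to the output directory in the same way the client does with `output_dir / filename`. This sketch does not handle the RFC 6266 `filename*=` extended form; such headers fall through to the generated name.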

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/allvoicelab/AllVoiceLab-MCP'
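The same request can be made from Python with the standard library. This is a sketch under assumptions: the helper names are hypothetical, and it assumes the endpoint returns a JSON object, which the page does not state explicitly.

```python
import json
import urllib.request
from urllib.parse import quote

API_BASE = "https://glama.ai/api/mcp/v1/servers"


def server_url(owner: str, name: str) -> str:
    """Build the directory API URL for one server, escaping each path segment."""
    return f"{API_BASE}/{quote(owner, safe='')}/{quote(name, safe='')}"


def fetch_server(owner: str, name: str, timeout: float = 10.0) -> dict:
    """GET the server record; assumes the response body is a JSON object."""
    with urllib.request.urlopen(server_url(owner, name), timeout=timeout) as response:
        return json.load(response)


# Example (performs a network request):
# info = fetch_server("allvoicelab", "AllVoiceLab-MCP")
```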

If you have feedback or need assistance with the MCP directory API, please join our Discord server.