All Voice Lab MCP Server



text_to_speech

Convert text to speech audio files using specified voices and models, saving results to your chosen directory for accessibility or content creation.

Instructions

[AllVoiceLab Tool] Generate speech from provided text.

This tool converts text to speech using the specified voice and model. The generated audio file is saved to the specified directory.

Args:
    text: Target text for speech synthesis. Maximum 5,000 characters.
    voice_id: Voice ID to use for synthesis. Required. Must be a valid voice ID from the available voices (use get_voices tool to retrieve).
    model_id: Model ID to use for synthesis. Required. Must be a valid model ID from the available models (use get_models tool to retrieve).
    speed: Speech rate adjustment, range [0.5, 1.5], where 0.5 is slowest and 1.5 is fastest. Default value is 1.
    output_dir: Output directory for the generated audio file. Default is user's desktop.
    
Returns:
    TextContent containing file path to the generated audio file.
    
Limitations:
    - Text must not exceed 5,000 characters
    - Both voice_id and model_id must be valid and provided
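The constraints listed above can be checked on the caller's side before invoking the tool. The following is a minimal sketch, not part of the AllVoiceLab server: `check_tts_args` and its constants are hypothetical names, the numeric `voice_id` rule is taken from the handler shown under Implementation Reference, and the speed check is an assumption based on the documented range.

```python
from typing import List

MAX_TEXT_CHARS = 5000      # documented limit on text length
SPEED_RANGE = (0.5, 1.5)   # documented speed range


def check_tts_args(text: str, voice_id: str, model_id: str,
                   speed: float = 1.0) -> List[str]:
    """Return a list of problems; an empty list means the arguments look valid."""
    problems = []
    if not text:
        problems.append("text is empty")
    elif len(text) > MAX_TEXT_CHARS:
        problems.append(f"text has {len(text)} characters, limit is {MAX_TEXT_CHARS}")
    if not voice_id:
        problems.append("voice_id is missing")
    elif not voice_id.isdigit():
        problems.append("voice_id must be numeric")
    if not model_id:
        problems.append("model_id is missing")
    low, high = SPEED_RANGE
    if not low <= speed <= high:
        problems.append(f"speed {speed} is outside [{low}, {high}]")
    return problems
```

A check like this only catches malformed arguments. Whether a given `voice_id` or `model_id` actually exists still has to be confirmed with get_voices and get_models.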

Input Schema


Name	Required	Description	Default
`text`	Yes
`voice_id`	Yes
`model_id`	Yes
`speed`	No
`output_dir`	No

Implementation Reference

allvoicelab_mcp/tools/speech.py:10-102 (handler)

Main handler function for the text_to_speech MCP tool. It validates the inputs (non-empty text of at most 5,000 characters, a numeric voice_id, a non-empty model_id), checks model_id against the list of available models, calls the AllVoiceLab client to generate speech, and returns the file path or an error message.

def text_to_speech(
    text: str,
    voice_id: str,
    model_id: str,
    speed: float = 1.0,
    output_dir: str = None
) -> TextContent:
    """
    Convert text to speech
    
    Args:
        text: Target text for speech synthesis. Maximum 5,000 characters.
        voice_id: Voice ID to use for synthesis. Required. Must be a valid voice ID from the available voices (use get_voices tool to retrieve).
        model_id: Model ID to use for synthesis. Required. Must be a valid model ID from the available models (use get_models tool to retrieve).
        speed: Speech rate adjustment, range [0.5, 1.5], where 0.5 is slowest and 1.5 is fastest. Default value is 1.
        output_dir: Output directory for the generated audio file. Default is user's desktop.
        
    Returns:
        TextContent: Contains the file path to the generated audio file.
    """
    all_voice_lab = get_client()
    output_dir = all_voice_lab.get_output_path(output_dir)
    logging.info(f"Tool called: text_to_speech, voice_id: {voice_id}, model_id: {model_id}, speed: {speed}")
    logging.info(f"Output directory: {output_dir}")

    # Validate parameters
    if not text:
        logging.warning("Text parameter is empty")
        return TextContent(
            type="text",
            text="text parameter cannot be empty"
        )
    if len(text) > 5000:
        logging.warning(f"Text parameter exceeds maximum length: {len(text)} characters")
        return TextContent(
            type="text",
            text="text parameter cannot exceed 5,000 characters"
        )
    if not voice_id:
        logging.warning("voice_id parameter is empty")
        return TextContent(
            type="text",
            text="voice_id parameter cannot be empty"
        )
    # Validate voice_id is numeric
    if not voice_id.isdigit():
        logging.warning(f"Invalid voice_id format: {voice_id}, not a numeric value")
        return TextContent(
            type="text",
            text="voice_id parameter must be a numeric value"
        )
    if not model_id:
        logging.warning("model_id parameter is empty")
        return TextContent(
            type="text",
            text="model_id parameter cannot be empty"
        )

    # Validate model_id against available models
    try:
        logging.info(f"Validating model_id: {model_id}")
        model_resp = all_voice_lab.get_supported_voice_model()
        available_models = model_resp.models
        valid_model_ids = [model.model_id for model in available_models]

        if model_id not in valid_model_ids:
            logging.warning(f"Invalid model_id: {model_id}, available models: {valid_model_ids}")
            return TextContent(
                type="text",
                text=f"Invalid model_id: {model_id}. Please use a valid model ID."
            )
        logging.info(f"Model ID validation successful: {model_id}")
    except Exception as e:
        logging.error(f"Failed to validate model_id: {str(e)}")
        # Continue with the process even if validation fails
        # to maintain backward compatibility

    try:
        logging.info(f"Starting text-to-speech processing, text length: {len(text)} characters")
        file_path = all_voice_lab.text_to_speech(text, voice_id, model_id, output_dir, speed)
        logging.info(f"Text-to-speech successful, file saved at: {file_path}")
        return TextContent(
            type="text",
            text=f"Speech generation completed, file saved at: {file_path}\n"
        )
    except Exception as e:
        logging.error(f"Text-to-speech failed: {str(e)}")
        return TextContent(
            type="text",
            text="Synthesis failed, tool temporarily unavailable"
        )
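The model check in the handler above is fail-open: an unknown model_id is rejected, but if the model lookup itself raises, the error is logged and synthesis proceeds anyway. A minimal sketch of that pattern in isolation follows. `model_id_error` and `fetch_model_ids` are hypothetical names introduced for illustration, not part of the AllVoiceLab code.

```python
import logging
from typing import Callable, Iterable, Optional


def model_id_error(model_id: str,
                   fetch_model_ids: Callable[[], Iterable[str]]) -> Optional[str]:
    """Return an error message if model_id is known to be invalid, else None.

    If the lookup itself raises, the check is skipped (fail-open).
    """
    try:
        valid_ids = list(fetch_model_ids())
    except Exception as exc:
        # Lookup failed: log it and let the caller continue without validation.
        logging.error("Failed to validate model_id: %s", exc)
        return None
    if model_id not in valid_ids:
        return f"Invalid model_id: {model_id}. Please use a valid model ID."
    return None
```

The trade-off is that an outage of the model listing endpoint does not block synthesis, at the cost of letting an invalid model_id through to the API in that case.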

allvoicelab_mcp/server.py:55-76 (registration)

Registers text_to_speech as an MCP tool: sets the tool name and a detailed description (arguments, return value, limitations) and binds both to the handler function.

mcp.tool(
    name="text_to_speech",
    description="""[AllVoiceLab Tool] Generate speech from provided text.
    
    This tool converts text to speech using the specified voice and model. The generated audio file is saved to the specified directory.
    
    Args:
        text: Target text for speech synthesis. Maximum 5,000 characters.
        voice_id: Voice ID to use for synthesis. Required. Must be a valid voice ID from the available voices (use get_voices tool to retrieve).
        model_id: Model ID to use for synthesis. Required. Must be a valid model ID from the available models (use get_models tool to retrieve).
        speed: Speech rate adjustment, range [0.5, 1.5], where 0.5 is slowest and 1.5 is fastest. Default value is 1.
        output_dir: Output directory for the generated audio file. Default is user's desktop.
        
    Returns:
        TextContent containing file path to the generated audio file.
        
    Limitations:
        - Text must not exceed 5,000 characters
        - Both voice_id and model_id must be valid and provided
    """
)(text_to_speech)
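The registration above uses the call form `mcp.tool(...)(text_to_speech)`, which builds a decorator and applies it to an existing function in one expression. The sketch below illustrates that this is equivalent to the `@` decorator syntax. `ToyRegistry` is a hypothetical stand-in written for this example and does not reproduce the real MCP server API.

```python
from typing import Callable, Dict


class ToyRegistry:
    """Hypothetical stand-in for the server object; not the real MCP API."""

    def __init__(self) -> None:
        self.tools: Dict[str, Callable] = {}
        self.descriptions: Dict[str, str] = {}

    def tool(self, name: str, description: str) -> Callable[[Callable], Callable]:
        def register(func: Callable) -> Callable:
            self.tools[name] = func
            self.descriptions[name] = description
            return func  # the function is returned unchanged
        return register


def echo(text: str) -> str:
    return text


registry = ToyRegistry()

# Call form, as in the registration above: build the decorator, then apply it.
registry.tool(name="echo", description="Return the input text.")(echo)


# Equivalent decorator form.
@registry.tool(name="shout", description="Return the input text in upper case.")
def shout(text: str) -> str:
    return text.upper()
```

The call form is convenient when the handler is defined in another module, as here, because registration can live in the server module without touching the handler's definition.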

client/all_voice_lab.py:265-330 (helper)

Helper function in AllVoiceLab client that performs the actual HTTP POST request to the text-to-speech API endpoint, saves the audio file locally, and returns the file path. Called by the MCP handler.

def text_to_speech(self, text: str, voice_id, model_id: str, output_dir: str, speed: float = 1.0) -> str:
    """
    Call API to convert text to speech and save as file

    Args:
        text: Text to convert
        voice_id: Voice ID
        model_id: Model ID
        output_dir: Output directory
        speed: Speech speed

    Returns:
        Saved audio file path
    """
    # Build request body
    request_body = {
        "text": text,
        "language_code": "auto",
        "voice_id": int(voice_id),
        "model_id": model_id,
        "voice_settings": {
            "speed": float(speed)
        }
    }

    # API endpoint
    url = f"{self.api_domain}/v1/text-to-speech/create"

    # Send request and get response
    response = requests.post(
        url=url,
        json=request_body,
        headers=self._get_headers(),
        stream=True  # Use streaming for large files
    )
    logging.info(f"text to speech response: {response.headers}")
    # Check response status
    response.raise_for_status()

    # Try to get filename from response headers
    filename = None
    content_disposition = response.headers.get('Content-Disposition')
    if content_disposition:
        filename_match = re.search(r'filename="?([^"]+)"?', content_disposition)
        if filename_match:
            filename = filename_match.group(1)

    # If filename not obtained from response headers, generate a unique filename
    if not filename:
        timestamp = int(time.time())
        random_suffix = ''.join(random.choices('abcdefghijklmnopqrstuvwxyz0123456789', k=6))
        filename = f"tts_{timestamp}_{random_suffix}.mp3"

    # Build complete file path
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    file_path = output_dir / filename

    # Save response content to file
    with open(file_path, 'wb') as f:
        for chunk in response.iter_content(chunk_size=8192):
            if chunk:
                f.write(chunk)

    # Return file path
    return str(file_path)
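The filename logic in the helper above can be exercised on its own, without any HTTP call. The sketch below reuses the same regular expression and fallback scheme. `resolve_filename` is a hypothetical name introduced for illustration.

```python
import random
import re
import string
import time
from typing import Optional


def resolve_filename(content_disposition: Optional[str]) -> str:
    """Pick a filename from a Content-Disposition header, or generate one."""
    if content_disposition:
        match = re.search(r'filename="?([^"]+)"?', content_disposition)
        if match:
            return match.group(1)
    # Fallback: timestamp plus a 6-character random suffix, as in the helper.
    suffix = "".join(random.choices(string.ascii_lowercase + string.digits, k=6))
    return f"tts_{int(time.time())}_{suffix}.mp3"
```

For example, `resolve_filename('attachment; filename="out.mp3"')` returns `out.mp3`, while a missing header yields a generated `tts_..._....mp3` name. Note that with this pattern an unquoted filename followed by further header parameters would be captured together with those parameters, so a stricter parser may be worth considering if the API ever sends such headers.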

Tool Definition Quality

A (4.6/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden and does well by disclosing key behaviors: it generates and saves an audio file, specifies default values and ranges (speed, output_dir), mentions character limits, and references required validation tools. It doesn't cover rate limits or authentication needs, but provides substantial operational context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with clear sections (Args, Returns, Limitations), front-loaded with the core purpose, and every sentence adds value. No redundant information—each part serves to clarify usage, parameters, or constraints efficiently.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (5 parameters, no annotations, no output schema), the description is complete: it explains the tool's purpose, how to use it with sibling references, all parameter semantics, return value (file path), and limitations. This provides everything needed for an agent to invoke it correctly without structured fields.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description must compensate fully. It adds significant meaning beyond the schema: explains each parameter's purpose, provides constraints (max 5,000 characters, valid IDs from specific tools, speed range), default values, and output directory behavior. This comprehensively documents all 5 parameters.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with specific verbs ('generate speech from provided text', 'converts text to speech') and identifies the resource (audio file). It distinguishes itself from siblings like speech_to_speech or subtitle_extraction by focusing on text-to-speech synthesis.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for usage by referencing sibling tools (get_voices, get_models) to obtain valid IDs, and mentions limitations that guide when to use it. However, it doesn't explicitly state when NOT to use this tool versus alternatives like speech_to_speech or when text_translation might be needed first.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/allvoicelab/AllVoiceLab-MCP'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.