
synthesize_speech

Convert text to natural-sounding speech audio with customizable voice options and speaking speed, returning MP3 format audio.

Instructions

Convert text to natural-sounding speech audio.

Returns base64-encoded audio in MP3 format. Use list_voices to see available voice options.

Input Schema

Name    Required  Description                                              Default
text    Yes       The text to convert to speech                            -
voice   No        Voice ID to use (see list_voices for available options)  alloy
speed   No        Speaking speed multiplier (0.5 to 2.0)                   1.0
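
For illustration, here is how a caller might handle the returned audio. This is a sketch under assumptions: the exact response payload shape is not documented here, so the `audio` field name is hypothetical; the docs only state that the tool returns base64-encoded MP3 audio.

```python
import base64

# Hypothetical response shape -- the "audio" key is an assumption;
# the docs only say the tool returns base64-encoded MP3 audio.
response = {"audio": base64.b64encode(b"ID3fake-mp3-bytes").decode("ascii")}

# Decode the base64 payload back into raw MP3 bytes before saving.
mp3_bytes = base64.b64decode(response["audio"])

with open("speech.mp3", "wb") as f:
    f.write(mp3_bytes)
```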

Output Schema

No output fields are declared; the tool returns base64-encoded MP3 audio as described above.

Implementation Reference

  • Main handler function for the synthesize_speech tool. It converts text to speech by making an async HTTP POST request to the Brainiall API endpoint '/v1/tts/synthesize' with text, voice (default: 'alloy'), and speed (default: 1.0) parameters. Returns base64-encoded MP3 audio.
    from typing import Annotated
    @mcp.tool()
    async def synthesize_speech(
        text: Annotated[str, "The text to convert to speech"],
        voice: Annotated[str, "Voice ID to use (see list_voices for available options)"] = "alloy",
        speed: Annotated[float, "Speaking speed multiplier (0.5 to 2.0)"] = 1.0,
    ) -> dict:
        """Convert text to natural-sounding speech audio.
    
        Returns base64-encoded audio in MP3 format.
        Use list_voices to see available voice options.
        """
        async with _client() as client:
            response = await client.post(
                "/v1/tts/synthesize",
                json={
                    "text": text,
                    "voice": voice,
                    "speed": speed,
                },
            )
            response.raise_for_status()
            return response.json()
  • Input schema definition for synthesize_speech using Annotated type hints. Parameters: text (str, required), voice (str, default 'alloy'), and speed (float, range 0.5-2.0, default 1.0). Each parameter includes descriptive metadata for tool documentation.
    @mcp.tool()
    async def synthesize_speech(
        text: Annotated[str, "The text to convert to speech"],
        voice: Annotated[str, "Voice ID to use (see list_voices for available options)"] = "alloy",
        speed: Annotated[float, "Speaking speed multiplier (0.5 to 2.0)"] = 1.0,
    ) -> dict:
  • server.py:94-94 (registration)
    Tool registration decorator @mcp.tool() that registers the synthesize_speech function as an MCP tool with the FastMCP server instance.
    @mcp.tool()
Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden. It discloses key behavioral traits: it returns base64-encoded MP3 audio, which is valuable context not in the schema. However, it doesn't mention potential limitations like rate limits, authentication needs, file size constraints, or error conditions, leaving gaps for a mutation tool.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is front-loaded with the core purpose, followed by return format and usage tip. All three sentences earn their place: the first defines the tool, the second specifies output format, and the third provides actionable guidance. Zero waste, appropriately sized.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's moderate complexity (text-to-speech conversion), 100% schema coverage, and presence of an output schema (implied by context signals), the description is largely complete. It covers purpose, output format, and voice reference. However, as a mutation tool with no annotations, it could benefit from more behavioral context like error handling or limitations.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema fully documents all three parameters (text, voice, speed). The description adds no additional parameter semantics beyond what's in the schema, such as explaining voice ID formats or speed effects. Baseline 3 is appropriate when the schema does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
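
For example, a caller could enforce the documented 0.5 to 2.0 speed range client-side before invoking the tool. A minimal sketch, with an illustrative helper name not present in the source:

```python
def validate_speed(speed: float) -> float:
    """Reject speed multipliers outside the documented 0.5-2.0 range."""
    if not 0.5 <= speed <= 2.0:
        raise ValueError(f"speed must be between 0.5 and 2.0, got {speed}")
    return speed
```

Failing fast like this avoids a round trip to the server for inputs the schema already documents as invalid.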

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with specific verb ('Convert') and resource ('text to natural-sounding speech audio'), distinguishing it from siblings like assess_pronunciation (evaluation), list_voices (listing), and transcribe_speech (speech-to-text). It precisely communicates the core transformation function.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context by mentioning list_voices for voice options, which helps guide usage. However, it doesn't explicitly state when to use this tool versus alternatives like transcribe_speech (reverse operation) or assess_pronunciation (quality assessment), nor does it mention any exclusions or prerequisites for use.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
