ElevenLabs MCP Server

text_to_sound_effects

Generate sound effects from text descriptions using the ElevenLabs API. Specify the duration, output format, and save location of the audio files, for projects that need custom sound design.

Instructions

Convert a text description of a sound effect into a sound effect of a given duration and save the output audio file to a given directory. The directory is optional; if it is not provided, the output file is saved to $HOME/Desktop. Duration must be between 0.5 and 5 seconds.

⚠️ COST WARNING: This tool makes an API call to ElevenLabs which may incur costs. Only use when explicitly requested by the user.

Args:
    text: Text description of the sound effect
    duration_seconds: Duration of the sound effect in seconds
    output_directory: Directory where files should be saved.
        Defaults to $HOME/Desktop if not provided.
    loop: Whether to loop the sound effect. Defaults to False.
    output_format (str, optional): Output format of the generated audio, written as codec_sample_rate_bitrate. For example, an MP3 with a 22.05kHz sample rate at 32kbps is mp3_22050_32. MP3 at 192kbps requires a subscription to the Creator tier or above. PCM at 44.1kHz requires a subscription to the Pro tier or above. The μ-law format (sometimes written mu-law, often approximated as u-law) is commonly used for Twilio audio inputs.
        Defaults to "mp3_44100_128". Must be one of:
        mp3_22050_32
        mp3_44100_32
        mp3_44100_64
        mp3_44100_96
        mp3_44100_128
        mp3_44100_192
        pcm_8000
        pcm_16000
        pcm_22050
        pcm_24000
        pcm_44100
        ulaw_8000
        alaw_8000
        opus_48000_32
        opus_48000_64
        opus_48000_96
        opus_48000_128
        opus_48000_192
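
The argument rules above (the duration range and the codec_sample_rate_bitrate naming) can be checked on the caller's side before spending an API call. The following is a minimal sketch, not part of the server: `OutputFormat`, `parse_output_format`, and `check_duration` are hypothetical helper names, and the allowed values are copied from the list above.

```python
from __future__ import annotations

from dataclasses import dataclass

# Values copied from the "Must be one of" list in the tool description.
VALID_OUTPUT_FORMATS = frozenset({
    "mp3_22050_32", "mp3_44100_32", "mp3_44100_64", "mp3_44100_96",
    "mp3_44100_128", "mp3_44100_192",
    "pcm_8000", "pcm_16000", "pcm_22050", "pcm_24000", "pcm_44100",
    "ulaw_8000", "alaw_8000",
    "opus_48000_32", "opus_48000_64", "opus_48000_96",
    "opus_48000_128", "opus_48000_192",
})


@dataclass(frozen=True)
class OutputFormat:
    codec: str
    sample_rate_hz: int
    bitrate_kbps: int | None  # None when the name has no bitrate part (pcm, ulaw, alaw)


def parse_output_format(value: str = "mp3_44100_128") -> OutputFormat:
    """Split a documented output_format string into its parts."""
    if value not in VALID_OUTPUT_FORMATS:
        raise ValueError(f"Unsupported output_format: {value!r}")
    parts = value.split("_")
    bitrate = int(parts[2]) if len(parts) == 3 else None
    return OutputFormat(parts[0], int(parts[1]), bitrate)


def check_duration(duration_seconds: float) -> float:
    """Apply the documented 0.5 to 5 second range."""
    if not 0.5 <= duration_seconds <= 5:
        raise ValueError("Duration must be between 0.5 and 5 seconds")
    return duration_seconds
```

For instance, `parse_output_format("mp3_22050_32")` yields codec `mp3`, sample rate 22050, and bitrate 32, matching the example in the description.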

Input Schema


Name	Required	Default
`duration_seconds`	No	2.0
`loop`	No	False
`output_directory`	No	$HOME/Desktop
`output_format`	No	mp3_44100_128
`text`	Yes
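
As a rough illustration of how these parameters travel to the server, the sketch below builds an MCP `tools/call` JSON-RPC request for this tool. `build_tool_call` is a hypothetical helper, not part of any SDK; it only constructs the payload and sends nothing, so no API cost is incurred.

```python
import json

# Optional parameter names taken from the input schema table.
OPTIONAL_PARAMETERS = {"duration_seconds", "loop", "output_directory", "output_format"}


def build_tool_call(text: str, request_id: int = 1, **optional) -> dict:
    """Build a tools/call request; only `text` is required."""
    unknown = set(optional) - OPTIONAL_PARAMETERS
    if unknown:
        raise TypeError(f"Unknown parameters: {sorted(unknown)}")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "text_to_sound_effects",
            "arguments": {"text": text, **optional},
        },
    }


request = build_tool_call("rain on a tin roof", duration_seconds=1.5)
print(json.dumps(request, indent=2))
```

Omitted optional parameters are left out of `arguments`, so the server applies its own defaults.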

Implementation Reference

elevenlabs_mcp/server.py:360-386 (handler)

The handler function that implements the core logic of the 'text_to_sound_effects' tool. It validates input, generates sound effects audio using the ElevenLabs client, saves the audio file, and returns the file path.

def text_to_sound_effects(
    text: str,
    duration_seconds: float = 2.0,
    output_directory: str | None = None,
    output_format: str = "mp3_44100_128",
    loop: bool = False,
) -> TextContent:
    if duration_seconds < 0.5 or duration_seconds > 5:
        make_error("Duration must be between 0.5 and 5 seconds")
    output_path = make_output_path(output_directory, base_path)
    output_file_name = make_output_file("sfx", text, output_path, "mp3")

    audio_data = client.text_to_sound_effects.convert(
        text=text,
        output_format=output_format,
        duration_seconds=duration_seconds,
        loop=loop,
    )
    audio_bytes = b"".join(audio_data)

    with open(output_path / output_file_name, "wb") as f:
        f.write(audio_bytes)

    return TextContent(
        type="text",
        text=f"Success. File saved as: {output_path / output_file_name}",
    )
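
The handler above joins the chunks returned by the client into one bytes object before writing the file. The sketch below isolates that step so it can be run without an API key. `fake_convert` is a hypothetical stand-in for the client call, and `save_chunks` and the file name are illustrative assumptions, not the server's helpers.

```python
import tempfile
from pathlib import Path
from typing import Iterable, Iterator


def fake_convert() -> Iterator[bytes]:
    """Hypothetical stand-in for the client call; yields audio in chunks."""
    yield b"ID3"
    yield b"\x00\x01"
    yield b"\x02"


def save_chunks(chunks: Iterable[bytes], output_path: Path, file_name: str) -> Path:
    """Join streamed chunks and write them to output_path / file_name."""
    audio_bytes = b"".join(chunks)
    target = output_path / file_name
    with open(target, "wb") as f:
        f.write(audio_bytes)
    return target


with tempfile.TemporaryDirectory() as tmp:
    saved = save_chunks(fake_convert(), Path(tmp), "sfx_example.mp3")
    saved_bytes = saved.read_bytes()
    print(f"Success. File saved as: {saved}")
```

Note that the handler as shown passes "mp3" to `make_output_file` regardless of `output_format`, so callers requesting another codec may want to check the extension of the saved file.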

elevenlabs_mcp/server.py:325-359 (registration)

The @mcp.tool decorator registers the 'text_to_sound_effects' tool, including its detailed description, parameters, and usage instructions.

@mcp.tool(
    description="""Convert text description of a sound effect to sound effect with a given duration and save the output audio file to a given directory.
    Directory is optional, if not provided, the output file will be saved to $HOME/Desktop.
    Duration must be between 0.5 and 5 seconds.

    ⚠️ COST WARNING: This tool makes an API call to ElevenLabs which may incur costs. Only use when explicitly requested by the user.

    Args:
        text: Text description of the sound effect
        duration_seconds: Duration of the sound effect in seconds
        output_directory: Directory where files should be saved.
            Defaults to $HOME/Desktop if not provided.
        loop: Whether to loop the sound effect. Defaults to False.
        output_format (str, optional): Output format of the generated audio. Formatted as codec_sample_rate_bitrate. So an mp3 with 22.05kHz sample rate at 32kbs is represented as mp3_22050_32. MP3 with 192kbps bitrate requires you to be subscribed to Creator tier or above. PCM with 44.1kHz sample rate requires you to be subscribed to Pro tier or above. Note that the μ-law format (sometimes written mu-law, often approximated as u-law) is commonly used for Twilio audio inputs.
            Defaults to "mp3_44100_128". Must be one of:
            mp3_22050_32
            mp3_44100_32
            mp3_44100_64
            mp3_44100_96
            mp3_44100_128
            mp3_44100_192
            pcm_8000
            pcm_16000
            pcm_22050
            pcm_24000
            pcm_44100
            ulaw_8000
            alaw_8000
            opus_48000_32
            opus_48000_64
            opus_48000_96
            opus_48000_128
            opus_48000_192
    """
)
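
To make the registration idea concrete, here is a toy registry that mimics the shape of a description-carrying decorator. It is an assumption-laden sketch, not the MCP SDK's implementation: `TOOLS`, `tool`, and the shortened description are all hypothetical.

```python
from typing import Callable

# Hypothetical in-memory registry: tool name -> description and handler.
TOOLS: dict = {}


def tool(description: str) -> Callable:
    """Return a decorator that records the function under its own name."""
    def register(func: Callable) -> Callable:
        TOOLS[func.__name__] = {"description": description, "handler": func}
        return func  # the function itself is returned unchanged
    return register


@tool(description="Toy description used only for this illustration.")
def text_to_sound_effects(text: str, duration_seconds: float = 2.0) -> str:
    # Placeholder body; the real handler calls the ElevenLabs client.
    return f"would generate {duration_seconds}s of: {text}"


entry = TOOLS["text_to_sound_effects"]
print(entry["description"])
```

The point of the pattern is that the description is stored next to the handler, so a client listing tools can read the usage instructions without inspecting the function body.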

Tool Definition Quality

Grade A, 4.6/5.0

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure and does so effectively. It reveals critical behavioral traits: external API dependency (ElevenLabs), cost implications, file system operations (saving to directory), default behaviors (directory defaults to $HOME/Desktop), and subscription tier requirements for certain formats. The only minor gap is lack of information about error handling or rate limits.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with purpose statement first, followed by warnings, then detailed parameter documentation. While comprehensive, some sentences could be more concise (e.g., the output_format explanation is quite lengthy). However, all content serves clear purposes: operational guidance, cost warnings, and parameter clarification.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a 5-parameter tool with no annotations and no output schema, the description provides excellent coverage of purpose, usage constraints, and parameter semantics. The main gap is lack of information about return values or output file naming conventions. However, given the complexity of the tool and absence of structured metadata, the description does remarkably well at providing operational context.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Despite 0% schema description coverage, the description provides comprehensive parameter semantics that fully compensate. Each parameter (text, duration_seconds, output_directory, loop, output_format) receives clear explanations including purpose, constraints, defaults, and format specifications. The output_format parameter gets particularly detailed treatment with format explanation, tier requirements, and complete enumeration of valid values.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with specific verbs ('convert text description to sound effect' and 'save the output audio file') and identifies the resource ('sound effect'). It distinguishes itself from sibling tools like 'text_to_speech' and 'compose_music' by focusing specifically on sound effects generation rather than speech synthesis or music composition.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit usage guidance with the ⚠️ COST WARNING that clearly states when to use ('only use when explicitly requested by the user') and when to avoid (due to API costs). It also specifies duration constraints ('must be between 0.5 and 5 seconds') and tier requirements for certain output formats, giving clear operational boundaries.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.


MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/projectservan8n/elevenlabs-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.