Glama

clone_voice

Upload a local audio file to create a custom voice. The returned voice ID can be used immediately in text-to-speech.

Instructions

Create a custom voice from a single local audio file. Constraints: WAV or MP3 only, max 3MB, exactly one file. The returned voice_id can be used immediately in text_to_speech. Path supports ~ expansion (e.g., "~/sample.wav").
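The constraints above can be checked locally before the tool is called. The following is a minimal sketch, not part of the server: the helper name check_audio_path is hypothetical, and reading "3MB" as 3 * 1024 * 1024 bytes is an assumption, since the description does not say how the limit is counted.

```python
from pathlib import Path

# Limits taken from the tool description. Treating "3MB" as 3 * 1024 * 1024
# bytes is an assumption; the server may count 3,000,000 bytes instead.
MAX_BYTES = 3 * 1024 * 1024
ALLOWED_SUFFIXES = {".wav", ".mp3"}


def check_audio_path(audio_path: str) -> Path:
    """Return the expanded path, or raise ValueError if a constraint fails."""
    path = Path(audio_path).expanduser()  # resolves "~/sample.wav"
    if path.suffix.lower() not in ALLOWED_SUFFIXES:
        raise ValueError(f"unsupported format: {path.suffix or '(none)'}")
    if not path.is_file():
        raise ValueError(f"not a file: {path}")
    size = path.stat().st_size
    if size > MAX_BYTES:
        raise ValueError(f"file is {size} bytes, limit is {MAX_BYTES}")
    return path
```

A check like this only inspects the file extension and size; it does not confirm that the file actually contains valid WAV or MP3 audio.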

Input Schema

Name         Required  Description  Default
name         Yes       -            -
audio_path   Yes       -            -
description  No        -            -
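As an illustration of how these three parameters fit together, here is a rough sketch of a tools/call request in the JSON-RPC shape that MCP clients send. The listing does not show parameter types, so treating all three as strings is an assumption, and the helper name build_clone_voice_call and the sample values are hypothetical.

```python
import json
from typing import Optional


def build_clone_voice_call(name: str, audio_path: str,
                           description: Optional[str] = None,
                           request_id: int = 1) -> dict:
    """Build an MCP tools/call request for clone_voice.

    name and audio_path are required; description is optional and is
    omitted from the arguments when not supplied.
    """
    arguments = {"name": name, "audio_path": audio_path}
    if description is not None:
        arguments["description"] = description
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "clone_voice", "arguments": arguments},
    }


# Sample values are placeholders, not taken from the listing.
request = build_clone_voice_call("Narrator", "~/sample.wav")
print(json.dumps(request, indent=2))
```

In practice an MCP client library would normally build this envelope itself; the sketch only shows which fields the caller supplies.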

Output Schema

Name    Required  Description  Default
result  Yes       -            -

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations present, the description itself must disclose behavior. It specifies file constraints, path expansion, and voice_id usability, but it lacks details on idempotency, error handling, and potential side effects such as credit consumption.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four concise sentences carrying all essential information, with no redundancy. The main action is front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity and the existence of an output schema, the description covers the main behavioral aspects and constraints. However, the missing details for the name and description parameters leave a gap that prevents a perfect score.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 0%, so the description must explain all parameters. It addresses only audio_path (local file, ~ expansion) and leaves name and description unexplained, which is insufficient for understanding the parameters.
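The review does not say how schema coverage is computed. One plausible reading, sketched below as an assumption, is the share of input parameters whose schema entry carries a non-empty description. The function name schema_coverage and the "type": "string" entries are hypothetical; the listing shows only names and required flags.

```python
def schema_coverage(properties: dict) -> float:
    """Fraction of parameters whose schema entry has a non-empty description."""
    if not properties:
        return 1.0  # no parameters, so nothing is left undocumented
    described = sum(
        1 for spec in properties.values()
        if str(spec.get("description", "")).strip()
    )
    return described / len(properties)


# Mirrors the Input Schema table: three parameters, none with a description.
# The types are assumed for illustration only.
current = {
    "name": {"type": "string"},
    "audio_path": {"type": "string"},
    "description": {"type": "string"},
}

print(f"{schema_coverage(current):.0%}")  # prints 0%
```

Under this reading, adding a description to each of the three schema entries would raise coverage to 100% and remove the need for the tool description to explain them.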

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action, 'Create a custom voice from a single local audio file', and specifies constraints (WAV/MP3, max 3MB, one file). This differentiates it from sibling tools such as search_custom_voice and text_to_speech, which operate on existing voices or perform different tasks.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description makes clear when to use the tool (creating a voice from audio) but does not explicitly exclude other use cases or mention alternatives among sibling tools. It gives usage constraints but no guidance on when not to use the tool.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/supertone-inc/supertone-mcp'
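The same request can be made from Python with the standard library. This is a minimal sketch under stated assumptions: that the response body is JSON, and that the path after /servers/ is an owner segment followed by a server segment, as in the curl example. The helper names server_url and fetch_server are hypothetical.

```python
import json
import urllib.request

# Base URL taken from the curl example above.
API_BASE = "https://glama.ai/api/mcp/v1/servers"


def server_url(owner: str, server: str) -> str:
    """Build the URL for one server record."""
    return f"{API_BASE}/{owner}/{server}"


def fetch_server(owner: str, server: str, timeout: float = 10.0):
    """GET the server record and decode it, assuming the body is JSON."""
    with urllib.request.urlopen(server_url(owner, server), timeout=timeout) as resp:
        return json.load(resp)


# Example call (performs a network request):
#   info = fetch_server("supertone-inc", "supertone-mcp")
```

urllib.request.urlopen raises urllib.error.HTTPError on non-2xx responses, so callers that need to handle a missing server should catch that exception.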

If you have feedback or need assistance with the MCP directory API, please join our Discord server.