text_to_audio

Convert text to audio with customizable voice, speed, and emotion, saving the file to a specified directory. Integrates with MiniMax API for high-quality speech synthesis.

Instructions

Convert text to audio with a given voice and save the output audio file to a given directory. If no directory is provided, the file will be saved to the desktop. If no voice ID is provided, the default voice will be used.

Note: This tool calls MiniMax API and may incur costs. Use only when explicitly requested by the user.

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| `bitrate` | No | Bitrate (bps), values: [64000, 96000, 128000, 160000, 192000, 224000, 256000, 320000] | |
| `channel` | No | Audio channels, values: [1, 2] | |
| `emotion` | No | Speech emotion, values: ["happy", "sad", "angry", "fearful", "disgusted", "surprised", "neutral"] | happy |
| `format` | No | Audio format, values: ["pcm", "mp3", "flac", "wav"] | mp3 |
| `languageBoost` | No | Enhance the ability to recognize specified languages and dialects. Supported values: 'Chinese', 'Chinese,Yue', 'English', 'Arabic', 'Russian', 'Spanish', 'French', 'Portuguese', 'German', 'Turkish', 'Dutch', 'Ukrainian', 'Vietnamese', 'Indonesian', 'Japanese', 'Italian', 'Korean', 'Thai', 'Polish', 'Romanian', 'Greek', 'Czech', 'Finnish', 'Hindi', 'auto' | auto |
| `model` | No | Model to use | speech-02-hd |
| `outputDirectory` | No | The directory to save the output file. `outputDirectory` is relative to `MINIMAX_MCP_BASE_PATH` (or `basePath` in config); the final save path is `${basePath}/${outputDirectory}`. For example, if `MINIMAX_MCP_BASE_PATH=~/Desktop` and `outputDirectory=workspace`, the output is saved to `~/Desktop/workspace/` | |
| `outputFile` | No | Path to save the generated audio file; auto-generated if not provided | |
| `pitch` | No | Speech pitch | |
| `sampleRate` | No | Sample rate (Hz), values: [8000, 16000, 22050, 24000, 32000, 44100] | |
| `speed` | No | Speech speed | |
| `subtitleEnable` | No | Controls whether the subtitle service is enabled. The model must be 'speech-01-turbo' or 'speech-01-hd' | false |
| `text` | Yes | Text to convert to audio | |
| `voiceId` | No | Voice ID to use, e.g. "female-shaonv" | male-qn-qingse |
| `vol` | No | Speech volume | |
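
To make the `outputDirectory` resolution rule above concrete, here is a minimal sketch. The `resolveOutputDir` helper is hypothetical (the server inlines this logic); it only mirrors the documented rule that the final save path is `${basePath}/${outputDirectory}`.

```typescript
import * as path from 'path';
import * as os from 'os';

// Hypothetical helper mirroring the documented rule:
// final save path = `${basePath}/${outputDirectory}`, where basePath
// comes from MINIMAX_MCP_BASE_PATH (or `basePath` in config).
function resolveOutputDir(basePath: string, outputDirectory?: string): string {
  // Expand a leading `~` to the user's home directory
  const expanded = basePath.startsWith('~')
    ? path.join(os.homedir(), basePath.slice(1))
    : basePath;
  return outputDirectory ? path.join(expanded, outputDirectory) : expanded;
}

// With MINIMAX_MCP_BASE_PATH=~/Desktop and outputDirectory=workspace,
// output lands under ~/Desktop/workspace/
const dir = resolveOutputDir('~/Desktop', 'workspace');
```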

Implementation Reference

  • Core handler that performs text-to-audio conversion by calling the MiniMax TTS endpoint `/v1/t2a_v2`: it validates parameters, builds the request payload, and processes the response (hex-to-binary conversion, then file saving or URL return), with helper validators for the individual parameters.
    async generateSpeech(request: TTSRequest): Promise<any> {
      // Validate required parameters
      if (!request.text || request.text.trim() === '') {
        throw new MinimaxRequestError(ERROR_TEXT_REQUIRED);
      }
    
      // Process output file
      let outputFile = request.outputFile;
      if (!outputFile) {
        // If no output file is provided, generate one based on text content
        const textPrefix = request.text.substring(0, 20).replace(/[^\w]/g, '_');
        outputFile = `tts_${textPrefix}_${Date.now()}`;
      }
    
      if (!path.extname(outputFile)) {
        // If no extension, add one based on format
        const format = request.format || 'mp3';
        outputFile = buildOutputFile(outputFile, request.outputDirectory, format);
      }
    
      // Prepare request data according to MiniMax API nested structure
      const requestData: Record<string, any> = {
        model: this.ensureValidModel(request.model),
        text: request.text,
        voice_setting: {
          voice_id: request.voiceId || 'male-qn-qingse',
          speed: request.speed || 1.0,
          vol: request.vol || 1.0,
          pitch: request.pitch || 0,
          emotion: this.ensureValidEmotion(request.emotion, this.ensureValidModel(request.model))
        },
        audio_setting: {
          sample_rate: this.ensureValidSampleRate(request.sampleRate),
          bitrate: this.ensureValidBitrate(request.bitrate),
          format: this.ensureValidFormat(request.format),
          channel: this.ensureValidChannel(request.channel)
        },
        language_boost: request.languageBoost || 'auto',
        stream: request.stream,
        subtitle_enable: request.subtitleEnable
      };
    
      // Add output format (if specified)
      if (request.outputFormat === RESOURCE_MODE_URL) {
        requestData.output_format = 'url';
      }
    
      // Filter out undefined fields (recursive)
      const filteredData = this.removeUndefinedFields(requestData);
    
      // Send request
      const response = await this.api.post<any>('/v1/t2a_v2', filteredData);
    
      // Process response
      const audioData = response?.data?.audio;
      const subtitleFile = response?.data?.subtitle_file;
    
      if (!audioData) {
        throw new MinimaxRequestError('Could not get audio data from response');
      }
    
      // If URL mode, return the URL directly
      if (request.outputFormat === RESOURCE_MODE_URL) {
        return {
          audio: audioData,
          subtitle: subtitleFile
        };
      }
    
      // Otherwise the API returned hex-encoded audio; decode it and save the file
      try {
        // Convert hex string to binary
        const audioBuffer = Buffer.from(audioData, 'hex');
    
        // Ensure the output directory exists
        const outputDir = path.dirname(outputFile);
        if (!fs.existsSync(outputDir)) {
          fs.mkdirSync(outputDir, { recursive: true });
        }
    
        // Write to file
        fs.writeFileSync(outputFile, audioBuffer);
    
        return {
          audio: outputFile,
          subtitle: subtitleFile
        };
      } catch (error) {
        throw new MinimaxRequestError(`Failed to save audio file: ${String(error)}`);
      }
    }
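The hex-to-binary step in the handler relies on Node's built-in `Buffer` hex decoding: MiniMax returns the synthesized audio as a hex string in non-URL mode, and `Buffer.from(hex, 'hex')` turns every two hex characters into one byte. A standalone sketch (the sample bytes are illustrative, not real audio):

```typescript
// Two hex characters decode to one byte: 0x49 0x44 0x33 ("ID3")
const hexAudio = '494433'; // illustrative bytes, not real audio data
const audioBuffer = Buffer.from(hexAudio, 'hex');

console.log(audioBuffer.length);             // 3
console.log(audioBuffer.toString('latin1')); // "ID3"
```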
  • Tool registration via `McpServer.tool()` with the name 'text_to_audio', a detailed description, a Zod input schema defining every parameter (descriptions, defaults, validations), and an async handler that assembles the parameters and calls `TTSAPI.generateSpeech`.
    private registerTextToAudioTool(): void {
      this.server.tool(
        'text_to_audio',
        'Convert text to audio with a given voice and save the output audio file to a given directory. If no directory is provided, the file will be saved to desktop. If no voice ID is provided, the default voice will be used.\n\nNote: This tool calls MiniMax API and may incur costs. Use only when explicitly requested by the user.',
        {
          text: z.string().describe('Text to convert to audio'),
          outputDirectory: COMMON_PARAMETERS_SCHEMA.outputDirectory,
          voiceId: z.string().optional().default(DEFAULT_VOICE_ID).describe('Voice ID to use, e.g. "female-shaonv"'),
          model: z.string().optional().default(DEFAULT_SPEECH_MODEL).describe('Model to use'),
          speed: z.number().min(0.5).max(2.0).optional().default(DEFAULT_SPEED).describe('Speech speed'),
          vol: z.number().min(0.1).max(10.0).optional().default(DEFAULT_VOLUME).describe('Speech volume'),
          pitch: z.number().min(-12).max(12).optional().default(DEFAULT_PITCH).describe('Speech pitch'),
          emotion: z
            .string()
            .optional()
            .default(DEFAULT_EMOTION)
            .describe('Speech emotion, values: ["happy", "sad", "angry", "fearful", "disgusted", "surprised", "neutral"]'),
          format: z
            .string()
            .optional()
            .default(DEFAULT_FORMAT)
            .describe('Audio format, values: ["pcm", "mp3","flac", "wav"]'),
          sampleRate: z
            .number()
            .optional()
            .default(DEFAULT_SAMPLE_RATE)
            .describe('Sample rate (Hz), values: [8000, 16000, 22050, 24000, 32000, 44100]'),
          bitrate: z
            .number()
            .optional()
            .default(DEFAULT_BITRATE)
            .describe('Bitrate (bps), values: [64000, 96000, 128000, 160000, 192000, 224000, 256000, 320000]'),
          channel: z.number().optional().default(DEFAULT_CHANNEL).describe('Audio channels, values: [1, 2]'),
          languageBoost: z.string().optional().default(DEFAULT_LANGUAGE_BOOST)
            .describe(`Enhance the ability to recognize specified languages and dialects. Supported values include: 'Chinese', 'Chinese,Yue', 'English', 'Arabic', 'Russian', 'Spanish', 'French', 'Portuguese', 'German', 'Turkish', 'Dutch', 'Ukrainian', 'Vietnamese', 'Indonesian', 'Japanese', 'Italian', 'Korean', 'Thai', 'Polish', 'Romanian', 'Greek', 'Czech', 'Finnish', 'Hindi', 'auto', default is 'auto'`),
          subtitleEnable: z
            .boolean()
            .optional()
            .default(false)
            .describe(
              `The parameter controls whether the subtitle service is enabled. The model must be 'speech-01-turbo' or 'speech-01-hd'. If this parameter is not provided, the default value is false`,
            ),
          outputFile: z
            .string()
            .optional()
            .describe('Path to save the generated audio file, automatically generated if not provided'),
        },
        async (args, extra) => {
          try {
            // Build TTS request parameters
            const ttsParams = {
              text: args.text,
              outputDirectory: args.outputDirectory,
              voiceId: args.voiceId ?? DEFAULT_VOICE_ID,
              model: args.model ?? DEFAULT_SPEECH_MODEL,
              speed: args.speed ?? DEFAULT_SPEED,
              vol: args.vol ?? DEFAULT_VOLUME,
              // Use ?? rather than || so valid falsy values (e.g. pitch 0) survive
              pitch: args.pitch ?? DEFAULT_PITCH,
              emotion: args.emotion ?? DEFAULT_EMOTION,
              format: args.format ?? DEFAULT_FORMAT,
              sampleRate: args.sampleRate ?? DEFAULT_SAMPLE_RATE,
              bitrate: args.bitrate ?? DEFAULT_BITRATE,
              channel: args.channel ?? DEFAULT_CHANNEL,
              languageBoost: args.languageBoost ?? DEFAULT_LANGUAGE_BOOST,
              subtitleEnable: args.subtitleEnable ?? false,
              outputFile: args.outputFile,
            };
    
            // Use global configuration
            const requestApiKey = this.config.apiKey;
    
            if (!requestApiKey) {
              throw new Error(ERROR_API_KEY_REQUIRED);
            }
    
            // Update configuration with request-specific parameters
            const requestConfig: Partial<Config> = {
              apiKey: requestApiKey,
              apiHost: this.config.apiHost,
              resourceMode: this.config.resourceMode,
            };
    
            // Update API instance
            const requestApi = new MiniMaxAPI(requestConfig as Config);
            const requestTtsApi = new TTSAPI(requestApi);
    
            // Automatically set resource mode (if not specified)
            const outputFormat = requestConfig.resourceMode;
            const ttsRequest = {
              ...ttsParams,
              outputFormat,
            };
    
            // If no output filename is provided, generate one automatically
            if (!ttsRequest.outputFile) {
              const textPrefix = ttsRequest.text.substring(0, 20).replace(/[^\w]/g, '_');
              ttsRequest.outputFile = `tts_${textPrefix}_${Date.now()}`;
            }
    
            const result = await requestTtsApi.generateSpeech(ttsRequest);
    
            // Return different messages based on output format
            if (outputFormat === RESOURCE_MODE_URL) {
              return {
                content: [
                  {
                    type: 'text',
                    text: `Success. Audio URL: ${result.audio}. ${ttsParams.subtitleEnable ? `Subtitle file saved: ${result.subtitle}` : ''}`,
                  },
                ],
              };
            } else {
              return {
                content: [
                  {
                    type: 'text',
                    text: `Audio file saved: ${result.audio}. ${ttsParams.subtitleEnable ? `Subtitle file saved: ${result.subtitle}. ` : ''}Voice used: ${ttsParams.voiceId}`,
                  },
                ],
              };
            }
          } catch (error) {
            return {
              content: [
                {
                  type: 'text',
                  text: `Failed to generate audio: ${error instanceof Error ? error.message : String(error)}`,
                },
              ],
            };
          }
        },
      );
    }
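The default-filename rule appears twice in the source (in `generateSpeech` and again in the tool handler). Isolated as a hypothetical helper (`defaultTtsFilename` is not in the repository; the logic is inlined there): take the first 20 characters of the text, replace non-word characters with underscores, and append a timestamp for uniqueness.

```typescript
// Mirrors the inlined auto-filename logic from the handler.
function defaultTtsFilename(text: string, now: number = Date.now()): string {
  // First 20 chars of the text, with non-word characters replaced by '_'
  const textPrefix = text.substring(0, 20).replace(/[^\w]/g, '_');
  return `tts_${textPrefix}_${now}`;
}

const name = defaultTtsFilename('Hello, world!', 1700000000000);
// "tts_Hello__world__1700000000000"
```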
  • JSON schema definition for the text_to_audio tool used in the REST server's list-tools handler; it includes all input parameters with types and descriptions.
    {
      name: 'text_to_audio',
      description: 'Convert text to audio',
      arguments: [
        { name: 'text', description: 'Text to convert to audio', required: true },
        { name: 'outputDirectory', description: OUTPUT_DIRECTORY_DESCRIPTION, required: false },
        { name: 'voiceId', description: 'Voice ID to use, e.g. "female-shaonv"', required: false },
        { name: 'model', description: 'Model to use', required: false },
        { name: 'speed', description: 'Speech speed (0.5-2.0)', required: false },
        { name: 'vol', description: 'Speech volume (0.1-10.0)', required: false },
        { name: 'pitch', description: 'Speech pitch (-12 to 12)', required: false },
        { name: 'emotion', description: 'Speech emotion, values: ["happy", "sad", "angry", "fearful", "disgusted", "surprised", "neutral"]', required: false },
        { name: 'format', description: 'Audio format, values: ["pcm", "mp3","flac", "wav"]', required: false },
        { name: 'sampleRate', description: 'Sample rate (Hz), values: [8000, 16000, 22050, 24000, 32000, 44100]', required: false },
        { name: 'bitrate', description: 'Bitrate (bps), values: [64000, 96000, 128000, 160000, 192000, 224000, 256000, 320000]', required: false },
        { name: 'channel', description: 'Audio channels, values: [1, 2]', required: false },
        { name: 'languageBoost', description: `Enhance the ability to recognize specified languages and dialects. Supported values include: 'Chinese', 'Chinese,Yue', 'English', 'Arabic', 'Russian', 'Spanish', 'French', 'Portuguese', 'German', 'Turkish', 'Dutch', 'Ukrainian', 'Vietnamese', 'Indonesian', 'Japanese', 'Italian', 'Korean', 'Thai', 'Polish', 'Romanian', 'Greek', 'Czech', 'Finnish', 'Hindi', 'auto', default is 'auto'`, required: false },
        { name: 'subtitleEnable', description: `The parameter controls whether the subtitle service is enabled. The model must be 'speech-01-turbo' or 'speech-01-hd'. If this parameter is not provided, the default value is false`, required: false },
        { name: 'outputFile', description: 'Output file path, auto-generated if not provided', required: false }
      ],
      inputSchema: {
        type: 'object',
        properties: {
          text: { type: 'string' },
          outputDirectory: { type: 'string' },
          voiceId: { type: 'string' },
          model: { type: 'string' },
          speed: { type: 'number' },
          vol: { type: 'number' },
          pitch: { type: 'number' },
          emotion: { type: 'string' },
          format: { type: 'string' },
          sampleRate: { type: 'number' },
          bitrate: { type: 'number' },
          channel: { type: 'number' },
          languageBoost: { type: 'string' },
          subtitleEnable: { type: 'boolean' },
          outputFile: { type: 'string' }
        },
        required: ['text']
      }
    },
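Since the dispatcher below reads `request.params.tool` and `request.params.params`, a call-tool payload for this schema might look like the following. This is a sketch of the envelope shape only; the exact transport wrapper is an assumption, and the argument values are illustrative.

```typescript
// Hypothetical call-tool payload matching the dispatcher's
// `request.params.tool` / `request.params.params` shape.
const callToolRequest = {
  params: {
    tool: 'text_to_audio',
    params: {
      text: 'Hello from MiniMax',   // the only required argument
      voiceId: 'female-shaonv',
      format: 'mp3',
      outputDirectory: 'workspace',
    },
  },
};

console.log(callToolRequest.params.tool); // "text_to_audio"
```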
  • Handler wrapper in REST server that dispatches to mediaService.generateSpeech with retry logic.
                  },
                  required: ['voiceType']
                }
              },
              {
                name: 'play_audio',
                description: 'Play audio file. Supports WAV and MP3 formats. Does not support video.',
                arguments: [
                  { name: 'inputFilePath', description: 'Path to audio file to play', required: true },
                  { name: 'isUrl', description: 'Whether audio file is a URL', required: false }
                ],
                inputSchema: {
                  type: 'object',
                  properties: {
                    inputFilePath: { type: 'string' },
                    isUrl: { type: 'boolean' }
                  },
                  required: ['inputFilePath']
                }
              },
              {
                name: 'text_to_image',
                description: 'Generate image based on text prompt',
                arguments: [
                  { name: 'prompt', description: 'Text prompt for image generation', required: true },
                  { name: 'model', description: 'Model to use', required: false },
                  { name: 'aspectRatio', description: 'Image aspect ratio, values: ["1:1", "16:9","4:3", "3:2", "2:3", "3:4", "9:16", "21:9"]', required: false },
                  { name: 'n', description: 'Number of images to generate (1-9)', required: false },
                  { name: 'promptOptimizer', description: 'Whether to optimize prompt', required: false },
                  { name: 'outputDirectory', description: OUTPUT_DIRECTORY_DESCRIPTION, required: false },
                  { name: 'outputFile', description: 'Output file path, auto-generated if not provided', required: false }
                ],
                inputSchema: {
                  type: 'object',
                  properties: {
                    prompt: { type: 'string' },
                    model: { type: 'string' },
                    aspectRatio: { type: 'string' },
                    n: { type: 'number' },
                    promptOptimizer: { type: 'boolean' },
                    outputDirectory: { type: 'string' },
                    outputFile: { type: 'string' }
                  },
                  required: ['prompt']
                }
              },
              {
                name: 'generate_video',
                description: 'Generate video based on text prompt',
                arguments: [
                  { name: 'prompt', description: 'Text prompt for video generation', required: true },
                  { name: 'model', description: 'Model to use, values: ["T2V-01", "T2V-01-Director", "I2V-01", "I2V-01-Director", "I2V-01-live", "MiniMax-Hailuo-02"]', required: false },
                  { name: 'firstFrameImage', description: 'First frame image', required: false },
                  { name: 'outputDirectory', description: OUTPUT_DIRECTORY_DESCRIPTION, required: false },
                  { name: 'outputFile', description: 'Output file path, auto-generated if not provided', required: false },
                  { name: 'async_mode', description: 'Whether to use async mode. Defaults to False. If True, the video generation task will be submitted asynchronously and the response will return a task_id. Should use `query_video_generation` tool to check the status of the task and get the result', required: false },
                  { name: 'resolution', description: 'The resolution of the video. The model must be "MiniMax-Hailuo-02". Values range ["768P", "1080P"]', required: false },
                  { name: 'duration', description: 'The duration of the video. The model must be "MiniMax-Hailuo-02". Values can be 6 and 10.', required: false },
                ],
                inputSchema: {
                  type: 'object',
                  properties: {
                    prompt: { type: 'string' },
                    model: { type: 'string' },
                    firstFrameImage: { type: 'string' },
                    outputDirectory: { type: 'string' },
                    outputFile: { type: 'string' },
                    async_mode: { type: 'boolean' },
                    resolution: { type: 'string' },
                    duration: { type: 'number' }
                  },
                  required: ['prompt']
                }
              },
              {
                name: 'voice_clone',
                description: 'Clone voice using provided audio file',
                arguments: [
                  { name: 'voiceId', description: 'Voice ID to use', required: true },
                  { name: 'audioFile', description: 'Audio file path', required: true },
                  { name: 'text', description: 'Text for demo audio', required: false },
                  { name: 'outputDirectory', description: OUTPUT_DIRECTORY_DESCRIPTION, required: false },
                  { name: 'isUrl', description: 'Whether audio file is a URL', required: false }
                ],
                inputSchema: {
                  type: 'object',
                  properties: {
                    voiceId: { type: 'string' },
                    audioFile: { type: 'string' },
                    text: { type: 'string' },
                    outputDirectory: { type: 'string' },
                    isUrl: { type: 'boolean' }
                  },
                  required: ['voiceId', 'audioFile']
                }
              },
              {
                name: 'image_to_video',
                description: 'Generate video based on image',
                arguments: [
                  { name: 'prompt', description: 'Text prompt for video generation', required: true },
                  { name: 'firstFrameImage', description: 'Path to first frame image', required: true },
                  { name: 'model', description: 'Model to use, values: ["I2V-01", "I2V-01-Director", "I2V-01-live"]', required: false },
                  { name: 'outputDirectory', description: OUTPUT_DIRECTORY_DESCRIPTION, required: false },
                  { name: 'outputFile', description: 'Output file path, auto-generated if not provided', required: false },
                  { name: 'async_mode', description: 'Whether to use async mode. Defaults to False. If True, the video generation task will be submitted asynchronously and the response will return a task_id. Should use `query_video_generation` tool to check the status of the task and get the result', required: false }
                ],
                inputSchema: {
                  type: 'object',
                  properties: {
                    prompt: { type: 'string' },
                    firstFrameImage: { type: 'string' },
                    model: { type: 'string' },
                    outputDirectory: { type: 'string' },
                    outputFile: { type: 'string' },
                    async_mode: { type: 'boolean' }
                  },
                  required: ['prompt', 'firstFrameImage']
                }
              },
              {
                name: 'music_generation',
                description: 'Generate music based on text prompt and lyrics',
                arguments: [
                  { name: 'prompt', description: 'Music creation inspiration describing style, mood, scene, etc.', required: true },
                  { name: 'lyrics', description: 'Song lyrics for music generation.\nUse newline (\\n) to separate each line of lyrics. Supports lyric structure tags [Intro][Verse][Chorus][Bridge][Outro]\nto enhance musicality. Character range: [10, 600] (each Chinese character, punctuation, and letter counts as 1 character)', required: true },
                  { name: 'sampleRate', description: 'Sample rate of generated music', required: false },
                  { name: 'bitrate', description: 'Bitrate of generated music', required: false },
                  { name: 'format', description: 'Format of generated music', required: false },
                  { name: 'outputDirectory', description: OUTPUT_DIRECTORY_DESCRIPTION, required: false }
                ],
                inputSchema: {
                  type: 'object',
                  properties: {
                    prompt: { type: 'string' },
                    lyrics: { type: 'string' },
                    sampleRate: { type: 'number' },
                    bitrate: { type: 'number' },
                    format: { type: 'string' },
                    outputDirectory: { type: 'string' }
                  },
                  required: ['prompt', 'lyrics']
                }
              },
              {
                name: 'voice_design',
                description: 'Generate a voice based on description prompts',
                arguments: [
                  { name: 'prompt', description: 'The prompt to generate the voice from', required: true },
                  { name: 'previewText', description: 'The text to preview the voice', required: true },
                  { name: 'voiceId', description: 'The id of the voice to use', required: false },
                  { name: 'outputDirectory', description: OUTPUT_DIRECTORY_DESCRIPTION, required: false }
                ],
                inputSchema: {
                  type: 'object',
                  properties: {
                    prompt: { type: 'string' },
                    previewText: { type: 'string' },
                    voiceId: { type: 'string' },
                    outputDirectory: { type: 'string' }
                  },
                  required: ['prompt', 'previewText']
                }
              }
            ]
          };
        } catch (error) {
          throw this.wrapError('Failed to get tool list', error);
        }
      });
    
      // Call tool handler
      this.server.setRequestHandler(CallToolRequestSchema, async (request) => {
        const toolName = request.params.tool;
        const toolParams = request.params.params || {};
    
        try {
          // Create configuration and API instance for this request
          const requestConfig = this.getRequestConfig(request);
          const requestApi = new MiniMaxAPI(requestConfig);
          const mediaService = new MediaService(requestApi);
    
          // Log API key (partially hidden)
          const apiKey = this.extractApiKeyFromRequest(request);
          const maskedKey = apiKey
            ? `${apiKey.substring(0, 4)}****${apiKey.substring(apiKey.length - 4)}`
            : 'not provided';
          // console.log(`[${new Date().toISOString()}] Using API key: ${maskedKey} to call tool: ${toolName}`);
    
          // Choose different handler function based on tool name
          switch (toolName) {
            case 'text_to_audio':
              return await this.handleTextToAudio(toolParams, requestApi, mediaService);
    
            case 'list_voices':
              return await this.handleListVoices(toolParams, requestApi, mediaService);
    
            case 'play_audio':
              return await this.handlePlayAudio(toolParams);
    
            case 'text_to_image':
              return await this.handleTextToImage(toolParams, requestApi, mediaService);
    
            case 'generate_video':
              return await this.handleGenerateVideo(toolParams, requestApi, mediaService);
    
            case 'voice_clone':
              return await this.handleVoiceClone(toolParams, requestApi, mediaService);
    
            case 'image_to_video':
              return await this.handleImageToVideo(toolParams, requestApi, mediaService);
    
            case 'query_video_generation':
              return await this.handleVideoGenerationQuery(toolParams, requestApi, mediaService);
            
            case 'music_generation':
              return await this.handleGenerateMusic(toolParams, requestApi, mediaService);
    
            case 'voice_design':
              return await this.handleVoiceDesign(toolParams, requestApi, mediaService);
    
            default:
              throw new Error(`Unknown tool: ${toolName}`);
          }
        } catch (error) {
          throw this.wrapError(`Failed to call tool ${toolName}`, error);
        }
      });
    }
    
    /**
     * Handle text to speech request
     */
    private async handleTextToAudio(args: any, api: MiniMaxAPI, mediaService: MediaService, attempt = 1): Promise<any> {
  • MediaService wrapper that delegates text-to-audio execution to TTSAPI.generateSpeech.
    // Returns the TTSAPI result object ({ audio, subtitle }), not a plain string
    public async generateSpeech(params: any): Promise<any> {
      this.checkInitialized();
      try {
        return await this.ttsApi.generateSpeech(params);
      } catch (error) {
        // console.error(`[${new Date().toISOString()}] Failed to generate speech:`, error);
        throw this.wrapError('Failed to generate speech', error);
      }
    }
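The `attempt = 1` parameter in the `handleTextToAudio` signature above suggests an attempt-counting retry loop around `mediaService.generateSpeech`. A generic sketch of that pattern (the `withRetry` helper, its attempt limit, and its backoff are assumptions, not the repository's actual code):

```typescript
// Hypothetical retry helper illustrating the attempt-based pattern hinted at
// by handleTextToAudio(args, api, mediaService, attempt = 1).
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  delayMs = 500,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      if (attempt < maxAttempts) {
        // Simple linear backoff between attempts
        await new Promise((resolve) => setTimeout(resolve, delayMs * attempt));
      }
    }
  }
  throw lastError;
}
```

Usage would look like `withRetry(() => mediaService.generateSpeech(params))`, letting transient API failures be retried while the final error still propagates to the caller.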