text_to_speech

Convert text into spoken audio using OpenAI's TTS technology, with options for different voices, models, and audio formats. The generated audio can be saved to a file and optionally played automatically.

Instructions

Converts text into spoken audio using OpenAI TTS (default voice: alloy), saves it to a file, and optionally plays it.

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| `input` | Yes | The text to synthesize into speech. | — |
| `model` | No | The TTS model to use. | `tts-1` |
| `play` | No | Whether to automatically play the generated audio file. | `false` |
| `response_format` | No | The format of the audio response. | `mp3` |
| `voice` | No | Optional: the voice to use. Overrides the configured default. | `alloy` |
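
For example, a minimal tool call needs only `input`; every other field falls back to its default. A sketch of an arguments object (the values are illustrative, and `nova` is assumed to be in the server's `ALLOWED_VOICES` list):

```typescript
// Illustrative arguments for a text_to_speech call.
// Only `input` is required; omitted fields use the defaults above.
const exampleArgs = {
  input: "Hello from blabber-mcp!",
  voice: "nova",           // assumed allowed voice; the configured default is alloy
  response_format: "opus", // instead of the default mp3
  play: true,              // play the file after saving it
};

console.log(JSON.stringify(exampleArgs));
```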

Implementation Reference

  • Handler function for CallToolRequestSchema that dispatches to the text_to_speech implementation, covering argument validation, OpenAI TTS generation, audio-file saving, optional playback, and error handling.
    server.setRequestHandler(CallToolRequestSchema, async (request) => {
      if (request.params.name !== TEXT_TO_SPEECH_TOOL_NAME) {
        throw new McpError(ErrorCode.MethodNotFound, `Unknown tool: ${request.params.name}`);
      }
    
      if (!isValidTextToSpeechArgs(request.params.arguments)) {
        throw new McpError(ErrorCode.InvalidParams, "Invalid arguments for text_to_speech tool.");
      }
    
      const {
        input,
        // Use voice from args if provided, otherwise use the configured DEFAULT_VOICE
        voice = DEFAULT_VOICE,
        model = "tts-1",
        response_format = "mp3",
        play = false,
      } = request.params.arguments;
    
      // Ensure the final voice is valid (handles case where default might somehow be invalid, though unlikely with validation above)
      const finalVoice: AllowedVoice = (ALLOWED_VOICES as readonly string[]).includes(voice) ? voice : DEFAULT_VOICE;
    
    
      let playbackMessage = "";
    
      try {
        if (!fs.existsSync(OUTPUT_DIR)) {
          fs.mkdirSync(OUTPUT_DIR, { recursive: true });
          console.error(`Created output directory: ${OUTPUT_DIR}`);
        }
    
        console.error(`Generating speech with voice: ${finalVoice}`); // Log the voice being used
    
        const speechResponse = await openai.audio.speech.create({
          model: model,
          voice: finalVoice, // Use the validated final voice
          input: input,
          response_format: response_format,
        });
    
        const audioBuffer = Buffer.from(await speechResponse.arrayBuffer());
        const timestamp = Date.now();
        const filename = `speech_${timestamp}.${response_format}`;
        const filePath = path.join(OUTPUT_DIR, filename);
        const relativeFilePath = path.relative(process.cwd(), filePath);
    
        fs.writeFileSync(filePath, audioBuffer);
        console.error(`Audio saved to: ${filePath}`);
    
        if (play) {
          const command = `${AUDIO_PLAYER_COMMAND} "${filePath}"`;
          console.error(`Attempting to play audio with command: ${command}`);
          exec(command, (error, stdout, stderr) => {
            if (error) console.error(`Playback Error: ${error.message}`);
            if (stderr) console.error(`Playback Stderr: ${stderr}`);
            if (stdout) console.error(`Playback stdout: ${stdout}`);
          });
          playbackMessage = ` Playback initiated using command: ${AUDIO_PLAYER_COMMAND}.`;
        }
    
        return {
          content: [
            {
              type: "text",
              text: JSON.stringify({
                message: `Audio saved successfully.${playbackMessage}`,
                filePath: relativeFilePath,
                format: response_format,
                voiceUsed: finalVoice, // Inform client which voice was actually used
              }),
              mimeType: "application/json",
            },
          ],
        };
      } catch (error) {
        let errorMessage = "Failed to generate speech.";
        if (error instanceof APIError) {
          errorMessage = `OpenAI API Error (${error.status}): ${error.message}`;
        } else if (error instanceof Error) {
          errorMessage = error.message;
        }
        console.error(`[${TEXT_TO_SPEECH_TOOL_NAME} Error]`, errorMessage, error);
        return {
          content: [{ type: "text", text: errorMessage }],
          isError: true,
        };
      }
    });
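    Note that the playback branch above interpolates `filePath` into a shell string before calling `exec`, so a path containing quotes could break the command. An alternative sketch (not from the source) passes the path as an argument-vector entry via `execFile`, which skips shell parsing entirely:

```typescript
import { execFile } from "node:child_process";

// Build the command and argument vector separately; execFile passes each
// argv entry verbatim, so paths with spaces or quotes need no escaping.
function playbackArgv(playerCommand: string, filePath: string): [string, string[]] {
  return [playerCommand, [filePath]];
}

function playAudio(playerCommand: string, filePath: string): void {
  const [cmd, args] = playbackArgv(playerCommand, filePath);
  execFile(cmd, args, (error) => {
    if (error) console.error(`Playback Error: ${error.message}`);
  });
}
```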
  • src/index.ts:68-109 (registration)
    Tool registration via ListToolsRequestSchema, specifying name, description, and detailed input schema for text_to_speech.
    server.setRequestHandler(ListToolsRequestSchema, async () => {
      return {
        tools: [
          {
            name: TEXT_TO_SPEECH_TOOL_NAME,
            description: `Converts text into spoken audio using OpenAI TTS (default voice: ${DEFAULT_VOICE}), saves it to a file, and optionally plays it.`, // Updated description
            inputSchema: {
              type: "object",
              properties: {
                input: {
                  type: "string",
                  description: "The text to synthesize into speech.",
                },
                voice: {
                  type: "string",
                  description: `Optional: The voice to use. Overrides the configured default (${DEFAULT_VOICE}).`,
                  enum: [...ALLOWED_VOICES], // Use the defined constant
                },
                model: {
                  type: "string",
                  description: "The TTS model to use.",
                  enum: ["tts-1", "tts-1-hd"],
                  default: "tts-1",
                },
                response_format: {
                  type: "string",
                  description: "The format of the audio response.",
                  enum: ["mp3", "opus", "aac", "flac"],
                  default: "mp3",
                },
                play: {
                  type: "boolean",
                  description: "Whether to automatically play the generated audio file.",
                  default: false,
                }
              },
              required: ["input"],
            },
          },
        ],
      };
    });
  • TypeScript type definition mirroring the input schema for text_to_speech arguments.
    type TextToSpeechArgs = {
      input: string;
      voice?: AllowedVoice; // Use the specific type
      model?: "tts-1" | "tts-1-hd";
      response_format?: "mp3" | "opus" | "aac" | "flac";
      play?: boolean;
    };
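    The snippets on this page reference `ALLOWED_VOICES`, `DEFAULT_VOICE`, `OUTPUT_DIR`, and `AUDIO_PLAYER_COMMAND` without showing their definitions. A plausible sketch (the names match the source; the exact values are assumptions, with the voice list taken from OpenAI's documented TTS voices):

```typescript
import path from "node:path";

// Assumed definitions for the constants used elsewhere on this page.
const ALLOWED_VOICES = ["alloy", "echo", "fable", "onyx", "nova", "shimmer"] as const;
type AllowedVoice = (typeof ALLOWED_VOICES)[number];

const DEFAULT_VOICE: AllowedVoice = "alloy";

// Assumed: generated files land in an output/ folder under the working directory.
const OUTPUT_DIR = path.join(process.cwd(), "output");

// Assumed: a platform-appropriate command-line audio player.
const AUDIO_PLAYER_COMMAND = process.platform === "darwin" ? "afplay" : "mpg123";
```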
  • Helper function to validate arguments for the text_to_speech tool.
    function isValidTextToSpeechArgs(args: any): args is TextToSpeechArgs {
      return (
        typeof args === "object" &&
        args !== null &&
        typeof args.input === "string" &&
        (args.voice === undefined || (ALLOWED_VOICES as readonly string[]).includes(args.voice)) && // Validate against allowed voices
        (args.model === undefined || ["tts-1", "tts-1-hd"].includes(args.model)) &&
        (args.response_format === undefined || ["mp3", "opus", "aac", "flac"].includes(args.response_format)) &&
        (args.play === undefined || typeof args.play === 'boolean')
      );
    }
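    To see the guard in action, here is a self-contained check (re-declaring the validator with an assumed `ALLOWED_VOICES` list so it runs standalone):

```typescript
const ALLOWED_VOICES = ["alloy", "echo", "fable", "onyx", "nova", "shimmer"] as const;
type AllowedVoice = (typeof ALLOWED_VOICES)[number];

type TextToSpeechArgs = {
  input: string;
  voice?: AllowedVoice;
  model?: "tts-1" | "tts-1-hd";
  response_format?: "mp3" | "opus" | "aac" | "flac";
  play?: boolean;
};

function isValidTextToSpeechArgs(args: any): args is TextToSpeechArgs {
  return (
    typeof args === "object" &&
    args !== null &&
    typeof args.input === "string" &&
    (args.voice === undefined || (ALLOWED_VOICES as readonly string[]).includes(args.voice)) &&
    (args.model === undefined || ["tts-1", "tts-1-hd"].includes(args.model)) &&
    (args.response_format === undefined || ["mp3", "opus", "aac", "flac"].includes(args.response_format)) &&
    (args.play === undefined || typeof args.play === "boolean")
  );
}

console.log(isValidTextToSpeechArgs({ input: "Hi", voice: "nova" }));  // true
console.log(isValidTextToSpeechArgs({ voice: "nova" }));               // false: input missing
console.log(isValidTextToSpeechArgs({ input: "Hi", voice: "robot" })); // false: unknown voice
```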
  • Constant defining the tool name.
    const TEXT_TO_SPEECH_TOOL_NAME = "text_to_speech";