# text_to_speech
Convert text into spoken audio using OpenAI's TTS technology, with options for different voices, models, and audio formats. The generated audio can be saved to a file and optionally played automatically.
## Instructions
Converts text into spoken audio using OpenAI TTS (default voice: alloy), saves it to a file, and optionally plays it.
## Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| input | Yes | The text to synthesize into speech. | |
| model | No | The TTS model to use. | tts-1 |
| play | No | Whether to automatically play the generated audio file. | false |
| response_format | No | The format of the audio response. | mp3 |
| voice | No | Optional: The voice to use. Overrides the configured default (alloy). | alloy |
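
The schema above can be exercised with a hypothetical arguments object. The voice union below assumes OpenAI's standard tts-1 voice list; the input text is purely illustrative:

```typescript
// Argument shape mirroring the text_to_speech input schema above.
// The voice union assumes OpenAI's standard tts-1 voices.
type TextToSpeechArgs = {
  input: string; // required
  voice?: "alloy" | "echo" | "fable" | "onyx" | "nova" | "shimmer";
  model?: "tts-1" | "tts-1-hd";
  response_format?: "mp3" | "opus" | "aac" | "flac";
  play?: boolean;
};

// Example call payload: rely on the defaults for model and format,
// override only the voice, and request automatic playback.
const args: TextToSpeechArgs = {
  input: "Hello from the text_to_speech tool.",
  voice: "nova",
  play: true,
};

console.log(JSON.stringify(args));
```

Omitted optional fields fall back to the defaults in the table (`tts-1`, `mp3`, `play: false`, voice `alloy`).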
## Implementation Reference
- src/index.ts:134-220 (handler): Handler for `CallToolRequestSchema` that dispatches to the `text_to_speech` implementation, including argument validation, OpenAI TTS generation, audio file saving, optional playback, and error handling.

```typescript
server.setRequestHandler(CallToolRequestSchema, async (request) => {
  if (request.params.name !== TEXT_TO_SPEECH_TOOL_NAME) {
    throw new McpError(ErrorCode.MethodNotFound, `Unknown tool: ${request.params.name}`);
  }
  if (!isValidTextToSpeechArgs(request.params.arguments)) {
    throw new McpError(ErrorCode.InvalidParams, "Invalid arguments for text_to_speech tool.");
  }

  const {
    input,
    // Use voice from args if provided, otherwise use the configured DEFAULT_VOICE
    voice = DEFAULT_VOICE,
    model = "tts-1",
    response_format = "mp3",
    play = false,
  } = request.params.arguments;

  // Ensure the final voice is valid (handles case where default might somehow be
  // invalid, though unlikely with validation above)
  const finalVoice: AllowedVoice = (ALLOWED_VOICES as readonly string[]).includes(voice)
    ? voice
    : DEFAULT_VOICE;

  let playbackMessage = "";

  try {
    if (!fs.existsSync(OUTPUT_DIR)) {
      fs.mkdirSync(OUTPUT_DIR, { recursive: true });
      console.error(`Created output directory: ${OUTPUT_DIR}`);
    }

    console.error(`Generating speech with voice: ${finalVoice}`); // Log the voice being used
    const speechResponse = await openai.audio.speech.create({
      model: model,
      voice: finalVoice, // Use the validated final voice
      input: input,
      response_format: response_format,
    });

    const audioBuffer = Buffer.from(await speechResponse.arrayBuffer());
    const timestamp = Date.now();
    const filename = `speech_${timestamp}.${response_format}`;
    const filePath = path.join(OUTPUT_DIR, filename);
    const relativeFilePath = path.relative(process.cwd(), filePath);

    fs.writeFileSync(filePath, audioBuffer);
    console.error(`Audio saved to: ${filePath}`);

    if (play) {
      const command = `${AUDIO_PLAYER_COMMAND} "${filePath}"`;
      console.error(`Attempting to play audio with command: ${command}`);
      exec(command, (error, stdout, stderr) => {
        if (error) console.error(`Playback Error: ${error.message}`);
        if (stderr) console.error(`Playback Stderr: ${stderr}`);
        if (stdout) console.error(`Playback stdout: ${stdout}`);
      });
      playbackMessage = ` Playback initiated using command: ${AUDIO_PLAYER_COMMAND}.`;
    }

    return {
      content: [
        {
          type: "text",
          text: JSON.stringify({
            message: `Audio saved successfully.${playbackMessage}`,
            filePath: relativeFilePath,
            format: response_format,
            voiceUsed: finalVoice, // Inform client which voice was actually used
          }),
          mimeType: "application/json",
        },
      ],
    };
  } catch (error) {
    let errorMessage = "Failed to generate speech.";
    if (error instanceof APIError) {
      errorMessage = `OpenAI API Error (${error.status}): ${error.message}`;
    } else if (error instanceof Error) {
      errorMessage = error.message;
    }
    console.error(`[${TEXT_TO_SPEECH_TOOL_NAME} Error]`, errorMessage, error);
    return { content: [{ type: "text", text: errorMessage }], isError: true };
  }
});
```
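
On success the handler returns a single text content item whose `text` is a JSON string. A minimal sketch of assembling that payload, with the output path hard-coded for illustration (the real handler derives it from `OUTPUT_DIR` and a timestamp):

```typescript
// Sketch of the success payload shape returned by the handler.
// The file path here is illustrative, not the server's actual OUTPUT_DIR.
const response_format = "mp3";
const finalVoice = "alloy";
const relativeFilePath = `output/speech_1700000000000.${response_format}`;

const result = {
  content: [
    {
      type: "text",
      text: JSON.stringify({
        message: "Audio saved successfully.",
        filePath: relativeFilePath,
        format: response_format,
        voiceUsed: finalVoice, // tells the client which voice was actually used
      }),
      mimeType: "application/json",
    },
  ],
};

console.log(result.content[0].text);
```

Clients can `JSON.parse` the `text` field to recover the file path, format, and the voice that was actually used after fallback.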
- src/index.ts:68-109 (registration): Tool registration via `ListToolsRequestSchema`, specifying the name, description, and detailed input schema for `text_to_speech`.

```typescript
server.setRequestHandler(ListToolsRequestSchema, async () => {
  return {
    tools: [
      {
        name: TEXT_TO_SPEECH_TOOL_NAME,
        description: `Converts text into spoken audio using OpenAI TTS (default voice: ${DEFAULT_VOICE}), saves it to a file, and optionally plays it.`, // Updated description
        inputSchema: {
          type: "object",
          properties: {
            input: {
              type: "string",
              description: "The text to synthesize into speech.",
            },
            voice: {
              type: "string",
              description: `Optional: The voice to use. Overrides the configured default (${DEFAULT_VOICE}).`,
              enum: [...ALLOWED_VOICES], // Use the defined constant
            },
            model: {
              type: "string",
              description: "The TTS model to use.",
              enum: ["tts-1", "tts-1-hd"],
              default: "tts-1",
            },
            response_format: {
              type: "string",
              description: "The format of the audio response.",
              enum: ["mp3", "opus", "aac", "flac"],
              default: "mp3",
            },
            play: {
              type: "boolean",
              description: "Whether to automatically play the generated audio file.",
              default: false,
            },
          },
          required: ["input"],
        },
      },
    ],
  };
});
```
- src/index.ts:113-119 (schema): TypeScript type definition mirroring the input schema for `text_to_speech` arguments.

```typescript
type TextToSpeechArgs = {
  input: string;
  voice?: AllowedVoice; // Use the specific type
  model?: "tts-1" | "tts-1-hd";
  response_format?: "mp3" | "opus" | "aac" | "flac";
  play?: boolean;
};
```
- src/index.ts:122-132 (helper): Helper function to validate arguments for the `text_to_speech` tool.

```typescript
function isValidTextToSpeechArgs(args: any): args is TextToSpeechArgs {
  return (
    typeof args === "object" &&
    args !== null &&
    typeof args.input === "string" &&
    (args.voice === undefined ||
      (ALLOWED_VOICES as readonly string[]).includes(args.voice)) && // Validate against allowed voices
    (args.model === undefined || ["tts-1", "tts-1-hd"].includes(args.model)) &&
    (args.response_format === undefined ||
      ["mp3", "opus", "aac", "flac"].includes(args.response_format)) &&
    (args.play === undefined || typeof args.play === "boolean")
  );
}
```
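
A self-contained sketch of the validator's behavior, with `ALLOWED_VOICES` filled in from OpenAI's standard tts-1 voice list (an assumption, since the constant's definition is not shown above):

```typescript
// Assumed voice list; the real server defines ALLOWED_VOICES elsewhere in src/index.ts.
const ALLOWED_VOICES = ["alloy", "echo", "fable", "onyx", "nova", "shimmer"] as const;

// Mirrors the isValidTextToSpeechArgs helper shown above.
function isValidTextToSpeechArgs(args: any): boolean {
  return (
    typeof args === "object" &&
    args !== null &&
    typeof args.input === "string" &&
    (args.voice === undefined ||
      (ALLOWED_VOICES as readonly string[]).includes(args.voice)) &&
    (args.model === undefined || ["tts-1", "tts-1-hd"].includes(args.model)) &&
    (args.response_format === undefined ||
      ["mp3", "opus", "aac", "flac"].includes(args.response_format)) &&
    (args.play === undefined || typeof args.play === "boolean")
  );
}

// Valid: only the required field is present.
console.log(isValidTextToSpeechArgs({ input: "hi" })); // true
// Invalid: voice is not in the allowed list.
console.log(isValidTextToSpeechArgs({ input: "hi", voice: "robot" })); // false
```

Note that unknown extra properties are not rejected; only the known fields are type-checked.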
- src/index.ts:66 (constant): Constant defining the tool name.

```typescript
const TEXT_TO_SPEECH_TOOL_NAME = "text_to_speech";
```