speak_response

Convert text to speech with language and emotion customization using the Zonos TTS MCP Server.

Input Schema

Name	Required	Default
`text`	Yes
`language`	No	en-us
`emotion`	No	neutral
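
As an illustration of how these parameters might be supplied, the sketch below builds the JSON-RPC payload an MCP client would send for a `tools/call` request. The helper name `buildSpeakRequest` and the `SpeakArgs` type are hypothetical and not part of the server; the emotion values are taken from the Zod schema in the Implementation Reference, and omitted optional fields are assumed to fall back to the defaults in the table.

```typescript
// Hypothetical client-side helper; not part of the Zonos TTS MCP Server.
type Emotion = "neutral" | "happy" | "sad" | "angry";

interface SpeakArgs {
    text: string;
    language?: string; // server default: "en-us"
    emotion?: Emotion; // server default: "neutral"
}

// Builds the JSON-RPC 2.0 payload for an MCP `tools/call` request.
function buildSpeakRequest(id: number, args: SpeakArgs) {
    return {
        jsonrpc: "2.0" as const,
        id,
        method: "tools/call" as const,
        params: { name: "speak_response" as const, arguments: args },
    };
}

// `language` is omitted here, so the server-side default would apply.
const request = buildSpeakRequest(1, { text: "Build finished.", emotion: "happy" });
console.log(JSON.stringify(request, null, 2));
```

In practice an MCP client library would normally construct this envelope; the sketch only shows which fields the tool expects.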

Implementation Reference

src/server.ts:110-155 (handler)

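Handler for the 'speak_response' tool. It posts the text to the Zonos TTS API's /v1/audio/speech endpoint together with the parameters mapped from the selected emotion, writes the returned WAV audio to a temporary file under /tmp, plays it with a platform-specific command, deletes the file, and returns a success message. On failure it logs the error (plus the API response body for Axios errors) and rethrows it as "TTS failed: <message>". Because the cleanup step runs only after successful playback, the temporary file is left behind if playback fails.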

    async ({ text, language, emotion }: ZonosRequestParams) => {
        try {
            const emotionParams = this.emotionMap[emotion];
            console.log(`Converting to speech: "${text}" with ${emotion} emotion`);

            // Use new OpenAI-style endpoint
            const response = await axios.post(`${API_BASE_URL}/v1/audio/speech`, {
                model: "Zyphra/Zonos-v0.1-transformer",
                input: text,
                language: language,
                emotion: emotionParams,
                speed: 1.0,
                response_format: "wav"  // Using WAV for better compatibility
            }, {
                responseType: 'arraybuffer'
            });

            // Save the audio response to a temporary file
            const tempAudioPath = `/tmp/tts_output_${Date.now()}.wav`;
            const fs = await import('fs/promises');
            await fs.writeFile(tempAudioPath, response.data);

            // Play the audio
            await this.playAudio(tempAudioPath);

            // Clean up the temporary file
            await fs.unlink(tempAudioPath);

            return {
                content: [
                    {
                        type: "text",
                        text: `Successfully spoke: "${text}" with ${emotion} emotion`,
                    },
                ],
            };
        } catch (error) {
            const errorMessage = error instanceof Error ? error.message : "Unknown error";
            console.error("TTS Error:", errorMessage);
            if (axios.isAxiosError(error) && error.response) {
                console.error("API Response:", error.response.data);
            }
            throw new Error(`TTS failed: ${errorMessage}`);
        }
    }
);
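
The save, play, and clean-up steps in the handler could also be wrapped so that the temporary file is removed even when playback throws. The following is a minimal sketch of that idea, not the server's code: `withTempWav` is a hypothetical helper, and it uses `os.tmpdir()` instead of the hard-coded `/tmp` on the assumption that the server may also run on Windows.

```typescript
import { mkdtemp, rm, writeFile } from "node:fs/promises";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical helper: writes audio bytes to a fresh temp directory,
// passes the file path to `use`, and always removes the directory afterwards.
async function withTempWav<T>(
    audio: Uint8Array,
    use: (wavPath: string) => Promise<T>,
): Promise<T> {
    const dir = await mkdtemp(join(tmpdir(), "tts-"));
    const wavPath = join(dir, "output.wav");
    try {
        await writeFile(wavPath, audio);
        return await use(wavPath);
    } finally {
        // Runs on success and on failure, so no file is left behind.
        await rm(dir, { recursive: true, force: true });
    }
}
```

Inside the handler this might be used as `await withTempWav(response.data, (p) => this.playAudio(p))`, assuming `response.data` is a Node `Buffer` as it is for Axios with `responseType: 'arraybuffer'`.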

src/server.ts:105-109 (schema)

Zod input schema for the 'speak_response' tool. It defines three parameters: text (required string), language (string, default 'en-us'), and emotion (one of 'neutral', 'happy', 'sad', 'angry'; default 'neutral').
```
{
    text: z.string(),
    language: z.string().default("en-us"),
    emotion: z.enum(["neutral", "happy", "sad", "angry"]).default("neutral"),
},
```
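
To make the effect of this schema concrete, here is a plain-TypeScript approximation of the validation and defaulting it describes. It is a sketch only: `parseSpeakInput` and `SpeakInput` are hypothetical names, the server itself relies on Zod, and the error messages here are invented for illustration.

```typescript
// Plain-TypeScript approximation of the Zod schema; not the server's code.
const EMOTIONS = ["neutral", "happy", "sad", "angry"] as const;
type Emotion = (typeof EMOTIONS)[number];

interface SpeakInput {
    text: string;
    language: string;
    emotion: Emotion;
}

function parseSpeakInput(raw: Record<string, unknown>): SpeakInput {
    const text = raw.text;
    if (typeof text !== "string") {
        throw new Error("text is required and must be a string");
    }
    // Defaults apply only when the field is missing (undefined).
    const language = raw.language === undefined ? "en-us" : raw.language;
    if (typeof language !== "string") {
        throw new Error("language must be a string");
    }
    const emotion = raw.emotion === undefined ? "neutral" : raw.emotion;
    const match = EMOTIONS.find((e) => e === emotion);
    if (match === undefined) {
        throw new Error(`emotion must be one of: ${EMOTIONS.join(", ")}`);
    }
    return { text, language, emotion: match };
}

console.log(parseSpeakInput({ text: "Hello" }));
```

Note that `language` accepts any string, so an unsupported language code would only be rejected later by the TTS API, not by the schema.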

src/server.ts:103-156 (registration)

Registration of the 'speak_response' tool on the MCP server via this.mcp.tool(), providing name, input schema, and inline handler function. Called within setupTools().

    this.mcp.tool(
        "speak_response",
        {
            text: z.string(),
            language: z.string().default("en-us"),
            emotion: z.enum(["neutral", "happy", "sad", "angry"]).default("neutral"),
        },
        async ({ text, language, emotion }: ZonosRequestParams) => {
            try {
                const emotionParams = this.emotionMap[emotion];
                console.log(`Converting to speech: "${text}" with ${emotion} emotion`);

                // Use new OpenAI-style endpoint
                const response = await axios.post(`${API_BASE_URL}/v1/audio/speech`, {
                    model: "Zyphra/Zonos-v0.1-transformer",
                    input: text,
                    language: language,
                    emotion: emotionParams,
                    speed: 1.0,
                    response_format: "wav"  // Using WAV for better compatibility
                }, {
                    responseType: 'arraybuffer'
                });

                // Save the audio response to a temporary file
                const tempAudioPath = `/tmp/tts_output_${Date.now()}.wav`;
                const fs = await import('fs/promises');
                await fs.writeFile(tempAudioPath, response.data);

                // Play the audio
                await this.playAudio(tempAudioPath);

                // Clean up the temporary file
                await fs.unlink(tempAudioPath);

                return {
                    content: [
                        {
                            type: "text",
                            text: `Successfully spoke: "${text}" with ${emotion} emotion`,
                        },
                    ],
                };
            } catch (error) {
                const errorMessage = error instanceof Error ? error.message : "Unknown error";
                console.error("TTS Error:", errorMessage);
                if (axios.isAxiosError(error) && error.response) {
                    console.error("API Response:", error.response.data);
                }
                throw new Error(`TTS failed: ${errorMessage}`);
            }
        }
    );
}

src/server.ts:158-189 (helper)

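Helper called by the handler to play the generated audio file with a platform-specific command: afplay on macOS, paplay on Linux (with PULSE_SERVER and PULSE_COOKIE set for PulseAudio), and PowerShell's Media.SoundPlayer on Windows. Any other platform raises an "Unsupported platform" error, and every failure is rethrown as "Audio playback failed: <message>".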

private async playAudio(audioPath: string): Promise<void> {
    try {
        console.log("Playing audio from:", audioPath);

        switch (process.platform) {
            case "darwin":
                await execAsync(`afplay ${audioPath}`);
                break;
            case "linux":
                // Try paplay for PulseAudio
                const XDG_RUNTIME_DIR = process.env.XDG_RUNTIME_DIR || '/run/user/1000';
                const env = {
                    ...process.env,
                    PULSE_SERVER: `unix:${XDG_RUNTIME_DIR}/pulse/native`,
                    PULSE_COOKIE: `${process.env.HOME}/.config/pulse/cookie`
                };
                await execAsync(`paplay ${audioPath}`, { env });
                break;
            case "win32":
                await execAsync(
                    `powershell -c (New-Object Media.SoundPlayer '${audioPath}').PlaySync()`
                );
                break;
            default:
                throw new Error(`Unsupported platform: ${process.platform}`);
        }
    } catch (error) {
        const errorMessage = error instanceof Error ? error.message : "Unknown error";
        console.error("Audio playback error:", errorMessage);
        throw new Error(`Audio playback failed: ${errorMessage}`);
    }
}
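
The helper above interpolates the file path directly into a shell command string, which works for the timestamped `/tmp` paths the handler generates but would break on paths containing spaces or quotes. One possible variation, sketched below under that assumption, selects the command as data and runs it with `execFile`, which passes arguments without a shell. `playbackCommand` and `play` are hypothetical names, and the PulseAudio environment setup from the Linux branch is omitted for brevity.

```typescript
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const execFileAsync = promisify(execFile);

interface PlaybackCommand {
    file: string;
    args: string[];
}

// Hypothetical helper mirroring the platform switch in playAudio.
function playbackCommand(platform: string, audioPath: string): PlaybackCommand {
    switch (platform) {
        case "darwin":
            return { file: "afplay", args: [audioPath] };
        case "linux":
            return { file: "paplay", args: [audioPath] };
        case "win32": {
            // Single quotes are doubled to escape them in a PowerShell string literal.
            const quoted = audioPath.replace(/'/g, "''");
            const script = `(New-Object Media.SoundPlayer '${quoted}').PlaySync()`;
            return { file: "powershell", args: ["-NoProfile", "-Command", script] };
        }
        default:
            throw new Error(`Unsupported platform: ${platform}`);
    }
}

// Runs the selected player without going through a shell.
async function play(audioPath: string): Promise<void> {
    const { file, args } = playbackCommand(process.platform, audioPath);
    await execFileAsync(file, args);
}
```

Keeping the command selection in a pure function also makes the platform logic testable without actually playing audio.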

src/server.ts:36-41 (schema)

TypeScript interface defining the parameters of a Zonos TTS request; it types the destructured argument in the handler signature.
```
interface ZonosRequestParams {
    text: string;
    language: string;
    emotion: Emotion;
}
```
