Zonos TTS MCP Server

speak_response

Convert text to speech with language and emotion customization using the Zonos TTS MCP Server.

Input Schema

Name      Required  Description                          Default
text      Yes       Text to convert to speech
language  No        Language code                        en-us
emotion   No        One of: neutral, happy, sad, angry   neutral
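For illustration, the defaults from the schema can be mirrored client-side when building a call's arguments. This is a hypothetical sketch (the `SpeakResponseArgs` and `withDefaults` names are not from the server's source); only `text` is required, and the server itself applies the defaults via Zod:

```typescript
type Emotion = "neutral" | "happy" | "sad" | "angry";

// Hypothetical arguments object for a speak_response call.
interface SpeakResponseArgs {
    text: string;
    language?: string; // defaults to "en-us"
    emotion?: Emotion; // defaults to "neutral"
}

// Mirror the schema defaults when a client wants the final values locally.
function withDefaults(args: SpeakResponseArgs): Required<SpeakResponseArgs> {
    return {
        text: args.text,
        language: args.language ?? "en-us",
        emotion: args.emotion ?? "neutral",
    };
}

const call = withDefaults({ text: "Hello there!" });
console.log(call);
```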

Implementation Reference

  • The handler function for the 'speak_response' tool. It calls the Zonos TTS API with emotion parameters, saves the WAV audio to a temp file, plays it using platform-specific tools, cleans up the temp file, and returns a success message; on failure it logs the error and rethrows.
        async ({ text, language, emotion }: ZonosRequestParams) => {
            try {
                const emotionParams = this.emotionMap[emotion];
                console.log(`Converting to speech: "${text}" with ${emotion} emotion`);
    
                // Use new OpenAI-style endpoint
                const response = await axios.post(`${API_BASE_URL}/v1/audio/speech`, {
                    model: "Zyphra/Zonos-v0.1-transformer",
                    input: text,
                    language: language,
                    emotion: emotionParams,
                    speed: 1.0,
                    response_format: "wav"  // Using WAV for better compatibility
                }, {
                    responseType: 'arraybuffer'
                });
    
                // Save the audio response to a temporary file; using the OS
                // temp dir (rather than a hard-coded /tmp) keeps this working
                // on Windows, which playAudio otherwise supports
                const os = await import('os');
                const path = await import('path');
                const tempAudioPath = path.join(os.tmpdir(), `tts_output_${Date.now()}.wav`);
                const fs = await import('fs/promises');
                await fs.writeFile(tempAudioPath, response.data);
    
                // Play the audio, cleaning up the temporary file even if
                // playback fails
                try {
                    await this.playAudio(tempAudioPath);
                } finally {
                    await fs.unlink(tempAudioPath);
                }
    
                return {
                    content: [
                        {
                            type: "text",
                            text: `Successfully spoke: "${text}" with ${emotion} emotion`,
                        },
                    ],
                };
            } catch (error) {
                const errorMessage = error instanceof Error ? error.message : "Unknown error";
                console.error("TTS Error:", errorMessage);
                if (axios.isAxiosError(error) && error.response) {
                    console.error("API Response:", error.response.data);
                }
                throw new Error(`TTS failed: ${errorMessage}`);
            }
        }
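The handler reads `this.emotionMap[emotion]`, which is not shown in this excerpt. A plausible shape, assuming the Zonos API accepts per-emotion weight objects, might look like the following; the keys and weight values here are illustrative only, not taken from the source:

```typescript
type Emotion = "neutral" | "happy" | "sad" | "angry";

// Illustrative only: maps each supported emotion name to the parameter
// object sent as `emotion` in the API request body.
const emotionMap: Record<Emotion, Record<string, number>> = {
    neutral: { happiness: 0.2,  sadness: 0.2,  anger: 0.05 },
    happy:   { happiness: 1.0,  sadness: 0.05, anger: 0.05 },
    sad:     { happiness: 0.05, sadness: 1.0,  anger: 0.05 },
    angry:   { happiness: 0.05, sadness: 0.05, anger: 1.0  },
};
```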
  • Zod input schema for the 'speak_response' tool defining parameters: text (required string), language (string default 'en-us'), emotion (enum default 'neutral').
    {
        text: z.string(),
        language: z.string().default("en-us"),
        emotion: z.enum(["neutral", "happy", "sad", "angry"]).default("neutral"),
    },
  • src/server.ts:103-156 (registration)
    Registration of the 'speak_response' tool on the MCP server via this.mcp.tool(), providing name, input schema, and inline handler function. Called within setupTools().
        this.mcp.tool(
            "speak_response",
            {
                text: z.string(),
                language: z.string().default("en-us"),
                emotion: z.enum(["neutral", "happy", "sad", "angry"]).default("neutral"),
            },
            async ({ text, language, emotion }: ZonosRequestParams) => {
                try {
                    const emotionParams = this.emotionMap[emotion];
                    console.log(`Converting to speech: "${text}" with ${emotion} emotion`);
    
                    // Use new OpenAI-style endpoint
                    const response = await axios.post(`${API_BASE_URL}/v1/audio/speech`, {
                        model: "Zyphra/Zonos-v0.1-transformer",
                        input: text,
                        language: language,
                        emotion: emotionParams,
                        speed: 1.0,
                        response_format: "wav"  // Using WAV for better compatibility
                    }, {
                        responseType: 'arraybuffer'
                    });
    
                    // Save the audio response to a temporary file; using the OS
                    // temp dir (rather than a hard-coded /tmp) keeps this working
                    // on Windows, which playAudio otherwise supports
                    const os = await import('os');
                    const path = await import('path');
                    const tempAudioPath = path.join(os.tmpdir(), `tts_output_${Date.now()}.wav`);
                    const fs = await import('fs/promises');
                    await fs.writeFile(tempAudioPath, response.data);
    
                    // Play the audio, cleaning up the temporary file even if
                    // playback fails
                    try {
                        await this.playAudio(tempAudioPath);
                    } finally {
                        await fs.unlink(tempAudioPath);
                    }
    
                    return {
                        content: [
                            {
                                type: "text",
                                text: `Successfully spoke: "${text}" with ${emotion} emotion`,
                            },
                        ],
                    };
                } catch (error) {
                    const errorMessage = error instanceof Error ? error.message : "Unknown error";
                    console.error("TTS Error:", errorMessage);
                    if (axios.isAxiosError(error) && error.response) {
                        console.error("API Response:", error.response.data);
                    }
                    throw new Error(`TTS failed: ${errorMessage}`);
                }
            }
        );
    }
  • Supporting helper function called by the handler to play the generated TTS audio file using platform-specific commands (afplay on macOS, paplay on Linux with PulseAudio env, PowerShell on Windows).
    private async playAudio(audioPath: string): Promise<void> {
        try {
            console.log("Playing audio from:", audioPath);
    
            switch (process.platform) {
                case "darwin":
                    await execAsync(`afplay ${audioPath}`);
                    break;
                case "linux":
                    // Try paplay for PulseAudio
                    const XDG_RUNTIME_DIR = process.env.XDG_RUNTIME_DIR || '/run/user/1000';
                    const env = {
                        ...process.env,
                        PULSE_SERVER: `unix:${XDG_RUNTIME_DIR}/pulse/native`,
                        PULSE_COOKIE: `${process.env.HOME}/.config/pulse/cookie`
                    };
                    await execAsync(`paplay ${audioPath}`, { env });
                    break;
                case "win32":
                    await execAsync(
                        `powershell -c (New-Object Media.SoundPlayer '${audioPath}').PlaySync()`
                    );
                    break;
                default:
                    throw new Error(`Unsupported platform: ${process.platform}`);
            }
        } catch (error) {
            const errorMessage = error instanceof Error ? error.message : "Unknown error";
            console.error("Audio playback error:", errorMessage);
            throw new Error(`Audio playback failed: ${errorMessage}`);
        }
    }
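Because the helper interpolates `audioPath` directly into a shell command, a path containing spaces or shell metacharacters would break playback. One alternative sketch (the `playerCommand` and `playAudioSafe` names are hypothetical, not from the source) builds the invocation as a binary plus argv array for `execFile`, which bypasses shell parsing entirely:

```typescript
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const execFileAsync = promisify(execFile);

// Build the player invocation as (binary, argv). Passing the path as an
// argv element means no shell parsing, so spaces or quotes in the path
// cannot break the command. (The win32 branch still embeds the path in a
// PowerShell string, so a single quote in the path would need escaping.)
function playerCommand(platform: string, audioPath: string): [string, string[]] {
    switch (platform) {
        case "darwin":
            return ["afplay", [audioPath]];
        case "linux":
            return ["paplay", [audioPath]];
        case "win32":
            return [
                "powershell",
                ["-c", `(New-Object Media.SoundPlayer '${audioPath}').PlaySync()`],
            ];
        default:
            throw new Error(`Unsupported platform: ${platform}`);
    }
}

async function playAudioSafe(audioPath: string): Promise<void> {
    const [bin, args] = playerCommand(process.platform, audioPath);
    await execFileAsync(bin, args);
}
```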
  • TypeScript interface defining the parameters for the Zonos TTS request, used in the handler signature.
    interface ZonosRequestParams {
        text: string;
        language: string;
        emotion: Emotion;
    }
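The `Emotion` type referenced by the interface is not part of the excerpt. Given the Zod enum above, it is presumably the matching string union; the following is a reconstruction, not quoted from the source, with a narrowing helper added for untrusted input:

```typescript
// Reconstruction: one canonical list keeps the Zod enum, the Emotion type,
// and any runtime validation in sync.
const EMOTIONS = ["neutral", "happy", "sad", "angry"] as const;
type Emotion = (typeof EMOTIONS)[number];

// Type guard for narrowing arbitrary strings to Emotion.
function isEmotion(value: string): value is Emotion {
    return (EMOTIONS as readonly string[]).includes(value);
}
```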

MCP directory API

All information about MCP servers is available via our MCP directory API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/PhialsBasement/Zonos-TTS-MCP'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.