speak_response
Generate spoken responses from text in multiple languages and emotional tones using Claude integration. Allows customization of emotion and language for precise TTS output.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| emotion | No | Emotion applied to the speech: `neutral`, `happy`, `sad`, or `angry` | neutral |
| language | No | Language code for the generated speech | en-us |
| text | Yes | Text to convert to speech | |
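
For orientation, a client-side call to this tool might look like the following. This is a minimal sketch assuming the standard `@modelcontextprotocol/sdk` client API; the client metadata, transport command, and server entry path are placeholders rather than values from this repository.

```typescript
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

// Hypothetical client setup; the command and script path are placeholders.
const client = new Client({ name: "example-client", version: "1.0.0" });
await client.connect(
  new StdioClientTransport({ command: "node", args: ["dist/server.js"] })
);

// Arguments mirror the input schema above; language and emotion fall back
// to "en-us" and "neutral" when omitted.
const result = await client.callTool({
  name: "speak_response",
  arguments: {
    text: "Hello! The build finished successfully.",
    emotion: "happy",
  },
});
```

On success the tool returns a single text content item confirming what was spoken and with which emotion.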
Implementation Reference
- src/server.ts:103-155 (registration): Registration of the `speak_response` tool with the MCP server, including the input schema and the inline handler function that generates and plays TTS audio with the specified emotion.

  ```typescript
  this.mcp.tool(
    "speak_response",
    {
      text: z.string(),
      language: z.string().default("en-us"),
      emotion: z.enum(["neutral", "happy", "sad", "angry"]).default("neutral"),
    },
    async ({ text, language, emotion }: ZonosRequestParams) => {
      try {
        const emotionParams = this.emotionMap[emotion];
        console.log(`Converting to speech: "${text}" with ${emotion} emotion`);

        // Use new OpenAI-style endpoint
        const response = await axios.post(`${API_BASE_URL}/v1/audio/speech`, {
          model: "Zyphra/Zonos-v0.1-transformer",
          input: text,
          language: language,
          emotion: emotionParams,
          speed: 1.0,
          response_format: "wav" // Using WAV for better compatibility
        }, {
          responseType: 'arraybuffer'
        });

        // Save the audio response to a temporary file
        const tempAudioPath = `/tmp/tts_output_${Date.now()}.wav`;
        const fs = await import('fs/promises');
        await fs.writeFile(tempAudioPath, response.data);

        // Play the audio
        await this.playAudio(tempAudioPath);

        // Clean up the temporary file
        await fs.unlink(tempAudioPath);

        return {
          content: [
            {
              type: "text",
              text: `Successfully spoke: "${text}" with ${emotion} emotion`,
            },
          ],
        };
      } catch (error) {
        const errorMessage = error instanceof Error ? error.message : "Unknown error";
        console.error("TTS Error:", errorMessage);
        if (axios.isAxiosError(error) && error.response) {
          console.error("API Response:", error.response.data);
        }
        throw new Error(`TTS failed: ${errorMessage}`);
      }
    }
  );
  ```
- src/server.ts:110-154 (handler): The core handler function for the `speak_response` tool, shown inline in the registration listing above. It calls the local TTS API with emotion parameters to generate WAV audio, saves the result to a temporary file, plays it using platform-specific playback, deletes the file, and returns a success message.
- src/server.ts:105-109 (schema): Zod input schema defining the parameters for the `speak_response` tool: `text` (required), `language` (default `en-us`), and `emotion` (default `neutral`). A short sketch of how these defaults behave appears after this list.

  ```typescript
  {
    text: z.string(),
    language: z.string().default("en-us"),
    emotion: z.enum(["neutral", "happy", "sad", "angry"]).default("neutral"),
  },
  ```
- src/server.ts:158-189 (helper): Helper method that plays the generated audio file using platform-specific commands (`afplay` on macOS, `paplay` on Linux, PowerShell on Windows). Called by the `speak_response` handler. A presumed definition of `execAsync`, which is not shown in this excerpt, is sketched after this list.

  ```typescript
  private async playAudio(audioPath: string): Promise<void> {
    try {
      console.log("Playing audio from:", audioPath);

      switch (process.platform) {
        case "darwin":
          await execAsync(`afplay ${audioPath}`);
          break;
        case "linux":
          // Try paplay for PulseAudio
          const XDG_RUNTIME_DIR = process.env.XDG_RUNTIME_DIR || '/run/user/1000';
          const env = {
            ...process.env,
            PULSE_SERVER: `unix:${XDG_RUNTIME_DIR}/pulse/native`,
            PULSE_COOKIE: `${process.env.HOME}/.config/pulse/cookie`
          };
          await execAsync(`paplay ${audioPath}`, { env });
          break;
        case "win32":
          await execAsync(
            `powershell -c (New-Object Media.SoundPlayer '${audioPath}').PlaySync()`
          );
          break;
        default:
          throw new Error(`Unsupported platform: ${process.platform}`);
      }
    } catch (error) {
      const errorMessage = error instanceof Error ? error.message : "Unknown error";
      console.error("Audio playback error:", errorMessage);
      throw new Error(`Audio playback failed: ${errorMessage}`);
    }
  }
  ```
- src/server.ts:56-97 (helper): Emotion parameter mappings used by the `speak_response` handler to configure TTS emotion weights for the API call.

  ```typescript
  this.emotionMap = {
    neutral: {
      happiness: 0.2,
      sadness: 0.2,
      anger: 0.2,
      disgust: 0.05,
      fear: 0.05,
      surprise: 0.1,
      other: 0.1,
      neutral: 0.8,
    },
    happy: {
      happiness: 1,
      sadness: 0.05,
      anger: 0.05,
      disgust: 0.05,
      fear: 0.05,
      surprise: 0.2,
      other: 0.1,
      neutral: 0.2,
    },
    sad: {
      happiness: 0.05,
      sadness: 1,
      anger: 0.05,
      disgust: 0.2,
      fear: 0.2,
      surprise: 0.05,
      other: 0.1,
      neutral: 0.2,
    },
    angry: {
      happiness: 0.05,
      sadness: 0.2,
      anger: 1,
      disgust: 0.4,
      fear: 0.2,
      surprise: 0.2,
      other: 0.1,
      neutral: 0.1,
    },
  };
  ```
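
Two details from the entries above are worth spelling out. First, the Zod schema applies its defaults whenever optional fields are omitted; the following hedged sketch wraps the same shape in `z.object` purely for illustration and is not code from the repository.

```typescript
import { z } from "zod";

// Hypothetical standalone version of the tool's input shape, for illustration only.
const speakResponseInput = z.object({
  text: z.string(),
  language: z.string().default("en-us"),
  emotion: z.enum(["neutral", "happy", "sad", "angry"]).default("neutral"),
});

// Omitted optional fields are filled with their defaults:
const args = speakResponseInput.parse({ text: "Hello there" });
// -> { text: "Hello there", language: "en-us", emotion: "neutral" }
```

Second, `playAudio` relies on `execAsync`, which is not defined in this excerpt. It is presumably the promisified `child_process.exec`, along the lines of the sketch below; the concrete file name is only a placeholder.

```typescript
// Presumed definition of execAsync (not shown in the excerpt above).
import { exec } from "child_process";
import { promisify } from "util";

const execAsync = promisify(exec);

// Example: the command playAudio would issue on macOS for a generated file.
await execAsync("afplay /tmp/tts_output_1700000000000.wav");
```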