speak
Enable voice communication by converting text to speech with Rime's API. Speak messages or announce command completions aloud, with customizable voice and speed settings.
Instructions
Speak text aloud using Rime's text-to-speech API. Should be used when user asks you to speak or to announce and explain when you finish a command
User configuration:
WHO_TO_ADDRESS: user
WHEN_TO_SPEAK: when asked to speak or when finishing a command
VOICE: cove
GUIDANCE: Use the speak tool when you need to communicate with the user via voice. The speech should be clear, concise, and convey the intended message effectively.
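The configuration block above can be thought of as a list of `KEY: value` lines appended to the tool description. The following is a minimal sketch of that rendering, assuming the four values arrive as an optional-field object; `SpeakConfig` and `renderUserConfiguration` are hypothetical names used only for illustration, and skipping unset keys is a choice made here, not necessarily what the server does.

```typescript
// Hypothetical config shape; field names follow the keys listed above.
interface SpeakConfig {
  WHO_TO_ADDRESS?: string;
  WHEN_TO_SPEAK?: string;
  VOICE?: string;
  GUIDANCE?: string;
}

// Render "KEY: value" lines, skipping keys that are unset or empty.
function renderUserConfiguration(config: SpeakConfig): string {
  const keys = ["WHO_TO_ADDRESS", "WHEN_TO_SPEAK", "VOICE", "GUIDANCE"] as const;
  const lines = keys
    .filter((key) => Boolean(config[key]))
    .map((key) => `${key}: ${config[key]}`);
  return ["User configuration:", ...lines].join("\n");
}

console.log(renderUserConfiguration({ WHO_TO_ADDRESS: "user", VOICE: "cove" }));
```

Keys that are left out simply do not appear in the rendered block.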
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| reduceLatency | No | Whether to optimize for lower latency | false |
| speaker | No | The voice to use | cove |
| speedAlpha | No | Speech speed multiplier | 1.0 |
| text | Yes | The text to speak aloud | |
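To illustrate the schema, here is a short sketch of how a caller might type the arguments and fill in the documented defaults. `SpeakArgs` and `withDefaults` are hypothetical names that are not part of the server; only the field names and default values come from the table.

```typescript
// Hypothetical argument type mirroring the input schema.
interface SpeakArgs {
  text: string;
  speaker?: string;
  speedAlpha?: number;
  reduceLatency?: boolean;
}

// Hypothetical helper: fills in the documented defaults.
// Uses ?? so that only undefined/null values fall back to a default.
function withDefaults(args: SpeakArgs): Required<SpeakArgs> {
  if (typeof args.text !== "string") {
    throw new Error("text is required and must be a string");
  }
  return {
    text: args.text,
    speaker: args.speaker ?? "cove",
    speedAlpha: args.speedAlpha ?? 1.0,
    reduceLatency: args.reduceLatency ?? false,
  };
}

const resolved = withDefaults({ text: "Build finished.", speedAlpha: 1.2 });
console.log(resolved);
```

The `doSpeak` handler listed in the implementation reference uses `||` instead of `??`, so a `speedAlpha` of `0` would also fall back to `1.0` there.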
Implementation Reference
- index.ts:97-130 (handler): The doSpeak function implements the core execution logic for the "speak" tool, processing parameters and delegating TTS playback to playText.

  ```typescript
  async function doSpeak(params: {
    text: string;
    speaker?: string;
    speedAlpha?: number;
    reduceLatency?: boolean;
  }) {
    try {
      // Use the playText function from stream-audio.ts
      await playText(params.text, {
        speaker: params.speaker || "cove",
        speedAlpha: params.speedAlpha || 1.0,
        reduceLatency: params.reduceLatency || false,
      });

      return {
        content: [
          {
            type: "text",
            text: JSON.stringify({
              success: true,
              text: params.text,
              speaker: params.speaker || "cove",
            }),
          },
        ],
      };
    } catch (error: unknown) {
      log("ERROR", `Error: ${error instanceof Error ? error.message : String(error)}`);
      throw new McpError(
        ErrorCode.InternalError,
        `Rime API error: ${error instanceof Error ? error.message : String(error)}`
      );
    }
  }
  ```
- index.ts:51-87 (schema): The SPEAK_TOOL constant defines the tool's schema, including name, description, and input validation schema.

  ```typescript
  const SPEAK_TOOL: Tool = {
    name: "speak",
    description: `Speak text aloud using Rime's text-to-speech API. Should be used when user asks you to speak or to announce and explain when you finish a command

  User configuration:
  ${WHO_TO_ADDRESS ? `WHO_TO_ADDRESS: ${WHO_TO_ADDRESS}` : ""}
  ${WHEN_TO_SPEAK ? `WHEN_TO_SPEAK: ${WHEN_TO_SPEAK}` : ""}
  ${VOICE ? `VOICE: ${VOICE}` : ""}
  ${GUIDANCE ? `GUIDANCE: ${GUIDANCE}` : ""}
  `,
    inputSchema: {
      type: "object",
      properties: {
        text: {
          type: "string",
          description: "The text to speak aloud",
        },
        speaker: {
          type: "string",
          description: `The voice to use (defaults to '${VOICE}')`,
        },
        speedAlpha: {
          type: "number",
          description: "Speech speed multiplier (default: 1.0)",
        },
        reduceLatency: {
          type: "boolean",
          description: "Whether to optimize for lower latency (default: false)",
        },
      },
      required: ["text"],
    },
  };
  ```
- index.ts:89-91 (registration): Registration of the "speak" tool in the MCP server's list of available tools via the ListToolsRequestSchema handler.

  ```typescript
  server.setRequestHandler(ListToolsRequestSchema, async () => ({
    tools: [SPEAK_TOOL],
  }));
  ```
- index.ts:132-145 (registration): The CallToolRequestSchema handler dispatches calls to the "speak" tool by invoking the doSpeak function.

  ```typescript
  server.setRequestHandler(CallToolRequestSchema, async (request) => {
    if (request.params.name === "speak") {
      console.error("Speak tool called with:", request.params.arguments);
      const input = request.params.arguments as {
        text: string;
        speaker?: string;
        speedAlpha?: number;
        reduceLatency?: boolean;
      };
      return doSpeak(input);
    }
    throw new McpError(ErrorCode.MethodNotFound, `Unknown tool: ${request.params.name}`);
  });
  ```
- stream-audio.ts:94-189 (helper): The playText helper function performs the Rime TTS API request, handles audio streaming/download, and manages playback using system audio players.

  ```typescript
  export async function playText(text: string, customConfig?: Partial<TtsConfig>): Promise<void> {
    const config: TtsConfig = { ...DEFAULT_CONFIG, ...customConfig };

    console.error("Starting Rime TTS with text:");
    console.error(`"${text}"`);

    try {
      const apiKey = getApiKey();

      // Create temporary directory for audio files
      const tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), "rime-stream-"));
      const audioFilePath = path.join(tmpDir, "audio.mp3");

      const cleanup = () => {
        try {
          fs.rmSync(tmpDir, { recursive: true, force: true });
        } catch (error) {
          console.error("Failed to clean up temporary directory:", error);
        }
      };

      // Prepare API request
      const modelId = findModelId(config.speaker);
      const options = {
        method: "POST",
        headers: {
          Accept: "audio/mp3",
          Authorization: `Bearer ${apiKey}`,
          "Content-Type": "application/json",
        },
        body: JSON.stringify({
          speaker: config.speaker,
          text: text,
          modelId: modelId,
          lang: "eng",
          samplingRate: config.samplingRate,
          speedAlpha: config.speedAlpha,
          reduceLatency: config.reduceLatency,
        }),
      };

      // Make API request
      console.error("Sending request to Rime API...");
      const response = await fetch("https://users.rime.ai/v1/rime-tts", options);

      if (!response.ok) {
        const errorText = await response.text();
        throw new Error(
          `API request failed: ${response.status} ${response.statusText} - ${errorText}`
        );
      }

      // Get audio data as arrayBuffer
      const audioBuffer = await response.arrayBuffer();

      // Write audio data to file
      fs.writeFileSync(audioFilePath, Buffer.from(audioBuffer));
      console.error(`Audio saved to ${audioFilePath}`);

      // Play the file and resolve once the player process exits
      return new Promise((resolve, reject) => {
        try {
          console.error("Starting audio playback...");
          const player = getAudioPlayerCommand();
          const playerProcess = spawn(player.cmd, [...player.args, audioFilePath]);

          playerProcess.stdout?.on("data", (data) => {
            console.error(`Player output: ${data}`);
          });

          playerProcess.stderr?.on("data", (data) => {
            console.error(`Player error: ${data}`);
          });

          playerProcess.on("close", (code) => {
            console.error(`Player process exited with code ${code || 0}`);
            cleanup();
            resolve();
          });

          playerProcess.on("error", (error: Error) => {
            console.error("Player process error:", error);
            cleanup();
            reject(error);
          });
        } catch (err) {
          cleanup();
          reject(err);
        }
      });
    } catch (error) {
      console.error("Error:", error);
      throw error;
    }
  }
  ```
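From a client's point of view, the pieces above come together as a `tools/call` request whose result carries a JSON string in a text content item. The sketch below shows one way to build such a request and read the result; `buildSpeakRequest`, `parseSpeakResult`, and `SpeakResult` are hypothetical names introduced for illustration, and the sample result is hand-written to match the shape that doSpeak returns rather than captured from a real server.

```typescript
// Shape of the payload doSpeak serializes into the text content item.
interface SpeakResult {
  success: boolean;
  text: string;
  speaker: string;
}

// Build a JSON-RPC 2.0 "tools/call" request for the speak tool.
function buildSpeakRequest(id: number, text: string, speaker?: string) {
  return {
    jsonrpc: "2.0" as const,
    id,
    method: "tools/call" as const,
    params: {
      name: "speak",
      arguments: speaker === undefined ? { text } : { text, speaker },
    },
  };
}

// Extract the JSON payload from the first text item of a tool result.
function parseSpeakResult(result: { content: { type: string; text: string }[] }): SpeakResult {
  const item = result.content.find((c) => c.type === "text");
  if (!item) throw new Error("no text content in tool result");
  return JSON.parse(item.text) as SpeakResult;
}

// Hand-written sample shaped like the handler's return value.
const sample = {
  content: [
    { type: "text", text: JSON.stringify({ success: true, text: "Done.", speaker: "cove" }) },
  ],
};

console.log(JSON.stringify(buildSpeakRequest(1, "Done.")));
console.log(parseSpeakResult(sample).speaker);
```

Because the handler resolves only after the audio player process exits, a client should expect the response to arrive once playback has finished.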