speak
Convert text to speech using Rime's API for audio output when users request spoken responses or need verbal explanations after completing commands.
Instructions
Speak text aloud using Rime's text-to-speech API. Should be used when user asks you to speak or to announce and explain when you finish a command
User configuration:
WHO_TO_ADDRESS: user
WHEN_TO_SPEAK: when asked to speak or when finishing a command
VOICE: cove
GUIDANCE: Use the speak tool to convert text to speech when the user requests audio output or when providing verbal responses
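
The configuration lines above are spliced into the tool description at startup. The sketch below shows one way that assembly could work. `SpeakConfig` and `buildDescription` are hypothetical names introduced for illustration, and where the values come from is an assumption, since that part of the source is not shown. Unlike the template in the implementation reference, this version omits unset entries instead of leaving blank lines.

```typescript
// Hypothetical config shape; the source uses module-level constants instead.
interface SpeakConfig {
  WHO_TO_ADDRESS?: string;
  WHEN_TO_SPEAK?: string;
  VOICE?: string;
  GUIDANCE?: string;
}

// Hypothetical helper: build the description string shown to the model.
function buildDescription(config: SpeakConfig): string {
  const header =
    "Speak text aloud using Rime's text-to-speech API. " +
    "Should be used when user asks you to speak or to announce and explain when you finish a command";
  const keys: (keyof SpeakConfig)[] = ["WHO_TO_ADDRESS", "WHEN_TO_SPEAK", "VOICE", "GUIDANCE"];
  // Keep only the entries that are actually set, one "KEY: value" pair per line.
  const lines = keys.filter((k) => config[k]).map((k) => `${k}: ${config[k]}`);
  return [header, "User configuration:", ...lines].join("\n");
}

const description = buildDescription({
  WHO_TO_ADDRESS: "user",
  WHEN_TO_SPEAK: "when asked to speak or when finishing a command",
  VOICE: "cove",
});
console.log(description);
```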
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| text | Yes | The text to speak aloud | |
| speaker | No | The voice to use | `cove` |
| speedAlpha | No | Speech speed multiplier | `1.0` |
| reduceLatency | No | Whether to optimize for lower latency | `false` |
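
As a minimal sketch of how these parameters and their defaults fit together, the snippet below types the arguments and fills in the documented defaults. `SpeakArgs` and `withDefaults` are hypothetical names, not part of the server. The implementation itself applies defaults with `||` rather than `??`, so treat this as an illustration of the schema, not a copy of the handler.

```typescript
// Argument shape for the `speak` tool, mirroring the input schema table.
interface SpeakArgs {
  text: string;
  speaker?: string;
  speedAlpha?: number;
  reduceLatency?: boolean;
}

// Hypothetical helper: apply the documented defaults to optional fields.
function withDefaults(args: SpeakArgs): Required<SpeakArgs> {
  return {
    text: args.text,
    speaker: args.speaker ?? "cove",
    speedAlpha: args.speedAlpha ?? 1.0,
    reduceLatency: args.reduceLatency ?? false,
  };
}

// Only `text` is required; everything else falls back to its default.
const minimal = withDefaults({ text: "Build finished." });

// Optional fields override the defaults when provided.
const custom = withDefaults({
  text: "Build finished.",
  speedAlpha: 1.2,
  reduceLatency: true,
});

console.log(JSON.stringify(minimal));
console.log(JSON.stringify(custom));
```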
Implementation Reference
- index.ts:97-130 (handler) The doSpeak function implements the core execution logic for the 'speak' tool, handling parameters and delegating TTS to playText while returning structured content.

  ```typescript
  async function doSpeak(params: {
    text: string;
    speaker?: string;
    speedAlpha?: number;
    reduceLatency?: boolean;
  }) {
    try {
      // Use the playText function from stream-audio.ts
      await playText(params.text, {
        speaker: params.speaker || "cove",
        speedAlpha: params.speedAlpha || 1.0,
        reduceLatency: params.reduceLatency || false,
      });

      return {
        content: [
          {
            type: "text",
            text: JSON.stringify({
              success: true,
              text: params.text,
              speaker: params.speaker || "cove",
            }),
          },
        ],
      };
    } catch (error: unknown) {
      log("ERROR", `Error: ${error instanceof Error ? error.message : String(error)}`);
      throw new McpError(
        ErrorCode.InternalError,
        `Rime API error: ${error instanceof Error ? error.message : String(error)}`
      );
    }
  }
  ```
- index.ts:51-87 (schema) Defines the SPEAK_TOOL object, including name, description, and detailed inputSchema for validating 'speak' tool parameters.

  ```typescript
  const SPEAK_TOOL: Tool = {
    name: "speak",
    description: `Speak text aloud using Rime's text-to-speech API. Should be used when user asks you to speak or to announce and explain when you finish a command
  User configuration:
  ${WHO_TO_ADDRESS ? `WHO_TO_ADDRESS: ${WHO_TO_ADDRESS}` : ""}
  ${WHEN_TO_SPEAK ? `WHEN_TO_SPEAK: ${WHEN_TO_SPEAK}` : ""}
  ${VOICE ? `VOICE: ${VOICE}` : ""}
  ${GUIDANCE ? `GUIDANCE: ${GUIDANCE}` : ""}
  `,
    inputSchema: {
      type: "object",
      properties: {
        text: {
          type: "string",
          description: "The text to speak aloud",
        },
        speaker: {
          type: "string",
          description: `The voice to use (defaults to '${VOICE}')`,
        },
        speedAlpha: {
          type: "number",
          description: "Speech speed multiplier (default: 1.0)",
        },
        reduceLatency: {
          type: "boolean",
          description: "Whether to optimize for lower latency (default: false)",
        },
      },
      required: ["text"],
    },
  };
  ```
- index.ts:89-91 (registration) Registers the 'speak' tool by including it in the response to ListToolsRequestSchema.

  ```typescript
  server.setRequestHandler(ListToolsRequestSchema, async () => ({
    tools: [SPEAK_TOOL],
  }));
  ```
- index.ts:132-145 (handler) The CallToolRequestSchema handler dispatches calls to the 'speak' tool by invoking the doSpeak function.

  ```typescript
  server.setRequestHandler(CallToolRequestSchema, async (request) => {
    if (request.params.name === "speak") {
      console.error("Speak tool called with:", request.params.arguments);
      const input = request.params.arguments as {
        text: string;
        speaker?: string;
        speedAlpha?: number;
        reduceLatency?: boolean;
      };
      return doSpeak(input);
    }
    throw new McpError(ErrorCode.MethodNotFound, `Unknown tool: ${request.params.name}`);
  });
  ```
- stream-audio.ts:94-189 (helper) Supporting function playText that handles the Rime TTS API call, audio file management, and playback using system audio players.

  ```typescript
  export async function playText(text: string, customConfig?: Partial<TtsConfig>): Promise<void> {
    const config: TtsConfig = { ...DEFAULT_CONFIG, ...customConfig };
    console.error("Starting Rime TTS with text:");
    console.error(`"${text}"`);

    try {
      const apiKey = getApiKey();

      // Create temporary directory for audio files
      const tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), "rime-stream-"));
      const audioFilePath = path.join(tmpDir, "audio.mp3");

      const cleanup = () => {
        try {
          fs.rmSync(tmpDir, { recursive: true, force: true });
        } catch (error) {
          console.error("Failed to clean up temporary directory:", error);
        }
      };

      // Prepare API request
      const modelId = findModelId(config.speaker);
      const options = {
        method: "POST",
        headers: {
          Accept: "audio/mp3",
          Authorization: `Bearer ${apiKey}`,
          "Content-Type": "application/json",
        },
        body: JSON.stringify({
          speaker: config.speaker,
          text: text,
          modelId: modelId,
          lang: "eng",
          samplingRate: config.samplingRate,
          speedAlpha: config.speedAlpha,
          reduceLatency: config.reduceLatency,
        }),
      };

      // Make API request
      console.error("Sending request to Rime API...");
      const response = await fetch("https://users.rime.ai/v1/rime-tts", options);

      if (!response.ok) {
        const errorText = await response.text();
        throw new Error(
          `API request failed: ${response.status} ${response.statusText} - ${errorText}`
        );
      }

      // Get audio data as arrayBuffer
      const audioBuffer = await response.arrayBuffer();

      // Write audio data to file
      fs.writeFileSync(audioFilePath, Buffer.from(audioBuffer));
      console.error(`Audio saved to ${audioFilePath}`);

      return new Promise((resolve, reject) => {
        try {
          console.error("Starting audio playback...");
          const player = getAudioPlayerCommand();
          const playerProcess = spawn(player.cmd, [...player.args, audioFilePath]);

          playerProcess.stdout?.on("data", (data) => {
            console.error(`Player output: ${data}`);
          });

          playerProcess.stderr?.on("data", (data) => {
            console.error(`Player error: ${data}`);
          });

          playerProcess.on("close", (code) => {
            console.error(`Player process exited with code ${code || 0}`);
            cleanup();
            resolve();
          });

          playerProcess.on("error", (error: Error) => {
            console.error("Player process error:", error);
            cleanup();
            reject(error);
          });
        } catch (err) {
          cleanup();
          reject(err);
        }
      });
    } catch (error) {
      console.error("Error:", error);
      throw error;
    }
  }
  ```
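
To tie the pieces above together, here is a rough sketch of what a call to the tool looks like from the client side: the JSON-RPC `tools/call` request an MCP client would send, and how the result returned by doSpeak could be decoded. The request `id` and the `parseSpeakResult` helper are illustrative assumptions. The result shape is taken from the doSpeak handler, which returns a single text item whose body is a JSON string.

```typescript
// JSON-RPC 2.0 request an MCP client would send to invoke the tool.
const request = {
  jsonrpc: "2.0",
  id: 1, // arbitrary request id chosen for this example
  method: "tools/call",
  params: {
    name: "speak",
    arguments: { text: "Deployment complete.", speaker: "cove" },
  },
};

// Sample result in the shape doSpeak returns on success.
const sampleResult = {
  content: [
    {
      type: "text",
      text: JSON.stringify({
        success: true,
        text: "Deployment complete.",
        speaker: "cove",
      }),
    },
  ],
};

// Hypothetical helper: decode the JSON payload carried in the first text item.
function parseSpeakResult(result: { content: { type: string; text: string }[] }): {
  success: boolean;
  text: string;
  speaker: string;
} {
  const first = result.content[0];
  if (!first || first.type !== "text") {
    throw new Error("Unexpected result shape");
  }
  return JSON.parse(first.text);
}

const parsed = parseSpeakResult(sampleResult);
console.log(request.method, parsed.success, parsed.speaker);
```

Because the payload is a JSON string inside a text item, a client has to parse it a second time after receiving the tool result. Failures do not use this shape: the handler throws an McpError instead.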