---
title: Generating Speech with Gemini
description: read this to understand how to generate realistic speech audio from a text script
---

The Google GenAI plugin provides access to text-to-speech (TTS) capabilities through the Gemini TTS models, which convert text into natural-sounding speech.

#### Basic Usage

To generate audio with a TTS model, request the `AUDIO` response modality and choose a voice:

```ts
import { genkit } from 'genkit';
import { googleAI } from '@genkit-ai/google-genai';
import { writeFile } from 'node:fs/promises';
import wav from 'wav'; // npm install wav && npm install -D @types/wav

const ai = genkit({
  plugins: [googleAI()],
});

const { media } = await ai.generate({
  model: googleAI.model('gemini-2.5-flash-preview-tts'),
  config: {
    responseModalities: ['AUDIO'],
    speechConfig: {
      voiceConfig: {
        prebuiltVoiceConfig: { voiceName: 'Algenib' },
      },
    },
  },
  prompt: 'Say that Genkit is an amazing Gen AI library',
});

if (!media) {
  throw new Error('no media returned');
}

// media.url is a data URL; everything after the first comma is base64-encoded audio.
const audioBuffer = Buffer.from(
  media.url.substring(media.url.indexOf(',') + 1),
  'base64'
);

// The googleAI plugin returns raw PCM data, which we convert to WAV format.
await writeFile('output.wav', await toWav(audioBuffer));

// Wraps raw PCM samples in a WAV container (defaults: mono, 24 kHz, 16-bit).
async function toWav(
  pcmData: Buffer,
  channels = 1,
  rate = 24000,
  sampleWidth = 2
): Promise<Buffer> {
  return new Promise((resolve, reject) => {
    // This code depends on the `wav` npm library.
    const writer = new wav.Writer({
      channels,
      sampleRate: rate,
      bitDepth: sampleWidth * 8,
    });

    const bufs: Buffer[] = [];
    writer.on('error', reject);
    writer.on('data', (d: Buffer) => {
      bufs.push(d);
    });
    writer.on('end', () => {
      // Resolve with the binary WAV bytes so writeFile produces a playable file.
      resolve(Buffer.concat(bufs));
    });

    writer.write(pcmData);
    writer.end();
  });
}
```

#### Multi-speaker Audio Generation

You can generate audio with multiple speakers, each with their own voice:

```ts
const response = await ai.generate({
  model: googleAI.model('gemini-2.5-flash-preview-tts'),
  config: {
    responseModalities: ['AUDIO'],
    speechConfig: {
      multiSpeakerVoiceConfig: {
        speakerVoiceConfigs: [
          {
            speaker: 'Speaker1',
            voiceConfig: {
              prebuiltVoiceConfig: { voiceName: 'Algenib' },
            },
          },
          {
            speaker: 'Speaker2',
            voiceConfig: {
              prebuiltVoiceConfig: { voiceName: 'Achernar' },
            },
          },
        ],
      },
    },
  },
  prompt: `Here's the dialog:
Speaker1: "Genkit is an amazing Gen AI library!"
Speaker2: "I thought it was a framework."`,
});
```

When using a multi-speaker configuration, the model automatically detects the speaker labels in the text (such as `Speaker1:` and `Speaker2:`) and applies the corresponding voice to each speaker's lines, so the labels in the prompt should match the `speaker` values in `speakerVoiceConfigs`. The audio is returned in `response.media` and can be decoded the same way as in Basic Usage.

#### Configuration Options

The Gemini TTS models support the following configuration options.

##### Voice Selection

You can choose from several prebuilt voices, each with its own character:

```ts
speechConfig: {
  voiceConfig: {
    prebuiltVoiceConfig: {
      voiceName: 'Algenib', // Other options: 'Achernar', 'Kore', etc.
    },
  },
}
```

Full list of available voices:

- `Zephyr`: Bright
- `Puck`: Upbeat
- `Charon`: Informative
- `Kore`: Firm
- `Fenrir`: Excitable
- `Leda`: Youthful
- `Orus`: Firm
- `Aoede`: Breezy
- `Callirrhoe`: Easy-going
- `Autonoe`: Bright
- `Enceladus`: Breathy
- `Iapetus`: Clear
- `Umbriel`: Easy-going
- `Algieba`: Smooth
- `Despina`: Smooth
- `Erinome`: Clear
- `Algenib`: Gravelly
- `Rasalgethi`: Informative
- `Laomedeia`: Upbeat
- `Achernar`: Soft
- `Alnilam`: Firm
- `Schedar`: Even
- `Gacrux`: Mature
- `Pulcherrima`: Forward
- `Achird`: Friendly
- `Zubenelgenubi`: Casual
- `Vindemiatrix`: Gentle
- `Sadachbia`: Lively
- `Sadaltager`: Knowledgeable
- `Sulafat`: Warm

##### Speech Emphasis

You can use markdown-style formatting in your prompt to add emphasis:

- Bold text (`**like this**`) for stronger emphasis
- Italic text (`*like this*`) for moderate emphasis

Example:

```ts
prompt: 'Genkit is an **amazing** Gen AI *library*!',
```

##### Advanced Speech Parameters

For more control over the generated speech:

```ts
speechConfig: {
  voiceConfig: {
    prebuiltVoiceConfig: {
      voiceName: 'Algenib',
      speakingRate: 1.0, // Range: 0.25 to 4.0, default is 1.0
      pitch: 0.0, // Range: -20.0 to 20.0, default is 0.0
      volumeGainDb: 0.0, // Range: -96.0 to 16.0, default is 0.0
    },
  },
}
```

- `speakingRate`: Controls the speed of speech (higher values = faster speech)
- `pitch`: Adjusts the pitch of the voice (higher values = higher pitch)
- `volumeGainDb`: Controls the volume (higher values = louder)

These parameter names and ranges match the Cloud Text-to-Speech `AudioConfig`; the Gemini API's `PrebuiltVoiceConfig` documents only `voiceName`, so verify that your model and plugin version accept them. The Gemini speech generation documentation instead describes steering style, pace, and tone with natural-language instructions in the prompt, for example `Say cheerfully: Have a wonderful day!`.

For more detailed information about the Gemini TTS models and their configuration options, see the [Google AI Speech Generation documentation](https://ai.google.dev/gemini-api/docs/speech-generation).
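The `toWav` helper in Basic Usage relies on the `wav` package. If you would rather avoid that dependency, the conversion is small: a canonical WAV file is the raw PCM bytes preceded by a 44-byte RIFF header that records the channel count, sample rate, and sample width. The following is a minimal sketch, not a Genkit API; `pcmToWav` is a hypothetical name, and it assumes little-endian PCM with the same defaults as `toWav` (mono, 24 kHz, 2 bytes per sample).

```typescript
// Hypothetical helper (not part of Genkit): wraps raw little-endian PCM
// samples in a canonical 44-byte RIFF/WAVE header.
function pcmToWav(
  pcm: Buffer,
  channels = 1,
  sampleRate = 24000,
  sampleWidth = 2, // bytes per sample, so 2 = 16-bit audio
): Buffer {
  const blockAlign = channels * sampleWidth; // bytes per sample frame
  const byteRate = sampleRate * blockAlign; // bytes per second of audio
  const header = Buffer.alloc(44);

  header.write('RIFF', 0, 'ascii');
  header.writeUInt32LE(36 + pcm.length, 4); // total file size minus 8 bytes
  header.write('WAVE', 8, 'ascii');
  header.write('fmt ', 12, 'ascii');
  header.writeUInt32LE(16, 16); // size of the fmt chunk for plain PCM
  header.writeUInt16LE(1, 20); // audio format 1 = uncompressed PCM
  header.writeUInt16LE(channels, 22);
  header.writeUInt32LE(sampleRate, 24);
  header.writeUInt32LE(byteRate, 28);
  header.writeUInt16LE(blockAlign, 32);
  header.writeUInt16LE(sampleWidth * 8, 34); // bits per sample
  header.write('data', 36, 'ascii');
  header.writeUInt32LE(pcm.length, 40); // size of the sample data

  return Buffer.concat([header, pcm]);
}
```

With this helper, `await writeFile('output.wav', pcmToWav(audioBuffer));` can stand in for the `toWav` call. The trade-off is that it only handles uncompressed PCM, which is what the Basic Usage example says the plugin returns.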
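In multi-speaker generation, the labels in the prompt have to line up with the `speaker` values in `speakerVoiceConfigs`, so it can help to derive the configuration from a single mapping and check the script against it. The following is a minimal sketch: `buildMultiSpeakerConfig` is a hypothetical helper, not part of Genkit, and it assumes each spoken line in your script starts with a single-word label followed by a colon (for example `Speaker1:`). It returns the same `multiSpeakerVoiceConfig` shape as the Multi-speaker Audio Generation example and throws if the script uses a label that has no voice.

```typescript
// Hypothetical helper (not part of Genkit): builds a multi-speaker speechConfig
// from a speaker -> voice mapping and checks the script's labels against it.
interface SpeakerVoiceConfig {
  speaker: string;
  voiceConfig: { prebuiltVoiceConfig: { voiceName: string } };
}

interface MultiSpeakerSpeechConfig {
  multiSpeakerVoiceConfig: { speakerVoiceConfigs: SpeakerVoiceConfig[] };
}

function buildMultiSpeakerConfig(
  voices: Record<string, string>,
  script: string,
): MultiSpeakerSpeechConfig {
  // Assumption: a label is a single word directly followed by ':' at line start.
  const labels = new Set<string>();
  for (const line of script.split('\n')) {
    const label = /^\s*([A-Za-z0-9_]+):/.exec(line)?.[1];
    if (label) labels.add(label);
  }

  // Every label used in the script needs a configured voice.
  const missing = Array.from(labels).filter(
    (label) => !Object.prototype.hasOwnProperty.call(voices, label),
  );
  if (missing.length > 0) {
    throw new Error(`No voice configured for: ${missing.join(', ')}`);
  }

  return {
    multiSpeakerVoiceConfig: {
      speakerVoiceConfigs: Object.entries(voices).map(([speaker, voiceName]) => ({
        speaker,
        voiceConfig: { prebuiltVoiceConfig: { voiceName } },
      })),
    },
  };
}
```

You would pass the result as `speechConfig` inside `config`, building it from the same string you send as `prompt`. Lines such as `Here's the dialog:` are ignored because the word before the colon contains an apostrophe; a narration line like `Note: ...` would be treated as a label, which is the main limitation of this simple check.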
## Next Steps

- Learn about [generating content](/docs/models) to understand how to use these models effectively
- Explore [creating flows](/docs/flows) to build structured AI workflows
- To use the Gemini API at enterprise scale or leverage Vertex vector search and Model Garden, see the [Vertex AI plugin](/docs/integrations/vertex-ai)
