sayText

Convert text into spoken audio using customizable voices and formats for accessible content creation.

Instructions

Generate speech that says the provided text verbatim

Input Schema

Name	Required	Description
`text`	Yes	The text to speak verbatim
`voice`	No	Voice to use for audio generation (default: "alloy")
`format`	No	Format of the audio, such as mp3 or wav (the handler defaults to "mp3")
`voiceInstructions`	No	Additional instructions for voice character/style (e.g., "Speak with enthusiasm" or "Use a calm tone")
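
To make the table concrete, here is a minimal sketch of how the arguments resolve when optional fields are omitted. `normalizeSayTextArgs` is a hypothetical helper written for illustration only; it mirrors the defaults and the text check found in the handler below and is not part of the project.

```javascript
// Hypothetical helper (not part of the project): applies the same defaults
// and the same text check that the handler performs on its parameters.
function normalizeSayTextArgs(args) {
    const { text, voice = "alloy", format = "mp3", voiceInstructions } = args;

    if (!text || typeof text !== "string") {
        throw new Error("Text is required and must be a string");
    }

    return { text, voice, format, voiceInstructions };
}

// Only the required field is supplied, so voice and format fall back to defaults.
const minimalCall = normalizeSayTextArgs({ text: "Welcome to the demo." });

// Every field is supplied explicitly.
const fullCall = normalizeSayTextArgs({
    text: "Welcome to the demo.",
    voice: "alloy",
    format: "wav",
    voiceInstructions: "Use a calm tone",
});

console.log(minimalCall);
console.log(fullCall);
```

Omitting `text`, or passing a non-string value, throws before any request would be made.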

Implementation Reference

src/services/audioService.js:126-208 (handler)

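The handler function that executes the sayText tool. It prefixes the text with a "Say verbatim:" instruction, requests the audio from the Pollinations API with the chosen voice and format, optionally plays the result through a local audio player, and returns the audio as base64 data in an MCP response.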

async function sayText(params) {
    const {
        text,
        voice = "alloy",
        format = "mp3",
        voiceInstructions,
        audioPlayer,
        tempDir,
    } = params;

    if (!text || typeof text !== "string") {
        throw new Error("Text is required and must be a string");
    }

    // Prepare the query parameters
    const queryParams = {
        model: "openai-audio",
        voice,
        format,
    };

    // Prepare the prompt with the verbatim instruction
    let finalPrompt = `Say verbatim: ${text}`;

    // Add voice instructions if provided
    if (voiceInstructions) {
        finalPrompt = `${voiceInstructions}\n\n${finalPrompt}`;
    }

    // Build the URL using the utility function
    const url = buildUrl(
        AUDIO_API_BASE_URL,
        encodeURIComponent(finalPrompt),
        queryParams,
    );

    try {
        // Fetch the audio from the URL
        const response = await fetch(url);

        if (!response.ok) {
            throw new Error(
                `Failed to generate speech: ${response.statusText}`,
            );
        }

        // Get the audio data as an ArrayBuffer
        const audioBuffer = await response.arrayBuffer();

        // Convert the ArrayBuffer to a base64 string
        const base64Data = Buffer.from(audioBuffer).toString("base64");

        // Determine the mime type from the format
        const mimeType = `audio/${format === "mp3" ? "mpeg" : format}`;

        // Play the audio if an audio player is provided
        if (audioPlayer) {
            const tempDirPath = tempDir || os.tmpdir();
            await playAudio(
                base64Data,
                mimeType,
                "say_text",
                audioPlayer,
                tempDirPath,
            );
        }

        // Return the response in MCP format
        return createMCPResponse([
            {
                type: "audio",
                data: base64Data,
                mimeType,
            },
            createTextContent(
                `Generated audio for text: "${text}"\n\nVoice: ${voice}\nFormat: ${format}`,
            ),
        ]);
    } catch (error) {
        console.error("Error generating audio:", error);
        throw error;
    }
}
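
The prompt and URL composition in the handler can be tried in isolation. The sketch below is illustrative only: `buildUrlSketch` is a hypothetical stand-in for the project's `buildUrl` utility, whose real implementation is not shown here, and `PLACEHOLDER_BASE_URL` is an invented value rather than the actual `AUDIO_API_BASE_URL`. `composePrompt` and `mimeTypeFor` are also hypothetical names that repeat logic from the handler.

```javascript
// Hypothetical stand-in for the project's buildUrl helper (assumption:
// it joins base URL, path, and query string in roughly this way).
function buildUrlSketch(baseUrl, path, queryParams) {
    const query = new URLSearchParams(queryParams).toString();
    return `${baseUrl}/${path}${query ? `?${query}` : ""}`;
}

// Same prompt construction as the handler: optional style instructions
// first, then the verbatim instruction.
function composePrompt(text, voiceInstructions) {
    const verbatimPrompt = `Say verbatim: ${text}`;
    return voiceInstructions
        ? `${voiceInstructions}\n\n${verbatimPrompt}`
        : verbatimPrompt;
}

// Same MIME type rule as the handler: mp3 maps to audio/mpeg, every other
// format is passed through unchanged.
function mimeTypeFor(format) {
    return `audio/${format === "mp3" ? "mpeg" : format}`;
}

// Placeholder only; the real base URL is defined elsewhere in the project.
const PLACEHOLDER_BASE_URL = "https://audio.example.invalid";

const url = buildUrlSketch(
    PLACEHOLDER_BASE_URL,
    encodeURIComponent(composePrompt("Hello", "Use a calm tone")),
    { model: "openai-audio", voice: "alloy", format: "mp3" },
);

console.log(url);
console.log(mimeTypeFor("mp3"), mimeTypeFor("wav"));
```

Because the whole prompt travels in the URL path, `encodeURIComponent` is what keeps spaces, colons, and the newlines between the instructions and the text from breaking the request.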

src/services/audioService.js:359-378 (schema)

Zod input schema for the sayText tool: text is a required string, while voice, format, and voiceInstructions are optional strings. The schema only describes the defaults; the handler is what applies them.

{
    text: z.string().describe("The text to speak verbatim"),
    voice: z
        .string()
        .optional()
        .describe(
            'Voice to use for audio generation (default: "alloy")',
        ),
    format: z
        .string()
        .optional()
        .describe("Format of the audio (mp3, wav, etc.)"),
    voiceInstructions: z
        .string()
        .optional()
        .describe(
            'Additional instructions for voice character/style (e.g., "Speak with enthusiasm" or "Use a calm tone")',
        ),
},
sayText,

src/services/audioService.js:356-379 (registration)

Registration entry for the sayText tool in the audioTools export array, formatted for MCP server.tool() calls.

[
    "sayText",
    "Generate speech that says the provided text verbatim",
    {
        text: z.string().describe("The text to speak verbatim"),
        voice: z
            .string()
            .optional()
            .describe(
                'Voice to use for audio generation (default: "alloy")',
            ),
        format: z
            .string()
            .optional()
            .describe("Format of the audio (mp3, wav, etc.)"),
        voiceInstructions: z
            .string()
            .optional()
            .describe(
                'Additional instructions for voice character/style (e.g., "Speak with enthusiasm" or "Use a calm tone")',
            ),
    },
    sayText,
],
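
Each entry in the array lines up with the arguments of a `server.tool(name, description, schema, handler)` call, so registration can be a simple loop that spreads each entry. The sketch below shows that pattern under stated assumptions: `stubServer` is a recording stub rather than a real MCP server, and `sayTextPlaceholder`, `audioToolsSketch`, and the simplified schema object are hypothetical stand-ins so the example runs without the project's code or its dependencies.

```javascript
// Stub standing in for an MCP server instance; it only records what was
// registered instead of exposing tools to a client.
const registered = [];
const stubServer = {
    tool(name, description, schema, handler) {
        registered.push({ name, description, schema, handler });
    },
};

// Placeholder handler so the sketch does not depend on the real service.
// The returned shape is an assumption about what an MCP tool result looks like.
async function sayTextPlaceholder(params) {
    return { content: [{ type: "text", text: `Would speak: ${params.text}` }] };
}

// Same four-element layout as the entry above: name, description, schema,
// handler. The schema is a plain object here instead of Zod validators.
const audioToolsSketch = [
    [
        "sayText",
        "Generate speech that says the provided text verbatim",
        { text: "string (required)" },
        sayTextPlaceholder,
    ],
];

// Spread each entry into the positional arguments of server.tool().
for (const toolEntry of audioToolsSketch) {
    stubServer.tool(...toolEntry);
}

console.log(registered.map((tool) => tool.name));
```

Keeping each tool as a plain array means new tools can be added to the export without touching the registration loop.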

Pollinations Multimodal MCP Server
