respondAudio

Generate and play audio responses from text prompts using customizable voice options within the MCPollinations Multimodal MCP Server.

Instructions

Generate an audio response to a text prompt and play it through the system

Input Schema

  • prompt (required): The text prompt to respond to with audio.
  • voice (optional, default "alloy"): Voice to use for audio generation. Available options: "alloy", "echo", "fable", "onyx", "nova", "shimmer", "coral", "verse", "ballad", "ash", "sage", "amuch", "dan".
  • seed (optional, default random): Seed for reproducible results.
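As an illustration, a client might invoke the tool with arguments shaped like the following. The prompt, voice, and seed values here are made up for the example; only prompt is required by the schema:

```javascript
// Illustrative arguments for a respondAudio call. Only "prompt" is
// required; "voice" and "seed" fall back to their defaults when omitted.
const args = {
  prompt: 'Summarize the plot of Hamlet in one sentence.',
  voice: 'nova', // one of the thirteen listed voices; defaults to "alloy"
  seed: 42       // fixed seed for reproducible output; random when omitted
};

// A minimal client-side check mirroring the schema's "required" list.
const missing = ['prompt'].filter((key) => !(key in args));
if (missing.length > 0) {
  throw new Error(`Missing required argument(s): ${missing.join(', ')}`);
}
```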

Implementation Reference

  • Core respondAudio function: fetches audio from Pollinations API using text-to-speech endpoint, converts to base64, returns data and metadata.
    export async function respondAudio(prompt, voice = "alloy", seed, voiceInstructions, authConfig = null) {
      if (!prompt || typeof prompt !== 'string') {
        throw new Error('Prompt is required and must be a string');
      }
    
      // Build the query parameters
      const queryParams = new URLSearchParams();
      queryParams.append('model', 'openai-audio'); // Required for audio generation
      queryParams.append('voice', voice);
      if (seed !== undefined) queryParams.append('seed', seed);
    
      // Construct the URL
      let finalPrompt = prompt;
    
      // Add voice instructions if provided
      if (voiceInstructions) {
        finalPrompt = `${voiceInstructions}\n\n${prompt}`;
      }
    
      const encodedPrompt = encodeURIComponent(finalPrompt);
      const baseUrl = 'https://text.pollinations.ai';
      let url = `${baseUrl}/${encodedPrompt}`;
    
      // Add query parameters
      const queryString = queryParams.toString();
      url += `?${queryString}`;
    
      try {
        // Prepare fetch options with optional auth headers
        const fetchOptions = {};
        if (authConfig) {
          fetchOptions.headers = {};
          if (authConfig.token) {
            fetchOptions.headers['Authorization'] = `Bearer ${authConfig.token}`;
          }
          if (authConfig.referrer) {
            fetchOptions.headers['Referer'] = authConfig.referrer;
          }
        }
    
        // Fetch the audio from the URL
        const response = await fetch(url, fetchOptions);
    
        if (!response.ok) {
          throw new Error(`Failed to generate audio: ${response.statusText}`);
        }
    
        // Get the audio data as an ArrayBuffer
        const audioBuffer = await response.arrayBuffer();
    
        // Convert the ArrayBuffer to a base64 string
        const base64Data = Buffer.from(audioBuffer).toString('base64');
    
        // Determine the mime type from the response headers or default to audio/mpeg
        const contentType = response.headers.get('content-type') || 'audio/mpeg';
    
        return {
          data: base64Data,
          mimeType: contentType,
          metadata: {
            prompt,
            voice,
            model: 'openai-audio',
            seed,
            voiceInstructions
          }
        };
      } catch (error) {
        log('Error generating audio:', error);
        throw error;
      }
    }
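The URL construction inside the function above can be exercised in isolation. The sketch below assumes the same behavior; buildAudioUrl is a hypothetical helper name, since the real function inlines this logic before fetching:

```javascript
// Rebuilds the request URL the same way respondAudio does: prepend any
// voice instructions to the prompt, URL-encode the result as the path
// segment, and attach model/voice/seed as query parameters.
function buildAudioUrl(prompt, voice = 'alloy', seed, voiceInstructions) {
  const queryParams = new URLSearchParams();
  queryParams.append('model', 'openai-audio'); // required for audio generation
  queryParams.append('voice', voice);
  if (seed !== undefined) queryParams.append('seed', seed);

  const finalPrompt = voiceInstructions
    ? `${voiceInstructions}\n\n${prompt}`
    : prompt;

  return `https://text.pollinations.ai/${encodeURIComponent(finalPrompt)}?${queryParams.toString()}`;
}
```

Keeping the URL assembly pure like this makes the encoding behavior easy to unit-test without touching the network.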
  • MCP server CallToolRequest handler for respondAudio: calls core function, saves/plays audio file, returns metadata response.
    } else if (name === 'respondAudio') {
      try {
        const { prompt, voice = defaultConfig.audio.voice, seed, voiceInstructions } = args;
        const result = await respondAudio(prompt, voice, seed, voiceInstructions, finalAuthConfig);
    
        // Save audio to a temporary file
        const tempDir = os.tmpdir();
        const tempFilePath = path.join(tempDir, `pollinations-audio-${Date.now()}.mp3`);
    
        // Decode base64 and write to file
        fs.writeFileSync(tempFilePath, Buffer.from(result.data, 'base64'));
    
        // Play the audio file
        audioPlayer.play(tempFilePath, (err) => {
          if (err) log('Error playing audio:', err);
    
          // Clean up the temporary file after playing
          try {
            fs.unlinkSync(tempFilePath);
          } catch (cleanupErr) {
            log('Error cleaning up temp file:', cleanupErr);
          }
        });
    
        return {
          content: [
            {
              type: 'text',
              text: `Audio has been played.\n\nAudio metadata: ${JSON.stringify(result.metadata, null, 2)}`
            }
          ]
        };
      } catch (error) {
        return {
          content: [
            { type: 'text', text: `Error generating audio: ${error.message}` }
          ],
          isError: true
        };
      }
    } else if (name === 'listImageModels') {
  • Input schema for respondAudio tool defining parameters: prompt (required), voice, seed.
    export const respondAudioSchema = {
      name: 'respondAudio',
      description: 'Generate an audio response to a text prompt and play it through the system',
      inputSchema: {
        type: 'object',
        properties: {
          prompt: {
            type: 'string',
            description: 'The text prompt to respond to with audio'
          },
          voice: {
            type: 'string',
            description: 'Voice to use for audio generation (default: "alloy"). Available options: "alloy", "echo", "fable", "onyx", "nova", "shimmer", "coral", "verse", "ballad", "ash", "sage", "amuch", "dan"'
          },
          seed: {
            type: 'number',
            description: 'Seed for reproducible results (default: random)'
          }
        },
        required: ['prompt']
      }
    };
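Since the schema marks only prompt as required, a pre-call check against it could look like the sketch below. validateArgs is a hypothetical helper, not part of MCPollinations; the inputSchema object reproduces the shape exported above:

```javascript
// Checks an arguments object against a JSON-Schema-style "required" list.
function validateArgs(inputSchema, args) {
  const missing = (inputSchema.required || []).filter((key) => !(key in args));
  return { valid: missing.length === 0, missing };
}

// The shape used by respondAudioSchema.inputSchema, reduced to what the
// check needs.
const inputSchema = {
  type: 'object',
  properties: {
    prompt: { type: 'string' },
    voice: { type: 'string' },
    seed: { type: 'number' }
  },
  required: ['prompt']
};
```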
  • MCP server registration of all tools including respondAudio via getAllToolSchemas() in ListToolsRequestHandler.
    server.setRequestHandler(ListToolsRequestSchema, async () => ({
      tools: getAllToolSchemas()
    }));
  • src/index.js:9-23 (registration)
    Import and re-export of respondAudio function for use across the codebase.
    import { respondAudio, listAudioVoices } from './services/audioService.js';
    import { respondText, listTextModels } from './services/textService.js';
    
    
    // Export all service functions
    export {
      // Image services
      generateImageUrl,
      generateImage,
      editImage,
      generateImageFromReference,
      listImageModels,
    
      // Audio services
      respondAudio,
      listAudioVoices,

      // Text services
      respondText,
      listTextModels
    };

Behavior 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full documentation burden but offers minimal behavioral detail. It states that the tool generates and plays audio, but doesn't disclose latency, audio format, duration limits, system requirements, or error conditions. For a tool with no annotations, this leaves significant gaps in understanding its behavior.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that directly states the tool's function without unnecessary words. It's front-loaded with the core action and resource, making it easy to parse quickly.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no annotations, no output schema, and three parameters, the description is insufficiently complete. It doesn't explain what the audio output entails (e.g., format, length), potential side effects like system audio playback, or how errors might manifest. For a generative tool with behavioral implications, more context is needed.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema fully documents all three parameters (prompt, voice, seed). The description adds no parameter-specific information beyond what's in the schema, maintaining the baseline score of 3 for high schema coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Generate an audio response') and the resource ('to a text prompt'), specifying it will 'play it through the system'. It distinguishes from sibling tools like respondText (text vs audio) and listAudioVoices (list vs generate), but doesn't explicitly contrast with all siblings like image tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives is provided. The description doesn't mention when to choose respondAudio over respondText for responses, or how it relates to other audio/image tools. Usage context is implied but not stated.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
