
respondAudio

Generate audio responses from text prompts and play them through your system. Choose from multiple voices and set a seed for reproducible results.

Instructions

Generate an audio response to a text prompt and play it through the system

Input Schema

  • prompt (string, required): The text prompt to respond to with audio
  • voice (string, optional): Voice to use for audio generation (default: "alloy"). Available options: "alloy", "echo", "fable", "onyx", "nova", "shimmer", "coral", "verse", "ballad", "ash", "sage", "amuch", "dan"
  • seed (number, optional): Seed for reproducible results (default: random)

Implementation Reference

  • The main handler function for respondAudio. It takes a text prompt, voice, seed, and voice instructions, builds a URL to the Pollinations Text API, fetches audio data, converts it to base64, and returns the audio data with mime type and metadata.
    export async function respondAudio(prompt, voice = "alloy", seed, voiceInstructions, authConfig = null) {
      if (!prompt || typeof prompt !== 'string') {
        throw new Error('Prompt is required and must be a string');
      }
    
      // Build the query parameters
      const queryParams = new URLSearchParams();
      queryParams.append('model', 'openai-audio'); // Required for audio generation
      queryParams.append('voice', voice);
      if (seed !== undefined) queryParams.append('seed', seed);
    
      // Construct the URL
      let finalPrompt = prompt;
    
      // Add voice instructions if provided
      if (voiceInstructions) {
        finalPrompt = `${voiceInstructions}\n\n${prompt}`;
      }
    
      const encodedPrompt = encodeURIComponent(finalPrompt);
      const baseUrl = 'https://text.pollinations.ai';
      let url = `${baseUrl}/${encodedPrompt}`;
    
      // Add query parameters
      const queryString = queryParams.toString();
      url += `?${queryString}`;
    
      try {
        // Prepare fetch options with optional auth headers
        const fetchOptions = {};
        if (authConfig) {
          fetchOptions.headers = {};
          if (authConfig.token) {
            fetchOptions.headers['Authorization'] = `Bearer ${authConfig.token}`;
          }
          if (authConfig.referrer) {
            fetchOptions.headers['Referer'] = authConfig.referrer;
          }
        }
    
        // Fetch the audio from the URL
        const response = await fetch(url, fetchOptions);
    
        if (!response.ok) {
          throw new Error(`Failed to generate audio: ${response.statusText}`);
        }
    
        // Get the audio data as an ArrayBuffer
        const audioBuffer = await response.arrayBuffer();
    
        // Convert the ArrayBuffer to a base64 string
        const base64Data = Buffer.from(audioBuffer).toString('base64');
    
        // Determine the mime type from the response headers or default to audio/mpeg
        const contentType = response.headers.get('content-type') || 'audio/mpeg';
    
        return {
          data: base64Data,
          mimeType: contentType,
          metadata: {
            prompt,
            voice,
            model: 'openai-audio',
            seed,
            voiceInstructions
          }
        };
      } catch (error) {
        log('Error generating audio:', error);
        throw error;
      }
    }
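    The request-building logic above is easy to verify in isolation. The helper below (hypothetical, not part of the server) distills that logic into a pure function: voice instructions are prepended to the prompt text rather than sent as a query parameter, the prompt is URL-encoded into the path, and `model=openai-audio` is always appended.

```javascript
// Hypothetical helper mirroring the URL construction in respondAudio above.
// Pollinations encodes the prompt in the URL path, not the query string.
function buildAudioUrl(prompt, voice = 'alloy', seed, voiceInstructions) {
  const queryParams = new URLSearchParams();
  queryParams.append('model', 'openai-audio'); // required for audio output
  queryParams.append('voice', voice);
  if (seed !== undefined) queryParams.append('seed', seed);

  // Voice instructions are merged into the prompt text itself
  const finalPrompt = voiceInstructions
    ? `${voiceInstructions}\n\n${prompt}`
    : prompt;

  return `https://text.pollinations.ai/${encodeURIComponent(finalPrompt)}?${queryParams}`;
}

console.log(buildAudioUrl('Hello world', 'nova', 42));
// → https://text.pollinations.ai/Hello%20world?model=openai-audio&voice=nova&seed=42
```

    Because `seed` is appended only when defined, omitting it yields a different URL on the server side each time, which is why results are random by default.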
  • The input schema definition for respondAudio, specifying 'prompt' (required string), 'voice' (optional string), and 'seed' (optional number).
    export const respondAudioSchema = {
      name: 'respondAudio',
      description: 'Generate an audio response to a text prompt and play it through the system',
      inputSchema: {
        type: 'object',
        properties: {
          prompt: {
            type: 'string',
            description: 'The text prompt to respond to with audio'
          },
          voice: {
            type: 'string',
            description: 'Voice to use for audio generation (default: "alloy"). Available options: "alloy", "echo", "fable", "onyx", "nova", "shimmer", "coral", "verse", "ballad", "ash", "sage", "amuch", "dan"'
          },
          seed: {
            type: 'number',
            description: 'Seed for reproducible results (default: random)'
          }
        },
        required: ['prompt']
      }
    };
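    A server consuming this schema would typically check incoming arguments before dispatch. The sketch below is a minimal, hypothetical validator (real servers often use a JSON Schema library such as Ajv instead); it checks only the `required` list and primitive `type` fields, which is all this particular schema uses.

```javascript
// Hypothetical minimal validator for respondAudio arguments,
// covering only the subset of JSON Schema used above.
const respondAudioSchema = {
  name: 'respondAudio',
  inputSchema: {
    type: 'object',
    properties: {
      prompt: { type: 'string' },
      voice: { type: 'string' },
      seed: { type: 'number' }
    },
    required: ['prompt']
  }
};

function validateArgs(schema, args) {
  const { properties, required = [] } = schema.inputSchema;
  const errors = [];
  for (const key of required) {
    if (args[key] === undefined) errors.push(`missing required property: ${key}`);
  }
  for (const [key, value] of Object.entries(args)) {
    const spec = properties[key];
    if (!spec) continue; // unknown keys are ignored in this sketch
    if (typeof value !== spec.type) errors.push(`${key} must be a ${spec.type}`);
  }
  return errors;
}

console.log(validateArgs(respondAudioSchema, { prompt: 'Hi', seed: 'oops' }));
// → [ 'seed must be a number' ]
```

    Note that the schema does not enumerate the valid voice names; the allowed values live only in the `voice` description string, so an invalid voice passes schema validation and fails (or falls back) at the API instead.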
  • The MCP server registration/handler for respondAudio. Extracts args (prompt, voice, seed, voiceInstructions), calls the respondAudio function, saves the resulting audio to a temp file, plays it via the system audio player, and returns metadata as text content.
    } else if (name === 'respondAudio') {
      try {
        const { prompt, voice = defaultConfig.audio.voice, seed, voiceInstructions } = args;
        const result = await respondAudio(prompt, voice, seed, voiceInstructions, finalAuthConfig);
    
        // Save audio to a temporary file
        const tempDir = os.tmpdir();
        const tempFilePath = path.join(tempDir, `pollinations-audio-${Date.now()}.mp3`);
    
        // Decode base64 and write to file
        fs.writeFileSync(tempFilePath, Buffer.from(result.data, 'base64'));
    
        // Play the audio file
        audioPlayer.play(tempFilePath, (err) => {
          if (err) log('Error playing audio:', err);
    
          // Clean up the temporary file after playing
          try {
            fs.unlinkSync(tempFilePath);
          } catch (cleanupErr) {
            log('Error cleaning up temp file:', cleanupErr);
          }
        });
    
        return {
          content: [
            {
              type: 'text',
              text: `Audio has been played.\n\nAudio metadata: ${JSON.stringify(result.metadata, null, 2)}`
            }
          ]
        };
      } catch (error) {
        return {
          content: [
            { type: 'text', text: `Error generating audio: ${error.message}` }
          ],
          isError: true
        };
      }
    }
  • Central schema re-export: imports and re-exports respondAudioSchema for use by the MCP server.
    import { generateImageUrlSchema, generateImageSchema, editImageSchema, generateImageFromReferenceSchema, listImageModelsSchema } from './services/imageSchema.js';
    import { respondAudioSchema, listAudioVoicesSchema } from './services/audioSchema.js';
    import { respondTextSchema, listTextModelsSchema } from './services/textSchema.js';
    
    
    // Re-export all schemas
    export {
      // Image schemas
      generateImageUrlSchema,
      generateImageSchema,
      editImageSchema,
      generateImageFromReferenceSchema,
      listImageModelsSchema,
    
      // Audio schemas
      respondAudioSchema,
      listAudioVoicesSchema,
    
      // Text schemas
      respondTextSchema,
      listTextModelsSchema
    };
  • Central service re-export: imports and re-exports the respondAudio function from audioService.js for consumption by the MCP server.
    import { generateImageUrl, generateImage, editImage, generateImageFromReference, listImageModels } from './services/imageService.js';
    import { respondAudio, listAudioVoices } from './services/audioService.js';
    import { respondText, listTextModels } from './services/textService.js';
    
    
    // Export all service functions
    export {
      // Image services
      generateImageUrl,
      generateImage,
      editImage,
      generateImageFromReference,
      listImageModels,
    
      // Audio services
      respondAudio,
      listAudioVoices,
    
      // Text services
      respondText,
      listTextModels,
    };
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It mentions playing audio, but does not disclose potential side effects, system requirements, or resource implications. For a generative tool, more behavioral context is needed.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence that is both concise and front-loaded with the key action 'Generate'. No unnecessary details, achieving maximum conciseness.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (audio generation and playback), the description is too sparse. It omits information about return values, output format, and potential limitations. No output schema and no annotations further reduce completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% description coverage for all three parameters (prompt, voice, seed). The tool description does not add any additional meaning beyond what the schema already provides, so the baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's action: 'Generate an audio response' to a 'text prompt', and mentions playing it through the system. It distinguishes from the sibling 'respondText' tool by specifying audio output.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus alternatives like 'listAudioVoices' or 'respondText'. The description lacks context on prerequisites or scenarios where audio generation is appropriate.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
