OpenRouter MCP Multimodal Server

by hoangdn3

mcp_openrouter_analyze_audio

Transcribe audio files from URLs or local paths to extract their text content. Supports WAV and MP3 formats, using OpenRouter models for speech-to-text conversion.

Instructions

Transcribe audio files and provide raw content. Supports wav/mp3 files from CDN URLs or local paths.

Input Schema

audio_url (required): Path or URL to the audio file (supports CDN URLs, local file paths, wav/mp3 formats)

model (optional): OpenRouter model to use (e.g., "mistralai/voxtral-small-24b-2507", "openai/gpt-4o-audio-preview")
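
For example, a call to this tool might pass arguments like the following. The URL and model value are illustrative; the model field can be omitted to fall back to the server's default audio model.

    // Example arguments for an mcp_openrouter_analyze_audio call (URL and model are illustrative)
    const exampleArguments = {
      audio_url: 'https://cdn.example.com/audio/interview.mp3', // or a local path such as ./audio/interview.wav
      model: 'mistralai/voxtral-small-24b-2507'                 // optional; omit to use the server default
    };

    // On success, the tool returns a single text content item whose text is a JSON string of the form:
    // { "id": "...", "analysis": "<transcribed text>", "model": "...", "usage": { ... } }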

Implementation Reference

  • Core handler function for the mcp_openrouter_analyze_audio tool. It fetches the audio from a URL or local path, encodes it to base64, detects the format (wav/mp3), builds an OpenAI chat completion request with an input_audio part, calls the selected OpenRouter model (falling back first to a backup model and then to a free audio model), and returns a JSON-structured analysis result or an error. The detectAudioFormat helper it calls is not shown here; see the sketch after this list.
    export async function handleAnalyzeAudio(
      request: { params: { arguments: AnalyzeAudioToolRequest } },
      openai: OpenAI,
      defaultModel?: string
    ): Promise<any> {
      try {
        const args = request.params.arguments;
    
        // Validate input
        if (!args.audio_url) {
          throw new McpError(
            ErrorCode.InvalidParams,
            'audio_url parameter is required'
          );
        }
    
        // Fetch audio
        const buffer = await fetchAudio(args.audio_url);
        const format = detectAudioFormat(buffer, args.audio_url);
        const base64 = buffer.toString('base64');
    
        // Build content array
        const content: Array<{
          type: string;
          text?: string;
          input_audio?: {
            data: string;
            format: string;
          };
        }> = [];
    
        // Add fixed transcription instruction
        content.push({
          type: 'text',
          text: 'Please transcribe and provide me the raw content of this audio.'
        });
    
        // Add audio
        content.push({
          type: 'input_audio',
          input_audio: {
            data: base64,
            format: format
          }
        });
    
        // Select model
        let model = args.model || defaultModel || DEFAULT_AUDIO_MODEL;
        console.error(`[Audio Tool] Using AUDIO model: ${model}`);
    
        // Try primary model first
        try {
          const completion = await openai.chat.completions.create({
            model,
            messages: [{
              role: 'user',
              content
            }] as any
          });
    
          const response = completion as any;
          return {
            content: [
              {
                type: 'text',
                text: JSON.stringify({
                  id: response.id,
                  analysis: completion.choices[0].message.content || '',
                  model: response.model,
                  usage: response.usage
                }),
              },
            ],
          };
        } catch (primaryError: any) {
          // Try backup model
          const backupModel = process.env.OPENROUTER_DEFAULT_MODEL_AUDIO_BACKUP;
          if (backupModel && backupModel !== model) {
            try {
              const completion = await openai.chat.completions.create({
                model: backupModel,
                messages: [{
                  role: 'user',
                  content
                }] as any
              });
    
              const resp = completion as any;
              return {
                content: [
                  {
                    type: 'text',
                    text: JSON.stringify({
                      id: resp.id,
                      analysis: completion.choices[0].message.content || '',
                      model: resp.model,
                      usage: resp.usage
                    }),
                  },
                ],
              };
            } catch (backupError: any) {
              // Try free audio model
              const freeModel = await findSuitableFreeAudioModel(openai);
              if (freeModel && freeModel !== model && freeModel !== backupModel) {
                const completion = await openai.chat.completions.create({
                  model: freeModel,
                  messages: [{
                    role: 'user',
                    content
                  }] as any
                });
    
                const resp = completion as any;
                return {
                  content: [
                    {
                      type: 'text',
                      text: JSON.stringify({
                        id: resp.id,
                        analysis: completion.choices[0].message.content || '',
                        model: resp.model,
                        usage: resp.usage
                      }),
                    },
                  ],
                };
              } else {
                throw backupError;
              }
            }
          } else {
            // No backup, try free model directly
            const freeModel = await findSuitableFreeAudioModel(openai);
            if (freeModel && freeModel !== model) {
              const completion = await openai.chat.completions.create({
                model: freeModel,
                messages: [{
                  role: 'user',
                  content
                }] as any
              });
    
              const resp = completion as any;
              return {
                content: [
                  {
                    type: 'text',
                    text: JSON.stringify({
                      id: resp.id,
                      analysis: completion.choices[0].message.content || '',
                      model: resp.model,
                      usage: resp.usage
                    }),
                  },
                ],
              };
            } else {
              throw primaryError;
            }
          }
        }
      } catch (error) {
        if (error instanceof McpError) {
          throw error;
        }
    
        return {
          content: [
            {
              type: 'text',
              text: JSON.stringify({
                error: error instanceof Error ? error.message : String(error),
                model: request.params.arguments.model || defaultModel || DEFAULT_AUDIO_MODEL,
                usage: { prompt_tokens: 0, completion_tokens: 0, total_tokens: 0 }
              }),
            },
          ],
          isError: true,
        };
      }
    }
  • Registers the mcp_openrouter_analyze_audio tool in the MCP server's listTools handler, defining its name, description, and input schema.
    {
      name: 'mcp_openrouter_analyze_audio',
      description: 'Transcribe audio files and provide raw content. Supports wav/mp3 files from CDN URLs or local paths.',
      inputSchema: {
        type: 'object',
        properties: {
          audio_url: {
            type: 'string',
            description: 'Path or URL to the audio file (supports CDN URLs, local file paths, wav/mp3 formats)',
          },
          model: {
            type: 'string',
            description: 'OpenRouter model to use (e.g., "mistralai/voxtral-small-24b-2507", "openai/gpt-4o-audio-preview")',
          },
        },
        required: ['audio_url'],
      },
    },
  • Switch case in the CallToolRequestSchema handler that routes calls for 'mcp_openrouter_analyze_audio' to the handleAnalyzeAudio function.
    case 'mcp_openrouter_analyze_audio':
      return handleAnalyzeAudio({
        params: {
          arguments: request.params.arguments as unknown as AnalyzeAudioToolRequest
        }
      }, this.openai, this.defaultAudioModel);
  • TypeScript interface defining the input parameters for the analyze audio tool.
    export interface AnalyzeAudioToolRequest {
      audio_url: string;
      model?: string;
    }
  • Helper function that fetches audio data from a remote URL or a local file path and returns it as a Buffer. It relies on the isUrl and normalizePath helpers, which are sketched after this list.
    async function fetchAudio(audioPath: string): Promise<Buffer> {
      if (isUrl(audioPath)) {
        // Fetch from URL
        const response = await fetch(audioPath);
        if (!response.ok) {
          throw new Error(`Failed to fetch audio: ${response.statusText}`);
        }
        const arrayBuffer = await response.arrayBuffer();
        return Buffer.from(arrayBuffer);
      } else {
        // Read from local file
        const normalizedPath = normalizePath(audioPath);
        let resolvedPath = normalizedPath;
    
        if (!path.isAbsolute(resolvedPath)) {
          resolvedPath = path.resolve(process.cwd(), resolvedPath);
        }
    
        return await fs.readFile(resolvedPath);
      }
    }
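
The snippets above also reference a few helpers that are not reproduced on this page: isUrl, normalizePath, detectAudioFormat, and findSuitableFreeAudioModel. Below is a minimal sketch of the first three; the names and signatures match the calls in the code above, but the bodies are assumptions about how such helpers are typically written, not the server's actual implementation.

    import * as path from 'path';
    import * as os from 'os';

    // Treat anything that parses as an http(s) URL as remote (assumed behavior).
    function isUrl(value: string): boolean {
      try {
        const parsed = new URL(value);
        return parsed.protocol === 'http:' || parsed.protocol === 'https:';
      } catch {
        return false;
      }
    }

    // Normalize a local path: strip a file:// prefix and expand a leading "~" (assumed behavior).
    function normalizePath(p: string): string {
      let result = p.startsWith('file://') ? p.slice('file://'.length) : p;
      if (result.startsWith('~')) {
        result = path.join(os.homedir(), result.slice(1));
      }
      return result;
    }

    // Detect wav vs mp3 from magic bytes, falling back to the file extension (assumed behavior).
    // WAV files begin with the ASCII tag "RIFF"; MP3 files begin with "ID3" or an 0xFF frame-sync byte.
    function detectAudioFormat(buffer: Buffer, audioPath: string): string {
      if (buffer.length >= 4 && buffer.toString('ascii', 0, 4) === 'RIFF') {
        return 'wav';
      }
      if (buffer.length >= 3 && (buffer.toString('ascii', 0, 3) === 'ID3' || buffer[0] === 0xff)) {
        return 'mp3';
      }
      return audioPath.toLowerCase().endsWith('.wav') ? 'wav' : 'mp3';
    }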
