
Audio Transcriber MCP Server

by Ichigo3766

transcribe_audio

Convert audio files to text using OpenAI Whisper API, supporting multiple languages and optional file saving for transcription results.

Instructions

Transcribe an audio file using OpenAI Whisper API

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| filepath | Yes | Absolute path to the audio file | |
| language | No | Language of the audio in ISO-639-1 format (e.g. "en", "es") | "en" |
| save_to_file | No | Whether to save the transcription to a file next to the audio file | |
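For illustration, an MCP client call to transcribe_audio might pass arguments shaped like the following (the file path and language values are hypothetical, not from the project):

```typescript
// Hypothetical arguments object for a transcribe_audio call.
const args = {
  filepath: "/home/user/recordings/meeting.mp3", // must be an absolute path
  language: "es",     // optional, ISO-639-1; the server defaults to "en"
  save_to_file: true, // optional; writes meeting.txt next to the audio file
};

// Only filepath is required by the schema:
const hasRequired = typeof args.filepath === "string";
console.log(hasRequired); // true
```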

Implementation Reference

  • Main execution handler for the 'transcribe_audio' tool. Validates input, reads audio file, transcribes using OpenAI Whisper API, optionally saves transcription to file, and returns the result.
    this.server.setRequestHandler(CallToolRequestSchema, async (request) => {
      if (request.params.name !== 'transcribe_audio') {
        throw new McpError(
          ErrorCode.MethodNotFound,
          `Unknown tool: ${request.params.name}`
        );
      }
      
      if (!isValidTranscribeArgs(request.params.arguments)) {
        throw new McpError(
          ErrorCode.InvalidParams,
          'Invalid transcribe arguments'
        );
      }
      
      let fileStream = null;
      
      try {
        const { filepath, save_to_file, language = "en" } = request.params.arguments;
        
        // Normalize and decode path properly
        const decodedPath = decodeURIComponent(filepath.replace(/\\/g, '').trim());
        
        console.error(`[DEBUG] Requested file path: ${decodedPath}`);
        
        // Verify file exists
        if (!fs.existsSync(decodedPath)) {
          throw new Error(`Audio file not found: ${decodedPath}`);
        }
        
        // Check if file is readable
        try {
          await promisify(fs.access)(decodedPath, fs.constants.R_OK);
        } catch (err) {
          throw new Error(`Audio file not readable: ${decodedPath}`);
        }
        
        console.error(`[DEBUG] File exists and is readable: ${decodedPath}`);
        
        // Create transcription
        console.error(`[DEBUG] Sending transcription request to OpenAI API`);
        fileStream = fs.createReadStream(decodedPath);
        
        const response = await openai.audio.transcriptions.create({
          file: fileStream,
          model: OPENAI_MODEL,
          language: language
        });
        
        // Close the file stream immediately after use
        fileStream.destroy();
        fileStream = null;
        
        const transcription = response.text;
        console.error(`[DEBUG] Transcription completed successfully`);
        
        // Handle save_to_file parameter
        const shouldSaveToFile = typeof save_to_file === 'string'
          ? save_to_file.toLowerCase() === 'true'
          : Boolean(save_to_file);
          
        if (shouldSaveToFile) {
          const audioDir = path.dirname(decodedPath);
          const audioName = path.basename(decodedPath, path.extname(decodedPath));
          const transcriptionPath = path.join(audioDir, `${audioName}.txt`);
          
          console.error(`[DEBUG] Saving transcription to: ${transcriptionPath}`);
          await promisify(fs.writeFile)(transcriptionPath, transcription);
          console.error(`[DEBUG] File saved successfully`);
        }
        
        return {
          content: [
            {
              type: 'text',
              text: transcription,
            },
          ],
        };
      } catch (error: any) {
        console.error('[ERROR] Transcription failed:', error);
        return {
          content: [
            {
              type: 'text',
              text: `Error transcribing audio: ${error?.message || String(error)}`,
            },
          ],
          isError: true,
        };
      } finally {
        // Ensure file stream is closed even if there's an error
        if (fileStream) {
          try {
            fileStream.destroy();
            console.error("[DEBUG] File stream closed");
          } catch (err) {
            console.error("[ERROR] Failed to close file stream:", err);
          }
        }
      }
    });
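When save_to_file is set, the handler above derives the transcription path from the audio path with Node's path module. The same derivation in isolation, using an illustrative path:

```typescript
import * as path from "path";

// "/tmp/demo/interview.mp3" is an example path, not one from the project.
const decodedPath = "/tmp/demo/interview.mp3";

// Same steps as the handler: directory, extension-less basename, .txt sibling.
const audioDir = path.dirname(decodedPath);
const audioName = path.basename(decodedPath, path.extname(decodedPath));
const transcriptionPath = path.join(audioDir, `${audioName}.txt`);

console.log(transcriptionPath); // "/tmp/demo/interview.txt"
```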
  • src/index.ts:74-99 (registration)
    Tool registration in the ListToolsRequestSchema handler, defining the name, description, and input schema for 'transcribe_audio'.
    this.server.setRequestHandler(ListToolsRequestSchema, async () => ({
      tools: [
        {
          name: 'transcribe_audio',
          description: 'Transcribe an audio file using OpenAI Whisper API',
          inputSchema: {
            type: 'object',
            properties: {
              filepath: {
                type: 'string',
                description: 'Absolute path to the audio file',
              },
              save_to_file: {
                type: 'boolean',
                description: 'Whether to save the transcription to a file next to the audio file',
              },
              language: {
                type: 'string',
                description: 'Language of the audio in ISO-639-1 format (e.g. "en", "es"). Default is "en".',
              },
            },
            required: ['filepath'],
          },
        },
      ],
    }));
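Note that the schema declares save_to_file as a boolean, while the handler also accepts the string form "true"/"false" that some clients serialize. The coercion logic in isolation (the helper name coerce is ours, for illustration):

```typescript
// Mirrors the handler's save_to_file coercion: string "true" (any case)
// becomes true; anything else falls through to Boolean().
const coerce = (save_to_file: unknown): boolean =>
  typeof save_to_file === "string"
    ? save_to_file.toLowerCase() === "true"
    : Boolean(save_to_file);

console.log(coerce("TRUE"));    // true
console.log(coerce(false));     // false
console.log(coerce(undefined)); // false
```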
  • TypeScript interface defining the input arguments for the transcribe_audio tool.
    interface TranscribeArgs {
      filepath: string;
      save_to_file?: boolean | string;
      language?: string;
    }
  • Helper function to validate the input arguments for the transcribe_audio tool against the TranscribeArgs interface.
    const isValidTranscribeArgs = (args: any): args is TranscribeArgs =>
      typeof args === 'object' &&
      args !== null &&
      typeof args.filepath === 'string' &&
      (args.save_to_file === undefined || 
       typeof args.save_to_file === 'boolean' || 
       typeof args.save_to_file === 'string') &&
      (args.language === undefined || typeof args.language === 'string');
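The guard above can be exercised with a few sample inputs (reproduced here so the snippet runs standalone; the sample paths are illustrative):

```typescript
interface TranscribeArgs {
  filepath: string;
  save_to_file?: boolean | string;
  language?: string;
}

// Same type guard as in the implementation reference.
const isValidTranscribeArgs = (args: any): args is TranscribeArgs =>
  typeof args === "object" &&
  args !== null &&
  typeof args.filepath === "string" &&
  (args.save_to_file === undefined ||
    typeof args.save_to_file === "boolean" ||
    typeof args.save_to_file === "string") &&
  (args.language === undefined || typeof args.language === "string");

console.log(isValidTranscribeArgs({ filepath: "/a/b.mp3" }));                       // true
console.log(isValidTranscribeArgs({ filepath: 42 }));                               // false
console.log(isValidTranscribeArgs({ filepath: "/a/b.mp3", save_to_file: "true" })); // true
```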
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It mentions the API but doesn't disclose key behavioral traits: whether the tool is read-only or mutative, error handling, rate limits, authentication needs, or what happens with the save_to_file option. The description is minimal and misses critical operational context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence with zero waste. It's front-loaded with the core purpose and implementation detail. Every word earns its place, making it easy to parse quickly.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no annotations and no output schema, the description is incomplete. It doesn't explain what the tool returns (e.g., transcription text, file path), error conditions, or behavioral details. For a tool with 3 parameters and potential side effects (saving files), more context is needed for effective use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema fully documents parameters. The description adds no additional meaning beyond implying audio file processing. It doesn't explain parameter interactions or provide examples, so it meets the baseline but doesn't enhance understanding.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Transcribe') and resource ('an audio file'), specifying the implementation method ('using OpenAI Whisper API'). It's specific enough to understand the core function, though without sibling tools, differentiation isn't applicable. The purpose is unambiguous but could be slightly more detailed about output format.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives, prerequisites, or typical use cases. It mentions the API but doesn't explain limitations or ideal scenarios. With no sibling tools, this is less critical, but still lacks context for effective agent decision-making.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
