transcribe_audio
Convert audio files to text using the OpenAI Whisper API, with support for multiple languages and optional saving of the transcription to a file.
Instructions
Transcribe an audio file using OpenAI Whisper API
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| filepath | Yes | Absolute path to the audio file | |
| language | No | Language of the audio in ISO-639-1 format (e.g. "en", "es") | "en" |
| save_to_file | No | Whether to save the transcription as a `.txt` file with the same base name, next to the audio file. Accepts a boolean or the string "true" (case-insensitive). | false |
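A client invokes the tool with an MCP `tools/call` request whose `params` carry the tool name and an `arguments` object matching the schema above. The sketch below only builds such a request as a plain object; `buildTranscribeRequest` and the file path are hypothetical, and in practice an MCP client library would normally construct and send this message.

```typescript
// Argument shape accepted by the tool (same fields as the schema above).
interface TranscribeArgs {
  filepath: string;
  save_to_file?: boolean | string;
  language?: string;
}

// JSON-RPC 2.0 message for the MCP "tools/call" method.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: "transcribe_audio"; arguments: TranscribeArgs };
}

// Hypothetical helper: wraps the arguments in a tools/call request.
function buildTranscribeRequest(id: number, args: TranscribeArgs): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "transcribe_audio", arguments: args },
  };
}

// Example: transcribe a Spanish recording and save the text next to it.
// The path is a placeholder, not a file that ships with the tool.
const request = buildTranscribeRequest(1, {
  filepath: "/home/user/recordings/interview.mp3",
  language: "es",
  save_to_file: true,
});

console.log(JSON.stringify(request, null, 2));
```

Omitting `language` makes the server fall back to "en", so non-English audio should always pass an explicit code.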
Implementation Reference
- `src/index.ts:101-202` (handler): Main execution handler for the `transcribe_audio` tool. It validates the input, checks that the audio file exists and is readable, transcribes it with the OpenAI Whisper API, optionally saves the transcription to a file, and returns the result. Errors raised after argument validation are returned as a text result with `isError: true` rather than thrown.

```typescript
this.server.setRequestHandler(CallToolRequestSchema, async (request) => {
  if (request.params.name !== 'transcribe_audio') {
    throw new McpError(
      ErrorCode.MethodNotFound,
      `Unknown tool: ${request.params.name}`
    );
  }

  if (!isValidTranscribeArgs(request.params.arguments)) {
    throw new McpError(
      ErrorCode.InvalidParams,
      'Invalid transcribe arguments'
    );
  }

  let fileStream = null;

  try {
    const { filepath, save_to_file, language = "en" } = request.params.arguments;

    // Normalize and decode path properly.
    // Note: this removes every backslash character, which would also strip Windows path separators.
    const decodedPath = decodeURIComponent(filepath.replace(/\\/g, '').trim());
    console.error(`[DEBUG] Requested file path: ${decodedPath}`);

    // Verify file exists
    if (!fs.existsSync(decodedPath)) {
      throw new Error(`Audio file not found: ${decodedPath}`);
    }

    // Check if file is readable
    try {
      await promisify(fs.access)(decodedPath, fs.constants.R_OK);
    } catch (err) {
      throw new Error(`Audio file not readable: ${decodedPath}`);
    }

    console.error(`[DEBUG] File exists and is readable: ${decodedPath}`);

    // Create transcription
    console.error(`[DEBUG] Sending transcription request to OpenAI API`);
    fileStream = fs.createReadStream(decodedPath);

    const response = await openai.audio.transcriptions.create({
      file: fileStream,
      model: OPENAI_MODEL,
      language: language
    });

    // Close the file stream immediately after use
    fileStream.destroy();
    fileStream = null;

    const transcription = response.text;
    console.error(`[DEBUG] Transcription completed successfully`);

    // Handle save_to_file parameter
    const shouldSaveToFile = typeof save_to_file === 'string'
      ? save_to_file.toLowerCase() === 'true'
      : Boolean(save_to_file);

    if (shouldSaveToFile) {
      const audioDir = path.dirname(decodedPath);
      const audioName = path.basename(decodedPath, path.extname(decodedPath));
      const transcriptionPath = path.join(audioDir, `${audioName}.txt`);

      console.error(`[DEBUG] Saving transcription to: ${transcriptionPath}`);
      await promisify(fs.writeFile)(transcriptionPath, transcription);
      console.error(`[DEBUG] File saved successfully`);
    }

    return {
      content: [
        {
          type: 'text',
          text: transcription,
        },
      ],
    };
  } catch (error: any) {
    console.error('[ERROR] Transcription failed:', error);
    return {
      content: [
        {
          type: 'text',
          text: `Error transcribing audio: ${error?.message || String(error)}`,
        },
      ],
      isError: true,
    };
  } finally {
    // Ensure file stream is closed even if there's an error
    if (fileStream) {
      try {
        fileStream.destroy();
        console.error("[DEBUG] File stream closed");
      } catch (err) {
        console.error("[ERROR] Failed to close file stream:", err);
      }
    }
  }
});
```
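The `save_to_file` coercion and the output-path rule are easy to miss inside the handler. The sketch below pulls them out into two standalone functions with the same logic; `shouldSaveToFile` and `transcriptionPathFor` are illustrative names and do not exist as separate functions in `src/index.ts`.

```typescript
import * as path from "node:path";

// Same rule as the handler: a string enables saving only if it equals "true"
// (case-insensitive); any other value goes through Boolean(), so undefined -> false.
function shouldSaveToFile(saveToFile: boolean | string | undefined): boolean {
  return typeof saveToFile === "string"
    ? saveToFile.toLowerCase() === "true"
    : Boolean(saveToFile);
}

// Same rule as the handler: same directory, same base name, extension replaced by ".txt".
function transcriptionPathFor(audioPath: string): string {
  const audioDir = path.dirname(audioPath);
  const audioName = path.basename(audioPath, path.extname(audioPath));
  return path.join(audioDir, `${audioName}.txt`);
}

console.log(shouldSaveToFile("TRUE"));      // true
console.log(shouldSaveToFile("yes"));       // false
console.log(shouldSaveToFile(undefined));   // false
console.log(transcriptionPathFor("/data/audio/meeting.m4a")); // on POSIX: /data/audio/meeting.txt
```

Because `fs.writeFile` replaces an existing file by default, an existing `.txt` file with the same base name next to the audio would be overwritten. Any string other than "true" (for example "yes" or "1") leaves saving disabled.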
- `src/index.ts:74-99` (registration): Tool registration in the `ListToolsRequestSchema` handler, defining the name, description, and input schema for `transcribe_audio`.

```typescript
this.server.setRequestHandler(ListToolsRequestSchema, async () => ({
  tools: [
    {
      name: 'transcribe_audio',
      description: 'Transcribe an audio file using OpenAI Whisper API',
      inputSchema: {
        type: 'object',
        properties: {
          filepath: {
            type: 'string',
            description: 'Absolute path to the audio file',
          },
          save_to_file: {
            type: 'boolean',
            description: 'Whether to save the transcription to a file next to the audio file',
          },
          language: {
            type: 'string',
            description: 'Language of the audio in ISO-639-1 format (e.g. "en", "es"). Default is "en".',
          },
        },
        required: ['filepath'],
      },
    },
  ],
}));
```
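The registered schema lists `filepath` as the only required property, so a client can check for it before sending a call. The sketch below is a hypothetical client-side helper (`ToolDescriptor` and `missingRequired` are illustrative names, not part of the server or the MCP SDK). It checks presence only and performs no JSON Schema type validation. Also note that the schema advertises `save_to_file` as a boolean, although the server's type guard and handler also accept a string.

```typescript
// Minimal shape of a tools/list entry, limited to the fields used here.
interface ToolDescriptor {
  name: string;
  description?: string;
  inputSchema: {
    type: "object";
    properties?: Record<string, { type?: string; description?: string }>;
    required?: string[];
  };
}

// Returns the names of required properties that are absent from args.
function missingRequired(tool: ToolDescriptor, args: Record<string, unknown>): string[] {
  return (tool.inputSchema.required ?? []).filter((key) => !(key in args));
}

// Abbreviated copy of the registered descriptor (descriptions omitted).
const transcribeTool: ToolDescriptor = {
  name: "transcribe_audio",
  inputSchema: {
    type: "object",
    properties: {
      filepath: { type: "string" },
      save_to_file: { type: "boolean" },
      language: { type: "string" },
    },
    required: ["filepath"],
  },
};

console.log(missingRequired(transcribeTool, { language: "en" }));          // [ 'filepath' ]
console.log(missingRequired(transcribeTool, { filepath: "/tmp/a.wav" }));  // []
```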
- `src/index.ts:34-37` (schema): TypeScript interface defining the input arguments for the `transcribe_audio` tool.

```typescript
interface TranscribeArgs {
  filepath: string;
  save_to_file?: boolean | string;
  language?: string;
}
```
- `src/index.ts:40-47` (helper): Type guard that validates the input arguments for the `transcribe_audio` tool against the `TranscribeArgs` interface.

```typescript
const isValidTranscribeArgs = (args: any): args is TranscribeArgs =>
  typeof args === 'object' &&
  args !== null &&
  typeof args.filepath === 'string' &&
  (args.save_to_file === undefined ||
    typeof args.save_to_file === 'boolean' ||
    typeof args.save_to_file === 'string') &&
  (args.language === undefined || typeof args.language === 'string');
```
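The guard checks only JavaScript types. It does not verify that `filepath` is absolute, that `language` is a valid ISO-639-1 code, or that a string `save_to_file` equals "true". The sketch below re-creates the same checks (using `unknown` in place of `any`) and runs them over a few sample inputs to show which pass; the sample values are illustrative.

```typescript
interface TranscribeArgs {
  filepath: string;
  save_to_file?: boolean | string;
  language?: string;
}

// Same checks as isValidTranscribeArgs, written against `unknown`.
const isValidTranscribeArgs = (args: unknown): args is TranscribeArgs => {
  if (typeof args !== "object" || args === null) return false;
  const a = args as Record<string, unknown>;
  return (
    typeof a.filepath === "string" &&
    (a.save_to_file === undefined ||
      typeof a.save_to_file === "boolean" ||
      typeof a.save_to_file === "string") &&
    (a.language === undefined || typeof a.language === "string")
  );
};

const cases: Array<[string, unknown]> = [
  ["path only", { filepath: "/tmp/a.mp3" }],
  ["string flag", { filepath: "/tmp/a.mp3", save_to_file: "true" }],
  ["arbitrary string flag", { filepath: "/tmp/a.mp3", save_to_file: "yes" }],
  ["missing filepath", { language: "en" }],
  ["numeric flag", { filepath: "/tmp/a.mp3", save_to_file: 1 }],
  ["null", null],
];

for (const [label, args] of cases) {
  console.log(`${label}: ${isValidTranscribeArgs(args)}`);
}
```

The "arbitrary string flag" case passes validation, yet the handler treats it as `false`, so the call succeeds without saving a file.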