# mcp_openai_transcribe

Convert speech to text using the OpenAI Whisper API. Transcribe audio files with customizable options such as model, language, and prompt.

## Instructions

Converts speech to text using the OpenAI Whisper API and returns the transcribed text.

## Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| audioPath | Yes | Path of the audio file to transcribe | |
| language | No | Audio language (e.g. ko, en, ja) | |
| model | No | Model to use (e.g. whisper-1) | whisper-1 |
| prompt | No | Hint text to guide recognition | |
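For example, a client might call the tool with arguments shaped like this. This is a hypothetical invocation: the `client.callTool` API and the file path are illustrative, while the tool name and argument names come from the schema above.

```typescript
// Hypothetical MCP client call; `client` is assumed to be an already
// connected client (e.g. from @modelcontextprotocol/sdk).
const result = await client.callTool({
  name: 'mcp_openai_transcribe',
  arguments: {
    audioPath: '/recordings/meeting.mp3', // required: audio file to transcribe
    language: 'ko',                       // optional: language hint
    model: 'whisper-1',                   // optional: also the handler's default
    prompt: 'Terms: MCP, SPARQL, Ollama'  // optional: recognition hint
  }
});
```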
## Input Schema (JSON Schema)
```json
{
  "properties": {
    "audioPath": {
      "description": "Path of the audio file to transcribe",
      "type": "string"
    },
    "language": {
      "description": "Audio language (e.g. ko, en, ja)",
      "type": "string"
    },
    "model": {
      "description": "Model to use (e.g. whisper-1)",
      "type": "string"
    },
    "prompt": {
      "description": "Hint text to guide recognition",
      "type": "string"
    }
  },
  "required": [
    "audioPath"
  ],
  "type": "object"
}
```
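The source does not show explicit argument validation, but the schema above is standard JSON Schema, so arguments could be checked before they reach the handler. A minimal sketch using Ajv (the library choice is an assumption, not part of the source):

```typescript
import Ajv from 'ajv';

// Compile the tool's input schema (reproduced from above) and validate
// a candidate arguments object before dispatching to the handler.
const ajv = new Ajv();
const validate = ajv.compile({
  type: 'object',
  properties: {
    audioPath: { type: 'string' },
    language: { type: 'string' },
    model: { type: 'string' },
    prompt: { type: 'string' }
  },
  required: ['audioPath']
});

const args = { audioPath: '/recordings/meeting.mp3' };
if (!validate(args)) {
  throw new Error(`Invalid arguments: ${JSON.stringify(validate.errors)}`);
}
```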
## Implementation Reference
- `src/tools/index.ts:742-759` (handler): The inline handler function for the `mcp_openai_transcribe` tool. It delegates to `openaiService.speechToText` and handles the response or error.

  ```typescript
  async handler(args: any): Promise<ToolResponse> {
    try {
      const result = await openaiService.speechToText(args);
      return { content: [{ type: 'text', text: result }] };
    } catch (error) {
      return {
        content: [{
          type: 'text',
          // "OpenAI Whisper 오류" = "OpenAI Whisper error"
          text: `OpenAI Whisper 오류: ${error instanceof Error ? error.message : String(error)}`
        }]
      };
    }
  }
  ```
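  Note that the handler converts failures into ordinary text content rather than rethrowing, so the calling model receives the failure description in-band instead of a protocol-level error.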
- `src/tools/index.ts:720-741` (schema): The input schema defining parameters for the transcription tool: `audioPath` (required), `model`, `language`, `prompt`.

  ```typescript
  inputSchema: {
    type: 'object',
    properties: {
      audioPath: { type: 'string', description: '변환할 오디오 파일 경로' },      // path of the audio file to transcribe
      model: { type: 'string', description: '사용할 모델 (예: whisper-1)' },     // model to use (e.g. whisper-1)
      language: { type: 'string', description: '오디오 언어 (예: ko, en, ja)' }, // audio language (e.g. ko, en, ja)
      prompt: { type: 'string', description: '인식을 도울 힌트 텍스트' }         // hint text to guide recognition
    },
    required: ['audioPath']
  },
  ```
- `src/index.ts:24-54` (registration): MCP server capabilities registration declaring `mcp_openai_transcribe` as available (`true`). The server uses the imported tools array to handle calls.

  ```typescript
  capabilities: {
    tools: {
      mcp_sparql_execute_query: true,
      mcp_sparql_update: true,
      mcp_sparql_list_repositories: true,
      mcp_sparql_list_graphs: true,
      mcp_sparql_get_resource_info: true,
      mcp_ollama_run: true,
      mcp_ollama_show: true,
      mcp_ollama_pull: true,
      mcp_ollama_list: true,
      mcp_ollama_rm: true,
      mcp_ollama_chat_completion: true,
      mcp_ollama_status: true,
      mcp_http_request: true,
      mcp_openai_chat: true,
      mcp_openai_image: true,
      mcp_openai_tts: true,
      mcp_openai_transcribe: true,
      mcp_openai_embedding: true,
      mcp_gemini_generate_text: true,
      mcp_gemini_chat_completion: true,
      mcp_gemini_list_models: true,
      mcp_gemini_generate_images: false,
      mcp_gemini_generate_image: false,
      mcp_gemini_generate_videos: false,
      mcp_gemini_generate_multimodal_content: false,
      mcp_imagen_generate: false,
      mcp_gemini_create_image: false,
      mcp_gemini_edit_image: false
    },
  ```
- The core `speechToText` method in `OpenAIService` that performs the actual OpenAI Whisper API call for audio transcription using multipart form data.

  ```typescript
  async speechToText(args: {
    audioPath: string;
    model?: string;
    language?: string;
    prompt?: string;
  }): Promise<string> {
    try {
      if (!OPENAI_API_KEY) {
        // "OPENAI_API_KEY가 설정되지 않았습니다." = "OPENAI_API_KEY is not set."
        throw new McpError(
          ErrorCode.InternalError,
          'OPENAI_API_KEY가 설정되지 않았습니다.'
        );
      }
      if (!fs.existsSync(args.audioPath)) {
        // "오디오 파일을 찾을 수 없습니다" = "audio file not found"
        throw new McpError(
          ErrorCode.InternalError,
          `오디오 파일을 찾을 수 없습니다: ${args.audioPath}`
        );
      }

      const formData = new FormData();
      const fileBlob = new Blob([fs.readFileSync(args.audioPath)]);
      formData.append('file', fileBlob, path.basename(args.audioPath));
      formData.append('model', args.model || 'whisper-1');
      if (args.language) {
        formData.append('language', args.language);
      }
      if (args.prompt) {
        formData.append('prompt', args.prompt);
      }

      const response = await axios.post(
        `${OPENAI_API_BASE}/audio/transcriptions`,
        formData,
        {
          headers: {
            'Content-Type': 'multipart/form-data',
            'Authorization': `Bearer ${OPENAI_API_KEY}`
          }
        }
      );
      return JSON.stringify(response.data, null, 2);
    } catch (error) {
      if (axios.isAxiosError(error)) {
        const statusCode = error.response?.status;
        const responseData = error.response?.data;
        // "OpenAI API 오류" = "OpenAI API error"
        throw new McpError(
          ErrorCode.InternalError,
          `OpenAI API 오류 (${statusCode}): ${
            typeof responseData === 'object'
              ? JSON.stringify(responseData, null, 2)
              : responseData || error.message
          }`
        );
      }
      // "음성 인식 요청 실패" = "speech recognition request failed"
      throw new McpError(ErrorCode.InternalError, `음성 인식 요청 실패: ${formatError(error)}`);
    }
  }
  ```
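  The method returns the raw API response pretty-printed with `JSON.stringify`. With the Whisper API's default `json` response format the transcript is in the `text` field, so a caller can recover it as sketched below. Also note that `fs.readFileSync` loads the whole file into memory and OpenAI's transcription endpoint caps uploads at 25 MB, so longer recordings must be split beforehand.

  ```typescript
  // Illustrative only: `result` is the string returned by speechToText.
  // The default Whisper response looks like { "text": "..." }.
  const parsed = JSON.parse(result);
  console.log(parsed.text);
  ```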