# audio_recognition
Analyze and transcribe audio files using Google Gemini AI. Provide a file path and, optionally, a custom prompt or model name for accurate content recognition and transcription.
## Instructions
Analyze and transcribe audio using Google Gemini AI
## Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| filepath | Yes | Path to the media file to analyze | |
| modelname | No | Gemini model to use for recognition | gemini-2.0-flash |
| prompt | No | Custom prompt for the recognition | Describe this content |
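For reference, here is a hypothetical set of arguments matching the schema above, written as a TypeScript object literal (the file path and prompt text are illustrative, not from the source):

```typescript
// Hypothetical arguments for the audio_recognition tool.
// Only `filepath` is required; `prompt` and `modelname` fall back to
// the defaults in the table above when omitted.
const args = {
  filepath: "/tmp/meeting.mp3",              // illustrative path
  prompt: "Transcribe this audio verbatim",  // optional override
  modelname: "gemini-2.0-flash",             // optional (the default)
};

console.log(JSON.stringify(args, null, 2));
```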
## Implementation Reference
- **src/tools/audio-recognition.ts:21-87 (handler)** — the callback implementing the tool's core logic: file validation, upload to the Gemini service, processing with the optional prompt and model, error handling, and returning a structured `CallToolResult`.

  ```typescript
  callback: async (args: AudioRecognitionParams): Promise<CallToolResult> => {
    try {
      log.info(`Processing audio recognition request for file: ${args.filepath}`);
      log.verbose('Audio recognition request', JSON.stringify(args));

      // Verify file exists
      if (!fs.existsSync(args.filepath)) {
        throw new Error(`Audio file not found: ${args.filepath}`);
      }

      // Verify file is an audio file
      const ext = path.extname(args.filepath).toLowerCase();
      if (!['.mp3', '.wav', '.ogg'].includes(ext)) {
        throw new Error(`Unsupported audio format: ${ext}. Supported formats are: .mp3, .wav, .ogg`);
      }

      // Default prompt and model if not provided
      const prompt = args.prompt || 'Describe this audio';
      const modelName = args.modelname || 'gemini-2.0-flash';

      // Upload the file
      log.info('Uploading audio file...');
      const file = await geminiService.uploadFile(args.filepath);

      // Process with Gemini
      log.info('Generating content from audio...');
      const result = await geminiService.processFile(file, prompt, modelName);

      if (result.isError) {
        log.error(`Error in audio recognition: ${result.text}`);
        return {
          content: [{ type: 'text', text: result.text }],
          isError: true
        };
      }

      log.info('Audio recognition completed successfully');
      log.verbose('Audio recognition result', JSON.stringify(result));

      return {
        content: [{ type: 'text', text: result.text }]
      };
    } catch (error) {
      log.error('Error in audio recognition tool', error);
      const errorMessage = error instanceof Error ? error.message : String(error);
      return {
        content: [{ type: 'text', text: `Error processing audio: ${errorMessage}` }],
        isError: true
      };
    }
  }
  ```
- **src/types/index.ts:11-35 (schema)** — defines the input schema using Zod: a common `RecognitionParamsSchema` (filepath, optional prompt, optional modelname) extended into `AudioRecognitionParamsSchema`.

  ```typescript
  export const RecognitionParamsSchema = z.object({
    filepath: z.string().describe('Path to the media file to analyze'),
    prompt: z.string().default('Describe this content').describe('Custom prompt for the recognition'),
    modelname: z.string().default('gemini-2.0-flash').describe('Gemini model to use for recognition')
  });

  export type RecognitionParams = z.infer<typeof RecognitionParamsSchema>;

  /**
   * Video recognition specific types
   */
  export const VideoRecognitionParamsSchema = RecognitionParamsSchema.extend({});
  export type VideoRecognitionParams = z.infer<typeof VideoRecognitionParamsSchema>;

  /**
   * Image recognition specific types
   */
  export const ImageRecognitionParamsSchema = RecognitionParamsSchema.extend({});
  export type ImageRecognitionParams = z.infer<typeof ImageRecognitionParamsSchema>;

  /**
   * Audio recognition specific types
   */
  export const AudioRecognitionParamsSchema = RecognitionParamsSchema.extend({});
  export type AudioRecognitionParams = z.infer<typeof AudioRecognitionParamsSchema>;
  ```
- **src/server.ts:54-70 (registration)** — creates the audio_recognition tool instance and registers it with the MCP server via `mcpServer.tool()`.

  ```typescript
  const audioRecognitionTool = createAudioRecognitionTool(this.geminiService);
  const videoRecognitionTool = createVideoRecognitionTool(this.geminiService);

  // Register tools with MCP server
  this.mcpServer.tool(
    imageRecognitionTool.name,
    imageRecognitionTool.description,
    imageRecognitionTool.inputSchema.shape,
    imageRecognitionTool.callback
  );
  this.mcpServer.tool(
    audioRecognitionTool.name,
    audioRecognitionTool.description,
    audioRecognitionTool.inputSchema.shape,
    audioRecognitionTool.callback
  );
  ```
- **src/tools/audio-recognition.ts:16-89 (helper)** — factory function that builds the tool definition object (name, description, schema, and handler callback) for audio_recognition. The callback body is identical to the handler excerpt at lines 21-87 above, so it is elided here.

  ```typescript
  export const createAudioRecognitionTool = (geminiService: GeminiService) => {
    return {
      name: 'audio_recognition',
      description: 'Analyze and transcribe audio using Google Gemini AI',
      inputSchema: AudioRecognitionParamsSchema,
      callback: async (args: AudioRecognitionParams): Promise<CallToolResult> => {
        /* ...identical to the handler reference above... */
      }
    };
  };
  ```
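The references above can be condensed into a minimal self-contained sketch of the factory pattern: validation, defaulting, and error-wrapping live inside a callback that closes over the injected service. The `RecognitionService` interface and `createAudioTool` name below are illustrative stand-ins, not the source's actual `GeminiService` API:

```typescript
import * as path from "node:path";

// Simplified shapes; the real types come from the MCP SDK and Zod.
type CallToolResult = { content: { type: "text"; text: string }[]; isError?: boolean };
type AudioArgs = { filepath: string; prompt?: string; modelname?: string };

// Hypothetical service interface standing in for GeminiService.
interface RecognitionService {
  recognize(filepath: string, prompt: string, model: string): Promise<string>;
}

const SUPPORTED = [".mp3", ".wav", ".ogg"];

// Factory mirroring createAudioRecognitionTool: closes over the service,
// validates the extension, applies defaults, and converts thrown errors
// into an isError result instead of propagating them.
function createAudioTool(service: RecognitionService) {
  return {
    name: "audio_recognition",
    description: "Analyze and transcribe audio using Google Gemini AI",
    callback: async (args: AudioArgs): Promise<CallToolResult> => {
      try {
        const ext = path.extname(args.filepath).toLowerCase();
        if (!SUPPORTED.includes(ext)) {
          throw new Error(`Unsupported audio format: ${ext}. Supported formats are: ${SUPPORTED.join(", ")}`);
        }
        const text = await service.recognize(
          args.filepath,
          args.prompt ?? "Describe this audio",
          args.modelname ?? "gemini-2.0-flash",
        );
        return { content: [{ type: "text", text }] };
      } catch (error) {
        const msg = error instanceof Error ? error.message : String(error);
        return { content: [{ type: "text", text: `Error processing audio: ${msg}` }], isError: true };
      }
    },
  };
}
```

Injecting the service through the factory (rather than importing it globally) is what lets the callback be exercised with a stub service in tests.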