image_recognition
Analyzes and describes an image with Google Gemini AI. Given a file path and an optional prompt, the tool uploads the image (.jpg, .jpeg, .png, or .webp) and returns the model's text response.
Instructions
Analyze and describe images using Google Gemini AI
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| filepath | Yes | Path to the media file to analyze | |
| modelname | No | Gemini model to use for recognition | gemini-2.0-flash |
| prompt | No | Custom prompt for the recognition | Describe this content |
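
To make the table concrete, here is a minimal sketch of how a caller might fill in the optional fields before invoking the tool. The `ImageRecognitionArgs` interface and the `resolveArgs` helper are hypothetical names introduced for illustration; only the field names and default values are taken from the table. Note that the handler listed under Implementation Reference falls back to `'Describe this image'` when no prompt is given, so the prompt default here follows the table rather than the handler.

```typescript
// Hypothetical mirror of the tool's input; field names come from the table above.
interface ImageRecognitionArgs {
  filepath: string;   // required
  modelname?: string; // optional
  prompt?: string;    // optional
}

// Defaults as listed in the "Default" column of the table.
const DEFAULT_MODEL = 'gemini-2.0-flash';
const DEFAULT_PROMPT = 'Describe this content';

// Returns a copy of the arguments with every optional field filled in.
function resolveArgs(args: ImageRecognitionArgs): Required<ImageRecognitionArgs> {
  return {
    filepath: args.filepath,
    modelname: args.modelname ?? DEFAULT_MODEL,
    prompt: args.prompt ?? DEFAULT_PROMPT,
  };
}

// Only filepath is supplied, so the other two fields take their defaults.
const resolved = resolveArgs({ filepath: '/tmp/cat.png' });
console.log(resolved);
```

A caller that wants a different model or a more specific question only needs to set `modelname` or `prompt`; `filepath` is the single mandatory field.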
Implementation Reference
- src/tools/image-recognition.ts:21-87 (handler) The main handler function (callback) that implements the image_recognition tool logic. It validates the image file, uploads it using GeminiService, processes it with a prompt, and returns the result or error.

  ```typescript
  callback: async (args: ImageRecognitionParams): Promise<CallToolResult> => {
    try {
      log.info(`Processing image recognition request for file: ${args.filepath}`);
      log.verbose('Image recognition request', JSON.stringify(args));

      // Verify file exists
      if (!fs.existsSync(args.filepath)) {
        throw new Error(`Image file not found: ${args.filepath}`);
      }

      // Verify file is an image
      const ext = path.extname(args.filepath).toLowerCase();
      if (!['.jpg', '.jpeg', '.png', '.webp'].includes(ext)) {
        throw new Error(`Unsupported image format: ${ext}. Supported formats are: .jpg, .jpeg, .png, .webp`);
      }

      // Default prompt if not provided
      const prompt = args.prompt || 'Describe this image';
      const modelName = args.modelname || 'gemini-2.0-flash';

      // Upload the file
      log.info('Uploading image file...');
      const file = await geminiService.uploadFile(args.filepath);

      // Process with Gemini
      log.info('Generating content from image...');
      const result = await geminiService.processFile(file, prompt, modelName);

      if (result.isError) {
        log.error(`Error in image recognition: ${result.text}`);
        return {
          content: [
            {
              type: 'text',
              text: result.text
            }
          ],
          isError: true
        };
      }

      log.info('Image recognition completed successfully');
      log.verbose('Image recognition result', JSON.stringify(result));

      return {
        content: [
          {
            type: 'text',
            text: result.text
          }
        ]
      };
    } catch (error) {
      log.error('Error in image recognition tool', error);
      const errorMessage = error instanceof Error ? error.message : String(error);

      return {
        content: [
          {
            type: 'text',
            text: `Error processing image: ${errorMessage}`
          }
        ],
        isError: true
      };
    }
  }
  ```
- src/types/index.ts:28-29 (schema) Zod schema for the input parameters of the image_recognition tool. It extends the common RecognitionParamsSchema without adding any fields, so filepath, the optional prompt, and the optional modelname are all inherited from that base schema.

  ```typescript
  export const ImageRecognitionParamsSchema = RecognitionParamsSchema.extend({});
  export type ImageRecognitionParams = z.infer<typeof ImageRecognitionParamsSchema>;
  ```
- src/server.ts:58-62 (registration) Registration of the image_recognition tool with the MCP server using the mcpServer.tool() method, passing the name, description, input schema shape, and callback.

  ```typescript
  this.mcpServer.tool(
    imageRecognitionTool.name,
    imageRecognitionTool.description,
    imageRecognitionTool.inputSchema.shape,
    imageRecognitionTool.callback
  );
  ```
- src/server.ts:53-53 (registration) Creation of the imageRecognitionTool instance using the createImageRecognitionTool factory function before registration.

  ```typescript
  const imageRecognitionTool = createImageRecognitionTool(this.geminiService);
  ```
- src/tools/image-recognition.ts:16-89 (handler) Factory function that creates the full tool definition object for image_recognition, including name, description, schema, and handler callback.

  ```typescript
  export const createImageRecognitionTool = (geminiService: GeminiService) => {
    return {
      name: 'image_recognition',
      description: 'Analyze and describe images using Google Gemini AI',
      inputSchema: ImageRecognitionParamsSchema,
      callback: async (args: ImageRecognitionParams): Promise<CallToolResult> => {
        try {
          log.info(`Processing image recognition request for file: ${args.filepath}`);
          log.verbose('Image recognition request', JSON.stringify(args));

          // Verify file exists
          if (!fs.existsSync(args.filepath)) {
            throw new Error(`Image file not found: ${args.filepath}`);
          }

          // Verify file is an image
          const ext = path.extname(args.filepath).toLowerCase();
          if (!['.jpg', '.jpeg', '.png', '.webp'].includes(ext)) {
            throw new Error(`Unsupported image format: ${ext}. Supported formats are: .jpg, .jpeg, .png, .webp`);
          }

          // Default prompt if not provided
          const prompt = args.prompt || 'Describe this image';
          const modelName = args.modelname || 'gemini-2.0-flash';

          // Upload the file
          log.info('Uploading image file...');
          const file = await geminiService.uploadFile(args.filepath);

          // Process with Gemini
          log.info('Generating content from image...');
          const result = await geminiService.processFile(file, prompt, modelName);

          if (result.isError) {
            log.error(`Error in image recognition: ${result.text}`);
            return {
              content: [
                {
                  type: 'text',
                  text: result.text
                }
              ],
              isError: true
            };
          }

          log.info('Image recognition completed successfully');
          log.verbose('Image recognition result', JSON.stringify(result));

          return {
            content: [
              {
                type: 'text',
                text: result.text
              }
            ]
          };
        } catch (error) {
          log.error('Error in image recognition tool', error);
          const errorMessage = error instanceof Error ? error.message : String(error);

          return {
            content: [
              {
                type: 'text',
                text: `Error processing image: ${errorMessage}`
              }
            ],
            isError: true
          };
        }
      }
    };
  };
  ```
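
The factory-plus-handler pattern in the references above can be exercised without a Gemini key by injecting a stub service. The following is a minimal sketch under stated assumptions, not the project's code: `RecognitionService`, `ToolResult`, `createSketchTool`, `stubService`, and the `Map`-based registry are hypothetical stand-ins for `GeminiService`, `CallToolResult`, the real factory, and the MCP server. The file-existence check and logging are omitted to keep the block dependency-free, and the extension is read with plain string operations rather than `path.extname`.

```typescript
// Minimal result shape modelled on the handler's return values
// (an assumption, not the SDK's full CallToolResult type).
interface ToolResult {
  content: { type: 'text'; text: string }[];
  isError?: boolean;
}

// Hypothetical stand-in for GeminiService: only the two calls the handler makes.
interface RecognitionService {
  uploadFile(filepath: string): Promise<{ uri: string }>;
  processFile(
    file: { uri: string },
    prompt: string,
    modelName: string,
  ): Promise<{ text: string; isError?: boolean }>;
}

interface SketchArgs {
  filepath: string;
  prompt?: string;
  modelname?: string;
}

const SUPPORTED = ['.jpg', '.jpeg', '.png', '.webp'];

// Factory: closes over the injected service and returns the tool definition.
const createSketchTool = (service: RecognitionService) => ({
  name: 'image_recognition',
  callback: async (args: SketchArgs): Promise<ToolResult> => {
    try {
      // Reject anything that is not a supported image extension.
      const dot = args.filepath.lastIndexOf('.');
      const ext = dot >= 0 ? args.filepath.slice(dot).toLowerCase() : '';
      if (!SUPPORTED.includes(ext)) {
        throw new Error(`Unsupported image format: ${ext}`);
      }

      // Upload, then ask the model, using the same fallbacks as the handler.
      const file = await service.uploadFile(args.filepath);
      const result = await service.processFile(
        file,
        args.prompt || 'Describe this image',
        args.modelname || 'gemini-2.0-flash',
      );

      const response: ToolResult = { content: [{ type: 'text', text: result.text }] };
      if (result.isError) {
        response.isError = true;
      }
      return response;
    } catch (error) {
      // Every failure is reported as a tool result rather than a thrown error.
      const message = error instanceof Error ? error.message : String(error);
      return {
        content: [{ type: 'text', text: `Error processing image: ${message}` }],
        isError: true,
      };
    }
  },
});

// Stub service that echoes its inputs instead of calling Gemini.
const stubService: RecognitionService = {
  uploadFile: async (filepath) => ({ uri: `stub://${filepath}` }),
  processFile: async (file, prompt, modelName) => ({
    text: `[${modelName}] ${prompt}: ${file.uri}`,
  }),
};

const sketchTool = createSketchTool(stubService);

// Hypothetical stand-in for server registration: map the tool name to its callback.
const registry = new Map<string, (args: SketchArgs) => Promise<ToolResult>>();
registry.set(sketchTool.name, sketchTool.callback);
```

Because the service is passed in rather than imported, the same handler logic runs against the stub in a test and against the real service in the server. The handler also never throws: both the unsupported-format branch and any service failure come back as a result with `isError: true`, which is the behavior the original callback shows.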