image_recognition

Analyzes and describes images using Google Gemini AI by processing file paths and custom prompts, enabling detailed content recognition for diverse use cases.

Instructions

Analyze and describe images using Google Gemini AI

Input Schema

Name       Required  Description                            Default
filepath   Yes       Path to the media file to analyze      (none)
modelname  No        Gemini model to use for recognition    gemini-2.0-flash
prompt     No        Custom prompt for the recognition      Describe this content
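
A minimal invocation sketch based on the parameters above (the argument type and values here are illustrative, not from the repository). Only filepath is required; note that the handler shown below falls back to 'Describe this image' when prompt is omitted, while the schema table lists 'Describe this content' as its default.

```typescript
// Hypothetical argument shape for an image_recognition call.
type ImageRecognitionArgs = {
  filepath: string;   // path to the image on disk (required)
  modelname?: string; // falls back to 'gemini-2.0-flash'
  prompt?: string;    // falls back to 'Describe this image' in the handler
};

const args: ImageRecognitionArgs = { filepath: '/tmp/photo.jpg' };

// Defaults as applied by the handler shown below.
const modelname = args.modelname ?? 'gemini-2.0-flash';
const prompt = args.prompt ?? 'Describe this image';
```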

Implementation Reference

  • The main handler function (callback) that implements the image_recognition tool logic. It validates the image file, uploads it using GeminiService, processes it with a prompt, and returns the result or error.
    callback: async (args: ImageRecognitionParams): Promise<CallToolResult> => {
      try {
        log.info(`Processing image recognition request for file: ${args.filepath}`);
        log.verbose('Image recognition request', JSON.stringify(args));
        
        // Verify file exists
        if (!fs.existsSync(args.filepath)) {
          throw new Error(`Image file not found: ${args.filepath}`);
        }
        
        // Verify file is an image
        const ext = path.extname(args.filepath).toLowerCase();
        if (!['.jpg', '.jpeg', '.png', '.webp'].includes(ext)) {
          throw new Error(`Unsupported image format: ${ext}. Supported formats are: .jpg, .jpeg, .png, .webp`);
        }
        
        // Default prompt if not provided
        const prompt = args.prompt || 'Describe this image';
        const modelName = args.modelname || 'gemini-2.0-flash';
        
        // Upload the file
        log.info('Uploading image file...');
        const file = await geminiService.uploadFile(args.filepath);
        
        // Process with Gemini
        log.info('Generating content from image...');
        const result = await geminiService.processFile(file, prompt, modelName);
        
        if (result.isError) {
          log.error(`Error in image recognition: ${result.text}`);
          return {
            content: [
              {
                type: 'text',
                text: result.text
              }
            ],
            isError: true
          };
        }
        
        log.info('Image recognition completed successfully');
        log.verbose('Image recognition result', JSON.stringify(result));
        
        return {
          content: [
            {
              type: 'text',
              text: result.text
            }
          ]
        };
      } catch (error) {
        log.error('Error in image recognition tool', error);
        const errorMessage = error instanceof Error ? error.message : String(error);
        
        return {
          content: [
            {
              type: 'text',
              text: `Error processing image: ${errorMessage}`
            }
          ],
          isError: true
        };
      }
    }
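
The extension check in the handler can be isolated as a small helper for reuse or unit testing. This is a sketch, not code from the repository; the helper name is hypothetical.

```typescript
import * as path from 'path';

// Hypothetical helper mirroring the handler's extension check above.
const SUPPORTED_IMAGE_EXTENSIONS = ['.jpg', '.jpeg', '.png', '.webp'];

function assertSupportedImage(filepath: string): void {
  const ext = path.extname(filepath).toLowerCase();
  if (!SUPPORTED_IMAGE_EXTENSIONS.includes(ext)) {
    throw new Error(
      `Unsupported image format: ${ext}. Supported formats are: ${SUPPORTED_IMAGE_EXTENSIONS.join(', ')}`
    );
  }
}
```

Note that the check is purely name-based: a .png extension on a non-image file still passes, so Gemini-side processing errors remain possible.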
  • Zod schema for the input parameters of the image_recognition tool. It extends the shared RecognitionParamsSchema with no additional fields, so filepath, prompt, and modelname are all inherited from the base schema.
    export const ImageRecognitionParamsSchema = RecognitionParamsSchema.extend({});
    export type ImageRecognitionParams = z.infer<typeof ImageRecognitionParamsSchema>;
  • src/server.ts:58-62 (registration)
    Registration of the image_recognition tool with the MCP server via the mcpServer.tool() method, passing the name, description, input schema shape, and callback.
    this.mcpServer.tool(
      imageRecognitionTool.name,
      imageRecognitionTool.description,
      imageRecognitionTool.inputSchema.shape,
      imageRecognitionTool.callback
    );
  • src/server.ts:53 (registration)
    Creation of the imageRecognitionTool instance using createImageRecognitionTool factory function before registration.
    const imageRecognitionTool = createImageRecognitionTool(this.geminiService);
  • Factory function that creates the full tool definition object for image_recognition, including name, description, schema, and handler callback.
    export const createImageRecognitionTool = (geminiService: GeminiService) => {
      return {
        name: 'image_recognition',
        description: 'Analyze and describe images using Google Gemini AI',
        inputSchema: ImageRecognitionParamsSchema,
        callback: async (args: ImageRecognitionParams): Promise<CallToolResult> => {
          try {
            log.info(`Processing image recognition request for file: ${args.filepath}`);
            log.verbose('Image recognition request', JSON.stringify(args));
            
            // Verify file exists
            if (!fs.existsSync(args.filepath)) {
              throw new Error(`Image file not found: ${args.filepath}`);
            }
            
            // Verify file is an image
            const ext = path.extname(args.filepath).toLowerCase();
            if (!['.jpg', '.jpeg', '.png', '.webp'].includes(ext)) {
              throw new Error(`Unsupported image format: ${ext}. Supported formats are: .jpg, .jpeg, .png, .webp`);
            }
            
            // Default prompt if not provided
            const prompt = args.prompt || 'Describe this image';
            const modelName = args.modelname || 'gemini-2.0-flash';
            
            // Upload the file
            log.info('Uploading image file...');
            const file = await geminiService.uploadFile(args.filepath);
            
            // Process with Gemini
            log.info('Generating content from image...');
            const result = await geminiService.processFile(file, prompt, modelName);
            
            if (result.isError) {
              log.error(`Error in image recognition: ${result.text}`);
              return {
                content: [
                  {
                    type: 'text',
                    text: result.text
                  }
                ],
                isError: true
              };
            }
            
            log.info('Image recognition completed successfully');
            log.verbose('Image recognition result', JSON.stringify(result));
            
            return {
              content: [
                {
                  type: 'text',
                  text: result.text
                }
              ]
            };
          } catch (error) {
            log.error('Error in image recognition tool', error);
            const errorMessage = error instanceof Error ? error.message : String(error);
            
            return {
              content: [
                {
                  type: 'text',
                  text: `Error processing image: ${errorMessage}`
                }
              ],
              isError: true
            };
          }
        }
      };
    };
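  • For reference, the two result shapes the callback can return, sketched as standalone objects. CallToolResult itself comes from the MCP SDK; the type names below are illustrative.

```typescript
// Illustrative shapes matching the handler's return values above.
type TextContent = { type: 'text'; text: string };
type CallToolResultSketch = { content: TextContent[]; isError?: boolean };

// Success: isError is absent and content carries the model's description.
const success: CallToolResultSketch = {
  content: [{ type: 'text', text: 'A cat sitting on a windowsill.' }],
};

// Failure: isError is true and the text explains what went wrong.
const failure: CallToolResultSketch = {
  content: [{ type: 'text', text: 'Error processing image: Image file not found: ./missing.png' }],
  isError: true,
};
```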
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden of behavioral disclosure. While 'analyze and describe' implies a read-only operation, it doesn't specify whether this requires API keys, has rate limits, handles errors, or what the output format looks like. For a tool with no annotations and no output schema, this leaves significant behavioral gaps.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise: a single sentence that directly states the tool's purpose without any unnecessary words. It's front-loaded with the core functionality and uses efficient language. Every word earns its place in this minimal description.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given that there are no annotations, no output schema, and this is an AI analysis tool with potential behavioral complexities, the description is insufficiently complete. It doesn't explain what kind of analysis or description will be returned, doesn't mention authentication requirements for Google Gemini AI, and provides no context about limitations or error handling.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents all three parameters (filepath, modelname, prompt) with clear descriptions. The tool description adds no additional parameter semantics beyond what's in the schema. According to the rules, when schema coverage is high (>80%), the baseline score is 3 even with no param info in the description.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Analyze and describe images using Google Gemini AI'. It specifies the verb ('analyze and describe'), resource ('images'), and technology ('Google Gemini AI'), making the purpose unambiguous. However, it doesn't explicitly differentiate from sibling tools like audio_recognition or video_recognition, which would require mentioning it's specifically for images.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. It doesn't mention sibling tools (audio_recognition, video_recognition) or any context for choosing this specific image analysis tool over others. There's no information about prerequisites, limitations, or typical use cases.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
