
Gemini MCP Server for Claude Desktop

gemini-analyze-image

Analyze images using Gemini's vision capabilities to extract summaries, identify objects, read text, or provide detailed insights based on user preferences and context.

Instructions

Analyze images using Gemini's multimodal vision capabilities (with learned user preferences)

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| file_path | Yes | Path to the image file to analyze (supports JPEG, PNG, WebP, HEIC, HEIF, BMP, GIF) | — |
| analysis_type | No | Type of analysis to perform: "summary", "objects", "text", "detailed", or "custom" | "summary" |
| context | No | Optional context for intelligent enhancement (e.g., "medical", "architectural", "nature") | — |
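The parameters above can be combined into a call like the following. This is a hypothetical arguments object (the file path is illustrative, and the exact MCP request envelope depends on the client), paired with a client-side check that mirrors the schema's required-field and enum constraints:

```javascript
// Hypothetical arguments for a gemini-analyze-image call.
const args = {
  file_path: '/photos/receipt.jpg', // required
  analysis_type: 'text',            // optional; the handler defaults to 'summary'
  context: 'expense report',        // optional free-form hint
};

// Validation mirroring the schema: file_path required, analysis_type enum-bound.
function validateArgs(a) {
  const types = ['summary', 'objects', 'text', 'detailed', 'custom'];
  if (typeof a.file_path !== 'string' || a.file_path.length === 0) {
    throw new Error('file_path is required and must be a non-empty string');
  }
  if (a.analysis_type !== undefined && !types.includes(a.analysis_type)) {
    throw new Error(`invalid analysis_type: ${a.analysis_type}`);
  }
  return true;
}
```

Note that `context` doubles as the prompt itself when `analysis_type` is "custom", per the switch statement in the handler below.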

Implementation Reference

  • The core handler function that executes the image analysis using Gemini's vision model. It validates inputs, loads and base64-encodes the image, generates a tailored prompt based on analysis_type and context, enhances the prompt via the intelligence system if available, calls the Gemini service, records the interaction for learning, and returns a formatted response.
    async execute(args) {
      const filePath = validateNonEmptyString(args.file_path, 'file_path');
      const analysisType = args.analysis_type ? validateString(args.analysis_type, 'analysis_type', ['summary', 'objects', 'text', 'detailed', 'custom']) : 'summary';
      const context = args.context ? validateString(args.context, 'context') : null;
    
      log(`Analyzing image file: "${filePath}" with analysis type: "${analysisType}" and context: ${context || 'general'}`, this.name);
    
      try {
        validateFileSize(filePath, config.MAX_IMAGE_SIZE_MB);
        const imageBuffer = readFileAsBuffer(filePath);
        const imageBase64 = imageBuffer.toString('base64');
        const mimeType = getMimeType(filePath, config.SUPPORTED_IMAGE_MIMES);
    
        log(`Image file loaded: ${(imageBuffer.length / 1024).toFixed(2)}KB, MIME type: ${mimeType}`, this.name);
    
        let baseAnalysisPrompt;
        switch (analysisType) {
          case 'summary':
            baseAnalysisPrompt = 'Please provide a comprehensive summary of this image. Describe what you see, including objects, people, settings, colors, composition, and overall content.'; // eslint-disable-line max-len
            break;
          case 'objects':
            baseAnalysisPrompt = 'Please identify and describe all objects, people, text, and visual elements visible in this image. List them systematically with their locations and characteristics.'; // eslint-disable-line max-len
            break;
          case 'text':
            baseAnalysisPrompt = 'Please extract and transcribe all text visible in this image. Include any signs, labels, captions, or written content you can read.'; // eslint-disable-line max-len
            break;
          case 'detailed':
            baseAnalysisPrompt = 'Please provide a detailed analysis of this image including: visual description, objects and people present, text content, colors and composition, mood or atmosphere, and any notable details or artistic elements.'; // eslint-disable-line max-len
            break;
          case 'custom':
            baseAnalysisPrompt = context || 'Please analyze this image and describe what you observe.';
            break;
          default:
            baseAnalysisPrompt = 'Please provide a summary of this image content.';
        }
    
        let enhancedAnalysisPrompt = baseAnalysisPrompt;
        if (this.intelligenceSystem.initialized) {
          try {
            enhancedAnalysisPrompt = await this.intelligenceSystem.enhancePrompt(baseAnalysisPrompt, context, this.name);
            log('Applied Tool Intelligence enhancement', this.name);
          } catch (err) {
            log(`Tool Intelligence enhancement failed: ${err.message}`, this.name);
          }
        }
    
        let analysisPrompt = enhancedAnalysisPrompt;
        if (context && analysisType !== 'custom') {
          analysisPrompt += ` Additional context: ${context}`;
        }
    
        const analysisText = await this.geminiService.analyzeImage('IMAGE_ANALYSIS', analysisPrompt, imageBase64, mimeType);
    
        if (analysisText) {
          log('Image analysis completed successfully', this.name);
    
          if (this.intelligenceSystem.initialized) {
            try {
              const resultSummary = `Image analysis completed successfully: ${analysisText.length} characters, type: ${analysisType}`; // eslint-disable-line max-len
              await this.intelligenceSystem.learnFromInteraction(baseAnalysisPrompt, enhancedAnalysisPrompt, resultSummary, context, this.name);
              log('Tool Intelligence learned from interaction', this.name);
            } catch (err) {
              log(`Tool Intelligence learning failed: ${err.message}`, this.name);
            }
          }
    
          let finalResponse = `✓ Image file analyzed successfully:\n\n**File:** ${filePath}\n**Size:** ${(imageBuffer.length / 1024).toFixed(2)}KB\n**Format:** ${filePath.split('.').pop().toUpperCase()}\n**Analysis Type:** ${analysisType}\n\n**Analysis:**\n${analysisText}`; // eslint-disable-line max-len
          if (context && this.intelligenceSystem.initialized) {
            finalResponse += `\n\n---\n_Enhancement applied based on context: ${context}_`;
          }
    
          return {
            content: [
              {
                type: 'text',
                text: finalResponse,
              },
            ],
          };
        }
        log('No analysis text generated', this.name);
        return {
          content: [
            {
              type: 'text',
              text: `Could not analyze image file: "${filePath}". The image may be corrupted, too complex, or in an unsupported format.`,
            },
          ],
        };
      } catch (error) {
        log(`Error analyzing image: ${error.message}`, this.name);
        throw new Error(`Error analyzing image: ${error.message}`);
      }
    }
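The handler leans on several helpers (`validateNonEmptyString`, `validateString`, `getMimeType`, and others) whose implementations are not shown on this page. A minimal sketch of how they might behave, assuming `SUPPORTED_IMAGE_MIMES` maps file extensions to MIME types — the server's real helpers may differ:

```javascript
// Hypothetical implementations of the validation helpers used by execute().
function validateNonEmptyString(value, name) {
  if (typeof value !== 'string' || value.trim() === '') {
    throw new Error(`${name} must be a non-empty string`);
  }
  return value;
}

function validateString(value, name, allowed) {
  if (typeof value !== 'string') {
    throw new Error(`${name} must be a string`);
  }
  if (allowed && !allowed.includes(value)) {
    throw new Error(`${name} must be one of: ${allowed.join(', ')}`);
  }
  return value;
}

// Maps a file extension to a MIME type, rejecting unsupported formats.
function getMimeType(filePath, supportedMimes) {
  const ext = filePath.split('.').pop().toLowerCase();
  const mime = supportedMimes[ext];
  if (!mime) {
    throw new Error(`Unsupported image format: .${ext}`);
  }
  return mime;
}
```

Under this reading, an unsupported extension or an out-of-enum `analysis_type` surfaces as a thrown error, which the handler's catch block rewraps as "Error analyzing image: …".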
  • JSON schema defining the input parameters for the gemini-analyze-image tool, including file_path (required), analysis_type (enum), and optional context.
    {
      type: 'object',
      properties: {
        file_path: {
          type: 'string',
          description: 'Path to the image file to analyze (supports JPEG, PNG, WebP, HEIC, HEIF, BMP, GIF)',
        },
        analysis_type: {
          type: 'string',
          description: 'Type of analysis to perform: "summary", "objects", "text", "detailed", or "custom"',
          enum: ['summary', 'objects', 'text', 'detailed', 'custom'],
        },
        context: {
          type: 'string',
          description: 'Optional context for intelligent enhancement (e.g., "medical", "architectural", "nature")',
        },
      },
      required: ['file_path'],
    },
  • Registers the ImageAnalysisTool instance (named 'gemini-analyze-image') with the tool registry by calling registerTool.
    registerTool(new ImageAnalysisTool(intelligenceSystem, geminiService));
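`registerTool` presumably adds the instance to a name-keyed registry that the server consults when a client invokes a tool. A minimal sketch under that assumption (the actual registry implementation is not shown on this page):

```javascript
// Hypothetical name-keyed tool registry; the server's real one may differ.
const toolRegistry = new Map();

function registerTool(tool) {
  if (toolRegistry.has(tool.name)) {
    throw new Error(`Tool already registered: ${tool.name}`);
  }
  toolRegistry.set(tool.name, tool);
}

// Lookup used when an MCP client calls a tool by name.
function getTool(name) {
  const tool = toolRegistry.get(name);
  if (!tool) throw new Error(`Unknown tool: ${name}`);
  return tool;
}
```

Rejecting duplicate names keeps a misconfigured server from silently shadowing one tool with another.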
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It mentions 'multimodal vision capabilities' and 'learned user preferences' but doesn't explain what these mean operationally. It doesn't disclose whether this is a read-only operation, what permissions are needed, what rate limits apply, what error conditions exist, or what the output format looks like. For a tool with no annotation coverage, this leaves significant behavioral gaps.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that states the core functionality. The parenthetical about 'learned user preferences' adds some context without being verbose. However, the phrase 'learned user preferences' is somewhat vague and could be more precisely explained to earn full marks.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity of image analysis with multiple parameter options and no output schema, the description is insufficiently complete. It doesn't explain what different analysis types produce, how 'learned user preferences' affect results, or what format the analysis returns. For a tool with 3 parameters (including an enum with 5 options) and no annotations, more contextual information is needed.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description adds no parameter-specific information beyond what's already in the schema. Since schema description coverage is 100%, the baseline score is 3. The description doesn't explain the meaning of 'learned user preferences' in relation to parameters, nor does it provide additional context about parameter interactions or usage examples.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose as 'Analyze images using Gemini's multimodal vision capabilities' with the specific verb 'analyze' and resource 'images'. It distinguishes from siblings like 'gemini-edit-image' (editing) and 'gemini-analyze-video' (video analysis) by focusing on image analysis. However, it doesn't explicitly differentiate from 'gemini-advanced-image' which might have overlapping functionality.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. It mentions 'learned user preferences' but doesn't explain how this affects tool selection. There's no mention of when to choose this over 'gemini-advanced-image', 'gemini-analyze-video', or other sibling tools, nor any prerequisites or exclusions for usage.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
