
MCP Server Gemini

by gurveeer

analyze_image

Analyze images with Gemini AI to answer questions about visual content, identify objects, or extract information from photos using vision capabilities.

Instructions

Analyze images using Gemini vision capabilities

Input Schema

| Name        | Required | Description                                    | Default          |
|-------------|----------|------------------------------------------------|------------------|
| prompt      | Yes      | Question or instruction about the image        |                  |
| imageUrl    | No       | URL of the image to analyze                    |                  |
| imageBase64 | No       | Base64-encoded image data (alternative to URL) |                  |
| model       | No       | Vision-capable Gemini model                    | gemini-2.5-flash |

Implementation Reference

  • Primary implementation of the analyze_image tool handler. Processes arguments, handles image data (URL or base64), calls Gemini vision API, and returns MCP response.
    private async analyzeImage(id: any, args: any): Promise<MCPResponse> {
      try {
        const model = args.model || 'gemini-2.5-flash';
    
        // Validate inputs
        if (!args.imageUrl && !args.imageBase64) {
          throw new Error('Either imageUrl or imageBase64 must be provided');
        }
    
        // Prepare image part
        let imagePart: any;
        if (args.imageUrl) {
          // For URL, we'd need to fetch and convert to base64
          // For now, we'll just pass the URL as instruction
          imagePart = {
            text: `[Image URL: ${args.imageUrl}]`
          };
        } else if (args.imageBase64) {
          // Log base64 data size for debugging
          console.error(`Image base64 length: ${args.imageBase64.length}`);
    
          // Extract MIME type and data
          const matches = args.imageBase64.match(/^data:(.+);base64,(.+)$/);
          if (matches) {
            console.error(`MIME type: ${matches[1]}, Data length: ${matches[2].length}`);
            imagePart = {
              inlineData: {
                mimeType: matches[1],
                data: matches[2]
              }
            };
          } else {
            // If no data URI format, assume raw base64
            console.error('Raw base64 data detected');
            imagePart = {
              inlineData: {
                mimeType: 'image/jpeg',
                data: args.imageBase64
              }
            };
          }
        }
    
        const result = await this.genAI.models.generateContent({
          model,
          contents: [
            {
              parts: [{ text: args.prompt }, imagePart],
              role: 'user'
            }
          ]
        });
    
        const text = result.text || '';
    
        return {
          jsonrpc: '2.0',
          id,
          result: {
            content: [
              {
                type: 'text',
                text
              }
            ]
          }
        };
      } catch (error) {
        console.error('Error in analyzeImage:', error);
        return {
          jsonrpc: '2.0',
          id,
          error: {
            code: -32603,
            message: `Image analysis failed: ${error instanceof Error ? error.message : 'Unknown error'}`
          }
        };
      }
    }
  • Zod schema for validating analyze_image tool inputs in ToolSchemas
    analyzeImage: z
      .object({
        prompt: z.string().min(1, 'Prompt is required'),
        imageUrl: CommonSchemas.imageUrl.optional(),
        imageBase64: CommonSchemas.base64Image.optional(),
        model: z.enum(['gemini-2.5-pro', 'gemini-2.5-flash', 'gemini-2.0-flash']).optional()
      })
      .refine(
        data => data.imageUrl || data.imageBase64,
        'Either imageUrl or imageBase64 must be provided'
      ),
  • Tool registration entry returned by tools/list, including description and input schema
    {
      name: 'analyze_image',
      description: 'Analyze images using Gemini vision capabilities',
      inputSchema: {
        type: 'object',
        properties: {
          prompt: {
            type: 'string',
            description: 'Question or instruction about the image'
          },
          imageUrl: {
            type: 'string',
            description: 'URL of the image to analyze'
          },
          imageBase64: {
            type: 'string',
            description: 'Base64-encoded image data (alternative to URL)'
          },
          model: {
            type: 'string',
            description: 'Vision-capable Gemini model',
            enum: ['gemini-2.5-pro', 'gemini-2.5-flash', 'gemini-2.0-flash'],
            default: 'gemini-2.5-flash'
          }
        },
        required: ['prompt']
      }
    },
  • Dispatch case in handleToolCall that routes analyze_image calls to the handler method.
    case 'analyze_image':
      return await this.analyzeImage(request.id, args);
Behavior 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden of behavioral disclosure. It mentions 'analyze images' but doesn't describe what the analysis entails (e.g., object detection, captioning, classification), potential limitations (e.g., image size, format constraints), or response format. This is a significant gap for a tool with no annotation coverage.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence with zero waste. It's front-loaded with the core purpose and uses clear language, making it easy to parse quickly without unnecessary details.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity of image analysis (a non-trivial operation), no annotations, and no output schema, the description is incomplete. It lacks details on behavioral traits, output format, and usage context, which are crucial for an AI agent to invoke this tool effectively in practice.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents all 4 parameters thoroughly. The description adds no additional meaning beyond what's in the schema (e.g., it doesn't explain parameter interactions like choosing between 'imageUrl' and 'imageBase64'). Baseline 3 is appropriate as the schema does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('analyze images') and the capability used ('Gemini vision capabilities'), providing a specific verb+resource combination. However, it doesn't explicitly differentiate this tool from sibling tools such as 'generate_text' or 'embed_text', which an agent could plausibly reach for on image-adjacent tasks, so it falls short of full sibling differentiation.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. It doesn't mention any context for choosing this over other sibling tools (e.g., 'generate_text' for text generation or 'list_models' for model information), nor does it specify prerequisites or exclusions, leaving usage ambiguous.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
