Gemini MCP Server

by aliargun

analyze_image

Analyze images using Gemini vision capabilities to answer questions or follow instructions about visual content.

Instructions

Analyze images using Gemini vision capabilities

Input Schema

Name         Required  Description                                     Default
prompt       Yes       Question or instruction about the image         -
imageUrl     No        URL of the image to analyze                     -
imageBase64  No        Base64-encoded image data (alternative to URL)  -
model        No        Vision-capable Gemini model                     gemini-2.5-flash
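
For illustration, a JSON-RPC `tools/call` request invoking this tool might look like the following sketch. The URL, prompt, and id are placeholder values, not taken from the server's source:

```typescript
// Hypothetical JSON-RPC 2.0 request an MCP client would send to invoke
// analyze_image. Exactly one of imageUrl / imageBase64 should be supplied,
// per the schema's oneOf constraint below.
const exampleRequest = {
  jsonrpc: '2.0',
  id: 1,
  method: 'tools/call',
  params: {
    name: 'analyze_image',
    arguments: {
      prompt: 'What objects are visible in this photo?',
      imageUrl: 'https://example.com/photo.jpg',
      // model is optional; the handler falls back to 'gemini-2.5-flash'
    },
  },
};

console.log(exampleRequest.params.name); // analyze_image
```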

Implementation Reference

  • The core handler function that executes the analyze_image tool. Processes input arguments, handles image data (URL or base64), calls the Gemini vision API, and returns the analysis result as an MCP response.
    private async analyzeImage(id: any, args: any): Promise<MCPResponse> {
      try {
        const model = args.model || 'gemini-2.5-flash';
    
        // Validate inputs
        if (!args.imageUrl && !args.imageBase64) {
          throw new Error('Either imageUrl or imageBase64 must be provided');
        }
    
        // Prepare image part
        let imagePart: any;
        if (args.imageUrl) {
          // For URL, we'd need to fetch and convert to base64
          // For now, we'll just pass the URL as instruction
          imagePart = {
            text: `[Image URL: ${args.imageUrl}]`
          };
        } else if (args.imageBase64) {
          // Log base64 data size for debugging
          console.error(`Image base64 length: ${args.imageBase64.length}`);
          
          // Extract MIME type and data
          const matches = args.imageBase64.match(/^data:(.+);base64,(.+)$/);
          if (matches) {
            console.error(`MIME type: ${matches[1]}, Data length: ${matches[2].length}`);
            imagePart = {
              inlineData: {
                mimeType: matches[1],
                data: matches[2]
              }
            };
          } else {
            // If no data URI format, assume raw base64
            console.error('Raw base64 data detected');
            imagePart = {
              inlineData: {
                mimeType: 'image/jpeg',
                data: args.imageBase64
              }
            };
          }
        }
    
        const result = await this.genAI.models.generateContent({
          model,
          contents: [{
            parts: [
              { text: args.prompt },
              imagePart
            ],
            role: 'user'
          }]
        });
    
        const text = result.text || '';
    
        return {
          jsonrpc: '2.0',
          id,
          result: {
            content: [{
              type: 'text',
              text: text
            }]
          }
        };
      } catch (error) {
        console.error('Error in analyzeImage:', error);
        return {
          jsonrpc: '2.0',
          id,
          error: {
            code: -32603,
            message: `Image analysis failed: ${error instanceof Error ? error.message : 'Unknown error'}`
          }
        };
      }
    }
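The handler above only stubs the imageUrl path: it passes the URL as a text part rather than image bytes, so the model never sees the actual image. A minimal sketch of completing that path, assuming a Node 18+ runtime with global fetch; toInlinePart and fetchImageAsInlinePart are hypothetical helpers, not part of the server:

```typescript
// Hypothetical helpers sketching how the imageUrl path could be completed.
// toInlinePart base64-encodes raw bytes into the inlineData shape used by
// the handler above; fetchImageAsInlinePart wires it to a real download.

function toInlinePart(mimeType: string, bytes: Uint8Array) {
  return {
    inlineData: {
      mimeType,
      data: Buffer.from(bytes).toString('base64'),
    },
  };
}

async function fetchImageAsInlinePart(url: string) {
  const res = await fetch(url); // Node 18+ global fetch (assumption)
  if (!res.ok) throw new Error(`Failed to fetch image: ${res.status}`);
  const mimeType = res.headers.get('content-type') ?? 'image/jpeg';
  const bytes = new Uint8Array(await res.arrayBuffer());
  return toInlinePart(mimeType, bytes);
}
```

With this in place, the `if (args.imageUrl)` branch could await the fetched part instead of emitting a text placeholder.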
  • Input schema definition for the analyze_image tool, specifying parameters like prompt, imageUrl or imageBase64 (one required), and model.
    inputSchema: {
      type: 'object',
      properties: {
        prompt: {
          type: 'string',
          description: 'Question or instruction about the image'
        },
        imageUrl: {
          type: 'string',
          description: 'URL of the image to analyze'
        },
        imageBase64: {
          type: 'string',
          description: 'Base64-encoded image data (alternative to URL)'
        },
        model: {
          type: 'string',
          description: 'Vision-capable Gemini model',
          enum: ['gemini-2.5-pro', 'gemini-2.5-flash', 'gemini-2.0-flash'],
          default: 'gemini-2.5-flash'
        }
      },
      required: ['prompt'],
      oneOf: [
        { required: ['imageUrl'] },
        { required: ['imageBase64'] }
      ]
    }
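Note that the oneOf clause requires exactly one of imageUrl and imageBase64: supplying both fails schema validation just as supplying neither does, while the handler only rejects the "neither" case at runtime. A stricter runtime check mirroring the schema could look like this sketch (validateImageSource is a hypothetical helper, not part of the server):

```typescript
// Hypothetical runtime check mirroring the schema's oneOf: exactly one
// of imageUrl / imageBase64 must be present. Returns an error message,
// or null when the arguments are valid.
function validateImageSource(args: { imageUrl?: string; imageBase64?: string }): string | null {
  const provided = [args.imageUrl, args.imageBase64].filter(Boolean).length;
  if (provided === 0) return 'Either imageUrl or imageBase64 must be provided';
  if (provided === 2) return 'Provide only one of imageUrl or imageBase64';
  return null; // valid
}
```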
  • Tool registration entry returned by tools/list, defining the name, description, and full input schema for analyze_image.
    {
      name: 'analyze_image',
      description: 'Analyze images using Gemini vision capabilities',
      inputSchema: {
        type: 'object',
        properties: {
          prompt: {
            type: 'string',
            description: 'Question or instruction about the image'
          },
          imageUrl: {
            type: 'string',
            description: 'URL of the image to analyze'
          },
          imageBase64: {
            type: 'string',
            description: 'Base64-encoded image data (alternative to URL)'
          },
          model: {
            type: 'string',
            description: 'Vision-capable Gemini model',
            enum: ['gemini-2.5-pro', 'gemini-2.5-flash', 'gemini-2.0-flash'],
            default: 'gemini-2.5-flash'
          }
        },
        required: ['prompt'],
        oneOf: [
          { required: ['imageUrl'] },
          { required: ['imageBase64'] }
        ]
      }
    },
  • Dispatch logic in handleToolCall method that maps 'analyze_image' tool calls to the analyzeImage handler.
    case 'analyze_image':
      return await this.analyzeImage(request.id, args);
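In context, that case sits inside a switch over the requested tool name. A minimal standalone sketch of the dispatch shape, with the handler table, names, and error text all illustrative rather than taken from the server's source:

```typescript
// Illustrative dispatch shape: map the requested tool name to a handler.
// Only analyze_image is registered; an unknown name yields an error result.
type ToolHandler = (id: unknown, args: unknown) => { ok: boolean; detail: string };

const handlers: Record<string, ToolHandler> = {
  analyze_image: (_id, _args) => ({ ok: true, detail: 'analyzeImage called' }),
};

function dispatchTool(name: string, id: unknown, args: unknown) {
  const handler = handlers[name];
  if (!handler) {
    return { ok: false, detail: `Unknown tool: ${name}` };
  }
  return handler(id, args);
}
```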
Behavior 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden of behavioral disclosure. It mentions 'Gemini vision capabilities' but does not detail what this entails, such as rate limits, authentication needs, output format, or potential costs. For a tool with no annotations, this leaves significant gaps in understanding how it behaves beyond basic functionality.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise and front-loaded with a single, clear sentence: 'Analyze images using Gemini vision capabilities.' There is no wasted verbiage, and it efficiently communicates the core purpose without unnecessary details.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity of a vision analysis tool with no annotations and no output schema, the description is incomplete. It lacks information on behavioral traits, output format, error handling, or integration context. While the schema covers inputs well, the overall context for an AI agent to use this tool effectively is insufficient.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema fully documents all parameters. The description adds no meaning beyond the schema, such as explaining the interplay between prompt and image inputs or model selection nuances. With high schema coverage, the baseline score of 3 is appropriate: the description neither compensates for gaps nor detracts from the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Analyze images using Gemini vision capabilities.' It specifies the action (analyze) and resource (images) with the technology context (Gemini vision). However, it doesn't explicitly differentiate from sibling tools like generate_text or embed_text, which might also process text or have different vision-related functions.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. It doesn't mention sibling tools like generate_text (which might handle text generation) or list_models (which could list available models), nor does it specify contexts or exclusions for image analysis. Usage is implied but not explicitly defined.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
