Grok MCP Plugin

image_understanding

Analyze images using Grok AI vision capabilities to extract information, answer questions, and understand visual content based on text prompts.

Instructions

Analyze images using Grok AI vision capabilities (Note: Grok 3 may support image creation)

Input Schema


Name	Required	Description	Default
`base64_image`	No	Base64-encoded image data (without the data:image prefix)
`image_url`	No	URL of the image to analyze
`model`	No	Grok vision model to use (e.g., grok-2-vision-latest, potentially grok-3 variants)	grok-2-vision-latest
`prompt`	Yes	Text prompt to accompany the image
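
The table marks both image parameters as optional, but the handler shown further down rejects a call that supplies neither, and it uses image_url when both are present. A minimal sketch of a client-side pre-check follows. The ImageUnderstandingArgs type and the validateArgs helper are hypothetical names used for illustration and are not part of the plugin; the data: prefix check is an extra assumption based on the parameter description, not something the handler enforces.

```typescript
// Hypothetical argument type mirroring the input schema table above.
interface ImageUnderstandingArgs {
  prompt: string;
  image_url?: string;
  base64_image?: string;
  model?: string;
}

// Returns a list of problems found before the call is sent.
// The first two checks mirror the handler; the third is an assumed extra check.
function validateArgs(args: Partial<ImageUnderstandingArgs>): string[] {
  const problems: string[] = [];
  if (!args.prompt) {
    problems.push('Prompt is required');
  }
  if (!args.image_url && !args.base64_image) {
    problems.push('Either image_url or base64_image is required');
  }
  if (args.base64_image && args.base64_image.startsWith('data:')) {
    problems.push('base64_image must not include the data:image prefix');
  }
  return problems;
}
```

An empty result means the two checks performed by the handler would pass.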

Implementation Reference

src/index.ts:260-327 (handler)

The primary handler function for the image_understanding tool. It validates inputs (prompt required, image_url or base64_image), constructs a user message with image content and text prompt, calls the Grok API client for image understanding, and formats the response.

private async handleImageUnderstanding(args: any) {
  console.error('[Tool] Handling image_understanding tool call');
  
  const { image_url, base64_image, prompt, model, ...otherOptions } = args;
  
  // Validate inputs
  if (!prompt) {
    throw new Error('Prompt is required');
  }
  
  if (!image_url && !base64_image) {
    throw new Error('Either image_url or base64_image is required');
  }
  
  // Prepare message content
  const content: any[] = [];
  
  // Add image
  if (image_url) {
    content.push({
      type: 'image_url',
      image_url: {
        url: image_url,
        detail: 'high',
      },
    });
  } else if (base64_image) {
    content.push({
      type: 'image_url',
      image_url: {
        url: `data:image/jpeg;base64,${base64_image}`,
        detail: 'high',
      },
    });
  }
  
  // Add text prompt
  content.push({
    type: 'text',
    text: prompt,
  });
  
  // Create messages array
  const messages = [
    {
      role: 'user',
      content,
    },
  ];
  
  // Create options object
  const options = {
    model: model || 'grok-2-vision-latest',
    ...otherOptions
  };
  
  // Call Grok API
  const response = await this.grokClient.createImageUnderstanding(messages, options);
  
  return {
    content: [
      {
        type: 'text',
        text: response.choices[0].message.content,
      },
    ],
  };
}
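
The handler wraps base64_image in a data URL with a fixed image/jpeg type, whatever the actual image format is. The sketch below shows how the image content part could carry an explicit MIME type instead. The toImagePart helper and its mimeType argument are hypothetical; the tool does not accept such a parameter today.

```typescript
// Shape of the image content part pushed by the handler above.
type ImagePart = {
  type: 'image_url';
  image_url: { url: string; detail: 'high' };
};

// Hypothetical helper: builds the image part, preferring image_url as the handler does.
// mimeType is an assumed extra argument; it defaults to the handler's hard-coded value.
function toImagePart(
  source: { image_url?: string; base64_image?: string },
  mimeType: string = 'image/jpeg',
): ImagePart {
  if (source.image_url) {
    return { type: 'image_url', image_url: { url: source.image_url, detail: 'high' } };
  }
  if (source.base64_image) {
    const url = `data:${mimeType};base64,${source.base64_image}`;
    return { type: 'image_url', image_url: { url, detail: 'high' } };
  }
  throw new Error('Either image_url or base64_image is required');
}
```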

src/index.ts:107-133 (registration)

Tool registration in the ListTools response, including name, description, and input schema definition.

{
  name: 'image_understanding',
  description: 'Analyze images using Grok AI vision capabilities (Note: Grok 3 may support image creation)',
  inputSchema: {
    type: 'object',
    properties: {
      image_url: {
        type: 'string',
        description: 'URL of the image to analyze'
      },
      base64_image: {
        type: 'string',
        description: 'Base64-encoded image data (without the data:image prefix)'
      },
      prompt: {
        type: 'string',
        description: 'Text prompt to accompany the image'
      },
      model: {
        type: 'string',
        description: 'Grok vision model to use (e.g., grok-2-vision-latest, potentially grok-3 variants)',
        default: 'grok-2-vision-latest'
      }
    },
    required: ['prompt']
  }
},
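
An MCP client invokes a registered tool by sending a JSON-RPC tools/call request that names the tool and passes its arguments. The following sketch builds such a request for image_understanding. The buildToolCall helper, the request id, and the example image URL are illustrative assumptions, not taken from the plugin.

```typescript
// JSON-RPC 2.0 envelope for an MCP tools/call request.
interface ToolCallRequest {
  jsonrpc: '2.0';
  id: number;
  method: 'tools/call';
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper that wraps a tool name and its arguments in the envelope.
function buildToolCall(
  id: number,
  name: string,
  args: Record<string, unknown>,
): ToolCallRequest {
  return { jsonrpc: '2.0', id, method: 'tools/call', params: { name, arguments: args } };
}

// Example arguments: the URL is a placeholder; model is omitted to use the schema default.
const request = buildToolCall(1, 'image_understanding', {
  image_url: 'https://example.com/cat.png',
  prompt: 'Describe this image.',
});

console.log(JSON.stringify(request, null, 2));
```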

src/index.ts:110-131 (schema)

Input schema defining the expected parameters for the image_understanding tool: prompt (required), image_url or base64_image, model.

inputSchema: {
  type: 'object',
  properties: {
    image_url: {
      type: 'string',
      description: 'URL of the image to analyze'
    },
    base64_image: {
      type: 'string',
      description: 'Base64-encoded image data (without the data:image prefix)'
    },
    prompt: {
      type: 'string',
      description: 'Text prompt to accompany the image'
    },
    model: {
      type: 'string',
      description: 'Grok vision model to use (e.g., grok-2-vision-latest, potentially grok-3 variants)',
      default: 'grok-2-vision-latest'
    }
  },
  required: ['prompt']
}

src/grok-api-client.ts:86-101 (helper)

Helper method in GrokApiClient that sends the vision-enabled chat completion request to the xAI API endpoint /chat/completions.

async createImageUnderstanding(messages: any[], options: any = {}): Promise<any> {
  try {
    console.error('[API] Creating image understanding request...');
    
    const requestBody = {
      messages,
      model: options.model || 'grok-2-vision-latest',
      ...options
    };
    
    const response = await this.axiosInstance.post('/chat/completions', requestBody);
    return response.data;
  } catch (error) {
    console.error('[Error] Failed to create image understanding request:', error);
    throw error;
  }
}
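
In the helper above, ...options is spread after the model default, so an options object that carries an explicit model: undefined would overwrite the fallback. The tool's own handler always sets model before calling the helper, so this does not occur on that path. The sketch below shows a variant that applies the default last; buildRequestBody and DEFAULT_VISION_MODEL are hypothetical names, not part of GrokApiClient.

```typescript
const DEFAULT_VISION_MODEL = 'grok-2-vision-latest';

// Hypothetical variant of the request-body construction.
// Extra options are spread first so that messages and the resolved model always win.
function buildRequestBody(
  messages: unknown[],
  options: { model?: string | undefined; [key: string]: unknown } = {},
): Record<string, unknown> {
  return {
    ...options,
    messages,
    model: options.model || DEFAULT_VISION_MODEL,
  };
}
```

Other keys in options, such as a sampling parameter, pass through unchanged.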

Tool Definition Quality

Grade C (2.8/5.0)

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It states the tool analyzes images but does not describe what the analysis entails (e.g., object detection, captioning, OCR), potential limitations (e.g., image size restrictions, rate limits), or authentication needs. The note about Grok 3 adds confusion rather than transparency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is brief but includes a parenthetical note that is speculative and not directly relevant to the tool's current functionality, reducing efficiency. It is front-loaded with the core purpose, but the extra sentence detracts from conciseness without adding value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool with no annotations and no output schema, the description is incomplete. It lacks details on what the analysis returns (e.g., text descriptions, structured data), error conditions, or behavioral traits like rate limits. The note about Grok 3 does not compensate for these gaps, leaving the agent with insufficient context for effective use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
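
Although the description does not say so, the handler listed under Implementation Reference returns a single text content item holding the model's reply. A minimal sketch of reading that result on the client side follows. The ToolResult type and the extractText helper are hypothetical names chosen for this example.

```typescript
// Hypothetical shape of the tool result, based on the handler's return value.
interface ToolResult {
  content: Array<{ type: string; text?: string }>;
}

// Joins the text of every 'text' content item; returns an empty string if there is none.
function extractText(result: ToolResult): string {
  return result.content
    .filter((item) => item.type === 'text' && typeof item.text === 'string')
    .map((item) => item.text as string)
    .join('\n');
}
```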

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents all four parameters thoroughly. The description adds no additional meaning about parameters beyond what the schema provides, such as explaining interactions between base64_image and image_url or elaborating on model options. Baseline 3 is appropriate as the schema does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose as 'Analyze images using Grok AI vision capabilities' with a specific verb ('Analyze') and resource ('images'), distinguishing it from sibling tools like chat_completion and function_calling. However, it includes a parenthetical note about Grok 3 potentially supporting image creation, which slightly dilutes the clarity by introducing unrelated future capabilities.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives like chat_completion or function_calling. It mentions Grok 3 may support image creation, but this is speculative and not actionable for current usage decisions. No explicit when/when-not scenarios or prerequisites are included.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/Bob-lance/grok-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.