image_understanding

Analyze images using Grok AI vision capabilities to extract information, answer questions, and understand visual content based on text prompts.

Instructions

Analyze images using Grok AI vision capabilities (Note: Grok 3 may support image creation)

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| base64_image | No | Base64-encoded image data (without the data:image prefix) | |
| image_url | No | URL of the image to analyze | |
| model | No | Grok vision model to use (e.g., grok-2-vision-latest, potentially grok-3 variants) | grok-2-vision-latest |
| prompt | Yes | Text prompt to accompany the image | |
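
As a concrete illustration of the schema, a tool call's arguments could look like this (the URL and prompt are made-up values, not from the source):

```typescript
interface ImageUnderstandingArgs {
  image_url?: string;
  base64_image?: string;
  prompt: string;
  model?: string;
}

// Illustrative arguments; the image URL and prompt text are invented.
const args: ImageUnderstandingArgs = {
  image_url: 'https://example.com/photo.jpg',
  prompt: 'Describe the objects visible in this image.',
  // model is optional and defaults to grok-2-vision-latest
};

// The schema marks only `prompt` as required, but the handler additionally
// rejects calls that supply neither image_url nor base64_image:
const isValid = Boolean(args.prompt) && Boolean(args.image_url || args.base64_image);
```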

Implementation Reference

  • The primary handler function for the image_understanding tool. It validates the inputs (prompt is required, along with either image_url or base64_image), builds a user message that combines the image content with the text prompt, calls the Grok API client, and formats the response.
    private async handleImageUnderstanding(args: any) {
      console.error('[Tool] Handling image_understanding tool call');
      
      const { image_url, base64_image, prompt, model, ...otherOptions } = args;
      
      // Validate inputs
      if (!prompt) {
        throw new Error('Prompt is required');
      }
      
      if (!image_url && !base64_image) {
        throw new Error('Either image_url or base64_image is required');
      }
      
      // Prepare message content
      const content: any[] = [];
      
      // Add image
      if (image_url) {
        content.push({
          type: 'image_url',
          image_url: {
            url: image_url,
            detail: 'high',
          },
        });
      } else if (base64_image) {
        content.push({
          type: 'image_url',
          image_url: {
            // The MIME type is hard-coded to JPEG; adjust the prefix if the
            // base64 data is another format (e.g. PNG or WebP).
            url: `data:image/jpeg;base64,${base64_image}`,
            detail: 'high',
          },
        });
      }
      
      // Add text prompt
      content.push({
        type: 'text',
        text: prompt,
      });
      
      // Create messages array
      const messages = [
        {
          role: 'user',
          content,
        },
      ];
      
      // Create options object
      const options = {
        model: model || 'grok-2-vision-latest',
        ...otherOptions
      };
      
      // Call Grok API
      const response = await this.grokClient.createImageUnderstanding(messages, options);
      
      return {
        content: [
          {
            type: 'text',
            text: response.choices[0].message.content,
          },
        ],
      };
    }
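
For the base64_image path, a caller might encode a local file like this (the helper is hypothetical, not part of the server's code):

```typescript
import { readFileSync } from 'node:fs';

// Hypothetical caller-side helper: read a local image file and return raw
// base64 without the `data:image/...;base64,` prefix, since the handler
// above prepends `data:image/jpeg;base64,` itself.
function toBase64Image(path: string): string {
  return readFileSync(path).toString('base64');
}
```

Note that the handler hard-codes a JPEG MIME type for base64 input, so PNG or WebP data would be mislabeled unless the prefix is adjusted.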
  • src/index.ts:107-133 (registration)
    Tool registration in the ListTools response, including name, description, and input schema definition.
    {
      name: 'image_understanding',
      description: 'Analyze images using Grok AI vision capabilities (Note: Grok 3 may support image creation)',
      inputSchema: {
        type: 'object',
        properties: {
          image_url: {
            type: 'string',
            description: 'URL of the image to analyze'
          },
          base64_image: {
            type: 'string',
            description: 'Base64-encoded image data (without the data:image prefix)'
          },
          prompt: {
            type: 'string',
            description: 'Text prompt to accompany the image'
          },
          model: {
            type: 'string',
            description: 'Grok vision model to use (e.g., grok-2-vision-latest, potentially grok-3 variants)',
            default: 'grok-2-vision-latest'
          }
        },
        required: ['prompt']
      }
    },
  • Input schema defining the expected parameters for the image_understanding tool: prompt (required), image_url or base64_image, model.
    inputSchema: {
      type: 'object',
      properties: {
        image_url: {
          type: 'string',
          description: 'URL of the image to analyze'
        },
        base64_image: {
          type: 'string',
          description: 'Base64-encoded image data (without the data:image prefix)'
        },
        prompt: {
          type: 'string',
          description: 'Text prompt to accompany the image'
        },
        model: {
          type: 'string',
          description: 'Grok vision model to use (e.g., grok-2-vision-latest, potentially grok-3 variants)',
          default: 'grok-2-vision-latest'
        }
      },
      required: ['prompt']
    }
  • Helper method in GrokApiClient that sends the vision-enabled chat completion request to the xAI API endpoint /chat/completions.
    async createImageUnderstanding(messages: any[], options: any = {}): Promise<any> {
      try {
        console.error('[API] Creating image understanding request...');
        
        const requestBody = {
          messages,
          model: options.model || 'grok-2-vision-latest',
          ...options
        };
        
        const response = await this.axiosInstance.post('/chat/completions', requestBody);
        return response.data;
      } catch (error) {
        console.error('[Error] Failed to create image understanding request:', error);
        throw error;
      }
    }

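The request body follows the OpenAI-style chat completions format. A standalone sketch of the body construction (the free function below is an assumption for illustration; in the source this logic lives inside the client method, and the axios instance is presumably preconfigured with the xAI base URL and API key):

```typescript
interface ChatOptions {
  model?: string;
  [key: string]: unknown;
}

// Builds the /chat/completions request body the same way the client does:
// default the model, then spread any remaining caller options on top.
function buildRequestBody(messages: unknown[], options: ChatOptions = {}) {
  return {
    messages,
    model: options.model || 'grok-2-vision-latest',
    ...options,
  };
}
```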
MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/Bob-lance/grok-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.