image_understanding

Analyze images using Grok AI vision capabilities to extract information, answer questions, and understand visual content based on text prompts.

Instructions

Analyze images using Grok AI vision capabilities (Note: Grok 3 may support image creation)

Input Schema

| Name | Required | Description | Default |
|------|----------|-------------|---------|
| base64_image | No | Base64-encoded image data (without the data:image prefix) | |
| image_url | No | URL of the image to analyze | |
| model | No | Grok vision model to use (e.g., grok-2-vision-latest, potentially grok-3 variants) | grok-2-vision-latest |
| prompt | Yes | Text prompt to accompany the image | |
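
The schema marks only prompt as required, but the handler in the Implementation Reference also rejects calls that supply neither image_url nor base64_image. A minimal client-side sketch of those rules follows, wrapped in an MCP tools/call request. The names ImageUnderstandingArgs, validateArgs, and buildToolCall are hypothetical and do not come from the server's code; the check for a leading "data:" is an assumption based on the base64_image description.

```typescript
// Hypothetical argument type mirroring the input schema above.
interface ImageUnderstandingArgs {
  prompt: string;
  image_url?: string;
  base64_image?: string;
  model?: string;
}

// Shape of an MCP JSON-RPC tools/call request.
interface ToolCallRequest {
  jsonrpc: '2.0';
  id: number;
  method: 'tools/call';
  params: { name: string; arguments: ImageUnderstandingArgs };
}

// Returns a list of problems; an empty list means the arguments look usable.
function validateArgs(args: Partial<ImageUnderstandingArgs>): string[] {
  const problems: string[] = [];
  if (!args.prompt) {
    problems.push('prompt is required');
  }
  if (!args.image_url && !args.base64_image) {
    problems.push('either image_url or base64_image is required');
  }
  // Assumption: the schema asks for raw base64, so a data URL is flagged.
  if (args.base64_image && args.base64_image.startsWith('data:')) {
    problems.push('base64_image should not include the data:image prefix');
  }
  return problems;
}

// Wraps validated arguments in a tools/call request (hypothetical helper).
function buildToolCall(id: number, args: ImageUnderstandingArgs): ToolCallRequest {
  const problems = validateArgs(args);
  if (problems.length > 0) {
    throw new Error(`Invalid arguments: ${problems.join('; ')}`);
  }
  return {
    jsonrpc: '2.0',
    id,
    method: 'tools/call',
    params: { name: 'image_understanding', arguments: args },
  };
}
```

Validating before sending is a design choice for faster feedback; the server performs its own checks regardless.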

Implementation Reference

  • The primary handler function for the image_understanding tool. It validates the inputs (prompt is required, along with either image_url or base64_image), builds a user message containing the image and the text prompt, calls the Grok API client, and returns the model's reply as text content.
    private async handleImageUnderstanding(args: any) {
      console.error('[Tool] Handling image_understanding tool call');
      
      const { image_url, base64_image, prompt, model, ...otherOptions } = args;
      
      // Validate inputs
      if (!prompt) {
        throw new Error('Prompt is required');
      }
      
      if (!image_url && !base64_image) {
        throw new Error('Either image_url or base64_image is required');
      }
      
      // Prepare message content
      const content: any[] = [];
      
      // Add image
      if (image_url) {
        content.push({
          type: 'image_url',
          image_url: {
            url: image_url,
            detail: 'high',
          },
        });
      } else if (base64_image) {
        content.push({
          type: 'image_url',
          image_url: {
            url: `data:image/jpeg;base64,${base64_image}`,
            detail: 'high',
          },
        });
      }
      
      // Add text prompt
      content.push({
        type: 'text',
        text: prompt,
      });
      
      // Create messages array
      const messages = [
        {
          role: 'user',
          content,
        },
      ];
      
      // Create options object
      const options = {
        model: model || 'grok-2-vision-latest',
        ...otherOptions
      };
      
      // Call Grok API
      const response = await this.grokClient.createImageUnderstanding(messages, options);
      
      return {
        content: [
          {
            type: 'text',
            text: response.choices[0].message.content,
          },
        ],
      };
    }
  • src/index.ts:107-133 (registration)
    Tool registration in the ListTools response, including name, description, and input schema definition.
    {
      name: 'image_understanding',
      description: 'Analyze images using Grok AI vision capabilities (Note: Grok 3 may support image creation)',
      inputSchema: {
        type: 'object',
        properties: {
          image_url: {
            type: 'string',
            description: 'URL of the image to analyze'
          },
          base64_image: {
            type: 'string',
            description: 'Base64-encoded image data (without the data:image prefix)'
          },
          prompt: {
            type: 'string',
            description: 'Text prompt to accompany the image'
          },
          model: {
            type: 'string',
            description: 'Grok vision model to use (e.g., grok-2-vision-latest, potentially grok-3 variants)',
            default: 'grok-2-vision-latest'
          }
        },
        required: ['prompt']
      }
    },
  • Input schema defining the expected parameters for the image_understanding tool: prompt (required), image_url or base64_image, model.
    inputSchema: {
      type: 'object',
      properties: {
        image_url: {
          type: 'string',
          description: 'URL of the image to analyze'
        },
        base64_image: {
          type: 'string',
          description: 'Base64-encoded image data (without the data:image prefix)'
        },
        prompt: {
          type: 'string',
          description: 'Text prompt to accompany the image'
        },
        model: {
          type: 'string',
          description: 'Grok vision model to use (e.g., grok-2-vision-latest, potentially grok-3 variants)',
          default: 'grok-2-vision-latest'
        }
      },
      required: ['prompt']
    }
  • Helper method in GrokApiClient that sends the vision-enabled chat completion request to the xAI API endpoint /chat/completions.
    async createImageUnderstanding(messages: any[], options: any = {}): Promise<any> {
      try {
        console.error('[API] Creating image understanding request...');
        
        const requestBody = {
          messages,
          ...options,
          // Set model last so an undefined options.model cannot override the default
          model: options.model || 'grok-2-vision-latest'
        };
        
        const response = await this.axiosInstance.post('/chat/completions', requestBody);
        return response.data;
      } catch (error) {
        console.error('[Error] Failed to create image understanding request:', error);
        throw error;
      }
    }

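The handler above always labels base64 input as image/jpeg when it builds the data URL. The sketch below shows the same message-building steps as a pure function, with one change: it guesses PNG from the leading base64 characters and otherwise falls back to JPEG, as the handler does. The names guessMimeType and buildVisionRequestBody are hypothetical, and the source does not say whether the API rejects a mismatched MIME type, so treat the detection as an optional refinement.

```typescript
type ContentPart =
  | { type: 'text'; text: string }
  | { type: 'image_url'; image_url: { url: string; detail: 'high' } };

interface VisionRequestBody {
  model: string;
  messages: { role: 'user'; content: ContentPart[] }[];
}

// Guess the MIME type from the first base64 characters (hypothetical helper).
// The PNG file signature encodes to "iVBORw0KGgo"; anything else falls back
// to image/jpeg, matching the handler's hard-coded value.
function guessMimeType(base64Image: string): string {
  if (base64Image.startsWith('iVBORw0KGgo')) {
    return 'image/png';
  }
  return 'image/jpeg';
}

// Builds the chat-completions request body the way the handler and client do:
// image part first, then the text prompt, in a single user message.
function buildVisionRequestBody(args: {
  prompt: string;
  image_url?: string;
  base64_image?: string;
  model?: string;
}): VisionRequestBody {
  if (!args.prompt) {
    throw new Error('Prompt is required');
  }

  let url: string;
  if (args.image_url) {
    url = args.image_url;
  } else if (args.base64_image) {
    url = `data:${guessMimeType(args.base64_image)};base64,${args.base64_image}`;
  } else {
    throw new Error('Either image_url or base64_image is required');
  }

  return {
    model: args.model || 'grok-2-vision-latest',
    messages: [
      {
        role: 'user',
        content: [
          { type: 'image_url', image_url: { url, detail: 'high' } },
          { type: 'text', text: args.prompt },
        ],
      },
    ],
  };
}
```

Keeping this step free of network calls makes it easy to unit-test separately from the POST to /chat/completions.
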
MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/Bob-lance/grok-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.