image_understanding

Analyze images using Grok AI vision capabilities to extract information, answer questions, and understand visual content based on text prompts.

Instructions

Analyze images using Grok AI vision capabilities (Note: Grok 3 may support image creation)

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| base64_image | No | Base64-encoded image data (without the data:image prefix) | |
| image_url | No | URL of the image to analyze | |
| model | No | Grok vision model to use (e.g., grok-2-vision-latest, potentially grok-3 variants) | grok-2-vision-latest |
| prompt | Yes | Text prompt to accompany the image | |
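The schema marks only `prompt` as required, but the handler shown under Implementation Reference also rejects calls that carry neither `image_url` nor `base64_image`. The following is a minimal sketch of that combined rule as a client-side pre-check. The `ImageUnderstandingArgs` interface and the `checkArgs` helper are hypothetical names introduced for illustration; they are not part of the server.

```typescript
// Argument shape, transcribed from the input schema table.
interface ImageUnderstandingArgs {
  prompt: string;
  image_url?: string;
  base64_image?: string;
  model?: string; // the schema default is 'grok-2-vision-latest'
}

// Hypothetical helper (not part of the server): returns a list of problems,
// which is empty when the arguments would pass the handler's validation.
function checkArgs(args: Partial<ImageUnderstandingArgs>): string[] {
  const problems: string[] = [];
  if (!args.prompt) {
    problems.push('prompt is required');
  }
  if (!args.image_url && !args.base64_image) {
    problems.push('either image_url or base64_image is required');
  }
  return problems;
}
```

Returning a list instead of throwing lets a caller report every problem at once; the server itself throws on the first failed check.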

Input Schema (JSON Schema)

{
  "properties": {
    "base64_image": {
      "description": "Base64-encoded image data (without the data:image prefix)",
      "type": "string"
    },
    "image_url": {
      "description": "URL of the image to analyze",
      "type": "string"
    },
    "model": {
      "default": "grok-2-vision-latest",
      "description": "Grok vision model to use (e.g., grok-2-vision-latest, potentially grok-3 variants)",
      "type": "string"
    },
    "prompt": {
      "description": "Text prompt to accompany the image",
      "type": "string"
    }
  },
  "required": [
    "prompt"
  ],
  "type": "object"
}
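For orientation, here is a sketch of how a client might wrap arguments matching this schema in an MCP `tools/call` request. The `buildToolCall` helper and the example URL are placeholders chosen for illustration; the request envelope follows the general JSON-RPC 2.0 shape that MCP uses for tool calls.

```typescript
// JSON-RPC 2.0 envelope for an MCP tool call.
interface ToolCallRequest {
  jsonrpc: '2.0';
  id: number;
  method: 'tools/call';
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper: wraps tool arguments in a tools/call request
// addressed to the image_understanding tool.
function buildToolCall(id: number, args: Record<string, unknown>): ToolCallRequest {
  return {
    jsonrpc: '2.0',
    id,
    method: 'tools/call',
    params: { name: 'image_understanding', arguments: args },
  };
}

const request = buildToolCall(1, {
  prompt: 'What is shown in this image?',
  image_url: 'https://example.com/cat.jpg', // placeholder URL
});
console.log(JSON.stringify(request, null, 2));
```

Because `model` is omitted here, the server would fall back to its default, `grok-2-vision-latest`.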

Implementation Reference

  • The primary handler function for the image_understanding tool. It validates inputs (prompt required, image_url or base64_image), constructs a user message with image content and text prompt, calls the Grok API client for image understanding, and formats the response.
    private async handleImageUnderstanding(args: any) {
      console.error('[Tool] Handling image_understanding tool call');

      const { image_url, base64_image, prompt, model, ...otherOptions } = args;

      // Validate inputs
      if (!prompt) {
        throw new Error('Prompt is required');
      }

      if (!image_url && !base64_image) {
        throw new Error('Either image_url or base64_image is required');
      }

      // Prepare message content
      const content: any[] = [];

      // Add image
      if (image_url) {
        content.push({
          type: 'image_url',
          image_url: {
            url: image_url,
            detail: 'high',
          },
        });
      } else if (base64_image) {
        content.push({
          type: 'image_url',
          image_url: {
            url: `data:image/jpeg;base64,${base64_image}`,
            detail: 'high',
          },
        });
      }

      // Add text prompt
      content.push({
        type: 'text',
        text: prompt,
      });

      // Create messages array
      const messages = [
        {
          role: 'user',
          content,
        },
      ];

      // Create options object
      const options = {
        model: model || 'grok-2-vision-latest',
        ...otherOptions
      };

      // Call Grok API
      const response = await this.grokClient.createImageUnderstanding(messages, options);

      return {
        content: [
          {
            type: 'text',
            text: response.choices[0].message.content,
          },
        ],
      };
    }
  • src/index.ts:107-133 (registration)
    Tool registration in the ListTools response, including name, description, and input schema definition.
    {
      name: 'image_understanding',
      description: 'Analyze images using Grok AI vision capabilities (Note: Grok 3 may support image creation)',
      inputSchema: {
        type: 'object',
        properties: {
          image_url: {
            type: 'string',
            description: 'URL of the image to analyze'
          },
          base64_image: {
            type: 'string',
            description: 'Base64-encoded image data (without the data:image prefix)'
          },
          prompt: {
            type: 'string',
            description: 'Text prompt to accompany the image'
          },
          model: {
            type: 'string',
            description: 'Grok vision model to use (e.g., grok-2-vision-latest, potentially grok-3 variants)',
            default: 'grok-2-vision-latest'
          }
        },
        required: ['prompt']
      }
    },
  • Input schema defining the expected parameters for the image_understanding tool: prompt (required), image_url or base64_image, model.
    inputSchema: {
      type: 'object',
      properties: {
        image_url: {
          type: 'string',
          description: 'URL of the image to analyze'
        },
        base64_image: {
          type: 'string',
          description: 'Base64-encoded image data (without the data:image prefix)'
        },
        prompt: {
          type: 'string',
          description: 'Text prompt to accompany the image'
        },
        model: {
          type: 'string',
          description: 'Grok vision model to use (e.g., grok-2-vision-latest, potentially grok-3 variants)',
          default: 'grok-2-vision-latest'
        }
      },
      required: ['prompt']
    }
  • Helper method in GrokApiClient that sends the vision-enabled chat completion request to the xAI API endpoint /chat/completions.
    async createImageUnderstanding(messages: any[], options: any = {}): Promise<any> {
      try {
        console.error('[API] Creating image understanding request...');

        const requestBody = {
          messages,
          model: options.model || 'grok-2-vision-latest',
          ...options
        };

        const response = await this.axiosInstance.post('/chat/completions', requestBody);
        return response.data;
      } catch (error) {
        console.error('[Error] Failed to create image understanding request:', error);
        throw error;
      }
    }
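The handler and client excerpts above can be condensed into two pure functions, which makes the payload easier to inspect without calling the API. This is a sketch under stated assumptions, not the server's code: `guessMimeType`, `buildContent`, and `buildRequestBody` are hypothetical names. It also departs from the original in two labeled ways. The original always tags base64 input as `image/jpeg`, while this sketch guesses PNG or GIF from the well-known base64 prefixes of their file signatures. The original spreads `options` last, while this sketch spreads them first so that an explicitly undefined `model` cannot overwrite the default.

```typescript
const DEFAULT_MODEL = 'grok-2-vision-latest';

// Hypothetical helper: guess the MIME type from the leading base64 characters
// of common file signatures. Falls back to image/jpeg, as the original does.
function guessMimeType(base64: string): string {
  if (base64.startsWith('iVBORw0KGgo')) return 'image/png'; // PNG signature
  if (base64.startsWith('R0lGOD')) return 'image/gif'; // "GIF8" header
  return 'image/jpeg';
}

type ContentPart =
  | { type: 'image_url'; image_url: { url: string; detail: 'high' } }
  | { type: 'text'; text: string };

// Builds the user message content: image first, then the text prompt,
// in the same order as the handler above. A URL takes precedence over base64.
function buildContent(prompt: string, image: { url?: string; base64?: string }): ContentPart[] {
  const url =
    image.url ||
    (image.base64 ? `data:${guessMimeType(image.base64)};base64,${image.base64}` : undefined);
  if (!url) {
    throw new Error('Either image_url or base64_image is required');
  }
  return [
    { type: 'image_url', image_url: { url, detail: 'high' } },
    { type: 'text', text: prompt },
  ];
}

// Builds the /chat/completions request body. Extra options pass through;
// messages and model are set after the spread so they always win.
function buildRequestBody(content: ContentPart[], options: Record<string, unknown> = {}) {
  const model =
    typeof options.model === 'string' && options.model ? options.model : DEFAULT_MODEL;
  return {
    ...options,
    messages: [{ role: 'user', content }],
    model,
  };
}
```

Keeping these steps free of I/O means they can be unit-tested directly, leaving only the HTTP call itself to be mocked.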

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/Bob-lance/grok-mcp'
