image_understanding
Analyze images using Grok AI vision capabilities to extract information, answer questions, and understand visual content based on text prompts.
Instructions
Analyze images using Grok AI vision capabilities (Note: Grok 3 may support image creation)
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| base64_image | No | Base64-encoded image data (without the data:image prefix) | |
| image_url | No | URL of the image to analyze | |
| model | No | Grok vision model to use (e.g., grok-2-vision-latest, potentially grok-3 variants) | grok-2-vision-latest |
| prompt | Yes | Text prompt to accompany the image | |
Input Schema (JSON Schema)
    {
      "properties": {
        "base64_image": {
          "description": "Base64-encoded image data (without the data:image prefix)",
          "type": "string"
        },
        "image_url": {
          "description": "URL of the image to analyze",
          "type": "string"
        },
        "model": {
          "default": "grok-2-vision-latest",
          "description": "Grok vision model to use (e.g., grok-2-vision-latest, potentially grok-3 variants)",
          "type": "string"
        },
        "prompt": {
          "description": "Text prompt to accompany the image",
          "type": "string"
        }
      },
      "required": [
        "prompt"
      ],
      "type": "object"
    }
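As a quick reference, a typical call pairs the required prompt with exactly one image source; the handler shown under Implementation Reference rejects requests that provide neither image_url nor base64_image. The values below are illustrative placeholders, not taken from the project:

    // Hypothetical example arguments for an image_understanding call.
    // Supply exactly one of image_url or base64_image alongside the prompt.
    const args = {
      prompt: 'Describe the main objects in this photo.',
      image_url: 'https://example.com/sample.jpg', // placeholder URL
      model: 'grok-2-vision-latest',               // optional; matches the schema default
    };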
Implementation Reference
- src/index.ts:260-327 (handler): The primary handler function for the image_understanding tool. It validates inputs (prompt is required, along with either image_url or base64_image), constructs a user message with the image content and text prompt, calls the Grok API client for image understanding, and formats the response.

      private async handleImageUnderstanding(args: any) {
        console.error('[Tool] Handling image_understanding tool call');
        const { image_url, base64_image, prompt, model, ...otherOptions } = args;

        // Validate inputs
        if (!prompt) {
          throw new Error('Prompt is required');
        }
        if (!image_url && !base64_image) {
          throw new Error('Either image_url or base64_image is required');
        }

        // Prepare message content
        const content: any[] = [];

        // Add image
        if (image_url) {
          content.push({
            type: 'image_url',
            image_url: {
              url: image_url,
              detail: 'high',
            },
          });
        } else if (base64_image) {
          content.push({
            type: 'image_url',
            image_url: {
              url: `data:image/jpeg;base64,${base64_image}`,
              detail: 'high',
            },
          });
        }

        // Add text prompt
        content.push({
          type: 'text',
          text: prompt,
        });

        // Create messages array
        const messages = [
          {
            role: 'user',
            content,
          },
        ];

        // Create options object
        const options = { model: model || 'grok-2-vision-latest', ...otherOptions };

        // Call Grok API
        const response = await this.grokClient.createImageUnderstanding(messages, options);

        return {
          content: [
            {
              type: 'text',
              text: response.choices[0].message.content,
            },
          ],
        };
      }
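For the base64 path, note that the handler hard-codes an image/jpeg data-URL prefix and 'high' detail. A sketch of the messages payload it builds, using placeholder values:

      // Sketch of the messages array produced for a base64_image call
      // (placeholder values; the image/jpeg MIME type and 'high' detail are fixed by the handler).
      const messages = [
        {
          role: 'user',
          content: [
            {
              type: 'image_url',
              image_url: {
                url: 'data:image/jpeg;base64,<base64_image>', // base64 payload substituted here
                detail: 'high',
              },
            },
            { type: 'text', text: 'Describe the main objects in this photo.' },
          ],
        },
      ];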
- src/index.ts:107-133 (registration): Tool registration in the ListTools response, including the name, description, and input schema definition.

      {
        name: 'image_understanding',
        description: 'Analyze images using Grok AI vision capabilities (Note: Grok 3 may support image creation)',
        inputSchema: {
          type: 'object',
          properties: {
            image_url: { type: 'string', description: 'URL of the image to analyze' },
            base64_image: { type: 'string', description: 'Base64-encoded image data (without the data:image prefix)' },
            prompt: { type: 'string', description: 'Text prompt to accompany the image' },
            model: {
              type: 'string',
              description: 'Grok vision model to use (e.g., grok-2-vision-latest, potentially grok-3 variants)',
              default: 'grok-2-vision-latest'
            }
          },
          required: ['prompt']
        }
      },
- src/index.ts:110-131 (schema): Input schema defining the expected parameters for the image_understanding tool: prompt (required), image_url or base64_image, and model.

      inputSchema: {
        type: 'object',
        properties: {
          image_url: { type: 'string', description: 'URL of the image to analyze' },
          base64_image: { type: 'string', description: 'Base64-encoded image data (without the data:image prefix)' },
          prompt: { type: 'string', description: 'Text prompt to accompany the image' },
          model: {
            type: 'string',
            description: 'Grok vision model to use (e.g., grok-2-vision-latest, potentially grok-3 variants)',
            default: 'grok-2-vision-latest'
          }
        },
        required: ['prompt']
- src/grok-api-client.ts:86-101 (helper): Helper method in GrokApiClient that sends the vision-enabled chat completion request to the xAI /chat/completions endpoint.

      async createImageUnderstanding(messages: any[], options: any = {}): Promise<any> {
        try {
          console.error('[API] Creating image understanding request...');
          const requestBody = {
            messages,
            model: options.model || 'grok-2-vision-latest',
            ...options
          };
          const response = await this.axiosInstance.post('/chat/completions', requestBody);
          return response.data;
        } catch (error) {
          console.error('[Error] Failed to create image understanding request:', error);
          throw error;
        }
      }
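Putting the pieces together: the helper spreads options over the default model, posts the body to /chat/completions via the preconfigured axios instance, and the tool handler surfaces choices[0].message.content from the result. A minimal usage sketch; the grokClient variable and values are placeholders, and the axios instance is assumed to carry the xAI base URL and API key:

      // Hypothetical usage of the helper; assumes axiosInstance was created with the
      // xAI base URL and Authorization header elsewhere in GrokApiClient.
      const response = await grokClient.createImageUnderstanding(messages, {
        model: 'grok-2-vision-latest', // any extra options are spread into the request body
      });
      console.log(response.choices[0].message.content); // the text returned to the tool caller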