image_understanding
Analyze images using Grok AI vision capabilities to extract information, answer questions, and understand visual content based on text prompts.
Instructions
Analyze images using Grok AI vision capabilities (Note: Grok 3 may support image creation)
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| base64_image | No | Base64-encoded image data (without the data:image prefix) | |
| image_url | No | URL of the image to analyze | |
| model | No | Grok vision model to use (e.g., grok-2-vision-latest, potentially grok-3 variants) | grok-2-vision-latest |
| prompt | Yes | Text prompt to accompany the image | |
Input Schema (JSON Schema)
    {
      "properties": {
        "base64_image": {
          "description": "Base64-encoded image data (without the data:image prefix)",
          "type": "string"
        },
        "image_url": {
          "description": "URL of the image to analyze",
          "type": "string"
        },
        "model": {
          "default": "grok-2-vision-latest",
          "description": "Grok vision model to use (e.g., grok-2-vision-latest, potentially grok-3 variants)",
          "type": "string"
        },
        "prompt": {
          "description": "Text prompt to accompany the image",
          "type": "string"
        }
      },
      "required": [
        "prompt"
      ],
      "type": "object"
    }
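As a quick reference, a typical call pairs the required prompt with exactly one image source; the handler shown under Implementation Reference rejects requests that provide neither image_url nor base64_image. The values below are illustrative placeholders, not taken from the project:

    // Hypothetical example arguments for an image_understanding call.
    // Supply exactly one of image_url or base64_image alongside the prompt.
    const args = {
      prompt: 'Describe the main objects in this photo.',
      image_url: 'https://example.com/sample.jpg', // placeholder URL
      model: 'grok-2-vision-latest',               // optional; matches the schema default
    };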
Implementation Reference
- src/index.ts:260-327 (handler): The primary handler function for the image_understanding tool. It validates inputs (prompt is required, along with either image_url or base64_image), constructs a user message with the image content and text prompt, calls the Grok API client for image understanding, and formats the response.

      private async handleImageUnderstanding(args: any) {
        console.error('[Tool] Handling image_understanding tool call');
        const { image_url, base64_image, prompt, model, ...otherOptions } = args;

        // Validate inputs
        if (!prompt) {
          throw new Error('Prompt is required');
        }
        if (!image_url && !base64_image) {
          throw new Error('Either image_url or base64_image is required');
        }

        // Prepare message content
        const content: any[] = [];

        // Add image
        if (image_url) {
          content.push({
            type: 'image_url',
            image_url: {
              url: image_url,
              detail: 'high',
            },
          });
        } else if (base64_image) {
          content.push({
            type: 'image_url',
            image_url: {
              url: `data:image/jpeg;base64,${base64_image}`,
              detail: 'high',
            },
          });
        }

        // Add text prompt
        content.push({
          type: 'text',
          text: prompt,
        });

        // Create messages array
        const messages = [
          {
            role: 'user',
            content,
          },
        ];

        // Create options object
        const options = { model: model || 'grok-2-vision-latest', ...otherOptions };

        // Call Grok API
        const response = await this.grokClient.createImageUnderstanding(messages, options);

        return {
          content: [
            {
              type: 'text',
              text: response.choices[0].message.content,
            },
          ],
        };
      }
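For the base64 path, note that the handler hard-codes an image/jpeg data-URL prefix and 'high' detail. A sketch of the messages payload it builds, using placeholder values:

      // Sketch of the messages array produced for a base64_image call
      // (placeholder values; the image/jpeg MIME type and 'high' detail are fixed by the handler).
      const messages = [
        {
          role: 'user',
          content: [
            {
              type: 'image_url',
              image_url: {
                url: 'data:image/jpeg;base64,<base64_image>', // base64 payload substituted here
                detail: 'high',
              },
            },
            { type: 'text', text: 'Describe the main objects in this photo.' },
          ],
        },
      ];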
- src/index.ts:107-133 (registration): Tool registration in the ListTools response, including the name, description, and input schema definition.

      {
        name: 'image_understanding',
        description: 'Analyze images using Grok AI vision capabilities (Note: Grok 3 may support image creation)',
        inputSchema: {
          type: 'object',
          properties: {
            image_url: { type: 'string', description: 'URL of the image to analyze' },
            base64_image: { type: 'string', description: 'Base64-encoded image data (without the data:image prefix)' },
            prompt: { type: 'string', description: 'Text prompt to accompany the image' },
            model: {
              type: 'string',
              description: 'Grok vision model to use (e.g., grok-2-vision-latest, potentially grok-3 variants)',
              default: 'grok-2-vision-latest'
            }
          },
          required: ['prompt']
        }
      },
- src/index.ts:110-131 (schema): Input schema defining the expected parameters for the image_understanding tool: prompt (required), image_url or base64_image, and model.

      inputSchema: {
        type: 'object',
        properties: {
          image_url: { type: 'string', description: 'URL of the image to analyze' },
          base64_image: { type: 'string', description: 'Base64-encoded image data (without the data:image prefix)' },
          prompt: { type: 'string', description: 'Text prompt to accompany the image' },
          model: {
            type: 'string',
            description: 'Grok vision model to use (e.g., grok-2-vision-latest, potentially grok-3 variants)',
            default: 'grok-2-vision-latest'
          }
        },
        required: ['prompt']
- src/grok-api-client.ts:86-101 (helper): Helper method in GrokApiClient that sends the vision-enabled chat completion request to the xAI /chat/completions endpoint.

      async createImageUnderstanding(messages: any[], options: any = {}): Promise<any> {
        try {
          console.error('[API] Creating image understanding request...');
          const requestBody = {
            messages,
            model: options.model || 'grok-2-vision-latest',
            ...options
          };
          const response = await this.axiosInstance.post('/chat/completions', requestBody);
          return response.data;
        } catch (error) {
          console.error('[Error] Failed to create image understanding request:', error);
          throw error;
        }
      }
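Putting the pieces together: the helper spreads options over the default model, posts the body to /chat/completions via the preconfigured axios instance, and the tool handler surfaces choices[0].message.content from the result. A minimal usage sketch; the grokClient variable and values are placeholders, and the axios instance is assumed to carry the xAI base URL and API key:

      // Hypothetical usage of the helper; assumes axiosInstance was created with the
      // xAI base URL and Authorization header elsewhere in GrokApiClient.
      const response = await grokClient.createImageUnderstanding(messages, {
        model: 'grok-2-vision-latest', // any extra options are spread into the request body
      });
      console.log(response.choices[0].message.content); // the text returned to the tool caller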