chat_with_vision
Analyze images and answer questions about their content using AI vision models. Upload local files or provide URLs to identify objects, read text, or understand visual information.
Instructions
Analyzes images and answers questions about them using Grok's vision models.
This is your go-to tool when you need to understand what's in an image. You can
provide local image files, URLs, or both. Ask questions like "What's in this image?"
or "Read the text from this screenshot." Supports JPEG (.jpg, .jpeg) and PNG images.
Args:
prompt: Your question or instruction about the image(s)
image_paths: List of local file paths to images (optional)
image_urls: List of image URLs from the web (optional)
detail: How closely to analyze: "auto" (default), "low", or "high"
model: Which vision-capable model to use (default is grok-4-0709)
Returns the AI's response as a string describing or analyzing the images.
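The arguments described above can be assembled and sanity-checked before calling the tool. The following is a minimal sketch, not the tool's own implementation: `build_vision_args` is a hypothetical helper, and checking the image format by file extension for local paths only is an assumption made for illustration.

```python
from __future__ import annotations

from pathlib import Path

# Values taken from the argument descriptions above.
ALLOWED_DETAIL = {"auto", "low", "high"}
ALLOWED_SUFFIXES = {".jpg", ".jpeg", ".png"}


def build_vision_args(
    prompt: str,
    image_paths: list[str] | None = None,
    image_urls: list[str] | None = None,
    detail: str = "auto",
    model: str = "grok-4-0709",
) -> dict:
    """Hypothetical helper: assemble an arguments dict for chat_with_vision."""
    if not prompt:
        raise ValueError("prompt is required")
    if detail not in ALLOWED_DETAIL:
        raise ValueError(f"detail must be one of {sorted(ALLOWED_DETAIL)}")
    # Assumption: the format is judged by file extension, local paths only.
    for path in image_paths or []:
        if Path(path).suffix.lower() not in ALLOWED_SUFFIXES:
            raise ValueError(f"unsupported image format: {path}")
    return {
        "prompt": prompt,
        "image_paths": image_paths,
        "image_urls": image_urls,
        "detail": detail,
        "model": model,
    }


# Example: ask for a close reading of a local screenshot.
args = build_vision_args(
    "Read the text from this screenshot.",
    image_paths=["screenshot.png"],
    detail="high",
)
```

The resulting dictionary carries the five documented fields; how it is passed to the tool depends on the client in use.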
Input Schema
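| Name | Required | Description | Default |
|---|---|---|---|
| detail | No | How closely to analyze ("auto", "low", or "high") | auto |
| image_paths | No | List of local file paths to images | null |
| image_urls | No | List of image URLs from the web | null |
| model | No | Which vision-capable model to use | grok-4-0709 |
| prompt | Yes | Your question or instruction about the image(s) | |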
Input Schema (JSON Schema)
{
  "properties": {
    "detail": {
      "default": "auto",
      "title": "Detail",
      "type": "string"
    },
    "image_paths": {
      "anyOf": [
        {
          "items": {
            "type": "string"
          },
          "type": "array"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "title": "Image Paths"
    },
    "image_urls": {
      "anyOf": [
        {
          "items": {
            "type": "string"
          },
          "type": "array"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "title": "Image Urls"
    },
    "model": {
      "default": "grok-4-0709",
      "title": "Model",
      "type": "string"
    },
    "prompt": {
      "title": "Prompt",
      "type": "string"
    }
  },
  "required": [
    "prompt"
  ],
  "type": "object"
}
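To illustrate what this schema accepts, here is a small sketch that checks an arguments object against it using only the Python standard library. It handles just the keywords that appear in the schema (`type`, `anyOf`, `items`, `required`, `default`) and is not a general JSON Schema validator; `matches`, `check_arguments`, and `apply_defaults` are hypothetical names, and the `title` entries are omitted for brevity. Note that the schema types `detail` as a plain string, so it does not itself restrict the value to "auto", "low", or "high".

```python
from __future__ import annotations

import json

# The schema above, with "title" entries omitted for brevity.
SCHEMA = json.loads("""
{
  "properties": {
    "detail": {"default": "auto", "type": "string"},
    "image_paths": {
      "anyOf": [{"items": {"type": "string"}, "type": "array"}, {"type": "null"}],
      "default": null
    },
    "image_urls": {
      "anyOf": [{"items": {"type": "string"}, "type": "array"}, {"type": "null"}],
      "default": null
    },
    "model": {"default": "grok-4-0709", "type": "string"},
    "prompt": {"type": "string"}
  },
  "required": ["prompt"],
  "type": "object"
}
""")


def matches(value, spec: dict) -> bool:
    """Check a value against the subset of JSON Schema used above."""
    if "anyOf" in spec:
        return any(matches(value, option) for option in spec["anyOf"])
    kind = spec.get("type")
    if kind == "string":
        return isinstance(value, str)
    if kind == "null":
        return value is None
    if kind == "array":
        item_spec = spec.get("items", {})
        return isinstance(value, list) and all(matches(v, item_spec) for v in value)
    return True  # no type constraint given


def check_arguments(arguments: dict) -> list[str]:
    """Return a list of problems; an empty list means the arguments pass."""
    errors = [
        f"missing required field: {name}"
        for name in SCHEMA["required"]
        if name not in arguments
    ]
    for name, value in arguments.items():
        spec = SCHEMA["properties"].get(name)
        if spec is not None and not matches(value, spec):
            errors.append(f"wrong type for field: {name}")
    return errors


def apply_defaults(arguments: dict) -> dict:
    """Fill in schema defaults for any field the caller left out."""
    filled = {
        name: spec["default"]
        for name, spec in SCHEMA["properties"].items()
        if "default" in spec
    }
    filled.update(arguments)
    return filled
```

Only `prompt` is required, so an object containing nothing but a prompt passes the check, and `apply_defaults` fills in the remaining four fields.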