chat_with_vision
Analyze images and answer questions about their content using Grok's vision models. Provide local image files, image URLs, or both to identify objects, read text, and interpret visual information. Local files must be JPG, JPEG, or PNG; URLs are passed to the API as-is.
Instructions
Analyzes images and answers questions about them using Grok's vision models.
This is your go-to tool when you need to understand what's in an image. You can
provide local image files, URLs, or both. Ask questions like "What's in this image?"
or "Read the text from this screenshot." Supports JPG, JPEG, and PNG formats.
Args:
prompt: Your question or instruction about the image(s)
image_paths: List of local file paths to images (optional)
image_urls: List of image URLs from the web (optional)
detail: How closely to analyze ("auto", "low", or "high")
model: Which vision-capable model to use (default is grok-4-0709)
Returns the AI's response as a string describing or analyzing the images.
Input Schema
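| Name | Required | Description | Default |
|---|---|---|---|
| prompt | Yes | Your question or instruction about the image(s) | |
| image_paths | No | List of local file paths to images | null |
| image_urls | No | List of image URLs from the web | null |
| detail | No | How closely to analyze: "auto", "low", or "high" | auto |
| model | No | Which vision-capable model to use | grok-4-0709 |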
Input Schema (JSON Schema)
{
"properties": {
"detail": {
"default": "auto",
"title": "Detail",
"type": "string"
},
"image_paths": {
"anyOf": [
{
"items": {
"type": "string"
},
"type": "array"
},
{
"type": "null"
}
],
"default": null,
"title": "Image Paths"
},
"image_urls": {
"anyOf": [
{
"items": {
"type": "string"
},
"type": "array"
},
{
"type": "null"
}
],
"default": null,
"title": "Image Urls"
},
"model": {
"default": "grok-4-0709",
"title": "Model",
"type": "string"
},
"prompt": {
"title": "Prompt",
"type": "string"
}
},
"required": [
"prompt"
],
"type": "object"
}
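Only `prompt` is required by this schema; every other field can be omitted. A minimal sketch of how a client might assemble the `arguments` object for an MCP `tools/call` request follows. The helper name `build_chat_with_vision_args` is hypothetical, and restricting `detail` to "auto", "low", or "high" is an assumption taken from the docstring, since the schema itself types `detail` only as a string.

```python
from typing import Any, Dict, List, Optional

# Detail levels listed in the tool's docstring. The JSON Schema only declares
# "detail" as a string, so this check is an assumption of this sketch.
ALLOWED_DETAIL = ("auto", "low", "high")


def build_chat_with_vision_args(
    prompt: str,
    image_paths: Optional[List[str]] = None,
    image_urls: Optional[List[str]] = None,
    detail: str = "auto",
    model: str = "grok-4-0709",
) -> Dict[str, Any]:
    """Hypothetical client-side helper: build the `arguments` object for a tool call."""
    if not isinstance(prompt, str):
        raise TypeError("prompt is required and must be a string")
    if detail not in ALLOWED_DETAIL:
        raise ValueError(f"detail must be one of {ALLOWED_DETAIL}, got {detail!r}")

    args: Dict[str, Any] = {"prompt": prompt, "detail": detail, "model": model}
    # image_paths and image_urls default to null, so leave them out when unset.
    if image_paths:
        args["image_paths"] = list(image_paths)
    if image_urls:
        args["image_urls"] = list(image_urls)
    return args


if __name__ == "__main__":
    print(build_chat_with_vision_args(
        "Read the text from this screenshot.",
        image_paths=["screenshot.png"],
        detail="high",
    ))
```

Leaving out unset optional lists mirrors the schema's `null` defaults; sending `"image_paths": null` explicitly should also validate, because the schema's `anyOf` admits `null`.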
Implementation Reference
- src/server.py:106-173 (handler): The primary handler for the 'chat_with_vision' tool, registered with the @mcp.tool() decorator. It checks the extension of each local image, base64-encodes local images, builds a single multimodal user message, calls the xAI chat completions API with the chosen vision model, and returns the model's reply.
```python
from pathlib import Path
from typing import List, Optional

# Defined elsewhere in the project (not shown in this reference):
#   mcp                    - the MCP server instance that registers tools
#   create_client          - returns an async HTTP client for the xAI API
#   encode_image_to_base64 - helper from src/utils.py

# Extension -> MIME subtype. "image/jpg" is not a registered MIME type,
# so .jpg files are sent as image/jpeg.
IMAGE_MIME_SUBTYPES = {"jpg": "jpeg", "jpeg": "jpeg", "png": "png"}


@mcp.tool()
async def chat_with_vision(
    prompt: str,
    image_paths: Optional[List[str]] = None,
    image_urls: Optional[List[str]] = None,
    detail: str = "auto",
    model: str = "grok-4-0709",
) -> str:
    """
    Analyzes images and answers questions about them using Grok's vision models.

    This is your go-to tool when you need to understand what's in an image. You can
    provide local image files, URLs, or both. Ask questions like "What's in this image?"
    or "Read the text from this screenshot." Supports JPG, JPEG, and PNG formats.

    Args:
        prompt: Your question or instruction about the image(s)
        image_paths: List of local file paths to images (optional)
        image_urls: List of image URLs from the web (optional)
        detail: How closely to analyze ("auto", "low", or "high")
        model: Which vision-capable model to use (default is grok-4-0709)

    Returns the AI's response as a string describing or analyzing the images.
    """
    content_items = []

    # Local files: check the extension, then inline each one as a base64 data URL.
    if image_paths:
        for path in image_paths:
            ext = Path(path).suffix.lower().lstrip(".")
            if ext not in IMAGE_MIME_SUBTYPES:
                raise ValueError(f"Unsupported image type: {ext}")
            base64_img = encode_image_to_base64(path)
            content_items.append({
                "type": "image_url",
                "image_url": {
                    "url": f"data:image/{IMAGE_MIME_SUBTYPES[ext]};base64,{base64_img}",
                    "detail": detail,
                },
            })

    # Remote images: pass the URLs through unchanged.
    if image_urls:
        for url in image_urls:
            content_items.append({
                "type": "image_url",
                "image_url": {"url": url, "detail": detail},
            })

    # The text prompt follows the images in the same user message.
    if prompt:
        content_items.append({"type": "text", "text": prompt})

    request_data = {
        "model": model,
        "messages": [{"role": "user", "content": content_items}],
    }

    client = create_client()
    try:
        response = await client.post("/chat/completions", json=request_data)
        response.raise_for_status()
        data = response.json()
    finally:
        # Close the client even if the request or the status check fails.
        await client.aclose()

    return data["choices"][0]["message"]["content"]
```
- src/utils.py:32-45 (helper): Utility function that base64-encodes a local image file. 'chat_with_vision' uses it to prepare each entry in image_paths for the API request.
```python
import base64
from pathlib import Path


def encode_image_to_base64(image_path: str) -> str:
    """Read a local image file and return its contents as a base64 string."""
    path = Path(image_path)
    if not path.exists():
        raise FileNotFoundError(f"Image file not found: {image_path}")

    # This helper also accepts GIF and WebP, but chat_with_vision rejects
    # anything other than JPG, JPEG, and PNG before calling it.
    valid_extensions = {".jpg", ".jpeg", ".png", ".gif", ".webp"}
    if path.suffix.lower() not in valid_extensions:
        raise ValueError(
            f"Unsupported image type: {path.suffix}. "
            f"Supported types: {', '.join(sorted(valid_extensions))}"
        )

    with path.open("rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")
```
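The request the handler sends can be reproduced without any network access by pulling the payload construction into pure functions. The sketch below is illustrative only: `image_file_to_data_url` and `build_vision_request` are hypothetical names, not functions in this project. It keeps the handler's message layout (images first, then the text prompt, all in one user message) and maps `.jpg` to `image/jpeg`, since there is no registered `image/jpg` MIME type.

```python
import base64
import json
from pathlib import Path
from typing import Any, Dict, List, Optional

# Extension -> MIME type; both .jpg and .jpeg map to the registered image/jpeg.
MIME_TYPES = {".jpg": "image/jpeg", ".jpeg": "image/jpeg", ".png": "image/png"}


def image_file_to_data_url(image_path: str) -> str:
    """Hypothetical helper: read a local image and return a base64 data URL."""
    path = Path(image_path)
    mime = MIME_TYPES.get(path.suffix.lower())
    if mime is None:
        raise ValueError(f"Unsupported image type: {path.suffix or '(none)'}")
    if not path.is_file():
        raise FileNotFoundError(f"Image file not found: {image_path}")
    encoded = base64.b64encode(path.read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def build_vision_request(
    prompt: str,
    image_paths: Optional[List[str]] = None,
    image_urls: Optional[List[str]] = None,
    detail: str = "auto",
    model: str = "grok-4-0709",
) -> Dict[str, Any]:
    """Hypothetical helper: build the JSON body for POST /chat/completions."""
    content: List[Dict[str, Any]] = []
    # Local files become inline data URLs; remote URLs are passed through as-is.
    sources = [image_file_to_data_url(p) for p in image_paths or []]
    sources += list(image_urls or [])
    for url in sources:
        content.append({"type": "image_url", "image_url": {"url": url, "detail": detail}})
    # The text prompt comes after the images, in the same user message.
    if prompt:
        content.append({"type": "text", "text": prompt})
    return {"model": model, "messages": [{"role": "user", "content": content}]}


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "photo.JPG"
        sample.write_bytes(b"abc")  # placeholder bytes, not a real image
        body = build_vision_request(
            "What's in this image?",
            image_paths=[str(sample)],
            image_urls=["https://example.com/cat.png"],
            detail="low",
        )
        print(json.dumps(body, indent=2))
```

Keeping the payload builder free of I/O beyond reading local files makes the message shape easy to unit-test separately from the HTTP call.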