describe_image
Generate detailed descriptions of images from base64-encoded data. Ideal for images uploaded directly to a chat conversation: the description comes from a vision API (Anthropic or OpenAI), optionally augmented with Tesseract OCR output.
Instructions
Describe an image from base64-encoded data. Use for images directly uploaded to chat.
Best for: Images uploaded to the current conversation where no public URL exists.
Not for: Local files on your computer or images with public URLs.
Args:
- image: Base64-encoded image data
- prompt: Optional prompt to guide the description

Returns:
- str: Detailed description of the image
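
Because the tool takes raw base64 rather than a URL or file path, callers must encode the image bytes themselves. A minimal client-side sketch, assuming a connected `ClientSession` from the official `mcp` Python SDK (the `session` object and file name are illustrative):

```python
import base64

# Encode local image bytes into the base64 string the tool expects.
with open("screenshot.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("ascii")

# Invoke the tool over an established MCP session. The tool name and
# argument keys match the schema below; the session itself is assumed.
result = await session.call_tool(
    "describe_image",
    arguments={
        "image": image_b64,
        "prompt": "Please describe this image in detail.",
    },
)
```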
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| image | Yes | Base64-encoded image data | |
| prompt | No | Optional prompt to guide the description | Please describe this image in detail. |
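
The JSON Schema view corresponds to roughly the following, shown here as a Python dict since FastMCP derives the schema from the function signature (an inferred sketch, not the server's exact generated output):

```python
# Inferred input schema for describe_image; the field names and default
# come from the signature below, the descriptive text is an assumption.
DESCRIBE_IMAGE_INPUT_SCHEMA = {
    "type": "object",
    "properties": {
        "image": {
            "type": "string",
            "description": "Base64-encoded image data",
        },
        "prompt": {
            "type": "string",
            "default": "Please describe this image in detail.",
        },
    },
    "required": ["image"],
}
```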
Implementation Reference
- Main handler for the MCP 'describe_image' tool. Validates input, calls process_image_with_ocr, sanitizes and returns the description.

```python
@mcp.tool()
async def describe_image(
    image: str, prompt: str = "Please describe this image in detail."
) -> str:
    """Describe the contents of an image using vision AI.

    Args:
        image: Base64-encoded image data
        prompt: Optional prompt to use for the description.

    Returns:
        str: Detailed description of the image
    """
    try:
        logger.info(f"Processing image description request with prompt: {prompt}")
        logger.debug(f"Image data length: {len(image)}")

        # Validate image data
        if not validate_base64_image(image):
            raise ValueError("Invalid base64 image data")

        result = await process_image_with_ocr(image, prompt)
        if not result:
            raise ValueError("Received empty response from processing")

        logger.info("Successfully processed image")
        return sanitize_output(result)
    except ValueError as e:
        logger.error(f"Input error: {str(e)}")
        raise
    except Exception as e:
        logger.error(f"Error describing image: {str(e)}", exc_info=True)
        raise
```
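
The handler leans on two helpers that are not part of this excerpt: `validate_base64_image` and `sanitize_output`. The sketches below are plausible reconstructions under the assumption of standard-library-only validation, not the project's actual implementations:

```python
import base64
import binascii


def validate_base64_image(data: str) -> bool:
    """Illustrative sketch: strict base64 decode plus a magic-byte check
    for common formats (PNG, JPEG, GIF, and WebP's RIFF container)."""
    try:
        raw = base64.b64decode(data, validate=True)
    except (binascii.Error, ValueError):
        return False
    return raw.startswith((b"\x89PNG", b"\xff\xd8\xff", b"GIF8", b"RIFF"))


def sanitize_output(text: str) -> str:
    """Illustrative sketch: drop control characters, keep newlines and
    tabs, and trim surrounding whitespace."""
    cleaned = "".join(ch for ch in text if ch.isprintable() or ch in "\n\t")
    return cleaned.strip()
```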
- Core helper function that invokes the vision client (Anthropic or OpenAI) to describe the image and optionally appends OCR text.

```python
async def process_image_with_ocr(image_data: str, prompt: str) -> str:
    """Process image with both vision AI and OCR.

    Args:
        image_data: Base64 encoded image data
        prompt: Prompt for vision AI

    Returns:
        str: Combined description from vision AI and OCR
    """
    # Get vision AI description
    client = get_vision_client()

    # Handle both sync (Anthropic) and async (OpenAI) clients
    if isinstance(client, OpenAIVision):
        description = await client.describe_image(image_data, prompt)
    else:
        description = client.describe_image(image_data, prompt)

    # Check for empty or default response
    if not description or description == "No description available.":
        raise ValueError("Vision API returned empty or default response")

    # Handle OCR if enabled
    ocr_enabled = os.getenv("ENABLE_OCR", "false").lower() == "true"
    if ocr_enabled:
        try:
            # Convert base64 to PIL Image
            image_bytes = base64.b64decode(image_data)
            image = Image.open(io.BytesIO(image_bytes))

            # Extract text with OCR required flag
            if ocr_text := extract_text_from_image(image, ocr_required=True):
                description += (
                    f"\n\nAdditionally, this is the output of tesseract-ocr: {ocr_text}"
                )
        except OCRError as e:
            # Propagate OCR errors when OCR is enabled
            logger.error(f"OCR processing failed: {str(e)}")
            raise ValueError(f"OCR Error: {str(e)}")
        except Exception as e:
            logger.error(f"Unexpected error during OCR: {str(e)}")
            raise

    return sanitize_output(description)
```
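
The OCR branch depends on `extract_text_from_image` and an `OCRError` type, neither shown here. Since the appended output explicitly mentions tesseract-ocr, here is a minimal sketch assuming pytesseract as the engine (names and error handling are assumptions):

```python
import pytesseract
from PIL import Image


class OCRError(Exception):
    """Raised when OCR fails and the caller marked OCR as required."""


def extract_text_from_image(image: Image.Image, ocr_required: bool = False) -> str:
    """Run Tesseract over a PIL image and return the stripped text.

    Illustrative sketch: engine failures are wrapped in OCRError when
    ocr_required is True, otherwise swallowed as an empty result.
    """
    try:
        text = pytesseract.image_to_string(image)
    except pytesseract.TesseractError as e:
        if ocr_required:
            raise OCRError(f"Tesseract failed: {e}") from e
        return ""
    return text.strip()
```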
- src/image_recognition_server/server.py:127-127 (registration): The @mcp.tool() decorator registers the describe_image function as an MCP tool.

```python
@mcp.tool()
```
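
For context, registration in a FastMCP-based server amounts to decorating the function on a module-level server instance. A minimal end-to-end sketch (the server name and transport are assumptions; only the decorator usage is confirmed above):

```python
from mcp.server.fastmcp import FastMCP

# Hypothetical server instance; the project creates its own in server.py.
mcp = FastMCP("image-recognition-server")


@mcp.tool()
async def describe_image(
    image: str, prompt: str = "Please describe this image in detail."
) -> str:
    """Registered as an MCP tool; the input schema is derived from the signature."""
    ...


if __name__ == "__main__":
    mcp.run()  # stdio transport by default
```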