
MiniMax MCP Server

by longhz

understand_image

Analyze image content with quick summaries or structured 7-dimension analysis. Ask specific questions to extract text, objects, charts, and evidence from images.

Instructions

Analyze image content, following the OpenHanako Vision Bridge design.

Two modes:

  • quick: a concise description (~300 words), suitable for getting a quick overview of the image

  • detailed: a structured 7-dimension analysis (image_overview / visible_text / objects_and_layout / charts_or_data / answer_to_request / evidence / uncertainty), modeled on OpenHanako vision-bridge.js

Caching supported: the same image with the same prompt does not trigger a repeat API call (LRU + disk persistence).

Args:

  • image_url: image URL (HTTP/HTTPS) or local file path; supports JPEG/PNG/GIF/WebP (≤20MB)

  • prompt: a specific question about the image, e.g. "What error message is shown in this image?"

  • mode: "quick" or "detailed"; defaults to detailed

  • use_cache: whether to use the cache; defaults to true

Input Schema

Name        Required    Description    Default
image_url   Yes
prompt      No
mode        No                         detailed
use_cache   No
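A typical invocation from an MCP client passes arguments matching this schema. The payload below is illustrative (the URL and question are invented); only the field names come from the schema above:

```python
# Hypothetical argument payload for the understand_image tool.
# Field names match the input schema above; the URL and question are invented.
arguments = {
    "image_url": "https://example.com/screenshot.png",
    "prompt": "What error message is shown in this screenshot?",
    "mode": "detailed",   # "quick" or "detailed"; defaults to detailed
    "use_cache": True,    # reuse a cached analysis for a repeated image+prompt
}
```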

Implementation Reference

  • MCP tool registration via @mcp.tool() decorator — registers understand_image as a FastMCP tool with parameters image_url, prompt, mode, use_cache.
    @mcp.tool()
    def understand_image(
        image_url: str,
        prompt: str = "",
        mode: str = "detailed",
        use_cache: bool = True,
    ) -> dict:
        """Analyze image content, following the OpenHanako Vision Bridge design.
    
        Two modes:
        - quick:    concise description (~300 words), good for a fast overview
        - detailed: structured 7-dimension analysis (image_overview / visible_text /
                    objects_and_layout / charts_or_data / answer_to_request /
                    evidence / uncertainty), modeled on OpenHanako vision-bridge.js
    
        Caching supported: the same image + prompt does not trigger a repeat
        API call (LRU + disk persistence).
    
        Args:
            image_url: image URL (HTTP/HTTPS) or local file path; JPEG/PNG/GIF/WebP (≤20MB)
            prompt:    specific question about the image, e.g. "What error message does this image show?"
            mode:      "quick" or "detailed"; defaults to detailed
            use_cache: whether to use the cache; defaults to true
        """
        from minimax_mcp.tools.image_understand import understand_image as _run
        return _run(get_client(), image_url, prompt, mode, use_cache)
  • Core handler function — normalizes image URL (local file → base64 data URL), validates mode, delegates to analyze_image(), and attaches cache stats.
    def understand_image(
        client: MiniMaxClient,
        image_url: str,
        prompt: str = "",
        mode: str = "",
        use_cache: bool = True,
    ) -> dict:
        """Analyze image content; supports local file paths.
    
        Follows the OpenHanako design, offering two analysis modes:
        - quick:  concise description (~300 words), mirrors _analyzeImageAsNote()
        - detailed: structured 7-dimension analysis, mirrors _analyzeImageWithPrimitives()
    
        Args:
            client: MiniMax API client
            image_url: image URL (HTTP/HTTPS) or local file path (auto-converted to base64)
            prompt: the user's specific question about the image (optional), e.g. "What error does this screenshot show?"
            mode: "quick" or "detailed"; defaults to detailed
            use_cache: whether to use the cache; defaults to true
    
        Returns:
            {
                success: bool,
                mode: str,
                analysis: str,
                cached: bool,
                image_url: str,
                cache_stats: {...},
            }
        """
        if not image_url:
            return {"success": False, "error": "image_url must not be empty"}
    
        if mode not in ("quick", "detailed"):
            mode = VISION_DEFAULT_MODE
    
        # Local files are converted to base64 data URLs automatically
        normalized_url = _normalize_image_url(image_url)
        print(f"[Vision] Source: {image_url[:80]}", file=sys.stderr)
    
        result = analyze_image(
            client=client,
            image_url=normalized_url,
            prompt=prompt.strip(),
            mode=mode,
            use_cache=use_cache,
        )
    
        result["cache_stats"] = get_cache_stats()
        return result
  • Helper that converts local file paths to base64 data URLs for API consumption (supports jpg, png, gif, webp).
    def _normalize_image_url(raw: str) -> str:
        """Convert a local file path to a base64 data URL; HTTP/HTTPS URLs are returned unchanged."""
        if raw.startswith("http://") or raw.startswith("https://") or raw.startswith("data:"):
            return raw
        path = Path(raw).expanduser()
        if path.is_file():
            ext = path.suffix.lower()
            mime_map = {".jpg": "jpeg", ".jpeg": "jpeg", ".png": "png", ".gif": "gif", ".webp": "webp"}
            mime = mime_map.get(ext, "jpeg")
            b64 = base64.b64encode(path.read_bytes()).decode()
            return f"data:image/{mime};base64,{b64}"
        return raw
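The conversion can be exercised end-to-end with a throwaway file. The sketch below mirrors `_normalize_image_url` as a standalone function (same MIME mapping and pass-through rules) rather than importing the real module:

```python
import base64
import tempfile
from pathlib import Path

def to_data_url(raw: str) -> str:
    """Standalone mirror of _normalize_image_url: local file -> data URL."""
    if raw.startswith(("http://", "https://", "data:")):
        return raw  # remote URLs and existing data URLs pass through
    path = Path(raw).expanduser()
    if path.is_file():
        mime = {".jpg": "jpeg", ".jpeg": "jpeg", ".png": "png",
                ".gif": "gif", ".webp": "webp"}.get(path.suffix.lower(), "jpeg")
        b64 = base64.b64encode(path.read_bytes()).decode()
        return f"data:image/{mime};base64,{b64}"
    return raw  # unknown input passes through unchanged

with tempfile.NamedTemporaryFile(suffix=".png", delete=False) as f:
    f.write(b"\x89PNG")  # not a real image; enough to show the encoding
    tmp_path = f.name

url = to_data_url(tmp_path)
print(url[:22])  # data:image/png;base64,
```

Note the fall-through at the end: a path that does not exist is returned as-is, so a bad path surfaces as an API-side error rather than a local exception.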
  • Core analysis orchestrator — checks VisionCache, builds prompts (quick/detailed), calls MiniMax API, caches results, and returns formatted response.
    def analyze_image(
        client: MiniMaxClient,
        image_url: str,
        prompt: str = "",
        mode: str = "detailed",
        use_cache: bool = True,
    ) -> dict:
        """Analyze image content.
    
        Mirrors the OpenHanako VisionBridge.prepare() + _analyzeImage() flow:
        1. Check the cache
        2. Build the prompt (quick/detailed)
        3. Call the MiniMax understand_image API
        4. Cache the result
        5. Format the response
    
        Args:
            client: MiniMax API client
            image_url: image URL or local path
            prompt: the user's specific question about the image (optional)
            mode: "quick" or "detailed"
            use_cache: whether to use the cache
    
        Returns:
            {
                success: bool,
                mode: str,
                analysis: str,
                cached: bool,
                image_url: str,
            }
        """
        cache = _get_cache()
    
        # Step 1: check the cache
        if use_cache:
            cached = cache.get(image_url, prompt, mode)
            if cached:
                print(f"[Vision] Cache hit for {image_url[:60]}", file=sys.stderr)
                return {
                    "success": True,
                    "mode": mode,
                    "analysis": cached,
                    "cached": True,
                    "image_url": image_url,
                }
    
        # Step 2: build the prompt
        if mode == "detailed":
            api_prompt = build_detailed_prompt(prompt)
        else:
            api_prompt = build_quick_prompt(prompt)
    
        print(f"[Vision] Analyzing image (mode={mode}): {image_url[:80]}", file=sys.stderr)
    
        # Step 3: call the API
        result = client.understand_image(prompt=api_prompt, image_url=image_url)
    
        if not result.get("success"):
            return {
                "success": False,
                "mode": mode,
                "analysis": "",
                "error": result.get("error", "Unknown error"),
                "detail": result.get("detail", ""),
                "cached": False,
                "image_url": image_url,
            }
    
        # Step 4: extract the analysis text
        # Actual API response: {"content": "...", "base_resp": {...}, "success": true}
        raw_analysis = (
            result.get("content")
            or result.get("analysis")
            or result.get("data", {}).get("reply")
            or str(result)
        )
    
        # Step 5: cache the result
        if use_cache and raw_analysis:
            cache.put(image_url, prompt, mode, raw_analysis)
    
        return {
            "success": True,
            "mode": mode,
            "analysis": raw_analysis,
            "cached": False,
            "image_url": image_url,
        }
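The VisionCache internals are not shown on this page. Under the stated behavior (in-memory LRU keyed by image + prompt + mode, persisted to disk), it could be sketched roughly as below; the class name, hashing scheme, and file layout are assumptions for illustration, not the project's actual code:

```python
import hashlib
import json
import tempfile
from collections import OrderedDict
from pathlib import Path

class VisionCacheSketch:
    """Illustrative LRU + disk-persistent cache keyed by (image_url, prompt, mode)."""

    def __init__(self, path: Path, max_entries: int = 128):
        self.path = path
        self.max_entries = max_entries
        self.mem: OrderedDict = OrderedDict()
        if path.is_file():  # reload persisted entries on startup
            self.mem.update(json.loads(path.read_text()))

    @staticmethod
    def _key(image_url: str, prompt: str, mode: str) -> str:
        # Hash the tuple so arbitrarily long data URLs stay a fixed-size key
        raw = json.dumps([image_url, prompt, mode])
        return hashlib.sha256(raw.encode()).hexdigest()

    def get(self, image_url, prompt, mode):
        key = self._key(image_url, prompt, mode)
        if key in self.mem:
            self.mem.move_to_end(key)  # mark as recently used
            return self.mem[key]
        return None

    def put(self, image_url, prompt, mode, analysis):
        key = self._key(image_url, prompt, mode)
        self.mem[key] = analysis
        self.mem.move_to_end(key)
        while len(self.mem) > self.max_entries:
            self.mem.popitem(last=False)       # evict least recently used
        self.path.write_text(json.dumps(self.mem))  # persist to disk

cache = VisionCacheSketch(Path(tempfile.mkdtemp()) / "cache.json", max_entries=2)
cache.put("img1", "q", "quick", "first")
cache.put("img2", "q", "quick", "second")
cache.get("img1", "q", "quick")            # touch img1 so img2 becomes LRU
cache.put("img3", "q", "quick", "third")   # evicts img2
```

Note that the cache is keyed on the *user* prompt, not the expanded API prompt, which is why the same question in the same mode hits the cache regardless of template wording.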
  • Detailed mode prompt template defining the 7-dimension structured analysis schema (image_overview, visible_text, objects_and_layout, charts_or_data, answer_to_request, evidence, uncertainty).
    # ── Detailed Mode Prompt ───────────────────────────────────
    # Analysis dimensions modeled on the OpenHanako Vision Bridge:
    # image_overview / visible_text / objects_and_layout /
    # charts_or_data / user_request_answer / evidence / uncertainty
    
    DETAILED_PROMPT_TEMPLATE = """Analyze this image thoroughly and return a structured response in the following format.
    
    ## image_overview
    A concise description of what this image shows overall. Include the type of image (screenshot, photo, chart, document, UI, etc.), the context/setting, and the main subject.
    
    ## visible_text
    List all readable text visible in the image. Include labels, titles, buttons, menu items, error messages, code snippets, document text, etc. Be as complete as possible with exact wording when legible.
    
    ## objects_and_layout
    Describe the spatial layout: what objects/elements appear where. Note their relative positions (top-left, center, bottom-right, etc.), approximate sizes, and relationships between elements. For UI screenshots, describe the window structure, panels, toolbars, content areas.
    
    ## charts_or_data
    If the image contains charts, graphs, tables, or structured data, extract and describe the data. Include axis labels, data series, numerical values, trends, and table headers/rows where visible.
    
    ## answer_to_request
    {request_section}
    
    ## evidence
    Cite specific visual evidence from the image that supports your analysis. Reference exact positions, colors, text, or patterns that back up your conclusions.
    
    ## uncertainty
    Note anything that is unclear, ambiguous, partially hidden, cropped, or that you are uncertain about. Be honest about the limits of what you can see.
    
    {user_request}"""
    
    
    def build_detailed_prompt(user_request: str = "") -> str:
        """Build the detailed-mode prompt.
    
        If the user has a specific question, an answer_to_request section is
        generated; otherwise a generic description section is used instead.
    
        Follows OpenHanako: the user_request field drives targeted analysis.
        """
        if user_request.strip():
            request_section = f"Answer the following request using the image:\n{user_request}"
            user_request_line = f"User's request about this image: {user_request}"
        else:
            request_section = "Describe the main purpose or takeaway of this image."
            user_request_line = ""
    
        return DETAILED_PROMPT_TEMPLATE.format(
            request_section=request_section,
            user_request=user_request_line,
        )
    
    
    def build_quick_prompt(user_request: str = "") -> str:
        """Build the quick-mode prompt."""
        if user_request.strip():
            return f"{QUICK_PROMPT}\n\nSpecifically, the user wants to know: {user_request}"
        return QUICK_PROMPT
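QUICK_PROMPT itself does not appear on this page. Assuming it is a short summary instruction, quick-mode assembly can be reproduced standalone like this; the constant's wording below is a placeholder, not the project's actual text:

```python
# Placeholder wording; the real QUICK_PROMPT constant is not shown on this page.
QUICK_PROMPT = "Describe this image concisely in about 300 words."

def build_quick_prompt(user_request: str = "") -> str:
    """Standalone mirror of build_quick_prompt, with a stand-in QUICK_PROMPT."""
    if user_request.strip():
        return f"{QUICK_PROMPT}\n\nSpecifically, the user wants to know: {user_request}"
    return QUICK_PROMPT

print(build_quick_prompt())                      # bare summary instruction
print(build_quick_prompt("Is there an error?"))  # summary + targeted question
```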
Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses caching (LRU + disk persistence), mode-specific outputs (quick ~300 words; detailed 7 dimensions), image constraints (formats, size ≤20MB), and references OpenHanako design.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Well-structured with sections (modes, cache, args) but some redundancy (mode details in prose and bullet). Slightly verbose but still clear.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Comprehensive: covers input, behavior, caching, output (via mode descriptions). No output schema but mode details provide sufficient completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

All 4 parameters explained in detail: image_url (URL/local path, formats, limit), prompt (example question), mode (values), use_cache (boolean). Schema has 0% coverage, description fully compensates.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description explicitly states '分析图片内容' (analyze image content) with two modes (quick/detailed). Clearly distinguishes from siblings like generate_image (creation) and web_search (search).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Describes when to use quick mode ('适合快速了解图片内容', i.e. suitable for a quick overview) and detailed mode (structured analysis). Does not explicitly exclude alternatives, but context clarifies the differentiation.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
